[PATCH v2 0/2] btrfs: qgroup: stale qgroups related impromvents

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] btrfs: qgroup: stale qgroups related impromvents
@ 2024-04-19  9:46 Qu Wenruo
  2024-04-19  9:46 ` [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal Qu Wenruo
  2024-04-19  9:46 ` [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup Qu Wenruo
  0 siblings, 2 replies; 28+ messages in thread
From: Qu Wenruo @ 2024-04-19  9:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: boris

[CHANGELOG]
v2:
- Do more sanity checks before deleting a qgroup

- Make squota to handle auto deleted qgroup more gracefully
  Unfortunately the behavior change would affect btrfs/301, as the
  fully deleted subvolume would make the test case to cause bash grammar
  error (since the qgroup is gone with the subvolume).

  Cc Boris for extra comments on squota compatibility and future
  btrfs/311 updates ideas.

We have two problems in recent qgroup code:

- One can not delete a fully removed qgroup if the drop hits
  drop_subtree_threshold
  As hitting drop_subtree_threshold would mark qgroup inconsistent and
  skip all accounting, this would leave qgroup number untouched (thus
  non-zero), and btrfs refuses to delete qgroup with non-zero rfer/excl
  numbers.

  This would be addressed by the first patch, allowing qgroup
  deletion as long as it doesn't have any child nor a corresponding
  subvolume.

- Deleted subvolumes leaves a stale qgroup until next rescan
  This is a long existing problem.

  Although previous pushes all failed, just let me try it again.

  The idea is commit current transaction if needed (full accounting mode
  and qgroup numbers are consistent), then try to remove the subvolume
  qgroup after it is fully dropped.


Qu Wenruo (2):
  btrfs: slightly loose the requirement for qgroup removal
  btrfs: automatically remove the subvolume qgroup

 fs/btrfs/extent-tree.c |   8 +++
 fs/btrfs/qgroup.c      | 108 ++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |   2 +
 3 files changed, 110 insertions(+), 8 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal
  2024-04-19  9:46 [PATCH v2 0/2] btrfs: qgroup: stale qgroups related impromvents Qu Wenruo
@ 2024-04-19  9:46 ` Qu Wenruo
  2024-04-29 12:47   ` Boris Burkov
  2024-04-19  9:46 ` [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup Qu Wenruo
  1 sibling, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2024-04-19  9:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: boris

[BUG]
Currently if one is utilizing "qgroups/drop_subtree_threshold" sysfs,
and a snapshot with level higher than that value is dropped, btrfs will
not be able to delete the qgroup until next qgroup rescan:

  uuid=ffffffff-eeee-dddd-cccc-000000000000

  wipefs -fa $dev
  mkfs.btrfs -f $dev -O quota -s 4k -n 4k -U $uuid
  mount $dev $mnt

  btrfs subvolume create $mnt/subv1/
  for (( i = 0; i < 1024; i++ )); do
  	xfs_io -f -c "pwrite 0 2k" $mnt/subv1/file_$i > /dev/null
  done
  sync
  btrfs subv snapshot $mnt/subv1 $mnt/snapshot
  btrfs quota enable $mnt
  btrfs quota rescan -w $mnt
  sync
  echo 1 > /sys/fs/btrfs/$uuid/qgroups/drop_subtree_threshold
  btrfs subvolume delete $mnt/snapshot
  btrfs subv sync $mnt
  btrfs qgroup show -prce --sync $mnt
  btrfs qgroup destroy 0/257 $mnt
  umount $mnt

The final qgroup removal would fail with the following error:

  ERROR: unable to destroy quota group: Device or resource busy

[CAUSE]
The above script would generate a subvolume of level 2, then snapshot
it, enable qgroup, set the drop_subtree_threshold, then drop the
snapshot.

Since the subvolume drop would meet the threshold, qgroup would be
marked inconsistent and skip accounting to avoid hanging the system at
transaction commit.

But currently we do not allow a qgroup with any rfer/excl numbers to be
dropped, and this is not really compatible with the new
drop_subtree_threshold behavior.

[FIX]
Instead of a strong requirement for zero rfer/excl numbers, just check
if there is any child for higher level qgroup, and for subvolume qgroups
check if there is a corresponding subvolume for it.

For rsv values, do extra warnings, and for rfer/excl numbers, only do the
warning if we're in full accounting mode and the qgroup is consistent.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 69 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9a9f84c4d3b8..2ea16a07a7d4 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1736,13 +1736,38 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
 	return ret;
 }
 
-static bool qgroup_has_usage(struct btrfs_qgroup *qgroup)
+static bool can_delete_qgroup(struct btrfs_fs_info *fs_info,
+			      struct btrfs_qgroup *qgroup)
 {
-	return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 ||
-		qgroup->excl > 0 || qgroup->excl_cmpr > 0 ||
-		qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 ||
-		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 ||
-		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0);
+	struct btrfs_key key;
+	struct btrfs_path *path;
+	int ret;
+
+	/* For higher level qgroup, we can only delete it if it has no child. */
+	if (btrfs_qgroup_level(qgroup->qgroupid)) {
+		if (!list_empty(&qgroup->members))
+			return false;
+		return true;
+	}
+
+	/*
+	 * For level-0 qgroups, we can only delete it if it has no subvolume
+	 * for it.
+	 * This means even a subvolume is unlinked but not yet fully dropped,
+	 * we can not delete the qgroup.
+	 */
+	key.objectid = qgroup->qgroupid;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+	key.offset = -1ULL;
+	path = btrfs_alloc_path();
+	if (!path)
+		return false;
+
+	ret = btrfs_find_root(fs_info->tree_root, &key, path, NULL, NULL);
+	btrfs_free_path(path);
+	if (ret > 0)
+		return true;
+	return false;
 }
 
 int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
@@ -1764,7 +1789,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
 		goto out;
 	}
 
-	if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
+	if (!can_delete_qgroup(fs_info, qgroup)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -1789,6 +1814,36 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
 	}
 
 	spin_lock(&fs_info->qgroup_lock);
+	/*
+	 * Warn on reserved space. The subvolume should has no child nor
+	 * corresponding subvolume.
+	 * Thus its reserved space should all be zero, no matter if qgroup
+	 * is consistent or the mode.
+	 */
+	WARN_ON(qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
+		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] ||
+		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS]);
+	/*
+	 * The same for rfer/excl numbers, but that's only if our qgroup is
+	 * consistent and if it's in regular qgroup mode.
+	 * For simple mode it's not as accurate thus we can hit non-zero values
+	 * very frequently.
+	 */
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL &&
+	    !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
+		if (WARN_ON(qgroup->rfer || qgroup->excl ||
+			    qgroup->rfer_cmpr || qgroup->excl_cmpr)) {
+			btrfs_warn_rl(fs_info,
+"to be deleted qgroup %u/%llu has non-zero numbers, rfer %llu rfer_cmpr %llu excl %llu excl_cmpr %llu",
+				      btrfs_qgroup_level(qgroup->qgroupid),
+				      btrfs_qgroup_subvolid(qgroup->qgroupid),
+				      qgroup->rfer,
+				      qgroup->rfer_cmpr,
+				      qgroup->excl,
+				      qgroup->excl_cmpr);
+			qgroup_mark_inconsistent(fs_info);
+		}
+	}
 	del_qgroup_rb(fs_info, qgroupid);
 	spin_unlock(&fs_info->qgroup_lock);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-19  9:46 [PATCH v2 0/2] btrfs: qgroup: stale qgroups related impromvents Qu Wenruo
  2024-04-19  9:46 ` [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal Qu Wenruo
@ 2024-04-19  9:46 ` Qu Wenruo
  2024-04-24 12:41   ` David Sterba
  2024-04-29 12:57   ` Boris Burkov
  1 sibling, 2 replies; 28+ messages in thread
From: Qu Wenruo @ 2024-04-19  9:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: boris

Currently if we fully removed a subvolume (not only unlinked, but fully
dropped its root item), its qgroup would not be removed.

Thus we have "btrfs qgroup clear-stale" to handle such 0 level qgroups.

This patch changes the behavior by automatically removing the qgroup of
a fully dropped subvolume.

There is an exception for simple quota, that btrfs_record_squota_delta()
has to handle missing qgroup case, where the target delta belongs to an
automatically deleted subvolume.
In that case since the subvolume is already gone, no need to treat it as
an error.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1222847
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c |  8 ++++++++
 fs/btrfs/qgroup.c      | 39 ++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/qgroup.h      |  2 ++
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 023920d0d971..1e2caa234146 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5834,6 +5834,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 	struct btrfs_root_item *root_item = &root->root_item;
 	struct walk_control *wc;
 	struct btrfs_key key;
+	const u64 rootid = btrfs_root_id(root);
 	int err = 0;
 	int ret;
 	int level;
@@ -6064,6 +6065,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 	kfree(wc);
 	btrfs_free_path(path);
 out:
+	if (!err && root_dropped) {
+		ret = btrfs_qgroup_cleanup_dropped_subvolume(fs_info, rootid);
+		if (ret < 0)
+			btrfs_warn_rl(fs_info,
+				      "failed to cleanup qgroup 0/%llu: %d",
+				      rootid, ret);
+	}
 	/*
 	 * We were an unfinished drop root, check to see if there are any
 	 * pending, and if not clear and wake up any waiters.
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 2ea16a07a7d4..9aeb740388ab 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1859,6 +1859,39 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
 	return ret;
 }
 
+int btrfs_qgroup_cleanup_dropped_subvolume(struct btrfs_fs_info *fs_info,
+					   u64 subvolid)
+{
+	struct btrfs_trans_handle *trans;
+	int ret;
+
+	if (!is_fstree(subvolid) || !btrfs_qgroup_enabled(fs_info) ||
+	    !fs_info->quota_root)
+		return 0;
+
+	/*
+	 * If our qgroup is consistent, commit current transaction to make sure
+	 * all the rfer/excl numbers get updated to 0 before deleting.
+	 */
+	if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
+		trans = btrfs_start_transaction(fs_info->quota_root, 0);
+		if (IS_ERR(trans))
+			return PTR_ERR(trans);
+
+		ret = btrfs_commit_transaction(trans);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Start new trans to delete the qgroup info and limit items. */
+	trans = btrfs_start_transaction(fs_info->quota_root, 2);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+	ret = btrfs_remove_qgroup(trans, subvolid);
+	btrfs_end_transaction(trans);
+	return ret;
+}
+
 int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
 		       struct btrfs_qgroup_limit *limit)
 {
@@ -4877,7 +4910,11 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
 	spin_lock(&fs_info->qgroup_lock);
 	qgroup = find_qgroup_rb(fs_info, root);
 	if (!qgroup) {
-		ret = -ENOENT;
+		/*
+		 * The qgroup can be auto deleted by subvolume deleting.
+		 * In that case do not consider it an error.
+		 */
+		ret = 0;
 		goto out;
 	}
 
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 706640be0ec2..3f93856a02e1 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -327,6 +327,8 @@ int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
 			      u64 dst);
 int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid);
 int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid);
+int btrfs_qgroup_cleanup_dropped_subvolume(struct btrfs_fs_info *fs_info,
+					   u64 subvolid);
 int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
 		       struct btrfs_qgroup_limit *limit);
 int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-19  9:46 ` [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup Qu Wenruo
@ 2024-04-24 12:41   ` David Sterba
  2024-04-24 22:19     ` Qu Wenruo
  2024-04-29 12:57   ` Boris Burkov
  1 sibling, 1 reply; 28+ messages in thread
From: David Sterba @ 2024-04-24 12:41 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, boris

On Fri, Apr 19, 2024 at 07:16:53PM +0930, Qu Wenruo wrote:
> Currently if we fully removed a subvolume (not only unlinked, but fully
> dropped its root item), its qgroup would not be removed.
> 
> Thus we have "btrfs qgroup clear-stale" to handle such 0 level qgroups.

There's also an option 'btrfs subvolume delete --delete-qgroup' that
does that and is going to be default in 6.9. With this kernel change it
would break the behaviour of the --no-delete-qgroup, which is there for
the case something depends on that.  For now I'd rather postpone
changing the kernel behaviour.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-24 12:41   ` David Sterba
@ 2024-04-24 22:19     ` Qu Wenruo
  2024-04-25 12:34       ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2024-04-24 22:19 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs, boris

在 2024/4/24 22:11, David Sterba 写道:
> On Fri, Apr 19, 2024 at 07:16:53PM +0930, Qu Wenruo wrote:
>> Currently if we fully removed a subvolume (not only unlinked, but fully
>> dropped its root item), its qgroup would not be removed.
>>
>> Thus we have "btrfs qgroup clear-stale" to handle such 0 level qgroups.
>
> There's also an option 'btrfs subvolume delete --delete-qgroup' that
> does that and is going to be default in 6.9. With this kernel change it
> would break the behaviour of the --no-delete-qgroup, which is there for
> the case something depends on that.  For now I'd rather postpone
> changing the kernel behaviour.
>

A quick glance of the --delete-qgroup shows it won't work as expected at
all.

Firstly, the qgroup delete requires the qgroup numbers to be 0.
Meanwhile qgroup numbers can only be 0 after 1) the full subvolume has
been dropped 2) a transaction is committed to reflect the qgroup numbers.

Both situation is only handled in my patchset, thus this means for a lot
of cases it won't work at all.

Furthermore, there is the drop_subtree_threshold thing, which can mark
qgroup inconsistent and skip accounting, making the target subvolume's
qgroup numbers never fall back to 0 (until next rescan).

So I'm afraid the --delete-qgroup won't work until the 1/2 patch get
merged (allowing deleting qgroups as long as the target subvolume is gone).

Thanks,
Qu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-24 22:19     ` Qu Wenruo
@ 2024-04-25 12:34       ` David Sterba
  2024-04-25 21:51         ` Qu Wenruo
  0 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2024-04-25 12:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, boris

On Thu, Apr 25, 2024 at 07:49:12AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/4/24 22:11, David Sterba 写道:
> > On Fri, Apr 19, 2024 at 07:16:53PM +0930, Qu Wenruo wrote:
> >> Currently if we fully removed a subvolume (not only unlinked, but fully
> >> dropped its root item), its qgroup would not be removed.
> >>
> >> Thus we have "btrfs qgroup clear-stale" to handle such 0 level qgroups.
> >
> > There's also an option 'btrfs subvolume delete --delete-qgroup' that
> > does that and is going to be default in 6.9. With this kernel change it
> > would break the behaviour of the --no-delete-qgroup, which is there for
> > the case something depends on that.  For now I'd rather postpone
> > changing the kernel behaviour.
> >
> 
> A quick glance of the --delete-qgroup shows it won't work as expected at
> all.
> 
> Firstly, the qgroup delete requires the qgroup numbers to be 0.
> Meanwhile qgroup numbers can only be 0 after 1) the full subvolume has
> been dropped 2) a transaction is committed to reflect the qgroup numbers.

The deletion option calls ioctl, so this means that 'btrfs qgroup remove'
will not delete it either?

> Both situation is only handled in my patchset, thus this means for a lot
> of cases it won't work at all.
> 
> Furthermore, there is the drop_subtree_threshold thing, which can mark
> qgroup inconsistent and skip accounting, making the target subvolume's
> qgroup numbers never fall back to 0 (until next rescan).
> 
> So I'm afraid the --delete-qgroup won't work until the 1/2 patch get
> merged (allowing deleting qgroups as long as the target subvolume is gone).

Ok, so for emulation of the complete removal in userspace it's

btrfs subvolume delete 123
btrfs subvolume sync 123
btrfs qgroup remove 0/123

but this needs to wait until the sync is finished and that is not
expected for the subvolume delete command. It needs to be fixed but now
I'm not sure this can be default in 6.9 as planned.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-25 12:34       ` David Sterba
@ 2024-04-25 21:51         ` Qu Wenruo
  2024-04-29 13:13           ` Boris Burkov
  2024-04-29 16:36           ` David Sterba
  0 siblings, 2 replies; 28+ messages in thread
From: Qu Wenruo @ 2024-04-25 21:51 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs, boris



在 2024/4/25 22:04, David Sterba 写道:
> On Thu, Apr 25, 2024 at 07:49:12AM +0930, Qu Wenruo wrote:
>>
>>
>> 在 2024/4/24 22:11, David Sterba 写道:
>>> On Fri, Apr 19, 2024 at 07:16:53PM +0930, Qu Wenruo wrote:
>>>> Currently if we fully removed a subvolume (not only unlinked, but fully
>>>> dropped its root item), its qgroup would not be removed.
>>>>
>>>> Thus we have "btrfs qgroup clear-stale" to handle such 0 level qgroups.
>>>
>>> There's also an option 'btrfs subvolume delete --delete-qgroup' that
>>> does that and is going to be default in 6.9. With this kernel change it
>>> would break the behaviour of the --no-delete-qgroup, which is there for
>>> the case something depends on that.  For now I'd rather postpone
>>> changing the kernel behaviour.
>>>
>>
>> A quick glance of the --delete-qgroup shows it won't work as expected at
>> all.
>>
>> Firstly, the qgroup delete requires the qgroup numbers to be 0.
>> Meanwhile qgroup numbers can only be 0 after 1) the full subvolume has
>> been dropped 2) a transaction is committed to reflect the qgroup numbers.
>
> The deletion option calls ioctl, so this means that 'btrfs qgroup remove'
> will not delete it either?

Nope, at least if the subvolume is not cleaned up immediately.
>
>> Both situation is only handled in my patchset, thus this means for a lot
>> of cases it won't work at all.
>>
>> Furthermore, there is the drop_subtree_threshold thing, which can mark
>> qgroup inconsistent and skip accounting, making the target subvolume's
>> qgroup numbers never fall back to 0 (until next rescan).
>>
>> So I'm afraid the --delete-qgroup won't work until the 1/2 patch get
>> merged (allowing deleting qgroups as long as the target subvolume is gone).
>
> Ok, so for emulation of the complete removal in userspace it's
>
> btrfs subvolume delete 123
> btrfs subvolume sync 123
> btrfs qgroup remove 0/123
>
> but this needs to wait until the sync is finished and that is not
> expected for the subvolume delete command.

That's the problem, and why doing it in user space has it limits.

Furthermore, with drop_subtree_threshold or other qgroup operations
marking the qgroup inconsistent, you can not delete that qgroup at all,
until the next rescan.

> It needs to be fixed but now
> I'm not sure this can be default in 6.9 as planned.

I'd say, you should not implement this feature without really
understanding the challenges in the first place.

And that's why I really prefer you send out non-trivial btrfs-progs for
review, other than pushing them directly into github repo.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal
  2024-04-19  9:46 ` [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal Qu Wenruo
@ 2024-04-29 12:47   ` Boris Burkov
  2024-04-29 22:00     ` Qu Wenruo
  0 siblings, 1 reply; 28+ messages in thread
From: Boris Burkov @ 2024-04-29 12:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 19, 2024 at 07:16:52PM +0930, Qu Wenruo wrote:
> [BUG]
> Currently if one is utilizing "qgroups/drop_subtree_threshold" sysfs,
> and a snapshot with level higher than that value is dropped, btrfs will
> not be able to delete the qgroup until next qgroup rescan:
> 
>   uuid=ffffffff-eeee-dddd-cccc-000000000000
> 
>   wipefs -fa $dev
>   mkfs.btrfs -f $dev -O quota -s 4k -n 4k -U $uuid
>   mount $dev $mnt
> 
>   btrfs subvolume create $mnt/subv1/
>   for (( i = 0; i < 1024; i++ )); do
>   	xfs_io -f -c "pwrite 0 2k" $mnt/subv1/file_$i > /dev/null
>   done
>   sync
>   btrfs subv snapshot $mnt/subv1 $mnt/snapshot
>   btrfs quota enable $mnt
>   btrfs quota rescan -w $mnt
>   sync
>   echo 1 > /sys/fs/btrfs/$uuid/qgroups/drop_subtree_threshold
>   btrfs subvolume delete $mnt/snapshot
>   btrfs subv sync $mnt
>   btrfs qgroup show -prce --sync $mnt
>   btrfs qgroup destroy 0/257 $mnt
>   umount $mnt
> 
> The final qgroup removal would fail with the following error:
> 
>   ERROR: unable to destroy quota group: Device or resource busy
> 
> [CAUSE]
> The above script would generate a subvolume of level 2, then snapshot
> it, enable qgroup, set the drop_subtree_threshold, then drop the
> snapshot.
> 
> Since the subvolume drop would meet the threshold, qgroup would be
> marked inconsistent and skip accounting to avoid hanging the system at
> transaction commit.
> 
> But currently we do not allow a qgroup with any rfer/excl numbers to be
> dropped, and this is not really compatible with the new
> drop_subtree_threshold behavior.
> 
> [FIX]
> Instead of a strong requirement for zero rfer/excl numbers, just check
> if there is any child for higher level qgroup, and for subvolume qgroups
> check if there is a corresponding subvolume for it.
> 
> For rsv values, do extra warnings, and for rfer/excl numbers, only do the
> warning if we're in full accounting mode and the qgroup is consistent.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/qgroup.c | 69 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 62 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 9a9f84c4d3b8..2ea16a07a7d4 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1736,13 +1736,38 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>  	return ret;
>  }
>  
> -static bool qgroup_has_usage(struct btrfs_qgroup *qgroup)
> +static bool can_delete_qgroup(struct btrfs_fs_info *fs_info,
> +			      struct btrfs_qgroup *qgroup)
>  {
> -	return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 ||
> -		qgroup->excl > 0 || qgroup->excl_cmpr > 0 ||
> -		qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 ||
> -		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 ||
> -		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0);
> +	struct btrfs_key key;
> +	struct btrfs_path *path;
> +	int ret;
> +
> +	/* For higher level qgroup, we can only delete it if it has no child. */
> +	if (btrfs_qgroup_level(qgroup->qgroupid)) {
> +		if (!list_empty(&qgroup->members))
> +			return false;
> +		return true;
> +	}
> +
> +	/*
> +	 * For level-0 qgroups, we can only delete it if it has no subvolume
> +	 * for it.
> +	 * This means even a subvolume is unlinked but not yet fully dropped,
> +	 * we can not delete the qgroup.
> +	 */

This was what I originally considered for normal qgroups before observing
that usage is 0 anyway. I didn't know about the drop tree threshold,
my mistake.

With that said, I support this change for non-squota qgroups.

For squota, I think this behavior would be OK, but undesirable, IMO. Any
parent qgroup would still have its usage incremented against the limit,
but it would be impossible to find any information on where it was from.

How do you feel about making this patch add the new logic and make it
conditional on qgroup mode?

Thanks,
Boris

> +	key.objectid = qgroup->qgroupid;
> +	key.type = BTRFS_ROOT_ITEM_KEY;
> +	key.offset = -1ULL;
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return false;
> +
> +	ret = btrfs_find_root(fs_info->tree_root, &key, path, NULL, NULL);
> +	btrfs_free_path(path);
> +	if (ret > 0)
> +		return true;
> +	return false;
>  }
>  
>  int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> @@ -1764,7 +1789,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>  		goto out;
>  	}
>  
> -	if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
> +	if (!can_delete_qgroup(fs_info, qgroup)) {
>  		ret = -EBUSY;
>  		goto out;
>  	}
> @@ -1789,6 +1814,36 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>  	}
>  
>  	spin_lock(&fs_info->qgroup_lock);
> +	/*
> +	 * Warn on reserved space. The subvolume should has no child nor
> +	 * corresponding subvolume.
> +	 * Thus its reserved space should all be zero, no matter if qgroup
> +	 * is consistent or the mode.
> +	 */
> +	WARN_ON(qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
> +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] ||
> +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS]);
> +	/*
> +	 * The same for rfer/excl numbers, but that's only if our qgroup is
> +	 * consistent and if it's in regular qgroup mode.
> +	 * For simple mode it's not as accurate thus we can hit non-zero values
> +	 * very frequently.
> +	 */
> +	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL &&
> +	    !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
> +		if (WARN_ON(qgroup->rfer || qgroup->excl ||
> +			    qgroup->rfer_cmpr || qgroup->excl_cmpr)) {
> +			btrfs_warn_rl(fs_info,
> +"to be deleted qgroup %u/%llu has non-zero numbers, rfer %llu rfer_cmpr %llu excl %llu excl_cmpr %llu",
> +				      btrfs_qgroup_level(qgroup->qgroupid),
> +				      btrfs_qgroup_subvolid(qgroup->qgroupid),
> +				      qgroup->rfer,
> +				      qgroup->rfer_cmpr,
> +				      qgroup->excl,
> +				      qgroup->excl_cmpr);
> +			qgroup_mark_inconsistent(fs_info);
> +		}
> +	}

If you go ahead with making it conditional, I would fold this warning
into the check logic. Either way is fine, of course.

>  	del_qgroup_rb(fs_info, qgroupid);
>  	spin_unlock(&fs_info->qgroup_lock);
>  
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-19  9:46 ` [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup Qu Wenruo
  2024-04-24 12:41   ` David Sterba
@ 2024-04-29 12:57   ` Boris Burkov
  1 sibling, 0 replies; 28+ messages in thread
From: Boris Burkov @ 2024-04-29 12:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 19, 2024 at 07:16:53PM +0930, Qu Wenruo wrote:
> Currently if we fully removed a subvolume (not only unlinked, but fully
> dropped its root item), its qgroup would not be removed.
> 
> Thus we have "btrfs qgroup clear-stale" to handle such 0 level qgroups.
> 
> This patch changes the behavior by automatically removing the qgroup of
> a fully dropped subvolume.
> 
> There is an exception for simple quota, that btrfs_record_squota_delta()
> has to handle missing qgroup case, where the target delta belongs to an
> automatically deleted subvolume.
> In that case since the subvolume is already gone, no need to treat it as
> an error.
> 
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1222847
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c |  8 ++++++++
>  fs/btrfs/qgroup.c      | 39 ++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/qgroup.h      |  2 ++
>  3 files changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 023920d0d971..1e2caa234146 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5834,6 +5834,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>  	struct btrfs_root_item *root_item = &root->root_item;
>  	struct walk_control *wc;
>  	struct btrfs_key key;
> +	const u64 rootid = btrfs_root_id(root);
>  	int err = 0;
>  	int ret;
>  	int level;
> @@ -6064,6 +6065,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>  	kfree(wc);
>  	btrfs_free_path(path);
>  out:
> +	if (!err && root_dropped) {
> +		ret = btrfs_qgroup_cleanup_dropped_subvolume(fs_info, rootid);
> +		if (ret < 0)
> +			btrfs_warn_rl(fs_info,
> +				      "failed to cleanup qgroup 0/%llu: %d",
> +				      rootid, ret);
> +	}
>  	/*
>  	 * We were an unfinished drop root, check to see if there are any
>  	 * pending, and if not clear and wake up any waiters.
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 2ea16a07a7d4..9aeb740388ab 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1859,6 +1859,39 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>  	return ret;
>  }
>  
> +int btrfs_qgroup_cleanup_dropped_subvolume(struct btrfs_fs_info *fs_info,
> +					   u64 subvolid)
> +{
> +	struct btrfs_trans_handle *trans;
> +	int ret;
> +
> +	if (!is_fstree(subvolid) || !btrfs_qgroup_enabled(fs_info) ||
> +	    !fs_info->quota_root)
> +		return 0;
> +
> +	/*
> +	 * If our qgroup is consistent, commit current transaction to make sure
> +	 * all the rfer/excl numbers get updated to 0 before deleting.
> +	 */
> +	if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
> +		trans = btrfs_start_transaction(fs_info->quota_root, 0);
> +		if (IS_ERR(trans))
> +			return PTR_ERR(trans);
> +
> +		ret = btrfs_commit_transaction(trans);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	/* Start new trans to delete the qgroup info and limit items. */
> +	trans = btrfs_start_transaction(fs_info->quota_root, 2);
> +	if (IS_ERR(trans))
> +		return PTR_ERR(trans);
> +	ret = btrfs_remove_qgroup(trans, subvolid);
> +	btrfs_end_transaction(trans);
> +	return ret;
> +}
> +
>  int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
>  		       struct btrfs_qgroup_limit *limit)
>  {
> @@ -4877,7 +4910,11 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
>  	spin_lock(&fs_info->qgroup_lock);
>  	qgroup = find_qgroup_rb(fs_info, root);
>  	if (!qgroup) {
> -		ret = -ENOENT;
> +		/*
> +		 * The qgroup can be auto deleted by subvolume deleting.
> +		 * In that case do not consider it an error.
> +		 */
> +		ret = 0;

If we call record_squota_delta, that means we are adding an extent to
the tree whose owner ref will point at this subvol. Which means that we
are entering into a case where our on disk accounting disagrees between
qgroup and extent tree. This should fail the squota fsck logic, if you
really trigger it.

How were you hitting this? In some mundane way where the extent actually
gets dropped? I worry that if the extent also picks up a reference in
the same transaction in a way that the owner is the dropped subvol, we
could get into inconsistency. Ideally, the checks on rsv/usage in drop
also prevent that..

>  		goto out;
>  	}
>  
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index 706640be0ec2..3f93856a02e1 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -327,6 +327,8 @@ int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
>  			      u64 dst);
>  int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid);
>  int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid);
> +int btrfs_qgroup_cleanup_dropped_subvolume(struct btrfs_fs_info *fs_info,
> +					   u64 subvolid);
>  int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
>  		       struct btrfs_qgroup_limit *limit);
>  int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-25 21:51         ` Qu Wenruo
@ 2024-04-29 13:13           ` Boris Burkov
  2024-04-29 16:31             ` David Sterba
  2024-04-29 22:49             ` Qu Wenruo
  2024-04-29 16:36           ` David Sterba
  1 sibling, 2 replies; 28+ messages in thread
From: Boris Burkov @ 2024-04-29 13:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Apr 26, 2024 at 07:21:08AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/4/25 22:04, David Sterba 写道:
> > On Thu, Apr 25, 2024 at 07:49:12AM +0930, Qu Wenruo wrote:
> > > 
> > > 
> > > 在 2024/4/24 22:11, David Sterba 写道:
> > > > On Fri, Apr 19, 2024 at 07:16:53PM +0930, Qu Wenruo wrote:
> > > > > Currently if we fully removed a subvolume (not only unlinked, but fully
> > > > > dropped its root item), its qgroup would not be removed.
> > > > > 
> > > > > Thus we have "btrfs qgroup clear-stale" to handle such 0 level qgroups.
> > > > 
> > > > There's also an option 'btrfs subvolume delete --delete-qgroup' that
> > > > does that and is going to be default in 6.9. With this kernel change it
> > > > would break the behaviour of the --no-delete-qgroup, which is there for
> > > > the case something depends on that.  For now I'd rather postpone
> > > > changing the kernel behaviour.
> > > > 
> > > 
> > > A quick glance of the --delete-qgroup shows it won't work as expected at
> > > all.
> > > 
> > > Firstly, the qgroup delete requires the qgroup numbers to be 0.
> > > Meanwhile qgroup numbers can only be 0 after 1) the full subvolume has
> > > been dropped 2) a transaction is committed to reflect the qgroup numbers.
> > 
> > The deletion option calls ioctl, so this means that 'btrfs qgroup remove'
> > will not delete it either?
> 
> Nope, at least if the subvolume is not cleaned up immediately.
> > 
> > > Both situation is only handled in my patchset, thus this means for a lot
> > > of cases it won't work at all.
> > > 
> > > Furthermore, there is the drop_subtree_threshold thing, which can mark
> > > qgroup inconsistent and skip accounting, making the target subvolume's
> > > qgroup numbers never fall back to 0 (until next rescan).
> > > 
> > > So I'm afraid the --delete-qgroup won't work until the 1/2 patch get
> > > merged (allowing deleting qgroups as long as the target subvolume is gone).
> > 
> > Ok, so for emulation of the complete removal in userspace it's
> > 
> > btrfs subvolume delete 123
> > btrfs subvolume sync 123
> > btrfs qgroup remove 0/123
> > 
> > but this needs to wait until the sync is finished and that is not
> > expected for the subvolume delete command.
> 
> That's the problem, and why doing it in user space has it limits.
> 
> Furthermore, with drop_subtree_threshold or other qgroup operations
> marking the qgroup inconsistent, you can not delete that qgroup at all,
> until the next rescan.
> 
> > It needs to be fixed but now
> > I'm not sure this can be default in 6.9 as planned.
> 
> I'd say, you should not implement this feature without really
> understanding the challenges in the first place.
> 
> And that's why I really prefer you send out non-trivial btrfs-progs for
> review, other than pushing them directly into github repo.
> 
> Thanks,
> Qu

I support the auto deletion in the kernel as you propose, I think it
just makes sense. Who wants stale, empty qgroups around that aren't
attached to any subvol? I suppose that with the drop_thresh thing, it is
possible some parent qgroup still reflects the usage until the next full
scan?

Thinking out loud -- for regular qgroups, we could avoid this all if we
do the reaping when usage goes to 0 and there is no subvol. So remove
the qgroup as a consequence of the rescan, not the subvol delete. I
imagine this is quite a bit messier, though :(

We could also just not auto-reap if that condition occurs (inconsistent
qg with a parent), but I think that may be surprising for the same
reasons that got you working on this in the first place...

Do we know of an explicit need to support --no-delete-qgroup? It feels
like it is perfectly normal for us to improve the default behavior of
the kernel or userspace tools without supporting the old behavior as a
flag forever (without a user).

Put another way, I think it would be perfectly reasonable to term the
stale qgroups a leaked memory resource and this patch a bug fix, if we
are willing to get overly philosophical about it. We don't carry around
eternal flags for bug fixes, unless it's some rather exceptional case.

If we are on the hook for supporing that flag because we already added
it to progs and don't want to deprecate it, then maybe we can think of
something compatible with default auto-reap.

Thanks,
Boris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-29 13:13           ` Boris Burkov
@ 2024-04-29 16:31             ` David Sterba
  2024-04-29 22:05               ` Qu Wenruo
  2024-04-29 22:49             ` Qu Wenruo
  1 sibling, 1 reply; 28+ messages in thread
From: David Sterba @ 2024-04-29 16:31 UTC (permalink / raw)
  To: Boris Burkov; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs

On Mon, Apr 29, 2024 at 06:13:33AM -0700, Boris Burkov wrote:
> I support the auto deletion in the kernel as you propose, I think it
> just makes sense. Who wants stale, empty qgroups around that aren't
> attached to any subvol? I suppose that with the drop_thresh thing, it is
> possible some parent qgroup still reflects the usage until the next full
> scan?

The stale qgroups have been out for a long time so removing them after
subvolume deletion is changing default behaviour, this always breaks
somebody's scripts or tools.

> Thinking out loud -- for regular qgroups, we could avoid this all if we
> do the reaping when usage goes to 0 and there is no subvol. So remove
> the qgroup as a consequence of the rescan, not the subvol delete. I
> imagine this is quite a bit messier, though :(
> 
> We could also just not auto-reap if that condition occurs (inconsistent
> qg with a parent), but I think that may be surprising for the same
> reasons that got you working on this in the first place...
> 
> Do we know of an explicit need to support --no-delete-qgroup? It feels
> like it is perfectly normal for us to improve the default behavior of
> the kernel or userspace tools without supporting the old behavior as a
> flag forever (without a user).

$ id=$(btrfs inspect rootid subvol)
$ btrfs subvolume delete subvol
$ btrfs qgroup remove 0/$id 1/1 .                   <---- fails
$ btrfs qgroup destroy 0/$id .                      <---- fails

> Put another way, I think it would be perfectly reasonable to term the
> stale qgroups a leaked memory resource and this patch a bug fix, if we
> are willing to get overly philosophical about it. We don't carry around
> eternal flags for bug fixes, unless it's some rather exceptional case.

The command line option does not do what I expected, if somebody would
have to update the scripts to add it then we can do the kernel
auto-remove and the document it. Eventually we can translate the -ENOENT
error code to be ignored.

> If we are on the hook for supporing that flag because we already added
> it to progs and don't want to deprecate it, then maybe we can think of
> something compatible with default auto-reap.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-25 21:51         ` Qu Wenruo
  2024-04-29 13:13           ` Boris Burkov
@ 2024-04-29 16:36           ` David Sterba
  1 sibling, 0 replies; 28+ messages in thread
From: David Sterba @ 2024-04-29 16:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, boris

On Fri, Apr 26, 2024 at 07:21:08AM +0930, Qu Wenruo wrote:
> > btrfs subvolume delete 123
> > btrfs subvolume sync 123
> > btrfs qgroup remove 0/123
> >
> > but this needs to wait until the sync is finished and that is not
> > expected for the subvolume delete command.
> 
> That's the problem, and why doing it in user space has it limits.
> 
> Furthermore, with drop_subtree_threshold or other qgroup operations
> marking the qgroup inconsistent, you can not delete that qgroup at all,
> until the next rescan.

Ok so that makes it more complicated and better solved in kernel, with
the compatibility issues mentioned in the other mail.

> > It needs to be fixed but now
> > I'm not sure this can be default in 6.9 as planned.
> 
> I'd say, you should not implement this feature without really
> understanding the challenges in the first place.
> 
> And that's why I really prefer you send out non-trivial btrfs-progs for
> review, other than pushing them directly into github repo.

As discussed on slack, the development happens in git, for btrfs-progs
the mails are overhead that does not pay off.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal
  2024-04-29 12:47   ` Boris Burkov
@ 2024-04-29 22:00     ` Qu Wenruo
  2024-04-29 22:19       ` Boris Burkov
  0 siblings, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2024-04-29 22:00 UTC (permalink / raw)
  To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs



在 2024/4/29 22:17, Boris Burkov 写道:
> On Fri, Apr 19, 2024 at 07:16:52PM +0930, Qu Wenruo wrote:
>> [BUG]
>> Currently if one is utilizing "qgroups/drop_subtree_threshold" sysfs,
>> and a snapshot with level higher than that value is dropped, btrfs will
>> not be able to delete the qgroup until next qgroup rescan:
>>
>>    uuid=ffffffff-eeee-dddd-cccc-000000000000
>>
>>    wipefs -fa $dev
>>    mkfs.btrfs -f $dev -O quota -s 4k -n 4k -U $uuid
>>    mount $dev $mnt
>>
>>    btrfs subvolume create $mnt/subv1/
>>    for (( i = 0; i < 1024; i++ )); do
>>    	xfs_io -f -c "pwrite 0 2k" $mnt/subv1/file_$i > /dev/null
>>    done
>>    sync
>>    btrfs subv snapshot $mnt/subv1 $mnt/snapshot
>>    btrfs quota enable $mnt
>>    btrfs quota rescan -w $mnt
>>    sync
>>    echo 1 > /sys/fs/btrfs/$uuid/qgroups/drop_subtree_threshold
>>    btrfs subvolume delete $mnt/snapshot
>>    btrfs subv sync $mnt
>>    btrfs qgroup show -prce --sync $mnt
>>    btrfs qgroup destroy 0/257 $mnt
>>    umount $mnt
>>
>> The final qgroup removal would fail with the following error:
>>
>>    ERROR: unable to destroy quota group: Device or resource busy
>>
>> [CAUSE]
>> The above script would generate a subvolume of level 2, then snapshot
>> it, enable qgroup, set the drop_subtree_threshold, then drop the
>> snapshot.
>>
>> Since the subvolume drop would meet the threshold, qgroup would be
>> marked inconsistent and skip accounting to avoid hanging the system at
>> transaction commit.
>>
>> But currently we do not allow a qgroup with any rfer/excl numbers to be
>> dropped, and this is not really compatible with the new
>> drop_subtree_threshold behavior.
>>
>> [FIX]
>> Instead of a strong requirement for zero rfer/excl numbers, just check
>> if there is any child for higher level qgroup, and for subvolume qgroups
>> check if there is a corresponding subvolume for it.
>>
>> For rsv values, do extra warnings, and for rfer/excl numbers, only do the
>> warning if we're in full accounting mode and the qgroup is consistent.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/qgroup.c | 69 ++++++++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 62 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 9a9f84c4d3b8..2ea16a07a7d4 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -1736,13 +1736,38 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>>   	return ret;
>>   }
>>
>> -static bool qgroup_has_usage(struct btrfs_qgroup *qgroup)
>> +static bool can_delete_qgroup(struct btrfs_fs_info *fs_info,
>> +			      struct btrfs_qgroup *qgroup)
>>   {
>> -	return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 ||
>> -		qgroup->excl > 0 || qgroup->excl_cmpr > 0 ||
>> -		qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 ||
>> -		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 ||
>> -		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0);
>> +	struct btrfs_key key;
>> +	struct btrfs_path *path;
>> +	int ret;
>> +
>> +	/* For higher level qgroup, we can only delete it if it has no child. */
>> +	if (btrfs_qgroup_level(qgroup->qgroupid)) {
>> +		if (!list_empty(&qgroup->members))
>> +			return false;
>> +		return true;
>> +	}
>> +
>> +	/*
>> +	 * For level-0 qgroups, we can only delete it if it has no subvolume
>> +	 * for it.
>> +	 * This means even a subvolume is unlinked but not yet fully dropped,
>> +	 * we can not delete the qgroup.
>> +	 */
>
> This was what I originally considered for normal qgroups before observing
> that usage is 0 anyway. I didn't know about the drop tree threshold,
> my mistake.
>
> With that said, I support this change for non-squota qgroups.
>
> For squota, I think this behavior would be OK, but undesirable, IMO. Any
> parent qgroup would still have its usage incremented against the limit,
> but it would be impossible to find any information on where it was from.

That's indeed another problem.

For regular qgroup that could only happen when qgroup is inconsistent,
and we're fine to drop the qgroup without updating the parent.

But for squota, there is no inconsistent state, meaning we should also
drop all the numbers from parent too.

>
> How do you feel about making this patch add the new logic and make it
> conditional on qgroup mode?

I'm totally fine to do it conditional, although I'd also like to get
feedback on dropping the numbers from parent qgroup (so we can do it for
both qgroup and squota).

Would the auto drop for parent numbers be a solution?

Thanks,
Qu
>
> Thanks,
> Boris
>
>> +	key.objectid = qgroup->qgroupid;
>> +	key.type = BTRFS_ROOT_ITEM_KEY;
>> +	key.offset = -1ULL;
>> +	path = btrfs_alloc_path();
>> +	if (!path)
>> +		return false;
>> +
>> +	ret = btrfs_find_root(fs_info->tree_root, &key, path, NULL, NULL);
>> +	btrfs_free_path(path);
>> +	if (ret > 0)
>> +		return true;
>> +	return false;
>>   }
>>
>>   int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>> @@ -1764,7 +1789,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>>   		goto out;
>>   	}
>>
>> -	if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
>> +	if (!can_delete_qgroup(fs_info, qgroup)) {
>>   		ret = -EBUSY;
>>   		goto out;
>>   	}
>> @@ -1789,6 +1814,36 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>>   	}
>>
>>   	spin_lock(&fs_info->qgroup_lock);
>> +	/*
>> +	 * Warn on reserved space. The subvolume should has no child nor
>> +	 * corresponding subvolume.
>> +	 * Thus its reserved space should all be zero, no matter if qgroup
>> +	 * is consistent or the mode.
>> +	 */
>> +	WARN_ON(qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
>> +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] ||
>> +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS]);
>> +	/*
>> +	 * The same for rfer/excl numbers, but that's only if our qgroup is
>> +	 * consistent and if it's in regular qgroup mode.
>> +	 * For simple mode it's not as accurate thus we can hit non-zero values
>> +	 * very frequently.
>> +	 */
>> +	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL &&
>> +	    !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
>> +		if (WARN_ON(qgroup->rfer || qgroup->excl ||
>> +			    qgroup->rfer_cmpr || qgroup->excl_cmpr)) {
>> +			btrfs_warn_rl(fs_info,
>> +"to be deleted qgroup %u/%llu has non-zero numbers, rfer %llu rfer_cmpr %llu excl %llu excl_cmpr %llu",
>> +				      btrfs_qgroup_level(qgroup->qgroupid),
>> +				      btrfs_qgroup_subvolid(qgroup->qgroupid),
>> +				      qgroup->rfer,
>> +				      qgroup->rfer_cmpr,
>> +				      qgroup->excl,
>> +				      qgroup->excl_cmpr);
>> +			qgroup_mark_inconsistent(fs_info);
>> +		}
>> +	}
>
> If you go ahead with making it conditional, I would fold this warning
> into the check logic. Either way is fine, of course.
>
>>   	del_qgroup_rb(fs_info, qgroupid);
>>   	spin_unlock(&fs_info->qgroup_lock);
>>
>> --
>> 2.44.0
>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-29 16:31             ` David Sterba
@ 2024-04-29 22:05               ` Qu Wenruo
  2024-04-30 10:59                 ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2024-04-29 22:05 UTC (permalink / raw)
  To: dsterba, Boris Burkov; +Cc: Qu Wenruo, linux-btrfs



在 2024/4/30 02:01, David Sterba 写道:
> On Mon, Apr 29, 2024 at 06:13:33AM -0700, Boris Burkov wrote:
>> I support the auto deletion in the kernel as you propose, I think it
>> just makes sense. Who wants stale, empty qgroups around that aren't
>> attached to any subvol? I suppose that with the drop_thresh thing, it is
>> possible some parent qgroup still reflects the usage until the next full
>> scan?
>
> The stale qgroups have been out for a long time so removing them after
> subvolume deletion is changing default behaviour, this always breaks
> somebody's scripts or tools.

If needed I can introduce a compat bit (the first one), to tell the
behavior difference.

And if we go the compat bit way, I believe it can be the first example
on how to do a kernel behavior change correctly without breaking any
user space assumption.

Thanks,
Qu
>
>> Thinking out loud -- for regular qgroups, we could avoid this all if we
>> do the reaping when usage goes to 0 and there is no subvol. So remove
>> the qgroup as a consequence of the rescan, not the subvol delete. I
>> imagine this is quite a bit messier, though :(
>>
>> We could also just not auto-reap if that condition occurs (inconsistent
>> qg with a parent), but I think that may be surprising for the same
>> reasons that got you working on this in the first place...
>>
>> Do we know of an explicit need to support --no-delete-qgroup? It feels
>> like it is perfectly normal for us to improve the default behavior of
>> the kernel or userspace tools without supporting the old behavior as a
>> flag forever (without a user).
>
> $ id=$(btrfs inspect rootid subvol)
> $ btrfs subvolume delete subvol
> $ btrfs qgroup remove 0/$id 1/1 .                   <---- fails
> $ btrfs qgroup destroy 0/$id .                      <---- fails
>
>> Put another way, I think it would be perfectly reasonable to term the
>> stale qgroups a leaked memory resource and this patch a bug fix, if we
>> are willing to get overly philosophical about it. We don't carry around
>> eternal flags for bug fixes, unless it's some rather exceptional case.
>
> The command line option does not do what I expected, if somebody would
> have to update the scripts to add it then we can do the kernel
> auto-remove and the document it. Eventually we can translate the -ENOENT
> error code to be ignored.
>
>> If we are on the hook for supporing that flag because we already added
>> it to progs and don't want to deprecate it, then maybe we can think of
>> something compatible with default auto-reap.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal
  2024-04-29 22:00     ` Qu Wenruo
@ 2024-04-29 22:19       ` Boris Burkov
  2024-04-29 22:29         ` Qu Wenruo
  0 siblings, 1 reply; 28+ messages in thread
From: Boris Burkov @ 2024-04-29 22:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Tue, Apr 30, 2024 at 07:30:58AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/4/29 22:17, Boris Burkov 写道:
> > On Fri, Apr 19, 2024 at 07:16:52PM +0930, Qu Wenruo wrote:
> > > [BUG]
> > > Currently if one is utilizing "qgroups/drop_subtree_threshold" sysfs,
> > > and a snapshot with level higher than that value is dropped, btrfs will
> > > not be able to delete the qgroup until next qgroup rescan:
> > > 
> > >    uuid=ffffffff-eeee-dddd-cccc-000000000000
> > > 
> > >    wipefs -fa $dev
> > >    mkfs.btrfs -f $dev -O quota -s 4k -n 4k -U $uuid
> > >    mount $dev $mnt
> > > 
> > >    btrfs subvolume create $mnt/subv1/
> > >    for (( i = 0; i < 1024; i++ )); do
> > >    	xfs_io -f -c "pwrite 0 2k" $mnt/subv1/file_$i > /dev/null
> > >    done
> > >    sync
> > >    btrfs subv snapshot $mnt/subv1 $mnt/snapshot
> > >    btrfs quota enable $mnt
> > >    btrfs quota rescan -w $mnt
> > >    sync
> > >    echo 1 > /sys/fs/btrfs/$uuid/qgroups/drop_subtree_threshold
> > >    btrfs subvolume delete $mnt/snapshot
> > >    btrfs subv sync $mnt
> > >    btrfs qgroup show -prce --sync $mnt
> > >    btrfs qgroup destroy 0/257 $mnt
> > >    umount $mnt
> > > 
> > > The final qgroup removal would fail with the following error:
> > > 
> > >    ERROR: unable to destroy quota group: Device or resource busy
> > > 
> > > [CAUSE]
> > > The above script would generate a subvolume of level 2, then snapshot
> > > it, enable qgroup, set the drop_subtree_threshold, then drop the
> > > snapshot.
> > > 
> > > Since the subvolume drop would meet the threshold, qgroup would be
> > > marked inconsistent and skip accounting to avoid hanging the system at
> > > transaction commit.
> > > 
> > > But currently we do not allow a qgroup with any rfer/excl numbers to be
> > > dropped, and this is not really compatible with the new
> > > drop_subtree_threshold behavior.
> > > 
> > > [FIX]
> > > Instead of a strong requirement for zero rfer/excl numbers, just check
> > > if there is any child for higher level qgroup, and for subvolume qgroups
> > > check if there is a corresponding subvolume for it.
> > > 
> > > For rsv values, do extra warnings, and for rfer/excl numbers, only do the
> > > warning if we're in full accounting mode and the qgroup is consistent.
> > > 
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > >   fs/btrfs/qgroup.c | 69 ++++++++++++++++++++++++++++++++++++++++++-----
> > >   1 file changed, 62 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> > > index 9a9f84c4d3b8..2ea16a07a7d4 100644
> > > --- a/fs/btrfs/qgroup.c
> > > +++ b/fs/btrfs/qgroup.c
> > > @@ -1736,13 +1736,38 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > >   	return ret;
> > >   }
> > > 
> > > -static bool qgroup_has_usage(struct btrfs_qgroup *qgroup)
> > > +static bool can_delete_qgroup(struct btrfs_fs_info *fs_info,
> > > +			      struct btrfs_qgroup *qgroup)
> > >   {
> > > -	return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 ||
> > > -		qgroup->excl > 0 || qgroup->excl_cmpr > 0 ||
> > > -		qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 ||
> > > -		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 ||
> > > -		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0);
> > > +	struct btrfs_key key;
> > > +	struct btrfs_path *path;
> > > +	int ret;
> > > +
> > > +	/* For higher level qgroup, we can only delete it if it has no child. */
> > > +	if (btrfs_qgroup_level(qgroup->qgroupid)) {
> > > +		if (!list_empty(&qgroup->members))
> > > +			return false;
> > > +		return true;
> > > +	}
> > > +
> > > +	/*
> > > +	 * For level-0 qgroups, we can only delete it if it has no subvolume
> > > +	 * for it.
> > > +	 * This means even a subvolume is unlinked but not yet fully dropped,
> > > +	 * we can not delete the qgroup.
> > > +	 */
> > 
> > This was what I originally considered for normal qgroups before observing
> > that usage is 0 anyway. I didn't know about the drop tree threshold,
> > my mistake.
> > 
> > With that said, I support this change for non-squota qgroups.
> > 
> > For squota, I think this behavior would be OK, but undesirable, IMO. Any
> > parent qgroup would still have its usage incremented against the limit,
> > but it would be impossible to find any information on where it was from.
> 
> That's indeed another problem.
> 
> For regular qgroup that could only happen when qgroup is inconsistent,
> and we're fine to drop the qgroup without updating the parent.
> 
> But for squota, there is no inconsistent state, meaning we should also
> drop all the numbers from parent too.
> 
> > 
> > How do you feel about making this patch add the new logic and make it
> > conditional on qgroup mode?
> 
> I'm totally fine to do it conditional, although I'd also like to get
> feedback on dropping the numbers from parent qgroup (so we can do it for
> both qgroup and squota).
> 
> Would the auto drop for parent numbers be a solution?

It's a good question. It never occurred to me until this discussion
today.

Thinking out loud, squotas as a feature is already "comfortable" with
unaccounted space. If you enable it on a live FS, all the extents that
already exist are unaccounted, so there is no objection on that front.

I think from a container isolation perspective, the current behavior
makes more sense than auto dropping from the parent on subvol delete to
allow cleaning up the qgroup.

Suppose we create a container wrapping parent qgroup with ID 1/100. To
enforce a limit on the container customer, we set some limit on 1/100.
Then we create a container subvol and put 0/svid0 into 1/100. The
customer gets to do stuff in the subvol. They are a fancy customer and
create a subvol svid1, snapshot it svid2, and delete svid1.

svid1 and svid2 are created in the subvol of id svid0, so auto-inheritance
ensured that 1/100 was the parent of 0/svid1 and 0/svid2, but now
deleting svid1 frees all that customer usage from 1/100 and allows the
customer to escape the limit. This is obviously quite undesirable, and
wouldn't require a particularly malicious customer to hit.

Boris

> 
> Thanks,
> Qu
> > 
> > Thanks,
> > Boris
> > 
> > > +	key.objectid = qgroup->qgroupid;
> > > +	key.type = BTRFS_ROOT_ITEM_KEY;
> > > +	key.offset = -1ULL;
> > > +	path = btrfs_alloc_path();
> > > +	if (!path)
> > > +		return false;
> > > +
> > > +	ret = btrfs_find_root(fs_info->tree_root, &key, path, NULL, NULL);
> > > +	btrfs_free_path(path);
> > > +	if (ret > 0)
> > > +		return true;
> > > +	return false;
> > >   }
> > > 
> > >   int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > @@ -1764,7 +1789,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > >   		goto out;
> > >   	}
> > > 
> > > -	if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
> > > +	if (!can_delete_qgroup(fs_info, qgroup)) {
> > >   		ret = -EBUSY;
> > >   		goto out;
> > >   	}
> > > @@ -1789,6 +1814,36 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > >   	}
> > > 
> > >   	spin_lock(&fs_info->qgroup_lock);
> > > +	/*
> > > +	 * Warn on reserved space. The subvolume should has no child nor
> > > +	 * corresponding subvolume.
> > > +	 * Thus its reserved space should all be zero, no matter if qgroup
> > > +	 * is consistent or the mode.
> > > +	 */
> > > +	WARN_ON(qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
> > > +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] ||
> > > +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS]);
> > > +	/*
> > > +	 * The same for rfer/excl numbers, but that's only if our qgroup is
> > > +	 * consistent and if it's in regular qgroup mode.
> > > +	 * For simple mode it's not as accurate thus we can hit non-zero values
> > > +	 * very frequently.
> > > +	 */
> > > +	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL &&
> > > +	    !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
> > > +		if (WARN_ON(qgroup->rfer || qgroup->excl ||
> > > +			    qgroup->rfer_cmpr || qgroup->excl_cmpr)) {
> > > +			btrfs_warn_rl(fs_info,
> > > +"to be deleted qgroup %u/%llu has non-zero numbers, rfer %llu rfer_cmpr %llu excl %llu excl_cmpr %llu",
> > > +				      btrfs_qgroup_level(qgroup->qgroupid),
> > > +				      btrfs_qgroup_subvolid(qgroup->qgroupid),
> > > +				      qgroup->rfer,
> > > +				      qgroup->rfer_cmpr,
> > > +				      qgroup->excl,
> > > +				      qgroup->excl_cmpr);
> > > +			qgroup_mark_inconsistent(fs_info);
> > > +		}
> > > +	}
> > 
> > If you go ahead with making it conditional, I would fold this warning
> > into the check logic. Either way is fine, of course.
> > 
> > >   	del_qgroup_rb(fs_info, qgroupid);
> > >   	spin_unlock(&fs_info->qgroup_lock);
> > > 
> > > --
> > > 2.44.0
> > > 
> > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal
  2024-04-29 22:19       ` Boris Burkov
@ 2024-04-29 22:29         ` Qu Wenruo
  2024-04-29 22:38           ` Boris Burkov
  0 siblings, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2024-04-29 22:29 UTC (permalink / raw)
  To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs



在 2024/4/30 07:49, Boris Burkov 写道:
> On Tue, Apr 30, 2024 at 07:30:58AM +0930, Qu Wenruo wrote:
[...]
>>
>> I'm totally fine to do it conditional, although I'd also like to get
>> feedback on dropping the numbers from parent qgroup (so we can do it for
>> both qgroup and squota).
>>
>> Would the auto drop for parent numbers be a solution?
> 
> It's a good question. It never occurred to me until this discussion
> today.
> 
> Thinking out loud, squotas as a feature is already "comfortable" with
> unaccounted space. If you enable it on a live FS, all the extents that
> already exist are unaccounted, so there is no objection on that front.
> 
> I think from a container isolation perspective, the current behavior
> makes more sense than auto dropping from the parent on subvol delete to
> allow cleaning up the qgroup.
> 
> Suppose we create a container wrapping parent qgroup with ID 1/100. To
> enforce a limit on the container customer, we set some limit on 1/100.
> Then we create a container subvol and put 0/svid0 into 1/100. The
> customer gets to do stuff in the subvol. They are a fancy customer and
> create a subvol svid1, snapshot it svid2, and delete svid1.
> 
> svid1 and svid2 are created in the subvol of id svid0, so auto-inheritance
> ensured that 1/100 was the parent of 0/svid1 and 0/svid2, but now
> deleting svid1 frees all that customer usage from 1/100 and allows the
> customer to escape the limit. This is obviously quite undesirable, and
> wouldn't require a particularly malicious customer to hit.

OK, got your point. Thanks for the detailed explanation, and I can not 
come up with any alternative so far.

So I'll make the qgroup drop to have an extra condition for squota, so 
that a squota qgroup can only be dropped when:

1) The subvolume is fully dropped
    The same one for both regular qgroup and squota

2) The number are all zero
    This would be squota specific one.

So that the numbers would still be correct while regular qgroup can 
still handle auto-deletion for inconsistent case.

Thanks,
Qu

> 
> Boris
> 
>>
>> Thanks,
>> Qu
>>>
>>> Thanks,
>>> Boris
>>>
>>>> +	key.objectid = qgroup->qgroupid;
>>>> +	key.type = BTRFS_ROOT_ITEM_KEY;
>>>> +	key.offset = -1ULL;
>>>> +	path = btrfs_alloc_path();
>>>> +	if (!path)
>>>> +		return false;
>>>> +
>>>> +	ret = btrfs_find_root(fs_info->tree_root, &key, path, NULL, NULL);
>>>> +	btrfs_free_path(path);
>>>> +	if (ret > 0)
>>>> +		return true;
>>>> +	return false;
>>>>    }
>>>>
>>>>    int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>>>> @@ -1764,7 +1789,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>>>>    		goto out;
>>>>    	}
>>>>
>>>> -	if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
>>>> +	if (!can_delete_qgroup(fs_info, qgroup)) {
>>>>    		ret = -EBUSY;
>>>>    		goto out;
>>>>    	}
>>>> @@ -1789,6 +1814,36 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
>>>>    	}
>>>>
>>>>    	spin_lock(&fs_info->qgroup_lock);
>>>> +	/*
>>>> +	 * Warn on reserved space. The subvolume should has no child nor
>>>> +	 * corresponding subvolume.
>>>> +	 * Thus its reserved space should all be zero, no matter if qgroup
>>>> +	 * is consistent or the mode.
>>>> +	 */
>>>> +	WARN_ON(qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
>>>> +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] ||
>>>> +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS]);
>>>> +	/*
>>>> +	 * The same for rfer/excl numbers, but that's only if our qgroup is
>>>> +	 * consistent and if it's in regular qgroup mode.
>>>> +	 * For simple mode it's not as accurate thus we can hit non-zero values
>>>> +	 * very frequently.
>>>> +	 */
>>>> +	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL &&
>>>> +	    !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
>>>> +		if (WARN_ON(qgroup->rfer || qgroup->excl ||
>>>> +			    qgroup->rfer_cmpr || qgroup->excl_cmpr)) {
>>>> +			btrfs_warn_rl(fs_info,
>>>> +"to be deleted qgroup %u/%llu has non-zero numbers, rfer %llu rfer_cmpr %llu excl %llu excl_cmpr %llu",
>>>> +				      btrfs_qgroup_level(qgroup->qgroupid),
>>>> +				      btrfs_qgroup_subvolid(qgroup->qgroupid),
>>>> +				      qgroup->rfer,
>>>> +				      qgroup->rfer_cmpr,
>>>> +				      qgroup->excl,
>>>> +				      qgroup->excl_cmpr);
>>>> +			qgroup_mark_inconsistent(fs_info);
>>>> +		}
>>>> +	}
>>>
>>> If you go ahead with making it conditional, I would fold this warning
>>> into the check logic. Either way is fine, of course.
>>>
>>>>    	del_qgroup_rb(fs_info, qgroupid);
>>>>    	spin_unlock(&fs_info->qgroup_lock);
>>>>
>>>> --
>>>> 2.44.0
>>>>
>>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal
  2024-04-29 22:29         ` Qu Wenruo
@ 2024-04-29 22:38           ` Boris Burkov
  0 siblings, 0 replies; 28+ messages in thread
From: Boris Burkov @ 2024-04-29 22:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Tue, Apr 30, 2024 at 07:59:17AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/4/30 07:49, Boris Burkov 写道:
> > On Tue, Apr 30, 2024 at 07:30:58AM +0930, Qu Wenruo wrote:
> [...]
> > > 
> > > I'm totally fine to do it conditional, although I'd also like to get
> > > feedback on dropping the numbers from parent qgroup (so we can do it for
> > > both qgroup and squota).
> > > 
> > > Would the auto drop for parent numbers be a solution?
> > 
> > It's a good question. It never occurred to me until this discussion
> > today.
> > 
> > Thinking out loud, squotas as a feature is already "comfortable" with
> > unaccounted space. If you enable it on a live FS, all the extents that
> > already exist are unaccounted, so there is no objection on that front.
> > 
> > I think from a container isolation perspective, the current behavior
> > makes more sense than auto dropping from the parent on subvol delete to
> > allow cleaning up the qgroup.
> > 
> > Suppose we create a container wrapping parent qgroup with ID 1/100. To
> > enforce a limit on the container customer, we set some limit on 1/100.
> > Then we create a container subvol and put 0/svid0 into 1/100. The
> > customer gets to do stuff in the subvol. They are a fancy customer and
> > create a subvol svid1, snapshot it svid2, and delete svid1.
> > 
> > svid1 and svid2 are created in the subvol of id svid0, so auto-inheritance
> > ensured that 1/100 was the parent of 0/svid1 and 0/svid2, but now
> > deleting svid1 frees all that customer usage from 1/100 and allows the
> > customer to escape the limit. This is obviously quite undesirable, and
> > wouldn't require a particularly malicious customer to hit.
> 
> OK, got your point. Thanks for the detailed explanation, and I can not come
> up with any alternative so far.
> 
> So I'll make the qgroup drop to have an extra condition for squota, so that
> a squota qgroup can only be dropped when:
> 
> 1) The subvolume is fully dropped
>    The same one for both regular qgroup and squota
> 
> 2) The number are all zero
>    This would be squota specific one.
> 
> So that the numbers would still be correct while regular qgroup can still
> handle auto-deletion for inconsistent case.

That sounds like a good design to me. And as I mentioned, you might be
able to share the conditions for squota failing with EBUSY and normal
qgroup printing a debug message about being inconsistent

> 
> Thanks,
> Qu
> 
> > 
> > Boris
> > 
> > > 
> > > Thanks,
> > > Qu
> > > > 
> > > > Thanks,
> > > > Boris
> > > > 
> > > > > +	key.objectid = qgroup->qgroupid;
> > > > > +	key.type = BTRFS_ROOT_ITEM_KEY;
> > > > > +	key.offset = -1ULL;
> > > > > +	path = btrfs_alloc_path();
> > > > > +	if (!path)
> > > > > +		return false;
> > > > > +
> > > > > +	ret = btrfs_find_root(fs_info->tree_root, &key, path, NULL, NULL);
> > > > > +	btrfs_free_path(path);
> > > > > +	if (ret > 0)
> > > > > +		return true;
> > > > > +	return false;
> > > > >    }
> > > > > 
> > > > >    int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > > > @@ -1764,7 +1789,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > > >    		goto out;
> > > > >    	}
> > > > > 
> > > > > -	if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
> > > > > +	if (!can_delete_qgroup(fs_info, qgroup)) {
> > > > >    		ret = -EBUSY;
> > > > >    		goto out;
> > > > >    	}
> > > > > @@ -1789,6 +1814,36 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > > >    	}
> > > > > 
> > > > >    	spin_lock(&fs_info->qgroup_lock);
> > > > > +	/*
> > > > > +	 * Warn on reserved space. The subvolume should has no child nor
> > > > > +	 * corresponding subvolume.
> > > > > +	 * Thus its reserved space should all be zero, no matter if qgroup
> > > > > +	 * is consistent or the mode.
> > > > > +	 */
> > > > > +	WARN_ON(qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
> > > > > +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] ||
> > > > > +		qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS]);
> > > > > +	/*
> > > > > +	 * The same for rfer/excl numbers, but that's only if our qgroup is
> > > > > +	 * consistent and if it's in regular qgroup mode.
> > > > > +	 * For simple mode it's not as accurate thus we can hit non-zero values
> > > > > +	 * very frequently.
> > > > > +	 */
> > > > > +	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL &&
> > > > > +	    !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
> > > > > +		if (WARN_ON(qgroup->rfer || qgroup->excl ||
> > > > > +			    qgroup->rfer_cmpr || qgroup->excl_cmpr)) {
> > > > > +			btrfs_warn_rl(fs_info,
> > > > > +"to be deleted qgroup %u/%llu has non-zero numbers, rfer %llu rfer_cmpr %llu excl %llu excl_cmpr %llu",
> > > > > +				      btrfs_qgroup_level(qgroup->qgroupid),
> > > > > +				      btrfs_qgroup_subvolid(qgroup->qgroupid),
> > > > > +				      qgroup->rfer,
> > > > > +				      qgroup->rfer_cmpr,
> > > > > +				      qgroup->excl,
> > > > > +				      qgroup->excl_cmpr);
> > > > > +			qgroup_mark_inconsistent(fs_info);
> > > > > +		}
> > > > > +	}
> > > > 
> > > > If you go ahead with making it conditional, I would fold this warning
> > > > into the check logic. Either way is fine, of course.
> > > > 
> > > > >    	del_qgroup_rb(fs_info, qgroupid);
> > > > >    	spin_unlock(&fs_info->qgroup_lock);
> > > > > 
> > > > > --
> > > > > 2.44.0
> > > > > 
> > > > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-29 13:13           ` Boris Burkov
  2024-04-29 16:31             ` David Sterba
@ 2024-04-29 22:49             ` Qu Wenruo
  1 sibling, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2024-04-29 22:49 UTC (permalink / raw)
  To: Boris Burkov; +Cc: dsterba, Qu Wenruo, linux-btrfs

在 2024/4/29 22:43, Boris Burkov 写道:
> On Fri, Apr 26, 2024 at 07:21:08AM +0930, Qu Wenruo wrote:
[...]
>> I'd say, you should not implement this feature without really
>> understanding the challenges in the first place.
>>
>> And that's why I really prefer you send out non-trivial btrfs-progs for
>> review, other than pushing them directly into github repo.
>>
>> Thanks,
>> Qu
>
> I support the auto deletion in the kernel as you propose, I think it
> just makes sense. Who wants stale, empty qgroups around that aren't
> attached to any subvol? I suppose that with the drop_thresh thing, it is
> possible some parent qgroup still reflects the usage until the next full
> scan?

Yep, that's the problem.
Although I can also do the extra dropping for regular qgroups, but since
it's already inconsistent, modifying the numbers won't make much
difference anyway.

>
> Thinking out loud -- for regular qgroups, we could avoid this all if we
> do the reaping when usage goes to 0 and there is no subvol. So remove
> the qgroup as a consequence of the rescan, not the subvol delete. I
> imagine this is quite a bit messier, though :(

This may be a little more complex, we need to do the extra check at
qgroup accounting time.
And IIRC there may be some corner case that numbers of a qgroup dropped
to 0 temporarily, but later extents accounting would increase it back.

Another concern is related to rescan.
Rescan would mark all qgroup to have 0 numbers first, then let rescan to
refill them all.

This means for a fully dropped subvolume in INCONSISTENT mode, there is
no other timing other than rescan setting its numbers to 0.
So dropping the qgroup when its number goes back to 0 won't happen for it.

>
> We could also just not auto-reap if that condition occurs (inconsistent
> qg with a parent), but I think that may be surprising for the same
> reasons that got you working on this in the first place...
>
> Do we know of an explicit need to support --no-delete-qgroup? It feels
> like it is perfectly normal for us to improve the default behavior of
> the kernel or userspace tools without supporting the old behavior as a
> flag forever (without a user).

I do not think any one would intentionally prefer a stale one.

But considering if squota requires a stale qgroup to keep the accounting
correct, the difference between regular qgroup and squota may grow
larger and larger in the future.

I'll need to re-consider the qgroup drop condition to make it only
possible to delete really stale qgroups (all zero and no subvolume for
both regular qgroup/squota, but for inconsistent regular qgroup,
allowing dropping non-zero qgroups).

>
> Put another way, I think it would be perfectly reasonable to term the
> stale qgroups a leaked memory resource and this patch a bug fix, if we
> are willing to get overly philosophical about it. We don't carry around
> eternal flags for bug fixes, unless it's some rather exceptional case.
>
> If we are on the hook for supporing that flag because we already added
> it to progs and don't want to deprecate it, then maybe we can think of
> something compatible with default auto-reap.

Like a compat bit to represent if we do the auto-reap.
This would definitely make older script happy.

But I think it's a little overkilled.

Thanks,
Qu
>
> Thanks,
> Boris
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-29 22:05               ` Qu Wenruo
@ 2024-04-30 10:59                 ` David Sterba
  2024-04-30 22:05                   ` Qu Wenruo
  0 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2024-04-30 10:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Boris Burkov, Qu Wenruo, linux-btrfs

On Tue, Apr 30, 2024 at 07:35:11AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/4/30 02:01, David Sterba 写道:
> > On Mon, Apr 29, 2024 at 06:13:33AM -0700, Boris Burkov wrote:
> >> I support the auto deletion in the kernel as you propose, I think it
> >> just makes sense. Who wants stale, empty qgroups around that aren't
> >> attached to any subvol? I suppose that with the drop_thresh thing, it is
> >> possible some parent qgroup still reflects the usage until the next full
> >> scan?
> >
> > The stale qgroups have been out for a long time so removing them after
> > subvolume deletion is changing default behaviour, this always breaks
> > somebody's scripts or tools.
> 
> If needed I can introduce a compat bit (the first one), to tell the
> behavior difference.
> 
> And if we go the compat bit way, I believe it can be the first example
> on how to do a kernel behavior change correctly without breaking any
> user space assumption.

I don't see how a compat bit would work here, we use them for feature
compatibility and for general access to data (full or read-only). What
we do with individual behavioral changes are sysfs files. They're
detectable by scripts and can be also used for configuration. In this
case enabling/disabling autoclean of the qgroups.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-30 10:59                 ` David Sterba
@ 2024-04-30 22:05                   ` Qu Wenruo
  2024-04-30 22:18                     ` Boris Burkov
  2024-05-02 15:00                     ` David Sterba
  0 siblings, 2 replies; 28+ messages in thread
From: Qu Wenruo @ 2024-04-30 22:05 UTC (permalink / raw)
  To: dsterba; +Cc: Boris Burkov, Qu Wenruo, linux-btrfs



在 2024/4/30 20:29, David Sterba 写道:
> On Tue, Apr 30, 2024 at 07:35:11AM +0930, Qu Wenruo wrote:
>>
>>
>> 在 2024/4/30 02:01, David Sterba 写道:
>>> On Mon, Apr 29, 2024 at 06:13:33AM -0700, Boris Burkov wrote:
>>>> I support the auto deletion in the kernel as you propose, I think it
>>>> just makes sense. Who wants stale, empty qgroups around that aren't
>>>> attached to any subvol? I suppose that with the drop_thresh thing, it is
>>>> possible some parent qgroup still reflects the usage until the next full
>>>> scan?
>>>
>>> The stale qgroups have been out for a long time so removing them after
>>> subvolume deletion is changing default behaviour, this always breaks
>>> somebody's scripts or tools.
>>
>> If needed I can introduce a compat bit (the first one), to tell the
>> behavior difference.
>>
>> And if we go the compat bit way, I believe it can be the first example
>> on how to do a kernel behavior change correctly without breaking any
>> user space assumption.
>
> I don't see how a compat bit would work here, we use them for feature
> compatibility and for general access to data (full or read-only). What
> we do with individual behavioral changes are sysfs files. They're
> detectable by scripts and can be also used for configuration. In this
> case enabling/disabling autoclean of the qgroups.
>

I mean the compat bit, which is fully empty now.

The new bit would be something like BTRFS_QGROUP_AUTO_REMOVAL, with that
set, btrfs would auto remove any stale qgroups (for regular qgroups though).

Without that, it would be as usual (no auto removal).

Since this doesn't cause any on-disk change, it does not needs compat-ro
nor incompat.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-30 22:05                   ` Qu Wenruo
@ 2024-04-30 22:18                     ` Boris Burkov
  2024-04-30 22:27                       ` Qu Wenruo
  2024-05-02 15:00                     ` David Sterba
  1 sibling, 1 reply; 28+ messages in thread
From: Boris Burkov @ 2024-04-30 22:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Wed, May 01, 2024 at 07:35:09AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/4/30 20:29, David Sterba 写道:
> > On Tue, Apr 30, 2024 at 07:35:11AM +0930, Qu Wenruo wrote:
> > > 
> > > 
> > > 在 2024/4/30 02:01, David Sterba 写道:
> > > > On Mon, Apr 29, 2024 at 06:13:33AM -0700, Boris Burkov wrote:
> > > > > I support the auto deletion in the kernel as you propose, I think it
> > > > > just makes sense. Who wants stale, empty qgroups around that aren't
> > > > > attached to any subvol? I suppose that with the drop_thresh thing, it is
> > > > > possible some parent qgroup still reflects the usage until the next full
> > > > > scan?
> > > > 
> > > > The stale qgroups have been out for a long time so removing them after
> > > > subvolume deletion is changing default behaviour, this always breaks
> > > > somebody's scripts or tools.
> > > 
> > > If needed I can introduce a compat bit (the first one), to tell the
> > > behavior difference.
> > > 
> > > And if we go the compat bit way, I believe it can be the first example
> > > on how to do a kernel behavior change correctly without breaking any
> > > user space assumption.
> > 
> > I don't see how a compat bit would work here, we use them for feature
> > compatibility and for general access to data (full or read-only). What
> > we do with individual behavioral changes are sysfs files. They're
> > detectable by scripts and can be also used for configuration. In this
> > case enabling/disabling autoclean of the qgroups.

This was my initial thought too, but your compat bit idea is interesting
since it persists? I vote sysfs since it has good
infrastructure/momentum already for similar config.

> > 
> 
> I mean the compat bit, which is fully empty now.
> 
> The new bit would be something like BTRFS_QGROUP_AUTO_REMOVAL, with that
> set, btrfs would auto remove any stale qgroups (for regular qgroups though).
> 
> Without that, it would be as usual (no auto removal).
> 
> Since this doesn't cause any on-disk change, it does not needs compat-ro
> nor incompat.
> 
> Thanks,
> Qu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-30 22:18                     ` Boris Burkov
@ 2024-04-30 22:27                       ` Qu Wenruo
  2024-05-02 15:03                         ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2024-04-30 22:27 UTC (permalink / raw)
  To: Boris Burkov; +Cc: dsterba, Qu Wenruo, linux-btrfs



在 2024/5/1 07:48, Boris Burkov 写道:
> On Wed, May 01, 2024 at 07:35:09AM +0930, Qu Wenruo wrote:
[...]
>>>
>>> I don't see how a compat bit would work here, we use them for feature
>>> compatibility and for general access to data (full or read-only). What
>>> we do with individual behavioral changes are sysfs files. They're
>>> detectable by scripts and can be also used for configuration. In this
>>> case enabling/disabling autoclean of the qgroups.
>
> This was my initial thought too, but your compat bit idea is interesting
> since it persists? I vote sysfs since it has good
> infrastructure/momentum already for similar config.

Sysfs is another way which we are already utilizing for qgroups, like
drop_subtree_threshold.

The problem is as mentioned already, it's not persistent, thus it needs
a user space daemon to set it for every fs after mount.

I'm totally fine to go sysfs for now, but I really hope to a persistent
solution.
Maybe a dedicated config tree?

Thanks,
Qu

>
>>>
>>
>> I mean the compat bit, which is fully empty now.
>>
>> The new bit would be something like BTRFS_QGROUP_AUTO_REMOVAL, with that
>> set, btrfs would auto remove any stale qgroups (for regular qgroups though).
>>
>> Without that, it would be as usual (no auto removal).
>>
>> Since this doesn't cause any on-disk change, it does not needs compat-ro
>> nor incompat.
>>
>> Thanks,
>> Qu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-30 22:05                   ` Qu Wenruo
  2024-04-30 22:18                     ` Boris Burkov
@ 2024-05-02 15:00                     ` David Sterba
  2024-05-02 21:27                       ` Qu Wenruo
  1 sibling, 1 reply; 28+ messages in thread
From: David Sterba @ 2024-05-02 15:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Boris Burkov, Qu Wenruo, linux-btrfs

On Wed, May 01, 2024 at 07:35:09AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/4/30 20:29, David Sterba 写道:
> > On Tue, Apr 30, 2024 at 07:35:11AM +0930, Qu Wenruo wrote:
> >>
> >>
> >> 在 2024/4/30 02:01, David Sterba 写道:
> >>> On Mon, Apr 29, 2024 at 06:13:33AM -0700, Boris Burkov wrote:
> >>>> I support the auto deletion in the kernel as you propose, I think it
> >>>> just makes sense. Who wants stale, empty qgroups around that aren't
> >>>> attached to any subvol? I suppose that with the drop_thresh thing, it is
> >>>> possible some parent qgroup still reflects the usage until the next full
> >>>> scan?
> >>>
> >>> The stale qgroups have been out for a long time so removing them after
> >>> subvolume deletion is changing default behaviour, this always breaks
> >>> somebody's scripts or tools.
> >>
> >> If needed I can introduce a compat bit (the first one), to tell the
> >> behavior difference.
> >>
> >> And if we go the compat bit way, I believe it can be the first example
> >> on how to do a kernel behavior change correctly without breaking any
> >> user space assumption.
> >
> > I don't see how a compat bit would work here, we use them for feature
> > compatibility and for general access to data (full or read-only). What
> > we do with individual behavioral changes are sysfs files. They're
> > detectable by scripts and can be also used for configuration. In this
> > case enabling/disabling autoclean of the qgroups.
> 
> I mean the compat bit, which is fully empty now.

Which compat bit and in which structure? What I mean is the super block
incompat/compat bits but what you described below does not match it.

> The new bit would be something like BTRFS_QGROUP_AUTO_REMOVAL, with that
> set, btrfs would auto remove any stale qgroups (for regular qgroups though).
> 
> Without that, it would be as usual (no auto removal).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-04-30 22:27                       ` Qu Wenruo
@ 2024-05-02 15:03                         ` David Sterba
  2024-05-02 21:29                           ` Qu Wenruo
  0 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2024-05-02 15:03 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Boris Burkov, dsterba, Qu Wenruo, linux-btrfs

On Wed, May 01, 2024 at 07:57:08AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/5/1 07:48, Boris Burkov 写道:
> > On Wed, May 01, 2024 at 07:35:09AM +0930, Qu Wenruo wrote:
> [...]
> >>>
> >>> I don't see how a compat bit would work here, we use them for feature
> >>> compatibility and for general access to data (full or read-only). What
> >>> we do with individual behavioral changes are sysfs files. They're
> >>> detectable by scripts and can be also used for configuration. In this
> >>> case enabling/disabling autoclean of the qgroups.
> >
> > This was my initial thought too, but your compat bit idea is interesting
> > since it persists? I vote sysfs since it has good
> > infrastructure/momentum already for similar config.
> 
> Sysfs is another way which we are already utilizing for qgroups, like
> drop_subtree_threshold.
> 
> The problem is as mentioned already, it's not persistent, thus it needs
> a user space daemon to set it for every fs after mount.

My idea of using sysfs is to export the information that the
autocleaning feature is present and if we make it on by default then
there's no need for additional step to enable it. The feedback about
that was that it should have been default so we're going to make that
change, but with sysfs export also provide a fallback to disable it in
case it breaks things for somebody.

> I'm totally fine to go sysfs for now, but I really hope to a persistent
> solution.
> Maybe a dedicated config tree?

No, we already have a way to store data in the trees or in the
properties so no new tree.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-05-02 15:00                     ` David Sterba
@ 2024-05-02 21:27                       ` Qu Wenruo
  0 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2024-05-02 21:27 UTC (permalink / raw)
  To: dsterba; +Cc: Boris Burkov, Qu Wenruo, linux-btrfs



在 2024/5/3 00:30, David Sterba 写道:
> On Wed, May 01, 2024 at 07:35:09AM +0930, Qu Wenruo wrote:
>>
>>
>> 在 2024/4/30 20:29, David Sterba 写道:
>>> On Tue, Apr 30, 2024 at 07:35:11AM +0930, Qu Wenruo wrote:
>>>>
>>>>
>>>> 在 2024/4/30 02:01, David Sterba 写道:
>>>>> On Mon, Apr 29, 2024 at 06:13:33AM -0700, Boris Burkov wrote:
>>>>>> I support the auto deletion in the kernel as you propose, I think it
>>>>>> just makes sense. Who wants stale, empty qgroups around that aren't
>>>>>> attached to any subvol? I suppose that with the drop_thresh thing, it is
>>>>>> possible some parent qgroup still reflects the usage until the next full
>>>>>> scan?
>>>>>
>>>>> The stale qgroups have been out for a long time so removing them after
>>>>> subvolume deletion is changing default behaviour, this always breaks
>>>>> somebody's scripts or tools.
>>>>
>>>> If needed I can introduce a compat bit (the first one), to tell the
>>>> behavior difference.
>>>>
>>>> And if we go the compat bit way, I believe it can be the first example
>>>> on how to do a kernel behavior change correctly without breaking any
>>>> user space assumption.
>>>
>>> I don't see how a compat bit would work here, we use them for feature
>>> compatibility and for general access to data (full or read-only). What
>>> we do with individual behavioral changes are sysfs files. They're
>>> detectable by scripts and can be also used for configuration. In this
>>> case enabling/disabling autoclean of the qgroups.
>>
>> I mean the compat bit, which is fully empty now.
>
> Which compat bit and in which structure? What I mean is the super block
> incompat/compat bits but what you described below does not match it.

Superblock compat bit exactly.

I didn't see why my following description doesn't match it.

We have 3 compat bits:

- incompat
   Huge on-disk format change that older kernel can not understand and
   affects older kernels' ability to read files.
   This is the most abused bits from the past.
   A lot of features like skinny metadata should go compat_ro.

- compat_ro
   On-disk format changes, but older kernels should still be able to read
   the fs/csum tree etc.

- compat
   Not utilized for now. Older kernel can still do everything without any
   problem.
   Thus this should be the best candidate for runtime behavior changes
   that doesn't affect on-disk format at all.

Thanks,
Qu
>
>> The new bit would be something like BTRFS_QGROUP_AUTO_REMOVAL, with that
>> set, btrfs would auto remove any stale qgroups (for regular qgroups though).
>>
>> Without that, it would be as usual (no auto removal).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-05-02 15:03                         ` David Sterba
@ 2024-05-02 21:29                           ` Qu Wenruo
  2024-05-03 12:46                             ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2024-05-02 21:29 UTC (permalink / raw)
  To: dsterba; +Cc: Boris Burkov, Qu Wenruo, linux-btrfs



在 2024/5/3 00:33, David Sterba 写道:
> On Wed, May 01, 2024 at 07:57:08AM +0930, Qu Wenruo wrote:
>>
>>
>> 在 2024/5/1 07:48, Boris Burkov 写道:
>>> On Wed, May 01, 2024 at 07:35:09AM +0930, Qu Wenruo wrote:
>> [...]
>>>>>
>>>>> I don't see how a compat bit would work here, we use them for feature
>>>>> compatibility and for general access to data (full or read-only). What
>>>>> we do with individual behavioral changes are sysfs files. They're
>>>>> detectable by scripts and can be also used for configuration. In this
>>>>> case enabling/disabling autoclean of the qgroups.
>>>
>>> This was my initial thought too, but your compat bit idea is interesting
>>> since it persists? I vote sysfs since it has good
>>> infrastructure/momentum already for similar config.
>>
>> Sysfs is another way which we are already utilizing for qgroups, like
>> drop_subtree_threshold.
>>
>> The problem is as mentioned already, it's not persistent, thus it needs
>> a user space daemon to set it for every fs after mount.
>
> My idea of using sysfs is to export the information that the
> autocleaning feature is present and if we make it on by default then
> there's no need for additional step to enable it. The feedback about
> that was that it should have been default so we're going to make that
> change, but with sysfs export also provide a fallback to disable it in
> case it breaks things for somebody.
>
>> I'm totally fine to go sysfs for now, but I really hope to a persistent
>> solution.
>> Maybe a dedicated config tree?
>
> No, we already have a way to store data in the trees or in the
> properties so no new tree.

That means a on-disk format change.
IIRC everytime we introduce a new TEMP objectid, it should at least be
compat or compat_ro bit change.

Or older kernel won't understand nor follow the new TEMP key.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-05-02 21:29                           ` Qu Wenruo
@ 2024-05-03 12:46                             ` David Sterba
  2024-05-03 22:14                               ` Qu Wenruo
  0 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2024-05-03 12:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Boris Burkov, Qu Wenruo, linux-btrfs

On Fri, May 03, 2024 at 06:59:03AM +0930, Qu Wenruo wrote:
> >> The problem is as mentioned already, it's not persistent, thus it needs
> >> a user space daemon to set it for every fs after mount.
> >
> > My idea of using sysfs is to export the information that the
> > autocleaning feature is present and if we make it on by default then
> > there's no need for additional step to enable it. The feedback about
> > that was that it should have been default so we're going to make that
> > change, but with sysfs export also provide a fallback to disable it in
> > case it breaks things for somebody.
> >
> >> I'm totally fine to go sysfs for now, but I really hope to a persistent
> >> solution.
> >> Maybe a dedicated config tree?
> >
> > No, we already have a way to store data in the trees or in the
> > properties so no new tree.
> 
> That means a on-disk format change.
> IIRC everytime we introduce a new TEMP objectid, it should at least be
> compat or compat_ro bit change.
> 
> Or older kernel won't understand nor follow the new TEMP key.

This is not necessarily a problem, if older kernel does not understand
the key it'll skip it. The search treats the keys as numbers. If there's
something more associated with the additional data that would prevent
mount then a compat bit would make sense.

The compat_ro does not make sense here, the same filesystem can be
mounted and data read or written, the only difference is that qgroups
are also removed. How is that incompatible on the same level as other
features like BGT?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup
  2024-05-03 12:46                             ` David Sterba
@ 2024-05-03 22:14                               ` Qu Wenruo
  0 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2024-05-03 22:14 UTC (permalink / raw)
  To: dsterba; +Cc: Boris Burkov, Qu Wenruo, linux-btrfs



在 2024/5/3 22:16, David Sterba 写道:
> On Fri, May 03, 2024 at 06:59:03AM +0930, Qu Wenruo wrote:
>>>> The problem is as mentioned already, it's not persistent, thus it needs
>>>> a user space daemon to set it for every fs after mount.
>>>
>>> My idea of using sysfs is to export the information that the
>>> autocleaning feature is present and if we make it on by default then
>>> there's no need for additional step to enable it. The feedback about
>>> that was that it should have been default so we're going to make that
>>> change, but with sysfs export also provide a fallback to disable it in
>>> case it breaks things for somebody.
>>>
>>>> I'm totally fine to go sysfs for now, but I really hope to a persistent
>>>> solution.
>>>> Maybe a dedicated config tree?
>>>
>>> No, we already have a way to store data in the trees or in the
>>> properties so no new tree.
>>
>> That means a on-disk format change.
>> IIRC everytime we introduce a new TEMP objectid, it should at least be
>> compat or compat_ro bit change.
>>
>> Or older kernel won't understand nor follow the new TEMP key.
>
> This is not necessarily a problem, if older kernel does not understand
> the key it'll skip it. The search treats the keys as numbers. If there's
> something more associated with the additional data that would prevent
> mount then a compat bit would make sense.
>
> The compat_ro does not make sense here, the same filesystem can be
> mounted and data read or written, the only difference is that qgroups
> are also removed. How is that incompatible on the same level as other
> features like BGT?

That's why BGT is compat_ro, not compat.
And exactly why I want to push a new compat bit just for qgroup auto reap.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2024-05-03 22:14 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-19  9:46 [PATCH v2 0/2] btrfs: qgroup: stale qgroups related impromvents Qu Wenruo
2024-04-19  9:46 ` [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal Qu Wenruo
2024-04-29 12:47   ` Boris Burkov
2024-04-29 22:00     ` Qu Wenruo
2024-04-29 22:19       ` Boris Burkov
2024-04-29 22:29         ` Qu Wenruo
2024-04-29 22:38           ` Boris Burkov
2024-04-19  9:46 ` [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup Qu Wenruo
2024-04-24 12:41   ` David Sterba
2024-04-24 22:19     ` Qu Wenruo
2024-04-25 12:34       ` David Sterba
2024-04-25 21:51         ` Qu Wenruo
2024-04-29 13:13           ` Boris Burkov
2024-04-29 16:31             ` David Sterba
2024-04-29 22:05               ` Qu Wenruo
2024-04-30 10:59                 ` David Sterba
2024-04-30 22:05                   ` Qu Wenruo
2024-04-30 22:18                     ` Boris Burkov
2024-04-30 22:27                       ` Qu Wenruo
2024-05-02 15:03                         ` David Sterba
2024-05-02 21:29                           ` Qu Wenruo
2024-05-03 12:46                             ` David Sterba
2024-05-03 22:14                               ` Qu Wenruo
2024-05-02 15:00                     ` David Sterba
2024-05-02 21:27                       ` Qu Wenruo
2024-04-29 22:49             ` Qu Wenruo
2024-04-29 16:36           ` David Sterba
2024-04-29 12:57   ` Boris Burkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).