All of lore.kernel.org
* [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag
@ 2021-08-22  7:01 Qu Wenruo
  2021-08-22  7:01 ` [PATCH RFC 1/4] btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion Qu Wenruo
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Qu Wenruo @ 2021-08-22  7:01 UTC (permalink / raw)
  To: linux-btrfs

There is a long-standing window in which, if an operation marks qgroup
INCONSISTENT during a qgroup rescan, the INCONSISTENT bit will be
cleared when the rescan finishes, leaving incorrect qgroup numbers
unnoticed.

Furthermore, once we mark qgroup INCONSISTENT, we can in theory skip
all qgroup accounting.
Since the numbers are already wrong, there is no point wasting time
updating something that is known to be incorrect.

So here we introduce two runtime flags:

- BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
  Informs any running rescan to exit immediately, without clearing
  the INCONSISTENT bit on its exit.

- BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING
  Informs the qgroup code to skip all accounting for dirty extents,
  while still allowing operations on qgroup relationships to continue.

Both flags are set when an operation marks the qgroup INCONSISTENT,
and are only cleared when a new rescan is started.


With those flags, we can have the following enhancements:

- Prevent qgroup rescan from clearing an inconsistent flag that should
  be kept
  Without this, if an operation marks qgroup inconsistent while a
  rescan is running, the rescan clears the inconsistent flag even
  though the qgroup numbers are already wrong.

- Skip qgroup accounting while the qgroup numbers are already
  inconsistent

- Skip huge subtree accounting when dropping subvolumes
  With the obvious cost of marking qgroup inconsistent


Reason for RFC:
- Whether the runtime qgroup flags are acceptable

- Whether the behavior of marking qgroup inconsistent when dropping
  large subvolumes is acceptable

- Whether the lifespan of the runtime qgroup flags is acceptable
  They live longer than strictly needed (from the point of
  inconsistency until the next rescan); not sure if that's OK.

Qu Wenruo (4):
  btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion
  btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
  btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip
    qgroup accounting
  btrfs: skip subtree scan if it's too high to avoid long stall in
    btrfs_commit_transaction()

 fs/btrfs/qgroup.c               | 82 ++++++++++++++++++++++++---------
 fs/btrfs/qgroup.h               |  3 ++
 include/uapi/linux/btrfs_tree.h |  4 ++
 3 files changed, 67 insertions(+), 22 deletions(-)

-- 
2.32.0



* [PATCH RFC 1/4] btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion
  2021-08-22  7:01 [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag Qu Wenruo
@ 2021-08-22  7:01 ` Qu Wenruo
  2021-08-22  7:01 ` [PATCH RFC 2/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN Qu Wenruo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2021-08-22  7:01 UTC (permalink / raw)
  To: linux-btrfs

Currently we only have 3 qgroup flags:
- BTRFS_QGROUP_STATUS_FLAG_ON
- BTRFS_QGROUP_STATUS_FLAG_RESCAN
- BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT

These flags match the on-disk flags used in struct btrfs_qgroup_status_item.

But we're going to introduce extra runtime flags which must never
reach disk.

So here we introduce a new mask, BTRFS_QGROUP_STATUS_FLAGS_MASK, to
make sure only the on-disk flags can reach disk.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c               | 6 ++++--
 include/uapi/linux/btrfs_tree.h | 4 ++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index db680f5be745..9babf925bd59 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -867,7 +867,8 @@ static int update_qgroup_status_item(struct btrfs_trans_handle *trans)
 	l = path->nodes[0];
 	slot = path->slots[0];
 	ptr = btrfs_item_ptr(l, slot, struct btrfs_qgroup_status_item);
-	btrfs_set_qgroup_status_flags(l, ptr, fs_info->qgroup_flags);
+	btrfs_set_qgroup_status_flags(l, ptr, fs_info->qgroup_flags &
+				      BTRFS_QGROUP_STATUS_FLAGS_MASK);
 	btrfs_set_qgroup_status_generation(l, ptr, trans->transid);
 	btrfs_set_qgroup_status_rescan(l, ptr,
 				fs_info->qgroup_rescan_progress.objectid);
@@ -1027,7 +1028,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 	btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION);
 	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON |
 				BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
-	btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags);
+	btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags &
+				      BTRFS_QGROUP_STATUS_FLAGS_MASK);
 	btrfs_set_qgroup_status_rescan(leaf, ptr, 0);
 
 	btrfs_mark_buffer_dirty(leaf);
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index e1c4c732aaba..86dbd13a88f3 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -973,6 +973,10 @@ static inline __u16 btrfs_qgroup_level(__u64 qgroupid)
  */
 #define BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT	(1ULL << 2)
 
+#define BTRFS_QGROUP_STATUS_FLAGS_MASK	(BTRFS_QGROUP_STATUS_FLAG_ON |\
+					 BTRFS_QGROUP_STATUS_FLAG_RESCAN |\
+					 BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)
+
 #define BTRFS_QGROUP_STATUS_VERSION        1
 
 struct btrfs_qgroup_status_item {
-- 
2.32.0



* [PATCH RFC 2/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
  2021-08-22  7:01 [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag Qu Wenruo
  2021-08-22  7:01 ` [PATCH RFC 1/4] btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion Qu Wenruo
@ 2021-08-22  7:01 ` Qu Wenruo
  2021-08-22  7:01 ` [PATCH RFC 3/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting Qu Wenruo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2021-08-22  7:01 UTC (permalink / raw)
  To: linux-btrfs

Here we introduce a new runtime flag,
BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN, which informs a running
qgroup rescan to cancel its work asynchronously.

This is to close the window in which an operation makes the qgroup
numbers inconsistent (such as qgroup inheritance) while a qgroup
rescan is running.

In that case, the qgroup INCONSISTENT flag would be cleared when the
rescan finishes.
But since we changed the ownership of some extents, the rescan result
is already meaningless, and the INCONSISTENT flag should not be
cleared.

With the new flag, each time we set the INCONSISTENT flag we also set
this new flag to inform any running qgroup rescan to exit immediately,
leaving the INCONSISTENT flag in place.

The new runtime flag can only be cleared when a new rescan is started.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 49 ++++++++++++++++++++++++++++++-----------------
 fs/btrfs/qgroup.h |  2 ++
 2 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9babf925bd59..91657ba1ebd1 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -322,6 +322,14 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 }
 #endif
 
+static void qgroup_mark_inconsistent(struct btrfs_fs_info *fs_info)
+{
+	BUILD_BUG_ON(BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN &
+		     BTRFS_QGROUP_STATUS_FLAGS_MASK);
+	fs_info->qgroup_flags |= (BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT |
+				  BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN);
+}
+
 /*
  * The full config is read in one go, only called from open_ctree()
  * It doesn't use any locking, as at this point we're still single-threaded
@@ -390,7 +398,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 			}
 			if (btrfs_qgroup_status_generation(l, ptr) !=
 			    fs_info->generation) {
-				flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+				qgroup_mark_inconsistent(fs_info);
 				btrfs_err(fs_info,
 					"qgroup generation mismatch, marked as inconsistent");
 			}
@@ -408,7 +416,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 		if ((qgroup && found_key.type == BTRFS_QGROUP_INFO_KEY) ||
 		    (!qgroup && found_key.type == BTRFS_QGROUP_LIMIT_KEY)) {
 			btrfs_err(fs_info, "inconsistent qgroup config");
-			flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+			qgroup_mark_inconsistent(fs_info);
 		}
 		if (!qgroup) {
 			qgroup = add_qgroup_rb(fs_info, found_key.offset);
@@ -1661,7 +1669,7 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
 
 	ret = update_qgroup_limit_item(trans, qgroup);
 	if (ret) {
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		qgroup_mark_inconsistent(fs_info);
 		btrfs_info(fs_info, "unable to update quota limit for %llu",
 		       qgroupid);
 	}
@@ -1737,7 +1745,7 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 	ret = btrfs_find_all_roots(NULL, trans->fs_info, bytenr, 0, &old_root,
 				   true);
 	if (ret < 0) {
-		trans->fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		qgroup_mark_inconsistent(trans->fs_info);
 		btrfs_warn(trans->fs_info,
 "error accounting new delayed refs extent (err code: %d), quota inconsistent",
 			ret);
@@ -2213,7 +2221,7 @@ static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 out:
 	btrfs_free_path(dst_path);
 	if (ret < 0)
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		qgroup_mark_inconsistent(fs_info);
 	return ret;
 }
 
@@ -2717,12 +2725,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
 		spin_unlock(&fs_info->qgroup_lock);
 		ret = update_qgroup_info_item(trans, qgroup);
 		if (ret)
-			fs_info->qgroup_flags |=
-					BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+			qgroup_mark_inconsistent(fs_info);
 		ret = update_qgroup_limit_item(trans, qgroup);
 		if (ret)
-			fs_info->qgroup_flags |=
-					BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+			qgroup_mark_inconsistent(fs_info);
 		spin_lock(&fs_info->qgroup_lock);
 	}
 	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
@@ -2733,7 +2739,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
 
 	ret = update_qgroup_status_item(trans);
 	if (ret)
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		qgroup_mark_inconsistent(fs_info);
 
 	return ret;
 }
@@ -2851,7 +2857,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 
 		ret = update_qgroup_limit_item(trans, dstgroup);
 		if (ret) {
-			fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+			qgroup_mark_inconsistent(fs_info);
 			btrfs_info(fs_info,
 				   "unable to update quota limit for %llu",
 				   dstgroup->qgroupid);
@@ -2957,7 +2963,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 	if (!committing)
 		mutex_unlock(&fs_info->qgroup_ioctl_lock);
 	if (need_rescan)
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		qgroup_mark_inconsistent(fs_info);
 	return ret;
 }
 
@@ -3226,7 +3232,8 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans,
 static bool rescan_should_stop(struct btrfs_fs_info *fs_info)
 {
 	return btrfs_fs_closing(fs_info) ||
-		test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state);
+		test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state) ||
+		fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN;
 }
 
 static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
@@ -3274,7 +3281,8 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 	if (err > 0 &&
 	    fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
 		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
-	} else if (err < 0) {
+	} else if (err < 0 ||
+		   fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN) {
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 	}
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
@@ -3293,7 +3301,8 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 	}
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	if (!stopped)
+	if (!stopped ||
+	    fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN)
 		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
 	if (trans) {
 		ret = update_qgroup_status_item(trans);
@@ -3304,6 +3313,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 		}
 	}
 	fs_info->qgroup_rescan_running = false;
+	fs_info->qgroup_flags &= ~BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN;
 	complete_all(&fs_info->qgroup_rescan_completion);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
@@ -3314,6 +3324,9 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 
 	if (stopped) {
 		btrfs_info(fs_info, "qgroup scan paused");
+	} else if (fs_info->qgroup_flags &
+		   BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN) {
+		btrfs_info(fs_info, "qgroup scan cancelled");
 	} else if (err >= 0) {
 		btrfs_info(fs_info, "qgroup scan completed%s",
 			err > 0 ? " (inconsistency flag cleared)" : "");
@@ -3373,6 +3386,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 
 	memset(&fs_info->qgroup_rescan_progress, 0,
 		sizeof(fs_info->qgroup_rescan_progress));
+	fs_info->qgroup_flags &= ~BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN;
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
 	init_completion(&fs_info->qgroup_rescan_completion);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
@@ -4169,8 +4183,7 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
 	spin_unlock(&blocks->lock);
 out:
 	if (ret < 0)
-		fs_info->qgroup_flags |=
-			BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		qgroup_mark_inconsistent(fs_info);
 	return ret;
 }
 
@@ -4257,7 +4270,7 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
 		btrfs_err_rl(fs_info,
 			     "failed to account subtree at bytenr %llu: %d",
 			     subvol_eb->start, ret);
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		qgroup_mark_inconsistent(fs_info);
 	}
 	return ret;
 }
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 880e9df0dac1..f92e635bfb69 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -100,6 +100,8 @@
  *     subtree rescan for them.
  */
 
+#define BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN		(1UL << 3)
+
 /*
  * Record a dirty extent, and info qgroup to update quota on it
  * TODO: Use kmem cache to alloc it.
-- 
2.32.0



* [PATCH RFC 3/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting
  2021-08-22  7:01 [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag Qu Wenruo
  2021-08-22  7:01 ` [PATCH RFC 1/4] btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion Qu Wenruo
  2021-08-22  7:01 ` [PATCH RFC 2/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN Qu Wenruo
@ 2021-08-22  7:01 ` Qu Wenruo
  2021-08-22  7:02 ` [PATCH RFC 4/4] btrfs: skip subtree scan if it's too high to avoid long stall in btrfs_commit_transaction() Qu Wenruo
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2021-08-22  7:01 UTC (permalink / raw)
  To: linux-btrfs

The new flag makes btrfs qgroup skip all of its time-consuming
qgroup accounting.

Its lifespan is the same as BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN:
it only gets cleared when a new rescan is started.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 18 ++++++++++++++----
 fs/btrfs/qgroup.h |  1 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 91657ba1ebd1..291c404e8718 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -326,8 +326,11 @@ static void qgroup_mark_inconsistent(struct btrfs_fs_info *fs_info)
 {
 	BUILD_BUG_ON(BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN &
 		     BTRFS_QGROUP_STATUS_FLAGS_MASK);
+	BUILD_BUG_ON(BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING &
+		     BTRFS_QGROUP_STATUS_FLAGS_MASK);
 	fs_info->qgroup_flags |= (BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT |
-				  BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN);
+				  BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN |
+				  BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING);
 }
 
 /*
@@ -1742,6 +1745,10 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 	 */
 	ASSERT(trans != NULL);
 
+	if (trans->fs_info->qgroup_flags &
+	    BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
+		return 0;
+
 	ret = btrfs_find_all_roots(NULL, trans->fs_info, bytenr, 0, &old_root,
 				   true);
 	if (ret < 0) {
@@ -2556,7 +2563,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	 * If quotas get disabled meanwhile, the resources need to be freed and
 	 * we can't just exit here.
 	 */
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
+	    fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
 		goto out_free;
 
 	if (new_roots) {
@@ -2652,7 +2660,8 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 		num_dirty_extents++;
 		trace_btrfs_qgroup_account_extents(fs_info, record);
 
-		if (!ret) {
+		if (!ret && !(fs_info->qgroup_flags &
+			      BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)) {
 			/*
 			 * Old roots should be searched when inserting qgroup
 			 * extent record
@@ -3386,7 +3395,8 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 
 	memset(&fs_info->qgroup_rescan_progress, 0,
 		sizeof(fs_info->qgroup_rescan_progress));
-	fs_info->qgroup_flags &= ~BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN;
+	fs_info->qgroup_flags &= ~(BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN |
+				   BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING);
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
 	init_completion(&fs_info->qgroup_rescan_completion);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index f92e635bfb69..fe42975dcea7 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -101,6 +101,7 @@
  */
 
 #define BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN		(1UL << 3)
+#define BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING		(1UL << 4)
 
 /*
  * Record a dirty extent, and info qgroup to update quota on it
-- 
2.32.0



* [PATCH RFC 4/4] btrfs: skip subtree scan if it's too high to avoid long stall in btrfs_commit_transaction()
  2021-08-22  7:01 [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-08-22  7:01 ` [PATCH RFC 3/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting Qu Wenruo
@ 2021-08-22  7:02 ` Qu Wenruo
  2021-08-23 17:24 ` [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag David Sterba
  2021-08-23 17:30 ` David Sterba
  5 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2021-08-22  7:02 UTC (permalink / raw)
  To: linux-btrfs

Btrfs qgroup has a long history of bringing performance penalties to
btrfs_commit_transaction().

Although we have tried our best to mitigate such impact, there is
still an unsolved call site: btrfs_drop_snapshot().

This function finds the highest shared tree block and modifies its
extent ownership to perform a subvolume/snapshot drop.

Such a change affects the whole subtree, creating tons of qgroup
dirty extents and stalling btrfs_commit_transaction().

To avoid this problem, we can simply skip the subtree accounting if
the subtree is too high.
Of course, the cost is marking qgroup inconsistent.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 291c404e8718..c650258f5cec 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2248,6 +2248,19 @@ int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
 		return 0;
 
+	/*
+	 * This function only gets called for snapshot drops. If we hit a high
+	 * node here, it means we are going to change ownership for quite a lot
+	 * of extents, which will greatly slow down btrfs_commit_transaction().
+	 *
+	 * So if we find a high tree here, just skip the accounting and
+	 * mark qgroup inconsistent.
+	 */
+	if (root_level >= 4) {
+		qgroup_mark_inconsistent(fs_info);
+		return 0;
+	}
+
 	if (!extent_buffer_uptodate(root_eb)) {
 		ret = btrfs_read_buffer(root_eb, root_gen, root_level, NULL);
 		if (ret)
-- 
2.32.0



* Re: [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag
  2021-08-22  7:01 [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag Qu Wenruo
                   ` (3 preceding siblings ...)
  2021-08-22  7:02 ` [PATCH RFC 4/4] btrfs: skip subtree scan if it's too high to avoid long stall in btrfs_commit_transaction() Qu Wenruo
@ 2021-08-23 17:24 ` David Sterba
  2021-08-23 23:27   ` Qu Wenruo
  2021-08-23 17:30 ` David Sterba
  5 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2021-08-23 17:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, mpdesouza

On Sun, Aug 22, 2021 at 03:01:56PM +0800, Qu Wenruo wrote:
> There is a long existing window that if we did some operations marking
> qgroup INCONSISTENT during a qgroup rescan, the INCONSISTENT bit will be
> cleared by rescan, leaving incorrect qgroup numbers unnoticed.
> 
> Furthermore, when we mark qgroup INCONSISTENT, we can in theory skip all
> qgroup accountings.
> Since the numbers are already crazy, we don't really need to waste time
> updating something that's already wrong.
> 
> So here we introduce two runtime flags:
> 
> - BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
>   To inform any running rescan to exit immediately and don't clear
>   the INCONSISTENT bit on its exit.
> 
> - BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING
>   To inform qgroup code not to do any accounting for dirty extents.
> 
>   But still allow operations on qgroup relationship to be continued.
> 
> Both flags will be set when an operation marks the qgroup INCONSISTENT
> and only get cleared when a new rescan is started.
> 
> 
> With those flags, we can have the following enhancement:
> 
> - Prevent qgroup rescan to clear inconsistent flag which should be kept
>   If an operation marks qgroup inconsistent when a rescan is running,
>   qgroup rescan will clear the inconsistent flag while the qgroup
>   numbers are already wrong.
> 
> - Skip qgroup accountings while qgroup numbers are already inconsistent
> 
> - Skip huge subtree accounting when dropping subvolumes
>   With the obvious cost of marking qgroup inconsistent
> 
> 
> Reason for RFC:
> - If the runtime qgroup flags are acceptable
> 
> - If the behavior of marking qgroup inconsistent when dropping large
>   subvolumes
> 
> - If the lifespan of runtime qgroup flags are acceptable
>   They have longer than needed lifespan (from inconsistent time point to
>   next rescan), not sure if it's OK.

How is this related to the patch from Marcos?

https://lore.kernel.org/linux-btrfs/20210617123436.28327-1-mpdesouza@suse.com/

If there's a way to cancel the rescan, does this patchset fix the
possible problems?


* Re: [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag
  2021-08-22  7:01 [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag Qu Wenruo
                   ` (4 preceding siblings ...)
  2021-08-23 17:24 ` [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag David Sterba
@ 2021-08-23 17:30 ` David Sterba
  2021-08-24  6:54   ` Qu Wenruo
  5 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2021-08-23 17:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Aug 22, 2021 at 03:01:56PM +0800, Qu Wenruo wrote:
> There is a long existing window that if we did some operations marking
> qgroup INCONSISTENT during a qgroup rescan, the INCONSISTENT bit will be
> cleared by rescan, leaving incorrect qgroup numbers unnoticed.
> 
> Furthermore, when we mark qgroup INCONSISTENT, we can in theory skip all
> qgroup accountings.
> Since the numbers are already crazy, we don't really need to waste time
> updating something that's already wrong.
> 
> So here we introduce two runtime flags:
> 
> - BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
>   To inform any running rescan to exit immediately and don't clear
>   the INCONSISTENT bit on its exit.
> 
> - BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING
>   To inform qgroup code not to do any accounting for dirty extents.
> 
>   But still allow operations on qgroup relationship to be continued.
> 
> Both flags will be set when an operation marks the qgroup INCONSISTENT
> and only get cleared when a new rescan is started.
> 
> 
> With those flags, we can have the following enhancement:
> 
> - Prevent qgroup rescan to clear inconsistent flag which should be kept
>   If an operation marks qgroup inconsistent when a rescan is running,
>   qgroup rescan will clear the inconsistent flag while the qgroup
>   numbers are already wrong.
> 
> - Skip qgroup accountings while qgroup numbers are already inconsistent
> 
> - Skip huge subtree accounting when dropping subvolumes
>   With the obvious cost of marking qgroup inconsistent
> 
> 
> Reason for RFC:
> - If the runtime qgroup flags are acceptable

As long as it's internal I think it's ok.

> - If the behavior of marking qgroup inconsistent when dropping large
>   subvolumes

That could be a bit problematic because the user never knows whether
the rescan has been started or not.

> - If the lifespan of runtime qgroup flags are acceptable
>   They have longer than needed lifespan (from inconsistent time point to
>   next rescan), not sure if it's OK.

On first read I haven't found anything obviously problematic.


* Re: [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag
  2021-08-23 17:24 ` [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag David Sterba
@ 2021-08-23 23:27   ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2021-08-23 23:27 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, mpdesouza



On 2021/8/24 1:24 AM, David Sterba wrote:
> On Sun, Aug 22, 2021 at 03:01:56PM +0800, Qu Wenruo wrote:
>> There is a long existing window that if we did some operations marking
>> qgroup INCONSISTENT during a qgroup rescan, the INCONSISTENT bit will be
>> cleared by rescan, leaving incorrect qgroup numbers unnoticed.
>>
>> Furthermore, when we mark qgroup INCONSISTENT, we can in theory skip all
>> qgroup accountings.
>> Since the numbers are already crazy, we don't really need to waste time
>> updating something that's already wrong.
>>
>> So here we introduce two runtime flags:
>>
>> - BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
>>    To inform any running rescan to exit immediately and don't clear
>>    the INCONSISTENT bit on its exit.
>>
>> - BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING
>>    To inform qgroup code not to do any accounting for dirty extents.
>>
>>    But still allow operations on qgroup relationship to be continued.
>>
>> Both flags will be set when an operation marks the qgroup INCONSISTENT
>> and only get cleared when a new rescan is started.
>>
>>
>> With those flags, we can have the following enhancement:
>>
>> - Prevent qgroup rescan to clear inconsistent flag which should be kept
>>    If an operation marks qgroup inconsistent when a rescan is running,
>>    qgroup rescan will clear the inconsistent flag while the qgroup
>>    numbers are already wrong.
>>
>> - Skip qgroup accountings while qgroup numbers are already inconsistent
>>
>> - Skip huge subtree accounting when dropping subvolumes
>>    With the obvious cost of marking qgroup inconsistent
>>
>>
>> Reason for RFC:
>> - If the runtime qgroup flags are acceptable
>>
>> - If the behavior of marking qgroup inconsistent when dropping large
>>    subvolumes
>>
>> - If the lifespan of runtime qgroup flags are acceptable
>>    They have longer than needed lifespan (from inconsistent time point to
>>    next rescan), not sure if it's OK.
>
> How is this related to the patch from Marcos?
>
> https://lore.kernel.org/linux-btrfs/20210617123436.28327-1-mpdesouza@suse.com/
>
> If there's way to cancel the rescan, does this patchset fix the possible
> problems?
>

It's a coincidence, as my primary objective here is to solve the final
piece of the qgroup slowdown from subvolume dropping.

That said, the solution I take does not require an ioctl to cancel a
rescan, and it also works for cases like qgroup inheritance.

Thanks,
Qu


* Re: [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag
  2021-08-23 17:30 ` David Sterba
@ 2021-08-24  6:54   ` Qu Wenruo
  2021-08-24  7:49     ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2021-08-24  6:54 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2021/8/24 1:30 AM, David Sterba wrote:
> On Sun, Aug 22, 2021 at 03:01:56PM +0800, Qu Wenruo wrote:
>> There is a long existing window that if we did some operations marking
>> qgroup INCONSISTENT during a qgroup rescan, the INCONSISTENT bit will be
>> cleared by rescan, leaving incorrect qgroup numbers unnoticed.
>>
>> Furthermore, when we mark qgroup INCONSISTENT, we can in theory skip all
>> qgroup accountings.
>> Since the numbers are already crazy, we don't really need to waste time
>> updating something that's already wrong.
>>
>> So here we introduce two runtime flags:
>>
>> - BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
>>    To inform any running rescan to exit immediately and don't clear
>>    the INCONSISTENT bit on its exit.
>>
>> - BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING
>>    To inform qgroup code not to do any accounting for dirty extents.
>>
>>    But still allow operations on qgroup relationship to be continued.
>>
>> Both flags will be set when an operation marks the qgroup INCONSISTENT
>> and only get cleared when a new rescan is started.
>>
>>
>> With those flags, we can have the following enhancement:
>>
>> - Prevent qgroup rescan to clear inconsistent flag which should be kept
>>    If an operation marks qgroup inconsistent when a rescan is running,
>>    qgroup rescan will clear the inconsistent flag while the qgroup
>>    numbers are already wrong.
>>
>> - Skip qgroup accountings while qgroup numbers are already inconsistent
>>
>> - Skip huge subtree accounting when dropping subvolumes
>>    With the obvious cost of marking qgroup inconsistent
>>
>>
>> Reason for RFC:
>> - If the runtime qgroup flags are acceptable
>
> As long as it's internal I think it's ok.
>
>> - If the behavior of marking qgroup inconsistent when dropping large
>>    subvolumes
>
> That could be a bit problematic because user never knows if the rescan
> has been started or not.

I guess for the end users, the new behavior means they should know which
operations could cancel rescan and cause qgroup to be inconsistent.

For now, only qgroup inherit (snapshot creation with -i option or qgroup
assign) and large subvolume dropping can cause such situation.

If there is something like snapper running, we can teach snapper to
check the qgroup status with an interval.

But for non-snapper qgroup users, it can indeed be problematic,
especially considering how frequently snapshot dropping can be.

Any better idea on how to balance performance and qgroup consistency?

Thanks,
Qu

>
>> - If the lifespan of runtime qgroup flags are acceptable
>>    They have longer than needed lifespan (from inconsistent time point to
>>    next rescan), not sure if it's OK.
>
> On first read I haven't found anything obviously problematic.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag
  2021-08-24  6:54   ` Qu Wenruo
@ 2021-08-24  7:49     ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2021-08-24  7:49 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2021/8/24 2:54 PM, Qu Wenruo wrote:
>
>
>> On 2021/8/24 1:30 AM, David Sterba wrote:
>> On Sun, Aug 22, 2021 at 03:01:56PM +0800, Qu Wenruo wrote:
>>> There is a long existing window that if we did some operations marking
>>> qgroup INCONSISTENT during a qgroup rescan, the INCONSISTENT bit will be
>>> cleared by rescan, leaving incorrect qgroup numbers unnoticed.
>>>
>>> Furthermore, when we mark qgroup INCONSISTENT, we can in theory skip all
>>> qgroup accountings.
>>> Since the numbers are already crazy, we don't really need to waste time
>>> updating something that's already wrong.
>>>
>>> So here we introduce two runtime flags:
>>>
>>> - BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
>>>    To inform any running rescan to exit immediately and don't clear
>>>    the INCONSISTENT bit on its exit.
>>>
>>> - BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING
>>>    To inform qgroup code not to do any accounting for dirty extents.
>>>
>>>    But still allow operations on qgroup relationship to be continued.
>>>
>>> Both flags will be set when an operation marks the qgroup INCONSISTENT
>>> and only get cleared when a new rescan is started.
>>>
>>>
>>> With those flags, we can have the following enhancement:
>>>
>>> - Prevent qgroup rescan to clear inconsistent flag which should be kept
>>>    If an operation marks qgroup inconsistent when a rescan is running,
>>>    qgroup rescan will clear the inconsistent flag while the qgroup
>>>    numbers are already wrong.
>>>
>>> - Skip qgroup accountings while qgroup numbers are already inconsistent
>>>
>>> - Skip huge subtree accounting when dropping subvolumes
>>>    With the obvious cost of marking qgroup inconsistent
>>>
>>>
>>> Reason for RFC:
>>> - If the runtime qgroup flags are acceptable
>>
>> As long as it's internal I think it's ok.
>>
>>> - If the behavior of marking qgroup inconsistent when dropping large
>>>    subvolumes
>>
>> That could be a bit problematic because user never knows if the rescan
>> has been started or not.
>
> I guess for end users, the new behavior means they need to know which
> operations can cancel a rescan and mark qgroups inconsistent.
>
> For now, only qgroup inherit (snapshot creation with the -i option, or
> qgroup assign) and large subvolume dropping can cause such a situation.
>
> If there is something like snapper running, we can teach snapper to
> check the qgroup status at an interval.
>
> But for non-snapper qgroup users, it can indeed be problematic,
> especially considering how frequent snapshot dropping can be.
>
> Any better idea on how to balance performance and qgroup consistency?

Another idea is to allow the user to specify a threshold above which the
subtree scan is skipped when dropping a subvolume.

And by default, we set the threshold to MAX_LEVEL, so no subtree will be
skipped (i.e. still the super slow drop, but no sudden qgroup-related
slow down).

For the qgroup inherit case, it makes no difference; any caller that
would make qgroups inconsistent should know what they are doing.
Thus only the subvolume drop case is the key.

Any idea for the interface? Per-fs sysfs? Or mount option?

Thanks,
Qu
>
> Thanks,
> Qu
>
>>
>>> - If the lifespan of runtime qgroup flags are acceptable
>>>    They have longer than needed lifespan (from inconsistent time
>>> point to
>>>    next rescan), not sure if it's OK.
>>
>> On first read I haven't found anything obviously problematic.
>>


end of thread, other threads:[~2021-08-24  7:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-22  7:01 [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag Qu Wenruo
2021-08-22  7:01 ` [PATCH RFC 1/4] btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion Qu Wenruo
2021-08-22  7:01 ` [PATCH RFC 2/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN Qu Wenruo
2021-08-22  7:01 ` [PATCH RFC 3/4] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting Qu Wenruo
2021-08-22  7:02 ` [PATCH RFC 4/4] btrfs: skip subtree scan if it's too high to avoid low stall in btrfs_commit_transaction() Qu Wenruo
2021-08-23 17:24 ` [PATCH RFC 0/4] btrfs: qgroup: rescan enhancement related to INCONSISTENT flag David Sterba
2021-08-23 23:27   ` Qu Wenruo
2021-08-23 17:30 ` David Sterba
2021-08-24  6:54   ` Qu Wenruo
2021-08-24  7:49     ` Qu Wenruo
