* [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
@ 2018-10-18 11:17 Qu Wenruo
  2018-10-18 11:17 ` [PATCH 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 11:17 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees

It is still based on v4.19-rc1, but with previously submitted patches
as dependencies.

This patchset addresses the heavy subtree scan load by delaying the scan
until we're going to modify the swapped tree blocks.

The overall workflow is:

1) Record the subtree root blocks that get swapped.

   During subtree swap:
   O = Old tree blocks
   N = New tree blocks
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         NA     OB                          OA      OB
       /  |     |  \                      /  |      |  \
     NC  ND     OE  OF                   OC  OD     OE  OF

  In this case, NA and OA are going to be swapped; record (NA, OA) in
  file tree X.

2) After subtree swap.
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         OA     OB                          NA      OB
       /  |     |  \                      /  |      |  \
     OC  OD     OE  OF                   NC  ND     OE  OF

3a) CoW happens for OB
    If we are going to CoW tree block OB, we check OB's bytenr against
    tree X's swapped_blocks structure.
    It doesn't match any record, so nothing happens.

3b) CoW happens for NA
    Check NA's bytenr against tree X's swapped_blocks and get a hit.
    Then we do a subtree scan on both subtrees OA and NA, resulting
    in 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND).

    After that, no matter what we do to file tree X, qgroup numbers
    will still be correct.
    NA's record then gets removed from X's swapped_blocks.

4)  Transaction commit
    Any remaining record in X's swapped_blocks gets removed; since
    there was no modification to those swapped subtrees, there is no
    need to trigger the heavy qgroup subtree rescan for them.
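
To illustrate, the per-CoW check in steps 3a)/3b) boils down to roughly
the following (a simplified pseudo-C sketch, not the actual patch code;
lookup_swapped_block() and trace_both_subtrees() are hypothetical
helpers standing in for the rbtree search and the subtree trace that
patch 5 open-codes):

	/* Sketch only: see btrfs_qgroup_trace_subtree_after_cow() in patch 5 */
	static void maybe_trace_swapped_subtree(struct btrfs_trans_handle *trans,
						struct btrfs_root *root,
						struct extent_buffer *eb)
	{
		struct btrfs_qgroup_swapped_blocks *blocks = &root->swapped_blocks;
		struct btrfs_qgroup_swapped_block *block;
		int level = btrfs_header_level(eb);

		/* 3a) No record matches eb->start, nothing to do */
		block = lookup_swapped_block(blocks, level, eb->start);
		if (!block)
			return;

		/* 3b) Hit: scan both the old and new subtrees, then drop the record */
		trace_both_subtrees(trans, block, eb);
		rb_erase(&block->node, &blocks->blocks[level]);
		kfree(block);
	}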

[[Benchmark]]
Hardware:
	VM 4G vRAM, 8 vCPUs,
	disk is using 'unsafe' cache mode,
	backing device is SAMSUNG 850 evo SSD.
	Host has 16G ram.

Mkfs parameter:
	--nodesize 4K (To bump up tree size)

Initial subvolume contents:
	4G data copied from /usr and /lib.
	(With enough regular small files)

Snapshots:
	16 snapshots of the original subvolume.
	Each snapshot has 3 random files modified.

balance parameter:
	-m

So the content should be pretty similar to a real-world root fs layout.

And after filesystem population there is no other activity, so it
should be the best-case scenario.

                     | prev optimization(*) | w/ patchset    | diff
-----------------------------------------------------------------------
relocated extents    | 22958                | 22983          | +0.1%
qgroup dirty extents | 129282               | 54630          | -57.8%
time (sys)           | 37.807s              | 27.038s        | -28.5%
time (real)          | 45.446s              | 34.015s        | -25.2%

*: Previously submitted patchset, titled
   "[PATCH v4 0/7] btrfs: qgroup: Reduce dirty extents for metadata".
   Still based on v4.19-rc1.

Qu Wenruo (6):
  btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated
    at insert time
  btrfs: relocation: Commit transaction before dropping
    btrfs_root::reloc_root
  btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
  btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  btrfs: qgroup: Use delayed subtree rescan for balance
  btrfs: qgroup: Cleanup old subtree swap code

 fs/btrfs/ctree.c       |   8 +
 fs/btrfs/ctree.h       |  13 ++
 fs/btrfs/disk-io.c     |   1 +
 fs/btrfs/qgroup.c      | 368 +++++++++++++++++++++++++++++++----------
 fs/btrfs/qgroup.h      | 107 +++++++++++-
 fs/btrfs/relocation.c  |  26 ++-
 fs/btrfs/transaction.c |   1 +
 7 files changed, 424 insertions(+), 100 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time
  2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
@ 2018-10-18 11:17 ` Qu Wenruo
  2018-10-18 11:17 ` [PATCH 2/6] btrfs: relocation: Commit transaction before dropping btrfs_root::reloc_root Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 11:17 UTC (permalink / raw)
  To: linux-btrfs

Commit fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting
time out of commit trans") makes btrfs_qgroup_extent_record::old_roots
populated at insert time.

It's OK for most cases as btrfs_qgroup_extent_record is inserted at
delayed ref head insert time, which has a less restrictive locking context.

But the later delayed subtree scan optimization will need to insert
btrfs_qgroup_extent_record with a path write lock held, where triggering
a backref walk can easily lead to deadlock.

So this patch introduces two new internal functions,
qgroup_trace_extent() and qgroup_trace_leaf_items(), with a new
@exec_post parameter to indicate whether we need to initiate the backref
walk right away.

Also modify btrfs_qgroup_account_extents() so it no longer triggers a
kernel warning when old_roots is unpopulated.
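
For illustration only (not part of the diff below), a caller in such a
locked context would do roughly the following, under the assumption that
@eb is the tree block being traced:

	/*
	 * Sketch: insert the record now with exec_post == false, and let
	 * btrfs_qgroup_account_extents() search the commit root for
	 * old_roots at accounting time instead.
	 */
	ret = qgroup_trace_extent(trans, eb->start, eb->len, GFP_NOFS,
				  false /* exec_post */);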

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 51 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 678e07c11bc6..aaa3657266b0 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1580,8 +1580,16 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
-int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
-			      u64 num_bytes, gfp_t gfp_flag)
+/*
+ * Insert qgroup extent record for extent at @bytenr, @num_bytes.
+ *
+ * @bytenr:	bytenr of the extent
+ * @num_bytes:	length of the extent
+ * @exec_post:	whether to execute the post-insert work;
+ * 		will initiate the backref walk if set to true.
+ */
+static int qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
+			       u64 num_bytes, gfp_t gfp_flag, bool exec_post)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_qgroup_extent_record *record;
@@ -1607,11 +1615,27 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 		kfree(record);
 		return 0;
 	}
-	return btrfs_qgroup_trace_extent_post(fs_info, record);
+	if (exec_post)
+		return btrfs_qgroup_trace_extent_post(fs_info, record);
+	return 0;
 }
 
-int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
-				  struct extent_buffer *eb)
+int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
+			      u64 num_bytes, gfp_t gfp_flag)
+{
+	return qgroup_trace_extent(trans, bytenr, num_bytes, gfp_flag, true);
+}
+
+/*
+ * Insert qgroup extent records for a leaf and all the file extents
+ * in it.
+ *
+ * @eb:		the leaf to trace
+ * @exec_post:	whether to execute the post-insert work;
+ * 		will initiate the backref walk if set to true.
+ */
+static int qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
+				   struct extent_buffer *eb, bool exec_post)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int nr = btrfs_header_nritems(eb);
@@ -1643,8 +1667,8 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
 
 		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
 
-		ret = btrfs_qgroup_trace_extent(trans, bytenr, num_bytes,
-						GFP_NOFS);
+		ret = qgroup_trace_extent(trans, bytenr, num_bytes, GFP_NOFS,
+					  exec_post);
 		if (ret)
 			return ret;
 	}
@@ -1652,6 +1676,12 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
+				  struct extent_buffer *eb)
+{
+	return qgroup_trace_leaf_items(trans, eb, true);
+}
+
 /*
  * Walk up the tree from the bottom, freeing leaves and any interior
  * nodes which have had all slots visited. If a node (leaf or
@@ -2521,10 +2551,11 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 
 		if (!ret) {
 			/*
-			 * Old roots should be searched when inserting qgroup
-			 * extent record
+			 * Most record->old_roots should have been populated at
+			 * insert time, but we still allow some records without
+			 * old_roots populated.
 			 */
-			if (WARN_ON(!record->old_roots)) {
+			if (!record->old_roots) {
 				/* Search commit root to find old_roots */
 				ret = btrfs_find_all_roots(NULL, fs_info,
 						record->bytenr, 0,
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] btrfs: relocation: Commit transaction before dropping btrfs_root::reloc_root
  2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
  2018-10-18 11:17 ` [PATCH 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
@ 2018-10-18 11:17 ` Qu Wenruo
  2018-10-18 11:17 ` [PATCH 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap() Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 11:17 UTC (permalink / raw)
  To: linux-btrfs

Currently only the relocation code cares about btrfs_root::reloc_root,
and it has a way to keep btrfs_root::reloc_root in sync without screwing
things up.

However, the qgroup code doesn't really have the ability to keep
btrfs_root::reloc_root reliable.

Currently, if someone outside of the relocation code wants to access
btrfs_root::reloc_root, it's quite possible that some race makes
@reloc_root unreliable.

Thankfully, only the qgroup code needs to access
btrfs_root::reloc_root, and due to the nature of qgroup, it only needs
it before committing a transaction.

So alter the timing of the transaction commit in merge_reloc_root():
for every successful root merge, commit the transaction before dropping
btrfs_root::reloc_root, keeping the qgroup code happy.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 44efde9886fc..a8b62323f60b 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2251,6 +2251,17 @@ static noinline_for_stack int merge_reloc_root(struct reloc_control *rc,
 	btrfs_free_path(path);
 
 	if (err == 0) {
+		/*
+		 * Commit the current transaction before we reset the
+		 * root->reloc_root pointer, since qgroup code still needs
+		 * root->reloc_root to trace post-swap CoWs until commit.
+		 */
+		err = btrfs_commit_transaction(trans);
+		if (err < 0)
+			return err;
+		trans = btrfs_start_transaction(root, 0);
+		if (IS_ERR(trans))
+			return PTR_ERR(trans);
 		memset(&root_item->drop_progress, 0,
 		       sizeof(root_item->drop_progress));
 		root_item->drop_level = 0;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
  2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
  2018-10-18 11:17 ` [PATCH 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
  2018-10-18 11:17 ` [PATCH 2/6] btrfs: relocation: Commit transaction before dropping btrfs_root::reloc_root Qu Wenruo
@ 2018-10-18 11:17 ` Qu Wenruo
  2018-10-18 11:17 ` [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 11:17 UTC (permalink / raw)
  To: linux-btrfs

Refactor btrfs_qgroup_trace_subtree_swap() into
qgroup_trace_subtree_swap(), which only needs two extent buffers and a
couple of bools to control its behavior.

Also allow the dependent functions to accept an @exec_post parameter to
determine whether we need to trigger the backref walk.

This provides the basis for later delayed subtree scan work.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 101 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 70 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index aaa3657266b0..e40d5991c438 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1756,7 +1756,7 @@ static int qgroup_trace_extent_swap(struct btrfs_trans_handle* trans,
 				    struct extent_buffer *src_eb,
 				    struct btrfs_path *dst_path,
 				    int dst_level, int root_level,
-				    bool trace_leaf)
+				    bool trace_leaf, bool exec_post)
 {
 	struct btrfs_key key;
 	struct btrfs_path *src_path;
@@ -1847,23 +1847,23 @@ static int qgroup_trace_extent_swap(struct btrfs_trans_handle* trans,
 	 * Now both @dst_path and @src_path have been populated, record the tree
 	 * blocks for qgroup accounting.
 	 */
-	ret = btrfs_qgroup_trace_extent(trans,
-			src_path->nodes[dst_level]->start,
-			nodesize, GFP_NOFS);
+	ret = qgroup_trace_extent(trans, src_path->nodes[dst_level]->start,
+				  nodesize, GFP_NOFS, exec_post);
 	if (ret < 0)
 		goto out;
-	ret = btrfs_qgroup_trace_extent(trans,
-			dst_path->nodes[dst_level]->start,
-			nodesize, GFP_NOFS);
+	ret = qgroup_trace_extent(trans, dst_path->nodes[dst_level]->start,
+				  nodesize, GFP_NOFS, exec_post);
 	if (ret < 0)
 		goto out;
 
 	/* Record leaf file extents */
 	if (dst_level == 0 && trace_leaf) {
-		ret = btrfs_qgroup_trace_leaf_items(trans, src_path->nodes[0]);
+		ret = qgroup_trace_leaf_items(trans, src_path->nodes[0],
+					      exec_post);
 		if (ret < 0)
 			goto out;
-		ret = btrfs_qgroup_trace_leaf_items(trans, dst_path->nodes[0]);
+		ret = qgroup_trace_leaf_items(trans, dst_path->nodes[0],
+					      exec_post);
 	}
 out:
 	btrfs_free_path(src_path);
@@ -1896,7 +1896,8 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 					   struct extent_buffer *src_eb,
 					   struct btrfs_path *dst_path,
 					   int cur_level, int root_level,
-					   u64 last_snapshot, bool trace_leaf)
+					   u64 last_snapshot, bool trace_leaf,
+					   bool exec_post)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct extent_buffer *eb;
@@ -1968,7 +1969,7 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 
 	/* Now record this tree block and its counter part for qgroups */
 	ret = qgroup_trace_extent_swap(trans, src_eb, dst_path, cur_level,
-				       root_level, trace_leaf);
+				       root_level, trace_leaf, exec_post);
 	if (ret < 0)
 		goto cleanup;
 
@@ -1985,7 +1986,7 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 			/* Recursive call (at most 7 times) */
 			ret = qgroup_trace_new_subtree_blocks(trans, src_eb,
 					dst_path, cur_level - 1, root_level,
-					last_snapshot, trace_leaf);
+					last_snapshot, trace_leaf, exec_post);
 			if (ret < 0)
 				goto cleanup;
 		}
@@ -2005,6 +2006,61 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 	return ret;
 }
 
+static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
+				struct extent_buffer *src_eb,
+				struct extent_buffer *dst_eb,
+				u64 last_snapshot, bool trace_leaf,
+				bool exec_post)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_path *dst_path = NULL;
+	int level;
+	int ret;
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+		return 0;
+
+	/* Wrong parameter order */
+	if (btrfs_header_generation(src_eb) > btrfs_header_generation(dst_eb)) {
+		btrfs_err_rl(fs_info,
+		"%s: bad parameter order, src_gen=%llu dst_gen=%llu", __func__,
+			     btrfs_header_generation(src_eb),
+			     btrfs_header_generation(dst_eb));
+		return -EUCLEAN;
+	}
+
+	if (!extent_buffer_uptodate(src_eb) || !extent_buffer_uptodate(dst_eb)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	level = btrfs_header_level(dst_eb);
+	dst_path = btrfs_alloc_path();
+	if (!dst_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	/* For dst_path */
+	extent_buffer_get(dst_eb);
+	dst_path->nodes[level] = dst_eb;
+	dst_path->slots[level] = 0;
+	dst_path->locks[level] = 0;
+
+	/* Do the generation aware breadth-first search */
+	ret = qgroup_trace_new_subtree_blocks(trans, src_eb, dst_path, level,
+					      level, last_snapshot, trace_leaf,
+					      exec_post);
+	if (ret < 0)
+		goto out;
+	ret = 0;
+
+out:
+	btrfs_free_path(dst_path);
+	if (ret < 0)
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	return ret;
+}
+
 /*
  * Inform qgroup to trace subtree swap used in balance.
  *
@@ -2030,14 +2086,12 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 				u64 last_snapshot)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_path *dst_path = NULL;
 	struct btrfs_key first_key;
 	struct extent_buffer *src_eb = NULL;
 	struct extent_buffer *dst_eb = NULL;
 	bool trace_leaf = false;
 	u64 child_gen;
 	u64 child_bytenr;
-	int level;
 	int ret;
 
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
@@ -2088,21 +2142,9 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
-	level = btrfs_header_level(dst_eb);
-	dst_path = btrfs_alloc_path();
-	if (!dst_path) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	/* For dst_path */
-	extent_buffer_get(dst_eb);
-	dst_path->nodes[level] = dst_eb;
-	dst_path->slots[level] = 0;
-	dst_path->locks[level] = 0;
-
 	/* Do the generation aware breadth-first search */
-	ret = qgroup_trace_new_subtree_blocks(trans, src_eb, dst_path, level,
-					      level, last_snapshot, trace_leaf);
+	ret = qgroup_trace_subtree_swap(trans, src_eb, dst_eb, last_snapshot,
+					trace_leaf, true);
 	if (ret < 0)
 		goto out;
 	ret = 0;
@@ -2110,9 +2152,6 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 out:
 	free_extent_buffer(src_eb);
 	free_extent_buffer(dst_eb);
-	btrfs_free_path(dst_path);
-	if (ret < 0)
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 	return ret;
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (2 preceding siblings ...)
  2018-10-18 11:17 ` [PATCH 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap() Qu Wenruo
@ 2018-10-18 11:17 ` Qu Wenruo
  2018-10-18 16:20   ` David Sterba
  2018-10-18 11:17 ` [PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 11:17 UTC (permalink / raw)
  To: linux-btrfs

To allow delayed subtree swap rescan, btrfs needs to record per-root
information about which tree blocks get swapped.

So this patch introduces the per-root btrfs_qgroup_swapped_blocks
structure, which records the swapped tree blocks.

The designed workflow will be:

1) Record the subtree root blocks that get swapped.

   During subtree swap:
   O = Old tree blocks
   N = New tree blocks
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         NA     OB                          OA      OB
       /  |     |  \                      /  |      |  \
     NC  ND     OE  OF                   OC  OD     OE  OF

  In this case, NA and OA are going to be swapped; record (NA, OA) in
  file tree X.

2) After subtree swap.
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         OA     OB                          NA      OB
       /  |     |  \                      /  |      |  \
     OC  OD     OE  OF                   NC  ND     OE  OF

3a) CoW happens for OB
    If we are going to CoW tree block OB, we check OB's bytenr against
    tree X's swapped_blocks structure.
    It doesn't match any record, so nothing happens.

3b) CoW happens for NA
    Check NA's bytenr against tree X's swapped_blocks and get a hit.
    Then we do a subtree scan on both subtrees OA and NA, resulting
    in 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND).

    After that, no matter what we do to file tree X, qgroup numbers
    will still be correct.
    NA's record then gets removed from X's swapped_blocks.

4)  Transaction commit
    Any remaining record in X's swapped_blocks gets removed; since
    there was no modification to those swapped subtrees, there is no
    need to trigger the heavy qgroup subtree rescan for them.

This will introduce 128 bytes of overhead for each btrfs_root even if
qgroup is not enabled.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h       |  13 +++++
 fs/btrfs/disk-io.c     |   1 +
 fs/btrfs/qgroup.c      | 129 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h      |  99 +++++++++++++++++++++++++++++++
 fs/btrfs/relocation.c  |   7 +++
 fs/btrfs/transaction.c |   1 +
 6 files changed, 250 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 53af9f5253f4..5f2b055fcd56 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1157,6 +1157,17 @@ struct btrfs_subvolume_writers {
 #define BTRFS_ROOT_MULTI_LOG_TASKS	7
 #define BTRFS_ROOT_DIRTY		8
 
+/*
+ * Record swapped tree blocks of a file/subvolume tree for delayed subtree
+ * trace code. For details, check the comment in fs/btrfs/qgroup.c.
+ */
+struct btrfs_qgroup_swapped_blocks {
+	spinlock_t lock;
+	struct rb_root blocks[BTRFS_MAX_LEVEL];
+	/* RB_EMPTY_ROOT() of above blocks[] */
+	bool swapped;
+};
+
 /*
  * in ram representation of the tree.  extent_root is used for all allocations
  * and for the extent tree extent_root root.
@@ -1285,6 +1296,8 @@ struct btrfs_root {
 	spinlock_t qgroup_meta_rsv_lock;
 	u64 qgroup_meta_rsv_pertrans;
 	u64 qgroup_meta_rsv_prealloc;
+
+	struct btrfs_qgroup_swapped_blocks swapped_blocks;
 };
 
 struct btrfs_file_private {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5124c15705ce..d6aa266f0638 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1204,6 +1204,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	root->anon_dev = 0;
 
 	spin_lock_init(&root->root_item_lock);
+	btrfs_qgroup_init_swapped_blocks(&root->swapped_blocks);
 }
 
 static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e40d5991c438..146956fa8574 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3808,3 +3808,132 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
 	}
 	extent_changeset_release(&changeset);
 }
+
+/*
+ * Delete all swapped block records of @root.
+ * Every record here means we skipped a full subtree scan for qgroup.
+ *
+ * Called when committing a transaction.
+ */
+void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root)
+{
+	struct btrfs_qgroup_swapped_blocks *swapped_blocks;
+	struct btrfs_qgroup_swapped_block *cur, *next;
+	int i;
+
+	swapped_blocks = &root->swapped_blocks;
+
+	spin_lock(&swapped_blocks->lock);
+	if (!swapped_blocks->swapped)
+		goto out;
+	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+		struct rb_root *cur_root = &swapped_blocks->blocks[i];
+
+		rbtree_postorder_for_each_entry_safe(cur, next, cur_root,
+						     node) {
+			rb_erase(&cur->node, cur_root);
+			kfree(cur);
+		}
+	}
+	swapped_blocks->swapped = false;
+out:
+	spin_unlock(&swapped_blocks->lock);
+}
+
+/*
+ * Add a subtree root record into @file_root.
+ *
+ * @file_root:		root of the file tree that gets swapped
+ * @bg:			block group under balance
+ * @file_parent/slot:	pointer to the subtree root in file tree
+ * @reloc_parent/slot:	pointer to the subtree root in reloc tree
+ * 			BOTH POINTERS ARE BEFORE TREE SWAP
+ * @last_snapshot:	last snapshot generation of the file tree
+ */
+int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
+		struct btrfs_root *file_root,
+		struct btrfs_block_group_cache *bg,
+		struct extent_buffer *file_parent, int file_slot,
+		struct extent_buffer *reloc_parent, int reloc_slot,
+		u64 last_snapshot)
+{
+	int level = btrfs_header_level(file_parent) - 1;
+	struct btrfs_qgroup_swapped_blocks *blocks = &file_root->swapped_blocks;
+	struct btrfs_fs_info *fs_info = file_root->fs_info;
+	struct btrfs_qgroup_swapped_block *block;
+	struct rb_node **p = &blocks->blocks[level].rb_node;
+	struct rb_node *parent = NULL;
+	int ret = 0;
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+		return 0;
+
+	if (btrfs_node_ptr_generation(file_parent, file_slot) >
+		btrfs_node_ptr_generation(reloc_parent, reloc_slot)) {
+		btrfs_err_rl(fs_info,
+		"%s: bad parameter order, file_gen=%llu reloc_gen=%llu", __func__,
+			btrfs_node_ptr_generation(file_parent, file_slot),
+			btrfs_node_ptr_generation(reloc_parent, reloc_slot));
+		return -EUCLEAN;
+	}
+
+	block = kmalloc(sizeof(*block), GFP_NOFS);
+	if (!block) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * @reloc_parent/slot is still *BEFORE* swap, while @block is going to
+	 * record the bytenr *AFTER* swap, so we do the swap here.
+	 */
+	block->file_bytenr = btrfs_node_blockptr(reloc_parent, reloc_slot);
+	block->file_generation = btrfs_node_ptr_generation(reloc_parent,
+							   reloc_slot);
+	block->reloc_bytenr = btrfs_node_blockptr(file_parent, file_slot);
+	block->reloc_generation = btrfs_node_ptr_generation(file_parent,
+							    file_slot);
+	block->last_snapshot = last_snapshot;
+	block->level = level;
+	if (bg->flags & BTRFS_BLOCK_GROUP_DATA)
+		block->trace_leaf = true;
+	else
+		block->trace_leaf = false;
+	btrfs_node_key_to_cpu(reloc_parent, &block->first_key, reloc_slot);
+
+	/* Insert @block into @blocks */
+	spin_lock(&blocks->lock);
+	while (*p) {
+		struct btrfs_qgroup_swapped_block *entry;
+
+		parent = *p;
+		entry = rb_entry(parent, struct btrfs_qgroup_swapped_block,
+				 node);
+
+		if (entry->file_bytenr < block->file_bytenr)
+			p = &(*p)->rb_left;
+		else if (entry->file_bytenr > block->file_bytenr)
+			p = &(*p)->rb_right;
+		else {
+			if (entry->file_generation != block->file_generation ||
+			    entry->reloc_bytenr != block->reloc_bytenr ||
+			    entry->reloc_generation !=
+			    block->reloc_generation) {
+				WARN_ON_ONCE(1);
+				ret = -EEXIST;
+			}
+			kfree(block);
+			goto out_unlock;
+		}
+	}
+	rb_link_node(&block->node, parent, p);
+	rb_insert_color(&block->node, &blocks->blocks[level]);
+	blocks->swapped = true;
+out_unlock:
+	spin_unlock(&blocks->lock);
+out:
+	if (ret < 0)
+		fs_info->qgroup_flags |=
+			BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	return ret;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 80ebeb3ab5ba..fab6ca96f677 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -6,6 +6,8 @@
 #ifndef BTRFS_QGROUP_H
 #define BTRFS_QGROUP_H
 
+#include <linux/spinlock.h>
+#include <linux/rbtree.h>
 #include "ulist.h"
 #include "delayed-ref.h"
 
@@ -37,6 +39,66 @@
  *    Normally at qgroup rescan and transaction commit time.
  */
 
+/*
+ * Special performance hack for balance.
+ *
+ * For balance, we need to swap the subtrees of the file and reloc trees.
+ * In theory, we need to trace all subtree blocks of both trees, since
+ * their owner has changed during such a swap.
+ *
+ * However, since balance has ensured that both subtrees contain the same
+ * contents and have the same tree structures, such a swap won't cause a
+ * qgroup number change.
+ *
+ * But there is a race window between subtree swap and transaction commit;
+ * during that window, if we increase/decrease the tree level or merge/split
+ * tree blocks, we still need to trace the original subtrees.
+ *
+ * So for balance, we use a delayed subtree trace, whose workflow is:
+ *
+ * 1) Record the subtree root blocks that get swapped.
+ *
+ *    During subtree swap:
+ *    O = Old tree blocks
+ *    N = New tree blocks
+ *          reloc tree                         file tree X
+ *             Root                               Root
+ *            /    \                             /    \
+ *          NA     OB                          OA      OB
+ *        /  |     |  \                      /  |      |  \
+ *      NC  ND     OE  OF                   OC  OD     OE  OF
+ *
+ *   In this case, NA and OA are going to be swapped; record (NA, OA) in
+ *   file tree X.
+ *
+ * 2) After subtree swap.
+ *          reloc tree                         file tree X
+ *             Root                               Root
+ *            /    \                             /    \
+ *          OA     OB                          NA      OB
+ *        /  |     |  \                      /  |      |  \
+ *      OC  OD     OE  OF                   NC  ND     OE  OF
+ *
+ * 3a) CoW happens for OB
+ *     If we are going to CoW tree block OB, we check OB's bytenr against
+ *     tree X's swapped_blocks structure.
+ *     It doesn't match any record, so nothing happens.
+ *
+ * 3b) CoW happens for NA
+ *     Check NA's bytenr against tree X's swapped_blocks and get a hit.
+ *     Then we do a subtree scan on both subtrees OA and NA, resulting
+ *     in 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND).
+ *
+ *     After that, no matter what we do to file tree X, qgroup numbers
+ *     will still be correct.
+ *     NA's record then gets removed from X's swapped_blocks.
+ *
+ * 4)  Transaction commit
+ *     Any remaining record in X's swapped_blocks gets removed; since
+ *     there was no modification to those swapped subtrees, there is no
+ *     need to trigger the heavy qgroup subtree rescan for them.
+ */
+
 /*
  * Record a dirty extent, and info qgroup to update quota on it
  * TODO: Use kmem cache to alloc it.
@@ -48,6 +110,24 @@ struct btrfs_qgroup_extent_record {
 	struct ulist *old_roots;
 };
 
+struct btrfs_qgroup_swapped_block {
+	struct rb_node node;
+
+	bool trace_leaf;
+	int level;
+
+	/* bytenr/generation of the tree block in file tree after swap */
+	u64 file_bytenr;
+	u64 file_generation;
+
+	/* bytenr/generation of the tree block in reloc tree after swap */
+	u64 reloc_bytenr;
+	u64 reloc_generation;
+
+	u64 last_snapshot;
+	struct btrfs_key first_key;
+};
+
 /*
  * Qgroup reservation types:
  *
@@ -323,4 +403,23 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes);
 
 void btrfs_qgroup_check_reserved_leak(struct inode *inode);
 
+/* btrfs_qgroup_swapped_blocks related functions */
+static inline void btrfs_qgroup_init_swapped_blocks(
+		struct btrfs_qgroup_swapped_blocks *swapped_blocks)
+{
+	int i;
+
+	spin_lock_init(&swapped_blocks->lock);
+	for (i = 0; i < BTRFS_MAX_LEVEL; i++)
+		swapped_blocks->blocks[i] = RB_ROOT;
+	swapped_blocks->swapped = false;
+}
+
+void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root);
+int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
+		struct btrfs_root *file_root,
+		struct btrfs_block_group_cache *bg,
+		struct extent_buffer *file_parent, int file_slot,
+		struct extent_buffer *reloc_parent, int reloc_slot,
+		u64 last_snapshot);
 #endif
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index a8b62323f60b..86b8d82b62d8 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1885,6 +1885,13 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 		if (ret < 0)
 			break;
 
+		btrfs_node_key_to_cpu(parent, &first_key, slot);
+		ret = btrfs_qgroup_add_swapped_blocks(trans, dest,
+				rc->block_group, parent, slot,
+				path->nodes[level], path->slots[level],
+				last_snapshot);
+		if (ret < 0)
+			break;
 		/*
 		 * swap blocks in fs tree and reloc tree.
 		 */
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3b84f5015029..15418cea9eb0 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -121,6 +121,7 @@ static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 		if (is_fstree(root->objectid))
 			btrfs_unpin_free_ino(root);
 		clear_btree_io_tree(&root->dirty_log_pages);
+		btrfs_qgroup_clean_swapped_blocks(root);
 	}
 
 	/* We can free old roots now. */
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance
  2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (3 preceding siblings ...)
  2018-10-18 11:17 ` [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
@ 2018-10-18 11:17 ` Qu Wenruo
  2018-10-19 13:12   ` David Sterba
  2018-10-18 11:17 ` [PATCH 6/6] btrfs: qgroup: Cleanup old subtree swap code Qu Wenruo
  2018-10-19 13:42 ` [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
  6 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 11:17 UTC (permalink / raw)
  To: linux-btrfs

Before this patch, the qgroup code traces the whole subtrees of both
the file and reloc trees unconditionally.

This makes qgroup numbers consistent, but it can cause tons of
unnecessary extent tracing, which causes a lot of overhead.

However, for the subtree swap done by balance, since both subtrees
contain the same contents and tree structures, just swapping them won't
change qgroup numbers.

It's only the race window between subtree swap and transaction commit
that could cause qgroup numbers to change.

This patch will delay the qgroup subtree scan until CoW happens for the
subtree root.

So if there are no other operations on the fs, balance won't cause
extra qgroup overhead (the best case scenario).
And depending on the workload, most of the subtree scans can still be
avoided.

Only in the worst case scenario will it fall back to the old subtree
swap overhead (scanning all swapped subtrees).

[[Benchmark]]
Hardware:
	VM 4G vRAM, 8 vCPUs,
	disk is using 'unsafe' cache mode,
	backing device is SAMSUNG 850 evo SSD.
	Host has 16G ram.

Mkfs parameter:
	--nodesize 4K (To bump up tree size)

Initial subvolume contents:
	4G data copied from /usr and /lib.
	(With enough regular small files)

Snapshots:
	16 snapshots of the original subvolume.
	Each snapshot has 3 random files modified.

balance parameter:
	-m

So the content should be pretty similar to a real-world root fs layout.

And after filesystem population there is no other activity, so it
should be the best-case scenario.

                     | prev optimization(*) | w/ patchset    | diff
-----------------------------------------------------------------------
relocated extents    | 22958                | 22983          | +0.1%
qgroup dirty extents | 129282               | 54630          | -57.8%
time (sys)           | 37.807s              | 27.038s        | -28.5%
time (real)          | 45.446s              | 34.015s        | -25.2%

*: Previously submitted patchset, titled
   "[PATCH v4 0/7] btrfs: qgroup: Reduce dirty extents for metadata".
   Still based on v4.19-rc1.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c      |  8 ++++
 fs/btrfs/qgroup.c     | 87 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h     |  2 +
 fs/btrfs/relocation.c | 14 +++----
 4 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index d436fb4c002e..8891ea05cdf3 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -12,6 +12,7 @@
 #include "transaction.h"
 #include "print-tree.h"
 #include "locking.h"
+#include "qgroup.h"
 
 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
 		      *root, struct btrfs_path *path, int level);
@@ -1482,6 +1483,13 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 		btrfs_set_lock_blocking(parent);
 	btrfs_set_lock_blocking(buf);
 
+	/*
+	 * Before CoWing this block for later modification, check if it's
+	 * the subtree root and do the delayed subtree trace if needed.
+	 *
+	 * Also, we don't care about the error, as it's handled internally.
+	 */
+	btrfs_qgroup_trace_subtree_after_cow(trans, root, buf);
 	ret = __btrfs_cow_block(trans, root, buf, parent,
 				 parent_slot, cow_ret, search_start, 0);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 146956fa8574..896711b1671a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3937,3 +3937,90 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
 			BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 	return ret;
 }
+
+/*
+ * Check if the tree block is a subtree root, and if so do the needed
+ * delayed subtree trace for qgroup.
+ *
+ * This is called during btrfs_cow_block().
+ */
+int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
+		struct btrfs_root *root, struct extent_buffer *file_eb)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_qgroup_swapped_blocks *blocks = &root->swapped_blocks;
+	struct btrfs_qgroup_swapped_block *block;
+	struct extent_buffer *reloc_eb = NULL;
+	struct rb_node *n;
+	bool found = false;
+	bool swapped = false;
+	int level = btrfs_header_level(file_eb);
+	int ret = 0;
+	int i;
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+		return 0;
+	if (!is_fstree(root->root_key.objectid) || !root->reloc_root)
+		return 0;
+
+	spin_lock(&blocks->lock);
+	if (!blocks->swapped) {
+		spin_unlock(&blocks->lock);
+		goto out;
+	}
+	n = blocks->blocks[level].rb_node;
+
+	while (n) {
+		block = rb_entry(n, struct btrfs_qgroup_swapped_block, node);
+		if (block->file_bytenr < file_eb->start)
+			n = n->rb_left;
+		else if (block->file_bytenr > file_eb->start)
+			n = n->rb_right;
+		else {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		spin_unlock(&blocks->lock);
+		goto out;
+	}
+	/* Found one, remove it from @blocks first and update blocks->swapped */
+	rb_erase(&block->node, &blocks->blocks[level]);
+	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+		if (!RB_EMPTY_ROOT(&blocks->blocks[i])) {
+			swapped = true;
+			break;
+		}
+	}
+	blocks->swapped = swapped;
+	spin_unlock(&blocks->lock);
+
+	/* Read out reloc subtree root */
+	reloc_eb = read_tree_block(fs_info, block->reloc_bytenr,
+				   block->reloc_generation, block->level,
+				   &block->first_key);
+	if (IS_ERR(reloc_eb)) {
+		ret = PTR_ERR(reloc_eb);
+		reloc_eb = NULL;
+		goto free_out;
+	}
+	if (!extent_buffer_uptodate(reloc_eb)) {
+		ret = -EIO;
+		goto free_out;
+	}
+
+	ret = qgroup_trace_subtree_swap(trans, reloc_eb, file_eb,
+			block->last_snapshot, block->trace_leaf, false);
+free_out:
+	kfree(block);
+	free_extent_buffer(reloc_eb);
+out:
+	if (ret < 0) {
+		btrfs_err_rl(fs_info,
+			     "failed to account subtree at bytenr %llu: %d",
+			     file_eb->start, ret); 
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	}
+	return ret;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index fab6ca96f677..145787be3f70 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -422,4 +422,6 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
 		struct extent_buffer *file_parent, int file_slot,
 		struct extent_buffer *reloc_parent, int reloc_slot,
 		u64 last_snapshot);
+int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
+		struct btrfs_root *root, struct extent_buffer *eb);
 #endif
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 86b8d82b62d8..a308b4728b87 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1876,16 +1876,12 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 		 *    If not traced, we will leak data numbers
 		 * 2) Fs subtree
 		 *    If not traced, we will double count old data
-		 *    and tree block numbers, if current trans doesn't free
-		 *    data reloc tree inode.
+		 *
+		 * We don't scan the subtree right now, but only record
+		 * the swapped tree blocks.
+		 * The real subtree rescan is delayed until there is a new
+		 * CoW on the subtree root node before the transaction commit.
 		 */
-		ret = btrfs_qgroup_trace_subtree_swap(trans, rc->block_group,
-				parent, slot, path->nodes[level],
-				path->slots[level], last_snapshot);
-		if (ret < 0)
-			break;
-
-		btrfs_node_key_to_cpu(parent, &first_key, slot);
 		ret = btrfs_qgroup_add_swapped_blocks(trans, dest,
 				rc->block_group, parent, slot,
 				path->nodes[level], path->slots[level],
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] btrfs: qgroup: Cleanup old subtree swap code
  2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (4 preceding siblings ...)
  2018-10-18 11:17 ` [PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
@ 2018-10-18 11:17 ` Qu Wenruo
  2018-10-19 13:42 ` [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
  6 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 11:17 UTC (permalink / raw)
  To: linux-btrfs

Since it has been replaced by the new delayed subtree swap code, remove
the original code.

The cleanup is small since most of the core functions are still used by
the delayed subtree swap tracing.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 94 -----------------------------------------------
 fs/btrfs/qgroup.h |  6 ---
 2 files changed, 100 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 896711b1671a..9de00a82d046 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2061,100 +2061,6 @@ static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-/*
- * Inform qgroup to trace subtree swap used in balance.
- *
- * Unlike btrfs_qgroup_trace_subtree(), this function will only trace
- * new tree blocks whose generation is equal to (or larger than) @last_snapshot.
- *
- * Will go down the tree block pointed by @dst_eb (pointed by @dst_parent and
- * @dst_slot), and find any tree blocks whose generation is at @last_snapshot,
- * and then go down @src_eb (pointed by @src_parent and @src_slot) to find
- * the conterpart of the tree block, then mark both tree blocks as qgroup dirty,
- * and skip all tree blocks whose generation is smaller than last_snapshot.
- *
- * This would skip tons of tree blocks of original btrfs_qgroup_trace_subtree(),
- * which could be the culprit of super slow balance if the file tree is large.
- *
- * @src_parent, @src_slot: pointer to src (file tree) eb.
- * @dst_parent, @dst_slot: pointer to dst (tree reloc tree) eb.
- */
-int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
-				struct btrfs_block_group_cache *bg_cache,
-				struct extent_buffer *src_parent, int src_slot,
-				struct extent_buffer *dst_parent, int dst_slot,
-				u64 last_snapshot)
-{
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_key first_key;
-	struct extent_buffer *src_eb = NULL;
-	struct extent_buffer *dst_eb = NULL;
-	bool trace_leaf = false;
-	u64 child_gen;
-	u64 child_bytenr;
-	int ret;
-
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
-		return 0;
-
-	/* Wrong parameter order */
-	if (btrfs_node_ptr_generation(src_parent, src_slot) >
-		btrfs_node_ptr_generation(dst_parent, dst_slot)) {
-		btrfs_err_rl(fs_info,
-		"%s: bad parameter order, src_gen=%llu dst_gen=%llu", __func__,
-			btrfs_node_ptr_generation(src_parent, src_slot),
-			btrfs_node_ptr_generation(dst_parent, dst_slot));
-		return -EUCLEAN;
-	}
-
-	/*
-	 * Only trace leaf if we're relocating data block groups, this could
-	 * reduce tons of data extents tracing for meta/sys bg relocation.
-	 */
-	if (bg_cache->flags & BTRFS_BLOCK_GROUP_DATA)
-		trace_leaf = true;
-	/* Read out real @src_eb, pointed by @src_parent and @src_slot */
-	child_bytenr = btrfs_node_blockptr(src_parent, src_slot);
-	child_gen = btrfs_node_ptr_generation(src_parent, src_slot);
-	btrfs_node_key_to_cpu(src_parent, &first_key, src_slot);
-
-	src_eb = read_tree_block(fs_info, child_bytenr, child_gen,
-			btrfs_header_level(src_parent) - 1, &first_key);
-	if (IS_ERR(src_eb)) {
-		ret = PTR_ERR(src_eb);
-		goto out;
-	}
-
-	/* Read out real @dst_eb, pointed by @src_parent and @src_slot */
-	child_bytenr = btrfs_node_blockptr(dst_parent, dst_slot);
-	child_gen = btrfs_node_ptr_generation(dst_parent, dst_slot);
-	btrfs_node_key_to_cpu(dst_parent, &first_key, dst_slot);
-
-	dst_eb = read_tree_block(fs_info, child_bytenr, child_gen,
-			btrfs_header_level(dst_parent) - 1, &first_key);
-	if (IS_ERR(dst_eb)) {
-		ret = PTR_ERR(dst_eb);
-		goto out;
-	}
-
-	if (!extent_buffer_uptodate(src_eb) || !extent_buffer_uptodate(dst_eb)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* Do the generation aware breadth-first search */
-	ret = qgroup_trace_subtree_swap(trans, src_eb, dst_eb, last_snapshot,
-					trace_leaf, true);
-	if (ret < 0)
-		goto out;
-	ret = 0;
-
-out:
-	free_extent_buffer(src_eb);
-	free_extent_buffer(dst_eb);
-	return ret;
-}
-
 int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
 			       struct extent_buffer *root_eb,
 			       u64 root_gen, int root_level)
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 145787be3f70..fc65eba26de9 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -316,12 +316,6 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
 			       struct extent_buffer *root_eb,
 			       u64 root_gen, int root_level);
-
-int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
-				struct btrfs_block_group_cache *bg_cache,
-				struct extent_buffer *src_parent, int src_slot,
-				struct extent_buffer *dst_parent, int dst_slot,
-				u64 last_snapshot);
 int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 				u64 num_bytes, struct ulist *old_roots,
 				struct ulist *new_roots);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-10-18 11:17 ` [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
@ 2018-10-18 16:20   ` David Sterba
  2018-10-18 23:29     ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2018-10-18 16:20 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 18, 2018 at 07:17:27PM +0800, Qu Wenruo wrote:
> +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root)
> +{
> +	struct btrfs_qgroup_swapped_blocks *swapped_blocks;
> +	struct btrfs_qgroup_swapped_block *cur, *next;
> +	int i;
> +
> +	swapped_blocks = &root->swapped_blocks;
> +
> +	spin_lock(&swapped_blocks->lock);
> +	if (!swapped_blocks->swapped)
> +		goto out;
> +	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
> +		struct rb_root *cur_root = &swapped_blocks->blocks[i];
> +
> +		rbtree_postorder_for_each_entry_safe(cur, next, cur_root,
> +						     node) {
> +			rb_erase(&cur->node, cur_root);

https://elixir.bootlin.com/linux/latest/source/include/linux/rbtree.h#L141

rb_erase must not be used on the same pointer that
rbtree_postorder_for_each_entry_safe iterates, here it's cur.
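
The usual safe way to drain and free such a tree is the rb_first() loop,
roughly (a sketch, not taken from the patch):

	struct rb_node *n;

	while ((n = rb_first(cur_root))) {
		cur = rb_entry(n, struct btrfs_qgroup_swapped_block, node);
		rb_erase(n, cur_root);
		kfree(cur);
	}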

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-10-18 16:20   ` David Sterba
@ 2018-10-18 23:29     ` Qu Wenruo
  2018-10-19  9:15       ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-10-18 23:29 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


On 2018/10/19 12:20 AM, David Sterba wrote:
> On Thu, Oct 18, 2018 at 07:17:27PM +0800, Qu Wenruo wrote:
>> +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root)
>> +{
>> +	struct btrfs_qgroup_swapped_blocks *swapped_blocks;
>> +	struct btrfs_qgroup_swapped_block *cur, *next;
>> +	int i;
>> +
>> +	swapped_blocks = &root->swapped_blocks;
>> +
>> +	spin_lock(&swapped_blocks->lock);
>> +	if (!swapped_blocks->swapped)
>> +		goto out;
>> +	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
>> +		struct rb_root *cur_root = &swapped_blocks->blocks[i];
>> +
>> +		rbtree_postorder_for_each_entry_safe(cur, next, cur_root,
>> +						     node) {
>> +			rb_erase(&cur->node, cur_root);
> 
> https://elixir.bootlin.com/linux/latest/source/include/linux/rbtree.h#L141
> 
> rb_erase must not be used on the same pointer that
> rbtree_postorder_for_each_entry_safe iterates, here it's cur.

Oh, thanks for pointing this out.

So we must use the old while(rb_first()) loop...

Thanks,
Qu



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-10-18 23:29     ` Qu Wenruo
@ 2018-10-19  9:15       ` David Sterba
  2018-10-19  9:46         ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2018-10-19  9:15 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Oct 19, 2018 at 07:29:26AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/10/19 12:20 AM, David Sterba wrote:
> > On Thu, Oct 18, 2018 at 07:17:27PM +0800, Qu Wenruo wrote:
> >> +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root)
> >> +{
> >> +	struct btrfs_qgroup_swapped_blocks *swapped_blocks;
> >> +	struct btrfs_qgroup_swapped_block *cur, *next;
> >> +	int i;
> >> +
> >> +	swapped_blocks = &root->swapped_blocks;
> >> +
> >> +	spin_lock(&swapped_blocks->lock);
> >> +	if (!swapped_blocks->swapped)
> >> +		goto out;
> >> +	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
> >> +		struct rb_root *cur_root = &swapped_blocks->blocks[i];
> >> +
> >> +		rbtree_postorder_for_each_entry_safe(cur, next, cur_root,
> >> +						     node) {
> >> +			rb_erase(&cur->node, cur_root);
> > 
> > https://elixir.bootlin.com/linux/latest/source/include/linux/rbtree.h#L141
> > 
> > rb_erase must not be used on the same pointer that
> > rbtree_postorder_for_each_entry_safe iterates, here it's cur.
> 
> Oh, thanks for pointing this out.
> 
> So we must go the old while(rb_first()) loop...

I just realized the postorder iterator can be used here. The forbidden
pattern is

rbtree_postorder_for_each_entry_safe() {
	rb_erase();
}

But you also do kfree right after the erase, and the iterator is fine
with that; the rb_erase is pointless for an entry that's being freed
anyway.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-10-19  9:15       ` David Sterba
@ 2018-10-19  9:46         ` Qu Wenruo
  2018-10-19 10:04           ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-10-19  9:46 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs


On 2018/10/19 5:15 PM, David Sterba wrote:
> On Fri, Oct 19, 2018 at 07:29:26AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018/10/19 12:20 AM, David Sterba wrote:
>>> On Thu, Oct 18, 2018 at 07:17:27PM +0800, Qu Wenruo wrote:
>>>> +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root)
>>>> +{
>>>> +	struct btrfs_qgroup_swapped_blocks *swapped_blocks;
>>>> +	struct btrfs_qgroup_swapped_block *cur, *next;
>>>> +	int i;
>>>> +
>>>> +	swapped_blocks = &root->swapped_blocks;
>>>> +
>>>> +	spin_lock(&swapped_blocks->lock);
>>>> +	if (!swapped_blocks->swapped)
>>>> +		goto out;
>>>> +	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
>>>> +		struct rb_root *cur_root = &swapped_blocks->blocks[i];
>>>> +
>>>> +		rbtree_postorder_for_each_entry_safe(cur, next, cur_root,
>>>> +						     node) {
>>>> +			rb_erase(&cur->node, cur_root);
>>>
>>> https://elixir.bootlin.com/linux/latest/source/include/linux/rbtree.h#L141
>>>
>>> rb_erase must not be used on the same pointer that
>>> rbtree_postorder_for_each_entry_safe iterates, here it's cur.
>>
>> Oh, thanks for pointing this out.
>>
>> So we must go the old while(rb_first()) loop...
> 
> I just realized the postorder iterator can be used here. The forbidden
> pattern is
> 
> rbtree_postorder_for_each_entry_safe() {
> 	rb_erase();
> }

Did you mean something like this is possible?

rbtree_postorder_for_each_entry_safe() {
	kfree(entry);
}

If so, I still don't really believe it's OK.

For the following tree:
                          4
                       /     \
                      2       6
                    /  \     / \
                   1    3   5   7

If the current entry is 2, the next is 3.
And 2 gets freed.
Then when we visit 3, to reach its next node we need to go back through 2,
which is already freed, triggering a use-after-free.

So the only correct way to free the whole rbtree is still the tried and
true while(rb_first()) loop.

Or did I miss something?

Thanks,
Qu

> 
> But you also do kfree right after the erase, and the iterator is fine
> with that; the rb_erase is pointless for an entry that's being freed
> anyway.
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-10-19  9:46         ` Qu Wenruo
@ 2018-10-19 10:04           ` David Sterba
  2018-10-19 13:03             ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2018-10-19 10:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Fri, Oct 19, 2018 at 05:46:28PM +0800, Qu Wenruo wrote:
> Did you mean something like this is possible?
> 
> rbtree_postorder_for_each_entry_safe() {
> 	kfree(entry);
> }
> 
> If so, I still don't really believe it's OK.
> 
> For the following tree:
>                           4
>                        /     \
>                       2       6
>                     /  \     / \
>                    1    3   5   7
> 
> If the current entry is 2, the next is 3.
> And 2 gets freed.
> Then when we visit 3, to reach its next node we need to go back through 2,
> which is already freed, triggering a use-after-free.
> 
> So the only correct way to free the whole rbtree is still the tried and
> true while(rb_first()) loop.
> 
> Or did I miss something?

It's postorder traversal, so the node is ready for processing after all
its children are processed. In the above example, it will result in the
following sequence: 1 3 2 5 7 6 4.

The iterator is safe in the sense that there's enough information saved
before visiting the node, so the node doesn't need to be accessed later,
e.g. after it's freed.

prelim_release uses the same postorder/kfree pattern.
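
I.e. the allowed pattern looks roughly like this (a sketch mirroring the
prelim_release()/free_pref() code in fs/btrfs/backref.c):

	struct btrfs_qgroup_swapped_block *cur, *next;

	rbtree_postorder_for_each_entry_safe(cur, next, cur_root, node)
		kfree(cur);	/* no rb_erase(), the whole tree goes away */
	*cur_root = RB_ROOT;	/* reset the root once everything is freed */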

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-10-19 10:04           ` David Sterba
@ 2018-10-19 13:03             ` Qu Wenruo
  0 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-10-19 13:03 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs


On 2018/10/19 6:04 PM, David Sterba wrote:
> On Fri, Oct 19, 2018 at 05:46:28PM +0800, Qu Wenruo wrote:
>> Did you mean something like this is possible?
>>
>> rbtree_postorder_for_each_entry_safe() {
>> 	kfree(entry);
>> }
>>
>> If so, I still don't really believe it's OK.
>>
>> For the following tree:
>>                           4
>>                        /     \
>>                       2       6
>>                     /  \     / \
>>                    1    3   5   7
>>
>> If the current entry is 2, the next is 3.
>> And 2 gets freed.
>> Then when we visit 3, to reach its next node we need to go back through 2,
>> which is already freed, triggering a use-after-free.
>>
>> So the only correct way to free the whole rbtree is still the tried and
>> true while(rb_first()) loop.
>>
>> Or did I miss something?
> 
> It's postorder traversal, so the node is ready for processing after all
> its children are processed. In the above example, it will result in the
> following sequence: 1 3 2 5 7 6 4.

Oh, I totally misunderstood the word post-order.

Post-order is LRN, so upper nodes are always accessed last; kfree()
inside a post-order traversal is therefore completely OK.

Thanks,
Qu

> 
> The iterator is safe in the sense that there's enough information saved
> before visiting the node, so the node doesn't need to be accessed later,
> e.g. after it's freed.
> 
> prelim_release uses the same postorder/kfree pattern.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance
  2018-10-18 11:17 ` [PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
@ 2018-10-19 13:12   ` David Sterba
  2018-10-19 13:19     ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2018-10-19 13:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 18, 2018 at 07:17:28PM +0800, Qu Wenruo wrote:
> +free_out:
> +	kfree(block);
> +	free_extent_buffer(reloc_eb);
> +out:
> +	if (ret < 0) {
> +		btrfs_err_rl(fs_info,
> +			     "failed to account subtree at bytenr %llu: %d",
> +			     file_eb->start, ret); 

There's a trailing space at the end of the line and git-am does not let
me apply the patch without manual intervention. Please clean up the
patch before sending next time, thanks.

> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> +	}
> +	return ret;
> +}
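
As an aside, two standard tools can catch or repair this class of
problem; the file name below is a placeholder, not from this thread.
checkpatch flags trailing whitespace before a patch is sent, and git-am
can be told to fix whitespace errors while applying:

	./scripts/checkpatch.pl --strict the-patch.mbox
	git am --whitespace=fix the-patch.mbox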

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance
  2018-10-19 13:12   ` David Sterba
@ 2018-10-19 13:19     ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2018-10-19 13:19 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Fri, Oct 19, 2018 at 03:12:34PM +0200, David Sterba wrote:
> On Thu, Oct 18, 2018 at 07:17:28PM +0800, Qu Wenruo wrote:
> > +free_out:
> > +	kfree(block);
> > +	free_extent_buffer(reloc_eb);
> > +out:
> > +	if (ret < 0) {
> > +		btrfs_err_rl(fs_info,
> > +			     "failed to account subtree at bytenr %llu: %d",
> > +			     file_eb->start, ret); 
> 
> There's a trailing space at the end of the line and git-am does not let
> me apply the patch without manual intervention. Please clean up the
> patch before sending next time, thanks.

And when I take the git branch instead and rebase it onto a recent
commit, the same problem arises, since the rebase goes through
git-apply.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (5 preceding siblings ...)
  2018-10-18 11:17 ` [PATCH 6/6] btrfs: qgroup: Cleanup old subtree swap code Qu Wenruo
@ 2018-10-19 13:42 ` David Sterba
  2018-10-19 13:47   ` Qu Wenruo
  2018-10-19 14:51   ` Qu Wenruo
  6 siblings, 2 replies; 19+ messages in thread
From: David Sterba @ 2018-10-19 13:42 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 18, 2018 at 07:17:23PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
> 
> Which is still based on v4.19-rc1, but with previous submitted patches
> as dependency.

Please rebase this patchset on top of misc-4.20; that branch should not
change, as it's going to be the first pull.

I tried to pull the git branch and rebase it, but there were several
conflicts, and some of the patches appeared duplicated, so the rebase
did not recognize the changes as identical.

Applying the patches from the mails causes another sort of conflict, and
I can't seem to get a compilable version.

This is not urgent, as I want to put the patchset into a topic branch
and give it some testing before it goes to for-next. Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-10-19 13:42 ` [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
@ 2018-10-19 13:47   ` Qu Wenruo
  2018-10-19 14:51   ` Qu Wenruo
  1 sibling, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-10-19 13:47 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1442 bytes --]



On 2018/10/19 9:42 PM, David Sterba wrote:
> On Thu, Oct 18, 2018 at 07:17:23PM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
>>
>> Which is still based on v4.19-rc1, but with previous submitted patches
>> as dependency.
> 
> Please rebase this patchset on top of misc-4.20; that branch should not
> change, as it's going to be the first pull.
> 
> I tried to pull the git branch and rebase it, but there were several
> conflicts, and some of the patches appeared duplicated, so the rebase
> did not recognize the changes as identical.

Yes, the branch carries previously submitted qgroup patches as
dependencies, and it's still based on v4.19-rc1, so conflicts are
expected.

> 
> Applying the patches from the mails causes another sort of conflict, and
> I can't seem to get a compilable version.

I'll look into it.

> 
> This is not urgent, as I want to put the patchset into a topic branch
> and give it some testing before it goes to for-next. Thanks.

OK, however this patchset may still need extra polish, especially the
2nd patch.
As you can see in the benchmark part, the patchset reduces dirty qgroup
extents by nearly half but only cuts relocation time by about 25%; that
gap is mostly caused by the 2nd patch, which could impact balance
performance even in the quota-disabled case.

So I'm not eager to push it soon either.

Thanks,
Qu



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-10-19 13:42 ` [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
  2018-10-19 13:47   ` Qu Wenruo
@ 2018-10-19 14:51   ` Qu Wenruo
  2018-10-19 14:54     ` David Sterba
  1 sibling, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-10-19 14:51 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1265 bytes --]



On 2018/10/19 9:42 PM, David Sterba wrote:
> On Thu, Oct 18, 2018 at 07:17:23PM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
>>
>> Which is still based on v4.19-rc1, but with previous submitted patches
>> as dependency.
> 
> Please rebase this patchset on top of misc-4.20; that branch should not
> change, as it's going to be the first pull.
> 
> I tried to pull the git branch and rebase it, but there were several
> conflicts, and some of the patches appeared duplicated, so the rebase
> did not recognize the changes as identical.
> 
> Applying the patches from the mails causes another sort of conflict, and
> I can't seem to get a compilable version.
> 
> This is not urgent, as I want to put the patchset into a topic branch
> and give it some testing before it goes to for-next. Thanks.

The current patchset (still using the while (rb_first()) loop to clean
up swapped_blocks) has been rebased onto misc-4.20, with all checkpatch
errors fixed.

The branch is:
https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
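
For testing, the branch should be fetchable directly with standard git
(assuming the repository URL implied by the link above):

	git fetch https://github.com/adam900710/linux.git qgroup_delayed_subtree_rebased
	git checkout FETCH_HEAD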

Please note that this branch is only for testing purposes.
For the reasons explained above, I don't really think it should go to
misc-next right now.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-10-19 14:51   ` Qu Wenruo
@ 2018-10-19 14:54     ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2018-10-19 14:54 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Oct 19, 2018 at 10:51:47PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/10/19 9:42 PM, David Sterba wrote:
> > On Thu, Oct 18, 2018 at 07:17:23PM +0800, Qu Wenruo wrote:
> >> This patchset can be fetched from github:
> >> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
> >>
> >> Which is still based on v4.19-rc1, but with previous submitted patches
> >> as dependency.
> > 
> > Please rebase this patchset on top of misc-4.20; that branch should not
> > change, as it's going to be the first pull.
> > 
> > I tried to pull the git branch and rebase it, but there were several
> > conflicts, and some of the patches appeared duplicated, so the rebase
> > did not recognize the changes as identical.
> > 
> > Applying the patches from the mails causes another sort of conflict, and
> > I can't seem to get a compilable version.
> > 
> > This is not urgent, as I want to put the patchset into a topic branch
> > and give it some testing before it goes to for-next. Thanks.
> 
> The current patchset (still using the while (rb_first()) loop to clean
> up swapped_blocks) has been rebased onto misc-4.20, with all checkpatch
> errors fixed.
> 
> The branch is:
> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased

Thanks.

> Please note that this branch is only for testing purposes.
> For the reasons explained above, I don't really think it should go to
> misc-next right now.

Ok. Misc-next is for things that have passed some review and testing;
the rest is in for-next and can be delayed or dropped as needed.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-10-19 14:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-18 11:17 [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
2018-10-18 11:17 ` [PATCH 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
2018-10-18 11:17 ` [PATCH 2/6] btrfs: relocation: Commit transaction before dropping btrfs_root::reloc_root Qu Wenruo
2018-10-18 11:17 ` [PATCH 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap() Qu Wenruo
2018-10-18 11:17 ` [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
2018-10-18 16:20   ` David Sterba
2018-10-18 23:29     ` Qu Wenruo
2018-10-19  9:15       ` David Sterba
2018-10-19  9:46         ` Qu Wenruo
2018-10-19 10:04           ` David Sterba
2018-10-19 13:03             ` Qu Wenruo
2018-10-18 11:17 ` [PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
2018-10-19 13:12   ` David Sterba
2018-10-19 13:19     ` David Sterba
2018-10-18 11:17 ` [PATCH 6/6] btrfs: qgroup: Cleanup old subtree swap code Qu Wenruo
2018-10-19 13:42 ` [PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
2018-10-19 13:47   ` Qu Wenruo
2018-10-19 14:51   ` Qu Wenruo
2018-10-19 14:54     ` David Sterba
