Linux-BTRFS Archive on lore.kernel.org
* [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
@ 2018-11-08  5:49 Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-08  5:49 UTC
  To: linux-btrfs

This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased

It is based on v4.20-rc1.

This patchset addresses the heavy load of the qgroup subtree scan by
delaying the scan until we are about to modify a swapped tree block.

The overall workflow is:

1) Record the subtree root blocks that get swapped.

   During subtree swap:
   O = Old tree blocks
   N = New tree blocks
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         NA     OB                          OA      OB
       /  |     |  \                      /  |      |  \
     NC  ND     OE  OF                   OC  OD     OE  OF

  In this case, NA and OA are going to be swapped, so record (NA, OA)
  into file tree X.

2) After subtree swap.
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         OA     OB                          NA      OB
       /  |     |  \                      /  |      |  \
     OC  OD     OE  OF                   NC  ND     OE  OF

3a) CoW happens for OB
    If we are going to CoW tree block OB, we check OB's bytenr against
    tree X's swapped_blocks structure.
    It doesn't match any record, so nothing happens.

3b) CoW happens for NA
    Check NA's bytenr against tree X's swapped_blocks, and get a hit.
    Then we do a subtree scan on both subtrees OA and NA.
    This results in 6 tree blocks being scanned (OA, OC, OD, NA, NC, ND).

    Then no matter what we do to file tree X, qgroup numbers will
    still be correct.
    Then NA's record gets removed from X's swapped_blocks.

4)  Transaction commit
    Any records left in X's swapped_blocks get removed. Since the swapped
    subtrees were not modified, there is no need to trigger a heavy
    qgroup subtree rescan for them.
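
For illustration only, the workflow above can be modeled in userspace C
roughly as follows. This is a simplified sketch, not the code in this
patchset: all names (swapped_set, record_swap(), cow_hook(),
commit_hook()) are hypothetical, a flat array stands in for the real
per-root structure, and the subtree scan itself is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SWAPPED 16

/* One (new, old) pair recorded at swap time, e.g. (NA, OA). */
struct swapped_record {
	uint64_t file_bytenr;   /* block now in file tree X (NA) */
	uint64_t reloc_bytenr;  /* block now in the reloc tree (OA) */
};

/* Hypothetical stand-in for tree X's swapped_blocks. */
struct swapped_set {
	struct swapped_record rec[MAX_SWAPPED];
	size_t nr;
	unsigned int scans;     /* subtree scans actually triggered */
};

/* Steps 1 and 2: remember the pair instead of scanning right away. */
static bool record_swap(struct swapped_set *set, uint64_t file_bytenr,
			uint64_t reloc_bytenr)
{
	if (set->nr == MAX_SWAPPED)
		return false;
	set->rec[set->nr].file_bytenr = file_bytenr;
	set->rec[set->nr].reloc_bytenr = reloc_bytenr;
	set->nr++;
	return true;
}

/*
 * Steps 3a and 3b: called before CoW of the block at @bytenr.
 * Returns true if a delayed subtree scan was triggered.
 */
static bool cow_hook(struct swapped_set *set, uint64_t bytenr)
{
	assert(set->nr <= MAX_SWAPPED);
	for (size_t i = 0; i < set->nr; i++) {
		if (set->rec[i].file_bytenr != bytenr)
			continue;
		set->scans++;                        /* scan both subtrees */
		set->rec[i] = set->rec[set->nr - 1]; /* drop the record */
		set->nr--;
		return true;
	}
	return false;  /* 3a: no match, nothing to do */
}

/* Step 4: records that were never hit are dropped without any scan. */
static void commit_hook(struct swapped_set *set)
{
	set->nr = 0;
}
```

The point of the model is that the scan cost is only paid in cow_hook(),
and only for records that actually get hit before the commit.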

[[Benchmark]]
Hardware:
	VM 4G vRAM, 8 vCPUs,
	disk is using 'unsafe' cache mode,
	backing device is SAMSUNG 850 evo SSD.
	Host has 16G ram.

Mkfs parameter:
	--nodesize 4K (To bump up tree size)

Initial subvolume contents:
	4G data copied from /usr and /lib.
	(With enough regular small files)

Snapshots:
	16 snapshots of the original subvolume.
	each snapshot has 3 random files modified.

balance parameter:
	-m

So the content should be pretty similar to a real world root fs layout.

And after file system population, there is no other activity, so it
should be the best case scenario.

                     | v4.20-rc1            | w/ patchset    | diff
-----------------------------------------------------------------------
relocated extents    | 22615                | 22457          | -0.7%
qgroup dirty extents | 163457               | 121606         | -25.6%
time (sys)           | 22.884s              | 18.842s        | -17.6%
time (real)          | 27.724s              | 22.884s        | -17.5%
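
As a quick cross-check, the diff column can be recomputed from the two
measurement columns as a relative change. The helper below is only a
sketch with a hypothetical name (pct_change()); it is not part of the
patchset or of any benchmark tooling.

```c
#include <assert.h>

/* Relative change from @before to @after, in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For example, pct_change(163457, 121606) gives the relative change in
qgroup dirty extents between v4.20-rc1 and the patched kernel.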

changelog:
v2:
  Rebase to v4.20-rc1.
  Instead of committing the transaction after each reloc tree merge,
  delay it until merge_reloc_roots() finishes.
  This provides more natural behavior and reduces unnecessary
  transaction commits.

Qu Wenruo (6):
  btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated
    at insert time
  btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()
  btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
  btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  btrfs: qgroup: Use delayed subtree rescan for balance
  btrfs: qgroup: Cleanup old subtree swap code

 fs/btrfs/ctree.c       |   8 +
 fs/btrfs/ctree.h       |  14 ++
 fs/btrfs/disk-io.c     |   1 +
 fs/btrfs/qgroup.c      | 376 +++++++++++++++++++++++++++++++----------
 fs/btrfs/qgroup.h      | 107 +++++++++++-
 fs/btrfs/relocation.c  | 140 ++++++++++++---
 fs/btrfs/transaction.c |   1 +
 7 files changed, 527 insertions(+), 120 deletions(-)

-- 
2.19.1



* [PATCH v2 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time
  2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
@ 2018-11-08  5:49 ` Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 2/6] btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots() Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-08  5:49 UTC
  To: linux-btrfs

Commit fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting
time out of commit trans") makes btrfs_qgroup_extent_record::old_roots
populated at insert time.

It's OK for most cases as btrfs_qgroup_extent_record is inserted at
delayed ref head insert time, which has a less restrictive lock context.

But the later delayed subtree scan optimization will need to insert
btrfs_qgroup_extent_record with a path write lock held, where triggering
a backref walk can easily lead to deadlock.

So this patch introduces two new internal functions,
qgroup_trace_extent() and qgroup_trace_leaf_items(), with a new
@exec_post parameter to indicate whether we need to initiate the backref
walk right now.

Also modify btrfs_qgroup_account_extents() so that it no longer triggers
a kernel warning for records without old_roots populated.
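
The pattern described above, an internal helper with an extra flag plus
a public wrapper that keeps the old behavior, can be sketched in
userspace C as below. This is only an illustration under simplified
assumptions: the struct and function names are hypothetical, and the
backref walk is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record; old_roots_populated models old_roots != NULL. */
struct extent_record {
	uint64_t bytenr;
	uint64_t num_bytes;
	bool old_roots_populated;
};

/* Stand-in for the expensive backref walk done by the "post" step. */
static int trace_extent_post(struct extent_record *rec)
{
	rec->old_roots_populated = true;
	return 0;
}

/* Internal helper: the caller decides whether the post step runs now. */
static int trace_extent_internal(struct extent_record *rec, uint64_t bytenr,
				 uint64_t num_bytes, bool exec_post)
{
	rec->bytenr = bytenr;
	rec->num_bytes = num_bytes;
	rec->old_roots_populated = false;
	if (exec_post)
		return trace_extent_post(rec);
	return 0;
}

/* Public entry point keeps the old behavior: always run the post step. */
static int trace_extent(struct extent_record *rec, uint64_t bytenr,
			uint64_t num_bytes)
{
	return trace_extent_internal(rec, bytenr, num_bytes, true);
}

/* Accounting time: fill in whatever was deferred, without warning. */
static int account_extent(struct extent_record *rec)
{
	if (!rec->old_roots_populated)
		return trace_extent_post(rec);
	return 0;
}
```

Existing callers are unaffected because they go through the wrapper;
only a caller that holds a restrictive lock passes false and leaves the
walk to accounting time.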

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 51 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 45868fd76209..6c674ac29b90 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1580,8 +1580,16 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
-int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
-			      u64 num_bytes, gfp_t gfp_flag)
+/*
+ * Insert qgroup extent record for extent at @bytenr, @num_bytes.
+ *
+ * @bytenr:	bytenr of the extent
+ * @num_bytes:	length of the extent
+ * @exec_post:	whether to exec the post insert work
+ *		will init backref walk if set to true.
+ */
+static int qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
+			       u64 num_bytes, gfp_t gfp_flag, bool exec_post)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_qgroup_extent_record *record;
@@ -1607,11 +1615,27 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 		kfree(record);
 		return 0;
 	}
-	return btrfs_qgroup_trace_extent_post(fs_info, record);
+	if (exec_post)
+		return btrfs_qgroup_trace_extent_post(fs_info, record);
+	return 0;
 }
 
-int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
-				  struct extent_buffer *eb)
+int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
+			      u64 num_bytes, gfp_t gfp_flag)
+{
+	return qgroup_trace_extent(trans, bytenr, num_bytes, gfp_flag, true);
+}
+
+/*
+ * Insert qgroup extent record for leaf and all file extents in it
+ *
+ * @trans:	transaction handle
+ * @eb:	extent buffer of the leaf
+ * @exec_post:	whether to exec the post insert work
+ *		will init backref walk if set to true.
+ */
+static int qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
+				   struct extent_buffer *eb, bool exec_post)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int nr = btrfs_header_nritems(eb);
@@ -1643,8 +1667,8 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
 
 		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
 
-		ret = btrfs_qgroup_trace_extent(trans, bytenr, num_bytes,
-						GFP_NOFS);
+		ret = qgroup_trace_extent(trans, bytenr, num_bytes, GFP_NOFS,
+					  exec_post);
 		if (ret)
 			return ret;
 	}
@@ -1652,6 +1676,12 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
+				  struct extent_buffer *eb)
+{
+	return qgroup_trace_leaf_items(trans, eb, true);
+}
+
 /*
  * Walk up the tree from the bottom, freeing leaves and any interior
  * nodes which have had all slots visited. If a node (leaf or
@@ -2558,10 +2588,11 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 
 		if (!ret) {
 			/*
-			 * Old roots should be searched when inserting qgroup
-			 * extent record
+			 * Most record->old_roots should have been populated at
+			 * insert time. Although we still allow some records
+			 * without old_roots populated.
 			 */
-			if (WARN_ON(!record->old_roots)) {
+			if (!record->old_roots) {
 				/* Search commit root to find old_roots */
 				ret = btrfs_find_all_roots(NULL, fs_info,
 						record->bytenr, 0,
-- 
2.19.1



* [PATCH v2 2/6] btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()
  2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
@ 2018-11-08  5:49 ` Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap() Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-08  5:49 UTC
  To: linux-btrfs

Relocation code will drop btrfs_root::reloc_root as soon as
merge_reloc_root() finishes.

However later qgroup code will need to access btrfs_root::reloc_root
after merge_reloc_root() for delayed subtree rescan.

So alter the timing of resetting btrfs_root::reloc_root so that it
happens after the transaction commit.

With this patch, we introduce a new btrfs_root::state bit,
BTRFS_ROOT_DEAD_RELOC_TREE, to inform some users of
btrfs_root::reloc_root that although btrfs_root::reloc_root is still
non-NULL, it is no longer in use.

The lifespan of btrfs_root::reloc_root will become:
          Old behavior            |              New
------------------------------------------------------------------------
btrfs_init_reloc_root()      ---  | btrfs_init_reloc_root()      ---
  set reloc_root              |   |   set reloc_root              |
                              |   |                               |
                              |   |                               |
merge_reloc_root()            |   | merge_reloc_root()            |
|- btrfs_update_reloc_root() ---  | |- btrfs_update_reloc_root() -+-
     clear btrfs_root::reloc_root |      set ROOT_DEAD_RELOC_TREE |
                                  |      record root into dirty   |
                                  |      roots rbtree             |
                                  |                               |
                                  | relocate_block_group() or     |
                                  | btrfs_recover_relocation()    |
                                  | | After transaction commit    |
                                  | |- clean_dirty_root()        ---
                                  |     clear btrfs_root::reloc_root

While ROOT_DEAD_RELOC_TREE is set, the only user of
btrfs_root::reloc_root should be qgroup.

To cooperate with this, the btrfs_drop_snapshot() call on the reloc tree
is also delayed to clean_dirty_root().
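
The new lifespan can be modeled with a small userspace sketch. It is not
the kernel code: the types, the bit value and the helper names
(reloc_root_usable(), finish_merge()) are hypothetical, and dropping the
reloc tree is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ROOT_DEAD_RELOC_TREE (1UL << 0)  /* hypothetical bit value */

struct reloc_tree {
	int dropped;
};

struct file_root {
	unsigned long state;
	struct reloc_tree *reloc_root;
};

/* Most users must treat a dead reloc root as if it were already gone. */
static bool reloc_root_usable(const struct file_root *root)
{
	return root->reloc_root && !(root->state & ROOT_DEAD_RELOC_TREE);
}

/* After merging: keep the pointer around for qgroup, but mark it dead. */
static void finish_merge(struct file_root *root)
{
	root->state |= ROOT_DEAD_RELOC_TREE;
}

/* After transaction commit: clear the bit, reset the pointer, drop tree. */
static void clean_dirty_root(struct file_root *root)
{
	struct reloc_tree *reloc = root->reloc_root;

	root->state &= ~ROOT_DEAD_RELOC_TREE;
	root->reloc_root = NULL;
	if (reloc)
		reloc->dropped = 1;  /* models btrfs_drop_snapshot() */
}
```

Between finish_merge() and clean_dirty_root() the pointer stays valid,
which is the window in which qgroup can still reach the reloc tree.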

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h      |   1 +
 fs/btrfs/relocation.c | 125 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 109 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 80953528572d..2c33506bdaaa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1149,6 +1149,7 @@ struct btrfs_subvolume_writers {
 #define BTRFS_ROOT_FORCE_COW		6
 #define BTRFS_ROOT_MULTI_LOG_TASKS	7
 #define BTRFS_ROOT_DIRTY		8
+#define BTRFS_ROOT_DEAD_RELOC_TREE	9
 
 /*
  * in ram representation of the tree.  extent_root is used for all allocations
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 924116f654a1..6f1f11b5d8f6 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -143,6 +143,20 @@ struct file_extent_cluster {
 	unsigned int nr;
 };
 
+/*
+ * Helper structure to keep record of a file tree whose reloc
+ * root needs to be cleaned up.
+ *
+ * Since reloc_control is used less frequently than btrfs_root, this should
+ * save us from adding another structure to btrfs_root.
+ */
+struct dirty_source_root {
+	struct rb_node node;
+
+	/* Root must be file tree */
+	struct btrfs_root *root;
+};
+
 struct reloc_control {
 	/* block group to relocate */
 	struct btrfs_block_group_cache *block_group;
@@ -172,6 +186,9 @@ struct reloc_control {
 	u64 search_start;
 	u64 extents_found;
 
+	/* dirty source roots, whose reloc root needs to be cleaned up */
+	struct rb_root dirty_roots;
+
 	unsigned int stage:8;
 	unsigned int create_reloc_tree:1;
 	unsigned int merge_reloc_tree:1;
@@ -1467,15 +1484,17 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 	struct btrfs_root_item *root_item;
 	int ret;
 
-	if (!root->reloc_root)
+	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
+	    !root->reloc_root)
 		goto out;
 
 	reloc_root = root->reloc_root;
 	root_item = &reloc_root->root_item;
 
+	/* root->reloc_root will stay until current relocation finished */
 	if (fs_info->reloc_ctl->merge_reloc_tree &&
 	    btrfs_root_refs(root_item) == 0) {
-		root->reloc_root = NULL;
+		set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
 		__del_reloc_root(reloc_root);
 	}
 
@@ -2120,6 +2139,84 @@ static int find_next_key(struct btrfs_path *path, int level,
 	return 1;
 }
 
+/*
+ * Helper to insert current root into reloc_control::dirty_roots
+ */
+static int insert_dirty_root(struct btrfs_trans_handle *trans,
+			     struct reloc_control *rc,
+			     struct btrfs_root *root)
+{
+	struct rb_node **p = &rc->dirty_roots.rb_node;
+	struct rb_node *parent = NULL;
+	struct dirty_source_root *entry;
+	struct btrfs_root *reloc_root = root->reloc_root;
+	struct btrfs_root_item *reloc_root_item;
+	u64 root_objectid = root->root_key.objectid;
+
+	/* @root must be a file tree root */
+	ASSERT(root_objectid != BTRFS_TREE_RELOC_OBJECTID);
+	ASSERT(reloc_root);
+
+	reloc_root_item = &reloc_root->root_item;
+	memset(&reloc_root_item->drop_progress, 0,
+		sizeof(reloc_root_item->drop_progress));
+	reloc_root_item->drop_level = 0;
+	btrfs_set_root_refs(reloc_root_item, 0);
+	btrfs_update_reloc_root(trans, root);
+
+	/* We're in relocation path, not writeback path, GFP_KERNEL is OK */
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+	btrfs_grab_fs_root(root);
+	entry->root = root;
+	while (*p) {
+		struct dirty_source_root *cur_entry;
+
+		parent = *p;
+		cur_entry = rb_entry(parent, struct dirty_source_root, node);
+
+		if (root_objectid < cur_entry->root->root_key.objectid)
+			p = &(*p)->rb_left;
+		else if (root_objectid > cur_entry->root->root_key.objectid)
+			p = &(*p)->rb_right;
+		else {
+			/* This root is already dirtied */
+			btrfs_put_fs_root(root);
+			kfree(entry);
+			return 0;
+		}
+	}
+	rb_link_node(&entry->node, parent, p);
+	rb_insert_color(&entry->node, &rc->dirty_roots);
+	return 0;
+}
+
+static int clean_dirty_root(struct reloc_control *rc)
+{
+	struct dirty_source_root *entry;
+	struct dirty_source_root *next;
+	int err = 0;
+	int ret;
+
+	rbtree_postorder_for_each_entry_safe(entry, next, &rc->dirty_roots,
+					     node) {
+		struct btrfs_root *reloc_root = entry->root->reloc_root;
+
+		clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &entry->root->state);
+		entry->root->reloc_root = NULL;
+		if (reloc_root) {
+			ret = btrfs_drop_snapshot(reloc_root, NULL, 0, 1);
+			if (ret < 0 && !err)
+				err = ret;
+		}
+		btrfs_put_fs_root(entry->root);
+		kfree(entry);
+	}
+	rc->dirty_roots = RB_ROOT;
+	return err;
+}
+
 /*
  * merge the relocated tree blocks in reloc tree with corresponding
  * fs tree.
@@ -2259,13 +2356,8 @@ static noinline_for_stack int merge_reloc_root(struct reloc_control *rc,
 out:
 	btrfs_free_path(path);
 
-	if (err == 0) {
-		memset(&root_item->drop_progress, 0,
-		       sizeof(root_item->drop_progress));
-		root_item->drop_level = 0;
-		btrfs_set_root_refs(root_item, 0);
-		btrfs_update_reloc_root(trans, root);
-	}
+	if (err == 0)
+		err = insert_dirty_root(trans, rc, root);
 
 	if (trans)
 		btrfs_end_transaction_throttle(trans);
@@ -2410,14 +2502,6 @@ void merge_reloc_roots(struct reloc_control *rc)
 		} else {
 			list_del_init(&reloc_root->root_list);
 		}
-
-		ret = btrfs_drop_snapshot(reloc_root, rc->block_rsv, 0, 1);
-		if (ret < 0) {
-			if (list_empty(&reloc_root->root_list))
-				list_add_tail(&reloc_root->root_list,
-					      &reloc_roots);
-			goto out;
-		}
 	}
 
 	if (found) {
@@ -4078,6 +4162,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 		goto out_free;
 	}
 	btrfs_commit_transaction(trans);
+	ret = clean_dirty_root(rc);
+	if (ret < 0 && !err)
+		err = ret;
 out_free:
 	btrfs_free_block_rsv(fs_info, rc->block_rsv);
 	btrfs_free_path(path);
@@ -4481,6 +4568,10 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 		goto out_free;
 	}
 	err = btrfs_commit_transaction(trans);
+
+	ret = clean_dirty_root(rc);
+	if (ret < 0 && !err)
+		err = ret;
 out_free:
 	kfree(rc);
 out:
-- 
2.19.1



* [PATCH v2 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
  2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 2/6] btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots() Qu Wenruo
@ 2018-11-08  5:49 ` Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-08  5:49 UTC
  To: linux-btrfs

Refactor btrfs_qgroup_trace_subtree_swap() into
qgroup_trace_subtree_swap(), which only needs two extent buffers and a
few bools to control the behavior.

Also, allow the functions it depends on to accept an @exec_post
parameter to determine whether we need to trigger a backref walk.

This provides the basis for later delayed subtree scan work.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 104 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 72 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6c674ac29b90..c50c369d5f16 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1793,7 +1793,7 @@ static int qgroup_trace_extent_swap(struct btrfs_trans_handle* trans,
 				    struct extent_buffer *src_eb,
 				    struct btrfs_path *dst_path,
 				    int dst_level, int root_level,
-				    bool trace_leaf)
+				    bool trace_leaf, bool exec_post)
 {
 	struct btrfs_key key;
 	struct btrfs_path *src_path;
@@ -1884,22 +1884,23 @@ static int qgroup_trace_extent_swap(struct btrfs_trans_handle* trans,
 	 * Now both @dst_path and @src_path have been populated, record the tree
 	 * blocks for qgroup accounting.
 	 */
-	ret = btrfs_qgroup_trace_extent(trans, src_path->nodes[dst_level]->start,
-			nodesize, GFP_NOFS);
+	ret = qgroup_trace_extent(trans, src_path->nodes[dst_level]->start,
+				  nodesize, GFP_NOFS, exec_post);
 	if (ret < 0)
 		goto out;
-	ret = btrfs_qgroup_trace_extent(trans,
-			dst_path->nodes[dst_level]->start,
-			nodesize, GFP_NOFS);
+	ret = qgroup_trace_extent(trans, dst_path->nodes[dst_level]->start,
+				  nodesize, GFP_NOFS, exec_post);
 	if (ret < 0)
 		goto out;
 
 	/* Record leaf file extents */
 	if (dst_level == 0 && trace_leaf) {
-		ret = btrfs_qgroup_trace_leaf_items(trans, src_path->nodes[0]);
+		ret = qgroup_trace_leaf_items(trans, src_path->nodes[0],
+					      exec_post);
 		if (ret < 0)
 			goto out;
-		ret = btrfs_qgroup_trace_leaf_items(trans, dst_path->nodes[0]);
+		ret = qgroup_trace_leaf_items(trans, dst_path->nodes[0],
+					      exec_post);
 	}
 out:
 	btrfs_free_path(src_path);
@@ -1932,7 +1933,8 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 					   struct extent_buffer *src_eb,
 					   struct btrfs_path *dst_path,
 					   int cur_level, int root_level,
-					   u64 last_snapshot, bool trace_leaf)
+					   u64 last_snapshot, bool trace_leaf,
+					   bool exec_post)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct extent_buffer *eb;
@@ -2004,7 +2006,7 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 
 	/* Now record this tree block and its counter part for qgroups */
 	ret = qgroup_trace_extent_swap(trans, src_eb, dst_path, cur_level,
-				       root_level, trace_leaf);
+				       root_level, trace_leaf, exec_post);
 	if (ret < 0)
 		goto cleanup;
 
@@ -2021,7 +2023,7 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 			/* Recursive call (at most 7 times) */
 			ret = qgroup_trace_new_subtree_blocks(trans, src_eb,
 					dst_path, cur_level - 1, root_level,
-					last_snapshot, trace_leaf);
+					last_snapshot, trace_leaf, exec_post);
 			if (ret < 0)
 				goto cleanup;
 		}
@@ -2041,6 +2043,62 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
 	return ret;
 }
 
+static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
+				struct extent_buffer *src_eb,
+				struct extent_buffer *dst_eb,
+				u64 last_snapshot, bool trace_leaf,
+				bool exec_post)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_path *dst_path = NULL;
+	int level;
+	int ret;
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+		return 0;
+
+	/* Wrong parameter order */
+	if (btrfs_header_generation(src_eb) > btrfs_header_generation(dst_eb)) {
+		btrfs_err_rl(fs_info,
+		"%s: bad parameter order, src_gen=%llu dst_gen=%llu", __func__,
+			     btrfs_header_generation(src_eb),
+			     btrfs_header_generation(dst_eb));
+		return -EUCLEAN;
+	}
+
+	if (!extent_buffer_uptodate(src_eb) ||
+	    !extent_buffer_uptodate(dst_eb)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	level = btrfs_header_level(dst_eb);
+	dst_path = btrfs_alloc_path();
+	if (!dst_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	/* For dst_path */
+	extent_buffer_get(dst_eb);
+	dst_path->nodes[level] = dst_eb;
+	dst_path->slots[level] = 0;
+	dst_path->locks[level] = 0;
+
+	/* Do the generation aware breadth-first search */
+	ret = qgroup_trace_new_subtree_blocks(trans, src_eb, dst_path, level,
+					      level, last_snapshot, trace_leaf,
+					      exec_post);
+	if (ret < 0)
+		goto out;
+	ret = 0;
+
+out:
+	btrfs_free_path(dst_path);
+	if (ret < 0)
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	return ret;
+}
+
 /*
  * Inform qgroup to trace subtree swap used in balance.
  *
@@ -2066,14 +2124,12 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 				u64 last_snapshot)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_path *dst_path = NULL;
 	struct btrfs_key first_key;
 	struct extent_buffer *src_eb = NULL;
 	struct extent_buffer *dst_eb = NULL;
 	bool trace_leaf = false;
 	u64 child_gen;
 	u64 child_bytenr;
-	int level;
 	int ret;
 
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
@@ -2124,22 +2180,9 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
-	level = btrfs_header_level(dst_eb);
-	dst_path = btrfs_alloc_path();
-	if (!dst_path) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	/* For dst_path */
-	extent_buffer_get(dst_eb);
-	dst_path->nodes[level] = dst_eb;
-	dst_path->slots[level] = 0;
-	dst_path->locks[level] = 0;
-
-	/* Do the generation-aware breadth-first search */
-	ret = qgroup_trace_new_subtree_blocks(trans, src_eb, dst_path, level,
-					      level, last_snapshot, trace_leaf);
+	/* Do the generation aware breadth-first search */
+	ret = qgroup_trace_subtree_swap(trans, src_eb, dst_eb, last_snapshot,
+					trace_leaf, true);
 	if (ret < 0)
 		goto out;
 	ret = 0;
@@ -2147,9 +2190,6 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 out:
 	free_extent_buffer(src_eb);
 	free_extent_buffer(dst_eb);
-	btrfs_free_path(dst_path);
-	if (ret < 0)
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 	return ret;
 }
 
-- 
2.19.1



* [PATCH v2 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure
  2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (2 preceding siblings ...)
  2018-11-08  5:49 ` [PATCH v2 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap() Qu Wenruo
@ 2018-11-08  5:49 ` Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 5/6] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-08  5:49 UTC
  To: linux-btrfs

To allow delayed subtree swap rescan, btrfs needs to record per-root
info about which tree blocks get swapped.

So this patch introduces per-root btrfs_qgroup_swapped_blocks structure,
which records which tree blocks get swapped.

The designed workflow will be:

1) Record the subtree root blocks that get swapped.

   During subtree swap:
   O = Old tree blocks
   N = New tree blocks
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         NA     OB                          OA      OB
       /  |     |  \                      /  |      |  \
     NC  ND     OE  OF                   OC  OD     OE  OF

  In this case, NA and OA are going to be swapped, so record (NA, OA)
  into file tree X.

2) After subtree swap.
         reloc tree                         file tree X
            Root                               Root
           /    \                             /    \
         OA     OB                          NA      OB
       /  |     |  \                      /  |      |  \
     OC  OD     OE  OF                   NC  ND     OE  OF

3a) CoW happens for OB
    If we are going to CoW tree block OB, we check OB's bytenr against
    tree X's swapped_blocks structure.
    It doesn't match any record, so nothing happens.

3b) CoW happens for NA
    Check NA's bytenr against tree X's swapped_blocks, and get a hit.
    Then we do a subtree scan on both subtrees OA and NA.
    This results in 6 tree blocks being scanned (OA, OC, OD, NA, NC, ND).

    Then no matter what we do to file tree X, qgroup numbers will
    still be correct.
    Then NA's record gets removed from X's swapped_blocks.

4)  Transaction commit
    Any records left in X's swapped_blocks get removed. Since the swapped
    subtrees were not modified, there is no need to trigger a heavy
    qgroup subtree rescan for them.

This introduces 128 bytes of overhead for each btrfs_root, even if
qgroup is not enabled.
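
To illustrate the shape of the per-root structure (one search tree per
tree level, keyed by the bytenr of the block now in the file tree), here
is a userspace sketch. It is an assumption-laden model, not the patch
itself: the names are hypothetical, a plain unbalanced binary search
tree replaces the kernel rb-tree, and there is no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LEVEL 8  /* mirrors BTRFS_MAX_LEVEL */

struct swapped_block {
	uint64_t file_bytenr;   /* key: block in the file tree after swap */
	uint64_t reloc_bytenr;  /* its counterpart in the reloc tree */
	struct swapped_block *left, *right;
};

struct swapped_blocks {
	struct swapped_block *blocks[MAX_LEVEL];  /* one tree per level */
};

/*
 * Insert a record at @level, keyed by @file_bytenr.
 * Returns 0 on success or for an identical duplicate, -EEXIST if the key
 * exists with a different counterpart, -ENOMEM on allocation failure.
 */
static int add_swapped_block(struct swapped_blocks *sb, int level,
			     uint64_t file_bytenr, uint64_t reloc_bytenr)
{
	struct swapped_block **p;
	struct swapped_block *block;

	assert(level >= 0 && level < MAX_LEVEL);
	p = &sb->blocks[level];
	while (*p) {
		if (file_bytenr < (*p)->file_bytenr)
			p = &(*p)->left;
		else if (file_bytenr > (*p)->file_bytenr)
			p = &(*p)->right;
		else
			return (*p)->reloc_bytenr == reloc_bytenr ? 0 : -EEXIST;
	}
	block = calloc(1, sizeof(*block));
	if (!block)
		return -ENOMEM;
	block->file_bytenr = file_bytenr;
	block->reloc_bytenr = reloc_bytenr;
	*p = block;
	return 0;
}

/* Free one tree recursively. */
static void free_tree(struct swapped_block *node)
{
	if (!node)
		return;
	free_tree(node->left);
	free_tree(node->right);
	free(node);
}

/* Models the cleanup done at transaction commit: drop every record. */
static void clean_swapped_blocks(struct swapped_blocks *sb)
{
	for (int i = 0; i < MAX_LEVEL; i++) {
		free_tree(sb->blocks[i]);
		sb->blocks[i] = NULL;
	}
}
```

Keeping one tree per level means a lookup at CoW time only has to search
records for blocks at the same level as the block being CoWed.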

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h       |  13 +++++
 fs/btrfs/disk-io.c     |   1 +
 fs/btrfs/qgroup.c      | 130 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h      |  99 +++++++++++++++++++++++++++++++
 fs/btrfs/relocation.c  |   7 +++
 fs/btrfs/transaction.c |   1 +
 6 files changed, 251 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2c33506bdaaa..e32fcf211c8a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1151,6 +1151,17 @@ struct btrfs_subvolume_writers {
 #define BTRFS_ROOT_DIRTY		8
 #define BTRFS_ROOT_DEAD_RELOC_TREE	9
 
+/*
+ * Record swapped tree blocks of a file/subvolume tree for delayed subtree
+ * trace code. For detail check comment in fs/btrfs/qgroup.c.
+ */
+struct btrfs_qgroup_swapped_blocks {
+	spinlock_t lock;
+	struct rb_root blocks[BTRFS_MAX_LEVEL];
+	/* True if any of the above blocks[] is not RB_EMPTY_ROOT() */
+	bool swapped;
+};
+
 /*
  * in ram representation of the tree.  extent_root is used for all allocations
  * and for the extent tree extent_root root.
@@ -1275,6 +1286,8 @@ struct btrfs_root {
 	u64 qgroup_meta_rsv_pertrans;
 	u64 qgroup_meta_rsv_prealloc;
 
+	struct btrfs_qgroup_swapped_blocks swapped_blocks;
+
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 	u64 alloc_bytenr;
 #endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b0ab41da91d1..bd37c3ee2fa9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1204,6 +1204,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	root->anon_dev = 0;
 
 	spin_lock_init(&root->root_item_lock);
+	btrfs_qgroup_init_swapped_blocks(&root->swapped_blocks);
 }
 
 static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c50c369d5f16..461895af512b 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3852,3 +3852,133 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
 	}
 	extent_changeset_release(&changeset);
 }
+
+/*
+ * Delete all swapped block records of @root.
+ * Every record here means we skipped a full subtree scan for qgroup.
+ *
+ * Called when committing a transaction.
+ */
+void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root)
+{
+	struct btrfs_qgroup_swapped_blocks *swapped_blocks;
+	int i;
+
+	swapped_blocks = &root->swapped_blocks;
+
+	spin_lock(&swapped_blocks->lock);
+	if (!swapped_blocks->swapped)
+		goto out;
+	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+		struct rb_root *cur_root = &swapped_blocks->blocks[i];
+		struct btrfs_qgroup_swapped_block *entry;
+		struct btrfs_qgroup_swapped_block *next;
+
+		rbtree_postorder_for_each_entry_safe(entry, next, cur_root,
+						     node)
+			kfree(entry);
+		swapped_blocks->blocks[i] = RB_ROOT;
+	}
+	swapped_blocks->swapped = false;
+out:
+	spin_unlock(&swapped_blocks->lock);
+}
+
+/*
+ * Add a subtree root record into @file_root.
+ *
+ * @file_root:		tree root of the file tree that gets swapped
+ * @bg:			block group under balance
+ * @file_parent/slot:	pointer to the subtree root in file tree
+ * @reloc_parent/slot:	pointer to the subtree root in reloc tree
+ *			BOTH POINTERS ARE BEFORE TREE SWAP
+ * @last_snapshot:	last snapshot generation of the file tree
+ */
+int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
+		struct btrfs_root *file_root,
+		struct btrfs_block_group_cache *bg,
+		struct extent_buffer *file_parent, int file_slot,
+		struct extent_buffer *reloc_parent, int reloc_slot,
+		u64 last_snapshot)
+{
+	int level = btrfs_header_level(file_parent) - 1;
+	struct btrfs_qgroup_swapped_blocks *blocks = &file_root->swapped_blocks;
+	struct btrfs_fs_info *fs_info = file_root->fs_info;
+	struct btrfs_qgroup_swapped_block *block;
+	struct rb_node **p = &blocks->blocks[level].rb_node;
+	struct rb_node *parent = NULL;
+	int ret = 0;
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+		return 0;
+
+	if (btrfs_node_ptr_generation(file_parent, file_slot) >
+		btrfs_node_ptr_generation(reloc_parent, reloc_slot)) {
+		btrfs_err_rl(fs_info,
+		"%s: bad parameter order, file_gen=%llu reloc_gen=%llu",
+			__func__,
+			btrfs_node_ptr_generation(file_parent, file_slot),
+			btrfs_node_ptr_generation(reloc_parent, reloc_slot));
+		return -EUCLEAN;
+	}
+
+	block = kmalloc(sizeof(*block), GFP_NOFS);
+	if (!block) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * @reloc_parent/slot is still *BEFORE* swap, while @block is going to
+	 * record the bytenr *AFTER* swap, so we do the swap here.
+	 */
+	block->file_bytenr = btrfs_node_blockptr(reloc_parent, reloc_slot);
+	block->file_generation = btrfs_node_ptr_generation(reloc_parent,
+							   reloc_slot);
+	block->reloc_bytenr = btrfs_node_blockptr(file_parent, file_slot);
+	block->reloc_generation = btrfs_node_ptr_generation(file_parent,
+							    file_slot);
+	block->last_snapshot = last_snapshot;
+	block->level = level;
+	if (bg->flags & BTRFS_BLOCK_GROUP_DATA)
+		block->trace_leaf = true;
+	else
+		block->trace_leaf = false;
+	btrfs_node_key_to_cpu(reloc_parent, &block->first_key, reloc_slot);
+
+	/* Insert @block into @blocks */
+	spin_lock(&blocks->lock);
+	while (*p) {
+		struct btrfs_qgroup_swapped_block *entry;
+
+		parent = *p;
+		entry = rb_entry(parent, struct btrfs_qgroup_swapped_block,
+				 node);
+
+		if (entry->file_bytenr < block->file_bytenr)
+			p = &(*p)->rb_left;
+		else if (entry->file_bytenr > block->file_bytenr)
+			p = &(*p)->rb_right;
+		else {
+			if (entry->file_generation != block->file_generation ||
+			    entry->reloc_bytenr != block->reloc_bytenr ||
+			    entry->reloc_generation !=
+			    block->reloc_generation) {
+				WARN_ON_ONCE(1);
+				ret = -EEXIST;
+			}
+			kfree(block);
+			goto out_unlock;
+		}
+	}
+	rb_link_node(&block->node, parent, p);
+	rb_insert_color(&block->node, &blocks->blocks[level]);
+	blocks->swapped = true;
+out_unlock:
+	spin_unlock(&blocks->lock);
+out:
+	if (ret < 0)
+		fs_info->qgroup_flags |=
+			BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	return ret;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index d8f78f5ab854..242e41251626 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -6,6 +6,8 @@
 #ifndef BTRFS_QGROUP_H
 #define BTRFS_QGROUP_H
 
+#include <linux/spinlock.h>
+#include <linux/rbtree.h>
 #include "ulist.h"
 #include "delayed-ref.h"
 
@@ -37,6 +39,66 @@
  *    Normally at qgroup rescan and transaction commit time.
  */
 
+/*
+ * Special performance hack for balance.
+ *
+ * For balance, we need to swap subtrees of the file and reloc trees.
+ * In theory, we need to trace all subtree blocks of both file and reloc tree,
+ * since their owner has changed during such swap.
+ *
+ * However, since balance has ensured that both subtrees contain the
+ * same contents and have the same tree structure, such a swap won't cause
+ * qgroup numbers to change.
+ *
+ * But there is a race window between subtree swap and transaction commit.
+ * During that window, if we increase/decrease tree level or merge/split tree
+ * blocks, we still need to trace the original subtrees.
+ *
+ * So for balance, we use a delayed subtree trace, whose workflow is:
+ *
+ * 1) Record the subtree root block that gets swapped.
+ *
+ *    During subtree swap:
+ *    O = Old tree blocks
+ *    N = New tree blocks
+ *          reloc tree                         file tree X
+ *             Root                               Root
+ *            /    \                             /    \
+ *          NA     OB                          OA      OB
+ *        /  |     |  \                      /  |      |  \
+ *      NC  ND     OE  OF                   OC  OD     OE  OF
+ *
+ *   In this case, NA and OA are going to be swapped, so record (NA, OA) into
+ *   file tree X.
+ *
+ * 2) After subtree swap.
+ *          reloc tree                         file tree X
+ *             Root                               Root
+ *            /    \                             /    \
+ *          OA     OB                          NA      OB
+ *        /  |     |  \                      /  |      |  \
+ *      OC  OD     OE  OF                   NC  ND     OE  OF
+ *
+ * 3a) CoW happens for OB
+ *     If we are going to CoW tree block OB, we check OB's bytenr against
+ *     tree X's swapped_blocks structure.
+ *     It doesn't match any record, so nothing happens.
+ *
+ * 3b) CoW happens for NA
+ *     Check NA's bytenr against tree X's swapped_blocks, and get a hit.
+ *     Then we do subtree scan on both subtree OA and NA.
+ *     This results in 6 tree blocks being scanned (OA, OC, OD, NA, NC, ND).
+ *
+ *     Then no matter what we do to file tree X, qgroup numbers will
+ *     still be correct.
+ *     Then NA's record gets removed from X's swapped_blocks.
+ *
+ * 4)  Transaction commit
+ *     Any record left in X's swapped_blocks gets removed. Since there was no
+ *     modification to those swapped subtrees, there is no need to trigger a
+ *     heavy qgroup subtree rescan for them.
+ */
+
 /*
  * Record a dirty extent, and info qgroup to update quota on it
  * TODO: Use kmem cache to alloc it.
@@ -48,6 +110,24 @@ struct btrfs_qgroup_extent_record {
 	struct ulist *old_roots;
 };
 
+struct btrfs_qgroup_swapped_block {
+	struct rb_node node;
+
+	bool trace_leaf;
+	int level;
+
+	/* bytenr/generation of the tree block in file tree after swap */
+	u64 file_bytenr;
+	u64 file_generation;
+
+	/* bytenr/generation of the tree block in reloc tree after swap */
+	u64 reloc_bytenr;
+	u64 reloc_generation;
+
+	u64 last_snapshot;
+	struct btrfs_key first_key;
+};
+
 /*
  * Qgroup reservation types:
  *
@@ -325,4 +405,23 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes);
 
 void btrfs_qgroup_check_reserved_leak(struct inode *inode);
 
+/* btrfs_qgroup_swapped_blocks related functions */
+static inline void btrfs_qgroup_init_swapped_blocks(
+		struct btrfs_qgroup_swapped_blocks *swapped_blocks)
+{
+	int i;
+
+	spin_lock_init(&swapped_blocks->lock);
+	for (i = 0; i < BTRFS_MAX_LEVEL; i++)
+		swapped_blocks->blocks[i] = RB_ROOT;
+	swapped_blocks->swapped = false;
+}
+
+void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root);
+int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
+		struct btrfs_root *file_root,
+		struct btrfs_block_group_cache *bg,
+		struct extent_buffer *file_parent, int file_slot,
+		struct extent_buffer *reloc_parent, int reloc_slot,
+		u64 last_snapshot);
 #endif
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 6f1f11b5d8f6..9b78c8fff40f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1913,6 +1913,13 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 		if (ret < 0)
 			break;
 
+		btrfs_node_key_to_cpu(parent, &first_key, slot);
+		ret = btrfs_qgroup_add_swapped_blocks(trans, dest,
+				rc->block_group, parent, slot,
+				path->nodes[level], path->slots[level],
+				last_snapshot);
+		if (ret < 0)
+			break;
 		/*
 		 * swap blocks in fs tree and reloc tree.
 		 */
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d1eeef9ec5da..7a8b6a60ab18 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -122,6 +122,7 @@ static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 		if (is_fstree(root->root_key.objectid))
 			btrfs_unpin_free_ino(root);
 		clear_btree_io_tree(&root->dirty_log_pages);
+		btrfs_qgroup_clean_swapped_blocks(root);
 	}
 
 	/* We can free old roots now. */
-- 
2.19.1
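
For readers who want to experiment with the record keeping outside the
kernel, the following is a minimal user-space sketch of the per-level
lookup structure, not the patch itself. It assumes a plain unbalanced
binary search tree instead of the kernel rbtree, drops the spinlock, and
keeps only two of the record fields. The names swapped_add(),
swapped_find() and swapped_clean() are hypothetical, and the sketch uses
the conventional "smaller key goes left" ordering rather than the
ordering used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LEVEL 8 /* assumed to mirror BTRFS_MAX_LEVEL */

/* Hypothetical stand-in for struct btrfs_qgroup_swapped_block. */
struct swapped_block {
	struct swapped_block *left, *right;
	uint64_t file_bytenr;  /* search key */
	uint64_t reloc_bytenr; /* payload checked on duplicate insert */
};

/* Hypothetical stand-in for struct btrfs_qgroup_swapped_blocks (no lock). */
struct swapped_blocks {
	bool swapped;
	struct swapped_block *roots[MAX_LEVEL];
};

/* Insert a record; an identical duplicate is accepted, a mismatch is not. */
static int swapped_add(struct swapped_blocks *sb, int level,
		       uint64_t file_bytenr, uint64_t reloc_bytenr)
{
	struct swapped_block **p = &sb->roots[level];
	struct swapped_block *block;

	while (*p) {
		if (file_bytenr < (*p)->file_bytenr)
			p = &(*p)->left;
		else if (file_bytenr > (*p)->file_bytenr)
			p = &(*p)->right;
		else
			return (*p)->reloc_bytenr == reloc_bytenr ? 0 : -EEXIST;
	}
	block = calloc(1, sizeof(*block));
	if (!block)
		return -ENOMEM;
	block->file_bytenr = file_bytenr;
	block->reloc_bytenr = reloc_bytenr;
	*p = block;
	sb->swapped = true;
	return 0;
}

/* Look up a record by the bytenr of the block now living in the file tree. */
static struct swapped_block *swapped_find(struct swapped_blocks *sb, int level,
					  uint64_t file_bytenr)
{
	struct swapped_block *n = sb->roots[level];

	while (n && n->file_bytenr != file_bytenr)
		n = file_bytenr < n->file_bytenr ? n->left : n->right;
	return n;
}

/* Post-order free, so children are released before their parent. */
static void free_subtree(struct swapped_block *n)
{
	if (!n)
		return;
	free_subtree(n->left);
	free_subtree(n->right);
	free(n);
}

/* Drop every record on every level, as done at transaction commit. */
static void swapped_clean(struct swapped_blocks *sb)
{
	for (int i = 0; i < MAX_LEVEL; i++) {
		free_subtree(sb->roots[i]);
		sb->roots[i] = NULL;
	}
	sb->swapped = false;
}
```

Keeping one tree per level means a lookup only has to compare against
subtree roots that sit at the same height as the block being CoWed.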



* [PATCH v2 5/6] btrfs: qgroup: Use delayed subtree rescan for balance
  2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (3 preceding siblings ...)
  2018-11-08  5:49 ` [PATCH v2 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
@ 2018-11-08  5:49 ` Qu Wenruo
  2018-11-08  5:49 ` [PATCH v2 6/6] btrfs: qgroup: Cleanup old subtree swap code Qu Wenruo
  2018-11-12 21:33 ` [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
  6 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-08  5:49 UTC (permalink / raw)
  To: linux-btrfs

Before this patch, the qgroup code traces the whole subtree of both the
file and reloc trees unconditionally.

This keeps qgroup numbers consistent, but it can cause a large number of
unnecessary extent traces, which adds a lot of overhead.

However, for the subtree swap done by balance, both subtrees contain the
same content and tree structure, so just swapping them won't change
qgroup numbers.

It's only the race window between subtree swap and transaction commit
that could cause qgroup numbers to change.

This patch delays the qgroup subtree scan until CoW happens for the
subtree root.

So if there are no other operations on the fs, balance won't cause extra
qgroup overhead (best case scenario).
And depending on the workload, most of the subtree scans can still be
avoided.

Only in the worst case scenario does it fall back to the old subtree swap
overhead (scanning all swapped subtrees).
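
The delayed workflow described above can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: records
live in a small fixed array instead of per-level rbtrees, each record
carries a made-up nr_blocks count for the size of one swapped subtree,
and all names (record_swap(), cow_block(), commit_transaction()) are
hypothetical. It only counts how many tree blocks would be traced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDS 16

/* One swapped subtree root, as remembered by the file tree. */
struct swap_record {
	uint64_t file_bytenr;   /* subtree root now in the file tree */
	unsigned int nr_blocks; /* tree blocks in one swapped subtree */
};

/* Hypothetical model of a file tree root with its pending records. */
struct file_root_model {
	struct swap_record records[MAX_RECORDS];
	size_t nr_records;
	unsigned int traced_blocks; /* total blocks traced so far */
};

/* On subtree swap, only remember the swapped subtree root. */
static int record_swap(struct file_root_model *root, uint64_t file_bytenr,
		       unsigned int nr_blocks)
{
	if (root->nr_records == MAX_RECORDS)
		return -1;
	root->records[root->nr_records++] =
		(struct swap_record){ file_bytenr, nr_blocks };
	return 0;
}

/* On CoW, trace both subtrees only if the block is a recorded root. */
static void cow_block(struct file_root_model *root, uint64_t bytenr)
{
	for (size_t i = 0; i < root->nr_records; i++) {
		if (root->records[i].file_bytenr != bytenr)
			continue;
		/* Old and new subtree are both scanned, hence the factor 2. */
		root->traced_blocks += 2 * root->records[i].nr_blocks;
		/* The record is consumed; move the last one into its slot. */
		root->records[i] = root->records[--root->nr_records];
		return;
	}
}

/* At transaction commit, untouched records are dropped without a scan. */
static void commit_transaction(struct file_root_model *root)
{
	root->nr_records = 0;
}
```

With one recorded subtree of three blocks, a CoW on an unrelated block
traces nothing, a CoW on the recorded root traces six blocks once, and
the commit drops whatever is left at no tracing cost.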

[[Benchmark]]
Hardware:
	VM 4G vRAM, 8 vCPUs,
	disk is using 'unsafe' cache mode,
	backing device is SAMSUNG 850 evo SSD.
	Host has 16G ram.

Mkfs parameter:
	--nodesize 4K (To bump up tree size)

Initial subvolume contents:
	4G data copied from /usr and /lib.
	(With enough regular small files)

Snapshots:
	16 snapshots of the original subvolume.
	each snapshot has 3 random files modified.

balance parameter:
	-m

So the content should be pretty similar to a real world root fs layout.

And after file system population, there is no other activity, so it
should be the best case scenario.

                     | v4.20-rc1            | w/ patchset    | diff
-----------------------------------------------------------------------
relocated extents    | 22615                | 22457          | -0.7%
qgroup dirty extents | 163457               | 121606         | -25.6%
time (sys)           | 22.884s              | 18.842s        | -17.7%
time (real)          | 27.724s              | 22.884s        | -17.5%
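
The diff column can be rechecked from the two value columns with a small
helper. This is only an arithmetic sketch: the row values are copied from
the table above, while struct bench_row, percent_diff() and print_diffs()
are hypothetical names, and the diff is assumed to mean relative change
against the v4.20-rc1 column.

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark row: value before and after the patchset. */
struct bench_row {
	const char *name;
	double before;
	double after;
};

/* Values copied from the benchmark table (times in seconds). */
static const struct bench_row rows[] = {
	{ "relocated extents",    22615.0,  22457.0 },
	{ "qgroup dirty extents", 163457.0, 121606.0 },
	{ "time (sys)",           22.884,   18.842 },
	{ "time (real)",          27.724,   22.884 },
};

/* Relative change of @after versus @before, in percent. */
static double percent_diff(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Print every row together with its recomputed diff column. */
static void print_diffs(const struct bench_row *r, int nr_rows)
{
	for (int i = 0; i < nr_rows; i++)
		printf("%-22s| %+.1f%%\n", r[i].name,
		       percent_diff(r[i].before, r[i].after));
}
```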

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c      |  8 ++++
 fs/btrfs/qgroup.c     | 87 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h     |  2 +
 fs/btrfs/relocation.c | 14 +++----
 4 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 539901fb5165..f4b1f73ecb71 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -12,6 +12,7 @@
 #include "transaction.h"
 #include "print-tree.h"
 #include "locking.h"
+#include "qgroup.h"
 
 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
 		      *root, struct btrfs_path *path, int level);
@@ -1462,6 +1463,13 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 		btrfs_set_lock_blocking(parent);
 	btrfs_set_lock_blocking(buf);
 
+	/*
+	 * Before CoWing this block for later modification, check if it's
+	 * the subtree root and do the delayed subtree trace if needed.
+	 *
+	 * Also we don't care about the error, as it's handled internally.
+	 */
+	btrfs_qgroup_trace_subtree_after_cow(trans, root, buf);
 	ret = __btrfs_cow_block(trans, root, buf, parent,
 				 parent_slot, cow_ret, search_start, 0);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 461895af512b..58ba106abad9 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3982,3 +3982,90 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
 			BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 	return ret;
 }
+
+/*
+ * Check if the tree block is a subtree root, and if so do the needed
+ * delayed subtree trace for qgroup.
+ *
+ * This is called during btrfs_cow_block().
+ */
+int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
+		struct btrfs_root *root, struct extent_buffer *file_eb)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_qgroup_swapped_blocks *blocks = &root->swapped_blocks;
+	struct btrfs_qgroup_swapped_block *block;
+	struct extent_buffer *reloc_eb = NULL;
+	struct rb_node *n;
+	bool found = false;
+	bool swapped = false;
+	int level = btrfs_header_level(file_eb);
+	int ret = 0;
+	int i;
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+		return 0;
+	if (!is_fstree(root->root_key.objectid) || !root->reloc_root)
+		return 0;
+
+	spin_lock(&blocks->lock);
+	if (!blocks->swapped) {
+		spin_unlock(&blocks->lock);
+		goto out;
+	}
+	n = blocks->blocks[level].rb_node;
+
+	while (n) {
+		block = rb_entry(n, struct btrfs_qgroup_swapped_block, node);
+		if (block->file_bytenr < file_eb->start)
+			n = n->rb_left;
+		else if (block->file_bytenr > file_eb->start)
+			n = n->rb_right;
+		else {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		spin_unlock(&blocks->lock);
+		goto out;
+	}
+	/* Found one, remove it from @blocks first and update blocks->swapped */
+	rb_erase(&block->node, &blocks->blocks[level]);
+	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+		if (!RB_EMPTY_ROOT(&blocks->blocks[i])) {
+			swapped = true;
+			break;
+		}
+	}
+	blocks->swapped = swapped;
+	spin_unlock(&blocks->lock);
+
+	/* Read out reloc subtree root */
+	reloc_eb = read_tree_block(fs_info, block->reloc_bytenr,
+				   block->reloc_generation, block->level,
+				   &block->first_key);
+	if (IS_ERR(reloc_eb)) {
+		ret = PTR_ERR(reloc_eb);
+		reloc_eb = NULL;
+		goto free_out;
+	}
+	if (!extent_buffer_uptodate(reloc_eb)) {
+		ret = -EIO;
+		goto free_out;
+	}
+
+	ret = qgroup_trace_subtree_swap(trans, reloc_eb, file_eb,
+			block->last_snapshot, block->trace_leaf, false);
+free_out:
+	kfree(block);
+	free_extent_buffer(reloc_eb);
+out:
+	if (ret < 0) {
+		btrfs_err_rl(fs_info,
+			     "failed to account subtree at bytenr %llu: %d",
+			     file_eb->start, ret);
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	}
+	return ret;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 242e41251626..9f941421c405 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -424,4 +424,6 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
 		struct extent_buffer *file_parent, int file_slot,
 		struct extent_buffer *reloc_parent, int reloc_slot,
 		u64 last_snapshot);
+int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
+		struct btrfs_root *root, struct extent_buffer *eb);
 #endif
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 9b78c8fff40f..a5e9754243f4 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1904,16 +1904,12 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 		 *    If not traced, we will leak data numbers
 		 * 2) Fs subtree
 		 *    If not traced, we will double count old data
-		 *    and tree block numbers, if current trans doesn't free
-		 *    data reloc tree inode.
+		 *
+		 * We don't scan the subtree right now, but only record
+		 * the swapped tree blocks.
+		 * The real subtree rescan is delayed until we have new
+		 * CoW on the subtree root node before transaction commit.
 		 */
-		ret = btrfs_qgroup_trace_subtree_swap(trans, rc->block_group,
-				parent, slot, path->nodes[level],
-				path->slots[level], last_snapshot);
-		if (ret < 0)
-			break;
-
-		btrfs_node_key_to_cpu(parent, &first_key, slot);
 		ret = btrfs_qgroup_add_swapped_blocks(trans, dest,
 				rc->block_group, parent, slot,
 				path->nodes[level], path->slots[level],
-- 
2.19.1
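
One detail worth spelling out is how the swapped flag is meant to be
refreshed after a record is consumed. Reading the comment "update
blocks->swapped" literally, the flag should stay set as long as any
level still holds a record and drop to false only once every level is
empty. The sketch below models just that rule in user-space C; it is an
assumption about the intent, not code from the patch, and it replaces
the per-level rbtrees with simple counters. struct swapped_levels and
remove_record() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LEVEL 8 /* assumed to mirror BTRFS_MAX_LEVEL */

/* Hypothetical model: only the number of records per level matters here. */
struct swapped_levels {
	size_t nr_records[MAX_LEVEL];
	bool swapped;
};

/* Remove one record from @level, then recompute the summary flag. */
static void remove_record(struct swapped_levels *sl, int level)
{
	bool swapped = false;

	assert(sl->nr_records[level] > 0);
	sl->nr_records[level]--;

	/* The flag stays true while at least one level is non-empty. */
	for (int i = 0; i < MAX_LEVEL; i++) {
		if (sl->nr_records[i] != 0) {
			swapped = true;
			break;
		}
	}
	sl->swapped = swapped;
}
```

The flag is only a fast-path hint, so leaving it set too long costs an
extra lookup, while clearing it too early would skip a needed trace.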



* [PATCH v2 6/6] btrfs: qgroup: Cleanup old subtree swap code
  2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (4 preceding siblings ...)
  2018-11-08  5:49 ` [PATCH v2 5/6] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
@ 2018-11-08  5:49 ` Qu Wenruo
  2018-11-12 21:33 ` [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
  6 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-08  5:49 UTC (permalink / raw)
  To: linux-btrfs

Since it has been replaced by the new delayed subtree swap code, remove
the original code.

The cleanup is small since most of its core functionality is still used
by the delayed subtree swap trace.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 94 -----------------------------------------------
 fs/btrfs/qgroup.h |  6 ---
 2 files changed, 100 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 58ba106abad9..b662be1e35cc 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2099,100 +2099,6 @@ static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-/*
- * Inform qgroup to trace subtree swap used in balance.
- *
- * Unlike btrfs_qgroup_trace_subtree(), this function will only trace
- * new tree blocks whose generation is equal to (or larger than) @last_snapshot.
- *
- * Will go down the tree block pointed by @dst_eb (pointed by @dst_parent and
- * @dst_slot), and find any tree blocks whose generation is at @last_snapshot,
- * and then go down @src_eb (pointed by @src_parent and @src_slot) to find
- * the conterpart of the tree block, then mark both tree blocks as qgroup dirty,
- * and skip all tree blocks whose generation is smaller than last_snapshot.
- *
- * This would skip tons of tree blocks of original btrfs_qgroup_trace_subtree(),
- * which could be the cause of very slow balance if the file tree is large.
- *
- * @src_parent, @src_slot: pointer to src (file tree) eb.
- * @dst_parent, @dst_slot: pointer to dst (reloc tree) eb.
- */
-int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
-				struct btrfs_block_group_cache *bg_cache,
-				struct extent_buffer *src_parent, int src_slot,
-				struct extent_buffer *dst_parent, int dst_slot,
-				u64 last_snapshot)
-{
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_key first_key;
-	struct extent_buffer *src_eb = NULL;
-	struct extent_buffer *dst_eb = NULL;
-	bool trace_leaf = false;
-	u64 child_gen;
-	u64 child_bytenr;
-	int ret;
-
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
-		return 0;
-
-	/* Check parameter order */
-	if (btrfs_node_ptr_generation(src_parent, src_slot) >
-	    btrfs_node_ptr_generation(dst_parent, dst_slot)) {
-		btrfs_err_rl(fs_info,
-		"%s: bad parameter order, src_gen=%llu dst_gen=%llu", __func__,
-			btrfs_node_ptr_generation(src_parent, src_slot),
-			btrfs_node_ptr_generation(dst_parent, dst_slot));
-		return -EUCLEAN;
-	}
-
-	/*
-	 * Only trace leaf if we're relocating data block groups, this could
-	 * reduce tons of data extents tracing for meta/sys bg relocation.
-	 */
-	if (bg_cache->flags & BTRFS_BLOCK_GROUP_DATA)
-		trace_leaf = true;
-	/* Read out real @src_eb, pointed by @src_parent and @src_slot */
-	child_bytenr = btrfs_node_blockptr(src_parent, src_slot);
-	child_gen = btrfs_node_ptr_generation(src_parent, src_slot);
-	btrfs_node_key_to_cpu(src_parent, &first_key, src_slot);
-
-	src_eb = read_tree_block(fs_info, child_bytenr, child_gen,
-			btrfs_header_level(src_parent) - 1, &first_key);
-	if (IS_ERR(src_eb)) {
-		ret = PTR_ERR(src_eb);
-		goto out;
-	}
-
-	/* Read out real @dst_eb, pointed by @src_parent and @src_slot */
-	child_bytenr = btrfs_node_blockptr(dst_parent, dst_slot);
-	child_gen = btrfs_node_ptr_generation(dst_parent, dst_slot);
-	btrfs_node_key_to_cpu(dst_parent, &first_key, dst_slot);
-
-	dst_eb = read_tree_block(fs_info, child_bytenr, child_gen,
-			btrfs_header_level(dst_parent) - 1, &first_key);
-	if (IS_ERR(dst_eb)) {
-		ret = PTR_ERR(dst_eb);
-		goto out;
-	}
-
-	if (!extent_buffer_uptodate(src_eb) || !extent_buffer_uptodate(dst_eb)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* Do the generation aware breadth-first search */
-	ret = qgroup_trace_subtree_swap(trans, src_eb, dst_eb, last_snapshot,
-					trace_leaf, true);
-	if (ret < 0)
-		goto out;
-	ret = 0;
-
-out:
-	free_extent_buffer(src_eb);
-	free_extent_buffer(dst_eb);
-	return ret;
-}
-
 int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
 			       struct extent_buffer *root_eb,
 			       u64 root_gen, int root_level)
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 9f941421c405..3254add3c340 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -316,12 +316,6 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
 			       struct extent_buffer *root_eb,
 			       u64 root_gen, int root_level);
-
-int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
-				struct btrfs_block_group_cache *bg_cache,
-				struct extent_buffer *src_parent, int src_slot,
-				struct extent_buffer *dst_parent, int dst_slot,
-				u64 last_snapshot);
 int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 				u64 num_bytes, struct ulist *old_roots,
 				struct ulist *new_roots);
-- 
2.19.1



* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
                   ` (5 preceding siblings ...)
  2018-11-08  5:49 ` [PATCH v2 6/6] btrfs: qgroup: Cleanup old subtree swap code Qu Wenruo
@ 2018-11-12 21:33 ` David Sterba
  2018-11-13 17:07   ` David Sterba
  2018-12-06 19:35   ` David Sterba
  6 siblings, 2 replies; 22+ messages in thread
From: David Sterba @ 2018-11-12 21:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> 
> Which is based on v4.20-rc1.

Thanks, I'll add it to for-next soon.


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-12 21:33 ` [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
@ 2018-11-13 17:07   ` David Sterba
  2018-11-13 17:58     ` Filipe Manana
  2018-12-06 19:35   ` David Sterba
  1 sibling, 1 reply; 22+ messages in thread
From: David Sterba @ 2018-11-13 17:07 UTC (permalink / raw)
  To: David Sterba; +Cc: Qu Wenruo, linux-btrfs

On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> > This patchset can be fetched from github:
> > https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> > 
> > Which is based on v4.20-rc1.
> 
> Thanks, I'll add it to for-next soon.

During test generic/517, the logs were full of the warning below. The reference
test on current master, effectively misc-4.20 which was used as base of your
branch did not get the warning.

[11540.167829] BTRFS: end < start 2519039 2519040
[11540.170513] WARNING: CPU: 1 PID: 539 at fs/btrfs/extent_io.c:436 insert_state+0xd8/0x100 [btrfs]
[11540.174411] Modules linked in: dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod loop [last unloaded: libcrc32c]
[11540.178279] CPU: 1 PID: 539 Comm: xfs_io Tainted: G      D W         4.20.0-rc1-default+ #329
[11540.180616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[11540.183754] RIP: 0010:insert_state+0xd8/0x100 [btrfs]
[11540.189173] RSP: 0018:ffffa0d245eafb20 EFLAGS: 00010282
[11540.189885] RAX: 0000000000000000 RBX: ffff9f0bb3267320 RCX: 0000000000000000
[11540.191646] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa40c400d
[11540.192942] RBP: 0000000000266fff R08: 0000000000000001 R09: 0000000000000000
[11540.193871] R10: 0000000000000000 R11: ffffffffa629da2d R12: ffff9f0ba0281c60
[11540.195527] R13: 0000000000267000 R14: ffffa0d245eafb98 R15: ffffa0d245eafb90
[11540.197026] FS:  00007fa338eb4b80(0000) GS:ffff9f0bbd600000(0000) knlGS:0000000000000000
[11540.198251] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11540.199698] CR2: 00007fa33873bfb8 CR3: 000000006fb6e000 CR4: 00000000000006e0
[11540.201428] Call Trace:
[11540.202164]  __set_extent_bit+0x43b/0x5b0 [btrfs]
[11540.203223]  lock_extent_bits+0x5d/0x210 [btrfs]
[11540.204346]  ? _raw_spin_unlock+0x24/0x40
[11540.205381]  ? test_range_bit+0xdf/0x130 [btrfs]
[11540.206573]  lock_extent_range+0xb8/0x150 [btrfs]
[11540.207696]  btrfs_double_extent_lock+0x78/0xb0 [btrfs]
[11540.208988]  btrfs_extent_same_range+0x131/0x4e0 [btrfs]
[11540.210237]  btrfs_remap_file_range+0x337/0x350 [btrfs]
[11540.211448]  vfs_dedupe_file_range_one+0x141/0x150
[11540.212622]  vfs_dedupe_file_range+0x146/0x1a0
[11540.213795]  do_vfs_ioctl+0x520/0x6c0
[11540.214711]  ? __fget+0x109/0x1e0
[11540.215616]  ksys_ioctl+0x3a/0x70
[11540.216233]  __x64_sys_ioctl+0x16/0x20
[11540.216860]  do_syscall_64+0x54/0x180
[11540.217409]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[11540.218126] RIP: 0033:0x7fa338a4daa7



* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-13 17:07   ` David Sterba
@ 2018-11-13 17:58     ` Filipe Manana
  2018-11-13 23:56       ` Qu Wenruo
  2018-11-14 19:05       ` David Sterba
  0 siblings, 2 replies; 22+ messages in thread
From: Filipe Manana @ 2018-11-13 17:58 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Tue, Nov 13, 2018 at 5:08 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> > On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> > > This patchset can be fetched from github:
> > > https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> > >
> > > Which is based on v4.20-rc1.
> >
> > Thanks, I'll add it to for-next soon.
>
> During test generic/517, the logs were full of the warning below. The reference
> test on current master, effectively misc-4.20 which was used as base of your
> branch did not get the warning.
>
> [11540.167829] BTRFS: end < start 2519039 2519040
> [11540.170513] WARNING: CPU: 1 PID: 539 at fs/btrfs/extent_io.c:436 insert_state+0xd8/0x100 [btrfs]
> [11540.174411] Modules linked in: dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod loop [last unloaded: libcrc32c]
> [11540.178279] CPU: 1 PID: 539 Comm: xfs_io Tainted: G      D W         4.20.0-rc1-default+ #329
> [11540.180616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
> [11540.183754] RIP: 0010:insert_state+0xd8/0x100 [btrfs]
> [11540.189173] RSP: 0018:ffffa0d245eafb20 EFLAGS: 00010282
> [11540.189885] RAX: 0000000000000000 RBX: ffff9f0bb3267320 RCX: 0000000000000000
> [11540.191646] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa40c400d
> [11540.192942] RBP: 0000000000266fff R08: 0000000000000001 R09: 0000000000000000
> [11540.193871] R10: 0000000000000000 R11: ffffffffa629da2d R12: ffff9f0ba0281c60
> [11540.195527] R13: 0000000000267000 R14: ffffa0d245eafb98 R15: ffffa0d245eafb90
> [11540.197026] FS:  00007fa338eb4b80(0000) GS:ffff9f0bbd600000(0000) knlGS:0000000000000000
> [11540.198251] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11540.199698] CR2: 00007fa33873bfb8 CR3: 000000006fb6e000 CR4: 00000000000006e0
> [11540.201428] Call Trace:
> [11540.202164]  __set_extent_bit+0x43b/0x5b0 [btrfs]
> [11540.203223]  lock_extent_bits+0x5d/0x210 [btrfs]
> [11540.204346]  ? _raw_spin_unlock+0x24/0x40
> [11540.205381]  ? test_range_bit+0xdf/0x130 [btrfs]
> [11540.206573]  lock_extent_range+0xb8/0x150 [btrfs]
> [11540.207696]  btrfs_double_extent_lock+0x78/0xb0 [btrfs]
> [11540.208988]  btrfs_extent_same_range+0x131/0x4e0 [btrfs]
> [11540.210237]  btrfs_remap_file_range+0x337/0x350 [btrfs]
> [11540.211448]  vfs_dedupe_file_range_one+0x141/0x150
> [11540.212622]  vfs_dedupe_file_range+0x146/0x1a0
> [11540.213795]  do_vfs_ioctl+0x520/0x6c0
> [11540.214711]  ? __fget+0x109/0x1e0
> [11540.215616]  ksys_ioctl+0x3a/0x70
> [11540.216233]  __x64_sys_ioctl+0x16/0x20
> [11540.216860]  do_syscall_64+0x54/0x180
> [11540.217409]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [11540.218126] RIP: 0033:0x7fa338a4daa7

That's the infinite loop issue fixed by one of the patches submitted
for 4.20-rc2:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587

The branch you used for testing doesn't have that fix?

>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-13 17:58     ` Filipe Manana
@ 2018-11-13 23:56       ` Qu Wenruo
  2018-11-14 19:05       ` David Sterba
  1 sibling, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-11-13 23:56 UTC (permalink / raw)
  To: fdmanana, dsterba, Qu Wenruo, linux-btrfs

On 2018/11/14 1:58 AM, Filipe Manana wrote:
> On Tue, Nov 13, 2018 at 5:08 PM David Sterba <dsterba@suse.cz> wrote:
>>
>> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
>>> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
>>>> This patchset can be fetched from github:
>>>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
>>>>
>>>> Which is based on v4.20-rc1.
>>>
>>> Thanks, I'll add it to for-next soon.
>>
>> During test generic/517, the logs were full of the warning below. The reference
>> test on current master, effectively misc-4.20 which was used as base of your
>> branch did not get the warning.
>>
>> [11540.167829] BTRFS: end < start 2519039 2519040
>> [11540.170513] WARNING: CPU: 1 PID: 539 at fs/btrfs/extent_io.c:436 insert_state+0xd8/0x100 [btrfs]
>> [11540.174411] Modules linked in: dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod loop [last unloaded: libcrc32c]
>> [11540.178279] CPU: 1 PID: 539 Comm: xfs_io Tainted: G      D W         4.20.0-rc1-default+ #329
>> [11540.180616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
>> [11540.183754] RIP: 0010:insert_state+0xd8/0x100 [btrfs]
>> [11540.189173] RSP: 0018:ffffa0d245eafb20 EFLAGS: 00010282
>> [11540.189885] RAX: 0000000000000000 RBX: ffff9f0bb3267320 RCX: 0000000000000000
>> [11540.191646] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa40c400d
>> [11540.192942] RBP: 0000000000266fff R08: 0000000000000001 R09: 0000000000000000
>> [11540.193871] R10: 0000000000000000 R11: ffffffffa629da2d R12: ffff9f0ba0281c60
>> [11540.195527] R13: 0000000000267000 R14: ffffa0d245eafb98 R15: ffffa0d245eafb90
>> [11540.197026] FS:  00007fa338eb4b80(0000) GS:ffff9f0bbd600000(0000) knlGS:0000000000000000
>> [11540.198251] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [11540.199698] CR2: 00007fa33873bfb8 CR3: 000000006fb6e000 CR4: 00000000000006e0
>> [11540.201428] Call Trace:
>> [11540.202164]  __set_extent_bit+0x43b/0x5b0 [btrfs]
>> [11540.203223]  lock_extent_bits+0x5d/0x210 [btrfs]
>> [11540.204346]  ? _raw_spin_unlock+0x24/0x40
>> [11540.205381]  ? test_range_bit+0xdf/0x130 [btrfs]
>> [11540.206573]  lock_extent_range+0xb8/0x150 [btrfs]
>> [11540.207696]  btrfs_double_extent_lock+0x78/0xb0 [btrfs]
>> [11540.208988]  btrfs_extent_same_range+0x131/0x4e0 [btrfs]
>> [11540.210237]  btrfs_remap_file_range+0x337/0x350 [btrfs]
>> [11540.211448]  vfs_dedupe_file_range_one+0x141/0x150
>> [11540.212622]  vfs_dedupe_file_range+0x146/0x1a0
>> [11540.213795]  do_vfs_ioctl+0x520/0x6c0
>> [11540.214711]  ? __fget+0x109/0x1e0
>> [11540.215616]  ksys_ioctl+0x3a/0x70
>> [11540.216233]  __x64_sys_ioctl+0x16/0x20
>> [11540.216860]  do_syscall_64+0x54/0x180
>> [11540.217409]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [11540.218126] RIP: 0033:0x7fa338a4daa7
> 
> That's the infinite loop issue fixed by one of the patches submitted
> for 4.20-rc2:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587
> 
> The branch you used for testing doesn't have that fix?

Yep, I tried the v4.20-rc1 tag, which hits tons of such warnings even
without my patchset.

So it shouldn't be my patches causing the problem.

Thanks,
Qu

> 
>>
> 
> 



* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-13 17:58     ` Filipe Manana
  2018-11-13 23:56       ` Qu Wenruo
@ 2018-11-14 19:05       ` David Sterba
  2018-11-15  5:23         ` Qu Wenruo
  1 sibling, 1 reply; 22+ messages in thread
From: David Sterba @ 2018-11-14 19:05 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Tue, Nov 13, 2018 at 05:58:14PM +0000, Filipe Manana wrote:
> That's the infinite loop issue fixed by one of the patches submitted
> for 4.20-rc2:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587
> 
> The branch you used for testing doesn't have that fix?

That explains it, thanks.  The branch was based on 4.20-rc1 as I took it
from Qu's repository but did not check the base.
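
A quick way to check whether a branch already contains that fix is
sketched below. It assumes a local kernel checkout; has_commit is a
made-up helper name, and the commit id is the one quoted above.

```shell
#!/bin/sh
# Commit id quoted earlier in this thread (the fix that went into v4.20-rc2).
FIX=11023d3f5fdf89bba5e1142127701ca6e6014587

# has_commit <commit> <ref>: exit status 0 if <commit> is reachable from <ref>.
# (has_commit is a made-up helper name, not an existing tool.)
has_commit() {
    git merge-base --is-ancestor "$1" "$2"
}

# Example, run inside a kernel checkout:
#   has_commit "$FIX" HEAD && echo "fix present" || echo "fix missing"
```

A branch based on v4.20-rc1 would be expected to report the fix as
missing, which matches what was seen here.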


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-14 19:05       ` David Sterba
@ 2018-11-15  5:23         ` Qu Wenruo
  2018-11-15 10:28           ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2018-11-15  5:23 UTC (permalink / raw)
  To: dsterba, Filipe Manana, Qu Wenruo, linux-btrfs




On 2018/11/15 上午3:05, David Sterba wrote:
> On Tue, Nov 13, 2018 at 05:58:14PM +0000, Filipe Manana wrote:
>> That's the infinite loop issue fixed by one of the patches submitted
>> for 4.20-rc2:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587
>>
>> The branch you used for testing doesn't have that fix?
> 
> That explains it, thanks.  The branch was based on 4.20-rc1 as I took it
> from Qu's repository but did not check the base.

BTW should I always rebase my patches to misc-next or misc-4.20?

IMHO basing them on -rc tags should make it easier for David to rebase/apply,
but if it's causing problems like this, I could definitely go misc-* based.

Thanks,
Qu



* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-15  5:23         ` Qu Wenruo
@ 2018-11-15 10:28           ` David Sterba
  0 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2018-11-15 10:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Filipe Manana, Qu Wenruo, linux-btrfs

On Thu, Nov 15, 2018 at 01:23:25PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/11/15 上午3:05, David Sterba wrote:
> > On Tue, Nov 13, 2018 at 05:58:14PM +0000, Filipe Manana wrote:
> >> That's the infinite loop issue fixed by one of the patches submitted
> >> for 4.20-rc2:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587
> >>
> >> The branch you used for testing doesn't have that fix?
> > 
> > That explains it, thanks.  The branch was based on 4.20-rc1 as I took it
> > from Qu's repository but did not check the base.
> 
> BTW should I always rebase my patches to misc-next or misc-4.20?

misc-next can get rebased as I add tags or need to remove a patch
etc., so using the last rc or the last pulled branch (i.e. the exact commit
from misc-4.20, not the head itself) should be the better option for you, so
you don't need to catch up and rebase constantly.

I can handle rebases of your branches to current misc-next as long as
there are no major conflicts.
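
Moving a series from one fixed base to another can be sketched as
follows. This is only an illustration: rebase_topic is a made-up helper
name, and the tag and branch names in the comments are examples taken
from this thread.

```shell
#!/bin/sh
# rebase_topic <old-base> <new-base> <branch>
# Replays the commits in <old-base>..<branch> on top of <new-base>.
# (rebase_topic is a made-up helper name.)
rebase_topic() {
    git rebase --onto "$2" "$1" "$3"
}

# Example: move a series from one -rc tag to the next.
#   rebase_topic v4.20-rc1 v4.20-rc2 qgroup_delayed_subtree_rebased
# Example: base it on an exact misc-4.20 commit rather than the moving head;
# <commit-id> stands for whatever commit was last pulled.
#   rebase_topic v4.20-rc1 <commit-id> qgroup_delayed_subtree_rebased
```

Using a tag or an exact commit id as the new base keeps the result
reproducible even if the branch head moves later.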

> IMHO based on -rc tags should make it easier for David to rebase/apply,
> but if it's causing problem like this, I could definitely go misc-* based.

No, that was my fault. I review and rebase all topic branches at each rc
release, but taking the branch from your repo was a slightly different step
in the workflow.


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-11-12 21:33 ` [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
  2018-11-13 17:07   ` David Sterba
@ 2018-12-06 19:35   ` David Sterba
  2018-12-06 22:51     ` Qu Wenruo
  1 sibling, 1 reply; 22+ messages in thread
From: David Sterba @ 2018-12-06 19:35 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> > This patchset can be fetched from github:
> > https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> > 
> > Which is based on v4.20-rc1.
> 
> Thanks, I'll add it to for-next soon.

The branch was there for some time but has not been for at least a week
(my mistake, I did not notice in time). I've rebased it on top of recent
misc-next, but without the delayed refs patchset from Josef.

At the moment I'm considering it for merge to 4.21; there's still some
time to pull it out in case it turns out to be too problematic. I'm
mostly worried about the unknown interactions with the enospc updates or,
more generally, about the lack of qgroup and reloc code reviews.

I'm going to do some testing of the rebased branch before I add it to
for-next. The branch is ext/qu/qgroup-delay-scan in my devel repos,
please check if everything is still ok there. Thanks.


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-12-06 19:35   ` David Sterba
@ 2018-12-06 22:51     ` Qu Wenruo
  2018-12-08  0:47       ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2018-12-06 22:51 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs




On 2018/12/7 上午3:35, David Sterba wrote:
> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
>> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
>>> This patchset can be fetched from github:
>>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
>>>
>>> Which is based on v4.20-rc1.
>>
>> Thanks, I'll add it to for-next soon.
> 
> The branch was there for some time but not for at least a week (my
> mistake I did not notice in time). I've rebased it on top of recent
> misc-next, but without the delayed refs patchset from Josef.
> 
> At the moment I'm considering it for merge to 4.21, there's still some
> time to pull it out in case it shows up to be too problematic. I'm
> mostly worried about the unknown interactions with the enospc updates or

For that part, I don't think it would cause any obvious problem for the
enospc updates.

The only user-noticeable effect is the delay of reloc tree deletion.

Apart from that, it's mostly transparent to extent allocation.

> generally because of lack of qgroup and reloc code reviews.

That's the biggest problem.

However, most of the current qgroup + balance optimization is done inside
qgroup code (to skip certain qgroup records), so if we're going to hit a
problem, this patchset is the most likely one to trigger it.

Later patches will just keep tweaking qgroup, mostly without affecting any
other parts.

So I'm fine if you decide to pull it out for now.

Thanks,
Qu

> 
> I'm going to do some testing of the rebased branch before I add it to
> for-next. The branch is ext/qu/qgroup-delay-scan in my devel repos,
> plase check if everyghing is still ok there. Thanks.
> 



* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-12-06 22:51     ` Qu Wenruo
@ 2018-12-08  0:47       ` David Sterba
  2018-12-08  0:50         ` Qu Wenruo
  2018-12-10  5:51         ` Qu Wenruo
  0 siblings, 2 replies; 22+ messages in thread
From: David Sterba @ 2018-12-08  0:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/12/7 上午3:35, David Sterba wrote:
> > On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> >> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> >>> This patchset can be fetched from github:
> >>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> >>>
> >>> Which is based on v4.20-rc1.
> >>
> >> Thanks, I'll add it to for-next soon.
> > 
> > The branch was there for some time but not for at least a week (my
> > mistake I did not notice in time). I've rebased it on top of recent
> > misc-next, but without the delayed refs patchset from Josef.
> > 
> > At the moment I'm considering it for merge to 4.21, there's still some
> > time to pull it out in case it shows up to be too problematic. I'm
> > mostly worried about the unknown interactions with the enospc updates or
> 
> For that part, I don't think it would have some obvious problem for
> enospc updates.
> 
> As the user-noticeable effect is the delay of reloc tree deletion.
> 
> Despite that, it's mostly transparent to extent allocation.
> 
> > generally because of lack of qgroup and reloc code reviews.
> 
> That's the biggest problem.
> 
> However most of the current qgroup + balance optimization is done inside
> qgroup code (to skip certain qgroup record), if we're going to hit some
> problem then this patchset would have the highest possibility to hit
> problem.
> 
> Later patches will just keep tweaking qgroup to without affecting any
> other parts mostly.
> 
> So I'm fine if you decide to pull it out for now.

I've adapted a stress test that unpacks a large tarball, snapshots
every 20 seconds, deletes a random snapshot every 50 seconds, deletes
files from the original subvolume, now enhanced with qgroups just for the
new snapshots inheriting the toplevel subvolume. Lockup.

It gets stuck in a snapshot call with the following stacktrace:

[<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
[<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
[<0>] do_walk_down+0x681/0xb20 [btrfs]
[<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
[<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
[<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
[<0>] cleaner_kthread+0xf8/0x170 [btrfs]
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x27/0x50

and that's at around the 10th snapshot and ~3rd deletion. This is qgroup show:

qgroupid         rfer         excl parent
--------         ----         ---- ------
0/5         865.27MiB      1.66MiB ---
0/257           0.00B        0.00B ---
0/259           0.00B        0.00B ---
0/260       806.58MiB    637.25MiB ---
0/262           0.00B        0.00B ---
0/263           0.00B        0.00B ---
0/264           0.00B        0.00B ---
0/265           0.00B        0.00B ---
0/266           0.00B        0.00B ---
0/267           0.00B        0.00B ---
0/268           0.00B        0.00B ---
0/269           0.00B        0.00B ---
0/270       989.04MiB      1.22MiB ---
0/271           0.00B        0.00B ---
0/272       922.25MiB    416.00KiB ---
0/273       931.02MiB      1.50MiB ---
0/274       910.94MiB      1.52MiB ---
1/1           1.64GiB      1.64GiB
0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274

No IO or CPU activity at this point, the stacktrace and show output
remain the same.

So, considering this, I'm not going to add the patchset to 4.21 but will
keep it in for-next for testing; any fixups or updates will be applied.
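
For readers who want to try something similar, the stress loop described
above could look roughly like the sketch below. It is not the actual
script: MNT, TARBALL, the snap-N names and the 1/1 qgroup id are
assumptions, adding each snapshot to 1/1 with -i is only one reading of
"qgroups just for the new snapshots", and the file deletion in the
original subvolume is left out. shuf is assumed to be available (GNU
coreutils).

```shell
#!/bin/sh
# Sketch only. MNT, TARBALL and the 1/1 qgroup id are assumptions.
# Real runs need root and a mounted btrfs; with RUN=echo the btrfs
# commands are only printed.
MNT=${MNT:-/mnt/test}
TARBALL=${TARBALL:-/tmp/linux.tar}
RUN=${RUN:-}

setup() {
    $RUN btrfs quota enable "$MNT"
    $RUN btrfs qgroup create 1/1 "$MNT"
    $RUN btrfs subvolume create "$MNT/src"
}

# take_snapshot <n>: snapshot src and add the new subvolume to qgroup 1/1.
take_snapshot() {
    $RUN btrfs subvolume snapshot -i 1/1 "$MNT/src" "$MNT/snap-$1"
}

# delete_random_snapshot: delete one existing snapshot picked at random.
delete_random_snapshot() {
    victim=$(ls -d "$MNT"/snap-* 2>/dev/null | shuf -n 1)
    if [ -n "$victim" ]; then
        $RUN btrfs subvolume delete "$victim"
    fi
}

# stress: unpack in the background, snapshot every 20 s, delete every 50 s.
stress() {
    setup
    tar -xf "$TARBALL" -C "$MNT/src" &
    unpack=$!
    t=0
    while kill -0 "$unpack" 2>/dev/null; do
        sleep 10
        t=$((t + 10))
        if [ $((t % 20)) -eq 0 ]; then take_snapshot "$t"; fi
        if [ $((t % 50)) -eq 0 ]; then delete_random_snapshot; fi
    done
}

# To run for real (as root): call `stress` at the end of this script.
```

Setting RUN=echo before calling setup or take_snapshot prints the btrfs
commands without touching a filesystem, which is handy for checking the
sequence first.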


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-12-08  0:47       ` David Sterba
@ 2018-12-08  0:50         ` Qu Wenruo
  2018-12-08 16:17           ` David Sterba
  2018-12-10 10:45           ` Filipe Manana
  2018-12-10  5:51         ` Qu Wenruo
  1 sibling, 2 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-12-08  0:50 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs




On 2018/12/8 上午8:47, David Sterba wrote:
> On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018/12/7 上午3:35, David Sterba wrote:
>>> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
>>>> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
>>>>> This patchset can be fetched from github:
>>>>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
>>>>>
>>>>> Which is based on v4.20-rc1.
>>>>
>>>> Thanks, I'll add it to for-next soon.
>>>
>>> The branch was there for some time but not for at least a week (my
>>> mistake I did not notice in time). I've rebased it on top of recent
>>> misc-next, but without the delayed refs patchset from Josef.
>>>
>>> At the moment I'm considering it for merge to 4.21, there's still some
>>> time to pull it out in case it shows up to be too problematic. I'm
>>> mostly worried about the unknown interactions with the enospc updates or
>>
>> For that part, I don't think it would have some obvious problem for
>> enospc updates.
>>
>> As the user-noticeable effect is the delay of reloc tree deletion.
>>
>> Despite that, it's mostly transparent to extent allocation.
>>
>>> generally because of lack of qgroup and reloc code reviews.
>>
>> That's the biggest problem.
>>
>> However most of the current qgroup + balance optimization is done inside
>> qgroup code (to skip certain qgroup record), if we're going to hit some
>> problem then this patchset would have the highest possibility to hit
>> problem.
>>
>> Later patches will just keep tweaking qgroup to without affecting any
>> other parts mostly.
>>
>> So I'm fine if you decide to pull it out for now.
> 
> I've adapted a stress tests that unpacks a large tarball, snaphosts
> every 20 seconds, deletes a random snapshot every 50 seconds, deletes
> file from the original subvolume, now enhanced with qgroups just for the
> new snapshots inherigin the toplevel subvolume. Lockup.
> 
> It gets stuck in a snapshot call with the follwin stacktrace
> 
> [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]

It looks like something is wrong with the original subtree tracing.

Thanks for the report, I'll investigate it.
Qu

> [<0>] do_walk_down+0x681/0xb20 [btrfs]
> [<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
> [<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
> [<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
> [<0>] cleaner_kthread+0xf8/0x170 [btrfs]
> [<0>] kthread+0x121/0x140
> [<0>] ret_from_fork+0x27/0x50
> 
> and that's like 10th snapshot and ~3rd deltion. This is qgroup show:
> 
> qgroupid         rfer         excl parent
> --------         ----         ---- ------
> 0/5         865.27MiB      1.66MiB ---
> 0/257           0.00B        0.00B ---
> 0/259           0.00B        0.00B ---
> 0/260       806.58MiB    637.25MiB ---
> 0/262           0.00B        0.00B ---
> 0/263           0.00B        0.00B ---
> 0/264           0.00B        0.00B ---
> 0/265           0.00B        0.00B ---
> 0/266           0.00B        0.00B ---
> 0/267           0.00B        0.00B ---
> 0/268           0.00B        0.00B ---
> 0/269           0.00B        0.00B ---
> 0/270       989.04MiB      1.22MiB ---
> 0/271           0.00B        0.00B ---
> 0/272       922.25MiB    416.00KiB ---
> 0/273       931.02MiB      1.50MiB ---
> 0/274       910.94MiB      1.52MiB ---
> 1/1           1.64GiB      1.64GiB
> 0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274
> 
> No IO or cpu activity at this point, the stacktrace and show output
> remains the same.
> 
> So, considering this, I'm not going to add the patchset to 4.21 but will
> keep it in for-next for testing, any fixups or updates will be applied.
> 



* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-12-08  0:50         ` Qu Wenruo
@ 2018-12-08 16:17           ` David Sterba
  2018-12-10 10:45           ` Filipe Manana
  1 sibling, 0 replies; 22+ messages in thread
From: David Sterba @ 2018-12-08 16:17 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Sat, Dec 08, 2018 at 08:50:32AM +0800, Qu Wenruo wrote:
> > I've adapted a stress tests that unpacks a large tarball, snaphosts
> > every 20 seconds, deletes a random snapshot every 50 seconds, deletes
> > file from the original subvolume, now enhanced with qgroups just for the
> > new snapshots inherigin the toplevel subvolume. Lockup.
> > 
> > It gets stuck in a snapshot call with the follwin stacktrace
> > 
> > [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> > [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
> 
> This looks like the original subtree tracing has something wrong.

Yes, I ran the test on current master and it locked up too, so it's not
due to your patchset.

> Thanks for the report, I'll investigate it.

Thanks.


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-12-08  0:47       ` David Sterba
  2018-12-08  0:50         ` Qu Wenruo
@ 2018-12-10  5:51         ` Qu Wenruo
  1 sibling, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-12-10  5:51 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs




On 2018/12/8 上午8:47, David Sterba wrote:
> On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018/12/7 上午3:35, David Sterba wrote:
>>> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
>>>> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
>>>>> This patchset can be fetched from github:
>>>>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
>>>>>
>>>>> Which is based on v4.20-rc1.
>>>>
>>>> Thanks, I'll add it to for-next soon.
>>>
>>> The branch was there for some time but not for at least a week (my
>>> mistake I did not notice in time). I've rebased it on top of recent
>>> misc-next, but without the delayed refs patchset from Josef.
>>>
>>> At the moment I'm considering it for merge to 4.21, there's still some
>>> time to pull it out in case it shows up to be too problematic. I'm
>>> mostly worried about the unknown interactions with the enospc updates or
>>
>> For that part, I don't think it would have some obvious problem for
>> enospc updates.
>>
>> As the user-noticeable effect is the delay of reloc tree deletion.
>>
>> Despite that, it's mostly transparent to extent allocation.
>>
>>> generally because of lack of qgroup and reloc code reviews.
>>
>> That's the biggest problem.
>>
>> However most of the current qgroup + balance optimization is done inside
>> qgroup code (to skip certain qgroup record), if we're going to hit some
>> problem then this patchset would have the highest possibility to hit
>> problem.
>>
>> Later patches will just keep tweaking qgroup to without affecting any
>> other parts mostly.
>>
>> So I'm fine if you decide to pull it out for now.
> 
> I've adapted a stress tests that unpacks a large tarball, snaphosts
> every 20 seconds, deletes a random snapshot every 50 seconds, deletes
> file from the original subvolume, now enhanced with qgroups just for the
> new snapshots inherigin the toplevel subvolume. Lockup.

Could you please provide the test script?
I can't reproduce it in my environment.

I crafted my own test with some simplification, namely no qgroup inherit.
However, I can't reproduce the problem even with more snapshot
creation/deletion and more data.

In my test script, I created around 35 snapshots and deleted 6 snapshots,
with around 1000 regular data extents and 1000 2K inline extents.

My test script can be found at:
https://gist.github.com/adam900710/4109fa23fc5ba8fc6b37a9c8e52353c1

Thanks,
Qu

> 
> It gets stuck in a snapshot call with the follwin stacktrace
> 
> [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
> [<0>] do_walk_down+0x681/0xb20 [btrfs]
> [<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
> [<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
> [<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
> [<0>] cleaner_kthread+0xf8/0x170 [btrfs]
> [<0>] kthread+0x121/0x140
> [<0>] ret_from_fork+0x27/0x50
> 
> and that's like 10th snapshot and ~3rd deltion. This is qgroup show:
> 
> qgroupid         rfer         excl parent
> --------         ----         ---- ------
> 0/5         865.27MiB      1.66MiB ---
> 0/257           0.00B        0.00B ---
> 0/259           0.00B        0.00B ---
> 0/260       806.58MiB    637.25MiB ---
> 0/262           0.00B        0.00B ---
> 0/263           0.00B        0.00B ---
> 0/264           0.00B        0.00B ---
> 0/265           0.00B        0.00B ---
> 0/266           0.00B        0.00B ---
> 0/267           0.00B        0.00B ---
> 0/268           0.00B        0.00B ---
> 0/269           0.00B        0.00B ---
> 0/270       989.04MiB      1.22MiB ---
> 0/271           0.00B        0.00B ---
> 0/272       922.25MiB    416.00KiB ---
> 0/273       931.02MiB      1.50MiB ---
> 0/274       910.94MiB      1.52MiB ---
> 1/1           1.64GiB      1.64GiB
> 0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274
> 
> No IO or cpu activity at this point, the stacktrace and show output
> remains the same.
> 
> So, considering this, I'm not going to add the patchset to 4.21 but will
> keep it in for-next for testing, any fixups or updates will be applied.
> 



* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-12-08  0:50         ` Qu Wenruo
  2018-12-08 16:17           ` David Sterba
@ 2018-12-10 10:45           ` Filipe Manana
  2018-12-10 11:23             ` Qu Wenruo
  1 sibling, 1 reply; 22+ messages in thread
From: Filipe Manana @ 2018-12-10 10:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Sat, Dec 8, 2018 at 12:51 AM Qu Wenruo <wqu@suse.de> wrote:
>
>
>
> On 2018/12/8 上午8:47, David Sterba wrote:
> > On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2018/12/7 上午3:35, David Sterba wrote:
> >>> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> >>>> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> >>>>> This patchset can be fetched from github:
> >>>>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> >>>>>
> >>>>> Which is based on v4.20-rc1.
> >>>>
> >>>> Thanks, I'll add it to for-next soon.
> >>>
> >>> The branch was there for some time but not for at least a week (my
> >>> mistake I did not notice in time). I've rebased it on top of recent
> >>> misc-next, but without the delayed refs patchset from Josef.
> >>>
> >>> At the moment I'm considering it for merge to 4.21, there's still some
> >>> time to pull it out in case it shows up to be too problematic. I'm
> >>> mostly worried about the unknown interactions with the enospc updates or
> >>
> >> For that part, I don't think it would have some obvious problem for
> >> enospc updates.
> >>
> >> As the user-noticeable effect is the delay of reloc tree deletion.
> >>
> >> Despite that, it's mostly transparent to extent allocation.
> >>
> >>> generally because of lack of qgroup and reloc code reviews.
> >>
> >> That's the biggest problem.
> >>
> >> However most of the current qgroup + balance optimization is done inside
> >> qgroup code (to skip certain qgroup record), if we're going to hit some
> >> problem then this patchset would have the highest possibility to hit
> >> problem.
> >>
> >> Later patches will just keep tweaking qgroup to without affecting any
> >> other parts mostly.
> >>
> >> So I'm fine if you decide to pull it out for now.
> >
> > I've adapted a stress tests that unpacks a large tarball, snaphosts
> > every 20 seconds, deletes a random snapshot every 50 seconds, deletes
> > file from the original subvolume, now enhanced with qgroups just for the
> > new snapshots inherigin the toplevel subvolume. Lockup.
> >
> > It gets stuck in a snapshot call with the follwin stacktrace
> >
> > [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> > [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
>
> This looks like the original subtree tracing has something wrong.
>
> Thanks for the report, I'll investigate it.

Btw, there's another deadlock with qgroups. I don't recall if I ever
reported it, but I have been hitting it with fstests (it rarely happens)
for at least 1 year iirc:

[29845.732448] INFO: task kworker/u8:8:3898 blocked for more than 120 seconds.
[29845.732852]       Not tainted 4.20.0-rc5-btrfs-next-40 #1
[29845.733248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[29845.733558] kworker/u8:8    D    0  3898      2 0x80000000
[29845.733878] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[29845.734183] Call Trace:
[29845.734499]  ? __schedule+0x3d4/0xbc0
[29845.734818]  schedule+0x39/0x90
[29845.735131]  btrfs_tree_read_lock+0xe7/0x140 [btrfs]
[29845.735430]  ? remove_wait_queue+0x60/0x60
[29845.735731]  find_parent_nodes+0x25e/0xe30 [btrfs]
[29845.736037]  btrfs_find_all_roots_safe+0xc6/0x140 [btrfs]
[29845.736342]  btrfs_find_all_roots+0x52/0x70 [btrfs]
[29845.736710]  btrfs_qgroup_trace_extent_post+0x37/0x80 [btrfs]
[29845.737046]  btrfs_add_delayed_data_ref+0x240/0x3d0 [btrfs]
[29845.737362]  btrfs_inc_extent_ref+0xb7/0x140 [btrfs]
[29845.737678]  __btrfs_mod_ref+0x174/0x250 [btrfs]
[29845.737999]  ? add_pinned_bytes+0x60/0x60 [btrfs]
[29845.738298]  update_ref_for_cow+0x26b/0x340 [btrfs]
[29845.738592]  __btrfs_cow_block+0x221/0x5b0 [btrfs]
[29845.738899]  btrfs_cow_block+0xf4/0x210 [btrfs]
[29845.739200]  btrfs_search_slot+0x583/0xa40 [btrfs]
[29845.739527]  ? init_object+0x6b/0x80
[29845.739823]  btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[29845.740119]  __btrfs_drop_extents+0x157/0xd70 [btrfs]
[29845.740524]  insert_reserved_file_extent.constprop.66+0x97/0x2f0 [btrfs]
[29845.740853]  ? start_transaction+0xa2/0x490 [btrfs]
[29845.741166]  btrfs_finish_ordered_io+0x344/0x810 [btrfs]
[29845.741489]  normal_work_helper+0xea/0x530 [btrfs]
[29845.741880]  process_one_work+0x22f/0x5d0
[29845.742174]  worker_thread+0x4f/0x3b0
[29845.742462]  ? rescuer_thread+0x360/0x360
[29845.742759]  kthread+0x103/0x140
[29845.743044]  ? kthread_create_worker_on_cpu+0x70/0x70
[29845.743336]  ret_from_fork+0x3a/0x50

It happened again last Friday on 4.20-rcX. It's caused by a change
from 2017 (commit fb235dc06fac9eaa4408ade9c8b20d45d63c89b7 "btrfs:
qgroup: Move half of the qgroup accounting time out of commit trans").
The task is deadlocking with itself.

thanks


> Qu
>
> > [<0>] do_walk_down+0x681/0xb20 [btrfs]
> > [<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
> > [<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
> > [<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
> > [<0>] cleaner_kthread+0xf8/0x170 [btrfs]
> > [<0>] kthread+0x121/0x140
> > [<0>] ret_from_fork+0x27/0x50
> >
> > and that's like 10th snapshot and ~3rd deltion. This is qgroup show:
> >
> > qgroupid         rfer         excl parent
> > --------         ----         ---- ------
> > 0/5         865.27MiB      1.66MiB ---
> > 0/257           0.00B        0.00B ---
> > 0/259           0.00B        0.00B ---
> > 0/260       806.58MiB    637.25MiB ---
> > 0/262           0.00B        0.00B ---
> > 0/263           0.00B        0.00B ---
> > 0/264           0.00B        0.00B ---
> > 0/265           0.00B        0.00B ---
> > 0/266           0.00B        0.00B ---
> > 0/267           0.00B        0.00B ---
> > 0/268           0.00B        0.00B ---
> > 0/269           0.00B        0.00B ---
> > 0/270       989.04MiB      1.22MiB ---
> > 0/271           0.00B        0.00B ---
> > 0/272       922.25MiB    416.00KiB ---
> > 0/273       931.02MiB      1.50MiB ---
> > 0/274       910.94MiB      1.52MiB ---
> > 1/1           1.64GiB      1.64GiB
> > 0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274
> >
> > No IO or cpu activity at this point, the stacktrace and show output
> > remains the same.
> >
> > So, considering this, I'm not going to add the patchset to 4.21 but will
> > keep it in for-next for testing, any fixups or updates will be applied.
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead
  2018-12-10 10:45           ` Filipe Manana
@ 2018-12-10 11:23             ` Qu Wenruo
  0 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2018-12-10 11:23 UTC (permalink / raw)
  To: fdmanana, Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs




On 2018/12/10 下午6:45, Filipe Manana wrote:
> On Sat, Dec 8, 2018 at 12:51 AM Qu Wenruo <wqu@suse.de> wrote:
>>
>>
>>
>> On 2018/12/8 上午8:47, David Sterba wrote:
>>> On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018/12/7 上午3:35, David Sterba wrote:
>>>>> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
>>>>>> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
>>>>>>> This patchset can be fetched from github:
>>>>>>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
>>>>>>>
>>>>>>> Which is based on v4.20-rc1.
>>>>>>
>>>>>> Thanks, I'll add it to for-next soon.
>>>>>
>>>>> The branch was there for some time but not for at least a week (my
>>>>> mistake I did not notice in time). I've rebased it on top of recent
>>>>> misc-next, but without the delayed refs patchset from Josef.
>>>>>
>>>>> At the moment I'm considering it for merge to 4.21, there's still some
>>>>> time to pull it out in case it shows up to be too problematic. I'm
>>>>> mostly worried about the unknown interactions with the enospc updates or
>>>>
>>>> For that part, I don't think it would have some obvious problem for
>>>> enospc updates.
>>>>
>>>> As the user-noticeable effect is the delay of reloc tree deletion.
>>>>
>>>> Despite that, it's mostly transparent to extent allocation.
>>>>
>>>>> generally because of lack of qgroup and reloc code reviews.
>>>>
>>>> That's the biggest problem.
>>>>
>>>> However most of the current qgroup + balance optimization is done inside
>>>> qgroup code (to skip certain qgroup record), if we're going to hit some
>>>> problem then this patchset would have the highest possibility to hit
>>>> problem.
>>>>
>>>> Later patches will just keep tweaking qgroup to without affecting any
>>>> other parts mostly.
>>>>
>>>> So I'm fine if you decide to pull it out for now.
>>>
>>> I've adapted a stress tests that unpacks a large tarball, snaphosts
>>> every 20 seconds, deletes a random snapshot every 50 seconds, deletes
>>> file from the original subvolume, now enhanced with qgroups just for the
>>> new snapshots inherigin the toplevel subvolume. Lockup.
>>>
>>> It gets stuck in a snapshot call with the follwin stacktrace
>>>
>>> [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
>>> [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
>>
>> This looks like the original subtree tracing has something wrong.
>>
>> Thanks for the report, I'll investigate it.
> 
> Btw, there's another deadlock with qgroups. I don't recall if I ever
> reported it, but I still hit it with fstests (rarely happens) for at
> least 1 year iirc:
> 
> [29845.732448] INFO: task kworker/u8:8:3898 blocked for more than 120 seconds.
> [29845.732852]       Not tainted 4.20.0-rc5-btrfs-next-40 #1
> [29845.733248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [29845.733558] kworker/u8:8    D    0  3898      2 0x80000000
> [29845.733878] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [29845.734183] Call Trace:
> [29845.734499]  ? __schedule+0x3d4/0xbc0
> [29845.734818]  schedule+0x39/0x90
> [29845.735131]  btrfs_tree_read_lock+0xe7/0x140 [btrfs]
> [29845.735430]  ? remove_wait_queue+0x60/0x60
> [29845.735731]  find_parent_nodes+0x25e/0xe30 [btrfs]
> [29845.736037]  btrfs_find_all_roots_safe+0xc6/0x140 [btrfs]
> [29845.736342]  btrfs_find_all_roots+0x52/0x70 [btrfs]
> [29845.736710]  btrfs_qgroup_trace_extent_post+0x37/0x80 [btrfs]
> [29845.737046]  btrfs_add_delayed_data_ref+0x240/0x3d0 [btrfs]
> [29845.737362]  btrfs_inc_extent_ref+0xb7/0x140 [btrfs]
> [29845.737678]  __btrfs_mod_ref+0x174/0x250 [btrfs]
> [29845.737999]  ? add_pinned_bytes+0x60/0x60 [btrfs]
> [29845.738298]  update_ref_for_cow+0x26b/0x340 [btrfs]
> [29845.738592]  __btrfs_cow_block+0x221/0x5b0 [btrfs]
> [29845.738899]  btrfs_cow_block+0xf4/0x210 [btrfs]
> [29845.739200]  btrfs_search_slot+0x583/0xa40 [btrfs]
> [29845.739527]  ? init_object+0x6b/0x80
> [29845.739823]  btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
> [29845.740119]  __btrfs_drop_extents+0x157/0xd70 [btrfs]
> [29845.740524]  insert_reserved_file_extent.constprop.66+0x97/0x2f0 [btrfs]
> [29845.740853]  ? start_transaction+0xa2/0x490 [btrfs]
> [29845.741166]  btrfs_finish_ordered_io+0x344/0x810 [btrfs]
> [29845.741489]  normal_work_helper+0xea/0x530 [btrfs]
> [29845.741880]  process_one_work+0x22f/0x5d0
> [29845.742174]  worker_thread+0x4f/0x3b0
> [29845.742462]  ? rescuer_thread+0x360/0x360
> [29845.742759]  kthread+0x103/0x140
> [29845.743044]  ? kthread_create_worker_on_cpu+0x70/0x70
> [29845.743336]  ret_from_fork+0x3a/0x50
> 
> It happened last Friday again on 4.20-rcX. It's caused by a change
> from 2017 (commit fb235dc06fac9eaa4408ade9c8b20d45d63c89b7 btrfs:
> qgroup: Move half of the qgroup accounting time out of commit trans).

I have to admit, this commit doesn't really save much critical section
time, but it causes a lot of problems because it can trigger backward
tree locking.

Its original objective was to reduce balance + qgroup overhead, but it
did a poor job compared to the recent optimizations.

I'll revert it, just as we did in SLE kernels.

Thanks,
Qu

> The task is deadlocking with itself.
> 
> thanks
> 
> 
>> Qu
>>
>>> [<0>] do_walk_down+0x681/0xb20 [btrfs]
>>> [<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
>>> [<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
>>> [<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
>>> [<0>] cleaner_kthread+0xf8/0x170 [btrfs]
>>> [<0>] kthread+0x121/0x140
>>> [<0>] ret_from_fork+0x27/0x50
>>>
>>> and that's around the 10th snapshot and ~3rd deletion. This is qgroup show:
>>>
>>> qgroupid         rfer         excl parent
>>> --------         ----         ---- ------
>>> 0/5         865.27MiB      1.66MiB ---
>>> 0/257           0.00B        0.00B ---
>>> 0/259           0.00B        0.00B ---
>>> 0/260       806.58MiB    637.25MiB ---
>>> 0/262           0.00B        0.00B ---
>>> 0/263           0.00B        0.00B ---
>>> 0/264           0.00B        0.00B ---
>>> 0/265           0.00B        0.00B ---
>>> 0/266           0.00B        0.00B ---
>>> 0/267           0.00B        0.00B ---
>>> 0/268           0.00B        0.00B ---
>>> 0/269           0.00B        0.00B ---
>>> 0/270       989.04MiB      1.22MiB ---
>>> 0/271           0.00B        0.00B ---
>>> 0/272       922.25MiB    416.00KiB ---
>>> 0/273       931.02MiB      1.50MiB ---
>>> 0/274       910.94MiB      1.52MiB ---
>>> 1/1           1.64GiB      1.64GiB 0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274
>>>
>>> No IO or CPU activity at this point, the stacktrace and show output
>>> remain the same.
>>>
>>> So, considering this, I'm not going to add the patchset to 4.21 but will
>>> keep it in for-next for testing, any fixups or updates will be applied.
>>>
>>
> 
> 


Thread overview: 22+ messages
2018-11-08  5:49 [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead Qu Wenruo
2018-11-08  5:49 ` [PATCH v2 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time Qu Wenruo
2018-11-08  5:49 ` [PATCH v2 2/6] btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots() Qu Wenruo
2018-11-08  5:49 ` [PATCH v2 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap() Qu Wenruo
2018-11-08  5:49 ` [PATCH v2 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
2018-11-08  5:49 ` [PATCH v2 5/6] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
2018-11-08  5:49 ` [PATCH v2 6/6] btrfs: qgroup: Cleanup old subtree swap code Qu Wenruo
2018-11-12 21:33 ` [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
2018-11-13 17:07   ` David Sterba
2018-11-13 17:58     ` Filipe Manana
2018-11-13 23:56       ` Qu Wenruo
2018-11-14 19:05       ` David Sterba
2018-11-15  5:23         ` Qu Wenruo
2018-11-15 10:28           ` David Sterba
2018-12-06 19:35   ` David Sterba
2018-12-06 22:51     ` Qu Wenruo
2018-12-08  0:47       ` David Sterba
2018-12-08  0:50         ` Qu Wenruo
2018-12-08 16:17           ` David Sterba
2018-12-10 10:45           ` Filipe Manana
2018-12-10 11:23             ` Qu Wenruo
2018-12-10  5:51         ` Qu Wenruo
