* [PATCH 0/4] Qgroup comment enhance and balance fix
@ 2016-10-18  1:31 Qu Wenruo
  2016-10-18  1:31 ` [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works Qu Wenruo
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Qu Wenruo @ 2016-10-18  1:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: rgoldwyn, rgoldwyn

The patchset does the following things:
1) Enhance qgroup comments and rename 2 functions
   Explain how qgroup works, so new developers won't waste too much
   time digging into the code.

   The qgroup work flow is split into 3 main phases:
   Reserve, Trace, Account.
   Functions like btrfs_qgroup_insert_dirty_extent_record() are renamed
   to btrfs_qgroup_trace_extent(), to follow the "Trace" phase.

   Other function names already follow this scheme.

2) Move account_shared_subtree() and account_leaf_items() to qgroup.c
   These functions are only used by qgroup, so move them to qgroup.c and
   rename them to follow the "trace" scheme.

3) Fix the long-standing qgroup balance corruption
   Commit 62b99540a1d91e4 doesn't fix the problem completely.
   It only handles the case where merge_reloc_roots() is done entirely
   in one transaction.

   If a transaction commits during merge_reloc_roots(), the data
   extents will leak again.

   The real fix is to inform qgroup to trace both subtrees (the tree
   reloc tree and the destination fs tree) at replace_path() time.
   Inside replace_path() there is one transaction start and end, so we
   must make qgroup trace both subtrees there.

   Thanks to the previous patches, we can now easily trace a subtree,
   so the fix is quite simple.

   Understanding the cause also makes it easier to create a pinpoint
   test case for this bug.

Qu Wenruo (4):
  btrfs: qgroup: Add comments explaining how btrfs qgroup works
  btrfs: qgroup: Rename functions to make it follow
    reserve,trace,account steps
  btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
  btrfs: qgroup: Fix qgroup data leaking by using subtree tracing

 fs/btrfs/delayed-ref.c       |   2 +-
 fs/btrfs/extent-tree.c       | 220 +------------------------------------------
 fs/btrfs/qgroup.c            | 219 +++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/qgroup.h            |  64 +++++++++++--
 fs/btrfs/relocation.c        | 119 +++++------------------
 fs/btrfs/tree-log.c          |   2 +-
 include/trace/events/btrfs.h |   2 +-
 7 files changed, 302 insertions(+), 326 deletions(-)

-- 
2.10.0





* [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works
  2016-10-18  1:31 [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
@ 2016-10-18  1:31 ` Qu Wenruo
  2016-11-03 18:49   ` Goldwyn Rodrigues
  2016-10-18  1:31 ` [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps Qu Wenruo
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2016-10-18  1:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: rgoldwyn, rgoldwyn

Add an explanation of how btrfs qgroup works.

Qgroup is split into 3 main phases:
1) Reserve
   To ensure a qgroup doesn't exceed its limit

2) Trace
   To inform qgroup which extents to trace

3) Account
   Calculate the qgroup number change for each traced extent.

This should save quite some time for new developers.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 1bc64c8..a72bf21 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -23,6 +23,34 @@
 #include "delayed-ref.h"
 
 /*
+ * Btrfs qgroup overview
+ *
+ * Btrfs qgroup splits into 3 main parts:
+ * 1) Reserve
+ *    Reserve metadata/data space for incoming operations
+ *    This affects how the qgroup limit works.
+ *
+ * 2) Trace
+ *    Tell btrfs qgroup to trace dirty extents.
+ *
+ *    Dirty extents include:
+ *    - Newly allocated extents
+ *    - Extents going to be deleted (in this trans)
+ *    - Extents whose owner is going to be modified
+ *
+ *    This is the main part that affects whether qgroup numbers stay
+ *    consistent.
+ *    Tracing a clean extent is harmless, but it wastes extra CPU time,
+ *    so it should be avoided when possible.
+ *
+ * 3) Account
+ *    Btrfs qgroup updates its numbers based on the dirty extents
+ *    traced in the previous step.
+ *
+ *    This normally happens at qgroup rescan and transaction commit time.
+ */
+
+/*
  * Record a dirty extent, and info qgroup to update quota on it
  * TODO: Use kmem cache to alloc it.
  */
-- 
2.10.0





* [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps
  2016-10-18  1:31 [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
  2016-10-18  1:31 ` [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works Qu Wenruo
@ 2016-10-18  1:31 ` Qu Wenruo
  2016-11-03 18:49   ` Goldwyn Rodrigues
  2016-10-18  1:31 ` [PATCH 3/4] btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c Qu Wenruo
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2016-10-18  1:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: rgoldwyn, rgoldwyn

Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
btrfs_qgroup_trace_extent(_nolock), according to the new
reserve/trace/account naming schema.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/delayed-ref.c       |  2 +-
 fs/btrfs/extent-tree.c       |  6 +++---
 fs/btrfs/qgroup.c            |  8 ++++----
 fs/btrfs/qgroup.h            | 13 +++++++------
 fs/btrfs/relocation.c        |  2 +-
 fs/btrfs/tree-log.c          |  2 +-
 include/trace/events/btrfs.h |  2 +-
 7 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 8d93854..a1cd0da 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -606,7 +606,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 		qrecord->num_bytes = num_bytes;
 		qrecord->old_roots = NULL;
 
-		if(btrfs_qgroup_insert_dirty_extent_nolock(fs_info,
+		if(btrfs_qgroup_trace_extent_nolock(fs_info,
 					delayed_refs, qrecord))
 			kfree(qrecord);
 	}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 210c94a..024eb5d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8568,8 +8568,8 @@ static int account_leaf_items(struct btrfs_trans_handle *trans,
 
 		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
 
-		ret = btrfs_qgroup_insert_dirty_extent(trans, root->fs_info,
-				bytenr, num_bytes, GFP_NOFS);
+		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
+						bytenr, num_bytes, GFP_NOFS);
 		if (ret)
 			return ret;
 	}
@@ -8718,7 +8718,7 @@ walk_down:
 			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
 			path->locks[level] = BTRFS_READ_LOCK_BLOCKING;
 
-			ret = btrfs_qgroup_insert_dirty_extent(trans,
+			ret = btrfs_qgroup_trace_extent(trans,
 					root->fs_info, child_bytenr,
 					root->nodesize, GFP_NOFS);
 			if (ret)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 11f4fff..e73eea3 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1450,7 +1450,7 @@ int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-int btrfs_qgroup_insert_dirty_extent_nolock(struct btrfs_fs_info *fs_info,
+int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 				struct btrfs_delayed_ref_root *delayed_refs,
 				struct btrfs_qgroup_extent_record *record)
 {
@@ -1460,7 +1460,7 @@ int btrfs_qgroup_insert_dirty_extent_nolock(struct btrfs_fs_info *fs_info,
 	u64 bytenr = record->bytenr;
 
 	assert_spin_locked(&delayed_refs->lock);
-	trace_btrfs_qgroup_insert_dirty_extent(fs_info, record);
+	trace_btrfs_qgroup_trace_extent(fs_info, record);
 
 	while (*p) {
 		parent_node = *p;
@@ -1479,7 +1479,7 @@ int btrfs_qgroup_insert_dirty_extent_nolock(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
-int btrfs_qgroup_insert_dirty_extent(struct btrfs_trans_handle *trans,
+int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
 		struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes,
 		gfp_t gfp_flag)
 {
@@ -1502,7 +1502,7 @@ int btrfs_qgroup_insert_dirty_extent(struct btrfs_trans_handle *trans,
 	record->old_roots = NULL;
 
 	spin_lock(&delayed_refs->lock);
-	ret = btrfs_qgroup_insert_dirty_extent_nolock(fs_info, delayed_refs,
+	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs,
 						      record);
 	spin_unlock(&delayed_refs->lock);
 	if (ret > 0)
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index a72bf21..9303e09 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -93,8 +93,8 @@ struct btrfs_delayed_extent_op;
 int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
 					 struct btrfs_fs_info *fs_info);
 /*
- * Insert one dirty extent record into @delayed_refs, informing qgroup to
- * account that extent at commit trans time.
+ * Inform qgroup to trace one dirty extent, whose info is recorded in
+ * @record, so that qgroup can account it at commit trans time.
  *
  * No lock version, caller must acquire delayed ref lock and allocate memory.
  *
@@ -102,14 +102,15 @@ int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
  * Return >0 for existing record, caller can free @record safely.
  * Error is not possible
  */
-int btrfs_qgroup_insert_dirty_extent_nolock(
+int btrfs_qgroup_trace_extent_nolock(
 		struct btrfs_fs_info *fs_info,
 		struct btrfs_delayed_ref_root *delayed_refs,
 		struct btrfs_qgroup_extent_record *record);
 
 /*
- * Insert one dirty extent record into @delayed_refs, informing qgroup to
- * account that extent at commit trans time.
+ * Inform qgroup to trace one dirty extent, specified by @bytenr
+ * and @num_bytes, so that qgroup can account it at commit trans
+ * time.
  *
  * Better encapsulated version.
  *
@@ -117,7 +118,7 @@ int btrfs_qgroup_insert_dirty_extent_nolock(
  * Return <0 for error, like memory allocation failure or invalid parameter
  * (NULL trans)
  */
-int btrfs_qgroup_insert_dirty_extent(struct btrfs_trans_handle *trans,
+int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
 		struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes,
 		gfp_t gfp_flag);
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0ec8ffa..33cde30 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4005,7 +4005,7 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans,
 		if (btrfs_file_extent_type(path->nodes[0], fi) !=
 				BTRFS_FILE_EXTENT_REG)
 			goto next;
-		ret = btrfs_qgroup_insert_dirty_extent(trans, fs_info,
+		ret = btrfs_qgroup_trace_extent(trans, fs_info,
 			btrfs_file_extent_disk_bytenr(path->nodes[0], fi),
 			btrfs_file_extent_disk_num_bytes(path->nodes[0], fi),
 			GFP_NOFS);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 688df71..8ad7665 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -689,7 +689,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 		 * as the owner of the file extent changed from log tree
 		 * (doesn't affect qgroup) to fs/file tree(affects qgroup)
 		 */
-		ret = btrfs_qgroup_insert_dirty_extent(trans, root->fs_info,
+		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
 				btrfs_file_extent_disk_bytenr(eb, item),
 				btrfs_file_extent_disk_num_bytes(eb, item),
 				GFP_NOFS);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index e030d6f..e61bbc3 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1406,7 +1406,7 @@ DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_account_extents,
 	TP_ARGS(fs_info, rec)
 );
 
-DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_insert_dirty_extent,
+DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_trace_extent,
 
 	TP_PROTO(struct btrfs_fs_info *fs_info,
 		 struct btrfs_qgroup_extent_record *rec),
-- 
2.10.0





* [PATCH 3/4] btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
  2016-10-18  1:31 [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
  2016-10-18  1:31 ` [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works Qu Wenruo
  2016-10-18  1:31 ` [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps Qu Wenruo
@ 2016-10-18  1:31 ` Qu Wenruo
  2016-11-03 18:50   ` Goldwyn Rodrigues
  2016-10-18  1:31 ` [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing Qu Wenruo
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2016-10-18  1:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: rgoldwyn, rgoldwyn

Move account_shared_subtree() to qgroup.c and rename it to
btrfs_qgroup_trace_subtree().

Do the same thing for account_leaf_items() and rename it to
btrfs_qgroup_trace_leaf_items().

Since all these functions are used only by qgroup, moving them to
qgroup.c and exporting them is more appropriate.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 220 +------------------------------------------------
 fs/btrfs/qgroup.c      | 211 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h      |  23 ++++++
 3 files changed, 237 insertions(+), 217 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 024eb5d..f7aa49d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8535,220 +8535,6 @@ reada:
 	wc->reada_slot = slot;
 }
 
-static int account_leaf_items(struct btrfs_trans_handle *trans,
-			      struct btrfs_root *root,
-			      struct extent_buffer *eb)
-{
-	int nr = btrfs_header_nritems(eb);
-	int i, extent_type, ret;
-	struct btrfs_key key;
-	struct btrfs_file_extent_item *fi;
-	u64 bytenr, num_bytes;
-
-	/* We can be called directly from walk_up_proc() */
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
-		return 0;
-
-	for (i = 0; i < nr; i++) {
-		btrfs_item_key_to_cpu(eb, &key, i);
-
-		if (key.type != BTRFS_EXTENT_DATA_KEY)
-			continue;
-
-		fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
-		/* filter out non qgroup-accountable extents  */
-		extent_type = btrfs_file_extent_type(eb, fi);
-
-		if (extent_type == BTRFS_FILE_EXTENT_INLINE)
-			continue;
-
-		bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
-		if (!bytenr)
-			continue;
-
-		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
-
-		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
-						bytenr, num_bytes, GFP_NOFS);
-		if (ret)
-			return ret;
-	}
-	return 0;
-}
-
-/*
- * Walk up the tree from the bottom, freeing leaves and any interior
- * nodes which have had all slots visited. If a node (leaf or
- * interior) is freed, the node above it will have it's slot
- * incremented. The root node will never be freed.
- *
- * At the end of this function, we should have a path which has all
- * slots incremented to the next position for a search. If we need to
- * read a new node it will be NULL and the node above it will have the
- * correct slot selected for a later read.
- *
- * If we increment the root nodes slot counter past the number of
- * elements, 1 is returned to signal completion of the search.
- */
-static int adjust_slots_upwards(struct btrfs_root *root,
-				struct btrfs_path *path, int root_level)
-{
-	int level = 0;
-	int nr, slot;
-	struct extent_buffer *eb;
-
-	if (root_level == 0)
-		return 1;
-
-	while (level <= root_level) {
-		eb = path->nodes[level];
-		nr = btrfs_header_nritems(eb);
-		path->slots[level]++;
-		slot = path->slots[level];
-		if (slot >= nr || level == 0) {
-			/*
-			 * Don't free the root -  we will detect this
-			 * condition after our loop and return a
-			 * positive value for caller to stop walking the tree.
-			 */
-			if (level != root_level) {
-				btrfs_tree_unlock_rw(eb, path->locks[level]);
-				path->locks[level] = 0;
-
-				free_extent_buffer(eb);
-				path->nodes[level] = NULL;
-				path->slots[level] = 0;
-			}
-		} else {
-			/*
-			 * We have a valid slot to walk back down
-			 * from. Stop here so caller can process these
-			 * new nodes.
-			 */
-			break;
-		}
-
-		level++;
-	}
-
-	eb = path->nodes[root_level];
-	if (path->slots[root_level] >= btrfs_header_nritems(eb))
-		return 1;
-
-	return 0;
-}
-
-/*
- * root_eb is the subtree root and is locked before this function is called.
- */
-static int account_shared_subtree(struct btrfs_trans_handle *trans,
-				  struct btrfs_root *root,
-				  struct extent_buffer *root_eb,
-				  u64 root_gen,
-				  int root_level)
-{
-	int ret = 0;
-	int level;
-	struct extent_buffer *eb = root_eb;
-	struct btrfs_path *path = NULL;
-
-	BUG_ON(root_level < 0 || root_level > BTRFS_MAX_LEVEL);
-	BUG_ON(root_eb == NULL);
-
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
-		return 0;
-
-	if (!extent_buffer_uptodate(root_eb)) {
-		ret = btrfs_read_buffer(root_eb, root_gen);
-		if (ret)
-			goto out;
-	}
-
-	if (root_level == 0) {
-		ret = account_leaf_items(trans, root, root_eb);
-		goto out;
-	}
-
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
-	/*
-	 * Walk down the tree.  Missing extent blocks are filled in as
-	 * we go. Metadata is accounted every time we read a new
-	 * extent block.
-	 *
-	 * When we reach a leaf, we account for file extent items in it,
-	 * walk back up the tree (adjusting slot pointers as we go)
-	 * and restart the search process.
-	 */
-	extent_buffer_get(root_eb); /* For path */
-	path->nodes[root_level] = root_eb;
-	path->slots[root_level] = 0;
-	path->locks[root_level] = 0; /* so release_path doesn't try to unlock */
-walk_down:
-	level = root_level;
-	while (level >= 0) {
-		if (path->nodes[level] == NULL) {
-			int parent_slot;
-			u64 child_gen;
-			u64 child_bytenr;
-
-			/* We need to get child blockptr/gen from
-			 * parent before we can read it. */
-			eb = path->nodes[level + 1];
-			parent_slot = path->slots[level + 1];
-			child_bytenr = btrfs_node_blockptr(eb, parent_slot);
-			child_gen = btrfs_node_ptr_generation(eb, parent_slot);
-
-			eb = read_tree_block(root, child_bytenr, child_gen);
-			if (IS_ERR(eb)) {
-				ret = PTR_ERR(eb);
-				goto out;
-			} else if (!extent_buffer_uptodate(eb)) {
-				free_extent_buffer(eb);
-				ret = -EIO;
-				goto out;
-			}
-
-			path->nodes[level] = eb;
-			path->slots[level] = 0;
-
-			btrfs_tree_read_lock(eb);
-			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
-			path->locks[level] = BTRFS_READ_LOCK_BLOCKING;
-
-			ret = btrfs_qgroup_trace_extent(trans,
-					root->fs_info, child_bytenr,
-					root->nodesize, GFP_NOFS);
-			if (ret)
-				goto out;
-		}
-
-		if (level == 0) {
-			ret = account_leaf_items(trans, root, path->nodes[level]);
-			if (ret)
-				goto out;
-
-			/* Nonzero return here means we completed our search */
-			ret = adjust_slots_upwards(root, path, root_level);
-			if (ret)
-				break;
-
-			/* Restart search with new slots */
-			goto walk_down;
-		}
-
-		level--;
-	}
-
-	ret = 0;
-out:
-	btrfs_free_path(path);
-
-	return ret;
-}
-
 /*
  * helper to process tree block while walking down the tree.
  *
@@ -8977,8 +8763,8 @@ skip:
 		}
 
 		if (need_account) {
-			ret = account_shared_subtree(trans, root, next,
-						     generation, level - 1);
+			ret = btrfs_qgroup_trace_subtree(trans, root, next,
+							 generation, level - 1);
 			if (ret) {
 				btrfs_err_rl(root->fs_info,
 					     "Error %d accounting shared subtree. Quota is out of sync, rescan required.",
@@ -9075,7 +8861,7 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
 			else
 				ret = btrfs_dec_ref(trans, root, eb, 0);
 			BUG_ON(ret); /* -ENOMEM */
-			ret = account_leaf_items(trans, root, eb);
+			ret = btrfs_qgroup_trace_leaf_items(trans, root, eb);
 			if (ret) {
 				btrfs_err_rl(root->fs_info,
 					     "error %d accounting leaf items. Quota is out of sync, rescan required.",
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e73eea3..e97f304 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1510,6 +1510,217 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root,
+				  struct extent_buffer *eb)
+{
+	int nr = btrfs_header_nritems(eb);
+	int i, extent_type, ret;
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *fi;
+	u64 bytenr, num_bytes;
+
+	/* We can be called directly from walk_up_proc() */
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
+		return 0;
+
+	for (i = 0; i < nr; i++) {
+		btrfs_item_key_to_cpu(eb, &key, i);
+
+		if (key.type != BTRFS_EXTENT_DATA_KEY)
+			continue;
+
+		fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
+		/* filter out non qgroup-accountable extents  */
+		extent_type = btrfs_file_extent_type(eb, fi);
+
+		if (extent_type == BTRFS_FILE_EXTENT_INLINE)
+			continue;
+
+		bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
+		if (!bytenr)
+			continue;
+
+		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
+
+		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
+						bytenr, num_bytes, GFP_NOFS);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+/*
+ * Walk up the tree from the bottom, freeing leaves and any interior
+ * nodes which have had all slots visited. If a node (leaf or
+ * interior) is freed, the node above it will have it's slot
+ * incremented. The root node will never be freed.
+ *
+ * At the end of this function, we should have a path which has all
+ * slots incremented to the next position for a search. If we need to
+ * read a new node it will be NULL and the node above it will have the
+ * correct slot selected for a later read.
+ *
+ * If we increment the root nodes slot counter past the number of
+ * elements, 1 is returned to signal completion of the search.
+ */
+static int adjust_slots_upwards(struct btrfs_root *root,
+				struct btrfs_path *path, int root_level)
+{
+	int level = 0;
+	int nr, slot;
+	struct extent_buffer *eb;
+
+	if (root_level == 0)
+		return 1;
+
+	while (level <= root_level) {
+		eb = path->nodes[level];
+		nr = btrfs_header_nritems(eb);
+		path->slots[level]++;
+		slot = path->slots[level];
+		if (slot >= nr || level == 0) {
+			/*
+			 * Don't free the root -  we will detect this
+			 * condition after our loop and return a
+			 * positive value for caller to stop walking the tree.
+			 */
+			if (level != root_level) {
+				btrfs_tree_unlock_rw(eb, path->locks[level]);
+				path->locks[level] = 0;
+
+				free_extent_buffer(eb);
+				path->nodes[level] = NULL;
+				path->slots[level] = 0;
+			}
+		} else {
+			/*
+			 * We have a valid slot to walk back down
+			 * from. Stop here so caller can process these
+			 * new nodes.
+			 */
+			break;
+		}
+
+		level++;
+	}
+
+	eb = path->nodes[root_level];
+	if (path->slots[root_level] >= btrfs_header_nritems(eb))
+		return 1;
+
+	return 0;
+}
+
+int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *root,
+			       struct extent_buffer *root_eb,
+			       u64 root_gen, int root_level)
+{
+	int ret = 0;
+	int level;
+	struct extent_buffer *eb = root_eb;
+	struct btrfs_path *path = NULL;
+
+	BUG_ON(root_level < 0 || root_level > BTRFS_MAX_LEVEL);
+	BUG_ON(root_eb == NULL);
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
+		return 0;
+
+	if (!extent_buffer_uptodate(root_eb)) {
+		ret = btrfs_read_buffer(root_eb, root_gen);
+		if (ret)
+			goto out;
+	}
+
+	if (root_level == 0) {
+		ret = btrfs_qgroup_trace_leaf_items(trans, root, root_eb);
+		goto out;
+	}
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	/*
+	 * Walk down the tree.  Missing extent blocks are filled in as
+	 * we go. Metadata is accounted every time we read a new
+	 * extent block.
+	 *
+	 * When we reach a leaf, we account for file extent items in it,
+	 * walk back up the tree (adjusting slot pointers as we go)
+	 * and restart the search process.
+	 */
+	extent_buffer_get(root_eb); /* For path */
+	path->nodes[root_level] = root_eb;
+	path->slots[root_level] = 0;
+	path->locks[root_level] = 0; /* so release_path doesn't try to unlock */
+walk_down:
+	level = root_level;
+	while (level >= 0) {
+		if (path->nodes[level] == NULL) {
+			int parent_slot;
+			u64 child_gen;
+			u64 child_bytenr;
+
+			/* We need to get child blockptr/gen from
+			 * parent before we can read it. */
+			eb = path->nodes[level + 1];
+			parent_slot = path->slots[level + 1];
+			child_bytenr = btrfs_node_blockptr(eb, parent_slot);
+			child_gen = btrfs_node_ptr_generation(eb, parent_slot);
+
+			eb = read_tree_block(root, child_bytenr, child_gen);
+			if (IS_ERR(eb)) {
+				ret = PTR_ERR(eb);
+				goto out;
+			} else if (!extent_buffer_uptodate(eb)) {
+				free_extent_buffer(eb);
+				ret = -EIO;
+				goto out;
+			}
+
+			path->nodes[level] = eb;
+			path->slots[level] = 0;
+
+			btrfs_tree_read_lock(eb);
+			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+			path->locks[level] = BTRFS_READ_LOCK_BLOCKING;
+
+			ret = btrfs_qgroup_trace_extent(trans,
+					root->fs_info, child_bytenr,
+					root->nodesize, GFP_NOFS);
+			if (ret)
+				goto out;
+		}
+
+		if (level == 0) {
+			ret = btrfs_qgroup_trace_leaf_items(trans, root,
+					path->nodes[level]);
+			if (ret)
+				goto out;
+
+			/* Nonzero return here means we completed our search */
+			ret = adjust_slots_upwards(root, path, root_level);
+			if (ret)
+				break;
+
+			/* Restart search with new slots */
+			goto walk_down;
+		}
+
+		level--;
+	}
+
+	ret = 0;
+out:
+	btrfs_free_path(path);
+
+	return ret;
+}
+
 #define UPDATE_NEW	0
 #define UPDATE_OLD	1
 /*
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 9303e09..99c879d 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -122,6 +122,29 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
 		struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes,
 		gfp_t gfp_flag);
 
+/*
+ * Inform qgroup to trace all leaf items of data
+ *
+ * Return 0 for success
+ * Return <0 for error(ENOMEM)
+ */
+int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root,
+				  struct extent_buffer *eb);
+/*
+ * Inform qgroup to trace a whole subtree, including all its child tree
+ * blocks and data.
+ * The root tree block is specified by @root_eb.
+ *
+ * Normally used by relocation(tree block swap) and subvolume deletion.
+ *
+ * Return 0 for success
+ * Return <0 for error(ENOMEM or tree search error)
+ */
+int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *root,
+			       struct extent_buffer *root_eb,
+			       u64 root_gen, int root_level);
 int
 btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans,
 			    struct btrfs_fs_info *fs_info,
-- 
2.10.0





* [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
  2016-10-18  1:31 [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
                   ` (2 preceding siblings ...)
  2016-10-18  1:31 ` [PATCH 3/4] btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c Qu Wenruo
@ 2016-10-18  1:31 ` Qu Wenruo
  2016-11-03 18:50   ` Goldwyn Rodrigues
  2016-10-28  0:33 ` [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
  2016-11-07 17:59 ` David Sterba
  5 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2016-10-18  1:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: rgoldwyn, rgoldwyn

Commit 62b99540a1d91e464 (btrfs: relocation: Fix leaking qgroups numbers
on data extents) only partially fixes the problem.

The previous fix is to trace all new data extents at transaction commit
time when balance finishes.

However, balance is not done in one large transaction; every path
replacement can happen in its own transaction.
This makes the fix useless if a transaction commits during relocation.

For example:
relocate_block_group()
|-merge_reloc_roots()
|  |- merge_reloc_root()
|     |- btrfs_start_transaction()         <- Trans X
|     |- replace_path()                    <- Cause leak
|     |- btrfs_end_transaction_throttle()  <- Trans X commits here
|     |                                       Leak not fixed
|     |
|     |- btrfs_start_transaction()         <- Trans Y
|     |- replace_path()                    <- Cause leak
|     |- btrfs_end_transaction_throttle()  <- Trans Y ends
|                                             but not committed
|-btrfs_join_transaction()                 <- Still trans Y
|-qgroup_fix()                             <- Only fixes data leak
|                                             in trans Y
|-btrfs_commit_transaction()               <- Trans Y commits

In that case, the qgroup fixup can only fix the data leak in trans Y;
the data leak in trans X remains unfixed.

So the correct fix must happen in the same transaction as
replace_path().

This patch fixes the problem by tracing both subtrees of the tree block
swap, ensuring the tracing happens in the same transaction as
replace_path(), so the numbers no longer leak.

Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/relocation.c | 119 ++++++++++----------------------------------------
 1 file changed, 23 insertions(+), 96 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 33cde30..540f225 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1901,6 +1901,29 @@ again:
 		BUG_ON(ret);
 
 		/*
+		 * Inform qgroup to trace both subtrees.
+		 *
+		 * We must trace both trees.
+		 * 1) Tree reloc subtree
+		 *    If not traced, we will leak data numbers
+		 * 2) Fs subtree
+		 *    If not traced, we will double count old data
+		 *    and tree block numbers, if current trans doesn't free
+		 *    data reloc tree inode.
+		 */
+		ret = btrfs_qgroup_trace_subtree(trans, src, parent,
+				btrfs_header_generation(parent),
+				btrfs_header_level(parent));
+		if (ret < 0)
+			break;
+		ret = btrfs_qgroup_trace_subtree(trans, dest,
+				path->nodes[level],
+				btrfs_header_generation(path->nodes[level]),
+				btrfs_header_level(path->nodes[level]));
+		if (ret < 0)
+			break;
+
+		/*
 		 * swap blocks in fs tree and reloc tree.
 		 */
 		btrfs_set_node_blockptr(parent, slot, new_bytenr);
@@ -3942,90 +3965,6 @@ int prepare_to_relocate(struct reloc_control *rc)
 	return 0;
 }
 
-/*
- * Qgroup fixer for data chunk relocation.
- * The data relocation is done in the following steps
- * 1) Copy data extents into data reloc tree
- * 2) Create tree reloc tree(special snapshot) for related subvolumes
- * 3) Modify file extents in tree reloc tree
- * 4) Merge tree reloc tree with original fs tree, by swapping tree blocks
- *
- * The problem is, data and tree reloc tree are not accounted to qgroup,
- * and 4) will only info qgroup to track tree blocks change, not file extents
- * in the tree blocks.
- *
- * The good news is, related data extents are all in data reloc tree, so we
- * only need to info qgroup to track all file extents in data reloc tree
- * before commit trans.
- */
-static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans,
-					     struct reloc_control *rc)
-{
-	struct btrfs_fs_info *fs_info = rc->extent_root->fs_info;
-	struct inode *inode = rc->data_inode;
-	struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root;
-	struct btrfs_path *path;
-	struct btrfs_key key;
-	int ret = 0;
-
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
-		return 0;
-
-	/*
-	 * Only for stage where we update data pointers the qgroup fix is
-	 * valid.
-	 * For MOVING_DATA stage, we will miss the timing of swapping tree
-	 * blocks, and won't fix it.
-	 */
-	if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found))
-		return 0;
-
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-	key.objectid = btrfs_ino(inode);
-	key.type = BTRFS_EXTENT_DATA_KEY;
-	key.offset = 0;
-
-	ret = btrfs_search_slot(NULL, data_reloc_root, &key, path, 0, 0);
-	if (ret < 0)
-		goto out;
-
-	lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1);
-	while (1) {
-		struct btrfs_file_extent_item *fi;
-
-		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-		if (key.objectid > btrfs_ino(inode))
-			break;
-		if (key.type != BTRFS_EXTENT_DATA_KEY)
-			goto next;
-		fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
-				    struct btrfs_file_extent_item);
-		if (btrfs_file_extent_type(path->nodes[0], fi) !=
-				BTRFS_FILE_EXTENT_REG)
-			goto next;
-		ret = btrfs_qgroup_trace_extent(trans, fs_info,
-			btrfs_file_extent_disk_bytenr(path->nodes[0], fi),
-			btrfs_file_extent_disk_num_bytes(path->nodes[0], fi),
-			GFP_NOFS);
-		if (ret < 0)
-			break;
-next:
-		ret = btrfs_next_item(data_reloc_root, path);
-		if (ret < 0)
-			break;
-		if (ret > 0) {
-			ret = 0;
-			break;
-		}
-	}
-	unlock_extent(&BTRFS_I(inode)->io_tree, 0 , (u64)-1);
-out:
-	btrfs_free_path(path);
-	return ret;
-}
-
 static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 {
 	struct rb_root blocks = RB_ROOT;
@@ -4216,13 +4155,6 @@ restart:
 		err = PTR_ERR(trans);
 		goto out_free;
 	}
-	ret = qgroup_fix_relocated_data_extents(trans, rc);
-	if (ret < 0) {
-		btrfs_abort_transaction(trans, ret);
-		if (!err)
-			err = ret;
-		goto out_free;
-	}
 	btrfs_commit_transaction(trans, rc->extent_root);
 out_free:
 	btrfs_free_block_rsv(rc->extent_root, rc->block_rsv);
@@ -4591,11 +4523,6 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 		err = PTR_ERR(trans);
 		goto out_free;
 	}
-	err = qgroup_fix_relocated_data_extents(trans, rc);
-	if (err < 0) {
-		btrfs_abort_transaction(trans, err);
-		goto out_free;
-	}
 	err = btrfs_commit_transaction(trans, rc->extent_root);
 out_free:
 	kfree(rc);
-- 
2.10.0
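The qgroup_fix_relocated_data_extents() helper removed above, like btrfs_qgroup_trace_leaf_items(), scans the file extent items of a leaf and only hands real on-disk extents to btrfs_qgroup_trace_extent(): inline extents and holes are skipped. A minimal userspace sketch of that filter (the struct and names below are hypothetical stand-ins; the real code reads items out of an extent_buffer):

```c
/* Hypothetical flattened view of a btrfs_file_extent_item. */
enum { EXT_INLINE, EXT_REG, EXT_PREALLOC };

struct fake_extent_item {
	int type;
	unsigned long long disk_bytenr;	/* 0 means a hole */
	unsigned long long disk_num_bytes;
};

/*
 * Sum the bytes that a leaf scan would hand to the trace helper.
 * Inline extents live inside the leaf itself (already covered by
 * metadata tracing) and holes own no extent, so both are skipped.
 */
unsigned long long traceable_bytes(const struct fake_extent_item *items,
				   int nr)
{
	unsigned long long total = 0;
	int i;

	for (i = 0; i < nr; i++) {
		if (items[i].type == EXT_INLINE)
			continue;
		if (!items[i].disk_bytenr)
			continue;
		total += items[i].disk_num_bytes;
	}
	return total;
}
```

Note that unlike the removed fixer, which only handled BTRFS_FILE_EXTENT_REG, the leaf-tracing helper also accounts preallocated extents, since they own on-disk space too.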

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] Qgroup comment enhance and balance fix
  2016-10-18  1:31 [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
                   ` (3 preceding siblings ...)
  2016-10-18  1:31 ` [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing Qu Wenruo
@ 2016-10-28  0:33 ` Qu Wenruo
  2016-10-28 15:03   ` Goldwyn Rodrigues
  2016-10-31 17:00   ` David Sterba
  2016-11-07 17:59 ` David Sterba
  5 siblings, 2 replies; 13+ messages in thread
From: Qu Wenruo @ 2016-10-28  0:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: rgoldwyn, rgoldwyn

Any comments?

In particular, the final patch fixes a long-standing bug.

Thanks,
Qu

At 10/18/2016 09:31 AM, Qu Wenruo wrote:
> The patchset does the following things:
> 1) Enhance comment for qgroup, rename 2 functions
>    Explain how qgroup works, so new developers won't waste too much
>    time digging into the code.
>
>    The qgroup workflow is split into 3 main phases:
>    Reserve, Trace, Account.
>    Functions like btrfs_qgroup_insert_dirty_extent_record() are renamed
>    to btrfs_qgroup_trace_extent(), to follow the "Trace" phase.
>
>    Other function names already follow this schema.
>
> 2) Move account_shared_subtree() and account_leaf_items() to qgroup.c
>    These functions are only used by qgroup, so move them to qgroup.c and
>    rename them to follow the "trace" schema.
>
> 3) Fix the long standing qgroup balance corruption
>    Commit 62b99540a1d91e4 doesn't fix the problem completely.
>    It only handles the case where merge_reloc_roots() finishes within
>    one transaction.
>
>    If a transaction commits during merge_reloc_roots(), the data extents
>    will leak again.
>
>    The true fix is to inform qgroup to trace both subtrees (the tree
>    reloc tree and the destination fs tree) at replace_path() time.
>    Inside replace_path(), a transaction can start and end, so we must
>    make qgroup trace both subtrees.
>
>    Thanks to the previous patches, we can now easily trace a subtree,
>    so the fix is quite simple.
>
>    Knowing the cause also makes it easier to create a pinpoint test
>    case for this bug.
>
> Qu Wenruo (4):
>   btrfs: qgroup: Add comments explaining how btrfs qgroup works
>   btrfs: qgroup: Rename functions to make it follow
>     reserve,trace,account steps
>   btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
>   btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>
>  fs/btrfs/delayed-ref.c       |   2 +-
>  fs/btrfs/extent-tree.c       | 220 +------------------------------------------
>  fs/btrfs/qgroup.c            | 219 +++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/qgroup.h            |  64 +++++++++++--
>  fs/btrfs/relocation.c        | 119 +++++------------------
>  fs/btrfs/tree-log.c          |   2 +-
>  include/trace/events/btrfs.h |   2 +-
>  7 files changed, 302 insertions(+), 326 deletions(-)
>



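The "Trace" phase described in the cover letter boils down to deduplicated record insertion: each dirty extent is keyed by its bytenr, and tracing an already-traced extent must be a harmless no-op. A minimal userspace sketch of that contract, using a sorted list instead of the kernel's rbtree (all names here are hypothetical):

```c
/* Hypothetical stand-in for btrfs_qgroup_extent_record. */
struct extent_record {
	unsigned long long bytenr;
	unsigned long long num_bytes;
	struct extent_record *next;
};

/*
 * Sketch of the trace-phase insert contract: return 0 if @rec was
 * inserted, 1 if a record with the same bytenr already exists (the
 * caller may then free @rec). The list stays sorted by bytenr,
 * mirroring the rbtree ordering under delayed_refs->lock.
 */
int trace_extent_nolock(struct extent_record **head,
			struct extent_record *rec)
{
	struct extent_record **p = head;

	while (*p && (*p)->bytenr < rec->bytenr)
		p = &(*p)->next;
	if (*p && (*p)->bytenr == rec->bytenr)
		return 1;	/* already traced, nothing to do */
	rec->next = *p;
	*p = rec;
	return 0;
}
```

btrfs_qgroup_trace_extent() wraps exactly this pattern: allocate a record, take the delayed-refs lock, attempt the insert, and free the record on a >0 return.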

* Re: [PATCH 0/4] Qgroup comment enhance and balance fix
  2016-10-28  0:33 ` [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
@ 2016-10-28 15:03   ` Goldwyn Rodrigues
  2016-10-31 17:00   ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2016-10-28 15:03 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: rgoldwyn



On 10/27/2016 07:33 PM, Qu Wenruo wrote:
> Any comment?
> 
> Especially the final patch will fix a long standing bug.
> 

While I have tested it and it works, I did not have time to review it.
So you can have my

Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

-- 
Goldwyn


* Re: [PATCH 0/4] Qgroup comment enhance and balance fix
  2016-10-28  0:33 ` [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
  2016-10-28 15:03   ` Goldwyn Rodrigues
@ 2016-10-31 17:00   ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2016-10-31 17:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, rgoldwyn, rgoldwyn

On Fri, Oct 28, 2016 at 08:33:34AM +0800, Qu Wenruo wrote:
> Any comment?
> 
> Especially the final patch will fix a long standing bug.

From my perspective, review is required (but I appreciate Goldwyn's
testing). Who's going to do that, I don't know.


* Re: [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works
  2016-10-18  1:31 ` [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works Qu Wenruo
@ 2016-11-03 18:49   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2016-11-03 18:49 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: rgoldwyn

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

On 10/17/2016 08:31 PM, Qu Wenruo wrote:
> Add explain how btrfs qgroup works.
> 
> Qgroup is split into 3 main phases:
> 1) Reserve
>    To ensure qgroup doesn't exceed its limit
> 
> 2) Trace
>    To inform qgroup which extents to trace
> 
> 3) Account
>    Calculate qgroup number change for each traced extent.
> 
> This should save quite some time for new developers.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/qgroup.h | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index 1bc64c8..a72bf21 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -23,6 +23,34 @@
>  #include "delayed-ref.h"
>  
>  /*
> + * Btrfs qgroup overview
> + *
> + * Btrfs qgroup is split into 3 main parts:
> + * 1) Reserve
> + *    Reserve metadata/data space for incoming operations
> + *    Affect how qgroup limit works
> + *
> + * 2) Trace
> + *    Tell btrfs qgroup to trace dirty extents.
> + *
> + *    Dirty extents including:
> + *    - Newly allocated extents
> + *    - Extents going to be deleted (in this trans)
> + *    - Extents whose owner is going to be modified
> + *
> + *    This is the main part that affects whether qgroup numbers stay
> + *    consistent.
> + *    Btrfs qgroup can trace clean extents without causing any problem,
> + *    but it consumes extra CPU time, so it should be avoided if possible.
> + *
> + * 3) Account
> + *    Btrfs qgroup updates its numbers based on the dirty extents traced
> + *    in previous step.
> + *
> + *    Normally at qgroup rescan and transaction commit time.
> + */
> +
> +/*
>   * Record a dirty extent, and info qgroup to update quota on it
>   * TODO: Use kmem cache to alloc it.
>   */
> 

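The reserve/account phases in the overview comment above can be mocked in a few lines of userspace C. This is only an illustration of the phase ordering under a per-group limit (all names are hypothetical), not the kernel's actual accounting:

```c
/* Hypothetical toy qgroup: one group's limit and usage counters. */
struct toy_qgroup {
	unsigned long long limit;	/* max bytes, 0 == no limit */
	unsigned long long reserved;	/* phase 1: pending reservations */
	unsigned long long used;	/* phase 3: accounted bytes */
};

/* Phase 1: Reserve - fail early if the write would exceed the limit. */
int qgroup_reserve(struct toy_qgroup *qg, unsigned long long bytes)
{
	if (qg->limit && qg->used + qg->reserved + bytes > qg->limit)
		return -1;	/* stands in for -EDQUOT */
	qg->reserved += bytes;
	return 0;
}

/* Phase 3: Account - at commit time, reserved space becomes used. */
void qgroup_account(struct toy_qgroup *qg, unsigned long long bytes)
{
	qg->reserved -= bytes;
	qg->used += bytes;
}
```

Phase 2 (Trace) sits between these two: it only records which extents changed owners, so that the commit-time accounting knows which byte counts to move.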
-- 
Goldwyn


* Re: [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps
  2016-10-18  1:31 ` [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps Qu Wenruo
@ 2016-11-03 18:49   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2016-11-03 18:49 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: rgoldwyn

Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

On 10/17/2016 08:31 PM, Qu Wenruo wrote:
> Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
> btrfs_qgroup_trace_extent(_nolock), according to the new
> reserve/trace/account naming schema.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/delayed-ref.c       |  2 +-
>  fs/btrfs/extent-tree.c       |  6 +++---
>  fs/btrfs/qgroup.c            |  8 ++++----
>  fs/btrfs/qgroup.h            | 13 +++++++------
>  fs/btrfs/relocation.c        |  2 +-
>  fs/btrfs/tree-log.c          |  2 +-
>  include/trace/events/btrfs.h |  2 +-
>  7 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 8d93854..a1cd0da 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -606,7 +606,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>  		qrecord->num_bytes = num_bytes;
>  		qrecord->old_roots = NULL;
>  
> -		if(btrfs_qgroup_insert_dirty_extent_nolock(fs_info,
> +		if(btrfs_qgroup_trace_extent_nolock(fs_info,
>  					delayed_refs, qrecord))
>  			kfree(qrecord);
>  	}
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 210c94a..024eb5d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8568,8 +8568,8 @@ static int account_leaf_items(struct btrfs_trans_handle *trans,
>  
>  		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
>  
> -		ret = btrfs_qgroup_insert_dirty_extent(trans, root->fs_info,
> -				bytenr, num_bytes, GFP_NOFS);
> +		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
> +						bytenr, num_bytes, GFP_NOFS);
>  		if (ret)
>  			return ret;
>  	}
> @@ -8718,7 +8718,7 @@ walk_down:
>  			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
>  			path->locks[level] = BTRFS_READ_LOCK_BLOCKING;
>  
> -			ret = btrfs_qgroup_insert_dirty_extent(trans,
> +			ret = btrfs_qgroup_trace_extent(trans,
>  					root->fs_info, child_bytenr,
>  					root->nodesize, GFP_NOFS);
>  			if (ret)
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 11f4fff..e73eea3 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1450,7 +1450,7 @@ int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> -int btrfs_qgroup_insert_dirty_extent_nolock(struct btrfs_fs_info *fs_info,
> +int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>  				struct btrfs_delayed_ref_root *delayed_refs,
>  				struct btrfs_qgroup_extent_record *record)
>  {
> @@ -1460,7 +1460,7 @@ int btrfs_qgroup_insert_dirty_extent_nolock(struct btrfs_fs_info *fs_info,
>  	u64 bytenr = record->bytenr;
>  
>  	assert_spin_locked(&delayed_refs->lock);
> -	trace_btrfs_qgroup_insert_dirty_extent(fs_info, record);
> +	trace_btrfs_qgroup_trace_extent(fs_info, record);
>  
>  	while (*p) {
>  		parent_node = *p;
> @@ -1479,7 +1479,7 @@ int btrfs_qgroup_insert_dirty_extent_nolock(struct btrfs_fs_info *fs_info,
>  	return 0;
>  }
>  
> -int btrfs_qgroup_insert_dirty_extent(struct btrfs_trans_handle *trans,
> +int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
>  		struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes,
>  		gfp_t gfp_flag)
>  {
> @@ -1502,7 +1502,7 @@ int btrfs_qgroup_insert_dirty_extent(struct btrfs_trans_handle *trans,
>  	record->old_roots = NULL;
>  
>  	spin_lock(&delayed_refs->lock);
> -	ret = btrfs_qgroup_insert_dirty_extent_nolock(fs_info, delayed_refs,
> +	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs,
>  						      record);
>  	spin_unlock(&delayed_refs->lock);
>  	if (ret > 0)
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index a72bf21..9303e09 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -93,8 +93,8 @@ struct btrfs_delayed_extent_op;
>  int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
>  					 struct btrfs_fs_info *fs_info);
>  /*
> - * Insert one dirty extent record into @delayed_refs, informing qgroup to
> - * account that extent at commit trans time.
> + * Inform qgroup to trace one dirty extent; its info is recorded in @record,
> + * so qgroup can account it at commit trans time.
>   *
>   * No lock version, caller must acquire delayed ref lock and allocate memory.
>   *
> @@ -102,14 +102,15 @@ int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
>   * Return >0 for existing record, caller can free @record safely.
>   * Error is not possible
>   */
> -int btrfs_qgroup_insert_dirty_extent_nolock(
> +int btrfs_qgroup_trace_extent_nolock(
>  		struct btrfs_fs_info *fs_info,
>  		struct btrfs_delayed_ref_root *delayed_refs,
>  		struct btrfs_qgroup_extent_record *record);
>  
>  /*
> - * Insert one dirty extent record into @delayed_refs, informing qgroup to
> - * account that extent at commit trans time.
> + * Inform qgroup to trace one dirty extent, specified by @bytenr and
> + * @num_bytes.
> + * So qgroup can account it at commit trans time.
>   *
>   * Better encapsulated version.
>   *
> @@ -117,7 +118,7 @@ int btrfs_qgroup_insert_dirty_extent_nolock(
>   * Return <0 for error, like memory allocation failure or invalid parameter
>   * (NULL trans)
>   */
> -int btrfs_qgroup_insert_dirty_extent(struct btrfs_trans_handle *trans,
> +int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
>  		struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes,
>  		gfp_t gfp_flag);
>  
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0ec8ffa..33cde30 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4005,7 +4005,7 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans,
>  		if (btrfs_file_extent_type(path->nodes[0], fi) !=
>  				BTRFS_FILE_EXTENT_REG)
>  			goto next;
> -		ret = btrfs_qgroup_insert_dirty_extent(trans, fs_info,
> +		ret = btrfs_qgroup_trace_extent(trans, fs_info,
>  			btrfs_file_extent_disk_bytenr(path->nodes[0], fi),
>  			btrfs_file_extent_disk_num_bytes(path->nodes[0], fi),
>  			GFP_NOFS);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 688df71..8ad7665 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -689,7 +689,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
>  		 * as the owner of the file extent changed from log tree
>  		 * (doesn't affect qgroup) to fs/file tree(affects qgroup)
>  		 */
> -		ret = btrfs_qgroup_insert_dirty_extent(trans, root->fs_info,
> +		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
>  				btrfs_file_extent_disk_bytenr(eb, item),
>  				btrfs_file_extent_disk_num_bytes(eb, item),
>  				GFP_NOFS);
> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> index e030d6f..e61bbc3 100644
> --- a/include/trace/events/btrfs.h
> +++ b/include/trace/events/btrfs.h
> @@ -1406,7 +1406,7 @@ DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_account_extents,
>  	TP_ARGS(fs_info, rec)
>  );
>  
> -DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_insert_dirty_extent,
> +DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_trace_extent,
>  
>  	TP_PROTO(struct btrfs_fs_info *fs_info,
>  		 struct btrfs_qgroup_extent_record *rec),
> 

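The _nolock/locked split that this patch renames follows a common kernel pattern: a raw helper that requires the caller to hold the lock (the kernel version asserts it with assert_spin_locked()), plus an encapsulated wrapper that takes the lock itself. A userspace sketch with a pthread mutex standing in for delayed_refs->lock (names are hypothetical):

```c
#include <pthread.h>

static pthread_mutex_t refs_lock = PTHREAD_MUTEX_INITIALIZER;
static int traced_count;

/* No-lock version: the caller must already hold refs_lock. */
static int trace_record_nolock(void)
{
	/* kernel code asserts the lock is held at this point */
	return ++traced_count;
}

/* Encapsulated version: takes the lock around the raw helper. */
int trace_record(void)
{
	int ret;

	pthread_mutex_lock(&refs_lock);
	ret = trace_record_nolock();
	pthread_mutex_unlock(&refs_lock);
	return ret;
}
```

The nolock variant exists so that call sites which already hold delayed_refs->lock (such as add_delayed_ref_head()) can insert a record without a recursive lock acquisition.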
-- 
Goldwyn


* Re: [PATCH 3/4] btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
  2016-10-18  1:31 ` [PATCH 3/4] btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c Qu Wenruo
@ 2016-11-03 18:50   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2016-11-03 18:50 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: rgoldwyn

Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

On 10/17/2016 08:31 PM, Qu Wenruo wrote:
> Move account_shared_subtree() to qgroup.c and rename it to
> btrfs_qgroup_trace_subtree().
> 
> Do the same thing for account_leaf_items() and rename it to
> btrfs_qgroup_trace_leaf_items().
> 
> Since all these functions are only for qgroup, it is more appropriate to
> move them to qgroup.c and export them.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/extent-tree.c | 220 +------------------------------------------------
>  fs/btrfs/qgroup.c      | 211 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/qgroup.h      |  23 ++++++
>  3 files changed, 237 insertions(+), 217 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 024eb5d..f7aa49d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8535,220 +8535,6 @@ reada:
>  	wc->reada_slot = slot;
>  }
>  
> -static int account_leaf_items(struct btrfs_trans_handle *trans,
> -			      struct btrfs_root *root,
> -			      struct extent_buffer *eb)
> -{
> -	int nr = btrfs_header_nritems(eb);
> -	int i, extent_type, ret;
> -	struct btrfs_key key;
> -	struct btrfs_file_extent_item *fi;
> -	u64 bytenr, num_bytes;
> -
> -	/* We can be called directly from walk_up_proc() */
> -	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
> -		return 0;
> -
> -	for (i = 0; i < nr; i++) {
> -		btrfs_item_key_to_cpu(eb, &key, i);
> -
> -		if (key.type != BTRFS_EXTENT_DATA_KEY)
> -			continue;
> -
> -		fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
> -		/* filter out non qgroup-accountable extents  */
> -		extent_type = btrfs_file_extent_type(eb, fi);
> -
> -		if (extent_type == BTRFS_FILE_EXTENT_INLINE)
> -			continue;
> -
> -		bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
> -		if (!bytenr)
> -			continue;
> -
> -		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
> -
> -		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
> -						bytenr, num_bytes, GFP_NOFS);
> -		if (ret)
> -			return ret;
> -	}
> -	return 0;
> -}
> -
> -/*
> - * Walk up the tree from the bottom, freeing leaves and any interior
> - * nodes which have had all slots visited. If a node (leaf or
> - * interior) is freed, the node above it will have it's slot
> - * incremented. The root node will never be freed.
> - *
> - * At the end of this function, we should have a path which has all
> - * slots incremented to the next position for a search. If we need to
> - * read a new node it will be NULL and the node above it will have the
> - * correct slot selected for a later read.
> - *
> - * If we increment the root nodes slot counter past the number of
> - * elements, 1 is returned to signal completion of the search.
> - */
> -static int adjust_slots_upwards(struct btrfs_root *root,
> -				struct btrfs_path *path, int root_level)
> -{
> -	int level = 0;
> -	int nr, slot;
> -	struct extent_buffer *eb;
> -
> -	if (root_level == 0)
> -		return 1;
> -
> -	while (level <= root_level) {
> -		eb = path->nodes[level];
> -		nr = btrfs_header_nritems(eb);
> -		path->slots[level]++;
> -		slot = path->slots[level];
> -		if (slot >= nr || level == 0) {
> -			/*
> -			 * Don't free the root -  we will detect this
> -			 * condition after our loop and return a
> -			 * positive value for caller to stop walking the tree.
> -			 */
> -			if (level != root_level) {
> -				btrfs_tree_unlock_rw(eb, path->locks[level]);
> -				path->locks[level] = 0;
> -
> -				free_extent_buffer(eb);
> -				path->nodes[level] = NULL;
> -				path->slots[level] = 0;
> -			}
> -		} else {
> -			/*
> -			 * We have a valid slot to walk back down
> -			 * from. Stop here so caller can process these
> -			 * new nodes.
> -			 */
> -			break;
> -		}
> -
> -		level++;
> -	}
> -
> -	eb = path->nodes[root_level];
> -	if (path->slots[root_level] >= btrfs_header_nritems(eb))
> -		return 1;
> -
> -	return 0;
> -}
> -
> -/*
> - * root_eb is the subtree root and is locked before this function is called.
> - */
> -static int account_shared_subtree(struct btrfs_trans_handle *trans,
> -				  struct btrfs_root *root,
> -				  struct extent_buffer *root_eb,
> -				  u64 root_gen,
> -				  int root_level)
> -{
> -	int ret = 0;
> -	int level;
> -	struct extent_buffer *eb = root_eb;
> -	struct btrfs_path *path = NULL;
> -
> -	BUG_ON(root_level < 0 || root_level > BTRFS_MAX_LEVEL);
> -	BUG_ON(root_eb == NULL);
> -
> -	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
> -		return 0;
> -
> -	if (!extent_buffer_uptodate(root_eb)) {
> -		ret = btrfs_read_buffer(root_eb, root_gen);
> -		if (ret)
> -			goto out;
> -	}
> -
> -	if (root_level == 0) {
> -		ret = account_leaf_items(trans, root, root_eb);
> -		goto out;
> -	}
> -
> -	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> -
> -	/*
> -	 * Walk down the tree.  Missing extent blocks are filled in as
> -	 * we go. Metadata is accounted every time we read a new
> -	 * extent block.
> -	 *
> -	 * When we reach a leaf, we account for file extent items in it,
> -	 * walk back up the tree (adjusting slot pointers as we go)
> -	 * and restart the search process.
> -	 */
> -	extent_buffer_get(root_eb); /* For path */
> -	path->nodes[root_level] = root_eb;
> -	path->slots[root_level] = 0;
> -	path->locks[root_level] = 0; /* so release_path doesn't try to unlock */
> -walk_down:
> -	level = root_level;
> -	while (level >= 0) {
> -		if (path->nodes[level] == NULL) {
> -			int parent_slot;
> -			u64 child_gen;
> -			u64 child_bytenr;
> -
> -			/* We need to get child blockptr/gen from
> -			 * parent before we can read it. */
> -			eb = path->nodes[level + 1];
> -			parent_slot = path->slots[level + 1];
> -			child_bytenr = btrfs_node_blockptr(eb, parent_slot);
> -			child_gen = btrfs_node_ptr_generation(eb, parent_slot);
> -
> -			eb = read_tree_block(root, child_bytenr, child_gen);
> -			if (IS_ERR(eb)) {
> -				ret = PTR_ERR(eb);
> -				goto out;
> -			} else if (!extent_buffer_uptodate(eb)) {
> -				free_extent_buffer(eb);
> -				ret = -EIO;
> -				goto out;
> -			}
> -
> -			path->nodes[level] = eb;
> -			path->slots[level] = 0;
> -
> -			btrfs_tree_read_lock(eb);
> -			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
> -			path->locks[level] = BTRFS_READ_LOCK_BLOCKING;
> -
> -			ret = btrfs_qgroup_trace_extent(trans,
> -					root->fs_info, child_bytenr,
> -					root->nodesize, GFP_NOFS);
> -			if (ret)
> -				goto out;
> -		}
> -
> -		if (level == 0) {
> -			ret = account_leaf_items(trans, root, path->nodes[level]);
> -			if (ret)
> -				goto out;
> -
> -			/* Nonzero return here means we completed our search */
> -			ret = adjust_slots_upwards(root, path, root_level);
> -			if (ret)
> -				break;
> -
> -			/* Restart search with new slots */
> -			goto walk_down;
> -		}
> -
> -		level--;
> -	}
> -
> -	ret = 0;
> -out:
> -	btrfs_free_path(path);
> -
> -	return ret;
> -}
> -
>  /*
>   * helper to process tree block while walking down the tree.
>   *
> @@ -8977,8 +8763,8 @@ skip:
>  		}
>  
>  		if (need_account) {
> -			ret = account_shared_subtree(trans, root, next,
> -						     generation, level - 1);
> +			ret = btrfs_qgroup_trace_subtree(trans, root, next,
> +							 generation, level - 1);
>  			if (ret) {
>  				btrfs_err_rl(root->fs_info,
>  					     "Error %d accounting shared subtree. Quota is out of sync, rescan required.",
> @@ -9075,7 +8861,7 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
>  			else
>  				ret = btrfs_dec_ref(trans, root, eb, 0);
>  			BUG_ON(ret); /* -ENOMEM */
> -			ret = account_leaf_items(trans, root, eb);
> +			ret = btrfs_qgroup_trace_leaf_items(trans, root, eb);
>  			if (ret) {
>  				btrfs_err_rl(root->fs_info,
>  					     "error %d accounting leaf items. Quota is out of sync, rescan required.",
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index e73eea3..e97f304 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1510,6 +1510,217 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
>  	return 0;
>  }
>  
> +int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
> +				  struct btrfs_root *root,
> +				  struct extent_buffer *eb)
> +{
> +	int nr = btrfs_header_nritems(eb);
> +	int i, extent_type, ret;
> +	struct btrfs_key key;
> +	struct btrfs_file_extent_item *fi;
> +	u64 bytenr, num_bytes;
> +
> +	/* We can be called directly from walk_up_proc() */
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
> +		return 0;
> +
> +	for (i = 0; i < nr; i++) {
> +		btrfs_item_key_to_cpu(eb, &key, i);
> +
> +		if (key.type != BTRFS_EXTENT_DATA_KEY)
> +			continue;
> +
> +		fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
> +		/* filter out non qgroup-accountable extents  */
> +		extent_type = btrfs_file_extent_type(eb, fi);
> +
> +		if (extent_type == BTRFS_FILE_EXTENT_INLINE)
> +			continue;
> +
> +		bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
> +		if (!bytenr)
> +			continue;
> +
> +		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
> +
> +		ret = btrfs_qgroup_trace_extent(trans, root->fs_info,
> +						bytenr, num_bytes, GFP_NOFS);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Walk up the tree from the bottom, freeing leaves and any interior
> + * nodes which have had all slots visited. If a node (leaf or
> + * interior) is freed, the node above it will have its slot
> + * incremented. The root node will never be freed.
> + *
> + * At the end of this function, we should have a path which has all
> + * slots incremented to the next position for a search. If we need to
> + * read a new node it will be NULL and the node above it will have the
> + * correct slot selected for a later read.
> + *
> + * If we increment the root nodes slot counter past the number of
> + * elements, 1 is returned to signal completion of the search.
> + */
> +static int adjust_slots_upwards(struct btrfs_root *root,
> +				struct btrfs_path *path, int root_level)
> +{
> +	int level = 0;
> +	int nr, slot;
> +	struct extent_buffer *eb;
> +
> +	if (root_level == 0)
> +		return 1;
> +
> +	while (level <= root_level) {
> +		eb = path->nodes[level];
> +		nr = btrfs_header_nritems(eb);
> +		path->slots[level]++;
> +		slot = path->slots[level];
> +		if (slot >= nr || level == 0) {
> +			/*
> +			 * Don't free the root -  we will detect this
> +			 * condition after our loop and return a
> +			 * positive value for caller to stop walking the tree.
> +			 */
> +			if (level != root_level) {
> +				btrfs_tree_unlock_rw(eb, path->locks[level]);
> +				path->locks[level] = 0;
> +
> +				free_extent_buffer(eb);
> +				path->nodes[level] = NULL;
> +				path->slots[level] = 0;
> +			}
> +		} else {
> +			/*
> +			 * We have a valid slot to walk back down
> +			 * from. Stop here so caller can process these
> +			 * new nodes.
> +			 */
> +			break;
> +		}
> +
> +		level++;
> +	}
> +
> +	eb = path->nodes[root_level];
> +	if (path->slots[root_level] >= btrfs_header_nritems(eb))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
> +			       struct btrfs_root *root,
> +			       struct extent_buffer *root_eb,
> +			       u64 root_gen, int root_level)
> +{
> +	int ret = 0;
> +	int level;
> +	struct extent_buffer *eb = root_eb;
> +	struct btrfs_path *path = NULL;
> +
> +	BUG_ON(root_level < 0 || root_level > BTRFS_MAX_LEVEL);
> +	BUG_ON(root_eb == NULL);
> +
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags))
> +		return 0;
> +
> +	if (!extent_buffer_uptodate(root_eb)) {
> +		ret = btrfs_read_buffer(root_eb, root_gen);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	if (root_level == 0) {
> +		ret = btrfs_qgroup_trace_leaf_items(trans, root, root_eb);
> +		goto out;
> +	}
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Walk down the tree.  Missing extent blocks are filled in as
> +	 * we go. Metadata is accounted every time we read a new
> +	 * extent block.
> +	 *
> +	 * When we reach a leaf, we account for file extent items in it,
> +	 * walk back up the tree (adjusting slot pointers as we go)
> +	 * and restart the search process.
> +	 */
> +	extent_buffer_get(root_eb); /* For path */
> +	path->nodes[root_level] = root_eb;
> +	path->slots[root_level] = 0;
> +	path->locks[root_level] = 0; /* so release_path doesn't try to unlock */
> +walk_down:
> +	level = root_level;
> +	while (level >= 0) {
> +		if (path->nodes[level] == NULL) {
> +			int parent_slot;
> +			u64 child_gen;
> +			u64 child_bytenr;
> +
> +			/* We need to get child blockptr/gen from
> +			 * parent before we can read it. */
> +			eb = path->nodes[level + 1];
> +			parent_slot = path->slots[level + 1];
> +			child_bytenr = btrfs_node_blockptr(eb, parent_slot);
> +			child_gen = btrfs_node_ptr_generation(eb, parent_slot);
> +
> +			eb = read_tree_block(root, child_bytenr, child_gen);
> +			if (IS_ERR(eb)) {
> +				ret = PTR_ERR(eb);
> +				goto out;
> +			} else if (!extent_buffer_uptodate(eb)) {
> +				free_extent_buffer(eb);
> +				ret = -EIO;
> +				goto out;
> +			}
> +
> +			path->nodes[level] = eb;
> +			path->slots[level] = 0;
> +
> +			btrfs_tree_read_lock(eb);
> +			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
> +			path->locks[level] = BTRFS_READ_LOCK_BLOCKING;
> +
> +			ret = btrfs_qgroup_trace_extent(trans,
> +					root->fs_info, child_bytenr,
> +					root->nodesize, GFP_NOFS);
> +			if (ret)
> +				goto out;
> +		}
> +
> +		if (level == 0) {
> +			ret = btrfs_qgroup_trace_leaf_items(trans, root,
> +					path->nodes[level]);
> +			if (ret)
> +				goto out;
> +
> +			/* Nonzero return here means we completed our search */
> +			ret = adjust_slots_upwards(root, path, root_level);
> +			if (ret)
> +				break;
> +
> +			/* Restart search with new slots */
> +			goto walk_down;
> +		}
> +
> +		level--;
> +	}
> +
> +	ret = 0;
> +out:
> +	btrfs_free_path(path);
> +
> +	return ret;
> +}
> +
>  #define UPDATE_NEW	0
>  #define UPDATE_OLD	1
>  /*
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index 9303e09..99c879d 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -122,6 +122,29 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans,
>  		struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes,
>  		gfp_t gfp_flag);
>  
> +/*
> + * Inform qgroup to trace all leaf items of data
> + *
> + * Return 0 for success
> + * Return <0 for error(ENOMEM)
> + */
> +int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
> +				  struct btrfs_root *root,
> +				  struct extent_buffer *eb);
> +/*
> + * Inform qgroup to trace a whole subtree, including all its child tree
> + * blocks and data.
> + * The root tree block is specified by @root_eb.
> + *
> + * Normally used by relocation(tree block swap) and subvolume deletion.
> + *
> + * Return 0 for success
> + * Return <0 for error(ENOMEM or tree search error)
> + */
> +int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
> +			       struct btrfs_root *root,
> +			       struct extent_buffer *root_eb,
> +			       u64 root_gen, int root_level);
>  int
>  btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans,
>  			    struct btrfs_fs_info *fs_info,
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
  2016-10-18  1:31 ` [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing Qu Wenruo
@ 2016-11-03 18:50   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2016-11-03 18:50 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: rgoldwyn

Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

On 10/17/2016 08:31 PM, Qu Wenruo wrote:
> Commit 62b99540a1d91e464 (btrfs: relocation: Fix leaking qgroups numbers
> on data extents) only fixes the problem partly.
> 
> The previous fix is to trace all new data extents at transaction commit
> time when balance finishes.
> 
> However, balance is not done in one large transaction; every path
> replacement can happen in its own transaction.
> This makes the fix useless if a transaction commits during relocation.
> 
> For example:
> relocate_block_group()
> |-merge_reloc_roots()
> |  |- merge_reloc_root()
> |     |- btrfs_start_transaction()         <- Trans X
> |     |- replace_path()                    <- Cause leak
> |     |- btrfs_end_transaction_throttle()  <- Trans X commits here
> |     |                                       Leak not fixed
> |     |
> |     |- btrfs_start_transaction()         <- Trans Y
> |     |- replace_path()                    <- Cause leak
> |     |- btrfs_end_transaction_throttle()  <- Trans Y ends
> |                                             but not committed
> |-btrfs_join_transaction()                 <- Still trans Y
> |-qgroup_fix()                             <- Only fixes data leak
> |                                             in trans Y
> |-btrfs_commit_transaction()               <- Trans Y commits
> 
> In that case, the qgroup fixup can only fix the data leak in trans Y;
> the data leak in trans X cannot be fixed.
> 
> So the correct fix must happen in the same transaction as
> replace_path().
> 
> This patch fixes it by tracing both subtrees of the tree block swap,
> ensuring the leak and its fix happen in the same transaction, so
> nothing leaks again.
> 
> Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/relocation.c | 119 ++++++++++----------------------------------------
>  1 file changed, 23 insertions(+), 96 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 33cde30..540f225 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1901,6 +1901,29 @@ again:
>  		BUG_ON(ret);
>  
>  		/*
> +		 * Info qgroup to trace both subtrees.
> +		 *
> +		 * We must trace both trees.
> +		 * 1) Tree reloc subtree
> +		 *    If not traced, we will leak data numbers
> +		 * 2) Fs subtree
> +		 *    If not traced, we will double count old data
> +		 *    and tree block numbers, if current trans doesn't free
> +		 *    data reloc tree inode.
> +		 */
> +		ret = btrfs_qgroup_trace_subtree(trans, src, parent,
> +				btrfs_header_generation(parent),
> +				btrfs_header_level(parent));
> +		if (ret < 0)
> +			break;
> +		ret = btrfs_qgroup_trace_subtree(trans, dest,
> +				path->nodes[level],
> +				btrfs_header_generation(path->nodes[level]),
> +				btrfs_header_level(path->nodes[level]));
> +		if (ret < 0)
> +			break;
> +
> +		/*
>  		 * swap blocks in fs tree and reloc tree.
>  		 */
>  		btrfs_set_node_blockptr(parent, slot, new_bytenr);
> @@ -3942,90 +3965,6 @@ int prepare_to_relocate(struct reloc_control *rc)
>  	return 0;
>  }
>  
> -/*
> - * Qgroup fixer for data chunk relocation.
> - * The data relocation is done in the following steps
> - * 1) Copy data extents into data reloc tree
> - * 2) Create tree reloc tree(special snapshot) for related subvolumes
> - * 3) Modify file extents in tree reloc tree
> - * 4) Merge tree reloc tree with original fs tree, by swapping tree blocks
> - *
> - * The problem is, data and tree reloc tree are not accounted to qgroup,
> - * and 4) will only info qgroup to track tree blocks change, not file extents
> - * in the tree blocks.
> - *
> - * The good news is, related data extents are all in data reloc tree, so we
> - * only need to info qgroup to track all file extents in data reloc tree
> - * before commit trans.
> - */
> -static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans,
> -					     struct reloc_control *rc)
> -{
> -	struct btrfs_fs_info *fs_info = rc->extent_root->fs_info;
> -	struct inode *inode = rc->data_inode;
> -	struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root;
> -	struct btrfs_path *path;
> -	struct btrfs_key key;
> -	int ret = 0;
> -
> -	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
> -		return 0;
> -
> -	/*
> -	 * Only for stage where we update data pointers the qgroup fix is
> -	 * valid.
> -	 * For MOVING_DATA stage, we will miss the timing of swapping tree
> -	 * blocks, and won't fix it.
> -	 */
> -	if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found))
> -		return 0;
> -
> -	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> -	key.objectid = btrfs_ino(inode);
> -	key.type = BTRFS_EXTENT_DATA_KEY;
> -	key.offset = 0;
> -
> -	ret = btrfs_search_slot(NULL, data_reloc_root, &key, path, 0, 0);
> -	if (ret < 0)
> -		goto out;
> -
> -	lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1);
> -	while (1) {
> -		struct btrfs_file_extent_item *fi;
> -
> -		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> -		if (key.objectid > btrfs_ino(inode))
> -			break;
> -		if (key.type != BTRFS_EXTENT_DATA_KEY)
> -			goto next;
> -		fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
> -				    struct btrfs_file_extent_item);
> -		if (btrfs_file_extent_type(path->nodes[0], fi) !=
> -				BTRFS_FILE_EXTENT_REG)
> -			goto next;
> -		ret = btrfs_qgroup_trace_extent(trans, fs_info,
> -			btrfs_file_extent_disk_bytenr(path->nodes[0], fi),
> -			btrfs_file_extent_disk_num_bytes(path->nodes[0], fi),
> -			GFP_NOFS);
> -		if (ret < 0)
> -			break;
> -next:
> -		ret = btrfs_next_item(data_reloc_root, path);
> -		if (ret < 0)
> -			break;
> -		if (ret > 0) {
> -			ret = 0;
> -			break;
> -		}
> -	}
> -	unlock_extent(&BTRFS_I(inode)->io_tree, 0 , (u64)-1);
> -out:
> -	btrfs_free_path(path);
> -	return ret;
> -}
> -
>  static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
>  {
>  	struct rb_root blocks = RB_ROOT;
> @@ -4216,13 +4155,6 @@ restart:
>  		err = PTR_ERR(trans);
>  		goto out_free;
>  	}
> -	ret = qgroup_fix_relocated_data_extents(trans, rc);
> -	if (ret < 0) {
> -		btrfs_abort_transaction(trans, ret);
> -		if (!err)
> -			err = ret;
> -		goto out_free;
> -	}
>  	btrfs_commit_transaction(trans, rc->extent_root);
>  out_free:
>  	btrfs_free_block_rsv(rc->extent_root, rc->block_rsv);
> @@ -4591,11 +4523,6 @@ int btrfs_recover_relocation(struct btrfs_root *root)
>  		err = PTR_ERR(trans);
>  		goto out_free;
>  	}
> -	err = qgroup_fix_relocated_data_extents(trans, rc);
> -	if (err < 0) {
> -		btrfs_abort_transaction(trans, err);
> -		goto out_free;
> -	}
>  	err = btrfs_commit_transaction(trans, rc->extent_root);
>  out_free:
>  	kfree(rc);
> 

-- 
Goldwyn


* Re: [PATCH 0/4] Qgroup comment enhance and balance fix
  2016-10-18  1:31 [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
                   ` (4 preceding siblings ...)
  2016-10-28  0:33 ` [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
@ 2016-11-07 17:59 ` David Sterba
  5 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2016-11-07 17:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, rgoldwyn, rgoldwyn

On Tue, Oct 18, 2016 at 09:31:25AM +0800, Qu Wenruo wrote:
> The patchset does the following things:
> 1) Enhance comment for qgroup, rename 2 functions
>    Explain how qgroup works, so new developers won't waste too much
>    time digging into the boring code.
> 
>    The qgroup work flow is split into 3 main phases:
>    Reserve, Trace, Account.
>    And rename functions like btrfs_qgroup_insert_dirty_extent_record()
>    to btrfs_qgroup_trace_extent(), to follow the "Trace" phase.
> 
>    Other function names already follow this schema.
> 
> 2) Move account_shared_subtree() and account_leaf_items() to qgroup.c
>    Such functions are only used by qgroup, so move them to qgroup.c and
>    rename them to follow "trace" schema.
> 
> 3) Fix the long-standing qgroup balance corruption
>    Commit 62b99540a1d91e4 doesn't fix the problem completely.
>    It can only handle the case where merge_reloc_roots() is done
>    entirely in one transaction.
> 
>    If a transaction commits during merge_reloc_roots(), the data
>    extents will leak again.
> 
>    The true fix is to inform qgroup to trace both subtrees (the tree
>    reloc tree and the destination fs tree) at replace_path() time.
>    Inside replace_path(), there is one transaction start and end, so we
>    must make qgroup trace both subtrees there.
> 
>    Thanks to the previous work, we can now easily trace a subtree, so
>    the fix is quite simple.
> 
>    Knowing the cause also makes it easier to create a pinpoint test
>    case for this bug.
> 
> Qu Wenruo (4):
>   btrfs: qgroup: Add comments explaining how btrfs qgroup works
>   btrfs: qgroup: Rename functions to make it follow
>     reserve,trace,account steps
>   btrfs: Expoert and move leaf/subtree qgroup helpers to qgroup.c
>   btrfs: qgroup: Fix qgroup data leaking by using subtree tracing

Added to next queue.


end of thread, other threads:[~2016-11-07 17:59 UTC | newest]

Thread overview: 13+ messages
-- links below jump to the message on this page --
2016-10-18  1:31 [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
2016-10-18  1:31 ` [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works Qu Wenruo
2016-11-03 18:49   ` Goldwyn Rodrigues
2016-10-18  1:31 ` [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps Qu Wenruo
2016-11-03 18:49   ` Goldwyn Rodrigues
2016-10-18  1:31 ` [PATCH 3/4] btrfs: Expoert and move leaf/subtree qgroup helpers to qgroup.c Qu Wenruo
2016-11-03 18:50   ` Goldwyn Rodrigues
2016-10-18  1:31 ` [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing Qu Wenruo
2016-11-03 18:50   ` Goldwyn Rodrigues
2016-10-28  0:33 ` [PATCH 0/4] Qgroup comment enhance and balance fix Qu Wenruo
2016-10-28 15:03   ` Goldwyn Rodrigues
2016-10-31 17:00   ` David Sterba
2016-11-07 17:59 ` David Sterba
