* [PATCH 00/15] Add delayed-refs support to btrfs-progs
@ 2018-06-08 12:47 Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 01/15] btrfs-progs: Remove root argument from pin_down_bytes Nikolay Borisov
                   ` (16 more replies)
  0 siblings, 17 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Hello,

Here is a series which adds support for delayed refs. This is needed to enable
later work on adding freespace tree repair code. Additionally, it results in
more code sharing between kernel/user space.

Patches 1-9 are simple prep patches that remove some arguments which would
cause problems later. They can go in independently of the delayed refs work
and don't introduce any functional changes. Next, patches 10-13 introduce the
infrastructure needed for delayed refs without actually activating it. Patch 14
wires it up by adding the necessary callouts to btrfs_run_delayed_refs and
reworking the extent addition/freeing functions. With all of this done,
patch 15 removes the old code.

This series passes all btrfs-progs fsck, misc and fuzz tests apart from
fuzz-003/007/009, but those also fail without this series, so the failures
are unlikely to be caused by it.

Nikolay Borisov (15):
  btrfs-progs: Remove root argument from pin_down_bytes
  btrfs-progs: Remove root argument from btrfs_del_csums
  btrfs-progs: Add functions to modify the used space by a root
  btrfs-progs: Refactor the root used bytes are updated
  btrfs-progs: Make update_block_group take fs_info instead of root
  btrfs-progs: check: Drop trans/root arguments from free_extent_hook
  btrfs-progs: Remove root argument from __free_extent
  btrfs-progs: Remove root argument from alloc_reserved_tree_block
  btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent
    for btree blocks.
  btrfs-progs: Add boolean to signal whether we are re-initing extent
    tree
  btrfs-progs: Add delayed refs infrastructure
  btrfs-progs: Add __free_extent2 function
  btrfs-progs: Add alloc_reserved_tree_block2 function
  btrfs-progs: Wire up delayed refs
  btrfs-progs: Remove old delayed refs infrastructure

 Makefile              |   3 +-
 btrfs-corrupt-block.c |   2 +-
 check/main.c          |   8 +-
 ctree.c               |  29 ++-
 ctree.h               |  11 +-
 delayed-ref.c         | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++
 delayed-ref.h         | 225 +++++++++++++++++++
 extent-tree.c         | 604 +++++++++++++++++++++++++++++--------------------
 file-item.c           |  20 +-
 kerncompat.h          |   8 +
 transaction.c         |  25 +++
 transaction.h         |   5 +
 12 files changed, 1280 insertions(+), 268 deletions(-)
 create mode 100644 delayed-ref.c
 create mode 100644 delayed-ref.h

-- 
2.7.4


* [PATCH 01/15] btrfs-progs: Remove root argument from pin_down_bytes
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-11  4:41   ` Qu Wenruo
  2018-06-08 12:47 ` [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums Nikolay Borisov
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This argument is used to obtain a reference to fs_info, which can
already be done from the passed trans handle, so use that instead.
This is in preparation for delayed refs support.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
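As a rough illustration of the argument removal described above, the sketch
below uses simplified stand-in structures (struct fs_info, struct root and
struct trans_handle are hypothetical reductions, not the real btrfs-progs
types). It only shows that a helper which used root solely to reach fs_info
can get the same pointer from the transaction handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the btrfs-progs structures. */
struct fs_info { int id; };
struct root { struct fs_info *fs_info; };
struct trans_handle { struct fs_info *fs_info; };

/* Old shape: root is passed in only so that fs_info can be reached. */
struct fs_info *lookup_old(struct trans_handle *trans, struct root *root)
{
	(void)trans;
	return root->fs_info;
}

/* New shape: the transaction handle already carries fs_info. */
struct fs_info *lookup_new(struct trans_handle *trans)
{
	return trans->fs_info;
}
```

Both helpers return the same pointer as long as the root and the handle
belong to the same filesystem, which is the assumption this patch relies on.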
 extent-tree.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 0643815bd41c..cbc022f6cef6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2098,9 +2098,8 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
 	return 0;
 }
 
-static int pin_down_bytes(struct btrfs_trans_handle *trans,
-			  struct btrfs_root *root,
-			  u64 bytenr, u64 num_bytes, int is_data)
+static int pin_down_bytes(struct btrfs_trans_handle *trans, u64 bytenr,
+			  u64 num_bytes, int is_data)
 {
 	int err = 0;
 	struct extent_buffer *buf;
@@ -2108,7 +2107,7 @@ static int pin_down_bytes(struct btrfs_trans_handle *trans,
 	if (is_data)
 		goto pinit;
 
-	buf = btrfs_find_tree_block(root->fs_info, bytenr, num_bytes);
+	buf = btrfs_find_tree_block(trans->fs_info, bytenr, num_bytes);
 	if (!buf)
 		goto pinit;
 
@@ -2360,7 +2359,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 		}
 
 		if (pin) {
-			ret = pin_down_bytes(trans, root, bytenr, num_bytes,
+			ret = pin_down_bytes(trans, bytenr, num_bytes,
 					     is_data);
 			if (ret > 0)
 				mark_free = 1;
-- 
2.7.4


* [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 01/15] btrfs-progs: Remove root argument from pin_down_bytes Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-11  4:46   ` Qu Wenruo
  2018-06-08 12:47 ` [PATCH 03/15] btrfs-progs: Add functions to modify the used space by a root Nikolay Borisov
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

The root argument is not needed, since we can obtain a reference to
fs_info (and from it the csum root) from the passed transaction handle.
Removing it is needed by the delayed refs code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 btrfs-corrupt-block.c |  2 +-
 ctree.h               |  3 +--
 extent-tree.c         |  2 +-
 file-item.c           | 20 ++++++++++----------
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 4fbea26cda20..3add8e63b7bb 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -926,7 +926,7 @@ static int delete_csum(struct btrfs_root *root, u64 bytenr, u64 bytes)
 		return PTR_ERR(trans);
 	}
 
-	ret = btrfs_del_csums(trans, root, bytenr, bytes);
+	ret = btrfs_del_csums(trans, bytenr, bytes);
 	if (ret)
 		fprintf(stderr, "Error deleting csums %d\n", ret);
 	btrfs_commit_transaction(trans, root);
diff --git a/ctree.h b/ctree.h
index de4b1b7e6416..082726238b91 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2752,8 +2752,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
 			u64 ino, u64 parent_ino, u64 *index);
 
 /* file-item.c */
-int btrfs_del_csums(struct btrfs_trans_handle *trans,
-		    struct btrfs_root *root, u64 bytenr, u64 len);
+int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len);
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     u64 objectid, u64 pos, u64 offset,
diff --git a/extent-tree.c b/extent-tree.c
index cbc022f6cef6..c6f09b52800f 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2372,7 +2372,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 
 		if (is_data) {
-			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
+			ret = btrfs_del_csums(trans, bytenr, num_bytes);
 			BUG_ON(ret);
 		}
 
diff --git a/file-item.c b/file-item.c
index 7b0ff3585509..71d4e89f78d1 100644
--- a/file-item.c
+++ b/file-item.c
@@ -394,8 +394,7 @@ static noinline int truncate_one_csum(struct btrfs_root *root,
  * deletes the csum items from the csum tree for a given
  * range of bytes.
  */
-int btrfs_del_csums(struct btrfs_trans_handle *trans,
-		    struct btrfs_root *root, u64 bytenr, u64 len)
+int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len)
 {
 	struct btrfs_path *path;
 	struct btrfs_key key;
@@ -403,11 +402,10 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
 	u64 csum_end;
 	struct extent_buffer *leaf;
 	int ret;
-	u16 csum_size =
-		btrfs_super_csum_size(root->fs_info->super_copy);
-	int blocksize = root->fs_info->sectorsize;
+	u16 csum_size = btrfs_super_csum_size(trans->fs_info->super_copy);
+	int blocksize = trans->fs_info->sectorsize;
+	struct btrfs_root *csum_root = trans->fs_info->csum_root;
 
-	root = root->fs_info->csum_root;
 
 	path = btrfs_alloc_path();
 	if (!path)
@@ -418,7 +416,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		key.offset = end_byte - 1;
 		key.type = BTRFS_EXTENT_CSUM_KEY;
 
-		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+		ret = btrfs_search_slot(trans, csum_root, &key, path, -1, 1);
 		if (ret > 0) {
 			if (path->slots[0] == 0)
 				goto out;
@@ -445,7 +443,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
 
 		/* delete the entire item, it is inside our range */
 		if (key.offset >= bytenr && csum_end <= end_byte) {
-			ret = btrfs_del_item(trans, root, path);
+			ret = btrfs_del_item(trans, csum_root, path);
 			BUG_ON(ret);
 		} else if (key.offset < bytenr && csum_end > end_byte) {
 			unsigned long offset;
@@ -485,12 +483,14 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
 			 * btrfs_split_item returns -EAGAIN when the
 			 * item changed size or key
 			 */
-			ret = btrfs_split_item(trans, root, path, &key, offset);
+			ret = btrfs_split_item(trans, csum_root, path, &key,
+					       offset);
 			BUG_ON(ret && ret != -EAGAIN);
 
 			key.offset = end_byte - 1;
 		} else {
-			ret = truncate_one_csum(root, path, &key, bytenr, len);
+			ret = truncate_one_csum(csum_root, path, &key, bytenr,
+						len);
 			BUG_ON(ret);
 		}
 		btrfs_release_path(path);
-- 
2.7.4


* [PATCH 03/15] btrfs-progs: Add functions to modify the used space by a root
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 01/15] btrfs-progs: Remove root argument from pin_down_bytes Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-11  4:47   ` Qu Wenruo
  2018-06-08 12:47 ` [PATCH 04/15] btrfs-progs: Refactor the root used bytes are updated Nikolay Borisov
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Pull the necessary functions (root_add_used and root_sub_used) from the
kernel, excluding locking. Required to enable integration of delayed refs.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 ctree.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/ctree.c b/ctree.c
index 2c51580fec65..7b74716bf92f 100644
--- a/ctree.c
+++ b/ctree.c
@@ -76,6 +76,18 @@ void add_root_to_dirty_list(struct btrfs_root *root)
 	}
 }
 
+static void root_add_used(struct btrfs_root *root, u32 size)
+{
+        btrfs_set_root_used(&root->root_item,
+                            btrfs_root_used(&root->root_item) + size);
+}
+
+static void root_sub_used(struct btrfs_root *root, u32 size)
+{
+        btrfs_set_root_used(&root->root_item,
+                            btrfs_root_used(&root->root_item) - size);
+}
+
 int btrfs_copy_root(struct btrfs_trans_handle *trans,
 		      struct btrfs_root *root,
 		      struct extent_buffer *buf,
-- 
2.7.4


* [PATCH 04/15] btrfs-progs: Refactor the root used bytes are updated
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (2 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 03/15] btrfs-progs: Add functions to modify the used space by a root Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 05/15] btrfs-progs: Make update_block_group take fs_info instead of root Nikolay Borisov
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Instead of updating the root's used bytes in update_block_group, move the
updating code to the places where we free/allocate a block. This resembles
the current state of the kernel code. This is in prep for delayed refs.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
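A minimal sketch of the accounting pattern described above, assuming a
reduced root structure (struct root and struct root_item here are
hypothetical stand-ins, and alloc_block/free_block are made-up call sites).
The used-bytes counter is adjusted where a block is allocated or freed
rather than inside the block group update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for btrfs_root and its root item. */
struct root_item { uint64_t bytes_used; };
struct root { struct root_item root_item; };

void root_add_used(struct root *root, uint32_t size)
{
	root->root_item.bytes_used += size;
}

void root_sub_used(struct root *root, uint32_t size)
{
	root->root_item.bytes_used -= size;
}

/* Made-up call site: account for the block where it is allocated. */
void alloc_block(struct root *root, uint32_t nodesize)
{
	root_add_used(root, nodesize);
	/* ... the extent allocation itself would follow here ... */
}

/* Made-up call site: drop the accounting where the block is freed. */
void free_block(struct root *root, uint32_t len)
{
	root_sub_used(root, len);
	/* ... the extent would be freed here ... */
}
```

Allocating two 16K blocks and freeing one of them leaves bytes_used at
16384, regardless of when the block group bookkeeping runs.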
 ctree.c       | 13 +++++++++++++
 extent-tree.c |  8 --------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/ctree.c b/ctree.c
index 7b74716bf92f..8f3338b4693a 100644
--- a/ctree.c
+++ b/ctree.c
@@ -734,6 +734,8 @@ static int balance_level(struct btrfs_trans_handle *trans,
 		/* once for the path */
 		free_extent_buffer(mid);
 
+		root_sub_used(root, mid->len);
+
 		ret = btrfs_free_extent(trans, root, mid->start, mid->len,
 					0, root->root_key.objectid,
 					level, 1);
@@ -789,6 +791,8 @@ static int balance_level(struct btrfs_trans_handle *trans,
 			wret = btrfs_del_ptr(root, path, level + 1, pslot + 1);
 			if (wret)
 				ret = wret;
+
+			root_sub_used(root, right->len);
 			wret = btrfs_free_extent(trans, root, bytenr,
 						 blocksize, 0,
 						 root->root_key.objectid,
@@ -835,6 +839,8 @@ static int balance_level(struct btrfs_trans_handle *trans,
 		wret = btrfs_del_ptr(root, path, level + 1, pslot);
 		if (wret)
 			ret = wret;
+
+		root_sub_used(root, blocksize);
 		wret = btrfs_free_extent(trans, root, bytenr, blocksize,
 					 0, root->root_key.objectid,
 					 level, 0);
@@ -1466,6 +1472,8 @@ static int noinline insert_new_root(struct btrfs_trans_handle *trans,
 	btrfs_set_header_backref_rev(c, BTRFS_MIXED_BACKREF_REV);
 	btrfs_set_header_owner(c, root->root_key.objectid);
 
+	root_add_used(root, root->fs_info->nodesize);
+
 	write_extent_buffer(c, root->fs_info->fsid,
 			    btrfs_header_fsid(), BTRFS_FSID_SIZE);
 
@@ -1593,6 +1601,7 @@ static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
 			    btrfs_header_chunk_tree_uuid(split),
 			    BTRFS_UUID_SIZE);
 
+	root_add_used(root, root->fs_info->nodesize);
 
 	copy_extent_buffer(split, c,
 			   btrfs_node_key_ptr_offset(0),
@@ -2175,6 +2184,8 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 			    btrfs_header_chunk_tree_uuid(right),
 			    BTRFS_UUID_SIZE);
 
+	root_add_used(root, root->fs_info->nodesize);
+
 	if (split == 0) {
 		if (mid <= slot) {
 			btrfs_set_header_nritems(right, 0);
@@ -2694,6 +2705,8 @@ static noinline int btrfs_del_leaf(struct btrfs_trans_handle *trans,
 	if (ret)
 		return ret;
 
+	root_sub_used(root, leaf->len);
+
 	ret = btrfs_free_extent(trans, root, leaf->start, leaf->len,
 				0, root->root_key.objectid, 0, 0);
 	return ret;
diff --git a/extent-tree.c b/extent-tree.c
index c6f09b52800f..07b5fb99e8cf 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1932,14 +1932,6 @@ static int update_block_group(struct btrfs_root *root,
 		old_val -= num_bytes;
 	btrfs_set_super_bytes_used(info->super_copy, old_val);
 
-	/* block accounting for root item */
-	old_val = btrfs_root_used(&root->root_item);
-	if (alloc)
-		old_val += num_bytes;
-	else
-		old_val -= num_bytes;
-	btrfs_set_root_used(&root->root_item, old_val);
-
 	while(total) {
 		cache = btrfs_lookup_block_group(info, bytenr);
 		if (!cache) {
-- 
2.7.4


* [PATCH 05/15] btrfs-progs: Make update_block_group take fs_info instead of root
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (3 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 04/15] btrfs-progs: Refactor the root used bytes are updated Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-11  4:49   ` Qu Wenruo
  2018-06-08 12:47 ` [PATCH 06/15] btrfs-progs: check: Drop trans/root arguments from free_extent_hook Nikolay Borisov
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

update_block_group only uses root to reach fs_info, so pass fs_info
directly. This is in preparation for the delayed refs code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 extent-tree.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 07b5fb99e8cf..6e7a19323efc 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1912,12 +1912,10 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-static int update_block_group(struct btrfs_root *root,
-			      u64 bytenr, u64 num_bytes, int alloc,
-			      int mark_free)
+static int update_block_group(struct btrfs_fs_info *info, u64 bytenr,
+			      u64 num_bytes, int alloc, int mark_free)
 {
 	struct btrfs_block_group_cache *cache;
-	struct btrfs_fs_info *info = root->fs_info;
 	u64 total = num_bytes;
 	u64 old_val;
 	u64 byte_in_group;
@@ -2368,7 +2366,8 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 			BUG_ON(ret);
 		}
 
-		update_block_group(root, bytenr, num_bytes, 0, mark_free);
+		update_block_group(trans->fs_info, bytenr, num_bytes, 0,
+				   mark_free);
 	}
 fail:
 	btrfs_free_path(path);
@@ -2730,7 +2729,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_free_path(path);
 
-	ret = update_block_group(root, ins->objectid, fs_info->nodesize,
+	ret = update_block_group(fs_info, ins->objectid, fs_info->nodesize,
 				 1, 0);
 	return ret;
 }
@@ -3413,7 +3412,7 @@ int btrfs_update_block_group(struct btrfs_root *root,
 			     u64 bytenr, u64 num_bytes, int alloc,
 			     int mark_free)
 {
-	return update_block_group(root, bytenr, num_bytes,
+	return update_block_group(root->fs_info, bytenr, num_bytes,
 				  alloc, mark_free);
 }
 
-- 
2.7.4


* [PATCH 06/15] btrfs-progs: check: Drop trans/root arguments from free_extent_hook
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (4 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 05/15] btrfs-progs: Make update_block_group take fs_info instead of root Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-11  4:55   ` Qu Wenruo
  2018-06-08 12:47 ` [PATCH 07/15] btrfs-progs: Remove root argument from __free_extent Nikolay Borisov
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

They are not really needed: all free_extent_hook wants is a pointer to
fs_info, so pass it directly. This is in preparation for the delayed
refs code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
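The hook change described above can be sketched roughly as follows. The
structures are hypothetical reductions, the argument list is shortened, and
hook_calls and count_hook exist only for the illustration. The point is that
the callback stored in fs_info receives fs_info itself, so the caller needs
neither a root nor a transaction argument to invoke it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fs_info;

/* Shortened, hypothetical version of the hook signature. */
typedef int (*free_extent_hook_t)(struct fs_info *fs_info,
				  uint64_t bytenr, uint64_t num_bytes);

struct fs_info {
	free_extent_hook_t free_extent_hook;
	uint64_t hook_calls;	/* illustration only: counts invocations */
};

struct trans_handle { struct fs_info *fs_info; };

/* Example hook: just record that it ran. */
int count_hook(struct fs_info *fs_info, uint64_t bytenr, uint64_t num_bytes)
{
	(void)bytenr;
	(void)num_bytes;
	fs_info->hook_calls++;
	return 0;
}

/* Caller side: everything is reachable from the transaction handle. */
void free_extent(struct trans_handle *trans, uint64_t bytenr,
		 uint64_t num_bytes)
{
	struct fs_info *fs_info = trans->fs_info;

	if (fs_info->free_extent_hook)
		fs_info->free_extent_hook(fs_info, bytenr, num_bytes);
}
```

With the hook unset, free_extent simply skips the call, which mirrors the
NULL check in the real code.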
 check/main.c  | 5 ++---
 ctree.h       | 3 +--
 extent-tree.c | 4 ++--
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/check/main.c b/check/main.c
index 9a1f238800b0..b84903acdb25 100644
--- a/check/main.c
+++ b/check/main.c
@@ -6234,8 +6234,7 @@ static int add_root_to_pending(struct extent_buffer *buf,
  * we're tracking for repair.  This hook makes sure we
  * remove any backrefs for blocks as we are fixing them.
  */
-static int free_extent_hook(struct btrfs_trans_handle *trans,
-			    struct btrfs_root *root,
+static int free_extent_hook(struct btrfs_fs_info *fs_info,
 			    u64 bytenr, u64 num_bytes, u64 parent,
 			    u64 root_objectid, u64 owner, u64 offset,
 			    int refs_to_drop)
@@ -6243,7 +6242,7 @@ static int free_extent_hook(struct btrfs_trans_handle *trans,
 	struct extent_record *rec;
 	struct cache_extent *cache;
 	int is_data;
-	struct cache_tree *extent_cache = root->fs_info->fsck_extent_cache;
+	struct cache_tree *extent_cache = fs_info->fsck_extent_cache;
 
 	is_data = owner >= BTRFS_FIRST_FREE_OBJECTID;
 	cache = lookup_cache_extent(extent_cache, bytenr, num_bytes);
diff --git a/ctree.h b/ctree.h
index 082726238b91..b30a946658ce 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1143,8 +1143,7 @@ struct btrfs_fs_info {
 
 	int transaction_aborted;
 
-	int (*free_extent_hook)(struct btrfs_trans_handle *trans,
-				struct btrfs_root *root,
+	int (*free_extent_hook)(struct btrfs_fs_info *fs_info,
 				u64 bytenr, u64 num_bytes, u64 parent,
 				u64 root_objectid, u64 owner, u64 offset,
 				int refs_to_drop);
diff --git a/extent-tree.c b/extent-tree.c
index 6e7a19323efc..9132cb3f8e15 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2163,8 +2163,8 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 	int skinny_metadata =
 		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
 
-	if (root->fs_info->free_extent_hook) {
-		root->fs_info->free_extent_hook(trans, root, bytenr, num_bytes,
+	if (trans->fs_info->free_extent_hook) {
+		trans->fs_info->free_extent_hook(trans->fs_info, bytenr, num_bytes,
 						parent, root_objectid, owner_objectid,
 						owner_offset, refs_to_drop);
 
-- 
2.7.4


* [PATCH 07/15] btrfs-progs: Remove root argument from __free_extent
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (5 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 06/15] btrfs-progs: check: Drop trans/root arguments from free_extent_hook Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-11  4:58   ` Qu Wenruo
  2018-06-08 12:47 ` [PATCH 08/15] btrfs-progs: Remove root argument from alloc_reserved_tree_block Nikolay Borisov
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This argument is no longer used in this function so remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 extent-tree.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 9132cb3f8e15..c16bd85e92be 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -50,7 +50,6 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 				     u64 flags, struct btrfs_disk_key *key,
 				     int level, struct btrfs_key *ins);
 static int __free_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_root *root,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner_objectid,
 			 u64 owner_offset, int refs_to_drop);
@@ -2141,7 +2140,6 @@ void btrfs_unpin_extent(struct btrfs_fs_info *fs_info,
  * remove an extent from the root, returns 0 on success
  */
 static int __free_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_root *root,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner_objectid,
 			 u64 owner_offset, int refs_to_drop)
@@ -2149,7 +2147,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 
 	struct btrfs_key key;
 	struct btrfs_path *path;
-	struct btrfs_root *extent_root = root->fs_info->extent_root;
+	struct btrfs_root *extent_root = trans->fs_info->extent_root;
 	struct extent_buffer *leaf;
 	struct btrfs_extent_item *ei;
 	struct btrfs_extent_inline_ref *iref;
@@ -2409,8 +2407,7 @@ static int del_pending_extents(struct btrfs_trans_handle *trans)
 
 		if (!test_range_bit(extent_ins, start, end,
 				    EXTENT_LOCKED, 0)) {
-			ret = __free_extent(trans, extent_root,
-					    start, end + 1 - start, 0,
+			ret = __free_extent(trans, start, end + 1 - start, 0,
 					    extent_root->root_key.objectid,
 					    extent_op->level, 0, 1);
 			kfree(extent_op);
-- 
2.7.4


* [PATCH 08/15] btrfs-progs: Remove root argument from alloc_reserved_tree_block
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (6 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 07/15] btrfs-progs: Remove root argument from __free_extent Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 09/15] btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent for btree blocks Nikolay Borisov
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This is not really needed, since we can reference the fs_info from the
passed transaction. This is in preparation for delayed-refs support.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 extent-tree.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index c16bd85e92be..079204ed290f 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -45,7 +45,6 @@ struct pending_extent_op {
 };
 
 static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
-				     struct btrfs_root *root,
 				     u64 root_objectid, u64 generation,
 				     u64 flags, struct btrfs_disk_key *key,
 				     int level, struct btrfs_key *ins);
@@ -2070,7 +2069,8 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
 				key.offset = extent_op->num_bytes;
 				key.type = BTRFS_EXTENT_ITEM_KEY;
 			}
-			ret = alloc_reserved_tree_block(trans, extent_root,
+
+			ret = alloc_reserved_tree_block(trans,
 						extent_root->root_key.objectid,
 						trans->transid,
 						extent_op->flags,
@@ -2677,13 +2677,12 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
 }
 
 static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
-				     struct btrfs_root *root,
 				     u64 root_objectid, u64 generation,
 				     u64 flags, struct btrfs_disk_key *key,
 				     int level, struct btrfs_key *ins)
 {
 	int ret;
-	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_extent_item *extent_item;
 	struct btrfs_tree_block_info *block_info;
 	struct btrfs_extent_inline_ref *iref;
-- 
2.7.4


* [PATCH 09/15] btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent for btree blocks.
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (7 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 08/15] btrfs-progs: Remove root argument from alloc_reserved_tree_block Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-11  5:05   ` Qu Wenruo
  2018-06-08 12:47 ` [PATCH 10/15] btrfs-progs: Add boolean to signal whether we are re-initing extent tree Nikolay Borisov
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Currently some instances of btrfs_free_extent are called with the
last parameter ("offset") set to 1. This makes no sense, since offset
is only used for data extents. I suspect this is a leftover from
95d3f20b51e9 ("Mixed back reference  (FORWARD ROLLING FORMAT CHANGE)"),
since this commit changed the signature of the function from:

-int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root
-                     *root, u64 bytenr, u64 num_bytes, u64 parent,
-                     u64 root_objectid, u64 ref_generation,
-                     u64 owner_objectid, int pin);

to

+int btrfs_free_extent(struct btrfs_trans_handle *trans,
+                     struct btrfs_root *root,
+                     u64 bytenr, u64 num_bytes, u64 parent,
+                     u64 root_objectid, u64 owner, u64 offset);

I.e. the last parameter used to be "pin" and not "offset". So these values
are just leftovers with no semantic meaning. Fix this by passing 0.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
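To illustrate how such a leftover can survive a signature change, here is a
small sketch with two hypothetical, heavily reduced functions
(free_extent_old and free_extent_new are not the real prototypes). A literal
1 in the last position compiles under both signatures, but means "pin" in the
old one and "offset 1" in the new one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of what a call asked for. */
struct free_request {
	uint64_t owner;		/* tree level for btree blocks */
	uint64_t offset;	/* only meaningful for data extents */
	int pin;
};

/* Old shape: the trailing argument is a "pin" flag. */
struct free_request free_extent_old(uint64_t owner, int pin)
{
	struct free_request req = { owner, 0, pin };
	return req;
}

/* New shape: the trailing argument is now "offset". */
struct free_request free_extent_new(uint64_t owner, uint64_t offset)
{
	struct free_request req = { owner, offset, 0 };
	return req;
}
```

A caller that was never updated keeps passing 1 and silently ends up with a
non-zero offset for a tree block; passing 0 restores the intended meaning.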
 ctree.c       | 4 ++--
 extent-tree.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/ctree.c b/ctree.c
index 8f3338b4693a..d8a6883aa85f 100644
--- a/ctree.c
+++ b/ctree.c
@@ -334,7 +334,7 @@ int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 		WARN_ON(btrfs_header_generation(parent) != trans->transid);
 
 		btrfs_free_extent(trans, root, buf->start, buf->len,
-				  0, root->root_key.objectid, level, 1);
+				  0, root->root_key.objectid, level, 0);
 	}
 	if (!list_empty(&buf->recow)) {
 		list_del_init(&buf->recow);
@@ -738,7 +738,7 @@ static int balance_level(struct btrfs_trans_handle *trans,
 
 		ret = btrfs_free_extent(trans, root, mid->start, mid->len,
 					0, root->root_key.objectid,
-					level, 1);
+					level, 0);
 		/* once for the root ptr */
 		free_extent_buffer(mid);
 		return ret;
diff --git a/extent-tree.c b/extent-tree.c
index 079204ed290f..ab57c20d9dee 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2961,7 +2961,7 @@ static int noinline walk_down_tree(struct btrfs_trans_handle *trans,
 			path->slots[*level]++;
 			ret = btrfs_free_extent(trans, root, bytenr, blocksize,
 						parent->start, root_owner,
-						root_gen, *level - 1, 1);
+						root_gen, *level - 1, 0);
 			BUG_ON(ret);
 			continue;
 		}
@@ -3003,7 +3003,7 @@ static int noinline walk_down_tree(struct btrfs_trans_handle *trans,
 	root_gen = btrfs_header_generation(parent);
 	ret = btrfs_free_extent(trans, root, path->nodes[*level]->start,
 				path->nodes[*level]->len, parent->start,
-				root_owner, root_gen, *level, 1);
+				root_owner, root_gen, *level, 0);
 	free_extent_buffer(path->nodes[*level]);
 	path->nodes[*level] = NULL;
 	*level += 1;
@@ -3054,7 +3054,7 @@ static int noinline walk_up_tree(struct btrfs_trans_handle *trans,
 						path->nodes[*level]->start,
 						path->nodes[*level]->len,
 						parent->start, root_owner,
-						root_gen, *level, 1);
+						root_gen, *level, 0);
 			BUG_ON(ret);
 			free_extent_buffer(path->nodes[*level]);
 			path->nodes[*level] = NULL;
-- 
2.7.4


* [PATCH 10/15] btrfs-progs: Add boolean to signal whether we are re-initing extent tree
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (8 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 09/15] btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent for btree blocks Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure Nikolay Borisov
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Add a boolean to record whether the extent tree is being re-initialised
in the current transaction. This is going to be needed by the
delayed refs code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 transaction.c | 1 +
 transaction.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/transaction.c b/transaction.c
index 9619265ef6e8..ecafbb156610 100644
--- a/transaction.c
+++ b/transaction.c
@@ -46,6 +46,7 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
 	fs_info->generation++;
 	h->transid = fs_info->generation;
 	h->blocks_reserved = num_blocks;
+	h->reinit_extent_tree = false;
 	root->last_trans = h->transid;
 	root->commit_root = root->node;
 	extent_buffer_get(root->node);
diff --git a/transaction.h b/transaction.h
index 470ee3de1358..750e329e1ba8 100644
--- a/transaction.h
+++ b/transaction.h
@@ -27,6 +27,7 @@ struct btrfs_trans_handle {
 	u64 transid;
 	u64 alloc_exclude_start;
 	u64 alloc_exclude_nr;
+	bool reinit_extent_tree;
 	unsigned long blocks_reserved;
 	unsigned long blocks_used;
 	struct btrfs_block_group_cache *block_group;
-- 
2.7.4


* [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (9 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 10/15] btrfs-progs: Add boolean to signal whether we are re-initing extent tree Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-08 14:53   ` [PATCH 11/15 v2] " Nikolay Borisov
                     ` (2 more replies)
  2018-06-08 12:47 ` [PATCH 12/15] btrfs-progs: Add __free_extent2 function Nikolay Borisov
                   ` (5 subsequent siblings)
  16 siblings, 3 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This commit pulls those portions of the kernel implementation of
delayed refs which are necessary to have them working in user-space.
I've made the following modifications:

1. Replaced all kmem_cache_alloc calls with kmalloc.

2. Removed all locking-related code, since we are single threaded in
userspace.

3. Removed code which deals with data refs - delayed refs in user space
are going to be used only for cowonly trees.
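
To give a feel for what the imported infrastructure does, the sketch
below reproduces the merge arithmetic of insert_delayed_ref() from the
patch that follows: a new ref that compares equal to a queued one is
folded into it instead of being inserted. The names demo_ref and
demo_merge are hypothetical and exist only for this example; the real
code also maintains ref_add_list and the rbtree, which are left out
here. Tree block refs are expected to carry a ref_mod of 1, so the
larger counts are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Action values as defined in delayed-ref.h in this patch. */
#define BTRFS_ADD_DELAYED_REF  1
#define BTRFS_DROP_DELAYED_REF 2

/* Hypothetical, cut-down stand-in for struct btrfs_delayed_ref_node. */
struct demo_ref {
	int action;   /* BTRFS_ADD_DELAYED_REF or BTRFS_DROP_DELAYED_REF */
	int ref_mod;  /* always positive for an individual ref */
};

/*
 * Fold @incoming into @exist using the same arithmetic as
 * insert_delayed_ref(). Returns true when the two cancel out, in which
 * case the caller would drop @exist from the tree.
 */
static bool demo_merge(struct demo_ref *exist, const struct demo_ref *incoming)
{
	int mod;

	assert(exist->ref_mod > 0 && incoming->ref_mod > 0);

	if (exist->action == incoming->action) {
		/* Same direction: the counts simply add up. */
		mod = incoming->ref_mod;
	} else if (exist->ref_mod < incoming->ref_mod) {
		/* The incoming ref outweighs the queued one: flip the action. */
		exist->action = incoming->action;
		mod = -exist->ref_mod;
		exist->ref_mod = incoming->ref_mod;
	} else {
		/* The queued ref wins or ties: subtract the incoming count. */
		mod = -incoming->ref_mod;
	}
	exist->ref_mod += mod;
	return exist->ref_mod == 0;
}
```

For example, merging a drop of 1 into a queued add of 1 returns true
(the pair cancels), while merging a drop of 3 into a queued add of 1
leaves a drop with ref_mod 2.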

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 Makefile      |   3 +-
 ctree.h       |   3 +
 delayed-ref.c | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 delayed-ref.h | 225 ++++++++++++++++++++++
 extent-tree.c | 228 ++++++++++++++++++++++
 kerncompat.h  |   8 +
 transaction.h |   4 +
 7 files changed, 1078 insertions(+), 1 deletion(-)
 create mode 100644 delayed-ref.c
 create mode 100644 delayed-ref.h

diff --git a/Makefile b/Makefile
index 544410e6440c..9508ad4f11e6 100644
--- a/Makefile
+++ b/Makefile
@@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
 	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
 	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
 	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
-	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
+	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
+	  delayed-ref.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
diff --git a/ctree.h b/ctree.h
index b30a946658ce..d1ea45571d1e 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2812,4 +2812,7 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
 int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
 		    char *dest);
 
+
+/* extent-tree.c */
+int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr);
 #endif
diff --git a/delayed-ref.c b/delayed-ref.c
new file mode 100644
index 000000000000..f3fa50239380
--- /dev/null
+++ b/delayed-ref.c
@@ -0,0 +1,608 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2009 Oracle.  All rights reserved.
+ */
+
+#include "ctree.h"
+#include "btrfs-list.h"
+#include "delayed-ref.h"
+#include "transaction.h"
+
+/*
+ * delayed back reference update tracking.  For subvolume trees
+ * we queue up extent allocations and backref maintenance for
+ * delayed processing.   This avoids deep call chains where we
+ * add extents in the middle of btrfs_search_slot, and it allows
+ * us to buffer up frequently modified backrefs in an rb tree instead
+ * of hammering updates on the extent allocation tree.
+ */
+
+/*
+ * compare two delayed tree backrefs with same bytenr and type
+ */
+static int comp_tree_refs(struct btrfs_delayed_tree_ref *ref1,
+			  struct btrfs_delayed_tree_ref *ref2)
+{
+	if (ref1->node.type == BTRFS_TREE_BLOCK_REF_KEY) {
+		if (ref1->root < ref2->root)
+			return -1;
+		if (ref1->root > ref2->root)
+			return 1;
+	} else {
+		if (ref1->parent < ref2->parent)
+			return -1;
+		if (ref1->parent > ref2->parent)
+			return 1;
+	}
+	return 0;
+}
+
+static int comp_refs(struct btrfs_delayed_ref_node *ref1,
+		     struct btrfs_delayed_ref_node *ref2,
+		     bool check_seq)
+{
+	int ret = 0;
+
+	if (ref1->type < ref2->type)
+		return -1;
+	if (ref1->type > ref2->type)
+		return 1;
+	if (ref1->type == BTRFS_TREE_BLOCK_REF_KEY ||
+	    ref1->type == BTRFS_SHARED_BLOCK_REF_KEY)
+		ret = comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref1),
+				     btrfs_delayed_node_to_tree_ref(ref2));
+	else
+		BUG();
+
+	if (ret)
+		return ret;
+	if (check_seq) {
+		if (ref1->seq < ref2->seq)
+			return -1;
+		if (ref1->seq > ref2->seq)
+			return 1;
+	}
+	return 0;
+}
+
+/* insert a new ref to head ref rbtree */
+static struct btrfs_delayed_ref_head *htree_insert(struct rb_root *root,
+						   struct rb_node *node)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent_node = NULL;
+	struct btrfs_delayed_ref_head *entry;
+	struct btrfs_delayed_ref_head *ins;
+	u64 bytenr;
+
+	ins = rb_entry(node, struct btrfs_delayed_ref_head, href_node);
+	bytenr = ins->bytenr;
+	while (*p) {
+		parent_node = *p;
+		entry = rb_entry(parent_node, struct btrfs_delayed_ref_head,
+				 href_node);
+
+		if (bytenr < entry->bytenr)
+			p = &(*p)->rb_left;
+		else if (bytenr > entry->bytenr)
+			p = &(*p)->rb_right;
+		else
+			return entry;
+	}
+
+	rb_link_node(node, parent_node, p);
+	rb_insert_color(node, root);
+	return NULL;
+}
+
+static struct btrfs_delayed_ref_node* tree_insert(struct rb_root *root,
+		struct btrfs_delayed_ref_node *ins)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *node = &ins->ref_node;
+	struct rb_node *parent_node = NULL;
+	struct btrfs_delayed_ref_node *entry;
+
+	while (*p) {
+		int comp;
+
+		parent_node = *p;
+		entry = rb_entry(parent_node, struct btrfs_delayed_ref_node,
+				 ref_node);
+		comp = comp_refs(ins, entry, true);
+		if (comp < 0)
+			p = &(*p)->rb_left;
+		else if (comp > 0)
+			p = &(*p)->rb_right;
+		else
+			return entry;
+	}
+
+	rb_link_node(node, parent_node, p);
+	rb_insert_color(node, root);
+	return NULL;
+}
+
+/*
+ * find a head entry based on bytenr. This returns the delayed ref
+ * head if it was able to find one, or NULL if nothing was in that spot.
+ * If return_bigger is given, the next bigger entry is returned if no exact
+ * match is found.
+ */
+static struct btrfs_delayed_ref_head *
+find_ref_head(struct rb_root *root, u64 bytenr,
+	      int return_bigger)
+{
+	struct rb_node *n;
+	struct btrfs_delayed_ref_head *entry;
+
+	n = root->rb_node;
+	entry = NULL;
+	while (n) {
+		entry = rb_entry(n, struct btrfs_delayed_ref_head, href_node);
+
+		if (bytenr < entry->bytenr)
+			n = n->rb_left;
+		else if (bytenr > entry->bytenr)
+			n = n->rb_right;
+		else
+			return entry;
+	}
+	if (entry && return_bigger) {
+		if (bytenr > entry->bytenr) {
+			n = rb_next(&entry->href_node);
+			if (!n)
+				n = rb_first(root);
+			entry = rb_entry(n, struct btrfs_delayed_ref_head,
+					 href_node);
+			return entry;
+		}
+		return entry;
+	}
+	return NULL;
+}
+
+static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
+				    struct btrfs_delayed_ref_root *delayed_refs,
+				    struct btrfs_delayed_ref_head *head,
+				    struct btrfs_delayed_ref_node *ref)
+{
+	rb_erase(&ref->ref_node, &head->ref_tree);
+	RB_CLEAR_NODE(&ref->ref_node);
+	if (!list_empty(&ref->add_list))
+		list_del(&ref->add_list);
+	ref->in_tree = 0;
+	btrfs_put_delayed_ref(ref);
+	if (trans->delayed_ref_updates)
+		trans->delayed_ref_updates--;
+}
+
+static bool merge_ref(struct btrfs_trans_handle *trans,
+		      struct btrfs_delayed_ref_root *delayed_refs,
+		      struct btrfs_delayed_ref_head *head,
+		      struct btrfs_delayed_ref_node *ref,
+		      u64 seq)
+{
+	struct btrfs_delayed_ref_node *next;
+	struct rb_node *node = rb_next(&ref->ref_node);
+	bool done = false;
+
+	while (!done && node) {
+		int mod;
+
+		next = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+		node = rb_next(node);
+		if (seq && next->seq >= seq)
+			break;
+		if (comp_refs(ref, next, false))
+			break;
+
+		if (ref->action == next->action) {
+			mod = next->ref_mod;
+		} else {
+			if (ref->ref_mod < next->ref_mod) {
+				swap(ref, next);
+				done = true;
+			}
+			mod = -next->ref_mod;
+		}
+
+		drop_delayed_ref(trans, delayed_refs, head, next);
+		ref->ref_mod += mod;
+		if (ref->ref_mod == 0) {
+			drop_delayed_ref(trans, delayed_refs, head, ref);
+			done = true;
+		} else {
+			/*
+			 * Can't have multiples of the same ref on a tree block.
+			 */
+			WARN_ON(ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
+				ref->type == BTRFS_SHARED_BLOCK_REF_KEY);
+		}
+	}
+
+	return done;
+}
+
+void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
+			      struct btrfs_delayed_ref_root *delayed_refs,
+			      struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_ref_node *ref;
+	struct rb_node *node;
+
+	if (RB_EMPTY_ROOT(&head->ref_tree))
+		return;
+
+	/* We don't have too many refs to merge for data. */
+	if (head->is_data)
+		return;
+
+again:
+	for (node = rb_first(&head->ref_tree); node; node = rb_next(node)) {
+		ref = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+		if (merge_ref(trans, delayed_refs, head, ref, 0))
+			goto again;
+	}
+}
+
+struct btrfs_delayed_ref_head *
+btrfs_select_ref_head(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_head *head;
+	u64 start;
+	bool loop = false;
+
+	delayed_refs = &trans->delayed_refs;
+
+again:
+	start = delayed_refs->run_delayed_start;
+	head = find_ref_head(&delayed_refs->href_root, start, 1);
+	if (!head && !loop) {
+		delayed_refs->run_delayed_start = 0;
+		start = 0;
+		loop = true;
+		head = find_ref_head(&delayed_refs->href_root, start, 1);
+		if (!head)
+			return NULL;
+	} else if (!head && loop) {
+		return NULL;
+	}
+
+	while (head->processing) {
+		struct rb_node *node;
+
+		node = rb_next(&head->href_node);
+		if (!node) {
+			if (loop)
+				return NULL;
+			delayed_refs->run_delayed_start = 0;
+			start = 0;
+			loop = true;
+			goto again;
+		}
+		head = rb_entry(node, struct btrfs_delayed_ref_head,
+				href_node);
+	}
+
+	head->processing = 1;
+	WARN_ON(delayed_refs->num_heads_ready == 0);
+	delayed_refs->num_heads_ready--;
+	delayed_refs->run_delayed_start = head->bytenr +
+		head->num_bytes;
+	return head;
+}
+
+/*
+ * Helper to insert the ref_node to the tail or merge with tail.
+ *
+ * Return 0 for insert.
+ * Return >0 for merge.
+ */
+static int insert_delayed_ref(struct btrfs_trans_handle *trans,
+			      struct btrfs_delayed_ref_root *root,
+			      struct btrfs_delayed_ref_head *href,
+			      struct btrfs_delayed_ref_node *ref)
+{
+	struct btrfs_delayed_ref_node *exist;
+	int mod;
+	int ret = 0;
+
+	exist = tree_insert(&href->ref_tree, ref);
+	if (!exist)
+		goto inserted;
+
+	/* Now we are sure we can merge */
+	ret = 1;
+	if (exist->action == ref->action) {
+		mod = ref->ref_mod;
+	} else {
+		/* Need to change action */
+		if (exist->ref_mod < ref->ref_mod) {
+			exist->action = ref->action;
+			mod = -exist->ref_mod;
+			exist->ref_mod = ref->ref_mod;
+			if (ref->action == BTRFS_ADD_DELAYED_REF)
+				list_add_tail(&exist->add_list,
+					      &href->ref_add_list);
+			else if (ref->action == BTRFS_DROP_DELAYED_REF) {
+				ASSERT(!list_empty(&exist->add_list));
+				list_del(&exist->add_list);
+			} else {
+				ASSERT(0);
+			}
+		} else
+			mod = -ref->ref_mod;
+	}
+	exist->ref_mod += mod;
+
+	/* remove existing tail if its ref_mod is zero */
+	if (exist->ref_mod == 0)
+		drop_delayed_ref(trans, root, href, exist);
+	return ret;
+inserted:
+	if (ref->action == BTRFS_ADD_DELAYED_REF)
+		list_add_tail(&ref->add_list, &href->ref_add_list);
+	root->num_entries++;
+	trans->delayed_ref_updates++;
+	return ret;
+}
+
+/*
+ * helper function to update the accounting in the head ref
+ * existing and update must have the same bytenr
+ */
+static noinline void
+update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
+			 struct btrfs_delayed_ref_head *existing,
+			 struct btrfs_delayed_ref_head *update,
+			 int *old_ref_mod_ret)
+{
+	int old_ref_mod;
+
+	BUG_ON(existing->is_data != update->is_data);
+
+	if (update->must_insert_reserved) {
+		/* if the extent was freed and then
+		 * reallocated before the delayed ref
+		 * entries were processed, we can end up
+		 * with an existing head ref without
+		 * the must_insert_reserved flag set.
+		 * Set it again here
+		 */
+		existing->must_insert_reserved = update->must_insert_reserved;
+
+		/*
+		 * update the num_bytes so we make sure the accounting
+		 * is done correctly
+		 */
+		existing->num_bytes = update->num_bytes;
+
+	}
+
+	if (update->extent_op) {
+		if (!existing->extent_op) {
+			existing->extent_op = update->extent_op;
+		} else {
+			if (update->extent_op->update_key) {
+				memcpy(&existing->extent_op->key,
+				       &update->extent_op->key,
+				       sizeof(update->extent_op->key));
+				existing->extent_op->update_key = true;
+			}
+			if (update->extent_op->update_flags) {
+				existing->extent_op->flags_to_set |=
+					update->extent_op->flags_to_set;
+				existing->extent_op->update_flags = true;
+			}
+			btrfs_free_delayed_extent_op(update->extent_op);
+		}
+	}
+	/*
+	 * update the reference mod on the head to reflect this new operation,
+	 * only need the lock for this case cause we could be processing it
+	 * currently, for refs we just added we know we're a-ok.
+	 */
+	old_ref_mod = existing->total_ref_mod;
+	if (old_ref_mod_ret)
+		*old_ref_mod_ret = old_ref_mod;
+	existing->ref_mod += update->ref_mod;
+	existing->total_ref_mod += update->ref_mod;
+
+}
+
+static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
+				  void *qrecord,
+				  u64 bytenr, u64 num_bytes, u64 ref_root,
+				  u64 reserved, int action, bool is_data,
+				  bool is_system)
+{
+	int count_mod = 1;
+	int must_insert_reserved = 0;
+
+	/* If reserved is provided, it must be a data extent. */
+	BUG_ON(!is_data && reserved);
+
+	/*
+	 * The head node stores the sum of all the mods, so dropping a ref
+	 * should drop the sum in the head node by one.
+	 */
+	if (action == BTRFS_UPDATE_DELAYED_HEAD)
+		count_mod = 0;
+	else if (action == BTRFS_DROP_DELAYED_REF)
+		count_mod = -1;
+
+	/*
+	 * BTRFS_ADD_DELAYED_EXTENT means that we need to update the reserved
+	 * accounting when the extent is finally added, or if a later
+	 * modification deletes the delayed ref without ever inserting the
+	 * extent into the extent allocation tree.  ref->must_insert_reserved
+	 * is the flag used to record that accounting mods are required.
+	 *
+	 * Once we record must_insert_reserved, switch the action to
+	 * BTRFS_ADD_DELAYED_REF because other special casing is not required.
+	 */
+	if (action == BTRFS_ADD_DELAYED_EXTENT)
+		must_insert_reserved = 1;
+	else
+		must_insert_reserved = 0;
+
+	head_ref->refs = 1;
+	head_ref->bytenr = bytenr;
+	head_ref->num_bytes = num_bytes;
+	head_ref->ref_mod = count_mod;
+	head_ref->must_insert_reserved = must_insert_reserved;
+	head_ref->is_data = is_data;
+	head_ref->is_system = is_system;
+	head_ref->ref_tree = RB_ROOT;
+	INIT_LIST_HEAD(&head_ref->ref_add_list);
+	RB_CLEAR_NODE(&head_ref->href_node);
+	head_ref->processing = 0;
+	head_ref->total_ref_mod = count_mod;
+}
+
+/*
+ * helper function to actually insert a head node into the rbtree.
+ * this does all the dirty work in terms of maintaining the correct
+ * overall modification count.
+ */
+static noinline struct btrfs_delayed_ref_head *
+add_delayed_ref_head(struct btrfs_trans_handle *trans,
+		     struct btrfs_delayed_ref_head *head_ref,
+		     void *qrecord,
+		     int action, int *qrecord_inserted_ret,
+		     int *old_ref_mod, int *new_ref_mod)
+{
+	struct btrfs_delayed_ref_head *existing;
+	struct btrfs_delayed_ref_root *delayed_refs;
+
+	delayed_refs = &trans->delayed_refs;
+
+	existing = htree_insert(&delayed_refs->href_root, &head_ref->href_node);
+	if (existing) {
+		update_existing_head_ref(delayed_refs, existing, head_ref, old_ref_mod);
+		/*
+		 * we've updated the existing ref, free the newly
+		 * allocated ref
+		 */
+		kfree(head_ref);
+		head_ref = existing;
+	} else {
+		if (old_ref_mod)
+			*old_ref_mod = 0;
+		delayed_refs->num_heads++;
+		delayed_refs->num_heads_ready++;
+		trans->delayed_ref_updates++;
+	}
+	if (new_ref_mod)
+		*new_ref_mod = head_ref->total_ref_mod;
+
+	return head_ref;
+}
+
+/*
+ * init_delayed_ref_common - Initialize the structure which represents a
+ *			     modification to an extent.
+ *
+ * @fs_info:    Internal to the mounted filesystem mount structure.
+ *
+ * @ref:	The structure which is going to be initialized.
+ *
+ * @bytenr:	The logical address of the extent for which a modification is
+ *		going to be recorded.
+ *
+ * @num_bytes:  Size of the extent whose modification is being recorded.
+ *
+ * @ref_root:	The id of the root where this modification has originated, this
+ *		can be either one of the well-known metadata trees or the
+ *		subvolume id which references this extent.
+ *
+ * @action:	Can be one of BTRFS_ADD_DELAYED_REF/BTRFS_DROP_DELAYED_REF or
+ *		BTRFS_ADD_DELAYED_EXTENT
+ *
+ * @ref_type:	Holds the type of the extent which is being recorded, can be
+ *		one of BTRFS_SHARED_BLOCK_REF_KEY/BTRFS_TREE_BLOCK_REF_KEY
+ *		when recording a metadata extent or BTRFS_SHARED_DATA_REF_KEY/
+ *		BTRFS_EXTENT_DATA_REF_KEY when recording data extent
+ */
+static void init_delayed_ref_common(struct btrfs_fs_info *fs_info,
+				    struct btrfs_delayed_ref_node *ref,
+				    u64 bytenr, u64 num_bytes, u64 ref_root,
+				    int action, u8 ref_type)
+{
+	if (action == BTRFS_ADD_DELAYED_EXTENT)
+		action = BTRFS_ADD_DELAYED_REF;
+
+	ref->refs = 1;
+	ref->bytenr = bytenr;
+	ref->num_bytes = num_bytes;
+	ref->ref_mod = 1;
+	ref->action = action;
+	ref->is_head = 0;
+	ref->in_tree = 1;
+	ref->seq = 0;
+	ref->type = ref_type;
+	RB_CLEAR_NODE(&ref->ref_node);
+	INIT_LIST_HEAD(&ref->add_list);
+}
+
+/*
+ * add a delayed tree ref.  This does all of the accounting required
+ * to make sure the delayed ref is eventually processed before this
+ * transaction commits.
+ */
+int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+			       struct btrfs_trans_handle *trans,
+			       u64 bytenr, u64 num_bytes, u64 parent,
+			       u64 ref_root, int level, int action,
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int *old_ref_mod, int *new_ref_mod)
+{
+	struct btrfs_delayed_tree_ref *ref;
+	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_root *delayed_refs;
+	bool is_system = (ref_root == BTRFS_CHUNK_TREE_OBJECTID);
+	int ret;
+	u8 ref_type;
+
+	BUG_ON(extent_op && extent_op->is_data);
+	ref = kmalloc(sizeof(*ref), GFP_NOFS);
+	if (!ref)
+		return -ENOMEM;
+
+	if (parent)
+		ref_type = BTRFS_SHARED_BLOCK_REF_KEY;
+	else
+		ref_type = BTRFS_TREE_BLOCK_REF_KEY;
+	init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes,
+				ref_root, action, ref_type);
+	ref->root = ref_root;
+	ref->parent = parent;
+	ref->level = level;
+
+	head_ref = kmalloc(sizeof(*head_ref), GFP_NOFS);
+	if (!head_ref)
+		goto free_ref;
+
+	init_delayed_ref_head(head_ref, NULL, bytenr, num_bytes,
+			      ref_root, 0, action, false, is_system);
+	head_ref->extent_op = extent_op;
+
+	delayed_refs = &trans->delayed_refs;
+
+	head_ref = add_delayed_ref_head(trans, head_ref, NULL, action, NULL,
+			old_ref_mod, new_ref_mod);
+
+	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
+
+	if (ret > 0)
+		kfree(ref);
+
+	return 0;
+
+free_ref:
+	kfree(ref);
+
+	return -ENOMEM;
+}
diff --git a/delayed-ref.h b/delayed-ref.h
new file mode 100644
index 000000000000..208551c80452
--- /dev/null
+++ b/delayed-ref.h
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2008 Oracle.  All rights reserved.
+ */
+
+#ifndef BTRFS_DELAYED_REF_H
+#define BTRFS_DELAYED_REF_H
+
+#include "kerncompat.h"
+
+/* these are the possible values of struct btrfs_delayed_ref_node->action */
+#define BTRFS_ADD_DELAYED_REF    1 /* add one backref to the tree */
+#define BTRFS_DROP_DELAYED_REF   2 /* delete one backref from the tree */
+#define BTRFS_ADD_DELAYED_EXTENT 3 /* record a full extent allocation */
+#define BTRFS_UPDATE_DELAYED_HEAD 4 /* not changing ref count on head ref */
+
+struct btrfs_delayed_ref_node {
+	struct rb_node ref_node;
+	/*
+	 * If action is BTRFS_ADD_DELAYED_REF, also link this node to
+	 * ref_head->ref_add_list, then we do not need to iterate the
+	 * whole ref_head->ref_tree to find BTRFS_ADD_DELAYED_REF nodes.
+	 */
+	struct list_head add_list;
+
+	/* the starting bytenr of the extent */
+	u64 bytenr;
+
+	/* the size of the extent */
+	u64 num_bytes;
+
+	/* seq number to keep track of insertion order */
+	u64 seq;
+
+	/* ref count on this data structure */
+	u64 refs;
+
+	/*
+	 * how many refs is this entry adding or deleting.  For
+	 * head refs, this may be a negative number because it is keeping
+	 * track of the total mods done to the reference count.
+	 * For individual refs, this will always be a positive number
+	 *
+	 * It may be more than one, since it is possible for a single
+	 * parent to have more than one ref on an extent
+	 */
+	int ref_mod;
+
+	unsigned int action:8;
+	unsigned int type:8;
+	/* is this node still in the rbtree? */
+	unsigned int is_head:1;
+	unsigned int in_tree:1;
+};
+
+struct btrfs_delayed_extent_op {
+	struct btrfs_disk_key key;
+	u8 level;
+	bool update_key;
+	bool update_flags;
+	bool is_data;
+	u64 flags_to_set;
+};
+
+/*
+ * the head refs are used to hold a lock on a given extent, which allows us
+ * to make sure that only one process is running the delayed refs
+ * at a time for a single extent.  They also store the sum of all the
+ * reference count modifications we've queued up.
+ */
+struct btrfs_delayed_ref_head {
+	u64 bytenr;
+	u64 num_bytes;
+	u64 refs;
+
+	struct rb_root ref_tree;
+	/* accumulate add BTRFS_ADD_DELAYED_REF nodes to this ref_add_list. */
+	struct list_head ref_add_list;
+
+	struct rb_node href_node;
+
+	struct btrfs_delayed_extent_op *extent_op;
+
+	/*
+	 * This is used to track the final ref_mod from all the refs associated
+	 * with this head ref, this is not adjusted as delayed refs are run,
+	 * this is meant to track if we need to do the csum accounting or not.
+	 */
+	int total_ref_mod;
+
+	/*
+	 * This is the current outstanding mod references for this bytenr.  This
+	 * is used with lookup_extent_info to get an accurate reference count
+	 * for a bytenr, so it is adjusted as delayed refs are run so that any
+	 * on disk reference count + ref_mod is accurate.
+	 */
+	int ref_mod;
+
+	/*
+	 * when a new extent is allocated, it is just reserved in memory
+	 * The actual extent isn't inserted into the extent allocation tree
+	 * until the delayed ref is processed.  must_insert_reserved is
+	 * used to flag a delayed ref so the accounting can be updated
+	 * when a full insert is done.
+	 *
+	 * It is possible the extent will be freed before it is ever
+	 * inserted into the extent allocation tree.  In this case
+	 * we need to update the in ram accounting to properly reflect
+	 * the free has happened.
+	 */
+	unsigned int must_insert_reserved:1;
+	unsigned int is_data:1;
+	unsigned int is_system:1;
+	unsigned int processing:1;
+};
+
+struct btrfs_delayed_tree_ref {
+	struct btrfs_delayed_ref_node node;
+	u64 root;
+	u64 parent;
+	int level;
+};
+
+struct btrfs_delayed_data_ref {
+	struct btrfs_delayed_ref_node node;
+	u64 root;
+	u64 parent;
+	u64 objectid;
+	u64 offset;
+};
+
+struct btrfs_delayed_ref_root {
+	/* head ref rbtree */
+	struct rb_root href_root;
+
+	/* dirty extent records */
+	struct rb_root dirty_extent_root;
+
+	/* total number of head nodes in tree */
+	unsigned long num_heads;
+
+	/* total number of head nodes ready for processing */
+	unsigned long num_heads_ready;
+
+	unsigned long num_entries;
+
+	/*
+	 * set when the tree is flushing before a transaction commit,
+	 * used by the throttling code to decide if new updates need
+	 * to be run right away
+	 */
+	int flushing;
+
+	u64 run_delayed_start;
+};
+
+
+static inline struct btrfs_delayed_extent_op *
+btrfs_alloc_delayed_extent_op(void)
+{
+	return kmalloc(sizeof(struct btrfs_delayed_extent_op), GFP_KERNEL);
+}
+
+static inline void
+btrfs_free_delayed_extent_op(struct btrfs_delayed_extent_op *op)
+{
+	if (op)
+		kfree(op);
+}
+
+static inline void btrfs_put_delayed_ref(struct btrfs_delayed_ref_node *ref)
+{
+	WARN_ON(ref->refs == 0);
+	if (--ref->refs == 0) {
+		WARN_ON(ref->in_tree);
+		switch (ref->type) {
+		case BTRFS_TREE_BLOCK_REF_KEY:
+		case BTRFS_SHARED_BLOCK_REF_KEY:
+			kfree(ref);
+			break;
+		case BTRFS_EXTENT_DATA_REF_KEY:
+		case BTRFS_SHARED_DATA_REF_KEY:
+			kfree(ref);
+			break;
+		default:
+			BUG();
+		}
+	}
+}
+
+static inline void btrfs_put_delayed_ref_head(struct btrfs_delayed_ref_head *head)
+{
+	if (--head->refs == 0)
+		kfree(head);
+}
+
+int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+			       struct btrfs_trans_handle *trans,
+			       u64 bytenr, u64 num_bytes, u64 parent,
+			       u64 ref_root, int level, int action,
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int *old_ref_mod, int *new_ref_mod);
+void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
+			      struct btrfs_delayed_ref_root *delayed_refs,
+			      struct btrfs_delayed_ref_head *head);
+
+struct btrfs_delayed_ref_head *
+btrfs_select_ref_head(struct btrfs_trans_handle *trans);
+
+/*
+ * helper functions to cast a node into its container
+ */
+static inline struct btrfs_delayed_tree_ref *
+btrfs_delayed_node_to_tree_ref(struct btrfs_delayed_ref_node *node)
+{
+	return container_of(node, struct btrfs_delayed_tree_ref, node);
+}
+
+static inline struct btrfs_delayed_data_ref *
+btrfs_delayed_node_to_data_ref(struct btrfs_delayed_ref_node *node)
+{
+	return container_of(node, struct btrfs_delayed_data_ref, node);
+}
+
+#endif
diff --git a/extent-tree.c b/extent-tree.c
index ab57c20d9dee..aff00e536c9c 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -4183,3 +4183,231 @@ u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
 
 	return total_added;
 }
+
+static void cleanup_extent_op(struct btrfs_trans_handle *trans,
+			     struct btrfs_fs_info *fs_info,
+			     struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
+
+	if (!extent_op)
+		return;
+	head->extent_op = NULL;
+	btrfs_free_delayed_extent_op(extent_op);
+}
+
+static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
+				      struct btrfs_delayed_ref_head *head)
+{
+	head->processing = 0;
+	delayed_refs->num_heads_ready++;
+}
+
+static int cleanup_ref_head(struct btrfs_trans_handle *trans,
+			    struct btrfs_fs_info *fs_info,
+			    struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+
+	delayed_refs = &trans->delayed_refs;
+
+	cleanup_extent_op(trans, fs_info, head);
+
+	/*
+	 * Need to drop our head ref lock and re-acquire the delayed ref lock
+	 * and then re-check to make sure nobody got added.
+	 */
+	if (!RB_EMPTY_ROOT(&head->ref_tree) || head->extent_op)
+		return 1;
+
+	delayed_refs->num_heads--;
+	rb_erase(&head->href_node, &delayed_refs->href_root);
+	RB_CLEAR_NODE(&head->href_node);
+	--delayed_refs->num_entries;
+
+	if (head->must_insert_reserved)
+		btrfs_pin_extent(fs_info, head->bytenr, head->num_bytes);
+
+	btrfs_put_delayed_ref_head(head);
+	return 0;
+}
+
+static inline struct btrfs_delayed_ref_node *
+select_delayed_ref(struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_ref_node *ref;
+
+	if (RB_EMPTY_ROOT(&head->ref_tree))
+		return NULL;
+	/*
+	 * Select a delayed ref of type BTRFS_ADD_DELAYED_REF first.
+	 * This is to prevent a ref count from going down to zero, which deletes
+	 * the extent item from the extent tree, when there still are references
+	 * to add, which would fail because they would not find the extent item.
+	 */
+	if (!list_empty(&head->ref_add_list))
+		return list_first_entry(&head->ref_add_list,
+					struct btrfs_delayed_ref_node,
+					add_list);
+	ref = rb_entry(rb_first(&head->ref_tree),
+		       struct btrfs_delayed_ref_node, ref_node);
+	ASSERT(list_empty(&ref->add_list));
+	return ref;
+}
+
+
+static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
+				struct btrfs_fs_info *fs_info,
+				struct btrfs_delayed_ref_node *node,
+				struct btrfs_delayed_extent_op *extent_op,
+				int insert_reserved)
+{
+	int ret = 0;
+	struct btrfs_delayed_tree_ref *ref;
+	u64 parent = 0;
+	u64 ref_root = 0;
+
+	ref = btrfs_delayed_node_to_tree_ref(node);
+
+	if (node->type == BTRFS_SHARED_BLOCK_REF_KEY)
+			parent = ref->parent;
+	ref_root = ref->root;
+
+	if (node->ref_mod != 1) {
+		printf("btree block(%llu) has %d references rather than 1: action %d ref_root %llu parent %llu",
+			node->bytenr, node->ref_mod, node->action, ref_root,
+			parent);
+		return -EIO;
+	}
+	if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) {
+		BUG_ON(!extent_op || !extent_op->update_flags);
+		ret = alloc_reserved_tree_block2(trans, node, extent_op);
+	} else if (node->action == BTRFS_DROP_DELAYED_REF) {
+		ret = __free_extent2(trans, node, extent_op);
+	} else {
+		BUG();
+	}
+
+	return ret;
+}
+
+/* helper function to actually process a single delayed ref entry */
+static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
+			       struct btrfs_fs_info *fs_info,
+			       struct btrfs_delayed_ref_node *node,
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int insert_reserved)
+{
+	int ret = 0;
+
+	if (node->type == BTRFS_TREE_BLOCK_REF_KEY ||
+		node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
+		ret = run_delayed_tree_ref(trans, fs_info, node, extent_op,
+					   insert_reserved);
+	} else
+		BUG();
+	return ret;
+}
+
+int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_node *ref;
+	struct btrfs_delayed_ref_head *locked_ref = NULL;
+	struct btrfs_delayed_extent_op *extent_op;
+	int ret;
+	int must_insert_reserved = 0;
+
+	delayed_refs = &trans->delayed_refs;
+	while (1) {
+		if (!locked_ref) {
+			locked_ref = btrfs_select_ref_head(trans);
+			if (!locked_ref)
+				break;
+		}
+		/*
+		 * We need to try and merge add/drops of the same ref since we
+		 * can run into issues with relocate dropping the implicit ref
+		 * and then it being added back again before the drop can
+		 * finish.	If we merged anything we need to re-loop so we can
+		 * get a good ref.
+		 * Or we can get node references of the same type that weren't
+		 * merged when created due to bumps in the tree mod seq, and
+		 * we need to merge them to prevent adding an inline extent
+		 * backref before dropping it (triggering a BUG_ON at
+		 * insert_inline_extent_backref()).
+		 */
+		btrfs_merge_delayed_refs(trans, delayed_refs, locked_ref);
+		ref = select_delayed_ref(locked_ref);
+		/*
+		 * We're done processing refs in this ref_head, clean everything
+		 * up and move on to the next ref_head.
+		 */
+		if (!ref) {
+			ret = cleanup_ref_head(trans, fs_info, locked_ref);
+			if (ret > 0) {
+				/* We dropped our lock, we need to loop. */
+				ret = 0;
+				continue;
+			} else if (ret) {
+				return ret;
+			}
+			locked_ref = NULL;
+			continue;
+		}
+
+		ref->in_tree = 0;
+		rb_erase(&ref->ref_node, &locked_ref->ref_tree);
+		RB_CLEAR_NODE(&ref->ref_node);
+		if (!list_empty(&ref->add_list))
+				list_del(&ref->add_list);
+		/*
+		 * When we play the delayed ref, also correct the ref_mod on
+		 * head
+		 */
+		switch (ref->action) {
+		case BTRFS_ADD_DELAYED_REF:
+		case BTRFS_ADD_DELAYED_EXTENT:
+			locked_ref->ref_mod -= ref->ref_mod;
+			break;
+		case BTRFS_DROP_DELAYED_REF:
+			locked_ref->ref_mod += ref->ref_mod;
+			break;
+		default:
+			WARN_ON(1);
+		}
+		delayed_refs->num_entries--;
+
+		/*
+		 * Record the must_insert_reserved flag before we drop the spin
+		 * lock.
+		 */
+		must_insert_reserved = locked_ref->must_insert_reserved;
+		locked_ref->must_insert_reserved = 0;
+
+		extent_op = locked_ref->extent_op;
+		locked_ref->extent_op = NULL;
+
+		ret = run_one_delayed_ref(trans, fs_info, ref, extent_op,
+					  must_insert_reserved);
+
+		btrfs_free_delayed_extent_op(extent_op);
+		/*
+		 * If we are re-initing extent tree in this transaction
+		 * failures in freeing old roots are expected (because we don't
+		 * have the old extent tree, hence backref resolution will
+		 * return -EIO).
+		 */
+		if (ret && (!trans->reinit_extent_tree ||
+		     ref->action != BTRFS_DROP_DELAYED_REF)) {
+			unselect_delayed_ref_head(delayed_refs, locked_ref);
+			btrfs_put_delayed_ref(ref);
+			return ret;
+		}
+
+		btrfs_put_delayed_ref(ref);
+	}
+
+	return 0;
+}
diff --git a/kerncompat.h b/kerncompat.h
index fa96715fb70c..1a2bc18c3ac2 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -263,6 +263,14 @@ static inline int IS_ERR_OR_NULL(const void *ptr)
 	return !ptr || IS_ERR(ptr);
 }
 
+/**
+ * swap - swap values of @a and @b
+ * @a: first value
+ * @b: second value
+ */
+#define swap(a, b) \
+        do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
+
 /*
  * This looks more complex than it should be. But we need to
  * get the type for the ~ right in round_down (it needs to be
diff --git a/transaction.h b/transaction.h
index 750e329e1ba8..34060252dd5c 100644
--- a/transaction.h
+++ b/transaction.h
@@ -21,6 +21,7 @@
 
 #include "kerncompat.h"
 #include "ctree.h"
+#include "delayed-ref.h"
 
 struct btrfs_trans_handle {
 	struct btrfs_fs_info *fs_info;
@@ -28,9 +29,12 @@ struct btrfs_trans_handle {
 	u64 alloc_exclude_start;
 	u64 alloc_exclude_nr;
 	bool reinit_extent_tree;
+	u64 delayed_ref_updates;
 	unsigned long blocks_reserved;
 	unsigned long blocks_used;
 	struct btrfs_block_group_cache *block_group;
+	struct btrfs_delayed_ref_root delayed_refs;
+
 };
 
 struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
-- 
2.7.4


* [PATCH 12/15] btrfs-progs: Add __free_extent2 function
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (10 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 13/15] btrfs-progs: Add alloc_reserved_tree_block2 function Nikolay Borisov
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This is a simple adapter to convert the delayed ref arguments to the
existing arguments of __free_extent.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 extent-tree.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/extent-tree.c b/extent-tree.c
index aff00e536c9c..8789a43c7fea 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2136,6 +2136,17 @@ void btrfs_unpin_extent(struct btrfs_fs_info *fs_info,
 	update_pinned_extents(fs_info, bytenr, num_bytes, 0);
 }
 
+static int __free_extent2(struct btrfs_trans_handle *trans,
+			  struct btrfs_delayed_ref_node *node,
+			  struct btrfs_delayed_extent_op *extent_op)
+{
+
+	struct btrfs_delayed_tree_ref *ref = btrfs_delayed_node_to_tree_ref(node);
+
+	return __free_extent(trans, node->bytenr, node->num_bytes,
+			     ref->parent, ref->root, ref->level, 0, 1);
+}
+
 /*
  * remove an extent from the root, returns 0 on success
  */
-- 
2.7.4


* [PATCH 13/15] btrfs-progs: Add alloc_reserved_tree_block2 function
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (11 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 12/15] btrfs-progs: Add __free_extent2 function Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-08 12:47 ` [PATCH 14/15] btrfs-progs: Wire up delayed refs Nikolay Borisov
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This is a simple adapter function to convert the delayed-refs structures
to the current arguments of alloc_reserved_tree_block.
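
For readers following along, the shape of such an adapter can be sketched
roughly as below. This is only an illustration under stated assumptions:
every sketch_* name, the simplified structs and the placeholder key type
values are made up for the example and are not the btrfs-progs
definitions. It shows the two things the adapter does: recover the tree
ref from the generic node embedded in it, and pick the key layout
depending on whether skinny metadata is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder key types, not the on-disk btrfs values. */
enum { SKETCH_EXTENT_ITEM_KEY = 1, SKETCH_METADATA_ITEM_KEY = 2 };

struct sketch_key {
	uint64_t objectid;
	int type;
	uint64_t offset;
};

/* Hypothetical generic delayed ref node. */
struct sketch_ref_node {
	uint64_t bytenr;
	uint64_t num_bytes;
};

/* Hypothetical tree ref which embeds the generic node. */
struct sketch_tree_ref {
	struct sketch_ref_node node;
	uint64_t root;
	int level;
};

/* Recover the enclosing tree ref from a pointer to its embedded node. */
static struct sketch_tree_ref *sketch_node_to_tree_ref(struct sketch_ref_node *node)
{
	assert(node != NULL);
	return (struct sketch_tree_ref *)((char *)node -
			offsetof(struct sketch_tree_ref, node));
}

/* Arguments seen by the stand-in for the existing wide-argument function. */
static uint64_t sketch_seen_root;
static int sketch_seen_level;
static struct sketch_key sketch_seen_key;

static int sketch_alloc_block(uint64_t root, int level,
			      const struct sketch_key *ins)
{
	sketch_seen_root = root;
	sketch_seen_level = level;
	sketch_seen_key = *ins;
	return 0;
}

/* The adapter: unpack the delayed ref and call the existing function. */
static int sketch_alloc_block2(struct sketch_ref_node *node,
			       bool skinny_metadata)
{
	struct sketch_tree_ref *ref = sketch_node_to_tree_ref(node);
	struct sketch_key ins;

	ins.objectid = node->bytenr;
	if (skinny_metadata) {
		/* Skinny metadata keys carry the level in the offset. */
		ins.offset = (uint64_t)ref->level;
		ins.type = SKETCH_METADATA_ITEM_KEY;
	} else {
		/* Regular extent items carry the extent size instead. */
		ins.offset = node->num_bytes;
		ins.type = SKETCH_EXTENT_ITEM_KEY;
	}
	return sketch_alloc_block(ref->root, ref->level, &ins);
}
```

The caller only ever passes the generic node, so the code that runs
delayed refs does not need to know about the wide argument list.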

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 extent-tree.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/extent-tree.c b/extent-tree.c
index 8789a43c7fea..3208ed11cb91 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2687,6 +2687,30 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static int alloc_reserved_tree_block2(struct btrfs_trans_handle *trans,
+				      struct btrfs_delayed_ref_node *node,
+				      struct btrfs_delayed_extent_op *extent_op)
+{
+
+	struct btrfs_delayed_tree_ref *ref = btrfs_delayed_node_to_tree_ref(node);
+	struct btrfs_key ins;
+	bool skinny_metadata = btrfs_fs_incompat(trans->fs_info, SKINNY_METADATA);
+
+	ins.objectid = node->bytenr;
+	if (skinny_metadata) {
+		ins.offset = ref->level;
+		ins.type = BTRFS_METADATA_ITEM_KEY;
+	} else {
+		ins.offset = node->num_bytes;
+		ins.type = BTRFS_EXTENT_ITEM_KEY;
+	}
+
+	return alloc_reserved_tree_block(trans, ref->root, trans->transid,
+					 extent_op->flags_to_set,
+					 &extent_op->key, ref->level, &ins);
+
+}
+
 static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 				     u64 root_objectid, u64 generation,
 				     u64 flags, struct btrfs_disk_key *key,
-- 
2.7.4


* [PATCH 14/15] btrfs-progs: Wire up delayed refs
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (12 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 13/15] btrfs-progs: Add alloc_reserved_tree_block2 function Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-07-30  8:33   ` Misono Tomohiro
  2018-06-08 12:47 ` [PATCH 15/15] btrfs-progs: Remove old delayed refs infrastructure Nikolay Borisov
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This commit enables the delayed refs infrastructure. This entails doing
the following:

1. Replacing existing calls of btrfs_extent_post_op (which is the
equivalent of delayed refs) with the proper btrfs_run_delayed_refs, as
well as eliminating open-coded calls to finish_current_insert and
del_pending_extents, which execute the delayed ops.

2. Wiring up the addition of delayed refs when freeing extents
(btrfs_free_extent) and when adding new extents (alloc_tree_block).

3. Adding calls to btrfs_run_delayed_refs in the transaction commit
path, alongside comments explaining why every call is needed, since
it's not always obvious (those call sites were derived empirically by
running and debugging existing tests).

4. Correctly flagging the transaction in which we are reinitialising
the extent tree.
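
The overall model (queue a ref when an extent is allocated or freed, then
flush the queue at well-defined points such as commit) can be sketched as
follows. This is a minimal toy, assuming a plain FIFO and a single
counter standing in for the extent tree; the sketch_* names are
hypothetical and it leaves out ref heads, merging and error handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum sketch_action { SKETCH_ADD_REF, SKETCH_DROP_REF };

struct sketch_ref {
	uint64_t bytenr;
	enum sketch_action action;
	struct sketch_ref *next;
};

struct sketch_trans {
	struct sketch_ref *head;	/* oldest queued ref */
	struct sketch_ref *tail;	/* newest queued ref */
	unsigned long num_entries;	/* refs still queued */
	long applied;			/* net refs applied so far */
};

/* Record the modification instead of touching the extent tree now. */
static int sketch_add_delayed_ref(struct sketch_trans *trans, uint64_t bytenr,
				  enum sketch_action action)
{
	struct sketch_ref *ref = malloc(sizeof(*ref));

	if (!ref)
		return -1;
	ref->bytenr = bytenr;
	ref->action = action;
	ref->next = NULL;
	if (trans->tail)
		trans->tail->next = ref;
	else
		trans->head = ref;
	trans->tail = ref;
	trans->num_entries++;
	return 0;
}

/* Drain the queue, applying every recorded modification in order. */
static int sketch_run_delayed_refs(struct sketch_trans *trans)
{
	while (trans->head) {
		struct sketch_ref *ref = trans->head;

		trans->head = ref->next;
		if (!trans->head)
			trans->tail = NULL;
		trans->applied += (ref->action == SKETCH_ADD_REF) ? 1 : -1;
		assert(trans->num_entries > 0);
		trans->num_entries--;
		free(ref);
	}
	return 0;
}

/* Commit flushes before and after the step that may queue more refs. */
static int sketch_commit(struct sketch_trans *trans)
{
	int ret;

	ret = sketch_run_delayed_refs(trans);
	if (ret)
		return ret;
	/* Real code would write out dirty roots here, queueing more refs. */
	return sketch_run_delayed_refs(trans);
}
```

Nothing is applied until sketch_commit (or an explicit
sketch_run_delayed_refs) is called, which is why the commit path needs
the extra flush points described in item 3.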

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 check/main.c  |   3 +-
 extent-tree.c | 166 ++++++++++++++++++++++++++++++----------------------------
 transaction.c |  24 +++++++++
 3 files changed, 111 insertions(+), 82 deletions(-)

diff --git a/check/main.c b/check/main.c
index b84903acdb25..7c9689f29fd3 100644
--- a/check/main.c
+++ b/check/main.c
@@ -8634,7 +8634,7 @@ static int reinit_extent_tree(struct btrfs_trans_handle *trans,
 			fprintf(stderr, "Error adding block group\n");
 			return ret;
 		}
-		btrfs_extent_post_op(trans);
+		btrfs_run_delayed_refs(trans, -1);
 	}
 
 	ret = reset_balance(trans, fs_info);
@@ -9682,6 +9682,7 @@ int cmd_check(int argc, char **argv)
 			goto close_out;
 		}
 
+		trans->reinit_extent_tree = true;
 		if (init_extent_tree) {
 			printf("Creating a new extent tree\n");
 			ret = reinit_extent_tree(trans, info,
diff --git a/extent-tree.c b/extent-tree.c
index 3208ed11cb91..9d085158f2d8 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1418,8 +1418,6 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 		err = ret;
 out:
 	btrfs_free_path(path);
-	finish_current_insert(trans);
-	del_pending_extents(trans);
 	BUG_ON(err);
 	return err;
 }
@@ -1602,8 +1600,6 @@ int btrfs_set_block_flags(struct btrfs_trans_handle *trans, u64 bytenr,
 	btrfs_set_extent_flags(l, item, flags);
 out:
 	btrfs_free_path(path);
-	finish_current_insert(trans);
-	del_pending_extents(trans);
 	return ret;
 }
 
@@ -1701,7 +1697,6 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
 				 struct btrfs_block_group_cache *cache)
 {
 	int ret;
-	int pending_ret;
 	struct btrfs_root *extent_root = trans->fs_info->extent_root;
 	unsigned long bi;
 	struct extent_buffer *leaf;
@@ -1717,12 +1712,8 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 fail:
-	finish_current_insert(trans);
-	pending_ret = del_pending_extents(trans);
 	if (ret)
 		return ret;
-	if (pending_ret)
-		return pending_ret;
 	return 0;
 
 }
@@ -2050,6 +2041,7 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
 	int skinny_metadata =
 		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
 
+
 	while(1) {
 		ret = find_first_extent_bit(&info->extent_ins, 0, &start,
 					    &end, EXTENT_LOCKED);
@@ -2081,6 +2073,8 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
 			BUG_ON(1);
 		}
 
+
+		printf("shouldn't be executed\n");
 		clear_extent_bits(&info->extent_ins, start, end, EXTENT_LOCKED);
 		kfree(extent_op);
 	}
@@ -2380,7 +2374,6 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 	}
 fail:
 	btrfs_free_path(path);
-	finish_current_insert(trans);
 	return ret;
 }
 
@@ -2463,33 +2456,30 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		      u64 bytenr, u64 num_bytes, u64 parent,
 		      u64 root_objectid, u64 owner, u64 offset)
 {
-	struct btrfs_root *extent_root = root->fs_info->extent_root;
-	int pending_ret;
 	int ret;
 
 	WARN_ON(num_bytes < root->fs_info->sectorsize);
-	if (root == extent_root) {
-		struct pending_extent_op *extent_op;
-
-		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
-		BUG_ON(!extent_op);
-
-		extent_op->type = PENDING_EXTENT_DELETE;
-		extent_op->bytenr = bytenr;
-		extent_op->num_bytes = num_bytes;
-		extent_op->level = (int)owner;
-
-		set_extent_bits(&root->fs_info->pending_del,
-				bytenr, bytenr + num_bytes - 1,
-				EXTENT_LOCKED);
-		set_state_private(&root->fs_info->pending_del,
-				  bytenr, (unsigned long)extent_op);
-		return 0;
+	/*
+	 * tree log blocks never actually go into the extent allocation
+	 * tree, just update pinning info and exit early.
+	 */
+	if (root_objectid == BTRFS_TREE_LOG_OBJECTID) {
+		printf("PINNING EXTENTS IN LOG TREE\n");
+		WARN_ON(owner >= BTRFS_FIRST_FREE_OBJECTID);
+		btrfs_pin_extent(trans->fs_info, bytenr, num_bytes);
+		ret = 0;
+	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+		BUG_ON(offset);
+		ret = btrfs_add_delayed_tree_ref(trans->fs_info, trans,
+						 bytenr, num_bytes, parent,
+						 root_objectid, (int)owner,
+						 BTRFS_DROP_DELAYED_REF,
+						 NULL, NULL, NULL);
+	} else {
+		ret = __free_extent(trans, bytenr, num_bytes, parent,
+				    root_objectid, owner, offset, 1);
 	}
-	ret = __free_extent(trans, root, bytenr, num_bytes, parent,
-			    root_objectid, owner, offset, 1);
-	pending_ret = del_pending_extents(trans);
-	return ret ? ret : pending_ret;
+	return ret;
 }
 
 static u64 stripe_align(struct btrfs_root *root, u64 val)
@@ -2695,6 +2685,8 @@ static int alloc_reserved_tree_block2(struct btrfs_trans_handle *trans,
 	struct btrfs_delayed_tree_ref *ref = btrfs_delayed_node_to_tree_ref(node);
 	struct btrfs_key ins;
 	bool skinny_metadata = btrfs_fs_incompat(trans->fs_info, SKINNY_METADATA);
+	int ret;
+	u64 start, end;
 
 	ins.objectid = node->bytenr;
 	if (skinny_metadata) {
@@ -2705,10 +2697,25 @@ static int alloc_reserved_tree_block2(struct btrfs_trans_handle *trans,
 		ins.type = BTRFS_EXTENT_ITEM_KEY;
 	}
 
-	return alloc_reserved_tree_block(trans, ref->root, trans->transid,
-					 extent_op->flags_to_set,
-					 &extent_op->key, ref->level, &ins);
+	if (ref->root == BTRFS_EXTENT_TREE_OBJECTID) {
+		ret = find_first_extent_bit(&trans->fs_info->extent_ins,
+					    node->bytenr, &start, &end,
+					    EXTENT_LOCKED);
+		ASSERT(!ret);
+		ASSERT(start == node->bytenr);
+		ASSERT(end == node->bytenr + node->num_bytes - 1);
+	}
+
+	ret = alloc_reserved_tree_block(trans, ref->root, trans->transid,
+					extent_op->flags_to_set,
+					&extent_op->key, ref->level, &ins);
 
+	if (ref->root == BTRFS_EXTENT_TREE_OBJECTID) {
+		clear_extent_bits(&trans->fs_info->extent_ins, start, end,
+				  EXTENT_LOCKED);
+	}
+
+	return ret;
 }
 
 static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
@@ -2773,39 +2780,50 @@ static int alloc_tree_block(struct btrfs_trans_handle *trans,
 			    u64 search_end, struct btrfs_key *ins)
 {
 	int ret;
+	u64 extent_size;
+	struct btrfs_delayed_extent_op *extent_op;
+	bool skinny_metadata = btrfs_fs_incompat(root->fs_info,
+						 SKINNY_METADATA);
+
+	extent_op = btrfs_alloc_delayed_extent_op();
+	if (!extent_op)
+		return -ENOMEM;
+
 	ret = btrfs_reserve_extent(trans, root, num_bytes, empty_size,
 				   hint_byte, search_end, ins, 0);
 	BUG_ON(ret);
 
+	if (key)
+		memcpy(&extent_op->key, key, sizeof(extent_op->key));
+	else
+		memset(&extent_op->key, 0, sizeof(extent_op->key));
+	extent_op->flags_to_set = flags;
+	extent_op->update_key = skinny_metadata ? false : true;
+	extent_op->update_flags = true;
+	extent_op->is_data = false;
+	extent_op->level = level;
+
+	extent_size = ins->offset;
+
+	if (btrfs_fs_incompat(root->fs_info, SKINNY_METADATA)) {
+		ins->offset = level;
+		ins->type = BTRFS_METADATA_ITEM_KEY;
+	}
+
+	/* Ensure this reserved extent is not found by the allocator */
 	if (root_objectid == BTRFS_EXTENT_TREE_OBJECTID) {
-		struct pending_extent_op *extent_op;
-
-		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
-		BUG_ON(!extent_op);
-
-		extent_op->type = PENDING_EXTENT_INSERT;
-		extent_op->bytenr = ins->objectid;
-		extent_op->num_bytes = ins->offset;
-		extent_op->level = level;
-		extent_op->flags = flags;
-		memcpy(&extent_op->key, key, sizeof(*key));
-
-		set_extent_bits(&root->fs_info->extent_ins, ins->objectid,
-				ins->objectid + ins->offset - 1,
-				EXTENT_LOCKED);
-		set_state_private(&root->fs_info->extent_ins,
-				  ins->objectid, (unsigned long)extent_op);
-	} else {
-		if (btrfs_fs_incompat(root->fs_info, SKINNY_METADATA)) {
-			ins->offset = level;
-			ins->type = BTRFS_METADATA_ITEM_KEY;
-		}
-		ret = alloc_reserved_tree_block(trans, root, root_objectid,
-						generation, flags,
-						key, level, ins);
-		finish_current_insert(trans);
-		del_pending_extents(trans);
+		ret = set_extent_bits(&trans->fs_info->extent_ins,
+				      ins->objectid,
+				      ins->objectid + extent_size - 1,
+				      EXTENT_LOCKED);
+
+		BUG_ON(ret);
 	}
+
+	ret = btrfs_add_delayed_tree_ref(root->fs_info, trans, ins->objectid,
+					 extent_size, 0, root_objectid,
+					 level, BTRFS_ADD_DELAYED_EXTENT,
+					 extent_op, NULL, NULL);
 	return ret;
 }
 
@@ -3330,11 +3348,6 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 				sizeof(cache->item));
 	BUG_ON(ret);
 
-	ret = finish_current_insert(trans);
-	BUG_ON(ret);
-	ret = del_pending_extents(trans);
-	BUG_ON(ret);
-
 	return 0;
 }
 
@@ -3430,10 +3443,6 @@ int btrfs_make_block_groups(struct btrfs_trans_handle *trans,
 					sizeof(cache->item));
 		BUG_ON(ret);
 
-		finish_current_insert(trans);
-		ret = del_pending_extents(trans);
-		BUG_ON(ret);
-
 		cur_start = cache->key.objectid + cache->key.offset;
 	}
 	return 0;
@@ -3815,14 +3824,9 @@ int btrfs_fix_block_accounting(struct btrfs_trans_handle *trans)
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *root = fs_info->extent_root;
 
-	while(extent_root_pending_ops(fs_info)) {
-		ret = finish_current_insert(trans);
-		if (ret)
-			return ret;
-		ret = del_pending_extents(trans);
-		if (ret)
-			return ret;
-	}
+	ret = btrfs_run_delayed_refs(trans, -1);
+	if (ret)
+		return ret;
 
 	while(1) {
 		cache = btrfs_lookup_first_block_group(fs_info, start);
@@ -4027,7 +4031,7 @@ static int __btrfs_record_file_extent(struct btrfs_trans_handle *trans,
 		} else if (ret != -EEXIST) {
 			goto fail;
 		}
-		btrfs_extent_post_op(trans);
+		btrfs_run_delayed_refs(trans, -1);
 		extent_bytenr = disk_bytenr;
 		extent_num_bytes = num_bytes;
 		extent_offset = 0;
diff --git a/transaction.c b/transaction.c
index ecafbb156610..b2d613ee88d0 100644
--- a/transaction.c
+++ b/transaction.c
@@ -98,6 +98,17 @@ int commit_tree_roots(struct btrfs_trans_handle *trans,
 	if (ret)
 		return ret;
 
+	/*
+	 * If the above CoW is the first one to dirty the current tree_root,
+	 * delayed refs for it won't be run until after this function has
+	 * finished executing, meaning we won't process the extent tree root,
+	 * which will have been added to ->dirty_cowonly_roots.  So run
+	 * delayed refs here as well.
+	 */
+	ret = btrfs_run_delayed_refs(trans, -1);
+	if (ret)
+		return ret;
+
 	while(!list_empty(&fs_info->dirty_cowonly_roots)) {
 		next = fs_info->dirty_cowonly_roots.next;
 		list_del_init(next);
@@ -147,6 +158,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	if (trans->fs_info->transaction_aborted)
 		return -EROFS;
+	/*
+	 * Flush all accumulated delayed refs so that root-tree updates are
+	 * consistent
+	 */
+	ret = btrfs_run_delayed_refs(trans, -1);
+	BUG_ON(ret);
 
 	if (root->commit_root == root->node)
 		goto commit_tree;
@@ -164,9 +181,16 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	ret = btrfs_update_root(trans, root->fs_info->tree_root,
 				&root->root_key, &root->root_item);
 	BUG_ON(ret);
+
 commit_tree:
 	ret = commit_tree_roots(trans, fs_info);
 	BUG_ON(ret);
+	/*
+	 * Ensure that all committed roots are properly accounted in the
+	 * extent tree
+	 */
+	ret = btrfs_run_delayed_refs(trans, -1);
+	BUG_ON(ret);
 	ret = __commit_transaction(trans, root);
 	BUG_ON(ret);
 	write_ctree_super(trans);
-- 
2.7.4


* [PATCH 15/15] btrfs-progs: Remove old delayed refs infrastructure
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (13 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 14/15] btrfs-progs: Wire up delayed refs Nikolay Borisov
@ 2018-06-08 12:47 ` Nikolay Borisov
  2018-06-08 14:49   ` [PATCH 15/15 v2] " Nikolay Borisov
  2018-06-08 13:50 ` [PATCH 00/15] Add delayed-refs support to btrfs-progs Qu Wenruo
  2018-07-16 15:39 ` David Sterba
  16 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 12:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Given that the new delayed refs infrastructure is implemented and
wired up, there is no point in keeping the old code. So just remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 ctree.h       |   2 -
 extent-tree.c | 137 ----------------------------------------------------------
 2 files changed, 139 deletions(-)

diff --git a/ctree.h b/ctree.h
index d1ea45571d1e..3e9ca2ca8432 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1098,7 +1098,6 @@ struct btrfs_fs_info {
 	struct extent_io_tree free_space_cache;
 	struct extent_io_tree block_group_cache;
 	struct extent_io_tree pinned_extents;
-	struct extent_io_tree pending_del;
 	struct extent_io_tree extent_ins;
 	struct extent_io_tree *excluded_extents;
 
@@ -2503,7 +2502,6 @@ int btrfs_fix_block_accounting(struct btrfs_trans_handle *trans);
 void btrfs_pin_extent(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes);
 void btrfs_unpin_extent(struct btrfs_fs_info *fs_info,
 			u64 bytenr, u64 num_bytes);
-int btrfs_extent_post_op(struct btrfs_trans_handle *trans);
 struct btrfs_block_group_cache *btrfs_lookup_block_group(struct
 							 btrfs_fs_info *info,
 							 u64 bytenr);
diff --git a/extent-tree.c b/extent-tree.c
index 9d085158f2d8..b9d51b388c9a 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -52,8 +52,6 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner_objectid,
 			 u64 owner_offset, int refs_to_drop);
-static int finish_current_insert(struct btrfs_trans_handle *trans);
-static int del_pending_extents(struct btrfs_trans_handle *trans);
 static struct btrfs_block_group_cache *
 btrfs_find_block_group(struct btrfs_root *root, struct btrfs_block_group_cache
 		       *hint, u64 search_start, int data, int owner);
@@ -1422,13 +1420,6 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 	return err;
 }
 
-int btrfs_extent_post_op(struct btrfs_trans_handle *trans)
-{
-	finish_current_insert(trans);
-	del_pending_extents(trans);
-	return 0;
-}
-
 int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info, u64 bytenr,
 			     u64 offset, int metadata, u64 *refs, u64 *flags)
@@ -2013,74 +2004,6 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-static int extent_root_pending_ops(struct btrfs_fs_info *info)
-{
-	u64 start;
-	u64 end;
-	int ret;
-
-	ret = find_first_extent_bit(&info->extent_ins, 0, &start,
-				    &end, EXTENT_LOCKED);
-	if (!ret) {
-		ret = find_first_extent_bit(&info->pending_del, 0, &start, &end,
-					    EXTENT_LOCKED);
-	}
-	return ret == 0;
-
-}
-static int finish_current_insert(struct btrfs_trans_handle *trans)
-{
-	u64 start;
-	u64 end;
-	u64 priv;
-	struct btrfs_fs_info *info = trans->fs_info;
-	struct btrfs_root *extent_root = info->extent_root;
-	struct pending_extent_op *extent_op;
-	struct btrfs_key key;
-	int ret;
-	int skinny_metadata =
-		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
-
-
-	while(1) {
-		ret = find_first_extent_bit(&info->extent_ins, 0, &start,
-					    &end, EXTENT_LOCKED);
-		if (ret)
-			break;
-
-		ret = get_state_private(&info->extent_ins, start, &priv);
-		BUG_ON(ret);
-		extent_op = (struct pending_extent_op *)(unsigned long)priv;
-
-		if (extent_op->type == PENDING_EXTENT_INSERT) {
-			key.objectid = start;
-			if (skinny_metadata) {
-				key.offset = extent_op->level;
-				key.type = BTRFS_METADATA_ITEM_KEY;
-			} else {
-				key.offset = extent_op->num_bytes;
-				key.type = BTRFS_EXTENT_ITEM_KEY;
-			}
-
-			ret = alloc_reserved_tree_block(trans,
-						extent_root->root_key.objectid,
-						trans->transid,
-						extent_op->flags,
-						&extent_op->key,
-						extent_op->level, &key);
-			BUG_ON(ret);
-		} else {
-			BUG_ON(1);
-		}
-
-
-		printf("shouldn't be executed\n");
-		clear_extent_bits(&info->extent_ins, start, end, EXTENT_LOCKED);
-		kfree(extent_op);
-	}
-	return 0;
-}
-
 static int pin_down_bytes(struct btrfs_trans_handle *trans, u64 bytenr,
 			  u64 num_bytes, int is_data)
 {
@@ -2377,66 +2300,6 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-/*
- * find all the blocks marked as pending in the radix tree and remove
- * them from the extent map
- */
-static int del_pending_extents(struct btrfs_trans_handle *trans)
-{
-	int ret;
-	int err = 0;
-	u64 start;
-	u64 end;
-	u64 priv;
-	struct extent_io_tree *pending_del;
-	struct extent_io_tree *extent_ins;
-	struct pending_extent_op *extent_op;
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_root *extent_root = fs_info->extent_root;
-
-	extent_ins = &extent_root->fs_info->extent_ins;
-	pending_del = &extent_root->fs_info->pending_del;
-
-	while(1) {
-		ret = find_first_extent_bit(pending_del, 0, &start, &end,
-					    EXTENT_LOCKED);
-		if (ret)
-			break;
-
-		ret = get_state_private(pending_del, start, &priv);
-		BUG_ON(ret);
-		extent_op = (struct pending_extent_op *)(unsigned long)priv;
-
-		clear_extent_bits(pending_del, start, end, EXTENT_LOCKED);
-
-		if (!test_range_bit(extent_ins, start, end,
-				    EXTENT_LOCKED, 0)) {
-			ret = __free_extent(trans, start, end + 1 - start, 0,
-					    extent_root->root_key.objectid,
-					    extent_op->level, 0, 1);
-			kfree(extent_op);
-		} else {
-			kfree(extent_op);
-			ret = get_state_private(extent_ins, start, &priv);
-			BUG_ON(ret);
-			extent_op = (struct pending_extent_op *)
-							(unsigned long)priv;
-
-			clear_extent_bits(extent_ins, start, end,
-					  EXTENT_LOCKED);
-
-			if (extent_op->type == PENDING_BACKREF_UPDATE)
-				BUG_ON(1);
-
-			kfree(extent_op);
-		}
-		if (ret)
-			err = ret;
-	}
-	return err;
-}
-
-
 int btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
 			  struct extent_buffer *buf,
-- 
2.7.4


* Re: [PATCH 00/15] Add delayed-refs support to btrfs-progs
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (14 preceding siblings ...)
  2018-06-08 12:47 ` [PATCH 15/15] btrfs-progs: Remove old delayed refs infrastructure Nikolay Borisov
@ 2018-06-08 13:50 ` Qu Wenruo
  2018-06-08 14:08   ` Nikolay Borisov
  2018-07-16 15:39 ` David Sterba
  16 siblings, 1 reply; 46+ messages in thread
From: Qu Wenruo @ 2018-06-08 13:50 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018年06月08日 20:47, Nikolay Borisov wrote:
> Hello,                                                                          
>                                                                                 
> Here is a series which adds support for delayed refs. This is needed to enable  
> later work on adding freespace tree repair code.

Would it be possible to explain this in detail?
Personally speaking, I'd like to avoid introducing the complex
delayed-ref machinery into btrfs-progs if possible.

And in my (possibly wrong) understanding, the main purpose of
delayed-ref is to reduce contention on the extent tree, and thus to
improve performance.
However, in btrfs-progs that is the least important aspect.

So extra comments on this would be appreciated.

Thanks,
Qu

> Additionally, it results in  
> more code sharing between kernel/user space.
> 
> Patches 1-9 are simple prep patches removing some arguments, causing problems
> later. They can go independently of the delayed refs work. They don't introduce
> any functional changes. Next, patches 10-13 introduce the needed infrastructure
> to for delayed refs without actually activating it. Patch 14 finally wires it
> up by adding the necessary call outs to btrfs_run_delayed refs and reworking the
> extent addition/freeing functions. With all of this done, patch 15 finally
> removes the old code.
> 
> This series passes all btrfs progs fsck and misc tests + fuzz tests apart from
> fuzz-003/007/009 - but those fail without this series so it's unlikely it's
> caused by it.
> 
> Nikolay Borisov (15):
>   btrfs-progs: Remove root argument from pin_down_bytes
>   btrfs-progs: Remove root argument from btrfs_del_csums
>   btrfs-progs: Add functions to modify the used space by a root
>   btrfs-progs: Refactor the root used bytes are updated
>   btrfs-progs: Make update_block_group take fs_info instead of root
>   btrfs-progs: check: Drop trans/root arguments from free_extent_hook
>   btrfs-progs: Remove root argument from __free_extent
>   btrfs-progs: Remove root argument from alloc_reserved_tree_block
>   btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent
>     for btree blocks.
>   btrfs-progs: Add boolean to signal whether we are re-initing extent
>     tree
>   btrfs-progs: Add delayed refs infrastructure
>   btrfs-progs: Add __free_extent2 function
>   btrfs-progs: Add alloc_reserved_tree_block2 function
>   btrfs-progs: Wire up delayed refs
>   btrfs-progs: Remove old delayed refs infrastructure
> 
>  Makefile              |   3 +-
>  btrfs-corrupt-block.c |   2 +-
>  check/main.c          |   8 +-
>  ctree.c               |  29 ++-
>  ctree.h               |  11 +-
>  delayed-ref.c         | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  delayed-ref.h         | 225 +++++++++++++++++++
>  extent-tree.c         | 604 +++++++++++++++++++++++++++++--------------------
>  file-item.c           |  20 +-
>  kerncompat.h          |   8 +
>  transaction.c         |  25 +++
>  transaction.h         |   5 +
>  12 files changed, 1280 insertions(+), 268 deletions(-)
>  create mode 100644 delayed-ref.c
>  create mode 100644 delayed-ref.h
> 

* Re: [PATCH 00/15] Add delayed-refs support to btrfs-progs
  2018-06-08 13:50 ` [PATCH 00/15] Add delayed-refs support to btrfs-progs Qu Wenruo
@ 2018-06-08 14:08   ` Nikolay Borisov
  2018-06-08 14:21     ` Qu Wenruo
  0 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 14:08 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On  8.06.2018 16:50, Qu Wenruo wrote:
> Would it be possible to explain this in detail?
> Personally speaking, I'd like to avoid introducing complex delayed-ref
> into btrfs-progs if possible.
> 
> And in my (possibly wrong) understanding, the main purpose of
> delayed-ref is to reduce the race on extent tree, thus to improve
> performance.
> However in btrfs-progs, it's the least important aspect.
> 
> So extra comment on this is appreciated.

So in order to have the freespace tree repair code working, I needed to
hook up its add_to_free_space_tree/remove_from_free_space_tree to
alloc_reserved_tree_block/__free_extent. In my testing this led to very
deep recursion - it crashed at 58k call frames. So the idea was to have
delayed refs record and accumulate the modifications, and then have the
freespace tree code piggyback on them so that it operates correctly.

I guess I could try and debug the freespace code and see why I was going
into this infinite recursion so to speak.
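
To make the recursion problem concrete, here is a toy model. It is not
the freespace tree code; the sketch_* names and the "budget" counter are
made up for the illustration. In direct mode the bookkeeping update runs
inside the allocator and may allocate again, so the call depth grows with
the number of follow-up allocations. In deferred mode the allocator only
records that an update is pending and a flat loop drains the queue.

```c
#include <assert.h>

struct sketch_state {
	int budget;	/* follow-up allocations still to be triggered */
	int depth;	/* current nesting of the allocator */
	int max_depth;	/* deepest nesting observed */
	int pending;	/* queued updates (deferred mode only) */
};

static void sketch_enter(struct sketch_state *s)
{
	s->depth++;
	if (s->depth > s->max_depth)
		s->max_depth = s->depth;
}

static void sketch_leave(struct sketch_state *s)
{
	assert(s->depth > 0);
	s->depth--;
}

/* Direct mode: the update runs inside the allocator and allocates again. */
static void sketch_alloc_direct(struct sketch_state *s)
{
	sketch_enter(s);
	if (s->budget > 0) {
		s->budget--;
		sketch_alloc_direct(s);
	}
	sketch_leave(s);
}

/* Deferred mode: the allocator only notes that an update is needed. */
static void sketch_alloc_deferred(struct sketch_state *s)
{
	sketch_enter(s);
	s->pending++;
	sketch_leave(s);
}

/* Drain queued updates; each one may allocate, which queues another. */
static void sketch_run_pending(struct sketch_state *s)
{
	while (s->pending > 0) {
		s->pending--;
		if (s->budget > 0) {
			s->budget--;
			sketch_alloc_deferred(s);
		}
	}
}
```

With a budget of 1000 follow-up allocations, the direct variant nests
1001 calls deep, while the deferred variant never nests more than one
allocator call at a time.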

Also the delayed refs code in progs is actually a lot simpler than the
kernel counterpart due to the lack of locking. One more benefit of
having this code in progs is the fact that one can go through it with a
debugger and really inspect/understand how it works - i.e. addition of
refs, selection of refs, etc. Furthermore, it at least unifies the logic
between kernel and userspace, since right now there is code which mimics
the delayed refs - check the code being removed in the last patch.
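
As a small illustration of what there is to inspect, the toy below
queues refs on a head, merges an add and a drop of the same ref so they
cancel out, and corrects the head's ref_mod as each remaining ref is
played (decrement for an add, increment for a drop, as in
btrfs_run_delayed_refs). It is a simplified sketch, assuming a
fixed-size array and refs keyed by root only; all sketch_* names are
hypothetical and the merge conditions in the real code are stricter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_REFS 8

enum sketch_action { SKETCH_ADD, SKETCH_DROP };

struct sketch_ref {
	uint64_t root;
	enum sketch_action action;
	int ref_mod;	/* count, always positive */
	int live;	/* 0 once merged away or executed */
};

struct sketch_head {
	struct sketch_ref refs[SKETCH_MAX_REFS];
	int nr;
	int ref_mod;	/* net pending change: adds minus drops */
};

/* Signed contribution of one ref: positive for add, negative for drop. */
static int sketch_signed(const struct sketch_ref *ref)
{
	assert(ref->ref_mod > 0);
	return ref->action == SKETCH_ADD ? ref->ref_mod : -ref->ref_mod;
}

static int sketch_queue(struct sketch_head *head, uint64_t root,
			enum sketch_action action)
{
	struct sketch_ref *ref;

	if (head->nr >= SKETCH_MAX_REFS)
		return -1;
	ref = &head->refs[head->nr++];
	ref->root = root;
	ref->action = action;
	ref->ref_mod = 1;
	ref->live = 1;
	head->ref_mod += sketch_signed(ref);
	return 0;
}

/* Fold refs with the same root together; add + drop cancel out. */
static void sketch_merge(struct sketch_head *head)
{
	for (int i = 0; i < head->nr; i++) {
		struct sketch_ref *a = &head->refs[i];

		for (int j = i + 1; a->live && j < head->nr; j++) {
			struct sketch_ref *b = &head->refs[j];
			int net;

			if (!b->live || b->root != a->root)
				continue;
			net = sketch_signed(a) + sketch_signed(b);
			b->live = 0;
			if (net == 0) {
				a->live = 0;
			} else {
				a->action = net > 0 ? SKETCH_ADD : SKETCH_DROP;
				a->ref_mod = abs(net);
			}
		}
	}
}

/* Play the remaining refs; returns how many operations were executed. */
static int sketch_run(struct sketch_head *head, long *extent_refs)
{
	int executed = 0;

	for (int i = 0; i < head->nr; i++) {
		struct sketch_ref *ref = &head->refs[i];

		if (!ref->live)
			continue;
		*extent_refs += sketch_signed(ref);
		/* Correct the head: subtract for an add, add for a drop. */
		head->ref_mod -= sketch_signed(ref);
		ref->live = 0;
		executed++;
	}
	head->nr = 0;
	return executed;
}
```

Queueing add(root 5), drop(root 5) and add(root 7) and then merging
leaves a single operation to execute, and the head's ref_mod returns to
zero once it has been played.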

* Re: [PATCH 00/15] Add delayed-refs support to btrfs-progs
  2018-06-08 14:08   ` Nikolay Borisov
@ 2018-06-08 14:21     ` Qu Wenruo
  0 siblings, 0 replies; 46+ messages in thread
From: Qu Wenruo @ 2018-06-08 14:21 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018年06月08日 22:08, Nikolay Borisov wrote:
> 
> 
> On  8.06.2018 16:50, Qu Wenruo wrote:
>> Would it be possible to explain this in detail?
>> Personally speaking, I'd like to avoid introducing complex delayed-ref
>> into btrfs-progs if possible.
>>
>> And in my (possibly wrong) understanding, the main purpose of
>> delayed-ref is to reduce the race on extent tree, thus to improve
>> performance.
>> However in btrfs-progs, it's the least important aspect.
>>
>> So extra comment on this is appreciated.
> 
> So in order to have freespace tree repair code working I needed to hook
> up its add_to_free_space_tree/remove_from_free_space_tree to
> alloc_reserved_tree_block/__free_extent. In my testing this lead to a
> very deep recursion - it crashed on 58k call frames. So the idea was to
> have delayed refs which would record and accumulate modifications and
> then the freespace tree freeing code would piggy back on them to rely on
> correct operation.

In fact, I have a pretty nasty idea on this problem.
Mark one or more metadata chunks without free space tree cache.

Then at least the recursion could be easily resolved (although it would
need an extra extent allocation hook to handle fst allocation).

> 
> I guess I could try and debug the freespace code and see why I was going
> into this infinite recursion so to speak.
> 
> Also the delayed refs code in progs is actually a lot simpler than the
> kernel counterpart due to the lack of locking.

Right.
And not needing async delayed ref execution should also make things
easier.

> One more benefit of
> having this code in progs is the fact one can go through it with a
> debugger and really inspect/understand how it works

Indeed, this makes a lot of sense.

I'll take some time to do more review on this patchset, and dig deeper
into delayed-ref facility.

Thanks,
Qu

> - i.e. addition of
> refs, selection of refs etc. Furthermore, it at least unifies the logic
> between kernel and userspace, since right now there is code which mimics
> the delayed refs - check the code being removed in the last patch.


* [PATCH 15/15 v2] btrfs-progs: Remove old delayed refs infrastructure
  2018-06-08 12:47 ` [PATCH 15/15] btrfs-progs: Remove old delayed refs infrastructure Nikolay Borisov
@ 2018-06-08 14:49   ` Nikolay Borisov
  0 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 14:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Given that the new delayed refs infrastructure is implemented and
wired up, there is no point in keeping the old code. So just remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---

V2: 

 * Remove fs_info->pending_del references in disk-io.c. These prevented
 compilation.

 ctree.h       |   2 -
 disk-io.c     |   2 -
 extent-tree.c | 137 ----------------------------------------------------------
 3 files changed, 141 deletions(-)

diff --git a/ctree.h b/ctree.h
index d1ea45571d1e..3e9ca2ca8432 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1098,7 +1098,6 @@ struct btrfs_fs_info {
 	struct extent_io_tree free_space_cache;
 	struct extent_io_tree block_group_cache;
 	struct extent_io_tree pinned_extents;
-	struct extent_io_tree pending_del;
 	struct extent_io_tree extent_ins;
 	struct extent_io_tree *excluded_extents;
 
@@ -2503,7 +2502,6 @@ int btrfs_fix_block_accounting(struct btrfs_trans_handle *trans);
 void btrfs_pin_extent(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes);
 void btrfs_unpin_extent(struct btrfs_fs_info *fs_info,
 			u64 bytenr, u64 num_bytes);
-int btrfs_extent_post_op(struct btrfs_trans_handle *trans);
 struct btrfs_block_group_cache *btrfs_lookup_block_group(struct
 							 btrfs_fs_info *info,
 							 u64 bytenr);
diff --git a/disk-io.c b/disk-io.c
index 4a609a892be7..8da6e3ce5fc8 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -726,7 +726,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(int writable, u64 sb_bytenr)
 	extent_io_tree_init(&fs_info->free_space_cache);
 	extent_io_tree_init(&fs_info->block_group_cache);
 	extent_io_tree_init(&fs_info->pinned_extents);
-	extent_io_tree_init(&fs_info->pending_del);
 	extent_io_tree_init(&fs_info->extent_ins);
 	fs_info->excluded_extents = NULL;
 
@@ -984,7 +983,6 @@ void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info)
 	extent_io_tree_cleanup(&fs_info->free_space_cache);
 	extent_io_tree_cleanup(&fs_info->block_group_cache);
 	extent_io_tree_cleanup(&fs_info->pinned_extents);
-	extent_io_tree_cleanup(&fs_info->pending_del);
 	extent_io_tree_cleanup(&fs_info->extent_ins);
 }
 
diff --git a/extent-tree.c b/extent-tree.c
index 9d085158f2d8..b9d51b388c9a 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -52,8 +52,6 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner_objectid,
 			 u64 owner_offset, int refs_to_drop);
-static int finish_current_insert(struct btrfs_trans_handle *trans);
-static int del_pending_extents(struct btrfs_trans_handle *trans);
 static struct btrfs_block_group_cache *
 btrfs_find_block_group(struct btrfs_root *root, struct btrfs_block_group_cache
 		       *hint, u64 search_start, int data, int owner);
@@ -1422,13 +1420,6 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 	return err;
 }
 
-int btrfs_extent_post_op(struct btrfs_trans_handle *trans)
-{
-	finish_current_insert(trans);
-	del_pending_extents(trans);
-	return 0;
-}
-
 int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info, u64 bytenr,
 			     u64 offset, int metadata, u64 *refs, u64 *flags)
@@ -2013,74 +2004,6 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-static int extent_root_pending_ops(struct btrfs_fs_info *info)
-{
-	u64 start;
-	u64 end;
-	int ret;
-
-	ret = find_first_extent_bit(&info->extent_ins, 0, &start,
-				    &end, EXTENT_LOCKED);
-	if (!ret) {
-		ret = find_first_extent_bit(&info->pending_del, 0, &start, &end,
-					    EXTENT_LOCKED);
-	}
-	return ret == 0;
-
-}
-static int finish_current_insert(struct btrfs_trans_handle *trans)
-{
-	u64 start;
-	u64 end;
-	u64 priv;
-	struct btrfs_fs_info *info = trans->fs_info;
-	struct btrfs_root *extent_root = info->extent_root;
-	struct pending_extent_op *extent_op;
-	struct btrfs_key key;
-	int ret;
-	int skinny_metadata =
-		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
-
-
-	while(1) {
-		ret = find_first_extent_bit(&info->extent_ins, 0, &start,
-					    &end, EXTENT_LOCKED);
-		if (ret)
-			break;
-
-		ret = get_state_private(&info->extent_ins, start, &priv);
-		BUG_ON(ret);
-		extent_op = (struct pending_extent_op *)(unsigned long)priv;
-
-		if (extent_op->type == PENDING_EXTENT_INSERT) {
-			key.objectid = start;
-			if (skinny_metadata) {
-				key.offset = extent_op->level;
-				key.type = BTRFS_METADATA_ITEM_KEY;
-			} else {
-				key.offset = extent_op->num_bytes;
-				key.type = BTRFS_EXTENT_ITEM_KEY;
-			}
-
-			ret = alloc_reserved_tree_block(trans,
-						extent_root->root_key.objectid,
-						trans->transid,
-						extent_op->flags,
-						&extent_op->key,
-						extent_op->level, &key);
-			BUG_ON(ret);
-		} else {
-			BUG_ON(1);
-		}
-
-
-		printf("shouldn't be executed\n");
-		clear_extent_bits(&info->extent_ins, start, end, EXTENT_LOCKED);
-		kfree(extent_op);
-	}
-	return 0;
-}
-
 static int pin_down_bytes(struct btrfs_trans_handle *trans, u64 bytenr,
 			  u64 num_bytes, int is_data)
 {
@@ -2377,66 +2300,6 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-/*
- * find all the blocks marked as pending in the radix tree and remove
- * them from the extent map
- */
-static int del_pending_extents(struct btrfs_trans_handle *trans)
-{
-	int ret;
-	int err = 0;
-	u64 start;
-	u64 end;
-	u64 priv;
-	struct extent_io_tree *pending_del;
-	struct extent_io_tree *extent_ins;
-	struct pending_extent_op *extent_op;
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_root *extent_root = fs_info->extent_root;
-
-	extent_ins = &extent_root->fs_info->extent_ins;
-	pending_del = &extent_root->fs_info->pending_del;
-
-	while(1) {
-		ret = find_first_extent_bit(pending_del, 0, &start, &end,
-					    EXTENT_LOCKED);
-		if (ret)
-			break;
-
-		ret = get_state_private(pending_del, start, &priv);
-		BUG_ON(ret);
-		extent_op = (struct pending_extent_op *)(unsigned long)priv;
-
-		clear_extent_bits(pending_del, start, end, EXTENT_LOCKED);
-
-		if (!test_range_bit(extent_ins, start, end,
-				    EXTENT_LOCKED, 0)) {
-			ret = __free_extent(trans, start, end + 1 - start, 0,
-					    extent_root->root_key.objectid,
-					    extent_op->level, 0, 1);
-			kfree(extent_op);
-		} else {
-			kfree(extent_op);
-			ret = get_state_private(extent_ins, start, &priv);
-			BUG_ON(ret);
-			extent_op = (struct pending_extent_op *)
-							(unsigned long)priv;
-
-			clear_extent_bits(extent_ins, start, end,
-					  EXTENT_LOCKED);
-
-			if (extent_op->type == PENDING_BACKREF_UPDATE)
-				BUG_ON(1);
-
-			kfree(extent_op);
-		}
-		if (ret)
-			err = ret;
-	}
-	return err;
-}
-
-
 int btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
 			  struct extent_buffer *buf,
-- 
2.7.4



* [PATCH 11/15 v2] btrfs-progs: Add delayed refs infrastructure
  2018-06-08 12:47 ` [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure Nikolay Borisov
@ 2018-06-08 14:53   ` Nikolay Borisov
  2018-06-11  5:20   ` [PATCH 11/15] " Qu Wenruo
  2018-07-30  8:34   ` Misono Tomohiro
  2 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-08 14:53 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This commit pulls those portions of the kernel implementation of
delayed refs which are necessary to have them working in user-space.
I've made the following modifications:

1. Replaced all kmem_cache_alloc calls with kmalloc.

2. Removed all locking-related code, since we are single threaded in
userspace.

3. Removed code which deals with data refs - delayed refs in user space
are going to be used only for cowonly trees.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---

V2: 
 * removed definitions of delayed data ref structure. 
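
For readers following the merge logic that insert_delayed_ref() applies when
a ref with the same key is already queued, here is a minimal standalone
sketch of just the counting rule. It is a simplified model and not code from
this patch: struct pending_ref and fold_ref() are hypothetical names, and the
rbtree, add_list and accounting updates are left out. It assumes, as the
header comment states for individual refs, that ref_mod is always positive.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror BTRFS_ADD_DELAYED_REF and BTRFS_DROP_DELAYED_REF. */
enum ref_action { REF_ADD = 1, REF_DROP = 2 };

/* Hypothetical reduced form of a delayed ref node: action and count only. */
struct pending_ref {
	enum ref_action action;
	int ref_mod; /* always positive for an individual ref */
};

/*
 * Fold `incoming` into `exist`. Returns true when the two cancel out
 * completely, i.e. when the caller should drop `exist` from the tree.
 */
static bool fold_ref(struct pending_ref *exist,
		     const struct pending_ref *incoming)
{
	int mod;

	assert(exist->ref_mod > 0 && incoming->ref_mod > 0);

	if (exist->action == incoming->action) {
		/* Same direction: the counts simply add up. */
		mod = incoming->ref_mod;
	} else if (exist->ref_mod < incoming->ref_mod) {
		/* Opposite direction and incoming wins: flip the action. */
		exist->action = incoming->action;
		mod = -exist->ref_mod;
		exist->ref_mod = incoming->ref_mod;
	} else {
		/* Opposite direction and existing wins or ties. */
		mod = -incoming->ref_mod;
	}
	exist->ref_mod += mod;

	return exist->ref_mod == 0;
}
```

In this model an ADD of 1 followed by a DROP of 1 cancels entirely, while an
ADD of 1 followed by a DROP of 3 leaves a single DROP of 2.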

 Makefile      |   3 +-
 ctree.h       |   3 +
 delayed-ref.c | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 delayed-ref.h | 210 ++++++++++++++++++++
 extent-tree.c | 228 ++++++++++++++++++++++
 kerncompat.h  |   8 +
 transaction.h |   4 +
 7 files changed, 1063 insertions(+), 1 deletion(-)
 create mode 100644 delayed-ref.c
 create mode 100644 delayed-ref.h

diff --git a/Makefile b/Makefile
index 544410e6440c..9508ad4f11e6 100644
--- a/Makefile
+++ b/Makefile
@@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
 	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
 	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
 	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
-	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
+	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
+	  delayed-ref.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
diff --git a/ctree.h b/ctree.h
index b30a946658ce..d1ea45571d1e 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2812,4 +2812,7 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
 int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
 		    char *dest);
 
+
+/* extent-tree.c */
+int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr);
 #endif
diff --git a/delayed-ref.c b/delayed-ref.c
new file mode 100644
index 000000000000..f3fa50239380
--- /dev/null
+++ b/delayed-ref.c
@@ -0,0 +1,608 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2009 Oracle.  All rights reserved.
+ */
+
+#include "ctree.h"
+#include "btrfs-list.h"
+#include "delayed-ref.h"
+#include "transaction.h"
+
+/*
+ * delayed back reference update tracking.  For subvolume trees
+ * we queue up extent allocations and backref maintenance for
+ * delayed processing.   This avoids deep call chains where we
+ * add extents in the middle of btrfs_search_slot, and it allows
+ * us to buffer up frequently modified backrefs in an rb tree instead
+ * of hammering updates on the extent allocation tree.
+ */
+
+/*
+ * compare two delayed tree backrefs with same bytenr and type
+ */
+static int comp_tree_refs(struct btrfs_delayed_tree_ref *ref1,
+			  struct btrfs_delayed_tree_ref *ref2)
+{
+	if (ref1->node.type == BTRFS_TREE_BLOCK_REF_KEY) {
+		if (ref1->root < ref2->root)
+			return -1;
+		if (ref1->root > ref2->root)
+			return 1;
+	} else {
+		if (ref1->parent < ref2->parent)
+			return -1;
+		if (ref1->parent > ref2->parent)
+			return 1;
+	}
+	return 0;
+}
+
+static int comp_refs(struct btrfs_delayed_ref_node *ref1,
+		     struct btrfs_delayed_ref_node *ref2,
+		     bool check_seq)
+{
+	int ret = 0;
+
+	if (ref1->type < ref2->type)
+		return -1;
+	if (ref1->type > ref2->type)
+		return 1;
+	if (ref1->type == BTRFS_TREE_BLOCK_REF_KEY ||
+	    ref1->type == BTRFS_SHARED_BLOCK_REF_KEY)
+		ret = comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref1),
+				     btrfs_delayed_node_to_tree_ref(ref2));
+	else
+		BUG();
+
+	if (ret)
+		return ret;
+	if (check_seq) {
+		if (ref1->seq < ref2->seq)
+			return -1;
+		if (ref1->seq > ref2->seq)
+			return 1;
+	}
+	return 0;
+}
+
+/* insert a new ref to head ref rbtree */
+static struct btrfs_delayed_ref_head *htree_insert(struct rb_root *root,
+						   struct rb_node *node)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent_node = NULL;
+	struct btrfs_delayed_ref_head *entry;
+	struct btrfs_delayed_ref_head *ins;
+	u64 bytenr;
+
+	ins = rb_entry(node, struct btrfs_delayed_ref_head, href_node);
+	bytenr = ins->bytenr;
+	while (*p) {
+		parent_node = *p;
+		entry = rb_entry(parent_node, struct btrfs_delayed_ref_head,
+				 href_node);
+
+		if (bytenr < entry->bytenr)
+			p = &(*p)->rb_left;
+		else if (bytenr > entry->bytenr)
+			p = &(*p)->rb_right;
+		else
+			return entry;
+	}
+
+	rb_link_node(node, parent_node, p);
+	rb_insert_color(node, root);
+	return NULL;
+}
+
+static struct btrfs_delayed_ref_node* tree_insert(struct rb_root *root,
+		struct btrfs_delayed_ref_node *ins)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *node = &ins->ref_node;
+	struct rb_node *parent_node = NULL;
+	struct btrfs_delayed_ref_node *entry;
+
+	while (*p) {
+		int comp;
+
+		parent_node = *p;
+		entry = rb_entry(parent_node, struct btrfs_delayed_ref_node,
+				 ref_node);
+		comp = comp_refs(ins, entry, true);
+		if (comp < 0)
+			p = &(*p)->rb_left;
+		else if (comp > 0)
+			p = &(*p)->rb_right;
+		else
+			return entry;
+	}
+
+	rb_link_node(node, parent_node, p);
+	rb_insert_color(node, root);
+	return NULL;
+}
+
+/*
+ * find a head entry based on bytenr. This returns the delayed ref
+ * head if it was able to find one, or NULL if nothing was in that spot.
+ * If return_bigger is given, the next bigger entry is returned if no exact
+ * match is found.
+ */
+static struct btrfs_delayed_ref_head *
+find_ref_head(struct rb_root *root, u64 bytenr,
+	      int return_bigger)
+{
+	struct rb_node *n;
+	struct btrfs_delayed_ref_head *entry;
+
+	n = root->rb_node;
+	entry = NULL;
+	while (n) {
+		entry = rb_entry(n, struct btrfs_delayed_ref_head, href_node);
+
+		if (bytenr < entry->bytenr)
+			n = n->rb_left;
+		else if (bytenr > entry->bytenr)
+			n = n->rb_right;
+		else
+			return entry;
+	}
+	if (entry && return_bigger) {
+		if (bytenr > entry->bytenr) {
+			n = rb_next(&entry->href_node);
+			if (!n)
+				n = rb_first(root);
+			entry = rb_entry(n, struct btrfs_delayed_ref_head,
+					 href_node);
+			return entry;
+		}
+		return entry;
+	}
+	return NULL;
+}
+
+static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
+				    struct btrfs_delayed_ref_root *delayed_refs,
+				    struct btrfs_delayed_ref_head *head,
+				    struct btrfs_delayed_ref_node *ref)
+{
+	rb_erase(&ref->ref_node, &head->ref_tree);
+	RB_CLEAR_NODE(&ref->ref_node);
+	if (!list_empty(&ref->add_list))
+		list_del(&ref->add_list);
+	ref->in_tree = 0;
+	btrfs_put_delayed_ref(ref);
+	if (trans->delayed_ref_updates)
+		trans->delayed_ref_updates--;
+}
+
+static bool merge_ref(struct btrfs_trans_handle *trans,
+		      struct btrfs_delayed_ref_root *delayed_refs,
+		      struct btrfs_delayed_ref_head *head,
+		      struct btrfs_delayed_ref_node *ref,
+		      u64 seq)
+{
+	struct btrfs_delayed_ref_node *next;
+	struct rb_node *node = rb_next(&ref->ref_node);
+	bool done = false;
+
+	while (!done && node) {
+		int mod;
+
+		next = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+		node = rb_next(node);
+		if (seq && next->seq >= seq)
+			break;
+		if (comp_refs(ref, next, false))
+			break;
+
+		if (ref->action == next->action) {
+			mod = next->ref_mod;
+		} else {
+			if (ref->ref_mod < next->ref_mod) {
+				swap(ref, next);
+				done = true;
+			}
+			mod = -next->ref_mod;
+		}
+
+		drop_delayed_ref(trans, delayed_refs, head, next);
+		ref->ref_mod += mod;
+		if (ref->ref_mod == 0) {
+			drop_delayed_ref(trans, delayed_refs, head, ref);
+			done = true;
+		} else {
+			/*
+			 * Can't have multiples of the same ref on a tree block.
+			 */
+			WARN_ON(ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
+				ref->type == BTRFS_SHARED_BLOCK_REF_KEY);
+		}
+	}
+
+	return done;
+}
+
+void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
+			      struct btrfs_delayed_ref_root *delayed_refs,
+			      struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_ref_node *ref;
+	struct rb_node *node;
+
+	if (RB_EMPTY_ROOT(&head->ref_tree))
+		return;
+
+	/* We don't have too many refs to merge for data. */
+	if (head->is_data)
+		return;
+
+again:
+	for (node = rb_first(&head->ref_tree); node; node = rb_next(node)) {
+		ref = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+		if (merge_ref(trans, delayed_refs, head, ref, 0))
+			goto again;
+	}
+}
+
+struct btrfs_delayed_ref_head *
+btrfs_select_ref_head(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_head *head;
+	u64 start;
+	bool loop = false;
+
+	delayed_refs = &trans->delayed_refs;
+
+again:
+	start = delayed_refs->run_delayed_start;
+	head = find_ref_head(&delayed_refs->href_root, start, 1);
+	if (!head && !loop) {
+		delayed_refs->run_delayed_start = 0;
+		start = 0;
+		loop = true;
+		head = find_ref_head(&delayed_refs->href_root, start, 1);
+		if (!head)
+			return NULL;
+	} else if (!head && loop) {
+		return NULL;
+	}
+
+	while (head->processing) {
+		struct rb_node *node;
+
+		node = rb_next(&head->href_node);
+		if (!node) {
+			if (loop)
+				return NULL;
+			delayed_refs->run_delayed_start = 0;
+			start = 0;
+			loop = true;
+			goto again;
+		}
+		head = rb_entry(node, struct btrfs_delayed_ref_head,
+				href_node);
+	}
+
+	head->processing = 1;
+	WARN_ON(delayed_refs->num_heads_ready == 0);
+	delayed_refs->num_heads_ready--;
+	delayed_refs->run_delayed_start = head->bytenr +
+		head->num_bytes;
+	return head;
+}
+
+/*
+ * Helper to insert the ref_node to the tail or merge with tail.
+ *
+ * Return 0 for insert.
+ * Return >0 for merge.
+ */
+static int insert_delayed_ref(struct btrfs_trans_handle *trans,
+			      struct btrfs_delayed_ref_root *root,
+			      struct btrfs_delayed_ref_head *href,
+			      struct btrfs_delayed_ref_node *ref)
+{
+	struct btrfs_delayed_ref_node *exist;
+	int mod;
+	int ret = 0;
+
+	exist = tree_insert(&href->ref_tree, ref);
+	if (!exist)
+		goto inserted;
+
+	/* Now we are sure we can merge */
+	ret = 1;
+	if (exist->action == ref->action) {
+		mod = ref->ref_mod;
+	} else {
+		/* Need to change action */
+		if (exist->ref_mod < ref->ref_mod) {
+			exist->action = ref->action;
+			mod = -exist->ref_mod;
+			exist->ref_mod = ref->ref_mod;
+			if (ref->action == BTRFS_ADD_DELAYED_REF)
+				list_add_tail(&exist->add_list,
+					      &href->ref_add_list);
+			else if (ref->action == BTRFS_DROP_DELAYED_REF) {
+				ASSERT(!list_empty(&exist->add_list));
+				list_del(&exist->add_list);
+			} else {
+				ASSERT(0);
+			}
+		} else
+			mod = -ref->ref_mod;
+	}
+	exist->ref_mod += mod;
+
+	/* remove existing tail if its ref_mod is zero */
+	if (exist->ref_mod == 0)
+		drop_delayed_ref(trans, root, href, exist);
+	return ret;
+inserted:
+	if (ref->action == BTRFS_ADD_DELAYED_REF)
+		list_add_tail(&ref->add_list, &href->ref_add_list);
+	root->num_entries++;
+	trans->delayed_ref_updates++;
+	return ret;
+}
+
+/*
+ * helper function to update the accounting in the head ref
+ * existing and update must have the same bytenr
+ */
+static noinline void
+update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
+			 struct btrfs_delayed_ref_head *existing,
+			 struct btrfs_delayed_ref_head *update,
+			 int *old_ref_mod_ret)
+{
+	int old_ref_mod;
+
+	BUG_ON(existing->is_data != update->is_data);
+
+	if (update->must_insert_reserved) {
+		/* if the extent was freed and then
+		 * reallocated before the delayed ref
+		 * entries were processed, we can end up
+		 * with an existing head ref without
+		 * the must_insert_reserved flag set.
+		 * Set it again here
+		 */
+		existing->must_insert_reserved = update->must_insert_reserved;
+
+		/*
+		 * update the num_bytes so we make sure the accounting
+		 * is done correctly
+		 */
+		existing->num_bytes = update->num_bytes;
+
+	}
+
+	if (update->extent_op) {
+		if (!existing->extent_op) {
+			existing->extent_op = update->extent_op;
+		} else {
+			if (update->extent_op->update_key) {
+				memcpy(&existing->extent_op->key,
+				       &update->extent_op->key,
+				       sizeof(update->extent_op->key));
+				existing->extent_op->update_key = true;
+			}
+			if (update->extent_op->update_flags) {
+				existing->extent_op->flags_to_set |=
+					update->extent_op->flags_to_set;
+				existing->extent_op->update_flags = true;
+			}
+			btrfs_free_delayed_extent_op(update->extent_op);
+		}
+	}
+	/*
+	 * update the reference mod on the head to reflect this new operation,
+	 * only need the lock for this case cause we could be processing it
+	 * currently, for refs we just added we know we're a-ok.
+	 */
+	old_ref_mod = existing->total_ref_mod;
+	if (old_ref_mod_ret)
+		*old_ref_mod_ret = old_ref_mod;
+	existing->ref_mod += update->ref_mod;
+	existing->total_ref_mod += update->ref_mod;
+
+}
+
+static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
+				  void *qrecord,
+				  u64 bytenr, u64 num_bytes, u64 ref_root,
+				  u64 reserved, int action, bool is_data,
+				  bool is_system)
+{
+	int count_mod = 1;
+	int must_insert_reserved = 0;
+
+	/* If reserved is provided, it must be a data extent. */
+	BUG_ON(!is_data && reserved);
+
+	/*
+	 * The head node stores the sum of all the mods, so dropping a ref
+	 * should drop the sum in the head node by one.
+	 */
+	if (action == BTRFS_UPDATE_DELAYED_HEAD)
+		count_mod = 0;
+	else if (action == BTRFS_DROP_DELAYED_REF)
+		count_mod = -1;
+
+	/*
+	 * BTRFS_ADD_DELAYED_EXTENT means that we need to update the reserved
+	 * accounting when the extent is finally added, or if a later
+	 * modification deletes the delayed ref without ever inserting the
+	 * extent into the extent allocation tree.  ref->must_insert_reserved
+	 * is the flag used to record that accounting mods are required.
+	 *
+	 * Once we record must_insert_reserved, switch the action to
+	 * BTRFS_ADD_DELAYED_REF because other special casing is not required.
+	 */
+	if (action == BTRFS_ADD_DELAYED_EXTENT)
+		must_insert_reserved = 1;
+	else
+		must_insert_reserved = 0;
+
+	head_ref->refs = 1;
+	head_ref->bytenr = bytenr;
+	head_ref->num_bytes = num_bytes;
+	head_ref->ref_mod = count_mod;
+	head_ref->must_insert_reserved = must_insert_reserved;
+	head_ref->is_data = is_data;
+	head_ref->is_system = is_system;
+	head_ref->ref_tree = RB_ROOT;
+	INIT_LIST_HEAD(&head_ref->ref_add_list);
+	RB_CLEAR_NODE(&head_ref->href_node);
+	head_ref->processing = 0;
+	head_ref->total_ref_mod = count_mod;
+}
+
+/*
+ * helper function to actually insert a head node into the rbtree.
+ * this does all the dirty work in terms of maintaining the correct
+ * overall modification count.
+ */
+static noinline struct btrfs_delayed_ref_head *
+add_delayed_ref_head(struct btrfs_trans_handle *trans,
+		     struct btrfs_delayed_ref_head *head_ref,
+		     void *qrecord,
+		     int action, int *qrecord_inserted_ret,
+		     int *old_ref_mod, int *new_ref_mod)
+{
+	struct btrfs_delayed_ref_head *existing;
+	struct btrfs_delayed_ref_root *delayed_refs;
+
+	delayed_refs = &trans->delayed_refs;
+
+	existing = htree_insert(&delayed_refs->href_root, &head_ref->href_node);
+	if (existing) {
+		update_existing_head_ref(delayed_refs, existing, head_ref, old_ref_mod);
+		/*
+		 * we've updated the existing ref, free the newly
+		 * allocated ref
+		 */
+		kfree(head_ref);
+		head_ref = existing;
+	} else {
+		if (old_ref_mod)
+			*old_ref_mod = 0;
+		delayed_refs->num_heads++;
+		delayed_refs->num_heads_ready++;
+		trans->delayed_ref_updates++;
+	}
+	if (new_ref_mod)
+		*new_ref_mod = head_ref->total_ref_mod;
+
+	return head_ref;
+}
+
+/*
+ * init_delayed_ref_common - Initialize the structure which represents a
+ *			     modification to an extent.
+ *
+ * @fs_info:    Internal to the mounted filesystem mount structure.
+ *
+ * @ref:	The structure which is going to be initialized.
+ *
+ * @bytenr:	The logical address of the extent for which a modification is
+ *		going to be recorded.
+ *
+ * @num_bytes:  Size of the extent whose modification is being recorded.
+ *
+ * @ref_root:	The id of the root where this modification has originated, this
+ *		can be either one of the well-known metadata trees or the
+ *		subvolume id which references this extent.
+ *
+ * @action:	Can be one of BTRFS_ADD_DELAYED_REF/BTRFS_DROP_DELAYED_REF or
+ *		BTRFS_ADD_DELAYED_EXTENT
+ *
+ * @ref_type:	Holds the type of the extent which is being recorded, can be
+ *		one of BTRFS_SHARED_BLOCK_REF_KEY/BTRFS_TREE_BLOCK_REF_KEY
+ *		when recording a metadata extent or BTRFS_SHARED_DATA_REF_KEY/
+ *		BTRFS_EXTENT_DATA_REF_KEY when recording data extent
+ */
+static void init_delayed_ref_common(struct btrfs_fs_info *fs_info,
+				    struct btrfs_delayed_ref_node *ref,
+				    u64 bytenr, u64 num_bytes, u64 ref_root,
+				    int action, u8 ref_type)
+{
+	if (action == BTRFS_ADD_DELAYED_EXTENT)
+		action = BTRFS_ADD_DELAYED_REF;
+
+	ref->refs = 1;
+	ref->bytenr = bytenr;
+	ref->num_bytes = num_bytes;
+	ref->ref_mod = 1;
+	ref->action = action;
+	ref->is_head = 0;
+	ref->in_tree = 1;
+	ref->seq = 0;
+	ref->type = ref_type;
+	RB_CLEAR_NODE(&ref->ref_node);
+	INIT_LIST_HEAD(&ref->add_list);
+}
+
+/*
+ * add a delayed tree ref.  This does all of the accounting required
+ * to make sure the delayed ref is eventually processed before this
+ * transaction commits.
+ */
+int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+			       struct btrfs_trans_handle *trans,
+			       u64 bytenr, u64 num_bytes, u64 parent,
+			       u64 ref_root, int level, int action,
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int *old_ref_mod, int *new_ref_mod)
+{
+	struct btrfs_delayed_tree_ref *ref;
+	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_root *delayed_refs;
+	bool is_system = (ref_root == BTRFS_CHUNK_TREE_OBJECTID);
+	int ret;
+	u8 ref_type;
+
+	BUG_ON(extent_op && extent_op->is_data);
+	ref = kmalloc(sizeof(*ref), GFP_NOFS);
+	if (!ref)
+		return -ENOMEM;
+
+	if (parent)
+		ref_type = BTRFS_SHARED_BLOCK_REF_KEY;
+	else
+		ref_type = BTRFS_TREE_BLOCK_REF_KEY;
+	init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes,
+				ref_root, action, ref_type);
+	ref->root = ref_root;
+	ref->parent = parent;
+	ref->level = level;
+
+	head_ref = kmalloc(sizeof(*head_ref), GFP_NOFS);
+	if (!head_ref)
+		goto free_ref;
+
+	init_delayed_ref_head(head_ref, NULL, bytenr, num_bytes,
+			      ref_root, 0, action, false, is_system);
+	head_ref->extent_op = extent_op;
+
+	delayed_refs = &trans->delayed_refs;
+
+	head_ref = add_delayed_ref_head(trans, head_ref, NULL, action, NULL,
+			old_ref_mod, new_ref_mod);
+
+	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
+
+	if (ret > 0)
+		kfree(ref);
+
+	return 0;
+
+free_ref:
+	kfree(ref);
+
+	return -ENOMEM;
+}
diff --git a/delayed-ref.h b/delayed-ref.h
new file mode 100644
index 000000000000..f6c35bceb111
--- /dev/null
+++ b/delayed-ref.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2008 Oracle.  All rights reserved.
+ */
+
+#ifndef BTRFS_DELAYED_REF_H
+#define BTRFS_DELAYED_REF_H
+
+#include "kerncompat.h"
+
+/* these are the possible values of struct btrfs_delayed_ref_node->action */
+#define BTRFS_ADD_DELAYED_REF    1 /* add one backref to the tree */
+#define BTRFS_DROP_DELAYED_REF   2 /* delete one backref from the tree */
+#define BTRFS_ADD_DELAYED_EXTENT 3 /* record a full extent allocation */
+#define BTRFS_UPDATE_DELAYED_HEAD 4 /* not changing ref count on head ref */
+
+struct btrfs_delayed_ref_node {
+	struct rb_node ref_node;
+	/*
+	 * If action is BTRFS_ADD_DELAYED_REF, also link this node to
+	 * ref_head->ref_add_list, then we do not need to iterate the
+	 * whole ref_head->ref_list to find BTRFS_ADD_DELAYED_REF nodes.
+	 */
+	struct list_head add_list;
+
+	/* the starting bytenr of the extent */
+	u64 bytenr;
+
+	/* the size of the extent */
+	u64 num_bytes;
+
+	/* seq number to keep track of insertion order */
+	u64 seq;
+
+	/* ref count on this data structure */
+	u64 refs;
+
+	/*
+	 * how many refs is this entry adding or deleting.  For
+	 * head refs, this may be a negative number because it is keeping
+	 * track of the total mods done to the reference count.
+	 * For individual refs, this will always be a positive number
+	 *
+	 * It may be more than one, since it is possible for a single
+	 * parent to have more than one ref on an extent
+	 */
+	int ref_mod;
+
+	unsigned int action:8;
+	unsigned int type:8;
+	/* is this node still in the rbtree? */
+	unsigned int is_head:1;
+	unsigned int in_tree:1;
+};
+
+struct btrfs_delayed_extent_op {
+	struct btrfs_disk_key key;
+	u8 level;
+	bool update_key;
+	bool update_flags;
+	bool is_data;
+	u64 flags_to_set;
+};
+
+/*
+ * the head refs are used to hold a lock on a given extent, which allows us
+ * to make sure that only one process is running the delayed refs
+ * at a time for a single extent.  They also store the sum of all the
+ * reference count modifications we've queued up.
+ */
+struct btrfs_delayed_ref_head {
+	u64 bytenr;
+	u64 num_bytes;
+	u64 refs;
+
+	struct rb_root ref_tree;
+	/* accumulate add BTRFS_ADD_DELAYED_REF nodes to this ref_add_list. */
+	struct list_head ref_add_list;
+
+	struct rb_node href_node;
+
+	struct btrfs_delayed_extent_op *extent_op;
+
+	/*
+	 * This is used to track the final ref_mod from all the refs associated
+	 * with this head ref, this is not adjusted as delayed refs are run,
+	 * this is meant to track if we need to do the csum accounting or not.
+	 */
+	int total_ref_mod;
+
+	/*
+	 * This is the current outstanding mod references for this bytenr.  This
+	 * is used with lookup_extent_info to get an accurate reference count
+	 * for a bytenr, so it is adjusted as delayed refs are run so that any
+	 * on disk reference count + ref_mod is accurate.
+	 */
+	int ref_mod;
+
+	/*
+	 * when a new extent is allocated, it is just reserved in memory
+	 * The actual extent isn't inserted into the extent allocation tree
+	 * until the delayed ref is processed.  must_insert_reserved is
+	 * used to flag a delayed ref so the accounting can be updated
+	 * when a full insert is done.
+	 *
+	 * It is possible the extent will be freed before it is ever
+	 * inserted into the extent allocation tree.  In this case
+	 * we need to update the in ram accounting to properly reflect
+	 * the free has happened.
+	 */
+	unsigned int must_insert_reserved:1;
+	unsigned int is_data:1;
+	unsigned int is_system:1;
+	unsigned int processing:1;
+};
+
+struct btrfs_delayed_tree_ref {
+	struct btrfs_delayed_ref_node node;
+	u64 root;
+	u64 parent;
+	int level;
+};
+
+struct btrfs_delayed_ref_root {
+	/* head ref rbtree */
+	struct rb_root href_root;
+
+	/* dirty extent records */
+	struct rb_root dirty_extent_root;
+
+	/* total number of head nodes in tree */
+	unsigned long num_heads;
+
+	/* total number of head nodes ready for processing */
+	unsigned long num_heads_ready;
+
+	unsigned long num_entries;
+
+	/*
+	 * set when the tree is flushing before a transaction commit,
+	 * used by the throttling code to decide if new updates need
+	 * to be run right away
+	 */
+	int flushing;
+
+	u64 run_delayed_start;
+};
+
+
+static inline struct btrfs_delayed_extent_op *
+btrfs_alloc_delayed_extent_op(void)
+{
+	return kmalloc(sizeof(struct btrfs_delayed_extent_op), GFP_KERNEL);
+}
+
+static inline void
+btrfs_free_delayed_extent_op(struct btrfs_delayed_extent_op *op)
+{
+	if (op)
+		kfree(op);
+}
+
+static inline void btrfs_put_delayed_ref(struct btrfs_delayed_ref_node *ref)
+{
+	WARN_ON(ref->refs == 0);
+	if (--ref->refs == 0) {
+		WARN_ON(ref->in_tree);
+		switch (ref->type) {
+		case BTRFS_TREE_BLOCK_REF_KEY:
+		case BTRFS_SHARED_BLOCK_REF_KEY:
+			kfree(ref);
+			break;
+		case BTRFS_EXTENT_DATA_REF_KEY:
+		case BTRFS_SHARED_DATA_REF_KEY:
+			kfree(ref);
+			break;
+		default:
+			BUG();
+		}
+	}
+}
+
+static inline void btrfs_put_delayed_ref_head(struct btrfs_delayed_ref_head *head)
+{
+	if (--head->refs == 0)
+		kfree(head);
+}
+
+int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+			       struct btrfs_trans_handle *trans,
+			       u64 bytenr, u64 num_bytes, u64 parent,
+			       u64 ref_root, int level, int action,
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int *old_ref_mod, int *new_ref_mod);
+void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
+			      struct btrfs_delayed_ref_root *delayed_refs,
+			      struct btrfs_delayed_ref_head *head);
+
+struct btrfs_delayed_ref_head *
+btrfs_select_ref_head(struct btrfs_trans_handle *trans);
+
+/*
+ * helper functions to cast a node into its container
+ */
+static inline struct btrfs_delayed_tree_ref *
+btrfs_delayed_node_to_tree_ref(struct btrfs_delayed_ref_node *node)
+{
+	return container_of(node, struct btrfs_delayed_tree_ref, node);
+}
+#endif
diff --git a/extent-tree.c b/extent-tree.c
index ab57c20d9dee..aff00e536c9c 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -4183,3 +4183,231 @@ u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
 
 	return total_added;
 }
+
+static void cleanup_extent_op(struct btrfs_trans_handle *trans,
+			     struct btrfs_fs_info *fs_info,
+			     struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
+
+	if (!extent_op)
+		return;
+	head->extent_op = NULL;
+	btrfs_free_delayed_extent_op(extent_op);
+}
+
+static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
+				      struct btrfs_delayed_ref_head *head)
+{
+	head->processing = 0;
+	delayed_refs->num_heads_ready++;
+}
+
+static int cleanup_ref_head(struct btrfs_trans_handle *trans,
+			    struct btrfs_fs_info *fs_info,
+			    struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+
+	delayed_refs = &trans->delayed_refs;
+
+	cleanup_extent_op(trans, fs_info, head);
+
+	/*
+	 * Need to drop our head ref lock and re-acquire the delayed ref lock
+	 * and then re-check to make sure nobody got added.
+	 */
+	if (!RB_EMPTY_ROOT(&head->ref_tree) || head->extent_op)
+		return 1;
+
+	delayed_refs->num_heads--;
+	rb_erase(&head->href_node, &delayed_refs->href_root);
+	RB_CLEAR_NODE(&head->href_node);
+	--delayed_refs->num_entries;
+
+	if (head->must_insert_reserved)
+		btrfs_pin_extent(fs_info, head->bytenr, head->num_bytes);
+
+	btrfs_put_delayed_ref_head(head);
+	return 0;
+}
+
+static inline struct btrfs_delayed_ref_node *
+select_delayed_ref(struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_ref_node *ref;
+
+	if (RB_EMPTY_ROOT(&head->ref_tree))
+		return NULL;
+	/*
+	 * Select a delayed ref of type BTRFS_ADD_DELAYED_REF first.
+	 * This is to prevent a ref count from going down to zero, which deletes
+	 * the extent item from the extent tree, when there still are references
+	 * to add, which would fail because they would not find the extent item.
+	 */
+	if (!list_empty(&head->ref_add_list))
+		return list_first_entry(&head->ref_add_list,
+					struct btrfs_delayed_ref_node,
+					add_list);
+	ref = rb_entry(rb_first(&head->ref_tree),
+		       struct btrfs_delayed_ref_node, ref_node);
+	ASSERT(list_empty(&ref->add_list));
+	return ref;
+}
+
+
+static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
+				struct btrfs_fs_info *fs_info,
+				struct btrfs_delayed_ref_node *node,
+				struct btrfs_delayed_extent_op *extent_op,
+				int insert_reserved)
+{
+	int ret = 0;
+	struct btrfs_delayed_tree_ref *ref;
+	u64 parent = 0;
+	u64 ref_root = 0;
+
+	ref = btrfs_delayed_node_to_tree_ref(node);
+
+	if (node->type == BTRFS_SHARED_BLOCK_REF_KEY)
+		parent = ref->parent;
+	ref_root = ref->root;
+
+	if (node->ref_mod != 1) {
+		printf("btree block(%llu) has %d references rather than 1: action %d ref_root %llu parent %llu\n",
+			node->bytenr, node->ref_mod, node->action, ref_root,
+			parent);
+		return -EIO;
+	}
+	if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) {
+		BUG_ON(!extent_op || !extent_op->update_flags);
+		ret = alloc_reserved_tree_block2(trans, node, extent_op);
+	} else if (node->action == BTRFS_DROP_DELAYED_REF) {
+		ret = __free_extent2(trans, node, extent_op);
+	} else {
+		BUG();
+	}
+
+	return ret;
+}
+
+/* helper function to actually process a single delayed ref entry */
+static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
+			       struct btrfs_fs_info *fs_info,
+			       struct btrfs_delayed_ref_node *node,
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int insert_reserved)
+{
+	int ret = 0;
+
+	if (node->type == BTRFS_TREE_BLOCK_REF_KEY ||
+		node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
+		ret = run_delayed_tree_ref(trans, fs_info, node, extent_op,
+					   insert_reserved);
+	} else
+		BUG();
+	return ret;
+}
+
+int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_node *ref;
+	struct btrfs_delayed_ref_head *locked_ref = NULL;
+	struct btrfs_delayed_extent_op *extent_op;
+	int ret;
+	int must_insert_reserved = 0;
+
+	delayed_refs = &trans->delayed_refs;
+	while (1) {
+		if (!locked_ref) {
+			locked_ref = btrfs_select_ref_head(trans);
+			if (!locked_ref)
+				break;
+		}
+		/*
+		 * We need to try and merge add/drops of the same ref since we
+		 * can run into issues with relocate dropping the implicit ref
+		 * and then it being added back again before the drop can
+		 * finish.  If we merged anything we need to re-loop so we can
+		 * get a good ref.
+		 * Or we can get node references of the same type that weren't
+		 * merged when created due to bumps in the tree mod seq, and
+		 * we need to merge them to prevent adding an inline extent
+		 * backref before dropping it (triggering a BUG_ON at
+		 * insert_inline_extent_backref()).
+		 */
+		btrfs_merge_delayed_refs(trans, delayed_refs, locked_ref);
+		ref = select_delayed_ref(locked_ref);
+		/*
+		 * We're done processing refs in this ref_head, clean everything
+		 * up and move on to the next ref_head.
+		 */
+		if (!ref) {
+			ret = cleanup_ref_head(trans, fs_info, locked_ref);
+			if (ret > 0) {
+				/* We dropped our lock, we need to loop. */
+				ret = 0;
+				continue;
+			} else if (ret) {
+				return ret;
+			}
+			locked_ref = NULL;
+			continue;
+		}
+
+		ref->in_tree = 0;
+		rb_erase(&ref->ref_node, &locked_ref->ref_tree);
+		RB_CLEAR_NODE(&ref->ref_node);
+		if (!list_empty(&ref->add_list))
+			list_del(&ref->add_list);
+		/*
+		 * When we play the delayed ref, also correct the ref_mod on
+		 * head
+		 */
+		switch (ref->action) {
+		case BTRFS_ADD_DELAYED_REF:
+		case BTRFS_ADD_DELAYED_EXTENT:
+			locked_ref->ref_mod -= ref->ref_mod;
+			break;
+		case BTRFS_DROP_DELAYED_REF:
+			locked_ref->ref_mod += ref->ref_mod;
+			break;
+		default:
+			WARN_ON(1);
+		}
+		delayed_refs->num_entries--;
+
+		/*
+		 * Record the must_insert_reserved flag before we drop the spin
+		 * lock.
+		 */
+		must_insert_reserved = locked_ref->must_insert_reserved;
+		locked_ref->must_insert_reserved = 0;
+
+		extent_op = locked_ref->extent_op;
+		locked_ref->extent_op = NULL;
+
+		ret = run_one_delayed_ref(trans, fs_info, ref, extent_op,
+					  must_insert_reserved);
+
+		btrfs_free_delayed_extent_op(extent_op);
+		/*
+		 * If we are re-initing extent tree in this transaction
+		 * failure in freeing old roots are expected (because we don't
+		 * have the old extent tree, hence backref resolution will
+		 * return -EIO).
+		 */
+		if (ret && (!trans->reinit_extent_tree ||
+		     ref->action != BTRFS_DROP_DELAYED_REF)) {
+			unselect_delayed_ref_head(delayed_refs, locked_ref);
+			btrfs_put_delayed_ref(ref);
+			return ret;
+		}
+
+		btrfs_put_delayed_ref(ref);
+	}
+
+	return 0;
+}
diff --git a/kerncompat.h b/kerncompat.h
index fa96715fb70c..1a2bc18c3ac2 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -263,6 +263,14 @@ static inline int IS_ERR_OR_NULL(const void *ptr)
 	return !ptr || IS_ERR(ptr);
 }
 
+/**
+ * swap - swap values of @a and @b
+ * @a: first value
+ * @b: second value
+ */
+#define swap(a, b) \
+        do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
+
 /*
  * This looks more complex than it should be. But we need to
  * get the type for the ~ right in round_down (it needs to be
diff --git a/transaction.h b/transaction.h
index 750e329e1ba8..34060252dd5c 100644
--- a/transaction.h
+++ b/transaction.h
@@ -21,6 +21,7 @@
 
 #include "kerncompat.h"
 #include "ctree.h"
+#include "delayed-ref.h"
 
 struct btrfs_trans_handle {
 	struct btrfs_fs_info *fs_info;
@@ -28,9 +29,12 @@ struct btrfs_trans_handle {
 	u64 alloc_exclude_start;
 	u64 alloc_exclude_nr;
 	bool reinit_extent_tree;
+	u64 delayed_ref_updates;
 	unsigned long blocks_reserved;
 	unsigned long blocks_used;
 	struct btrfs_block_group_cache *block_group;
+	struct btrfs_delayed_ref_root delayed_refs;
+
 };
 
 struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
-- 
2.7.4
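
For readers following btrfs_run_delayed_refs() in the patch above, here
is a minimal userspace sketch of its two bookkeeping rules: pending ADD
refs are picked before DROP refs, and running a ref removes its share
from the head's outstanding ref_mod. All toy_* names and types are
hypothetical simplifications (a flat array instead of the rbtree and
list) and are not part of btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the patch's actions; not the btrfs-progs values. */
enum toy_action { TOY_ADD_REF, TOY_DROP_REF };

struct toy_ref {
	enum toy_action action;
	int ref_mod;	/* magnitude of this modification */
	int done;	/* set once the ref has been run */
};

struct toy_head {
	struct toy_ref *refs;
	size_t nr;
	int ref_mod;	/* outstanding net modification: adds minus drops */
};

/* Pick a pending ADD first; otherwise fall back to the first pending ref. */
static struct toy_ref *toy_select(struct toy_head *head)
{
	struct toy_ref *fallback = NULL;
	size_t i;

	for (i = 0; i < head->nr; i++) {
		struct toy_ref *ref = &head->refs[i];

		if (ref->done)
			continue;
		if (ref->action == TOY_ADD_REF)
			return ref;
		if (!fallback)
			fallback = ref;
	}
	return fallback;
}

/*
 * "Run" one ref against an on-disk count and undo its share of the head's
 * outstanding ref_mod, so that on-disk count + head->ref_mod stays constant.
 */
static void toy_run_one(struct toy_head *head, struct toy_ref *ref, int *on_disk)
{
	if (ref->action == TOY_ADD_REF) {
		*on_disk += ref->ref_mod;
		head->ref_mod -= ref->ref_mod;
	} else {
		*on_disk -= ref->ref_mod;
		head->ref_mod += ref->ref_mod;
	}
	ref->done = 1;
}

/* Drain the head; returns the lowest on-disk count seen along the way. */
static int toy_run_all(struct toy_head *head, int *on_disk)
{
	int lowest = *on_disk;
	struct toy_ref *ref;

	while ((ref = toy_select(head)) != NULL) {
		toy_run_one(head, ref, on_disk);
		if (*on_disk < lowest)
			lowest = *on_disk;
	}
	return lowest;
}
```

With one DROP and one ADD queued against an extent whose on-disk count
is 1, toy_run_all() runs the ADD first, so the count never reaches zero
in this model even though the DROP was queued earlier.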



* Re: [PATCH 01/15] btrfs-progs: Remove root argument from pin_down_bytes
  2018-06-08 12:47 ` [PATCH 01/15] btrfs-progs: Remove root argument from pin_down_bytes Nikolay Borisov
@ 2018-06-11  4:41   ` Qu Wenruo
  0 siblings, 0 replies; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  4:41 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018-06-08 20:47, Nikolay Borisov wrote:
> This argument is used to obtain a reference to fs_info, which can
> already be done from the passed trans handle, so use that instead.
> This is in preparation for delayed refs support.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  extent-tree.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/extent-tree.c b/extent-tree.c
> index 0643815bd41c..cbc022f6cef6 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -2098,9 +2098,8 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
>  	return 0;
>  }
>  
> -static int pin_down_bytes(struct btrfs_trans_handle *trans,
> -			  struct btrfs_root *root,
> -			  u64 bytenr, u64 num_bytes, int is_data)
> +static int pin_down_bytes(struct btrfs_trans_handle *trans, u64 bytenr,
> +			  u64 num_bytes, int is_data)
>  {
>  	int err = 0;
>  	struct extent_buffer *buf;
> @@ -2108,7 +2107,7 @@ static int pin_down_bytes(struct btrfs_trans_handle *trans,
>  	if (is_data)
>  		goto pinit;
>  
> -	buf = btrfs_find_tree_block(root->fs_info, bytenr, num_bytes);
> +	buf = btrfs_find_tree_block(trans->fs_info, bytenr, num_bytes);
>  	if (!buf)
>  		goto pinit;
>  
> @@ -2360,7 +2359,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>  		}
>  
>  		if (pin) {
> -			ret = pin_down_bytes(trans, root, bytenr, num_bytes,
> +			ret = pin_down_bytes(trans, bytenr, num_bytes,
>  					     is_data);
>  			if (ret > 0)
>  				mark_free = 1;
> 


* Re: [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums
  2018-06-08 12:47 ` [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums Nikolay Borisov
@ 2018-06-11  4:46   ` Qu Wenruo
  2018-06-11  7:02     ` Nikolay Borisov
  0 siblings, 1 reply; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  4:46 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018-06-08 20:47, Nikolay Borisov wrote:
> It's not needed, since we can obtain a reference to fs_info from the
> passed transaction handle. This is needed by delayed refs code.

This looks a little too aggressive to me.

Normally we would expect parameters like @trans then @fs_info.
Although @trans alone lets us get @fs_info, in certain cases @trans
can be NULL (as with btrfs_search_slot).

Although for btrfs_del_csums() @trans cannot be NULL, I still prefer the
@trans, @fs_info parameters.

Thanks,
Qu


> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  btrfs-corrupt-block.c |  2 +-
>  ctree.h               |  3 +--
>  extent-tree.c         |  2 +-
>  file-item.c           | 20 ++++++++++----------
>  4 files changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
> index 4fbea26cda20..3add8e63b7bb 100644
> --- a/btrfs-corrupt-block.c
> +++ b/btrfs-corrupt-block.c
> @@ -926,7 +926,7 @@ static int delete_csum(struct btrfs_root *root, u64 bytenr, u64 bytes)
>  		return PTR_ERR(trans);
>  	}
>  
> -	ret = btrfs_del_csums(trans, root, bytenr, bytes);
> +	ret = btrfs_del_csums(trans, bytenr, bytes);
>  	if (ret)
>  		fprintf(stderr, "Error deleting csums %d\n", ret);
>  	btrfs_commit_transaction(trans, root);
> diff --git a/ctree.h b/ctree.h
> index de4b1b7e6416..082726238b91 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2752,8 +2752,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>  			u64 ino, u64 parent_ino, u64 *index);
>  
>  /* file-item.c */
> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
> -		    struct btrfs_root *root, u64 bytenr, u64 len);
> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len);
>  int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
>  			     struct btrfs_root *root,
>  			     u64 objectid, u64 pos, u64 offset,
> diff --git a/extent-tree.c b/extent-tree.c
> index cbc022f6cef6..c6f09b52800f 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -2372,7 +2372,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>  		btrfs_release_path(path);
>  
>  		if (is_data) {
> -			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
> +			ret = btrfs_del_csums(trans, bytenr, num_bytes);
>  			BUG_ON(ret);
>  		}
>  
> diff --git a/file-item.c b/file-item.c
> index 7b0ff3585509..71d4e89f78d1 100644
> --- a/file-item.c
> +++ b/file-item.c
> @@ -394,8 +394,7 @@ static noinline int truncate_one_csum(struct btrfs_root *root,
>   * deletes the csum items from the csum tree for a given
>   * range of bytes.
>   */
> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
> -		    struct btrfs_root *root, u64 bytenr, u64 len)
> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len)
>  {
>  	struct btrfs_path *path;
>  	struct btrfs_key key;
> @@ -403,11 +402,10 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>  	u64 csum_end;
>  	struct extent_buffer *leaf;
>  	int ret;
> -	u16 csum_size =
> -		btrfs_super_csum_size(root->fs_info->super_copy);
> -	int blocksize = root->fs_info->sectorsize;
> +	u16 csum_size = btrfs_super_csum_size(trans->fs_info->super_copy);
> +	int blocksize = trans->fs_info->sectorsize;
> +	struct btrfs_root *csum_root = trans->fs_info->csum_root;
>  
> -	root = root->fs_info->csum_root;
>  
>  	path = btrfs_alloc_path();
>  	if (!path)
> @@ -418,7 +416,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>  		key.offset = end_byte - 1;
>  		key.type = BTRFS_EXTENT_CSUM_KEY;
>  
> -		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> +		ret = btrfs_search_slot(trans, csum_root, &key, path, -1, 1);
>  		if (ret > 0) {
>  			if (path->slots[0] == 0)
>  				goto out;
> @@ -445,7 +443,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>  
>  		/* delete the entire item, it is inside our range */
>  		if (key.offset >= bytenr && csum_end <= end_byte) {
> -			ret = btrfs_del_item(trans, root, path);
> +			ret = btrfs_del_item(trans, csum_root, path);
>  			BUG_ON(ret);
>  		} else if (key.offset < bytenr && csum_end > end_byte) {
>  			unsigned long offset;
> @@ -485,12 +483,14 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>  			 * btrfs_split_item returns -EAGAIN when the
>  			 * item changed size or key
>  			 */
> -			ret = btrfs_split_item(trans, root, path, &key, offset);
> +			ret = btrfs_split_item(trans, csum_root, path, &key,
> +					       offset);
>  			BUG_ON(ret && ret != -EAGAIN);
>  
>  			key.offset = end_byte - 1;
>  		} else {
> -			ret = truncate_one_csum(root, path, &key, bytenr, len);
> +			ret = truncate_one_csum(csum_root, path, &key, bytenr,
> +						len);
>  			BUG_ON(ret);
>  		}
>  		btrfs_release_path(path);
> 
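
The two signature styles discussed in this reply can be contrasted with
a small sketch. The toy_* structs and helpers are hypothetical and only
model the relationship between a transaction handle and fs_info; they
are not btrfs-progs code.

```c
#include <assert.h>
#include <stddef.h>

struct toy_fs_info {
	unsigned int sectorsize;
};

struct toy_trans {
	struct toy_fs_info *fs_info;
};

/* Style from the patch: fs_info is reached through trans, so trans must be set. */
static int toy_sectorsize_from_trans(struct toy_trans *trans)
{
	if (!trans)
		return -1;	/* nothing to dereference */
	return (int)trans->fs_info->sectorsize;
}

/* Style preferred in the review: fs_info is explicit, trans may be NULL. */
static int toy_sectorsize_explicit(struct toy_trans *trans,
				   struct toy_fs_info *fs_info)
{
	(void)trans;	/* unused in this read-only sketch */
	return (int)fs_info->sectorsize;
}
```

The first form has fewer arguments but only works for callers that
always hold a transaction; the second keeps working for read-only
callers that pass a NULL handle.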


* Re: [PATCH 03/15] btrfs-progs: Add functions to modify the used space by a root
  2018-06-08 12:47 ` [PATCH 03/15] btrfs-progs: Add functions to modify the used space by a root Nikolay Borisov
@ 2018-06-11  4:47   ` Qu Wenruo
  0 siblings, 0 replies; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  4:47 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018-06-08 20:47, Nikolay Borisov wrote:
> Pull the necessary function, excluding locking. Required to enable
> integration of delayed refs.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  ctree.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/ctree.c b/ctree.c
> index 2c51580fec65..7b74716bf92f 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -76,6 +76,18 @@ void add_root_to_dirty_list(struct btrfs_root *root)
>  	}
>  }
>  
> +static void root_add_used(struct btrfs_root *root, u32 size)
> +{
> +        btrfs_set_root_used(&root->root_item,
> +                            btrfs_root_used(&root->root_item) + size);
> +}
> +
> +static void root_sub_used(struct btrfs_root *root, u32 size)
> +{
> +        btrfs_set_root_used(&root->root_item,
> +                            btrfs_root_used(&root->root_item) - size);
> +}
> +

These are so small that they could be folded into the patch that uses them.

BTW, it would be better to do some basic underflow check here.
No need to return int, but some WARN_ON() would definitely help.

Thanks,
Qu

>  int btrfs_copy_root(struct btrfs_trans_handle *trans,
>  		      struct btrfs_root *root,
>  		      struct extent_buffer *buf,
> 
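
The underflow check suggested in this reply could look roughly like the
sketch below. It is one possible shape under stated assumptions:
struct toy_root and toy_warn_on() are hypothetical stand-ins for the
root item accessors and WARN_ON(), and clamping to zero after the
warning is an illustrative choice that the review does not ask for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the root item's "bytes used" field. */
struct toy_root {
	uint64_t used;
};

/* Userspace stand-in for WARN_ON(): report and return the condition. */
static int toy_warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

static void toy_root_add_used(struct toy_root *root, uint32_t size)
{
	root->used += size;
}

/* Subtract, but warn and clamp at zero instead of wrapping around. */
static void toy_root_sub_used(struct toy_root *root, uint32_t size)
{
	if (toy_warn_on(root->used < size, "root used bytes underflow")) {
		root->used = 0;
		return;
	}
	root->used -= size;
}
```

The helpers still return void, as suggested; the warning only makes a
bad accounting path visible instead of silently storing a huge value.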


* Re: [PATCH 05/15] btrfs-progs: Make update_block_group take fs_info instead of root
  2018-06-08 12:47 ` [PATCH 05/15] btrfs-progs: Make update_block_group take fs_info instead of root Nikolay Borisov
@ 2018-06-11  4:49   ` Qu Wenruo
  0 siblings, 0 replies; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  4:49 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018-06-08 20:47, Nikolay Borisov wrote:
> This is in preparation of delayed refs code.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  extent-tree.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/extent-tree.c b/extent-tree.c
> index 07b5fb99e8cf..6e7a19323efc 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -1912,12 +1912,10 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>  	return 0;
>  }
>  
> -static int update_block_group(struct btrfs_root *root,
> -			      u64 bytenr, u64 num_bytes, int alloc,
> -			      int mark_free)
> +static int update_block_group(struct btrfs_fs_info *info, u64 bytenr,
> +			      u64 num_bytes, int alloc, int mark_free)
>  {
>  	struct btrfs_block_group_cache *cache;
> -	struct btrfs_fs_info *info = root->fs_info;
>  	u64 total = num_bytes;
>  	u64 old_val;
>  	u64 byte_in_group;
> @@ -2368,7 +2366,8 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>  			BUG_ON(ret);
>  		}
>  
> -		update_block_group(root, bytenr, num_bytes, 0, mark_free);
> +		update_block_group(trans->fs_info, bytenr, num_bytes, 0,
> +				   mark_free);
>  	}
>  fail:
>  	btrfs_free_path(path);
> @@ -2730,7 +2729,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
>  	btrfs_mark_buffer_dirty(leaf);
>  	btrfs_free_path(path);
>  
> -	ret = update_block_group(root, ins->objectid, fs_info->nodesize,
> +	ret = update_block_group(fs_info, ins->objectid, fs_info->nodesize,
>  				 1, 0);
>  	return ret;
>  }
> @@ -3413,7 +3412,7 @@ int btrfs_update_block_group(struct btrfs_root *root,
>  			     u64 bytenr, u64 num_bytes, int alloc,
>  			     int mark_free)
>  {
> -	return update_block_group(root, bytenr, num_bytes,
> +	return update_block_group(root->fs_info, bytenr, num_bytes,
>  				  alloc, mark_free);
>  }
>  
> 


* Re: [PATCH 06/15] btrfs-progs: check: Drop trans/root arguments from free_extent_hook
  2018-06-08 12:47 ` [PATCH 06/15] btrfs-progs: check: Drop trans/root arguments from free_extent_hook Nikolay Borisov
@ 2018-06-11  4:55   ` Qu Wenruo
  2018-06-11  7:04     ` Nikolay Borisov
  0 siblings, 1 reply; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  4:55 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018-06-08 20:47, Nikolay Borisov wrote:
> They are not really needed, what free_extent_hook wants is really a
> pointer to fs_info so give it to it directly. This is in preparation
> of delayed refs code.

Looks good. Since free_extent_hook is only used by original mode and
doesn't involve any tree operations at all, it's a valid modification.

Although I can't really see the relationship with delayed refs yet,
hopefully I will find it when reviewing the rest of the patches.

> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  check/main.c  | 5 ++---
>  ctree.h       | 3 +--
>  extent-tree.c | 4 ++--
>  3 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index 9a1f238800b0..b84903acdb25 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -6234,8 +6234,7 @@ static int add_root_to_pending(struct extent_buffer *buf,
>   * we're tracking for repair.  This hook makes sure we
>   * remove any backrefs for blocks as we are fixing them.
>   */
> -static int free_extent_hook(struct btrfs_trans_handle *trans,
> -			    struct btrfs_root *root,
> +static int free_extent_hook(struct btrfs_fs_info *fs_info,
>  			    u64 bytenr, u64 num_bytes, u64 parent,
>  			    u64 root_objectid, u64 owner, u64 offset,
>  			    int refs_to_drop)
> @@ -6243,7 +6242,7 @@ static int free_extent_hook(struct btrfs_trans_handle *trans,
>  	struct extent_record *rec;
>  	struct cache_extent *cache;
>  	int is_data;
> -	struct cache_tree *extent_cache = root->fs_info->fsck_extent_cache;
> +	struct cache_tree *extent_cache = fs_info->fsck_extent_cache;
>  
>  	is_data = owner >= BTRFS_FIRST_FREE_OBJECTID;
>  	cache = lookup_cache_extent(extent_cache, bytenr, num_bytes);
> diff --git a/ctree.h b/ctree.h
> index 082726238b91..b30a946658ce 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -1143,8 +1143,7 @@ struct btrfs_fs_info {
>  
>  	int transaction_aborted;
>  
> -	int (*free_extent_hook)(struct btrfs_trans_handle *trans,
> -				struct btrfs_root *root,
> +	int (*free_extent_hook)(struct btrfs_fs_info *fs_info,
>  				u64 bytenr, u64 num_bytes, u64 parent,
>  				u64 root_objectid, u64 owner, u64 offset,
>  				int refs_to_drop);
> diff --git a/extent-tree.c b/extent-tree.c
> index 6e7a19323efc..9132cb3f8e15 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -2163,8 +2163,8 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>  	int skinny_metadata =
>  		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
>  
> -	if (root->fs_info->free_extent_hook) {
> -		root->fs_info->free_extent_hook(trans, root, bytenr, num_bytes,
> +	if (trans->fs_info->free_extent_hook) {
> +		trans->fs_info->free_extent_hook(trans->fs_info, bytenr, num_bytes,
>  						parent, root_objectid, owner_objectid,
>  						owner_offset, refs_to_drop);
>  
> 
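
As an illustration of why fs_info alone is enough for this hook, here
is a reduced sketch of a callback stored in fs_info and invoked with
fs_info as its context. The toy_* names, the shortened argument list
and the freed_bytes counter are hypothetical and not taken from
btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_fs_info;

typedef int (*toy_free_extent_hook_t)(struct toy_fs_info *fs_info,
				      uint64_t bytenr, uint64_t num_bytes);

struct toy_fs_info {
	toy_free_extent_hook_t free_extent_hook;	/* NULL when unset */
	uint64_t freed_bytes;	/* hypothetical state the hook updates */
};

/* Example hook: all the state it touches is reachable from fs_info. */
static int toy_count_freed(struct toy_fs_info *fs_info,
			   uint64_t bytenr, uint64_t num_bytes)
{
	(void)bytenr;
	fs_info->freed_bytes += num_bytes;
	return 0;
}

/* Caller side: only fs_info is needed to find and invoke the hook. */
static int toy_free_extent(struct toy_fs_info *fs_info,
			   uint64_t bytenr, uint64_t num_bytes)
{
	if (fs_info->free_extent_hook)
		return fs_info->free_extent_hook(fs_info, bytenr, num_bytes);
	return 0;
}
```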


* Re: [PATCH 07/15] btrfs-progs: Remove root argument from __free_extent
  2018-06-08 12:47 ` [PATCH 07/15] btrfs-progs: Remove root argument from __free_extent Nikolay Borisov
@ 2018-06-11  4:58   ` Qu Wenruo
  2018-06-11  7:06     ` Nikolay Borisov
  0 siblings, 1 reply; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  4:58 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018-06-08 20:47, Nikolay Borisov wrote:
> This argument is no longer used in this function so remove it.

The same concern about the aggressive removal of fs_info applies here.

I would completely accept it if it only converted root to fs_info, but
I'm still not 100% sure about removing it completely and relying on
trans to get fs_info.

Thanks,
Qu

> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  extent-tree.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/extent-tree.c b/extent-tree.c
> index 9132cb3f8e15..c16bd85e92be 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -50,7 +50,6 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
>  				     u64 flags, struct btrfs_disk_key *key,
>  				     int level, struct btrfs_key *ins);
>  static int __free_extent(struct btrfs_trans_handle *trans,
> -			 struct btrfs_root *root,
>  			 u64 bytenr, u64 num_bytes, u64 parent,
>  			 u64 root_objectid, u64 owner_objectid,
>  			 u64 owner_offset, int refs_to_drop);
> @@ -2141,7 +2140,6 @@ void btrfs_unpin_extent(struct btrfs_fs_info *fs_info,
>   * remove an extent from the root, returns 0 on success
>   */
>  static int __free_extent(struct btrfs_trans_handle *trans,
> -			 struct btrfs_root *root,
>  			 u64 bytenr, u64 num_bytes, u64 parent,
>  			 u64 root_objectid, u64 owner_objectid,
>  			 u64 owner_offset, int refs_to_drop)
> @@ -2149,7 +2147,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>  
>  	struct btrfs_key key;
>  	struct btrfs_path *path;
> -	struct btrfs_root *extent_root = root->fs_info->extent_root;
> +	struct btrfs_root *extent_root = trans->fs_info->extent_root;
>  	struct extent_buffer *leaf;
>  	struct btrfs_extent_item *ei;
>  	struct btrfs_extent_inline_ref *iref;
> @@ -2409,8 +2407,7 @@ static int del_pending_extents(struct btrfs_trans_handle *trans)
>  
>  		if (!test_range_bit(extent_ins, start, end,
>  				    EXTENT_LOCKED, 0)) {
> -			ret = __free_extent(trans, extent_root,
> -					    start, end + 1 - start, 0,
> +			ret = __free_extent(trans, start, end + 1 - start, 0,
>  					    extent_root->root_key.objectid,
>  					    extent_op->level, 0, 1);
>  			kfree(extent_op);
> 


* Re: [PATCH 09/15] btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent for btree blocks.
  2018-06-08 12:47 ` [PATCH 09/15] btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent for btree blocks Nikolay Borisov
@ 2018-06-11  5:05   ` Qu Wenruo
  0 siblings, 0 replies; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  5:05 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018-06-08 20:47, Nikolay Borisov wrote:
> Currently some instances of btrfs_free_extent are called with the
> last parameter ("offset") being set to 1. This makes no sense, since
> offset is used for data extents. I suspect this is a left-over from
> 95d3f20b51e9 ("Mixed back reference  (FORWARD ROLLING FORMAT CHANGE)")
> since this commit changed the signature of the function from :
> 
> -int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root
> -                     *root, u64 bytenr, u64 num_bytes, u64 parent,
> -                     u64 root_objectid, u64 ref_generation,
> -                     u64 owner_objectid, int pin);
> 
> to
> 
> +int btrfs_free_extent(struct btrfs_trans_handle *trans,
> +                     struct btrfs_root *root,
> +                     u64 bytenr, u64 num_bytes, u64 parent,
> +                     u64 root_objectid, u64 owner, u64 offset);
> 
> I.e the last parameter was "pin" and not offset. So these are just
> leftovers with no semantic meaning. Fix this by passing 0.

And indeed, for tree blocks the @offset parameter is not used at all.

The call sites involving the offset are:
btrfs_free_extent()
|- __free_extent()
   |- lookup_extent_backref()
      |- lookup_inline_extent_backref() <<<
      |- lookup_extent_data_ref()       <<<

And in lookup_inline_extent_backref() we won't use @offset for tree
blocks anyway.

So the 1 passed as @offset should be a left-over.

> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  ctree.c       | 4 ++--
>  extent-tree.c | 6 +++---
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/ctree.c b/ctree.c
> index 8f3338b4693a..d8a6883aa85f 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -334,7 +334,7 @@ int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>  		WARN_ON(btrfs_header_generation(parent) != trans->transid);
>  
>  		btrfs_free_extent(trans, root, buf->start, buf->len,
> -				  0, root->root_key.objectid, level, 1);
> +				  0, root->root_key.objectid, level, 0);
>  	}
>  	if (!list_empty(&buf->recow)) {
>  		list_del_init(&buf->recow);
> @@ -738,7 +738,7 @@ static int balance_level(struct btrfs_trans_handle *trans,
>  
>  		ret = btrfs_free_extent(trans, root, mid->start, mid->len,
>  					0, root->root_key.objectid,
> -					level, 1);
> +					level, 0);
>  		/* once for the root ptr */
>  		free_extent_buffer(mid);
>  		return ret;
> diff --git a/extent-tree.c b/extent-tree.c
> index 079204ed290f..ab57c20d9dee 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -2961,7 +2961,7 @@ static int noinline walk_down_tree(struct btrfs_trans_handle *trans,
>  			path->slots[*level]++;
>  			ret = btrfs_free_extent(trans, root, bytenr, blocksize,
>  						parent->start, root_owner,
> -						root_gen, *level - 1, 1);
> +						root_gen, *level - 1, 0);
>  			BUG_ON(ret);
>  			continue;
>  		}
> @@ -3003,7 +3003,7 @@ static int noinline walk_down_tree(struct btrfs_trans_handle *trans,
>  	root_gen = btrfs_header_generation(parent);
>  	ret = btrfs_free_extent(trans, root, path->nodes[*level]->start,
>  				path->nodes[*level]->len, parent->start,
> -				root_owner, root_gen, *level, 1);
> +				root_owner, root_gen, *level, 0);
>  	free_extent_buffer(path->nodes[*level]);
>  	path->nodes[*level] = NULL;
>  	*level += 1;
> @@ -3054,7 +3054,7 @@ static int noinline walk_up_tree(struct btrfs_trans_handle *trans,
>  						path->nodes[*level]->start,
>  						path->nodes[*level]->len,
>  						parent->start, root_owner,
> -						root_gen, *level, 1);
> +						root_gen, *level, 0);
>  			BUG_ON(ret);
>  			free_extent_buffer(path->nodes[*level]);
>  			path->nodes[*level] = NULL;
> 
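
The point that @offset carries no meaning for tree blocks can be shown
with a simplified matching rule. This is a sketch and not the real
lookup_inline_extent_backref() logic: struct toy_backref_key and the
toy_* helpers are hypothetical, and only the data/metadata split on
BTRFS_FIRST_FREE_OBJECTID (256) mirrors the is_data test used in
btrfs-progs.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as BTRFS_FIRST_FREE_OBJECTID; smaller owners are tree levels. */
#define TOY_FIRST_FREE_OBJECTID 256ULL

struct toy_backref_key {
	uint64_t root_objectid;
	uint64_t owner;		/* tree level for tree blocks, inode for data */
	uint64_t offset;	/* meaningful for data extents only */
};

static int toy_is_data(uint64_t owner)
{
	return owner >= TOY_FIRST_FREE_OBJECTID;
}

/* Compare two backref descriptions; offset matters only for data. */
static int toy_backref_match(const struct toy_backref_key *a,
			     const struct toy_backref_key *b)
{
	if (a->root_objectid != b->root_objectid || a->owner != b->owner)
		return 0;
	if (!toy_is_data(a->owner))
		return 1;	/* tree block: offset is never consulted */
	return a->offset == b->offset;
}
```

In this model a tree block described with offset 1 matches the same
block described with offset 0, which is why replacing the stray 1 with
0 is not expected to change behaviour.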


* Re: [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure
  2018-06-08 12:47 ` [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure Nikolay Borisov
  2018-06-08 14:53   ` [PATCH 11/15 v2] " Nikolay Borisov
@ 2018-06-11  5:20   ` Qu Wenruo
  2018-06-11  7:10     ` Nikolay Borisov
  2018-07-30  8:34   ` Misono Tomohiro
  2 siblings, 1 reply; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  5:20 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018年06月08日 20:47, Nikolay Borisov wrote:
> This commit pulls those portions of the kernel implementation of
> delayed refs which are necessary to have them working in user-space.
> I've done the following modifications:
> 
> 1. Replaced all kmem_cache_alloc calls to kmalloc.
> 
> 2. Removed all locking-related code, since we are single threaded in
> userspace.
> 
> 3. Removed code which deals with data refs - delayed refs in user space
> are going to be used only for cowonly trees.

That's pretty good, although still some data ref related
structures/functions are left.

> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  Makefile      |   3 +-
>  ctree.h       |   3 +
>  delayed-ref.c | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  delayed-ref.h | 225 ++++++++++++++++++++++
>  extent-tree.c | 228 ++++++++++++++++++++++
>  kerncompat.h  |   8 +
>  transaction.h |   4 +
>  7 files changed, 1078 insertions(+), 1 deletion(-)
>  create mode 100644 delayed-ref.c
>  create mode 100644 delayed-ref.h
> 
> diff --git a/Makefile b/Makefile
> index 544410e6440c..9508ad4f11e6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
>  	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
>  	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
>  	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
> -	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
> +	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
> +	  delayed-ref.o
>  cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
>  	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
>  	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
> diff --git a/ctree.h b/ctree.h
> index b30a946658ce..d1ea45571d1e 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2812,4 +2812,7 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
>  int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
>  		    char *dest);
>  
> +
> +/* extent-tree.c */
> +int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr);
>  #endif
> diff --git a/delayed-ref.c b/delayed-ref.c
> new file mode 100644
> index 000000000000..f3fa50239380
> --- /dev/null
> +++ b/delayed-ref.c
> @@ -0,0 +1,608 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2009 Oracle.  All rights reserved.
> + */
> +
> +#include "ctree.h"
> +#include "btrfs-list.h"
> +#include "delayed-ref.h"
> +#include "transaction.h"
> +
> +/*
> + * delayed back reference update tracking.  For subvolume trees
> + * we queue up extent allocations and backref maintenance for
> + * delayed processing.   This avoids deep call chains where we
> + * add extents in the middle of btrfs_search_slot, and it allows
> + * us to buffer up frequently modified backrefs in an rb tree instead
> + * of hammering updates on the extent allocation tree.
> + */

A little more explanation of how delayed refs work would be appreciated.
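
As a rough illustration of the idea in the quoted comment (not the code from
this patch), reference count changes can be queued per extent in memory and
applied in one pass later. Every name below (demo_ref_head, demo_delayed_root,
demo_queue_ref_mod, demo_run_delayed_refs) is hypothetical, and a plain linked
list stands in for the rb tree:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a delayed ref head: one entry
 * per extent, accumulating the net reference count change. */
struct demo_ref_head {
	uint64_t bytenr;
	int ref_mod;			/* net +/- refs queued so far */
	struct demo_ref_head *next;
};

/* Hypothetical stand-in for the per-transaction delayed ref root. */
struct demo_delayed_root {
	struct demo_ref_head *heads;
	unsigned long num_heads;
};

/* Queue a ref change; changes to the same extent are merged in memory. */
static int demo_queue_ref_mod(struct demo_delayed_root *root,
			      uint64_t bytenr, int mod)
{
	struct demo_ref_head *head;

	for (head = root->heads; head; head = head->next) {
		if (head->bytenr == bytenr) {
			head->ref_mod += mod;
			return 0;
		}
	}

	head = malloc(sizeof(*head));
	if (!head)
		return -1;
	head->bytenr = bytenr;
	head->ref_mod = mod;
	head->next = root->heads;
	root->heads = head;
	root->num_heads++;
	return 0;
}

/* Process up to nr queued heads (0 means all). Returns how many heads
 * had a non-zero net change, i.e. would need a real extent tree update. */
static unsigned long demo_run_delayed_refs(struct demo_delayed_root *root,
					   unsigned long nr)
{
	unsigned long processed = 0, updates = 0;

	while (root->heads && (nr == 0 || processed < nr)) {
		struct demo_ref_head *head = root->heads;

		assert(root->num_heads > 0);
		root->heads = head->next;
		root->num_heads--;
		if (head->ref_mod != 0)
			updates++;	/* real code would modify the extent tree here */
		free(head);
		processed++;
	}
	return updates;
}
```

In this sketch an add followed by a drop of the same extent cancels out in
memory, which is the "buffer up frequently modified backrefs" benefit the
comment describes.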

[snip]
> +struct btrfs_delayed_tree_ref {
> +	struct btrfs_delayed_ref_node node;
> +	u64 root;
> +	u64 parent;
> +	int level;
> +};
> +
> +struct btrfs_delayed_data_ref {
> +	struct btrfs_delayed_ref_node node;
> +	u64 root;
> +	u64 parent;
> +	u64 objectid;
> +	u64 offset;
> +};

Since we don't use this structure and don't support data refs yet, what
about just removing this definition?

[snip]

> +struct btrfs_delayed_ref_head *
> +btrfs_select_ref_head(struct btrfs_trans_handle *trans);
> +
> +/*
> + * helper functions to cast a node into its container
> + */
> +static inline struct btrfs_delayed_tree_ref *
> +btrfs_delayed_node_to_tree_ref(struct btrfs_delayed_ref_node *node)
> +{
> +	return container_of(node, struct btrfs_delayed_tree_ref, node);
> +}
> +
> +static inline struct btrfs_delayed_data_ref *
> +btrfs_delayed_node_to_data_ref(struct btrfs_delayed_ref_node *node)
> +{
> +	return container_of(node, struct btrfs_delayed_data_ref, node);
> +}

This helper is the only user of the btrfs_delayed_data_ref structure, so it
could be removed as well.
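
For readers unfamiliar with the pattern these helpers rely on, here is a
minimal sketch of casting an embedded node back to its container. The macro and
struct names (demo_container_of, demo_ref_node, demo_tree_ref) are hypothetical
and the fields are simplified; this is not the definition used in btrfs-progs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace equivalent of container_of: step back from a
 * member pointer to the start of the structure that embeds it. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for the common delayed ref node. */
struct demo_ref_node {
	uint64_t bytenr;
	int ref_mod;
};

/* Simplified stand-in for a tree ref embedding the common node. */
struct demo_tree_ref {
	struct demo_ref_node node;
	uint64_t root;
	uint64_t parent;
	int level;
};

/* Cast a generic node into the tree ref that contains it. */
static inline struct demo_tree_ref *
demo_node_to_tree_ref(struct demo_ref_node *node)
{
	return demo_container_of(node, struct demo_tree_ref, node);
}

/* Example accessor that only receives the generic node. */
static inline int demo_tree_ref_level(struct demo_ref_node *node)
{
	assert(node != NULL);
	return demo_node_to_tree_ref(node)->level;
}
```

The cast is only valid when the node really is embedded in the named container
type, which is why a helper whose container type has no users is dead code.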

Thanks,
Qu

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums
  2018-06-11  4:46   ` Qu Wenruo
@ 2018-06-11  7:02     ` Nikolay Borisov
  2018-06-11  7:40       ` Qu Wenruo
  0 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-11  7:02 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 11.06.2018 07:46, Qu Wenruo wrote:
> 
> 
> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>> It's not needed, since we can obtain a reference to fs_info from the
>> passed transaction handle. This is needed by delayed refs code.
> 
> This looks a little too aggressive to me.
> 
> Normally we would expect parameters like @trans then @fs_info.
> Although @trans only let us get @fs_info, under certain case @trans
> could be NULL (like btrfs_search_slot).
> 
> Although for btrfs_del_csums() @tran can be NULL, I still prefer the
> @trans, @fs_info parameters.

The reason I'm making those prep patches is because I would like to get
rid of the root argument from __free_extent, since if you look at the
later delayed refs patches - patch 12/15 you'd see that the adapter
function __free_extent2 doesn't really have a root pointer as an
argument. If you also take a look at 10/15 (add delayed refs
infrastructure), the functions which create the delayed refs also don't
take a struct btrfs_root as an argument.

Alternatively I can perhaps wire up a call to btrfs_read_fs_root in
__free_extent which will be using the "ref->root" id to obtain struct
btrfs_root. However, due to the way FST fixup is implemented (I still
haven't sent that code) it's possible that we call __free_extent2 with
the freespace root set to 0 so if btrfs_read_fs_root is called it will
return an -ENOENT.

In conclusion what you suggest *could* be done but it will require
re-engineering the currently-tested FST code + delayed refs code. Given
the purpose of the root argument in this callchain I'd rather eliminate
it altogether and not bother with it.
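
To make the shape of the change concrete, here is a minimal sketch of a helper
that reaches everything it needs through the transaction handle alone. The
types and the function (csum_demo_fs_info, csum_demo_trans,
csum_demo_bytes_for_range) are hypothetical simplifications, and the assertion
on a NULL handle is an assumption added in response to the concern raised in
review, not something the patch does:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct csum_demo_fs_info {
	uint32_t sectorsize;
	uint16_t csum_size;
};

struct csum_demo_trans {
	struct csum_demo_fs_info *fs_info;
};

/* Bytes of checksum data covering len bytes of a sector-aligned range.
 * No root argument: fs_info is derived from the transaction handle. */
static uint64_t csum_demo_bytes_for_range(struct csum_demo_trans *trans,
					  uint64_t len)
{
	struct csum_demo_fs_info *fs_info;

	/* Assumption: callers always pass a valid, running transaction. */
	assert(trans && trans->fs_info);
	fs_info = trans->fs_info;
	return (len / fs_info->sectorsize) * fs_info->csum_size;
}
```

The design trade-off is the one discussed in this thread: the signature gets
shorter, but the function can no longer be called without a transaction.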

> 
> Thanks,
> Qu
> 
> 
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  btrfs-corrupt-block.c |  2 +-
>>  ctree.h               |  3 +--
>>  extent-tree.c         |  2 +-
>>  file-item.c           | 20 ++++++++++----------
>>  4 files changed, 13 insertions(+), 14 deletions(-)
>>
>> diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
>> index 4fbea26cda20..3add8e63b7bb 100644
>> --- a/btrfs-corrupt-block.c
>> +++ b/btrfs-corrupt-block.c
>> @@ -926,7 +926,7 @@ static int delete_csum(struct btrfs_root *root, u64 bytenr, u64 bytes)
>>  		return PTR_ERR(trans);
>>  	}
>>  
>> -	ret = btrfs_del_csums(trans, root, bytenr, bytes);
>> +	ret = btrfs_del_csums(trans, bytenr, bytes);
>>  	if (ret)
>>  		fprintf(stderr, "Error deleting csums %d\n", ret);
>>  	btrfs_commit_transaction(trans, root);
>> diff --git a/ctree.h b/ctree.h
>> index de4b1b7e6416..082726238b91 100644
>> --- a/ctree.h
>> +++ b/ctree.h
>> @@ -2752,8 +2752,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>>  			u64 ino, u64 parent_ino, u64 *index);
>>  
>>  /* file-item.c */
>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>> -		    struct btrfs_root *root, u64 bytenr, u64 len);
>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len);
>>  int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
>>  			     struct btrfs_root *root,
>>  			     u64 objectid, u64 pos, u64 offset,
>> diff --git a/extent-tree.c b/extent-tree.c
>> index cbc022f6cef6..c6f09b52800f 100644
>> --- a/extent-tree.c
>> +++ b/extent-tree.c
>> @@ -2372,7 +2372,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>  		btrfs_release_path(path);
>>  
>>  		if (is_data) {
>> -			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
>> +			ret = btrfs_del_csums(trans, bytenr, num_bytes);
>>  			BUG_ON(ret);
>>  		}
>>  
>> diff --git a/file-item.c b/file-item.c
>> index 7b0ff3585509..71d4e89f78d1 100644
>> --- a/file-item.c
>> +++ b/file-item.c
>> @@ -394,8 +394,7 @@ static noinline int truncate_one_csum(struct btrfs_root *root,
>>   * deletes the csum items from the csum tree for a given
>>   * range of bytes.
>>   */
>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>> -		    struct btrfs_root *root, u64 bytenr, u64 len)
>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len)
>>  {
>>  	struct btrfs_path *path;
>>  	struct btrfs_key key;
>> @@ -403,11 +402,10 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>  	u64 csum_end;
>>  	struct extent_buffer *leaf;
>>  	int ret;
>> -	u16 csum_size =
>> -		btrfs_super_csum_size(root->fs_info->super_copy);
>> -	int blocksize = root->fs_info->sectorsize;
>> +	u16 csum_size = btrfs_super_csum_size(trans->fs_info->super_copy);
>> +	int blocksize = trans->fs_info->sectorsize;
>> +	struct btrfs_root *csum_root = trans->fs_info->csum_root;
>>  
>> -	root = root->fs_info->csum_root;
>>  
>>  	path = btrfs_alloc_path();
>>  	if (!path)
>> @@ -418,7 +416,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>  		key.offset = end_byte - 1;
>>  		key.type = BTRFS_EXTENT_CSUM_KEY;
>>  
>> -		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>> +		ret = btrfs_search_slot(trans, csum_root, &key, path, -1, 1);
>>  		if (ret > 0) {
>>  			if (path->slots[0] == 0)
>>  				goto out;
>> @@ -445,7 +443,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>  
>>  		/* delete the entire item, it is inside our range */
>>  		if (key.offset >= bytenr && csum_end <= end_byte) {
>> -			ret = btrfs_del_item(trans, root, path);
>> +			ret = btrfs_del_item(trans, csum_root, path);
>>  			BUG_ON(ret);
>>  		} else if (key.offset < bytenr && csum_end > end_byte) {
>>  			unsigned long offset;
>> @@ -485,12 +483,14 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>  			 * btrfs_split_item returns -EAGAIN when the
>>  			 * item changed size or key
>>  			 */
>> -			ret = btrfs_split_item(trans, root, path, &key, offset);
>> +			ret = btrfs_split_item(trans, csum_root, path, &key,
>> +					       offset);
>>  			BUG_ON(ret && ret != -EAGAIN);
>>  
>>  			key.offset = end_byte - 1;
>>  		} else {
>> -			ret = truncate_one_csum(root, path, &key, bytenr, len);
>> +			ret = truncate_one_csum(csum_root, path, &key, bytenr,
>> +						len);
>>  			BUG_ON(ret);
>>  		}
>>  		btrfs_release_path(path);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 06/15] btrfs-progs: check: Drop trans/root arguments from free_extent_hook
  2018-06-11  4:55   ` Qu Wenruo
@ 2018-06-11  7:04     ` Nikolay Borisov
  0 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-11  7:04 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 11.06.2018 07:55, Qu Wenruo wrote:
> 
> 
> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>> They are not really needed, what free_extent_hook wants is really a
>> pointer to fs_info so give it to it directly. This is in preparation
>> of delayed refs code.
> 
> Looks good, since free_extent_hook is only used by original mode and it
> doesn't involve any tree operation at all, it's a valid modification.
> 
> Although I can't really see the relationship with delayed refs, hopes I
> could find it out when reviewing the rest patches.

I guess I failed to explain that in the cover letter. For more details
see my previous reply to your comments regarding 2/15. In short: I
wanted to remove the root argument from __free_extent, and in order to do
that I first had to remove it from its callees.
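
A minimal sketch of that dependency, with hypothetical names (hook_demo_fs_info,
hook_demo_trans, hook_demo_count, hook_demo_free_extent) and a reduced argument
list: once the hook takes fs_info directly, the caller needs nothing beyond the
transaction handle to invoke it.

```c
#include <assert.h>
#include <stdint.h>

struct hook_demo_fs_info;

/* Hypothetical hook type: takes fs_info directly, no trans or root. */
typedef int (*hook_demo_free_extent_hook_t)(struct hook_demo_fs_info *fs_info,
					     uint64_t bytenr,
					     uint64_t num_bytes);

struct hook_demo_fs_info {
	hook_demo_free_extent_hook_t free_extent_hook;
	unsigned long hook_calls;	/* bookkeeping for this example only */
};

struct hook_demo_trans {
	struct hook_demo_fs_info *fs_info;
};

/* Example hook: just counts how often it was invoked. */
static int hook_demo_count(struct hook_demo_fs_info *fs_info,
			   uint64_t bytenr, uint64_t num_bytes)
{
	(void)bytenr;
	(void)num_bytes;
	fs_info->hook_calls++;
	return 0;
}

/* The caller has no root argument; the hook is reached via trans->fs_info. */
static int hook_demo_free_extent(struct hook_demo_trans *trans,
				 uint64_t bytenr, uint64_t num_bytes)
{
	struct hook_demo_fs_info *fs_info;

	assert(trans && trans->fs_info);
	fs_info = trans->fs_info;
	if (fs_info->free_extent_hook)
		return fs_info->free_extent_hook(fs_info, bytenr, num_bytes);
	return 0;
}
```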

> 
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> Thanks,
> Qu
> 
>> ---
>>  check/main.c  | 5 ++---
>>  ctree.h       | 3 +--
>>  extent-tree.c | 4 ++--
>>  3 files changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 9a1f238800b0..b84903acdb25 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -6234,8 +6234,7 @@ static int add_root_to_pending(struct extent_buffer *buf,
>>   * we're tracking for repair.  This hook makes sure we
>>   * remove any backrefs for blocks as we are fixing them.
>>   */
>> -static int free_extent_hook(struct btrfs_trans_handle *trans,
>> -			    struct btrfs_root *root,
>> +static int free_extent_hook(struct btrfs_fs_info *fs_info,
>>  			    u64 bytenr, u64 num_bytes, u64 parent,
>>  			    u64 root_objectid, u64 owner, u64 offset,
>>  			    int refs_to_drop)
>> @@ -6243,7 +6242,7 @@ static int free_extent_hook(struct btrfs_trans_handle *trans,
>>  	struct extent_record *rec;
>>  	struct cache_extent *cache;
>>  	int is_data;
>> -	struct cache_tree *extent_cache = root->fs_info->fsck_extent_cache;
>> +	struct cache_tree *extent_cache = fs_info->fsck_extent_cache;
>>  
>>  	is_data = owner >= BTRFS_FIRST_FREE_OBJECTID;
>>  	cache = lookup_cache_extent(extent_cache, bytenr, num_bytes);
>> diff --git a/ctree.h b/ctree.h
>> index 082726238b91..b30a946658ce 100644
>> --- a/ctree.h
>> +++ b/ctree.h
>> @@ -1143,8 +1143,7 @@ struct btrfs_fs_info {
>>  
>>  	int transaction_aborted;
>>  
>> -	int (*free_extent_hook)(struct btrfs_trans_handle *trans,
>> -				struct btrfs_root *root,
>> +	int (*free_extent_hook)(struct btrfs_fs_info *fs_info,
>>  				u64 bytenr, u64 num_bytes, u64 parent,
>>  				u64 root_objectid, u64 owner, u64 offset,
>>  				int refs_to_drop);
>> diff --git a/extent-tree.c b/extent-tree.c
>> index 6e7a19323efc..9132cb3f8e15 100644
>> --- a/extent-tree.c
>> +++ b/extent-tree.c
>> @@ -2163,8 +2163,8 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>  	int skinny_metadata =
>>  		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
>>  
>> -	if (root->fs_info->free_extent_hook) {
>> -		root->fs_info->free_extent_hook(trans, root, bytenr, num_bytes,
>> +	if (trans->fs_info->free_extent_hook) {
>> +		trans->fs_info->free_extent_hook(trans->fs_info, bytenr, num_bytes,
>>  						parent, root_objectid, owner_objectid,
>>  						owner_offset, refs_to_drop);
>>  
>>
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 07/15] btrfs-progs: Remove root argument from __free_extent
  2018-06-11  4:58   ` Qu Wenruo
@ 2018-06-11  7:06     ` Nikolay Borisov
  0 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-11  7:06 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 11.06.2018 07:58, Qu Wenruo wrote:
> 
> 
> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>> This argument is no longer used in this function so remove it.
> 
> The same concern about the aggressive removal of fs_info.
> 
> I would completely accept if it's only convert root to fs_info, but
> removing it completely and rely on trans to get fs_info, I'm still not
> 100% sure.

Freeing an extent is now related to running delayed refs, which are
keyed off a valid transaction and a valid transaction must have access
to fs_info so I think it's fine.

> 
> Thanks,
> Qu
> 
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  extent-tree.c | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/extent-tree.c b/extent-tree.c
>> index 9132cb3f8e15..c16bd85e92be 100644
>> --- a/extent-tree.c
>> +++ b/extent-tree.c
>> @@ -50,7 +50,6 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
>>  				     u64 flags, struct btrfs_disk_key *key,
>>  				     int level, struct btrfs_key *ins);
>>  static int __free_extent(struct btrfs_trans_handle *trans,
>> -			 struct btrfs_root *root,
>>  			 u64 bytenr, u64 num_bytes, u64 parent,
>>  			 u64 root_objectid, u64 owner_objectid,
>>  			 u64 owner_offset, int refs_to_drop);
>> @@ -2141,7 +2140,6 @@ void btrfs_unpin_extent(struct btrfs_fs_info *fs_info,
>>   * remove an extent from the root, returns 0 on success
>>   */
>>  static int __free_extent(struct btrfs_trans_handle *trans,
>> -			 struct btrfs_root *root,
>>  			 u64 bytenr, u64 num_bytes, u64 parent,
>>  			 u64 root_objectid, u64 owner_objectid,
>>  			 u64 owner_offset, int refs_to_drop)
>> @@ -2149,7 +2147,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>  
>>  	struct btrfs_key key;
>>  	struct btrfs_path *path;
>> -	struct btrfs_root *extent_root = root->fs_info->extent_root;
>> +	struct btrfs_root *extent_root = trans->fs_info->extent_root;
>>  	struct extent_buffer *leaf;
>>  	struct btrfs_extent_item *ei;
>>  	struct btrfs_extent_inline_ref *iref;
>> @@ -2409,8 +2407,7 @@ static int del_pending_extents(struct btrfs_trans_handle *trans)
>>  
>>  		if (!test_range_bit(extent_ins, start, end,
>>  				    EXTENT_LOCKED, 0)) {
>> -			ret = __free_extent(trans, extent_root,
>> -					    start, end + 1 - start, 0,
>> +			ret = __free_extent(trans, start, end + 1 - start, 0,
>>  					    extent_root->root_key.objectid,
>>  					    extent_op->level, 0, 1);
>>  			kfree(extent_op);
>>
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure
  2018-06-11  5:20   ` [PATCH 11/15] " Qu Wenruo
@ 2018-06-11  7:10     ` Nikolay Borisov
  2018-06-11  7:46       ` Qu Wenruo
  0 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-11  7:10 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 11.06.2018 08:20, Qu Wenruo wrote:
> 
> 
> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>> This commit pulls those portions of the kernel implementation of
>> delayed refs which are necessary to have them working in user-space.
>> I've done the following modifications:
>>
>> 1. Replaced all kmem_cache_alloc calls to kmalloc.
>>
>> 2. Removed all locking-related code, since we are single threaded in
>> userspace.
>>
>> 3. Removed code which deals with data refs - delayed refs in user space
>> are going to be used only for cowonly trees.
> 
> That's pretty good, although still some data ref related
> structures/functions are left.
> 
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  Makefile      |   3 +-
>>  ctree.h       |   3 +
>>  delayed-ref.c | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  delayed-ref.h | 225 ++++++++++++++++++++++
>>  extent-tree.c | 228 ++++++++++++++++++++++
>>  kerncompat.h  |   8 +
>>  transaction.h |   4 +
>>  7 files changed, 1078 insertions(+), 1 deletion(-)
>>  create mode 100644 delayed-ref.c
>>  create mode 100644 delayed-ref.h
>>
>> diff --git a/Makefile b/Makefile
>> index 544410e6440c..9508ad4f11e6 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
>>  	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
>>  	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
>>  	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
>> -	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
>> +	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
>> +	  delayed-ref.o
>>  cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
>>  	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
>>  	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
>> diff --git a/ctree.h b/ctree.h
>> index b30a946658ce..d1ea45571d1e 100644
>> --- a/ctree.h
>> +++ b/ctree.h
>> @@ -2812,4 +2812,7 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
>>  int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
>>  		    char *dest);
>>  
>> +
>> +/* extent-tree.c */
>> +int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr);
>>  #endif
>> diff --git a/delayed-ref.c b/delayed-ref.c
>> new file mode 100644
>> index 000000000000..f3fa50239380
>> --- /dev/null
>> +++ b/delayed-ref.c
>> @@ -0,0 +1,608 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2009 Oracle.  All rights reserved.
>> + */
>> +
>> +#include "ctree.h"
>> +#include "btrfs-list.h"
>> +#include "delayed-ref.h"
>> +#include "transaction.h"
>> +
>> +/*
>> + * delayed back reference update tracking.  For subvolume trees
>> + * we queue up extent allocations and backref maintenance for
>> + * delayed processing.   This avoids deep call chains where we
>> + * add extents in the middle of btrfs_search_slot, and it allows
>> + * us to buffer up frequently modified backrefs in an rb tree instead
>> + * of hammering updates on the extent allocation tree.
>> + */
> 
> A little more explanation of how delayed refs work would be appreciated.

I just copy/pasted that from the kernel code. TBH I'm not familiar enough
with the backref lookup code to write something, but I guess I can take a
look at it and perhaps add information to the btrfs-devs-docs. For now
I'm confident in my understanding of the delayed allocation/freeing logic.

> 
> [snip]
>> +struct btrfs_delayed_tree_ref {
>> +	struct btrfs_delayed_ref_node node;
>> +	u64 root;
>> +	u64 parent;
>> +	int level;
>> +};
>> +
>> +struct btrfs_delayed_data_ref {
>> +	struct btrfs_delayed_ref_node node;
>> +	u64 root;
>> +	u64 parent;
>> +	u64 objectid;
>> +	u64 offset;
>> +};
> 
> Since we don't use this structure and don't support data refs yet, what
> about just removing this definition?

Saw that too and immediately sent v2 :)

> 
> [snip]
> 
>> +struct btrfs_delayed_ref_head *
>> +btrfs_select_ref_head(struct btrfs_trans_handle *trans);
>> +
>> +/*
>> + * helper functions to cast a node into its container
>> + */
>> +static inline struct btrfs_delayed_tree_ref *
>> +btrfs_delayed_node_to_tree_ref(struct btrfs_delayed_ref_node *node)
>> +{
>> +	return container_of(node, struct btrfs_delayed_tree_ref, node);
>> +}
>> +
>> +static inline struct btrfs_delayed_data_ref *
>> +btrfs_delayed_node_to_data_ref(struct btrfs_delayed_ref_node *node)
>> +{
>> +	return container_of(node, struct btrfs_delayed_data_ref, node);
>> +}
> 
> This helper is the only user of the btrfs_delayed_data_ref structure, so it
> could be removed as well.

Fixed in v2
> 
> Thanks,
> Qu
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums
  2018-06-11  7:02     ` Nikolay Borisov
@ 2018-06-11  7:40       ` Qu Wenruo
  2018-06-11  7:48         ` Nikolay Borisov
  0 siblings, 1 reply; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  7:40 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018年06月11日 15:02, Nikolay Borisov wrote:
> 
> 
> On 11.06.2018 07:46, Qu Wenruo wrote:
>>
>>
>> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>>> It's not needed, since we can obtain a reference to fs_info from the
>>> passed transaction handle. This is needed by delayed refs code.
>>
>> This looks a little too aggressive to me.
>>
>> Normally we would expect parameters like @trans then @fs_info.
>> Although @trans only let us get @fs_info, under certain case @trans
>> could be NULL (like btrfs_search_slot).
>>
>> Although for btrfs_del_csums() @tran can be NULL, I still prefer the
>> @trans, @fs_info parameters.
> 
> The reason I'm making those prep patches is because I would like to get
> rid of the root argument from __free_extent,

For this part, I completely agree with you.

The root doesn't make sense, getting rid of it is completely fine.
The abuse of btrfs_root as parameter should be stopped, and we have tons
of such cleanup both in kernel and btrfs-progs.

The only concern is about a possibly NULL @trans parameter.
However, it could easily be addressed by adding the 'restrict' prefix.

So just add 'restrict', and then the whole cleanup part is fine with me.

Thanks,
Qu

> since if you look at the
> later delayed refs patches - patch 12/15 you'd see that the adapter
> function __free_extent2 doesn't really have a root pointer as an
> argument. If you also take a look at 10/15 (add delayed refs
> infrastructure), the functions which create the delayed refs also don't
> take a struct btrfs_root as an argument.
> 
> Alternatively I can perhaps wire up a call to btrfs_read_fs_root in
> __free_extent which will be using the "ref->root" id to obtain struct
> btrfs_root. However, due to the way FST fixup is implemented (I still
> haven't sent that code) it's possible that we call __free_extent2 with
> the freespace root set to 0 so if btrfs_read_fs_root is called it will
> return an -ENOENT.
> 
> In conclusion what you suggest *could* be done but it will require
> re-engineering the currently-tested FST code + delayed refs code. Given
> the purpose of the root argument in this callchain I'd rather eliminate
> it altogether and not bother with it.
> 
>>
>> Thanks,
>> Qu
>>
>>
>>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>  btrfs-corrupt-block.c |  2 +-
>>>  ctree.h               |  3 +--
>>>  extent-tree.c         |  2 +-
>>>  file-item.c           | 20 ++++++++++----------
>>>  4 files changed, 13 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
>>> index 4fbea26cda20..3add8e63b7bb 100644
>>> --- a/btrfs-corrupt-block.c
>>> +++ b/btrfs-corrupt-block.c
>>> @@ -926,7 +926,7 @@ static int delete_csum(struct btrfs_root *root, u64 bytenr, u64 bytes)
>>>  		return PTR_ERR(trans);
>>>  	}
>>>  
>>> -	ret = btrfs_del_csums(trans, root, bytenr, bytes);
>>> +	ret = btrfs_del_csums(trans, bytenr, bytes);
>>>  	if (ret)
>>>  		fprintf(stderr, "Error deleting csums %d\n", ret);
>>>  	btrfs_commit_transaction(trans, root);
>>> diff --git a/ctree.h b/ctree.h
>>> index de4b1b7e6416..082726238b91 100644
>>> --- a/ctree.h
>>> +++ b/ctree.h
>>> @@ -2752,8 +2752,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>>>  			u64 ino, u64 parent_ino, u64 *index);
>>>  
>>>  /* file-item.c */
>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>> -		    struct btrfs_root *root, u64 bytenr, u64 len);
>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len);
>>>  int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
>>>  			     struct btrfs_root *root,
>>>  			     u64 objectid, u64 pos, u64 offset,
>>> diff --git a/extent-tree.c b/extent-tree.c
>>> index cbc022f6cef6..c6f09b52800f 100644
>>> --- a/extent-tree.c
>>> +++ b/extent-tree.c
>>> @@ -2372,7 +2372,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>>  		btrfs_release_path(path);
>>>  
>>>  		if (is_data) {
>>> -			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
>>> +			ret = btrfs_del_csums(trans, bytenr, num_bytes);
>>>  			BUG_ON(ret);
>>>  		}
>>>  
>>> diff --git a/file-item.c b/file-item.c
>>> index 7b0ff3585509..71d4e89f78d1 100644
>>> --- a/file-item.c
>>> +++ b/file-item.c
>>> @@ -394,8 +394,7 @@ static noinline int truncate_one_csum(struct btrfs_root *root,
>>>   * deletes the csum items from the csum tree for a given
>>>   * range of bytes.
>>>   */
>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>> -		    struct btrfs_root *root, u64 bytenr, u64 len)
>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len)
>>>  {
>>>  	struct btrfs_path *path;
>>>  	struct btrfs_key key;
>>> @@ -403,11 +402,10 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>  	u64 csum_end;
>>>  	struct extent_buffer *leaf;
>>>  	int ret;
>>> -	u16 csum_size =
>>> -		btrfs_super_csum_size(root->fs_info->super_copy);
>>> -	int blocksize = root->fs_info->sectorsize;
>>> +	u16 csum_size = btrfs_super_csum_size(trans->fs_info->super_copy);
>>> +	int blocksize = trans->fs_info->sectorsize;
>>> +	struct btrfs_root *csum_root = trans->fs_info->csum_root;
>>>  
>>> -	root = root->fs_info->csum_root;
>>>  
>>>  	path = btrfs_alloc_path();
>>>  	if (!path)
>>> @@ -418,7 +416,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>  		key.offset = end_byte - 1;
>>>  		key.type = BTRFS_EXTENT_CSUM_KEY;
>>>  
>>> -		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>>> +		ret = btrfs_search_slot(trans, csum_root, &key, path, -1, 1);
>>>  		if (ret > 0) {
>>>  			if (path->slots[0] == 0)
>>>  				goto out;
>>> @@ -445,7 +443,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>  
>>>  		/* delete the entire item, it is inside our range */
>>>  		if (key.offset >= bytenr && csum_end <= end_byte) {
>>> -			ret = btrfs_del_item(trans, root, path);
>>> +			ret = btrfs_del_item(trans, csum_root, path);
>>>  			BUG_ON(ret);
>>>  		} else if (key.offset < bytenr && csum_end > end_byte) {
>>>  			unsigned long offset;
>>> @@ -485,12 +483,14 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>  			 * btrfs_split_item returns -EAGAIN when the
>>>  			 * item changed size or key
>>>  			 */
>>> -			ret = btrfs_split_item(trans, root, path, &key, offset);
>>> +			ret = btrfs_split_item(trans, csum_root, path, &key,
>>> +					       offset);
>>>  			BUG_ON(ret && ret != -EAGAIN);
>>>  
>>>  			key.offset = end_byte - 1;
>>>  		} else {
>>> -			ret = truncate_one_csum(root, path, &key, bytenr, len);
>>> +			ret = truncate_one_csum(csum_root, path, &key, bytenr,
>>> +						len);
>>>  			BUG_ON(ret);
>>>  		}
>>>  		btrfs_release_path(path);
>>>
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure
  2018-06-11  7:10     ` Nikolay Borisov
@ 2018-06-11  7:46       ` Qu Wenruo
  0 siblings, 0 replies; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  7:46 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018年06月11日 15:10, Nikolay Borisov wrote:
> 
> 
> On 11.06.2018 08:20, Qu Wenruo wrote:
>>
>>
>> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>>> This commit pulls those portions of the kernel implementation of
>>> delayed refs which are necessary to have them working in user-space.
>>> I've done the following modifications:
>>>
>>> 1. Replaced all kmem_cache_alloc calls to kmalloc.
>>>
>>> 2. Removed all locking-related code, since we are single threaded in
>>> userspace.
>>>
>>> 3. Removed code which deals with data refs - delayed refs in user space
>>> are going to be used only for cowonly trees.
>>
>> That's pretty good, although still some data ref related
>> structures/functions are left.
>>
>>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>  Makefile      |   3 +-
>>>  ctree.h       |   3 +
>>>  delayed-ref.c | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  delayed-ref.h | 225 ++++++++++++++++++++++
>>>  extent-tree.c | 228 ++++++++++++++++++++++
>>>  kerncompat.h  |   8 +
>>>  transaction.h |   4 +
>>>  7 files changed, 1078 insertions(+), 1 deletion(-)
>>>  create mode 100644 delayed-ref.c
>>>  create mode 100644 delayed-ref.h
>>>
>>> diff --git a/Makefile b/Makefile
>>> index 544410e6440c..9508ad4f11e6 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
>>>  	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
>>>  	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
>>>  	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
>>> -	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
>>> +	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
>>> +	  delayed-ref.o
>>>  cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
>>>  	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
>>>  	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
>>> diff --git a/ctree.h b/ctree.h
>>> index b30a946658ce..d1ea45571d1e 100644
>>> --- a/ctree.h
>>> +++ b/ctree.h
>>> @@ -2812,4 +2812,7 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
>>>  int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
>>>  		    char *dest);
>>>  
>>> +
>>> +/* extent-tree.c */
>>> +int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr);
>>>  #endif
>>> diff --git a/delayed-ref.c b/delayed-ref.c
>>> new file mode 100644
>>> index 000000000000..f3fa50239380
>>> --- /dev/null
>>> +++ b/delayed-ref.c
>>> @@ -0,0 +1,608 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright (C) 2009 Oracle.  All rights reserved.
>>> + */
>>> +
>>> +#include "ctree.h"
>>> +#include "btrfs-list.h"
>>> +#include "delayed-ref.h"
>>> +#include "transaction.h"
>>> +
>>> +/*
>>> + * delayed back reference update tracking.  For subvolume trees
>>> + * we queue up extent allocations and backref maintenance for
>>> + * delayed processing.   This avoids deep call chains where we
>>> + * add extents in the middle of btrfs_search_slot, and it allows
>>> + * us to buffer up frequently modified backrefs in an rb tree instead
>>> + * of hammering updates on the extent allocation tree.
>>> + */
>>
>> A little more explanation of how delayed refs work would be appreciated.
> 
> I just copy/pasted that from the kernel code. TBH I'm not too familiar

Neither am I.

> with the backref lookup code to write something but I guess I can take a
> look at it and perhaps add information to the btrfs-devs-docs. For now
> I'm confident in my understanding of the delayed allocation/freeing logic.

It doesn't need to be that detailed.

My expectation is just some simple comments on:
1) The purpose.
   Speedup for the kernel.
   And some dirty hack for FST?

2) The basic data structures.
   An rbtree of delayed ref heads, one for each dirty extent.
   Then a list of extent operations on each delayed ref head.
   Each delayed data/tree ref represents a reference update
   (add/remove/creation).

3) When delayed refs are written to disk.
   At transaction commit time.

So just a basic overview, and reviewer could get some clue how to get
deeper if needed.
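
As a rough illustration of points 2) and 3), here is a minimal sketch in C of
the shape described above: one head per dirty extent, a list of queued
reference updates on each head, and a drain step that would run at commit.
All names (ref_queue, ref_head, ref_update, queue_ref, run_refs) are made up
for illustration and are not the btrfs-progs API, and a sorted linked list
stands in for the rbtree to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types, for illustration only. */
enum ref_action { REF_ADD = 1, REF_DROP = -1 };

/* One queued reference update against an extent. */
struct ref_update {
	enum ref_action action;
	uint64_t root;			/* id of the tree owning the reference */
	struct ref_update *next;
};

/* One head per dirty extent; heads are kept sorted by bytenr. */
struct ref_head {
	uint64_t bytenr;
	int ref_mod;			/* net refcount change queued so far */
	struct ref_update *updates;	/* oldest update first */
	struct ref_update **tail;	/* where the next update gets linked */
	struct ref_head *next;
};

struct ref_queue {
	struct ref_head *heads;		/* stand-in for the rbtree of heads */
	unsigned long nr_heads;
};

/* Find the head for bytenr, creating it in sorted position if missing. */
static struct ref_head *find_or_add_head(struct ref_queue *q, uint64_t bytenr)
{
	struct ref_head **pos = &q->heads;
	struct ref_head *h;

	while (*pos && (*pos)->bytenr < bytenr)
		pos = &(*pos)->next;
	if (*pos && (*pos)->bytenr == bytenr)
		return *pos;

	h = calloc(1, sizeof(*h));
	if (!h)
		return NULL;
	h->bytenr = bytenr;
	h->tail = &h->updates;
	h->next = *pos;
	*pos = h;
	q->nr_heads++;
	return h;
}

/* Queue an update instead of modifying the extent tree right away. */
static int queue_ref(struct ref_queue *q, uint64_t bytenr, uint64_t root,
		     enum ref_action action)
{
	struct ref_head *h = find_or_add_head(q, bytenr);
	struct ref_update *u;

	if (!h)
		return -1;
	u = calloc(1, sizeof(*u));
	if (!u)
		return -1;
	u->action = action;
	u->root = root;
	*h->tail = u;
	h->tail = &u->next;
	h->ref_mod += action;
	return 0;
}

/*
 * Drain the queue, as a transaction commit would. apply() stands in for
 * the real extent tree update and may be NULL. Returns the number of
 * updates processed.
 */
static unsigned long run_refs(struct ref_queue *q,
			      void (*apply)(uint64_t bytenr,
					    const struct ref_update *u))
{
	unsigned long done = 0;
	struct ref_head *h;

	while ((h = q->heads)) {
		struct ref_update *u;

		q->heads = h->next;
		while ((u = h->updates)) {
			h->updates = u->next;
			if (apply)
				apply(h->bytenr, u);
			free(u);
			done++;
		}
		assert(q->nr_heads > 0);
		q->nr_heads--;
		free(h);
	}
	return done;
}
```

Queuing an add and a drop for the same bytenr leaves a single head with
ref_mod == 0 until run_refs() is called, which is the buffering effect the
kernel comment talks about.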

Thanks,
Qu

> 
>>
>> [snip]
>>> +struct btrfs_delayed_tree_ref {
>>> +	struct btrfs_delayed_ref_node node;
>>> +	u64 root;
>>> +	u64 parent;
>>> +	int level;
>>> +};
>>> +
>>> +struct btrfs_delayed_data_ref {
>>> +	struct btrfs_delayed_ref_node node;
>>> +	u64 root;
>>> +	u64 parent;
>>> +	u64 objectid;
>>> +	u64 offset;
>>> +};
>>
>> Since we don't use this structure and don't support data ref yet, what
>> about just removing this definition?
> 
> Saw that too and immediately sent v2 :)
> 
>>
>> [snip]
>>
>>> +struct btrfs_delayed_ref_head *
>>> +btrfs_select_ref_head(struct btrfs_trans_handle *trans);
>>> +
>>> +/*
>>> + * helper functions to cast a node into its container
>>> + */
>>> +static inline struct btrfs_delayed_tree_ref *
>>> +btrfs_delayed_node_to_tree_ref(struct btrfs_delayed_ref_node *node)
>>> +{
>>> +	return container_of(node, struct btrfs_delayed_tree_ref, node);
>>> +}
>>> +
>>> +static inline struct btrfs_delayed_data_ref *
>>> +btrfs_delayed_node_to_data_ref(struct btrfs_delayed_ref_node *node)
>>> +{
>>> +	return container_of(node, struct btrfs_delayed_data_ref, node);
>>> +}
>>
>> The same goes for this, the only user of the btrfs_delayed_data_ref structure.
> 
> Fixed in v2
>>
>> Thanks,
>> Qu

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums
  2018-06-11  7:40       ` Qu Wenruo
@ 2018-06-11  7:48         ` Nikolay Borisov
  2018-06-11  8:08           ` Qu Wenruo
  0 siblings, 1 reply; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-11  7:48 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 11.06.2018 10:40, Qu Wenruo wrote:
> 
> 
> On 2018年06月11日 15:02, Nikolay Borisov wrote:
>>
>>
>> On 11.06.2018 07:46, Qu Wenruo wrote:
>>>
>>>
>>> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>>>> It's not needed, since we can obtain a reference to fs_info from the
>>>> passed transaction handle. This is needed by delayed refs code.
>>>
>>> This looks a little too aggressive to me.
>>>
>>> Normally we would expect parameters like @trans then @fs_info.
>>> Although @trans alone lets us get @fs_info, in certain cases @trans
>>> could be NULL (like btrfs_search_slot).
>>>
>>> Although for btrfs_del_csums() @trans can be NULL, I still prefer the
>>> @trans, @fs_info parameters.
>>
>> The reason I'm making those prep patches is because I would like to get
>> rid of the root argument from __free_extent,
> 
> For this part, I completely agree with you.
> 
> The root doesn't make sense, getting rid of it is completely fine.
> The abuse of btrfs_root as parameter should be stopped, and we have tons
> of such cleanup both in kernel and btrfs-progs.
> 
> The only concern is about possible NULL @trans parameter.
> However it could be easily addressed by adding 'restrict' prefix.
> 
> So just adding 'restrict', and then the whole cleanup part is fine to me.

You mean making it "struct btrfs_trans_handle * restrict trans"? But
restrict deals with pointer aliasing; it doesn't have any bearing on the
NULL-ability of trans. I'm afraid I don't follow your logic here, care
to explain further?

> 
> Thanks,
> Qu
> 
>> since if you look at the
>> later delayed refs patches - patch 12/15 you'd see that the adapter
>> function __free_extent2 doesn't really have a root pointer as an
>> argument. If you also take a look at 10/15 (add delayed refs
>> infrastructure), the functions which create the delayed refs also don't
>> take a struct btrfs_root as an argument.
>>
>> Alternatively I can perhaps wire up a call to btrfs_read_fs_root in
>> __free_extent which will be using the "ref->root" id to obtain struct
>> btrfs_root. However, due to the way FST fixup is implemented (I still
>> haven't sent that code) it's possible that we call __free_extent2 with
>> the freespace root set to 0 so if btrfs_read_fs_root is called it will
>> return an -ENOENT.
>>
>> In conclusion what you suggest *could* be done but it will require
>> re-engineering the currently-tested FST code + delayed refs code. Given
>> the purpose of the root argument in this callchain I'd rather eliminate
>> it altogether and not bother with it.
>>
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>>>
>>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>>> ---
>>>>  btrfs-corrupt-block.c |  2 +-
>>>>  ctree.h               |  3 +--
>>>>  extent-tree.c         |  2 +-
>>>>  file-item.c           | 20 ++++++++++----------
>>>>  4 files changed, 13 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
>>>> index 4fbea26cda20..3add8e63b7bb 100644
>>>> --- a/btrfs-corrupt-block.c
>>>> +++ b/btrfs-corrupt-block.c
>>>> @@ -926,7 +926,7 @@ static int delete_csum(struct btrfs_root *root, u64 bytenr, u64 bytes)
>>>>  		return PTR_ERR(trans);
>>>>  	}
>>>>  
>>>> -	ret = btrfs_del_csums(trans, root, bytenr, bytes);
>>>> +	ret = btrfs_del_csums(trans, bytenr, bytes);
>>>>  	if (ret)
>>>>  		fprintf(stderr, "Error deleting csums %d\n", ret);
>>>>  	btrfs_commit_transaction(trans, root);
>>>> diff --git a/ctree.h b/ctree.h
>>>> index de4b1b7e6416..082726238b91 100644
>>>> --- a/ctree.h
>>>> +++ b/ctree.h
>>>> @@ -2752,8 +2752,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>>>>  			u64 ino, u64 parent_ino, u64 *index);
>>>>  
>>>>  /* file-item.c */
>>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>> -		    struct btrfs_root *root, u64 bytenr, u64 len);
>>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len);
>>>>  int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
>>>>  			     struct btrfs_root *root,
>>>>  			     u64 objectid, u64 pos, u64 offset,
>>>> diff --git a/extent-tree.c b/extent-tree.c
>>>> index cbc022f6cef6..c6f09b52800f 100644
>>>> --- a/extent-tree.c
>>>> +++ b/extent-tree.c
>>>> @@ -2372,7 +2372,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>>>  		btrfs_release_path(path);
>>>>  
>>>>  		if (is_data) {
>>>> -			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
>>>> +			ret = btrfs_del_csums(trans, bytenr, num_bytes);
>>>>  			BUG_ON(ret);
>>>>  		}
>>>>  
>>>> diff --git a/file-item.c b/file-item.c
>>>> index 7b0ff3585509..71d4e89f78d1 100644
>>>> --- a/file-item.c
>>>> +++ b/file-item.c
>>>> @@ -394,8 +394,7 @@ static noinline int truncate_one_csum(struct btrfs_root *root,
>>>>   * deletes the csum items from the csum tree for a given
>>>>   * range of bytes.
>>>>   */
>>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>> -		    struct btrfs_root *root, u64 bytenr, u64 len)
>>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len)
>>>>  {
>>>>  	struct btrfs_path *path;
>>>>  	struct btrfs_key key;
>>>> @@ -403,11 +402,10 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>  	u64 csum_end;
>>>>  	struct extent_buffer *leaf;
>>>>  	int ret;
>>>> -	u16 csum_size =
>>>> -		btrfs_super_csum_size(root->fs_info->super_copy);
>>>> -	int blocksize = root->fs_info->sectorsize;
>>>> +	u16 csum_size = btrfs_super_csum_size(trans->fs_info->super_copy);
>>>> +	int blocksize = trans->fs_info->sectorsize;
>>>> +	struct btrfs_root *csum_root = trans->fs_info->csum_root;
>>>>  
>>>> -	root = root->fs_info->csum_root;
>>>>  
>>>>  	path = btrfs_alloc_path();
>>>>  	if (!path)
>>>> @@ -418,7 +416,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>  		key.offset = end_byte - 1;
>>>>  		key.type = BTRFS_EXTENT_CSUM_KEY;
>>>>  
>>>> -		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>>>> +		ret = btrfs_search_slot(trans, csum_root, &key, path, -1, 1);
>>>>  		if (ret > 0) {
>>>>  			if (path->slots[0] == 0)
>>>>  				goto out;
>>>> @@ -445,7 +443,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>  
>>>>  		/* delete the entire item, it is inside our range */
>>>>  		if (key.offset >= bytenr && csum_end <= end_byte) {
>>>> -			ret = btrfs_del_item(trans, root, path);
>>>> +			ret = btrfs_del_item(trans, csum_root, path);
>>>>  			BUG_ON(ret);
>>>>  		} else if (key.offset < bytenr && csum_end > end_byte) {
>>>>  			unsigned long offset;
>>>> @@ -485,12 +483,14 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>  			 * btrfs_split_item returns -EAGAIN when the
>>>>  			 * item changed size or key
>>>>  			 */
>>>> -			ret = btrfs_split_item(trans, root, path, &key, offset);
>>>> +			ret = btrfs_split_item(trans, csum_root, path, &key,
>>>> +					       offset);
>>>>  			BUG_ON(ret && ret != -EAGAIN);
>>>>  
>>>>  			key.offset = end_byte - 1;
>>>>  		} else {
>>>> -			ret = truncate_one_csum(root, path, &key, bytenr, len);
>>>> +			ret = truncate_one_csum(csum_root, path, &key, bytenr,
>>>> +						len);
>>>>  			BUG_ON(ret);
>>>>  		}
>>>>  		btrfs_release_path(path);
>>>>
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums
  2018-06-11  7:48         ` Nikolay Borisov
@ 2018-06-11  8:08           ` Qu Wenruo
  2018-06-11  8:09             ` Nikolay Borisov
  0 siblings, 1 reply; 46+ messages in thread
From: Qu Wenruo @ 2018-06-11  8:08 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018年06月11日 15:48, Nikolay Borisov wrote:
> 
> 
> On 11.06.2018 10:40, Qu Wenruo wrote:
>>
>>
>> On 2018年06月11日 15:02, Nikolay Borisov wrote:
>>>
>>>
>>> On 11.06.2018 07:46, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>>>>> It's not needed, since we can obtain a reference to fs_info from the
>>>>> passed transaction handle. This is needed by delayed refs code.
>>>>
>>>> This looks a little too aggressive to me.
>>>>
>>>> Normally we would expect parameters like @trans then @fs_info.
>>>> Although @trans alone lets us get @fs_info, in certain cases @trans
>>>> could be NULL (like btrfs_search_slot).
>>>>
>>>> Although for btrfs_del_csums() @trans can be NULL, I still prefer the
>>>> @trans, @fs_info parameters.
>>>
>>> The reason I'm making those prep patches is because I would like to get
>>> rid of the root argument from __free_extent,
>>
>> For this part, I completely agree with you.
>>
>> The root doesn't make sense, getting rid of it is completely fine.
>> The abuse of btrfs_root as parameter should be stopped, and we have tons
>> of such cleanup both in kernel and btrfs-progs.
>>
>> The only concern is about possible NULL @trans parameter.
>> However it could be easily addressed by adding 'restrict' prefix.
>>
>> So just adding 'restrict', and then the whole cleanup part is fine to me.
> 
> You mean making it "struct btrfs_trans_handle * restrict trans"? But
> restrict deals with pointer aliasing; it doesn't have any bearing on the
> NULL-ability of trans. I'm afraid I don't follow your logic here, care
> to explain further?

All my fault, I misunderstood the 'restrict' qualifier.

There seems to be no qualifier to give an early warning when a NULL
pointer is passed in.

So the cleanup part should be OK.
(If there is such a qualifier, I'm all ears.)
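
For what it's worth, standard C has no type qualifier for this, but GCC and
Clang provide a `nonnull` function attribute that can warn at compile time
(through -Wnonnull) when the compiler can see that NULL is being passed. Below
is a minimal sketch contrasting it with `restrict`. The struct, the macro and
both function names are made up for the example and have nothing to do with
the real btrfs-progs declarations.

```c
#include <stddef.h>

/* Hypothetical stand-in for a transaction handle. */
struct trans_sketch {
	int id;
};

/* nonnull is a GCC/Clang extension; expand to nothing elsewhere. */
#if defined(__GNUC__)
#define ARG1_NONNULL __attribute__((nonnull(1)))
#else
#define ARG1_NONNULL
#endif

/*
 * 'restrict' only promises that the object is not accessed through
 * another pointer; NULL is still a perfectly legal argument, so the
 * function has to check for it itself.
 */
static int trans_id_restrict(struct trans_sketch *restrict trans)
{
	return trans ? trans->id : -1;
}

/*
 * With the attribute on the declaration, a call such as
 * trans_id_nonnull(NULL) can be diagnosed at compile time. The function
 * body must not rely on its own NULL check, because the compiler is
 * allowed to assume the argument is never NULL.
 */
static int trans_id_nonnull(struct trans_sketch *trans) ARG1_NONNULL;

static int trans_id_nonnull(struct trans_sketch *trans)
{
	return trans->id;
}
```

The warning only covers NULL values the compiler can prove at the call site,
so a pointer that turns out to be NULL at run time still goes undetected.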

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>> since if you look at the
>>> later delayed refs patches - patch 12/15 you'd see that the adapter
>>> function __free_extent2 doesn't really have a root pointer as an
>>> argument. If you also take a look at 10/15 (add delayed refs
>>> infrastructure), the functions which create the delayed refs also don't
>>> take a struct btrfs_root as an argument.
>>>
>>> Alternatively I can perhaps wire up a call to btrfs_read_fs_root in
>>> __free_extent which will be using the "ref->root" id to obtain struct
>>> btrfs_root. However, due to the way FST fixup is implemented (I still
>>> haven't sent that code) it's possible that we call __free_extent2 with
>>> the freespace root set to 0 so if btrfs_read_fs_root is called it will
>>> return an -ENOENT.
>>>
>>> In conclusion what you suggest *could* be done but it will require
>>> re-engineering the currently-tested FST code + delayed refs code. Given
>>> the purpose of the root argument in this callchain I'd rather eliminate
>>> it altogether and not bother with it.
>>>
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>>>> ---
>>>>>  btrfs-corrupt-block.c |  2 +-
>>>>>  ctree.h               |  3 +--
>>>>>  extent-tree.c         |  2 +-
>>>>>  file-item.c           | 20 ++++++++++----------
>>>>>  4 files changed, 13 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
>>>>> index 4fbea26cda20..3add8e63b7bb 100644
>>>>> --- a/btrfs-corrupt-block.c
>>>>> +++ b/btrfs-corrupt-block.c
>>>>> @@ -926,7 +926,7 @@ static int delete_csum(struct btrfs_root *root, u64 bytenr, u64 bytes)
>>>>>  		return PTR_ERR(trans);
>>>>>  	}
>>>>>  
>>>>> -	ret = btrfs_del_csums(trans, root, bytenr, bytes);
>>>>> +	ret = btrfs_del_csums(trans, bytenr, bytes);
>>>>>  	if (ret)
>>>>>  		fprintf(stderr, "Error deleting csums %d\n", ret);
>>>>>  	btrfs_commit_transaction(trans, root);
>>>>> diff --git a/ctree.h b/ctree.h
>>>>> index de4b1b7e6416..082726238b91 100644
>>>>> --- a/ctree.h
>>>>> +++ b/ctree.h
>>>>> @@ -2752,8 +2752,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>>>>>  			u64 ino, u64 parent_ino, u64 *index);
>>>>>  
>>>>>  /* file-item.c */
>>>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>> -		    struct btrfs_root *root, u64 bytenr, u64 len);
>>>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len);
>>>>>  int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
>>>>>  			     struct btrfs_root *root,
>>>>>  			     u64 objectid, u64 pos, u64 offset,
>>>>> diff --git a/extent-tree.c b/extent-tree.c
>>>>> index cbc022f6cef6..c6f09b52800f 100644
>>>>> --- a/extent-tree.c
>>>>> +++ b/extent-tree.c
>>>>> @@ -2372,7 +2372,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>>>>  		btrfs_release_path(path);
>>>>>  
>>>>>  		if (is_data) {
>>>>> -			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
>>>>> +			ret = btrfs_del_csums(trans, bytenr, num_bytes);
>>>>>  			BUG_ON(ret);
>>>>>  		}
>>>>>  
>>>>> diff --git a/file-item.c b/file-item.c
>>>>> index 7b0ff3585509..71d4e89f78d1 100644
>>>>> --- a/file-item.c
>>>>> +++ b/file-item.c
>>>>> @@ -394,8 +394,7 @@ static noinline int truncate_one_csum(struct btrfs_root *root,
>>>>>   * deletes the csum items from the csum tree for a given
>>>>>   * range of bytes.
>>>>>   */
>>>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>> -		    struct btrfs_root *root, u64 bytenr, u64 len)
>>>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len)
>>>>>  {
>>>>>  	struct btrfs_path *path;
>>>>>  	struct btrfs_key key;
>>>>> @@ -403,11 +402,10 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>  	u64 csum_end;
>>>>>  	struct extent_buffer *leaf;
>>>>>  	int ret;
>>>>> -	u16 csum_size =
>>>>> -		btrfs_super_csum_size(root->fs_info->super_copy);
>>>>> -	int blocksize = root->fs_info->sectorsize;
>>>>> +	u16 csum_size = btrfs_super_csum_size(trans->fs_info->super_copy);
>>>>> +	int blocksize = trans->fs_info->sectorsize;
>>>>> +	struct btrfs_root *csum_root = trans->fs_info->csum_root;
>>>>>  
>>>>> -	root = root->fs_info->csum_root;
>>>>>  
>>>>>  	path = btrfs_alloc_path();
>>>>>  	if (!path)
>>>>> @@ -418,7 +416,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>  		key.offset = end_byte - 1;
>>>>>  		key.type = BTRFS_EXTENT_CSUM_KEY;
>>>>>  
>>>>> -		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>>>>> +		ret = btrfs_search_slot(trans, csum_root, &key, path, -1, 1);
>>>>>  		if (ret > 0) {
>>>>>  			if (path->slots[0] == 0)
>>>>>  				goto out;
>>>>> @@ -445,7 +443,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>  
>>>>>  		/* delete the entire item, it is inside our range */
>>>>>  		if (key.offset >= bytenr && csum_end <= end_byte) {
>>>>> -			ret = btrfs_del_item(trans, root, path);
>>>>> +			ret = btrfs_del_item(trans, csum_root, path);
>>>>>  			BUG_ON(ret);
>>>>>  		} else if (key.offset < bytenr && csum_end > end_byte) {
>>>>>  			unsigned long offset;
>>>>> @@ -485,12 +483,14 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>  			 * btrfs_split_item returns -EAGAIN when the
>>>>>  			 * item changed size or key
>>>>>  			 */
>>>>> -			ret = btrfs_split_item(trans, root, path, &key, offset);
>>>>> +			ret = btrfs_split_item(trans, csum_root, path, &key,
>>>>> +					       offset);
>>>>>  			BUG_ON(ret && ret != -EAGAIN);
>>>>>  
>>>>>  			key.offset = end_byte - 1;
>>>>>  		} else {
>>>>> -			ret = truncate_one_csum(root, path, &key, bytenr, len);
>>>>> +			ret = truncate_one_csum(csum_root, path, &key, bytenr,
>>>>> +						len);
>>>>>  			BUG_ON(ret);
>>>>>  		}
>>>>>  		btrfs_release_path(path);
>>>>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums
  2018-06-11  8:08           ` Qu Wenruo
@ 2018-06-11  8:09             ` Nikolay Borisov
  0 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-06-11  8:09 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 11.06.2018 11:08, Qu Wenruo wrote:
> 
> 
> On 2018年06月11日 15:48, Nikolay Borisov wrote:
>>
>>
>> On 11.06.2018 10:40, Qu Wenruo wrote:
>>>
>>>
>>> On 2018年06月11日 15:02, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 11.06.2018 07:46, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2018年06月08日 20:47, Nikolay Borisov wrote:
>>>>>> It's not needed, since we can obtain a reference to fs_info from the
>>>>>> passed transaction handle. This is needed by delayed refs code.
>>>>>
>>>>> This looks a little too aggressive to me.
>>>>>
>>>>> Normally we would expect parameters like @trans then @fs_info.
>>>>> Although @trans alone lets us get @fs_info, in certain cases @trans
>>>>> could be NULL (like btrfs_search_slot).
>>>>>
>>>>> Although for btrfs_del_csums() @trans can be NULL, I still prefer the
>>>>> @trans, @fs_info parameters.
>>>>
>>>> The reason I'm making those prep patches is because I would like to get
>>>> rid of the root argument from __free_extent,
>>>
>>> For this part, I completely agree with you.
>>>
>>> The root doesn't make sense, getting rid of it is completely fine.
>>> The abuse of btrfs_root as parameter should be stopped, and we have tons
>>> of such cleanup both in kernel and btrfs-progs.
>>>
>>> The only concern is about possible NULL @trans parameter.
>>> However it could be easily addressed by adding 'restrict' prefix.
>>>
>>> So just adding 'restrict', and then the whole cleanup part is fine to me.
>>
>> You mean making it "struct btrfs_trans_handle * restrict trans"? But
>> restrict deals with pointer aliasing; it doesn't have any bearing on the
>> NULL-ability of trans. I'm afraid I don't follow your logic here, care
>> to explain further?
> 
> All my fault, I misunderstood the 'restrict' qualifier.
> 
> There seems to be no qualifier to give early warning on passing NULL
> pointer in.
> 
> So the cleanup part should be OK.
> (If there is such qualifier, I'm all ears)

I guess it's called ASSERT(trans) :)
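
Something along these lines, as a sketch only: the struct and function names
are invented for the example, and the macro is just a stand-in that wraps the
C library assert(), not the project's actual ASSERT definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the real structures. */
struct fs_info_sketch {
	unsigned int sectorsize;
};

struct trans_sketch {
	struct fs_info_sketch *fs_info;
};

/* Stand-in for the project's ASSERT(); here simply assert(). */
#define ASSERT_SKETCH(cond) assert(cond)

/*
 * Check the handle once at function entry, so a NULL trans aborts with
 * a clear message instead of a stray NULL dereference further down.
 */
static unsigned int del_csums_blocksize(struct trans_sketch *trans)
{
	ASSERT_SKETCH(trans != NULL);
	ASSERT_SKETCH(trans->fs_info != NULL);
	return trans->fs_info->sectorsize;
}
```

Unlike a compile-time annotation this only fires at run time, and only in
builds where assertions are enabled.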
> 
> Thanks,
> Qu
> 
>>
>>>
>>> Thanks,
>>> Qu
>>>
>>>> since if you look at the
>>>> later delayed refs patches - patch 12/15 you'd see that the adapter
>>>> function __free_extent2 doesn't really have a root pointer as an
>>>> argument. If you also take a look at 10/15 (add delayed refs
>>>> infrastructure), the functions which create the delayed refs also don't
>>>> take a struct btrfs_root as an argument.
>>>>
>>>> Alternatively I can perhaps wire up a call to btrfs_read_fs_root in
>>>> __free_extent which will be using the "ref->root" id to obtain struct
>>>> btrfs_root. However, due to the way FST fixup is implemented (I still
>>>> haven't sent that code) it's possible that we call __free_extent2 with
>>>> the freespace root set to 0 so if btrfs_read_fs_root is called it will
>>>> return an -ENOENT.
>>>>
>>>> In conclusion what you suggest *could* be done but it will require
>>>> re-engineering the currently-tested FST code + delayed refs code. Given
>>>> the purpose of the root argument in this callchain I'd rather eliminate
>>>> it altogether and not bother with it.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>>>>> ---
>>>>>>  btrfs-corrupt-block.c |  2 +-
>>>>>>  ctree.h               |  3 +--
>>>>>>  extent-tree.c         |  2 +-
>>>>>>  file-item.c           | 20 ++++++++++----------
>>>>>>  4 files changed, 13 insertions(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
>>>>>> index 4fbea26cda20..3add8e63b7bb 100644
>>>>>> --- a/btrfs-corrupt-block.c
>>>>>> +++ b/btrfs-corrupt-block.c
>>>>>> @@ -926,7 +926,7 @@ static int delete_csum(struct btrfs_root *root, u64 bytenr, u64 bytes)
>>>>>>  		return PTR_ERR(trans);
>>>>>>  	}
>>>>>>  
>>>>>> -	ret = btrfs_del_csums(trans, root, bytenr, bytes);
>>>>>> +	ret = btrfs_del_csums(trans, bytenr, bytes);
>>>>>>  	if (ret)
>>>>>>  		fprintf(stderr, "Error deleting csums %d\n", ret);
>>>>>>  	btrfs_commit_transaction(trans, root);
>>>>>> diff --git a/ctree.h b/ctree.h
>>>>>> index de4b1b7e6416..082726238b91 100644
>>>>>> --- a/ctree.h
>>>>>> +++ b/ctree.h
>>>>>> @@ -2752,8 +2752,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>>>>>>  			u64 ino, u64 parent_ino, u64 *index);
>>>>>>  
>>>>>>  /* file-item.c */
>>>>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>> -		    struct btrfs_root *root, u64 bytenr, u64 len);
>>>>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len);
>>>>>>  int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
>>>>>>  			     struct btrfs_root *root,
>>>>>>  			     u64 objectid, u64 pos, u64 offset,
>>>>>> diff --git a/extent-tree.c b/extent-tree.c
>>>>>> index cbc022f6cef6..c6f09b52800f 100644
>>>>>> --- a/extent-tree.c
>>>>>> +++ b/extent-tree.c
>>>>>> @@ -2372,7 +2372,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>>>>>  		btrfs_release_path(path);
>>>>>>  
>>>>>>  		if (is_data) {
>>>>>> -			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
>>>>>> +			ret = btrfs_del_csums(trans, bytenr, num_bytes);
>>>>>>  			BUG_ON(ret);
>>>>>>  		}
>>>>>>  
>>>>>> diff --git a/file-item.c b/file-item.c
>>>>>> index 7b0ff3585509..71d4e89f78d1 100644
>>>>>> --- a/file-item.c
>>>>>> +++ b/file-item.c
>>>>>> @@ -394,8 +394,7 @@ static noinline int truncate_one_csum(struct btrfs_root *root,
>>>>>>   * deletes the csum items from the csum tree for a given
>>>>>>   * range of bytes.
>>>>>>   */
>>>>>> -int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>> -		    struct btrfs_root *root, u64 bytenr, u64 len)
>>>>>> +int btrfs_del_csums(struct btrfs_trans_handle *trans, u64 bytenr, u64 len)
>>>>>>  {
>>>>>>  	struct btrfs_path *path;
>>>>>>  	struct btrfs_key key;
>>>>>> @@ -403,11 +402,10 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>>  	u64 csum_end;
>>>>>>  	struct extent_buffer *leaf;
>>>>>>  	int ret;
>>>>>> -	u16 csum_size =
>>>>>> -		btrfs_super_csum_size(root->fs_info->super_copy);
>>>>>> -	int blocksize = root->fs_info->sectorsize;
>>>>>> +	u16 csum_size = btrfs_super_csum_size(trans->fs_info->super_copy);
>>>>>> +	int blocksize = trans->fs_info->sectorsize;
>>>>>> +	struct btrfs_root *csum_root = trans->fs_info->csum_root;
>>>>>>  
>>>>>> -	root = root->fs_info->csum_root;
>>>>>>  
>>>>>>  	path = btrfs_alloc_path();
>>>>>>  	if (!path)
>>>>>> @@ -418,7 +416,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>>  		key.offset = end_byte - 1;
>>>>>>  		key.type = BTRFS_EXTENT_CSUM_KEY;
>>>>>>  
>>>>>> -		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>>>>>> +		ret = btrfs_search_slot(trans, csum_root, &key, path, -1, 1);
>>>>>>  		if (ret > 0) {
>>>>>>  			if (path->slots[0] == 0)
>>>>>>  				goto out;
>>>>>> @@ -445,7 +443,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>>  
>>>>>>  		/* delete the entire item, it is inside our range */
>>>>>>  		if (key.offset >= bytenr && csum_end <= end_byte) {
>>>>>> -			ret = btrfs_del_item(trans, root, path);
>>>>>> +			ret = btrfs_del_item(trans, csum_root, path);
>>>>>>  			BUG_ON(ret);
>>>>>>  		} else if (key.offset < bytenr && csum_end > end_byte) {
>>>>>>  			unsigned long offset;
>>>>>> @@ -485,12 +483,14 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>>>>>>  			 * btrfs_split_item returns -EAGAIN when the
>>>>>>  			 * item changed size or key
>>>>>>  			 */
>>>>>> -			ret = btrfs_split_item(trans, root, path, &key, offset);
>>>>>> +			ret = btrfs_split_item(trans, csum_root, path, &key,
>>>>>> +					       offset);
>>>>>>  			BUG_ON(ret && ret != -EAGAIN);
>>>>>>  
>>>>>>  			key.offset = end_byte - 1;
>>>>>>  		} else {
>>>>>> -			ret = truncate_one_csum(root, path, &key, bytenr, len);
>>>>>> +			ret = truncate_one_csum(csum_root, path, &key, bytenr,
>>>>>> +						len);
>>>>>>  			BUG_ON(ret);
>>>>>>  		}
>>>>>>  		btrfs_release_path(path);
>>>>>>
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/15] Add delayed-refs support to btrfs-progs
  2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
                   ` (15 preceding siblings ...)
  2018-06-08 13:50 ` [PATCH 00/15] Add delayed-refs support to btrfs-progs Qu Wenruo
@ 2018-07-16 15:39 ` David Sterba
  2018-09-12 11:51   ` Su Yue
  16 siblings, 1 reply; 46+ messages in thread
From: David Sterba @ 2018-07-16 15:39 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Fri, Jun 08, 2018 at 03:47:43PM +0300, Nikolay Borisov wrote:
> Hello,
>
> Here is a series which adds support for delayed refs. This is needed to enable
> later work on adding freespace tree repair code. Additionally, it results in
> more code sharing between kernel/user space.
> 
> Patches 1-9 are simple prep patches removing some arguments that would cause
> problems later. They can go in independently of the delayed refs work. They don't
> introduce any functional changes. Next, patches 10-13 introduce the needed
> infrastructure for delayed refs without actually activating it. Patch 14 finally
> wires it up by adding the necessary calls to btrfs_run_delayed_refs and reworking the
> extent addition/freeing functions. With all of this done, patch 15 finally
> removes the old code.
> 
> This series passes all btrfs progs fsck and misc tests + fuzz tests apart from
> fuzz-003/007/009 - but those fail without this series so it's unlikely it's
> caused by it.
> 
> Nikolay Borisov (15):
>   btrfs-progs: Remove root argument from pin_down_bytes
>   btrfs-progs: Remove root argument from btrfs_del_csums
>   btrfs-progs: Add functions to modify the used space by a root
>   btrfs-progs: Refactor the root used bytes are updated
>   btrfs-progs: Make update_block_group take fs_info instead of root
>   btrfs-progs: check: Drop trans/root arguments from free_extent_hook
>   btrfs-progs: Remove root argument from __free_extent
>   btrfs-progs: Remove root argument from alloc_reserved_tree_block
>   btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent
>     for btree blocks.
>   btrfs-progs: Add boolean to signal whether we are re-initing extent
>     tree
>   btrfs-progs: Add delayed refs infrastructure
>   btrfs-progs: Add __free_extent2 function
>   btrfs-progs: Add alloc_reserved_tree_block2 function
>   btrfs-progs: Wire up delayed refs
>   btrfs-progs: Remove old delayed refs infrastructure

Added to devel. There were some patch-to-patch compilation issues: the
function alloc_reserved_tree_block2 was used before it was defined, so I
reordered the patches to fix that.

The CI fails at test check/020-extent-ref-cases but it works on my
machine so it's not caused by the patchset.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/15] btrfs-progs: Wire up delayed refs
  2018-06-08 12:47 ` [PATCH 14/15] btrfs-progs: Wire up delayed refs Nikolay Borisov
@ 2018-07-30  8:33   ` Misono Tomohiro
  2018-07-30  9:30     ` Nikolay Borisov
  0 siblings, 1 reply; 46+ messages in thread
From: Misono Tomohiro @ 2018-07-30  8:33 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs

On 2018/06/08 21:47, Nikolay Borisov wrote:
> This commit enables the delayed refs infrastructures. This entails doing
> the following:
> 
> 1. Replacing existing calls of btrfs_extent_post_op (which is the
> equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
> As well as eliminating open-coded calls to finish_current_insert and
> del_pending_extents which execute the delayed ops.
> 
> 2. Wiring up the addition of delayed refs when freeing extents
> (btrfs_free_extent) and when adding new extents (alloc_tree_block).
> 
> 3. Adding calls to btrfs_run_delayed refs in the transaction commit
> path alongside comments why every call is needed, since it's not always
> obvious (those call sites were derived empirically by running and
> debugging existing tests)
> 
> 4. Correctly flagging the transaction in which we are reinitialising
> the extent tree.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  check/main.c  |   3 +-
>  extent-tree.c | 166 ++++++++++++++++++++++++++++++----------------------------
>  transaction.c |  24 +++++++++
>  3 files changed, 111 insertions(+), 82 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index b84903acdb25..7c9689f29fd3 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -8634,7 +8634,7 @@ static int reinit_extent_tree(struct btrfs_trans_handle *trans,
>  			fprintf(stderr, "Error adding block group\n");
>  			return ret;
>  		}
> -		btrfs_extent_post_op(trans);
> +		btrfs_run_delayed_refs(trans, -1);
>  	}
>  
>  	ret = reset_balance(trans, fs_info);
> @@ -9682,6 +9682,7 @@ int cmd_check(int argc, char **argv)
>  			goto close_out;
>  		}
>  
> +		trans->reinit_extent_tree = true;
>  		if (init_extent_tree) {
>  			printf("Creating a new extent tree\n");
>  			ret = reinit_extent_tree(trans, info,
> diff --git a/extent-tree.c b/extent-tree.c
> index 3208ed11cb91..9d085158f2d8 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -1418,8 +1418,6 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  		err = ret;
>  out:
>  	btrfs_free_path(path);
> -	finish_current_insert(trans);
> -	del_pending_extents(trans);
>  	BUG_ON(err);
>  	return err;
>  }
> @@ -1602,8 +1600,6 @@ int btrfs_set_block_flags(struct btrfs_trans_handle *trans, u64 bytenr,
>  	btrfs_set_extent_flags(l, item, flags);
>  out:
>  	btrfs_free_path(path);
> -	finish_current_insert(trans);
> -	del_pending_extents(trans);
>  	return ret;
>  }
>  
> @@ -1701,7 +1697,6 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
>  				 struct btrfs_block_group_cache *cache)
>  {
>  	int ret;
> -	int pending_ret;
>  	struct btrfs_root *extent_root = trans->fs_info->extent_root;
>  	unsigned long bi;
>  	struct extent_buffer *leaf;
> @@ -1717,12 +1712,8 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
>  	btrfs_mark_buffer_dirty(leaf);
>  	btrfs_release_path(path);
>  fail:
> -	finish_current_insert(trans);
> -	pending_ret = del_pending_extents(trans);
>  	if (ret)
>  		return ret;
> -	if (pending_ret)
> -		return pending_ret;
>  	return 0;
>  
>  }
> @@ -2050,6 +2041,7 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
>  	int skinny_metadata =
>  		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
>  
> +
>  	while(1) {
>  		ret = find_first_extent_bit(&info->extent_ins, 0, &start,
>  					    &end, EXTENT_LOCKED);
> @@ -2081,6 +2073,8 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
>  			BUG_ON(1);
>  		}
>  
> +
> +		printf("shouldn't be executed\n");
>  		clear_extent_bits(&info->extent_ins, start, end, EXTENT_LOCKED);
>  		kfree(extent_op);
>  	}
> @@ -2380,7 +2374,6 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>  	}
>  fail:
>  	btrfs_free_path(path);
> -	finish_current_insert(trans);
>  	return ret;
>  }
>  
> @@ -2463,33 +2456,30 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
>  		      u64 bytenr, u64 num_bytes, u64 parent,
>  		      u64 root_objectid, u64 owner, u64 offset)
>  {
> -	struct btrfs_root *extent_root = root->fs_info->extent_root;
> -	int pending_ret;
>  	int ret;
>  
>  	WARN_ON(num_bytes < root->fs_info->sectorsize);
> -	if (root == extent_root) {
> -		struct pending_extent_op *extent_op;
> -
> -		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
> -		BUG_ON(!extent_op);
> -
> -		extent_op->type = PENDING_EXTENT_DELETE;
> -		extent_op->bytenr = bytenr;
> -		extent_op->num_bytes = num_bytes;
> -		extent_op->level = (int)owner;
> -
> -		set_extent_bits(&root->fs_info->pending_del,
> -				bytenr, bytenr + num_bytes - 1,
> -				EXTENT_LOCKED);
> -		set_state_private(&root->fs_info->pending_del,
> -				  bytenr, (unsigned long)extent_op);
> -		return 0;
> +	/*
> +	 * tree log blocks never actually go into the extent allocation
> +	 * tree, just update pinning info and exit early.
> +	 */
> +	if (root_objectid == BTRFS_TREE_LOG_OBJECTID) {
> +		printf("PINNING EXTENTS IN LOG TREE\n");
> +		WARN_ON(owner >= BTRFS_FIRST_FREE_OBJECTID);
> +		btrfs_pin_extent(trans->fs_info, bytenr, num_bytes);
> +		ret = 0;
> +	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
> +		BUG_ON(offset);
> +		ret = btrfs_add_delayed_tree_ref(trans->fs_info, trans,
> +						 bytenr, num_bytes, parent,
> +						 root_objectid, (int)owner,
> +						 BTRFS_DROP_DELAYED_REF,
> +						 NULL, NULL, NULL);
> +	} else {
> +		ret = __free_extent(trans, bytenr, num_bytes, parent,
> +				    root_objectid, owner, offset, 1);
>  	}
> -	ret = __free_extent(trans, root, bytenr, num_bytes, parent,
> -			    root_objectid, owner, offset, 1);
> -	pending_ret = del_pending_extents(trans);
> -	return ret ? ret : pending_ret;
> +	return ret;
>  }
>  
>  static u64 stripe_align(struct btrfs_root *root, u64 val)
> @@ -2695,6 +2685,8 @@ static int alloc_reserved_tree_block2(struct btrfs_trans_handle *trans,
>  	struct btrfs_delayed_tree_ref *ref = btrfs_delayed_node_to_tree_ref(node);
>  	struct btrfs_key ins;
>  	bool skinny_metadata = btrfs_fs_incompat(trans->fs_info, SKINNY_METADATA);
> +	int ret;
> +	u64 start, end;
>  
>  	ins.objectid = node->bytenr;
>  	if (skinny_metadata) {
> @@ -2705,10 +2697,25 @@ static int alloc_reserved_tree_block2(struct btrfs_trans_handle *trans,
>  		ins.type = BTRFS_EXTENT_ITEM_KEY;
>  	}
>  
> -	return alloc_reserved_tree_block(trans, ref->root, trans->transid,
> -					 extent_op->flags_to_set,
> -					 &extent_op->key, ref->level, &ins);
> +	if (ref->root == BTRFS_EXTENT_TREE_OBJECTID) {
> +		ret = find_first_extent_bit(&trans->fs_info->extent_ins,
> +					    node->bytenr, &start, &end,
> +					    EXTENT_LOCKED);
> +		ASSERT(!ret);
> +		ASSERT(start == node->bytenr);
> +		ASSERT(end == node->bytenr + node->num_bytes - 1);
> +	}
> +
> +	ret = alloc_reserved_tree_block(trans, ref->root, trans->transid,
> +					extent_op->flags_to_set,
> +					&extent_op->key, ref->level, &ins);
>  
> +	if (ref->root == BTRFS_EXTENT_TREE_OBJECTID) {
> +		clear_extent_bits(&trans->fs_info->extent_ins, start, end,
> +				  EXTENT_LOCKED);
> +	}
> +
> +	return ret;
>  }
>  
>  static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
> @@ -2773,39 +2780,50 @@ static int alloc_tree_block(struct btrfs_trans_handle *trans,
>  			    u64 search_end, struct btrfs_key *ins)
>  {
>  	int ret;
> +	u64 extent_size;
> +	struct btrfs_delayed_extent_op *extent_op;
> +	bool skinny_metadata = btrfs_fs_incompat(root->fs_info,
> +						 SKINNY_METADATA);
> +
> +	extent_op = btrfs_alloc_delayed_extent_op();
> +	if (!extent_op)
> +		return -ENOMEM;
> +
>  	ret = btrfs_reserve_extent(trans, root, num_bytes, empty_size,
>  				   hint_byte, search_end, ins, 0);
>  	BUG_ON(ret);
>  
> +	if (key)
> +		memcpy(&extent_op->key, key, sizeof(extent_op->key));
> +	else
> +		memset(&extent_op->key, 0, sizeof(extent_op->key));
> +	extent_op->flags_to_set = flags;
> +	extent_op->update_key = skinny_metadata ? false : true;
> +	extent_op->update_flags = true;
> +	extent_op->is_data = false;
> +	extent_op->level = level;
> +
> +	extent_size = ins->offset;
> +
> +	if (btrfs_fs_incompat(root->fs_info, SKINNY_METADATA)) {
> +		ins->offset = level;
> +		ins->type = BTRFS_METADATA_ITEM_KEY;
> +	}
> +
> +	/* Ensure this reserved extent is not found by the allocator */
>  	if (root_objectid == BTRFS_EXTENT_TREE_OBJECTID) {
> -		struct pending_extent_op *extent_op;
> -
> -		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
> -		BUG_ON(!extent_op);
> -
> -		extent_op->type = PENDING_EXTENT_INSERT;
> -		extent_op->bytenr = ins->objectid;
> -		extent_op->num_bytes = ins->offset;
> -		extent_op->level = level;
> -		extent_op->flags = flags;
> -		memcpy(&extent_op->key, key, sizeof(*key));
> -
> -		set_extent_bits(&root->fs_info->extent_ins, ins->objectid,
> -				ins->objectid + ins->offset - 1,
> -				EXTENT_LOCKED);
> -		set_state_private(&root->fs_info->extent_ins,
> -				  ins->objectid, (unsigned long)extent_op);
> -	} else {
> -		if (btrfs_fs_incompat(root->fs_info, SKINNY_METADATA)) {
> -			ins->offset = level;
> -			ins->type = BTRFS_METADATA_ITEM_KEY;
> -		}
> -		ret = alloc_reserved_tree_block(trans, root, root_objectid,
> -						generation, flags,
> -						key, level, ins);
> -		finish_current_insert(trans);
> -		del_pending_extents(trans);
> +		ret = set_extent_bits(&trans->fs_info->extent_ins,
> +				      ins->objectid,
> +				      ins->objectid + extent_size - 1,
> +				      EXTENT_LOCKED);
> +
> +		BUG_ON(ret);
>  	}
> +
> +	ret = btrfs_add_delayed_tree_ref(root->fs_info, trans, ins->objectid,
> +					 extent_size, 0, root_objectid,
> +					 level, BTRFS_ADD_DELAYED_EXTENT,
> +					 extent_op, NULL, NULL);
>  	return ret;
>  }
>  
> @@ -3330,11 +3348,6 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
>  				sizeof(cache->item));
>  	BUG_ON(ret);
>  
> -	ret = finish_current_insert(trans);
> -	BUG_ON(ret);
> -	ret = del_pending_extents(trans);
> -	BUG_ON(ret);
> -
>  	return 0;
>  }
>  
> @@ -3430,10 +3443,6 @@ int btrfs_make_block_groups(struct btrfs_trans_handle *trans,
>  					sizeof(cache->item));
>  		BUG_ON(ret);
>  
> -		finish_current_insert(trans);
> -		ret = del_pending_extents(trans);
> -		BUG_ON(ret);
> -
>  		cur_start = cache->key.objectid + cache->key.offset;
>  	}
>  	return 0;
> @@ -3815,14 +3824,9 @@ int btrfs_fix_block_accounting(struct btrfs_trans_handle *trans)
>  	struct btrfs_fs_info *fs_info = trans->fs_info;
>  	struct btrfs_root *root = fs_info->extent_root;
>  
> -	while(extent_root_pending_ops(fs_info)) {
> -		ret = finish_current_insert(trans);
> -		if (ret)
> -			return ret;
> -		ret = del_pending_extents(trans);
> -		if (ret)
> -			return ret;
> -	}
> +	ret = btrfs_run_delayed_refs(trans, -1);
> +	if (ret)
> +		return ret;
>  
>  	while(1) {
>  		cache = btrfs_lookup_first_block_group(fs_info, start);
> @@ -4027,7 +4031,7 @@ static int __btrfs_record_file_extent(struct btrfs_trans_handle *trans,
>  		} else if (ret != -EEXIST) {
>  			goto fail;
>  		}
> -		btrfs_extent_post_op(trans);
> +		btrfs_run_delayed_refs(trans, -1);
>  		extent_bytenr = disk_bytenr;
>  		extent_num_bytes = num_bytes;
>  		extent_offset = 0;
> diff --git a/transaction.c b/transaction.c
> index ecafbb156610..b2d613ee88d0 100644
> --- a/transaction.c
> +++ b/transaction.c
> @@ -98,6 +98,17 @@ int commit_tree_roots(struct btrfs_trans_handle *trans,
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * If the above CoW is the first one to dirty the current tree_root,
> +	 * delayed refs for it won't be run until after this function has
> +	 * finished executing, meaning we won't process the extent tree root,
> +	 * which will have been added to ->dirty_cowonly_roots.  So run
> +	 * delayed refs here as well.
> +	 */
> +	ret = btrfs_run_delayed_refs(trans, -1);
> +	if (ret)
> +		return ret;
> +
>  	while(!list_empty(&fs_info->dirty_cowonly_roots)) {
>  		next = fs_info->dirty_cowonly_roots.next;
>  		list_del_init(next);
> @@ -147,6 +158,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>  
>  	if (trans->fs_info->transaction_aborted)
>  		return -EROFS;
> +	/*
> +	 * Flush all accumulated delayed refs so that root-tree updates are
> +	 * consistent
> +	 */
> +	ret = btrfs_run_delayed_refs(trans, -1);
> +	BUG_ON(ret);
>  
>  	if (root->commit_root == root->node)
>  		goto commit_tree;
> @@ -164,9 +181,16 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>  	ret = btrfs_update_root(trans, root->fs_info->tree_root,
>  				&root->root_key, &root->root_item);
>  	BUG_ON(ret);
> +
>  commit_tree:
>  	ret = commit_tree_roots(trans, fs_info);
>  	BUG_ON(ret);
> +	/*
> +	 * Ensure that all comitted roots are properly accounted in the
> +	 * extent tree
> +	 */
> +	ret = btrfs_run_delayed_refs(trans, -1);
> +	BUG_ON(ret);

Is a call to "btrfs_write_dirty_block_groups(trans, root);" needed here,
since the btrfs_run_delayed_refs() above may update the block group cache?

[long explanation] 

I observed that fsck-tests/020 fails in low-mem mode on the current devel
branch, i.e.

  $ make test-fsck TEST_ENABLE_OVERRIDE=true TEST_ARGS_CHECK=--mode=lowmem TEST=020\*

fails, and the log indicates a mismatch of the used value in the block group item:

=====
<snip>
  [2/7] checking extents   
  ERROR: block group[4194304 8388608] used 20480 but extent items used 24576
  ERROR: block group[20971520 16777216] used 659456 but extent items used 655360
<snip>
=====

I found that it works fine before this commit.
It turned out that "btrfs-image -r" actually triggers the problem: it modifies
DEV_ITEM in fixup_devices() and commits the transaction, and that commit fails
to write the block group cache before __commit_transaction() for
tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img.

(The used-value check of the block group item exists only in low-mem mode, so
original mode does not complain.)

With "btrfs_write_dirty_block_groups()" added, I don't see any failures in
either original or low-mem mode (in all fsck tests).
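
To illustrate the ordering problem, here is a minimal toy model in C. It is
only a sketch: every toy_* name is hypothetical and none of this is
btrfs-progs code. It assumes that running delayed refs updates only the
in-memory block group cache, that a separate write step copies the cached
value into the on-disk block group item, and that committing the roots queues
one more 4096-byte reference after the last write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins only; these are not the real btrfs-progs structures. */
struct toy_block_group {
	uint64_t cached_used;	/* value held in the in-memory cache */
	uint64_t on_disk_used;	/* value stored in the block group item */
	bool dirty;		/* cache differs from the on-disk item */
};

struct toy_trans {
	struct toy_block_group bg;
	uint64_t pending_bytes;		/* bytes queued as delayed refs */
	uint64_t extent_items_used;	/* sum of extent items in the tree */
};

/* Models running delayed refs: touches the extent tree and the cache only. */
static void toy_run_delayed_refs(struct toy_trans *t)
{
	if (!t->pending_bytes)
		return;
	t->extent_items_used += t->pending_bytes;
	t->bg.cached_used += t->pending_bytes;
	t->bg.dirty = true;
	t->pending_bytes = 0;
}

/* Models writing dirty block groups: copies the cache to the on-disk item. */
static void toy_write_dirty_block_groups(struct toy_trans *t)
{
	if (t->bg.dirty) {
		t->bg.on_disk_used = t->bg.cached_used;
		t->bg.dirty = false;
	}
}

/* Models the low-mem check: on-disk "used" must match the extent items. */
static bool toy_check_block_group(const struct toy_trans *t)
{
	return t->bg.on_disk_used == t->extent_items_used;
}

/*
 * Models the tail of a commit. Returns true if the block group item is
 * consistent with the extent items once the commit is done.
 */
static bool toy_commit(struct toy_trans *t, bool write_after_final_run)
{
	assert(t);
	toy_run_delayed_refs(t);
	toy_write_dirty_block_groups(t);

	/* Assumption: committing the roots queues one more 4096-byte ref. */
	t->pending_bytes += 4096;

	/* The final run updates the cache again ... */
	toy_run_delayed_refs(t);
	/* ... so the item goes stale unless the cache is written once more. */
	if (write_after_final_run)
		toy_write_dirty_block_groups(t);

	return toy_check_block_group(t);
}
```

In this model, toy_commit() with write_after_final_run set to false leaves the
on-disk value one block behind the extent items, which is the same kind of
mismatch as in the log above. With it set to true the two values agree.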

Thanks,
Misono

>  	ret = __commit_transaction(trans, root);
>  	BUG_ON(ret);
>  	write_ctree_super(trans);
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure
  2018-06-08 12:47 ` [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure Nikolay Borisov
  2018-06-08 14:53   ` [PATCH 11/15 v2] " Nikolay Borisov
  2018-06-11  5:20   ` [PATCH 11/15] " Qu Wenruo
@ 2018-07-30  8:34   ` Misono Tomohiro
  2018-07-30  9:11     ` Nikolay Borisov
  2018-08-02 12:17     ` David Sterba
  2 siblings, 2 replies; 46+ messages in thread
From: Misono Tomohiro @ 2018-07-30  8:34 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs

On 2018/06/08 21:47, Nikolay Borisov wrote:
> This commit pulls those portions of the kernel implementation of
> delayed refs which are necessary to have them working in user-space.
> I've done the following modifications:
> 
> 1. Replaced all kmem_cache_alloc calls to kmalloc.
> 
> 2. Removed all locking-related code, since we are single threaded in
> userspace.
> 
> 3. Removed code which deals with data refs - delayed refs in user space
> are going to be used only for cowonly trees.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  Makefile      |   3 +-
>  ctree.h       |   3 +
>  delayed-ref.c | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  delayed-ref.h | 225 ++++++++++++++++++++++
>  extent-tree.c | 228 ++++++++++++++++++++++
>  kerncompat.h  |   8 +
>  transaction.h |   4 +
>  7 files changed, 1078 insertions(+), 1 deletion(-)
>  create mode 100644 delayed-ref.c
>  create mode 100644 delayed-ref.h
> 
> diff --git a/Makefile b/Makefile
> index 544410e6440c..9508ad4f11e6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
>  	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
>  	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
>  	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
> -	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
> +	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
> +	  delayed-ref.o
>  cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
>  	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
>  	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
> diff --git a/ctree.h b/ctree.h
> index b30a946658ce..d1ea45571d1e 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2812,4 +2812,7 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
>  int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
>  		    char *dest);
>  
> +
> +/* extent-tree.c */
> +int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr);
>  #endif
> diff --git a/delayed-ref.c b/delayed-ref.c
> new file mode 100644
> index 000000000000..f3fa50239380
> --- /dev/null
> +++ b/delayed-ref.c
> @@ -0,0 +1,608 @@

<snip>

> +
> +static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
> +				    struct btrfs_delayed_ref_root *delayed_refs,
> +				    struct btrfs_delayed_ref_head *head,
> +				    struct btrfs_delayed_ref_node *ref)
> +{
> +	rb_erase(&ref->ref_node, &head->ref_tree);
> +	RB_CLEAR_NODE(&ref->ref_node);
> +	if (!list_empty(&ref->add_list))
> +		list_del(&ref->add_list);
> +	ref->in_tree = 0;
> +	btrfs_put_delayed_ref(ref);

Compared with the kernel code, it seems that we need

        delayed_refs->num_entries--;

> +	if (trans->delayed_ref_updates)
> +		trans->delayed_ref_updates--;
> +}

> +static noinline struct btrfs_delayed_ref_head *
> +add_delayed_ref_head(struct btrfs_trans_handle *trans,
> +		     struct btrfs_delayed_ref_head *head_ref,
> +		     void *qrecord,
> +		     int action, int *qrecord_inserted_ret,
> +		     int *old_ref_mod, int *new_ref_mod)
> +{
> +	struct btrfs_delayed_ref_head *existing;
> +	struct btrfs_delayed_ref_root *delayed_refs;
> +
> +	delayed_refs = &trans->delayed_refs;
> +
> +	existing = htree_insert(&delayed_refs->href_root, &head_ref->href_node);
> +	if (existing) {
> +		update_existing_head_ref(delayed_refs, existing, head_ref, old_ref_mod);
> +		/*
> +		 * we've updated the existing ref, free the newly
> +		 * allocated ref
> +		 */
> +		kfree(head_ref);
> +		head_ref = existing;
> +	} else {
> +		if (old_ref_mod)
> +			*old_ref_mod = 0;
> +		delayed_refs->num_heads++;
> +		delayed_refs->num_heads_ready++;

And
                delayed_refs->num_entries++;

to correctly count num_entries.
(I noticed that num_entries went negative while I was running under gdb.)

However, num_entries is actually not used in progs at all (it is used for
throttling in the kernel), so maybe we can just drop the variable from progs?
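
As a rough illustration of why the counter drifts, here is a toy sketch in C.
All toy_* names are hypothetical and this is not the delayed-ref code itself.
It assumes that some removal path decrements the counter while the add path
may or may not increment it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the delayed-ref bookkeeping counters. */
struct toy_delayed_ref_root {
	long num_entries;		/* signed so an imbalance is visible */
	unsigned long num_heads;
};

/* Add a head; count_entry selects whether num_entries is bumped too. */
static void toy_add_head(struct toy_delayed_ref_root *r, bool count_entry)
{
	r->num_heads++;
	if (count_entry)
		r->num_entries++;
}

/* Remove a head; this path always decrements num_entries. */
static void toy_remove_head(struct toy_delayed_ref_root *r)
{
	assert(r->num_heads > 0);
	r->num_heads--;
	r->num_entries--;
}

/* Add and then remove 'heads' heads, returning the final num_entries. */
static long toy_entries_after(unsigned int heads, bool count_entry)
{
	struct toy_delayed_ref_root r = { 0, 0 };
	unsigned int i;

	for (i = 0; i < heads; i++)
		toy_add_head(&r, count_entry);
	for (i = 0; i < heads; i++)
		toy_remove_head(&r);
	return r.num_entries;
}
```

When every add is matched by an increment, the counter returns to zero. When
the increment is missing, it ends up below zero by the number of heads
processed. Dropping the variable altogether avoids having to keep the two
paths in sync.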

> +		trans->delayed_ref_updates++;
> +	}
> +	if (new_ref_mod)
> +		*new_ref_mod = head_ref->total_ref_mod;
> +
> +	return head_ref;
> +}

<snip>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure
  2018-07-30  8:34   ` Misono Tomohiro
@ 2018-07-30  9:11     ` Nikolay Borisov
  2018-08-02 12:17     ` David Sterba
  1 sibling, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-07-30  9:11 UTC (permalink / raw)
  To: Misono Tomohiro, linux-btrfs



On 30.07.2018 11:34, Misono Tomohiro wrote:
> On 2018/06/08 21:47, Nikolay Borisov wrote:
>> This commit pulls those portions of the kernel implementation of
>> delayed refs which are necessary to have them working in user-space.
>> I've done the following modifications:
>>
>> 1. Replaced all kmem_cache_alloc calls to kmalloc.
>>
>> 2. Removed all locking-related code, since we are single threaded in
>> userspace.
>>
>> 3. Removed code which deals with data refs - delayed refs in user space
>> are going to be used only for cowonly trees.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  Makefile      |   3 +-
>>  ctree.h       |   3 +
>>  delayed-ref.c | 608 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  delayed-ref.h | 225 ++++++++++++++++++++++
>>  extent-tree.c | 228 ++++++++++++++++++++++
>>  kerncompat.h  |   8 +
>>  transaction.h |   4 +
>>  7 files changed, 1078 insertions(+), 1 deletion(-)
>>  create mode 100644 delayed-ref.c
>>  create mode 100644 delayed-ref.h
>>
>> diff --git a/Makefile b/Makefile
>> index 544410e6440c..9508ad4f11e6 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
>>  	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
>>  	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
>>  	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
>> -	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
>> +	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
>> +	  delayed-ref.o
>>  cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
>>  	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
>>  	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
>> diff --git a/ctree.h b/ctree.h
>> index b30a946658ce..d1ea45571d1e 100644
>> --- a/ctree.h
>> +++ b/ctree.h
>> @@ -2812,4 +2812,7 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
>>  int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
>>  		    char *dest);
>>  
>> +
>> +/* extent-tree.c */
>> +int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr);
>>  #endif
>> diff --git a/delayed-ref.c b/delayed-ref.c
>> new file mode 100644
>> index 000000000000..f3fa50239380
>> --- /dev/null
>> +++ b/delayed-ref.c
>> @@ -0,0 +1,608 @@
> 
> <snip>
> 
>> +
>> +static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
>> +				    struct btrfs_delayed_ref_root *delayed_refs,
>> +				    struct btrfs_delayed_ref_head *head,
>> +				    struct btrfs_delayed_ref_node *ref)
>> +{
>> +	rb_erase(&ref->ref_node, &head->ref_tree);
>> +	RB_CLEAR_NODE(&ref->ref_node);
>> +	if (!list_empty(&ref->add_list))
>> +		list_del(&ref->add_list);
>> +	ref->in_tree = 0;
>> +	btrfs_put_delayed_ref(ref);
> 
> Compared with kernel code, it seems that we need
> 
>         delayed_refs->num_entries--;
> 
>> +	if (trans->delayed_ref_updates)
>> +		trans->delayed_ref_updates--;
>> +}
> 
>> +static noinline struct btrfs_delayed_ref_head *
>> +add_delayed_ref_head(struct btrfs_trans_handle *trans,
>> +		     struct btrfs_delayed_ref_head *head_ref,
>> +		     void *qrecord,
>> +		     int action, int *qrecord_inserted_ret,
>> +		     int *old_ref_mod, int *new_ref_mod)
>> +{
>> +	struct btrfs_delayed_ref_head *existing;
>> +	struct btrfs_delayed_ref_root *delayed_refs;
>> +
>> +	delayed_refs = &trans->delayed_refs;
>> +
>> +	existing = htree_insert(&delayed_refs->href_root, &head_ref->href_node);
>> +	if (existing) {
>> +		update_existing_head_ref(delayed_refs, existing, head_ref, old_ref_mod);
>> +		/*
>> +		 * we've updated the existing ref, free the newly
>> +		 * allocated ref
>> +		 */
>> +		kfree(head_ref);
>> +		head_ref = existing;
>> +	} else {
>> +		if (old_ref_mod)
>> +			*old_ref_mod = 0;
>> +		delayed_refs->num_heads++;
>> +		delayed_refs->num_heads_ready++;
> 
> And
>                 delayed_refs->num_entries++;
> 
> to correctly count num_entries.
> (I noticed that num_entries went to negative value when I'm running gdb)
> 
> However, num_entries is actually not used in progs at all (it is used for
> throttling in kernel), so maybe we can just drop the variable from progs?

I agree, I will fix it in the next respin. But for now I will wait for more
reviews.

> 
>> +		trans->delayed_ref_updates++;
>> +	}
>> +	if (new_ref_mod)
>> +		*new_ref_mod = head_ref->total_ref_mod;
>> +
>> +	return head_ref;
>> +}
> 
> <snip>
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/15] btrfs-progs: Wire up delayed refs
  2018-07-30  8:33   ` Misono Tomohiro
@ 2018-07-30  9:30     ` Nikolay Borisov
  0 siblings, 0 replies; 46+ messages in thread
From: Nikolay Borisov @ 2018-07-30  9:30 UTC (permalink / raw)
  To: Misono Tomohiro, linux-btrfs



On 30.07.2018 11:33, Misono Tomohiro wrote:
> On 2018/06/08 21:47, Nikolay Borisov wrote:
>> This commit enables the delayed refs infrastructures. This entails doing
>> the following:
>>
>> 1. Replacing existing calls of btrfs_extent_post_op (which is the
>> equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
>> As well as eliminating open-coded calls to finish_current_insert and
>> del_pending_extents which execute the delayed ops.
>>
>> 2. Wiring up the addition of delayed refs when freeing extents
>> (btrfs_free_extent) and when adding new extents (alloc_tree_block).
>>
>> 3. Adding calls to btrfs_run_delayed refs in the transaction commit
>> path alongside comments why every call is needed, since it's not always
>> obvious (those call sites were derived empirically by running and
>> debugging existing tests)
>>
>> 4. Correctly flagging the transaction in which we are reinitialising
>> the extent tree.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  check/main.c  |   3 +-
>>  extent-tree.c | 166 ++++++++++++++++++++++++++++++----------------------------
>>  transaction.c |  24 +++++++++
>>  3 files changed, 111 insertions(+), 82 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index b84903acdb25..7c9689f29fd3 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -8634,7 +8634,7 @@ static int reinit_extent_tree(struct btrfs_trans_handle *trans,
>>  			fprintf(stderr, "Error adding block group\n");
>>  			return ret;
>>  		}
>> -		btrfs_extent_post_op(trans);
>> +		btrfs_run_delayed_refs(trans, -1);
>>  	}
>>  
>>  	ret = reset_balance(trans, fs_info);
>> @@ -9682,6 +9682,7 @@ int cmd_check(int argc, char **argv)
>>  			goto close_out;
>>  		}
>>  
>> +		trans->reinit_extent_tree = true;
>>  		if (init_extent_tree) {
>>  			printf("Creating a new extent tree\n");
>>  			ret = reinit_extent_tree(trans, info,
>> diff --git a/extent-tree.c b/extent-tree.c
>> index 3208ed11cb91..9d085158f2d8 100644
>> --- a/extent-tree.c
>> +++ b/extent-tree.c
>> @@ -1418,8 +1418,6 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>  		err = ret;
>>  out:
>>  	btrfs_free_path(path);
>> -	finish_current_insert(trans);
>> -	del_pending_extents(trans);
>>  	BUG_ON(err);
>>  	return err;
>>  }
>> @@ -1602,8 +1600,6 @@ int btrfs_set_block_flags(struct btrfs_trans_handle *trans, u64 bytenr,
>>  	btrfs_set_extent_flags(l, item, flags);
>>  out:
>>  	btrfs_free_path(path);
>> -	finish_current_insert(trans);
>> -	del_pending_extents(trans);
>>  	return ret;
>>  }
>>  
>> @@ -1701,7 +1697,6 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
>>  				 struct btrfs_block_group_cache *cache)
>>  {
>>  	int ret;
>> -	int pending_ret;
>>  	struct btrfs_root *extent_root = trans->fs_info->extent_root;
>>  	unsigned long bi;
>>  	struct extent_buffer *leaf;
>> @@ -1717,12 +1712,8 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
>>  	btrfs_mark_buffer_dirty(leaf);
>>  	btrfs_release_path(path);
>>  fail:
>> -	finish_current_insert(trans);
>> -	pending_ret = del_pending_extents(trans);
>>  	if (ret)
>>  		return ret;
>> -	if (pending_ret)
>> -		return pending_ret;
>>  	return 0;
>>  
>>  }
>> @@ -2050,6 +2041,7 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
>>  	int skinny_metadata =
>>  		btrfs_fs_incompat(extent_root->fs_info, SKINNY_METADATA);
>>  
>> +
>>  	while(1) {
>>  		ret = find_first_extent_bit(&info->extent_ins, 0, &start,
>>  					    &end, EXTENT_LOCKED);
>> @@ -2081,6 +2073,8 @@ static int finish_current_insert(struct btrfs_trans_handle *trans)
>>  			BUG_ON(1);
>>  		}
>>  
>> +
>> +		printf("shouldn't be executed\n");
>>  		clear_extent_bits(&info->extent_ins, start, end, EXTENT_LOCKED);
>>  		kfree(extent_op);
>>  	}
>> @@ -2380,7 +2374,6 @@ static int __free_extent(struct btrfs_trans_handle *trans,
>>  	}
>>  fail:
>>  	btrfs_free_path(path);
>> -	finish_current_insert(trans);
>>  	return ret;
>>  }
>>  
>> @@ -2463,33 +2456,30 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
>>  		      u64 bytenr, u64 num_bytes, u64 parent,
>>  		      u64 root_objectid, u64 owner, u64 offset)
>>  {
>> -	struct btrfs_root *extent_root = root->fs_info->extent_root;
>> -	int pending_ret;
>>  	int ret;
>>  
>>  	WARN_ON(num_bytes < root->fs_info->sectorsize);
>> -	if (root == extent_root) {
>> -		struct pending_extent_op *extent_op;
>> -
>> -		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
>> -		BUG_ON(!extent_op);
>> -
>> -		extent_op->type = PENDING_EXTENT_DELETE;
>> -		extent_op->bytenr = bytenr;
>> -		extent_op->num_bytes = num_bytes;
>> -		extent_op->level = (int)owner;
>> -
>> -		set_extent_bits(&root->fs_info->pending_del,
>> -				bytenr, bytenr + num_bytes - 1,
>> -				EXTENT_LOCKED);
>> -		set_state_private(&root->fs_info->pending_del,
>> -				  bytenr, (unsigned long)extent_op);
>> -		return 0;
>> +	/*
>> +	 * tree log blocks never actually go into the extent allocation
>> +	 * tree, just update pinning info and exit early.
>> +	 */
>> +	if (root_objectid == BTRFS_TREE_LOG_OBJECTID) {
>> +		printf("PINNING EXTENTS IN LOG TREE\n");
>> +		WARN_ON(owner >= BTRFS_FIRST_FREE_OBJECTID);
>> +		btrfs_pin_extent(trans->fs_info, bytenr, num_bytes);
>> +		ret = 0;
>> +	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
>> +		BUG_ON(offset);
>> +		ret = btrfs_add_delayed_tree_ref(trans->fs_info, trans,
>> +						 bytenr, num_bytes, parent,
>> +						 root_objectid, (int)owner,
>> +						 BTRFS_DROP_DELAYED_REF,
>> +						 NULL, NULL, NULL);
>> +	} else {
>> +		ret = __free_extent(trans, bytenr, num_bytes, parent,
>> +				    root_objectid, owner, offset, 1);
>>  	}
>> -	ret = __free_extent(trans, root, bytenr, num_bytes, parent,
>> -			    root_objectid, owner, offset, 1);
>> -	pending_ret = del_pending_extents(trans);
>> -	return ret ? ret : pending_ret;
>> +	return ret;
>>  }
>>  
>>  static u64 stripe_align(struct btrfs_root *root, u64 val)
>> @@ -2695,6 +2685,8 @@ static int alloc_reserved_tree_block2(struct btrfs_trans_handle *trans,
>>  	struct btrfs_delayed_tree_ref *ref = btrfs_delayed_node_to_tree_ref(node);
>>  	struct btrfs_key ins;
>>  	bool skinny_metadata = btrfs_fs_incompat(trans->fs_info, SKINNY_METADATA);
>> +	int ret;
>> +	u64 start, end;
>>  
>>  	ins.objectid = node->bytenr;
>>  	if (skinny_metadata) {
>> @@ -2705,10 +2697,25 @@ static int alloc_reserved_tree_block2(struct btrfs_trans_handle *trans,
>>  		ins.type = BTRFS_EXTENT_ITEM_KEY;
>>  	}
>>  
>> -	return alloc_reserved_tree_block(trans, ref->root, trans->transid,
>> -					 extent_op->flags_to_set,
>> -					 &extent_op->key, ref->level, &ins);
>> +	if (ref->root == BTRFS_EXTENT_TREE_OBJECTID) {
>> +		ret = find_first_extent_bit(&trans->fs_info->extent_ins,
>> +					    node->bytenr, &start, &end,
>> +					    EXTENT_LOCKED);
>> +		ASSERT(!ret);
>> +		ASSERT(start == node->bytenr);
>> +		ASSERT(end == node->bytenr + node->num_bytes - 1);
>> +	}
>> +
>> +	ret = alloc_reserved_tree_block(trans, ref->root, trans->transid,
>> +					extent_op->flags_to_set,
>> +					&extent_op->key, ref->level, &ins);
>>  
>> +	if (ref->root == BTRFS_EXTENT_TREE_OBJECTID) {
>> +		clear_extent_bits(&trans->fs_info->extent_ins, start, end,
>> +				  EXTENT_LOCKED);
>> +	}
>> +
>> +	return ret;
>>  }
>>  
>>  static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
>> @@ -2773,39 +2780,50 @@ static int alloc_tree_block(struct btrfs_trans_handle *trans,
>>  			    u64 search_end, struct btrfs_key *ins)
>>  {
>>  	int ret;
>> +	u64 extent_size;
>> +	struct btrfs_delayed_extent_op *extent_op;
>> +	bool skinny_metadata = btrfs_fs_incompat(root->fs_info,
>> +						 SKINNY_METADATA);
>> +
>> +	extent_op = btrfs_alloc_delayed_extent_op();
>> +	if (!extent_op)
>> +		return -ENOMEM;
>> +
>>  	ret = btrfs_reserve_extent(trans, root, num_bytes, empty_size,
>>  				   hint_byte, search_end, ins, 0);
>>  	BUG_ON(ret);
>>  
>> +	if (key)
>> +		memcpy(&extent_op->key, key, sizeof(extent_op->key));
>> +	else
>> +		memset(&extent_op->key, 0, sizeof(extent_op->key));
>> +	extent_op->flags_to_set = flags;
>> +	extent_op->update_key = skinny_metadata ? false : true;
>> +	extent_op->update_flags = true;
>> +	extent_op->is_data = false;
>> +	extent_op->level = level;
>> +
>> +	extent_size = ins->offset;
>> +
>> +	if (btrfs_fs_incompat(root->fs_info, SKINNY_METADATA)) {
>> +		ins->offset = level;
>> +		ins->type = BTRFS_METADATA_ITEM_KEY;
>> +	}
>> +
>> +	/* Ensure this reserved extent is not found by the allocator */
>>  	if (root_objectid == BTRFS_EXTENT_TREE_OBJECTID) {
>> -		struct pending_extent_op *extent_op;
>> -
>> -		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
>> -		BUG_ON(!extent_op);
>> -
>> -		extent_op->type = PENDING_EXTENT_INSERT;
>> -		extent_op->bytenr = ins->objectid;
>> -		extent_op->num_bytes = ins->offset;
>> -		extent_op->level = level;
>> -		extent_op->flags = flags;
>> -		memcpy(&extent_op->key, key, sizeof(*key));
>> -
>> -		set_extent_bits(&root->fs_info->extent_ins, ins->objectid,
>> -				ins->objectid + ins->offset - 1,
>> -				EXTENT_LOCKED);
>> -		set_state_private(&root->fs_info->extent_ins,
>> -				  ins->objectid, (unsigned long)extent_op);
>> -	} else {
>> -		if (btrfs_fs_incompat(root->fs_info, SKINNY_METADATA)) {
>> -			ins->offset = level;
>> -			ins->type = BTRFS_METADATA_ITEM_KEY;
>> -		}
>> -		ret = alloc_reserved_tree_block(trans, root, root_objectid,
>> -						generation, flags,
>> -						key, level, ins);
>> -		finish_current_insert(trans);
>> -		del_pending_extents(trans);
>> +		ret = set_extent_bits(&trans->fs_info->extent_ins,
>> +				      ins->objectid,
>> +				      ins->objectid + extent_size - 1,
>> +				      EXTENT_LOCKED);
>> +
>> +		BUG_ON(ret);
>>  	}
>> +
>> +	ret = btrfs_add_delayed_tree_ref(root->fs_info, trans, ins->objectid,
>> +					 extent_size, 0, root_objectid,
>> +					 level, BTRFS_ADD_DELAYED_EXTENT,
>> +					 extent_op, NULL, NULL);
>>  	return ret;
>>  }
>>  
>> @@ -3330,11 +3348,6 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
>>  				sizeof(cache->item));
>>  	BUG_ON(ret);
>>  
>> -	ret = finish_current_insert(trans);
>> -	BUG_ON(ret);
>> -	ret = del_pending_extents(trans);
>> -	BUG_ON(ret);
>> -
>>  	return 0;
>>  }
>>  
>> @@ -3430,10 +3443,6 @@ int btrfs_make_block_groups(struct btrfs_trans_handle *trans,
>>  					sizeof(cache->item));
>>  		BUG_ON(ret);
>>  
>> -		finish_current_insert(trans);
>> -		ret = del_pending_extents(trans);
>> -		BUG_ON(ret);
>> -
>>  		cur_start = cache->key.objectid + cache->key.offset;
>>  	}
>>  	return 0;
>> @@ -3815,14 +3824,9 @@ int btrfs_fix_block_accounting(struct btrfs_trans_handle *trans)
>>  	struct btrfs_fs_info *fs_info = trans->fs_info;
>>  	struct btrfs_root *root = fs_info->extent_root;
>>  
>> -	while(extent_root_pending_ops(fs_info)) {
>> -		ret = finish_current_insert(trans);
>> -		if (ret)
>> -			return ret;
>> -		ret = del_pending_extents(trans);
>> -		if (ret)
>> -			return ret;
>> -	}
>> +	ret = btrfs_run_delayed_refs(trans, -1);
>> +	if (ret)
>> +		return ret;
>>  
>>  	while(1) {
>>  		cache = btrfs_lookup_first_block_group(fs_info, start);
>> @@ -4027,7 +4031,7 @@ static int __btrfs_record_file_extent(struct btrfs_trans_handle *trans,
>>  		} else if (ret != -EEXIST) {
>>  			goto fail;
>>  		}
>> -		btrfs_extent_post_op(trans);
>> +		btrfs_run_delayed_refs(trans, -1);
>>  		extent_bytenr = disk_bytenr;
>>  		extent_num_bytes = num_bytes;
>>  		extent_offset = 0;
>> diff --git a/transaction.c b/transaction.c
>> index ecafbb156610..b2d613ee88d0 100644
>> --- a/transaction.c
>> +++ b/transaction.c
>> @@ -98,6 +98,17 @@ int commit_tree_roots(struct btrfs_trans_handle *trans,
>>  	if (ret)
>>  		return ret;
>>  
>> +	/*
>> +	 * If the above CoW is the first one to dirty the current tree_root,
>> +	 * delayed refs for it won't be run until after this function has
>> +	 * finished executing, meaning we won't process the extent tree root,
>> +	 * which will have been added to ->dirty_cowonly_roots.  So run
>> +	 * delayed refs here as well.
>> +	 */
>> +	ret = btrfs_run_delayed_refs(trans, -1);
>> +	if (ret)
>> +		return ret;
>> +
>>  	while(!list_empty(&fs_info->dirty_cowonly_roots)) {
>>  		next = fs_info->dirty_cowonly_roots.next;
>>  		list_del_init(next);
>> @@ -147,6 +158,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>  
>>  	if (trans->fs_info->transaction_aborted)
>>  		return -EROFS;
>> +	/*
>> +	 * Flush all accumulated delayed refs so that root-tree updates are
>> +	 * consistent
>> +	 */
>> +	ret = btrfs_run_delayed_refs(trans, -1);
>> +	BUG_ON(ret);
>>  
>>  	if (root->commit_root == root->node)
>>  		goto commit_tree;
>> @@ -164,9 +181,16 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>  	ret = btrfs_update_root(trans, root->fs_info->tree_root,
>>  				&root->root_key, &root->root_item);
>>  	BUG_ON(ret);
>> +
>>  commit_tree:
>>  	ret = commit_tree_roots(trans, fs_info);
>>  	BUG_ON(ret);
>> +	/*
>> +	 * Ensure that all committed roots are properly accounted in the
>> +	 * extent tree
>> +	 */
>> +	ret = btrfs_run_delayed_refs(trans, -1);
>> +	BUG_ON(ret);
> 
> Is "btrfs_write_dirty_block_groups(trans, root);" needed here
> since above run_delayed_refs() may update block_group_cache?

Yes, it is indeed. At the moment delayed refs support
freeing/allocating metadata blocks. So when running delayed refs we can
modify the in-memory state of the block groups via the following call
chain (in the alloc case; freeing is analogous):

run_delayed_refs
  alloc_reserved_tree_block
   update_block_group  <-- used space of block groups is modified.

So the block group state needs to be written after the final delayed ref
is run. As a matter of fact, I think btrfs_write_dirty_block_groups should
really be called once in btrfs_commit_transaction, i.e. the calls in
update_cowonly_roots could be lifted into either btrfs_commit_transaction
or __commit_transaction.

I will take this into account when sending v2 and will also run this test
to ensure we don't regress.

Thank you for the review.

> 
> [long explanation] 
> 
> I observed fsck-tests/020 fails with low-mem mode in current devel branch.
> i.e.
> 
>   $ make test-fsck TEST_ENABLE_OVERRIDE=true TEST_ARGS_CHECK=--mode=lowmem TEST=020\*
> 
> fails and log indicates mismatch of used value in block group item:
> 
> =====
> <snip>
>   [2/7] checking extents   
>   ERROR: block group[4194304 8388608] used 20480 but extent items used 24576
>   ERROR: block group[20971520 16777216] used 659456 but extent items used 655360
> <snip>
> =====
> 
> I found that it works fine before this commit.
> It turned out that "btrfs-image -r" actually causes the problem: it modifies
> DEV_ITEM in fixup_devices() and commits the transaction, which fails to write
> the block group cache before __commit_transaction() for
> tests/fsck-tests/020-extent/ref-cases/keyed_data_ref_with_shared_leaf.img.
> 
> (Used value check of block group item only exists in low-mem mode and therefore
> original mode does not complain.)
> 
> With "btrfs_write_dirty_block_groups()" I don't see any failure with both original
> and low-mem mode (in all fsck tests).
> 
> Thanks,
> Misono
> 
>>  	ret = __commit_transaction(trans, root);
>>  	BUG_ON(ret);
>>  	write_ctree_super(trans);
>>
> 
> 

* Re: [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure
  2018-07-30  8:34   ` Misono Tomohiro
  2018-07-30  9:11     ` Nikolay Borisov
@ 2018-08-02 12:17     ` David Sterba
  1 sibling, 0 replies; 46+ messages in thread
From: David Sterba @ 2018-08-02 12:17 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: Nikolay Borisov, linux-btrfs

On Mon, Jul 30, 2018 at 05:34:54PM +0900, Misono Tomohiro wrote:
> to correctly count num_entries.
> (I noticed that num_entries went to negative value when I'm running gdb)
> 
> However, num_entries is actually not used in progs at all (it is used for
> throttling in kernel), so maybe we can just drop the variable from progs?

The long-term goal is to use the same source for kernel and progs where
possible and where it does not cause trouble. We can add stubs for
spinlocks, atomics etc and could live with a few variables that are not
used in progs.

Right now I'm ok with both ways, i.e. either taking the kernel code as-is
and adding stubs when that is easier, or dropping the unused bits now when
porting to progs.
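
As a rough sketch of what such stubs could look like in a single-threaded
userspace build (all toy_* names below are made up for illustration and are
not the actual btrfs-progs compatibility layer): the lock becomes a no-op and
the atomic counter becomes a plain int, so a counter such as num_entries can
stay in shared code at very little cost.

```c
#include <assert.h>

/* Hypothetical no-op spinlock for single-threaded userspace. */
typedef struct { int unused; } toy_spinlock_t;

static inline void toy_spin_lock_init(toy_spinlock_t *lock) { (void)lock; }
static inline void toy_spin_lock(toy_spinlock_t *lock) { (void)lock; }
static inline void toy_spin_unlock(toy_spinlock_t *lock) { (void)lock; }

/* Hypothetical atomic counter backed by a plain int. */
typedef struct { int counter; } toy_atomic_t;

static inline void toy_atomic_inc(toy_atomic_t *v) { v->counter++; }
static inline void toy_atomic_dec(toy_atomic_t *v) { v->counter--; }
static inline int toy_atomic_read(const toy_atomic_t *v) { return v->counter; }

/* Toy delayed-ref root carrying a counter that userspace never consults. */
struct toy_delayed_ref_root {
	toy_spinlock_t lock;
	toy_atomic_t num_entries;
};

static void toy_add_ref(struct toy_delayed_ref_root *root)
{
	toy_spin_lock(&root->lock);
	toy_atomic_inc(&root->num_entries);
	toy_spin_unlock(&root->lock);
}

static void toy_drop_ref(struct toy_delayed_ref_root *root)
{
	toy_spin_lock(&root->lock);
	/* Catch unbalanced accounting instead of letting it go negative. */
	assert(toy_atomic_read(&root->num_entries) > 0);
	toy_atomic_dec(&root->num_entries);
	toy_spin_unlock(&root->lock);
}
```

The assert in toy_drop_ref() is an addition for this sketch only; it would
flag the kind of negative num_entries value mentioned above.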

* Re: [PATCH 00/15] Add delayed-refs support to btrfs-progs
  2018-07-16 15:39 ` David Sterba
@ 2018-09-12 11:51   ` Su Yue
  2018-09-12 18:02     ` David Sterba
  0 siblings, 1 reply; 46+ messages in thread
From: Su Yue @ 2018-09-12 11:51 UTC (permalink / raw)
  To: dsterba, Nikolay Borisov, linux-btrfs; +Cc: Su Yue



On 2018/7/16 11:39 PM, David Sterba wrote:
> On Fri, Jun 08, 2018 at 03:47:43PM +0300, Nikolay Borisov wrote:
>> Hello,
>>
>> Here is a series which adds support for delayed refs. This is needed to enable
>> later work on adding freespace tree repair code. Additionally, it results in
>> more code sharing between kernel/user space.
>>
>> Patches 1-9 are simple prep patches removing some arguments that cause problems
>> later. They can go independently of the delayed refs work. They don't introduce
>> any functional changes. Next, patches 10-13 introduce the needed infrastructure
>> for delayed refs without actually activating it. Patch 14 finally wires it
>> up by adding the necessary call outs to btrfs_run_delayed_refs and reworking the
>> extent addition/freeing functions. With all of this done, patch 15 finally
>> removes the old code.
>>
>> This series passes all btrfs progs fsck and misc tests + fuzz tests apart from
>> fuzz-003/007/009 - but those fail without this series so it's unlikely it's
>> caused by it.
>>
>> Nikolay Borisov (15):
>>    btrfs-progs: Remove root argument from pin_down_bytes
>>    btrfs-progs: Remove root argument from btrfs_del_csums
>>    btrfs-progs: Add functions to modify the used space by a root
>>    btrfs-progs: Refactor the root used bytes are updated
>>    btrfs-progs: Make update_block_group take fs_info instead of root
>>    btrfs-progs: check: Drop trans/root arguments from free_extent_hook
>>    btrfs-progs: Remove root argument from __free_extent
>>    btrfs-progs: Remove root argument from alloc_reserved_tree_block
>>    btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent
>>      for btree blocks.
>>    btrfs-progs: Add boolean to signal whether we are re-initing extent
>>      tree
>>    btrfs-progs: Add delayed refs infrastructure
>>    btrfs-progs: Add __free_extent2 function
>>    btrfs-progs: Add alloc_reserved_tree_block2 function
>>    btrfs-progs: Wire up delayed refs
>>    btrfs-progs: Remove old delayed refs infrastructure
> 
> Added to devel. There were some patch-to-patch compilation issues,
> function alloc_reserved_tree_block2 used earlier than defined so I
> reordered the patches to fix that.
> 
> The CI fails at test check/020-extent-ref-cases but it works on my
> machine so it's not caused by the patchset.

Hi, David
Actually, kdave/devel still fails at fsck-tests/020 because of
version 1 of this patchset. Please see the thread
https://www.spinics.net/lists/linux-btrfs/msg81675.html

Nikolay's V2 patchset should solve the problem.
You may already know about the situation; this mail is just a gentle reminder :).

Thanks,
Su

* Re: [PATCH 00/15] Add delayed-refs support to btrfs-progs
  2018-09-12 11:51   ` Su Yue
@ 2018-09-12 18:02     ` David Sterba
  0 siblings, 0 replies; 46+ messages in thread
From: David Sterba @ 2018-09-12 18:02 UTC (permalink / raw)
  To: Su Yue; +Cc: dsterba, Nikolay Borisov, linux-btrfs, Su Yue

On Wed, Sep 12, 2018 at 07:51:39PM +0800, Su Yue wrote:
> Actually, kdave/devel still fails at fsck-tests/020 because of
> version 1 of this patchset. Please see the thread
> https://www.spinics.net/lists/linux-btrfs/msg81675.html
> 
> Nikolay's V2 patchset should solve the problem.
> You may already know about the situation; this mail is just a gentle reminder :).

I'll replace the patches today, thanks.

end of thread

Thread overview: 46+ messages
2018-06-08 12:47 [PATCH 00/15] Add delayed-refs support to btrfs-progs Nikolay Borisov
2018-06-08 12:47 ` [PATCH 01/15] btrfs-progs: Remove root argument from pin_down_bytes Nikolay Borisov
2018-06-11  4:41   ` Qu Wenruo
2018-06-08 12:47 ` [PATCH 02/15] btrfs-progs: Remove root argument from btrfs_del_csums Nikolay Borisov
2018-06-11  4:46   ` Qu Wenruo
2018-06-11  7:02     ` Nikolay Borisov
2018-06-11  7:40       ` Qu Wenruo
2018-06-11  7:48         ` Nikolay Borisov
2018-06-11  8:08           ` Qu Wenruo
2018-06-11  8:09             ` Nikolay Borisov
2018-06-08 12:47 ` [PATCH 03/15] btrfs-progs: Add functions to modify the used space by a root Nikolay Borisov
2018-06-11  4:47   ` Qu Wenruo
2018-06-08 12:47 ` [PATCH 04/15] btrfs-progs: Refactor the root used bytes are updated Nikolay Borisov
2018-06-08 12:47 ` [PATCH 05/15] btrfs-progs: Make update_block_group take fs_info instead of root Nikolay Borisov
2018-06-11  4:49   ` Qu Wenruo
2018-06-08 12:47 ` [PATCH 06/15] btrfs-progs: check: Drop trans/root arguments from free_extent_hook Nikolay Borisov
2018-06-11  4:55   ` Qu Wenruo
2018-06-11  7:04     ` Nikolay Borisov
2018-06-08 12:47 ` [PATCH 07/15] btrfs-progs: Remove root argument from __free_extent Nikolay Borisov
2018-06-11  4:58   ` Qu Wenruo
2018-06-11  7:06     ` Nikolay Borisov
2018-06-08 12:47 ` [PATCH 08/15] btrfs-progs: Remove root argument from alloc_reserved_tree_block Nikolay Borisov
2018-06-08 12:47 ` [PATCH 09/15] btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent for btree blocks Nikolay Borisov
2018-06-11  5:05   ` Qu Wenruo
2018-06-08 12:47 ` [PATCH 10/15] btrfs-progs: Add boolean to signal whether we are re-initing extent tree Nikolay Borisov
2018-06-08 12:47 ` [PATCH 11/15] btrfs-progs: Add delayed refs infrastructure Nikolay Borisov
2018-06-08 14:53   ` [PATCH 11/15 v2] " Nikolay Borisov
2018-06-11  5:20   ` [PATCH 11/15] " Qu Wenruo
2018-06-11  7:10     ` Nikolay Borisov
2018-06-11  7:46       ` Qu Wenruo
2018-07-30  8:34   ` Misono Tomohiro
2018-07-30  9:11     ` Nikolay Borisov
2018-08-02 12:17     ` David Sterba
2018-06-08 12:47 ` [PATCH 12/15] btrfs-progs: Add __free_extent2 function Nikolay Borisov
2018-06-08 12:47 ` [PATCH 13/15] btrfs-progs: Add alloc_reserved_tree_block2 function Nikolay Borisov
2018-06-08 12:47 ` [PATCH 14/15] btrfs-progs: Wire up delayed refs Nikolay Borisov
2018-07-30  8:33   ` Misono Tomohiro
2018-07-30  9:30     ` Nikolay Borisov
2018-06-08 12:47 ` [PATCH 15/15] btrfs-progs: Remove old delayed refs infrastructure Nikolay Borisov
2018-06-08 14:49   ` [PATCH 15/15 v2] " Nikolay Borisov
2018-06-08 13:50 ` [PATCH 00/15] Add delayed-refs support to btrfs-progs Qu Wenruo
2018-06-08 14:08   ` Nikolay Borisov
2018-06-08 14:21     ` Qu Wenruo
2018-07-16 15:39 ` David Sterba
2018-09-12 11:51   ` Su Yue
2018-09-12 18:02     ` David Sterba
