* [PATCH 0/6] Delayed refs rsv
@ 2018-11-21 18:59 Josef Bacik
  2018-11-21 18:59 ` [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper Josef Bacik
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Josef Bacik @ 2018-11-21 18:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This patchset changes how we do space reservations for delayed refs.  We were
hitting probably 20-40 enospc aborts per day in production while running
delayed refs at transaction commit time.  This means we ran out of space in the
global reserve and couldn't easily get more space in use_block_rsv().

The global reserve has grown to cover everything we don't reserve space
explicitly for, and we've grown a lot of weird ad-hoc heuristics to know if
we're running short on space and when it's time to force a commit.  A failure
rate of 20-40 file systems when we run hundreds of thousands of them isn't
super high, but cleaning up this code will make things less ugly and more
predictable.

Thus the delayed refs rsv.  We always know how many delayed refs we have
outstanding, and although running them generates more, we can use the global
reserve for that spillover, which fits better into its desired use than a full
blown reservation.  The first approach is to simply take the number of updates
we're reserving space for and multiply that by 2 in order to save enough space
for the delayed refs that could be generated.  This is a naive approach and
will probably evolve, but for now it works.
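
As a rough illustration of that sizing rule, here is a minimal userspace
sketch.  The helper names and the per-item cost are made up for illustration
(the real per-item cost comes from btrfs_calc_trans_metadata_size() in the
patches below); this is not the kernel implementation.

	/* Toy model of the "reserve for N updates, times 2" rule. */
	#include <stdint.h>
	#include <stdio.h>

	/* assumed per-update metadata cost, a stand-in for the real helper */
	static uint64_t metadata_item_size(uint64_t nodesize)
	{
		return 2 * nodesize;
	}

	static uint64_t delayed_refs_rsv_size(uint64_t pending_updates,
					      uint64_t nodesize)
	{
		/* double it, since running delayed refs can itself generate
		 * more delayed refs */
		return 2 * pending_updates * metadata_item_size(nodesize);
	}

	int main(void)
	{
		printf("%llu bytes for 128 pending updates\n",
		       (unsigned long long)delayed_refs_rsv_size(128, 16384));
		return 0;
	}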

With this patchset we've gone down to 2-8 failures per week.  It's not perfect,
and there are some corner cases that still need to be addressed, but it is
significantly better than what we had.  Thanks,

Josef


* [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper
  2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
@ 2018-11-21 18:59 ` Josef Bacik
  2018-11-22  9:12   ` Nikolay Borisov
  2018-11-21 18:59 ` [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2018-11-21 18:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

We do this dance in cleanup_ref_head and check_ref_cleanup, so unify it
into a helper and clean up the calling functions.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/delayed-ref.c | 14 ++++++++++++++
 fs/btrfs/delayed-ref.h |  3 ++-
 fs/btrfs/extent-tree.c | 22 +++-------------------
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 9301b3ad9217..b3e4c9fcb664 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
 	return head;
 }
 
+void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
+			   struct btrfs_delayed_ref_head *head)
+{
+	lockdep_assert_held(&delayed_refs->lock);
+	lockdep_assert_held(&head->lock);
+
+	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
+	RB_CLEAR_NODE(&head->href_node);
+	atomic_dec(&delayed_refs->num_entries);
+	delayed_refs->num_heads--;
+	if (head->processing == 0)
+		delayed_refs->num_heads_ready--;
+}
+
 /*
  * Helper to insert the ref_node to the tail or merge with tail.
  *
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 8e20c5cb5404..d2af974f68a1 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
 {
 	mutex_unlock(&head->mutex);
 }
-
+void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
+			   struct btrfs_delayed_ref_head *head);
 
 struct btrfs_delayed_ref_head *btrfs_select_ref_head(
 		struct btrfs_delayed_ref_root *delayed_refs);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d242a1174e50..c36b3a42f2bb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 		spin_unlock(&delayed_refs->lock);
 		return 1;
 	}
-	delayed_refs->num_heads--;
-	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
-	RB_CLEAR_NODE(&head->href_node);
+	btrfs_delete_ref_head(delayed_refs, head);
 	spin_unlock(&head->lock);
 	spin_unlock(&delayed_refs->lock);
-	atomic_dec(&delayed_refs->num_entries);
 
 	trace_run_delayed_ref_head(fs_info, head, 0);
 
@@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
 	if (!mutex_trylock(&head->mutex))
 		goto out;
 
-	/*
-	 * at this point we have a head with no other entries.  Go
-	 * ahead and process it.
-	 */
-	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
-	RB_CLEAR_NODE(&head->href_node);
-	atomic_dec(&delayed_refs->num_entries);
-
-	/*
-	 * we don't take a ref on the node because we're removing it from the
-	 * tree, so we just steal the ref the tree was holding.
-	 */
-	delayed_refs->num_heads--;
-	if (head->processing == 0)
-		delayed_refs->num_heads_ready--;
+	btrfs_delete_ref_head(delayed_refs, head);
 	head->processing = 0;
+
 	spin_unlock(&head->lock);
 	spin_unlock(&delayed_refs->lock);
 
-- 
2.14.3



* [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper
  2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
  2018-11-21 18:59 ` [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper Josef Bacik
@ 2018-11-21 18:59 ` Josef Bacik
  2018-11-22  1:06   ` Qu Wenruo
  2018-11-21 18:59 ` [PATCH 3/6] btrfs: cleanup extent_op handling Josef Bacik
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2018-11-21 18:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

We were missing some quota cleanups in check_ref_cleanup, so break the
ref head accounting cleanup into a helper and call that from both
check_ref_cleanup and cleanup_ref_head.  This will hopefully ensure that
we don't screw up accounting in the future for other things that we add.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/extent-tree.c | 67 +++++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c36b3a42f2bb..e3ed3507018d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2443,6 +2443,41 @@ static int cleanup_extent_op(struct btrfs_trans_handle *trans,
 	return ret ? ret : 1;
 }
 
+static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
+					struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_delayed_ref_root *delayed_refs =
+		&trans->transaction->delayed_refs;
+
+	if (head->total_ref_mod < 0) {
+		struct btrfs_space_info *space_info;
+		u64 flags;
+
+		if (head->is_data)
+			flags = BTRFS_BLOCK_GROUP_DATA;
+		else if (head->is_system)
+			flags = BTRFS_BLOCK_GROUP_SYSTEM;
+		else
+			flags = BTRFS_BLOCK_GROUP_METADATA;
+		space_info = __find_space_info(fs_info, flags);
+		ASSERT(space_info);
+		percpu_counter_add_batch(&space_info->total_bytes_pinned,
+				   -head->num_bytes,
+				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
+
+		if (head->is_data) {
+			spin_lock(&delayed_refs->lock);
+			delayed_refs->pending_csums -= head->num_bytes;
+			spin_unlock(&delayed_refs->lock);
+		}
+	}
+
+	/* Also free its reserved qgroup space */
+	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
+				      head->qgroup_reserved);
+}
+
 static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 			    struct btrfs_delayed_ref_head *head)
 {
@@ -2478,31 +2513,6 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 	spin_unlock(&head->lock);
 	spin_unlock(&delayed_refs->lock);
 
-	trace_run_delayed_ref_head(fs_info, head, 0);
-
-	if (head->total_ref_mod < 0) {
-		struct btrfs_space_info *space_info;
-		u64 flags;
-
-		if (head->is_data)
-			flags = BTRFS_BLOCK_GROUP_DATA;
-		else if (head->is_system)
-			flags = BTRFS_BLOCK_GROUP_SYSTEM;
-		else
-			flags = BTRFS_BLOCK_GROUP_METADATA;
-		space_info = __find_space_info(fs_info, flags);
-		ASSERT(space_info);
-		percpu_counter_add_batch(&space_info->total_bytes_pinned,
-				   -head->num_bytes,
-				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
-
-		if (head->is_data) {
-			spin_lock(&delayed_refs->lock);
-			delayed_refs->pending_csums -= head->num_bytes;
-			spin_unlock(&delayed_refs->lock);
-		}
-	}
-
 	if (head->must_insert_reserved) {
 		btrfs_pin_extent(fs_info, head->bytenr,
 				 head->num_bytes, 1);
@@ -2512,9 +2522,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	/* Also free its reserved qgroup space */
-	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
-				      head->qgroup_reserved);
+	cleanup_ref_head_accounting(trans, head);
+
+	trace_run_delayed_ref_head(fs_info, head, 0);
 	btrfs_delayed_ref_unlock(head);
 	btrfs_put_delayed_ref_head(head);
 	return 0;
@@ -6991,6 +7001,7 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
 	if (head->must_insert_reserved)
 		ret = 1;
 
+	cleanup_ref_head_accounting(trans, head);
 	mutex_unlock(&head->mutex);
 	btrfs_put_delayed_ref_head(head);
 	return ret;
-- 
2.14.3



* [PATCH 3/6] btrfs: cleanup extent_op handling
  2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
  2018-11-21 18:59 ` [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper Josef Bacik
  2018-11-21 18:59 ` [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
@ 2018-11-21 18:59 ` Josef Bacik
  2018-11-22  8:56   ` Lu Fengqi
                     ` (2 more replies)
  2018-11-21 18:59 ` [PATCH 4/6] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 23+ messages in thread
From: Josef Bacik @ 2018-11-21 18:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

The cleanup_extent_op function would actually run the extent_op if it
needed running, which made the name something of a misnomer.  Change it to
run_and_cleanup_extent_op, and move the actual cleanup work into
cleanup_extent_op so it can be used by check_ref_cleanup() in order to
unify the extent op handling.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/extent-tree.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e3ed3507018d..8a776dc9cb38 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2424,19 +2424,33 @@ static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_ref
 	btrfs_delayed_ref_unlock(head);
 }
 
-static int cleanup_extent_op(struct btrfs_trans_handle *trans,
-			     struct btrfs_delayed_ref_head *head)
+static struct btrfs_delayed_extent_op *
+cleanup_extent_op(struct btrfs_trans_handle *trans,
+		  struct btrfs_delayed_ref_head *head)
 {
 	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
-	int ret;
 
 	if (!extent_op)
-		return 0;
-	head->extent_op = NULL;
+		return NULL;
+
 	if (head->must_insert_reserved) {
+		head->extent_op = NULL;
 		btrfs_free_delayed_extent_op(extent_op);
-		return 0;
+		return NULL;
 	}
+	return extent_op;
+}
+
+static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans,
+				     struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_extent_op *extent_op =
+		cleanup_extent_op(trans, head);
+	int ret;
+
+	if (!extent_op)
+		return 0;
+	head->extent_op = NULL;
 	spin_unlock(&head->lock);
 	ret = run_delayed_extent_op(trans, head, extent_op);
 	btrfs_free_delayed_extent_op(extent_op);
@@ -2488,7 +2502,7 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 
 	delayed_refs = &trans->transaction->delayed_refs;
 
-	ret = cleanup_extent_op(trans, head);
+	ret = run_and_cleanup_extent_op(trans, head);
 	if (ret < 0) {
 		unselect_delayed_ref_head(delayed_refs, head);
 		btrfs_debug(fs_info, "run_delayed_extent_op returned %d", ret);
@@ -6977,12 +6991,8 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
 	if (!RB_EMPTY_ROOT(&head->ref_tree.rb_root))
 		goto out;
 
-	if (head->extent_op) {
-		if (!head->must_insert_reserved)
-			goto out;
-		btrfs_free_delayed_extent_op(head->extent_op);
-		head->extent_op = NULL;
-	}
+	if (cleanup_extent_op(trans, head) != NULL)
+		goto out;
 
 	/*
 	 * waiting for the lock here would deadlock.  If someone else has it
-- 
2.14.3



* [PATCH 4/6] btrfs: only track ref_heads in delayed_ref_updates
  2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
                   ` (2 preceding siblings ...)
  2018-11-21 18:59 ` [PATCH 3/6] btrfs: cleanup extent_op handling Josef Bacik
@ 2018-11-21 18:59 ` Josef Bacik
  2018-11-22 10:19   ` Nikolay Borisov
  2018-11-21 18:59 ` [PATCH 5/6] btrfs: introduce delayed_refs_rsv Josef Bacik
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2018-11-21 18:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

We use this number to figure out how many delayed refs to run, but
__btrfs_run_delayed_refs really only checks it every time we need a new
delayed ref head, so we always run at least one ref head completely no
matter how many items are on it.  Fix the accounting so it is only
adjusted when we add or remove a ref head.
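
A tiny userspace model of why the count should track ref heads rather than
individual refs.  The loop below is only a simplification of how
__btrfs_run_delayed_refs consumes its count; the types and output are made up
for illustration.

	#include <stdio.h>

	struct ref_head { int nr_refs; };

	/* Simplified run loop: the budget is only consulted when a new head
	 * is selected, so every selected head is processed completely no
	 * matter how many refs it carries. */
	static void run_delayed_refs(const struct ref_head *heads,
				     int nr_heads, int count)
	{
		for (int i = 0; i < nr_heads && count > 0; i++, count--)
			printf("ran head %d (%d refs)\n", i, heads[i].nr_refs);
	}

	int main(void)
	{
		const struct ref_head heads[] = { { 10 }, { 1 }, { 5 } };

		/* a budget of 2 runs the first two heads entirely, so counting
		 * heads matches the work actually done */
		run_delayed_refs(heads, 3, 2);
		return 0;
	}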

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/delayed-ref.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index b3e4c9fcb664..48725fa757a3 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -251,8 +251,6 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
 	ref->in_tree = 0;
 	btrfs_put_delayed_ref(ref);
 	atomic_dec(&delayed_refs->num_entries);
-	if (trans->delayed_ref_updates)
-		trans->delayed_ref_updates--;
 }
 
 static bool merge_ref(struct btrfs_trans_handle *trans,
@@ -467,7 +465,6 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
 	if (ref->action == BTRFS_ADD_DELAYED_REF)
 		list_add_tail(&ref->add_list, &href->ref_add_list);
 	atomic_inc(&root->num_entries);
-	trans->delayed_ref_updates++;
 	spin_unlock(&href->lock);
 	return ret;
 }
-- 
2.14.3



* [PATCH 5/6] btrfs: introduce delayed_refs_rsv
  2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
                   ` (3 preceding siblings ...)
  2018-11-21 18:59 ` [PATCH 4/6] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
@ 2018-11-21 18:59 ` Josef Bacik
  2018-11-26  9:14   ` Nikolay Borisov
  2018-11-21 18:59 ` [PATCH 6/6] btrfs: fix truncate throttling Josef Bacik
  2018-11-23 15:55 ` [PATCH 0/6] Delayed refs rsv David Sterba
  6 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2018-11-21 18:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

Traditionally we've had voodoo in btrfs to account for the space that
delayed refs may take up by having a global_block_rsv.  This works most
of the time, except when it doesn't.  We've had issues reported and seen
in production where sometimes the global reserve is exhausted during
transaction commit before we can run all of our delayed refs, resulting
in an aborted transaction.  Because of this voodoo we have equally
dubious flushing semantics around throttling delayed refs which we often
get wrong.

So instead give them their own block_rsv.  This way we can always know
exactly how much outstanding space we need for delayed refs.  This
allows us to make sure we are constantly filling that reservation up
with space, and allows us to put more precise pressure on the enospc
system.  Instead of doing math to see if it's a good time to throttle,
the normal enospc code will be invoked if we have a lot of delayed refs
pending, and they will be run via the normal flushing mechanism.

For now the delayed_refs_rsv will hold the reservations for the delayed
refs, the block group updates, and deleting csums.  We could have a
separate rsv for the block group updates, but the csum deletion stuff is
still handled via the delayed_refs so that will stay there.
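
To make the intended life cycle concrete, here is a minimal userspace model
of the reservation: queueing delayed ref updates grows the target size, and
cleaning up a ref head releases its share.  The struct, function names and
per-item cost here are invented for illustration; the real code is in the
diff below.

	#include <stdint.h>
	#include <stdio.h>

	/* toy stand-in for struct btrfs_block_rsv */
	struct block_rsv {
		uint64_t size;      /* bytes we want to have reserved */
		uint64_t reserved;  /* bytes we actually hold */
	};

	#define ITEM_BYTES 32768ULL /* assumed per-item metadata cost */

	/* queueing more delayed ref updates grows the target size */
	static void update_delayed_refs_rsv(struct block_rsv *rsv,
					    unsigned int nr_updates)
	{
		rsv->size += nr_updates * ITEM_BYTES;
	}

	/* cleaning up a ref head drops its share; any excess that was
	 * actually reserved would be handed back to the space info */
	static void delayed_refs_rsv_release(struct block_rsv *rsv,
					     unsigned int nr_items)
	{
		uint64_t bytes = nr_items * ITEM_BYTES;

		rsv->size -= bytes;
		if (rsv->reserved > rsv->size)
			rsv->reserved = rsv->size;
	}

	int main(void)
	{
		struct block_rsv rsv = { 0, 0 };

		update_delayed_refs_rsv(&rsv, 4);  /* four pending updates */
		rsv.reserved = rsv.size;           /* pretend flushing filled it */
		delayed_refs_rsv_release(&rsv, 1); /* one ref head ran */
		printf("size=%llu reserved=%llu\n",
		       (unsigned long long)rsv.size,
		       (unsigned long long)rsv.reserved);
		return 0;
	}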

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/ctree.h             |  26 ++--
 fs/btrfs/delayed-ref.c       |  28 ++++-
 fs/btrfs/disk-io.c           |   4 +
 fs/btrfs/extent-tree.c       | 279 +++++++++++++++++++++++++++++++++++--------
 fs/btrfs/inode.c             |   4 +-
 fs/btrfs/transaction.c       |  77 ++++++------
 include/trace/events/btrfs.h |   2 +
 7 files changed, 313 insertions(+), 107 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b41ec42f405..0c6d589c8ce4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -448,8 +448,9 @@ struct btrfs_space_info {
 #define	BTRFS_BLOCK_RSV_TRANS		3
 #define	BTRFS_BLOCK_RSV_CHUNK		4
 #define	BTRFS_BLOCK_RSV_DELOPS		5
-#define	BTRFS_BLOCK_RSV_EMPTY		6
-#define	BTRFS_BLOCK_RSV_TEMP		7
+#define BTRFS_BLOCK_RSV_DELREFS		6
+#define	BTRFS_BLOCK_RSV_EMPTY		7
+#define	BTRFS_BLOCK_RSV_TEMP		8
 
 struct btrfs_block_rsv {
 	u64 size;
@@ -812,6 +813,8 @@ struct btrfs_fs_info {
 	struct btrfs_block_rsv chunk_block_rsv;
 	/* block reservation for delayed operations */
 	struct btrfs_block_rsv delayed_block_rsv;
+	/* block reservation for delayed refs */
+	struct btrfs_block_rsv delayed_refs_rsv;
 
 	struct btrfs_block_rsv empty_block_rsv;
 
@@ -2628,7 +2631,7 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
 }
 
 int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
-int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans);
+bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
 void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
 					 const u64 start);
 void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg);
@@ -2742,10 +2745,12 @@ enum btrfs_reserve_flush_enum {
 enum btrfs_flush_state {
 	FLUSH_DELAYED_ITEMS_NR	=	1,
 	FLUSH_DELAYED_ITEMS	=	2,
-	FLUSH_DELALLOC		=	3,
-	FLUSH_DELALLOC_WAIT	=	4,
-	ALLOC_CHUNK		=	5,
-	COMMIT_TRANS		=	6,
+	FLUSH_DELAYED_REFS_NR	=	3,
+	FLUSH_DELAYED_REFS	=	4,
+	FLUSH_DELALLOC		=	5,
+	FLUSH_DELALLOC_WAIT	=	6,
+	ALLOC_CHUNK		=	7,
+	COMMIT_TRANS		=	8,
 };
 
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
@@ -2796,6 +2801,13 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
 			     struct btrfs_block_rsv *block_rsv,
 			     u64 num_bytes);
+void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr);
+void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans);
+int btrfs_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
+				enum btrfs_reserve_flush_enum flush);
+void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
+				       struct btrfs_block_rsv *src,
+				       u64 num_bytes);
 int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 48725fa757a3..96de92588f06 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -474,11 +474,14 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
  * existing and update must have the same bytenr
  */
 static noinline void
-update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
+update_existing_head_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_delayed_ref_head *existing,
 			 struct btrfs_delayed_ref_head *update,
 			 int *old_ref_mod_ret)
 {
+	struct btrfs_delayed_ref_root *delayed_refs =
+		&trans->transaction->delayed_refs;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int old_ref_mod;
 
 	BUG_ON(existing->is_data != update->is_data);
@@ -536,10 +539,18 @@ update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
 	 * versa we need to make sure to adjust pending_csums accordingly.
 	 */
 	if (existing->is_data) {
-		if (existing->total_ref_mod >= 0 && old_ref_mod < 0)
+		u64 csum_items =
+			btrfs_csum_bytes_to_leaves(fs_info,
+						   existing->num_bytes);
+
+		if (existing->total_ref_mod >= 0 && old_ref_mod < 0) {
 			delayed_refs->pending_csums -= existing->num_bytes;
-		if (existing->total_ref_mod < 0 && old_ref_mod >= 0)
+			btrfs_delayed_refs_rsv_release(fs_info, csum_items);
+		}
+		if (existing->total_ref_mod < 0 && old_ref_mod >= 0) {
 			delayed_refs->pending_csums += existing->num_bytes;
+			trans->delayed_ref_updates += csum_items;
+		}
 	}
 	spin_unlock(&existing->lock);
 }
@@ -645,7 +656,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 			&& head_ref->qgroup_reserved
 			&& existing->qgroup_ref_root
 			&& existing->qgroup_reserved);
-		update_existing_head_ref(delayed_refs, existing, head_ref,
+		update_existing_head_ref(trans, existing, head_ref,
 					 old_ref_mod);
 		/*
 		 * we've updated the existing ref, free the newly
@@ -656,8 +667,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 	} else {
 		if (old_ref_mod)
 			*old_ref_mod = 0;
-		if (head_ref->is_data && head_ref->ref_mod < 0)
+		if (head_ref->is_data && head_ref->ref_mod < 0) {
 			delayed_refs->pending_csums += head_ref->num_bytes;
+			trans->delayed_ref_updates +=
+				btrfs_csum_bytes_to_leaves(trans->fs_info,
+							   head_ref->num_bytes);
+		}
 		delayed_refs->num_heads++;
 		delayed_refs->num_heads_ready++;
 		atomic_inc(&delayed_refs->num_entries);
@@ -792,6 +807,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
 
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
+	btrfs_update_delayed_refs_rsv(trans);
 
 	trace_add_delayed_tree_ref(fs_info, &ref->node, ref,
 				   action == BTRFS_ADD_DELAYED_EXTENT ?
@@ -873,6 +889,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
 
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
+	btrfs_update_delayed_refs_rsv(trans);
 
 	trace_add_delayed_data_ref(trans->fs_info, &ref->node, ref,
 				   action == BTRFS_ADD_DELAYED_EXTENT ?
@@ -910,6 +927,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 			     NULL, NULL, NULL);
 
 	spin_unlock(&delayed_refs->lock);
+	btrfs_update_delayed_refs_rsv(trans);
 	return 0;
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 55e8ca782b98..beaa58e742b5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2666,6 +2666,9 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
 	btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
 			     BTRFS_BLOCK_RSV_DELOPS);
+	btrfs_init_block_rsv(&fs_info->delayed_refs_rsv,
+			     BTRFS_BLOCK_RSV_DELREFS);
+
 	atomic_set(&fs_info->async_delalloc_pages, 0);
 	atomic_set(&fs_info->defrag_running, 0);
 	atomic_set(&fs_info->qgroup_op_seq, 0);
@@ -4413,6 +4416,7 @@ void btrfs_cleanup_dirty_bgs(struct btrfs_transaction *cur_trans,
 
 		spin_unlock(&cur_trans->dirty_bgs_lock);
 		btrfs_put_block_group(cache);
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		spin_lock(&cur_trans->dirty_bgs_lock);
 	}
 	spin_unlock(&cur_trans->dirty_bgs_lock);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8a776dc9cb38..8af68b13fa27 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2463,6 +2463,7 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_root *delayed_refs =
 		&trans->transaction->delayed_refs;
+	int nr_items = 1;
 
 	if (head->total_ref_mod < 0) {
 		struct btrfs_space_info *space_info;
@@ -2484,12 +2485,15 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
 			spin_lock(&delayed_refs->lock);
 			delayed_refs->pending_csums -= head->num_bytes;
 			spin_unlock(&delayed_refs->lock);
+			nr_items += btrfs_csum_bytes_to_leaves(fs_info,
+				head->num_bytes);
 		}
 	}
 
 	/* Also free its reserved qgroup space */
 	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
 				      head->qgroup_reserved);
+	btrfs_delayed_refs_rsv_release(fs_info, nr_items);
 }
 
 static int cleanup_ref_head(struct btrfs_trans_handle *trans,
@@ -2831,40 +2835,22 @@ u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes)
 	return num_csums;
 }
 
-int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans)
+bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_block_rsv *global_rsv;
-	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
-	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
-	unsigned int num_dirty_bgs = trans->transaction->num_dirty_bgs;
-	u64 num_bytes, num_dirty_bgs_bytes;
-	int ret = 0;
-
-	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
-	num_heads = heads_to_leaves(fs_info, num_heads);
-	if (num_heads > 1)
-		num_bytes += (num_heads - 1) * fs_info->nodesize;
-	num_bytes <<= 1;
-	num_bytes += btrfs_csum_bytes_to_leaves(fs_info, csum_bytes) *
-							fs_info->nodesize;
-	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(fs_info,
-							     num_dirty_bgs);
-	global_rsv = &fs_info->global_block_rsv;
-
-	/*
-	 * If we can't allocate any more chunks lets make sure we have _lots_ of
-	 * wiggle room since running delayed refs can create more delayed refs.
-	 */
-	if (global_rsv->space_info->full) {
-		num_dirty_bgs_bytes <<= 1;
-		num_bytes <<= 1;
-	}
+	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
+	u64 reserved;
+	bool ret = false;
 
 	spin_lock(&global_rsv->lock);
-	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
-		ret = 1;
+	reserved = global_rsv->reserved;
 	spin_unlock(&global_rsv->lock);
+
+	spin_lock(&delayed_refs_rsv->lock);
+	reserved += delayed_refs_rsv->reserved;
+	if (delayed_refs_rsv->size >= reserved)
+		ret = true;
+	spin_unlock(&delayed_refs_rsv->lock);
 	return ret;
 }
 
@@ -2883,7 +2869,7 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
 	if (val >= NSEC_PER_SEC / 2)
 		return 2;
 
-	return btrfs_check_space_for_delayed_refs(trans);
+	return btrfs_check_space_for_delayed_refs(trans->fs_info);
 }
 
 struct async_delayed_refs {
@@ -3627,6 +3613,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 	 */
 	mutex_lock(&trans->transaction->cache_write_mutex);
 	while (!list_empty(&dirty)) {
+		bool drop_reserve = true;
+
 		cache = list_first_entry(&dirty,
 					 struct btrfs_block_group_cache,
 					 dirty_list);
@@ -3699,6 +3687,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 					list_add_tail(&cache->dirty_list,
 						      &cur_trans->dirty_bgs);
 					btrfs_get_block_group(cache);
+					drop_reserve = false;
 				}
 				spin_unlock(&cur_trans->dirty_bgs_lock);
 			} else if (ret) {
@@ -3709,6 +3698,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 		/* if its not on the io list, we need to put the block group */
 		if (should_put)
 			btrfs_put_block_group(cache);
+		if (drop_reserve)
+			btrfs_delayed_refs_rsv_release(fs_info, 1);
 
 		if (ret)
 			break;
@@ -3857,6 +3848,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 		/* if its not on the io list, we need to put the block group */
 		if (should_put)
 			btrfs_put_block_group(cache);
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		spin_lock(&cur_trans->dirty_bgs_lock);
 	}
 	spin_unlock(&cur_trans->dirty_bgs_lock);
@@ -4829,8 +4821,10 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_trans_handle *trans;
 	u64 bytes;
+	u64 reclaim_bytes = 0;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
@@ -4863,12 +4857,16 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 		return -ENOSPC;
 
 	spin_lock(&delayed_rsv->lock);
-	if (delayed_rsv->size > bytes)
-		bytes = 0;
-	else
-		bytes -= delayed_rsv->size;
+	reclaim_bytes += delayed_rsv->reserved;
 	spin_unlock(&delayed_rsv->lock);
 
+	spin_lock(&delayed_refs_rsv->lock);
+	reclaim_bytes += delayed_refs_rsv->reserved;
+	spin_unlock(&delayed_refs_rsv->lock);
+	if (reclaim_bytes >= bytes)
+		goto commit;
+	bytes -= reclaim_bytes;
+
 	if (__percpu_counter_compare(&space_info->total_bytes_pinned,
 				   bytes,
 				   BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0) {
@@ -4918,6 +4916,20 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		shrink_delalloc(fs_info, num_bytes * 2, num_bytes,
 				state == FLUSH_DELALLOC_WAIT);
 		break;
+	case FLUSH_DELAYED_REFS_NR:
+	case FLUSH_DELAYED_REFS:
+		trans = btrfs_join_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			break;
+		}
+		if (state == FLUSH_DELAYED_REFS_NR)
+			nr = calc_reclaim_items_nr(fs_info, num_bytes);
+		else
+			nr = 0;
+		btrfs_run_delayed_refs(trans, nr);
+		btrfs_end_transaction(trans);
+		break;
 	case ALLOC_CHUNK:
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans)) {
@@ -5390,6 +5402,91 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+/**
+ * btrfs_migrate_to_delayed_refs_rsv - transfer bytes to our delayed refs rsv.
+ * @fs_info - the fs info for our fs.
+ * @src - the source block rsv to transfer from.
+ * @num_bytes - the number of bytes to transfer.
+ *
+ * This transfers up to the num_bytes amount from the src rsv to the
+ * delayed_refs_rsv.  Any extra bytes are returned to the space info.
+ */
+void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
+				       struct btrfs_block_rsv *src,
+				       u64 num_bytes)
+{
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
+	u64 to_free = 0;
+
+	spin_lock(&src->lock);
+	src->reserved -= num_bytes;
+	src->size -= num_bytes;
+	spin_unlock(&src->lock);
+
+	spin_lock(&delayed_refs_rsv->lock);
+	if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) {
+		u64 delta = delayed_refs_rsv->size -
+			delayed_refs_rsv->reserved;
+		if (num_bytes > delta) {
+			to_free = num_bytes - delta;
+			num_bytes = delta;
+		}
+	} else {
+		to_free = num_bytes;
+		num_bytes = 0;
+	}
+
+	if (num_bytes)
+		delayed_refs_rsv->reserved += num_bytes;
+	if (delayed_refs_rsv->reserved >= delayed_refs_rsv->size)
+		delayed_refs_rsv->full = 1;
+	spin_unlock(&delayed_refs_rsv->lock);
+
+	if (num_bytes)
+		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+					      0, num_bytes, 1);
+	if (to_free)
+		space_info_add_old_bytes(fs_info, delayed_refs_rsv->space_info,
+					 to_free);
+}
+
+/**
+ * btrfs_throttle_delayed_refs - throttle based on our delayed refs usage.
+ * @fs_info - the fs_info for our fs.
+ * @flush - control how we can flush for this reservation.
+ *
+ * This will refill the delayed refs rsv up to one item's worth of space and
+ * will return -ENOSPC if we can't make the reservation.
+ */
+int btrfs_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
+				enum btrfs_reserve_flush_enum flush)
+{
+	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
+	u64 limit = btrfs_calc_trans_metadata_size(fs_info, 1);
+	u64 num_bytes = 0;
+	int ret = -ENOSPC;
+
+	spin_lock(&block_rsv->lock);
+	if (block_rsv->reserved < block_rsv->size) {
+		num_bytes = block_rsv->size - block_rsv->reserved;
+		num_bytes = min(num_bytes, limit);
+	}
+	spin_unlock(&block_rsv->lock);
+
+	if (!num_bytes)
+		return 0;
+
+	ret = reserve_metadata_bytes(fs_info->extent_root, block_rsv,
+				     num_bytes, flush);
+	if (ret)
+		return ret;
+	block_rsv_add_bytes(block_rsv, num_bytes, 0);
+	trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+				      0, num_bytes, 1);
+	return 0;
+}
+
+
 /*
  * This is for space we already have accounted in space_info->bytes_may_use, so
  * basically when we're returning space from block_rsv's.
@@ -5710,6 +5807,31 @@ static int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 	return ret;
 }
 
+static u64 __btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
+				     struct btrfs_block_rsv *block_rsv,
+				     u64 num_bytes, u64 *qgroup_to_release)
+{
+	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
+	struct btrfs_block_rsv *target = delayed_rsv;
+
+	if (target->full || target == block_rsv)
+		target = global_rsv;
+
+	if (block_rsv->space_info != target->space_info)
+		target = NULL;
+
+	return block_rsv_release_bytes(fs_info, block_rsv, target, num_bytes,
+				       qgroup_to_release);
+}
+
+void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
+			     struct btrfs_block_rsv *block_rsv,
+			     u64 num_bytes)
+{
+	__btrfs_block_rsv_release(fs_info, block_rsv, num_bytes, NULL);
+}
+
 /**
  * btrfs_inode_rsv_release - release any excessive reservation.
  * @inode - the inode we need to release from.
@@ -5724,7 +5846,6 @@ static int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
 	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
 	u64 released = 0;
 	u64 qgroup_to_release = 0;
@@ -5734,8 +5855,8 @@ static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 	 * are releasing 0 bytes, and then we'll just get the reservation over
 	 * the size free'd.
 	 */
-	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
-					   &qgroup_to_release);
+	released = __btrfs_block_rsv_release(fs_info, block_rsv, 0,
+					     &qgroup_to_release);
 	if (released > 0)
 		trace_btrfs_space_reservation(fs_info, "delalloc",
 					      btrfs_ino(inode), released, 0);
@@ -5746,16 +5867,26 @@ static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 						   qgroup_to_release);
 }
 
-void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
-			     struct btrfs_block_rsv *block_rsv,
-			     u64 num_bytes)
+/**
+ * btrfs_delayed_refs_rsv_release - release a ref head's reservation.
+ * @fs_info - the fs_info for our fs.
+ * @nr - the number of items to drop.
+ *
+ * This drops the delayed ref head's count from the delayed refs rsv and frees
+ * any excess reservation we had.
+ */
+void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr)
 {
+	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, nr);
+	u64 released = 0;
 
-	if (global_rsv == block_rsv ||
-	    block_rsv->space_info != global_rsv->space_info)
-		global_rsv = NULL;
-	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
+	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv,
+					   num_bytes, NULL);
+	if (released)
+		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+					      0, released, 0);
 }
 
 static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
@@ -5820,9 +5951,10 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
 	fs_info->trans_block_rsv.space_info = space_info;
 	fs_info->empty_block_rsv.space_info = space_info;
 	fs_info->delayed_block_rsv.space_info = space_info;
+	fs_info->delayed_refs_rsv.space_info = space_info;
 
-	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
-	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
+	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;
+	fs_info->csum_root->block_rsv = &fs_info->delayed_refs_rsv;
 	fs_info->dev_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->tree_root->block_rsv = &fs_info->global_block_rsv;
 	if (fs_info->quota_root)
@@ -5842,8 +5974,34 @@ static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
 	WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
 	WARN_ON(fs_info->delayed_block_rsv.size > 0);
 	WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
+	WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
+	WARN_ON(fs_info->delayed_refs_rsv.size > 0);
 }
 
+/*
+ * btrfs_update_delayed_refs_rsv - adjust the size of the delayed refs rsv
+ * @trans - the trans that may have generated delayed refs
+ *
+ * This is to be called anytime we may have adjusted trans->delayed_ref_updates,
+ * it'll calculate the additional size and add it to the delayed_refs_rsv.
+ */
+void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
+	u64 num_bytes;
+
+	if (!trans->delayed_ref_updates)
+		return;
+
+	num_bytes = btrfs_calc_trans_metadata_size(fs_info,
+						   trans->delayed_ref_updates);
+	spin_lock(&delayed_rsv->lock);
+	delayed_rsv->size += num_bytes;
+	delayed_rsv->full = 0;
+	spin_unlock(&delayed_rsv->lock);
+	trans->delayed_ref_updates = 0;
+}
 
 /*
  * To be called after all the new block groups attached to the transaction
@@ -6136,6 +6294,7 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 	u64 old_val;
 	u64 byte_in_group;
 	int factor;
+	int ret = 0;
 
 	/* block accounting for super block */
 	spin_lock(&info->delalloc_root_lock);
@@ -6149,8 +6308,10 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 
 	while (total) {
 		cache = btrfs_lookup_block_group(info, bytenr);
-		if (!cache)
-			return -ENOENT;
+		if (!cache) {
+			ret = -ENOENT;
+			break;
+		}
 		factor = btrfs_bg_type_to_factor(cache->flags);
 
 		/*
@@ -6209,6 +6370,7 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 			list_add_tail(&cache->dirty_list,
 				      &trans->transaction->dirty_bgs);
 			trans->transaction->num_dirty_bgs++;
+			trans->delayed_ref_updates++;
 			btrfs_get_block_group(cache);
 		}
 		spin_unlock(&trans->transaction->dirty_bgs_lock);
@@ -6226,7 +6388,10 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 		total -= num_bytes;
 		bytenr += num_bytes;
 	}
-	return 0;
+
+	/* Modified block groups are accounted for in the delayed_refs_rsv. */
+	btrfs_update_delayed_refs_rsv(trans);
+	return ret;
 }
 
 static u64 first_logical_byte(struct btrfs_fs_info *fs_info, u64 search_start)
@@ -8369,7 +8534,12 @@ use_block_rsv(struct btrfs_trans_handle *trans,
 		goto again;
 	}
 
-	if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
+	/*
+	 * The global reserve still exists to save us from ourselves, so don't
+	 * warn_on if we are short on our delayed refs reserve.
+	 */
+	if (block_rsv->type != BTRFS_BLOCK_RSV_DELREFS &&
+	    btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
 		static DEFINE_RATELIMIT_STATE(_rs,
 				DEFAULT_RATELIMIT_INTERVAL * 10,
 				/*DEFAULT_RATELIMIT_BURST*/ 1);
@@ -10302,6 +10472,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 		add_block_group_free_space(trans, block_group);
 		/* already aborted the transaction if it failed. */
 next:
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		list_del_init(&block_group->bg_list);
 	}
 	btrfs_trans_release_chunk_metadata(trans);
@@ -10379,6 +10550,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 	link_block_group(cache);
 
 	list_add_tail(&cache->bg_list, &trans->new_bgs);
+	trans->delayed_ref_updates++;
+	btrfs_update_delayed_refs_rsv(trans);
 
 	set_avail_alloc_bits(fs_info, type);
 	return 0;
@@ -10416,6 +10589,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	int factor;
 	struct btrfs_caching_control *caching_ctl = NULL;
 	bool remove_em;
+	bool remove_rsv = false;
 
 	block_group = btrfs_lookup_block_group(fs_info, group_start);
 	BUG_ON(!block_group);
@@ -10480,6 +10654,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	if (!list_empty(&block_group->dirty_list)) {
 		list_del_init(&block_group->dirty_list);
+		remove_rsv = true;
 		btrfs_put_block_group(block_group);
 	}
 	spin_unlock(&trans->transaction->dirty_bgs_lock);
@@ -10689,6 +10864,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_del_item(trans, root, path);
 out:
+	if (remove_rsv)
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 	btrfs_free_path(path);
 	return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a097f5fde31d..8532a2eb56d1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5326,8 +5326,8 @@ static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
 		 * Try to steal from the global reserve if there is space for
 		 * it.
 		 */
-		if (!btrfs_check_space_for_delayed_refs(trans) &&
-		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, false))
+		if (!btrfs_check_space_for_delayed_refs(fs_info) &&
+		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, 0))
 			return trans;
 
 		/* If not, commit and try again. */
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f92c0a88c4ad..a4682a696fb6 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -454,7 +454,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 		  bool enforce_qgroups)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
-
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
 	u64 num_bytes = 0;
@@ -483,13 +483,28 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	 * the appropriate flushing if need be.
 	 */
 	if (num_items && root != fs_info->chunk_root) {
+		struct btrfs_block_rsv *rsv = &fs_info->trans_block_rsv;
+		u64 delayed_refs_bytes = 0;
+
 		qgroup_reserved = num_items * fs_info->nodesize;
 		ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved,
 				enforce_qgroups);
 		if (ret)
 			return ERR_PTR(ret);
 
+		/*
+		 * We want to reserve all the bytes we may need all at once, so
+		 * we only do 1 enospc flushing cycle per transaction start.  We
+		 * accomplish this by simply assuming we'll do 2 x num_items
+		 * worth of delayed refs updates in this trans handle, and
+		 * refill that amount for whatever is missing in the reserve.
+		 */
 		num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
+		if (delayed_refs_rsv->full == 0) {
+			delayed_refs_bytes = num_bytes;
+			num_bytes <<= 1;
+		}
+
 		/*
 		 * Do the reservation for the relocation root creation
 		 */
@@ -498,8 +513,24 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 			reloc_reserved = true;
 		}
 
-		ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv,
-					  num_bytes, flush);
+		ret = btrfs_block_rsv_add(root, rsv, num_bytes, flush);
+		if (ret)
+			goto reserve_fail;
+		if (delayed_refs_bytes) {
+			btrfs_migrate_to_delayed_refs_rsv(fs_info, rsv,
+							  delayed_refs_bytes);
+			num_bytes -= delayed_refs_bytes;
+		}
+	} else if (num_items == 0 && flush == BTRFS_RESERVE_FLUSH_ALL &&
+		   !delayed_refs_rsv->full) {
+		/*
+		 * Some people call with btrfs_start_transaction(root, 0)
+		 * because they can be throttled, but have some other mechanism
+		 * for reserving space.  We still want these guys to refill the
+		 * delayed block_rsv so just add 1 items worth of reservation
+		 * here.
+		 */
+		ret = btrfs_throttle_delayed_refs(fs_info, flush);
 		if (ret)
 			goto reserve_fail;
 	}
@@ -758,7 +789,7 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 
-	if (btrfs_check_space_for_delayed_refs(trans))
+	if (btrfs_check_space_for_delayed_refs(fs_info))
 		return 1;
 
 	return !!btrfs_block_rsv_check(&fs_info->global_block_rsv, 5);
@@ -767,22 +798,12 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
 int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_transaction *cur_trans = trans->transaction;
-	int updates;
-	int err;
 
 	smp_mb();
 	if (cur_trans->state >= TRANS_STATE_BLOCKED ||
 	    cur_trans->delayed_refs.flushing)
 		return 1;
 
-	updates = trans->delayed_ref_updates;
-	trans->delayed_ref_updates = 0;
-	if (updates) {
-		err = btrfs_run_delayed_refs(trans, updates * 2);
-		if (err) /* Error code will also eval true */
-			return err;
-	}
-
 	return should_end_transaction(trans);
 }
 
@@ -812,11 +833,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *info = trans->fs_info;
 	struct btrfs_transaction *cur_trans = trans->transaction;
-	u64 transid = trans->transid;
-	unsigned long cur = trans->delayed_ref_updates;
 	int lock = (trans->type != TRANS_JOIN_NOLOCK);
 	int err = 0;
-	int must_run_delayed_refs = 0;
 
 	if (refcount_read(&trans->use_count) > 1) {
 		refcount_dec(&trans->use_count);
@@ -827,27 +845,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
 
-	if (!list_empty(&trans->new_bgs))
-		btrfs_create_pending_block_groups(trans);
-
-	trans->delayed_ref_updates = 0;
-	if (!trans->sync) {
-		must_run_delayed_refs =
-			btrfs_should_throttle_delayed_refs(trans);
-		cur = max_t(unsigned long, cur, 32);
-
-		/*
-		 * don't make the caller wait if they are from a NOLOCK
-		 * or ATTACH transaction, it will deadlock with commit
-		 */
-		if (must_run_delayed_refs == 1 &&
-		    (trans->type & (__TRANS_JOIN_NOLOCK | __TRANS_ATTACH)))
-			must_run_delayed_refs = 2;
-	}
-
-	btrfs_trans_release_metadata(trans);
-	trans->block_rsv = NULL;
-
 	if (!list_empty(&trans->new_bgs))
 		btrfs_create_pending_block_groups(trans);
 
@@ -892,10 +889,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	}
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
-	if (must_run_delayed_refs) {
-		btrfs_async_run_delayed_refs(info, cur, transid,
-					     must_run_delayed_refs == 1);
-	}
 	return err;
 }
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 8568946f491d..63d1f9d8b8c7 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1048,6 +1048,8 @@ TRACE_EVENT(btrfs_trigger_flush,
 		{ FLUSH_DELAYED_ITEMS,		"FLUSH_DELAYED_ITEMS"},		\
 		{ FLUSH_DELALLOC,		"FLUSH_DELALLOC"},		\
 		{ FLUSH_DELALLOC_WAIT,		"FLUSH_DELALLOC_WAIT"},		\
+		{ FLUSH_DELAYED_REFS_NR,	"FLUSH_DELAYED_REFS_NR"},	\
+		{ FLUSH_DELAYED_REFS,		"FLUSH_DELAYED_REFS"},		\
 		{ ALLOC_CHUNK,			"ALLOC_CHUNK"},			\
 		{ COMMIT_TRANS,			"COMMIT_TRANS"})
 
-- 
2.14.3



* [PATCH 6/6] btrfs: fix truncate throttling
  2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
                   ` (4 preceding siblings ...)
  2018-11-21 18:59 ` [PATCH 5/6] btrfs: introduce delayed_refs_rsv Josef Bacik
@ 2018-11-21 18:59 ` Josef Bacik
  2018-11-26  9:44   ` Nikolay Borisov
  2018-11-23 15:55 ` [PATCH 0/6] Delayed refs rsv David Sterba
  6 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2018-11-21 18:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We have a bunch of magic to make sure we're throttling delayed refs when
truncating a file.  Now that we have a delayed refs rsv and a mechanism
for refilling that reserve, simply use that instead of all of this magic.
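
A minimal userspace sketch of that throttle point follows.  The helper is a
stand-in for btrfs_throttle_delayed_refs(), and the -EAGAIN handling mirrors
what the diff below does when the refill fails; everything else is invented
for illustration.

	#include <errno.h>
	#include <stdio.h>

	/* stand-in for the refill helper: succeed only if the delayed refs
	 * reservation can still be topped up without flushing */
	static int throttle_delayed_refs(int reservation_available)
	{
		return reservation_available ? 0 : -ENOSPC;
	}

	/* model of the truncate loop's throttle point: if we cannot refill
	 * the reserve, bail with -EAGAIN and let the caller restart the
	 * transaction */
	static int truncate_throttle_point(int reservation_available)
	{
		if (throttle_delayed_refs(reservation_available))
			return -EAGAIN;
		return 0; /* keep deleting items */
	}

	int main(void)
	{
		printf("rsv ok        -> %d\n", truncate_throttle_point(1));
		printf("rsv exhausted -> %d\n", truncate_throttle_point(0));
		return 0;
	}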

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 79 ++++++++++++--------------------------------------------
 1 file changed, 17 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8532a2eb56d1..cae30f6c095f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4437,31 +4437,6 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int truncate_space_check(struct btrfs_trans_handle *trans,
-				struct btrfs_root *root,
-				u64 bytes_deleted)
-{
-	struct btrfs_fs_info *fs_info = root->fs_info;
-	int ret;
-
-	/*
-	 * This is only used to apply pressure to the enospc system, we don't
-	 * intend to use this reservation at all.
-	 */
-	bytes_deleted = btrfs_csum_bytes_to_leaves(fs_info, bytes_deleted);
-	bytes_deleted *= fs_info->nodesize;
-	ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv,
-				  bytes_deleted, BTRFS_RESERVE_NO_FLUSH);
-	if (!ret) {
-		trace_btrfs_space_reservation(fs_info, "transaction",
-					      trans->transid,
-					      bytes_deleted, 1);
-		trans->bytes_reserved += bytes_deleted;
-	}
-	return ret;
-
-}
-
 /*
  * Return this if we need to call truncate_block for the last bit of the
  * truncate.
@@ -4506,7 +4481,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	u64 bytes_deleted = 0;
 	bool be_nice = false;
 	bool should_throttle = false;
-	bool should_end = false;
 
 	BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
 
@@ -4719,15 +4693,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				btrfs_abort_transaction(trans, ret);
 				break;
 			}
-			if (btrfs_should_throttle_delayed_refs(trans))
-				btrfs_async_run_delayed_refs(fs_info,
-					trans->delayed_ref_updates * 2,
-					trans->transid, 0);
 			if (be_nice) {
-				if (truncate_space_check(trans, root,
-							 extent_num_bytes)) {
-					should_end = true;
-				}
 				if (btrfs_should_throttle_delayed_refs(trans))
 					should_throttle = true;
 			}
@@ -4738,7 +4704,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
 		if (path->slots[0] == 0 ||
 		    path->slots[0] != pending_del_slot ||
-		    should_throttle || should_end) {
+		    should_throttle) {
 			if (pending_del_nr) {
 				ret = btrfs_del_items(trans, root, path,
 						pending_del_slot,
@@ -4750,23 +4716,24 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				pending_del_nr = 0;
 			}
 			btrfs_release_path(path);
-			if (should_throttle) {
-				unsigned long updates = trans->delayed_ref_updates;
-				if (updates) {
-					trans->delayed_ref_updates = 0;
-					ret = btrfs_run_delayed_refs(trans,
-								   updates * 2);
-					if (ret)
-						break;
-				}
-			}
+
 			/*
-			 * if we failed to refill our space rsv, bail out
-			 * and let the transaction restart
+			 * We can generate a lot of delayed refs, so we need to
+			 * throttle every once in a while and make sure we're
+			 * adding enough space to keep up with the work we are
+			 * generating.  Since we hold a transaction here we
+			 * can't flush, and we don't want to FLUSH_LIMIT because
+			 * we could have generated too many delayed refs to
+			 * actually allocate, so just bail if we're short and
+			 * let the normal reservation dance happen higher up.
 			 */
-			if (should_end) {
-				ret = -EAGAIN;
-				break;
+			if (should_throttle) {
+				ret = btrfs_throttle_delayed_refs(fs_info,
+							BTRFS_RESERVE_NO_FLUSH);
+				if (ret) {
+					ret = -EAGAIN;
+					break;
+				}
 			}
 			goto search_again;
 		} else {
@@ -4792,18 +4759,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	}
 
 	btrfs_free_path(path);
-
-	if (be_nice && bytes_deleted > SZ_32M && (ret >= 0 || ret == -EAGAIN)) {
-		unsigned long updates = trans->delayed_ref_updates;
-		int err;
-
-		if (updates) {
-			trans->delayed_ref_updates = 0;
-			err = btrfs_run_delayed_refs(trans, updates * 2);
-			if (err)
-				ret = err;
-		}
-	}
 	return ret;
 }
 
-- 
2.14.3



* Re: [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper
  2018-11-21 18:59 ` [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
@ 2018-11-22  1:06   ` Qu Wenruo
  2018-11-23 13:51     ` David Sterba
  0 siblings, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2018-11-22  1:06 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik





On 2018/11/22 2:59 AM, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> We were missing some quota cleanups in check_ref_cleanup, so break the
> ref head accounting cleanup into a helper and call that from both
> check_ref_cleanup and cleanup_ref_head.  This will hopefully ensure that
> we don't screw up accounting in the future for other things that we add.
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 67 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 39 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index c36b3a42f2bb..e3ed3507018d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2443,6 +2443,41 @@ static int cleanup_extent_op(struct btrfs_trans_handle *trans,
>  	return ret ? ret : 1;
>  }
>  
> +static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
> +					struct btrfs_delayed_ref_head *head)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_delayed_ref_root *delayed_refs =
> +		&trans->transaction->delayed_refs;
> +
> +	if (head->total_ref_mod < 0) {
> +		struct btrfs_space_info *space_info;
> +		u64 flags;
> +
> +		if (head->is_data)
> +			flags = BTRFS_BLOCK_GROUP_DATA;
> +		else if (head->is_system)
> +			flags = BTRFS_BLOCK_GROUP_SYSTEM;
> +		else
> +			flags = BTRFS_BLOCK_GROUP_METADATA;
> +		space_info = __find_space_info(fs_info, flags);
> +		ASSERT(space_info);
> +		percpu_counter_add_batch(&space_info->total_bytes_pinned,
> +				   -head->num_bytes,
> +				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
> +
> +		if (head->is_data) {
> +			spin_lock(&delayed_refs->lock);
> +			delayed_refs->pending_csums -= head->num_bytes;
> +			spin_unlock(&delayed_refs->lock);
> +		}
> +	}
> +
> +	/* Also free its reserved qgroup space */
> +	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
> +				      head->qgroup_reserved);

This part will be removed in patch "[PATCH] btrfs: qgroup: Move reserved
data account from btrfs_delayed_ref_head to btrfs_qgroup_extent_record".

So there is one less thing to worry about in delayed ref head.

Thanks,
Qu

> +}
> +
>  static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  			    struct btrfs_delayed_ref_head *head)
>  {
> @@ -2478,31 +2513,6 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  	spin_unlock(&head->lock);
>  	spin_unlock(&delayed_refs->lock);
>  
> -	trace_run_delayed_ref_head(fs_info, head, 0);
> -
> -	if (head->total_ref_mod < 0) {
> -		struct btrfs_space_info *space_info;
> -		u64 flags;
> -
> -		if (head->is_data)
> -			flags = BTRFS_BLOCK_GROUP_DATA;
> -		else if (head->is_system)
> -			flags = BTRFS_BLOCK_GROUP_SYSTEM;
> -		else
> -			flags = BTRFS_BLOCK_GROUP_METADATA;
> -		space_info = __find_space_info(fs_info, flags);
> -		ASSERT(space_info);
> -		percpu_counter_add_batch(&space_info->total_bytes_pinned,
> -				   -head->num_bytes,
> -				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
> -
> -		if (head->is_data) {
> -			spin_lock(&delayed_refs->lock);
> -			delayed_refs->pending_csums -= head->num_bytes;
> -			spin_unlock(&delayed_refs->lock);
> -		}
> -	}
> -
>  	if (head->must_insert_reserved) {
>  		btrfs_pin_extent(fs_info, head->bytenr,
>  				 head->num_bytes, 1);
> @@ -2512,9 +2522,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  		}
>  	}
>  
> -	/* Also free its reserved qgroup space */
> -	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
> -				      head->qgroup_reserved);
> +	cleanup_ref_head_accounting(trans, head);
> +
> +	trace_run_delayed_ref_head(fs_info, head, 0);
>  	btrfs_delayed_ref_unlock(head);
>  	btrfs_put_delayed_ref_head(head);
>  	return 0;
> @@ -6991,6 +7001,7 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
>  	if (head->must_insert_reserved)
>  		ret = 1;
>  
> +	cleanup_ref_head_accounting(trans, head);
>  	mutex_unlock(&head->mutex);
>  	btrfs_put_delayed_ref_head(head);
>  	return ret;
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] btrfs: cleanup extent_op handling
  2018-11-21 18:59 ` [PATCH 3/6] btrfs: cleanup extent_op handling Josef Bacik
@ 2018-11-22  8:56   ` Lu Fengqi
  2018-11-22 10:09   ` Nikolay Borisov
  2018-11-23 15:05   ` David Sterba
  2 siblings, 0 replies; 23+ messages in thread
From: Lu Fengqi @ 2018-11-22  8:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Josef Bacik

On Wed, Nov 21, 2018 at 01:59:09PM -0500, Josef Bacik wrote:
>From: Josef Bacik <jbacik@fb.com>
>
>The cleanup_extent_op function actually would run the extent_op if it
>needed running, which made the name sort of a misnomer.  Change it to
>run_and_cleanup_extent_op, and move the actual cleanup work to
>cleanup_extent_op so it can be used by check_ref_cleanup() in order to
>unify the extent op handling.
>
>Signed-off-by: Josef Bacik <jbacik@fb.com>

One nitpick below.

Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>

>---
> fs/btrfs/extent-tree.c | 36 +++++++++++++++++++++++-------------
> 1 file changed, 23 insertions(+), 13 deletions(-)
>
>diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>index e3ed3507018d..8a776dc9cb38 100644
>--- a/fs/btrfs/extent-tree.c
>+++ b/fs/btrfs/extent-tree.c
>@@ -2424,19 +2424,33 @@ static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_ref
> 	btrfs_delayed_ref_unlock(head);
> }
> 
>-static int cleanup_extent_op(struct btrfs_trans_handle *trans,
>-			     struct btrfs_delayed_ref_head *head)
>+static struct btrfs_delayed_extent_op *
>+cleanup_extent_op(struct btrfs_trans_handle *trans,

The trans parameter seems useless.
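
If it really is unused, the signature could simply be trimmed to
something like this (untested, just to illustrate, body copied from the
patch):

static struct btrfs_delayed_extent_op *
cleanup_extent_op(struct btrfs_delayed_ref_head *head)
{
	struct btrfs_delayed_extent_op *extent_op = head->extent_op;

	if (!extent_op)
		return NULL;

	if (head->must_insert_reserved) {
		/* the op is dropped here, not run */
		head->extent_op = NULL;
		btrfs_free_delayed_extent_op(extent_op);
		return NULL;
	}
	return extent_op;
}

with run_and_cleanup_extent_op and check_ref_cleanup updated to drop the
argument.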

-- 
Thanks,
Lu

>+		  struct btrfs_delayed_ref_head *head)
> {
> 	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
>-	int ret;
> 
> 	if (!extent_op)
>-		return 0;
>-	head->extent_op = NULL;
>+		return NULL;
>+
> 	if (head->must_insert_reserved) {
>+		head->extent_op = NULL;
> 		btrfs_free_delayed_extent_op(extent_op);
>-		return 0;
>+		return NULL;
> 	}
>+	return extent_op;
>+}
>+
>+static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans,
>+				     struct btrfs_delayed_ref_head *head)
>+{
>+	struct btrfs_delayed_extent_op *extent_op =
>+		cleanup_extent_op(trans, head);
>+	int ret;
>+
>+	if (!extent_op)
>+		return 0;
>+	head->extent_op = NULL;
> 	spin_unlock(&head->lock);
> 	ret = run_delayed_extent_op(trans, head, extent_op);
> 	btrfs_free_delayed_extent_op(extent_op);
>@@ -2488,7 +2502,7 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
> 
> 	delayed_refs = &trans->transaction->delayed_refs;
> 
>-	ret = cleanup_extent_op(trans, head);
>+	ret = run_and_cleanup_extent_op(trans, head);
> 	if (ret < 0) {
> 		unselect_delayed_ref_head(delayed_refs, head);
> 		btrfs_debug(fs_info, "run_delayed_extent_op returned %d", ret);
>@@ -6977,12 +6991,8 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
> 	if (!RB_EMPTY_ROOT(&head->ref_tree.rb_root))
> 		goto out;
> 
>-	if (head->extent_op) {
>-		if (!head->must_insert_reserved)
>-			goto out;
>-		btrfs_free_delayed_extent_op(head->extent_op);
>-		head->extent_op = NULL;
>-	}
>+	if (cleanup_extent_op(trans, head) != NULL)
>+		goto out;
> 
> 	/*
> 	 * waiting for the lock here would deadlock.  If someone else has it
>-- 
>2.14.3
>
>
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper
  2018-11-21 18:59 ` [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper Josef Bacik
@ 2018-11-22  9:12   ` Nikolay Borisov
  2018-11-22  9:42     ` Nikolay Borisov
  0 siblings, 1 reply; 23+ messages in thread
From: Nikolay Borisov @ 2018-11-22  9:12 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
> into a helper and cleanup the calling functions.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/delayed-ref.c | 14 ++++++++++++++
>  fs/btrfs/delayed-ref.h |  3 ++-
>  fs/btrfs/extent-tree.c | 22 +++-------------------
>  3 files changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 9301b3ad9217..b3e4c9fcb664 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>  	return head;
>  }
>  
> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> +			   struct btrfs_delayed_ref_head *head)
> +{
> +	lockdep_assert_held(&delayed_refs->lock);
> +	lockdep_assert_held(&head->lock);
> +
> +	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> +	RB_CLEAR_NODE(&head->href_node);
> +	atomic_dec(&delayed_refs->num_entries);
> +	delayed_refs->num_heads--;
> +	if (head->processing == 0)
> +		delayed_refs->num_heads_ready--;

num_heads_ready will never execute in cleanup_ref_head, since
processing == 0 only when the ref head is unselected. Perhaps those 2
lines shouldn't be in this function? I find it a bit confusing that if
processing is 0 we decrement num_heads_ready in check_ref_cleanup, but
in unselect_delayed_ref_head we set it to 0 and increment it.
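
Roughly what I mean (untested sketch), leaving the counter handling to
the call sites that actually need it:

void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
			   struct btrfs_delayed_ref_head *head)
{
	lockdep_assert_held(&delayed_refs->lock);
	lockdep_assert_held(&head->lock);

	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
	RB_CLEAR_NODE(&head->href_node);
	atomic_dec(&delayed_refs->num_entries);
	delayed_refs->num_heads--;
	/* num_heads_ready is adjusted by the callers where it applies */
}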

> +}
> +
>  /*
>   * Helper to insert the ref_node to the tail or merge with tail.
>   *
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 8e20c5cb5404..d2af974f68a1 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
>  {
>  	mutex_unlock(&head->mutex);
>  }
> -
> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> +			   struct btrfs_delayed_ref_head *head);
>  
>  struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>  		struct btrfs_delayed_ref_root *delayed_refs);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d242a1174e50..c36b3a42f2bb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  		spin_unlock(&delayed_refs->lock);
>  		return 1;
>  	}
> -	delayed_refs->num_heads--;
> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> -	RB_CLEAR_NODE(&head->href_node);
> +	btrfs_delete_ref_head(delayed_refs, head);
>  	spin_unlock(&head->lock);
>  	spin_unlock(&delayed_refs->lock);
> -	atomic_dec(&delayed_refs->num_entries);
>  
>  	trace_run_delayed_ref_head(fs_info, head, 0);
>  
> @@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
>  	if (!mutex_trylock(&head->mutex))
>  		goto out;
>  
> -	/*
> -	 * at this point we have a head with no other entries.  Go
> -	 * ahead and process it.
> -	 */
> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> -	RB_CLEAR_NODE(&head->href_node);
> -	atomic_dec(&delayed_refs->num_entries);
> -
> -	/*
> -	 * we don't take a ref on the node because we're removing it from the
> -	 * tree, so we just steal the ref the tree was holding.
> -	 */
> -	delayed_refs->num_heads--;
> -	if (head->processing == 0)
> -		delayed_refs->num_heads_ready--;
> +	btrfs_delete_ref_head(delayed_refs, head);
>  	head->processing = 0;

Something is fishy here, before the code checked for processing == 0 and
then also set it to 0 ?

> +
>  	spin_unlock(&head->lock);
>  	spin_unlock(&delayed_refs->lock);
>  
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper
  2018-11-22  9:12   ` Nikolay Borisov
@ 2018-11-22  9:42     ` Nikolay Borisov
  2018-11-23 13:45       ` David Sterba
  0 siblings, 1 reply; 23+ messages in thread
From: Nikolay Borisov @ 2018-11-22  9:42 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 22.11.18 г. 11:12 ч., Nikolay Borisov wrote:
> 
> 
> On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
>> From: Josef Bacik <jbacik@fb.com>
>>
>> We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
>> into a helper and cleanup the calling functions.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> Reviewed-by: Omar Sandoval <osandov@fb.com>
>> ---
>>  fs/btrfs/delayed-ref.c | 14 ++++++++++++++
>>  fs/btrfs/delayed-ref.h |  3 ++-
>>  fs/btrfs/extent-tree.c | 22 +++-------------------
>>  3 files changed, 19 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>> index 9301b3ad9217..b3e4c9fcb664 100644
>> --- a/fs/btrfs/delayed-ref.c
>> +++ b/fs/btrfs/delayed-ref.c
>> @@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>>  	return head;
>>  }
>>  
>> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
>> +			   struct btrfs_delayed_ref_head *head)
>> +{
>> +	lockdep_assert_held(&delayed_refs->lock);
>> +	lockdep_assert_held(&head->lock);
>> +
>> +	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
>> +	RB_CLEAR_NODE(&head->href_node);
>> +	atomic_dec(&delayed_refs->num_entries);
>> +	delayed_refs->num_heads--;
>> +	if (head->processing == 0)
>> +		delayed_refs->num_heads_ready--;
> 
> num_heads_ready will never execute in cleanup_ref_head, since
> processing == 0 only when the ref head is unselected. Perhaps those 2
> lines shouldn't be in this function? I find it a bit confusing that if
> processing is 0 we decrement num_heads_ready in check_ref_cleanup, but
> in unselect_delayed_ref_head we set it to 0 and increment it.
> 
>> +}
>> +
>>  /*
>>   * Helper to insert the ref_node to the tail or merge with tail.
>>   *
>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>> index 8e20c5cb5404..d2af974f68a1 100644
>> --- a/fs/btrfs/delayed-ref.h
>> +++ b/fs/btrfs/delayed-ref.h
>> @@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
>>  {
>>  	mutex_unlock(&head->mutex);
>>  }
>> -
>> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
>> +			   struct btrfs_delayed_ref_head *head);
>>  
>>  struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>>  		struct btrfs_delayed_ref_root *delayed_refs);
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index d242a1174e50..c36b3a42f2bb 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>>  		spin_unlock(&delayed_refs->lock);
>>  		return 1;
>>  	}
>> -	delayed_refs->num_heads--;
>> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
>> -	RB_CLEAR_NODE(&head->href_node);
>> +	btrfs_delete_ref_head(delayed_refs, head);
>>  	spin_unlock(&head->lock);
>>  	spin_unlock(&delayed_refs->lock);
>> -	atomic_dec(&delayed_refs->num_entries);
>>  
>>  	trace_run_delayed_ref_head(fs_info, head, 0);
>>  
>> @@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
>>  	if (!mutex_trylock(&head->mutex))
>>  		goto out;
>>  
>> -	/*
>> -	 * at this point we have a head with no other entries.  Go
>> -	 * ahead and process it.
>> -	 */
>> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
>> -	RB_CLEAR_NODE(&head->href_node);
>> -	atomic_dec(&delayed_refs->num_entries);
>> -
>> -	/*
>> -	 * we don't take a ref on the node because we're removing it from the
>> -	 * tree, so we just steal the ref the tree was holding.
>> -	 */
>> -	delayed_refs->num_heads--;
>> -	if (head->processing == 0)
>> -		delayed_refs->num_heads_ready--;
>> +	btrfs_delete_ref_head(delayed_refs, head);
>>  	head->processing = 0;

On a closer inspection I think here we can do:

ASSERT(head->processing == 0) because at that point we've taken the
head->lock spinlock which is held during ordinary delayed refs
processing (in __btrfs_run_delayed_refs) when the head is selected (and
processing is 1). So head->processing == 0 here I think is a hard
invariant of the code. The decrement here should pair with the increment
when the head was initially added to the tree.

In cleanup_ref_head we don't need to ever worry about num_heads_ready
since it has already been decremented by btrfs_select_ref_head.

As a matter fact this counter is not used anywhere so we might as well
just remove it.
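
I.e. check_ref_cleanup could become something like (untested):

	/*
	 * We hold head->lock, so the head cannot currently be selected
	 * for running and processing must still be 0.
	 */
	ASSERT(head->processing == 0);
	btrfs_delete_ref_head(delayed_refs, head);

with the conditional num_heads_ready-- in the helper either made
unconditional or dropped together with the counter.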

> 
> Something is fishy here, before the code checked for processing == 0 and
> then also set it to 0 ?
> 
>> +
>>  	spin_unlock(&head->lock);
>>  	spin_unlock(&delayed_refs->lock);
>>  
>>
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] btrfs: cleanup extent_op handling
  2018-11-21 18:59 ` [PATCH 3/6] btrfs: cleanup extent_op handling Josef Bacik
  2018-11-22  8:56   ` Lu Fengqi
@ 2018-11-22 10:09   ` Nikolay Borisov
  2018-11-27 15:39     ` Josef Bacik
  2018-11-23 15:05   ` David Sterba
  2 siblings, 1 reply; 23+ messages in thread
From: Nikolay Borisov @ 2018-11-22 10:09 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> The cleanup_extent_op function actually would run the extent_op if it
> needed running, which made the name sort of a misnomer.  Change it to
> run_and_cleanup_extent_op, and move the actual cleanup work to
> cleanup_extent_op so it can be used by check_ref_cleanup() in order to
> unify the extent op handling.

The whole name extent_op is actually a misnomer since AFAIR this is some
sort of modification of the references of metadata nodes. I don't see
why it can't be made as yet another type of reference which is run for a
given node.

> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 36 +++++++++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index e3ed3507018d..8a776dc9cb38 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2424,19 +2424,33 @@ static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_ref
>  	btrfs_delayed_ref_unlock(head);
>  }
>  
> -static int cleanup_extent_op(struct btrfs_trans_handle *trans,
> -			     struct btrfs_delayed_ref_head *head)
> +static struct btrfs_delayed_extent_op *
> +cleanup_extent_op(struct btrfs_trans_handle *trans,
> +		  struct btrfs_delayed_ref_head *head)
>  {
>  	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
> -	int ret;
>  
>  	if (!extent_op)
> -		return 0;
> -	head->extent_op = NULL;
> +		return NULL;
> +
>  	if (head->must_insert_reserved) {
> +		head->extent_op = NULL;
>  		btrfs_free_delayed_extent_op(extent_op);
> -		return 0;
> +		return NULL;
>  	}
> +	return extent_op;
> +}
> +
> +static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans,
> +				     struct btrfs_delayed_ref_head *head)
> +{
> +	struct btrfs_delayed_extent_op *extent_op =
> +		cleanup_extent_op(trans, head);
> +	int ret;
> +
> +	if (!extent_op)
> +		return 0;
> +	head->extent_op = NULL;
>  	spin_unlock(&head->lock);
>  	ret = run_delayed_extent_op(trans, head, extent_op);
>  	btrfs_free_delayed_extent_op(extent_op);
> @@ -2488,7 +2502,7 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  
>  	delayed_refs = &trans->transaction->delayed_refs;
>  
> -	ret = cleanup_extent_op(trans, head);
> +	ret = run_and_cleanup_extent_op(trans, head);
>  	if (ret < 0) {
>  		unselect_delayed_ref_head(delayed_refs, head);
>  		btrfs_debug(fs_info, "run_delayed_extent_op returned %d", ret);
> @@ -6977,12 +6991,8 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
>  	if (!RB_EMPTY_ROOT(&head->ref_tree.rb_root))
>  		goto out;
>  
> -	if (head->extent_op) {
> -		if (!head->must_insert_reserved)
> -			goto out;
> -		btrfs_free_delayed_extent_op(head->extent_op);
> -		head->extent_op = NULL;
> -	}
> +	if (cleanup_extent_op(trans, head) != NULL)
> +		goto out;
>  
>  	/*
>  	 * waiting for the lock here would deadlock.  If someone else has it
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/6] btrfs: only track ref_heads in delayed_ref_updates
  2018-11-21 18:59 ` [PATCH 4/6] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
@ 2018-11-22 10:19   ` Nikolay Borisov
  0 siblings, 0 replies; 23+ messages in thread
From: Nikolay Borisov @ 2018-11-22 10:19 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> We use this number to figure out how many delayed refs to run, but
> __btrfs_run_delayed_refs really only checks every time we need a new
> delayed ref head, so we always run at least one ref head completely no
> matter what the number of items on it.  Fix the accounting to only be
> adjusted when we add/remove a ref head.

LGTM:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

However, what if we kill delayed_ref_updates, since the name is a bit
ambiguous, and instead migrate num_heads_ready from delayed_refs to
trans and use that? Otherwise, as stated previously, num_heads_ready is
currently unused and could be removed.



> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/delayed-ref.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index b3e4c9fcb664..48725fa757a3 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -251,8 +251,6 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
>  	ref->in_tree = 0;
>  	btrfs_put_delayed_ref(ref);
>  	atomic_dec(&delayed_refs->num_entries);
> -	if (trans->delayed_ref_updates)
> -		trans->delayed_ref_updates--;
>  }
>  
>  static bool merge_ref(struct btrfs_trans_handle *trans,
> @@ -467,7 +465,6 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
>  	if (ref->action == BTRFS_ADD_DELAYED_REF)
>  		list_add_tail(&ref->add_list, &href->ref_add_list);
>  	atomic_inc(&root->num_entries);
> -	trans->delayed_ref_updates++;
>  	spin_unlock(&href->lock);
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper
  2018-11-22  9:42     ` Nikolay Borisov
@ 2018-11-23 13:45       ` David Sterba
  2018-11-23 13:50         ` Nikolay Borisov
  0 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2018-11-23 13:45 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team, Josef Bacik

On Thu, Nov 22, 2018 at 11:42:28AM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.11.18 г. 11:12 ч., Nikolay Borisov wrote:
> > 
> > 
> > On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
> >> From: Josef Bacik <jbacik@fb.com>
> >>
> >> We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
> >> into a helper and cleanup the calling functions.
> >>
> >> Signed-off-by: Josef Bacik <jbacik@fb.com>
> >> Reviewed-by: Omar Sandoval <osandov@fb.com>
> >> ---
> >>  fs/btrfs/delayed-ref.c | 14 ++++++++++++++
> >>  fs/btrfs/delayed-ref.h |  3 ++-
> >>  fs/btrfs/extent-tree.c | 22 +++-------------------
> >>  3 files changed, 19 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> >> index 9301b3ad9217..b3e4c9fcb664 100644
> >> --- a/fs/btrfs/delayed-ref.c
> >> +++ b/fs/btrfs/delayed-ref.c
> >> @@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
> >>  	return head;
> >>  }
> >>  
> >> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> >> +			   struct btrfs_delayed_ref_head *head)
> >> +{
> >> +	lockdep_assert_held(&delayed_refs->lock);
> >> +	lockdep_assert_held(&head->lock);
> >> +
> >> +	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> >> +	RB_CLEAR_NODE(&head->href_node);
> >> +	atomic_dec(&delayed_refs->num_entries);
> >> +	delayed_refs->num_heads--;
> >> +	if (head->processing == 0)
> >> +		delayed_refs->num_heads_ready--;
> > 
> > num_heads_ready will never execute in cleanup_ref_head, since
> > processing == 0 only when the ref head is unselected. Perhaps those 2
> > lines shouldn't be in this function? I find it a bit confusing that if
> > processing is 0 we decrement num_heads_ready in check_ref_cleanup, but
> > in unselect_delayed_ref_head we set it to 0 and increment it.
> > 
> >> +}
> >> +
> >>  /*
> >>   * Helper to insert the ref_node to the tail or merge with tail.
> >>   *
> >> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> >> index 8e20c5cb5404..d2af974f68a1 100644
> >> --- a/fs/btrfs/delayed-ref.h
> >> +++ b/fs/btrfs/delayed-ref.h
> >> @@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
> >>  {
> >>  	mutex_unlock(&head->mutex);
> >>  }
> >> -
> >> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> >> +			   struct btrfs_delayed_ref_head *head);
> >>  
> >>  struct btrfs_delayed_ref_head *btrfs_select_ref_head(
> >>  		struct btrfs_delayed_ref_root *delayed_refs);
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index d242a1174e50..c36b3a42f2bb 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
> >>  		spin_unlock(&delayed_refs->lock);
> >>  		return 1;
> >>  	}
> >> -	delayed_refs->num_heads--;
> >> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> >> -	RB_CLEAR_NODE(&head->href_node);
> >> +	btrfs_delete_ref_head(delayed_refs, head);
> >>  	spin_unlock(&head->lock);
> >>  	spin_unlock(&delayed_refs->lock);
> >> -	atomic_dec(&delayed_refs->num_entries);
> >>  
> >>  	trace_run_delayed_ref_head(fs_info, head, 0);
> >>  
> >> @@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
> >>  	if (!mutex_trylock(&head->mutex))
> >>  		goto out;
> >>  
> >> -	/*
> >> -	 * at this point we have a head with no other entries.  Go
> >> -	 * ahead and process it.
> >> -	 */
> >> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> >> -	RB_CLEAR_NODE(&head->href_node);
> >> -	atomic_dec(&delayed_refs->num_entries);
> >> -
> >> -	/*
> >> -	 * we don't take a ref on the node because we're removing it from the
> >> -	 * tree, so we just steal the ref the tree was holding.
> >> -	 */
> >> -	delayed_refs->num_heads--;
> >> -	if (head->processing == 0)
> >> -		delayed_refs->num_heads_ready--;
> >> +	btrfs_delete_ref_head(delayed_refs, head);
> >>  	head->processing = 0;
> 
> On a closer inspection I think here we can do:
> 
> ASSERT(head->processing == 0) because at that point we've taken the
> head->lock spinlock which is held during ordinary delayed refs
> processing (in __btrfs_run_delayed_refs) when the head is selected (and
> processing is 1). So head->processing == 0 here I think is a hard
> invariant of the code. The decrement here should pair with the increment
> when the head was initially added to the tree.
> 
> In cleanup_ref_head we don't need to ever worry about num_heads_ready
> since it has already been decremented by btrfs_select_ref_head.
> 
> As a matter fact this counter is not used anywhere so we might as well
> just remove it.

The logic does not use it, there's only a WARN_ON in
btrfs_select_ref_head, that's more like a debugging or assertion that
everything is fine. So the question is whether to keep it as a
consistency check (and add comments) or remove it and simplify the code.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper
  2018-11-23 13:45       ` David Sterba
@ 2018-11-23 13:50         ` Nikolay Borisov
  0 siblings, 0 replies; 23+ messages in thread
From: Nikolay Borisov @ 2018-11-23 13:50 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team, Josef Bacik



On 23.11.18 г. 15:45 ч., David Sterba wrote:
> On Thu, Nov 22, 2018 at 11:42:28AM +0200, Nikolay Borisov wrote:
>>
>>
>> On 22.11.18 г. 11:12 ч., Nikolay Borisov wrote:
>>>
>>>
>>> On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
>>>> From: Josef Bacik <jbacik@fb.com>
>>>>
>>>> We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
>>>> into a helper and cleanup the calling functions.
>>>>
>>>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>>>> Reviewed-by: Omar Sandoval <osandov@fb.com>
>>>> ---
>>>>  fs/btrfs/delayed-ref.c | 14 ++++++++++++++
>>>>  fs/btrfs/delayed-ref.h |  3 ++-
>>>>  fs/btrfs/extent-tree.c | 22 +++-------------------
>>>>  3 files changed, 19 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>>> index 9301b3ad9217..b3e4c9fcb664 100644
>>>> --- a/fs/btrfs/delayed-ref.c
>>>> +++ b/fs/btrfs/delayed-ref.c
>>>> @@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>>>>  	return head;
>>>>  }
>>>>  
>>>> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
>>>> +			   struct btrfs_delayed_ref_head *head)
>>>> +{
>>>> +	lockdep_assert_held(&delayed_refs->lock);
>>>> +	lockdep_assert_held(&head->lock);
>>>> +
>>>> +	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
>>>> +	RB_CLEAR_NODE(&head->href_node);
>>>> +	atomic_dec(&delayed_refs->num_entries);
>>>> +	delayed_refs->num_heads--;
>>>> +	if (head->processing == 0)
>>>> +		delayed_refs->num_heads_ready--;
>>>
>>> num_heads_ready will never execute in cleanup_ref_head, since
>>> processing == 0 only when the ref head is unselected. Perhaps those 2
>>> lines shouldn't be in this function? I find it a bit confusing that if
>>> processing is 0 we decrement num_heads_ready in check_ref_cleanup, but
>>> in unselect_delayed_ref_head we set it to 0 and increment it.
>>>
>>>> +}
>>>> +
>>>>  /*
>>>>   * Helper to insert the ref_node to the tail or merge with tail.
>>>>   *
>>>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>>>> index 8e20c5cb5404..d2af974f68a1 100644
>>>> --- a/fs/btrfs/delayed-ref.h
>>>> +++ b/fs/btrfs/delayed-ref.h
>>>> @@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
>>>>  {
>>>>  	mutex_unlock(&head->mutex);
>>>>  }
>>>> -
>>>> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
>>>> +			   struct btrfs_delayed_ref_head *head);
>>>>  
>>>>  struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>>>>  		struct btrfs_delayed_ref_root *delayed_refs);
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index d242a1174e50..c36b3a42f2bb 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>>>>  		spin_unlock(&delayed_refs->lock);
>>>>  		return 1;
>>>>  	}
>>>> -	delayed_refs->num_heads--;
>>>> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
>>>> -	RB_CLEAR_NODE(&head->href_node);
>>>> +	btrfs_delete_ref_head(delayed_refs, head);
>>>>  	spin_unlock(&head->lock);
>>>>  	spin_unlock(&delayed_refs->lock);
>>>> -	atomic_dec(&delayed_refs->num_entries);
>>>>  
>>>>  	trace_run_delayed_ref_head(fs_info, head, 0);
>>>>  
>>>> @@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
>>>>  	if (!mutex_trylock(&head->mutex))
>>>>  		goto out;
>>>>  
>>>> -	/*
>>>> -	 * at this point we have a head with no other entries.  Go
>>>> -	 * ahead and process it.
>>>> -	 */
>>>> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
>>>> -	RB_CLEAR_NODE(&head->href_node);
>>>> -	atomic_dec(&delayed_refs->num_entries);
>>>> -
>>>> -	/*
>>>> -	 * we don't take a ref on the node because we're removing it from the
>>>> -	 * tree, so we just steal the ref the tree was holding.
>>>> -	 */
>>>> -	delayed_refs->num_heads--;
>>>> -	if (head->processing == 0)
>>>> -		delayed_refs->num_heads_ready--;
>>>> +	btrfs_delete_ref_head(delayed_refs, head);
>>>>  	head->processing = 0;
>>
>> On a closer inspection I think here we can do:
>>
>> ASSERT(head->processing == 0) because at that point we've taken the
>> head->lock spinlock which is held during ordinary delayed refs
>> processing (in __btrfs_run_delayed_refs) when the head is selected (and
>> processing is 1). So head->processing == 0 here I think is a hard
>> invariant of the code. The decrement here should pair with the increment
>> when the head was initially added to the tree.
>>
>> In cleanup_ref_head we don't need to ever worry about num_heads_ready
>> since it has already been decremented by btrfs_select_ref_head.
>>
>> As a matter fact this counter is not used anywhere so we might as well
>> just remove it.
> 
> The logic does not use it, there's only a WARN_ON in
> btrfs_select_ref_head, that's more like a debugging or assertion that
> everything is fine. So the question is whether to keep it as a
> consistency check (and add comments) or remove it and simplify the code.

IMO it should go. A later patch pretty much tracks what this number used
to track - "btrfs: only track ref_heads in delayed_ref_updates".

Even for consistency I don't see much value brought by num_heads_ready.

> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper
  2018-11-22  1:06   ` Qu Wenruo
@ 2018-11-23 13:51     ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2018-11-23 13:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, linux-btrfs, kernel-team, Josef Bacik

On Thu, Nov 22, 2018 at 09:06:30AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/11/22 上午2:59, Josef Bacik wrote:
> > From: Josef Bacik <jbacik@fb.com>
> > 
> > We were missing some quota cleanups in check_ref_cleanup, so break the
> > ref head accounting cleanup into a helper and call that from both
> > check_ref_cleanup and cleanup_ref_head.  This will hopefully ensure that
> > we don't screw up accounting in the future for other things that we add.
> > 
> > Reviewed-by: Omar Sandoval <osandov@fb.com>
> > Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
> > Signed-off-by: Josef Bacik <jbacik@fb.com>
> > ---
> >  fs/btrfs/extent-tree.c | 67 +++++++++++++++++++++++++++++---------------------
> >  1 file changed, 39 insertions(+), 28 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index c36b3a42f2bb..e3ed3507018d 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -2443,6 +2443,41 @@ static int cleanup_extent_op(struct btrfs_trans_handle *trans,
> >  	return ret ? ret : 1;
> >  }
> >  
> > +static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
> > +					struct btrfs_delayed_ref_head *head)
> > +{
> > +	struct btrfs_fs_info *fs_info = trans->fs_info;
> > +	struct btrfs_delayed_ref_root *delayed_refs =
> > +		&trans->transaction->delayed_refs;
> > +
> > +	if (head->total_ref_mod < 0) {
> > +		struct btrfs_space_info *space_info;
> > +		u64 flags;
> > +
> > +		if (head->is_data)
> > +			flags = BTRFS_BLOCK_GROUP_DATA;
> > +		else if (head->is_system)
> > +			flags = BTRFS_BLOCK_GROUP_SYSTEM;
> > +		else
> > +			flags = BTRFS_BLOCK_GROUP_METADATA;
> > +		space_info = __find_space_info(fs_info, flags);
> > +		ASSERT(space_info);
> > +		percpu_counter_add_batch(&space_info->total_bytes_pinned,
> > +				   -head->num_bytes,
> > +				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
> > +
> > +		if (head->is_data) {
> > +			spin_lock(&delayed_refs->lock);
> > +			delayed_refs->pending_csums -= head->num_bytes;
> > +			spin_unlock(&delayed_refs->lock);
> > +		}
> > +	}
> > +
> > +	/* Also free its reserved qgroup space */
> > +	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
> > +				      head->qgroup_reserved);
> 
> This part will be removed in patch "[PATCH] btrfs: qgroup: Move reserved
> data account from btrfs_delayed_ref_head to btrfs_qgroup_extent_record".

We need to decide the order of merging if the patches touch the same
code. Your patch looks easier to refresh on top of this series, so
please hold on until this gets merged.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] btrfs: cleanup extent_op handling
  2018-11-21 18:59 ` [PATCH 3/6] btrfs: cleanup extent_op handling Josef Bacik
  2018-11-22  8:56   ` Lu Fengqi
  2018-11-22 10:09   ` Nikolay Borisov
@ 2018-11-23 15:05   ` David Sterba
  2 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2018-11-23 15:05 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Josef Bacik

On Wed, Nov 21, 2018 at 01:59:09PM -0500, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> The cleanup_extent_op function actually would run the extent_op if it
> needed running, which made the name sort of a misnomer.  Change it to
> run_and_cleanup_extent_op, and move the actual cleanup work to
> cleanup_extent_op so it can be used by check_ref_cleanup() in order to
> unify the extent op handling.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 36 +++++++++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index e3ed3507018d..8a776dc9cb38 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2424,19 +2424,33 @@ static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_ref
>  	btrfs_delayed_ref_unlock(head);
>  }
>  
> -static int cleanup_extent_op(struct btrfs_trans_handle *trans,
> -			     struct btrfs_delayed_ref_head *head)
> +static struct btrfs_delayed_extent_op *
> +cleanup_extent_op(struct btrfs_trans_handle *trans,
> +		  struct btrfs_delayed_ref_head *head)

Please keep the type and function name on one line; if the arguments
would overflow, move them to the next line and un-indent as necessary.
I fix such things when merging, no need to resend.
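
I.e. the preferred layout is roughly:

static struct btrfs_delayed_extent_op *cleanup_extent_op(
		struct btrfs_trans_handle *trans,
		struct btrfs_delayed_ref_head *head)

with the body unchanged.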

>  {
>  	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
> -	int ret;
>  
>  	if (!extent_op)
> -		return 0;
> -	head->extent_op = NULL;
> +		return NULL;
> +
>  	if (head->must_insert_reserved) {
> +		head->extent_op = NULL;
>  		btrfs_free_delayed_extent_op(extent_op);
> -		return 0;
> +		return NULL;
>  	}
> +	return extent_op;
> +}
> +
> +static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans,
> +				     struct btrfs_delayed_ref_head *head)
> +{
> +	struct btrfs_delayed_extent_op *extent_op =
> +		cleanup_extent_op(trans, head);

I prefer to do non-trivial initializations in the statement block; it's
easy to overlook them among the declarations. Simple variable init,
pointer dereferences or simple helpers are fine, but cleanup_extent_op
does not seem to fit that. It's in other patches too, so I can update
that unless there are other things that would need a resend.
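
I.e. something along these lines (untested, body taken from the patch
with only the initialization moved, keeping the final return from the
original cleanup_extent_op):

static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans,
				     struct btrfs_delayed_ref_head *head)
{
	struct btrfs_delayed_extent_op *extent_op;
	int ret;

	/* the non-trivial call now lives in the statement block */
	extent_op = cleanup_extent_op(trans, head);
	if (!extent_op)
		return 0;

	head->extent_op = NULL;
	spin_unlock(&head->lock);
	ret = run_delayed_extent_op(trans, head, extent_op);
	btrfs_free_delayed_extent_op(extent_op);
	return ret ? ret : 1;
}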

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Delayed refs rsv
  2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
                   ` (5 preceding siblings ...)
  2018-11-21 18:59 ` [PATCH 6/6] btrfs: fix truncate throttling Josef Bacik
@ 2018-11-23 15:55 ` David Sterba
  6 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2018-11-23 15:55 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Nov 21, 2018 at 01:59:06PM -0500, Josef Bacik wrote:
> This patchset changes how we do space reservations for delayed refs.  We were
> hitting probably 20-40 enospc abort's per day in production while running
> delayed refs at transaction commit time.  This means we ran out of space in the
> global reserve and couldn't easily get more space in use_block_rsv().
> 
> The global reserve has grown to cover everything we don't reserve space
> explicitly for, and we've grown a lot of weird ad-hoc hueristics to know if
> we're running short on space and when it's time to force a commit.  A failure
> rate of 20-40 file systems when we run hundreds of thousands of them isn't super
> high, but cleaning up this code will make things less ugly and more predictible.
> 
> Thus the delayed refs rsv.  We always know how many delayed refs we have
> outstanding, and although running them generates more we can use the global
> reserve for that spill over, which fits better into it's desired use than a full
> blown reservation.  This first approach is to simply take how many times we're
> reserving space for and multiply that by 2 in order to save enough space for the
> delayed refs that could be generated.  This is a niave approach and will
> probably evolve, but for now it works.
> 
> With this patchset we've gone down to 2-8 failures per week.  It's not perfect,
> there are some corner cases that still need to be addressed, but is
> significantly better than what we had.  Thanks,

Parts of this cover letter could go into the main patch 5/6 to extend
the background on why the change is made.

As the other patchsets build on top of this, I'll merge this to
misc-next soon. The testing so far looks ok; some patches might need
fixups, but nothing big that I can't do at commit time.

Patch 5/6 still seems too big; some parts could be split out as
preparatory work. Otherwise the main idea of where to wire in the
special block reserve seems clear. If anybody wants to do a review
there, please do; I only did a high-level pass and did not check all the
calculations etc.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] btrfs: introduce delayed_refs_rsv
  2018-11-21 18:59 ` [PATCH 5/6] btrfs: introduce delayed_refs_rsv Josef Bacik
@ 2018-11-26  9:14   ` Nikolay Borisov
  2018-11-27 15:38     ` David Sterba
  2018-11-27 19:11     ` Josef Bacik
  0 siblings, 2 replies; 23+ messages in thread
From: Nikolay Borisov @ 2018-11-26  9:14 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> Traditionally we've had voodoo in btrfs to account for the space that
> delayed refs may take up by having a global_block_rsv.  This works most
> of the time, except when it doesn't.  We've had issues reported and seen
> in production where sometimes the global reserve is exhausted during
> transaction commit before we can run all of our delayed refs, resulting
> in an aborted transaction.  Because of this voodoo we have equally
> dubious flushing semantics around throttling delayed refs which we often
> get wrong.
> 
> So instead give them their own block_rsv.  This way we can always know
> exactly how much outstanding space we need for delayed refs.  This
> allows us to make sure we are constantly filling that reservation up
> with space, and allows us to put more precise pressure on the enospc
> system.  Instead of doing math to see if its a good time to throttle,
> the normal enospc code will be invoked if we have a lot of delayed refs
> pending, and they will be run via the normal flushing mechanism.
> 
> For now the delayed_refs_rsv will hold the reservations for the delayed
> refs, the block group updates, and deleting csums.  We could have a
> separate rsv for the block group updates, but the csum deletion stuff is
> still handled via the delayed_refs so that will stay there.

Couldn't the same "no premature ENOSPC in critical section" effect be
achieved if we simply allocated 2x the number of bytes in
start_transaction, without adding the additional granularity for
delayed refs? There seems to be a lot of supporting code added, and this
obfuscates the simple idea that now we do a 2x reservation and put it in
a separate block_rsv structure.
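
A minimal sketch of what I mean, assuming start_transaction keeps sizing
the reservation with btrfs_calc_trans_metadata_size and num_items is
what the caller passes to btrfs_start_transaction:

	/*
	 * rough idea, not a real patch: reserve twice the worst case up
	 * front so the delayed refs generated later are already covered
	 */
	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 2 * num_items);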

> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/ctree.h             |  26 ++--
>  fs/btrfs/delayed-ref.c       |  28 ++++-
>  fs/btrfs/disk-io.c           |   4 +
>  fs/btrfs/extent-tree.c       | 279 +++++++++++++++++++++++++++++++++++--------
>  fs/btrfs/inode.c             |   4 +-
>  fs/btrfs/transaction.c       |  77 ++++++------
>  include/trace/events/btrfs.h |   2 +
>  7 files changed, 313 insertions(+), 107 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 8b41ec42f405..0c6d589c8ce4 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -448,8 +448,9 @@ struct btrfs_space_info {
>  #define	BTRFS_BLOCK_RSV_TRANS		3
>  #define	BTRFS_BLOCK_RSV_CHUNK		4
>  #define	BTRFS_BLOCK_RSV_DELOPS		5
> -#define	BTRFS_BLOCK_RSV_EMPTY		6
> -#define	BTRFS_BLOCK_RSV_TEMP		7
> +#define BTRFS_BLOCK_RSV_DELREFS		6
> +#define	BTRFS_BLOCK_RSV_EMPTY		7
> +#define	BTRFS_BLOCK_RSV_TEMP		8
>  
>  struct btrfs_block_rsv {
>  	u64 size;
> @@ -812,6 +813,8 @@ struct btrfs_fs_info {
>  	struct btrfs_block_rsv chunk_block_rsv;
>  	/* block reservation for delayed operations */
>  	struct btrfs_block_rsv delayed_block_rsv;
> +	/* block reservation for delayed refs */
> +	struct btrfs_block_rsv delayed_refs_rsv;
>  
>  	struct btrfs_block_rsv empty_block_rsv;
>  
> @@ -2628,7 +2631,7 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
>  }
>  
>  int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
> -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans);
> +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
>  void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
>  					 const u64 start);
>  void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg);
> @@ -2742,10 +2745,12 @@ enum btrfs_reserve_flush_enum {
>  enum btrfs_flush_state {
>  	FLUSH_DELAYED_ITEMS_NR	=	1,
>  	FLUSH_DELAYED_ITEMS	=	2,
> -	FLUSH_DELALLOC		=	3,
> -	FLUSH_DELALLOC_WAIT	=	4,
> -	ALLOC_CHUNK		=	5,
> -	COMMIT_TRANS		=	6,
> +	FLUSH_DELAYED_REFS_NR	=	3,
> +	FLUSH_DELAYED_REFS	=	4,
> +	FLUSH_DELALLOC		=	5,
> +	FLUSH_DELALLOC_WAIT	=	6,
> +	ALLOC_CHUNK		=	7,
> +	COMMIT_TRANS		=	8,
>  };
>  
>  int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
> @@ -2796,6 +2801,13 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
>  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>  			     struct btrfs_block_rsv *block_rsv,
>  			     u64 num_bytes);
> +void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr);
> +void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans);
> +int btrfs_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
> +				enum btrfs_reserve_flush_enum flush);
> +void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
> +				       struct btrfs_block_rsv *src,
> +				       u64 num_bytes);
>  int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache);
>  void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache);
>  void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 48725fa757a3..96de92588f06 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -474,11 +474,14 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
>   * existing and update must have the same bytenr
>   */
>  static noinline void
> -update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
> +update_existing_head_ref(struct btrfs_trans_handle *trans,
>  			 struct btrfs_delayed_ref_head *existing,
>  			 struct btrfs_delayed_ref_head *update,
>  			 int *old_ref_mod_ret)
>  {
> +	struct btrfs_delayed_ref_root *delayed_refs =
> +		&trans->transaction->delayed_refs;
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
>  	int old_ref_mod;
>  
>  	BUG_ON(existing->is_data != update->is_data);
> @@ -536,10 +539,18 @@ update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
>  	 * versa we need to make sure to adjust pending_csums accordingly.
>  	 */
>  	if (existing->is_data) {
> -		if (existing->total_ref_mod >= 0 && old_ref_mod < 0)
> +		u64 csum_items =

The name is a bit of a misnomer; what csum_items really holds is the
number of btree leaves required to store them.
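
So a name like csum_leaves would read better, i.e. (same call, just
renamed):

		u64 csum_leaves =
			btrfs_csum_bytes_to_leaves(fs_info,
						   existing->num_bytes);

with the uses below adjusted accordingly.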

> +			btrfs_csum_bytes_to_leaves(fs_info,
> +						   existing->num_bytes);
> +
> +		if (existing->total_ref_mod >= 0 && old_ref_mod < 0) {
>  			delayed_refs->pending_csums -= existing->num_bytes;
> -		if (existing->total_ref_mod < 0 && old_ref_mod >= 0)
> +			btrfs_delayed_refs_rsv_release(fs_info, csum_items);
> +		}
> +		if (existing->total_ref_mod < 0 && old_ref_mod >= 0) {
>  			delayed_refs->pending_csums += existing->num_bytes;
> +			trans->delayed_ref_updates += csum_items;

In addition to tracking the number of delayed heads/refs,
delayed_ref_updates now also tracks the number of csum leaves. Doesn't
this make the number in delayed_ref_updates a bit of a mess?

Additionally, shouldn't delayed_ref_updates really track only refs,
otherwise double accounting occurs? I.e. for a brand-new ref insertion
we create a head and increment delayed_ref_updates once in
add_delayed_ref_head, then we create the ref and increment it again in
insert_delayed_ref.


> +		}
>  	}
>  	spin_unlock(&existing->lock);
>  }
> @@ -645,7 +656,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>  			&& head_ref->qgroup_reserved
>  			&& existing->qgroup_ref_root
>  			&& existing->qgroup_reserved);
> -		update_existing_head_ref(delayed_refs, existing, head_ref,
> +		update_existing_head_ref(trans, existing, head_ref,
>  					 old_ref_mod);
>  		/*
>  		 * we've updated the existing ref, free the newly
> @@ -656,8 +667,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>  	} else {
>  		if (old_ref_mod)
>  			*old_ref_mod = 0;
> -		if (head_ref->is_data && head_ref->ref_mod < 0)
> +		if (head_ref->is_data && head_ref->ref_mod < 0) {

nit: I think for the sake of consistency you must really check
head_ref->total_ref_mod as per the code in:

1262133b8d6f ("Btrfs: account for crcs in delayed ref processing")

Additionally the comment above total_ref_mod also documents this.

Here it's guaranteed that head_ref->ref_mod and ->total_ref_mod are
identical.
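
I.e. (untested, same branch with just the condition changed):

		if (head_ref->is_data && head_ref->total_ref_mod < 0) {
			delayed_refs->pending_csums += head_ref->num_bytes;
			trans->delayed_ref_updates +=
				btrfs_csum_bytes_to_leaves(trans->fs_info,
							   head_ref->num_bytes);
		}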

>  			delayed_refs->pending_csums += head_ref->num_bytes;
> +			trans->delayed_ref_updates +=
> +				btrfs_csum_bytes_to_leaves(trans->fs_info,
> +							   head_ref->num_bytes);
> +		}
>  		delayed_refs->num_heads++;
>  		delayed_refs->num_heads_ready++;
>  		atomic_inc(&delayed_refs->num_entries);
> @@ -792,6 +807,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
>  
>  	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
>  	spin_unlock(&delayed_refs->lock);
> +	btrfs_update_delayed_refs_rsv(trans);
>  
>  	trace_add_delayed_tree_ref(fs_info, &ref->node, ref,
>  				   action == BTRFS_ADD_DELAYED_EXTENT ?
> @@ -873,6 +889,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
>  
>  	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
>  	spin_unlock(&delayed_refs->lock);
> +	btrfs_update_delayed_refs_rsv(trans);

As stated previously, this call to btrfs_update_delayed_refs_rsv is
really paired with the code that was executed in add_delayed_ref_head.
Since that's not obvious at first look, at least a comment would be
nice. Or better yet (as suggested previously), just make updating
delayed_ref_updates and the rsv a single operation. I don't see how this
function adds much to the critical section of delayed_refs->lock. The
interface as it stands is really error-prone.
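
E.g. a small helper (untested, name made up) that always does both
would make it harder to forget one of the two steps:

/*
 * Hypothetical helper: account new delayed ref items and refresh the
 * delayed_refs_rsv size in one place.
 */
static void btrfs_trans_add_delayed_ref_items(struct btrfs_trans_handle *trans,
					      int nr_items)
{
	trans->delayed_ref_updates += nr_items;
	btrfs_update_delayed_refs_rsv(trans);
}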

>  
>  	trace_add_delayed_data_ref(trans->fs_info, &ref->node, ref,
>  				   action == BTRFS_ADD_DELAYED_EXTENT ?
> @@ -910,6 +927,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>  			     NULL, NULL, NULL);
>  
>  	spin_unlock(&delayed_refs->lock);
> +	btrfs_update_delayed_refs_rsv(trans);
Ditto

>  	return 0;
>  }
>  
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 55e8ca782b98..beaa58e742b5 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2666,6 +2666,9 @@ int open_ctree(struct super_block *sb,
>  	btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
>  	btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
>  			     BTRFS_BLOCK_RSV_DELOPS);
> +	btrfs_init_block_rsv(&fs_info->delayed_refs_rsv,
> +			     BTRFS_BLOCK_RSV_DELREFS);
> +
>  	atomic_set(&fs_info->async_delalloc_pages, 0);
>  	atomic_set(&fs_info->defrag_running, 0);
>  	atomic_set(&fs_info->qgroup_op_seq, 0);
> @@ -4413,6 +4416,7 @@ void btrfs_cleanup_dirty_bgs(struct btrfs_transaction *cur_trans,
>  
>  		spin_unlock(&cur_trans->dirty_bgs_lock);
>  		btrfs_put_block_group(cache);
> +		btrfs_delayed_refs_rsv_release(fs_info, 1);
>  		spin_lock(&cur_trans->dirty_bgs_lock);
>  	}
>  	spin_unlock(&cur_trans->dirty_bgs_lock);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8a776dc9cb38..8af68b13fa27 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2463,6 +2463,7 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
>  	struct btrfs_fs_info *fs_info = trans->fs_info;
>  	struct btrfs_delayed_ref_root *delayed_refs =
>  		&trans->transaction->delayed_refs;
> +	int nr_items = 1;

This 1 is a magic value, where does it come from?

>  
>  	if (head->total_ref_mod < 0) {
>  		struct btrfs_space_info *space_info;
> @@ -2484,12 +2485,15 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
>  			spin_lock(&delayed_refs->lock);
>  			delayed_refs->pending_csums -= head->num_bytes;
>  			spin_unlock(&delayed_refs->lock);
> +			nr_items += btrfs_csum_bytes_to_leaves(fs_info,
> +				head->num_bytes);
>  		}
>  	}
>  
>  	/* Also free its reserved qgroup space */
>  	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
>  				      head->qgroup_reserved);
> +	btrfs_delayed_refs_rsv_release(fs_info, nr_items);
>  }
>  
>  static int cleanup_ref_head(struct btrfs_trans_handle *trans,
> @@ -2831,40 +2835,22 @@ u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes)
>  	return num_csums;
>  }
>  
> -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans)
> +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
>  {
> -	struct btrfs_fs_info *fs_info = trans->fs_info;
> -	struct btrfs_block_rsv *global_rsv;
> -	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
> -	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
> -	unsigned int num_dirty_bgs = trans->transaction->num_dirty_bgs;
> -	u64 num_bytes, num_dirty_bgs_bytes;
> -	int ret = 0;
> -
> -	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> -	num_heads = heads_to_leaves(fs_info, num_heads);
> -	if (num_heads > 1)
> -		num_bytes += (num_heads - 1) * fs_info->nodesize;
> -	num_bytes <<= 1;
> -	num_bytes += btrfs_csum_bytes_to_leaves(fs_info, csum_bytes) *
> -							fs_info->nodesize;
> -	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(fs_info,
> -							     num_dirty_bgs);
> -	global_rsv = &fs_info->global_block_rsv;
> -
> -	/*
> -	 * If we can't allocate any more chunks lets make sure we have _lots_ of
> -	 * wiggle room since running delayed refs can create more delayed refs.
> -	 */
> -	if (global_rsv->space_info->full) {
> -		num_dirty_bgs_bytes <<= 1;
> -		num_bytes <<= 1;
> -	}
> +	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
> +	u64 reserved;
> +	bool ret = false;

nit: reverse Christmas tree ordering of the declarations:

delayed_refs_rsv
global_rsv
bool ret
reserved
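
Spelled out (same declarations, just reordered):

	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
	bool ret = false;
	u64 reserved;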

>  
>  	spin_lock(&global_rsv->lock);
> -	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
> -		ret = 1;
> +	reserved = global_rsv->reserved;
>  	spin_unlock(&global_rsv->lock);
> +
> +	spin_lock(&delayed_refs_rsv->lock);
> +	reserved += delayed_refs_rsv->reserved;
> +	if (delayed_refs_rsv->size >= reserved)
> +		ret = true;

Essentially you want to throttle every time there is enough available
reserved space, no?

> +	spin_unlock(&delayed_refs_rsv->lock);
>  	return ret;
>  }
>  
> @@ -2883,7 +2869,7 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
>  	if (val >= NSEC_PER_SEC / 2)
>  		return 2;
>  
> -	return btrfs_check_space_for_delayed_refs(trans);
> +	return btrfs_check_space_for_delayed_refs(trans->fs_info);
>  }
>  
>  struct async_delayed_refs {
> @@ -3627,6 +3613,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
>  	 */
>  	mutex_lock(&trans->transaction->cache_write_mutex);
>  	while (!list_empty(&dirty)) {
> +		bool drop_reserve = true;
> +
>  		cache = list_first_entry(&dirty,
>  					 struct btrfs_block_group_cache,
>  					 dirty_list);
> @@ -3699,6 +3687,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
>  					list_add_tail(&cache->dirty_list,
>  						      &cur_trans->dirty_bgs);
>  					btrfs_get_block_group(cache);
> +					drop_reserve = false;
>  				}
>  				spin_unlock(&cur_trans->dirty_bgs_lock);
>  			} else if (ret) {
> @@ -3709,6 +3698,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
>  		/* if its not on the io list, we need to put the block group */
>  		if (should_put)
>  			btrfs_put_block_group(cache);
> +		if (drop_reserve)
> +			btrfs_delayed_refs_rsv_release(fs_info, 1);
>  
>  		if (ret)
>  			break;
> @@ -3857,6 +3848,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
>  		/* if its not on the io list, we need to put the block group */
>  		if (should_put)
>  			btrfs_put_block_group(cache);
> +		btrfs_delayed_refs_rsv_release(fs_info, 1);
>  		spin_lock(&cur_trans->dirty_bgs_lock);
>  	}
>  	spin_unlock(&cur_trans->dirty_bgs_lock);
> @@ -4829,8 +4821,10 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  {
>  	struct reserve_ticket *ticket = NULL;
>  	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
> +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
>  	struct btrfs_trans_handle *trans;
>  	u64 bytes;
> +	u64 reclaim_bytes = 0;
>  
>  	trans = (struct btrfs_trans_handle *)current->journal_info;
>  	if (trans)
> @@ -4863,12 +4857,16 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  		return -ENOSPC;
>  
>  	spin_lock(&delayed_rsv->lock);
> -	if (delayed_rsv->size > bytes)
> -		bytes = 0;
> -	else
> -		bytes -= delayed_rsv->size;
> +	reclaim_bytes += delayed_rsv->reserved;
>  	spin_unlock(&delayed_rsv->lock);
>  
> +	spin_lock(&delayed_refs_rsv->lock);
> +	reclaim_bytes += delayed_refs_rsv->reserved;
> +	spin_unlock(&delayed_refs_rsv->lock);
> +	if (reclaim_bytes >= bytes)
> +		goto commit;
> +	bytes -= reclaim_bytes;

Reading reclaim_bytes I start thinking "this must be the value we want
to reclaim in order to satisfy the allocation", but no, in reality it
holds the amount of available bytes, which is subtracted from the
required bytes. Rename it to something like "available_bytes".
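
I.e. roughly (just the rename, behaviour unchanged):

	u64 available_bytes = 0;

	spin_lock(&delayed_rsv->lock);
	available_bytes += delayed_rsv->reserved;
	spin_unlock(&delayed_rsv->lock);

	spin_lock(&delayed_refs_rsv->lock);
	available_bytes += delayed_refs_rsv->reserved;
	spin_unlock(&delayed_refs_rsv->lock);
	if (available_bytes >= bytes)
		goto commit;
	bytes -= available_bytes;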

> +
>  	if (__percpu_counter_compare(&space_info->total_bytes_pinned,
>  				   bytes,
>  				   BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0) {

<snip>

> +/**
> + * btrfs_throttle_delayed_refs - throttle based on our delayed refs usage.
> + * @fs_info - the fs_info for our fs.
> + * @flush - control how we can flush for this reservation.
> + *
> + * This will refill the delayed block_rsv up to 1 items size worth of space and
> + * will return -ENOSPC if we can't make the reservation.
> + */
> +int btrfs_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
> +				enum btrfs_reserve_flush_enum flush)
> +{

Why is this function called throttle? I don't see any throttling logic
in it.
> +	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
> +	u64 limit = btrfs_calc_trans_metadata_size(fs_info, 1);
> +	u64 num_bytes = 0;
> +	int ret = -ENOSPC;
> +
> +	spin_lock(&block_rsv->lock);
> +	if (block_rsv->reserved < block_rsv->size) {
> +		num_bytes = block_rsv->size - block_rsv->reserved;
> +		num_bytes = min(num_bytes, limit);
> +	}
> +	spin_unlock(&block_rsv->lock);
> +
> +	if (!num_bytes)
> +		return 0;
> +
> +	ret = reserve_metadata_bytes(fs_info->extent_root, block_rsv,
> +				     num_bytes, flush);
> +	if (ret)
> +		return ret;
> +	block_rsv_add_bytes(block_rsv, num_bytes, 0);
> +	trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
> +				      0, num_bytes, 1);
> +	return 0;
> +}
> +

<snip>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] btrfs: fix truncate throttling
  2018-11-21 18:59 ` [PATCH 6/6] btrfs: fix truncate throttling Josef Bacik
@ 2018-11-26  9:44   ` Nikolay Borisov
  0 siblings, 0 replies; 23+ messages in thread
From: Nikolay Borisov @ 2018-11-26  9:44 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 21.11.18 at 20:59, Josef Bacik wrote:
> We have a bunch of magic to make sure we're throttling delayed refs when
> truncating a file.  Now that we have a delayed refs rsv and a mechanism
> for refilling that reserve simply use that instead of all of this magic.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/inode.c | 79 ++++++++++++--------------------------------------------
>  1 file changed, 17 insertions(+), 62 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8532a2eb56d1..cae30f6c095f 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4437,31 +4437,6 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
>  	return err;
>  }
>  
> -static int truncate_space_check(struct btrfs_trans_handle *trans,
> -				struct btrfs_root *root,
> -				u64 bytes_deleted)
> -{
> -	struct btrfs_fs_info *fs_info = root->fs_info;
> -	int ret;
> -
> -	/*
> -	 * This is only used to apply pressure to the enospc system, we don't
> -	 * intend to use this reservation at all.
> -	 */
> -	bytes_deleted = btrfs_csum_bytes_to_leaves(fs_info, bytes_deleted);
> -	bytes_deleted *= fs_info->nodesize;
> -	ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv,
> -				  bytes_deleted, BTRFS_RESERVE_NO_FLUSH);
> -	if (!ret) {
> -		trace_btrfs_space_reservation(fs_info, "transaction",
> -					      trans->transid,
> -					      bytes_deleted, 1);
> -		trans->bytes_reserved += bytes_deleted;
> -	}
> -	return ret;
> -
> -}
> -
>  /*
>   * Return this if we need to call truncate_block for the last bit of the
>   * truncate.
> @@ -4506,7 +4481,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>  	u64 bytes_deleted = 0;
>  	bool be_nice = false;
>  	bool should_throttle = false;
> -	bool should_end = false;
>  
>  	BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
>  
> @@ -4719,15 +4693,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>  				btrfs_abort_transaction(trans, ret);
>  				break;
>  			}
> -			if (btrfs_should_throttle_delayed_refs(trans))
> -				btrfs_async_run_delayed_refs(fs_info,
> -					trans->delayed_ref_updates * 2,
> -					trans->transid, 0);
>  			if (be_nice) {
> -				if (truncate_space_check(trans, root,
> -							 extent_num_bytes)) {
> -					should_end = true;
> -				}
>  				if (btrfs_should_throttle_delayed_refs(trans))
>  					should_throttle = true;
>  			}
> @@ -4738,7 +4704,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>  
>  		if (path->slots[0] == 0 ||
>  		    path->slots[0] != pending_del_slot ||
> -		    should_throttle || should_end) {
> +		    should_throttle) {
>  			if (pending_del_nr) {
>  				ret = btrfs_del_items(trans, root, path,
>  						pending_del_slot,
> @@ -4750,23 +4716,24 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>  				pending_del_nr = 0;
>  			}
>  			btrfs_release_path(path);
> -			if (should_throttle) {
> -				unsigned long updates = trans->delayed_ref_updates;
> -				if (updates) {
> -					trans->delayed_ref_updates = 0;
> -					ret = btrfs_run_delayed_refs(trans,
> -								   updates * 2);
> -					if (ret)
> -						break;
> -				}
> -			}
> +
>  			/*
> -			 * if we failed to refill our space rsv, bail out
> -			 * and let the transaction restart
> +			 * We can generate a lot of delayed refs, so we need to
> +			 * throttle every once in a while and make sure we're
> +			 * adding enough space to keep up with the work we are
> +			 * generating.  Since we hold a transaction here we
> +			 * can't flush, and we don't want to FLUSH_LIMIT because
> +			 * we could have generated too many delayed refs to
> +			 * actually allocate, so just bail if we're short and
> +			 * let the normal reservation dance happen higher up.
>  			 */
> -			if (should_end) {
> -				ret = -EAGAIN;
> -				break;
> +			if (should_throttle) {
> +				ret = btrfs_throttle_delayed_refs(fs_info,
> +							BTRFS_RESERVE_NO_FLUSH);
> +				if (ret) {
> +					ret = -EAGAIN;
> +					break;
> +				}
>  			}
>  			goto search_again;
>  		} else {
> @@ -4792,18 +4759,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>  	}
>  
>  	btrfs_free_path(path);
> -
> -	if (be_nice && bytes_deleted > SZ_32M && (ret >= 0 || ret == -EAGAIN)) {
> -		unsigned long updates = trans->delayed_ref_updates;
> -		int err;
> -
> -		if (updates) {
> -			trans->delayed_ref_updates = 0;
> -			err = btrfs_run_delayed_refs(trans, updates * 2);
> -			if (err)
> -				ret = err;
> -		}
> -	}
>  	return ret;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] btrfs: introduce delayed_refs_rsv
  2018-11-26  9:14   ` Nikolay Borisov
@ 2018-11-27 15:38     ` David Sterba
  2018-11-27 19:11     ` Josef Bacik
  1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2018-11-27 15:38 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team, Josef Bacik

On Mon, Nov 26, 2018 at 11:14:12AM +0200, Nikolay Borisov wrote:
> > -	if (global_rsv->space_info->full) {
> > -		num_dirty_bgs_bytes <<= 1;
> > -		num_bytes <<= 1;
> > -	}
> > +	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> > +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
> > +	u64 reserved;
> > +	bool ret = false;
> 
> nit: Reverse christmas tree:

Let it be known that the reverse-xmas-tree declaration style will _not_
be enforced in new patches, and cleanup patches that only reorder
declarations will not be accepted.

You're free to use it, but I may edit your patch so the declarations
conform to the style used in the file or to a pattern that's common in
the subsystem, e.g. global pointers like fs_info or root first in the list.
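
Purely to illustrate the preference, a made-up declaration block (not
taken from any particular function) with the global pointer first and no
strict length sorting:

	struct btrfs_fs_info *fs_info = root->fs_info;
	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
	u64 reserved = 0;
	bool ret = false;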

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] btrfs: cleanup extent_op handling
  2018-11-22 10:09   ` Nikolay Borisov
@ 2018-11-27 15:39     ` Josef Bacik
  0 siblings, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2018-11-27 15:39 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team, Josef Bacik

On Thu, Nov 22, 2018 at 12:09:34PM +0200, Nikolay Borisov wrote:
> 
> 
> On 21.11.18 at 20:59, Josef Bacik wrote:
> > From: Josef Bacik <jbacik@fb.com>
> > 
> > The cleanup_extent_op function actually would run the extent_op if it
> > needed running, which made the name sort of a misnomer.  Change it to
> > run_and_cleanup_extent_op, and move the actual cleanup work to
> > cleanup_extent_op so it can be used by check_ref_cleanup() in order to
> > unify the extent op handling.
> 
> The whole name extent_op is actually a misnomer, since AFAIR this is some
> sort of modification of the references of metadata nodes. I don't see
> why it can't be made yet another type of reference which is run for a
> given node.
> 

It would change the key for a metadata extent reference for non-skinny metadata,
and it sets the FULL_BACKREF flag.  Since it really only changes flags now, we
could probably roll that into its own thing, but that's out of scope for this
stuff.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] btrfs: introduce delayed_refs_rsv
  2018-11-26  9:14   ` Nikolay Borisov
  2018-11-27 15:38     ` David Sterba
@ 2018-11-27 19:11     ` Josef Bacik
  1 sibling, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2018-11-27 19:11 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team, Josef Bacik

On Mon, Nov 26, 2018 at 11:14:12AM +0200, Nikolay Borisov wrote:
> 
> 
> On 21.11.18 at 20:59, Josef Bacik wrote:
> > From: Josef Bacik <jbacik@fb.com>
> > 
> > Traditionally we've had voodoo in btrfs to account for the space that
> > delayed refs may take up by having a global_block_rsv.  This works most
> > of the time, except when it doesn't.  We've had issues reported and seen
> > in production where sometimes the global reserve is exhausted during
> > transaction commit before we can run all of our delayed refs, resulting
> > in an aborted transaction.  Because of this voodoo we have equally
> > dubious flushing semantics around throttling delayed refs which we often
> > get wrong.
> > 
> > So instead give them their own block_rsv.  This way we can always know
> > exactly how much outstanding space we need for delayed refs.  This
> > allows us to make sure we are constantly filling that reservation up
> > with space, and allows us to put more precise pressure on the enospc
> > system.  Instead of doing math to see if it's a good time to throttle,
> > the normal enospc code will be invoked if we have a lot of delayed refs
> > pending, and they will be run via the normal flushing mechanism.
> > 
> > For now the delayed_refs_rsv will hold the reservations for the delayed
> > refs, the block group updates, and deleting csums.  We could have a
> > separate rsv for the block group updates, but the csum deletion stuff is
> > still handled via the delayed_refs so that will stay there.
> 
> Couldn't the same "no premature ENOSPC in critical section" effect be
> achieved if we simply allocated 2 * num_bytes in start_transaction without
> adding the additional granularity for delayed refs? There seems to be a
> lot of supporting code added, and this obfuscates the simple idea that
> now we do a 2x reservation and put it in a separate block_rsv structure.
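
(For reference, the 2x up-front reservation being suggested would look
roughly like the following in start_transaction(); this is a sketch only,
reusing the existing trans_block_rsv reservation path, not a concrete
counter-proposal.)

	num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
	/* Over-reserve so delayed refs generated later are covered. */
	num_bytes *= 2;
	ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv,
				  num_bytes, flush);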
> 
> > 
> > Signed-off-by: Josef Bacik <jbacik@fb.com>
> > ---
> >  fs/btrfs/ctree.h             |  26 ++--
> >  fs/btrfs/delayed-ref.c       |  28 ++++-
> >  fs/btrfs/disk-io.c           |   4 +
> >  fs/btrfs/extent-tree.c       | 279 +++++++++++++++++++++++++++++++++++--------
> >  fs/btrfs/inode.c             |   4 +-
> >  fs/btrfs/transaction.c       |  77 ++++++------
> >  include/trace/events/btrfs.h |   2 +
> >  7 files changed, 313 insertions(+), 107 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 8b41ec42f405..0c6d589c8ce4 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -448,8 +448,9 @@ struct btrfs_space_info {
> >  #define	BTRFS_BLOCK_RSV_TRANS		3
> >  #define	BTRFS_BLOCK_RSV_CHUNK		4
> >  #define	BTRFS_BLOCK_RSV_DELOPS		5
> > -#define	BTRFS_BLOCK_RSV_EMPTY		6
> > -#define	BTRFS_BLOCK_RSV_TEMP		7
> > +#define BTRFS_BLOCK_RSV_DELREFS		6
> > +#define	BTRFS_BLOCK_RSV_EMPTY		7
> > +#define	BTRFS_BLOCK_RSV_TEMP		8
> >  
> >  struct btrfs_block_rsv {
> >  	u64 size;
> > @@ -812,6 +813,8 @@ struct btrfs_fs_info {
> >  	struct btrfs_block_rsv chunk_block_rsv;
> >  	/* block reservation for delayed operations */
> >  	struct btrfs_block_rsv delayed_block_rsv;
> > +	/* block reservation for delayed refs */
> > +	struct btrfs_block_rsv delayed_refs_rsv;
> >  
> >  	struct btrfs_block_rsv empty_block_rsv;
> >  
> > @@ -2628,7 +2631,7 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
> >  }
> >  
> >  int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
> > -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans);
> > +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
> >  void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
> >  					 const u64 start);
> >  void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg);
> > @@ -2742,10 +2745,12 @@ enum btrfs_reserve_flush_enum {
> >  enum btrfs_flush_state {
> >  	FLUSH_DELAYED_ITEMS_NR	=	1,
> >  	FLUSH_DELAYED_ITEMS	=	2,
> > -	FLUSH_DELALLOC		=	3,
> > -	FLUSH_DELALLOC_WAIT	=	4,
> > -	ALLOC_CHUNK		=	5,
> > -	COMMIT_TRANS		=	6,
> > +	FLUSH_DELAYED_REFS_NR	=	3,
> > +	FLUSH_DELAYED_REFS	=	4,
> > +	FLUSH_DELALLOC		=	5,
> > +	FLUSH_DELALLOC_WAIT	=	6,
> > +	ALLOC_CHUNK		=	7,
> > +	COMMIT_TRANS		=	8,
> >  };
> >  
> >  int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
> > @@ -2796,6 +2801,13 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
> >  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> >  			     struct btrfs_block_rsv *block_rsv,
> >  			     u64 num_bytes);
> > +void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr);
> > +void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans);
> > +int btrfs_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
> > +				enum btrfs_reserve_flush_enum flush);
> > +void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
> > +				       struct btrfs_block_rsv *src,
> > +				       u64 num_bytes);
> >  int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache);
> >  void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache);
> >  void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
> > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> > index 48725fa757a3..96de92588f06 100644
> > --- a/fs/btrfs/delayed-ref.c
> > +++ b/fs/btrfs/delayed-ref.c
> > @@ -474,11 +474,14 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
> >   * existing and update must have the same bytenr
> >   */
> >  static noinline void
> > -update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
> > +update_existing_head_ref(struct btrfs_trans_handle *trans,
> >  			 struct btrfs_delayed_ref_head *existing,
> >  			 struct btrfs_delayed_ref_head *update,
> >  			 int *old_ref_mod_ret)
> >  {
> > +	struct btrfs_delayed_ref_root *delayed_refs =
> > +		&trans->transaction->delayed_refs;
> > +	struct btrfs_fs_info *fs_info = trans->fs_info;
> >  	int old_ref_mod;
> >  
> >  	BUG_ON(existing->is_data != update->is_data);
> > @@ -536,10 +539,18 @@ update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
> >  	 * versa we need to make sure to adjust pending_csums accordingly.
> >  	 */
> >  	if (existing->is_data) {
> > -		if (existing->total_ref_mod >= 0 && old_ref_mod < 0)
> > +		u64 csum_items =
> 
> The name is a bit of a misnomer: what csum_items really holds is the
> number of btree leaves required to store the csums.
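
The rename would look something like this (a sketch of the same hunk,
logic unchanged):

		u64 csum_leaves =
			btrfs_csum_bytes_to_leaves(fs_info,
						   existing->num_bytes);

		if (existing->total_ref_mod >= 0 && old_ref_mod < 0) {
			delayed_refs->pending_csums -= existing->num_bytes;
			btrfs_delayed_refs_rsv_release(fs_info, csum_leaves);
		}
		if (existing->total_ref_mod < 0 && old_ref_mod >= 0) {
			delayed_refs->pending_csums += existing->num_bytes;
			trans->delayed_ref_updates += csum_leaves;
		}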
> 
> > +			btrfs_csum_bytes_to_leaves(fs_info,
> > +						   existing->num_bytes);
> > +
> > +		if (existing->total_ref_mod >= 0 && old_ref_mod < 0) {
> >  			delayed_refs->pending_csums -= existing->num_bytes;
> > -		if (existing->total_ref_mod < 0 && old_ref_mod >= 0)
> > +			btrfs_delayed_refs_rsv_release(fs_info, csum_items);
> > +		}
> > +		if (existing->total_ref_mod < 0 && old_ref_mod >= 0) {
> >  			delayed_refs->pending_csums += existing->num_bytes;
> > +			trans->delayed_ref_updates += csum_items;
> 
> Now, in addition to tracking the number of delayed heads/refs,
> delayed_ref_updates also tracks the number of csum leaves. Doesn't
> this make the number in delayed_ref_updates a bit of a mess?
> 
> Additionally, shouldn't delayed_ref_updates really track only refs,
> otherwise double accounting occurs? I.e. for a brand-new ref insertion we
> create a head and so increment delayed_ref_updates once in
> add_delayed_ref_head, then we create the ref and increment it again in
> insert_delayed_ref?
>

No, update_existing_head_ref() only gets called if we tried to insert the ref
head and found another one.  We only update delayed_ref_updates if we go
from a positive ref_mod to a negative one, i.e. we delete an existing ref, which
means we'll need to delete csum items.  If we add a brand new one, the
accounting is done there appropriately.
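
To make that concrete, here is a heavily simplified sketch of the two
paths; the lookup helper is hypothetical and the bookkeeping is trimmed
down, so this is not the real add_delayed_ref_head():

	/* Hypothetical stand-in for the rbtree insert in add_delayed_ref_head(). */
	existing = find_existing_head(delayed_refs, head_ref);
	if (existing) {
		/*
		 * Head already present: only the csum-leaves delta gets
		 * added to delayed_ref_updates, inside
		 * update_existing_head_ref().
		 */
		update_existing_head_ref(trans, existing, head_ref, old_ref_mod);
	} else {
		/* Brand new head: accounted exactly once, right here. */
		if (head_ref->is_data && head_ref->ref_mod < 0) {
			delayed_refs->pending_csums += head_ref->num_bytes;
			trans->delayed_ref_updates +=
				btrfs_csum_bytes_to_leaves(trans->fs_info,
							   head_ref->num_bytes);
		}
		delayed_refs->num_heads++;
		trans->delayed_ref_updates++;	/* one item for the new head */
	}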

> 
> > +		}
> >  	}
> >  	spin_unlock(&existing->lock);
> >  }
> > @@ -645,7 +656,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> >  			&& head_ref->qgroup_reserved
> >  			&& existing->qgroup_ref_root
> >  			&& existing->qgroup_reserved);
> > -		update_existing_head_ref(delayed_refs, existing, head_ref,
> > +		update_existing_head_ref(trans, existing, head_ref,
> >  					 old_ref_mod);
> >  		/*
> >  		 * we've updated the existing ref, free the newly
> > @@ -656,8 +667,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> >  	} else {
> >  		if (old_ref_mod)
> >  			*old_ref_mod = 0;
> > -		if (head_ref->is_data && head_ref->ref_mod < 0)
> > +		if (head_ref->is_data && head_ref->ref_mod < 0) {
> 
> nit: I think for the sake of consistency you must really check
> head_ref->total_ref_mod as per the code in:
> 
> 1262133b8d6f ("Btrfs: account for crcs in delayed ref processing")
> 
> Additionally the comment above total_ref_mod also documents this.
> 
> Here it's guaranteed that head_ref->ref_mod and ->total_ref_mod are
> identical.
> 

I agree, but it's unrelated to this work so I don't want to touch it here.

> >  			delayed_refs->pending_csums += head_ref->num_bytes;
> > +			trans->delayed_ref_updates +=
> > +				btrfs_csum_bytes_to_leaves(trans->fs_info,
> > +							   head_ref->num_bytes);
> > +		}
> >  		delayed_refs->num_heads++;
> >  		delayed_refs->num_heads_ready++;
> >  		atomic_inc(&delayed_refs->num_entries);
> > @@ -792,6 +807,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
> >  
> >  	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
> >  	spin_unlock(&delayed_refs->lock);
> > +	btrfs_update_delayed_refs_rsv(trans);
> >  
> >  	trace_add_delayed_tree_ref(fs_info, &ref->node, ref,
> >  				   action == BTRFS_ADD_DELAYED_EXTENT ?
> > @@ -873,6 +889,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
> >  
> >  	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
> >  	spin_unlock(&delayed_refs->lock);
> > +	btrfs_update_delayed_refs_rsv(trans);
> 
> As stated previously, this call to btrfs_update_delayed_refs_rsv is
> really paired with the code that was executed in add_delayed_ref_head.
> Since it's not really obvious at first look, at least a comment would
> be fine. Or better yet (as suggested previously), just make updating
> delayed_ref_updates and the rsv one operation. I don't see how this
> function adds much to the critical section of delayed_refs->lock. The
> interface you give is really error-prone.
> 

I'm going to re-arrange this in a new patchset, so I'm going to leave this as is
and just add a comment here for now.
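
For the record, the combined update being suggested would look roughly
like this (hypothetical helper name and shape, sketch only; the eventual
rework may end up looking different):

/*
 * Bump the pending item count and grow the rsv size in one call instead
 * of two separate steps.  The real rsv also tracks a ->full flag, which
 * is ignored here.
 */
static void btrfs_inc_delayed_refs_rsv(struct btrfs_trans_handle *trans,
				       int nr_items)
{
	struct btrfs_fs_info *fs_info = trans->fs_info;
	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, nr_items);

	trans->delayed_ref_updates += nr_items;

	spin_lock(&delayed_rsv->lock);
	delayed_rsv->size += num_bytes;
	spin_unlock(&delayed_rsv->lock);
}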

> >  
> >  	trace_add_delayed_data_ref(trans->fs_info, &ref->node, ref,
> >  				   action == BTRFS_ADD_DELAYED_EXTENT ?
> > @@ -910,6 +927,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
> >  			     NULL, NULL, NULL);
> >  
> >  	spin_unlock(&delayed_refs->lock);
> > +	btrfs_update_delayed_refs_rsv(trans);
> Ditto
> 
> >  	return 0;
> >  }
> >  
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 55e8ca782b98..beaa58e742b5 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -2666,6 +2666,9 @@ int open_ctree(struct super_block *sb,
> >  	btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
> >  	btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
> >  			     BTRFS_BLOCK_RSV_DELOPS);
> > +	btrfs_init_block_rsv(&fs_info->delayed_refs_rsv,
> > +			     BTRFS_BLOCK_RSV_DELREFS);
> > +
> >  	atomic_set(&fs_info->async_delalloc_pages, 0);
> >  	atomic_set(&fs_info->defrag_running, 0);
> >  	atomic_set(&fs_info->qgroup_op_seq, 0);
> > @@ -4413,6 +4416,7 @@ void btrfs_cleanup_dirty_bgs(struct btrfs_transaction *cur_trans,
> >  
> >  		spin_unlock(&cur_trans->dirty_bgs_lock);
> >  		btrfs_put_block_group(cache);
> > +		btrfs_delayed_refs_rsv_release(fs_info, 1);
> >  		spin_lock(&cur_trans->dirty_bgs_lock);
> >  	}
> >  	spin_unlock(&cur_trans->dirty_bgs_lock);
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 8a776dc9cb38..8af68b13fa27 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -2463,6 +2463,7 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
> >  	struct btrfs_fs_info *fs_info = trans->fs_info;
> >  	struct btrfs_delayed_ref_root *delayed_refs =
> >  		&trans->transaction->delayed_refs;
> > +	int nr_items = 1;
> 
> This 1 is a magic value, where does it come from?
> 
> >  
> >  	if (head->total_ref_mod < 0) {
> >  		struct btrfs_space_info *space_info;
> > @@ -2484,12 +2485,15 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
> >  			spin_lock(&delayed_refs->lock);
> >  			delayed_refs->pending_csums -= head->num_bytes;
> >  			spin_unlock(&delayed_refs->lock);
> > +			nr_items += btrfs_csum_bytes_to_leaves(fs_info,
> > +				head->num_bytes);
> >  		}
> >  	}
> >  
> >  	/* Also free its reserved qgroup space */
> >  	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
> >  				      head->qgroup_reserved);
> > +	btrfs_delayed_refs_rsv_release(fs_info, nr_items);
> >  }
> >  
> >  static int cleanup_ref_head(struct btrfs_trans_handle *trans,
> > @@ -2831,40 +2835,22 @@ u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes)
> >  	return num_csums;
> >  }
> >  
> > -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans)
> > +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
> >  {
> > -	struct btrfs_fs_info *fs_info = trans->fs_info;
> > -	struct btrfs_block_rsv *global_rsv;
> > -	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
> > -	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
> > -	unsigned int num_dirty_bgs = trans->transaction->num_dirty_bgs;
> > -	u64 num_bytes, num_dirty_bgs_bytes;
> > -	int ret = 0;
> > -
> > -	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> > -	num_heads = heads_to_leaves(fs_info, num_heads);
> > -	if (num_heads > 1)
> > -		num_bytes += (num_heads - 1) * fs_info->nodesize;
> > -	num_bytes <<= 1;
> > -	num_bytes += btrfs_csum_bytes_to_leaves(fs_info, csum_bytes) *
> > -							fs_info->nodesize;
> > -	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(fs_info,
> > -							     num_dirty_bgs);
> > -	global_rsv = &fs_info->global_block_rsv;
> > -
> > -	/*
> > -	 * If we can't allocate any more chunks lets make sure we have _lots_ of
> > -	 * wiggle room since running delayed refs can create more delayed refs.
> > -	 */
> > -	if (global_rsv->space_info->full) {
> > -		num_dirty_bgs_bytes <<= 1;
> > -		num_bytes <<= 1;
> > -	}
> > +	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> > +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
> > +	u64 reserved;
> > +	bool ret = false;
> 
> nit: Reverse christmas tree:
> 
> delayed_refs_rsv
> global_rsv
> bool ret
> reserved
> 
> >  
> >  	spin_lock(&global_rsv->lock);
> > -	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
> > -		ret = 1;
> > +	reserved = global_rsv->reserved;
> >  	spin_unlock(&global_rsv->lock);
> > +
> > +	spin_lock(&delayed_refs_rsv->lock);
> > +	reserved += delayed_refs_rsv->reserved;
> > +	if (delayed_refs_rsv->size >= reserved)
> > +		ret = true;
> 
> Essentially you want to throttle every time there is enough available
> reserved space, no?

If we're at the amount in the delayed refs rsv + the global reserve, then we're
in murky territory and want to back off and run things.  I'll add a comment,
and I'll address the rest of this stuff as well.  Thanks,

Josef
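
P.S. Something like this is what I have in mind for the comment, layered
onto the check quoted above (sketch, wording not final):

	spin_lock(&delayed_refs_rsv->lock);
	reserved += delayed_refs_rsv->reserved;
	/*
	 * If what the delayed refs rsv wants (its size) isn't covered even by
	 * its own reserved bytes plus everything in the global reserve, we're
	 * in murky territory: back off and run delayed refs.
	 */
	if (delayed_refs_rsv->size >= reserved)
		ret = true;
	spin_unlock(&delayed_refs_rsv->lock);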

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-11-27 19:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-21 18:59 [PATCH 0/6] Delayed refs rsv Josef Bacik
2018-11-21 18:59 ` [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper Josef Bacik
2018-11-22  9:12   ` Nikolay Borisov
2018-11-22  9:42     ` Nikolay Borisov
2018-11-23 13:45       ` David Sterba
2018-11-23 13:50         ` Nikolay Borisov
2018-11-21 18:59 ` [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
2018-11-22  1:06   ` Qu Wenruo
2018-11-23 13:51     ` David Sterba
2018-11-21 18:59 ` [PATCH 3/6] btrfs: cleanup extent_op handling Josef Bacik
2018-11-22  8:56   ` Lu Fengqi
2018-11-22 10:09   ` Nikolay Borisov
2018-11-27 15:39     ` Josef Bacik
2018-11-23 15:05   ` David Sterba
2018-11-21 18:59 ` [PATCH 4/6] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
2018-11-22 10:19   ` Nikolay Borisov
2018-11-21 18:59 ` [PATCH 5/6] btrfs: introduce delayed_refs_rsv Josef Bacik
2018-11-26  9:14   ` Nikolay Borisov
2018-11-27 15:38     ` David Sterba
2018-11-27 19:11     ` Josef Bacik
2018-11-21 18:59 ` [PATCH 6/6] btrfs: fix truncate throttling Josef Bacik
2018-11-26  9:44   ` Nikolay Borisov
2018-11-23 15:55 ` [PATCH 0/6] Delayed refs rsv David Sterba
