linux-btrfs.vger.kernel.org archive mirror
* [PATCH 00/10][V2] Delayed refs rsv
@ 2018-12-03 15:20 Josef Bacik
  2018-12-03 15:20 ` [PATCH 01/10] btrfs: add btrfs_delete_ref_head helper Josef Bacik
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v1->v2:
- addressed the comments from the various reviewers.
- split "introduce delayed_refs_rsv" into 5 patches.  The patches are the same
  together as they were, just split out more logically.  They can't really be
  bisected across in that you will likely have fun enospc failures, but they
  compile properly.  This was done to make it easier for review.

-- Original message --

This patchset changes how we do space reservations for delayed refs.  We were
hitting probably 20-40 enospc aborts per day in production while running
delayed refs at transaction commit time.  These happen when we run out of space
in the global reserve and can't easily get more space in use_block_rsv().

The global reserve has grown to cover everything we don't reserve space
explicitly for, and we've grown a lot of weird ad-hoc heuristics to know if
we're running short on space and when it's time to force a commit.  A failure
rate of 20-40 file systems when we run hundreds of thousands of them isn't
super high, but cleaning up this code will make things less ugly and more
predictable.

Thus the delayed refs rsv.  We always know how many delayed refs we have
outstanding, and although running them generates more we can use the global
reserve for that spillover, which fits its intended use better than a full
blown reservation.  The first approach is to simply take the number of items
we're reserving space for and multiply it by 2 in order to save enough space
for the delayed refs that could be generated.  This is a naive approach and
will probably evolve, but for now it works.
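
Roughly, that reservation idea looks like the following sketch.  It mirrors
what patch 5 ends up doing in start_transaction(); it is an illustration of
the approach, not a literal excerpt:

	/*
	 * Assume every item we reserve for may generate a matching delayed
	 * ref update, so reserve twice the usual metadata size and hand the
	 * second half over to the delayed refs rsv.
	 */
	num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
	if (!delayed_refs_rsv->full) {
		delayed_refs_bytes = num_bytes;
		num_bytes <<= 1;
	}
	ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv, num_bytes,
				  flush);
	if (!ret && delayed_refs_bytes)
		btrfs_migrate_to_delayed_refs_rsv(fs_info,
						  &fs_info->trans_block_rsv,
						  delayed_refs_bytes);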

With this patchset we've gone down to 2-8 failures per week.  It's not perfect;
there are some corner cases that still need to be addressed, but it is
significantly better than what we had.  Thanks,

Josef


* [PATCH 01/10] btrfs: add btrfs_delete_ref_head helper
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-06 12:32   ` Nikolay Borisov
  2018-12-03 15:20 ` [PATCH 02/10] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

We do this dance in cleanup_ref_head and check_ref_cleanup, so unify it
into a helper and clean up the calling functions.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/delayed-ref.c | 14 ++++++++++++++
 fs/btrfs/delayed-ref.h |  3 ++-
 fs/btrfs/extent-tree.c | 22 +++-------------------
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 9301b3ad9217..b3e4c9fcb664 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
 	return head;
 }
 
+void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
+			   struct btrfs_delayed_ref_head *head)
+{
+	lockdep_assert_held(&delayed_refs->lock);
+	lockdep_assert_held(&head->lock);
+
+	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
+	RB_CLEAR_NODE(&head->href_node);
+	atomic_dec(&delayed_refs->num_entries);
+	delayed_refs->num_heads--;
+	if (head->processing == 0)
+		delayed_refs->num_heads_ready--;
+}
+
 /*
  * Helper to insert the ref_node to the tail or merge with tail.
  *
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 8e20c5cb5404..d2af974f68a1 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
 {
 	mutex_unlock(&head->mutex);
 }
-
+void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
+			   struct btrfs_delayed_ref_head *head);
 
 struct btrfs_delayed_ref_head *btrfs_select_ref_head(
 		struct btrfs_delayed_ref_root *delayed_refs);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d242a1174e50..c36b3a42f2bb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 		spin_unlock(&delayed_refs->lock);
 		return 1;
 	}
-	delayed_refs->num_heads--;
-	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
-	RB_CLEAR_NODE(&head->href_node);
+	btrfs_delete_ref_head(delayed_refs, head);
 	spin_unlock(&head->lock);
 	spin_unlock(&delayed_refs->lock);
-	atomic_dec(&delayed_refs->num_entries);
 
 	trace_run_delayed_ref_head(fs_info, head, 0);
 
@@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
 	if (!mutex_trylock(&head->mutex))
 		goto out;
 
-	/*
-	 * at this point we have a head with no other entries.  Go
-	 * ahead and process it.
-	 */
-	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
-	RB_CLEAR_NODE(&head->href_node);
-	atomic_dec(&delayed_refs->num_entries);
-
-	/*
-	 * we don't take a ref on the node because we're removing it from the
-	 * tree, so we just steal the ref the tree was holding.
-	 */
-	delayed_refs->num_heads--;
-	if (head->processing == 0)
-		delayed_refs->num_heads_ready--;
+	btrfs_delete_ref_head(delayed_refs, head);
 	head->processing = 0;
+
 	spin_unlock(&head->lock);
 	spin_unlock(&delayed_refs->lock);
 
-- 
2.14.3



* [PATCH 02/10] btrfs: add cleanup_ref_head_accounting helper
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
  2018-12-03 15:20 ` [PATCH 01/10] btrfs: add btrfs_delete_ref_head helper Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-06 12:38   ` Nikolay Borisov
  2018-12-03 15:20 ` [PATCH 03/10] btrfs: cleanup extent_op handling Josef Bacik
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

We were missing some quota cleanups in check_ref_cleanup, so break the
ref head accounting cleanup into a helper and call that from both
check_ref_cleanup and cleanup_ref_head.  This will hopefully ensure that
we don't screw up accounting in the future for other things that we add.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/extent-tree.c | 67 +++++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c36b3a42f2bb..e3ed3507018d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2443,6 +2443,41 @@ static int cleanup_extent_op(struct btrfs_trans_handle *trans,
 	return ret ? ret : 1;
 }
 
+static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
+					struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_delayed_ref_root *delayed_refs =
+		&trans->transaction->delayed_refs;
+
+	if (head->total_ref_mod < 0) {
+		struct btrfs_space_info *space_info;
+		u64 flags;
+
+		if (head->is_data)
+			flags = BTRFS_BLOCK_GROUP_DATA;
+		else if (head->is_system)
+			flags = BTRFS_BLOCK_GROUP_SYSTEM;
+		else
+			flags = BTRFS_BLOCK_GROUP_METADATA;
+		space_info = __find_space_info(fs_info, flags);
+		ASSERT(space_info);
+		percpu_counter_add_batch(&space_info->total_bytes_pinned,
+				   -head->num_bytes,
+				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
+
+		if (head->is_data) {
+			spin_lock(&delayed_refs->lock);
+			delayed_refs->pending_csums -= head->num_bytes;
+			spin_unlock(&delayed_refs->lock);
+		}
+	}
+
+	/* Also free its reserved qgroup space */
+	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
+				      head->qgroup_reserved);
+}
+
 static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 			    struct btrfs_delayed_ref_head *head)
 {
@@ -2478,31 +2513,6 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 	spin_unlock(&head->lock);
 	spin_unlock(&delayed_refs->lock);
 
-	trace_run_delayed_ref_head(fs_info, head, 0);
-
-	if (head->total_ref_mod < 0) {
-		struct btrfs_space_info *space_info;
-		u64 flags;
-
-		if (head->is_data)
-			flags = BTRFS_BLOCK_GROUP_DATA;
-		else if (head->is_system)
-			flags = BTRFS_BLOCK_GROUP_SYSTEM;
-		else
-			flags = BTRFS_BLOCK_GROUP_METADATA;
-		space_info = __find_space_info(fs_info, flags);
-		ASSERT(space_info);
-		percpu_counter_add_batch(&space_info->total_bytes_pinned,
-				   -head->num_bytes,
-				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
-
-		if (head->is_data) {
-			spin_lock(&delayed_refs->lock);
-			delayed_refs->pending_csums -= head->num_bytes;
-			spin_unlock(&delayed_refs->lock);
-		}
-	}
-
 	if (head->must_insert_reserved) {
 		btrfs_pin_extent(fs_info, head->bytenr,
 				 head->num_bytes, 1);
@@ -2512,9 +2522,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	/* Also free its reserved qgroup space */
-	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
-				      head->qgroup_reserved);
+	cleanup_ref_head_accounting(trans, head);
+
+	trace_run_delayed_ref_head(fs_info, head, 0);
 	btrfs_delayed_ref_unlock(head);
 	btrfs_put_delayed_ref_head(head);
 	return 0;
@@ -6991,6 +7001,7 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
 	if (head->must_insert_reserved)
 		ret = 1;
 
+	cleanup_ref_head_accounting(trans, head);
 	mutex_unlock(&head->mutex);
 	btrfs_put_delayed_ref_head(head);
 	return ret;
-- 
2.14.3



* [PATCH 03/10] btrfs: cleanup extent_op handling
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
  2018-12-03 15:20 ` [PATCH 01/10] btrfs: add btrfs_delete_ref_head helper Josef Bacik
  2018-12-03 15:20 ` [PATCH 02/10] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-03 15:20 ` [PATCH 04/10] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

The cleanup_extent_op function actually runs the extent_op if it needs
running, which makes the name a bit of a misnomer.  Change it to
run_and_cleanup_extent_op, and move the actual cleanup work to
cleanup_extent_op so it can be used by check_ref_cleanup() in order to
unify the extent op handling.

Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/extent-tree.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e3ed3507018d..9f169f3c5fdb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2424,19 +2424,32 @@ static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_ref
 	btrfs_delayed_ref_unlock(head);
 }
 
-static int cleanup_extent_op(struct btrfs_trans_handle *trans,
-			     struct btrfs_delayed_ref_head *head)
+static struct btrfs_delayed_extent_op *cleanup_extent_op(
+				struct btrfs_delayed_ref_head *head)
 {
 	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
-	int ret;
 
 	if (!extent_op)
-		return 0;
-	head->extent_op = NULL;
+		return NULL;
+
 	if (head->must_insert_reserved) {
+		head->extent_op = NULL;
 		btrfs_free_delayed_extent_op(extent_op);
-		return 0;
+		return NULL;
 	}
+	return extent_op;
+}
+
+static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans,
+				     struct btrfs_delayed_ref_head *head)
+{
+	struct btrfs_delayed_extent_op *extent_op;
+	int ret;
+
+	extent_op = cleanup_extent_op(head);
+	if (!extent_op)
+		return 0;
+	head->extent_op = NULL;
 	spin_unlock(&head->lock);
 	ret = run_delayed_extent_op(trans, head, extent_op);
 	btrfs_free_delayed_extent_op(extent_op);
@@ -2488,7 +2501,7 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 
 	delayed_refs = &trans->transaction->delayed_refs;
 
-	ret = cleanup_extent_op(trans, head);
+	ret = run_and_cleanup_extent_op(trans, head);
 	if (ret < 0) {
 		unselect_delayed_ref_head(delayed_refs, head);
 		btrfs_debug(fs_info, "run_delayed_extent_op returned %d", ret);
@@ -6977,12 +6990,8 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
 	if (!RB_EMPTY_ROOT(&head->ref_tree.rb_root))
 		goto out;
 
-	if (head->extent_op) {
-		if (!head->must_insert_reserved)
-			goto out;
-		btrfs_free_delayed_extent_op(head->extent_op);
-		head->extent_op = NULL;
-	}
+	if (cleanup_extent_op(head) != NULL)
+		goto out;
 
 	/*
 	 * waiting for the lock here would deadlock.  If someone else has it
-- 
2.14.3



* [PATCH 04/10] btrfs: only track ref_heads in delayed_ref_updates
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (2 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 03/10] btrfs: cleanup extent_op handling Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-07 13:01   ` Nikolay Borisov
  2018-12-03 15:20 ` [PATCH 05/10] btrfs: introduce delayed_refs_rsv Josef Bacik
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

We use this number to figure out how many delayed refs to run, but
__btrfs_run_delayed_refs really only checks every time we need a new
delayed ref head, so we always run at least one ref head completely no
matter how many items are on it.  Fix the accounting to only be
adjusted when we add/remove a ref head.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/delayed-ref.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index b3e4c9fcb664..48725fa757a3 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -251,8 +251,6 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
 	ref->in_tree = 0;
 	btrfs_put_delayed_ref(ref);
 	atomic_dec(&delayed_refs->num_entries);
-	if (trans->delayed_ref_updates)
-		trans->delayed_ref_updates--;
 }
 
 static bool merge_ref(struct btrfs_trans_handle *trans,
@@ -467,7 +465,6 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
 	if (ref->action == BTRFS_ADD_DELAYED_REF)
 		list_add_tail(&ref->add_list, &href->ref_add_list);
 	atomic_inc(&root->num_entries);
-	trans->delayed_ref_updates++;
 	spin_unlock(&href->lock);
 	return ret;
 }
-- 
2.14.3



* [PATCH 05/10] btrfs: introduce delayed_refs_rsv
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (3 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 04/10] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-07 14:45   ` Nikolay Borisov
  2018-12-03 15:20 ` [PATCH 06/10] btrfs: update may_commit_transaction to use the delayed refs rsv Josef Bacik
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

Traditionally we've had voodoo in btrfs to account for the space that
delayed refs may take up by having a global_block_rsv.  This works most
of the time, except when it doesn't.  We've had issues reported and seen
in production where sometimes the global reserve is exhausted during
transaction commit before we can run all of our delayed refs, resulting
in an aborted transaction.  Because of this voodoo we have equally
dubious flushing semantics around throttling delayed refs which we often
get wrong.

So instead give them their own block_rsv.  This way we can always know
exactly how much outstanding space we need for delayed refs.  This
allows us to make sure we are constantly filling that reservation up
with space, and allows us to put more precise pressure on the enospc
system.  Instead of doing math to see if it's a good time to throttle,
the normal enospc code will be invoked if we have a lot of delayed refs
pending, and they will be run via the normal flushing mechanism.

For now the delayed_refs_rsv will hold the reservations for the delayed
refs, the block group updates, and deleting csums.  We could have a
separate rsv for the block group updates, but the csum deletion stuff is
still handled via the delayed_refs so that will stay there.
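
In short, the accounting flow added below looks like this (a condensed sketch
of the changes in this patch, not a literal excerpt):

	/*
	 * 1) Generating work: adding a delayed ref head, dirtying a block
	 *    group, or queueing csum deletions bumps
	 *    trans->delayed_ref_updates, and btrfs_update_delayed_refs_rsv()
	 *    folds that count into the rsv's size.
	 */
	trans->delayed_ref_updates++;
	btrfs_update_delayed_refs_rsv(trans);

	/*
	 * 2) Filling it: start_transaction() over-reserves and migrates the
	 *    extra bytes in with btrfs_migrate_to_delayed_refs_rsv(), or
	 *    tops the rsv up with btrfs_delayed_refs_rsv_refill().
	 */

	/*
	 * 3) Finishing work: running a ref head or writing out a dirty block
	 *    group releases its share again.
	 */
	btrfs_delayed_refs_rsv_release(fs_info, nr_items);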

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/ctree.h       |  14 +++-
 fs/btrfs/delayed-ref.c |  43 ++++++++--
 fs/btrfs/disk-io.c     |   4 +
 fs/btrfs/extent-tree.c | 212 +++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/transaction.c |  37 ++++++++-
 5 files changed, 284 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b41ec42f405..52a87d446945 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -448,8 +448,9 @@ struct btrfs_space_info {
 #define	BTRFS_BLOCK_RSV_TRANS		3
 #define	BTRFS_BLOCK_RSV_CHUNK		4
 #define	BTRFS_BLOCK_RSV_DELOPS		5
-#define	BTRFS_BLOCK_RSV_EMPTY		6
-#define	BTRFS_BLOCK_RSV_TEMP		7
+#define BTRFS_BLOCK_RSV_DELREFS		6
+#define	BTRFS_BLOCK_RSV_EMPTY		7
+#define	BTRFS_BLOCK_RSV_TEMP		8
 
 struct btrfs_block_rsv {
 	u64 size;
@@ -812,6 +813,8 @@ struct btrfs_fs_info {
 	struct btrfs_block_rsv chunk_block_rsv;
 	/* block reservation for delayed operations */
 	struct btrfs_block_rsv delayed_block_rsv;
+	/* block reservation for delayed refs */
+	struct btrfs_block_rsv delayed_refs_rsv;
 
 	struct btrfs_block_rsv empty_block_rsv;
 
@@ -2796,6 +2799,13 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
 			     struct btrfs_block_rsv *block_rsv,
 			     u64 num_bytes);
+void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr);
+void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans);
+int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
+				  enum btrfs_reserve_flush_enum flush);
+void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
+				       struct btrfs_block_rsv *src,
+				       u64 num_bytes);
 int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 48725fa757a3..a198ab91879c 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -474,11 +474,14 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
  * existing and update must have the same bytenr
  */
 static noinline void
-update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
+update_existing_head_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_delayed_ref_head *existing,
 			 struct btrfs_delayed_ref_head *update,
 			 int *old_ref_mod_ret)
 {
+	struct btrfs_delayed_ref_root *delayed_refs =
+		&trans->transaction->delayed_refs;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int old_ref_mod;
 
 	BUG_ON(existing->is_data != update->is_data);
@@ -536,10 +539,18 @@ update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
 	 * versa we need to make sure to adjust pending_csums accordingly.
 	 */
 	if (existing->is_data) {
-		if (existing->total_ref_mod >= 0 && old_ref_mod < 0)
+		u64 csum_leaves =
+			btrfs_csum_bytes_to_leaves(fs_info,
+						   existing->num_bytes);
+
+		if (existing->total_ref_mod >= 0 && old_ref_mod < 0) {
 			delayed_refs->pending_csums -= existing->num_bytes;
-		if (existing->total_ref_mod < 0 && old_ref_mod >= 0)
+			btrfs_delayed_refs_rsv_release(fs_info, csum_leaves);
+		}
+		if (existing->total_ref_mod < 0 && old_ref_mod >= 0) {
 			delayed_refs->pending_csums += existing->num_bytes;
+			trans->delayed_ref_updates += csum_leaves;
+		}
 	}
 	spin_unlock(&existing->lock);
 }
@@ -645,7 +656,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 			&& head_ref->qgroup_reserved
 			&& existing->qgroup_ref_root
 			&& existing->qgroup_reserved);
-		update_existing_head_ref(delayed_refs, existing, head_ref,
+		update_existing_head_ref(trans, existing, head_ref,
 					 old_ref_mod);
 		/*
 		 * we've updated the existing ref, free the newly
@@ -656,8 +667,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 	} else {
 		if (old_ref_mod)
 			*old_ref_mod = 0;
-		if (head_ref->is_data && head_ref->ref_mod < 0)
+		if (head_ref->is_data && head_ref->ref_mod < 0) {
 			delayed_refs->pending_csums += head_ref->num_bytes;
+			trans->delayed_ref_updates +=
+				btrfs_csum_bytes_to_leaves(trans->fs_info,
+							   head_ref->num_bytes);
+		}
 		delayed_refs->num_heads++;
 		delayed_refs->num_heads_ready++;
 		atomic_inc(&delayed_refs->num_entries);
@@ -793,6 +808,12 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
 
+	/*
+	 * Need to update the delayed_refs_rsv with any changes we may have
+	 * made.
+	 */
+	btrfs_update_delayed_refs_rsv(trans);
+
 	trace_add_delayed_tree_ref(fs_info, &ref->node, ref,
 				   action == BTRFS_ADD_DELAYED_EXTENT ?
 				   BTRFS_ADD_DELAYED_REF : action);
@@ -874,6 +895,12 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
 
+	/*
+	 * Need to update the delayed_refs_rsv with any changes we may have
+	 * made.
+	 */
+	btrfs_update_delayed_refs_rsv(trans);
+
 	trace_add_delayed_data_ref(trans->fs_info, &ref->node, ref,
 				   action == BTRFS_ADD_DELAYED_EXTENT ?
 				   BTRFS_ADD_DELAYED_REF : action);
@@ -910,6 +937,12 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 			     NULL, NULL, NULL);
 
 	spin_unlock(&delayed_refs->lock);
+
+	/*
+	 * Need to update the delayed_refs_rsv with any changes we may have
+	 * made.
+	 */
+	btrfs_update_delayed_refs_rsv(trans);
 	return 0;
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 55e8ca782b98..beaa58e742b5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2666,6 +2666,9 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
 	btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
 			     BTRFS_BLOCK_RSV_DELOPS);
+	btrfs_init_block_rsv(&fs_info->delayed_refs_rsv,
+			     BTRFS_BLOCK_RSV_DELREFS);
+
 	atomic_set(&fs_info->async_delalloc_pages, 0);
 	atomic_set(&fs_info->defrag_running, 0);
 	atomic_set(&fs_info->qgroup_op_seq, 0);
@@ -4413,6 +4416,7 @@ void btrfs_cleanup_dirty_bgs(struct btrfs_transaction *cur_trans,
 
 		spin_unlock(&cur_trans->dirty_bgs_lock);
 		btrfs_put_block_group(cache);
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		spin_lock(&cur_trans->dirty_bgs_lock);
 	}
 	spin_unlock(&cur_trans->dirty_bgs_lock);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9f169f3c5fdb..aa0a638d0263 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2462,6 +2462,7 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_root *delayed_refs =
 		&trans->transaction->delayed_refs;
+	int nr_items = 1;	/* Dropping this ref head update. */
 
 	if (head->total_ref_mod < 0) {
 		struct btrfs_space_info *space_info;
@@ -2479,16 +2480,24 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
 				   -head->num_bytes,
 				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
 
+		/*
+		 * We had csum deletions accounted for in our delayed refs rsv,
+		 * we need to drop the csum leaves for this update from our
+		 * delayed_refs_rsv.
+		 */
 		if (head->is_data) {
 			spin_lock(&delayed_refs->lock);
 			delayed_refs->pending_csums -= head->num_bytes;
 			spin_unlock(&delayed_refs->lock);
+			nr_items += btrfs_csum_bytes_to_leaves(fs_info,
+				head->num_bytes);
 		}
 	}
 
 	/* Also free its reserved qgroup space */
 	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
 				      head->qgroup_reserved);
+	btrfs_delayed_refs_rsv_release(fs_info, nr_items);
 }
 
 static int cleanup_ref_head(struct btrfs_trans_handle *trans,
@@ -3626,6 +3635,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 	 */
 	mutex_lock(&trans->transaction->cache_write_mutex);
 	while (!list_empty(&dirty)) {
+		bool drop_reserve = true;
+
 		cache = list_first_entry(&dirty,
 					 struct btrfs_block_group_cache,
 					 dirty_list);
@@ -3698,6 +3709,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 					list_add_tail(&cache->dirty_list,
 						      &cur_trans->dirty_bgs);
 					btrfs_get_block_group(cache);
+					drop_reserve = false;
 				}
 				spin_unlock(&cur_trans->dirty_bgs_lock);
 			} else if (ret) {
@@ -3708,6 +3720,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 		/* if its not on the io list, we need to put the block group */
 		if (should_put)
 			btrfs_put_block_group(cache);
+		if (drop_reserve)
+			btrfs_delayed_refs_rsv_release(fs_info, 1);
 
 		if (ret)
 			break;
@@ -3856,6 +3870,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 		/* if its not on the io list, we need to put the block group */
 		if (should_put)
 			btrfs_put_block_group(cache);
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		spin_lock(&cur_trans->dirty_bgs_lock);
 	}
 	spin_unlock(&cur_trans->dirty_bgs_lock);
@@ -5389,6 +5404,91 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+/**
+ * btrfs_migrate_to_delayed_refs_rsv - transfer bytes to our delayed refs rsv.
+ * @fs_info - the fs info for our fs.
+ * @src - the source block rsv to transfer from.
+ * @num_bytes - the number of bytes to transfer.
+ *
+ * This transfers up to the num_bytes amount from the src rsv to the
+ * delayed_refs_rsv.  Any extra bytes are returned to the space info.
+ */
+void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
+				       struct btrfs_block_rsv *src,
+				       u64 num_bytes)
+{
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
+	u64 to_free = 0;
+
+	spin_lock(&src->lock);
+	src->reserved -= num_bytes;
+	src->size -= num_bytes;
+	spin_unlock(&src->lock);
+
+	spin_lock(&delayed_refs_rsv->lock);
+	if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) {
+		u64 delta = delayed_refs_rsv->size -
+			delayed_refs_rsv->reserved;
+		if (num_bytes > delta) {
+			to_free = num_bytes - delta;
+			num_bytes = delta;
+		}
+	} else {
+		to_free = num_bytes;
+		num_bytes = 0;
+	}
+
+	if (num_bytes)
+		delayed_refs_rsv->reserved += num_bytes;
+	if (delayed_refs_rsv->reserved >= delayed_refs_rsv->size)
+		delayed_refs_rsv->full = 1;
+	spin_unlock(&delayed_refs_rsv->lock);
+
+	if (num_bytes)
+		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+					      0, num_bytes, 1);
+	if (to_free)
+		space_info_add_old_bytes(fs_info, delayed_refs_rsv->space_info,
+					 to_free);
+}
+
+/**
+ * btrfs_delayed_refs_rsv_refill - refill based on our delayed refs usage.
+ * @fs_info - the fs_info for our fs.
+ * @flush - control how we can flush for this reservation.
+ *
+ * This will refill the delayed block_rsv up to 1 items size worth of space and
+ * will return -ENOSPC if we can't make the reservation.
+ */
+int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
+				  enum btrfs_reserve_flush_enum flush)
+{
+	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
+	u64 limit = btrfs_calc_trans_metadata_size(fs_info, 1);
+	u64 num_bytes = 0;
+	int ret = -ENOSPC;
+
+	spin_lock(&block_rsv->lock);
+	if (block_rsv->reserved < block_rsv->size) {
+		num_bytes = block_rsv->size - block_rsv->reserved;
+		num_bytes = min(num_bytes, limit);
+	}
+	spin_unlock(&block_rsv->lock);
+
+	if (!num_bytes)
+		return 0;
+
+	ret = reserve_metadata_bytes(fs_info->extent_root, block_rsv,
+				     num_bytes, flush);
+	if (ret)
+		return ret;
+	block_rsv_add_bytes(block_rsv, num_bytes, 0);
+	trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+				      0, num_bytes, 1);
+	return 0;
+}
+
+
 /*
  * This is for space we already have accounted in space_info->bytes_may_use, so
  * basically when we're returning space from block_rsv's.
@@ -5709,6 +5809,31 @@ static int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 	return ret;
 }
 
+static u64 __btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
+				     struct btrfs_block_rsv *block_rsv,
+				     u64 num_bytes, u64 *qgroup_to_release)
+{
+	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
+	struct btrfs_block_rsv *target = delayed_rsv;
+
+	if (target->full || target == block_rsv)
+		target = global_rsv;
+
+	if (block_rsv->space_info != target->space_info)
+		target = NULL;
+
+	return block_rsv_release_bytes(fs_info, block_rsv, target, num_bytes,
+				       qgroup_to_release);
+}
+
+void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
+			     struct btrfs_block_rsv *block_rsv,
+			     u64 num_bytes)
+{
+	__btrfs_block_rsv_release(fs_info, block_rsv, num_bytes, NULL);
+}
+
 /**
  * btrfs_inode_rsv_release - release any excessive reservation.
  * @inode - the inode we need to release from.
@@ -5723,7 +5848,6 @@ static int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
 	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
 	u64 released = 0;
 	u64 qgroup_to_release = 0;
@@ -5733,8 +5857,8 @@ static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 	 * are releasing 0 bytes, and then we'll just get the reservation over
 	 * the size free'd.
 	 */
-	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
-					   &qgroup_to_release);
+	released = __btrfs_block_rsv_release(fs_info, block_rsv, 0,
+					     &qgroup_to_release);
 	if (released > 0)
 		trace_btrfs_space_reservation(fs_info, "delalloc",
 					      btrfs_ino(inode), released, 0);
@@ -5745,16 +5869,26 @@ static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 						   qgroup_to_release);
 }
 
-void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
-			     struct btrfs_block_rsv *block_rsv,
-			     u64 num_bytes)
+/**
+ * btrfs_delayed_refs_rsv_release - release a ref head's reservation.
+ * @fs_info - the fs_info for our fs.
+ * @nr - the number of items to drop.
+ *
+ * This drops the delayed ref head's count from the delayed refs rsv and free's
+ * any excess reservation we had.
+ */
+void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr)
 {
+	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, nr);
+	u64 released = 0;
 
-	if (global_rsv == block_rsv ||
-	    block_rsv->space_info != global_rsv->space_info)
-		global_rsv = NULL;
-	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
+	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv,
+					   num_bytes, NULL);
+	if (released)
+		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+					      0, released, 0);
 }
 
 static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
@@ -5819,9 +5953,10 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
 	fs_info->trans_block_rsv.space_info = space_info;
 	fs_info->empty_block_rsv.space_info = space_info;
 	fs_info->delayed_block_rsv.space_info = space_info;
+	fs_info->delayed_refs_rsv.space_info = space_info;
 
-	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
-	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
+	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;
+	fs_info->csum_root->block_rsv = &fs_info->delayed_refs_rsv;
 	fs_info->dev_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->tree_root->block_rsv = &fs_info->global_block_rsv;
 	if (fs_info->quota_root)
@@ -5841,8 +5976,34 @@ static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
 	WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
 	WARN_ON(fs_info->delayed_block_rsv.size > 0);
 	WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
+	WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
+	WARN_ON(fs_info->delayed_refs_rsv.size > 0);
 }
 
+/*
+ * btrfs_update_delayed_refs_rsv - adjust the size of the delayed refs rsv
+ * @trans - the trans that may have generated delayed refs
+ *
+ * This is to be called anytime we may have adjusted trans->delayed_ref_updates,
+ * it'll calculate the additional size and add it to the delayed_refs_rsv.
+ */
+void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
+	u64 num_bytes;
+
+	if (!trans->delayed_ref_updates)
+		return;
+
+	num_bytes = btrfs_calc_trans_metadata_size(fs_info,
+						   trans->delayed_ref_updates);
+	spin_lock(&delayed_rsv->lock);
+	delayed_rsv->size += num_bytes;
+	delayed_rsv->full = 0;
+	spin_unlock(&delayed_rsv->lock);
+	trans->delayed_ref_updates = 0;
+}
 
 /*
  * To be called after all the new block groups attached to the transaction
@@ -6135,6 +6296,7 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 	u64 old_val;
 	u64 byte_in_group;
 	int factor;
+	int ret = 0;
 
 	/* block accounting for super block */
 	spin_lock(&info->delalloc_root_lock);
@@ -6148,8 +6310,10 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 
 	while (total) {
 		cache = btrfs_lookup_block_group(info, bytenr);
-		if (!cache)
-			return -ENOENT;
+		if (!cache) {
+			ret = -ENOENT;
+			break;
+		}
 		factor = btrfs_bg_type_to_factor(cache->flags);
 
 		/*
@@ -6208,6 +6372,7 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 			list_add_tail(&cache->dirty_list,
 				      &trans->transaction->dirty_bgs);
 			trans->transaction->num_dirty_bgs++;
+			trans->delayed_ref_updates++;
 			btrfs_get_block_group(cache);
 		}
 		spin_unlock(&trans->transaction->dirty_bgs_lock);
@@ -6225,7 +6390,10 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 		total -= num_bytes;
 		bytenr += num_bytes;
 	}
-	return 0;
+
+	/* Modified block groups are accounted for in the delayed_refs_rsv. */
+	btrfs_update_delayed_refs_rsv(trans);
+	return ret;
 }
 
 static u64 first_logical_byte(struct btrfs_fs_info *fs_info, u64 search_start)
@@ -8368,7 +8536,12 @@ use_block_rsv(struct btrfs_trans_handle *trans,
 		goto again;
 	}
 
-	if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
+	/*
+	 * The global reserve still exists to save us from ourselves, so don't
+	 * warn_on if we are short on our delayed refs reserve.
+	 */
+	if (block_rsv->type != BTRFS_BLOCK_RSV_DELREFS &&
+	    btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
 		static DEFINE_RATELIMIT_STATE(_rs,
 				DEFAULT_RATELIMIT_INTERVAL * 10,
 				/*DEFAULT_RATELIMIT_BURST*/ 1);
@@ -10301,6 +10474,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 		add_block_group_free_space(trans, block_group);
 		/* already aborted the transaction if it failed. */
 next:
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		list_del_init(&block_group->bg_list);
 	}
 	btrfs_trans_release_chunk_metadata(trans);
@@ -10378,6 +10552,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 	link_block_group(cache);
 
 	list_add_tail(&cache->bg_list, &trans->new_bgs);
+	trans->delayed_ref_updates++;
+	btrfs_update_delayed_refs_rsv(trans);
 
 	set_avail_alloc_bits(fs_info, type);
 	return 0;
@@ -10415,6 +10591,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	int factor;
 	struct btrfs_caching_control *caching_ctl = NULL;
 	bool remove_em;
+	bool remove_rsv = false;
 
 	block_group = btrfs_lookup_block_group(fs_info, group_start);
 	BUG_ON(!block_group);
@@ -10479,6 +10656,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	if (!list_empty(&block_group->dirty_list)) {
 		list_del_init(&block_group->dirty_list);
+		remove_rsv = true;
 		btrfs_put_block_group(block_group);
 	}
 	spin_unlock(&trans->transaction->dirty_bgs_lock);
@@ -10688,6 +10866,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_del_item(trans, root, path);
 out:
+	if (remove_rsv)
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 	btrfs_free_path(path);
 	return ret;
 }
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f92c0a88c4ad..a21c4defad92 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -454,7 +454,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 		  bool enforce_qgroups)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
-
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
 	u64 num_bytes = 0;
@@ -483,13 +483,28 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	 * the appropriate flushing if need be.
 	 */
 	if (num_items && root != fs_info->chunk_root) {
+		struct btrfs_block_rsv *rsv = &fs_info->trans_block_rsv;
+		u64 delayed_refs_bytes = 0;
+
 		qgroup_reserved = num_items * fs_info->nodesize;
 		ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved,
 				enforce_qgroups);
 		if (ret)
 			return ERR_PTR(ret);
 
+		/*
+		 * We want to reserve all the bytes we may need all at once, so
+		 * we only do 1 enospc flushing cycle per transaction start.  We
+		 * accomplish this by simply assuming we'll do 2 x num_items
+		 * worth of delayed refs updates in this trans handle, and
+		 * refill that amount for whatever is missing in the reserve.
+		 */
 		num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
+		if (delayed_refs_rsv->full == 0) {
+			delayed_refs_bytes = num_bytes;
+			num_bytes <<= 1;
+		}
+
 		/*
 		 * Do the reservation for the relocation root creation
 		 */
@@ -498,8 +513,24 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 			reloc_reserved = true;
 		}
 
-		ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv,
-					  num_bytes, flush);
+		ret = btrfs_block_rsv_add(root, rsv, num_bytes, flush);
+		if (ret)
+			goto reserve_fail;
+		if (delayed_refs_bytes) {
+			btrfs_migrate_to_delayed_refs_rsv(fs_info, rsv,
+							  delayed_refs_bytes);
+			num_bytes -= delayed_refs_bytes;
+		}
+	} else if (num_items == 0 && flush == BTRFS_RESERVE_FLUSH_ALL &&
+		   !delayed_refs_rsv->full) {
+		/*
+		 * Some people call with btrfs_start_transaction(root, 0)
+		 * because they can be throttled, but have some other mechanism
+		 * for reserving space.  We still want these guys to refill the
+		 * delayed block_rsv so just add 1 items worth of reservation
+		 * here.
+		 */
+		ret = btrfs_delayed_refs_rsv_refill(fs_info, flush);
 		if (ret)
 			goto reserve_fail;
 	}
-- 
2.14.3



* [PATCH 06/10] btrfs: update may_commit_transaction to use the delayed refs rsv
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (4 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 05/10] btrfs: introduce delayed_refs_rsv Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-06 12:51   ` Nikolay Borisov
  2018-12-03 15:20 ` [PATCH 07/10] btrfs: add new flushing states for " Josef Bacik
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Any space used in the delayed_refs_rsv will be freed up by a transaction
commit, so instead of just counting the pinned space we also need to
account for any space in the delayed_refs_rsv when deciding if it will
make a difference to commit the transaction to satisfy our space
reservation.  If we have enough bytes to satisfy our reservation ticket
then we are good to go, otherwise subtract out what space we would gain
back by committing the transaction and compare that against the pinned
space to make our decision.
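
In short, the new decision looks like this (a condensed sketch of the change
below):

	reclaim_bytes = delayed_rsv->reserved + delayed_refs_rsv->reserved;
	if (reclaim_bytes >= bytes_needed)
		goto commit;
	bytes_needed -= reclaim_bytes;
	/* then compare bytes_needed against total_bytes_pinned as before */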

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index aa0a638d0263..63ff9d832867 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4843,8 +4843,10 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_trans_handle *trans;
-	u64 bytes;
+	u64 bytes_needed;
+	u64 reclaim_bytes = 0;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
@@ -4857,15 +4859,15 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	else if (!list_empty(&space_info->tickets))
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-	bytes = (ticket) ? ticket->bytes : 0;
+	bytes_needed = (ticket) ? ticket->bytes : 0;
 	spin_unlock(&space_info->lock);
 
-	if (!bytes)
+	if (!bytes_needed)
 		return 0;
 
 	/* See if there is enough pinned space to make this reservation */
 	if (__percpu_counter_compare(&space_info->total_bytes_pinned,
-				   bytes,
+				   bytes_needed,
 				   BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
 		goto commit;
 
@@ -4877,14 +4879,18 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 		return -ENOSPC;
 
 	spin_lock(&delayed_rsv->lock);
-	if (delayed_rsv->size > bytes)
-		bytes = 0;
-	else
-		bytes -= delayed_rsv->size;
+	reclaim_bytes += delayed_rsv->reserved;
 	spin_unlock(&delayed_rsv->lock);
 
+	spin_lock(&delayed_refs_rsv->lock);
+	reclaim_bytes += delayed_refs_rsv->reserved;
+	spin_unlock(&delayed_refs_rsv->lock);
+	if (reclaim_bytes >= bytes_needed)
+		goto commit;
+	bytes_needed -= reclaim_bytes;
+
 	if (__percpu_counter_compare(&space_info->total_bytes_pinned,
-				   bytes,
+				   bytes_needed,
 				   BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0) {
 		return -ENOSPC;
 	}
-- 
2.14.3



* [PATCH 07/10] btrfs: add new flushing states for the delayed refs rsv
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (5 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 06/10] btrfs: update may_commit_transaction to use the delayed refs rsv Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-03 15:20 ` [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs Josef Bacik
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

A nice thing we gain with the delayed refs rsv is the ability to flush
the delayed refs on demand to deal with enospc pressure.  Add states to
flush delayed refs on demand, and this will allow us to remove a lot of
ad-hoc work around checking to see if we should commit the transaction
to run our delayed refs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h             | 10 ++++++----
 fs/btrfs/extent-tree.c       | 14 ++++++++++++++
 include/trace/events/btrfs.h |  2 ++
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 52a87d446945..2eba398c722b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2745,10 +2745,12 @@ enum btrfs_reserve_flush_enum {
 enum btrfs_flush_state {
 	FLUSH_DELAYED_ITEMS_NR	=	1,
 	FLUSH_DELAYED_ITEMS	=	2,
-	FLUSH_DELALLOC		=	3,
-	FLUSH_DELALLOC_WAIT	=	4,
-	ALLOC_CHUNK		=	5,
-	COMMIT_TRANS		=	6,
+	FLUSH_DELAYED_REFS_NR	=	3,
+	FLUSH_DELAYED_REFS	=	4,
+	FLUSH_DELALLOC		=	5,
+	FLUSH_DELALLOC_WAIT	=	6,
+	ALLOC_CHUNK		=	7,
+	COMMIT_TRANS		=	8,
 };
 
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 63ff9d832867..5a2d0b061f57 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4938,6 +4938,20 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		shrink_delalloc(fs_info, num_bytes * 2, num_bytes,
 				state == FLUSH_DELALLOC_WAIT);
 		break;
+	case FLUSH_DELAYED_REFS_NR:
+	case FLUSH_DELAYED_REFS:
+		trans = btrfs_join_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			break;
+		}
+		if (state == FLUSH_DELAYED_REFS_NR)
+			nr = calc_reclaim_items_nr(fs_info, num_bytes);
+		else
+			nr = 0;
+		btrfs_run_delayed_refs(trans, nr);
+		btrfs_end_transaction(trans);
+		break;
 	case ALLOC_CHUNK:
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans)) {
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 8568946f491d..63d1f9d8b8c7 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1048,6 +1048,8 @@ TRACE_EVENT(btrfs_trigger_flush,
 		{ FLUSH_DELAYED_ITEMS,		"FLUSH_DELAYED_ITEMS"},		\
 		{ FLUSH_DELALLOC,		"FLUSH_DELALLOC"},		\
 		{ FLUSH_DELALLOC_WAIT,		"FLUSH_DELALLOC_WAIT"},		\
+		{ FLUSH_DELAYED_REFS_NR,	"FLUSH_DELAYED_REFS_NR"},	\
+		{ FLUSH_DELAYED_REFS,		"FLUSH_DELAYED_REFS"},		\
 		{ ALLOC_CHUNK,			"ALLOC_CHUNK"},			\
 		{ COMMIT_TRANS,			"COMMIT_TRANS"})
 
-- 
2.14.3



* [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (6 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 07/10] btrfs: add new flushing states for " Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-06 16:52   ` Nikolay Borisov
  2019-01-14  6:28   ` Qu Wenruo
  2018-12-03 15:20 ` [PATCH 09/10] btrfs: don't run delayed refs in the end transaction logic Josef Bacik
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

With the delayed_refs_rsv we now know exactly how much pending
delayed refs space we need.  This means we can drastically simplify
btrfs_check_space_for_delayed_refs by simply checking how much space we
have reserved for the global rsv (which acts as a spill over buffer) and
the delayed refs rsv.  If our total size is beyond that amount then we
know it's time to commit the transaction and stop any more delayed refs
from being generated.
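
In short, the check reduces to the following (a condensed sketch of the change
below):

	/*
	 * Commit once the delayed refs rsv's size outgrows what the delayed
	 * refs rsv plus the global rsv (our spill-over buffer) actually have
	 * reserved.
	 */
	reserved = global_rsv->reserved + delayed_refs_rsv->reserved;
	return delayed_refs_rsv->size >= reserved;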

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/extent-tree.c | 48 ++++++++++++++++++------------------------------
 fs/btrfs/inode.c       |  4 ++--
 fs/btrfs/transaction.c |  2 +-
 4 files changed, 22 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2eba398c722b..30da075c042e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2631,7 +2631,7 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
 }
 
 int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
-int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans);
+bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
 void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
 					 const u64 start);
 void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5a2d0b061f57..07ef1b8087f7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2839,40 +2839,28 @@ u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes)
 	return num_csums;
 }
 
-int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans)
+bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_block_rsv *global_rsv;
-	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
-	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
-	unsigned int num_dirty_bgs = trans->transaction->num_dirty_bgs;
-	u64 num_bytes, num_dirty_bgs_bytes;
-	int ret = 0;
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
+	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	bool ret = false;
+	u64 reserved;
 
-	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
-	num_heads = heads_to_leaves(fs_info, num_heads);
-	if (num_heads > 1)
-		num_bytes += (num_heads - 1) * fs_info->nodesize;
-	num_bytes <<= 1;
-	num_bytes += btrfs_csum_bytes_to_leaves(fs_info, csum_bytes) *
-							fs_info->nodesize;
-	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(fs_info,
-							     num_dirty_bgs);
-	global_rsv = &fs_info->global_block_rsv;
+	spin_lock(&global_rsv->lock);
+	reserved = global_rsv->reserved;
+	spin_unlock(&global_rsv->lock);
 
 	/*
-	 * If we can't allocate any more chunks lets make sure we have _lots_ of
-	 * wiggle room since running delayed refs can create more delayed refs.
+	 * Since the global reserve is just kind of magic we don't really want
+	 * to rely on it to save our bacon, so if our size is more than the
+	 * delayed_refs_rsv and the global rsv then it's time to think about
+	 * bailing.
 	 */
-	if (global_rsv->space_info->full) {
-		num_dirty_bgs_bytes <<= 1;
-		num_bytes <<= 1;
-	}
-
-	spin_lock(&global_rsv->lock);
-	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
-		ret = 1;
-	spin_unlock(&global_rsv->lock);
+	spin_lock(&delayed_refs_rsv->lock);
+	reserved += delayed_refs_rsv->reserved;
+	if (delayed_refs_rsv->size >= reserved)
+		ret = true;
+	spin_unlock(&delayed_refs_rsv->lock);
 	return ret;
 }
 
@@ -2891,7 +2879,7 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
 	if (val >= NSEC_PER_SEC / 2)
 		return 2;
 
-	return btrfs_check_space_for_delayed_refs(trans);
+	return btrfs_check_space_for_delayed_refs(trans->fs_info);
 }
 
 struct async_delayed_refs {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a097f5fde31d..8532a2eb56d1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5326,8 +5326,8 @@ static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
 		 * Try to steal from the global reserve if there is space for
 		 * it.
 		 */
-		if (!btrfs_check_space_for_delayed_refs(trans) &&
-		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, false))
+		if (!btrfs_check_space_for_delayed_refs(fs_info) &&
+		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, 0))
 			return trans;
 
 		/* If not, commit and try again. */
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a21c4defad92..2d8401bf8df9 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -789,7 +789,7 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 
-	if (btrfs_check_space_for_delayed_refs(trans))
+	if (btrfs_check_space_for_delayed_refs(fs_info))
 		return 1;
 
 	return !!btrfs_block_rsv_check(&fs_info->global_block_rsv, 5);
-- 
2.14.3



* [PATCH 09/10] btrfs: don't run delayed refs in the end transaction logic
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (7 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-06 16:43   ` Nikolay Borisov
  2018-12-03 15:20 ` [PATCH 10/10] btrfs: fix truncate throttling Josef Bacik
  2018-12-06 15:56 ` [PATCH 00/10][V2] Delayed refs rsv David Sterba
  10 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Over the years we have built up a lot of infrastructure to keep delayed
refs in check, mostly by running them at btrfs_end_transaction() time.
We have a lot of different maths we do to figure out how many to run and
whether we should do it inline or async, etc.  This existed because we had no
feedback mechanism to force the flushing of delayed refs when they
became a problem.  However, with the enospc flushing infrastructure now in
place to flush delayed refs when they put too much pressure on the
enospc system, this problem is solved.  Rip out all of this code as
it is no longer needed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 38 --------------------------------------
 1 file changed, 38 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 2d8401bf8df9..01f39401619a 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -798,22 +798,12 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
 int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_transaction *cur_trans = trans->transaction;
-	int updates;
-	int err;
 
 	smp_mb();
 	if (cur_trans->state >= TRANS_STATE_BLOCKED ||
 	    cur_trans->delayed_refs.flushing)
 		return 1;
 
-	updates = trans->delayed_ref_updates;
-	trans->delayed_ref_updates = 0;
-	if (updates) {
-		err = btrfs_run_delayed_refs(trans, updates * 2);
-		if (err) /* Error code will also eval true */
-			return err;
-	}
-
 	return should_end_transaction(trans);
 }
 
@@ -843,11 +833,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *info = trans->fs_info;
 	struct btrfs_transaction *cur_trans = trans->transaction;
-	u64 transid = trans->transid;
-	unsigned long cur = trans->delayed_ref_updates;
 	int lock = (trans->type != TRANS_JOIN_NOLOCK);
 	int err = 0;
-	int must_run_delayed_refs = 0;
 
 	if (refcount_read(&trans->use_count) > 1) {
 		refcount_dec(&trans->use_count);
@@ -858,27 +845,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
 
-	if (!list_empty(&trans->new_bgs))
-		btrfs_create_pending_block_groups(trans);
-
-	trans->delayed_ref_updates = 0;
-	if (!trans->sync) {
-		must_run_delayed_refs =
-			btrfs_should_throttle_delayed_refs(trans);
-		cur = max_t(unsigned long, cur, 32);
-
-		/*
-		 * don't make the caller wait if they are from a NOLOCK
-		 * or ATTACH transaction, it will deadlock with commit
-		 */
-		if (must_run_delayed_refs == 1 &&
-		    (trans->type & (__TRANS_JOIN_NOLOCK | __TRANS_ATTACH)))
-			must_run_delayed_refs = 2;
-	}
-
-	btrfs_trans_release_metadata(trans);
-	trans->block_rsv = NULL;
-
 	if (!list_empty(&trans->new_bgs))
 		btrfs_create_pending_block_groups(trans);
 
@@ -923,10 +889,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	}
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
-	if (must_run_delayed_refs) {
-		btrfs_async_run_delayed_refs(info, cur, transid,
-					     must_run_delayed_refs == 1);
-	}
 	return err;
 }
 
-- 
2.14.3



* [PATCH 10/10] btrfs: fix truncate throttling
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (8 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 09/10] btrfs: don't run delayed refs in the end transaction logic Josef Bacik
@ 2018-12-03 15:20 ` Josef Bacik
  2018-12-06 15:56 ` [PATCH 00/10][V2] Delayed refs rsv David Sterba
  10 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2018-12-03 15:20 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We have a bunch of magic to make sure we're throttling delayed refs when
truncating a file.  Now that we have a delayed refs rsv and a mechanism
for refilling that reserve, simply use that instead of all of this magic.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 79 ++++++++++++--------------------------------------------
 1 file changed, 17 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8532a2eb56d1..623a71d871d4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4437,31 +4437,6 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int truncate_space_check(struct btrfs_trans_handle *trans,
-				struct btrfs_root *root,
-				u64 bytes_deleted)
-{
-	struct btrfs_fs_info *fs_info = root->fs_info;
-	int ret;
-
-	/*
-	 * This is only used to apply pressure to the enospc system, we don't
-	 * intend to use this reservation at all.
-	 */
-	bytes_deleted = btrfs_csum_bytes_to_leaves(fs_info, bytes_deleted);
-	bytes_deleted *= fs_info->nodesize;
-	ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv,
-				  bytes_deleted, BTRFS_RESERVE_NO_FLUSH);
-	if (!ret) {
-		trace_btrfs_space_reservation(fs_info, "transaction",
-					      trans->transid,
-					      bytes_deleted, 1);
-		trans->bytes_reserved += bytes_deleted;
-	}
-	return ret;
-
-}
-
 /*
  * Return this if we need to call truncate_block for the last bit of the
  * truncate.
@@ -4506,7 +4481,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	u64 bytes_deleted = 0;
 	bool be_nice = false;
 	bool should_throttle = false;
-	bool should_end = false;
 
 	BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
 
@@ -4719,15 +4693,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				btrfs_abort_transaction(trans, ret);
 				break;
 			}
-			if (btrfs_should_throttle_delayed_refs(trans))
-				btrfs_async_run_delayed_refs(fs_info,
-					trans->delayed_ref_updates * 2,
-					trans->transid, 0);
 			if (be_nice) {
-				if (truncate_space_check(trans, root,
-							 extent_num_bytes)) {
-					should_end = true;
-				}
 				if (btrfs_should_throttle_delayed_refs(trans))
 					should_throttle = true;
 			}
@@ -4738,7 +4704,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
 		if (path->slots[0] == 0 ||
 		    path->slots[0] != pending_del_slot ||
-		    should_throttle || should_end) {
+		    should_throttle) {
 			if (pending_del_nr) {
 				ret = btrfs_del_items(trans, root, path,
 						pending_del_slot,
@@ -4750,23 +4716,24 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				pending_del_nr = 0;
 			}
 			btrfs_release_path(path);
-			if (should_throttle) {
-				unsigned long updates = trans->delayed_ref_updates;
-				if (updates) {
-					trans->delayed_ref_updates = 0;
-					ret = btrfs_run_delayed_refs(trans,
-								   updates * 2);
-					if (ret)
-						break;
-				}
-			}
+
 			/*
-			 * if we failed to refill our space rsv, bail out
-			 * and let the transaction restart
+			 * We can generate a lot of delayed refs, so we need to
+			 * throttle every once in a while and make sure we're
+			 * adding enough space to keep up with the work we are
+			 * generating.  Since we hold a transaction here we
+			 * can't flush, and we don't want to FLUSH_LIMIT because
+			 * we could have generated too many delayed refs to
+			 * actually allocate, so just bail if we're short and
+			 * let the normal reservation dance happen higher up.
 			 */
-			if (should_end) {
-				ret = -EAGAIN;
-				break;
+			if (should_throttle) {
+				ret = btrfs_delayed_refs_rsv_refill(fs_info,
+							BTRFS_RESERVE_NO_FLUSH);
+				if (ret) {
+					ret = -EAGAIN;
+					break;
+				}
 			}
 			goto search_again;
 		} else {
@@ -4792,18 +4759,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	}
 
 	btrfs_free_path(path);
-
-	if (be_nice && bytes_deleted > SZ_32M && (ret >= 0 || ret == -EAGAIN)) {
-		unsigned long updates = trans->delayed_ref_updates;
-		int err;
-
-		if (updates) {
-			trans->delayed_ref_updates = 0;
-			err = btrfs_run_delayed_refs(trans, updates * 2);
-			if (err)
-				ret = err;
-		}
-	}
 	return ret;
 }
 
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/10] btrfs: add btrfs_delete_ref_head helper
  2018-12-03 15:20 ` [PATCH 01/10] btrfs: add btrfs_delete_ref_head helper Josef Bacik
@ 2018-12-06 12:32   ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-06 12:32 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
> into a helper and cleanup the calling functions.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> Reviewed-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/delayed-ref.c | 14 ++++++++++++++
>  fs/btrfs/delayed-ref.h |  3 ++-
>  fs/btrfs/extent-tree.c | 22 +++-------------------
>  3 files changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 9301b3ad9217..b3e4c9fcb664 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>  	return head;
>  }
>  
> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> +			   struct btrfs_delayed_ref_head *head)
> +{
> +	lockdep_assert_held(&delayed_refs->lock);
> +	lockdep_assert_held(&head->lock);
> +
> +	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> +	RB_CLEAR_NODE(&head->href_node);
> +	atomic_dec(&delayed_refs->num_entries);
> +	delayed_refs->num_heads--;
> +	if (head->processing == 0)
> +		delayed_refs->num_heads_ready--;
> +}
> +
>  /*
>   * Helper to insert the ref_node to the tail or merge with tail.
>   *
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 8e20c5cb5404..d2af974f68a1 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
>  {
>  	mutex_unlock(&head->mutex);
>  }
> -
> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> +			   struct btrfs_delayed_ref_head *head);
>  
>  struct btrfs_delayed_ref_head *btrfs_select_ref_head(
>  		struct btrfs_delayed_ref_root *delayed_refs);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d242a1174e50..c36b3a42f2bb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  		spin_unlock(&delayed_refs->lock);
>  		return 1;
>  	}
> -	delayed_refs->num_heads--;
> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> -	RB_CLEAR_NODE(&head->href_node);
> +	btrfs_delete_ref_head(delayed_refs, head);
>  	spin_unlock(&head->lock);
>  	spin_unlock(&delayed_refs->lock);
> -	atomic_dec(&delayed_refs->num_entries);
>  
>  	trace_run_delayed_ref_head(fs_info, head, 0);
>  
> @@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
>  	if (!mutex_trylock(&head->mutex))
>  		goto out;
>  
> -	/*
> -	 * at this point we have a head with no other entries.  Go
> -	 * ahead and process it.
> -	 */
> -	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
> -	RB_CLEAR_NODE(&head->href_node);
> -	atomic_dec(&delayed_refs->num_entries);
> -
> -	/*
> -	 * we don't take a ref on the node because we're removing it from the
> -	 * tree, so we just steal the ref the tree was holding.
> -	 */
> -	delayed_refs->num_heads--;
> -	if (head->processing == 0)
> -		delayed_refs->num_heads_ready--;
> +	btrfs_delete_ref_head(delayed_refs, head);
>  	head->processing = 0;
> +
>  	spin_unlock(&head->lock);
>  	spin_unlock(&delayed_refs->lock);
>  
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 02/10] btrfs: add cleanup_ref_head_accounting helper
  2018-12-03 15:20 ` [PATCH 02/10] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
@ 2018-12-06 12:38   ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-06 12:38 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> We were missing some quota cleanups in check_ref_cleanup, so break the
> ref head accounting cleanup into a helper and call that from both
> check_ref_cleanup and cleanup_ref_head.  This will hopefully ensure that
> we don't screw up accounting in the future for other things that we add.
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Doesn't this also need a stable tag? Furthermore, doesn't the missing
code dealing with total_bytes_pinned in check_ref_cleanup mean that
every time the last reference for a block was freed we were leaking
bytes in total_bytes_pinned? Shouldn't this have eventually led to
total_bytes_pinned dominating the usage in a space_info?

Codewise lgtm:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/extent-tree.c | 67 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 39 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index c36b3a42f2bb..e3ed3507018d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2443,6 +2443,41 @@ static int cleanup_extent_op(struct btrfs_trans_handle *trans,
>  	return ret ? ret : 1;
>  }
>  
> +static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
> +					struct btrfs_delayed_ref_head *head)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_delayed_ref_root *delayed_refs =
> +		&trans->transaction->delayed_refs;
> +
> +	if (head->total_ref_mod < 0) {
> +		struct btrfs_space_info *space_info;
> +		u64 flags;
> +
> +		if (head->is_data)
> +			flags = BTRFS_BLOCK_GROUP_DATA;
> +		else if (head->is_system)
> +			flags = BTRFS_BLOCK_GROUP_SYSTEM;
> +		else
> +			flags = BTRFS_BLOCK_GROUP_METADATA;
> +		space_info = __find_space_info(fs_info, flags);
> +		ASSERT(space_info);
> +		percpu_counter_add_batch(&space_info->total_bytes_pinned,
> +				   -head->num_bytes,
> +				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
> +
> +		if (head->is_data) {
> +			spin_lock(&delayed_refs->lock);
> +			delayed_refs->pending_csums -= head->num_bytes;
> +			spin_unlock(&delayed_refs->lock);
> +		}
> +	}
> +
> +	/* Also free its reserved qgroup space */
> +	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
> +				      head->qgroup_reserved);
> +}
> +
>  static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  			    struct btrfs_delayed_ref_head *head)
>  {
> @@ -2478,31 +2513,6 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  	spin_unlock(&head->lock);
>  	spin_unlock(&delayed_refs->lock);
>  
> -	trace_run_delayed_ref_head(fs_info, head, 0);
> -
> -	if (head->total_ref_mod < 0) {
> -		struct btrfs_space_info *space_info;
> -		u64 flags;
> -
> -		if (head->is_data)
> -			flags = BTRFS_BLOCK_GROUP_DATA;
> -		else if (head->is_system)
> -			flags = BTRFS_BLOCK_GROUP_SYSTEM;
> -		else
> -			flags = BTRFS_BLOCK_GROUP_METADATA;
> -		space_info = __find_space_info(fs_info, flags);
> -		ASSERT(space_info);
> -		percpu_counter_add_batch(&space_info->total_bytes_pinned,
> -				   -head->num_bytes,
> -				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
> -
> -		if (head->is_data) {
> -			spin_lock(&delayed_refs->lock);
> -			delayed_refs->pending_csums -= head->num_bytes;
> -			spin_unlock(&delayed_refs->lock);
> -		}
> -	}
> -
>  	if (head->must_insert_reserved) {
>  		btrfs_pin_extent(fs_info, head->bytenr,
>  				 head->num_bytes, 1);
> @@ -2512,9 +2522,9 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
>  		}
>  	}
>  
> -	/* Also free its reserved qgroup space */
> -	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
> -				      head->qgroup_reserved);
> +	cleanup_ref_head_accounting(trans, head);
> +
> +	trace_run_delayed_ref_head(fs_info, head, 0);
>  	btrfs_delayed_ref_unlock(head);
>  	btrfs_put_delayed_ref_head(head);
>  	return 0;
> @@ -6991,6 +7001,7 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
>  	if (head->must_insert_reserved)
>  		ret = 1;
>  
> +	cleanup_ref_head_accounting(trans, head);
>  	mutex_unlock(&head->mutex);
>  	btrfs_put_delayed_ref_head(head);
>  	return ret;
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 06/10] btrfs: update may_commit_transaction to use the delayed refs rsv
  2018-12-03 15:20 ` [PATCH 06/10] btrfs: update may_commit_transaction to use the delayed refs rsv Josef Bacik
@ 2018-12-06 12:51   ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-06 12:51 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> Any space used in the delayed_refs_rsv will be freed up by a transaction
> commit, so instead of just counting the pinned space we also need to
> account for any space in the delayed_refs_rsv when deciding if it will
> make a different to commit the transaction to satisfy our space
> reservation.  If we have enough bytes to satisfy our reservation ticket
> then we are good to go, otherwise subtract out what space we would gain
> back by committing the transaction and compare that against the pinned
> space to make our decision.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

However, look below for one suggestion: 

> ---
>  fs/btrfs/extent-tree.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index aa0a638d0263..63ff9d832867 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4843,8 +4843,10 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  {
>  	struct reserve_ticket *ticket = NULL;
>  	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
> +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
>  	struct btrfs_trans_handle *trans;
> -	u64 bytes;
> +	u64 bytes_needed;
> +	u64 reclaim_bytes = 0;
>  
>  	trans = (struct btrfs_trans_handle *)current->journal_info;
>  	if (trans)
> @@ -4857,15 +4859,15 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  	else if (!list_empty(&space_info->tickets))
>  		ticket = list_first_entry(&space_info->tickets,
>  					  struct reserve_ticket, list);
> -	bytes = (ticket) ? ticket->bytes : 0;
> +	bytes_needed = (ticket) ? ticket->bytes : 0;
>  	spin_unlock(&space_info->lock);
>  
> -	if (!bytes)
> +	if (!bytes_needed)
>  		return 0;
>  
>  	/* See if there is enough pinned space to make this reservation */
>  	if (__percpu_counter_compare(&space_info->total_bytes_pinned,
> -				   bytes,
> +				   bytes_needed,
>  				   BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
>  		goto commit;
>  
> @@ -4877,14 +4879,18 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  		return -ENOSPC;

If we remove this check:
 if (space_info != delayed_rsv->space_info)                              
                return -ENOSPC; 

can't we move the reclaim_bytes calc code above the
__percpu_counter_compare call and eventually be left with just a single
invocation of it?  The diff should look something along the lines of:

@@ -4828,19 +4827,6 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
        if (!bytes)
                return 0;
 
-       /* See if there is enough pinned space to make this reservation */
-       if (__percpu_counter_compare(&space_info->total_bytes_pinned,
-                                  bytes,
-                                  BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
-               goto commit;
-
-       /*
-        * See if there is some space in the delayed insertion reservation for
-        * this reservation.
-        */
-       if (space_info != delayed_rsv->space_info)
-               return -ENOSPC;
-
        spin_lock(&delayed_rsv->lock);
        if (delayed_rsv->size > bytes)
                bytes = 0;
@@ -4850,9 +4836,8 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 
        if (__percpu_counter_compare(&space_info->total_bytes_pinned,
                                   bytes,
-                                  BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0) {
+                                  BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0)
                return -ENOSPC;
-       }
 
 commit:
        trans = btrfs_join_transaction(fs_info->extent_root);


>  
>  	spin_lock(&delayed_rsv->lock);
> -	if (delayed_rsv->size > bytes)
> -		bytes = 0;
> -	else
> -		bytes -= delayed_rsv->size;
> +	reclaim_bytes += delayed_rsv->reserved;
>  	spin_unlock(&delayed_rsv->lock);
>  
> +	spin_lock(&delayed_refs_rsv->lock);
> +	reclaim_bytes += delayed_refs_rsv->reserved;
> +	spin_unlock(&delayed_refs_rsv->lock);
> +	if (reclaim_bytes >= bytes_needed)
> +		goto commit;
> +	bytes_needed -= reclaim_bytes;
> +
>  	if (__percpu_counter_compare(&space_info->total_bytes_pinned,
> -				   bytes,
> +				   bytes_needed,
>  				   BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0) {
>  		return -ENOSPC;
>  	}
> 
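
For reference, the decision this patch ends up with in
may_commit_transaction() boils down to roughly the stand-alone model
below, with the two pinned-bytes checks folded into one as suggested
above and the space_info checks ignored.  This is an illustration only,
not kernel code; the helper name commit_helps() is made up and the
field names are just borrowed from the patch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct rsv {
	uint64_t reserved;
};

/*
 * A commit only helps if the bytes it would give back (delayed rsv +
 * delayed refs rsv reservations plus pinned bytes) cover the ticket.
 */
static bool commit_helps(uint64_t bytes_needed, uint64_t bytes_pinned,
			 const struct rsv *delayed_rsv,
			 const struct rsv *delayed_refs_rsv)
{
	uint64_t reclaim_bytes = 0;

	if (!bytes_needed)
		return false;

	reclaim_bytes += delayed_rsv->reserved;
	reclaim_bytes += delayed_refs_rsv->reserved;
	if (reclaim_bytes >= bytes_needed)
		return true;
	bytes_needed -= reclaim_bytes;

	return bytes_pinned >= bytes_needed;
}

int main(void)
{
	struct rsv delayed = { .reserved = 1ULL << 20 };
	struct rsv delayed_refs = { .reserved = 4ULL << 20 };

	/* 8M ticket, 16M pinned: committing would satisfy it. */
	printf("%d\n", commit_helps(8ULL << 20, 16ULL << 20,
				    &delayed, &delayed_refs));
	return 0;
}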

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10][V2] Delayed refs rsv
  2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
                   ` (9 preceding siblings ...)
  2018-12-03 15:20 ` [PATCH 10/10] btrfs: fix truncate throttling Josef Bacik
@ 2018-12-06 15:56 ` David Sterba
  10 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2018-12-06 15:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Dec 03, 2018 at 10:20:28AM -0500, Josef Bacik wrote:
> v1->v2:
> - addressed the comments from the various reviewers.
> - split "introduce delayed_refs_rsv" into 5 patches.  The patches are the same
>   together as they were, just split out more logically.  They can't really be
>   bisected across in that you will likely have fun enospc failures, but they
>   compile properly.  This was done to make it easier for review.

Thanks, that was helpful. The bisectability is IMO not hurt, as it
compiles and runs despite the potential ENOSPC problems. I guess the
point is to be able to reach any commit & compile & run and not be hit
by unrelated bugs. If somebody is bisecting an ENOSPC problem and lands
in the middle of the series, there's a hint in the changelogs that
something is going on.

> -- Original message --
> 
> This patchset changes how we do space reservations for delayed refs.  We were
> hitting probably 20-40 enospc abort's per day in production while running
> delayed refs at transaction commit time.  This means we ran out of space in the
> global reserve and couldn't easily get more space in use_block_rsv().
> 
> The global reserve has grown to cover everything we don't reserve space
> explicitly for, and we've grown a lot of weird ad-hoc hueristics to know if
> we're running short on space and when it's time to force a commit.  A failure
> rate of 20-40 file systems when we run hundreds of thousands of them isn't super
> high, but cleaning up this code will make things less ugly and more predictible.
> 
> Thus the delayed refs rsv.  We always know how many delayed refs we have
> outstanding, and although running them generates more we can use the global
> reserve for that spill over, which fits better into it's desired use than a full
> blown reservation.  This first approach is to simply take how many times we're
> reserving space for and multiply that by 2 in order to save enough space for the
> delayed refs that could be generated.  This is a niave approach and will
> probably evolve, but for now it works.
> 
> With this patchset we've gone down to 2-8 failures per week.  It's not perfect,
> there are some corner cases that still need to be addressed, but is
> significantly better than what we had.  Thanks,

I've merged the series to misc-next; the amount of reviews and testing
is sufficient.  Though there are some more comments or suggestions for
improvements, they can be done as followups.

The other patchsets from you are mostly fixes that can be applied at
-rc time; for that reason I'm glad the main infrastructure change can
be merged before the 4.21 code freeze.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/10] btrfs: don't run delayed refs in the end transaction logic
  2018-12-03 15:20 ` [PATCH 09/10] btrfs: don't run delayed refs in the end transaction logic Josef Bacik
@ 2018-12-06 16:43   ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-06 16:43 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> Over the years we have built up a lot of infrastructure to keep delayed
> refs in check, mostly by running them at btrfs_end_transaction() time.
> We have a lot of different maths we do to figure out how much, if we
> should do it inline or async, etc.  This existed because we had no
> feedback mechanism to force the flushing of delayed refs when they
> became a problem.  However with the enospc flushing infrastructure in
> place for flushing delayed refs when they put too much pressure on the
> enospc system we have this problem solved.  Rip out all of this code as
> it is no longer needed.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/transaction.c | 38 --------------------------------------
>  1 file changed, 38 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 2d8401bf8df9..01f39401619a 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -798,22 +798,12 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
>  int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
>  {
>  	struct btrfs_transaction *cur_trans = trans->transaction;
> -	int updates;
> -	int err;
>  
>  	smp_mb();
>  	if (cur_trans->state >= TRANS_STATE_BLOCKED ||
>  	    cur_trans->delayed_refs.flushing)
>  		return 1;
>  
> -	updates = trans->delayed_ref_updates;
> -	trans->delayed_ref_updates = 0;
> -	if (updates) {
> -		err = btrfs_run_delayed_refs(trans, updates * 2);
> -		if (err) /* Error code will also eval true */
> -			return err;
> -	}
> -
>  	return should_end_transaction(trans);
>  }
>  
> @@ -843,11 +833,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>  {
>  	struct btrfs_fs_info *info = trans->fs_info;
>  	struct btrfs_transaction *cur_trans = trans->transaction;
> -	u64 transid = trans->transid;
> -	unsigned long cur = trans->delayed_ref_updates;
>  	int lock = (trans->type != TRANS_JOIN_NOLOCK);
>  	int err = 0;
> -	int must_run_delayed_refs = 0;
>  
>  	if (refcount_read(&trans->use_count) > 1) {
>  		refcount_dec(&trans->use_count);
> @@ -858,27 +845,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>  	btrfs_trans_release_metadata(trans);
>  	trans->block_rsv = NULL;
>  
> -	if (!list_empty(&trans->new_bgs))
> -		btrfs_create_pending_block_groups(trans);

Is this being deleted because the delayed_refs_rsv also accounts for
new block groups?

> -
> -	trans->delayed_ref_updates = 0;
> -	if (!trans->sync) {
> -		must_run_delayed_refs =
> -			btrfs_should_throttle_delayed_refs(trans);
> -		cur = max_t(unsigned long, cur, 32);
> -
> -		/*
> -		 * don't make the caller wait if they are from a NOLOCK
> -		 * or ATTACH transaction, it will deadlock with commit
> -		 */
> -		if (must_run_delayed_refs == 1 &&
> -		    (trans->type & (__TRANS_JOIN_NOLOCK | __TRANS_ATTACH)))
> -			must_run_delayed_refs = 2;
> -	}
> -
> -	btrfs_trans_release_metadata(trans);
> -	trans->block_rsv = NULL;

Why remove those 2 lines as well?

> -
>  	if (!list_empty(&trans->new_bgs))
>  		btrfs_create_pending_block_groups(trans);
>  
> @@ -923,10 +889,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>  	}
>  
>  	kmem_cache_free(btrfs_trans_handle_cachep, trans);
> -	if (must_run_delayed_refs) {
> -		btrfs_async_run_delayed_refs(info, cur, transid,
> -					     must_run_delayed_refs == 1);
> -	}
>  	return err;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs
  2018-12-03 15:20 ` [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs Josef Bacik
@ 2018-12-06 16:52   ` Nikolay Borisov
  2018-12-06 17:54     ` David Sterba
  2019-01-14  6:28   ` Qu Wenruo
  1 sibling, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-06 16:52 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> Now with the delayed_refs_rsv we can now know exactly how much pending
> delayed refs space we need.  This means we can drastically simplify

IMO it will be helpful if there is a sentence here referring back to
btrfs_update_delayed_refs_rsv to put your first sentence into context.
But I guess this is something David can also do.

> btrfs_check_space_for_delayed_refs by simply checking how much space we
> have reserved for the global rsv (which acts as a spill over buffer) and
> the delayed refs rsv.  If our total size is beyond that amount then we
> know it's time to commit the transaction and stop any more delayed refs
> from being generated.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ctree.h       |  2 +-
>  fs/btrfs/extent-tree.c | 48 ++++++++++++++++++------------------------------
>  fs/btrfs/inode.c       |  4 ++--
>  fs/btrfs/transaction.c |  2 +-
>  4 files changed, 22 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2eba398c722b..30da075c042e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2631,7 +2631,7 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
>  }
>  
>  int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
> -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans);
> +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
>  void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
>  					 const u64 start);
>  void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5a2d0b061f57..07ef1b8087f7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2839,40 +2839,28 @@ u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes)
>  	return num_csums;
>  }
>  
> -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans)
> +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
>  {
> -	struct btrfs_fs_info *fs_info = trans->fs_info;
> -	struct btrfs_block_rsv *global_rsv;
> -	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
> -	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
> -	unsigned int num_dirty_bgs = trans->transaction->num_dirty_bgs;
> -	u64 num_bytes, num_dirty_bgs_bytes;
> -	int ret = 0;
> +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
> +	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> +	bool ret = false;
> +	u64 reserved;
>  
> -	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> -	num_heads = heads_to_leaves(fs_info, num_heads);
> -	if (num_heads > 1)
> -		num_bytes += (num_heads - 1) * fs_info->nodesize;
> -	num_bytes <<= 1;
> -	num_bytes += btrfs_csum_bytes_to_leaves(fs_info, csum_bytes) *
> -							fs_info->nodesize;
> -	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(fs_info,
> -							     num_dirty_bgs);
> -	global_rsv = &fs_info->global_block_rsv;
> +	spin_lock(&global_rsv->lock);
> +	reserved = global_rsv->reserved;
> +	spin_unlock(&global_rsv->lock);
>  
>  	/*
> -	 * If we can't allocate any more chunks lets make sure we have _lots_ of
> -	 * wiggle room since running delayed refs can create more delayed refs.
> +	 * Since the global reserve is just kind of magic we don't really want
> +	 * to rely on it to save our bacon, so if our size is more than the
> +	 * delayed_refs_rsv and the global rsv then it's time to think about
> +	 * bailing.
>  	 */
> -	if (global_rsv->space_info->full) {
> -		num_dirty_bgs_bytes <<= 1;
> -		num_bytes <<= 1;
> -	}
> -
> -	spin_lock(&global_rsv->lock);
> -	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
> -		ret = 1;
> -	spin_unlock(&global_rsv->lock);
> +	spin_lock(&delayed_refs_rsv->lock);
> +	reserved += delayed_refs_rsv->reserved;
> +	if (delayed_refs_rsv->size >= reserved)
> +		ret = true;
> +	spin_unlock(&delayed_refs_rsv->lock);
>  	return ret;
>  }
>  
> @@ -2891,7 +2879,7 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
>  	if (val >= NSEC_PER_SEC / 2)
>  		return 2;
>  
> -	return btrfs_check_space_for_delayed_refs(trans);
> +	return btrfs_check_space_for_delayed_refs(trans->fs_info);
>  }
>  
>  struct async_delayed_refs {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a097f5fde31d..8532a2eb56d1 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5326,8 +5326,8 @@ static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
>  		 * Try to steal from the global reserve if there is space for
>  		 * it.
>  		 */
> -		if (!btrfs_check_space_for_delayed_refs(trans) &&
> -		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, false))
> +		if (!btrfs_check_space_for_delayed_refs(fs_info) &&
> +		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, 0))
>  			return trans;
>  
>  		/* If not, commit and try again. */
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index a21c4defad92..2d8401bf8df9 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -789,7 +789,7 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
>  {
>  	struct btrfs_fs_info *fs_info = trans->fs_info;
>  
> -	if (btrfs_check_space_for_delayed_refs(trans))
> +	if (btrfs_check_space_for_delayed_refs(fs_info))
>  		return 1;
>  
>  	return !!btrfs_block_rsv_check(&fs_info->global_block_rsv, 5);
> 
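
To spell out what the reworked check amounts to, here is a stand-alone
model of the new decision, assembled from the hunk above.  It is an
illustration only, not kernel code; the helper name need_throttle() is
made up for the example.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct rsv {
	uint64_t size;
	uint64_t reserved;
};

/*
 * Throttle once the space still needed for pending delayed refs
 * (->size of the delayed refs rsv) is no longer covered by what is
 * actually reserved in it plus the global rsv spill-over buffer.
 */
static bool need_throttle(const struct rsv *global_rsv,
			  const struct rsv *delayed_refs_rsv)
{
	uint64_t reserved = global_rsv->reserved + delayed_refs_rsv->reserved;

	return delayed_refs_rsv->size >= reserved;
}

int main(void)
{
	struct rsv global = { .size = 0, .reserved = 32ULL << 20 };
	struct rsv delayed_refs = {
		.size = 48ULL << 20, .reserved = 8ULL << 20,
	};

	/* 48M needed vs 40M backing it: time to throttle. */
	printf("%d\n", need_throttle(&global, &delayed_refs));
	return 0;
}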

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs
  2018-12-06 16:52   ` Nikolay Borisov
@ 2018-12-06 17:54     ` David Sterba
  2018-12-07  7:09       ` Nikolay Borisov
  0 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2018-12-06 17:54 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Thu, Dec 06, 2018 at 06:52:21PM +0200, Nikolay Borisov wrote:
> 
> 
> On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> > Now with the delayed_refs_rsv we can now know exactly how much pending
> > delayed refs space we need.  This means we can drastically simplify
> 
> IMO it will be helpful if there is a sentence here referring back to
> btrfs_update_delayed_refs_rsv to put your first sentence into context.
> But I guess this is something David can also do.

I'll update the changelog, but I'm not sure what exactly you want to see
there; please post the replacement text. Thanks.

> > btrfs_check_space_for_delayed_refs by simply checking how much space we
> > have reserved for the global rsv (which acts as a spill over buffer) and
> > the delayed refs rsv.  If our total size is beyond that amount then we
> > know it's time to commit the transaction and stop any more delayed refs
> > from being generated.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs
  2018-12-06 17:54     ` David Sterba
@ 2018-12-07  7:09       ` Nikolay Borisov
  2018-12-07  9:57         ` Nikolay Borisov
  2018-12-13 16:44         ` David Sterba
  0 siblings, 2 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-07  7:09 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team



On 6.12.18 г. 19:54 ч., David Sterba wrote:
> On Thu, Dec 06, 2018 at 06:52:21PM +0200, Nikolay Borisov wrote:
>>
>>
>> On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
>>> Now with the delayed_refs_rsv we can now know exactly how much pending
>>> delayed refs space we need.  This means we can drastically simplify
>>
>> IMO it will be helpful if there is a sentence here referring back to
>> btrfs_update_delayed_refs_rsv to put your first sentence into context.
>> But I guess this is something David can also do.
> 
> I'll update the changelog, but I'm not sure what exactly you want to see
> there, please post the replacement text. Thanks.

With the introduction of the delayed_refs_rsv infrastructure, namely
btrfs_update_delayed_refs_rsv, we now know exactly how much pending
delayed refs space is required.

> 
>>> btrfs_check_space_for_delayed_refs by simply checking how much space we
>>> have reserved for the global rsv (which acts as a spill over buffer) and
>>> the delayed refs rsv.  If our total size is beyond that amount then we
>>> know it's time to commit the transaction and stop any more delayed refs
>>> from being generated.
>>>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs
  2018-12-07  7:09       ` Nikolay Borisov
@ 2018-12-07  9:57         ` Nikolay Borisov
  2018-12-13 16:44         ` David Sterba
  1 sibling, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-07  9:57 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team



On 7.12.18 г. 9:09 ч., Nikolay Borisov wrote:
> 
> 
> On 6.12.18 г. 19:54 ч., David Sterba wrote:
>> On Thu, Dec 06, 2018 at 06:52:21PM +0200, Nikolay Borisov wrote:
>>>
>>>
>>> On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
>>>> Now with the delayed_refs_rsv we can now know exactly how much pending
>>>> delayed refs space we need.  This means we can drastically simplify
>>>
>>> IMO it will be helpful if there is a sentence here referring back to
>>> btrfs_update_delayed_refs_rsv to put your first sentence into context.
>>> But I guess this is something David can also do.
>>
>> I'll update the changelog, but I'm not sure what exactly you want to see
>> there, please post the replacement text. Thanks.
> 
> With the introduction of the delayed_refs_rsv infrastructure, namely
> btrfs_update_delayed_refs_rsv, we now know exactly how much pending
> delayed refs space is required.

To put things into context as to why I deem this change beneficial:
basically, doing the migration of the reservation from the transaction
to the delayed refs rsv modifies both size and reserved, so they will
be equal.  Calling btrfs_update_delayed_refs_rsv actually increases
->size and doesn't really decrement ->reserved.  Also, we never do
btrfs_block_rsv_migrate/use_block_rsv on the delayed refs block rsv, so
managing the ->reserved value for the delayed refs rsv is different
than for the rest of the block rsvs.
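
To make that asymmetry concrete, a toy stand-alone model (heavily
simplified, for illustration only; not the actual kernel code, and the
helper names track_ref_heads()/refill() are made up):

#include <stdint.h>
#include <stdio.h>

struct block_rsv {
	uint64_t size;		/* how much we want to have reserved */
	uint64_t reserved;	/* how much we actually have reserved */
};

/*
 * Tracking newly added delayed ref heads only grows ->size, nothing is
 * reserved at that point (per the description of
 * btrfs_update_delayed_refs_rsv above).
 */
static void track_ref_heads(struct block_rsv *rsv, uint64_t bytes)
{
	rsv->size += bytes;
}

/*
 * ->reserved only catches up to ->size when we refill (or migrate
 * bytes in), which is the opposite of the use_block_rsv style of
 * consuming a reservation.
 */
static uint64_t refill(struct block_rsv *rsv, uint64_t available)
{
	uint64_t need = rsv->size > rsv->reserved ?
			rsv->size - rsv->reserved : 0;
	uint64_t got = need < available ? need : available;

	rsv->reserved += got;
	return got;
}

int main(void)
{
	struct block_rsv rsv = { 0, 0 };

	track_ref_heads(&rsv, 3ULL << 20); /* size grows, reserved stays 0 */
	refill(&rsv, 1ULL << 20);	   /* reserved partially catches up */
	printf("size=%llu reserved=%llu\n",
	       (unsigned long long)rsv.size,
	       (unsigned long long)rsv.reserved);
	return 0;
}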


> 
>>
>>>> btrfs_check_space_for_delayed_refs by simply checking how much space we
>>>> have reserved for the global rsv (which acts as a spill over buffer) and
>>>> the delayed refs rsv.  If our total size is beyond that amount then we
>>>> know it's time to commit the transaction and stop any more delayed refs
>>>> from being generated.
>>>>
>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/10] btrfs: only track ref_heads in delayed_ref_updates
  2018-12-03 15:20 ` [PATCH 04/10] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
@ 2018-12-07 13:01   ` Nikolay Borisov
  2018-12-13 16:36     ` David Sterba
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-07 13:01 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> We use this number to figure out how many delayed refs to run, but
> __btrfs_run_delayed_refs really only checks every time we need a new
> delayed ref head, so we always run at least one ref head completely no
> matter what the number of items on it.  Fix the accounting to only be
> adjusted when we add/remove a ref head.

David,

I think this also warrants a forward-looking sentence stating that the
number is also going to be used to calculate the required number of
bytes in the delayed refs rsv.  Something along the lines of:

In addition to using this number to limit the number of delayed refs
run, a future patch is also going to use it to calculate the amount of
space required for delayed refs space reservation.

> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/delayed-ref.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index b3e4c9fcb664..48725fa757a3 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -251,8 +251,6 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
>  	ref->in_tree = 0;
>  	btrfs_put_delayed_ref(ref);
>  	atomic_dec(&delayed_refs->num_entries);
> -	if (trans->delayed_ref_updates)
> -		trans->delayed_ref_updates--;
>  }
>  
>  static bool merge_ref(struct btrfs_trans_handle *trans,
> @@ -467,7 +465,6 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
>  	if (ref->action == BTRFS_ADD_DELAYED_REF)
>  		list_add_tail(&ref->add_list, &href->ref_add_list);
>  	atomic_inc(&root->num_entries);
> -	trans->delayed_ref_updates++;
>  	spin_unlock(&href->lock);
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 05/10] btrfs: introduce delayed_refs_rsv
  2018-12-03 15:20 ` [PATCH 05/10] btrfs: introduce delayed_refs_rsv Josef Bacik
@ 2018-12-07 14:45   ` Nikolay Borisov
  2018-12-13 16:49     ` David Sterba
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-07 14:45 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Josef Bacik



On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> Traditionally we've had voodoo in btrfs to account for the space that
> delayed refs may take up by having a global_block_rsv.  This works most
> of the time, except when it doesn't.  We've had issues reported and seen
> in production where sometimes the global reserve is exhausted during
> transaction commit before we can run all of our delayed refs, resulting
> in an aborted transaction.  Because of this voodoo we have equally
> dubious flushing semantics around throttling delayed refs which we often
> get wrong.
> 
> So instead give them their own block_rsv.  This way we can always know
> exactly how much outstanding space we need for delayed refs.  This
> allows us to make sure we are constantly filling that reservation up
> with space, and allows us to put more precise pressure on the enospc
> system.  Instead of doing math to see if its a good time to throttle,
> the normal enospc code will be invoked if we have a lot of delayed refs
> pending, and they will be run via the normal flushing mechanism.
> 
> For now the delayed_refs_rsv will hold the reservations for the delayed
> refs, the block group updates, and deleting csums.  We could have a
> separate rsv for the block group updates, but the csum deletion stuff is
> still handled via the delayed_refs so that will stay there.


I see one difference in the way that the space is managed.  Essentially,
for the delayed refs rsv you'll only ever be increasing ->size, and
->reserved only when you have to refill.  This is opposite to the way
other metadata space is managed, i.e. by using use_block_rsv, which
subtracts from ->reserved every time a block has to be CoW'ed.  Why this
difference?


> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/ctree.h       |  14 +++-
>  fs/btrfs/delayed-ref.c |  43 ++++++++--
>  fs/btrfs/disk-io.c     |   4 +
>  fs/btrfs/extent-tree.c | 212 +++++++++++++++++++++++++++++++++++++++++++++----
>  fs/btrfs/transaction.c |  37 ++++++++-
>  5 files changed, 284 insertions(+), 26 deletions(-)
> 

<snip>

> +/**
> + * btrfs_migrate_to_delayed_refs_rsv - transfer bytes to our delayed refs rsv.
> + * @fs_info - the fs info for our fs.
> + * @src - the source block rsv to transfer from.
> + * @num_bytes - the number of bytes to transfer.
> + *
> + * This transfers up to the num_bytes amount from the src rsv to the
> + * delayed_refs_rsv.  Any extra bytes are returned to the space info.
> + */
> +void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
> +				       struct btrfs_block_rsv *src,
> +				       u64 num_bytes)

This function is currently used only during transaction start; it seems
to be rather specific to the delayed refs, so I'd suggest making it
private to transaction.c.

> +{
> +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
> +	u64 to_free = 0;
> +
> +	spin_lock(&src->lock);
> +	src->reserved -= num_bytes;
> +	src->size -= num_bytes;
> +	spin_unlock(&src->lock);
> +
> +	spin_lock(&delayed_refs_rsv->lock);
> +	if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) {
> +		u64 delta = delayed_refs_rsv->size -
> +			delayed_refs_rsv->reserved;
> +		if (num_bytes > delta) {
> +			to_free = num_bytes - delta;
> +			num_bytes = delta;
> +		}
> +	} else {
> +		to_free = num_bytes;
> +		num_bytes = 0;
> +	}
> +
> +	if (num_bytes)
> +		delayed_refs_rsv->reserved += num_bytes;
> +	if (delayed_refs_rsv->reserved >= delayed_refs_rsv->size)
> +		delayed_refs_rsv->full = 1;
> +	spin_unlock(&delayed_refs_rsv->lock);
> +
> +	if (num_bytes)
> +		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
> +					      0, num_bytes, 1);
> +	if (to_free)
> +		space_info_add_old_bytes(fs_info, delayed_refs_rsv->space_info,
> +					 to_free);
> +}
> +
> +/**
> + * btrfs_delayed_refs_rsv_refill - refill based on our delayed refs usage.
> + * @fs_info - the fs_info for our fs.
> + * @flush - control how we can flush for this reservation.
> + *
> + * This will refill the delayed block_rsv up to 1 items size worth of space and
> + * will return -ENOSPC if we can't make the reservation.
> + */
> +int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
> +				  enum btrfs_reserve_flush_enum flush)
> +{
> +	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
> +	u64 limit = btrfs_calc_trans_metadata_size(fs_info, 1);
> +	u64 num_bytes = 0;
> +	int ret = -ENOSPC;
> +
> +	spin_lock(&block_rsv->lock);
> +	if (block_rsv->reserved < block_rsv->size) {
> +		num_bytes = block_rsv->size - block_rsv->reserved;
> +		num_bytes = min(num_bytes, limit);
> +	}
> +	spin_unlock(&block_rsv->lock);
> +
> +	if (!num_bytes)
> +		return 0;
> +
> +	ret = reserve_metadata_bytes(fs_info->extent_root, block_rsv,
> +				     num_bytes, flush);
> +	if (ret)
> +		return ret;
> +	block_rsv_add_bytes(block_rsv, num_bytes, 0);
> +	trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
> +				      0, num_bytes, 1);
> +	return 0;
> +}

This is also called from a single place in transaction.c, so it can be
made static.

> +
> +
>  /*
>   * This is for space we already have accounted in space_info->bytes_may_use, so
>   * basically when we're returning space from block_rsv's.
> @@ -5709,6 +5809,31 @@ static int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  	return ret;
>  }
>  
> +static u64 __btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> +				     struct btrfs_block_rsv *block_rsv,
> +				     u64 num_bytes, u64 *qgroup_to_release)
> +{
> +	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> +	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
> +	struct btrfs_block_rsv *target = delayed_rsv;
> +
> +	if (target->full || target == block_rsv)
> +		target = global_rsv;
> +
> +	if (block_rsv->space_info != target->space_info)
> +		target = NULL;
> +
> +	return block_rsv_release_bytes(fs_info, block_rsv, target, num_bytes,
> +				       qgroup_to_release);
> +}
> +
> +void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> +			     struct btrfs_block_rsv *block_rsv,
> +			     u64 num_bytes)
> +{
> +	__btrfs_block_rsv_release(fs_info, block_rsv, num_bytes, NULL);
> +}

Again, I think the changelog should mention that one of the major
changes in behavior you make is essentially prioritizing delayed refs
rsv refilling over global rsv refilling, by "hijacking" all block rsv
releases to go through __btrfs_block_rsv_release.

> +
>  /**
>   * btrfs_inode_rsv_release - release any excessive reservation.
>   * @inode - the inode we need to release from.
> @@ -5723,7 +5848,6 @@ static int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>  {
>  	struct btrfs_fs_info *fs_info = inode->root->fs_info;
> -	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>  	u64 released = 0;
>  	u64 qgroup_to_release = 0;
> @@ -5733,8 +5857,8 @@ static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>  	 * are releasing 0 bytes, and then we'll just get the reservation over
>  	 * the size free'd.
>  	 */
> -	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
> -					   &qgroup_to_release);
> +	released = __btrfs_block_rsv_release(fs_info, block_rsv, 0,
> +					     &qgroup_to_release);

This change requires a bit more explanation.  Now, when releasing the
bytes for an inode, we execute the additional logic in
__btrfs_block_rsv_release to decide whether to refill the global rsv or
the delayed refs rsv, based on whether the delayed refs rsv is full or
not.  Before, the global rsv would be refilled by whatever the excess
amount was; now, if the delayed refs rsv is not full, the delayed refs
rsv gets refilled instead.

>  	if (released > 0)
>  		trace_btrfs_space_reservation(fs_info, "delalloc",
>  					      btrfs_ino(inode), released, 0);
> @@ -5745,16 +5869,26 @@ static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>  						   qgroup_to_release);
>  }
>  
> -void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> -			     struct btrfs_block_rsv *block_rsv,
> -			     u64 num_bytes)
> +/**
> + * btrfs_delayed_refs_rsv_release - release a ref head's reservation.
> + * @fs_info - the fs_info for our fs.
> + * @nr - the number of items to drop.
> + *
> + * This drops the delayed ref head's count from the delayed refs rsv and free's
> + * any excess reservation we had.
> + */
> +void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr)
>  {
> +	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> +	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, nr);
> +	u64 released = 0;
>  
> -	if (global_rsv == block_rsv ||
> -	    block_rsv->space_info != global_rsv->space_info)
> -		global_rsv = NULL;
> -	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
> +	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv,
> +					   num_bytes, NULL);
> +	if (released)
> +		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
> +					      0, released, 0);
>  }
>  
>  static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
> @@ -5819,9 +5953,10 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
>  	fs_info->trans_block_rsv.space_info = space_info;
>  	fs_info->empty_block_rsv.space_info = space_info;
>  	fs_info->delayed_block_rsv.space_info = space_info;
> +	fs_info->delayed_refs_rsv.space_info = space_info;
>  
> -	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
> -	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
> +	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;
> +	fs_info->csum_root->block_rsv = &fs_info->delayed_refs_rsv;

nit: Some whitespace damage

>  	fs_info->dev_root->block_rsv = &fs_info->global_block_rsv;
>  	fs_info->tree_root->block_rsv = &fs_info->global_block_rsv;
>  	if (fs_info->quota_root)

<snip>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/10] btrfs: only track ref_heads in delayed_ref_updates
  2018-12-07 13:01   ` Nikolay Borisov
@ 2018-12-13 16:36     ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2018-12-13 16:36 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team, Josef Bacik

On Fri, Dec 07, 2018 at 03:01:45PM +0200, Nikolay Borisov wrote:
> 
> 
> On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> > From: Josef Bacik <jbacik@fb.com>
> > 
> > We use this number to figure out how many delayed refs to run, but
> > __btrfs_run_delayed_refs really only checks every time we need a new
> > delayed ref head, so we always run at least one ref head completely no
> > matter what the number of items on it.  Fix the accounting to only be
> > adjusted when we add/remove a ref head.
> 
> David,
> 
> I think also warrants a forward looking sentence stating that the number
> is also going to be used to calculate the required number of bytes in
> the delayed refs rsv. Something along the lines of:
> 
> In addition to using this number to limit the number of delayed refs
> run, a future patch is also going to use it to calculate the amount of
> space required for delayed refs space reservation.

Added to the changelog, thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs
  2018-12-07  7:09       ` Nikolay Borisov
  2018-12-07  9:57         ` Nikolay Borisov
@ 2018-12-13 16:44         ` David Sterba
  1 sibling, 0 replies; 26+ messages in thread
From: David Sterba @ 2018-12-13 16:44 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Fri, Dec 07, 2018 at 09:09:28AM +0200, Nikolay Borisov wrote:
> 
> 
> On 6.12.18 г. 19:54 ч., David Sterba wrote:
> > On Thu, Dec 06, 2018 at 06:52:21PM +0200, Nikolay Borisov wrote:
> >>
> >>
> >> On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> >>> Now with the delayed_refs_rsv we can now know exactly how much pending
> >>> delayed refs space we need.  This means we can drastically simplify
> >>
> >> IMO it will be helpful if there is a sentence here referring back to
> >> btrfs_update_delayed_refs_rsv to put your first sentence into context.
> >> But I guess this is something David can also do.
> > 
> > I'll update the changelog, but I'm not sure what exactly you want to see
> > there, please post the replacement text. Thanks.
> 
> With the introduction of the delayed_refs_rsv infrastructure, namely
> btrfs_update_delayed_refs_rsv, we now know exactly how much pending
> delayed refs space is required.

Changelog updated, thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 05/10] btrfs: introduce delayed_refs_rsv
  2018-12-07 14:45   ` Nikolay Borisov
@ 2018-12-13 16:49     ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2018-12-13 16:49 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team, Josef Bacik

On Fri, Dec 07, 2018 at 04:45:37PM +0200, Nikolay Borisov wrote:
> > @@ -5819,9 +5953,10 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
> >  	fs_info->trans_block_rsv.space_info = space_info;
> >  	fs_info->empty_block_rsv.space_info = space_info;
> >  	fs_info->delayed_block_rsv.space_info = space_info;
> > +	fs_info->delayed_refs_rsv.space_info = space_info;
> >  
> > -	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
> > -	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
> > +	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;
> > +	fs_info->csum_root->block_rsv = &fs_info->delayed_refs_rsv;
> 
> nit: Some whitespace damage

No, the removed part is global_block_rsv and the added part is delayed_refs_rsv.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs
  2018-12-03 15:20 ` [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs Josef Bacik
  2018-12-06 16:52   ` Nikolay Borisov
@ 2019-01-14  6:28   ` Qu Wenruo
  1 sibling, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2019-01-14  6:28 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team





On 2018/12/3 下午11:20, Josef Bacik wrote:
> Now with the delayed_refs_rsv we can now know exactly how much pending
> delayed refs space we need.  This means we can drastically simplify
> btrfs_check_space_for_delayed_refs by simply checking how much space we
> have reserved for the global rsv (which acts as a spill over buffer) and
> the delayed refs rsv.  If our total size is beyond that amount then we
> know it's time to commit the transaction and stop any more delayed refs
> from being generated.

This patch is causing an obvious, large performance regression for
metadata relocation.
Bisect leads to this patch.

I'm using a script that copies /usr (around 3.5G) into a subvolume,
creates 16 snapshots, and then touches 3 random files in each snapshot.

Then it does a *metadata* balance.
Please note, quota is *DISABLED* here.

The full scripts can be found here:
https://gist.github.com/adam900710/e0a9719441e770a4d0d7b32c4a88bf95

Before this patch, it takes around 5s to relocate 3 metadata block
groups (the VM is using unsafe cache mode, so IO is as fast as memory).

After this patch, I don't know how long it will take, as it doesn't even
finish before I reset the VM.

I also found that during the super slow relocation the generation of
the fs is increasing like crazy.
So something is definitely causing a ton of transaction commits.

Thanks,
Qu

> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.h       |  2 +-
>  fs/btrfs/extent-tree.c | 48 ++++++++++++++++++------------------------------
>  fs/btrfs/inode.c       |  4 ++--
>  fs/btrfs/transaction.c |  2 +-
>  4 files changed, 22 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2eba398c722b..30da075c042e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2631,7 +2631,7 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
>  }
>  
>  int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
> -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans);
> +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
>  void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
>  					 const u64 start);
>  void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5a2d0b061f57..07ef1b8087f7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2839,40 +2839,28 @@ u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes)
>  	return num_csums;
>  }
>  
> -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans)
> +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
>  {
> -	struct btrfs_fs_info *fs_info = trans->fs_info;
> -	struct btrfs_block_rsv *global_rsv;
> -	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
> -	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
> -	unsigned int num_dirty_bgs = trans->transaction->num_dirty_bgs;
> -	u64 num_bytes, num_dirty_bgs_bytes;
> -	int ret = 0;
> +	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
> +	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> +	bool ret = false;
> +	u64 reserved;
>  
> -	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> -	num_heads = heads_to_leaves(fs_info, num_heads);
> -	if (num_heads > 1)
> -		num_bytes += (num_heads - 1) * fs_info->nodesize;
> -	num_bytes <<= 1;
> -	num_bytes += btrfs_csum_bytes_to_leaves(fs_info, csum_bytes) *
> -							fs_info->nodesize;
> -	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(fs_info,
> -							     num_dirty_bgs);
> -	global_rsv = &fs_info->global_block_rsv;
> +	spin_lock(&global_rsv->lock);
> +	reserved = global_rsv->reserved;
> +	spin_unlock(&global_rsv->lock);
>  
>  	/*
> -	 * If we can't allocate any more chunks lets make sure we have _lots_ of
> -	 * wiggle room since running delayed refs can create more delayed refs.
> +	 * Since the global reserve is just kind of magic we don't really want
> +	 * to rely on it to save our bacon, so if our size is more than the
> +	 * delayed_refs_rsv and the global rsv then it's time to think about
> +	 * bailing.
>  	 */
> -	if (global_rsv->space_info->full) {
> -		num_dirty_bgs_bytes <<= 1;
> -		num_bytes <<= 1;
> -	}
> -
> -	spin_lock(&global_rsv->lock);
> -	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
> -		ret = 1;
> -	spin_unlock(&global_rsv->lock);
> +	spin_lock(&delayed_refs_rsv->lock);
> +	reserved += delayed_refs_rsv->reserved;
> +	if (delayed_refs_rsv->size >= reserved)
> +		ret = true;
> +	spin_unlock(&delayed_refs_rsv->lock);
>  	return ret;
>  }
>  
> @@ -2891,7 +2879,7 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
>  	if (val >= NSEC_PER_SEC / 2)
>  		return 2;
>  
> -	return btrfs_check_space_for_delayed_refs(trans);
> +	return btrfs_check_space_for_delayed_refs(trans->fs_info);
>  }
>  
>  struct async_delayed_refs {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a097f5fde31d..8532a2eb56d1 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5326,8 +5326,8 @@ static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
>  		 * Try to steal from the global reserve if there is space for
>  		 * it.
>  		 */
> -		if (!btrfs_check_space_for_delayed_refs(trans) &&
> -		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, false))
> +		if (!btrfs_check_space_for_delayed_refs(fs_info) &&
> +		    !btrfs_block_rsv_migrate(global_rsv, rsv, rsv->size, 0))
>  			return trans;
>  
>  		/* If not, commit and try again. */
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index a21c4defad92..2d8401bf8df9 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -789,7 +789,7 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
>  {
>  	struct btrfs_fs_info *fs_info = trans->fs_info;
>  
> -	if (btrfs_check_space_for_delayed_refs(trans))
> +	if (btrfs_check_space_for_delayed_refs(fs_info))
>  		return 1;
>  
>  	return !!btrfs_block_rsv_check(&fs_info->global_block_rsv, 5);
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-01-14  6:28 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-03 15:20 [PATCH 00/10][V2] Delayed refs rsv Josef Bacik
2018-12-03 15:20 ` [PATCH 01/10] btrfs: add btrfs_delete_ref_head helper Josef Bacik
2018-12-06 12:32   ` Nikolay Borisov
2018-12-03 15:20 ` [PATCH 02/10] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
2018-12-06 12:38   ` Nikolay Borisov
2018-12-03 15:20 ` [PATCH 03/10] btrfs: cleanup extent_op handling Josef Bacik
2018-12-03 15:20 ` [PATCH 04/10] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
2018-12-07 13:01   ` Nikolay Borisov
2018-12-13 16:36     ` David Sterba
2018-12-03 15:20 ` [PATCH 05/10] btrfs: introduce delayed_refs_rsv Josef Bacik
2018-12-07 14:45   ` Nikolay Borisov
2018-12-13 16:49     ` David Sterba
2018-12-03 15:20 ` [PATCH 06/10] btrfs: update may_commit_transaction to use the delayed refs rsv Josef Bacik
2018-12-06 12:51   ` Nikolay Borisov
2018-12-03 15:20 ` [PATCH 07/10] btrfs: add new flushing states for " Josef Bacik
2018-12-03 15:20 ` [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs Josef Bacik
2018-12-06 16:52   ` Nikolay Borisov
2018-12-06 17:54     ` David Sterba
2018-12-07  7:09       ` Nikolay Borisov
2018-12-07  9:57         ` Nikolay Borisov
2018-12-13 16:44         ` David Sterba
2019-01-14  6:28   ` Qu Wenruo
2018-12-03 15:20 ` [PATCH 09/10] btrfs: don't run delayed refs in the end transaction logic Josef Bacik
2018-12-06 16:43   ` Nikolay Borisov
2018-12-03 15:20 ` [PATCH 10/10] btrfs: fix truncate throttling Josef Bacik
2018-12-06 15:56 ` [PATCH 00/10][V2] Delayed refs rsv David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).