linux-btrfs.vger.kernel.org archive mirror
* [PATCH 00/13] Throttle delayed refs based on time
@ 2020-03-13 21:23 Josef Bacik
  2020-03-13 21:23 ` [PATCH 01/13] btrfs: use a stable rolling avg for delayed refs avg Josef Bacik
                   ` (12 more replies)
  0 siblings, 13 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Zygo reported a problem on IRC where he was seeing multi-hour transaction
commit latencies on his test rig.  This turned out to be because his test rig
runs rsync, balance, snapshot create and delete, dedup, an infinite loop of
mkdir/rmdirs, and I'm sure some other horrors I'm forgetting.

When I added the delayed refs reserve, I assumed that the space pressure from
generating a lot of delayed refs would force transactions to be ended when they
needed to be, and thus we no longer needed to throttle delayed refs based on
time.

This assumption was wrong, because in Zygo's case he has a multi-terabyte file
system, so overcommit allows him to generate as many delayed refs as he wants.
This meant that he would need to run hundreds of thousands of delayed refs at
commit time.  To make matters worse, we didn't have a way to stop people from
generating more delayed refs, so the transaction commit could be held open
indefinitely by balance and snapshot delete.  This is how we were getting
transaction commits happening every few hours.

The solution to this problem is to bring back the time based delayed ref
flushing, and then add the ability for people to throttle themselves on that.

Balance and truncate already had this ability; it only needed to be added to
snapshot delete.

I've also added back the async delayed ref flushing, and I've added code to help
throttle people when we're generating delayed refs too fast for the system to
keep up with them.
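
To give a feel for the math underneath all of this, here is a stand-alone
user-space model (illustration only, not code from the series): we estimate the
time to run the outstanding refs as entries * average runtime, kick the async
flusher at roughly 500ms worth, and throttle generators at roughly 1 second's
worth.

/* Stand-alone model of the time based throttle check, illustration only. */
#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

static int should_throttle(uint64_t num_entries, uint64_t avg_runtime_ns,
                           int for_throttle)
{
        uint64_t est = num_entries * avg_runtime_ns;

        if (est >= NSEC_PER_SEC)                /* over the 1 second budget */
                return 1;
        if (for_throttle && est >= NSEC_PER_SEC / 2)    /* 500ms mark */
                return 1;
        return 0;
}

int main(void)
{
        /* ~15us per ref and 40000 pending refs is ~0.6s of estimated work */
        printf("%d\n", should_throttle(40000, 15000, 1));      /* prints 1 */
        printf("%d\n", should_throttle(40000, 15000, 0));      /* prints 0 */
        return 0;
}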

This whole patch queue has been running in some form or another on Zygo's awful
test bed, and appears to be performing better than it was before I ripped out
the original code.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 01/13] btrfs: use a stable rolling avg for delayed refs avg
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 02/13] btrfs: change btrfs_should_throttle_delayed_refs to a bool Josef Bacik
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Zygo Blaxell

From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>

The way we currently calculate the average delayed ref run time is very
jittery.  We do a weighted average, weighting the previously calculated average
at 3/4 and the new sample at 1/4.

This jitteriness leads to pretty wild swings in latency when we are generating
a lot of delayed refs.  Fix this by smoothing the average over roughly 1000
seconds of data and a minimum of 1000 refs, decaying the accumulated totals by
0.75 once those thresholds are reached.
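
For illustration only (a user-space sketch of the same arithmetic, not the
patch itself), the smoothing behaves like this:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

struct avg_state {
        uint64_t runtime;       /* accumulated ns spent running delayed refs */
        uint64_t nr_run;        /* accumulated number of refs run */
};

/* Returns the new average runtime per ref, in ns. */
static uint64_t update_avg(struct avg_state *s, uint64_t count,
                           uint64_t runtime_ns)
{
        uint64_t avg;

        s->nr_run += count;
        s->runtime += runtime_ns;
        avg = s->runtime / s->nr_run;

        /* With >= 1000s of data over > 1000 refs, decay both totals by 3/4. */
        if (s->runtime >= NSEC_PER_SEC * 1000ULL && s->nr_run > 1000) {
                s->runtime = s->runtime * 3 / 4;
                s->nr_run = s->nr_run * 3 / 4;
        }
        return avg;
}

int main(void)
{
        /* seeded the same way btrfs_init_fs_info() seeds its counters */
        struct avg_state s = { NSEC_PER_SEC, 64 };

        /* 1000 refs that took 20ms total */
        printf("%llu\n", (unsigned long long)update_avg(&s, 1000, 20000000ULL));
        return 0;
}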

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  7 +++++++
 fs/btrfs/disk-io.c     |  3 +++
 fs/btrfs/extent-tree.c | 19 +++++++++++++++++--
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2ccb2a090782..992ce47977b8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -620,7 +620,14 @@ struct btrfs_fs_info {
 
 	u64 generation;
 	u64 last_trans_committed;
+
+	/*
+	 * This is for keeping track of how long it takes to run delayed refs so
+	 * that our delayed ref timing doesn't hurt us.
+	 */
 	u64 avg_delayed_ref_runtime;
+	u64 delayed_ref_runtime;
+	u64 delayed_ref_nr_run;
 
 	/*
 	 * this is updated to the current trans every time a full commit
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 772cf0fa7c55..b5846552666e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2734,6 +2734,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	fs_info->tree_mod_log = RB_ROOT;
 	fs_info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL;
 	fs_info->avg_delayed_ref_runtime = NSEC_PER_SEC >> 6; /* div by 64 */
+	fs_info->delayed_ref_runtime = NSEC_PER_SEC;
+	fs_info->delayed_ref_nr_run = 64;
+
 	/* readahead state */
 	INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 	spin_lock_init(&fs_info->reada_lock);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2925b3ad77a1..645ae95f465e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2082,8 +2082,23 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 		 * to avoid large swings in the average.
 		 */
 		spin_lock(&delayed_refs->lock);
-		avg = fs_info->avg_delayed_ref_runtime * 3 + runtime;
-		fs_info->avg_delayed_ref_runtime = avg >> 2;	/* div by 4 */
+		fs_info->delayed_ref_nr_run += actual_count;
+		fs_info->delayed_ref_runtime += runtime;
+		avg = div64_u64(fs_info->delayed_ref_runtime,
+				fs_info->delayed_ref_nr_run);
+
+		/*
+		 * Once we've built up a fair bit of data, start decaying
+		 * everything by 3/4.
+		 */
+		if (fs_info->delayed_ref_runtime >= (NSEC_PER_SEC * 1000ULL) &&
+		    fs_info->delayed_ref_nr_run > 1000) {
+			fs_info->delayed_ref_runtime *= 3;
+			fs_info->delayed_ref_runtime >>= 2;
+			fs_info->delayed_ref_nr_run *= 3;
+			fs_info->delayed_ref_nr_run >>= 2;
+		}
+		fs_info->avg_delayed_ref_runtime = avg;
 		spin_unlock(&delayed_refs->lock);
 	}
 	return 0;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 02/13] btrfs: change btrfs_should_throttle_delayed_refs to a bool
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
  2020-03-13 21:23 ` [PATCH 01/13] btrfs: use a stable rolling avg for delayed refs avg Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 03/13] btrfs: make btrfs_should_throttle_delayed_refs only check run time Josef Bacik
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We don't actually check the specific value returned from
btrfs_should_throttle_delayed_refs anywhere, so just return a bool.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-ref.c | 6 +++---
 fs/btrfs/delayed-ref.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index dfdb7d4f8406..acad9978b927 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -50,7 +50,7 @@ bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
-int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
+bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
 {
 	u64 num_entries =
 		atomic_read(&trans->transaction->delayed_refs.num_entries);
@@ -61,9 +61,9 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
 	avg_runtime = trans->fs_info->avg_delayed_ref_runtime;
 	val = num_entries * avg_runtime;
 	if (val >= NSEC_PER_SEC)
-		return 1;
+		return true;
 	if (val >= NSEC_PER_SEC / 2)
-		return 2;
+		return true;
 
 	return btrfs_check_space_for_delayed_refs(trans->fs_info);
 }
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 1c977e6d45dc..9a07480b497b 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -371,7 +371,7 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
 void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
 				       struct btrfs_block_rsv *src,
 				       u64 num_bytes);
-int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
+bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
 bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
 
 /*
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 03/13] btrfs: make btrfs_should_throttle_delayed_refs only check run time
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
  2020-03-13 21:23 ` [PATCH 01/13] btrfs: use a stable rolling avg for delayed refs avg Josef Bacik
  2020-03-13 21:23 ` [PATCH 02/13] btrfs: change btrfs_should_throttle_delayed_refs to a bool Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 04/13] btrfs: make should_end_transaction check time to run delayed refs Josef Bacik
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

btrfs_should_throttle_delayed_refs checks both the run time of the delayed
refs and whether there's enough space.  However we want to use these two checks
independently in the future, so make btrfs_should_throttle_delayed_refs only
check the run time.  Then fix its only caller to check the space as well,
because we want to throttle truncates on either space or time.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-ref.c | 3 +--
 fs/btrfs/inode.c       | 3 ++-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index acad9978b927..e28565dc4288 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -64,8 +64,7 @@ bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
 		return true;
 	if (val >= NSEC_PER_SEC / 2)
 		return true;
-
-	return btrfs_check_space_for_delayed_refs(trans->fs_info);
+	return false;
 }
 
 /**
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b8dabffac767..d3e75e04a0a0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4349,7 +4349,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				break;
 			}
 			if (be_nice) {
-				if (btrfs_should_throttle_delayed_refs(trans))
+				if (btrfs_should_throttle_delayed_refs(trans) ||
+				    btrfs_check_space_for_delayed_refs(fs_info))
 					should_throttle = true;
 			}
 		}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 04/13] btrfs: make should_end_transaction check time to run delayed refs
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (2 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 03/13] btrfs: make btrfs_should_throttle_delayed_refs only check run time Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 05/13] btrfs: squash should_end_transaction Josef Bacik
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Currently snapshot deletion checks to see if it needs to throttle itself
before ending a transaction; however, this only checks whether there's enough
space for delayed refs, not how much time it'll take to run those delayed
refs.  Fix this by checking btrfs_should_throttle_delayed_refs as well, which
takes into account how much time it'll take to run the delayed refs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8d34d7e0adb6..309a2a60040f 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -859,7 +859,8 @@ static int should_end_transaction(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 
-	if (btrfs_check_space_for_delayed_refs(fs_info))
+	if (btrfs_should_throttle_delayed_refs(trans) ||
+	    btrfs_check_space_for_delayed_refs(fs_info))
 		return 1;
 
 	return !!btrfs_block_rsv_check(&fs_info->global_block_rsv, 5);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 05/13] btrfs: squash should_end_transaction
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (3 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 04/13] btrfs: make should_end_transaction check time to run delayed refs Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 06/13] btrfs: add a mode for delayed ref time based throttling Josef Bacik
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We used to call should_end_transaction() in __btrfs_end_transaction, but
we no longer do that and it's a tiny function, so squash it into
btrfs_should_end_transaction.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 309a2a60040f..f6eecb402f5b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -855,27 +855,21 @@ void btrfs_throttle(struct btrfs_fs_info *fs_info)
 	wait_current_trans(fs_info);
 }
 
-static int should_end_transaction(struct btrfs_trans_handle *trans)
-{
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-
-	if (btrfs_should_throttle_delayed_refs(trans) ||
-	    btrfs_check_space_for_delayed_refs(fs_info))
-		return 1;
-
-	return !!btrfs_block_rsv_check(&fs_info->global_block_rsv, 5);
-}
-
 int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_transaction *cur_trans = trans->transaction;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 
 	smp_mb();
 	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
 	    cur_trans->delayed_refs.flushing)
 		return 1;
 
-	return should_end_transaction(trans);
+	if (btrfs_should_throttle_delayed_refs(trans) ||
+	    btrfs_check_space_for_delayed_refs(fs_info))
+		return 1;
+
+	return !!btrfs_block_rsv_check(&fs_info->global_block_rsv, 5);
 }
 
 static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 06/13] btrfs: add a mode for delayed ref time based throttling
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (4 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 05/13] btrfs: squash should_end_transaction Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget Josef Bacik
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Currently we only use btrfs_should_throttle_delayed_refs in the case where we
want to pre-emptively stop what we're doing and throttle delayed refs in some
way.  However we're going to use this function for every transaction end, so
add a flag so we can toggle between the maximum theoretical runtime and our
"maybe we should start flushing" runtime.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-ref.c | 9 +++++----
 fs/btrfs/delayed-ref.h | 3 ++-
 fs/btrfs/inode.c       | 3 ++-
 fs/btrfs/transaction.c | 2 +-
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index e28565dc4288..6e9fa03be87d 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -50,7 +50,8 @@ bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
-bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
+bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
+					bool for_throttle)
 {
 	u64 num_entries =
 		atomic_read(&trans->transaction->delayed_refs.num_entries);
@@ -62,9 +63,9 @@ bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
 	val = num_entries * avg_runtime;
 	if (val >= NSEC_PER_SEC)
 		return true;
-	if (val >= NSEC_PER_SEC / 2)
-		return true;
-	return false;
+	if (!for_throttle)
+		return false;
+	return (val >= NSEC_PER_SEC / 2);
 }
 
 /**
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 9a07480b497b..c0ae440434af 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -371,7 +371,8 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
 void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
 				       struct btrfs_block_rsv *src,
 				       u64 num_bytes);
-bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans);
+bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
+					bool for_throttle);
 bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
 
 /*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d3e75e04a0a0..ad0f0961a711 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4349,7 +4349,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				break;
 			}
 			if (be_nice) {
-				if (btrfs_should_throttle_delayed_refs(trans) ||
+				if (btrfs_should_throttle_delayed_refs(trans,
+								       true) ||
 				    btrfs_check_space_for_delayed_refs(fs_info))
 					should_throttle = true;
 			}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f6eecb402f5b..b0d82e1a6a6e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -865,7 +865,7 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 	    cur_trans->delayed_refs.flushing)
 		return 1;
 
-	if (btrfs_should_throttle_delayed_refs(trans) ||
+	if (btrfs_should_throttle_delayed_refs(trans, true) ||
 	    btrfs_check_space_for_delayed_refs(fs_info))
 		return 1;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (5 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 06/13] btrfs: add a mode for delayed ref time based throttling Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-04-09 13:11   ` Nikolay Borisov
  2020-04-09 13:26   ` Nikolay Borisov
  2020-03-13 21:23 ` [PATCH 08/13] btrfs: adjust the arguments for btrfs_should_throttle_delayed_refs Josef Bacik
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

For very large file systems we cannot rely on the space reservation
system to provide enough pressure to flush delayed refs in a timely
manner.  We have the infrastructure in place to keep track of how much
theoretical time it'll take to run our outstanding delayed refs, but
unfortunately I ripped all of that out when I added the delayed refs
rsv.  This code originally was added to address the problem of too many
delayed refs building up and thus causing transaction commits to take
several minutes to finish.

Fix this by adding back the ability to flush delayed refs based on the time
budget for them.  We want to limit the pending delayed refs to around 1
second's worth of work at any given time.  In order to keep up with demand we
will start the async flusher once we are at the 500ms mark, and the async
flusher will attempt to keep us in this ballpark.
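
Roughly, the flusher paces itself like this toy model (illustrative numbers
only; the real worker attaches to the running transaction and calls
btrfs_run_delayed_refs()):

/* Toy model of the async flusher's pacing; illustration only. */
#include <stdio.h>

int main(void)
{
        unsigned long queued = 100000;  /* pending delayed ref entries */
        unsigned long one_sec = 40000;  /* entries worth ~1s at the current average */
        int pass = 0;

        /* The real worker loops until we drop back below the 500ms mark. */
        while (queued > one_sec / 2) {
                unsigned long count = queued / 4;       /* run a quarter per pass */

                queued -= count;        /* stands in for btrfs_run_delayed_refs(trans, count) */
                printf("pass %d: ran %lu, %lu left\n", ++pass, count, queued);
        }
        return 0;
}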

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  4 ++++
 fs/btrfs/disk-io.c     |  3 +++
 fs/btrfs/extent-tree.c | 44 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/transaction.c |  8 ++++++++
 4 files changed, 59 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 992ce47977b8..2a6b2938f9ea 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -494,6 +494,7 @@ enum btrfs_orphan_cleanup_state {
 };
 
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
+void btrfs_init_async_delayed_ref_work(struct btrfs_fs_info *fs_info);
 
 /* fs_info */
 struct reloc_control;
@@ -924,6 +925,9 @@ struct btrfs_fs_info {
 	struct work_struct async_reclaim_work;
 	struct work_struct async_data_reclaim_work;
 
+	/* Used to run delayed refs in the background. */
+	struct work_struct async_delayed_ref_work;
+
 	spinlock_t unused_bgs_lock;
 	struct list_head unused_bgs;
 	struct mutex unused_bg_unpin_mutex;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b5846552666e..b1a9fe5a639a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2754,6 +2754,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 #endif
 	btrfs_init_balance(fs_info);
 	btrfs_init_async_reclaim_work(fs_info);
+	btrfs_init_async_delayed_ref_work(fs_info);
 
 	spin_lock_init(&fs_info->block_group_cache_lock);
 	fs_info->block_group_cache_tree = RB_ROOT;
@@ -3997,6 +3998,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	 */
 	kthread_park(fs_info->cleaner_kthread);
 
+	cancel_work_sync(&fs_info->async_delayed_ref_work);
+
 	/* wait for the qgroup rescan worker to stop */
 	btrfs_qgroup_wait_for_completion(fs_info, false);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 645ae95f465e..0e81990b57e0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2249,6 +2249,50 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+static void btrfs_async_run_delayed_refs(struct work_struct *work)
+{
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_trans_handle *trans;
+
+	fs_info = container_of(work, struct btrfs_fs_info,
+			       async_delayed_ref_work);
+
+	while (!btrfs_fs_closing(fs_info)) {
+		unsigned long count;
+		int ret;
+
+		trans = btrfs_attach_transaction(fs_info->extent_root);
+		if (IS_ERR(trans))
+			break;
+
+		smp_rmb();
+		if (trans->transaction->delayed_refs.flushing) {
+			btrfs_end_transaction(trans);
+			break;
+		}
+
+		/* No longer over our threshold, lets bail. */
+		if (!btrfs_should_throttle_delayed_refs(trans, true)) {
+			btrfs_end_transaction(trans);
+			break;
+		}
+
+		count = atomic_read(&trans->transaction->delayed_refs.num_entries);
+		count >>= 2;
+
+		ret = btrfs_run_delayed_refs(trans, count);
+		btrfs_end_transaction(trans);
+		if (ret < 0)
+			break;
+	}
+}
+
+void btrfs_init_async_delayed_ref_work(struct btrfs_fs_info *fs_info)
+{
+	INIT_WORK(&fs_info->async_delayed_ref_work,
+		  btrfs_async_run_delayed_refs);
+}
+
 int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
 				u64 bytenr, u64 num_bytes, u64 flags,
 				int level, int is_data)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b0d82e1a6a6e..7f994ab73b0b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -899,6 +899,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *info = trans->fs_info;
 	struct btrfs_transaction *cur_trans = trans->transaction;
 	int err = 0;
+	bool run_async = false;
 
 	if (refcount_read(&trans->use_count) > 1) {
 		refcount_dec(&trans->use_count);
@@ -906,6 +907,9 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		return 0;
 	}
 
+	if (btrfs_should_throttle_delayed_refs(trans, true))
+		run_async = true;
+
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
 
@@ -936,6 +940,10 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		err = -EIO;
 	}
 
+	if (run_async && !work_busy(&info->async_delayed_ref_work))
+		queue_work(system_unbound_wq,
+			   &info->async_delayed_ref_work);
+
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 	return err;
 }
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 08/13] btrfs: adjust the arguments for btrfs_should_throttle_delayed_refs
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (6 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 09/13] btrfs: throttle delayed refs based on time Josef Bacik
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We want to be able to call this without a trans handle being open, so
adjust the arguments to be the fs_info and the delayed_ref_root instead
of the trans handle.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-ref.c | 8 ++++----
 fs/btrfs/delayed-ref.h | 3 ++-
 fs/btrfs/extent-tree.c | 2 +-
 fs/btrfs/inode.c       | 5 +++--
 fs/btrfs/transaction.c | 5 +++--
 5 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 6e9fa03be87d..e709f051320a 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -50,16 +50,16 @@ bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
-bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
+bool btrfs_should_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
+					struct btrfs_delayed_ref_root *delayed_refs,
 					bool for_throttle)
 {
-	u64 num_entries =
-		atomic_read(&trans->transaction->delayed_refs.num_entries);
+	u64 num_entries = atomic_read(&delayed_refs->num_entries);
 	u64 avg_runtime;
 	u64 val;
 
 	smp_mb();
-	avg_runtime = trans->fs_info->avg_delayed_ref_runtime;
+	avg_runtime = fs_info->avg_delayed_ref_runtime;
 	val = num_entries * avg_runtime;
 	if (val >= NSEC_PER_SEC)
 		return true;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index c0ae440434af..3ea3a1627d26 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -371,7 +371,8 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
 void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
 				       struct btrfs_block_rsv *src,
 				       u64 num_bytes);
-bool btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
+bool btrfs_should_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
+					struct btrfs_delayed_ref_root *delayed_refs,
 					bool for_throttle);
 bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0e81990b57e0..b9b96e4db65f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2272,7 +2272,7 @@ static void btrfs_async_run_delayed_refs(struct work_struct *work)
 		}
 
 		/* No longer over our threshold, lets bail. */
-		if (!btrfs_should_throttle_delayed_refs(trans, true)) {
+		if (!btrfs_should_throttle_delayed_refs(fs_info, &trans->transaction->delayed_refs, true)) {
 			btrfs_end_transaction(trans);
 			break;
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ad0f0961a711..c9815ed03d21 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4349,8 +4349,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				break;
 			}
 			if (be_nice) {
-				if (btrfs_should_throttle_delayed_refs(trans,
-								       true) ||
+				if (btrfs_should_throttle_delayed_refs(fs_info,
+					&trans->transaction->delayed_refs,
+					true) ||
 				    btrfs_check_space_for_delayed_refs(fs_info))
 					should_throttle = true;
 			}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7f994ab73b0b..cf8fab22782f 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -865,7 +865,7 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 	    cur_trans->delayed_refs.flushing)
 		return 1;
 
-	if (btrfs_should_throttle_delayed_refs(trans, true) ||
+	if (btrfs_should_throttle_delayed_refs(fs_info, &cur_trans->delayed_refs, true) ||
 	    btrfs_check_space_for_delayed_refs(fs_info))
 		return 1;
 
@@ -907,7 +907,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		return 0;
 	}
 
-	if (btrfs_should_throttle_delayed_refs(trans, true))
+	if (btrfs_should_throttle_delayed_refs(info,
+					       &cur_trans->delayed_refs, true))
 		run_async = true;
 
 	btrfs_trans_release_metadata(trans);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 09/13] btrfs: throttle delayed refs based on time
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (7 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 08/13] btrfs: adjust the arguments for btrfs_should_throttle_delayed_refs Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 10/13] btrfs: handle uncontrolled delayed ref generation Josef Bacik
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We need to make sure we don't generate so many delayed refs that the box gets
overwhelmed at commit time.  Keep a running count of the number of entries run,
and if our time constraints say we need to be throttled, wait at transaction
end for the number of delayed refs we added to be run.
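
In toy form the throttle looks like this (single-threaded, user-space model
only; the real code sleeps on delayed_refs->wait, is woken by the ref runner,
and also bails early if we drop back under the time threshold):

#include <stdio.h>
#include <stdint.h>

static uint64_t entries_run;    /* models delayed_refs->entries_run */

/* Pretend the async flusher retired a batch of refs and woke us up. */
static void fake_wakeup(void)
{
        entries_run += 128;
}

static void throttle_for_delayed_refs(uint64_t refs_we_added)
{
        uint64_t threshold = entries_run + (refs_we_added ? refs_we_added : 1);

        /* Real code: wait_event_interruptible(delayed_refs->wait, ...) */
        while (entries_run < threshold)
                fake_wakeup();
}

int main(void)
{
        throttle_for_delayed_refs(1000);
        printf("resumed after %llu entries run\n",
               (unsigned long long)entries_run);
        return 0;
}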

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-ref.c |  6 ++++--
 fs/btrfs/delayed-ref.h | 14 ++++++++++++
 fs/btrfs/extent-tree.c |  4 +++-
 fs/btrfs/transaction.c | 49 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/transaction.h | 16 ++++++++++++++
 5 files changed, 83 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index e709f051320a..e2f40a449d85 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -424,7 +424,7 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
 		list_del(&ref->add_list);
 	ref->in_tree = 0;
 	btrfs_put_delayed_ref(ref);
-	atomic_dec(&delayed_refs->num_entries);
+	btrfs_dec_delayed_ref_entries(delayed_refs);
 }
 
 static bool merge_ref(struct btrfs_trans_handle *trans,
@@ -580,7 +580,7 @@ void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
 
 	rb_erase_cached(&head->href_node, &delayed_refs->href_root);
 	RB_CLEAR_NODE(&head->href_node);
-	atomic_dec(&delayed_refs->num_entries);
+	btrfs_dec_delayed_ref_entries(delayed_refs);
 	delayed_refs->num_heads--;
 	if (head->processing == 0)
 		delayed_refs->num_heads_ready--;
@@ -639,6 +639,7 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
 	if (ref->action == BTRFS_ADD_DELAYED_REF)
 		list_add_tail(&ref->add_list, &href->ref_add_list);
 	atomic_inc(&root->num_entries);
+	trans->total_delayed_refs++;
 	spin_unlock(&href->lock);
 	return ret;
 }
@@ -843,6 +844,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		delayed_refs->num_heads_ready++;
 		atomic_inc(&delayed_refs->num_entries);
 		trans->delayed_ref_updates++;
+		trans->total_delayed_refs++;
 	}
 	if (qrecord_inserted_ret)
 		*qrecord_inserted_ret = qrecord_inserted;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 3ea3a1627d26..16cf0af91464 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -150,6 +150,13 @@ struct btrfs_delayed_ref_root {
 	 */
 	atomic_t num_entries;
 
+	/*
+	 * How many entries we've run, and a corresponding waitqueue so that we
+	 * can throttle generators appropriately.
+	 */
+	atomic_t entries_run;
+	wait_queue_head_t wait;
+
 	/* total number of head nodes in tree */
 	unsigned long num_heads;
 
@@ -391,4 +398,11 @@ btrfs_delayed_node_to_data_ref(struct btrfs_delayed_ref_node *node)
 	return container_of(node, struct btrfs_delayed_data_ref, node);
 }
 
+static inline void
+btrfs_dec_delayed_ref_entries(struct btrfs_delayed_ref_root *delayed_refs)
+{
+	atomic_dec(&delayed_refs->num_entries);
+	atomic_inc(&delayed_refs->entries_run);
+	wake_up(&delayed_refs->wait);
+}
 #endif
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b9b96e4db65f..e490ce994d1d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1958,7 +1958,6 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
 		default:
 			WARN_ON(1);
 		}
-		atomic_dec(&delayed_refs->num_entries);
 
 		/*
 		 * Record the must_insert_reserved flag before we drop the
@@ -1974,6 +1973,9 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
 		ret = run_one_delayed_ref(trans, ref, extent_op,
 					  must_insert_reserved);
 
+		/* Anybody who's been throttled may be woken up here. */
+		btrfs_dec_delayed_ref_entries(delayed_refs);
+
 		btrfs_free_delayed_extent_op(extent_op);
 		if (ret) {
 			unselect_delayed_ref_head(delayed_refs, locked_ref);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index cf8fab22782f..ac77a2b805fa 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -307,6 +307,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	cur_trans->delayed_refs.href_root = RB_ROOT_CACHED;
 	cur_trans->delayed_refs.dirty_extent_root = RB_ROOT;
 	atomic_set(&cur_trans->delayed_refs.num_entries, 0);
+	atomic_set(&cur_trans->delayed_refs.entries_run, 0);
+	init_waitqueue_head(&cur_trans->delayed_refs.wait);
 
 	/*
 	 * although the tree mod log is per file system and not per transaction,
@@ -893,13 +895,29 @@ static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans)
 	trans->bytes_reserved = 0;
 }
 
+static void noinline
+btrfs_throttle_for_delayed_refs(struct btrfs_fs_info *fs_info,
+				struct btrfs_delayed_ref_root *delayed_refs,
+				unsigned long refs, bool throttle)
+{
+	unsigned long threshold = max(refs, 1UL) +
+		atomic_read(&delayed_refs->entries_run);
+	wait_event_interruptible(delayed_refs->wait,
+		 (atomic_read(&delayed_refs->entries_run) >= threshold) ||
+		 !btrfs_should_throttle_delayed_refs(fs_info, delayed_refs,
+						     throttle));
+}
+
 static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 				   int throttle)
 {
 	struct btrfs_fs_info *info = trans->fs_info;
 	struct btrfs_transaction *cur_trans = trans->transaction;
+	unsigned long total_delayed_refs;
+	unsigned int trans_type = trans->type;
 	int err = 0;
 	bool run_async = false;
+	bool throttle_delayed_refs = false;
 
 	if (refcount_read(&trans->use_count) > 1) {
 		refcount_dec(&trans->use_count);
@@ -907,9 +925,23 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		return 0;
 	}
 
+	/*
+	 * If we are over our threshold for our specified throttle then we need
+	 * to throttle ourselves, because the async flusher is not keeping up.
+	 *
+	 * However if we're just over the async threshold simply kick the async
+	 * flusher.
+	 */
 	if (btrfs_should_throttle_delayed_refs(info,
-					       &cur_trans->delayed_refs, true))
+					       &cur_trans->delayed_refs,
+					       throttle)) {
 		run_async = true;
+		throttle_delayed_refs = true;
+	} else if (btrfs_should_throttle_delayed_refs(info,
+						      &cur_trans->delayed_refs,
+						      true)) {
+		run_async = true;
+	}
 
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
@@ -918,7 +950,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 
 	btrfs_trans_release_chunk_metadata(trans);
 
-	if (trans->type & __TRANS_FREEZABLE)
+	if (trans_type & __TRANS_FREEZABLE)
 		sb_end_intwrite(info->sb);
 
 	WARN_ON(cur_trans != info->running_transaction);
@@ -927,7 +959,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	extwriter_counter_dec(cur_trans, trans->type);
 
 	cond_wake_up(&cur_trans->writer_wait);
-	btrfs_put_transaction(cur_trans);
 
 	if (current->journal_info == trans)
 		current->journal_info = NULL;
@@ -935,6 +966,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	if (throttle)
 		btrfs_run_delayed_iputs(info);
 
+	total_delayed_refs = trans->total_delayed_refs;
 	if (TRANS_ABORTED(trans) ||
 	    test_bit(BTRFS_FS_STATE_ERROR, &info->fs_state)) {
 		wake_up_process(info->transaction_kthread);
@@ -946,6 +978,17 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 			   &info->async_delayed_ref_work);
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
+
+	/*
+	 * We only want to throttle generators, so btrfs_transaction_start
+	 * callers.
+	 */
+	if (throttle_delayed_refs && total_delayed_refs &&
+	    (trans_type & __TRANS_START))
+		btrfs_throttle_for_delayed_refs(info, &cur_trans->delayed_refs,
+						total_delayed_refs, throttle);
+	btrfs_put_transaction(cur_trans);
+
 	return err;
 }
 
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 453cea7c7a72..2ec10978fa2a 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -109,7 +109,23 @@ struct btrfs_trans_handle {
 	u64 transid;
 	u64 bytes_reserved;
 	u64 chunk_bytes_reserved;
+
+	/*
+	 * This tracks the number of items required for the delayed ref rsv, and
+	 * is used by that code.  The accounting is
+	 *
+	 * - 1 per delayed ref head (individual items are not counted).
+	 * - number of csum items that would be inserted for data.
+	 * - block group item updates.
+	 */
 	unsigned long delayed_ref_updates;
+
+	/*
+	 * This is the total number of delayed items that we added for this
+	 * trans handle, this is used for the end transaction throttling code.
+	 */
+	unsigned long total_delayed_refs;
+
 	struct btrfs_transaction *transaction;
 	struct btrfs_block_rsv *block_rsv;
 	struct btrfs_block_rsv *orig_rsv;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 10/13] btrfs: handle uncontrolled delayed ref generation
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (8 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 09/13] btrfs: throttle delayed refs based on time Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 11/13] btrfs: check delayed refs while relocating Josef Bacik
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Some operations can generate way too many delayed refs, resulting in the async
flusher being unable to keep up.  To deal with this, keep track of how often
the trans handles are having to throttle themselves, and if it's happening too
often increase the number of delayed refs they must wait on each iteration.
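
A stand-alone sketch of the intended adjustment (illustration only, not the
patch): bump the multiplier at most once a second while callers keep having to
throttle, and relax it once they stop.

#include <stdio.h>
#include <stdint.h>

struct mult_state {
        int mult;                       /* multiplier applied to each waiter's ref count */
        int64_t last_adjustment;        /* seconds, 0 until the first sample */
};

/* A trans handle had to throttle itself. */
static void note_throttled(struct mult_state *s, int64_t now)
{
        if (now - s->last_adjustment >= 1) {
                if (s->last_adjustment)         /* skip the very first sample */
                        s->mult++;
                s->last_adjustment = now;
        }
}

/* A trans handle ended without needing to throttle. */
static void note_unthrottled(struct mult_state *s, int64_t now)
{
        if (s->mult > 1 && now - s->last_adjustment >= 1) {
                s->mult--;
                s->last_adjustment = now;
        }
}

int main(void)
{
        struct mult_state s = { 1, 0 };

        note_throttled(&s, 100);        /* first throttle just records the time */
        note_throttled(&s, 102);        /* still throttling a second later: mult -> 2 */
        note_unthrottled(&s, 105);      /* pressure gone: mult -> 1 */
        printf("mult=%d\n", s.mult);
        return 0;
}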

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-ref.h |  3 +++
 fs/btrfs/transaction.c | 21 +++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 16cf0af91464..03590a13f86e 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -157,6 +157,9 @@ struct btrfs_delayed_ref_root {
 	atomic_t entries_run;
 	wait_queue_head_t wait;
 
+	atomic_t mult;
+	time64_t last_adjustment;
+
 	/* total number of head nodes in tree */
 	unsigned long num_heads;
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index ac77a2b805fa..6f74f9699560 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -308,6 +308,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	cur_trans->delayed_refs.dirty_extent_root = RB_ROOT;
 	atomic_set(&cur_trans->delayed_refs.num_entries, 0);
 	atomic_set(&cur_trans->delayed_refs.entries_run, 0);
+	atomic_set(&cur_trans->delayed_refs.mult, 1);
 	init_waitqueue_head(&cur_trans->delayed_refs.wait);
 
 	/*
@@ -902,6 +903,17 @@ btrfs_throttle_for_delayed_refs(struct btrfs_fs_info *fs_info,
 {
 	unsigned long threshold = max(refs, 1UL) +
 		atomic_read(&delayed_refs->entries_run);
+	time64_t start = ktime_get_seconds();
+
+	spin_lock(&delayed_refs->lock);
+	if (delayed_refs->last_adjustment - start >= 1) {
+		if (delayed_refs->last_adjustment)
+			atomic_inc(&delayed_refs->mult);
+		delayed_refs->last_adjustment = start;
+	}
+	spin_unlock(&delayed_refs->lock);
+	refs *= atomic_read(&delayed_refs->mult);
+
 	wait_event_interruptible(delayed_refs->wait,
 		 (atomic_read(&delayed_refs->entries_run) >= threshold) ||
 		 !btrfs_should_throttle_delayed_refs(fs_info, delayed_refs,
@@ -973,6 +985,15 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		err = -EIO;
 	}
 
+	if (!throttle_delayed_refs && atomic_read(&cur_trans->delayed_refs.mult) > 1) {
+		time64_t start = ktime_get_seconds();
+		spin_lock(&cur_trans->delayed_refs.lock);
+		if ((start - cur_trans->delayed_refs.last_adjustment) >= 1) {
+			atomic_dec(&cur_trans->delayed_refs.mult);
+			cur_trans->delayed_refs.last_adjustment = start;
+		}
+		spin_unlock(&cur_trans->delayed_refs.lock);
+	}
 	if (run_async && !work_busy(&info->async_delayed_ref_work))
 		queue_work(system_unbound_wq,
 			   &info->async_delayed_ref_work);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 11/13] btrfs: check delayed refs while relocating
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (9 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 10/13] btrfs: handle uncontrolled delayed ref generation Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 12/13] btrfs: throttle truncate for delayed ref generation Josef Bacik
  2020-03-13 21:23 ` [PATCH 13/13] btrfs: throttle snapshot delete on delayed refs Josef Bacik
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Relocation can generate a serious amount of delayed refs, so we sometimes need
to throttle relocation in order to let the async flusher keep up.  We already
have a mechanism to start over because of ENOSPC; simply add the delayed ref
checks to it as well.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 45268e50cb17..e3d6ba27663e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2797,6 +2797,16 @@ static int reserve_metadata_space(struct btrfs_trans_handle *trans,
 	int ret;
 	u64 tmp;
 
+	/*
+	 * If we're generating too many delayed refs we should bail and allow
+	 * the delayed ref throttling stuff to catch up.
+	 */
+	if (btrfs_check_space_for_delayed_refs(fs_info) ||
+	    btrfs_should_throttle_delayed_refs(fs_info,
+					       &trans->transaction->delayed_refs,
+					       true))
+		return -EAGAIN;
+
 	num_bytes = calcu_metadata_size(rc, node, 1) * 2;
 
 	trans->block_rsv = rc->block_rsv;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 12/13] btrfs: throttle truncate for delayed ref generation
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (10 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 11/13] btrfs: check delayed refs while relocating Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  2020-03-13 21:23 ` [PATCH 13/13] btrfs: throttle snapshot delete on delayed refs Josef Bacik
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Truncates can generate a lot of delayed refs, and if we're over our time
limit or we're already flushing we should just bail so the appropriate
action can be taken.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c9815ed03d21..c39794a95acb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4386,6 +4386,21 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 			 * let the normal reservation dance happen higher up.
 			 */
 			if (should_throttle) {
+				struct btrfs_transaction *cur_trans =
+					trans->transaction;
+
+				/*
+				 * If we're over time, or we're flushing, go
+				 * ahead and break out so that we can let
+				 * everybody catch up.
+				 */
+				if (btrfs_should_throttle_delayed_refs(fs_info,
+					&cur_trans->delayed_refs, true) ||
+				    cur_trans->delayed_refs.flushing) {
+					ret = -EAGAIN;
+					break;
+				}
+
 				ret = btrfs_delayed_refs_rsv_refill(fs_info,
 							BTRFS_RESERVE_NO_FLUSH);
 				if (ret) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 13/13] btrfs: throttle snapshot delete on delayed refs
  2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
                   ` (11 preceding siblings ...)
  2020-03-13 21:23 ` [PATCH 12/13] btrfs: throttle truncate for delayed ref generation Josef Bacik
@ 2020-03-13 21:23 ` Josef Bacik
  12 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-03-13 21:23 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

One of the largest generators of delayed refs is snapshot delete.  This is
because we'll walk down to a shared node/leaf and drop all of the references to
the lower layer in that node/leaf.  With our default nodesize of 16KiB this can
be hundreds of delayed refs, which can easily put us over our threshold for
running delayed refs.

Instead check and see if we need to throttle ourselves, and if we do
break out with -EAGAIN.  If this happens we do not want to do the
walk_up_tree because we need to keep processing the node we're on.

We also have to get rid of our BUG_ON(drop_level == 0) everywhere,
because we can actually stop at the 0 level.  Since we already have the
ability to restart snapshot deletions from an arbitrary key this works
out fine.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e490ce994d1d..718c99e5674f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4662,6 +4662,7 @@ struct walk_control {
 	int reada_slot;
 	int reada_count;
 	int restarted;
+	int drop_subtree;
 };
 
 #define DROP_REFERENCE	1
@@ -4766,6 +4767,21 @@ static noinline int walk_down_proc(struct btrfs_trans_handle *trans,
 	u64 flag = BTRFS_BLOCK_FLAG_FULL_BACKREF;
 	int ret;
 
+	/*
+	 * We only want to break if we aren't yet at the end of our leaf/node.
+	 * The reason for this is if we're at DROP_REFERENCE we'll grab the
+	 * current slot's key for the drop_progress.  If we're at the end this
+	 * will obviously go wrong.  We are also not going to generate many more
+	 * delayed refs at this point, so allowing us to continue will not hurt
+	 * us.
+	 */
+	if (!wc->drop_subtree &&
+	    (path->slots[level] < btrfs_header_nritems(path->nodes[level])) &&
+	    btrfs_should_throttle_delayed_refs(fs_info,
+					       &trans->transaction->delayed_refs,
+					       true))
+		return -EAGAIN;
+
 	if (wc->stage == UPDATE_BACKREF &&
 	    btrfs_header_owner(eb) != root->root_key.objectid)
 		return 1;
@@ -5198,6 +5214,8 @@ static noinline int walk_down_tree(struct btrfs_trans_handle *trans,
 		ret = walk_down_proc(trans, root, path, wc, lookup_info);
 		if (ret > 0)
 			break;
+		if (ret < 0)
+			return ret;
 
 		if (level == 0)
 			break;
@@ -5332,7 +5350,6 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 		       sizeof(wc->update_progress));
 
 		level = root_item->drop_level;
-		BUG_ON(level == 0);
 		path->lowest_level = level;
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		path->lowest_level = 0;
@@ -5381,19 +5398,23 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 	wc->update_ref = update_ref;
 	wc->keep_locks = 0;
 	wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(fs_info);
+	wc->drop_subtree = 0;
 
 	while (1) {
 
 		ret = walk_down_tree(trans, root, path, wc);
-		if (ret < 0) {
-			err = ret;
-			break;
-		}
-
-		ret = walk_up_tree(trans, root, path, wc, BTRFS_MAX_LEVEL);
-		if (ret < 0) {
+		if (ret < 0 && ret != -EAGAIN) {
 			err = ret;
 			break;
+		} else if (ret != -EAGAIN) {
+			ret = walk_up_tree(trans, root, path, wc,
+					   BTRFS_MAX_LEVEL);
+			if (ret < 0) {
+				err = ret;
+				break;
+			}
+		} else {
+			ret = 0;
 		}
 
 		if (ret > 0) {
@@ -5411,7 +5432,6 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 				      &wc->drop_progress);
 		root_item->drop_level = wc->drop_level;
 
-		BUG_ON(wc->level == 0);
 		if (btrfs_should_end_transaction(trans) ||
 		    (!for_reloc && btrfs_need_cleaner_sleep(fs_info))) {
 			ret = btrfs_update_root(trans, tree_root,
@@ -5544,6 +5564,7 @@ int btrfs_drop_subtree(struct btrfs_trans_handle *trans,
 	wc->stage = DROP_REFERENCE;
 	wc->update_ref = 0;
 	wc->keep_locks = 1;
+	wc->drop_subtree = 1;
 	wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(fs_info);
 
 	while (1) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget
  2020-03-13 21:23 ` [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget Josef Bacik
@ 2020-04-09 13:11   ` Nikolay Borisov
  2020-04-09 13:26   ` Nikolay Borisov
  1 sibling, 0 replies; 16+ messages in thread
From: Nikolay Borisov @ 2020-04-09 13:11 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 13.03.20 г. 23:23 ч., Josef Bacik wrote:
> For very large file systems we cannot rely on the space reservation
> system to provide enough pressure to flush delayed refs in a timely
> manner.  We have the infrastructure in place to keep track of how much
> theoretical time it'll take to run our outstanding delayed refs, but
> unfortunately I ripped all of that out when I added the delayed refs
> rsv.  This code originally was added to address the problem of too many
> delayed refs building up and thus causing transaction commits to take
> several minutes to finish.
> 
> Fix this by adding back the ability to flush delayed refs based on the
> time budget for them.  We want to limit to around 1 seconds worth of
> delayed refs to be pending at any given time.  In order to keep up with
> demand we will start the async flusher once we are at the 500ms mark,
> and the async flusher will attempt to keep us in this ballpark.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.h       |  4 ++++
>  fs/btrfs/disk-io.c     |  3 +++
>  fs/btrfs/extent-tree.c | 44 ++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/transaction.c |  8 ++++++++
>  4 files changed, 59 insertions(+)
> 

<snip>

>  
> +static void btrfs_async_run_delayed_refs(struct work_struct *work)
> +{
> +	struct btrfs_fs_info *fs_info;
> +	struct btrfs_trans_handle *trans;
> +
> +	fs_info = container_of(work, struct btrfs_fs_info,
> +			       async_delayed_ref_work);
> +
> +	while (!btrfs_fs_closing(fs_info)) {
> +		unsigned long count;
> +		int ret;
> +
> +		trans = btrfs_attach_transaction(fs_info->extent_root);
> +		if (IS_ERR(trans))
> +			break;
> +
> +		smp_rmb();
> +		if (trans->transaction->delayed_refs.flushing) {
> +			btrfs_end_transaction(trans);
> +			break;
> +		}
> +
> +		/* No longer over our threshold, lets bail. */
> +		if (!btrfs_should_throttle_delayed_refs(trans, true)) {
> +			btrfs_end_transaction(trans);
> +			break;
> +		}
> +
> +		count = atomic_read(&trans->transaction->delayed_refs.num_entries);

Don't you want to actually read num_heads_ready rather than num_entries,
i.e. isn't this introducing the same issue as the one fixed by:

[PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads
we have


> +		count >>= 2;
> +
> +		ret = btrfs_run_delayed_refs(trans, count);
> +		btrfs_end_transaction(trans);
> +		if (ret < 0)
> +			break;
> +	}
> +}

<snip>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget
  2020-03-13 21:23 ` [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget Josef Bacik
  2020-04-09 13:11   ` Nikolay Borisov
@ 2020-04-09 13:26   ` Nikolay Borisov
  1 sibling, 0 replies; 16+ messages in thread
From: Nikolay Borisov @ 2020-04-09 13:26 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 13.03.20 г. 23:23 ч., Josef Bacik wrote:
> For very large file systems we cannot rely on the space reservation
> system to provide enough pressure to flush delayed refs in a timely
> manner.  We have the infrastructure in place to keep track of how much
> theoretical time it'll take to run our outstanding delayed refs, but
> unfortunately I ripped all of that out when I added the delayed refs
> rsv.  This code originally was added to address the problem of too many
> delayed refs building up and thus causing transaction commits to take
> several minutes to finish.
> 
> Fix this by adding back the ability to flush delayed refs based on the
> time budget for them.  We want to limit to around 1 seconds worth of
> delayed refs to be pending at any given time.  In order to keep up with
> demand we will start the async flusher once we are at the 500ms mark,
> and the async flusher will attempt to keep us in this ballpark.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.h       |  4 ++++
>  fs/btrfs/disk-io.c     |  3 +++
>  fs/btrfs/extent-tree.c | 44 ++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/transaction.c |  8 ++++++++
>  4 files changed, 59 insertions(+)
> 

<snip>

> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 645ae95f465e..0e81990b57e0 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2249,6 +2249,50 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>  	return 0;
>  }
>  
> +static void btrfs_async_run_delayed_refs(struct work_struct *work)
> +{
> +	struct btrfs_fs_info *fs_info;
> +	struct btrfs_trans_handle *trans;
> +
> +	fs_info = container_of(work, struct btrfs_fs_info,
> +			       async_delayed_ref_work);
> +
> +	while (!btrfs_fs_closing(fs_info)) {
> +		unsigned long count;
> +		int ret;
> +
> +		trans = btrfs_attach_transaction(fs_info->extent_root);
> +		if (IS_ERR(trans))
> +			break;
> +
> +		smp_rmb();

What is this barrier ordering? IMO its usage is bogus here, because in
btrfs_should_end_transaction we use a full barrier and here only an RMB.
Furthermore, in btrfs_should_end_transaction we don't have any memory
accesses preceding the check of the flushing state.  Looking at the
callers of btrfs_should_end_transaction I also don't see any ordering
guaranteed, i.e. I think it could be removed altogether.  Or perhaps we
really want acquire/release semantics, e.g. accesses to
delayed_refs.flushing should be done via the
smp_load_acquire()/smp_store_release() functions?
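
In user-space terms the acquire/release pairing would look roughly like this,
with C11 atomics standing in for smp_store_release()/smp_load_acquire()
(illustration only):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic bool flushing;

/* Writer: everything stored before this release is visible to an acquire reader. */
static void set_flushing(void)
{
        atomic_store_explicit(&flushing, true, memory_order_release);
}

/* Reader: if we observe flushing == true we also observe the writer's prior stores. */
static bool is_flushing(void)
{
        return atomic_load_explicit(&flushing, memory_order_acquire);
}

int main(void)
{
        set_flushing();
        printf("%d\n", is_flushing());  /* prints 1 */
        return 0;
}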


> +		if (trans->transaction->delayed_refs.flushing) {
> +			btrfs_end_transaction(trans);
> +			break;
> +		}
> +
> +		/* No longer over our threshold, lets bail. */
> +		if (!btrfs_should_throttle_delayed_refs(trans, true)) {
> +			btrfs_end_transaction(trans);
> +			break;
> +		}
> +
> +		count = atomic_read(&trans->transaction->delayed_refs.num_entries);
> +		count >>= 2;
> +
> +		ret = btrfs_run_delayed_refs(trans, count);
> +		btrfs_end_transaction(trans);
> +		if (ret < 0)
> +			break;
> +	}
> +}
> +

<snip>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-04-09 13:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13 21:23 [PATCH 00/13] Throttle delayed refs based on time Josef Bacik
2020-03-13 21:23 ` [PATCH 01/13] btrfs: use a stable rolling avg for delayed refs avg Josef Bacik
2020-03-13 21:23 ` [PATCH 02/13] btrfs: change btrfs_should_throttle_delayed_refs to a bool Josef Bacik
2020-03-13 21:23 ` [PATCH 03/13] btrfs: make btrfs_should_throttle_delayed_refs only check run time Josef Bacik
2020-03-13 21:23 ` [PATCH 04/13] btrfs: make should_end_transaction check time to run delayed refs Josef Bacik
2020-03-13 21:23 ` [PATCH 05/13] btrfs: squash should_end_transaction Josef Bacik
2020-03-13 21:23 ` [PATCH 06/13] btrfs: add a mode for delayed ref time based throttling Josef Bacik
2020-03-13 21:23 ` [PATCH 07/13] btrfs: kick off async delayed ref flushing if we are over time budget Josef Bacik
2020-04-09 13:11   ` Nikolay Borisov
2020-04-09 13:26   ` Nikolay Borisov
2020-03-13 21:23 ` [PATCH 08/13] btrfs: adjust the arguments for btrfs_should_throttle_delayed_refs Josef Bacik
2020-03-13 21:23 ` [PATCH 09/13] btrfs: throttle delayed refs based on time Josef Bacik
2020-03-13 21:23 ` [PATCH 10/13] btrfs: handle uncontrolled delayed ref generation Josef Bacik
2020-03-13 21:23 ` [PATCH 11/13] btrfs: check delayed refs while relocating Josef Bacik
2020-03-13 21:23 ` [PATCH 12/13] btrfs: throttle truncate for delayed ref generation Josef Bacik
2020-03-13 21:23 ` [PATCH 13/13] btrfs: throttle snapshot delete on delayed refs Josef Bacik
