All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Cc: Nikolay Borisov <nborisov@suse.com>
Subject: [PATCH v4 10/12] btrfs: implement space clamping for preemptive flushing
Date: Tue, 26 Jan 2021 16:24:34 -0500	[thread overview]
Message-ID: <8bece7f165ba8b7b17aedc131533302434093afc.1611695838.git.josef@toxicpanda.com> (raw)
In-Reply-To: <cover.1611695838.git.josef@toxicpanda.com>

Starting preemptive flushing at 50% of available free space is a good
start, but some workloads are particularly abusive and can quickly
overwhelm the preemptive flushing code and drive us into using tickets.

Handle this by clamping down on our threshold for starting and
continuing to run preemptive flushing.  This is particularly important
for our overcommit case, as we can really drive the file system into
overages and then it's more difficult to pull it back as we start to
actually fill up the file system.

The clamping is essentially 2^CLAMP, but we start at 1 so whatever we
calculate for overcommit is the baseline.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 51 +++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/space-info.h |  4 ++++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 1c4226f78e27..6afb9cac694a 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -206,6 +206,7 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 	INIT_LIST_HEAD(&space_info->ro_bgs);
 	INIT_LIST_HEAD(&space_info->tickets);
 	INIT_LIST_HEAD(&space_info->priority_tickets);
+	space_info->clamp = 1;
 
 	ret = btrfs_sysfs_add_space_info_type(info, space_info);
 	if (ret)
@@ -809,13 +810,26 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 	 * because this doesn't quite work how we want.  If we had more than 50%
 	 * of the space_info used by bytes_used and we had 0 available we'd just
 	 * constantly run the background flusher.  Instead we want it to kick in
-	 * if our reclaimable space exceeds 50% of our available free space.
+	 * if our reclaimable space exceeds our clamped free space.
+	 *
+	 * Our clamping range is 2^1 -> 2^8.  Practically speaking that means
+	 * the following
+	 *
+	 * Amount of RAM	Minimum threshold	Maximum threshold
+	 * 256GIB		1GIB			128GIB
+	 * 128GIB		512MIB			64GIB
+	 * 64GIB		256MIB			32GIB
+	 * 32GIB		128MIB			16GIB
+	 * 16GIB		64MIB			8GIB
+	 *
+	 * These are the range our thresholds will fall in, corresponding to how
+	 * much delalloc we need for the background flusher to kick in.
 	 */
 	thresh = calc_available_free_space(fs_info, space_info,
 					   BTRFS_RESERVE_FLUSH_ALL);
 	thresh += (space_info->total_bytes - space_info->bytes_used -
 		   space_info->bytes_reserved - space_info->bytes_readonly);
-	thresh >>= 1;
+	thresh >>= space_info->clamp;
 
 	used = space_info->bytes_pinned;
 
@@ -1039,6 +1053,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 	struct btrfs_block_rsv *delayed_refs_rsv;
 	struct btrfs_block_rsv *global_rsv;
 	struct btrfs_block_rsv *trans_rsv;
+	int loops = 0;
 
 	fs_info = container_of(work, struct btrfs_fs_info,
 			       preempt_reclaim_work);
@@ -1055,6 +1070,8 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 		u64 to_reclaim, block_rsv_size;
 		u64 global_rsv_size = global_rsv->reserved;
 
+		loops++;
+
 		/*
 		 * We don't have a precise counter for the metadata being
 		 * reserved for delalloc, so we'll approximate it by subtracting
@@ -1112,6 +1129,10 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 		cond_resched();
 		spin_lock(&space_info->lock);
 	}
+
+	/* We only went through once, back off our clamping. */
+	if (loops == 1 && !space_info->reclaim_size)
+		space_info->clamp = max(1, space_info->clamp - 1);
 	spin_unlock(&space_info->lock);
 }
 
@@ -1434,6 +1455,24 @@ static inline bool is_normal_flushing(enum btrfs_reserve_flush_enum flush)
 		(flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
 }
 
+static inline void maybe_clamp_preempt(struct btrfs_fs_info *fs_info,
+				       struct btrfs_space_info *space_info)
+{
+	u64 ordered = percpu_counter_sum_positive(&fs_info->ordered_bytes);
+	u64 delalloc = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
+
+	/*
+	 * If we're heavy on ordered operations then clamping won't help us.  We
+	 * need to clamp specifically to keep up with dirty'ing buffered
+	 * writers, because there's not a 1:1 correlation of writing delalloc
+	 * and freeing space, like there is with flushing delayed refs or
+	 * delayed nodes.  If we're already more ordered than delalloc then
+	 * we're keeping up, otherwise we aren't and should probably clamp.
+	 */
+	if (ordered < delalloc)
+		space_info->clamp = min(space_info->clamp + 1, 8);
+}
+
 /**
  * Try to reserve bytes from the block_rsv's space
  *
@@ -1527,6 +1566,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 			list_add_tail(&ticket.list,
 				      &space_info->priority_tickets);
 		}
+
+		/*
+		 * We were forced to add a reserve ticket, so our preemptive
+		 * flushing is unable to keep up.  Clamp down on the threshold
+		 * for the preemptive flushing in order to keep up with the
+		 * workload.
+		 */
+		maybe_clamp_preempt(fs_info, space_info);
 	} else if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
 		used += orig_bytes;
 		/*
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 74706f604bce..8870bd7805b5 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -22,6 +22,10 @@ struct btrfs_space_info {
 				   the space info if we had an ENOSPC in the
 				   allocator. */
 
+	int clamp;		/* Used to scale our threshold for preemptive
+				   flushing.  The value is >> clamp, so
+				   turns out to be a 2^clamp divisor. */
+
 	unsigned int full:1;	/* indicates that we cannot allocate any more
 				   chunks for this space */
 	unsigned int chunk_alloc:1;	/* set if we are allocating a chunk */
-- 
2.26.2


  parent reply	other threads:[~2021-01-26 23:09 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-26 21:24 [PATCH v4 00/12] Improve preemptive ENOSPC flushing Josef Bacik
2021-01-26 21:24 ` [PATCH v4 01/12] btrfs: make flush_space take a enum btrfs_flush_state instead of int Josef Bacik
2021-01-26 21:24 ` [PATCH v4 02/12] btrfs: add a trace point for reserve tickets Josef Bacik
2021-01-26 21:24 ` [PATCH v4 03/12] btrfs: track ordered bytes instead of just dio ordered bytes Josef Bacik
2021-01-26 21:24 ` [PATCH v4 04/12] btrfs: introduce a FORCE_COMMIT_TRANS flush operation Josef Bacik
2021-01-26 21:24 ` [PATCH v4 05/12] btrfs: improve preemptive background space flushing Josef Bacik
2021-01-26 21:24 ` [PATCH v4 06/12] btrfs: rename need_do_async_reclaim Josef Bacik
2021-01-26 21:24 ` [PATCH v4 07/12] btrfs: check reclaim_size in need_preemptive_reclaim Josef Bacik
2021-01-26 21:24 ` [PATCH v4 08/12] btrfs: rework btrfs_calc_reclaim_metadata_size Josef Bacik
2021-01-26 21:24 ` [PATCH v4 09/12] btrfs: simplify the logic in need_preemptive_flushing Josef Bacik
2021-01-26 21:24 ` Josef Bacik [this message]
2021-01-26 21:24 ` [PATCH v4 11/12] btrfs: adjust the flush trace point to include the source Josef Bacik
2021-01-26 21:24 ` [PATCH v4 12/12] btrfs: add a trace class for dumping the current ENOSPC state Josef Bacik
2021-01-27 15:03 ` [PATCH v4 00/12] Improve preemptive ENOSPC flushing David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8bece7f165ba8b7b17aedc131533302434093afc.1611695838.git.josef@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.