From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Cc: Nikolay Borisov <nborisov@suse.com>
Subject: [PATCH v4 09/12] btrfs: simplify the logic in need_preemptive_flushing
Date: Tue, 26 Jan 2021 16:24:33 -0500
Message-ID: <8f206fd7fece62626124cc1d5272b81f10bc19ee.1611695838.git.josef@toxicpanda.com>
In-Reply-To: <cover.1611695838.git.josef@toxicpanda.com>
Much of the logic in need_preemptive_reclaim() was added in one go with
no explanation, and it is unwieldy and confusing. Simplify it: start
preemptive flushing once reservations and pinned bytes occupy more than
half of our available free space.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
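For readers who want to try the new decision outside the kernel, here is a
minimal userspace sketch of the check this patch introduces. It is an
illustration under stated assumptions, not the kernel code: struct
space_model and model_need_preemptive_reclaim() are hypothetical names, the
avail field stands in for the result of calc_available_free_space(), the
percpu counters and block reserves are flattened into plain fields, and the
btrfs_fs_closing() / BTRFS_FS_STATE_REMOUNTING checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields the patch reads; not a kernel type. */
struct space_model {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
	uint64_t bytes_readonly;
	uint64_t bytes_pinned;
	uint64_t bytes_may_use;
	uint64_t reclaim_size;      /* mirrors space_info->reclaim_size */
	uint64_t avail;             /* assumed result of calc_available_free_space() */
	uint64_t ordered_bytes;     /* assumed sum of fs_info->ordered_bytes */
	uint64_t delalloc_bytes;    /* assumed sum of fs_info->delalloc_bytes */
	uint64_t delayed_refs_rsv;  /* delayed_refs_rsv.reserved */
	uint64_t delayed_block_rsv; /* delayed_block_rsv.reserved */
};

/* Sketch of the post-patch decision, minus the closing/remounting checks. */
static bool model_need_preemptive_reclaim(const struct space_model *s)
{
	/* Same 98% cutoff as div_factor_fine(total_bytes, 98). */
	uint64_t thresh = s->total_bytes * 98 / 100;
	uint64_t used;

	/* Plain full: async reclaim would only slow us down. */
	if (s->bytes_used + s->bytes_reserved >= thresh)
		return false;

	/* Mirrors the reclaim_size bail-out in the unchanged context. */
	if (s->reclaim_size)
		return false;

	/* Half of (available free space + unused room in the space_info). */
	thresh = s->avail + (s->total_bytes - s->bytes_used -
			     s->bytes_reserved - s->bytes_readonly);
	thresh >>= 1;

	/* Pinned bytes always count as reclaimable. */
	used = s->bytes_pinned;

	/*
	 * Ordered-heavy workloads only count the delayed block reserves;
	 * otherwise every bytes_may_use reservation counts.
	 */
	if (s->ordered_bytes >= s->delalloc_bytes)
		used += s->delayed_refs_rsv + s->delayed_block_rsv;
	else
		used += s->bytes_may_use;

	return used >= thresh;
}
```

As a worked case with arbitrary units: take total_bytes = 100, bytes_used = 60
and avail = 0. The "traditional" check from the comment would use a threshold
of (100 + 0) >> 1 = 50, which bytes_used alone already exceeds, so the flusher
would run constantly. With the logic above the threshold is (0 + 40) >> 1 = 20,
and only pinned bytes plus reservations are compared against it.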
fs/btrfs/space-info.c | 73 ++++++++++++++++++++++++++++---------------
1 file changed, 48 insertions(+), 25 deletions(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 8f3b4cc8b812..1c4226f78e27 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -780,11 +780,11 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
}
static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
- struct btrfs_space_info *space_info,
- u64 used)
+ struct btrfs_space_info *space_info)
{
+ u64 ordered, delalloc;
u64 thresh = div_factor_fine(space_info->total_bytes, 98);
- u64 to_reclaim, expected;
+ u64 used;
/* If we're just plain full then async reclaim just slows us down. */
if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
@@ -797,26 +797,52 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
if (space_info->reclaim_size)
return 0;
- to_reclaim = min_t(u64, num_online_cpus() * SZ_1M, SZ_16M);
- if (btrfs_can_overcommit(fs_info, space_info, to_reclaim,
- BTRFS_RESERVE_FLUSH_ALL))
- return 0;
+ /*
+ * If we have over half of the free space occupied by reservations or
+ * pinned then we want to start flushing.
+ *
+ * We do not do the traditional thing here, which is to say
+ *
+ * if (used >= ((total_bytes + avail) >> 1))
+ * return 1;
+ *
+ * because this doesn't quite work how we want. If we had more than 50%
+ * of the space_info used by bytes_used and we had 0 available we'd just
+ * constantly run the background flusher. Instead we want it to kick in
+ * if our reclaimable space exceeds 50% of our available free space.
+ */
+ thresh = calc_available_free_space(fs_info, space_info,
+ BTRFS_RESERVE_FLUSH_ALL);
+ thresh += (space_info->total_bytes - space_info->bytes_used -
+ space_info->bytes_reserved - space_info->bytes_readonly);
+ thresh >>= 1;
- used = btrfs_space_info_used(space_info, true);
- if (btrfs_can_overcommit(fs_info, space_info, SZ_1M,
- BTRFS_RESERVE_FLUSH_ALL))
- expected = div_factor_fine(space_info->total_bytes, 95);
- else
- expected = div_factor_fine(space_info->total_bytes, 90);
+ used = space_info->bytes_pinned;
- if (used > expected)
- to_reclaim = used - expected;
+ /*
+ * If we have more ordered bytes than delalloc bytes then we're either
+ * doing a lot of DIO, or we simply don't have a lot of delalloc waiting
+ * around. Preemptive flushing is only useful in that it can free up
+ * space before tickets need to wait for things to finish. In the case
+ * of ordered extents, preemptively waiting on ordered extents gets us
+ * nothing; if our reservations are tied up in ordered extents, we'll
+ * simply have to slow down writers by forcing them to wait on ordered
+ * extents.
+ *
+ * In the case that ordered is larger than delalloc, only include the
+ * block reserves that we would actually be able to directly reclaim
+ * from. In this case if we're heavy on metadata operations this will
+ * clearly be heavy enough to warrant preemptive flushing. In the case
+ * of heavy DIO or ordered reservations, preemptive flushing will just
+ * waste time and cause us to slow down.
+ */
+ ordered = percpu_counter_sum_positive(&fs_info->ordered_bytes);
+ delalloc = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
+ if (ordered >= delalloc)
+ used += fs_info->delayed_refs_rsv.reserved +
+ fs_info->delayed_block_rsv.reserved;
else
- to_reclaim = 0;
- to_reclaim = min(to_reclaim, space_info->bytes_may_use +
- space_info->bytes_reserved);
- if (!to_reclaim)
- return 0;
+ used += space_info->bytes_may_use;
return (used >= thresh && !btrfs_fs_closing(fs_info) &&
!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
@@ -1013,7 +1039,6 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
struct btrfs_block_rsv *delayed_refs_rsv;
struct btrfs_block_rsv *global_rsv;
struct btrfs_block_rsv *trans_rsv;
- u64 used;
fs_info = container_of(work, struct btrfs_fs_info,
preempt_reclaim_work);
@@ -1024,8 +1049,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
trans_rsv = &fs_info->trans_block_rsv;
spin_lock(&space_info->lock);
- used = btrfs_space_info_used(space_info, true);
- while (need_preemptive_reclaim(fs_info, space_info, used)) {
+ while (need_preemptive_reclaim(fs_info, space_info)) {
enum btrfs_flush_state flush;
u64 delalloc_size = 0;
u64 to_reclaim, block_rsv_size;
@@ -1087,7 +1111,6 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
flush_space(fs_info, space_info, to_reclaim, flush);
cond_resched();
spin_lock(&space_info->lock);
- used = btrfs_space_info_used(space_info, true);
}
spin_unlock(&space_info->lock);
}
@@ -1512,7 +1535,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
* the async reclaim as we will panic.
*/
if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags) &&
- need_preemptive_reclaim(fs_info, space_info, used) &&
+ need_preemptive_reclaim(fs_info, space_info) &&
!work_busy(&fs_info->preempt_reclaim_work)) {
trace_btrfs_trigger_flush(fs_info, space_info->flags,
orig_bytes, flush, "preempt");
--
2.26.2