All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] btrfs: minor reclaim tuning
@ 2022-10-13 22:52 Boris Burkov
  2022-10-13 22:52 ` [PATCH v2 1/2] btrfs: skip reclaim if block_group is empty Boris Burkov
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Boris Burkov @ 2022-10-13 22:52 UTC (permalink / raw)
  To: linux-btrfs, kernel-team, 'Filipe Manana '

Two minor reclaim fixes that reduce relocations when they aren't quite
necessary. These are a basic first step in a broader effort to reduce
the alarmingly high rate of relocation we have observed in production
at Meta.

The first patch skips empty relocation.

The second patch skips relocation that no longer passes the reclaim
threshold check at reclaim time.

Changes in v2:
- added the re-check patch
- improved commit message and comment in the skip-empty patch.

Boris Burkov (2):
  btrfs: skip reclaim if block_group is empty
  btrfs: re-check reclaim condition in reclaim worker

 fs/btrfs/block-group.c | 83 +++++++++++++++++++++++++++++-------------
 1 file changed, 58 insertions(+), 25 deletions(-)

-- 
2.38.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] btrfs: skip reclaim if block_group is empty
  2022-10-13 22:52 [PATCH v2 0/2] btrfs: minor reclaim tuning Boris Burkov
@ 2022-10-13 22:52 ` Boris Burkov
  2022-10-13 22:52 ` [PATCH v2 2/2] btrfs: re-check reclaim condition in reclaim worker Boris Burkov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Boris Burkov @ 2022-10-13 22:52 UTC (permalink / raw)
  To: linux-btrfs, kernel-team, 'Filipe Manana '

As we delete extents from a block group, at some deletion we cross below
the reclaim threshold. It is possible we are still in the middle of
deleting more extents and might soon hit 0. If the block group is empty
by the time the reclaim worker runs, we will still relocate it.

This works just fine, as relocating an empty block group ultimately
results in properly deleting it. However, we have more direct ways of
removing empty block groups in the cleaner thread. Those are either
async discard or the unused_bgs list. In fact, when we decide whether to
relocate a block group during extent deletion, we do check for emptiness
and prefer the discard/unused_bgs mechanisms when possible.

Not using relocation for this case reduces some modest overhead from
empty bg relocation:
- extra transactions
- extra metadata use/churn for creating relocation metadata
- trying to read the extent tree to look for extents (and in this case
  finding none)

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/block-group.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 3f8b1cbbbc43..684401aa014a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1606,6 +1606,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
+		if (bg->used == 0) {
+			/*
+			 * It is possible that we trigger relocation on a block
+			 * group as its extents are deleted and it first goes
+			 * below the threshold, then shortly after goes empty.
+			 *
+			 * In this case, relocating it does delete it, but has
+			 * some overhead in relocation specific metadata, looking
+			 * for the non-existent extents and running some extra
+			 * transactions, which we can avoid by using one of the
+			 * other mechanisms for dealing with empty block groups.
+			 */
+			if (!btrfs_test_opt(fs_info, DISCARD_ASYNC))
+				btrfs_mark_bg_unused(bg);
+			spin_unlock(&bg->lock);
+			up_write(&space_info->groups_sem);
+			goto next;
+		}
 		spin_unlock(&bg->lock);
 
 		/* Get out fast, in case we're unmounting the filesystem */
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] btrfs: re-check reclaim condition in reclaim worker
  2022-10-13 22:52 [PATCH v2 0/2] btrfs: minor reclaim tuning Boris Burkov
  2022-10-13 22:52 ` [PATCH v2 1/2] btrfs: skip reclaim if block_group is empty Boris Burkov
@ 2022-10-13 22:52 ` Boris Burkov
  2022-10-14  8:43   ` David Sterba
  2022-10-14 10:24 ` [PATCH v2 0/2] btrfs: minor reclaim tuning Filipe Manana
  2022-10-14 14:09 ` David Sterba
  3 siblings, 1 reply; 6+ messages in thread
From: Boris Burkov @ 2022-10-13 22:52 UTC (permalink / raw)
  To: linux-btrfs, kernel-team, 'Filipe Manana '

I have observed the following case play out and lead to unnecessary
relocations:

1. write a file across multiple block groups
2. delete the file
3. several block groups fall below the reclaim threshold
4. reclaim the first, moving extents into the others
5. reclaim the others which are now actually very full, leading to poor
   reclaim behavior with lots of writing, allocating new block groups,
   etc.

I believe the risk of missing some reasonable reclaims is worth it
when traded off against the savings of avoiding overfull reclaims.

Going forward, it could be interesting to make the check more advanced
(zoned aware, fragmentation aware, etc...) so that it can be a really
strong signal both at extent delete and reclaim time.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/block-group.c | 65 ++++++++++++++++++++++++++----------------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 684401aa014a..b3e9b1bc566e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1539,6 +1539,30 @@ static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
 	return true;
 }
 
+static inline bool should_reclaim_block_group(struct btrfs_block_group *bg, u64 bytes_freed)
+{
+	const struct btrfs_space_info *space_info = bg->space_info;
+	const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
+	const u64 new_val = bg->used;
+	const u64 old_val = new_val + bytes_freed;
+	u64 thresh;
+
+	if (reclaim_thresh == 0)
+		return false;
+
+	thresh = div_factor_fine(bg->length, reclaim_thresh);
+
+	/*
+	 * If we were below the threshold before don't reclaim, we are likely a
+	 * brand new block group and we don't want to relocate new block groups.
+	 */
+	if (old_val < thresh)
+		return false;
+	if (new_val >= thresh)
+		return false;
+	return true;
+}
+
 void btrfs_reclaim_bgs_work(struct work_struct *work)
 {
 	struct btrfs_fs_info *fs_info =
@@ -1623,6 +1647,22 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			spin_unlock(&bg->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
+
+		}
+		/*
+		 * The block group might no longer meet the reclaim condition by
+		 * the time we get around to reclaiming it, so to avoid
+		 * reclaiming overly full block_groups, skip reclaiming them.
+		 *
+		 * Since the decision making process also depends on the amount
+		 * being freed, pass in a fake giant value to skip that extra
+		 * check, which is more meaningful when adding to the list in
+		 * the first place.
+		 */
+		if (!should_reclaim_block_group(bg, bg->length)) {
+			spin_unlock(&bg->lock);
+			up_write(&space_info->groups_sem);
+			goto next;
 		}
 		spin_unlock(&bg->lock);
 
@@ -3241,31 +3281,6 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
 	return ret;
 }
 
-static inline bool should_reclaim_block_group(struct btrfs_block_group *bg,
-					      u64 bytes_freed)
-{
-	const struct btrfs_space_info *space_info = bg->space_info;
-	const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
-	const u64 new_val = bg->used;
-	const u64 old_val = new_val + bytes_freed;
-	u64 thresh;
-
-	if (reclaim_thresh == 0)
-		return false;
-
-	thresh = div_factor_fine(bg->length, reclaim_thresh);
-
-	/*
-	 * If we were below the threshold before don't reclaim, we are likely a
-	 * brand new block group and we don't want to relocate new block groups.
-	 */
-	if (old_val < thresh)
-		return false;
-	if (new_val >= thresh)
-		return false;
-	return true;
-}
-
 int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 			     u64 bytenr, u64 num_bytes, bool alloc)
 {
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] btrfs: re-check reclaim condition in reclaim worker
  2022-10-13 22:52 ` [PATCH v2 2/2] btrfs: re-check reclaim condition in reclaim worker Boris Burkov
@ 2022-10-14  8:43   ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2022-10-14  8:43 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team, 'Filipe Manana '

On Thu, Oct 13, 2022 at 03:52:10PM -0700, Boris Burkov wrote:
> I have observed the following case play out and lead to unnecessary
> relocations:
> 
> 1. write a file across multiple block groups
> 2. delete the file
> 3. several block groups fall below the reclaim threshold
> 4. reclaim the first, moving extents into the others
> 5. reclaim the others which are now actually very full, leading to poor
>    reclaim behavior with lots of writing, allocating new block groups,
>    etc.
> 
> I believe the risk of missing some reasonable reclaims is worth it
> when traded off against the savings of avoiding overfull reclaims.
> 
> Going forward, it could be interesting to make the check more advanced
> (zoned aware, fragmentation aware, etc...) so that it can be a really
> strong signal both at extent delete and reclaim time.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/block-group.c | 65 ++++++++++++++++++++++++++----------------
>  1 file changed, 40 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 684401aa014a..b3e9b1bc566e 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1539,6 +1539,30 @@ static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
>  	return true;
>  }
>  
> +static inline bool should_reclaim_block_group(struct btrfs_block_group *bg, u64 bytes_freed)

The static inline does not make sense here, though it's just copied. I'm
doing fixups to make it just static in code gets moved (with a note to
changelog), as it would be a bit excessive to fix in a separate patch.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/2] btrfs: minor reclaim tuning
  2022-10-13 22:52 [PATCH v2 0/2] btrfs: minor reclaim tuning Boris Burkov
  2022-10-13 22:52 ` [PATCH v2 1/2] btrfs: skip reclaim if block_group is empty Boris Burkov
  2022-10-13 22:52 ` [PATCH v2 2/2] btrfs: re-check reclaim condition in reclaim worker Boris Burkov
@ 2022-10-14 10:24 ` Filipe Manana
  2022-10-14 14:09 ` David Sterba
  3 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2022-10-14 10:24 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Thu, Oct 13, 2022 at 11:52 PM Boris Burkov <boris@bur.io> wrote:
>
> Two minor reclaim fixes that reduce relocations when they aren't quite
> necessary. These are a basic first step in a broader effort to reduce
> the alarmingly high rate of relocation we have observed in production
> at Meta.
>
> The first patch skips empty relocation.
>
> The second patch skips relocation that no longer passes the reclaim
> threshold check at reclaim time.
>
> Changes in v2:
> - added the re-check patch
> - improved commit message and comment in the skip-empty patch.
>
> Boris Burkov (2):
>   btrfs: skip reclaim if block_group is empty
>   btrfs: re-check reclaim condition in reclaim worker

Makes sense and both patches look good to me, so:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
>  fs/btrfs/block-group.c | 83 +++++++++++++++++++++++++++++-------------
>  1 file changed, 58 insertions(+), 25 deletions(-)
>
> --
> 2.38.0
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/2] btrfs: minor reclaim tuning
  2022-10-13 22:52 [PATCH v2 0/2] btrfs: minor reclaim tuning Boris Burkov
                   ` (2 preceding siblings ...)
  2022-10-14 10:24 ` [PATCH v2 0/2] btrfs: minor reclaim tuning Filipe Manana
@ 2022-10-14 14:09 ` David Sterba
  3 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2022-10-14 14:09 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team, 'Filipe Manana '

On Thu, Oct 13, 2022 at 03:52:08PM -0700, Boris Burkov wrote:
> Two minor reclaim fixes that reduce relocations when they aren't quite
> necessary. These are a basic first step in a broader effort to reduce
> the alarmingly high rate of relocation we have observed in production
> at Meta.
> 
> The first patch skips empty relocation.
> 
> The second patch skips relocation that no longer passes the reclaim
> threshold check at reclaim time.
> 
> Changes in v2:
> - added the re-check patch
> - improved commit message and comment in the skip-empty patch.
> 
> Boris Burkov (2):
>   btrfs: skip reclaim if block_group is empty
>   btrfs: re-check reclaim condition in reclaim worker

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-10-14 14:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-13 22:52 [PATCH v2 0/2] btrfs: minor reclaim tuning Boris Burkov
2022-10-13 22:52 ` [PATCH v2 1/2] btrfs: skip reclaim if block_group is empty Boris Burkov
2022-10-13 22:52 ` [PATCH v2 2/2] btrfs: re-check reclaim condition in reclaim worker Boris Burkov
2022-10-14  8:43   ` David Sterba
2022-10-14 10:24 ` [PATCH v2 0/2] btrfs: minor reclaim tuning Filipe Manana
2022-10-14 14:09 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.