* [PATCH v2 0/4] btrfs: rework background block group relocation
@ 2022-03-29  8:56 Johannes Thumshirn
  2022-03-29  8:56 ` [PATCH v2 1/4] btrfs: make the bg_reclaim_threshold per-space info Johannes Thumshirn
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2022-03-29  8:56 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, Naohiro Aota, Josef Bacik, Pankaj Raghav,
	linux-btrfs@vger.kernel.org

This is a combination of Josef's series titled "btrfs: rework background
block group relocation" and my patch titled "btrfs: zoned: make auto-reclaim
less aggressive" plus another preparation patch to address Josef's comments.

I've opted for rebasing my patch onto Josef's series to avoid conflicts,
as we're both touching the same code.

Here's the original cover letter from Josef:

Currently the background block group relocation code only works for zoned
devices, as it prevents the file system from becoming unusable because of block
group fragmentation.

However inside Facebook our common workload is to download tens of gigabytes
worth of send files or package files, and it does this by fallocate()'ing the
entire package, writing into it, and then freeing it up afterwards.
Unfortunately this leads to a problem similar to the zoned case: we get
fragmented data block groups, and this trends towards filling the entire disk
with partly used data block groups, which then leads to ENOSPC because of the
lack of metadata space.

Because of this we have been running balance internally forever, but this was
triggered based on different size usage heuristics and still gave us a high
enough failure rate that it was annoying (figure 10-20 machines needing to be
reprovisioned per week).

So I modified the existing bg_reclaim_threshold code to also apply in the !zoned
case, and I also made it only apply to DATA block groups.  This has completely
eliminated these random failure cases, and we're no longer reprovisioning
machines that get stuck with 0 metadata space.

However my internal patch is kind of janky as it hard codes the DATA check.
What I've done here is make the bg_reclaim_threshold per-space_info, so that
a user can target all block group types or just the ones they care about.  This
won't break any current users because this only applied in the zoned case
before.

Additionally I've added the code to allow this to work in the !zoned case, and
loosened the restriction on the threshold from 50-100 to 0-100.
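
As a usage sketch: the helper below is hypothetical and not part of this
series, and the sysfs location is an assumption inferred from where patch 1
registers the per-space_info attributes, i.e.
/sys/fs/btrfs/<FSID>/allocation/<type>/bg_reclaim_threshold.  With this
series applied, any value from 0 to 100 is accepted and 0 disables
background reclaim for that space_info.

	/*
	 * Hypothetical user-space helper: set bg_reclaim_threshold of the
	 * data space_info for the filesystem identified by fsid.  The
	 * sysfs path is an assumption based on where the per-space_info
	 * attributes are registered.
	 */
	#include <stdio.h>

	static int set_data_reclaim_threshold(const char *fsid, int thresh)
	{
		char path[256];
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/fs/btrfs/%s/allocation/data/bg_reclaim_threshold",
			 fsid);
		f = fopen(path, "w");
		if (!f)
			return -1;
		fprintf(f, "%d\n", thresh);
		return fclose(f);
	}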

I tested this on my vm by writing 500m files and then removing half of them and
validating that the block groups were automatically reclaimed.

https://lore.kernel.org/linux-btrfs/cover.1646934721.git.josef@toxicpanda.com/

Changes to v1:
* Fix zoned threshold calculation (Pankaj)
* Drop unneeded patch

Johannes Thumshirn (1):
  btrfs: zoned: make auto-reclaim less aggressive

Josef Bacik (3):
  btrfs: make the bg_reclaim_threshold per-space info
  btrfs: allow block group background reclaim for !zoned fs'es
  btrfs: change the bg_reclaim_threshold valid region from 0 to 100

 fs/btrfs/block-group.c      | 41 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/free-space-cache.c |  7 +++++--
 fs/btrfs/space-info.c       |  9 ++++++++
 fs/btrfs/space-info.h       |  6 ++++++
 fs/btrfs/sysfs.c            | 37 +++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.c            | 28 +++++++++++++++++++++++++
 fs/btrfs/zoned.h            | 12 ++++++-----
 7 files changed, 133 insertions(+), 7 deletions(-)

-- 
2.35.1


* [PATCH v2 1/4] btrfs: make the bg_reclaim_threshold per-space info
  2022-03-29  8:56 [PATCH v2 0/4] btrfs: rework background block group relocation Johannes Thumshirn
@ 2022-03-29  8:56 ` Johannes Thumshirn
  2022-03-29  8:56 ` [PATCH v2 2/4] btrfs: allow block group background reclaim for !zoned fs'es Johannes Thumshirn
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2022-03-29  8:56 UTC (permalink / raw)
  To: David Sterba
  Cc: Josef Bacik, Naohiro Aota, Pankaj Raghav,
	linux-btrfs@vger.kernel.org, Johannes Thumshirn

From: Josef Bacik <josef@toxicpanda.com>

For !zoned file systems it's useful to have the auto reclaim feature;
however there are different use cases for !zoned, for example we may not
want to ever reclaim metadata chunks, only data chunks.  Move this sysfs
flag to per-space_info.  This won't affect current users because this
tunable only ever did anything for zoned, and that is currently hidden
behind CONFIG_BTRFS_DEBUG.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ jth restore global bg_reclaim_threshold ]
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/free-space-cache.c |  7 +++++--
 fs/btrfs/space-info.c       |  9 +++++++++
 fs/btrfs/space-info.h       |  6 ++++++
 fs/btrfs/sysfs.c            | 37 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h            |  6 +-----
 5 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 01a408db5683..ef84bc5030cd 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2630,16 +2630,19 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group,
 static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 					u64 bytenr, u64 size, bool used)
 {
-	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	struct btrfs_space_info *sinfo = block_group->space_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	u64 offset = bytenr - block_group->start;
 	u64 to_free, to_unusable;
-	const int bg_reclaim_threshold = READ_ONCE(fs_info->bg_reclaim_threshold);
+	int bg_reclaim_threshold = 0;
 	bool initial = (size == block_group->length);
 	u64 reclaimable_unusable;
 
 	WARN_ON(!initial && offset + size > block_group->zone_capacity);
 
+	if (!initial)
+		bg_reclaim_threshold = READ_ONCE(sinfo->bg_reclaim_threshold);
+
 	spin_lock(&ctl->tree_lock);
 	if (!used)
 		to_free = size;
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index b87931a458eb..60d0a58c4644 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -181,6 +181,12 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
 		found->full = 0;
 }
 
+/*
+ * Block groups with more than this value (percents) of unusable space will be
+ * scheduled for background reclaim.
+ */
+#define BTRFS_DEFAULT_ZONED_RECLAIM_THRESH	75
+
 static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 {
 
@@ -203,6 +209,9 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 	INIT_LIST_HEAD(&space_info->priority_tickets);
 	space_info->clamp = 1;
 
+	if (btrfs_is_zoned(info))
+		space_info->bg_reclaim_threshold = BTRFS_DEFAULT_ZONED_RECLAIM_THRESH;
+
 	ret = btrfs_sysfs_add_space_info_type(info, space_info);
 	if (ret)
 		return ret;
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index d841fed73492..0c45f539e3cf 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -24,6 +24,12 @@ struct btrfs_space_info {
 				   the space info if we had an ENOSPC in the
 				   allocator. */
 
+	/*
+	 * Once a block group drops below this threshold we'll schedule it for
+	 * reclaim.
+	 */
+	int bg_reclaim_threshold;
+
 	int clamp;		/* Used to scale our threshold for preemptive
 				   flushing. The value is >> clamp, so turns
 				   out to be a 2^clamp divisor. */
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 17389a42a3ab..90da1ea0cae0 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -722,6 +722,42 @@ SPACE_INFO_ATTR(bytes_zone_unusable);
 SPACE_INFO_ATTR(disk_used);
 SPACE_INFO_ATTR(disk_total);
 
+static ssize_t btrfs_sinfo_bg_reclaim_threshold_show(struct kobject *kobj,
+						     struct kobj_attribute *a,
+						     char *buf)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+	ssize_t ret;
+
+	ret = sysfs_emit(buf, "%d\n", READ_ONCE(space_info->bg_reclaim_threshold));
+
+	return ret;
+}
+
+static ssize_t btrfs_sinfo_bg_reclaim_threshold_store(struct kobject *kobj,
+						      struct kobj_attribute *a,
+						      const char *buf, size_t len)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+	int thresh;
+	int ret;
+
+	ret = kstrtoint(buf, 10, &thresh);
+	if (ret)
+		return ret;
+
+	if (thresh != 0 && (thresh <= 50 || thresh > 100))
+		return -EINVAL;
+
+	WRITE_ONCE(space_info->bg_reclaim_threshold, thresh);
+
+	return len;
+}
+
+BTRFS_ATTR_RW(space_info, bg_reclaim_threshold,
+	      btrfs_sinfo_bg_reclaim_threshold_show,
+	      btrfs_sinfo_bg_reclaim_threshold_store);
+
 /*
  * Allocation information about block group types.
  *
@@ -738,6 +774,7 @@ static struct attribute *space_info_attrs[] = {
 	BTRFS_ATTR_PTR(space_info, bytes_zone_unusable),
 	BTRFS_ATTR_PTR(space_info, disk_used),
 	BTRFS_ATTR_PTR(space_info, disk_total),
+	BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold),
 	NULL,
 };
 ATTRIBUTE_GROUPS(space_info);
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index cbf016a7bb5d..c489c08d7fd5 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -10,11 +10,7 @@
 #include "block-group.h"
 #include "btrfs_inode.h"
 
-/*
- * Block groups with more than this value (percents) of unusable space will be
- * scheduled for background reclaim.
- */
-#define BTRFS_DEFAULT_RECLAIM_THRESH		75
+#define BTRFS_DEFAULT_RECLAIM_THRESH           75
 
 struct btrfs_zoned_device_info {
 	/*
-- 
2.35.1


* [PATCH v2 2/4] btrfs: allow block group background reclaim for !zoned fs'es
  2022-03-29  8:56 [PATCH v2 0/4] btrfs: rework background block group relocation Johannes Thumshirn
  2022-03-29  8:56 ` [PATCH v2 1/4] btrfs: make the bg_reclaim_threshold per-space info Johannes Thumshirn
@ 2022-03-29  8:56 ` Johannes Thumshirn
  2022-03-29  8:56 ` [PATCH v2 3/4] btrfs: change the bg_reclaim_threshold valid region from 0 to 100 Johannes Thumshirn
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2022-03-29  8:56 UTC (permalink / raw)
  To: David Sterba
  Cc: Josef Bacik, Naohiro Aota, Pankaj Raghav,
	linux-btrfs@vger.kernel.org, Johannes Thumshirn

From: Josef Bacik <josef@toxicpanda.com>

We have found this feature invaluable at Facebook due to how our
workload interacts with the allocator.  We have been using this in
production for months with only a single problem that has already been
fixed.  This will allow us to set a threshold for block groups to be
automatically relocated even if we don't have zoned devices.
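
As a worked example of the check added below (numbers are illustrative):
with bg_reclaim_threshold set to 50 on the data space_info, a 1 GiB data
block group is marked for reclaim once a free drops its used bytes below
512 MiB while they were at or above 512 MiB before the free; block groups
that never reached the threshold (e.g. brand new ones still being filled)
are left alone.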

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/block-group.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 59f18a10fd5f..628741ecb97b 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3208,6 +3208,31 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
 	return ret;
 }
 
+static inline bool should_reclaim_block_group(struct btrfs_block_group *block_group,
+					      u64 bytes_freed)
+{
+	const struct btrfs_space_info *space_info = block_group->space_info;
+	const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
+	const u64 new_val = block_group->used;
+	const u64 old_val = new_val + bytes_freed;
+	u64 thresh;
+
+	if (reclaim_thresh == 0)
+		return false;
+
+	thresh = div_factor_fine(block_group->length, reclaim_thresh);
+
+	/*
+	 * If we were below the threshold before don't reclaim, we are likely a
+	 * brand new block group and we don't want to relocate new block groups.
+	 */
+	if (old_val < thresh)
+		return false;
+	if (new_val >= thresh)
+		return false;
+	return true;
+}
+
 int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 			     u64 bytenr, u64 num_bytes, bool alloc)
 {
@@ -3230,6 +3255,8 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 	spin_unlock(&info->delalloc_root_lock);
 
 	while (total) {
+		bool reclaim;
+
 		cache = btrfs_lookup_block_group(info, bytenr);
 		if (!cache) {
 			ret = -ENOENT;
@@ -3275,6 +3302,8 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 					cache->space_info, num_bytes);
 			cache->space_info->bytes_used -= num_bytes;
 			cache->space_info->disk_used -= num_bytes * factor;
+
+			reclaim = should_reclaim_block_group(cache, num_bytes);
 			spin_unlock(&cache->lock);
 			spin_unlock(&cache->space_info->lock);
 
@@ -3301,6 +3330,8 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		if (!alloc && old_val == 0) {
 			if (!btrfs_test_opt(info, DISCARD_ASYNC))
 				btrfs_mark_bg_unused(cache);
+		} else if (!alloc && reclaim) {
+			btrfs_mark_bg_to_reclaim(cache);
 		}
 
 		btrfs_put_block_group(cache);
-- 
2.35.1


* [PATCH v2 3/4] btrfs: change the bg_reclaim_threshold valid region from 0 to 100
  2022-03-29  8:56 [PATCH v2 0/4] btrfs: rework background block group relocation Johannes Thumshirn
  2022-03-29  8:56 ` [PATCH v2 1/4] btrfs: make the bg_reclaim_threshold per-space info Johannes Thumshirn
  2022-03-29  8:56 ` [PATCH v2 2/4] btrfs: allow block group background reclaim for !zoned fs'es Johannes Thumshirn
@ 2022-03-29  8:56 ` Johannes Thumshirn
  2022-03-29  8:56 ` [PATCH v2 4/4] btrfs: zoned: make auto-reclaim less aggressive Johannes Thumshirn
  2022-04-04 15:50 ` [PATCH v2 0/4] btrfs: rework background block group relocation David Sterba
  4 siblings, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2022-03-29  8:56 UTC (permalink / raw)
  To: David Sterba
  Cc: Josef Bacik, Naohiro Aota, Pankaj Raghav,
	linux-btrfs@vger.kernel.org, Johannes Thumshirn

From: Josef Bacik <josef@toxicpanda.com>

For the !zoned case we may want to set the threshold for reclaim to
something below 50%.  Change the acceptable threshold from 50-100 to
0-100.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 90da1ea0cae0..fdf9bf789528 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -746,7 +746,7 @@ static ssize_t btrfs_sinfo_bg_reclaim_threshold_store(struct kobject *kobj,
 	if (ret)
 		return ret;
 
-	if (thresh != 0 && (thresh <= 50 || thresh > 100))
+	if (thresh < 0 || thresh > 100)
 		return -EINVAL;
 
 	WRITE_ONCE(space_info->bg_reclaim_threshold, thresh);
-- 
2.35.1


* [PATCH v2 4/4] btrfs: zoned: make auto-reclaim less aggressive
  2022-03-29  8:56 [PATCH v2 0/4] btrfs: rework background block group relocation Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2022-03-29  8:56 ` [PATCH v2 3/4] btrfs: change the bg_reclaim_threshold valid region from 0 to 100 Johannes Thumshirn
@ 2022-03-29  8:56 ` Johannes Thumshirn
  2022-03-30 15:22   ` Pankaj Raghav
  2022-04-04 15:48   ` David Sterba
  2022-04-04 15:50 ` [PATCH v2 0/4] btrfs: rework background block group relocation David Sterba
  4 siblings, 2 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2022-03-29  8:56 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, Naohiro Aota, Josef Bacik, Pankaj Raghav,
	linux-btrfs@vger.kernel.org

The current auto-reclaim algorithm starts reclaiming all block-group's
with a zone_unusable value above a configured threshold. This is causing a
lot of reclaim IO even if there would be enough free zones on the device.

Instead of only accounting a block-group's zone_unusable value, also take
the ratio of free and not usable (written as well as zone_unusable) bytes
a device has into account.
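
As an illustrative example (assumed numbers): with two 100 GiB zoned devices
and 150 GiB in total allocated to block groups, used * 100 / total is 75,
which meets the default zoned threshold of 75 and lets the reclaim worker
run; at 100 GiB allocated the factor is 50, btrfs_zoned_should_reclaim()
returns false and the worker exits before issuing any reclaim IO.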

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/block-group.c | 10 ++++++++++
 fs/btrfs/zoned.c       | 28 ++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  6 ++++++
 3 files changed, 44 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 628741ecb97b..12454304bb85 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1512,6 +1512,13 @@ static int reclaim_bgs_cmp(void *unused, const struct list_head *a,
 	return bg1->used > bg2->used;
 }
 
+static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
+{
+	if (btrfs_is_zoned(fs_info))
+		return btrfs_zoned_should_reclaim(fs_info);
+	return true;
+}
+
 void btrfs_reclaim_bgs_work(struct work_struct *work)
 {
 	struct btrfs_fs_info *fs_info =
@@ -1522,6 +1529,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
 		return;
 
+	if (!btrfs_should_reclaim(fs_info))
+		return;
+
 	sb_start_write(fs_info->sb);
 
 	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 1b1b310c3c51..c0c460749b74 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2079,3 +2079,31 @@ void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
 	}
 	mutex_unlock(&fs_devices->device_list_mutex);
 }
+
+bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 used = 0;
+	u64 total = 0;
+	u64 factor;
+
+	ASSERT(btrfs_is_zoned(fs_info));
+
+	if (!fs_info->bg_reclaim_threshold)
+		return false;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (!device->bdev)
+			continue;
+
+		total += device->disk_total_bytes;
+		used += device->bytes_used;
+
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	factor = div64_u64(used * 100, total);
+	return factor >= fs_info->bg_reclaim_threshold;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index c489c08d7fd5..f2d16395087f 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -74,6 +74,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
 			     u64 length);
 void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
 void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
+bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -232,6 +233,11 @@ static inline void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info,
 static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
 
 static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
+
+static inline bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
+{
+	return false;
+}
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.35.1


* Re: [PATCH v2 4/4] btrfs: zoned: make auto-reclaim less aggressive
  2022-03-29  8:56 ` [PATCH v2 4/4] btrfs: zoned: make auto-reclaim less aggressive Johannes Thumshirn
@ 2022-03-30 15:22   ` Pankaj Raghav
  2022-04-04 15:48   ` David Sterba
  1 sibling, 0 replies; 8+ messages in thread
From: Pankaj Raghav @ 2022-03-30 15:22 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: Naohiro Aota, Josef Bacik, linux-btrfs@vger.kernel.org

LGTM. Tested it in QEMU with zcap == zsize and zcap != zsize.

Tested-by: Pankaj Raghav <p.raghav@samsung.com>

On 2022-03-29 10:56, Johannes Thumshirn wrote:
> The current auto-reclaim algorithm starts reclaiming all block-group's
> with a zone_unusable value above a configured threshold. This is causing a
> lot of reclaim IO even if there would be enough free zones on the device.
> 
> Instead of only accounting a block-group's zone_unusable value, also take
> the ratio of free and not usable (written as well as zone_unusable) bytes
> a device has into account.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/block-group.c | 10 ++++++++++
>  fs/btrfs/zoned.c       | 28 ++++++++++++++++++++++++++++
>  fs/btrfs/zoned.h       |  6 ++++++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 628741ecb97b..12454304bb85 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1512,6 +1512,13 @@ static int reclaim_bgs_cmp(void *unused, const struct list_head *a,
>  	return bg1->used > bg2->used;
>  }
>  
> +static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
> +{
> +	if (btrfs_is_zoned(fs_info))
> +		return btrfs_zoned_should_reclaim(fs_info);
> +	return true;
> +}
> +
>  void btrfs_reclaim_bgs_work(struct work_struct *work)
>  {
>  	struct btrfs_fs_info *fs_info =
> @@ -1522,6 +1529,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>  	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
>  		return;
>  
> +	if (!btrfs_should_reclaim(fs_info))
> +		return;
> +
>  	sb_start_write(fs_info->sb);
>  
>  	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 1b1b310c3c51..c0c460749b74 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2079,3 +2079,31 @@ void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
>  	}
>  	mutex_unlock(&fs_devices->device_list_mutex);
>  }
> +
> +bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> +	struct btrfs_device *device;
> +	u64 used = 0;
> +	u64 total = 0;
> +	u64 factor;
> +
> +	ASSERT(btrfs_is_zoned(fs_info));
> +
> +	if (!fs_info->bg_reclaim_threshold)
> +		return false;
> +
> +	mutex_lock(&fs_devices->device_list_mutex);
> +	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +		if (!device->bdev)
> +			continue;
> +
> +		total += device->disk_total_bytes;
> +		used += device->bytes_used;
> +
> +	}
> +	mutex_unlock(&fs_devices->device_list_mutex);
> +
> +	factor = div64_u64(used * 100, total);
> +	return factor >= fs_info->bg_reclaim_threshold;
> +}
> diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
> index c489c08d7fd5..f2d16395087f 100644
> --- a/fs/btrfs/zoned.h
> +++ b/fs/btrfs/zoned.h
> @@ -74,6 +74,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
>  			     u64 length);
>  void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
>  void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
> +bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info);
>  #else /* CONFIG_BLK_DEV_ZONED */
>  static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
>  				     struct blk_zone *zone)
> @@ -232,6 +233,11 @@ static inline void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info,
>  static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
>  
>  static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
> +
> +static inline bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
> +{
> +	return false;
> +}
>  #endif
>  
>  static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

-- 
Regards,
Pankaj


* Re: [PATCH v2 4/4] btrfs: zoned: make auto-reclaim less aggressive
  2022-03-29  8:56 ` [PATCH v2 4/4] btrfs: zoned: make auto-reclaim less aggressive Johannes Thumshirn
  2022-03-30 15:22   ` Pankaj Raghav
@ 2022-04-04 15:48   ` David Sterba
  1 sibling, 0 replies; 8+ messages in thread
From: David Sterba @ 2022-04-04 15:48 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Naohiro Aota, Josef Bacik, Pankaj Raghav,
	linux-btrfs@vger.kernel.org

On Tue, Mar 29, 2022 at 01:56:09AM -0700, Johannes Thumshirn wrote:
> The current auto-reclaim algorithm starts reclaiming all block-group's

Please write it as 'block group' in the text; also, 's does not form a
plural.

> with a zone_unusable value above a configured threshold. This is causing a
> lot of reclaim IO even if there would be enough free zones on the device.
> 
> Instead of only accounting a block-group's zone_unusable value, also take
> the ratio of free and not usable (written as well as zone_unusable) bytes
> a device has into account.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/block-group.c | 10 ++++++++++
>  fs/btrfs/zoned.c       | 28 ++++++++++++++++++++++++++++
>  fs/btrfs/zoned.h       |  6 ++++++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 628741ecb97b..12454304bb85 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1512,6 +1512,13 @@ static int reclaim_bgs_cmp(void *unused, const struct list_head *a,
>  	return bg1->used > bg2->used;
>  }
>  
> +static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
> +{
> +	if (btrfs_is_zoned(fs_info))
> +		return btrfs_zoned_should_reclaim(fs_info);
> +	return true;
> +}
> +
>  void btrfs_reclaim_bgs_work(struct work_struct *work)
>  {
>  	struct btrfs_fs_info *fs_info =
> @@ -1522,6 +1529,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>  	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
>  		return;
>  
> +	if (!btrfs_should_reclaim(fs_info))
> +		return;
> +
>  	sb_start_write(fs_info->sb);
>  
>  	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 1b1b310c3c51..c0c460749b74 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2079,3 +2079,31 @@ void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
>  	}
>  	mutex_unlock(&fs_devices->device_list_mutex);
>  }
> +
> +bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> +	struct btrfs_device *device;
> +	u64 used = 0;
> +	u64 total = 0;
> +	u64 factor;
> +
> +	ASSERT(btrfs_is_zoned(fs_info));
> +
> +	if (!fs_info->bg_reclaim_threshold)

For integer values it's IMHO better to use == 0 as ! is for bool
variables.

> +		return false;
> +
> +	mutex_lock(&fs_devices->device_list_mutex);
> +	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +		if (!device->bdev)
> +			continue;
> +
> +		total += device->disk_total_bytes;
> +		used += device->bytes_used;
> +
> +	}
> +	mutex_unlock(&fs_devices->device_list_mutex);
> +
> +	factor = div64_u64(used * 100, total);

Seems we can't avoid 64bit division here, at least it's not perf
critical.

> +	return factor >= fs_info->bg_reclaim_threshold;
> +}

* Re: [PATCH v2 0/4] btrfs: rework background block group relocation
  2022-03-29  8:56 [PATCH v2 0/4] btrfs: rework background block group relocation Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2022-03-29  8:56 ` [PATCH v2 4/4] btrfs: zoned: make auto-reclaim less aggressive Johannes Thumshirn
@ 2022-04-04 15:50 ` David Sterba
  4 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2022-04-04 15:50 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Naohiro Aota, Josef Bacik, Pankaj Raghav,
	linux-btrfs@vger.kernel.org

On Tue, Mar 29, 2022 at 01:56:05AM -0700, Johannes Thumshirn wrote:
> This is a combination of Josef's series titled "btrfs: rework background
> block group relocation" and my patch titled "btrfs: zoned: make auto-reclaim
> less aggressive" plus another preparation patch to address Josef's comments.
> 
> I've opted for rebasing my patch onto Josef's series to avoid conflicts,
> as we're both touching the same code.
> 
> Here's the original cover letter from Josef:
> 
> Currently the background block group relocation code only works for zoned
> devices, as it prevents the file system from becoming unusable because of block
> group fragmentation.
> 
> However inside Facebook our common workload is to download tens of gigabytes
> worth of send files or package files, and it does this by fallocate()'ing the
> entire package, writing into it, and then freeing it up afterwards.
> Unfortunately this leads to a problem similar to the zoned case: we get
> fragmented data block groups, and this trends towards filling the entire disk
> with partly used data block groups, which then leads to ENOSPC because of the
> lack of metadata space.
> 
> Because of this we have been running balance internally forever, but this was
> triggered based on different size usage heuristics and still gave us a high
> enough failure rate that it was annoying (figure 10-20 machines needing to be
> reprovisioned per week).
> 
> So I modified the existing bg_reclaim_threshold code to also apply in the !zoned
> case, and I also made it only apply to DATA block groups.  This has completely
> eliminated these random failure cases, and we're no longer reprovisioning
> machines that get stuck with 0 metadata space.
> 
> However my internal patch is kind of janky as it hard codes the DATA check.
> What I've done here is make the bg_reclaim_threshold per-space_info, so that
> a user can target all block group types or just the ones they care about.  This
> won't break any current users because this only applied in the zoned case
> before.
> 
> Additionally I've added the code to allow this to work in the !zoned case, and
> loosened the restriction on the threshold from 50-100 to 0-100.
> 
> I tested this on my vm by writing 500m files and then removing half of them and
> validating that the block groups were automatically reclaimed.
> 
> https://lore.kernel.org/linux-btrfs/cover.1646934721.git.josef@toxicpanda.com/
> 
> Changes to v1:
> * Fix zoned threshold calculation (Pankaj)
> * Drop unneeded patch
> 
> Johannes Thumshirn (1):
>   btrfs: zoned: make auto-reclaim less aggressive
> 
> Josef Bacik (3):
>   btrfs: make the bg_reclaim_threshold per-space info
>   btrfs: allow block group background reclaim for !zoned fs'es
>   btrfs: change the bg_reclaim_threshold valid region from 0 to 100

Added to misc-next, thanks.
