* [PATCH RFC 0/3] btrfs: zoned: reclaim block-groups more aggressively @ 2024-03-28 13:56 Johannes Thumshirn 2024-03-28 13:56 ` [PATCH RFC PATCH 1/3] btrfs: zoned: traverse device list in should reclaim under rcu_read_lock Johannes Thumshirn ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Johannes Thumshirn @ 2024-03-28 13:56 UTC (permalink / raw) To: Chris Mason, Josef Bacik, David Sterba Cc: linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Damien LeMoal, Boris Burkov, Johannes Thumshirn Recently we had internal reports that zoned btrfs can hit -ENOSPC prematurely because it runs out of allocatable zones on the drive, even though enough space would be left if we reclaimed more aggressively. The following fio job is an example of how to trigger the failure. [test] filename=$SCRATCH_MNT readwrite=write ioengine=libaio direct=1 loops=2 filesize=$60_PERCENT_OF_DRIVE bs=128k The reason this is still an RFC is that it only ensures free DATA block groups, not METADATA block groups. Of course the same principle could be applied to METADATA block groups as well, but I'd prefer to have a discussion on the general direction first. --- Johannes Thumshirn (3): btrfs: zoned: traverse device list in should reclaim under rcu_read_lock btrfs: zoned: reserve relocation zone on mount btrfs: zoned: kick cleaner kthread if low on space fs/btrfs/disk-io.c | 2 ++ fs/btrfs/zoned.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- fs/btrfs/zoned.h | 3 +++ 3 files changed, 59 insertions(+), 2 deletions(-) --- base-commit: c22750cb802ea5fd510fa66f77aefa3a0529951b change-id: 20240328-hans-2c9e3b4a69e8 Best regards, -- Johannes Thumshirn <jth@kernel.org> ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC PATCH 1/3] btrfs: zoned: traverse device list in should reclaim under rcu_read_lock 2024-03-28 13:56 [PATCH RFC 0/3] btrfs: zoned: reclaim block-groups more aggressively Johannes Thumshirn @ 2024-03-28 13:56 ` Johannes Thumshirn 2024-04-04 19:35 ` David Sterba 2024-03-28 13:56 ` [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount Johannes Thumshirn 2024-03-28 13:56 ` [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space Johannes Thumshirn 2 siblings, 1 reply; 13+ messages in thread From: Johannes Thumshirn @ 2024-03-28 13:56 UTC (permalink / raw) To: Chris Mason, Josef Bacik, David Sterba Cc: linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Damien LeMoal, Boris Burkov, Johannes Thumshirn From: Johannes Thumshirn <johannes.thumshirn@wdc.com> btrfs_zoned_should_reclaim() traverses the device list with the device_list_mutex held, but we never change the device list there; all we do is gather the used and total bytes. So change the list traversal from holding the device_list_mutex to rcu_read_lock(). This also opens up the possibility of calling btrfs_zoned_should_reclaim() with the chunk_mutex held. 
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> --- fs/btrfs/zoned.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 4cba80b34387..d51faf7f4162 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -2446,7 +2446,7 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info) if (fs_info->bg_reclaim_threshold == 0) return false; - mutex_lock(&fs_devices->device_list_mutex); + rcu_read_lock(); list_for_each_entry(device, &fs_devices->devices, dev_list) { if (!device->bdev) continue; @@ -2454,7 +2454,7 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info) total += device->disk_total_bytes; used += device->bytes_used; } - mutex_unlock(&fs_devices->device_list_mutex); + rcu_read_unlock(); factor = div64_u64(used * 100, total); return factor >= fs_info->bg_reclaim_threshold; -- 2.35.3 ^ permalink raw reply related [flat|nested] 13+ messages in thread
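The locking change above only affects how the device list is walked; the computation stays read-only, which is why RCU is sufficient. A minimal userspace sketch of that computation (`struct device`, `should_reclaim` and the field names are simplified stand-ins for the kernel types, not the real API) illustrates it:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct btrfs_device; only the fields read by
 * btrfs_zoned_should_reclaim() are modeled here. */
struct device {
	int has_bdev;               /* stands in for device->bdev != NULL */
	uint64_t disk_total_bytes;
	uint64_t bytes_used;
};

/* Mirrors the reclaim decision: sum used/total over all devices with an
 * open bdev and compare the used percentage against the reclaim
 * threshold.  In the kernel this walk is exactly what patch 1 moves from
 * the device_list_mutex to rcu_read_lock(): the values are only read,
 * never modified, so no writer exclusion is required. */
int should_reclaim(const struct device *devs, int ndev, uint64_t threshold)
{
	uint64_t used = 0, total = 0;

	if (threshold == 0)
		return 0;

	for (int i = 0; i < ndev; i++) {
		if (!devs[i].has_bdev)
			continue;
		total += devs[i].disk_total_bytes;
		used += devs[i].bytes_used;
	}

	if (total == 0)   /* userspace guard; div64_u64() is used in-kernel */
		return 0;

	return used * 100 / total >= threshold;
}
```

Devices without an open bdev are skipped, matching the `if (!device->bdev) continue;` in the patched hunk.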
* Re: [PATCH RFC PATCH 1/3] btrfs: zoned: traverse device list in should reclaim under rcu_read_lock 2024-03-28 13:56 ` [PATCH RFC PATCH 1/3] btrfs: zoned: traverse device list in should reclaim under rcu_read_lock Johannes Thumshirn @ 2024-04-04 19:35 ` David Sterba 0 siblings, 0 replies; 13+ messages in thread From: David Sterba @ 2024-04-04 19:35 UTC (permalink / raw) To: Johannes Thumshirn Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Damien LeMoal, Boris Burkov, Johannes Thumshirn Subject: btrfs: zoned: traverse device list in should reclaim under rcu_read_lock Please use the function name in the subject instead of the description, so like btrfs: zoned: traverse device list under RCU lock in btrfs_zoned_should_reclaim() On Thu, Mar 28, 2024 at 02:56:31PM +0100, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > As btrfs_zoned_should_reclaim() traverses the device list with the > device_list_mutex held. But we're never changing the device list. All we > do is gathering the used and total bytes. > > So change the list traversal from the holding the device_list_mutex to > rcu_read_lock(). This also opens up the possibilities to call > btrfs_zoned_should_reclaim() with the chunk_mutex held. You can add this patch independently, the device_list_mutex does not seem to be needed for strong consistency of the values you read. There are several other places where devices are under RCU only, like btrfs_commit_device_sizes (though this is also in the transaction context), various ioctls and btrfs_calc_avail_data_space(). ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount 2024-03-28 13:56 [PATCH RFC 0/3] btrfs: zoned: reclaim block-groups more aggressively Johannes Thumshirn 2024-03-28 13:56 ` [PATCH RFC PATCH 1/3] btrfs: zoned: traverse device list in should reclaim under rcu_read_lock Johannes Thumshirn @ 2024-03-28 13:56 ` Johannes Thumshirn 2024-03-28 23:05 ` Damien Le Moal 2024-04-05 1:14 ` Naohiro Aota 2024-03-28 13:56 ` [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space Johannes Thumshirn 2 siblings, 2 replies; 13+ messages in thread From: Johannes Thumshirn @ 2024-03-28 13:56 UTC (permalink / raw) To: Chris Mason, Josef Bacik, David Sterba Cc: linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Damien LeMoal, Boris Burkov, Johannes Thumshirn From: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reserve one zone as a data relocation target on each mount. If we already find one empty block group, there's no need to force a chunk allocation, but we can use this empty data block group as our relocation target. 
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> --- fs/btrfs/disk-io.c | 2 ++ fs/btrfs/zoned.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 3 +++ 3 files changed, 51 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 5a35c2c0bbc9..83b56f109d29 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3550,6 +3550,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device } btrfs_discard_resume(fs_info); + btrfs_reserve_relocation_zone(fs_info); + if (fs_info->uuid_root && (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) || fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) { diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index d51faf7f4162..fb8707f4cab5 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -17,6 +17,7 @@ #include "fs.h" #include "accessors.h" #include "bio.h" +#include "transaction.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -2634,3 +2635,48 @@ void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) } spin_unlock(&fs_info->zone_active_bgs_lock); } + +static u64 find_empty_block_group(struct btrfs_space_info *sinfo) +{ + struct btrfs_block_group *bg; + + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) { + list_for_each_entry(bg, &sinfo->block_groups[i], list) { + if (bg->used == 0) + return bg->start; + } + } + + return 0; +} + +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) +{ + struct btrfs_root *tree_root = fs_info->tree_root; + struct btrfs_space_info *sinfo = fs_info->data_sinfo; + struct btrfs_trans_handle *trans; + u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags); + u64 bytenr = 0; + + if (!btrfs_is_zoned(fs_info)) + return; + + bytenr = find_empty_block_group(sinfo); + if (!bytenr) { + int ret; + + trans = btrfs_join_transaction(tree_root); + if (IS_ERR(trans)) + return; + + ret = btrfs_chunk_alloc(trans, flags, 
CHUNK_ALLOC_FORCE); + btrfs_end_transaction(trans); + + if (!ret) + bytenr = find_empty_block_group(sinfo); + } + + spin_lock(&fs_info->relocation_bg_lock); + fs_info->data_reloc_bg = bytenr; + spin_unlock(&fs_info->relocation_bg_lock); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 77c4321e331f..048ffada4549 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -97,6 +97,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info); int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info, bool do_finish); void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info); +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -271,6 +272,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { } +static inline void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) -- 2.35.3 ^ permalink raw reply related [flat|nested] 13+ messages in thread
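The core of the new helper can be exercised in isolation. The following is a userspace sketch, not the kernel code: the `list_for_each_entry()` walk over `sinfo->block_groups[]` is replaced by a plain singly linked list, and `BTRFS_NR_RAID_TYPES` is reduced to 2 for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_RAID_TYPES 2   /* reduced from BTRFS_NR_RAID_TYPES for the sketch */

/* Minimal stand-in for struct btrfs_block_group. */
struct block_group {
	uint64_t start;            /* logical start address */
	uint64_t used;             /* used bytes */
	struct block_group *next;
};

struct space_info {
	struct block_group *block_groups[NR_RAID_TYPES];
};

/* Same shape as the patch's find_empty_block_group(): scan every
 * per-RAID-type list and return the start of the first block group with
 * zero used bytes, or 0 if none exists.  0 works as a "not found"
 * sentinel here because no block group starts at logical address 0. */
uint64_t find_empty_block_group(struct space_info *sinfo)
{
	for (int i = 0; i < NR_RAID_TYPES; i++) {
		for (struct block_group *bg = sinfo->block_groups[i];
		     bg; bg = bg->next) {
			if (bg->used == 0)
				return bg->start;
		}
	}
	return 0;
}
```

The caller then either uses the returned address as `data_reloc_bg` or, on 0, force-allocates a chunk and scans again, as `btrfs_reserve_relocation_zone()` does above.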
* Re: [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount 2024-03-28 13:56 ` [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount Johannes Thumshirn @ 2024-03-28 23:05 ` Damien Le Moal 2024-04-02 6:03 ` Naohiro Aota 2024-04-05 1:14 ` Naohiro Aota 1 sibling, 1 reply; 13+ messages in thread From: Damien Le Moal @ 2024-03-28 23:05 UTC (permalink / raw) To: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba Cc: linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Boris Burkov, Johannes Thumshirn On 3/28/24 22:56, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > Reserve one zone as a data relocation target on each mount. If we already > find one empty block group, there's no need to force a chunk allocation, > but we can use this empty data block group as our relocation target. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/disk-io.c | 2 ++ > fs/btrfs/zoned.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/zoned.h | 3 +++ > 3 files changed, 51 insertions(+) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 5a35c2c0bbc9..83b56f109d29 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3550,6 +3550,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > } > btrfs_discard_resume(fs_info); > > + btrfs_reserve_relocation_zone(fs_info); > + > if (fs_info->uuid_root && > (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) || > fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) { > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > index d51faf7f4162..fb8707f4cab5 100644 > --- a/fs/btrfs/zoned.c > +++ b/fs/btrfs/zoned.c > @@ -17,6 +17,7 @@ > #include "fs.h" > #include "accessors.h" > #include "bio.h" > +#include "transaction.h" > > /* Maximum number of zones to report per blkdev_report_zones() call */ > #define BTRFS_REPORT_NR_ZONES 4096 > @@ -2634,3 +2635,48 @@ void 
btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) > } > spin_unlock(&fs_info->zone_active_bgs_lock); > } > + > +static u64 find_empty_block_group(struct btrfs_space_info *sinfo) > +{ > + struct btrfs_block_group *bg; > + > + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) { > + list_for_each_entry(bg, &sinfo->block_groups[i], list) { > + if (bg->used == 0) > + return bg->start; > + } > + } > + > + return 0; The first block group does not start at offset 0 ? If it does, then this is not correct. Maybe turn this function into returning a bool and add a pointer argument to return the bg start value ? > +} > + > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) > +{ > + struct btrfs_root *tree_root = fs_info->tree_root; > + struct btrfs_space_info *sinfo = fs_info->data_sinfo; > + struct btrfs_trans_handle *trans; > + u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags); > + u64 bytenr = 0; > + > + if (!btrfs_is_zoned(fs_info)) > + return; > + > + bytenr = find_empty_block_group(sinfo); > + if (!bytenr) { > + int ret; > + > + trans = btrfs_join_transaction(tree_root); > + if (IS_ERR(trans)) > + return; > + > + ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE); > + btrfs_end_transaction(trans); > + > + if (!ret) > + bytenr = find_empty_block_group(sinfo); What if this fail again ? 
> + } > + > + spin_lock(&fs_info->relocation_bg_lock); > + fs_info->data_reloc_bg = bytenr; > + spin_unlock(&fs_info->relocation_bg_lock); > +} > diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h > index 77c4321e331f..048ffada4549 100644 > --- a/fs/btrfs/zoned.h > +++ b/fs/btrfs/zoned.h > @@ -97,6 +97,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info); > int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > struct btrfs_space_info *space_info, bool do_finish); > void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info); > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info); > #else /* CONFIG_BLK_DEV_ZONED */ > static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, > struct blk_zone *zone) > @@ -271,6 +272,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { } > > +static inline void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) { } > + > #endif > > static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 13+ messages in thread
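The API change Damien suggests, returning a bool and handing the start address back through a pointer argument, might look like the following userspace sketch (the types and names are stand-ins for the kernel structures, not the actual patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_RAID_TYPES 2   /* reduced for the sketch */

struct block_group {
	uint64_t start;
	uint64_t used;
	struct block_group *next;
};

struct space_info {
	struct block_group *block_groups[NR_RAID_TYPES];
};

/* Variant per the review comment: success is reported via the return
 * value and the start address via an out-parameter, so a block group
 * starting at logical address 0 could not be confused with "nothing
 * found". */
bool find_empty_block_group(struct space_info *sinfo, uint64_t *start)
{
	for (int i = 0; i < NR_RAID_TYPES; i++) {
		for (struct block_group *bg = sinfo->block_groups[i];
		     bg; bg = bg->next) {
			if (bg->used == 0) {
				*start = bg->start;
				return true;
			}
		}
	}
	return false;
}
```

As the follow-up reply notes, btrfs never places a block group at logical address 0, so the sentinel in the posted patch is safe; this variant merely removes the implicit assumption.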
* Re: [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount 2024-03-28 23:05 ` Damien Le Moal @ 2024-04-02 6:03 ` Naohiro Aota 2024-04-02 17:04 ` Boris Burkov 0 siblings, 1 reply; 13+ messages in thread From: Naohiro Aota @ 2024-04-02 6:03 UTC (permalink / raw) To: Damien Le Moal Cc: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Hans Holmberg, hch, Boris Burkov, Johannes Thumshirn On Fri, Mar 29, 2024 at 08:05:34AM +0900, Damien Le Moal wrote: > On 3/28/24 22:56, Johannes Thumshirn wrote: > > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > > > Reserve one zone as a data relocation target on each mount. If we already > > find one empty block group, there's no need to force a chunk allocation, > > but we can use this empty data block group as our relocation target. > > > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > --- > > fs/btrfs/disk-io.c | 2 ++ > > fs/btrfs/zoned.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > > fs/btrfs/zoned.h | 3 +++ > > 3 files changed, 51 insertions(+) > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index 5a35c2c0bbc9..83b56f109d29 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -3550,6 +3550,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > > } > > btrfs_discard_resume(fs_info); > > > > + btrfs_reserve_relocation_zone(fs_info); > > + > > if (fs_info->uuid_root && > > (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) || > > fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) { > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > > index d51faf7f4162..fb8707f4cab5 100644 > > --- a/fs/btrfs/zoned.c > > +++ b/fs/btrfs/zoned.c > > @@ -17,6 +17,7 @@ > > #include "fs.h" > > #include "accessors.h" > > #include "bio.h" > > +#include "transaction.h" > > > > /* Maximum number of zones to report per blkdev_report_zones() call */ > > #define BTRFS_REPORT_NR_ZONES 
4096 > > @@ -2634,3 +2635,48 @@ void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) > > } > > spin_unlock(&fs_info->zone_active_bgs_lock); > > } > > + > > +static u64 find_empty_block_group(struct btrfs_space_info *sinfo) > > +{ > > + struct btrfs_block_group *bg; > > + > > + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) { > > + list_for_each_entry(bg, &sinfo->block_groups[i], list) { > > + if (bg->used == 0) > > + return bg->start; > > + } > > + } > > + > > + return 0; > > The first block group does not start at offset 0 ? If it does, then this is not > correct. Maybe turn this function into returning a bool and add a pointer > argument to return the bg start value ? No, it does not. The bg->start (logical address) increases monotonically as a new block group is created. And, the first block group created by mkfs.btrfs has a non-zero bg->start address. > > +} > > + > > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) > > +{ > > + struct btrfs_root *tree_root = fs_info->tree_root; > > + struct btrfs_space_info *sinfo = fs_info->data_sinfo; > > + struct btrfs_trans_handle *trans; > > + u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags); > > + u64 bytenr = 0; > > + > > + if (!btrfs_is_zoned(fs_info)) > > + return; > > + > > + bytenr = find_empty_block_group(sinfo); > > + if (!bytenr) { > > + int ret; > > + > > + trans = btrfs_join_transaction(tree_root); > > + if (IS_ERR(trans)) > > + return; > > + > > + ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE); > > + btrfs_end_transaction(trans); > > + > > + if (!ret) > > + bytenr = find_empty_block_group(sinfo); > > What if this fail again ? That (almost) means we don't have any free space to create a new block group. Then, we don't have a way to deal with it. Maybe, we can reclaim directly here? Anyway, in that case, we will set fs_info->data_reloc_bg = 0, which is the same behavior as the current code. 
> > + } > > + > > + spin_lock(&fs_info->relocation_bg_lock); > > + fs_info->data_reloc_bg = bytenr; > > + spin_unlock(&fs_info->relocation_bg_lock); > > +} > > diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h > > index 77c4321e331f..048ffada4549 100644 > > --- a/fs/btrfs/zoned.h > > +++ b/fs/btrfs/zoned.h > > @@ -97,6 +97,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info); > > int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > struct btrfs_space_info *space_info, bool do_finish); > > void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info); > > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info); > > #else /* CONFIG_BLK_DEV_ZONED */ > > static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, > > struct blk_zone *zone) > > @@ -271,6 +272,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > > > static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { } > > > > +static inline void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) { } > > + > > #endif > > > > static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) > > > > -- > Damien Le Moal > Western Digital Research > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount 2024-04-02 6:03 ` Naohiro Aota @ 2024-04-02 17:04 ` Boris Burkov 2024-04-05 5:03 ` Naohiro Aota 0 siblings, 1 reply; 13+ messages in thread From: Boris Burkov @ 2024-04-02 17:04 UTC (permalink / raw) To: Naohiro Aota Cc: Damien Le Moal, Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Hans Holmberg, hch, Johannes Thumshirn On Tue, Apr 02, 2024 at 06:03:35AM +0000, Naohiro Aota wrote: > On Fri, Mar 29, 2024 at 08:05:34AM +0900, Damien Le Moal wrote: > > On 3/28/24 22:56, Johannes Thumshirn wrote: > > > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > > > > > Reserve one zone as a data relocation target on each mount. If we already > > > find one empty block group, there's no need to force a chunk allocation, > > > but we can use this empty data block group as our relocation target. I'm confused why it's sufficient to ensure the reservation only once at mount. What ensures that the fs is in a condition to handle needed relocations a month later after we have already made use of the one bg we reserved at mount? Do we always reserve the "just-relocated-out-of" fresh one for future relocations or something? I couldn't infer that from a quick look at the use-sites of data_reloc_bg, but I could have easily missed it. 
> > > > > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > > --- > > > fs/btrfs/disk-io.c | 2 ++ > > > fs/btrfs/zoned.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > > > fs/btrfs/zoned.h | 3 +++ > > > 3 files changed, 51 insertions(+) > > > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > > index 5a35c2c0bbc9..83b56f109d29 100644 > > > --- a/fs/btrfs/disk-io.c > > > +++ b/fs/btrfs/disk-io.c > > > @@ -3550,6 +3550,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > > > } > > > btrfs_discard_resume(fs_info); > > > > > > + btrfs_reserve_relocation_zone(fs_info); > > > + > > > if (fs_info->uuid_root && > > > (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) || > > > fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) { > > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > > > index d51faf7f4162..fb8707f4cab5 100644 > > > --- a/fs/btrfs/zoned.c > > > +++ b/fs/btrfs/zoned.c > > > @@ -17,6 +17,7 @@ > > > #include "fs.h" > > > #include "accessors.h" > > > #include "bio.h" > > > +#include "transaction.h" > > > > > > /* Maximum number of zones to report per blkdev_report_zones() call */ > > > #define BTRFS_REPORT_NR_ZONES 4096 > > > @@ -2634,3 +2635,48 @@ void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) > > > } > > > spin_unlock(&fs_info->zone_active_bgs_lock); > > > } > > > + > > > +static u64 find_empty_block_group(struct btrfs_space_info *sinfo) > > > +{ > > > + struct btrfs_block_group *bg; > > > + > > > + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) { > > > + list_for_each_entry(bg, &sinfo->block_groups[i], list) { > > > + if (bg->used == 0) > > > + return bg->start; > > > + } > > > + } > > > + > > > + return 0; > > > > The first block group does not start at offset 0 ? If it does, then this is not > > correct. Maybe turn this function into returning a bool and add a pointer > > argument to return the bg start value ? > > No, it does not. 
The bg->start (logical address) increases monotonically as > a new block group is created. And, the first block group created by > mkfs.btrfs has a non-zero bg->start address. > > > > +} > > > + > > > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) > > > +{ > > > + struct btrfs_root *tree_root = fs_info->tree_root; > > > + struct btrfs_space_info *sinfo = fs_info->data_sinfo; > > > + struct btrfs_trans_handle *trans; > > > + u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags); > > > + u64 bytenr = 0; > > > + > > > + if (!btrfs_is_zoned(fs_info)) > > > + return; > > > + > > > + bytenr = find_empty_block_group(sinfo); > > > + if (!bytenr) { > > > + int ret; > > > + > > > + trans = btrfs_join_transaction(tree_root); > > > + if (IS_ERR(trans)) > > > + return; > > > + > > > + ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE); > > > + btrfs_end_transaction(trans); > > > + > > > + if (!ret) > > > + bytenr = find_empty_block_group(sinfo); > > > > What if this fail again ? > > That (almost) means we don't have any free space to create a new block > group. Then, we don't have a way to deal with it. Maybe, we can reclaim > directly here? To my more general point, should we keep trying in a more sustained way on the live fs, even if it happens to be full-full at mount? > > Anyway, in that case, we will set fs_info->data_reloc_bg = 0, which is the > same behavior as the current code. Well right now it is only called from mount, in which case it will only fail if we are full, since there shouldn't be concurrent allocations. OTOH, if this does get called from some more live fs context down the line, then this could easily race with allocations using the block group. For that reason, I think it makes sense to either add locking, a retry loop, or inline reclaim. 
> > > > + } > > > + > > > + spin_lock(&fs_info->relocation_bg_lock); > > > + fs_info->data_reloc_bg = bytenr; > > > + spin_unlock(&fs_info->relocation_bg_lock); > > > +} > > > diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h > > > index 77c4321e331f..048ffada4549 100644 > > > --- a/fs/btrfs/zoned.h > > > +++ b/fs/btrfs/zoned.h > > > @@ -97,6 +97,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info); > > > int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > > struct btrfs_space_info *space_info, bool do_finish); > > > void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info); > > > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info); > > > #else /* CONFIG_BLK_DEV_ZONED */ > > > static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, > > > struct blk_zone *zone) > > > @@ -271,6 +272,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > > > > > static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { } > > > > > > +static inline void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) { } > > > + > > > #endif > > > > > > static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) > > > > > > > -- > > Damien Le Moal > > Western Digital Research > > ^ permalink raw reply [flat|nested] 13+ messages in thread
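One way to read the "retry loop" option from the review, sketched with invented userspace stand-ins (`fs_state`, `force_chunk_alloc()` and the fake addresses are not kernel code): bound the retries and re-scan after every forced allocation, so that losing one race with a concurrent allocator does not permanently leave `data_reloc_bg` unset:

```c
#include <assert.h>
#include <stdint.h>

/* Toy filesystem state so the control flow is runnable. */
struct fs_state {
	int empty_bgs;     /* currently empty block groups */
	int reclaimable;   /* block groups reclaim could free up */
};

static uint64_t find_empty_bg(struct fs_state *fs)
{
	return fs->empty_bgs > 0 ? 1048576 : 0;  /* fake start address */
}

static int force_chunk_alloc(struct fs_state *fs)
{
	/* Pretend allocation only succeeds once reclaim freed a zone. */
	if (fs->reclaimable > 0) {
		fs->reclaimable--;
		fs->empty_bgs++;
		return 0;
	}
	return -28;  /* -ENOSPC */
}

/* Instead of giving up after a single forced allocation, retry a bounded
 * number of times; each iteration gives a concurrent reclaim or chunk
 * allocation a chance to produce an empty block group. */
uint64_t reserve_relocation_bg(struct fs_state *fs, int max_retries)
{
	uint64_t bytenr = find_empty_bg(fs);

	for (int i = 0; !bytenr && i < max_retries; i++) {
		if (force_chunk_alloc(fs) == 0)
			bytenr = find_empty_bg(fs);
	}
	return bytenr;
}
```

A genuinely full filesystem still ends with 0, i.e. no reserved relocation block group, which matches the fallback behavior described in the thread.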
* Re: [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount 2024-04-02 17:04 ` Boris Burkov @ 2024-04-05 5:03 ` Naohiro Aota 0 siblings, 0 replies; 13+ messages in thread From: Naohiro Aota @ 2024-04-05 5:03 UTC (permalink / raw) To: Boris Burkov Cc: Damien Le Moal, Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Hans Holmberg, hch, Johannes Thumshirn On Tue, Apr 02, 2024 at 10:04:51AM -0700, Boris Burkov wrote: > On Tue, Apr 02, 2024 at 06:03:35AM +0000, Naohiro Aota wrote: > > On Fri, Mar 29, 2024 at 08:05:34AM +0900, Damien Le Moal wrote: > > > On 3/28/24 22:56, Johannes Thumshirn wrote: > > > > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > > > > > > > Reserve one zone as a data relocation target on each mount. If we already > > > > find one empty block group, there's no need to force a chunk allocation, > > > > but we can use this empty data block group as our relocation target. > > I'm confused why it's sufficient to ensure the reservation only once at > mount. What ensures that the fs is in a condition to handle needed > relocations a month later after we have already made use of the one bg > we reserved at mount? Do we always reserve the "just-relocated-out-of" > fresh one for future relocations or something? I couldn't infer that > from a quick look at the use-sites of data_reloc_bg, but I could have > easily missed it. In general, btrfs_alloc_data_chunk_ondemand() called from prealloc_file_extent_cluster() should be responsible for allocating a new block group. It allocates a new block group if the space is not enough. When a block group is filled exactly, there is a chance we don't have data relocation block group. But, the next relocation cluster will allocate new one. There is a problem, however, that it only checks there is enough space for a relocation cluster in the DATA space_info. 
So, even if there is no space in the data relocation block group, we still can have enough space in the DATA space. Then, it won't ensure the relocation cluster will be written properly. We would like to have a dedicated function for the zoned case. It checks the space against the data relocation BG. When the space is not enough, it allocates/activates a new block group or takes one existing block group, making it a new data relocation BG. This new data relocation BG "promotion" is also done in do_allocation_zoned(), so moving the logic out to prealloc_file_extent_cluster() would be interesting. There is one issue when we choose an existing block group. We cannot use an existing BG if that BG is already "used" or "reserved" to avoid mixing ZONE_APPEND (normal data write) and WRITE (relocation data write). This restriction makes it difficult to use an existing BG. Here is one interesting solution. We can freely choose an existing BG, then we can wait enough time for on-going IOs to finish since we are in the relocation context. It is easier to implement the logic in prealloc_file_extent_cluster() than do_allocation_zoned() because we are free from many locks for the extent allocation and the writeback context. 
> > > > > > > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > > > --- > > > > fs/btrfs/disk-io.c | 2 ++ > > > > fs/btrfs/zoned.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > > > > fs/btrfs/zoned.h | 3 +++ > > > > 3 files changed, 51 insertions(+) > > > > > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > > > index 5a35c2c0bbc9..83b56f109d29 100644 > > > > --- a/fs/btrfs/disk-io.c > > > > +++ b/fs/btrfs/disk-io.c > > > > @@ -3550,6 +3550,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > > > > } > > > > btrfs_discard_resume(fs_info); > > > > > > > > + btrfs_reserve_relocation_zone(fs_info); > > > > + > > > > if (fs_info->uuid_root && > > > > (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) || > > > > fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) { > > > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > > > > index d51faf7f4162..fb8707f4cab5 100644 > > > > --- a/fs/btrfs/zoned.c > > > > +++ b/fs/btrfs/zoned.c > > > > @@ -17,6 +17,7 @@ > > > > #include "fs.h" > > > > #include "accessors.h" > > > > #include "bio.h" > > > > +#include "transaction.h" > > > > > > > > /* Maximum number of zones to report per blkdev_report_zones() call */ > > > > #define BTRFS_REPORT_NR_ZONES 4096 > > > > @@ -2634,3 +2635,48 @@ void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) > > > > } > > > > spin_unlock(&fs_info->zone_active_bgs_lock); > > > > } > > > > + > > > > +static u64 find_empty_block_group(struct btrfs_space_info *sinfo) > > > > +{ > > > > + struct btrfs_block_group *bg; > > > > + > > > > + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) { > > > > + list_for_each_entry(bg, &sinfo->block_groups[i], list) { > > > > + if (bg->used == 0) > > > > + return bg->start; > > > > + } > > > > + } > > > > + > > > > + return 0; > > > > > > The first block group does not start at offset 0 ? If it does, then this is not > > > correct. 
Maybe turn this function into returning a bool and add a pointer > > > argument to return the bg start value ? > > > > No, it does not. The bg->start (logical address) increases monotonically as > > a new block group is created. And, the first block group created by > > mkfs.btrfs has a non-zero bg->start address. > > > > > > +} > > > > + > > > > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) > > > > +{ > > > > + struct btrfs_root *tree_root = fs_info->tree_root; > > > > + struct btrfs_space_info *sinfo = fs_info->data_sinfo; > > > > + struct btrfs_trans_handle *trans; > > > > + u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags); > > > > + u64 bytenr = 0; > > > > + > > > > + if (!btrfs_is_zoned(fs_info)) > > > > + return; > > > > + > > > > + bytenr = find_empty_block_group(sinfo); > > > > + if (!bytenr) { > > > > + int ret; > > > > + > > > > + trans = btrfs_join_transaction(tree_root); > > > > + if (IS_ERR(trans)) > > > > + return; > > > > + > > > > + ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE); > > > > + btrfs_end_transaction(trans); > > > > + > > > > + if (!ret) > > > > + bytenr = find_empty_block_group(sinfo); > > > > > > What if this fail again ? > > > > That (almost) means we don't have any free space to create a new block > > group. Then, we don't have a way to deal with it. Maybe, we can reclaim > > directly here? > > To my more general point, should we keep trying in a more sustained way > on the live fs, even if it happens to be full-full at mount? > > > > > Anyway, in that case, we will set fs_info->data_reloc_bg = 0, which is the > > same behavior as the current code. > > Well right now it is only called from mount, in which case it will only > fail if we are full, since there shouldn't be concurrent allocations. > > OTOH, if this does get called from some more live fs context down the > line, then this could easily race with allocations using the block > group. 
For that reason, I think it makes sense to either add locking, > a retry loop, or inline reclaim. So, implementing the above logic would help for live fs context. > > > > > > + } > > > > + > > > > + spin_lock(&fs_info->relocation_bg_lock); > > > > + fs_info->data_reloc_bg = bytenr; > > > > + spin_unlock(&fs_info->relocation_bg_lock); > > > > +} > > > > diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h > > > > index 77c4321e331f..048ffada4549 100644 > > > > --- a/fs/btrfs/zoned.h > > > > +++ b/fs/btrfs/zoned.h > > > > @@ -97,6 +97,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info); > > > > int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > > > struct btrfs_space_info *space_info, bool do_finish); > > > > void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info); > > > > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info); > > > > #else /* CONFIG_BLK_DEV_ZONED */ > > > > static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, > > > > struct blk_zone *zone) > > > > @@ -271,6 +272,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > > > > > > > static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { } > > > > > > > > +static inline void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) { } > > > > + > > > > #endif > > > > > > > > static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) > > > > > > > > > > -- > > > Damien Le Moal > > > Western Digital Research > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
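The bool-plus-out-parameter variant Damien suggests above can be sketched in plain userspace C. The structures below are simplified stand-ins invented for illustration; none of this is btrfs code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_RAID_TYPES 3 /* stand-in for BTRFS_NR_RAID_TYPES */

/* Simplified stand-ins for the kernel structures (illustration only). */
struct block_group {
	unsigned long long start; /* logical address; non-zero on a real fs */
	unsigned long long used;
	struct block_group *next;
};

struct space_info {
	struct block_group *block_groups[NR_RAID_TYPES];
};

/*
 * Return a bool and hand the start address back through an
 * out-parameter, so "not found" stays unambiguous even if a block
 * group could ever start at logical address 0.
 */
static bool find_empty_block_group(const struct space_info *sinfo,
				   unsigned long long *bytenr)
{
	for (int i = 0; i < NR_RAID_TYPES; i++) {
		const struct block_group *bg;

		for (bg = sinfo->block_groups[i]; bg; bg = bg->next) {
			if (bg->used == 0) {
				*bytenr = bg->start;
				return true;
			}
		}
	}
	return false;
}
```

Since mkfs.btrfs never creates a block group at logical address 0 (as Naohiro notes), the sentinel return in the posted patch works too; the out-parameter form just makes the contract explicit.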
* Re: [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount 2024-03-28 13:56 ` [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount Johannes Thumshirn 2024-03-28 23:05 ` Damien Le Moal @ 2024-04-05 1:14 ` Naohiro Aota 1 sibling, 0 replies; 13+ messages in thread From: Naohiro Aota @ 2024-04-05 1:14 UTC (permalink / raw) To: Johannes Thumshirn Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Hans Holmberg, hch, Damien LeMoal, Boris Burkov, Johannes Thumshirn On Thu, Mar 28, 2024 at 02:56:32PM +0100, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > Reserve one zone as a data relocation target on each mount. If we already > find one empty block group, there's no need to force a chunk allocation, > but we can use this empty data block group as our relocation target. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/disk-io.c | 2 ++ > fs/btrfs/zoned.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/zoned.h | 3 +++ > 3 files changed, 51 insertions(+) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 5a35c2c0bbc9..83b56f109d29 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3550,6 +3550,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > } > btrfs_discard_resume(fs_info); > > + btrfs_reserve_relocation_zone(fs_info); > + > if (fs_info->uuid_root && > (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) || > fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) { > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > index d51faf7f4162..fb8707f4cab5 100644 > --- a/fs/btrfs/zoned.c > +++ b/fs/btrfs/zoned.c > @@ -17,6 +17,7 @@ > #include "fs.h" > #include "accessors.h" > #include "bio.h" > +#include "transaction.h" > > /* Maximum number of zones to report per blkdev_report_zones() call */ > #define BTRFS_REPORT_NR_ZONES 4096 > @@ -2634,3 +2635,48 @@ void 
btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) > } > spin_unlock(&fs_info->zone_active_bgs_lock); > } > + > +static u64 find_empty_block_group(struct btrfs_space_info *sinfo) > +{ > + struct btrfs_block_group *bg; > + > + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) { This starting index prefers SINGLE to DUP/RAID profiles, which is bad. We can use something like get_alloc_profile_by_root() to decide a proper starting index. > + list_for_each_entry(bg, &sinfo->block_groups[i], list) { > + if (bg->used == 0) > + return bg->start; > + } > + } > + > + return 0; > +} > + > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) > +{ > + struct btrfs_root *tree_root = fs_info->tree_root; > + struct btrfs_space_info *sinfo = fs_info->data_sinfo; > + struct btrfs_trans_handle *trans; > + u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags); > + u64 bytenr = 0; > + > + if (!btrfs_is_zoned(fs_info)) > + return; > + > + bytenr = find_empty_block_group(sinfo); > + if (!bytenr) { > + int ret; > + > + trans = btrfs_join_transaction(tree_root); > + if (IS_ERR(trans)) > + return; > + > + ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE); > + btrfs_end_transaction(trans); > + > + if (!ret) > + bytenr = find_empty_block_group(sinfo); > + } > + > + spin_lock(&fs_info->relocation_bg_lock); Currently, this function is called in the mount process: there is no relocation BG set. To prevent future misuse, I'd like to add an ASSERT(fs_info->data_reloc_bg == 0). > + fs_info->data_reloc_bg = bytenr; We can activate that block group as well to ensure it's ready to go.
> + spin_unlock(&fs_info->relocation_bg_lock); > +} > diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h > index 77c4321e331f..048ffada4549 100644 > --- a/fs/btrfs/zoned.h > +++ b/fs/btrfs/zoned.h > @@ -97,6 +97,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info); > int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > struct btrfs_space_info *space_info, bool do_finish); > void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info); > +void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info); > #else /* CONFIG_BLK_DEV_ZONED */ > static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, > struct blk_zone *zone) > @@ -271,6 +272,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info, > > static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { } > > +static inline void btrfs_reserve_relocation_zone(struct btrfs_fs_info *fs_info) { } > + > #endif > > static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) > > -- > 2.35.3 > ^ permalink raw reply [flat|nested] 13+ messages in thread
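Naohiro's starting-index point above can be made concrete with a tiny userspace sketch: scanning the per-raid-type lists from index 0 prefers SINGLE block groups, while deriving the start index from the allocation profile prefers the redundant profile. The flag bits and index order below are invented for this sketch only:

```c
#include <assert.h>

/*
 * Illustration only: map an allocation-profile bitmask to the
 * raid-type list index the scan should start from. The real kernel
 * derives this mapping from the profile flags; these names are
 * hypothetical stand-ins.
 */
enum raid_idx { IDX_RAID1, IDX_DUP, IDX_SINGLE, RAID_IDX_COUNT };

#define SKETCH_PROFILE_DUP   (1u << 0)
#define SKETCH_PROFILE_RAID1 (1u << 1)

static enum raid_idx profile_to_start_index(unsigned int flags)
{
	if (flags & SKETCH_PROFILE_RAID1)
		return IDX_RAID1;
	if (flags & SKETCH_PROFILE_DUP)
		return IDX_DUP;
	return IDX_SINGLE; /* no redundancy bits set: plain SINGLE */
}
```

Starting find_empty_block_group()'s outer loop at profile_to_start_index(flags) instead of 0 would reserve an empty block group of the filesystem's actual data profile first.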
* [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space 2024-03-28 13:56 [PATCH RFC 0/3] btrfs: zoned: reclaim block-groups more aggressively Johannes Thumshirn 2024-03-28 13:56 ` [PATCH RFC PATCH 1/3] btrfs: zoned: traverse device list in should reclaim under rcu_read_lock Johannes Thumshirn 2024-03-28 13:56 ` [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount Johannes Thumshirn @ 2024-03-28 13:56 ` Johannes Thumshirn 2024-03-28 23:06 ` Damien Le Moal ` (2 more replies) 2 siblings, 3 replies; 13+ messages in thread From: Johannes Thumshirn @ 2024-03-28 13:56 UTC (permalink / raw) To: Chris Mason, Josef Bacik, David Sterba Cc: linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Damien LeMoal, Boris Burkov, Johannes Thumshirn From: Johannes Thumshirn <johannes.thumshirn@wdc.com> Kick the cleaner kthread on chunk allocation if we're slowly running out of usable space on a zoned filesystem. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> --- fs/btrfs/zoned.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index fb8707f4cab5..25c1a17873db 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1040,6 +1040,7 @@ int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror) u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, u64 hole_end, u64 num_bytes) { + struct btrfs_fs_info *fs_info = device->fs_info; struct btrfs_zoned_device_info *zinfo = device->zone_info; const u8 shift = zinfo->zone_size_shift; u64 nzones = num_bytes >> shift; @@ -1051,6 +1052,11 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags) && + btrfs_zoned_should_reclaim(fs_info)) { + wake_up_process(fs_info->cleaner_kthread); + } + while (pos < hole_end) { begin = pos >> shift; end = 
begin + nzones; -- 2.35.3 ^ permalink raw reply related [flat|nested] 13+ messages in thread
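The shift arithmetic surrounding the hunk above (nzones, begin, end) can be checked in isolation. A 256 MiB zone size (shift = 28) is assumed purely for this sketch; real zoned devices report their own zone size:

```c
#include <assert.h>

/*
 * Model of the zone-index arithmetic in btrfs_find_allocatable_zones():
 * with a power-of-two zone size, a byte offset converts to a zone
 * number by a right shift. ZONE_SIZE_SHIFT = 28 (256 MiB zones) is an
 * assumption for this example only.
 */
#define ZONE_SIZE_SHIFT 28

static unsigned long long zone_index(unsigned long long pos)
{
	return pos >> ZONE_SIZE_SHIFT;
}

/* num_bytes must be zone-aligned, as the ASSERTs in the kernel enforce. */
static unsigned long long zones_needed(unsigned long long num_bytes)
{
	return num_bytes >> ZONE_SIZE_SHIFT;
}
```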
* Re: [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space 2024-03-28 13:56 ` [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space Johannes Thumshirn @ 2024-03-28 23:06 ` Damien Le Moal 2024-04-02 17:09 ` Boris Burkov 2024-04-04 19:45 ` David Sterba 2 siblings, 0 replies; 13+ messages in thread From: Damien Le Moal @ 2024-03-28 23:06 UTC (permalink / raw) To: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba Cc: linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Boris Burkov, Johannes Thumshirn On 3/28/24 22:56, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > Kick the cleaner kthread on chunk allocation if we're slowly running out > of usable space on a zoned filesystem. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/zoned.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > index fb8707f4cab5..25c1a17873db 100644 > --- a/fs/btrfs/zoned.c > +++ b/fs/btrfs/zoned.c > @@ -1040,6 +1040,7 @@ int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror) > u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, > u64 hole_end, u64 num_bytes) > { > + struct btrfs_fs_info *fs_info = device->fs_info; > struct btrfs_zoned_device_info *zinfo = device->zone_info; > const u8 shift = zinfo->zone_size_shift; > u64 nzones = num_bytes >> shift; > @@ -1051,6 +1052,11 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, > ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); > ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); > > + if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags) && > + btrfs_zoned_should_reclaim(fs_info)) { > + wake_up_process(fs_info->cleaner_kthread); > + } Nit: no need for the curly brackets. 
> + > while (pos < hole_end) { > begin = pos >> shift; > end = begin + nzones; > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space 2024-03-28 13:56 ` [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space Johannes Thumshirn 2024-03-28 23:06 ` Damien Le Moal @ 2024-04-02 17:09 ` Boris Burkov 2024-04-04 19:45 ` David Sterba 2 siblings, 0 replies; 13+ messages in thread From: Boris Burkov @ 2024-04-02 17:09 UTC (permalink / raw) To: Johannes Thumshirn Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Damien LeMoal, Johannes Thumshirn On Thu, Mar 28, 2024 at 02:56:33PM +0100, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > Kick the cleaner kthread on chunk allocation if we're slowly running out > of usable space on a zoned filesystem. I'm really excited about this and think it probably makes sense on not-zoned as well. Have you found that this is fast enough to help real allocations that are in trouble? I'd be worried that it's a lot faster than waiting 30 seconds but still not urgent enough to save the day for someone trying to allocate right now. Not a blocking request for this patch, which I think is a definite improvement, but we could do something like add a pass to find_free_extent (before allocating a fresh BG?!?)
that blocks on reclaim running if we have kicked it (or errors out in a way that signals to the outer loop that we can wait for reclaim, if needed for locking or whatever) > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/zoned.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > index fb8707f4cab5..25c1a17873db 100644 > --- a/fs/btrfs/zoned.c > +++ b/fs/btrfs/zoned.c > @@ -1040,6 +1040,7 @@ int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror) > u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, > u64 hole_end, u64 num_bytes) > { > + struct btrfs_fs_info *fs_info = device->fs_info; > struct btrfs_zoned_device_info *zinfo = device->zone_info; > const u8 shift = zinfo->zone_size_shift; > u64 nzones = num_bytes >> shift; > @@ -1051,6 +1052,11 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, > ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); > ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); > > + if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags) && > + btrfs_zoned_should_reclaim(fs_info)) { > + wake_up_process(fs_info->cleaner_kthread); > + } > + > while (pos < hole_end) { > begin = pos >> shift; > end = begin + nzones; > > -- > 2.35.3 > ^ permalink raw reply [flat|nested] 13+ messages in thread
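Boris's suggestion of a find_free_extent pass that waits on kicked reclaim could look roughly like the control flow below. Every name and the single-retry policy are invented for illustration; nothing here is btrfs code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: let the allocator signal to its outer loop that
 * reclaim has been kicked and is worth waiting for, instead of
 * returning -ENOSPC straight away.
 */
enum alloc_status { ALLOC_OK, ALLOC_NEED_RECLAIM, ALLOC_ENOSPC };

static enum alloc_status try_allocate(bool have_free_bg, bool reclaim_kicked)
{
	if (have_free_bg)
		return ALLOC_OK;
	/* No space, but reclaim was kicked: tell the caller to wait once. */
	return reclaim_kicked ? ALLOC_NEED_RECLAIM : ALLOC_ENOSPC;
}

static enum alloc_status allocate_with_reclaim_wait(bool reclaim_frees_space)
{
	enum alloc_status st = try_allocate(false, true);

	if (st == ALLOC_NEED_RECLAIM) {
		/* In the kernel this would block on reclaim completion. */
		st = try_allocate(reclaim_frees_space, false);
	}
	return st;
}
```

The point of the extra status is exactly what the email describes: the outer loop gets a chance to wait for reclaim before falling through to a hard -ENOSPC.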
* Re: [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space 2024-03-28 13:56 ` [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space Johannes Thumshirn 2024-03-28 23:06 ` Damien Le Moal 2024-04-02 17:09 ` Boris Burkov @ 2024-04-04 19:45 ` David Sterba 2 siblings, 0 replies; 13+ messages in thread From: David Sterba @ 2024-04-04 19:45 UTC (permalink / raw) To: Johannes Thumshirn Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Hans Holmberg, Naohiro Aota, hch, Damien LeMoal, Boris Burkov, Johannes Thumshirn On Thu, Mar 28, 2024 at 02:56:33PM +0100, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > Kick the cleaner kthread on chunk allocation if we're slowly running out > of usable space on a zoned filesystem. For zoned mode it makes more sense, as Boris noted it could be done for non-zoned too but maybe based on additional criteria as reusing the space is easier. Also cleaner does several things so it may start doing subvolume cleaning or defragmentation at the worst time. 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/zoned.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > index fb8707f4cab5..25c1a17873db 100644 > --- a/fs/btrfs/zoned.c > +++ b/fs/btrfs/zoned.c > @@ -1040,6 +1040,7 @@ int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror) > u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, > u64 hole_end, u64 num_bytes) > { > + struct btrfs_fs_info *fs_info = device->fs_info; > struct btrfs_zoned_device_info *zinfo = device->zone_info; > const u8 shift = zinfo->zone_size_shift; > u64 nzones = num_bytes >> shift; > @@ -1051,6 +1052,11 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, > ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); > ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); > > + if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags) && > + btrfs_zoned_should_reclaim(fs_info)) { > + wake_up_process(fs_info->cleaner_kthread); You can drop the BTRFS_FS_CLEANER_RUNNING condition, this is not a performance sensitive where the wake up of an already running process would hurt. > + } > + > while (pos < hole_end) { > begin = pos >> shift; > end = begin + nzones; > > -- > 2.35.3 > ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-04-05 5:03 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-03-28 13:56 [PATCH RFC 0/3] btrfs: zoned: reclaim block-groups more aggressively Johannes Thumshirn 2024-03-28 13:56 ` [PATCH RFC PATCH 1/3] btrfs: zoned: traverse device list in should reclaim under rcu_read_lock Johannes Thumshirn 2024-04-04 19:35 ` David Sterba 2024-03-28 13:56 ` [PATCH RFC PATCH 2/3] btrfs: zoned: reserve relocation zone on mount Johannes Thumshirn 2024-03-28 23:05 ` Damien Le Moal 2024-04-02 6:03 ` Naohiro Aota 2024-04-02 17:04 ` Boris Burkov 2024-04-05 5:03 ` Naohiro Aota 2024-04-05 1:14 ` Naohiro Aota 2024-03-28 13:56 ` [PATCH RFC PATCH 3/3] btrfs: zoned: kick cleaner kthread if low on space Johannes Thumshirn 2024-03-28 23:06 ` Damien Le Moal 2024-04-02 17:09 ` Boris Burkov 2024-04-04 19:45 ` David Sterba