All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] btrfs: fixes for reclaim
@ 2023-06-06  5:36 Naohiro Aota
  2023-06-06  5:36 ` [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs Naohiro Aota
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Naohiro Aota @ 2023-06-06  5:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

There are several issues on the reclaim process out there:

 - Long-running reclaim process blocks removing unused BGs
 - Belonging to the reclaim list blocks it goes to the unused list
 - It tries relocation even when FS is read-only
 - Temporal failure keep a block group un-reclaimed

This series fixes them.

Naohiro Aota (4):
  btrfs: delete unused BGs while reclaiming BGs
  btrfs: move out now unused BG from the reclaim list
  btrfs: bail out reclaim process if filesystem is read-only
  btrfs: reinsert BGs failed to reclaim

 fs/btrfs/block-group.c | 43 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs
  2023-06-06  5:36 [PATCH 0/4] btrfs: fixes for reclaim Naohiro Aota
@ 2023-06-06  5:36 ` Naohiro Aota
  2023-06-06  9:54   ` Filipe Manana
                     ` (2 more replies)
  2023-06-06  5:36 ` [PATCH 2/4] btrfs: move out now unused BG from the reclaim list Naohiro Aota
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 18+ messages in thread
From: Naohiro Aota @ 2023-06-06  5:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

The reclaiming process only starts after the FS volumes are allocated to a
certain level (75% by default). Thus, the list of reclaiming target block
groups can build up so huge at the time the reclaim process kicks in. On a
test run, there were over 1000 BGs in the reclaim list.

As the reclaim involves rewriting the data, it takes really long time to
reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
won't proceed because the reclaim side is holding
fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
BGs kept in the unused list. On my test run, I got 1057 unused BGs.

Since deleting a block group is relatively easy and fast work, we can call
btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
unused BGs.

Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 618ba7670e66..c5547da0f6eb 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 
 next:
 		btrfs_put_block_group(bg);
+
+		mutex_unlock(&fs_info->reclaim_bgs_lock);
+		/*
+		 * Reclaiming all the BGs in the list can take really long.
+		 * Prioritize cleanning up unused BGs.
+		 */
+		btrfs_delete_unused_bgs(fs_info);
+		/*
+		 * If we are interrupted by a balance, we can just bail out. The
+		 * cleaner thread call me again if necessary.
+		 */
+		if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
+			goto end;
 		spin_lock(&fs_info->unused_bgs_lock);
 	}
 	spin_unlock(&fs_info->unused_bgs_lock);
 	mutex_unlock(&fs_info->reclaim_bgs_lock);
+end:
 	btrfs_exclop_finish(fs_info);
 	sb_end_write(fs_info->sb);
 }
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] btrfs: move out now unused BG from the reclaim list
  2023-06-06  5:36 [PATCH 0/4] btrfs: fixes for reclaim Naohiro Aota
  2023-06-06  5:36 ` [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs Naohiro Aota
@ 2023-06-06  5:36 ` Naohiro Aota
  2023-06-06 10:06   ` Filipe Manana
  2023-06-06 11:33   ` Johannes Thumshirn
  2023-06-06  5:36 ` [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only Naohiro Aota
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Naohiro Aota @ 2023-06-06  5:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

An unused block group is easy to remove to free up space and should be
reclaimed fast. Such block group can often already be a target of the
reclaim process. As we check list_empty(&bg->bg_list), we keep it in the
reclaim list. That block group is never reclaimed until the file system is
filled e.g, 75%.

Instead, we can move unused block group to the unused list and delete it
fast.

Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c5547da0f6eb..d5bba02167be 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1633,11 +1633,14 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
 {
 	struct btrfs_fs_info *fs_info = bg->fs_info;
 
+	trace_btrfs_add_unused_block_group(bg);
 	spin_lock(&fs_info->unused_bgs_lock);
 	if (list_empty(&bg->bg_list)) {
 		btrfs_get_block_group(bg);
-		trace_btrfs_add_unused_block_group(bg);
 		list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
+	} else {
+		/* Pull out the BG from the reclaim_bgs list. */
+		list_move_tail(&bg->bg_list, &fs_info->unused_bgs);
 	}
 	spin_unlock(&fs_info->unused_bgs_lock);
 }
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only
  2023-06-06  5:36 [PATCH 0/4] btrfs: fixes for reclaim Naohiro Aota
  2023-06-06  5:36 ` [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs Naohiro Aota
  2023-06-06  5:36 ` [PATCH 2/4] btrfs: move out now unused BG from the reclaim list Naohiro Aota
@ 2023-06-06  5:36 ` Naohiro Aota
  2023-06-06 10:16   ` Filipe Manana
  2023-06-06 11:34   ` Johannes Thumshirn
  2023-06-06  5:36 ` [PATCH 4/4] btrfs: reinsert BGs failed to reclaim Naohiro Aota
  2023-06-08 11:50 ` [PATCH 0/4] btrfs: fixes for reclaim David Sterba
  4 siblings, 2 replies; 18+ messages in thread
From: Naohiro Aota @ 2023-06-06  5:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

When a filesystem is read-only, we cannot reclaim a block group as it
cannot rewrite the data. Just bail out in that case.

Note that it can drop BGs in this case. As we did sb_start_write(),
read-only filesystem means we got a fatal error and forced read-only. There
is no chance to reclaim them again.

Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index d5bba02167be..db9bee071434 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1794,8 +1794,15 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		}
 		spin_unlock(&bg->lock);
 
-		/* Get out fast, in case we're unmounting the filesystem */
-		if (btrfs_fs_closing(fs_info)) {
+		/*
+		 * Get out fast, in case we're read-only or unmounting the
+		 * filesystem. It is OK to drop block groups from the list even
+		 * for the read-only case. As we did sb_start_write(), "mount -o
+		 * ro" won't happen and read-only FS means it is forced
+		 * read-only due to a fatal error. So, it never get back to
+		 * read-write to let us reclaime again.
+		 */
+		if (btrfs_need_cleaner_sleep(fs_info)) {
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] btrfs: reinsert BGs failed to reclaim
  2023-06-06  5:36 [PATCH 0/4] btrfs: fixes for reclaim Naohiro Aota
                   ` (2 preceding siblings ...)
  2023-06-06  5:36 ` [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only Naohiro Aota
@ 2023-06-06  5:36 ` Naohiro Aota
  2023-06-06 10:25   ` Filipe Manana
  2023-06-08 11:50 ` [PATCH 0/4] btrfs: fixes for reclaim David Sterba
  4 siblings, 1 reply; 18+ messages in thread
From: Naohiro Aota @ 2023-06-06  5:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota

The reclaim process can temporally fail. For example, if the space is
getting tight, it fails to make the block group read-only. If there are no
further writes on that block group, the block group will never get back to
the reclaim list, and the BG never get reclaimed. In a certain workload, we
can leave many such block groups never reclaimed.

So, let's get it back to the list and give it a chance to be reclaimed.

Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index db9bee071434..36e0d9b5d824 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1833,7 +1833,18 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		}
 
 next:
-		btrfs_put_block_group(bg);
+		if (!ret) {
+			btrfs_put_block_group(bg);
+		} else {
+			spin_lock(&bg->lock);
+			spin_lock(&fs_info->unused_bgs_lock);
+			if (list_empty(&bg->bg_list))
+				list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
+			else
+				btrfs_put_block_group(bg);
+			spin_unlock(&fs_info->unused_bgs_lock);
+			spin_unlock(&bg->lock);
+		}
 
 		mutex_unlock(&fs_info->reclaim_bgs_lock);
 		/*
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs
  2023-06-06  5:36 ` [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs Naohiro Aota
@ 2023-06-06  9:54   ` Filipe Manana
  2023-06-06 11:32   ` Johannes Thumshirn
  2023-06-07 18:48   ` Boris Burkov
  2 siblings, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2023-06-06  9:54 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, Naohiro Aota

On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
>
> The reclaiming process only starts after the FS volumes are allocated to a
> certain level (75% by default). Thus, the list of reclaiming target block
> groups can build up so huge at the time the reclaim process kicks in. On a
> test run, there were over 1000 BGs in the reclaim list.
>
> As the reclaim involves rewriting the data, it takes really long time to
> reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
> won't proceed because the reclaim side is holding
> fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
> BGs kept in the unused list. On my test run, I got 1057 unused BGs.
>
> Since deleting a block group is relatively easy and fast work, we can call
> btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
> unused BGs.
>
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, great catch.

> ---
>  fs/btrfs/block-group.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 618ba7670e66..c5547da0f6eb 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>
>  next:
>                 btrfs_put_block_group(bg);
> +
> +               mutex_unlock(&fs_info->reclaim_bgs_lock);
> +               /*
> +                * Reclaiming all the BGs in the list can take really long.
> +                * Prioritize cleanning up unused BGs.
> +                */
> +               btrfs_delete_unused_bgs(fs_info);
> +               /*
> +                * If we are interrupted by a balance, we can just bail out. The
> +                * cleaner thread call me again if necessary.
> +                */
> +               if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
> +                       goto end;
>                 spin_lock(&fs_info->unused_bgs_lock);
>         }
>         spin_unlock(&fs_info->unused_bgs_lock);
>         mutex_unlock(&fs_info->reclaim_bgs_lock);
> +end:
>         btrfs_exclop_finish(fs_info);
>         sb_end_write(fs_info->sb);
>  }
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] btrfs: move out now unused BG from the reclaim list
  2023-06-06  5:36 ` [PATCH 2/4] btrfs: move out now unused BG from the reclaim list Naohiro Aota
@ 2023-06-06 10:06   ` Filipe Manana
  2023-06-06 11:33   ` Johannes Thumshirn
  1 sibling, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2023-06-06 10:06 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, Naohiro Aota

On Tue, Jun 6, 2023 at 7:06 AM Naohiro Aota <naota@elisp.net> wrote:
>
> An unused block group is easy to remove to free up space and should be
> reclaimed fast. Such block group can often already be a target of the
> reclaim process. As we check list_empty(&bg->bg_list), we keep it in the
> reclaim list. That block group is never reclaimed until the file system is
> filled e.g, 75%.
>
> Instead, we can move unused block group to the unused list and delete it
> fast.
>
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/block-group.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index c5547da0f6eb..d5bba02167be 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1633,11 +1633,14 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
>  {
>         struct btrfs_fs_info *fs_info = bg->fs_info;
>
> +       trace_btrfs_add_unused_block_group(bg);
>         spin_lock(&fs_info->unused_bgs_lock);
>         if (list_empty(&bg->bg_list)) {
>                 btrfs_get_block_group(bg);
> -               trace_btrfs_add_unused_block_group(bg);
>                 list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
> +       } else {
> +               /* Pull out the BG from the reclaim_bgs list. */
> +               list_move_tail(&bg->bg_list, &fs_info->unused_bgs);
>         }
>         spin_unlock(&fs_info->unused_bgs_lock);
>  }
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only
  2023-06-06  5:36 ` [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only Naohiro Aota
@ 2023-06-06 10:16   ` Filipe Manana
  2023-06-06 11:34   ` Johannes Thumshirn
  1 sibling, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2023-06-06 10:16 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, Naohiro Aota

On Tue, Jun 6, 2023 at 7:21 AM Naohiro Aota <naota@elisp.net> wrote:
>
> When a filesystem is read-only, we cannot reclaim a block group as it
> cannot rewrite the data. Just bail out in that case.
>
> Note that it can drop BGs in this case. As we did sb_start_write(),
> read-only filesystem means we got a fatal error and forced read-only. There
> is no chance to reclaim them again.
>
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/block-group.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index d5bba02167be..db9bee071434 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1794,8 +1794,15 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>                 }
>                 spin_unlock(&bg->lock);
>
> -               /* Get out fast, in case we're unmounting the filesystem */
> -               if (btrfs_fs_closing(fs_info)) {
> +               /*
> +                * Get out fast, in case we're read-only or unmounting the
> +                * filesystem. It is OK to drop block groups from the list even
> +                * for the read-only case. As we did sb_start_write(), "mount -o
> +                * ro" won't happen and read-only FS means it is forced

I think here you mean "mount -o remount,ro", just to be more clear.

> +                * read-only due to a fatal error. So, it never get back to

get -> gets

> +                * read-write to let us reclaime again.

reclaime -> reclaim

Other than that, it looks good to me.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.
> +                */
> +               if (btrfs_need_cleaner_sleep(fs_info)) {
>                         up_write(&space_info->groups_sem);
>                         goto next;
>                 }
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] btrfs: reinsert BGs failed to reclaim
  2023-06-06  5:36 ` [PATCH 4/4] btrfs: reinsert BGs failed to reclaim Naohiro Aota
@ 2023-06-06 10:25   ` Filipe Manana
  2023-06-06 16:23     ` David Sterba
  0 siblings, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2023-06-06 10:25 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, Naohiro Aota

On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
>
> The reclaim process can temporally fail. For example, if the space is

temporally -> temporarily

> getting tight, it fails to make the block group read-only. If there are no
> further writes on that block group, the block group will never get back to
> the reclaim list, and the BG never get reclaimed. In a certain workload, we

get -> gets

> can leave many such block groups never reclaimed.
>
> So, let's get it back to the list and give it a chance to be reclaimed.
>
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/block-group.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index db9bee071434..36e0d9b5d824 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1833,7 +1833,18 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>                 }
>
>  next:
> -               btrfs_put_block_group(bg);
> +               if (!ret) {
> +                       btrfs_put_block_group(bg);
> +               } else {
> +                       spin_lock(&bg->lock);

Why do we need to take bg->lock?
The ->bg_list is protected by fs_info->unused_bgs_lock alone.

> +                       spin_lock(&fs_info->unused_bgs_lock);
> +                       if (list_empty(&bg->bg_list))
> +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> +                       else
> +                               btrfs_put_block_group(bg);
> +                       spin_unlock(&fs_info->unused_bgs_lock);
> +                       spin_unlock(&bg->lock);
> +               }

Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
use that, and simply have:

btrfs_mark_bg_to_reclaim();
btrfs_put_block_group(bg);

Thanks.

>
>                 mutex_unlock(&fs_info->reclaim_bgs_lock);
>                 /*
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs
  2023-06-06  5:36 ` [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs Naohiro Aota
  2023-06-06  9:54   ` Filipe Manana
@ 2023-06-06 11:32   ` Johannes Thumshirn
  2023-06-07 18:48   ` Boris Burkov
  2 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2023-06-06 11:32 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs; +Cc: Naohiro Aota

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] btrfs: move out now unused BG from the reclaim list
  2023-06-06  5:36 ` [PATCH 2/4] btrfs: move out now unused BG from the reclaim list Naohiro Aota
  2023-06-06 10:06   ` Filipe Manana
@ 2023-06-06 11:33   ` Johannes Thumshirn
  1 sibling, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2023-06-06 11:33 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs; +Cc: Naohiro Aota

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only
  2023-06-06  5:36 ` [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only Naohiro Aota
  2023-06-06 10:16   ` Filipe Manana
@ 2023-06-06 11:34   ` Johannes Thumshirn
  1 sibling, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2023-06-06 11:34 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs; +Cc: Naohiro Aota

With Filipe's comments incorporated:

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] btrfs: reinsert BGs failed to reclaim
  2023-06-06 10:25   ` Filipe Manana
@ 2023-06-06 16:23     ` David Sterba
  2023-06-06 23:55       ` Naohiro Aota
  0 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2023-06-06 16:23 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Naohiro Aota, linux-btrfs, Naohiro Aota

On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote:
> On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
> > +                       spin_lock(&fs_info->unused_bgs_lock);
> > +                       if (list_empty(&bg->bg_list))
> > +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > +                       else
> > +                               btrfs_put_block_group(bg);
> > +                       spin_unlock(&fs_info->unused_bgs_lock);
> > +                       spin_unlock(&bg->lock);
> > +               }
> 
> Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
> use that, and simply have:
> 
> btrfs_mark_bg_to_reclaim();
> btrfs_put_block_group(bg);

I can fold the diff below if you agree

--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
                }
 
 next:
-               if (!ret) {
-                       btrfs_put_block_group(bg);
-               } else {
-                       spin_lock(&bg->lock);
-                       spin_lock(&fs_info->unused_bgs_lock);
-                       if (list_empty(&bg->bg_list))
-                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
-                       else
-                               btrfs_put_block_group(bg);
-                       spin_unlock(&fs_info->unused_bgs_lock);
-                       spin_unlock(&bg->lock);
-               }
+               if (!ret)
+                       btrfs_mark_bg_to_reclaim(bg);
+               btrfs_put_block_group(bg);
 
                mutex_unlock(&fs_info->reclaim_bgs_lock);
                /*
---

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] btrfs: reinsert BGs failed to reclaim
  2023-06-06 16:23     ` David Sterba
@ 2023-06-06 23:55       ` Naohiro Aota
  2023-06-08 11:48         ` David Sterba
  0 siblings, 1 reply; 18+ messages in thread
From: Naohiro Aota @ 2023-06-06 23:55 UTC (permalink / raw)
  To: David Sterba; +Cc: Filipe Manana, Naohiro Aota, linux-btrfs

On Tue, Jun 06, 2023 at 06:23:23PM +0200, David Sterba wrote:
> On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote:
> > On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
> > > +                       spin_lock(&fs_info->unused_bgs_lock);
> > > +                       if (list_empty(&bg->bg_list))
> > > +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > > +                       else
> > > +                               btrfs_put_block_group(bg);
> > > +                       spin_unlock(&fs_info->unused_bgs_lock);
> > > +                       spin_unlock(&bg->lock);
> > > +               }
> > 
> > Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
> > use that, and simply have:
> > 
> > btrfs_mark_bg_to_reclaim();
> > btrfs_put_block_group(bg);

Yeah, this looks nice. Thank you.

> 
> I can fold the diff below if you agree
> 
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>                 }
>  
>  next:
> -               if (!ret) {
> -                       btrfs_put_block_group(bg);
> -               } else {
> -                       spin_lock(&bg->lock);
> -                       spin_lock(&fs_info->unused_bgs_lock);
> -                       if (list_empty(&bg->bg_list))
> -                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> -                       else
> -                               btrfs_put_block_group(bg);
> -                       spin_unlock(&fs_info->unused_bgs_lock);
> -                       spin_unlock(&bg->lock);
> -               }
> +               if (!ret)
> +                       btrfs_mark_bg_to_reclaim(bg);

Thank you for folding this, but the condition is flipped.  We should add it
back to the list in a failure case.

> +               btrfs_put_block_group(bg);
>  
>                 mutex_unlock(&fs_info->reclaim_bgs_lock);
>                 /*
> ---

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs
  2023-06-06  5:36 ` [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs Naohiro Aota
  2023-06-06  9:54   ` Filipe Manana
  2023-06-06 11:32   ` Johannes Thumshirn
@ 2023-06-07 18:48   ` Boris Burkov
  2023-06-12 12:50     ` Naohiro Aota
  2 siblings, 1 reply; 18+ messages in thread
From: Boris Burkov @ 2023-06-07 18:48 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, Naohiro Aota

On Tue, Jun 06, 2023 at 02:36:33PM +0900, Naohiro Aota wrote:
> The reclaiming process only starts after the FS volumes are allocated to a
> certain level (75% by default). Thus, the list of reclaiming target block
> groups can build up so huge at the time the reclaim process kicks in. On a
> test run, there were over 1000 BGs in the reclaim list.
> 
> As the reclaim involves rewriting the data, it takes really long time to
> reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
> won't proceed because the reclaim side is holding
> fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
> BGs kept in the unused list. On my test run, I got 1057 unused BGs.
> 
> Since deleting a block group is relatively easy and fast work, we can call
> btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
> unused BGs.
> 
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/block-group.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 618ba7670e66..c5547da0f6eb 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>  
>  next:
>  		btrfs_put_block_group(bg);
> +
> +		mutex_unlock(&fs_info->reclaim_bgs_lock);
> +		/*
> +		 * Reclaiming all the BGs in the list can take really long.
> +		 * Prioritize cleanning up unused BGs.
> +		 */
> +		btrfs_delete_unused_bgs(fs_info);
> +		/*
> +		 * If we are interrupted by a balance, we can just bail out. The
> +		 * cleaner thread call me again if necessary.
> +		 */
> +		if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
> +			goto end;

I agree that this fix makes sense and a lot of reclaim should not block
deleting unused bgs.

However, it feels quite hacky to me, because by the current design, we
are explicitly calling btrfs_delete_unused_bgs as a first class cleanup
action from the cleaner thread, before calling reclaim. I think a little
more cleanup to integrate the two together would reduce the "throw
things at the wall" feel here.

I would propose either:
1. Run them in parallel and make sure they release locks appropriately
   so that everyone makes good forward progress. I think it's a good
   model, but kind of a departure from the single cleaner thread so
   maybe risky/a pain to code.
2. Just get rid of the explicit delete unused from cleaner and
   integrate it as a first class component of this reclaim loop. This
   loop becomes *the* delete unused work except shutdown type cases.

FWIW, not a NAK, just my 2c.

Thanks,
Boris

>  		spin_lock(&fs_info->unused_bgs_lock);
>  	}
>  	spin_unlock(&fs_info->unused_bgs_lock);
>  	mutex_unlock(&fs_info->reclaim_bgs_lock);
> +end:
>  	btrfs_exclop_finish(fs_info);
>  	sb_end_write(fs_info->sb);
>  }
> -- 
> 2.40.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] btrfs: reinsert BGs failed to reclaim
  2023-06-06 23:55       ` Naohiro Aota
@ 2023-06-08 11:48         ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2023-06-08 11:48 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: David Sterba, Filipe Manana, Naohiro Aota, linux-btrfs

On Tue, Jun 06, 2023 at 11:55:48PM +0000, Naohiro Aota wrote:
> On Tue, Jun 06, 2023 at 06:23:23PM +0200, David Sterba wrote:
> > On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote:
> > > On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
> > > > +                       spin_lock(&fs_info->unused_bgs_lock);
> > > > +                       if (list_empty(&bg->bg_list))
> > > > +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > > > +                       else
> > > > +                               btrfs_put_block_group(bg);
> > > > +                       spin_unlock(&fs_info->unused_bgs_lock);
> > > > +                       spin_unlock(&bg->lock);
> > > > +               }
> > > 
> > > Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
> > > use that, and simply have:
> > > 
> > > btrfs_mark_bg_to_reclaim();
> > > btrfs_put_block_group(bg);
> 
> Yeah, this looks nice. Thank you.
> 
> > 
> > I can fold the diff below if you agree
> > 
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
> >                 }
> >  
> >  next:
> > -               if (!ret) {
> > -                       btrfs_put_block_group(bg);
> > -               } else {
> > -                       spin_lock(&bg->lock);
> > -                       spin_lock(&fs_info->unused_bgs_lock);
> > -                       if (list_empty(&bg->bg_list))
> > -                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > -                       else
> > -                               btrfs_put_block_group(bg);
> > -                       spin_unlock(&fs_info->unused_bgs_lock);
> > -                       spin_unlock(&bg->lock);
> > -               }
> > +               if (!ret)
> > +                       btrfs_mark_bg_to_reclaim(bg);
> 
> Thank you for folding this, but the condition is flipped.  We should add it
> back to the list in a failure case.

Ah right, fixed and pushed. The whole diff becomes:

               if (ret)
                       btrfs_mark_bg_to_reclaim(bg);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] btrfs: fixes for reclaim
  2023-06-06  5:36 [PATCH 0/4] btrfs: fixes for reclaim Naohiro Aota
                   ` (3 preceding siblings ...)
  2023-06-06  5:36 ` [PATCH 4/4] btrfs: reinsert BGs failed to reclaim Naohiro Aota
@ 2023-06-08 11:50 ` David Sterba
  4 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2023-06-08 11:50 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, Naohiro Aota

On Tue, Jun 06, 2023 at 02:36:32PM +0900, Naohiro Aota wrote:
> There are several issues on the reclaim process out there:
> 
>  - Long-running reclaim process blocks removing unused BGs
>  - Belonging to the reclaim list blocks it goes to the unused list
>  - It tries relocation even when FS is read-only
>  - Temporal failure keep a block group un-reclaimed
> 
> This series fixes them.
> 
> Naohiro Aota (4):
>   btrfs: delete unused BGs while reclaiming BGs
>   btrfs: move out now unused BG from the reclaim list
>   btrfs: bail out reclaim process if filesystem is read-only
>   btrfs: reinsert BGs failed to reclaim

Added to misc-next, with the fixups, thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs
  2023-06-07 18:48   ` Boris Burkov
@ 2023-06-12 12:50     ` Naohiro Aota
  0 siblings, 0 replies; 18+ messages in thread
From: Naohiro Aota @ 2023-06-12 12:50 UTC (permalink / raw)
  To: Boris Burkov; +Cc: Naohiro Aota, linux-btrfs

On Wed, Jun 07, 2023 at 11:48:37AM -0700, Boris Burkov wrote:
> On Tue, Jun 06, 2023 at 02:36:33PM +0900, Naohiro Aota wrote:
> > The reclaiming process only starts after the FS volumes are allocated to a
> > certain level (75% by default). Thus, the list of reclaiming target block
> > groups can build up so huge at the time the reclaim process kicks in. On a
> > test run, there were over 1000 BGs in the reclaim list.
> > 
> > As the reclaim involves rewriting the data, it takes really long time to
> > reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
> > won't proceed because the reclaim side is holding
> > fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
> > BGs kept in the unused list. On my test run, I got 1057 unused BGs.
> > 
> > Since deleting a block group is relatively easy and fast work, we can call
> > btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
> > unused BGs.
> > 
> > Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> > CC: stable@vger.kernel.org # 5.15+
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  fs/btrfs/block-group.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index 618ba7670e66..c5547da0f6eb 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
> >  
> >  next:
> >  		btrfs_put_block_group(bg);
> > +
> > +		mutex_unlock(&fs_info->reclaim_bgs_lock);
> > +		/*
> > +		 * Reclaiming all the BGs in the list can take really long.
> > +		 * Prioritize cleanning up unused BGs.
> > +		 */
> > +		btrfs_delete_unused_bgs(fs_info);
> > +		/*
> > +		 * If we are interrupted by a balance, we can just bail out. The
> > +		 * cleaner thread call me again if necessary.
> > +		 */
> > +		if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
> > +			goto end;
> 
> I agree that this fix makes sense and a lot of reclaim should not block
> deleting unused bgs.
> 
> However, it feels quite hacky to me, because by the current design, we
> are explicitly calling btrfs_delete_unused_bgs as a first class cleanup
> action from the cleaner thread, before calling reclaim. I think a little
> more cleanup to integrate the two together would reduce the "throw
> things at the wall" feel here.
> 
> I would propose either:
> 1. Run them in parallel and make sure they release locks appropriately
>    so that everyone makes good forward progress. I think it's a good
>    model, but kind of a departure from the single cleaner thread so
>    maybe risky/a pain to code.
> 2. Just get rid of the explicit delete unused from cleaner and
>    integrate it as a first class component of this reclaim loop. This
>    loop becomes *the* delete unused work except shutdown type cases.
> 
> FWIW, not a NAK, just my 2c.

Thank you for your suggestion. Yeah, I also considered something like your
option 2, but I hesitated to do so as a fix patch because it will change
the current cleaner and reclaim thread model. I'll rework them an a
integrated loop if btrfs devs agree with it.

> Thanks,
> Boris
> 
> >  		spin_lock(&fs_info->unused_bgs_lock);
> >  	}
> >  	spin_unlock(&fs_info->unused_bgs_lock);
> >  	mutex_unlock(&fs_info->reclaim_bgs_lock);
> > +end:
> >  	btrfs_exclop_finish(fs_info);
> >  	sb_end_write(fs_info->sb);
> >  }
> > -- 
> > 2.40.1
> > 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-06-12 12:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-06  5:36 [PATCH 0/4] btrfs: fixes for reclaim Naohiro Aota
2023-06-06  5:36 ` [PATCH 1/4] btrfs: delete unused BGs while reclaiming BGs Naohiro Aota
2023-06-06  9:54   ` Filipe Manana
2023-06-06 11:32   ` Johannes Thumshirn
2023-06-07 18:48   ` Boris Burkov
2023-06-12 12:50     ` Naohiro Aota
2023-06-06  5:36 ` [PATCH 2/4] btrfs: move out now unused BG from the reclaim list Naohiro Aota
2023-06-06 10:06   ` Filipe Manana
2023-06-06 11:33   ` Johannes Thumshirn
2023-06-06  5:36 ` [PATCH 3/4] btrfs: bail out reclaim process if filesystem is read-only Naohiro Aota
2023-06-06 10:16   ` Filipe Manana
2023-06-06 11:34   ` Johannes Thumshirn
2023-06-06  5:36 ` [PATCH 4/4] btrfs: reinsert BGs failed to reclaim Naohiro Aota
2023-06-06 10:25   ` Filipe Manana
2023-06-06 16:23     ` David Sterba
2023-06-06 23:55       ` Naohiro Aota
2023-06-08 11:48         ` David Sterba
2023-06-08 11:50 ` [PATCH 0/4] btrfs: fixes for reclaim David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.