All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: Use the normal writeback path for flushing delalloc
@ 2021-01-04 17:03 Josef Bacik
  2021-01-04 17:23 ` Filipe Manana
  0 siblings, 1 reply; 3+ messages in thread
From: Josef Bacik @ 2021-01-04 17:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: stable, René Rebe

This is a revert for 38d715f494f2 ("btrfs: use
btrfs_start_delalloc_roots in shrink_delalloc").  A user reported a
problem where performance was significantly worse with this patch
applied.  The problem needs to be fixed with proper pre-flushing, and
changes to how we deal with the work queues for the inodes.  However
that work is much more complicated than is acceptable for stable, and
simply reverting this patch fixes the problem.  The original patch was
a cleanup of the code, so it's fine to revert it.  My numbers for the
original reported test, which was untarring a copy of the firefox
sources, are as follows

5.9	0m54.258s
5.10	1m26.212s
Fix	0m35.038s

cc: stable@vger.kernel.org # 5.10
Reported-by: René Rebe <rene@exactcode.de>
Fixes: 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
Dave, this is ontop of linus's branch, because we've changed the arguments for
btrfs_start_delalloc_roots in misc-next, and this needs to go back to 5.10 ASAP.
I can send a misc-next version if you want to have it there as well while we're
waiting for it to go into linus's tree, just let me know.

 fs/btrfs/space-info.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 64099565ab8f..a2b322275b8d 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -465,6 +465,28 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 	up_read(&info->groups_sem);
 }
 
+static void btrfs_writeback_inodes_sb_nr(struct btrfs_fs_info *fs_info,
+					 unsigned long nr_pages, u64 nr_items)
+{
+	struct super_block *sb = fs_info->sb;
+
+	if (down_read_trylock(&sb->s_umount)) {
+		writeback_inodes_sb_nr(sb, nr_pages, WB_REASON_FS_FREE_SPACE);
+		up_read(&sb->s_umount);
+	} else {
+		/*
+		 * We needn't worry the filesystem going from r/w to r/o though
+		 * we don't acquire ->s_umount mutex, because the filesystem
+		 * should guarantee the delalloc inodes list be empty after
+		 * the filesystem is readonly(all dirty pages are written to
+		 * the disk).
+		 */
+		btrfs_start_delalloc_roots(fs_info, nr_items);
+		if (!current->journal_info)
+			btrfs_wait_ordered_roots(fs_info, nr_items, 0, (u64)-1);
+	}
+}
+
 static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
 					u64 to_reclaim)
 {
@@ -490,8 +512,10 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 	struct btrfs_trans_handle *trans;
 	u64 delalloc_bytes;
 	u64 dio_bytes;
+	u64 async_pages;
 	u64 items;
 	long time_left;
+	unsigned long nr_pages;
 	int loops;
 
 	/* Calc the number of the pages we need flush for space reservation */
@@ -532,8 +556,36 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 
 	loops = 0;
 	while ((delalloc_bytes || dio_bytes) && loops < 3) {
-		btrfs_start_delalloc_roots(fs_info, items);
+		nr_pages = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
+
+		/*
+		 * Triggers inode writeback for up to nr_pages. This will invoke
+		 * ->writepages callback and trigger delalloc filling
+		 *  (btrfs_run_delalloc_range()).
+		 */
+		btrfs_writeback_inodes_sb_nr(fs_info, nr_pages, items);
+		/*
+		 * We need to wait for the compressed pages to start before
+		 * we continue.
+		 */
+		async_pages = atomic_read(&fs_info->async_delalloc_pages);
+		if (!async_pages)
+			goto skip_async;
+
+		/*
+		 * Calculate how many compressed pages we want to be written
+		 * before we continue. I.e if there are more async pages than we
+		 * require wait_event will wait until nr_pages are written.
+		 */
+		if (async_pages <= nr_pages)
+			async_pages = 0;
+		else
+			async_pages -= nr_pages;
 
+		wait_event(fs_info->async_submit_wait,
+			   atomic_read(&fs_info->async_delalloc_pages) <=
+			   (int)async_pages);
+skip_async:
 		loops++;
 		if (wait_ordered && !trans) {
 			btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: Use the normal writeback path for flushing delalloc
  2021-01-04 17:03 [PATCH] btrfs: Use the normal writeback path for flushing delalloc Josef Bacik
@ 2021-01-04 17:23 ` Filipe Manana
  2021-01-04 18:28   ` Josef Bacik
  0 siblings, 1 reply; 3+ messages in thread
From: Filipe Manana @ 2021-01-04 17:23 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, stable, René Rebe

On Mon, Jan 4, 2021 at 5:06 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> This is a revert for 38d715f494f2 ("btrfs: use
> btrfs_start_delalloc_roots in shrink_delalloc").  A user reported a
> problem where performance was significantly worse with this patch
> applied.  The problem needs to be fixed with proper pre-flushing, and
> changes to how we deal with the work queues for the inodes.  However
> that work is much more complicated than is acceptable for stable, and
> simply reverting this patch fixes the problem.  The original patch was
> a cleanup of the code, so it's fine to revert it.  My numbers for the
> original reported test, which was untarring a copy of the firefox
> sources, are as follows
>
> 5.9     0m54.258s
> 5.10    1m26.212s
> Fix     0m35.038s
>
> cc: stable@vger.kernel.org # 5.10
> Reported-by: René Rebe <rene@exactcode.de>
> Fixes: 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> Dave, this is ontop of linus's branch, because we've changed the arguments for
> btrfs_start_delalloc_roots in misc-next, and this needs to go back to 5.10 ASAP.
> I can send a misc-next version if you want to have it there as well while we're
> waiting for it to go into linus's tree, just let me know.

Adding this to stable releases will also make the following fix not
work on stable releases:

https://lore.kernel.org/linux-btrfs/39c2a60aa682f69f9823f51aa119d37ef4b9f834.1606909923.git.fdmanana@suse.com/

Since now the async reclaim task can trigger writeback through
writeback_inodes_sb_nr() and not only through
btrfs_start_delalloc_roots().
Other than changing that patch to make extent_write_cache_pages() do
nothing when the inode has the bit BTRFS_INODE_NO_DELALLOC_FLUSH set,
I'm not seeing other simple ways to do it.

Thanks.

>
>  fs/btrfs/space-info.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 64099565ab8f..a2b322275b8d 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -465,6 +465,28 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
>         up_read(&info->groups_sem);
>  }
>
> +static void btrfs_writeback_inodes_sb_nr(struct btrfs_fs_info *fs_info,
> +                                        unsigned long nr_pages, u64 nr_items)
> +{
> +       struct super_block *sb = fs_info->sb;
> +
> +       if (down_read_trylock(&sb->s_umount)) {
> +               writeback_inodes_sb_nr(sb, nr_pages, WB_REASON_FS_FREE_SPACE);
> +               up_read(&sb->s_umount);
> +       } else {
> +               /*
> +                * We needn't worry the filesystem going from r/w to r/o though
> +                * we don't acquire ->s_umount mutex, because the filesystem
> +                * should guarantee the delalloc inodes list be empty after
> +                * the filesystem is readonly(all dirty pages are written to
> +                * the disk).
> +                */
> +               btrfs_start_delalloc_roots(fs_info, nr_items);
> +               if (!current->journal_info)
> +                       btrfs_wait_ordered_roots(fs_info, nr_items, 0, (u64)-1);
> +       }
> +}
> +
>  static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
>                                         u64 to_reclaim)
>  {
> @@ -490,8 +512,10 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
>         struct btrfs_trans_handle *trans;
>         u64 delalloc_bytes;
>         u64 dio_bytes;
> +       u64 async_pages;
>         u64 items;
>         long time_left;
> +       unsigned long nr_pages;
>         int loops;
>
>         /* Calc the number of the pages we need flush for space reservation */
> @@ -532,8 +556,36 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
>
>         loops = 0;
>         while ((delalloc_bytes || dio_bytes) && loops < 3) {
> -               btrfs_start_delalloc_roots(fs_info, items);
> +               nr_pages = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
> +
> +               /*
> +                * Triggers inode writeback for up to nr_pages. This will invoke
> +                * ->writepages callback and trigger delalloc filling
> +                *  (btrfs_run_delalloc_range()).
> +                */
> +               btrfs_writeback_inodes_sb_nr(fs_info, nr_pages, items);
> +               /*
> +                * We need to wait for the compressed pages to start before
> +                * we continue.
> +                */
> +               async_pages = atomic_read(&fs_info->async_delalloc_pages);
> +               if (!async_pages)
> +                       goto skip_async;
> +
> +               /*
> +                * Calculate how many compressed pages we want to be written
> +                * before we continue. I.e if there are more async pages than we
> +                * require wait_event will wait until nr_pages are written.
> +                */
> +               if (async_pages <= nr_pages)
> +                       async_pages = 0;
> +               else
> +                       async_pages -= nr_pages;
>
> +               wait_event(fs_info->async_submit_wait,
> +                          atomic_read(&fs_info->async_delalloc_pages) <=
> +                          (int)async_pages);
> +skip_async:
>                 loops++;
>                 if (wait_ordered && !trans) {
>                         btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1);
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: Use the normal writeback path for flushing delalloc
  2021-01-04 17:23 ` Filipe Manana
@ 2021-01-04 18:28   ` Josef Bacik
  0 siblings, 0 replies; 3+ messages in thread
From: Josef Bacik @ 2021-01-04 18:28 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team, stable, René Rebe

On 1/4/21 12:23 PM, Filipe Manana wrote:
> On Mon, Jan 4, 2021 at 5:06 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> This is a revert for 38d715f494f2 ("btrfs: use
>> btrfs_start_delalloc_roots in shrink_delalloc").  A user reported a
>> problem where performance was significantly worse with this patch
>> applied.  The problem needs to be fixed with proper pre-flushing, and
>> changes to how we deal with the work queues for the inodes.  However
>> that work is much more complicated than is acceptable for stable, and
>> simply reverting this patch fixes the problem.  The original patch was
>> a cleanup of the code, so it's fine to revert it.  My numbers for the
>> original reported test, which was untarring a copy of the firefox
>> sources, are as follows
>>
>> 5.9     0m54.258s
>> 5.10    1m26.212s
>> Fix     0m35.038s
>>
>> cc: stable@vger.kernel.org # 5.10
>> Reported-by: René Rebe <rene@exactcode.de>
>> Fixes: 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>> Dave, this is ontop of linus's branch, because we've changed the arguments for
>> btrfs_start_delalloc_roots in misc-next, and this needs to go back to 5.10 ASAP.
>> I can send a misc-next version if you want to have it there as well while we're
>> waiting for it to go into linus's tree, just let me know.
> 
> Adding this to stable releases will also make the following fix not
> work on stable releases:
> 
> https://lore.kernel.org/linux-btrfs/39c2a60aa682f69f9823f51aa119d37ef4b9f834.1606909923.git.fdmanana@suse.com/
> 
> Since now the async reclaim task can trigger writeback through
> writeback_inodes_sb_nr() and not only through
> btrfs_start_delalloc_roots().
> Other than changing that patch to make extent_write_cache_pages() do
> nothing when the inode has the bit BTRFS_INODE_NO_DELALLOC_FLUSH set,
> I'm not seeing other simple ways to do it.

Hmmm shit, ok let me see if I can make the perf regression go away while still 
using btrfs_start_delalloc_roots().  Thanks,

Josef

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-01-04 18:29 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-04 17:03 [PATCH] btrfs: Use the normal writeback path for flushing delalloc Josef Bacik
2021-01-04 17:23 ` Filipe Manana
2021-01-04 18:28   ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.