* [PATCH] btrfs: fallback to single page allocation to avoid bulk allocation latency
@ 2024-03-24 20:20 Qu Wenruo
  2024-03-25 12:32 ` Filipe Manana
  2024-03-25 21:18 ` Sweet Tea Dorminy
  0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2024-03-24 20:20 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Julian Taylor

[BUG]
There is a recent report that when memory pressure is high (including
cached pages), btrfs can spend most of its time on memory allocation in
btrfs_alloc_page_array().

[CAUSE]
In btrfs_alloc_page_array() we always call alloc_pages_bulk_array(), and
even if the bulk allocation fails we still retry, with an extra
memalloc_retry_wait() between attempts.

If the bulk allocation only returns one page at a time, we can spend a
lot of time in the retry waits.

[FIX]
Instead of retrying the same bulk allocation, fall back to single page
allocation if the initial bulk allocation attempt doesn't fill the
whole request.

Even if this means a higher chance of memory allocation failure.

Reported-by: Julian Taylor <julian.taylor@1und1.de>
Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7441245b1ceb..d49e7f0384ed 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -681,33 +681,22 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
 int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
 			   gfp_t extra_gfp)
 {
+	const gfp_t gfp = GFP_NOFS | extra_gfp;
 	unsigned int allocated;
 
-	for (allocated = 0; allocated < nr_pages;) {
-		unsigned int last = allocated;
-
-		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
-						   nr_pages, page_array);
-
-		if (allocated == nr_pages)
-			return 0;
-
-		/*
-		 * During this iteration, no page could be allocated, even
-		 * though alloc_pages_bulk_array() falls back to alloc_page()
-		 * if  it could not bulk-allocate. So we must be out of memory.
-		 */
-		if (allocated == last) {
-			for (int i = 0; i < allocated; i++) {
-				__free_page(page_array[i]);
-				page_array[i] = NULL;
-			}
-			return -ENOMEM;
-		}
-
-		memalloc_retry_wait(GFP_NOFS);
+	allocated = alloc_pages_bulk_array(GFP_NOFS | gfp, nr_pages, page_array);
+	for (; allocated < nr_pages; allocated++) {
+		page_array[allocated] = alloc_page(gfp);
+		if (unlikely(!page_array[allocated]))
+			goto enomem;
 	}
 	return 0;
+enomem:
+	for (int i = 0; i < allocated; i++) {
+		__free_page(page_array[i]);
+		page_array[i] = NULL;
+	}
+	return -ENOMEM;
 }
 
 /*
-- 
2.44.0



* Re: [PATCH] btrfs: fallback to single page allocation to avoid bulk allocation latency
  2024-03-24 20:20 [PATCH] btrfs: fallback to single page allocation to avoid bulk allocation latency Qu Wenruo
@ 2024-03-25 12:32 ` Filipe Manana
  2024-03-25 21:18 ` Sweet Tea Dorminy
  1 sibling, 0 replies; 5+ messages in thread
From: Filipe Manana @ 2024-03-25 12:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Julian Taylor

On Sun, Mar 24, 2024 at 8:21 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> There is a recent report that when memory pressure is high (including
> cached pages), btrfs can spend most of its time on memory allocation in
> btrfs_alloc_page_array().
>
> [CAUSE]
> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
> even if the bulk allocation failed we still retry but with extra
> memalloc_retry_wait().
>
> If the bulk alloc only returned one page a time, we would spend a lot of
> time on the retry wait.
>
> [FIX]
> Instead of always trying the same bulk allocation, fallback to single
> page allocation if the initial bulk allocation attempt doesn't fill the
> whole request.
>
> Even if this means a higher chance of memory allocation failure.

Well, this last sentence is a bit confusing.
I mean the chances are the same as we had before commit 91d6ac1d62c3
("btrfs: allocate page arrays using bulk page allocator").

>
> Reported-by: Julian Taylor <julian.taylor@1und1.de>
> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/

As Julian seems to have tested this and confirmed it solves the
regression for him, a Tested-by tag should be here.

Also given that this seems like a huge performance drop, a Fixes tag
would be appropriate here.

Just one comment about the code below.

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 35 ++++++++++++-----------------------
>  1 file changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7441245b1ceb..d49e7f0384ed 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -681,33 +681,22 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>  int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
>                            gfp_t extra_gfp)
>  {
> +       const gfp_t gfp = GFP_NOFS | extra_gfp;
>         unsigned int allocated;
>
> -       for (allocated = 0; allocated < nr_pages;) {
> -               unsigned int last = allocated;
> -
> -               allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
> -                                                  nr_pages, page_array);
> -
> -               if (allocated == nr_pages)
> -                       return 0;
> -
> -               /*
> -                * During this iteration, no page could be allocated, even
> -                * though alloc_pages_bulk_array() falls back to alloc_page()
> -                * if  it could not bulk-allocate. So we must be out of memory.
> -                */
> -               if (allocated == last) {
> -                       for (int i = 0; i < allocated; i++) {
> -                               __free_page(page_array[i]);
> -                               page_array[i] = NULL;
> -                       }
> -                       return -ENOMEM;
> -               }
> -
> -               memalloc_retry_wait(GFP_NOFS);
> +       allocated = alloc_pages_bulk_array(GFP_NOFS | gfp, nr_pages, page_array);

No need to add GFP_NOFS, the gfp variable already has it.

Otherwise it looks good, so with that adjusted:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +       for (; allocated < nr_pages; allocated++) {
> +               page_array[allocated] = alloc_page(gfp);
> +               if (unlikely(!page_array[allocated]))
> +                       goto enomem;
>         }
>         return 0;
> +enomem:
> +       for (int i = 0; i < allocated; i++) {
> +               __free_page(page_array[i]);
> +               page_array[i] = NULL;
> +       }
> +       return -ENOMEM;
>  }
>
>  /*
> --
> 2.44.0
>
>


* Re: [PATCH] btrfs: fallback to single page allocation to avoid bulk allocation latency
  2024-03-24 20:20 [PATCH] btrfs: fallback to single page allocation to avoid bulk allocation latency Qu Wenruo
  2024-03-25 12:32 ` Filipe Manana
@ 2024-03-25 21:18 ` Sweet Tea Dorminy
  2024-03-25 21:38   ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-25 21:18 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Julian Taylor



On 3/24/24 16:20, Qu Wenruo wrote:
> [BUG]
> There is a recent report that when memory pressure is high (including
> cached pages), btrfs can spend most of its time on memory allocation in
> btrfs_alloc_page_array().
> 
> [CAUSE]
> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
> even if the bulk allocation failed we still retry but with extra
> memalloc_retry_wait().
> 
> If the bulk alloc only returned one page a time, we would spend a lot of
> time on the retry wait.
> 
> [FIX]
> Instead of always trying the same bulk allocation, fallback to single
> page allocation if the initial bulk allocation attempt doesn't fill the
> whole request.


I still think the argument in 395cb57e85604 is reasonably compelling: 
that it's polite and normative among filesystems to throw in a 
retry_wait() between attempts to allocate, and I'm not sure why we're 
seeing this performance impact where others seemingly don't.

But your change does a lot more than just cutting out the retry_wait(); 
it only tries the bulk allocator once. Given the bulk allocator falls 
back to the single-page allocator itself and can allocate multiple pages 
on any given iteration, I think it's better to just cut out the 
retry_wait() if we must, in hopes we can allocate more than one when 
retrying?

Or, could we add __GFP_RETRY_MAYFAIL to the btrfs_alloc_page_array() 
call in compression.c instead, so that page reclaim is started if the 
initial alloc_pages_bulk_array() doesn't work out?


* Re: [PATCH] btrfs: fallback to single page allocation to avoid bulk allocation latency
  2024-03-25 21:18 ` Sweet Tea Dorminy
@ 2024-03-25 21:38   ` Qu Wenruo
  2024-03-25 22:54     ` Sweet Tea Dorminy
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2024-03-25 21:38 UTC (permalink / raw)
  To: Sweet Tea Dorminy, linux-btrfs; +Cc: Julian Taylor



On 2024/3/26 07:48, Sweet Tea Dorminy wrote:
> 
> 
> On 3/24/24 16:20, Qu Wenruo wrote:
>> [BUG]
>> There is a recent report that when memory pressure is high (including
>> cached pages), btrfs can spend most of its time on memory allocation in
>> btrfs_alloc_page_array().
>>
>> [CAUSE]
>> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
>> even if the bulk allocation failed we still retry but with extra
>> memalloc_retry_wait().
>>
>> If the bulk alloc only returned one page a time, we would spend a lot of
>> time on the retry wait.
>>
>> [FIX]
>> Instead of always trying the same bulk allocation, fallback to single
>> page allocation if the initial bulk allocation attempt doesn't fill the
>> whole request.
> 
> 
> I still think the argument in 395cb57e85604 is reasonably compelling: 
> that it's polite and normative among filesystems to throw in a 
> retry_wait() between attempts to allocate, and I'm not sure why we're 
> seeing this performance impact where others seemingly don't.

The reporter is hitting this during compression, while no other 
major fs (until the recent bcachefs?) supports compression.

Thus I guess we hit a corner case that other filesystems don't?

> 
> But your change does a lot more than just cutting out the retry_wait(); 
> it only tries the bulk allocator once. Given the bulk allocator falls 
> back to the single-page allocator itself and can allocate multiple pages 
> on any given iteration, I think it's better to just cut out the 
> retry_wait() if we must, in hopes we can allocate more than one when 
> retrying?
> 
> Or, could we add in GFP_RETRY_MAYFAIL into the btrfs_alloc_page_array() 
> call in compression.c instead, so that page reclaim is started if the 
> initial alloc_pages_bulk_array() doesn't work out?

My question is that I don't have a good reference for why we should do 
the retry wait even after a successful allocation.

In f2fs/ext4/XFS, memalloc_retry_wait() is only called on page 
allocation failure (i.e., no allocation was done at all), not 
unconditionally after each bulk allocation.

Thus I'm wondering if it's really a policy/common practice to do the 
wait after a successful allocation at all.

In that case, I'm totally fine with only calling the retry wait if we 
are unable to allocate any page, rather than unconditionally.

Thanks,
Qu


* Re: [PATCH] btrfs: fallback to single page allocation to avoid bulk allocation latency
  2024-03-25 21:38   ` Qu Wenruo
@ 2024-03-25 22:54     ` Sweet Tea Dorminy
  0 siblings, 0 replies; 5+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-25 22:54 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Julian Taylor


> 
> The reporter is hitting this during compression, meanwhile no other 
> major fs (until recent bcachefs?) supports compression.
> 
> Thus I guess we have a corner case where other fses don't?

Makes sense. And since read success depends on it, it's latency 
sensitive, so I think that could be a justification if we did eventually 
want to throw __GFP_RETRY_MAYFAIL in the flags.

> 
> My question is, I don't have a good reference on why we should retry 
> wait even for successful allocation.
> 
> In f2fs/ext4/XFS, the memalloc_retry_wait() is only called for page 
> allocation failure (aka, no allocation is done at all), not after each 
> bulk allocation unconditionally.
> 
> Thus I'm wondering if it's really a policy/common practice to do the 
> wait for successful allocation at all.

You make a good point. 
https://elixir.bootlin.com/linux/latest/source/include/linux/sched/mm.h#L267 
seems to indicate one should use it on every retry of memory allocation, 
but in practice it does appear that it's only done for a complete 
failure to allocate.

> 
> In that case, I'm totally fine to only call retry wait if we are unable 
> to allocate any page, other than unconditionally.
> 

Okay, I agree that that's a good plan, and I appreciate your 
discussion. I don't know whether it's worth giving up and freeing if we 
fail to make progress twice, but I don't think it matters.

Thank you very much!

Sweet Tea

