All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] btrfs: do not wait for short bulk allocation
@ 2024-03-25 22:46 Qu Wenruo
  2024-03-25 22:57 ` Sweet Tea Dorminy
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-03-25 22:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Julian Taylor, Filipe Manana

[BUG]
There is a recent report that when memory pressure is high (including
cached pages), btrfs can spend most of its time on memory allocation in
btrfs_alloc_page_array() for compressed read/write.

[CAUSE]
For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
even if the bulk allocation failed (fell back to single page
allocation) we still retry but with extra memalloc_retry_wait().

If the bulk alloc only returned one page a time, we would spend a lot of
time on the retry wait.

The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
incomplete batch memory allocations").

[FIX]
Although the commit mentioned that other filesystems do the wait, it's
not the case at least nowadays.

All the mainlined filesystems only call memalloc_retry_wait() if they
failed to allocate any page (not only for bulk allocation).
If there is any progress, they won't call memalloc_retry_wait() at all.

For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
if there is no allocation progress at all, and the call is not for
metadata readahead.

So I don't believe we should call memalloc_retry_wait() unconditionally
for short allocation.

This patch would only call memalloc_retry_wait() if failed to allocate
any page for tree block allocation (which goes with __GFP_NOFAIL and may
not need the special handling anyway), and reduce the latency for
btrfs_alloc_page_array().

Reported-by: Julian Taylor <julian.taylor@1und1.de>
Tested-by: Julian Taylor <julian.taylor@1und1.de>
Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Still use bulk allocation function
  Since alloc_pages_bulk_array() would fall back to single page
  allocation by itself, there is no need to go alloc_page() manually.

- Update the commit message to indicate other fses do not call
  memalloc_retry_wait() unconditionally
  In fact, they only call it when they need to retry hard and can not
  really fail.
---
 fs/btrfs/extent_io.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7441245b1ceb..c96089b6f388 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -681,31 +681,27 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
 int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
 			   gfp_t extra_gfp)
 {
+	const gfp_t gfp = GFP_NOFS | extra_gfp;
 	unsigned int allocated;
 
 	for (allocated = 0; allocated < nr_pages;) {
 		unsigned int last = allocated;
 
-		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
-						   nr_pages, page_array);
+		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
+		if (unlikely(allocated == last)) {
+			/* Can not fail, wait and retry. */
+			if (extra_gfp & __GFP_NOFAIL) {
+				memalloc_retry_wait(GFP_NOFS);
+				continue;
+			}
 
-		if (allocated == nr_pages)
-			return 0;
-
-		/*
-		 * During this iteration, no page could be allocated, even
-		 * though alloc_pages_bulk_array() falls back to alloc_page()
-		 * if  it could not bulk-allocate. So we must be out of memory.
-		 */
-		if (allocated == last) {
+			/* Allowed to fail, error out. */
 			for (int i = 0; i < allocated; i++) {
 				__free_page(page_array[i]);
 				page_array[i] = NULL;
 			}
 			return -ENOMEM;
 		}
-
-		memalloc_retry_wait(GFP_NOFS);
 	}
 	return 0;
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
  2024-03-25 22:46 [PATCH v2] btrfs: do not wait for short bulk allocation Qu Wenruo
@ 2024-03-25 22:57 ` Sweet Tea Dorminy
  2024-03-26 13:05 ` Johannes Thumshirn
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-25 22:57 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Julian Taylor, Filipe Manana

Thanks!!

On 3/25/24 18:46, Qu Wenruo wrote:
> [BUG]
> There is a recent report that when memory pressure is high (including
> cached pages), btrfs can spend most of its time on memory allocation in
> btrfs_alloc_page_array() for compressed read/write.
> 
> [CAUSE]
> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
> even if the bulk allocation failed (fell back to single page
> allocation) we still retry but with extra memalloc_retry_wait().
> 
> If the bulk alloc only returned one page a time, we would spend a lot of
> time on the retry wait.
> 
> The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
> incomplete batch memory allocations").
> 
> [FIX]
> Although the commit mentioned that other filesystems do the wait, it's
> not the case at least nowadays.
> 
> All the mainlined filesystems only call memalloc_retry_wait() if they
> failed to allocate any page (not only for bulk allocation).
> If there is any progress, they won't call memalloc_retry_wait() at all.
> 
> For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
> if there is no allocation progress at all, and the call is not for
> metadata readahead.
> 
> So I don't believe we should call memalloc_retry_wait() unconditionally
> for short allocation.
> 
> This patch would only call memalloc_retry_wait() if failed to allocate
> any page for tree block allocation (which goes with __GFP_NOFAIL and may
> not need the special handling anyway), and reduce the latency for
> btrfs_alloc_page_array().
> 
> Reported-by: Julian Taylor <julian.taylor@1und1.de>
> Tested-by: Julian Taylor <julian.taylor@1und1.de>
> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
> Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

> ---
> Changelog:
> v2:
> - Still use bulk allocation function
>    Since alloc_pages_bulk_array() would fall back to single page
>    allocation by itself, there is no need to go alloc_page() manually.
> 
> - Update the commit message to indicate other fses do not call
>    memalloc_retry_wait() unconditionally
>    In fact, they only call it when they need to retry hard and can not
>    really fail.
> ---
>   fs/btrfs/extent_io.c | 22 +++++++++-------------
>   1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7441245b1ceb..c96089b6f388 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -681,31 +681,27 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>   int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
>   			   gfp_t extra_gfp)
>   {
> +	const gfp_t gfp = GFP_NOFS | extra_gfp;
>   	unsigned int allocated;
>   
>   	for (allocated = 0; allocated < nr_pages;) {
>   		unsigned int last = allocated;
>   
> -		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
> -						   nr_pages, page_array);
> +		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
> +		if (unlikely(allocated == last)) {
> +			/* Can not fail, wait and retry. */
> +			if (extra_gfp & __GFP_NOFAIL) {
> +				memalloc_retry_wait(GFP_NOFS);
> +				continue;
> +			}
>   
> -		if (allocated == nr_pages)
> -			return 0;
> -
> -		/*
> -		 * During this iteration, no page could be allocated, even
> -		 * though alloc_pages_bulk_array() falls back to alloc_page()
> -		 * if  it could not bulk-allocate. So we must be out of memory.
> -		 */
> -		if (allocated == last) {
> +			/* Allowed to fail, error out. */
>   			for (int i = 0; i < allocated; i++) {
>   				__free_page(page_array[i]);
>   				page_array[i] = NULL;
>   			}
>   			return -ENOMEM;
>   		}
> -
> -		memalloc_retry_wait(GFP_NOFS);
>   	}
>   	return 0;
>   }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
  2024-03-25 22:46 [PATCH v2] btrfs: do not wait for short bulk allocation Qu Wenruo
  2024-03-25 22:57 ` Sweet Tea Dorminy
@ 2024-03-26 13:05 ` Johannes Thumshirn
  2024-03-28 15:57 ` David Sterba
       [not found] ` <20240414202622.B092.409509F4@e16-tech.com>
  3 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2024-03-26 13:05 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Julian Taylor, Filipe Manana

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
  2024-03-25 22:46 [PATCH v2] btrfs: do not wait for short bulk allocation Qu Wenruo
  2024-03-25 22:57 ` Sweet Tea Dorminy
  2024-03-26 13:05 ` Johannes Thumshirn
@ 2024-03-28 15:57 ` David Sterba
  2024-03-28 20:29   ` Qu Wenruo
       [not found] ` <20240414202622.B092.409509F4@e16-tech.com>
  3 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2024-03-28 15:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Julian Taylor, Filipe Manana

On Tue, Mar 26, 2024 at 09:16:46AM +1030, Qu Wenruo wrote:
> [BUG]
> There is a recent report that when memory pressure is high (including
> cached pages), btrfs can spend most of its time on memory allocation in
> btrfs_alloc_page_array() for compressed read/write.
> 
> [CAUSE]
> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
> even if the bulk allocation failed (fell back to single page
> allocation) we still retry but with extra memalloc_retry_wait().
> 
> If the bulk alloc only returned one page a time, we would spend a lot of
> time on the retry wait.
> 
> The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
> incomplete batch memory allocations").
> 
> [FIX]
> Although the commit mentioned that other filesystems do the wait, it's
> not the case at least nowadays.
> 
> All the mainlined filesystems only call memalloc_retry_wait() if they
> failed to allocate any page (not only for bulk allocation).
> If there is any progress, they won't call memalloc_retry_wait() at all.
> 
> For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
> if there is no allocation progress at all, and the call is not for
> metadata readahead.
> 
> So I don't believe we should call memalloc_retry_wait() unconditionally
> for short allocation.
> 
> This patch would only call memalloc_retry_wait() if failed to allocate
> any page for tree block allocation (which goes with __GFP_NOFAIL and may
> not need the special handling anyway), and reduce the latency for
> btrfs_alloc_page_array().

Is this saying that it can fail with GFP_NOFAIL?

> Reported-by: Julian Taylor <julian.taylor@1und1.de>
> Tested-by: Julian Taylor <julian.taylor@1und1.de>
> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
> Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Still use bulk allocation function
>   Since alloc_pages_bulk_array() would fall back to single page
>   allocation by itself, there is no need to go alloc_page() manually.
> 
> - Update the commit message to indicate other fses do not call
>   memalloc_retry_wait() unconditionally
>   In fact, they only call it when they need to retry hard and can not
>   really fail.
> ---
>  fs/btrfs/extent_io.c | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7441245b1ceb..c96089b6f388 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -681,31 +681,27 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>  int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
>  			   gfp_t extra_gfp)
>  {
> +	const gfp_t gfp = GFP_NOFS | extra_gfp;
>  	unsigned int allocated;
>  
>  	for (allocated = 0; allocated < nr_pages;) {
>  		unsigned int last = allocated;
>  
> -		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
> -						   nr_pages, page_array);
> +		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
> +		if (unlikely(allocated == last)) {
> +			/* Can not fail, wait and retry. */
> +			if (extra_gfp & __GFP_NOFAIL) {
> +				memalloc_retry_wait(GFP_NOFS);

Can this happen? alloc_pages_bulk_array() should not fail when
GFP_NOFAIL is passed, there are two allocation phases in
__alloc_pages_bulk() and if it falls back to __alloc_pages() + NOFAIL it
will not fail ... so what's the point of the retry?

Anyway the whole thing with NOFAIL flag that's passed only from
alloc_extent_buffer() could be made a bit more straightforward. The gfp
flags replaced by a bool with 'nofail' semantics or 2 helpers one that
is for normal use an the one for the special purpose.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
  2024-03-28 15:57 ` David Sterba
@ 2024-03-28 20:29   ` Qu Wenruo
  2024-04-04 19:57     ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2024-03-28 20:29 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs, Julian Taylor, Filipe Manana



在 2024/3/29 02:27, David Sterba 写道:
> On Tue, Mar 26, 2024 at 09:16:46AM +1030, Qu Wenruo wrote:
>> [BUG]
>> There is a recent report that when memory pressure is high (including
>> cached pages), btrfs can spend most of its time on memory allocation in
>> btrfs_alloc_page_array() for compressed read/write.
>>
>> [CAUSE]
>> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
>> even if the bulk allocation failed (fell back to single page
>> allocation) we still retry but with extra memalloc_retry_wait().
>>
>> If the bulk alloc only returned one page a time, we would spend a lot of
>> time on the retry wait.
>>
>> The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
>> incomplete batch memory allocations").
>>
>> [FIX]
>> Although the commit mentioned that other filesystems do the wait, it's
>> not the case at least nowadays.
>>
>> All the mainlined filesystems only call memalloc_retry_wait() if they
>> failed to allocate any page (not only for bulk allocation).
>> If there is any progress, they won't call memalloc_retry_wait() at all.
>>
>> For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
>> if there is no allocation progress at all, and the call is not for
>> metadata readahead.
>>
>> So I don't believe we should call memalloc_retry_wait() unconditionally
>> for short allocation.
>>
>> This patch would only call memalloc_retry_wait() if failed to allocate
>> any page for tree block allocation (which goes with __GFP_NOFAIL and may
>> not need the special handling anyway), and reduce the latency for
>> btrfs_alloc_page_array().
>
> Is this saying that it can fail with GFP_NOFAIL?

I'd say no, but never say no to memory allocation failure.

>
>> Reported-by: Julian Taylor <julian.taylor@1und1.de>
>> Tested-by: Julian Taylor <julian.taylor@1und1.de>
>> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
>> Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Still use bulk allocation function
>>    Since alloc_pages_bulk_array() would fall back to single page
>>    allocation by itself, there is no need to go alloc_page() manually.
>>
>> - Update the commit message to indicate other fses do not call
>>    memalloc_retry_wait() unconditionally
>>    In fact, they only call it when they need to retry hard and can not
>>    really fail.
>> ---
>>   fs/btrfs/extent_io.c | 22 +++++++++-------------
>>   1 file changed, 9 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 7441245b1ceb..c96089b6f388 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -681,31 +681,27 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>>   int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
>>   			   gfp_t extra_gfp)
>>   {
>> +	const gfp_t gfp = GFP_NOFS | extra_gfp;
>>   	unsigned int allocated;
>>
>>   	for (allocated = 0; allocated < nr_pages;) {
>>   		unsigned int last = allocated;
>>
>> -		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
>> -						   nr_pages, page_array);
>> +		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
>> +		if (unlikely(allocated == last)) {
>> +			/* Can not fail, wait and retry. */
>> +			if (extra_gfp & __GFP_NOFAIL) {
>> +				memalloc_retry_wait(GFP_NOFS);
>
> Can this happen? alloc_pages_bulk_array() should not fail when
> GFP_NOFAIL is passed, there are two allocation phases in
> __alloc_pages_bulk() and if it falls back to __alloc_pages() + NOFAIL it
> will not fail ... so what's the point of the retry?

Yeah, that's also one of my concern.

Unlike other fses, btrfs utilizes NOFAIL for metadata memory allocation,
meanwhile others do not.
E.g. XFS always do the retry wait even the allocation does not got a
page allocated. (aka, another kind of NOFAIL).

If needed, I can drop the retry part completely.

>
> Anyway the whole thing with NOFAIL flag that's passed only from
> alloc_extent_buffer() could be made a bit more straightforward. The gfp
> flags replaced by a bool with 'nofail' semantics or 2 helpers one that
> is for normal use an the one for the special purpose.
>

Sure I can do extra cleanups on this.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
  2024-03-28 20:29   ` Qu Wenruo
@ 2024-04-04 19:57     ` David Sterba
  2024-04-04 21:08       ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2024-04-04 19:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Julian Taylor, Filipe Manana

On Fri, Mar 29, 2024 at 06:59:57AM +1030, Qu Wenruo wrote:
> >> This patch would only call memalloc_retry_wait() if failed to allocate
> >> any page for tree block allocation (which goes with __GFP_NOFAIL and may
> >> not need the special handling anyway), and reduce the latency for
> >> btrfs_alloc_page_array().
> >
> > Is this saying that it can fail with GFP_NOFAIL?
> 
> I'd say no, but never say no to memory allocation failure.

It's contradicting and looks confusing in the code, either it fails or
not.

> >> Reported-by: Julian Taylor <julian.taylor@1und1.de>
> >> Tested-by: Julian Taylor <julian.taylor@1und1.de>
> >> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
> >> Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
> >> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> Changelog:
> >> v2:
> >> - Still use bulk allocation function
> >>    Since alloc_pages_bulk_array() would fall back to single page
> >>    allocation by itself, there is no need to go alloc_page() manually.
> >>
> >> - Update the commit message to indicate other fses do not call
> >>    memalloc_retry_wait() unconditionally
> >>    In fact, they only call it when they need to retry hard and can not
> >>    really fail.
> >> ---
> >>   fs/btrfs/extent_io.c | 22 +++++++++-------------
> >>   1 file changed, 9 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> >> index 7441245b1ceb..c96089b6f388 100644
> >> --- a/fs/btrfs/extent_io.c
> >> +++ b/fs/btrfs/extent_io.c
> >> @@ -681,31 +681,27 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
> >>   int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
> >>   			   gfp_t extra_gfp)
> >>   {
> >> +	const gfp_t gfp = GFP_NOFS | extra_gfp;
> >>   	unsigned int allocated;
> >>
> >>   	for (allocated = 0; allocated < nr_pages;) {
> >>   		unsigned int last = allocated;
> >>
> >> -		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
> >> -						   nr_pages, page_array);
> >> +		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
> >> +		if (unlikely(allocated == last)) {
> >> +			/* Can not fail, wait and retry. */
> >> +			if (extra_gfp & __GFP_NOFAIL) {
> >> +				memalloc_retry_wait(GFP_NOFS);
> >
> > Can this happen? alloc_pages_bulk_array() should not fail when
> > GFP_NOFAIL is passed, there are two allocation phases in
> > __alloc_pages_bulk() and if it falls back to __alloc_pages() + NOFAIL it
> > will not fail ... so what's the point of the retry?
> 
> Yeah, that's also one of my concern.
> 
> Unlike other fses, btrfs utilizes NOFAIL for metadata memory allocation,
> meanwhile others do not.
> E.g. XFS always do the retry wait even the allocation does not got a
> page allocated. (aka, another kind of NOFAIL).
> 
> If needed, I can drop the retry part completely.

The retry logic could make sense for the normal allocations (ie. without
NOFAIL), but as it is now it's confusing. If memory management and
allocator says something does not fail then we should take it as such,
same what we do with bioset allocations of bios and do not expect them
to fail.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
  2024-04-04 19:57     ` David Sterba
@ 2024-04-04 21:08       ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-04-04 21:08 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs, Julian Taylor, Filipe Manana



在 2024/4/5 06:27, David Sterba 写道:
> On Fri, Mar 29, 2024 at 06:59:57AM +1030, Qu Wenruo wrote:
>>>> This patch would only call memalloc_retry_wait() if failed to allocate
>>>> any page for tree block allocation (which goes with __GFP_NOFAIL and may
>>>> not need the special handling anyway), and reduce the latency for
>>>> btrfs_alloc_page_array().
>>>
>>> Is this saying that it can fail with GFP_NOFAIL?
>>
>> I'd say no, but never say no to memory allocation failure.
>
> It's contradicting and looks confusing in the code, either it fails or
> not.
>
>>>> Reported-by: Julian Taylor <julian.taylor@1und1.de>
>>>> Tested-by: Julian Taylor <julian.taylor@1und1.de>
>>>> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
>>>> Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
>>>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>> Changelog:
>>>> v2:
>>>> - Still use bulk allocation function
>>>>     Since alloc_pages_bulk_array() would fall back to single page
>>>>     allocation by itself, there is no need to go alloc_page() manually.
>>>>
>>>> - Update the commit message to indicate other fses do not call
>>>>     memalloc_retry_wait() unconditionally
>>>>     In fact, they only call it when they need to retry hard and can not
>>>>     really fail.
>>>> ---
>>>>    fs/btrfs/extent_io.c | 22 +++++++++-------------
>>>>    1 file changed, 9 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>> index 7441245b1ceb..c96089b6f388 100644
>>>> --- a/fs/btrfs/extent_io.c
>>>> +++ b/fs/btrfs/extent_io.c
>>>> @@ -681,31 +681,27 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>>>>    int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
>>>>    			   gfp_t extra_gfp)
>>>>    {
>>>> +	const gfp_t gfp = GFP_NOFS | extra_gfp;
>>>>    	unsigned int allocated;
>>>>
>>>>    	for (allocated = 0; allocated < nr_pages;) {
>>>>    		unsigned int last = allocated;
>>>>
>>>> -		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
>>>> -						   nr_pages, page_array);
>>>> +		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
>>>> +		if (unlikely(allocated == last)) {
>>>> +			/* Can not fail, wait and retry. */
>>>> +			if (extra_gfp & __GFP_NOFAIL) {
>>>> +				memalloc_retry_wait(GFP_NOFS);
>>>
>>> Can this happen? alloc_pages_bulk_array() should not fail when
>>> GFP_NOFAIL is passed, there are two allocation phases in
>>> __alloc_pages_bulk() and if it falls back to __alloc_pages() + NOFAIL it
>>> will not fail ... so what's the point of the retry?
>>
>> Yeah, that's also one of my concern.
>>
>> Unlike other fses, btrfs utilizes NOFAIL for metadata memory allocation,
>> meanwhile others do not.
>> E.g. XFS always do the retry wait even the allocation does not got a
>> page allocated. (aka, another kind of NOFAIL).
>>
>> If needed, I can drop the retry part completely.
>
> The retry logic could make sense for the normal allocations (ie. without
> NOFAIL), but as it is now it's confusing. If memory management and
> allocator says something does not fail then we should take it as such,
> same what we do with bioset allocations of bios and do not expect them
> to fail.

Got it, would remove the retry part completely.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
       [not found] ` <20240414202622.B092.409509F4@e16-tech.com>
@ 2024-04-14 22:19   ` Qu Wenruo
  2024-04-15  2:35     ` Wang Yugui
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2024-04-14 22:19 UTC (permalink / raw)
  To: Wang Yugui; +Cc: linux-btrfs, Julian Taylor, Filipe Manana



在 2024/4/14 21:56, Wang Yugui 写道:
> Hi,
> 
>> [BUG]
>> There is a recent report that when memory pressure is high (including
>> cached pages), btrfs can spend most of its time on memory allocation in
>> btrfs_alloc_page_array() for compressed read/write.
>>
>> [CAUSE]
>> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
>> even if the bulk allocation failed (fell back to single page
>> allocation) we still retry but with extra memalloc_retry_wait().
>>
>> If the bulk alloc only returned one page a time, we would spend a lot of
>> time on the retry wait.
>>
>> The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
>> incomplete batch memory allocations").
>>
>> [FIX]
>> Although the commit mentioned that other filesystems do the wait, it's
>> not the case at least nowadays.
>>
>> All the mainlined filesystems only call memalloc_retry_wait() if they
>> failed to allocate any page (not only for bulk allocation).
>> If there is any progress, they won't call memalloc_retry_wait() at all.
>>
>> For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
>> if there is no allocation progress at all, and the call is not for
>> metadata readahead.
>>
>> So I don't believe we should call memalloc_retry_wait() unconditionally
>> for short allocation.
>>
>> This patch would only call memalloc_retry_wait() if failed to allocate
>> any page for tree block allocation (which goes with __GFP_NOFAIL and may
>> not need the special handling anyway), and reduce the latency for
>> btrfs_alloc_page_array().
>>
>> Reported-by: Julian Taylor <julian.taylor@1und1.de>
>> Tested-by: Julian Taylor <julian.taylor@1und1.de>
>> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
>> Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
> 
> It seems this patch remove all the logic of
> 	395cb57e8560 ("btrfs: wait between incomplete batch memory allocations"),
> 
> so we should revert this part too?

Oh, right.

Feel free to submit a patch to cleanup the headers here.

Thanks,
Qu
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c140dd0..df4675e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -6,7 +6,6 @@
>   #include <linux/mm.h>
>   #include <linux/pagemap.h>
>   #include <linux/page-flags.h>
> -#include <linux/sched/mm.h>
>   #include <linux/spinlock.h>
>   #include <linux/blkdev.h>
>   #include <linux/swap.h>
> 
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2024/04/14
> 
> 
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Still use bulk allocation function
>>    Since alloc_pages_bulk_array() would fall back to single page
>>    allocation by itself, there is no need to go alloc_page() manually.
>>
>> - Update the commit message to indicate other fses do not call
>>    memalloc_retry_wait() unconditionally
>>    In fact, they only call it when they need to retry hard and can not
>>    really fail.
>> ---
>>   fs/btrfs/extent_io.c | 22 +++++++++-------------
>>   1 file changed, 9 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 7441245b1ceb..c96089b6f388 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -681,31 +681,27 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>>   int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
>>   			   gfp_t extra_gfp)
>>   {
>> +	const gfp_t gfp = GFP_NOFS | extra_gfp;
>>   	unsigned int allocated;
>>   
>>   	for (allocated = 0; allocated < nr_pages;) {
>>   		unsigned int last = allocated;
>>   
>> -		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
>> -						   nr_pages, page_array);
>> +		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
>> +		if (unlikely(allocated == last)) {
>> +			/* Can not fail, wait and retry. */
>> +			if (extra_gfp & __GFP_NOFAIL) {
>> +				memalloc_retry_wait(GFP_NOFS);
>> +				continue;
>> +			}
>>   
>> -		if (allocated == nr_pages)
>> -			return 0;
>> -
>> -		/*
>> -		 * During this iteration, no page could be allocated, even
>> -		 * though alloc_pages_bulk_array() falls back to alloc_page()
>> -		 * if  it could not bulk-allocate. So we must be out of memory.
>> -		 */
>> -		if (allocated == last) {
>> +			/* Allowed to fail, error out. */
>>   			for (int i = 0; i < allocated; i++) {
>>   				__free_page(page_array[i]);
>>   				page_array[i] = NULL;
>>   			}
>>   			return -ENOMEM;
>>   		}
>> -
>> -		memalloc_retry_wait(GFP_NOFS);
>>   	}
>>   	return 0;
>>   }
>> -- 
>> 2.44.0
>>
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] btrfs: do not wait for short bulk allocation
  2024-04-14 22:19   ` Qu Wenruo
@ 2024-04-15  2:35     ` Wang Yugui
  0 siblings, 0 replies; 9+ messages in thread
From: Wang Yugui @ 2024-04-15  2:35 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Julian Taylor, Filipe Manana

Hi,
> 
> 
> 在 2024/4/14 21:56, Wang Yugui 写道:
> > Hi,
> >
> >> [BUG]
> >> There is a recent report that when memory pressure is high (including
> >> cached pages), btrfs can spend most of its time on memory allocation in
> >> btrfs_alloc_page_array() for compressed read/write.
> >>
> >> [CAUSE]
> >> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
> >> even if the bulk allocation failed (fell back to single page
> >> allocation) we still retry but with extra memalloc_retry_wait().
> >>
> >> If the bulk alloc only returned one page a time, we would spend a lot of
> >> time on the retry wait.
> >>
> >> The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
> >> incomplete batch memory allocations").
> >>
> >> [FIX]
> >> Although the commit mentioned that other filesystems do the wait, it's
> >> not the case at least nowadays.
> >>
> >> All the mainlined filesystems only call memalloc_retry_wait() if they
> >> failed to allocate any page (not only for bulk allocation).
> >> If there is any progress, they won't call memalloc_retry_wait() at all.
> >>
> >> For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
> >> if there is no allocation progress at all, and the call is not for
> >> metadata readahead.
> >>
> >> So I don't believe we should call memalloc_retry_wait() unconditionally
> >> for short allocation.
> >>
> >> This patch would only call memalloc_retry_wait() if failed to allocate
> >> any page for tree block allocation (which goes with __GFP_NOFAIL and may
> >> not need the special handling anyway), and reduce the latency for
> >> btrfs_alloc_page_array().
> >>
> >> Reported-by: Julian Taylor <julian.taylor@1und1.de>
> >> Tested-by: Julian Taylor <julian.taylor@1und1.de>
> >> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
> >> Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
> >
> > It seems this patch remove all the logic of
> > 	395cb57e8560 ("btrfs: wait between incomplete batch memory allocations"),
> >
> > so we should revert this part too?
> 
> Oh, right.
> 
> Feel free to submit a patch to cleanup the headers here.

This patch is in misc-next now, and yet not in linux upstream.

so a v3 patch in the format of revert 395cb57e8560 ("btrfs: wait between
incomplete batch memory allocations")' maybe better.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2024/04/15


> Thanks,
> Qu
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index c140dd0..df4675e 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -6,7 +6,6 @@
> >   #include <linux/mm.h>
> >   #include <linux/pagemap.h>
> >   #include <linux/page-flags.h>
> > -#include <linux/sched/mm.h>
> >   #include <linux/spinlock.h>
> >   #include <linux/blkdev.h>
> >   #include <linux/swap.h>
> >


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-04-15  2:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-25 22:46 [PATCH v2] btrfs: do not wait for short bulk allocation Qu Wenruo
2024-03-25 22:57 ` Sweet Tea Dorminy
2024-03-26 13:05 ` Johannes Thumshirn
2024-03-28 15:57 ` David Sterba
2024-03-28 20:29   ` Qu Wenruo
2024-04-04 19:57     ` David Sterba
2024-04-04 21:08       ` Qu Wenruo
     [not found] ` <20240414202622.B092.409509F4@e16-tech.com>
2024-04-14 22:19   ` Qu Wenruo
2024-04-15  2:35     ` Wang Yugui

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.