Linux-BTRFS Archive on lore.kernel.org
 help / Atom feed
* [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async
@ 2019-01-09 14:43 Nikolay Borisov
  2019-01-09 14:46 ` Johannes Thumshirn
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nikolay Borisov @ 2019-01-09 14:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

If we run out of memory during delalloc filling in compress case btrfs
is going to BUG_ON. This is unnecessary since the higher levels code
(btrfs_run_delalloc_range and its callers) gracefully handle error
condtions and error out the page being submittede. Let's be a model
kernel citizen and no panic the machine due to ENOMEM and instead fail
the IO.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cde51ace68b5..b4b2d7f8a98b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1197,7 +1197,8 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 			 1, 0, NULL);
 	while (start < end) {
 		async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
-		BUG_ON(!async_cow); /* -ENOMEM */
+		if (!async_cow)
+			return -ENOMEM;
 		/*
 		 * igrab is called higher up in the call chain, take only the
 		 * lightweight reference for the callback lifetime
-- 
2.17.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async
  2019-01-09 14:43 [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async Nikolay Borisov
@ 2019-01-09 14:46 ` Johannes Thumshirn
  2019-01-23  8:31 ` Nikolay Borisov
  2019-01-25 15:05 ` David Sterba
  2 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2019-01-09 14:46 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs

On 09/01/2019 15:43, Nikolay Borisov wrote:
> If we run out of memory during delalloc filling in compress case btrfs
> is going to BUG_ON. This is unnecessary since the higher levels code
> (btrfs_run_delalloc_range and its callers) gracefully handle error
> condtions and error out the page being submittede. Let's be a model
   ^ conditions and            submitted ^ ?

Otherwise:
Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async
  2019-01-09 14:43 [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async Nikolay Borisov
  2019-01-09 14:46 ` Johannes Thumshirn
@ 2019-01-23  8:31 ` Nikolay Borisov
  2019-01-25 15:05 ` David Sterba
  2 siblings, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2019-01-23  8:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba



On 9.01.19 г. 16:43 ч., Nikolay Borisov wrote:
> If we run out of memory during delalloc filling in compress case btrfs
> is going to BUG_ON. This is unnecessary since the higher levels code
> (btrfs_run_delalloc_range and its callers) gracefully handle error
> condtions and error out the page being submittede. Let's be a model
> kernel citizen and no panic the machine due to ENOMEM and instead fail
> the IO.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Ping

> ---
>  fs/btrfs/inode.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index cde51ace68b5..b4b2d7f8a98b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1197,7 +1197,8 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>  			 1, 0, NULL);
>  	while (start < end) {
>  		async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
> -		BUG_ON(!async_cow); /* -ENOMEM */
> +		if (!async_cow)
> +			return -ENOMEM;
>  		/*
>  		 * igrab is called higher up in the call chain, take only the
>  		 * lightweight reference for the callback lifetime
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async
  2019-01-09 14:43 [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async Nikolay Borisov
  2019-01-09 14:46 ` Johannes Thumshirn
  2019-01-23  8:31 ` Nikolay Borisov
@ 2019-01-25 15:05 ` David Sterba
  2019-01-25 16:17   ` Nikolay Borisov
  2019-01-25 17:46   ` Filipe Manana
  2 siblings, 2 replies; 6+ messages in thread
From: David Sterba @ 2019-01-25 15:05 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, wqu, fdmanana

On Wed, Jan 09, 2019 at 04:43:03PM +0200, Nikolay Borisov wrote:
> If we run out of memory during delalloc filling in compress case btrfs
> is going to BUG_ON. This is unnecessary since the higher levels code
> (btrfs_run_delalloc_range and its callers) gracefully handle error
> condtions and error out the page being submittede. Let's be a model
> kernel citizen and no panic the machine due to ENOMEM and instead fail
> the IO.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/inode.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index cde51ace68b5..b4b2d7f8a98b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1197,7 +1197,8 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>  			 1, 0, NULL);
>  	while (start < end) {
>  		async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
> -		BUG_ON(!async_cow); /* -ENOMEM */
> +		if (!async_cow)
> +			return -ENOMEM;

The error handling here is very simple and breaks the usual rule that
all functions must clean up after themselves before returning to the
caller.

This is async submission so it can be expected to do deferred cleanup,
but this cannot be easily seen from the function and should be better
documented.

What happens with the inode reference (igrab), what happens with all
work queued until now, or extent range state bits.

It's true that btrfs_run_delalloc_range does error handling, though it
does that from 4 different types of conditions (nocow, prealloc,
compression and async). I'd really like to see explained that there's
nothing left and cause surprises later. The memory allocation failures
are almost never tested so we have to be sure we understand the error
handling code flow. I can't say I do after reading your changelog and
the correctness proof is left as an exercise.

The error handling was brought by 524272607e882d04 "btrfs: Handle
delalloc error correctly to avoid ordered extent hang", so there's a
remote chance to cause lockups when the state is not cleaned up.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async
  2019-01-25 15:05 ` David Sterba
@ 2019-01-25 16:17   ` Nikolay Borisov
  2019-01-25 17:46   ` Filipe Manana
  1 sibling, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2019-01-25 16:17 UTC (permalink / raw)
  To: dsterba, linux-btrfs, wqu, fdmanana



On 25.01.19 г. 17:05 ч., David Sterba wrote:
> On Wed, Jan 09, 2019 at 04:43:03PM +0200, Nikolay Borisov wrote:
>> If we run out of memory during delalloc filling in compress case btrfs
>> is going to BUG_ON. This is unnecessary since the higher levels code
>> (btrfs_run_delalloc_range and its callers) gracefully handle error
>> condtions and error out the page being submittede. Let's be a model
>> kernel citizen and no panic the machine due to ENOMEM and instead fail
>> the IO.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  fs/btrfs/inode.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index cde51ace68b5..b4b2d7f8a98b 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -1197,7 +1197,8 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>>  			 1, 0, NULL);
>>  	while (start < end) {
>>  		async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
>> -		BUG_ON(!async_cow); /* -ENOMEM */
>> +		if (!async_cow)
>> +			return -ENOMEM;
> 
> The error handling here is very simple and breaks the usual rule that
> all functions must clean up after themselves before returning to the
> caller.
> 
> This is async submission so it can be expected to do deferred cleanup,
> but this cannot be easily seen from the function and should be better
> documented.
> 
> What happens with the inode reference (igrab), what happens with all
> work queued until now, or extent range state bits.
> 
> It's true that btrfs_run_delalloc_range does error handling, though it
> does that from 4 different types of conditions (nocow, prealloc,
> compression and async). I'd really like to see explained that there's
> nothing left and cause surprises later. The memory allocation failures
> are almost never tested so we have to be sure we understand the error
> handling code flow. I can't say I do after reading your changelog and
> the correctness proof is left as an exercise.

Good point, looking at the body of the loop in case of INODE_NOCOMPRESS
&& !FORCE_COMPRESS the loop is going to be executed only once. Otherwise
it will be split into 512k chunks. So I wonder if it will be better if
the if/else branch in the middle of the loop is hoisted out of it. So at
the beginning we can calculate whether the loop is going to be executed
just once or shall we just allocate an array of async_cow's with length
(end - start) / 512.

> 
> The error handling was brought by 524272607e882d04 "btrfs: Handle
> delalloc error correctly to avoid ordered extent hang", so there's a
> remote chance to cause lockups when the state is not cleaned up.
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async
  2019-01-25 15:05 ` David Sterba
  2019-01-25 16:17   ` Nikolay Borisov
@ 2019-01-25 17:46   ` Filipe Manana
  1 sibling, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2019-01-25 17:46 UTC (permalink / raw)
  To: dsterba, Nikolay Borisov, linux-btrfs, Qu Wenruo,
	Filipe David Borba Manana

On Fri, Jan 25, 2019 at 3:08 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Jan 09, 2019 at 04:43:03PM +0200, Nikolay Borisov wrote:
> > If we run out of memory during delalloc filling in compress case btrfs
> > is going to BUG_ON. This is unnecessary since the higher levels code
> > (btrfs_run_delalloc_range and its callers) gracefully handle error
> > condtions and error out the page being submittede. Let's be a model
> > kernel citizen and no panic the machine due to ENOMEM and instead fail
> > the IO.
> >
> > Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> > ---
> >  fs/btrfs/inode.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index cde51ace68b5..b4b2d7f8a98b 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -1197,7 +1197,8 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
> >                        1, 0, NULL);
> >       while (start < end) {
> >               async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
> > -             BUG_ON(!async_cow); /* -ENOMEM */
> > +             if (!async_cow)
> > +                     return -ENOMEM;
>
> The error handling here is very simple and breaks the usual rule that
> all functions must clean up after themselves before returning to the
> caller.
>
> This is async submission so it can be expected to do deferred cleanup,
> but this cannot be easily seen from the function and should be better
> documented.
>
> What happens with the inode reference (igrab), what happens with all
> work queued until now, or extent range state bits.
>
> It's true that btrfs_run_delalloc_range does error handling, though it
> does that from 4 different types of conditions (nocow, prealloc,
> compression and async). I'd really like to see explained that there's
> nothing left and cause surprises later. The memory allocation failures
> are almost never tested so we have to be sure we understand the error
> handling code flow. I can't say I do after reading your changelog and
> the correctness proof is left as an exercise.
>
> The error handling was brought by 524272607e882d04 "btrfs: Handle
> delalloc error correctly to avoid ordered extent hang", so there's a
> remote chance to cause lockups when the state is not cleaned up.

So taking a quick look at this, just returning does not seem correct:

- If you have for example a dealloc range of 1Mb, submitting the first
512K async job succeeds
  but the the second one fails due to the out of memory issue, then
writepage_delalloc() will
  set the error bit on the page starting at offset 0 of the range, and
not the one at 512K.
  What if the first submitted range succeeds? What happens? I think
the error handling from the
  caller should be aware if writeback started for a part of the range
already, and if so operate only
  on the remaining of the range.

Did you actually run a test where an iteration other then the first
one fails and see if there
were no hangs, error reported to user space (by means of an fsync for
example), no leaks (kmemleak helps here), etc?

This is very fishy because of the async nature of the compression
path, each submitted job also does
error handling if anything fails, and they all get a reference for the
first page of the whole range.
Did you check if we don't end up with 2 tasks unlocking that page for example?

Pre-allocating the async_cow structures at the beginning of
cow_file_range_async(), as suggested in another reply,
seems ok to me and would not need any adjustments of the existing
error handling code.


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-09 14:43 [PATCH] btrfs: Handle ENOMEM gracefully in cow_file_range_async Nikolay Borisov
2019-01-09 14:46 ` Johannes Thumshirn
2019-01-23  8:31 ` Nikolay Borisov
2019-01-25 15:05 ` David Sterba
2019-01-25 16:17   ` Nikolay Borisov
2019-01-25 17:46   ` Filipe Manana

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox