All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
To: "Holger Hoffstätte" <holger@applied-asynchrony.com>,
	linux-btrfs@vger.kernel.org
Cc: <jbacik@fb.com>, <dsterba@suse.com>
Subject: Re: [PATCH 2/2] btrfs: fix false enospc for compression
Date: Mon, 17 Oct 2016 17:01:46 +0800	[thread overview]
Message-ID: <5804937A.2060506@cn.fujitsu.com> (raw)
In-Reply-To: <5800E4AB.2010701@applied-asynchrony.com>

hi,

On 10/14/2016 09:59 PM, Holger Hoffstätte wrote:
> On 10/06/16 04:51, Wang Xiaoguang wrote:
>> When testing btrfs compression, sometimes we got ENOSPC error, though fs
>> still has much free space, xfstests generic/171, generic/172, generic/173,
>> generic/174, generic/175 can reveal this bug in my test environment when
>> compression is enabled.
>>
>> After some debuging work, we found that it's btrfs_delalloc_reserve_metadata()
>> which sometimes tries to reserve plenty of metadata space, even for very small
>> data range. In btrfs_delalloc_reserve_metadata(), the number of metadata bytes
>> we try to reserve is calculated by the difference between outstanding_extents
>> and reserved_extents. Please see below case for how ENOSPC occurs:
>>
>>    1, Buffered write 128MB data in unit of 128KB, so finially we'll have inode
>> outstanding extents be 1, and reserved_extents be 1024. Note it's
>> btrfs_merge_extent_hook() that merges these 128KB units into one big
>> outstanding extent, but do not change reserved_extents.
>>
>>    2, When writing dirty pages, for compression, cow_file_range_async() will
>> split above big extent in unit of 128KB(compression extent size is 128KB).
>> When first split opeartion finishes, we'll have 2 outstanding extents and 1024
>> reserved extents, and just right now the currently generated ordered extent is
>> dispatched to run and complete, then btrfs_delalloc_release_metadata()(see
>> btrfs_finish_ordered_io()) will be called to release metadata, after that we
>> will have 1 outstanding extents and 1 reserved extents(also see logic in
>> drop_outstanding_extent()). Later cow_file_range_async() continues to handles
>> left data range[128KB, 128MB), and if no other ordered extent was dispatched
>> to run, there will be 1023 outstanding extents and 1 reserved extent.
>>
>>    3, Now if another bufferd write for this file enters, then
>> btrfs_delalloc_reserve_metadata() will at least try to reserve metadata
>> for 1023 outstanding extents' metadata, for 16KB node size, it'll be 1023*16384*2*8,
>> about 255MB, for 64K node size, it'll be 1023*65536*8*2, about 1GB metadata, so
>> obviously it's not sane and can easily result in enospc error.
>>
>> The root cause is that for compression, its max extent size will no longer be
>> BTRFS_MAX_EXTENT_SIZE(128MB), it'll be 128KB, so current metadata reservation
>> method in btrfs is not appropriate or correct, here we introduce:
>> 	enum btrfs_metadata_reserve_type {
>>          	BTRFS_RESERVE_NORMAL,
>>          	BTRFS_RESERVE_COMPRESS,
>> 	};
>> and expand btrfs_delalloc_reserve_metadata() and btrfs_delalloc_reserve_space()
>> by adding a new enum btrfs_metadata_reserve_type argument. When a data range will
>> go through compression, we use BTRFS_RESERVE_COMPRESS to reserve metatata.
>> Meanwhile we introduce EXTENT_COMPRESS flag to mark a data range that will go
>> through compression path.
>>
>> With this patch, we can fix these false enospc error for compression.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> I took some time again to get this into my tree on top of what's in
> btrfs-4.9rc1 and managed to merge it after all.
>
> Both this and patch #1 seem to work fine, and they don't seem to cause any
> regressions; ran a couple of both full and incremental rsync backups with
>> 100GB on a new and now compressed subvolume without problem. Also Stefan
> just reported that his ENOSPC seems to be gone as well, so it seems to be
> good. \o/
>
> So for both this and patch #1 have a careful:
>
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
>
> Also a comment about something I found while resolving conflicts caused
> by the preliminary dedupe suppoort:
>
> [..]
>>   int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
>> -			      struct extent_state **cached_state);
>> +			      struct extent_state **cached_state, int flag);
>>   int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
>> -			    struct extent_state **cached_state);
>> +			    struct extent_state **cached_state, int flag);
> [..]
>>   int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
>>   		      struct page **pages, size_t num_pages,
>>   		      loff_t pos, size_t write_bytes,
>> -		      struct extent_state **cached);
>> +		      struct extent_state **cached, int flag);
> Instead of adding "int flag" why not use the already defined
> btrfs_metadata_reserve_type enum? I know it's just an int at the end of
> the day, but the dedupe support already added another "int dedupe" argument
> and it's probably easy to cause confusion.
> Maybe later it would be beneficial to consolidate the flags into a consistent
> set of enum values to prevent more "int flag" inflation and better declare the
> intent of the extent state change. Not sure if that makes sense.
Yes, agree.
I'll rebase them later, thanks.

Regards,
Xiaoguang Wang

>
> thanks,
> Holger
>
>
>




  reply	other threads:[~2016-10-17  9:07 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-06  2:51 [PATCH 1/2] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
2016-10-06  2:51 ` [PATCH 2/2] btrfs: fix false enospc for compression Wang Xiaoguang
2016-10-06  3:51   ` Wang Xiaoguang
2016-10-12  3:12   ` Wang Xiaoguang
2016-10-17 15:05     ` David Sterba
2016-10-14 13:09   ` Stefan Priebe - Profihost AG
2016-10-14 13:59   ` Holger Hoffstätte
2016-10-17  9:01     ` Wang Xiaoguang [this message]
2016-10-19 14:23       ` David Sterba
2016-10-25 10:43         ` Wang Xiaoguang
2016-10-14 13:09 ` [PATCH 1/2] btrfs: improve inode's outstanding_extents computation Stefan Priebe - Profihost AG
2016-10-23 17:45   ` Stefan Priebe - Profihost AG
2016-11-01 10:18 Wang Xiaoguang
2016-11-01 10:18 ` [PATCH 2/2] btrfs: fix false enospc for compression Wang Xiaoguang
2016-11-01 10:28   ` Wang Xiaoguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5804937A.2060506@cn.fujitsu.com \
    --to=wangxg.fnst@cn.fujitsu.com \
    --cc=dsterba@suse.com \
    --cc=holger@applied-asynchrony.com \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.