linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Su Yue <Damenly_Su@gmx.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Su Yue <suy.fnst@cn.fujitsu.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH 00/17] btrfs: implementation of priority aware allocator
Date: Sun, 2 Dec 2018 13:28:00 +0800	[thread overview]
Message-ID: <983cb730-2b0c-cf40-c5e4-c23b54c6910f@gmx.com> (raw)
In-Reply-To: <6c19f898-ee15-0670-2094-ce870ae3d513@gmx.com>



On 2018/11/28 12:04 PM, Qu Wenruo wrote:
> 
> 
> On 2018/11/28 上午11:11, Su Yue wrote:
>> This patchset can be fetched from repo:
>> https://github.com/Damenly/btrfs-devel/commits/priority_aware_allocator.
>> Since patchset 'btrfs: Refactor find_free_extent()' does a nice work
>> to simplify find_free_extent(). This patchset dependents on the refactor.
>> The base is the commit in kdave/misc-next:
>>
>> commit fcaaa1dfa81f2f87ad88cbe0ab86a07f9f76073c (kdave/misc-next)
>> Author: Nikolay Borisov <nborisov@suse.com>
>> Date:   Tue Nov 6 16:40:20 2018 +0200
>>
>>      btrfs: Always try all copies when reading extent buffers
>>
>>
>> This patchset introduces a new mount option named 'priority_alloc=%s',
>> %s is supported to be "usage" and "off" now. The mount option changes
>> the way find_free_extent() how to search block groups.
>>
>> Previously, block groups are stored in list of btrfs_space_info
>> by start position. When call find_free_extent() if no hint,
>> block_groups are searched one by one.
>>
>> Design of priority aware allocator:
>> Block group has its own priority. We split priorities to many levels,
>> block groups are split to different trees according priorities.
>> And those trees are sorted by their levels and stored in space_info.
>> Once find_free_extent() is called, try to search block groups in higher
>> priority level then lower level. Then a block group with higher
>> priority is more likely to be used.
>>
>> Pros:
>> 1) Reduce the frequency of balance.
>>     The block group with a higher usage rate will be used preferentially
>>     for allocating extents. Free the empty block groups with pinned bytes
>>     as non-zero.[1]
>>
>> 2) The priority of empty block group with pinned bytes as non-zero
>>     will be set as the lowest.
>>     
>> 3) Support zoned block device.[2]
>>     For metadata allocation, the block group in conventional zones
>>     will be used as much as possible regardless of usage rate.
>>     Will do it in future.
> 
> Personally I'm a big fan of the priority aware extent allocator.
> 
> So nice job!
>

Thanks for the offline help.

>>     
>> Cons:
>> 1) Expectable performance regression.
>>     The degree of the decline is temporarily unknown.
>>     The user can disable block group priority to get the full performance.
>>
>> TESTS:
>>
>> If use usage as priority(the only available option), empty block group
>> is much harder to be reused.
>>
>> About block group usage:
>>    Disk: 4 x 1T HDD gathered in LVM.
>>
>>    Run script to create files and delete files randomly in loop.
>>    The num of files to create are double than to delete.
>>
>>    Default mount option result:
>>    https://i.loli.net/2018/11/28/5bfdfdf08c760.png
>>
>>    Priority aware allocator(usage) result:
>>    https://i.loli.net/2018/11/28/5bfdfdf0c1b11.png
>>
>>    X coordinate means total disk usage, Y coordinate means avg block
>>    group usage.
>>
>>    Due to fragmentation of extents, the different are not obvious,
>>    only about 1% improvement....
> 
> I think you're using the wrong indicator to show the difference.
> 
> The real indicator should not be overall block group usage, but:
> 1) Number of block groups
> 2) Usage distribution of the block groups
> 
> If the number of block groups isn't much different, then we should go
> check the distribution.
> E.g. all bgs with 97% usage is not as good mostly 100% bgs and several
> near 10% bgs.
>

Took some time to write scripts for summary:
Avg of percentage of block groups during disk usage from 0% to 100%

For block groups whose usage >= 98%, default: 31.09%, priorty_alloc: 46.73%


For block groups whose usage >= 95%, default: 57.69%, priorty_alloc: 64.24%


For block groups whose usage >= 90%, default: 79.87%, priorty_alloc: 80.2%

So this patchset does work in improvement of block groups usages.


> And we should check the usage distribution between metadata and data bgs.
> For data bg, we could hit some fragmentation problem, while for meta bgs
> all extents are in the same size, thus may have a better performance for
> metadata.
> 
> Thus we could do better for the test result.
> 
>>    						
>> Performance regression:
>>     I have ran sysbench on our machine with SSD in multi combinations,
>>     no obvious regression found.
>>     However in theory, the new allocator may cost more time in some
>>     cases.
> 
> Isn't that a good news? :)
> 

Yeah.
>>
>> [1] https://www.spinics.net/lists/linux-btrfs/msg79508.html
>> [2] https://lkml.org/lkml/2018/8/16/174
>>
>> ---
>> Due to some reasons includes time and hardware, the use-case is not
>> outstanding enough.
> 
> As discussed offline, another cause would be data extent fragmentations.
> E.g we have a lot of small 4K holes but the request is a big 128M.
> In that case btrfs_reserve_extent() could still trigger a new data chunk
> other than return the 4K holes found.
> 

IMO, this is another business. Doing it in another patchset is prefered.

Thanks,
Su

> Thanks,
> Qu
> 
>> And some codes are dirty but I can't found another
>> way. So I named it as RFC.
>>       Any comments and suggestions are welcome.
>>       
>> Su Yue (17):
>>    btrfs: priority alloc: prepare of priority aware allocator
>>    btrfs: add mount definition BTRFS_MOUNT_PRIORITY_USAGE
>>    btrfs: priority alloc: introduce compute_block_group_priority/usage
>>    btrfs: priority alloc: add functions to create/remove priority trees
>>    btrfs: priority alloc: introduce functions to add block group to
>>      priority tree
>>    btrfs: priority alloc: introduce three macros to mark block group
>>      status
>>    btrfs: priority alloc: add functions to remove block group from
>>      priority tree
>>    btrfs: priority alloc: add btrfs_update_block_group_priority()
>>    btrfs: priority alloc: call create/remove_priority_trees in space_info
>>    btrfs: priority alloc: call add_block_group_priority while reading or
>>      making block group
>>    btrfs: priority alloc: remove block group from priority tree while
>>      removing block group
>>    btrfs: priority alloc: introduce find_free_extent_search()
>>    btrfs: priority alloc: modify find_free_extent() to fit priority
>>      allocator
>>    btrfs: priority alloc: introduce btrfs_set_bg_updating and call
>>      btrfs_update_block_group_prioriy
>>    btrfs: priority alloc: write bg->priority_groups_sem while waiting
>>      reservation
>>    btrfs: priority alloc: write bg->priority_tree->groups_sem to avoid
>>      race in btrfs_delete_unused_bgs()
>>    btrfs: add mount option "priority_alloc=%s"
>>
>>   fs/btrfs/ctree.h            |  28 ++
>>   fs/btrfs/extent-tree.c      | 672 +++++++++++++++++++++++++++++++++---
>>   fs/btrfs/free-space-cache.c |   3 +
>>   fs/btrfs/super.c            |  18 +
>>   fs/btrfs/transaction.c      |   1 +
>>   5 files changed, 681 insertions(+), 41 deletions(-)
>>

      reply	other threads:[~2018-12-02  5:28 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-28  3:11 [RFC PATCH 00/17] btrfs: implementation of priority aware allocator Su Yue
2018-11-28  3:11 ` [RFC PATCH 01/17] btrfs: priority alloc: prepare " Su Yue
2018-11-28  8:24   ` Nikolay Borisov
2018-11-28  9:24     ` Su Yue
2018-11-28  3:11 ` [RFC PATCH 02/17] btrfs: add mount definition BTRFS_MOUNT_PRIORITY_USAGE Su Yue
2018-11-28  3:11 ` [RFC PATCH 03/17] btrfs: priority alloc: introduce compute_block_group_priority/usage Su Yue
2018-11-28  8:56   ` Nikolay Borisov
2018-11-28  3:11 ` [RFC PATCH 04/17] btrfs: priority alloc: add functions to create/remove priority trees Su Yue
2018-11-28  3:11 ` [RFC PATCH 05/17] btrfs: priority alloc: introduce functions to add block group to priority tree Su Yue
2018-11-28  3:11 ` [RFC PATCH 06/17] btrfs: priority alloc: introduce three macros to mark block group status Su Yue
2018-11-28  3:11 ` [RFC PATCH 07/17] btrfs: priority alloc: add functions to remove block group from priority tree Su Yue
2018-11-28  3:11 ` [RFC PATCH 08/17] btrfs: priority alloc: add btrfs_update_block_group_priority() Su Yue
2018-11-28  3:11 ` [RFC PATCH 09/17] btrfs: priority alloc: call create/remove_priority_trees in space_info Su Yue
2018-11-28  3:11 ` [RFC PATCH 10/17] btrfs: priority alloc: call add_block_group_priority while reading or making block group Su Yue
2018-11-28  3:11 ` [RFC PATCH 11/17] btrfs: priority alloc: remove block group from priority tree while removing " Su Yue
2018-11-28  3:11 ` [RFC PATCH 12/17] btrfs: priority alloc: introduce find_free_extent_search() Su Yue
2018-11-28  3:11 ` [RFC PATCH 13/17] btrfs: priority alloc: modify find_free_extent() to fit priority allocator Su Yue
2018-11-28  3:11 ` [RFC PATCH 14/17] btrfs: priority alloc: introduce btrfs_set_bg_updating and call btrfs_update_block_group_prioriy Su Yue
2018-11-28  3:11 ` [RFC PATCH 15/17] btrfs: priority alloc: write bg->priority_groups_sem while waiting reservation Su Yue
2018-11-28  3:11 ` [RFC PATCH 16/17] btrfs: priority alloc: write bg->priority_tree->groups_sem to avoid race in btrfs_delete_unused_bgs() Su Yue
2018-11-28  3:11 ` [RFC PATCH 17/17] btrfs: add mount option "priority_alloc=%s" Su Yue
2018-11-28  4:04 ` [RFC PATCH 00/17] btrfs: implementation of priority aware allocator Qu Wenruo
2018-12-02  5:28   ` Su Yue [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=983cb730-2b0c-cf40-c5e4-c23b54c6910f@gmx.com \
    --to=damenly_su@gmx.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=suy.fnst@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).