From: Robbie Ko <robbieko@synology.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, wqu@suse.com
Subject: Re: [PATCH v2] btrfs: speedup mount time with readahead chunk tree
Date: Fri, 10 Jul 2020 09:54:58 +0800
Message-ID: <4df1b2e0-71a9-9466-7b8d-859358f61667@synology.com>
In-Reply-To: <20200709091305.GE28832@twin.jikos.cz>


David Sterba wrote on 2020/7/9 at 5:13 PM:
> On Thu, Jul 09, 2020 at 10:38:42AM +0800, Robbie Ko wrote:
>> David Sterba wrote on 2020/7/9 at 5:11 AM:
>>> On Tue, Jul 07, 2020 at 09:25:11PM +0200, David Sterba wrote:
>>>> On Tue, Jul 07, 2020 at 11:59:44AM +0800, robbieko wrote:
>>>> This skips readahead of all nodes above level 1, if you find a nicer way
>>>> to readahead the whole tree I won't object, but for the first
>>>> implementation the level 1 seems ok to me.
>>> Patch below. I tried to create a large system chunk by fallocate on a
>>> sparse loop device, but got only 1 node on level 1, so the readahead
>>> cannot show off.
>>>
>>> # btrfs fi df .
>>> Data, single: total=59.83TiB, used=59.83TiB
>>> System, single: total=36.00MiB, used=6.20MiB
>>> Metadata, single: total=1.01GiB, used=91.78MiB
>>> GlobalReserve, single: total=26.80MiB, used=0.00B
>>>
>>> There were 395 leaf nodes that got read ahead, time between the first
>>> and last is 0.83s and the block group tree read took about 40 seconds.
>>> This was in a VM with file-backed images, and the loop device was
>>> constructed from these devices so it's spinning rust.
>>>
>>> I don't have results for non-prefetched mount to compare at the moment.
>>>
>> I think what you're doing is working.
>>
>> But there are many similar cases that also need improvement.
> Agreed, but this started from 'let's readahead chunk tree' and now we
> drifted to fixing or perhaps making better use of readahead in several
> other areas.
>
>> 1. load_free_space_tree
>> We need to read all BTRFS_FREE_SPACE_BITMAP_KEY and
>> BTRFS_FREE_SPACE_EXTENT_KEY until the next FREE_SPACE_INFO_KEY.
>>
>> 2. populate_free_space_tree
>> We need to read all BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY
>> items until the end of the block group.
>>
>> 3. btrfs_real_readdir
>> We need to read ahead as many (inode, BTRFS_DIR_INDEX_KEY) items as possible.
>>
>> 4. btrfs_clone
>> We need to read ahead as many (inode, BTRFS_EXTENT_DATA_KEY) items as possible.
>>
>> 5. btrfs_verify_dev_extents
>> We need to read all the BTRFS_DEV_EXTENT_KEYs.
> Each case needs to be evaluated separately because the items live in
> different trees and other item types could be scattered among the ones
> we're interested in.
>
> But the list gives insight into what types of readahead we might need,
> like a full key range [from, to], or just all items of one key type.
>
>> 6. caching_kthread (inode-map.c)
>> We need all the BTRFS_INODE_ITEM_KEY items of fs_tree to build the inode map.
>>
>> For the above cases, it is not feasible to write special readahead code
>> for each one. We have to provide a new readahead framework that lets the
>> caller determine the scope of the readahead it needs.
>> Possible parameters of the readahead are as follows:
>> 1. reada_maximum_nr: the maximum number of leaves to read at a time.
>> 2. reada_max_key: early stop condition for READA_FORWARD.
>> 3. reada_min_key: early stop condition for READA_BACK.
> Yeah something like that.
>
>> We need to review all users of readahead to confirm how readahead
>> behaves in each of them.
>> For example, in scrub_enumerate_chunks readahead has very little effect,
>> because most of the time is spent in scrub_chunk:
>> processing scrub_chunk for all DEV_EXTENT items in a leaf takes a long time.
>> If the dev tree has been modified in the meantime, the previously
>> read-ahead leaf may be useless.
> Yes that's another case where doing the readahead is useless.
>
> So, now it's a question of whether we should start with the easy cases
> with specific readahead and then unify them under a common API, or try to
> figure out the API and then audit all users.
>
> I'd be more in favor of the former, as it gives us a baseline where the
> readahead would be implemented optimally; the followup API cleanup would
> need to keep the performance.
>
> The latter is IMHO harder just because getting an API right on the first
> try usually does not work.

Okay, I agree with you.
Please go ahead and submit your patch to speed up the chunk tree read.
Thank you for your help.




Thread overview: 12+ messages
2020-07-07  3:59 [PATCH v2] btrfs: speedup mount time with readahead chunk tree robbieko
2020-07-07 19:25 ` David Sterba
2020-07-08  2:19   ` Robbie Ko
2020-07-08 14:04     ` David Sterba
2020-07-08 14:57       ` Holger Hoffstätte
2020-07-08 15:21         ` David Sterba
2020-07-09  1:46         ` Robbie Ko
2020-07-09  7:17           ` Holger Hoffstätte
2020-07-08 21:11   ` David Sterba
2020-07-09  2:38     ` Robbie Ko
2020-07-09  9:13       ` David Sterba
2020-07-10  1:54         ` Robbie Ko [this message]
