From: "Ellis H. Wilson III" <ellisw@panasas.com>
To: Hans van Kranenburg <hans.van.kranenburg@mendix.com>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Nikolay Borisov <nborisov@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Status of FST and mount times
Date: Fri, 16 Feb 2018 09:55:29 -0500
Message-ID: <35e8989b-9b6f-27c0-e2d9-0920bbd84726@panasas.com>
In-Reply-To: <b45357dd-044c-bff8-8e1c-4c8a06fb9636@panasas.com>

On 02/16/2018 09:42 AM, Ellis H. Wilson III wrote:
> On 02/16/2018 09:20 AM, Hans van Kranenburg wrote:
>> Well, imagine you have a big tree (an actual real life tree outside) and
>> you need to pick things (e.g. apples) which are hanging everywhere.
>>
>> So, what you need to do is climb the tree, climb on a branch all the way
>> to the end where the first apple is... climb back, climb up a bit, go
>> onto the next branch to the end for the next apple... etc etc....
>>
>> The bigger the tree is, the longer it keeps you busy, because the apples
>> will be semi-evenly distributed around the full tree, and they're always
>> hanging at the end of the branch. The speed with which you can climb
>> around (random read disk access IO speed for btrfs, because your disk
>> cache is empty when first mounting) determines how quickly you're done.
>>
>> So, yes.
> 
> Thanks Hans.  I will say multiple minutes (by the looks of things, I'll 
> end up near to an hour for 60TB if this non-linear scaling continues) to 
> mount a filesystem is undesirable, but I won't offer that criticism 
> without thinking constructively for a moment:
> 
> Help me out by referencing the tree in question if you don't mind, so I 
> can better understand the point of picking all these "apples" (I would 
> guess for capacity reporting via df, but maybe there's more).
> 
> Typical disclaimer that I haven't yet grokked the various inner-workings 
> of BTRFS, so this is quite possibly a terrible or unapproachable idea:
> 
> On umount, you must already have in memory whatever metadata the tree 
> walk on mount was collecting (otherwise you would have been able to 
> lazily do the tree walk after a quick mount).  Therefore, could we not 
> stash this metadata at or associated with, say, the root of the 
> subvolumes?  This way you can always determine on mount quickly if the 
> cache is still valid (i.e., no situation like: remount with old btrfs, 
> change stuff, umount with old btrfs, remount with new btrfs, pain).  I 
> would guess generation would be sufficient to determine if the cached 
> metadata is valid for the given root block.
> 
> This would scale with number of subvolumes (but not snapshots), and 
> would be reasonably quick I think.
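
To make the quoted idea concrete, here is a minimal sketch of a cache that is trusted only when its recorded generation matches the root block's generation. Everything in it is hypothetical and for illustration only: RootBlock, MountCache, slow_tree_walk, load_block_groups and save_cache are made-up names, not btrfs structures or APIs, and the dictionary merely stands in for whatever metadata the mount-time walk collects.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RootBlock:
    """Hypothetical stand-in for a tree root; only the generation matters here."""
    generation: int


@dataclass
class MountCache:
    """Hypothetical summary stashed at umount, tagged with its generation."""
    generation: int
    block_groups: Dict[int, int]  # start offset -> bytes used (illustrative)


def slow_tree_walk() -> Dict[int, int]:
    """Placeholder for the expensive walk performed at mount today."""
    return {0: 4096, 1 << 30: 8192}


def load_block_groups(root: RootBlock,
                      cache: Optional[MountCache]) -> Dict[int, int]:
    # Trust the cache only if it was written for exactly this generation.
    if cache is not None and cache.generation == root.generation:
        return cache.block_groups
    # No cache, or the tree changed without the cache being updated
    # (e.g. an older kernel mounted it): fall back to the full walk.
    return slow_tree_walk()


def save_cache(root: RootBlock, block_groups: Dict[int, int]) -> MountCache:
    # At umount the data is already in memory, so stashing a copy is cheap.
    return MountCache(generation=root.generation,
                      block_groups=dict(block_groups))
```

The point of the generation check is that a stale cache is never wrong, only useless: a mismatch simply costs the same walk that is paid today.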

I see that on 02/13 Qu commented on a similar idea, but proposed what is 
perhaps a richer version of my suggestion above (making the block groups 
into their own tree).  The concern was that it would be a lot of work, 
since it modifies the on-disk format.  That's a reasonable worry.

I will get a new kernel, expand my array to around 36TB, and will 
generate a plot of mount times against extents going up to at least 30TB 
in increments of 0.5TB.  If this proves to reach absurd mount time 
delays (to be specific, anything above around 60s is untenable for our 
use), we may very well be sufficiently motivated to implement the above 
improvement and submit it for consideration.  Accordingly, if anybody 
has additional and/or more specific thoughts on the optimization, I am 
all ears.
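
The measurement plan above could be organized roughly as follows. This is only a sketch: fill_levels_tb and first_untenable are hypothetical helper names, the 0.5TB step, 30TB ceiling and 60s limit are taken from the paragraph above, and no mount times are measured or implied here. The (fill level, mount seconds) samples would have to come from actually timing mounts.

```python
from typing import List, Optional, Tuple

# From the text: anything above around 60s is untenable.
MAX_TOLERABLE_MOUNT_S = 60.0


def fill_levels_tb(step_tb: float = 0.5, max_tb: float = 30.0) -> List[float]:
    """Fill levels (in TB) at which a mount would be timed."""
    steps = int(round(max_tb / step_tb))
    return [round(step_tb * i, 3) for i in range(1, steps + 1)]


def first_untenable(samples: List[Tuple[float, float]],
                    limit_s: float = MAX_TOLERABLE_MOUNT_S) -> Optional[float]:
    """Return the smallest fill level whose mount time exceeds the limit.

    `samples` holds (fill_tb, mount_seconds) pairs supplied by the caller;
    returns None if every sample stays within the limit.
    """
    for fill_tb, mount_s in sorted(samples):
        if mount_s > limit_s:
            return fill_tb
    return None
```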

Best,

ellis
