From: Chris Mason <firstname.lastname@example.org> To: Edward Shishkin <email@example.com> Cc: Jamie Lokier <firstname.lastname@example.org>, Edward Shishkin <email@example.com>, Mat <firstname.lastname@example.org>, LKML <email@example.com>, firstname.lastname@example.org, Ric Wheeler <email@example.com>, Andrew Morton <firstname.lastname@example.org>, Linus Torvalds <email@example.com>, The development of BTRFS <firstname.lastname@example.org> Subject: Re: Balancing leaves when walking from top to down (was Btrfs:...) Date: Mon, 21 Jun 2010 09:15:28 -0400 [thread overview] Message-ID: <20100621131528.GA17979@think> (raw) In-Reply-To: <4C1BED56.email@example.com> On Sat, Jun 19, 2010 at 12:04:06AM +0200, Edward Shishkin wrote: > Chris Mason wrote: > >On Fri, Jun 18, 2010 at 09:29:40PM +0200, Edward Shishkin wrote: > >>Jamie Lokier wrote: > >>>Edward Shishkin wrote: > >>>>If you decide to base your file system on some algorithms then please > >>>>use the original ones from proper academic papers. DO NOT modify the > >>>>algorithms in solitude: this is very fragile thing! All such > >>>>modifications must be reviewed by specialists in the theory of > >>>>algorithms. Such review can be done in various scientific magazines of > >>>>proper level. > >>>> > >>>>Personally I don't see any way to improve the situation with Btrfs > >>>>except full redesigning the last one. If you want to base your file > >>>>system on the paper of Ohad Rodeh, then please, use *exactly* the > >>>>Bayer's B-trees that he refers to. That said, make sure that all > >>>>records you put to the tree has equal length and all non-root nodes of > >>>>your tree are at least half filled. > >>>First, thanks Edward for identifying a specific problem with the > >>>current btrfs implementation. > >>Hello Jamie. > >> > >>>I've studied modified B-trees quite a lot and know enough to be sure > >>>that they are quite robust when you modify them in all sorts of ways. > >>Which property is robust? > >> > >>>Moreover, you are incorrect to say there's an intrinsic algorithmic > >>>problem with variable-length records. It is not true; if Knuth said > >>>so, Knuth was mistaken. > >>I didn't say about intrinsic algorithmic problems :) > >>I just repeat (after Knuth et al) that B-trees with variable-length > >>records don't > >>have any sane boundary for internal fragmentation. The common idea > >>is that if we > >>don't want Btrfs to be in infinite development stage, then we should > >>choose some > >>*sane* strategy (for example the paper of Ohad Rodeh) and strictly > >>adhere this in > >>future. > > > >Again, other than the inline file data, what exactly do you believe > >needs to change? > > 1. getting rid of inline extents; > 2. new formats for directory and xattr items to not look like a train, > which is able to occupy the whole leaf; > 3. make sure we do pro-active balancing like it is described in the paper. > > Sorry, I don't see other ways for now.. > > > Top down balancing vs balancing on insertion doesn't > >impact our ability to maintain full leaves. The current code is clearly > >choosing not to merge two leaves that it should have merged, which is > >just a plain old bug. > > How are you going to balance leaves when walking from top to down? > Suppose 1) and 2) above are not satisfied and having arrived to the leaf > level we see a number of items of variable length. What will we do to > keep leaves full? I think I'm mixing up complaints ;) top-down balancing is unrelated to the variable length items. You could balance from the bottom up and if you have a 1K item that can't be split next to a 3K item that can't be split, we're going to have those two items in two leaves. This is true regardless of how we balance. So the prime complaint here is that you don't like variable length items, correct? Ideally leaves are full, and ideally data blocks are full. In practice we can end up with either one utilized less than we would like. At the end of the day, every FS has a worst case setup where it is mostly empty in terms of bytes stored but full in terms of blocks. On xfs: for x in `seq 1 bazallion` ; echo 'f' > $x This would work on ext4 too, until you ran out of inodes. > > Could you please provide a sketch of the algorithm? > I'll reproduce from your test case and provide a fix. mount -o max_inline=1500 would give us 50% usage in the worst case (minus the balancing bug you hit). -chris
next prev parent reply other threads:[~2010-06-21 13:15 UTC|newest] Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top 2010-06-03 14:58 Unbound(?) internal fragmentation in Btrfs Edward Shishkin [not found] ` <AANLkTilKw2onQkdNlZjg7WVnPu2dsNpDSvoxrO_FA2z_@mail.gmail.com> 2010-06-18 8:03 ` Christian Stroetmann 2010-06-18 13:32 ` Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs) Edward Shishkin 2010-06-18 13:45 ` Daniel J Blueman 2010-06-18 16:50 ` Edward Shishkin 2010-06-23 23:40 ` Jamie Lokier 2010-06-24 3:43 ` Daniel Taylor 2010-06-24 4:51 ` Mike Fedyk 2010-06-24 22:06 ` Daniel Taylor 2010-06-25 9:15 ` Btrfs: broken file system design Andi Kleen 2010-06-25 18:58 ` Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs) Ric Wheeler 2010-06-26 5:18 ` Michael Tokarev 2010-06-26 11:55 ` Ric Wheeler [not found] ` <57784.2001:5c0:82dc::firstname.lastname@example.org> 2010-06-26 13:47 ` Ric Wheeler 2010-06-24 9:50 ` David Woodhouse 2010-06-18 18:15 ` Christian Stroetmann 2010-06-18 13:47 ` Chris Mason 2010-06-18 15:05 ` Edward Shishkin [not found] ` <4C1B8B4A.email@example.com> 2010-06-18 15:10 ` Chris Mason 2010-06-18 16:22 ` Edward Shishkin [not found] ` <4C1B9D4F.firstname.lastname@example.org> 2010-06-18 18:10 ` Chris Mason 2010-06-18 15:21 ` Christian Stroetmann 2010-06-18 15:22 ` Chris Mason 2010-06-18 15:56 ` Jamie Lokier 2010-06-18 19:25 ` Christian Stroetmann 2010-06-18 19:29 ` Edward Shishkin 2010-06-18 19:35 ` Chris Mason 2010-06-18 22:04 ` Balancing leaves when walking from top to down (was Btrfs:...) Edward Shishkin [not found] ` <4C1BED56.email@example.com> 2010-06-18 22:16 ` Ric Wheeler 2010-06-19 0:03 ` Edward Shishkin 2010-06-21 13:15 ` Chris Mason [this message] [not found] ` <20100621180013.GD17979@think> 2010-06-22 14:12 ` Edward Shishkin 2010-06-22 14:20 ` Chris Mason 2010-06-23 13:46 ` Edward Shishkin [not found] ` <4C221049.firstname.lastname@example.org> 2010-06-23 23:37 ` Jamie Lokier 2010-06-24 13:06 ` Chris Mason 2010-06-30 20:05 ` Edward Shishkin [not found] ` <4C2BA381.email@example.com> 2010-06-30 21:12 ` Chris Mason 2010-07-09 4:16 ` Chris Samuel 2010-07-09 20:30 ` Chris Mason 2010-06-23 23:57 ` Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs) Jamie Lokier
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20100621131528.GA17979@think \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: Balancing leaves when walking from top to down (was Btrfs:...)' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).