From: Christian Stroetmann <firstname.lastname@example.org>
To: Chris Mason <email@example.com>
Cc: Linux Kernel Mailing List <firstname.lastname@example.org>,
Subject: Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)
Date: Fri, 18 Jun 2010 17:21:21 +0200
Message-ID: <4C1B8EF1.email@example.com>

Chris Mason wrote:
> On Fri, Jun 18, 2010 at 03:32:16PM +0200, Edward Shishkin wrote:
>> Mat wrote:
>>> On Thu, Jun 3, 2010 at 4:58 PM, Edward Shishkin<firstname.lastname@example.org> wrote:
>>>> Hello everyone.
>>>> I was asked to review/evaluate Btrfs for use in enterprise
>>>> systems, and below are my first impressions (linux-2.6.33).
>>>> The first test I made was filling an empty 659M (/dev/sdb2)
>>>> btrfs partition (mounted to /mnt) with 2K files:
>>>> # for i in $(seq 1000000); \
>>>> do dd if=/dev/zero of=/mnt/file_$i bs=2048 count=1; done
>>>> (terminated after getting "No space left on device" reports).
>>>> # ls /mnt | wc -l
>>>> 59480
>>>> So, I got the "dirty" utilization 59480*2048 / (659*1024*1024) = 0.17,
>>>> and the first obvious question is "hey, where are the other 83% of my
>>>> disk space???" I looked at the btrfs storage tree (fs_tree) and was
>>>> shocked by the situation on the leaf level. Appendix B shows
>>>> 5 adjacent btrfs leaves, which have the same parent.
>>>> For example, look at the leaf 29425664: "items 1 free space 3892"
>>>> (of 4096!!). Note that this "free" space (3892) is _dead_: any
>>>> attempt to write to the file system will result in "No space left
>>>> on device".
> There are two easy ways to fix this problem. Turn off the inline
> extents (max_inline=0) or allow splitting of the inline extents. I
> didn't put in the splitting simply because the complexity was high while
> the benefits were low (in comparison with just turning off the inline
> extents).
But then the benefits of splitting must be high, because it solves this
problem when inline extents are turned on.
>> It must be a highly unexpected and difficult question for file system
>> developers: "how efficiently does your file system manage disk space"?
>> In the meantime I confirm that the Btrfs design is completely broken:
>> records stored in the B-tree differ greatly from each other (it is
>> unacceptable!), and the balancing algorithms have been modified in
>> an insane manner. All these factors have led to the loss of *all* bounds
>> on internal fragmentation and to an exhaustive waste of disk space
>> (and memory!) in spite of the property "scaling in their ability to
>> address large storage".
>> This is not large storage, this is a "scalable sieve": you cannot
>> rely on finding any given amount of water in it, even after infinitely
>> increasing the size of the sieve (read escalating the pool of Btrfs
>> It seems that nobody has reviewed Btrfs before its inclusion in the
>> mainline. I have only found a pair of recommendations with the common
>> idea that the Btrfs maintainer is "not a crazy man". Plus a number of
>> papers which admire the "Btrfs phenomena". Sigh.
>> Well, let's decide what we can do in the current situation.
>> The first obvious point here is that we *cannot* put such a file system
>> into production, simply because it doesn't provide any guarantees for our
>> users regarding disk space utilization.
> Are you basing all of this on inline extents? The other extents of
> variable size are more flexible (taking up the room in the leaf), but
> they can also easily be split during balancing.
If we have to split everywhere, doesn't that have some (dramatic) impact
on the performance of the Btrfs filesystem?
As was said above: splitting has high complexity.
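
For reference, the utilization arithmetic quoted above can be re-checked
with a small shell sketch. The numbers are taken from Edward's report; the
variable names are only illustrative, and awk is assumed to be available
for the floating-point division.

```shell
#!/bin/sh
# Re-check the figures quoted in the report (a sketch, not a measurement).
files=59480        # file count used in the quoted calculation
file_size=2048     # bytes per file (dd bs=2048 count=1)
part_mib=659       # partition size in MiB

# Utilization = bytes of file data written / partition size in bytes.
utilization=$(awk -v n="$files" -v s="$file_size" -v p="$part_mib" \
    'BEGIN { printf "%.3f", (n * s) / (p * 1024 * 1024) }')
echo "utilization: $utilization"

# Share of the 4096-byte leaf 29425664 that is reported as free.
leaf_free=$(awk 'BEGIN { printf "%.0f", 100 * 3892 / 4096 }')
echo "leaf free space: ${leaf_free}%"
```

To three decimals the ratio is 0.176, which the report truncates to 0.17;
the quoted leaf has about 95% of its 4096 bytes unused.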