* fresh btrfs filesystem, out of disk space, hundreds of gigs free
@ 2014-03-22  0:00 Jon Nelson
  2014-03-22  8:28 ` Duncan
  0 siblings, 1 reply; 6+ messages in thread
From: Jon Nelson @ 2014-03-22  0:00 UTC (permalink / raw)
  To: linux-btrfs

Using openSUSE 13.1 on x86_64, whose kernel - as of this writing - is
3.11.10, I tried to copy a bunch of files over to a btrfs filesystem
(which was mounted as /, in fact).

After some time, things ground to a halt and I got out of disk space errors.
btrfs fi df /   showed about 1TB of *data* free, and 500MB of metadata free.

Below are the btrfs fi df /    and   btrfs fi show.
I ended up having to reboot the machine. I was not able to get the
machine to boot again after that, and ended up having to resort to a
rescue environment, at which point I copied everything over to an ext4
filesystem.

This is the first time I have tried btrfs since I experienced
(unfixable) corruption a year or so back, with 3.7 and up. I was led
to believe that the ENOSPC errors had been resolved.

Would a more recent kernel than 3.11 have done me any good?


turnip:~ # btrfs fi df /
Data, single: total=1.80TiB, used=832.22GiB
System, DUP: total=8.00MiB, used=204.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.50GiB, used=5.00GiB
Metadata, single: total=8.00MiB, used=0.00
turnip:~ # btrfs fi show
Label: none  uuid: 9379c138-b309-4556-8835-0f156b863d29
        Total devices 1 FS bytes used 837.22GiB
        devid    1 size 1.81TiB used 1.81TiB path /dev/sda3

Btrfs v3.12+20131125


-- 
Jon
Software Blacksmith


* Re: fresh btrfs filesystem, out of disk space, hundreds of gigs free
  2014-03-22  0:00 fresh btrfs filesystem, out of disk space, hundreds of gigs free Jon Nelson
@ 2014-03-22  8:28 ` Duncan
  2014-03-22 21:38   ` Andrew Skretvedt
  0 siblings, 1 reply; 6+ messages in thread
From: Duncan @ 2014-03-22  8:28 UTC (permalink / raw)
  To: linux-btrfs

Jon Nelson posted on Fri, 21 Mar 2014 19:00:51 -0500 as excerpted:

> Using openSUSE 13.1 on x86_64, whose kernel - as of this writing - is 3.11.10,
> Would a more recent kernel than 3.11 have done me any good?

[Reordered the kernel question from below to here, where you reported the 
running version.]

As both mkfs.btrfs and the wiki recommend, always use the latest kernel.  
In fact, the kernel config's btrfs option had a pretty strong warning thru 
3.12 that was only toned down in 3.13 as well, so I'd definitely 
recommend at least the latest 3.13.x stable series kernel in any case.

Additionally, the btrfs-progs you're running is a 3.12+ snapshot from 
November, so it's relatively current, as 3.12 is the latest release 
there (with 3.14 scheduled for release alongside the 3.14 kernel).  But 
the btrfs-progs version numbers are synced to the kernel, so a 3.12 
btrfs-progs userspace really indicates you should be running a 3.12 or 
later kernelspace btrfs as well.

Since kernel 3.14 is nearing release, that'd mean 3.13.x stable series or 
3.14-rc.  As I said, mkfs.btrfs actually tells you to run a current 
kernel when you create a filesystem...

Tho I don't believe it would have helped here; there was an easy enough 
fix regardless.  See below.

> I tried to copy a bunch of files over to a btrfs filesystem (which was
> mounted as /, in fact).
> 
> After some time, things ground to a halt and I got out of disk space
> errors. btrfs fi df /   showed about 1TB of *data* free, and 500MB
> of metadata free.

It's the metadata, plus no space left to allocate more.  See below.

> Below are the btrfs fi df /  and  btrfs fi show.
> 
> 
> turnip:~ # btrfs fi df /
> Data, single: total=1.80TiB, used=832.22GiB
> System, DUP: total=8.00MiB, used=204.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=5.50GiB, used=5.00GiB
> Metadata, single: total=8.00MiB, used=0.00

FWIW, the system and metadata single chunks reported there are an 
artifact from mkfs.btrfs and aren't used (used=0.00).  At some point it 
should be updated to remove them automatically, but meanwhile, a balance 
should remove them from the listing.  If you do that balance immediately 
after filesystem creation, at the first mount, you'll be rid of them when 
there's not a whole lot of other data on the filesystem to balance as 
well.  That would leave:

> Data, single: total=1.80TiB, used=832.22GiB
> System, DUP: total=8.00MiB, used=204.00KiB
> Metadata, DUP: total=5.50GiB, used=5.00GiB

Metadata is the red-flag here.  Metadata chunks are 256 MiB in size, but 
in default DUP mode, two are allocated at once, thus 512 MiB at a time.  
And you're under 512 MiB free so you're running on the last pair of 
metadata chunks, which means depending on the operation, you may need to 
allocate metadata pretty quickly.  You can probably copy a few files 
before that, but a big copy operation with many files at a time would 
likely need to allocate more metadata.
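
With your numbers, the arithmetic works out to:

  metadata headroom = total - used = 5.50 GiB - 5.00 GiB = 0.50 GiB (512 MiB)
  next allocation   = 2 x 256 MiB chunks (DUP) = 512 MiB of unallocated space

So one more burst of metadata writes and btrfs has to grab a fresh pair 
of metadata chunks from the unallocated pool.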

But for a complete picture you need the filesystem show output, below, as 
well...

> turnip:~ # btrfs fi show
> Label: none  uuid: 9379c138-b309-4556-8835-0f156b863d29
>         Total devices 1 FS bytes used 837.22GiB
>         devid    1 size 1.81TiB used 1.81TiB path /dev/sda3
> 
> Btrfs v3.12+20131125

OK.  Here we see the root problem.  Size 1.81 TiB, used 1.81 TiB.  No 
unallocated space at all.  Whichever runs out of space first, data or 
metadata, you'll be stuck.

And as was discussed above, you're going to need another pair of metadata 
chunks allocated pretty quickly, but there's no unallocated space 
available to allocate to them, so no surprise at all you got free-space 
errors! =:^(

Conversely, you have all sorts of free data space.  Data space is 
allocated in gig-size chunks, and you have nearly a TiB of free data-
space, which means there's quite a few nearly empty data chunks available.
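
Again running the numbers from your df output:

  data slack = total - used = 1.80 TiB - 832.22 GiB =~ 1 TiB

At roughly a GiB per data chunk, that's on the order of a thousand data 
chunks sitting there partly or entirely empty -- space that could be 
serving metadata instead, if it were returned to the unallocated pool.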

To correct that imbalance and free the extra data space to the pool so 
more metadata can be allocated, you run a balance.

Here, you probably want a balance of the data only, since it's what's 
unbalanced, and on slow spinning rust (as opposed to fast SSD) rewriting
/everything/, as balance does by default, will take some time.  To do 
data only, use the -d option:

# btrfs balance start -d /

(You said it was mounted on root, so that's what I used.)

But balance will try to allocate a new chunk to copy all the used data 
from several used chunks at once, since they're near-empty, and won't be 
able to do it, since all your space is allocated.  To get around that, we 
use balance filters, as described on the wiki.

# btrfs balance start -dusage=0 /

The usage=0 filter says only rebalance entirely empty chunks, effectively 
simply reclaiming them to the unallocated pool.  If you're lucky, you 
have some of these and the above should reclaim them and return them to 
the unallocated pool.  That will give you a bit of unallocated space to 
allocate further chunks from.

Then, depending on how much space that reclaimed to unallocated, you can 
bump the usage=<max-percent> up until you're satisfied.  One at a time, 
or simply try skipping to 50% or just -d, if you like:

-dusage=5, -dusage=20, -dusage=50, -d

Usage=5 says only do chunks that are less than 5% used.  Usage=50 thus 
obviously means only those that are less than 50% used.  And a simple -d 
without the usage= filter is equivalent to -dusage=100.
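
So a typical session might look something like this (just a sketch; check 
the df and show output between steps and stop once you've reclaimed enough 
unallocated space):

# btrfs balance start -dusage=0 /
# btrfs fi show
# btrfs fi df /
# btrfs balance start -dusage=5 /
# btrfs fi show
# btrfs balance start -dusage=20 /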

As the usage bumps up, you'll be rewriting chunks with more actual data 
in them, which will take longer per chunk, tho how long overall will 
depend on how many chunks actually had that much data in them.  Given 
that you're on spinning rust with 800+ gig of data, you may not actually 
want to rewrite ALL data chunks (a plain -d), but that's not likely to be 
necessary anyway.  Stopping at say 20% or 50% usage will likely reclaim 
most of the unused data chunks you'd reclaim with a full -d, and in fact, 
even the 5% filter might well be good enough.

While we're at it, you might as well reclaim those empty single metadata 
and system chunks too, that the mkfs.btrfs left you with as mentioned 
above.

# btrfs balance start -musage=0 -susage=0 /

(For the -s/system you might also need the -f/force option.  I didn't 
need it here and in fact -m implies -s as well, so in theory -musage=0 
should clear out both as it did for me, but one list poster indicated 
that didn't work for him, and he had to specifically do -susage=0 -f, 
which worked fine, clearing the unused system single from his btrfs 
filesystem df listing.  YMMV.)
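
In command form, what that poster ran would look something like this (I 
haven't needed it myself, as noted, so treat it as a fallback):

# btrfs balance start -susage=0 -f /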

That should get you back in business.  After that, simply learn to keep 
an eye on the btrfs filesystem show results and do a rebalance whenever 
unallocated space seems to be dropping too low, say when used space 
reaches 1.5 or 1.7 TiB out of the 1.81 TiB, depending on how full you 
actually eventually fill the filesystem.  (Obviously if data+metadata
+system used is say 1.65 TiB, doing a balance because allocated is 1.7 TiB 
isn't going to get you much back.  OTOH, if you end up with only say 1.2 
TiB of data+metadata+system used, rebalancing if total allocated usage 
exceeds 1.5 TiB could still make sense and get you a couple hundred gig 
back.)
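
A quick way to keep that eye on things (just a sketch; adjust mountpoint 
and thresholds to taste):

# btrfs fi show | grep devid
# btrfs fi df /

If the devid line's "used" (that is, allocated) is closing in on its 
"size" AND metadata used is within a chunk-pair (512 MiB) of metadata 
total, it's time for another filtered balance.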

Meanwhile, I strongly urge you to read up on the btrfs wiki.  The 
following is easy to remember and bookmark:

https://btrfs.wiki.kernel.org

Here's the documentation link (alternate bookmarking candidate):

https://btrfs.wiki.kernel.org/index.php/Main_Page#Documentation

Here's the discussion that would have gotten you out of this specific 
bind (long link, watch the wrap):

https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29

And here's the balance-filters page, which can be a bit hard to find altho 
it's linked on the FAQ page under the balance discussion:

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: fresh btrfs filesystem, out of disk space, hundreds of gigs free
  2014-03-22  8:28 ` Duncan
@ 2014-03-22 21:38   ` Andrew Skretvedt
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Skretvedt @ 2014-03-22 21:38 UTC (permalink / raw)
  To: linux-btrfs, 1i5t5.duncan

This is just a rave-post to praise Duncan for his excellent post back to 
Jon!

This will surely help a good number of new btrfs users who have the good 
sense to watch this mailing list. Your exposition on the balance command 
helped to clarify in my mind exactly why someone might want to use the 
usage= filter with a given figure (it looked like black magic to me 
before, and its usefulness still wasn't completely clear to me after 
reading the wiki). You had a lot of practical advice for making and 
maintaining a new btrfs filesystem. I run a btrfs as my root as well, 
created under kernel 3.2 and purring right along while I've since moved 
the kernel to 3.9. I recently updated the userspace tools to 3.12 (from 
the distro-provided 0.19+something), so I'll be sure to move the kernel 
up to 3.12 as well before trying anything complex.

Thanks for the effort.

-Andrew
-----
On 2014-Mar-22 03:28, Duncan wrote:
> Jon Nelson posted on Fri, 21 Mar 2014 19:00:51 -0500 as excerpted:
>
-----(snip)-----
> Meanwhile, I strongly urge you to read up on the btrfs wiki.  The
> following is easy to remember and bookmark:
>
> https://btrfs.wiki.kernel.org
>
> Here's the documentation link (alternate bookmarking candidate):
>
> https://btrfs.wiki.kernel.org/index.php/Main_Page#Documentation
>
> Here's the discussion that would have gotten you out of this specific
> bind (long link, watch the wrap):
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29
>
> And here's the balance-filters page, which can be a bit hard to find altho
> it's linked on the FAQ page under the balance discussion:
>
> https://btrfs.wiki.kernel.org/index.php/Balance_Filters
>


* Re: fresh btrfs filesystem, out of disk space, hundreds of gigs free
  2014-03-22 23:21 Jon Nelson
  2014-03-22 23:38 ` Hugo Mills
@ 2014-03-23 11:54 ` Duncan
  1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2014-03-23 11:54 UTC (permalink / raw)
  To: linux-btrfs

Jon Nelson posted on Sat, 22 Mar 2014 18:21:02 -0500 as excerpted:

>>> # btrfs fi df /
>>> Data, single: total=1.80TiB, used=832.22GiB
>>> System,  DUP: total=8.00MiB, used=204.00KiB
>>> Metadata, DUP: total=5.50GiB, used=5.00GiB

[The 0-used single listings left over from filesystem creation omitted.]

>> Metadata is the red-flag here.  Metadata chunks are 256 MiB in size,
>> but in default DUP mode, two are allocated at once, thus 512 MiB at a
>> time. And you're [below that so close to needing more allocated].
> 
> The size of the chunks allocated is especially useful information. I've
> not seen that anywhere else, and does explain a fair bit.

I actually had to dig a little bit for that information, but like you I 
found it quite useful, so the digging was worth it. =:^)

>> But for a complete picture you need the filesystem show output, below,
>> as well...
>>
>>> # btrfs fi show
>>> Label: none  uuid: [...]
>>>         Total devices 1 FS bytes used 837.22GiB
>>>         devid 1 size 1.81TiB used 1.81TiB path /dev/sda3
>>
>> OK.  Here we see the root problem.  Size 1.81 TiB, used 1.81 TiB.  No
>> unallocated space at all.  Whichever runs out of space first, data or
>> metadata, you'll be stuck.
> 
> Now it's at this point that I am unclear. I thought the above said:

> "1 device on this filesystem, 837.22 GiB used."

> and

> [devID #1, /dev/sda3, is 1.81TiB in size, with btrfs using it all.]
> 
> Which I interpret differently. Can you go into more detail as to how
> (from btrfs fi show) we can say "the _filesystem_ (not the device) is
> full"?

FWIW, there has been some discussion about changing the way both df and 
show present their information, giving a bit more than they do now and 
ideally presenting, in one command, the core information you currently 
need both commands to see.  I expect that to eventually happen, but 
meanwhile, the output of filesystem show in particular /is/ a bit 
confusing.  I actually think they need to omit or change the size 
displayed on the total-devices line entirely.  Without the information 
from filesystem df as well, it really isn't useful on its own, and it's 
an invitation to confusion and misinterpretation, much like you found 
yourself with, because it isn't related to the numbers given (in show) 
for the individual devices at all.  It's only useful in the context of 
filesystem df, which is where it belongs, NOT in show!  My opinion, of 
course. =:^)

Anyway, if you compare the numbers from filesystem df and do the math, 
you'll quickly find what the total size in show is actually telling you:

From df: data-used + metadata-used + system-used = ...

From show: filesystem total used.

Given the numbers posted above:

From df: data-used=     832.22 GiB (out of 1.8 TiB allocated/total data)
         metadata-used=   5.00 GiB (out of 5.5 GiB allocated metadata)
         system-used=   (insignificant, well under a MiB)

From show, the total:

         total-used=    837.22 GiB

The PROBLEM is that the numbers the REST of show is giving you are 
something entirely different, only tangentially related:

From show: per device:

1) Total filesystem size on that device.

2) Total of all chunk allocations (*NOT* what's actually used from those 
allocations) on that device, altho it's /labeled/ "used" in show's 
individual device listings.

Again, comparing from df it's quickly apparent where the numbers come 
from, the totals (*NOT* used) of data+metadata+system allocations 
(labeled total in df, but it's the allocated).

Given the posting above, that'd be:

From df: data-allocated (total) = 1.80 TiB
         metadata-allocated     = 0.005 TiB (5.5 GiB)
         system-allocated       = (trivial, a few MiB)

From show, adding up all individual devices, in your case just one:

        total-allocated         = 1.81 TiB (obviously rounded slightly)

3) What's *NOT* shown but can be easily deduced by subtracting allocated 
(labeled used) from total is the unallocated, thus still free to allocate.

In this case, that's zero, since the filesystem size on that (single) 
device is 1.81 TiB and 1.81 TiB is allocated.
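
Spelled out as a formula:

  unallocated = devid size - devid "used" (allocated) = 1.81 TiB - 1.81 TiB = 0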

So there's nothing left available to allocate, and you're running out 
of metadata space and need to allocate some more -- thus the problem, 
despite the fact that both normal (not btrfs) df and btrfs filesystem 
show APPEAR to say there's plenty of room left.  Btrfs filesystem show 
ACTUALLY says there's NO room left, at least for further chunk 
allocations, but you really have to understand what information it's 
presenting and how, in order to actually get what it's telling you.

Like I said, I really wish show's total used size either wasn't even 
there, or likewise corresponded to the allocation, not what's used from 
that allocation, as all the device lines do.

But that /does/ explain why the total of all the device used (in your 
case just one) doesn't equal the total used of the filesystem -- they're 
reporting two *ENTIRELY* different things!!  No WONDER people are 
confused!


>> To correct that imbalance and free the extra data space to the pool so
>> more metadata can be allocated, you run a balance.
> 
> In fact, I did try a balance - both a data-only and a metadata-only
> balance. The metadata-only balance failed. I cancelled the data-only
> balance early, although perhaps I should have been more patient. I went
> from a running system to working from a rescue environment -- I was
> under a bit of time pressure to get things moving again.

Well, as I explained it may well have failed due to lack of space anyway, 
without trying a rather low -dusage= parameter first.  Tho it's possible 
it would have freed some space in the process, enough so that you could 
run it again and not get the errors the second time.  But since I guess 
you weren't aware of the -dusage= thing, you'd have probably seen the 
balance out of space error and given up anyway, so it's probably just as 
well that you gave up a bit earlier than that, after all.

Meanwhile, I've actually had the two-balance thing happen here: the 
first balance failing on some chunks due to lack of space, the second 
succeeding.  I was not entirely out of space, but close.  The first 
balance didn't have enough space to rebalance some of the fuller chunks 
when it got to them, so it gave an error on those, but a few of the 
less-used chunks were consolidated without error, and that freed up 
enough space that I could run a second balance and have it complete 
without error.

(I'm on SSD and have my devices partitioned up into smaller partitions as 
well, my largest btrfs being well under 100 gigs, so between the small 
size and the fast SSD, a full balance on any single btrfs is a couple 
minutes or less, so running that second balance after the first was 
actually rather trivial; nothing at all like the big deal it'd be on a
TiB+ spinning rust device!)

> To be honest, it seems like a lot of hoop-jumping and a maintenance
> burden for the administrator. Not being able to draw from "free space
> pool" for either data or metadata seems like a big bummer. I'm hoping
> that such a limitation will be resolved at some near-term future point.

Well (as I guess you probably understand now), you /can/ draw from the 
free (unallocated) space pool for either, but your problem was that it 
had been drawn dry -- the data chunks were hogging it all!  And as I (and 
Hugo) explained, unfortunately btrfs doesn't automatically reclaim unused 
but allocated chunks to the unallocated pool yet -- you presently have to 
run a balance for that.

But you're right.  Ideally btrfs would be able to automatically reclaim 
chunks TO unallocated just as it can automatically claim them FROM 
unallocated, and as Hugo says, it has been discussed, just not 
implemented yet.  Until then the automatic process is unfortunately only 
one way, and you have to manually run a balance to go the other way. =:^\

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: fresh btrfs filesystem, out of disk space, hundreds of gigs free
  2014-03-22 23:21 Jon Nelson
@ 2014-03-22 23:38 ` Hugo Mills
  2014-03-23 11:54 ` Duncan
  1 sibling, 0 replies; 6+ messages in thread
From: Hugo Mills @ 2014-03-22 23:38 UTC (permalink / raw)
  To: Jon Nelson; +Cc: linux-btrfs


On Sat, Mar 22, 2014 at 06:21:02PM -0500, Jon Nelson wrote:
> Duncan <1i5t5.duncan <at> cox.net> writes:
> > Jon Nelson posted on Fri, 21 Mar 2014 19:00:51 -0500 as excerpted:
[snip]
> > > Below are the btrfs fi df /  and  btrfs fi show.
> > >
> > >
> > > turnip:~ # btrfs fi df /
> > > Data, single: total=1.80TiB, used=832.22GiB
> > > System, DUP: total=8.00MiB, used=204.00KiB
> > > System, single: total=4.00MiB, used=0.00
> > > Metadata, DUP: total=5.50GiB, used=5.00GiB
> > > Metadata, single: total=8.00MiB, used=0.00
> >
> > FWIW, the system and metadata single chunks reported there are an
> > artifact from mkfs.btrfs and aren't used (used=0.00).  At some point it
> > should be updated to remove them automatically, but meanwhile, a balance
> > should remove them from the listing.  If you do that balance immediately
> > after filesystem creation, at the first mount, you'll be rid of them when
> > there's not a whole lot of other data on the filesystem to balance as
> > well.  That would leave:
> >
> > > Data, single: total=1.80TiB, used=832.22GiB
> > > System, DUP: total=8.00MiB, used=204.00KiB
> > > Metadata, DUP: total=5.50GiB, used=5.00GiB
> >
> > Metadata is the red-flag here.  Metadata chunks are 256 MiB in size, but
> > in default DUP mode, two are allocated at once, thus 512 MiB at a time.
> > And you're under 512 MiB free so you're running on the last pair of
> > metadata chunks, which means depending on the operation, you may need to
> > allocate metadata pretty quickly.  You can probably copy a few files
> > before that, but a big copy operation with many files at a time would
> > likely need to allocate more metadata.
> 
> The size of the chunks allocated is especially useful information. I've not
> seen that anywhere else, and does explain a fair bit.
> 
> > But for a complete picture you need the filesystem show output, below, as
> > well...
> >
> > > turnip:~ # btrfs fi show
> > > Label: none  uuid: 9379c138-b309-4556-8835-0f156b863d29
> > >         Total devices 1 FS bytes used 837.22GiB
> > >         devid    1 size 1.81TiB used 1.81TiB path /dev/sda3
> > >
> > > Btrfs v3.12+20131125
> >
> > OK.  Here we see the root problem.  Size 1.81 TiB, used 1.81 TiB.  No
> > unallocated space at all.  Whichever runs out of space first, data or
> > metadata, you'll be stuck.
> 
> Now it's at this point that I am unclear. I thought the above said:
> "1 device on this filesystem, 837.22 GiB used."
> and
> "device ID #1 is /dev/sda3, is 1.81TiB in size, and btrfs is using 1.81TiB
> of that"
> 
> Which I interpret differently. Can you go into more detail as to how (from
> btrfs fi show) we can say "the _filesystem_ (not the device) is full"?

   From btrfs fi show on its own, you can't. The problem is that the
data/metadata split means that the metadata has run out, and there's
(currently -- see below) no way of reassigning some of the data
allocation to metadata. So the "disk full" condition is "complete
allocation (see btrfs fi show)" *and* "metadata near-full (see btrfs
fi df)".

   An interesting question here is how come the FS allocated all that
space to data when it's a newly-made filesystem with less than half
that space actually used -- did you write lots of other data to it and
then delete it again? If not, I haven't seen overallocation like that
since 3.9 or so, and it would be good to know what happened.

[snip]
> > Meanwhile, I strongly urge you to read up on the btrfs wiki.  The
> > following is easy to remember and bookmark:
> 
> I read the wiki and related pages many times, but there is a lot of info
> there and I must have skipped over the "if your device is large" section.
> 
> To be honest, it seems like a lot of hoop-jumping and a maintenance burden
> for the administrator. Not being able to draw from "free space pool" for
> either data or metadata seems like a big bummer. I'm hoping that such a
> limitation will be resolved at some near-term future point.

   It's certainly something that's been discussed in the past. I think
Ilya had automatic reclamation of unused allocation (e.g. an autonomic
balance / reallocation) on his to-do list at one point. I don't know
what the status of the work is, though.

[snip]

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Alert status chocolate viridian: Authorised personnel only. ---   
                   Dogs must be carried on escalator.                    



* Re: fresh btrfs filesystem, out of disk space, hundreds of gigs free
@ 2014-03-22 23:21 Jon Nelson
  2014-03-22 23:38 ` Hugo Mills
  2014-03-23 11:54 ` Duncan
  0 siblings, 2 replies; 6+ messages in thread
From: Jon Nelson @ 2014-03-22 23:21 UTC (permalink / raw)
  To: linux-btrfs

Duncan <1i5t5.duncan <at> cox.net> writes:

>
> Jon Nelson posted on Fri, 21 Mar 2014 19:00:51 -0500 as excerpted:
>
> > Using openSUSE 13.1 on x86_64, whose kernel - as of this writing - is 3.11.10,
> > Would a more recent kernel than 3.11 have done me any good?
>
> [Reordered the kernel question from below to here, where you reported the
> running version.]
>
> As both mkfs.btrfs and the wiki recommend, always use the latest kernel.
> In fact, the kernel config's btrfs option had a pretty strong warning thru
> 3.12 that was only toned down in 3.13 as well, so I'd definitely
> recommend at least the latest 3.13.x stable series kernel in any case.

I would like to say that your response is one of the most useful and
detailed responses I've ever received on a mailing list. Thank you!

The "please run the very latest kernel/userland" is sort of true for
everything, though. Also, I am of the understanding that the openSUSE folks
back-port *some* of the btrfs-relevant bits to both the kernel and the
userspace tools, but I could be wrong, too.

> > I tried to copy a bunch of files over to a btrfs filesystem (which was
> > mounted as /, in fact).
> >
> > After some time, things ground to a halt and I got out of disk space
> > errors. btrfs fi df /   showed about 1TB of *data* free, and 500MB
> > of metadata free.
>
> It's the metadata, plus no space left to allocate more.  See below.

Right. Although I did a poor job of noting it, I understood at least that much.

> > Below are the btrfs fi df /  and  btrfs fi show.
> >
> >
> > turnip:~ # btrfs fi df /
> > Data, single: total=1.80TiB, used=832.22GiB
> > System, DUP: total=8.00MiB, used=204.00KiB
> > System, single: total=4.00MiB, used=0.00
> > Metadata, DUP: total=5.50GiB, used=5.00GiB
> > Metadata, single: total=8.00MiB, used=0.00
>
> FWIW, the system and metadata single chunks reported there are an
> artifact from mkfs.btrfs and aren't used (used=0.00).  At some point it
> should be updated to remove them automatically, but meanwhile, a balance
> should remove them from the listing.  If you do that balance immediately
> after filesystem creation, at the first mount, you'll be rid of them when
> there's not a whole lot of other data on the filesystem to balance as
> well.  That would leave:
>
> > Data, single: total=1.80TiB, used=832.22GiB
> > System, DUP: total=8.00MiB, used=204.00KiB
> > Metadata, DUP: total=5.50GiB, used=5.00GiB
>
> Metadata is the red-flag here.  Metadata chunks are 256 MiB in size, but
> in default DUP mode, two are allocated at once, thus 512 MiB at a time.
> And you're under 512 MiB free so you're running on the last pair of
> metadata chunks, which means depending on the operation, you may need to
> allocate metadata pretty quickly.  You can probably copy a few files
> before that, but a big copy operation with many files at a time would
> likely need to allocate more metadata.

The size of the chunks allocated is especially useful information. I've not
seen that anywhere else, and does explain a fair bit.

> But for a complete picture you need the filesystem show output, below, as
> well...
>
> > turnip:~ # btrfs fi show
> > Label: none  uuid: 9379c138-b309-4556-8835-0f156b863d29
> >         Total devices 1 FS bytes used 837.22GiB
> >         devid    1 size 1.81TiB used 1.81TiB path /dev/sda3
> >
> > Btrfs v3.12+20131125
>
> OK.  Here we see the root problem.  Size 1.81 TiB, used 1.81 TiB.  No
> unallocated space at all.  Whichever runs out of space first, data or
> metadata, you'll be stuck.

Now it's at this point that I am unclear. I thought the above said:
"1 device on this filesystem, 837.22 GiB used."
and
"device ID #1 is /dev/sda3, is 1.81TiB in size, and btrfs is using 1.81TiB
of that"

Which I interpret differently. Can you go into more detail as to how (from
btrfs fi show) we can say "the _filesystem_ (not the device) is full"?

> And as was discussed above, you're going to need another pair of metadata
> chunks allocated pretty quickly, but there's no unallocated space
> available to allocate to them, so no surprise at all you got free-space
> errors! =:^(
>
> Conversely, you have all sorts of free data space.  Data space is
> allocated in gig-size chunks, and you have nearly a TiB of free data-
> space, which means there's quite a few nearly empty data chunks available.

> To correct that imbalance and free the extra data space to the pool so
> more metadata can be allocated, you run a balance.

In fact, I did try a balance - both a data-only and a metadata-only balance.
 The metadata-only balance failed. I cancelled the data-only balance early,
although perhaps I should have been more patient. I went from a running
system to working from a rescue environment -- I was under a bit of time
pressure to get things moving again.

> Here, you probably want a balance of the data only, since it's what's
> unbalanced, and on slow spinning rust (as opposed to fast SSD) rewriting
> /everything/, as balance does by default, will take some time.  To do
> data only, use the -d option:
>
> # btrfs balance start -d /
>
> (You said it was mounted on root, so that's what I used.)

I'm going to remove a bunch of (great!) quoted stuff here that, while
useful, won't be relevant to my reply.

> Meanwhile, I strongly urge you to read up on the btrfs wiki.  The
> following is easy to remember and bookmark:

I read the wiki and related pages many times, but there is a lot of info
there and I must have skipped over the "if your device is large" section.

To be honest, it seems like a lot of hoop-jumping and a maintenance burden
for the administrator. Not being able to draw from "free space pool" for
either data or metadata seems like a big bummer. I'm hoping that such a
limitation will be resolved at some near-term future point.

Otherwise, I think it's a problem to suggest that btrfs will require
administrators to keep an eye on data /and/ metadata free space (using
btrfs-specific tools) *and* that they might need to run processes which
shuffle data about in the hopes that such actions might arrange things more
optimally.

I've been very excited to reap the very real benefits of btrfs, but it seems
like the equally real downsides continue to negatively impact its operation.

Thanks again for your truly excellent response. I'm a big fan of the design
and featureset provided by btrfs, but some of these rough edges hurt a bit
more when they've bitten me more than once.


-- 
Jon
Software Blacksmith

