From: Christoph Anton Mitterer <calestyo@scientia.org>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: ENOSPC while df shows 826.93GiB free
Date: Tue, 07 Dec 2021 16:07:32 +0100
Message-ID: <b913833639d11a0242f781d94452e5e31d7cbf1b.camel@scientia.org>
In-Reply-To: <20211207072128.GL17148@hungrycats.org>

On Tue, 2021-12-07 at 02:21 -0500, Zygo Blaxell wrote:
> If you minimally balance data (so that you keep 2GB unallocated at all
> times) then it works much better: you can allocate the last metadata
> chunk that you need to expand, and it requires only a few minutes of
> IO per day.  After a while you don't need to do this any more, as a
> large buffer of allocated but unused metadata will form.

Hm, I've already asked Qu in the other mail just before whether (and
why) balancing would help there at all.

Doesn't it just re-write the block groups (but not defragment
them...)? Would that help to gain back unallocated space (which could
then be allocated for metadata), and if so, why?

And what exactly do you mean by "minimally"? Of course I could use
-dusage=20 or so... is that it?
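
Just so I have something concrete: would a "minimal" data balance be
roughly the following? (Only a sketch of my understanding; the
mountpoint and the thresholds are placeholders.)

    # rewrite only nearly-empty data block groups, and only a few of
    # them, to give back unallocated space without a full balance
    btrfs balance start -dusage=10,limit=5 /srv/data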


But I guess all that wouldn't help now, when the unallocated space is
already used up, right?



> If you need a drastic intervention, you can mount with
> metadata_ratio=1 for a short(!) time to allocate a lot of extra
> metadata block groups.  Combine with a data block group balance for a
> few blocks (e.g. -dlimit=9).
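
If I understand that correctly, the drastic intervention would look
roughly like this (just a sketch of my understanding; the mountpoint
is a placeholder, and I'm assuming metadata_ratio can be changed via
remount):

    # force a metadata chunk allocation for every allocated data chunk
    mount -o remount,metadata_ratio=1 /srv/data
    # balance just a handful of data block groups to trigger allocations
    btrfs balance start -dlimit=9 /srv/data
    # back to the default behaviour
    mount -o remount,metadata_ratio=0 /srv/data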

All that seems rather impractical to do, to be honest, at least for a
non-expert admin.

First, these are production systems... so one doesn't want to unmount
(and go through this procedure) whenever one sees that unallocated
space is running out.
One would rather want some automatic way of saying: if unallocated
space gets low -> allocate this much extra for metadata.

I guess there are no real/official tools out there for that kind of
monitoring? Like Nagios/Icinga checks that look at the unallocated
space?
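
Otherwise I guess we'd have to roll something of our own... perhaps
along these lines (just a rough sketch; mountpoint and threshold are
placeholders, and btrfs filesystem usage presumably needs root):

    #!/bin/sh
    # rough Nagios/Icinga-style check for low unallocated btrfs space
    MNT=/srv/data      # placeholder mountpoint
    WARN_GIB=10        # warn below this many GiB unallocated

    unalloc_bytes=$(btrfs filesystem usage -b "$MNT" |
                    awk '/Device unallocated:/ {print $3}')
    if [ -z "$unalloc_bytes" ]; then
        echo "UNKNOWN: could not read btrfs usage for $MNT"
        exit 3
    fi
    unalloc_gib=$((unalloc_bytes / 1024 / 1024 / 1024))

    if [ "$unalloc_gib" -lt "$WARN_GIB" ]; then
        echo "WARNING: only ${unalloc_gib} GiB unallocated on $MNT"
        exit 1
    fi
    echo "OK: ${unalloc_gib} GiB unallocated on $MNT"
    exit 0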



> You need about (3 + number_of_disks) GB of allocated but unused
> metadata block groups to handle the worst case (balance, scrub, and
> discard all active at the same time, plus the required free metadata
> space).  Also leave room for existing metadata to expand by about
> 50%, especially if you have snapshots.
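
If I take that rule of thumb literally for our case (about 17 GB of
metadata, and assuming a single-device filesystem): 3 + 1 = 4 GB of
spare allocated metadata, plus roughly 50% of 17 GB (some 8-9 GB) of
expansion headroom... so something around 30 GB of allocated metadata
in total would be the target, if I understand that correctly?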



> Never balance metadata.  Balancing metadata will erase existing
> metadata allocations, leading directly to this situation.

Wouldn't that only de-allocate those block groups that are completely
empty?

> > So if csum data needs so much space... why can't it simply reserve
> > e.g. 60 GB for metadata instead of just 17 GB?
> 
> It normally does.  Are you:
> 
>         - running metadata balances?  (Stop immediately.)

Nope, I did run one once accidentally (-musage=0 ... copy&pasted the
wrong command), but only *after* the filesystem got stuck...


>         - preallocating large files?  Checksums are allocated later,
>         and naive usage of prealloc burns metadata space due to
>         fragmentation.

Hmm... I'm not so sure about that (I don't know what exactly the
storage middleware, which is www.dcache.org, does)... but it would
probably do this only for one to a few such large files at once, if at
all.
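
I suppose I could check whether it actually preallocates by looking at
a freshly written file for preallocated-but-unwritten extents,
something like this (the path is just a placeholder):

    # preallocated regions should show up flagged as "unwritten"
    filefrag -v /srv/data/pool/some-file | head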


>         - modifying snapshots?  Metadata size increases with each
>         modified snapshot.

No snapshots are used at all on these filesystems.


>         - replacing large files with a lot of very small ones?  Files
>         below 2K are stored in metadata.  max_inline=0 disables this.

I guess you mean here:
First, many large files are written... unallocated space is used up
(with data and metadata block groups).
Then large files are deleted... data block groups get fragmented (but
are not unallocated again, because they're not completely empty).

Then loads of small files would be written (inline)... which then
fails, as metadata space would fill up even faster, right?


Well, we do have filesystems where there may be *many* small files...
but I guess still all in the range of 1 MB or more. I don't think we
have lots of files below 2K, if any at all.


So I don't think that we have this IO pattern.
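
(If we ever did get such a workload, disabling inlining should just be
a mount option, if I read btrfs(5) correctly, something like:

    mount -o remount,max_inline=0 /srv/data

but it doesn't look like we need that here.)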

It rather seems simply as if btrfs doesn't reserve metadata
aggressively enough (at least not in our case)... too much is
allocated for data, and once that is actually filled, not enough can
be allocated for metadata any more.



Thanks,
Chris.

