From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>, Tom Worster <fsb@thefsb.org>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Recommendations for balancing as part of regular maintenance?
Date: Tue, 16 Jan 2018 07:57:35 -0500
Message-ID: <88164eee-ead2-e6a3-9d6a-aeb0803466db@gmail.com>
In-Reply-To: <CAJCQCtSKhpLPu_YtbLNEjd82fwgqN+5w=GZociXhJacOQbCjZw@mail.gmail.com>
On 2018-01-16 01:45, Chris Murphy wrote:
> On Mon, Jan 15, 2018 at 11:23 AM, Tom Worster <fsb@thefsb.org> wrote:
>> On 13 Jan 2018, at 17:09, Chris Murphy wrote:
>>
>>> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>
>>>> To that end, I propose the following text for the FAQ:
>>>>
>>>> Q: Do I need to run a balance regularly?
>>>>
>>>> A: While not strictly necessary for normal operations, running a filtered
>>>> balance regularly can help prevent your filesystem from ending up with
>>>> ENOSPC issues. The following command run daily on each BTRFS volume
>>>> should be more than sufficient for most users:
>>>>
>>>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>>>
>>> Daily? Seems excessive.
>>>
>>> I've got multiple Btrfs file systems that I haven't balanced, full or
>>> partial, in a year. And I have no problems. One is a laptop which
>>> accumulates snapshots until roughly 25% free space remains and then
>>> most of the snapshots are deleted, except the most recent few, all at
>>> one time. I'm not experiencing any problems so far. The other is a NAS
>>> and it's multiple copies, with maybe 100-200 snapshots. One backup
>>> volume is 99% full, there's no more unallocated free space, I delete
>>> snapshots only to make room for btrfs send receive to keep pushing the
>>> most recent snapshot from the main volume to the backup. Again no
>>> problems.
>>>
>>> I really think suggestions this broad are just going to paper over
>>> bugs or design flaws, we won't see as many bug reports and then real
>>> problems won't get fixed.
>>
>> This is just an answer to a FAQ. This is not Austin or anyone else trying to
>> tell you or anyone else that you should do this. It should be clear that
>> there is an implied caveat along the lines of: "There are other ways to
>> manage allocation besides regular balancing. This recommendation is a
>> For-Dummies-kinda default that should work well enough if you don't have
>> another strategy better adapted to your situation." If this implication is
>> not obvious enough then we can add something explicit.
>
> It's an upstream answer to a frequently asked question. It's rather
> official, or about as close as it gets to it.
>
>>
>>> I also think the time based method is too subjective. What about the
>>> layout means a balance is needed? And if it's really a suggestion, why
>>> isn't there a cron job or systemd unit that just does this for the user,
>>> in btrfs-progs, working and enabled by default?
>>
>> As a newcomer to BTRFS, I was astonished to learn that it demands each user
>> figure out some workaround for what is, in my judgement, a required but
>> missing feature, i.e. a defect, a bug. At present the docs are pretty
>> confusing for someone trying to deal with it on their own.
>>
>> Unless some better fix is in the works, this _should_ be a systemd unit or
>> something. Until then, please put it in FAQ.
>
> At least openSUSE has had a systemd unit for a long time now, but last
> time I checked (a bit over a year ago) it was disabled by default. Why?
>
> And insofar as I'm aware, openSUSE users aren't having big problems
> related to lack of balancing, they have problems due to the lack of
> balancing combined with schizo snapper defaults, which are these days
> masked somewhat by turning on quotas so snapper can be more accurate
> about cleaning up.
And in turn causing other issues because of the quotas, but that's
getting OT...
>
> Basically the scripted balance tells me two things:
> a. Something is broken (still)
> b. None of the developers has time to investigate coherent bug reports
> about a. and fix/refine it.
I don't entirely agree here. The issue is inherent in the design of the
two-stage allocator itself, so it's not something that can be fixed with
a simple surface patch. The only real options I see to fix it are
either:

1. Redesign the allocator.
or:
2. Figure out some way to handle this generically and automatically.

The first is pretty much immediately out because it will almost
certainly require a breaking change in the on-disk format. The second
is extremely challenging to do right, and likely to cause significant
controversy among list regulars (I for one don't want the FS doing stuff
behind my back that impacts performance, and I have a feeling that quite
a lot of other people here don't either).

Given that, I would say time is only a (probably small) part of it.
This is not an easy thing to fix given the current situation, and
difficult problems tend to sit around with no progress for very long
periods of time in open source development.
>
> And therefore papering over the problem is all we have. Basically it's
> a sledgehammer approach.
How exactly is this any different than requiring a user to manually
scrub things to check data that's not being actively used? Or requiring
manual invocation of defragmentation? Or even batch deduplication?
All of those are manually triggered solutions to 'problems' with the
filesystem, just like this is. The only difference is that people are
used to needing to manually defrag disks, and reasonably used to the
need for manual scrubs (and don't seem to care much about dedupe), while
doing something like this to keep the allocator happy is absolutely
alien to them (despite being no different conceptually in that respect
from defrag, just operating at a different level).
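
For concreteness, here is a minimal sketch of how the filtered balance
from the proposed FAQ text might be wrapped for scripted use. The
function name `filtered_balance` and the `DRY_RUN` variable are made up
for illustration and are not part of btrfs-progs. The mount point
argument is also an assumption: the command as quoted in the FAQ text
does not name a path, but `btrfs balance start` needs one.

```shell
#!/bin/sh
# Sketch only: wraps the filtered balance from the proposed FAQ answer.
# filtered_balance and DRY_RUN are hypothetical names, not part of btrfs-progs.

filtered_balance() {
    # $1: mount point of the BTRFS volume to balance (assumed argument).
    mnt=${1:-}
    if [ -z "$mnt" ]; then
        echo "usage: filtered_balance <mountpoint>" >&2
        return 2
    fi
    # With DRY_RUN set to a non-empty value the command is only printed;
    # otherwise it is executed (needs root and a mounted BTRFS volume).
    ${DRY_RUN:+echo} btrfs balance start \
        -dusage=25 -dlimit=2..10 \
        -musage=25 -mlimit=2..10 \
        "$mnt"
}
```

Calling `DRY_RUN=1 filtered_balance /mnt/data` only prints the command,
which is a cheap way to check the filters before hooking the call into a
cron job or systemd timer (`/mnt/data` is a placeholder path).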
>
> The main person working on ENOSPC stuff is Josef so I'd run this by
> him and make sure this papering over bugs is something he agrees with.
I agree that Josef's input would be nice to have, as he really does
appear to be the authority on this type of thing.
I would also love to hear from someone at Facebook about their
experience with this type of thing, as they probably have the largest
current deployment of BTRFS around.
>
>>
>>> I really do not like
>>> all this hand holding of Btrfs, it's not going to make it better.
>>
>> Maybe it won't but, absent better proposals, and given the nature of the
>> problem, this kind of hand-holding is only fair to the user.
>
> This is hardly the biggest gotcha with Btrfs. I'm fine with the idea
> of papering over design flaws and long standing bugs with user space
> work arounds. I just want everyone on the same page about it, so it's
> not some big surprise it's happening. As far as I know, none of the
> developers regularly looks at the Btrfs wiki.
>
> And I think the best way of communicating:
> a. this is busted, and it sucks
> b. here's a proposed user space work around, so users aren't so pissed off.
>
> Is to try and get it into btrfs-progs, and enabled by default, because
> that will get in front of at least one developer.
Maybe it's time someone wrote up a BCP (best current practices) document
and included it as a man page bundled with btrfs-progs? That would get
much better developer visibility, would be much easier to keep current,
and would probably address the biggest issue with our documentation
currently (it's great for technical people, but somewhat horrendous for
new users without a technical background). We've already essentially got
the beginnings of such a document between the FAQ and the Gotchas page
on the wiki.