Recommendations for balancing as part of regular maintenance?

* Recommendations for balancing as part of regular maintenance?
@ 2018-01-08 15:55 Austin S. Hemmelgarn
  2018-01-08 16:20 ` ein
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-08 15:55 UTC (permalink / raw)
  To: Btrfs BTRFS

So, for a while now I've been recommending small filtered balances to 
people as part of regular maintenance for BTRFS filesystems under the 
logic that it does help in some cases and can't really hurt (and if done 
right, is really inexpensive in terms of resources).  This ended up 
integrated partially in the info text next to the BTRFS charts on 
netdata's dashboard, and someone has now pointed out (correctly I might 
add) that this is at odds with the BTRFS FAQ entry on balances.

For reference, here's the bit about it in netdata:

You can keep your volume healthy by running the `btrfs balance` command 
on it regularly (check `man btrfs-balance` for more info).

And here's the FAQ entry:

Q: Do I need to run a balance regularly?

A: In general usage, no. A full unfiltered balance typically takes a 
long time, and will rewrite huge amounts of data unnecessarily. You may 
wish to run a balance on metadata only (see Balance_Filters) if you find 
you have very large amounts of metadata space allocated but unused, but 
this should be a last resort.

I've commented in the issue in netdata's issue tracker that I feel that 
the FAQ entry could be better worded (strictly speaking, you don't 
_need_ to run balances regularly, but it's usually a good idea). 
Looking at both though, I think they could probably both be improved, 
but I would like to get some input here on what people actually think 
the best current practices are regarding this (and ideally why they feel 
that way) before I go and change anything.

So, on that note, how does anybody else out there feel about this?  Is 
balancing regularly with filters restricting things to small numbers of 
mostly empty chunks a good thing for regular maintenance or not?

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 15:55 Recommendations for balancing as part of regular maintenance? Austin S. Hemmelgarn
@ 2018-01-08 16:20 ` ein
  2018-01-08 16:34   ` Austin S. Hemmelgarn
  2018-01-10 21:37 ` waxhead
  2018-01-12 18:24 ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 34+ messages in thread
From: ein @ 2018-01-08 16:20 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Btrfs BTRFS

On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:
> [...]
> 
> And here's the FAQ entry:
> 
> Q: Do I need to run a balance regularly?
> 
> A: In general usage, no. A full unfiltered balance typically takes a
> long time, and will rewrite huge amounts of data unnecessarily. You may
> wish to run a balance on metadata only (see Balance_Filters) if you find
> you have very large amounts of metadata space allocated but unused, but
> this should be a last resort.

IMHO three more sentences and the answer would be more useful:
1. A BTRFS balance command example, with a note to check the man page first.
2. Which use cases may cause 'large amounts of metadata space allocated
but unused'.


-- 
PGP Public Key (RSA/4096b):
ID: 0xF2C6EA10
SHA-1: 51DA 40EE 832A 0572 5AD8 B3C0 7AFF 69E1 F2C6 EA10

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 16:20 ` ein
@ 2018-01-08 16:34   ` Austin S. Hemmelgarn
  2018-01-08 18:17     ` Graham Cobb
  0 siblings, 1 reply; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-08 16:34 UTC (permalink / raw)
  To: ein, Btrfs BTRFS

On 2018-01-08 11:20, ein wrote:
> On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:
>> [...]
>>
>> And here's the FAQ entry:
>>
>> Q: Do I need to run a balance regularly?
>>
>> A: In general usage, no. A full unfiltered balance typically takes a
>> long time, and will rewrite huge amounts of data unnecessarily. You may
>> wish to run a balance on metadata only (see Balance_Filters) if you find
>> you have very large amounts of metadata space allocated but unused, but
>> this should be a last resort.
> 
> IMHO three more sentences and the answer would be more useful:
> 1. A BTRFS balance command example, with a note to check the man page first.
> 2. Which use cases may cause 'large amounts of metadata space allocated
> but unused'.
> 
That's kind of what I was thinking as well, but I'm hesitant to get too 
heavily into stuff along the lines of 'for use case X, do 1, for use 
case Y, do 2, etc', as that tends to result in pigeonholing (people just 
go with what sounds closest to their use case instead of trying to 
figure out what actually is best for their use case).

Ideally, I think it should be as generic as reasonably possible, 
possibly something along the lines of:

A: While not strictly necessary, running regular filtered balances (for 
example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4`, 
see `man btrfs-balance` for more info on what the options mean) can help 
keep a volume healthy by mitigating the things that typically cause 
ENOSPC errors.  Full balances by contrast are long and expensive 
operations, and should be done only as a last resort.
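
To make the example concrete: `btrfs balance start` takes the mount point 
as its final argument, which the command quoted above leaves out.  A 
minimal sketch, where the function name `filtered_balance`, the `DRY_RUN` 
switch and the path `/mnt/data` are all placeholders of mine rather than 
anything from btrfs-progs:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the filtered balance suggested above.
# DRY_RUN defaults to 1, so the command is only printed unless DRY_RUN=0 is set.
filtered_balance() {
    local mnt="$1"
    # -dusage=50 / -musage=50: only consider data / metadata chunks below 50% usage.
    # -dlimit=2 / -mlimit=4: process at most 2 data and 4 metadata chunks per run.
    local cmd=(btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4 "$mnt")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Example (prints the command only):  filtered_balance /mnt/data
# Example (actually runs it, as root): DRY_RUN=0 filtered_balance /mnt/data
```

Printing by default is just a precaution for copy-paste use; the filter 
values are the ones from the proposed text and may well need tuning per 
filesystem.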


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 16:34   ` Austin S. Hemmelgarn
@ 2018-01-08 18:17     ` Graham Cobb
  2018-01-08 18:34       ` Austin S. Hemmelgarn
  2018-01-10  4:38       ` Duncan
  0 siblings, 2 replies; 34+ messages in thread
From: Graham Cobb @ 2018-01-08 18:17 UTC (permalink / raw)
  To: Btrfs BTRFS

On 08/01/18 16:34, Austin S. Hemmelgarn wrote:
> Ideally, I think it should be as generic as reasonably possible,
> possibly something along the lines of:
> 
> A: While not strictly necessary, running regular filtered balances (for
> example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4`,
> see `man btrfs-balance` for more info on what the options mean) can help
> keep a volume healthy by mitigating the things that typically cause
> ENOSPC errors.  Full balances by contrast are long and expensive
> operations, and should be done only as a last resort.

That recommendation is similar to what I do and it works well for my use
case. I would recommend it to anyone with my usage, but cannot say how
well it would work for other uses. In my case, I run balances like that
once a week: some weeks nothing happens, other weeks 5 or 10 blocks may
get moved.

For reference, my use case is for two separate btrfs filesystems each on
a single large disk (so no RAID) -- the disks are 6TB and 12TB, both
around 80% used -- one is my main personal data disk, the other is my
main online backup disk.

The data disk receives all email delivery (so lots of small files,
coming and going), stores TV programs as PVR storage (many GB sized
files, each one written once, which typically stick around for a while
and eventually get deleted) and is where I do my software development
(sources and build objects). No (significant) database usage. I am
guessing this is pretty typical personal user usage (although it doesn't
store any operating system files). The only unusual thing is that I have
it set up as about 20 subvolumes, and each one has frequent snapshots
(maybe 200 or so subvolumes in total at any time).

The online backup disk receives backups from all my systems in three
main forms: btrfs snapshots (send/receive), rsnapshot copies (rsync),
and DAR archives. Most get updated daily. It contains several hundred
snapshots (most received from the data disk).

It would be interesting to hear if similar balancing is seen as useful
for other very different cases (RAID use, databases or VM disks, etc).

Graham

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 18:17     ` Graham Cobb
@ 2018-01-08 18:34       ` Austin S. Hemmelgarn
  2018-01-08 20:29         ` Martin Raiber
  2018-01-10  4:38       ` Duncan
  1 sibling, 1 reply; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-08 18:34 UTC (permalink / raw)
  To: Graham Cobb, Btrfs BTRFS

On 2018-01-08 13:17, Graham Cobb wrote:
> On 08/01/18 16:34, Austin S. Hemmelgarn wrote:
>> Ideally, I think it should be as generic as reasonably possible,
>> possibly something along the lines of:
>>
>> A: While not strictly necessary, running regular filtered balances (for
>> example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4`,
>> see `man btrfs-balance` for more info on what the options mean) can help
>> keep a volume healthy by mitigating the things that typically cause
>> ENOSPC errors.  Full balances by contrast are long and expensive
>> operations, and should be done only as a last resort.
> 
> That recommendation is similar to what I do and it works well for my use
> case. I would recommend it to anyone with my usage, but cannot say how
> well it would work for other uses. In my case, I run balances like that
> once a week: some weeks nothing happens, other weeks 5 or 10 blocks may
> get moved.
> 
> For reference, my use case is for two separate btrfs filesystems each on
> a single large disk (so no RAID) -- the disks are 6TB and 12TB, both
> around 80% used -- one is my main personal data disk, the other is my
> main online backup disk.
> 
> The data disk receives all email delivery (so lots of small files,
> coming and going), stores TV programs as PVR storage (many GB sized
> files, each one written once, which typically stick around for a while
> and eventually get deleted) and is where I do my software development
> (sources and build objects). No (significant) database usage. I am
> guessing this is pretty typical personal user usage (although it doesn't
> store any operating system files). The only unusual thing is that I have
> it set up as about 20 subvolumes, and each one has frequent snapshots
> (maybe 200 or so subvolumes in total at any time).
> 
> The online backup disk receives backups from all my systems in three
> main forms: btrfs snapshots (send/receive), rsnapshot copies (rsync),
> and DAR archives. Most get updated daily. It contains several hundred
> snapshots (most received from the data disk).
> 
> It would be interesting to hear if similar balancing is seen as useful
> for other very different cases (RAID use, databases or VM disks, etc).

In my own usage I've got a pretty varied mix of other stuff going on. 
All my systems are Gentoo, so system updates mean that I'm building 
software regularly (though on most of the systems that happens on tmpfs 
in RAM), I run a home server with a dozen low use QEMU VM's and a bunch 
of transient test VM's, all of which I'm currently storing disk images 
for raw on top of BTRFS (which is actually handling all of it pretty 
well, though that may be thanks to all the VM's using PV-SCSI for their 
disks), I run a BOINC client system that sees pretty heavy filesystem 
usage, and have a lot of personal files that get synced regularly across 
systems, and all of this is on raid1 with essentially no snapshots.  For 
me the balance command I mentioned above run daily seems to help, even 
if the balance doesn't move much most of the time on most filesystems, 
and the actual balance operations take at most a few seconds most of the 
time (I've got reasonably nice SSD's in everything).

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 18:34       ` Austin S. Hemmelgarn
@ 2018-01-08 20:29         ` Martin Raiber
  2018-01-09  8:33           ` Marat Khalili
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Raiber @ 2018-01-08 20:29 UTC (permalink / raw)
  To: Btrfs BTRFS

On 08.01.2018 19:34 Austin S. Hemmelgarn wrote:
> On 2018-01-08 13:17, Graham Cobb wrote:
>> On 08/01/18 16:34, Austin S. Hemmelgarn wrote:
>>> Ideally, I think it should be as generic as reasonably possible,
>>> possibly something along the lines of:
>>>
>>> A: While not strictly necessary, running regular filtered balances (for
>>> example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4`,
>>> see `man btrfs-balance` for more info on what the options mean) can help
>>> keep a volume healthy by mitigating the things that typically cause
>>> ENOSPC errors.  Full balances by contrast are long and expensive
>>> operations, and should be done only as a last resort.
>>
>> That recommendation is similar to what I do and it works well for my use
>> case. I would recommend it to anyone with my usage, but cannot say how
>> well it would work for other uses. In my case, I run balances like that
>> once a week: some weeks nothing happens, other weeks 5 or 10 blocks may
>> get moved.
>
> In my own usage I've got a pretty varied mix of other stuff going on.
> All my systems are Gentoo, so system updates mean that I'm building
> software regularly (though on most of the systems that happens on
> tmpfs in RAM), I run a home server with a dozen low use QEMU VM's and
> a bunch of transient test VM's, all of which I'm currently storing
> disk images for raw on top of BTRFS (which is actually handling all of
> it pretty well, though that may be thanks to all the VM's using
> PV-SCSI for their disks), I run a BOINC client system that sees pretty
> heavy filesystem usage, and have a lot of personal files that get
> synced regularly across systems, and all of this is on raid1 with
> essentially no snapshots.  For me the balance command I mentioned
> above run daily seems to help, even if the balance doesn't move much
> most of the time on most filesystems, and the actual balance
> operations take at most a few seconds most of the time (I've got
> reasonably nice SSD's in everything).

There have been reports of (rare) corruption caused by balance (which won't
be detected by a scrub) here on the mailing list. So I would stay away
from btrfs balance unless it is absolutely needed (ENOSPC), and while it
is running I would try not to do anything else wrt. writes simultaneously.


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 20:29         ` Martin Raiber
@ 2018-01-09  8:33           ` Marat Khalili
  2018-01-09 12:46             ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 34+ messages in thread
From: Marat Khalili @ 2018-01-09  8:33 UTC (permalink / raw)
  To: Martin Raiber, Austin S. Hemmelgarn; +Cc: Btrfs BTRFS

On 08/01/18 19:34, Austin S. Hemmelgarn wrote:
> A: While not strictly necessary, running regular filtered balances 
> (for example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 
> -mlimit=4`, see `man btrfs-balance` for more info on what the options 
> mean) can help keep a volume healthy by mitigating the things that 
> typically cause ENOSPC errors. 

The choice of words is not very fortunate IMO. In my view, a volume 
ceasing to be "healthy" during normal operation presumes some bugs (or at 
least shortcomings) in the filesystem code. In this case I'd prefer to 
have detailed understanding of the situation before copy-pasting 
commands from wiki pages. Remember, most users don't run cutting-edge 
kernels and tools, preferring LTS distribution releases instead, so one 
size might not fit all.

On 08/01/18 23:29, Martin Raiber wrote:
> There have been reports of (rare) corruption caused by balance (which won't
> be detected by a scrub) here on the mailing list. So I would stay away
> from btrfs balance unless it is absolutely needed (ENOSPC), and while it
> is running I would try not to do anything else wrt. writes simultaneously.

This is my opinion too as a normal user, based upon reading this list 
and my own attempts to recover from ENOSPC. I'd rather re-create the 
filesystem from scratch, or at least make a full verified backup before 
attempting to fix problems with balance.

--

With Best Regards,
Marat Khalili

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-09  8:33           ` Marat Khalili
@ 2018-01-09 12:46             ` Austin S. Hemmelgarn
  2018-01-10  3:49               ` Duncan
  0 siblings, 1 reply; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-09 12:46 UTC (permalink / raw)
  To: Marat Khalili, Martin Raiber; +Cc: Btrfs BTRFS

On 2018-01-09 03:33, Marat Khalili wrote:
> On 08/01/18 19:34, Austin S. Hemmelgarn wrote:
>> A: While not strictly necessary, running regular filtered balances 
>> (for example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 
>> -mlimit=4`, see `man btrfs-balance` for more info on what the options 
>> mean) can help keep a volume healthy by mitigating the things that 
>> typically cause ENOSPC errors. 
> 
> The choice of words is not very fortunate IMO. In my view, a volume 
> ceasing to be "healthy" during normal operation presumes some bugs (or at 
> least shortcomings) in the filesystem code. In this case I'd prefer to 
> have detailed understanding of the situation before copy-pasting 
> commands from wiki pages. Remember, most users don't run cutting-edge 
> kernels and tools, preferring LTS distribution releases instead, so one 
> size might not fit all.
I will not dispute that the tendency of BTRFS to end up in bad 
situations is a shortcoming of the filesystem code.  However, that isn't 
likely to change any time soon (fixing it is going to be a lot of work 
that will likely reduce performance for quite a few people), so there is 
absolutely no reason that people should not be trying to mitigate the 
problem.

As far as the exact command, the one I quoted has worked for at least 2 
years worth of btrfs-progs and kernels, and I think far longer than that 
(the usage and limit filters were implemented pretty early on).  I agree 
that detailed knowledge would be better, but that doesn't exactly fit 
with the concept of a FAQ in most cases, and most people really don't 
care about the details as long as it works.
> 
> On 08/01/18 23:29, Martin Raiber wrote:
>> There have been reports of (rare) corruption caused by balance (which won't
>> be detected by a scrub) here on the mailing list. So I would stay away
>> from btrfs balance unless it is absolutely needed (ENOSPC), and while it
>> is running I would try not to do anything else wrt. writes simultaneously.
> 
> This is my opinion too as a normal user, based upon reading this list 
> and my own attempts to recover from ENOSPC. I'd rather re-create the 
> filesystem from scratch, or at least make a full verified backup before 
> attempting to fix problems with balance.
While I'm generally of the same opinion (and I have a feeling most other 
people who have been server admins are too), it's not a very user 
friendly position to recommend that.  Keep in mind that many (probably 
most) users don't keep proper backups, and just targeting 'sensible' 
people as your primary audience is a bad idea.  It also needs to work at 
a basic level at least, simply because you can't always 
just nuke the volume and rebuild it from scratch.

Personally though, I don't think I've ever seen issues with balance 
corrupting data, and I don't recall seeing complaints about it either 
(though I would love to see some links that prove me wrong).

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-09 12:46             ` Austin S. Hemmelgarn
@ 2018-01-10  3:49               ` Duncan
  2018-01-10 16:30                 ` Tom Worster
  0 siblings, 1 reply; 34+ messages in thread
From: Duncan @ 2018-01-10  3:49 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Tue, 09 Jan 2018 07:46:48 -0500 as
excerpted:

>> On 08/01/18 23:29, Martin Raiber wrote:
>>> There have been reports of (rare) corruption caused by balance (which
>>> won't be detected by a scrub) here on the mailing list. So I would stay
>>> away from btrfs balance unless it is absolutely needed (ENOSPC), and
>>> while it is running I would try not to do anything else wrt. writes
>>> simultaneously.
>> 
>> This is my opinion too as a normal user, based upon reading this list
>> and my own attempts to recover from ENOSPC. I'd rather re-create the
>> filesystem from scratch, or at least make a full verified backup before
>> attempting to fix problems with balance.

> While I'm generally of the same opinion (and I have a feeling most other
> people who have been server admins are too), it's not a very user
> friendly position to recommend that.  Keep in mind that many (probably
> most) users don't keep proper backups, and just targeting 'sensible'
> people as your primary audience is a bad idea.  It also needs to work at
> a basic level at least, simply because you can't always
> just nuke the volume and rebuild it from scratch.
> 
> Personally though, I don't think I've ever seen issues with balance
> corrupting data, and I don't recall seeing complaints about it either
> (though I would love to see some links that prove me wrong).

AFAIK, such corruption reports re balance aren't really balance, per se, 
at all.

Instead, what I've seen in nearly all cases is a number of filesystem 
maintenance commands involving heavy I/O colliding, that is, being run at 
the same time, possibly because some of them are scheduled, and the admin 
didn't take into account scheduled commands when issuing others manually.

I don't believe anyone would recommend running balance, scrub, snapshot-
deletion, and backups (rsync or btrfs send/receive being the common 
ones), all at the same time, or even two or more at the same time, if for 
no other reason than because they're all IO intensive and running just 
/one/ of them at a time is hard /enough/ on the system and the 
performance of anything else running at the same time, even when all 
components are fully stable and mature (and as we all know, btrfs is 
stabilizing, but not yet fully stable and mature), yet that's what these 
sorts of reports invariably involve.

Of course, with a certainty btrfs /should/ be able to handle more than 
one of these at once without corruption, because anything else is a bug, 
but... btrfs /is/ still stabilizing and maturing, and it's precisely this 
sort of rare corner-case race-condition bugs where more than one 
extremely heavy IO filesystem maintenance command is being run at the 
same time that tend to be the last to be found and fixed, because they 
/are/ rare corner-cases, often depending on race conditions, that tend to 
be reported rarely and are then extremely difficult to duplicate, so 
that's exactly the type of bug that tends to remain around at this point.

So rather than discouraging a sane-filtered regular balance (which I'll 
discuss in a different reply), I'd suggest that the more sane 
recommendation is to be aware of other major-IO filesystem maintenance 
commands (not just btrfs commands but rsync-based backups, etc, too, 
rsync being demanding enough on its own to have triggered a number of 
btrfs bug reports and fixes over the years), including scheduled 
commands, and to only run one at a time.

IOW, don't do a balance if your scheduled backup or snapshot-deletion is 
about to kick in.  One at a time is stressful enough on the filesystem 
and hardware, don't compound the problem trying to do two or more at once!

So assuming a weekly schedule, do one a day of balance, scrub, snapshot-
deletion, backups (after ensuring that none of them takes over a day; 
balance in particular could at TiB-scale+ if not sanely filtered, 
particularly if quotas are enabled, due to the scaling issues of that 
feature).  And if any of those are scheduled daily or more frequently, 
space the scheduling appropriately and ensure they're done before 
starting the next task.

And keep in mind the scheduled tasks when running things manually, so as 
not to collide there either.
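
One way to enforce "only one at a time" mechanically is a shared lock 
file.  The following is only a sketch, assuming util-linux `flock(1)` is 
installed; the function name `run_exclusive` and the lock path are 
placeholders of mine, and the lock only helps if every scheduled and 
manual job goes through the same wrapper:

```shell
#!/usr/bin/env bash
# Shared lock file for all heavy maintenance jobs (path is an assumption).
LOCKFILE="${LOCKFILE:-/var/lock/btrfs-maint.lock}"

# Run the given command only if no other wrapped job currently holds the lock.
# flock -n fails immediately instead of queueing behind the running job.
run_exclusive() {
    flock -n "$LOCKFILE" "$@"
}

# Examples (placeholder mount point; scrub -B stays in the foreground so the
# lock is held until the scrub has actually finished):
#   run_exclusive btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4 /mnt/data
#   run_exclusive btrfs scrub start -B /mnt/data
```

Skipping a run when the lock is busy is a deliberate choice here; to wait 
for the previous task instead, `flock -w <seconds>` could replace `-n`.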

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10  3:49               ` Duncan
@ 2018-01-10 16:30                 ` Tom Worster
  2018-01-10 17:01                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 34+ messages in thread
From: Tom Worster @ 2018-01-10 16:30 UTC (permalink / raw)
  To: linux-btrfs

On 9 Jan 2018, at 22:49, Duncan wrote:

> AFAIK, such corruption reports re balance aren't really balance, per se,
> at all.
>
> Instead, what I've seen in nearly all cases is a number of filesystem
> maintenance commands involving heavy I/O colliding, that is, being run at
> the same time

I hope there is consensus on this because it might be the key to 
resolving the contradictions that appear to me in the following 
propositions that all seem plausible/reasonable:

- Depletion of unallocated space (DoUS, apologies for coining the term 
if there already is one) is a property of BTRFS even if the volume's 
capacity is more than enough for the files on it.

- To a user that isn't a BTRFS expert, DoUS can be unexpected, its 
advance can be surprisingly fast and it can become severe.

- BTRFS does not recycle allocated but unused space to the unallocated 
pool.

- Resolving severe DoUS involves either running `btrfs balance` or 
recreating the filesystem from, e.g. backups.

- People have reported that `btrfs balance` sometimes causes filesystem 
corruption.

- Some experienced users say that, to resolve a problem with DoUS, they 
would rather recreate the filesystem than run balance.

- Some experienced users say you should stop all other use of the 
filesystem while running balance.

- Some experts recommend running balance regularly, even once a day, to 
prevent DoUS.

Without some satisfactory way to resolve the contradictions, I'm not 
sure how to proceed. For example, I'm not willing to offload the 
workload from each filesystem once a day for prophylactic balance. And 
I'm not going to let balance run unattended if those more experienced 
than me say it's known to corrupt filesystems. The best I can do is 
monitor DoUS and respond ad hoc. Or I can use a different fs type.
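
Monitoring DoUS could look roughly like the sketch below.  It assumes 
that `btrfs filesystem usage -b <path>` prints a line of the form 
`Device unallocated: <bytes>`; the function names and the threshold are 
placeholders, not part of btrfs-progs:

```shell
#!/usr/bin/env bash
# Read `btrfs filesystem usage -b` output on stdin and print the unallocated byte count.
unallocated_bytes() {
    awk '/Device unallocated:/ { print $3; exit }'
}

# Exit 0 when the unallocated space read from stdin is below the threshold (bytes).
# Exits non-zero if the line is missing or the value is at or above the threshold.
low_on_unallocated() {
    local threshold="$1" bytes
    bytes="$(unallocated_bytes)"
    [ -n "$bytes" ] && [ "$bytes" -lt "$threshold" ]
}

# Example with an arbitrary 4 GiB threshold and a placeholder mount point:
#   btrfs filesystem usage -b /mnt/data \
#       | low_on_unallocated $((4 * 1024 * 1024 * 1024)) \
#       && echo "unallocated space is getting low on /mnt/data"
```

Reading from stdin keeps the parsing separate from the privileged call, so 
it can be checked against saved output before it is wired into a cron job.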

But if Duncan is right (which, for me, is practically the same as 
consensus on the proposition) that problems with corruption while 
running balance are associated with heavy coincident IO activity, then I 
can see a reasonable way forwards. I can even see how general 
recommendations for BTRFS maintenance might develop.

Tom

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10 16:30                 ` Tom Worster
@ 2018-01-10 17:01                   ` Austin S. Hemmelgarn
  2018-01-10 18:33                     ` Tom Worster
  2018-01-11  8:51                     ` Duncan
  0 siblings, 2 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-10 17:01 UTC (permalink / raw)
  To: Tom Worster, linux-btrfs

On 2018-01-10 11:30, Tom Worster wrote:
> On 9 Jan 2018, at 22:49, Duncan wrote:
> 
>> AFAIK, such corruption reports re balance aren't really balance, per se,
>> at all.
>>
>> Instead, what I've seen in nearly all cases is a number of filesystem
>> maintenance commands involving heavy I/O colliding, that is, being run at
>> the same time
> 
> I hope there is consensus on this because it might be the key to 
> resolving the contradictions that appear to me in the following 
> propositions that all seem plausible/reasonable:
> 
> - Depletion of unallocated space (DoUS, apologies for coining the term 
> if there already is one) is a property of BTRFS even if the volume's 
> capacity is more than enough for the files on it.
Strictly speaking this particular statement is only true in that there 
are still probably bugs in the allocator.  The goal is for this to never 
be a significant problem as long as you have a reasonable amount of free 
space (reasonable being enough for at least a couple of chunks to be 
allocated).

Also, for future reference, the term we typically use is ENOSPC, as 
that's the symbolic name for the error code you get when this happens 
(or when your filesystem is just normally full), but I actually kind of 
like your name for it too, it conveys the exact condition being 
discussed in a way that should be a bit easier for non-technical types 
to understand.
> 
> - To a user that isn't a BTRFS expert, DoUS can be unexpected, its 
> advance can be surprisingly fast and it can become severe.
Absolutely correct, and actually true even for a number of BTRFS 
'experts' (no, seriously, I know of a number of cases where this caught 
'experts' (including myself) by surprise simply because they ran into a 
corner case they had never dealt with or found a bug in the allocator).
> 
> - BTRFS does not recycle allocated but unused space to the unallocated 
> pool.
Kind of.

The regular BTRFS allocator will (usually) preferentially avoid using 
blocks of free space smaller than a given size for new allocations. 
Without the 'ssd' mount option set, or when using Linux kernel version 
4.14 or newer, the minimum size is 64kB, so it's generally not too bad 
unless you regularly are dealing with lots of small files that change 
very frequently.  With the 'ssd' mount option set on Linux kernels prior 
to 4.14, the minimum size is 2MB, which tends to result in really poor 
space utilization, though it's still mostly an issue with volumes 
holding lots of small files that change frequently or see lots of small 
changes to large files.

However, this does not mean that that space will always be unused.  If 
space gets tight, BTRFS will use that previously allocated space to its 
fullest, and it will reuse it in other circumstances too.
> 
> - Resolving severe DoUS involves either running `btrfs balance` or 
> recreating the filesystem from, e.g. backups.
In most cases yes, though it is sometimes possible to resolve simply by 
dropping snapshots if you have a lot of them and then deleting some files.
> 
> - People have reported that `btrfs balance` sometimes causes filesystem 
> corruption.
As I commented, I've not heard about this specifically, and I'm inclined 
to agree with Duncan's assessment that it's probably from people running 
multiple low-level maintenance operations concurrently 
(running two or more balances at the same time is known to be able to 
cause this type of corruption, and as a result there's locking in the 
kernel to prevent you from running more than one balance at a time on a 
filesystem).
> 
> - Some experienced users say that, to resolve a problem with DoUS, they 
> would rather recreate the filesystem than run balance.
This is kind of independent of BTRFS.  A lot of seasoned system 
administrators are going to be more likely to just rebuild a broken 
filesystem from scratch if possible than repair it simply because it's 
more reliable and generally guaranteed to fix the issue.  It largely 
comes down to the mentality of the individual, and how confident they 
are that they can fix a problem in a reasonable amount of time without 
causing damage elsewhere.
> 
> - Some experienced users say you should stop all other use of the 
> filesystem while running balance.
I've never seen any evidence that this is actually needed, but it does 
make the balance operation finish faster.  Strictly speaking, it 
shouldn't be needed at all (that's part of the point of having CoW 
semantics in the filesystem, it makes it easier to handle maintenance 
on-line).
> 
> - Some experts recommend running balance regularly, even once a day, to 
> prevent DoUS.
> 
> Without some satisfactory way to resolve the contradictions, I'm not 
> sure how to proceed. For example, I'm not willing to offload the 
> workload from each filesystem once a day for prophylactic balance. And 
> I'm not going to let balance run unattended if those more experienced 
> than me say it's known to corrupt filesystems. The best I can do is 
> monitor DoUS and respond ad hoc. Or I can use a different fs type.
It may be worth seriously looking at whether you actually _need_ BTRFS 
for your use case.  In general, unless you need at least one of its 
features, and either can't get that feature with ZFS or just want to 
avoid using ZFS, you are likely better-off for the time being using 
another filesystem.

In my case for example, I _really_ want to avoid dealing with ZFS on 
Linux because of how it impacts what kernel versions I use and the fact 
that I don't trust the proprietary NVIDIA drivers to get along with it, 
and I need the checksumming and online transformation features 
(reshaping, profile conversion, device replacement, etc) of BTRFS.  If 
it weren't for all of that, I would not be using BTRFS at all.
> 
> But if Duncan is right (which, for me, is practically the same as 
> consensus on the proposition) that problems with corruption while 
> running balance are associated with heavy coincident IO activity, then I 
> can see a reasonable way forwards. I can even see how general 
> recommendations for BTRFS maintenance might develop.
As I commented above, I would tend to believe Duncan is right in this 
case (both because it makes sense, and because he seems to generally be 
right about this type of thing).  That said, I really do think that 
normal user I/O is probably not the issue, but low-level filesystem 
operations are.  That said, there is no reason that BTRFS shouldn't either:
1. Handle this just fine without causing corruption.
or:
2. Extend the mutex used to prevent concurrent balances to cover other 
operations that might cause issues (that is, make it so you can't scrub 
a filesystem while it's being balanced, or defragment it, or whatever else).


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10 17:01                   ` Austin S. Hemmelgarn
@ 2018-01-10 18:33                     ` Tom Worster
  2018-01-10 20:44                       ` Timofey Titovets
  2018-01-11  8:51                     ` Duncan
  1 sibling, 1 reply; 34+ messages in thread
From: Tom Worster @ 2018-01-10 18:33 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

On 10 Jan 2018, at 12:01, Austin S. Hemmelgarn wrote:

> On 2018-01-10 11:30, Tom Worster wrote:
>
> Also, for future reference, the term we typically use is ENOSPC, as 
> that's the symbolic name for the error code you get when this happens 
> (or when your filesystem is just normally full), but I actually kind 
> of like your name for it too, it conveys the exact condition being 
> discussed in a way that should be a bit easier for non-technical types 
> to understand.

Iiuc, ENOSPC is _exhaustion_ of unallocated space, which is a specific 
case of depletion.

I sought a term to refer to the phenomenon of unallocated space 
shrinking beyond what filesystem use would demand and how it ratchets 
down. Hence a sysop needs to manage DoUS. ENOSPC is likely a failure of 
such management.


>> - Some experienced users say that, to resolve a problem with DoUS, 
>> they would rather recreate the filesystem than run balance.
> This is kind of independent of BTRFS.

Yes. I mentioned it only because it was, to me, a striking statement of 
lack of confidence in balance.


>> But if Duncan is right (which, for me, is practically the same as 
>> consensus on the proposition) that problems with corruption while 
>> running balance are associated with heavy coincident IO activity, 
>> then I can see a reasonable way forwards. I can even see how general 
>> recommendations for BTRFS maintenance might develop.
> As I commented above, I would tend to believe Duncan is right in this 
> case (both because it makes sense, and because he seems to generally 
> be right about this type of thing).  That said, I really do think that 
> normal user I/O is probably not the issue, but low-level filesystem 
> operations are.  That said, there is no reason that BTRFS shouldn't 
> either:
> 1. Handle this just fine without causing corruption.
> or:
> 2. Extend the mutex used to prevent concurrent balances to cover other 
> operations that might cause issues (that is, make it so you can't 
> scrub a filesystem while it's being balanced, or defragment it, or 
> whatever else).

Yes, but backtracking a bit, I think there's another really important 
point here. Assuming Duncan's right, it's not so hard to develop 
guidelines for general BTRFS management that include DoUS among other 
topics. Duncan's other email today contains or implies quite a lot of 
those guidelines.

Or, to put it another way, it's enough for me. I think I know what to do 
now. And that much could be written down for the benefit of others.

Tom



* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10 18:33                     ` Tom Worster
@ 2018-01-10 20:44                       ` Timofey Titovets
  2018-01-11 13:00                         ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 34+ messages in thread
From: Timofey Titovets @ 2018-01-10 20:44 UTC (permalink / raw)
  To: Tom Worster; +Cc: Austin S. Hemmelgarn, linux-btrfs

2018-01-10 21:33 GMT+03:00 Tom Worster <fsb@thefsb.org>:
> On 10 Jan 2018, at 12:01, Austin S. Hemmelgarn wrote:
>
>> On 2018-01-10 11:30, Tom Worster wrote:
>>
>> Also, for future reference, the term we typically use is ENOSPC, as that's
>> the symbolic name for the error code you get when this happens (or when your
>> filesystem is just normally full), but I actually kind of like your name for
>> it too, it conveys the exact condition being discussed in a way that should
>> be a bit easier for non-technical types to understand.
>
>
> Iiuc, ENOSPC is _exhaustion_ of unallocated space, which is a specific case
> of depletion.
>
> I sought a term to refer to the phenomenon of unallocated space shrinking
> beyond what filesystem use would demand and how it ratchets down. Hence a
> sysop needs to manage DoUS. ENOSPC is likely a failure of such management.
>
>
>>> - Some experienced users say that, to resolve a problem with DoUS, they
>>> would rather recreate the filesystem than run balance.
>>
>> This is kind of independent of BTRFS.
>
>
> Yes. I mentioned it only because it was, to me, a striking statement of lack
> of confidence in balance.
>
>
>>> But if Duncan is right (which, for me, is practically the same as
>>> consensus on the proposition) that problems with corruption while running
>>> balance are associated with heavy coincident IO activity, then I can see a
>>> reasonable way forwards. I can even see how general recommendations for
>>> BTRFS maintenance might develop.
>>
>> As I commented above, I would tend to believe Duncan is right in this case
>> (both because it makes sense, and because he seems to generally be right
>> about this type of thing).  That said, I really do think that normal user
>> I/O is probably not the issue, but low-level filesystem operations are.
>> That said, there is no reason that BTRFS shouldn't either:
>> 1. Handle this just fine without causing corruption.
>> or:
>> 2. Extend the mutex used to prevent concurrent balances to cover other
>> operations that might cause issues (that is, make it so you can't scrub a
>> filesystem while it's being balanced, or defragment it, or whatever else).
>
>
> Yes, but backtracking a bit, I think there's another really important point
> here. Assuming Duncan's right, it's not so hard to develop guidelines for
> general BTRFS management that include DoUS among other topics. Duncan's
> other email today contains or implies quite a lot of those guidelines.
>
> Or, to put it another way, it's enough for me. I think I know what to do
> now. And that much could be written down for the benefit of others.
>
> Tom
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

My two cents:
I have about 50 different systems
(VCS systems, MySQL DBs, web servers, Elasticsearch nodes, etc.).
All of them run btrfs only and run fine, even with automatic snapshot
rotation on some of them
(btrfs makes my life easier and I like it).

Most of them are small VMs, from 3GiB to 512GiB (I use compression everywhere).
None of them needs balance; the only thing I take care of is
always keeping some unallocated space on them.

Most of them have settled at some used/allocated/unallocated ratio.

I.e., as I understand it from this conversation,
we run balance to reallocate data -> make more unallocated space,
but if someone already has plenty of it, that is useless, no?

E.g. I have 60% allocated by data/metadata chunks on my notebook,
and only 40% is really used by data; even when I had 90% allocated
and 85% used, I didn't run into ENOSPC problems (256GiB SSD).

And if I run balance, I run it only to work around the btrfs discard
processing bug, which leads to trimming only unallocated space
(probably fixed already).

So if we talk about running balance "regularly", maybe it makes sense
to check free space first, i.e. only if the system has some percentage
of space allocated, like 80%, and has plenty of allocated-but-unused
space, will balance be needed, no?
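
As a rough illustration of that check, here is a minimal Python sketch. 
The function name and the 20% slack threshold are assumptions made up 
for the example (only the 80% figure comes from the text above), and 
the byte counts would have to come from something like `btrfs 
filesystem usage`, which is not parsed here.

```python
def balance_seems_worthwhile(total_bytes, allocated_bytes, used_bytes,
                             allocated_threshold=0.80, slack_threshold=0.20):
    """Return True only when the device is mostly allocated AND a large
    part of that allocation is unused.

    Hypothetical helper: the 80% allocation threshold is taken from the
    message, the 20% slack threshold is an assumption for illustration.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    allocated_ratio = allocated_bytes / total_bytes
    # Allocated-but-unused space, as a fraction of the whole device.
    slack_ratio = (allocated_bytes - used_bytes) / total_bytes
    return (allocated_ratio >= allocated_threshold
            and slack_ratio >= slack_threshold)


GiB = 1024 ** 3
# Roughly 60% allocated, 40% used: plenty of unallocated space left.
print(balance_seems_worthwhile(256 * GiB, 154 * GiB, 102 * GiB))
# Roughly 90% allocated, 85% used: very little slack to reclaim.
print(balance_seems_worthwhile(256 * GiB, 230 * GiB, 218 * GiB))
# Roughly 90% allocated, 50% used: a balance could free chunks.
print(balance_seems_worthwhile(256 * GiB, 230 * GiB, 128 * GiB))
```

Only the last case passes both conditions, which matches the idea that 
balance is pointless while unallocated space is plentiful.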

(I'm not saying that btrfs has no problems; I see some rare hateful bugs
on some systems, but most of them are internal btrfs problems
or problems with how btrfs cooperates with applications.)

Thanks.
-- 
Have a nice day,
Timofey.


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10 20:44                       ` Timofey Titovets
@ 2018-01-11 13:00                         ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-11 13:00 UTC (permalink / raw)
  To: Timofey Titovets, Tom Worster; +Cc: linux-btrfs

On 2018-01-10 15:44, Timofey Titovets wrote:
> 2018-01-10 21:33 GMT+03:00 Tom Worster <fsb@thefsb.org>:
>> On 10 Jan 2018, at 12:01, Austin S. Hemmelgarn wrote:
>>
>>> On 2018-01-10 11:30, Tom Worster wrote:
>>>
>>> Also, for future reference, the term we typically use is ENOSPC, as that's
>>> the symbolic name for the error code you get when this happens (or when your
>>> filesystem is just normally full), but I actually kind of like your name for
>>> it too, it conveys the exact condition being discussed in a way that should
>>> be a bit easier for non-technical types to understand.
>>
>>
>> Iiuc, ENOSPC is _exhaustion_ of unallocated space, which is a specific case
>> of depletion.
>>
>> I sought a term to refer to the phenomenon of unallocated space shrinking
>> beyond what filesystem use would demand and how it ratchets down. Hence a
>> sysop needs to manage DoUS. ENOSPC is likely a failure of such management.
>>
>>
>>>> - Some experienced users say that, to resolve a problem with DoUS, they
>>>> would rather recreate the filesystem than run balance.
>>>
>>> This is kind of independent of BTRFS.
>>
>>
>> Yes. I mentioned it only because it was, to me, a striking statement of lack
>> of confidence in balance.
>>
>>
>>>> But if Duncan is right (which, for me, is practically the same as
>>>> consensus on the proposition) that problems with corruption while running
>>>> balance are associated with heavy coincident IO activity, then I can see a
>>>> reasonable way forwards. I can even see how general recommendations for
>>>> BTRFS maintenance might develop.
>>>
>>> As I commented above, I would tend to believe Duncan is right in this case
>>> (both because it makes sense, and because he seems to generally be right
>>> about this type of thing).  That said, I really do think that normal user
>>> I/O is probably not the issue, but low-level filesystem operations are.
>>> That said, there is no reason that BTRFS shouldn't either:
>>> 1. Handle this just fine without causing corruption.
>>> or:
>>> 2. Extend the mutex used to prevent concurrent balances to cover other
>>> operations that might cause issues (that is, make it so you can't scrub a
>>> filesystem while it's being balanced, or defragment it, or whatever else).
>>
>>
>> Yes, but backtracking a bit, I think there's another really important point
>> here. Assuming Duncan's right, it's not so hard to develop guidelines for
>> general BTRFS management that include DoUS among other topics. Duncan's
>> other email today contains or implies quite a lot of those guidelines.
>>
>> Or, to put it another way, it's enough for me. I think I know what to do
>> now. And that much could be written down for the benefit of others.
>>
>> Tom
>>
>>
> 
> My two cents:
> I have about 50 different systems
> (VCS systems, MySQL DBs, web servers, Elasticsearch nodes, etc.).
> All of them run btrfs only and run fine, even with automatic snapshot
> rotation on some of them
> (btrfs makes my life easier and I like it).
> 
> Most of them are small VMs, from 3GiB to 512GiB (I use compression everywhere).
> None of them needs balance; the only thing I take care of is
> always keeping some unallocated space on them.
> 
> Most of them have settled at some used/allocated/unallocated ratio.
> 
> I.e., as I understand it from this conversation,
> we run balance to reallocate data -> make more unallocated space,
> but if someone already has plenty of it, that is useless, no?
Not exactly.  In terms of reactive, after-the-fact type maintenance, 
that's most of what it's used for.  For preventative maintenance (which 
is what the recommendations I'm trying to work out are about), it can be 
used to help avoid ending up in such situations in the first place.

IOW, balancing in small increments on a regular basis can be used as a 
prophylactic measure to help keep things from getting into a state where 
you have a bunch of free space in one type of chunk, but need more space 
in another type of chunk and can't allocate any of that second type of 
chunk (which is the most common case of ENOSPC problems, df and 
statvfs() both show lots of free space, but certain VFS ops reliably 
return ENOSPC).  While it's unlikely to end up in such a state if you 
keep a reasonable amount of space unallocated, it is still possible, and 
even aside from that balancing can help keep the load evenly distributed 
in a multi-device volume.
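
To make that failure mode concrete, here is a toy model in Python.  It 
is only a sketch: `try_write`, the use of dictionary keys for the two 
chunk types, and the single `chunk_size` shared by both types are all 
assumptions for illustration, and this is not how the real allocator is 
implemented.

```python
import errno


def try_write(kind, size, chunk_free, unallocated, chunk_size):
    """Toy model of chunk-based allocation (NOT the real btrfs allocator).

    chunk_free maps a chunk type ("data" or "metadata") to the free bytes
    inside chunks already allocated to that type.  Space in one type can
    never serve the other; a new chunk can only come from unallocated.
    Returns the new (chunk_free, unallocated) pair on success and raises
    OSError(ENOSPC) when no chunk of the needed type can be allocated.
    """
    free = dict(chunk_free)
    while free[kind] < size:
        if unallocated < chunk_size:
            raise OSError(errno.ENOSPC, "no room for a new %s chunk" % kind)
        unallocated -= chunk_size
        free[kind] += chunk_size
    free[kind] -= size
    return free, unallocated


GiB = 1024 ** 3
# 50 GiB free inside data chunks, but nothing unallocated: a small
# metadata write still fails even though "free space" looks plentiful.
state = {"data": 50 * GiB, "metadata": 0}
try:
    try_write("metadata", 16 * 1024, state, unallocated=0, chunk_size=GiB)
except OSError as exc:
    print("metadata write failed:", errno.errorcode[exc.errno])

# If a balance had compacted the data chunks and returned 40 GiB to the
# unallocated pool beforehand, the same write would succeed.
state = {"data": 10 * GiB, "metadata": 0}
free, unallocated = try_write("metadata", 16 * 1024, state,
                              unallocated=40 * GiB, chunk_size=GiB)
print("metadata write ok, unallocated GiB left:", unallocated // GiB)
```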
> 
> E.g. I have 60% allocated by data/metadata chunks on my notebook,
> and only 40% is really used by data; even when I had 90% allocated
> and 85% used, I didn't run into ENOSPC problems (256GiB SSD).
> 
> And if I run balance, I run it only to work around the btrfs discard
> processing bug, which leads to trimming only unallocated space
> (probably fixed already).
Yes, it is fixed in mainline, though I forget what kernel version the 
fix went into (I think 4.9 and newer have it, but I'm not sure).
> 
> So if we talk about running balance "regularly", maybe it makes sense
> to check free space first, i.e. only if the system has some percentage
> of space allocated, like 80%, and has plenty of allocated-but-unused
> space, will balance be needed, no?
> 
> (I'm not saying that btrfs has no problems; I see some rare hateful bugs
> on some systems, but most of them are internal btrfs problems
> or problems with how btrfs cooperates with applications.)


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10 17:01                   ` Austin S. Hemmelgarn
  2018-01-10 18:33                     ` Tom Worster
@ 2018-01-11  8:51                     ` Duncan
  1 sibling, 0 replies; 34+ messages in thread
From: Duncan @ 2018-01-11  8:51 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Wed, 10 Jan 2018 12:01:42 -0500 as
excerpted:

>> - Some experienced users say that, to resolve a problem with DoUS, they
>> would rather recreate the filesystem than run balance.

> This is kind of independent of BTRFS.  A lot of seasoned system
> administrators are going to be more likely to just rebuild a broken
> filesystem from scratch if possible than repair it simply because it's
> more reliable and generally guaranteed to fix the issue.  It largely
> comes down to the mentality of the individual, and how confident they
> are that they can fix a problem in a reasonable amount of time without
> causing damage elsewhere.

Specific to this one...

I'm known around here for harping on the backup point (hold on, I'll 
explain how that ties in).  A/the sysadmin's first rule of backups: The 
(true) value of your data is defined not by any arbitrary claims, but by 
how many backups of that data you consider it worth having.  No backups 
defines the data as of only trivial value, worth less than the time/
trouble/resources necessary to make that backup.

It therefore follows that in the event of data mishap, a sysadmin can 
always rest happy, because regardless of what might have been lost, what 
their actions defined as of *MOST* value, either the data if it was backed up, 
or the time/trouble/resources that would have otherwise gone into that 
backup if not, was *ALWAYS* saved.

Understanding that puts an entirely different spin on backups and data 
mishaps, taking a lot of the pressure off when things /do/ go wrong, 
because one understands that the /true/ value of that data was defined 
long before, and now we're simply dealing with the results of our 
decision to define it that way, only playing out the story we set up for 
ourselves long before.

But how does that apply to the current discussion?

Simply this way:  For someone understanding the above, repair is never a 
huge problem or priority, because the data was either of such trivial 
value as to make it no big deal, or there were backups, thus making this 
particular instance of the data, and the necessity of repair, no big deal.

Once /that/ is understood, the question of repair vs. rebuild from 
scratch (or even simply fail-over to the hot-spare and send the old 
filesystem component devices to be tested for reuse or recycle) becomes 
purely one of efficiency, and the answer ends up being pretty 
predictable, because rebuild from scratch and restore from backup should 
be near 100% reliable on a reasonable/predictable time frame, vs. 
/attempting/ a repair with unknown likelihood of success and a much 
/less/ predictable time frame, especially since there's a non-trivial 
chance one will have to fall back to the rebuild from scratch and backups 
method anyway, after repair attempts fail.

Once one is thinking in those terms and already has backups accordingly, 
even for home or other one-off systems where actual formatting and 
restore from backups is going to be manual and thus will take longer than 
a trivial fix, the practical limits on the extents to which one is 
willing to go to get a fix are pretty narrow, and while one might try a 
couple fixes if they're easy and quick enough, beyond that it very 
quickly becomes restore from backups time if the data was considered 
valuable enough to be worth making them, or simply throw it away and 
start over if the data wasn't considered valuable enough to be worth 
making a backup in the first place.

So it's really independent of btrfs and not reflective on the reliability 
of balance, etc, at all.  It's simply a reflection of understanding the 
realities of possible repair... or not and having to replace anyway... 
without a good estimate on the time required either way... vs. a (near) 
100% guaranteed fix and back in business, in a relatively tightly 
predictable timeframe.  Couple that with the possibility that a repair 
may leave other problems latent and ready to be exposed later, while 
starting over from scratch gives you a "clean starting point", and it's 
pretty much a no-brainer, regardless of the filesystem... or whatever 
else (hardware, software layers other than the filesystem) may be in use.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 18:17     ` Graham Cobb
  2018-01-08 18:34       ` Austin S. Hemmelgarn
@ 2018-01-10  4:38       ` Duncan
  2018-01-10 12:41         ` Austin S. Hemmelgarn
  2018-01-11 20:12         ` Hans van Kranenburg
  1 sibling, 2 replies; 34+ messages in thread
From: Duncan @ 2018-01-10  4:38 UTC (permalink / raw)
  To: linux-btrfs

Graham Cobb posted on Mon, 08 Jan 2018 18:17:13 +0000 as excerpted:

> On 08/01/18 16:34, Austin S. Hemmelgarn wrote:
>> Ideally, I think it should be as generic as reasonably possible,
>> possibly something along the lines of:
>> 
>> A: While not strictly necessary, running regular filtered balances (for
>> example `btrfs balance start -dusage=50 -dlimit=2 -musage=50
>> -mlimit=4`,
>> see `man btrfs-balance` for more info on what the options mean) can
>> help keep a volume healthy by mitigating the things that typically
>> cause ENOSPC errors.  Full balances by contrast are long and expensive
>> operations, and should be done only as a last resort.
> 
> That recommendation is similar to what I do and it works well for my use
> case. I would recommend it to anyone with my usage, but cannot say how
> well it would work for other uses. In my case, I run balances like that
> once a week: some weeks nothing happens, other weeks 5 or 10 blocks may
> get moved.

Why 50% usage, and why the rather low limits?

OK, so it rarely makes sense to go over 50% usage when the intent of the 
balance is to return chunks to the unallocated pool, because at 50% the 
payback ratio is one free chunk for two processed and it gets worse after 
that and MUCH worse after ~67-75%, where the ratios are 1:3 and 1:4 
respectively, but why so high especially for a suggested scheduled/
routine command?

I'd suggest a rather lower usage value, say 20/25/34%, for favorable 
payback ratios of 5:1, 4:1, and 3:1.  That should be reasonable for a 
generic recommendation for scheduled/routine balances.  If that's not 
enough, people can do more manually or increase the values from the 
generic recommendation for their specific use-case.
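
The arithmetic behind those ratios can be sketched in a few lines of 
Python.  This is an idealised model, not a measurement: it assumes 
every processed chunk is exactly at the usage threshold and that data 
repacks perfectly, and the function name `payback` is made up for the 
example.

```python
def payback(usage_percent):
    """Idealised payback for chunks that are all exactly usage_percent full.

    Returns (processed_per_freed, processed_per_written): how many chunks
    must be rewritten to free one chunk, and how many old chunks collapse
    into one newly written chunk.
    """
    if not 0 < usage_percent < 100:
        raise ValueError("usage_percent must be between 0 and 100 (exclusive)")
    usage = usage_percent / 100
    return 1 / (1 - usage), 1 / usage


for pct in (20, 25, 34, 50, 67, 75):
    per_freed, per_written = payback(pct)
    print(f"usage={pct}%: {per_freed:.2f} processed per chunk freed, "
          f"{per_written:.2f} old chunks per chunk written")
```

At 50% both numbers are 2; above that, the work per chunk freed climbs 
quickly (about 3 at 67%, 4 at 75%), while at 20/25/34% roughly 5, 4 and 
3 old chunks collapse into each chunk written.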

And I'd suggest either no limits or (for kernels that can handle it, 
4.4+, which at this point is everything within our recommended support 
range of the last two LTSs, thus now 4.9 earliest, anyway) range-limits, 
say 2..20, so it won't bother if there's less than enough to clear at 
least one chunk within the usage target (but see the observed behavior 
change noted below), but will do more than the low 2-4 in the above 
suggested limits if there is.  With the lower usage= values, processing 
should take less time per chunk, and if there's no more that fit the 
usage filter it won't use the higher range anyway, so the limit can and 
should be higher.
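
For illustration, here is a small Python sketch that builds (but does 
not run) such a command with a usage filter and a 2..20 range limit on 
both data and metadata.  The helper name, the default of 25% and the 
mount point `/mnt/example` are assumptions for the example; the 
comma-separated `usage=...,limit=MIN..MAX` filter form is as described 
in `man btrfs-balance`, and the range form needs a 4.4+ kernel as noted 
above.

```python
import shlex


def filtered_balance_argv(mountpoint, usage=25, limit_min=2, limit_max=20):
    """Build the argument list for a usage- and limit-filtered balance.

    Hypothetical helper: it only assembles the command line, it never
    executes anything.
    """
    if not 0 <= usage <= 100:
        raise ValueError("usage must be a percentage between 0 and 100")
    if not 0 < limit_min <= limit_max:
        raise ValueError("need 0 < limit_min <= limit_max")
    # Same filter set for data (-d) and metadata (-m) chunks.
    filters = f"usage={usage},limit={limit_min}..{limit_max}"
    return ["btrfs", "balance", "start",
            f"-d{filters}", f"-m{filters}", mountpoint]


argv = filtered_balance_argv("/mnt/example")
print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list can be handed to `subprocess.run()` as root once the 
values have been reviewed for the filesystem in question.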

Meanwhile, for any recommendation of balance, I'd suggest also mentioning 
the negative effect that enabled quotas have on balance times, probably 
with a link to a fuller discussion where I'd suggest disabling them due 
to the scaling issues if the use-case doesn't require them, and if that's 
not possible due to the use-case, to at least consider temporarily 
disabling quotas before doing a balance so as to speed it up, after which 
they can be enabled again.  (I'm not sure if a manual quota rescan is 
required to update them at that point, or not.  I don't use quotas here 
or I'd test.)

And an additional observation...

I'm on ssd here and run many rather small independent btrfs instead of 
fewer larger ones, so I'm used to keeping an eye on usage, tho I've never 
found the need to schedule balances, partly because on ssd with 
relatively small btrfs, balances are fast enough they're not a problem to 
do "while I wait".

And I've definitely noticed an effect since the ssd option stopped using 
the 2 MiB spreading algorithm in 4.14.  In particular, while chunk usage 
was generally stable before that and I only occasionally needed to run 
balance to clear out empty chunks, now, balance with the usage filter 
will apparently actively fill in empty space in existing chunks, so while 
previously a usage-filtered balance that only rewrote one chunk didn't 
actually free anything, simply allocating a new chunk to replace the one 
it freed, so at least two chunks needed to be rewritten to actually free 
space back to unallocated...

Now, usage-filtered rewrites of only a single chunk routinely free the 
allocated space, because it writes that small bit of data in the freed 
chunk into existing free space in other chunks.

At least I /presume/ that new balance-usage behavior is due to the ssd 
changes.  Maybe it's due to other patches.  Either way, it's an 
interesting and useful change. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10  4:38       ` Duncan
@ 2018-01-10 12:41         ` Austin S. Hemmelgarn
  2018-01-11 20:12         ` Hans van Kranenburg
  1 sibling, 0 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-10 12:41 UTC (permalink / raw)
  To: linux-btrfs

On 2018-01-09 23:38, Duncan wrote:
> Graham Cobb posted on Mon, 08 Jan 2018 18:17:13 +0000 as excerpted:
> 
>> On 08/01/18 16:34, Austin S. Hemmelgarn wrote:
>>> Ideally, I think it should be as generic as reasonably possible,
>>> possibly something along the lines of:
>>>
>>> A: While not strictly necessary, running regular filtered balances (for
>>> example `btrfs balance start -dusage=50 -dlimit=2 -musage=50
>>> -mlimit=4`,
>>> see `man btrfs-balance` for more info on what the options mean) can
>>> help keep a volume healthy by mitigating the things that typically
>>> cause ENOSPC errors.  Full balances by contrast are long and expensive
>>> operations, and should be done only as a last resort.
>>
>> That recommendation is similar to what I do and it works well for my use
>> case. I would recommend it to anyone with my usage, but cannot say how
>> well it would work for other uses. In my case, I run balances like that
>> once a week: some weeks nothing happens, other weeks 5 or 10 blocks may
>> get moved.
> 
> 
> Why 50% usage, and why the rather low limits?
> 
> OK, so it rarely makes sense to go over 50% usage when the intent of the
> balance is to return chunks to the unallocated pool, because at 50% the
> payback ratio is one free chunk for two processed and it gets worse after
> that and MUCH worse after ~67-75%, where the ratios are 1:3 and 1:4
> respectively, but why so high especially for a suggested scheduled/
> routine command?
Largely because that's what I use myself, and I know it works reliably. 
In my case, I use a large number of small filesystems, don't delete very 
large amounts of data very often, and run the command daily, so it's not 
very likely that a large number of chunks are going to be below half 
full, and therefore it made sense for me to just limit it to a small 
number of half full chunks so that it completes quickly.
> 
> I'd suggest a rather lower usage value, say 20/25/34%, for favorable
> payback ratios of 5:1, 4:1, and 3:1.  That should be reasonable for a
> generic recommendation for scheduled/routine balances.  If that's not
> enough, people can do more manually or increase the values from the
> generic recommendation for their specific use-case.
That's probably a good idea, though I'd likely go for about 25% as a 
generic recommendation (much lower, and you're not likely to process any 
chunks at all most of the time since BTRFS will back-fill things, much 
higher and the ratio becomes rather unfavorable).
> 
> And I'd suggest either no limits or (for kernels that can handle it,
> 4.4+, which at this point is everything within our recommended support
> range of the last two LTSs, thus now 4.9 earliest, anyway) range-limits,
> say 2..20, so it won't bother if there's less than enough to clear at
> least one chunk within the usage target (but see the observed behavior
> change noted below), but will do more than the low 2-4 in the above
> suggested limits if there is.  With the lower usage= values, processing
> should take less time per chunk, and if there's no more that fit the
> usage filter it won't use the higher range anyway, so the limit can and
> should be higher.
Good point on the limits too, though I would say that we should probably 
comment specifically on the fact that you need 4.4 or newer for the 
range support (there are still people dealing with much older kernels 
out there, think of embedded life-cycles for example).
> 
> 
> Meanwhile, for any recommendation of balance, I'd suggest also mentioning
> the negative effect that enabled quotas have on balance times, probably
> with a link to a fuller discussion where I'd suggest disabling them due
> to the scaling issues if the use-case doesn't require them, and if that's
> not possible due to the use-case, to at least consider temporarily
> disabling quotas before doing a balance so as to speed it up, after which
> they can be enabled again.  (I'm not sure if a manual quota rescan is
> required to update them at that point, or not.  I don't use quotas here
> or I'd test.)
Also a good point!
> 
> 
> And an additional observation...
> 
> I'm on ssd here and run many rather small independent btrfs instead of
> fewer larger ones, so I'm used to keeping an eye on usage, tho I've never
> found the need to schedule balances, partly because on ssd with
> relatively small btrfs, balances are fast enough they're not a problem to
> do "while I wait".
In my case, they're pretty darn fast too, I just don't like having to 
remember to run them by hand (that is the main appeal for automation 
after all).
> 
> And I've definitely noticed an effect since the ssd option stopped using
> the 2 MiB spreading algorithm in 4.14.  In particular, while chunk usage
> was generally stable before that and I only occasionally needed to run
> balance to clear out empty chunks, now, balance with the usage filter
> will apparently actively fill in empty space in existing chunks, so while
> previously a usage-filtered balance that only rewrote one chunk didn't
> actually free anything, simply allocating a new chunk to replace the one
> it freed, so at least two chunks needed to be rewritten to actually free
> space back to unallocated...
> 
> Now, usage-filtered rewrites of only a single chunk routinely free the
> allocated space, because it writes that small bit of data in the freed
> chunk into existing free space in other chunks.
> 
> At least I /presume/ that new balance-usage behavior is due to the ssd
> changes.  Maybe it's due to other patches.  Either way, it's an
> interesting and useful change. =:^)
I'm pretty sure it's due to the 'ssd' option change.  The way it was 
coded previously made the allocator rather averse to back-filling free 
space, and balance just sends stuff back through the allocator again 
(other than the filtering, that is quite literally all it does), so a 
change to the allocator's behavior will change balance behavior too. 
Regardless, this is also a good point that should probably be added to 
the FAQ.  Given this, it might also be worth recommending that people 
with SSDs who upgraded to 4.14 should run a much more aggressive 
filtered balance (I'm thinking 50% usage and no limit filter) to repack 
things a bit more efficiently.
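
The allocator point above can be illustrated with a toy model. This is a minimal Python sketch, not btrfs internals: the chunk capacity, the `backfill` flag and the function name are all made up for illustration. It only shows why sending one chunk's data back through the allocator either frees that chunk or merely replaces it with a new one.

```python
def balance_one_chunk(used, capacity, victim, backfill):
    """Relocate the data of chunk `victim`; return the new per-chunk usage list.

    used     -- used units in each allocated chunk (toy numbers)
    backfill -- True: the allocator reuses free space in existing chunks;
                False: relocated data always lands in a fresh chunk
    """
    data = used[victim]
    others = [u for i, u in enumerate(used) if i != victim]
    if backfill:
        # Pour the relocated data into free space of the remaining chunks.
        for i in range(len(others)):
            moved = min(capacity - others[i], data)
            others[i] += moved
            data -= moved
    if data > 0:
        # Whatever did not fit ends up in a newly allocated chunk.
        others.append(data)
    return others


chunks = [90, 60, 10]  # three allocated chunks, capacity 100 units each
print(len(balance_one_chunk(chunks, 100, victim=2, backfill=False)))
print(len(balance_one_chunk(chunks, 100, victim=2, backfill=True)))
```

Without back-filling the chunk count stays the same after rewriting a single chunk; with it, the rewritten chunk's space goes back to unallocated.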

Overall, I'm starting to think that the best option here is to update 
the FAQ entry, and then have netdata's help text point to the FAQ entry 
instead of trying to contain the same info.


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10  4:38       ` Duncan
  2018-01-10 12:41         ` Austin S. Hemmelgarn
@ 2018-01-11 20:12         ` Hans van Kranenburg
  1 sibling, 0 replies; 34+ messages in thread
From: Hans van Kranenburg @ 2018-01-11 20:12 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 01/10/2018 05:38 AM, Duncan wrote:
> [...]
> 
> And I've definitely noticed an effect since the ssd option stopped using 
> the 2 MiB spreading algorithm in 4.14.

Glad to hear. :-)

> In particular, while chunk usage 
> was generally stable before that and I only occasionally needed to run 
> balance to clear out empty chunks, now, balance with the usage filter 
> will apparently actively fill in empty space in existing chunks, so while 
> previously a usage-filtered balance that only rewrote one chunk didn't 
> actually free anything, simply allocating a new chunk to replace the one 
> it freed, so at least two chunks needed rewritten to actually free space 
> back to unallocated...
> 
> Now, usage-filtered rewrites of only a single chunk routinely frees the 
> allocated space, because it writes that small bit of data in the freed 
> chunk into existing free space in other chunks.

And that back-filling the existing chunks indeed also means a decrease
in total work that needs to be done. But this probably also means that
the free space gaps you have/had were rather small so they got ignored
in the past. Large free gaps would always get data written in them by
balance already. It probably also means that now that you're on 4.14,
far fewer of these small free space gaps will be left behind, because
they're immediately reused by new small writes.

> At least I /presume/ that new balance-usage behavior is due to the ssd 
> changes.

Most probably, yes.

>  Maybe it's due to other patches.  Either way, it's an 
> interesting and useful change. =:^)

-- 
Hans van Kranenburg

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 15:55 Recommendations for balancing as part of regular maintenance? Austin S. Hemmelgarn
  2018-01-08 16:20 ` ein
@ 2018-01-10 21:37 ` waxhead
  2018-01-11 12:50   ` Austin S. Hemmelgarn
  2018-01-11 19:56   ` Hans van Kranenburg
  2018-01-12 18:24 ` Austin S. Hemmelgarn
  2 siblings, 2 replies; 34+ messages in thread
From: waxhead @ 2018-01-10 21:37 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Btrfs BTRFS

Austin S. Hemmelgarn wrote:
> So, for a while now I've been recommending small filtered balances to
> people as part of regular maintenance for BTRFS filesystems under the
> logic that it does help in some cases and can't really hurt (and if done
> right, is really inexpensive in terms of resources).  This ended up
> integrated partially in the info text next to the BTRFS charts on
> netdata's dashboard, and someone has now pointed out (correctly I might
> add) that this is at odds with the BTRFS FAQ entry on balances.
>
> For reference, here's the bit about it in netdata:
>
> You can keep your volume healthy by running the `btrfs balance` command
> on it regularly (check `man btrfs-balance` for more info).
>
>
> And here's the FAQ entry:
>
> Q: Do I need to run a balance regularly?
>
> A: In general usage, no. A full unfiltered balance typically takes a
> long time, and will rewrite huge amounts of data unnecessarily. You may
> wish to run a balance on metadata only (see Balance_Filters) if you find
> you have very large amounts of metadata space allocated but unused, but
> this should be a last resort.
>
>
> I've commented in the issue in netdata's issue tracker that I feel that
> the FAQ entry could be better worded (strictly speaking, you don't
> _need_ to run balances regularly, but it's usually a good idea). Looking
> at both though, I think they could probably both be improved, but I
> would like to get some input here on what people actually think the best
> current practices are regarding this (and ideally why they feel that
> way) before I go and change anything.
>
> So, on that note, how does anybody else out there feel about this?  Is
> balancing regularly with filters restricting things to small numbers of
> mostly empty chunks a good thing for regular maintenance or not?
> --
As just a regular user I would think that the first thing you would need 
is an analysis tool that can tell you whether it is a good idea to 
balance in the first place.

Scrub seems like a great place to start - e.g. scrub could auto-analyze 
and report back whether a balance is needed. I also think that scrub 
should optionally autobalance if needed.

Balance may not be needed, but if one can determine that balancing would 
speed things up a bit, I don't see why it can't optionally be 
scheduled automatically. Ideally there should be a "scrub and polish" 
option that would scrub, balance and perhaps even defragment in one go.

In fact, the way I see it, btrfs should ideally keep track of each 
data/metadata chunk by itself and know when that chunk was last 
affected by a scrub, balance, defrag, etc., and perform the required 
operations by itself based on a configuration or similar. Some may 
disagree for good reasons, but for me this is my wishlist for a 
filesystem :) i.e. a pool that just works and only annoys you with the 
need to replace a bad disk every now and then :)

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10 21:37 ` waxhead
@ 2018-01-11 12:50   ` Austin S. Hemmelgarn
  2018-01-11 19:56   ` Hans van Kranenburg
  1 sibling, 0 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-11 12:50 UTC (permalink / raw)
  To: waxhead, Btrfs BTRFS

On 2018-01-10 16:37, waxhead wrote:
> Austin S. Hemmelgarn wrote:
>> So, for a while now I've been recommending small filtered balances to
>> people as part of regular maintenance for BTRFS filesystems under the
>> logic that it does help in some cases and can't really hurt (and if done
>> right, is really inexpensive in terms of resources).  This ended up
>> integrated partially in the info text next to the BTRFS charts on
>> netdata's dashboard, and someone has now pointed out (correctly I might
>> add) that this is at odds with the BTRFS FAQ entry on balances.
>>
>> For reference, here's the bit about it in netdata:
>>
>> You can keep your volume healthy by running the `btrfs balance` command
>> on it regularly (check `man btrfs-balance` for more info).
>>
>>
>> And here's the FAQ entry:
>>
>> Q: Do I need to run a balance regularly?
>>
>> A: In general usage, no. A full unfiltered balance typically takes a
>> long time, and will rewrite huge amounts of data unnecessarily. You may
>> wish to run a balance on metadata only (see Balance_Filters) if you find
>> you have very large amounts of metadata space allocated but unused, but
>> this should be a last resort.
>>
>>
>> I've commented in the issue in netdata's issue tracker that I feel that
>> the FAQ entry could be better worded (strictly speaking, you don't
>> _need_ to run balances regularly, but it's usually a good idea). Looking
>> at both though, I think they could probably both be improved, but I
>> would like to get some input here on what people actually think the best
>> current practices are regarding this (and ideally why they feel that
>> way) before I go and change anything.
>>
>> So, on that note, how does anybody else out there feel about this?  Is
>> balancing regularly with filters restricting things to small numbers of
>> mostly empty chunks a good thing for regular maintenance or not?
>> -- 
> As just a regular user I would think that the first thing you would need 
> is an analyze that can tell you if it is a good idea to balance or not 
> in the first place.
In an ideal situation, the only reason it should ever be a bad idea to 
run a balance is the performance impact (which is of course why we have 
filters).  Beyond that though, there's too much involved for even a 
computer to reliably tell you if it will be beneficial to run a balance 
or not.  It depends not just on how the data looks on the filesystem, 
but also how you are going to be using the filesystem in the near future 
(for example, if you've got a number of large blocks of empty space 
within data chunks, it might make sense to balance, but not if you're 
likely to be adding a bunch of new files in the very near future (they 
will just end up packed into that empty space in existing chunks, and 
your actual layout on disk shouldn't be all that different from if you 
had run a balance)).
> 
> Scrub seems like a great place to start - e.g. scrub could auto-analyze 
> and report back need to balance. I also think that scrub should 
> optionally autobalance if needed.
> 
> Balance may not be needed, but if one can determine that balancing would 
> speed up things a bit I don't see why this as an option can't be 
> scheduled automatically. Ideally there should be a "scrub and polish" 
> option that would scrub, balance and perhaps even defragment in one go.
In this case, the recommendation isn't as much about speed as it is 
about trying to keep things from getting into a state where you get 
ENOSPC but conventional tools report lots of free space.  As a general 
rule, unless things are pathologically bad to begin with, balancing a 
filesystem won't usually have any measurable impact on performance.
> 
> In fact, the way I see it btrfs should idealy by itself keep track on 
> each data/metadata chunk and it should know , when was this chunk last 
> affected by a scrub, balance, defrag etc and perform the required 
> operations by itself based on a configuration or similar. Some may 
> disagree for good reasons , but for me this is my wishlist for a 
> filesystem :) e.g. a pool that just works and only annoys you with the 
> need of replacing a bad disk every now and then :)
Long-term, that type of thing is a goal, but I doubt that we're going 
to go that far with automation (even ZFS doesn't go that far; you still 
have to schedule scrubs and similar things).


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-10 21:37 ` waxhead
  2018-01-11 12:50   ` Austin S. Hemmelgarn
@ 2018-01-11 19:56   ` Hans van Kranenburg
  1 sibling, 0 replies; 34+ messages in thread
From: Hans van Kranenburg @ 2018-01-11 19:56 UTC (permalink / raw)
  To: waxhead, Austin S. Hemmelgarn, Btrfs BTRFS

On 01/10/2018 10:37 PM, waxhead wrote:
> As just a regular user I would think that the first thing you would need
> is an analyze that can tell you if it is a good idea to balance or not
> in the first place.

Tooling to create that is available. Btrfs allows you to read a lot of
different data to analyze, and then you can experiment with your own
algorithms to find out which blockgroup you're going to feed to balance
next.

There are two language options:
* C -> in this case you're extending btrfs-progs to build new tools
* Python -> python-btrfs has everything in it to quickly throw together
things like this, and examples are available with the source.

For example:
- balance_least_used.py -> balance starting from the least used chunk
and work up towards a max of X% used.
- show_free_space_fragmentation.py -> find out which chunks have badly
fragmented free space. Remember: if you have a 1GiB chunk with usage
50%, that doesn't tell you if it has only a handful of extents, filling
up the first 500MiB, with the rest empty, or if it's thousands of
alternating pieces of 4KiB used space and 4KiB free space. ;-] [0]

In the same way you can program something new, like a balance algorithm
that cleans up blocks with high free space fragmentation first.
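
To make the 50% usage example concrete, here is a small self-contained Python sketch. It does not use python-btrfs; the extent lists are synthetic and `free_space_gaps` is a hypothetical helper. It compares two 1GiB chunks that are both half full but have very different free space layouts (512MiB is used instead of 500MiB so that usage is exactly 50%).

```python
def free_space_gaps(chunk_size, extents):
    """Return the sizes of the free gaps in a chunk.

    extents -- sorted, non-overlapping (offset, length) pairs of used space
    """
    gaps, cursor = [], 0
    for offset, length in extents:
        if offset > cursor:
            gaps.append(offset - cursor)
        cursor = offset + length
    if cursor < chunk_size:
        gaps.append(chunk_size - cursor)
    return gaps


KiB, MiB, GiB = 1024, 1024**2, 1024**3

# Chunk A: one 512 MiB extent at the start, the rest empty.
packed = [(0, 512 * MiB)]
# Chunk B: alternating 4 KiB used / 4 KiB free across the whole chunk.
striped = [(off, 4 * KiB) for off in range(0, GiB, 8 * KiB)]

for name, extents in (("packed", packed), ("striped", striped)):
    used = sum(length for _, length in extents)
    gaps = free_space_gaps(GiB, extents)
    print(name, f"usage={100 * used // GiB}%", f"gaps={len(gaps)}",
          f"largest_gap={max(gaps) // KiB} KiB")
```

Both chunks report 50% usage, but the first has a single 512MiB gap while the second has 131072 gaps of 4KiB each, which is the difference the usage figure alone cannot show.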

Or, another thing you could do is first count the number of extents in a
block group and add it to the algorithm. Balance of a block group with a
few extents is much faster than thousands of extents with a lot of
reflinks, like highly deduped data.
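
One way such a selection could look, as a rough Python sketch: least-used block groups first, up to a maximum usage, with the extent count as a tie-breaker. The `BlockGroup` tuple and its fields are hypothetical stand-ins for whatever you read from the filesystem, and this is not a description of how balance_least_used.py itself is implemented.

```python
from typing import NamedTuple


class BlockGroup(NamedTuple):
    vaddr: int     # hypothetical identifier for the block group
    used_pct: int  # percentage of the block group in use
    extents: int   # number of extents stored in it


def balance_order(groups, max_used_pct):
    """Least-used block groups first; among equals, fewer extents first."""
    candidates = [g for g in groups if g.used_pct <= max_used_pct]
    return sorted(candidates, key=lambda g: (g.used_pct, g.extents))


groups = [
    BlockGroup(vaddr=1, used_pct=40, extents=12),
    BlockGroup(vaddr=2, used_pct=5, extents=9000),
    BlockGroup(vaddr=3, used_pct=5, extents=3),
    BlockGroup(vaddr=4, used_pct=80, extents=50),
]
print([g.vaddr for g in balance_order(groups, max_used_pct=50)])
```

The two block groups at 5% usage come first, the one with only 3 extents ahead of the one with 9000, and the 80% block group is skipped entirely.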

Or... look at generation of metadata to find out which parts of data on
your disk have been touched recently, and which weren't... Too many fun
things to play around with. \:D/

As always, the first thing to do is make sure you're on 4.14 or later, or
else mount with nossd; otherwise you might keep shoveling data around
forever.

And if your filesystem has been treated badly by a pre-4.14 kernel in ssd
mode for a long time, then first get that cleaned up:

https://www.spinics.net/lists/linux-btrfs/msg70622.html

> Scrub seems like a great place to start - e.g. scrub could auto-analyze
> and report back need to balance. I also think that scrub should
> optionally autobalance if needed.
> 
> Balance may not be needed, but if one can determine that balancing would
> speed up things a bit I don't see why this as an option can't be
> scheduled automatically. Ideally there should be a "scrub and polish"
> option that would scrub, balance and perhaps even defragment in one go.
> 
> In fact, the way I see it btrfs should idealy by itself keep track on
> each data/metadata chunk and it should know , when was this chunk last
> affected by a scrub, balance, defrag etc and perform the required
> operations by itself based on a configuration or similar. Some may
> disagree for good reasons , but for me this is my wishlist for a
> filesystem :) e.g. a pool that just works and only annoys you with the
> need of replacing a bad disk every now and then :)

I don't think these kinds of things will ever end up in kernel code.

[0] There's a version in the devel branch in git that also works without
free space tree, taking a slower detour via the extent tree.

-- 
Hans van Kranenburg

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 15:55 Recommendations for balancing as part of regular maintenance? Austin S. Hemmelgarn
  2018-01-08 16:20 ` ein
  2018-01-10 21:37 ` waxhead
@ 2018-01-12 18:24 ` Austin S. Hemmelgarn
  2018-01-12 19:26   ` Tom Worster
  2018-01-13 22:09   ` Chris Murphy
  2 siblings, 2 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-12 18:24 UTC (permalink / raw)
  To: Btrfs BTRFS

On 2018-01-08 10:55, Austin S. Hemmelgarn wrote:
> So, for a while now I've been recommending small filtered balances to 
> people as part of regular maintenance for BTRFS filesystems under the 
> logic that it does help in some cases and can't really hurt (and if done 
> right, is really inexpensive in terms of resources).  This ended up 
> integrated partially in the info text next to the BTRFS charts on 
> netdata's dashboard, and someone has now pointed out (correctly I might 
> add) that this is at odds with the BTRFS FAQ entry on balances.
> 
> For reference, here's the bit about it in netdata:
> 
> You can keep your volume healthy by running the `btrfs balance` command 
> on it regularly (check `man btrfs-balance` for more info).
> 
> 
> And here's the FAQ entry:
> 
> Q: Do I need to run a balance regularly?
> 
> A: In general usage, no. A full unfiltered balance typically takes a 
> long time, and will rewrite huge amounts of data unnecessarily. You may 
> wish to run a balance on metadata only (see Balance_Filters) if you find 
> you have very large amounts of metadata space allocated but unused, but 
> this should be a last resort.
> 
> 
> I've commented in the issue in netdata's issue tracker that I feel that 
> the FAQ entry could be better worded (strictly speaking, you don't 
> _need_ to run balances regularly, but it's usually a good idea). Looking 
> at both though, I think they could probably both be improved, but I 
> would like to get some input here on what people actually think the best 
> current practices are regarding this (and ideally why they feel that 
> way) before I go and change anything.
> 
> So, on that note, how does anybody else out there feel about this?  Is 
> balancing regularly with filters restricting things to small numbers of 
> mostly empty chunks a good thing for regular maintenance or not?
OK, I've gotten a lot of good feedback on this, and the general 
consensus seems to be:

* If we're going to recommend regular balance, we should explain how it 
actually helps things.
* We should mention the performance interactions with qgroups, as well 
as warning people off of running other things like scrubs or defrag 
concurrently.
* The filters should be reasonably tame in terms of chunk selection.
* BTRFS should ideally get smarter about this kind of thing so the user 
doesn't have to be.

To that end, I propose the following text for the FAQ:

Q: Do I need to run a balance regularly?

A: While not strictly necessary for normal operations, running a 
filtered balance regularly can help prevent your filesystem from ending 
up with ENOSPC issues.  The following command run daily on each BTRFS 
volume should be more than sufficient for most users:

`btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10 <path>`

If you are running a kernel older than version 4.4 and can't upgrade, 
the following should be used instead:

`btrfs balance start -dusage=25 -musage=25 <path>`

Both of these commands will effectively compact partially full chunks on 
the filesystem, returning the space they free to the unallocated pool so 
that new chunks can be allocated when needed.  For more information on 
what the options actually mean, check out `man btrfs-balance`.
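
For anyone wiring this into a cron job or timer, the choice between the two commands could be scripted roughly as follows. This is only a sketch: `balance_command` is a hypothetical helper, `/mnt/data` is a placeholder mount point, and it joins the filters with commas (the form `man btrfs-balance` documents for combining filters) rather than repeating `-d` and `-m`. It only builds the argument list and does not run anything.

```python
import re


def balance_command(mountpoint, kernel_release, usage=25, limit="2..10"):
    """Build the argv for the filtered balance proposed above.

    The limit range filter is only added on kernels 4.4 and newer.
    """
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {kernel_release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    filters = f"usage={usage}"
    if version >= (4, 4):
        filters += f",limit={limit}"
    # btrfs balance start takes the path of the mounted filesystem last.
    return ["btrfs", "balance", "start",
            f"-d{filters}", f"-m{filters}", mountpoint]


print(" ".join(balance_command("/mnt/data", "4.14.12")))
print(" ".join(balance_command("/mnt/data", "3.16.0")))
```

The kernel release string would normally come from something like `platform.release()`; the resulting list can be handed to `subprocess.run` by a job running as root.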

When run regularly, both of these should complete extremely fast on most 
BTRFS volumes.  Note that these may run significantly slower on volumes 
which have quotas enabled.  Additionally, it's best to make sure other 
things aren't putting a lot of load on the filesystem while running a 
balance, so try to make sure this doesn't run at the same time as a 
scrub or defrag.

A full, unfiltered balance (one without any options passed in) is 
completely unnecessary for normal usage of a filesystem.

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-12 18:24 ` Austin S. Hemmelgarn
@ 2018-01-12 19:26   ` Tom Worster
  2018-01-12 19:43     ` Austin S. Hemmelgarn
  2018-01-13 22:09   ` Chris Murphy
  1 sibling, 1 reply; 34+ messages in thread
From: Tom Worster @ 2018-01-12 19:26 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Btrfs BTRFS

On 12 Jan 2018, at 13:24, Austin S. Hemmelgarn wrote:

> OK, I've gotten a lot of good feedback on this, and the general 
> consensus seems to be:
>
> * If we're going to recommend regular balance, we should explain how 
> it actually helps things.
> * We should mention the performance interactions with qgroups, as well 
> as warning people off of running other things like scrubs or defrag 
> concurrently.
> * The filters should be reasonably tame in terms of chunk selection.
> * BTRFS should ideally get smarter about this kind of thing so the 
> user doesn't have to be.
>
> To that end, I propose the following text for the FAQ:
>
> Q: Do I need to run a balance regularly?
>
> A: While not strictly necessary for normal operations, running a 
> filtered balance regularly can help prevent your filesystem from 
> ending up with ENOSPC issues.  The following command run daily on each 
> BTRFS volume should be more than sufficient for most users:
>
> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 
> -mlimit=2..10`
>
> If you are running a kernel older than version 4.4 and can't upgrade, 
> the following should be used instead:
>
> `btrfs balance start -dusage=25 -musage=25`
>
> Both of these commands will effectively compact partially full chunks 
> on the filesystem so that new chunks have more space to be allocated.  
> For more information on what the commands actually mean, check out 
> `man btrfs-balance`
>
> When run regularly, both of these should complete extremely fast on 
> most BTRFS volumes.  Note that these may run significantly slower on 
> volumes which have quotas enabled.  Additionally, it's best to make 
> sure other things aren't putting a lot of load on the filesystem while 
> running a balance, so try to make sure this doesn't run at the same 
> time as a scrub or defrag.
>
> A full, unfiltered balance (one without any options passed in) is 
> completely unnecessary for normal usage of a filesystem.

Hi Austin,

From the discussion we've had I have the impression there might be 
another way to answer this FAQ that is as valid as this one: monitor 
usage. You may never need to balance, or hardly ever.

Your suggestion for regular balance has clear advantages: It's 
set-and-forget. And to apply the advice the user doesn't need to 
understand the allocator, interpret usage stats, or figure out filters.

On the other hand, not needing to run balance is quite appealing. So is 
avoiding yet another cron/timer job if I don't really need it.

Question about your proposed text, "When run regularly, both of these 
should complete extremely fast..." Does that imply that it might not be 
fast if, say, you've never run balance before on a filesystem with a lot 
of unused space in allocated chunks?

Tom

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-12 19:26   ` Tom Worster
@ 2018-01-12 19:43     ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-12 19:43 UTC (permalink / raw)
  To: Tom Worster; +Cc: Btrfs BTRFS

On 2018-01-12 14:26, Tom Worster wrote:
> On 12 Jan 2018, at 13:24, Austin S. Hemmelgarn wrote:
> 
>> OK, I've gotten a lot of good feedback on this, and the general 
>> consensus seems to be:
>>
>> * If we're going to recommend regular balance, we should explain how 
>> it actually helps things.
>> * We should mention the performance interactions with qgroups, as well 
>> as warning people off of running other things like scrubs or defrag 
>> concurrently.
>> * The filters should be reasonably tame in terms of chunk selection.
>> * BTRFS should ideally get smarter about this kind of thing so the 
>> user doesn't have to be.
>>
>> To that end, I propose the following text for the FAQ:
>>
>> Q: Do I need to run a balance regularly?
>>
>> A: While not strictly necessary for normal operations, running a 
>> filtered balance regularly can help prevent your filesystem from 
>> ending up with ENOSPC issues.  The following command run daily on each 
>> BTRFS volume should be more than sufficient for most users:
>>
>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>>
>> If you are running a kernel older than version 4.4 and can't upgrade, 
>> the following should be used instead:
>>
>> `btrfs balance start -dusage=25 -musage=25`
>>
>> Both of these commands will effectively compact partially full chunks 
>> on the filesystem so that new chunks have more space to be allocated. 
>> For more information on what the commands actually mean, check out 
>> `man btrfs-balance`
>>
>> When run regularly, both of these should complete extremely fast on 
>> most BTRFS volumes.  Note that these may run significantly slower on 
>> volumes which have quotas enabled.  Additionally, it's best to make 
>> sure other things aren't putting a lot of load on the filesystem while 
>> running a balance, so try to make sure this doesn't run at the same 
>> time as a scrub or defrag.
>>
>> A full, unfiltered balance (one without any options passed in) is 
>> completely unnecessary for normal usage of a filesystem.
> 
> Hi Austin,
> 
>  From the discussion we've had I have the impression there might be 
> another way to answer this FAQ that is as valid as this one: monitor 
> usage. You may never need to balance, or hardly ever.
> 
> Your suggestion for regular balance has clear advantages: It's 
> set-and-forget. And to apply the advice the user doesn't need to 
> understand the allocator, interpret usage stats, or figure out filters.
> 
> On the other hand, not needing to run balance is quite appealing. So is 
> avoiding yet another cron/timer job if I don't really need it.
While I can understand some people may prefer that, it's also a lot more 
error prone, and I personally don't think it's a good idea to suggest 
it.  Recommendations where the user is required to regularly do 
something themselves and evaluate the results tend to be bad ideas from a 
support perspective.  Yeah, it may be fine for you and me to just 
monitor things manually, but it's not really a good idea for someone who 
doesn't have any understanding of how things work under the hood.  It's 
also not likely to be what most distros end up standardizing on as best 
practices for handling this, and not matching up with that will look bad.
> 
> Question about your proposed text, "When run regularly, both of these 
> should complete extremely fast..." Does that imply that it might not be 
> fast if, say, you've never run balance before on a filesystem with a lot 
> of unused space in allocated chunks?
There's a very high potential that it will take a long time on the first 
run (or the first couple if things are particularly bad), but the 
filters still limit exactly how much data will get moved.  The 
recommended command with the limit filters, for example, shouldn't ever 
move more than about 15GB of data, and it will only do that much on 
pathologically bad multi-TB volumes, so it shouldn't take more than a 
few minutes on SSDs, or about 15-20 minutes on traditional hard drives.

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-12 18:24 ` Austin S. Hemmelgarn
  2018-01-12 19:26   ` Tom Worster
@ 2018-01-13 22:09   ` Chris Murphy
  2018-01-15 13:43     ` Austin S. Hemmelgarn
  2018-01-15 18:23     ` Tom Worster
  1 sibling, 2 replies; 34+ messages in thread
From: Chris Murphy @ 2018-01-13 22:09 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Btrfs BTRFS

On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> To that end, I propose the following text for the FAQ:
>
> Q: Do I need to run a balance regularly?
>
> A: While not strictly necessary for normal operations, running a filtered
> balance regularly can help prevent your filesystem from ending up with
> ENOSPC issues.  The following command run daily on each BTRFS volume should
> be more than sufficient for most users:
>
> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`

Daily? Seems excessive.

I've got multiple Btrfs file systems that I haven't balanced, full or
partial, in a year. And I have no problems. One is a laptop which
accumulates snapshots until roughly 25% free space remains and then
most of the snapshots are deleted, except the most recent few, all at
one time. I'm not experiencing any problems so far. The other is a NAS
and it's multiple copies, with maybe 100-200 snapshots. One backup
volume is 99% full, there's no more unallocated free space, I delete
snapshots only to make room for btrfs send receive to keep pushing the
most recent snapshot from the main volume to the backup. Again no
problems.

I really think suggestions this broad are just going to paper over
bugs or design flaws; we won't see as many bug reports and then real
problems won't get fixed.

I also think the time-based method is too subjective. What about the
layout means a balance is needed? And if it's really a suggestion, why
isn't there a cron job or systemd unit that just does this for the user,
in btrfs-progs, working and enabled by default? I really do not like
all this hand holding of Btrfs; it's not going to make it better.

> A full, unfiltered balance (one without any options passed in) is completely
> unnecessary for normal usage of a filesystem.

That's good advice.

-- 
Chris Murphy

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-13 22:09   ` Chris Murphy
@ 2018-01-15 13:43     ` Austin S. Hemmelgarn
  2018-01-15 18:23     ` Tom Worster
  1 sibling, 0 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-15 13:43 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 2018-01-13 17:09, Chris Murphy wrote:
> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
> 
>> To that end, I propose the following text for the FAQ:
>>
>> Q: Do I need to run a balance regularly?
>>
>> A: While not strictly necessary for normal operations, running a filtered
>> balance regularly can help prevent your filesystem from ending up with
>> ENOSPC issues.  The following command run daily on each BTRFS volume should
>> be more than sufficient for most users:
>>
>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
> 
> 
> Daily? Seems excessive.
For handling of chunks that are only 25% full and capping it at 10 
chunks processed each for data and metadata?  That's only (assuming I 
remember the max chunk size correctly) about 15GB of data being moved at 
the absolute most, and that will likely only happen in pathologically 
bad cases.  Usually it should be either nothing at all (the most common 
outcome) or about 768MB being shuffled around, and even on traditional 
hard drives that should complete insanely fast (barring impact from very 
large numbers of snapshots or use of qgroups).
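
The shape of that upper bound can be written down as a small Python sketch. The chunk sizes below (1GiB data, 256MiB metadata) are assumptions for illustration, not figures from this thread, so the totals differ from the 768MB and 15GB estimates above, which rest on a remembered maximum chunk size; replicated profiles would also write more than this.

```python
def max_data_moved(limit_chunks, usage_pct, chunk_bytes):
    """Upper bound on bytes relocated for one block group type.

    At most `limit_chunks` chunks are processed, and the usage filter
    only selects chunks filled below `usage_pct` percent.
    """
    return limit_chunks * chunk_bytes * usage_pct // 100


MiB, GiB = 1024**2, 1024**3

# Assumed chunk sizes: 1 GiB for data, 256 MiB for metadata.
two_each = max_data_moved(2, 25, 1 * GiB) + max_data_moved(2, 25, 256 * MiB)
ten_each = max_data_moved(10, 25, 1 * GiB) + max_data_moved(10, 25, 256 * MiB)
print(two_each // MiB, "MiB with two chunks of each type")
print(ten_each // MiB, "MiB with ten chunks of each type")
```

Plugging in the chunk sizes of a particular filesystem gives the worst case for one run of the recommended command on that filesystem.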

If there are no chunks that match (or only one chunk), this finishes in 
at most a second with near zero disk I/O.  If exactly two match (which 
should be the common case for most users when it matches at all), it 
should take at most a few seconds to complete, even on traditional hard 
drives.  If more match, it will of course take longer, but it should be 
pretty rare that more than two match.

Given that, it really doesn't seem all that excessive to me.  As a point 
of comparison, automated X.509 certificate renewal checks via certbot 
take more resources to perform when there's not a renewal due than this 
balance command takes when there's nothing to work on, and it's 
absolutely standard to run the X.509 checks daily despite the fact that 
weekly checks would still give no worse security (certbot will renew 
things well before they expire).
> 
> I've got multiple Btrfs file systems that I haven't balanced, full or
> partial, in a year. And I have no problems. One is a laptop which
> accumulates snapshots until roughly 25% free space remains and then
> most of the snapshots are deleted, except the most recent few, all at
> one time. I'm not experiencing any problems so far. The other is a NAS
> and it's multiple copies, with maybe 100-200 snapshots. One backup
> volume is 99% full, there's no more unallocated free space, I delete
> snapshots only to make room for btrfs send receive to keep pushing the
> most recent snapshot from the main volume to the backup. Again no
> problems.
In the first case, you're dealing with a special configuration that 
makes most of this irrelevant most of the time (as I'm assuming things 
change _enough_ between snapshots that dumping most of them will 
completely empty out most of the chunks they were stored in).

In the second I'd have to say you've been lucky.  I've personally never 
run a volume that close to full with BTRFS without balancing regularly 
and not had some kind of issue.
> 
> I really think suggestions this broad are just going to paper over
> bugs or design flaws, we won't see as many bug reports and then real
> problems won't get fixed.
So maybe we should fix things so that this is never needed?  Yes, it's a 
workaround for a well known and documented design flaw (and yes, I 
consider the whole two-level allocator's handling of free space 
exhaustion to be a design flaw), but I don't see any patches forthcoming 
to fix it, so if we want to keep users around, we need to provide some 
way for them to mitigate the problems it can cause (otherwise we won't 
find any bugs because we won't have any users).
> 
> I also think the time-based method is too subjective. What about the
> layout means a balance is needed? And if it's really a suggestion, why
> isn't there a cron or systemd unit that just does this for the user,
> in btrfs-progs, working and enabled by default? I really do not like
> all this hand holding of Btrfs, it's not going to make it better.
For a filesystem you really have two generic possibilities for use cases:

1. It's designed for general purpose usage.  Doesn't really excel at 
anything in particular, but isn't really bad at anything either.
2. It's designed for a very specific use case.  Does an amazing job for 
that particular use case and possibly for some similar ones, and may or 
may not do a reasonable job for other use cases.

Your comments here seem to imply that BTRFS falls under the second case, 
which is odd since most everything else I've seen implies that BTRFS 
fits the first case (or is trying to at least).  In either case though, 
you need to provide something to deal with this particular design flaw.

In the first case, you _need_ to make it as easy as possible for people 
who have no understanding of computers to use.  While needing balances 
from time to time is not exactly in-line with that, requiring people to 
try and judge based on the numbers whether or not a balance is warranted 
is even less in-line with it.  By just telling people to automate it and 
give reasonable filters to the balance command, we remove the guesswork 
entirely, and make things far easier for people.

In the second case, it's generally more acceptable to require more work 
of the user, but making baseline prophylactic maintenance something that 
you can't trivially automate is still a bad idea (imagine how popular 
ZFS would be if you could only run scrubs manually).

That said, if you can find or write up a script that reliably does the 
math to check if a balance is needed and then actually runs it if it is, 
I would be more than happy to recommend that in the FAQ instead.
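
For the sake of discussion, the 'math' part of such a script could look something like the Python sketch below.  Everything in it is an assumption on my part: the function name, the inputs (which would have to come from `btrfs filesystem usage` or similar), and especially the two thresholds, which are placeholders rather than tested values.

```python
GIB = 1024 ** 3


def balance_warranted(unallocated: int, data_allocated: int, data_used: int,
                      min_unallocated: int = 5 * GIB,
                      max_slack_ratio: float = 0.25) -> bool:
    """Return True when a filtered balance looks worthwhile.

    All sizes are in bytes. The rule: balance only when unallocated space
    is running low AND a large share of the allocated data space is
    sitting empty inside chunks.
    """
    if data_allocated <= 0:
        return False
    slack_ratio = (data_allocated - data_used) / data_allocated
    return unallocated < min_unallocated and slack_ratio > max_slack_ratio


# Almost nothing unallocated, data chunks only about a quarter full.
print(balance_warranted(2 * GIB, 410 * GIB, 102 * GIB))   # True
# Plenty of unallocated space left, so leave it alone.
print(balance_warranted(200 * GIB, 100 * GIB, 40 * GIB))  # False
```

Whether those two conditions are the right ones is exactly the part that would need agreement before it could go in the FAQ.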
> 
>> A full, unfiltered balance (one without any options passed in) is completely
>> unnecessary for normal usage of a filesystem.
> 
> That's good advice.
And so far it seems to be the one thing that everyone agrees on ;).

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-13 22:09   ` Chris Murphy
  2018-01-15 13:43     ` Austin S. Hemmelgarn
@ 2018-01-15 18:23     ` Tom Worster
  2018-01-16  6:45       ` Chris Murphy
  1 sibling, 1 reply; 34+ messages in thread
From: Tom Worster @ 2018-01-15 18:23 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

On 13 Jan 2018, at 17:09, Chris Murphy wrote:

> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>
>> To that end, I propose the following text for the FAQ:
>>
>> Q: Do I need to run a balance regularly?
>>
>> A: While not strictly necessary for normal operations, running a filtered
>> balance regularly can help prevent your filesystem from ending up with
>> ENOSPC issues.  The following command run daily on each BTRFS volume
>> should be more than sufficient for most users:
>>
>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>
>
> Daily? Seems excessive.
>
> I've got multiple Btrfs file systems that I haven't balanced, full or
> partial, in a year. And I have no problems. One is a laptop which
> accumulates snapshots until roughly 25% free space remains and then
> most of the snapshots are deleted, except the most recent few, all at
> one time. I'm not experiencing any problems so far. The other is a NAS
> and it's multiple copies, with maybe 100-200 snapshots. One backup
> volume is 99% full, there's no more unallocated free space, I delete
> snapshots only to make room for btrfs send receive to keep pushing the
> most recent snapshot from the main volume to the backup. Again no
> problems.
>
> I really think suggestions this broad are just going to paper over
> bugs or design flaws, we won't see as many bug reports and then real
> problems won't get fixed.

This is just an answer to a FAQ. This is not Austin or anyone else 
trying to tell you or anyone else that you should do this. It should 
be clear that there is an implied caveat along the lines of: "There are 
other ways to manage allocation besides regular balancing. This 
recommendation is a For-Dummies-kinda default that should work well 
enough if you don't have another strategy better adapted to your 
situation." If this implication is not obvious enough then we can add 
something explicit.

> I also think the time-based method is too subjective. What about the
> layout means a balance is needed? And if it's really a suggestion, why
> isn't there a cron or systemd unit that just does this for the user,
> in btrfs-progs, working and enabled by default?

As a newcomer to BTRFS, I was astonished to learn that it demands each 
user figure out some workaround for what is, in my judgement, a required 
but missing feature, i.e. a defect, a bug. At present the docs are 
pretty confusing for someone trying to deal with it on their own.

Unless some better fix is in the works, this _should_ be a systemd unit 
or something. Until then, please put it in FAQ.

> I really do not like
> all this hand holding of Btrfs, it's not going to make it better.

Maybe it won't but, absent better proposals, and given the nature of the 
problem, this kind of hand-holding is only fair to the user.

Tom

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-15 18:23     ` Tom Worster
@ 2018-01-16  6:45       ` Chris Murphy
  2018-01-16 11:02         ` Andrei Borzenkov
  2018-01-16 12:57         ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 34+ messages in thread
From: Chris Murphy @ 2018-01-16  6:45 UTC (permalink / raw)
  To: Tom Worster; +Cc: Chris Murphy, Austin S. Hemmelgarn, Btrfs BTRFS

On Mon, Jan 15, 2018 at 11:23 AM, Tom Worster <fsb@thefsb.org> wrote:
> On 13 Jan 2018, at 17:09, Chris Murphy wrote:
>
>> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>
>>
>>> To that end, I propose the following text for the FAQ:
>>>
>>> Q: Do I need to run a balance regularly?
>>>
>>> A: While not strictly necessary for normal operations, running a filtered
>>> balance regularly can help prevent your filesystem from ending up with
>>> ENOSPC issues.  The following command run daily on each BTRFS volume
>>> should be more than sufficient for most users:
>>>
>>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>>
>>
>>
>> Daily? Seems excessive.
>>
>> I've got multiple Btrfs file systems that I haven't balanced, full or
>> partial, in a year. And I have no problems. One is a laptop which
>> accumulates snapshots until roughly 25% free space remains and then
>> most of the snapshots are deleted, except the most recent few, all at
>> one time. I'm not experiencing any problems so far. The other is a NAS
>> and it's multiple copies, with maybe 100-200 snapshots. One backup
>> volume is 99% full, there's no more unallocated free space, I delete
>> snapshots only to make room for btrfs send receive to keep pushing the
>> most recent snapshot from the main volume to the backup. Again no
>> problems.
>>
>> I really think suggestions this broad are just going to paper over
>> bugs or design flaws, we won't see as many bug reports and then real
>> problems won't get fixed.
>
>
> This is just an answer to a FAQ. This is not Austin or anyone else trying to
> tell you or anyone else that you should do this. It should be clear that
> there is an implied caveat along the lines of: "There are other ways to
> manage allocation besides regular balancing. This recommendation is a
> For-Dummies-kinda default that should work well enough if you don't have
> another strategy better adapted to your situation." If this implication is
> not obvious enough then we can add something explicit.

It's an upstream answer to a frequently asked question. It's rather
official, or about as close as it gets to it.

>
>
>> I also think the time-based method is too subjective. What about the
>> layout means a balance is needed? And if it's really a suggestion, why
>> isn't there a cron or systemd unit that just does this for the user,
>> in btrfs-progs, working and enabled by default?
>
>
> As a newcomer to BTRFS, I was astonished to learn that it demands each user
> figure out some workaround for what is, in my judgement, a required but
> missing feature, i.e. a defect, a bug. At present the docs are pretty
> confusing for someone trying to deal with it on their own.
>
> Unless some better fix is in the works, this _should_ be a systemd unit or
> something. Until then, please put it in FAQ.

At least openSUSE has had a systemd unit for a long time now, but last
time I checked (a bit over a year ago) it was disabled by default. Why?

And insofar as I'm aware, openSUSE users aren't having big problems
related to lack of balancing, they have problems due to the lack of
balancing combined with schizo snapper defaults, which are these days
masked somewhat by turning on quotas so snapper can be more accurate
about cleaning up.

Basically the scripted balance tells me two things:
a. Something is broken (still)
b. None of the developers has time to investigate coherent bug reports
about a. and fix/refine it.

And therefore papering over the problem is all we have. Basically it's
a sledgehammer approach.

The main person working on ENOSPC stuff is Josef so I'd run this by
him and make sure this papering over bugs is something he agrees with.

>
>
>> I really do not like
>> all this hand holding of Btrfs, it's not going to make it better.
>
>
> Maybe it won't but, absent better proposals, and given the nature of the
> problem, this kind of hand-holding is only fair to the user.

This is hardly the biggest gotcha with Btrfs. I'm fine with the idea
of papering over design flaws and long standing bugs with user space
work arounds. I just want everyone on the same page about it, so it's
not some big surprise it's happening. As far as I know, none of the
developers regularly looks at the Btrfs wiki.

And I think the best way of communicating:
a. this is busted, and it sucks
b. here's a proposed user space work around, so users aren't so pissed off.

Is to try and get it into btrfs-progs, and enabled by default, because
that will get in front of at least one developer.

-- 
Chris Murphy

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-16  6:45       ` Chris Murphy
@ 2018-01-16 11:02         ` Andrei Borzenkov
  2018-01-16 12:57         ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 34+ messages in thread
From: Andrei Borzenkov @ 2018-01-16 11:02 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Tom Worster, Austin S. Hemmelgarn, Btrfs BTRFS

On Tue, Jan 16, 2018 at 9:45 AM, Chris Murphy <lists@colorremedies.com> wrote:
...
>>
>> Unless some better fix is in the works, this _should_ be a systemd unit or
>> something. Until then, please put it in FAQ.
>
> At least openSUSE has had a systemd unit for a long time now, but last
> time I checked (a bit over a year ago) it was disabled by default. Why?
>

It is now enabled by default on Tumbleweed and hence likely on SLE/Leap 15.

> And insofar as I'm aware, openSUSE users aren't having big problems
> related to lack of balancing, they have problems due to the lack of
> balancing combined with schizo snapper defaults, which are these days
> masked somewhat by turning on quotas so snapper can be more accurate
> about cleaning up.
>

Not only that, but the snapshot policy has also been made less
aggressive: now (in Tumbleweed/Leap 42.3) periodic snapshots are turned
off by default, and only configuration changes via YaST or package
updates via zypper trigger snapshot creation.

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-16  6:45       ` Chris Murphy
  2018-01-16 11:02         ` Andrei Borzenkov
@ 2018-01-16 12:57         ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-16 12:57 UTC (permalink / raw)
  To: Chris Murphy, Tom Worster; +Cc: Btrfs BTRFS

On 2018-01-16 01:45, Chris Murphy wrote:
> On Mon, Jan 15, 2018 at 11:23 AM, Tom Worster <fsb@thefsb.org> wrote:
>> On 13 Jan 2018, at 17:09, Chris Murphy wrote:
>>
>>> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>
>>>> To that end, I propose the following text for the FAQ:
>>>>
>>>> Q: Do I need to run a balance regularly?
>>>>
>>>> A: While not strictly necessary for normal operations, running a filtered
>>>> balance regularly can help prevent your filesystem from ending up with
>>>> ENOSPC issues.  The following command run daily on each BTRFS volume
>>>> should be more than sufficient for most users:
>>>>
>>>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>>>
>>> Daily? Seems excessive.
>>>
>>> I've got multiple Btrfs file systems that I haven't balanced, full or
>>> partial, in a year. And I have no problems. One is a laptop which
>>> accumulates snapshots until roughly 25% free space remains and then
>>> most of the snapshots are deleted, except the most recent few, all at
>>> one time. I'm not experiencing any problems so far. The other is a NAS
>>> and it's multiple copies, with maybe 100-200 snapshots. One backup
>>> volume is 99% full, there's no more unallocated free space, I delete
>>> snapshots only to make room for btrfs send receive to keep pushing the
>>> most recent snapshot from the main volume to the backup. Again no
>>> problems.
>>>
>>> I really think suggestions this broad are just going to paper over
>>> bugs or design flaws, we won't see as many bug reports and then real
>>> problems won't get fixed.
>>
>> This is just an answer to a FAQ. This is not Austin or anyone else trying to
>> tell you or anyone else that you should do this. It should be clear that
>> there is an implied caveat along the lines of: "There are other ways to
>> manage allocation besides regular balancing. This recommendation is a
>> For-Dummies-kinda default that should work well enough if you don't have
>> another strategy better adapted to your situation." If this implication is
>> not obvious enough then we can add something explicit.
> 
> It's an upstream answer to a frequently asked question. It's rather
> official, or about as close as it gets to it.
> 
>>
>>> I also think the time-based method is too subjective. What about the
>>> layout means a balance is needed? And if it's really a suggestion, why
>>> isn't there a cron or systemd unit that just does this for the user,
>>> in btrfs-progs, working and enabled by default?
>>
>> As a newcomer to BTRFS, I was astonished to learn that it demands each user
>> figure out some workaround for what is, in my judgement, a required but
>> missing feature, i.e. a defect, a bug. At present the docs are pretty
>> confusing for someone trying to deal with it on their own.
>>
>> Unless some better fix is in the works, this _should_ be a systemd unit or
>> something. Until then, please put it in FAQ.
> 
> At least openSUSE has had a systemd unit for a long time now, but last
> time I checked (a bit over a year ago) it was disabled by default. Why?
> 
> And insofar as I'm aware, openSUSE users aren't having big problems
> related to lack of balancing, they have problems due to the lack of
> balancing combined with schizo snapper defaults, which are these days
> masked somewhat by turning on quotas so snapper can be more accurate
> about cleaning up.
And in turn causing other issues because of the quotas, but that's 
getting OT...
> 
> Basically the scripted balance tells me two things:
> a. Something is broken (still)
> b. None of the developers has time to investigate coherent bug reports
> about a. and fix/refine it.
I don't entirely agree here.  The issue is essentially inherent in the 
very design of the two-stage allocator itself, so it's not really 
something that can just be fixed by some simple surface patch.  The only 
real options I see to fix it are either:
1. Redesign the allocator
or:
2. figure out some way to handle this generically and automatically.

The first case is pretty much immediately out because it will almost 
certainly require a breaking change in the on-disk format.  The second 
is extremely challenging to do right, and likely to cause some 
significant controversy among list regulars (I for one don't want the FS 
doing stuff behind my back that impacts performance, and I have a 
feeling that quite a lot of other people here don't either).

Given that, I would say time is only a (probably small) part of it. 
This is not an easy thing to fix given the current situation, and 
difficult problems tend to sit around with no progress for very long 
periods of time in open source development.
> 
> And therefore papering over the problem is all we have. Basically it's
> a sledgehammer approach.
How exactly is this any different than requiring a user to manually 
scrub things to check data that's not being actively used?  Or requiring 
manual invocation of defragmentation?  Or even batch deduplication?

All of those are manually triggered solutions to 'problems' with the 
filesystem, just like this is.  The only difference is that people are 
used to needing to manually defrag disks, and reasonably used to the 
need for manual scrubs (and don't seem to care much about dedupe), while 
doing something like this to keep the allocator happy is absolutely 
alien to them (despite being no different conceptually in that respect 
from defrag, just operating at a different level).
> 
> The main person working on ENOSPC stuff is Josef so I'd run this by
> him and make sure this papering over bugs is something he agrees with.
I agree that Josef's input would be nice to have, as he really does 
appear to be the authority on this type of thing.

I would also love to hear from someone at Facebook about their 
experience with this type of thing, as they probably have the largest 
current deployment of BTRFS around.
> 
>>
>>> I really do not like
>>> all this hand holding of Btrfs, it's not going to make it better.
>>
>> Maybe it won't but, absent better proposals, and given the nature of the
>> problem, this kind of hand-holding is only fair to the user.
> 
> This is hardly the biggest gotcha with Btrfs. I'm fine with the idea
> of papering over design flaws and long standing bugs with user space
> work arounds. I just want everyone on the same page about it, so it's
> not some big surprise it's happening. As far as I know, none of the
> developers regularly looks at the Btrfs wiki.
> 
> And I think the best way of communicating:
> a. this is busted, and it sucks
> b. here's a proposed user space work around, so users aren't so pissed off.
> 
> Is to try and get it into btrfs-progs, and enabled by default, because
> that will get in front of at least one developer.
Maybe it's time someone writes up a BCP document and includes that as a 
man page bundled with btrfs-progs?  That would get much better developer 
visibility, would be much easier to keep current, and would probably 
cover the biggest issue with our documentation currently (it's great for 
technical people, but somewhat horrendous for new users without 
technical background).  We've already essentially got the beginnings of 
such a document between the FAQ and the Gotchas page on the wiki.

* Re: Recommendations for balancing as part of regular maintenance?
@ 2018-01-08 21:43 Tom Worster
  2018-01-08 22:18 ` Hugo Mills
  2018-01-09 12:23 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 34+ messages in thread
From: Tom Worster @ 2018-01-08 21:43 UTC (permalink / raw)
  To: linux-btrfs

On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:

> On 2018-01-08 11:20, ein wrote:
>
> > On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:
> >
> > > [...]
> > >
> > > And here's the FAQ entry:
> > >
> > > Q: Do I need to run a balance regularly?
> > >
> > > A: In general usage, no. A full unfiltered balance typically takes a
> > > long time, and will rewrite huge amounts of data unnecessarily. You may
> > > wish to run a balance on metadata only (see Balance_Filters) if you find
> > > you have very large amounts of metadata space allocated but unused, but
> > > this should be a last resort.
> >
> > IMHO three more sentences and the answer would be more useful:
> > 1. BTRFS balance command example with a note to check the man page first.
> > 2. What use case may cause 'large amounts of metadata space allocated
> > but unused'.
>
> That's kind of what I was thinking as well, but I'm hesitant to get 
> too heavily into stuff along the lines of 'for use case X, do 1, for 
> use case Y, do 2, etc', as that tends to result in pigeonholing 
> (people just go with what sounds closest to their use case instead of 
> trying to figure out what actually is best for their use case).
>
> Ideally, I think it should be as generic as reasonably possible, 
> possibly something along the lines of:
>
> A: While not strictly necessary, running regular filtered balances 
> (for example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 
> -mlimit=4`, see `man btrfs-balance` for more info on what the options 
> mean) can help keep a volume healthy by mitigating the things that 
> typically cause ENOSPC errors. Full balances by contrast are long and 
> expensive operations, and should be done only as a last resort.

As the BTRFS noob who started the conversation on netdata's Github 
issues, I'd like to describe my experience.

I got an alert that unallocated space on a BTRFS filesystem on one host 
was low. A netdata caption suggested btrfs-balance and directed me to 
its man page. But I found it hard to understand since I don't know how 
BTRFS works or its particular terminology. The FAQ was easier to 
understand but didn't help me find a solution to my problem.

It's a 420GiB NVMe with single data and metadata. It has a MariaDB 
datadir with an OLTP workload and a small GlusterFS brick for 
replicating filesystem with little activity. I recall that unallocated 
space was under 2G, metadata allocation was low, a few G and about 1/3 
used. Data allocation was very large, almost everything else, with ~25% 
used.

Given the documentation and the usage stats, I did not know what options 
to use with balance. I spent some time reading and researching and 
trying to understand the filters and how they should relate to my 
situation. Eventually I abandoned that effort and ran balance without 
options.

While general recommendations about running balance would be welcome, 
what I needed was a dummy's guide to what the output of btrfs usage 
_means_ and how to use balance to tackle problems with it.

The other mystery is how the data allocation became so large.

Tom

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 21:43 Tom Worster
@ 2018-01-08 22:18 ` Hugo Mills
  2018-01-09 12:23 ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 34+ messages in thread
From: Hugo Mills @ 2018-01-08 22:18 UTC (permalink / raw)
  To: Tom Worster; +Cc: linux-btrfs

On Mon, Jan 08, 2018 at 04:43:02PM -0500, Tom Worster wrote:
> On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:
> 
> >On 2018-01-08 11:20, ein wrote:
> >
> >> On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:
> >>
> >> > [...]
> >> >
> >> > And here's the FAQ entry:
> >> >
> >> > Q: Do I need to run a balance regularly?
> >> >
> >> > A: In general usage, no. A full unfiltered balance typically takes a
> >> > long time, and will rewrite huge amounts of data unnecessarily. You may
> >> > wish to run a balance on metadata only (see Balance_Filters) if you find
> >> > you have very large amounts of metadata space allocated but unused, but
> >> > this should be a last resort.
> >>
> >> IMHO three more sentences and the answer would be more useful:
> >> 1. BTRFS balance command example with a note to check the man page first.
> >> 2. What use case may cause 'large amounts of metadata space allocated
> >> but unused'.
> >
> >That's kind of what I was thinking as well, but I'm hesitant to
> >get too heavily into stuff along the lines of 'for use case X, do
> >1, for use case Y, do 2, etc', as that tends to result in
> >pigeonholing (people just go with what sounds closest to their use
> >case instead of trying to figure out what actually is best for
> >their use case).
> >
> >Ideally, I think it should be as generic as reasonably possible,
> >possibly something along the lines of:
> >
> >A: While not strictly necessary, running regular filtered balances
> >(for example `btrfs balance start -dusage=50 -dlimit=2 -musage=50
> >-mlimit=4`, see `man btrfs-balance` for more info on what the
> >options mean) can help keep a volume healthy by mitigating the
> >things that typically cause ENOSPC errors. Full balances by
> >contrast are long and expensive operations, and should be done
> >only as a last resort.
> 
> As the BTRFS noob who started the conversation on netdata's Github
> issues, I'd like to describe my experience.
> 
> I got an alert that unallocated space on a BTRFS filesystem on one
> host was low. A netdata caption suggested btrfs-balance and directed
> me to its man page. But I found it hard to understand since I don't
> know how BTRFS works or its particular terminology. The FAQ was
> easier to understand but didn't help me find a solution to my
> problem.

   The information is there in the FAQ, but only under headings that
you'd find if you'd actually hit the problems, rather than being warned
that the problems might be happening (which is your situation):

https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_like_I_should_have_lots_left.21

> It's a 420GiB NVMe with single data and metadata. It has a MariaDB
> datadir with an OLTP workload and a small GlusterFS brick for
> replicating filesystem with little activity. I recall that
> unallocated space was under 2G, metadata allocation was low, a few G
> and about 1/3 used. Data allocation was very large, almost
> everything else, with ~25% used.
> 
> Given the documentation and the usage stats, I did not know what
> options to use with balance. I spent some time reading and
> researching and trying to understand the filters and how they should
> relate to my situation. Eventually I abandoned that effort and ran
> balance without options.

   That'll certainly work, although it's wasteful of I/O bandwidth and
time.

> While general recommendations about running balance would be
> welcome, what I needed was a dummy's guide to what the output of
> btrfs usage _means_ and how to use balance to tackle problems with
> it.

   In this kind of situation, it's generally recommended to balance
data chunks only (because that's where the overallocation usually
happens). There's not much point in balancing everything, so the
question is how much work to do... Ideally, you want to end up
compacting everything into the smallest number of chunks, which will
be the number of GiB of actual data.

   There's a couple of ways to limit the work done. One way is to only
pick the chunks less than some threshold fraction used. This is the
usage=N option (-dusage=30, for example). It allows you to do (in
theory) the minimum amount of actual balance work needed. Drawbacks are
that you don't know how many such chunks there are for any given N, so
you end up searching manually for an appropriate N.

   The other way is to tell balance exactly how many chunks it should
operate on. This is the limit=N option. This gives you precise control
over the number of chunks to balance, but doesn't specify which
chunks, so you may end up moving N GiB of data (whereas usage=N could
move much less actual data).
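
   To illustrate the trade-off, here's a toy Python model of the two filters. The chunk fill levels are invented, chunks are assumed to be 1 GiB each, and the real balance picks chunks in its own order, so treat this as a sketch rather than a description of what the kernel does.

```python
CHUNK_MIB = 1024  # assumed size of one data chunk


def moved_by_usage(fill_percent, threshold):
    """MiB rewritten when only chunks filled below `threshold` percent are balanced."""
    picked = [p for p in fill_percent if p < threshold]
    return len(picked), sum(p * CHUNK_MIB // 100 for p in picked)


def moved_by_limit(fill_percent, limit):
    """MiB rewritten when the first `limit` chunks are balanced, whatever their fill."""
    picked = fill_percent[:limit]
    return len(picked), sum(p * CHUNK_MIB // 100 for p in picked)


# Hypothetical fill levels (percent) of eight data chunks.
chunks = [95, 10, 80, 5, 90, 20, 100, 15]

print(moved_by_usage(chunks, 30))  # (4, 510)
print(moved_by_limit(chunks, 4))   # (4, 1944)
```

   Both runs touch four chunks, but in this made-up layout the usage filter rewrites about a quarter of the data that the limit filter does.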

   Personally, I recommend using limit=N, where N is something like
(Allocated - Used)*3/4 GiB.
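
   As a worked example of that rule of thumb, here's a small Python sketch. The function name is just for illustration, and the figures are a guess at your situation from the description above, not real output from your filesystem.

```python
def suggested_limit(allocated_gib: float, used_gib: float) -> int:
    """Rule of thumb: limit = (Allocated - Used) * 3/4, in whole chunks of about 1 GiB."""
    return max(0, int((allocated_gib - used_gib) * 3 / 4))


# Assumed figures: about 410 GiB of data chunks allocated, roughly 25% used.
allocated = 410
used = allocated * 0.25
print(suggested_limit(allocated, used))  # 230
```

   With those assumed numbers, the filter would come out as something like `-dlimit=230`.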

   Note the caveat below, which is that using "ssd" mount option on
earlier kernels could prevent the balance from doing a decent job.

> The other mystery is how the data allocation became so large.

   You have a non-rotational device. That means that it'd be mounted
automatically with the "ssd" mount option. Up to 4.13 (or 4.14, I
always forget), the behaviour of "ssd" leads to highly fragmented
allocation of extents, which in turn results in new data chunks being
allocated when there's theoretically loads of space available to use
(but which it may not be practical to use, due to the fragmented free
space).

   After 4.13 (or 4.14), the "ssd" mount option has been fixed, and it
no longer has the bad long-term effects that we've seen before, but it
won't deal with the existing fragmented free space without a data
balance.

   If you're running an older kernel, it's definitely recommended to
mount all filesystems with "nossd" to avoid these issues.
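
   If you want to check that programmatically, a rough Python sketch follows. The helper names are made up, and the cut-off of 4.14 is an assumption given my uncertainty above; adjust `fixed_in` if the fix turns out to have landed in a different release.

```python
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '4.9.0-5-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def should_mount_nossd(release: str, fixed_in: tuple = (4, 14)) -> bool:
    """True when the kernel predates the assumed 'ssd' allocator fix."""
    return kernel_version(release) < fixed_in


print(should_mount_nossd("4.9.0-5-amd64"))  # True
print(should_mount_nossd("4.15.0-rc7"))     # False
```

   On a live system the release string can be taken from `uname -r`, or from `platform.release()` in Python.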

   Hugo.

-- 
Hugo Mills             | As long as you're getting different error messages,
hugo@... carfax.org.uk | you're making progress.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-08 21:43 Tom Worster
  2018-01-08 22:18 ` Hugo Mills
@ 2018-01-09 12:23 ` Austin S. Hemmelgarn
  2018-01-09 14:16   ` Tom Worster
  1 sibling, 1 reply; 34+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-09 12:23 UTC (permalink / raw)
  To: Tom Worster, linux-btrfs

On 2018-01-08 16:43, Tom Worster wrote:
> On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:
> 
>> On 2018-01-08 11:20, ein wrote:
>>
>> > On 01/08/2018 04:55 PM, Austin S. Hemmelgarn wrote:
>> >
>> > > [...]
>> > >
>> > > And here's the FAQ entry:
>> > >
>> > > Q: Do I need to run a balance regularly?
>> > >
>> > > A: In general usage, no. A full unfiltered balance typically takes a
>> > > long time, and will rewrite huge amounts of data unnecessarily. You may
>> > > wish to run a balance on metadata only (see Balance_Filters) if you find
>> > > you have very large amounts of metadata space allocated but unused, but
>> > > this should be a last resort.
>> >
>> > IMHO three more sentences and the answer would be more useful:
>> > 1. BTRFS balance command example with a note to check the man page first.
>> > 2. What use case may cause 'large amounts of metadata space allocated
>> > but unused'.
>>
>> That's kind of what I was thinking as well, but I'm hesitant to get 
>> too heavily into stuff along the lines of 'for use case X, do 1, for 
>> use case Y, do 2, etc', as that tends to result in pigeonholing 
>> (people just go with what sounds closest to their use case instead of 
>> trying to figure out what actually is best for their use case).
>>
>> Ideally, I think it should be as generic as reasonably possible, 
>> possibly something along the lines of:
>>
>> A: While not strictly necessary, running regular filtered balances 
>> (for example `btrfs balance start -dusage=50 -dlimit=2 -musage=50 
>> -mlimit=4`, see `man btrfs-balance` for more info on what the options 
>> mean) can help keep a volume healthy by mitigating the things that 
>> typically cause ENOSPC errors. Full balances by contrast are long and 
>> expensive operations, and should be done only as a last resort.
> 
> As the BTRFS noob who started the conversation on netdata's Github 
> issues, I'd like to describe my experience.
> 
> I got an alert that unallocated space on a BTRFS filesystem on one host 
> was low. A netdata caption suggested btrfs-balance and directed me to 
> its man page. But I found it hard to understand since I don't know how 
> BTRFS works or its particular terminology. The FAQ was easier to 
> understand but didn't help me find a solution to my problem.
> 
> It's a 420GiB NVMe with single data and metadata. It has a MariaDB 
> datadir with an OLTP workload and a small GlusterFS brick for 
> replicating a filesystem with little activity. I recall that unallocated 
> space was under 2G, metadata allocation was low, a few G and about 1/3 
> used. Data allocation was very large, almost everything else, with ~25% 
> used.
> 
> Given the documentation and the usage stats, I did not know what options 
> to use with balance. I spent some time reading and researching and 
> trying to understand the filters and how they should relate to my 
> situation. Eventually I abandoned that effort and ran balance without 
> options.
Hopefully the explanation I gave on the filters in the Github issue 
helped some.  In this case though, it sounds like running a filtered 
balance probably wouldn't have saved you much over a full one.
> 
> While general recommendations about running balance would be welcome, 
> what I needed was a dummy's guide to what the output of btrfs usage 
> _means_ and how to use balance to tackle problems with it.
This really is a great point.  Our documentation does a decent job as a 
reference for people who already have some idea what they're doing, but 
it really is worthless for people who have no prior experience.
> 
> The other mystery is how the data allocation became so large.
The most common case is that you had a lot of data on the device, and 
then deleted most of it.  Unless a chunk becomes completely empty 
(either because the data that was in it becomes completely unused, or 
because a balance moved all the data), it won't be automatically deleted 
by the kernel, so it's not unusual for filesystems that have been very 
active (especially if they have the 'ssd' mount option set, which 
happens automatically on most SSDs and a lot of other things the kernel 
marks as not being rotational media) to have a reasonably large amount 
of empty space scattered around the data chunks.
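
For what it's worth, here is a minimal sketch of the "look first, then do 
a small filtered balance" routine. The filters are the ones from the 
example quoted above; the mount point, the helper name `balance_plan` and 
the default threshold of 50 are placeholders of mine, not anything 
official. It only prints the commands, so you can read them before 
deciding to run anything:

```shell
#!/bin/sh
# Hypothetical helper: print (do NOT run) the commands for inspecting a
# volume and then doing a small filtered balance on it.
#   $1 = mount point (placeholder, substitute your own)
#   $2 = usage threshold in percent (assumed default: 50)
balance_plan() {
    mnt="$1"
    usage="${2:-50}"
    # Shows allocated vs. used space per chunk type, plus unallocated space.
    echo "btrfs filesystem usage $mnt"
    # Rewrites at most 2 data and 4 metadata chunks that are no more than
    # $usage percent full, which frees mostly-empty chunks cheaply.
    echo "btrfs balance start -dusage=$usage -dlimit=2 -musage=$usage -mlimit=4 $mnt"
}

balance_plan /mnt/data
```

If "Data" in the usage output shows far more allocated than used while 
unallocated space is nearly gone, that is the situation described above, 
and the second command is the cheap thing to try before a full balance.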


* Re: Recommendations for balancing as part of regular maintenance?
  2018-01-09 12:23 ` Austin S. Hemmelgarn
@ 2018-01-09 14:16   ` Tom Worster
  0 siblings, 0 replies; 34+ messages in thread
From: Tom Worster @ 2018-01-09 14:16 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

On 9 Jan 2018, at 7:23, Austin S. Hemmelgarn wrote:

> On 2018-01-08 16:43, Tom Worster wrote:
>>
>> Given the documentation and the usage stats, I did not know what 
>> options to use with balance. I spent some time reading and 
>> researching and trying to understand the filters and how they should 
>> relate to my situation. Eventually I abandoned that effort and ran 
>> balance without options.
> Hopefully the explanation I gave on the filters in the Github issue 
> helped some.  In this case though, it sounds like running a filtered 
> balance probably wouldn't have saved you much over a full one.

Yes, it helped. Hugo's email helped too. I now have a better 
understanding of balance filters.

At the same time, Hugo's email and others in this thread added to my 
belief that I'm now managing systems with a filesystem I'm unqualified 
to use.

Tom


end of thread, other threads:[~2018-01-16 12:57 UTC | newest]

Thread overview: 34+ messages
2018-01-08 15:55 Recommendations for balancing as part of regular maintenance? Austin S. Hemmelgarn
2018-01-08 16:20 ` ein
2018-01-08 16:34   ` Austin S. Hemmelgarn
2018-01-08 18:17     ` Graham Cobb
2018-01-08 18:34       ` Austin S. Hemmelgarn
2018-01-08 20:29         ` Martin Raiber
2018-01-09  8:33           ` Marat Khalili
2018-01-09 12:46             ` Austin S. Hemmelgarn
2018-01-10  3:49               ` Duncan
2018-01-10 16:30                 ` Tom Worster
2018-01-10 17:01                   ` Austin S. Hemmelgarn
2018-01-10 18:33                     ` Tom Worster
2018-01-10 20:44                       ` Timofey Titovets
2018-01-11 13:00                         ` Austin S. Hemmelgarn
2018-01-11  8:51                     ` Duncan
2018-01-10  4:38       ` Duncan
2018-01-10 12:41         ` Austin S. Hemmelgarn
2018-01-11 20:12         ` Hans van Kranenburg
2018-01-10 21:37 ` waxhead
2018-01-11 12:50   ` Austin S. Hemmelgarn
2018-01-11 19:56   ` Hans van Kranenburg
2018-01-12 18:24 ` Austin S. Hemmelgarn
2018-01-12 19:26   ` Tom Worster
2018-01-12 19:43     ` Austin S. Hemmelgarn
2018-01-13 22:09   ` Chris Murphy
2018-01-15 13:43     ` Austin S. Hemmelgarn
2018-01-15 18:23     ` Tom Worster
2018-01-16  6:45       ` Chris Murphy
2018-01-16 11:02         ` Andrei Borzenkov
2018-01-16 12:57         ` Austin S. Hemmelgarn
2018-01-08 21:43 Tom Worster
2018-01-08 22:18 ` Hugo Mills
2018-01-09 12:23 ` Austin S. Hemmelgarn
2018-01-09 14:16   ` Tom Worster
