From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Recommendations for balancing as part of regular maintenance?
Date: Wed, 10 Jan 2018 03:49:04 +0000 (UTC)
Message-ID: <pan$5d0d3$29334af9$ed04365a$f8d9747d@cox.net>
In-Reply-To: 13b5063c-a7bd-5c95-1f6e-16124d385569@gmail.com

Austin S. Hemmelgarn posted on Tue, 09 Jan 2018 07:46:48 -0500 as
excerpted:

>> On 08/01/18 23:29, Martin Raiber wrote:
>>> There have been reports of (rare) corruption caused by balance (won't
>>> be detected by a scrub) here on the mailing list. So I would stay
>>> away from btrfs balance unless it is absolutely needed (ENOSPC), and
>>> while it is run I would try not to do anything else wrt. writes
>>> simultaneously.
>> 
>> This is my opinion too as a normal user, based upon reading this list
>> and my own attempts to recover from ENOSPC. I'd rather re-create the
>> filesystem from scratch, or at least make a full verified backup, before
>> attempting to fix problems with balance.

> While I'm generally of the same opinion (and I have a feeling most other
> people who have been server admins are too), it's not a very user
> friendly position to recommend that.  Keep in mind that many (probably
> most) users don't keep proper backups, and just targeting 'sensible'
> people as your primary audience is a bad idea.  It also needs to work,
> at least at a basic level, simply because you can't always
> just nuke the volume and rebuild it from scratch.
> 
> Personally though, I don't think I've ever seen issues with balance
> corrupting data, and I don't recall seeing complaints about it either
> (though I would love to see some links that prove me wrong).

AFAIK, such corruption reports aren't really caused by balance, per se,
at all.

Instead, what I've seen in nearly all cases is a number of filesystem 
maintenance commands involving heavy I/O colliding, that is, being run at 
the same time, possibly because some of them are scheduled, and the admin 
didn't take into account scheduled commands when issuing others manually.

I don't believe anyone would recommend running balance, scrub,
snapshot-deletion, and backups (rsync or btrfs send/receive being the
common ones) all at the same time, or even two or more at the same time.
If for no other reason, they're all IO intensive, and running just /one/
of them at a time is hard /enough/ on the system and on the performance
of anything else running alongside it, even when all components are
fully stable and mature (and as we all know, btrfs is stabilizing, but
not yet fully stable and mature).  Yet that's what these sorts of
reports nearly always involve.

Of course, btrfs /should/ be able to handle more than one of these at
once without corruption, because anything else is a bug.  But btrfs /is/
still stabilizing and maturing, and bugs of precisely this sort, rare
corner-case races triggered when more than one extremely heavy-IO
filesystem maintenance command runs at the same time, tend to be the
last found and fixed.  They /are/ rare corner-cases, often depending on
race conditions, so they're rarely reported and then extremely difficult
to reproduce.  That's exactly the type of bug that tends to remain
around at this point.


So rather than discouraging a sanely filtered regular balance (which I'll
discuss in a different reply), I'd suggest that the saner recommendation
is to be aware of other major-IO filesystem maintenance commands,
including scheduled ones, and to only run one at a time.  That means not
just btrfs commands but rsync-based backups, etc, too, rsync being
demanding enough on its own to have triggered a number of btrfs bug
reports and fixes over the years.

IOW, don't do a balance if your scheduled backup or snapshot-deletion is 
about to kick in.  One at a time is stressful enough on the filesystem
and hardware; don't compound the problem by trying to do two or more at
once!
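
One way to enforce "only one at a time" across both scheduled and manual
runs is to have every heavy-IO job take the same advisory lock first.
What follows is only a minimal sketch of that idea, not anything btrfs
itself provides: the lock path and the helper names maintenance_lock and
run_exclusive are made up for illustration, and it assumes a Linux host
where Python's fcntl.flock is available.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager

# Hypothetical lock file; any path that all maintenance jobs agree on works.
LOCK_PATH = "/var/lock/fs-maintenance.lock"


@contextmanager
def maintenance_lock(path=LOCK_PATH):
    """Hold an exclusive, non-blocking flock on `path` inside the block.

    Raises BlockingIOError if another job already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def run_exclusive(cmd, path=LOCK_PATH):
    """Run `cmd` (an argv list) only if no other maintenance job is active.

    Returns the command's exit status, or None if the lock was busy.
    """
    try:
        with maintenance_lock(path):
            return subprocess.run(cmd).returncode
    except BlockingIOError:
        return None
```

If the cron jobs and any manual invocation all go through run_exclusive,
for example run_exclusive(["btrfs", "scrub", "start", "-B", "/mnt"]) with
/mnt standing in for the real mount point, a second job started while the
first is still running gets None back and can be skipped or retried later
instead of colliding.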

So assuming a weekly schedule, do one a day of balance, scrub,
snapshot-deletion, and backups, after ensuring that none of them takes
over a day.  Balance in particular could at TiB-scale+ if not sanely
filtered, particularly if quotas are enabled, due to the scaling issues
of that feature.  And if any of those are scheduled daily or more
frequently, space the scheduling appropriately and ensure each is done
before the next task starts.
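
The "ensure they're done before starting the next task" check can be done
on paper, but a small sketch may make it concrete.  The function name
find_collisions and the example start times and durations below are
purely hypothetical; real durations would have to be measured on the
filesystem in question.

```python
from datetime import timedelta


def find_collisions(schedule):
    """Return pairs of task names whose time windows overlap.

    `schedule` is a list of (name, start, duration) tuples, where start
    and duration are timedelta offsets from the beginning of the week.
    Wrap-around past the end of the week is not handled in this sketch.
    """
    ordered = sorted(schedule, key=lambda task: task[1])
    collisions = []
    for i, (name_a, start_a, duration_a) in enumerate(ordered):
        end_a = start_a + duration_a
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b < end_a:  # the later task starts before this one ends
                collisions.append((name_a, name_b))
    return collisions


# Hypothetical plan: one task per day at 02:00, with made-up runtimes.
plan = [
    ("balance", timedelta(days=0, hours=2), timedelta(hours=30)),
    ("scrub", timedelta(days=1, hours=2), timedelta(hours=3)),
    ("backup", timedelta(days=2, hours=2), timedelta(hours=5)),
]
print(find_collisions(plan))
```

With these made-up numbers, the 30-hour balance started Monday at 02:00
is still running when Tuesday's scrub begins, so that pair is reported;
either the balance needs tighter filters or the scrub needs to move.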

And keep in mind the scheduled tasks when running things manually, so as 
not to collide there either.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

