From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: how to run balance successfully (No space left on device)?
Date: Tue, 19 Sep 2017 02:59:19 +0000 (UTC)
Message-ID: <pan$7ed29$8cce4f45$b9679fc8$79ad166d@cox.net>
In-Reply-To: d4dce3d04c11e171b44a1924114f5ddd@wpkg.org
Tomasz Chmielewski posted on Mon, 18 Sep 2017 18:27:09 +0900 as excerpted:
> And perhaps more important - can I assume that right now, with the
> latest stable kernel (4.13.2 right now), running "btrfs balance" is not
> safe and can lead to data corruption or loss?
>
>
> Consider the following case:
>
> - system admin runs btrfs balance on a filesystem with 100 GB free and
> assumes it is enough space to complete successfully
>
> - btrfs balance fails due to some bug with "No space left on device"
>
> - at the same time, a database using this filesystem will fail with "No
> space left on device", apt/rpm will fail a package upgrade, some program
> using temp space will fail, log collector will fail to catch some data,
> because of "No space left on device" and so on?
To the best of my knowledge that shouldn't be a problem, and certainly not
one I'd worry about if you're following the sysadmin's first rule of
backups: the true value of data to you is defined not by any claims, but
by the number of backups you consider it worth having of that data. It
follows that no backups means you've defined the data as worth less than
the time/trouble/resources it would take to create at least that one
backup.
The ENOSPC is because the internal calculation for the reserved-space
requirement is buggy ATM, but AFAIK it's just that, an /internal/
calculation that goes way wild. It stops the operation before any space
is actually reserved, so it never gets to the point of affecting
anything else on the filesystem.
Speaking of which... I've not seen it mentioned in the bug discussion,
but I wonder if doing a btrfs balance start -d, followed by another
balance with -m replacing the -d, thus separating the data and metadata
balances, might work around the problem. At least that way you'd know for
sure which one is causing it, and could complete a balance of the other
one. And if that blocks on one or the other, you could split the job up
further using the devid= and drange= filters (see the btrfs-balance
manpage), doing only part of the filesystem at a time. My speculation is
that you should be able to divide the operation up enough so that even if
the reserve space calculation is off, it'll still complete.
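As a rough illustration of that idea, here is a dry-run sketch that only
prints the balance commands rather than running them. The helper names
(split_balance, balance_slices), the mountpoint, the devid and the byte
sizes are all made up for the example; only the -d, -m, devid= and drange=
options come from the btrfs-balance manpage. Whether slicing this way
actually sidesteps the ENOSPC bug is untested speculation.

```shell
#!/bin/sh
# Dry-run sketch: prints commands, executes nothing.
# All names and values here are hypothetical placeholders.

# Print separate data-only and metadata-only balance commands.
split_balance() {
    mnt=$1
    echo "btrfs balance start -d ${mnt}"
    echo "btrfs balance start -m ${mnt}"
}

# Print one data-balance command per byte-range slice of one device.
# Args: mountpoint, devid, device size in bytes, slice size in bytes.
balance_slices() {
    mnt=$1
    devid=$2
    dev_bytes=$3
    slice_bytes=$4
    start=0
    while [ "$start" -lt "$dev_bytes" ]; do
        end=$((start + slice_bytes))
        if [ "$end" -gt "$dev_bytes" ]; then
            end=$dev_bytes
        fi
        # drange selects block groups overlapping this byte range on the
        # device; a block group straddling a boundary may match two slices.
        echo "btrfs balance start -ddevid=${devid},drange=${start}..${end} ${mnt}"
        start=$end
    done
}
```

For instance, balance_slices /mnt/data 1 250 100 would print three
commands, covering 0..100, 100..200 and 200..250 (toy numbers; a real
device would use sizes in the GiB range, taken from something like
btrfs filesystem show). You'd then review the printed commands and run
them one at a time, checking after each whether it hit ENOSPC.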
Meanwhile, I don't believe it's just balance that's affected, either, tho
it's the most commonly reported. By my understanding, any sufficiently
large operation could trigger it, tho obviously a full btrfs balance is
about the largest operation a btrfs is likely to have, so it stands to
reason that would trigger it more reliably than common generic filesystem
operations.
Of course if you're paranoid, you can refrain from doing balances until
you know the bug is fixed, but then I'd have to ask: if you're that
paranoid about a filesystem failure, why are you running the still
stabilizing, not yet entirely stable and mature btrfs in the first
place? It seems a bit like the folks still running RHEL/CentOS 6 with
their old stable kernels because they want stability, yet choosing to run
btrfs on top of them, when btrfs is still not entirely stable anywhere and
definitely not on that old a kernel.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman