All of lore.kernel.org
 help / color / mirror / Atom feed
From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: how to run balance successfully (No space left on device)?
Date: Tue, 19 Sep 2017 02:59:19 +0000 (UTC)	[thread overview]
Message-ID: <pan$7ed29$8cce4f45$b9679fc8$79ad166d@cox.net> (raw)
In-Reply-To: d4dce3d04c11e171b44a1924114f5ddd@wpkg.org

Tomasz Chmielewski posted on Mon, 18 Sep 2017 18:27:09 +0900 as excerpted:

> And perhaps more important - can I assume that right now, with the
> latest stable kernel (4.13.2 right now), running "btrfs balance" is not
> safe and can lead to data corruption or loss?
> 
> 
> Consider the following case:
> 
> - system admin runs btrfs balance on a filesystem with 100 GB free and
> assumes it is enough space to complete successfully
> 
> - btrfs balance fails due to some bug with "No space left on device"
> 
> - at the same time, a database using this filesystem will fail with "No
> space left on device", apt/rpm will fail a package upgrade, some program
> using temp space will fail, log collector will fail to catch some data,
> because of "No space left on device" and so on?

To the best of my knowledge that shouldn't be a problem, certainly not 
one I'd worry about if you're following the sysadmin's first rule of 
backups, the true value of data to you is defined not by any claims but 
by the number of backups you consider it worth having of that data, so it 
follows that no backups means you've defined the data as worth less than 
the time/trouble/resources it would take to create at least that one 
backup.

The ENOSPC is because the internal calculation for the reserved-space 
requirement is buggy ATM, but AFAIK it's just that, an /internal/ 
calculation, that goes waayyy wild, and stops any action it's going to 
stop before it goes anywhere -- it doesn't get to the point of affecting 
anything else because the reserve space calculation goes wild and stops 
it before it can actually reserve the space.

Talking about which... I've not seen it mentioned in the bug discussion, 
but I wonder if doing a btrfs balance start -d, followed by a another 
balance with -m replacing the -d, thus separating the data and metadata 
balances, might work around the problem.  At least you could know for 
sure which is causing it that way, and complete a balance of the other 
one.  And if that blocks on one or the other, you could split the job up 
further using the devid= and drange= filters (see the btrfs-balance 
manpage), doing only part of the filesystem at a time.  My speculation is 
that you should be able to divide the operation up enough so that even if 
the reserve space calculation is off, it'll still complete.

Meanwhile, I don't believe it's just balance that's affected, either, tho 
it's the most commonly reported.  By my understanding, any sufficiently 
large operation could trigger it, tho obviously a full btrfs balance is 
about the largest operation a btrfs is likely to have, so it stands to 
reason that would trigger it more reliably than common generic filesystem 
operations.

Of course if you're paranoid, you can refrain from doing balances until 
you know the bug is fixed, but then I'd have to ask, if you're that 
paranoid of a filesystem failure, why are you running the still 
stabilizing, not yet entirely stable and mature, btrfs, in the first 
place?  Seems a bit like the folks still running RHEL/CentOS 6 with their 
stable kernels because they want stability, yet choosing to run the still 
not entirely stable btrfs, definitely not entirely stable on that old a 
kernel, on top of them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


  parent reply	other threads:[~2017-09-19  2:59 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-17 15:02 how to run balance successfully (No space left on device)? Tomasz Chmielewski
2017-09-18  1:50 ` Duncan
2017-09-18  8:20 ` Tomasz Chmielewski
2017-09-18  8:29   ` Andrei Borzenkov
2017-09-18  9:27     ` Tomasz Chmielewski
2017-09-18 13:44       ` Peter Becker
2017-09-18 13:50         ` Tomasz Chmielewski
2017-09-19  2:59       ` Duncan [this message]
2017-10-31 14:18   ` Tomasz Chmielewski
2017-10-31 14:51     ` Tomasz Chmielewski
2017-11-07  5:13     ` Tomasz Chmielewski
     [not found]       ` <CAJtFHUQ34uyt-iAQKuQ-WqXMrCqxsPeqFc5LvYmZHrz+Rxs66A@mail.gmail.com>
2017-11-10  7:42         ` Tomasz Chmielewski
2017-11-10 21:51           ` Chris Murphy
2017-11-10 22:18             ` Martin Raiber

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='pan$7ed29$8cce4f45$b9679fc8$79ad166d@cox.net' \
    --to=1i5t5.duncan@cox.net \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.