From: Chris Murphy <lists@colorremedies.com>
To: Tomasz Chmielewski <mangoo@wpkg.org>
Cc: E V <eliventer@gmail.com>, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: how to run balance successfully (No space left on device)?
Date: Fri, 10 Nov 2017 14:51:58 -0700
Message-ID: <CAJCQCtS3kR0xeKCpMdk9mivJ+8JyToPVoKMmUUX2sp2-L1PpfA@mail.gmail.com>
In-Reply-To: <011ae8c4281f0f8799d48189f540a302@wpkg.org>

On Fri, Nov 10, 2017 at 12:42 AM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> On 2017-11-07 23:49, E V wrote:
>
>> Hmm, I used to see these phantom no space issues quite a bit on older
>> 4.x kernels, and haven't seen them since switching to space_cache=v2.
>> So it could be space cache corruption. You might try either clearing
>> your space cache, or mounting with nospace_cache, or try converting to
>> space_cache=v2 after reading up on its caveats.
>
>
> We have space_cache=v2.


I have no idea if it's related or not, as this isn't a default mount
option and is still under testing.



>
> Unfortunately yet one more system running 4.14-rc8 with "No space left"
> during balance:
>
>
> [68443.535664] BTRFS info (device sdb3): relocating block group 591771009024 flags data|raid1
> [68463.203330] BTRFS info (device sdb3): found 8578 extents
> [68492.238676] BTRFS info (device sdb3): found 8559 extents
> [68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance
>
>
> # btrfs balance start /var/lib/lxd
> WARNING:
>
>         Full balance without filters requested. This operation is very
>         intense and takes potentially very long. It is recommended to
>         use the balance filters to narrow down the balanced data.
>         Use 'btrfs balance start --full-balance' option to skip this
>         warning. The operation will start in 10 seconds.
>         Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting balance without any filters.
> ERROR: error during balancing '/var/lib/lxd': No space left on device
> There may be more info in syslog - try dmesg | tail


OK, I wonder if this is a bug in the user space tool's error handling.
What you have in the kernel messages is BTRFS info; it is not a
warning or an error. I interpret this as: an enospc error happened but
was recovered from, so it was not an unhandled error condition, and
definitely non-fatal. But the user space tool is reporting a bogus "No
space left on device". It's plainly bogus because you have a lot of
space on the device, including unallocated space. So the user space
tool needs to either ignore this type of informational enospc, or it
needs a different message to make it clear this is not a fatal error
and was properly handled.
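
To make the info-versus-error distinction concrete, here is a minimal
sketch that groups btrfs kernel messages by the level printed in each
line. The regular expression assumes the line layout shown in the
quoted log above; the names `BTRFS_LINE` and `classify` are
hypothetical and not part of any btrfs tool.

```python
import re

# Assumed layout, taken from the quoted log:
# [68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance
BTRFS_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+BTRFS\s+(?P<level>\w+)\s+"
    r"\(device\s+(?P<dev>[^)]+)\):\s+(?P<msg>.*)$"
)


def classify(lines):
    """Group btrfs kernel messages by their printed level (info, warning, ...)."""
    by_level = {}
    for line in lines:
        match = BTRFS_LINE.match(line.strip())
        if match:
            by_level.setdefault(match.group("level"), []).append(match.group("msg"))
    return by_level


sample = [
    "[68463.203330] BTRFS info (device sdb3): found 8578 extents",
    "[68492.238676] BTRFS info (device sdb3): found 8559 extents",
    "[68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance",
]
levels = classify(sample)
print(sorted(levels))        # which levels appear at all
print(levels["info"][-1])    # the enospc line is logged at info level
```

With the three lines from the report, every message lands under
"info", including the enospc count, and nothing lands under "warning"
or "error".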

Do you get any additional information if you mount with the
enospc_debug option and reproduce the problem?
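
A rough sketch of that reproduction as a script, for reference. It
assumes enospc_debug can be applied with a remount (if not, add it to
the mount options at mount time), that it runs as root, and that the
mount point is the /var/lib/lxd from the report. The helper names
`repro_commands` and `run_repro` are hypothetical.

```python
import subprocess


def repro_commands(mountpoint):
    """Return the commands, in order, for reproducing with enospc_debug enabled."""
    return [
        # Assumption: the option is accepted on remount.
        ["mount", "-o", "remount,enospc_debug", mountpoint],
        # --full-balance skips the 10 second warning shown in the report.
        ["btrfs", "balance", "start", "--full-balance", mountpoint],
        # Collect whatever extra detail the kernel logged.
        ["dmesg"],
    ]


def run_repro(mountpoint):
    """Run each step and return the output of the last one (the kernel log).

    The balance is expected to exit non-zero when the problem reproduces,
    so failures are not treated as fatal here.
    """
    output = ""
    for argv in repro_commands(mountpoint):
        result = subprocess.run(argv, capture_output=True, text=True)
        output = result.stdout
    return output


for argv in repro_commands("/var/lib/lxd"):
    print(" ".join(argv))
```

Only the command list is printed here; calling `run_repro` actually
starts a full balance, so it is left to the reader.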


> Unallocated:
>    /dev/sda3     112.00GiB
>    /dev/sdb3     112.00GiB

Metric shittons of space. The error is certainly bogus.
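
The arithmetic behind that, as a small sketch: parse the quoted
"Unallocated:" lines and check that at least two devices could hold
another raid1 chunk. The 1 GiB chunk size is an illustrative
assumption, the parser only handles GiB values as they appear in the
quote, and the function names are hypothetical.

```python
import re

# The snippet quoted from the report.
USAGE_SNIPPET = """\
Unallocated:
   /dev/sda3     112.00GiB
   /dev/sdb3     112.00GiB
"""

DEVICE_LINE = re.compile(r"^\s*(/dev/\S+)\s+([0-9.]+)GiB\s*$")


def unallocated_gib(text):
    """Map each device to its unallocated space in GiB."""
    sizes = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            sizes[match.group(1)] = float(match.group(2))
    return sizes


def can_allocate_raid1_chunk(sizes, chunk_gib=1.0):
    """raid1 keeps two copies, so two devices need room for a chunk.

    chunk_gib is an assumed size for illustration only.
    """
    return sum(1 for free in sizes.values() if free >= chunk_gib) >= 2


sizes = unallocated_gib(USAGE_SNIPPET)
print(sizes)
print(sum(sizes.values()), "GiB unallocated in total")
print(can_allocate_raid1_chunk(sizes))
```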



> Combined with evidence that "No space left on device" during balance can
> lead to various file corruption (we've witnessed it with MySQL), I'd say
> btrfs balance is a dangerous operation and decision to use it should be
> considered very thoroughly.

I've never heard of this. Balance is COW at the chunk level. The old
chunk is not dereferenced until its contents have been written
correctly to the new location. Corruption during balance shouldn't be
possible, so if you have a reproducer, the devs need to know about it.
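
The copy-then-switch ordering can be illustrated with a toy model.
This is not btrfs code and does not reflect its on-disk structures;
`ToyVolume` and its methods are hypothetical. It only shows why a
relocation that fails partway leaves the original data reachable.

```python
class ToyVolume:
    """Toy model: logical chunks map to physical locations holding bytes."""

    def __init__(self):
        self.physical = {}  # physical location -> data
        self.mapping = {}   # logical chunk id -> physical location

    def write_chunk(self, logical, location, data):
        self.physical[location] = data
        self.mapping[logical] = location

    def read_chunk(self, logical):
        return self.physical[self.mapping[logical]]

    def relocate(self, logical, new_location, fail=False):
        """Copy first, verify, and only then drop the old copy."""
        old_location = self.mapping[logical]
        data = self.physical[old_location]
        if fail:
            # Simulated failure (for example ENOSPC) before the copy completes.
            # The mapping has not been touched, so the old copy stays live.
            return False
        self.physical[new_location] = data
        if self.physical[new_location] != data:
            del self.physical[new_location]
            return False
        # The new copy is verified: switch the reference, then free the old one.
        self.mapping[logical] = new_location
        del self.physical[old_location]
        return True


vol = ToyVolume()
vol.write_chunk("bg-1", "old", b"mysql pages")
print(vol.relocate("bg-1", "new", fail=True), vol.mapping["bg-1"])
print(vol.relocate("bg-1", "new"), vol.mapping["bg-1"])
print(vol.read_chunk("bg-1"))
```

In this model a failed relocation returns False with the mapping still
pointing at the old location, and the data reads back unchanged both
before and after a successful move.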





-- 
Chris Murphy
