linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Giovanni Biscuolo <g@xelera.eu>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: how to recover from "enospc errors during balance"
Date: Tue, 29 Sep 2020 20:04:26 -0400
Message-ID: <20200930000417.GH5890@hungrycats.org>
In-Reply-To: <87r1qk4q4d.fsf@roquette.i-did-not-set--mail-host-address--so-tickle-me>

On Tue, Sep 29, 2020 at 04:25:06PM +0200, Giovanni Biscuolo wrote:
> Hello,
> 
> please also reply to me since I'm not subscribed to linux-btrfs, thanks!
> 
> My BTRFS filesystem is full, I got ENOSPC during a (scheduled) balance:
> 
> --8<---------------cut here---------------start------------->8---
> 
> [6928066.755704] BTRFS info (device sda3): balance: start -dusage=50 -musage=70 -susage=70

Never balance metadata on a schedule.  If it is done too often, and the
disk fills up, it will eventually lead to ENOSPC errors that are hard
to get out of...

> [6928066.760485] BTRFS info (device sda3): relocating block group 139449073664 flags metadata|raid1
> [6928075.142462] BTRFS: error (device sda3) in btrfs_drop_snapshot:5421: errno=-28 No space left

...like this one.

> [6928075.146566] BTRFS info (device sda3): forced readonly
> [6928075.150851] BTRFS info (device sda3): 2 enospc errors during balance
> [6928075.155422] BTRFS info (device sda3): balance: ended with status: -30
> [6928083.483820] BTRFS info (device sda3): delayed_refs has NO entry
> 
> --8<---------------cut here---------------end--------------->8---
> 
> and now it's mounted read-only:
> 
> --8<---------------cut here---------------start------------->8---
> 
> /dev/sda3 on / type btrfs (ro,relatime,ssd,space_cache,subvolid=5,subvol=/)
> /dev/sda3 on /gnu/store type btrfs (ro,relatime,ssd,space_cache,subvolid=5,subvol=/gnu/store)
> 
> --8<---------------cut here---------------end--------------->8---
> 
> If I try to remount rw (to try to free space) I get:
> 
> --8<---------------cut here---------------start------------->8---
> 
> [7323937.312122] BTRFS info (device sda3): disk space caching is enabled
> [7323937.316478] BTRFS error (device sda3): Remounting read-write after error is not allowed
> 
> --8<---------------cut here---------------end--------------->8---
> 
> I tried to add a new device (I have 2 spare disks) but it does not work
> with a read-only filesystem.
> 
> Please how can I remount the filesystem read-write and free some space
> deleting some files?

Add 'skip_balance' to mount options so that the next mount will not
attempt to resume balancing metadata.  Keep mounting and umounting
(not remounting) until it completes orphan and relocation cleanup (it
may take more than one attempt, probably fewer than 20 attempts).

Once you have the filesystem mounted, run 'btrfs balance cancel' on
the mount point.  Then edit your maintenance scripts and remove the
metadata balance (-m flag to 'btrfs balance start').
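
The repeated mount attempts described above could be wrapped in a small
helper, roughly like the sketch below.  The function name
retry_until_ok, the limit of 20 and the mount point /mnt are illustrative
assumptions, not anything btrfs provides.  Since the affected filesystem
is the root filesystem here, this would have to be run as root from a
rescue environment.

```shell
#!/bin/sh
# Sketch only: run a command repeatedly until it succeeds or a limit is hit.
# retry_until_ok is a hypothetical helper name, not part of btrfs-progs.

# retry_until_ok MAX CMD...: run CMD up to MAX times, stop at first success.
retry_until_ok() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "gave up after $max attempts"
    return 1
}

# Intended use (as root, from a rescue system; not executed here).
# /mnt is an assumed mount point:
#   retry_until_ok 20 mount -o skip_balance /dev/sda3 /mnt
#   btrfs balance cancel /mnt
```

If a mount succeeds but the filesystem later goes read-only again during
cleanup, umount it and run the helper again instead of remounting.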

> Additional data:
> 
> --8<---------------cut here---------------start------------->8---
> 
> ~$ uname -a
> Linux myhost 5.4.50-gnu #1 SMP 1 x86_64 GNU/Linux
> 
> ~$ btrfs --version
> btrfs-progs v5.6
> 
> ~$ sudo btrfs balance status /
> No balance found on '/'
> 
> ~$ btrfs fi df /
> Data, RAID1: total=446.50GiB, used=446.42GiB
> System, RAID1: total=32.00MiB, used=80.00KiB
> Metadata, RAID1: total=3.00GiB, used=2.11GiB
> GlobalReserve, single: total=512.00MiB, used=5.53MiB
> 
> ~$ sudo btrfs fi usage /
> Overall:
>     Device size:                 899.07GiB
>     Device allocated:            899.07GiB
>     Device unallocated:            2.01MiB
>     Device missing:                  0.00B
>     Used:                        897.05GiB
>     Free (estimated):             85.87MiB      (min: 85.87MiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 5.53MiB)
> 
> Data,RAID1: Size:446.50GiB, Used:446.42GiB (99.98%)
>    /dev/sda3     446.50GiB
>    /dev/sdb3     446.50GiB
> 
> Metadata,RAID1: Size:3.00GiB, Used:2.11GiB (70.22%)
>    /dev/sda3       3.00GiB
>    /dev/sdb3       3.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:80.00KiB (0.24%)
>    /dev/sda3      32.00MiB
>    /dev/sdb3      32.00MiB
> 
> Unallocated:
>    /dev/sda3       1.00MiB
>    /dev/sdb3       1.00MiB

A metadata balance will require a GB of temporary free space so that
it can relocate and delete one of the existing metadata block groups.
This space isn't available (there is no unallocated space and less than
1GB free in allocated metadata), so the metadata balance is failing now.
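
As a rough illustration of that arithmetic with the numbers from your
'btrfs fi usage' output: the sketch below is a deliberately simplified
model (it assumes a flat 1024 MiB requirement and ignores everything
else the allocator considers), and the function name is made up for
this example.

```shell
#!/bin/sh
# Simplified model of the space check described above.
# metadata_balance_has_room is a hypothetical name; values are in MiB.

# Args: unallocated per device, metadata total, metadata used.
metadata_balance_has_room() {
    unalloc_mib=$1
    meta_total_mib=$2
    meta_used_mib=$3
    need_mib=1024   # assumed: about 1 GB needed to relocate one block group
    meta_free_mib=$((meta_total_mib - meta_used_mib))
    if [ "$unalloc_mib" -ge "$need_mib" ] || [ "$meta_free_mib" -ge "$need_mib" ]; then
        echo yes
    else
        echo no
    fi
}

# From the report: 1 MiB unallocated per device, metadata total 3.00 GiB
# (3072 MiB), metadata used 2.11 GiB (about 2161 MiB), so 911 MiB free.
metadata_balance_has_room 1 3072 2161
```

With these inputs the check prints "no": neither the unallocated space
nor the free space inside allocated metadata reaches 1 GB.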

If scheduled metadata balances continue, eventually the filesystem will
reach a point where there would be no space available for the metadata
to expand with the data, and the next ordinary data write will force the
filesystem read-only.  Just before that happens, the filesystem will
slow down a _lot_, reducing the amount of data written per committed
transaction in an attempt to avoid this failure.

To avoid this, never run metadata balances from a scheduled job (or for
any reason other than working around a kernel bug or adding disks to a
RAID array), so that an appropriate number of metadata block groups is
allocated and _stays_ allocated.

Scheduled data balances (-d) are OK.  They defragment free space and
improve allocator performance, and make unallocated space available so
that additional metadata block groups can be allocated when necessary.
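
For a maintenance script, the change amounts to dropping the -m filter
and keeping the -d one.  A minimal sketch, assuming the script currently
passes something like '-dusage=50 -musage=70' (the -susage=70 in your log
is most likely derived from the -m filter by btrfs-progs); the helper
name strip_metadata_filters is hypothetical:

```shell
#!/bin/sh
# Sketch: remove any -m / -m<filters> argument from a balance filter list.
# strip_metadata_filters is an illustrative helper, not a btrfs command.

strip_metadata_filters() {
    out=""
    for arg in "$@"; do
        case "$arg" in
            -m*) ;;                          # drop metadata filters, e.g. -musage=70
            *)   out="${out:+$out }$arg" ;;  # keep everything else
        esac
    done
    echo "$out"
}

# Show what remains of the assumed original filter list.
strip_metadata_filters -dusage=50 -musage=70

# Intended use in the scheduled job (not executed here):
#   btrfs balance start -dusage=50 /
```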

> ~$ sudo btrfs device stats /
> [/dev/sda3].write_io_errs    0
> [/dev/sda3].read_io_errs     0
> [/dev/sda3].flush_io_errs    0
> [/dev/sda3].corruption_errs  0
> [/dev/sda3].generation_errs  0
> [/dev/sdb3].write_io_errs    0
> [/dev/sdb3].read_io_errs     0
> [/dev/sdb3].flush_io_errs    0
> [/dev/sdb3].corruption_errs  0
> [/dev/sdb3].generation_errs  0
> 
> --8<---------------cut here---------------end--------------->8---
> 
> Thank you for any useful hint!
> Best regards, Giovanni
> 
> -- 
> Giovanni Biscuolo
> 
> Xelera IT Infrastructures


