All of lore.kernel.org
* how to recover from "enospc errors during balance"
@ 2020-09-29 14:25 Giovanni Biscuolo
  2020-09-29 15:07 ` A L
  2020-09-30  0:04 ` Zygo Blaxell
  0 siblings, 2 replies; 8+ messages in thread
From: Giovanni Biscuolo @ 2020-09-29 14:25 UTC (permalink / raw)
  To: linux-btrfs

Hello,

please also reply to me since I'm not subscribed to linux-btrfs, thanks!

My BTRFS filesystem is full; I got ENOSPC during a (scheduled) balance:

--8<---------------cut here---------------start------------->8---

[6928066.755704] BTRFS info (device sda3): balance: start -dusage=50 -musage=70 -susage=70
[6928066.760485] BTRFS info (device sda3): relocating block group 139449073664 flags metadata|raid1
[6928075.142462] BTRFS: error (device sda3) in btrfs_drop_snapshot:5421: errno=-28 No space left
[6928075.146566] BTRFS info (device sda3): forced readonly
[6928075.150851] BTRFS info (device sda3): 2 enospc errors during balance
[6928075.155422] BTRFS info (device sda3): balance: ended with status: -30
[6928083.483820] BTRFS info (device sda3): delayed_refs has NO entry

--8<---------------cut here---------------end--------------->8---

and now it's mounted read-only:

--8<---------------cut here---------------start------------->8---

/dev/sda3 on / type btrfs (ro,relatime,ssd,space_cache,subvolid=5,subvol=/)
/dev/sda3 on /gnu/store type btrfs (ro,relatime,ssd,space_cache,subvolid=5,subvol=/gnu/store)

--8<---------------cut here---------------end--------------->8---

If I try to remount rw (to try to free space) I get:

--8<---------------cut here---------------start------------->8---

[7323937.312122] BTRFS info (device sda3): disk space caching is enabled
[7323937.316478] BTRFS error (device sda3): Remounting read-write after error is not allowed

--8<---------------cut here---------------end--------------->8---

I tried to add a new device (I have 2 spare disks) but it does not work
with a read-only filesystem.

How can I remount the filesystem read-write so that I can free some
space by deleting files?

Additional data:

--8<---------------cut here---------------start------------->8---

~$ uname -a
Linux myhost 5.4.50-gnu #1 SMP 1 x86_64 GNU/Linux

~$ btrfs --version
btrfs-progs v5.6

~$ sudo btrfs balance status /
No balance found on '/'

~$ btrfs fi df /
Data, RAID1: total=446.50GiB, used=446.42GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=3.00GiB, used=2.11GiB
GlobalReserve, single: total=512.00MiB, used=5.53MiB

~$ sudo btrfs fi usage /
Overall:
    Device size:                 899.07GiB
    Device allocated:            899.07GiB
    Device unallocated:            2.01MiB
    Device missing:                  0.00B
    Used:                        897.05GiB
    Free (estimated):             85.87MiB      (min: 85.87MiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 5.53MiB)

Data,RAID1: Size:446.50GiB, Used:446.42GiB (99.98%)
   /dev/sda3     446.50GiB
   /dev/sdb3     446.50GiB

Metadata,RAID1: Size:3.00GiB, Used:2.11GiB (70.22%)
   /dev/sda3       3.00GiB
   /dev/sdb3       3.00GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB (0.24%)
   /dev/sda3      32.00MiB
   /dev/sdb3      32.00MiB

Unallocated:
   /dev/sda3       1.00MiB
   /dev/sdb3       1.00MiB

~$ sudo btrfs device stats /
[/dev/sda3].write_io_errs    0
[/dev/sda3].read_io_errs     0
[/dev/sda3].flush_io_errs    0
[/dev/sda3].corruption_errs  0
[/dev/sda3].generation_errs  0
[/dev/sdb3].write_io_errs    0
[/dev/sdb3].read_io_errs     0
[/dev/sdb3].flush_io_errs    0
[/dev/sdb3].corruption_errs  0
[/dev/sdb3].generation_errs  0

--8<---------------cut here---------------end--------------->8---

Thank you for any useful hint!
Best regards, Giovanni

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: how to recover from "enospc errors during balance"
  2020-09-29 14:25 how to recover from "enospc errors during balance" Giovanni Biscuolo
@ 2020-09-29 15:07 ` A L
  2020-10-01  8:24   ` Giovanni Biscuolo
  2020-09-30  0:04 ` Zygo Blaxell
  1 sibling, 1 reply; 8+ messages in thread
From: A L @ 2020-09-29 15:07 UTC (permalink / raw)
  To: Giovanni Biscuolo, linux-btrfs



---- From: Giovanni Biscuolo <g@xelera.eu> -- Sent: 2020-09-29 - 16:25 ----

> [...]


Hi,

I think you need to mount with -o skip_balance to get it back into rw mode.

Then you may need to add two new disks, because the raid1 profile allocates two chunks (1GiB each) on two different disks. At the moment you don't have space for any additional data or metadata chunks.
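The RAID1 constraint described above can be sketched as a small check. This is a hypothetical helper (`raid1_can_allocate` is a name made up here, not part of btrfs-progs), assuming a new RAID1 chunk needs a full chunk's worth of unallocated space on each of two different devices, and using the 1 GiB chunk size mentioned above. All sizes are in MiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of btrfs-progs.
# Assumption: a RAID1 chunk needs CHUNK_MIB of unallocated space on each of
# two *different* devices.
# Usage: raid1_can_allocate CHUNK_MIB UNALLOC_MIB_DEV1 [UNALLOC_MIB_DEV2 ...]
raid1_can_allocate() {
  local chunk_mib=$1
  shift
  local fit=0 dev_mib
  for dev_mib in "$@"; do
    # Count devices that could hold one copy of the new chunk.
    if (( dev_mib >= chunk_mib )); then
      fit=$((fit + 1))
    fi
  done
  # RAID1 stores two copies, so two qualifying devices are required.
  if (( fit >= 2 )); then
    echo yes
  else
    echo no
  fi
}
```

With the figures from the report (1 MiB unallocated on each of sda3 and sdb3), `raid1_can_allocate 1024 1 1` prints `no`. Adding one large empty disk (`raid1_can_allocate 1024 1 1 100000`) still prints `no`; only a second added disk makes it print `yes`, which is why two disks are suggested.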

You can also ask for help on the #btrfs IRC channel on Freenode.

Good luck! 



* Re: how to recover from "enospc errors during balance"
  2020-09-29 14:25 how to recover from "enospc errors during balance" Giovanni Biscuolo
  2020-09-29 15:07 ` A L
@ 2020-09-30  0:04 ` Zygo Blaxell
  2020-10-01  8:56   ` Giovanni Biscuolo
  1 sibling, 1 reply; 8+ messages in thread
From: Zygo Blaxell @ 2020-09-30  0:04 UTC (permalink / raw)
  To: Giovanni Biscuolo; +Cc: linux-btrfs

On Tue, Sep 29, 2020 at 04:25:06PM +0200, Giovanni Biscuolo wrote:
> Hello,
> 
> please also reply to me since I'm not subscribed to linux-btrfs, thanks!
> 
> My BTRFS filesystem is full; I got ENOSPC during a (scheduled) balance:
> 
> --8<---------------cut here---------------start------------->8---
> 
> [6928066.755704] BTRFS info (device sda3): balance: start -dusage=50 -musage=70 -susage=70

Never balance metadata on a schedule.  If it is done too often, and the
disk fills up, it will eventually lead to ENOSPC errors that are hard
to get out of...

> [6928066.760485] BTRFS info (device sda3): relocating block group 139449073664 flags metadata|raid1
> [6928075.142462] BTRFS: error (device sda3) in btrfs_drop_snapshot:5421: errno=-28 No space left

...like this one.

> [6928075.146566] BTRFS info (device sda3): forced readonly
> [6928075.150851] BTRFS info (device sda3): 2 enospc errors during balance
> [6928075.155422] BTRFS info (device sda3): balance: ended with status: -30
> [6928083.483820] BTRFS info (device sda3): delayed_refs has NO entry
> 
> --8<---------------cut here---------------end--------------->8---
> 
> and now it's mounted read-only:
> 
> --8<---------------cut here---------------start------------->8---
> 
> /dev/sda3 on / type btrfs (ro,relatime,ssd,space_cache,subvolid=5,subvol=/)
> /dev/sda3 on /gnu/store type btrfs (ro,relatime,ssd,space_cache,subvolid=5,subvol=/gnu/store)
> 
> --8<---------------cut here---------------end--------------->8---
> 
> If I try to remount rw (to try to free space) I get:
> 
> --8<---------------cut here---------------start------------->8---
> 
> [7323937.312122] BTRFS info (device sda3): disk space caching is enabled
> [7323937.316478] BTRFS error (device sda3): Remounting read-write after error is not allowed
> 
> --8<---------------cut here---------------end--------------->8---
> 
> I tried to add a new device (I have 2 spare disks) but it does not work
> with a read-only filesystem.
> 
> How can I remount the filesystem read-write so that I can free some
> space by deleting files?

Add 'skip_balance' to mount options so that the next mount will not
attempt to resume balancing metadata.  Keep mounting and umounting
(not remounting) until it completes orphan and relocation cleanup (it
may take more than one attempt, probably fewer than 20 attempts).

Once you have the filesystem mounted, run 'btrfs balance cancel' on
the mount point.  Then edit your maintenance scripts and remove the
metadata balance (-m flag to 'btrfs balance start').
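The sequence described above might look roughly like this when run from a system where the btrfs filesystem is not mounted (for a root filesystem, that means rescue media). It is a sketch under assumptions: `DEV` and `MNT` are example values, 20 attempts mirrors the upper estimate above, and "cleanup finished" is approximated as "the fresh mount succeeded and is read-write", which is a guess rather than an official check. With `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: DEV, MNT and MAX_TRIES are example values.
DEV=${DEV:-/dev/sda3}
MNT=${MNT:-/mnt}
MAX_TRIES=${MAX_TRIES:-20}

# Print instead of execute when DRY_RUN is set.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# True if MNT is currently mounted read-write (always true in dry-run mode).
mounted_rw() {
  [[ -n ${DRY_RUN:-} ]] && return 0
  [[ $(findmnt -n -o OPTIONS --mountpoint "$MNT") == rw* ]]
}

recover_mount() {
  local i
  for (( i = 1; i <= MAX_TRIES; i++ )); do
    # A fresh mount, not "mount -o remount", with skip_balance so the
    # interrupted balance is not resumed.
    if run mount -o skip_balance "$DEV" "$MNT"; then
      if mounted_rw; then
        # Drop the paused balance; it may already be gone, so ignore errors.
        run btrfs balance cancel "$MNT" || true
        return 0
      fi
      # Mounted but read-only: unmount and try again.
      run umount "$MNT"
    fi
    echo "attempt $i did not give a read-write mount, retrying" >&2
  done
  return 1
}
```

On success the filesystem is left mounted read-write at `MNT`, ready for deleting files or adding devices.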

> Additional data:
> 
> --8<---------------cut here---------------start------------->8---
> 
> ~$ uname -a
> Linux myhost 5.4.50-gnu #1 SMP 1 x86_64 GNU/Linux
> 
> ~$ btrfs --version
> btrfs-progs v5.6
> 
> ~$ sudo btrfs balance status /
> No balance found on '/'
> 
> ~$ btrfs fi df /
> Data, RAID1: total=446.50GiB, used=446.42GiB
> System, RAID1: total=32.00MiB, used=80.00KiB
> Metadata, RAID1: total=3.00GiB, used=2.11GiB
> GlobalReserve, single: total=512.00MiB, used=5.53MiB
> 
> ~$ sudo btrfs fi usage /
> Overall:
>     Device size:                 899.07GiB
>     Device allocated:            899.07GiB
>     Device unallocated:            2.01MiB
>     Device missing:                  0.00B
>     Used:                        897.05GiB
>     Free (estimated):             85.87MiB      (min: 85.87MiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 5.53MiB)
> 
> Data,RAID1: Size:446.50GiB, Used:446.42GiB (99.98%)
>    /dev/sda3     446.50GiB
>    /dev/sdb3     446.50GiB
> 
> Metadata,RAID1: Size:3.00GiB, Used:2.11GiB (70.22%)
>    /dev/sda3       3.00GiB
>    /dev/sdb3       3.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:80.00KiB (0.24%)
>    /dev/sda3      32.00MiB
>    /dev/sdb3      32.00MiB
> 
> Unallocated:
>    /dev/sda3       1.00MiB
>    /dev/sdb3       1.00MiB

A metadata balance will require a GB of temporary free space so that
it can relocate and delete one of the existing metadata block groups.
This space isn't available (there is no unallocated space and less than
1GB free in allocated metadata), so the metadata balance is failing now.
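The arithmetic behind this, using the figures from the report: 3.00 GiB of metadata allocated with 2.11 GiB used leaves about 0.89 GiB, short of the roughly 1 GiB a metadata balance needs, and 1 MiB of unallocated space per device cannot hold a new block group. The helpers below are hypothetical (names invented here); they work in hundredths of a GiB to avoid floating point in the shell, assuming the two-decimal format that `btrfs fi usage` prints.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of btrfs-progs.

# "3.00" -> 300 (hundredths of a GiB). Assumes at most two decimals.
gib_to_centi() {
  local int=${1%%.*} frac=00
  if [[ $1 == *.* ]]; then
    frac=${1#*.}
    frac=${frac}00        # pad, e.g. "5" -> "500" ...
    frac=${frac:0:2}      # ... then keep two digits: "50"
  fi
  # 10# forces base 10, so "09" is not read as octal.
  echo $(( 10#$int * 100 + 10#$frac ))
}

# Free space inside already-allocated metadata, in hundredths of a GiB.
metadata_slack_centi() {
  echo $(( $(gib_to_centi "$1") - $(gib_to_centi "$2") ))
}

# Does the allocated metadata alone have ~1 GiB free for a balance?
# (Unallocated space, the other possible source, is not modelled here.)
metadata_balance_fits() {
  if (( $(metadata_slack_centi "$1" "$2") >= 100 )); then
    echo yes
  else
    echo no
  fi
}
```

For this filesystem, `metadata_slack_centi 3.00 2.11` gives `89` and `metadata_balance_fits 3.00 2.11` prints `no`.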

If scheduled metadata balances continue, eventually the filesystem will
reach a point where there would be no space available for the metadata
to expand with the data, and the next ordinary data write will force the
filesystem read-only.  Just before that happens, the filesystem will
slow down a _lot_, reducing the amount of data written per committed
transaction in an attempt to avoid this failure.

To avoid this, never run metadata balances from a scheduled job (or for
any reason other than working around a kernel bug or adding disks to a
RAID array) so that an appropriate number of metadata block groups is
allocated and _stays_ allocated.

Scheduled data balances (-d) are OK.  They defragment free space and
improve allocator performance, and make unallocated space available so
that additional metadata block groups can be allocated when necessary.
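For the maintenance script, a data-only version of the original job might look like the sketch below. It keeps the `-dusage=50` filter from the original schedule and simply drops `-musage=70 -susage=70`; the threshold, default mount point and function name are example choices, not a tuning recommendation. With `DRY_RUN=1` the command is printed instead of run.

```shell
#!/usr/bin/env bash
# Sketch of a scheduled maintenance job (names and values are examples).
DUSAGE=${DUSAGE:-50}

# Print instead of execute when DRY_RUN is set.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

scheduled_balance() {
  local mnt=${1:-/}
  # Data block groups only (-d): compacts partly used data block groups and
  # returns the freed space to the unallocated pool, where new metadata
  # block groups can be created later.
  # Deliberately no -m or -s filters, so metadata stays allocated.
  run btrfs balance start -dusage="$DUSAGE" "$mnt"
}
```

A cron job or timer would then call `scheduled_balance /`.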

> [...]




* Re: how to recover from "enospc errors during balance"
  2020-09-29 15:07 ` A L
@ 2020-10-01  8:24   ` Giovanni Biscuolo
  0 siblings, 0 replies; 8+ messages in thread
From: Giovanni Biscuolo @ 2020-10-01  8:24 UTC (permalink / raw)
  To: A L; +Cc: linux-btrfs

Hello,

A L <mail@lechevalier.se> writes:

> ---- From: Giovanni Biscuolo <g@xelera.eu> -- Sent: 2020-09-29 - 16:25 ----

[...]

>> Thank you for any useful hint!
>> Best regards, Giovanni

[...]

> Hi,
>
> I think you need to mount with -o skip_balance to get it back into rw
> mode.
>
> Then you may need to add two new disks, because the raid1 profile
> allocates two chunks (1GiB each) on two different disks. At the moment
> you don't have space for any additional data or metadata chunks.
>
> You can also ask for help on the #btrfs IRC channel on Freenode.
>
> Good luck! 

Thank you very much for your advice; I'm going to try what Zygo Blaxell
suggested yesterday in this thread.

Happy hacking! Giovanni.

-- 
Giovanni Biscuolo

Xelera IT Infrastructures


* Re: how to recover from "enospc errors during balance"
  2020-09-30  0:04 ` Zygo Blaxell
@ 2020-10-01  8:56   ` Giovanni Biscuolo
  2020-10-01 15:28     ` A L
  2020-10-02  2:44     ` Zygo Blaxell
  0 siblings, 2 replies; 8+ messages in thread
From: Giovanni Biscuolo @ 2020-10-01  8:56 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

Hello Zygo,

thank you for your help!

...but I still cannot mount the filesystem RW, see below.

Zygo Blaxell <ce3g8jdj@umail.furryterror.org> writes:

> On Tue, Sep 29, 2020 at 04:25:06PM +0200, Giovanni Biscuolo wrote:

[...]

>> [6928066.755704] BTRFS info (device sda3): balance: start -dusage=50 -musage=70 -susage=70
>
> Never balance metadata on a schedule.  If it is done too often, and the
> disk fills up, it will eventually lead to ENOSPC errors that are hard
> to get out of...

OK, I got it: I'll fix it as soon as I manage to remount the (root)
filesystem.

I was using an option I did not fully understand and I was not able to
find such a warning in the documentation.

[...]

>> I tried to add a new device (I have 2 spare disks) but it does not work
>> with a read-only filesystem.
>> 
>> How can I remount the filesystem read-write so that I can free some
>> space by deleting files?
>
> Add 'skip_balance' to mount options so that the next mount will not
> attempt to resume balancing metadata.  Keep mounting and umounting
> (not remounting) until it completes orphan and relocation cleanup (it
> may take more than one attempt, probably fewer than 20 attempts).

I tried to mount with this command:

--8<---------------cut here---------------start------------->8---

~$ mount -o skip_balance,relatime,ssd,subvol=/ /dev/sda3 /
mount: /: wrong fs type, bad option, bad superblock on /dev/sda3, missing codepage or helper program, or other error.

--8<---------------cut here---------------end--------------->8---

dmesg says:

--8<---------------cut here---------------start------------->8---

[7484575.970136] BTRFS info (device sda3): disk space caching is enabled
[7484576.001375] BTRFS error (device sda3): Remounting read-write after error is not allowed

--8<---------------cut here---------------end--------------->8---

Am I doing something wrong?

It seems that the filesystem is not allowed to be remounted RW after the
error.

I don't think rebooting is a good option since it will be unbootable
(and it's a remote machine).

I fear the only option is to reboot from USB and recover :(

Do you have any other option in mind please?

> Once you have the filesystem mounted, run 'btrfs balance cancel' on
> the mount point.  Then edit your maintenance scripts and remove the
> metadata balance (-m flag to 'btrfs balance start').

OK clear thanks.

[...]

> To avoid this, never run metadata balances from a scheduled job (or for
> any reason other than working around a kernel bug or adding disks to a
> RAID array) so that an appropriate number of metadata block groups is
> allocated and _stays_ allocated.

[...]

> Scheduled data balances (-d) are OK.  They defragment free space and
> improve allocator performance, and make unallocated space available so
> that additional metadata block groups can be allocated when necessary.

OK got it: thank you for the clear and complete explanation.

No doubt I made a bad mistake with that scheduled job :-(

[...]

Thanks, Giovanni.

-- 
Giovanni Biscuolo

Xelera IT Infrastructures


* Re: how to recover from "enospc errors during balance"
  2020-10-01  8:56   ` Giovanni Biscuolo
@ 2020-10-01 15:28     ` A L
  2020-10-02  9:32       ` Giovanni Biscuolo
  2020-10-02  2:44     ` Zygo Blaxell
  1 sibling, 1 reply; 8+ messages in thread
From: A L @ 2020-10-01 15:28 UTC (permalink / raw)
  To: Giovanni Biscuolo, Zygo Blaxell; +Cc: linux-btrfs



On 2020-10-01 10:56, Giovanni Biscuolo wrote:
> [...]
>
>>> I tried to add a new device (I have 2 spare disks) but it does not work
>>> with a read-only filesystem.
>>>
>>> How can I remount the filesystem read-write so that I can free some
>>> space by deleting files?
>> Add 'skip_balance' to mount options so that the next mount will not
>> attempt to resume balancing metadata.  Keep mounting and umounting
>> (not remounting) until it completes orphan and relocation cleanup (it
>> may take more than one attempt, probably fewer than 20 attempts).
> I tried to mount with this command:
>
> --8<---------------cut here---------------start------------->8---
>
> ~$ mount -o skip_balance,relatime,ssd,subvol=/ /dev/sda3 /
> mount: /: wrong fs type, bad option, bad superblock on /dev/sda3, missing codepage or helper program, or other error.
>
> --8<---------------cut here---------------end--------------->8---
>
> dmesg says:
>
> --8<---------------cut here---------------start------------->8---
>
> [7484575.970136] BTRFS info (device sda3): disk space caching is enabled
> [7484576.001375] BTRFS error (device sda3): Remounting read-write after error is not allowed
>
> --8<---------------cut here---------------end--------------->8---
>
> Am I doing something wrong?
>
> It seems that the filesystem is not allowed to be remounted RW after the
> error.
>
> I don't think rebooting is a good option since it will be unbootable
> (and it's a remote machine).
>
> I fear the only option is to reboot from USB and recover :(
>
> Do you have any other option in mind please?
I think you need to unmount the filesystem and mount it again, rather
than remount it (as the dmesg output says).

Example: "mount -o skip_balance /dev/sda3 /media && btrfs balance cancel /media"

However, I think this is your root filesystem, correct? Then you must
boot from bootable media and do the recovery from there.

Just remember that deleting data on Btrfs can increase metadata usage, 
especially if you have lots of snapshots and such. In the case your 
filesystem goes back into ro mode when deleting files, you may need to 
add two additional disks (or loop devices, USB sticks, etc.) to continue.
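If no spare disks are at hand, the loop-device option could look like the sketch below. It only prints a command sequence for review; it runs nothing. Assumptions: the filesystem is already mounted read-write, the backing directory lives on some other filesystem (never on the full btrfs itself) with enough real free space for whatever gets written there, and 8G is an arbitrary default size. Real spare disks would simply be passed to `btrfs device add` directly. The function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints, but does not run, the commands.
# Usage: plan_temp_devices MOUNTPOINT BACKING_DIR [SIZE]
plan_temp_devices() {
  local mnt=$1 dir=$2 size=${3:-8G}
  local f1="$dir/btrfs-tmp1.img" f2="$dir/btrfs-tmp2.img"
  cat <<EOF
# 1. Sparse backing files on another filesystem, attached as loop devices.
truncate -s $size $f1 $f2
L1=\$(losetup --find --show $f1)
L2=\$(losetup --find --show $f2)
# 2. Two new devices, so RAID1 can place both copies of a new chunk.
btrfs device add \$L1 \$L2 $mnt
# 3. After deleting files: move chunks back to the real disks, then detach.
btrfs device remove \$L1 \$L2 $mnt
losetup -d \$L1 \$L2
rm $f1 $f2
EOF
}
```

For example, `plan_temp_devices /mnt /media/usb 4G` prints the sequence for two 4G files on a USB stick mounted at `/media/usb`. Step 3 only succeeds once enough space has been freed on the original disks to hold the chunks moved back from the loop devices.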



* Re: how to recover from "enospc errors during balance"
  2020-10-01  8:56   ` Giovanni Biscuolo
  2020-10-01 15:28     ` A L
@ 2020-10-02  2:44     ` Zygo Blaxell
  1 sibling, 0 replies; 8+ messages in thread
From: Zygo Blaxell @ 2020-10-02  2:44 UTC (permalink / raw)
  To: Giovanni Biscuolo; +Cc: linux-btrfs

On Thu, Oct 01, 2020 at 10:56:15AM +0200, Giovanni Biscuolo wrote:
> [...]
> 
> I tried to mount with this command:
> 
> --8<---------------cut here---------------start------------->8---
> 
> ~$ mount -o skip_balance,relatime,ssd,subvol=/ /dev/sda3 /
> mount: /: wrong fs type, bad option, bad superblock on /dev/sda3, missing codepage or helper program, or other error.
> 
> --8<---------------cut here---------------end--------------->8---
> 
> dmesg says:
> 
> --8<---------------cut here---------------start------------->8---
> 
> [7484575.970136] BTRFS info (device sda3): disk space caching is enabled
> [7484576.001375] BTRFS error (device sda3): Remounting read-write after error is not allowed
> 
> --8<---------------cut here---------------end--------------->8---
> 
> Am I doing something wrong?
> 
> It seems that the filesystem is not allowed to be remounted RW after the
> error.
> 
> I don't think rebooting is a good option since it will be unbootable
> (and it's a remote machine).
> 
> I fear the only option is to reboot from USB and recover :(
> 
> Do you have any other option in mind please?

Unfortunately, that's the only option that is known to work.  As dmesg
says, remounts are not allowed.  You have to umount and mount again,
and for a root fs that implies at least one reboot.  If you have /boot
on a separate filesystem you can add skip_balance to the rootflags and
then use a remote power switch to do the necessary reboots.  If /boot is
on the root filesystem you'll need remote console access or boot from USB.
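For the rootflags route: assuming the kernel honours only the last `rootflags=` parameter on its command line (our understanding; worth checking against the kernel's parameter documentation), skip_balance should be appended to an existing `rootflags=` value rather than added as a second one. A hypothetical helper that does this on a command-line string is sketched below; how the boot entry itself gets edited depends on the bootloader and is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a mount option to rootflags= on a kernel command line.
# Usage: add_rootflag "CMDLINE" OPTION
add_rootflag() {
  local flag=$2 word found=0
  local -a words out=()
  # Split on whitespace without glob expansion.
  read -r -a words <<< "$1"
  for word in "${words[@]}"; do
    if [[ $word == rootflags=* ]]; then
      # Extend the existing comma-separated list instead of adding a
      # second rootflags= that could override the first.
      word="$word,$flag"
      found=1
    fi
    out+=("$word")
  done
  if (( ! found )); then
    out+=("rootflags=$flag")
  fi
  echo "${out[*]}"
}
```

For example, `add_rootflag 'root=/dev/sda3 rootflags=subvol=/ quiet' skip_balance` gives `root=/dev/sda3 rootflags=subvol=/,skip_balance quiet`, and a command line without `rootflags=` gets `rootflags=skip_balance` appended.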

If the system was expendable, I'd try booting it in qemu-kvm using its
own disks as raw filesystem images, but that has high risk of trashing
the whole filesystem (and might be hard to implement if you don't already
have kvm installed on the host).

> [...]




* Re: how to recover from "enospc errors during balance"
  2020-10-01 15:28     ` A L
@ 2020-10-02  9:32       ` Giovanni Biscuolo
  0 siblings, 0 replies; 8+ messages in thread
From: Giovanni Biscuolo @ 2020-10-02  9:32 UTC (permalink / raw)
  To: A L, Zygo Blaxell; +Cc: linux-btrfs

Hello,

A L <mail@lechevalier.se> writes:

> On 2020-10-01 10:56, Giovanni Biscuolo wrote:

[...]

>> --8<---------------cut here---------------start------------->8---
>>
>> ~$ mount -o skip_balance,relatime,ssd,subvol=/ /dev/sda3 /
>> mount: /: wrong fs type, bad option, bad superblock on /dev/sda3, missing codepage or helper program, or other error.
>>
>> --8<---------------cut here---------------end--------------->8---
>>
>> dmesg says:
>>
>> --8<---------------cut here---------------start------------->8---
>>
>> [7484575.970136] BTRFS info (device sda3): disk space caching is enabled
>> [7484576.001375] BTRFS error (device sda3): Remounting read-write after error is not allowed
>>
>> --8<---------------cut here---------------end--------------->8---
>>
>> Am I doing something wrong?

[...]

> I think you need to unmount the filesystem and mount it again, rather
> than remount it (as the dmesg output says).
>
> Example: "mount -o skip_balance /dev/sda3 /media && btrfs balance cancel /media"

Ah OK, now I understand

> However, I think this is your root filesystem, correct? Then you must
> boot from bootable media and do the recovery from there.

Yes, it's the root filesystem of that machine, so I'll have to
recover via bootable media.

> Just remember that deleting data on Btrfs can increase metadata usage, 
> especially if you have lots of snapshots and such. In the case your 
> filesystem goes back into ro mode when deleting files, you may need to 
> add two additional disks (or loop devices, usb sticks etc) to continue.

OK I'll do that.

Thank you both for your support!

Best regards, Giovanni.

-- 
Giovanni Biscuolo

Xelera IT Infrastructures


end of thread, other threads:[~2020-10-02  9:32 UTC | newest]

Thread overview: 8+ messages
2020-09-29 14:25 how to recover from "enospc errors during balance" Giovanni Biscuolo
2020-09-29 15:07 ` A L
2020-10-01  8:24   ` Giovanni Biscuolo
2020-09-30  0:04 ` Zygo Blaxell
2020-10-01  8:56   ` Giovanni Biscuolo
2020-10-01 15:28     ` A L
2020-10-02  9:32       ` Giovanni Biscuolo
2020-10-02  2:44     ` Zygo Blaxell
