linux-btrfs.vger.kernel.org archive mirror
* enospc errors during balance — how to prevent out of space
@ 2024-04-16  6:09 Leszek Dubiel
  2024-04-16  7:54 ` HAN Yuwei
  0 siblings, 1 reply; 9+ messages in thread
From: Leszek Dubiel @ 2024-04-16  6:09 UTC (permalink / raw)
  To: linux-btrfs



Hello :) :)



My disk got full, so I removed snapshots, and
now it has plenty of free space, see:


# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc3       8.2T  7.2T  1.1T  88% /


# btrfs fi show /
Label: none  uuid: ec3525ef-b73a-452a-a4ee-86286252d730
     Total devices 3 FS bytes used 7.11TiB
     devid    1 size 5.43TiB used 5.42TiB path /dev/sdc3
     devid    2 size 5.43TiB used 5.42TiB path /dev/sda3
     devid    3 size 5.43TiB used 5.43TiB path /dev/sdb3


# btrfs fi df /
Data, RAID1: total=8.09TiB, used=7.07TiB
System, RAID1: total=32.00MiB, used=1.38MiB
Metadata, RAID1: total=45.04GiB, used=39.08GiB
GlobalReserve, single: total=512.00MiB, used=32.00KiB





But I still got a "no free space" error.
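A hedged reading of the numbers above: df counts the free space inside chunks that are already allocated, whereas a new chunk can only be created from unallocated device space. The sketch below assumes the "btrfs fi df" text format shown above; chunk_slack is a made-up helper name, not a btrfs command.

```shell
# Sketch: print "total - used" for each line of `btrfs fi df` output read
# from stdin. chunk_slack is a hypothetical helper, not part of btrfs-progs.
chunk_slack() {
    awk '
    # Convert a value such as "8.09TiB" to GiB.
    function to_gib(v,    n) {
        n = v + 0                      # numeric prefix, e.g. 8.09
        if (v ~ /TiB$/) return n * 1024
        if (v ~ /GiB$/) return n
        if (v ~ /MiB$/) return n / 1024
        if (v ~ /KiB$/) return n / 1024 / 1024
        return n / 1024 / 1024 / 1024  # assume plain bytes
    }
    /total=.*used=/ {
        total = $0; sub(/.*total=/, "", total); sub(/,.*/, "", total)
        used  = $0; sub(/.*used=/,  "", used)
        name  = $0; sub(/:.*/, "", name)
        printf "%s: %.2f GiB free inside allocated chunks\n", name, to_gib(total) - to_gib(used)
    }'
}
```

Usage would be "btrfs fi df / | chunk_slack". For the figures in this message it reports about 1 TiB of slack inside data chunks, which df shows as available even though no new chunk can be allocated.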




# btrfs device usage /
/dev/sdc3, ID: 1
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.39TiB
    Metadata,RAID1:         39.04GiB
    System,RAID1:           32.00MiB
    Unallocated:             1.00MiB

/dev/sda3, ID: 2
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.39TiB
    Metadata,RAID1:         38.03GiB
    System,RAID1:           32.00MiB
    Unallocated:             1.00MiB

/dev/sdb3, ID: 3
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.41TiB
    Metadata,RAID1:         13.00GiB
    Unallocated:             1.00MiB





I noticed that Unallocated was down to 1 MiB on every device, so
I ran balance several times with the data usage filter:

          btrfs balance start -dusage=XX,limit=1 /


and it didn't help.

Balance itself even failed with a no-space error:

syslog-2024-04-15T14:20:41.498301+02:00 zefir kernel: [161213.968020] BTRFS info (device sdc3): balance: start -dusage=70,limit=3
syslog-2024-04-15T14:20:41.510297+02:00 zefir kernel: [161213.978076] BTRFS info (device sdc3): relocating block group 31021491027968 flags data|raid1
syslog-2024-04-15T14:20:46.118283+02:00 zefir kernel: [161218.585833] BTRFS info (device sdc3): relocating block group 30657484161024 flags data|raid1
syslog-2024-04-15T14:20:50.406268+02:00 zefir kernel: [161222.874987] BTRFS info (device sdc3): relocating block group 30654262935552 flags data|raid1
syslog:2024-04-15T14:21:01.270284+02:00 zefir kernel: [161233.739112] BTRFS info (device sdc3): 3 enospc errors during balance
syslog-2024-04-15T14:21:01.270305+02:00 zefir kernel: [161233.739119] BTRFS info (device sdc3): balance: ended with status: -28
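The status -28 in the last log line is a negated errno; on Linux, errno 28 is ENOSPC ("No space left on device"). A minimal sketch for pulling the balance result out of kernel log lines follows. The helper name balance_result is hypothetical, and it only translates the one code seen in this thread.

```shell
# Sketch: read kernel log lines from stdin and explain the balance result.
# balance_result is a made-up helper; it only knows -28 (ENOSPC on Linux).
balance_result() {
    sed -n 's/.*balance: ended with status: \(-\{0,1\}[0-9][0-9]*\).*/\1/p' |
    while read -r status; do
        case "$status" in
            0)   echo "status 0: balance finished" ;;
            -28) echo "status -28: ENOSPC, no space left on device" ;;
            *)   echo "status $status: see errno(3)" ;;
        esac
    done
}
```

It could be fed from the kernel log, for example "dmesg | balance_result".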





Then I ran balance several more times, for both data and metadata:

             btrfs balance start -musage=XX,limit=1 /

             btrfs balance start -dusage=50,limit=1 /




Unallocated space increased:

# btrfs device usage /
/dev/sdc3, ID: 1
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.39TiB
    Metadata,RAID1:         39.04GiB
    System,RAID1:           32.00MiB
    Unallocated:             1.00MiB

/dev/sda3, ID: 2
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.39TiB
    Metadata,RAID1:         38.03GiB
    System,RAID1:           32.00MiB
    Unallocated:             1.00MiB

/dev/sdb3, ID: 3
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.41TiB
    Metadata,RAID1:         13.00GiB
    Unallocated:            21.57MiB



and now I have:

/dev/sdc3, ID: 1
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.36TiB
    Metadata,RAID1:         40.00GiB
    Unallocated:            31.06GiB

/dev/sda3, ID: 2
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.36TiB
    Metadata,RAID1:         36.00GiB
    System,RAID1:           32.00MiB
    Unallocated:            28.06GiB

/dev/sdb3, ID: 3
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.38TiB
    Metadata,RAID1:         12.00GiB
    System,RAID1:           32.00MiB
    Unallocated:            31.02GiB



Should I run balance fairly often to keep Unallocated space from dropping as
low as 1.00MiB?

How can I prevent a "no space" error when there is plenty of space left?

Should I run balance regularly to keep Unallocated space high?



Thank you.
I have been using Btrfs in production for many, many years (since 2010, maybe?).





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: enospc errors during balance — how to prevent out of space
  2024-04-16  6:09 enospc errors during balance — how to prevent out of space Leszek Dubiel
@ 2024-04-16  7:54 ` HAN Yuwei
  2024-04-16 15:17   ` Leszek Dubiel
  0 siblings, 1 reply; 9+ messages in thread
From: HAN Yuwei @ 2024-04-16  7:54 UTC (permalink / raw)
  To: Leszek Dubiel, linux-btrfs

On 2024/4/16 14:09, Leszek Dubiel wrote:
> I noticed 1Mb for Unallocated space, so
> I have run multiple times balance (data usage filter):
>
>          btrfs balance start -dusage=XX,limit=1 /
>
>
> and it didn't help.
You can try adding a small device (e.g. a USB stick) and rebalancing.
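A sketch of that workaround, printed as a dry run rather than executed. rescue_plan is a hypothetical helper, the device names are placeholders, and -dusage=10 is only an illustrative filter value. Note that with RAID1 each new chunk needs unallocated space on two devices, so a single extra device may not be enough by itself.

```shell
# Sketch: print (do not run) the commands for borrowing spare devices.
# rescue_plan is a made-up helper; mount point and devices are arguments.
rescue_plan() {
    mnt=$1; shift
    for dev in "$@"; do
        echo "btrfs device add $dev $mnt"
    done
    # illustrative filter: relocate only nearly empty data chunks
    echo "btrfs balance start -dusage=10 $mnt"
    for dev in "$@"; do
        echo "btrfs device remove $dev $mnt"
    done
}
```

For example, "rescue_plan / /dev/sdX /dev/sdY" lists the add, balance and remove steps in order so they can be reviewed before anything is run.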
>
> Should I run balance quite often to prevent Unallocated space from going as
> low as 1.00MiB?
>
> How to prevent "NO SPACE ERROR" when there is plenty of space left?
>
> Run balance regularly to keep Unallocated space high?
>
If you have Zabbix or another monitoring mechanism, you can monitor
"Unallocated" and keep at least 2 block groups (2GiB) in reserve. Or you can
set up a weekly timer to rebalance your btrfs volume.
kdave/btrfsmaintenance should help you with that.
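A minimal sketch of such a check, assuming the input comes from "btrfs device usage -g <mnt>" so that every value is printed in GiB. Both function names are made up for illustration; the 2 GiB limit in the usage line is the figure suggested above.

```shell
# Sketch: read `btrfs device usage -g <mnt>` output from stdin and print the
# smallest "Unallocated" value in GiB. Assumes -g, so all values end in GiB.
min_unallocated_gib() {
    awk '
    $1 == "Unallocated:" {
        v = $2 + 0                     # "1.00GiB" -> 1
        if (!seen || v < min) { min = v; seen = 1 }
    }
    END { if (seen) printf "%.2f\n", min; else exit 1 }'
}

# Exit 0 when the smallest value is below the limit (GiB), 1 when it is not,
# 2 when no "Unallocated:" line was found.
unallocated_below() {
    limit=$1
    min=$(min_unallocated_gib) || return 2
    awk -v m="$min" -v l="$limit" 'BEGIN { exit !(m < l) }'
}
```

A monitoring hook could then run something like: btrfs device usage -g / | unallocated_below 2 && echo "unallocated space is low"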


* Re: enospc errors during balance — how to prevent out of space
  2024-04-16  7:54 ` HAN Yuwei
@ 2024-04-16 15:17   ` Leszek Dubiel
  2024-04-16 17:00     ` Forza
  0 siblings, 1 reply; 9+ messages in thread
From: Leszek Dubiel @ 2024-04-16 15:17 UTC (permalink / raw)
  To: HAN Yuwei, linux-btrfs



>>
>>
>> I noticed 1Mb for Unallocated space, so
>> I have run multiple times balance (data usage filter):
>>
>>          btrfs balance start -dusage=XX,limit=1 /
>>
>>
>> and it didn't help.
> You can try add a small device (USB stick) and rebalance.
>> It even got error no space when balancing:


When it refused to balance, I tried musage and dusage a few times, and it helped.
Thanks for the tip.




> If you have zabbix or other monitoring mechanism, you can try 
> monitoring "Unallocated" and reserve at least 2 block group (2GiB). Or 
> you can have a weekly timer to rebalance your btrfs volume.
> kdave/btrfsmaintenance should helps you.

I started running this script from cron every 10 minutes:


#!/usr/bin/bash

# mount points of btrfs filesystems on /dev/sdXN; the mount point may be "/" itself
mount | sed -nr 's%^/dev/sd[a-z][0-9] on (/[/_[:alnum:]]*) type btrfs .*$%\1%; T; p' |
while read mnt; do
     # act only if some device has less than 64 GiB unallocated
     if
         btrfs dev usage "$mnt" -g |
         perl -ne '/Unallocated: +([0-9]+\.[0-9]{2})GiB/ and $1 < 64 and print $1' |
         grep -q .
     then
         echo "cleaning up $mnt"

         for usa in $(seq 0 10 100); do
             # I don't know whether to start with "dusage" or "musage", so I shuffle them;
             # on 15 April 2024 my server was locked, "dusage" returned "enospace",
             # and it got unlocked after "musage=0"
             {
                 echo d $usa
                 echo m $usa
             } | shuf
         done |
         while read typ usa; do

             echo; echo; date; echo "balance type=$typ, usage=$usa"

             out="$(btrfs balance start -${typ}usage=$usa,limit=3 "$mnt" 2>&1)"
             echo "$out"

             # if nothing was done, then try higher usage
             echo "$out" | grep -q "had to relocate 0 out of" && continue

             # otherwise finish: on error or on successful relocate
             break
         done
     fi
done







> If you have zabbix or other monitoring mechanism, you can try
> monitoring "Unallocated" and reserve at least 2 block group (2GiB). Or
> you can have a weekly timer to rebalance your btrfs volume.
> kdave/btrfsmaintenance should helps you.

Thanks for the hints :) :)

This answers my questions:

1. I have to rebalance when Unallocated is low.

2. I have to keep at least 2 GiB unallocated.






* Re: enospc errors during balance — how to prevent out of space
  2024-04-16 15:17   ` Leszek Dubiel
@ 2024-04-16 17:00     ` Forza
  2024-04-16 20:07       ` Leszek Dubiel
  0 siblings, 1 reply; 9+ messages in thread
From: Forza @ 2024-04-16 17:00 UTC (permalink / raw)
  To: Leszek Dubiel, HAN Yuwei, linux-btrfs



---- From: Leszek Dubiel <leszek@dubiel.pl> -- Sent: 2024-04-16 - 17:17 ----

> 
> 
>>>
>>>
>>> I noticed 1Mb for Unallocated space, so
>>> I have run multiple times balance (data usage filter):
>>>
>>>          btrfs balance start -dusage=XX,limit=1 /
>>>
>>>
>>> and it didn't help.
>> You can try add a small device (USB stick) and rebalance.
>>> It even got error no space when balancing:
> 
> 
> When it refused to balance i tried musage, dusage, few times and it helped.
> Thanks for tip.
> 

It is not wise to balance metadata block groups, because it can directly lead to the situation you ended up with. Btrfs needs unallocated disk space to be able to allocate new data or metadata chunks. If it cannot allocate new data chunks, you will see an out-of-disk-space error. However, if btrfs cannot allocate new metadata chunks, the filesystem will turn read-only, making it much harder to fix, as you cannot balance or add additional space on a read-only filesystem.


> 
> 
> 
>> If you have zabbix or other monitoring mechanism, you can try 
>> monitoring "Unallocated" and reserve at least 2 block group (2GiB). Or 
>> you can have a weekly timer to rebalance your btrfs volume.
>> kdave/btrfsmaintenance should helps you.
> 
> I started to run this script from cron every 10 minutes:
> 
Do you mean you run this continuously on your filesystem? This is normally not required and will increase wear on your disks. 
>
> This solves my questions:
> 
> 1. i have to rebalance when Unallocated is low
> 
> 2. i have to keep 2Gb at least.

You should keep at least 1GB times what the profile needs: for DUP that is 2GB, and for RAID1 at least 1GB on each of two devices.

It would be better to have more, to be safe.
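The rule above can be sketched as a count of devices that still have enough unallocated space. The helper name devices_with_unallocated is made up, and the input is assumed to come from "btrfs device usage -g <mnt>" so that all values are in GiB.

```shell
# Sketch: count devices that still have at least $1 GiB unallocated, reading
# `btrfs device usage -g <mnt>` output from stdin. Hypothetical helper.
devices_with_unallocated() {
    awk -v need="$1" '
    $1 == "Unallocated:" && ($2 + 0) >= need { n++ }
    END { print n + 0 }'
}
```

For RAID1 the check would be that "btrfs device usage -g / | devices_with_unallocated 1" prints at least 2; for DUP, that "devices_with_unallocated 2" prints at least 1.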

An option that could be used is 'bg_reclaim_threshold', which is a sysfs knob to let the kernel automatically reclaim (balance) block groups that fall under a specific threshold.

https://btrfs.readthedocs.io/en/latest/ch-sysfs.html


* Re: enospc errors during balance — how to prevent out of space
  2024-04-16 17:00     ` Forza
@ 2024-04-16 20:07       ` Leszek Dubiel
  2024-04-21 19:09         ` Forza
  0 siblings, 1 reply; 9+ messages in thread
From: Leszek Dubiel @ 2024-04-16 20:07 UTC (permalink / raw)
  To: linux-btrfs

> It is not wise to balance metadata block groups

I had never run balance with musage= before.

The disk got full.
I removed snapshots.
I ran balance with  dusage= many times.
I hit the "no space" error again.
As a last-ditch attempt I used  musage=0.
This helped.


For comparison, see the script from kdave:

https://github.com/kdave/btrfsmaintenance/blob/be42cb6267055d125994abd6927cf3a26deab74c/btrfs-balance.sh#L55

Anyway, I will use musage only as a last resort.





>   Btrfs needs unallocated disk space to be able to allocate new data or metadata chunks

Yes, thank you.






>   If it cannot allocate new data chunks you will see an out of disk space error. However if btrfs cannot allocate new metadata chunks, the filesystem will turn read-only, making it much harder to fix - as you cannot balance or add additional space in an read-only filesystem.

Now I understand it. :)




>> I started to run this script from cron every 10 minutes:
> Do you mean you run this continuously on your filesystem? This is normally not required and will increase wear on your disks.

Yes. But it runs balance only when Unallocated is less than the limit.


>> Thanks for hints :) :)
>>
>> This solves my questions:
>>
>> 1. i have to rebalance when Unallocated is low
>>
>> 2. i have to keep 2Gb at least.
> You should keep at least 1GB x profile mode. That is for DUP, 2GB, and for RAID1, at least 1GB on two devices.

Great info. :)

My RAID has 3 disks, so I will set a minimum of 3 disks * 1GB = 3GB.


> It would be better to have more, to be safe.
>
> An option that could be used is 'bg_reclaim_threshold', which is a sysfs knob to let the kernel automatically reclaim (balance) block groups that fall under a specific threshold.
>
> https://btrfs.readthedocs.io/en/latest/ch-sysfs.html

On my system it is set to 75.

The disk got full.
I removed snapshots.
The disk then showed 1TB free out of 8TB.
And I got the "no space" error again.
Then I spotted "Unallocated = 1 MiB"
and started to balance by hand.

I will keep watching that system and try everything I have learned from your help.
Thank you.



# cat /sys/fs/btrfs/ec3525ef-b73a-452a-a4ee-86286252d730/bg_reclaim_threshold
75


# btrfs fi show /
Label: none  uuid: ec3525ef-b73a-452a-a4ee-86286252d730
     Total devices 3 FS bytes used 7.22TiB
     devid    1 size 5.43TiB used 5.36TiB path /dev/sdc3
     devid    2 size 5.43TiB used 5.36TiB path /dev/sda3
     devid    3 size 5.43TiB used 5.37TiB path /dev/sdb3



# btrfs fi df /
Data, RAID1: total=8.00TiB, used=7.18TiB
System, RAID1: total=32.00MiB, used=1.36MiB
Metadata, RAID1: total=43.00GiB, used=39.31GiB
GlobalReserve, single: total=512.00MiB, used=48.00KiB









* Re: enospc errors during balance — how to prevent out of space
  2024-04-16 20:07       ` Leszek Dubiel
@ 2024-04-21 19:09         ` Forza
  2024-04-21 19:52           ` Leszek Dubiel
                             ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Forza @ 2024-04-21 19:09 UTC (permalink / raw)
  To: Leszek Dubiel, linux-btrfs



---- From: Leszek Dubiel <leszek@dubiel.pl> -- Sent: 2024-04-16 - 22:07 ----

>> It would be better to have more, to be safe.
>>
>> An option that could be used is 'bg_reclaim_threshold', which is a sysfs knob to let the kernel automatically reclaim (balance) block groups that fall under a specific threshold.
>>
>> https://btrfs.readthedocs.io/en/latest/ch-sysfs.html
> 
> On my system it is set to 75.
> 

The default is '0'. There are two different sysfs files with that name. You should look at the one under:

 /sys/fs/btrfs/<uuid>/allocation/data/
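A hedged sketch of reading and setting that per-data knob. The function names are made up, SYSFS_ROOT is an override added only so the functions can be tried on a scratch directory, and the value 30 in the usage line is an arbitrary illustration, not a recommendation. Writing the real file needs root and a kernel that exposes it.

```shell
# Sketch: show and set the reclaim threshold under allocation/data.
# SYSFS_ROOT can be overridden for a dry run on a scratch directory.
SYSFS_ROOT=${SYSFS_ROOT:-/sys/fs/btrfs}

# Print "<file>: <value>" for every mounted btrfs filesystem.
show_data_reclaim_threshold() {
    for f in "$SYSFS_ROOT"/*/allocation/data/bg_reclaim_threshold; do
        [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    return 0
}

# Write a percentage into the knob of the filesystem with the given UUID.
set_data_reclaim_threshold() {
    uuid=$1 percent=$2
    echo "$percent" > "$SYSFS_ROOT/$uuid/allocation/data/bg_reclaim_threshold"
}
```

Example: "set_data_reclaim_threshold <uuid> 30" followed by "show_data_reclaim_threshold" to confirm the new value.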




* Re: enospc errors during balance — how to prevent out of space
  2024-04-21 19:09         ` Forza
@ 2024-04-21 19:52           ` Leszek Dubiel
  2024-04-27 12:03           ` Leszek Dubiel
  2024-04-29 21:05           ` Leszek Dubiel
  2 siblings, 0 replies; 9+ messages in thread
From: Leszek Dubiel @ 2024-04-21 19:52 UTC (permalink / raw)
  To: Forza, linux-btrfs



>>> It would be better to have more, to be safe.
>>>
>>> An option that could be used is 'bg_reclaim_threshold', which is a
>>> sysfs knob to let the kernel automatically reclaim (balance) block
>>> groups that fall under a specific threshold.
>>>
>>> https://btrfs.readthedocs.io/en/latest/ch-sysfs.html
>>
>> On my system it is set to 75.
>
> It is '0' as default. There are two different sysfs files with that
> name. You should look the one at:
>
>  /sys/fs/btrfs/<uuid>/allocation/data/


Thank you.

One file has 75 and the other has 0.



# grep . /sys/fs/btrfs/*/{,allocation/data}/bg_reclaim_threshold

/sys/fs/btrfs/6ff9a384-3dad-4309-803d-c1b9555941f6/bg_reclaim_threshold:75
/sys/fs/btrfs/772bbbcf-78b6-45a5-9d50-e7fd100f09e0/bg_reclaim_threshold:75

/sys/fs/btrfs/6ff9a384-3dad-4309-803d-c1b9555941f6/allocation/data/bg_reclaim_threshold:0
/sys/fs/btrfs/772bbbcf-78b6-45a5-9d50-e7fd100f09e0/allocation/data/bg_reclaim_threshold:0





# btrfs dev usa /mnt/*
/dev/sdb2, ID: 1
    Device size:             5.43TiB
    Device slack:            3.50KiB
    Data,single:             4.95TiB
    Metadata,DUP:           80.00GiB
    System,DUP:             16.00MiB
    Unallocated:           409.00GiB


/dev/sdc1, ID: 1
    Device size:             1.82TiB
    Device slack:            3.50KiB
    Data,single:             1.77TiB
    Metadata,DUP:           48.01GiB
    System,DUP:             64.00MiB
    Unallocated:             1.00GiB    <--- low unallocated





My current script to take care of unallocated space:


    findmnt --types btrfs --output "SOURCE" --nofsroot --noheadings | sort | uniq |
    while read dev; do
        mnt="$(findmnt --types btrfs --first-only --noheadings --output "TARGET" --source "$dev")"

        # no balance if plenty of unallocated space
        btrfs dev usage "$mnt" -g |
        perl -ne '/Unallocated: +([0-9]+\.[0-9]{2})GiB/ and $1 < 9 and print $1' |
        grep -q . || continue

        for typ in dusage musage; do
            for usa in $(seq 0 10 90); do
                # if relocated, then get out of both loops and go on to the next "$dev"
                btrfs balance start -$typ=$usa,limit=3 "$mnt" 2>&1 |
                grep -Eq "Done, had to relocate [1-9][0-9]* out of [0-9]+ chunks" &&
                break 2
            done
        done
    done











* Re: enospc errors during balance — how to prevent out of space
  2024-04-21 19:09         ` Forza
  2024-04-21 19:52           ` Leszek Dubiel
@ 2024-04-27 12:03           ` Leszek Dubiel
  2024-04-29 21:05           ` Leszek Dubiel
  2 siblings, 0 replies; 9+ messages in thread
From: Leszek Dubiel @ 2024-04-27 12:03 UTC (permalink / raw)
  To: linux-btrfs



My system was making backups for about one week.

It was doing automatic "btrfs balance".



Yesterday it went through:

btrfs balance -dusage=0
btrfs balance -dusage=10
btrfs balance -dusage=20
btrfs balance -dusage=30
...
btrfs balance -dusage=100
...
btrfs balance -musage=0
btrfs balance -dusage=10
btrfs balance -dusage=20
...


Something went wrong while balancing with musage (m as in metadata).
The filesystem went read-only.



When this happened, btrfs was in the process of deleting four snapshots
(the output of btrfs sub list / -d was not empty).



It had 450 GB of free space (as shown by df -h).

It had almost no Unallocated space (btrfs dev usa /).



After a reboot the system was mounted read-only.
The console showed the maintenance prompt (Ctrl+D or give root password for maintenance).


Running    btrfs balance -dusage /      on the read-only filesystem failed.

Trying    mount -oremount,rw    hung.


Reboot.



Booted Finnix from a USB stick to repair it.


Started mounting the filesystem.


Dmesg shows:

              bdev /dev/sdc3 errs: wr 0, rd 0, flush 0, corrupt 35967, gen 0
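A small sketch for spotting such counters in the kernel log. It assumes the "bdev ... errs: ..." line format shown above, and corrupt_devices is a made-up helper name. The same per-device counters can also be queried with "btrfs device stats <mnt>".

```shell
# Sketch: read kernel log lines from stdin and print the devices whose
# "corrupt" counter is non-zero. corrupt_devices is a hypothetical helper.
corrupt_devices() {
    sed -n 's/.*bdev \([^ ]*\) errs: .*corrupt \([0-9][0-9]*\),.*/\1 \2/p' |
    awk '$2 > 0 { print $1 ": " $2 " corruption errors" }'
}
```

For example: dmesg | corrupt_devices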


The mount has been running for a long time now.
Nothing more in dmesg.
The mount command seems stalled, but in iotop I see "btrfs-transaction"
running and writing about 10 M/s.



I will leave the system running overnight and check tomorrow or on Monday
whether the mount was successful.




PS. The script that was doing the balancing:


    findmnt --types btrfs --output SOURCE --nofsroot --noheadings | sort | uniq |
    while read dev; do
        mnt=$(findmnt --source "$dev" --output TARGET --first-only --noheadings)
        test -d "$mnt" || continue

        # no balance if plenty of unallocated space
        btrfs dev usage "$mnt" -g |
        perl -ne '/Unallocated: +([0-9]+\.[0-9]{2})GiB/ and $1 < 21 and print $1' |
        grep -q . || continue

        for typ in dusage musage; do
            for usa in $(seq 0 10 100); do
                # if relocated, then get out of both loops and go on to the next "$dev"
                btrfs balance start -$typ=$usa,limit=3 "$mnt" 2>&1 |
                grep -Eq "Done, had to relocate [1-9][0-9]* out of [0-9]+ chunks" &&
                break 2
            done
        done
    done







* Re: enospc errors during balance — how to prevent out of space
  2024-04-21 19:09         ` Forza
  2024-04-21 19:52           ` Leszek Dubiel
  2024-04-27 12:03           ` Leszek Dubiel
@ 2024-04-29 21:05           ` Leszek Dubiel
  2 siblings, 0 replies; 9+ messages in thread
From: Leszek Dubiel @ 2024-04-29 21:05 UTC (permalink / raw)
  To: linux-btrfs



Final report on "no space" and balancing.
I am writing it in case it helps someone.


My problem was identical to this one:

https://superuser.com/questions/1573030/cannot-repair-btrfs-partition-because-there-is-too-little-space-left



While mounting it went "read only" because of errors:

         errs: wr: 0, rd 0, flush 0, corrupt: 35967, gen 0

         "No space left failed to recover relocation"

         space_info: metadata ... is full



On a read-only filesystem you cannot add more devices and you cannot finish a balance.



I tried to repair the filesystem: no errors.



This was done on a text console, so I uploaded photos here:

https://postimg.cc/gallery/09dzGRG



In the end I wiped the disks and started rebuilding the system from scratch.





I reworked the balancing script to start balancing earlier and gently,
then get more aggressive as unallocated space shrinks:



     findmnt --types btrfs --output SOURCE --nofsroot --noheadings | sort | uniq |
     while read dev; do

         # mount point
         mnt=$(findmnt --source "$dev" --output TARGET --first-only --noheadings)
         test -d "$mnt" || continue

         # lowest unallocated space (whole GiB) among all disks in the btrfs filesystem
         una=$(
             btrfs dev usage "$mnt" -g |
             sed -nr 's/.*Unallocated: +([0-9]+)\.[0-9]{2}GiB.*/\1/; T; p' |
             sort -n | head -n1
         )
         echo -n "$una" | tr -c "[[:print:]]" "#" | grep -Eq '^[0-9]+$' || continue

         # if lots of unallocated space do nothing
         if test "$una" -ge 32; then
             : # do nothing

         # if going low, then optimize "dusage" slowly
         elif test "$una" -ge 16; then
             seq --format "-dusage=%g,limit=2" 0 20 100

         elif test "$una" -ge 8; then
             seq --format "-dusage=%g,limit=3" 0 10 100

         # if critically low, then balance both "dusage" and "musage"
         else
             seq --format "-dusage=%g,limit=4" 0 10 100
             seq --format "-musage=%g,limit=1" 0 10 100
         fi |

         # try the options above in order until some chunks are relocated
         while read opt; do

             btrfs balance start "$opt" "$mnt" 2>&1 |
             grep -Eq "Done, had to relocate [1-9][0-9]* out of [0-9]+ chunks" &&

             # something was relocated: stop here, the outer loop moves on to the next "$dev"
             break
         done
     done
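The tier selection can be tried without touching a filesystem by isolating it in a function. This is only a sketch of the thresholds from the script above; balance_options is a made-up name, and it uses the short "seq -f" form of the same option.

```shell
# Sketch: print one `btrfs balance start` filter per line for a given lowest
# unallocated value in whole GiB. balance_options is a hypothetical helper.
balance_options() {
    una=$1
    if [ "$una" -ge 32 ]; then
        :     # plenty of room: print nothing
    elif [ "$una" -ge 16 ]; then
        seq -f "-dusage=%g,limit=2" 0 20 100
    elif [ "$una" -ge 8 ]; then
        seq -f "-dusage=%g,limit=3" 0 10 100
    else
        # critically low: data filters first, metadata filters last
        seq -f "-dusage=%g,limit=4" 0 10 100
        seq -f "-musage=%g,limit=1" 0 10 100
    fi
}
```

For example, "balance_options 20" lists the six gentle data filters, while "balance_options 3" also lists the metadata filters that earlier replies in this thread advise against.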







Thread overview: 9+ messages
2024-04-16  6:09 enospc errors during balance — how to prevent out of space Leszek Dubiel
2024-04-16  7:54 ` HAN Yuwei
2024-04-16 15:17   ` Leszek Dubiel
2024-04-16 17:00     ` Forza
2024-04-16 20:07       ` Leszek Dubiel
2024-04-21 19:09         ` Forza
2024-04-21 19:52           ` Leszek Dubiel
2024-04-27 12:03           ` Leszek Dubiel
2024-04-29 21:05           ` Leszek Dubiel
