* btrfs metadata has reserved 1T of extra space and balances don't reclaim it
@ 2021-09-29  2:23 Brandon Heisner
  2021-09-29  7:23 ` Forza
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread

From: Brandon Heisner @ 2021-09-29 2:23 UTC (permalink / raw)
To: linux-btrfs

I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is
version locked to that kernel. The metadata has reserved a full 1T of
disk space, while only using ~38G. I've tried to balance the metadata
to reclaim that space so it can be used for data, but it doesn't work
and gives no errors. It just says it balanced the chunks, but the size
doesn't change. The metadata total is still growing as well, as it
used to be 1.04 and now it is 1.08 with only about 10G more of
metadata used. I've tried doing balances up to 70 or 80 musage, I
think, and the total metadata does not decrease. I've made so many
attempts at balancing that I've probably tried to move 300 chunks or
more. None have resulted in any change to the metadata total like they
do on other servers running btrfs. I first started with a very low
musage, like 10, and then increased it by 10 to see whether that would
balance any chunks out, but with no success.

# /sbin/btrfs balance start -musage=60 -mlimit=30 /opt/zimbra
Done, had to relocate 30 out of 2127 chunks

I can run that command over and over again, or increase the mlimit,
and it never changes the metadata total.

# btrfs fi show /opt/zimbra/
Label: 'Data'  uuid: ece150db-5817-4704-9e84-80f7d8a3b1da
        Total devices 4 FS bytes used 1.48TiB
        devid    1 size 1.46TiB used 1.38TiB path /dev/sde
        devid    2 size 1.46TiB used 1.38TiB path /dev/sdf
        devid    3 size 1.46TiB used 1.38TiB path /dev/sdg
        devid    4 size 1.46TiB used 1.38TiB path /dev/sdh

# btrfs fi df /opt/zimbra/
Data, RAID10: total=1.69TiB, used=1.45TiB
System, RAID10: total=64.00MiB, used=640.00KiB
Metadata, RAID10: total=1.08TiB, used=37.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs fi us /opt/zimbra/ -T
Overall:
    Device size:           5.82TiB
    Device allocated:      5.54TiB
    Device unallocated:  291.54GiB
    Device missing:          0.00B
    Used:                  2.96TiB
    Free (estimated):    396.36GiB  (min: 396.36GiB)
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)

             Data      Metadata  System
Id Path      RAID10    RAID10    RAID10    Unallocated
-- --------  --------- --------- --------- -----------
 1 /dev/sde  432.75GiB 276.00GiB  16.00MiB   781.65GiB
 2 /dev/sdf  432.75GiB 276.00GiB  16.00MiB   781.65GiB
 3 /dev/sdg  432.75GiB 276.00GiB  16.00MiB   781.65GiB
 4 /dev/sdh  432.75GiB 276.00GiB  16.00MiB   781.65GiB
-- --------  --------- --------- --------- -----------
   Total       1.69TiB   1.08TiB  64.00MiB     3.05TiB
   Used        1.45TiB  37.69GiB 640.00KiB

--
Brandon Heisner
System Administrator
Wolfram Research

^ permalink raw reply	[flat|nested] 11+ messages in thread
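The stepwise approach described above (raising musage by 10 per pass and re-checking) can be scripted; this is only an illustrative sketch, assuming root and a btrfs filesystem mounted at the given path, with a hypothetical function name:

```shell
# Sketch: walk musage upward in steps of 10 and print the metadata
# totals after each pass, so you can see whether "Metadata, ... total="
# ever shrinks. The step list and function name are illustrative.
balance_metadata_stepwise() {
    mnt=$1
    for pct in 10 20 30 40 50 60 70 80; do
        # Rewrite only metadata chunks that are at most pct% full.
        btrfs balance start -musage="$pct" "$mnt" || return 1
        # Show the current space totals after each pass.
        btrfs filesystem df "$mnt"
    done
}
```

Usage would be `balance_metadata_stepwise /opt/zimbra`.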
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29  2:23 btrfs metadata has reserved 1T of extra space and balances don't reclaim it Brandon Heisner
@ 2021-09-29  7:23 ` Forza
  2021-09-29 14:34   ` Brandon Heisner
  2021-09-29  8:22 ` Qu Wenruo
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread

From: Forza @ 2021-09-29 7:23 UTC (permalink / raw)
To: brandonh, linux-btrfs

---- From: Brandon Heisner <brandonh@wolfram.com> -- Sent: 2021-09-29 - 04:23 ----

> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is
> version locked to that kernel. The metadata has reserved a full 1T of
> disk space, while only using ~38G. I've tried to balance the metadata
> to reclaim that space so it can be used for data, but it doesn't work
> and gives no errors. It just says it balanced the chunks, but the size
> doesn't change. The metadata total is still growing as well, as it
> used to be 1.04 and now it is 1.08 with only about 10G more of
> metadata used. I've tried doing balances up to 70 or 80 musage, I
> think, and the total metadata does not decrease. I've made so many
> attempts at balancing that I've probably tried to move 300 chunks or
> more. None have resulted in any change to the metadata total like they
> do on other servers running btrfs. I first started with a very low
> musage, like 10, and then increased it by 10 to see whether that would
> balance any chunks out, but with no success.
>
> # /sbin/btrfs balance start -musage=60 -mlimit=30 /opt/zimbra
> Done, had to relocate 30 out of 2127 chunks
>
> I can run that command over and over again, or increase the mlimit,
> and it never changes the metadata total.
>
> # btrfs fi show /opt/zimbra/
> Label: 'Data'  uuid: ece150db-5817-4704-9e84-80f7d8a3b1da
>         Total devices 4 FS bytes used 1.48TiB
>         devid    1 size 1.46TiB used 1.38TiB path /dev/sde
>         devid    2 size 1.46TiB used 1.38TiB path /dev/sdf
>         devid    3 size 1.46TiB used 1.38TiB path /dev/sdg
>         devid    4 size 1.46TiB used 1.38TiB path /dev/sdh
>
> # btrfs fi df /opt/zimbra/
> Data, RAID10: total=1.69TiB, used=1.45TiB
> System, RAID10: total=64.00MiB, used=640.00KiB
> Metadata, RAID10: total=1.08TiB, used=37.69GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> # btrfs fi us /opt/zimbra/ -T
> Overall:
>     Device size:           5.82TiB
>     Device allocated:      5.54TiB
>     Device unallocated:  291.54GiB
>     Device missing:          0.00B
>     Used:                  2.96TiB
>     Free (estimated):    396.36GiB  (min: 396.36GiB)
>     Data ratio:               2.00
>     Metadata ratio:           2.00
>     Global reserve:      512.00MiB  (used: 0.00B)
>
>              Data      Metadata  System
> Id Path      RAID10    RAID10    RAID10    Unallocated
> -- --------  --------- --------- --------- -----------
>  1 /dev/sde  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  2 /dev/sdf  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  3 /dev/sdg  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  4 /dev/sdh  432.75GiB 276.00GiB  16.00MiB   781.65GiB
> -- --------  --------- --------- --------- -----------
>    Total       1.69TiB   1.08TiB  64.00MiB     3.05TiB
>    Used        1.45TiB  37.69GiB 640.00KiB
>
> --
> Brandon Heisner
> System Administrator
> Wolfram Research

What are your mount options? Do you by any chance use the metadata_ratio mount option?

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS

^ permalink raw reply	[flat|nested] 11+ messages in thread
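One quick way to answer the mount-options question above is to list the options each btrfs mount is actually running with (metadata_ratio would show up there if set). A small illustrative helper, assuming util-linux's findmnt is available:

```shell
# Sketch: print TARGET and OPTIONS for every mounted btrfs filesystem.
# Falls back to /proc/mounts if findmnt is not installed.
show_btrfs_mount_options() {
    findmnt -t btrfs -o TARGET,OPTIONS 2>/dev/null || grep btrfs /proc/mounts
}
```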
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29  7:23 ` Forza
@ 2021-09-29 14:34   ` Brandon Heisner
  2021-10-03 11:26     ` Forza
  0 siblings, 1 reply; 11+ messages in thread

From: Brandon Heisner @ 2021-09-29 14:34 UTC (permalink / raw)
To: Forza; +Cc: linux-btrfs

No, I do not use that option. Also, because btrfs does not honor
per-subvolume mount options, I have compression and nodatacow set via
filesystem attributes on the directories that are btrfs subvolumes.

UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra           btrfs subvol=zimbra,defaults,discard,compress=lzo 0 0
UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /var/log              btrfs subvol=root-var-log,defaults,discard,compress=lzo 0 0
UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/db        btrfs subvol=db,defaults,discard,nodatacow 0 0
UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/index     btrfs subvol=index,defaults,discard,compress=lzo 0 0
UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/store     btrfs subvol=store,defaults,discard,compress=lzo 0 0
UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/log       btrfs subvol=log,defaults,discard,compress=lzo 0 0
UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/snapshots btrfs subvol=snapshots,defaults,discard,compress=lzo 0 0

----- On Sep 29, 2021, at 2:23 AM, Forza forza@tnonline.net wrote:

> ---- From: Brandon Heisner <brandonh@wolfram.com> -- Sent: 2021-09-29 - 04:23 ----
>
>> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
>> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is
>> version locked to that kernel. The metadata has reserved a full 1T of
>> disk space, while only using ~38G. I've tried to balance the metadata
>> to reclaim that space so it can be used for data, but it doesn't work
>> and gives no errors. It just says it balanced the chunks, but the size
>> doesn't change. The metadata total is still growing as well, as it
>> used to be 1.04 and now it is 1.08 with only about 10G more of
>> metadata used. I've tried doing balances up to 70 or 80 musage, I
>> think, and the total metadata does not decrease. I've made so many
>> attempts at balancing that I've probably tried to move 300 chunks or
>> more. None have resulted in any change to the metadata total like they
>> do on other servers running btrfs. I first started with a very low
>> musage, like 10, and then increased it by 10 to see whether that would
>> balance any chunks out, but with no success.
>>
>> # /sbin/btrfs balance start -musage=60 -mlimit=30 /opt/zimbra
>> Done, had to relocate 30 out of 2127 chunks
>>
>> I can run that command over and over again, or increase the mlimit,
>> and it never changes the metadata total.
>>
>> # btrfs fi show /opt/zimbra/
>> Label: 'Data'  uuid: ece150db-5817-4704-9e84-80f7d8a3b1da
>>         Total devices 4 FS bytes used 1.48TiB
>>         devid    1 size 1.46TiB used 1.38TiB path /dev/sde
>>         devid    2 size 1.46TiB used 1.38TiB path /dev/sdf
>>         devid    3 size 1.46TiB used 1.38TiB path /dev/sdg
>>         devid    4 size 1.46TiB used 1.38TiB path /dev/sdh
>>
>> # btrfs fi df /opt/zimbra/
>> Data, RAID10: total=1.69TiB, used=1.45TiB
>> System, RAID10: total=64.00MiB, used=640.00KiB
>> Metadata, RAID10: total=1.08TiB, used=37.69GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> # btrfs fi us /opt/zimbra/ -T
>> Overall:
>>     Device size:           5.82TiB
>>     Device allocated:      5.54TiB
>>     Device unallocated:  291.54GiB
>>     Device missing:          0.00B
>>     Used:                  2.96TiB
>>     Free (estimated):    396.36GiB  (min: 396.36GiB)
>>     Data ratio:               2.00
>>     Metadata ratio:           2.00
>>     Global reserve:      512.00MiB  (used: 0.00B)
>>
>>              Data      Metadata  System
>> Id Path      RAID10    RAID10    RAID10    Unallocated
>> -- --------  --------- --------- --------- -----------
>>  1 /dev/sde  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  2 /dev/sdf  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  3 /dev/sdg  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  4 /dev/sdh  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>> -- --------  --------- --------- --------- -----------
>>    Total       1.69TiB   1.08TiB  64.00MiB     3.05TiB
>>    Used        1.45TiB  37.69GiB 640.00KiB
>>
>> --
>> Brandon Heisner
>> System Administrator
>> Wolfram Research
>
> What are your mount options? Do you by any chance use the metadata_ratio mount option?
>
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29 14:34 ` Brandon Heisner
@ 2021-10-03 11:26   ` Forza
  2021-10-03 18:21     ` Zygo Blaxell
  0 siblings, 1 reply; 11+ messages in thread

From: Forza @ 2021-10-03 11:26 UTC (permalink / raw)
To: brandonh; +Cc: linux-btrfs

On 2021-09-29 16:34, Brandon Heisner wrote:
> No, I do not use that option. Also, because btrfs does not honor
> per-subvolume mount options, I have compression and nodatacow set via
> filesystem attributes on the directories that are btrfs subvolumes.
>
> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra           btrfs subvol=zimbra,defaults,discard,compress=lzo 0 0
> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /var/log              btrfs subvol=root-var-log,defaults,discard,compress=lzo 0 0
> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/db        btrfs subvol=db,defaults,discard,nodatacow 0 0
> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/index     btrfs subvol=index,defaults,discard,compress=lzo 0 0
> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/store     btrfs subvol=store,defaults,discard,compress=lzo 0 0
> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/log       btrfs subvol=log,defaults,discard,compress=lzo 0 0
> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/snapshots btrfs subvol=snapshots,defaults,discard,compress=lzo 0 0

It might be worth looking into discard=async (*) or setting up a
regular fstrim job instead of using the discard mount option.

* async discard: "mount -o discard=async" to enable it. Freed extents
  are not discarded immediately, but grouped together and trimmed
  later, with IO rate limiting.
  https://lore.kernel.org/lkml/cover.1580142284.git.dsterba@suse.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread
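The "regular fstrim" alternative suggested above can be as simple as a daily cron entry or the fstrim.timer unit that ships with util-linux. A minimal illustrative sketch, assuming root and a btrfs mount point passed as an argument:

```shell
# Sketch: trim free space on one mount, suitable for a periodic job.
# Mount point and function name are illustrative.
trim_btrfs_mount() {
    mnt=$1
    # -v makes fstrim report how many bytes it trimmed.
    fstrim -v "$mnt"
}
```

On systemd hosts, `systemctl enable --now fstrim.timer` achieves much the same thing on a schedule without any custom scripting.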
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-10-03 11:26 ` Forza
@ 2021-10-03 18:21   ` Zygo Blaxell
  0 siblings, 0 replies; 11+ messages in thread

From: Zygo Blaxell @ 2021-10-03 18:21 UTC (permalink / raw)
To: Forza; +Cc: brandonh, linux-btrfs

On Sun, Oct 03, 2021 at 01:26:24PM +0200, Forza wrote:
> On 2021-09-29 16:34, Brandon Heisner wrote:
>> No, I do not use that option. Also, because btrfs does not honor
>> per-subvolume mount options, I have compression and nodatacow set via
>> filesystem attributes on the directories that are btrfs subvolumes.
>>
>> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra           btrfs subvol=zimbra,defaults,discard,compress=lzo 0 0
>> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /var/log              btrfs subvol=root-var-log,defaults,discard,compress=lzo 0 0
>> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/db        btrfs subvol=db,defaults,discard,nodatacow 0 0
>> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/index     btrfs subvol=index,defaults,discard,compress=lzo 0 0
>> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/store     btrfs subvol=store,defaults,discard,compress=lzo 0 0
>> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/log       btrfs subvol=log,defaults,discard,compress=lzo 0 0
>> UUID=ece150db-5817-4704-9e84-80f7d8a3b1da /opt/zimbra/snapshots btrfs subvol=snapshots,defaults,discard,compress=lzo 0 0
>
> It might be worth looking into discard=async (*) or setting up a
> regular fstrim job instead of using the discard mount option.

Brandon's kernel (4.9.5 from 2017) is three years too old to have
working discard=async. Upgrading the kernel would most likely fix the
problems even without changing mount options.

> * async discard: "mount -o discard=async" to enable it. Freed extents
>   are not discarded immediately, but grouped together and trimmed
>   later, with IO rate limiting.
>   https://lore.kernel.org/lkml/cover.1580142284.git.dsterba@suse.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29  2:23 btrfs metadata has reserved 1T of extra space and balances don't reclaim it Brandon Heisner
  2021-09-29  7:23 ` Forza
@ 2021-09-29  8:22 ` Qu Wenruo
  2021-09-29 15:18 ` Andrea Gelmini
  2021-09-29 17:31 ` Zygo Blaxell
  3 siblings, 0 replies; 11+ messages in thread

From: Qu Wenruo @ 2021-09-29 8:22 UTC (permalink / raw)
To: brandonh, linux-btrfs

On 2021/9/29 10:23, Brandon Heisner wrote:
> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is
> version locked to that kernel. The metadata has reserved a full 1T of
> disk space, while only using ~38G. I've tried to balance the metadata
> to reclaim that space so it can be used for data, but it doesn't work
> and gives no errors. It just says it balanced the chunks, but the size
> doesn't change. The metadata total is still growing as well, as it
> used to be 1.04 and now it is 1.08 with only about 10G more of
> metadata used. I've tried doing balances up to 70 or 80 musage, I
> think, and the total metadata does not decrease. I've made so many
> attempts at balancing that I've probably tried to move 300 chunks or
> more. None have resulted in any change to the metadata total like they
> do on other servers running btrfs. I first started with a very low
> musage, like 10, and then increased it by 10 to see whether that would
> balance any chunks out, but with no success.
>
> # /sbin/btrfs balance start -musage=60 -mlimit=30 /opt/zimbra
> Done, had to relocate 30 out of 2127 chunks

One question: did -musage=0 result in any change?

If there are empty metadata block groups, btrfs should be able to
reclaim them without any extra commands.

And is there any dmesg output during the above -musage=0 balance?

Thanks,
Qu

> I can run that command over and over again, or increase the mlimit,
> and it never changes the metadata total.
>
> # btrfs fi show /opt/zimbra/
> Label: 'Data'  uuid: ece150db-5817-4704-9e84-80f7d8a3b1da
>         Total devices 4 FS bytes used 1.48TiB
>         devid    1 size 1.46TiB used 1.38TiB path /dev/sde
>         devid    2 size 1.46TiB used 1.38TiB path /dev/sdf
>         devid    3 size 1.46TiB used 1.38TiB path /dev/sdg
>         devid    4 size 1.46TiB used 1.38TiB path /dev/sdh
>
> # btrfs fi df /opt/zimbra/
> Data, RAID10: total=1.69TiB, used=1.45TiB
> System, RAID10: total=64.00MiB, used=640.00KiB
> Metadata, RAID10: total=1.08TiB, used=37.69GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> # btrfs fi us /opt/zimbra/ -T
> Overall:
>     Device size:           5.82TiB
>     Device allocated:      5.54TiB
>     Device unallocated:  291.54GiB
>     Device missing:          0.00B
>     Used:                  2.96TiB
>     Free (estimated):    396.36GiB  (min: 396.36GiB)
>     Data ratio:               2.00
>     Metadata ratio:           2.00
>     Global reserve:      512.00MiB  (used: 0.00B)
>
>              Data      Metadata  System
> Id Path      RAID10    RAID10    RAID10    Unallocated
> -- --------  --------- --------- --------- -----------
>  1 /dev/sde  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  2 /dev/sdf  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  3 /dev/sdg  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  4 /dev/sdh  432.75GiB 276.00GiB  16.00MiB   781.65GiB
> -- --------  --------- --------- --------- -----------
>    Total       1.69TiB   1.08TiB  64.00MiB     3.05TiB
>    Used        1.45TiB  37.69GiB 640.00KiB

^ permalink raw reply	[flat|nested] 11+ messages in thread
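The check Qu suggests above can be sketched as a pair of commands: reclaim only completely empty metadata block groups, then look at recent kernel messages for balance or relocation errors. Illustrative only; assumes root and a btrfs mount point:

```shell
# Sketch: -musage=0 touches only metadata block groups that are
# completely empty, so it is cheap; balance progress and errors are
# reported by the kernel, hence the dmesg check afterwards.
reclaim_empty_metadata() {
    mnt=$1
    btrfs balance start -musage=0 "$mnt"
    # Inspect the most recent kernel messages for relocation errors.
    dmesg | tail -n 20
}
```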
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29  2:23 btrfs metadata has reserved 1T of extra space and balances don't reclaim it Brandon Heisner
  2021-09-29  7:23 ` Forza
  2021-09-29  8:22 ` Qu Wenruo
@ 2021-09-29 15:18 ` Andrea Gelmini
  2021-09-29 16:39   ` Forza
  2021-09-29 17:31 ` Zygo Blaxell
  3 siblings, 1 reply; 11+ messages in thread

From: Andrea Gelmini @ 2021-09-29 15:18 UTC (permalink / raw)
To: brandonh; +Cc: Linux BTRFS

Il giorno mer 29 set 2021 alle ore 04:41 Brandon Heisner
<brandonh@wolfram.com> ha scritto:
>
> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is
> version locked to that kernel. The metadata has reserved a full 1T of
> disk space, while only using ~38G. I've tried to balance the metadata
> to reclaim that space so it can be used for data, but it doesn't work
> and gives no errors. It just says it balanced the chunks, but the size
> doesn't change. The metadata total is still growing as well, as it
> used to be 1.04 and now it is 1.08 with only about 10G more of
> metadata used. I've tried doing balances up to 70 or 80 musage I think, and

Similar situation here: an 18TB single disk with one big snapraid
parity file and a lot of metadata allocated.

I solved it with:
btrfs filesystem defrag -v -r -clzo .
(the compression was useless in my case)

Soon after starting it, I already saw space being reclaimed.

In the end I fell back to exfat to avoid continually
re-reading/re-writing all the data just to avoid "metadata waste".

Ciao,
Gelma

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29 15:18 ` Andrea Gelmini
@ 2021-09-29 16:39   ` Forza
  2021-09-29 18:55     ` Andrea Gelmini
  0 siblings, 1 reply; 11+ messages in thread

From: Forza @ 2021-09-29 16:39 UTC (permalink / raw)
To: Andrea Gelmini, brandonh; +Cc: Linux BTRFS

---- From: Andrea Gelmini <andrea.gelmini@gmail.com> -- Sent: 2021-09-29 - 17:18 ----

> Il giorno mer 29 set 2021 alle ore 04:41 Brandon Heisner
> <brandonh@wolfram.com> ha scritto:
>>
>> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
>> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is
>> version locked to that kernel. The metadata has reserved a full 1T of
>> disk space, while only using ~38G. I've tried to balance the metadata
>> to reclaim that space so it can be used for data, but it doesn't work
>> and gives no errors. It just says it balanced the chunks, but the size
>> doesn't change. The metadata total is still growing as well, as it
>> used to be 1.04 and now it is 1.08 with only about 10G more of
>> metadata used. I've tried doing balances up to 70 or 80 musage I think, and
>
> Similar situation here: an 18TB single disk with one big snapraid
> parity file and a lot of metadata allocated.
>
> I solved it with:
> btrfs filesystem defrag -v -r -clzo .
> (the compression was useless in my case)
>
> Soon after starting it, I already saw space being reclaimed.
>
> In the end I fell back to exfat to avoid continually
> re-reading/re-writing all the data just to avoid "metadata waste".
>
> Ciao,
> Gelma

Maybe the autodefrag mount option might be helpful?

Your problem sounds like partially filled extents rather than
metadata. Typical scenarios where that happens are some databases and
VM images: a file can occupy much more space than its actual data. Use
'compsize' to determine whether this is the case.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29 16:39 ` Forza
@ 2021-09-29 18:55   ` Andrea Gelmini
  0 siblings, 0 replies; 11+ messages in thread

From: Andrea Gelmini @ 2021-09-29 18:55 UTC (permalink / raw)
To: Forza; +Cc: brandonh, Linux BTRFS

Il giorno mer 29 set 2021 alle ore 18:39 Forza <forza@tnonline.net> ha scritto:
> Maybe the autodefrag mount option might be helpful?

It has been enabled since the beginning.

> Your problem sounds like partially filled extents rather than
> metadata. Typical scenarios where that happens are some databases and
> VM images: a file can occupy much more space than its actual data. Use
> 'compsize' to determine whether this is the case.

I confirm it is one big file with random writes, so I agree about the
extents. But I'm quite confident the same approach can fix the
original question.

Ciao,
Gelma

^ permalink raw reply	[flat|nested] 11+ messages in thread
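The diagnosis discussed in the two messages above (partially filled extents pinned by random overwrites) can be checked per file with compsize, a separate tool not shipped with btrfs-progs. An illustrative sketch; the path and function name are hypothetical:

```shell
# Sketch: compsize reports "Disk usage" (allocated extent space) next
# to "Referenced" (bytes actually reachable from the file). A large gap
# between the two suggests partially filled extents, the situation
# Forza describes for databases and VM images.
check_extent_waste() {
    compsize "$1"
}
```

If the gap is large, a targeted `btrfs filesystem defrag` of just that file (as Andrea did) rewrites the extents, at the cost of breaking shared-extent (reflink/snapshot) sharing.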
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29  2:23 btrfs metadata has reserved 1T of extra space and balances don't reclaim it Brandon Heisner
  ` (2 preceding siblings ...)
  2021-09-29 15:18 ` Andrea Gelmini
@ 2021-09-29 17:31 ` Zygo Blaxell
  2021-10-01  7:49   ` Brandon Heisner
  3 siblings, 1 reply; 11+ messages in thread

From: Zygo Blaxell @ 2021-09-29 17:31 UTC (permalink / raw)
To: Brandon Heisner; +Cc: linux-btrfs

On Tue, Sep 28, 2021 at 09:23:01PM -0500, Brandon Heisner wrote:
> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is

That is a really old kernel. I recall there were some anomalous
metadata allocation behaviors with kernels of that age; e.g. running
scrub and balance at the same time would allocate a lot of metadata,
because scrub would lock a metadata block group immediately after it
had been allocated, forcing another metadata block group to be
allocated immediately. The symptom of that bug is very similar to
yours: without warning, hundreds of GB of metadata block groups are
allocated, all empty, during a scrub or balance operation.

Unfortunately I don't have a better solution than "upgrade to a newer
kernel", as that particular bug was solved years ago (along with
hundreds of others).

> version locked to that kernel. The metadata has reserved a full 1T of
> disk space, while only using ~38G. I've tried to balance the metadata
> to reclaim that space so it can be used for data, but it doesn't work
> and gives no errors. It just says it balanced the chunks, but the size
> doesn't change. The metadata total is still growing as well, as it
> used to be 1.04 and now it is 1.08 with only about 10G more of
> metadata used. I've tried doing balances up to 70 or 80 musage, I
> think, and the total metadata does not decrease. I've made so many
> attempts at balancing that I've probably tried to move 300 chunks or
> more. None have resulted in any change to the metadata total like they
> do on other servers running btrfs. I first started with a very low
> musage, like 10, and then increased it by 10 to see whether that would
> balance any chunks out, but with no success.

Have you tried rebooting? The block groups may be stuck in a locked
state in memory or pinned by pending discard requests, in which case
balance won't touch them. For that matter, try turning off discard
(it's usually better to run fstrim once a day anyway, and not use the
discard mount option).

> # /sbin/btrfs balance start -musage=60 -mlimit=30 /opt/zimbra
> Done, had to relocate 30 out of 2127 chunks
>
> I can run that command over and over again, or increase the mlimit,
> and it never changes the metadata total.

I would use just -m here (no filters, only metadata). If it gets the
allocation under control, run 'btrfs balance cancel'; if it doesn't,
let it run all the way to the end. Each balance starts from the last
block group, so you are effectively restarting balance to process the
same 30 block groups over and over here.

> # btrfs fi show /opt/zimbra/
> Label: 'Data'  uuid: ece150db-5817-4704-9e84-80f7d8a3b1da
>         Total devices 4 FS bytes used 1.48TiB
>         devid    1 size 1.46TiB used 1.38TiB path /dev/sde
>         devid    2 size 1.46TiB used 1.38TiB path /dev/sdf
>         devid    3 size 1.46TiB used 1.38TiB path /dev/sdg
>         devid    4 size 1.46TiB used 1.38TiB path /dev/sdh
>
> # btrfs fi df /opt/zimbra/
> Data, RAID10: total=1.69TiB, used=1.45TiB
> System, RAID10: total=64.00MiB, used=640.00KiB
> Metadata, RAID10: total=1.08TiB, used=37.69GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> # btrfs fi us /opt/zimbra/ -T
> Overall:
>     Device size:           5.82TiB
>     Device allocated:      5.54TiB
>     Device unallocated:  291.54GiB
>     Device missing:          0.00B
>     Used:                  2.96TiB
>     Free (estimated):    396.36GiB  (min: 396.36GiB)
>     Data ratio:               2.00
>     Metadata ratio:           2.00
>     Global reserve:      512.00MiB  (used: 0.00B)
>
>              Data      Metadata  System
> Id Path      RAID10    RAID10    RAID10    Unallocated
> -- --------  --------- --------- --------- -----------
>  1 /dev/sde  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  2 /dev/sdf  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  3 /dev/sdg  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>  4 /dev/sdh  432.75GiB 276.00GiB  16.00MiB   781.65GiB
> -- --------  --------- --------- --------- -----------
>    Total       1.69TiB   1.08TiB  64.00MiB     3.05TiB
>    Used        1.45TiB  37.69GiB 640.00KiB
>
> --
> Brandon Heisner
> System Administrator
> Wolfram Research

^ permalink raw reply	[flat|nested] 11+ messages in thread
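The advice above can be sketched as two small helpers: one that runs a single unfiltered metadata balance, and one to cancel it early (from another shell) once the Metadata total is back under control. Illustrative only; assumes root and a btrfs mount point:

```shell
# Sketch: -m with no filters rewrites every metadata block group once,
# instead of re-processing the same first 30 chunks on every run.
balance_all_metadata() {
    btrfs balance start -m "$1"
}

# Safe to call while the balance above runs elsewhere: the block group
# currently being relocated finishes, then the balance stops.
cancel_running_balance() {
    btrfs balance cancel "$1"
}
```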
* Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
  2021-09-29 17:31 ` Zygo Blaxell
@ 2021-10-01  7:49   ` Brandon Heisner
  0 siblings, 0 replies; 11+ messages in thread

From: Brandon Heisner @ 2021-10-01 7:49 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs

A reboot of the server did help quite a bit with the problem, though
it is still not fixed completely. I went from having 1.08T reserved
for metadata to "only" 446G reserved. My free space went from 346G to
1010G, so at least I have some breathing room again. I prefer not to
do a defrag, as that breaks all the CoW links and disk usage would
then go up. I haven't tried the balance of all the metadata, which
might be resource intensive.

# btrfs fi us /opt/zimbra/ -T
Overall:
    Device size:            5.82TiB
    Device allocated:       4.36TiB
    Device unallocated:     1.46TiB
    Device missing:           0.00B
    Used:                   3.05TiB
    Free (estimated):    1010.62GiB  (min: 1010.62GiB)
    Data ratio:                2.00
    Metadata ratio:            2.00
    Global reserve:       512.00MiB  (used: 0.00B)

             Data      Metadata  System
Id Path      RAID10    RAID10    RAID10    Unallocated
-- --------  --------- --------- --------- -----------
 1 /dev/sdc  446.25GiB 111.50GiB  32.00MiB   932.63GiB
 2 /dev/sdd  446.25GiB 111.50GiB  32.00MiB   932.63GiB
 3 /dev/sde  446.25GiB 111.50GiB  32.00MiB   932.63GiB
 4 /dev/sdf  446.25GiB 111.50GiB  32.00MiB   932.63GiB
-- --------  --------- --------- --------- -----------
   Total       1.74TiB 446.00GiB 128.00MiB     3.64TiB
   Used        1.49TiB  38.16GiB 464.00KiB

# btrfs fi df /opt/zimbra/
Data, RAID10: total=1.74TiB, used=1.49TiB
System, RAID10: total=128.00MiB, used=464.00KiB
Metadata, RAID10: total=446.00GiB, used=38.19GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

----- On Sep 29, 2021, at 12:31 PM, Zygo Blaxell ce3g8jdj@umail.furryterror.org wrote:

> On Tue, Sep 28, 2021 at 09:23:01PM -0500, Brandon Heisner wrote:
>> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
>> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux. It is
>
> That is a really old kernel. I recall there were some anomalous
> metadata allocation behaviors with kernels of that age; e.g. running
> scrub and balance at the same time would allocate a lot of metadata,
> because scrub would lock a metadata block group immediately after it
> had been allocated, forcing another metadata block group to be
> allocated immediately. The symptom of that bug is very similar to
> yours: without warning, hundreds of GB of metadata block groups are
> allocated, all empty, during a scrub or balance operation.
>
> Unfortunately I don't have a better solution than "upgrade to a newer
> kernel", as that particular bug was solved years ago (along with
> hundreds of others).
>
>> version locked to that kernel. The metadata has reserved a full 1T of
>> disk space, while only using ~38G. I've tried to balance the metadata
>> to reclaim that space so it can be used for data, but it doesn't work
>> and gives no errors. It just says it balanced the chunks, but the size
>> doesn't change. The metadata total is still growing as well, as it
>> used to be 1.04 and now it is 1.08 with only about 10G more of
>> metadata used. I've tried doing balances up to 70 or 80 musage, I
>> think, and the total metadata does not decrease. I've made so many
>> attempts at balancing that I've probably tried to move 300 chunks or
>> more. None have resulted in any change to the metadata total like they
>> do on other servers running btrfs. I first started with a very low
>> musage, like 10, and then increased it by 10 to see whether that would
>> balance any chunks out, but with no success.
>
> Have you tried rebooting? The block groups may be stuck in a locked
> state in memory or pinned by pending discard requests, in which case
> balance won't touch them. For that matter, try turning off discard
> (it's usually better to run fstrim once a day anyway, and not use the
> discard mount option).
>
>> # /sbin/btrfs balance start -musage=60 -mlimit=30 /opt/zimbra
>> Done, had to relocate 30 out of 2127 chunks
>>
>> I can run that command over and over again, or increase the mlimit,
>> and it never changes the metadata total.
>
> I would use just -m here (no filters, only metadata). If it gets the
> allocation under control, run 'btrfs balance cancel'; if it doesn't,
> let it run all the way to the end. Each balance starts from the last
> block group, so you are effectively restarting balance to process the
> same 30 block groups over and over here.
>
>> # btrfs fi show /opt/zimbra/
>> Label: 'Data'  uuid: ece150db-5817-4704-9e84-80f7d8a3b1da
>>         Total devices 4 FS bytes used 1.48TiB
>>         devid    1 size 1.46TiB used 1.38TiB path /dev/sde
>>         devid    2 size 1.46TiB used 1.38TiB path /dev/sdf
>>         devid    3 size 1.46TiB used 1.38TiB path /dev/sdg
>>         devid    4 size 1.46TiB used 1.38TiB path /dev/sdh
>>
>> # btrfs fi df /opt/zimbra/
>> Data, RAID10: total=1.69TiB, used=1.45TiB
>> System, RAID10: total=64.00MiB, used=640.00KiB
>> Metadata, RAID10: total=1.08TiB, used=37.69GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> # btrfs fi us /opt/zimbra/ -T
>> Overall:
>>     Device size:           5.82TiB
>>     Device allocated:      5.54TiB
>>     Device unallocated:  291.54GiB
>>     Device missing:          0.00B
>>     Used:                  2.96TiB
>>     Free (estimated):    396.36GiB  (min: 396.36GiB)
>>     Data ratio:               2.00
>>     Metadata ratio:           2.00
>>     Global reserve:      512.00MiB  (used: 0.00B)
>>
>>              Data      Metadata  System
>> Id Path      RAID10    RAID10    RAID10    Unallocated
>> -- --------  --------- --------- --------- -----------
>>  1 /dev/sde  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  2 /dev/sdf  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  3 /dev/sdg  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  4 /dev/sdh  432.75GiB 276.00GiB  16.00MiB   781.65GiB
>> -- --------  --------- --------- --------- -----------
>>    Total       1.69TiB   1.08TiB  64.00MiB     3.05TiB
>>    Used        1.45TiB  37.69GiB 640.00KiB
>>
>> --
>> Brandon Heisner
>> System Administrator
>> Wolfram Research

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-10-03 18:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29  2:23 btrfs metadata has reserved 1T of extra space and balances don't reclaim it Brandon Heisner
2021-09-29  7:23 ` Forza
2021-09-29 14:34   ` Brandon Heisner
2021-10-03 11:26     ` Forza
2021-10-03 18:21       ` Zygo Blaxell
2021-09-29  8:22 ` Qu Wenruo
2021-09-29 15:18 ` Andrea Gelmini
2021-09-29 16:39   ` Forza
2021-09-29 18:55     ` Andrea Gelmini
2021-09-29 17:31 ` Zygo Blaxell
2021-10-01  7:49   ` Brandon Heisner