From: Brandon Heisner <brandonh@wolfram.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs metadata has reserved 1T of extra space and balances don't reclaim it
Date: Fri, 1 Oct 2021 02:49:39 -0500 (CDT)
Message-ID: <1185660843.2173930.1633074579864.JavaMail.zimbra@wolfram.com>
In-Reply-To: <20210929173055.GO29026@hungrycats.org>

A reboot of the server did help quite a bit with the problem, but it is still not completely fixed.  I went from having 1.08T reserved for metadata to "only" 446G reserved.  My free space went from 346G to 1010G, so at least I have some breathing room again.  I would prefer not to do a defrag, since that breaks the COW links and disk usage would go up.  I haven't yet tried balancing all of the metadata, which might be resource intensive.
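For reference, the full metadata balance I have not yet tried would be the unfiltered -m form against the same mount point:

# btrfs balance start -m /opt/zimbra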

# btrfs fi us /opt/zimbra/ -T
Overall:
    Device size:                   5.82TiB
    Device allocated:              4.36TiB
    Device unallocated:            1.46TiB
    Device missing:                  0.00B
    Used:                          3.05TiB
    Free (estimated):           1010.62GiB      (min: 1010.62GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

            Data      Metadata  System
Id Path     RAID10    RAID10    RAID10    Unallocated
-- -------- --------- --------- --------- -----------
 1 /dev/sdc 446.25GiB 111.50GiB  32.00MiB   932.63GiB
 2 /dev/sdd 446.25GiB 111.50GiB  32.00MiB   932.63GiB
 3 /dev/sde 446.25GiB 111.50GiB  32.00MiB   932.63GiB
 4 /dev/sdf 446.25GiB 111.50GiB  32.00MiB   932.63GiB
-- -------- --------- --------- --------- -----------
   Total      1.74TiB 446.00GiB 128.00MiB     3.64TiB
   Used       1.49TiB  38.16GiB 464.00KiB
# btrfs fi df /opt/zimbra/
Data, RAID10: total=1.74TiB, used=1.49TiB
System, RAID10: total=128.00MiB, used=464.00KiB
Metadata, RAID10: total=446.00GiB, used=38.19GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


----- On Sep 29, 2021, at 12:31 PM, Zygo Blaxell ce3g8jdj@umail.furryterror.org wrote:

> On Tue, Sep 28, 2021 at 09:23:01PM -0500, Brandon Heisner wrote:
>> I have a server running CentOS 7 on 4.9.5-1.el7.elrepo.x86_64 #1 SMP
>> Fri Jan 20 11:34:13 EST 2017 x86_64 x86_64 x86_64 GNU/Linux.  It is
> 
> That is a really old kernel.  I recall there were some anomalous
> metadata allocation behaviors with kernels of that age, e.g. running
> scrub and balance at the same time would allocate a lot of metadata
> because scrub would lock a metadata block group immediately after
> it had been allocated, forcing another metadata block group to be
> allocated immediately.  The symptom of that bug is very similar to
> yours--without warning, hundreds of GB of metadata block groups are
> allocated, all empty, during a scrub or balance operation.
> 
> Unfortunately I don't have a better solution than "upgrade to a newer
> kernel", as that particular bug was solved years ago (along with
> hundreds of others).
> 
>> version locked to that kernel.  The metadata has reserved a full
>> 1T of disk space, while only using ~38G.  I've tried to balance the
>> metadata to reclaim that so it can be used for data, but it doesn't
>> work and gives no errors.  It just says it balanced the chunks but the
>> size doesn't change.  The metadata total is still growing as well,
>> as it used to be 1.04 and now it is 1.08 with only about 10G more
>> of metadata used.  I've tried doing balances up to 70 or 80 musage I
>> think, and the total metadata does not decrease.  I've done so many
>> attempts at balancing, I've probably tried to move 300 chunks or more.
>> None have resulted in any change to the metadata total like they do
>> on other servers running btrfs.  I first started with very low musage,
>> like 10 and then increased it by 10 to try to see if that would balance
>> any chunks out, but with no success.
> 
> Have you tried rebooting?  The block groups may be stuck in a locked
> state in memory or pinned by pending discard requests, in which case
> balance won't touch them.  For that matter, try turning off discard
> (it's usually better to run fstrim once a day anyway, and not use
> the discard mount option).
> 
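(For reference, checking whether discard is in the mount options, and trimming by hand instead, can be done with the standard util-linux tools; whether a packaged fstrim.timer or a daily cron job does the regular trim depends on the distro:)

# findmnt -no OPTIONS /opt/zimbra | tr ',' '\n' | grep discard
# fstrim -v /opt/zimbra
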
>> # /sbin/btrfs balance start -musage=60 -mlimit=30 /opt/zimbra
>> Done, had to relocate 30 out of 2127 chunks
>> 
>> I can do that command over and over again, or increase the mlimit,
>> and it doesn't change the metadata total ever.
> 
> I would use just -m here (no filters, only metadata).  If it gets the
> allocation under control, run 'btrfs balance cancel'; if it doesn't,
> let it run all the way to the end.  Each balance starts from the last
> block group, so you are effectively restarting balance to process the
> same 30 block groups over and over here.
> 
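(For reference, an unfiltered -m run like that can be watched and, if need be, stopped with the usual subcommands:)

# btrfs balance status /opt/zimbra
# btrfs balance cancel /opt/zimbra
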
>> # btrfs fi show /opt/zimbra/
>> Label: 'Data'  uuid: ece150db-5817-4704-9e84-80f7d8a3b1da
>>         Total devices 4 FS bytes used 1.48TiB
>>         devid    1 size 1.46TiB used 1.38TiB path /dev/sde
>>         devid    2 size 1.46TiB used 1.38TiB path /dev/sdf
>>         devid    3 size 1.46TiB used 1.38TiB path /dev/sdg
>>         devid    4 size 1.46TiB used 1.38TiB path /dev/sdh
>> 
>> # btrfs fi df /opt/zimbra/
>> Data, RAID10: total=1.69TiB, used=1.45TiB
>> System, RAID10: total=64.00MiB, used=640.00KiB
>> Metadata, RAID10: total=1.08TiB, used=37.69GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>> 
>> 
>> # btrfs fi us /opt/zimbra/ -T
>> Overall:
>>     Device size:                   5.82TiB
>>     Device allocated:              5.54TiB
>>     Device unallocated:          291.54GiB
>>     Device missing:                  0.00B
>>     Used:                          2.96TiB
>>     Free (estimated):            396.36GiB      (min: 396.36GiB)
>>     Data ratio:                       2.00
>>     Metadata ratio:                   2.00
>>     Global reserve:              512.00MiB      (used: 0.00B)
>> 
>>             Data      Metadata  System
>> Id Path     RAID10    RAID10    RAID10    Unallocated
>> -- -------- --------- --------- --------- -----------
>>  1 /dev/sde 432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  2 /dev/sdf 432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  3 /dev/sdg 432.75GiB 276.00GiB  16.00MiB   781.65GiB
>>  4 /dev/sdh 432.75GiB 276.00GiB  16.00MiB   781.65GiB
>> -- -------- --------- --------- --------- -----------
>>    Total      1.69TiB   1.08TiB  64.00MiB     3.05TiB
>>    Used       1.45TiB  37.69GiB 640.00KiB
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Brandon Heisner
>> System Administrator
>> Wolfram Research
