linux-btrfs.vger.kernel.org archive mirror
* Understanding "Used" in df
@ 2020-07-22 15:10 Martin Steigerwald
  2020-07-22 19:07 ` A L
  2020-07-23  4:51 ` Zygo Blaxell
  0 siblings, 2 replies; 9+ messages in thread
From: Martin Steigerwald @ 2020-07-22 15:10 UTC (permalink / raw)
  To: linux-btrfs

I have:

% LANG=en df -hT /home
Filesystem            Type   Size  Used Avail Use% Mounted on
/dev/mapper/sata-home btrfs  300G  175G  123G  59% /home

And:

merkaba:~> btrfs fi sh /home   
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 173.91GiB
        devid    1 size 300.00GiB used 223.03GiB path /dev/mapper/sata-home
        devid    2 size 300.00GiB used 223.03GiB path /dev/mapper/msata-home

merkaba:~> btrfs fi df /home
Data, RAID1: total=218.00GiB, used=171.98GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1: total=5.00GiB, used=1.94GiB
GlobalReserve, single: total=490.48MiB, used=0.00B

As well as:

merkaba:~> btrfs fi usage -T /home
Overall:
    Device size:                 600.00GiB
    Device allocated:            446.06GiB
    Device unallocated:          153.94GiB
    Device missing:                  0.00B
    Used:                        347.82GiB
    Free (estimated):            123.00GiB      (min: 123.00GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              490.45MiB      (used: 0.00B)
    Multiple profiles:                  no

                          Data      Metadata System              
Id Path                   RAID1     RAID1    RAID1    Unallocated
-- ---------------------- --------- -------- -------- -----------
 1 /dev/mapper/sata-home  218.00GiB  5.00GiB 32.00MiB    76.97GiB
 2 /dev/mapper/msata-home 218.00GiB  5.00GiB 32.00MiB    76.97GiB
-- ---------------------- --------- -------- -------- -----------
   Total                  218.00GiB  5.00GiB 32.00MiB   153.94GiB
   Used                   171.97GiB  1.94GiB 64.00KiB   


I think I understand all of it, including just 123G instead of
300 - 175 = 125 GiB "Avail" in df -hT.

But why 175 GiB "Used" in 'df -hT' when just 173.91GiB (see 'btrfs fi sh')
is allocated *within* the block group / chunks?

Does this have something to do with that global reserve thing?

Thank you,
-- 
Martin




* Re: Understanding "Used" in df
  2020-07-22 15:10 Understanding "Used" in df Martin Steigerwald
@ 2020-07-22 19:07 ` A L
  2020-07-23  4:51 ` Zygo Blaxell
  1 sibling, 0 replies; 9+ messages in thread
From: A L @ 2020-07-22 19:07 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs


On 2020-07-22 17:10, Martin Steigerwald wrote:
> I have:
>
> % LANG=en df -hT /home
> Filesystem            Type   Size  Used Avail Use% Mounted on
> /dev/mapper/sata-home btrfs  300G  175G  123G  59% /home
>
> And:
>
> merkaba:~> btrfs fi sh /home
> Label: 'home'  uuid: […]
>          Total devices 2 FS bytes used 173.91GiB
>          devid    1 size 300.00GiB used 223.03GiB path /dev/mapper/sata-home
>          devid    2 size 300.00GiB used 223.03GiB path /dev/mapper/msata-home
>
> merkaba:~> btrfs fi df /home
> Data, RAID1: total=218.00GiB, used=171.98GiB
> System, RAID1: total=32.00MiB, used=64.00KiB
> Metadata, RAID1: total=5.00GiB, used=1.94GiB
> GlobalReserve, single: total=490.48MiB, used=0.00B
>
> As well as:
>
> merkaba:~> btrfs fi usage -T /home
> Overall:
>      Device size:                 600.00GiB
>      Device allocated:            446.06GiB
>      Device unallocated:          153.94GiB
>      Device missing:                  0.00B
>      Used:                        347.82GiB
>      Free (estimated):            123.00GiB      (min: 123.00GiB)
>      Data ratio:                       2.00
>      Metadata ratio:                   2.00
>      Global reserve:              490.45MiB      (used: 0.00B)
>      Multiple profiles:                  no
>
>                            Data      Metadata System
> Id Path                   RAID1     RAID1    RAID1    Unallocated
> -- ---------------------- --------- -------- -------- -----------
>   1 /dev/mapper/sata-home  218.00GiB  5.00GiB 32.00MiB    76.97GiB
>   2 /dev/mapper/msata-home 218.00GiB  5.00GiB 32.00MiB    76.97GiB
> -- ---------------------- --------- -------- -------- -----------
>     Total                  218.00GiB  5.00GiB 32.00MiB   153.94GiB
>     Used                   171.97GiB  1.94GiB 64.00KiB
>
>
> I think I understand all of it, including just 123G instead of
> 300 - 175 = 125 GiB "Avail" in df -hT.
>
> But why 175 GiB "Used" in 'df -hT' when just 173.91GiB (see 'btrfs fi sh')
> is allocated *within* the block group / chunks?
>
> Does this have something to do with that global reserve thing?
>
> Thank you,
Hi,

I do not think the global reserve should be counted in the 'df' output,
but df does count all unallocated space, even if some of that space will
be used up by metadata chunks. Isn't the discrepancy the metadata
allocated vs used?

I made a post a little while ago on this subject, on how to improve the
space calculations, in case anyone is interested in picking this up. I
believe the space calculation is wrong in all of df, 'btrfs fi df' and
'btrfs fi usage'.

https://lore.kernel.org/linux-btrfs/ddb33661-2d71-5046-7b6a-4a601dc2df44@lechevalier.se/ 



* Re: Understanding "Used" in df
  2020-07-22 15:10 Understanding "Used" in df Martin Steigerwald
  2020-07-22 19:07 ` A L
@ 2020-07-23  4:51 ` Zygo Blaxell
  2020-07-27 11:38   ` Martin Steigerwald
  1 sibling, 1 reply; 9+ messages in thread
From: Zygo Blaxell @ 2020-07-23  4:51 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs

On Wed, Jul 22, 2020 at 05:10:19PM +0200, Martin Steigerwald wrote:
> I have:
> 
> % LANG=en df -hT /home
> Filesystem            Type   Size  Used Avail Use% Mounted on
> /dev/mapper/sata-home btrfs  300G  175G  123G  59% /home
> 
> And:
> 
> merkaba:~> btrfs fi sh /home   
> Label: 'home'  uuid: […]
>         Total devices 2 FS bytes used 173.91GiB
>         devid    1 size 300.00GiB used 223.03GiB path /dev/mapper/sata-home
>         devid    2 size 300.00GiB used 223.03GiB path /dev/mapper/msata-home
> 
> merkaba:~> btrfs fi df /home
> Data, RAID1: total=218.00GiB, used=171.98GiB
> System, RAID1: total=32.00MiB, used=64.00KiB
> Metadata, RAID1: total=5.00GiB, used=1.94GiB
> GlobalReserve, single: total=490.48MiB, used=0.00B
> 
> As well as:
> 
> merkaba:~> btrfs fi usage -T /home
> Overall:
>     Device size:                 600.00GiB
>     Device allocated:            446.06GiB
>     Device unallocated:          153.94GiB
>     Device missing:                  0.00B
>     Used:                        347.82GiB
>     Free (estimated):            123.00GiB      (min: 123.00GiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              490.45MiB      (used: 0.00B)
>     Multiple profiles:                  no
> 
>                           Data      Metadata System              
> Id Path                   RAID1     RAID1    RAID1    Unallocated
> -- ---------------------- --------- -------- -------- -----------
>  1 /dev/mapper/sata-home  218.00GiB  5.00GiB 32.00MiB    76.97GiB
>  2 /dev/mapper/msata-home 218.00GiB  5.00GiB 32.00MiB    76.97GiB
> -- ---------------------- --------- -------- -------- -----------
>    Total                  218.00GiB  5.00GiB 32.00MiB   153.94GiB
>    Used                   171.97GiB  1.94GiB 64.00KiB   
> 
> 
> I think I understand all of it, including just 123G instead of
> 300 - 175 = 125 GiB "Avail" in df -hT.
> 
> But why 175 GiB "Used" in 'df -hT' when just 173.91GiB (see 'btrfs fi sh')
> is allocated *within* the block group / chunks?

statvfs (the 'df' syscall) does not report a "used" number, only total
and available btrfs data blocks (no metadata blocks are counted).
'df' computes "used" by subtracting f_blocks - f_bavail.

	122.99875 = 300 - 171.97 - 5 - .03125

	df_free = total - data_used - metadata_allocated - system_allocated
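
For example, a quick back-of-the-envelope check of that formula in
Python, using the per-device numbers from 'btrfs fi usage' above (GiB;
an illustrative sketch only, not what the kernel literally does):

    total = 300.0            # devid 1 size
    data_used = 171.97       # Data used (one copy)
    metadata_alloc = 5.0     # Metadata allocated (one copy)
    system_alloc = 0.03125   # System allocated (32 MiB)
    df_free = total - data_used - metadata_alloc - system_alloc
    print(df_free)           # 122.99875 -> df rounds this to the 123G "Avail"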

Inline files count as metadata instead of data, so even when you are
out of data blocks (zero blocks free in df), you can sometimes still
write small files.  Sometimes, when you write one small file, 1GB of
available space disappears as a new metadata block group is allocated.

'df' doesn't take metadata or data sharing into account at all, or
the space required to store csums, or bursty metadata usage workloads.
'df' can't predict these events, so its accuracy is limited to no better
than about 0.5% of the size of the filesystem or +/- 1GB, whichever is
larger.

> Does this have something to do with that global reserve thing?

'df' currently tells you nothing about metadata (except in kernels
before 5.6, when you run too low on metadata space, f_bavail is abruptly
set to zero).  That's about the only impact global reserve has on 'df'.

Global reserve is metadata allocated-but-unused space, and metadata is
not visible to df at all.  The reserve ensures that critical btrfs metadata
operations can complete without running out of space, by forcing
non-critical long-running operations to commit transactions when no
metadata space is available outside the reserved pool.  It mostly works,
though there are still a few bugs left that lead to EROFS when metadata
runs low.

> 
> Thank you,
> -- 
> Martin
> 
> 


* Re: Understanding "Used" in df
  2020-07-23  4:51 ` Zygo Blaxell
@ 2020-07-27 11:38   ` Martin Steigerwald
  2020-07-27 16:42     ` Andrei Borzenkov
  2020-07-28 21:20     ` Zygo Blaxell
  0 siblings, 2 replies; 9+ messages in thread
From: Martin Steigerwald @ 2020-07-27 11:38 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

Zygo Blaxell - 23.07.20, 06:51:06 CEST:
> On Wed, Jul 22, 2020 at 05:10:19PM +0200, Martin Steigerwald wrote:
> > I have:
> > 
> > % LANG=en df -hT /home
> > Filesystem            Type   Size  Used Avail Use% Mounted on
> > /dev/mapper/sata-home btrfs  300G  175G  123G  59% /home
> > 
> > And:
> > 
> > merkaba:~> btrfs fi sh /home
> > Label: 'home'  uuid: […]
> > 
> >         Total devices 2 FS bytes used 173.91GiB
> >         devid    1 size 300.00GiB used 223.03GiB path
> >         /dev/mapper/sata-home
> >         devid    2 size 300.00GiB used 223.03GiB path
> >         /dev/mapper/msata-home
> > 
> > merkaba:~> btrfs fi df /home
> > Data, RAID1: total=218.00GiB, used=171.98GiB
> > System, RAID1: total=32.00MiB, used=64.00KiB
> > Metadata, RAID1: total=5.00GiB, used=1.94GiB
> > GlobalReserve, single: total=490.48MiB, used=0.00B
> > 
> > As well as:
> > 
> > merkaba:~> btrfs fi usage -T /home
> > 
> > Overall:
> >     Device size:                 600.00GiB
> >     Device allocated:            446.06GiB
> >     Device unallocated:          153.94GiB
> >     Device missing:                  0.00B
> >     Used:                        347.82GiB
> >     Free (estimated):            123.00GiB      (min: 123.00GiB)
> >     Data ratio:                       2.00
> >     Metadata ratio:                   2.00
> >     Global reserve:              490.45MiB      (used: 0.00B)
> >     Multiple profiles:                  no
> >     
> >                           Data      Metadata System
> > 
> > Id Path                   RAID1     RAID1    RAID1    Unallocated
> > -- ---------------------- --------- -------- -------- -----------
> > 
> >  1 /dev/mapper/sata-home  218.00GiB  5.00GiB 32.00MiB    76.97GiB
> >  2 /dev/mapper/msata-home 218.00GiB  5.00GiB 32.00MiB    76.97GiB
> > 
> > -- ---------------------- --------- -------- -------- -----------
> > 
> >    Total                  218.00GiB  5.00GiB 32.00MiB   153.94GiB
> >    Used                   171.97GiB  1.94GiB 64.00KiB
> > 
> > I think I understand all of it, including just 123G instead of
> > 300 - 175 = 125 GiB "Avail" in df -hT.
> > 
> > But why 175 GiB "Used" in 'df -hT' when just 173.91GiB (see 'btrfs
> > fi sh') is allocated *within* the block group / chunks?
> 
> statvfs (the 'df' syscall) does not report a "used" number, only total
> and available btrfs data blocks (no metadata blocks are counted).
> 'df' computes "used" by subtracting f_blocks - f_bavail.
> 
> 	122.99875 = 300 - 171.97 - 5 - .03125
> 
> 	df_free = total - data_used - metadata_allocated - system_allocated

I get that one. That is for what is still free.

But I do not understand "Used" in df:

1) If it were doing 300 GiB minus what is still available, it would be 300 - 122.99 = 177.01 GiB.

2) If it added together everything used within the chunks:

171.98 GiB used in data + 64 KiB used in system + 1.94 GiB used in metadata ~= 174 GiB

3) It may consider all allocated system and metadata chunks as lost for writing
data:

171.98 GiB used in data + 32 MiB allocated in system + 5 GiB allocated in metadata ~= 176.98 GiB

4) It may consider 2 of those 5 GiB metadata chunks as reclaimable, and then
it would go like this:

171.98 GiB used in data + 32 MiB allocated in system + 3 GiB metadata ~= 174.98 GiB

That would be about right, but also as unpredictable as it can get.

> Inline files count as metadata instead of data, so even when you are
> out of data blocks (zero blocks free in df), you can sometimes still
> write small files.  Sometimes, when you write one small file, 1GB of
> available space disappears as a new metadata block group is allocated.
> 
> 'df' doesn't take metadata or data sharing into account at all, or
> the space required to store csums, or bursty metadata usage workloads.
> 'df' can't predict these events, so its accuracy is limited to no
> better than about 0.5% of the size of the filesystem or +/- 1GB,
> whichever is larger.

So just assume that df output can be +/- 1 GiB off?

I am just wondering because I aimed to explain this to participants of
my Linux courses… and for now the honest answer is that I have
no clue why df displays "175 GiB" as used.

> > Does this have something to do with that global reserve thing?
> 
> 'df' currently tells you nothing about metadata (except in kernels
> before 5.6, when you run too low on metadata space, f_bavail is
> abruptly set to zero).  That's about the only impact global reserve
> has on 'df'.

But it won't claim used or even just allocated metadata space as available
for writing data?

> Global reserve is metadata allocated-but-unused space, and all
> metadata is not visible to df.  The reserve ensures that critical
> btrfs metadata operations can complete without running out of space,
> by forcing non-critical long-running operations to commit
> transactions when no metadata space is available outside the reserved
> pool.  It mostly works, though there are still a few bugs left that
> lead to EROFS when metadata runs low.

Hmmm, thanks.

But as far as I understood also from the other post, Global Reserve is
reserved but not reported as used in df?

I am not sure whether I am getting it though.

Best,
-- 
Martin




* Re: Understanding "Used" in df
  2020-07-27 11:38   ` Martin Steigerwald
@ 2020-07-27 16:42     ` Andrei Borzenkov
  2020-07-27 19:30       ` Chris Murphy
  2020-07-28 21:20     ` Zygo Blaxell
  1 sibling, 1 reply; 9+ messages in thread
From: Andrei Borzenkov @ 2020-07-27 16:42 UTC (permalink / raw)
  To: Martin Steigerwald, Zygo Blaxell; +Cc: linux-btrfs

27.07.2020 14:38, Martin Steigerwald wrote:
> Zygo Blaxell - 23.07.20, 06:51:06 CEST:
>> On Wed, Jul 22, 2020 at 05:10:19PM +0200, Martin Steigerwald wrote:
>>> I have:
>>>
>>> % LANG=en df -hT /home
>>> Filesystem            Type   Size  Used Avail Use% Mounted on
>>> /dev/mapper/sata-home btrfs  300G  175G  123G  59% /home
>>>
>>> And:
>>>
>>> merkaba:~> btrfs fi sh /home
>>> Label: 'home'  uuid: […]
>>>
>>>         Total devices 2 FS bytes used 173.91GiB
>>>         devid    1 size 300.00GiB used 223.03GiB path
>>>         /dev/mapper/sata-home
>>>         devid    2 size 300.00GiB used 223.03GiB path
>>>         /dev/mapper/msata-home
>>>
>>> merkaba:~> btrfs fi df /home
>>> Data, RAID1: total=218.00GiB, used=171.98GiB
>>> System, RAID1: total=32.00MiB, used=64.00KiB
>>> Metadata, RAID1: total=5.00GiB, used=1.94GiB
>>> GlobalReserve, single: total=490.48MiB, used=0.00B
>>>
>>> As well as:
>>>
>>> merkaba:~> btrfs fi usage -T /home
>>>
>>> Overall:
>>>     Device size:                 600.00GiB
>>>     Device allocated:            446.06GiB
>>>     Device unallocated:          153.94GiB
>>>     Device missing:                  0.00B
>>>     Used:                        347.82GiB
>>>     Free (estimated):            123.00GiB      (min: 123.00GiB)
>>>     Data ratio:                       2.00
>>>     Metadata ratio:                   2.00
>>>     Global reserve:              490.45MiB      (used: 0.00B)
>>>     Multiple profiles:                  no
>>>     
>>>                           Data      Metadata System
>>>
>>> Id Path                   RAID1     RAID1    RAID1    Unallocated
>>> -- ---------------------- --------- -------- -------- -----------
>>>
>>>  1 /dev/mapper/sata-home  218.00GiB  5.00GiB 32.00MiB    76.97GiB
>>>  2 /dev/mapper/msata-home 218.00GiB  5.00GiB 32.00MiB    76.97GiB
>>>
>>> -- ---------------------- --------- -------- -------- -----------
>>>
>>>    Total                  218.00GiB  5.00GiB 32.00MiB   153.94GiB
>>>    Used                   171.97GiB  1.94GiB 64.00KiB
>>>
>>> I think I understand all of it, including just 123G instead of
>>> 300 - 175 = 125 GiB "Avail" in df -hT.
>>>
>>> But why 175 GiB "Used" in 'df -hT' when just 173.91GiB (see 'btrfs
>>> fi sh') is allocated *within* the block group / chunks?
>>
>> statvfs (the 'df' syscall) does not report a "used" number, only total
>> and available btrfs data blocks (no metadata blocks are counted).
>> 'df' computes "used" by subtracting f_blocks - f_bavail.
>>
>> 	122.99875 = 300 - 171.97 - 5 - .03125
>>
>> 	df_free = total - data_used - metadata_allocated - system_allocated
> 
> I get that one. That is for what is still free.
> 
> But I do not understand "Used" in df as.
> 
> 1) It it would be doing 300 GiB - what is still available, it would do 300-122.99 = 177.01
> 

df "Used" is computed as "total" - "free", where "free" is reported by
filesystem. btrfs free is 76.97GiB + 49.12GiB. I suppose btrfs does
internally round at least the first number to the full chunk size which
gives us close to 125GiB. To

> 2) If it would add together all allocated within a chunk… 
> 
> 171.98 GiB used in data + 64 KiB used in system + 1,94 GiB used in metadata ~= 174 GiB
> 
> 3) It may consider all allocated system and metadata chunks as lost for writing
> data:
> 
> 171.98 used in date + 32 MiB allocated in system + 5 GiB allocated in metadata ~= 176.98 GiB
> 
> 4) It may consider 2 of those 5 GiB chunks for metadata as reclaimable and
> then it would go like this:
> 
> 171.98 used in date + 32 MiB allocated in system + 3 GiB metadata ~= 116.98 GiB = 174.98 GiB
> 

"df" does not know anything about data vs. metadata vs. system reserve.
It has only two values filesystem returns - free and avail. And yes,
they are computed independently.

"used" + "free" == "total", but do not expect "avail" to have any direct
relation to other metrics.

Unfortunately, "df" does not display "free" (I was wrong in other post).
But using stat ...


$ LANGUAGE=en stat -f .
...
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 115164174  Free: 49153062   Available: 43297293

$ LANGUAGE=en df -B 4K .
Filesystem     4K-blocks     Used Available Use% Mounted on
/dev/sda4      115164174 66011112  43297293  61% /

115164174 - 49153062 == 66011112

But there is no way you can compute Available from other values - it is
whatever the filesystem returns.
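
The same three numbers can be read straight out of statvfs; a minimal
Python sketch (the path is just an example mount point):

    import os

    st = os.statvfs("/")        # the same values stat -f and df read
    total = st.f_blocks         # Total (in f_frsize-sized blocks)
    free  = st.f_bfree          # Free
    avail = st.f_bavail         # Available
    print(total, total - free, avail)   # df's blocks / Used / Available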


> That would be about right, but also as unpredictable as it can get.
> 
>> Inline files count as metadata instead of data, so even when you are
>> out of data blocks (zero blocks free in df), you can sometimes still
>> write small files.  Sometimes, when you write one small file, 1GB of
>> available space disappears as a new metadata block group is allocated.
>>
>> 'df' doesn't take metadata or data sharing into account at all, or
>> the space required to store csums, or bursty metadata usage workloads.
>> 'df' can't predict these events, so its accuracy is limited to no
>> better than about 0.5% of the size of the filesystem or +/- 1GB,
>> whichever is larger.
> 
> So just assume that df output can be +/- 1 GiB off?
> 
> I am just wondering cause I aimed to explaining this to participants of
> my Linux courses… and for now I have the honest answer that I have
> no clue why df displays "175 GiB" as used.
> 
>>> Does this have something to do with that global reserve thing?
>>
>> 'df' currently tells you nothing about metadata (except in kernels
>> before 5.6, when you run too low on metadata space, f_bavail is
>> abruptly set to zero).  That's about the only impact global reserve
>> has on 'df'.
> 
> But it won't claim used or even just allocated metadata space as available
> for writing data?
> 
>> Global reserve is metadata allocated-but-unused space, and all
>> metadata is not visible to df.  The reserve ensures that critical
>> btrfs metadata operations can complete without running out of space,
>> by forcing non-critical long-running operations to commit
>> transactions when no metadata space is available outside the reserved
>> pool.  It mostly works, though there are still a few bugs left that
>> lead to EROFS when metadata runs low.
> 
> Hmmm, thanks.
> 
> But as far as I understood also from the other post, Global Reserve is
> reserved but not reported as used in df?
> 
> I am not sure whether I am getting it though.
> 
> Best,
> 



* Re: Understanding "Used" in df
  2020-07-27 16:42     ` Andrei Borzenkov
@ 2020-07-27 19:30       ` Chris Murphy
  2020-07-27 19:48         ` Andrei Borzenkov
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2020-07-27 19:30 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Martin Steigerwald, Zygo Blaxell, Btrfs BTRFS

On Mon, Jul 27, 2020 at 10:43 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> Unfortunately, "df" does not display "free" (I was wrong in other post).
> But using stat ...
>
>
> $ LANGUAGE=en stat -f .
> ...
> Block size: 4096       Fundamental block size: 4096
> Blocks: Total: 115164174  Free: 49153062   Available: 43297293
>
> $ LANGUAGE=en df -B 4K .
> Filesystem     4K-blocks     Used Available Use% Mounted on
> /dev/sda4      115164174 66011112  43297293  61% /
>
> 115164174 - 49153062 == 66011112
>
> But there is no way you can compute Available from other values - it is
> whatever filesystem returns.
>

It's definitely goofy in the odd-device raid1 case. If I fallocate a
file equal to Avail, Avail does not drop to zero. The fallocated file +
new Avail != old Avail, which is just not correct behavior. So I keep
fallocating files to get to 0 Avail, and instead I'm only chasing the
zero-Avail condition. Once I'm close enough and delete all six
fallocated files, I end up, for a short period of time, with an Avail
that seems reasonable but is way bigger than it was at the start. And
then a minute later, without any additional action on my part, it
returns to reporting nonsense.

It's pretty confusing.


-- 
Chris Murphy


* Re: Understanding "Used" in df
  2020-07-27 19:30       ` Chris Murphy
@ 2020-07-27 19:48         ` Andrei Borzenkov
  2020-07-27 20:47           ` Hugo Mills
  0 siblings, 1 reply; 9+ messages in thread
From: Andrei Borzenkov @ 2020-07-27 19:48 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Martin Steigerwald, Zygo Blaxell, Btrfs BTRFS

27.07.2020 22:30, Chris Murphy wrote:
> On Mon, Jul 27, 2020 at 10:43 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>>
>> Unfortunately, "df" does not display "free" (I was wrong in other post).
>> But using stat ...
>>
>>
>> $ LANGUAGE=en stat -f .
>> ...
>> Block size: 4096       Fundamental block size: 4096
>> Blocks: Total: 115164174  Free: 49153062   Available: 43297293
>>
>> $ LANGUAGE=en df -B 4K .
>> Filesystem     4K-blocks     Used Available Use% Mounted on
>> /dev/sda4      115164174 66011112  43297293  61% /
>>
>> 115164174 - 49153062 == 66011112
>>
>> But there is no way you can compute Available from other values - it is
>> whatever filesystem returns.
>>
> 
> It's definitely goofy in the odd device raid1 case.

Well, I already explained why it happens. Yes, it looks like a bug; the
question is how to do a better estimation without performing an exhaustive
single-chunk allocation every time. Three equal-size devices look simple,
but consider the general case of multiple devices of different sizes or
with different amounts of free space.

> If I fallocate a
> file equal to Avail, Avail is not zero.

Yes, because the real allocator breaks a large file into chunks, while the
estimation in statfs "allocates" the whole size in a single operation. So
the end result is different.

> The fallocated file + new
> Avail != old Avail, which is just not at all correct behavior. So I
> keep fallocating files to get to 0 Avail, and instead I'm only chasing
> zero Avail condition. Once I'm close enough, and delete all six
> fallocated files, I end up for a short period of time, an Avail that
> seems reasonable but is way bigger than it was at the start. And then
> a minute later, without any additional action on my part, returns to
> reporting nonsense.
> 
> It's pretty confusing.
> 
> 



* Re: Understanding "Used" in df
  2020-07-27 19:48         ` Andrei Borzenkov
@ 2020-07-27 20:47           ` Hugo Mills
  0 siblings, 0 replies; 9+ messages in thread
From: Hugo Mills @ 2020-07-27 20:47 UTC (permalink / raw)
  To: Andrei Borzenkov
  Cc: Chris Murphy, Martin Steigerwald, Zygo Blaxell, Btrfs BTRFS

On Mon, Jul 27, 2020 at 10:48:18PM +0300, Andrei Borzenkov wrote:
> 27.07.2020 22:30, Chris Murphy wrote:
> > On Mon, Jul 27, 2020 at 10:43 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> >>
> >> Unfortunately, "df" does not display "free" (I was wrong in other post).
> >> But using stat ...
> >>
> >>
> >> $ LANGUAGE=en stat -f .
> >> ...
> >> Block size: 4096       Fundamental block size: 4096
> >> Blocks: Total: 115164174  Free: 49153062   Available: 43297293
> >>
> >> $ LANGUAGE=en df -B 4K .
> >> Filesystem     4K-blocks     Used Available Use% Mounted on
> >> /dev/sda4      115164174 66011112  43297293  61% /
> >>
> >> 115164174 - 49153062 == 66011112
> >>
> >> But there is no way you can compute Available from other values - it is
> >> whatever filesystem returns.
> >>
> > 
> > It's definitely goofy in the odd device raid1 case.
> 
> Well, I already explained why it happens. Yes, it looks like a bug, the
> question is how to do better estimation without performing exhaustive
> single-chunk allocation every time. Three equal size devices looks
> simple, but consider general case of multiple devices of different size
> or different amount of free space.

   There's an O(n^2) algorithm in the number of devices. It's what I
used to implement the online space checker[1]. I've put up a write-up
of the process at [2]. It comes without proof -- I wasn't able to work
one out -- but nobody's been able to catch it out yet.
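
   For the two-copy RAID1 case, the brute-force reference behaviour (the
exhaustive chunk-by-chunk allocation Andrei mentioned, which any estimator
has to reproduce) can be sketched in Python -- illustrative only, not the
algorithm from the write-up:

    def raid1_estimate(unalloc, chunk=1.0):
        # Greedy sketch: keep placing one RAID1 chunk (two copies) on
        # the two devices with the most unallocated space.  Sizes in GiB.
        free = list(unalloc)
        usable = 0.0
        while True:
            free.sort(reverse=True)
            if len(free) < 2 or free[1] < chunk:
                break
            free[0] -= chunk
            free[1] -= chunk
            usable += chunk   # each chunk adds one chunk's worth of data
        return usable

    print(raid1_estimate([76.97, 76.97]))    # 76.0 -- sub-chunk leftovers unusable
    print(raid1_estimate([10.0, 5.0, 5.0]))  # 10.0 -- big device pairs with both small ones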

   Hugo.

[1] https://carfax.org.uk/btrfs-usage/
[2] https://carfax.org.uk/files/temp/btrfs-allocator.pdf

-- 
Hugo Mills             | >squeek< *POP*
hugo@... carfax.org.uk | gluglugluglug
http://carfax.org.uk/  | <pause>
PGP: E2AB1DE4          | gluglugluglug!


* Re: Understanding "Used" in df
  2020-07-27 11:38   ` Martin Steigerwald
  2020-07-27 16:42     ` Andrei Borzenkov
@ 2020-07-28 21:20     ` Zygo Blaxell
  1 sibling, 0 replies; 9+ messages in thread
From: Zygo Blaxell @ 2020-07-28 21:20 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs

On Mon, Jul 27, 2020 at 01:38:13PM +0200, Martin Steigerwald wrote:
> Zygo Blaxell - 23.07.20, 06:51:06 CEST:
> > On Wed, Jul 22, 2020 at 05:10:19PM +0200, Martin Steigerwald wrote:
> > > I have:
> > > 
> > > % LANG=en df -hT /home
> > > Filesystem            Type   Size  Used Avail Use% Mounted on
> > > /dev/mapper/sata-home btrfs  300G  175G  123G  59% /home
> > > 
> > > And:
> > > 
> > > merkaba:~> btrfs fi sh /home
> > > Label: 'home'  uuid: […]
> > > 
> > >         Total devices 2 FS bytes used 173.91GiB
> > >         devid    1 size 300.00GiB used 223.03GiB path
> > >         /dev/mapper/sata-home
> > >         devid    2 size 300.00GiB used 223.03GiB path
> > >         /dev/mapper/msata-home
> > > 
> > > merkaba:~> btrfs fi df /home
> > > Data, RAID1: total=218.00GiB, used=171.98GiB
> > > System, RAID1: total=32.00MiB, used=64.00KiB
> > > Metadata, RAID1: total=5.00GiB, used=1.94GiB
> > > GlobalReserve, single: total=490.48MiB, used=0.00B
> > > 
> > > As well as:
> > > 
> > > merkaba:~> btrfs fi usage -T /home
> > > 
> > > Overall:
> > >     Device size:                 600.00GiB
> > >     Device allocated:            446.06GiB
> > >     Device unallocated:          153.94GiB
> > >     Device missing:                  0.00B
> > >     Used:                        347.82GiB
> > >     Free (estimated):            123.00GiB      (min: 123.00GiB)
> > >     Data ratio:                       2.00
> > >     Metadata ratio:                   2.00
> > >     Global reserve:              490.45MiB      (used: 0.00B)
> > >     Multiple profiles:                  no
> > >     
> > >                           Data      Metadata System
> > > 
> > > Id Path                   RAID1     RAID1    RAID1    Unallocated
> > > -- ---------------------- --------- -------- -------- -----------
> > > 
> > >  1 /dev/mapper/sata-home  218.00GiB  5.00GiB 32.00MiB    76.97GiB
> > >  2 /dev/mapper/msata-home 218.00GiB  5.00GiB 32.00MiB    76.97GiB
> > > 
> > > -- ---------------------- --------- -------- -------- -----------
> > > 
> > >    Total                  218.00GiB  5.00GiB 32.00MiB   153.94GiB
> > >    Used                   171.97GiB  1.94GiB 64.00KiB
> > > 
> > > I think I understand all of it, including just 123G instead of
> > > 300 - 175 = 125 GiB "Avail" in df -hT.
> > > 
> > > But why 175 GiB "Used" in 'df -hT' when just 173.91GiB (see 'btrfs
> > > fi sh') is allocated *within* the block group / chunks?
> > 
> > statvfs (the 'df' syscall) does not report a "used" number, only total
> > and available btrfs data blocks (no metadata blocks are counted).
> > 'df' computes "used" by subtracting f_blocks - f_bavail.
> > 
> > 	122.99875 = 300 - 171.97 - 5 - .03125
> > 
> > 	df_free = total - data_used - metadata_allocated - system_allocated
> 
> I get that one. That is for what is still free.
> 
> But I do not understand "Used" in df as.

df "used" is f_blocks - f_bfree.  f_bfree is f_blocks - total_used,
where total_used is all space used in the filesystem.  f_bavail is the
sum of all free space in existing data block groups plus a simulation of
how much space would be available if all unallocated block groups were
converted to the current data profile.  f_blocks is the total number of
blocks in the raw filesystem divided by the RAID redundancy of the last
data block group.

These values count different, partially overlapping things, potentially
in different units, so no relation holds between f_blocks, f_bavail
and f_bfree.
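
Plugging in the numbers from the original report (GiB; a rough sketch of
the f_bavail estimate only -- RAID1, so unallocated raw space yields half
its size as usable data space):

    data_total, data_used = 218.00, 171.97   # existing data block groups (one copy)
    unallocated_raw = 153.94                 # both devices together
    data_ratio = 2.0                         # RAID1 keeps two copies of data
    free_in_data_bgs = data_total - data_used         # ~46.03
    from_unallocated = unallocated_raw / data_ratio   # ~76.97
    print(free_in_data_bgs + from_unallocated)        # ~123.00 -> df "Avail"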

> 1) It it would be doing 300 GiB - what is still available, it would do 300-122.99 = 177.01
> 
> 2) If it would add together all allocated within a chunk… 
> 
> 171.98 GiB used in data + 64 KiB used in system + 1,94 GiB used in metadata ~= 174 GiB
> 
> 3) It may consider all allocated system and metadata chunks as lost for writing
> data:
> 
> 171.98 used in date + 32 MiB allocated in system + 5 GiB allocated in metadata ~= 176.98 GiB
> 
> 4) It may consider 2 of those 5 GiB chunks for metadata as reclaimable and
> then it would go like this:

df f_bavail doesn't consider reclaiming anything.  It's strictly "how
many data blocks could we theoretically fallocate in this instant" (and
even that is wrong since fallocate isn't atomic wrt space allocation).
It doesn't run a full simulation of the real allocator behavior.

> 171.98 used in date + 32 MiB allocated in system + 3 GiB metadata ~= 116.98 GiB = 174.98 GiB
> 
> That would be about right, but also as unpredictable as it can get.
> 
> > Inline files count as metadata instead of data, so even when you are
> > out of data blocks (zero blocks free in df), you can sometimes still
> > write small files.  Sometimes, when you write one small file, 1GB of
> > available space disappears as a new metadata block group is allocated.
> > 
> > 'df' doesn't take metadata or data sharing into account at all, or
> > the space required to store csums, or bursty metadata usage workloads.
> > 'df' can't predict these events, so its accuracy is limited to no
> > better than about 0.5% of the size of the filesystem or +/- 1GB,
> > whichever is larger.
> 
> So just assume that df output can be +/- 1 GiB off?

If anything, +/- 1GiB is a _lower_ bound on df estimation error.

df avail is an estimation of how much of a specific type of space resource
(data blocks) is available, based on currently available information,
so some error is expected.  Some error components change over time
and cannot be predicted.  Some of these errors are percentages of free
space, others are multiples of constant values (1GB per disk).  If df
says you have 300GB free, in theory you can write between 284GB and
299GB of data to that filesystem based on various factors (not including
dedupe or compression).  In practice there may be other error components
outside of the limited data model presented here (e.g. system block
group allocation).

If we write data with csums there's a 0.1-0.9% extra overhead depending
on csum algorithm and write sizes.  0.3% of 300GB is about 1GB, so a
filesystem with 300GB free can only hold 299GB of data with xxhash csums.
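
A rough sketch of where that range comes from, assuming one csum per 4KiB
data block and the usual btrfs csum sizes (and ignoring csum-tree item and
node overhead, which pushes the real figure a bit higher):

    csum_bytes = {"crc32c": 4, "xxhash64": 8, "sha256": 32, "blake2b": 32}
    for algo, size in csum_bytes.items():
        print(algo, f"{size / 4096:.2%}")   # ~0.10% ... ~0.78% of data written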

A metadata block group is normally 1 GB, and one could be allocated at
any time.  If df reports we have 1.1GB of space, we either have 1.1GB
or 100MB of space for data in reality, and we won't find out which it
is until after we do more than 100MB of writes (i.e. did we run out of
space while doing the write, or not?).

If we write a million 1K files, df available space won't change,
as they'll all be inline files, inline files are stored in metadata,
and statvfs doesn't count metadata in f_bavail.  If we write the next
file, 1 GB or more may be allocated for a new metadata block group
(depends on disk layout).  The average size of each file is 1K (multiplied
by metadata redundancy factor) but each individual file consumes either
0 or 1 GB of space if you compare df available space before and after
writing the file.

If the metadata is raid0 or raid10 (or raid5 or raid6, but don't use
those) then it's possible for a block group to contain more than 1GB
of data.  This multiplies the "1 GB" number across the board, so you
can jump from e.g. 5.1 GB to 0.1 GB at any time on a 5 disk array with
raid0 metadata, or adding 4K to a file could rarely but randomly allocate
5 GB of disk space for metadata.

If we write a million 4K blocks to a single file in random order or with
fsync() between each block, every 75 4K data blocks will come with an
extra 16K metadata block, 213MB total, 5.3%.  If we do that 4 times,
we'll use an extra GB of metadata and probably trigger a metadata block
group allocation.  If we write 4GB of file all at once, there will be
0 or 64K of metadata blocks due to rounding, and df's prediction will
be a closer match to reality (but even that 64K could still trigger a
1GB allocation).  Most workloads fall somewhere in between these extremes.
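
The arithmetic behind those figures, roughly (Python):

    blocks     = 1_000_000
    data       = blocks * 4 * 1024        # ~3.8 GiB of data
    meta_nodes = blocks / 75              # one extra 16 KiB node per ~75 blocks
    metadata   = meta_nodes * 16 * 1024   # a bit over 200 MB of metadata
    print(metadata / data)                # ~0.053 -> ~5.3% overhead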

There are also prediction errors in the other direction.  If data is
overwritten or deleted during a transaction, it is not freed until the end
of the transaction, or even much later with async discard.  Balances and
scrubs can mark entire block groups as temporarily full.  These events
will cause df to temporarily underreport a block group's worth of free
space.  This imposes additional requirements on free-space-triggered
daemons like automatic file reapers--they can overdelete due to temporary
conditions, so they either need hysteresis, long time delay averaging,
or big error tolerances (i.e. they don't care if they're a GB or two
above target).  In these cases, df is not inaccurate--pinned extents
really do make disk space unavailable temporarily--but the reported
value from df is low compared to the number of data blocks that could
actually be allocated.

statvfs f_bfree is a better estimation of how much space is available if
all possible optimizations are performed, since it's "size of all disks
minus size of everything in use."  You'll never get exactly that much
space out of a filesystem, though, due to minimum block group sizes,
global reservations, data copy on write, fragmentation, metadata page
unsharing, and other assorted inefficiencies.

> I am just wondering cause I aimed to explaining this to participants of
> my Linux courses… and for now I have the honest answer that I have
> no clue why df displays "175 GiB" as used.

As a rule of thumb, a filesystem with less than 1GB free per disk is
effectively full, mostly due to that "may allocate extra GB at any
time" behavior.  Even filesystems with less than 10GB or even 100GB
per disk require extra attention to avoid sudden ENOSPC in some cases
and performance degradation due to fragmentation in others.

If the filesystem has more than 100GB unallocated on every disk, no special
attention is required.  Every half-percent of writes consumes an extra
GB, no big deal.

There are some scale factors in btrfs that change in smaller filesystems.
The block group allocation unit gets smaller, so e.g. for a filesystem
that is under 16GB the rule of thumb operates in units of 128MB per
disk instead of 1GB per disk.

Mixed block groups are recommended for tiny filesystems.  Mixed block
groups don't have separate metadata and data, so the 'df' behavior is
much more linear and predictable.

> > > Does this have something to do with that global reserve thing?
> > 
> > 'df' currently tells you nothing about metadata (except in kernels
> > before 5.6, when you run too low on metadata space, f_bavail is
> > abruptly set to zero).  That's about the only impact global reserve
> > has on 'df'.
> 
> But it won't claim used or even just allocated metadata space as available
> for writing data?

Correct.  Metadata space is never counted in df as available for data
(f_bavail).  It is counted in f_bfree (which df uses to calculate "used"),
but f_bfree is a fairly meaningless internal statistic.

If a metadata block group is deleted (e.g. by resize, device delete,
balance, or if it becomes completely empty) then its space will be counted
as available, but not before.  Balances may change the available space in
df by changing on-disk layout so that future allocations become possible
that are not possible with the present on-disk layout (e.g. when a new
empty disk is added to a full raid5 array, initially none of the free
space can be used).  Converting balances can change the filesystem size
in df too, which will change "used" since df used = size - free, sometimes
in strange ways if there are block groups with multiple profiles present.

> > Global reserve is metadata allocated-but-unused space, and all
> > metadata is not visible to df.  The reserve ensures that critical
> > btrfs metadata operations can complete without running out of space,
> > by forcing non-critical long-running operations to commit
> > transactions when no metadata space is available outside the reserved
> > pool.  It mostly works, though there are still a few bugs left that
> > lead to EROFS when metadata runs low.
> 
> Hmmm, thanks.
> 
> But as far as I understood also from the other post, Global Reserve is
> reserved but not reported as used in df?

Global reserve overlaps with metadata.  Some of global reserve inevitably
gets counted in f_bfree (or more correctly f_blocks - f_bfree) because
global reserve blocks are also metadata blocks, and f_bfree is affected
by the number of metadata blocks in use.  Changes in global reserve will
therefore correlate with changes in df "used" output some of the time.

Global reserve isn't a physical region on disk like a metadata or system
block group.  Global reserve is just a block counter in the kernel that
deducts future metadata allocations from the amount of free space that
appears in metadata block groups in the present, so that later allocations
required to commit the transaction are guaranteed to succeed.  The usage
numbers for global reserve and metadata cannot be meaningfully separated.

> I am not sure whether I am getting it though.
> 
> Best,
> -- 
> Martin
> 
> 

