* btrfs space used issue
@ 2018-02-27 13:09 vinayak hegde
  2018-02-27 13:54 ` Austin S. Hemmelgarn
  2018-02-28 19:09 ` Duncan
  0 siblings, 2 replies; 14+ messages in thread
From: vinayak hegde @ 2018-02-27 13:09 UTC (permalink / raw)
  To: linux-btrfs

I am using btrfs, but I am seeing a huge difference between the sizes
reported by du -sh and df -h on an SSD.

mount:
/dev/drbd1 on /dc/fileunifier.datacache type btrfs
(rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)


du -sh /dc/fileunifier.datacache/ -  331G

df -h
/dev/drbd1      746G  346G  398G  47% /dc/fileunifier.datacache

btrfs fi usage /dc/fileunifier.datacache/
Overall:
    Device size:         745.19GiB
    Device allocated:         368.06GiB
    Device unallocated:         377.13GiB
    Device missing:             0.00B
    Used:             346.73GiB
    Free (estimated):         396.36GiB    (min: 207.80GiB)
    Data ratio:                  1.00
    Metadata ratio:              2.00
    Global reserve:         176.00MiB    (used: 0.00B)

Data,single: Size:365.00GiB, Used:345.76GiB
   /dev/drbd1     365.00GiB

Metadata,DUP: Size:1.50GiB, Used:493.23MiB
   /dev/drbd1       3.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB
   /dev/drbd1      64.00MiB

Unallocated:
   /dev/drbd1     377.13GiB


Even if we count 6G of metadata, that's 331+6 = 337G.
Where are the remaining 9GB used?

Please explain.

Vinayak


* Re: btrfs space used issue
  2018-02-27 13:09 btrfs space used issue vinayak hegde
@ 2018-02-27 13:54 ` Austin S. Hemmelgarn
  2018-02-28  6:01   ` vinayak hegde
  2018-02-28 19:09 ` Duncan
  1 sibling, 1 reply; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2018-02-27 13:54 UTC (permalink / raw)
  To: vinayak hegde, linux-btrfs

On 2018-02-27 08:09, vinayak hegde wrote:
> I am using btrfs, But I am seeing du -sh and df -h showing huge size
> difference on ssd.
> 
> mount:
> /dev/drbd1 on /dc/fileunifier.datacache type btrfs
> (rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)
> 
> 
> du -sh /dc/fileunifier.datacache/ -  331G
> 
> df -h
> /dev/drbd1      746G  346G  398G  47% /dc/fileunifier.datacache
> 
> btrfs fi usage /dc/fileunifier.datacache/
> Overall:
>      Device size:         745.19GiB
>      Device allocated:         368.06GiB
>      Device unallocated:         377.13GiB
>      Device missing:             0.00B
>      Used:             346.73GiB
>      Free (estimated):         396.36GiB    (min: 207.80GiB)
>      Data ratio:                  1.00
>      Metadata ratio:              2.00
>      Global reserve:         176.00MiB    (used: 0.00B)
> 
> Data,single: Size:365.00GiB, Used:345.76GiB
>     /dev/drbd1     365.00GiB
> 
> Metadata,DUP: Size:1.50GiB, Used:493.23MiB
>     /dev/drbd1       3.00GiB
> 
> System,DUP: Size:32.00MiB, Used:80.00KiB
>     /dev/drbd1      64.00MiB
> 
> Unallocated:
>     /dev/drbd1     377.13GiB
> 
> 
> Even if we consider 6G metadata its 331+6 = 337.
> where is 9GB used?
> 
> Please explain.
First, you're counting the metadata wrong.  The value shown per-device 
by `btrfs filesystem usage` already accounts for replication (so it's 
only 3 GB of metadata allocated, not 6 GB).  Neither `df` nor `du` looks 
at the chunk level allocations though.

Now, with that out of the way, the discrepancy almost certainly comes 
from differences in how `df` and `du` calculate space usage.  In 
particular, `df` calls statvfs and looks at the f_blocks and f_bfree 
values to compute space usage, while `du` walks the filesystem tree 
calling stat on everything and looking at st_blksize and st_blocks (or 
instead at st_size if you pass in `--apparent-size` as an option).  This 
leads to a couple of differences in what they will count (there's a 
short sketch after the list if you want to reproduce the two views 
yourself):

1. `du` may or may not properly count hardlinks, sparse files, and 
transparently compressed data, depending on whether or not you use 
`--apparent-size` (by default, it does properly count all of those), 
while `df` will always account for those properly.
2. `du` does not properly account for reflinked blocks (from 
deduplication, snapshots, or use of the CLONE ioctl), and will count 
each reflink of every block as part of the total size, while `df` will 
always count each block exactly once no matter how many reflinks it has.
3. `du` does not account for all of the BTRFS metadata allocations, 
functionally ignoring space allocated for anything but inline data, 
while `df` accounts for all BTRFS metadata properly.
4. `du` will recurse into other filesystems if you don't pass the `-x` 
option to it, while `df` will only report for each filesystem separately.
5. `du` will only count data usage under the given mount point, and 
won't account for data on other subvolumes that may be mounted elsewhere 
(and if you pass in `-x` won't count data on other subvolumes located 
under the given path either), while `df` will count all the data in all 
subvolumes.
6. There are a couple of other differences too, but they're rather 
complex and dependent on the internals of BTRFS.
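
(Here is the sketch mentioned above; the mount point is yours, the rest 
is just unit conversion, so treat it as illustrative:

   # df's view: statvfs() on the mount point
   stat -f -c 'blocks=%b free=%f block-size=%S' /dc/fileunifier.datacache

   # du's view: st_blocks (512-byte units) from stat() on everything in
   # the tree, summed
   find /dc/fileunifier.datacache -printf '%b\n' | \
       awk '{s+=$1} END {printf "%.1f GiB\n", s*512/2^30}'

The first pair of numbers, multiplied by the block size, is what `df` 
reports; the second sum is essentially what `du -s` adds up.)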

In your case, I think the issue is probably one of the various things 
under item 6.  Items 1, 2 and 4 will cause `du` to report more space 
usage than `df`, item 3 is irrelevant because `du` shows less space than 
the total data chunk usage reported by `btrfs filesystem usage`, and 
item 5 is irrelevant because you're mounting the root subvolume and not 
using the `-x` option on `du` (and therefore there can't be other 
subvolumes you're missing).

Try running a full defrag of the given mount point.  If what I think is 
causing this is in fact the issue, that should bring the numbers back 
in-line with each other.
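
Concretely, something like:

   btrfs filesystem defragment -r -v /dc/fileunifier.datacache/

run as root; `-r` recurses from the given path and `-v` just makes it 
verbose.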


* Re: btrfs space used issue
  2018-02-27 13:54 ` Austin S. Hemmelgarn
@ 2018-02-28  6:01   ` vinayak hegde
  2018-02-28 15:22     ` Andrei Borzenkov
  0 siblings, 1 reply; 14+ messages in thread
From: vinayak hegde @ 2018-02-28  6:01 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

I ran both a full defragment and a balance, but it didn't help.
The files I created and am accounting for match the du -sh output,
but I don't understand why btrfs internals use so much extra space.
My worry is that I will get a no-space error earlier than I expect.
Is it expected that btrfs internals will use this much extra space?

Vinayak






* Re: btrfs space used issue
  2018-02-28  6:01   ` vinayak hegde
@ 2018-02-28 15:22     ` Andrei Borzenkov
  2018-03-01  9:26       ` vinayak hegde
  0 siblings, 1 reply; 14+ messages in thread
From: Andrei Borzenkov @ 2018-02-28 15:22 UTC (permalink / raw)
  To: vinayak hegde; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

On Wed, Feb 28, 2018 at 9:01 AM, vinayak hegde <vinayakhegdev@gmail.com> wrote:
> I ran full defragement and balance both, but didnt help.

Showing the same information immediately after full defragment would be helpful.

> My created and accounting usage files are matching the du -sh output.
> But I am not getting why btrfs internals use so much extra space.
> My worry is, will get no space error earlier than I expect.
> Is it expected with btrfs internal that it will use so much extra space?
>

Did you try to reboot?  A deleted but still-open file could well cause this effect.
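
For example, something along these lines should list any
deleted-but-still-open files on that mount, assuming lsof is installed:

   lsof +L1 /dc/fileunifier.datacache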



* Re: btrfs space used issue
  2018-02-27 13:09 btrfs space used issue vinayak hegde
  2018-02-27 13:54 ` Austin S. Hemmelgarn
@ 2018-02-28 19:09 ` Duncan
  2018-02-28 19:24   ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 14+ messages in thread
From: Duncan @ 2018-02-28 19:09 UTC (permalink / raw)
  To: linux-btrfs

vinayak hegde posted on Tue, 27 Feb 2018 18:39:51 +0530 as excerpted:

> I am using btrfs, But I am seeing du -sh and df -h showing huge size
> difference on ssd.
> 
> mount:
> /dev/drbd1 on /dc/fileunifier.datacache type btrfs
> 
(rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)
> 
> 
> du -sh /dc/fileunifier.datacache/ -  331G
> 
> df -h /dev/drbd1      746G  346G  398G  47% /dc/fileunifier.datacache
> 
> btrfs fi usage /dc/fileunifier.datacache/
> Overall:
>     Device size:              745.19GiB
>     Device allocated:         368.06GiB
>     Device unallocated:       377.13GiB
>     Device missing:               0.00B
>     Used:                     346.73GiB
>     Free (estimated):         396.36GiB    (min: 207.80GiB)
>     Data ratio:                    1.00
>     Metadata ratio:                2.00
>     Global reserve:           176.00MiB    (used: 0.00B)
> 
> Data,single: Size:365.00GiB, Used:345.76GiB
>    /dev/drbd1     365.00GiB
> 
> Metadata,DUP: Size:1.50GiB, Used:493.23MiB
>    /dev/drbd1       3.00GiB
> 
> System,DUP: Size:32.00MiB, Used:80.00KiB
>    /dev/drbd1      64.00MiB
> 
> Unallocated:
>    /dev/drbd1     377.13GiB
> 
> 
> Even if we consider 6G metadata its 331+6 = 337.
> where is 9GB used?
> 
> Please explain.

Taking a somewhat higher level view than Austin's reply: on btrfs, plain 
df and, to a somewhat lesser extent, du[1] are at best good /estimations/ 
of usage and, for df, of space remaining.  Btrfs' COW/copy-on-write 
semantics, and features such as the various replication/raid schemes, 
snapshotting, etc, that btrfs makes available, are things df/du don't 
really understand, as they simply don't have (and weren't /designed/ to 
have) that level of filesystem-specific insight.  So they, particularly 
df with its whole-filesystem focus, aren't particularly accurate on 
btrfs.  Consider their output more a "best estimate given the rough data 
we have available" sort of report.

To get the real filesystem-focused picture, use btrfs filesystem usage, 
or btrfs filesystem show combined with btrfs filesystem df.  That's what 
you should trust, altho various utilities that check for available space 
before doing something often use the kernel-call equivalent of (plain) df 
to ensure they have the required space, so it's worthwhile to keep an eye 
on it as the filesystem fills, as well.  If df gets too far out of sync 
with btrfs filesystem usage, or if btrfs filesystem usage shows 
unallocated dropping below say five gigs, or data or metadata size vs 
used shows a spread of multiple gigs, then corrective action such as a 
filtered rebalance may be necessary.  (Your data shows a spread of ~20 
gigs ATM, but with 377 gigs still unallocated it's no big deal.  It would 
be a big deal if those were reversed, tho: only 20 gigs unallocated and a 
spread of 300+ gigs in data size vs used.)
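
(A typical filtered rebalance looks something like this; the usage 
thresholds are common starting points, not magic numbers:

   btrfs balance start -dusage=50 -musage=50 /dc/fileunifier.datacache

That only rewrites data/metadata chunks which are at most 50% full, 
compacting them into fewer chunks and returning the rest to 
unallocated.)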

There are entries in the FAQ discussing free space issues that you should 
definitely read if you haven't, altho they obviously address the general 
case, so if you have more questions about an individual case after having 
read them, here is a good place to ask. =:^)

Everything having to do with "space" (see both the 1/Important-questions 
and 4/Common-questions sections) here:

https://btrfs.wiki.kernel.org/index.php/FAQ

Meanwhile, it's worth noting that not entirely intuitively, btrfs' COW 
implementation can "waste" space on larger files that are mostly, but not 
entirely, rewritten.  An example is the best way to demonstrate.  
Consider each x a used block and each - an unused but still referenced 
block:

Original file, written as a single extent (diagram works best with 
monospace, not arbitrarily rewrapped):

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

First rewrite of part of it:

xxxxxxxxxxx------xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
           xxxxxx


Nth rewrite, where some blocks of the original still remain as originally 
written:

------------------xxx------------------------------
           xxx---
xxxx----xxx
    xxxx
                     xxxxxxxxxxxxxxxxxxxxx---xxxxxx
                                          xxx
              xxx


As you can see, that first really large extent remains fully referenced, 
altho only three blocks of it remain in actual use.  All those -- won't 
be returned to free space until those last three blocks get rewritten as 
well, thus freeing the entire original extent.
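
(If you want to watch this happen, something like the following on a 
scratch btrfs mount should show it; the paths and sizes are just 
placeholders:

   dd if=/dev/zero of=/mnt/scratch/big bs=1M count=1024       # one big file
   sync; btrfs filesystem df /mnt/scratch                     # note Data used
   dd if=/dev/urandom of=/mnt/scratch/big bs=4K count=1 seek=1000 conv=notrunc
   sync; btrfs filesystem df /mnt/scratch                     # used grew a bit

The 4k overwrite lands in a new extent, while the extents written the 
first time round stay fully referenced by the rest of the file, so used 
grows even though the file's size didn't.)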

I believe this effect is what Austin was referencing when he suggested 
the defrag, tho defrag won't necessarily /entirely/ clear it up.  One way 
to be /sure/ it's cleared up would be to rewrite the entire file, 
deleting the original, either by copying it to a different filesystem and 
back (with the off-filesystem copy guaranteeing that it can't use reflinks 
to the existing extents), or by using cp's --reflink=never option.  
(FWIW, I prefer the former, just to be sure, using temporary copies to a 
suitably sized tmpfs for speed where possible, tho obviously if the file 
is larger than your memory size that's not possible.)
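
(In shell terms, the copy-away-and-back approach is simply, for some 
hypothetical large file:

   cp /dc/fileunifier.datacache/bigfile /dev/shm/bigfile.tmp
   cp /dev/shm/bigfile.tmp /dc/fileunifier.datacache/bigfile
   rm /dev/shm/bigfile.tmp

The intermediate copy lives on tmpfs, so it can't share extents with 
btrfs, and the copy back is therefore written out as entirely new 
extents.)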

Of course where applicable, snapshots and dedup keep reflink-references 
to the old extents, so they must be adjusted or deleted as well, to 
properly free that space.

---
[1] du: Because its purpose is different.  du's primary purpose is 
telling you in detail what space files take up, per-file and per-
directory, without particular regard to usage on the filesystem itself.  
df's focus, by contrast, is on the filesystem as a whole.  So where two 
files share the same extent due to reflinking, du should and does count 
that usage for each file, because that's what each file /uses/ even if 
they both use the same extents.


-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs space used issue
  2018-02-28 19:09 ` Duncan
@ 2018-02-28 19:24   ` Austin S. Hemmelgarn
  2018-02-28 19:54     ` Duncan
  0 siblings, 1 reply; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2018-02-28 19:24 UTC (permalink / raw)
  To: linux-btrfs

On 2018-02-28 14:09, Duncan wrote:
> Taking a somewhat higher level view than Austin's reply, on btrfs, plain
> df and to a somewhat lessor extent du[1] are at best good /estimations/
> of usage, and for df, space remaining.  Due to btrfs' COW/copy-on-write
> semantics and features such as the various replication/raid schemes,
> snapshotting, etc, btrfs makes available, that df/du don't really
> understand as they simply don't have and weren't /designed/ to have that
> level of filesystem-specific insight, they, particularly df due to its
> whole-filesystem focus, aren't particularly accurate on btrfs.  Consider
> their output more a "best estimate given the rough data we have
> available" sort of report.
> 
> To get the real filesystem focused picture, use btrfs filesystem usage,
> or btrfs filesystem show combined with btrfs filesystem df.  That's what
> you should trust, altho various utilities that check for available space
> before doing something often use the kernel-call equivalent of (plain) df
> to ensure they have the required space, so it's worthwhile to keep an eye
> on it as the filesystem fills, as well.  If it gets too out of sync with
> btrfs filesystem usage, or if btrfs filesystem usage unallocated drops
> below say five gigs or data or metadata size vs used shows a spread of
> multiple gigs (your data shows a spread of ~20 gigs ATM, but with 377
> gigs still unallocated it's no big deal; it would be a big deal if those
> were reversed, tho, only 20 gigs unallocated and a spread of 300+ gigs in
> data size vs used), then corrective action such as a filtered rebalance
> may be necessary.
> 
> There are entries in the FAQ discussing free space issues that you should
> definitely read if you haven't, altho they obviously address the general
> case, so if you have more questions about an individual case after having
> read them, here is a good place to ask. =:^)
> 
> Everything having to do with "space" (see both the 1/Important-questions
> and 4/Common-questions sections) here:
> 
> https://btrfs.wiki.kernel.org/index.php/FAQ
> 
> Meanwhile, it's worth noting that not entirely intuitively, btrfs' COW
> implementation can "waste" space on larger files that are mostly, but not
> entirely, rewritten.  An example is the best way to demonstrate.
> Consider each x a used block and each - an unused but still referenced
> block:
> 
> Original file, written as a single extent (diagram works best with
> monospace, not arbitrarily rewrapped):
> 
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> 
> First rewrite of part of it:
> 
> xxxxxxxxxxx------xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>             xxxxxx
> 
> 
> Nth rewrite, where some blocks of the original still remain as originally
> written:
> 
> ------------------xxx------------------------------
>             xxx---
> xxxx----xxx
>      xxxx
>                       xxxxxxxxxxxxxxxxxxxxx---xxxxxx
>                                            xxx
>                xxx
> 
> 
> As you can see, that first really large extent remains fully referenced,
> altho only three blocks of it remain in actual use.  All those -- won't
> be returned to free space until those last three blocks get rewritten as
> well, thus freeing the entire original extent.
> 
> I believe this effect is what Austin was referencing when he suggested
> the defrag, tho defrag won't necessarily /entirely/ clear it up.  One way
> to be /sure/ it's cleared up would be to rewrite the entire file,
> deleting the original, either by copying it to a different filesystem and
> back (with the off-filesystem copy guaranteeing that it can't use reflinks
> to the existing extents), or by using cp's --reflink=never option.
> (FWIW, I prefer the former, just to be sure, using temporary copies to a
> suitably sized tmpfs for speed where possible, tho obviously if the file
> is larger than your memory size that's not possible.)
Correct, this is why I recommended trying a defrag.  I've actually never 
seen things so bad that a simple defrag didn't fix them however (though 
I have seen a few cases where the target extent size had to be set 
higher than the default of 20MB).  Also, as counter-intuitive as it 
might sound, autodefrag really doesn't help much with this, and can 
actually make things worse.

This is also one of the things I was referring to in item 6 of the list 
of causes I gave, partly because I couldn't come up with a good way to 
explain it clearly (which I feel you did an excellent job of above).  
The other big one there is the handling of xattrs and ACLs, which get 
accounted for by `df` but generally aren't by `du` (at least, not 
reliably).


* Re: btrfs space used issue
  2018-02-28 19:24   ` Austin S. Hemmelgarn
@ 2018-02-28 19:54     ` Duncan
  2018-02-28 20:15       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 14+ messages in thread
From: Duncan @ 2018-02-28 19:54 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Wed, 28 Feb 2018 14:24:40 -0500 as
excerpted:

>> I believe this effect is what Austin was referencing when he suggested
>> the defrag, tho defrag won't necessarily /entirely/ clear it up.  One
>> way to be /sure/ it's cleared up would be to rewrite the entire file,
>> deleting the original, either by copying it to a different filesystem
>> and back (with the off-filesystem copy guaranteeing that it can't use
>> reflinks to the existing extents), or by using cp's --reflink=never
>> option.
>> (FWIW, I prefer the former, just to be sure, using temporary copies to
>> a suitably sized tmpfs for speed where possible, tho obviously if the
>> file is larger than your memory size that's not possible.)

> Correct, this is why I recommended trying a defrag.  I've actually never
> seen things so bad that a simple defrag didn't fix them however (though
> I have seen a few cases where the target extent size had to be set
> higher than the default of 20MB).

Good to know.  I knew larger target extent sizes could help, but between 
not being sure they'd entirely fix it and not wanting to get too far down 
into the detail when the copy-off-the-filesystem-and-back option is 
/sure/ to fix the problem, I decided to handwave that part of it. =:^)

> Also, as counter-intuitive as it
> might sound, autodefrag really doesn't help much with this, and can
> actually make things worse.

I hadn't actually seen that here, but suspect I might, now, as previous 
autodefrag behavior on my system tended to rewrite the entire file[1], 
thereby effectively giving me the benefit of the copy-away-and-back 
technique without actually bothering, while that "bug" has now been fixed.

I sort of wish the old behavior remained an option, maybe 
radicalautodefrag or something, and must confess to being a bit concerned 
over the eventual impact here now that autodefrag does /not/ rewrite the 
entire file any more, but oh, well...  Chances are it's not going to be 
/that/ big a deal since I /am/ on fast ssd, and if it becomes one, I 
guess I can just setup say firefox-profile-defrag.timer jobs or whatever, 
as necessary.

---
[1] I forgot whether it was ssd behavior, or compression, or what, but 
something I'm using here apparently forced autodefrag to rewrite the 
entire file, and a recent "bugfix" changed that so it's more in line with 
the normal autodefrag behavior.  I rather preferred the old behavior, 
especially since I'm on fast ssd and all my large files tend to be write-
once no-rewrite anyway, but I understand the performance implications on 
large active-rewrite files such as gig-plus database and VM-image files, 
so...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs space used issue
  2018-02-28 19:54     ` Duncan
@ 2018-02-28 20:15       ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2018-02-28 20:15 UTC (permalink / raw)
  To: linux-btrfs

On 2018-02-28 14:54, Duncan wrote:
> Austin S. Hemmelgarn posted on Wed, 28 Feb 2018 14:24:40 -0500 as
> excerpted:
> 
>>> I believe this effect is what Austin was referencing when he suggested
>>> the defrag, tho defrag won't necessarily /entirely/ clear it up.  One
>>> way to be /sure/ it's cleared up would be to rewrite the entire file,
>>> deleting the original, either by copying it to a different filesystem
>>> and back (with the off-filesystem copy guaranteeing that it can't use
>>> reflinks to the existing extents), or by using cp's --reflink=never
>>> option.
>>> (FWIW, I prefer the former, just to be sure, using temporary copies to
>>> a suitably sized tmpfs for speed where possible, tho obviously if the
>>> file is larger than your memory size that's not possible.)
> 
>> Correct, this is why I recommended trying a defrag.  I've actually never
>> seen things so bad that a simple defrag didn't fix them however (though
>> I have seen a few cases where the target extent size had to be set
>> higher than the default of 20MB).
> 
> Good to know.  I knew larger target extent sizes could help, but between
> not being sure they'd entirely fix it and not wanting to get too far down
> into the detail when the copy-off-the-filesystem-and-back option is
> /sure/ to fix the problem, I decided to handwave that part of it. =:^)
FWIW, a target size of 128M has fixed it on all 5 cases I've seen where 
the default didn't.  In theory, there's probably some really 
pathological case where that won't work, but I've just gotten into the 
habit of using that by default on all my systems now and haven't seen 
any issues so far (but like you I'm pretty much exclusively on SSD's, 
and the small handful of things I have on traditional hard disks are all 
archival storage with WORM access patterns).
> 
>> Also, as counter-intuitive as it
>> might sound, autodefrag really doesn't help much with this, and can
>> actually make things worse.
> 
> I hadn't actually seen that here, but suspect I might, now, as previous
> autodefrag behavior on my system tended to rewrite the entire file[1],
> thereby effectively giving me the benefit of the copy-away-and-back
> technique without actually bothering, while that "bug" has now been fixed.
> 
> I sort of wish the old behavior remained an option, maybe
> radicalautodefrag or something, and must confess to being a bit concerned
> over the eventual impact here now that autodefrag does /not/ rewrite the
> entire file any more, but oh, well...  Chances are it's not going to be
> /that/ big a deal since I /am/ on fast ssd, and if it becomes one, I
> guess I can just setup say firefox-profile-defrag.timer jobs or whatever,
> as necessary.
> 
> ---
> [1] I forgot whether it was ssd behavior, or compression, or what, but
> something I'm using here apparently forced autodefrag to rewrite the
> entire file, and a recent "bugfix" changed that so it's more in line with
> the normal autodefrag behavior.  I rather preferred the old behavior,
> especially since I'm on fast ssd and all my large files tend to be write-
> once no-rewrite anyway, but I understand the performance implications on
> large active-rewrite files such as gig-plus database and VM-image files,
> so...
Hmm.  I've actually never seen such behavior myself.  I do know that 
compression impacts how autodefrag works (autodefrag tries to rewrite up 
to 64k around a random write, but compression operates in 128k blocks), 
but beyond that I'm not sure what might have caused this.


* Re: btrfs space used issue
  2018-02-28 15:22     ` Andrei Borzenkov
@ 2018-03-01  9:26       ` vinayak hegde
  2018-03-01 10:18         ` Andrei Borzenkov
  2018-03-03  6:59         ` Duncan
  0 siblings, 2 replies; 14+ messages in thread
From: vinayak hegde @ 2018-03-01  9:26 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

No, there is no deleted file that is still held open; I unmounted and
mounted again, and also rebooted.

I think I am hitting the issue below: a lot of random writes were
happening, the file is not fully written, and it is a sparse file.
Let me try disabling COW.


file offset 0                                               offset 302g
[-------------------------prealloced 302g extent----------------------]

(man it's impressive I got all that lined up right)

On disk you have 2 things.  First, your file, which has a file extent item that says

inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g

and then the extent tree, which keeps track of the actual allocated space, has this

extent bytenr 123, len 302g, refs 1

Now say you boot up your virt image and it writes one 4k block to offset
0.  Now you have this

[4k][--------------------302g-4k--------------------------------------]

And for your inode you now have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g

and in your extent tree you have

extent bytenr 123, len 302g, refs 1
extent bytenr whatever, len 4k, refs 1

See that?  Your file is still the same size, it is still 302g.  If you
cp'ed it right now it would copy 302g of information.  But what have you
actually allocated on disk?  Well, that's now 302g + 4k.  Now let's say
your virt thing decides to write to the middle, say at offset 12k; now
you have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever, disklen 4k
inode 256, file offset 16k, size 302g-16k, offset 16k, diskbytenr 123, disklen 302g

and in the extent tree you have this

extent bytenr 123, len 302g, refs 2
extent bytenr whatever, len 4k, refs 1
extent bytenr notimportant, len 4k, refs 1

See that refs 2 change?  We split the original extent, so we have 2 file
extents pointing to the same physical extent, so we bumped the ref
count.  This will happen over and over again until we have completely
overwritten the original extent, at which point your space usage will go
back down to ~302g.  We split big extents with cow, so unless you've got
lots of space to spare or are going to use nodatacow, you should
probably not pre-allocate virt images.
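
(To see the extent splitting from userspace, something like the 
following should help; the file name is just an example:

   filefrag -v /dc/fileunifier.datacache/some-large-file
   btrfs filesystem du -s /dc/fileunifier.datacache/

filefrag lists one line per file extent, so a heavily random-written 
file shows up as lots of small extents, and btrfs fi du shows total vs 
exclusive vs shared usage by btrfs's own accounting.  Neither shows the 
no-longer-referenced tails of the original extents directly; that space 
only shows up in the overall used numbers from df / btrfs fi usage.)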

Vinayak



* Re: btrfs space used issue
  2018-03-01  9:26       ` vinayak hegde
@ 2018-03-01 10:18         ` Andrei Borzenkov
  2018-03-01 12:25           ` Austin S. Hemmelgarn
  2018-03-03  6:59         ` Duncan
  1 sibling, 1 reply; 14+ messages in thread
From: Andrei Borzenkov @ 2018-03-01 10:18 UTC (permalink / raw)
  To: vinayak hegde; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

On Thu, Mar 1, 2018 at 12:26 PM, vinayak hegde <vinayakhegdev@gmail.com> wrote:
> No, there is no opened file which is deleted, I did umount and mounted
> again and reboot also.
>
> I think I am hitting the below issue, lot of random writes were
> happening and the file is not fully written and its sparse file.
> Let me try with disabling COW.

Sure, I just mentioned the same in another thread.  But you said you
performed a full defragmentation, and I would expect it to "fix" this
condition by relocating data and freeing the original big extent.  If
this did not happen, I wonder what the conditions are under which
defragment decides to (not) move data.


* Re: btrfs space used issue
  2018-03-01 10:18         ` Andrei Borzenkov
@ 2018-03-01 12:25           ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2018-03-01 12:25 UTC (permalink / raw)
  To: Andrei Borzenkov, vinayak hegde; +Cc: Btrfs BTRFS

On 2018-03-01 05:18, Andrei Borzenkov wrote:
> On Thu, Mar 1, 2018 at 12:26 PM, vinayak hegde <vinayakhegdev@gmail.com> wrote:
>> No, there is no opened file which is deleted, I did umount and mounted
>> again and reboot also.
>>
>> I think I am hitting the below issue, lot of random writes were
>> happening and the file is not fully written and its sparse file.
>> Let me try with disabling COW.
>>
> 
> Sure, I just mentioned the same in another thread. But you said you
> performed full defragmentation and I expect it to "fix" this condition
> by relocating data and freeing original big extent. If this did not
> happen, I wonder what are conditions when defragment decides to (not)
> move data.
> 
While I'm not certain exactly how it works, defragmentation tries to 
make all extents at least as large as a target extent size.  By default, 
this target size is 32MB (I believe it used to be 20, but I'm not 100% 
certain about that).  Files smaller than that size will always be fully 
defragmented if there is any fragmentation; for larger files, defrag may 
ignore extents that are already larger than that size.  The `-t` 
option for the defrag command can be used to control this aspect.  It 
may also avoid given extents for other more complicated reasons 
involving free space fragmentation, but the primary one is the target 
extent size.
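
For example, forcing a larger target on the whole mount point would be 
something along the lines of:

   btrfs filesystem defragment -r -t 128M /dc/fileunifier.datacache/

with 128M being the value mentioned earlier in the thread as having 
worked for the stubborn cases.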


* Re: btrfs space used issue
  2018-03-01  9:26       ` vinayak hegde
  2018-03-01 10:18         ` Andrei Borzenkov
@ 2018-03-03  6:59         ` Duncan
  2018-03-05 15:28           ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Duncan @ 2018-03-03  6:59 UTC (permalink / raw)
  To: linux-btrfs

vinayak hegde posted on Thu, 01 Mar 2018 14:56:46 +0530 as excerpted:

> This will happen over and over again until we have completely
> overwritten the original extent, at which point your space usage will go
> back down to ~302g.We split big extents with cow, so unless you've got
> lots of space to spare or are going to use nodatacow you should probably
> not pre-allocate virt images

Indeed.  Preallocation with COW doesn't make the sense it does on an 
overwrite-in-place filesystem.  Either nocow it and take the penalties 
that brings[1], or configure your app not to preallocate in the first 
place[2].

---
[1] On btrfs, nocow implies no checksumming or transparent compression, 
either.  Also, the nocow attribute needs to be set on the empty file, 
with the easiest way to do that being to set it on the parent directory 
before file creation, so it's inherited by any newly created files/
subdirs within it.

[2] Many apps that preallocate by default have an option to turn 
preallocation off.
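
A minimal sketch of the directory-level approach from [1], with a 
hypothetical directory name:

   mkdir /dc/fileunifier.datacache/vmimages
   chattr +C /dc/fileunifier.datacache/vmimages   # new files inherit nocow
   lsattr -d /dc/fileunifier.datacache/vmimages   # should show the 'C' flag

Files created in that directory afterward pick up the nocow attribute; 
setting it on a file that already has data does not reliably take effect.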

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs space used issue
  2018-03-03  6:59         ` Duncan
@ 2018-03-05 15:28           ` Christoph Hellwig
  2018-03-05 16:17             ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2018-03-05 15:28 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Sat, Mar 03, 2018 at 06:59:26AM +0000, Duncan wrote:
> Indeed.  Preallocation with COW doesn't make the sense it does on an 
> overwrite-in-place filesystem.

It makes a whole lot of sense, it just is a little harder to implement.

There is no reason not to preallocate specific space, or, if you aren't
forced to be fully log-structured by the medium, specific blocks to COW
into.  It just isn't quite as trivial to implement as it is for a
rewrite-in-place file system.


* Re: btrfs space used issue
  2018-03-05 15:28           ` Christoph Hellwig
@ 2018-03-05 16:17             ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2018-03-05 16:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

On 2018-03-05 10:28, Christoph Hellwig wrote:
> On Sat, Mar 03, 2018 at 06:59:26AM +0000, Duncan wrote:
>> Indeed.  Preallocation with COW doesn't make the sense it does on an
>> overwrite-in-place filesystem.
> 
> It makes a whole lot of sense, it just is a little harder to implement.
> 
> There is no reason not to preallocate specific space, or if you aren't
> forced to be fully log structured by the medium, specific blocks to
> COW into.  It just isn't quite as trivial as for a rewrite in place
> file system to implement.
Yes, there's generally no reason not to pre-allocate space, but given 
how BTRFS implements pre-allocation, it makes little sense to do so for 
anything but NOCOW files, as it doesn't even guarantee that you'll be 
able to write however much data you pre-allocated space for (and it 
doesn't matter whether you use fallocate or just write out a run of 
zeroes; either way does not behave in a manner consistent with how other 
filesystems do).

There's been discussion about this before, arising from the behavior 
(completely illogical given how fallocate is expected to behave) that 
you can fallocate more than half the free space on a BTRFS volume but 
will then fail writes with -ENOSPC part way through actually writing 
data to the pre-allocated space you just reserved (and that it can fail 
with -ENOSPC for other reasons too).
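
To make that concrete, the kind of sequence being described is roughly 
the following; the sizes and paths are hypothetical, and the exact point 
of failure depends on the state of the filesystem:

   # on a mostly empty 10GiB btrfs filesystem
   fallocate -l 6G /mnt/test/prealloc                          # succeeds
   dd if=/dev/zero of=/mnt/test/prealloc bs=1M count=6144 conv=notrunc
   # the overwrite is COWed into new space and can hit ENOSPC part way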
