Linux-BTRFS Archive on lore.kernel.org
* Defragmenting to recover wasted space
From: Nate Eldredge @ 2019-11-07 14:03 UTC
  To: linux-btrfs

I had a confusing issue on a btrfs filesystem, where the amount of space
used according to `df`, `btrfs fi usage`, etc. was about 50% higher than
the total reported by `du` or `btrfs fi du`: about 185 GB vs 125 GB,
meaning that about 60 GB was somehow wasted.  I ruled out all the usual
suspects (deleted files still open, files under mount points, etc.) and
eventually fixed the issue by running `btrfs fi defrag` on a directory
containing a few big files (VirtualBox disk images).

This is on Ubuntu 19.04, currently using kernel 5.0.0-32.

So everything is good now, but I have questions:

1. What causes this?  I saw some references to "unused extents" but it 
wasn't clear how that happens, or why they wouldn't be freed through 
normal operation.  Are there certain usage patterns that exacerbate it?

2. Is this documented?  I didn't see it mentioned anywhere in the 
documentation, and defragmenting was just a random thing to try, based on 
a few hints in various blogs and mailing lists.  Luckily it worked, but 
otherwise I'm not sure how I could have known that defragmenting was the 
solution.

3. Is this reasonable?  With all the other filesystems I've used, space 
that isn't occupied by your files is available for use, minus a reasonable 
amount of overhead for metadata etc, without needing any special 
administrative chores.  Should I take it that I can't expect this from 
btrfs, and I have to plan to defragment occasionally to keep the disk from 
filling up?

4. If this is not normal, and if I'm able to reproduce it, what 
information should I gather for a bug report?

5. Is there a better way to detect this kind of wastage, to distinguish it 
from more mundane causes (deleted files still open, etc) and see how much 
space could be recovered? In particular, is there a way to tell which 
files are most affected, so that I can just defragment those?

Thanks very much for any information or pointers.

Here is info about the filesystem, if it matters.  This is from after the 
defrag.  It has two subvolumes and no snapshots.

# uname -a
Linux moneta 5.0.0-32-generic #34-Ubuntu SMP Wed Oct 2 02:06:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.20.2 
# btrfs fi show /
Label: none  uuid: [xxx]
 	Total devices 1 FS bytes used 127.83GiB
 	devid    1 size 227.29GiB used 197.02GiB path /dev/mapper/nvme0n1p3_crypt

# btrfs fi df /
Data, single: total=194.01GiB, used=127.03GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=3.01GiB, used=817.80MiB
GlobalReserve, single: total=182.75MiB, used=0.00B

Prior to the defrag, the `used=` number in `btrfs fi df` was about 185 
GiB.
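
For readers who want to put a number on this kind of gap, here is a
minimal sketch, not a definitive method.  It assumes the raw-bytes output
of `btrfs filesystem df -b` and compares the Data "used=" figure against
whatever byte total `du` or `btrfs fi du` reports for the same
filesystem.  The helper names `data_used_bytes` and `unaccounted` are
hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: read `btrfs filesystem df -b <mount>` output on
# stdin and print the "used=" byte count from the Data line.
data_used_bytes() {
    awk -F 'used=' '/^Data,/ { sub(/[^0-9].*/, "", $2); print $2; exit }'
}

# Hypothetical helper: allocated minus referenced, in whatever unit the
# two arguments share.
unaccounted() {
    echo $(( $1 - $2 ))
}

# Figures from this report, in GiB: about 185 used vs. about 125 per du.
unaccounted 185 125    # prints 60

# On a live system (assumption: refd holds the byte total taken from du
# or `btrfs fi du` for the same filesystem):
#   used=$(btrfs filesystem df -b / | data_used_bytes)
#   unaccounted "$used" "$refd"
```

A large positive result after ruling out the usual suspects would be
consistent with the situation described here, though it does not by
itself prove the cause.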

-- 
Nate Eldredge
nate@thatsmathematics.com



* Re: Defragmenting to recover wasted space
From: Remi Gauvin @ 2019-11-07 16:50 UTC
  To: Nate Eldredge


On 2019-11-07 9:03 a.m., Nate Eldredge wrote:

> 1. What causes this?  I saw some references to "unused extents" but it
> wasn't clear how that happens, or why they wouldn't be freed through
> normal operation.  Are there certain usage patterns that exacerbate it?

VirtualBox image files are subject to many, many small writes (just
booting Windows, for example, can create well over 5000 file fragments).
When the image file is new, its extents will be very large.  In BTRFS,
extents are immutable.  When a small write creates a new 4K COW extent,
the old 4K block remains allocated as part of the old extent as well.
This situation persists until all the data in the old extent has been
rewritten; only when none of that data is referenced anymore is the
extent freed.
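
To make the accounting concrete, here is a deliberately simplified model
of the behavior described above.  It assumes a single original extent,
no snapshots or reflinks, no compression, and that the overwritten byte
count covers distinct bytes of that extent.  The function name
`allocated_after_cow` and the 128 MiB extent size are assumptions chosen
only for illustration.

```shell
# Simplified model: bytes allocated on disk for a file region that began
# as one extent of EXTENT bytes, after OVERWRITTEN distinct bytes of it
# have been rewritten via copy-on-write.
allocated_after_cow() {
    local extent="$1" overwritten="$2"
    if [ "$overwritten" -ge "$extent" ]; then
        # Nothing references the old extent any more, so it is freed;
        # only the new COW extents remain.
        echo "$overwritten"
    else
        # The old extent stays allocated in full, plus the new extents.
        echo $(( extent + overwritten ))
    fi
}

# One 4K overwrite inside a 128 MiB extent: the whole old extent stays.
allocated_after_cow 134217728 4096         # prints 134221824

# Every byte rewritten: the old extent is finally released.
allocated_after_cow 134217728 134217728    # prints 134217728
```

Under this model the worst case is reached just before the last block is
rewritten, when almost twice the extent size is allocated.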

> 5. Is there a better way to detect this kind of wastage, to distinguish
> it from more mundane causes (deleted files still open, etc) and see how
> much space could be recovered? In particular, is there a way to tell
> which files are most affected, so that I can just defragment those?

Generally speaking, files that are subject to many random writes are
few, and you should be well aware of the larger ones where this might be
an issue (virtual image files, large databases, etc.).  These files
should be defragmented frequently.  I don't see any reason not to run
defrag over the whole subvolume, but if you want to search for files
with absurd fragment counts, you can always use the find command to
locate files, run the filefrag command on them, then use whatever tools
you like to search the output for files with thousands of fragments.
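
The find/filefrag search described above could be sketched as follows.
This assumes the usual one-line `filefrag` summary format
("PATH: N extents found").  The helper name `rank_fragmented`, the
1000-extent threshold, and the example path are assumptions made for
illustration.

```shell
# Hypothetical helper: read filefrag summary lines on stdin and print
# "COUNT<TAB>PATH" for files with at least MIN extents (default 1000),
# most fragmented first.
rank_fragmented() {
    awk -v min="${1:-1000}" '
        match($0, /: [0-9]+ extents? found$/) {
            path = substr($0, 1, RSTART - 1)    # text before ": N extents found"
            n = substr($0, RSTART + 2) + 0      # leading number of the remainder
            if (n >= min) printf "%d\t%s\n", n, path
        }' | sort -n -r
}

# Example use (the directory is an assumption):
#   find /srv/vm-images -type f -exec filefrag {} + | rank_fragmented 1000
```

Extent counts are only a rough indicator: a file with many extents is a
candidate for defragmenting, but the count alone does not say how much
space would be recovered, and compressed files may report many small
extents for unrelated reasons.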






* Re: Defragmenting to recover wasted space
From: Nate Eldredge @ 2019-11-07 19:41 UTC
  To: Remi Gauvin; +Cc: linux-btrfs


On Thu, 7 Nov 2019, Remi Gauvin wrote:

> On 2019-11-07 9:03 a.m., Nate Eldredge wrote:
>
>> 1. What causes this?  I saw some references to "unused extents" but it
>> wasn't clear how that happens, or why they wouldn't be freed through
>> normal operation.  Are there certain usage patterns that exacerbate it?
>
> VirtualBox image files are subject to many, many small writes (just
> booting Windows, for example, can create well over 5000 file fragments).
> When the image file is new, its extents will be very large.  In BTRFS,
> extents are immutable.  When a small write creates a new 4K COW extent,
> the old 4K block remains allocated as part of the old extent as well.
> This situation persists until all the data in the old extent has been
> rewritten; only when none of that data is referenced anymore is the
> extent freed.

Thanks, Remi.  This is very helpful in understanding what is going on.  In 
particular, I didn't realize that extents are immutable even when there is 
only one reference to them (I have no snapshots or reflinks to these 
files).

I guess this also means that in the worst case, if I want to overwrite the 
entire file "in place" in a random order, I actually need additional free 
space equal to the file's size, until I get around to defragging.  That's 
rather counterintuitive for somebody used to traditional filesystems.

>> 5. Is there a better way to detect this kind of wastage, to distinguish
>> it from more mundane causes (deleted files still open, etc) and see how
>> much space could be recovered? In particular, is there a way to tell
>> which files are most affected, so that I can just defragment those?
>
> Generally speaking, files that are subject to many random writes are
> few, and you should be well aware of the larger ones where this might be
> an issue (virtual image files, large databases, etc.).  These files
> should be defragmented frequently.  I don't see any reason not to run
> defrag over the whole subvolume, but if you want to search for files
> with absurd fragment counts, you can always use the find command to
> locate files, run the filefrag command on them, then use whatever tools
> you like to search the output for files with thousands of fragments.

Okay.  Defragmenting is kind of inconvenient, though, and I suppose it 
involves some extra wear on the SSD since data is really being moved. 
There's also the issue, as I understand it, that defragmenting will break 
up existing reflinks, which in some other situations I may really want to 
keep.

In fact, it seems that somehow what I really want is for the file to be 
*completely* fragmented, so that every write replaces an extent and frees 
the old one.  On an SSD I don't really care if the data blocks are 
actually contiguous.  It seems perverse, but even if there is more 
overhead, it might be worth it when I don't have a lot of free space to 
spare.  I don't suppose there is any way to arrange that?

Thanks again!

-- 
Nate Eldredge
nate@thatsmathematics.com


* Re: Defragmenting to recover wasted space
From: Qu Wenruo @ 2019-11-08  8:01 UTC
  To: Nate Eldredge, Remi Gauvin; +Cc: linux-btrfs




On 2019/11/8 3:41 AM, Nate Eldredge wrote:
> On Thu, 7 Nov 2019, Remi Gauvin wrote:
> 
>> On 2019-11-07 9:03 a.m., Nate Eldredge wrote:
>>
>>> 1. What causes this?  I saw some references to "unused extents" but it
>>> wasn't clear how that happens, or why they wouldn't be freed through
>>> normal operation.  Are there certain usage patterns that exacerbate it?
>>
>> VirtualBox image files are subject to many, many small writes (just
>> booting Windows, for example, can create well over 5000 file fragments).
>> When the image file is new, its extents will be very large.  In BTRFS,
>> extents are immutable.  When a small write creates a new 4K COW extent,
>> the old 4K block remains allocated as part of the old extent as well.
>> This situation persists until all the data in the old extent has been
>> rewritten; only when none of that data is referenced anymore is the
>> extent freed.
> 
> Thanks, Remi.  This is very helpful in understanding what is going on. 
> In particular, I didn't realize that extents are immutable even when
> there is only one reference to them (I have no snapshots or reflinks to
> these files).
> 
> I guess this also means that in the worst case, if I want to overwrite
> the entire file "in place" in a random order, I actually need additional
> free space equal to the file's size, until I get around to defragging. 
> That's rather counterintuitive for somebody used to traditional
> filesystems.
> 
>>> 5. Is there a better way to detect this kind of wastage, to distinguish
>>> it from more mundane causes (deleted files still open, etc) and see how
>>> much space could be recovered? In particular, is there a way to tell
>>> which files are most affected, so that I can just defragment those?
>>
>> Generally speaking, files that are subject to many random writes are
>> few, and you should be well aware of the larger ones where this might be
>> an issue (virtual image files, large databases, etc.).  These files
>> should be defragmented frequently.  I don't see any reason not to run
>> defrag over the whole subvolume, but if you want to search for files
>> with absurd fragment counts, you can always use the find command to
>> locate files, run the filefrag command on them, then use whatever tools
>> you like to search the output for files with thousands of fragments.
> 
> Okay.  Defragmenting is kind of inconvenient, though, and I suppose it
> involves some extra wear on the SSD since data is really being moved.
> There's also the issue, as I understand it, that defragmenting will
> break up existing reflinks, which in some other situations I may really
> want to keep.
> 
> In fact, it seems that somehow what I really want is for the file to be
> *completely* fragmented, so that every write replaces an extent and
> frees the old one.  On an SSD I don't really care if the data blocks are
> actually contiguous.  It seems perverse, but even if there is more
> overhead, it might be worth it when I don't have a lot of free space to
> spare.  I don't suppose there is any way to arrange that?

In fact, you can just use nodatacow.
Furthermore, the nodatacow attribute can be applied to a directory so
that any new file created in it will inherit the attribute.

In that case, overwrites will not be COWed (as long as there is no
snapshot of the file), and thus no space is wasted.
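
As a minimal sketch of the directory approach, assuming the attribute is
set with `chattr +C` and inspected with `lsattr`: the helper names
`make_nocow_dir` and `has_nocow_flag` and the example path are
hypothetical.  The attribute is generally only reliable for files created
after it is set, so existing images would need to be copied into the
directory as new files.

```shell
# Hypothetical helper: create DIR (if needed) and mark it No_COW so that
# files created in it afterwards inherit the attribute.
make_nocow_dir() {
    local dir="$1"
    mkdir -p "$dir" && chattr +C "$dir"
}

# Hypothetical helper: succeed if one line of `lsattr` output shows the
# 'C' (No_COW) flag in its leading flags column.
has_nocow_flag() {
    local flags="${1%% *}"    # keep only the flags column before the path
    case "$flags" in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example use on a btrfs filesystem (the path is an assumption):
#   make_nocow_dir /srv/vm-images
#   has_nocow_flag "$(lsattr -d /srv/vm-images)" && echo "No_COW is set"
```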

Thanks,
Qu

> 
> Thanks again!
> 



* Re: Defragmenting to recover wasted space
From: Nate Eldredge @ 2019-11-08 15:24 UTC
  To: Qu Wenruo; +Cc: Remi Gauvin, linux-btrfs

On Fri, 8 Nov 2019, Qu Wenruo wrote:

> In fact, you can just use nodatacow.
> Furthermore, the nodatacow attribute can be applied to a directory so
> that any new file created in it will inherit the attribute.
>
> In that case, overwrites will not be COWed (as long as there is no
> snapshot of the file), and thus no space is wasted.

Aha, I didn't know about that feature.  Thanks, that is exactly what I 
want.

-- 
Nate Eldredge
nate@thatsmathematics.com



* Re: Defragmenting to recover wasted space
From: Remi Gauvin @ 2019-11-08 15:53 UTC
  To: Nate Eldredge; +Cc: linux-btrfs


On 2019-11-08 10:24 a.m., Nate Eldredge wrote:
> On Fri, 8 Nov 2019, Qu Wenruo wrote:
> 
>> In fact, you can just use nodatacow.
>> Furthermore, the nodatacow attribute can be applied to a directory so
>> that any new file created in it will inherit the attribute.
>>
>> In that case, overwrites will not be COWed (as long as there is no
>> snapshot of the file), and thus no space is wasted.
> 
> Aha, I didn't know about that feature.  Thanks, that is exactly what I
> want.
> 


I would advise caution with this approach.  With nodatacow you give up
all of the features that would make you want to use BTRFS in the first
place (no checksum verification, for example).

And if used in conjunction with BTRFS RAID, BTRFS behavior is, in terms
of RAID, outright psychotic.  In case of an unclean shutdown while data
was being written, the RAID copies will be inconsistent, and BTRFS will
never synchronize them (short of a full rebalance).  What data gets read
will just randomly depend on which device BTRFS is reading from.

If you would rather forgo the benefits of BTRFS for better performance
or to avoid fragmentation issues, why not carve out an XFS / EXT4
partition?





