linux-btrfs.vger.kernel.org archive mirror
* CoW overhead from old extents?
@ 2019-06-25 10:41 Roman Mamedov
  2019-06-25 14:37 ` Qu Wenruo
  2019-06-25 17:49 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 3+ messages in thread
From: Roman Mamedov @ 2019-06-25 10:41 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I have a number of VM images in sparse NOCOW files, with:

  # du -B M -sc *
  ...
  46030M	total

and:

  # du -B M -sc --apparent-size *
  ...
  96257M	total

But despite there being nothing else on the filesystem and no snapshots,

  # df -B M .

  ... 1M-blocks   Used Available Use% ...
  ...   710192M 69024M   640102M  10% ...

The filesystem itself is:

  Data, RAID0: total=70.00GiB, used=67.38GiB
  System, RAID0: total=64.00MiB, used=16.00KiB
  Metadata, RAID0: total=1.00GiB, used=7.03MiB
  GlobalReserve, single: total=16.00MiB, used=0.00B

So there's about 23 GB of overhead to store only 46 GB of data.

I vaguely remember the reason is something along the lines of the need to keep
around old extents, which are split in the middle when CoWed, but the entire
old extent must also be kept in place until it is fully overwritten.

These NOCOW files are being snapshotted for backup purposes, and the snapshot
is getting removed usually within 30 minutes (while the VMs are active and
writing to their files), so it was not pure NOCOW 100% of the time.
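
(For reference, the backup cycle is roughly the following; the paths are
purely illustrative, not the actual ones:

  # btrfs subvolume snapshot -r /mnt/vms /mnt/vms-backup
  ... back the images up from the read-only snapshot ...
  # btrfs subvolume delete /mnt/vms-backup
)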

Main question is, can we have this recorded/explained in the wiki in precise
terms (perhaps in Gotchas), or is there maybe already a description of this
issue on it somewhere? I looked through briefly just now, and couldn't find
anything similar. Only remember this being explained once on the mailing list
a few years ago. (Anyone have a link?)

Also, any way to mitigate this and regain space? Short of shutting down the
VMs, copying their images into new files and deleting old ones. Balance,
defragment or "fallocate -d" (for the non-running ones) do not seem to help.
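
(Roughly what was tried so far -- the paths and filter values here are
just for illustration, not the exact invocations:

  # btrfs balance start -dusage=100 /mnt/vms
  # btrfs filesystem defragment -r /mnt/vms
  # fallocate -d image.img    <- only for images of stopped VMs
)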

What's unfortunate is that "fstrim -v" only reports ~640 GB as having been
trimmed, which means the overhead part would not be freed by TRIM either,
if this were on top of thin-provisioned storage.

-- 
With respect,
Roman


* Re: CoW overhead from old extents?
  2019-06-25 10:41 CoW overhead from old extents? Roman Mamedov
@ 2019-06-25 14:37 ` Qu Wenruo
  2019-06-25 17:49 ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2019-06-25 14:37 UTC (permalink / raw)
  To: Roman Mamedov, linux-btrfs



On 2019/6/25 6:41 PM, Roman Mamedov wrote:
> Hello,
> 
> I have a number of VM images in sparse NOCOW files, with:

NODATACOW and no snapshot?

Then unless something like balance or defrag rewrites the extents, it
should mostly behave much like a regular fs.
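
(I'm assuming here that the files got the NOCOW attribute while still
empty, e.g. by setting it on the directory before creating the images,
since +C does not work for files that already contain data:

  # chattr +C /mnt/vms
)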

> 
>   # du -B M -sc *
>   ...
>   46030M	total
> 
> and:
> 
>   # du -B M -sc --apparent-size *
>   ...
>   96257M	total
> 
> But despite there being nothing else on the filesystem and no snapshots,
> 
>   # df -B M .
> 
>   ... 1M-blocks   Used Available Use% ...
>   ...   710192M 69024M   640102M  10% ...
> 
> The filesystem itself is:
> 
>   Data, RAID0: total=70.00GiB, used=67.38GiB
>   System, RAID0: total=64.00MiB, used=16.00KiB
>   Metadata, RAID0: total=1.00GiB, used=7.03MiB
>   GlobalReserve, single: total=16.00MiB, used=0.00B
> 
> So there's about 23 GB of overhead to store only 46 GB of data.
> 
> I vaguely remember the reason is something along the lines of the need to keep
> around old extents, which are split in the middle when CoWed, but the entire
> old extent must also be kept in place until it is fully overwritten.

Yes, that's the extent bookkeeping mechanism of btrfs.
But that should not be the case for NODATACOW files if there are no
snapshots.
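
As a rough worked example (numbers purely for illustration): if a 128MiB
extent has 4MiB in its middle overwritten while a snapshot still
references it, you end up with 128MiB (the old extent, kept whole) plus
4MiB (the new extent) on disk, and the old 128MiB is only released once
nothing references any part of it anymore.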

> 
> These NOCOW files are being snapshotted for backup purposes, and the snapshot
> is getting removed usually within 30 minutes (while the VMs are active and
> writing to their files), so it was not pure NOCOW 100% of the time.

Even snapshots that were later completely removed still caused some
extents to be CoWed while they existed, which overrides the NODATACOW
flag.
Remember that CoW is the default behavior of btrfs; NODATACOW is kind of
a second-class citizen.

So this can happen in certain cases.

> 
> Main question is, can we have this recorded/explained in the wiki in precise
> terms (perhaps in Gotchas), or is there maybe already a description of this
> issue on it somewhere? I looked through briefly just now, and couldn't find
> anything similar. Only remember this being explained once on the mailing list
> a few years ago. (Anyone have a link?)
> 
> Also, any way to mitigate this and regain space? Short of shutting down the
> VMs, copying their images into new files and deleting old ones. Balance,
> defragment or "fallocate -d" (for the non-running ones) do not seem to help.

IIRC defrag should solve your problem, as long as there is only one
subvolume owning that file and all snapshots are completely removed
("btrfs subvolume delete" just orphans a snapshot and doesn't ensure it
disappears on-disk right away, but normally after a few transactions
deleted snapshots are fully cleaned up).
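
Something like this should do it (path illustrative; "subvolume sync"
just waits until deleted subvolumes are actually cleaned up on disk):

  # btrfs subvolume sync /mnt/vms
  # btrfs filesystem defragment -r /mnt/vms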

Balance won't change the situation at all.

And fallocate would in fact make things even worse with snapshots.
If you fallocate + set nodatacow and then write some data, so far so
good, everything acts normally.

But right after a single snapshot, even writes into the still-unpopulated
preallocated space will be CoWed. So it wastes even more space!
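
As an illustration of that sequence (file and subvolume names made up):

  # touch disk.img && chattr +C disk.img   <- NOCOW must be set while empty
  # fallocate -l 10G disk.img              <- preallocate
  ... writes go into the preallocated extents, no CoW so far ...
  # btrfs subvolume snapshot /mnt/vms /mnt/vms/.snap
  ... now even writes into never-touched preallocated space get CoWed ...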

> 
> What's unfortunate is that "fstrim -v" only reports ~640 GB as having been
> trimmed, which means the overhead part would not be freed by TRIM either,
> if this were on top of thin-provisioned storage.
> 

Fstrim has a bug where, if you have balanced your fs several times, it
will only trim unallocated space.
It should already be fixed in recent kernel releases, though.

Thanks,
Qu




* Re: CoW overhead from old extents?
  2019-06-25 10:41 CoW overhead from old extents? Roman Mamedov
  2019-06-25 14:37 ` Qu Wenruo
@ 2019-06-25 17:49 ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 3+ messages in thread
From: Austin S. Hemmelgarn @ 2019-06-25 17:49 UTC (permalink / raw)
  To: Roman Mamedov, linux-btrfs

On 2019-06-25 06:41, Roman Mamedov wrote:
> Hello,
> 
> I have a number of VM images in sparse NOCOW files, with:
> 
>    # du -B M -sc *
>    ...
>    46030M	total
> 
> and:
> 
>    # du -B M -sc --apparent-size *
>    ...
>    96257M	total
> 
> But despite there being nothing else on the filesystem and no snapshots,
> 
>    # df -B M .
> 
>    ... 1M-blocks   Used Available Use% ...
>    ...   710192M 69024M   640102M  10% ...
> 
> The filesystem itself is:
> 
>    Data, RAID0: total=70.00GiB, used=67.38GiB
>    System, RAID0: total=64.00MiB, used=16.00KiB
>    Metadata, RAID0: total=1.00GiB, used=7.03MiB
>    GlobalReserve, single: total=16.00MiB, used=0.00B
> 
> So there's about 23 GB of overhead to store only 46 GB of data.
> 
> I vaguely remember the reason is something along the lines of the need to keep
> around old extents, which are split in the middle when CoWed, but the entire
> old extent must also be kept in place until it is fully overwritten.
Essentially yes.
> 
> These NOCOW files are being snapshotted for backup purposes, and the snapshot
> is getting removed usually within 30 minutes (while the VMs are active and
> writing to their files), so it was not pure NOCOW 100% of the time.
> 
> Main question is, can we have this recorded/explained in the wiki in precise
> terms (perhaps in Gotchas), or is there maybe already a description of this
> issue on it somewhere? I looked through briefly just now, and couldn't find
> anything similar. Only remember this being explained once on the mailing list
> a few years ago. (Anyone have a link?)
I don't have a link, though I think I may have been one of the people 
who explained it back then.  It could indeed be better explained 
somewhere; I suspect it isn't documented simply because nobody expected 
it to get as bad as you are seeing here.
> 
> Also, any way to mitigate this and regain space? Short of shutting down the
> VMs, copying their images into new files and deleting old ones. Balance,
> defragment or "fallocate -d" (for the non-running ones) do not seem to help.
If you can attach and detach disks from the VMs while they're running 
and are using some kind of volume management inside the VM itself (LVM, 
BTRFS, ZFS, etc.), you can migrate the data to new disk images without 
shutting anything down.

The general procedure for this is as follows:

1. Create a new (empty) disk image the same size as the one you want to 
copy.
2. Attach the new disk image to the VM which has the disk to be copied.
3. Use whatever volume management tools you have inside the VM itself to 
move things to the new disk (pvmove, btrfs replace, etc).
4. Once the data is completely moved, detach the old disk image.
5. Optionally archive the old disk image (just in case), and then delete 
it from the host filesystem.
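
A rough sketch of what that can look like with libvirt on the host and 
LVM inside the guest (all of the names below are made up for 
illustration, and the exact attach-disk options depend on your setup):

On the host, create and hot-attach an empty image of the same size:

  # qemu-img create -f raw /var/lib/libvirt/images/vm1-data-new.img 100G
  # virsh attach-disk vm1 /var/lib/libvirt/images/vm1-data-new.img vdc --live

Inside the guest, move the volume group's data onto the new disk:

  # pvcreate /dev/vdc
  # vgextend datavg /dev/vdc
  # pvmove /dev/vdb /dev/vdc
  # vgreduce datavg /dev/vdb

Back on the host, detach the old disk and archive or delete its image:

  # virsh detach-disk vm1 vdb --live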

If it weren't NOCOW files we were talking about, you could actually 
force the extents to be rewritten 'in-place' (from a userspace 
perspective) by using the `-c` switch for defrag to change their 
compression state and then change it back.
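
For a non-NOCOW file that forced rewrite would look roughly like this 
(the compression algorithm here is an arbitrary choice):

  # btrfs filesystem defragment -czstd /path/to/image.img

followed by a second pass to switch the compression back to whatever it 
was before.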
> 
> What's unfortunate is that "fstrim -v" only reports ~640 GB as having been
> trimmed, which means the overhead part would not be freed by TRIM either,
> if this were on top of thin-provisioned storage.
> 
Because TRIM can't get rid of that overhead without the whole file being 
rewritten; otherwise that space would already be getting freed.
