* Not all deduped disk space freed?
@ 2020-09-11 17:51 Zhang Boyang
  2020-09-11 23:50 ` Zygo Blaxell
  2020-09-11 23:59 ` Qu Wenruo
  0 siblings, 2 replies; 3+ messages in thread
From: Zhang Boyang @ 2020-09-11 17:51 UTC (permalink / raw)
  To: linux-btrfs

Hello all,

The background is that I recently developed a btrfs deduplication tool, 
which is open-sourced at github.com/zhangboyang/simplededup

The dedup algorithm is very simple: hash the data, find duplicate 4K 
blocks, and call ioctl(FIDEDUPERANGE) to eliminate them.
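
For one matching pair of 4K blocks, the operation boils down to a single 
call like this (sketched with xfs_io's "dedupe" command, which issues 
FIDEDUPERANGE under the hood; the file names and offsets are made up):

# xfs_io -c "dedupe fileA 0 8192 4096" fileB

i.e. replace the 4K block at offset 8192 of fileB with a reference to the 
identical block at offset 0 of fileA.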

However, after running my tool, I found that not all deduped blocks turned 
into free space, and `btrfs fi du' [Exclusive + Set shared] != `btrfs fi 
usage' [Used]; as shown below, 2932206698496 + 945128120320 is far lower 
than 4119389741056.


root@athlon:/media/datahdd# btrfs fi du --raw -s /media/datahdd
      Total   Exclusive  Set shared  Filename
4369431683072  2932206698496  945128120320  /media/datahdd

root@athlon:/media/datahdd# btrfs fi usage --raw  /media/datahdd
Overall:
     Device size:             8999528280064
     Device allocated:             4144710549504
     Device unallocated:             4854817730560
     Device missing:                         0
     Used:                 4138705166336
     Free (estimated):             4856449110016    (min: 2429040244736)
     Data ratio:                          1.00
     Metadata ratio:                      2.00
     Global reserve:                  75546624    (used: 0)

Data,single: Size:4121021120512, Used:4119389741056 (99.96%)
    /dev/sdc1    2559800508416
    /dev/sdb1    1561220612096

Metadata,RAID1: Size:11811160064, Used:9657270272 (81.76%)
    /dev/sdc1    11811160064
    /dev/sdb1    11811160064

System,RAID1: Size:33554432, Used:442368 (1.32%)
    /dev/sdc1      33554432
    /dev/sdb1      33554432

Unallocated:
    /dev/sdc1    2429196152832
    /dev/sdb1    2425621577728
root@athlon:/media/datahdd#


That's quite strange. Is this expected behaviour?

Thank you all!


ZBY



* Re: Not all deduped disk space freed?
  2020-09-11 17:51 Not all deduped disk space freed? Zhang Boyang
@ 2020-09-11 23:50 ` Zygo Blaxell
  2020-09-11 23:59 ` Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Zygo Blaxell @ 2020-09-11 23:50 UTC (permalink / raw)
  To: Zhang Boyang; +Cc: linux-btrfs

On Sat, Sep 12, 2020 at 01:51:12AM +0800, Zhang Boyang wrote:
> Hello all,
> 
> The background is that I recently developed a btrfs deduplication tool,
> which is open-sourced at github.com/zhangboyang/simplededup
> 
> The dedup algorithm is very simple: hash the data, find duplicate 4K
> blocks, and call ioctl(FIDEDUPERANGE) to eliminate them.

btrfs counts references to extents, not to blocks, and btrfs extents
are immutable (i.e. there is no support for splitting an extent in-place).
It is critical to understand these two points before designing a dedupe
tool for btrfs.
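
(A handy way to poke at this on a live file: filefrag -v prints the file's
mappings with their physical locations and a "shared" flag, so you can see
which on-disk extents a given range still references.  The path below is
just a placeholder.)

# filefrag -v /media/datahdd/path/to/some/file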

In order to recover any space, all of the blocks in the target extent
must be eliminated, even if they contain unique data.  btrfs will not
do this for you.  It will only remove the exact portion of the extent
reference(s) you supply in the ioctl arguments.  It is up to the dedupe
application to provide a solution that eliminates all references to any
block in the target extent.  The kernel will verify and implement it.
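
To make that concrete, here is a rough sketch on a throwaway filesystem
(device and mount point are placeholders) of a dedupe that succeeds but
frees nothing:

# mkfs.btrfs -f -b 128M $dev
# mount $dev $mnt
# xfs_io -f -c "pwrite -S 0xaa 0 4k" $mnt/keeper
# xfs_io -f -c "pwrite -S 0xaa 0 4k" -c "pwrite -S 0xbb 4k 1020k" $mnt/mixed
# sync

keeper is a single 4K extent; mixed normally goes to disk as one ~1M extent
whose first 4K duplicates keeper and whose remaining 1020K is unique.

# xfs_io -c "dedupe $mnt/keeper 0 0 4k" $mnt/mixed
# sync
# btrfs fi df $mnt

Data "used" does not go down: mixed's first 4K now references keeper's
extent, but the 1M extent is still pinned by its 1020K of unique blocks,
so the deduped 4K inside it is merely unreachable, not free.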

If a target extent contains both unique and duplicate data, any unique
data left over in the extent must be relocated (copied) to a new extent
so that the target extent can be completely replaced by dedupe operations.
If any block of the target extent remains referenced, the entire target
extent will remain on disk.
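
Continuing the sketch above, one way to get the space back in that toy
case is to relocate the unique 1020K first; on btrfs a plain CoW overwrite
of that range (even with the same data) allocates a new extent:

# xfs_io -c "pwrite -S 0xbb 4k 1020k" $mnt/mixed
# sync
# btrfs fi df $mnt

Now nothing references the old 1M extent and it is freed: what remains is
keeper's 4K extent plus a new ~1020K extent, i.e. the duplicate 4K is
finally reclaimed, at the cost of rewriting 1020K of unique data.  That
copy-vs-savings trade-off is what the dedupe application has to weigh.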

bees recovers about 50% of the potential space by making necessary data
copies.  (OK, it's more accurate to say bees recovers 90% of the potential
space, then wastes about 40% of what it gained by making poor choices
about which extent in a duplicate pair to keep and getting confused by
its own temporary data).

duperemove can (with some combinations of options) perform a partial
extent map, then only match extent pairs with the same size.  An extent
that contains a mix of duplicate and unique blocks is therefore not
deduped at all, because, taken as a whole, the extent is unique.  This runs
quickly since it's not wasting iops on dedupe calls that will have no
effect, but it doesn't recover very much space.

Other dedupers work only at the file level, which is a valid solution
in many cases.  Since deduping an entire file necessarily removes the
entire file's extent references, it usually removes the target file's
extents too.  Exceptions would be files that have snapshots, reflinks,
or other dedupe applied to them--those parts of the file that were still
referenced from elsewhere would remain on disk.  A file-level deduper
is the least effective at freeing space, but it requires the least
examination of the filesystem structure to operate efficiently.
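
A sketch of why that works, with made-up 8M files on the same throwaway
filesystem as above: deduping the whole length replaces every extent
reference of the destination in one pass, so its old extents normally end
up with no references at all:

# xfs_io -f -c "pwrite -S 0xcc 0 8M" $mnt/copy1
# xfs_io -f -c "pwrite -S 0xcc 0 8M" $mnt/copy2
# sync
# xfs_io -c "dedupe $mnt/copy1 0 0 8M" $mnt/copy2
# sync
# btrfs fi df $mnt

Data "used" should drop by about 8M, because copy2's original extent is no
longer referenced by anything.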

Deduping with a very large block size has a similar effect to deduping
entire files.  The larger the dedupe block size, the greater the
probability that two random matching dedupe blocks will cover an entire
random target extent.

> However, after running my tool, I found that not all deduped blocks turned
> into free space, and `btrfs fi du' [Exclusive + Set shared] != `btrfs fi
> usage' [Used]; as shown below, 2932206698496 + 945128120320 is far lower
> than 4119389741056.
> 
> 
> root@athlon:/media/datahdd# btrfs fi du --raw -s /media/datahdd
>      Total   Exclusive  Set shared  Filename
> 4369431683072  2932206698496  945128120320  /media/datahdd
> 
> root@athlon:/media/datahdd# btrfs fi usage --raw  /media/datahdd
> Overall:
>     Device size:             8999528280064
>     Device allocated:             4144710549504
>     Device unallocated:             4854817730560
>     Device missing:                         0
>     Used:                 4138705166336
>     Free (estimated):             4856449110016    (min: 2429040244736)
>     Data ratio:                          1.00
>     Metadata ratio:                      2.00
>     Global reserve:                  75546624    (used: 0)
> 
> Data,single: Size:4121021120512, Used:4119389741056 (99.96%)
>    /dev/sdc1    2559800508416
>    /dev/sdb1    1561220612096
> 
> Metadata,RAID1: Size:11811160064, Used:9657270272 (81.76%)
>    /dev/sdc1    11811160064
>    /dev/sdb1    11811160064
> 
> System,RAID1: Size:33554432, Used:442368 (1.32%)
>    /dev/sdc1      33554432
>    /dev/sdb1      33554432
> 
> Unallocated:
>    /dev/sdc1    2429196152832
>    /dev/sdb1    2425621577728
> root@athlon:/media/datahdd#
> 
> 
> That's quite strange. Is this expected behaviour?
> 
> Thank you all!
> 
> 
> ZBY
> 


* Re: Not all deduped disk space freed?
  2020-09-11 17:51 Not all deduped disk space freed? Zhang Boyang
  2020-09-11 23:50 ` Zygo Blaxell
@ 2020-09-11 23:59 ` Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2020-09-11 23:59 UTC (permalink / raw)
  To: Zhang Boyang, linux-btrfs


On 2020/9/12 1:51 AM, Zhang Boyang wrote:
> Hello all,
> 
> The background is that I recently developed a btrfs deduplication tool,
> which is open-sourced at github.com/zhangboyang/simplededup
> 
> The dedup algorithm is very simple: hash the data, find duplicate 4K
> blocks, and call ioctl(FIDEDUPERANGE) to eliminate them.
> 
> However, after running my tool, I found that not all deduped blocks turned
> into free space, and `btrfs fi du' [Exclusive + Set shared] != `btrfs fi
> usage' [Used]; as shown below, 2932206698496 + 945128120320 is far lower
> than 4119389741056.

This is mostly caused by btrfs extent bookkeeping.

Btrfs will only release the space once the whole extent is no longer referenced.

So the simplest case looks like this:

# mkfs.btrfs -f -b 128M $dev
# mount $dev $mnt
# xfs_io -f -c "pwrite -S 0xff 0 8M" $mnt/file1
# xfs_io -f -c "pwrite -S 0xff 0 16M" $mnt/file2
# sync
# btrfs fi df $mnt
Data, single: total=24.00MiB, used=24.00MiB
...
# xfs_io -f -c "reflink $mnt/file1 0 0 8M" $mnt/file2

The reflink above has the same effect as deduping the 0~8M range of file1 against file2.

# sync
# btrfs fi df $mnt
Data, single: total=24.00MiB, used=24.00MiB

So the 8M saved won't be freed until the whole 16M extent is released.

This also applies to hole punching and other writes.
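
To see the space actually come back in this example, rewrite the remaining
half of file2 so that the last reference to the 16M extent goes away (the
data is the same, but CoW allocates a new extent for it):

# xfs_io -c "pwrite -S 0xff 8M 8M" $mnt/file2
# sync
# btrfs fi df $mnt

Used should now drop to roughly 16M: file1's 8M extent plus the new 8M
extent for the tail of file2, with the old 16M extent finally freed.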

Thanks,
Qu

> 
> 
> root@athlon:/media/datahdd# btrfs fi du --raw -s /media/datahdd
>      Total   Exclusive  Set shared  Filename
> 4369431683072  2932206698496  945128120320  /media/datahdd
> 
> root@athlon:/media/datahdd# btrfs fi usage --raw  /media/datahdd
> Overall:
>     Device size:             8999528280064
>     Device allocated:             4144710549504
>     Device unallocated:             4854817730560
>     Device missing:                         0
>     Used:                 4138705166336
>     Free (estimated):             4856449110016    (min: 2429040244736)
>     Data ratio:                          1.00
>     Metadata ratio:                      2.00
>     Global reserve:                  75546624    (used: 0)
> 
> Data,single: Size:4121021120512, Used:4119389741056 (99.96%)
>    /dev/sdc1    2559800508416
>    /dev/sdb1    1561220612096
> 
> Metadata,RAID1: Size:11811160064, Used:9657270272 (81.76%)
>    /dev/sdc1    11811160064
>    /dev/sdb1    11811160064
> 
> System,RAID1: Size:33554432, Used:442368 (1.32%)
>    /dev/sdc1      33554432
>    /dev/sdb1      33554432
> 
> Unallocated:
>    /dev/sdc1    2429196152832
>    /dev/sdb1    2425621577728
> root@athlon:/media/datahdd#
> 
> 
> That's quite strange. Is this expected behaviour?
> 
> Thank you all!
> 
> 
> ZBY
> 

