* Not all deduped disk space freed?
@ 2020-09-11 17:51 Zhang Boyang
2020-09-11 23:50 ` Zygo Blaxell
2020-09-11 23:59 ` Qu Wenruo
From: Zhang Boyang @ 2020-09-11 17:51 UTC (permalink / raw)
To: linux-btrfs
Hello all,
The background is that I recently developed a btrfs deduplication tool,
which is open-sourced at github.com/zhangboyang/simplededup
The dedup algorithm is very simple: hash 4K blocks to find duplicates, then
use ioctl(FIDEDUPERANGE) to eliminate them.
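For reference, the hashing stage of that approach could be sketched roughly like this (an illustrative sketch only, not the actual simplededup code; the FIDEDUPERANGE call itself is omitted):

```python
import hashlib

BLOCK_SIZE = 4096  # 4K blocks, as described above

def find_duplicate_blocks(paths):
    """Hash every 4K-aligned block and group identical ones.

    Returns {digest: [(path, offset), ...]} for digests seen more
    than once; each group is a candidate for FIDEDUPERANGE.
    """
    seen = {}
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(BLOCK_SIZE)
                if len(block) < BLOCK_SIZE:
                    break  # ignore a short tail block
                digest = hashlib.sha256(block).hexdigest()
                seen.setdefault(digest, []).append((path, offset))
                offset += BLOCK_SIZE
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Each returned group would then be passed to ioctl(FIDEDUPERANGE) with one member as the source and the rest as destinations.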
However, after running my tool, I found that not all deduped blocks turned
into free space, and `btrfs fi du' [Exclusive + Set shared] != `btrfs fi
usage' [Used], as shown below: 2932206698496 + 945128120320 is far lower
than 4119389741056
root@athlon:/media/datahdd# btrfs fi du --raw -s /media/datahdd
Total Exclusive Set shared Filename
4369431683072 2932206698496 945128120320 /media/datahdd
root@athlon:/media/datahdd# btrfs fi usage --raw /media/datahdd
Overall:
Device size: 8999528280064
Device allocated: 4144710549504
Device unallocated: 4854817730560
Device missing: 0
Used: 4138705166336
Free (estimated): 4856449110016 (min: 2429040244736)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 75546624 (used: 0)
Data,single: Size:4121021120512, Used:4119389741056 (99.96%)
/dev/sdc1 2559800508416
/dev/sdb1 1561220612096
Metadata,RAID1: Size:11811160064, Used:9657270272 (81.76%)
/dev/sdc1 11811160064
/dev/sdb1 11811160064
System,RAID1: Size:33554432, Used:442368 (1.32%)
/dev/sdc1 33554432
/dev/sdb1 33554432
Unallocated:
/dev/sdc1 2429196152832
/dev/sdb1 2425621577728
root@athlon:/media/datahdd#
That's quite strange. Is this expected behaviour?
Thank you all!
ZBY
* Re: Not all deduped disk space freed?
From: Zygo Blaxell @ 2020-09-11 23:50 UTC (permalink / raw)
To: Zhang Boyang; +Cc: linux-btrfs
On Sat, Sep 12, 2020 at 01:51:12AM +0800, Zhang Boyang wrote:
> Hello all,
>
> The background was I developed a btrfs deduplication tool recently, which
> was opensourced at github.com/zhangboyang/simplededup
>
> The dedup algorithm is very simple: hash & find dupe blocks (4K) and
> ioctl(FIDEDUPERANGE) to eliminate them.
btrfs counts references to extents, not to blocks, and btrfs extents
are immutable (i.e. there is no support for splitting an extent in-place).
It is critical to understand these two points before designing a dedupe
tool for btrfs.
In order to recover any space, all of the blocks in the target extent
must be eliminated, even if they contain unique data. btrfs will not
do this for you. It will only remove the exact portion of the extent
reference(s) you supply in the ioctl arguments. It is up to the dedupe
application to provide a solution that eliminates all references to any
block in the target extent. The kernel will verify and implement it.
If a target extent contains both unique and duplicate data, any unique
data left over in the extent must be relocated (copied) to a new extent
so that the target extent can be completely replaced by dedupe operations.
If any block of the target extent remains referenced, the entire target
extent will remain on disk.
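The rule "an extent survives while any of its blocks is referenced" and the relocate-then-dedupe fix can be illustrated with a toy model (a sketch in Python only, not btrfs's real accounting; extent names and block-level bookkeeping here are invented for illustration):

```python
# Toy model: an extent stays allocated while any file block
# still references it. Sizes are in 4K blocks.

extents = {}          # extent_id -> size in blocks
files = {}            # name -> {block_index: (extent_id, offset)}

def write(name, nblocks, eid):
    """Write a file backed by a single new extent."""
    extents[eid] = nblocks
    files[name] = {i: (eid, i) for i in range(nblocks)}

def dedupe(src, dst, start, length):
    """Like FIDEDUPERANGE: dst's range now references src's extent."""
    for i in range(start, start + length):
        files[dst][i] = files[src][i]

def used():
    """On-disk usage: every extent with at least one live reference."""
    live = {r[0] for f in files.values() for r in f.values()}
    return sum(s for e, s in extents.items() if e in live)

# file2's extent B: blocks 0-7 duplicate file1, blocks 8-15 unique.
write("file1", 8, "A")
write("file2", 16, "B")
dedupe("file1", "file2", 0, 8)
assert used() == 24   # B is still pinned by its unique tail: no space freed

# Fix: relocate the unique tail into a fresh extent C, dropping B entirely.
extents["C"] = 8
for i in range(8, 16):
    files["file2"][i] = ("C", i - 8)
assert used() == 16   # A(8) + C(8); B is finally freed
```

The second half is the copy step a dedupe tool must perform itself: the kernel only drops references it is explicitly told to replace.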
bees recovers about 50% of the potential space by making necessary data
copies. (OK, it's more accurate to say bees recovers 90% of the potential
space, then wastes about 40% of what it gained by making poor choices
about which extent in a duplicate pair to keep and getting confused by
its own temporary data).
duperemove can (with some combinations of options) perform a partial
extent map, then only match extent pairs with the same size. An extent
that contains a mix of duplicate and unique blocks is therefore not
deduped at all, because the extent taken as a whole is unique. This runs
quickly since it's not wasting iops on dedupe calls that will have no
effect, but it doesn't recover very much space.
Other dedupers work only at the file level, which is a valid solution
in many cases. Since deduping an entire file necessarily removes the
entire file's extent references, it usually removes the target file's
extents too. Exceptions would be files that have snapshots, reflinks,
or other dedupe applied to them--those parts of the file that were still
referenced from elsewhere would remain on disk. A file-level deduper
is the least effective at freeing space, but it requires the least
examination of the filesystem structure to operate efficiently.
Deduping with a very large block size has a similar effect to deduping
entire files. The larger the dedupe block size, the greater the
probability that two random matching dedupe blocks will cover an entire
random target extent.
> However, after I run my tool, I found not all deduped blocks turned into
> free space, and `btrfs fi du' [Exclusive+Set shared] != `btrfs fi usage'
> [Used], as below: 2932206698496+945128120320 is far lower than 4119389741056
>
>
> root@athlon:/media/datahdd# btrfs fi du --raw -s /media/datahdd
> Total Exclusive Set shared Filename
> 4369431683072 2932206698496 945128120320 /media/datahdd
>
> root@athlon:/media/datahdd# btrfs fi usage --raw /media/datahdd
> Overall:
> Device size: 8999528280064
> Device allocated: 4144710549504
> Device unallocated: 4854817730560
> Device missing: 0
> Used: 4138705166336
> Free (estimated): 4856449110016 (min: 2429040244736)
> Data ratio: 1.00
> Metadata ratio: 2.00
> Global reserve: 75546624 (used: 0)
>
> Data,single: Size:4121021120512, Used:4119389741056 (99.96%)
> /dev/sdc1 2559800508416
> /dev/sdb1 1561220612096
>
> Metadata,RAID1: Size:11811160064, Used:9657270272 (81.76%)
> /dev/sdc1 11811160064
> /dev/sdb1 11811160064
>
> System,RAID1: Size:33554432, Used:442368 (1.32%)
> /dev/sdc1 33554432
> /dev/sdb1 33554432
>
> Unallocated:
> /dev/sdc1 2429196152832
> /dev/sdb1 2425621577728
> root@athlon:/media/datahdd#
>
>
> That's quite strange. Is this an expected behaviour?
>
> Thank you all!
>
>
> ZBY
>
* Re: Not all deduped disk space freed?
From: Qu Wenruo @ 2020-09-11 23:59 UTC (permalink / raw)
To: Zhang Boyang, linux-btrfs
On 2020/9/12 1:51 AM, Zhang Boyang wrote:
> Hello all,
>
> The background was I developed a btrfs deduplication tool recently,
> which was opensourced at github.com/zhangboyang/simplededup
>
> The dedup algorithm is very simple: hash & find dupe blocks (4K) and
> ioctl(FIDEDUPERANGE) to eliminate them.
>
> However, after I run my tool, I found not all deduped blocks turned into
> free space, and `btrfs fi du' [Exclusive+Set shared] != `btrfs fi usage'
> [Used], as below: 2932206698496+945128120320 is far lower than
> 4119389741056
This is mostly caused by btrfs extent bookkeeping.
Btrfs will only release the space when the whole extent is no longer
referenced.
So the simplest case would look like this:
# mkfs.btrfs -f -b 128M $dev
# mount $dev $mnt
# xfs_io -f -c "pwrite -S 0xff 0 8M" $mnt/file1
# xfs_io -f -c "pwrite -S 0xff 0 16M" $mnt/file2
# sync
# btrfs fi df $mnt
Data, single: total=24.00MiB, used=24.00MiB
...
# xfs_io -f -c "reflink $mnt/file1 0 0 8M" $mnt/file2
The reflink above has the same effect as deduping the 0~8M range of file1 and file2.
# sync
# btrfs fi df $mnt
Data, single: total=24.00MiB, used=24.00MiB
So the saved 8M won't be freed until all of that 16M extent is freed.
This also applies to hole punching and other writes.
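The accounting in the xfs_io example above can be mimicked with a tiny model (an illustrative Python sketch, not real btrfs accounting; sizes in MiB):

```python
# file1 is backed by an 8MiB extent A, file2 by a 16MiB extent B.
extents = {"A": 8, "B": 16}
refs = {
    "file1": [("A", 0, 8)],    # file1 0-8M  -> extent A
    "file2": [("B", 0, 16)],   # file2 0-16M -> extent B
}

# reflink file1's 0-8M over file2's 0-8M (same effect as dedupe):
# B is still pinned by file2's 8-16M range.
refs["file2"] = [("A", 0, 8), ("B", 8, 8)]

live = {r[0] for reflist in refs.values() for r in reflist}
used = sum(sz for e, sz in extents.items() if e in live)
print(used)  # 24 -> usage stays at 24MiB even though 8MiB is now shared
```

Only when file2's last reference into B goes away (rewrite, truncate, or delete) would B's full 16MiB be released.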
Thanks,
Qu
>
>
> root@athlon:/media/datahdd# btrfs fi du --raw -s /media/datahdd
> Total Exclusive Set shared Filename
> 4369431683072 2932206698496 945128120320 /media/datahdd
>
> root@athlon:/media/datahdd# btrfs fi usage --raw /media/datahdd
> Overall:
> Device size: 8999528280064
> Device allocated: 4144710549504
> Device unallocated: 4854817730560
> Device missing: 0
> Used: 4138705166336
> Free (estimated): 4856449110016 (min: 2429040244736)
> Data ratio: 1.00
> Metadata ratio: 2.00
> Global reserve: 75546624 (used: 0)
>
> Data,single: Size:4121021120512, Used:4119389741056 (99.96%)
> /dev/sdc1 2559800508416
> /dev/sdb1 1561220612096
>
> Metadata,RAID1: Size:11811160064, Used:9657270272 (81.76%)
> /dev/sdc1 11811160064
> /dev/sdb1 11811160064
>
> System,RAID1: Size:33554432, Used:442368 (1.32%)
> /dev/sdc1 33554432
> /dev/sdb1 33554432
>
> Unallocated:
> /dev/sdc1 2429196152832
> /dev/sdb1 2425621577728
> root@athlon:/media/datahdd#
>
>
> That's quite strange. Is this an expected behaviour?
>
> Thank you all!
>
>
> ZBY
>