* compression disk space saving - what are your results?
@ 2015-12-02  9:46 Tomasz Chmielewski
  2015-12-02 10:36 ` Duncan
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Tomasz Chmielewski @ 2015-12-02  9:46 UTC (permalink / raw)
  To: linux-btrfs

What are your disk space savings when using btrfs with compression?

I have a 200 GB btrfs filesystem which uses compress=zlib, only stores 
text files (logs), mostly multi-gigabyte files.


It's a "single" filesystem, so "df" output matches "btrfs fi df":

# df -h
Filesystem      Size  Used Avail Use% Mounted on
(...)
/dev/xvdb       200G  124G   76G  62% /var/log/remote


# du -sh /var/log/remote/
153G    /var/log/remote/


From these numbers (124 GB used where data size is 153 GB), it appears 
that we save around 20% with zlib compression enabled.
Is 20% a reasonable saving for zlib? Typically text compresses much better 
with that algorithm, although I understand that there are several 
limitations when applying it at the filesystem level.


Tomasz Chmielewski
http://wpkg.org


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02  9:46 compression disk space saving - what are your results? Tomasz Chmielewski
@ 2015-12-02 10:36 ` Duncan
  2015-12-02 14:03   ` Imran Geriskovan
  2015-12-02 13:03 ` Austin S Hemmelgarn
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Duncan @ 2015-12-02 10:36 UTC (permalink / raw)
  To: linux-btrfs

Tomasz Chmielewski posted on Wed, 02 Dec 2015 18:46:30 +0900 as excerpted:

> What are your disk space savings when using btrfs with compression?
> 
> I have a 200 GB btrfs filesystem which uses compress=zlib, only stores
> text files (logs), mostly multi-gigabyte files.
> 
> 
> It's a "single" filesystem, so "df" output matches "btrfs fi df":
> 
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> (...)
> /dev/xvdb       200G  124G   76G  62% /var/log/remote
> 
> 
> # du -sh /var/log/remote/
> 153G    /var/log/remote/
> 
> 
>  From these numbers (124 GB used where data size is 153 GB), it appears
> that we save around 20% with zlib compression enabled.
> Is 20% reasonable saving for zlib? Typically text compresses much better
> with that algorithm, although I understand that we have several
> limitations when applying that on a filesystem level.

Here, just using compress=lzo, no compress-force and lzo not zlib, I'm 
mostly just happy to see lower usage than I was getting on reiserfs.  
Between that and no longer needing to worry whether copying a sparse file 
is going to end up sparse or not, because even if not the compression 
should effectively collapse the sparse areas, I've been happy /enough/ 
with it.


There are at least three additional factors to consider, for your case.

* There is of course metadata to consider as well as data, and on
single-device btrfs, metadata normally defaults to dup, 2X the space.  
You did say single, but didn't specify whether that was for metadata also 
(and for that matter, didn't specify whether it was a single-device 
filesystem or not, tho I assume it is).  And of course btrfs does 
checksumming that other filesystems don't do, and even puts small files 
in metadata too, all of which will be dup by default, taking even more 
space.

A btrfs fi df will of course give you separate data/metadata/system 
values, and you can take the data used value and compare that against the 
du -sh value to get a more accurate read on how well your compression 
really is working.  (Tho as noted, small files, a few KiB max, are often 
stored in the metadata, so if you have lots of those, you'd probably need 
to adjust for that, but you mentioned mostly GiB-scale files, so...)

* There's the compress vs. compress-force option and discussion.  A 
number of posters have reported that for mostly text, compress didn't 
give them expected compression results and they needed to use compress-
force.

Of course, changing the option now won't change how existing files are 
stored.  You'd have to either rewrite them, or wait for log rotation to 
rotate out the old files, to see the full effect.  Also see the btrfs fi 
defrag -c option.

* Talking about defrag, it's not snapshot-aware, which brings up the 
question of whether you're using btrfs snapshots on this filesystem, and 
the effect that would have if you are.

I'll presume not, as that would seem to be important enough to mention in 
a discussion of this sort, if you were, and also because that allows me 
to simply handwave further discussion of this point away. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02  9:46 compression disk space saving - what are your results? Tomasz Chmielewski
  2015-12-02 10:36 ` Duncan
@ 2015-12-02 13:03 ` Austin S Hemmelgarn
  2015-12-02 13:53   ` Tomasz Chmielewski
  2015-12-05 13:37 ` Marc Joliet
  2015-12-05 19:38 ` guido_kuenne
  3 siblings, 1 reply; 20+ messages in thread
From: Austin S Hemmelgarn @ 2015-12-02 13:03 UTC (permalink / raw)
  To: Tomasz Chmielewski, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4096 bytes --]

On 2015-12-02 04:46, Tomasz Chmielewski wrote:
> What are your disk space savings when using btrfs with compression?
>
> I have a 200 GB btrfs filesystem which uses compress=zlib, only stores
> text files (logs), mostly multi-gigabyte files.
>
>
> It's a "single" filesystem, so "df" output matches "btrfs fi df":
>
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> (...)
> /dev/xvdb       200G  124G   76G  62% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 153G    /var/log/remote/
>
>
>  From these numbers (124 GB used where data size is 153 GB), it appears
> that we save around 20% with zlib compression enabled.
> Is 20% reasonable saving for zlib? Typically text compresses much better
> with that algorithm, although I understand that we have several
> limitations when applying that on a filesystem level.

This is actually an excellent question.  A couple of things to note 
before I share what I've seen:
1. Text compresses better with any compression algorithm.  It is by 
nature highly patterned and moderately redundant data, which is what 
benefits the most from compression.
2. When BTRFS does in-line compression, it uses 128k blocks.  Because of 
this, there are diminishing returns for smaller files when using 
compression.
3. The best compression ratio I've ever seen from zlib on real data is 
about 65-70%, and that was using SquashFS, which is designed to take up 
as little room as possible.
4. LZO gets a worse compression ratio than zlib (around 40-50% if you're 
lucky), but is a _lot_ faster.
5. By playing around with the -c option for defrag, you can compress or 
uncompress different parts of the filesystem, and get a rough idea of 
what compresses best.

Now, to my results.  These are all from my desktop system, with no 
deduplication, and the data for zlib is somewhat outdated (I've not used 
it since LZO support stabilized).

For the filesystems I have on traditional hard disks:
1. For /home (mostly text files, some SQLite databases, and a couple of 
git repositories), I get about 15-20% space savings with zlib, and about 
a 2-4% performance hit.  I get about 5-10% space savings with lzo, but 
performance is about 5-8% better than uncompressed.
2. For /usr/src (50/50 mix of text and executable code), I get about 25% 
space savings with zlib with a 5-7% hit to performance, and about 10% 
with lzo with a 7% boost in performance relative to uncompressed.
3. For /usr/portage and /var/lib/layman (lots of small text files, a 
number of VCS repos, and about 2000 compressed source archives), I get 
about 25% space savings with zlib, with a 15% performance hit (yes, 
seriously 15%), and with lzo I get about 25% space savings with no 
measurable performance difference relative to uncompressed.

For the filesystems I have on SSDs:
1. For /var/tmp (huge assortment of different things, but usually 
similar to /usr/src because this is where packages get built), I get 
almost no space savings with either type of compression, and see a 
performance reduction of about 5% for both.
2. For /var/log (lots of text files; notably, I don't compress rotated 
logs, and I don't have systemd's insane binary log files), I get about 
30% space savings with zlib, but it makes the _whole_ system run about 
5% slower, and I get about 20% space savings with lzo, with no 
measurable performance difference relative to uncompressed.
3. For /var/spool (Lots of really short text files, mostly stuff from 
postfix and CUPS), I actually see higher disk usage with both types of 
compression, but almost zero performance impact from either of them.
4. For /boot (a couple of big binary files that already have built-in 
compression), I see no net space savings, and don't have any numbers 
regarding performance impact.
5. For / (everything that isn't on one of the other filesystems I listed 
above), I see about 10-20% space savings from zlib, with a roughly 5% 
performance hit, and about 5-15% space savings with lzo, with no 
measurable performance difference.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3019 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 13:03 ` Austin S Hemmelgarn
@ 2015-12-02 13:53   ` Tomasz Chmielewski
  2015-12-02 14:03     ` Wang Shilong
  2015-12-02 14:49     ` Austin S Hemmelgarn
  0 siblings, 2 replies; 20+ messages in thread
From: Tomasz Chmielewski @ 2015-12-02 13:53 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: linux-btrfs

On 2015-12-02 22:03, Austin S Hemmelgarn wrote:

>>  From these numbers (124 GB used where data size is 153 GB), it 
>> appears
>> that we save around 20% with zlib compression enabled.
>> Is 20% reasonable saving for zlib? Typically text compresses much 
>> better
>> with that algorithm, although I understand that we have several
>> limitations when applying that on a filesystem level.
> 
> This is actually an excellent question.  A couple of things to note
> before I share what I've seen:
> 1. Text compresses better with any compression algorithm.  It is by
> nature highly patterned and moderately redundant data, which is what
> benefits the most from compression.

It looks like compress=zlib does not compress very well. Following 
Duncan's suggestion, I've changed it to compress-force=zlib, and 
re-copied the data to make sure the files are compressed.

Compression ratio is much much better now (on a slightly changed data 
set):

# df -h
/dev/xvdb       200G   24G  176G  12% /var/log/remote


# du -sh /var/log/remote/
138G    /var/log/remote/


So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same 
effect as compress-force=zlib, for 100% text files/logs.


Tomasz Chmielewski
http://wpkg.org


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 10:36 ` Duncan
@ 2015-12-02 14:03   ` Imran Geriskovan
  2015-12-02 14:39     ` Austin S Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Imran Geriskovan @ 2015-12-02 14:03 UTC (permalink / raw)
  To: linux-btrfs

>> What are your disk space savings when using btrfs with compression?

> * There's the compress vs. compress-force option and discussion.  A
> number of posters have reported that for mostly text, compress didn't
> give them expected compression results and they needed to use compress-
> force.

"compress-force" option compresses regardless of the "compressibility"
of the file.

"compress" option makes some inference about the "compressibility"
and decides to compress or not.

I wonder how that inference is done?
Can anyone provide some pseudo code for it?

Regards,
Imran

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 13:53   ` Tomasz Chmielewski
@ 2015-12-02 14:03     ` Wang Shilong
  2015-12-02 14:06       ` Tomasz Chmielewski
  2015-12-02 14:49     ` Austin S Hemmelgarn
  1 sibling, 1 reply; 20+ messages in thread
From: Wang Shilong @ 2015-12-02 14:03 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Austin S Hemmelgarn, linux-btrfs

On Wed, Dec 2, 2015 at 9:53 PM, Tomasz Chmielewski <tch@virtall.com> wrote:
> On 2015-12-02 22:03, Austin S Hemmelgarn wrote:
>
>>>  From these numbers (124 GB used where data size is 153 GB), it appears
>>> that we save around 20% with zlib compression enabled.
>>> Is 20% reasonable saving for zlib? Typically text compresses much better
>>> with that algorithm, although I understand that we have several
>>> limitations when applying that on a filesystem level.
>>
>>
>> This is actually an excellent question.  A couple of things to note
>> before I share what I've seen:
>> 1. Text compresses better with any compression algorithm.  It is by
>> nature highly patterned and moderately redundant data, which is what
>> benefits the most from compression.
>
>
> It looks like compress=zlib does not compress very well. Following Duncan's
> suggestion, I've changed it to compress-force=zlib, and re-copied the data
> to make sure the files are compressed.
>
> Compression ratio is much much better now (on a slightly changed data set):
>
> # df -h
> /dev/xvdb       200G   24G  176G  12% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 138G    /var/log/remote/
>
>
> So, 138 GB files use just 24 GB on disk - nice!
>
> However, I would still expect that compress=zlib has almost the same effect
> as compress-force=zlib, for 100% text files/logs.

Btw, what is your kernel version? There was a bug that detected the inode
compression ratio incorrectly:

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78

Regards,
Shilong

>
>
> Tomasz Chmielewski
> http://wpkg.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 14:03     ` Wang Shilong
@ 2015-12-02 14:06       ` Tomasz Chmielewski
  0 siblings, 0 replies; 20+ messages in thread
From: Tomasz Chmielewski @ 2015-12-02 14:06 UTC (permalink / raw)
  To: Wang Shilong; +Cc: Austin S Hemmelgarn, linux-btrfs

On 2015-12-02 23:03, Wang Shilong wrote:

>> Compression ratio is much much better now (on a slightly changed data 
>> set):
>> 
>> # df -h
>> /dev/xvdb       200G   24G  176G  12% /var/log/remote
>> 
>> 
>> # du -sh /var/log/remote/
>> 138G    /var/log/remote/
>> 
>> 
>> So, 138 GB files use just 24 GB on disk - nice!
>> 
>> However, I would still expect that compress=zlib has almost the same 
>> effect
>> as compress-force=zlib, for 100% text files/logs.
> 
> Btw, what is your kernel version? There was a bug that detected the inode
> compression ratio incorrectly:
> 
> http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5
> http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78

Linux 4.3.0.


Tomasz Chmielewski
http://wpkg.org/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 14:03   ` Imran Geriskovan
@ 2015-12-02 14:39     ` Austin S Hemmelgarn
  2015-12-03  6:29       ` Duncan
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S Hemmelgarn @ 2015-12-02 14:39 UTC (permalink / raw)
  To: Imran Geriskovan, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1752 bytes --]

On 2015-12-02 09:03, Imran Geriskovan wrote:
>>> What are your disk space savings when using btrfs with compression?
>
>> * There's the compress vs. compress-force option and discussion.  A
>> number of posters have reported that for mostly text, compress didn't
>> give them expected compression results and they needed to use compress-
>> force.
>
> "compress-force" option compresses regardless of the "compressibility"
> of the file.
>
> "compress" option makes some inference about the "compressibility"
> and decides to compress or not.
>
> I wonder how that inference is done?
> Can anyone provide some pseudo code for it?
I'm not certain how BTRFS does it, but my guess would be trying to 
compress the block, then storing the uncompressed version if the 
compressed one is bigger.

The program lrzip has an option to do per-block compression checks kind 
of like this, but its method is to try LZO compression on the block 
(which is fast), and only use the selected compression method (bzip2 by 
default I think, but it can also do zlib and xz) if the LZO compression 
ratio is good enough.  If we went with a similar method, I'd say we 
should integrate LZ4 support first, and use that for the test.  I think 
NTFS compression on Windows might do something similar, but they use an 
old LZ77 derivative for their compression (I think it's referred to as 
LZNT1, and it's designed for speed, and usually doesn't get much better 
than a 30% compression ratio).
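
(As a rough illustration of that lrzip-style gating, here is a small
Python sketch.  It is not lrzip's actual code; Python's standard library
has no LZO or LZ4 bindings, so zlib at level 1 stands in for the fast
pre-test and lzma for the heavier final compressor, and the block size
and threshold are made-up values:)

import lzma
import os
import zlib

BLOCK = 128 * 1024        # pre-test granularity; 128 KiB picked to mirror btrfs (assumption)
THRESHOLD = 0.95          # "good enough" ratio for the cheap pass (made-up cutoff)

def store_block(data: bytes) -> bytes:
    # Cheap compressor first; only spend time on the expensive one if the
    # cheap pass shows the block is compressible at all.
    quick = zlib.compress(data, 1)            # fast, low-effort trial run
    if len(quick) >= len(data) * THRESHOLD:   # barely shrank: treat as incompressible
        return data                           # store raw
    packed = lzma.compress(data)              # heavier compressor for the real savings
    return packed if len(packed) < len(data) else data

logs = b"Dec  2 09:46:30 host sshd[1234]: accepted connection\n" * 2500
noise = os.urandom(BLOCK)                     # stand-in for already-compressed data
for name, blob in (("log text", logs[:BLOCK]), ("random", noise)):
    print(name, len(blob), "->", len(store_block(blob)))

(With the log-like input the pre-test passes and the heavier compressor
runs; with the random block the cheap pass already shows no gain, so the
expensive step is skipped entirely.)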

On a side note, I really wish BTRFS would just add LZ4 support.  It's a 
lot more deterministic WRT decompression time than LZO, gets a similar 
compression ratio, and runs faster on most processors for both 
compression and decompression.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3019 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 13:53   ` Tomasz Chmielewski
  2015-12-02 14:03     ` Wang Shilong
@ 2015-12-02 14:49     ` Austin S Hemmelgarn
  2015-12-22  3:55       ` Kai Krakow
  1 sibling, 1 reply; 20+ messages in thread
From: Austin S Hemmelgarn @ 2015-12-02 14:49 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2158 bytes --]

On 2015-12-02 08:53, Tomasz Chmielewski wrote:
> On 2015-12-02 22:03, Austin S Hemmelgarn wrote:
>
>>>  From these numbers (124 GB used where data size is 153 GB), it appears
>>> that we save around 20% with zlib compression enabled.
>>> Is 20% reasonable saving for zlib? Typically text compresses much better
>>> with that algorithm, although I understand that we have several
>>> limitations when applying that on a filesystem level.
>>
>> This is actually an excellent question.  A couple of things to note
>> before I share what I've seen:
>> 1. Text compresses better with any compression algorithm.  It is by
>> nature highly patterned and moderately redundant data, which is what
>> benefits the most from compression.
>
> It looks like compress=zlib does not compress very well. Following
> Duncan's suggestion, I've changed it to compress-force=zlib, and
> re-copied the data to make sure the files are compressed.
For future reference, if you run 'btrfs filesystem defrag -r -czlib' on 
the top level directory, you can achieve the same effect without having 
to deal with the copy overhead.  This has a side effect of breaking 
reflinks, but copying the files off and back onto the filesystem does so 
also, and even then, I doubt that you're using reflinks.  There probably 
wouldn't be much difference in the time it takes, but at least you 
wouldn't be hitting another disk in the process.
>
> Compression ratio is much much better now (on a slightly changed data set):
>
> # df -h
> /dev/xvdb       200G   24G  176G  12% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 138G    /var/log/remote/
>
>
> So, 138 GB files use just 24 GB on disk - nice!
>
> However, I would still expect that compress=zlib has almost the same
> effect as compress-force=zlib, for 100% text files/logs.
>
That's better than 80% space savings (it works out to about 83.6%), so I 
doubt that you'd manage to get anything better than that even with only 
plain text files.  It's interesting that there's such a big discrepancy 
though, that indicates that BTRFS really needs some work WRT deciding 
what to compress.



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3019 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 14:39     ` Austin S Hemmelgarn
@ 2015-12-03  6:29       ` Duncan
  2015-12-03 12:09         ` Imran Geriskovan
  2015-12-04 12:37         ` Austin S Hemmelgarn
  0 siblings, 2 replies; 20+ messages in thread
From: Duncan @ 2015-12-03  6:29 UTC (permalink / raw)
  To: linux-btrfs

Austin S Hemmelgarn posted on Wed, 02 Dec 2015 09:39:08 -0500 as
excerpted:

> On 2015-12-02 09:03, Imran Geriskovan wrote:
>>>> What are your disk space savings when using btrfs with compression?
>>
>>> [Some] posters have reported that for mostly text, compress didn't
>>> give them expected compression results and they needed to use
>>> compress-force.
>>
>> "compress-force" option compresses regardless of the "compressibility"
>> of the file.
>>
>> "compress" option makes some inference about the "compressibility" and
>> decides to compress or not.
>>
>> I wonder how that inference is done?
>> Can anyone provide some pseudo code for it?

> I'm not certain how BTRFS does it, but my guess would be trying to
> compress the block, then storing the uncompressed version if the
> compressed one is bigger.

No pseudocode as I'm not a dev and wouldn't want to give the wrong 
impression, but as I believe I replied recently in another thread, based 
on comments the devs have made...

With compress, btrfs does a(n intended to be fast) trial compression of 
the first 128 KiB block or two and uses the result of that to decide 
whether to compress the entire file.

Compress-force simply bypasses that first decision point, processing the 
file as if the test always succeeded and compression was chosen.

If the decision to compress is made, the file is (evidently, again, not a 
dev, but filefrag results support) compressed a 128 KiB block at a time 
with the resulting size compared against the uncompressed version, with 
the smaller version stored.

(Filefrag doesn't understand btrfs compression and reports individual 
extents for each 128 KiB compression block, if compressed.  However, for 
many files processed with compress-force, filefrag doesn't report the 
expected size/128-KiB extents, but rather something lower.  If
filefrag -v is used, details of each "extent" are listed, and some show 
up as multiples of 128 KiB, indicating runs of uncompressable blocks that 
unlike actually compressed blocks, filefrag can and does report correctly 
as single extents.  The conclusion is thus as above, that btrfs is 
testing the compression result of each block, and not compressing if the 
"compression" ends up being negative, that is, if the "compressed" size 
is larger than the uncompressed size.)
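
(Since pseudocode was asked for: here is a rough Python sketch of the
behaviour described above.  It is emphatically not the kernel code, just
an illustration of the two code paths; the 128 KiB block size is the one
mentioned above, everything else -- the function name, the use of
Python's zlib -- is made up:)

import zlib

BLOCK = 128 * 1024   # btrfs compresses data in 128 KiB chunks

def store_file(data: bytes, force: bool = False) -> list:
    # compress:       trial-compress the first block; if it doesn't shrink,
    #                 give up on the whole file and store it uncompressed.
    # compress-force: skip that trial and always try per-block compression.
    # Either way, a block is kept compressed only if that is actually smaller.
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    if not blocks:
        return []
    if not force:
        trial = zlib.compress(blocks[0])
        if len(trial) >= len(blocks[0]):   # first block looked incompressible
            return blocks                  # whole file stored as-is
    stored = []
    for block in blocks:
        packed = zlib.compress(block)
        stored.append(packed if len(packed) < len(block) else block)
    return stored

(The difference between the two mount options then comes down to how
representative that first block happens to be of the rest of the file,
which would explain the compress vs. compress-force gap reported earlier
in the thread.)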

> On a side note, I really wish BTRFS would just add LZ4 support.  It's a
> lot more deterministic WRT decompression time than LZO, gets a similar
> compression ratio, and runs faster on most processors for both
> compression and decompression.

There were patches (at least RFC level, IIRC) floating around years ago 
to add lz4... I wonder what happened to them?  My impression was that a 
large deployment somewhere may actually be running them as well, making 
them well tested (and obviously well beyond preliminary RFC level) by 
now, altho that impression could well be wrong.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-03  6:29       ` Duncan
@ 2015-12-03 12:09         ` Imran Geriskovan
  2015-12-04 12:33           ` Austin S Hemmelgarn
  2015-12-04 12:37         ` Austin S Hemmelgarn
  1 sibling, 1 reply; 20+ messages in thread
From: Imran Geriskovan @ 2015-12-03 12:09 UTC (permalink / raw)
  To: linux-btrfs

>> On a side note, I really wish BTRFS would just add LZ4 support.  It's a
>> lot more deterministic WRT decompression time than LZO, gets a similar
>> compression ratio, and runs faster on most processors for both
>> compression and decompression.

Relative ratios according to
http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

Compressed size
gzip (1) - lzo (1.4) - lz4 (1.4)

Compression Time
gzip (5) - lzo (1) - lz4 (0.8)

Decompression Time
gzip (9) - lzo (4) - lz4 (1)

Compression Memory
gzip (1) - lzo (2) - lz4 (20)

Decompression Memory
gzip (1) - lzo (2) - lz4 (130). Yes 130! not a typo.

But there is a note: that figure is for the lz4 program itself; the
internal lz4 code uses far less memory.

However, I could not find any better apples-to-apples
comparison.

If lz4's real memory consumption is on the order of lzo's,
then it looks good.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-03 12:09         ` Imran Geriskovan
@ 2015-12-04 12:33           ` Austin S Hemmelgarn
  0 siblings, 0 replies; 20+ messages in thread
From: Austin S Hemmelgarn @ 2015-12-04 12:33 UTC (permalink / raw)
  To: Imran Geriskovan, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]

On 2015-12-03 07:09, Imran Geriskovan wrote:
>>> On a side note, I really wish BTRFS would just add LZ4 support.  It's a
>>> lot more deterministic WRT decompression time than LZO, gets a similar
>>> compression ratio, and runs faster on most processors for both
>>> compression and decompression.
>
> Relative ratios according to
> http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
>
> Compressed size
> gzip (1) - lzo (1.4) - lz4 (1.4)
>
> Compression Time
> gzip (5) - lzo (1) - lz4 (0.8)
>
> Decompression Time
> gzip (9) - lzo (4) - lz4 (1)
>
> Compression Memory
> gzip (1) - lzo (2) - lz4 (20)
>
> Decompression Memory
> gzip (1) - lzo (2) - lz4 (130). Yes 130! not a typo.
>
> But there is a note: that figure is for the lz4 program itself; the
> internal lz4 code uses far less memory.
>
> However, I could not find any better apples-to-apples
> comparison.
>
> If lz4's real memory consumption is on the order of lzo's,
> then it looks good.
AFAICT, it's similar memory consumption.  I did some tests a while back 
comparing the options for kernel image compression using a VM, and one 
of the things I tested (although I can't for the life of me remember how 
exactly except that it involved using QEMU hooked up to GDB) was 
run-time decompressor footprint.  LZO really should have a smaller 
memory footprint too, it's just that lzop needs to handle almost a dozen 
different LZO compression formats.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3019 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-03  6:29       ` Duncan
  2015-12-03 12:09         ` Imran Geriskovan
@ 2015-12-04 12:37         ` Austin S Hemmelgarn
  1 sibling, 0 replies; 20+ messages in thread
From: Austin S Hemmelgarn @ 2015-12-04 12:37 UTC (permalink / raw)
  To: Duncan, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3645 bytes --]

On 2015-12-03 01:29, Duncan wrote:
> Austin S Hemmelgarn posted on Wed, 02 Dec 2015 09:39:08 -0500 as
> excerpted:
>
>> On 2015-12-02 09:03, Imran Geriskovan wrote:
>>>>> What are your disk space savings when using btrfs with compression?
>>>
>>>> [Some] posters have reported that for mostly text, compress didn't
>>>> give them expected compression results and they needed to use
>>>> compress-force.
>>>
>>> "compress-force" option compresses regardless of the "compressibility"
>>> of the file.
>>>
>>> "compress" option makes some inference about the "compressibility" and
>>> decides to compress or not.
>>>
>>> I wonder how that inference is done?
>>> Can anyone provide some pseudo code for it?
>
>> I'm not certain how BTRFS does it, but my guess would be trying to
>> compress the block, then storing the uncompressed version if the
>> compressed one is bigger.
>
> No pseudocode as I'm not a dev and wouldn't want to give the wrong
> impression, but as I believe I replied recently in another thread, based
> on comments the devs have made...
>
> With compress, btrfs does a(n intended to be fast) trial compression of
> the first 128 KiB block or two and uses the result of that to decide
> whether to compress the entire file.
>
> Compress-force simply bypasses that first decision point, processing the
> file as if the test always succeeded and compression was chosen.
>
> If the decision to compress is made, the file is (evidently, again, not a
> dev, but filefrag results support) compressed a 128 KiB block at a time
> with the resulting size compared against the uncompressed version, with
> the smaller version stored.
>
> (Filefrag doesn't understand btrfs compression and reports individual
> extents for each 128 KiB compression block, if compressed.  However, for
> many files processed with compress-force, filefrag doesn't report the
> expected size/128-KiB extents, but rather something lower.  If
> filefrag -v is used, details of each "extent" are listed, and some show
> up as multiples of 128 KiB, indicating runs of uncompressable blocks that
> unlike actually compressed blocks, filefrag can and does report correctly
> as single extents.  The conclusion is thus as above, that btrfs is
> testing the compression result of each block, and not compressing if the
> "compression" ends up being negative, that is, if the "compressed" size
> is larger than the uncompressed size.)
>
>> On a side note, I really wish BTRFS would just add LZ4 support.  It's a
>> lot more deterministic WRT decompression time than LZO, gets a similar
>> compression ratio, and runs faster on most processors for both
>> compression and decompression.
>
> There were patches (at least RFC level, IIRC) floating around years ago
> to add lz4... I wonder what happened to them?  My impression was that a
> large deployment somewhere may actually be running them as well, making
> them well tested (and obviously well beyond preliminary RFC level) by
> now, altho that impression could well be wrong.
>
Hmm, I'll have to see if I can find those and rebase them.  IIRC, the 
argument against adding it was 'but we already have a fast compression 
algorithm!', which in turn says to me they didn't try to sell it on the 
most significant parts, namely that it's faster at decompression than 
LZO (even when you use the lz4hc variant, which takes longer to compress 
to give a (usually) better compression ratio, but decompresses just as 
fast as regular lz4), and the timings are a lot more deterministic 
(which is really important if you're doing real-time stuff).


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3019 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02  9:46 compression disk space saving - what are your results? Tomasz Chmielewski
  2015-12-02 10:36 ` Duncan
  2015-12-02 13:03 ` Austin S Hemmelgarn
@ 2015-12-05 13:37 ` Marc Joliet
  2015-12-05 14:11   ` Marc Joliet
  2015-12-05 19:38 ` guido_kuenne
  3 siblings, 1 reply; 20+ messages in thread
From: Marc Joliet @ 2015-12-05 13:37 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2212 bytes --]

On Wednesday 02 December 2015 18:46:30 Tomasz Chmielewski wrote:
>What are your disk space savings when using btrfs with compression?
>
>I have a 200 GB btrfs filesystem which uses compress=zlib, only stores
>text files (logs), mostly multi-gigabyte files.
>
>
>It's a "single" filesystem, so "df" output matches "btrfs fi df":
>
># df -h
>Filesystem      Size  Used Avail Use% Mounted on
>(...)
>/dev/xvdb       200G  124G   76G  62% /var/log/remote
>
>
># du -sh /var/log/remote/
>153G    /var/log/remote/
>
>
> From these numbers (124 GB used where data size is 153 GB), it appears
>that we save around 20% with zlib compression enabled.
>Is 20% reasonable saving for zlib? Typically text compresses much better
>with that algorithm, although I understand that we have several
>limitations when applying that on a filesystem level.
>
>
>Tomasz Chmielewski
>http://wpkg.org

I have a total of three file systems that use compression, on a desktop and a 
laptop.  / on both uses compress=lzo, and my backup drive uses compress=zlib 
(my RAID1 FS does not use compression).  My desktop looks like this:

% df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       108G     79G   26G   76% /
[...]

For / I get a total of about 8G or at least 9% space saving:

# du -hsc /mnt/rootfs/*
71G     /mnt/rootfs/home
14G     /mnt/rootfs/rootfs
2,3G    /mnt/rootfs/var
87G     total

I write "at least" because this does not include snapshots.  On my laptop the 
difference is merely 1 GB (83 vs. 84 GB), but it was using the autodefrag 
mount option until yesterday (when I migrated it to an SSD using dd), which 
probably accounts for a significant amount of wasted space.  I'll see how it 
develops over the next two weeks, but I expect the ratio to become similar to 
my desktop (probably less, since there is also a lot of music on there).

I would love to answer the question for my backup drive, but du took too long 
(> 1 h) so I stopped it :-( .  I might try it again later, but no promises!

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-05 13:37 ` Marc Joliet
@ 2015-12-05 14:11   ` Marc Joliet
  2015-12-06  4:21     ` Duncan
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Joliet @ 2015-12-05 14:11 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]

On Saturday 05 December 2015 14:37:05 Marc Joliet wrote:
>My desktop looks like this:
>
>% df -h
>Filesystem      Size  Used Avail Use% Mounted on
>/dev/sda1       108G     79G   26G   76% /
>[...]
>
>For / I get a total of about 8G or at least 9% space saving:
>
># du -hsc /mnt/rootfs/*
>71G     /mnt/rootfs/home
>14G     /mnt/rootfs/rootfs
>2,3G    /mnt/rootfs/var
>87G     total
>
>I write "at least" because this does not include snapshots.

Just to be explicit, in case it was not clear, but I of course meant that the 
*du output* does not account for extra space used by snapshots.

>On my laptop
>the  difference is merely 1 GB (83 vs. 84 GB),

And here I also want to clarify that the df output was 84 GB, and the du 
output was 83 GB.  Again, the du output does not account for snapshots, which 
go back farther on the laptop: 2 weeks of daily snapshots (with autodefrag!) 
instead of up to up to 2 days of bi-hourly snapshots.

I do think it's interesting that compression (even with LZO) seems to have 
offset the extra space wastage caused by autodefrag.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: compression disk space saving - what are your results?
  2015-12-02  9:46 compression disk space saving - what are your results? Tomasz Chmielewski
                   ` (2 preceding siblings ...)
  2015-12-05 13:37 ` Marc Joliet
@ 2015-12-05 19:38 ` guido_kuenne
  3 siblings, 0 replies; 20+ messages in thread
From: guido_kuenne @ 2015-12-05 19:38 UTC (permalink / raw)
  To: 'linux-btrfs'

> Subject: compression disk space saving - what are your results?
> 
> What are your disk space savings when using btrfs with compression?

I checked that for some folders when I moved from ext4 to btrfs. I compared
du with df** just to get some numbers. I use lzo since the btrfs wiki says it's
better for speed.


Percent_saving = (1 - df/du) * 100:
47% (mostly endless text files, source code etc.; total amount of data is
about 1 TB)
2%-10% (data mostly in the form of large binary files, several hundred MB up
to a few GB each; total amount is about 4 TB)
23% (something in between; total amount is 0.4 TB)

Results indicate pretty clearly that large binary files are barely compressed
at all - without understanding much of it, that's what I would intuitively
expect (AFAIK lzo is dictionary-based, and those binary files offer little
for it to work with).


** du -s on the folder I copied to the btrfs drive; df is the difference
between a df run before and after the copy. Based on casual checking, the
results were consistent with the space needed on the old ext4 drive.
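
(For reference, a few lines of Python applying that formula to the
rounded numbers posted earlier in the thread; the GB figures come
straight from the df/du output above, so the results are only
approximate:)

def percent_saving(df_used, du_size):
    # guido's formula from above: how much smaller the on-disk usage (df)
    # is than the apparent file size (du)
    return (1 - df_used / du_size) * 100

print(round(percent_saving(124, 153), 1))   # compress=zlib case from the first post  -> 19.0
print(round(percent_saving(24, 138), 1))    # after switching to compress-force=zlib  -> 82.6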



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-05 14:11   ` Marc Joliet
@ 2015-12-06  4:21     ` Duncan
  2015-12-06 11:26       ` Marc Joliet
  0 siblings, 1 reply; 20+ messages in thread
From: Duncan @ 2015-12-06  4:21 UTC (permalink / raw)
  To: linux-btrfs

Marc Joliet posted on Sat, 05 Dec 2015 15:11:51 +0100 as excerpted:

> I do think it's interesting that compression (even with LZO) seems to
> have offset the extra space wastage caused by autodefrag.

I've seen (I think) you mention that twice now.  Perhaps I'm missing 
something... How does autodefrag trigger space wastage?

What autodefrag does is watch for seriously fragmented files and queue 
them up for later defrag by a worker thread.  How would that waste space?

Unless of course you're talking about breaking reflinks to existing 
snapshots or other (possibly partial) copies of the file.  But I'd call 
that wasting space due to the snapshots storing old copies, not due to 
autodefrag keeping the current copy defragmented.  And reflinks are 
saving space by effectively storing parts of two files in the same 
extent, not autodefrag wasting it, as the default on a normal filesystem 
would be separate copies, so that's the zero-point base, and reflinks 
save from it, with autodefrag therefore not changing things from the zero-
point base.  No snapshots, no reflinks, autodefrag no longer "wastes" 
space, so it's not autodefrag's wastage in the first place, it's the 
other mechanisms' saving space.

From my viewpoint, anyway.  I'd not ordinarily quibble over it one way or 
the other if that's what you're referring to.  But just in case you had 
something else in mind that I'm not aware of, I'm posting the question.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-06  4:21     ` Duncan
@ 2015-12-06 11:26       ` Marc Joliet
  0 siblings, 0 replies; 20+ messages in thread
From: Marc Joliet @ 2015-12-06 11:26 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2317 bytes --]

On Sunday 06 December 2015 04:21:30 Duncan wrote:
>Marc Joliet posted on Sat, 05 Dec 2015 15:11:51 +0100 as excerpted:
>> I do think it's interesting that compression (even with LZO) seems to
>> have offset the extra space wastage caused by autodefrag.
>
>I've seen (I think) you mention that twice now.  Perhaps I'm missing
>something... How does autodefrag trigger space wastage?
>
>What autodefrag does is watch for seriously fragmented files and queue
>them up for later defrag by a worker thread.  How would that waste space?
>
>Unless of course you're talking about breaking reflinks to existing
>snapshots or other (possibly partial) copies of the file.

That is in fact what I was referring to.

>But I'd call
>that wasting space due to the snapshots storing old copies, not due to
>autodefrag keeping the current copy defragmented.  And reflinks are
>saving space by effectively storing parts of two files in the same
>extent, not autodefrag wasting it, as the default on a normal filesystem
>would be separate copies, so that's the zero-point base,

Of course, the default on a normal file system is to not have any snapshots 
between which to reflink ;-) .  Also, autodefrag is not a default mount 
option, so the default on BTRFS is to save space via reflinks, which is undone 
by defragmenting, hence why I see it as autodefrag triggering the waste of 
space.

>and reflinks
>save from it, with autodefrag therefore not changing things from the zero-
>point base.  No snapshots, no reflinks, autodefrag no longer "wastes"
>space, so it's not autodefrag's wastage in the first place, it's the
>other mechanisms' saving space.

To my mind it is the keeping of snapshots and the breaking of reflinks via 
autodefrag that together cause space wastage.  This is coming from the 
perspective that snapshots are *useful* and hence by themselves do not 
constitute wasted space.

>From my viewpoint, anyway.  I'd not ordinarily quibble over it one way or
>the other if that's what you're referring to.  But just in case you had
>something else in mind that I'm not aware of, I'm posting the question.

And the above is my viewpoint :-) .

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-02 14:49     ` Austin S Hemmelgarn
@ 2015-12-22  3:55       ` Kai Krakow
  2015-12-22 17:25         ` james northrup
  0 siblings, 1 reply; 20+ messages in thread
From: Kai Krakow @ 2015-12-22  3:55 UTC (permalink / raw)
  To: linux-btrfs

Am Wed, 2 Dec 2015 09:49:05 -0500
schrieb Austin S Hemmelgarn <ahferroin7@gmail.com>:

> > So, 138 GB files use just 24 GB on disk - nice!
> >
> > However, I would still expect that compress=zlib has almost the same
> > effect as compress-force=zlib, for 100% text files/logs.
> >  
> That's better than 80% space savings (it works out to about 83.6%),
> so I doubt that you'd manage to get anything better than that even
> with only plain text files.  It's interesting that there's such a big
> discrepancy though, that indicates that BTRFS really needs some work
> WRT deciding what to compress.

As far as I understand from reading here, btrfs fairly quickly opts out
of compressing further extents in a file once it stumbles across the
first block with a bad compression ratio.

So, what I do is compress-force=zlib for my backup drive, which holds
several months of snapshots; new backups go to a scratch area which is
snapshotted after rsync finishes (important: use --no-whole-file and
--inplace).

On my system drive I use compress=lzo and hope the heuristics work.
From time to time I use find and btrfs-defrag to selectively recompress
files (using mtime and name filters) and defrag directory nodes (which
according to docs should defrag metadata).
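
(Something along those lines might look like the short Python sketch
below.  It is a hypothetical helper, not anything from this thread: it
walks a tree, picks files by name pattern and modification time, and
hands each one to 'btrfs filesystem defrag -czlib', the same command
Austin mentioned earlier; the directory, pattern and age cutoff are
made-up examples:)

import fnmatch
import os
import subprocess
import time

ROOT = "/var/log/remote"    # hypothetical target directory
PATTERN = "*.log"           # name filter (assumption)
MAX_AGE_DAYS = 30           # only touch files modified in the last 30 days (assumption)

cutoff = time.time() - MAX_AGE_DAYS * 86400
for dirpath, _dirs, filenames in os.walk(ROOT):
    for name in fnmatch.filter(filenames, PATTERN):
        path = os.path.join(dirpath, name)
        if os.path.getmtime(path) >= cutoff:
            # recompress (and defragment) this one file with zlib
            subprocess.run(["btrfs", "filesystem", "defrag", "-czlib", path], check=True)

(The usual find one-liner does the same job, of course; this just spells
out the mtime/name filtering.)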

A 3x TB btrfs mraid1 draid0 (1.6 TB used) fits onto a 2 TB backup drive
with a backlog worth around 4 months of daily backups. It looks pretty
effective. Forcing zlib manages to compress file additions quite well,
although I haven't measured it lately. It was far from 80%, but it was not
far below 40-50%.

I wish one could use per-subvolume compression option already.

-- 
Regards,
Kai

Replies to list-only preferred.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compression disk space saving - what are your results?
  2015-12-22  3:55       ` Kai Krakow
@ 2015-12-22 17:25         ` james northrup
  0 siblings, 0 replies; 20+ messages in thread
From: james northrup @ 2015-12-22 17:25 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-btrfs

if you have been waiting for a particular compressor to reach linux,
chances are it already has.

if you are slacking with btrfs and assuming someone will port your
favorite compression profile to a btrfs mount option someday, someone
has thought of that too, and that's already happened as well.



Add support for LZ4-compressed kernel [LWN.net]
https://lwn.net/Articles/541425/

bzip2/lzma kernel compression [LWN.net] https://lwn.net/Articles/314295/

Btrfs Picks Up Snappy Compression Support - Phoronix
http://www.phoronix.com/scan.php?page=news_item&px=MTA0MjQ

fusecompress - Transparent compression FUSE filesystem (0.9.x tree) -
Google Project Hosting https://code.google.com/p/fusecompress/

On Mon, Dec 21, 2015 at 7:55 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> Am Wed, 2 Dec 2015 09:49:05 -0500
> schrieb Austin S Hemmelgarn <ahferroin7@gmail.com>:
>
>> > So, 138 GB files use just 24 GB on disk - nice!
>> >
>> > However, I would still expect that compress=zlib has almost the same
>> > effect as compress-force=zlib, for 100% text files/logs.
>> >
>> That's better than 80% space savings (it works out to about 83.6%),
>> so I doubt that you'd manage to get anything better than that even
>> with only plain text files.  It's interesting that there's such a big
>> discrepancy though, that indicates that BTRFS really needs some work
>> WRT deciding what to compress.
>
> As far as I understood from reading here, btrfs fairly quickly opts out
> of compressing further extents if it stumbles across the first block
> with a bad compression ratio for file.
>
> So, what I do is compress-force=zlib for my backup drive which holds
> several months of snapshots, new backups go to a scratch area which is
> snapshotted after rsync finishes (important: use --no-whole-file and
> --inplace).
>
> On my system drive I use compress=lzo and hope the heuristics work.
> From time to time I use find and btrfs-defrag to selectively recompress
> files (using mtime and name filters) and defrag directory nodes (which
> according to docs should defrag metadata).
>
> A 3x TB btrfs mraid1 draid0 (1.6TB used) fits onto a 2TB backup drive
> with backlog worth around 4 months (daily backups). It looks pretty
> effective. Forcing zlib manages to compress file additions quite well
> although I didn't measure it lately. It was far from 80% but it was not
> far below 40-50%.
>
> I wish one could use per-subvolume compression option already.
>
> --
> Regards,
> Kai
>
> Replies to list-only preferred.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-12-22 17:26 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-02  9:46 compression disk space saving - what are your results? Tomasz Chmielewski
2015-12-02 10:36 ` Duncan
2015-12-02 14:03   ` Imran Geriskovan
2015-12-02 14:39     ` Austin S Hemmelgarn
2015-12-03  6:29       ` Duncan
2015-12-03 12:09         ` Imran Geriskovan
2015-12-04 12:33           ` Austin S Hemmelgarn
2015-12-04 12:37         ` Austin S Hemmelgarn
2015-12-02 13:03 ` Austin S Hemmelgarn
2015-12-02 13:53   ` Tomasz Chmielewski
2015-12-02 14:03     ` Wang Shilong
2015-12-02 14:06       ` Tomasz Chmielewski
2015-12-02 14:49     ` Austin S Hemmelgarn
2015-12-22  3:55       ` Kai Krakow
2015-12-22 17:25         ` james northrup
2015-12-05 13:37 ` Marc Joliet
2015-12-05 14:11   ` Marc Joliet
2015-12-06  4:21     ` Duncan
2015-12-06 11:26       ` Marc Joliet
2015-12-05 19:38 ` guido_kuenne
