* compress-force not really forcing compression?
@ 2018-12-19 15:41 devzero
  2018-12-20  0:53 ` Anand Jain
  2018-12-20  0:57 ` Qu Wenruo
  0 siblings, 2 replies; 9+ messages in thread
From: devzero @ 2018-12-19 15:41 UTC (permalink / raw)
  To: linux-btrfs

does compress-force really force compression?

for me (found via compsize; see https://github.com/kilobyte/compsize/issues/24) it looks like it is probably forcing a compression check for every block of a file (while compress= makes btrfs skip the compression check after the first block), and if some block is incompressible, apparently it's being stored uncompressed.

so the documentation is probably giving misleading information and should be fixed!?

roland



https://btrfs.wiki.kernel.org/index.php/Compression#What_happens_to_incompressible_files.3F

What happens to incompressible files?

There is a simple decision logic: if the first portion of data being compressed is not smaller than the original, the compression of the file is disabled -- unless the filesystem is mounted with -o compress-force. In that case it'll be compressed always regardless of the compressibility of the file. This is not optimal and subject to optimizations and further development. 





* Re: compress-force not really forcing compression?
  2018-12-19 15:41 compress-force not really forcing compression? devzero
@ 2018-12-20  0:53 ` Anand Jain
  2018-12-20  0:57 ` Qu Wenruo
  1 sibling, 0 replies; 9+ messages in thread
From: Anand Jain @ 2018-12-20  0:53 UTC (permalink / raw)
  To: devzero; +Cc: linux-btrfs



On 12/19/2018 11:41 PM, devzero@web.de wrote:
> does compress-force really force compression?
> 
> for me (found via compsize; see https://github.com/kilobyte/compsize/issues/24) it looks like it is probably forcing a compression check for every block of a file (while compress= makes btrfs skip the compression check after the first block), and if some block is incompressible, apparently it's being stored uncompressed.
> 
> so the documentation is probably giving misleading information and should be fixed!?

A slight correction is needed to the documentation.
The difference between -o compress and -o compress-force is that
the latter tries to compress every extent of the file (that is, it never
gives up), whereas the former gives up (on an inode) at the first
non-compressible extent. In both cases, compressed extents are saved
only if there is a space benefit; otherwise the regular (uncompressed)
extent is written.
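A minimal Python sketch of the "space benefit" rule described above, with zlib standing in for the kernel's compressors. The function name `store_extent`, the 128 KiB `EXTENT_SIZE`, and the "strictly smaller" threshold are assumptions for illustration only; the exact threshold the kernel applies is not given in this thread.

```python
import os
import zlib

# Illustrative extent size (an assumption for this sketch).
EXTENT_SIZE = 128 * 1024


def store_extent(data):
    """Return ("compressed", size) if compression saves space, else ("raw", size).

    Assumed rule: keep the compressed form only if it is strictly smaller
    than the input; otherwise write the regular (uncompressed) extent.
    """
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return "compressed", len(compressed)
    return "raw", len(data)


if __name__ == "__main__":
    log_like = b"2018-12-20 00:53 log line\n" * 5000  # highly repetitive text
    random_like = os.urandom(EXTENT_SIZE)               # effectively incompressible
    print(store_extent(log_like))     # compressed form is kept
    print(store_extent(random_like))  # ('raw', 131072)
```

The random buffer comes out of zlib slightly larger than it went in, so the sketch falls back to storing it raw.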

HTH


* Re: compress-force not really forcing compression?
  2018-12-19 15:41 compress-force not really forcing compression? devzero
  2018-12-20  0:53 ` Anand Jain
@ 2018-12-20  0:57 ` Qu Wenruo
  2018-12-20 10:43   ` Nikolay Borisov
  2018-12-20 12:07   ` Austin S. Hemmelgarn
  1 sibling, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2018-12-20  0:57 UTC (permalink / raw)
  To: devzero, linux-btrfs





On 2018/12/19 11:41 PM, devzero@web.de wrote:
> does compress-force really force compression?

It should.

The only exception is block size.

If the file is smaller than the sector size (4K on x86_64), it is not
compressed, no matter what the mount options are.

> 
> for me (found via compsize; see https://github.com/kilobyte/compsize/issues/24) it looks like it is probably forcing a compression check for every block of a file (while compress= makes btrfs skip the compression check after the first block), and if some block is incompressible, apparently it's being stored uncompressed.

Any reproducer for this unexpected behavior?

Thanks,
Qu


* Re: compress-force not really forcing compression?
  2018-12-20  0:57 ` Qu Wenruo
@ 2018-12-20 10:43   ` Nikolay Borisov
  2018-12-23  0:24     ` Paul Jones
  2018-12-20 12:07   ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2018-12-20 10:43 UTC (permalink / raw)
  To: Qu Wenruo, devzero, linux-btrfs



On 20.12.18 at 2:57, Qu Wenruo wrote:
> 
> 
> On 2018/12/19 11:41 PM, devzero@web.de wrote:
>> does compress-force really force compression?
> 
> It should.
> 
> The only exception is block size.
> 
> If the file is smaller than the sector size (4K on x86_64), it is not
> compressed, no matter what the mount options are.

What FORCE_COMPRESS does is ensure that compression is always tried
for a file (check the code in compress_file_range, in the if (pages) branch).
If btrfs_compress_pages detects that compression makes no difference,
will_compress is not set and the 'if (pages)' branch is executed; the only
thing FORCE_COMPRESS changes there is that BTRFS_INODE_NOCOMPRESS is not set.

What this all means is that with FORCE_COMPRESS, future writes will still
be tried for compression. For example, if you do some non-compressible
writes to a file without FORCE_COMPRESS, BTRFS_INODE_NOCOMPRESS will
be set. This means that all future invocations of inode_need_compress for
this inode will return false. So if the I/O pattern later changes to one
that is compressible, the data won't be compressed.

OTOH, with compress-force you will still be compressing those portions of
the file which are compressible.
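The difference can be modelled as a small state machine. This Python sketch is a simplification under stated assumptions: `Inode.nocompress` stands in for BTRFS_INODE_NOCOMPRESS, `need_compress` is a hypothetical analogue of inode_need_compress, and the kernel's additional heuristics, per-inode properties and defrag paths are ignored.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    nocompress: bool = False  # stands in for BTRFS_INODE_NOCOMPRESS


def need_compress(inode, force):
    """Hypothetical, simplified analogue of inode_need_compress()."""
    if force:
        return True  # compress-force: always try
    return not inode.nocompress


def write_extents(shrinks_per_write, force):
    """Simulate successive writes; each bool says whether that write would shrink."""
    inode = Inode()
    outcome = []
    for shrinks in shrinks_per_write:
        if not need_compress(inode, force):
            outcome.append("raw (not tried)")
        elif shrinks:
            outcome.append("compressed")
        else:
            outcome.append("raw (tried)")
            if not force:
                inode.nocompress = True  # without force, give up on this inode
    return outcome


if __name__ == "__main__":
    pattern = [True, False, True, True]
    print(write_extents(pattern, force=False))
    # ['compressed', 'raw (tried)', 'raw (not tried)', 'raw (not tried)']
    print(write_extents(pattern, force=True))
    # ['compressed', 'raw (tried)', 'compressed', 'compressed']
```

In this model the two modes only diverge after the first write that fails to shrink: without force, later compressible writes are never tried.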

IMHO the more pertinent question is:

If a file has portions which are not easily compressible, does that imply
all future writes are also incompressible? IMO no, so I think it would
be prudent to remove FORCE_COMPRESS altogether and make the code act as
if it's always on.

Any opinions?


* Re: compress-force not really forcing compression?
  2018-12-20  0:57 ` Qu Wenruo
  2018-12-20 10:43   ` Nikolay Borisov
@ 2018-12-20 12:07   ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 9+ messages in thread
From: Austin S. Hemmelgarn @ 2018-12-20 12:07 UTC (permalink / raw)
  To: Qu Wenruo, devzero, linux-btrfs

On 12/19/2018 7:57 PM, Qu Wenruo wrote:
> 
> 
> On 2018/12/19 11:41 PM, devzero@web.de wrote:
>> does compress-force really force compression?
> 
> It should.
> 
> The only exception is block size.
> 
> If the file is smaller than the sector size (4K on x86_64), it is not
> compressed, no matter what the mount options are.
> 
>>
>> for me (found via compsize; see https://github.com/kilobyte/compsize/issues/24) it looks like it is probably forcing a compression check for every block of a file (while compress= makes btrfs skip the compression check after the first block), and if some block is incompressible, apparently it's being stored uncompressed.
> 
> Any reproducer for this unexpected behavior?

Not a reproducer, but this appears to be how it has behaved on my
systems since I started using BTRFS.

It's also arguably far more useful to behave this way than to just
unconditionally compress everything: it costs essentially no extra time
compared to that, and it gets you better space utilization (which matters,
since the size of the blocks we're compressing can mean that
incompressible data expands significantly when we compress it).



* RE: compress-force not really forcing compression?
  2018-12-20 10:43   ` Nikolay Borisov
@ 2018-12-23  0:24     ` Paul Jones
  2018-12-23  6:16       ` Adam Borowski
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Jones @ 2018-12-23  0:24 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, devzero, linux-btrfs

-----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-owner@vger.kernel.org> On Behalf Of Nikolay Borisov
> Sent: Thursday, 20 December 2018 9:44 PM
> To: Qu Wenruo <quwenruo.btrfs@gmx.com>; devzero@web.de; linux-btrfs@vger.kernel.org
> Subject: Re: compress-force not really forcing compression?
>
> IMHO the more pertinent question is:
> 
> If a file has portions which are not easily compressible, does that imply all
> future writes are also incompressible? IMO no, so I think it would be prudent
> to remove FORCE_COMPRESS altogether and make the code act as if it's
> always on.
> 
> Any opinions?


That is a good idea. If I turn on compression I would expect everything to be compressed, except in cases where there is no size benefit.


* Re: compress-force not really forcing compression?
  2018-12-23  0:24     ` Paul Jones
@ 2018-12-23  6:16       ` Adam Borowski
  2019-01-02 13:59         ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 9+ messages in thread
From: Adam Borowski @ 2018-12-23  6:16 UTC (permalink / raw)
  To: Paul Jones; +Cc: Nikolay Borisov, Qu Wenruo, devzero, linux-btrfs

On Sun, Dec 23, 2018 at 12:24:02AM +0000, Paul Jones wrote:
> > IMHO the more pertinent question is:
> > 
> > If a file has portions which are not easily compressible, does that imply all
> > future writes are also incompressible? IMO no, so I think it would be prudent
> > to remove FORCE_COMPRESS altogether and make the code act as if it's
> > always on.
> > 
> > Any opinions?
> 
> 
> That is a good idea.  If I turn on compression I would expect everything
> to be compressed, except in cases where there is no size benefit.

I expect that the vast majority of files consist of blocks of similar
compressibility.  Thus, finding a block that fails to compress strongly
suggests other blocks are either incompressible as well or compress only
minimally.  Refusing to waste time, electricity and fragmentation in such a
case is a good default, I think.

But, if you believe this should be changed, there's an easy experiment you
can try: for all files on your filesystem, chop every file into 128KB pieces
and compress each of them with your chosen algorithm.  Noting the compressed
size of every block in a file that had at least one block fail to compress
would give us some data.
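A rough Python sketch of that experiment, assuming zlib as the chosen algorithm; the helper names and the script name `piece_sizes.py` are hypothetical. It streams each regular file in 128KB pieces and prints per-piece compressed/original ratios for files with at least one piece that did not shrink.

```python
import os
import sys
import zlib

PIECE = 128 * 1024  # 128KB pieces, as suggested above


def read_pieces(path):
    """Yield successive 128KB pieces of a file without loading it all at once."""
    with open(path, "rb") as f:
        while chunk := f.read(PIECE):
            yield chunk


def chunk_sizes(chunks):
    """Return (original, compressed) sizes for each piece, using zlib."""
    return [(len(c), len(zlib.compress(c))) for c in chunks]


def has_incompressible_piece(sizes):
    """True if at least one piece did not get smaller when compressed."""
    return any(comp >= orig for orig, comp in sizes)


def mixed_files(root):
    """Yield (path, sizes) for regular files with at least one incompressible piece."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            try:
                sizes = chunk_sizes(read_pieces(path))
            except OSError:
                continue  # unreadable file: skip it
            if has_incompressible_piece(sizes):
                yield path, sizes


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python3 piece_sizes.py DIRECTORY")
    else:
        for path, sizes in mixed_files(sys.argv[1]):
            ratios = " ".join(f"{comp / orig:.2f}" for orig, comp in sizes)
            print(f"{path}: {ratios}")
```

Run as `python3 piece_sizes.py /path/to/dir`; ratios at or above 1.00 mark pieces that did not compress, and a file mixing those with low ratios is the kind of case the `compress` heuristic would give up on.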


Meow!

* Re: compress-force not really forcing compression?
  2018-12-23  6:16       ` Adam Borowski
@ 2019-01-02 13:59         ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 9+ messages in thread
From: Austin S. Hemmelgarn @ 2019-01-02 13:59 UTC (permalink / raw)
  To: Adam Borowski, Paul Jones
  Cc: Nikolay Borisov, Qu Wenruo, devzero, linux-btrfs

On 12/23/2018 1:16 AM, Adam Borowski wrote:
> On Sun, Dec 23, 2018 at 12:24:02AM +0000, Paul Jones wrote:
>>> IMHO the more pertinent question is:
>>>
>>> If a file has portions which are not easily compressible, does that imply all
>>> future writes are also incompressible? IMO no, so I think it would be prudent
>>> to remove FORCE_COMPRESS altogether and make the code act as if it's
>>> always on.
>>>
>>> Any opinions?
>>
>>
>> That is a good idea.  If I turn on compression I would expect everything
>> to be compressed, except in cases where there is no size benefit.
> 
> I expect that the vast majority of files consist of blocks of similar
> compressibility.  Thus, finding a block that fails to compress strongly
> suggests other blocks are either incompressible as well or compress only
> minimally.  Refusing to waste time, electricity and fragmentation in such a
> case is a good default, I think.
> 
> But, if you believe this should be changed, there's an easy experiment you
> can try: for all files on your filesystem, chop every file into 128KB pieces
> and compress each of them with your chosen algorithm.  Noting the compressed
> size of every block in a file that had at least one block fail to compress
> would give us some data.

I would suggest looking at Windows DLL files installed as part of a Wine
setup as a potential candidate for this.  They tend to have very long
runs of null bytes scattered seemingly randomly throughout the file
(for hot patching, even though you can't reliably hot-patch DLLs on
Windows) and use UTF-16 strings.  As a result, the actual machine code
generally doesn't compress well, but most of the rest of the file does.
Fixed-size preallocated VM disk images would be another good candidate, 
just wipe the free space with zeroes from the VM before testing them.

Realistically though, I see a few issues with the default behavior:

* There's no way for a regular user to figure out whether a file actually is
transparently compressed or not.
* Without editing the filesystem directly, there's no way to
preemptively set the bit in metadata that tells BTRFS not to try to
compress a file, and there's no way to reset it either.
* The default behavior happens to be what `chattr +c` honors, which
sometimes leads to unexpected behavior (I, and most people
I know, would expect `chattr +c` to behave like `compress-force`, not
`compress`).


* Re: compress-force not really forcing compression?
@ 2018-12-24  2:00 Tomasz Chmielewski
  0 siblings, 0 replies; 9+ messages in thread
From: Tomasz Chmielewski @ 2018-12-24  2:00 UTC (permalink / raw)
  To: linux-btrfs

>> > If a file has portions which are not easily compressible, does that imply all
>> > future writes are also incompressible? IMO no, so I think it would be prudent
>> > to remove FORCE_COMPRESS altogether and make the code act as if it's
>> > always on.
>> >
>> > Any opinions?
>> 
>> 
>> That is a good idea.  If I turn on compression I would expect everything
>> to be compressed, except in cases where there is no size benefit.
> 
> I expect that the vast majority of files consist of blocks of similar
> compressibility.  Thus, finding a block that fails to compress strongly
> suggests other blocks are either incompressible as well or compress only
> minimally.  Refusing to waste time, electricity and fragmentation in such a
> case is a good default, I think.

Please see this thread for an example where btrfs compression fails to 
detect that text files compress very well:

https://marc.info/?t=144905015000003&r=1&w=4


In short: with an rsync server receiving text logs from remote
servers, sometimes at slow speeds (due to how log files are appended):

- compress + zlib - some 20% compression ratio
- compress-force + zlib - some 80% compression ratio



Tomasz Chmielewski
https://lxadm.com

Thread overview: 9+ messages
2018-12-19 15:41 compress-force not really forcing compression? devzero
2018-12-20  0:53 ` Anand Jain
2018-12-20  0:57 ` Qu Wenruo
2018-12-20 10:43   ` Nikolay Borisov
2018-12-23  0:24     ` Paul Jones
2018-12-23  6:16       ` Adam Borowski
2019-01-02 13:59         ` Austin S. Hemmelgarn
2018-12-20 12:07   ` Austin S. Hemmelgarn
2018-12-24  2:00 Tomasz Chmielewski
