* Transparent compression in the FS
@ 2003-10-14 20:30 Josh Litherland
  2003-10-15 13:33 ` Erik Mouw
  2003-10-15 16:25 ` David Woodhouse
  0 siblings, 2 replies; 92+ messages in thread
From: Josh Litherland @ 2003-10-14 20:30 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 142 bytes --]


Are there any filesystems which implement the transparent compression
attribute ?  (chattr +c)

-- 
Josh Litherland (josh@temp123.org)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-14 20:30 Transparent compression in the FS Josh Litherland
@ 2003-10-15 13:33 ` Erik Mouw
  2003-10-15 13:45   ` Josh Litherland
                     ` (4 more replies)
  2003-10-15 16:25 ` David Woodhouse
  1 sibling, 5 replies; 92+ messages in thread
From: Erik Mouw @ 2003-10-15 13:33 UTC (permalink / raw)
  To: Josh Litherland; +Cc: linux-kernel

On Tue, Oct 14, 2003 at 04:30:50PM -0400, Josh Litherland wrote:
> Are there any filesystems which implement the transparent compression
> attribute ?  (chattr +c)

The NTFS driver supports compressed files. Because it doesn't have
proper write support, I don't think it will do anything useful with
chattr +c.

Nowadays disks are so incredibly cheap that transparent compression
support is not really worth it anymore (IMHO).


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
| Data lost? Stay calm and contact Harddisk-recovery.com

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 13:33 ` Erik Mouw
@ 2003-10-15 13:45   ` Josh Litherland
  2003-10-15 13:50   ` Nikita Danilov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 92+ messages in thread
From: Josh Litherland @ 2003-10-15 13:45 UTC (permalink / raw)
  To: Erik Mouw; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

On Wed, 2003-10-15 at 09:33, Erik Mouw wrote:

> Nowadays disks are so incredibly cheap, that transparent compression
> support is not realy worth it anymore (IMHO).

Well, just a for-instance... my application is for a data recovery
company.  We have 4 bulk servers with a total of around 2TB of storage,
so we're certainly availing ourselves of the low prices of IDE disks. 
Most of this stuff is just archival; we keep it around as long as
possible in case our customers discover a month later that they're
missing a file or two.  We could probably increase our capacity by a
factor of 5 by compressing these volumes, and the performance penalties
wouldn't hurt us.  Honestly, if this is not a supportable thing, we will
probably just elect to incur the overhead of tar -j'ing things after the
customer signs off on the job.
 
I'm leaning on The Management to find and pay someone to support this in
ext3, but I don't think that's in the near future, unfortunately.

-- 
Josh Litherland (josh@temp123.org)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 13:33 ` Erik Mouw
  2003-10-15 13:45   ` Josh Litherland
@ 2003-10-15 13:50   ` Nikita Danilov
  2003-10-15 14:27     ` Erik Mouw
  2003-10-15 15:13   ` Jeff Garzik
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 92+ messages in thread
From: Nikita Danilov @ 2003-10-15 13:50 UTC (permalink / raw)
  To: Erik Mouw; +Cc: Josh Litherland, linux-kernel

Erik Mouw writes:
 > On Tue, Oct 14, 2003 at 04:30:50PM -0400, Josh Litherland wrote:
 > > Are there any filesystems which implement the transparent compression
 > > attribute ?  (chattr +c)
 > 
 > The NTFS driver supports compressed files. Because it doesn't have
 > proper write support, I don't think it will do anything useful with
 > chattr +c.
 > 
 > Nowadays disks are so incredibly cheap, that transparent compression
 > support is not realy worth it anymore (IMHO).

But disk bandwidth is so incredibly expensive that compression is becoming
more and more useful: on a compressed file system, the bandwidth of user-data
transfers can be larger than the raw disk bandwidth. It is the same
situation as with allocation of disk space for files: disks are cheap,
but storing several files in the same block becomes more advantageous
over time.
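
As a back-of-the-envelope sketch (the numbers are invented purely for
illustration): the disk streams compressed bytes at its raw rate, so the
user-visible rate is the raw rate divided by the compression ratio, capped
by how fast the CPU can decompress.

/* back-of-envelope: user-visible read bandwidth from a compressed fs
 * (illustrative numbers only) */
#include <stdio.h>

int main(void)
{
    double raw_bw = 50.0;   /* MB/s of compressed bytes the disk can stream */
    double ratio  = 0.5;    /* compressed size / original size */
    double cpu_bw = 120.0;  /* MB/s of output the CPU can decompress */

    double from_disk = raw_bw / ratio;                 /* 100 MB/s of user data */
    double effective = from_disk < cpu_bw ? from_disk : cpu_bw;

    printf("effective read bandwidth: %.0f MB/s vs. %.0f MB/s raw\n",
           effective, raw_bw);
    return 0;
}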

 > 
 > 
 > Erik
 > 

Nikita.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 13:50   ` Nikita Danilov
@ 2003-10-15 14:27     ` Erik Mouw
  2003-10-15 14:33       ` Nikita Danilov
                         ` (3 more replies)
  0 siblings, 4 replies; 92+ messages in thread
From: Erik Mouw @ 2003-10-15 14:27 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Josh Litherland, linux-kernel

On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
> Erik Mouw writes:
>  > Nowadays disks are so incredibly cheap, that transparent compression
>  > support is not realy worth it anymore (IMHO).
> 
> But disk bandwidth is so incredibly expensive that compression becoming
> more and more useful: on compressed file system bandwidth of user-data
> transfers can be larger than raw disk bandwidth. It is the same
> situation as with allocation of disk space for files: disks are cheap,
> but storing several files in the same block becomes more advantageous
> over time.

You have a point, but remember that modern IDE drives can do about
50MB/s from the medium. I don't think you'll find a CPU that is able to
handle transparent decompression on the fly at 50MB/s, not even with a
simple compression scheme like the one used in NTFS (see the NTFS docs on
SourceForge for details).


Erik

PS: let me guess: among other things, reiser4 comes with transparent
    compression? ;-)

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
| Data lost? Stay calm and contact Harddisk-recovery.com

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:27     ` Erik Mouw
@ 2003-10-15 14:33       ` Nikita Danilov
  2003-10-15 15:54         ` Richard B. Johnson
                           ` (2 more replies)
  2003-10-15 14:47       ` Erik Bourget
                         ` (2 subsequent siblings)
  3 siblings, 3 replies; 92+ messages in thread
From: Nikita Danilov @ 2003-10-15 14:33 UTC (permalink / raw)
  To: Erik Mouw; +Cc: Josh Litherland, linux-kernel

Erik Mouw writes:
 > On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
 > > Erik Mouw writes:
 > >  > Nowadays disks are so incredibly cheap, that transparent compression
 > >  > support is not realy worth it anymore (IMHO).
 > > 
 > > But disk bandwidth is so incredibly expensive that compression becoming
 > > more and more useful: on compressed file system bandwidth of user-data
 > > transfers can be larger than raw disk bandwidth. It is the same
 > > situation as with allocation of disk space for files: disks are cheap,
 > > but storing several files in the same block becomes more advantageous
 > > over time.
 > 
 > You have a point, but remember that modern IDE drives can do about
 > 50MB/s from medium. I don't think you'll find a CPU that is able to
 > handle transparent decompression on the fly at 50MB/s, even not with a
 > simple compression scheme as used in NTFS (see the NTFS docs on
 > SourceForge for details).

The trend is that CPUs are getting faster and faster relative to
disks. So even if such a CPU were hard to find today, it will be
commonplace tomorrow.

 > 
 > 
 > Erik
 > 
 > PS: let me guess: among other things, reiser4 comes with transparent
 >     compression? ;-)

Yes, it will.

 > 

Nikita.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:27     ` Erik Mouw
  2003-10-15 14:33       ` Nikita Danilov
@ 2003-10-15 14:47       ` Erik Bourget
  2003-10-15 15:05         ` Nikita Danilov
  2003-10-15 21:36       ` Tomas Szepe
  2003-10-17  1:32       ` Eric W. Biederman
  3 siblings, 1 reply; 92+ messages in thread
From: Erik Bourget @ 2003-10-15 14:47 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Josh Litherland, linux-kernel

Erik Mouw <erik@harddisk-recovery.com> writes:

> On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
>> Erik Mouw writes:
>>  > Nowadays disks are so incredibly cheap, that transparent compression
>>  > support is not realy worth it anymore (IMHO).
>> 
>> But disk bandwidth is so incredibly expensive that compression becoming
>> more and more useful: on compressed file system bandwidth of user-data
>> transfers can be larger than raw disk bandwidth. It is the same
>> situation as with allocation of disk space for files: disks are cheap,
>> but storing several files in the same block becomes more advantageous
>> over time.
>
> You have a point, but remember that modern IDE drives can do about
> 50MB/s from medium. I don't think you'll find a CPU that is able to
> handle transparent decompression on the fly at 50MB/s, even not with a
> simple compression scheme as used in NTFS (see the NTFS docs on
> SourceForge for details).
>
> Erik
>
> PS: let me guess: among other things, reiser4 comes with transparent
>     compression? ;-)

Reiser4 made my coffee this morning, it's wonderful :)

Seriously, though (and getting off topic), has anyone started to use
reiser4 in a high-load environment?  I've got a mail system that shoots a few
million messages through it every day, and a filesystem that's faster at
creating and deleting tons of ~4kb qmail queue files (with data journaling!)
would be verrry innnteresting.

- Erik Bourget


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:47       ` Erik Bourget
@ 2003-10-15 15:05         ` Nikita Danilov
  2003-10-15 15:06           ` Erik Bourget
  0 siblings, 1 reply; 92+ messages in thread
From: Nikita Danilov @ 2003-10-15 15:05 UTC (permalink / raw)
  To: Erik Bourget; +Cc: Josh Litherland, linux-kernel

Erik Bourget writes:
 > Erik Mouw <erik@harddisk-recovery.com> writes:
 > 
 > > On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
 > >> Erik Mouw writes:
 > >>  > Nowadays disks are so incredibly cheap, that transparent compression
 > >>  > support is not realy worth it anymore (IMHO).
 > >> 
 > >> But disk bandwidth is so incredibly expensive that compression becoming
 > >> more and more useful: on compressed file system bandwidth of user-data
 > >> transfers can be larger than raw disk bandwidth. It is the same
 > >> situation as with allocation of disk space for files: disks are cheap,
 > >> but storing several files in the same block becomes more advantageous
 > >> over time.
 > >
 > > You have a point, but remember that modern IDE drives can do about
 > > 50MB/s from medium. I don't think you'll find a CPU that is able to
 > > handle transparent decompression on the fly at 50MB/s, even not with a
 > > simple compression scheme as used in NTFS (see the NTFS docs on
 > > SourceForge for details).
 > >
 > > Erik
 > >
 > > PS: let me guess: among other things, reiser4 comes with transparent
 > >     compression? ;-)
 > 
 > Reiser4 made my coffee this morning, it's wonderful :)
 > 
 > Seriously, though, (and getting off the topic), has anyone started to use
 > reiser4 in a high-load environment?  I've got a mail system that shoots a few
 > million messages through it every day and a filesystem that's faster with
 > creating and deleting tons of ~4kb qmail queue files (with data journaling!)
 > would be verrry innnteresting.

Please, don't use reiser4 in production environments. It is still in the
late debugging stage. Stay tuned, as they say. :)

 > 
 > - Erik Bourget
 > 

Nikita.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 15:05         ` Nikita Danilov
@ 2003-10-15 15:06           ` Erik Bourget
  0 siblings, 0 replies; 92+ messages in thread
From: Erik Bourget @ 2003-10-15 15:06 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: linux-kernel

Nikita Danilov <Nikita@Namesys.COM> writes:

> Erik Bourget writes:
>  > Seriously, though, (and getting off the topic), has anyone started to use
>  > reiser4 in a high-load environment?  I've got a mail system that shoots a
>  > few million messages through it every day and a filesystem that's faster
>  > with creating and deleting tons of ~4kb qmail queue files (with data
>  > journaling!)  would be verrry innnteresting.
>
> Please, don't use reiser4 in production environments. It is still in the
> late debugging stage. Stay tuned, as they say. :)

What?  It's too late, I already switched the storage to the kernel 2.4 NTFS
write driver, LVM'd with reiser4!  arrrrrg ;D

- Erik


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 13:33 ` Erik Mouw
  2003-10-15 13:45   ` Josh Litherland
  2003-10-15 13:50   ` Nikita Danilov
@ 2003-10-15 15:13   ` Jeff Garzik
  2003-10-15 21:00     ` Christopher Li
  2003-10-16 16:29     ` Andrea Arcangeli
  2003-10-16  8:27   ` tconnors+linuxkernel1066292516
  2003-10-17 10:55   ` Ingo Oeser
  4 siblings, 2 replies; 92+ messages in thread
From: Jeff Garzik @ 2003-10-15 15:13 UTC (permalink / raw)
  To: Erik Mouw; +Cc: Josh Litherland, linux-kernel

Erik Mouw wrote:
> On Tue, Oct 14, 2003 at 04:30:50PM -0400, Josh Litherland wrote:
> 
>>Are there any filesystems which implement the transparent compression
>>attribute ?  (chattr +c)
> 
> 
> The NTFS driver supports compressed files. Because it doesn't have
> proper write support, I don't think it will do anything useful with
> chattr +c.
> 
> Nowadays disks are so incredibly cheap, that transparent compression
> support is not realy worth it anymore (IMHO).


Josh and others should take a look at Plan9's venti file storage method 
-- archival storage is a series of unordered blocks, all of which are 
indexed by the sha1 hash of their contents.  This magically coalesces 
all duplicate blocks by its very nature, including the loooooong runs of 
zeroes that you'll find in many filesystems.  I bet savings on "all 
bytes in this block are zero" are worth a bunch right there.

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:33       ` Nikita Danilov
@ 2003-10-15 15:54         ` Richard B. Johnson
  2003-10-15 16:21           ` Nikita Danilov
  2003-10-15 16:04         ` Erik Mouw
  2003-10-15 18:54         ` root
  2 siblings, 1 reply; 92+ messages in thread
From: Richard B. Johnson @ 2003-10-15 15:54 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Erik Mouw, Josh Litherland, Linux kernel

On Wed, 15 Oct 2003, Nikita Danilov wrote:

> Erik Mouw writes:
>  > On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
>  > > Erik Mouw writes:
>  > >  > Nowadays disks are so incredibly cheap, that transparent compression
>  > >  > support is not realy worth it anymore (IMHO).
>  > >
[SNIPPED...]

>  >
>  > PS: let me guess: among other things, reiser4 comes with transparent
>  >     compression? ;-)
>
> Yes, it will.
>

EeeeeeK!  A single bad sector could prevent an entire database from
being uncompressed! You may want to re-think that idea before you
commit a lot of time to it.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:33       ` Nikita Danilov
  2003-10-15 15:54         ` Richard B. Johnson
@ 2003-10-15 16:04         ` Erik Mouw
  2003-10-15 17:24           ` Josh Litherland
  2003-10-15 19:03           ` Geert Uytterhoeven
  2003-10-15 18:54         ` root
  2 siblings, 2 replies; 92+ messages in thread
From: Erik Mouw @ 2003-10-15 16:04 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Josh Litherland, linux-kernel

On Wed, Oct 15, 2003 at 06:33:03PM +0400, Nikita Danilov wrote:
> Erik Mouw writes:
>  > You have a point, but remember that modern IDE drives can do about
>  > 50MB/s from medium. I don't think you'll find a CPU that is able to
>  > handle transparent decompression on the fly at 50MB/s, even not with a
>  > simple compression scheme as used in NTFS (see the NTFS docs on
>  > SourceForge for details).
> 
> Trend is that CPU is getting faster and faster with respect to the
> disk. So, even if it were hard to find such a CPU to-day, it will be
> common place to-morrow.

I'm not too sure about this. It's my feeling that CPU speed and disk
throughput grow about as fast. I don't have hard figures, so I can be
proven wrong on this.

FYI: you hardly see compressed files on NTFS. If you do, it's either
because the user thought it was a fun feature (in which case only a
small number of files are compressed), or because it's an old (small)
disk filled to the brim with data (in which case most of the volume is
compressed). In the latter case compression indeed makes things faster,
but only when using the disk with a modern CPU, not when using it in
the original machine.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
| Data lost? Stay calm and contact Harddisk-recovery.com

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 15:54         ` Richard B. Johnson
@ 2003-10-15 16:21           ` Nikita Danilov
  2003-10-15 17:19             ` Richard B. Johnson
  0 siblings, 1 reply; 92+ messages in thread
From: Nikita Danilov @ 2003-10-15 16:21 UTC (permalink / raw)
  To: root; +Cc: Erik Mouw, Josh Litherland, Linux kernel

Richard B. Johnson writes:
 > On Wed, 15 Oct 2003, Nikita Danilov wrote:
 > 
 > > Erik Mouw writes:
 > >  > On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
 > >  > > Erik Mouw writes:
 > >  > >  > Nowadays disks are so incredibly cheap, that transparent compression
 > >  > >  > support is not realy worth it anymore (IMHO).
 > >  > >
 > [SNIPPED...]
 > 
 > >  >
 > >  > PS: let me guess: among other things, reiser4 comes with transparent
 > >  >     compression? ;-)
 > >
 > > Yes, it will.
 > >
 > 
 > EeeeeeK!  A single bad sector could prevent an entire database from
 > being uncompressed! You may want to re-think that idea before you
 > commit a lot of time to it.

It could not if block-level compression is used, which is the only
solution given random access to file bodies.

 > 
 > Cheers,
 > Dick Johnson
 > Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
 >             Note 96.31% of all statistics are fiction.
 > 
 > 

Nikita.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-14 20:30 Transparent compression in the FS Josh Litherland
  2003-10-15 13:33 ` Erik Mouw
@ 2003-10-15 16:25 ` David Woodhouse
  2003-10-15 16:56   ` Andreas Dilger
  1 sibling, 1 reply; 92+ messages in thread
From: David Woodhouse @ 2003-10-15 16:25 UTC (permalink / raw)
  To: josh; +Cc: linux-kernel

On Tue, 2003-10-14 at 16:30 -0400, Josh Litherland wrote:
> Are there any filesystems which implement the transparent compression
> attribute ?  (chattr +c)

JFFS2 doesn't implement 'chattr +c', which is in fact an EXT2-private
ioctl. But it does do transparent compression.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 16:25 ` David Woodhouse
@ 2003-10-15 16:56   ` Andreas Dilger
  2003-10-15 17:44     ` David Woodhouse
  0 siblings, 1 reply; 92+ messages in thread
From: Andreas Dilger @ 2003-10-15 16:56 UTC (permalink / raw)
  To: David Woodhouse; +Cc: josh, linux-kernel

On Oct 15, 2003  17:25 +0100, David Woodhouse wrote:
> On Tue, 2003-10-14 at 16:30 -0400, Josh Litherland wrote:
> > Are there any filesystems which implement the transparent compression
> > attribute ?  (chattr +c)
> 
> JFFS2 doesn't implement 'chattr +c', which is in fact an EXT2-private
> ioctl. But it does do transparent compression.

Actually, reiserfs also supports ext2-compatible SETFLAGS/GETFLAGS ioctls.
This allows us to use the same tools for lsattr and chattr instead of using
lsattr.reiserfs, etc.  Our Lustre distributed filesystem does the same.
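
Roughly what chattr +c boils down to with those ioctls, as a sketch: read the
per-inode flags, set the compression bit, write them back. The constants below
(EXT2_IOC_GETFLAGS, EXT2_IOC_SETFLAGS, EXT2_COMPR_FL in <linux/ext2_fs.h>) are
the ext2-era names; later kernels expose the same interface as
FS_IOC_GETFLAGS/FS_IOC_SETFLAGS/FS_COMPR_FL in <linux/fs.h>.

/* sketch of "chattr +c FILE": set the per-inode compress flag */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ext2_fs.h>   /* EXT2_IOC_GETFLAGS, EXT2_IOC_SETFLAGS, EXT2_COMPR_FL */

int main(int argc, char **argv)
{
    int fd, flags;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY);
    if (fd < 0 || ioctl(fd, EXT2_IOC_GETFLAGS, &flags) < 0) {
        perror(argv[1]);     /* fails with ENOTTY on a fs without these ioctls */
        return 1;
    }
    flags |= EXT2_COMPR_FL;  /* ask the fs to compress this file, if it can */
    if (ioctl(fd, EXT2_IOC_SETFLAGS, &flags) < 0)
        perror("EXT2_IOC_SETFLAGS");
    close(fd);
    return 0;
}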

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 16:21           ` Nikita Danilov
@ 2003-10-15 17:19             ` Richard B. Johnson
  2003-10-15 17:37               ` Andreas Dilger
                                 ` (2 more replies)
  0 siblings, 3 replies; 92+ messages in thread
From: Richard B. Johnson @ 2003-10-15 17:19 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Erik Mouw, Josh Litherland, Linux kernel

On Wed, 15 Oct 2003, Nikita Danilov wrote:

> Richard B. Johnson writes:
>  > On Wed, 15 Oct 2003, Nikita Danilov wrote:
>  >
>  > > Erik Mouw writes:
>  > >  > On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
>  > >  > > Erik Mouw writes:
>  > >  > >  > Nowadays disks are so incredibly cheap, that transparent compression
>  > >  > >  > support is not realy worth it anymore (IMHO).
>  > >  > >
>  > [SNIPPED...]
>  >
>  > >  >
>  > >  > PS: let me guess: among other things, reiser4 comes with transparent
>  > >  >     compression? ;-)
>  > >
>  > > Yes, it will.
>  > >
>  >
>  > EeeeeeK!  A single bad sector could prevent an entire database from
>  > being uncompressed! You may want to re-think that idea before you
>  > commit a lot of time to it.
>
> It could not if block-level compression is used. Which is the only
> solution, given random-access to file bodies.
>

Then the degenerate case is no compression at all. There is no
advantage to writing a block that is 1/N full of compressed data.
You end up having to write the whole block anyway.

This problem was well understood in the days when RLE (at the hardware
level) was used to extend the size of hard disks from their physical
size of about 38 megabytes to about 70 megabytes. The minimum size
of a read or write is a sector.

So, let's use the minimum compression algorithm, with no sliding
dictionaries complicating things, just RLE, and see.

The algorithm is a sentinel byte, a byte representing the number
of bytes to expand minus 1, then the byte to expand. The sentinel byte
in RLE was 0xbb. If you needed to read/write a 0xbb, you had
to expand that to three bytes, 0xbb, 0x00, 0xbb.
                                  |     |     |___ byte to expand
                                  |     |________ nr bytes (0 + 1)
                                  |______________ sentinel byte

All other sequences will reduce the size. So, if we have a 512-
byte sector full of nulls, what gets written is:
        0xbb, 0xff, 0x00, 0xbb, 0xff, 0x00
           |     |     |     |     |     |___ byte value
           |     |     |     |     |_________ 256 bytes
           |     |     |     |_______________ sentinel
           |     |     |_____________________ byte value
           |     |___________________________ 256 bytes
           |_________________________________ sentinel.

In this example, we have compressed a 512-byte sector to
only 6 bytes. Wonderful! Now we still have to write a 512-byte
sector, so that effort was wasted. FYI, I invented RLE and I also
put it into JMODEM, the "last" file-transfer protocol that
I created in 1989.  http://www.hal9k.com/cug/v300e.htm

Any time you are bound to a minimum block size to transfer,
you will have this problem.
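
One way to code exactly that scheme (sentinel 0xbb, count-1, byte) as a
sketch; it reproduces the 512-zero-bytes-to-6-bytes case, and the point
stands that those 6 bytes still cost a whole sector to store.

/* RLE with a 0xBB sentinel, as described above: sentinel, count-1, byte */
#include <stdio.h>
#include <stddef.h>

#define SENTINEL 0xbb

/* Encode src[0..n) into dst; dst must hold up to 3*n bytes (worst case).
 * Returns the encoded length. */
static size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, out = 0;

    while (i < n) {
        size_t run = 1;
        while (i + run < n && src[i + run] == src[i] && run < 256)
            run++;
        if (run > 3 || src[i] == SENTINEL) {    /* worth (or necessary) to escape */
            dst[out++] = SENTINEL;
            dst[out++] = (unsigned char)(run - 1);
            dst[out++] = src[i];
        } else {
            for (size_t k = 0; k < run; k++)    /* short literal run: copy as-is */
                dst[out++] = src[i];
        }
        i += run;
    }
    return out;
}

int main(void)
{
    unsigned char sector[512] = { 0 };          /* a sector full of nulls */
    unsigned char enc[3 * sizeof(sector)];

    printf("512 zero bytes -> %zu encoded bytes\n",
           rle_encode(sector, sizeof(sector), enc));
    return 0;
}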

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 16:04         ` Erik Mouw
@ 2003-10-15 17:24           ` Josh Litherland
  2003-10-15 18:53             ` Erik Bourget
  2003-10-15 19:03           ` Geert Uytterhoeven
  1 sibling, 1 reply; 92+ messages in thread
From: Josh Litherland @ 2003-10-15 17:24 UTC (permalink / raw)
  To: Erik Mouw; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 407 bytes --]

On Wed, 2003-10-15 at 12:04, Erik Mouw wrote:

> FYI: you hardly see compressed files on NTFS. If you do, it's either
> because the user thought it was a fun feature

-shrug-  The windows disk cleanup tool does this by default now if you
let it.  It compresses files that have an old access time.  Not a bad
idea imo, if perhaps one of limited usefulness.

-- 
Josh Litherland (josh@temp123.org)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 17:19             ` Richard B. Johnson
@ 2003-10-15 17:37               ` Andreas Dilger
  2003-10-15 17:48               ` Dave Jones
  2003-10-15 18:06               ` Hans Reiser
  2 siblings, 0 replies; 92+ messages in thread
From: Andreas Dilger @ 2003-10-15 17:37 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Nikita Danilov, Erik Mouw, Josh Litherland, Linux kernel

On Oct 15, 2003  13:19 -0400, Richard B. Johnson wrote:
> On Wed, 15 Oct 2003, Nikita Danilov wrote:
> > It could not if block-level compression is used. Which is the only
> > solution, given random-access to file bodies.
> 
> Then the degenerative case is no compression at all. There is no
> advantage to writing a block that is 1/N full of compressed data.
> You end up having to write the whole block anyway.

In the ext2 compression code, they compress maybe 8 source blocks into
(hopefully) some smaller number of compressed blocks.  Yes, there is still
a minimum block size, but you can save some reasonable fraction of the
total space (e.g. 8 blocks compressed to 4.5 blocks' worth of data still
occupies 5 whole blocks, i.e. 5/8 of the space, a 37.5% saving rather than
50%).  You get more efficient compression the more source blocks you use,
although your "damage area" grows in case of error.
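
The same arithmetic as a toy C model (not the actual ext2 compression code):
compressed data still occupies whole blocks, so what matters is the
rounded-up block count.

/* toy model: compressed clusters still occupy whole filesystem blocks */
#include <stdio.h>

int main(void)
{
    const unsigned blksz = 4096, cluster_blocks = 8;

    /* say the 8-block cluster compresses to 4.5 blocks' worth of bytes */
    unsigned compressed_bytes = (unsigned)(4.5 * blksz);
    unsigned stored_blocks = (compressed_bytes + blksz - 1) / blksz;   /* = 5 */

    printf("store %u/%u blocks -> %.1f%% saved\n",
           stored_blocks, cluster_blocks,
           100.0 * (cluster_blocks - stored_blocks) / cluster_blocks);
    return 0;
}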

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 16:56   ` Andreas Dilger
@ 2003-10-15 17:44     ` David Woodhouse
  0 siblings, 0 replies; 92+ messages in thread
From: David Woodhouse @ 2003-10-15 17:44 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: josh, linux-kernel

On Wed, 2003-10-15 at 10:56 -0600, Andreas Dilger wrote:
> Actually, reiserfs also supports ext2-compatible SETFLAGS/GETFLAGS ioctls.
> This allows us to use the same tools for lsattr and chattr instead of using
> lsattr.reiserfs, etc.  Our Lustre distributed filesystem does the same.

Yeah - I probably will too if I ever get round to letting people
actually disable the compression :)

-- 
dwmw2



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 17:19             ` Richard B. Johnson
  2003-10-15 17:37               ` Andreas Dilger
@ 2003-10-15 17:48               ` Dave Jones
  2003-10-15 18:19                 ` Richard B. Johnson
  2003-10-15 18:06               ` Hans Reiser
  2 siblings, 1 reply; 92+ messages in thread
From: Dave Jones @ 2003-10-15 17:48 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Nikita Danilov, Erik Mouw, Josh Litherland, Linux kernel

On Wed, Oct 15, 2003 at 01:19:09PM -0400, Richard B. Johnson wrote:
 > FYI, I invented RLE and I also
 > put it into JMODEM the "last" file-transfer protocol that
 > I created in 1989.  http://www.hal9k.com/cug/v300e.htm

RLE has been around a *lot* longer than that.
There are even several patents long before you 'invented' it.

1975:
 http://patft.uspto.gov/netacgi/nph-Parser?Sect2=PTO1&Sect2=HITOFF&p=1&u=%2Fnetahtml%2Fsearch-bool.html&r=1&f=G&l=50&d=PALL&RefSrch=yes&Query=PN%2F4031515

1983:
 http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=4,586,027.WKU.&OS=PN/4,586,027&RS=PN/4,586,027



		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 17:19             ` Richard B. Johnson
  2003-10-15 17:37               ` Andreas Dilger
  2003-10-15 17:48               ` Dave Jones
@ 2003-10-15 18:06               ` Hans Reiser
  2003-10-17 12:51                 ` Edward Shushkin
  2 siblings, 1 reply; 92+ messages in thread
From: Hans Reiser @ 2003-10-15 18:06 UTC (permalink / raw)
  To: root
  Cc: Nikita Danilov, Erik Mouw, Josh Litherland, Linux kernel,
	Edward Shishkin

If you use one of those silly block-aligned filesystems, you do indeed
have the problem described.  Reiser4 only block-aligns files larger than
16k that don't use the compression plugin.  Why do we block-align even
those?  mmap performance.  mmap is moderately worth optimizing for,
unless you want to do something more important, like compress the file.

Edward can explain this more precisely....

Hans

Richard B. Johnson wrote:

>On Wed, 15 Oct 2003, Nikita Danilov wrote:
>
>  
>
>>Richard B. Johnson writes:
>> > On Wed, 15 Oct 2003, Nikita Danilov wrote:
>> >
>> > > Erik Mouw writes:
>> > >  > On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
>> > >  > > Erik Mouw writes:
>> > >  > >  > Nowadays disks are so incredibly cheap, that transparent compression
>> > >  > >  > support is not realy worth it anymore (IMHO).
>> > >  > >
>> > [SNIPPED...]
>> >
>> > >  >
>> > >  > PS: let me guess: among other things, reiser4 comes with transparent
>> > >  >     compression? ;-)
>> > >
>> > > Yes, it will.
>> > >
>> >
>> > EeeeeeK!  A single bad sector could prevent an entire database from
>> > being uncompressed! You may want to re-think that idea before you
>> > commit a lot of time to it.
>>
>>It could not if block-level compression is used. Which is the only
>>solution, given random-access to file bodies.
>>
>>    
>>
>
>Then the degenerative case is no compression at all. There is no
>advantage to writing a block that is 1/N full of compressed data.
>You end up having to write the whole block anyway.
>
>This problem was well developed in the days where RLE (at the hardware
>level) was used to extend the size of hard disks from their physical
>size of about 38 megabytes to about 70 megabytes. The minimim size
>of a read or write is a sector.
>
>So, lets's use the minimum compression alogithm, no sliding
>dictionaries complicating things, just RLE and see.
>
>The alogithm is a sentinal byte, a byte representing the number
>of bytes to expand -1, then the byte to expand. The sentinal byte
>in RLE was 0xbb. If you needed to read/write a 0xbb, you need
>to expand that to three bytes, 0xbb, 0x00, 0xbb.
>                                  |     |     |___ byte to expand
>                                  |     |________ nr bytes (0 + 1)
>                                  |______________ sentinal byte
>
>All other sequences will reduce the size. So, we have a 512-
>byte sector full of nulls, what gets written is:
>        0xbb, 0xff, 0x00, 0xbb, 0xff, 0x00
>           |     |     |     |     |     |___ byte value
>           |     |     |     |     |_________ 256 bytes
>           |     |     |     |_______________ sentinal
>           |     |     |_____________________ byte value
>           |     |___________________________ 256 bytes
>           |_________________________________ sentinal.
>
>In this example, we have compressed a 512 byte sector to
>only 6 bytes. Wonderful! Now we have to write 512 bytes
>so that effort was wasted. FYI, I invented RLE and I also
>put it into JMODEM the "last" file-transfer protocol that
>I created in 1989.  http://www.hal9k.com/cug/v300e.htm
>
>Any time you are bound to a minimum block size to transfer,
>you will have this problem.
>
>Cheers,
>Dick Johnson
>Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
>            Note 96.31% of all statistics are fiction.
>
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/
>
>
>  
>


-- 
Hans



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 17:48               ` Dave Jones
@ 2003-10-15 18:19                 ` Richard B. Johnson
  0 siblings, 0 replies; 92+ messages in thread
From: Richard B. Johnson @ 2003-10-15 18:19 UTC (permalink / raw)
  To: Dave Jones; +Cc: Nikita Danilov, Erik Mouw, Josh Litherland, Linux kernel

On Wed, 15 Oct 2003, Dave Jones wrote:

> On Wed, Oct 15, 2003 at 01:19:09PM -0400, Richard B. Johnson wrote:
>  > FYI, I invented RLE and I also
>  > put it into JMODEM the "last" file-transfer protocol that
>  > I created in 1989.  http://www.hal9k.com/cug/v300e.htm
>
> RLE has been around a *lot* longer than that.
> There are even several patents long before you 'invented' it.
>
> 1975:
>  http://patft.uspto.gov/netacgi/nph-Parser?Sect2=PTO1&Sect2=HITOFF&p=1&u=%2Fnetahtml%2Fsearch-bool.html&r=1&f=G&l=50&d=PALL&RefSrch=yes&Query=PN%2F4031515
>
> 1983:
>  http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=4,586,027.WKU.&OS=PN/4,586,027&RS=PN/4,586,027
>
>
>
> 		Dave

JMODEM was done in 1989 as stated. RLE was invented by ME in 1967 and
was first used for a digital telemetry link between the Haystack research
facility in Groton, Mass. and MIT's main campus. I was a technician there
during my senior year at Northeastern. Whether or not it was patented
by others is immaterial.

FYI, there are a lot of things I invented that others patented.
The "Rubber Duckey" antenna (used in cell-phones and other
transceivers) comes to mind.

Even stuff in which my name is mentioned as a "co-inventor"
ends up being attributed to others, e.g., Gordon et al.

 United States Patent 5,577,026
Apparatus for transferring data to and from a moving device.
 European Patent number 95906032.8-2206

This is one where I not only invented it but single-handedly
built it, because the others mentioned in the patent were
naysayers, claiming it wouldn't work.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 17:24           ` Josh Litherland
@ 2003-10-15 18:53             ` Erik Bourget
  0 siblings, 0 replies; 92+ messages in thread
From: Erik Bourget @ 2003-10-15 18:53 UTC (permalink / raw)
  To: josh; +Cc: linux-kernel

Josh Litherland <josh@temp123.org> writes:

> On Wed, 2003-10-15 at 12:04, Erik Mouw wrote:
>
>> FYI: you hardly see compressed files on NTFS. If you do, it's either
>> because the user thought it was a fun feature
>
> -shrug-  The windows disk cleanup tool does this by default now if you
> let it.  It compresses files that have an old access time.  Not a bad
> idea imo, if perhaps one of limited usefulness.

What happens when you go through and do a folder-open on that old directory of
pictures that you had, which re-reads them to generate thumbnails?  Do they
have to be uncompressed-on-disk to be read?  Will they stay compressed
forever, or be decompressed automatically on access?

- Erik Bourget


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:33       ` Nikita Danilov
  2003-10-15 15:54         ` Richard B. Johnson
  2003-10-15 16:04         ` Erik Mouw
@ 2003-10-15 18:54         ` root
  2003-10-16  2:11           ` Chris Meadors
  2 siblings, 1 reply; 92+ messages in thread
From: root @ 2003-10-15 18:54 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Erik Mouw, Josh Litherland, linux-kernel

> 
> Erik Mouw writes:
>  > On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
>  > > Erik Mouw writes:
>  > >  > Nowadays disks are so incredibly cheap, that transparent compression
>  > >  > support is not realy worth it anymore (IMHO).
<snip>
>  > You have a point, but remember that modern IDE drives can do about
>  > 50MB/s from medium. I don't think you'll find a CPU that is able to
>  > handle transparent decompression on the fly at 50MB/s, even not with a
>  > simple compression scheme as used in NTFS (see the NTFS docs on
>  > SourceForge for details).

I haven't got the original message (mail problems) so I'm responding here.

I misread your message and thought you said compression.
My Duron 1300 (hardly the fastest machine) compresses (gzip -1) at around
40MB/s (repetitive log files that compress to 5% using gzip -1) and
10MB/s on text (compressing to 40%).

On expansion, it decompresses compressed text at around 14MB/s (resulting
in 30MB/s of output, around my disk speed), and log files at 110MB/s, which is
significantly faster.

This is with a single stick of PC133 RAM and a tiny 64K cache.
I would be very surprised if even a high-end consumer machine couldn't handle
50MB/s.
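
For anyone who wants to repeat this without the disk in the way, a rough
sketch using zlib's compress2() (the same DEFLATE algorithm gzip uses, minus
the file format); the input here is synthetic and only vaguely log-like, so
treat the numbers accordingly. Build with -lz.

/* rough in-memory DEFLATE throughput test (no disk involved) */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

int main(int argc, char **argv)
{
    int level = argc > 1 ? atoi(argv[1]) : 1;           /* like gzip -1 */
    const char pat[] = "Oct 15 13:33:05 host kernel: "; /* repetitive input */
    uLong srclen = 64UL * 1024 * 1024;
    Bytef *src = malloc(srclen);
    uLongf dstlen = compressBound(srclen);
    Bytef *dst = malloc(dstlen);
    clock_t t0, t1;

    if (!src || !dst)
        return 1;
    for (uLong i = 0; i < srclen; i++)
        src[i] = pat[i % (sizeof(pat) - 1)];

    t0 = clock();
    if (compress2(dst, &dstlen, src, srclen, level) != Z_OK)
        return 1;
    t1 = clock();

    printf("level %d: %.1f MB/s, ratio %.1f%%\n", level,
           srclen / 1e6 / ((double)(t1 - t0) / CLOCKS_PER_SEC),
           100.0 * dstlen / srclen);
    return 0;
}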

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 16:04         ` Erik Mouw
  2003-10-15 17:24           ` Josh Litherland
@ 2003-10-15 19:03           ` Geert Uytterhoeven
  2003-10-15 19:14             ` Valdis.Kletnieks
  1 sibling, 1 reply; 92+ messages in thread
From: Geert Uytterhoeven @ 2003-10-15 19:03 UTC (permalink / raw)
  To: Erik Mouw; +Cc: Nikita Danilov, Josh Litherland, Linux Kernel Development

On Wed, 15 Oct 2003, Erik Mouw wrote:
> On Wed, Oct 15, 2003 at 06:33:03PM +0400, Nikita Danilov wrote:
> > Trend is that CPU is getting faster and faster with respect to the
> > disk. So, even if it were hard to find such a CPU to-day, it will be
> > common place to-morrow.
> 
> I'm not too sure about this. It's my feeling that CPU speed and disk
> throughput grow about as fast. I don't have hard figures, so I can be
> proven wrong on this.

Well, let's take some (exotic :-) examples:

  - 1989: Amiga 500, 7.14 MHz 68000, expensive SCSI disk, 675 KB/s
  - 1992: Amiga 4000, 25 MHz 68040, IDE, 1.8 MB/s (SCSI with 5 MB/s should have
    been possible)
  - 1998: CHRP, 200 MHz 604e, UW-SCSI, 17 MB/s

The third CPU is ca. 25 times faster than the second (both in BogoMIPS and
in kernel cross-compiles). The disk isn't 25 times faster, though.

Now you can buy a machine with a CPU that's 25 times faster again, but please
show me a 400 MB/s disk...

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 19:03           ` Geert Uytterhoeven
@ 2003-10-15 19:14             ` Valdis.Kletnieks
  2003-10-15 19:24               ` Geert Uytterhoeven
  0 siblings, 1 reply; 92+ messages in thread
From: Valdis.Kletnieks @ 2003-10-15 19:14 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Erik Mouw, Nikita Danilov, Josh Litherland, Linux Kernel Development

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

On Wed, 15 Oct 2003 21:03:35 +0200, Geert Uytterhoeven said:

>   - 1989: Amiga 500, 7.14 MHz 68000, expensive SCSI disk, 675 KB/s
>   - 1992: Amiga 4000, 25 MHz 68040, IDE, 1.8 MB/s (SCSI with 5 MB/s should have
>     been possible)
>   - 1998: CHRP, 200 MHz 604e, UW-SCSI, 17 MB/s
> 
> The third CPU is ca. 25 times faster than the second (both in BogoMIPS as
> kernel cross-compiles). The disk isn't 25 faster, though.

%  dc
3 k 17 .675 / p
25.185
^D

Huh?

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 19:14             ` Valdis.Kletnieks
@ 2003-10-15 19:24               ` Geert Uytterhoeven
  0 siblings, 0 replies; 92+ messages in thread
From: Geert Uytterhoeven @ 2003-10-15 19:24 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Erik Mouw, Nikita Danilov, Josh Litherland, Linux Kernel Development

On Wed, 15 Oct 2003 Valdis.Kletnieks@vt.edu wrote:
> On Wed, 15 Oct 2003 21:03:35 +0200, Geert Uytterhoeven said:
> >   - 1989: Amiga 500, 7.14 MHz 68000, expensive SCSI disk, 675 KB/s
> >   - 1992: Amiga 4000, 25 MHz 68040, IDE, 1.8 MB/s (SCSI with 5 MB/s should have
> >     been possible)
> >   - 1998: CHRP, 200 MHz 604e, UW-SCSI, 17 MB/s
> > 
> > The third CPU is ca. 25 times faster than the second (both in BogoMIPS as
> > kernel cross-compiles). The disk isn't 25 faster, though.
> 
> %  dc
> 3 k 17 .675 / p
> 25.185
> ^D
> 
> Huh?

You're comparing the _third_ disk to the _first_. If you want to do that,
please take note that the 68040 is a lot faster than the 68000, too.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 15:13   ` Jeff Garzik
@ 2003-10-15 21:00     ` Christopher Li
  2003-10-16 16:29     ` Andrea Arcangeli
  1 sibling, 0 replies; 92+ messages in thread
From: Christopher Li @ 2003-10-15 21:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Erik Mouw, Josh Litherland, linux-kernel

On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> Josh and others should take a look at Plan9's venti file storage method 
> -- archival storage is a series of unordered blocks, all of which are 
> indexed by the sha1 hash of their contents.  This magically coalesces 

That is cool. I can imagine it will help versioning file systems.
I guess it still needs to compare the whole block for possible hash collisions.
I should check it out.

Chris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:27     ` Erik Mouw
  2003-10-15 14:33       ` Nikita Danilov
  2003-10-15 14:47       ` Erik Bourget
@ 2003-10-15 21:36       ` Tomas Szepe
  2003-10-16  8:04         ` Ville Herva
  2003-10-17  1:32       ` Eric W. Biederman
  3 siblings, 1 reply; 92+ messages in thread
From: Tomas Szepe @ 2003-10-15 21:36 UTC (permalink / raw)
  To: Erik Mouw; +Cc: Nikita Danilov, Josh Litherland, linux-kernel

On Oct-15 2003, Wed, 16:27 +0200
Erik Mouw <erik@harddisk-recovery.com> wrote:

> You have a point, but remember that modern IDE drives can do about
> 50MB/s from medium. I don't think you'll find a CPU that is able to
> handle transparent decompression on the fly at 50MB/s ... [snip]

You may want to check out LZO performance on a recent CPU.
http://www.oberhumer.com/opensource/lzo/
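
Something like this gives a quick feel for the numbers (a sketch; the function
names and the worst-case output size are as documented by liblzo, and the
header path differs between LZO 1.x and 2.x; link against the LZO library):

/* quick LZO1X-1 compression timing sketch */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <lzo/lzo1x.h>          /* <lzo1x.h> on the older 1.x library */

int main(void)
{
    lzo_uint srclen = 64UL * 1024 * 1024, dstlen;
    unsigned char *src = malloc(srclen);
    unsigned char *dst = malloc(srclen + srclen / 16 + 64 + 3); /* worst case per LZO docs */
    char *wrk = malloc(LZO1X_1_MEM_COMPRESS);
    clock_t t0, t1;

    if (!src || !dst || !wrk || lzo_init() != LZO_E_OK)
        return 1;
    for (lzo_uint i = 0; i < srclen; i++)       /* mildly compressible input */
        src[i] = (unsigned char)(i % 64);

    t0 = clock();
    if (lzo1x_1_compress(src, srclen, dst, &dstlen, wrk) != LZO_E_OK)
        return 1;
    t1 = clock();

    printf("%.1f MB/s, %lu -> %lu bytes\n",
           srclen / 1e6 / ((double)(t1 - t0) / CLOCKS_PER_SEC),
           (unsigned long)srclen, (unsigned long)dstlen);
    return 0;
}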

-- 
Tomas Szepe <szepe@pinerecords.com>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 18:54         ` root
@ 2003-10-16  2:11           ` Chris Meadors
  2003-10-16  3:01             ` Shawn
  0 siblings, 1 reply; 92+ messages in thread
From: Chris Meadors @ 2003-10-16  2:11 UTC (permalink / raw)
  To: Linux Kernel

On Wed, 2003-10-15 at 14:54, root@mauve.demon.co.uk wrote:

> I haven't got the original message (mail problems) so I'm responding here.
> 
> I misread your message, and thought you said compression.
> My Duron 1300 (hardly the fastest machine) compresses (gzip -1) at around
> 40Mb/sec (repetitive log-files that compress to 5% using gzip -1) and 
> 10Mb/sec on text (compressing to 40%).
> 
> On expansion, it decompresses compressed text at around 14Mb/sec (resulting
> in 30Mb/sec, around my disk speed), and logfiles at 110Mb/sec, which is 
> significantly faster.
> 
> This is with a single stick of PC133 RAM, and a tiny 64K cache.
> I would be very surprised if even a high end consumer machine couldn't handle
> 50Mb/sec.

chris@prime:/tmp$ dd if=/dev/urandom of=random_data bs=1024 count=100k
102400+0 records in
102400+0 records out
chris@prime:/tmp$ time gzip -9 random_data
 
real    0m10.572s
user    0m10.271s
sys     0m0.298s
chris@prime:/tmp$ time gunzip random_data.gz
 
real    0m1.506s
user    0m1.205s
sys     0m0.301s
chris@prime:/tmp$ time gzip -1 random_data
 
real    0m9.959s
user    0m9.632s
sys     0m0.326s
chris@prime:/tmp$ time gunzip random_data.gz
 
real    0m1.523s
user    0m1.218s
sys     0m0.305s


chris@prime:/tmp$ dd if=/dev/zero of=zero_data bs=1024 count=100k
102400+0 records in
102400+0 records out
chris@prime:/tmp$ time gzip -9 zero_data
 
real    0m2.947s
user    0m2.812s
sys     0m0.134s
chris@prime:/tmp$ time gunzip zero_data.gz
 
real    0m1.062s
user    0m0.920s
sys     0m0.142s
chris@prime:/tmp$ time gzip -1 zero_data
 
real    0m1.840s
user    0m1.717s
sys     0m0.123s
chris@prime:/tmp$ time gunzip zero_data.gz
 
real    0m0.707s
user    0m0.542s
sys     0m0.165s


I can get pretty close to 50MB/sec compressing a nice long string of
zeros, and over 100MB/sec decompressing the same.  But as the first test
shows, spinning my wheels on essentially incompressible data gets me
nowhere slowly.

Of course the disk subsystem of this machine is also a hardware RAID
array of 8 (4 per channel) 10,000 RPM SCSI U320 drives.  So I'm not
approaching the speeds that I can get from it anyway.

-- 
Chris


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16  2:11           ` Chris Meadors
@ 2003-10-16  3:01             ` Shawn
  0 siblings, 0 replies; 92+ messages in thread
From: Shawn @ 2003-10-16  3:01 UTC (permalink / raw)
  To: Chris Meadors; +Cc: Linux Kernel

You know, I've seen a lot of mostly-N/A benchmarks, but no real-world
examples. I've seen many folks talking back and forth about data
throughput, but little talk of I/O latency.

Regardless of what your unloaded CPU can do, does anyone care to realize
that your CPU usually has to act on the data you're reading/writing from
said filesystem, probably at the expense of throughput?

Now, this is not to say that certain workloads wouldn't benefit
tremendously from compression, especially these days when we seem to
have processor cycles to spare. I may have a giganimonstrous file full of
zeros I just need to read from and write to... I actually remember working
with "Jam drive" (a "Stacker" work-alike) on DOS, which really did make
my little 33 MHz box throw up X-appeal
(http://www.xtreme.it/xappeal.html) twice as fast... That was a good
real-world example relevant to me.

Anyway, I guess I'm not arguing for or against anything here. Just
trying to make people realize most of the benchmarks cited do NOT
represent the common scenario.

Also, I would hope that if ext3+compressFS existed I could mount a
compressed ext3 filesystem as vanilla ext3 and gunzip/bunzip2 a file I
know to be compressed to recover the data. Nice to have [back|for]ward
compatibility.

On Wed, 2003-10-15 at 21:11, Chris Meadors wrote:
> On Wed, 2003-10-15 at 14:54, root@mauve.demon.co.uk wrote:
> 
> > I haven't got the original message (mail problems) so I'm responding here.
> > 
> > I misread your message, and thought you said compression.
> > My Duron 1300 (hardly the fastest machine) compresses (gzip -1) at around
> > 40Mb/sec (repetitive log-files that compress to 5% using gzip -1) and 
> > 10Mb/sec on text (compressing to 40%).
> > 
> > On expansion, it decompresses compressed text at around 14Mb/sec (resulting
> > in 30Mb/sec, around my disk speed), and logfiles at 110Mb/sec, which is 
> > significantly faster.
> > 
> > This is with a single stick of PC133 RAM, and a tiny 64K cache.
> > I would be very surprised if even a high end consumer machine couldn't handle
> > 50Mb/sec.
> 
> chris@prime:/tmp$ dd if=/dev/urandom of=random_data bs=1024 count=100k
> 102400+0 records in
> 102400+0 records out
> chris@prime:/tmp$ time gzip -9 random_data
>  
> real    0m10.572s
> user    0m10.271s
> sys     0m0.298s
> chris@prime:/tmp$ time gunzip random_data.gz
>  
> real    0m1.506s
> user    0m1.205s
> sys     0m0.301s
> chris@prime:/tmp$ time gzip -1 random_data
>  
> real    0m9.959s
> user    0m9.632s
> sys     0m0.326s
> chris@prime:/tmp$ time gunzip random_data.gz
>  
> real    0m1.523s
> user    0m1.218s
> sys     0m0.305s
> 
> 
> chris@prime:/tmp$ dd if=/dev/zero of=zero_data bs=1024 count=100k
> 102400+0 records in
> 102400+0 records out
> chris@prime:/tmp$ time gzip -9 zero_data
>  
> real    0m2.947s
> user    0m2.812s
> sys     0m0.134s
> chris@prime:/tmp$ time gunzip zero_data.gz
>  
> real    0m1.062s
> user    0m0.920s
> sys     0m0.142s
> chris@prime:/tmp$ time gzip -1 zero_data
>  
> real    0m1.840s
> user    0m1.717s
> sys     0m0.123s
> chris@prime:/tmp$ time gunzip zero_data.gz
>  
> real    0m0.707s
> user    0m0.542s
> sys     0m0.165s
> 
> 
> I can get pretty close to 50MB/sec compressing a nice long string of
> zeros.  Over 100MB/sec decompressing the same.  But from the first test,
> spinning my wheels on relatively uncompressible data gets me no where
> slowly.
> 
> Of course the disk subsystem of this machine is also a hardware RAID
> array of 8 (4 per channel) 10,000 RPM SCSI U320 drives.  So I'm not
> approaching the speeds that I can get from it anyway.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 21:36       ` Tomas Szepe
@ 2003-10-16  8:04         ` Ville Herva
  0 siblings, 0 replies; 92+ messages in thread
From: Ville Herva @ 2003-10-16  8:04 UTC (permalink / raw)
  To: Tomas Szepe; +Cc: Erik Mouw, Nikita Danilov, Josh Litherland, linux-kernel

On Wed, Oct 15, 2003 at 11:36:24PM +0200, you [Tomas Szepe] wrote:
> On Oct-15 2003, Wed, 16:27 +0200
> Erik Mouw <erik@harddisk-recovery.com> wrote:
> 
> > You have a point, but remember that modern IDE drives can do about
> > 50MB/s from medium. I don't think you'll find a CPU that is able to
> > handle transparent decompression on the fly at 50MB/s ... [snip]
> 
> You may want to check out LZO performance on a recent CPU.
> http://www.oberhumer.com/opensource/lzo/

Out of interest:

Celeron 1.4GHz, a source tar:

           compr MB/s uncompr MB/s ratio
lzo -m71:  58.2       170.2        24.4%
lzo -m972:  2.7       100.8        17.7%

gzip -1:   13.6       115.7        22.1%
gzip -6:    8.8       121.8        17.7%
gzip -9:    4.5        79.8        17.6%

lzo used assembler versions.


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 13:33 ` Erik Mouw
                     ` (2 preceding siblings ...)
  2003-10-15 15:13   ` Jeff Garzik
@ 2003-10-16  8:27   ` tconnors+linuxkernel1066292516
  2003-10-17 10:55   ` Ingo Oeser
  4 siblings, 0 replies; 92+ messages in thread
From: tconnors+linuxkernel1066292516 @ 2003-10-16  8:27 UTC (permalink / raw)
  To: linux-kernel

Erik Mouw <erik@harddisk-recovery.com> said on Wed, 15 Oct 2003 15:33:05 +0200:
> On Tue, Oct 14, 2003 at 04:30:50PM -0400, Josh Litherland wrote:
> > Are there any filesystems which implement the transparent compression
> > attribute ?  (chattr +c)
> 
> The NTFS driver supports compressed files. Because it doesn't have
> proper write support, I don't think it will do anything useful with
> chattr +c.
> 
> Nowadays disks are so incredibly cheap, that transparent compression
> support is not realy worth it anymore (IMHO).

Why is it that everyone thinks these things aren't needed anymore?
This is the reason we end up with *bloat* these days - people keep
on thinking it is the norm for everyone to have a computer that is
less than 3 years old.

For those of us who prefer not to waste, or those unfortunate enough not
to have a few hundred dollars to spare, the statement that "disk is cheap"
(or CPU, or RAM) is a bit of an insult.

Hell, my 486 router can't even take more RAM than the 8 megs it
already has, due to BIOS/motherboard limitations.


Similarly, those of us with 17TB of RAID, but wanting to create 30TB
of data (on a university budget), also find compression fairly useful.
As it is, I have to write ugly hacks to get my Fortran programs to
make use of a pipe to gzip - my data is sufficiently redundant
that I get a good factor of 10 saving.


-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
\a\a\a\a        *** System shutdown message from root ***
System going down in 60 seconds

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 15:13   ` Jeff Garzik
  2003-10-15 21:00     ` Christopher Li
@ 2003-10-16 16:29     ` Andrea Arcangeli
  2003-10-16 16:41       ` P
                         ` (3 more replies)
  1 sibling, 4 replies; 92+ messages in thread
From: Andrea Arcangeli @ 2003-10-16 16:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Erik Mouw, Josh Litherland, linux-kernel

Hi Jeff,

On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> Josh and others should take a look at Plan9's venti file storage method 
> -- archival storage is a series of unordered blocks, all of which are 
> indexed by the sha1 hash of their contents.  This magically coalesces 
> all duplicate blocks by its very nature, including the loooooong runs of 
> zeroes that you'll find in many filesystems.  I bet savings on "all 
> bytes in this block are zero" are worth a bunch right there.

I had a few ideas on the above.

If the zero blocks are the problem, there's a tool called zum that nukes
them and replaces them with holes. I use it sometimes; example:

andrea@velociraptor:~> dd if=/dev/zero of=zero bs=1M count=100
100+0 records in
100+0 records out
andrea@velociraptor:~> ls -ls zero
102504 -rw-r--r--    1 andrea   andrea   104857600 2003-10-16 18:24 zero
andrea@velociraptor:~> ~/bin/i686/zum zero
zero [820032K]  [1 link]
andrea@velociraptor:~> ls -ls zero
   0 -rw-r--r--    1 andrea   andrea   104857600 2003-10-16 18:24 zero
andrea@velociraptor:~> 

If you can't find it, ask and I'll send it by email (it's GPL, btw).

Hashing the data is interesting, but 1) you lose the zero-copy
behaviour for the I/O: it's like doing a checksum of all the data going to
disk, which you normally would never do (except for the tiny files in reiserfs
with tail packing enabled, but that's not bulk I/O), and 2) I wonder how much
data is really duplicated besides the "zero" holes, which are trivially fixable
in userspace (modulo bzImage or similar, where I'm unsure whether the fs code
in the bootloader can handle holes ;).
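
The technique itself is simple; a sketch (not zum itself, just how such a
tool can work): copy a file but seek over all-zero blocks instead of writing
them, so they become holes, with an ftruncate() at the end so a file that
ends in zeroes keeps its length.

/* make a sparse copy of a file: seek over all-zero blocks instead of
 * writing them, so they end up as holes (no disk blocks allocated) */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLKSZ 4096

static int all_zero(const char *p, ssize_t n)
{
    while (n--)
        if (*p++)
            return 0;
    return 1;
}

int main(int argc, char **argv)
{
    char buf[BLKSZ];
    ssize_t n;
    off_t len = 0;
    int in, out;

    if (argc != 3) {
        fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
        return 1;
    }
    in = open(argv[1], O_RDONLY);
    out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (all_zero(buf, n))
            lseek(out, n, SEEK_CUR);      /* skip forward: leaves a hole */
        else if (write(out, buf, n) != n) {
            perror("write");
            return 1;
        }
        len += n;
    }
    ftruncate(out, len);   /* extend the size if the file ends in zero blocks */
    close(in);
    close(out);
    return 0;
}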

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 16:29     ` Andrea Arcangeli
@ 2003-10-16 16:41       ` P
  2003-10-16 17:20         ` Jeff Garzik
  2003-10-16 23:12         ` jw schultz
  2003-10-16 17:10       ` Jeff Garzik
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 92+ messages in thread
From: P @ 2003-10-16 16:41 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Jeff Garzik, Erik Mouw, Josh Litherland, linux-kernel

Andrea Arcangeli wrote:
> Hi Jeff,
> 
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> 
>>Josh and others should take a look at Plan9's venti file storage method 
>>-- archival storage is a series of unordered blocks, all of which are 
>>indexed by the sha1 hash of their contents.  This magically coalesces 
>>all duplicate blocks by its very nature, including the loooooong runs of 
>>zeroes that you'll find in many filesystems.  I bet savings on "all 
>>bytes in this block are zero" are worth a bunch right there.
> 
> I had a few ideas on the above.
> 
> if the zero blocks are the problem, there's a tool called zum that nukes
> them and replaces them with holes. I use it sometime, example:

Yes, post-processing with this tool is useful.
Note also that GNU cp (--sparse) inserts holes
in files.

I've thought a bit about this too, and in general
the overhead of instant block/file duplicate
merging is not worth it. Periodic post-processing
with a merging tool is much more efficient, IMHO.
Of course that is currently only possible at the
file level, but that's all that's generally useful,
I think. I guess it's appropriate to plug my
merging tool here :-)
http://www.pixelbeat.org/fslint

Pádraig.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 16:29     ` Andrea Arcangeli
  2003-10-16 16:41       ` P
@ 2003-10-16 17:10       ` Jeff Garzik
  2003-10-16 17:41         ` Andrea Arcangeli
  2003-10-16 17:29       ` Larry McVoy
  2003-10-16 23:20       ` jw schultz
  3 siblings, 1 reply; 92+ messages in thread
From: Jeff Garzik @ 2003-10-16 17:10 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Erik Mouw, Josh Litherland, linux-kernel

Andrea Arcangeli wrote:
> Hi Jeff,
> 
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> 
>>Josh and others should take a look at Plan9's venti file storage method 
>>-- archival storage is a series of unordered blocks, all of which are 
>>indexed by the sha1 hash of their contents.  This magically coalesces 
>>all duplicate blocks by its very nature, including the loooooong runs of 
>>zeroes that you'll find in many filesystems.  I bet savings on "all 
>>bytes in this block are zero" are worth a bunch right there.
> 
> 
> I had a few ideas on the above.
> 
> if the zero blocks are the problem, there's a tool called zum that nukes
> them and replaces them with holes. I use it sometime, example:
> 
> andrea@velociraptor:~> dd if=/dev/zero of=zero bs=1M count=100
> 100+0 records in
> 100+0 records out
> andrea@velociraptor:~> ls -ls zero
> 102504 -rw-r--r--    1 andrea   andrea   104857600 2003-10-16 18:24 zero
> andrea@velociraptor:~> ~/bin/i686/zum zero
> zero [820032K]  [1 link]
> andrea@velociraptor:~> ls -ls zero
>    0 -rw-r--r--    1 andrea   andrea   104857600 2003-10-16 18:24 zero
> andrea@velociraptor:~> 

Neat.


> the hash to the data is interesting, but 1) you lose the zerocopy
> behaviour for the I/O, it's like doing a checksum for all the data going to
> disk that you normally would never do (except for the tiny files in reiserfs
> with tail packing enabled, but that's not bulk I/O), 2) I wonder how much data
> is really duplicate besides the "zero" holes trivially fixable in userspace
> (modulo bzImage or similar where I'm unsure if the fs code in the bootloader
> can handle holes ;).

FWIW archival storage doesn't really care...  Since all data written to 
disk is hashed with SHA1 (the sha1 hash == the block's unique id), you 
gain (a) duplicate block coalescing and (b) a _real_ data integrity 
guarantee, but OTOH you lose performance and possibly lose zero-copy.
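
As a toy illustration of the content addressing (not venti's actual
on-disk format), "sha1 hash == block id" boils down to something like
the following, using OpenSSL's SHA1() and a directory of blocks named
by their hex digest.  The store_by_hash() name and the flat-directory
layout are made up for the example:

#include <errno.h>
#include <fcntl.h>
#include <openssl/sha.h>
#include <stdio.h>
#include <unistd.h>

/* Store one block under its SHA-1; identical blocks collapse to one file. */
static int store_by_hash(const char *dir, const unsigned char *blk,
			 size_t len, char *id_out /* >= 41 bytes */)
{
	unsigned char md[SHA_DIGEST_LENGTH];
	char path[4096];
	int i, fd;

	SHA1(blk, len, md);
	for (i = 0; i < SHA_DIGEST_LENGTH; i++)
		sprintf(id_out + 2 * i, "%02x", md[i]);
	id_out[2 * SHA_DIGEST_LENGTH] = '\0';

	snprintf(path, sizeof(path), "%s/%s", dir, id_out);
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return errno == EEXIST ? 0 : -1;  /* EEXIST: already stored */
	if (write(fd, blk, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

The caller records the returned 40-character id in its per-file block
list; writing the same block of zeroes a million times still stores
exactly one copy.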

I _really_ like the checksum aspect of Plan9's archival storage (venti).

As Andre H and Larry McVoy love to point out, data isn't _really_ secure 
until it's been checksummed, and that checksum data is verified on 
reads.  LM has an anecdote (doesn't he always?  <g>) about how BitKeeper 
-- which checksums its data inside the app -- has found data-corrupting 
kernel bugs, in days long past.

	Jeff



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 16:41       ` P
@ 2003-10-16 17:20         ` Jeff Garzik
  2003-10-16 23:12         ` jw schultz
  1 sibling, 0 replies; 92+ messages in thread
From: Jeff Garzik @ 2003-10-16 17:20 UTC (permalink / raw)
  To: P; +Cc: Andrea Arcangeli, Erik Mouw, Josh Litherland, linux-kernel

P@draigBrady.com wrote:
> I thought a bit about this also and thought
> that in general the overhead of instant block/file
> duplicate merging is not worth it. Periodic
> post processing with a merging tool is
> much more efficient IMHO. Of course this is
> now only possible at the file level, but this

The sha1 hash method used by plan9 works at the block level.  venti 
_automatically_ coalesces duplicate blocks, without extra effort, simply 
by design.

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 16:29     ` Andrea Arcangeli
  2003-10-16 16:41       ` P
  2003-10-16 17:10       ` Jeff Garzik
@ 2003-10-16 17:29       ` Larry McVoy
  2003-10-16 17:49         ` Val Henson
                           ` (2 more replies)
  2003-10-16 23:20       ` jw schultz
  3 siblings, 3 replies; 92+ messages in thread
From: Larry McVoy @ 2003-10-16 17:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: val

On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> Josh and others should take a look at Plan9's venti file storage method 
> -- archival storage is a series of unordered blocks, all of which are 
> indexed by the sha1 hash of their contents.  This magically coalesces 
> all duplicate blocks by its very nature, including the loooooong runs of 
> zeroes that you'll find in many filesystems.  I bet savings on "all 
> bytes in this block are zero" are worth a bunch right there.

The only problem with this is that you can get false positives.  Val Henson
recently wrote a paper about this.  It's really unlikely that you'll get false
positives, but it can happen and it has happened in the field.  
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 17:10       ` Jeff Garzik
@ 2003-10-16 17:41         ` Andrea Arcangeli
  0 siblings, 0 replies; 92+ messages in thread
From: Andrea Arcangeli @ 2003-10-16 17:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Erik Mouw, Josh Litherland, linux-kernel

On Thu, Oct 16, 2003 at 01:10:20PM -0400, Jeff Garzik wrote:
> As Andre H and Larry McVoy love to point out, data isn't _really_ secure 
> until it's been checksummed, and that checksum data is verified on 
> reads.  LM has an anecdote (doesn't he always?  <g>) about how BitKeeper 
> -- which checksums its data inside the app -- has found data-corrupting 
> kernel bugs, in days long past.

Yes, even a fs checksum won't protect against kernel bugs randomly
corrupting pagecache, and that's why having it in userspace is
preferable IMHO (though in userspace it's more difficult to switch it
off transparently unless the app is designed for that).

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 17:29       ` Larry McVoy
@ 2003-10-16 17:49         ` Val Henson
  2003-10-16 21:02           ` Jeff Garzik
  2003-10-16 23:04           ` jw schultz
  2003-10-16 18:28         ` John Bradford
  2003-10-17 13:16         ` Ingo Oeser
  2 siblings, 2 replies; 92+ messages in thread
From: Val Henson @ 2003-10-16 17:49 UTC (permalink / raw)
  To: Larry McVoy, linux-kernel

On Thu, Oct 16, 2003 at 10:29:30AM -0700, Larry McVoy wrote:
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > Josh and others should take a look at Plan9's venti file storage method 
> > -- archival storage is a series of unordered blocks, all of which are 
> > indexed by the sha1 hash of their contents.  This magically coalesces 
> > all duplicate blocks by its very nature, including the loooooong runs of 
> > zeroes that you'll find in many filesystems.  I bet savings on "all 
> > bytes in this block are zero" are worth a bunch right there.
> 
> The only problem with this is that you can get false positives.  Val Hensen
> recently wrote a paper about this.  It's really unlikely that you get false
> positives but it can happen and it has happened in the field.  

To be fair, I talked to someone who claims that Venti now checks for
hash collisions on writes, but that's not what the original paper
describes and I haven't confirmed it.

The compare-by-hash paper is only 6 pages long; at least take the time
to read it before you start using compare-by-hash:

http://www.usenix.org/events/hotos03/tech/henson.html

Abstract:

 "Recent research has produced a new and perhaps dangerous technique
  for uniquely identifying blocks that I will call
  compare-by-hash. Using this technique, we decide whether two blocks
  are identical to each other by comparing their hash values, using a
  collision-resistant hash such as SHA-1. If the hash values match,
  we assume the blocks are identical without further ado. Users of
  compare-by-hash argue that this assumption is warranted because the
  chance of a hash collision between any two randomly generated blocks
  is estimated to be many orders of magnitude smaller than the chance
  of many kinds of hardware errors. Further analysis shows that this
  approach is not as risk-free as it seems at first glance."

-VAL (not subscribed to l-k ATM)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 17:29       ` Larry McVoy
  2003-10-16 17:49         ` Val Henson
@ 2003-10-16 18:28         ` John Bradford
  2003-10-16 18:31           ` Robert Love
                             ` (2 more replies)
  2003-10-17 13:16         ` Ingo Oeser
  2 siblings, 3 replies; 92+ messages in thread
From: John Bradford @ 2003-10-16 18:28 UTC (permalink / raw)
  To: Larry McVoy, linux-kernel; +Cc: val

Quote from Larry McVoy <lm@bitmover.com>:
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > Josh and others should take a look at Plan9's venti file storage method 
> > -- archival storage is a series of unordered blocks, all of which are 
> > indexed by the sha1 hash of their contents.  This magically coalesces 
> > all duplicate blocks by its very nature, including the loooooong runs of 
> > zeroes that you'll find in many filesystems.  I bet savings on "all 
> > bytes in this block are zero" are worth a bunch right there.
> 
> The only problem with this is that you can get false positives.  Val Hensen
> recently wrote a paper about this.  It's really unlikely that you get false
> positives but it can happen and it has happened in the field.  

Surely it's just common sense to say that you have to verify the whole
block - any algorithm that can compress N values into <N values is
lossy by definition.  A mathematical proof for that is easy.

John.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 18:28         ` John Bradford
@ 2003-10-16 18:31           ` Robert Love
  2003-10-16 20:18             ` Jeff Garzik
  2003-10-16 18:43           ` Muli Ben-Yehuda
  2003-10-16 18:56           ` Richard B. Johnson
  2 siblings, 1 reply; 92+ messages in thread
From: Robert Love @ 2003-10-16 18:31 UTC (permalink / raw)
  To: John Bradford; +Cc: Larry McVoy, linux-kernel, val

On Thu, 2003-10-16 at 14:28, John Bradford wrote:

> Surely it's just common sense to say that you have to verify the whole
> block - any algorithm that can compress N values into <N values is
> lossy by definition.  A mathematical proof for that is easy.

That is the problem.

But those pushing this approach argue that the chance of collision is
less than the chance of hardware errors, et cetera.

Read Val's paper.  It is interesting.

	Robert Love



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 18:28         ` John Bradford
  2003-10-16 18:31           ` Robert Love
@ 2003-10-16 18:43           ` Muli Ben-Yehuda
  2003-10-16 18:56           ` Richard B. Johnson
  2 siblings, 0 replies; 92+ messages in thread
From: Muli Ben-Yehuda @ 2003-10-16 18:43 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]

On Thu, Oct 16, 2003 at 07:28:59PM +0100, John Bradford wrote:

> Surely it's just common sense to say that you have to verify the whole
> block - any algorithm that can compress N values into <N values is
> lossy by definition.  A mathematical proof for that is easy.

N values is easy. N random uncorrelated values is hard.

Pedantic, me? 
Never. 
-- 
Muli Ben-Yehuda
http://www.mulix.org


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 18:28         ` John Bradford
  2003-10-16 18:31           ` Robert Love
  2003-10-16 18:43           ` Muli Ben-Yehuda
@ 2003-10-16 18:56           ` Richard B. Johnson
  2003-10-16 19:00             ` Robert Love
  2003-10-16 19:03             ` John Bradford
  2 siblings, 2 replies; 92+ messages in thread
From: Richard B. Johnson @ 2003-10-16 18:56 UTC (permalink / raw)
  To: John Bradford; +Cc: Larry McVoy, linux-kernel, val

On Thu, 16 Oct 2003, John Bradford wrote:

> Quote from Larry McVoy <lm@bitmover.com>:
> > On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > > Josh and others should take a look at Plan9's venti file storage method
> > > -- archival storage is a series of unordered blocks, all of which are
> > > indexed by the sha1 hash of their contents.  This magically coalesces
> > > all duplicate blocks by its very nature, including the loooooong runs of
> > > zeroes that you'll find in many filesystems.  I bet savings on "all
> > > bytes in this block are zero" are worth a bunch right there.
> >
> > The only problem with this is that you can get false positives.  Val Hensen
> > recently wrote a paper about this.  It's really unlikely that you get false
> > positives but it can happen and it has happened in the field.
>
> Surely it's just common sense to say that you have to verify the whole
> block - any algorithm that can compress N values into <N values is
> lossy by definition.  A mathematical proof for that is easy.
>
> John.

No! Not true. 'lossy' means that you can't recover the original
data. Some music compression and video compression schemes are
lossy. If you can get back the exact input data, it's not lossy.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 18:56           ` Richard B. Johnson
@ 2003-10-16 19:00             ` Robert Love
  2003-10-16 19:27               ` John Bradford
  2003-10-16 19:03             ` John Bradford
  1 sibling, 1 reply; 92+ messages in thread
From: Robert Love @ 2003-10-16 19:00 UTC (permalink / raw)
  To: root; +Cc: John Bradford, Larry McVoy, linux-kernel, val

On Thu, 2003-10-16 at 14:56, Richard B. Johnson wrote:

> No! Not true. 'lossy' means that you can't recover the original
> data. Some music compression and video compression schemes are
> lossy. If you can get back the exact input data, it's not lossy.

...and with an N-bit hash of M bits of data, where N<M, you cannot
recover the original data...

	Robert Love



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 18:56           ` Richard B. Johnson
  2003-10-16 19:00             ` Robert Love
@ 2003-10-16 19:03             ` John Bradford
  2003-10-16 19:20               ` Richard B. Johnson
  1 sibling, 1 reply; 92+ messages in thread
From: John Bradford @ 2003-10-16 19:03 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Larry McVoy, linux-kernel, val

Quote from "Richard B. Johnson" <root@chaos.analogic.com>:
> No! Not true. 'lossy' means that you can't recover the original
> data. Some music compression and video compression schemes are
> lossy. If you can get back the exact input data, it's not lossy.

Sorry, I wasn't clear in my description.  What I meant was that you
can't have an algorithm that can compress all possible values of N
bits into less than N bits without expanding some of them.  Of
course, you can compress N values into <N values; compressors do that
by definition :-)

John.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 19:03             ` John Bradford
@ 2003-10-16 19:20               ` Richard B. Johnson
  0 siblings, 0 replies; 92+ messages in thread
From: Richard B. Johnson @ 2003-10-16 19:20 UTC (permalink / raw)
  To: John Bradford; +Cc: Larry McVoy, linux-kernel, val

On Thu, 16 Oct 2003, John Bradford wrote:

> Quote from "Richard B. Johnson" <root@chaos.analogic.com>:
> > No! Not true. 'lossy' means that you can't recover the original
> > data. Some music compression and video compression schemes are
> > lossy. If you can get back the exact input data, it's not lossy.
>
> Sorry, I wasn't clear in my description.  What I meant was that you
> can't have an algorithm that can compress all possible values of N
> bits in to less than N bits, without expanding some of them.  Of
> course, you can compress N values in to <N values, compressors do that
> by definition :-)
>
> John.
>

Ahha. Yes you are provably correct unless you use some additional
"message-channel" to cheat.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 19:00             ` Robert Love
@ 2003-10-16 19:27               ` John Bradford
  0 siblings, 0 replies; 92+ messages in thread
From: John Bradford @ 2003-10-16 19:27 UTC (permalink / raw)
  To: Robert Love, root; +Cc: John Bradford, Larry McVoy, linux-kernel, val

Quote from Robert Love <rml@tech9.net>:
> On Thu, 2003-10-16 at 14:56, Richard B. Johnson wrote:
> 
> > No! Not true. 'lossy' means that you can't recover the original
> > data. Some music compression and video compression schemes are
> > lossy. If you can get back the exact input data, it's not lossy.
> 
> ...and with an N-bit hash of M bits of data, where N<M, you cannot
> recover the original data...

I think the confusion has arisen because I said that you can't
compress N values into <N values.  This is somewhat ambiguous, and
can be taken to mean either:

1. You can't losslessly compress N arbitrary values into less than
N arbitrary values using a certain mathematical transformation.

or

2. You can't losslessly compress all N combinations of M bits into
less than M bits using any mathematical transformation.

Obviously interpretation 1 is incorrect, and interpretation 2 is
correct.

John.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 18:31           ` Robert Love
@ 2003-10-16 20:18             ` Jeff Garzik
  0 siblings, 0 replies; 92+ messages in thread
From: Jeff Garzik @ 2003-10-16 20:18 UTC (permalink / raw)
  To: Robert Love; +Cc: John Bradford, Larry McVoy, linux-kernel, val

Robert Love wrote:
> On Thu, 2003-10-16 at 14:28, John Bradford wrote:
> 
> 
>>Surely it's just common sense to say that you have to verify the whole
>>block - any algorithm that can compress N values into <N values is
>>lossy by definition.  A mathematical proof for that is easy.
> 
> 
> That is the problem.
> 
> But those pushing this approach argue that the chance of collision is
> less than the chance of hardware errors, et cetera.


Nod.  And there are certainly ways to avoid hash collisions...

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 17:49         ` Val Henson
@ 2003-10-16 21:02           ` Jeff Garzik
  2003-10-16 21:18             ` Chris Meadors
                               ` (3 more replies)
  2003-10-16 23:04           ` jw schultz
  1 sibling, 4 replies; 92+ messages in thread
From: Jeff Garzik @ 2003-10-16 21:02 UTC (permalink / raw)
  To: Val Henson; +Cc: Larry McVoy, linux-kernel

Val Henson wrote:
> Abstract:
> 
>  "Recent research has produced a new and perhaps dangerous technique
>   for uniquely identifying blocks that I will call
>   compare-by-hash. Using this technique, we decide whether two blocks
>   are identical to each other by comparing their hash values, using a
>   collision-resistant hash such as SHA-1. If the hash values match,
>   we assume the blocks are identical without further ado. Users of
>   compare-by-hash argue that this assumption is warranted because the
>   chance of a hash collision between any two randomly generated blocks
>   is estimated to be many orders of magnitude smaller than the chance
>   of many kinds of hardware errors. Further analysis shows that this
>   approach is not as risk-free as it seems at first glance."


I'm curious if anyone has done any work on using multiple different 
checksums?  For example, the cost of checksumming a single block with 
multiple algorithms (sha1+md5+crc32 for a crazy example), and storing 
each checksum (instead of just one sha1 sum), may be cheaper than reading 
the block off of disk to compare it with the incoming block.  OTOH, 
there is still a mathematical possibility (however more remote) of a 
collision...
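
For scale, computing and storing all three per block is only a few
lines (a sketch; the names are invented for the example, and whether it
actually beats a read-and-compare depends entirely on the disk and CPU):

#include <openssl/md5.h>
#include <openssl/sha.h>
#include <stddef.h>
#include <zlib.h>

struct blk_sums {
	unsigned char sha1[SHA_DIGEST_LENGTH];	/* 20 bytes */
	unsigned char md5[MD5_DIGEST_LENGTH];	/* 16 bytes */
	unsigned long crc;			/*  4 bytes */
};

static void sum_block(const unsigned char *blk, size_t len,
		      struct blk_sums *s)
{
	SHA1(blk, len, s->sha1);
	MD5(blk, len, s->md5);
	s->crc = crc32(crc32(0L, Z_NULL, 0), blk, len);
}

That is 40 bytes of index entry per block, though the three sums
together are generally no stronger than a single good hash of the same
total width.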

With these sorts of schemes, from basic compare-by-hash to one that 
actually checks the data for a collision, you take a hit no matter what 
when writing, but reading is still full-speed-ahead.  (If anyone is 
curious, Plan9's venti uses a multi-GB "write buffer" to mitigate the 
effect of the checksumming on throughput.)

So it's easy to tweak the writing algorithms pretty much any which way, 
as long as the outcome is a unique tag for each block.  Hash collisions 
between two files, for example, could be resolved easily by making each 
unique tag "$sha1_hash $n", where $n is the unique number 
differentiating two blocks with the same SHA1 hash.

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 21:02           ` Jeff Garzik
@ 2003-10-16 21:18             ` Chris Meadors
  2003-10-16 21:25               ` Jeff Garzik
  2003-10-16 21:33             ` Davide Libenzi
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 92+ messages in thread
From: Chris Meadors @ 2003-10-16 21:18 UTC (permalink / raw)
  To: Linux Kernel

On Thu, 2003-10-16 at 17:02, Jeff Garzik wrote:

> I'm curious if anyone has done any work on using multiple different 
> checksums?  For example, the cost of checksumming a single block with 
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing 
> each checksum (instead of just one sha1 sum), may be faster than reading 
> the block off of disk to compare it with the incoming block.  OTOH, 
> there is still a mathematical possibility (however-more-remote) of a 
> collission...

I don't think multiple hashes will gain any more uniqueness over just a
larger hash value.  But as long as the hash is smaller than the block
being hashed there is the possibility of two dissimilar blocks producing
the same hash.

-- 
Chris


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 21:18             ` Chris Meadors
@ 2003-10-16 21:25               ` Jeff Garzik
  0 siblings, 0 replies; 92+ messages in thread
From: Jeff Garzik @ 2003-10-16 21:25 UTC (permalink / raw)
  To: Chris Meadors; +Cc: Linux Kernel

Chris Meadors wrote:
> On Thu, 2003-10-16 at 17:02, Jeff Garzik wrote:
> 
> 
>>I'm curious if anyone has done any work on using multiple different 
>>checksums?  For example, the cost of checksumming a single block with 
>>multiple algorithms (sha1+md5+crc32 for a crazy example), and storing 
>>each checksum (instead of just one sha1 sum), may be faster than reading 
>>the block off of disk to compare it with the incoming block.  OTOH, 
>>there is still a mathematical possibility (however-more-remote) of a 
>>collission...
> 
> 
> I don't think multiple hashes will gain any more uniqueness over just a
> larger hash value. 

I disagree...

> But as long as the hash is smaller than the block
> being hashed there is the possibility of two dissimilar blocks producing
> the same hash.

Agreed.

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 21:02           ` Jeff Garzik
  2003-10-16 21:18             ` Chris Meadors
@ 2003-10-16 21:33             ` Davide Libenzi
  2003-10-17  3:47             ` Mark Mielke
  2003-10-17 14:31             ` Jörn Engel
  3 siblings, 0 replies; 92+ messages in thread
From: Davide Libenzi @ 2003-10-16 21:33 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Val Henson, Larry McVoy, Linux Kernel Mailing List

On Thu, 16 Oct 2003, Jeff Garzik wrote:

> Val Henson wrote:
> > Abstract:
> >
> >  "Recent research has produced a new and perhaps dangerous technique
> >   for uniquely identifying blocks that I will call
> >   compare-by-hash. Using this technique, we decide whether two blocks
> >   are identical to each other by comparing their hash values, using a
> >   collision-resistant hash such as SHA-1. If the hash values match,
> >   we assume the blocks are identical without further ado. Users of
> >   compare-by-hash argue that this assumption is warranted because the
> >   chance of a hash collision between any two randomly generated blocks
> >   is estimated to be many orders of magnitude smaller than the chance
> >   of many kinds of hardware errors. Further analysis shows that this
> >   approach is not as risk-free as it seems at first glance."
>
>
> I'm curious if anyone has done any work on using multiple different
> checksums?  For example, the cost of checksumming a single block with
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing
> each checksum (instead of just one sha1 sum), may be faster than reading
> the block off of disk to compare it with the incoming block.  OTOH,
> there is still a mathematical possibility (however-more-remote) of a
> collission...

At that point it is better to extend the fingerprint size, since the SHA1
algorithm has a better distribution compared to md5 and crc32. Probability
estimates are pretty low though. If you consider a 2^32-block FS, which
with a 4KB block size makes a 16TB FS, the collision probability is on
the order of 2^(-95) (with a 160 bit fingerprint). That's a pretty low
number. Yes, it is true that the input is not completely random, but a
good property of SHA1 is that it spreads the outputs of very similar
input patterns.
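
(For reference, the estimate behind numbers like that is the usual
birthday bound: with n = 2^32 blocks and a b = 160 bit fingerprint
treated as uniform,

  P(collision) ~= n*(n-1)/2 * 2^(-b) ~= 2^63 * 2^(-160) = 2^(-97),

which is in the same ballpark.)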




- Davide


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 17:49         ` Val Henson
  2003-10-16 21:02           ` Jeff Garzik
@ 2003-10-16 23:04           ` jw schultz
  2003-10-16 23:30             ` Jeff Garzik
                               ` (4 more replies)
  1 sibling, 5 replies; 92+ messages in thread
From: jw schultz @ 2003-10-16 23:04 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 16, 2003 at 11:49:27AM -0600, Val Henson wrote:
> On Thu, Oct 16, 2003 at 10:29:30AM -0700, Larry McVoy wrote:
> > On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > > Josh and others should take a look at Plan9's venti file storage method 
> > > -- archival storage is a series of unordered blocks, all of which are 
> > > indexed by the sha1 hash of their contents.  This magically coalesces 
> > > all duplicate blocks by its very nature, including the loooooong runs of 
> > > zeroes that you'll find in many filesystems.  I bet savings on "all 
> > > bytes in this block are zero" are worth a bunch right there.
> > 
> > The only problem with this is that you can get false positives.  Val Hensen
> > recently wrote a paper about this.  It's really unlikely that you get false
> > positives but it can happen and it has happened in the field.  
> 
> To be fair, I talked to someone who claims that Venti now checks for
> hash collisions on writes, but that's not what the original paper
> describes and I haven't confirmed it.
> 
> The compare-by-hash paper is only 6 pages long, at least take the time
> to read it before you start using compare-by-hash:
> 
> http://www.usenix.org/events/hotos03/tech/henson.html
> 
> Abstract:
> 
>  "Recent research has produced a new and perhaps dangerous technique
>   for uniquely identifying blocks that I will call
>   compare-by-hash. Using this technique, we decide whether two blocks
>   are identical to each other by comparing their hash values, using a
>   collision-resistant hash such as SHA-1. If the hash values match,
>   we assume the blocks are identical without further ado. Users of
>   compare-by-hash argue that this assumption is warranted because the
>   chance of a hash collision between any two randomly generated blocks
>   is estimated to be many orders of magnitude smaller than the chance
>   of many kinds of hardware errors. Further analysis shows that this
>   approach is not as risk-free as it seems at first glance."
> 
> -VAL (not subscribed to l-k ATM)

Several months ago we encountered the hash collision problem
with rsync.  This brought about a fair amount of discussion
regarding the likelihood of false positives with the result
of some code changes.  I am not educationally equipped to
offer mathematical proofs myself, but I can highlight the
salient points.  If you wish more detail you can look it up
in the list archives.

1.  Internal collision: The smaller the hash size to data
    size ratio, the greater the likelihood of false
    positives.  A poor quality hash adversely affects this,
    but no matter how good the hash is it remains true.

2.  External collision: The smaller the hash size to unit
    count ratio, the greater the likelihood of false
    positives.  To put it in the extreme: if you have 10,000
    blocks you are almost guaranteed to get false positives
    on a 16 bit hash.  This is similar to filling the
    namespace.  It is external collision that was killing
    rsync on iso images.

Because each hash algorithm has different pathologies
multiple hashes are generally better than one but their
effective aggregate bit count is less than the sum of the
separate bit counts.

The idea of this sort of block level hashing to allow
sharing of identical blocks seems attractive, but I wouldn't
trust any design that did not accept as given that there
would be false positives.  This means that a write would
have to not only hash the block but then, if there is a
collision, do a compare of the raw data.  Then you have to
add the overhead of having lists of blocks that match a hash
value and reference counts for each block itself.  Further,
every block write would then need to include, at minimum,
relinking facilities on the hash lists and hash entry
allocations, plus the inode block list being updated.  Finally,
in the case of COW due to refcount != 0 you would have to do
a block allocation even when simply overwriting, which could
cause ENOSPC when an application was simply rewriting an
already allocated block or writing through mmap.


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 16:41       ` P
  2003-10-16 17:20         ` Jeff Garzik
@ 2003-10-16 23:12         ` jw schultz
  2003-10-17  8:03           ` John Bradford
  1 sibling, 1 reply; 92+ messages in thread
From: jw schultz @ 2003-10-16 23:12 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 16, 2003 at 05:41:34PM +0100, P@draigBrady.com wrote:
> Andrea Arcangeli wrote:
> >Hi Jeff,
> >
> >On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> >
> >>Josh and others should take a look at Plan9's venti file storage method 
> >>-- archival storage is a series of unordered blocks, all of which are 
> >>indexed by the sha1 hash of their contents.  This magically coalesces 
> >>all duplicate blocks by its very nature, including the loooooong runs of 
> >>zeroes that you'll find in many filesystems.  I bet savings on "all 
> >>bytes in this block are zero" are worth a bunch right there.
> >
> >I had a few ideas on the above.
> >
> >if the zero blocks are the problem, there's a tool called zum that nukes
> >them and replaces them with holes. I use it sometime, example:
> 
> Yes post processing with this tool is useful.
> Also note gnu cp (--sparse) inserts holes
> in files also.

As does cpio, rsync and many other archiving and data
transfer utilities.

> I thought a bit about this also and thought
> that in general the overhead of instant block/file
> duplicate merging is not worth it. Periodic
> post processing with a merging tool is
> much more efficient IMHO. Of course this is
> now only possible at the file level, but this
> is all that generally useful I think. I guess
> it's appropriate to plug my merging tool here :-)
> http://www.pixelbeat.org/fslint

What might be worth considering is internal NUL detection:
build block-of-zeros detection into the filesystem write
path, resulting in automatic creation of sparse files.
This could even work with extent-based filesystems, where
using hashes to identify shared blocks would not.
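
The detection itself is cheap.  A userspace-flavoured sketch (a real
implementation would live in the write path and use whatever block size
the filesystem does; 4 KiB is assumed here):

#include <stddef.h>

#define BLK 4096

/* Return nonzero if the block contains only zero bytes.
 * Assumes blk is at least long-aligned, as page cache buffers are. */
static int block_is_zero(const void *blk)
{
	const unsigned long *p = blk;
	size_t i;

	for (i = 0; i < BLK / sizeof(*p); i++)
		if (p[i])
			return 0;
	return 1;
}

If the check passes, the write path would simply skip the block
allocation and leave a hole in the block map (or extent tree).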

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 16:29     ` Andrea Arcangeli
                         ` (2 preceding siblings ...)
  2003-10-16 17:29       ` Larry McVoy
@ 2003-10-16 23:20       ` jw schultz
  2003-10-17 14:47         ` Eli Carter
  3 siblings, 1 reply; 92+ messages in thread
From: jw schultz @ 2003-10-16 23:20 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 16, 2003 at 06:29:26PM +0200, Andrea Arcangeli wrote:
> Hi Jeff,
> 
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > Josh and others should take a look at Plan9's venti file storage method 
> > -- archival storage is a series of unordered blocks, all of which are 
> > indexed by the sha1 hash of their contents.  This magically coalesces 
> > all duplicate blocks by its very nature, including the loooooong runs of 
> > zeroes that you'll find in many filesystems.  I bet savings on "all 
> > bytes in this block are zero" are worth a bunch right there.
> 
> I had a few ideas on the above.
> 
> if the zero blocks are the problem, there's a tool called zum that nukes
> them and replaces them with holes. I use it sometime, example:

With the exception of NUL blocks I doubt there is much
saving at any level below the file.  That is, only a tiny
fraction of files that are not full duplicates of one
another will share many aligned blocks.

Now, detecting files that are duplicates and linking them in
some way might be useful in a low-priority daemon.  But
the links created would have to preserve the files as
separate inodes, so that overwrites break the loose link but
not a user-created hardlink.


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:04           ` jw schultz
@ 2003-10-16 23:30             ` Jeff Garzik
  2003-10-16 23:58               ` jw schultz
  2003-10-17  0:45             ` Christopher Li
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 92+ messages in thread
From: Jeff Garzik @ 2003-10-16 23:30 UTC (permalink / raw)
  To: jw schultz; +Cc: linux-kernel

jw schultz wrote:
> Because each hash algorithm has different pathologies
> multiple hashes are generally better than one but their
> effective aggregate bit count is less than the sum of the
> separate bit counts.

I was coming to this conclusion too...  still, it's safer simply to 
handle collisions.


> The idea of this sort of block level hashing to allow
> sharing of identical blocks seems attractive but i wouldn't
> trust any design that did not accept as given that there
> would be false positives.  This means that a write would
> have to not only hash the block but then if there is a
> collision do a compare of the raw data.  Then you have to

yep


> add the overhead of having lists of blocks that match a hash
> value and reference counts for each block itself.  Further,
> every block write would then need to include, at minimum,
> relinking facilities on the hash lists and hash entry
> allocations plus the inode block list being updated. Finally

Consider a simple key-value map, where "$hash $n" is the key and the 
value is the block of data.  Only three operations need occur:
* if hash exists (highly unlikely!), read and compare w/ raw data
* write new block to disk
* add single datum to key-value index
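
To make those three operations concrete, here is a rough sketch in C.
The kv_get()/kv_put() helpers and the in-memory table stand in for
whatever real index is used (Berkeley DB or otherwise); they, the fixed
4 KiB block size, and the function names are assumptions of the sketch,
not part of any existing filesystem:

#include <openssl/sha.h>
#include <stdio.h>
#include <string.h>

#define BLK	4096
#define KEYLEN	64		/* "<40 hex chars> <n>" */
#define NSLOTS	1024		/* toy in-memory "index" */

static struct { char key[KEYLEN]; unsigned char val[BLK]; int used; } tab[NSLOTS];

static int kv_get(const char *key, unsigned char *val)
{
	int i;

	for (i = 0; i < NSLOTS; i++)
		if (tab[i].used && !strcmp(tab[i].key, key)) {
			memcpy(val, tab[i].val, BLK);
			return 0;
		}
	return -1;			/* not found */
}

static int kv_put(const char *key, const unsigned char *val)
{
	int i;

	for (i = 0; i < NSLOTS; i++)
		if (!tab[i].used) {
			tab[i].used = 1;
			snprintf(tab[i].key, KEYLEN, "%s", key);
			memcpy(tab[i].val, val, BLK);
			return 0;
		}
	return -1;			/* index full */
}

/* Store blk; fill key_out with the "<sha1hex> <n>" tag the caller records. */
static int store_block(const unsigned char *blk, char *key_out)
{
	unsigned char md[SHA_DIGEST_LENGTH], old[BLK];
	char hex[2 * SHA_DIGEST_LENGTH + 1];
	unsigned n;
	int i;

	SHA1(blk, BLK, md);
	for (i = 0; i < SHA_DIGEST_LENGTH; i++)
		sprintf(hex + 2 * i, "%02x", md[i]);
	hex[2 * SHA_DIGEST_LENGTH] = '\0';

	/* walk "<hash> 0", "<hash> 1", ... until a match or a free slot */
	for (n = 0; ; n++) {
		snprintf(key_out, KEYLEN, "%s %u", hex, n);
		if (kv_get(key_out, old) != 0)
			return kv_put(key_out, blk);	/* new block */
		if (!memcmp(old, blk, BLK))
			return 0;	/* identical block already stored */
		/* genuine collision: same hash, different data; try next n */
	}
}

Reference counting, the on-disk format, and crash consistency are of
course where the real work is; the point is just that the collision path
costs one extra read-and-compare per matching hash.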

Inode block lists would need to be updated regardless of any collision; 
that would be a standard and required part of any VFS interaction. And 
the internal workings of the key-value index (think Berkeley DB) are 
static, regardless of any collision.

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:58               ` jw schultz
@ 2003-10-16 23:53                 ` David Lang
  2003-10-17  1:19                 ` Jeff Garzik
  1 sibling, 0 replies; 92+ messages in thread
From: David Lang @ 2003-10-16 23:53 UTC (permalink / raw)
  To: jw schultz; +Cc: linux-kernel

On Thu, 16 Oct 2003, jw schultz wrote:

> Date: Thu, 16 Oct 2003 16:58:11 -0700
> From: jw schultz <jw@pegasys.ws>
> To: linux-kernel@vger.kernel.org
> Subject: Re: Transparent compression in the FS
>
> On Thu, Oct 16, 2003 at 07:30:58PM -0400, Jeff Garzik wrote:
> > jw schultz wrote:
> > >Because each hash algorithm has different pathologies
> > >multiple hashes are generally better than one but their
> > >effective aggregate bit count is less than the sum of the
> > >separate bit counts.
> >
> > I was coming to this conclusion too...  still, it's safer simply to
> > handle collisions.
>
> And because, in a filesystem, you have to handle collisions
> you balance the cost of the hash quality against the cost of
> collision.  A cheap hash with low colission rate is probably
> better than an expensive hash with a near-zero colission
> rate.

True, but one advantage of using multiple hashes is that once you already
have the data in the CPU you can probably do multiple cheap hashes for the
same cost as a single one (the same reason that calculating a checksum is
essentially free if you are already having the CPU touch the data).  When
the CPU is 12x+ faster than the memory bus you have a fair bit of CPU time
available for each byte of data that you are dealing with.

David Lang

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:30             ` Jeff Garzik
@ 2003-10-16 23:58               ` jw schultz
  2003-10-16 23:53                 ` David Lang
  2003-10-17  1:19                 ` Jeff Garzik
  0 siblings, 2 replies; 92+ messages in thread
From: jw schultz @ 2003-10-16 23:58 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 16, 2003 at 07:30:58PM -0400, Jeff Garzik wrote:
> jw schultz wrote:
> >Because each hash algorithm has different pathologies
> >multiple hashes are generally better than one but their
> >effective aggregate bit count is less than the sum of the
> >separate bit counts.
> 
> I was coming to this conclusion too...  still, it's safer simply to 
> handle collisions.

And because, in a filesystem, you have to handle collisions,
you balance the cost of the hash quality against the cost of
collision.  A cheap hash with a low collision rate is probably
better than an expensive hash with a near-zero collision
rate.

> 
> 
> >The idea of this sort of block level hashing to allow
> >sharing of identical blocks seems attractive but i wouldn't
> >trust any design that did not accept as given that there
> >would be false positives.  This means that a write would
> >have to not only hash the block but then if there is a
> >collision do a compare of the raw data.  Then you have to
> 
> yep
> 
> 
> >add the overhead of having lists of blocks that match a hash
> >value and reference counts for each block itself.  Further,
> >every block write would then need to include, at minimum,
> >relinking facilities on the hash lists and hash entry
> >allocations plus the inode block list being updated. Finally
> 
> Consider a simple key-value map, where "$hash $n" is the key and the 
> value is the block of data.  Only three operations need occur:
> * if hash exists (highly unlikely!), read and compare w/ raw data
> * write new block to disk
> * add single datum to key-value index

Nicely described, but each block now needs a reference count
which is incremented if the raw compare yields a positive and
decremented when a reference to it receives a write.
It may help to also have a reverse reference somewhere
from the block to the hash.

More detail than this gets into writing pseudocode.

> Inode block lists would need to be updated regardless of any collision; 
> that would be a standard and required part of any VFS interaction. And 

Under current schemes, if I overwrite an already allocated
block of a file, the block list need not be updated unless
the block is relocated.  But that is a nit.

> the internal workings of the key-value index (think Berkeley DB) are 
> static, regardless of any collision.
> 
> 	Jeff
> 
> 
> 

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:04           ` jw schultz
  2003-10-16 23:30             ` Jeff Garzik
@ 2003-10-17  0:45             ` Christopher Li
  2003-10-17  1:16               ` Jeff Garzik
  2003-10-17  1:32             ` jlnance
                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 92+ messages in thread
From: Christopher Li @ 2003-10-17  0:45 UTC (permalink / raw)
  To: jw schultz, linux-kernel

> The idea of this sort of block level hashing to allow
> sharing of identical blocks seems attractive but i wouldn't
> trust any design that did not accept as given that there
> would be false positives.  This means that a write would
> have to not only hash the block but then if there is a
> collision do a compare of the raw data.  Then you have to
> add the overhead of having lists of blocks that match a hash
> value and reference counts for each block itself.  Further,

Then writing every data block will need to dirty at least 2 blocks.
And it also needs to read back the original block if the hash exists.
There must be some performance hit.

Chris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  0:45             ` Christopher Li
@ 2003-10-17  1:16               ` Jeff Garzik
  0 siblings, 0 replies; 92+ messages in thread
From: Jeff Garzik @ 2003-10-17  1:16 UTC (permalink / raw)
  To: Christopher Li; +Cc: jw schultz, linux-kernel

Christopher Li wrote:
>>The idea of this sort of block level hashing to allow
>>sharing of identical blocks seems attractive but i wouldn't
>>trust any design that did not accept as given that there
>>would be false positives.  This means that a write would
>>have to not only hash the block but then if there is a
>>collision do a compare of the raw data.  Then you have to
>>add the overhead of having lists of blocks that match a hash
>>value and reference counts for each block itself.  Further,
> 
> 
> Then write every data block will need to dirty at least 2 blocks.
> And it also need to read back the original block if hash exist.
> There must be some performance hit. 


In my case at least, we're talking about archival storage.  Plan9 uses a 
"write buffer" of 1-2GB or so, to mitigate performance loss, which seems 
reasonable.  With archival storage and hash indexes and such, you're 
certainly going to be dirtying more disk blocks than a traditional local 
filesystem would.

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:58               ` jw schultz
  2003-10-16 23:53                 ` David Lang
@ 2003-10-17  1:19                 ` Jeff Garzik
  1 sibling, 0 replies; 92+ messages in thread
From: Jeff Garzik @ 2003-10-17  1:19 UTC (permalink / raw)
  To: jw schultz; +Cc: linux-kernel

jw schultz wrote:
>>Consider a simple key-value map, where "$hash $n" is the key and the 
>>value is the block of data.  Only three operations need occur:
>>* if hash exists (highly unlikely!), read and compare w/ raw data
>>* write new block to disk
>>* add single datum to key-value index
> 
> 
> Nicely described, but each block now needs a reference count
> which is incrmented if the raw compare yields a positive and
> decremented when a reference to it receives a write.

Yep.  I'm pondering a sort of a closely-interconnected distributed cache 
for archival storage, so I have to care about refcounts on remote 
servers, too.


> It may help to also have a reverse reference somewhere
> from the block to the hash.

I think I can avoid this...


>>Inode block lists would need to be updated regardless of any collision; 
>>that would be a standard and required part of any VFS interaction. And 
> 
> 
> Under current schemes if i overwrite an already allocated
> block of a file the block list need not be updated unless
> the block is relocated.  But that is a nit.

Yep.  For a networked filesystem, this is less annoying and clients will 
not normally notice any slowdown from this.

	Jeff




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:04           ` jw schultz
  2003-10-16 23:30             ` Jeff Garzik
  2003-10-17  0:45             ` Christopher Li
@ 2003-10-17  1:32             ` jlnance
  2003-10-17  1:47               ` Eric Sandall
                                 ` (3 more replies)
  2003-10-17  9:44             ` Pavel Machek
  2003-10-27  2:22             ` Mike Fedyk
  4 siblings, 4 replies; 92+ messages in thread
From: jlnance @ 2003-10-17  1:32 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 16, 2003 at 04:04:48PM -0700, jw schultz wrote:
> 
> The idea of this sort of block level hashing to allow
> sharing of identical blocks seems attractive but i wouldn't
> trust any design that did not accept as given that there
> would be false positives.

But at the same time we rely on TCP/IP, which uses a hash (checksum)
to detect bad packets.  It seems to work well in practice even
though the hash is weak and the network corrupts a lot of packets.

Lots of machines don't have ECC RAM and seem to work reasonably well.

It seems like these two are a lot more likely to bite you than hash
collisions in MD5.  But I'll have to go read the paper to see what
I'm missing.

Thanks,

Jim

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 14:27     ` Erik Mouw
                         ` (2 preceding siblings ...)
  2003-10-15 21:36       ` Tomas Szepe
@ 2003-10-17  1:32       ` Eric W. Biederman
  3 siblings, 0 replies; 92+ messages in thread
From: Eric W. Biederman @ 2003-10-17  1:32 UTC (permalink / raw)
  To: Erik Mouw; +Cc: Nikita Danilov, Josh Litherland, linux-kernel

Erik Mouw <erik@harddisk-recovery.com> writes:

> On Wed, Oct 15, 2003 at 05:50:38PM +0400, Nikita Danilov wrote:
> > Erik Mouw writes:
> >  > Nowadays disks are so incredibly cheap, that transparent compression
> >  > support is not realy worth it anymore (IMHO).
> > 
> > But disk bandwidth is so incredibly expensive that compression becoming
> > more and more useful: on compressed file system bandwidth of user-data
> > transfers can be larger than raw disk bandwidth. It is the same
> > situation as with allocation of disk space for files: disks are cheap,
> > but storing several files in the same block becomes more advantageous
> > over time.
> 
> You have a point, but remember that modern IDE drives can do about
> 50MB/s from medium. I don't think you'll find a CPU that is able to
> handle transparent decompression on the fly at 50MB/s, even not with a
> simple compression scheme as used in NTFS (see the NTFS docs on
> SourceForge for details).

You don't need to do 50MB/s decompression for compression to be
a win though.  If the decompressor consumes 16MB/s of compressed input
with an average 3:1 ratio, you have achieved the same speed as reading
uncompressed data off the disk.  So you only need > 16MB/s for it to be
a speed win.  The double buffering and CPU resources may give rise to
other throughput issues though.
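
(In other words, with raw disk bandwidth B and compression ratio r, the
decompressor only has to keep up with the compressed stream:

  required input rate = B / r = 50MB/s / 3 ~= 16.7MB/s

while still delivering ~50MB/s of uncompressed data to the reader.)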

Beyond that, modern disks cannot do 50MB/s of random I/O; seeks are too
expensive.  So by compressing data you can fit it into a smaller
space and, with care, reduce your seek overhead.

Looking for some real world examples, I picked one of the log files that
I generate from my daily development activities.  It was archived
with logrotate and already compressed, and I just picked the largest
one.  The file is a text log file; it is not full of zeros, so
run-length encoding it would not be useful.

Compressed it is 5.5M; decompressed it is 128M.
At home on my Athlon 700MHz I can decompress it in roughly 3 seconds.
At work on my Xeon 1.7GHz I can decompress it in 1 second.
On my test Opteron at 1.4GHz I can decompress it in 0.9 seconds.

So I would get an effective 42MB/s at home, 128MB/s at work, and 142MB/s on my
test box.  So for some loads there is certainly a performance win to
be had.

The way to incorporate this into the write path of a filesystem
is to use delayed allocation, and put off the write as long as possible.
Then compress the file, allocate the blocks for it and write it in one
big piece.   This allows very big compressed blocks.  e2compr I think stopped
development before compression was implemented in the right place.

And then there are other cases where compression is worthwhile like zisofs.
You can't make a CD bigger so to stuff more data on it you must compress it.
And then you have more files you can use immediately.

I suspect other things like a transparently compressed glibc would also be a win.
You always have it in RAM, so it might as well be compressed on disk
and save space.

And there are enough embedded cases out there where people are
squeezing hardware budgets to bring prices down as far as they can;
living on less hardware is a good thing if you can do so easily.

Eric

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  1:32             ` jlnance
@ 2003-10-17  1:47               ` Eric Sandall
  2003-10-17  8:11                 ` John Bradford
  2003-10-17 13:07                 ` jlnance
  2003-10-17  1:49               ` Davide Libenzi
                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 92+ messages in thread
From: Eric Sandall @ 2003-10-17  1:47 UTC (permalink / raw)
  To: linux-kernel

Quoting jlnance@unity.ncsu.edu:
> But at the same time we rely on TCP/IP which uses a hash (checksum)
> to detect back packets.  It seems to work well in practice even
> though the hash is weak and the network corrupts a lot of packets.
> 
> Lots of machines dont have ECC ram and seem to work reasonably well.
> 
> It seems like these two are a lot more likely to bit you than hash
> collisions in MD5.  But Ill have to go read the paper to see what
> Im missing.
> 
> Thanks,
> 
> Jim

It doesn't really matter that a hash collision is /less/ likely to ruin data
than something in hardware, as it adds an /extra/ layer of possible corruption,
so you have a net gain in the possible corruption of your data.  Now, if you
could write it so that there was /no/ possibility of data corruption, then it
would be much more acceptable, as it wouldn't add any extra likelihood of
corruption beyond what already exists.

-sandalle

-- 
PGP Key Fingerprint:  FCFF 26A1 BE21 08F4 BB91  FAED 1D7B 7D74 A8EF DD61
http://search.keyserver.net:11371/pks/lookup?op=get&search=0xA8EFDD61

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS/E/IT$ d-- s++:+>: a-- C++(+++) BL++++VIS>$ P+(++) L+++ E-(---) W++ N+@ o?
K? w++++>-- O M-@ V-- PS+(+++) PE(-) Y++(+) PGP++(+) t+() 5++ X(+) R+(++)
tv(--)b++(+++) DI+@ D++(+++) G>+++ e>+++ h---(++) r++ y+
------END GEEK CODE BLOCK------

Eric Sandall                     |  Source Mage GNU/Linux Developer
eric@sandall.us                  |  http://www.sourcemage.org/
http://eric.sandall.us/          |  SysAdmin @ Inst. Shock Physics @ WSU
http://counter.li.org/  #196285  |  http://www.shock.wsu.edu/

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  1:32             ` jlnance
  2003-10-17  1:47               ` Eric Sandall
@ 2003-10-17  1:49               ` Davide Libenzi
  2003-10-17  1:59               ` Larry McVoy
  2003-10-17  2:19               ` jw schultz
  3 siblings, 0 replies; 92+ messages in thread
From: Davide Libenzi @ 2003-10-17  1:49 UTC (permalink / raw)
  To: jlnance; +Cc: Linux Kernel Mailing List

On Thu, 16 Oct 2003 jlnance@unity.ncsu.edu wrote:

> On Thu, Oct 16, 2003 at 04:04:48PM -0700, jw schultz wrote:
> >
> > The idea of this sort of block level hashing to allow
> > sharing of identical blocks seems attractive but i wouldn't
> > trust any design that did not accept as given that there
> > would be false positives.
>
> But at the same time we rely on TCP/IP which uses a hash (checksum)
> to detect back packets.  It seems to work well in practice even
> though the hash is weak and the network corrupts a lot of packets.
>
> Lots of machines dont have ECC ram and seem to work reasonably well.
>
> It seems like these two are a lot more likely to bit you than hash
> collisions in MD5.  But Ill have to go read the paper to see what
> Im missing.

The TCP/ECC thingies are different, since the probability of a false
negative is the combined probability that 1) the data is wrong and 2) the
hash collides. In the case of a hash-indexing algo the probability of
colliding hashes is "raw".
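
Spelled out: if p_c is the probability a block/packet is corrupted and
p_h the probability the hash fails to notice, the checksum case only
loses when both happen,

  P(undetected error) ~= p_c * p_h

whereas compare-by-hash treats any collision between two *valid* blocks
as a match, so p_h is paid "raw", with no corruption required.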



- Davide


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  1:32             ` jlnance
  2003-10-17  1:47               ` Eric Sandall
  2003-10-17  1:49               ` Davide Libenzi
@ 2003-10-17  1:59               ` Larry McVoy
  2003-10-17  2:19               ` jw schultz
  3 siblings, 0 replies; 92+ messages in thread
From: Larry McVoy @ 2003-10-17  1:59 UTC (permalink / raw)
  To: jlnance; +Cc: linux-kernel

On Thu, Oct 16, 2003 at 09:32:45PM -0400, jlnance@unity.ncsu.edu wrote:
> Lots of machines dont have ECC ram and seem to work reasonably well.

That's because you have two choices in RAM today: ECC and straight memory,
no parity.  So you never know that there is a problem until /bin/ls starts
core dumping.

BK users catch memory errors all the time because BK checksums the data.
Even with a very (!) weak checksum we see it (we have retained, perhaps
stupidly, backwards compat with SCCS' 16 bit checksum - not CRC).  One
nice thing about the weak checksum is that we can tell if it is a memory
error from looking at the got/wanted values for the checksum; single bit
errors are obvious.  It happens so frequently that we have learned to
recognize it and tell the customer within seconds of getting the mail.
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  1:32             ` jlnance
                                 ` (2 preceding siblings ...)
  2003-10-17  1:59               ` Larry McVoy
@ 2003-10-17  2:19               ` jw schultz
  3 siblings, 0 replies; 92+ messages in thread
From: jw schultz @ 2003-10-17  2:19 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 16, 2003 at 09:32:45PM -0400, jlnance@unity.ncsu.edu wrote:
> On Thu, Oct 16, 2003 at 04:04:48PM -0700, jw schultz wrote:
> > 
> > The idea of this sort of block level hashing to allow
> > sharing of identical blocks seems attractive but i wouldn't
> > trust any design that did not accept as given that there
> > would be false positives.
> 
> But at the same time we rely on TCP/IP which uses a hash (checksum)
> to detect back packets.  It seems to work well in practice even
> though the hash is weak and the network corrupts a lot of packets.
> 
> Lots of machines dont have ECC ram and seem to work reasonably well.

That is because most of the errors (which are few) get lost
in the noise of BSODs or are trivial data errors.  Can you
tell whether your application crashed because it had a bug
or because a bit in memory flipped?  Is tiis a typm or did a
bit or two flip on this email message?  There is a big
difference between single bit errors and having an entire
block of a file be wrong.

> It seems like these two are a lot more likely to bit you than hash
> collisions in MD5.  But Ill have to go read the paper to see what
> Im missing.

There is a big difference between the probability that one given pair of
blocks produces a false positive (let alone that a block with some
corruption still hashes to the same value) and the probability of a false
positive between one block and any of millions of others.

It is a bit like the difference in odds between you winning this week's
lotto and anyone winning this week.  Are you willing to bet that nobody
wins this week's lotto?  Would you stake your life savings on it?
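
The same point with the usual birthday-bound approximation (the block
counts and hash widths below are only example figures):

    import math

    def p_any_collision(n_blocks, hash_bits):
        # Birthday-bound approximation: chance that at least one pair among
        # n_blocks blocks collides in a hash of the given width.
        expected_pairs = n_blocks * (n_blocks - 1) / 2.0
        return -math.expm1(-expected_pairs / 2.0 ** hash_bits)

    print(p_any_collision(2, 16))       # one given pair, 16 bit hash: ~1.5e-5
    print(p_any_collision(10000, 16))   # 10,000 blocks, 16 bit hash: ~1.0
    print(p_any_collision(10000, 160))  # 10,000 blocks, 160 bit hash: ~3.4e-41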


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 21:02           ` Jeff Garzik
  2003-10-16 21:18             ` Chris Meadors
  2003-10-16 21:33             ` Davide Libenzi
@ 2003-10-17  3:47             ` Mark Mielke
  2003-10-17 14:31             ` Jörn Engel
  3 siblings, 0 replies; 92+ messages in thread
From: Mark Mielke @ 2003-10-17  3:47 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Val Henson, Larry McVoy, linux-kernel

On Thu, Oct 16, 2003 at 05:02:30PM -0400, Jeff Garzik wrote:
> I'm curious if anyone has done any work on using multiple different 
> checksums?  For example, the cost of checksumming a single block with 
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing 
> each checksum (instead of just one sha1 sum), may be faster than reading 
> the block off of disk to compare it with the incoming block.  OTOH, 
> there is still a mathematical possibility (however-more-remote) of a 
> collission...

I believe you are basing your conclusions on an incorrect hunch.

Figure that you had a 'perfect' hash algorithm, that would turn
any 32768 bit block (4096 bytes) into a 32 bit hash value. We will
define 'perfect' to mean that the algorithm is balanced (i.e. the
weight distribution of the hash values is approximately even).

Then, we find a second 'perfect' hash algorithm that can be
mathematically proven to have no similarities to the first algorithm,
and also maps 32768 bit blocks to 32 bit hash values.

The result is two 32 bit values, or a single 64 bit value.  The number of
values that can be represented using 64 bits is 2**64.

The number of values that can be represented using 32768 bits is
2**32768.  There are guaranteed to be *many* combinations of bytes
that produce the same set of 32-bit hash values.
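
Just to make the counting explicit (nothing clever, only the pigeonhole
arithmetic):

    # Two 32-bit digests make a 64-bit tag, but there are vastly more
    # possible 4096-byte blocks than tags, so collisions must exist.
    block_bits = 4096 * 8    # 32768 bits per block
    tag_bits = 32 + 32       # two "perfect" 32-bit hashes side by side
    print("possible blocks: 2**%d" % block_bits)
    print("possible tags:   2**%d" % tag_bits)
    print("blocks per tag, on average: 2**%d" % (block_bits - tag_bits))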

Now, we come back to reality.  In practice, I suggest that it is
impossible (or unreasonably difficult) to come up with two 'perfect'
hash algorithms that have no similarities between them.  It is likely
that the algorithms are not perfect and have weaknesses (i.e. certain
hash values result more frequently than others), and also have
similarities that cause these weaknesses to compound, meaning that
multiple algorithms may make the final result *less* unique than using
the single best algorithm with a larger hash value.

On the other side of reality, certain types of blocks are a lot more
likely to appear than others within the same file, and certain types of
damage to files affect blocks in only certain ways.  Meaning -- it is
statistically difficult for a block to be altered or damaged in such a
way as to trick any proper checksum/hash algorithm into thinking the
block is valid when it isn't.

Unclear conclusion? :-)

I believe that checksum/hash is a perfectly valid way of verifying the
consistency of data (no surprise here). I believe that using multiple
algorithms is silly. Instead, pick the best algorithm, and increase the
number of bits. I'll let the experts define 'best', but will suggest that
'best' is relative to the application. For a transmission protocol, 'best'
may mean 'best given that data corruption is only expected to happen in
the following ways'. For digital signatures, 'best' may mean 'statistically
difficult to create a message with the same hash value, that presents a
different apparently valid message'.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:12         ` jw schultz
@ 2003-10-17  8:03           ` John Bradford
  2003-10-17 14:53             ` Eli Carter
  0 siblings, 1 reply; 92+ messages in thread
From: John Bradford @ 2003-10-17  8:03 UTC (permalink / raw)
  To: jw schultz, linux-kernel

> What might be worth considering is internal NUL detection.
> Build the block-of-zeros detection into the filesystem
> write resulting in automatic creation of sparse files.
> This could even work with extent based filesystems where
> using hashes to identify shared blocks would not.

Another idea could be writing uncompressed data to the disk and
background-compressing it with something like bzip2, but keeping the
uncompressed data on disk as well, only overwriting it when disk
space is low, and then overwriting the least recently used files
first.

The upshot of all that would be that if you needed space, it would be
there (just overwrite the uncompressed versions of files), but until
you do, you can access the uncompressed data quickly.

You could even take it one step further, and compress files with gzip
by default, and re-compress them with bzip2 after long periods of
inactivity.
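
A user-space sketch of the quoted block-of-zeros idea (a hypothetical copy
tool; in a real filesystem the check would sit in the write path):

    import os, sys

    BLOCK = 4096
    ZERO = bytes(BLOCK)

    def sparse_copy(src_path, dst_path):
        # Copy src to dst, turning runs of zero blocks into holes.
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                block = src.read(BLOCK)
                if not block:
                    break
                if block == ZERO[:len(block)]:
                    # Seek forward instead of writing zeros; the gap
                    # becomes a hole in the output file.
                    dst.seek(len(block), os.SEEK_CUR)
                else:
                    dst.write(block)
            dst.truncate(dst.tell())   # account for a trailing hole

    if __name__ == "__main__":
        sparse_copy(sys.argv[1], sys.argv[2])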

John.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  1:47               ` Eric Sandall
@ 2003-10-17  8:11                 ` John Bradford
  2003-10-17 17:53                   ` Eric Sandall
  2003-10-17 13:07                 ` jlnance
  1 sibling, 1 reply; 92+ messages in thread
From: John Bradford @ 2003-10-17  8:11 UTC (permalink / raw)
  To: Eric Sandall, linux-kernel

> > But at the same time we rely on TCP/IP which uses a hash (checksum)
> > to detect back packets.  It seems to work well in practice even
> > though the hash is weak and the network corrupts a lot of packets.
> > 
> > Lots of machines dont have ECC ram and seem to work reasonably well.
> > 
> > It seems like these two are a lot more likely to bit you than hash
> > collisions in MD5.  But Ill have to go read the paper to see what
> > Im missing.
> 
> It doesn't really matter that the hash collision is /less/ likely to ruin data
> than something in hardware as it adds an /extra/ layer of possible corruption,
> so you have a net gain in the possible corruption of your data.  Now, if you
> could write it so that there was /no/ possibility of data corruption, than it
> would be much more acceptable as it wouldn't add any extra likeliness of
> corruption than already exists.

Comparing the reliability of the hash function to the reliability of
hardware is an apples to oranges comparison.  Far more interesting
would be to compare the correlation between reliability of each of
them to the input data.

I.e.:

The hash function scores 100% on this - certain patterns will _always_
trigger a fault.  Once you find a way to make it corrupt data, it will
be 100% reproducible using the same dataset.

The hardware will typically score less than 100% - certain patterns of
data in memory are more likely to make a weak RAM chip flip a bit; we
have seen many examples of that, where a machine dies quickly running
memtest86 but will run Linux apparently fine for at least a while.
However, other things such as temperature affect this.  It's rare
to find a memory access pattern that will 100% reliably lock up a
machine in exactly the same way.

John.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:04           ` jw schultz
                               ` (2 preceding siblings ...)
  2003-10-17  1:32             ` jlnance
@ 2003-10-17  9:44             ` Pavel Machek
  2003-10-17 12:33               ` jlnance
  2003-10-17 18:23               ` jw schultz
  2003-10-27  2:22             ` Mike Fedyk
  4 siblings, 2 replies; 92+ messages in thread
From: Pavel Machek @ 2003-10-17  9:44 UTC (permalink / raw)
  To: jw schultz, linux-kernel

Hi!

> Several months ago we encountered the hash collision problem
> with rsync.  This brought about a fair amount of discussion

So you found a collision in something like md5 or sha1?

							Pavel

-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 13:33 ` Erik Mouw
                     ` (3 preceding siblings ...)
  2003-10-16  8:27   ` tconnors+linuxkernel1066292516
@ 2003-10-17 10:55   ` Ingo Oeser
  4 siblings, 0 replies; 92+ messages in thread
From: Ingo Oeser @ 2003-10-17 10:55 UTC (permalink / raw)
  To: Erik Mouw, Josh Litherland; +Cc: linux-kernel

On Wednesday 15 October 2003 15:33, Erik Mouw wrote:
> On Tue, Oct 14, 2003 at 04:30:50PM -0400, Josh Litherland wrote:
> > Are there any filesystems which implement the transparent compression
> > attribute ?  (chattr +c)
>
> Nowadays disks are so incredibly cheap, that transparent compression
> support is not realy worth it anymore (IMHO).

But disk bandwidth is still a problem and compression helps out here.



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  9:44             ` Pavel Machek
@ 2003-10-17 12:33               ` jlnance
  2003-10-17 18:23               ` jw schultz
  1 sibling, 0 replies; 92+ messages in thread
From: jlnance @ 2003-10-17 12:33 UTC (permalink / raw)
  To: linux-kernel

On Fri, Oct 17, 2003 at 11:44:44AM +0200, Pavel Machek wrote:
> Hi!
> 
> > Several months ago we encountered the hash collision problem
> > with rsync.  This brought about a fair amount of discussion
> 
> So you found collision in something like md5 or sha1?

No, rsync uses a much weaker hash.  The paper on the rsync algorithm is
interesting and has all the details, so you should read it if you want
to be sure.  But from my memory rsync uses a combination of a weak 16 bit
hash which it rolls through the data, and a strong 32 bit hash which it
uses to check the 16 bit hash.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-15 18:06               ` Hans Reiser
@ 2003-10-17 12:51                 ` Edward Shushkin
  0 siblings, 0 replies; 92+ messages in thread
From: Edward Shushkin @ 2003-10-17 12:51 UTC (permalink / raw)
  To: Hans Reiser
  Cc: root, Nikita Danilov, Erik Mouw, Josh Litherland, Linux kernel

> Richard B. Johnson wrote:
> 
> >Then the degenerative case is no compression at all. There is no
> >advantage to writing a block that is 1/N full of compressed data.
> >You end up having to write the whole block anyway.
> >
> >This problem was well developed in the days where RLE (at the hardware
> >level) was used to extend the size of hard disks from their physical
> >size of about 38 megabytes to about 70 megabytes. The minimim size
> >of a read or write is a sector.
> >
> >So, lets's use the minimum compression alogithm, no sliding
> >dictionaries complicating things, just RLE and see.
> >
> >The alogithm is a sentinal byte, a byte representing the number
> >of bytes to expand -1, then the byte to expand. The sentinal byte
> >in RLE was 0xbb. If you needed to read/write a 0xbb, you need
> >to expand that to three bytes, 0xbb, 0x00, 0xbb.
> >                                  |     |     |___ byte to expand
> >                                  |     |________ nr bytes (0 + 1)
> >                                  |______________ sentinal byte
> >
> >All other sequences will reduce the size. So, we have a 512-
> >byte sector full of nulls, what gets written is:
> >        0xbb, 0xff, 0x00, 0xbb, 0xff, 0x00
> >           |     |     |     |     |     |___ byte value
> >           |     |     |     |     |_________ 256 bytes
> >           |     |     |     |_______________ sentinal
> >           |     |     |_____________________ byte value
> >           |     |___________________________ 256 bytes
> >           |_________________________________ sentinal.
> >
> >In this example, we have compressed a 512 byte sector to
> >only 6 bytes. Wonderful! Now we have to write 512 bytes
> >so that effort was wasted. 

Reiser4 packs each compressed block so that it is represented by a set
of tails (fragments).  This approach keeps the compression ratio as high
as possible.
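
As an aside, a toy encoder for the RLE scheme described in the quoted text
(sentinel 0xbb, count minus one, value) makes the 512-zeros-to-6-bytes
arithmetic concrete:

    SENTINEL = 0xBB

    def rle_encode(data):
        # Runs shorter than 4 bytes stay literal (a triple costs 3 bytes);
        # a literal sentinel byte is escaped as SENTINEL, 0x00, SENTINEL.
        out = bytearray()
        i = 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i] and run < 256:
                run += 1
            if run >= 4 or data[i] == SENTINEL:
                out += bytes([SENTINEL, run - 1, data[i]])
                i += run
            else:
                out.append(data[i])
                i += 1
        return bytes(out)

    assert rle_encode(bytes(512)) == bytes([0xBB, 0xFF, 0x00]) * 2
    assert rle_encode(b"\xbb") == b"\xbb\x00\xbb"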

Edward.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  1:47               ` Eric Sandall
  2003-10-17  8:11                 ` John Bradford
@ 2003-10-17 13:07                 ` jlnance
  2003-10-17 14:16                   ` Jeff Garzik
  1 sibling, 1 reply; 92+ messages in thread
From: jlnance @ 2003-10-17 13:07 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 16, 2003 at 06:47:15PM -0700, Eric Sandall wrote:

> It doesn't really matter that the hash collision is /less/ likely to ruin data
> than something in hardware as it adds an /extra/ layer of possible corruption,
> so you have a net gain in the possible corruption of your data.  Now, if you
> could write it so that there was /no/ possibility of data corruption, than it
> would be much more acceptable as it wouldn't add any extra likeliness of
> corruption than already exists.

This assumes that the probability of there being a bug in the code which
checks for identical blocks is less than the probability of a hash collision.
I am not sure that is a good assumption.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 17:29       ` Larry McVoy
  2003-10-16 17:49         ` Val Henson
  2003-10-16 18:28         ` John Bradford
@ 2003-10-17 13:16         ` Ingo Oeser
  2 siblings, 0 replies; 92+ messages in thread
From: Ingo Oeser @ 2003-10-17 13:16 UTC (permalink / raw)
  To: Larry McVoy; +Cc: val, linux-kernel

On Thursday 16 October 2003 19:29, Larry McVoy wrote:
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > Josh and others should take a look at Plan9's venti file storage method
> > -- archival storage is a series of unordered blocks, all of which are
> > indexed by the sha1 hash of their contents.  This magically coalesces
> > all duplicate blocks by its very nature, including the loooooong runs of
> > zeroes that you'll find in many filesystems.  I bet savings on "all
> > bytes in this block are zero" are worth a bunch right there.
>
> The only problem with this is that you can get false positives.  Val Hensen
> recently wrote a paper about this.  It's really unlikely that you get false
> positives but it can happen and it has happened in the field.

Which is solvable by extending your index with an enumerator for identical
keys with different content, and keeping an overflow table exactly for this.

Handling hash collisions is normal for hashing (even for
crypto-hashes).  Why did they forget that?  That's 2nd semester CS!
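
Something like this sketch (illustrative only, not any real filesystem's
code):

    import hashlib

    class DedupStore:
        # Blocks are addressed by (digest, n); n > 0 is only used when two
        # *different* blocks happen to share a digest.
        def __init__(self):
            self.blocks = {}              # (digest, n) -> block contents

        def put(self, block):
            digest = hashlib.sha1(block).digest()
            n = 0
            while (digest, n) in self.blocks:
                if self.blocks[(digest, n)] == block:
                    return (digest, n)    # true duplicate: share the block
                n += 1                    # collision: use an overflow slot
            self.blocks[(digest, n)] = block
            return (digest, n)

        def get(self, key):
            return self.blocks[key]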

Regards

Ingo Oeser



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17 13:07                 ` jlnance
@ 2003-10-17 14:16                   ` Jeff Garzik
  2003-10-17 15:06                     ` Valdis.Kletnieks
  0 siblings, 1 reply; 92+ messages in thread
From: Jeff Garzik @ 2003-10-17 14:16 UTC (permalink / raw)
  To: jlnance; +Cc: linux-kernel

On Fri, Oct 17, 2003 at 09:07:29AM -0400, jlnance@unity.ncsu.edu wrote:
> On Thu, Oct 16, 2003 at 06:47:15PM -0700, Eric Sandall wrote:
> 
> > It doesn't really matter that the hash collision is /less/ likely to ruin data
> > than something in hardware as it adds an /extra/ layer of possible corruption,
> > so you have a net gain in the possible corruption of your data.  Now, if you
> > could write it so that there was /no/ possibility of data corruption, than it
> > would be much more acceptable as it wouldn't add any extra likeliness of
> > corruption than already exists.
> 
> This assumes that the probability of there being a bug in the code which
> checks for identical blocks is less than the probability of a hash collision.
> I am not sure that is a good assumption.

The complexity of a memcmp() is pretty low...

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 21:02           ` Jeff Garzik
                               ` (2 preceding siblings ...)
  2003-10-17  3:47             ` Mark Mielke
@ 2003-10-17 14:31             ` Jörn Engel
  3 siblings, 0 replies; 92+ messages in thread
From: Jörn Engel @ 2003-10-17 14:31 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Val Henson, Larry McVoy, linux-kernel

On Thu, 16 October 2003 17:02:30 -0400, Jeff Garzik wrote:
> 
> I'm curious if anyone has done any work on using multiple different 
> checksums?  For example, the cost of checksumming a single block with 
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing 
> each checksum (instead of just one sha1 sum), may be faster than reading 
> the block off of disk to compare it with the incoming block.  OTOH, 
> there is still a mathematical possibility (however-more-remote) of a 
> collission...

Would be interesting.  The underlying assumptions of compare-by-hash
are a) a cryptographically strong hash and b) a sufficient hash space.
Since no one has proven a) yet for any hash, it would be necessary to
store multiple hashes and simply ignore one of them as soon as that
particular hash is shown to be weak.

As a side-effect, you could search for hash collisions this way.  A new
block that has the same md5 hash as some other block, but a different
sha1 and crc32 hash, tells you a lot. :)
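
For instance, a sketch of keeping several digests side by side (standard
library calls only, nothing filesystem-specific):

    import hashlib, zlib

    def block_fingerprint(block):
        # Store independent digests next to each other; if one algorithm is
        # later shown to be weak, its field can simply be ignored.
        return {
            "sha1":  hashlib.sha1(block).hexdigest(),
            "md5":   hashlib.md5(block).hexdigest(),
            "crc32": zlib.crc32(block) & 0xFFFFFFFF,
        }

    def same_block(fp_a, fp_b, trusted=("sha1", "md5", "crc32")):
        return all(fp_a[k] == fp_b[k] for k in trusted)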

Jörn

-- 
This above all: to thine own self be true.
-- Shakespeare

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:20       ` jw schultz
@ 2003-10-17 14:47         ` Eli Carter
  0 siblings, 0 replies; 92+ messages in thread
From: Eli Carter @ 2003-10-17 14:47 UTC (permalink / raw)
  To: jw schultz; +Cc: linux-kernel

jw schultz wrote:
[snip]
> Now detecting files that are duplicates and linking them in
> some way might be a useful in a low-priority daemon.  But
> the links created would have to be sure to preserve them as
> seperate inodes so that overwrites break the loose link but
> not the user-created hardlink.

This would be very useful even in other situations...  particularly the 
'cp -lR linux-2.4.22 linux-2.4.22-work' trick.  Not having to worry 
about modifying the original version would be nice.  (Note that chmod -w 
can help with this, but it isn't automatic.)

Eli
--------------------. "If it ain't broke now,
Eli Carter           \                  it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  8:03           ` John Bradford
@ 2003-10-17 14:53             ` Eli Carter
  2003-10-17 15:27               ` John Bradford
  0 siblings, 1 reply; 92+ messages in thread
From: Eli Carter @ 2003-10-17 14:53 UTC (permalink / raw)
  To: John Bradford; +Cc: jw schultz, linux-kernel

John Bradford wrote:
>>What might be worth considering is internal NUL detection.
>>Build the block-of-zeros detection into the filesystem
>>write resulting in automatic creation of sparse files.
>>This could even work with extent based filesystems where
>>using hashes to identify shared blocks would not.
> 
> 
> Another idea could be writing uncompressed data to the disk, and
> background-compressing it with something like bzip2, but keeping the
> uncompressed data on disk as well, only over-writing it when disk
> space is low, and then overwriting the least recently used files
> first.
> 
> The upshot of all that would be that if you needed space, it would be
> there, (just overwrite the uncompressed versions of files), but until
> you do, you can access the uncompressed data quickly.
> 
> You could even take it one step further, and compress files with gzip
> by default, and re-compress them with bzip2 after long periods of
> inactivity.

Note that a file compressed with bzip2 is not necessarily smaller than 
the same file compressed with gzip.  (It can be quite a bit larger in fact.)

Eli
--------------------. "If it ain't broke now,
Eli Carter           \                  it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17 14:16                   ` Jeff Garzik
@ 2003-10-17 15:06                     ` Valdis.Kletnieks
  0 siblings, 0 replies; 92+ messages in thread
From: Valdis.Kletnieks @ 2003-10-17 15:06 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: jlnance, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 538 bytes --]

On Fri, 17 Oct 2003 10:16:29 EDT, Jeff Garzik said:

> > This assumes that the probability of there being a bug in the code which
> > checks for identical blocks is less than the probability of a hash collision.
> > I am not sure that is a good assumption.
> 
> The complexity of a memcmp() is pretty low...

What's the probability of a non-ECC-corrected error on a memory read during the
computation of the hash or the memcmp()? (Remember that Linux *does* boot on
machines that don't support ECC memory by default, such as many Macs).


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17 14:53             ` Eli Carter
@ 2003-10-17 15:27               ` John Bradford
  2003-10-17 16:22                 ` Eli Carter
  0 siblings, 1 reply; 92+ messages in thread
From: John Bradford @ 2003-10-17 15:27 UTC (permalink / raw)
  To: Eli Carter; +Cc: jw schultz, linux-kernel

> > The upshot of all that would be that if you needed space, it would be
> > there, (just overwrite the uncompressed versions of files), but until
> > you do, you can access the uncompressed data quickly.
> > 
> > You could even take it one step further, and compress files with gzip
> > by default, and re-compress them with bzip2 after long periods of
> > inactivity.
> 
> Note that a file compressed with bzip2 is not necessarily smaller than 
> the same file compressed with gzip.  (It can be quite a bit larger in fact.)

Have you noticed that with real-life data, or only test cases?

John.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17 15:27               ` John Bradford
@ 2003-10-17 16:22                 ` Eli Carter
  2003-10-17 17:15                   ` John Bradford
  0 siblings, 1 reply; 92+ messages in thread
From: Eli Carter @ 2003-10-17 16:22 UTC (permalink / raw)
  To: John Bradford; +Cc: jw schultz, linux-kernel

John Bradford wrote:
>>>The upshot of all that would be that if you needed space, it would be
>>>there, (just overwrite the uncompressed versions of files), but until
>>>you do, you can access the uncompressed data quickly.
>>>
>>>You could even take it one step further, and compress files with gzip
>>>by default, and re-compress them with bzip2 after long periods of
>>>inactivity.
>>
>>Note that a file compressed with bzip2 is not necessarily smaller than 
>>the same file compressed with gzip.  (It can be quite a bit larger in fact.)
> 
> 
> Have you noticed that with real-life data, or only test cases?

Real-life data.  I don't remember the exact details for certain, but as 
best as I can recall:  I was dealing with copies of output from build 
logs, telnet sessions, messages files, or the like (i.e. text) that were 
(many,) many MB in size (and probably highly repetitititititive).  I 
wound up with a loop that compressed each file into a gzip and a bzip2, 
compared the sizes, and killed the larger.  There were a number of .gz's 
that won.  (I have also read that gzip is better at text compression 
whereas bzip2 is better at binary compression.  No, I don't remember the 
source.)
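
Roughly this kind of loop (a sketch, not the original script):

    import bz2, gzip, sys

    def keep_smaller(path):
        # Compress with both gzip and bzip2 and keep whichever is smaller.
        data = open(path, "rb").read()
        gz = gzip.compress(data)
        bz = bz2.compress(data)
        winner = ("gzip", gz) if len(gz) <= len(bz) else ("bzip2", bz)
        print("%s: gzip=%d bzip2=%d -> keep %s"
              % (path, len(gz), len(bz), winner[0]))
        return winner

    if __name__ == "__main__":
        for p in sys.argv[1:]:
            keep_smaller(p)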

But that is immaterial... You have to deal with the case where the 
'better' algorithm gives 'worse' results (by size).  Keep in mind that 
some data won't compress at all (for a given algorithm), and winds up 
needing more space in the compressed form.  (In which case we add a byte 
to say "this is not compressed" and keep the original form.)

uncompressed -> gzip; gzip -> bzip2 would be by far the normal case
But, sometimes gzip can't get it any smaller, or would increase the 
size.  (Keep in mind we may be storing a file that is already compressed...)

So your scheme needs to note when compression fails so it doesn't try 
again, so we see:

uncompressed -> gzip or uncompressed(gzip failed)
gzip -> bzip2 or gzip(bzip2 failed)
uncompressed(gzip failed) -> bzip2 or uncompressed(bzip2 failed)

If it were me, I'd do it with one compression algorithm as a
proof-of-concept, then add a second, and then generalize it to N cases
(which would not be hard once the two-case version was done).

But I must say, I like your idea of keeping the uncompressed form around 
until we need the space.  (I'd also want to track reads separately from 
writes.)

Eli
--------------------. "If it ain't broke now,
Eli Carter           \                  it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17 16:22                 ` Eli Carter
@ 2003-10-17 17:15                   ` John Bradford
  0 siblings, 0 replies; 92+ messages in thread
From: John Bradford @ 2003-10-17 17:15 UTC (permalink / raw)
  To: Eli Carter; +Cc: jw schultz, linux-kernel

Quote from Eli Carter <eli.carter@inet.com>:
> John Bradford wrote:
> >>>The upshot of all that would be that if you needed space, it would be
> >>>there, (just overwrite the uncompressed versions of files), but until
> >>>you do, you can access the uncompressed data quickly.
> >>>
> >>>You could even take it one step further, and compress files with gzip
> >>>by default, and re-compress them with bzip2 after long periods of
> >>>inactivity.
> >>
> >>Note that a file compressed with bzip2 is not necessarily smaller than 
> >>the same file compressed with gzip.  (It can be quite a bit larger in fact.)
> > 
> > 
> > Have you noticed that with real-life data, or only test cases?
> 
> Real-life data.  I don't remember the exact details for certain, but as 
> best as I can recall:  I was dealing with copies of output from build 
> logs, telnet sessions, messages files, or the like (i.e. text) that were 
> (many,) many MB in size (and probably highly repetitititititive).  I 
> wound up with a loop that compressed each file into a gzip and a bzip2, 
> compared the sizes, and killed the larger.  There were a number of .gz's 
> that won.  (I have also read that gzip is better at text compression 
> whereas bzip2 is better at binary compression.  No, I don't remember the 
> source.)

Wow, I'm really surprised; I've always had good results with text,
although quite possibly not as repetitive as yours.  I have noticed
that incompressible data such as /dev/random is almost always expanded
to a greater extent by bzip2, which is why I asked.

> But that is immaterial... You have to deal with the case where the 
> 'better' algorithm gives 'worse' results (by size).  Keep in mind that 
> some data won't compress at all (for a given algorithm), and winds up 
> needing more space in the compressed form.  (In which case we add a byte 
> to say "this is not compressed" and keep the original form.)
> 
> uncompressed -> gzip; gzip -> bzip2 would be by far the normal case
> But, sometimes gzip can't get it any smaller, or would increase the 
> size.  (Keep in mind we may be storing a file that is already compressed...)
> 
> So your scheme needs to note when compression fails so it doesn't try 
> again, so we see:
> 
> uncompressed -> gzip or uncompressed(gzip failed)
> gzip -> bzip2 or gzip(bzip2 failed)
> uncompressed(gzip failed) -> bzip2 or uncompressed(bzip2 failed)

It might also be worth using a much slower compressor only if we get
at least N% better results.  Saving literally 0.1% of the size at the
expense of 5x worse decompression time is possibly not worthwhile in
most cases.
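
Something like (the 5% threshold is arbitrary):

    def pick_result(gz, bz, min_gain=0.05):
        # Accept the slower compressor's output only if it is at least
        # min_gain smaller; otherwise keep the gzip result.
        return bz if len(bz) <= len(gz) * (1.0 - min_gain) else gz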

> If it were me, I'd do it with one compression algorthim as a 
> proof-of-concept, then add a second, and then generalize it to N cases 
> (which would not be hard once the 2 cases was done).
> 
> But I must say, I like your idea of keeping the uncompressed form around 
> until we need the space.  (I'd also want to track reads separately from 
> writes.)

Yes - one write would be worth a lot of reads in terms of keeping it
uncompressed.

Heh, it might also help if you get a bad sector on one of the copies
:-).

John.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  8:11                 ` John Bradford
@ 2003-10-17 17:53                   ` Eric Sandall
  0 siblings, 0 replies; 92+ messages in thread
From: Eric Sandall @ 2003-10-17 17:53 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel

Quoting John Bradford <john@grabjohn.com>:
> Comparing the reliability of the hash function to the reliability of
> hardware is an apples to oranges comparison.  Far more interesting
> would be to compare the correlation between reliability of each of
> them to the input data.
<snip> 
> John.

Very true, but I wasn't trying to compare them; instead I was saying (or
trying to say) that we don't want an /extra/ layer of possible corruption
if it can be avoided.

-sandalle

-- 
PGP Key Fingerprint:  FCFF 26A1 BE21 08F4 BB91  FAED 1D7B 7D74 A8EF DD61
http://search.keyserver.net:11371/pks/lookup?op=get&search=0xA8EFDD61

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS/E/IT$ d-- s++:+>: a-- C++(+++) BL++++VIS>$ P+(++) L+++ E-(---) W++ N+@ o?
K? w++++>-- O M-@ V-- PS+(+++) PE(-) Y++(+) PGP++(+) t+() 5++ X(+) R+(++)
tv(--)b++(+++) DI+@ D++(+++) G>+++ e>+++ h---(++) r++ y+
------END GEEK CODE BLOCK------

Eric Sandall                     |  Source Mage GNU/Linux Developer
eric@sandall.us                  |  http://www.sourcemage.org/
http://eric.sandall.us/          |  SysAdmin @ Inst. Shock Physics @ WSU
http://counter.li.org/  #196285  |  http://www.shock.wsu.edu/

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17  9:44             ` Pavel Machek
  2003-10-17 12:33               ` jlnance
@ 2003-10-17 18:23               ` jw schultz
  2003-10-27  2:08                 ` Mike Fedyk
  1 sibling, 1 reply; 92+ messages in thread
From: jw schultz @ 2003-10-17 18:23 UTC (permalink / raw)
  To: linux-kernel

On Fri, Oct 17, 2003 at 11:44:44AM +0200, Pavel Machek wrote:
> Hi!
> 
> > Several months ago we encountered the hash collision problem
> > with rsync.  This brought about a fair amount of discussion
> 
> So you found collision in something like md5 or sha1?

Each block was done with md4 truncated to 16 bits and
adler32.  The file as a whole is double checked with the
full 128 bit md4 and adler32.

The changes made were to improve block sizing to reduce the
number of blocks, and to scale the hash truncation according
to block count and size on a per-file basis.

The probability of false positives in rsync is orders of
magnitude smaller than it would be in a block hashing
filesystem.  Yet we were seeing them happen (with the truncated
hash) at measurable rates on files as small as a few hundred
megabytes.  It was almost commonplace on iso images.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-17 18:23               ` jw schultz
@ 2003-10-27  2:08                 ` Mike Fedyk
  2003-10-27  2:15                   ` jw schultz
  0 siblings, 1 reply; 92+ messages in thread
From: Mike Fedyk @ 2003-10-27  2:08 UTC (permalink / raw)
  To: jw schultz, linux-kernel

On Fri, Oct 17, 2003 at 11:23:21AM -0700, jw schultz wrote:
> The probability of false positives in rsync are orders of
> magnitude smaller than they would be in a block hashing
> filesystem.  Yet we were seeing it happen (with truncated
> hash) at measurable rates on files as small as a few hundred
> megabytes.  It was almost commonplace on iso images.

So has there been anything done to solve this problem in rsync?

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-27  2:08                 ` Mike Fedyk
@ 2003-10-27  2:15                   ` jw schultz
  0 siblings, 0 replies; 92+ messages in thread
From: jw schultz @ 2003-10-27  2:15 UTC (permalink / raw)
  To: linux-kernel

On Sun, Oct 26, 2003 at 06:08:44PM -0800, Mike Fedyk wrote:
> On Fri, Oct 17, 2003 at 11:23:21AM -0700, jw schultz wrote:
> > The probability of false positives in rsync are orders of
> > magnitude smaller than they would be in a block hashing
> > filesystem.  Yet we were seeing it happen (with truncated
> > hash) at measurable rates on files as small as a few hundred
> > megabytes.  It was almost commonplace on iso images.
> 
> So has there been anything done to solve this problem in rsync?

Yes.  Hence the use of the past tense.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-16 23:04           ` jw schultz
                               ` (3 preceding siblings ...)
  2003-10-17  9:44             ` Pavel Machek
@ 2003-10-27  2:22             ` Mike Fedyk
  2003-10-27  2:45               ` jw schultz
  4 siblings, 1 reply; 92+ messages in thread
From: Mike Fedyk @ 2003-10-27  2:22 UTC (permalink / raw)
  To: jw schultz, linux-kernel

On Thu, Oct 16, 2003 at 04:04:48PM -0700, jw schultz wrote:
> 2.  External collision: The smaller hash size to unit count
>     ratio the greater liklihood of false positives.  To put
>     it in the extreme: if you have 10,000 blocks you are
>     almost guaranteed to get false positives on a 16 bit
>     hash.  This is similar to filling the namespace.  It is
>     external collision that was killing rsync on iso images.

What about making a one bit change in the block at a random location (the
same bit on both sides), and comparing the hashes again?

This would work for something that is trying to save bandwidth over a
network link, but wouldn't save anything in a Compare-by-hash File System
(why hash again when you need to read the block anyway?).

One thought for rsync would be to have a 4 step process (the first two are
how it operates now according to another sub-thread):

 1) Check the block with a weak 16bit hash
 
 2) Check the block with a stronger 32bit hash (if step 1 had a collision)

 3) Check the block with SHA1 (if step 2 had a collision)

 4) Make a one bit change at a random location in the block, but at the same
    place on both ends of the transfer and repeat steps 1 to 3
    
It's arguable how much of an optimization the 16bit hash is, but it's there
and maybe it's a win, maybe not.  It's also arguable just how much bandwidth
you're going to save with the extra step 3 and the recursive step 4.  Maybe
step 4 should just be a recheck with SHA1, and the recursive check isn't
needed, but I haven't done any hash benchmarks.
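
A sketch of the escalation in steps 1-3 (the hash choices here are only
stand-ins, not rsync's actual checksums):

    import hashlib, zlib

    def summarize(block):
        return {
            "weak16": zlib.adler32(block) & 0xFFFF,   # stand-in 16 bit hash
            "crc32":  zlib.crc32(block),
            "sha1":   hashlib.sha1(block).digest(),
        }

    def blocks_probably_equal(local, remote_sums):
        # Cheap checks reject most mismatches before the expensive one runs.
        if (zlib.adler32(local) & 0xFFFF) != remote_sums["weak16"]:
            return False
        if zlib.crc32(local) != remote_sums["crc32"]:
            return False
        return hashlib.sha1(local).digest() == remote_sums["sha1"]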

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
  2003-10-27  2:22             ` Mike Fedyk
@ 2003-10-27  2:45               ` jw schultz
  0 siblings, 0 replies; 92+ messages in thread
From: jw schultz @ 2003-10-27  2:45 UTC (permalink / raw)
  To: linux-kernel

On Sun, Oct 26, 2003 at 06:22:31PM -0800, Mike Fedyk wrote:
> On Thu, Oct 16, 2003 at 04:04:48PM -0700, jw schultz wrote:
> > 2.  External collision: The smaller hash size to unit count
> >     ratio the greater liklihood of false positives.  To put
> >     it in the extreme: if you have 10,000 blocks you are
> >     almost guaranteed to get false positives on a 16 bit
> >     hash.  This is similar to filling the namespace.  It is
> >     external collision that was killing rsync on iso images.
> 
> What about making one bit changes in the block at a random location (the
> same bit on both sides), and comparing the hashes again.
> 
> This would work for something that is trying to save bandwidth over a
> network link, but wouldn't save anything in a Compare-by-hash File System
> (why hash again when you need to read the block anyway?).
> 
> One thought for rsync would be to have a 4 step process (the first two are
> how it operates now according to another sub-thread):
> 
>  1) Check the block with a weak 16bit hash
>  
>  2) Check the block with a stronger 32bit hash (if step 1 had a collision)
> 
>  3) Check the block with SHA1 (if step 2 had a collision)
> 
>  4) Make a one bit change at a random location in the block, but at the same
>     place on both ends of the transfer and repeat steps 1 to 3
>     
> It's arguable how much of an optimization the 16bit hash is, but it's there
> and maybe it's a win, and maybe not.  It's also arguable just how much
> bandwidth you're going to save with the extra steps 3, and the recursive 4.
> Maybe 4 should be just a recheck with SHA1, and the recursive check isn't
> needed, but I haven't done any hash benchmarks.

My "extreme" example was for illustration only. 16 bits was
chosen as a size to get the numberspace into a scale where
the issue become obvious.

Rsync uses a minimum of a 48 bit hash per block.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: Transparent compression in the FS
       [not found]           ` <HnxT.3BB.27@gated-at.bofh.it>
@ 2003-10-17  8:15             ` Ihar 'Philips' Filipau
  0 siblings, 0 replies; 92+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-10-17  8:15 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linux Kernel Mailing List

Jeff Garzik wrote:
> Val Henson wrote:
>>   of many kinds of hardware errors. Further analysis shows that this
>>   approach is not as risk-free as it seems at first glance."
> 
> 
> I'm curious if anyone has done any work on using multiple different 
> checksums?  For example, the cost of checksumming a single block with 
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing 

   With sha1 you probably have 8-9 nines of reliability.
   You can improve on this - but adding nines still doesn't mean it
(reliability) becomes 100%.

   my 0.02c.

-- 
Ihar 'Philips' Filipau  / with best regards from Saarbruecken.
--
   "... and for $64000 question, could you get yourself vaguely
      familiar with the notion of on-topic posting?"
				-- Al Viro @ LKML


^ permalink raw reply	[flat|nested] 92+ messages in thread

end of thread, other threads:[~2003-10-27  2:45 UTC | newest]

Thread overview: 92+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-10-14 20:30 Transparent compression in the FS Josh Litherland
2003-10-15 13:33 ` Erik Mouw
2003-10-15 13:45   ` Josh Litherland
2003-10-15 13:50   ` Nikita Danilov
2003-10-15 14:27     ` Erik Mouw
2003-10-15 14:33       ` Nikita Danilov
2003-10-15 15:54         ` Richard B. Johnson
2003-10-15 16:21           ` Nikita Danilov
2003-10-15 17:19             ` Richard B. Johnson
2003-10-15 17:37               ` Andreas Dilger
2003-10-15 17:48               ` Dave Jones
2003-10-15 18:19                 ` Richard B. Johnson
2003-10-15 18:06               ` Hans Reiser
2003-10-17 12:51                 ` Edward Shushkin
2003-10-15 16:04         ` Erik Mouw
2003-10-15 17:24           ` Josh Litherland
2003-10-15 18:53             ` Erik Bourget
2003-10-15 19:03           ` Geert Uytterhoeven
2003-10-15 19:14             ` Valdis.Kletnieks
2003-10-15 19:24               ` Geert Uytterhoeven
2003-10-15 18:54         ` root
2003-10-16  2:11           ` Chris Meadors
2003-10-16  3:01             ` Shawn
2003-10-15 14:47       ` Erik Bourget
2003-10-15 15:05         ` Nikita Danilov
2003-10-15 15:06           ` Erik Bourget
2003-10-15 21:36       ` Tomas Szepe
2003-10-16  8:04         ` Ville Herva
2003-10-17  1:32       ` Eric W. Biederman
2003-10-15 15:13   ` Jeff Garzik
2003-10-15 21:00     ` Christopher Li
2003-10-16 16:29     ` Andrea Arcangeli
2003-10-16 16:41       ` P
2003-10-16 17:20         ` Jeff Garzik
2003-10-16 23:12         ` jw schultz
2003-10-17  8:03           ` John Bradford
2003-10-17 14:53             ` Eli Carter
2003-10-17 15:27               ` John Bradford
2003-10-17 16:22                 ` Eli Carter
2003-10-17 17:15                   ` John Bradford
2003-10-16 17:10       ` Jeff Garzik
2003-10-16 17:41         ` Andrea Arcangeli
2003-10-16 17:29       ` Larry McVoy
2003-10-16 17:49         ` Val Henson
2003-10-16 21:02           ` Jeff Garzik
2003-10-16 21:18             ` Chris Meadors
2003-10-16 21:25               ` Jeff Garzik
2003-10-16 21:33             ` Davide Libenzi
2003-10-17  3:47             ` Mark Mielke
2003-10-17 14:31             ` Jörn Engel
2003-10-16 23:04           ` jw schultz
2003-10-16 23:30             ` Jeff Garzik
2003-10-16 23:58               ` jw schultz
2003-10-16 23:53                 ` David Lang
2003-10-17  1:19                 ` Jeff Garzik
2003-10-17  0:45             ` Christopher Li
2003-10-17  1:16               ` Jeff Garzik
2003-10-17  1:32             ` jlnance
2003-10-17  1:47               ` Eric Sandall
2003-10-17  8:11                 ` John Bradford
2003-10-17 17:53                   ` Eric Sandall
2003-10-17 13:07                 ` jlnance
2003-10-17 14:16                   ` Jeff Garzik
2003-10-17 15:06                     ` Valdis.Kletnieks
2003-10-17  1:49               ` Davide Libenzi
2003-10-17  1:59               ` Larry McVoy
2003-10-17  2:19               ` jw schultz
2003-10-17  9:44             ` Pavel Machek
2003-10-17 12:33               ` jlnance
2003-10-17 18:23               ` jw schultz
2003-10-27  2:08                 ` Mike Fedyk
2003-10-27  2:15                   ` jw schultz
2003-10-27  2:22             ` Mike Fedyk
2003-10-27  2:45               ` jw schultz
2003-10-16 18:28         ` John Bradford
2003-10-16 18:31           ` Robert Love
2003-10-16 20:18             ` Jeff Garzik
2003-10-16 18:43           ` Muli Ben-Yehuda
2003-10-16 18:56           ` Richard B. Johnson
2003-10-16 19:00             ` Robert Love
2003-10-16 19:27               ` John Bradford
2003-10-16 19:03             ` John Bradford
2003-10-16 19:20               ` Richard B. Johnson
2003-10-17 13:16         ` Ingo Oeser
2003-10-16 23:20       ` jw schultz
2003-10-17 14:47         ` Eli Carter
2003-10-16  8:27   ` tconnors+linuxkernel1066292516
2003-10-17 10:55   ` Ingo Oeser
2003-10-15 16:25 ` David Woodhouse
2003-10-15 16:56   ` Andreas Dilger
2003-10-15 17:44     ` David Woodhouse
     [not found] <GTJr.60q.17@gated-at.bofh.it>
     [not found] ` <GU2N.6v7.17@gated-at.bofh.it>
     [not found]   ` <GVBC.Ep.23@gated-at.bofh.it>
     [not found]     ` <Hjkq.3Al.1@gated-at.bofh.it>
     [not found]       ` <Hkgx.4Vu.7@gated-at.bofh.it>
     [not found]         ` <HkA0.5lh.9@gated-at.bofh.it>
     [not found]           ` <HnxT.3BB.27@gated-at.bofh.it>
2003-10-17  8:15             ` Ihar 'Philips' Filipau
