All of lore.kernel.org
 help / color / mirror / Atom feed
From: Davide Libenzi <davidel@xmailserver.org>
To: Jeff Garzik <jgarzik@pobox.com>
Cc: Val Henson <val@nmt.edu>, Larry McVoy <lm@work.bitmover.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Transparent compression in the FS
Date: Thu, 16 Oct 2003 14:33:39 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.56.0310161422170.2217@bigblue.dev.mdolabs.com> (raw)
In-Reply-To: <3F8F0766.30405@pobox.com>

On Thu, 16 Oct 2003, Jeff Garzik wrote:

> Val Henson wrote:
> > Abstract:
> >
> >  "Recent research has produced a new and perhaps dangerous technique
> >   for uniquely identifying blocks that I will call
> >   compare-by-hash. Using this technique, we decide whether two blocks
> >   are identical to each other by comparing their hash values, using a
> >   collision-resistant hash such as SHA-1. If the hash values match,
> >   we assume the blocks are identical without further ado. Users of
> >   compare-by-hash argue that this assumption is warranted because the
> >   chance of a hash collision between any two randomly generated blocks
> >   is estimated to be many orders of magnitude smaller than the chance
> >   of many kinds of hardware errors. Further analysis shows that this
> >   approach is not as risk-free as it seems at first glance."
>
>
> I'm curious if anyone has done any work on using multiple different
> checksums?  For example, the cost of checksumming a single block with
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing
> each checksum (instead of just one sha1 sum), may be faster than reading
> the block off of disk to compare it with the incoming block.  OTOH,
> there is still a mathematical possibility (however-more-remote) of a
> collission...

At that point it is better to extend the fingerprint size, since the SHA1
algorithm has a better distribution compared to md5 and crc32. Probability
estimates are pretty low though. If you consider a 2^32 blocks FS, that
with a 4Kb block size makes a 4 tera FS, the collision probability is in
the orders of 2^(-95) (with a 160 bit fingerprint). That's a pretty low
number. Yes, it is true that the input is not completely random, but a
good property of SHA1 is the one of spreading output result of very
similar input patterns.




- Davide


  parent reply	other threads:[~2003-10-16 21:37 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-10-14 20:30 Transparent compression in the FS Josh Litherland
2003-10-15 13:33 ` Erik Mouw
2003-10-15 13:45   ` Josh Litherland
2003-10-15 13:50   ` Nikita Danilov
2003-10-15 14:27     ` Erik Mouw
2003-10-15 14:33       ` Nikita Danilov
2003-10-15 15:54         ` Richard B. Johnson
2003-10-15 16:21           ` Nikita Danilov
2003-10-15 17:19             ` Richard B. Johnson
2003-10-15 17:37               ` Andreas Dilger
2003-10-15 17:48               ` Dave Jones
2003-10-15 18:19                 ` Richard B. Johnson
2003-10-15 18:06               ` Hans Reiser
2003-10-17 12:51                 ` Edward Shushkin
2003-10-15 16:04         ` Erik Mouw
2003-10-15 17:24           ` Josh Litherland
2003-10-15 18:53             ` Erik Bourget
2003-10-15 19:03           ` Geert Uytterhoeven
2003-10-15 19:14             ` Valdis.Kletnieks
2003-10-15 19:24               ` Geert Uytterhoeven
2003-10-15 18:54         ` root
2003-10-16  2:11           ` Chris Meadors
2003-10-16  3:01             ` Shawn
2003-10-15 14:47       ` Erik Bourget
2003-10-15 15:05         ` Nikita Danilov
2003-10-15 15:06           ` Erik Bourget
2003-10-15 21:36       ` Tomas Szepe
2003-10-16  8:04         ` Ville Herva
2003-10-17  1:32       ` Eric W. Biederman
2003-10-15 15:13   ` Jeff Garzik
2003-10-15 21:00     ` Christopher Li
2003-10-16 16:29     ` Andrea Arcangeli
2003-10-16 16:41       ` P
2003-10-16 17:20         ` Jeff Garzik
2003-10-16 23:12         ` jw schultz
2003-10-17  8:03           ` John Bradford
2003-10-17 14:53             ` Eli Carter
2003-10-17 15:27               ` John Bradford
2003-10-17 16:22                 ` Eli Carter
2003-10-17 17:15                   ` John Bradford
2003-10-16 17:10       ` Jeff Garzik
2003-10-16 17:41         ` Andrea Arcangeli
2003-10-16 17:29       ` Larry McVoy
2003-10-16 17:49         ` Val Henson
2003-10-16 21:02           ` Jeff Garzik
2003-10-16 21:18             ` Chris Meadors
2003-10-16 21:25               ` Jeff Garzik
2003-10-16 21:33             ` Davide Libenzi [this message]
2003-10-17  3:47             ` Mark Mielke
2003-10-17 14:31             ` Jörn Engel
2003-10-16 23:04           ` jw schultz
2003-10-16 23:30             ` Jeff Garzik
2003-10-16 23:58               ` jw schultz
2003-10-16 23:53                 ` David Lang
2003-10-17  1:19                 ` Jeff Garzik
2003-10-17  0:45             ` Christopher Li
2003-10-17  1:16               ` Jeff Garzik
2003-10-17  1:32             ` jlnance
2003-10-17  1:47               ` Eric Sandall
2003-10-17  8:11                 ` John Bradford
2003-10-17 17:53                   ` Eric Sandall
2003-10-17 13:07                 ` jlnance
2003-10-17 14:16                   ` Jeff Garzik
2003-10-17 15:06                     ` Valdis.Kletnieks
2003-10-17  1:49               ` Davide Libenzi
2003-10-17  1:59               ` Larry McVoy
2003-10-17  2:19               ` jw schultz
2003-10-17  9:44             ` Pavel Machek
2003-10-17 12:33               ` jlnance
2003-10-17 18:23               ` jw schultz
2003-10-27  2:08                 ` Mike Fedyk
2003-10-27  2:15                   ` jw schultz
2003-10-27  2:22             ` Mike Fedyk
2003-10-27  2:45               ` jw schultz
2003-10-16 18:28         ` John Bradford
2003-10-16 18:31           ` Robert Love
2003-10-16 20:18             ` Jeff Garzik
2003-10-16 18:43           ` Muli Ben-Yehuda
2003-10-16 18:56           ` Richard B. Johnson
2003-10-16 19:00             ` Robert Love
2003-10-16 19:27               ` John Bradford
2003-10-16 19:03             ` John Bradford
2003-10-16 19:20               ` Richard B. Johnson
2003-10-17 13:16         ` Ingo Oeser
2003-10-16 23:20       ` jw schultz
2003-10-17 14:47         ` Eli Carter
2003-10-16  8:27   ` tconnors+linuxkernel1066292516
2003-10-17 10:55   ` Ingo Oeser
2003-10-15 16:25 ` David Woodhouse
2003-10-15 16:56   ` Andreas Dilger
2003-10-15 17:44     ` David Woodhouse
     [not found] <GTJr.60q.17@gated-at.bofh.it>
     [not found] ` <GU2N.6v7.17@gated-at.bofh.it>
     [not found]   ` <GVBC.Ep.23@gated-at.bofh.it>
     [not found]     ` <Hjkq.3Al.1@gated-at.bofh.it>
     [not found]       ` <Hkgx.4Vu.7@gated-at.bofh.it>
     [not found]         ` <HkA0.5lh.9@gated-at.bofh.it>
     [not found]           ` <HnxT.3BB.27@gated-at.bofh.it>
2003-10-17  8:15             ` Ihar 'Philips' Filipau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.56.0310161422170.2217@bigblue.dev.mdolabs.com \
    --to=davidel@xmailserver.org \
    --cc=jgarzik@pobox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lm@work.bitmover.com \
    --cc=val@nmt.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.