From: Andrea Arcangeli <andrea@suse.de>
To: Jeff Garzik <jgarzik@pobox.com>
Cc: Erik Mouw <erik@harddisk-recovery.com>,
	Josh Litherland <josh@temp123.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Transparent compression in the FS
Date: Thu, 16 Oct 2003 18:29:26 +0200
Message-ID: <20031016162926.GF1663@velociraptor.random>
In-Reply-To: <3F8D6417.8050409@pobox.com>

Hi Jeff,

On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> Josh and others should take a look at Plan9's venti file storage method 
> -- archival storage is a series of unordered blocks, all of which are 
> indexed by the sha1 hash of their contents.  This magically coalesces 
> all duplicate blocks by its very nature, including the loooooong runs of 
> zeroes that you'll find in many filesystems.  I bet savings on "all 
> bytes in this block are zero" are worth a bunch right there.

I had a few ideas on the above.

If the zero blocks are the problem, there's a tool called zum that nukes
them and replaces them with holes. I use it sometimes, for example:

andrea@velociraptor:~> dd if=/dev/zero of=zero bs=1M count=100
100+0 records in
100+0 records out
andrea@velociraptor:~> ls -ls zero
102504 -rw-r--r--    1 andrea   andrea   104857600 2003-10-16 18:24 zero
andrea@velociraptor:~> ~/bin/i686/zum zero
zero [820032K]  [1 link]
andrea@velociraptor:~> ls -ls zero
   0 -rw-r--r--    1 andrea   andrea   104857600 2003-10-16 18:24 zero
andrea@velociraptor:~> 

If you can't find it, ask and I'll send it by email (it's GPL btw).
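
For readers without zum at hand, the general technique can be sketched in a
few lines of Python. This is not zum's implementation (its source is not
shown here) and, unlike zum, it copies to a new file rather than working in
place. The name sparsify and the 4096-byte block size are assumptions chosen
for illustration. Whether the skipped ranges really become holes depends on
the filesystem.

```python
import os


def sparsify(src_path, dst_path, block_size=4096):
    """Copy src_path to dst_path, seeking over all-zero blocks.

    Returns the number of bytes that were skipped instead of written.
    Hypothetical helper: a sketch of the idea, not the zum tool itself.
    """
    skipped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if not any(block):
                # All bytes are zero: move the file offset forward without
                # writing, so the filesystem may leave a hole here.
                dst.seek(len(block), os.SEEK_CUR)
                skipped += len(block)
            else:
                dst.write(block)
        # If the file ends in a skipped run, nothing was written there, so
        # set the length explicitly to match the source.
        dst.truncate(dst.tell())
    return skipped
```

Comparing "ls -ls" on the source and the copy, as in the transcript above,
shows whether the filesystem actually allocated fewer blocks.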

Indexing the data by its hash is interesting, but 1) you lose the zerocopy
behaviour for the I/O: it's like checksumming all the data going to disk,
which you would normally never do (except for the tiny files in reiserfs
with tail packing enabled, but that's not bulk I/O), and 2) I wonder how much
data is really duplicated besides the "zero" blocks, which are trivially
fixable in userspace (modulo bzImage or similar, where I'm unsure whether the
fs code in the bootloader can handle holes ;).
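
Point 2 can be measured rather than guessed. The sketch below splits a
buffer into fixed-size blocks, indexes them by their sha1 hash as in the
venti scheme quoted above, and counts how many blocks are unique and how
many are all zero. The name dedup_stats and the 4096-byte block size are
assumptions for illustration; this is not venti's code. Note that every
byte has to pass through the hash function, which is exactly the cost
described in point 1.

```python
import hashlib


def dedup_stats(data, block_size=4096):
    """Return (total_blocks, unique_blocks, zero_blocks) for a bytes object.

    Hypothetical helper: blocks are keyed by the sha1 digest of their
    contents, so identical blocks collapse into one store entry.
    """
    store = {}  # digest -> block contents (the content-addressed store)
    total = 0
    zero_blocks = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha1(block).digest()
        # Keep only the first copy of each distinct block.
        store.setdefault(digest, block)
        total += 1
        if not any(block):
            zero_blocks += 1
    return total, len(store), zero_blocks
```

If nearly all of the difference between the total and unique counts comes
from the zero blocks, then punching holes in userspace already captures
most of the saving.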
