From: VDA <VDA@port.imtp.ilyichevsk.odessa.ua>
To: John Ripley <jripley@riohome.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: COW fs (Re: Editing-in-place of a large file)
Date: Mon, 10 Sep 2001 12:28:51 +0300
Message-ID: <8015001541.20010910122851@port.imtp.ilyichevsk.odessa.ua>
In-Reply-To: <3B9B80E2.C9D5B947@riohome.com>
In-Reply-To: <20010902152137.L23180@draal.physics.wisc.edu> <318476047.20010903002818@port.imtp.ilyichevsk.odessa.ua> <3B9B80E2.C9D5B947@riohome.com>

JR> I've tried this idea. I did an MD5 of every block (4KB) in a partition
JR> and counted the number of blocks with the same hash. Only about 5-10% of
JR> blocks on several filesystems were actually duplicates. This might be
JR> better if you reduced the block size to 512 bytes, but there's a
JR> question of how much extra space filesystem structures would then take
JR> up.

JR> Basically, it didn't look like compressing duplicate blocks would
JR> actually be worth the extra structures or CPU.

JR> On the other hand, a COW fs would be excellent for making file copying
JR> much quicker. You can do things like copying the linux kernel tree using
JR> 'cp -lR', but the files do not act as if they are unique copies - and
JR> I've been bitten many times when I forgot this. If you had COW, you
JR> could just copy the entire tree and forget about the fact they're
JR> linked.

Yeah, I'm mostly thinking about this kind of COW fs usage. You could copy
gigabytes in an instant and not bother tracking duplicate files
("zero blocks left??? where the hell did I copy those .mpg's???").

Now, sometimes we use hardlinks as a "poor man's COW fs", but
I bet it's error-prone. Every now and then you forget it's a
hardlinked kernel tree and start happily hacking in it... :-(
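
To make the hardlink trap concrete, here is a minimal sketch in Python.
The file names and the helper hardlink_pitfall() are made up for
illustration; os.link() stands in for what 'cp -l' does for each file.
It assumes a filesystem that supports hard links.

```python
import os
import tempfile

def hardlink_pitfall():
    """Show that an in-place edit through one hardlinked name hits the other."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "orig.c")   # hypothetical "pristine" file
        copy = os.path.join(d, "copy.c")   # hypothetical "working copy"
        with open(orig, "w") as f:
            f.write("pristine\n")
        os.link(orig, copy)                # second name for the same inode
        with open(copy, "a") as f:         # in-place edit of the "copy"
            f.write("hacked\n")
        with open(orig) as f:
            # Return what the "pristine" name now contains, plus link count.
            return f.read(), os.stat(orig).st_nlink

content, nlink = hardlink_pitfall()
print(repr(content), nlink)
```

The "pristine" name now shows the edit as well, because both names
point at the same inode. An editor that writes a new file and renames
it over the old one would break the link instead, which is why the
problem only bites some of the time.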

A "compressor" which hunts and merges duplicate blocks is a bonus,
not a primary tool.
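
For anyone who wants to repeat the measurement quoted above, a rough
sketch of the duplicate hunt in Python. This is not JR's actual
program; the function name count_duplicate_blocks() and the way
duplicates are counted (every occurrence of a hash after the first)
are my assumptions. The 4KB block size and MD5 come from the quoted
text.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096  # 4KB blocks, as in the quoted experiment

def count_duplicate_blocks(path, block_size=BLOCK_SIZE):
    """Return (total_blocks, duplicate_blocks) for the file or device at path."""
    seen = Counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # A short final block is hashed as-is.
            seen[hashlib.md5(block).digest()] += 1
    total = sum(seen.values())
    # Assumption: a block is a "duplicate" if a merge pass could drop it,
    # i.e. every occurrence of a given hash beyond the first one.
    duplicates = total - len(seen)
    return total, duplicates
```

Pointing it at a partition needs read access to the device node. It
only reports how much a merge pass could save in principle; it merges
nothing and ignores the cost of the extra filesystem structures.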
-- 
Best regards,
VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/


