From: John Ripley <jripley@riohome.com>
To: linux-kernel@vger.kernel.org
Cc: VDA <VDA@port.imtp.ilyichevsk.odessa.ua>
Subject: Re: COW fs (Re: Editing-in-place of a large file)
Date: Sun, 09 Sep 2001 15:46:58 +0100
Message-ID: <3B9B80E2.C9D5B947@riohome.com>
In-Reply-To: <20010902152137.L23180@draal.physics.wisc.edu> <318476047.20010903002818@port.imtp.ilyichevsk.odessa.ua>

VDA wrote:
> 
> Sunday, September 02, 2001, 11:21:37 PM, Bob McElrath wrote:
> BM> I would like to take an extremely large file (multi-gigabyte) and edit
> BM> it by removing a chunk out of the middle.  This is easy enough by
> BM> reading in the entire file and spitting it back out again, but it's
> BM> hardly efficient to read in an 8GB file just to remove a 100MB segment.

> BM> Is there another way to do this?

> BM> Is it possible to modify the inode structure of the underlying
> BM> filesystem to free blocks in the middle?  (What to do with the half-full
> BM> blocks that are left?)  Has anyone written a tool to do something like
> BM> this?

> BM> Is there a way to do this in a filesystem-independent manner?
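Bob's baseline (rewriting the file) can at least be done in place rather than through a second full copy. This is a minimal Python sketch, assuming a POSIX system: it shifts everything after the removed range down with pread/pwrite, then truncates. It does not do what Bob asked for, because no blocks are freed without copying and the whole tail is still read and rewritten, and it is not crash-safe, since an interruption leaves the file half-shifted. `remove_range` and its parameters are hypothetical names.

```python
import os

def remove_range(path, offset, length, bufsize=1 << 20):
    """Delete bytes [offset, offset + length) from the file at path, in place.

    Hypothetical helper: moves the tail of the file down by `length` bytes
    in bufsize-sized pieces, then truncates the now-duplicated end.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        if offset < 0 or length < 0 or offset + length > size:
            raise ValueError("range lies outside the file")
        src = offset + length  # start of the data that survives
        dst = offset           # where that data has to end up
        # Copying front-to-back is safe: dst < src, so we never overwrite
        # bytes that have not been read yet.
        while src < size:
            chunk = os.pread(fd, min(bufsize, size - src), src)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:  # pwrite may write fewer bytes than requested
                n = os.pwrite(fd, view, dst)
                view = view[n:]
                dst += n
            src += len(chunk)
        os.ftruncate(fd, size - length)
    finally:
        os.close(fd)
```

For example, `remove_range("big.dat", 3 << 30, 100 << 20)` would cut 100MB out of a hypothetical file big.dat, starting 3GB in.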

> A COW fs would be far more useful and cool: a fs where a copy of a file
> does not duplicate all blocks. Blocks get copied-on-write only when
> a copy of the file is written to. There could even be a fs compressor
> which looks for and merges blocks with exactly same contents from
> different files.
> 
> Maybe ext2/3 folks will play with this idea after ext3?
> 
> I'm planning to write a test program which will scan my ext2 fs and
> report how many duplicate blocks with the same contents it sees (i.e.
> how many I would save with a COW fs).

I've tried this idea. I did an MD5 of every block (4KB) in a partition
and counted the number of blocks with the same hash. Only about 5-10% of
blocks on several filesystems were actually duplicates. This might be
better if you reduced the block size to 512 bytes, but there's a
question of how much extra space filesystem structures would then take
up.
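The measurement above can be reproduced roughly as follows. This is a sketch, not the program actually used: it reads a file or block device (path given on the command line) in 4KB blocks, hashes each with MD5, and counts every block whose hash was already seen as a duplicate. It assumes equal MD5s mean equal contents, and on a raw partition it also counts free blocks, which may be all zeros, so a careful measurement would probably want to skip unallocated blocks.

```python
import hashlib
import sys
from collections import Counter

BLOCK_SIZE = 4096  # the 4KB block size used in the experiment

def count_duplicate_blocks(path, block_size=BLOCK_SIZE):
    """Return (total_blocks, duplicate_blocks) for the file or device at path.

    A block counts as a duplicate if an earlier block had the same MD5,
    i.e. it is a block a merging COW fs could avoid storing again.
    """
    counts = Counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            counts[hashlib.md5(block).digest()] += 1
            total += 1
    # Every occurrence after the first of a given hash is redundant.
    duplicates = sum(n - 1 for n in counts.values())
    return total, duplicates

if __name__ == "__main__" and len(sys.argv) > 1:
    total, dups = count_duplicate_blocks(sys.argv[1])
    pct = 100.0 * dups / total if total else 0.0
    print(f"{dups} of {total} blocks are duplicates ({pct:.1f}%)")
```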

Basically, it didn't look like compressing duplicate blocks would
actually be worth the extra structures or CPU.

On the other hand, a COW fs would be excellent for making file copying
much quicker. You can do things like copying the linux kernel tree using
'cp -lR', but the files do not act as if they are unique copies - and
I've been bitten many times when I forgot this. If you had COW, you
could just copy the entire tree and forget about the fact they're
linked.

The problem is that this needs a bit of userland support; it could only
be done automatically if you did this:

- Keep a hash of the contents of blocks in the buffer-cache.
- The kernel compares the hash of each block being written with the hashes
of all blocks already in the buffer-cache.
- If a duplicate is found, the kernel generates a COW link instead of
writing the block to disk.
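The three steps above can be modelled in a few lines. This is a toy sketch under stated assumptions: `DedupCache`, its `disk` list and its digest-keyed `index` dictionary are hypothetical stand-ins for the block device and the buffer-cache. Keying a hash table on the digest turns "compare with all blocks already in the cache" into a single lookup, and the byte comparison guards against two different blocks sharing an MD5.

```python
import hashlib

class DedupCache:
    """Toy model of a write path that turns duplicate writes into COW links.

    Hypothetical structures: 'disk' holds the physical blocks actually
    written, 'index' maps an MD5 digest to a physical block number.
    """

    def __init__(self):
        self.disk = []      # physical blocks actually written
        self.index = {}     # md5 digest -> physical block number
        self.refcount = []  # logical blocks pointing at each physical one

    def write_block(self, data):
        """Return the physical block number that now holds `data`."""
        key = hashlib.md5(data).digest()
        pblk = self.index.get(key)
        if pblk is not None and self.disk[pblk] == data:
            # Duplicate found: record a COW link instead of writing again.
            self.refcount[pblk] += 1
            return pblk
        # New contents: write the block and remember its hash.
        self.disk.append(data)
        self.refcount.append(1)
        pblk = len(self.disk) - 1
        self.index[key] = pblk
        return pblk
```

Every write still pays for an MD5 of the whole block, which is where the CPU cost mentioned below comes from.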

Obviously this would involve large amounts of CPU. I think a simple
userland call for 'COW this file to this new file' wouldn't be too
hideous a solution.
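A toy model of that simpler alternative, a userland "COW this file to this new file" call. Everything here is hypothetical: `CowStore` keeps each file as a list of block ids with a reference count per block, `cow_copy` only bumps the counts, and `write` gives a file its own copy of a block the first time a shared block is modified.

```python
class CowStore:
    """Toy in-memory fs with a hypothetical cow_copy() call."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes
        self.refs = {}    # block id -> number of file slots referencing it
        self.files = {}   # file name -> list of block ids
        self.next_id = 0

    def _new_block(self, data):
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        return bid

    def create(self, name, blocks):
        self.files[name] = [self._new_block(b) for b in blocks]

    def cow_copy(self, src, dst):
        """'COW this file to this new file': share every block, copy none."""
        if dst in self.files:
            raise FileExistsError(dst)
        ids = list(self.files[src])
        for bid in ids:
            self.refs[bid] += 1
        self.files[dst] = ids

    def write(self, name, index, data):
        bid = self.files[name][index]
        if self.refs[bid] > 1:
            # Block is shared: break the link and write a private copy.
            self.refs[bid] -= 1
            self.files[name][index] = self._new_block(data)
        else:
            self.blocks[bid] = data

    def read(self, name, index):
        return self.blocks[self.files[name][index]]
```

For example, after `create("a", [b"1", b"2"])`, `cow_copy("a", "b")` and `write("b", 0, b"X")`, `read("a", 0)` is still `b"1"` and only three blocks exist. Unlike the hard links made by 'cp -lR', writing to the copy leaves the original untouched, and only the modified block is duplicated.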

-- 
John Ripley

Thread overview: 29+ messages
2001-09-02 20:21 Editing-in-place of a large file Bob McElrath
2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA
2001-09-09 14:46   ` John Ripley [this message]
2001-09-09 16:30     ` John Ripley
2001-09-10  2:43       ` Daniel Phillips
2001-09-10  2:58         ` David Lang
2001-09-09 17:41     ` Xavier Bestel
2001-09-10  1:29       ` John Ripley
2001-09-10  6:45         ` Ragnar Kjørstad
2001-09-14 10:06         ` Pavel Machek
2001-09-10 11:11       ` Ihar Filipau
2001-09-10 16:10         ` Kari Hurtta
2001-09-14 10:03     ` Pavel Machek
2001-09-10  9:28   ` VDA
2001-09-10  9:35     ` John P. Looney
2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser
2001-09-03  0:59   ` Larry McVoy
2001-09-03  1:24     ` Ingo Oeser
2001-09-03  1:31       ` Alan Cox
2001-09-03  1:50         ` Ingo Oeser
2001-09-03 10:48           ` Alan Cox
2001-09-03 14:31             ` Daniel Phillips
2001-09-03 14:46             ` Bob McElrath
2001-09-03 14:54               ` Alan Cox
2001-09-03 15:42                 ` Doug McNaught
2001-09-03 15:11               ` Richard Guenther
2001-09-03 21:19             ` Ben Ford
2001-09-03  4:27       ` Bob McElrath
2001-09-03  1:30     ` Daniel Phillips
