linux-kernel.vger.kernel.org archive mirror
* Editing-in-place of a large file
@ 2001-09-02 20:21 Bob McElrath
  2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA
  2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser
  0 siblings, 2 replies; 29+ messages in thread
From: Bob McElrath @ 2001-09-02 20:21 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

I would like to take an extremely large file (multi-gigabyte) and edit
it by removing a chunk out of the middle.  This is easy enough by
reading in the entire file and spitting it back out again, but it's
hardly efficient to read in an 8GB file just to remove a 100MB segment.

Is there another way to do this?

Is it possible to modify the inode structure of the underlying
filesystem to free blocks in the middle?  (What to do with the half-full
blocks that are left?)  Has anyone written a tool to do something like
this?

Is there a way to do this in a filesystem-independent manner?

Thanks,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* COW fs (Re: Editing-in-place of a large file)
  2001-09-02 20:21 Editing-in-place of a large file Bob McElrath
@ 2001-09-02 21:28 ` VDA
  2001-09-09 14:46   ` John Ripley
  2001-09-10  9:28   ` VDA
  2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser
  1 sibling, 2 replies; 29+ messages in thread
From: VDA @ 2001-09-02 21:28 UTC (permalink / raw)
  To: linux-kernel

Sunday, September 02, 2001, 11:21:37 PM, Bob McElrath wrote:
BM> I would like to take an extremely large file (multi-gigabyte) and edit
BM> it by removing a chunk out of the middle.  This is easy enough by
BM> reading in the entire file and spitting it back out again, but it's
BM> hardly efficient to read in an 8GB file just to remove a 100MB segment.

BM> Is there another way to do this?

BM> Is it possible to modify the inode structure of the underlying
BM> filesystem to free blocks in the middle?  (What to do with the half-full
BM> blocks that are left?)  Has anyone written a tool to do something like
BM> this?

BM> Is there a way to do this in a filesystem-independent manner?

A COW fs would be far more useful and cool: an fs where a copy of a
file does not duplicate all blocks. Blocks get copied-on-write only
when a copy of the file is written to. There could even be an fs
compressor which looks for and merges blocks with exactly the same
contents from different files.

Maybe ext2/3 folks will play with this idea after ext3?

I'm planning to write a test program which will scan my ext2 fs and
report how many duplicate blocks with the same contents it sees (i.e.
how much I would save with a COW fs).
-- 
Best regards,
VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-02 20:21 Editing-in-place of a large file Bob McElrath
  2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA
@ 2001-09-02 21:30 ` Ingo Oeser
  2001-09-03  0:59   ` Larry McVoy
  1 sibling, 1 reply; 29+ messages in thread
From: Ingo Oeser @ 2001-09-02 21:30 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-kernel

On Sun, Sep 02, 2001 at 03:21:37PM -0500, Bob McElrath wrote:
> I would like to take an extremely large file (multi-gigabyte) and edit
> it by removing a chunk out of the middle.  This is easy enough by
> reading in the entire file and spitting it back out again, but it's
> hardly efficient to read in an 8GB file just to remove a 100MB segment.
> 
> Is there another way to do this?
 
It's basically a change of block ownership (in terms of "which inode
owns which blocks").

There is just no POSIX API for this; that's why there is no
simple way to do it.

Applications handling such large files usually implement their own
chunk management, which can mark chunks as "unused" and skip them
while processing the file.

What's needed is a generalisation of sparse files and truncate().
They both handle similar problems.

For now I would seriously consider editing the ext2 structures
directly, because that's the only way you can do this right now.

Regards

Ingo Oeser
-- 
In the wishful fantasy of many types of men, [the woman is] unsigned
and operator-compatible. --- Dietz Proepper in dasr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser
@ 2001-09-03  0:59   ` Larry McVoy
  2001-09-03  1:24     ` Ingo Oeser
  2001-09-03  1:30     ` Daniel Phillips
  0 siblings, 2 replies; 29+ messages in thread
From: Larry McVoy @ 2001-09-03  0:59 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Bob McElrath, linux-kernel

> What's needed is a generalisation of sparse files and truncate().
> They both handle similar problems.

how about 

	fzero(int fd, off_t off, size_t len)

which zeros the blocks and, if it can, creates a holey file?

However, that's not what Bob wants; he wants to remove commercials from
recorded TV.  So what he wants is 

	fdelete(int fd, off_t off, size_t len)

which has the semantics of shifting the rest of the file backwards to "off".

The main problem with this is if the off/len are not block aligned.  If they
are, then this is just block twiddling; if they aren't, then this is a file
rewrite anyway.
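
To pin down those semantics, a sketch - fdelete() is of course
hypothetical, and this wrapper around it is made up:

	#include <errno.h>
	#include <sys/stat.h>

	/* Only attempt the cheap block-twiddling path when both edges
	 * of the cut are block aligned; otherwise the caller has to
	 * fall back to rewriting the file. */
	int try_fdelete(int fd, off_t off, size_t len)
	{
		struct stat st;

		if (fstat(fd, &st) < 0)
			return -1;
		if (off % st.st_blksize || len % st.st_blksize) {
			errno = EINVAL;
			return -1;
		}
		return fdelete(fd, off, len);	/* the proposed call */
	}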
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03  0:59   ` Larry McVoy
@ 2001-09-03  1:24     ` Ingo Oeser
  2001-09-03  1:31       ` Alan Cox
  2001-09-03  4:27       ` Bob McElrath
  2001-09-03  1:30     ` Daniel Phillips
  1 sibling, 2 replies; 29+ messages in thread
From: Ingo Oeser @ 2001-09-03  1:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Bob McElrath

On Sun, Sep 02, 2001 at 05:59:38PM -0700, Larry McVoy wrote:
> > What's needed is a generalisation of sparse files and truncate().
> > They both handle similar problems.
> 
> how about 
> 
> 	fzero(int fd, off_t off, size_t len)
> 	fdelete(int fd, off_t off, size_t len)
 
and 

   finsert(int fd, off_t off, size_t len, void *buf, size_t buflen)

> The main problem with this is if the off/len are not block aligned.  If they
> are, then this is just block twiddling; if they aren't, then this is a file
> rewrite anyway.

Yes, that's why I solved this in user space by implementing a C++
stream consisting of multiple mmaps() of files and anonymous
memory. I needed this for someone editing audio streams.

It's basically creating a binary diff ;-)

Another solution for the original problem is to rewrite the file
in-place by copying from the end of the gap to the beginning of
the gap until the gap is shifted to the end of the file and thus
can be left to ftruncate().

This will at least not require more space on disk, but it will take
quite a while and risks corrupting the file if the operation is
aborted.
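
In user space that rewrite is only a few lines - a minimal sketch
(untested, and as said above, not crash-safe):

	#include <unistd.h>

	/* Shift everything after the gap down over it, then truncate. */
	int remove_gap(int fd, off_t gap_start, off_t gap_len)
	{
		char buf[1 << 16];
		off_t src = gap_start + gap_len, dst = gap_start;
		ssize_t n;

		while ((n = pread(fd, buf, sizeof buf, src)) > 0) {
			if (pwrite(fd, buf, n, dst) != n)
				return -1;
			src += n;
			dst += n;
		}
		if (n < 0)
			return -1;
		return ftruncate(fd, dst);	/* dst == old size - gap_len */
	}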

But fzero, fdelete and finsert might be worth considering, since
some file systems which pack tails could also pack these kinds of
partially used blocks and handle them properly.

We already handle partial pages, so why not handle them with
offset/size pairs and enable these mechanisms?  Multimedia streams
would love this kind of API ;-)


Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03  0:59   ` Larry McVoy
  2001-09-03  1:24     ` Ingo Oeser
@ 2001-09-03  1:30     ` Daniel Phillips
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Phillips @ 2001-09-03  1:30 UTC (permalink / raw)
  To: Larry McVoy, Ingo Oeser; +Cc: Bob McElrath, linux-kernel

On September 3, 2001 02:59 am, Larry McVoy wrote:
> > What's needed is a generalisation of sparse files and truncate().
> > They both handle similar problems.
> 
> how about 
> 
> 	fzero(int fd, off_t off, size_t len)

sys_clear :-)

> which zeros the blocks and, if it can, creates a holey file?
> 
> However, that's not what Bob wants; he wants to remove commercials from
> recorded TV.  So what he wants is 
> 
> 	fdelete(int fd, off_t off, size_t len)
> 
> which has the semantics of shifting the rest of the file backwards to "off".
>
> The main problem with this is if the off/len are not block aligned.  If they
> are, then this is just block twiddling; if they aren't, then this is a file
> rewrite anyway.

He could insert blank video frames to pad to the edges of blocks.  Very
theoretical, since we are ages away from having fzero/sys_clear.  Ask Al Viro
if you want to hear the whole ugly story.  (Executive summary: it's hard
enough handling remove/create races with just one boundary per file; now try
it with an unbounded number.)

--
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03  1:24     ` Ingo Oeser
@ 2001-09-03  1:31       ` Alan Cox
  2001-09-03  1:50         ` Ingo Oeser
  2001-09-03  4:27       ` Bob McElrath
  1 sibling, 1 reply; 29+ messages in thread
From: Alan Cox @ 2001-09-03  1:31 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-kernel, Bob McElrath

> Another solution for the original problem is to rewrite the file
> in-place by copying from the end of the gap to the beginning of
> the gap until the gap is shifted to the end of the file and thus
> can be left to ftruncate().

Another approach would be to keep your own index of blocks and use that
for the data reads. Since fdelete and fzero won't actually re-layout the
files in order to make the data linear (even if such calls existed), there
isn't much point performance-wise in doing it in kernel space - it's a very
specialised application.
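
For the index idea, roughly like this - an untested sketch, all names
made up:

	#include <sys/types.h>
	#include <unistd.h>

	/* An edit list mapping logical extents to offsets in the
	 * untouched on-disk file.  Deleting a chunk just splits or
	 * drops extents - no data moves at all. */
	struct extent {
		off_t logical;		/* offset the application sees */
		off_t physical;		/* offset in the real file */
		off_t len;
	};

	/* Read through the edited view; extents are sorted by 'logical'
	 * and non-overlapping.  Returns a short read at extent edges. */
	ssize_t edited_pread(int fd, struct extent *ext, int nr,
			     void *buf, size_t count, off_t pos)
	{
		int i;

		for (i = 0; i < nr; i++) {
			off_t skew = pos - ext[i].logical;

			if (skew < 0 || skew >= ext[i].len)
				continue;
			if (count > (size_t)(ext[i].len - skew))
				count = ext[i].len - skew;
			return pread(fd, buf, count, ext[i].physical + skew);
		}
		return 0;	/* past logical EOF */
	}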

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03  1:31       ` Alan Cox
@ 2001-09-03  1:50         ` Ingo Oeser
  2001-09-03 10:48           ` Alan Cox
  0 siblings, 1 reply; 29+ messages in thread
From: Ingo Oeser @ 2001-09-03  1:50 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Bob McElrath

On Mon, Sep 03, 2001 at 02:31:58AM +0100, Alan Cox wrote:
> Another approach would be to keep your own index of blocks and use that
> for the data reads.

That is reimplementing file system functionality in user space.
I doubt that this is considered good design...

But I've done a similar thing anyway (using an ordered list of
contiguous mmap()ed chunks) some years ago (see my other posting
in this thread mentioning C++) ;-)

> Since fdelete and fzero won't actually re-layout the files in
> order to make the data linear (even if such calls existed),
> there isn't much point performance-wise in doing it in kernel space

That's a problem for the file system being used. And the data
doesn't need to be linear. Current file systems on Linux only
avoid fragmentation, but they don't actively fight it by moving
things around, so this doesn't matter anyway.

> - it's a very specialised application

Editing video and audio streams is more common than you think, and
letting the user wait while we copy 4GB around is not what I
consider user friendly, even by the selective user friendliness
of a Unix ;-)


Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03  1:24     ` Ingo Oeser
  2001-09-03  1:31       ` Alan Cox
@ 2001-09-03  4:27       ` Bob McElrath
  1 sibling, 0 replies; 29+ messages in thread
From: Bob McElrath @ 2001-09-03  4:27 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3316 bytes --]

Ingo Oeser [ingo.oeser@informatik.tu-chemnitz.de] wrote:
> On Sun, Sep 02, 2001 at 05:59:38PM -0700, Larry McVoy wrote:
> > > What's needed is a generalisation of sparse files and truncate().
> > > They both handle similar problems.
> > 
> > how about 
> > 
> > 	fzero(int fd, off_t off, size_t len)
> > 	fdelete(int fd, off_t off, size_t len)
>  
> and 
> 
>    finsert(int fd, off_t off, size_t len, void *buf, size_t buflen)
> 
> > The main problem with this is if the off/len are not block aligned.  If they
> > are, then this is just block twiddling; if they aren't, then this is a file
> > rewrite anyway.

*exactly*  I don't know enough about ext2fs to know whether this is
possible (i.e. a partially filled block in the middle of a file), which
is why I asked.

> Another solution for the original problem is to rewrite the file
> in-place by copying from the end of the gap to the beginning of
> the gap until the gap is shifted to the end of the file and thus
> can be left to ftruncate().

For editing commercials, you'd still have to copy 90% of the data.  In
the US, there's roughly 5 minutes of commercials for every 15 of the
show, so that would only save copying the first 15 minutes...

> This will at least not require more space on disk, but it will take
> quite a while and risks corrupting the file if the operation is
> aborted.

Yep.  I should mention that the Linux/mjpeg tools
(http://mjpeg.sourceforge.net) already have an elegant way of "marking"
a portion of a video and skipping it during playback, through the use of
"edit lists".  (Use xlav/glav to mark it, and then you can lavplay the
edit list, which just contains the start/end of the skipped sections.)
They also have a program to apply the edit list and create a new video
(lavtrans).  But this requires copying the desired sections of video to
a new file, which requires 75% more disk space than the original file,
and takes a looong time.

The idea behind my first message should be obvious here...an almost
atomic operation modifying at most 2 blocks (and marking a bunch as
free) wouldn't require nearly as much disk-thrashing, and would be
nearly instantaneous from the user's perspective.

Disk fragmentation is unimportant when the contiguous chunks are 300MB
long.

> But fzero, fdelete and finsert might be worth considering, since
> some file systems which pack tails could also pack these kinds of
> partially used blocks and handle them properly.

Do the journaling filesystems use blocks in a similar manner to ext2fs?
Anyone know if any of them can handle partially filled blocks in the
middle of a file?

Are there any media-filesystems out there that have these kinds of
extensions?  I'm not sure these extensions would be useful for anything
but editing media...

> We already handle partial pages, so why not handle them with
> offset/size pairs and enable these mechanisms?  Multimedia streams
> would love this kind of API ;-)

Yep yep yep.  What do multimedia people use?  Custom multi-thousand
dollar programs with their own filesystem layer?  What about TiVo?
Didn't they contribute some fs-layer modifications a while back?

Cheers,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03  1:50         ` Ingo Oeser
@ 2001-09-03 10:48           ` Alan Cox
  2001-09-03 14:31             ` Daniel Phillips
                               ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Alan Cox @ 2001-09-03 10:48 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Alan Cox, linux-kernel, Bob McElrath

> That is reimplementing file system functionality in user space.
> I doubt that this is considered good design...

Keeping things out of the kernel is good design. Your block indirections
are no different to other database formats. Perhaps you think we should
have fsql_operation() and libdb in kernel 8)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03 10:48           ` Alan Cox
@ 2001-09-03 14:31             ` Daniel Phillips
  2001-09-03 14:46             ` Bob McElrath
  2001-09-03 21:19             ` Ben Ford
  2 siblings, 0 replies; 29+ messages in thread
From: Daniel Phillips @ 2001-09-03 14:31 UTC (permalink / raw)
  To: Alan Cox, Ingo Oeser; +Cc: Alan Cox, linux-kernel, Bob McElrath

On September 3, 2001 12:48 pm, Alan Cox wrote:
> > That is reimplementing file system functionality in user space.
> > I doubt that this is considered good design...
> 
> Keeping things out of the kernel is good design. Your block indirections
> are no different to other database formats. Perhaps you think we should
> have fsql_operation() and libdb in kernel 8)

For that matter, he could use a database file.  I don't know if Postgres (for 
example) supports streaming read/write from a database record, but if it 
doesn't it could be made to.

Or if he doesn't want to hack Postgres today, he can put his "metadata" in a 
database file and the video data in a separate file.

--
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03 10:48           ` Alan Cox
  2001-09-03 14:31             ` Daniel Phillips
@ 2001-09-03 14:46             ` Bob McElrath
  2001-09-03 14:54               ` Alan Cox
  2001-09-03 15:11               ` Richard Guenther
  2001-09-03 21:19             ` Ben Ford
  2 siblings, 2 replies; 29+ messages in thread
From: Bob McElrath @ 2001-09-03 14:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ingo Oeser, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1697 bytes --]

Alan Cox [alan@lxorguk.ukuu.org.uk] wrote:
> > That is reimplementing file system functionality in user space.
> > I doubt that this is considered good design...
> 
> Keeping things out of the kernel is good design. Your block indirections
> are no different to other database formats. Perhaps you think we should
> have fsql_operation() and libdb in kernel 8)

Well, a filesystem that is:
1) synchronous
2) bypasses linux's buffer cache
3) insert() and delete() to insert and delete from the middle of a file.
4) Has large block sizes

Sounds like a possibility for the kernel to me.  As with most things,
you could do raw disk I/O from userspace, but it seems reasonable to put
it in the kernel.  Call it "mediafs" or something.

I agree that "normal" filesystems like ext2 should not do the insert()
and delete() that were mentioned.  It'd be a lot of work and could
easily get someone into trouble (imagine doing it on small files!)

It appears that SGI's XFS does some of this in IRIX.  They play some
tricks to keep from copying the streaming data.  (i.e. the same buffer gets
passed around as a target for the video device, a source for a userspace
program, and a source for DMA to disk)  They also have some special
flags:
    fcntl(fd, F_SETFL, FDIRECT); /* enables direct disk access */
    open(filename, O_DIRECT);     /* likewise */
See this page for details:
    http://reality.sgi.com/cpirazzi_engr/lg/uv/disk.html

Can Linux disable its buffer cache for a particular filesystem
(something like a 'nocache' mount option)?

Cheers,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03 14:46             ` Bob McElrath
@ 2001-09-03 14:54               ` Alan Cox
  2001-09-03 15:42                 ` Doug McNaught
  2001-09-03 15:11               ` Richard Guenther
  1 sibling, 1 reply; 29+ messages in thread
From: Alan Cox @ 2001-09-03 14:54 UTC (permalink / raw)
  To: Bob McElrath; +Cc: Alan Cox, Ingo Oeser, linux-kernel

> Sounds like a possibility for the kernel to me.  As with most things,

But you have it backwards - things are not "could go in the kernel", things
are "could avoid being in the kernel".

> passed around as a target for the video device, a source for a userspace
> program, and a source for DMA to disk)  They also have some special
> flags:
>     fcntl(fd, F_SETFL, FDIRECT); /* enables direct disk access */
>     open(filename, O_DIRECT);     /* likewise */
> See this page for details:
>     http://reality.sgi.com/cpirazzi_engr/lg/uv/disk.html

Andrea has this working on 2.4 + patches


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03 14:46             ` Bob McElrath
  2001-09-03 14:54               ` Alan Cox
@ 2001-09-03 15:11               ` Richard Guenther
  1 sibling, 0 replies; 29+ messages in thread
From: Richard Guenther @ 2001-09-03 15:11 UTC (permalink / raw)
  To: Bob McElrath; +Cc: Alan Cox, Ingo Oeser, linux-kernel

On Mon, 3 Sep 2001, Bob McElrath wrote:

> Alan Cox [alan@lxorguk.ukuu.org.uk] wrote:
> > > That is reimplementing file system functionality in user space.
> > > I doubt that this is considered good design...
> > 
> > Keeping things out of the kernel is good design. Your block indirections
> > are no different to other database formats. Perhaps you think we should
> > have fsql_operation() and libdb in kernel 8)
> 
> Well, a filesystem that is:
> 1) synchronous
> 2) bypasses linux's buffer cache
> 3) insert() and delete() to insert and delete from the middle of a file.
> 4) Has large block sizes

Well, just make it possible to tell the kernel/VFS something more about
the operation you want to do. Copy/Insert/Delete is in fact some
sort of sendfile operation. For GLAME I did a "simple" (well, it turned
out to be not that simple...) user-level filesystem that supports those
kinds of operations. The interface I chose was
  sendfile(dest_fd, source_fd, count, mode)
where mode can be composed of nothing (overwrite, leave source
intact), INSERT and CUT.

As it is a userspace implementation, byte granularity is supported, but
for kernel-level support I suppose block granularity would suffice and
could be optimized for in the lower-level filesystem code. I'd prefer
such a generic interface over fcntls, which would certainly be possible
at least for a "split this file into two" operation.
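
Usage would look something like this - illustrative only, and I'm
assuming the fds' current positions supply the offsets:

	/* cut a commercial out of a recording; with CUT the copied
	 * bytes are removed from movie_fd, closing the gap: */
	lseek(movie_fd, commercial_start, SEEK_SET);
	sendfile(scrap_fd, movie_fd, commercial_len, CUT);

	/* splice a clip into the middle of the recording; INSERT
	 * shifts the tail up instead of overwriting it: */
	lseek(movie_fd, splice_pos, SEEK_SET);
	lseek(clip_fd, 0, SEEK_SET);
	sendfile(movie_fd, clip_fd, clip_len, INSERT);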

Oh yes - it would help to have this in the kernel, at least if you
want to support sane mmap behaviour (for block-aligned modifications,
of course - byte level is impossible due to aliasing issues, I believe).

Richard.

--
Richard Guenther <richard.guenther@uni-tuebingen.de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
The GLAME Project: http://www.glame.de/


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03 14:54               ` Alan Cox
@ 2001-09-03 15:42                 ` Doug McNaught
  0 siblings, 0 replies; 29+ messages in thread
From: Doug McNaught @ 2001-09-03 15:42 UTC (permalink / raw)
  To: Alan Cox; +Cc: Bob McElrath, Ingo Oeser, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> >     open(filename, O_DIRECT);     /* likewise */
> 
> Andrea has this working on 2.4 + patches

Is O_DIRECT slated to go into mainstream 2.4?  Or is it a 2.5 thing?
Or neither?

-Doug
-- 
Free Dmitry Sklyarov! 
http://www.freesklyarov.org/ 

We will return to our regularly scheduled signature shortly.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Editing-in-place of a large file
  2001-09-03 10:48           ` Alan Cox
  2001-09-03 14:31             ` Daniel Phillips
  2001-09-03 14:46             ` Bob McElrath
@ 2001-09-03 21:19             ` Ben Ford
  2 siblings, 0 replies; 29+ messages in thread
From: Ben Ford @ 2001-09-03 21:19 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:

>>That is reimplementing file system functionality in user space. 
>>I'm in doubts that this is considered good design...
>>
>
>Keeping things out of the kernel is good design. Your block indirections
>are no different to other database formats. Perhaps you think we should
>have fsql_operation() and libdb in kernel 8)
>

From what I've read, that is where Windows is going!

-b

-- 
Number of restrictions placed on "Alice in Wonderland" (public domain)    
eBook:  5

Maximum penalty for reading "Alice in Wonderland" aloud (possible DMCA    
violation):  5 years jail

Average sentence for committing rape: 5 years




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA
@ 2001-09-09 14:46   ` John Ripley
  2001-09-09 16:30     ` John Ripley
                       ` (2 more replies)
  2001-09-10  9:28   ` VDA
  1 sibling, 3 replies; 29+ messages in thread
From: John Ripley @ 2001-09-09 14:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: VDA

VDA wrote:
> 
> Sunday, September 02, 2001, 11:21:37 PM, Bob McElrath wrote:
> BM> I would like to take an extremely large file (multi-gigabyte) and edit
> BM> it by removing a chunk out of the middle.  This is easy enough by
> BM> reading in the entire file and spitting it back out again, but it's
> BM> hardly efficient to read in an 8GB file just to remove a 100MB segment.

> BM> Is there another way to do this?

> BM> Is it possible to modify the inode structure of the underlying
> BM> filesystem to free blocks in the middle?  (What to do with the half-full
> BM> blocks that are left?)  Has anyone written a tool to do something like
> BM> this?

> BM> Is there a way to do this in a filesystem-independent manner?

> A COW fs would be far more useful and cool: an fs where a copy of a
> file does not duplicate all blocks. Blocks get copied-on-write only
> when a copy of the file is written to. There could even be an fs
> compressor which looks for and merges blocks with exactly the same
> contents from different files.
> 
> Maybe ext2/3 folks will play with this idea after ext3?
> 
> I'm planning to write a test program which will scan my ext2 fs and
> report how many duplicate blocks with the same contents it sees (i.e.
> how much I would save with a COW fs).

I've tried this idea. I did an MD5 of every block (4KB) in a partition
and counted the number of blocks with the same hash. Only about 5-10% of
blocks on several filesystems were actually duplicates. This might be
better if you reduced the block size to 512 bytes, but there's a
question of how much extra space the filesystem structures would then
take up.
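
The guts of it fit on one screen - an untested sketch of the idea using
OpenSSL's MD5 (a reconstruction, not the exact program I ran):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <openssl/md5.h>

	#define BLK  4096
	#define DLEN MD5_DIGEST_LENGTH

	static int cmp(const void *a, const void *b)
	{
		return memcmp(a, b, DLEN);
	}

	int main(void)
	{
		unsigned char buf[BLK], *dig = NULL;
		long nr = 0, cap = 0, dups = 0, i;

		/* hash every whole block of stdin */
		while (fread(buf, 1, BLK, stdin) == BLK) {
			if (nr == cap) {
				cap = cap ? 2 * cap : 4096;
				dig = realloc(dig, cap * DLEN);
				if (!dig)
					return 1;
			}
			MD5(buf, BLK, dig + nr++ * DLEN);
		}
		/* sort the digests so duplicates become adjacent */
		qsort(dig, nr, DLEN, cmp);
		for (i = 1; i < nr; i++)
			if (!cmp(dig + i * DLEN, dig + (i - 1) * DLEN))
				dups++;
		printf("%ld blocks, %ld duplicates, %.2f%%\n",
		       nr, dups, nr ? 100.0 * dups / nr : 0.0);
		return 0;
	}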

Basically, it didn't look like compressing duplicate blocks would
actually be worth the extra structures or CPU.

On the other hand, a COW fs would be excellent for making file copying
much quicker. You can do things like copying the linux kernel tree using
'cp -lR', but the files do not act as if they are unique copies - and
I've been bitten many times when I forgot this. If you had COW, you
could just copy the entire tree and forget about the fact they're
linked.

The problem is this needs a bit of userland support, which could only be
done automatically if you did this:

- Keep a hash of the contents of blocks in the buffer-cache.
- The kernel compares the hash of each block write to all blocks already
in the buffer-cache.
- If a duplicate is found, the kernel generates a COW link instead of
writing the block to disk.

Obviously this would involve large amounts of CPU. I think a simple
userland call for 'COW this file to this new file' wouldn't be too
hideous a solution.
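
In pseudo-C the write path would be shaped something like this - every
helper here is invented, and note the full memcmp(), since hashes can
collide:

	/* entirely hypothetical kernel write path */
	void write_block(struct inode *inode, long blocknr, char *data)
	{
		unsigned char h[16];
		struct cached_block *dup;

		md5(data, BLOCK_SIZE, h);	/* hash the new contents */
		dup = bcache_hash_lookup(h);	/* hypothetical index */

		if (dup && !memcmp(dup->data, data, BLOCK_SIZE))
			cow_link(inode, blocknr, dup);	/* share, bump refcount */
		else
			write_through(inode, blocknr, data);
	}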

-- 
John Ripley

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-09 14:46   ` John Ripley
@ 2001-09-09 16:30     ` John Ripley
  2001-09-10  2:43       ` Daniel Phillips
  2001-09-09 17:41     ` Xavier Bestel
  2001-09-14 10:03     ` Pavel Machek
  2 siblings, 1 reply; 29+ messages in thread
From: John Ripley @ 2001-09-09 16:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: VDA

John Ripley wrote:
> 
> VDA wrote:
> >
> > Sunday, September 02, 2001, 11:21:37 PM, Bob McElrath wrote:
> > BM> I would like to take an extremely large file (multi-gigabyte) and edit
> > BM> it by removing a chunk out of the middle.  This is easy enough by
> > BM> reading in the entire file and spitting it back out again, but it's
> > BM> hardly efficient to read in an 8GB file just to remove a 100MB segment.
> 
> > BM> Is there another way to do this?
> 
> > BM> Is it possible to modify the inode structure of the underlying
> > BM> filesystem to free blocks in the middle?  (What to do with the half-full
> > BM> blocks that are left?)  Has anyone written a tool to do something like
> > BM> this?
> 
> > BM> Is there a way to do this in a filesystem-independent manner?
> 
> > A COW fs would be far more useful and cool: an fs where a copy of a
> > file does not duplicate all blocks. Blocks get copied-on-write only
> > when a copy of the file is written to. There could even be an fs
> > compressor which looks for and merges blocks with exactly the same
> > contents from different files.
> >
> > Maybe ext2/3 folks will play with this idea after ext3?
> >
> > I'm planning to write a test program which will scan my ext2 fs and
> > report how many duplicate blocks with the same contents it sees (i.e.
> > how much I would save with a COW fs).
> 
> I've tried this idea. I did an MD5 of every block (4KB) in a partition
> and counted the number of blocks with the same hash. Only about 5-10% of
> blocks on several filesystems were actually duplicates. This might be
> better if you reduced the block size to 512 bytes, but there's a
> question of how much extra space the filesystem structures would then
> take up.

Thought I'd reply to myself with some more details :)

Scanning for duplicates gave the following results:

 512 byte blocks
----------------

/dev/sda5 - swap	-   32122 blocks,  11488 duplicates, 35.76%
/dev/sdb3 - swap	-   25297 blocks,   2302 duplicates,  9.09%
/dev/sdc5 - swap	-   34122 blocks,  10239 duplicates, 30.00%

/dev/sda6 - /tmp	-  210845 blocks,  17697 duplicates,  8.39%
/dev/sda7 - /var	-   32122 blocks,   5327 duplicates, 16.58%
/dev/sdb5 - /home	-  220885 blocks,  24541 duplicates, 11.11%
/dev/sdc7 - /usr	- 1084379 blocks, 122370 duplicates, 11.28%

4096 byte blocks
----------------

/dev/sda5 - swap	-   32122 blocks,   9799 duplicates, 30.50%
/dev/sdb3 - swap	-   26105 blocks,      0 duplicates,  0.00%
/dev/sdc5 - swap	-   34122 blocks,  10539 duplicates, 30.88%

/dev/sda6 - /tmp 	-  210845 blocks,  17880 duplicates,  8.48%
/dev/sda7 - /var	-   32122 blocks,   2816 duplicates,  8.76%
/dev/sdb5 - /home	-  220885 blocks,   8908 duplicates,  4.03%
/dev/sdc7 - /usr	- 1084379 blocks,  71778 duplicates,  6.61%

Interesting results for the swap partitions. Probably full of zeros. The
time between runs probably explains the difference in /tmp.

You can grab the program I used from
http://www.pslam.demon.co.uk/md5-stuff.tar.gz
Run with ./md5device </dev/blah

-- 
John Ripley

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-09 14:46   ` John Ripley
  2001-09-09 16:30     ` John Ripley
@ 2001-09-09 17:41     ` Xavier Bestel
  2001-09-10  1:29       ` John Ripley
  2001-09-10 11:11       ` Ihar Filipau
  2001-09-14 10:03     ` Pavel Machek
  2 siblings, 2 replies; 29+ messages in thread
From: Xavier Bestel @ 2001-09-09 17:41 UTC (permalink / raw)
  To: John Ripley; +Cc: Linux Kernel Mailing List, VDA

On Sun, 09-09-2001 at 18:30, John Ripley wrote:

> /dev/sda6 - /tmp	-  210845 blocks,  17697 duplicates,  8.39%
> /dev/sda7 - /var	-   32122 blocks,   5327 duplicates, 16.58%
> /dev/sdb5 - /home	-  220885 blocks,  24541 duplicates, 11.11%
> /dev/sdc7 - /usr	- 1084379 blocks, 122370 duplicates, 11.28%

How many of these blocks actually belong to file data?

	Xav


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-09 17:41     ` Xavier Bestel
@ 2001-09-10  1:29       ` John Ripley
  2001-09-10  6:45         ` Ragnar Kjørstad
  2001-09-14 10:06         ` Pavel Machek
  2001-09-10 11:11       ` Ihar Filipau
  1 sibling, 2 replies; 29+ messages in thread
From: John Ripley @ 2001-09-10  1:29 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Xavier Bestel, VDA

Xavier Bestel wrote:
> 
> On Sun, 09-09-2001 at 18:30, John Ripley wrote:
> 
> > /dev/sda6 - /tmp      -  210845 blocks,  17697 duplicates,  8.39%
> > /dev/sda7 - /var      -   32122 blocks,   5327 duplicates, 16.58%
> > /dev/sdb5 - /home     -  220885 blocks,  24541 duplicates, 11.11%
> > /dev/sdc7 - /usr      - 1084379 blocks, 122370 duplicates, 11.28%
> 
> How many of these blocks actually belong to file data?

Hmm, good point:

Filesystem         1024-blocks  Used Available Capacity Mounted on
/dev/sda6             841616    4508   837108      1%   /tmp
/dev/sda7             124407   63774    54209     54%   /var
/dev/sdb5             855138  677328   177810     79%   /home
/dev/sdc7            4191237 3946214   245023     94%   /usr

My thinking was that I've managed to run out of space on all of the
partitions in the past and had to prune a lot of stuff... so nearly all
the blocks should contain at least some "likely" data. Still, I guess I
need to verify that this isn't distorting the results. The program needs
to recurse over all files on the filesystem rather than all blocks on a
partition.

-- 
John Ripley

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-09 16:30     ` John Ripley
@ 2001-09-10  2:43       ` Daniel Phillips
  2001-09-10  2:58         ` David Lang
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Phillips @ 2001-09-10  2:43 UTC (permalink / raw)
  To: John Ripley, linux-kernel; +Cc: VDA

On September 9, 2001 06:30 pm, John Ripley wrote:
> Interesting results for the swap partitions. Probably full of zeros.

It doesn't make a lot of sense to spend 30-35% of your swap bandwidth 
swapping zeros in and out, does it?

--
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-10  2:43       ` Daniel Phillips
@ 2001-09-10  2:58         ` David Lang
  0 siblings, 0 replies; 29+ messages in thread
From: David Lang @ 2001-09-10  2:58 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: John Ripley, linux-kernel, VDA

If sectors full of zeros are really that common, then they should never be
swapped out - just a new page allocated and zeroed when it would be swapped
back in. Even better than combining all of them into one block on disk.

David Lang

 On Mon, 10 Sep 2001, Daniel Phillips wrote:

> Date: Mon, 10 Sep 2001 04:43:53 +0200
> From: Daniel Phillips <phillips@bonn-fries.net>
> To: John Ripley <jripley@riohome.com>, linux-kernel@vger.kernel.org
> Cc: VDA <VDA@port.imtp.ilyichevsk.odessa.ua>
> Subject: Re: COW fs (Re: Editing-in-place of a large file)
>
> On September 9, 2001 06:30 pm, John Ripley wrote:
> > Interesting results for the swap partitions. Probably full of zeros.
>
> It doesn't make a lot of sense to spend 30-35% of your swap bandwidth
> swapping zeros in and out, does it?
>
> --
> Daniel
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-10  1:29       ` John Ripley
@ 2001-09-10  6:45         ` Ragnar Kjørstad
  2001-09-14 10:06         ` Pavel Machek
  1 sibling, 0 replies; 29+ messages in thread
From: Ragnar Kjørstad @ 2001-09-10  6:45 UTC (permalink / raw)
  To: John Ripley; +Cc: Linux Kernel Mailing List, Xavier Bestel, VDA

On Mon, Sep 10, 2001 at 02:29:11AM +0100, John Ripley wrote:
> My thinking was that I've managed to run out of space on all of the
> partitions in the past and had to prune a lot of stuff... so nearly all
> the blocks should contain at least some "likely" data. Still, I guess I
> need to verify that this isn't distorting the results. The program needs
> to recurse over all files on the filesystem rather than all blocks on a
> partition.

You can find a program that does that at:
http://www.stud.ntnu.no/~ragnarkj/download/duplicates.tgz

And results from running it on a few different filesystem types (web pages,
users' home directories, software and so on) were posted to the reiserfs-list
a long time ago - look in the archives if you're curious.



-- 
Ragnar Kjørstad
Big Storage

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA
  2001-09-09 14:46   ` John Ripley
@ 2001-09-10  9:28   ` VDA
  2001-09-10  9:35     ` John P. Looney
  1 sibling, 1 reply; 29+ messages in thread
From: VDA @ 2001-09-10  9:28 UTC (permalink / raw)
  To: John Ripley; +Cc: linux-kernel

JR> I've tried this idea. I did an MD5 of every block (4KB) in a partition
JR> and counted the number of blocks with the same hash. Only about 5-10% of
JR> blocks on several filesystems were actually duplicates. This might be
JR> better if you reduced the block size to 512 bytes, but there's a
JR> question of how much extra space the filesystem structures would then
JR> take up.

JR> Basically, it didn't look like compressing duplicate blocks would
JR> actually be worth the extra structures or CPU.

JR> On the other hand, a COW fs would be excellent for making file copying
JR> much quicker. You can do things like copying the linux kernel tree using
JR> 'cp -lR', but the files do not act as if they are unique copies - and
JR> I've been bitten many times when I forgot this. If you had COW, you
JR> could just copy the entire tree and forget about the fact they're
JR> linked.

Yeah, I'm mostly thinking about this kind of COW fs usage. You may copy
gigabytes in an instant and not bother about tracking duplicate
files ("zero blocks left??? where the hell did I copy those .mpgs???").

Now, sometimes we use hardlinks as a "poor man's COW fs", but
I bet it's error-prone. Every now and then you forget it's a
hardlinked kernel tree and start happily hacking on it... :-(

A "compressor" which hunts and merges duplicate blocks is a bonus,
not a primary tool.
-- 
Best regards,
VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-10  9:28   ` VDA
@ 2001-09-10  9:35     ` John P. Looney
  0 siblings, 0 replies; 29+ messages in thread
From: John P. Looney @ 2001-09-10  9:35 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

On Mon, Sep 10, 2001 at 12:28:51PM +0300, VDA mentioned:
> Now, sometimes we use hardlinks as "poor man's COW fs", but
> I bet it's error prone. Every now and then you forget it's a
> hardlinked kernel tree and start happily hacking in it... :-(

 And of course hardlinks don't work on directories...

> A "compressor" which hunts and merges duplicate blocks is a bonus,
> not a primary tool.

 Check out http://freshmeat.net/projects/fslint/ - it's an excellent tool
for hunting down duplicate files, dangling links, etc.

Kate

-- 
_______________________________________
John Looney             Chief Scientist
a n t e f a c t o     t: +353 1 8586004
www.antefacto.com     f: +353 1 8586014


[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-09 17:41     ` Xavier Bestel
  2001-09-10  1:29       ` John Ripley
@ 2001-09-10 11:11       ` Ihar Filipau
  2001-09-10 16:10         ` Kari Hurtta
  1 sibling, 1 reply; 29+ messages in thread
From: Ihar Filipau @ 2001-09-10 11:11 UTC (permalink / raw)
  To: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 793 bytes --]


	Is there any FS that has dynamic allocation?

	Could a number of FSs reside on one partition, each using only
the space it needs?  That would be really cute - the HD would behave
just like RAM.

	Some FSs on a partition, just like a number of files on an FS.
	In other words, a "File Systems' System".

	google and altavista both show nothing...

PS: Interesting as an academic task.  Hm.  Will investigate.

Xavier Bestel wrote:
> 
> On Sun, 09-09-2001 at 18:30, John Ripley wrote:
> 
> > /dev/sda6 - /tmp      -  210845 blocks,  17697 duplicates,  8.39%
> > /dev/sda7 - /var      -   32122 blocks,   5327 duplicates, 16.58%
> > /dev/sdb5 - /home     -  220885 blocks,  24541 duplicates, 11.11%
> > /dev/sdc7 - /usr      - 1084379 blocks, 122370 duplicates, 11.28%
> 
> How many of these blocks actually belong to file data?
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-10 11:11       ` Ihar Filipau
@ 2001-09-10 16:10         ` Kari Hurtta
  0 siblings, 0 replies; 29+ messages in thread
From: Kari Hurtta @ 2001-09-10 16:10 UTC (permalink / raw)
  To: Ihar Filipau; +Cc: Linux Kernel Mailing List

> 	Is there any FS that has dynamic allocation?
> 
> 	Could a number of FSs reside on one partition, each using only
> the space it needs?  That would be really cute - the HD would behave
> just like RAM.
> 
> 	Some FSs on a partition, just like a number of files on an FS.
> 	In other words, a "File Systems' System".
> 
> 	google and altavista both show nothing...
> 
> PS: Interesting as an academic task.  Hm.  Will investigate.

Are you thinking of something similar to AdvFS on Tru64?

( storage or a partition is called a 'volume'
  and filesystems which share a volume are called 'filesets' )

-- 
          /"\                           |  Kari 
          \ /     ASCII Ribbon Campaign |    Hurtta
           X      Against HTML Mail     |
          / \                           |

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-09 14:46   ` John Ripley
  2001-09-09 16:30     ` John Ripley
  2001-09-09 17:41     ` Xavier Bestel
@ 2001-09-14 10:03     ` Pavel Machek
  2 siblings, 0 replies; 29+ messages in thread
From: Pavel Machek @ 2001-09-14 10:03 UTC (permalink / raw)
  To: John Ripley; +Cc: linux-kernel, VDA

Hi!

> - Keep a hash of the contents of blocks in the buffer-cache.
> - The kernel compares the hash of each block write to all blocks already
> in the buffer-cache.
> - If a duplicate is found, the kernel generates a COW link instead of
> writing the block to disk.
> 
> Obviously this would involve large amounts of CPU. I think a simple

Why? If you hashed the hashes, you could do it very fast.
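
E.g. something like this (completely untested): the MD5 is already
uniformly distributed, so the digest's own leading bits can index a
bucket table, and each candidate block costs one probe plus a short
chain walk:

	#include <string.h>

	#define BUCKETS (1 << 20)

	struct hent { unsigned char md5[16]; struct hent *next; };
	static struct hent *table[BUCKETS];

	/* reuse the digest's leading 20 bits as the bucket index */
	static unsigned bucket(const unsigned char *md5)
	{
		return md5[0] | md5[1] << 8 | (md5[2] & 0x0f) << 16;
	}

	static struct hent *lookup(const unsigned char *md5)
	{
		struct hent *h;

		for (h = table[bucket(md5)]; h; h = h->next)
			if (!memcmp(h->md5, md5, 16))
				return h;
		return NULL;
	}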
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: COW fs (Re: Editing-in-place of a large file)
  2001-09-10  1:29       ` John Ripley
  2001-09-10  6:45         ` Ragnar Kjørstad
@ 2001-09-14 10:06         ` Pavel Machek
  1 sibling, 0 replies; 29+ messages in thread
From: Pavel Machek @ 2001-09-14 10:06 UTC (permalink / raw)
  To: John Ripley; +Cc: Linux Kernel Mailing List, Xavier Bestel, VDA

Hi!
> > On Sun, 09-09-2001 at 18:30, John Ripley wrote:
> > 
> > > /dev/sda6 - /tmp      -  210845 blocks,  17697 duplicates,  8.39%
> > > /dev/sda7 - /var      -   32122 blocks,   5327 duplicates, 16.58%
> > > /dev/sdb5 - /home     -  220885 blocks,  24541 duplicates, 11.11%
> > > /dev/sdc7 - /usr      - 1084379 blocks, 122370 duplicates, 11.28%
> > 
> > How many of these blocks actually belong to file data?
> 
> Hmm, good point:
> 
> Filesystem         1024-blocks  Used Available Capacity Mounted on
> /dev/sda6             841616    4508   837108      1%   /tmp
> /dev/sda7             124407   63774    54209     54%   /var
> /dev/sdb5             855138  677328   177810     79%   /home
> /dev/sdc7            4191237 3946214   245023     94%   /usr
> 
> My thinking was that I've managed to run out of space on all of the
> partitions in the past and had to prune a lot of stuff... so nearly all
> the blocks should contain at least some "likely" data. Still, I guess I
> need to verify that this isn't distorting the results. The program needs
> to recurse over all files on the filesystem rather than all blocks on a
> partition.
Just cat /dev/urandom > file to fill it with garbage.
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2001-09-17 22:37 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-09-02 20:21 Editing-in-place of a large file Bob McElrath
2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA
2001-09-09 14:46   ` John Ripley
2001-09-09 16:30     ` John Ripley
2001-09-10  2:43       ` Daniel Phillips
2001-09-10  2:58         ` David Lang
2001-09-09 17:41     ` Xavier Bestel
2001-09-10  1:29       ` John Ripley
2001-09-10  6:45         ` Ragnar Kjørstad
2001-09-14 10:06         ` Pavel Machek
2001-09-10 11:11       ` Ihar Filipau
2001-09-10 16:10         ` Kari Hurtta
2001-09-14 10:03     ` Pavel Machek
2001-09-10  9:28   ` VDA
2001-09-10  9:35     ` John P. Looney
2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser
2001-09-03  0:59   ` Larry McVoy
2001-09-03  1:24     ` Ingo Oeser
2001-09-03  1:31       ` Alan Cox
2001-09-03  1:50         ` Ingo Oeser
2001-09-03 10:48           ` Alan Cox
2001-09-03 14:31             ` Daniel Phillips
2001-09-03 14:46             ` Bob McElrath
2001-09-03 14:54               ` Alan Cox
2001-09-03 15:42                 ` Doug McNaught
2001-09-03 15:11               ` Richard Guenther
2001-09-03 21:19             ` Ben Ford
2001-09-03  4:27       ` Bob McElrath
2001-09-03  1:30     ` Daniel Phillips
