* Editing-in-place of a large file @ 2001-09-02 20:21 Bob McElrath 2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA 2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser 0 siblings, 2 replies; 29+ messages in thread From: Bob McElrath @ 2001-09-02 20:21 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 721 bytes --] I would like to take an extremely large file (multi-gigabyte) and edit it by removing a chunk out of the middle. This is easy enough by reading in the entire file and spitting it back out again, but it's hardly efficient to read in an 8GB file just to remove a 100MB segment. Is there another way to do this? Is it possible to modify the inode structure of the underlying filesystem to free blocks in the middle? (What to do with the half-full blocks that are left?) Has anyone written a tool to do something like this? Is there a way to do this in a filesystem-independent manner? Thanks, -- Bob Bob McElrath (rsmcelrath@students.wisc.edu) Univ. of Wisconsin at Madison, Department of Physics [-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* COW fs (Re: Editing-in-place of a large file) 2001-09-02 20:21 Editing-in-place of a large file Bob McElrath @ 2001-09-02 21:28 ` VDA 2001-09-09 14:46 ` John Ripley 2001-09-10 9:28 ` VDA 2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser 1 sibling, 2 replies; 29+ messages in thread From: VDA @ 2001-09-02 21:28 UTC (permalink / raw) To: linux-kernel Sunday, September 02, 2001, 11:21:37 PM, Bob McElrath wrote: BM> I would like to take an extremely large file (multi-gigabyte) and edit BM> it by removing a chunk out of the middle. This is easy enough by BM> reading in the entire file and spitting it back out again, but it's BM> hardly efficient to read in an 8GB file just to remove a 100MB segment. BM> Is there another way to do this? BM> Is it possible to modify the inode structure of the underlying BM> filesystem to free blocks in the middle? (What to do with the half-full BM> blocks that are left?) Has anyone written a tool to do something like BM> this? BM> Is there a way to do this in a filesystem-independent manner? A COW fs is far more useful and cool: a fs where a copy of a file does not duplicate all blocks. Blocks get copied-on-write only when a copy of the file is written to. There could even be a fs compressor which looks for and merges blocks with exactly the same contents from different files. Maybe ext2/3 folks will play with this idea after ext3? I'm planning to write a test program which will scan my ext2 fs and report how many duplicate blocks with the same contents it sees (i.e. how much I would save with a COW fs) -- Best regards, VDA mailto:VDA@port.imtp.ilyichevsk.odessa.ua http://port.imtp.ilyichevsk.odessa.ua/vda/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA @ 2001-09-09 14:46 ` John Ripley 2001-09-09 16:30 ` John Ripley ` (2 more replies) 2001-09-10 9:28 ` VDA 1 sibling, 3 replies; 29+ messages in thread From: John Ripley @ 2001-09-09 14:46 UTC (permalink / raw) To: linux-kernel; +Cc: VDA VDA wrote: > > Sunday, September 02, 2001, 11:21:37 PM, Bob McElrath wrote: > BM> I would like to take an extremely large file (multi-gigabyte) and edit > BM> it by removing a chunk out of the middle. This is easy enough by > BM> reading in the entire file and spitting it back out again, but it's > BM> hardly efficient to read in an 8GB file just to remove a 100MB segment. > BM> Is there another way to do this? > BM> Is it possible to modify the inode structure of the underlying > BM> filesystem to free blocks in the middle? (What to do with the half-full > BM> blocks that are left?) Has anyone written a tool to do something like > BM> this? > BM> Is there a way to do this in a filesystem-independent manner? > A COW fs is far more useful and cool: a fs where a copy of a file > does not duplicate all blocks. Blocks get copied-on-write only when > a copy of the file is written to. There could even be a fs compressor > which looks for and merges blocks with exactly the same contents from > different files. > > Maybe ext2/3 folks will play with this idea after ext3? > > I'm planning to write a test program which will scan my ext2 fs and > report how many duplicate blocks with the same contents it sees (i.e > how much I would save with a COW fs) I've tried this idea. I did an MD5 of every block (4KB) in a partition and counted the number of blocks with the same hash. Only about 5-10% of blocks on several filesystems were actually duplicates. This might be better if you reduced the block size to 512 bytes, but there's a question of how much extra space filesystem structures would then take up. 
Basically, it didn't look like compressing duplicate blocks would actually be worth the extra structures or CPU. On the other hand, a COW fs would be excellent for making file copying much quicker. You can do things like copying the linux kernel tree using 'cp -lR', but the files do not act as if they are unique copies - and I've been bitten many times when I forgot this. If you had COW, you could just copy the entire tree and forget about the fact they're linked. The problem is this needs a bit of userland support, which could only be done automatically if you did this: - Keep a hash of the contents of blocks in the buffer-cache. - The kernel compares the hash of each block write to all blocks already in the buffer-cache. - If a duplicate is found, the kernel generates a COW link instead of writing the block to disk. Obviously this would involve large amounts of CPU. I think a simple userland call for 'COW this file to this new file' wouldn't be too hideous a solution. -- John Ripley ^ permalink raw reply [flat|nested] 29+ messages in thread
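[Editor's note: John's write-path dedup scheme above could be prototyped in userspace along these lines. This is a toy sketch under stated assumptions, not the actual proposal's implementation: FNV-1a stands in for a cryptographic hash, the table is a fixed-size array with linear probing, and a real filesystem would also track reference counts for the COW links. A full-block memcmp on a hash match guards against collisions, as a real system would need.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define TABLE_SIZE 1024   /* toy table; a real fs would size this dynamically */

/* FNV-1a hash of one block (stand-in for the MD5 mentioned above) */
static uint64_t block_hash(const unsigned char *blk)
{
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        h ^= blk[i];
        h *= 1099511628211ULL;
    }
    return h;
}

struct dedup_entry {
    int used;
    uint64_t hash;
    int block_no;                    /* existing block to COW-link to */
    unsigned char data[BLOCK_SIZE];  /* kept so a hash match can be verified */
};

static struct dedup_entry table[TABLE_SIZE];
static int next_block_no;

/*
 * "Write" a block: if an identical block is already known, return its
 * block number (the COW link); otherwise allocate a new block number.
 */
static int write_block(const unsigned char *blk)
{
    uint64_t h = block_hash(blk);
    size_t slot = h % TABLE_SIZE;

    while (table[slot].used) {       /* linear probing */
        if (table[slot].hash == h &&
            memcmp(table[slot].data, blk, BLOCK_SIZE) == 0)
            return table[slot].block_no;   /* duplicate: COW link */
        slot = (slot + 1) % TABLE_SIZE;
    }
    table[slot].used = 1;
    table[slot].hash = h;
    table[slot].block_no = next_block_no++;
    memcpy(table[slot].data, blk, BLOCK_SIZE);
    return table[slot].block_no;
}
```

Hashing the hashes, as Pavel suggests later in the thread, is exactly what the table lookup does here: only blocks landing in the same slot ever get a full compare.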
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-09 14:46 ` John Ripley @ 2001-09-09 16:30 ` John Ripley 2001-09-10 2:43 ` Daniel Phillips 2001-09-09 17:41 ` Xavier Bestel 2001-09-14 10:03 ` Pavel Machek 2 siblings, 1 reply; 29+ messages in thread From: John Ripley @ 2001-09-09 16:30 UTC (permalink / raw) To: linux-kernel; +Cc: VDA John Ripley wrote: > > VDA wrote: > > > > Sunday, September 02, 2001, 11:21:37 PM, Bob McElrath wrote: > > BM> I would like to take an extremely large file (multi-gigabyte) and edit > > BM> it by removing a chunk out of the middle. This is easy enough by > > BM> reading in the entire file and spitting it back out again, but it's > > BM> hardly efficient to read in an 8GB file just to remove a 100MB segment. > > > BM> Is there another way to do this? > > > BM> Is it possible to modify the inode structure of the underlying > > BM> filesystem to free blocks in the middle? (What to do with the half-full > > BM> blocks that are left?) Has anyone written a tool to do something like > > BM> this? > > > BM> Is there a way to do this in a filesystem-independent manner? > > > A COW fs is far more useful and cool: a fs where a copy of a file > > does not duplicate all blocks. Blocks get copied-on-write only when > > a copy of the file is written to. There could even be a fs compressor > > which looks for and merges blocks with exactly the same contents from > > different files. > > > > Maybe ext2/3 folks will play with this idea after ext3? > > > > I'm planning to write a test program which will scan my ext2 fs and > > report how many duplicate blocks with the same contents it sees (i.e > > how much I would save with a COW fs) > > I've tried this idea. I did an MD5 of every block (4KB) in a partition > and counted the number of blocks with the same hash. Only about 5-10% of > blocks on several filesystems were actually duplicates. 
This might be > better if you reduced the block size to 512 bytes, but there's a > question of how much extra space filesystem structures would then take > up. Thought I'd reply to myself with some more details :) Scanning for duplicates gave the following results: 512 byte blocks ---------------- /dev/sda5 - swap - 32122 blocks, 11488 duplicates, 35.76% /dev/sdb3 - swap - 25297 blocks, 2302 duplicates, 9.09% /dev/sdc5 - swap - 34122 blocks, 10239 duplicates, 30.00% /dev/sda6 - /tmp - 210845 blocks, 17697 duplicates, 8.39% /dev/sda7 - /var - 32122 blocks, 5327 duplicates, 16.58% /dev/sdb5 - /home - 220885 blocks, 24541 duplicates, 11.11% /dev/sdc7 - /usr - 1084379 blocks, 122370 duplicates, 11.28% 4096 byte blocks ---------------- /dev/sda5 - swap - 32122 blocks, 9799 duplicates, 30.50% /dev/sdb3 - swap - 26105 blocks, 0 duplicates, 0.00% /dev/sdc5 - swap - 34122 blocks, 10539 duplicates, 30.88% /dev/sda6 - /tmp - 210845 blocks, 17880 duplicates, 8.48% /dev/sda7 - /var - 32122 blocks, 2816 duplicates, 8.76% /dev/sdb5 - /home - 220885 blocks, 8908 duplicates, 4.03% /dev/sdc7 - /usr - 1084379 blocks, 71778 duplicates, 6.61% Interesting results for the swap partitions. Probably full of zeros. The time between runs probably explains the difference in /tmp. You can grab the program I used from http://www.pslam.demon.co.uk/md5-stuff.tar.gz Run with ./md5device </dev/blah -- John Ripley ^ permalink raw reply [flat|nested] 29+ messages in thread
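[Editor's note: the duplicate counting John reports above can be sketched as follows. This is a hypothetical, simplified reconstruction, not the actual md5device program: plain 64-bit values stand in for the per-block MD5 digests, and every repeat of a hash after its first occurrence counts as one duplicate block, which is how the percentages above read.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 64-bit block hashes */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Given one hash per block, count how many blocks duplicate an earlier
 * block: sort the hashes, then each repeat of a value after its first
 * occurrence is one duplicate.
 */
static size_t count_duplicates(uint64_t *hashes, size_t n)
{
    size_t dups = 0;
    qsort(hashes, n, sizeof *hashes, cmp_u64);
    for (size_t i = 1; i < n; i++)
        if (hashes[i] == hashes[i - 1])
            dups++;
    return dups;
}
```

The real scan would fill the hash array by reading the device in block-sized chunks and digesting each one; sorting makes the whole count O(n log n) rather than comparing every pair.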
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-09 16:30 ` John Ripley @ 2001-09-10 2:43 ` Daniel Phillips 2001-09-10 2:58 ` David Lang 0 siblings, 1 reply; 29+ messages in thread From: Daniel Phillips @ 2001-09-10 2:43 UTC (permalink / raw) To: John Ripley, linux-kernel; +Cc: VDA On September 9, 2001 06:30 pm, John Ripley wrote: > Interesting results for the swap partitions. Probably full of zeros. It doesn't make a lot of sense to spend 30-35% of your swap bandwidth swapping zeros in and out, does it? -- Daniel ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-10 2:43 ` Daniel Phillips @ 2001-09-10 2:58 ` David Lang 0 siblings, 0 replies; 29+ messages in thread From: David Lang @ 2001-09-10 2:58 UTC (permalink / raw) To: Daniel Phillips; +Cc: John Ripley, linux-kernel, VDA If sectors full of zeros are really that common then they should never be swapped out; just allocate and zero a new page when one would be swapped back in. Even better than combining all of them into one block on disk. David Lang On Mon, 10 Sep 2001, Daniel Phillips wrote: > Date: Mon, 10 Sep 2001 04:43:53 +0200 > From: Daniel Phillips <phillips@bonn-fries.net> > To: John Ripley <jripley@riohome.com>, linux-kernel@vger.kernel.org > Cc: VDA <VDA@port.imtp.ilyichevsk.odessa.ua> > Subject: Re: COW fs (Re: Editing-in-place of a large file) > > On September 9, 2001 06:30 pm, John Ripley wrote: > > Interesting results for the swap partitions. Probably full of zeros. > > It doesn't make a lot of sense to spend 30-35% of your swap bandwidth > swapping zeros in and out, does it? > > -- > Daniel > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 29+ messages in thread
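[Editor's note: the optimisation David suggests amounts to a zero-page check on the swap-out path. The detection itself is cheap; a sketch, with the page size hard-coded and the overlapping-memcmp trick avoiding a separate all-zero reference page.]

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Return nonzero if the page is all zero bytes.  page[0] == 0 plus
 * memcmp(page, page + 1, PAGE_SIZE - 1) == 0 forces every byte to
 * equal its successor, i.e. all bytes equal page[0] == 0.
 * A swap-out path could drop such a page and simply hand back a
 * freshly zeroed page on the next fault instead of reading from swap.
 */
static int page_is_zero(const unsigned char *page)
{
    return page[0] == 0 &&
           memcmp(page, page + 1, PAGE_SIZE - 1) == 0;
}
```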
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-09 14:46 ` John Ripley 2001-09-09 16:30 ` John Ripley @ 2001-09-09 17:41 ` Xavier Bestel 2001-09-10 1:29 ` John Ripley 2001-09-10 11:11 ` Ihar Filipau 1 sibling, 2 replies; 29+ messages in thread From: Xavier Bestel @ 2001-09-09 17:41 UTC (permalink / raw) To: John Ripley; +Cc: Linux Kernel Mailing List, VDA le dim 09-09-2001 at 18:30 John Ripley a écrit : > /dev/sda6 - /tmp - 210845 blocks, 17697 duplicates, 8.39% > /dev/sda7 - /var - 32122 blocks, 5327 duplicates, 16.58% > /dev/sdb5 - /home - 220885 blocks, 24541 duplicates, 11.11% > /dev/sdc7 - /usr - 1084379 blocks, 122370 duplicates, 11.28% How many of these blocks actually belong to file data ? Xav ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-09 17:41 ` Xavier Bestel @ 2001-09-10 1:29 ` John Ripley 2001-09-10 6:45 ` Ragnar Kjørstad 2001-09-14 10:06 ` Pavel Machek 1 sibling, 2 replies; 29+ messages in thread From: John Ripley @ 2001-09-10 1:29 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Xavier Bestel, VDA Xavier Bestel wrote: > > le dim 09-09-2001 at 18:30 John Ripley a écrit : > > > /dev/sda6 - /tmp - 210845 blocks, 17697 duplicates, 8.39% > > /dev/sda7 - /var - 32122 blocks, 5327 duplicates, 16.58% > > /dev/sdb5 - /home - 220885 blocks, 24541 duplicates, 11.11% > > /dev/sdc7 - /usr - 1084379 blocks, 122370 duplicates, 11.28% > > How many of these blocks actually belong to file data ? Hmm, good point: Filesystem 1024-blocks Used Available Capacity Mounted on /dev/sda6 841616 4508 837108 1% /tmp /dev/sda7 124407 63774 54209 54% /var /dev/sdb5 855138 677328 177810 79% /home /dev/sdc7 4191237 3946214 245023 94% /usr My thinking was that I've managed to run out of space on all of the partitions in the past and had to prune a lot of stuff... so nearly all the blocks should contain at least some "likely" data. Still, I guess I need to verify that this isn't distorting the results. The program needs to recurse over all files on the filesystem rather than all blocks on a partition. -- John Ripley ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-10 1:29 ` John Ripley @ 2001-09-10 6:45 ` Ragnar Kjørstad 2001-09-14 10:06 ` Pavel Machek 1 sibling, 0 replies; 29+ messages in thread From: Ragnar Kjørstad @ 2001-09-10 6:45 UTC (permalink / raw) To: John Ripley; +Cc: Linux Kernel Mailing List, Xavier Bestel, VDA On Mon, Sep 10, 2001 at 02:29:11AM +0100, John Ripley wrote: > My thinking was that I've managed to run out of space on all of the > partitions in the past and had to prune a lot of stuff... so nearly all > the blocks should contain at least some "likely" data. Still, I guess I > need to verify that this isn't distorting the results. The program needs > to recurse over all files on the filesystem rather than all blocks on a > partition. You can find a program that does that at: http://www.stud.ntnu.no/~ragnarkj/download/duplicates.tgz And results from running on a few different filesystem types (web pages, users' home directories, software and so on) were posted to reiserfs-list a long time ago - look in the archives if you're curious. -- Ragnar Kjørstad Big Storage ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-10 1:29 ` John Ripley 2001-09-10 6:45 ` Ragnar Kjørstad @ 2001-09-14 10:06 ` Pavel Machek 1 sibling, 0 replies; 29+ messages in thread From: Pavel Machek @ 2001-09-14 10:06 UTC (permalink / raw) To: John Ripley; +Cc: Linux Kernel Mailing List, Xavier Bestel, VDA Hi! > > le dim 09-09-2001 at 18:30 John Ripley a écrit : > > > > > /dev/sda6 - /tmp - 210845 blocks, 17697 duplicates, 8.39% > > > /dev/sda7 - /var - 32122 blocks, 5327 duplicates, 16.58% > > > /dev/sdb5 - /home - 220885 blocks, 24541 duplicates, 11.11% > > > /dev/sdc7 - /usr - 1084379 blocks, 122370 duplicates, 11.28% > > > > How many of these blocks actually belong to file data ? > > Hmm, good point: > > Filesystem 1024-blocks Used Available Capacity Mounted on > /dev/sda6 841616 4508 837108 1% /tmp > /dev/sda7 124407 63774 54209 54% /var > /dev/sdb5 855138 677328 177810 79% /home > /dev/sdc7 4191237 3946214 245023 94% /usr > > My thinking was that I've managed to run out of space on all of the > partitions in the past and had to prune a lot of stuff... so nearly all > the blocks should contain at least some "likely" data. Still, I guess I > need to verify that this isn't distorting the results. The program needs > to recurse over all files on the filesystem rather than all blocks on a > partition. just cat /dev/urandom > file to fill it with garbage -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-09 17:41 ` Xavier Bestel 2001-09-10 1:29 ` John Ripley @ 2001-09-10 11:11 ` Ihar Filipau 2001-09-10 16:10 ` Kari Hurtta 1 sibling, 1 reply; 29+ messages in thread From: Ihar Filipau @ 2001-09-10 11:11 UTC (permalink / raw) To: Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 793 bytes --] Is there any FS that has dynamic allocation? A number of FSs could reside on one partition, each using only the space it needs - this would be really cute: the HD would behave just like RAM. Some FSs on a partition, just like a number of files on a FS. In other words, a "File Systems' System". google and altavista both show nothing... PS Interesting as an academic task. Hm. Will investigate. Xavier Bestel wrote: > > le dim 09-09-2001 at 18:30 John Ripley a écrit : > > > /dev/sda6 - /tmp - 210845 blocks, 17697 duplicates, 8.39% > > /dev/sda7 - /var - 32122 blocks, 5327 duplicates, 16.58% > > /dev/sdb5 - /home - 220885 blocks, 24541 duplicates, 11.11% > > /dev/sdc7 - /usr - 1084379 blocks, 122370 duplicates, 11.28% > > How many of these blocks actually belong to file data ? > [-- Attachment #2: Card for Ihar Filipau --] [-- Type: text/x-vcard, Size: 407 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-10 11:11 ` Ihar Filipau @ 2001-09-10 16:10 ` Kari Hurtta 0 siblings, 0 replies; 29+ messages in thread From: Kari Hurtta @ 2001-09-10 16:10 UTC (permalink / raw) To: Ihar Filipau; +Cc: Linux Kernel Mailing List > > Is there any FS that has dynamic allocation? > > On one partition can reside a number of FSs, each using only needed space? this > would be really cute - HD behaves just like RAM. > > Some FSs on a partition, just like a number of files on a FS. > In other words a "File Systems' System". > > google and altavista both show nothing... > > PS Interesting as an academic task. Hm. Will investigate. Are you thinking of something similar to AdvFS of Tru64? (storage, or a partition, is called a 'volume', and filesystems which share a volume are called 'filesets') -- /"\ | Kari \ / ASCII Ribbon Campaign | Hurtta X Against HTML Mail | / \ | ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-09 14:46 ` John Ripley 2001-09-09 16:30 ` John Ripley 2001-09-09 17:41 ` Xavier Bestel @ 2001-09-14 10:03 ` Pavel Machek 2 siblings, 0 replies; 29+ messages in thread From: Pavel Machek @ 2001-09-14 10:03 UTC (permalink / raw) To: John Ripley; +Cc: linux-kernel, VDA Hi! > - Keep a hash of the contents of blocks in the buffer-cache. > - The kernel compares the hash of each block write to all blocks already > in the buffer-cache. > - If a duplicate is found, the kernel generates a COW link instead of > writing the block to disk. > > Obviously this would involve large amounts of CPU. I think a simple Why? If you hashed the hashes, you could do it very fast. Pavel -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA 2001-09-09 14:46 ` John Ripley @ 2001-09-10 9:28 ` VDA 2001-09-10 9:35 ` John P. Looney 1 sibling, 1 reply; 29+ messages in thread From: VDA @ 2001-09-10 9:28 UTC (permalink / raw) To: John Ripley; +Cc: linux-kernel JR> I've tried this idea. I did an MD5 of every block (4KB) in a partition JR> and counted the number of blocks with the same hash. Only about 5-10% of JR> blocks on several filesystems were actually duplicates. This might be JR> better if you reduced the block size to 512 bytes, but there's a JR> question of how much extra space filesystem structures would then take JR> up. JR> Basically, it didn't look like compressing duplicate blocks would JR> actually be worth the extra structures or CPU. JR> On the other hand, a COW fs would be excellent for making file copying JR> much quicker. You can do things like copying the linux kernel tree using JR> 'cp -lR', but the files do not act as if they are unique copies - and JR> I've been bitten many times when I forgot this. If you had COW, you JR> could just copy the entire tree and forget about the fact they're JR> linked. Yeah, I'm mostly thinking about this kind of COW fs usage. You may copy gigabytes in an instant and not bother about tracking duplicate files ("zero blocks left??? where the hell did I copy those .mpg's???"). Now, sometimes we use hardlinks as a "poor man's COW fs", but I bet it's error prone. Every now and then you forget it's a hardlinked kernel tree and start happily hacking in it... :-( A "compressor" which hunts down and merges duplicate blocks is a bonus, not a primary tool. -- Best regards, VDA mailto:VDA@port.imtp.ilyichevsk.odessa.ua http://port.imtp.ilyichevsk.odessa.ua/vda/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: COW fs (Re: Editing-in-place of a large file) 2001-09-10 9:28 ` VDA @ 2001-09-10 9:35 ` John P. Looney 0 siblings, 0 replies; 29+ messages in thread From: John P. Looney @ 2001-09-10 9:35 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 714 bytes --] On Mon, Sep 10, 2001 at 12:28:51PM +0300, VDA mentioned: > Now, sometimes we use hardlinks as "poor man's COW fs", but > I bet it's error prone. Every now and then you forget it's a > hardlinked kernel tree and start happily hacking in it... :-( And of course hardlinks don't work on directories... > A "compressor" which hunts and merges duplicate blocks is a bonus, > not a primary tool. Check out http://freshmeat.net/projects/fslint/ - it's an excellent tool for hunting down duplicate files, dangling links etc. Kate -- _______________________________________ John Looney Chief Scientist a n t e f a c t o t: +353 1 8586004 www.antefacto.com f: +353 1 8586014 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-02 20:21 Editing-in-place of a large file Bob McElrath 2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA @ 2001-09-02 21:30 ` Ingo Oeser 2001-09-03 0:59 ` Larry McVoy 1 sibling, 1 reply; 29+ messages in thread From: Ingo Oeser @ 2001-09-02 21:30 UTC (permalink / raw) To: Bob McElrath; +Cc: linux-kernel On Sun, Sep 02, 2001 at 03:21:37PM -0500, Bob McElrath wrote: > I would like to take an extremely large file (multi-gigabyte) and edit > it by removing a chunk out of the middle. This is easy enough by > reading in the entire file and spitting it back out again, but it's > hardly efficient to read in an 8GB file just to remove a 100MB segment. > > Is there another way to do this? It's basically changing ownership (in terms of "which inode owns which blocks") of blocks. There is just no POSIX API to do this, and that's why there is no simple way to do it. Applications handling such large files usually implement a chunk management, which can mark chunks as "unused" and skip them while processing the file. What's needed is a generalisation of sparse files and truncate(). They both handle similar problems. For now I would seriously consider editing the ext2 structures for this, because that's the only way you can do this right now. Regards Ingo Oeser -- In der Wunschphantasie vieler Mann-Typen [ist die Frau] unsigned und operatorvertraeglich. --- Dietz Proepper in dasr ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser @ 2001-09-03 0:59 ` Larry McVoy 2001-09-03 1:24 ` Ingo Oeser 2001-09-03 1:30 ` Daniel Phillips 0 siblings, 2 replies; 29+ messages in thread From: Larry McVoy @ 2001-09-03 0:59 UTC (permalink / raw) To: Ingo Oeser; +Cc: Bob McElrath, linux-kernel > What's needed is a generalisation of sparse files and truncate(). > They both handle similar problems. how about fzero(int fd, off_t off, size_t len) which zeros the blocks and if it can creates a holey file? However, that's not what Bob wants, he wants to remove commercials from recorded TV. So what he wants is fdelete(int fd, off_t off, size_t len) which has the semantics of shifting the rest of the file backwards to "off". The main problem with this is if the off/len are not block aligned. If they are, then this is just block twiddling, if they aren't, then this is a file rewrite anyway. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 29+ messages in thread
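[Editor's note: Larry's proposed fzero() can be emulated in userspace today, minus the hole creation. A sketch under that assumption: a real implementation would free whole blocks in the range to make the file sparse ("holey"), while this helper just overwrites the range with zero bytes.]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Userspace sketch of fzero(int fd, off_t off, size_t len):
 * overwrite [off, off + len) with zeros, chunk by chunk.
 */
static int fzero_emulated(int fd, off_t off, size_t len)
{
    static const char zeros[4096];  /* zero-initialized scratch block */

    while (len > 0) {
        size_t chunk = len < sizeof zeros ? len : sizeof zeros;
        if (pwrite(fd, zeros, chunk, off) != (ssize_t)chunk)
            return -1;
        off += chunk;
        len -= chunk;
    }
    return 0;
}
```

Because pwrite() takes an explicit offset, nothing here cares about block alignment; only a real hole-punching version would, as Larry notes.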
* Re: Editing-in-place of a large file 2001-09-03 0:59 ` Larry McVoy @ 2001-09-03 1:24 ` Ingo Oeser 2001-09-03 1:31 ` Alan Cox 2001-09-03 4:27 ` Bob McElrath 2001-09-03 1:30 ` Daniel Phillips 1 sibling, 2 replies; 29+ messages in thread From: Ingo Oeser @ 2001-09-03 1:24 UTC (permalink / raw) To: linux-kernel; +Cc: Bob McElrath On Sun, Sep 02, 2001 at 05:59:38PM -0700, Larry McVoy wrote: > > What's needed is a generalisation of sparse files and truncate(). > > They both handle similar problems. > > how about > > fzero(int fd, off_t off, size_t len) > fdelete(int fd, off_t off, size_t len) and finsert(int fd, off_t off, size_t len, void *buf, size_t buflen) > The main problem with this is if the off/len are not block aligned. If they > are, then this is just block twiddling, if they aren't, then this is a file > rewrite anyway. Yes, that's why I solved this in user space by implementing a C++ stream consisting of multiple mmaps() of files and anonymous memory. I needed this for someone editing audio streams. It's basically creating a binary diff ;-) Another solution for the original problem is to rewrite the file in-place by copying from the end of the gap to the beginning of the gap until the gap is shifted to the end of the file and thus can be left to ftruncate(). This will at least not require more space on disk, but it will take quite a while and risks corrupting the file if the operation is aborted part-way. But fzero, fdelete and finsert might be worth considering, since file systems which pack tails could also pack these kinds of partially used blocks and handle them properly. We already handle partial pages, so why not handle them with offset/size pairs and enable these mechanisms? Multimedia streams would love these kinds of APIs ;-) Regards Ingo Oeser ^ permalink raw reply [flat|nested] 29+ messages in thread
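[Editor's note: the copy-back-and-truncate scheme Ingo describes is straightforward to sketch with pread()/pwrite()/ftruncate(). Hypothetical helper, byte-granular since this is plain copying; as Ingo says, the file is corrupt if the process dies mid-copy, and there is no recovery logic here.]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Userspace emulation of the proposed fdelete(): shift everything after
 * [off, off + len) down to off, then ftruncate the now-duplicated tail
 * away.  Uses constant memory regardless of file size.
 */
static int fdelete_emulated(int fd, off_t off, off_t len)
{
    char buf[64 * 1024];
    struct stat st;
    off_t rpos, wpos;
    ssize_t n;

    if (fstat(fd, &st) < 0)
        return -1;
    rpos = off + len;   /* read cursor: first byte after the gap */
    wpos = off;         /* write cursor: start of the gap */
    while ((n = pread(fd, buf, sizeof buf, rpos)) > 0) {
        if (pwrite(fd, buf, n, wpos) != n)
            return -1;
        rpos += n;
        wpos += n;
    }
    if (n < 0)
        return -1;
    return ftruncate(fd, st.st_size - len);
}
```

For Bob's 8GB file this copies only the data after the removed 100MB segment, instead of rewriting all 8GB to a second file - but, unlike the rewrite, it is not crash-safe.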
* Re: Editing-in-place of a large file 2001-09-03 1:24 ` Ingo Oeser @ 2001-09-03 1:31 ` Alan Cox 2001-09-03 1:50 ` Ingo Oeser 1 sibling, 1 reply; 29+ messages in thread From: Alan Cox @ 2001-09-03 1:31 UTC (permalink / raw) To: Ingo Oeser; +Cc: linux-kernel, Bob McElrath > Another solution for the original problem is to rewrite the file > in-place by coping from the end of the gap to the beginning of > the gap until the gap is shifted to the end of the file and thus > can be left to ftruncate(). Another approach would be to keep your own index of blocks and use that for the data reads. Since fdelete and fzero won't actually re-layout the files in order to make the data linear (even if such calls existed) there isn't much point performance-wise doing it in kernel space - it's a very specialised application ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 1:31 ` Alan Cox @ 2001-09-03 1:50 ` Ingo Oeser 2001-09-03 10:48 ` Alan Cox 0 siblings, 1 reply; 29+ messages in thread From: Ingo Oeser @ 2001-09-03 1:50 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel, Bob McElrath On Mon, Sep 03, 2001 at 02:31:58AM +0100, Alan Cox wrote: > Another approach would be to keep your own index of blocks and use that > for the data reads. That is reimplementing file system functionality in user space. I'm in doubts that this is considered good design... But I've done a similar thing anyway (using an ordered list of contiguous mmap()ed chunks) some years ago (see my other posting in this thread mentioning C++) ;-) > Since fdelete and fzero wont actually relayout the files in > order to make the data linear (even if such calls existed) > there isnt much point performancewise doing it in kernel space That's the problem of the file system to be used. And the data doesn't need to be linear. Current file systems on Linux only avoid fragmentation, but they don't actively fight it by moving things around, so this doesn't matter anyway. > - its a very specialised application Editing video and audio streams is more common than you think, and letting the user wait while we copy 4GB around is not what I consider user friendly, even for the selective user-friendliness of a Unix ;-) Regards Ingo Oeser ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 1:50 ` Ingo Oeser @ 2001-09-03 10:48 ` Alan Cox 2001-09-03 14:31 ` Daniel Phillips ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Alan Cox @ 2001-09-03 10:48 UTC (permalink / raw) To: Ingo Oeser; +Cc: Alan Cox, linux-kernel, Bob McElrath > That is reimplementing file system functionality in user space. > I'm in doubts that this is considered good design... Keeping things out of the kernel is good design. Your block indirections are no different to other database formats. Perhaps you think we should have fsql_operation() and libdb in kernel 8) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 10:48 ` Alan Cox @ 2001-09-03 14:31 ` Daniel Phillips 2001-09-03 14:46 ` Bob McElrath 2001-09-03 21:19 ` Ben Ford 2 siblings, 0 replies; 29+ messages in thread From: Daniel Phillips @ 2001-09-03 14:31 UTC (permalink / raw) To: Alan Cox, Ingo Oeser; +Cc: Alan Cox, linux-kernel, Bob McElrath On September 3, 2001 12:48 pm, Alan Cox wrote: > > That is reimplementing file system functionality in user space. > > I'm in doubts that this is considered good design... > > Keeping things out of the kernel is good design. Your block indirections > are no different to other database formats. Perhaps you think we should > have fsql_operation() and libdb in kernel 8) For that matter, he could use a database file. I don't know if Postgres (for example) supports streaming read/write from a database record, but if it doesn't it could be made to. Or if he doesn't want to hack Postgres today, he can put his "metadata" in a database file and the video data in a separate file. -- Daniel ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 10:48 ` Alan Cox 2001-09-03 14:31 ` Daniel Phillips @ 2001-09-03 14:46 ` Bob McElrath 2001-09-03 14:54 ` Alan Cox 2001-09-03 15:11 ` Richard Guenther 2 siblings, 2 replies; 29+ messages in thread From: Bob McElrath @ 2001-09-03 14:46 UTC (permalink / raw) To: Alan Cox; +Cc: Ingo Oeser, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1697 bytes --] Alan Cox [alan@lxorguk.ukuu.org.uk] wrote: > > That is reimplementing file system functionality in user space. > > I'm in doubts that this is considered good design... > > Keeping things out of the kernel is good design. Your block indirections > are no different to other database formats. Perhaps you think we should > have fsql_operation() and libdb in kernel 8) Well, a filesystem that is: 1) synchronous 2) bypasses linux's buffer cache 3) insert() and delete() to insert and delete from the middle of a file. 4) Has large block sizes Sounds like a possibility for the kernel to me. As with most things, you could do raw disk I/O from userspace, but it seems reasonable to put it in the kernel. Call it "mediafs" or something. I agree that "normal" filesystems like ext2 should not do the insert() and delete() that were mentioned. It'd be a lot of work and could easily get someone into trouble (imagine doing it on small files!) It appears that SGI's XFS does some of this in IRIX. They play some tricks to keep from copying the streaming data. (i.e. same buffer gets passed around as a target for the video device, a source for a userspace program, and a source for DMA to disk) They also have some special flags: fcntl(fd, F_SETFL, FDIRECT); /* enables direct disk access */ open(filename, O_DIRECT); /* likewise */ See this page for details: http://reality.sgi.com/cpirazzi_engr/lg/uv/disk.html Can linux disable its buffer cache for a particular filesystem (something like a 'nocache' mount option?) 
Cheers, -- Bob Bob McElrath (rsmcelrath@students.wisc.edu) Univ. of Wisconsin at Madison, Department of Physics [-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 14:46 ` Bob McElrath @ 2001-09-03 14:54 ` Alan Cox 2001-09-03 15:42 ` Doug McNaught 2001-09-03 15:11 ` Richard Guenther 1 sibling, 1 reply; 29+ messages in thread From: Alan Cox @ 2001-09-03 14:54 UTC (permalink / raw) To: Bob McElrath; +Cc: Alan Cox, Ingo Oeser, linux-kernel > Sounds like a possibility for the kernel to me. As with most things, But you have it backwards - things are not "could go in the kernel" things are "could avoid being in kernel" > passed around as a target for the video device, a source for a userspace > program, and a source for DMA to disk) They also have some special > flags: > fcentl(fd, F_SETFL, FDIRECT); /* enables direct disk access */ > open(filename, O_DIRECT); /* likewise */ > See this page for details: > http://reality.sgi.com/cpirazzi_engr/lg/uv/disk.html Andrea has this working on 2.4 + patches ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 14:54 ` Alan Cox @ 2001-09-03 15:42 ` Doug McNaught 0 siblings, 0 replies; 29+ messages in thread From: Doug McNaught @ 2001-09-03 15:42 UTC (permalink / raw) To: Alan Cox; +Cc: Bob McElrath, Ingo Oeser, linux-kernel Alan Cox <alan@lxorguk.ukuu.org.uk> writes: > > open(filename, O_DIRECT); /* likewise */ > > Andrea has this working on 2.4 + patches Is O_DIRECT slated to go into mainstream 2.4? Or is it a 2.5 thing? Or neither? -Doug -- Free Dmitry Sklyarov! http://www.freesklyarov.org/ We will return to our regularly scheduled signature shortly. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 14:46 ` Bob McElrath 2001-09-03 14:54 ` Alan Cox @ 2001-09-03 15:11 ` Richard Guenther 1 sibling, 0 replies; 29+ messages in thread From: Richard Guenther @ 2001-09-03 15:11 UTC (permalink / raw) To: Bob McElrath; +Cc: Alan Cox, Ingo Oeser, linux-kernel On Mon, 3 Sep 2001, Bob McElrath wrote: > Alan Cox [alan@lxorguk.ukuu.org.uk] wrote: > > > That is reimplementing file system functionality in user space. > > > I'm in doubts that this is considered good design... > > > > Keeping things out of the kernel is good design. Your block indirections > > are no different to other database formats. Perhaps you think we should > > have fsql_operation() and libdb in kernel 8) > > Well, a filesystem that is: > 1) synchronous > 2) bypasses linux's buffer cache > 3) insert() and delete() to insert and delete from the middle of a file. > 4) Has large block sizes Well, just make it possible to tell something more about the operation you want to do to the kernel/VFS. Copy/Insert/Delete is in fact some sort of sendfile operation. For GLAME I did a "simple" (well, it turned out to be not that simple...) user level filesystem that supports those kind of operations. The interface I chose was sendfile(dest_fd, source_fd, count, mode) where mode can be composed out of nothing (overwrite, leave source intact), INSERT and CUT. As it is a userspace implementation byte granularity is supported, but for a kernel level support I suppose block granularity would suffice and could be optimized for in the lower level filesystems code. I'd prefer such a generic interface over fcntls which would certainly be possible at least for a "split this file into two ones" operation. Oh yes - it would help to have this in the kernel, at least if you want to support sane mmap behaviour (for block aligned modifications, of course - byte level is impossible due to aliasing issues, I believe). Richard. 
-- Richard Guenther <richard.guenther@uni-tuebingen.de> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ The GLAME Project: http://www.glame.de/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 10:48 ` Alan Cox 2001-09-03 14:31 ` Daniel Phillips 2001-09-03 14:46 ` Bob McElrath @ 2001-09-03 21:19 ` Ben Ford 2 siblings, 0 replies; 29+ messages in thread From: Ben Ford @ 2001-09-03 21:19 UTC (permalink / raw) To: linux-kernel Alan Cox wrote: >>That is reimplementing file system functionality in user space. >>I'm in doubts that this is considered good design... >> > >Keeping things out of the kernel is good design. Your block indirections >are no different to other database formats. Perhaps you think we should >have fsql_operation() and libdb in kernel 8) > From what I've read, that is where windows is going! -b -- Number of restrictions placed on "Alice in Wonderland" (public domain) eBook: 5 Maximum penalty for reading "Alice in Wonderland" aloud (possible DMCA violation): 5 years jail Average sentence for commiting Rape: 5 years ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 1:24 ` Ingo Oeser 2001-09-03 1:31 ` Alan Cox @ 2001-09-03 4:27 ` Bob McElrath 1 sibling, 0 replies; 29+ messages in thread From: Bob McElrath @ 2001-09-03 4:27 UTC (permalink / raw) To: Ingo Oeser; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 3316 bytes --] Ingo Oeser [ingo.oeser@informatik.tu-chemnitz.de] wrote: > On Sun, Sep 02, 2001 at 05:59:38PM -0700, Larry McVoy wrote: > > > What's needed is a generalisation of sparse files and truncate(). > > > They both handle similar problems. > > > > how about > > > > fzero(int fd, off_t off, size_t len) > > fdelete(int fd, off_t off, size_t len) > > and > > finsert(int fd, off_t off, size_t len, void *buf, size_t buflen) > > > The main problem with this is if the off/len are not block aligned. If they > > are, then this is just block twiddling, if they aren't, then this is a file > > rewrite anyway. *exactly* I don't know enough about ext2fs to know if this is possible (i.e. a partially filled block in the middle of a file) so that's why I asked. > Another solution for the original problem is to rewrite the file > in-place by copying from the end of the gap to the beginning of > the gap until the gap is shifted to the end of the file and thus > can be left to ftruncate(). For editing commercials, you'd still have to copy 90% of the data. In the US, there's roughly 5 minutes of commercials for every 15 of the show, so that would only save copying the first 15 minutes... > This will at least not require more space on disk, but will take > quite a while and risk data corruption for this file in case of > abortion. Yep. I should mention that the Linux/mjpeg tools (http://mjpeg.sourceforge.net) already have an elegant way of "marking" a portion of a video and skipping it when playing it, through the use of "edit lists". 
(use xlav/glav to mark it, and then you can lavplay the edit list, which just contains the start/end of skipped sections) They also have a program to apply the edit list and create a new video (lavtrans). But this requires copying the desired sections of video to a new file, which requires 75% more disk space than the original file, and takes a looong time. The idea behind my first message should be obvious here...an almost atomic operation modifying at most 2 blocks (and marking a bunch as free) wouldn't require nearly as much disk-thrashing, and would be nearly instantaneous from the user's perspective. Disk fragmentation is unimportant when the contiguous chunks are 300MB long. > But fzero, fdelete and finsert might be worth considering, since > some file systems, which pack tails could also pack these kind of > partial used blocks and handle them properly. Do the journaling filesystems use blocks in a similar manner to ext2fs? Anyone know if any of them can handle partially filled blocks in the middle of a file? Are there any media-filesystems out there that have these kinds of extensions? I'm not sure these extensions would be useful for anything but editing media... > We already handle partial pages, so why not handle them with > offset/size pairs and enable this mechanisms? Multi media streams > would love these kind of APIs ;-) Yep yep yep. What do multimedia people use? Custom multi-thousand dollar programs with their own filesystem layer? What about TiVo? Didn't they contribute some fs-layer modifications a while back? Cheers, -- Bob Bob McElrath (rsmcelrath@students.wisc.edu) Univ. of Wisconsin at Madison, Department of Physics [-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Editing-in-place of a large file 2001-09-03 0:59 ` Larry McVoy 2001-09-03 1:24 ` Ingo Oeser @ 2001-09-03 1:30 ` Daniel Phillips 1 sibling, 0 replies; 29+ messages in thread From: Daniel Phillips @ 2001-09-03 1:30 UTC (permalink / raw) To: Larry McVoy, Ingo Oeser; +Cc: Bob McElrath, linux-kernel On September 3, 2001 02:59 am, Larry McVoy wrote: > > What's needed is a generalisation of sparse files and truncate(). > > They both handle similar problems. > > how about > > fzero(int fd, off_t off, size_t len) sys_clear :-) > which zeros the blocks and if it can creates a holey file? > > However, that's not what Bob wants, he wants to remove commercials from > recorded TV. So what he wants is > > fdelete(int fd, off_t off, size_t len) > > which has the semantics of shifting the rest of the file backwards to "off". > > The main problem with this is if the off/len are not block aligned. If they > are, then this is just block twiddling, if they aren't, then this is a file > rewrite anyway. He could insert blank video frames to pad to the edges of blocks. Very theoretical since we are ages away from having fzero/sys_clear. Ask Al Viro if you want to hear the whole ugly story. (Executive summary: it's hard enough handling remove/create races with just one boundary per file, now try it with an unbounded number.) -- Daniel ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2001-09-17 22:37 UTC | newest] Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2001-09-02 20:21 Editing-in-place of a large file Bob McElrath 2001-09-02 21:28 ` COW fs (Re: Editing-in-place of a large file) VDA 2001-09-09 14:46 ` John Ripley 2001-09-09 16:30 ` John Ripley 2001-09-10 2:43 ` Daniel Phillips 2001-09-10 2:58 ` David Lang 2001-09-09 17:41 ` Xavier Bestel 2001-09-10 1:29 ` John Ripley 2001-09-10 6:45 ` Ragnar Kjørstad 2001-09-14 10:06 ` Pavel Machek 2001-09-10 11:11 ` Ihar Filipau 2001-09-10 16:10 ` Kari Hurtta 2001-09-14 10:03 ` Pavel Machek 2001-09-10 9:28 ` VDA 2001-09-10 9:35 ` John P. Looney 2001-09-02 21:30 ` Editing-in-place of a large file Ingo Oeser 2001-09-03 0:59 ` Larry McVoy 2001-09-03 1:24 ` Ingo Oeser 2001-09-03 1:31 ` Alan Cox 2001-09-03 1:50 ` Ingo Oeser 2001-09-03 10:48 ` Alan Cox 2001-09-03 14:31 ` Daniel Phillips 2001-09-03 14:46 ` Bob McElrath 2001-09-03 14:54 ` Alan Cox 2001-09-03 15:42 ` Doug McNaught 2001-09-03 15:11 ` Richard Guenther 2001-09-03 21:19 ` Ben Ford 2001-09-03 4:27 ` Bob McElrath 2001-09-03 1:30 ` Daniel Phillips