* 'Subset' Hard Links
@ 2009-09-18  0:02 Micah Dombrowski
  2009-09-18  1:31 ` Sunil Mushran
  0 siblings, 1 reply; 3+ messages in thread
From: Micah Dombrowski @ 2009-09-18  0:02 UTC
  To: linux-fsdevel

Hello,

I couldn't think of anywhere else to ask such a question, and google  
is useless as I have no unique keywords.  I am wondering if it is  
possible with some/any filesystems to have multiple hard links to a  
file, some of which only point to a subset of the file's data.

Eg:
firstname -> all data bytes 1 to 10
secondname -> bytes 3 to 10
thirdname -> bytes 5 to 7

This would clearly require some interesting locking of the file WRT  
writes, but it seems like it should be possible, and even easy for  
read-only access.  I deal with moderately large data files (50+GB),  
and such a thing would be incredibly useful to me for pulling out  
interesting bits of my data without having to make copies of the data  
itself.

A related method I was wishing existed would allow concatenation of  
files simply by deleting all but one hard link, and changing the  
remaining one to point to all of the original files' data as  
fragments.  This would be great, as 'cat'ing together six 10GB files  
is pretty slow, and it seems silly to be copying all of that data  
around when I only need one actual instance of the full data on disk,  
and that instance already exists, albeit in a fragmented manner.

Do any tools for doing this sort of thing exist?

Thanks,
Micah Dombrowski
Department of Physics and Astronomy
Dartmouth College


* Re: 'Subset' Hard Links
  2009-09-18  0:02 'Subset' Hard Links Micah Dombrowski
@ 2009-09-18  1:31 ` Sunil Mushran
  2009-09-18  4:26   ` Sage Weil
  0 siblings, 1 reply; 3+ messages in thread
From: Sunil Mushran @ 2009-09-18  1:31 UTC
  To: Micah Dombrowski; +Cc: linux-fsdevel

btrfs should be able to handle most of this.

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commitdiff;h=c5c9cd4d1b827fe545ed2a945e91e3a6909f3886

However, note that file systems operate in terms of blocks. So the start
offset would need to be block aligned.
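
To make the alignment constraint concrete, here is a minimal sketch of
the arithmetic. It is an illustration only, not part of btrfs: the helper
names are hypothetical, and the 4096-byte block size in the usage note is
an assumption (the real value depends on the filesystem).

```python
import os


def block_size_of(path):
    """Return the block size reported by the filesystem holding `path`.

    Assumption: this is the granularity that matters for extent sharing;
    a given filesystem may impose a different unit.
    """
    return os.statvfs(path).f_bsize


def is_block_aligned(offset, block_size):
    """True if `offset` falls exactly on a block boundary."""
    return offset % block_size == 0


def covering_range(start, length, block_size):
    """Smallest block-aligned range that covers [start, start + length).

    Returns (aligned_start, aligned_length, skip), where `skip` is the
    number of leading bytes a reader must ignore to reach `start`.
    """
    aligned_start = start - (start % block_size)
    end = start + length
    aligned_end = -(-end // block_size) * block_size  # round up
    return aligned_start, aligned_end - aligned_start, start - aligned_start
```

For example, with an assumed 4096-byte block, a request for 3000 bytes
starting at byte 5000 has to be widened to the single block starting at
4096, and the reader skips the first 904 bytes of it.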


* Re: 'Subset' Hard Links
  2009-09-18  1:31 ` Sunil Mushran
@ 2009-09-18  4:26   ` Sage Weil
  0 siblings, 0 replies; 3+ messages in thread
From: Sage Weil @ 2009-09-18  4:26 UTC
  To: Sunil Mushran; +Cc: Micah Dombrowski, linux-fsdevel

On Thu, 17 Sep 2009, Sunil Mushran wrote:

> btrfs should be able to handle most of this.
> 
> http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commitdiff;h=c5c9cd4d1b827fe545ed2a945e91e3a6909f3886

Note that currently, you can only clone a range from one file to another.  
It should be pretty straightforward to allow cloning from one offset to 
another.  Or, you can work around it by cloning the range to a temporary 
file and then back again at a different offset.

The code can also currently fail when compression is enabled and 
you clone a subset of the file (compressed inline extents don't get split 
yet).
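
A rough sketch of what driving the range clone from user space could
look like, including the temporary-file workaround described above. It
rests on several assumptions: the ioctl number is the one later kernels
expose as FICLONERANGE (which, as far as I know, shares its number and
argument layout with the btrfs clone-range ioctl), the encoding shown is
the x86/ARM one, and the helper names and file names are hypothetical.
Both files must live on the same filesystem, and offsets and lengths
generally need to be block aligned.

```python
import fcntl
import struct
import tempfile

# _IOW(0x94, 13, <32-byte argument struct>) as encoded on x86 and ARM.
# Assumption: other architectures may encode the direction bits differently.
FICLONERANGE = (1 << 30) | (32 << 16) | (0x94 << 8) | 13


def pack_clone_args(src_fd, src_offset, length, dest_offset):
    """Pack the argument struct: s64 src_fd, then three u64 fields."""
    return struct.pack("qQQQ", src_fd, src_offset, length, dest_offset)


def clone_range(src_fd, dest_fd, src_offset, length, dest_offset):
    """Ask the filesystem to share `length` bytes of src with dest.

    Raises OSError if the filesystem does not support the operation
    or rejects the range (for example, because it is not aligned).
    """
    args = pack_clone_args(src_fd, src_offset, length, dest_offset)
    fcntl.ioctl(dest_fd, FICLONERANGE, args)


def clone_within_file(fd, src_offset, length, dest_offset, tmp_dir):
    """Workaround sketch: clone to a temporary file, then back again.

    `tmp_dir` must be on the same filesystem as the file behind `fd`.
    """
    with tempfile.TemporaryFile(dir=tmp_dir) as tmp:
        clone_range(fd, tmp.fileno(), src_offset, length, 0)
        clone_range(tmp.fileno(), fd, 0, length, dest_offset)
```

A call such as clone_range(src.fileno(), dst.fileno(), 4096, 8192, 0),
with src opened for reading and dst opened for writing, would then make
the first 8192 bytes of dst share storage with bytes 4096 through 12287
of src, without copying the data.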

sage


> 
> However, note that file systems operate in terms of blocks. So the start 
> offset would need to be block aligned.