linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [IDEA] - run-length compaction of block numbers
       [not found] <1eI8L-5fS-9@gated-at.bofh.it>
@ 2004-01-16 20:01 ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2004-01-16 20:01 UTC (permalink / raw)
  To: raymond jennings; +Cc: linux-kernel

"raymond jennings" <highwind747@hotmail.com> writes:

> Is there any value in creating a new filesystem that encodes long
> contiguous blocks as a single block run instead of multiple block
> numbers?  A long file may use only a few block runs instead of many
> block numbers if there is little fragmentation (usually the case).
> Also dynamic allocation of inodes...etc.  The details are long; anyone
> interested can e-mail me privately.

Congratulations. You just reinvented the basic specs of JFS and XFS.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-19 16:20         ` Valdis.Kletnieks
  2004-01-19 17:06           ` Hans Reiser
@ 2004-02-08 11:38           ` Keith Owens
  1 sibling, 0 replies; 13+ messages in thread
From: Keith Owens @ 2004-02-08 11:38 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Hans Reiser, linux-kernel

On Mon, 19 Jan 2004 11:20:51 -0500, 
Valdis.Kletnieks@vt.edu wrote:
>I don't know how IBM did disk space management on the earlier systems
>such as the 1401, 7040, and 7090 series, but I'd suspect it was a similar
>extent-based method.

<old-fogey-mode>

Can't speak for the 1401 or 7xxx series.  IBM 1620[*], which was
obsolete back in 1977 when I programmed it, also used extent based disk
directories.  There were three directory areas on disk, one to map file
names to extent indexes, one to map the used extents on the disk and
one to map the free space extents.

[*] Binary coded decimal.  Each digit had 4 numeric bits plus two flag
    bits.  Two adjacent digits made a letter (and you thought that
    Unicode was bad!).  Card punch or paper tape in/out.  No printer,
    feed the cards through a separate machine for that.  20,000 digits
    of memory with a 2 Mb disk drive was a big machine.

    Five digit addressing with one nice feature, if the flag bit was
    set on the rightmost digit of an address then it was automatically
    treated as an indirect address and the real address was fetched
    from the area referenced by this address, the real address could
    also be flagged as indirect and the process would continue.  The
    IBM manual claimed that indirect addresses required no extra time
    (nothing changes about computer claims :).  The student's idea of
    fun was to load all of memory with indirect addresses, each
    pointing to the next with loop around.  Refer to the first and
    watch the big panel lights slowly cycle through all of memory doing
    nothing useful.

</old-fogey-mode>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
@ 2004-01-29 16:40 raymond jennings
  0 siblings, 0 replies; 13+ messages in thread
From: raymond jennings @ 2004-01-29 16:40 UTC (permalink / raw)
  To: reiser; +Cc: linux-kernel

Well, here's the basic idea.  There are four types of "block runs":

Direct runs
Indirect runs
Null runs
Zero runs

struct blockrun_t {
   blocknum_t logstart, loglength;
   blocknum_t phystart, phylength;
}

Direct runs are determined by nonzero and equal logical and physical 
numbers.  The only way this can happen is for the run to directly reference 
data blocks (I think, some crazy cases can be guarded against).

Indirect runs have a larger logical count than the physical count, meaning 
that the referenced blocks actually comprise a list of more block runs.  
Functionally equivalent to an indirect block.

Null runs have a zero logical length and are useful as padding.  The logical 
start should be consistent with surrounding runs to allow binary searching.

Zero runs are a quick and dirty implementation of sparse files.  They are 
given by a PHYSICAL zero, meaning that the actual data blocks are not stored 
on disk.  reads of these "blocks" return zeroes.

The top level inode contains eight or so of these runs, listed in logical 
block start order (binary searchable).  The indirect runs thereof reference 
other runs, which may reference others (and so on).  Blocks could be as 
small as 512 (saves space)

these top level block runs form what is a block chain.  It could be as large 
or as small as needed.

The superblock contains a blockchain of its own, but this "block chain" 
references the dynamic inode space, thus, inodes are only limited by the 
range of an inode number (probably 2G), or the amount of space.  Unused 
inodes could even be sparsified.

fsck.rle will have its work cut out for it however.  I suggest that block 
chains be abstracted.  Manipulating the raw block runs should be left to the 
discretion of the rlefs library (if there is one).

If this has already been implemented, feel free to use my suggestions to 
improve them (or not).  I am unfortunate enough to not have enough space to 
unpack my kernel source, so I can't very well keep up :(.  I'm running a 
system on a mere 212MB hard disk, with gcc, g++, ncurses (devel too), jed, 
joe, init, initscripts, mingetty, and midnight commander, and I only have 
7.5MB of space left.  Naturally, I can't afford swap space :O.  This only 
happened because my stupid 2GB HD suffered a short circuit, which caused the 
dreaded head crash.  That's my sob story.  Thank's for listening.

Is the

>From: Hans Reiser <reiser@namesys.com>
>To: Valdis.Kletnieks@vt.edu
>CC: raymond jennings <highwind747@hotmail.com>,  
>linux-kernel@vger.kernel.org
>Subject: Re: [IDEA] - run-length compaction of block numbers
>Date: Fri, 16 Jan 2004 23:47:55 +0300
>
>Valdis.Kletnieks@vt.edu wrote:
>
>>On Fri, 16 Jan 2004 11:38:59 PST, raymond jennings 
>><highwind747@hotmail.com>  said:
>>
>>
>>>Is there any value in creating a new filesystem that encodes long 
>>>contiguous blocks as a single block run instead of multiple block 
>>>numbers?  A long file may use only a few block runs instead of many block 
>>>numbers if there is little fragmentation (usually the case).
>>>
>This is already done, they are called "extent"s.  Reiser4 uses them, XFS 
>uses them, I think Veritas may have been the first to use them but I am not 
>sure of this, maybe it was IBM.
>
>>>Also dynamic allocation of inodes...etc.
>>>
>ReiserFS does dynamic allocation of stat data (what stat() returns), we 
>don't have inodes.  Many filesystems do dynamic allocation of inodes.
>
>>>  The details are long; anyone interested can e-mail me privately.
>>>
>>>
>>
>>Only question I have is how you make an efficient scheme for finding the 
>>block
>>number for an offset several gigabytes into the file.
>>
>Use a tree to store everything in.  See www.namesys.com for extensive 
>details.
>
>>  You either get to run
>>through the list linearly (gaak) or you need to stick a tree or hash on 
>>the
>>front to allow semi-random access to a starting point to scan linearly 
>>from, at
>>which point you've probably blown any space savings unless you have a VERY 
>>low
>>fragmentation rate...
>>
>>On the other hand, dynamic allocation of inodes is interesting, as it 
>>means
>>you're not screwed if over time, the NBPI value for the filesystem changes 
>>(or
>>if you simply guessed wrong at mkfs time) and you run out of inodes when 
>>you
>>still have space free.  Reiserfs V3 already does this, in fact...
>>
>>
>>
>Cheers,
>
>--
>Hans
>
>

_________________________________________________________________
Check out the coupons and bargains on MSN Offers! 
http://shopping.msn.com/softcontent/softcontent.aspx?scmId=1418


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-19 16:20         ` Valdis.Kletnieks
@ 2004-01-19 17:06           ` Hans Reiser
  2004-02-08 11:38           ` Keith Owens
  1 sibling, 0 replies; 13+ messages in thread
From: Hans Reiser @ 2004-01-19 17:06 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: raymond jennings, linux-kernel

So extents were around at the beginning of filesystems, and then 
forgotten by Unix for a while.... interesting....

-- 
Hans



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-19 10:45       ` Hans Reiser
@ 2004-01-19 16:20         ` Valdis.Kletnieks
  2004-01-19 17:06           ` Hans Reiser
  2004-02-08 11:38           ` Keith Owens
  0 siblings, 2 replies; 13+ messages in thread
From: Valdis.Kletnieks @ 2004-01-19 16:20 UTC (permalink / raw)
  To: Hans Reiser; +Cc: raymond jennings, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1359 bytes --]

On Mon, 19 Jan 2004 13:45:29 +0300, Hans Reiser said:

> >Does the extent-based disk allocation used by OS/360 in 1964 count? :)
> >  
> >
> Probably.  Tell me more about it.

Well, basically, a dataset was described by an entry in the Volume Table of
Contents (VTOC) with a something conceptually similar to an inode.  The actual
data area was described with a initial allocation, and up to 15 additional
"extents" (yes, hard limit of 15 extents per volume, although a dataset could
span volumes).  Each extent was described with start/end cylinder/track pairs.
You could allocate space by absolute tracks, by tracks, or by cylinders (of
course, space actually allocated was dependent on the device).  So you could
code in the JCL something like SPACE=(CYL,(10.5)) which would allocate 10
cylinders and up to 15 additional 5-cylinder spaces.  And yes, it was quite
possible to get hosed on fragementation.

The first 3 allocations were contained in the 'format 1' DSCB (Data Set
Control Block), and additional extents were in a 'format 3' DSCB.  Free
space was described in extents in a series of chained 'format 4' DSCB's.
(A single DSCB was only 140 bytes, so there's a lot of chaining;).

I don't know how IBM did disk space management on the earlier systems
such as the 1401, 7040, and 7090 series, but I'd suspect it was a similar
extent-based method.


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-16 22:38     ` Valdis.Kletnieks
@ 2004-01-19 10:45       ` Hans Reiser
  2004-01-19 16:20         ` Valdis.Kletnieks
  0 siblings, 1 reply; 13+ messages in thread
From: Hans Reiser @ 2004-01-19 10:45 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: raymond jennings, linux-kernel

Valdis.Kletnieks@vt.edu wrote:

>On Fri, 16 Jan 2004 23:47:55 +0300, Hans Reiser said:
>
>  
>
>>This is already done, they are called "extent"s.  Reiser4 uses them, XFS 
>>uses them, I think Veritas may have been the first to use them but I am 
>>not sure of this, maybe it was IBM.
>>    
>>
>
>Does the extent-based disk allocation used by OS/360 in 1964 count? :)
>  
>
Probably.  Tell me more about it.

-- 
Hans



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-16 20:47   ` Hans Reiser
@ 2004-01-16 22:38     ` Valdis.Kletnieks
  2004-01-19 10:45       ` Hans Reiser
  0 siblings, 1 reply; 13+ messages in thread
From: Valdis.Kletnieks @ 2004-01-16 22:38 UTC (permalink / raw)
  To: Hans Reiser; +Cc: raymond jennings, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 316 bytes --]

On Fri, 16 Jan 2004 23:47:55 +0300, Hans Reiser said:

> This is already done, they are called "extent"s.  Reiser4 uses them, XFS 
> uses them, I think Veritas may have been the first to use them but I am 
> not sure of this, maybe it was IBM.

Does the extent-based disk allocation used by OS/360 in 1964 count? :)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-16 20:36   ` Ian Pilcher
@ 2004-01-16 21:57     ` Mike Fedyk
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Fedyk @ 2004-01-16 21:57 UTC (permalink / raw)
  To: Ian Pilcher; +Cc: linux-kernel

On Fri, Jan 16, 2004 at 02:36:03PM -0600, Ian Pilcher wrote:
> Valdis.Kletnieks@vt.edu wrote:
> >
> >On the other hand, dynamic allocation of inodes is interesting, as it means
> >you're not screwed if over time, the NBPI value for the filesystem changes 
> >(or
> >if you simply guessed wrong at mkfs time) and you run out of inodes when 
> >you
> >still have space free.  Reiserfs V3 already does this, in fact...
> >
> 
> As does JFS.  Anyone know about XFS?

Yes, XFS has dynamic inodes also.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-16 19:54 ` Valdis.Kletnieks
  2004-01-16 20:36   ` Ian Pilcher
@ 2004-01-16 20:47   ` Hans Reiser
  2004-01-16 22:38     ` Valdis.Kletnieks
  1 sibling, 1 reply; 13+ messages in thread
From: Hans Reiser @ 2004-01-16 20:47 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: raymond jennings, linux-kernel

Valdis.Kletnieks@vt.edu wrote:

>On Fri, 16 Jan 2004 11:38:59 PST, raymond jennings <highwind747@hotmail.com>  said:
>  
>
>>Is there any value in creating a new filesystem that encodes long contiguous 
>>blocks as a single block run instead of multiple block numbers?  A long file 
>>may use only a few block runs instead of many block numbers if there is 
>>little fragmentation (usually the case). 
>>
This is already done, they are called "extent"s.  Reiser4 uses them, XFS 
uses them, I think Veritas may have been the first to use them but I am 
not sure of this, maybe it was IBM.

>> Also dynamic allocation of 
>>inodes...etc.
>>
ReiserFS does dynamic allocation of stat data (what stat() returns), we 
don't have inodes.  Many filesystems do dynamic allocation of inodes.

>>  The details are long; anyone interested can e-mail me 
>>privately.
>>    
>>
>
>Only question I have is how you make an efficient scheme for finding the block
>number for an offset several gigabytes into the file.
>
Use a tree to store everything in.  See www.namesys.com for extensive 
details.

>  You either get to run
>through the list linearly (gaak) or you need to stick a tree or hash on the
>front to allow semi-random access to a starting point to scan linearly from, at
>which point you've probably blown any space savings unless you have a VERY low
>fragmentation rate...
>
>On the other hand, dynamic allocation of inodes is interesting, as it means
>you're not screwed if over time, the NBPI value for the filesystem changes (or
>if you simply guessed wrong at mkfs time) and you run out of inodes when you
>still have space free.  Reiserfs V3 already does this, in fact...
>
>  
>
Cheers,

-- 
Hans



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-16 19:54 ` Valdis.Kletnieks
@ 2004-01-16 20:36   ` Ian Pilcher
  2004-01-16 21:57     ` Mike Fedyk
  2004-01-16 20:47   ` Hans Reiser
  1 sibling, 1 reply; 13+ messages in thread
From: Ian Pilcher @ 2004-01-16 20:36 UTC (permalink / raw)
  To: linux-kernel

Valdis.Kletnieks@vt.edu wrote:
> 
> On the other hand, dynamic allocation of inodes is interesting, as it means
> you're not screwed if over time, the NBPI value for the filesystem changes (or
> if you simply guessed wrong at mkfs time) and you run out of inodes when you
> still have space free.  Reiserfs V3 already does this, in fact...
> 

As does JFS.  Anyone know about XFS?

-- 
========================================================================
Ian Pilcher                                        i.pilcher@comcast.net
========================================================================


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-16 19:38 raymond jennings
  2004-01-16 19:52 ` Randy.Dunlap
@ 2004-01-16 19:54 ` Valdis.Kletnieks
  2004-01-16 20:36   ` Ian Pilcher
  2004-01-16 20:47   ` Hans Reiser
  1 sibling, 2 replies; 13+ messages in thread
From: Valdis.Kletnieks @ 2004-01-16 19:54 UTC (permalink / raw)
  To: raymond jennings; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]

On Fri, 16 Jan 2004 11:38:59 PST, raymond jennings <highwind747@hotmail.com>  said:
> Is there any value in creating a new filesystem that encodes long contiguous 
> blocks as a single block run instead of multiple block numbers?  A long file 
> may use only a few block runs instead of many block numbers if there is 
> little fragmentation (usually the case).  Also dynamic allocation of 
> inodes...etc.  The details are long; anyone interested can e-mail me 
> privately.

Only question I have is how you make an efficient scheme for finding the block
number for an offset several gigabytes into the file.  You either get to run
through the list linearly (gaak) or you need to stick a tree or hash on the
front to allow semi-random access to a starting point to scan linearly from, at
which point you've probably blown any space savings unless you have a VERY low
fragmentation rate...

On the other hand, dynamic allocation of inodes is interesting, as it means
you're not screwed if over time, the NBPI value for the filesystem changes (or
if you simply guessed wrong at mkfs time) and you run out of inodes when you
still have space free.  Reiserfs V3 already does this, in fact...


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [IDEA] - run-length compaction of block numbers
  2004-01-16 19:38 raymond jennings
@ 2004-01-16 19:52 ` Randy.Dunlap
  2004-01-16 19:54 ` Valdis.Kletnieks
  1 sibling, 0 replies; 13+ messages in thread
From: Randy.Dunlap @ 2004-01-16 19:52 UTC (permalink / raw)
  To: raymond jennings; +Cc: linux-kernel

On Fri, 16 Jan 2004 11:38:59 -0800 "raymond jennings" <highwind747@hotmail.com> wrote:

| Is there any value in creating a new filesystem that encodes long contiguous 
| blocks as a single block run instead of multiple block numbers?  A long file 
| may use only a few block runs instead of many block numbers if there is 
| little fragmentation (usually the case).  Also dynamic allocation of 
| inodes...etc.  The details are long; anyone interested can e-mail me 
| privately.

You mean line ext3fs + extents, or like JFS or XFS, which use extents?

--
~Randy
Everything is relative.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [IDEA] - run-length compaction of block numbers
@ 2004-01-16 19:38 raymond jennings
  2004-01-16 19:52 ` Randy.Dunlap
  2004-01-16 19:54 ` Valdis.Kletnieks
  0 siblings, 2 replies; 13+ messages in thread
From: raymond jennings @ 2004-01-16 19:38 UTC (permalink / raw)
  To: linux-kernel

Is there any value in creating a new filesystem that encodes long contiguous 
blocks as a single block run instead of multiple block numbers?  A long file 
may use only a few block runs instead of many block numbers if there is 
little fragmentation (usually the case).  Also dynamic allocation of 
inodes...etc.  The details are long; anyone interested can e-mail me 
privately.

_________________________________________________________________
Rethink your business approach for the new year with the helpful tips here. 
http://special.msn.com/bcentral/prep04.armx


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-02-08 11:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1eI8L-5fS-9@gated-at.bofh.it>
2004-01-16 20:01 ` [IDEA] - run-length compaction of block numbers Andi Kleen
2004-01-29 16:40 raymond jennings
  -- strict thread matches above, loose matches on Subject: below --
2004-01-16 19:38 raymond jennings
2004-01-16 19:52 ` Randy.Dunlap
2004-01-16 19:54 ` Valdis.Kletnieks
2004-01-16 20:36   ` Ian Pilcher
2004-01-16 21:57     ` Mike Fedyk
2004-01-16 20:47   ` Hans Reiser
2004-01-16 22:38     ` Valdis.Kletnieks
2004-01-19 10:45       ` Hans Reiser
2004-01-19 16:20         ` Valdis.Kletnieks
2004-01-19 17:06           ` Hans Reiser
2004-02-08 11:38           ` Keith Owens

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).