linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Ideas for TUX2
@ 2001-07-03  6:25 ` Ph. Marek
  2001-07-03 23:44   ` Rik van Riel
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ph. Marek @ 2001-07-03  6:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: phillips

I'd like to give some of my thoughts regarding tux2 (phase-change-tree fs):


* FILES *


If a file's data has been changed, it suffices to update the inode and the
of free blocks bitmap (fbb).
But updating them in one go is not possible - the fbb is located at the
superblock, the inode can be (nearly) anywhere on disk.

- Updating another inode and rewriting the directory isn't possible.
This would change the inode number which can be referenced in more than one
directories; and some programs (nfs) rely on the inode number.

- So we have to have two inode spaces per inode, which are switched with
the superblock.
To do this we give a generation counter into the inode, which is checked
against the latest valid superblock.
(Alternatively, we could use the modification time of the inode and have a
modification time in the superblock).
Of two inode spaces we'd regard the one as active, which has a higher
generation counter/mtime than the other, but only if it's lower or equal
than the one of the superblock.



* FILESYSTEM STRUCTURE / SUPERBLOCKs *


As you wrote it's necessary to have several versions of superblocks around. 
As the fbb MUST be updated in sync with the superblocks, they should be
near to each other - and the superblock should be later on disk to be able
to write the fbb first and the mark this version as active without a seek.


Some calculation:
Let's assume a 4kB blocksize.
4kB are 32kBit, which multiplied with 4kB blocksize give 128MB of
adressable memory. So we need one 4kB block for every 128MB of fs size.

So 12.8GB would amount to 100 blocks or 400kB.



- Every copy/version (3 or 4) of the superblock has it's own fbb.
This amounts on the above example of 12.8 GB (with 4 versions) to 1/8192 of
the entire fs space - I think that's ok, especially if there will be more
space needed for the inodes and the partly duplicated other data.

- Every single fbb/sb version is (on the harddisk) a linked list with the
following structure:
  - fbb
    - magic number
    - reference to the next block (block number)
    - entries of fbb.
      This takes 2*32Bit or 2*64Bit of the 4kB.
  - sb: all normal sb entries, enlarged with 
    - a generation counter/mtime.
    - a reference to the next assumed fbb/sb location.


I'd like to make a linked list on the harddisk in order to have this
filesystem dynamically resizeable (at least enlargeable):
- a new phase is generated, with more fbb space and so on.
- a sufficiently large space is taken from the harddisk (which is easy,
since the harddisk is now bigger) - preferable continously
- the old phase is written to disk, with the sb "next reference" pointing
to the new space.
- if the old phase is completed, the new phase is populated like normal
operation OR immediately phased again.

As soon as this new fbb/sb block is written on disk, the newest version of
this filesystem says the new size - voila! Online resizing!



I think there are some loose ends in there.
All comments welcome.



Regards,

Phil



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Ideas for TUX2
  2001-07-03  6:25 ` Ideas for TUX2 Ph. Marek
@ 2001-07-03 23:44   ` Rik van Riel
  2001-07-04  6:16   ` Ph. Marek
  2001-07-04  7:53   ` Ph. Marek
  2 siblings, 0 replies; 6+ messages in thread
From: Rik van Riel @ 2001-07-03 23:44 UTC (permalink / raw)
  To: Ph. Marek; +Cc: linux-kernel, phillips

On Tue, 3 Jul 2001, Ph. Marek wrote:

> If a file's data has been changed, it suffices to update the inode and the
> of free blocks bitmap (fbb).
> But updating them in one go is not possible

You seem to have missed some fundamental understanding of
exactly how phase tree works; the wohle point of phase
tree is to make atomic updates like this possible!

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Ideas for TUX2
  2001-07-03  6:25 ` Ideas for TUX2 Ph. Marek
  2001-07-03 23:44   ` Rik van Riel
@ 2001-07-04  6:16   ` Ph. Marek
  2001-07-04  6:54     ` Daniel Phillips
  2001-07-04  7:53   ` Ph. Marek
  2 siblings, 1 reply; 6+ messages in thread
From: Ph. Marek @ 2001-07-04  6:16 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, phillips

>> If a file's data has been changed, it suffices to update the inode and the
>> of free blocks bitmap (fbb).
>> But updating them in one go is not possible
>
>You seem to have missed some fundamental understanding of
>exactly how phase tree works; the wohle point of phase
>tree is to make atomic updates like this possible!
Well, my point was, that with several thousand inodes spread over the disk
it won't always be possible to update the inode AND the fbb in one go.
So I proposed the 2nd inode with generation counter!


Regards,

Phil

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Ideas for TUX2
  2001-07-04  6:16   ` Ph. Marek
@ 2001-07-04  6:54     ` Daniel Phillips
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Phillips @ 2001-07-04  6:54 UTC (permalink / raw)
  To: Ph. Marek, Rik van Riel; +Cc: linux-kernel

On Wednesday 04 July 2001 08:16, Ph. Marek wrote:
> >> If a file's data has been changed, it suffices to update the inode and
> >> the of free blocks bitmap (fbb).
> >> But updating them in one go is not possible
> >
> >You seem to have missed some fundamental understanding of
> >exactly how phase tree works; the wohle point of phase
> >tree is to make atomic updates like this possible!
>
> Well, my point was, that with several thousand inodes spread over the disk
> it won't always be possible to update the inode AND the fbb in one go.
> So I proposed the 2nd inode with generation counter!

The cool thing is, it *is* possible, read how here:

  http://nl.linux.org/~phillips/tux2/phase.tree.tutorial.html

--
Daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Ideas for TUX2
  2001-07-03  6:25 ` Ideas for TUX2 Ph. Marek
  2001-07-03 23:44   ` Rik van Riel
  2001-07-04  6:16   ` Ph. Marek
@ 2001-07-04  7:53   ` Ph. Marek
  2001-07-04 14:31     ` Daniel Phillips
  2 siblings, 1 reply; 6+ messages in thread
From: Ph. Marek @ 2001-07-04  7:53 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

>> >> If a file's data has been changed, it suffices to update the inode and
>> >> the of free blocks bitmap (fbb).
>> >> But updating them in one go is not possible
>> >
>> >You seem to have missed some fundamental understanding of
>> >exactly how phase tree works; the wohle point of phase
>> >tree is to make atomic updates like this possible!
>>
>> Well, my point was, that with several thousand inodes spread over the disk
>> it won't always be possible to update the inode AND the fbb in one go.
>> So I proposed the 2nd inode with generation counter!
>
>The cool thing is, it *is* possible, read how here:
>
>  http://nl.linux.org/~phillips/tux2/phase.tree.tutorial.html
Well, ok. Your split the inode "files" too.

Hmmm...
That sound more complex than my version (at least now, until I've seen the
implementation - maybe it's easier because it has less special cases than
mine).

And of course the memory usage on the harddisk is much less with your
version as you split your inode data and don't have it duplicated.

Well, I hope to see an implementation soon - I'd like to help, even if it's
only testing.


Thanks for the answer!


Regards,

Phil


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Ideas for TUX2
  2001-07-04  7:53   ` Ph. Marek
@ 2001-07-04 14:31     ` Daniel Phillips
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Phillips @ 2001-07-04 14:31 UTC (permalink / raw)
  To: Ph. Marek; +Cc: linux-kernel

On Wednesday 04 July 2001 09:53, Ph. Marek wrote:
> >> Well, my point was, that with several thousand inodes spread over the
> >> disk it won't always be possible to update the inode AND the fbb in one
> >> go. So I proposed the 2nd inode with generation counter!
> >
> >The cool thing is, it *is* possible, read how here:
> >
> >  http://nl.linux.org/~phillips/tux2/phase.tree.tutorial.html
>
> Well, ok. Your split the inode "files" too.
>
> Hmmm...
> That sound more complex than my version (at least now, until I've seen the
> implementation - maybe it's easier because it has less special cases than
> mine).

Yes, it's more complex, but not horribly so.  It's a lot more efficient, and 
that's the point.

> And of course the memory usage on the harddisk is much less with your
> version as you split your inode data and don't have it duplicated.

Yep.

> Well, I hope to see an implementation soon - I'd like to help, even if it's
> only testing.

See you on the list.  By the way, if you want to help out right now, could 
you run some benchmarks my latest early flush patch?  See "[RFC] Early Flush 
with Bandwidth Estimation".

--
Daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2001-07-04 14:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.33L.0107032042220.28737-100000@imladris.rielhom e.conectiva>
2001-07-03  6:25 ` Ideas for TUX2 Ph. Marek
2001-07-03 23:44   ` Rik van Riel
2001-07-04  6:16   ` Ph. Marek
2001-07-04  6:54     ` Daniel Phillips
2001-07-04  7:53   ` Ph. Marek
2001-07-04 14:31     ` Daniel Phillips

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).