linux-kernel.vger.kernel.org archive mirror
From: David Chinner <dgc@sgi.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] fsblock
Date: Tue, 26 Jun 2007 13:06:40 +1000
Message-ID: <20070626030640.GM989688@sgi.com>
In-Reply-To: <20070624014528.GA17609@wotan.suse.de>

On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote:
> 
> I'm announcing "fsblock" now because it is quite intrusive and so I'd
> like to get some thoughts about significantly changing this core part
> of the kernel.

Can you rename it to something other than shorthand for
"filesystem block"? For example, when you say:

> - In line with the above item, filesystem block allocation is performed

What are we actually talking about here? Filesystem block allocation
is something a filesystem does to allocate blocks on disk, not
allocate a mapping structure in memory.

Realistically, this is not about "filesystem blocks", this is
about file offset to disk blocks. i.e. it's a mapping.

>   Probably better would be to
>   move towards offset,length rather than page based fs APIs where everything
>   can be batched up nicely and this sort of non-trivial locking can be more
>   optimal.

If we are going to turn over the API completely like this, can
we seriously look at moving to this sort of interface at the same
time?

With an offset/len interface, we can start to track contiguous
ranges of blocks rather than persisting with a structure per
filesystem block. If you want to save memory, that's where
we need to go.
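
To put a number on the memory argument, here is a minimal userspace
sketch (not kernel code; the per-block array and count_extents() are
hypothetical) that counts how many offset/len ranges a per-block map
collapses into. A contiguously laid out file needs one range instead
of one structure per block; a fully fragmented file degrades to one
range per block, which is no worse than a per-block scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-block map: blocks[i] is the disk block backing
 * file block i. A per-block scheme keeps one structure for every
 * entry in this array.
 *
 * Count the contiguous ranges (extents) the same mapping collapses
 * into: a new extent starts whenever file block i is not backed by
 * the disk block immediately following the one backing block i - 1.
 */
static size_t count_extents(const uint64_t *blocks, size_t nblocks)
{
	size_t extents = 0;

	assert(blocks != NULL || nblocks == 0);
	for (size_t i = 0; i < nblocks; i++) {
		if (i == 0 || blocks[i] != blocks[i - 1] + 1)
			extents++;
	}
	return extents;
}
```

For example, a four-block file at disk blocks 100..103 is a single
extent, while {100, 101, 500, 501, 7} needs three.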

XFS uses "iomaps" for this purpose - it's basically:

	- start offset into file
	- start block on disk
	- length of mapping
	- state 

With special "disk blocks" for indicating delayed allocation
blocks (-1) and unwritten extents (-2). In the worst case we end
up with one iomap per filesystem block.
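
A rough C rendering of that structure, assuming the field names, the
IOMAP_BLOCK_* sentinel macros and the iomap_lookup() helper below (all
hypothetical; this is not the actual XFS definition). Offsets and
lengths are counted in filesystem blocks:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel "disk blocks" as described above. */
#define IOMAP_BLOCK_DELALLOC  ((int64_t)-1) /* delayed allocation */
#define IOMAP_BLOCK_UNWRITTEN ((int64_t)-2) /* unwritten extent */

struct iomap {
	uint64_t     offset; /* start offset into file */
	int64_t      block;  /* start block on disk, or a sentinel */
	uint64_t     length; /* length of mapping */
	unsigned int state;  /* state flags, left opaque here */
};

/*
 * Translate a file offset that falls inside this mapping to a disk
 * block. Sentinel values are passed back unchanged so the caller
 * can handle delalloc and unwritten ranges itself.
 */
static int64_t iomap_lookup(const struct iomap *map, uint64_t file_off)
{
	assert(file_off >= map->offset &&
	       file_off - map->offset < map->length);
	if (map->block < 0)
		return map->block;
	return map->block + (int64_t)(file_off - map->offset);
}
```

One struct covers the whole range, so a lookup is an add, not a walk
over per-block structures.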

If we allow iomaps to be split and combined along with range
locking, we can parallelise read and write access to each
file on an iomap basis, etc. There's plenty of goodness that
comes from indexing by range....
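
Splitting and combining are cheap when a mapping is just
offset/block/length. A minimal sketch, assuming a simplified,
hypothetical struct iomap and helper names; the sentinel blocks and
the range locking itself are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: file blocks [offset, offset + length) map to
 * disk blocks [block, block + length). */
struct iomap {
	uint64_t     offset;
	uint64_t     block;
	uint64_t     length;
	unsigned int state;
};

/* Split *map at file offset 'at': *map keeps the front part and
 * *tail gets the rest. 'at' must fall strictly inside the map. */
static void iomap_split(struct iomap *map, uint64_t at,
			struct iomap *tail)
{
	uint64_t front = at - map->offset;

	assert(at > map->offset && front < map->length);
	tail->offset = at;
	tail->block = map->block + front;
	tail->length = map->length - front;
	tail->state = map->state;
	map->length = front;
}

/* Fold b into a if b starts exactly where a ends, both in the file
 * and on disk, and both have the same state. */
static bool iomap_merge(struct iomap *a, const struct iomap *b)
{
	if (a->offset + a->length != b->offset ||
	    a->block + a->length != b->block ||
	    a->state != b->state)
		return false;
	a->length += b->length;
	return true;
}
```

A range lock would then cover one iomap (or the pieces of a split),
so two writers touching disjoint ranges of the same file need not
serialise against each other.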

FWIW, I really see little point in making all the filesystems
work with fsblocks if the plan is to change the API again in
a major way a year down the track. Let's get all the changes
we think are necessary in one basket first, and then work out
a coherent plan to implement them ;)

> - Large block support. I can mount and run an 8K block size minix3 fs on
>   my 4K page system and it didn't require anything special in the fs. We
>   can go up to about 32MB blocks now, and gigabyte+ blocks would only
>   require one more bit in the fsblock flags. fsblock_superpage blocks
>   are > PAGE_CACHE_SIZE, midpage ==, and subpage <.
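
The three block classes in the quoted text reduce to a comparison
against the page size. A minimal sketch, assuming power-of-two sizes;
the enum and function names are hypothetical and not taken from the
fsblock patch:

```c
#include <assert.h>
#include <stddef.h>

enum fsblock_kind { FSBLOCK_SUBPAGE, FSBLOCK_MIDPAGE, FSBLOCK_SUPERPAGE };

/* Classify a block size relative to the page cache page size. */
static enum fsblock_kind classify_block(size_t block_size, size_t page_size)
{
	assert(block_size > 0 && page_size > 0);
	if (block_size > page_size)
		return FSBLOCK_SUPERPAGE; /* one block spans several pages */
	if (block_size == page_size)
		return FSBLOCK_MIDPAGE;   /* exactly one block per page */
	return FSBLOCK_SUBPAGE;           /* several blocks per page */
}

/* Pages spanned by one block (1 for sub- and mid-page blocks). */
static size_t pages_per_block(size_t block_size, size_t page_size)
{
	assert(block_size > 0 && page_size > 0);
	return block_size <= page_size ? 1 : block_size / page_size;
}
```

With the 8K minix3 example on a 4K page system, classify_block(8192,
4096) gives FSBLOCK_SUPERPAGE and pages_per_block() gives 2; that
multi-page case is the one that brings in the vmap and locking
concerns below.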

My 2c worth - this is a damn complex way of introducing large block
size support. It has all the problems I pointed out that it would
have (locking issues, vmap overhead, every filesystem needs
major changes and it's not very efficient) and it's going to take
quite some time to stabilise.

If this is the only real feature that fsblocks are going to give us,
then I think this is a waste of time. If we are going to replace
buffer heads, let's do it with something that is completely
independent of filesystem block size and not introduce something
that is just a bufferhead on steroids.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
