From: Nick Piggin <npiggin@suse.de>
To: Chris Mason <chris.mason@oracle.com>
Cc: David Chinner <dgc@sgi.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] fsblock
Date: Fri, 29 Jun 2007 04:33:17 +0200
Message-ID: <20070629023316.GC6091@wotan.suse.de>
In-Reply-To: <20070628122031.GF5313@think.oraclecorp.com>

On Thu, Jun 28, 2007 at 08:20:31AM -0400, Chris Mason wrote:
> On Thu, Jun 28, 2007 at 04:44:43AM +0200, Nick Piggin wrote:
> > 
> > That's true but I don't think an extent data structure means we can
> > become too far divorced from the pagecache or the native block size
> > -- what will end up happening is that often we'll need "stuff" to map
> > between all those as well, even if it is only at IO-time.
> 
> I think the fundamental difference is that fsblock still does:
> mapping_info = page->something, where something is attached on a per
> page basis.  What we really want is mapping_info = lookup_mapping(page),
> where that function goes and finds something stored on a per extent
> basis, with extra bits for tracking dirty and locked state.
> 
> Ideally, in at least some of the cases the dirty and locked state could
> be at an extent granularity (streaming IO) instead of the block
> granularity (random IO).
> 
> In my little brain, even block based filesystems should be able to take
> advantage of this...but such things are always easier to believe in
> before the coding starts.

Now I wouldn't for a minute deny that at least some of the block
information would be better stored in extent/tree format (if XFS
does it, it must be good!).

And yes, I'm sure filesystems with even basic block based allocation
could get a reasonable ratio of blocks to extents.
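
To make the blocks-to-extents ratio concrete, here is a minimal userspace
sketch (plain C, not kernel code and not part of the fsblock patches) that
coalesces a per-block map into extents. The struct extent layout and the
blocks_to_extents() name are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one physically contiguous run of blocks. */
struct extent {
	uint64_t logical;	/* first logical (file) block of the run */
	uint64_t physical;	/* first physical (disk) block of the run */
	uint64_t len;		/* number of blocks in the run */
};

/*
 * Coalesce a per-block map, where phys[i] is the physical block backing
 * logical block i, into extents. At most max extents are written to out;
 * the return value is the number of extents the map needs in total.
 */
size_t blocks_to_extents(const uint64_t *phys, size_t nblocks,
			 struct extent *out, size_t max)
{
	struct extent cur = { 0, 0, 0 };
	size_t n = 0;

	assert(phys != NULL || nblocks == 0);

	for (size_t i = 0; i < nblocks; i++) {
		/* Physically adjacent to the current run: just grow it. */
		if (cur.len > 0 && phys[i] == cur.physical + cur.len) {
			cur.len++;
			continue;
		}
		/* Discontinuity: emit the finished run, start a new one. */
		if (cur.len > 0) {
			if (n < max)
				out[n] = cur;
			n++;
		}
		cur.logical = i;
		cur.physical = phys[i];
		cur.len = 1;
	}
	/* Emit the trailing run, if any. */
	if (cur.len > 0) {
		if (n < max)
			out[n] = cur;
		n++;
	}
	return n;
}
```

For a file whose eight blocks sit at 100-103, 500-501 and 900-901, this
gives three extents, so the ratio depends entirely on how contiguous the
allocator managed to be.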

However I think it is fundamentally another layer, or at least
more complexity... fsblock uses the existing pagecache mapping as
(much of) the data structure and uses the existing pagecache locking
for the locking. And it fundamentally just provides a block access
and IO layer into the pagecache for the filesystem, which I think will
often be needed anyway.

But that said, I would like to see a generic extent mapping layer
sitting between fsblock and the filesystem (I might even have a crack
at it myself)... and I could be proven completely wrong and it may be
that fsblock isn't required at all after such a layer goes in. So I
will try to keep all the APIs extent based.

The first thing I actually looked at for "get_blocks" was for the
filesystem to build up a tree of mappings itself, completely unconnected
from the pagecache. It just ended up being a little more work and
locking but the idea isn't insane :)
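
Roughly, and only as a sketch: a mapping kept apart from the pagecache
could be a sorted set of extents searched by logical block, with the
dirty/locked bits living on the extent. The version below is plain
userspace C using a sorted array rather than a real tree, and every name
in it (struct map_extent, MAP_DIRTY, MAP_LOCKED, extent_lookup) is
hypothetical; it is not the code I experimented with and it ignores
locking entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-extent state bits. */
#define MAP_DIRTY	0x1u
#define MAP_LOCKED	0x2u

/* Hypothetical mapping record: a run of logical blocks and its state. */
struct map_extent {
	uint64_t start;		/* first logical block covered */
	uint64_t len;		/* number of blocks covered */
	uint64_t physical;	/* physical block backing 'start' */
	unsigned int flags;	/* MAP_DIRTY | MAP_LOCKED */
};

/*
 * Find the extent covering 'block' in an array sorted by 'start' with
 * no overlaps. Returns NULL if the block falls in a hole.
 */
const struct map_extent *extent_lookup(const struct map_extent *map,
				       size_t n, uint64_t block)
{
	size_t lo = 0, hi = n;

	assert(map != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (block < map[mid].start)
			hi = mid;		/* look in the left half */
		else if (block - map[mid].start >= map[mid].len)
			lo = mid + 1;		/* look in the right half */
		else
			return &map[mid];	/* block is inside this run */
	}
	return NULL;
}

/*
 * Translate a logical block to a physical one.
 * Returns 0 on success, -1 if the block is unmapped.
 */
int extent_map_block(const struct map_extent *map, size_t n,
		     uint64_t block, uint64_t *physical)
{
	const struct map_extent *e = extent_lookup(map, n, block);

	if (!e)
		return -1;
	*physical = e->physical + (block - e->start);
	return 0;
}
```

The lookup itself is the easy part; the extra work is keeping such a
structure coherent with the pagecache once pages are dirtied, written
back and truncated.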


> > One issue I have with the current nobh and mpage stuff is that it
> > requires multiple calls into get_block (first to prepare write, then
> > to writepage), it doesn't allow filesystems to attach resources
> > required for writeout at prepare_write time, and it doesn't play nicely
> > with buffers in general. (not to mention that nobh error handling is
> > buggy).
> > 
> > I haven't done any mpage-like code for fsblocks yet, but I think they
> > wouldn't be too much trouble, and wouldn't have any of the above
> > problems...
> 
> Could be, but the fundamental issue of sometimes pages have mappings
> attached and sometimes they don't is still there.  The window is
> smaller, but non-zero.

The aim for fsblocks is that any page under IO will always have fsblocks,
which I hope is going to make this easy. In the fsblocks patch I sent out
there is a window (with mmapped pages), however that's a bug which can be
fixed rather than a fundamental problem. So writepages will be less of a
problem.

Readpages may indeed be more efficient at block mapping with extents than
with individual fsblocks (or it could be, if it were an extent based API
itself).

Well I don't know. Extents are always going to have benefits, but I don't
know if it means the fsblock part could go away completely. I'll keep
it in mind though.

