From: Badari Pulavarty <pbadari@us.ibm.com>
To: Joel Becker <Joel.Becker@oracle.com>
Cc: Jeff Garzik <jeff@garzik.org>, "Theodore Ts'o" <tytso@mit.edu>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: Proposal and plan for ext2/3 future development work
Date: Fri, 30 Jun 2006 12:29:52 -0700
Message-ID: <1151695792.339.19.camel@dyn9047017100.beaverton.ibm.com>
In-Reply-To: <20060630182457.GH11640@ca-server1.us.oracle.com>

On Fri, 2006-06-30 at 11:24 -0700, Joel Becker wrote:
> On Fri, Jun 30, 2006 at 10:13:06AM -0700, Badari Pulavarty wrote:
> > I tried adding "delayed allocation" for ext3 earlier. Yes. VFS level
> > infrastructure would be nice. But, I haven't found much that we can
> > do at VFS - which is common across all the filesystems (except
> > mpage_writepage(s) handling). Most of the stuff is specific to 
> > filesystem implementation (even though it could be common) - coming
> > out with VFS level interfaces to suit all the different filesystem
> > delalloc would be an *interesting* exercise.
> 
> 	Well, to be fair, I'm just going by what little I know about
> XFS.  They maintain a cache of all pages waiting on delayed allocation
> for writeback.  Why have this entire cache (hash, list, whatever) when
> we could create some state in the pagecache?  We save a large chunk
> of memory and some complex writeback code.  I suspect you were thinking
> of this when you said "mpage_writepage(s) handling".  But this is a
> large complexity win if we can do it.
> 	The same with metadata/data ordering issues.  ie, data=ordered
> or even plain "creat(2); write(2)".  I don't know how generic the
> ordering is for each filesystem, but there is always room for play.
> 	On-disk, of course each filesystem is going to be different.
> I'm not sure we could fit a fully-generic aops->reserve_space() &
> aops->commit_space() API.  But I don't think we need to.

Unfortunately, I haven't looked at the XFS delalloc implementation in
detail to understand what exactly they would need from VFS (or what could
be pushed to VFS). I purely tried to work with the current ext3 code and
current VFS support. What I found is that -

1) Instead of allocating a block at prepare time, we need to be able to
"reserve" a block (so the allocation won't fail as part of writeback).
And, as part of writeback, we need a way to figure out if a given page
did indeed reserve a block (we need to make sure the allocation
succeeds for those). We might need a pageflag for this (but I haven't
decided that it's absolutely needed).

2) We need a way to cluster a bunch of (contiguous) pages and allocate
disk blocks for those in a single shot - which is NOT a direct delalloc
requirement, but that is the whole reason for doing delalloc.
(Suparna did a few radix_tree interfaces for this.)
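
As an illustration only, the two requirements above can be modelled in a
few lines of userspace Python. This is a toy sketch, not the ext3 patch
discussed here and not kernel code. Every name in it (ToyFs, NoSpace,
prepare_write, allocate_run, writepages, and the "delalloc" set that
stands in for a possible page flag) is hypothetical, and the bump
allocator is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field


class NoSpace(Exception):
    """Toy stand-in for ENOSPC, raised when a reservation cannot be granted."""


@dataclass
class ToyFs:
    free_blocks: int          # blocks that are neither reserved nor allocated
    reserved: int = 0         # blocks promised to dirty pages, not yet placed
    next_block: int = 0       # hypothetical bump allocator: next unused disk block
    alloc_calls: int = 0      # how many times the allocator was invoked
    # Page indexes holding a reservation (assumed stand-in for a page flag).
    delalloc: set = field(default_factory=set)
    # Page index -> disk block, filled in only at writeback time.
    mapping: dict = field(default_factory=dict)

    def prepare_write(self, page):
        """Reserve, rather than allocate, one block for a newly dirtied page."""
        if page in self.mapping or page in self.delalloc:
            return            # already backed by a block, or already reserved
        if self.free_blocks == 0:
            raise NoSpace(page)   # fail now, so writeback never runs out of space
        self.free_blocks -= 1
        self.reserved += 1
        self.delalloc.add(page)

    def allocate_run(self, count):
        """Convert `count` reservations into one contiguous extent."""
        if count > self.reserved:
            raise RuntimeError("allocation without a matching reservation")
        first = self.next_block
        self.next_block += count
        self.reserved -= count
        self.alloc_calls += 1
        return first

    def writepages(self):
        """Cluster contiguous reserved pages and allocate each run in one call."""
        pages = sorted(self.delalloc)
        i = 0
        while i < len(pages):
            j = i
            # Extend the run while the next page index is adjacent.
            while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
                j += 1
            run = pages[i:j + 1]
            first = self.allocate_run(len(run))
            for offset, page in enumerate(run):
                self.mapping[page] = first + offset
                self.delalloc.discard(page)
            i = j + 1
```

In this model, dirtying pages 0, 1, 2, 5 and 6 takes five reservations
at prepare time and leaves the mapping empty; a later writepages() call
then makes two allocator calls, one for the run 0-2 and one for the run
5-6, instead of five separate ones. The set membership check is what
lets writeback tell which pages really reserved a block.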

Other than these general VFS-level needs, I had to deal with journal
lock ordering issues (very specific to ext3). To work around the
journalling issues, I had to write my own mpage_writepages(), since
the changes I need are specific to ext3 journalling - I am not sure
if they are going to be useful for other filesystems or not.

If you can think of general infrastructure you need for OCFS2, please
let me know - we can come up with commonality.


Thanks,
Badari
