linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Garzik <jeff@garzik.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] fsblock
Date: Sat, 23 Jun 2007 23:07:54 -0400	[thread overview]
Message-ID: <467DE00A.9080700@garzik.org> (raw)
In-Reply-To: <20070624014528.GA17609@wotan.suse.de>

Nick Piggin wrote:
> - No deadlocks (hopefully). The buffer layer is technically deadlocky by
>   design, because it can require memory allocations at page writeout-time.
>   It also has one path that cannot tolerate memory allocation failures.
>   No such problems for fsblock, which keeps fsblock metadata around for as
>   long as a page is dirty (this still has problems vs get_user_pages, but
>   that's going to require an audit of all get_user_pages sites. Phew).
> 
> - In line with the above item, filesystem block allocation is performed
>   before a page is dirtied. In the buffer layer, mmap writes can dirty a
>   page with no backing blocks which is a problem if the filesystem is
>   ENOSPC (patches exist for buffer.c for this).

This raises an eyebrow...  The handling of ENOSPC prior to mmap write is 
more an ABI behavior, so I don't see how this can be fixed with internal 
changes, yet without changing behavior currently exported to userland 
(and thus affecting code based on such assumptions).


> - An inode's metadata must be tracked per-inode in order for fsync to
>   work correctly. buffer contains helpers to do this for basic
>   filesystems, but any block can be only the metadata for a single inode.
>   This is not really correct for things like inode descriptor blocks.
>   fsblock can track multiple inodes per block. (This is non trivial,
>   and it may be overkill so it could be reverted to a simpler scheme
>   like buffer).

hrm; no specific comment but this seems like an idea/area that needs to 
be fleshed out more, by converting some of the more advanced filesystems.


> - Large block support. I can mount and run an 8K block size minix3 fs on
>   my 4K page system and it didn't require anything special in the fs. We
>   can go up to about 32MB blocks now, and gigabyte+ blocks would only
>   require  one more bit in the fsblock flags. fsblock_superpage blocks
>   are > PAGE_CACHE_SIZE, midpage ==, and subpage <.

definitely useful, especially if I rewrite my ibu filesystem for 2.6.x, 
like I've been planning.


> So. Comments? Is this something we want? If yes, then how would we
> transition from buffer.c to fsblock.c?

Your work is definitely interesting, but I think it will be even more 
interesting once ext2 (w/ dir in pagecache) and ext3 (journalling) are 
converted.

My gut feeling is that there are several problem areas you haven't hit 
yet, with the new code.

Also, once things are converted, the question of transitioning from 
buffer.c will undoubtedly answer itself.  That's the way several of us 
handle transitions:  finish all the work, then look with fresh eyes and 
conceive a path from the current code to your enhanced code.

	Jeff



  parent reply	other threads:[~2007-06-24  3:08 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-24  1:45 [RFC] fsblock Nick Piggin
2007-06-24  1:46 ` [patch 1/3] add the fsblock layer Nick Piggin
2007-06-24 15:28   ` Andi Kleen
2007-06-24 20:18     ` Arjan van de Ven
2007-06-25  8:58       ` Andi Kleen
2007-06-25  7:19     ` Nick Piggin
2007-06-24 23:01   ` Neil Brown
2007-06-25  7:41     ` Nick Piggin
2007-06-25 12:29       ` Chris Mason
2007-06-26  2:34         ` Nick Piggin
2007-06-26  2:48           ` Neil Brown
2007-06-26  3:07             ` Nick Piggin
2007-06-26 12:26               ` Chris Mason
2007-06-30 10:40                 ` Christoph Hellwig
2007-06-30 10:40           ` Christoph Hellwig
2007-06-25 13:19   ` Chris Mason
2007-06-26  2:42     ` Nick Piggin
2007-06-24  1:46 ` [patch 2/3] block_dev: convert to fsblock Nick Piggin
2007-06-24  1:47 ` [patch 3/3] minix: " Nick Piggin
2007-06-24  1:53 ` [RFC] fsblock Nick Piggin
2007-06-24  3:07 ` Jeff Garzik [this message]
2007-06-24  3:47   ` Nick Piggin
2007-06-24 13:51     ` Chris Mason
2007-06-25  6:58       ` Nick Piggin
2007-06-25 12:25         ` Chris Mason
2007-06-30 10:44           ` Christoph Hellwig
2007-06-30 10:42   ` Christoph Hellwig
2007-06-30 11:10     ` Jeff Garzik
2007-06-30 11:13       ` Christoph Hellwig
2007-06-24  4:19 ` William Lee Irwin III
2007-06-24 14:16 ` Andi Kleen
2007-06-25  7:16   ` Nick Piggin
2007-06-26  3:06 ` David Chinner
2007-06-26  3:55   ` Nick Piggin
2007-06-26  9:23     ` David Chinner
2007-06-26 11:14       ` Nick Piggin
2007-06-27 12:39         ` Kyle Moffett
2007-06-26 12:34       ` Chris Mason
2007-06-27  5:32         ` Nick Piggin
2007-06-27  6:05           ` David Chinner
2007-06-27 11:50           ` Chris Mason
2007-06-27 15:18             ` Anton Altaparmakov
2007-06-27 22:35             ` David Chinner
2007-06-28  2:44               ` Nick Piggin
2007-06-28 12:20                 ` Chris Mason
2007-06-29  2:08                   ` David Chinner
2007-06-29  2:33                   ` Nick Piggin
2007-06-30 11:05 ` Christoph Hellwig
2007-07-09 17:14 ` Christoph Lameter
2007-07-10  0:54   ` Nick Piggin
2007-07-10  0:59     ` Christoph Lameter
2007-07-10  1:07       ` Nick Piggin
2007-07-10  1:37       ` Dave McCracken

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=467DE00A.9080700@garzik.org \
    --to=jeff@garzik.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).