linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* GFS2 and DLM
@ 2006-06-20 12:17 Steven Whitehouse
  2006-06-20 12:32 ` Nick Piggin
                   ` (6 more replies)
  0 siblings, 7 replies; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-20 12:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	Ingo Molnar, linux-kernel

Hi,

Linus, Andrew suggested to me to send this pull request to you directly.
Please consider merging the GFS2 filesystem and DLM from (they are both
in the same tree for ease of testing):

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git

They have both lived in -mm for quite some time. We merged all review
feedback and patches that came though -mm and from the various previous
postings of patches to lkml and fsdevel mailing lists.

I have updated my git tree so that its fully uptodate with your current
tree (as of the time of this request) and tested building and using it,

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
@ 2006-06-20 12:32 ` Nick Piggin
  2006-06-20 12:47   ` Steven Whitehouse
  2006-06-20 12:33 ` Christoph Hellwig
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2006-06-20 12:32 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Steven Whitehouse wrote:
> Hi,
> 
> Linus, Andrew suggested to me to send this pull request to you directly.
> Please consider merging the GFS2 filesystem and DLM from (they are both
> in the same tree for ease of testing):
> 
> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git
> 
> They have both lived in -mm for quite some time. We merged all review
> feedback and patches that came though -mm and from the various previous
> postings of patches to lkml and fsdevel mailing lists.
> 
> I have updated my git tree so that its fully uptodate with your current
> tree (as of the time of this request) and tested building and using it,

Hi Steve,

Could you post a diffstat?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
  2006-06-20 12:32 ` Nick Piggin
@ 2006-06-20 12:33 ` Christoph Hellwig
  2006-06-20 12:55   ` Steven Whitehouse
  2006-06-20 15:55   ` Ingo Molnar
  2006-06-23 14:49 ` Christoph Hellwig
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-20 12:33 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> Linus, Andrew suggested to me to send this pull request to you directly.
> Please consider merging the GFS2 filesystem and DLM from (they are both
> in the same tree for ease of testing):

Did anyone actually bother to review it?  It's huge and was in pretty
bad shapre when I looked last time.  Also in the -mm merge writeup
you guys said it's only scheduled for 2.6.19 so I didn't even bother
looking at the huge mess.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:32 ` Nick Piggin
@ 2006-06-20 12:47   ` Steven Whitehouse
  2006-06-20 14:04     ` Nick Piggin
  0 siblings, 1 reply; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-20 12:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Tue, 2006-06-20 at 22:32 +1000, Nick Piggin wrote:
> Steven Whitehouse wrote:
> > Hi,
> > 
> > Linus, Andrew suggested to me to send this pull request to you directly.
> > Please consider merging the GFS2 filesystem and DLM from (they are both
> > in the same tree for ease of testing):
> > 
> > rsync://rsync.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git
> > 
> > They have both lived in -mm for quite some time. We merged all review
> > feedback and patches that came though -mm and from the various previous
> > postings of patches to lkml and fsdevel mailing lists.
> > 
> > I have updated my git tree so that its fully uptodate with your current
> > tree (as of the time of this request) and tested building and using it,
> 
> Hi Steve,
> 
> Could you post a diffstat?
> 
Here is the diffstat as requested,

Steve.

------------------------------------------------------------------------------------------------
 CREDITS                            |    6 
 Documentation/filesystems/gfs2.txt |   43 
 MAINTAINERS                        |   18 
 fs/Kconfig                         |    2 
 fs/Makefile                        |    2 
 fs/configfs/item.c                 |    2 
 fs/dlm/Kconfig                     |   29 
 fs/dlm/Makefile                    |   21 
 fs/dlm/ast.c                       |  167 +
 fs/dlm/ast.h                       |   26 
 fs/dlm/config.c                    |  789 ++++++++
 fs/dlm/config.h                    |   42 
 fs/dlm/debug_fs.c                  |  296 +++
 fs/dlm/device.c                    | 1239 ++++++++++++
 fs/dlm/dir.c                       |  423 ++++
 fs/dlm/dir.h                       |   30 
 fs/dlm/dlm_internal.h              |  494 +++++
 fs/dlm/lock.c                      | 3547 +++++++++++++++++++++++++++++++++++++
 fs/dlm/lock.h                      |   50 
 fs/dlm/lockspace.c                 |  678 +++++++
 fs/dlm/lockspace.h                 |   24 
 fs/dlm/lowcomms.c                  | 1238 ++++++++++++
 fs/dlm/lowcomms.h                  |   26 
 fs/dlm/lvb_table.h                 |   18 
 fs/dlm/main.c                      |   89 
 fs/dlm/member.c                    |  312 +++
 fs/dlm/member.h                    |   24 
 fs/dlm/memory.c                    |  106 +
 fs/dlm/memory.h                    |   29 
 fs/dlm/midcomms.c                  |  140 +
 fs/dlm/midcomms.h                  |   21 
 fs/dlm/rcom.c                      |  457 ++++
 fs/dlm/rcom.h                      |   24 
 fs/dlm/recover.c                   |  762 +++++++
 fs/dlm/recover.h                   |   34 
 fs/dlm/recoverd.c                  |  285 ++
 fs/dlm/recoverd.h                  |   24 
 fs/dlm/requestqueue.c              |  184 +
 fs/dlm/requestqueue.h              |   22 
 fs/dlm/util.c                      |  161 +
 fs/dlm/util.h                      |   22 
 fs/gfs2/Kconfig                    |   44 
 fs/gfs2/Makefile                   |   10 
 fs/gfs2/acl.c                      |  315 +++
 fs/gfs2/acl.h                      |   37 
 fs/gfs2/bmap.c                     | 1103 +++++++++++
 fs/gfs2/bmap.h                     |   32 
 fs/gfs2/daemon.c                   |  196 ++
 fs/gfs2/daemon.h                   |   19 
 fs/gfs2/dir.c                      | 1980 ++++++++++++++++++++
 fs/gfs2/dir.h                      |   73 
 fs/gfs2/eaops.c                    |  230 ++
 fs/gfs2/eaops.h                    |   31 
 fs/gfs2/eattr.c                    | 1548 ++++++++++++++++
 fs/gfs2/eattr.h                    |   97 +
 fs/gfs2/format.h                   |   21 
 fs/gfs2/gfs2.h                     |   31 
 fs/gfs2/glock.c                    | 2280 +++++++++++++++++++++++
 fs/gfs2/glock.h                    |  152 +
 fs/gfs2/glops.c                    |  491 +++++
 fs/gfs2/glops.h                    |   23 
 fs/gfs2/incore.h                   |  659 ++++++
 fs/gfs2/inode.c                    | 1363 ++++++++++++++
 fs/gfs2/inode.h                    |   58 
 fs/gfs2/lm.c                       |  244 ++
 fs/gfs2/lm.h                       |   41 
 fs/gfs2/lm_interface.h             |  295 +++
 fs/gfs2/locking.c                  |  191 +
 fs/gfs2/locking/dlm/Makefile       |    3 
 fs/gfs2/locking/dlm/lock.c         |  541 +++++
 fs/gfs2/locking/dlm/lock_dlm.h     |  188 +
 fs/gfs2/locking/dlm/main.c         |   64 
 fs/gfs2/locking/dlm/mount.c        |  256 ++
 fs/gfs2/locking/dlm/plock.c        |  299 +++
 fs/gfs2/locking/dlm/sysfs.c        |  225 ++
 fs/gfs2/locking/dlm/thread.c       |  352 +++
 fs/gfs2/locking/nolock/Makefile    |    3 
 fs/gfs2/locking/nolock/main.c      |  259 ++
 fs/gfs2/log.c                      |  601 ++++++
 fs/gfs2/log.h                      |   61 
 fs/gfs2/lops.c                     |  800 ++++++++
 fs/gfs2/lops.h                     |   96 +
 fs/gfs2/lvb.c                      |   45 
 fs/gfs2/lvb.h                      |   19 
 fs/gfs2/main.c                     |  127 +
 fs/gfs2/meta_io.c                  |  888 +++++++++
 fs/gfs2/meta_io.h                  |   89 
 fs/gfs2/mount.c                    |  214 ++
 fs/gfs2/mount.h                    |   15 
 fs/gfs2/ondisk.c                   |  304 +++
 fs/gfs2/ops_address.c              |  669 ++++++
 fs/gfs2/ops_address.h              |   17 
 fs/gfs2/ops_dentry.c               |  123 +
 fs/gfs2/ops_dentry.h               |   15 
 fs/gfs2/ops_export.c               |  285 ++
 fs/gfs2/ops_export.h               |   15 
 fs/gfs2/ops_file.c                 | 1000 ++++++++++
 fs/gfs2/ops_file.h                 |   20 
 fs/gfs2/ops_fstype.c               |  842 ++++++++
 fs/gfs2/ops_fstype.h               |   16 
 fs/gfs2/ops_inode.c                | 1163 ++++++++++++
 fs/gfs2/ops_inode.h                |   18 
 fs/gfs2/ops_super.c                |  471 ++++
 fs/gfs2/ops_super.h                |   15 
 fs/gfs2/ops_vm.c                   |  195 ++
 fs/gfs2/ops_vm.h                   |   16 
 fs/gfs2/page.c                     |  268 ++
 fs/gfs2/page.h                     |   23 
 fs/gfs2/quota.c                    | 1305 +++++++++++++
 fs/gfs2/quota.h                    |   32 
 fs/gfs2/recovery.c                 |  575 +++++
 fs/gfs2/recovery.h                 |   32 
 fs/gfs2/rgrp.c                     | 1529 +++++++++++++++
 fs/gfs2/rgrp.h                     |   63 
 fs/gfs2/super.c                    |  928 +++++++++
 fs/gfs2/super.h                    |   52 
 fs/gfs2/sys.c                      |  579 ++++++
 fs/gfs2/sys.h                      |   24 
 fs/gfs2/trans.c                    |  184 +
 fs/gfs2/trans.h                    |   34 
 fs/gfs2/util.c                     |  245 ++
 fs/gfs2/util.h                     |  169 +
 include/linux/dlm.h                |  302 +++
 include/linux/dlm_device.h         |   86 
 include/linux/fs.h                 |    3 
 include/linux/gfs2_ondisk.h        |  441 ++++
 include/linux/iflags.h             |  102 +
 include/linux/kernel.h             |    1 
 include/linux/lock_dlm_plock.h     |   40 
 kernel/printk.c                    |    1 
 mm/filemap.c                       |    1 
 mm/readahead.c                     |    1 
 132 files changed, 40815 insertions(+), 4 deletions(-)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:33 ` Christoph Hellwig
@ 2006-06-20 12:55   ` Steven Whitehouse
  2006-06-20 15:55   ` Ingo Molnar
  1 sibling, 0 replies; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-20 12:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Tue, 2006-06-20 at 13:33 +0100, Christoph Hellwig wrote:
> On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > Linus, Andrew suggested to me to send this pull request to you directly.
> > Please consider merging the GFS2 filesystem and DLM from (they are both
> > in the same tree for ease of testing):
> 
> Did anyone actually bother to review it?  It's huge and was in pretty
> bad shapre when I looked last time.  Also in the -mm merge writeup
> you guys said it's only scheduled for 2.6.19 so I didn't even bother
> looking at the huge mess.
> 
Well we have received a few comments from various people, so I would say
that it has been reviewed. You appeared to be happy with the iflags.h
header file which was the last set of comments that I have from you I
think. I'm sorry if we've missed something along the way but I hope
we've been quite careful to address all the issues.

My comments regarding 2.6.19 were due to one particular aspect of the
unlink() call which wasn't quite right. I didn't think at the time I
wrote that message that we'd be ready for this merge window, however
since Linus then went through a final -rc release, there was in fact
time to deal with this. I did post a follow up to clarify that a little
while back, so I'm sorry if it was a bit confusing,

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:47   ` Steven Whitehouse
@ 2006-06-20 14:04     ` Nick Piggin
  2006-06-20 15:40       ` Steven Whitehouse
                         ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Nick Piggin @ 2006-06-20 14:04 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Thanks.

Steven Whitehouse wrote:
>  kernel/printk.c                    |    1 
>  mm/filemap.c                       |    1 
>  mm/readahead.c                     |    1 

These EXPORTs are a bit unfortunate.

BTW. you have one function that calls file_ra_state_init but never appears
to use the initialized ra_state.

Why is gfs2_internal_read() called the "external read function" in the
changelog?

The internal_read function doesn't look like a great candidate for passing
a ra_state to, which invokes all the mechanism expecting a regular file
being accessed by a user program.

It seems as though you could explicitly control readahead more optimally,
but I don't know what the best way to do that would be. I assume Andrew
has had a quick look and doesn't know either.

The part where you needed file_read_actor looks like pretty much a stright
cut and paste from __generic_file_aio_read, which indicates that you might
be exporting at the wrong level.

Not sure about the tty_ export. Would it be better to make a generic
printfish interface on top of it and also replace the interesting dquot.c
gymnastics? (I don't know)

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 14:04     ` Nick Piggin
@ 2006-06-20 15:40       ` Steven Whitehouse
  2006-06-20 19:55         ` Andrew Morton
  2006-06-21 11:20       ` Steven Whitehouse
  2006-06-23 14:45       ` Christoph Hellwig
  2 siblings, 1 reply; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-20 15:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Wed, 2006-06-21 at 00:04 +1000, Nick Piggin wrote:
> Thanks.
> 
> Steven Whitehouse wrote:
> >  kernel/printk.c                    |    1 
> >  mm/filemap.c                       |    1 
> >  mm/readahead.c                     |    1 
> 
> These EXPORTs are a bit unfortunate.
> 
> BTW. you have one function that calls file_ra_state_init but never appears
> to use the initialized ra_state.
Thanks for pointing that one out, see:
http://www.kernel.org/git/?p=linux/kernel/git/steve/gfs2-2.6.git;a=commitdiff;h=0d42e54220ba34e031167138ef91cbd42d8b5876
for the patch to fix that.

> 
> Why is gfs2_internal_read() called the "external read function" in the
> changelog?
> 
Thats a typo/thinko I think. The title of the entry is still correct
though and should give a bit of a clue to that being wrong in the log.

> The internal_read function doesn't look like a great candidate for passing
> a ra_state to, which invokes all the mechanism expecting a regular file
> being accessed by a user program.
> 
> It seems as though you could explicitly control readahead more optimally,
> but I don't know what the best way to do that would be. I assume Andrew
> has had a quick look and doesn't know either.
> 
I'm not sure if he has looked at this specifically, I've not had any
other feedback suggesting that its a bad idea so far. It is used in
places where a regular file is being read, so it would appear to be the
"right thing" in those cases. At least it seems preferable to me than
the alternative of writing a special case readahead which I doubt would
be a great difference in performance.

> The part where you needed file_read_actor looks like pretty much a stright
> cut and paste from __generic_file_aio_read, which indicates that you might
> be exporting at the wrong level.
> 
Yes, and as the comment says:

/**
 * __gfs2_file_aio_read - The main GFS2 read function
 *
 * N.B. This is almost, but not quite the same as __generic_file_aio_read()
 * the important subtle different being that inode->i_size isn't valid
 * unless we are holding a lock, and we do this _only_ on the O_DIRECT
 * path since otherwise locking is done entirely at the page cache
 * layer.
 */

We have to know the i_size since we fall back to buffered I/O in case
the read is outside the current size of the inode, exactly as other
filesystems do. We only do this on the read path though since the write
path for direct I/O works slightly differently and is not a problem in
that case.

I'm certainly interested in any possible solutions to this, but at the
moment, I can't see any easy way around it. Suggestions are very welcome
of course.

> Not sure about the tty_ export. Would it be better to make a generic
> printfish interface on top of it and also replace the interesting dquot.c
> gymnastics? (I don't know)
> 
Possibly. Originally this code had its own copy of tty_write_message()
and it seemed a bit silly to duplicate this. I'm not sure though that we
really need a more generic interface - its only used for a single
function at the moment, and its crude but effective,

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:33 ` Christoph Hellwig
  2006-06-20 12:55   ` Steven Whitehouse
@ 2006-06-20 15:55   ` Ingo Molnar
  1 sibling, 0 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-20 15:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Steven Whitehouse, Linus Torvalds, David Teigland,
	Patrick Caulfield, Kevin Anderson, Andrew Morton, linux-kernel


* Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > Linus, Andrew suggested to me to send this pull request to you directly.
> > Please consider merging the GFS2 filesystem and DLM from (they are both
> > in the same tree for ease of testing):
> 
> Did anyone actually bother to review it?  It's huge and was in pretty 
> bad shapre when I looked last time.  Also in the -mm merge writeup you 
> guys said it's only scheduled for 2.6.19 so I didn't even bother 
> looking at the huge mess.

i'm confused, are we looking at the same piece of code? Perhaps you are 
still looking at some older codebase? fs/gfs2/ in Steven's current GIT 
tree is a nicely isolated 29K lines of code, fs/dlm/ [distributed lock 
manager] is 11K lines of code. Both look and work like normal Linux 
code. (and there's almost zero impact to generic code.)

Contrast that size for example to the 111K lines of code in fs/xfs/, 
which no doubt has improved recently but still looks a bit alien. For 
example the myriads locking APIs are still quite confusing in XFS, and 
it still feels like a kernel within the kernel. The other cluster 
filesystem which was merged recently, OCFS2, is 44K lines of code. So 
really, on what basis do you call GFS2 "huge"?

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 15:40       ` Steven Whitehouse
@ 2006-06-20 19:55         ` Andrew Morton
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Morton @ 2006-06-20 19:55 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: nickpiggin, torvalds, teigland, pcaulfie, kanderso, mingo, linux-kernel

On Tue, 20 Jun 2006 16:40:52 +0100
Steven Whitehouse <swhiteho@redhat.com> wrote:

> > It seems as though you could explicitly control readahead more optimally,
> > but I don't know what the best way to do that would be. I assume Andrew
> > has had a quick look and doesn't know either.
> > 
> I'm not sure if he has looked at this specifically,

No, I haven't made time to look at the GFS2 code at all, really.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 14:04     ` Nick Piggin
  2006-06-20 15:40       ` Steven Whitehouse
@ 2006-06-21 11:20       ` Steven Whitehouse
  2006-06-23 14:45       ` Christoph Hellwig
  2 siblings, 0 replies; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-21 11:20 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Wed, 2006-06-21 at 00:04 +1000, Nick Piggin wrote:
> Thanks.
> 
> Steven Whitehouse wrote:
> >  kernel/printk.c                    |    1 
> >  mm/filemap.c                       |    1 
> >  mm/readahead.c                     |    1 
> 
> These EXPORTs are a bit unfortunate.
> 
> BTW. you have one function that calls file_ra_state_init but never appears
> to use the initialized ra_state.
> 
> Why is gfs2_internal_read() called the "external read function" in the
> changelog?
> 
> The internal_read function doesn't look like a great candidate for passing
> a ra_state to, which invokes all the mechanism expecting a regular file
> being accessed by a user program.
> 
> It seems as though you could explicitly control readahead more optimally,
> but I don't know what the best way to do that would be. I assume Andrew
> has had a quick look and doesn't know either.
> 
> The part where you needed file_read_actor looks like pretty much a stright
> cut and paste from __generic_file_aio_read, which indicates that you might
> be exporting at the wrong level.
> 
Having thought about this a bit more and taken some advice, I've changed
this export (file_read_actor) to _GPL so that all three of the exports
you mention are now _GPL exports.

The updated tree is at:

git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git

Let me know if you still feel that there are further issues to look at,

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 14:04     ` Nick Piggin
  2006-06-20 15:40       ` Steven Whitehouse
  2006-06-21 11:20       ` Steven Whitehouse
@ 2006-06-23 14:45       ` Christoph Hellwig
  2006-06-26 21:03         ` Ingo Molnar
  2 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 14:45 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Steven Whitehouse, Linus Torvalds, David Teigland,
	Patrick Caulfield, Kevin Anderson, Andrew Morton, Ingo Molnar,
	linux-kernel

> The part where you needed file_read_actor looks like pretty much a stright
> cut and paste from __generic_file_aio_read, which indicates that you might
> be exporting at the wrong level.

A definitive NACK to this export.  All other filesystems manage to use
the generic file read code so GFS should do so aswell.  And there's a technical
reason for not exporting aswell as the generic file read interface is far
too complicated already.

> Not sure about the tty_ export. Would it be better to make a generic
> printfish interface on top of it and also replace the interesting dquot.c
> gymnastics? (I don't know)

In fact neither gfs nor dquot should use it at all.  The xfs quota code
is fine without this nonsense.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
  2006-06-20 12:32 ` Nick Piggin
  2006-06-20 12:33 ` Christoph Hellwig
@ 2006-06-23 14:49 ` Christoph Hellwig
  2006-06-23 20:46   ` Andrew Morton
  2006-06-26 20:03   ` Ingo Molnar
  2006-06-23 14:54 ` Christoph Hellwig
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 14:49 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> Linus, Andrew suggested to me to send this pull request to you directly.
> Please consider merging the GFS2 filesystem and DLM from (they are both
> in the same tree for ease of testing):

The code uses GFP_NOFAIL for slab allocator calls.  It's been pointed out
here numerous times that this can't work.  Andrew, what about adding
a check to slab.c to bail out if someone passes it?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
                   ` (2 preceding siblings ...)
  2006-06-23 14:49 ` Christoph Hellwig
@ 2006-06-23 14:54 ` Christoph Hellwig
  2006-06-23 15:54   ` Steven Whitehouse
  2006-06-23 14:55 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 14:54 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> Linus, Andrew suggested to me to send this pull request to you directly.
> Please consider merging the GFS2 filesystem and DLM from (they are both
> in the same tree for ease of testing):

What's going on with gfs2_repermission?  For one it's a totally useless
wrapper.  Second we prefer to have everyone use vfs_permission or
file_permission and third WTF is it doing anywhere?  Except for very rare
cases checking permissions is the VFS's job.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
                   ` (3 preceding siblings ...)
  2006-06-23 14:54 ` Christoph Hellwig
@ 2006-06-23 14:55 ` Christoph Hellwig
  2006-06-23 14:57 ` Christoph Hellwig
  2006-06-23 15:00 ` Christoph Hellwig
  6 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 14:55 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> Linus, Andrew suggested to me to send this pull request to you directly.
> Please consider merging the GFS2 filesystem and DLM from (they are both
> in the same tree for ease of testing):

gfs2_glock_nq_atime doesn't take per-mountpoint atime into consideration
at all.  Deciding on when to do an atime update is pretty much a VFS
thing anyway.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
                   ` (4 preceding siblings ...)
  2006-06-23 14:55 ` Christoph Hellwig
@ 2006-06-23 14:57 ` Christoph Hellwig
  2006-06-23 15:26   ` Steven Whitehouse
  2006-06-23 15:00 ` Christoph Hellwig
  6 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 14:57 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Why is gfs2_sendfile wrapping generic_file_sendfile?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
                   ` (5 preceding siblings ...)
  2006-06-23 14:57 ` Christoph Hellwig
@ 2006-06-23 15:00 ` Christoph Hellwig
  2006-06-23 16:29   ` Steven Whitehouse
  6 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 15:00 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> Linus, Andrew suggested to me to send this pull request to you directly.
> Please consider merging the GFS2 filesystem and DLM from (they are both
> in the same tree for ease of testing):

A new normal filesystem (aka everything but procfs) shouldn't implement
->readlink but use generic_readlink instead.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 14:57 ` Christoph Hellwig
@ 2006-06-23 15:26   ` Steven Whitehouse
  0 siblings, 0 replies; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-23 15:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Fri, 2006-06-23 at 15:57 +0100, Christoph Hellwig wrote:
> Why is gfs2_sendfile wrapping generic_file_sendfile?

Might as well start with the easy one first :-) Now fixed in the git
tree. I think it was left over from when GFS used to do locking at this
level but as you point out, is no longer needed,

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 14:54 ` Christoph Hellwig
@ 2006-06-23 15:54   ` Steven Whitehouse
  2006-06-23 15:54     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-23 15:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Fri, 2006-06-23 at 15:54 +0100, Christoph Hellwig wrote:
> On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > Linus, Andrew suggested to me to send this pull request to you directly.
> > Please consider merging the GFS2 filesystem and DLM from (they are both
> > in the same tree for ease of testing):
> 
> What's going on with gfs2_repermission?  For one it's a totally useless
> wrapper.  Second we prefer to have everyone use vfs_permission or
> file_permission and third WTF is it doing anywhere?  Except for very rare
> cases checking permissions is the VFS's job.
> 

gfs2_repermission doesn't exist in the latest git tree. I had spotted
that one earlier and replaced it with direct calls to permission.
vfs_permission is just a wrapper for permission.

file_permission which you are advocating using has the comment:

/**
 * file_permission  -  check for additional access rights to a given file
 * @file:       file to check access rights for
 * @mask:       right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
 *
 * Used to check for read/write/execute permissions on an already opened
 * file.
 *
 * Note:
 *      Do not use this function in new code.  All access checks should
 *      be done using vfs_permission().
 */

so I guess thats not the right thing. I'm afraid I fail to see whats wrong
with just calling permission directly... we need to call it mainly because
the VFS only does locking within a single node and we recheck the permissions
in a few places after we've taken the glocks which provide cluster-wide
exclusion.

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 15:54   ` Steven Whitehouse
@ 2006-06-23 15:54     ` Christoph Hellwig
  2006-06-23 16:09       ` Steven Whitehouse
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 15:54 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Christoph Hellwig, Linus Torvalds, David Teigland,
	Patrick Caulfield, Kevin Anderson, Andrew Morton, Ingo Molnar,
	linux-kernel

On Fri, Jun 23, 2006 at 04:54:04PM +0100, Steven Whitehouse wrote:
> with just calling permission directly... we need to call it mainly because
> the VFS only does locking within a single node and we recheck the permissions
> in a few places after we've taken the glocks which provide cluster-wide
> exclusion.

->permission must give correct answers.  So I think the answer to your question
is that you need to do the right thing there and get rid of the additional
calls.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 15:54     ` Christoph Hellwig
@ 2006-06-23 16:09       ` Steven Whitehouse
  0 siblings, 0 replies; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-23 16:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Fri, 2006-06-23 at 16:54 +0100, Christoph Hellwig wrote:
> On Fri, Jun 23, 2006 at 04:54:04PM +0100, Steven Whitehouse wrote:
> > with just calling permission directly... we need to call it mainly because
> > the VFS only does locking within a single node and we recheck the permissions
> > in a few places after we've taken the glocks which provide cluster-wide
> > exclusion.
> 
> ->permission must give correct answers.  So I think the answer to your question
> is that you need to do the right thing there and get rid of the additional
> calls.
> 

It does give correct answers. The point is that the answer might change
between ->permission dropping its lock and the actual operation taking
place since there is a time for which the lock is not held where other
nodes might race. To get around this we recheck the permissions in GFS2
to ensure that nothing has changed,

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 15:00 ` Christoph Hellwig
@ 2006-06-23 16:29   ` Steven Whitehouse
  2006-06-23 16:48     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Steven Whitehouse @ 2006-06-23 16:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, Ingo Molnar, linux-kernel

Hi,

On Fri, 2006-06-23 at 16:00 +0100, Christoph Hellwig wrote:
> On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > Linus, Andrew suggested to me to send this pull request to you directly.
> > Please consider merging the GFS2 filesystem and DLM from (they are both
> > in the same tree for ease of testing):
> 
> A new normal filesystem (aka everything but procfs) shouldn't implement
> ->readlink but use generic_readlink instead.
> 

The comment above generic_readlink has this to say:

/*
 * A helper for ->readlink().  This should be used *ONLY* for symlinks that
 * have ->follow_link() touching nd only in nd_set_link().  Using (or not
 * using) it for any given inode is up to filesystem.
 */

which appears, at least, to contradict what you are saying. I'll put it
on my list to look at again, but a straight substitution of
generic_readlink() does not work, so I'd prefer to leave it as it is for
the moment,

Steve.
  


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 16:29   ` Steven Whitehouse
@ 2006-06-23 16:48     ` Christoph Hellwig
  2006-06-26 20:58       ` Ingo Molnar
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-23 16:48 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Christoph Hellwig, Linus Torvalds, David Teigland,
	Patrick Caulfield, Kevin Anderson, Andrew Morton, Ingo Molnar,
	linux-kernel

On Fri, Jun 23, 2006 at 05:29:34PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Fri, 2006-06-23 at 16:00 +0100, Christoph Hellwig wrote:
> > On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > Linus, Andrew suggested to me to send this pull request to you directly.
> > > Please consider merging the GFS2 filesystem and DLM from (they are both
> > > in the same tree for ease of testing):
> > 
> > A new normal filesystem (aka everything but procfs) shouldn't implement
> > ->readlink but use generic_readlink instead.
> > 
> 
> The comment above generic_readlink has this to say:
> 
> /*
>  * A helper for ->readlink().  This should be used *ONLY* for symlinks that
>  * have ->follow_link() touching nd only in nd_set_link().  Using (or not
>  * using) it for any given inode is up to filesystem.
>  */
> 
> which appears, at least, to contradict what you are saying. I'll put it
> on my list to look at again, but a straight substitution of
> generic_readlink() does not work, so I'd prefer to leave it as it is for
> the moment,

The above is the common and preffered case.  The only intree filesystem
not doing it is procfs.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 14:49 ` Christoph Hellwig
@ 2006-06-23 20:46   ` Andrew Morton
  2006-06-26 20:03   ` Ingo Molnar
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Morton @ 2006-06-23 20:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: swhiteho, torvalds, teigland, pcaulfie, kanderso, mingo, linux-kernel

On Fri, 23 Jun 2006 15:49:28 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > Linus, Andrew suggested to me to send this pull request to you directly.
> > Please consider merging the GFS2 filesystem and DLM from (they are both
> > in the same tree for ease of testing):
> 
> The code uses GFP_NOFAIL for slab allocator calls.

All existing users of GFP_NOFAIL are lame and should be fixed to handle
ENOMEM.  We shouldn't add new ones.

And we shouldn't open-code the retry loops to avoid attracting attention,
either ;)

>  It's been pointed out
> here numerous times that this can't work.  Andrew, what about adding
> a check to slab.c to bail out if someone passes it?

Is there anything special about slab which makes the use of GFP_NOFAIL
worse when called from the slab code?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 14:49 ` Christoph Hellwig
  2006-06-23 20:46   ` Andrew Morton
@ 2006-06-26 20:03   ` Ingo Molnar
  2006-06-26 20:12     ` Arjan van de Ven
  2006-06-27  6:33     ` Ingo Molnar
  1 sibling, 2 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-26 20:03 UTC (permalink / raw)
  To: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Christoph Hellwig <hch@infradead.org> wrote:

> The code uses GFP_NOFAIL for slab allocator calls.  It's been pointed 
> out here numerous times that this can't work.  Andrew, what about 
> adding a check to slab.c to bail out if someone passes it?

reiserfs, jbd and NTFS are all using GFP_NOFAIL ...

i dont think this is a huge issue that should block merging.

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-26 20:03   ` Ingo Molnar
@ 2006-06-26 20:12     ` Arjan van de Ven
  2006-06-27  7:08       ` Ingo Molnar
  2006-06-27  6:33     ` Ingo Molnar
  1 sibling, 1 reply; 46+ messages in thread
From: Arjan van de Ven @ 2006-06-26 20:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel

On Mon, 2006-06-26 at 22:03 +0200, Ingo Molnar wrote:
> * Christoph Hellwig <hch@infradead.org> wrote:
> 
> > The code uses GFP_NOFAIL for slab allocator calls.  It's been pointed 
> > out here numerous times that this can't work.  Andrew, what about 
> > adding a check to slab.c to bail out if someone passes it?
> 
> reiserfs, jbd and NTFS are all using GFP_NOFAIL ...
> 

they use it for slab or for get_free_pages() ?



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 16:48     ` Christoph Hellwig
@ 2006-06-26 20:58       ` Ingo Molnar
  2006-06-27  7:50         ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Ingo Molnar @ 2006-06-26 20:58 UTC (permalink / raw)
  To: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Christoph Hellwig <hch@infradead.org> wrote:

> On Fri, Jun 23, 2006 at 05:29:34PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Fri, 2006-06-23 at 16:00 +0100, Christoph Hellwig wrote:
> > > On Tue, Jun 20, 2006 at 01:17:13PM +0100, Steven Whitehouse wrote:
> > > > Hi,
> > > > 
> > > > Linus, Andrew suggested to me to send this pull request to you directly.
> > > > Please consider merging the GFS2 filesystem and DLM from (they are both
> > > > in the same tree for ease of testing):
> > > 
> > > A new normal filesystem (aka everything but procfs) shouldn't implement
> > > ->readlink but use generic_readlink instead.
> > > 
> > 
> > The comment above generic_readlink has this to say:
> > 
> > /*
> >  * A helper for ->readlink().  This should be used *ONLY* for symlinks that
> >  * have ->follow_link() touching nd only in nd_set_link().  Using (or not
> >  * using) it for any given inode is up to filesystem.
> >  */
> > 
> > which appears, at least, to contradict what you are saying. I'll put 
> > it on my list to look at again, but a straight substitution of 
> > generic_readlink() does not work, so I'd prefer to leave it as it is 
> > for the moment,
> 
> The above is the common and preffered case.  The only intree 
> filesystem not doing it is procfs.

i think you might be missing that GFS does cross-node locking in 
readlink too. (OCFS2 does not do it because it apparently does not care 
about cross-node atime correctness here it seems.) So GFS simply cannot 
use generic_readlink()!

i'm all for enhancing vfs_readlink()/generic_readlink() so that local 
locking can be extended if needed, but otherwise this does not seem to 
be a merge showstopper to me. GFS simply implements something that 
no-one implemented until now. (i dont know how XFS's non-GPL binary-only 
clustering module does it, but i'd not be surprised if it defined its 
own readlink implementation too.)

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-23 14:45       ` Christoph Hellwig
@ 2006-06-26 21:03         ` Ingo Molnar
  2006-06-27  7:52           ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Ingo Molnar @ 2006-06-26 21:03 UTC (permalink / raw)
  To: Christoph Hellwig, Nick Piggin, Steven Whitehouse,
	Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, linux-kernel


* Christoph Hellwig <hch@infradead.org> wrote:

> > The part where you needed file_read_actor looks like pretty much a stright
> > cut and paste from __generic_file_aio_read, which indicates that you might
> > be exporting at the wrong level.
> 
> A definitive NACK to this export.  All other filesystems manage to use 
> the generic file read code so GFS should do so aswell.  And there's a 
> technical reason for not exporting aswell as the generic file read 
> interface is far too complicated already.

GFS is different here mostly due to locking, because one of its strong 
features is an implementation of pretty strict POSIX semantics in a 
clustered environment, something that no other Linux FS (that is 
available in source code) has done so far. (OCFS2 does not do it as 
strictly - it has a very specific application in mind)

so i'd reformulate your request as a request to extend the VFS to unify 
clustering filesystems - which is a nice cleanup goal but not a merge 
showstopper to me.

> > Not sure about the tty_ export. Would it be better to make a generic 
> > printfish interface on top of it and also replace the interesting 
> > dquot.c gymnastics? (I don't know)
> 
> In fact neither gfs nor dquot should use it at all.  The xfs quota 
> code is fine without this nonsense.

yeah, the tty_ export is unnecessary and should be fixed. But this seems 
quite easy to do.

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-26 20:03   ` Ingo Molnar
  2006-06-26 20:12     ` Arjan van de Ven
@ 2006-06-27  6:33     ` Ingo Molnar
  2006-06-27  6:43       ` Arjan van de Ven
                         ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  6:33 UTC (permalink / raw)
  To: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> * Christoph Hellwig <hch@infradead.org> wrote:
> 
> > The code uses GFP_NOFAIL for slab allocator calls.  It's been 
> > pointed out here numerous times that this can't work.  Andrew, what 
> > about adding a check to slab.c to bail out if someone passes it?
> 
> reiserfs, jbd and NTFS are all using GFP_NOFAIL ...
> 
> i dont think this is a huge issue that should block merging.

oh, and XFS has this little gem in its journalling code:

 fs/xfs/xfs_log.c:

 STATIC void
 xlog_state_ticket_alloc(xlog_t *log)
 {
 [...]
         /*
          * The kmem_zalloc may sleep, so we shouldn't be holding the
          * global lock.  XXXmiken: may want to use zone allocator.
          */
         buf = (xfs_caddr_t) kmem_zalloc(NBPP, KM_SLEEP);

         s = LOG_LOCK(log);

         /* Attach 1st ticket to Q, so we can keep track of allocated memory */
         t_list = (xlog_ticket_t *)buf;
         t_list->t_next = log->l_unmount_free;
 [...]

where kmem_zalloc() may fail!!!

So XFS is apprarently hiding the "journalling allocations must not fail" 
problem by ... crashing? Wow! Most of the other journalling filesystems 
loop on the allocator: the honest ones do it via GFP_NOFAIL, others via 
open-coded infinite retry loops.

Just in case anyone says 'preallocate': that's _hard_ to do in a 
sophisticated filesystem, which has many dynamic (and delayed) decisions 
that make the prediction of resource overhead difficult. That's the 
fundamental reason why basically all journalling filesystems either loop 
(or the really enterprise quality ones: crash ;) on allocation failure.

Btw., i have just taken a 5 minute tour into XFS, and i found at least 5 
other problems with the XFS code that are similar in nature to the ones 
you pointed out. (mostly useless wrappers around Linux functionality) 
Isnt this whole episode highly hypocritic to begin with?

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  6:33     ` Ingo Molnar
@ 2006-06-27  6:43       ` Arjan van de Ven
  2006-06-27  7:07         ` Ingo Molnar
  2006-06-27  7:06       ` Andrew Morton
  2006-06-27  8:16       ` Nathan Scott
  2 siblings, 1 reply; 46+ messages in thread
From: Arjan van de Ven @ 2006-06-27  6:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


> Btw., i have just taken a 5 minute tour into XFS, and i found at least 5 
> other problems with the XFS code that are similar in nature to the ones 
> you pointed out. 

that's no reason to merge even more junk

> (mostly useless wrappers around Linux functionality) 
> Isnt this whole episode highly hypocritic to begin with?

lets not start calling names please...



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  6:33     ` Ingo Molnar
  2006-06-27  6:43       ` Arjan van de Ven
@ 2006-06-27  7:06       ` Andrew Morton
  2006-06-27  8:35         ` Ingo Molnar
  2006-06-27  8:16       ` Nathan Scott
  2 siblings, 1 reply; 46+ messages in thread
From: Andrew Morton @ 2006-06-27  7:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: hch, swhiteho, torvalds, teigland, pcaulfie, kanderso, linux-kernel

On Tue, 27 Jun 2006 08:33:39 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> Isnt this whole episode highly hypocritic to begin with?

Might be, but that's not relevant to GFS2's suitability.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  6:43       ` Arjan van de Ven
@ 2006-06-27  7:07         ` Ingo Molnar
  0 siblings, 0 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  7:07 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Arjan van de Ven <arjan@infradead.org> wrote:

> > Btw., i have just taken a 5 minute tour into XFS, and i found at 
> > least 5 other problems with the XFS code that are similar in nature 
> > to the ones you pointed out.
> 
> that's no reason to merge even more junk

the GFS2 ones were immediately addressed, 28 minutes and 27 seconds 
after they were pointed out:

  Date: Fri, 23 Jun 2006 15:57:56 +0100
  From: Christoph Hellwig <hch@infradead.org>
  Subject: Re: GFS2 and DLM

  Date: Fri, 23 Jun 2006 16:26:23 +0100
  From: Steven Whitehouse <swhiteho@redhat.com>
  Subject: Re: GFS2 and DLM

> > (mostly useless wrappers around Linux functionality) Isnt this whole 
> > episode highly hypocritic to begin with?
> 
> lets not start calling names please...

well, i guess it's way too late for that:

> > Christoph Hellwig wrote:
> > > Did anyone actually bother to review it?  It's huge and was in 
> > > pretty bad shapre when I looked last time.  Also in the -mm merge 
> > > writeup you guys said it's only scheduled for 2.6.19 so I didn't 
> > > even bother looking at the huge mess.

:-)

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-26 20:12     ` Arjan van de Ven
@ 2006-06-27  7:08       ` Ingo Molnar
  0 siblings, 0 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  7:08 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Arjan van de Ven <arjan@infradead.org> wrote:

> On Mon, 2006-06-26 at 22:03 +0200, Ingo Molnar wrote:
> > * Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > The code uses GFP_NOFAIL for slab allocator calls.  It's been pointed 
> > > out here numerous times that this can't work.  Andrew, what about 
> > > adding a check to slab.c to bail out if someone passes it?
> > 
> > reiserfs, jbd and NTFS are all using GFP_NOFAIL ...
> > 
> 
> they use it for slab or for get_free_pages() ?

for jbd and NTFS it's SLAB.

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-26 20:58       ` Ingo Molnar
@ 2006-06-27  7:50         ` Christoph Hellwig
  2006-06-27  8:31           ` Ingo Molnar
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-27  7:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel

On Mon, Jun 26, 2006 at 10:58:24PM +0200, Ingo Molnar wrote:
> i think you might be missing that GFS does cross-node locking in 
> readlink too. (OCFS2 does not do it because it apparently does not care 
> about cross-node atime correctness here it seems.) So GFS simply cannot 
> use generic_readlink()!

Please read the code before giving such useless comments.  ->follow_link
needs exactly the same locking as ->readlink.  The whole point of
using generic_readlink is to avoid having the filesystem reimplement
almost the same code twice, once copying to a kernel buffer and once to
a user buffer.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-26 21:03         ` Ingo Molnar
@ 2006-06-27  7:52           ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2006-06-27  7:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Nick Piggin, Steven Whitehouse,
	Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, linux-kernel

On Mon, Jun 26, 2006 at 11:03:55PM +0200, Ingo Molnar wrote:
> > the generic file read code so GFS should do so aswell.  And there's a 
> > technical reason for not exporting aswell as the generic file read 
> > interface is far too complicated already.
> 
> GFS is different here mostly due to locking,

If it requires locking there it's sendifle and splice_read implementation
are wrong.

<marketing crap removed>

ocfs2 seems to do pretty well using the generic code here, so there
definitly is a explanation required why gfs can't do the same.

> so i'd reformulate your request as a request to extend the VFS to unify 
> clustering filesystems - which is a nice cleanup goal but not a merge 
> showstopper to me.

duplicating half the generic read code is definitily not okay.  I'm
happy to have a discussion about more or less hacly ways to share the
code mid-term.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  6:33     ` Ingo Molnar
  2006-06-27  6:43       ` Arjan van de Ven
  2006-06-27  7:06       ` Andrew Morton
@ 2006-06-27  8:16       ` Nathan Scott
  2006-06-27  8:22         ` Ingo Molnar
  2 siblings, 1 reply; 46+ messages in thread
From: Nathan Scott @ 2006-06-27  8:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel

On Tue, Jun 27, 2006 at 08:33:39AM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > * Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > The code uses GFP_NOFAIL for slab allocator calls.  It's been 
> > > pointed out here numerous times that this can't work.  Andrew, what 
> > > about adding a check to slab.c to bail out if someone passes it?
> > 
> > reiserfs, jbd and NTFS are all using GFP_NOFAIL ...
> > 
> > i dont think this is a huge issue that should block merging.
> 
> oh, and XFS has this little gem in its journalling code:
> 

[snip misinterpreted code/flag]

>           */
>          buf = (xfs_caddr_t) kmem_zalloc(NBPP, KM_SLEEP);
>  [...]
> 
> where kmem_zalloc() may fail!!!

Not with the flags it was given.

> So XFS is apprarently hiding the "journalling allocations must not fail" 
> problem by ... crashing? Wow! Most of the other journalling filesystems 
> loop on the allocator: the honest ones do it via GFP_NOFAIL, others via 
> open-coded infinite retry loops.

No, please look more closely before making such claims.

> sophisticated filesystem, which has many dynamic (and delayed) decisions 
> that make the prediction of resource overhead difficult. That's the 
> fundamental reason why basically all journalling filesystems either loop 
> (or the really enterprise quality ones: crash ;) on allocation failure.

> other problems with the XFS code that are similar in nature to the ones 
> you pointed out. (mostly useless wrappers around Linux functionality) 

You've missed subtleties in those "useless wrappers", which are
preventing the problem you claim is there.

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:16       ` Nathan Scott
@ 2006-06-27  8:22         ` Ingo Molnar
  2006-06-27  8:41           ` Pekka Enberg
  2006-06-27  8:42           ` Nathan Scott
  0 siblings, 2 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  8:22 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Nathan Scott <nathans@sgi.com> wrote:

> >           */
> >          buf = (xfs_caddr_t) kmem_zalloc(NBPP, KM_SLEEP);
> >  [...]
> > 
> > where kmem_zalloc() may fail!!!
> 
> Not with the flags it was given.

yeah, you are right - sorry about that.

XFS instead loops infinitely:

 void *
 kmem_alloc(size_t size, unsigned int __nocast flags)
 {
         int     retries = 0;
         gfp_t   lflags = kmem_flags_convert(flags);
         void    *ptr;

         do {
                 if (size < MAX_SLAB_SIZE || retries > MAX_VMALLOCS)
                         ptr = kmalloc(size, lflags);
                 else
                         ptr = __vmalloc(size, lflags, PAGE_KERNEL);
                 if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
                         return ptr;
                 if (!(++retries % 100))
                         printk(KERN_ERR "XFS: possible memory allocation "
                                         "deadlock in %s (mode:0x%x)\n",
                                         __FUNCTION__, lflags);
                 blk_congestion_wait(WRITE, HZ/50);
         } while (1);
 }

which is in essence an open-coded GFP_NOFAIL implementation. Here's what 
__GFP_NOFAIL does:

                        if (gfp_mask & __GFP_NOFAIL) {
                                blk_congestion_wait(WRITE, HZ/50);
                                goto nofail_alloc;
                        }

and since XFS makes use of KM_SLEEP in 130+ callsites, that means it is 
in essence using GFP_NOFAIL massively!

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  7:50         ` Christoph Hellwig
@ 2006-06-27  8:31           ` Ingo Molnar
  0 siblings, 0 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  8:31 UTC (permalink / raw)
  To: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Christoph Hellwig <hch@infradead.org> wrote:

> ->follow_link needs exactly the same locking as ->readlink.  The whole 
> point of using generic_readlink is to avoid having the filesystem 
> reimplement almost the same code twice, once copying to a kernel 
> buffer and once to a user buffer.

yeah, you are right, i confused it with ->follow_link() and was wrong 
about the locking: generic_readlink() is just a wrapper around 
->follow_link() and vfs_readlink().

Still, as far as i can see the gfs2 implementation of readlink is faster 
(and hence a valid solution), because it knows the length of the symlink 
buffer and hence can avoid the strlen() call in vfs_readlink():

 int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
 {
         int len;

         len = PTR_ERR(link);
         if (IS_ERR(link))
                 goto out;

         len = strlen(link); <============= [this one]

while gfs2 can do a straight copy to userspace:

        error = gfs2_readlinki(ip, &buf, &len);
        if (error)
                return error;

        if (user_size > len - 1)
                user_size = len - 1;

        if (copy_to_user(user_buf, buf, user_size))
                error = -EFAULT;
        else
                error = user_size;

btw., ocfs2 does not use generic_readlink() either.

> Please read the code before giving such useless comments.  

thank you for the encouragement to participate in VFS review activities, 
it's really appreciated! It's always a joy taking part in lkml 
discussions.

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  7:06       ` Andrew Morton
@ 2006-06-27  8:35         ` Ingo Molnar
  2006-06-27  8:50           ` Andrew Morton
  2006-07-03 13:40           ` Steven Whitehouse
  0 siblings, 2 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  8:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hch, swhiteho, torvalds, teigland, pcaulfie, kanderso, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> On Tue, 27 Jun 2006 08:33:39 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > Isnt this whole episode highly hypocritic to begin with?
> 
> Might be, but that's not relevant to GFS2's suitability.

it is relevant to a certain degree, because it creates a (IMO) false 
impression of merging showstoppers. After months of being in -mm, and 
after addressing all issues that were raised (and there was a fair 
amount of review activity December last year iirc), one week prior the 
close of the merge window a 'huge' list of issues are raised. (after 
belovingly calling the GFS2 code a "huge mess", to create a positive and 
productive tone for the review discussion i guess.)

So far in my reading there are only 2 serious ones in that list:

 - tty_* use in cluster-aware quota.c. Firstly, ocfs2 doesnt do quota -
   which is fair enough, but this also means that there was no in-tree 
   filesystem to base stuff off. Secondly, the tty_* use was inherited 
   from fs/quota.c - hardly something i'd consider a fatal sin. Anyway, 
   despite the mitigating factors it is an arguably lame thing and 
   it should be (and will be) fixed.

 - GFP_NOFAIL: most other journalling filesystems seem to be doing this
   or worse. Fixing it is _hard_. Suddenly this becomes a showstopper? 
   Huh?

(the "use the generic facilities" arguments are only valid if the 
generic facilities can be used as-is, and if they are just optimal as 
the one implemented by the filesystem.)

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:22         ` Ingo Molnar
@ 2006-06-27  8:41           ` Pekka Enberg
  2006-06-27 10:33             ` Ingo Molnar
  2006-06-27  8:42           ` Nathan Scott
  1 sibling, 1 reply; 46+ messages in thread
From: Pekka Enberg @ 2006-06-27  8:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nathan Scott, Christoph Hellwig, Steven Whitehouse,
	Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, linux-kernel

On 6/27/06, Ingo Molnar <mingo@elte.hu> wrote:
> and since XFS makes use of KM_SLEEP in 130+ callsites, that means it is
> in essence using GFP_NOFAIL massively!

Yup, it's not just a wrapper, XFS really has its own allocator and the
PF_FSTRANS magic makes it hard to convert to slab proper.

                                             Pekka

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:22         ` Ingo Molnar
  2006-06-27  8:41           ` Pekka Enberg
@ 2006-06-27  8:42           ` Nathan Scott
  2006-06-27  8:51             ` Ingo Molnar
  1 sibling, 1 reply; 46+ messages in thread
From: Nathan Scott @ 2006-06-27  8:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel

On Tue, Jun 27, 2006 at 10:22:40AM +0200, Ingo Molnar wrote:
> yeah, you are right - sorry about that.

No worries.

> which is in essence an open-coded GFP_NOFAIL implementation. Here's what 
> __GFP_NOFAIL does:

Heh, trust me, I know very well what the code does here.

> and since XFS makes use of KM_SLEEP in 130+ callsites, that means it is 
> in essence using GFP_NOFAIL massively!

Their locations have been carefully audited and understood.  The original
issue here was IRIX being able to do a very good of preventing kernel
memory allocation failures, which I suspect caused the original XFS guys
to be fairly relaxed in their handling of memory allocation failures.
Its caused us no end of pain with the Linux port, I assure you.

But, I digress.  If Christoph says you have a problem, you probably do;
he really is not biased XFS's way - he's driven us to make many changes
leading up to our major merge points too, and he will happily point out
crappy code in XFS as well. ;-)

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:35         ` Ingo Molnar
@ 2006-06-27  8:50           ` Andrew Morton
  2006-06-27  9:04             ` Ingo Molnar
  2006-07-03 13:40           ` Steven Whitehouse
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Morton @ 2006-06-27  8:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: hch, swhiteho, torvalds, teigland, pcaulfie, kanderso, linux-kernel

On Tue, 27 Jun 2006 10:35:44 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > On Tue, 27 Jun 2006 08:33:39 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > Isnt this whole episode highly hypocritic to begin with?
> > 
> > Might be, but that's not relevant to GFS2's suitability.
> 
> it is relevant to a certain degree, because it creates a (IMO) false 
> impression of merging showstoppers. After months of being in -mm, and 
> after addressing all issues that were raised (and there was a fair 
> amount of review activity December last year iirc), one week prior the 
> close of the merge window a 'huge' list of issues are raised. (after 
> belovingly calling the GFS2 code a "huge mess", to create a positive and 
> productive tone for the review discussion i guess.)

It's a general problem - our reviewing resources do not have the capacity
to cover our coding resources.  This is especially the case on filesystems.
 We'd have merged (a very different) reiser4 a year ago if things were
in balance.

(and our code-breaking resources appear to exceed our code-fixing resources
too, but that's another topic).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:42           ` Nathan Scott
@ 2006-06-27  8:51             ` Ingo Molnar
  0 siblings, 0 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  8:51 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Steven Whitehouse, Linus Torvalds,
	David Teigland, Patrick Caulfield, Kevin Anderson, Andrew Morton,
	linux-kernel


* Nathan Scott <nathans@sgi.com> wrote:

> > and since XFS makes use of KM_SLEEP in 130+ callsites, that means it 
> > is in essence using GFP_NOFAIL massively!
> 
> Their locations have been carefully audited and understood.  The 
> original issue here was IRIX being able to do a very good of 
> preventing kernel memory allocation failures, which I suspect caused 
> the original XFS guys to be fairly relaxed in their handling of memory 
> allocation failures. Its caused us no end of pain with the Linux port, 
> I assure you.

i know it's a hard problem, and i'm not suggesting at all that this is 
easy to fix. Nevertheless there are 130 allocation callsites in XFS that 
do implicit GFS_NOFAIL in essence, and 7 callsites in GFS2 that mention 
__GFS_NOFAIL explicitly. Ext3 does __GFP_NOFAIL in its journalling code 
too. Reiser too.

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:50           ` Andrew Morton
@ 2006-06-27  9:04             ` Ingo Molnar
  2006-06-27  9:23               ` Andrew Morton
  0 siblings, 1 reply; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27  9:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hch, swhiteho, torvalds, teigland, pcaulfie, kanderso, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> > it is relevant to a certain degree, because it creates a (IMO) false 
> > impression of merging showstoppers. After months of being in -mm, 
> > and after addressing all issues that were raised (and there was a 
> > fair amount of review activity December last year iirc), one week 
> > prior the close of the merge window a 'huge' list of issues are 
> > raised. (after belovingly calling the GFS2 code a "huge mess", to 
> > create a positive and productive tone for the review discussion i 
> > guess.)
> 
> It's a general problem - our reviewing resources do not have the 
> capacity to cover our coding resources.  This is especially the case 
> on filesystems.  We'd have merged (a very different) reiser4 a year 
> ago if things were in balance.

and just this very minute what gets merged upstream? A chunk of OCFS2 
code that has this comment in it:

 * NOTE: the allocation error cases here are scary
 * we really cannot afford to fail an alloc in recovery
 * do we spin?  returning an error only delays the problem really

plus this code:

                        /* sleep for a bit in hopes that we can avoid
                         * another ENOMEM */
                        msleep(100);
                        goto retry;

and this:

                        /* TODO Look into replacing msleep with cond_resched() */
                        msleep(100);
                        goto retry;

and this:

                /* yield a bit to allow any final network messages
                 * to get handled on remaining nodes */
                msleep(100);

and this:

                if (status < 0) {
                        mlog(ML_ERROR, "%s: failed to alloc recovery area, "
                             "retrying\n", dlm->name);
                        msleep(1000);
                }

and this:

                                } else {
                                        /* -ENOMEM on the other node */
                                        mlog(0, "%s: node %u returned "
                                             "%d during recovery, retrying "
                                             "after a short wait\n",
                                             dlm->name, ndata->node_num,
                                             status);
                                        msleep(100);
                                }

and that's just from a 60 seconds scan.

and we are not merging GFS2 that does an honest __GFP_NOFAIL for a hard 
to solve problem? (Btw., __GFP_NOFAIL is actually more robust due to the 
congestion sleep it does, it is more reviewable and more fixable thing 
than an open-coded msleep() or cond_resched().)

"Hypocrisy", "double standard", "pot calling the kettle black" is pretty 
much the nicest words that come to mind :-(

[and again, i'm not blaming XFS or OCFS2 here.]

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  9:04             ` Ingo Molnar
@ 2006-06-27  9:23               ` Andrew Morton
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Morton @ 2006-06-27  9:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: hch, swhiteho, torvalds, teigland, pcaulfie, kanderso, linux-kernel

On Tue, 27 Jun 2006 11:04:08 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> > It's a general problem - our reviewing resources do not have the 
> > capacity to cover our coding resources.  This is especially the case 
> > on filesystems.  We'd have merged (a very different) reiser4 a year 
> > ago if things were in balance.
> 
> and just this very minute what gets merged upstream? A chunk of OCFS2 

oh yes, running a git tree is a wonderful way to avoid code review.

We ought to require that people get each diff onto a useful mailing list
for review before sending them upstream.  And not in batches of 100 five
minutes before sending the pull request.

oh, and we ought to require that people send patches upstream more than
fifteen minutes after they wrote them.

<starts to calm down>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:41           ` Pekka Enberg
@ 2006-06-27 10:33             ` Ingo Molnar
  0 siblings, 0 replies; 46+ messages in thread
From: Ingo Molnar @ 2006-06-27 10:33 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nathan Scott, Christoph Hellwig, Steven Whitehouse,
	Linus Torvalds, David Teigland, Patrick Caulfield,
	Kevin Anderson, Andrew Morton, linux-kernel


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On 6/27/06, Ingo Molnar <mingo@elte.hu> wrote:
> >and since XFS makes use of KM_SLEEP in 130+ callsites, that means it is
> >in essence using GFP_NOFAIL massively!
> 
> Yup, it's not just a wrapper, XFS really has its own allocator and the 
> PF_FSTRANS magic makes it hard to convert to slab proper.

AFAICS PF_FSTRANS is mostly just avoidance of having to pass the 
__GFP_FS flag around:

                if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
                        lflags &= ~__GFP_FS;

Btw., this XFS way of handling PF_FSTRANS in a nested and letting it 
translate into the clearing of __GFP_FS might just be the proper way of 
doing it in other FS and IO code too?

Conceptually, it is really the property of the general task context that 
it's "inside a filesystem transaction", not a property of the immediate 
allocation callsite. I.e. we should transform all uses of GFP_NOFS into 
PF_FSTRANS save-set/restore calls.

I'd not be surprised if that also fixed a couple of bugs: it's much more 
robust and more maintainable to identify the codepaths that are a 
filesystem operation than to identify all allocations (explicit or 
implicit) in a codepath that is known to be a filesystem operation 
(especially in a constantly evolving kernel).

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GFS2 and DLM
  2006-06-27  8:35         ` Ingo Molnar
  2006-06-27  8:50           ` Andrew Morton
@ 2006-07-03 13:40           ` Steven Whitehouse
  1 sibling, 0 replies; 46+ messages in thread
From: Steven Whitehouse @ 2006-07-03 13:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, hch, torvalds, teigland, pcaulfie, kanderso, linux-kernel

Hi,

On Tue, 2006-06-27 at 10:35 +0200, Ingo Molnar wrote:
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > On Tue, 27 Jun 2006 08:33:39 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > Isnt this whole episode highly hypocritic to begin with?
> > 
> > Might be, but that's not relevant to GFS2's suitability.
> 
> it is relevant to a certain degree, because it creates a (IMO) false 
> impression of merging showstoppers. After months of being in -mm, and 
> after addressing all issues that were raised (and there was a fair 
> amount of review activity December last year iirc), one week prior the 
> close of the merge window a 'huge' list of issues are raised. (after 
> belovingly calling the GFS2 code a "huge mess", to create a positive and 
> productive tone for the review discussion i guess.)
> 
> So far in my reading there are only 2 serious ones in that list:
> 
>  - tty_* use in cluster-aware quota.c. Firstly, ocfs2 doesnt do quota -
>    which is fair enough, but this also means that there was no in-tree 
>    filesystem to base stuff off. Secondly, the tty_* use was inherited 
>    from fs/quota.c - hardly something i'd consider a fatal sin. Anyway, 
>    despite the mitigating factors it is an arguably lame thing and 
>    it should be (and will be) fixed.
> 
This one is now fixed in the git tree:
    git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git
Also fixed/changed in this tree is:

 o Mark GFS2 file_operations const
 o Mark GFS2 address_space_operations const
 o Update gfs2_statfs() API in line with change in Linus' tree
 o Update gfs2_get_sb() API in line with change in Linus' tree (thanks
to Andrew Morton for this one)
 o Updated to latest Linus tree

So I think that brings the code back uptodate after my holiday last
week.

>  - GFP_NOFAIL: most other journalling filesystems seem to be doing this
>    or worse. Fixing it is _hard_. Suddenly this becomes a showstopper? 
>    Huh?
> 
> (the "use the generic facilities" arguments are only valid if the 
> generic facilities can be used as-is, and if they are just optimal as 
> the one implemented by the filesystem.)
> 
> 	Ingo
I believe that this point is now resolved (so far as GFS2 going upstream
is concerned) from the discussions that have gone on in my absence. Its
obviously still an ongoing concern for a number of filesystems, not just
GFS2 and I'll be working on ways to reduce the current "nofail"
allocation calls in the future,

Steve.



^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2006-07-03 13:28 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-20 12:17 GFS2 and DLM Steven Whitehouse
2006-06-20 12:32 ` Nick Piggin
2006-06-20 12:47   ` Steven Whitehouse
2006-06-20 14:04     ` Nick Piggin
2006-06-20 15:40       ` Steven Whitehouse
2006-06-20 19:55         ` Andrew Morton
2006-06-21 11:20       ` Steven Whitehouse
2006-06-23 14:45       ` Christoph Hellwig
2006-06-26 21:03         ` Ingo Molnar
2006-06-27  7:52           ` Christoph Hellwig
2006-06-20 12:33 ` Christoph Hellwig
2006-06-20 12:55   ` Steven Whitehouse
2006-06-20 15:55   ` Ingo Molnar
2006-06-23 14:49 ` Christoph Hellwig
2006-06-23 20:46   ` Andrew Morton
2006-06-26 20:03   ` Ingo Molnar
2006-06-26 20:12     ` Arjan van de Ven
2006-06-27  7:08       ` Ingo Molnar
2006-06-27  6:33     ` Ingo Molnar
2006-06-27  6:43       ` Arjan van de Ven
2006-06-27  7:07         ` Ingo Molnar
2006-06-27  7:06       ` Andrew Morton
2006-06-27  8:35         ` Ingo Molnar
2006-06-27  8:50           ` Andrew Morton
2006-06-27  9:04             ` Ingo Molnar
2006-06-27  9:23               ` Andrew Morton
2006-07-03 13:40           ` Steven Whitehouse
2006-06-27  8:16       ` Nathan Scott
2006-06-27  8:22         ` Ingo Molnar
2006-06-27  8:41           ` Pekka Enberg
2006-06-27 10:33             ` Ingo Molnar
2006-06-27  8:42           ` Nathan Scott
2006-06-27  8:51             ` Ingo Molnar
2006-06-23 14:54 ` Christoph Hellwig
2006-06-23 15:54   ` Steven Whitehouse
2006-06-23 15:54     ` Christoph Hellwig
2006-06-23 16:09       ` Steven Whitehouse
2006-06-23 14:55 ` Christoph Hellwig
2006-06-23 14:57 ` Christoph Hellwig
2006-06-23 15:26   ` Steven Whitehouse
2006-06-23 15:00 ` Christoph Hellwig
2006-06-23 16:29   ` Steven Whitehouse
2006-06-23 16:48     ` Christoph Hellwig
2006-06-26 20:58       ` Ingo Molnar
2006-06-27  7:50         ` Christoph Hellwig
2006-06-27  8:31           ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).