All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Becker <Joel.Becker@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Louis Rilling <louis.rilling@kerlabs.com>,
	linux-kernel@vger.kernel.org, cluster-devel@redhat.com,
	swhiteho <swhiteho@redhat.com>
Subject: Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()
Date: Tue, 27 Jan 2009 19:41:14 -0800	[thread overview]
Message-ID: <20090128034114.GC7244@mail.oracle.com> (raw)
In-Reply-To: <1232973009.4863.76.camel@laptop>

On Mon, Jan 26, 2009 at 01:30:09PM +0100, Peter Zijlstra wrote:
> I don't think I was suggesting that. All you need is to serialize any
> mkdir/creat against the rmdir of the youngest non-default group, and you
> can do that by holding su_mutex.
> 
> In rmdir, you already own all the i_mutex instances you need to uncouple
> the whole tree, all you need to do is validate that its indeed empty --
> you don't need i_mutex's for that, because you're holding su_mutex, and
> any concurrent mkdir/creat will be blocking on that.
> 
> If you find it empty, just mark everybody DEAD, drop su_mutex and
> decouple. All concurrent mkdir/creat thingies that were blocking will
> now bail because their parent is found DEAD.

	Right, that's what I was talking about when I said:

> > 	Now look in detach_groups().  We drop the groups children before
> > marking them DEAD.  Louis' plan, I think, is to perhaps mark a group
> > DEAD, disconnect it from the vfs, and then operate on its children.  In
> > this fashion, perhaps we can unlock the trailing lock like a normal VFS
> > operation.
> > 	This will require some serious auditing, however, because now
> > vfs functions can get into the vfs objects behind us.  And more vfs
> > changes affect us.  Whereas the current locking relies on the vfs's
> > parent->child lock ordering only, something that isn't likely to be
> > changed.
 
	That is, the vfs has already walked past this directory,
dropped i_mutex, and is in a child default group holding its i_mutex.
It wants to mkdir(2) down there.  You're saying that, if mkdir(2) holds
su_mutex higher up, it can check S_DEAD and compare with us, and that's
exactly the scheme I mentioned in the first of the quoted paragraphs
(Louis proposed it a while back).  Thus, though we don't hold i_mutex on that
child, it will eventually either a) have gotten su_mutex first, and will
cause rmdir to ENOTEMPTY or b) have gotten su_mutex second, will see
S_DEAD, and return -ENOENT.  As far as preventing mkdir(2), I don't see
why it wouldn't work.
	The issue is in that second quoted paragraph.  We know that the
VFS can look up our children if we're not holding i_mutex.  In fact,
cached_lookup() can find them without i_mutex.  Now, we know that
mkdir(2) and rmdir(2) will block at su_mutex.  But what about all other
file operations, both on the child directories *and* the attribute
files?  For attribute files, we prevent access at creation time with a
flag.  We can trust the flag because we hold i_mutex.  This might hold
anyway because we're holding that toplevel i_mutex.  At teardown time,
though, we know they can't be found because of i_mutex.  Now we don't
have that protection for processes that are farther down the tree.
	But the bigger issue is just the plain regular operations on our
directories.  An example is ->readdir().  We currently lock out
->readdir() during rmdir(2) with our holding of i_mutex.  However, if we
are not holding i_mutex, vfs_readdir() can call into our ->readdir()
right as we're tearing things down.  We may not have gotten to S_DEAD
yet in the rmdir(2) path, and the two will race.  We can't take su_mutex
in ->readdir(), because that sort of solution effectively says "we have
to take su_mutex for all operations", and we end up serializing all
operations on a subsustem.
	Now, I can think of a way to make ->readdir() safe without
su_mutex.  But what other operation is next?  How do I know I have them
all?  How do I notice when someone adds a new operation or code path to
the generic code that I have to protect against?  With i_mutex, I *know*
that everyone has agreed that is the gatekeeper.  Without it...
	*That's* the big worry.  That's what I'm worried about being
sure of.  I'd love to hear a solution that we know will work, or at
least move towards one.

Joel
	
PS:  And we haven't even talked about configfs_depend_item() yet.

-- 

Life's Little Instruction Book #252

	"Take good care of those you love."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

WARNING: multiple messages have this Message-ID (diff)
From: Joel Becker <Joel.Becker@oracle.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()
Date: Tue, 27 Jan 2009 19:41:14 -0800	[thread overview]
Message-ID: <20090128034114.GC7244@mail.oracle.com> (raw)
In-Reply-To: <1232973009.4863.76.camel@laptop>

On Mon, Jan 26, 2009 at 01:30:09PM +0100, Peter Zijlstra wrote:
> I don't think I was suggesting that. All you need is to serialize any
> mkdir/creat against the rmdir of the youngest non-default group, and you
> can do that by holding su_mutex.
> 
> In rmdir, you already own all the i_mutex instances you need to uncouple
> the whole tree, all you need to do is validate that its indeed empty --
> you don't need i_mutex's for that, because you're holding su_mutex, and
> any concurrent mkdir/creat will be blocking on that.
> 
> If you find it empty, just mark everybody DEAD, drop su_mutex and
> decouple. All concurrent mkdir/creat thingies that were blocking will
> now bail because their parent is found DEAD.

	Right, that's what I was talking about when I said:

> > 	Now look in detach_groups().  We drop the groups children before
> > marking them DEAD.  Louis' plan, I think, is to perhaps mark a group
> > DEAD, disconnect it from the vfs, and then operate on its children.  In
> > this fashion, perhaps we can unlock the trailing lock like a normal VFS
> > operation.
> > 	This will require some serious auditing, however, because now
> > vfs functions can get into the vfs objects behind us.  And more vfs
> > changes affect us.  Whereas the current locking relies on the vfs's
> > parent->child lock ordering only, something that isn't likely to be
> > changed.
 
	That is, the vfs has already walked past this directory,
dropped i_mutex, and is in a child default group holding its i_mutex.
It wants to mkdir(2) down there.  You're saying that, if mkdir(2) holds
su_mutex higher up, it can check S_DEAD and compare with us, and that's
exactly the scheme I mentioned in the first of the quoted paragraphs
(Louis proposed it a while back).  Thus, though we don't hold i_mutex on that
child, it will eventually either a) have gotten su_mutex first, and will
cause rmdir to ENOTEMPTY or b) have gotten su_mutex second, will see
S_DEAD, and return -ENOENT.  As far as preventing mkdir(2), I don't see
why it wouldn't work.
	The issue is in that second quoted paragraph.  We know that the
VFS can look up our children if we're not holding i_mutex.  In fact,
cached_lookup() can find them without i_mutex.  Now, we know that
mkdir(2) and rmdir(2) will block at su_mutex.  But what about all other
file operations, both on the child directories *and* the attribute
files?  For attribute files, we prevent access at creation time with a
flag.  We can trust the flag because we hold i_mutex.  This might hold
anyway because we're holding that toplevel i_mutex.  At teardown time,
though, we know they can't be found because of i_mutex.  Now we don't
have that protection for processes that are farther down the tree.
	But the bigger issue is just the plain regular operations on our
directories.  An example is ->readdir().  We currently lock out
->readdir() during rmdir(2) with our holding of i_mutex.  However, if we
are not holding i_mutex, vfs_readdir() can call into our ->readdir()
right as we're tearing things down.  We may not have gotten to S_DEAD
yet in the rmdir(2) path, and the two will race.  We can't take su_mutex
in ->readdir(), because that sort of solution effectively says "we have
to take su_mutex for all operations", and we end up serializing all
operations on a subsustem.
	Now, I can think of a way to make ->readdir() safe without
su_mutex.  But what other operation is next?  How do I know I have them
all?  How do I notice when someone adds a new operation or code path to
the generic code that I have to protect against?  With i_mutex, I *know*
that everyone has agreed that is the gatekeeper.  Without it...
	*That's* the big worry.  That's what I'm worried about being
sure of.  I'd love to hear a solution that we know will work, or at
least move towards one.

Joel
	
PS:  And we haven't even talked about configfs_depend_item() yet.

-- 

Life's Little Instruction Book #252

	"Take good care of those you love."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127



  parent reply	other threads:[~2009-01-28  3:42 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-11 14:20 configfs, dlm_controld & lockdep Steven Whitehouse
2008-12-11 14:20 ` [Cluster-devel] " Steven Whitehouse
2008-12-11 14:44 ` Louis Rilling
2008-12-11 17:34   ` Joel Becker
2008-12-11 17:34     ` [Cluster-devel] " Joel Becker
2008-12-12 10:06     ` Louis Rilling
2008-12-12 15:29       ` [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Louis Rilling
2008-12-17 21:40         ` Andrew Morton
2008-12-17 21:40           ` [Cluster-devel] " Andrew Morton
2008-12-17 22:03           ` Joel Becker
2008-12-17 22:03             ` [Cluster-devel] " Joel Becker
2008-12-17 22:09             ` Andrew Morton
2008-12-17 22:09               ` [Cluster-devel] " Andrew Morton
2008-12-18  7:26           ` Peter Zijlstra
2008-12-18  9:27             ` Joel Becker
2008-12-18  9:27               ` [Cluster-devel] " Joel Becker
2008-12-18 11:15               ` Louis Rilling
2008-12-18 18:00                 ` Make lockdep happy with configfs Louis Rilling
2009-01-26 11:51                   ` Louis Rilling
2009-01-28  3:44                     ` Joel Becker
2009-01-28  3:44                       ` [Cluster-devel] " Joel Becker
2008-12-18 18:00                 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling
2009-01-28  3:55                   ` Joel Becker
2009-01-28  3:55                     ` [Cluster-devel] " Joel Becker
2009-01-28 10:38                     ` Louis Rilling
2008-12-18 18:00                 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
2009-01-28  4:13                   ` Joel Becker
2009-01-28  4:13                     ` [Cluster-devel] " Joel Becker
2009-01-28 10:32                     ` Louis Rilling
2008-12-18 11:26               ` [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Steven Whitehouse
2008-12-18 11:26                 ` [Cluster-devel] " Steven Whitehouse
2008-12-18 11:48                 ` Louis Rilling
2008-12-18 11:56               ` Peter Zijlstra
2008-12-18 12:28                 ` Peter Zijlstra
2008-12-18 22:58                   ` Joel Becker
2008-12-18 22:58                     ` [Cluster-devel] " Joel Becker
2008-12-19 10:29                     ` Louis Rilling
2009-01-26 12:30                     ` Peter Zijlstra
2009-01-26 13:24                       ` Louis Rilling
2009-01-26 13:41                         ` Peter Zijlstra
2009-01-26 14:00                           ` Louis Rilling
2009-01-26 14:19                             ` Peter Zijlstra
2009-01-26 14:55                               ` Louis Rilling
2009-01-28  3:05                                 ` Joel Becker
2009-01-28  3:05                                   ` [Cluster-devel] " Joel Becker
2009-01-28  3:41                       ` Joel Becker [this message]
2009-01-28  3:41                         ` Joel Becker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090128034114.GC7244@mail.oracle.com \
    --to=joel.becker@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=cluster-devel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=louis.rilling@kerlabs.com \
    --cc=peterz@infradead.org \
    --cc=swhiteho@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.