linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
@ 2010-06-07 20:02 Eric Van Hensbergen
  2010-06-08  0:08 ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Van Hensbergen @ 2010-06-07 20:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: V9FS Developers, linux-kernel

Linus,

Please merge the following bug fixes for the 9P file system.

The following changes since commit 386f40c86d6c8d5b717ef20620af1a750d0dacb4:
  Linus Torvalds (1):
        Revert "tty: fix a little bug in scrup, vt.c"

are available in the git repository at:

  ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-linus

Fang Wenqi (1):
      virtio_9p.h needs <linux/types.h>

jvrao (2):
      Add a helper function to get fsgid for a file create.
      9p: Add a wstat after TCREATE to fix the gid.

 fs/9p/vfs_inode.c         |   34 ++++++++++++++++++++++++++++++++++
 include/linux/virtio_9p.h |    1 +
 2 files changed, 35 insertions(+), 0 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-07 20:02 [GIT PULL] 9p file system bug fixes for 2.6.35-rc2 Eric Van Hensbergen
@ 2010-06-08  0:08 ` Linus Torvalds
  2010-06-08  0:41   ` Al Viro
  2010-06-08 14:29   ` Venkateswararao Jujjuri (JV)
  0 siblings, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2010-06-08  0:08 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: V9FS Developers, linux-kernel



On Mon, 7 Jun 2010, Eric Van Hensbergen wrote:
> 
> jvrao (2):
>       Add a helper function to get fsgid for a file create.
>       9p: Add a wstat after TCREATE to fix the gid.

Quite frankly, this looks rather broken.

It uses "dentry->d_parent" without locking (it so happens to likely be ok, 
since we are in "create()" and thus should be holding the parent 
semaphore). On its own, that might be excusable (if people were even 
_aware_ of the this locking rule!), but it does so just to get the inode 
pointer to that parent.

And the only thing that makes it ok to access dentry->d_parent - the fact 
that we are in v9fs_create() - is also the thing that should have made 
people look at the arguments to the function and say "hmm".

We pass in the directory inode pointer as an argument to the create 
function! The code could have used that thing directly, instead of 
mucking around with dentry pointers that it had no business looking at.

I see why it seems to have happened: v9fs does the exact same thing for 
the pre-existing "v9fs_fid_lookup()". So there is history to this 
behavior.

Maybe people weren't aware of the fact that just dereferencing 
dentry->d_parent willy-nilly isn't actually allowed. That field changes. 
Sure, there are cases where it's ok, but this is a dangerous thing to do 
in general. 

In fact, the other thing that I find doing that whole "dentry->d_parent" 
thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
you'll notice how it walks up the d_parent chain, and at that point you do 
NOT own the directory i_mutex, so at that point d_parent really _can_ be 
changing wildly due to concurrent renames or whatever.

So 9pfs seems to have some preexisting bugs in this area. I'm not going to 
pull new bug-prone code. See the other discussions about being tight this 
release about really _only_ taking regressions after the merge window 
closed.

		Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-08  0:08 ` Linus Torvalds
@ 2010-06-08  0:41   ` Al Viro
  2010-06-08  0:48     ` Linus Torvalds
  2010-06-16 16:42     ` [V9fs-developer] " Aneesh Kumar K. V
  2010-06-08 14:29   ` Venkateswararao Jujjuri (JV)
  1 sibling, 2 replies; 16+ messages in thread
From: Al Viro @ 2010-06-08  0:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric Van Hensbergen, V9FS Developers, linux-kernel

On Mon, Jun 07, 2010 at 05:08:19PM -0700, Linus Torvalds wrote:

> In fact, the other thing that I find doing that whole "dentry->d_parent" 
> thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
> you'll notice how it walks up the d_parent chain, and at that point you do 
> NOT own the directory i_mutex, so at that point d_parent really _can_ be 
> changing wildly due to concurrent renames or whatever.

Eh...  It's bogus, all right, but i_mutex is not the correct solution.
You'd have to take it on a lot of inodes along the way to root *and*
you'd violate the ordering in process (ancestors first).

I'm not sure what's the right thing to do there, actually - s_vfs_rename_sem
also won't do, since it'll give you ordering problems of its own (it's
taken before i_mutex in VFS, so trying to take it under i_mutex would not
do).

The _really_ interesting question is how do servers deal with topology-changing
renames.  Note that the problem exists only with extended 9P - with the
original one all of that had been a non-issue, since it didn't allow
cross-directory renames at all and the tree topology remained stable all along.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-08  0:41   ` Al Viro
@ 2010-06-08  0:48     ` Linus Torvalds
  2010-06-16 16:42     ` [V9fs-developer] " Aneesh Kumar K. V
  1 sibling, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2010-06-08  0:48 UTC (permalink / raw)
  To: Al Viro; +Cc: Eric Van Hensbergen, V9FS Developers, linux-kernel



On Tue, 8 Jun 2010, Al Viro wrote:

> On Mon, Jun 07, 2010 at 05:08:19PM -0700, Linus Torvalds wrote:
> 
> > In fact, the other thing that I find doing that whole "dentry->d_parent" 
> > thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
> > you'll notice how it walks up the d_parent chain, and at that point you do 
> > NOT own the directory i_mutex, so at that point d_parent really _can_ be 
> > changing wildly due to concurrent renames or whatever.
> 
> Eh...  It's bogus, all right, but i_mutex is not the correct solution.

Oh, no, I didn't imply it was. But the other sites that I saw doing 
the dentry->d_parent access already _had_ the i_mutex thing, so I was 
pointing out how this one does not (and indeed _cannot_ do that).

So I'm just saying that pretty much _all_ the dentry->d_parent use in 9p 
seems very suspect. The cases where we hold i_mutex (because the caller 
already took it) shouldn't do that whole d_parent dance, because they get 
the directory inode passed into them directly.

And the other places are just buggy.

So from a quick look, the use of d_parent in 9p is simply not a good idea. 

			Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-08  0:08 ` Linus Torvalds
  2010-06-08  0:41   ` Al Viro
@ 2010-06-08 14:29   ` Venkateswararao Jujjuri (JV)
  1 sibling, 0 replies; 16+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-06-08 14:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric Van Hensbergen, V9FS Developers, linux-kernel

Linus Torvalds wrote:
> 
> On Mon, 7 Jun 2010, Eric Van Hensbergen wrote:
>> jvrao (2):
>>       Add a helper function to get fsgid for a file create.
>>       9p: Add a wstat after TCREATE to fix the gid.
> 
> Quite frankly, this looks rather broken.
> 
> It uses "dentry->d_parent" without locking (it so happens to likely be ok, 
> since we are in "create()" and thus should be holding the parent 
> semaphore). On its own, that might be excusable (if people were even 
> _aware_ of the this locking rule!), but it does so just to get the inode 
> pointer to that parent.
> 
> And the only thing that makes it ok to access dentry->d_parent - the fact 
> that we are in v9fs_create() - is also the thing that should have made 
> people look at the arguments to the function and say "hmm".

Silly me. I sent out another patch using the dir inode passed through arguments.
But we still need to analyze the use of dentry->d_parent in other parts of code.. 

- JV


> 
> We pass in the directory inode pointer as an argument to the create 
> function! The code could have used that thing directly, instead of 
> mucking around with dentry pointers that it had no business looking at.
> 
> I see why it seems to have happened: v9fs does the exact same thing for 
> the pre-existing "v9fs_fid_lookup()". So there is history to this 
> behavior.
> 
> Maybe people weren't aware of the fact that just dereferencing 
> dentry->d_parent willy-nilly isn't actually allowed. That field changes. 
> Sure, there are cases where it's ok, but this is a dangerous thing to do 
> in general. 
> 
> In fact, the other thing that I find doing that whole "dentry->d_parent" 
> thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
> you'll notice how it walks up the d_parent chain, and at that point you do 
> NOT own the directory i_mutex, so at that point d_parent really _can_ be 
> changing wildly due to concurrent renames or whatever.
> 
> So 9pfs seems to have some preexisting bugs in this area. I'm not going to 
> pull new bug-prone code. See the other discussions about being tight this 
> release about really _only_ taking regressions after the merge window 
> closed.
> 
> 		Linus
> 
> ------------------------------------------------------------------------------
> ThinkGeek and WIRED's GeekDad team up for the Ultimate 
> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
> lucky parental unit.  See the prize list and enter to win: 
> http://p.sf.net/sfu/thinkgeek-promo
> _______________________________________________
> V9fs-developer mailing list
> V9fs-developer@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/v9fs-developer



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-08  0:41   ` Al Viro
  2010-06-08  0:48     ` Linus Torvalds
@ 2010-06-16 16:42     ` Aneesh Kumar K. V
  2010-06-24 16:30       ` Aneesh Kumar K. V
  1 sibling, 1 reply; 16+ messages in thread
From: Aneesh Kumar K. V @ 2010-06-16 16:42 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Eric Van Hensbergen, V9FS Developers, linux-kernel

On Tue, 8 Jun 2010 01:41:02 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Mon, Jun 07, 2010 at 05:08:19PM -0700, Linus Torvalds wrote:
> 
> > In fact, the other thing that I find doing that whole "dentry->d_parent" 
> > thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
> > you'll notice how it walks up the d_parent chain, and at that point you do 
> > NOT own the directory i_mutex, so at that point d_parent really _can_ be 
> > changing wildly due to concurrent renames or whatever.
> 
> Eh...  It's bogus, all right, but i_mutex is not the correct solution.
> You'd have to take it on a lot of inodes along the way to root *and*
> you'd violate the ordering in process (ancestors first).
> 
> I'm not sure what's the right thing to do there, actually - s_vfs_rename_sem
> also won't do, since it'll give you ordering problems of its own (it's
> taken before i_mutex in VFS, so trying to take it under i_mutex would not
> do).

Can we use dcache_lock in v9fs_fid_lookup ? Since we are holding
parent directory inode->i_mutex in other places where we refer
dentry->d_parent I guess those access are ok ?. And for v9fs_fid_lookup we
can hold dcache_lock, walk the parent, build the full path name and use
that for TWALK ? 

Another option is we deny a cross directory rename when
doing fid_lookup. That is we can introduce a per superblock v9fs
specific rwlock that get taken (in write mode) in a cross directory
rename and in fid_lookup we take them in read mode ? We will have to set
FS_RENAME_DOES_D_MOVE for 9p.

> 
> The _really_ interesting question is how do servers deal with topology-changing
> renames.  Note that the problem exists only with extended 9P - with the
> original one all of that had been a non-issue, since it didn't allow
> cross-directory renames at all and the tree topology remained stable all along.
> 

True the problem exist only with .L extension since .u protocol don't
allow a cross directory renames. In the server we update the path names
attached to a fid during rename. This happen to all fids that have path
component matching the changed name.

-aneesh


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-16 16:42     ` [V9fs-developer] " Aneesh Kumar K. V
@ 2010-06-24 16:30       ` Aneesh Kumar K. V
  2010-06-29 20:18         ` Eric Van Hensbergen
  2010-06-30 12:55         ` Aneesh Kumar K. V
  0 siblings, 2 replies; 16+ messages in thread
From: Aneesh Kumar K. V @ 2010-06-24 16:30 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Eric Van Hensbergen, V9FS Developers, linux-kernel

On Wed, 16 Jun 2010 22:12:32 +0530, "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Tue, 8 Jun 2010 01:41:02 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > On Mon, Jun 07, 2010 at 05:08:19PM -0700, Linus Torvalds wrote:
> > 
> > > In fact, the other thing that I find doing that whole "dentry->d_parent" 
> > > thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
> > > you'll notice how it walks up the d_parent chain, and at that point you do 
> > > NOT own the directory i_mutex, so at that point d_parent really _can_ be 
> > > changing wildly due to concurrent renames or whatever.
> > 
> > Eh...  It's bogus, all right, but i_mutex is not the correct solution.
> > You'd have to take it on a lot of inodes along the way to root *and*
> > you'd violate the ordering in process (ancestors first).
> > 
> > I'm not sure what's the right thing to do there, actually - s_vfs_rename_sem
> > also won't do, since it'll give you ordering problems of its own (it's
> > taken before i_mutex in VFS, so trying to take it under i_mutex would not
> > do).
> 
> Can we use dcache_lock in v9fs_fid_lookup ? Since we are holding
> parent directory inode->i_mutex in other places where we refer
> dentry->d_parent I guess those access are ok ?. And for v9fs_fid_lookup we
> can hold dcache_lock, walk the parent, build the full path name and use
> that for TWALK ? 
> 
> Another option is we deny a cross directory rename when
> doing fid_lookup. That is we can introduce a per superblock v9fs
> specific rwlock that get taken (in write mode) in a cross directory
> rename and in fid_lookup we take them in read mode ? We will have to set
> FS_RENAME_DOES_D_MOVE for 9p.

something like this ?

commit 86c06ad7506e7e05dd4e1e1b8cee28e19703c4f6
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Mon Jun 21 21:50:07 2010 +0530

    fs/9p: Prevent parallel rename when doing fid_lookup
    
    During fid lookup we need to make sure that the dentry->d_parent doesn't
    change so that we can safely walk the parent dentries. To ensure that
    we need to prevent cross directory rename during fid_lookup. Add a
    per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 7317b39..14b542a 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -139,14 +139,19 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	fid = v9fs_fid_find(dentry, uid, any);
 	if (fid)
 		return fid;
-
+	/*
+	 * we don't have a matching fid. To do a TWALK we need
+	 * parent fid. We need to prevent rename when we want to
+	 * look at the parent.
+	 */
+	read_lock(&v9ses->rename_lock);
 	ds = dentry->d_parent;
 	fid = v9fs_fid_find(ds, uid, any);
 	if (!fid) { /* walk from the root */
 		n = 0;
 		for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
 			n++;
-
+		read_unlock(&v9ses->rename_lock);
 		fid = v9fs_fid_find(ds, uid, any);
 		if (!fid) { /* the user is not attached to the fs yet */
 			if (access == V9FS_ACCESS_SINGLE)
@@ -165,9 +170,11 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 
 			v9fs_fid_add(ds, fid);
 		}
-	} else /* walk from the parent */
+	} else  {
+		/* walk from the parent */
 		n = 1;
-
+		read_unlock(&v9ses->rename_lock);
+	}
 	if (ds == dentry)
 		return fid;
 
@@ -175,8 +182,10 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	if (!wnames)
 		return ERR_PTR(-ENOMEM);
 
+	read_lock(&v9ses->rename_lock);
 	for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
 		wnames[i] = (char *) d->d_name.name;
+	read_unlock(&v9ses->rename_lock);
 
 	clone = 1;
 	i = 0;
@@ -199,7 +208,6 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 		i += l;
 		clone = 0;
 	}
-
 	kfree(wnames);
 	v9fs_fid_add(dentry, fid);
 	return fid;
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index f8b86e9..6839fca 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -237,6 +237,7 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
 		__putname(v9ses->uname);
 		return ERR_PTR(-ENOMEM);
 	}
+	rwlock_init(&v9ses->rename_lock);
 
 	rc = bdi_setup_and_register(&v9ses->bdi, "9p", BDI_CAP_MAP_COPY);
 	if (rc) {
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index bec4d0b..dee4f26 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -104,6 +104,7 @@ struct v9fs_session_info {
 	struct p9_client *clnt;	/* 9p client */
 	struct list_head slist; /* list of sessions registered with v9fs */
 	struct backing_dev_info bdi;
+	rwlock_t rename_lock;
 };
 
 struct p9_fid *v9fs_session_init(struct v9fs_session_info *, const char *,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 4331b3b..be39035 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -678,6 +678,7 @@ static struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 
 	sb = dir->i_sb;
 	v9ses = v9fs_inode2v9ses(dir);
+	/* We can walk d_parent because we hold the dir->i_mutex */
 	dfid = v9fs_fid_lookup(dentry->d_parent);
 	if (IS_ERR(dfid))
 		return ERR_CAST(dfid);
@@ -763,7 +764,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct p9_fid *olddirfid;
 	struct p9_fid *newdirfid;
 	struct p9_wstat wstat;
-	int retval;
+	int retval, cross_dir_rename;
 
 	P9_DPRINTK(P9_DEBUG_VFS, "\n");
 	retval = 0;
@@ -784,7 +785,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		retval = PTR_ERR(newdirfid);
 		goto clunk_olddir;
 	}
-
+	cross_dir_rename = (old_dentry->d_parent != new_dentry->d_parent);
 	if (v9fs_proto_dotl(v9ses)) {
 		retval = p9_client_rename(oldfid, newdirfid,
 					(char *) new_dentry->d_name.name);
@@ -806,6 +807,11 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	retval = p9_client_wstat(oldfid, &wstat);
 
 clunk_newdir:
+	if (cross_dir_rename)
+		write_lock(&v9ses->rename_lock);
+	d_move(old_dentry, new_dentry);
+	if (cross_dir_rename)
+		write_unlock(&v9ses->rename_lock);
 	p9_client_clunk(newdirfid);
 
 clunk_olddir:
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index be74d02..0f6d06d 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -278,4 +278,5 @@ struct file_system_type v9fs_fs_type = {
 	.get_sb = v9fs_get_sb,
 	.kill_sb = v9fs_kill_super,
 	.owner = THIS_MODULE,
+	.fs_flags = FS_RENAME_DOES_D_MOVE,
 };

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for  2.6.35-rc2
  2010-06-24 16:30       ` Aneesh Kumar K. V
@ 2010-06-29 20:18         ` Eric Van Hensbergen
  2010-06-29 20:38           ` Linus Torvalds
  2010-06-30 12:55         ` Aneesh Kumar K. V
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Van Hensbergen @ 2010-06-29 20:18 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro; +Cc: V9FS Developers, linux-kernel

Linus, Al,

Does this approach satisfy your concerns?  We've been going over
several different options on how to proceed, but this seems to be the
best option.  The v9fs_fid_lookup traversal of the d_parent path is an
exception case that hardly ever gets hit in practice so the extra
locking shouldn't be a performance problem, but we haven't been able
to figure out a way to resolve it in another fashion for existing
use-case scenarios.

     Thanks for any guidance,

         -eric


On Thu, Jun 24, 2010 at 11:30 AM, Aneesh Kumar K. V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Wed, 16 Jun 2010 22:12:32 +0530, "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>> On Tue, 8 Jun 2010 01:41:02 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
>> > On Mon, Jun 07, 2010 at 05:08:19PM -0700, Linus Torvalds wrote:
>> >
>> > > In fact, the other thing that I find doing that whole "dentry->d_parent"
>> > > thing seems to literally be broken. If you look at v9fs_fid_lookup(),
>> > > you'll notice how it walks up the d_parent chain, and at that point you do
>> > > NOT own the directory i_mutex, so at that point d_parent really _can_ be
>> > > changing wildly due to concurrent renames or whatever.
>> >
>> > Eh...  It's bogus, all right, but i_mutex is not the correct solution.
>> > You'd have to take it on a lot of inodes along the way to root *and*
>> > you'd violate the ordering in process (ancestors first).
>> >
>> > I'm not sure what's the right thing to do there, actually - s_vfs_rename_sem
>> > also won't do, since it'll give you ordering problems of its own (it's
>> > taken before i_mutex in VFS, so trying to take it under i_mutex would not
>> > do).
>>
>> Can we use dcache_lock in v9fs_fid_lookup ? Since we are holding
>> parent directory inode->i_mutex in other places where we refer
>> dentry->d_parent I guess those access are ok ?. And for v9fs_fid_lookup we
>> can hold dcache_lock, walk the parent, build the full path name and use
>> that for TWALK ?
>>
>> Another option is we deny a cross directory rename when
>> doing fid_lookup. That is we can introduce a per superblock v9fs
>> specific rwlock that get taken (in write mode) in a cross directory
>> rename and in fid_lookup we take them in read mode ? We will have to set
>> FS_RENAME_DOES_D_MOVE for 9p.
>
> something like this ?
>
> commit 86c06ad7506e7e05dd4e1e1b8cee28e19703c4f6
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date:   Mon Jun 21 21:50:07 2010 +0530
>
>    fs/9p: Prevent parallel rename when doing fid_lookup
>
>    During fid lookup we need to make sure that the dentry->d_parent doesn't
>    change so that we can safely walk the parent dentries. To ensure that
>    we need to prevent cross directory rename during fid_lookup. Add a
>    per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
>
>    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> diff --git a/fs/9p/fid.c b/fs/9p/fid.c
> index 7317b39..14b542a 100644
> --- a/fs/9p/fid.c
> +++ b/fs/9p/fid.c
> @@ -139,14 +139,19 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
>        fid = v9fs_fid_find(dentry, uid, any);
>        if (fid)
>                return fid;
> -
> +       /*
> +        * we don't have a matching fid. To do a TWALK we need
> +        * parent fid. We need to prevent rename when we want to
> +        * look at the parent.
> +        */
> +       read_lock(&v9ses->rename_lock);
>        ds = dentry->d_parent;
>        fid = v9fs_fid_find(ds, uid, any);
>        if (!fid) { /* walk from the root */
>                n = 0;
>                for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
>                        n++;
> -
> +               read_unlock(&v9ses->rename_lock);
>                fid = v9fs_fid_find(ds, uid, any);
>                if (!fid) { /* the user is not attached to the fs yet */
>                        if (access == V9FS_ACCESS_SINGLE)
> @@ -165,9 +170,11 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
>
>                        v9fs_fid_add(ds, fid);
>                }
> -       } else /* walk from the parent */
> +       } else  {
> +               /* walk from the parent */
>                n = 1;
> -
> +               read_unlock(&v9ses->rename_lock);
> +       }
>        if (ds == dentry)
>                return fid;
>
> @@ -175,8 +182,10 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
>        if (!wnames)
>                return ERR_PTR(-ENOMEM);
>
> +       read_lock(&v9ses->rename_lock);
>        for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
>                wnames[i] = (char *) d->d_name.name;
> +       read_unlock(&v9ses->rename_lock);
>
>        clone = 1;
>        i = 0;
> @@ -199,7 +208,6 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
>                i += l;
>                clone = 0;
>        }
> -
>        kfree(wnames);
>        v9fs_fid_add(dentry, fid);
>        return fid;
> diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
> index f8b86e9..6839fca 100644
> --- a/fs/9p/v9fs.c
> +++ b/fs/9p/v9fs.c
> @@ -237,6 +237,7 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
>                __putname(v9ses->uname);
>                return ERR_PTR(-ENOMEM);
>        }
> +       rwlock_init(&v9ses->rename_lock);
>
>        rc = bdi_setup_and_register(&v9ses->bdi, "9p", BDI_CAP_MAP_COPY);
>        if (rc) {
> diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
> index bec4d0b..dee4f26 100644
> --- a/fs/9p/v9fs.h
> +++ b/fs/9p/v9fs.h
> @@ -104,6 +104,7 @@ struct v9fs_session_info {
>        struct p9_client *clnt; /* 9p client */
>        struct list_head slist; /* list of sessions registered with v9fs */
>        struct backing_dev_info bdi;
> +       rwlock_t rename_lock;
>  };
>
>  struct p9_fid *v9fs_session_init(struct v9fs_session_info *, const char *,
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 4331b3b..be39035 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -678,6 +678,7 @@ static struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
>
>        sb = dir->i_sb;
>        v9ses = v9fs_inode2v9ses(dir);
> +       /* We can walk d_parent because we hold the dir->i_mutex */
>        dfid = v9fs_fid_lookup(dentry->d_parent);
>        if (IS_ERR(dfid))
>                return ERR_CAST(dfid);
> @@ -763,7 +764,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>        struct p9_fid *olddirfid;
>        struct p9_fid *newdirfid;
>        struct p9_wstat wstat;
> -       int retval;
> +       int retval, cross_dir_rename;
>
>        P9_DPRINTK(P9_DEBUG_VFS, "\n");
>        retval = 0;
> @@ -784,7 +785,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>                retval = PTR_ERR(newdirfid);
>                goto clunk_olddir;
>        }
> -
> +       cross_dir_rename = (old_dentry->d_parent != new_dentry->d_parent);
>        if (v9fs_proto_dotl(v9ses)) {
>                retval = p9_client_rename(oldfid, newdirfid,
>                                        (char *) new_dentry->d_name.name);
> @@ -806,6 +807,11 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>        retval = p9_client_wstat(oldfid, &wstat);
>
>  clunk_newdir:
> +       if (cross_dir_rename)
> +               write_lock(&v9ses->rename_lock);
> +       d_move(old_dentry, new_dentry);
> +       if (cross_dir_rename)
> +               write_unlock(&v9ses->rename_lock);
>        p9_client_clunk(newdirfid);
>
>  clunk_olddir:
> diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
> index be74d02..0f6d06d 100644
> --- a/fs/9p/vfs_super.c
> +++ b/fs/9p/vfs_super.c
> @@ -278,4 +278,5 @@ struct file_system_type v9fs_fs_type = {
>        .get_sb = v9fs_get_sb,
>        .kill_sb = v9fs_kill_super,
>        .owner = THIS_MODULE,
> +       .fs_flags = FS_RENAME_DOES_D_MOVE,
>  };
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for  2.6.35-rc2
  2010-06-29 20:18         ` Eric Van Hensbergen
@ 2010-06-29 20:38           ` Linus Torvalds
  2010-06-30 11:52             ` Aneesh Kumar K. V
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2010-06-29 20:38 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: Al Viro, V9FS Developers, linux-kernel

On Tue, Jun 29, 2010 at 1:18 PM, Eric Van Hensbergen <ericvh@gmail.com> wrote:
>
> Does this approach satisfy your concerns?  We've been going over
> several different options on how to proceed, but this seems to be the
> best option.

Using a p9fs rename lock does seem to be a reasonable option.

That said, the patch itself seems to not be valid. You drop the lock
too early in v9fs_fid_lookup() as far as I can tell. You then re-take
it before doing that whole

   for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)

loop with it held again, but that's totally bogus - because you
dropped the lock, your 'n-1' count has absolutely no meaning any more,
since a cross-directory rename might have changed the depth of the
thing in the meantime.

And if the depth changes, you aren't at all guaranteed to stay on the
same p9fs filesystem, so now you're doing that d_parent access without
the proper locking (sure: you hold the rename lock, but it's not at
all guaranteed that the rename lock is the _right_ lock any more as
you traverse the list down!)

But I didn't look deeply at the patch. There might be some reason why
it's safe (I doubt it, though), and there might be other places where
you do the same. But in general, dropping and re-taking a lock is a
bad idea. If you dropped the lock, you can't depend on anything you
found out while having held it.

              Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-29 20:38           ` Linus Torvalds
@ 2010-06-30 11:52             ` Aneesh Kumar K. V
  2010-06-30 16:48               ` Linus Torvalds
  2010-06-30 18:16               ` Latchesar Ionkov
  0 siblings, 2 replies; 16+ messages in thread
From: Aneesh Kumar K. V @ 2010-06-30 11:52 UTC (permalink / raw)
  To: Linus Torvalds, Eric Van Hensbergen
  Cc: V9FS Developers, Al Viro, linux-kernel

On Tue, 29 Jun 2010 13:38:57 -0700, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Jun 29, 2010 at 1:18 PM, Eric Van Hensbergen <ericvh@gmail.com> wrote:
> >
> > Does this approach satisfy your concerns?  We've been going over
> > several different options on how to proceed, but this seems to be the
> > best option.
> 
> Using a p9fs rename lock does seem to be a reasonable option.
> 
> That said, the patch itself seems to not be valid. You drop the lock
> too early in v9fs_fid_lookup() as far as I can tell. You then re-take
> it before doing that whole
> 
>    for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
> 
> loop with it held again, but that's totally bogus - because you
> dropped the lock, your 'n-1' count has absolutely no meaning any more,
> since a cross-directory rename might have changed the depth of the
> thing in the meantime.
> 
> And if the depth changes, you aren't at all guaranteed to stay on the
> same p9fs filesystem, so now you're doing that d_parent access without
> the proper locking (sure: you hold the rename lock, but it's not at
> all guaranteed that the rename lock is the _right_ lock any more as
> you traverse the list down!)
> 
> But I didn't look deeply at the patch. There might be some reason why
> it's safe (I doubt it, though), and there might be other places where
> you do the same. But in general, dropping and re-taking a lock is a
> bad idea. If you dropped the lock, you can't depend on anything you
> found out while having held it.

You are correct. we cannot drop the rename lock in between. I also found
another issue in that we are using dentry->d_name.name directly. That
would imply we need to hold the rename_lock even during the
client_walk.  How about the patch below ?. I updated the patch to hold
rename_lock during multiple path walk. Also the rename path is updated
to hold the lock during p9_client_rename operations.

commit 79f6f20dbb70ad35db37b674957c95de20662a75
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed Jun 30 15:37:58 2010 +0530

    fs/9p: Prevent parallel rename when doing fid_lookup
    
    During fid lookup we need to make sure that the dentry->d_parent doesn't
    change so that we can safely walk the parent dentries. To ensure that
    we need to prevent cross directory rename during fid_lookup. Add a
    per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 5d6cfcb..7b387fe 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -97,6 +97,34 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
 	return ret;
 }
 
+/*
+ * We need to hold v9ses->rename_lock as long as we hold references
+ * to returned path array. Array element contain pointers to
+ * dentry names.
+ */
+static int build_path_from_dentry(struct v9fs_session_info *v9ses,
+				  struct dentry *dentry, char ***names)
+{
+	int n = 0, i;
+	char **wnames;
+	struct dentry *ds;
+
+	for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
+		n++;
+
+	wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
+	if (!wnames)
+		goto err_out;
+
+	for (ds = dentry, i = (n-1); i >= 0; i--, ds = ds->d_parent)
+		wnames[i] = (char  *)ds->d_name.name;
+
+	*names = wnames;
+	return n;
+err_out:
+	return -ENOMEM;
+}
+
 /**
  * v9fs_fid_lookup - lookup for a fid, try to walk if not found
  * @dentry: dentry to look for fid in
@@ -112,7 +140,7 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	int i, n, l, clone, any, access;
 	u32 uid;
 	struct p9_fid *fid, *old_fid = NULL;
-	struct dentry *d, *ds;
+	struct dentry *ds;
 	struct v9fs_session_info *v9ses;
 	char **wnames, *uname;
 
@@ -139,50 +167,63 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	fid = v9fs_fid_find(dentry, uid, any);
 	if (fid)
 		return fid;
-
+	/*
+	 * we don't have a matching fid. To do a TWALK we need
+	 * parent fid. We need to prevent rename when we want to
+	 * look at the parent.
+	 */
+	read_lock(&v9ses->rename_lock);
 	ds = dentry->d_parent;
 	fid = v9fs_fid_find(ds, uid, any);
-	if (!fid) { /* walk from the root */
-		n = 0;
-		for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
-			n++;
-
-		fid = v9fs_fid_find(ds, uid, any);
-		if (!fid) { /* the user is not attached to the fs yet */
-			if (access == V9FS_ACCESS_SINGLE)
-				return ERR_PTR(-EPERM);
-
-			if (v9fs_proto_dotu(v9ses) ||
-				v9fs_proto_dotl(v9ses))
-				uname = NULL;
-			else
-				uname = v9ses->uname;
+	if (fid) {
+		/* Found the parent fid do a lookup with that */
+		fid = p9_client_walk(fid, 1, (char **)&ds->d_name.name, 1);
+		read_unlock(&v9ses->rename_lock);
+		return fid;
+	}
+	read_unlock(&v9ses->rename_lock);
 
-			fid = p9_client_attach(v9ses->clnt, NULL, uname, uid,
-				v9ses->aname);
+	/* start from the root and try to do a lookup */
+	fid = v9fs_fid_find(dentry->d_sb->s_root, uid, any);
+	if (!fid) {
+		/* the user is not attached to the fs yet */
+		if (access == V9FS_ACCESS_SINGLE)
+			return ERR_PTR(-EPERM);
 
-			if (IS_ERR(fid))
-				return fid;
+		if (v9fs_proto_dotu(v9ses) || v9fs_proto_dotl(v9ses))
+				uname = NULL;
+		else
+			uname = v9ses->uname;
 
-			v9fs_fid_add(ds, fid);
-		}
-	} else /* walk from the parent */
-		n = 1;
+		fid = p9_client_attach(v9ses->clnt, NULL, uname, uid,
+				       v9ses->aname);
+		if (IS_ERR(fid))
+			return fid;
 
-	if (ds == dentry)
+		v9fs_fid_add(dentry->d_sb->s_root, fid);
+	}
+	/* If we are root ourself just return that */
+	if (dentry->d_sb->s_root == dentry)
 		return fid;
-
-	wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
-	if (!wnames)
-		return ERR_PTR(-ENOMEM);
-
-	for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
-		wnames[i] = (char *) d->d_name.name;
-
+	/*
+	 * Do a multipath walk with attached root.
+	 * When walking parent we need to make sure we
+	 * don't have a parallel rename happening
+	 */
+	read_lock(&v9ses->rename_lock);
+	n  = build_path_from_dentry(v9ses, dentry, &wnames);
+	if (n < 0) {
+		fid = ERR_CAST(ERR_PTR(n));
+		goto err_out;
+	}
 	clone = 1;
 	i = 0;
 	while (i < n) {
 		l = min(n - i, P9_MAXWELEM);
+		/*
+		 * We need to hold rename lock when doing a multipath
+		 * walk to ensure none of the patch component change
+		 */
 		fid = p9_client_walk(fid, l, &wnames[i], clone);
 		if (IS_ERR(fid)) {
 			if (old_fid) {
@@ -194,15 +235,16 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 				p9_client_clunk(old_fid);
 			}
 			kfree(wnames);
-			return fid;
+			goto err_out;
 		}
 		old_fid = fid;
 		i += l;
 		clone = 0;
 	}
-
 	kfree(wnames);
 	v9fs_fid_add(dentry, fid);
+err_out:
+	read_unlock(&v9ses->rename_lock);
 	return fid;
 }
 
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 3c49201..b41bcef 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -237,6 +237,7 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
 		__putname(v9ses->uname);
 		return ERR_PTR(-ENOMEM);
 	}
+	rwlock_init(&v9ses->rename_lock);
 
 	rc = bdi_setup_and_register(&v9ses->bdi, "9p", BDI_CAP_MAP_COPY);
 	if (rc) {
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index bec4d0b..dee4f26 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -104,6 +104,7 @@ struct v9fs_session_info {
 	struct p9_client *clnt;	/* 9p client */
 	struct list_head slist; /* list of sessions registered with v9fs */
 	struct backing_dev_info bdi;
+	rwlock_t rename_lock;
 };
 
 struct p9_fid *v9fs_session_init(struct v9fs_session_info *, const char *,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 503a6a2..eae89ad 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -964,6 +964,7 @@ static struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 
 	sb = dir->i_sb;
 	v9ses = v9fs_inode2v9ses(dir);
+	/* We can walk d_parent because we hold the dir->i_mutex */
 	dfid = v9fs_fid_lookup(dentry->d_parent);
 	if (IS_ERR(dfid))
 		return ERR_CAST(dfid);
@@ -1049,7 +1050,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct p9_fid *olddirfid;
 	struct p9_fid *newdirfid;
 	struct p9_wstat wstat;
-	int retval;
+	int retval, cross_dir_rename = 0;
 
 	P9_DPRINTK(P9_DEBUG_VFS, "\n");
 	retval = 0;
@@ -1070,6 +1071,9 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		retval = PTR_ERR(newdirfid);
 		goto clunk_olddir;
 	}
+	cross_dir_rename = (old_dentry->d_parent != new_dentry->d_parent);
+	if (cross_dir_rename)
+		write_lock(&v9ses->rename_lock);
 
 	if (v9fs_proto_dotl(v9ses)) {
 		retval = p9_client_rename(oldfid, newdirfid,
@@ -1077,21 +1081,27 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		if (retval != -ENOSYS)
 			goto clunk_newdir;
 	}
+	if (cross_dir_rename) {
+		/*
+		 * 9P .u can only handle file rename in the same directory
+		 */
 
-	/* 9P can only handle file rename in the same directory */
-	if (memcmp(&olddirfid->qid, &newdirfid->qid, sizeof(newdirfid->qid))) {
 		P9_DPRINTK(P9_DEBUG_ERROR,
 				"old dir and new dir are different\n");
 		retval = -EXDEV;
 		goto clunk_newdir;
 	}
-
 	v9fs_blank_wstat(&wstat);
 	wstat.muid = v9ses->uname;
 	wstat.name = (char *) new_dentry->d_name.name;
 	retval = p9_client_wstat(oldfid, &wstat);
 
 clunk_newdir:
+	if (!retval)
+		/* successful rename */
+		d_move(old_dentry, new_dentry);
+	if (cross_dir_rename)
+		write_unlock(&v9ses->rename_lock);
 	p9_client_clunk(newdirfid);
 
 clunk_olddir:
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 0740675..3abc3ec 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -287,4 +287,5 @@ struct file_system_type v9fs_fs_type = {
 	.get_sb = v9fs_get_sb,
 	.kill_sb = v9fs_kill_super,
 	.owner = THIS_MODULE,
+	.fs_flags = FS_RENAME_DOES_D_MOVE,
 };

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-24 16:30       ` Aneesh Kumar K. V
  2010-06-29 20:18         ` Eric Van Hensbergen
@ 2010-06-30 12:55         ` Aneesh Kumar K. V
  1 sibling, 0 replies; 16+ messages in thread
From: Aneesh Kumar K. V @ 2010-06-30 12:55 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Eric Van Hensbergen, V9FS Developers, linux-kernel

On Thu, 24 Jun 2010 22:00:10 +0530, "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Wed, 16 Jun 2010 22:12:32 +0530, "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > On Tue, 8 Jun 2010 01:41:02 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > > On Mon, Jun 07, 2010 at 05:08:19PM -0700, Linus Torvalds wrote:
> > > 
> > > > In fact, the other thing that I find doing that whole "dentry->d_parent" 
> > > > thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
> > > > you'll notice how it walks up the d_parent chain, and at that point you do 
> > > > NOT own the directory i_mutex, so at that point d_parent really _can_ be 
> > > > changing wildly due to concurrent renames or whatever.
> > > 
> > > Eh...  It's bogus, all right, but i_mutex is not the correct solution.
> > > You'd have to take it on a lot of inodes along the way to root *and*
> > > you'd violate the ordering in process (ancestors first).
> > > 
> > > I'm not sure what's the right thing to do there, actually - s_vfs_rename_sem
> > > also won't do, since it'll give you ordering problems of its own (it's
> > > taken before i_mutex in VFS, so trying to take it under i_mutex would not
> > > do).
> > 
> > Can we use dcache_lock in v9fs_fid_lookup ? Since we are holding
> > parent directory inode->i_mutex in other places where we refer
> > dentry->d_parent I guess those access are ok ?. And for v9fs_fid_lookup we
> > can hold dcache_lock, walk the parent, build the full path name and use
> > that for TWALK ? 
> > 
> > Another option is we deny a cross directory rename when
> > doing fid_lookup. That is we can introduce a per superblock v9fs
> > specific rwlock that get taken (in write mode) in a cross directory
> > rename and in fid_lookup we take them in read mode ? We will have to set
> > FS_RENAME_DOES_D_MOVE for 9p.
> 
> something like this ?
> 
> commit 86c06ad7506e7e05dd4e1e1b8cee28e19703c4f6
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date:   Mon Jun 21 21:50:07 2010 +0530
> 
>     fs/9p: Prevent parallel rename when doing fid_lookup
>     
>     During fid lookup we need to make sure that the dentry->d_parent doesn't
>     change so that we can safely walk the parent dentries. To ensure that
>     we need to prevent cross directory rename during fid_lookup. Add a
>     per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
>     
>     Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 

To make sure d_name.name remain correct we need something like below ?

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index eae89ad..9f9f804 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1050,7 +1050,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct p9_fid *olddirfid;
 	struct p9_fid *newdirfid;
 	struct p9_wstat wstat;
-	int retval, cross_dir_rename = 0;
+	int retval;
 
 	P9_DPRINTK(P9_DEBUG_VFS, "\n");
 	retval = 0;
@@ -1071,17 +1071,15 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		retval = PTR_ERR(newdirfid);
 		goto clunk_olddir;
 	}
-	cross_dir_rename = (old_dentry->d_parent != new_dentry->d_parent);
-	if (cross_dir_rename)
-		write_lock(&v9ses->rename_lock);
 
+	write_lock(&v9ses->rename_lock);
 	if (v9fs_proto_dotl(v9ses)) {
 		retval = p9_client_rename(oldfid, newdirfid,
 					(char *) new_dentry->d_name.name);
 		if (retval != -ENOSYS)
 			goto clunk_newdir;
 	}
-	if (cross_dir_rename) {
+	if (old_dentry->d_parent != new_dentry->d_parent) {
 		/*
 		 * 9P .u can only handle file rename in the same directory
 		 */
@@ -1100,8 +1098,7 @@ clunk_newdir:
 	if (!retval)
 		/* successful rename */
 		d_move(old_dentry, new_dentry);
-	if (cross_dir_rename)
-		write_unlock(&v9ses->rename_lock);
+	write_unlock(&v9ses->rename_lock);
 	p9_client_clunk(newdirfid);
 
 clunk_olddir:

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for  2.6.35-rc2
  2010-06-30 11:52             ` Aneesh Kumar K. V
@ 2010-06-30 16:48               ` Linus Torvalds
  2010-06-30 18:58                 ` Aneesh Kumar K. V
  2010-06-30 18:16               ` Latchesar Ionkov
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2010-06-30 16:48 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Eric Van Hensbergen, V9FS Developers, Al Viro, linux-kernel

On Wed, Jun 30, 2010 at 4:52 AM, Aneesh Kumar K. V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> You are correct. we cannot drop the rename lock in between. I also found
> another issue in that we are using dentry->d_name.name directly. That
> would imply we need to hold the rename_lock even during the
> client_walk.

Yes.

> How about the patch below ?. I updated the patch to hold
> rename_lock during multiple path walk. Also the rename path is updated
> to hold the lock during p9_client_rename operations.

I'm not finding any obvious problems with this, and you're right that
you also need to hold the rename write-lock even for regular renames
(not just cross-directory ones) in order to protect the name itself.
So those two patches together seem to be ok at a quick glance.

That said, I do wonder if you wouldn't be better off copying the name
components in order to then drop the lock earlier. That way you'd only
need to hold the lock while walking the dentries, and could possibly
release it during the actual walk (unless you need the names to be
stable, but I don't think you can rely on that anyway, since other
clients might be doing renames concurrently.. I don't know)

                       Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for  2.6.35-rc2
  2010-06-30 11:52             ` Aneesh Kumar K. V
  2010-06-30 16:48               ` Linus Torvalds
@ 2010-06-30 18:16               ` Latchesar Ionkov
  2010-06-30 18:31                 ` Linus Torvalds
  2010-06-30 18:33                 ` Aneesh Kumar K. V
  1 sibling, 2 replies; 16+ messages in thread
From: Latchesar Ionkov @ 2010-06-30 18:16 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Linus Torvalds, Eric Van Hensbergen, V9FS Developers, Al Viro,
	linux-kernel

I think that you need to use the s_vfs_rename_mutex in the super_block
struct instead of introducing a new rename_lock in the v9fs session.

Thanks,
    Lucho

On Wed, Jun 30, 2010 at 5:52 AM, Aneesh Kumar K. V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Tue, 29 Jun 2010 13:38:57 -0700, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> On Tue, Jun 29, 2010 at 1:18 PM, Eric Van Hensbergen <ericvh@gmail.com> wrote:
>> >
>> > Does this approach satisfy your concerns?  We've been going over
>> > several different options on how to proceed, but this seems to be the
>> > best option.
>>
>> Using a p9fs rename lock does seem to be a reasonable option.
>>
>> That said, the patch itself seems to not be valid. You drop the lock
>> too early in v9fs_fid_lookup() as far as I can tell. You then re-take
>> it before doing that whole
>>
>>    for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
>>
>> loop with it held again, but that's totally bogus - because you
>> dropped the lock, your 'n-1' count has absolutely no meaning any more,
>> since a cross-directory rename might have changed the depth of the
>> thing in the meantime.
>>
>> And if the depth changes, you aren't at all guaranteed to stay on the
>> same p9fs filesystem, so now you're doing that d_parent access without
>> the proper locking (sure: you hold the rename lock, but it's not at
>> all guaranteed that the rename lock is the _right_ lock any more as
>> you traverse the list down!)
>>
>> But I didn't look deeply at the patch. There might be some reason why
>> it's safe (I doubt it, though), and there might be other places where
>> you do the same. But in general, dropping and re-taking a lock is a
>> bad idea. If you dropped the lock, you can't depend on anything you
>> found out while having held it.
>
> You are correct. we cannot drop the rename lock in between. I also found
> another issue in that we are using dentry->d_name.name directly. That
> would imply we need to hold the rename_lock even during the
> client_walk.  How about the patch below ?. I updated the patch to hold
> rename_lock during multiple path walk. Also the rename path is updated
> to hold the lock during p9_client_rename operations.
>
> commit 79f6f20dbb70ad35db37b674957c95de20662a75
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date:   Wed Jun 30 15:37:58 2010 +0530
>
>    fs/9p: Prevent parallel rename when doing fid_lookup
>
>    During fid lookup we need to make sure that the dentry->d_parent doesn't
>    change so that we can safely walk the parent dentries. To ensure that
>    we need to prevent cross directory rename during fid_lookup. Add a
>    per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
>
>    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> diff --git a/fs/9p/fid.c b/fs/9p/fid.c
> index 5d6cfcb..7b387fe 100644
> --- a/fs/9p/fid.c
> +++ b/fs/9p/fid.c
> @@ -97,6 +97,34 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
>        return ret;
>  }
>
> +/*
> + * We need to hold v9ses->rename_lock as long as we hold references
> + * to returned path array. Array element contain pointers to
> + * dentry names.
> + */
> +static int build_path_from_dentry(struct v9fs_session_info *v9ses,
> +                                 struct dentry *dentry, char ***names)
> +{
> +       int n = 0, i;
> +       char **wnames;
> +       struct dentry *ds;
> +
> +       for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
> +               n++;
> +
> +       wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
> +       if (!wnames)
> +               goto err_out;
> +
> +       for (ds = dentry, i = (n-1); i >= 0; i--, ds = ds->d_parent)
> +               wnames[i] = (char  *)ds->d_name.name;
> +
> +       *names = wnames;
> +       return n;
> +err_out:
> +       return -ENOMEM;
> +}
> +
>  /**
>  * v9fs_fid_lookup - lookup for a fid, try to walk if not found
>  * @dentry: dentry to look for fid in
> @@ -112,7 +140,7 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
>        int i, n, l, clone, any, access;
>        u32 uid;
>        struct p9_fid *fid, *old_fid = NULL;
> -       struct dentry *d, *ds;
> +       struct dentry *ds;
>        struct v9fs_session_info *v9ses;
>        char **wnames, *uname;
>
> @@ -139,50 +167,63 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
>        fid = v9fs_fid_find(dentry, uid, any);
>        if (fid)
>                return fid;
> -
> +       /*
> +        * we don't have a matching fid. To do a TWALK we need
> +        * parent fid. We need to prevent rename when we want to
> +        * look at the parent.
> +        */
> +       read_lock(&v9ses->rename_lock);
>        ds = dentry->d_parent;
>        fid = v9fs_fid_find(ds, uid, any);
> -       if (!fid) { /* walk from the root */
> -               n = 0;
> -               for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
> -                       n++;
> -
> -               fid = v9fs_fid_find(ds, uid, any);
> -               if (!fid) { /* the user is not attached to the fs yet */
> -                       if (access == V9FS_ACCESS_SINGLE)
> -                               return ERR_PTR(-EPERM);
> -
> -                       if (v9fs_proto_dotu(v9ses) ||
> -                               v9fs_proto_dotl(v9ses))
> -                               uname = NULL;
> -                       else
> -                               uname = v9ses->uname;
> +       if (fid) {
> +               /* Found the parent fid do a lookup with that */
> +               fid = p9_client_walk(fid, 1, (char **)&ds->d_name.name, 1);
> +               read_unlock(&v9ses->rename_lock);
> +               return fid;
> +       }
> +       read_unlock(&v9ses->rename_lock);
>
> -                       fid = p9_client_attach(v9ses->clnt, NULL, uname, uid,
> -                               v9ses->aname);
> +       /* start from the root and try to do a lookup */
> +       fid = v9fs_fid_find(dentry->d_sb->s_root, uid, any);
> +       if (!fid) {
> +               /* the user is not attached to the fs yet */
> +               if (access == V9FS_ACCESS_SINGLE)
> +                       return ERR_PTR(-EPERM);
>
> -                       if (IS_ERR(fid))
> -                               return fid;
> +               if (v9fs_proto_dotu(v9ses) || v9fs_proto_dotl(v9ses))
> +                               uname = NULL;
> +               else
> +                       uname = v9ses->uname;
>
> -                       v9fs_fid_add(ds, fid);
> -               }
> -       } else /* walk from the parent */
> -               n = 1;
> +               fid = p9_client_attach(v9ses->clnt, NULL, uname, uid,
> +                                      v9ses->aname);
> +               if (IS_ERR(fid))
> +                       return fid;
>
> -       if (ds == dentry)
> +               v9fs_fid_add(dentry->d_sb->s_root, fid);
> +       }
> +       /* If we are root ourself just return that */
> +       if (dentry->d_sb->s_root == dentry)
>                return fid;
> -
> -       wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
> -       if (!wnames)
> -               return ERR_PTR(-ENOMEM);
> -
> -       for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
> -               wnames[i] = (char *) d->d_name.name;
> -
> +       /*
> +        * Do a multipath walk with attached root.
> +        * When walking parent we need to make sure we
> +        * don't have a parallel rename happening
> +        */
> +       read_lock(&v9ses->rename_lock);
> +       n  = build_path_from_dentry(v9ses, dentry, &wnames);
> +       if (n < 0) {
> +               fid = ERR_CAST(ERR_PTR(n));
> +               goto err_out;
> +       }
>        clone = 1;
>        i = 0;
>        while (i < n) {
>                l = min(n - i, P9_MAXWELEM);
> +               /*
> +                * We need to hold rename lock when doing a multipath
> +                * walk to ensure none of the patch component change
> +                */
>                fid = p9_client_walk(fid, l, &wnames[i], clone);
>                if (IS_ERR(fid)) {
>                        if (old_fid) {
> @@ -194,15 +235,16 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
>                                p9_client_clunk(old_fid);
>                        }
>                        kfree(wnames);
> -                       return fid;
> +                       goto err_out;
>                }
>                old_fid = fid;
>                i += l;
>                clone = 0;
>        }
> -
>        kfree(wnames);
>        v9fs_fid_add(dentry, fid);
> +err_out:
> +       read_unlock(&v9ses->rename_lock);
>        return fid;
>  }
>
> diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
> index 3c49201..b41bcef 100644
> --- a/fs/9p/v9fs.c
> +++ b/fs/9p/v9fs.c
> @@ -237,6 +237,7 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
>                __putname(v9ses->uname);
>                return ERR_PTR(-ENOMEM);
>        }
> +       rwlock_init(&v9ses->rename_lock);
>
>        rc = bdi_setup_and_register(&v9ses->bdi, "9p", BDI_CAP_MAP_COPY);
>        if (rc) {
> diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
> index bec4d0b..dee4f26 100644
> --- a/fs/9p/v9fs.h
> +++ b/fs/9p/v9fs.h
> @@ -104,6 +104,7 @@ struct v9fs_session_info {
>        struct p9_client *clnt; /* 9p client */
>        struct list_head slist; /* list of sessions registered with v9fs */
>        struct backing_dev_info bdi;
> +       rwlock_t rename_lock;
>  };
>
>  struct p9_fid *v9fs_session_init(struct v9fs_session_info *, const char *,
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 503a6a2..eae89ad 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -964,6 +964,7 @@ static struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
>
>        sb = dir->i_sb;
>        v9ses = v9fs_inode2v9ses(dir);
> +       /* We can walk d_parent because we hold the dir->i_mutex */
>        dfid = v9fs_fid_lookup(dentry->d_parent);
>        if (IS_ERR(dfid))
>                return ERR_CAST(dfid);
> @@ -1049,7 +1050,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>        struct p9_fid *olddirfid;
>        struct p9_fid *newdirfid;
>        struct p9_wstat wstat;
> -       int retval;
> +       int retval, cross_dir_rename = 0;
>
>        P9_DPRINTK(P9_DEBUG_VFS, "\n");
>        retval = 0;
> @@ -1070,6 +1071,9 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>                retval = PTR_ERR(newdirfid);
>                goto clunk_olddir;
>        }
> +       cross_dir_rename = (old_dentry->d_parent != new_dentry->d_parent);
> +       if (cross_dir_rename)
> +               write_lock(&v9ses->rename_lock);
>
>        if (v9fs_proto_dotl(v9ses)) {
>                retval = p9_client_rename(oldfid, newdirfid,
> @@ -1077,21 +1081,27 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>                if (retval != -ENOSYS)
>                        goto clunk_newdir;
>        }
> +       if (cross_dir_rename) {
> +               /*
> +                * 9P .u can only handle file rename in the same directory
> +                */
>
> -       /* 9P can only handle file rename in the same directory */
> -       if (memcmp(&olddirfid->qid, &newdirfid->qid, sizeof(newdirfid->qid))) {
>                P9_DPRINTK(P9_DEBUG_ERROR,
>                                "old dir and new dir are different\n");
>                retval = -EXDEV;
>                goto clunk_newdir;
>        }
> -
>        v9fs_blank_wstat(&wstat);
>        wstat.muid = v9ses->uname;
>        wstat.name = (char *) new_dentry->d_name.name;
>        retval = p9_client_wstat(oldfid, &wstat);
>
>  clunk_newdir:
> +       if (!retval)
> +               /* successful rename */
> +               d_move(old_dentry, new_dentry);
> +       if (cross_dir_rename)
> +               write_unlock(&v9ses->rename_lock);
>        p9_client_clunk(newdirfid);
>
>  clunk_olddir:
> diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
> index 0740675..3abc3ec 100644
> --- a/fs/9p/vfs_super.c
> +++ b/fs/9p/vfs_super.c
> @@ -287,4 +287,5 @@ struct file_system_type v9fs_fs_type = {
>        .get_sb = v9fs_get_sb,
>        .kill_sb = v9fs_kill_super,
>        .owner = THIS_MODULE,
> +       .fs_flags = FS_RENAME_DOES_D_MOVE,
>  };
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Sprint
> What will you do first with EVO, the first 4G phone?
> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
> _______________________________________________
> V9fs-developer mailing list
> V9fs-developer@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/v9fs-developer
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for  2.6.35-rc2
  2010-06-30 18:16               ` Latchesar Ionkov
@ 2010-06-30 18:31                 ` Linus Torvalds
  2010-06-30 18:33                 ` Aneesh Kumar K. V
  1 sibling, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2010-06-30 18:31 UTC (permalink / raw)
  To: Latchesar Ionkov
  Cc: Aneesh Kumar K. V, Eric Van Hensbergen, V9FS Developers, Al Viro,
	linux-kernel

On Wed, Jun 30, 2010 at 11:16 AM, Latchesar Ionkov <lucho@ionkov.net> wrote:
>
> I think that you need to use the s_vfs_rename_mutex in the super_block
> struct instead of introducing a new rename_lock in the v9fs session.

I actually think it's better to avoid having filesystems muck around
with VFS locking details. Also, I think we get the VFS rename mutex
only for cross-directory renames, and as mentioned, 9p needs locking
even for regular directory renames.

(Also, this way you can have parallel readers - although we could
obviously change the vfs rename mutex into a rw-sem too).

So I do think that keeping the logic private to a 9p-specific lock is
the right solution here.

                 Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-30 18:16               ` Latchesar Ionkov
  2010-06-30 18:31                 ` Linus Torvalds
@ 2010-06-30 18:33                 ` Aneesh Kumar K. V
  1 sibling, 0 replies; 16+ messages in thread
From: Aneesh Kumar K. V @ 2010-06-30 18:33 UTC (permalink / raw)
  To: Latchesar Ionkov
  Cc: Linus Torvalds, Eric Van Hensbergen, V9FS Developers, Al Viro,
	linux-kernel

On Wed, 30 Jun 2010 12:16:15 -0600, Latchesar Ionkov <lucho@ionkov.net> wrote:
> I think that you need to use the s_vfs_rename_mutex in the super_block
> struct instead of introducing a new rename_lock in the v9fs session.
> 

I guess that will have lock dependency issue with inode->i_mutex. Also as Linus
suggested with dentry name copied we can avoid holding the lock when
doing 9P operation.
 
-aneesh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2
  2010-06-30 16:48               ` Linus Torvalds
@ 2010-06-30 18:58                 ` Aneesh Kumar K. V
  0 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K. V @ 2010-06-30 18:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Van Hensbergen, V9FS Developers, Al Viro, linux-kernel

On Wed, 30 Jun 2010 09:48:15 -0700, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, Jun 30, 2010 at 4:52 AM, Aneesh Kumar K. V
> <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >
> > You are correct. we cannot drop the rename lock in between. I also found
> > another issue in that we are using dentry->d_name.name directly. That
> > would imply we need to hold the rename_lock even during the
> > client_walk.
> 
> Yes.
> 
> > How about the patch below ?. I updated the patch to hold
> > rename_lock during multiple path walk. Also the rename path is updated
> > to hold the lock during p9_client_rename operations.
> 
> I'm not finding any obvious problems with this, and you're right that
> you also need to hold the rename write-lock even for regular renames
> (not just cross-directory ones) in order to protect the name itself.
> So those two patches together seem to be ok at a quick glance.
> 
> That said, I do wonder if you wouldn't be better off copying the name
> components in order to then drop the lock earlier. That way you'd only
> need to hold the lock while walking the dentries, and could possibly
> release it during the actual walk (unless you need the names to be
> stable, but I don't think you can rely on that anyway, since other
> clients might be doing renames concurrently.. I don't know)
> 
>                        Linus


Updated patch

>From 0bae321401e88d495ea7e9e96272c11c1b9b9ec3 Mon Sep 17 00:00:00 2001
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Wed, 30 Jun 2010 19:18:50 +0530
Subject: [PATCH] fs/9p: Prevent parallel rename when doing fid_lookup

During fid lookup we need to make sure that the dentry->d_parent doesn't
change so that we can safely walk the parent dentries. To ensure that
we need to prevent cross directory rename during fid_lookup. Add a
per superblock rename_lock rwlock to prevent parallel fid lookup and rename.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/9p/fid.c       |  134 ++++++++++++++++++++++++++++++++++++++---------------
 fs/9p/v9fs.c      |    1 +
 fs/9p/v9fs.h      |    1 +
 fs/9p/vfs_inode.c |   14 ++++--
 fs/9p/vfs_super.c |    1 +
 5 files changed, 110 insertions(+), 41 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 5d6cfcb..7dcefaa 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -97,6 +97,48 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
 	return ret;
 }
 
+static int build_path_from_dentry(struct v9fs_session_info *v9ses,
+				  struct dentry *dentry, char ***names)
+{
+	int n = 0, i;
+	char **wnames;
+	struct dentry *ds;
+
+	read_lock(&v9ses->rename_lock);
+	for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
+		n++;
+
+	wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
+	if (!wnames) {
+		n = 0;
+		goto err_out;
+	}
+	for (ds = dentry, i = (n-1); i >= 0; i--, ds = ds->d_parent) {
+		wnames[i] = kmalloc(strlen(ds->d_name.name) + 1, GFP_KERNEL);
+		if (!wnames[i])
+			goto err_out;
+
+		strcpy(wnames[i], ds->d_name.name);
+	}
+	*names = wnames;
+	read_unlock(&v9ses->rename_lock);
+	return n;
+err_out:
+	read_unlock(&v9ses->rename_lock);
+	for (; i < n; i++)
+		kfree(wnames[i]);
+	kfree(wnames);
+	return -ENOMEM;
+}
+
+static void kfree_wnames(char **wnames, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		kfree(wnames[i]);
+	kfree(wnames);
+}
+
 /**
  * v9fs_fid_lookup - lookup for a fid, try to walk if not found
  * @dentry: dentry to look for fid in
@@ -112,7 +154,7 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	int i, n, l, clone, any, access;
 	u32 uid;
 	struct p9_fid *fid, *old_fid = NULL;
-	struct dentry *d, *ds;
+	struct dentry *ds;
 	struct v9fs_session_info *v9ses;
 	char **wnames, *uname;
 
@@ -139,46 +181,62 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	fid = v9fs_fid_find(dentry, uid, any);
 	if (fid)
 		return fid;
-
+	/*
+	 * we don't have a matching fid. To do a TWALK we need
+	 * parent fid. We need to prevent rename when we want to
+	 * look at the parent.
+	 */
+	read_lock(&v9ses->rename_lock);
 	ds = dentry->d_parent;
 	fid = v9fs_fid_find(ds, uid, any);
-	if (!fid) { /* walk from the root */
-		n = 0;
-		for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
-			n++;
-
-		fid = v9fs_fid_find(ds, uid, any);
-		if (!fid) { /* the user is not attached to the fs yet */
-			if (access == V9FS_ACCESS_SINGLE)
-				return ERR_PTR(-EPERM);
-
-			if (v9fs_proto_dotu(v9ses) ||
-				v9fs_proto_dotl(v9ses))
+	if (fid) {
+		/*
+		 * Found the parent fid do a lookup with that.
+		 * We want to do the walk without holding rename_lock
+		 */
+		char *dentry_name;
+		dentry_name = kmalloc(strlen(dentry->d_name.name) + 1,
+				      GFP_KERNEL);
+		if (dentry_name) {
+			strcpy(dentry_name, dentry->d_name.name);
+			read_unlock(&v9ses->rename_lock);
+			fid = p9_client_walk(fid, 1, (char **)&dentry_name, 1);
+			kfree(dentry_name);
+			goto dentry_out;
+		} else {
+			read_unlock(&v9ses->rename_lock);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+	read_unlock(&v9ses->rename_lock);
+	/* start from the root and try to do a lookup */
+	fid = v9fs_fid_find(dentry->d_sb->s_root, uid, any);
+	if (!fid) {
+		/* the user is not attached to the fs yet */
+		if (access == V9FS_ACCESS_SINGLE)
+			return ERR_PTR(-EPERM);
+
+		if (v9fs_proto_dotu(v9ses) || v9fs_proto_dotl(v9ses))
 				uname = NULL;
-			else
-				uname = v9ses->uname;
-
-			fid = p9_client_attach(v9ses->clnt, NULL, uname, uid,
-				v9ses->aname);
+		else
+			uname = v9ses->uname;
 
-			if (IS_ERR(fid))
-				return fid;
-
-			v9fs_fid_add(ds, fid);
-		}
-	} else /* walk from the parent */
-		n = 1;
+		fid = p9_client_attach(v9ses->clnt, NULL, uname, uid,
+				       v9ses->aname);
+		if (IS_ERR(fid))
+			return fid;
 
-	if (ds == dentry)
+		v9fs_fid_add(dentry->d_sb->s_root, fid);
+	}
+	/* If we are root just return the fid */
+	if (dentry->d_sb->s_root == dentry)
 		return fid;
 
-	wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
-	if (!wnames)
-		return ERR_PTR(-ENOMEM);
-
-	for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
-		wnames[i] = (char *) d->d_name.name;
-
+	n  = build_path_from_dentry(v9ses, dentry, &wnames);
+	if (n < 0) {
+		fid = ERR_PTR(n);
+		goto err_out;
+	}
 	clone = 1;
 	i = 0;
 	while (i < n) {
@@ -193,16 +251,18 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 				 */
 				p9_client_clunk(old_fid);
 			}
-			kfree(wnames);
-			return fid;
+			kfree_wnames(wnames, n);
+			goto err_out;
 		}
 		old_fid = fid;
 		i += l;
 		clone = 0;
 	}
+	kfree_wnames(wnames, n);
 
-	kfree(wnames);
+dentry_out:
 	v9fs_fid_add(dentry, fid);
+err_out:
 	return fid;
 }
 
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 3c49201..b41bcef 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -237,6 +237,7 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
 		__putname(v9ses->uname);
 		return ERR_PTR(-ENOMEM);
 	}
+	rwlock_init(&v9ses->rename_lock);
 
 	rc = bdi_setup_and_register(&v9ses->bdi, "9p", BDI_CAP_MAP_COPY);
 	if (rc) {
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index bec4d0b..dee4f26 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -104,6 +104,7 @@ struct v9fs_session_info {
 	struct p9_client *clnt;	/* 9p client */
 	struct list_head slist; /* list of sessions registered with v9fs */
 	struct backing_dev_info bdi;
+	rwlock_t rename_lock;
 };
 
 struct p9_fid *v9fs_session_init(struct v9fs_session_info *, const char *,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 503a6a2..97fee13 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -964,6 +964,7 @@ static struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 
 	sb = dir->i_sb;
 	v9ses = v9fs_inode2v9ses(dir);
+	/* We can walk d_parent because we hold the dir->i_mutex */
 	dfid = v9fs_fid_lookup(dentry->d_parent);
 	if (IS_ERR(dfid))
 		return ERR_CAST(dfid);
@@ -1070,28 +1071,33 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		retval = PTR_ERR(newdirfid);
 		goto clunk_olddir;
 	}
-
 	if (v9fs_proto_dotl(v9ses)) {
 		retval = p9_client_rename(oldfid, newdirfid,
 					(char *) new_dentry->d_name.name);
 		if (retval != -ENOSYS)
 			goto clunk_newdir;
 	}
+	if (old_dentry->d_parent != new_dentry->d_parent) {
+		/*
+		 * 9P .u can only handle file rename in the same directory
+		 */
 
-	/* 9P can only handle file rename in the same directory */
-	if (memcmp(&olddirfid->qid, &newdirfid->qid, sizeof(newdirfid->qid))) {
 		P9_DPRINTK(P9_DEBUG_ERROR,
 				"old dir and new dir are different\n");
 		retval = -EXDEV;
 		goto clunk_newdir;
 	}
-
 	v9fs_blank_wstat(&wstat);
 	wstat.muid = v9ses->uname;
 	wstat.name = (char *) new_dentry->d_name.name;
 	retval = p9_client_wstat(oldfid, &wstat);
 
 clunk_newdir:
+	write_lock(&v9ses->rename_lock);
+	if (!retval)
+		/* successful rename */
+		d_move(old_dentry, new_dentry);
+	write_unlock(&v9ses->rename_lock);
 	p9_client_clunk(newdirfid);
 
 clunk_olddir:
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 0740675..3abc3ec 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -287,4 +287,5 @@ struct file_system_type v9fs_fs_type = {
 	.get_sb = v9fs_get_sb,
 	.kill_sb = v9fs_kill_super,
 	.owner = THIS_MODULE,
+	.fs_flags = FS_RENAME_DOES_D_MOVE,
 };
-- 
1.7.2.rc0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-06-30 18:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-07 20:02 [GIT PULL] 9p file system bug fixes for 2.6.35-rc2 Eric Van Hensbergen
2010-06-08  0:08 ` Linus Torvalds
2010-06-08  0:41   ` Al Viro
2010-06-08  0:48     ` Linus Torvalds
2010-06-16 16:42     ` [V9fs-developer] " Aneesh Kumar K. V
2010-06-24 16:30       ` Aneesh Kumar K. V
2010-06-29 20:18         ` Eric Van Hensbergen
2010-06-29 20:38           ` Linus Torvalds
2010-06-30 11:52             ` Aneesh Kumar K. V
2010-06-30 16:48               ` Linus Torvalds
2010-06-30 18:58                 ` Aneesh Kumar K. V
2010-06-30 18:16               ` Latchesar Ionkov
2010-06-30 18:31                 ` Linus Torvalds
2010-06-30 18:33                 ` Aneesh Kumar K. V
2010-06-30 12:55         ` Aneesh Kumar K. V
2010-06-08 14:29   ` Venkateswararao Jujjuri (JV)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).