All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH]autofs4: hang and proposed fix
@ 2005-11-16 10:17 Ram Pai
  2005-11-16 12:41 ` [autofs] " Ian Kent
                   ` (3 more replies)
  0 siblings, 4 replies; 95+ messages in thread
From: Ram Pai @ 2005-11-16 10:17 UTC (permalink / raw)
  To: autofs; +Cc: linux-fsdevel, linuxram

Autofs4 assumes that its ->revalidate() function gets called with the
parent_dentry's_inode_semaphore released. This is true mostly
but not in one particular case.

Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
does not exist. It creates a dentry and adds to the cache. Releases
the parent's inode's semaphore and than calls ->revalidate().

Process P2 meanwhile comes in and cached_lookup() gets called. It finds
the dentry in the cache and finds ->revalidate() function exists. So
it calls ->revalidate() holding the parent's inode's semaphore.

Now the automounter daemon comes in and tries to hold the same semaphore
in order to mount. But since the semaphore is held by P2 it
goes to sleep.

Process P1 and P2 continue waiting for the mount to complete and it never
happens. Deadlock.

The stack of the deadlock is as follows:

ls            S 00000000     0 13049  11954                     (NOTLB)
f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
Call Trace:
[<c0215207>] autofs4_wait+0x307/0x3d0
[<c02141d3>] try_to_fill_dentry+0xf3/0x150
[<c0214389>] autofs4_revalidate+0x159/0x170
[<c02144e0>] autofs4_lookup+0x110/0x150
[<c016f3f5>] __lookup_hash+0x85/0xb0
[<c016f42a>] lookup_hash+0xa/0x10
[<c016f483>] lookup_one_len+0x53/0x70
[<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
[<c0172fcb>] vfs_readdir+0x8b/0xa0
[<c01733b3>] sys_getdents64+0x63/0xb5
[<c010464d>] syscall_call+0x7/0xb

ls            S C011B1AF     0 13050  11898                     (NOTLB)
f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080
06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70
f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000
Call Trace:
[<c0215207>] autofs4_wait+0x307/0x3d0
[<c02141d3>] try_to_fill_dentry+0xf3/0x150
[<c0214389>] autofs4_revalidate+0x159/0x170
[<c016dc77>] cached_lookup+0x47/0x80
[<c016f3ca>] __lookup_hash+0x5a/0xb0
[<c016f42a>] lookup_hash+0xa/0x10
[<c016f483>] lookup_one_len+0x53/0x70  
[<f88512e3>] stubfs_readdir+0x163/0x170 [stubfs]
[<c0172fcb>] vfs_readdir+0x8b/0xa0  
[<c01733b3>] sys_getdents64+0x63/0xb5
[<c010464d>] syscall_call+0x7/0xb

automount     D 00000010     0 13052  13016                     (NOTLB)
f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34
f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70
f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001
Call Trace:
[<c03b4cd3>] __down+0x83/0xe0
[<c03b3632>] __down_failed+0xa/0x10
[<c0171e6d>] .text.lock.namei+0xeb/0x1de
[<c0170482>] sys_mkdir+0x52/0xd0
[<c010464d>] syscall_call+0x7/0xb
BUG: soft lockup detected on CPU#0!


I have coded up a tentative fix. The patch releases the semaphore in
->revalidate() function, instead of the caller of that function.  Not
sure if this is the right fix. Tested it and verified that the deadlock
is fixed.  But I am not sure if it opens up other bugs. Please validate.


 fs/autofs4/root.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

Index: 2.6.15-rc1/fs/autofs4/root.c
===================================================================
--- 2.6.15-rc1.orig/fs/autofs4/root.c
+++ 2.6.15-rc1/fs/autofs4/root.c
@@ -386,40 +386,47 @@ static int autofs4_revalidate(struct den
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
 	int oz_mode = autofs4_oz_mode(sbi);
 	int flags = nd ? nd->flags : 0;
 	int status = 1;
 
+	up(&dir->i_sem);
 	/* Pending dentry */
 	if (autofs4_ispending(dentry)) {
 		if (!oz_mode)
-			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
-		return status;
+			status = try_to_fill_dentry(dentry, dir->i_sb,
+					sbi, flags);
+		goto out;
 	}
 
 	/* Negative dentry.. invalidate if "old" */
-	if (dentry->d_inode == NULL)
-		return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
+	if (dentry->d_inode == NULL) {
+		status = (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
+		goto out;
+	}
 
 	/* Check for a non-mountpoint directory with no contents */
 	spin_lock(&dcache_lock);
 	if (S_ISDIR(dentry->d_inode->i_mode) &&
 	    !d_mountpoint(dentry) && 
 	    list_empty(&dentry->d_subdirs)) {
 		DPRINTK("dentry=%p %.*s, emptydir",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 		spin_unlock(&dcache_lock);
 		if (!oz_mode)
-			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
-		return status;
+			status = try_to_fill_dentry(dentry, dir->i_sb, sbi,
+					flags);
+		goto out;
 	}
 	spin_unlock(&dcache_lock);
 
 	/* Update the usage list */
 	if (!oz_mode)
 		autofs4_update_usage(dentry);
 
-	return 1;
+out:
+	down(&dir->i_sem);
+	return status;
 }
 
 static void autofs4_dentry_release(struct dentry *de)
 {
 	struct autofs_info *inf;
@@ -485,15 +492,12 @@ static struct dentry *autofs4_lookup(str
 		spin_unlock(&dentry->d_lock);
 	}
 	dentry->d_fsdata = NULL;
 	d_add(dentry, NULL);
 
-	if (dentry->d_op && dentry->d_op->d_revalidate) {
-		up(&dir->i_sem);
+	if (dentry->d_op && dentry->d_op->d_revalidate)
 		(dentry->d_op->d_revalidate)(dentry, nd);
-		down(&dir->i_sem);
-	}
 
 	/*
 	 * If we are still pending, check if we had to handle
 	 * a signal. If so we can force a restart..
 	 */

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 10:17 [RFC PATCH]autofs4: hang and proposed fix Ram Pai
@ 2005-11-16 12:41 ` Ian Kent
  2005-11-16 16:50   ` Ram Pai
  2005-11-16 15:22   ` Jeff Moyer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-16 12:41 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs, linux-fsdevel

On Wed, 16 Nov 2005, Ram Pai wrote:

Thanks for you effort Ram.

> Autofs4 assumes that its ->revalidate() function gets called with the
> parent_dentry's_inode_semaphore released. This is true mostly
> but not in one particular case.

Yep. Certainly does.

Isn't my mistake not noticing that the inode semaphore is taken in 
vfs_readdir?

It's been like that all along and I can't understand how I didn't notice 
it before.

Help me out a bit here please Ram.

Aren't there other paths that enter revalidate without holding the 
semaphore? chdir?

Does uping an open semaphore allow other undesirable side affects?

Do you think it would perhaps be better to release the semaphore in 
autofs4_readdir ... hang on the stack trace doesn't look like a readdir 
... I'll have to check 2.6.15-rc1 ... ?

Apart from the above, looking at the patch and assuming that the semaphore 
is always held it would probaby be better to move semaphore open/close 
into try_to_fill_dentry as any control process using autofs, such as automount, 
must never cause a mount wait to be called (oz_mode = 1).

Ideas?

> 
> Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
> does not exist. It creates a dentry and adds to the cache. Releases
> the parent's inode's semaphore and than calls ->revalidate().
> 
> Process P2 meanwhile comes in and cached_lookup() gets called. It finds
> the dentry in the cache and finds ->revalidate() function exists. So
> it calls ->revalidate() holding the parent's inode's semaphore.
> 
> Now the automounter daemon comes in and tries to hold the same semaphore
> in order to mount. But since the semaphore is held by P2 it
> goes to sleep.
> 
> Process P1 and P2 continue waiting for the mount to complete and it never
> happens. Deadlock.
> 
> The stack of the deadlock is as follows:
> 
> ls            S 00000000     0 13049  11954                     (NOTLB)
> f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
> f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
> Call Trace:
> [<c0215207>] autofs4_wait+0x307/0x3d0
> [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> [<c0214389>] autofs4_revalidate+0x159/0x170
> [<c02144e0>] autofs4_lookup+0x110/0x150
> [<c016f3f5>] __lookup_hash+0x85/0xb0
> [<c016f42a>] lookup_hash+0xa/0x10
> [<c016f483>] lookup_one_len+0x53/0x70
> [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
> [<c0172fcb>] vfs_readdir+0x8b/0xa0
> [<c01733b3>] sys_getdents64+0x63/0xb5
> [<c010464d>] syscall_call+0x7/0xb
> 
> ls            S C011B1AF     0 13050  11898                     (NOTLB)
> f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080
> 06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70
> f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000
> Call Trace:
> [<c0215207>] autofs4_wait+0x307/0x3d0
> [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> [<c0214389>] autofs4_revalidate+0x159/0x170
> [<c016dc77>] cached_lookup+0x47/0x80
> [<c016f3ca>] __lookup_hash+0x5a/0xb0
> [<c016f42a>] lookup_hash+0xa/0x10
> [<c016f483>] lookup_one_len+0x53/0x70  
> [<f88512e3>] stubfs_readdir+0x163/0x170 [stubfs]
> [<c0172fcb>] vfs_readdir+0x8b/0xa0  
> [<c01733b3>] sys_getdents64+0x63/0xb5
> [<c010464d>] syscall_call+0x7/0xb
> 
> automount     D 00000010     0 13052  13016                     (NOTLB)
> f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34
> f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70
> f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001
> Call Trace:
> [<c03b4cd3>] __down+0x83/0xe0
> [<c03b3632>] __down_failed+0xa/0x10
> [<c0171e6d>] .text.lock.namei+0xeb/0x1de
> [<c0170482>] sys_mkdir+0x52/0xd0
> [<c010464d>] syscall_call+0x7/0xb
> BUG: soft lockup detected on CPU#0!
> 
> 
> I have coded up a tentative fix. The patch releases the semaphore in
> ->revalidate() function, instead of the caller of that function.  Not
> sure if this is the right fix. Tested it and verified that the deadlock
> is fixed.  But I am not sure if it opens up other bugs. Please validate.
> 
> 
>  fs/autofs4/root.c |   26 +++++++++++++++-----------
>  1 files changed, 15 insertions(+), 11 deletions(-)
> 
> Index: 2.6.15-rc1/fs/autofs4/root.c
> ===================================================================
> --- 2.6.15-rc1.orig/fs/autofs4/root.c
> +++ 2.6.15-rc1/fs/autofs4/root.c
> @@ -386,40 +386,47 @@ static int autofs4_revalidate(struct den
>  	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
>  	int oz_mode = autofs4_oz_mode(sbi);
>  	int flags = nd ? nd->flags : 0;
>  	int status = 1;
>  
> +	up(&dir->i_sem);
>  	/* Pending dentry */
>  	if (autofs4_ispending(dentry)) {
>  		if (!oz_mode)
> -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> -		return status;
> +			status = try_to_fill_dentry(dentry, dir->i_sb,
> +					sbi, flags);
> +		goto out;
>  	}
>  
>  	/* Negative dentry.. invalidate if "old" */
> -	if (dentry->d_inode == NULL)
> -		return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> +	if (dentry->d_inode == NULL) {
> +		status = (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> +		goto out;
> +	}
>  
>  	/* Check for a non-mountpoint directory with no contents */
>  	spin_lock(&dcache_lock);
>  	if (S_ISDIR(dentry->d_inode->i_mode) &&
>  	    !d_mountpoint(dentry) && 
>  	    list_empty(&dentry->d_subdirs)) {
>  		DPRINTK("dentry=%p %.*s, emptydir",
>  			 dentry, dentry->d_name.len, dentry->d_name.name);
>  		spin_unlock(&dcache_lock);
>  		if (!oz_mode)
> -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> -		return status;
> +			status = try_to_fill_dentry(dentry, dir->i_sb, sbi,
> +					flags);
> +		goto out;
>  	}
>  	spin_unlock(&dcache_lock);
>  
>  	/* Update the usage list */
>  	if (!oz_mode)
>  		autofs4_update_usage(dentry);
>  
> -	return 1;
> +out:
> +	down(&dir->i_sem);
> +	return status;
>  }
>  
>  static void autofs4_dentry_release(struct dentry *de)
>  {
>  	struct autofs_info *inf;
> @@ -485,15 +492,12 @@ static struct dentry *autofs4_lookup(str
>  		spin_unlock(&dentry->d_lock);
>  	}
>  	dentry->d_fsdata = NULL;
>  	d_add(dentry, NULL);
>  
> -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> -		up(&dir->i_sem);
> +	if (dentry->d_op && dentry->d_op->d_revalidate)
>  		(dentry->d_op->d_revalidate)(dentry, nd);
> -		down(&dir->i_sem);
> -	}
>  
>  	/*
>  	 * If we are still pending, check if we had to handle
>  	 * a signal. If so we can force a restart..
>  	 */
> 
> _______________________________________________
> autofs mailing list
> autofs@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/autofs
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 10:17 [RFC PATCH]autofs4: hang and proposed fix Ram Pai
@ 2005-11-16 15:22   ` Jeff Moyer
  2005-11-16 15:22   ` Jeff Moyer
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 95+ messages in thread
From: Jeff Moyer @ 2005-11-16 15:22 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs, linux-fsdevel

==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:

ram> Autofs4 assumes that its ->revalidate() function gets called with the
ram> parent_dentry's_inode_semaphore released. This is true mostly
ram> but not in one particular case.

ram> Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
ram> does not exist. It creates a dentry and adds to the cache. Releases
ram> the parent's inode's semaphore and than calls ->revalidate().

ram> Process P2 meanwhile comes in and cached_lookup() gets called. It finds
ram> the dentry in the cache and finds ->revalidate() function exists. So
ram> it calls ->revalidate() holding the parent's inode's semaphore.

ram> Now the automounter daemon comes in and tries to hold the same semaphore
ram> in order to mount. But since the semaphore is held by P2 it
ram> goes to sleep.

ram> Process P1 and P2 continue waiting for the mount to complete and it never
ram> happens. Deadlock.

ram> The stack of the deadlock is as follows:

ram> ls            S 00000000     0 13049  11954                     (NOTLB)
ram> f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ram> 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
ram> f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
ram> Call Trace:
ram> [<c0215207>] autofs4_wait+0x307/0x3d0
ram> [<c02141d3>] try_to_fill_dentry+0xf3/0x150
ram> [<c0214389>] autofs4_revalidate+0x159/0x170
ram> [<c02144e0>] autofs4_lookup+0x110/0x150
ram> [<c016f3f5>] __lookup_hash+0x85/0xb0
ram> [<c016f42a>] lookup_hash+0xa/0x10
ram> [<c016f483>] lookup_one_len+0x53/0x70
ram> [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]

What's stubfs?

-Jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
@ 2005-11-16 15:22   ` Jeff Moyer
  0 siblings, 0 replies; 95+ messages in thread
From: Jeff Moyer @ 2005-11-16 15:22 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs, linux-fsdevel

==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:

ram> Autofs4 assumes that its ->revalidate() function gets called with the
ram> parent_dentry's_inode_semaphore released. This is true mostly
ram> but not in one particular case.

ram> Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
ram> does not exist. It creates a dentry and adds to the cache. Releases
ram> the parent's inode's semaphore and than calls ->revalidate().

ram> Process P2 meanwhile comes in and cached_lookup() gets called. It finds
ram> the dentry in the cache and finds ->revalidate() function exists. So
ram> it calls ->revalidate() holding the parent's inode's semaphore.

ram> Now the automounter daemon comes in and tries to hold the same semaphore
ram> in order to mount. But since the semaphore is held by P2 it
ram> goes to sleep.

ram> Process P1 and P2 continue waiting for the mount to complete and it never
ram> happens. Deadlock.

ram> The stack of the deadlock is as follows:

ram> ls            S 00000000     0 13049  11954                     (NOTLB)
ram> f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ram> 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
ram> f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
ram> Call Trace:
ram> [<c0215207>] autofs4_wait+0x307/0x3d0
ram> [<c02141d3>] try_to_fill_dentry+0xf3/0x150
ram> [<c0214389>] autofs4_revalidate+0x159/0x170
ram> [<c02144e0>] autofs4_lookup+0x110/0x150
ram> [<c016f3f5>] __lookup_hash+0x85/0xb0
ram> [<c016f42a>] lookup_hash+0xa/0x10
ram> [<c016f483>] lookup_one_len+0x53/0x70
ram> [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]

What's stubfs?

-Jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 12:41 ` [autofs] " Ian Kent
@ 2005-11-16 16:50   ` Ram Pai
  2005-11-16 22:57     ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Ram Pai @ 2005-11-16 16:50 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-fsdevel

On Wed, 2005-11-16 at 04:41, Ian Kent wrote:
> On Wed, 16 Nov 2005, Ram Pai wrote:
> 
> Thanks for you effort Ram.
> 
> > Autofs4 assumes that its ->revalidate() function gets called with the
> > parent_dentry's_inode_semaphore released. This is true mostly
> > but not in one particular case.
> 
> Yep. Certainly does.
> 
> Isn't my mistake not noticing that the inode semaphore is taken in 
> vfs_readdir?
> 
> It's been like that all along and I can't understand how I didn't notice 
> it before.
> 
> Help me out a bit here please Ram.
> 
> Aren't there other paths that enter revalidate without holding the 
> semaphore? chdir?

Looking at the code and it seemed to me that ->revalidate() function is
always with the semaphore held. Atleast VFS seem to have that
assumption.

And looking at the autofs4 code, I  get the impression, that it
assumes that the semaphore is released when it gets called. Which seems
to be inconsistent and wrong.

> 
> Does uping an open semaphore allow other undesirable side affects?
> 
> Do you think it would perhaps be better to release the semaphore in 
> autofs4_readdir ... hang on the stack trace doesn't look like a readdir 
> ... I'll have to check 2.6.15-rc1 ... ?
> 

One automounter process is in sys_mkdir() system call and the other if I
recollect correctly was in sys_getdent64() system call.

Yes this problem can be demonstrated on all versions of the 2.6 kernel.
Infact I reproduced it on 2.6.15-rc1 kernel.

> Apart from the above, looking at the patch and assuming that the semaphore 
> is always held it would probaby be better to move semaphore open/close 
> into try_to_fill_dentry as any control process using autofs, such as automount, 
> must never cause a mount wait to be called (oz_mode = 1).



> 
> Ideas?

One thing is sure, VFS assumes that the semaphore is held when
->revalidate() is called, and that convention is followed religiously by
all filesytems.  autofs4 has this special case of releasing the
semaphore if it is waiting to be woken up by the automounter daemon. So
as you said, may be the semaphore must be released just before sleeping
on the waitq?

RP

> 
> > 
> > Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
> > does not exist. It creates a dentry and adds to the cache. Releases
> > the parent's inode's semaphore and than calls ->revalidate().
> > 
> > Process P2 meanwhile comes in and cached_lookup() gets called. It finds
> > the dentry in the cache and finds ->revalidate() function exists. So
> > it calls ->revalidate() holding the parent's inode's semaphore.
> > 
> > Now the automounter daemon comes in and tries to hold the same semaphore
> > in order to mount. But since the semaphore is held by P2 it
> > goes to sleep.
> > 
> > Process P1 and P2 continue waiting for the mount to complete and it never
> > happens. Deadlock.
> > 
> > The stack of the deadlock is as follows:
> > 
> > ls            S 00000000     0 13049  11954                     (NOTLB)
> > f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
> > f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
> > Call Trace:
> > [<c0215207>] autofs4_wait+0x307/0x3d0
> > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > [<c0214389>] autofs4_revalidate+0x159/0x170
> > [<c02144e0>] autofs4_lookup+0x110/0x150
> > [<c016f3f5>] __lookup_hash+0x85/0xb0
> > [<c016f42a>] lookup_hash+0xa/0x10
> > [<c016f483>] lookup_one_len+0x53/0x70
> > [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
> > [<c0172fcb>] vfs_readdir+0x8b/0xa0
> > [<c01733b3>] sys_getdents64+0x63/0xb5
> > [<c010464d>] syscall_call+0x7/0xb
> > 
> > ls            S C011B1AF     0 13050  11898                     (NOTLB)
> > f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080
> > 06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70
> > f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000
> > Call Trace:
> > [<c0215207>] autofs4_wait+0x307/0x3d0
> > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > [<c0214389>] autofs4_revalidate+0x159/0x170
> > [<c016dc77>] cached_lookup+0x47/0x80
> > [<c016f3ca>] __lookup_hash+0x5a/0xb0
> > [<c016f42a>] lookup_hash+0xa/0x10
> > [<c016f483>] lookup_one_len+0x53/0x70  
> > [<f88512e3>] stubfs_readdir+0x163/0x170 [stubfs]
> > [<c0172fcb>] vfs_readdir+0x8b/0xa0  
> > [<c01733b3>] sys_getdents64+0x63/0xb5
> > [<c010464d>] syscall_call+0x7/0xb
> > 
> > automount     D 00000010     0 13052  13016                     (NOTLB)
> > f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34
> > f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70
> > f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001
> > Call Trace:
> > [<c03b4cd3>] __down+0x83/0xe0
> > [<c03b3632>] __down_failed+0xa/0x10
> > [<c0171e6d>] .text.lock.namei+0xeb/0x1de
> > [<c0170482>] sys_mkdir+0x52/0xd0
> > [<c010464d>] syscall_call+0x7/0xb
> > BUG: soft lockup detected on CPU#0!
> > 
> > 
> > I have coded up a tentative fix. The patch releases the semaphore in
> > ->revalidate() function, instead of the caller of that function.  Not
> > sure if this is the right fix. Tested it and verified that the deadlock
> > is fixed.  But I am not sure if it opens up other bugs. Please validate.
> > 
> > 
> >  fs/autofs4/root.c |   26 +++++++++++++++-----------
> >  1 files changed, 15 insertions(+), 11 deletions(-)
> > 
> > Index: 2.6.15-rc1/fs/autofs4/root.c
> > ===================================================================
> > --- 2.6.15-rc1.orig/fs/autofs4/root.c
> > +++ 2.6.15-rc1/fs/autofs4/root.c
> > @@ -386,40 +386,47 @@ static int autofs4_revalidate(struct den
> >  	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
> >  	int oz_mode = autofs4_oz_mode(sbi);
> >  	int flags = nd ? nd->flags : 0;
> >  	int status = 1;
> >  
> > +	up(&dir->i_sem);
> >  	/* Pending dentry */
> >  	if (autofs4_ispending(dentry)) {
> >  		if (!oz_mode)
> > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > -		return status;
> > +			status = try_to_fill_dentry(dentry, dir->i_sb,
> > +					sbi, flags);
> > +		goto out;
> >  	}
> >  
> >  	/* Negative dentry.. invalidate if "old" */
> > -	if (dentry->d_inode == NULL)
> > -		return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > +	if (dentry->d_inode == NULL) {
> > +		status = (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > +		goto out;
> > +	}
> >  
> >  	/* Check for a non-mountpoint directory with no contents */
> >  	spin_lock(&dcache_lock);
> >  	if (S_ISDIR(dentry->d_inode->i_mode) &&
> >  	    !d_mountpoint(dentry) && 
> >  	    list_empty(&dentry->d_subdirs)) {
> >  		DPRINTK("dentry=%p %.*s, emptydir",
> >  			 dentry, dentry->d_name.len, dentry->d_name.name);
> >  		spin_unlock(&dcache_lock);
> >  		if (!oz_mode)
> > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > -		return status;
> > +			status = try_to_fill_dentry(dentry, dir->i_sb, sbi,
> > +					flags);
> > +		goto out;
> >  	}
> >  	spin_unlock(&dcache_lock);
> >  
> >  	/* Update the usage list */
> >  	if (!oz_mode)
> >  		autofs4_update_usage(dentry);
> >  
> > -	return 1;
> > +out:
> > +	down(&dir->i_sem);
> > +	return status;
> >  }
> >  
> >  static void autofs4_dentry_release(struct dentry *de)
> >  {
> >  	struct autofs_info *inf;
> > @@ -485,15 +492,12 @@ static struct dentry *autofs4_lookup(str
> >  		spin_unlock(&dentry->d_lock);
> >  	}
> >  	dentry->d_fsdata = NULL;
> >  	d_add(dentry, NULL);
> >  
> > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > -		up(&dir->i_sem);
> > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > -		down(&dir->i_sem);
> > -	}
> >  
> >  	/*
> >  	 * If we are still pending, check if we had to handle
> >  	 * a signal. If so we can force a restart..
> >  	 */
> > 
> > _______________________________________________
> > autofs mailing list
> > autofs@linux.kernel.org
> > http://linux.kernel.org/mailman/listinfo/autofs
> > 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 15:22   ` Jeff Moyer
  (?)
@ 2005-11-16 17:00   ` Ram Pai
  2005-11-16 18:25     ` Jeff Moyer
  -1 siblings, 1 reply; 95+ messages in thread
From: Ram Pai @ 2005-11-16 17:00 UTC (permalink / raw)
  To: jmoyer; +Cc: autofs, linux-fsdevel

On Wed, 2005-11-16 at 07:22, Jeff Moyer wrote:
> ==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:
> 
> ram> Autofs4 assumes that its ->revalidate() function gets called with the
> ram> parent_dentry's_inode_semaphore released. This is true mostly
> ram> but not in one particular case.
> 
> ram> Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
> ram> does not exist. It creates a dentry and adds to the cache. Releases
> ram> the parent's inode's semaphore and than calls ->revalidate().
> 
> ram> Process P2 meanwhile comes in and cached_lookup() gets called. It finds
> ram> the dentry in the cache and finds ->revalidate() function exists. So
> ram> it calls ->revalidate() holding the parent's inode's semaphore.
> 
> ram> Now the automounter daemon comes in and tries to hold the same semaphore
> ram> in order to mount. But since the semaphore is held by P2 it
> ram> goes to sleep.
> 
> ram> Process P1 and P2 continue waiting for the mount to complete and it never
> ram> happens. Deadlock.
> 
> ram> The stack of the deadlock is as follows:
> 
> ram> ls            S 00000000     0 13049  11954                     (NOTLB)
> ram> f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> ram> 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
> ram> f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
> ram> Call Trace:
> ram> [<c0215207>] autofs4_wait+0x307/0x3d0
> ram> [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> ram> [<c0214389>] autofs4_revalidate+0x159/0x170
> ram> [<c02144e0>] autofs4_lookup+0x110/0x150
> ram> [<c016f3f5>] __lookup_hash+0x85/0xb0
> ram> [<c016f42a>] lookup_hash+0xa/0x10
> ram> [<c016f483>] lookup_one_len+0x53/0x70
> ram> [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
> 
> What's stubfs?

Its a small stub filesystem we wrote(thanks to Will Taber) to
demonstrate the problem. All it does is holds the parent's
inode-semaphore before calling lookup_one_len() on the dentry that needs
a automount.

This problem demonstrates a very very small race window which cannot be
triggered in normal operations. The stubfs kind of orchestrates the
exact timing to demonstrate the problem.

note: the timing should be such that, process 1 should have added the
newly created dentry in the dcache and jolted the automounter daemon.
And then process P2 has to come in asking for the same dentry, and 
should go to sleep waiting on the automounter to mount at the dentry. 
And finally the automounter has to come in.

RP


> 
> -Jeff


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 17:00   ` [autofs] " Ram Pai
@ 2005-11-16 18:25     ` Jeff Moyer
  2005-11-16 19:24       ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Jeff Moyer @ 2005-11-16 18:25 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs, linux-fsdevel

==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ram Pai <linuxram@us.ibm.com> adds:

linuxram> On Wed, 2005-11-16 at 07:22, Jeff Moyer wrote:
>> ==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix;
>> linuxram@us.ibm.com (Ram Pai) adds:
>> 
ram> Autofs4 assumes that its ->revalidate() function gets called with the
ram> parent_dentry's_inode_semaphore released. This is true mostly but not
ram> in one particular case.
>>
ram> Process P1 calls autofs4's ->lookup(). The lookup finds that the
ram> dentry does not exist. It creates a dentry and adds to the
ram> cache. Releases the parent's inode's semaphore and than calls
ram> ->revalidate().
>>
ram> Process P2 meanwhile comes in and cached_lookup() gets called. It
ram> finds the dentry in the cache and finds ->revalidate() function
ram> exists. So it calls ->revalidate() holding the parent's inode's
ram> semaphore.
>>
ram> Now the automounter daemon comes in and tries to hold the same
ram> semaphore in order to mount. But since the semaphore is held by P2 it
ram> goes to sleep.
>>
ram> Process P1 and P2 continue waiting for the mount to complete and it
ram> never happens. Deadlock.
>>
ram> The stack of the deadlock is as follows:
>>
ram> ls S 00000000 0 13049 11954 (NOTLB) f5221df0 00000000 00000000
ram> 00000000 00000000 00000000 00000000 00000000 00000000 f5d44a70
ram> c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030 f5d44164
ram> f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000 Call
ram> Trace: [<c0215207>] autofs4_wait+0x307/0x3d0 [<c02141d3>]
ram> try_to_fill_dentry+0xf3/0x150 [<c0214389>]
ram> autofs4_revalidate+0x159/0x170 [<c02144e0>] autofs4_lookup+0x110/0x150
ram> [<c016f3f5>] __lookup_hash+0x85/0xb0 [<c016f42a>] lookup_hash+0xa/0x10
ram> [<c016f483>] lookup_one_len+0x53/0x70 [<f8851293>]
ram> stubfs_readdir+0x113/0x170 [stubfs]
>> What's stubfs?

linuxram> Its a small stub filesystem we wrote(thanks to Will Taber) to
linuxram> demonstrate the problem. All it does is holds the parent's
linuxram> inode-semaphore before calling lookup_one_len() on the dentry
linuxram> that needs a automount.

linuxram> This problem demonstrates a very very small race window which
linuxram> cannot be triggered in normal operations. The stubfs kind of
linuxram> orchestrates the exact timing to demonstrate the problem.

linuxram> note: the timing should be such that, process 1 should have added
linuxram> the newly created dentry in the dcache and jolted the automounter
linuxram> daemon.  And then process P2 has to come in asking for the same
linuxram> dentry, and should go to sleep waiting on the automounter to
linuxram> mount at the dentry. And finally the automounter has to come in.

I've been trying to reproduce this using sleeps in the user space daemon,
and I can't.  Can you post your test code so that I'm not guessing at
what's going on?  For example, one thing that's unclear is how you are
stuffing stubfs in between the vfs and autofs.

-Jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 18:25     ` Jeff Moyer
@ 2005-11-16 19:24       ` William H. Taber
  2005-11-16 19:51         ` Ram Pai
  0 siblings, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-16 19:24 UTC (permalink / raw)
  To: jmoyer; +Cc: Ram Pai, autofs, linux-fsdevel

Jeff Moyer wrote:
> ==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ram Pai <linuxram@us.ibm.com> adds:

> 
> I've been trying to reproduce this using sleeps in the user space daemon,
> and I can't.  Can you post your test code so that I'm not guessing at
> what's going on?  For example, one thing that's unclear is how you are
> stuffing stubfs in between the vfs and autofs.
> 
> -Jeff
> -
The stubfs is just a test filesystem I wrote to reproduce this problem. 
  It doesn't sit between the vfs and autofs.  What it does is to do a 
lookup on /net and save the inode for it.  Then it gets the i_sem on 
/net and calls lookup_one_len on a given hostname.  The second time in 
it omits the lookup (it already has the inode for /net) and down on the 
i_sem lock and then calls lookup_one_len. It has some of it's own 
locking to get properly synchronized to force the race condition.  What 
happens is that both processes are waiting on i_sem for /net.  The first 
one gets it, and calls into the autofs which creates the new dentry, 
starts the automount deamon, and waits for the mount to complete.  Since 
the second lookup is already queued on the i_sem, it get's in second, 
finds the dentry, calls revalidate which waits for the mount to complete 
without releasing i_sem.  This of course prevents the automounter from 
completing the mount.

Does this clarify?

Will Taber

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 19:24       ` William H. Taber
@ 2005-11-16 19:51         ` Ram Pai
  0 siblings, 0 replies; 95+ messages in thread
From: Ram Pai @ 2005-11-16 19:51 UTC (permalink / raw)
  To: William H Taber; +Cc: jmoyer, autofs, linux-fsdevel

On Wed, 2005-11-16 at 11:24, William H. Taber wrote:
> Jeff Moyer wrote:
> > ==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ram Pai <linuxram@us.ibm.com> adds:
> 
> > 
> > I've been trying to reproduce this using sleeps in the user space daemon,
> > and I can't.  Can you post your test code so that I'm not guessing at
> > what's going on?  For example, one thing that's unclear is how you are
> > stuffing stubfs in between the vfs and autofs.
> > 
> > -Jeff
> > -
> The stubfs is just a test filesystem I wrote to reproduce this problem. 
>   It doesn't sit between the vfs and autofs.  What it does is to do a 
> lookup on /net and save the inode for it.  Then it gets the i_sem on 
> /net and calls lookup_one_len on a given hostname.  The second time in 
> it omits the lookup (it already has the inode for /net) and down on the 
> i_sem lock and then calls lookup_one_len. It has some of it's own 
> locking to get properly synchronized to force the race condition.  What 
> happens is that both processes are waiting on i_sem for /net.  The first 
> one gets it, and calls into the autofs which creates the new dentry, 
> starts the automount deamon, and waits for the mount to complete.  Since 
> the second lookup is already queued on the i_sem, it get's in second, 
> finds the dentry, calls revalidate which waits for the mount to complete 
> without releasing i_sem.  This of course prevents the automounter from 
> completing the mount.
> 
> Does this clarify?

here is the pointer to the patch that applies and compiles on
2.6.15-rc1.

http://www.sudhaa.com/~ram/readahead/stubfs.patch


you will have to set up /net/ram as the automounter location. 
That string is hardcoded. Maybe you can change that string to your
environment. And you can modify to it make it a module.

the instructions on how to reproduce is there in fs/stubfs/stub.c
line number 129 roughly.

This GPL code is for testing purpose only,
RP

> Will Taber


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 16:50   ` Ram Pai
@ 2005-11-16 22:57     ` Ian Kent
  2005-11-17  1:52       ` [autofs] " Ram Pai
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-16 22:57 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs, linux-fsdevel

On Wed, 16 Nov 2005, Ram Pai wrote:

> On Wed, 2005-11-16 at 04:41, Ian Kent wrote:
> > On Wed, 16 Nov 2005, Ram Pai wrote:
> > 
> > Thanks for you effort Ram.
> > 
> > > Autofs4 assumes that its ->revalidate() function gets called with the
> > > parent_dentry's_inode_semaphore released. This is true mostly
> > > but not in one particular case.
> > 
> > Yep. Certainly does.
> > 
> > Isn't my mistake not noticing that the inode semaphore is taken in 
> > vfs_readdir?
> > 
> > It's been like that all along and I can't understand how I didn't notice 
> > it before.
> > 
> > Help me out a bit here please Ram.
> > 
> > Aren't there other paths that enter revalidate without holding the 
> > semaphore? chdir?
> 
> Looking at the code and it seemed to me that ->revalidate() function is
> always with the semaphore held. Atleast VFS seem to have that
> assumption.

My reading gives me the opposite impression.

The chdir example above calls the path walking routine which calls the 
lookup method with the semaphore held and revalidate without it held. 
Clearly, there are a number of other examples.

Certainly, I could be wrong and I'll be checking that.

> 
> And looking at the autofs4 code, I  get the impression, that it
> assumes that the semaphore is released when it gets called. Which seems
> to be inconsistent and wrong.

It does but I thought that was VFS design and I'm willing to be corrected 
if I'm wrong.

Thinking about it and looking at the stack trace I'm having a bit of 
trouble working out why there is a mount wait being triggered here at all. 

I think this is a getdents call so then there should have been an open 
which would have done the mount wait, followed by the getdents call itself 
and finally a close. I see this sequence all the time when I'm using debug 
to log activity and it's also evident from the corresponding functions in 
fs/libfs.c.

What I can't work out is how getdents appears to be called without having 
called open. Is there anything more that you can tell me about how you 
have been able to demonstrate this error.

> 
> > 
> > Does uping an open semaphore allow other undesirable side affects?
> > 
> > Do you think it would perhaps be better to release the semaphore in 
> > autofs4_readdir ... hang on the stack trace doesn't look like a readdir 
> > ... I'll have to check 2.6.15-rc1 ... ?
> > 
> 
> One automounter process is in sys_mkdir() system call and the other if I
> recollect correctly was in sys_getdent64() system call.
> 
> Yes this problem can be demonstrated on all versions of the 2.6 kernel.
> Infact I reproduced it on 2.6.15-rc1 kernel.
> 
> > Apart from the above, looking at the patch and assuming that the semaphore 
> > is always held it would probaby be better to move semaphore open/close 
> > into try_to_fill_dentry as any control process using autofs, such as automount, 
> > must never cause a mount wait to be called (oz_mode = 1).
> 
> 
> 
> > 
> > Ideas?
> 
> One thing is sure, VFS assumes that the semaphore is held when
> ->revalidate() is called, and that convention is followed religiously by
> all filesytems.  autofs4 has this special case of releasing the
> semaphore if it is waiting to be woken up by the automounter daemon. So
> as you said, may be the semaphore must be released just before sleeping
> on the waitq?
> 
> RP
> 
> > 
> > > 
> > > Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
> > > does not exist. It creates a dentry and adds to the cache. Releases
> > > the parent's inode's semaphore and than calls ->revalidate().
> > > 
> > > Process P2 meanwhile comes in and cached_lookup() gets called. It finds
> > > the dentry in the cache and finds ->revalidate() function exists. So
> > > it calls ->revalidate() holding the parent's inode's semaphore.
> > > 
> > > Now the automounter daemon comes in and tries to hold the same semaphore
> > > in order to mount. But since the semaphore is held by P2 it
> > > goes to sleep.
> > > 
> > > Process P1 and P2 continue waiting for the mount to complete and it never
> > > happens. Deadlock.
> > > 
> > > The stack of the deadlock is as follows:
> > > 
> > > ls            S 00000000     0 13049  11954                     (NOTLB)
> > > f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
> > > f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
> > > Call Trace:
> > > [<c0215207>] autofs4_wait+0x307/0x3d0
> > > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > > [<c0214389>] autofs4_revalidate+0x159/0x170
> > > [<c02144e0>] autofs4_lookup+0x110/0x150
> > > [<c016f3f5>] __lookup_hash+0x85/0xb0
> > > [<c016f42a>] lookup_hash+0xa/0x10
> > > [<c016f483>] lookup_one_len+0x53/0x70
> > > [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
> > > [<c0172fcb>] vfs_readdir+0x8b/0xa0
> > > [<c01733b3>] sys_getdents64+0x63/0xb5
> > > [<c010464d>] syscall_call+0x7/0xb
> > > 
> > > ls            S C011B1AF     0 13050  11898                     (NOTLB)
> > > f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080
> > > 06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70
> > > f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000
> > > Call Trace:
> > > [<c0215207>] autofs4_wait+0x307/0x3d0
> > > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > > [<c0214389>] autofs4_revalidate+0x159/0x170
> > > [<c016dc77>] cached_lookup+0x47/0x80
> > > [<c016f3ca>] __lookup_hash+0x5a/0xb0
> > > [<c016f42a>] lookup_hash+0xa/0x10
> > > [<c016f483>] lookup_one_len+0x53/0x70  
> > > [<f88512e3>] stubfs_readdir+0x163/0x170 [stubfs]
> > > [<c0172fcb>] vfs_readdir+0x8b/0xa0  
> > > [<c01733b3>] sys_getdents64+0x63/0xb5
> > > [<c010464d>] syscall_call+0x7/0xb
> > > 
> > > automount     D 00000010     0 13052  13016                     (NOTLB)
> > > f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34
> > > f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70
> > > f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001
> > > Call Trace:
> > > [<c03b4cd3>] __down+0x83/0xe0
> > > [<c03b3632>] __down_failed+0xa/0x10
> > > [<c0171e6d>] .text.lock.namei+0xeb/0x1de
> > > [<c0170482>] sys_mkdir+0x52/0xd0
> > > [<c010464d>] syscall_call+0x7/0xb
> > > BUG: soft lockup detected on CPU#0!
> > > 
> > > 
> > > I have coded up a tentative fix. The patch releases the semaphore in
> > > ->revalidate() function, instead of the caller of that function.  Not
> > > sure if this is the right fix. Tested it and verified that the deadlock
> > > is fixed.  But I am not sure if it opens up other bugs. Please validate.
> > > 
> > > 
> > >  fs/autofs4/root.c |   26 +++++++++++++++-----------
> > >  1 files changed, 15 insertions(+), 11 deletions(-)
> > > 
> > > Index: 2.6.15-rc1/fs/autofs4/root.c
> > > ===================================================================
> > > --- 2.6.15-rc1.orig/fs/autofs4/root.c
> > > +++ 2.6.15-rc1/fs/autofs4/root.c
> > > @@ -386,40 +386,47 @@ static int autofs4_revalidate(struct den
> > >  	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
> > >  	int oz_mode = autofs4_oz_mode(sbi);
> > >  	int flags = nd ? nd->flags : 0;
> > >  	int status = 1;
> > >  
> > > +	up(&dir->i_sem);
> > >  	/* Pending dentry */
> > >  	if (autofs4_ispending(dentry)) {
> > >  		if (!oz_mode)
> > > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > > -		return status;
> > > +			status = try_to_fill_dentry(dentry, dir->i_sb,
> > > +					sbi, flags);
> > > +		goto out;
> > >  	}
> > >  
> > >  	/* Negative dentry.. invalidate if "old" */
> > > -	if (dentry->d_inode == NULL)
> > > -		return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > > +	if (dentry->d_inode == NULL) {
> > > +		status = (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > > +		goto out;
> > > +	}
> > >  
> > >  	/* Check for a non-mountpoint directory with no contents */
> > >  	spin_lock(&dcache_lock);
> > >  	if (S_ISDIR(dentry->d_inode->i_mode) &&
> > >  	    !d_mountpoint(dentry) && 
> > >  	    list_empty(&dentry->d_subdirs)) {
> > >  		DPRINTK("dentry=%p %.*s, emptydir",
> > >  			 dentry, dentry->d_name.len, dentry->d_name.name);
> > >  		spin_unlock(&dcache_lock);
> > >  		if (!oz_mode)
> > > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > > -		return status;
> > > +			status = try_to_fill_dentry(dentry, dir->i_sb, sbi,
> > > +					flags);
> > > +		goto out;
> > >  	}
> > >  	spin_unlock(&dcache_lock);
> > >  
> > >  	/* Update the usage list */
> > >  	if (!oz_mode)
> > >  		autofs4_update_usage(dentry);
> > >  
> > > -	return 1;
> > > +out:
> > > +	down(&dir->i_sem);
> > > +	return status;
> > >  }
> > >  
> > >  static void autofs4_dentry_release(struct dentry *de)
> > >  {
> > >  	struct autofs_info *inf;
> > > @@ -485,15 +492,12 @@ static struct dentry *autofs4_lookup(str
> > >  		spin_unlock(&dentry->d_lock);
> > >  	}
> > >  	dentry->d_fsdata = NULL;
> > >  	d_add(dentry, NULL);
> > >  
> > > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > > -		up(&dir->i_sem);
> > > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> > >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > > -		down(&dir->i_sem);
> > > -	}
> > >  
> > >  	/*
> > >  	 * If we are still pending, check if we had to handle
> > >  	 * a signal. If so we can force a restart..
> > >  	 */
> > > 
> > > _______________________________________________
> > > autofs mailing list
> > > autofs@linux.kernel.org
> > > http://linux.kernel.org/mailman/listinfo/autofs
> > > 
> > 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 22:57     ` Ian Kent
@ 2005-11-17  1:52       ` Ram Pai
  2005-11-17 18:50         ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Ram Pai @ 2005-11-17  1:52 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-fsdevel, William H Taber

On Wed, 2005-11-16 at 14:57, Ian Kent wrote:
> On Wed, 16 Nov 2005, Ram Pai wrote:
> 
> > On Wed, 2005-11-16 at 04:41, Ian Kent wrote:
> > > On Wed, 16 Nov 2005, Ram Pai wrote:
> > > 
> > > Thanks for you effort Ram.
> > > 
> > > > Autofs4 assumes that its ->revalidate() function gets called with the
> > > > parent_dentry's_inode_semaphore released. This is true mostly
> > > > but not in one particular case.
> > > 
> > > Yep. Certainly does.
> > > 
> > > Isn't my mistake not noticing that the inode semaphore is taken in 
> > > vfs_readdir?
> > > 
> > > It's been like that all along and I can't understand how I didn't notice 
> > > it before.
> > > 
> > > Help me out a bit here please Ram.
> > > 
> > > Aren't there other paths that enter revalidate without holding the 
> > > semaphore? chdir?
> > 
> > Looking at the code and it seemed to me that ->revalidate() function is
> > always with the semaphore held. Atleast VFS seem to have that
> > assumption.
> 
> My reading gives me the opposite impression.
> 
> The chdir example above calls the path walking routine which calls the 
> lookup method with the semaphore held and revalidate without it held. 
> Clearly, there are a number of other examples.
> 
> Certainly, I could be wrong and I'll be checking that.

I see your point. Looking more through the code it looks like
the convention about how ->revalidate() gets called, seems to be
inconsistent in VFS.

in do_lookup() which calls ->revalidate(), the semaphore is not-held.

Where as lookup_one_len() is expected to be called with the semaphore
held. This function calls lookup_hash() which calls cached_lookup()
which later calls ->revalidate(), and here ->revalidate() is called with
the semaphore held.  Is this the source of the bug?


> > 
> > And looking at the autofs4 code, I  get the impression, that it
> > assumes that the semaphore is released when it gets called. Which seems
> > to be inconsistent and wrong.
> 
> It does but I thought that was VFS design and I'm willing to be corrected 
> if I'm wrong.
> 
> Thinking about it and looking at the stack trace I'm having a bit of 
> trouble working out why there is a mount wait being triggered here at all. 
> 
> I think this is a getdents call so then there should have been an open 
> which would have done the mount wait, followed by the getdents call itself 
> and finally a close. I see this sequence all the time when I'm using debug 
> to log activity and it's also evident from the corresponding functions in 
> fs/libfs.c.

In this case there is no explicit open on a autofs4 directory. The
readdir is taking place on a directory belonging to the stubfs
filesystem. Internally stubfs filesystem is trying to open the
automounter's dentry through the 
lookup_one_len() call.  And this triggers the automouter into action.

> 
> What I can't work out is how getdents appears to be called without having 
> called open. Is there anything more that you can tell me about how you 
> have been able to demonstrate this error.
> 
Maybe you are missing the stubfs part. The stubfs is kind of
in-the-middle filesystem which sits between the application and
the autofs4. 

Will Taber: Am I saying this right?

Take a look at the test patch for stubfs posted at
http://www.sudhaa.com/~ram/readahead/stubfs.patch


For clarity here is the scenario:

P1 executes 'ls' on a directory belonging to stubfs. 
          stubfs's ->lookup() gets
	  called and it internally redirects that lookup to autofs4
          by calling lookup_one_len() on /net/ram 
          note: /net belongs to autofs4 and lookup_one_len() is
           called holding the inode-semaphore of /net .
           lookup_one_len() calls lookup_hash() which finds that there
	   is no cached dentry for 'ram', and hence allocates a dentry
           and calls ->lookup() of autofs4. 
           autofs4 adds the dentry to the dcache and calls its
	   ->revalidate() after releasing the semaphore.
           ->revalidate tries to wake up the automounter daemon, and
            goes to sleep on a waitq.

P2 executes 'ls' on another directory belonging to stubfs. 
         stubfs's ->lookup()
         gets called and it internally redirects that lookup to autofs4
         by calling lookup_one_len() on /net/ram. lookup_one_len() is
         called holding the inode-semaphore of /net.  
         lookup_one_len() calls lookup_hash() which calls
         cached_lookup(). cached_lookup() finds the dentry
         corresponding to 'ram' in the dcache. So it calls
         ->revalidate() on it. NOTE: this time autofs4's 
         ->revalidate() is called holding the semaphore.
         ->revalidate() goes to sleep on the same waitq 
         waiting on the automounter to wake him up.

automouter: the automounter now comes in and tries to hold 
           the semaphore on /net and deadlocks.

The question is: Who is the culprit?  stubfs?  VFS? or
             autofs4?

RP


> > 
> > > 
> > > Does uping an open semaphore allow other undesirable side affects?
> > > 
> > > Do you think it would perhaps be better to release the semaphore in 
> > > autofs4_readdir ... hang on the stack trace doesn't look like a readdir 
> > > ... I'll have to check 2.6.15-rc1 ... ?
> > > 
> > 
> > One automounter process is in sys_mkdir() system call and the other if I
> > recollect correctly was in sys_getdent64() system call.
> > 
> > Yes this problem can be demonstrated on all versions of the 2.6 kernel.
> > Infact I reproduced it on 2.6.15-rc1 kernel.
> > 
> > > Apart from the above, looking at the patch and assuming that the semaphore 
> > > is always held it would probaby be better to move semaphore open/close 
> > > into try_to_fill_dentry as any control process using autofs, such as automount, 
> > > must never cause a mount wait to be called (oz_mode = 1).
> > 
> > 
> > 
> > > 
> > > Ideas?
> > 
> > One thing is sure, VFS assumes that the semaphore is held when
> > ->revalidate() is called, and that convention is followed religiously by
> > all filesytems.  autofs4 has this special case of releasing the
> > semaphore if it is waiting to be woken up by the automounter daemon. So
> > as you said, may be the semaphore must be released just before sleeping
> > on the waitq?
> > 
> > RP
> > 
> > > 
> > > > 
> > > > Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
> > > > does not exist. It creates a dentry and adds to the cache. Releases
> > > > the parent's inode's semaphore and than calls ->revalidate().
> > > > 
> > > > Process P2 meanwhile comes in and cached_lookup() gets called. It finds
> > > > the dentry in the cache and finds ->revalidate() function exists. So
> > > > it calls ->revalidate() holding the parent's inode's semaphore.
> > > > 
> > > > Now the automounter daemon comes in and tries to hold the same semaphore
> > > > in order to mount. But since the semaphore is held by P2 it
> > > > goes to sleep.
> > > > 
> > > > Process P1 and P2 continue waiting for the mount to complete and it never
> > > > happens. Deadlock.
> > > > 
> > > > The stack of the deadlock is as follows:
> > > > 
> > > > ls            S 00000000     0 13049  11954                     (NOTLB)
> > > > f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > > 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
> > > > f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
> > > > Call Trace:
> > > > [<c0215207>] autofs4_wait+0x307/0x3d0
> > > > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > > > [<c0214389>] autofs4_revalidate+0x159/0x170
> > > > [<c02144e0>] autofs4_lookup+0x110/0x150
> > > > [<c016f3f5>] __lookup_hash+0x85/0xb0
> > > > [<c016f42a>] lookup_hash+0xa/0x10
> > > > [<c016f483>] lookup_one_len+0x53/0x70
> > > > [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
> > > > [<c0172fcb>] vfs_readdir+0x8b/0xa0
> > > > [<c01733b3>] sys_getdents64+0x63/0xb5
> > > > [<c010464d>] syscall_call+0x7/0xb
> > > > 
> > > > ls            S C011B1AF     0 13050  11898                     (NOTLB)
> > > > f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080
> > > > 06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70
> > > > f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000
> > > > Call Trace:
> > > > [<c0215207>] autofs4_wait+0x307/0x3d0
> > > > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > > > [<c0214389>] autofs4_revalidate+0x159/0x170
> > > > [<c016dc77>] cached_lookup+0x47/0x80
> > > > [<c016f3ca>] __lookup_hash+0x5a/0xb0
> > > > [<c016f42a>] lookup_hash+0xa/0x10
> > > > [<c016f483>] lookup_one_len+0x53/0x70  
> > > > [<f88512e3>] stubfs_readdir+0x163/0x170 [stubfs]
> > > > [<c0172fcb>] vfs_readdir+0x8b/0xa0  
> > > > [<c01733b3>] sys_getdents64+0x63/0xb5
> > > > [<c010464d>] syscall_call+0x7/0xb
> > > > 
> > > > automount     D 00000010     0 13052  13016                     (NOTLB)
> > > > f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34
> > > > f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70
> > > > f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001
> > > > Call Trace:
> > > > [<c03b4cd3>] __down+0x83/0xe0
> > > > [<c03b3632>] __down_failed+0xa/0x10
> > > > [<c0171e6d>] .text.lock.namei+0xeb/0x1de
> > > > [<c0170482>] sys_mkdir+0x52/0xd0
> > > > [<c010464d>] syscall_call+0x7/0xb
> > > > BUG: soft lockup detected on CPU#0!
> > > > 
> > > > 
> > > > I have coded up a tentative fix. The patch releases the semaphore in
> > > > ->revalidate() function, instead of the caller of that function.  Not
> > > > sure if this is the right fix. Tested it and verified that the deadlock
> > > > is fixed.  But I am not sure if it opens up other bugs. Please validate.
> > > > 
> > > > 
> > > >  fs/autofs4/root.c |   26 +++++++++++++++-----------
> > > >  1 files changed, 15 insertions(+), 11 deletions(-)
> > > > 
> > > > Index: 2.6.15-rc1/fs/autofs4/root.c
> > > > ===================================================================
> > > > --- 2.6.15-rc1.orig/fs/autofs4/root.c
> > > > +++ 2.6.15-rc1/fs/autofs4/root.c
> > > > @@ -386,40 +386,47 @@ static int autofs4_revalidate(struct den
> > > >  	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
> > > >  	int oz_mode = autofs4_oz_mode(sbi);
> > > >  	int flags = nd ? nd->flags : 0;
> > > >  	int status = 1;
> > > >  
> > > > +	up(&dir->i_sem);
> > > >  	/* Pending dentry */
> > > >  	if (autofs4_ispending(dentry)) {
> > > >  		if (!oz_mode)
> > > > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > > > -		return status;
> > > > +			status = try_to_fill_dentry(dentry, dir->i_sb,
> > > > +					sbi, flags);
> > > > +		goto out;
> > > >  	}
> > > >  
> > > >  	/* Negative dentry.. invalidate if "old" */
> > > > -	if (dentry->d_inode == NULL)
> > > > -		return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > > > +	if (dentry->d_inode == NULL) {
> > > > +		status = (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > > > +		goto out;
> > > > +	}
> > > >  
> > > >  	/* Check for a non-mountpoint directory with no contents */
> > > >  	spin_lock(&dcache_lock);
> > > >  	if (S_ISDIR(dentry->d_inode->i_mode) &&
> > > >  	    !d_mountpoint(dentry) && 
> > > >  	    list_empty(&dentry->d_subdirs)) {
> > > >  		DPRINTK("dentry=%p %.*s, emptydir",
> > > >  			 dentry, dentry->d_name.len, dentry->d_name.name);
> > > >  		spin_unlock(&dcache_lock);
> > > >  		if (!oz_mode)
> > > > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > > > -		return status;
> > > > +			status = try_to_fill_dentry(dentry, dir->i_sb, sbi,
> > > > +					flags);
> > > > +		goto out;
> > > >  	}
> > > >  	spin_unlock(&dcache_lock);
> > > >  
> > > >  	/* Update the usage list */
> > > >  	if (!oz_mode)
> > > >  		autofs4_update_usage(dentry);
> > > >  
> > > > -	return 1;
> > > > +out:
> > > > +	down(&dir->i_sem);
> > > > +	return status;
> > > >  }
> > > >  
> > > >  static void autofs4_dentry_release(struct dentry *de)
> > > >  {
> > > >  	struct autofs_info *inf;
> > > > @@ -485,15 +492,12 @@ static struct dentry *autofs4_lookup(str
> > > >  		spin_unlock(&dentry->d_lock);
> > > >  	}
> > > >  	dentry->d_fsdata = NULL;
> > > >  	d_add(dentry, NULL);
> > > >  
> > > > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > > > -		up(&dir->i_sem);
> > > > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> > > >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > > > -		down(&dir->i_sem);
> > > > -	}
> > > >  
> > > >  	/*
> > > >  	 * If we are still pending, check if we had to handle
> > > >  	 * a signal. If so we can force a restart..
> > > >  	 */
> > > > 
> > > > _______________________________________________
> > > > autofs mailing list
> > > > autofs@linux.kernel.org
> > > > http://linux.kernel.org/mailman/listinfo/autofs
> > > > 
> > > 
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-17  1:52       ` [autofs] " Ram Pai
@ 2005-11-17 18:50         ` Ian Kent
  2005-11-17 19:19           ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-17 18:50 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs mailing list, linux-fsdevel, William H Taber

On Wed, 16 Nov 2005, Ram Pai wrote:

> > > > 
> > > > Aren't there other paths that enter revalidate without holding the 
> > > > semaphore? chdir?
> > > 
> > > Looking at the code and it seemed to me that ->revalidate() function is
> > > always with the semaphore held. Atleast VFS seem to have that
> > > assumption.
> > 
> > My reading gives me the opposite impression.
> > 
> > The chdir example above calls the path walking routine which calls the 
> > lookup method with the semaphore held and revalidate without it held. 
> > Clearly, there are a number of other examples.
> > 
> > Certainly, I could be wrong and I'll be checking that.
> 
> I see your point. Looking more through the code it looks like
> the convention about how ->revalidate() gets called, seems to be
> inconsistent in VFS.
> 
> in do_lookup() which calls ->revalidate(), the semaphore is not-held.
> 
> Where as lookup_one_len() is expected to be called with the semaphore
> held. This function calls lookup_hash() which calls cached_lookup()
> which later calls ->revalidate(), and here ->revalidate() is called with
> the semaphore held.  Is this the source of the bug?

Yep. My focus has been very much on link_path_walk so I've missed this 
case.

I understood what was going on after reading your answer to Jeffs 
question. I realized then what lead to it doesn't matter, your point being 
that it's possible to cause this by calling lookup_one_len, as you describe.

> > 
> > What I can't work out is how getdents appears to be called without having 
> > called open. Is there anything more that you can tell me about how you 
> > have been able to demonstrate this error.
> > 
> Maybe you are missing the stubfs part. The stubfs is kind of
> in-the-middle filesystem which sits between the application and
> the autofs4. 

Yep. Got that from the other post to.

> 
> Will Taber: Am I saying this right?

Your original description was fine.

Thanks for putting in the effort to make it clear.

> 
> Take a look at the test patch for stubfs posted at
> http://www.sudhaa.com/~ram/readahead/stubfs.patch
> 

I will. Soon as I get a chance. Sounds interesting.

> 
> For clarity here is the scenario:
> 
> P1 executes 'ls' on a directory belonging to stubfs. 
>           stubfs's ->lookup() gets
> 	  called and it internally redirects that lookup to autofs4
>           by calling lookup_one_len() on /net/ram 
>           note: /net belongs to autofs4 and lookup_one_len() is
>            called holding the inode-semaphore of /net .
>            lookup_one_len() calls lookup_hash() which finds that there
> 	   is no cached dentry for 'ram', and hence allocates a dentry
>            and calls ->lookup() of autofs4. 
>            autofs4 adds the dentry to the dcache and calls its
> 	   ->revalidate() after releasing the semaphore.
>            ->revalidate tries to wake up the automounter daemon, and
>             goes to sleep on a waitq.
> 
> P2 executes 'ls' on another directory belonging to stubfs. 
>          stubfs's ->lookup()
>          gets called and it internally redirects that lookup to autofs4
>          by calling lookup_one_len() on /net/ram. lookup_one_len() is
>          called holding the inode-semaphore of /net.  
>          lookup_one_len() calls lookup_hash() which calls
>          cached_lookup(). cached_lookup() finds the dentry
>          corresponding to 'ram' in the dcache. So it calls
>          ->revalidate() on it. NOTE: this time autofs4's 
>          ->revalidate() is called holding the semaphore.
>          ->revalidate() goes to sleep on the same waitq 
>          waiting on the automounter to wake him up.
> 
> automouter: the automounter now comes in and tries to hold 
>            the semaphore on /net and deadlocks.
> 
> The question is: Who is the culprit?  stubfs?  VFS? or
>              autofs4?

I'm happy to fix it in autofs unless you feel we need to address the wider 
issue.

I'll put together a patch which takes account of this and pushes the 
hold/release down into try_to_fill_dentry. But I would like a little 
time to think about whether there may be other implications.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-17 18:50         ` Ian Kent
@ 2005-11-17 19:19           ` William H. Taber
  2005-11-17 20:39             ` Ram Pai
  2005-11-18 14:44             ` Ian Kent
  0 siblings, 2 replies; 95+ messages in thread
From: William H. Taber @ 2005-11-17 19:19 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Wed, 16 Nov 2005, Ram Pai wrote:
> 
>>
>>The question is: Who is the culprit?  stubfs?  VFS? or
>>             autofs4?
> 
> 
> I'm happy to fix it in autofs unless you feel we need to address the wider 
> issue.
> 
> I'll put together a patch which takes account of this and pushes the 
> hold/release down into try_to_fill_dentry. But I would like a little 
> time to think about whether there may be other implications.
> 

Ian,
I don't think that you can fix this in the autofs by tinkering with 
holding and releasing the parent i_sem.  The reason for this is that you 
  don't have any way of knowing if you hold that lock or not.  The easy 
case is that nobody holds the lock.  But if the lock is held you have no 
way to know that you are the person holding the lock and you cannot 
unlock someone elses lock without serious consequences.

The only way to fix the lock handling is to fix the VFS.  This means 
either changing all calls to the d_revalidate functions (or all calls to 
d_revalidate itself) so that the parent i_sem is obtained first, or to 
change lookup_one_len (or actually lookup_hash) to only get the lock 
around the filesystem lookup call, matching what is done in real_lookup. 
  I don't know which is better from a locking correctness perspective. 
I would have to defer to the VFS experts on that one.  I do know that 
lookup_one_len is called from about 40 places in kernel tree and 
probably from every filesystem outside the tree as well.  Either way, it 
is a non-trivial piece of work.

If you take the inconsistant locking as a given, then the fix has to 
involve not doing the d_add on the new dentry until after the mount 
completes.  This would eliminate the need for revalidate to wait.  You 
would have to provide a mechanism for keeping track of the outstanding 
mount requests and looking for a a mount in progress before starting a 
new request.  This would take the waiting out of revalidate and put it 
into the lookup request itself where you are guaranteed that the parent 
i_sem lock is held.

I hope this is helps.

Will Taber



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-17 19:19           ` William H. Taber
@ 2005-11-17 20:39             ` Ram Pai
  2005-11-17 22:31               ` William H. Taber
  2005-11-18 14:54               ` Ian Kent
  2005-11-18 14:44             ` Ian Kent
  1 sibling, 2 replies; 95+ messages in thread
From: Ram Pai @ 2005-11-17 20:39 UTC (permalink / raw)
  To: William H Taber; +Cc: Ian Kent, autofs mailing list, linux-fsdevel

On Thu, 2005-11-17 at 11:19, William H. Taber wrote:
> Ian Kent wrote:
> > On Wed, 16 Nov 2005, Ram Pai wrote:
> > 
> >>
> >>The question is: Who is the culprit?  stubfs?  VFS? or
> >>             autofs4?
> > 
> > 
> > I'm happy to fix it in autofs unless you feel we need to address the wider 
> > issue.
> > 
> > I'll put together a patch which takes account of this and pushes the 
> > hold/release down into try_to_fill_dentry. But I would like a little 
> > time to think about whether there may be other implications.
> > 
> 
> Ian,
> I don't think that you can fix this in the autofs by tinkering with 
> holding and releasing the parent i_sem.  The reason for this is that you 
>   don't have any way of knowing if you hold that lock or not.  The easy 
> case is that nobody holds the lock.  But if the lock is held you have no 
> way to know that you are the person holding the lock and you cannot 
> unlock someone elses lock without serious consequences.
> 
> The only way to fix the lock handling is to fix the VFS.  This means 
> either changing all calls to the d_revalidate functions (or all calls to 
> d_revalidate itself) so that the parent i_sem is obtained first, or to 
> change lookup_one_len (or actually lookup_hash) to only get the lock 
> around the filesystem lookup call, matching what is done in real_lookup. 
>   I don't know which is better from a locking correctness perspective. 
> I would have to defer to the VFS experts on that one.  I do know that 
> lookup_one_len is called from about 40 places in kernel tree and 
> probably from every filesystem outside the tree as well.  Either way, it 
> is a non-trivial piece of work.
> 
> If you take the inconsistant locking as a given, then the fix has to 
> involve not doing the d_add on the new dentry until after the mount 
> completes.  This would eliminate the need for revalidate to wait.  You 
> would have to provide a mechanism for keeping track of the outstanding 
> mount requests and looking for a a mount in progress before starting a 
> new request.  This would take the waiting out of revalidate and put it 
> into the lookup request itself where you are guaranteed that the parent 
> i_sem lock is held.

Even this has a issue I think. Because later when the automounter
attempts to mount, VFS wont' find the corresponding dentry in the dcache
and will allocate a new dentry. And this dentry is not the one which
autofs4 is waiting to be mounted on. No?

RP


> 
> I hope this is helps.
> 
> Will Taber
> 
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-17 20:39             ` Ram Pai
@ 2005-11-17 22:31               ` William H. Taber
  2005-11-18 14:57                 ` Ian Kent
  2005-11-18 14:54               ` Ian Kent
  1 sibling, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-17 22:31 UTC (permalink / raw)
  To: Ram Pai; +Cc: Ian Kent, autofs mailing list, linux-fsdevel

Ram Pai wrote:
> On Thu, 2005-11-17 at 11:19, William H. Taber wrote:
> 
>>Ian Kent wrote:
>>
>>>On Wed, 16 Nov 2005, Ram Pai wrote:
>>>
>>>
>>>>The question is: Who is the culprit?  stubfs?  VFS? or
>>>>            autofs4?
>>>
>>>
>>>I'm happy to fix it in autofs unless you feel we need to address the wider 
>>>issue.
>>>
>>>I'll put together a patch which takes account of this and pushes the 
>>>hold/release down into try_to_fill_dentry. But I would like a little 
>>>time to think about whether there may be other implications.
>>>
>>
>>Ian,
>>I don't think that you can fix this in the autofs by tinkering with 
>>holding and releasing the parent i_sem.  The reason for this is that you 
>>  don't have any way of knowing if you hold that lock or not.  The easy 
>>case is that nobody holds the lock.  But if the lock is held you have no 
>>way to know that you are the person holding the lock and you cannot 
>>unlock someone elses lock without serious consequences.
>>
>>The only way to fix the lock handling is to fix the VFS.  This means 
>>either changing all calls to the d_revalidate functions (or all calls to 
>>d_revalidate itself) so that the parent i_sem is obtained first, or to 
>>change lookup_one_len (or actually lookup_hash) to only get the lock 
>>around the filesystem lookup call, matching what is done in real_lookup. 
>>  I don't know which is better from a locking correctness perspective. 
>>I would have to defer to the VFS experts on that one.  I do know that 
>>lookup_one_len is called from about 40 places in kernel tree and 
>>probably from every filesystem outside the tree as well.  Either way, it 
>>is a non-trivial piece of work.
>>
>>If you take the inconsistant locking as a given, then the fix has to 
>>involve not doing the d_add on the new dentry until after the mount 
>>completes.  This would eliminate the need for revalidate to wait.  You 
>>would have to provide a mechanism for keeping track of the outstanding 
>>mount requests and looking for a a mount in progress before starting a 
>>new request.  This would take the waiting out of revalidate and put it 
>>into the lookup request itself where you are guaranteed that the parent 
>>i_sem lock is held.
> 
> 
> Even this has a issue I think. Because later when the automounter
> attempts to mount, VFS wont' find the corresponding dentry in the dcache
> and will allocate a new dentry. And this dentry is not the one which
> autofs4 is waiting to be mounted on. No?
> 
> RP
> 

That would be bad.  So maybe I should just wait for someone who 
understands the automounter better than I do to come up with an idea.  :^)

Will Taber


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-17 19:19           ` William H. Taber
  2005-11-17 20:39             ` Ram Pai
@ 2005-11-18 14:44             ` Ian Kent
  2005-11-18 15:20               ` William H. Taber
  1 sibling, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-18 14:44 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Thu, 17 Nov 2005, William H. Taber wrote:

Hi Taber,

> Ian Kent wrote:
> > On Wed, 16 Nov 2005, Ram Pai wrote:
> > 
> >>
> >>The question is: Who is the culprit?  stubfs?  VFS? or
> >>             autofs4?
> > 
> > 
> > I'm happy to fix it in autofs unless you feel we need to address the wider 
> > issue.
> > 
> > I'll put together a patch which takes account of this and pushes the 
> > hold/release down into try_to_fill_dentry. But I would like a little 
> > time to think about whether there may be other implications.
> > 
> 
> Ian,
> I don't think that you can fix this in the autofs by tinkering with 
> holding and releasing the parent i_sem.  The reason for this is that you 
>   don't have any way of knowing if you hold that lock or not.  The easy 
> case is that nobody holds the lock.  But if the lock is held you have no 
> way to know that you are the person holding the lock and you cannot 
> unlock someone elses lock without serious consequences.

Yes. I see.

But let me make sure I understand what you are saying.

The problem would be that if I release and then retake the lock for autofs 
to do it thing there is a risk of opening the caller to the potential 
races it is protecting itself from. 

Correct?

> 
> The only way to fix the lock handling is to fix the VFS.  This means 
> either changing all calls to the d_revalidate functions (or all calls to 
> d_revalidate itself) so that the parent i_sem is obtained first, or to 
> change lookup_one_len (or actually lookup_hash) to only get the lock 
> around the filesystem lookup call, matching what is done in real_lookup. 
>   I don't know which is better from a locking correctness perspective. 
> I would have to defer to the VFS experts on that one.  I do know that 
> lookup_one_len is called from about 40 places in kernel tree and 
> probably from every filesystem outside the tree as well.  Either way, it 
> is a non-trivial piece of work.

Sure is.

Given the description above my impulsive thought would be to move the 
synchronisation to backet the lookup call in lookup_hash as the low risk 
low impact approach.

As you say we need to get the attention of those that need to know before 
anything can be done.

> 
> If you take the inconsistant locking as a given, then the fix has to 
> involve not doing the d_add on the new dentry until after the mount 
> completes.  This would eliminate the need for revalidate to wait.  You 
> would have to provide a mechanism for keeping track of the outstanding 
> mount requests and looking for a a mount in progress before starting a 
> new request.  This would take the waiting out of revalidate and put it 
> into the lookup request itself where you are guaranteed that the parent 
> i_sem lock is held.

Sounds like this would be major change for autofs and would likely have 
an impact on the userspace daemon as well.

> 
> I hope this is helps.

Sure does. Thanks.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-17 20:39             ` Ram Pai
  2005-11-17 22:31               ` William H. Taber
@ 2005-11-18 14:54               ` Ian Kent
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-18 14:54 UTC (permalink / raw)
  To: Ram Pai; +Cc: William H Taber, autofs mailing list, linux-fsdevel

On Thu, 17 Nov 2005, Ram Pai wrote:

> On Thu, 2005-11-17 at 11:19, William H. Taber wrote:
> > Ian Kent wrote:
> > > On Wed, 16 Nov 2005, Ram Pai wrote:
> > > 
> > >>
> > >>The question is: Who is the culprit?  stubfs?  VFS? or
> > >>             autofs4?
> > > 
> > > 
> > > I'm happy to fix it in autofs unless you feel we need to address the wider 
> > > issue.
> > > 
> > > I'll put together a patch which takes account of this and pushes the 
> > > hold/release down into try_to_fill_dentry. But I would like a little 
> > > time to think about whether there may be other implications.
> > > 
> > 
> > Ian,
> > I don't think that you can fix this in the autofs by tinkering with 
> > holding and releasing the parent i_sem.  The reason for this is that you 
> >   don't have any way of knowing if you hold that lock or not.  The easy 
> > case is that nobody holds the lock.  But if the lock is held you have no 
> > way to know that you are the person holding the lock and you cannot 
> > unlock someone elses lock without serious consequences.
> > 
> > The only way to fix the lock handling is to fix the VFS.  This means 
> > either changing all calls to the d_revalidate functions (or all calls to 
> > d_revalidate itself) so that the parent i_sem is obtained first, or to 
> > change lookup_one_len (or actually lookup_hash) to only get the lock 
> > around the filesystem lookup call, matching what is done in real_lookup. 
> >   I don't know which is better from a locking correctness perspective. 
> > I would have to defer to the VFS experts on that one.  I do know that 
> > lookup_one_len is called from about 40 places in kernel tree and 
> > probably from every filesystem outside the tree as well.  Either way, it 
> > is a non-trivial piece of work.
> > 
> > If you take the inconsistant locking as a given, then the fix has to 
> > involve not doing the d_add on the new dentry until after the mount 
> > completes.  This would eliminate the need for revalidate to wait.  You 
> > would have to provide a mechanism for keeping track of the outstanding 
> > mount requests and looking for a a mount in progress before starting a 
> > new request.  This would take the waiting out of revalidate and put it 
> > into the lookup request itself where you are guaranteed that the parent 
> > i_sem lock is held.
> 
> Even this has a issue I think. Because later when the automounter
> attempts to mount, VFS wont' find the corresponding dentry in the dcache
> and will allocate a new dentry. And this dentry is not the one which
> autofs4 is waiting to be mounted on. No?

Yes. The mount triggering depends on the dentry being present.

And there is the situation where the mount point directory pre-exists 
in the autofs (browseable automounts) so lookup is not called.

In the new version that I am working on now (when I eventually get it 
done) directories will pre-exist for all autofs mount points but simply 
not be displayed based on a mount option.

So this could be kinda difficult for me.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-17 22:31               ` William H. Taber
@ 2005-11-18 14:57                 ` Ian Kent
  0 siblings, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-18 14:57 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Thu, 17 Nov 2005, William H. Taber wrote:

> Ram Pai wrote:
> > On Thu, 2005-11-17 at 11:19, William H. Taber wrote:
> > 
> >>Ian Kent wrote:
> >>
> >>>On Wed, 16 Nov 2005, Ram Pai wrote:
> >>>
> >>>
> >>>>The question is: Who is the culprit?  stubfs?  VFS? or
> >>>>            autofs4?
> >>>
> >>>
> >>>I'm happy to fix it in autofs unless you feel we need to address the wider 
> >>>issue.
> >>>
> >>>I'll put together a patch which takes account of this and pushes the 
> >>>hold/release down into try_to_fill_dentry. But I would like a little 
> >>>time to think about whether there may be other implications.
> >>>
> >>
> >>Ian,
> >>I don't think that you can fix this in the autofs by tinkering with 
> >>holding and releasing the parent i_sem.  The reason for this is that you 
> >>  don't have any way of knowing if you hold that lock or not.  The easy 
> >>case is that nobody holds the lock.  But if the lock is held you have no 
> >>way to know that you are the person holding the lock and you cannot 
> >>unlock someone elses lock without serious consequences.
> >>
> >>The only way to fix the lock handling is to fix the VFS.  This means 
> >>either changing all calls to the d_revalidate functions (or all calls to 
> >>d_revalidate itself) so that the parent i_sem is obtained first, or to 
> >>change lookup_one_len (or actually lookup_hash) to only get the lock 
> >>around the filesystem lookup call, matching what is done in real_lookup. 
> >>  I don't know which is better from a locking correctness perspective. 
> >>I would have to defer to the VFS experts on that one.  I do know that 
> >>lookup_one_len is called from about 40 places in kernel tree and 
> >>probably from every filesystem outside the tree as well.  Either way, it 
> >>is a non-trivial piece of work.
> >>
> >>If you take the inconsistant locking as a given, then the fix has to 
> >>involve not doing the d_add on the new dentry until after the mount 
> >>completes.  This would eliminate the need for revalidate to wait.  You 
> >>would have to provide a mechanism for keeping track of the outstanding 
> >>mount requests and looking for a a mount in progress before starting a 
> >>new request.  This would take the waiting out of revalidate and put it 
> >>into the lookup request itself where you are guaranteed that the parent 
> >>i_sem lock is held.
> > 
> > 
> > Even this has a issue I think. Because later when the automounter
> > attempts to mount, VFS wont' find the corresponding dentry in the dcache
> > and will allocate a new dentry. And this dentry is not the one which
> > autofs4 is waiting to be mounted on. No?
> > 
> > RP
> > 
> 
> That would be bad.  So maybe I should just wait for someone who 
> understands the automounter better than I do to come up with an idea.  :^)

FWIW I'll certainly be thinking about it.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-18 14:44             ` Ian Kent
@ 2005-11-18 15:20               ` William H. Taber
  2005-11-18 16:30                 ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-18 15:20 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Thu, 17 Nov 2005, William H. Taber wrote:
> 
> Hi Taber,

Hi,
You can call me Will. Most everyone else does. :^)
> 
>>
>>Ian,
>>I don't think that you can fix this in the autofs by tinkering with 
>>holding and releasing the parent i_sem.  The reason for this is that you 
>>  don't have any way of knowing if you hold that lock or not.  The easy 
>>case is that nobody holds the lock.  But if the lock is held you have no 
>>way to know that you are the person holding the lock and you cannot 
>>unlock someone elses lock without serious consequences.
> 
> 
> Yes. I see.
> 
> But let me make sure I understand what you are saying.
> 
> The problem would be that if I release and then retake the lock for autofs 
> to do it thing there is a risk of opening the caller to the potential 
> races it is protecting itself from. 
> 
> Correct?
> 
No, it is actually a little more subtle than that.  The problem is that 
since you can be called from two code paths, one of which get's the lock 
and one of them doesn't, you are stuck if you find that the lock is held 
because you don't know who holds it.  The danger is that some innocent 
third party is holding the lock and counting on being protected by it. 
If you release the lock, then you can be creating the potential for a 
race in their code and there would be no way to detect it.  Their code 
path would look correct because it is.  Not only that but the lock 
itself could get confused because it would have more unlocks than locks, 
because presumably the process that thinks it has the lock would 
eventually unlock as well.  I don't know how the semaphores are 
implemented on all architectures so I don't know if that would be an 
actual problem or not but it I would be surprised if they all handled 
that case gracefully.

Regards,

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-18 15:20               ` William H. Taber
@ 2005-11-18 16:30                 ` Ian Kent
  2005-11-18 17:12                   ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-18 16:30 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Fri, 18 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
> > On Thu, 17 Nov 2005, William H. Taber wrote:
> > 
> > Hi Taber,
> 
> Hi,
> You can call me Will. Most everyone else does. :^)

Sorry.

> > 
> > > 
> > > Ian,
> > > I don't think that you can fix this in the autofs by tinkering with
> > > holding and releasing the parent i_sem.  The reason for this is that you
> > > don't have any way of knowing if you hold that lock or not.  The easy case
> > > is that nobody holds the lock.  But if the lock is held you have no way to
> > > know that you are the person holding the lock and you cannot unlock
> > > someone elses lock without serious consequences.
> > 
> > 
> > Yes. I see.
> > 
> > But let me make sure I understand what you are saying.
> > 
> > The problem would be that if I release and then retake the lock for autofs
> > to do it thing there is a risk of opening the caller to the potential races
> > it is protecting itself from. 
> > Correct?
> > 
> No, it is actually a little more subtle than that.  The problem is that since
> you can be called from two code paths, one of which get's the lock and one of
> them doesn't, you are stuck if you find that the lock is held because you
> don't know who holds it.  The danger is that some innocent third party is
> holding the lock and counting on being protected by it. If you release the
> lock, then you can be creating the potential for a race in their code and
> there would be no way to detect it.  Their code path would look correct
> because it is.  Not only that but the lock itself could get confused because
> it would have more unlocks than locks, because presumably the process that
> thinks it has the lock would eventually unlock as well.  I don't know how the
> semaphores are implemented on all architectures so I don't know if that would
> be an actual problem or not but it I would be surprised if they all handled
> that case gracefully.

Yes. Thinking about it I hadn't considered the third process.

How can we get advice on this?

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-18 16:30                 ` Ian Kent
@ 2005-11-18 17:12                   ` William H. Taber
  2005-11-18 18:57                     ` Ram Pai
  2005-11-19  1:40                     ` [autofs] " Ian Kent
  0 siblings, 2 replies; 95+ messages in thread
From: William H. Taber @ 2005-11-18 17:12 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Fri, 18 Nov 2005, William H. Taber wrote:
>>
>>No, it is actually a little more subtle than that.  The problem is that since
>>you can be called from two code paths, one of which get's the lock and one of
>>them doesn't, you are stuck if you find that the lock is held because you
>>don't know who holds it.  The danger is that some innocent third party is
>>holding the lock and counting on being protected by it. If you release the
>>lock, then you can be creating the potential for a race in their code and
>>there would be no way to detect it.  Their code path would look correct
>>because it is.  Not only that but the lock itself could get confused because
>>it would have more unlocks than locks, because presumably the process that
>>thinks it has the lock would eventually unlock as well.  I don't know how the
>>semaphores are implemented on all architectures so I don't know if that would
>>be an actual problem or not but it I would be surprised if they all handled
>>that case gracefully.
> 
> 
> Yes. Thinking about it I hadn't considered the third process.
> 
> How can we get advice on this?
> 
> Ian
> 
Hey, I'm new here. I came here looking for advice. :^)
However, if there is no way to fix this in the autofs code then the only 
alternative is to fix the locking in the VFS.  So VFS folks, which is 
the better solution.  Should all callers to d_revalidate get the 
parent's i_sem lock or should d_revalidate never be called with the 
parent i_sem lock and so we need to change lookup_one_len so that it 
does not expect the parent i_sem lock to be held?

Will Taber

> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-18 17:12                   ` William H. Taber
@ 2005-11-18 18:57                     ` Ram Pai
  2005-11-18 20:08                       ` William H. Taber
  2005-11-19  1:40                     ` [autofs] " Ian Kent
  1 sibling, 1 reply; 95+ messages in thread
From: Ram Pai @ 2005-11-18 18:57 UTC (permalink / raw)
  To: William H Taber; +Cc: Ian Kent, autofs mailing list, linux-fsdevel

On Fri, 2005-11-18 at 09:12, William H. Taber wrote:
> Ian Kent wrote:
> > On Fri, 18 Nov 2005, William H. Taber wrote:
> >>
> >>No, it is actually a little more subtle than that.  The problem is that since
> >>you can be called from two code paths, one of which get's the lock and one of
> >>them doesn't, you are stuck if you find that the lock is held because you
> >>don't know who holds it.  The danger is that some innocent third party is
> >>holding the lock and counting on being protected by it. If you release the
> >>lock, then you can be creating the potential for a race in their code and
> >>there would be no way to detect it.  Their code path would look correct
> >>because it is.  Not only that but the lock itself could get confused because
> >>it would have more unlocks than locks, because presumably the process that
> >>thinks it has the lock would eventually unlock as well.  I don't know how the
> >>semaphores are implemented on all architectures so I don't know if that would
> >>be an actual problem or not but it I would be surprised if they all handled
> >>that case gracefully.
> > 
> > 
> > Yes. Thinking about it I hadn't considered the third process.
> > 
> > How can we get advice on this?
> > 
> > Ian
> > 
> Hey, I'm new here. I came here looking for advice. :^)
> However, if there is no way to fix this in the autofs code then the only 
> alternative is to fix the locking in the VFS.  So VFS folks, which is 
> the better solution.  Should all callers to d_revalidate get the 
> parent's i_sem lock or should d_revalidate never be called with the 
> parent i_sem lock and so we need to change lookup_one_len so that it 
> does not expect the parent i_sem lock to be held?

I think the problem is with cached_lookup(). It is the only place which 
calls ->revalidate() holding the parent's inode-semaphore AFAICT.

note: cached_lookup() is only called from __lookup_hash() and
__lookup_hash() is always called holding the semaphore.

VFS experts agree?
RP




> 
> Will Taber
> 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-18 18:57                     ` Ram Pai
@ 2005-11-18 20:08                       ` William H. Taber
  2005-11-19  2:52                         ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-18 20:08 UTC (permalink / raw)
  To: Ram Pai; +Cc: Ian Kent, autofs mailing list, linux-fsdevel

Ram Pai wrote:
> On Fri, 2005-11-18 at 09:12, William H. Taber wrote:

> 
> I think the problem is with cached_lookup(). It is the only place which 
> calls ->revalidate() holding the parent's inode-semaphore AFAICT.
> 
> note: cached_lookup() is only called from __lookup_hash() and
> __lookup_hash() is always called holding the semaphore.
> 
> VFS experts agree?
> RP
> 
Ram,
Lookup_one_len calls lookup_hash and it is the callers of lookup_one_len 
that are problematical.  Just as an example, lookup_one_len is called 
from nfs_sillyrename which is called, among other places in the 
nfs_rename code.  In that path the parent i_sem is obtained in do_rename 
   in the vfs code (namei.c). I would think that it would be extremely 
difficult to to change that usage.  The alternative is to move the 
obtaining of the parent i_sem from real_lookup to do_lookup.  We would 
also have to put the locking around the d_revalidate call at 
return_reval in __link_path_walk.

Again, what do the VFS experts think?

Will Taber

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-18 17:12                   ` William H. Taber
  2005-11-18 18:57                     ` Ram Pai
@ 2005-11-19  1:40                     ` Ian Kent
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-19  1:40 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Fri, 18 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
> > On Fri, 18 Nov 2005, William H. Taber wrote:
> > > 
> > > No, it is actually a little more subtle than that.  The problem is that
> > > since
> > > you can be called from two code paths, one of which get's the lock and one
> > > of
> > > them doesn't, you are stuck if you find that the lock is held because you
> > > don't know who holds it.  The danger is that some innocent third party is
> > > holding the lock and counting on being protected by it. If you release the
> > > lock, then you can be creating the potential for a race in their code and
> > > there would be no way to detect it.  Their code path would look correct
> > > because it is.  Not only that but the lock itself could get confused
> > > because
> > > it would have more unlocks than locks, because presumably the process that
> > > thinks it has the lock would eventually unlock as well.  I don't know how
> > > the
> > > semaphores are implemented on all architectures so I don't know if that
> > > would
> > > be an actual problem or not but it I would be surprised if they all
> > > handled
> > > that case gracefully.
> > 
> > 
> > Yes. Thinking about it I hadn't considered the third process.
> > 
> > How can we get advice on this?
> > 
> > Ian
> > 
> Hey, I'm new here. I came here looking for advice. :^)

Ha. It's probably wearing a bit thin for me to say I'm new here but my 
patch is quite small and I've got a lot to learn and I'm lovin' it.

> However, if there is no way to fix this in the autofs code then the only
> alternative is to fix the locking in the VFS.  So VFS folks, which is the
> better solution.  Should all callers to d_revalidate get the parent's i_sem
> lock or should d_revalidate never be called with the parent i_sem lock and so
> we need to change lookup_one_len so that it does not expect the parent i_sem
> lock to be held?

Maybe. See next post.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-18 20:08                       ` William H. Taber
@ 2005-11-19  2:52                         ` Ian Kent
  2005-11-21 16:40                           ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-19  2:52 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Fri, 18 Nov 2005, William H. Taber wrote:

> Ram Pai wrote:
> > On Fri, 2005-11-18 at 09:12, William H. Taber wrote:
> 
> > 
> > I think the problem is with cached_lookup(). It is the only place which
> > calls ->revalidate() holding the parent's inode-semaphore AFAICT.
> > 
> > note: cached_lookup() is only called from __lookup_hash() and
> > __lookup_hash() is always called holding the semaphore.
> > 
> > VFS experts agree?
> > RP
> > 
> Ram,
> Lookup_one_len calls lookup_hash and it is the callers of lookup_one_len that
> are problematical.  Just as an example, lookup_one_len is called from
> nfs_sillyrename which is called, among other places in the nfs_rename code.
> In that path the parent i_sem is obtained in do_rename   in the vfs code
> (namei.c). I would think that it would be extremely difficult to to change
> that usage.  The alternative is to move the obtaining of the parent i_sem from
> real_lookup to do_lookup.  We would also have to put the locking around the
> d_revalidate call at return_reval in __link_path_walk.
>
 
Perhaps we are making this altogether to complicated.

I'm sure that there are good reasons for the locking being the way 
it is and any attempt to change it is likely to be a disaster. So what 
about solving this by defining a usage policy based on the intent of 
the functions concerned.

For example.

The lookup_one_len a special use funtion to return the dentry 
corresponding to a path element and by definition it does not follow 
mounts or symlinks. To function correctly autofs needs to follow mounts 
and some time soon I will be posting patches that will use the the 
follow_link method as well.

So the policy could be that if autofs revalidate is called with the 
directory inode semaphore held it must validate the autofs dentry itself 
and not cause a mount request be triggered. The responsibility then 
moves to the filesystem to check if the dentry is an autofs dentry and to 
decide if it needs to then make an unlocked revalidate call. It is easy 
enough to check if the semaphore is held the autofs module. The 
filesystem check is easy enough to do once the filesystem magic number is 
moved to one of the common autofs header files.

Thoughts?

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-19  2:52                         ` Ian Kent
@ 2005-11-21 16:40                           ` William H. Taber
  2005-11-22 13:13                             ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-21 16:40 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Fri, 18 Nov 2005, William H. Taber wrote:
> 
> 
>>Ram Pai wrote:
>>
>>>On Fri, 2005-11-18 at 09:12, William H. Taber wrote:
>>
>>>I think the problem is with cached_lookup(). It is the only place which
>>>calls ->revalidate() holding the parent's inode-semaphore AFAICT.
>>>
>>>note: cached_lookup() is only called from __lookup_hash() and
>>>__lookup_hash() is always called holding the semaphore.
>>>
>>>VFS experts agree?
>>>RP
>>>
>>
>>Ram,
>>Lookup_one_len calls lookup_hash and it is the callers of lookup_one_len that
>>are problematical.  Just as an example, lookup_one_len is called from
>>nfs_sillyrename which is called, among other places in the nfs_rename code.
>>In that path the parent i_sem is obtained in do_rename   in the vfs code
>>(namei.c). I would think that it would be extremely difficult to to change
>>that usage.  The alternative is to move the obtaining of the parent i_sem from
>>real_lookup to do_lookup.  We would also have to put the locking around the
>>d_revalidate call at return_reval in __link_path_walk.
>>
> 
>  
> Perhaps we are making this altogether to complicated.
> 
> I'm sure that there are good reasons for the locking being the way 
> it is and any attempt to change it is likely to be a disaster. So what 
> about solving this by defining a usage policy based on the intent of 
> the functions concerned.
> 
> For example.
> 
> The lookup_one_len a special use funtion to return the dentry 
> corresponding to a path element and by definition it does not follow 
> mounts or symlinks. To function correctly autofs needs to follow mounts 
> and some time soon I will be posting patches that will use the the 
> follow_link method as well.
> 
> So the policy could be that if autofs revalidate is called with the 
> directory inode semaphore held it must validate the autofs dentry itself 
> and not cause a mount request be triggered. The responsibility then 
> moves to the filesystem to check if the dentry is an autofs dentry and to 
> decide if it needs to then make an unlocked revalidate call. It is easy 
> enough to check if the semaphore is held the autofs module. The 
> filesystem check is easy enough to do once the filesystem magic number is 
> moved to one of the common autofs header files.
> 
> Thoughts?
> 
> Ian
So you are asking that lookup_one_len be modified so that it knows about 
the internals of the autofs4 so that it can determine enough to know, 
before it makes the revalidate call that the the call is going to pend 
so that it can release the lock if it needs to?  This does not seem like 
a good idea to me.  The whole point of having the d_revalidate functions 
is so the VFS does not have to know the specifics of any individual 
filesystem.

Since there does not appear to be a clear locking policy on 
d_revalidate, then the autofs4 revalidate function cannot make 
assumptions about that locking state.  This means that 
autofs4_revalidate cannot pend.  I have looked some more at the 
real_lookup code, and it is prepared for the case in which the lookup 
function returns a dentry other than the one passed in.  So here is a 
proposal that might work (but I haven't looked at the autofs4 code to 
verify this.)
1) A lookup request is made for a non-existant automounted file. 
Real_lookup calls autofs4_lookup.
2) Autofs4_lookup saves the information about this request somewhere it 
can find it again and wakes up the automount demon that it has work to 
do.  It does not put a dentry in the dentry cache and it then releases 
the parent i_sem and waits for the mount to complete.
3) Any subsequent lookup for this directory that is not from the 
automount demon will look for a mount request in progress, and if found, 
it will also release the parent lock and add itself to the wait queue.
4)The automount demon will run and get the information that it needs to 
complete the mount request and  then issue the mount.  The lookup 
request from mount will call real_lookup.  Since the demon is in OZ mode 
it does not pend, it fills in the dentry and when the dentry is fully 
ready for consumption, it calls d_add and wakes up the waiters.
5) When the waiters wake up, they get the new dentry and real_lookup 
will discard the one that had been allocated.

This keeps all of the waiting inside the autofs4 lookup function where 
the lock state is defined.  I realize that this may be a lot of work, 
but I haven't seen a possible solution that doesn't involve that.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-21 16:40                           ` William H. Taber
@ 2005-11-22 13:13                             ` Ian Kent
  2005-11-22 17:48                               ` [autofs] " William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-22 13:13 UTC (permalink / raw)
  To: William H. Taber; +Cc: autofs mailing list, linux-fsdevel

On Mon, 21 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
> > On Fri, 18 Nov 2005, William H. Taber wrote:
> > 
> > 
> >>Ram Pai wrote:
> >>
> >>>On Fri, 2005-11-18 at 09:12, William H. Taber wrote:
> >>
> >>>I think the problem is with cached_lookup(). It is the only place which
> >>>calls ->revalidate() holding the parent's inode-semaphore AFAICT.
> >>>
> >>>note: cached_lookup() is only called from __lookup_hash() and
> >>>__lookup_hash() is always called holding the semaphore.
> >>>
> >>>VFS experts agree?
> >>>RP
> >>>
> >>
> >>Ram,
> >>Lookup_one_len calls lookup_hash and it is the callers of lookup_one_len that
> >>are problematical.  Just as an example, lookup_one_len is called from
> >>nfs_sillyrename which is called, among other places in the nfs_rename code.
> >>In that path the parent i_sem is obtained in do_rename   in the vfs code
> >>(namei.c). I would think that it would be extremely difficult to to change
> >>that usage.  The alternative is to move the obtaining of the parent i_sem from
> >>real_lookup to do_lookup.  We would also have to put the locking around the
> >>d_revalidate call at return_reval in __link_path_walk.
> >>
> > 
> >  
> > Perhaps we are making this altogether to complicated.
> > 
> > I'm sure that there are good reasons for the locking being the way 
> > it is and any attempt to change it is likely to be a disaster. So what 
> > about solving this by defining a usage policy based on the intent of 
> > the functions concerned.
> > 
> > For example.
> > 
> > The lookup_one_len a special use funtion to return the dentry 
> > corresponding to a path element and by definition it does not follow 
> > mounts or symlinks. To function correctly autofs needs to follow mounts 
> > and some time soon I will be posting patches that will use the the 
> > follow_link method as well.
> > 
> > So the policy could be that if autofs revalidate is called with the 
> > directory inode semaphore held it must validate the autofs dentry itself 
> > and not cause a mount request be triggered. The responsibility then 
> > moves to the filesystem to check if the dentry is an autofs dentry and to 
> > decide if it needs to then make an unlocked revalidate call. It is easy 
> > enough to check if the semaphore is held the autofs module. The 
> > filesystem check is easy enough to do once the filesystem magic number is 
> > moved to one of the common autofs header files.
> > 
> > Thoughts?
> > 
> > Ian
> So you are asking that lookup_one_len be modified so that it knows about 
> the internals of the autofs4 so that it can determine enough to know, 
> before it makes the revalidate call that the the call is going to pend 
> so that it can release the lock if it needs to?  This does not seem like 
> a good idea to me.  The whole point of having the d_revalidate functions 
> is so the VFS does not have to know the specifics of any individual 
> filesystem.

No not at all. Absolutely, that would be a bad idea.
I thought my description above was fairly clear but obviously not.

> 
> Since there does not appear to be a clear locking policy on 
> d_revalidate, then the autofs4 revalidate function cannot make 
> assumptions about that locking state.  This means that 

I'm not suggesting that either. However, it is relatively simple to check 
if a semaphore is held by someone outside of your code (my code in this 
case, see down_trylock()). I think that checking would be safe as if the 
semaphore is held by someone else trylock fails and autofs can assume 
an equivelent state to oz_mode, passing through not mounting anything. 
If the semaphore is not held trylock succeeds and autofs can immediately 
release the semaphore and continue. Can you think of any examples of this 
being unsafe?

> autofs4_revalidate cannot pend.  I have looked some more at the 

Exactly and that's what I'm suggesting. Take account of what the 
lookup_one_len is advertised to do. My point is that lookup_one_len is not 
supposed to follow mounts or soft links, by definition, so it shouldn't 
cause autofs to trigger any mounts. If a filesystem wants to use it then 
it then it needs to take account of its defined behaviour, warts and all.

> real_lookup code, and it is prepared for the case in which the lookup 
> function returns a dentry other than the one passed in.  So here is a 
> proposal that might work (but I haven't looked at the autofs4 code to 
> verify this.)
> 1) A lookup request is made for a non-existant automounted file. 
> Real_lookup calls autofs4_lookup.
> 2) Autofs4_lookup saves the information about this request somewhere it 
> can find it again and wakes up the automount demon that it has work to 
> do.  It does not put a dentry in the dentry cache and it then releases 
> the parent i_sem and waits for the mount to complete.
> 3) Any subsequent lookup for this directory that is not from the 
> automount demon will look for a mount request in progress, and if found, 
> it will also release the parent lock and add itself to the wait queue.
> 4)The automount demon will run and get the information that it needs to 
> complete the mount request and  then issue the mount.  The lookup 
> request from mount will call real_lookup.  Since the demon is in OZ mode 
> it does not pend, it fills in the dentry and when the dentry is fully 
> ready for consumption, it calls d_add and wakes up the waiters.
> 5) When the waiters wake up, they get the new dentry and real_lookup 
> will discard the one that had been allocated.
> 
> This keeps all of the waiting inside the autofs4 lookup function where 
> the lock state is defined.  I realize that this may be a lot of work, 
> but I haven't seen a possible solution that doesn't involve that.

Sounds like a lot of work and likely quite interesting but directories can 
and often do exist in the autofs filesystem that don't have active mounts. 

For these directories only the revaidate method is called at auto mount 
time. It's worth remembering that, as autofs is a pseudo filesystem, it 
pins the dentry for each of its objects so they don't go away. Maybe you're 
suggesting I change this?

Sorry, I don't mean to be rude but I'm suggesting your using 
lookup_one_len incorectly. I'll need to look at other code to see if this 
actually holds true, but, as automounting is usually not the first thing 
that people think of when they are writting a filesystem I expect I won't 
get much support from there either. 

What I'm proposing is:

1) lookup_one_len should never cause anything to be auto
   mounted because of its defined behaviour and autofs
   should behave in line with this definition.
2) The filesystem that calls lookup_one_len directly or
   indirectly is responsibe for checking if it has walked
   onto an autofs dentry and decide what action it should
   take.

I thought that this was quite sensible and a fairly simple resolution?

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-22 13:13                             ` Ian Kent
@ 2005-11-22 17:48                               ` William H. Taber
  2005-11-23 14:11                                 ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-22 17:48 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:

>>>Perhaps we are making this altogether to complicated.
>>>
>>>I'm sure that there are good reasons for the locking being the way 
>>>it is and any attempt to change it is likely to be a disaster. So what 
>>>about solving this by defining a usage policy based on the intent of 
>>>the functions concerned.

While we have been discussing this I have been playing with adding locks 
around the d_revalidate calls and it is difficult (or I am obtuse).  If 
I can get that to work it will be the simplest approach but so far I am 
getting worse deadlocks.

>>>
>>>For example.
>>>
>>>The lookup_one_len a special use funtion to return the dentry 
>>>corresponding to a path element and by definition it does not follow 
>>>mounts or symlinks. To function correctly autofs needs to follow mounts 
>>>and some time soon I will be posting patches that will use the the 
>>>follow_link method as well.
>>>
>>>So the policy could be that if autofs revalidate is called with the 
>>>directory inode semaphore held it must validate the autofs dentry itself 
>>>and not cause a mount request be triggered. The responsibility then 
>>>moves to the filesystem to check if the dentry is an autofs dentry and to 
>>>decide if it needs to then make an unlocked revalidate call. It is easy 
>>>enough to check if the semaphore is held the autofs module. The 
>>>filesystem check is easy enough to do once the filesystem magic number is 
>>>moved to one of the common autofs header files.
>>>
>>>Thoughts?
>>>
>>>Ian
>>
>>So you are asking that lookup_one_len be modified so that it knows about 
>>the internals of the autofs4 so that it can determine enough to know, 
>>before it makes the revalidate call that the the call is going to pend 
>>so that it can release the lock if it needs to?  This does not seem like 
>>a good idea to me.  The whole point of having the d_revalidate functions 
>>is so the VFS does not have to know the specifics of any individual 
>>filesystem.
> 
> 
> No not at all. Absolutely, that would be a bad idea.
> I thought my description above was fairly clear but obviously not.
> 
Or maybe I was being dense. :^)
> 
>>Since there does not appear to be a clear locking policy on 
>>d_revalidate, then the autofs4 revalidate function cannot make 
>>assumptions about that locking state.  This means that 
> 
> 
> I'm not suggesting that either. However, it is relatively simple to check 
> if a semaphore is held by someone outside of your code (my code in this 
> case, see down_trylock()). I think that checking would be safe as if the 
> semaphore is held by someone else trylock fails and autofs can assume 
> an equivelent state to oz_mode, passing through not mounting anything. 
> If the semaphore is not held trylock succeeds and autofs can immediately 
> release the semaphore and continue. Can you think of any examples of this 
> being unsafe?
> 
> 
I am not sure about safety.  I haven't researched all of the callers to 
lookup_one_len.  But what is the effect of this on the lookup itself? 
If your revalidate functions returns true, then the caller will expect 
to have a dentry that they can use.  Most likely the next thing they 
will do is to try to cross the mountpoint.  But the mountpoint might not 
be set up yet.  Alternatively, you can return false but then the vfs 
will call d_invalidate on the dentry.  Either d_invalidate succeeds and 
the dentry is unhashed and the autofs lookup function is called or it 
returns -EBUSY, at which point the lookup fails and returns the error. 
The first case is essentially what I was proposing, except that I said 
not to even bother putting the dentry into the hash chains at all.  The
EBUSY case is probably not what you want.

>>autofs4_revalidate cannot pend.  I have looked some more at the 
> 
> 
> Exactly and that's what I'm suggesting. Take account of what the 
> lookup_one_len is advertised to do. My point is that lookup_one_len is not 
> supposed to follow mounts or soft links, by definition, so it shouldn't 
> cause autofs to trigger any mounts. If a filesystem wants to use it then 
> it then it needs to take account of its defined behaviour, warts and all.
> 
I am not expecting lookup_one_len to follow mount points.  I expect to 
follow them myself.  But I do expect that if this is a mountpoint, that 
autofs will set it up for me.  If not, what is the point of an automounter?
> 
>>real_lookup code, and it is prepared for the case in which the lookup 
>>function returns a dentry other than the one passed in.  So here is a 
>>proposal that might work (but I haven't looked at the autofs4 code to 
>>verify this.)
>>1) A lookup request is made for a non-existant automounted file. 
>>Real_lookup calls autofs4_lookup.
>>2) Autofs4_lookup saves the information about this request somewhere it 
>>can find it again and wakes up the automount demon that it has work to 
>>do.  It does not put a dentry in the dentry cache and it then releases 
>>the parent i_sem and waits for the mount to complete.
>>3) Any subsequent lookup for this directory that is not from the 
>>automount demon will look for a mount request in progress, and if found, 
>>it will also release the parent lock and add itself to the wait queue.
>>4)The automount demon will run and get the information that it needs to 
>>complete the mount request and  then issue the mount.  The lookup 
>>request from mount will call real_lookup.  Since the demon is in OZ mode 
>>it does not pend, it fills in the dentry and when the dentry is fully 
>>ready for consumption, it calls d_add and wakes up the waiters.
>>5) When the waiters wake up, they get the new dentry and real_lookup 
>>will discard the one that had been allocated.
>>
>>This keeps all of the waiting inside the autofs4 lookup function where 
>>the lock state is defined.  I realize that this may be a lot of work, 
>>but I haven't seen a possible solution that doesn't involve that.
> 
> 
> Sounds like a lot of work and likely quite interesting but directories can 
> and often do exist in the autofs filesystem that don't have active mounts. 
> 
> For these directories only the revaidate method is called at auto mount 
> time. It's worth remembering that, as autofs is a pseudo filesystem, it 
> pins the dentry for each of its objects so they don't go away. Maybe you're 
> suggesting I change this?

Exactly.  If the dentries are unhashed at umount time then the 
revalidate case is not an issue.  I don't know the how much work is 
invovled in setting things up in the first place so you might want to 
cache your unused dentries yourself if that avoids having to reread the 
autofs configuration files.  But that is an implementation detail for 
you to consider.
> 
> Sorry, I don't mean to be rude but I'm suggesting your using 
> lookup_one_len incorectly. I'll need to look at other code to see if this 
> actually holds true, but, as automounting is usually not the first thing 
> that people think of when they are writting a filesystem I expect I won't 
> get much support from there either. 

I don't know what you mean by using it incorrectly.  We have a shadow 
filesytem and our lookup function is being called and we are trying to 
find the corresponding file/directory in the root filesystem.  We are 
calling lookup_one_len because we are trying to find the next name in 
the path.  We are prepared to handle mountpoint crossings, but as I said 
above, the mountpoint needs to be setup so we can cross it.  We cannot 
call path_walk or its variants because we do not have the entire path.


> 
> What I'm proposing is:
> 
> 1) lookup_one_len should never cause anything to be auto
>    mounted because of its defined behaviour and autofs
>    should behave in line with this definition.

What defined behaviour?  The purpose of autofs is to automount 
directories.  I am not looking for you to cross a mountpoint for me, I 
just want you to setup the mount so I can cross it myself.
> 2) The filesystem that calls lookup_one_len directly or
>    indirectly is responsibe for checking if it has walked
>    onto an autofs dentry and decide what action it should
>    take.
And what action would that be?  Not enter into any autofs directory 
tree?  Do the mount myself?  Return ENOENT?  Return inconsistent results 
based on whether someone else has triggered the automount for me?

And from an interface perspective, a caller of a function like 
lookup_one_len should never have to worry about the implementation of 
the underlying filesystem, or even have to know or care what the 
filesystem is.
> 
> I thought that this was quite sensible and a fairly simple resolution?
But it defeats the whole purpose of having an automounter.
> 
> Ian
> 
Regards,
Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-22 17:48                               ` [autofs] " William H. Taber
@ 2005-11-23 14:11                                 ` Ian Kent
  2005-11-23 16:42                                   ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-23 14:11 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Tue, 22 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
> 
> > > > Perhaps we are making this altogether to complicated.
> > > > 
> > > > I'm sure that there are good reasons for the locking being the way it is
> > > > and any attempt to change it is likely to be a disaster. So what about
> > > > solving this by defining a usage policy based on the intent of the
> > > > functions concerned.
> 
> While we have been discussing this I have been playing with adding locks
> around the d_revalidate calls and it is difficult (or I am obtuse).  If I can
> get that to work it will be the simplest approach but so far I am getting
> worse deadlocks.
> 
> > > > 
> > > > For example.
> > > > 
> > > > The lookup_one_len a special use funtion to return the dentry
> > > > corresponding to a path element and by definition it does not follow
> > > > mounts or symlinks. To function correctly autofs needs to follow mounts
> > > > and some time soon I will be posting patches that will use the the
> > > > follow_link method as well.
> > > > 
> > > > So the policy could be that if autofs revalidate is called with the
> > > > directory inode semaphore held it must validate the autofs dentry itself
> > > > and not cause a mount request be triggered. The responsibility then
> > > > moves to the filesystem to check if the dentry is an autofs dentry and
> > > > to decide if it needs to then make an unlocked revalidate call. It is
> > > > easy enough to check if the semaphore is held the autofs module. The
> > > > filesystem check is easy enough to do once the filesystem magic number
> > > > is moved to one of the common autofs header files.
> > > > 
> > > > Thoughts?
> > > > 
> > > > Ian
> > > 
> > > So you are asking that lookup_one_len be modified so that it knows about
> > > the internals of the autofs4 so that it can determine enough to know,
> > > before it makes the revalidate call that the the call is going to pend so
> > > that it can release the lock if it needs to?  This does not seem like a
> > > good idea to me.  The whole point of having the d_revalidate functions is
> > > so the VFS does not have to know the specifics of any individual
> > > filesystem.
> > 
> > 
> > No not at all. Absolutely, that would be a bad idea.
> > I thought my description above was fairly clear but obviously not.
> > 
> Or maybe I was being dense. :^)
> > 
> > > Since there does not appear to be a clear locking policy on d_revalidate,
> > > then the autofs4 revalidate function cannot make assumptions about that
> > > locking state.  This means that 
> > 
> > 
> > I'm not suggesting that either. However, it is relatively simple to check if
> > a semaphore is held by someone outside of your code (my code in this case,
> > see down_trylock()). I think that checking would be safe as if the semaphore
> > is held by someone else trylock fails and autofs can assume an equivelent
> > state to oz_mode, passing through not mounting anything. If the semaphore is
> > not held trylock succeeds and autofs can immediately release the semaphore
> > and continue. Can you think of any examples of this being unsafe?
> > 
> > 
> I am not sure about safety.  I haven't researched all of the callers to
> lookup_one_len.  But what is the effect of this on the lookup itself? If your
> revalidate functions returns true, then the caller will expect to have a
> dentry that they can use.  Most likely the next thing they will do is to try
> to cross the mountpoint.  But the mountpoint might not be set up yet.
> Alternatively, you can return false but then the vfs will call d_invalidate on
> the dentry.  Either d_invalidate succeeds and the dentry is unhashed and the
> autofs lookup function is called or it returns -EBUSY, at which point the
> lookup fails and returns the error. The first case is essentially what I was
> proposing, except that I said not to even bother putting the dentry into the
> hash chains at all.  The
> EBUSY case is probably not what you want.

OK I think I'm starting to get what your saying. I guess I didn't want to 
hear it because I've been moving toward pushing everything to revalidate.

For some reason I woke up early this morning. I read your reply and I've 
been thinking about it all day (hard to change gears when there's a 
challenge like this, bad for work, fun for me, I'm sure I'll get cained 
for being somewhere else).

To verify I've got it this time all you are saying is, avoid revalidate by 
keeping all unmounted dentrys unhashed? Nothing much more really?

I haven't really thought it through completely yet. You've noticed I'm 
not a quick study no doubt.

So far it seems doable though. One problem point could be if one of these 
lookups come in during user space daemon startup but that's detail atm.

Following startup I have two cases to deal with (i'm only listing them 
cause I think you missed case 2):

1) No directory exists - it's created by the daemon during the callback to 
the userspace before it perform the mount.

2) Directory already exists - created at startup before any mount 
activity.

Case 2 needs special attention. To achieve this I would need to have two 
types of unhashed dentry, valid and invalid (perhaps because of a ENOENT 
return from the daemon on mount).

Not really that hard to do I think.

I'd need to rework the readdir code to fill unhashed, valid dentrys and I 
can set status on the return codes from the daemon. There's likely a bunch 
of other detail as well.

Unfortuneatly for me this is only half of the solution ... see below.

> 
> > > autofs4_revalidate cannot pend.  I have looked some more at the 
> > 
> > 
> > Exactly and that's what I'm suggesting. Take account of what the
> > lookup_one_len is advertised to do. My point is that lookup_one_len is not
> > supposed to follow mounts or soft links, by definition, so it shouldn't
> > cause autofs to trigger any mounts. If a filesystem wants to use it then it
> > then it needs to take account of its defined behaviour, warts and all.
> > 
> I am not expecting lookup_one_len to follow mount points.  I expect to follow
> them myself.  But I do expect that if this is a mountpoint, that autofs will
> set it up for me.  If not, what is the point of an automounter?
> > 
> > > real_lookup code, and it is prepared for the case in which the lookup
> > > function returns a dentry other than the one passed in.  So here is a
> > > proposal that might work (but I haven't looked at the autofs4 code to
> > > verify this.)
> > > 1) A lookup request is made for a non-existant automounted file.
> > > Real_lookup calls autofs4_lookup.
> > > 2) Autofs4_lookup saves the information about this request somewhere it
> > > can find it again and wakes up the automount demon that it has work to do.
> > > It does not put a dentry in the dentry cache and it then releases the
> > > parent i_sem and waits for the mount to complete.
> > > 3) Any subsequent lookup for this directory that is not from the automount
> > > demon will look for a mount request in progress, and if found, it will
> > > also release the parent lock and add itself to the wait queue.
> > > 4)The automount demon will run and get the information that it needs to
> > > complete the mount request and  then issue the mount.  The lookup request
> > > from mount will call real_lookup.  Since the demon is in OZ mode it does
> > > not pend, it fills in the dentry and when the dentry is fully ready for
> > > consumption, it calls d_add and wakes up the waiters.
> > > 5) When the waiters wake up, they get the new dentry and real_lookup will
> > > discard the one that had been allocated.
> > > 
> > > This keeps all of the waiting inside the autofs4 lookup function where the
> > > lock state is defined.  I realize that this may be a lot of work, but I
> > > haven't seen a possible solution that doesn't involve that.
> > 
> > 
> > Sounds like a lot of work and likely quite interesting but directories can
> > and often do exist in the autofs filesystem that don't have active mounts. 
> > For these directories only the revaidate method is called at auto mount
> > time. It's worth remembering that, as autofs is a pseudo filesystem, it pins
> > the dentry for each of its objects so they don't go away. Maybe you're
> > suggesting I change this?
> 
> Exactly.  If the dentries are unhashed at umount time then the revalidate case
> is not an issue.  I don't know the how much work is invovled in setting things
> up in the first place so you might want to cache your unused dentries yourself
> if that avoids having to reread the autofs configuration files.  But that is
> an implementation detail for you to consider.
> > 
> > Sorry, I don't mean to be rude but I'm suggesting your using lookup_one_len
> > incorectly. I'll need to look at other code to see if this actually holds
> > true, but, as automounting is usually not the first thing that people think
> > of when they are writting a filesystem I expect I won't get much support
> > from there either. 
> 
> I don't know what you mean by using it incorrectly.  We have a shadow
> filesytem and our lookup function is being called and we are trying to find
> the corresponding file/directory in the root filesystem.  We are calling
> lookup_one_len because we are trying to find the next name in the path.  We
> are prepared to handle mountpoint crossings, but as I said above, the
> mountpoint needs to be setup so we can cross it.  We cannot call path_walk or
> its variants because we do not have the entire path.
> 
> 
> > 
> > What I'm proposing is:
> > 
> > 1) lookup_one_len should never cause anything to be auto
> >    mounted because of its defined behaviour and autofs
> >    should behave in line with this definition.
> 
> What defined behaviour?  The purpose of autofs is to automount directories.  I
> am not looking for you to cross a mountpoint for me, I just want you to setup
> the mount so I can cross it myself.
> > 2) The filesystem that calls lookup_one_len directly or
> >    indirectly is responsibe for checking if it has walked
> >    onto an autofs dentry and decide what action it should
> >    take.
> And what action would that be?  Not enter into any autofs directory tree?  Do
> the mount myself?  Return ENOENT?  Return inconsistent results based on
> whether someone else has triggered the automount for me?

I was thinking of something like EAGAIN to the calling fs - meaning, ok 
but I need to be revalidated (lockles) for you to be sure.

> 
> And from an interface perspective, a caller of a function like lookup_one_len
> should never have to worry about the implementation of the underlying
> filesystem, or even have to know or care what the filesystem is.
> > 
> > I thought that this was quite sensible and a fairly simple resolution?
> But it defeats the whole purpose of having an automounter.

Yep. Sigh.

But there's more. The Linux autofs implementation lacks some crucial 
features which I really want to add.

I posted an RFC to LKML and had no interest but, since it relates to this 
issue, I'd really appreciate it if you could give it a quick read and 
perhaps help out a bit with your thoughts.

http://themaw.net/direct.txt

The issue that comes up with this is that, for this to work the fs would 
have to do

if (follow_link inode method is defined)
	call follow_link

where as 

if (S_ISLNK(...))
	call follow_link

won't work.

Of course the other way to do it would be to add another method but that 
would complicate an already busy link_path_walk. I don't think people 
would agree with that.

The side affects of setting the link bit would have undesirable 
consequences.

Ideas?

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-23 14:11                                 ` Ian Kent
@ 2005-11-23 16:42                                   ` William H. Taber
  2005-11-23 17:52                                     ` Ian Kent
  2005-11-23 17:52                                     ` Ian Kent
  0 siblings, 2 replies; 95+ messages in thread
From: William H. Taber @ 2005-11-23 16:42 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
  >
> OK I think I'm starting to get what your saying. I guess I didn't want to 
> hear it because I've been moving toward pushing everything to revalidate.
> 
> For some reason I woke up early this morning. I read your reply and I've 
> been thinking about it all day (hard to change gears when there's a 
> challenge like this, bad for work, fun for me, I'm sure I'll get cained 
> for being somewhere else).
> 
> To verify I've got it this time all you are saying is, avoid revalidate by 
> keeping all unmounted dentrys unhashed? Nothing much more really?
> 
> I haven't really thought it through completely yet. You've noticed I'm 
> not a quick study no doubt.
Who am I to complain about that? :^)
> 
> So far it seems doable though. One problem point could be if one of these 
> lookups come in during user space daemon startup but that's detail atm.
> 
> Following startup I have two cases to deal with (i'm only listing them 
> cause I think you missed case 2):
> 
> 1) No directory exists - it's created by the daemon during the callback to 
> the userspace before it perform the mount.
> 
> 2) Directory already exists - created at startup before any mount 
> activity.
I am a little unclear about this.  Which directory are you talking 
about?  The mount-over directory or the directory being mounted?  My 
picture of things (and I haven't verified this by doing anything mundane 
like reading the code) is that There is a directory (such as /net ) 
which is of type autofs and as needed new directories of type autofs are 
created under it for the various hosts/filenames.  Over these 
subdirectories (the mount-over directories) get mounted the remote 
filesystems, usually of type NFS.  Is this a correct understanding?
> 
> Case 2 needs special attention. To achieve this I would need to have two 
> types of unhashed dentry, valid and invalid (perhaps because of a ENOENT 
> return from the daemon on mount).
> 
If my understanding above is correct, I don't think you need to hide the 
dentry for the autofs mount-over directory.  If there is an active 
mount, then the dentries d_mounted flag will be set and the normal 
mountpoint traversal will work.  If nothing is mounted here then the 
autofs mount-over directory lookup functions will be called.  This is 
where the actual mount request gets triggered.  The dentry created here 
should not be added to the dentry cache until the dentry is actually 
ready to use.  It has to be kept in a way that can be found by 
subsequent calls to lookup in case there are multiple requests for it. 
The trick is that the first lookup to succeed has to be the one for the 
mount request.  But once it is on the dentry hash chain, revalidate has 
to be careful because if the revalidate fails then the dentry will be 
invalidated.  And if revalidate succeeds then everything needs to be 
setup so that folow_down will work.  Hmm.  I will have to think about 
this some more.
> Not really that hard to do I think.
> 
> I'd need to rework the readdir code to fill unhashed, valid dentrys and I 
> can set status on the return codes from the daemon. There's likely a bunch 
> of other detail as well.
> 
> Unfortuneatly for me this is only half of the solution ... see below.
> 
> 
>>>>autofs4_revalidate cannot pend.  I have looked some more at the 
>>>
>>>
>>>Exactly and that's what I'm suggesting. Take account of what the
>>>lookup_one_len is advertised to do. My point is that lookup_one_len is not
>>>supposed to follow mounts or soft links, by definition, so it shouldn't
>>>cause autofs to trigger any mounts. If a filesystem wants to use it then it
>>>then it needs to take account of its defined behaviour, warts and all.
>>>
>>
>>I am not expecting lookup_one_len to follow mount points.  I expect to follow
>>them myself.  But I do expect that if this is a mountpoint, that autofs will
>>set it up for me.  If not, what is the point of an automounter?
>>
>>>>real_lookup code, and it is prepared for the case in which the lookup
>>>>function returns a dentry other than the one passed in.  So here is a
>>>>proposal that might work (but I haven't looked at the autofs4 code to
>>>>verify this.)
>>>>1) A lookup request is made for a non-existant automounted file.
>>>>Real_lookup calls autofs4_lookup.
>>>>2) Autofs4_lookup saves the information about this request somewhere it
>>>>can find it again and wakes up the automount demon that it has work to do.
>>>>It does not put a dentry in the dentry cache and it then releases the
>>>>parent i_sem and waits for the mount to complete.
>>>>3) Any subsequent lookup for this directory that is not from the automount
>>>>demon will look for a mount request in progress, and if found, it will
>>>>also release the parent lock and add itself to the wait queue.
>>>>4)The automount demon will run and get the information that it needs to
>>>>complete the mount request and  then issue the mount.  The lookup request
>>>>from mount will call real_lookup.  Since the demon is in OZ mode it does
>>>>not pend, it fills in the dentry and when the dentry is fully ready for
>>>>consumption, it calls d_add and wakes up the waiters.
>>>>5) When the waiters wake up, they get the new dentry and real_lookup will
>>>>discard the one that had been allocated.
>>>>
>>>>This keeps all of the waiting inside the autofs4 lookup function where the
>>>>lock state is defined.  I realize that this may be a lot of work, but I
>>>>haven't seen a possible solution that doesn't involve that.
>>>
>>>
>>>Sounds like a lot of work and likely quite interesting but directories can
>>>and often do exist in the autofs filesystem that don't have active mounts. 
>>>For these directories only the revaidate method is called at auto mount
>>>time. It's worth remembering that, as autofs is a pseudo filesystem, it pins
>>>the dentry for each of its objects so they don't go away. Maybe you're
>>>suggesting I change this?
>>
>>Exactly.  If the dentries are unhashed at umount time then the revalidate case
>>is not an issue.  I don't know the how much work is invovled in setting things
>>up in the first place so you might want to cache your unused dentries yourself
>>if that avoids having to reread the autofs configuration files.  But that is
>>an implementation detail for you to consider.
>>
>>>Sorry, I don't mean to be rude but I'm suggesting your using lookup_one_len
>>>incorectly. I'll need to look at other code to see if this actually holds
>>>true, but, as automounting is usually not the first thing that people think
>>>of when they are writting a filesystem I expect I won't get much support
>>>from there either. 
>>
>>I don't know what you mean by using it incorrectly.  We have a shadow
>>filesytem and our lookup function is being called and we are trying to find
>>the corresponding file/directory in the root filesystem.  We are calling
>>lookup_one_len because we are trying to find the next name in the path.  We
>>are prepared to handle mountpoint crossings, but as I said above, the
>>mountpoint needs to be setup so we can cross it.  We cannot call path_walk or
>>its variants because we do not have the entire path.
>>
>>
>>
>>>What I'm proposing is:
>>>
>>>1) lookup_one_len should never cause anything to be auto
>>>   mounted because of its defined behaviour and autofs
>>>   should behave in line with this definition.
>>
>>What defined behaviour?  The purpose of autofs is to automount directories.  I
>>am not looking for you to cross a mountpoint for me, I just want you to setup
>>the mount so I can cross it myself.
>>
>>>2) The filesystem that calls lookup_one_len directly or
>>>   indirectly is responsibe for checking if it has walked
>>>   onto an autofs dentry and decide what action it should
>>>   take.
>>
>>And what action would that be?  Not enter into any autofs directory tree?  Do
>>the mount myself?  Return ENOENT?  Return inconsistent results based on
>>whether someone else has triggered the automount for me?
> 
> 
> I was thinking of something like EAGAIN to the calling fs - meaning, ok 
> but I need to be revalidated (lockles) for you to be sure.
> 
> 
>>And from an interface perspective, a caller of a function like lookup_one_len
>>should never have to worry about the implementation of the underlying
>>filesystem, or even have to know or care what the filesystem is.
>>
>>>I thought that this was quite sensible and a fairly simple resolution?
>>
>>But it defeats the whole purpose of having an automounter.
> 
> 
> Yep. Sigh.
> 
> But there's more. The Linux autofs implementation lacks some crucial 
> features which I really want to add.
> 
> I posted an RFC to LKML and had no interest but, since it relates to this 
> issue, I'd really appreciate it if you could give it a quick read and 
> perhaps help out a bit with your thoughts.
> 
> http://themaw.net/direct.txt
> 
> The issue that comes up with this is that, for this to work the fs would 
> have to do
> 
> if (follow_link inode method is defined)
> 	call follow_link
> 
> where as 
> 
> if (S_ISLNK(...))
> 	call follow_link
> 
> won't work.
> 
> Of course the other way to do it would be to add another method but that 
> would complicate an already busy link_path_walk. I don't think people 
> would agree with that.
> 
> The side affects of setting the link bit would have undesirable 
> consequences.
> 
> Ideas?
> 
I have looked at your proposal quickly and it seems reasonable on its 
surface.  I have implemented a stackable filesystem, you are wise to 
want to avoid doing so.  I would need to give it some more thought.  But 
  since I will be on holiday until Monday, it is not going to happen soon.

Regards,
Will


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-23 16:42                                   ` William H. Taber
@ 2005-11-23 17:52                                     ` Ian Kent
  2005-11-23 18:47                                       ` William H. Taber
  2005-11-23 17:52                                     ` Ian Kent
  1 sibling, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-23 17:52 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Wed, 23 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
>  >
> > OK I think I'm starting to get what your saying. I guess I didn't want to
> > hear it because I've been moving toward pushing everything to revalidate.
> > 
> > For some reason I woke up early this morning. I read your reply and I've
> > been thinking about it all day (hard to change gears when there's a
> > challenge like this, bad for work, fun for me, I'm sure I'll get cained for
> > being somewhere else).
> > 
> > To verify I've got it this time all you are saying is, avoid revalidate by
> > keeping all unmounted dentrys unhashed? Nothing much more really?
> > 
> > I haven't really thought it through completely yet. You've noticed I'm not a
> > quick study no doubt.
> Who am I to complain about that? :^)
> > 
> > So far it seems doable though. One problem point could be if one of these
> > lookups come in during user space daemon startup but that's detail atm.
> > 
> > Following startup I have two cases to deal with (i'm only listing them cause
> > I think you missed case 2):
> > 
> > 1) No directory exists - it's created by the daemon during the callback to
> > the userspace before it perform the mount.
> > 
> > 2) Directory already exists - created at startup before any mount activity.
> I am a little unclear about this.  Which directory are you talking about?  The
> mount-over directory or the directory being mounted?  My picture of things
> (and I haven't verified this by doing anything mundane like reading the code)
> is that There is a directory (such as /net ) which is of type autofs and as
> needed new directories of type autofs are created under it for the various
> hosts/filenames.  Over these subdirectories (the mount-over directories) get
> mounted the remote filesystems, usually of type NFS.  Is this a correct
> understanding?

Yep. That's an indirect mount.

The directories I'm refering to are the ones created inside the autofs 
mount point /net or other autofs mount point. Creating the directories 
makes them browsable without necessarily mounting them (as long as the 
module knows when to trigger a mount request). This lazyness is what 
causes all the fus.

> > 
> > Case 2 needs special attention. To achieve this I would need to have two
> > types of unhashed dentry, valid and invalid (perhaps because of a ENOENT
> > return from the daemon on mount).
> > 
> If my understanding above is correct, I don't think you need to hide the
> dentry for the autofs mount-over directory.  If there is an active mount, then
> the dentries d_mounted flag will be set and the normal mountpoint traversal
> will work.  If nothing is mounted here then the autofs mount-over directory
> lookup functions will be called.  This is where the actual mount request gets
> triggered.  The dentry created here should not be added to the dentry cache
> until the dentry is actually ready to use.  It has to be kept in a way that
> can be found by subsequent calls to lookup in case there are multiple requests
> for it. The trick is that the first lookup to succeed has to be the one for
> the mount request.  But once it is on the dentry hash chain, revalidate has to
> be careful because if the revalidate fails then the dentry will be
> invalidated.  And if revalidate succeeds then everything needs to be setup so
> that folow_down will work.  Hmm.  I will have to think about this some more.

That was how things used to work but late mounting is what's needed to 
provide the function expected of an automounter when people start using 
this in an enterprise environment. Basically autofs tells lies until it's 
forced to tell the truth.

Point is people expect to be able to see the mount points without causing 
them to mount until they actually try and access something inside the 
mount.

Hence "browseable".

> > Not really that hard to do I think.
> > 
> > I'd need to rework the readdir code to fill unhashed, valid dentrys and I
> > can set status on the return codes from the daemon. There's likely a bunch
> > of other detail as well.
> > 
> > Unfortuneatly for me this is only half of the solution ... see below.
> > 
> > 
> > > > > autofs4_revalidate cannot pend.  I have looked some more at the 
> > > > 
> > > > 
> > > > Exactly and that's what I'm suggesting. Take account of what the
> > > > lookup_one_len is advertised to do. My point is that lookup_one_len is
> > > > not
> > > > supposed to follow mounts or soft links, by definition, so it shouldn't
> > > > cause autofs to trigger any mounts. If a filesystem wants to use it then
> > > > it
> > > > then it needs to take account of its defined behaviour, warts and all.
> > > > 
> > > 
> > > I am not expecting lookup_one_len to follow mount points.  I expect to
> > > follow
> > > them myself.  But I do expect that if this is a mountpoint, that autofs
> > > will
> > > set it up for me.  If not, what is the point of an automounter?
> > > 
> > > > > real_lookup code, and it is prepared for the case in which the lookup
> > > > > function returns a dentry other than the one passed in.  So here is a
> > > > > proposal that might work (but I haven't looked at the autofs4 code to
> > > > > verify this.)
> > > > > 1) A lookup request is made for a non-existant automounted file.
> > > > > Real_lookup calls autofs4_lookup.
> > > > > 2) Autofs4_lookup saves the information about this request somewhere
> > > > > it
> > > > > can find it again and wakes up the automount demon that it has work to
> > > > > do.
> > > > > It does not put a dentry in the dentry cache and it then releases the
> > > > > parent i_sem and waits for the mount to complete.
> > > > > 3) Any subsequent lookup for this directory that is not from the
> > > > > automount
> > > > > demon will look for a mount request in progress, and if found, it will
> > > > > also release the parent lock and add itself to the wait queue.
> > > > > 4)The automount demon will run and get the information that it needs
> > > > > to
> > > > > complete the mount request and  then issue the mount.  The lookup
> > > > > request
> > > > > from mount will call real_lookup.  Since the demon is in OZ mode it
> > > > > does
> > > > > not pend, it fills in the dentry and when the dentry is fully ready
> > > > > for
> > > > > consumption, it calls d_add and wakes up the waiters.
> > > > > 5) When the waiters wake up, they get the new dentry and real_lookup
> > > > > will
> > > > > discard the one that had been allocated.
> > > > > 
> > > > > This keeps all of the waiting inside the autofs4 lookup function where
> > > > > the
> > > > > lock state is defined.  I realize that this may be a lot of work, but
> > > > > I
> > > > > haven't seen a possible solution that doesn't involve that.
> > > > 
> > > > 
> > > > Sounds like a lot of work and likely quite interesting but directories
> > > > can
> > > > and often do exist in the autofs filesystem that don't have active
> > > > mounts. For these directories only the revaidate method is called at
> > > > auto mount
> > > > time. It's worth remembering that, as autofs is a pseudo filesystem, it
> > > > pins
> > > > the dentry for each of its objects so they don't go away. Maybe you're
> > > > suggesting I change this?
> > > 
> > > Exactly.  If the dentries are unhashed at umount time then the revalidate
> > > case
> > > is not an issue.  I don't know the how much work is invovled in setting
> > > things
> > > up in the first place so you might want to cache your unused dentries
> > > yourself
> > > if that avoids having to reread the autofs configuration files.  But that
> > > is
> > > an implementation detail for you to consider.
> > > 
> > > > Sorry, I don't mean to be rude but I'm suggesting your using
> > > > lookup_one_len
> > > > incorectly. I'll need to look at other code to see if this actually
> > > > holds
> > > > true, but, as automounting is usually not the first thing that people
> > > > think
> > > > of when they are writting a filesystem I expect I won't get much support
> > > > from there either. 
> > > 
> > > I don't know what you mean by using it incorrectly.  We have a shadow
> > > filesytem and our lookup function is being called and we are trying to
> > > find
> > > the corresponding file/directory in the root filesystem.  We are calling
> > > lookup_one_len because we are trying to find the next name in the path.
> > > We
> > > are prepared to handle mountpoint crossings, but as I said above, the
> > > mountpoint needs to be setup so we can cross it.  We cannot call path_walk
> > > or
> > > its variants because we do not have the entire path.
> > > 
> > > 
> > > 
> > > > What I'm proposing is:
> > > > 
> > > > 1) lookup_one_len should never cause anything to be auto
> > > >   mounted because of its defined behaviour and autofs
> > > >   should behave in line with this definition.
> > > 
> > > What defined behaviour?  The purpose of autofs is to automount
> > > directories.  I
> > > am not looking for you to cross a mountpoint for me, I just want you to
> > > setup
> > > the mount so I can cross it myself.
> > > 
> > > > 2) The filesystem that calls lookup_one_len directly or
> > > >   indirectly is responsibe for checking if it has walked
> > > >   onto an autofs dentry and decide what action it should
> > > >   take.
> > > 
> > > And what action would that be?  Not enter into any autofs directory tree?
> > > Do
> > > the mount myself?  Return ENOENT?  Return inconsistent results based on
> > > whether someone else has triggered the automount for me?
> > 
> > 
> > I was thinking of something like EAGAIN to the calling fs - meaning, ok but
> > I need to be revalidated (lockles) for you to be sure.
> > 
> > 
> > > And from an interface perspective, a caller of a function like
> > > lookup_one_len
> > > should never have to worry about the implementation of the underlying
> > > filesystem, or even have to know or care what the filesystem is.
> > > 
> > > > I thought that this was quite sensible and a fairly simple resolution?
> > > 
> > > But it defeats the whole purpose of having an automounter.
> > 
> > 
> > Yep. Sigh.
> > 
> > But there's more. The Linux autofs implementation lacks some crucial
> > features which I really want to add.
> > 
> > I posted an RFC to LKML and had no interest but, since it relates to this
> > issue, I'd really appreciate it if you could give it a quick read and
> > perhaps help out a bit with your thoughts.
> > 
> > http://themaw.net/direct.txt
> > 
> > The issue that comes up with this is that, for this to work the fs would
> > have to do
> > 
> > if (follow_link inode method is defined)
> > 	call follow_link
> > 
> > where as 
> > if (S_ISLNK(...))
> > 	call follow_link
> > 
> > won't work.
> > 
> > Of course the other way to do it would be to add another method but that
> > would complicate an already busy link_path_walk. I don't think people would
> > agree with that.
> > 
> > The side affects of setting the link bit would have undesirable
> > consequences.
> > 
> > Ideas?
> > 
> I have looked at your proposal quickly and it seems reasonable on its surface.
> I have implemented a stackable filesystem, you are wise to want to avoid doing
> so.  I would need to give it some more thought.  But  since I will be on
> holiday until Monday, it is not going to happen soon.

If you would like something to help you sleep at night then take a copy of 
this with you.

http://themaw.net/autofs_linux_kongress.pdf

Jeff and I wrote it and he presented a talk based on it.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-23 16:42                                   ` William H. Taber
  2005-11-23 17:52                                     ` Ian Kent
@ 2005-11-23 17:52                                     ` Ian Kent
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-23 17:52 UTC (permalink / raw)
  To: William H. Taber; +Cc: autofs mailing list, linux-fsdevel

On Wed, 23 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
>  >
> > OK I think I'm starting to get what your saying. I guess I didn't want to
> > hear it because I've been moving toward pushing everything to revalidate.
> > 
> > For some reason I woke up early this morning. I read your reply and I've
> > been thinking about it all day (hard to change gears when there's a
> > challenge like this, bad for work, fun for me, I'm sure I'll get cained for
> > being somewhere else).
> > 
> > To verify I've got it this time all you are saying is, avoid revalidate by
> > keeping all unmounted dentrys unhashed? Nothing much more really?
> > 
> > I haven't really thought it through completely yet. You've noticed I'm not a
> > quick study no doubt.
> Who am I to complain about that? :^)
> > 
> > So far it seems doable though. One problem point could be if one of these
> > lookups come in during user space daemon startup but that's detail atm.
> > 
> > Following startup I have two cases to deal with (i'm only listing them cause
> > I think you missed case 2):
> > 
> > 1) No directory exists - it's created by the daemon during the callback to
> > the userspace before it perform the mount.
> > 
> > 2) Directory already exists - created at startup before any mount activity.
> I am a little unclear about this.  Which directory are you talking about?  The
> mount-over directory or the directory being mounted?  My picture of things
> (and I haven't verified this by doing anything mundane like reading the code)
> is that There is a directory (such as /net ) which is of type autofs and as
> needed new directories of type autofs are created under it for the various
> hosts/filenames.  Over these subdirectories (the mount-over directories) get
> mounted the remote filesystems, usually of type NFS.  Is this a correct
> understanding?

Yep. That's an indirect mount.

The directories I'm refering to are the ones created inside the autofs 
mount point /net or other autofs mount point. Creating the directories 
makes them browsable without necessarily mounting them (as long as the 
module knows when to trigger a mount request). This lazyness is what 
causes all the fus.

> > 
> > Case 2 needs special attention. To achieve this I would need to have two
> > types of unhashed dentry, valid and invalid (perhaps because of a ENOENT
> > return from the daemon on mount).
> > 
> If my understanding above is correct, I don't think you need to hide the
> dentry for the autofs mount-over directory.  If there is an active mount, then
> the dentries d_mounted flag will be set and the normal mountpoint traversal
> will work.  If nothing is mounted here then the autofs mount-over directory
> lookup functions will be called.  This is where the actual mount request gets
> triggered.  The dentry created here should not be added to the dentry cache
> until the dentry is actually ready to use.  It has to be kept in a way that
> can be found by subsequent calls to lookup in case there are multiple requests
> for it. The trick is that the first lookup to succeed has to be the one for
> the mount request.  But once it is on the dentry hash chain, revalidate has to
> be careful because if the revalidate fails then the dentry will be
> invalidated.  And if revalidate succeeds then everything needs to be setup so
> that folow_down will work.  Hmm.  I will have to think about this some more.

That was how things used to work but late mounting is what's needed to 
provide the function expected of an automounter when people start using 
this in an enterprise environment. Basically autofs tells lies until it's 
forced to tell the truth.

Point is people expect to be able to see the mount points without causing 
them to mount until they actually try and access something inside the 
mount.

Hence "browseable".

> > Not really that hard to do I think.
> > 
> > I'd need to rework the readdir code to fill unhashed, valid dentrys and I
> > can set status on the return codes from the daemon. There's likely a bunch
> > of other detail as well.
> > 
> > Unfortuneatly for me this is only half of the solution ... see below.
> > 
> > 
> > > > > autofs4_revalidate cannot pend.  I have looked some more at the 
> > > > 
> > > > 
> > > > Exactly and that's what I'm suggesting. Take account of what the
> > > > lookup_one_len is advertised to do. My point is that lookup_one_len is
> > > > not
> > > > supposed to follow mounts or soft links, by definition, so it shouldn't
> > > > cause autofs to trigger any mounts. If a filesystem wants to use it then
> > > > it
> > > > then it needs to take account of its defined behaviour, warts and all.
> > > > 
> > > 
> > > I am not expecting lookup_one_len to follow mount points.  I expect to
> > > follow
> > > them myself.  But I do expect that if this is a mountpoint, that autofs
> > > will
> > > set it up for me.  If not, what is the point of an automounter?
> > > 
> > > > > real_lookup code, and it is prepared for the case in which the lookup
> > > > > function returns a dentry other than the one passed in.  So here is a
> > > > > proposal that might work (but I haven't looked at the autofs4 code to
> > > > > verify this.)
> > > > > 1) A lookup request is made for a non-existant automounted file.
> > > > > Real_lookup calls autofs4_lookup.
> > > > > 2) Autofs4_lookup saves the information about this request somewhere
> > > > > it
> > > > > can find it again and wakes up the automount demon that it has work to
> > > > > do.
> > > > > It does not put a dentry in the dentry cache and it then releases the
> > > > > parent i_sem and waits for the mount to complete.
> > > > > 3) Any subsequent lookup for this directory that is not from the
> > > > > automount
> > > > > demon will look for a mount request in progress, and if found, it will
> > > > > also release the parent lock and add itself to the wait queue.
> > > > > 4)The automount demon will run and get the information that it needs
> > > > > to
> > > > > complete the mount request and  then issue the mount.  The lookup
> > > > > request
> > > > > from mount will call real_lookup.  Since the demon is in OZ mode it
> > > > > does
> > > > > not pend, it fills in the dentry and when the dentry is fully ready
> > > > > for
> > > > > consumption, it calls d_add and wakes up the waiters.
> > > > > 5) When the waiters wake up, they get the new dentry and real_lookup
> > > > > will
> > > > > discard the one that had been allocated.
> > > > > 
> > > > > This keeps all of the waiting inside the autofs4 lookup function where
> > > > > the
> > > > > lock state is defined.  I realize that this may be a lot of work, but
> > > > > I
> > > > > haven't seen a possible solution that doesn't involve that.
> > > > 
> > > > 
> > > > Sounds like a lot of work and likely quite interesting but directories
> > > > can
> > > > and often do exist in the autofs filesystem that don't have active
> > > > mounts. For these directories only the revaidate method is called at
> > > > auto mount
> > > > time. It's worth remembering that, as autofs is a pseudo filesystem, it
> > > > pins
> > > > the dentry for each of its objects so they don't go away. Maybe you're
> > > > suggesting I change this?
> > > 
> > > Exactly.  If the dentries are unhashed at umount time then the revalidate
> > > case
> > > is not an issue.  I don't know the how much work is invovled in setting
> > > things
> > > up in the first place so you might want to cache your unused dentries
> > > yourself
> > > if that avoids having to reread the autofs configuration files.  But that
> > > is
> > > an implementation detail for you to consider.
> > > 
> > > > Sorry, I don't mean to be rude but I'm suggesting your using
> > > > lookup_one_len
> > > > incorectly. I'll need to look at other code to see if this actually
> > > > holds
> > > > true, but, as automounting is usually not the first thing that people
> > > > think
> > > > of when they are writting a filesystem I expect I won't get much support
> > > > from there either. 
> > > 
> > > I don't know what you mean by using it incorrectly.  We have a shadow
> > > filesytem and our lookup function is being called and we are trying to
> > > find
> > > the corresponding file/directory in the root filesystem.  We are calling
> > > lookup_one_len because we are trying to find the next name in the path.
> > > We
> > > are prepared to handle mountpoint crossings, but as I said above, the
> > > mountpoint needs to be setup so we can cross it.  We cannot call path_walk
> > > or
> > > its variants because we do not have the entire path.
> > > 
> > > 
> > > 
> > > > What I'm proposing is:
> > > > 
> > > > 1) lookup_one_len should never cause anything to be auto
> > > >   mounted because of its defined behaviour and autofs
> > > >   should behave in line with this definition.
> > > 
> > > What defined behaviour?  The purpose of autofs is to automount
> > > directories.  I
> > > am not looking for you to cross a mountpoint for me, I just want you to
> > > setup
> > > the mount so I can cross it myself.
> > > 
> > > > 2) The filesystem that calls lookup_one_len directly or
> > > >   indirectly is responsibe for checking if it has walked
> > > >   onto an autofs dentry and decide what action it should
> > > >   take.
> > > 
> > > And what action would that be?  Not enter into any autofs directory tree?
> > > Do
> > > the mount myself?  Return ENOENT?  Return inconsistent results based on
> > > whether someone else has triggered the automount for me?
> > 
> > 
> > I was thinking of something like EAGAIN to the calling fs - meaning, ok but
> > I need to be revalidated (lockles) for you to be sure.
> > 
> > 
> > > And from an interface perspective, a caller of a function like
> > > lookup_one_len
> > > should never have to worry about the implementation of the underlying
> > > filesystem, or even have to know or care what the filesystem is.
> > > 
> > > > I thought that this was quite sensible and a fairly simple resolution?
> > > 
> > > But it defeats the whole purpose of having an automounter.
> > 
> > 
> > Yep. Sigh.
> > 
> > But there's more. The Linux autofs implementation lacks some crucial
> > features which I really want to add.
> > 
> > I posted an RFC to LKML and had no interest but, since it relates to this
> > issue, I'd really appreciate it if you could give it a quick read and
> > perhaps help out a bit with your thoughts.
> > 
> > http://themaw.net/direct.txt
> > 
> > The issue that comes up with this is that, for this to work the fs would
> > have to do
> > 
> > if (follow_link inode method is defined)
> > 	call follow_link
> > 
> > where as 
> > if (S_ISLNK(...))
> > 	call follow_link
> > 
> > won't work.
> > 
> > Of course the other way to do it would be to add another method but that
> > would complicate an already busy link_path_walk. I don't think people would
> > agree with that.
> > 
> > The side affects of setting the link bit would have undesirable
> > consequences.
> > 
> > Ideas?
> > 
> I have looked at your proposal quickly and it seems reasonable on its surface.
> I have implemented a stackable filesystem, you are wise to want to avoid doing
> so.  I would need to give it some more thought.  But  since I will be on
> holiday until Monday, it is not going to happen soon.

If you would like something to help you sleep at night then take a copy of 
this with you.

http://themaw.net/autofs_linux_kongress.pdf

Jeff and I wrote it and he presented a talk based on it.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-23 17:52                                     ` Ian Kent
@ 2005-11-23 18:47                                       ` William H. Taber
  0 siblings, 0 replies; 95+ messages in thread
From: William H. Taber @ 2005-11-23 18:47 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:

>>I am a little unclear about this.  Which directory are you talking about?  The
>>mount-over directory or the directory being mounted?  My picture of things
>>(and I haven't verified this by doing anything mundane like reading the code)
>>is that There is a directory (such as /net ) which is of type autofs and as
>>needed new directories of type autofs are created under it for the various
>>hosts/filenames.  Over these subdirectories (the mount-over directories) get
>>mounted the remote filesystems, usually of type NFS.  Is this a correct
>>understanding?
> 
> 
> Yep. That's an indirect mount.

OK.  Got it.
> 
> The directories I'm refering to are the ones created inside the autofs 
> mount point /net or other autofs mount point. Creating the directories 
> makes them browsable without necessarily mounting them (as long as the 
> module knows when to trigger a mount request). This lazyness is what 
> causes all the fus.
> 
> 
>>>Case 2 needs special attention. To achieve this I would need to have two
>>>types of unhashed dentry, valid and invalid (perhaps because of a ENOENT
>>>return from the daemon on mount).
>>>
>>
>>If my understanding above is correct, I don't think you need to hide the
>>dentry for the autofs mount-over directory.  If there is an active mount, then
>>the dentries d_mounted flag will be set and the normal mountpoint traversal
>>will work.  If nothing is mounted here then the autofs mount-over directory
>>lookup functions will be called.  This is where the actual mount request gets
>>triggered.  The dentry created here should not be added to the dentry cache
>>until the dentry is actually ready to use.  It has to be kept in a way that
>>can be found by subsequent calls to lookup in case there are multiple requests
>>for it. The trick is that the first lookup to succeed has to be the one for
>>the mount request.  But once it is on the dentry hash chain, revalidate has to
>>be careful because if the revalidate fails then the dentry will be
>>invalidated.  And if revalidate succeeds then everything needs to be setup so
>>that folow_down will work.  Hmm.  I will have to think about this some more.
> 
> 
> That was how things used to work but late mounting is what's needed to 
> provide the function expected of an automounter when people start using 
> this in an enterprise environment. Basically autofs tells lies until it's 
> forced to tell the truth.
> 
> Point is people expect to be able to see the mount points without causing 
> them to mount until they actually try and access something inside the 
> mount.
> 
> Hence "browseable".
> 
OK.  This is a whole other kettle of fish.
This means that readdir lies and reports about the existance of 
directories but doesn't mount them until someone is actually foolish 
enough to include them in a pathname.  That's the problem.  Too many 
fools. :^)

And thanks for the pointer to the paper.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 10:17 [RFC PATCH]autofs4: hang and proposed fix Ram Pai
  2005-11-16 12:41 ` [autofs] " Ian Kent
  2005-11-16 15:22   ` Jeff Moyer
@ 2005-11-27 10:47 ` Ian Kent
  2005-11-28 17:19   ` William H. Taber
  2005-11-30  1:16   ` Jeff Moyer
  3 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-27 10:47 UTC (permalink / raw)
  To: William H. Taber, Ram Pai; +Cc: autofs mailing list, linux-fsdevel

On Wed, 16 Nov 2005, Ram Pai wrote:

> Autofs4 assumes that its ->revalidate() function gets called with the
> parent_dentry's_inode_semaphore released. This is true mostly
> but not in one particular case.
> 
> Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
> does not exist. It creates a dentry and adds to the cache. Releases
> the parent's inode's semaphore and than calls ->revalidate().
> 
> Process P2 meanwhile comes in and cached_lookup() gets called. It finds
> the dentry in the cache and finds ->revalidate() function exists. So
> it calls ->revalidate() holding the parent's inode's semaphore.
> 
> Now the automounter daemon comes in and tries to hold the same semaphore
> in order to mount. But since the semaphore is held by P2 it
> goes to sleep.
> 
> Process P1 and P2 continue waiting for the mount to complete and it never
> happens. Deadlock.
> 
> The stack of the deadlock is as follows:
> 
> ls            S 00000000     0 13049  11954                     (NOTLB)
> f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
> f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
> Call Trace:
> [<c0215207>] autofs4_wait+0x307/0x3d0
> [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> [<c0214389>] autofs4_revalidate+0x159/0x170
> [<c02144e0>] autofs4_lookup+0x110/0x150
> [<c016f3f5>] __lookup_hash+0x85/0xb0
> [<c016f42a>] lookup_hash+0xa/0x10
> [<c016f483>] lookup_one_len+0x53/0x70
> [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
> [<c0172fcb>] vfs_readdir+0x8b/0xa0
> [<c01733b3>] sys_getdents64+0x63/0xb5
> [<c010464d>] syscall_call+0x7/0xb
> 
> ls            S C011B1AF     0 13050  11898                     (NOTLB)
> f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080
> 06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70
> f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000
> Call Trace:
> [<c0215207>] autofs4_wait+0x307/0x3d0
> [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> [<c0214389>] autofs4_revalidate+0x159/0x170
> [<c016dc77>] cached_lookup+0x47/0x80
> [<c016f3ca>] __lookup_hash+0x5a/0xb0
> [<c016f42a>] lookup_hash+0xa/0x10
> [<c016f483>] lookup_one_len+0x53/0x70  
> [<f88512e3>] stubfs_readdir+0x163/0x170 [stubfs]
> [<c0172fcb>] vfs_readdir+0x8b/0xa0  
> [<c01733b3>] sys_getdents64+0x63/0xb5
> [<c010464d>] syscall_call+0x7/0xb
> 
> automount     D 00000010     0 13052  13016                     (NOTLB)
> f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34
> f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70
> f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001
> Call Trace:
> [<c03b4cd3>] __down+0x83/0xe0
> [<c03b3632>] __down_failed+0xa/0x10
> [<c0171e6d>] .text.lock.namei+0xeb/0x1de
> [<c0170482>] sys_mkdir+0x52/0xd0
> [<c010464d>] syscall_call+0x7/0xb
> BUG: soft lockup detected on CPU#0!
> 

Hi guys,

I've been thinking about this one for a while now and have a suggestion 
about how it may be fixed.

To re-state the problem:

The autofs4 revalidate callback needs to function properly when called 
with the inode semaphore either held or not.

Summary:

Ram Pai provided the excelent problem profile above and offered a patch 
for comment which droped the inode semaphore. Will pointed out that 
droping the semaphore was not a good thing to do because of possible side 
affects.

A fair bit of interesting discussion followed.

My thoughts:

The cause of this issue is user space programs using autofs4 need to 
call services that must be able to take the inode semaphore. Notably 
sys_mkdir and sys_symlink in order to complete their task.

I believe that, in this case, releasing the semaphore is ok since the 
entry is part of the autofs filesystem and so autofs is responsible for 
taking care of it, provided that it is done carefully. The semaphore is 
meant to serialize changes being to the directory and these changes are 
done in autofs by asking the user space process to do it. Which are 
themselves serialized by the same semaphore.

The only tricky thing I can think of here is that care must be taken to 
ensure that the semaphore is not released before the DCACHE_AUTOFS_PENDING 
flag is set to make sure that other incoming requests are sent to the wait 
queue.

The attached patch does this and opts for a conservative approach by 
broadening the critical region instead of narrowing it.

It may also be necessary to review the return codes from revaliate but I'm 
only part way through that.

Please review and test this patch and offer further comment.
Sorry guys but I haven't been able to test this at all save verifying that 
it compiles.

Hopefully I haven't missed anything completely obvious ... DOH!

Ian

--- linux-2.6.15-rc1/fs/autofs4/root.c.lookup-deadlock	2005-11-17 18:58:38.000000000 +0800
+++ linux-2.6.15-rc1/fs/autofs4/root.c	2005-11-27 17:00:40.000000000 +0800
@@ -487,11 +487,8 @@ static struct dentry *autofs4_lookup(str
 	dentry->d_fsdata = NULL;
 	d_add(dentry, NULL);
 
-	if (dentry->d_op && dentry->d_op->d_revalidate) {
-		up(&dir->i_sem);
+	if (dentry->d_op && dentry->d_op->d_revalidate)
 		(dentry->d_op->d_revalidate)(dentry, nd);
-		down(&dir->i_sem);
-	}
 
 	/*
 	 * If we are still pending, check if we had to handle
--- linux-2.6.15-rc1/fs/autofs4/waitq.c.lookup-deadlock	2005-11-27 17:09:42.000000000 +0800
+++ linux-2.6.15-rc1/fs/autofs4/waitq.c	2005-11-27 17:17:34.000000000 +0800
@@ -161,6 +161,8 @@ int autofs4_wait(struct autofs_sb_info *
 		enum autofs_notify notify)
 {
 	struct autofs_wait_queue *wq;
+	struct inode *dir = dentry->d_parent->d_inode;
+	int i_sem_held;
 	char *name;
 	int len, status;
 
@@ -227,6 +229,14 @@ int autofs4_wait(struct autofs_sb_info *
 			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
 	}
 
+	/*
+	 * If we are called from lookup or lookup_hash the
+	 * the inode semaphore needs to be released for
+	 * userspace to do its thing.
+	 */
+	i_sem_held = down_trylock(&dir->i_sem);
+	up(&dir->i_sem);
+
 	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
 		int type = (notify == NFY_MOUNT ?
 			autofs_ptype_missing : autofs_ptype_expire_multi);
@@ -268,6 +278,10 @@ int autofs4_wait(struct autofs_sb_info *
 		DPRINTK("skipped sleeping");
 	}
 
+	/* Re-take the inode semaphore if it was held */
+	if (i_sem_held)
+		down(&dir->i_sem);
+
 	status = wq->status;
 
 	/* Are we the last process to need status? */

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-27 10:47 ` Ian Kent
@ 2005-11-28 17:19   ` William H. Taber
  2005-11-28 23:12     ` Badari Pulavarty
  2005-11-29 14:20     ` Ian Kent
  0 siblings, 2 replies; 95+ messages in thread
From: William H. Taber @ 2005-11-28 17:19 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:

> My thoughts:
> 
> The cause of this issue is user space programs using autofs4 need to 
> call services that must be able to take the inode semaphore. Notably 
> sys_mkdir and sys_symlink in order to complete their task.
> 
> I believe that, in this case, releasing the semaphore is ok since the 
> entry is part of the autofs filesystem and so autofs is responsible for 
> taking care of it, provided that it is done carefully. The semaphore is 
> meant to serialize changes being to the directory and these changes are 
> done in autofs by asking the user space process to do it. Which are 
> themselves serialized by the same semaphore.
> 
> The only tricky thing I can think of here is that care must be taken to 
> ensure that the semaphore is not released before the DCACHE_AUTOFS_PENDING 
> flag is set to make sure that other incoming requests are sent to the wait 
> queue.
> 
> The attached patch does this and opts for a conservative approach by 
> broadening the critical region instead of narrowing it.
> 
> It may also be necessary to review the return codes from revaliate but I'm 
> only part way through that.
> 
> Please review and test this patch and offer further comment.
> Sorry guys but I haven't been able to test this at all save verifying that 
> it compiles.
> 
> Hopefully I haven't missed anything completely obvious ... DOH!
> 
> Ian
> 
> --- linux-2.6.15-rc1/fs/autofs4/root.c.lookup-deadlock	2005-11-17 18:58:38.000000000 +0800
> +++ linux-2.6.15-rc1/fs/autofs4/root.c	2005-11-27 17:00:40.000000000 +0800
> @@ -487,11 +487,8 @@ static struct dentry *autofs4_lookup(str
>  	dentry->d_fsdata = NULL;
>  	d_add(dentry, NULL);
>  
> -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> -		up(&dir->i_sem);
> +	if (dentry->d_op && dentry->d_op->d_revalidate)
>  		(dentry->d_op->d_revalidate)(dentry, nd);
> -		down(&dir->i_sem);
> -	}
>  
>  	/*
>  	 * If we are still pending, check if we had to handle
> --- linux-2.6.15-rc1/fs/autofs4/waitq.c.lookup-deadlock	2005-11-27 17:09:42.000000000 +0800
> +++ linux-2.6.15-rc1/fs/autofs4/waitq.c	2005-11-27 17:17:34.000000000 +0800
> @@ -161,6 +161,8 @@ int autofs4_wait(struct autofs_sb_info *
>  		enum autofs_notify notify)
>  {
>  	struct autofs_wait_queue *wq;
> +	struct inode *dir = dentry->d_parent->d_inode;
> +	int i_sem_held;
>  	char *name;
>  	int len, status;
>  
> @@ -227,6 +229,14 @@ int autofs4_wait(struct autofs_sb_info *
>  			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
>  	}
>  
> +	/*
> +	 * If we are called from lookup or lookup_hash the
> +	 * the inode semaphore needs to be released for
> +	 * userspace to do its thing.
> +	 */
> +	i_sem_held = down_trylock(&dir->i_sem);
> +	up(&dir->i_sem);
> +
>  	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
>  		int type = (notify == NFY_MOUNT ?
>  			autofs_ptype_missing : autofs_ptype_expire_multi);
> @@ -268,6 +278,10 @@ int autofs4_wait(struct autofs_sb_info *
>  		DPRINTK("skipped sleeping");
>  	}
>  
> +	/* Re-take the inode semaphore if it was held */
> +	if (i_sem_held)
> +		down(&dir->i_sem);
> +
>  	status = wq->status;
>  
>  	/* Are we the last process to need status? */
> -
Ian,
I have not tested this patch but it seems to have a serious flaw.  Given 
that do_lookup does not get the parent i_sem lock before calling 
revalidate, you have the possibility that you are being called without 
having gotten the lock but the lock may be held by another process.  In 
that case you do not want to be releasing their lock while they are 
relying on it.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-28 17:19   ` William H. Taber
@ 2005-11-28 23:12     ` Badari Pulavarty
  2005-11-29 14:19       ` Ian Kent
  2005-11-29 14:20     ` Ian Kent
  1 sibling, 1 reply; 95+ messages in thread
From: Badari Pulavarty @ 2005-11-28 23:12 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ian Kent, Ram Pai, autofs mailing list, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3911 bytes --]

On Mon, 2005-11-28 at 12:19 -0500, William H. Taber wrote:
> Ian Kent wrote:
> 
> > My thoughts:
> > 
> > The cause of this issue is user space programs using autofs4 need to 
> > call services that must be able to take the inode semaphore. Notably 
> > sys_mkdir and sys_symlink in order to complete their task.
> > 
> > I believe that, in this case, releasing the semaphore is ok since the 
> > entry is part of the autofs filesystem and so autofs is responsible for 
> > taking care of it, provided that it is done carefully. The semaphore is 
> > meant to serialize changes being to the directory and these changes are 
> > done in autofs by asking the user space process to do it. Which are 
> > themselves serialized by the same semaphore.
> > 
> > The only tricky thing I can think of here is that care must be taken to 
> > ensure that the semaphore is not released before the DCACHE_AUTOFS_PENDING 
> > flag is set to make sure that other incoming requests are sent to the wait 
> > queue.
> > 
> > The attached patch does this and opts for a conservative approach by 
> > broadening the critical region instead of narrowing it.
> > 
> > It may also be necessary to review the return codes from revaliate but I'm 
> > only part way through that.
> > 
> > Please review and test this patch and offer further comment.
> > Sorry guys but I haven't been able to test this at all save verifying that 
> > it compiles.
> > 
> > Hopefully I haven't missed anything completely obvious ... DOH!
> > 
> > Ian
> > 
> > --- linux-2.6.15-rc1/fs/autofs4/root.c.lookup-deadlock	2005-11-17 18:58:38.000000000 +0800
> > +++ linux-2.6.15-rc1/fs/autofs4/root.c	2005-11-27 17:00:40.000000000 +0800
> > @@ -487,11 +487,8 @@ static struct dentry *autofs4_lookup(str
> >  	dentry->d_fsdata = NULL;
> >  	d_add(dentry, NULL);
> >  
> > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > -		up(&dir->i_sem);
> > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > -		down(&dir->i_sem);
> > -	}
> >  
> >  	/*
> >  	 * If we are still pending, check if we had to handle
> > --- linux-2.6.15-rc1/fs/autofs4/waitq.c.lookup-deadlock	2005-11-27 17:09:42.000000000 +0800
> > +++ linux-2.6.15-rc1/fs/autofs4/waitq.c	2005-11-27 17:17:34.000000000 +0800
> > @@ -161,6 +161,8 @@ int autofs4_wait(struct autofs_sb_info *
> >  		enum autofs_notify notify)
> >  {
> >  	struct autofs_wait_queue *wq;
> > +	struct inode *dir = dentry->d_parent->d_inode;
> > +	int i_sem_held;
> >  	char *name;
> >  	int len, status;
> >  
> > @@ -227,6 +229,14 @@ int autofs4_wait(struct autofs_sb_info *
> >  			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
> >  	}
> >  
> > +	/*
> > +	 * If we are called from lookup or lookup_hash the
> > +	 * the inode semaphore needs to be released for
> > +	 * userspace to do its thing.
> > +	 */
> > +	i_sem_held = down_trylock(&dir->i_sem);
> > +	up(&dir->i_sem);
> > +
> >  	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
> >  		int type = (notify == NFY_MOUNT ?
> >  			autofs_ptype_missing : autofs_ptype_expire_multi);
> > @@ -268,6 +278,10 @@ int autofs4_wait(struct autofs_sb_info *
> >  		DPRINTK("skipped sleeping");
> >  	}
> >  
> > +	/* Re-take the inode semaphore if it was held */
> > +	if (i_sem_held)
> > +		down(&dir->i_sem);
> > +
> >  	status = wq->status;
> >  
> >  	/* Are we the last process to need status? */
> > -
> Ian,
> I have not tested this patch but it seems to have a serious flaw.  Given 
> that do_lookup does not get the parent i_sem lock before calling 
> revalidate, you have the possibility that you are being called without 
> having gotten the lock but the lock may be held by another process.  In 
> that case you do not want to be releasing their lock while they are 
> relying on it.
> 

Here is the patch Will Taber proposed and I am posting on his behalf.

Thanks,
Badari




[-- Attachment #2: autofs.patch --]
[-- Type: text/x-patch, Size: 3822 bytes --]

This patch changes the semantics of d_revalidate so that it is always called 
with the parent i_sem lock held.  This allows the autofs4 code to release the
lock if it needs to pend.  Without this patch the autofs has a race condition
in which it pends in the revalidate code while holding the parent i_sem lock 
which prevents the mount from ever completing.  There have been other patches
proposed for this problem which check to see if the parent i_sem lock is held
before releasing it but those solutions ignore the possibility that the lock
may be held by another process.

diff -ur linux-2.6.13.3/fs/autofs4/root.c linux-2.6.13.3-autofspatch/fs/autofs4/root.c
--- linux-2.6.13.3/fs/autofs4/root.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/autofs4/root.c	2005-11-28 04:22:52.000000000 -0800
@@ -302,7 +302,9 @@
 		DPRINTK("waiting for expire %p name=%.*s",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_NONE);
+		down(&dentry->d_parent->d_inode->i_sem);
 		
 		DPRINTK("expire done status=%d", status);
 		
@@ -324,7 +326,9 @@
 		DPRINTK("waiting for mount name=%.*s",
 			 dentry->d_name.len, dentry->d_name.name);
 
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+		down(&dentry->d_parent->d_inode->i_sem);
 		 
 		DPRINTK("mount done status=%d", status);
 
@@ -351,7 +355,9 @@
 		spin_lock(&dentry->d_lock);
 		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
 		spin_unlock(&dentry->d_lock);
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+		down(&dentry->d_parent->d_inode->i_sem);
 
 		DPRINTK("mount done status=%d", status);
 
diff -ur linux-2.6.13.3/fs/namei.c linux-2.6.13.3-autofspatch/fs/namei.c
--- linux-2.6.13.3/fs/namei.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/namei.c	2005-11-28 04:22:52.000000000 -0800
@@ -393,7 +393,6 @@
 	struct dentry * result;
 	struct inode *dir = parent->d_inode;
 
-	down(&dir->i_sem);
 	/*
 	 * First re-do the cached lookup just in case it was created
 	 * while we waited for the directory semaphore..
@@ -419,7 +418,6 @@
 			else
 				result = dentry;
 		}
-		up(&dir->i_sem);
 		return result;
 	}
 
@@ -427,7 +425,6 @@
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	up(&dir->i_sem);
 	if (result->d_op && result->d_op->d_revalidate) {
 		if (!result->d_op->d_revalidate(result, nd) && !d_invalidate(result)) {
 			dput(result);
@@ -676,13 +673,16 @@
 		     struct path *path)
 {
 	struct vfsmount *mnt = nd->mnt;
+	struct inode *parent = nd->dentry->d_inode;
 	struct dentry *dentry = __d_lookup(nd->dentry, name);
 
+	down(&parent->i_sem);
 	if (!dentry)
 		goto need_lookup;
 	if (dentry->d_op && dentry->d_op->d_revalidate)
 		goto need_revalidate;
 done:
+	up(&parent->i_sem);
 	path->mnt = mnt;
 	path->dentry = dentry;
 	__follow_mount(path);
@@ -703,6 +703,7 @@
 	goto need_lookup;
 
 fail:
+	up(&parent->i_sem);
 	return PTR_ERR(dentry);
 }
 
@@ -718,7 +719,7 @@
 {
 	struct path next;
 	struct inode *inode;
-	int err;
+	int err, reval;
 	unsigned int lookup_flags = nd->flags;
 	
 	while (*name=='/')
@@ -893,9 +894,17 @@
 		 */
 		if (nd->dentry && nd->dentry->d_sb &&
 		    (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
+			struct dentry *nparent;
+
 			err = -ESTALE;
 			/* Note: we do not d_invalidate() */
-			if (!nd->dentry->d_op->d_revalidate(nd->dentry, nd))
+			/* Revalidate requires us to lock the parent.
+			 */
+			nparent = nd->dentry->d_parent;
+			down(&nparent->d_inode->i_sem);
+			reval = nd->dentry->d_op->d_revalidate(nd->dentry, nd);
+			up(&nparent->d_inode->i_sem);
+			if (!reval)
 				break;
 		}
 return_base:

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-28 23:12     ` Badari Pulavarty
@ 2005-11-29 14:19       ` Ian Kent
  2005-11-29 16:34         ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-29 14:19 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: William H. Taber, Ram Pai, autofs mailing list, linux-fsdevel


We'll need to do an analysis of all callers of the revalidate method.

On Mon, 28 Nov 2005, Badari Pulavarty wrote:

> On Mon, 2005-11-28 at 12:19 -0500, William H. Taber wrote:
> > Ian Kent wrote:
> > 
> > > My thoughts:
> > > 
> > > The cause of this issue is user space programs using autofs4 need to 
> > > call services that must be able to take the inode semaphore. Notably 
> > > sys_mkdir and sys_symlink in order to complete their task.
> > > 
> > > I believe that, in this case, releasing the semaphore is ok since the 
> > > entry is part of the autofs filesystem and so autofs is responsible for 
> > > taking care of it, provided that it is done carefully. The semaphore is 
> > > meant to serialize changes being to the directory and these changes are 
> > > done in autofs by asking the user space process to do it. Which are 
> > > themselves serialized by the same semaphore.
> > > 
> > > The only tricky thing I can think of here is that care must be taken to 
> > > ensure that the semaphore is not released before the DCACHE_AUTOFS_PENDING 
> > > flag is set to make sure that other incoming requests are sent to the wait 
> > > queue.
> > > 
> > > The attached patch does this and opts for a conservative approach by 
> > > broadening the critical region instead of narrowing it.
> > > 
> > > It may also be necessary to review the return codes from revaliate but I'm 
> > > only part way through that.
> > > 
> > > Please review and test this patch and offer further comment.
> > > Sorry guys but I haven't been able to test this at all save verifying that 
> > > it compiles.
> > > 
> > > Hopefully I haven't missed anything completely obvious ... DOH!
> > > 
> > > Ian
> > > 
> > > --- linux-2.6.15-rc1/fs/autofs4/root.c.lookup-deadlock	2005-11-17 18:58:38.000000000 +0800
> > > +++ linux-2.6.15-rc1/fs/autofs4/root.c	2005-11-27 17:00:40.000000000 +0800
> > > @@ -487,11 +487,8 @@ static struct dentry *autofs4_lookup(str
> > >  	dentry->d_fsdata = NULL;
> > >  	d_add(dentry, NULL);
> > >  
> > > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > > -		up(&dir->i_sem);
> > > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> > >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > > -		down(&dir->i_sem);
> > > -	}
> > >  
> > >  	/*
> > >  	 * If we are still pending, check if we had to handle
> > > --- linux-2.6.15-rc1/fs/autofs4/waitq.c.lookup-deadlock	2005-11-27 17:09:42.000000000 +0800
> > > +++ linux-2.6.15-rc1/fs/autofs4/waitq.c	2005-11-27 17:17:34.000000000 +0800
> > > @@ -161,6 +161,8 @@ int autofs4_wait(struct autofs_sb_info *
> > >  		enum autofs_notify notify)
> > >  {
> > >  	struct autofs_wait_queue *wq;
> > > +	struct inode *dir = dentry->d_parent->d_inode;
> > > +	int i_sem_held;
> > >  	char *name;
> > >  	int len, status;
> > >  
> > > @@ -227,6 +229,14 @@ int autofs4_wait(struct autofs_sb_info *
> > >  			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
> > >  	}
> > >  
> > > +	/*
> > > +	 * If we are called from lookup or lookup_hash the
> > > +	 * the inode semaphore needs to be released for
> > > +	 * userspace to do its thing.
> > > +	 */
> > > +	i_sem_held = down_trylock(&dir->i_sem);
> > > +	up(&dir->i_sem);
> > > +
> > >  	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
> > >  		int type = (notify == NFY_MOUNT ?
> > >  			autofs_ptype_missing : autofs_ptype_expire_multi);
> > > @@ -268,6 +278,10 @@ int autofs4_wait(struct autofs_sb_info *
> > >  		DPRINTK("skipped sleeping");
> > >  	}
> > >  
> > > +	/* Re-take the inode semaphore if it was held */
> > > +	if (i_sem_held)
> > > +		down(&dir->i_sem);
> > > +
> > >  	status = wq->status;
> > >  
> > >  	/* Are we the last process to need status? */
> > > -
> > Ian,
> > I have not tested this patch but it seems to have a serious flaw.  Given 
> > that do_lookup does not get the parent i_sem lock before calling 
> > revalidate, you have the possibility that you are being called without 
> > having gotten the lock but the lock may be held by another process.  In 
> > that case you do not want to be releasing their lock while they are 
> > relying on it.
> > 
> 
> Here is the patch Will Taber proposed and I am posting on his behalf.
> 
> Thanks,
> Badari
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-28 17:19   ` William H. Taber
  2005-11-28 23:12     ` Badari Pulavarty
@ 2005-11-29 14:20     ` Ian Kent
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-29 14:20 UTC (permalink / raw)
  To: William H. Taber; +Cc: Ram Pai, autofs mailing list, linux-fsdevel

On Mon, 28 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
> 
> > My thoughts:
> > 
> > The cause of this issue is user space programs using autofs4 need to 
> > call services that must be able to take the inode semaphore. Notably 
> > sys_mkdir and sys_symlink in order to complete their task.
> > 
> > I believe that, in this case, releasing the semaphore is ok since the 
> > entry is part of the autofs filesystem and so autofs is responsible for 
> > taking care of it, provided that it is done carefully. The semaphore is 
> > meant to serialize changes being to the directory and these changes are 
> > done in autofs by asking the user space process to do it. Which are 
> > themselves serialized by the same semaphore.
> > 
> > The only tricky thing I can think of here is that care must be taken to 
> > ensure that the semaphore is not released before the DCACHE_AUTOFS_PENDING 
> > flag is set to make sure that other incoming requests are sent to the wait 
> > queue.
> > 
> > The attached patch does this and opts for a conservative approach by 
> > broadening the critical region instead of narrowing it.
> > 
> > It may also be necessary to review the return codes from revaliate but I'm 
> > only part way through that.
> > 
> > Please review and test this patch and offer further comment.
> > Sorry guys but I haven't been able to test this at all save verifying that 
> > it compiles.
> > 
> > Hopefully I haven't missed anything completely obvious ... DOH!
> > 
> > Ian
> > 
> > --- linux-2.6.15-rc1/fs/autofs4/root.c.lookup-deadlock	2005-11-17 18:58:38.000000000 +0800
> > +++ linux-2.6.15-rc1/fs/autofs4/root.c	2005-11-27 17:00:40.000000000 +0800
> > @@ -487,11 +487,8 @@ static struct dentry *autofs4_lookup(str
> >  	dentry->d_fsdata = NULL;
> >  	d_add(dentry, NULL);
> >  
> > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > -		up(&dir->i_sem);
> > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > -		down(&dir->i_sem);
> > -	}
> >  
> >  	/*
> >  	 * If we are still pending, check if we had to handle
> > --- linux-2.6.15-rc1/fs/autofs4/waitq.c.lookup-deadlock	2005-11-27 17:09:42.000000000 +0800
> > +++ linux-2.6.15-rc1/fs/autofs4/waitq.c	2005-11-27 17:17:34.000000000 +0800
> > @@ -161,6 +161,8 @@ int autofs4_wait(struct autofs_sb_info *
> >  		enum autofs_notify notify)
> >  {
> >  	struct autofs_wait_queue *wq;
> > +	struct inode *dir = dentry->d_parent->d_inode;
> > +	int i_sem_held;
> >  	char *name;
> >  	int len, status;
> >  
> > @@ -227,6 +229,14 @@ int autofs4_wait(struct autofs_sb_info *
> >  			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
> >  	}
> >  
> > +	/*
> > +	 * If we are called from lookup or lookup_hash the
> > +	 * the inode semaphore needs to be released for
> > +	 * userspace to do its thing.
> > +	 */
> > +	i_sem_held = down_trylock(&dir->i_sem);
> > +	up(&dir->i_sem);
> > +
> >  	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
> >  		int type = (notify == NFY_MOUNT ?
> >  			autofs_ptype_missing : autofs_ptype_expire_multi);
> > @@ -268,6 +278,10 @@ int autofs4_wait(struct autofs_sb_info *
> >  		DPRINTK("skipped sleeping");
> >  	}
> >  
> > +	/* Re-take the inode semaphore if it was held */
> > +	if (i_sem_held)
> > +		down(&dir->i_sem);
> > +
> >  	status = wq->status;
> >  
> >  	/* Are we the last process to need status? */
> > -
> Ian,
> I have not tested this patch but it seems to have a serious flaw.  Given 
> that do_lookup does not get the parent i_sem lock before calling 
> revalidate, you have the possibility that you are being called without 
> having gotten the lock but the lock may be held by another process.  In 
> that case you do not want to be releasing their lock while they are 
> relying on it.

Oops.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-29 14:19       ` Ian Kent
@ 2005-11-29 16:34         ` William H. Taber
  2005-11-30 14:02           ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-29 16:34 UTC (permalink / raw)
  To: Ian Kent; +Cc: Badari Pulavarty, Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> We'll need to do an analysis of all callers of the revalidate method.
You are right. Searching through the sources, it would appear that I 
missed fixing autofs and devfs.  Everyone else just defines a revalidate 
routine but doesn't call one.  You may find devfs to be interesting 
because they have code to determine whether they need to release the 
i_sem lock or not.  I am working on an updated patch to include the 
changes needed for these two modules.

Will


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-16 10:17 [RFC PATCH]autofs4: hang and proposed fix Ram Pai
@ 2005-11-30  1:16   ` Jeff Moyer
  2005-11-16 15:22   ` Jeff Moyer
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 95+ messages in thread
From: Jeff Moyer @ 2005-11-30  1:16 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs, linux-fsdevel

==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:

linuxram> Autofs4 assumes that its ->revalidate() function gets called with
linuxram> the parent_dentry's_inode_semaphore released. This is true mostly
linuxram> but not in one particular case.

linuxram> Process P1 calls autofs4's ->lookup(). The lookup finds that the
linuxram> dentry does not exist. It creates a dentry and adds to the
linuxram> cache. Releases the parent's inode's semaphore and than calls
linuxram> ->revalidate().

linuxram> Process P2 meanwhile comes in and cached_lookup() gets called. It
linuxram> finds the dentry in the cache and finds ->revalidate() function
linuxram> exists. So it calls ->revalidate() holding the parent's inode's
linuxram> semaphore.

Can't we simply fix this case?  It seems like it should be perfectly safe
to drop the parent's i_sem before calling revalidate in cached_lookup.  In
fact, there are comments in the NFS code that would lead one to believe
that revalidate is not supposed to be called with the parent's i_sem held:

static int nfs_lookup_revalidate(struct dentry * dentry, struct nameidata *nd)
{
...
	/*
	 * Note: we're not holding inode->i_sem and so may be racing with
	 * operations that change the directory. We therefore save the
	 * change attribute *before* we do the RPC call.
	 */

Can you try out a patch which does this?

-Jeff

--- linux-2.6.14/fs/namei.c.orig	2005-11-29 20:14:30.000000000 -0500
+++ linux-2.6.14/fs/namei.c	2005-11-29 20:14:48.000000000 -0500
@@ -332,10 +332,12 @@ static struct dentry * cached_lookup(str
 		dentry = d_lookup(parent, name);
 
 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate) {
+		up(&parent->d_inode->i_sem);
 		if (!dentry->d_op->d_revalidate(dentry, nd) && !d_invalidate(dentry)) {
 			dput(dentry);
 			dentry = NULL;
 		}
+		down(&parent->d_inode->i_sem);
 	}
 	return dentry;
 }

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
@ 2005-11-30  1:16   ` Jeff Moyer
  0 siblings, 0 replies; 95+ messages in thread
From: Jeff Moyer @ 2005-11-30  1:16 UTC (permalink / raw)
  To: Ram Pai; +Cc: autofs, linux-fsdevel

==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:

linuxram> Autofs4 assumes that its ->revalidate() function gets called with
linuxram> the parent_dentry's_inode_semaphore released. This is true mostly
linuxram> but not in one particular case.

linuxram> Process P1 calls autofs4's ->lookup(). The lookup finds that the
linuxram> dentry does not exist. It creates a dentry and adds to the
linuxram> cache. Releases the parent's inode's semaphore and than calls
linuxram> ->revalidate().

linuxram> Process P2 meanwhile comes in and cached_lookup() gets called. It
linuxram> finds the dentry in the cache and finds ->revalidate() function
linuxram> exists. So it calls ->revalidate() holding the parent's inode's
linuxram> semaphore.

Can't we simply fix this case?  It seems like it should be perfectly safe
to drop the parent's i_sem before calling revalidate in cached_lookup.  In
fact, there are comments in the NFS code that would lead one to believe
that revalidate is not supposed to be called with the parent's i_sem held:

static int nfs_lookup_revalidate(struct dentry * dentry, struct nameidata *nd)
{
...
	/*
	 * Note: we're not holding inode->i_sem and so may be racing with
	 * operations that change the directory. We therefore save the
	 * change attribute *before* we do the RPC call.
	 */

Can you try out a patch which does this?

-Jeff

--- linux-2.6.14/fs/namei.c.orig	2005-11-29 20:14:30.000000000 -0500
+++ linux-2.6.14/fs/namei.c	2005-11-29 20:14:48.000000000 -0500
@@ -332,10 +332,12 @@ static struct dentry * cached_lookup(str
 		dentry = d_lookup(parent, name);
 
 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate) {
+		up(&parent->d_inode->i_sem);
 		if (!dentry->d_op->d_revalidate(dentry, nd) && !d_invalidate(dentry)) {
 			dput(dentry);
 			dentry = NULL;
 		}
+		down(&parent->d_inode->i_sem);
 	}
 	return dentry;
 }

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30  1:16   ` Jeff Moyer
  (?)
@ 2005-11-30  1:56   ` Trond Myklebust
  2005-11-30  4:15     ` Jeff Moyer
  2005-11-30 20:32     ` William H. Taber
  -1 siblings, 2 replies; 95+ messages in thread
From: Trond Myklebust @ 2005-11-30  1:56 UTC (permalink / raw)
  To: jmoyer; +Cc: Ram Pai, autofs, linux-fsdevel

On Tue, 2005-11-29 at 20:16 -0500, Jeff Moyer wrote:
> ==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:
> 
> linuxram> Autofs4 assumes that its ->revalidate() function gets called with
> linuxram> the parent_dentry's_inode_semaphore released. This is true mostly
> linuxram> but not in one particular case.
> 
> linuxram> Process P1 calls autofs4's ->lookup(). The lookup finds that the
> linuxram> dentry does not exist. It creates a dentry and adds to the
> linuxram> cache. Releases the parent's inode's semaphore and than calls
> linuxram> ->revalidate().
> 
> linuxram> Process P2 meanwhile comes in and cached_lookup() gets called. It
> linuxram> finds the dentry in the cache and finds ->revalidate() function
> linuxram> exists. So it calls ->revalidate() holding the parent's inode's
> linuxram> semaphore.
> 
> Can't we simply fix this case?  It seems like it should be perfectly safe
> to drop the parent's i_sem before calling revalidate in cached_lookup.  In
> fact, there are comments in the NFS code that would lead one to believe
> that revalidate is not supposed to be called with the parent's i_sem held:
> 
> static int nfs_lookup_revalidate(struct dentry * dentry, struct nameidata *nd)
> {
> ...
> 	/*
> 	 * Note: we're not holding inode->i_sem and so may be racing with
> 	 * operations that change the directory. We therefore save the
> 	 * change attribute *before* we do the RPC call.
> 	 */
> 
> Can you try out a patch which does this?
> 
> -Jeff
> 
> --- linux-2.6.14/fs/namei.c.orig	2005-11-29 20:14:30.000000000 -0500
> +++ linux-2.6.14/fs/namei.c	2005-11-29 20:14:48.000000000 -0500
> @@ -332,10 +332,12 @@ static struct dentry * cached_lookup(str
>  		dentry = d_lookup(parent, name);
>  
>  	if (dentry && dentry->d_op && dentry->d_op->d_revalidate) {
> +		up(&parent->d_inode->i_sem);
>  		if (!dentry->d_op->d_revalidate(dentry, nd) && !d_invalidate(dentry)) {
>  			dput(dentry);
>  			dentry = NULL;
>  		}
> +		down(&parent->d_inode->i_sem);
>  	}
>  	return dentry;
>  }

Woah! Definitely not safe. NFS might not care, but the VFS will
certainly barf over that!

By dropping the dir->i_sem in cached_lookup() you are allowing 2
processes to allocate and lookup multiple dentries for the same file
inside __lookup_hash().

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30  1:56   ` Trond Myklebust
@ 2005-11-30  4:15     ` Jeff Moyer
  2005-11-30  6:14       ` Trond Myklebust
  2005-11-30 20:32     ` William H. Taber
  1 sibling, 1 reply; 95+ messages in thread
From: Jeff Moyer @ 2005-11-30  4:15 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Ram Pai, autofs, linux-fsdevel

==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Trond Myklebust <trond.myklebust@fys.uio.no> adds:

trond.myklebust> On Tue, 2005-11-29 at 20:16 -0500, Jeff Moyer wrote:
>> ==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:
>> 
linuxram> Autofs4 assumes that its ->revalidate() function gets called with
linuxram> the parent_dentry's_inode_semaphore released. This is true mostly
linuxram> but not in one particular case.
>> 
linuxram> Process P1 calls autofs4's ->lookup(). The lookup finds that the
linuxram> dentry does not exist. It creates a dentry and adds to the
linuxram> cache. Releases the parent's inode's semaphore and than calls
linuxram> ->revalidate().
>> 
linuxram> Process P2 meanwhile comes in and cached_lookup() gets called. It
linuxram> finds the dentry in the cache and finds ->revalidate() function
linuxram> exists. So it calls ->revalidate() holding the parent's inode's
linuxram> semaphore.
>> 
>> Can't we simply fix this case?  It seems like it should be perfectly safe
>> to drop the parent's i_sem before calling revalidate in cached_lookup.  In
>> fact, there are comments in the NFS code that would lead one to believe
>> that revalidate is not supposed to be called with the parent's i_sem held:
>> 
>> static int nfs_lookup_revalidate(struct dentry * dentry, struct nameidata *nd)
>> {
>> ...
>> /*
>> * Note: we're not holding inode->i_sem and so may be racing with
>> * operations that change the directory. We therefore save the
>> * change attribute *before* we do the RPC call.
>> */
>> 
>> Can you try out a patch which does this?
>> 
>> -Jeff
>> 
>> --- linux-2.6.14/fs/namei.c.orig	2005-11-29 20:14:30.000000000 -0500
>> +++ linux-2.6.14/fs/namei.c	2005-11-29 20:14:48.000000000 -0500
>> @@ -332,10 +332,12 @@ static struct dentry * cached_lookup(str
>> 	dentry = d_lookup(parent, name);
>> 
>> 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate) {
>> +		up(&parent->d_inode->i_sem);
>> 		if (!dentry->d_op->d_revalidate(dentry, nd) && !d_invalidate(dentry)) {
>> 			dput(dentry);
>> 			dentry = NULL;
>> 		}
>> +		down(&parent->d_inode->i_sem);
>> 	}
>> 	return dentry;
>> }

trond> Woah! Definitely not safe. NFS might not care, but the VFS will
trond> certainly barf over that!

trond> By dropping the dir->i_sem in cached_lookup() you are allowing 2
trond> processes to allocate and lookup multiple dentries for the same file
trond> inside __lookup_hash().

The patch only drops the semaphore if d_lookup finds the dentry and the
dentry has a revalidate routine.  I don't follow how you can end up with
multiple dentries for the same file in this case.

Sorry if I'm missing something obvious.

-Jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30  4:15     ` Jeff Moyer
@ 2005-11-30  6:14       ` Trond Myklebust
  2005-11-30 15:44         ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Trond Myklebust @ 2005-11-30  6:14 UTC (permalink / raw)
  To: jmoyer; +Cc: Ram Pai, autofs, linux-fsdevel

On Tue, 2005-11-29 at 23:15 -0500, Jeff Moyer wrote:

> The patch only drops the semaphore if d_lookup finds the dentry and the
> dentry has a revalidate routine.  I don't follow how you can end up with
> multiple dentries for the same file in this case.
> 
> Sorry if I'm missing something obvious.

The inode->i_sem is what ensures that nobody can insert a new dentry
between the lookup of the cached dentry by d_lookup() and the call to
->lookup() (which instantiates a new dentry).

Imagine you have two separate processes that are doing a __lookup_hash()
of the same cached dentry. Now imagine that d_revalidate() of the dentry
fails.


Process 1                                 Process 2

__lookup_hash("/foo", "bar")		__lookup_hash("/foo", "bar")
...					....
down(&inode->i_sem)
....
<enter cached_lookup("/foo","bar")>
....
dentry = d_lookup("/foo", "bar")
up(&inode->i_sem)
					down(&inode->i_sem)
					....
					<enters cached_lookup("/foo","bar")>
d_revalidate(dentry) (fails)		dentry = d_lookup("/foo","bar")
					up(&inode->i_sem)

down(&inode->i_sem)			d_revalidate(dentry); (fails)
...
<returns NULL to __lookup_hash>
....
dentry = d_alloc(parent,name)
->lookup(dentry) (instantiates "dentry")
up(&inode->i_sem)
					down(&inode->i_sem)
					....
					<returns NULL to __lookup_hash>
					....
					dentry = d_alloc(parent,name)
					->lookup(dentry) (instantiates "dentry")
....

Whoops. Suddenly you have called ->lookup() for 2 dentries that
represent the same filename in the same directory.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-29 16:34         ` William H. Taber
@ 2005-11-30 14:02           ` Ian Kent
  2005-11-30 16:49             ` Badari Pulavarty
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-30 14:02 UTC (permalink / raw)
  To: William H. Taber
  Cc: Badari Pulavarty, Ram Pai, autofs mailing list, linux-fsdevel

On Tue, 29 Nov 2005, William H. Taber wrote:

> Ian Kent wrote:
> > We'll need to do an analysis of all callers of the revalidate method.
> You are right. Searching through the sources, it would appear that I 
> missed fixing autofs and devfs.  Everyone else just defines a revalidate 
> routine but doesn't call one.  You may find devfs to be interesting 
> because they have code to determine whether they need to release the 
> i_sem lock or not.  I am working on an updated patch to include the 
> changes needed for these two modules.

I've looked at devfs before but that bit of code sounds interesting to me.

The other thing that concerns me is that we may be increasing the latency 
of some code paths that need to be really fast. I was thinking that 
perhaps it might be good to try a change more in line with the locking 
used in link_patch_walk (ie. i_sem free revalidate) rather than that used 
in lookup_one_len. My only justification being that lookup is called to 
create stuff where revalidate is called to check stuff. I've been 
poking around and this change looks fairly difficult as well (I seem to 
remember you also looked at this).

Anyway, I'm keen to have a look at your patch.
Thanks much for your interest and help.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30  1:16   ` Jeff Moyer
  (?)
  (?)
@ 2005-11-30 14:48   ` Ian Kent
  -1 siblings, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-30 14:48 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Ram Pai, autofs, linux-fsdevel

On Tue, 29 Nov 2005, Jeff Moyer wrote:

> ==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:
> 
> linuxram> Autofs4 assumes that its ->revalidate() function gets called with
> linuxram> the parent_dentry's_inode_semaphore released. This is true mostly
> linuxram> but not in one particular case.
> 
> linuxram> Process P1 calls autofs4's ->lookup(). The lookup finds that the
> linuxram> dentry does not exist. It creates a dentry and adds to the
> linuxram> cache. Releases the parent's inode's semaphore and than calls
> linuxram> ->revalidate().
> 
> linuxram> Process P2 meanwhile comes in and cached_lookup() gets called. It
> linuxram> finds the dentry in the cache and finds ->revalidate() function
> linuxram> exists. So it calls ->revalidate() holding the parent's inode's
> linuxram> semaphore.
> 
> Can't we simply fix this case?  It seems like it should be perfectly safe
> to drop the parent's i_sem before calling revalidate in cached_lookup.  In
> fact, there are comments in the NFS code that would lead one to believe
> that revalidate is not supposed to be called with the parent's i_sem held:
> 
> static int nfs_lookup_revalidate(struct dentry * dentry, struct nameidata *nd)
> {
> ...
> 	/*
> 	 * Note: we're not holding inode->i_sem and so may be racing with
> 	 * operations that change the directory. We therefore save the
> 	 * change attribute *before* we do the RPC call.
> 	 */
> 
> Can you try out a patch which does this?

Could it be as simple as that?
Food for more thought.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30  6:14       ` Trond Myklebust
@ 2005-11-30 15:44         ` Ian Kent
  2005-11-30 15:53           ` [autofs] " Trond Myklebust
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-11-30 15:44 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: autofs, linux-fsdevel

On Wed, 30 Nov 2005, Trond Myklebust wrote:

> On Tue, 2005-11-29 at 23:15 -0500, Jeff Moyer wrote:
> 
> > The patch only drops the semaphore if d_lookup finds the dentry and the
> > dentry has a revalidate routine.  I don't follow how you can end up with
> > multiple dentries for the same file in this case.
> > 
> > Sorry if I'm missing something obvious.
> 
> The inode->i_sem is what ensures that nobody can insert a new dentry
> between the lookup of the cached dentry by d_lookup() and the call to
> ->lookup() (which instantiates a new dentry).
> 
> Imagine you have two separate processes that are doing a __lookup_hash()
> of the same cached dentry. Now imagine that d_revalidate() of the dentry
> fails.

And that would be why lookup is only called if d_lookup fails in do_lookup 
when called from the link_path_walk routine. Do you think it is possible 
to do this safely in __lookup_hash with a similar re-ordering?

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 15:44         ` Ian Kent
@ 2005-11-30 15:53           ` Trond Myklebust
  2005-11-30 16:12             ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Trond Myklebust @ 2005-11-30 15:53 UTC (permalink / raw)
  To: Ian Kent; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

On Wed, 2005-11-30 at 23:44 +0800, Ian Kent wrote:
> On Wed, 30 Nov 2005, Trond Myklebust wrote:
> 
> > On Tue, 2005-11-29 at 23:15 -0500, Jeff Moyer wrote:
> > 
> > > The patch only drops the semaphore if d_lookup finds the dentry and the
> > > dentry has a revalidate routine.  I don't follow how you can end up with
> > > multiple dentries for the same file in this case.
> > > 
> > > Sorry if I'm missing something obvious.
> > 
> > The inode->i_sem is what ensures that nobody can insert a new dentry
> > between the lookup of the cached dentry by d_lookup() and the call to
> > ->lookup() (which instantiates a new dentry).
> > 
> > Imagine you have two separate processes that are doing a __lookup_hash()
> > of the same cached dentry. Now imagine that d_revalidate() of the dentry
> > fails.
> 
> And that would be why lookup is only called if d_lookup fails in do_lookup 
> when called from the link_path_walk routine. Do you think it is possible 
> to do this safely in __lookup_hash with a similar re-ordering?

2.4 kernels did a second d_lookup() after retaking the i_sem. The
problem is what do you do in the case of a race: you either have to loop
back (may lead to infinite loops of d_lookup()+d_invalidate()) or you do
something like retry once, then return an error (which is what 2.4
kernels did).

Either choice is unsatisfying which is (I assume) why the current
behaviour was chosen for 2.6 kernels.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 15:53           ` [autofs] " Trond Myklebust
@ 2005-11-30 16:12             ` Ian Kent
  2005-11-30 16:27               ` Ian Kent
  2005-11-30 16:45               ` [autofs] " Trond Myklebust
  0 siblings, 2 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-30 16:12 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

On Wed, 30 Nov 2005, Trond Myklebust wrote:

> On Wed, 2005-11-30 at 23:44 +0800, Ian Kent wrote:
> > On Wed, 30 Nov 2005, Trond Myklebust wrote:
> > 
> > > On Tue, 2005-11-29 at 23:15 -0500, Jeff Moyer wrote:
> > > 
> > > > The patch only drops the semaphore if d_lookup finds the dentry and the
> > > > dentry has a revalidate routine.  I don't follow how you can end up with
> > > > multiple dentries for the same file in this case.
> > > > 
> > > > Sorry if I'm missing something obvious.
> > > 
> > > The inode->i_sem is what ensures that nobody can insert a new dentry
> > > between the lookup of the cached dentry by d_lookup() and the call to
> > > ->lookup() (which instantiates a new dentry).
> > > 
> > > Imagine you have two separate processes that are doing a __lookup_hash()
> > > of the same cached dentry. Now imagine that d_revalidate() of the dentry
> > > fails.
> > 
> > And that would be why lookup is only called if d_lookup fails in do_lookup 
> > when called from the link_path_walk routine. Do you think it is possible 
> > to do this safely in __lookup_hash with a similar re-ordering?
> 
> 2.4 kernels did a second d_lookup() after retaking the i_sem. The
> problem is what do you do in the case of a race: you either have to loop
> back (may lead to infinite loops of d_lookup()+d_invalidate()) or you do
> something like retry once, then return an error (which is what 2.4
> kernels did).

But in this case the semaphore is already held up until the revalidate so 
I guess the concern is that the dentry goes away after releasing the 
semaphore?

btw, a little aside.
I'm having trouble understanding where EEXIST is returned in a call such 
as sys_mkdir. Can you help?

If a call such as sys_mkdir (perhaps in lookup_create) was able to 
determine the directory exists before taking the semaphore the problem we 
have here goes away.

> 
> Either choice is unsatisfying which is (I assume) why the current
> behaviour was chosen for 2.6 kernels.
> 
> Cheers,
>   Trond
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 16:12             ` Ian Kent
@ 2005-11-30 16:27               ` Ian Kent
  2005-11-30 16:45               ` [autofs] " Trond Myklebust
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-11-30 16:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: autofs, linux-fsdevel

On Thu, 1 Dec 2005, Ian Kent wrote:

> 
> If a call such as sys_mkdir (perhaps in lookup_create) was able to 
> determine the directory exists before taking the semaphore the problem we 
> have here goes away.

Sorry, that's rubbish, ignore it.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 16:12             ` Ian Kent
  2005-11-30 16:27               ` Ian Kent
@ 2005-11-30 16:45               ` Trond Myklebust
  1 sibling, 0 replies; 95+ messages in thread
From: Trond Myklebust @ 2005-11-30 16:45 UTC (permalink / raw)
  To: Ian Kent; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

On Thu, 2005-12-01 at 00:12 +0800, Ian Kent wrote:

> > 2.4 kernels did a second d_lookup() after retaking the i_sem. The
> > problem is what do you do in the case of a race: you either have to loop
> > back (may lead to infinite loops of d_lookup()+d_invalidate()) or you do
> > something like retry once, then return an error (which is what 2.4
> > kernels did).
> 
> But in this case the semaphore is already held up until the revalidate so 
> I guess the concern is that the dentry goes away after releasing the 
> semaphore?

The problem is the same: dealing with the case of someone else
populating the d_cache while you have temporarily dropped the
dir->i_sem.

> btw, a little aside.
> I'm having trouble understanding where EEXIST is returned in a call such 
> as sys_mkdir. Can you help?

The call to may_create() (at the top of vfs_mkdir()) will return EEXIST
if the dentry is already positive i.e. dentry->d_inode is non-null.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 14:02           ` Ian Kent
@ 2005-11-30 16:49             ` Badari Pulavarty
  2005-11-30 17:04               ` Trond Myklebust
  0 siblings, 1 reply; 95+ messages in thread
From: Badari Pulavarty @ 2005-11-30 16:49 UTC (permalink / raw)
  To: Ian Kent
  Cc: William H. Taber, Ram Pai, autofs mailing list, linux-fsdevel,
	Al Viro, smaneesh

[-- Attachment #1: Type: text/plain, Size: 1483 bytes --]

On Wed, 2005-11-30 at 09:02 -0500, Ian Kent wrote:
> On Tue, 29 Nov 2005, William H. Taber wrote:
> 
> > Ian Kent wrote:
> > > We'll need to do an analysis of all callers of the revalidate method.
> > You are right. Searching through the sources, it would appear that I 
> > missed fixing autofs and devfs.  Everyone else just defines a revalidate 
> > routine but doesn't call one.  You may find devfs to be interesting 
> > because they have code to determine whether they need to release the 
> > i_sem lock or not.  I am working on an updated patch to include the 
> > changes needed for these two modules.
> 
> I've looked at devfs before but that bit of code sounds interesting to me.
> 
> The other thing that concerns me is that we may be increasing the latency 
> of some code paths that need to be really fast. I was thinking that 
> perhaps it might be good to try a change more in line with the locking 
> used in link_patch_walk (ie. i_sem free revalidate) rather than that used 
> in lookup_one_len. My only justification being that lookup is called to 
> create stuff where revalidate is called to check stuff. I've been 
> poking around and this change looks fairly difficult as well (I seem to 
> remember you also looked at this).
> 
> Anyway, I'm keen to have a look at your patch.
> Thanks much for your interest and help.
> 
> Ian
> 

Again, I am posting Will's latest patch on his behalf.

Any thoughts on how acceptable are the VFS changes ?

Thanks,
Badari



[-- Attachment #2: newautofs.patch --]
[-- Type: text/x-patch, Size: 6958 bytes --]

This patch changes the semantics of d_revalidate so that it is always called
with the parent i_sem lock held.  This allows the autofs4 code to release the
lock if it needs to pend.  Without this patch the autofs has a race condition
in which it pends in the revalidate code while holding the parent i_sem lock
which prevents the mount from ever completing.  There have been other patches
proposed for this problem which check to see if the parent i_sem lock is held
before releasing it but those solutions ignore the possibility that the lock
may be held by another process.

This patch has been expanded to include changes to autofs and devfs.  The 
autofs changes mimic the changes to autofs4 in the original patch.  The devfs
changes remove the checking it did to try to determine where it was called 
from so that it could get its locking right.

diff -ur -x '*.orig' -x '*.new' linux-2.6.13.3/fs/autofs/root.c linux-2.6.13.3-autofspatch/fs/autofs/root.c
--- linux-2.6.13.3/fs/autofs/root.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/autofs/root.c	2005-11-29 03:58:24.000000000 -0800
@@ -104,7 +104,9 @@
 				/* Return a negative dentry, but leave it "pending" */
 				return 1;
 			}
+			up(&dentry->d_parent->d_inode->i_sem);
 			status = autofs_wait(sbi, &dentry->d_name);
+			down(&dentry->d_parent->d_inode->i_sem);
 		} while (!(ent = autofs_hash_lookup(&sbi->dirhash, &dentry->d_name)) );
 	}
 
@@ -124,7 +126,10 @@
 	/* If this is a directory that isn't a mount point, bitch at the
 	   daemon and fix it in user space */
 	if ( S_ISDIR(dentry->d_inode->i_mode) && !d_mountpoint(dentry) ) {
-		return !autofs_wait(sbi, &dentry->d_name);
+		up(&dentry->d_parent->d_inode->i_sem);
+		status = !autofs_wait(sbi, &dentry->d_name);
+		down(&dentry->d_parent->d_inode->i_sem);
+		return (status);
 	}
 
 	/* We don't update the usages for the autofs daemon itself, this
@@ -229,9 +234,7 @@
 	dentry->d_flags |= DCACHE_AUTOFS_PENDING;
 	d_add(dentry, NULL);
 
-	up(&dir->i_sem);
 	autofs_revalidate(dentry, nd);
-	down(&dir->i_sem);
 
 	/*
 	 * If we are still pending, check if we had to handle
diff -ur -x '*.orig' -x '*.new' linux-2.6.13.3/fs/autofs4/root.c linux-2.6.13.3-autofspatch/fs/autofs4/root.c
--- linux-2.6.13.3/fs/autofs4/root.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/autofs4/root.c	2005-11-28 04:22:52.000000000 -0800
@@ -302,7 +302,9 @@
 		DPRINTK("waiting for expire %p name=%.*s",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_NONE);
+		down(&dentry->d_parent->d_inode->i_sem);
 		
 		DPRINTK("expire done status=%d", status);
 		
@@ -324,7 +326,9 @@
 		DPRINTK("waiting for mount name=%.*s",
 			 dentry->d_name.len, dentry->d_name.name);
 
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+		down(&dentry->d_parent->d_inode->i_sem);
 		 
 		DPRINTK("mount done status=%d", status);
 
@@ -351,7 +355,9 @@
 		spin_lock(&dentry->d_lock);
 		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
 		spin_unlock(&dentry->d_lock);
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+		down(&dentry->d_parent->d_inode->i_sem);
 
 		DPRINTK("mount done status=%d", status);
 
diff -ur -x '*.orig' -x '*.new' linux-2.6.13.3/fs/devfs/base.c linux-2.6.13.3-autofspatch/fs/devfs/base.c
--- linux-2.6.13.3/fs/devfs/base.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/devfs/base.c	2005-11-29 04:15:09.000000000 -0800
@@ -2155,34 +2155,12 @@
 	devfs_handle_t parent = get_devfs_entry_from_vfs_inode(dir);
 	struct devfs_lookup_struct *lookup_info = dentry->d_fsdata;
 	DECLARE_WAITQUEUE(wait, current);
-	int need_lock;
 
 	/*
-	 * FIXME HACK
-	 *
 	 * make sure that
 	 *   d_instantiate always runs under lock
 	 *   we release i_sem lock before going to sleep
-	 *
-	 * unfortunately sometimes d_revalidate is called with
-	 * and sometimes without i_sem lock held. The following checks
-	 * attempt to deduce when we need to add (and drop resp.) lock
-	 * here. This relies on current (2.6.2) calling coventions:
-	 *
-	 *   lookup_hash is always run under i_sem and is passing NULL
-	 *   as nd
-	 *
-	 *   open(...,O_CREATE,...) calls _lookup_hash under i_sem
-	 *   and sets flags to LOOKUP_OPEN|LOOKUP_CREATE
-	 *
-	 *   all other invocations of ->d_revalidate seem to happen
-	 *   outside of i_sem
 	 */
-	need_lock = nd &&
-	    (!(nd->flags & LOOKUP_CREATE) || (nd->flags & LOOKUP_PARENT));
-
-	if (need_lock)
-		down(&dir->i_sem);
 
 	if (is_devfsd_or_child(fs_info)) {
 		devfs_handle_t de = lookup_info->de;
@@ -2237,8 +2215,6 @@
 		read_unlock(&parent->u.dir.lock);
 
       out:
-	if (need_lock)
-		up(&dir->i_sem);
 	return 1;
 }				/*  End Function devfs_d_revalidate_wait  */
 
diff -ur -x '*.orig' -x '*.new' linux-2.6.13.3/fs/namei.c linux-2.6.13.3-autofspatch/fs/namei.c
--- linux-2.6.13.3/fs/namei.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/namei.c	2005-11-28 04:22:52.000000000 -0800
@@ -393,7 +393,6 @@
 	struct dentry * result;
 	struct inode *dir = parent->d_inode;
 
-	down(&dir->i_sem);
 	/*
 	 * First re-do the cached lookup just in case it was created
 	 * while we waited for the directory semaphore..
@@ -419,7 +418,6 @@
 			else
 				result = dentry;
 		}
-		up(&dir->i_sem);
 		return result;
 	}
 
@@ -427,7 +425,6 @@
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	up(&dir->i_sem);
 	if (result->d_op && result->d_op->d_revalidate) {
 		if (!result->d_op->d_revalidate(result, nd) && !d_invalidate(result)) {
 			dput(result);
@@ -676,13 +673,16 @@
 		     struct path *path)
 {
 	struct vfsmount *mnt = nd->mnt;
+	struct inode *parent = nd->dentry->d_inode;
 	struct dentry *dentry = __d_lookup(nd->dentry, name);
 
+	down(&parent->i_sem);
 	if (!dentry)
 		goto need_lookup;
 	if (dentry->d_op && dentry->d_op->d_revalidate)
 		goto need_revalidate;
 done:
+	up(&parent->i_sem);
 	path->mnt = mnt;
 	path->dentry = dentry;
 	__follow_mount(path);
@@ -703,6 +703,7 @@
 	goto need_lookup;
 
 fail:
+	up(&parent->i_sem);
 	return PTR_ERR(dentry);
 }
 
@@ -718,7 +719,7 @@
 {
 	struct path next;
 	struct inode *inode;
-	int err;
+	int err, reval;
 	unsigned int lookup_flags = nd->flags;
 	
 	while (*name=='/')
@@ -893,9 +894,17 @@
 		 */
 		if (nd->dentry && nd->dentry->d_sb &&
 		    (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
+			struct dentry *nparent;
+
 			err = -ESTALE;
 			/* Note: we do not d_invalidate() */
-			if (!nd->dentry->d_op->d_revalidate(nd->dentry, nd))
+			/* Revalidate requires us to lock the parent.
+			 */
+			nparent = nd->dentry->d_parent;
+			down(&nparent->d_inode->i_sem);
+			reval = nd->dentry->d_op->d_revalidate(nd->dentry, nd);
+			up(&nparent->d_inode->i_sem);
+			if (!reval)
 				break;
 		}
 return_base:

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 16:49             ` Badari Pulavarty
@ 2005-11-30 17:04               ` Trond Myklebust
  2005-11-30 21:10                 ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Trond Myklebust @ 2005-11-30 17:04 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Ian Kent, William H. Taber, Ram Pai, autofs mailing list,
	linux-fsdevel, Al Viro, smaneesh

On Wed, 2005-11-30 at 08:49 -0800, Badari Pulavarty wrote:
> On Wed, 2005-11-30 at 09:02 -0500, Ian Kent wrote:
> > On Tue, 29 Nov 2005, William H. Taber wrote:
> > 
> > > Ian Kent wrote:
> > > > We'll need to do an analysis of all callers of the revalidate method.
> > > You are right. Searching through the sources, it would appear that I 
> > > missed fixing autofs and devfs.  Everyone else just defines a revalidate 
> > > routine but doesn't call one.  You may find devfs to be interesting 
> > > because they have code to determine whether they need to release the 
> > > i_sem lock or not.  I am working on an updated patch to include the 
> > > changes needed for these two modules.
> > 
> > I've looked at devfs before but that bit of code sounds interesting to me.
> > 
> > The other thing that concerns me is that we may be increasing the latency 
> > of some code paths that need to be really fast. I was thinking that 
> > perhaps it might be good to try a change more in line with the locking 
> > used in link_patch_walk (ie. i_sem free revalidate) rather than that used 
> > in lookup_one_len. My only justification being that lookup is called to 
> > create stuff where revalidate is called to check stuff. I've been 
> > poking around and this change looks fairly difficult as well (I seem to 
> > remember you also looked at this).
> > 
> > Anyway, I'm keen to have a look at your patch.
> > Thanks much for your interest and help.
> > 
> > Ian
> > 
> 
> Again, I am posting Will's latest patch on his behalf.
> 
> Any thoughts on how acceptable are the VFS changes ?

That will slow link_path_walk() for commonly accessed shared directories
(/lib, /usr/share,...) down to a crawl.

Instead of having lock-free lookups of cached dentries, you are suddenly
serialising everybody in the parent directory.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30  1:56   ` Trond Myklebust
  2005-11-30  4:15     ` Jeff Moyer
@ 2005-11-30 20:32     ` William H. Taber
  2005-11-30 20:53       ` Trond Myklebust
  1 sibling, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-11-30 20:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

Trond Myklebust wrote:
> On Tue, 2005-11-29 at 20:16 -0500, Jeff Moyer wrote:
> 
>>==> Regarding [autofs] [RFC PATCH]autofs4: hang and proposed fix; linuxram@us.ibm.com (Ram Pai) adds:
>>
>>linuxram> Autofs4 assumes that its ->revalidate() function gets called with
>>linuxram> the parent_dentry's_inode_semaphore released. This is true mostly
>>linuxram> but not in one particular case.
>>
>>linuxram> Process P1 calls autofs4's ->lookup(). The lookup finds that the
>>linuxram> dentry does not exist. It creates a dentry and adds to the
>>linuxram> cache. Releases the parent's inode's semaphore and than calls
>>linuxram> ->revalidate().
>>
>>linuxram> Process P2 meanwhile comes in and cached_lookup() gets called. It
>>linuxram> finds the dentry in the cache and finds ->revalidate() function
>>linuxram> exists. So it calls ->revalidate() holding the parent's inode's
>>linuxram> semaphore.
>>
>>Can't we simply fix this case?  It seems like it should be perfectly safe
>>to drop the parent's i_sem before calling revalidate in cached_lookup.  In
>>fact, there are comments in the NFS code that would lead one to believe
>>that revalidate is not supposed to be called with the parent's i_sem held:
>>
>>static int nfs_lookup_revalidate(struct dentry * dentry, struct nameidata *nd)
>>{
>>...
>>	/*
>>	 * Note: we're not holding inode->i_sem and so may be racing with
>>	 * operations that change the directory. We therefore save the
>>	 * change attribute *before* we do the RPC call.
>>	 */
>>
>>Can you try out a patch which does this?
>>
>>-Jeff
>>
>>--- linux-2.6.14/fs/namei.c.orig	2005-11-29 20:14:30.000000000 -0500
>>+++ linux-2.6.14/fs/namei.c	2005-11-29 20:14:48.000000000 -0500
>>@@ -332,10 +332,12 @@ static struct dentry * cached_lookup(str
>> 		dentry = d_lookup(parent, name);
>> 
>> 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate) {
>>+		up(&parent->d_inode->i_sem);
>> 		if (!dentry->d_op->d_revalidate(dentry, nd) && !d_invalidate(dentry)) {
>> 			dput(dentry);
>> 			dentry = NULL;
>> 		}
>>+		down(&parent->d_inode->i_sem);
>> 	}
>> 	return dentry;
>> }
> 
> 
> Woah! Definitely not safe. NFS might not care, but the VFS will
> certainly barf over that!
> 
> By dropping the dir->i_sem in cached_lookup() you are allowing 2
> processes to allocate and lookup multiple dentries for the same file
> inside __lookup_hash().
> 
> Cheers,
>   Trond
Not only is there this case, but the original premise is wrong as well. 
  There is a second case in which a d_revalidate function is called with 
the parent i_sem and that is when it is called from inside of 
lookup_one_len.  What makes this tricky is that lookup_one_len is called 
from nfs_sillyrename from inside of nfs_rename which is called, 
naturally enough by sys_rename.  The rename code is very careful about 
the order in which it obtains the parent semaphores because it needs to 
get two of them.  It must always obtain the locks in the same order so 
that does not get into a deadly embrace.  If we start arbitrarily 
releasing a parent semaphore in cached_lookup and taking it again after 
the revalidate, we risk breaking the lock ordering and creating a deadly 
embrace.

When I started writing this I thought that it would be safe for the 
autofs revalidate code to release the parent semaphore because they do 
not have a rename callback.  But I looked again at the rename code and 
it calls lookup_hash on the final source and destination files after 
locking the parents so the potential for a deadly embrace still exists 
unless there is some other assurance that these final lookups will never 
pend waiting on the automounter in either their revalidate or lookup 
routines.  (Actually the requirement is that they never give up the 
parent i_sem lock, but the lookup code has to give up the lock so that 
the autofs demon can run and perform the mount so it amounts to the same 
thing.)

The same issue exists for devfs which also releases the parent i_sem 
lock so that it can wait inside its revalidation routine.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 20:32     ` William H. Taber
@ 2005-11-30 20:53       ` Trond Myklebust
  2005-11-30 21:30         ` William H. Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Trond Myklebust @ 2005-11-30 20:53 UTC (permalink / raw)
  To: William H. Taber; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:

> Not only is there this case, but the original premise is wrong as well. 
>   There is a second case in which a d_revalidate function is called with 
> the parent i_sem and that is when it is called from inside of 
> lookup_one_len.  What makes this tricky is that lookup_one_len is called 
> from nfs_sillyrename from inside of nfs_rename which is called, 
> naturally enough by sys_rename.  The rename code is very careful about 
> the order in which it obtains the parent semaphores because it needs to 
> get two of them.  It must always obtain the locks in the same order so 
> that does not get into a deadly embrace.  If we start arbitrarily 
> releasing a parent semaphore in cached_lookup and taking it again after 
> the revalidate, we risk breaking the lock ordering and creating a deadly 
> embrace.
> 
> When I started writing this I thought that it would be safe for the 
> autofs revalidate code to release the parent semaphore because they do 
> not have a rename callback.  But I looked again at the rename code and 
> it calls lookup_hash on the final source and destination files after 
> locking the parents so the potential for a deadly embrace still exists 
> unless there is some other assurance that these final lookups will never 
> pend waiting on the automounter in either their revalidate or lookup 
> routines.  (Actually the requirement is that they never give up the 
> parent i_sem lock, but the lookup code has to give up the lock so that 
> the autofs demon can run and perform the mount so it amounts to the same 
> thing.)
> 
> The same issue exists for devfs which also releases the parent i_sem 
> lock so that it can wait inside its revalidation routine.

So exactly why does autofs4 want to hold the dir->i_sem in d_revalidate
in the first place? Can't we move any code that requires dir->i_sem to
be held into a ->lookup() method?

Trivially, if you have a d_revalidate that does something like

int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
{
  d_drop(dentry);
  return 0;
}

then the VFS will currently allocate a new dentry with the same name,
and call ->lookup() on it without dropping dir->i_sem. If you still need
to reference the old dentry, then put it on a private list somewhere.
That would also allow you to return the old dentry as the result of the
->lookup() operation if that is desirable.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 17:04               ` Trond Myklebust
@ 2005-11-30 21:10                 ` William H. Taber
  0 siblings, 0 replies; 95+ messages in thread
From: William H. Taber @ 2005-11-30 21:10 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Badari Pulavarty, Ian Kent, Ram Pai, autofs mailing list,
	linux-fsdevel, Al Viro, smaneesh

Trond Myklebust wrote:
> On Wed, 2005-11-30 at 08:49 -0800, Badari Pulavarty wrote:
> 
>>On Wed, 2005-11-30 at 09:02 -0500, Ian Kent wrote:
>>
>>>On Tue, 29 Nov 2005, William H. Taber wrote:
>>>
>>>
>>>>Ian Kent wrote:
>>>>
>>>>>We'll need to do an analysis of all callers of the revalidate method.
>>>>
>>>>You are right. Searching through the sources, it would appear that I 
>>>>missed fixing autofs and devfs.  Everyone else just defines a revalidate 
>>>>routine but doesn't call one.  You may find devfs to be interesting 
>>>>because they have code to determine whether they need to release the 
>>>>i_sem lock or not.  I am working on an updated patch to include the 
>>>>changes needed for these two modules.
>>>
>>>I've looked at devfs before but that bit of code sounds interesting to me.
>>>
>>>The other thing that concerns me is that we may be increasing the latency 
>>>of some code paths that need to be really fast. I was thinking that 
>>>perhaps it might be good to try a change more in line with the locking 
>>>used in link_patch_walk (ie. i_sem free revalidate) rather than that used 
>>>in lookup_one_len. My only justification being that lookup is called to 
>>>create stuff where revalidate is called to check stuff. I've been 
>>>poking around and this change looks fairly difficult as well (I seem to 
>>>remember you also looked at this).
>>>
>>>Anyway, I'm keen to have a look at your patch.
>>>Thanks much for your interest and help.
>>>
>>>Ian
>>>
>>
>>Again, I am posting Will's latest patch on his behalf.
>>
>>Any thoughts on how acceptable are the VFS changes ?
> 
> 
> That will slow link_path_walk() for commonly accessed shared directories
> (/lib, /usr/share,...) down to a crawl.
> 
> Instead of having lock-free lookups of cached dentries, you are suddenly
> serialising everybody in the parent directory.
> 
> Cheers,
>   Trond
> 
Fair enough.  But what do we do?  The original problem was that we were 
seeing deadlocks on /net because the autofs was not releasing the parent 
i_sem when it pended in its revalidate routine.  There was a race in 
which a process could be waiting on the revalidate before the automount 
demon ran but the  automount demon needed the parent i_sem to be able to 
do the mount.

I was proposing this patch to make it easy for the automounter (and 
devfs) to be able to release the parent i_sem if they were going to 
pend.  But as I described in a previous post, I am not sure that it is 
safe in the case of a rename, to allow a filesystem to release the 
parent i_sem in any event.  Oops.  I missed the s_vfs_rename_sem in the 
superblock which serializes renames on a given filesystem.  And since 
renames across filesystems are not allowed, I guess it shouldn't be a 
problem after all.

So I guess that a sufficient fix is for the autofs to add code similar 
to that in devfs so that the revalidate code can decide whether or not 
it needs to release the parent i_sem.  The best fix would be to take the 
code out of devfs and put it into fs/namei.c as an exported function (or 
into fs.h as an inline function) so that it only has to be changed in 
one place the next time the lookup code changes.  It may be an ugly fix 
but the alternative is to be consistent in our locking when we call 
d_revalidate and I don't see an easy solution to that problem.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 20:53       ` Trond Myklebust
@ 2005-11-30 21:30         ` William H. Taber
  2005-11-30 22:32           ` Trond Myklebust
  2005-12-01 12:09           ` Ian Kent
  0 siblings, 2 replies; 95+ messages in thread
From: William H. Taber @ 2005-11-30 21:30 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

Trond Myklebust wrote:
> On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
> 
> 
>>Not only is there this case, but the original premise is wrong as well. 
>>  There is a second case in which a d_revalidate function is called with 
>>the parent i_sem and that is when it is called from inside of 
>>lookup_one_len.  What makes this tricky is that lookup_one_len is called 
>>from nfs_sillyrename from inside of nfs_rename which is called, 
>>naturally enough by sys_rename.  The rename code is very careful about 
>>the order in which it obtains the parent semaphores because it needs to 
>>get two of them.  It must always obtain the locks in the same order so 
>>that does not get into a deadly embrace.  If we start arbitrarily 
>>releasing a parent semaphore in cached_lookup and taking it again after 
>>the revalidate, we risk breaking the lock ordering and creating a deadly 
>>embrace.
>>
>>When I started writing this I thought that it would be safe for the 
>>autofs revalidate code to release the parent semaphore because they do 
>>not have a rename callback.  But I looked again at the rename code and 
>>it calls lookup_hash on the final source and destination files after 
>>locking the parents so the potential for a deadly embrace still exists 
>>unless there is some other assurance that these final lookups will never 
>>pend waiting on the automounter in either their revalidate or lookup 
>>routines.  (Actually the requirement is that they never give up the 
>>parent i_sem lock, but the lookup code has to give up the lock so that 
>>the autofs demon can run and perform the mount so it amounts to the same 
>>thing.)
>>
>>The same issue exists for devfs which also releases the parent i_sem 
>>lock so that it can wait inside its revalidation routine.
> 
> 
> So exactly why does autofs4 want to hold the dir->i_sem in d_revalidate
> in the first place? Can't we move any code that requires dir->i_sem to
> be held into a ->lookup() method?

It's not that d_revalidate wants or doesn't want to hold the lock.  The 
caller of lookup_one_len is required to get the lock and this function 
calls lookup_hash which calls cached_lookup which calls d_revalidate.

> 
> Trivially, if you have a d_revalidate that does something like
> 
> int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
> {
>   d_drop(dentry);
>   return 0;
> }
> 
> then the VFS will currently allocate a new dentry with the same name,
> and call ->lookup() on it without dropping dir->i_sem. If you still need
> to reference the old dentry, then put it on a private list somewhere.
> That would also allow you to return the old dentry as the result of the
> ->lookup() operation if that is desirable.

Problem with that, as I understand it and Ian Kent knows better than I, 
is that the autofs lookup code creates the dentry and fills it in 
partially and marks it as waiting for mounting and wakes up the 
automount demon.  The demon completes the mount and finishes filling in 
the dentry.  So we cannot have some other lookup coming in and removing 
the dentry on us.  At least that is what I understand from Ian's answer 
when I proposed the same sort of thing to him.   Even if  they end up 
doing something like that in a future version of the automounter, I 
would still like a simple patch that can be applied to existing systems 
as an interim fix.

Will


Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 21:30         ` William H. Taber
@ 2005-11-30 22:32           ` Trond Myklebust
  2005-12-01 16:27             ` William H. Taber
  2005-12-01 12:09           ` Ian Kent
  1 sibling, 1 reply; 95+ messages in thread
From: Trond Myklebust @ 2005-11-30 22:32 UTC (permalink / raw)
  To: William H. Taber; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

On Wed, 2005-11-30 at 16:30 -0500, William H. Taber wrote:

> > 
> > Trivially, if you have a d_revalidate that does something like
> > 
> > int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
> > {
> >   d_drop(dentry);
> >   return 0;
> > }
> > 
> > then the VFS will currently allocate a new dentry with the same name,
> > and call ->lookup() on it without dropping dir->i_sem. If you still need
> > to reference the old dentry, then put it on a private list somewhere.
> > That would also allow you to return the old dentry as the result of the
> > ->lookup() operation if that is desirable.
> 
> Problem with that, as I understand it and Ian Kent knows better than I, 
> is that the autofs lookup code creates the dentry and fills it in 
> partially and marks it as waiting for mounting and wakes up the 
> automount demon.  The demon completes the mount and finishes filling in 
> the dentry.  So we cannot have some other lookup coming in and removing 
> the dentry on us.  At least that is what I understand from Ian's answer 
> when I proposed the same sort of thing to him.

What do you mean by "removing the dentry on us"? It is perfectly
possible to have lookup() return the original dentry every time, which
is precisely what I suggested above.

>    Even if  they end up 
> doing something like that in a future version of the automounter, I 
> would still like a simple patch that can be applied to existing systems 
> as an interim fix.

"Interim" fixes to the entire VFS API such as the ones that have
proposed here tend to be a poor idea...

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 21:30         ` William H. Taber
  2005-11-30 22:32           ` Trond Myklebust
@ 2005-12-01 12:09           ` Ian Kent
  2005-12-01 16:30             ` William H. Taber
  1 sibling, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-12-01 12:09 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: William H. Taber, Jeff Moyer, Ram Pai, autofs mailing list,
	linux-fsdevel

On Wed, 30 Nov 2005, William H. Taber wrote:

> Trond Myklebust wrote:
> > On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
> > 
> > 
> > > Not only is there this case, but the original premise is wrong as well.
> > > There is a second case in which a d_revalidate function is called with the
> > > parent i_sem and that is when it is called from inside of lookup_one_len.
> > > What makes this tricky is that lookup_one_len is called from
> > > nfs_sillyrename from inside of nfs_rename which is called, naturally
> > > enough by sys_rename.  The rename code is very careful about the order in
> > > which it obtains the parent semaphores because it needs to get two of
> > > them.  It must always obtain the locks in the same order so that does not
> > > get into a deadly embrace.  If we start arbitrarily releasing a parent
> > > semaphore in cached_lookup and taking it again after the revalidate, we
> > > risk breaking the lock ordering and creating a deadly embrace.
> > > 
> > > When I started writing this I thought that it would be safe for the autofs
> > > revalidate code to release the parent semaphore because they do not have a
> > > rename callback.  But I looked again at the rename code and it calls
> > > lookup_hash on the final source and destination files after locking the
> > > parents so the potential for a deadly embrace still exists unless there is
> > > some other assurance that these final lookups will never pend waiting on
> > > the automounter in either their revalidate or lookup routines.  (Actually
> > > the requirement is that they never give up the parent i_sem lock, but the
> > > lookup code has to give up the lock so that the autofs demon can run and
> > > perform the mount so it amounts to the same thing.)
> > > 
> > > The same issue exists for devfs which also releases the parent i_sem lock
> > > so that it can wait inside its revalidation routine.
> > 
> > 
> > So exactly why does autofs4 want to hold the dir->i_sem in d_revalidate
> > in the first place? Can't we move any code that requires dir->i_sem to
> > be held into a ->lookup() method?
> 
> It's not that d_revalidate wants or doesn't want to hold the lock.  The caller
> of lookup_one_len is required to get the lock and this function calls
> lookup_hash which calls cached_lookup which calls d_revalidate.
> 
> > 
> > Trivially, if you have a d_revalidate that does something like
> > 
> > int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
> > {
> >   d_drop(dentry);
> >   return 0;
> > }
> > 
> > then the VFS will currently allocate a new dentry with the same name,
> > and call ->lookup() on it without dropping dir->i_sem. If you still need
> > to reference the old dentry, then put it on a private list somewhere.
> > That would also allow you to return the old dentry as the result of the
> > ->lookup() operation if that is desirable.
> 
> Problem with that, as I understand it and Ian Kent knows better than I, is
> that the autofs lookup code creates the dentry and fills it in partially and
> marks it as waiting for mounting and wakes up the automount demon.  The demon
> completes the mount and finishes filling in the dentry.  So we cannot have
> some other lookup coming in and removing the dentry on us.  At least that is
> what I understand from Ian's answer when I proposed the same sort of thing to
> him.   Even if  they end up doing something like that in a future version of
> the automounter, I would still like a simple patch that can be applied to
> existing systems as an interim fix.

Lets see if I can keep this explaination simple.

The user space process using the autofs filesystem (autodir or automount) 
needs to be able to call mkdir at mount time as a result of a callback 
from revalidate. Sometimes this comes indirectly from lookup (if the 
directory does not already exist).

lookup_one_len requires the i_sem to be held so two instances of a 
filesystem calling it lead to a deadlock when mkdir is called from 
userspace (the third process). In the case we are discussing this happens 
because the first process calls lookup which releases the i_sem and 
calls revalidate itself. The second calls revalidate which doesn't release 
the i_sem and is places on a wait queue for mount completion. Consequently 
the mkdir blocks.

So the requirement is that autofs release the i_sem during the callback, 
not obtain it.

Will believes that it is not safe for autofs to release i_sem for 
the callback to user space because it is possible that path that aquired 
it may not be the path that has called revalidate and I can see his point.

Never the less I'm still not convinced that this is possible given the 
restrictions of autofs.

Let me try and describe this, hopefully more clearly than I've done so 
far.

The only operations defined for autofs are:

mkdir, rmdir, symlink and unlink 

and the only processes that can do these operations must be in the same 
process group that mounted the filesystem. EACCESS is returned for all 
other processes attempting these operations.

The other functionality is read-only (and perhaps triggers a mount) 
being lookup, revalidate and readdir.

So the question is, can anyone provide an example of a path that, upon 
calling autofs revalidate or lookup with the i_sem held, not be the path 
that aquired it?

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-11-30 22:32           ` Trond Myklebust
@ 2005-12-01 16:27             ` William H. Taber
  0 siblings, 0 replies; 95+ messages in thread
From: William H. Taber @ 2005-12-01 16:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jmoyer, Ram Pai, autofs, linux-fsdevel

Trond Myklebust wrote:
> On Wed, 2005-11-30 at 16:30 -0500, William H. Taber wrote:
> 
> 
>>>Trivially, if you have a d_revalidate that does something like
>>>
>>>int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
>>>{
>>>  d_drop(dentry);
>>>  return 0;
>>>}
>>>
>>>then the VFS will currently allocate a new dentry with the same name,
>>>and call ->lookup() on it without dropping dir->i_sem. If you still need
>>>to reference the old dentry, then put it on a private list somewhere.
>>>That would also allow you to return the old dentry as the result of the
>>>->lookup() operation if that is desirable.
>>
>>Problem with that, as I understand it and Ian Kent knows better than I, 
>>is that the autofs lookup code creates the dentry and fills it in 
>>partially and marks it as waiting for mounting and wakes up the 
>>automount demon.  The demon completes the mount and finishes filling in 
>>the dentry.  So we cannot have some other lookup coming in and removing 
>>the dentry on us.  At least that is what I understand from Ian's answer 
>>when I proposed the same sort of thing to him.
> 
> 
> What do you mean by "removing the dentry on us"? It is perfectly
> possible to have lookup() return the original dentry every time, which
> is precisely what I suggested above.
What I meant was that the autofs code created this dentry and then 
called d_add to put it in the hash chain, woke up the autofs demon to 
perform the mount then waited for the mount to complete.  The autofs 
code is, I think, intending for the demon to find this entry in a 
revalidate and complete the mount.  But Ian knows better than I.  Anyway 
I did not think that it would be good for a racing call to do_lookup to 
unhash a dentry that the automounter was expecting to find.

I was probably unclear when I referred to lookup.  I meant it in the 
generic sense of do_lookup or lookup_one_len and not the i_op->lookup 
function.  I had already suggested to Ian that they not d_add the dentry
in autofs4_lookup until the mount demon came in to complete the mount 
and have autofs4_lookup be responsible for queing up subsequent lookups 
until the mount completed and moving the code for that out of 
autofs4_revalidate.  He allowed how it would be possible but a lot of work.
> 
> 
>>   Even if  they end up 
>>doing something like that in a future version of the automounter, I 
>>would still like a simple patch that can be applied to existing systems 
>>as an interim fix.
> 
> 
> "Interim" fixes to the entire VFS API such as the ones that have
> proposed here tend to be a poor idea...
> 
Which is why I was discussing the ideas here.  From the start I have 
been asking for input from people with more understanding than I have of 
the subleties of VFS locking.  I have been trying to find a solution 
short of waiting for autofs5 since we are seeing the problem now.  But 
obviously we don't want a fix that causes more problems.  My original 
thought was that the solution to the problem was to make the locking 
requirements for d_revalidate consistent.  I now have a greater 
understanding of why things are as they are.  In another post I have 
outlined what I think a workable solution is that is confined to the 
autofs code.  Your input it has been quite helpful in helping me 
understand all of this.  Thanks.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-01 12:09           ` Ian Kent
@ 2005-12-01 16:30             ` William H. Taber
  2005-12-02 13:49               ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: William H. Taber @ 2005-12-01 16:30 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, Jeff Moyer, Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Wed, 30 Nov 2005, William H. Taber wrote:
> 
> 
>>Trond Myklebust wrote:
>>
>>>On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
>>>
>>>
>>>
>>>>Not only is there this case, but the original premise is wrong as well.
>>>>There is a second case in which a d_revalidate function is called with the
>>>>parent i_sem and that is when it is called from inside of lookup_one_len.
>>>>What makes this tricky is that lookup_one_len is called from
>>>>nfs_sillyrename from inside of nfs_rename which is called, naturally
>>>>enough by sys_rename.  The rename code is very careful about the order in
>>>>which it obtains the parent semaphores because it needs to get two of
>>>>them.  It must always obtain the locks in the same order so that does not
>>>>get into a deadly embrace.  If we start arbitrarily releasing a parent
>>>>semaphore in cached_lookup and taking it again after the revalidate, we
>>>>risk breaking the lock ordering and creating a deadly embrace.
>>>>
>>>>When I started writing this I thought that it would be safe for the autofs
>>>>revalidate code to release the parent semaphore because they do not have a
>>>>rename callback.  But I looked again at the rename code and it calls
>>>>lookup_hash on the final source and destination files after locking the
>>>>parents so the potential for a deadly embrace still exists unless there is
>>>>some other assurance that these final lookups will never pend waiting on
>>>>the automounter in either their revalidate or lookup routines.  (Actually
>>>>the requirement is that they never give up the parent i_sem lock, but the
>>>>lookup code has to give up the lock so that the autofs demon can run and
>>>>perform the mount so it amounts to the same thing.)
>>>>
>>>>The same issue exists for devfs which also releases the parent i_sem lock
>>>>so that it can wait inside its revalidation routine.
>>>
>>>
>>>So exactly why does autofs4 want to hold the dir->i_sem in d_revalidate
>>>in the first place? Can't we move any code that requires dir->i_sem to
>>>be held into a ->lookup() method?
>>
>>It's not that d_revalidate wants or doesn't want to hold the lock.  The caller
>>of lookup_one_len is required to get the lock and this function calls
>>lookup_hash which calls cached_lookup which calls d_revalidate.
>>
>>
>>>Trivially, if you have a d_revalidate that does something like
>>>
>>>int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
>>>{
>>>  d_drop(dentry);
>>>  return 0;
>>>}
>>>
>>>then the VFS will currently allocate a new dentry with the same name,
>>>and call ->lookup() on it without dropping dir->i_sem. If you still need
>>>to reference the old dentry, then put it on a private list somewhere.
>>>That would also allow you to return the old dentry as the result of the
>>>->lookup() operation if that is desirable.
>>
>>Problem with that, as I understand it and Ian Kent knows better than I, is
>>that the autofs lookup code creates the dentry and fills it in partially and
>>marks it as waiting for mounting and wakes up the automount demon.  The demon
>>completes the mount and finishes filling in the dentry.  So we cannot have
>>some other lookup coming in and removing the dentry on us.  At least that is
>>what I understand from Ian's answer when I proposed the same sort of thing to
>>him.   Even if  they end up doing something like that in a future version of
>>the automounter, I would still like a simple patch that can be applied to
>>existing systems as an interim fix.
> 
> 
> Lets see if I can keep this explaination simple.
> 
> The user space process using the autofs filesystem (autodir or automount) 
> needs to be able to call mkdir at mount time as a result of a callback 
> from revalidate. Sometimes this comes indirectly from lookup (if the 
> directory does not already exist).
> 
> lookup_one_len requires the i_sem to be held so two instances of a 
> filesystem calling it lead to a deadlock when mkdir is called from 
> userspace (the third process). In the case we are discussing this happens 
> because the first process calls lookup which releases the i_sem and 
> calls revalidate itself. The second calls revalidate which doesn't release 
> the i_sem and is places on a wait queue for mount completion. Consequently 
> the mkdir blocks.
> 
> So the requirement is that autofs release the i_sem during the callback, 
> not obtain it.
> 
> Will believes that it is not safe for autofs to release i_sem for 
> the callback to user space because it is possible that path that aquired 
> it may not be the path that has called revalidate and I can see his point.
> 
> Never the less I'm still not convinced that this is possible given the 
> restrictions of autofs.
> 
> Let me try and describe this, hopefully more clearly than I've done so 
> far.
> 
> The only operations defined for autofs are:
> 
> mkdir, rmdir, symlink and unlink 
> 
> and the only processes that can do these operations must be in the same 
> process group that mounted the filesystem. EACCESS is returned for all 
> other processes attempting these operations.
> 
> The other functionality is read-only (and perhaps triggers a mount) 
> being lookup, revalidate and readdir.
> 
> So the question is, can anyone provide an example of a path that, upon 
> calling autofs revalidate or lookup with the i_sem held, not be the path 
> that aquired it?

Any other process calling lookup_one_len on a file in /net.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-01 16:30             ` William H. Taber
@ 2005-12-02 13:49               ` Ian Kent
  2005-12-02 14:07                 ` Jeff Moyer
                                   ` (2 more replies)
  0 siblings, 3 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-02 13:49 UTC (permalink / raw)
  To: William H. Taber
  Cc: Trond Myklebust, Jeff Moyer, Ram Pai, autofs mailing list, linux-fsdevel

On Thu, 1 Dec 2005, William H. Taber wrote:

> Ian Kent wrote:
> > On Wed, 30 Nov 2005, William H. Taber wrote:
> > 
> > 
> > > Trond Myklebust wrote:
> > > 
> > > > On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
> > > > 
> > > > 
> > > > 
> > > > > Not only is there this case, but the original premise is wrong as
> > > > > well.
> > > > > There is a second case in which a d_revalidate function is called with
> > > > > the
> > > > > parent i_sem and that is when it is called from inside of
> > > > > lookup_one_len.
> > > > > What makes this tricky is that lookup_one_len is called from
> > > > > nfs_sillyrename from inside of nfs_rename which is called, naturally
> > > > > enough by sys_rename.  The rename code is very careful about the order
> > > > > in
> > > > > which it obtains the parent semaphores because it needs to get two of
> > > > > them.  It must always obtain the locks in the same order so that does
> > > > > not
> > > > > get into a deadly embrace.  If we start arbitrarily releasing a parent
> > > > > semaphore in cached_lookup and taking it again after the revalidate,
> > > > > we
> > > > > risk breaking the lock ordering and creating a deadly embrace.
> > > > > 
> > > > > When I started writing this I thought that it would be safe for the
> > > > > autofs
> > > > > revalidate code to release the parent semaphore because they do not
> > > > > have a
> > > > > rename callback.  But I looked again at the rename code and it calls
> > > > > lookup_hash on the final source and destination files after locking
> > > > > the
> > > > > parents so the potential for a deadly embrace still exists unless
> > > > > there is
> > > > > some other assurance that these final lookups will never pend waiting
> > > > > on
> > > > > the automounter in either their revalidate or lookup routines.
> > > > > (Actually
> > > > > the requirement is that they never give up the parent i_sem lock, but
> > > > > the
> > > > > lookup code has to give up the lock so that the autofs demon can run
> > > > > and
> > > > > perform the mount so it amounts to the same thing.)
> > > > > 
> > > > > The same issue exists for devfs which also releases the parent i_sem
> > > > > lock
> > > > > so that it can wait inside its revalidation routine.
> > > > 
> > > > 
> > > > So exactly why does autofs4 want to hold the dir->i_sem in d_revalidate
> > > > in the first place? Can't we move any code that requires dir->i_sem to
> > > > be held into a ->lookup() method?
> > > 
> > > It's not that d_revalidate wants or doesn't want to hold the lock.  The
> > > caller
> > > of lookup_one_len is required to get the lock and this function calls
> > > lookup_hash which calls cached_lookup which calls d_revalidate.
> > > 
> > > 
> > > > Trivially, if you have a d_revalidate that does something like
> > > > 
> > > > int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
> > > > {
> > > >  d_drop(dentry);
> > > >  return 0;
> > > > }
> > > > 
> > > > then the VFS will currently allocate a new dentry with the same name,
> > > > and call ->lookup() on it without dropping dir->i_sem. If you still need
> > > > to reference the old dentry, then put it on a private list somewhere.
> > > > That would also allow you to return the old dentry as the result of the
> > > > ->lookup() operation if that is desirable.
> > > 
> > > Problem with that, as I understand it and Ian Kent knows better than I, is
> > > that the autofs lookup code creates the dentry and fills it in partially
> > > and
> > > marks it as waiting for mounting and wakes up the automount demon.  The
> > > demon
> > > completes the mount and finishes filling in the dentry.  So we cannot have
> > > some other lookup coming in and removing the dentry on us.  At least that
> > > is
> > > what I understand from Ian's answer when I proposed the same sort of thing
> > > to
> > > him.   Even if  they end up doing something like that in a future version
> > > of
> > > the automounter, I would still like a simple patch that can be applied to
> > > existing systems as an interim fix.
> > 
> > 
> > Lets see if I can keep this explaination simple.
> > 
> > The user space process using the autofs filesystem (autodir or automount)
> > needs to be able to call mkdir at mount time as a result of a callback from
> > revalidate. Sometimes this comes indirectly from lookup (if the directory
> > does not already exist).
> > 
> > lookup_one_len requires the i_sem to be held so two instances of a
> > filesystem calling it lead to a deadlock when mkdir is called from userspace
> > (the third process). In the case we are discussing this happens because the
> > first process calls lookup which releases the i_sem and calls revalidate
> > itself. The second calls revalidate which doesn't release the i_sem and is
> > places on a wait queue for mount completion. Consequently the mkdir blocks.
> > 
> > So the requirement is that autofs release the i_sem during the callback, not
> > obtain it.
> > 
> > Will believes that it is not safe for autofs to release i_sem for the
> > callback to user space because it is possible that path that aquired it may
> > not be the path that has called revalidate and I can see his point.
> > 
> > Never the less I'm still not convinced that this is possible given the
> > restrictions of autofs.
> > 
> > Let me try and describe this, hopefully more clearly than I've done so far.
> > 
> > The only operations defined for autofs are:
> > 
> > mkdir, rmdir, symlink and unlink 
> > and the only processes that can do these operations must be in the same
> > process group that mounted the filesystem. EACCESS is returned for all other
> > processes attempting these operations.
> > 
> > The other functionality is read-only (and perhaps triggers a mount) being
> > lookup, revalidate and readdir.
> > 
> > So the question is, can anyone provide an example of a path that, upon
> > calling autofs revalidate or lookup with the i_sem held, not be the path
> > that aquired it?

So still no counter example!

> 
> Any other process calling lookup_one_len on a file in /net.

I'm afraid this is not an example it's an assertion.
"Any other process" is a little broad I think.
You'll need to be more specific.

Consider the example reported by yourself and Ram.

In that example we have processes P1, P2 and lets call the user space 
callback P1(mount). Also assume there is a mechamism to check the 
semaphore, release it if held and later re-take it if previously held, 
like the patch I offered before.

Correct me if I'm wrong but, with the assumption above, you report 
goes like:

P1 - calls lookup_one_len, takes i_sem and eventually calls 
autofs4_lookup and indirectly autofs4_revalidate.

P2 - comes along and waits on i_sem.

P1 - autofs4_revalidate releases i_sem and posts a user space callback.

P2 - aquires i_sem and eventually calls autofs4_revalidate, releases 
i_sem and is posted to the wait queue for mount completion.

P1(mount) - calls mkdir, aquires i_sem, and calls autofs4_dir_mkdir,  
i_sem is then released.

Mount completion is signaled back to autofs4 and the waiters are released.

P1, P2, in any order each (one after the other due to the semaphore) 
re-take i_sem and each complete their lookup_one_len calls.

On both calls to autofs4_revalidate the calling process is itself the 
holder of i_sem.

Further, any other process that does a path walk during this time has two 
possible paths.

First case, the dentry exists, the process is placed on the wait queue 
along with P1 and P2 awaiting mount completion without taking i_sem.

Second case, the dentry does not yet exist, this process either aquires 
the i_sem in do_lookup and follows a similar path to P1 and waits on the 
queue for mount completion or it waits on the i_sem while P1 does 
the lookup and triggers the mount request, it the aquires i_sem find the 
dentry exists, releases i_sem and calls autofs4_revalidate without i_sem 
held and is sent to the wait queue to wait for mount completion.

Again in both these cases a process that enters autofs4_revalidate when 
the i_sem is held is the process that aquired it.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 13:49               ` Ian Kent
@ 2005-12-02 14:07                 ` Jeff Moyer
  2005-12-02 15:21                   ` Ian Kent
  2005-12-02 15:34                 ` Will Taber
  2005-12-02 16:04                 ` [autofs] " Jeff Moyer
  2 siblings, 1 reply; 95+ messages in thread
From: Jeff Moyer @ 2005-12-02 14:07 UTC (permalink / raw)
  To: Ian Kent
  Cc: autofs mailing list, linux-fsdevel, William H. Taber, Trond Myklebust

==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ian Kent <raven@themaw.net> adds:

raven> On Thu, 1 Dec 2005, William H. Taber wrote:
>> Ian Kent wrote: > On Wed, 30 Nov 2005, William H. Taber wrote:
>> > 
>> > 
>> > > Trond Myklebust wrote:
>> > > 
>> > > > On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
>> > > > 
>> > > > 
>> > > > 
>> > > > > Not only is there this case, but the original premise is wrong
>> as > > > > well.  > > > > There is a second case in which a d_revalidate
>> function is called with > > > > the > > > > parent i_sem and that is
>> when it is called from inside of > > > > lookup_one_len.  > > > > What
>> makes this tricky is that lookup_one_len is called from > > > >
>> nfs_sillyrename from inside of nfs_rename which is called, naturally > >
>> > > enough by sys_rename.  The rename code is very careful about the
>> order > > > > in > > > > which it obtains the parent semaphores because
>> it needs to get two of > > > > them.  It must always obtain the locks in
>> the same order so that does > > > > not > > > > get into a deadly
>> embrace.  If we start arbitrarily releasing a parent > > > > semaphore
>> in cached_lookup and taking it again after the revalidate, > > > > we >
>> > > > risk breaking the lock ordering and creating a deadly embrace.
>> > > > > 
>> > > > > When I started writing this I thought that it would be safe for
>> the > > > > autofs > > > > revalidate code to release the parent
>> semaphore because they do not > > > > have a > > > > rename callback.
>> But I looked again at the rename code and it calls > > > > lookup_hash
>> on the final source and destination files after locking > > > > the > >
>> > > parents so the potential for a deadly embrace still exists unless >
>> > > > there is > > > > some other assurance that these final lookups
>> will never pend waiting > > > > on > > > > the automounter in either
>> their revalidate or lookup routines.  > > > > (Actually > > > > the
>> requirement is that they never give up the parent i_sem lock, but > > >
>> > the > > > > lookup code has to give up the lock so that the autofs
>> demon can run > > > > and > > > > perform the mount so it amounts to the
>> same thing.)
>> > > > > 
>> > > > > The same issue exists for devfs which also releases the parent
>> i_sem > > > > lock > > > > so that it can wait inside its revalidation
>> routine.
>> > > > 
>> > > > 
>> > > > So exactly why does autofs4 want to hold the dir->i_sem in
>> d_revalidate > > > in the first place? Can't we move any code that
>> requires dir->i_sem to > > > be held into a ->lookup() method?
>> > > 
>> > > It's not that d_revalidate wants or doesn't want to hold the lock.
>> The > > caller > > of lookup_one_len is required to get the lock and
>> this function calls > > lookup_hash which calls cached_lookup which
>> calls d_revalidate.
>> > > 
>> > > 
>> > > > Trivially, if you have a d_revalidate that does something like
>> > > > 
>> > > > int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
>> > > > { > > > d_drop(dentry); > > > return 0; > > > }
>> > > > 
>> > > > then the VFS will currently allocate a new dentry with the same
>> name, > > > and call ->lookup() on it without dropping dir->i_sem. If
>> you still need > > > to reference the old dentry, then put it on a
>> private list somewhere.  > > > That would also allow you to return the
>> old dentry as the result of the > > > ->lookup() operation if that is
>> desirable.
>> > > 
>> > > Problem with that, as I understand it and Ian Kent knows better than
>> I, is > > that the autofs lookup code creates the dentry and fills it in
>> partially > > and > > marks it as waiting for mounting and wakes up the
>> automount demon.  The > > demon > > completes the mount and finishes
>> filling in the dentry.  So we cannot have > > some other lookup coming
>> in and removing the dentry on us.  At least that > > is > > what I
>> understand from Ian's answer when I proposed the same sort of thing > >
>> to > > him.  Even if they end up doing something like that in a future
>> version > > of > > the automounter, I would still like a simple patch
>> that can be applied to > > existing systems as an interim fix.
>> > 
>> > 
>> > Lets see if I can keep this explaination simple.
>> > 
>> > The user space process using the autofs filesystem (autodir or
>> automount) > needs to be able to call mkdir at mount time as a result of
>> a callback from > revalidate. Sometimes this comes indirectly from
>> lookup (if the directory > does not already exist).
>> > 
>> > lookup_one_len requires the i_sem to be held so two instances of a >
>> filesystem calling it lead to a deadlock when mkdir is called from
>> userspace > (the third process). In the case we are discussing this
>> happens because the > first process calls lookup which releases the
>> i_sem and calls revalidate > itself. The second calls revalidate which
>> doesn't release the i_sem and is > places on a wait queue for mount
>> completion. Consequently the mkdir blocks.
>> > 
>> > So the requirement is that autofs release the i_sem during the
>> callback, not > obtain it.
>> > 
>> > Will believes that it is not safe for autofs to release i_sem for the
>> > callback to user space because it is possible that path that aquired
>> it may > not be the path that has called revalidate and I can see his
>> point.
>> > 
>> > Never the less I'm still not convinced that this is possible given the
>> > restrictions of autofs.
>> > 
>> > Let me try and describe this, hopefully more clearly than I've done so
>> far.
>> > 
>> > The only operations defined for autofs are:
>> > 
>> > mkdir, rmdir, symlink and unlink > and the only processes that can do
>> these operations must be in the same > process group that mounted the
>> filesystem. EACCESS is returned for all other > processes attempting
>> these operations.
>> > 
>> > The other functionality is read-only (and perhaps triggers a mount)
>> being > lookup, revalidate and readdir.
>> > 
>> > So the question is, can anyone provide an example of a path that, upon
>> > calling autofs revalidate or lookup with the i_sem held, not be the
>> path > that aquired it?

raven> So still no counter example!

>> Any other process calling lookup_one_len on a file in /net.

raven> I'm afraid this is not an example it's an assertion.  "Any other
raven> process" is a little broad I think.  You'll need to be more
raven> specific.

raven> Consider the example reported by yourself and Ram.

raven> In that example we have processes P1, P2 and lets call the user
raven> space callback P1(mount). Also assume there is a mechamism to check
raven> the semaphore, release it if held and later re-take it if previously
raven> held, like the patch I offered before.

raven> Correct me if I'm wrong but, with the assumption above, you report
raven> goes like:

raven> P1 - calls lookup_one_len, takes i_sem and eventually calls
raven> autofs4_lookup and indirectly autofs4_revalidate.

raven> P2 - comes along and waits on i_sem.

raven> P1 - autofs4_revalidate releases i_sem and posts a user space
raven> callback.

raven> P2 - aquires i_sem and eventually calls autofs4_revalidate, releases
raven> i_sem and is posted to the wait queue for mount completion.

raven> P1(mount) - calls mkdir, aquires i_sem, and calls autofs4_dir_mkdir,
raven> i_sem is then released.

raven> Mount completion is signaled back to autofs4 and the waiters are
raven> released.

raven> P1, P2, in any order each (one after the other due to the semaphore)
raven> re-take i_sem and each complete their lookup_one_len calls.

raven> On both calls to autofs4_revalidate the calling process is itself
raven> the holder of i_sem.

raven> Further, any other process that does a path walk during this time
raven> has two possible paths.

raven> First case, the dentry exists, the process is placed on the wait
raven> queue along with P1 and P2 awaiting mount completion without taking
raven> i_sem.

raven> Second case, the dentry does not yet exist, this process either
raven> aquires the i_sem in do_lookup and follows a similar path to P1 and
raven> waits on the queue for mount completion or it waits on the i_sem
raven> while P1 does the lookup and triggers the mount request, it the
raven> aquires i_sem find the dentry exists, releases i_sem and calls
raven> autofs4_revalidate without i_sem held and is sent to the wait queue
raven> to wait for mount completion.

raven> Again in both these cases a process that enters autofs4_revalidate
raven> when the i_sem is held is the process that aquired it.

Now consider that revalidate is called without semaphore held.  Also
consider that another process holds this semaphore for any valid reason
(could be mkdir or whatever[1]).  Now, you're code says, hey, the semaphore is
held, let's drop it!  Bad juju follows.

You can't do this.

-Jeff

1 - Yes, mkdir shouldn't be called by anything but the automount daemon.
However, that doesn't prevent someone from calling it, and having it fail.
That code path will still acquire the semaphore.  I'm sure there are other
code paths that will get the semaphore, and not end up in revalidate, too.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 14:07                 ` Jeff Moyer
@ 2005-12-02 15:21                   ` Ian Kent
  2005-12-02 16:35                     ` [autofs] " Will Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-12-02 15:21 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: autofs mailing list, linux-fsdevel, William H. Taber, Trond Myklebust

On Fri, 2 Dec 2005, Jeff Moyer wrote:

> ==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ian Kent <raven@themaw.net> adds:
> 
> raven> On Thu, 1 Dec 2005, William H. Taber wrote:
> >> Ian Kent wrote: > On Wed, 30 Nov 2005, William H. Taber wrote:
> >> > 
> >> > 
> >> > > Trond Myklebust wrote:
> >> > > 
> >> > > > On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
> >> > > > 
> >> > > > 
> >> > > > 
> >> > > > > Not only is there this case, but the original premise is wrong
> >> as > > > > well.  > > > > There is a second case in which a d_revalidate
> >> function is called with > > > > the > > > > parent i_sem and that is
> >> when it is called from inside of > > > > lookup_one_len.  > > > > What
> >> makes this tricky is that lookup_one_len is called from > > > >
> >> nfs_sillyrename from inside of nfs_rename which is called, naturally > >
> >> > > enough by sys_rename.  The rename code is very careful about the
> >> order > > > > in > > > > which it obtains the parent semaphores because
> >> it needs to get two of > > > > them.  It must always obtain the locks in
> >> the same order so that does > > > > not > > > > get into a deadly
> >> embrace.  If we start arbitrarily releasing a parent > > > > semaphore
> >> in cached_lookup and taking it again after the revalidate, > > > > we >
> >> > > > risk breaking the lock ordering and creating a deadly embrace.
> >> > > > > 
> >> > > > > When I started writing this I thought that it would be safe for
> >> the > > > > autofs > > > > revalidate code to release the parent
> >> semaphore because they do not > > > > have a > > > > rename callback.
> >> But I looked again at the rename code and it calls > > > > lookup_hash
> >> on the final source and destination files after locking > > > > the > >
> >> > > parents so the potential for a deadly embrace still exists unless >
> >> > > > there is > > > > some other assurance that these final lookups
> >> will never pend waiting > > > > on > > > > the automounter in either
> >> their revalidate or lookup routines.  > > > > (Actually > > > > the
> >> requirement is that they never give up the parent i_sem lock, but > > >
> >> > the > > > > lookup code has to give up the lock so that the autofs
> >> demon can run > > > > and > > > > perform the mount so it amounts to the
> >> same thing.)
> >> > > > > 
> >> > > > > The same issue exists for devfs which also releases the parent
> >> i_sem > > > > lock > > > > so that it can wait inside its revalidation
> >> routine.
> >> > > > 
> >> > > > 
> >> > > > So exactly why does autofs4 want to hold the dir->i_sem in
> >> d_revalidate > > > in the first place? Can't we move any code that
> >> requires dir->i_sem to > > > be held into a ->lookup() method?
> >> > > 
> >> > > It's not that d_revalidate wants or doesn't want to hold the lock.
> >> The > > caller > > of lookup_one_len is required to get the lock and
> >> this function calls > > lookup_hash which calls cached_lookup which
> >> calls d_revalidate.
> >> > > 
> >> > > 
> >> > > > Trivially, if you have a d_revalidate that does something like
> >> > > > 
> >> > > > int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
> >> > > > { > > > d_drop(dentry); > > > return 0; > > > }
> >> > > > 
> >> > > > then the VFS will currently allocate a new dentry with the same
> >> name, > > > and call ->lookup() on it without dropping dir->i_sem. If
> >> you still need > > > to reference the old dentry, then put it on a
> >> private list somewhere.  > > > That would also allow you to return the
> >> old dentry as the result of the > > > ->lookup() operation if that is
> >> desirable.
> >> > > 
> >> > > Problem with that, as I understand it and Ian Kent knows better than
> >> I, is > > that the autofs lookup code creates the dentry and fills it in
> >> partially > > and > > marks it as waiting for mounting and wakes up the
> >> automount demon.  The > > demon > > completes the mount and finishes
> >> filling in the dentry.  So we cannot have > > some other lookup coming
> >> in and removing the dentry on us.  At least that > > is > > what I
> >> understand from Ian's answer when I proposed the same sort of thing > >
> >> to > > him.  Even if they end up doing something like that in a future
> >> version > > of > > the automounter, I would still like a simple patch
> >> that can be applied to > > existing systems as an interim fix.
> >> > 
> >> > 
> >> > Lets see if I can keep this explaination simple.
> >> > 
> >> > The user space process using the autofs filesystem (autodir or
> >> automount) > needs to be able to call mkdir at mount time as a result of
> >> a callback from > revalidate. Sometimes this comes indirectly from
> >> lookup (if the directory > does not already exist).
> >> > 
> >> > lookup_one_len requires the i_sem to be held so two instances of a >
> >> filesystem calling it lead to a deadlock when mkdir is called from
> >> userspace > (the third process). In the case we are discussing this
> >> happens because the > first process calls lookup which releases the
> >> i_sem and calls revalidate > itself. The second calls revalidate which
> >> doesn't release the i_sem and is > places on a wait queue for mount
> >> completion. Consequently the mkdir blocks.
> >> > 
> >> > So the requirement is that autofs release the i_sem during the
> >> callback, not > obtain it.
> >> > 
> >> > Will believes that it is not safe for autofs to release i_sem for the
> >> > callback to user space because it is possible that path that aquired
> >> it may > not be the path that has called revalidate and I can see his
> >> point.
> >> > 
> >> > Never the less I'm still not convinced that this is possible given the
> >> > restrictions of autofs.
> >> > 
> >> > Let me try and describe this, hopefully more clearly than I've done so
> >> far.
> >> > 
> >> > The only operations defined for autofs are:
> >> > 
> >> > mkdir, rmdir, symlink and unlink > and the only processes that can do
> >> these operations must be in the same > process group that mounted the
> >> filesystem. EACCESS is returned for all other > processes attempting
> >> these operations.
> >> > 
> >> > The other functionality is read-only (and perhaps triggers a mount)
> >> being > lookup, revalidate and readdir.
> >> > 
> >> > So the question is, can anyone provide an example of a path that, upon
> >> > calling autofs revalidate or lookup with the i_sem held, not be the
> >> path > that aquired it?
> 
> raven> So still no counter example!
> 
> >> Any other process calling lookup_one_len on a file in /net.
> 
> raven> I'm afraid this is not an example it's an assertion.  "Any other
> raven> process" is a little broad I think.  You'll need to be more
> raven> specific.
> 
> raven> Consider the example reported by yourself and Ram.
> 
> raven> In that example we have processes P1, P2 and lets call the user
> raven> space callback P1(mount). Also assume there is a mechamism to check
> raven> the semaphore, release it if held and later re-take it if previously
> raven> held, like the patch I offered before.
> 
> raven> Correct me if I'm wrong but, with the assumption above, you report
> raven> goes like:
> 
> raven> P1 - calls lookup_one_len, takes i_sem and eventually calls
> raven> autofs4_lookup and indirectly autofs4_revalidate.
> 
> raven> P2 - comes along and waits on i_sem.
> 
> raven> P1 - autofs4_revalidate releases i_sem and posts a user space
> raven> callback.
> 
> raven> P2 - aquires i_sem and eventually calls autofs4_revalidate, releases
> raven> i_sem and is posted to the wait queue for mount completion.
> 
> raven> P1(mount) - calls mkdir, aquires i_sem, and calls autofs4_dir_mkdir,
> raven> i_sem is then released.
> 
> raven> Mount completion is signaled back to autofs4 and the waiters are
> raven> released.
> 
> raven> P1, P2, in any order each (one after the other due to the semaphore)
> raven> re-take i_sem and each complete their lookup_one_len calls.
> 
> raven> On both calls to autofs4_revalidate the calling process is itself
> raven> the holder of i_sem.
> 
> raven> Further, any other process that does a path walk during this time
> raven> has two possible paths.
> 
> raven> First case, the dentry exists, the process is placed on the wait
> raven> queue along with P1 and P2 awaiting mount completion without taking
> raven> i_sem.
> 
> raven> Second case, the dentry does not yet exist, this process either
> raven> aquires the i_sem in do_lookup and follows a similar path to P1 and
> raven> waits on the queue for mount completion or it waits on the i_sem
> raven> while P1 does the lookup and triggers the mount request, it the
> raven> aquires i_sem find the dentry exists, releases i_sem and calls
> raven> autofs4_revalidate without i_sem held and is sent to the wait queue
> raven> to wait for mount completion.
> 
> raven> Again in both these cases a process that enters autofs4_revalidate
> raven> when the i_sem is held is the process that aquired it.
> 
> Now consider that revalidate is called without semaphore held.  Also
> consider that another process holds this semaphore for any valid reason
> (could be mkdir or whatever[1]).  Now, you're code says, hey, the semaphore is
> held, let's drop it!  Bad juju follows.
> 
> You can't do this.

Good call. A single example is sufficient to refute my assertion.

But to focus the discussion, the goal is to identify that the calling 
process is the one that holds the semaphore to justify releasing it. For 
lookup it's easy as it's always called with it held. lookup_hash will 
always call revalidate or lookup with the nameidata struct NULL and 
so can be used to identify the semaphore ownership similar to the devfs 
code Will mentioned. I don't think the other tests there are relevant to 
autofs. The patch will require some rework but still the same general 
idea.

Can you think of an example code path where __lookup_hash is called with a 
non null nameidata struct which leads to autofs4 revalidate or lookup.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 13:49               ` Ian Kent
  2005-12-02 14:07                 ` Jeff Moyer
@ 2005-12-02 15:34                 ` Will Taber
  2005-12-02 17:29                   ` Ian Kent
  2005-12-02 16:04                 ` [autofs] " Jeff Moyer
  2 siblings, 1 reply; 95+ messages in thread
From: Will Taber @ 2005-12-02 15:34 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, Jeff Moyer, Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Thu, 1 Dec 2005, William H. Taber wrote:
> 
> 
>>Ian Kent wrote:
>>
>>>On Wed, 30 Nov 2005, William H. Taber wrote:
>>>
>>>
>>>
>>>>Trond Myklebust wrote:
>>>>
>>>>
>>>>>On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Not only is there this case, but the original premise is wrong as
>>>>>>well.
>>>>>>There is a second case in which a d_revalidate function is called with
>>>>>>the
>>>>>>parent i_sem and that is when it is called from inside of
>>>>>>lookup_one_len.
>>>>>>What makes this tricky is that lookup_one_len is called from
>>>>>>nfs_sillyrename from inside of nfs_rename which is called, naturally
>>>>>>enough by sys_rename.  The rename code is very careful about the order
>>>>>>in
>>>>>>which it obtains the parent semaphores because it needs to get two of
>>>>>>them.  It must always obtain the locks in the same order so that does
>>>>>>not
>>>>>>get into a deadly embrace.  If we start arbitrarily releasing a parent
>>>>>>semaphore in cached_lookup and taking it again after the revalidate,
>>>>>>we
>>>>>>risk breaking the lock ordering and creating a deadly embrace.
>>>>>>
>>>>>>When I started writing this I thought that it would be safe for the
>>>>>>autofs
>>>>>>revalidate code to release the parent semaphore because they do not
>>>>>>have a
>>>>>>rename callback.  But I looked again at the rename code and it calls
>>>>>>lookup_hash on the final source and destination files after locking
>>>>>>the
>>>>>>parents so the potential for a deadly embrace still exists unless
>>>>>>there is
>>>>>>some other assurance that these final lookups will never pend waiting
>>>>>>on
>>>>>>the automounter in either their revalidate or lookup routines.
>>>>>>(Actually
>>>>>>the requirement is that they never give up the parent i_sem lock, but
>>>>>>the
>>>>>>lookup code has to give up the lock so that the autofs demon can run
>>>>>>and
>>>>>>perform the mount so it amounts to the same thing.)
>>>>>>
>>>>>>The same issue exists for devfs which also releases the parent i_sem
>>>>>>lock
>>>>>>so that it can wait inside its revalidation routine.
>>>>>
>>>>>
>>>>>So exactly why does autofs4 want to hold the dir->i_sem in d_revalidate
>>>>>in the first place? Can't we move any code that requires dir->i_sem to
>>>>>be held into a ->lookup() method?
>>>>
>>>>It's not that d_revalidate wants or doesn't want to hold the lock.  The
>>>>caller
>>>>of lookup_one_len is required to get the lock and this function calls
>>>>lookup_hash which calls cached_lookup which calls d_revalidate.
>>>>
>>>>
>>>>
>>>>>Trivially, if you have a d_revalidate that does something like
>>>>>
>>>>>int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
>>>>>{
>>>>> d_drop(dentry);
>>>>> return 0;
>>>>>}
>>>>>
>>>>>then the VFS will currently allocate a new dentry with the same name,
>>>>>and call ->lookup() on it without dropping dir->i_sem. If you still need
>>>>>to reference the old dentry, then put it on a private list somewhere.
>>>>>That would also allow you to return the old dentry as the result of the
>>>>>->lookup() operation if that is desirable.
>>>>
>>>>Problem with that, as I understand it and Ian Kent knows better than I, is
>>>>that the autofs lookup code creates the dentry and fills it in partially
>>>>and
>>>>marks it as waiting for mounting and wakes up the automount demon.  The
>>>>demon
>>>>completes the mount and finishes filling in the dentry.  So we cannot have
>>>>some other lookup coming in and removing the dentry on us.  At least that
>>>>is
>>>>what I understand from Ian's answer when I proposed the same sort of thing
>>>>to
>>>>him.   Even if  they end up doing something like that in a future version
>>>>of
>>>>the automounter, I would still like a simple patch that can be applied to
>>>>existing systems as an interim fix.
>>>
>>>
>>>Lets see if I can keep this explaination simple.
>>>
>>>The user space process using the autofs filesystem (autodir or automount)
>>>needs to be able to call mkdir at mount time as a result of a callback from
>>>revalidate. Sometimes this comes indirectly from lookup (if the directory
>>>does not already exist).
>>>
>>>lookup_one_len requires the i_sem to be held so two instances of a
>>>filesystem calling it lead to a deadlock when mkdir is called from userspace
>>>(the third process). In the case we are discussing this happens because the
>>>first process calls lookup which releases the i_sem and calls revalidate
>>>itself. The second calls revalidate which doesn't release the i_sem and is
>>>places on a wait queue for mount completion. Consequently the mkdir blocks.
>>>
>>>So the requirement is that autofs release the i_sem during the callback, not
>>>obtain it.
>>>
>>>Will believes that it is not safe for autofs to release i_sem for the
>>>callback to user space because it is possible that path that aquired it may
>>>not be the path that has called revalidate and I can see his point.
>>>
>>>Never the less I'm still not convinced that this is possible given the
>>>restrictions of autofs.
>>>
>>>Let me try and describe this, hopefully more clearly than I've done so far.
>>>
>>>The only operations defined for autofs are:
>>>
>>>mkdir, rmdir, symlink and unlink 
>>>and the only processes that can do these operations must be in the same
>>>process group that mounted the filesystem. EACCESS is returned for all other
>>>processes attempting these operations.
>>>
>>>The other functionality is read-only (and perhaps triggers a mount) being
>>>lookup, revalidate and readdir.
>>>
>>>So the question is, can anyone provide an example of a path that, upon
>>>calling autofs revalidate or lookup with the i_sem held, not be the path
>>>that aquired it?
> 
> 
> So still no counter example!
> 
> 
>>Any other process calling lookup_one_len on a file in /net.
> 
> 
> I'm afraid this is not an example it's an assertion.
> "Any other process" is a little broad I think.
> You'll need to be more specific.
> 
> Consider the example reported by yourself and Ram.
> 
> In that example we have processes P1, P2 and lets call the user space 
> callback P1(mount). Also assume there is a mechamism to check the 
> semaphore, release it if held and later re-take it if previously held, 
> like the patch I offered before.
> 
> Correct me if I'm wrong but, with the assumption above, you report 
> goes like:
> 
> P1 - calls lookup_one_len, takes i_sem and eventually calls 
> autofs4_lookup and indirectly autofs4_revalidate.
> 
> P2 - comes along and waits on i_sem.
And what happens if P3 comes in with a normal lookup without i_sem held 
and calls autofs4_revalidate from do_lookup and wakes up P2? Think both 
about what will happen later in your code path and also what happens 
when P2 tries to release the lock that was no longer held.

> 
> P1 - autofs4_revalidate releases i_sem and posts a user space callback.
> 
> P2 - aquires i_sem and eventually calls autofs4_revalidate, releases 
> i_sem and is posted to the wait queue for mount completion.
> 
> P1(mount) - calls mkdir, aquires i_sem, and calls autofs4_dir_mkdir,  
> i_sem is then released.
> 
> Mount completion is signaled back to autofs4 and the waiters are released.
> 
> P1, P2, in any order each (one after the other due to the semaphore) 
> re-take i_sem and each complete their lookup_one_len calls.
> 
> On both calls to autofs4_revalidate the calling process is itself the 
> holder of i_sem.
> 
> Further, any other process that does a path walk during this time has two 
> possible paths.
> 
> First case, the dentry exists, the process is placed on the wait queue 
> along with P1 and P2 awaiting mount completion without taking i_sem.
> 
> Second case, the dentry does not yet exist, this process either aquires 
> the i_sem in do_lookup and follows a similar path to P1 and waits on the 
> queue for mount completion or it waits on the i_sem while P1 does 
> the lookup and triggers the mount request, it the aquires i_sem find the 
> dentry exists, releases i_sem and calls autofs4_revalidate without i_sem 
> held and is sent to the wait queue to wait for mount completion.
> 
> Again in both these cases a process that enters autofs4_revalidate when 
> the i_sem is held is the process that aquired it.

But a regular lookup can enter autofs4_revalidate at anytime without 
holding i_sem.

The main lookup path does not hold i_sem and Trond was pretty clear 
about why it cannot.  That is why devfs has the code which tries to 
guess whether it is the person holding the lock before it releases it. 
If you put similar code into autofs4_revalidate before you release i_sem 
it would probably work.  This of course makes your code sensitive to 
changes in the lookup code because the devfs code makes assumptions 
about what flags are set on different lookups.  The best fix would be to 
move all of the waiting into autofs4_lookup and not hash the dentry 
until the mount was ready to run.  That is necessarily a large piece of 
coding and would require a lot of testing.  That is why I am suggesting 
for now a patch that determines if the lock was held by the caller or 
not and releasing i_sem if it was, before waiting in autofs4_revalidate. 
  And of course remembering whether or not it needs to retake the lock 
after the wait completes.

Will



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 13:49               ` Ian Kent
  2005-12-02 14:07                 ` Jeff Moyer
  2005-12-02 15:34                 ` Will Taber
@ 2005-12-02 16:04                 ` Jeff Moyer
  2005-12-02 17:36                   ` Ian Kent
  2 siblings, 1 reply; 95+ messages in thread
From: Jeff Moyer @ 2005-12-02 16:04 UTC (permalink / raw)
  To: Ian Kent
  Cc: William H. Taber, Trond Myklebust, Ram Pai, autofs mailing list,
	linux-fsdevel

==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ian Kent <raven@themaw.net> adds:

raven> On Thu, 1 Dec 2005, William H. Taber wrote:
>> > So the question is, can anyone provide an example of a path that, upon
>> > calling autofs revalidate or lookup with the i_sem held, not be the
>> path > that aquired it?

raven> So still no counter example!

>> Any other process calling lookup_one_len on a file in /net.

raven> I'm afraid this is not an example it's an assertion.  "Any other
raven> process" is a little broad I think.  You'll need to be more
raven> specific.

Well, I think we've determined that the reported problem doesn't happen
with any in-tree callers.  The question, then, is do you want to fix the
locking problem?  Two approaches were presented in this thread.  I don't
really like the idea of the hack used by devfs, since it relies on implicit
semantics.  I haven't given much thought to the second approach, though
(are we sure it can be made to work?).  It may require a good deal of
effort, but if it makes things work properly, it's worth considering.  I'm
just not sure where it sits in the list of priorities, as I know you've got
a lot on your plate, Ian.

-Jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 15:21                   ` Ian Kent
@ 2005-12-02 16:35                     ` Will Taber
  2005-12-02 17:11                       ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Will Taber @ 2005-12-02 16:35 UTC (permalink / raw)
  To: Ian Kent
  Cc: Jeff Moyer, Trond Myklebust, Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:

> 
> Good call. A single example is sufficient to refute my assertion.
> 
> But to focus the discussion, the goal is to identify that the calling 
> process is the one that holds the semaphore to justify releasing it. For 
> lookup it's easy as it's always called with it held. lookup_hash will 
> always call revalidate or lookup with the nameidata struct NULL and 
> so can be used to identify the semaphore ownership similar to the devfs 
> code Will mentioned. I don't think the other tests there are relevant to 
> autofs. The patch will require some rework but still the same general 
> idea.
> 
> Can you think of an example code path where __lookup_hash is called with a 
> non null nameidata struct which leads to autofs4 revalidate or lookup.
> 

It looks as though __lookup_hash is called from open_namei with the lock 
held and a nameidata structure.  It also looks as though LOOKUP_OPEN is 
set in the nameidata->flags so you should be able to identify this case. 
  No, that's not because lookup_open does a path_walk to get the parent 
without the lock.  So it looks as though if you have a nameidata 
structure and the flags have LOOKUP_OPEN and LOOKUP_CREATE set but not 
LOOKUP_PARENT, then you are holding the i_sem lock.  The devfs code 
ignores LOOKUP_OPEN.  I have to go right now and I am not sure if the 
checks are equivalent so I will leave that as an exercise for the reader.

Regards,
Will


Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 16:35                     ` [autofs] " Will Taber
@ 2005-12-02 17:11                       ` Ian Kent
  0 siblings, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-02 17:11 UTC (permalink / raw)
  To: Will Taber
  Cc: Jeff Moyer, Trond Myklebust, Ram Pai, autofs mailing list, linux-fsdevel

On Fri, 2 Dec 2005, Will Taber wrote:

> Ian Kent wrote:
> 
> > 
> > Good call. A single example is sufficient to refute my assertion.
> > 
> > But to focus the discussion, the goal is to identify that the calling
> > process is the one that holds the semaphore to justify releasing it. For
> > lookup it's easy as it's always called with it held. lookup_hash will always
> > call revalidate or lookup with the nameidata struct NULL and so can be used
> > to identify the semaphore ownership similar to the devfs code Will
> > mentioned. I don't think the other tests there are relevant to autofs. The
> > patch will require some rework but still the same general idea.
> > 
> > Can you think of an example code path where __lookup_hash is called with a
> > non null nameidata struct which leads to autofs4 revalidate or lookup.
> > 
> 
> It looks as though __lookup_hash is called from open_namei with the lock held
> and a nameidata structure.  It also looks as though LOOKUP_OPEN is set in the
> nameidata->flags so you should be able to identify this case.  No, that's not
> because lookup_open does a path_walk to get the parent without the lock.  So
> it looks as though if you have a nameidata structure and the flags have
> LOOKUP_OPEN and LOOKUP_CREATE set but not LOOKUP_PARENT, then you are holding
> the i_sem lock.  The devfs code ignores LOOKUP_OPEN.  I have to go right now
> and I am not sure if the checks are equivalent so I will leave that as an
> exercise for the reader.

The create case is not used by autofs. Shouldn't need to be considered.

Ian



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 15:34                 ` Will Taber
@ 2005-12-02 17:29                   ` Ian Kent
  2005-12-02 18:12                     ` Trond Myklebust
  2005-12-02 19:04                     ` [autofs] " Will Taber
  0 siblings, 2 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-02 17:29 UTC (permalink / raw)
  To: Will Taber
  Cc: Trond Myklebust, Jeff Moyer, Ram Pai, autofs mailing list, linux-fsdevel

On Fri, 2 Dec 2005, Will Taber wrote:

> Ian Kent wrote:
> > On Thu, 1 Dec 2005, William H. Taber wrote:
> > 
> > 
> > > Ian Kent wrote:
> > > 
> > > > On Wed, 30 Nov 2005, William H. Taber wrote:
> > > > 
> > > > 
> > > > 
> > > > > Trond Myklebust wrote:
> > > > > 
> > > > > 
> > > > > > On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Not only is there this case, but the original premise is wrong as
> > > > > > > well.
> > > > > > > There is a second case in which a d_revalidate function is called
> > > > > > > with
> > > > > > > the
> > > > > > > parent i_sem and that is when it is called from inside of
> > > > > > > lookup_one_len.
> > > > > > > What makes this tricky is that lookup_one_len is called from
> > > > > > > nfs_sillyrename from inside of nfs_rename which is called,
> > > > > > > naturally
> > > > > > > enough by sys_rename.  The rename code is very careful about the
> > > > > > > order
> > > > > > > in
> > > > > > > which it obtains the parent semaphores because it needs to get two
> > > > > > > of
> > > > > > > them.  It must always obtain the locks in the same order so that
> > > > > > > does
> > > > > > > not
> > > > > > > get into a deadly embrace.  If we start arbitrarily releasing a
> > > > > > > parent
> > > > > > > semaphore in cached_lookup and taking it again after the
> > > > > > > revalidate,
> > > > > > > we
> > > > > > > risk breaking the lock ordering and creating a deadly embrace.
> > > > > > > 
> > > > > > > When I started writing this I thought that it would be safe for
> > > > > > > the
> > > > > > > autofs
> > > > > > > revalidate code to release the parent semaphore because they do
> > > > > > > not
> > > > > > > have a
> > > > > > > rename callback.  But I looked again at the rename code and it
> > > > > > > calls
> > > > > > > lookup_hash on the final source and destination files after
> > > > > > > locking
> > > > > > > the
> > > > > > > parents so the potential for a deadly embrace still exists unless
> > > > > > > there is
> > > > > > > some other assurance that these final lookups will never pend
> > > > > > > waiting
> > > > > > > on
> > > > > > > the automounter in either their revalidate or lookup routines.
> > > > > > > (Actually
> > > > > > > the requirement is that they never give up the parent i_sem lock,
> > > > > > > but
> > > > > > > the
> > > > > > > lookup code has to give up the lock so that the autofs demon can
> > > > > > > run
> > > > > > > and
> > > > > > > perform the mount so it amounts to the same thing.)
> > > > > > > 
> > > > > > > The same issue exists for devfs which also releases the parent
> > > > > > > i_sem
> > > > > > > lock
> > > > > > > so that it can wait inside its revalidation routine.
> > > > > > 
> > > > > > 
> > > > > > So exactly why does autofs4 want to hold the dir->i_sem in
> > > > > > d_revalidate
> > > > > > in the first place? Can't we move any code that requires dir->i_sem
> > > > > > to
> > > > > > be held into a ->lookup() method?
> > > > > 
> > > > > It's not that d_revalidate wants or doesn't want to hold the lock.
> > > > > The
> > > > > caller
> > > > > of lookup_one_len is required to get the lock and this function calls
> > > > > lookup_hash which calls cached_lookup which calls d_revalidate.
> > > > > 
> > > > > 
> > > > > 
> > > > > > Trivially, if you have a d_revalidate that does something like
> > > > > > 
> > > > > > int autofs_revalidate(struct dentry *dentry, struct nameidata *nd)
> > > > > > {
> > > > > > d_drop(dentry);
> > > > > > return 0;
> > > > > > }
> > > > > > 
> > > > > > then the VFS will currently allocate a new dentry with the same
> > > > > > name,
> > > > > > and call ->lookup() on it without dropping dir->i_sem. If you still
> > > > > > need
> > > > > > to reference the old dentry, then put it on a private list
> > > > > > somewhere.
> > > > > > That would also allow you to return the old dentry as the result of
> > > > > > the
> > > > > > ->lookup() operation if that is desirable.
> > > > > 
> > > > > Problem with that, as I understand it and Ian Kent knows better than
> > > > > I, is
> > > > > that the autofs lookup code creates the dentry and fills it in
> > > > > partially
> > > > > and
> > > > > marks it as waiting for mounting and wakes up the automount demon.
> > > > > The
> > > > > demon
> > > > > completes the mount and finishes filling in the dentry.  So we cannot
> > > > > have
> > > > > some other lookup coming in and removing the dentry on us.  At least
> > > > > that
> > > > > is
> > > > > what I understand from Ian's answer when I proposed the same sort of
> > > > > thing
> > > > > to
> > > > > him.   Even if  they end up doing something like that in a future
> > > > > version
> > > > > of
> > > > > the automounter, I would still like a simple patch that can be applied
> > > > > to
> > > > > existing systems as an interim fix.
> > > > 
> > > > 
> > > > Lets see if I can keep this explaination simple.
> > > > 
> > > > The user space process using the autofs filesystem (autodir or
> > > > automount)
> > > > needs to be able to call mkdir at mount time as a result of a callback
> > > > from
> > > > revalidate. Sometimes this comes indirectly from lookup (if the
> > > > directory
> > > > does not already exist).
> > > > 
> > > > lookup_one_len requires the i_sem to be held so two instances of a
> > > > filesystem calling it lead to a deadlock when mkdir is called from
> > > > userspace
> > > > (the third process). In the case we are discussing this happens because
> > > > the
> > > > first process calls lookup which releases the i_sem and calls revalidate
> > > > itself. The second calls revalidate which doesn't release the i_sem and
> > > > is
> > > > places on a wait queue for mount completion. Consequently the mkdir
> > > > blocks.
> > > > 
> > > > So the requirement is that autofs release the i_sem during the callback,
> > > > not
> > > > obtain it.
> > > > 
> > > > Will believes that it is not safe for autofs to release i_sem for the
> > > > callback to user space because it is possible that path that aquired it
> > > > may
> > > > not be the path that has called revalidate and I can see his point.
> > > > 
> > > > Never the less I'm still not convinced that this is possible given the
> > > > restrictions of autofs.
> > > > 
> > > > Let me try and describe this, hopefully more clearly than I've done so
> > > > far.
> > > > 
> > > > The only operations defined for autofs are:
> > > > 
> > > > mkdir, rmdir, symlink and unlink and the only processes that can do
> > > > these operations must be in the same
> > > > process group that mounted the filesystem. EACCESS is returned for all
> > > > other
> > > > processes attempting these operations.
> > > > 
> > > > The other functionality is read-only (and perhaps triggers a mount)
> > > > being
> > > > lookup, revalidate and readdir.
> > > > 
> > > > So the question is, can anyone provide an example of a path that, upon
> > > > calling autofs revalidate or lookup with the i_sem held, not be the path
> > > > that aquired it?
> > 
> > 
> > So still no counter example!
> > 
> > 
> > > Any other process calling lookup_one_len on a file in /net.
> > 
> > 
> > I'm afraid this is not an example it's an assertion.
> > "Any other process" is a little broad I think.
> > You'll need to be more specific.
> > 
> > Consider the example reported by yourself and Ram.
> > 
> > In that example we have processes P1, P2 and lets call the user space
> > callback P1(mount). Also assume there is a mechamism to check the semaphore,
> > release it if held and later re-take it if previously held, like the patch I
> > offered before.
> > 
> > Correct me if I'm wrong but, with the assumption above, you report goes
> > like:
> > 
> > P1 - calls lookup_one_len, takes i_sem and eventually calls autofs4_lookup
> > and indirectly autofs4_revalidate.
> > 
> > P2 - comes along and waits on i_sem.
> And what happens if P3 comes in with a normal lookup without i_sem held and
> calls autofs4_revalidate from do_lookup and wakes up P2? Think both about what
> will happen later in your code path and also what happens when P2 tries to
> release the lock that was no longer held.

P3 goes to the wait queue it can't wake up the waiters only mount 
completion can do that.

> 
> > 
> > P1 - autofs4_revalidate releases i_sem and posts a user space callback.
> > 
> > P2 - aquires i_sem and eventually calls autofs4_revalidate, releases i_sem
> > and is posted to the wait queue for mount completion.
> > 
> > P1(mount) - calls mkdir, aquires i_sem, and calls autofs4_dir_mkdir,  i_sem
> > is then released.
> > 
> > Mount completion is signaled back to autofs4 and the waiters are released.
> > 
> > P1, P2, in any order each (one after the other due to the semaphore) re-take
> > i_sem and each complete their lookup_one_len calls.
> > 
> > On both calls to autofs4_revalidate the calling process is itself the holder
> > of i_sem.
> > 
> > Further, any other process that does a path walk during this time has two
> > possible paths.
> > 
> > First case, the dentry exists, the process is placed on the wait queue along
> > with P1 and P2 awaiting mount completion without taking i_sem.
> > 
> > Second case, the dentry does not yet exist, this process either aquires the
> > i_sem in do_lookup and follows a similar path to P1 and waits on the queue
> > for mount completion or it waits on the i_sem while P1 does the lookup and
> > triggers the mount request, it the aquires i_sem find the dentry exists,
> > releases i_sem and calls autofs4_revalidate without i_sem held and is sent
> > to the wait queue to wait for mount completion.
> > 
> > Again in both these cases a process that enters autofs4_revalidate when the
> > i_sem is held is the process that aquired it.
> 
> But a regular lookup can enter autofs4_revalidate at anytime without holding
> i_sem.

And is a noop as far the semaphore is concerned. Neither taken or 
released.

> 
> The main lookup path does not hold i_sem and Trond was pretty clear about why
> it cannot.  That is why devfs has the code which tries to guess whether it is
> the person holding the lock before it releases it. If you put similar code
> into autofs4_revalidate before you release i_sem it would probably work.  This
> of course makes your code sensitive to changes in the lookup code because the
> devfs code makes assumptions about what flags are set on different lookups.
> The best fix would be to move all of the waiting into autofs4_lookup and not
> hash the dentry until the mount was ready to run.  That is necessarily a large
> piece of coding and would require a lot of testing.  That is why I am
> suggesting for now a patch that determines if the lock was held by the caller
> or not and releasing i_sem if it was, before waiting in autofs4_revalidate.
> And of course remembering whether or not it needs to retake the lock after the
> wait completes.

It's sufficient to recognize the nameidata struct is NULL on a call 
from lookup_hash nothing more that I'm aware of is needed. If that changes 
then of course autofs will need to be changed. autofs also makes 
assumptions about what flags are set for different reasons.

Your assuming that mount point directories don't exist before they are 
mounted upon which is not the case.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 16:04                 ` [autofs] " Jeff Moyer
@ 2005-12-02 17:36                   ` Ian Kent
  2005-12-02 18:33                     ` [autofs] " Will Taber
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-12-02 17:36 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: autofs mailing list, linux-fsdevel, William H. Taber, Trond Myklebust

On Fri, 2 Dec 2005, Jeff Moyer wrote:

> ==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ian Kent <raven@themaw.net> adds:
> 
> raven> On Thu, 1 Dec 2005, William H. Taber wrote:
> >> > So the question is, can anyone provide an example of a path that, upon
> >> > calling autofs revalidate or lookup with the i_sem held, not be the
> >> path > that aquired it?
> 
> raven> So still no counter example!
> 
> >> Any other process calling lookup_one_len on a file in /net.
> 
> raven> I'm afraid this is not an example it's an assertion.  "Any other
> raven> process" is a little broad I think.  You'll need to be more
> raven> specific.
> 
> Well, I think we've determined that the reported problem doesn't happen
> with any in-tree callers.  The question, then, is do you want to fix the
> locking problem?  Two approaches were presented in this thread.  I don't
> really like the idea of the hack used by devfs, since it relies on implicit
> semantics.  I haven't given much thought to the second approach, though
> (are we sure it can be made to work?).  It may require a good deal of
> effort, but if it makes things work properly, it's worth considering.  I'm
> just not sure where it sits in the list of priorities, as I know you've got
> a lot on your plate, Ian.

It appears to me that the unhashed directory approach proposed by Will 
does not account for directories that exist but don't have current mounts.

I will re-read the posts, I expect I missed something, and give it more 
thought.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 17:29                   ` Ian Kent
@ 2005-12-02 18:12                     ` Trond Myklebust
  2005-12-04 12:56                       ` Christoph Hellwig
  2005-12-02 19:04                     ` [autofs] " Will Taber
  1 sibling, 1 reply; 95+ messages in thread
From: Trond Myklebust @ 2005-12-02 18:12 UTC (permalink / raw)
  To: Ian Kent
  Cc: Will Taber, Jeff Moyer, Ram Pai, autofs mailing list, linux-fsdevel

On Sat, 2005-12-03 at 01:29 +0800, Ian Kent wrote:

> It's sufficient to recognize the nameidata struct is NULL on a call 
> from lookup_hash nothing more that I'm aware of is needed. If that changes 
> then of course autofs will need to be changed. autofs also makes 
> assumptions about what flags are set for different reasons.

Consider those cases where the VFS calls ->lookup()/->d_revalidate()
with the nameidata argument set to NULL to be bugs. They are pretty much
all slated to be fixed soon in order to enable features like read-only
bind mounts etc.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 17:36                   ` Ian Kent
@ 2005-12-02 18:33                     ` Will Taber
  2005-12-04  9:52                       ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Will Taber @ 2005-12-02 18:33 UTC (permalink / raw)
  To: Ian Kent
  Cc: Jeff Moyer, Trond Myklebust, Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Fri, 2 Dec 2005, Jeff Moyer wrote:
> 
> 
>>==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Ian Kent <raven@themaw.net> adds:
>>
>>raven> On Thu, 1 Dec 2005, William H. Taber wrote:
>>
>>>>>So the question is, can anyone provide an example of a path that, upon
>>>>>calling autofs revalidate or lookup with the i_sem held, not be the
>>>>
>>>>path > that aquired it?
>>
>>raven> So still no counter example!
>>
>>
>>>>Any other process calling lookup_one_len on a file in /net.
>>
>>raven> I'm afraid this is not an example it's an assertion.  "Any other
>>raven> process" is a little broad I think.  You'll need to be more
>>raven> specific.
>>
>>Well, I think we've determined that the reported problem doesn't happen
>>with any in-tree callers.  The question, then, is do you want to fix the
>>locking problem?  Two approaches were presented in this thread.  I don't
>>really like the idea of the hack used by devfs, since it relies on implicit
>>semantics.  I haven't given much thought to the second approach, though
>>(are we sure it can be made to work?).  It may require a good deal of
>>effort, but if it makes things work properly, it's worth considering.  I'm
>>just not sure where it sits in the list of priorities, as I know you've got
>>a lot on your plate, Ian.
> 
> 
> It appears to me that the unhashed directory approach proposed by Will 
> does not account for directories that exist but don't have current mounts.
> 
> I will re-read the posts, I expect I missed something, and give it more 
> thought.
> 
It doesn't consider that case.  You had mentioned it but I had forgotten.

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 17:29                   ` Ian Kent
  2005-12-02 18:12                     ` Trond Myklebust
@ 2005-12-02 19:04                     ` Will Taber
  2005-12-04  9:39                       ` Ian Kent
  1 sibling, 1 reply; 95+ messages in thread
From: Will Taber @ 2005-12-02 19:04 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, Jeff Moyer, Ram Pai, autofs mailing list, linux-fsdevel

Ian Kent wrote:
> On Fri, 2 Dec 2005, Will Taber wrote:
> 
> 
>>Ian Kent wrote:
>>
>>>On Thu, 1 Dec 2005, William H. Taber wrote:
>>>
>>>
>>>
>>>>Ian Kent wrote:
>>>>
>>>>
>>>>>On Wed, 30 Nov 2005, William H. Taber wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Trond Myklebust wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
>>>>>>>
>>>>>
>>>>>Lets see if I can keep this explaination simple.
>>>>>
>>>>>The user space process using the autofs filesystem (autodir or
>>>>>automount)
>>>>>needs to be able to call mkdir at mount time as a result of a callback
>>>>>from
>>>>>revalidate. Sometimes this comes indirectly from lookup (if the
>>>>>directory
>>>>>does not already exist).
>>>>>
>>>>>lookup_one_len requires the i_sem to be held so two instances of a
>>>>>filesystem calling it lead to a deadlock when mkdir is called from
>>>>>userspace
>>>>>(the third process). In the case we are discussing this happens because
>>>>>the
>>>>>first process calls lookup which releases the i_sem and calls revalidate
>>>>>itself. The second calls revalidate which doesn't release the i_sem and
>>>>>is
>>>>>places on a wait queue for mount completion. Consequently the mkdir
>>>>>blocks.
>>>>>
>>>>>So the requirement is that autofs release the i_sem during the callback,
>>>>>not
>>>>>obtain it.
>>>>>
>>>>>Will believes that it is not safe for autofs to release i_sem for the
>>>>>callback to user space because it is possible that path that aquired it
>>>>>may
>>>>>not be the path that has called revalidate and I can see his point.
>>>>>
>>>>>Never the less I'm still not convinced that this is possible given the
>>>>>restrictions of autofs.
>>>>>
>>>>>Let me try and describe this, hopefully more clearly than I've done so
>>>>>far.
>>>>>
>>>>>The only operations defined for autofs are:
>>>>>
>>>>>mkdir, rmdir, symlink and unlink and the only processes that can do
>>>>>these operations must be in the same
>>>>>process group that mounted the filesystem. EACCESS is returned for all
>>>>>other
>>>>>processes attempting these operations.
>>>>>
>>>>>The other functionality is read-only (and perhaps triggers a mount)
>>>>>being
>>>>>lookup, revalidate and readdir.
>>>>>
>>>>>So the question is, can anyone provide an example of a path that, upon
>>>>>calling autofs revalidate or lookup with the i_sem held, not be the path
>>>>>that aquired it?
>>>
>>>
>>>So still no counter example!
>>>
>>>
>>>
>>>>Any other process calling lookup_one_len on a file in /net.
>>>
>>>
>>>I'm afraid this is not an example it's an assertion.
>>>"Any other process" is a little broad I think.
>>>You'll need to be more specific.
>>>
>>>Consider the example reported by yourself and Ram.
>>>
>>>In that example we have processes P1, P2 and lets call the user space
>>>callback P1(mount). Also assume there is a mechamism to check the semaphore,
>>>release it if held and later re-take it if previously held, like the patch I
>>>offered before.
>>>
>>>Correct me if I'm wrong but, with the assumption above, you report goes
>>>like:
>>>
>>>P1 - calls lookup_one_len, takes i_sem and eventually calls autofs4_lookup
>>>and indirectly autofs4_revalidate.
>>>
>>>P2 - comes along and waits on i_sem.
>>
>>And what happens if P3 comes in with a normal lookup without i_sem held and
>>calls autofs4_revalidate from do_lookup and wakes up P2? Think both about what
>>will happen later in your code path and also what happens when P2 tries to
>>release the lock that was no longer held.
> 
> 
> P3 goes to the wait queue it can't wake up the waiters only mount 
> completion can do that.


But I understood your proposal to be that it d_revalidate would be 
unconditionally releasing the i_sem before went on the wait queue.  My 
point here was that P3 does not hold the i_sem lock so if it releases 
i_sem here, it will be waking up P2 before P1 has finished and released 
the lock.  Even if you don't end up in trouble from accessing something 
that hasn't been initialized yet, the counts on the semaphore are messed 
up because up has been called more often than down.

> 
> 
>>>P1 - autofs4_revalidate releases i_sem and posts a user space callback.
>>>
>>>P2 - aquires i_sem and eventually calls autofs4_revalidate, releases i_sem
>>>and is posted to the wait queue for mount completion.
>>>
>>>P1(mount) - calls mkdir, aquires i_sem, and calls autofs4_dir_mkdir,  i_sem
>>>is then released.
>>>
>>>Mount completion is signaled back to autofs4 and the waiters are released.
>>>
>>>P1, P2, in any order each (one after the other due to the semaphore) re-take
>>>i_sem and each complete their lookup_one_len calls.
>>>
>>>On both calls to autofs4_revalidate the calling process is itself the holder
>>>of i_sem.
>>>
>>>Further, any other process that does a path walk during this time has two
>>>possible paths.
>>>
>>>First case, the dentry exists, the process is placed on the wait queue along
>>>with P1 and P2 awaiting mount completion without taking i_sem.
>>>
>>>Second case, the dentry does not yet exist, this process either aquires the
>>>i_sem in do_lookup and follows a similar path to P1 and waits on the queue
>>>for mount completion or it waits on the i_sem while P1 does the lookup and
>>>triggers the mount request, it the aquires i_sem find the dentry exists,
>>>releases i_sem and calls autofs4_revalidate without i_sem held and is sent
>>>to the wait queue to wait for mount completion.
>>>
>>>Again in both these cases a process that enters autofs4_revalidate when the
>>>i_sem is held is the process that aquired it.
>>
>>But a regular lookup can enter autofs4_revalidate at anytime without holding
>>i_sem.
> 
> 
> And is a noop as far the semaphore is concerned. Neither taken or 
> released.

Only if you have the code in to check which code path you came from 
before you release the lock in d_revalidate.


> 
> 
>>The main lookup path does not hold i_sem and Trond was pretty clear about why
>>it cannot.  That is why devfs has the code which tries to guess whether it is
>>the person holding the lock before it releases it. If you put similar code
>>into autofs4_revalidate before you release i_sem it would probably work.  This
>>of course makes your code sensitive to changes in the lookup code because the
>>devfs code makes assumptions about what flags are set on different lookups.
>>The best fix would be to move all of the waiting into autofs4_lookup and not
>>hash the dentry until the mount was ready to run.  That is necessarily a large
>>piece of coding and would require a lot of testing.  That is why I am
>>suggesting for now a patch that determines if the lock was held by the caller
>>or not and releasing i_sem if it was, before waiting in autofs4_revalidate.
>>And of course remembering whether or not it needs to retake the lock after the
>>wait completes.
> 
> 
> It's sufficient to recognize the nameidata struct is NULL on a call 
> from lookup_hash nothing more that I'm aware of is needed. If that changes 
> then of course autofs will need to be changed. autofs also makes 
> assumptions about what flags are set for different reasons.
> 
> Your assuming that mount point directories don't exist before they are 
> mounted upon which is not the case.

OK.  I forgot that.  But I would still want you to think about the open 
case.  The only reason I say that is because, more times than I would 
like to admit, I am preparing to cd into a directory and vi a file to 
look at it, except that I get ahead of myself and I end up trying to vi 
the directory.  The automounter may never try to open the directory but 
you also have to consider fat fingered fools like myself.

Will

> 
> Ian
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 19:04                     ` [autofs] " Will Taber
@ 2005-12-04  9:39                       ` Ian Kent
  0 siblings, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-04  9:39 UTC (permalink / raw)
  To: Will Taber; +Cc: autofs mailing list, linux-fsdevel, Trond Myklebust

On Fri, 2 Dec 2005, Will Taber wrote:

> Ian Kent wrote:
> > On Fri, 2 Dec 2005, Will Taber wrote:
> > 
> > 
> > > Ian Kent wrote:
> > > 
> > > > On Thu, 1 Dec 2005, William H. Taber wrote:
> > > > 
> > > > 
> > > > 
> > > > > Ian Kent wrote:
> > > > > 
> > > > > 
> > > > > > On Wed, 30 Nov 2005, William H. Taber wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Trond Myklebust wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > On Wed, 2005-11-30 at 15:32 -0500, William H. Taber wrote:
> > > > > > > > 
> > > > > > 
> > > > > > Lets see if I can keep this explaination simple.
> > > > > > 
> > > > > > The user space process using the autofs filesystem (autodir or
> > > > > > automount)
> > > > > > needs to be able to call mkdir at mount time as a result of a
> > > > > > callback
> > > > > > from
> > > > > > revalidate. Sometimes this comes indirectly from lookup (if the
> > > > > > directory
> > > > > > does not already exist).
> > > > > > 
> > > > > > lookup_one_len requires the i_sem to be held so two instances of a
> > > > > > filesystem calling it lead to a deadlock when mkdir is called from
> > > > > > userspace
> > > > > > (the third process). In the case we are discussing this happens
> > > > > > because
> > > > > > the
> > > > > > first process calls lookup which releases the i_sem and calls
> > > > > > revalidate
> > > > > > itself. The second calls revalidate which doesn't release the i_sem
> > > > > > and
> > > > > > is
> > > > > > places on a wait queue for mount completion. Consequently the mkdir
> > > > > > blocks.
> > > > > > 
> > > > > > So the requirement is that autofs release the i_sem during the
> > > > > > callback,
> > > > > > not
> > > > > > obtain it.
> > > > > > 
> > > > > > Will believes that it is not safe for autofs to release i_sem for
> > > > > > the
> > > > > > callback to user space because it is possible that path that aquired
> > > > > > it
> > > > > > may
> > > > > > not be the path that has called revalidate and I can see his point.
> > > > > > 
> > > > > > Never the less I'm still not convinced that this is possible given
> > > > > > the
> > > > > > restrictions of autofs.
> > > > > > 
> > > > > > Let me try and describe this, hopefully more clearly than I've done
> > > > > > so
> > > > > > far.
> > > > > > 
> > > > > > The only operations defined for autofs are:
> > > > > > 
> > > > > > mkdir, rmdir, symlink and unlink and the only processes that can do
> > > > > > these operations must be in the same
> > > > > > process group that mounted the filesystem. EACCESS is returned for
> > > > > > all
> > > > > > other
> > > > > > processes attempting these operations.
> > > > > > 
> > > > > > The other functionality is read-only (and perhaps triggers a mount)
> > > > > > being
> > > > > > lookup, revalidate and readdir.
> > > > > > 
> > > > > > So the question is, can anyone provide an example of a path that,
> > > > > > upon
> > > > > > calling autofs revalidate or lookup with the i_sem held, not be the
> > > > > > path
> > > > > > that aquired it?
> > > > 
> > > > 
> > > > So still no counter example!
> > > > 
> > > > 
> > > > 
> > > > > Any other process calling lookup_one_len on a file in /net.
> > > > 
> > > > 
> > > > I'm afraid this is not an example it's an assertion.
> > > > "Any other process" is a little broad I think.
> > > > You'll need to be more specific.
> > > > 
> > > > Consider the example reported by yourself and Ram.
> > > > 
> > > > In that example we have processes P1, P2 and lets call the user space
> > > > callback P1(mount). Also assume there is a mechamism to check the
> > > > semaphore,
> > > > release it if held and later re-take it if previously held, like the
> > > > patch I
> > > > offered before.
> > > > 
> > > > Correct me if I'm wrong but, with the assumption above, you report goes
> > > > like:
> > > > 
> > > > P1 - calls lookup_one_len, takes i_sem and eventually calls
> > > > autofs4_lookup
> > > > and indirectly autofs4_revalidate.
> > > > 
> > > > P2 - comes along and waits on i_sem.
> > > 
> > > And what happens if P3 comes in with a normal lookup without i_sem held
> > > and
> > > calls autofs4_revalidate from do_lookup and wakes up P2? Think both about
> > > what
> > > will happen later in your code path and also what happens when P2 tries to
> > > release the lock that was no longer held.
> > 
> > 
> > P3 goes to the wait queue it can't wake up the waiters only mount completion
> > can do that.
> 
> 
> But I understood your proposal to be that it d_revalidate would be
> unconditionally releasing the i_sem before went on the wait queue.  My point
> here was that P3 does not hold the i_sem lock so if it releases i_sem here, it
> will be waking up P2 before P1 has finished and released the lock.  Even if
> you don't end up in trouble from accessing something that hasn't been
> initialized yet, the counts on the semaphore are messed up because up has been
> called more often than down.

I wasn't saying unconditionally but never the less Jeffs argument points 
out the error of my ways quite well.

> 
> > 
> > 
> > > > P1 - autofs4_revalidate releases i_sem and posts a user space callback.
> > > > 
> > > > P2 - aquires i_sem and eventually calls autofs4_revalidate, releases
> > > > i_sem
> > > > and is posted to the wait queue for mount completion.
> > > > 
> > > > P1(mount) - calls mkdir, aquires i_sem, and calls autofs4_dir_mkdir,
> > > > i_sem
> > > > is then released.
> > > > 
> > > > Mount completion is signaled back to autofs4 and the waiters are
> > > > released.
> > > > 
> > > > P1, P2, in any order each (one after the other due to the semaphore)
> > > > re-take
> > > > i_sem and each complete their lookup_one_len calls.
> > > > 
> > > > On both calls to autofs4_revalidate the calling process is itself the
> > > > holder
> > > > of i_sem.
> > > > 
> > > > Further, any other process that does a path walk during this time has
> > > > two
> > > > possible paths.
> > > > 
> > > > First case, the dentry exists, the process is placed on the wait queue
> > > > along
> > > > with P1 and P2 awaiting mount completion without taking i_sem.
> > > > 
> > > > Second case, the dentry does not yet exist, this process either aquires
> > > > the
> > > > i_sem in do_lookup and follows a similar path to P1 and waits on the
> > > > queue
> > > > for mount completion or it waits on the i_sem while P1 does the lookup
> > > > and
> > > > triggers the mount request, it the aquires i_sem find the dentry exists,
> > > > releases i_sem and calls autofs4_revalidate without i_sem held and is
> > > > sent
> > > > to the wait queue to wait for mount completion.
> > > > 
> > > > Again in both these cases a process that enters autofs4_revalidate when
> > > > the
> > > > i_sem is held is the process that aquired it.
> > > 
> > > But a regular lookup can enter autofs4_revalidate at anytime without
> > > holding
> > > i_sem.
> > 
> > 
> > And is a noop as far the semaphore is concerned. Neither taken or released.
> 
> Only if you have the code in to check which code path you came from before you
> release the lock in d_revalidate.
> 
> 
> > 
> > 
> > > The main lookup path does not hold i_sem and Trond was pretty clear about
> > > why
> > > it cannot.  That is why devfs has the code which tries to guess whether it
> > > is
> > > the person holding the lock before it releases it. If you put similar code
> > > into autofs4_revalidate before you release i_sem it would probably work.
> > > This
> > > of course makes your code sensitive to changes in the lookup code because
> > > the
> > > devfs code makes assumptions about what flags are set on different
> > > lookups.
> > > The best fix would be to move all of the waiting into autofs4_lookup and
> > > not
> > > hash the dentry until the mount was ready to run.  That is necessarily a
> > > large
> > > piece of coding and would require a lot of testing.  That is why I am
> > > suggesting for now a patch that determines if the lock was held by the
> > > caller
> > > or not and releasing i_sem if it was, before waiting in
> > > autofs4_revalidate.
> > > And of course remembering whether or not it needs to retake the lock after
> > > the
> > > wait completes.
> > 
> > 
> > It's sufficient to recognize the nameidata struct is NULL on a call from
> > lookup_hash nothing more that I'm aware of is needed. If that changes then
> > of course autofs will need to be changed. autofs also makes assumptions
> > about what flags are set for different reasons.
> > 
> > Your assuming that mount point directories don't exist before they are
> > mounted upon which is not the case.
> 
> OK.  I forgot that.  But I would still want you to think about the open case.
> The only reason I say that is because, more times than I would like to admit,
> I am preparing to cd into a directory and vi a file to look at it, except that
> I get ahead of myself and I end up trying to vi the directory.  The
> automounter may never try to open the directory but you also have to consider
> fat fingered fools like myself.
> 

Of course it doesn't yet exist.

Yes.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 18:33                     ` [autofs] " Will Taber
@ 2005-12-04  9:52                       ` Ian Kent
  2005-12-04 14:54                         ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-12-04  9:52 UTC (permalink / raw)
  To: Will Taber; +Cc: autofs mailing list, linux-fsdevel, Trond Myklebust

On Fri, 2 Dec 2005, Will Taber wrote:

> > > 
> > > Well, I think we've determined that the reported problem doesn't happen
> > > with any in-tree callers.  The question, then, is do you want to fix the
> > > locking problem?  Two approaches were presented in this thread.  I don't
> > > really like the idea of the hack used by devfs, since it relies on
> > > implicit
> > > semantics.  I haven't given much thought to the second approach, though
> > > (are we sure it can be made to work?).  It may require a good deal of
> > > effort, but if it makes things work properly, it's worth considering.  I'm
> > > just not sure where it sits in the list of priorities, as I know you've
> > > got
> > > a lot on your plate, Ian.
> > 
> > 
> > It appears to me that the unhashed directory approach proposed by Will does
> > not account for directories that exist but don't have current mounts.
> > 
> > I will re-read the posts, I expect I missed something, and give it more
> > thought.
> > 
> It doesn't consider that case.  You had mentioned it but I had forgotten.
> 

OK so I decided to give Wills recommendation a bit of a run and I've come 
up with a first cut patch which of course doesn't work.

The approach is to force all callbacks to go through lookup instead of 
some through revalidate as well. The patch basically posts the dentry to a 
pending list and unhashs it, then picks it up from the list in the lookup 
and rehash it. Should be fairly simple really but I'm doing something 
obviously wrong somewhere.

I'm seeing slab corruption and I really can't see why this should be the 
case. Anyone got any ideas. The patch is against 2.6.15-rc1 but the kernel 
I'm compliling against is a RedHat patched 2.6.11 (Aurora).

--- linux-2.6.15-rc1/fs/autofs4/root.c.lookup-deadlock	2005-11-17 18:58:38.000000000 +0800
+++ linux-2.6.15-rc1/fs/autofs4/root.c	2005-12-04 11:15:51.000000000 +0800
@@ -290,15 +290,20 @@ out:
 
 static int try_to_fill_dentry(struct dentry *dentry, 
 			      struct super_block *sb,
-			      struct autofs_sb_info *sbi, int flags)
+			      struct autofs_sb_info *sbi)
 {
-	struct autofs_info *de_info = autofs4_dentry_ino(dentry);
+	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	int status = 0;
 
-	/* Block on any pending expiry here; invalidate the dentry
-           when expiration is done to trigger mount request with a new
-           dentry */
-	if (de_info && (de_info->flags & AUTOFS_INF_EXPIRING)) {
+	DPRINTK("dentry=%p %.*s ino=%p",
+		 dentry, dentry->d_name.len, dentry->d_name.name, dentry->d_inode);
+
+	/*
+	 * Block on any pending expiry here; invalidate the dentry
+	 * when expiration is done to trigger mount request with a new
+	 * dentry
+	 */
+	if (ino && (ino->flags & AUTOFS_INF_EXPIRING)) {
 		DPRINTK("waiting for expire %p name=%.*s",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 
@@ -308,70 +313,34 @@ static int try_to_fill_dentry(struct den
 		
 		/*
 		 * If the directory still exists the mount request must
-		 * continue otherwise it can't be followed at the right
-		 * time during the walk.
+		 * continue otherwise it can't be followed during the walk.
 		 */
-		status = d_invalidate(dentry);
-		if (status != -EBUSY)
-			return 0;
+		if (d_invalidate(dentry) != -EBUSY)
+			return status;
 	}
 
-	DPRINTK("dentry=%p %.*s ino=%p",
-		 dentry, dentry->d_name.len, dentry->d_name.name, dentry->d_inode);
-
 	/* Wait for a pending mount, triggering one if there isn't one already */
-	if (dentry->d_inode == NULL) {
-		DPRINTK("waiting for mount name=%.*s",
+	DPRINTK("waiting for mount name=%.*s",
 			 dentry->d_name.len, dentry->d_name.name);
 
-		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
-		 
-		DPRINTK("mount done status=%d", status);
+	status = autofs4_wait(sbi, dentry, NFY_MOUNT);
 
-		if (status && dentry->d_inode)
-			return 0; /* Try to get the kernel to invalidate this dentry */
-		
-		/* Turn this into a real negative dentry? */
-		if (status == -ENOENT) {
-			dentry->d_time = jiffies + AUTOFS_NEGATIVE_TIMEOUT;
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-			return 1;
-		} else if (status) {
-			/* Return a negative dentry, but leave it "pending" */
-			return 1;
-		}
-	/* Trigger mount for path component or follow link */
-	} else if (flags & (LOOKUP_CONTINUE | LOOKUP_DIRECTORY) ||
-			current->link_count) {
-		DPRINTK("waiting for mount name=%.*s",
-			dentry->d_name.len, dentry->d_name.name);
+	DPRINTK("mount done status=%d", status);
+
+	/*
+	 * We don't update the usages for the autofs daemon itself, this
+	 * is necessary for recursive autofs mounts
+	 */
+	if (!autofs4_oz_mode(sbi))
+		autofs4_update_usage(dentry);
 
+	if (!status || status == -ENOENT) {
 		spin_lock(&dentry->d_lock);
-		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
 		spin_unlock(&dentry->d_lock);
-		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
-
-		DPRINTK("mount done status=%d", status);
-
-		if (status) {
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-			return 0;
-		}
 	}
 
-	/* We don't update the usages for the autofs daemon itself, this
-	   is necessary for recursive autofs mounts */
-	if (!autofs4_oz_mode(sbi))
-		autofs4_update_usage(dentry);
-
-	spin_lock(&dentry->d_lock);
-	dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-	spin_unlock(&dentry->d_lock);
-	return 1;
+	return status;
 }
 
 /*
@@ -382,22 +351,25 @@ static int try_to_fill_dentry(struct den
  */
 static int autofs4_revalidate(struct dentry * dentry, struct nameidata *nd)
 {
-	struct inode * dir = dentry->d_parent->d_inode;
+	struct inode *dir = dentry->d_parent->d_inode;
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
+	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	int oz_mode = autofs4_oz_mode(sbi);
-	int flags = nd ? nd->flags : 0;
-	int status = 1;
+	int need_lookup;
 
-	/* Pending dentry */
-	if (autofs4_ispending(dentry)) {
-		if (!oz_mode)
-			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
-		return status;
-	}
+	DPRINTK("name = %.*s oz_mode = %d",
+		dentry->d_name.len, dentry->d_name.name, oz_mode);
+
+	if (oz_mode || autofs4_ispending(dentry))
+		return 1;
 
-	/* Negative dentry.. invalidate if "old" */
-	if (dentry->d_inode == NULL)
-		return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
+	need_lookup = nd ? nd->flags & (LOOKUP_CONTINUE | LOOKUP_DIRECTORY): 1;
+
+	/*
+	 * Clean up those pesky stale dentrys before checking for
+	 * an empty directory
+	 */
+	d_invalidate(dentry);
 
 	/* Check for a non-mountpoint directory with no contents */
 	spin_lock(&dcache_lock);
@@ -406,16 +378,21 @@ static int autofs4_revalidate(struct den
 	    list_empty(&dentry->d_subdirs)) {
 		DPRINTK("dentry=%p %.*s, emptydir",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
-		spin_unlock(&dcache_lock);
-		if (!oz_mode)
-			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
-		return status;
+
+		if (ino && (need_lookup || current->link_count)) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+			__d_drop(dentry);
+			spin_unlock(&dentry->d_lock);
+			list_add(&ino->request, &sbi->pending);
+			spin_unlock(&dcache_lock);
+			return 0;
+		}
 	}
 	spin_unlock(&dcache_lock);
 
 	/* Update the usage list */
-	if (!oz_mode)
-		autofs4_update_usage(dentry);
+	autofs4_update_usage(dentry);
 
 	return 1;
 }
@@ -449,11 +426,40 @@ static struct dentry_operations autofs4_
 	.d_release	= autofs4_dentry_release,
 };
 
+static struct autofs_info *autofs4_lookup_pending(struct autofs_sb_info *sbi, struct qstr *name)
+{
+	unsigned int len = name->len;
+	unsigned int hash = name->hash;
+	const unsigned char *str = name->name;
+	struct list_head *p;
+	struct autofs_info *rq_ino = NULL;
+
+	list_for_each(p, &sbi->pending) {
+		struct autofs_info *this;
+		struct qstr *name;
+
+		this = list_entry(p, struct autofs_info, request);
+		name = &this->dentry->d_name;
+
+		if (name->hash != hash)
+			continue;
+		if(name->len != len)
+			continue;
+		if (!memcmp(name->name, str, len)) {
+			rq_ino = this;
+			break;
+		}
+	}
+	return rq_ino;
+}
+
 /* Lookups in the root directory */
 static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
 {
-	struct autofs_sb_info *sbi;
-	int oz_mode;
+	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
+	int oz_mode = autofs4_oz_mode(sbi);
+	struct autofs_info *rq_ino;
+	int status = 0;
 
 	DPRINTK("name = %.*s",
 		dentry->d_name.len, dentry->d_name.name);
@@ -461,36 +467,52 @@ static struct dentry *autofs4_lookup(str
 	if (dentry->d_name.len > NAME_MAX)
 		return ERR_PTR(-ENAMETOOLONG);/* File name too long to exist */
 
-	sbi = autofs4_sbi(dir->i_sb);
-
-	oz_mode = autofs4_oz_mode(sbi);
 	DPRINTK("pid = %u, pgrp = %u, catatonic = %d, oz_mode = %d",
 		 current->pid, process_group(current), sbi->catatonic, oz_mode);
 
-	/*
-	 * Mark the dentry incomplete, but add it. This is needed so
-	 * that the VFS layer knows about the dentry, and we can count
-	 * on catching any lookups through the revalidate.
-	 *
-	 * Let all the hard work be done by the revalidate function that
-	 * needs to be able to do this anyway..
-	 *
-	 * We need to do this before we release the directory semaphore.
-	 */
-	dentry->d_op = &autofs4_root_dentry_operations;
-
-	if (!oz_mode) {
-		spin_lock(&dentry->d_lock);
-		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
-		spin_unlock(&dentry->d_lock);
-	}
-	dentry->d_fsdata = NULL;
-	d_add(dentry, NULL);
-
-	if (dentry->d_op && dentry->d_op->d_revalidate) {
+	spin_lock(&dcache_lock);
+	rq_ino = autofs4_lookup_pending(sbi, &dentry->d_name);
+	if (!oz_mode && rq_ino && autofs4_ispending(rq_ino->dentry)) {
+		/*
+		 * Revalidate has sent this to us to post a mount request.
+		 *
+		 * This must be done via lookup so we can be sure that it
+		 * was this path that aquired the directory semaphore.
+		 */
+		list_del_init(&rq_ino->request);
+		dentry = rq_ino->dentry;
+		spin_unlock(&dcache_lock);
+		d_rehash(dentry);
 		up(&dir->i_sem);
-		(dentry->d_op->d_revalidate)(dentry, nd);
+		status = try_to_fill_dentry(dentry, dir->i_sb, sbi);
 		down(&dir->i_sem);
+
+		/* If the mount suceeded we're done */
+		if (!status)
+			return dentry;
+	} else {
+		spin_unlock(&dcache_lock);
+		/*
+		 * Mark the dentry incomplete, but add it. This is needed
+		 * so that the VFS layer knows about the dentry, and we
+		 * can count on catching any lookups through the revalidate.
+		 *
+		 * We need to do this before we release the directory
+		 * semaphore.
+		 */
+		dentry->d_op = &autofs4_root_dentry_operations;
+
+		dentry->d_fsdata = NULL;
+		d_add(dentry, NULL);
+
+		if (!oz_mode) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+			spin_unlock(&dentry->d_lock);
+			up(&dir->i_sem);
+			status = try_to_fill_dentry(dentry, dir->i_sb, sbi);
+			down(&dir->i_sem);
+		}
 	}
 
 	/*
@@ -498,6 +520,9 @@ static struct dentry *autofs4_lookup(str
 	 * a signal. If so we can force a restart..
 	 */
 	if (dentry->d_flags & DCACHE_AUTOFS_PENDING) {
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
 		/* See if we were interrupted */
 		if (signal_pending(current)) {
 			sigset_t *sigset = &current->pending.signal;
@@ -515,9 +540,13 @@ static struct dentry *autofs4_lookup(str
 	 * doesn't do the right thing for all system calls, but it should
 	 * be OK for the operations we permit from an autofs.
 	 */
-	if ( dentry->d_inode && d_unhashed(dentry) )
+	if (dentry->d_inode && d_unhashed(dentry))
 		return ERR_PTR(-ENOENT);
 
+	/* Otherwise try to get the VFS to report the error */
+	if (status)
+		return ERR_PTR(status);
+
 	return NULL;
 }
 
--- linux-2.6.15-rc1/fs/autofs4/autofs_i.h.lookup-deadlock	2005-12-03 10:32:19.000000000 +0800
+++ linux-2.6.15-rc1/fs/autofs4/autofs_i.h	2005-12-04 09:59:52.000000000 +0800
@@ -63,6 +63,8 @@ struct autofs_info {
 	struct autofs_sb_info *sbi;
 	unsigned long last_used;
 
+	struct list_head request;
+
 	mode_t	mode;
 	size_t	size;
 
@@ -105,6 +107,7 @@ struct autofs_sb_info {
 	struct semaphore wq_sem;
 	spinlock_t fs_lock;
 	struct autofs_wait_queue *queues; /* Wait queue pointer */
+	struct list_head pending;
 };
 
 static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
--- linux-2.6.15-rc1/fs/autofs4/inode.c.lookup-deadlock	2005-12-03 20:27:17.000000000 +0800
+++ linux-2.6.15-rc1/fs/autofs4/inode.c	2005-12-04 09:59:16.000000000 +0800
@@ -49,6 +49,8 @@ struct autofs_info *autofs4_init_ino(str
 
 	ino->sbi = sbi;
 
+	INIT_LIST_HEAD(&ino->request);
+
 	if (reinit && ino->free)
 		(ino->free)(ino);
 
@@ -272,6 +274,7 @@ int autofs4_fill_super(struct super_bloc
 	init_MUTEX(&sbi->wq_sem);
 	spin_lock_init(&sbi->fs_lock);
 	sbi->queues = NULL;
+	INIT_LIST_HEAD(&sbi->pending);
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
 	s->s_magic = AUTOFS_SUPER_MAGIC;

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-02 18:12                     ` Trond Myklebust
@ 2005-12-04 12:56                       ` Christoph Hellwig
  2005-12-04 12:57                         ` Christoph Hellwig
  2005-12-04 14:56                         ` Ian Kent
  0 siblings, 2 replies; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-04 12:56 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Ian Kent, Will Taber, Jeff Moyer, Ram Pai, autofs mailing list,
	linux-fsdevel

On Fri, Dec 02, 2005 at 01:12:27PM -0500, Trond Myklebust wrote:
> On Sat, 2005-12-03 at 01:29 +0800, Ian Kent wrote:
> 
> > It's sufficient to recognize the nameidata struct is NULL on a call 
> > from lookup_hash nothing more that I'm aware of is needed. If that changes 
> > then of course autofs will need to be changed. autofs also makes 
> > assumptions about what flags are set for different reasons.
> 
> Consider those cases where the VFS calls ->lookup()/->d_revalidate()
> with the nameidata argument set to NULL to be bugs. They are pretty much
> all slated to be fixed soon in order to enable features like read-only
> bind mounts etc.

They are still NULL in exactly one case: lookup_one_len.  Given the design
of lookup_one_len we can't get at a nameidata there at all.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04 12:56                       ` Christoph Hellwig
@ 2005-12-04 12:57                         ` Christoph Hellwig
  2005-12-04 14:58                           ` Ian Kent
  2005-12-04 14:56                         ` Ian Kent
  1 sibling, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-04 12:57 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Ian Kent, Will Taber, Jeff Moyer, Ram Pai, autofs mailing list,
	linux-fsdevel

On Sun, Dec 04, 2005 at 12:56:12PM +0000, Christoph Hellwig wrote:
> They are still NULL in exactly one case: lookup_one_len.  Given the design
> of lookup_one_len we can't get at a nameidata there at all.

Oh, forgot the most important bit here :)  lookup_one_len is a library helper
never called by the VFS.  autofs (v4 at least) doesn't use it so now always
get a nameidata.  In fact if you look in -mm there's a patch from me that
makes use of that fact.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04  9:52                       ` Ian Kent
@ 2005-12-04 14:54                         ` Ian Kent
  2005-12-05 15:40                           ` Ian Kent
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-12-04 14:54 UTC (permalink / raw)
  To: Will Taber; +Cc: autofs mailing list, linux-fsdevel, Trond Myklebust

On Sun, 4 Dec 2005, Ian Kent wrote:

> On Fri, 2 Dec 2005, Will Taber wrote:
> 
> > > > 
> > > > Well, I think we've determined that the reported problem doesn't happen
> > > > with any in-tree callers.  The question, then, is do you want to fix the
> > > > locking problem?  Two approaches were presented in this thread.  I don't
> > > > really like the idea of the hack used by devfs, since it relies on
> > > > implicit
> > > > semantics.  I haven't given much thought to the second approach, though
> > > > (are we sure it can be made to work?).  It may require a good deal of
> > > > effort, but if it makes things work properly, it's worth considering.  I'm
> > > > just not sure where it sits in the list of priorities, as I know you've
> > > > got
> > > > a lot on your plate, Ian.
> > > 
> > > 
> > > It appears to me that the unhashed directory approach proposed by Will does
> > > not account for directories that exist but don't have current mounts.
> > > 
> > > I will re-read the posts, I expect I missed something, and give it more
> > > thought.
> > > 
> > It doesn't consider that case.  You had mentioned it but I had forgotten.
> > 
> 
> OK so I decided to give Wills recommendation a bit of a run and I've come 
> up with a first cut patch which of course doesn't work.
> 
> The approach is to force all callbacks to go through lookup instead of 
> some through revalidate as well. The patch basically posts the dentry to a 
> pending list and unhashs it, then picks it up from the list in the lookup 
> and rehash it. Should be fairly simple really but I'm doing something 
> obviously wrong somewhere.
> 
> I'm seeing slab corruption and I really can't see why this should be the 
> case. Anyone got any ideas. The patch is against 2.6.15-rc1 but the kernel 
> I'm compliling against is a RedHat patched 2.6.11 (Aurora).
> 

Interestingly it seems to function OK on my Intel FC3 box?

And the patch only deals with the revalidate and lookup logic, the readdir 
stuff will need to be reworked as well.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04 12:56                       ` Christoph Hellwig
  2005-12-04 12:57                         ` Christoph Hellwig
@ 2005-12-04 14:56                         ` Ian Kent
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-04 14:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: autofs mailing list, Trond Myklebust, Will Taber, linux-fsdevel

On Sun, 4 Dec 2005, Christoph Hellwig wrote:

> On Fri, Dec 02, 2005 at 01:12:27PM -0500, Trond Myklebust wrote:
> > On Sat, 2005-12-03 at 01:29 +0800, Ian Kent wrote:
> > 
> > > It's sufficient to recognize the nameidata struct is NULL on a call 
> > > from lookup_hash nothing more that I'm aware of is needed. If that changes 
> > > then of course autofs will need to be changed. autofs also makes 
> > > assumptions about what flags are set for different reasons.
> > 
> > Consider those cases where the VFS calls ->lookup()/->d_revalidate()
> > with the nameidata argument set to NULL to be bugs. They are pretty much
> > all slated to be fixed soon in order to enable features like read-only
> > bind mounts etc.
> 
> They are still NULL in exactly one case: lookup_one_len.  Given the design
> of lookup_one_len we can't get at a nameidata there at all.
> 

and I'll use that to detect the lookup_one_len call.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04 12:57                         ` Christoph Hellwig
@ 2005-12-04 14:58                           ` Ian Kent
  2005-12-04 17:17                             ` [autofs] " Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Ian Kent @ 2005-12-04 14:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: autofs mailing list, Trond Myklebust, Will Taber, linux-fsdevel

On Sun, 4 Dec 2005, Christoph Hellwig wrote:

> On Sun, Dec 04, 2005 at 12:56:12PM +0000, Christoph Hellwig wrote:
> > They are still NULL in exactly one case: lookup_one_len.  Given the design
> > of lookup_one_len we can't get at a nameidata there at all.
> 
> Oh, forgot the most important bit here :)  lookup_one_len is a library helper
> never called by the VFS.  autofs (v4 at least) doesn't use it so now always
> get a nameidata.  In fact if you look in -mm there's a patch from me that
> makes use of that fact.
> 

But Will is calling it in a something like a stacking context and autofs 
fails to handle it. Hence this discussion.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04 14:58                           ` Ian Kent
@ 2005-12-04 17:17                             ` Christoph Hellwig
  2005-12-05 14:02                               ` Ian Kent
  2005-12-06 21:20                               ` Jeff Moyer
  0 siblings, 2 replies; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-04 17:17 UTC (permalink / raw)
  To: Ian Kent
  Cc: Christoph Hellwig, Trond Myklebust, Will Taber, Jeff Moyer,
	Ram Pai, autofs mailing list, linux-fsdevel

On Sun, Dec 04, 2005 at 10:58:03PM +0800, Ian Kent wrote:
> > never called by the VFS.  autofs (v4 at least) doesn't use it so now always
> > get a nameidata.  In fact if you look in -mm there's a patch from me that
> > makes use of that fact.
> > 
> 
> But Will is calling it in a something like a stacking context and autofs 
> fails to handle it. Hence this discussion.

No, for current TOT that can't happen.  It could happen for older kernels
but nothing is doing it in the tree anymore and if anything outside is doing
it it's fundamentally broken.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04 17:17                             ` [autofs] " Christoph Hellwig
@ 2005-12-05 14:02                               ` Ian Kent
  2005-12-06 21:20                               ` Jeff Moyer
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-05 14:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Trond Myklebust, Will Taber, Jeff Moyer, Ram Pai,
	autofs mailing list, linux-fsdevel

On Sun, 4 Dec 2005, Christoph Hellwig wrote:

> On Sun, Dec 04, 2005 at 10:58:03PM +0800, Ian Kent wrote:
> > > never called by the VFS.  autofs (v4 at least) doesn't use it so now always
> > > get a nameidata.  In fact if you look in -mm there's a patch from me that
> > > makes use of that fact.
> > > 
> > 
> > But Will is calling it in a something like a stacking context and autofs 
> > fails to handle it. Hence this discussion.
> 
> No, for current TOT that can't happen.  It could happen for older kernels
> but nothing is doing it in the tree anymore and if anything outside is doing
> it it's fundamentally broken.
> 

Yes. So I should continue with my focus on handling link_path_walk cases 
as all paths should pass through their?

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04 14:54                         ` Ian Kent
@ 2005-12-05 15:40                           ` Ian Kent
  0 siblings, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-05 15:40 UTC (permalink / raw)
  To: Will Taber; +Cc: autofs mailing list, linux-fsdevel, Trond Myklebust

On Sun, 4 Dec 2005, Ian Kent wrote:

> On Sun, 4 Dec 2005, Ian Kent wrote:
> 
> > On Fri, 2 Dec 2005, Will Taber wrote:
> > 
> > > > > 
> > > > > Well, I think we've determined that the reported problem doesn't happen
> > > > > with any in-tree callers.  The question, then, is do you want to fix the
> > > > > locking problem?  Two approaches were presented in this thread.  I don't
> > > > > really like the idea of the hack used by devfs, since it relies on
> > > > > implicit
> > > > > semantics.  I haven't given much thought to the second approach, though
> > > > > (are we sure it can be made to work?).  It may require a good deal of
> > > > > effort, but if it makes things work properly, it's worth considering.  I'm
> > > > > just not sure where it sits in the list of priorities, as I know you've
> > > > > got
> > > > > a lot on your plate, Ian.
> > > > 
> > > > 
> > > > It appears to me that the unhashed directory approach proposed by Will does
> > > > not account for directories that exist but don't have current mounts.
> > > > 
> > > > I will re-read the posts, I expect I missed something, and give it more
> > > > thought.
> > > > 
> > > It doesn't consider that case.  You had mentioned it but I had forgotten.
> > > 
> > 
> > OK so I decided to give Wills recommendation a bit of a run and I've come 
> > up with a first cut patch which of course doesn't work.
> > 
> > The approach is to force all callbacks to go through lookup instead of 
> > some through revalidate as well. The patch basically posts the dentry to a 
> > pending list and unhashs it, then picks it up from the list in the lookup 
> > and rehash it. Should be fairly simple really but I'm doing something 
> > obviously wrong somewhere.
> > 
> > I'm seeing slab corruption and I really can't see why this should be the 
> > case. Anyone got any ideas. The patch is against 2.6.15-rc1 but the kernel 
> > I'm compliling against is a RedHat patched 2.6.11 (Aurora).
> > 

Think I failed to return a dgot dentry from lookup. Ooops.

> 
> Interestingly it seems to function OK on my Intel FC3 box?

Pure chance I think.

> 
> And the patch only deals with the revalidate and lookup logic, the readdir 
> stuff will need to be reworked as well.

I'll push on with this.

Ian

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-04 17:17                             ` [autofs] " Christoph Hellwig
  2005-12-05 14:02                               ` Ian Kent
@ 2005-12-06 21:20                               ` Jeff Moyer
  2005-12-06 21:40                                 ` Christoph Hellwig
  1 sibling, 1 reply; 95+ messages in thread
From: Jeff Moyer @ 2005-12-06 21:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ian Kent, Trond Myklebust, Will Taber, Ram Pai,
	autofs mailing list, linux-fsdevel

==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Christoph Hellwig <hch@infradead.org> adds:

hch> On Sun, Dec 04, 2005 at 10:58:03PM +0800, Ian Kent wrote:
>> > never called by the VFS.  autofs (v4 at least) doesn't use it so now
>> always > get a nameidata.  In fact if you look in -mm there's a patch
>> from me that > makes use of that fact.
>> > 
>> 
>> But Will is calling it in a something like a stacking context and autofs
>> fails to handle it. Hence this discussion.

hch> No, for current TOT that can't happen.  It could happen for older
hch> kernels but nothing is doing it in the tree anymore and if anything
hch> outside is doing it it's fundamentally broken.

This is a bit unclear to me.  What do you mean when you refer to "it" and
"that" above?  Oh, and TOT is a TLA I haven't run across before.

We know that there is at least one out of tree module that calls
lookup_one_len, and ends up in the autofs4 revalidate code without the
valid nameidata structure.  In this case, with your patch, wouldn't we
blindly dereference the structure and cause an oops?  If so, who is at
fault?

Thanks,

Jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-06 21:20                               ` Jeff Moyer
@ 2005-12-06 21:40                                 ` Christoph Hellwig
  2005-12-06 22:37                                   ` Jeff Moyer
                                                     ` (2 more replies)
  0 siblings, 3 replies; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-06 21:40 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, Ian Kent, Trond Myklebust, Will Taber,
	Ram Pai, autofs mailing list, linux-fsdevel

On Tue, Dec 06, 2005 at 04:20:29PM -0500, Jeff Moyer wrote:
> hch> No, for current TOT that can't happen.  It could happen for older
> hch> kernels but nothing is doing it in the tree anymore and if anything
> hch> outside is doing it it's fundamentally broken.
> 
> This is a bit unclear to me.  What do you mean when you refer to "it" and
> "that" above?  Oh, and TOT is a TLA I haven't run across before.

TOT = top of tree.

To rephrease the above:  With current mainline the nameidata argument
is always valid when ->lookup or ->d_revalidate are called except when
the filesystem uses lookup_one_len.  lookup_one_len is a helper for fileystem
usage that is only valid to be used on the filesystems own trees.

> We know that there is at least one out of tree module that calls
> lookup_one_len, and ends up in the autofs4 revalidate code without the
> valid nameidata structure.  In this case, with your patch, wouldn't we
> blindly dereference the structure and cause an oops?  If so, who is at
> fault?

This out of tree module is wrong and always has been wrong.  Any actual
breakage of such a module is expected.

Do you happen to know what module that is?

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-06 21:40                                 ` Christoph Hellwig
@ 2005-12-06 22:37                                   ` Jeff Moyer
  2005-12-07 14:52                                   ` Will Taber
  2005-12-07 15:22                                   ` Brian Long
  2 siblings, 0 replies; 95+ messages in thread
From: Jeff Moyer @ 2005-12-06 22:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ian Kent, Trond Myklebust, Will Taber, Ram Pai,
	autofs mailing list, linux-fsdevel

==> Regarding Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix; Christoph Hellwig <hch@infradead.org> adds:

hch> On Tue, Dec 06, 2005 at 04:20:29PM -0500, Jeff Moyer wrote: No, for
hch> current TOT that can't happen.  It could happen for older kernels but
hch> nothing is doing it in the tree anymore and if anything outside is
hch> doing it it's fundamentally broken.
>> This is a bit unclear to me.  What do you mean when you refer to "it"
>> and "that" above?  Oh, and TOT is a TLA I haven't run across before.

hch> TOT = top of tree.

hch> To rephrease the above: With current mainline the nameidata argument
hch> is always valid when ->lookup or ->d_revalidate are called except when
hch> the filesystem uses lookup_one_len.  lookup_one_len is a helper for
hch> fileystem usage that is only valid to be used on the filesystems own
hch> trees.

>> We know that there is at least one out of tree module that calls
>> lookup_one_len, and ends up in the autofs4 revalidate code without the
>> valid nameidata structure.  In this case, with your patch, wouldn't we
>> blindly dereference the structure and cause an oops?  If so, who is at
>> fault?

hch> This out of tree module is wrong and always has been wrong.  Any
hch> actual breakage of such a module is expected.

Thanks for the clarification.  This was my interpretation, but I wanted to
be sure.

hch> Do you happen to know what module that is?

Well, the example originally posted was stubfs, which was purported to be a
sample fs used to show this problem.  Perhaps the original reporter can
tell us what other code does this.

-Jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-06 21:40                                 ` Christoph Hellwig
  2005-12-06 22:37                                   ` Jeff Moyer
@ 2005-12-07 14:52                                   ` Will Taber
  2005-12-07 15:18                                     ` Christoph Hellwig
  2005-12-07 15:22                                   ` Brian Long
  2 siblings, 1 reply; 95+ messages in thread
From: Will Taber @ 2005-12-07 14:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Moyer, Ian Kent, Trond Myklebust, Ram Pai,
	autofs mailing list, linux-fsdevel

Christoph Hellwig wrote:
> To rephrease the above:  With current mainline the nameidata argument
> is always valid when ->lookup or ->d_revalidate are called except when
> the filesystem uses lookup_one_len.  lookup_one_len is a helper for fileystem
> usage that is only valid to be used on the filesystems own trees.
> 
Is this documented anywhere?  How is one to know about this restriction 
since it isn't obvious from the code?  And if this function is only to 
be used to lookup in ones own filesystem how is a filesystem supposed to 
lookup a file in another filesystem if they already have a directory 
dentry in hand? Walking up the dentry tree to recreate a path name so 
you can call path_walk seems a bit much.  Without this capablility how 
does one write a stackable file system?

Will

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-07 14:52                                   ` Will Taber
@ 2005-12-07 15:18                                     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-07 15:18 UTC (permalink / raw)
  To: Will Taber
  Cc: Christoph Hellwig, Jeff Moyer, Ian Kent, Trond Myklebust,
	Ram Pai, autofs mailing list, linux-fsdevel

On Wed, Dec 07, 2005 at 09:52:04AM -0500, Will Taber wrote:
> Christoph Hellwig wrote:
> >To rephrease the above:  With current mainline the nameidata argument
> >is always valid when ->lookup or ->d_revalidate are called except when
> >the filesystem uses lookup_one_len.  lookup_one_len is a helper for 
> >fileystem
> >usage that is only valid to be used on the filesystems own trees.
> >
> Is this documented anywhere?

It's documented in the mail on lkml where it was introduce long time ago.
I have a patch in my queue to add a comment about this and various other
bits.  It'll go into 2.6.26 I hope.

> How is one to know about this restriction 
> since it isn't obvious from the code? 

Actually it is obvious for the code :)  Because it doesn't get an nfsmount
it's inherently dangerous to use to look up random files.

> And if this function is only to 
> be used to lookup in ones own filesystem how is a filesystem supposed to 
> lookup a file in another filesystem if they already have a directory 
> dentry in hand?

It's not supposed to do that.  All these lookups must be done by VFS code.
The lowest level entry points to do something on the namespace of a filesystem
are the vfs_* routines or thing like dentry_open().


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-06 21:40                                 ` Christoph Hellwig
  2005-12-06 22:37                                   ` Jeff Moyer
  2005-12-07 14:52                                   ` Will Taber
@ 2005-12-07 15:22                                   ` Brian Long
  2005-12-07 15:25                                     ` Christoph Hellwig
  2005-12-07 17:46                                     ` Will Taber
  2 siblings, 2 replies; 95+ messages in thread
From: Brian Long @ 2005-12-07 15:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Moyer, autofs mailing list, Trond Myklebust, Will Taber,
	linux-fsdevel, Ian Kent

On Tue, 2005-12-06 at 21:40 +0000, Christoph Hellwig wrote:
> On Tue, Dec 06, 2005 at 04:20:29PM -0500, Jeff Moyer wrote:
> > hch> No, for current TOT that can't happen.  It could happen for older
> > hch> kernels but nothing is doing it in the tree anymore and if anything
> > hch> outside is doing it it's fundamentally broken.
> > 
> > This is a bit unclear to me.  What do you mean when you refer to "it" and
> > "that" above?  Oh, and TOT is a TLA I haven't run across before.
> 
> TOT = top of tree.
> 
> To rephrease the above:  With current mainline the nameidata argument
> is always valid when ->lookup or ->d_revalidate are called except when
> the filesystem uses lookup_one_len.  lookup_one_len is a helper for fileystem
> usage that is only valid to be used on the filesystems own trees.
> 
> > We know that there is at least one out of tree module that calls
> > lookup_one_len, and ends up in the autofs4 revalidate code without the
> > valid nameidata structure.  In this case, with your patch, wouldn't we
> > blindly dereference the structure and cause an oops?  If so, who is at
> > fault?
> 
> This out of tree module is wrong and always has been wrong.  Any actual
> breakage of such a module is expected.
> 
> Do you happen to know what module that is?

I believe the filesystem is IBM Rational's mvfs (multi-version
filesystem) used in ClearCase.  My team is the internal support
organization at Cisco for Red Hat Enterprise Linux issues and we opened
the support case with Red Hat about this issue.  Our internal ClearCase
support folks also have a case opened with IBM Rational Tech Support.
Thank you for clarifying that mvfs is doing the "wrong thing" by calling
lookup_one_len.

/Brian/
-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-07 15:22                                   ` Brian Long
@ 2005-12-07 15:25                                     ` Christoph Hellwig
  2005-12-07 17:46                                     ` Will Taber
  1 sibling, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-07 15:25 UTC (permalink / raw)
  To: Brian Long
  Cc: Christoph Hellwig, Jeff Moyer, autofs mailing list,
	Trond Myklebust, Will Taber, linux-fsdevel, Ian Kent

On Wed, Dec 07, 2005 at 10:22:23AM -0500, Brian Long wrote:
> I believe the filesystem is IBM Rational's mvfs (multi-version
> filesystem) used in ClearCase.  My team is the internal support
> organization at Cisco for Red Hat Enterprise Linux issues and we opened
> the support case with Red Hat about this issue.  Our internal ClearCase
> support folks also have a case opened with IBM Rational Tech Support.
> Thank you for clarifying that mvfs is doing the "wrong thing" by calling
> lookup_one_len.

Well, mvfs is the worst filesystem ever.  I wish could beat up the people at
rational that wrote it personally.  Everytime some implementation detail
in the VFS changes it turns out mvfs uses it in the most braindead way.

I also really doubt they could claim it's not derived work of the linux
kernel given all this poking it does.

Maybe I should sue IBM about so they take it off the market ;-)

*evil grin*

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-07 15:22                                   ` Brian Long
  2005-12-07 15:25                                     ` Christoph Hellwig
@ 2005-12-07 17:46                                     ` Will Taber
  2005-12-08 14:16                                       ` Ian Kent
  2005-12-09 12:12                                       ` Christoph Hellwig
  1 sibling, 2 replies; 95+ messages in thread
From: Will Taber @ 2005-12-07 17:46 UTC (permalink / raw)
  To: Brian Long
  Cc: Christoph Hellwig, Jeff Moyer, autofs mailing list,
	Trond Myklebust, linux-fsdevel, Ian Kent

Brian Long wrote:
> On Tue, 2005-12-06 at 21:40 +0000, Christoph Hellwig wrote:
> 
>>On Tue, Dec 06, 2005 at 04:20:29PM -0500, Jeff Moyer wrote:
>>
>>>hch> No, for current TOT that can't happen.  It could happen for older
>>>hch> kernels but nothing is doing it in the tree anymore and if anything
>>>hch> outside is doing it it's fundamentally broken.
>>>
>>>This is a bit unclear to me.  What do you mean when you refer to "it" and
>>>"that" above?  Oh, and TOT is a TLA I haven't run across before.
>>
>>TOT = top of tree.
>>
>>To rephrease the above:  With current mainline the nameidata argument
>>is always valid when ->lookup or ->d_revalidate are called except when
>>the filesystem uses lookup_one_len.  lookup_one_len is a helper for fileystem
>>usage that is only valid to be used on the filesystems own trees.
>>
>>
>>>We know that there is at least one out of tree module that calls
>>>lookup_one_len, and ends up in the autofs4 revalidate code without the
>>>valid nameidata structure.  In this case, with your patch, wouldn't we
>>>blindly dereference the structure and cause an oops?  If so, who is at
>>>fault?
>>
>>This out of tree module is wrong and always has been wrong.  Any actual
>>breakage of such a module is expected.
>>
>>Do you happen to know what module that is?
> 
> 
> I believe the filesystem is IBM Rational's mvfs (multi-version
> filesystem) used in ClearCase.  My team is the internal support
> organization at Cisco for Red Hat Enterprise Linux issues and we opened
> the support case with Red Hat about this issue.  Our internal ClearCase
> support folks also have a case opened with IBM Rational Tech Support.
> Thank you for clarifying that mvfs is doing the "wrong thing" by calling
> lookup_one_len.
> 
> /Brian/

Maybe we are doing "the wrong thing" by calling lookup_one_len on the 
autofs but that begs the question of what the right thing would be. 
There is no vfs_lookup function that will look up a single component in 
a pathname.  There may be things we can do in the future so that we 
don't have to make this call at all, but that doesn't resolve the 
problems you have today.

Likewise one could argue that the vfs layer is broken because it is 
inconsistent in its handling of the parent i_sem lock on d_revalidate 
calls.  While this may be true in some abstract software engineering 
sense I have learned enough from this thread already to realize that 
there is no easy solution there.

On the assumption that our use of lookup_one_len was appropriate, Ian 
has been working on a fix to autofs.  I am not particularly interested 
in where the change is made.  What I have been looking for was a 
solution that could then be backported to earlier releases to fix the 
problem at hand.  Anyway, with this information, is it appropriate for 
us to replace our calls to lookup_one_len with calls to path_walk or is 
that also forbidden?

Will


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-07 17:46                                     ` Will Taber
@ 2005-12-08 14:16                                       ` Ian Kent
  2005-12-09 12:12                                       ` Christoph Hellwig
  1 sibling, 0 replies; 95+ messages in thread
From: Ian Kent @ 2005-12-08 14:16 UTC (permalink / raw)
  To: Will Taber
  Cc: Brian Long, Christoph Hellwig, Jeff Moyer, autofs mailing list,
	Trond Myklebust, linux-fsdevel

On Wed, 7 Dec 2005, Will Taber wrote:

> Brian Long wrote:
> > On Tue, 2005-12-06 at 21:40 +0000, Christoph Hellwig wrote:
> > 
> >>On Tue, Dec 06, 2005 at 04:20:29PM -0500, Jeff Moyer wrote:
> >>
> >>>hch> No, for current TOT that can't happen.  It could happen for older
> >>>hch> kernels but nothing is doing it in the tree anymore and if anything
> >>>hch> outside is doing it it's fundamentally broken.
> >>>
> >>>This is a bit unclear to me.  What do you mean when you refer to "it" and
> >>>"that" above?  Oh, and TOT is a TLA I haven't run across before.
> >>
> >>TOT = top of tree.
> >>
> >>To rephrease the above:  With current mainline the nameidata argument
> >>is always valid when ->lookup or ->d_revalidate are called except when
> >>the filesystem uses lookup_one_len.  lookup_one_len is a helper for fileystem
> >>usage that is only valid to be used on the filesystems own trees.
> >>
> >>
> >>>We know that there is at least one out of tree module that calls
> >>>lookup_one_len, and ends up in the autofs4 revalidate code without the
> >>>valid nameidata structure.  In this case, with your patch, wouldn't we
> >>>blindly dereference the structure and cause an oops?  If so, who is at
> >>>fault?
> >>
> >>This out of tree module is wrong and always has been wrong.  Any actual
> >>breakage of such a module is expected.
> >>
> >>Do you happen to know what module that is?
> > 
> > 
> > I believe the filesystem is IBM Rational's mvfs (multi-version
> > filesystem) used in ClearCase.  My team is the internal support
> > organization at Cisco for Red Hat Enterprise Linux issues and we opened
> > the support case with Red Hat about this issue.  Our internal ClearCase
> > support folks also have a case opened with IBM Rational Tech Support.
> > Thank you for clarifying that mvfs is doing the "wrong thing" by calling
> > lookup_one_len.
> > 
> > /Brian/
> 
> Maybe we are doing "the wrong thing" by calling lookup_one_len on the 
> autofs but that begs the question of what the right thing would be. 
> There is no vfs_lookup function that will look up a single component in 
> a pathname.  There may be things we can do in the future so that we 
> don't have to make this call at all, but that doesn't resolve the 
> problems you have today.
> 
> Likewise one could argue that the vfs layer is broken because it is 
> inconsistent in its handling of the parent i_sem lock on d_revalidate 
> calls.  While this may be true in some abstract software engineering 
> sense I have learned enough from this thread already to realize that 
> there is no easy solution there.
> 
> On the assumption that our use of lookup_one_len was appropriate, Ian 
> has been working on a fix to autofs.  I am not particularly interested 
> in where the change is made.  What I have been looking for was a 
> solution that could then be backported to earlier releases to fix the 
> problem at hand.  Anyway, with this information, is it appropriate for 
> us to replace our calls to lookup_one_len with calls to path_walk or is 
> that also forbidden?

One thing that I can't allow for is the need for a non NULL nameidata 
struct following Christophs touch_atime patch. Requiring a valid nameidata 
struct is clearly a good thing.

Perhaps you could use lookup_hash. There is a depricated_for_modules patch 
for it in the -mm series. At least it's almost forbidden (:.

Ian


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-07 17:46                                     ` Will Taber
  2005-12-08 14:16                                       ` Ian Kent
@ 2005-12-09 12:12                                       ` Christoph Hellwig
  2005-12-09 13:33                                         ` John T. Kohl
  1 sibling, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-09 12:12 UTC (permalink / raw)
  To: Will Taber
  Cc: Brian Long, Jeff Moyer, autofs mailing list, Trond Myklebust,
	linux-fsdevel, Ian Kent

On Wed, Dec 07, 2005 at 12:46:39PM -0500, Will Taber wrote:
> problem at hand.  Anyway, with this information, is it appropriate for 
> us to replace our calls to lookup_one_len with calls to path_walk or is 
> that also forbidden?

path_walk and firends resolve a full path.  As such they are definitly not
suitable for stackable filesystem, they are intendeded only for implementating
syscalls (and syscall-like interfaces, e.g. some ioctls)

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-09 12:12                                       ` Christoph Hellwig
@ 2005-12-09 13:33                                         ` John T. Kohl
  2005-12-13 18:39                                           ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: John T. Kohl @ 2005-12-09 13:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Brian Long, Jeff Moyer, autofs mailing list, wtaber,
	Trond Myklebust, linux-fsdevel, Ian Kent

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> On Wed, Dec 07, 2005 at 12:46:39PM -0500, Will Taber wrote:
>> problem at hand.  Anyway, with this information, is it appropriate for 
>> us to replace our calls to lookup_one_len with calls to path_walk or is 
>> that also forbidden?

Christoph> path_walk and firends resolve a full path.  As such they are
Christoph> definitly not suitable for stackable filesystem, they are
Christoph> intendeded only for implementating syscalls (and syscall-like
Christoph> interfaces, e.g. some ioctls)

path_lookup() takes a pathname and interprets it relative to cwd or
root, so it's definitely not useful for stacking.  But path_walk() takes
a pathname fragment and an initialized nameidata (start point), so it
*could* be used to resolve a single component, or multiple components.

If lookup_one_len() is for use within the caller's file system only, and
path_walk() is not suitable for stacking, then what calls *are* suitable
for stacking file systems to use?  We want to start with an existing
(dentry,vfsmnt) and a pathname component, converting it to the
(dentry,vfsmnt) of the result.

-- 
John Kohl
Senior Software Engineer - Rational Software - IBM Software Group
Lexington, Massachusetts, USA
jtk@us.ibm.com
<http://www.ibm.com/software/rational/>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
  2005-12-09 13:33                                         ` John T. Kohl
@ 2005-12-13 18:39                                           ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2005-12-13 18:39 UTC (permalink / raw)
  To: John T. Kohl
  Cc: Christoph Hellwig, Brian Long, Jeff Moyer, autofs mailing list,
	wtaber, Trond Myklebust, linux-fsdevel, Ian Kent

On Fri, Dec 09, 2005 at 08:33:45AM -0500, John T. Kohl wrote:
> Christoph> path_walk and firends resolve a full path.  As such they are
> Christoph> definitly not suitable for stackable filesystem, they are
> Christoph> intendeded only for implementating syscalls (and syscall-like
> Christoph> interfaces, e.g. some ioctls)
> 
> path_lookup() takes a pathname and interprets it relative to cwd or
> root, so it's definitely not useful for stacking.  But path_walk() takes
> a pathname fragment and an initialized nameidata (start point), so it
> *could* be used to resolve a single component, or multiple components.
> 
> If lookup_one_len() is for use within the caller's file system only, and
> path_walk() is not suitable for stacking, then what calls *are* suitable
> for stacking file systems to use?  We want to start with an existing
> (dentry,vfsmnt) and a pathname component, converting it to the
> (dentry,vfsmnt) of the result.

As there are no stackable filesystems in the tree there's currently no
interface designed for them, and once we'll get interfaces for stackable
filesystems they'll surely be _GPL as they're clearly internal and any
stackable filesystem needs to know a lot about the VFS and will have to
change far more frequently than leaf filesystems.

path_walk is an internal implementation detail that will hopefull go away
as an export.

^ permalink raw reply	[flat|nested] 95+ messages in thread

end of thread, other threads:[~2005-12-13 18:39 UTC | newest]

Thread overview: 95+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-16 10:17 [RFC PATCH]autofs4: hang and proposed fix Ram Pai
2005-11-16 12:41 ` [autofs] " Ian Kent
2005-11-16 16:50   ` Ram Pai
2005-11-16 22:57     ` Ian Kent
2005-11-17  1:52       ` [autofs] " Ram Pai
2005-11-17 18:50         ` Ian Kent
2005-11-17 19:19           ` William H. Taber
2005-11-17 20:39             ` Ram Pai
2005-11-17 22:31               ` William H. Taber
2005-11-18 14:57                 ` Ian Kent
2005-11-18 14:54               ` Ian Kent
2005-11-18 14:44             ` Ian Kent
2005-11-18 15:20               ` William H. Taber
2005-11-18 16:30                 ` Ian Kent
2005-11-18 17:12                   ` William H. Taber
2005-11-18 18:57                     ` Ram Pai
2005-11-18 20:08                       ` William H. Taber
2005-11-19  2:52                         ` Ian Kent
2005-11-21 16:40                           ` William H. Taber
2005-11-22 13:13                             ` Ian Kent
2005-11-22 17:48                               ` [autofs] " William H. Taber
2005-11-23 14:11                                 ` Ian Kent
2005-11-23 16:42                                   ` William H. Taber
2005-11-23 17:52                                     ` Ian Kent
2005-11-23 18:47                                       ` William H. Taber
2005-11-23 17:52                                     ` Ian Kent
2005-11-19  1:40                     ` [autofs] " Ian Kent
2005-11-16 15:22 ` Jeff Moyer
2005-11-16 15:22   ` Jeff Moyer
2005-11-16 17:00   ` [autofs] " Ram Pai
2005-11-16 18:25     ` Jeff Moyer
2005-11-16 19:24       ` William H. Taber
2005-11-16 19:51         ` Ram Pai
2005-11-27 10:47 ` Ian Kent
2005-11-28 17:19   ` William H. Taber
2005-11-28 23:12     ` Badari Pulavarty
2005-11-29 14:19       ` Ian Kent
2005-11-29 16:34         ` William H. Taber
2005-11-30 14:02           ` Ian Kent
2005-11-30 16:49             ` Badari Pulavarty
2005-11-30 17:04               ` Trond Myklebust
2005-11-30 21:10                 ` William H. Taber
2005-11-29 14:20     ` Ian Kent
2005-11-30  1:16 ` [autofs] " Jeff Moyer
2005-11-30  1:16   ` Jeff Moyer
2005-11-30  1:56   ` Trond Myklebust
2005-11-30  4:15     ` Jeff Moyer
2005-11-30  6:14       ` Trond Myklebust
2005-11-30 15:44         ` Ian Kent
2005-11-30 15:53           ` [autofs] " Trond Myklebust
2005-11-30 16:12             ` Ian Kent
2005-11-30 16:27               ` Ian Kent
2005-11-30 16:45               ` [autofs] " Trond Myklebust
2005-11-30 20:32     ` William H. Taber
2005-11-30 20:53       ` Trond Myklebust
2005-11-30 21:30         ` William H. Taber
2005-11-30 22:32           ` Trond Myklebust
2005-12-01 16:27             ` William H. Taber
2005-12-01 12:09           ` Ian Kent
2005-12-01 16:30             ` William H. Taber
2005-12-02 13:49               ` Ian Kent
2005-12-02 14:07                 ` Jeff Moyer
2005-12-02 15:21                   ` Ian Kent
2005-12-02 16:35                     ` [autofs] " Will Taber
2005-12-02 17:11                       ` Ian Kent
2005-12-02 15:34                 ` Will Taber
2005-12-02 17:29                   ` Ian Kent
2005-12-02 18:12                     ` Trond Myklebust
2005-12-04 12:56                       ` Christoph Hellwig
2005-12-04 12:57                         ` Christoph Hellwig
2005-12-04 14:58                           ` Ian Kent
2005-12-04 17:17                             ` [autofs] " Christoph Hellwig
2005-12-05 14:02                               ` Ian Kent
2005-12-06 21:20                               ` Jeff Moyer
2005-12-06 21:40                                 ` Christoph Hellwig
2005-12-06 22:37                                   ` Jeff Moyer
2005-12-07 14:52                                   ` Will Taber
2005-12-07 15:18                                     ` Christoph Hellwig
2005-12-07 15:22                                   ` Brian Long
2005-12-07 15:25                                     ` Christoph Hellwig
2005-12-07 17:46                                     ` Will Taber
2005-12-08 14:16                                       ` Ian Kent
2005-12-09 12:12                                       ` Christoph Hellwig
2005-12-09 13:33                                         ` John T. Kohl
2005-12-13 18:39                                           ` Christoph Hellwig
2005-12-04 14:56                         ` Ian Kent
2005-12-02 19:04                     ` [autofs] " Will Taber
2005-12-04  9:39                       ` Ian Kent
2005-12-02 16:04                 ` [autofs] " Jeff Moyer
2005-12-02 17:36                   ` Ian Kent
2005-12-02 18:33                     ` [autofs] " Will Taber
2005-12-04  9:52                       ` Ian Kent
2005-12-04 14:54                         ` Ian Kent
2005-12-05 15:40                           ` Ian Kent
2005-11-30 14:48   ` [autofs] " Ian Kent

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.