All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
@ 2009-03-19 20:16 Sage Weil
  2009-03-19 20:16 ` [PATCH 2/2] vfs: clean up real_lookup Sage Weil
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Sage Weil @ 2009-03-19 20:16 UTC (permalink / raw)
  To: linux-fsdevel, hch, akpm; +Cc: Sage Weil, Al Viro, Andreas Dilger, Yehuda Sadeh

real_lookup() is called by do_lookup() if dentry revalidation fails.  If
the cache is re-populated while waiting for i_mutex, it may find that
a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).

Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
revalidate failed _again_, however, it would give up with -ENOENT.  The
problem here that network file systems may be invalidating dentries via
server callbacks, e.g. due to concurrent access from another client, and
-ENOENT is frequently the wrong answer.

This problem has been seen with both Lustre and Ceph.  It seems possible
to hit this case with NFS as well if the cache lifetime is very short.

Instead, we should do_revalidate() while i_mutex is still held.  If
revalidation fails, we can move on to a ->lookup() and ensure a correct
result without worrying about any subsequent races.

Note that do_revalidate() is called with i_mutex held elsewhere.  For
example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
and possibly others all take the directory i_mutex, and then

-> lookup_hash
        -> __lookup_hash
                -> cached_lookup
                        -> do_revalidate

so this does not introduce any new locking rules for d_revalidate
implementations.

Yes, the goto is ugly.  A cleanup patch follows.

CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Andreas Dilger <adilger@sun.com>
Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/namei.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c30e33d..b9e7128 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -489,6 +489,7 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	if (!result) {
 		struct dentry *dentry;
 
+do_the_lookup:
 		/* Don't create child dentry for a dead directory. */
 		result = ERR_PTR(-ENOENT);
 		if (IS_DEADDIR(dir))
@@ -512,12 +513,12 @@ out_unlock:
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	mutex_unlock(&dir->i_mutex);
 	if (result->d_op && result->d_op->d_revalidate) {
 		result = do_revalidate(result, nd);
 		if (!result)
-			result = ERR_PTR(-ENOENT);
+			goto do_the_lookup;
 	}
+	mutex_unlock(&dir->i_mutex);
 	return result;
 }
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/2] vfs: clean up real_lookup
  2009-03-19 20:16 [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Sage Weil
@ 2009-03-19 20:16 ` Sage Weil
  2009-03-19 20:22   ` Christoph Hellwig
  2009-03-19 20:23 ` [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Christoph Hellwig
  2009-03-24  4:14 ` Ian Kent
  2 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-03-19 20:16 UTC (permalink / raw)
  To: linux-fsdevel, hch, akpm; +Cc: Sage Weil

Get rid of the goto by flipping the if (!result) over.  Make the
comments a bit more descriptive.  Fix a few kernel style problems.
No functional changes.

Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/namei.c |   65 +++++++++++++++++++++++++++++++----------------------------
 1 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b9e7128..b98d916 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -461,19 +461,20 @@ ok:
  * This is called when everything else fails, and we actually have
  * to go to the low-level filesystem to find out what we should do..
  *
- * We get the directory semaphore, and after getting that we also
+ * We get the directory mutex, and after getting that we also
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static struct dentry *real_lookup(struct dentry *parent, struct qstr *name,
+				  struct nameidata *nd)
 {
-	struct dentry * result;
+	struct dentry *result, *dentry;
 	struct inode *dir = parent->d_inode;
 
 	mutex_lock(&dir->i_mutex);
 	/*
 	 * First re-do the cached lookup just in case it was created
-	 * while we waited for the directory semaphore..
+	 * while we waited for the directory mutex.
 	 *
 	 * FIXME! This could use version numbering or similar to
 	 * avoid unnecessary cache lookups.
@@ -486,38 +487,40 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 	 */
 	result = d_lookup(parent, name);
-	if (!result) {
-		struct dentry *dentry;
-
-do_the_lookup:
-		/* Don't create child dentry for a dead directory. */
-		result = ERR_PTR(-ENOENT);
-		if (IS_DEADDIR(dir))
-			goto out_unlock;
-
-		dentry = d_alloc(parent, name);
-		result = ERR_PTR(-ENOMEM);
-		if (dentry) {
-			result = dir->i_op->lookup(dir, dentry, nd);
+	if (result) {
+		/*
+		 * The cache was re-populated while we waited on the
+		 * mutex.  We need to revalidate, this time while
+		 * holding i_mutex (to avoid another race).
+		 */
+		if (result->d_op && result->d_op->d_revalidate) {
+			result = do_revalidate(result, nd);
 			if (result)
-				dput(dentry);
-			else
-				result = dentry;
+				goto out_unlock;
+			/*
+			 * The dentry was left behind invalid.  Just
+			 * do the lookup.
+			 */
+		} else {
+			goto out_unlock;
 		}
-out_unlock:
-		mutex_unlock(&dir->i_mutex);
-		return result;
 	}
 
-	/*
-	 * Uhhuh! Nasty case: the cache was re-populated while
-	 * we waited on the semaphore. Need to revalidate.
-	 */
-	if (result->d_op && result->d_op->d_revalidate) {
-		result = do_revalidate(result, nd);
-		if (!result)
-			goto do_the_lookup;
+	/* Don't create child dentry for a dead directory. */
+	result = ERR_PTR(-ENOENT);
+	if (IS_DEADDIR(dir))
+		goto out_unlock;
+
+	dentry = d_alloc(parent, name);
+	result = ERR_PTR(-ENOMEM);
+	if (dentry) {
+		result = dir->i_op->lookup(dir, dentry, nd);
+		if (result)
+			dput(dentry);
+		else
+			result = dentry;
 	}
+out_unlock:
 	mutex_unlock(&dir->i_mutex);
 	return result;
 }
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] vfs: clean up real_lookup
  2009-03-19 20:16 ` [PATCH 2/2] vfs: clean up real_lookup Sage Weil
@ 2009-03-19 20:22   ` Christoph Hellwig
  2009-03-19 20:35     ` Sage Weil
  0 siblings, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2009-03-19 20:22 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm

On Thu, Mar 19, 2009 at 01:16:30PM -0700, Sage Weil wrote:
> +	if (result) {
> +		/*
> +		 * The cache was re-populated while we waited on the
> +		 * mutex.  We need to revalidate, this time while
> +		 * holding i_mutex (to avoid another race).
> +		 */
> +		if (result->d_op && result->d_op->d_revalidate) {
> +			result = do_revalidate(result, nd);
>  			if (result)
> +				goto out_unlock;
> +			/*
> +			 * The dentry was left behind invalid.  Just
> +			 * do the lookup.
> +			 */
> +		} else {
> +			goto out_unlock;
>  		}

This would be better off as

		if (!result->d_op || !result->d_op->d_revalidate)
			goto out_unlock;

		result = do_revalidate(result, nd);
		if (result)
			goto out_unlock;

		/*
		 * The dentry was left behind invalid.  Just
		 * do the lookup.
		 */
	
Otherwise looks good to me.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-19 20:16 [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Sage Weil
  2009-03-19 20:16 ` [PATCH 2/2] vfs: clean up real_lookup Sage Weil
@ 2009-03-19 20:23 ` Christoph Hellwig
  2009-03-24  4:14 ` Ian Kent
  2 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2009-03-19 20:23 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Looks good.  


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] vfs: clean up real_lookup
  2009-03-19 20:22   ` Christoph Hellwig
@ 2009-03-19 20:35     ` Sage Weil
  0 siblings, 0 replies; 45+ messages in thread
From: Sage Weil @ 2009-03-19 20:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, akpm

On Thu, 19 Mar 2009, Christoph Hellwig wrote:
> On Thu, Mar 19, 2009 at 01:16:30PM -0700, Sage Weil wrote:
> > +	if (result) {
> > +		/*
> > +		 * The cache was re-populated while we waited on the
> > +		 * mutex.  We need to revalidate, this time while
> > +		 * holding i_mutex (to avoid another race).
> > +		 */
> > +		if (result->d_op && result->d_op->d_revalidate) {
> > +			result = do_revalidate(result, nd);
> >  			if (result)
> > +				goto out_unlock;
> > +			/*
> > +			 * The dentry was left behind invalid.  Just
> > +			 * do the lookup.
> > +			 */
> > +		} else {
> > +			goto out_unlock;
> >  		}
> 
> This would be better off as
> 
> 		if (!result->d_op || !result->d_op->d_revalidate)
> 			goto out_unlock;
> 
> 		result = do_revalidate(result, nd);
> 		if (result)
> 			goto out_unlock;
> 
> 		/*
> 		 * The dentry was left behind invalid.  Just
> 		 * do the lookup.
> 		 */
> 	
> Otherwise looks good to me.

Fixed up below.  Thanks!

sage

---

>From 81c8657b56f95419f8235188e6de1fc2fa73ff14 Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@newdream.net>
Date: Thu, 19 Mar 2009 13:25:12 -0700
Subject: [PATCH 2/2] vfs: clean up real_lookup

Get rid of the goto by flipping the if (!result) over.  Make the
comments a bit more descriptive.  Fix a few kernel style problems.
No functional changes.

Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/namei.c |   63 +++++++++++++++++++++++++++++++----------------------------
 1 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b9e7128..ead15a6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -461,19 +461,20 @@ ok:
  * This is called when everything else fails, and we actually have
  * to go to the low-level filesystem to find out what we should do..
  *
- * We get the directory semaphore, and after getting that we also
+ * We get the directory mutex, and after getting that we also
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static struct dentry *real_lookup(struct dentry *parent, struct qstr *name,
+				  struct nameidata *nd)
 {
-	struct dentry * result;
+	struct dentry *result, *dentry;
 	struct inode *dir = parent->d_inode;
 
 	mutex_lock(&dir->i_mutex);
 	/*
 	 * First re-do the cached lookup just in case it was created
-	 * while we waited for the directory semaphore..
+	 * while we waited for the directory mutex.
 	 *
 	 * FIXME! This could use version numbering or similar to
 	 * avoid unnecessary cache lookups.
@@ -486,38 +487,40 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 	 */
 	result = d_lookup(parent, name);
-	if (!result) {
-		struct dentry *dentry;
+	if (result) {
+		/*
+		 * The cache was re-populated while we waited on the
+		 * mutex.  We need to revalidate, this time while
+		 * holding i_mutex (to avoid another race).
+		 */
+		if (!result->d_op || !result->d_op->d_revalidate)
+			goto out_unlock;
 
-do_the_lookup:
-		/* Don't create child dentry for a dead directory. */
-		result = ERR_PTR(-ENOENT);
-		if (IS_DEADDIR(dir))
+		result = do_revalidate(result, nd);
+		if (result)
 			goto out_unlock;
 
-		dentry = d_alloc(parent, name);
-		result = ERR_PTR(-ENOMEM);
-		if (dentry) {
-			result = dir->i_op->lookup(dir, dentry, nd);
-			if (result)
-				dput(dentry);
-			else
-				result = dentry;
-		}
-out_unlock:
-		mutex_unlock(&dir->i_mutex);
-		return result;
+		/*
+		 * The dentry was left behind invalid.  Just
+		 * do the lookup.
+		 */
 	}
 
-	/*
-	 * Uhhuh! Nasty case: the cache was re-populated while
-	 * we waited on the semaphore. Need to revalidate.
-	 */
-	if (result->d_op && result->d_op->d_revalidate) {
-		result = do_revalidate(result, nd);
-		if (!result)
-			goto do_the_lookup;
+	/* Don't create child dentry for a dead directory. */
+	result = ERR_PTR(-ENOENT);
+	if (IS_DEADDIR(dir))
+		goto out_unlock;
+
+	dentry = d_alloc(parent, name);
+	result = ERR_PTR(-ENOMEM);
+	if (dentry) {
+		result = dir->i_op->lookup(dir, dentry, nd);
+		if (result)
+			dput(dentry);
+		else
+			result = dentry;
 	}
+out_unlock:
 	mutex_unlock(&dir->i_mutex);
 	return result;
 }
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-19 20:16 [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Sage Weil
  2009-03-19 20:16 ` [PATCH 2/2] vfs: clean up real_lookup Sage Weil
  2009-03-19 20:23 ` [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Christoph Hellwig
@ 2009-03-24  4:14 ` Ian Kent
  2009-03-24  4:18   ` Ian Kent
  2 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-03-24  4:14 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Sage Weil wrote:
> real_lookup() is called by do_lookup() if dentry revalidation fails.  If
> the cache is re-populated while waiting for i_mutex, it may find that
> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
> 
> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
> revalidate failed _again_, however, it would give up with -ENOENT.  The
> problem here that network file systems may be invalidating dentries via
> server callbacks, e.g. due to concurrent access from another client, and
> -ENOENT is frequently the wrong answer.

This will be something of a problem for autofs4 (and autofs).
It would require fairly significant changes to the revalidate code.

> 
> This problem has been seen with both Lustre and Ceph.  It seems possible
> to hit this case with NFS as well if the cache lifetime is very short.
> 
> Instead, we should do_revalidate() while i_mutex is still held.  If
> revalidation fails, we can move on to a ->lookup() and ensure a correct
> result without worrying about any subsequent races.
> 
> Note that do_revalidate() is called with i_mutex held elsewhere.  For
> example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
> and possibly others all take the directory i_mutex, and then
> 
> -> lookup_hash
>         -> __lookup_hash
>                 -> cached_lookup
>                         -> do_revalidate
> 
> so this does not introduce any new locking rules for d_revalidate
> implementations.
> 
> Yes, the goto is ugly.  A cleanup patch follows.
> 
> CC: Al Viro <viro@zeniv.linux.org.uk>
> CC: Andreas Dilger <adilger@sun.com>
> Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
> Signed-off-by: Sage Weil <sage@newdream.net>
> ---
>  fs/namei.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index c30e33d..b9e7128 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -489,6 +489,7 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
>  	if (!result) {
>  		struct dentry *dentry;
>  
> +do_the_lookup:
>  		/* Don't create child dentry for a dead directory. */
>  		result = ERR_PTR(-ENOENT);
>  		if (IS_DEADDIR(dir))
> @@ -512,12 +513,12 @@ out_unlock:
>  	 * Uhhuh! Nasty case: the cache was re-populated while
>  	 * we waited on the semaphore. Need to revalidate.
>  	 */
> -	mutex_unlock(&dir->i_mutex);
>  	if (result->d_op && result->d_op->d_revalidate) {
>  		result = do_revalidate(result, nd);
>  		if (!result)
> -			result = ERR_PTR(-ENOENT);
> +			goto do_the_lookup;
>  	}
> +	mutex_unlock(&dir->i_mutex);
>  	return result;
>  }
>  


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-24  4:14 ` Ian Kent
@ 2009-03-24  4:18   ` Ian Kent
  2009-03-25  4:29     ` Sage Weil
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-03-24  4:18 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Ian Kent wrote:
> Sage Weil wrote:
>> real_lookup() is called by do_lookup() if dentry revalidation fails.  If
>> the cache is re-populated while waiting for i_mutex, it may find that
>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
>>
>> Previously, real_lookup() would drop i_mutex and do_revalidate() 
>> again. If
>> revalidate failed _again_, however, it would give up with -ENOENT.  The
>> problem here that network file systems may be invalidating dentries via
>> server callbacks, e.g. due to concurrent access from another client, and
>> -ENOENT is frequently the wrong answer.
> 
> This will be something of a problem for autofs4 (and autofs).
> It would require fairly significant changes to the revalidate code.

Or maybe not that significant, I'm not sure yet.

> 
>>
>> This problem has been seen with both Lustre and Ceph.  It seems possible
>> to hit this case with NFS as well if the cache lifetime is very short.
>>
>> Instead, we should do_revalidate() while i_mutex is still held.  If
>> revalidation fails, we can move on to a ->lookup() and ensure a correct
>> result without worrying about any subsequent races.
>>
>> Note that do_revalidate() is called with i_mutex held elsewhere.  For
>> example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
>> and possibly others all take the directory i_mutex, and then
>>
>> -> lookup_hash
>>         -> __lookup_hash
>>                 -> cached_lookup
>>                         -> do_revalidate
>>
>> so this does not introduce any new locking rules for d_revalidate
>> implementations.
>>
>> Yes, the goto is ugly.  A cleanup patch follows.
>>
>> CC: Al Viro <viro@zeniv.linux.org.uk>
>> CC: Andreas Dilger <adilger@sun.com>
>> Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
>> Signed-off-by: Sage Weil <sage@newdream.net>
>> ---
>>  fs/namei.c |    5 +++--
>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/namei.c b/fs/namei.c
>> index c30e33d..b9e7128 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -489,6 +489,7 @@ static struct dentry * real_lookup(struct dentry * 
>> parent, struct qstr * name, s
>>      if (!result) {
>>          struct dentry *dentry;
>>  
>> +do_the_lookup:
>>          /* Don't create child dentry for a dead directory. */
>>          result = ERR_PTR(-ENOENT);
>>          if (IS_DEADDIR(dir))
>> @@ -512,12 +513,12 @@ out_unlock:
>>       * Uhhuh! Nasty case: the cache was re-populated while
>>       * we waited on the semaphore. Need to revalidate.
>>       */
>> -    mutex_unlock(&dir->i_mutex);
>>      if (result->d_op && result->d_op->d_revalidate) {
>>          result = do_revalidate(result, nd);
>>          if (!result)
>> -            result = ERR_PTR(-ENOENT);
>> +            goto do_the_lookup;
>>      }
>> +    mutex_unlock(&dir->i_mutex);
>>      return result;
>>  }
>>  
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-24  4:18   ` Ian Kent
@ 2009-03-25  4:29     ` Sage Weil
  2009-03-25  6:08       ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-03-25  4:29 UTC (permalink / raw)
  To: Ian Kent; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

On Tue, 24 Mar 2009, Ian Kent wrote:
> Ian Kent wrote:
> > Sage Weil wrote:
> > > real_lookup() is called by do_lookup() if dentry revalidation fails.  If
> > > the cache is re-populated while waiting for i_mutex, it may find that
> > > a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
> > > 
> > > Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
> > > revalidate failed _again_, however, it would give up with -ENOENT.  The
> > > problem here that network file systems may be invalidating dentries via
> > > server callbacks, e.g. due to concurrent access from another client, and
> > > -ENOENT is frequently the wrong answer.
> > 
> > This will be something of a problem for autofs4 (and autofs).
> > It would require fairly significant changes to the revalidate code.
> 
> Or maybe not that significant, I'm not sure yet.

Can you be more specific about your concern?  Since the locking rules 
aren't really changing (see below), the main thing to worry about is the 
loss of the wonky -ENOENT error..

sage


> 
> > 
> > > 
> > > This problem has been seen with both Lustre and Ceph.  It seems possible
> > > to hit this case with NFS as well if the cache lifetime is very short.
> > > 
> > > Instead, we should do_revalidate() while i_mutex is still held.  If
> > > revalidation fails, we can move on to a ->lookup() and ensure a correct
> > > result without worrying about any subsequent races.
> > > 
> > > Note that do_revalidate() is called with i_mutex held elsewhere.  For
> > > example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
> > > and possibly others all take the directory i_mutex, and then
> > > 
> > > -> lookup_hash
> > >         -> __lookup_hash
> > >                 -> cached_lookup
> > >                         -> do_revalidate
> > > 
> > > so this does not introduce any new locking rules for d_revalidate
> > > implementations.
> > > 
> > > Yes, the goto is ugly.  A cleanup patch follows.
> > > 
> > > CC: Al Viro <viro@zeniv.linux.org.uk>
> > > CC: Andreas Dilger <adilger@sun.com>
> > > Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
> > > Signed-off-by: Sage Weil <sage@newdream.net>
> > > ---
> > >  fs/namei.c |    5 +++--
> > >  1 files changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index c30e33d..b9e7128 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -489,6 +489,7 @@ static struct dentry * real_lookup(struct dentry *
> > > parent, struct qstr * name, s
> > >      if (!result) {
> > >          struct dentry *dentry;
> > >  +do_the_lookup:
> > >          /* Don't create child dentry for a dead directory. */
> > >          result = ERR_PTR(-ENOENT);
> > >          if (IS_DEADDIR(dir))
> > > @@ -512,12 +513,12 @@ out_unlock:
> > >       * Uhhuh! Nasty case: the cache was re-populated while
> > >       * we waited on the semaphore. Need to revalidate.
> > >       */
> > > -    mutex_unlock(&dir->i_mutex);
> > >      if (result->d_op && result->d_op->d_revalidate) {
> > >          result = do_revalidate(result, nd);
> > >          if (!result)
> > > -            result = ERR_PTR(-ENOENT);
> > > +            goto do_the_lookup;
> > >      }
> > > +    mutex_unlock(&dir->i_mutex);
> > >      return result;
> > >  }
> > >  
> > 
> > -- 
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-25  4:29     ` Sage Weil
@ 2009-03-25  6:08       ` Ian Kent
  2009-03-25 16:11         ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-03-25  6:08 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Sage Weil wrote:
> On Tue, 24 Mar 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> real_lookup() is called by do_lookup() if dentry revalidation fails.  If
>>>> the cache is re-populated while waiting for i_mutex, it may find that
>>>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
>>>>
>>>> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
>>>> revalidate failed _again_, however, it would give up with -ENOENT.  The
>>>> problem here that network file systems may be invalidating dentries via
>>>> server callbacks, e.g. due to concurrent access from another client, and
>>>> -ENOENT is frequently the wrong answer.
>>> This will be something of a problem for autofs4 (and autofs).
>>> It would require fairly significant changes to the revalidate code.
>> Or maybe not that significant, I'm not sure yet.
> 
> Can you be more specific about your concern?  Since the locking rules 
> aren't really changing (see below), the main thing to worry about is the 
> loss of the wonky -ENOENT error..

Historically, the lock inconsistency has been around for a long time and
autofs(4) has worked around that issue. The situation is that when
autofs(4) needs to send a request back to user space it must be sent
without the mutex held. This callback can be needed in the lookup or in
revalidate, depending on the situation. Currently when the callback is
needed in revalidate, the lock is never held.

If the lock is now held through revalidate then I will need to move the
lock release/re-acquire from the lookup to the revalidate. My initial
concern was that if we need to release and re-aquire the lock more than
once in the revalidate code the complexity of potential deadlock goes up
considerably. Having put so much effort into dealing with deadlock
issues in lookup with just one release and re-acquire I'm worried I'll
end up with more problems.

However, it looks like I will only need to do the release/re-acquire
once in revalidate, so maybe the status-quo will be maintained and no
additional issues will surface.

> 
> sage
> 
> 
>>>> This problem has been seen with both Lustre and Ceph.  It seems possible
>>>> to hit this case with NFS as well if the cache lifetime is very short.
>>>>
>>>> Instead, we should do_revalidate() while i_mutex is still held.  If
>>>> revalidation fails, we can move on to a ->lookup() and ensure a correct
>>>> result without worrying about any subsequent races.
>>>>
>>>> Note that do_revalidate() is called with i_mutex held elsewhere.  For
>>>> example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
>>>> and possibly others all take the directory i_mutex, and then
>>>>
>>>> -> lookup_hash
>>>>         -> __lookup_hash
>>>>                 -> cached_lookup
>>>>                         -> do_revalidate
>>>>
>>>> so this does not introduce any new locking rules for d_revalidate
>>>> implementations.
>>>>
>>>> Yes, the goto is ugly.  A cleanup patch follows.
>>>>
>>>> CC: Al Viro <viro@zeniv.linux.org.uk>
>>>> CC: Andreas Dilger <adilger@sun.com>
>>>> Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
>>>> Signed-off-by: Sage Weil <sage@newdream.net>
>>>> ---
>>>>  fs/namei.c |    5 +++--
>>>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/namei.c b/fs/namei.c
>>>> index c30e33d..b9e7128 100644
>>>> --- a/fs/namei.c
>>>> +++ b/fs/namei.c
>>>> @@ -489,6 +489,7 @@ static struct dentry * real_lookup(struct dentry *
>>>> parent, struct qstr * name, s
>>>>      if (!result) {
>>>>          struct dentry *dentry;
>>>>  +do_the_lookup:
>>>>          /* Don't create child dentry for a dead directory. */
>>>>          result = ERR_PTR(-ENOENT);
>>>>          if (IS_DEADDIR(dir))
>>>> @@ -512,12 +513,12 @@ out_unlock:
>>>>       * Uhhuh! Nasty case: the cache was re-populated while
>>>>       * we waited on the semaphore. Need to revalidate.
>>>>       */
>>>> -    mutex_unlock(&dir->i_mutex);
>>>>      if (result->d_op && result->d_op->d_revalidate) {
>>>>          result = do_revalidate(result, nd);
>>>>          if (!result)
>>>> -            result = ERR_PTR(-ENOENT);
>>>> +            goto do_the_lookup;
>>>>      }
>>>> +    mutex_unlock(&dir->i_mutex);
>>>>      return result;
>>>>  }
>>>>  
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-25  6:08       ` Ian Kent
@ 2009-03-25 16:11         ` Ian Kent
  2009-03-25 19:11           ` Sage Weil
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-03-25 16:11 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Ian Kent wrote:
> Sage Weil wrote:
>> On Tue, 24 Mar 2009, Ian Kent wrote:
>>> Ian Kent wrote:
>>>> Sage Weil wrote:
>>>>> real_lookup() is called by do_lookup() if dentry revalidation fails.  If
>>>>> the cache is re-populated while waiting for i_mutex, it may find that
>>>>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
>>>>>
>>>>> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
>>>>> revalidate failed _again_, however, it would give up with -ENOENT.  The
>>>>> problem here that network file systems may be invalidating dentries via
>>>>> server callbacks, e.g. due to concurrent access from another client, and
>>>>> -ENOENT is frequently the wrong answer.
>>>> This will be something of a problem for autofs4 (and autofs).
>>>> It would require fairly significant changes to the revalidate code.
>>> Or maybe not that significant, I'm not sure yet.
>> Can you be more specific about your concern?  Since the locking rules 
>> aren't really changing (see below), the main thing to worry about is the 
>> loss of the wonky -ENOENT error..
> 
> Historically, the lock inconsistency has been around for a long time and
> autofs(4) has worked around that issue. The situation is that when
> autofs(4) needs to send a request back to user space it must be sent
> without the mutex held. This callback can be needed in the lookup or in
> revalidate, depending on the situation. Currently when the callback is
> needed in revalidate, the lock is never held.
> 
> If the lock is now held through revalidate then I will need to move the
> lock release/re-acquire from the lookup to the revalidate. My initial
> concern was that if we need to release and re-aquire the lock more than
> once in the revalidate code the complexity of potential deadlock goes up
> considerably. Having put so much effort into dealing with deadlock
> issues in lookup with just one release and re-acquire I'm worried I'll
> end up with more problems.
> 
> However, it looks like I will only need to do the release/re-acquire
> once in revalidate, so maybe the status-quo will be maintained and no
> additional issues will surface.
> 

It's just occurred to me that your proposing to hold the mutex over one
of the two calls that can occur as a result of do_lookup(). That's not
good because then we possibly have mixed locking for the same VFS
operation. Last time I tried to deal with this I got into a "am I the
lock owner, can I release it" dilemma.

So, I guess I'm saying that, to do this we would need to hold the lock
for both the calls to revalidate.

But if revalidate returns an error code then it will be returned and so
__link_path_walk() will then return that. At least that is what I
intended when I did it, but maybe I didn't get it quite right for
everyone, for example there would be no call to d_invalidate in that
case. Can you perhaps use that mechanism?

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-25 16:11         ` Ian Kent
@ 2009-03-25 19:11           ` Sage Weil
  2009-03-26  2:09             ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-03-25 19:11 UTC (permalink / raw)
  To: Ian Kent; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

On Thu, 26 Mar 2009, Ian Kent wrote:
> Ian Kent wrote:
> > Sage Weil wrote:
> >> On Tue, 24 Mar 2009, Ian Kent wrote:
> >>> Ian Kent wrote:
> >>>> Sage Weil wrote:
> >>>>> real_lookup() is called by do_lookup() if dentry revalidation fails.  If
> >>>>> the cache is re-populated while waiting for i_mutex, it may find that
> >>>>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
> >>>>>
> >>>>> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
> >>>>> revalidate failed _again_, however, it would give up with -ENOENT.  The
> >>>>> problem here that network file systems may be invalidating dentries via
> >>>>> server callbacks, e.g. due to concurrent access from another client, and
> >>>>> -ENOENT is frequently the wrong answer.
> >>>> This will be something of a problem for autofs4 (and autofs).
> >>>> It would require fairly significant changes to the revalidate code.
> >>> Or maybe not that significant, I'm not sure yet.
> >> Can you be more specific about your concern?  Since the locking rules 
> >> aren't really changing (see below), the main thing to worry about is the 
> >> loss of the wonky -ENOENT error..
> > 
> > Historically, the lock inconsistency has been around for a long time and
> > autofs(4) has worked around that issue. The situation is that when
> > autofs(4) needs to send a request back to user space it must be sent
> > without the mutex held. This callback can be needed in the lookup or in
> > revalidate, depending on the situation. Currently when the callback is
> > needed in revalidate, the lock is never held.
> > 
> > If the lock is now held through revalidate then I will need to move the
> > lock release/re-acquire from the lookup to the revalidate. My initial
> > concern was that if we need to release and re-aquire the lock more than
> > once in the revalidate code the complexity of potential deadlock goes up
> > considerably. Having put so much effort into dealing with deadlock
> > issues in lookup with just one release and re-acquire I'm worried I'll
> > end up with more problems.
> > 
> > However, it looks like I will only need to do the release/re-acquire
> > once in revalidate, so maybe the status-quo will be maintained and no
> > additional issues will surface.
> > 
> 
> It's just occurred to me that your proposing to hold the mutex over one
> of the two calls that can occur as a result of do_lookup(). That's not
> good because then we possibly have mixed locking for the same VFS
> operation. Last time I tried to deal with this I got into a "am I the
> lock owner, can I release it" dilemma.
>
> So, I guess I'm saying that, to do this we would need to hold the lock
> for both the calls to revalidate.

I guess the question is whether consistent revalidate rules from the point 
of do_lookup() are intended.  There are various other paths (through 
lookup_hash() and cached_lookup()) that do take the mutex.  Whether you 
get 'mixed locking' is just a matter of what starting point you choose.

And as far as I can tell, that's a good thing.  In general, we want an 
unlocked revalidate to avoid mutex contention during cached lookups.  But 
we also want locked revalidate for paths where changes are being made to 
avoid races.  It sounds like autofs was just getting lucky before since 
none of the operations that use those paths are supported.

Would it be possible to avoid the upcall in revalidate, and instead fail 
and let the subsequent lookup path do it?  (I'm guessing the upcall 
doesn't happen for _every_ revalidate?)


> But if revalidate returns an error code then it will be returned and so
> __link_path_walk() will then return that. At least that is what I
> intended when I did it, but maybe I didn't get it quite right for
> everyone, for example there would be no call to d_invalidate in that
> case. Can you perhaps use that mechanism?

I'm not sure what you mean.  When this problem currently comes up, 
revalidate doesn't want to return an error... it wants ->lookup() to be 
called again, but instead real_lookup() is returning -ENOENT.  The obvious 
alternative of looping (instead of holding i_mutex over the revalidate) 
would (at least theoretically) be starvation prone.

sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-25 19:11           ` Sage Weil
@ 2009-03-26  2:09             ` Ian Kent
  2009-03-26  3:53               ` Sage Weil
  2009-03-26  3:54               ` Ian Kent
  0 siblings, 2 replies; 45+ messages in thread
From: Ian Kent @ 2009-03-26  2:09 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Sage Weil wrote:
> On Thu, 26 Mar 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> On Tue, 24 Mar 2009, Ian Kent wrote:
>>>>> Ian Kent wrote:
>>>>>> Sage Weil wrote:
>>>>>>> real_lookup() is called by do_lookup() if dentry revalidation fails.  If
>>>>>>> the cache is re-populated while waiting for i_mutex, it may find that
>>>>>>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
>>>>>>>
>>>>>>> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
>>>>>>> revalidate failed _again_, however, it would give up with -ENOENT.  The
>>>>>>> problem here that network file systems may be invalidating dentries via
>>>>>>> server callbacks, e.g. due to concurrent access from another client, and
>>>>>>> -ENOENT is frequently the wrong answer.
>>>>>> This will be something of a problem for autofs4 (and autofs).
>>>>>> It would require fairly significant changes to the revalidate code.
>>>>> Or maybe not that significant, I'm not sure yet.
>>>> Can you be more specific about your concern?  Since the locking rules 
>>>> aren't really changing (see below), the main thing to worry about is the 
>>>> loss of the wonky -ENOENT error..
>>> Historically, the lock inconsistency has been around for a long time and
>>> autofs(4) has worked around that issue. The situation is that when
>>> autofs(4) needs to send a request back to user space it must be sent
>>> without the mutex held. This callback can be needed in the lookup or in
>>> revalidate, depending on the situation. Currently when the callback is
>>> needed in revalidate, the lock is never held.
>>>
>>> If the lock is now held through revalidate then I will need to move the
>>> lock release/re-acquire from the lookup to the revalidate. My initial
>>> concern was that if we need to release and re-aquire the lock more than
>>> once in the revalidate code the complexity of potential deadlock goes up
>>> considerably. Having put so much effort into dealing with deadlock
>>> issues in lookup with just one release and re-acquire I'm worried I'll
>>> end up with more problems.
>>>
>>> However, it looks like I will only need to do the release/re-acquire
>>> once in revalidate, so maybe the status-quo will be maintained and no
>>> additional issues will surface.
>>>
>> It's just occurred to me that your proposing to hold the mutex over one
>> of the two calls that can occur as a result of do_lookup(). That's not
>> good because then we possibly have mixed locking for the same VFS
>> operation. Last time I tried to deal with this I got into a "am I the
>> lock owner, can I release it" dilemma.
>>
>> So, I guess I'm saying that, to do this we would need to hold the lock
>> for both the calls to revalidate.
> 
> I guess the question is whether consistent revalidate rules from the point 
> of do_lookup() are intended.  There are various other paths (through 
> lookup_hash() and cached_lookup()) that do take the mutex.  Whether you 
> get 'mixed locking' is just a matter of what starting point you choose.
> 
> And as far as I can tell, that's a good thing.  In general, we want an 
> unlocked revalidate to avoid mutex contention during cached lookups.  But 
> we also want locked revalidate for paths where changes are being made to 
> avoid races.  It sounds like autofs was just getting lucky before since 
> none of the operations that use those paths are supported.

It's not due to luck.

> 
> Would it be possible to avoid the upcall in revalidate, and instead fail 
> and let the subsequent lookup path do it?  (I'm guessing the upcall 
> doesn't happen for _every_ revalidate?)

Yes, that's right, just every revalidate from processes that aren't the
automount process itself. The normal case is the mount succeeds and
further walks follow the mount from then on until it expires.

It was more than three years ago when I tried to make everything go
through lookup so my memory is pretty unclear now. In the end I think
there was one case in real_lookup() where the lookup was skipped,
revalidate was called and failed but lookup wasn't then called again and
I got an incorrect failure. AFAICR all other code paths that hold the
mutex over lookup and revalidate perform the revalidate first and then
the lookup if that fails, which avoids this case.

> 
> 
>> But if revalidate returns an error code then it will be returned and so
>> __link_path_walk() will then return that. At least that is what I
>> intended when I did it, but maybe I didn't get it quite right for
>> everyone, for example there would be no call to d_invalidate in that
>> case. Can you perhaps use that mechanism?
> 
> I'm not sure what you mean.  When this problem currently comes up, 
> revalidate doesn't want to return an error... it wants ->lookup() to be 
> called again, but instead real_lookup() is returning -ENOENT.  The obvious 
> alternative of looping (instead of holding i_mutex over the revalidate) 
> would (at least theoretically) be starvation prone.

I just don't understand what you need to resolve.
If returning an error isn't what you need then that's just the way it is.

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-26  2:09             ` Ian Kent
@ 2009-03-26  3:53               ` Sage Weil
  2009-03-26  8:00                 ` Ian Kent
  2009-03-26 10:38                 ` Ian Kent
  2009-03-26  3:54               ` Ian Kent
  1 sibling, 2 replies; 45+ messages in thread
From: Sage Weil @ 2009-03-26  3:53 UTC (permalink / raw)
  To: Ian Kent; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

On Thu, 26 Mar 2009, Ian Kent wrote:
> > Would it be possible to avoid the upcall in revalidate, and instead fail 
> > and let the subsequent lookup path do it?  (I'm guessing the upcall 
> > doesn't happen for _every_ revalidate?)
> 
> Yes, that's right, just every revalidate from processes that aren't the
> automount process itself. The normal case is the mount succeeds and
> further walks follow the mount from then on until it expires.
> 
> It was more than three years ago when I tried to make everything go
> through lookup so my memory is pretty unclear now. In the end I think
> there was one case in real_lookup() where the lookup was skipped,
> revalidate was called and failed but lookup wasn't then called again and
> I got an incorrect failure.

That is _exactly_ the bug this patch is fixing.  :)  A (racing) process 
ends up in real_lookup(), takes the mutex and finds the dentry has already 
been added to the cache by someone else.  The mutex is dropped, revalidate 
is called, and if it fails, real_lookup() returns ENOENT (!!) without ever 
trying lookup.  The basic problem is that the fs revalidate might fail, 
expecting lookup to get called, but real_lookup() returns ENOENT 
instead... which is just wrong.

> AFAICR all other code paths that hold the
> mutex over lookup and revalidate perform the revalidate first and then
> the lookup if that fails, which avoids this case.

If you mean the paths autofs manages to avoid (unlinkat, rmdir, etc.), 
yeah: the mutex is taken, then cached_lookup() (and revalidate), then 
lookup if necessary.  Holding the mutex over revalidate avoids dealing 
with various races.

So, it sounds like this fix would need to go in along with an autofs patch 
that moves the upcall back into lookup exclusively, now that a revalidate 
failure does the right thing (always calls lookup).  Hopefully that would 
mean a net simplification on the autofs side as well?

sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-26  2:09             ` Ian Kent
  2009-03-26  3:53               ` Sage Weil
@ 2009-03-26  3:54               ` Ian Kent
  2009-03-26  4:03                 ` Sage Weil
  2009-03-26  5:07                 ` Ian Kent
  1 sibling, 2 replies; 45+ messages in thread
From: Ian Kent @ 2009-03-26  3:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Ian Kent wrote:
> Sage Weil wrote:
>> Would it be possible to avoid the upcall in revalidate, and instead fail 
>> and let the subsequent lookup path do it?  (I'm guessing the upcall 
>> doesn't happen for _every_ revalidate?)

snip ...

> 
> It was more than three years ago when I tried to make everything go
> through lookup so my memory is pretty unclear now. In the end I think
> there was one case in real_lookup() where the lookup was skipped,
> revalidate was called and failed but lookup wasn't then called again and
> I got an incorrect failure. AFAICR all other code paths that hold the
> mutex over lookup and revalidate perform the revalidate first and then
> the lookup if that fails, which avoids this case.

I've managed to locate the work that I did on this (on an old machine I
left as it was for posterity).

I haven't had a chance to look at what I did closely and a lot has
changed in autofs4 since. The one thing that I'm fairly sure I would
need to be able to make this work is the order of the calls in
real_lookup() to be revalidate then lookup. Would that work for
real_lookup() given that we hold the mutex now?

In fact this might be an opportunity for me to clean up my sloppy
DCACHE_AUTOFS_PENDING handling by moving it all under the inode mutex.

Since this is likely to be quite a bit of effort on my part could I
impose on you to help me understand the issue you need to resolve? I'd
really like to feel it's worth the effort.

Ian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-26  3:54               ` Ian Kent
@ 2009-03-26  4:03                 ` Sage Weil
  2009-03-26  5:07                 ` Ian Kent
  1 sibling, 0 replies; 45+ messages in thread
From: Sage Weil @ 2009-03-26  4:03 UTC (permalink / raw)
  To: Ian Kent; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

On Thu, 26 Mar 2009, Ian Kent wrote:
> Ian Kent wrote:
> > Sage Weil wrote:
> >> Would it be possible to avoid the upcall in revalidate, and instead fail 
> >> and let the subsequent lookup path do it?  (I'm guessing the upcall 
> >> doesn't happen for _every_ revalidate?)
> 
> snip ...
> 
> > 
> > It was more than three years ago when I tried to make everything go
> > through lookup so my memory is pretty unclear now. In the end I think
> > there was one case in real_lookup() where the lookup was skipped,
> > revalidate was called and failed but lookup wasn't then called again and
> > I got an incorrect failure. AFAICR all other code paths that hold the
> > mutex over lookup and revalidate perform the revalidate first and then
> > the lookup if that fails, which avoids this case.
> 
> I've managed to locate the work that I did on this (on an old machine I
> left as it was for posterity).
> 
> I haven't had a chance to look at what I did closely and a lot has
> changed in autofs4 since. The one thing that I'm fairly sure I would
> need to be able to make this work is the order of the calls in
> real_lookup() to be revalidate then lookup. Would that work for
> real_lookup() given that we hold the mutex now?

Yep, that's what this patch does (though it's easier to see after the 2/2 
cleanup patch is applied as well).

> In fact this might be an opportunity for me to clean up my sloppy
> DCACHE_AUTOFS_PENDING handling by moving it all under the inode mutex.
> 
> Since this is likely to be quite a bit of effort on my part could I
> impose on you to help me understand the issue you need to resolve? I'd
> really like to feel it's worth the effort.

The basic symptom is sporatic ENOENT errors getting returned to userspace 
when it is clear the file was always there.  In my case, it was caused by 
dentry lease revocation (resulting in a revalidate returning 0) with 
multiple processes doing path traversal.  The root cause is real_lookup() 
return ENOENT on revalidate failure instead of doing a lookup.

sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-26  3:54               ` Ian Kent
  2009-03-26  4:03                 ` Sage Weil
@ 2009-03-26  5:07                 ` Ian Kent
  1 sibling, 0 replies; 45+ messages in thread
From: Ian Kent @ 2009-03-26  5:07 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Ian Kent wrote:
> Ian Kent wrote:
>> Sage Weil wrote:
>>> Would it be possible to avoid the upcall in revalidate, and instead fail 
>>> and let the subsequent lookup path do it?  (I'm guessing the upcall 
>>> doesn't happen for _every_ revalidate?)
> 
> snip ...
> 
>> It was more than three years ago when I tried to make everything go
>> through lookup so my memory is pretty unclear now. In the end I think
>> there was one case in real_lookup() where the lookup was skipped,
>> revalidate was called and failed but lookup wasn't then called again and
>> I got an incorrect failure. AFAICR all other code paths that hold the
>> mutex over lookup and revalidate perform the revalidate first and then
>> the lookup if that fails, which avoids this case.
> 
> I've managed to locate the work that I did on this (on an old machine I
> left as it was for posterity).
> 
> I haven't had a chance to look at what I did closely and a lot has
> changed in autofs4 since. The one thing that I'm fairly sure I would
> need to be able to make this work is the order of the calls in
> real_lookup() to be revalidate then lookup. Would that work for
> real_lookup() given that we hold the mutex now?

DOH, which both patches together end up doing.

> 
> In fact this might be an opportunity for me to clean up my sloppy
> DCACHE_AUTOFS_PENDING handling by moving it all under the inode mutex.

Unfortunately not, I still need to use this in the follow_link() method
without the lock.

Ian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-26  3:53               ` Sage Weil
@ 2009-03-26  8:00                 ` Ian Kent
  2009-03-26 10:38                 ` Ian Kent
  1 sibling, 0 replies; 45+ messages in thread
From: Ian Kent @ 2009-03-26  8:00 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

Sage Weil wrote:
> On Thu, 26 Mar 2009, Ian Kent wrote:
>>> Would it be possible to avoid the upcall in revalidate, and instead fail 
>>> and let the subsequent lookup path do it?  (I'm guessing the upcall 
>>> doesn't happen for _every_ revalidate?)
>> Yes, that's right, just every revalidate from processes that aren't the
>> automount process itself. The normal case is the mount succeeds and
>> further walks follow the mount from then on until it expires.
>>
>> It was more than three years ago when I tried to make everything go
>> through lookup so my memory is pretty unclear now. In the end I think
>> there was one case in real_lookup() where the lookup was skipped,
>> revalidate was called and failed but lookup wasn't then called again and
>> I got an incorrect failure.
> 
> That is _exactly_ the bug this patch is fixing.  :)  A (racing) process 
> ends up in real_lookup(), takes the mutex and finds the dentry has already 
> been added to the cache by someone else.  The mutex is dropped, revalidate 
> is called, and if it fails, real_lookup() returns ENOENT (!!) without ever 
> trying lookup.  The basic problem is that the fs revalidate might fail, 
> expecting lookup to get called, but real_lookup() returns ENOENT 
> instead... which is just wrong.
> 
>> AFAICR all other code paths that hold the
>> mutex over lookup and revalidate perform the revalidate first and then
>> the lookup if that fails, which avoids this case.
> 
> If you mean the paths autofs manages to avoid (unlinkat, rmdir, etc.), 
> yeah: the mutex is taken, then cached_lookup() (and revalidate), then 
> lookup if necessary.  Holding the mutex over revalidate avoids dealing 
> with various races.
> 
> So, it sounds like this fix would need to go in along with an autofs patch 
> that moves the upcall back into lookup exclusively, now that a revalidate 
> failure does the right thing (always calls lookup).  Hopefully that would 
> mean a net simplification on the autofs side as well?

Maybe ... doesn't look like it so far ... but it's to soon to tell.

autofs4 - always use lookup for mount

From: Ian Kent <raven@themaw.net>


---

 fs/autofs4/autofs_i.h |   15 ------
 fs/autofs4/root.c     |  117
++++++++++++++++++++++++++++++++++---------------
 2 files changed, 81 insertions(+), 51 deletions(-)

Ian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-26  3:53               ` Sage Weil
  2009-03-26  8:00                 ` Ian Kent
@ 2009-03-26 10:38                 ` Ian Kent
  2009-03-29  8:53                   ` Ian Kent
  1 sibling, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-03-26 10:38 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-fsdevel, hch, akpm, Al Viro, Andreas Dilger, Yehuda Sadeh

On Wed, 2009-03-25 at 20:53 -0700, Sage Weil wrote:
> So, it sounds like this fix would need to go in along with an autofs patch 
> that moves the upcall back into lookup exclusively, now that a revalidate 
> failure does the right thing (always calls lookup).  Hopefully that would 
> mean a net simplification on the autofs side as well?

Here is a quick and totally untested patch.
I doubt very much that I've got this right yet but it is something like
what I will need to do.

I can easily build this as a module and install it.
I wonder what would happen if I was to use this without your two
patches. That would give me some quick feedback on potential problems.
Thoughts?

Ian

autofs4 - always use lookup for mount

From: Ian Kent <raven@themaw.net>


---

 fs/autofs4/autofs_i.h |   15 ----
 fs/autofs4/root.c     |  199 ++++++++++++++++++++++++++-----------------------
 2 files changed, 106 insertions(+), 108 deletions(-)


diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index b7ff33c..1fd9183 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -95,6 +95,7 @@ struct autofs_info {
 
 #define AUTOFS_INF_EXPIRING	(1<<0) /* dentry is in the process of expiring */
 #define AUTOFS_INF_MOUNTPOINT	(1<<1) /* mountpoint status for direct expire */
+#define AUTOFS_INF_REVALIDATE	(1<<2) /* revalidate sent this mount request */
 
 struct autofs_wait_queue {
 	wait_queue_head_t queue;
@@ -156,20 +157,6 @@ static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) {
 	return sbi->catatonic || task_pgrp_nr(current) == sbi->oz_pgrp;
 }
 
-/* Does a dentry have some pending activity? */
-static inline int autofs4_ispending(struct dentry *dentry)
-{
-	struct autofs_info *inf = autofs4_dentry_ino(dentry);
-
-	if (dentry->d_flags & DCACHE_AUTOFS_PENDING)
-		return 1;
-
-	if (inf->flags & AUTOFS_INF_EXPIRING)
-		return 1;
-
-	return 0;
-}
-
 static inline void autofs4_copy_atime(struct file *src, struct file *dst)
 {
 	dst->f_path.dentry->d_inode->i_atime =
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index e8c55d2..397ca66 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -72,6 +72,14 @@ const struct inode_operations autofs4_dir_inode_operations = {
 	.rmdir		= autofs4_dir_rmdir,
 };
 
+static unsigned int autofs4_need_lookup(unsigned int flags)
+{
+	unsigned int res = 0;
+	if (flags & (TRIGGER_FLAGS | TRIGGER_INTENTS))
+		res = 1;
+	return res;
+}
+
 static int autofs4_dir_open(struct inode *inode, struct file *file)
 {
 	struct dentry *dentry = file->f_path.dentry;
@@ -103,7 +111,7 @@ out:
 	return dcache_dir_open(inode, file);
 }
 
-static int try_to_fill_dentry(struct dentry *dentry, int flags)
+static int try_to_fill_dentry(struct dentry *dentry)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
@@ -116,55 +124,18 @@ static int try_to_fill_dentry(struct dentry *dentry, int flags)
 	 * Wait for a pending mount, triggering one if there
 	 * isn't one already
 	 */
-	if (dentry->d_inode == NULL) {
-		DPRINTK("waiting for mount name=%.*s",
-			 dentry->d_name.len, dentry->d_name.name);
+	DPRINTK("waiting for mount name=%.*s",
+		 dentry->d_name.len, dentry->d_name.name);
 
-		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+	status = autofs4_wait(sbi, dentry, NFY_MOUNT);
 
-		DPRINTK("mount done status=%d", status);
-
-		/* Turn this into a real negative dentry? */
-		if (status == -ENOENT) {
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-			return status;
-		} else if (status) {
-			/* Return a negative dentry, but leave it "pending" */
-			return status;
-		}
-	/* Trigger mount for path component or follow link */
-	} else if (dentry->d_flags & DCACHE_AUTOFS_PENDING ||
-			flags & (TRIGGER_FLAGS | TRIGGER_INTENTS) ||
-			current->link_count) {
-		DPRINTK("waiting for mount name=%.*s",
-			dentry->d_name.len, dentry->d_name.name);
-
-		spin_lock(&dentry->d_lock);
-		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
-		spin_unlock(&dentry->d_lock);
-		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
-
-		DPRINTK("mount done status=%d", status);
-
-		if (status) {
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-			return status;
-		}
-	}
+	DPRINTK("mount done status=%d", status);
 
 	/* Initialize expiry counter after successful mount */
-	if (ino)
+	if (ino && !status)
 		ino->last_used = jiffies;
 
-	spin_lock(&dentry->d_lock);
-	dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-	spin_unlock(&dentry->d_lock);
-
-	return 0;
+	return status;
 }
 
 /* For autofs direct mounts the follow link triggers the mount */
@@ -216,7 +187,16 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
 	    (!d_mountpoint(dentry) && __simple_empty(dentry))) {
 		spin_unlock(&dcache_lock);
 
-		status = try_to_fill_dentry(dentry, 0);
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
+
+		status = try_to_fill_dentry(dentry);
+
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
+
 		if (status)
 			goto out_error;
 
@@ -255,38 +235,39 @@ static int autofs4_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
-	int oz_mode = autofs4_oz_mode(sbi);
-	int flags = nd ? nd->flags : 0;
+	unsigned int flags = nd ? nd->flags : 0;
 	int status = 1;
 
-	/* Pending dentry */
-	spin_lock(&sbi->fs_lock);
-	if (autofs4_ispending(dentry)) {
-		/* The daemon never causes a mount to trigger */
-		spin_unlock(&sbi->fs_lock);
-
-		if (oz_mode)
-			return 1;
+	DPRINTK("name = %.*s oz_mode = %d flags = %d",
+		dentry->d_name.len, dentry->d_name.name,
+		autofs4_oz_mode(sbi), nd ? nd->flags : 0);
 
-		/*
-		 * If the directory has gone away due to an expire
-		 * we have been called as ->d_revalidate() and so
-		 * we need to return false and proceed to ->lookup().
-		 */
-		if (autofs4_expire_wait(dentry) == -EAGAIN)
-			return 0;
+	/* The daemon never causes a mount to trigger */
+	if (autofs4_oz_mode(sbi))
+		return 1;
 
+	spin_lock(&dentry->d_lock);
+	if (dentry->d_flags & DCACHE_AUTOFS_PENDING) {
+		spin_unlock(&dentry->d_lock);
 		/*
-		 * A zero status is success otherwise we have a
-		 * negative error code.
+		 * If the directory mutex lock is held we must hide the
+		 * dentry and proceed to ->lookup() to wait for mount
+		 * completion since we cannot know whether the lock was
+		 * taken by this code path or is held by another.
+		 * Otherwise we can just wait for mount completion.
 		 */
-		status = try_to_fill_dentry(dentry, flags);
-		if (status == 0)
-			return 1;
+		status = 0;
+		if (mutex_is_locked(&dir->i_mutex)) {
+			/* Already pending, send to ->lookup() */
+			d_drop(dentry);
+		} else {
+			if (autofs4_expire_wait(dentry) != -EAGAIN)
+				status = try_to_fill_dentry(dentry);
+		}
 
 		return status;
 	}
-	spin_unlock(&sbi->fs_lock);
+	spin_unlock(&dentry->d_lock);
 
 	/* Negative dentry.. invalidate if "old" */
 	if (dentry->d_inode == NULL)
@@ -299,19 +280,23 @@ static int autofs4_revalidate(struct dentry *dentry, struct nameidata *nd)
 	    __simple_empty(dentry)) {
 		DPRINTK("dentry=%p %.*s, emptydir",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
-		spin_unlock(&dcache_lock);
-
-		/* The daemon never causes a mount to trigger */
-		if (oz_mode)
-			return 1;
 
-		/*
-		 * A zero status is success otherwise we have a
-		 * negative error code.
-		 */
-		status = try_to_fill_dentry(dentry, flags);
-		if (status == 0)
-			return 1;
+		/* Always send new mount requests to lookup */
+		status = 1;
+		spin_lock(&sbi->fs_lock);
+		if (autofs4_need_lookup(flags) || current->link_count) {
+			struct autofs_info *ino = autofs4_dentry_ino(dentry);
+			ino->flags |= AUTOFS_INF_REVALIDATE;
+			spin_lock(&sbi->lookup_lock);
+			list_add(&ino->active, &sbi->active_list);
+			spin_unlock(&sbi->lookup_lock);
+			spin_lock(&dentry->d_lock);
+			__d_drop(dentry);
+			spin_unlock(&dentry->d_lock);
+			status = 0;
+		}
+		spin_unlock(&sbi->fs_lock);
+		spin_unlock(&dcache_lock);
 
 		return status;
 	}
@@ -471,6 +456,7 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 	struct autofs_info *ino;
 	struct dentry *expiring, *unhashed;
 	int oz_mode;
+	int status = 0;
 
 	DPRINTK("name = %.*s",
 		dentry->d_name.len, dentry->d_name.name);
@@ -486,9 +472,10 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 		 current->pid, task_pgrp_nr(current), sbi->catatonic, oz_mode);
 
 	unhashed = autofs4_lookup_active(sbi, dentry->d_parent, &dentry->d_name);
-	if (unhashed)
+	if (unhashed) {
 		dentry = unhashed;
-	else {
+		ino = autofs4_dentry_ino(dentry);
+	} else {
 		/*
 		 * Mark the dentry incomplete but don't hash it. We do this
 		 * to serialize our inode creation operations (symlink and
@@ -522,6 +509,10 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 	}
 
 	if (!oz_mode) {
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
+
 		mutex_unlock(&dir->i_mutex);
 		expiring = autofs4_lookup_expiring(sbi,
 						   dentry->d_parent,
@@ -541,19 +532,34 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 			dput(expiring);
 		}
 
+		status = try_to_fill_dentry(dentry);
+
+		mutex_lock(&dir->i_mutex);
+
 		spin_lock(&dentry->d_lock);
-		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
 		spin_unlock(&dentry->d_lock);
-		if (dentry->d_op && dentry->d_op->d_revalidate)
-			(dentry->d_op->d_revalidate)(dentry, nd);
-		mutex_lock(&dir->i_mutex);
+
+		/*
+		 * If we got a new lookup request from revaidate, remove
+		 * it from the active queue and rehash the dentry.
+		 */
+		spin_lock(&sbi->fs_lock);
+		if (ino->flags & AUTOFS_INF_REVALIDATE) {
+			spin_lock(&sbi->lookup_lock);
+			if (!list_empty(&ino->active))
+				list_del_init(&ino->active);
+			spin_unlock(&sbi->lookup_lock);
+			d_rehash(dentry);
+		}
+		spin_unlock(&sbi->fs_lock);
 	}
 
 	/*
-	 * If we are still pending, check if we had to handle
+	 * If we had a mount fail, check if we had to handle
 	 * a signal. If so we can force a restart..
 	 */
-	if (dentry->d_flags & DCACHE_AUTOFS_PENDING) {
+	if (status) {
 		/* See if we were interrupted */
 		if (signal_pending(current)) {
 			sigset_t *sigset = &current->pending.signal;
@@ -565,11 +571,6 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 			    return ERR_PTR(-ERESTARTNOINTR);
 			}
 		}
-		if (!oz_mode) {
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-		}
 	}
 
 	/*
@@ -599,7 +600,17 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 		return dentry;
 	}
 
-	if (unhashed)
+	/*
+	 * If we had a mount failure, return status to user space.
+	 * If the mount succeeded and we used a dentry from the
+	 * active queue return it.
+	 */
+	if (status) {
+		dentry = ERR_PTR(status);
+		if (unhashed)
+			dput(unhashed);
+		return dentry;
+	} else if (unhashed)
 		return unhashed;
 
 	return NULL;



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-26 10:38                 ` Ian Kent
@ 2009-03-29  8:53                   ` Ian Kent
  2009-04-03  0:58                     ` Sage Weil
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-03-29  8:53 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Andreas Dilger,
	Yehuda Sadeh

On Thu, 26 Mar 2009, Ian Kent wrote:

> On Wed, 2009-03-25 at 20:53 -0700, Sage Weil wrote:
> > So, it sounds like this fix would need to go in along with an autofs patch 
> > that moves the upcall back into lookup exclusively, now that a revalidate 
> > failure does the right thing (always calls lookup).  Hopefully that would 
> > mean a net simplification on the autofs side as well?
> 
> Here is a quick and totally untested patch.
> I doubt very much that I've got this right yet but it is something like
> what I will need to do.
> 
> I can easily build this as a module and install it.
> I wonder what would happen if I was to use this without your two
> patches. That would give me some quick feedback on potential problems.
> Thoughts?

Answer to self!
Latest here works OK.

I haven't finished checking yet but it looks like the patch below works 
OK. I started with a 2.6.29 build with your two patches but it was a 
little broken so I fell back to a Fedora 2.6.27 based kernel without the
two revalidate pacthes to debug it. So I still need to test the result 
against 2.6.29 again. I also don't have any real way to test for the three 
process race we discussed where the revalidate isn't followed by a 
->lookup() but with both of your patches applied that shouldn't be a 
problem (as we discussed).

I've not run checkpatch.pl against the patch either at this stage.

There is a further issue and that is regarding the autofs module.

I can't see updating autofs for this being practical (although I haven't 
actually looked yet). I suspect quite a bit of work would be needed. The 
fact is that autofs isn't used much any more and it really should be 
replaced with the autofs4 module at some point. But that's a fairly tricky 
exercise and will likely cause some user space breakage. It will require 
an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
backward compatibility and will break for any explicit checks for the 
presence of the "autofs4" module.

Suddenly this fairly simple change gets bigger than Ben Hur, sorry!
Thoughts?

Ian

autofs4 - always use lookup for lookup

From: Ian Kent <raven@themaw.net>

We need to be able to hold the directory mutex during revalidate in
some cases. To do this autofs needs to send ->d_revalidate() to lookup
if the mutex is held since it cannot tell if it is the lock owner
or if the lock is held by another path of execution.
---

 fs/autofs4/autofs_i.h |   15 ---
 fs/autofs4/root.c     |  229 ++++++++++++++++++++++++++++---------------------
 2 files changed, 131 insertions(+), 113 deletions(-)


diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index b7ff33c..1fd9183 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -95,6 +95,7 @@ struct autofs_info {
 
 #define AUTOFS_INF_EXPIRING	(1<<0) /* dentry is in the process of expiring */
 #define AUTOFS_INF_MOUNTPOINT	(1<<1) /* mountpoint status for direct expire */
+#define AUTOFS_INF_REVALIDATE	(1<<2) /* revalidate sent this mount request */
 
 struct autofs_wait_queue {
 	wait_queue_head_t queue;
@@ -156,20 +157,6 @@ static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) {
 	return sbi->catatonic || task_pgrp_nr(current) == sbi->oz_pgrp;
 }
 
-/* Does a dentry have some pending activity? */
-static inline int autofs4_ispending(struct dentry *dentry)
-{
-	struct autofs_info *inf = autofs4_dentry_ino(dentry);
-
-	if (dentry->d_flags & DCACHE_AUTOFS_PENDING)
-		return 1;
-
-	if (inf->flags & AUTOFS_INF_EXPIRING)
-		return 1;
-
-	return 0;
-}
-
 static inline void autofs4_copy_atime(struct file *src, struct file *dst)
 {
 	dst->f_path.dentry->d_inode->i_atime =
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index e8c55d2..90c1c87 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -72,6 +72,14 @@ const struct inode_operations autofs4_dir_inode_operations = {
 	.rmdir		= autofs4_dir_rmdir,
 };
 
+static unsigned int autofs4_need_lookup(unsigned int flags)
+{
+	unsigned int res = 0;
+	if (flags & (TRIGGER_FLAGS | TRIGGER_INTENTS))
+		res = 1;
+	return res;
+}
+
 static int autofs4_dir_open(struct inode *inode, struct file *file)
 {
 	struct dentry *dentry = file->f_path.dentry;
@@ -103,7 +111,7 @@ out:
 	return dcache_dir_open(inode, file);
 }
 
-static int try_to_fill_dentry(struct dentry *dentry, int flags)
+static int try_to_fill_dentry(struct dentry *dentry)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
@@ -116,55 +124,18 @@ static int try_to_fill_dentry(struct dentry *dentry, int flags)
 	 * Wait for a pending mount, triggering one if there
 	 * isn't one already
 	 */
-	if (dentry->d_inode == NULL) {
-		DPRINTK("waiting for mount name=%.*s",
-			 dentry->d_name.len, dentry->d_name.name);
-
-		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
-
-		DPRINTK("mount done status=%d", status);
-
-		/* Turn this into a real negative dentry? */
-		if (status == -ENOENT) {
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-			return status;
-		} else if (status) {
-			/* Return a negative dentry, but leave it "pending" */
-			return status;
-		}
-	/* Trigger mount for path component or follow link */
-	} else if (dentry->d_flags & DCACHE_AUTOFS_PENDING ||
-			flags & (TRIGGER_FLAGS | TRIGGER_INTENTS) ||
-			current->link_count) {
-		DPRINTK("waiting for mount name=%.*s",
-			dentry->d_name.len, dentry->d_name.name);
+	DPRINTK("waiting for mount name=%.*s",
+		 dentry->d_name.len, dentry->d_name.name);
 
-		spin_lock(&dentry->d_lock);
-		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
-		spin_unlock(&dentry->d_lock);
-		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+	status = autofs4_wait(sbi, dentry, NFY_MOUNT);
 
-		DPRINTK("mount done status=%d", status);
-
-		if (status) {
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-			return status;
-		}
-	}
+	DPRINTK("mount done status=%d", status);
 
 	/* Initialize expiry counter after successful mount */
-	if (ino)
+	if (ino && !status)
 		ino->last_used = jiffies;
 
-	spin_lock(&dentry->d_lock);
-	dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-	spin_unlock(&dentry->d_lock);
-
-	return 0;
+	return status;
 }
 
 /* For autofs direct mounts the follow link triggers the mount */
@@ -216,7 +187,16 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
 	    (!d_mountpoint(dentry) && __simple_empty(dentry))) {
 		spin_unlock(&dcache_lock);
 
-		status = try_to_fill_dentry(dentry, 0);
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
+
+		status = try_to_fill_dentry(dentry);
+
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
+
 		if (status)
 			goto out_error;
 
@@ -255,42 +235,55 @@ static int autofs4_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
-	int oz_mode = autofs4_oz_mode(sbi);
-	int flags = nd ? nd->flags : 0;
-	int status = 1;
-
-	/* Pending dentry */
-	spin_lock(&sbi->fs_lock);
-	if (autofs4_ispending(dentry)) {
-		/* The daemon never causes a mount to trigger */
-		spin_unlock(&sbi->fs_lock);
+	unsigned int flags = nd ? nd->flags : 0;
+	int status;
 
-		if (oz_mode)
-			return 1;
+	DPRINTK("name = %.*s oz_mode = %d flags = %d",
+		dentry->d_name.len, dentry->d_name.name,
+		autofs4_oz_mode(sbi), nd ? nd->flags : 0);
 
-		/*
-		 * If the directory has gone away due to an expire
-		 * we have been called as ->d_revalidate() and so
-		 * we need to return false and proceed to ->lookup().
-		 */
-		if (autofs4_expire_wait(dentry) == -EAGAIN)
-			return 0;
+	/* The daemon never causes a mount to trigger */
+	if (autofs4_oz_mode(sbi))
+		return 1;
 
+	spin_lock(&dentry->d_lock);
+	if (dentry->d_inode == NULL || dentry->d_flags & DCACHE_AUTOFS_PENDING) {
+		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
+		unsigned int pending = dentry->d_flags & DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
 		/*
-		 * A zero status is success otherwise we have a
-		 * negative error code.
+		 * If the directory mutex lock is held we must hide the
+		 * dentry and proceed to ->lookup() to wait for mount
+		 * completion since we cannot know whether the lock was
+		 * taken by this code path or is held by another.
+		 * Otherwise we can just wait for mount completion.
 		 */
-		status = try_to_fill_dentry(dentry, flags);
-		if (status == 0)
-			return 1;
+		status = 0;
+		if (lock_held) {
+			/* Already pending, send to ->lookup() */
+			d_drop(dentry);
+		} else {
+			/*
+			 * The mutex isn't held so if there's a pending
+			 * mount we wait. If not then we are on the tail
+			 * end of a mount fail, invalidate the dentry and
+			 * return true so we don't go to lookup again.
+			 */
+			if (pending) {
+				if (autofs4_expire_wait(dentry) != -EAGAIN) {
+					status = try_to_fill_dentry(dentry);
+					if (!status)
+						status = 1;
+				}
+			} else {
+				d_invalidate(dentry);
+				status = 1;
+			}
+		}
 
 		return status;
 	}
-	spin_unlock(&sbi->fs_lock);
-
-	/* Negative dentry.. invalidate if "old" */
-	if (dentry->d_inode == NULL)
-		return 0;
+	spin_unlock(&dentry->d_lock);
 
 	/* Check for a non-mountpoint directory with no contents */
 	spin_lock(&dcache_lock);
@@ -299,19 +292,24 @@ static int autofs4_revalidate(struct dentry *dentry, struct nameidata *nd)
 	    __simple_empty(dentry)) {
 		DPRINTK("dentry=%p %.*s, emptydir",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
-		spin_unlock(&dcache_lock);
-
-		/* The daemon never causes a mount to trigger */
-		if (oz_mode)
-			return 1;
 
-		/*
-		 * A zero status is success otherwise we have a
-		 * negative error code.
-		 */
-		status = try_to_fill_dentry(dentry, flags);
-		if (status == 0)
-			return 1;
+		/* Always send new mount requests to lookup */
+		status = 1;
+		spin_lock(&sbi->fs_lock);
+		if (autofs4_need_lookup(flags) || current->link_count) {
+			struct autofs_info *ino = autofs4_dentry_ino(dentry);
+			ino->flags |= AUTOFS_INF_REVALIDATE;
+			spin_lock(&sbi->lookup_lock);
+			if (list_empty(&ino->active))
+				list_add(&ino->active, &sbi->active_list);
+			spin_unlock(&sbi->lookup_lock);
+			spin_lock(&dentry->d_lock);
+			__d_drop(dentry);
+			spin_unlock(&dentry->d_lock);
+			status = 0;
+		}
+		spin_unlock(&sbi->fs_lock);
+		spin_unlock(&dcache_lock);
 
 		return status;
 	}
@@ -471,6 +469,7 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 	struct autofs_info *ino;
 	struct dentry *expiring, *unhashed;
 	int oz_mode;
+	int status = 0;
 
 	DPRINTK("name = %.*s",
 		dentry->d_name.len, dentry->d_name.name);
@@ -486,9 +485,18 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 		 current->pid, task_pgrp_nr(current), sbi->catatonic, oz_mode);
 
 	unhashed = autofs4_lookup_active(sbi, dentry->d_parent, &dentry->d_name);
-	if (unhashed)
+	if (unhashed) {
 		dentry = unhashed;
-	else {
+		/*
+		 * For requests from revalidate we need to rehash the dentry
+		 * here as we don't end up calling mkdir().
+		 */
+		spin_lock(&sbi->fs_lock);
+		ino = autofs4_dentry_ino(dentry);
+		if (ino->flags & AUTOFS_INF_REVALIDATE)
+			d_rehash(dentry);
+		spin_unlock(&sbi->fs_lock);
+	} else {
 		/*
 		 * Mark the dentry incomplete but don't hash it. We do this
 		 * to serialize our inode creation operations (symlink and
@@ -522,6 +530,10 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 	}
 
 	if (!oz_mode) {
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+		spin_unlock(&dentry->d_lock);
+
 		mutex_unlock(&dir->i_mutex);
 		expiring = autofs4_lookup_expiring(sbi,
 						   dentry->d_parent,
@@ -541,19 +553,33 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 			dput(expiring);
 		}
 
+		status = try_to_fill_dentry(dentry);
+
+		mutex_lock(&dir->i_mutex);
+
 		spin_lock(&dentry->d_lock);
-		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
+		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
 		spin_unlock(&dentry->d_lock);
-		if (dentry->d_op && dentry->d_op->d_revalidate)
-			(dentry->d_op->d_revalidate)(dentry, nd);
-		mutex_lock(&dir->i_mutex);
+
+		/*
+		 * If we got a new lookup request from revaidate, remove
+		 * it from the active queue and rehash the dentry.
+		 */
+		spin_lock(&sbi->fs_lock);
+		if (ino->flags & AUTOFS_INF_REVALIDATE) {
+			spin_lock(&sbi->lookup_lock);
+			if (!list_empty(&ino->active))
+				list_del_init(&ino->active);
+			spin_unlock(&sbi->lookup_lock);
+		}
+		spin_unlock(&sbi->fs_lock);
 	}
 
 	/*
-	 * If we are still pending, check if we had to handle
+	 * If we had a mount fail, check if we had to handle
 	 * a signal. If so we can force a restart..
 	 */
-	if (dentry->d_flags & DCACHE_AUTOFS_PENDING) {
+	if (status) {
 		/* See if we were interrupted */
 		if (signal_pending(current)) {
 			sigset_t *sigset = &current->pending.signal;
@@ -565,11 +591,6 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 			    return ERR_PTR(-ERESTARTNOINTR);
 			}
 		}
-		if (!oz_mode) {
-			spin_lock(&dentry->d_lock);
-			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
-			spin_unlock(&dentry->d_lock);
-		}
 	}
 
 	/*
@@ -599,7 +620,17 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 		return dentry;
 	}
 
-	if (unhashed)
+	/*
+	 * If we had a mount failure, return status to user space.
+	 * If the mount succeeded and we used a dentry from the
+	 * active queue return it.
+	 */
+	if (status) {
+		dentry = ERR_PTR(status);
+		if (unhashed)
+			dput(unhashed);
+		return dentry;
+	} else if (unhashed)
 		return unhashed;
 
 	return NULL;

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-03-29  8:53                   ` Ian Kent
@ 2009-04-03  0:58                     ` Sage Weil
  2009-04-03  2:00                       ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-04-03  0:58 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Andreas Dilger,
	Yehuda Sadeh

> Latest here works OK.
> 
> I haven't finished checking yet but it looks like the patch below works 
> OK. I started with a 2.6.29 build with your two patches but it was a 
> little broken so I fell back to a Fedora 2.6.27 based kernel without the
> two revalidate pacthes to debug it. So I still need to test the result 
> against 2.6.29 again. I also don't have any real way to test for the three 
> process race we discussed where the revalidate isn't followed by a 
> ->lookup() but with both of your patches applied that shouldn't be a 
> problem (as we discussed).
> 
> I've not run checkpatch.pl against the patch either at this stage.

That's good news...
 
> There is a further issue and that is regarding the autofs module.
> 
> I can't see updating autofs for this being practical (although I haven't 
> actually looked yet). I suspect quite a bit of work would be needed. The 
> fact is that autofs isn't used much any more and it really should be 
> replaced with the autofs4 module at some point. But that's a fairly tricky 
> exercise and will likely cause some user space breakage. It will require 
> an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
> backward compatibility and will break for any explicit checks for the 
> presence of the "autofs4" module.

Hmm.  Well, I assume autofs needs to work properly before this gets 
changed, though, right?  Should I see what I can do with it?  I took a 
quick look, and I don't think it will take too much to make it behave.  
It looks like the main thing is to make the lookup call to try_fill_dentry 
return any existing dentry in place of the one the vfs provides.

sage


> Suddenly this fairly simple change gets bigger than Ben Hur, sorry!
> Thoughts?
> 
> Ian
> 
> autofs4 - always use lookup for lookup
> 
> From: Ian Kent <raven@themaw.net>
> 
> We need to be able to hold the directory mutex during revalidate in
> some cases. To do this autofs needs to send ->d_revalidate() to lookup
> if the mutex is held since it cannot tell if it is the lock owner
> or if the lock is held by another path of execution.
> ---
> 
>  fs/autofs4/autofs_i.h |   15 ---
>  fs/autofs4/root.c     |  229 ++++++++++++++++++++++++++++---------------------
>  2 files changed, 131 insertions(+), 113 deletions(-)
> 
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index b7ff33c..1fd9183 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -95,6 +95,7 @@ struct autofs_info {
>  
>  #define AUTOFS_INF_EXPIRING	(1<<0) /* dentry is in the process of expiring */
>  #define AUTOFS_INF_MOUNTPOINT	(1<<1) /* mountpoint status for direct expire */
> +#define AUTOFS_INF_REVALIDATE	(1<<2) /* revalidate sent this mount request */
>  
>  struct autofs_wait_queue {
>  	wait_queue_head_t queue;
> @@ -156,20 +157,6 @@ static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) {
>  	return sbi->catatonic || task_pgrp_nr(current) == sbi->oz_pgrp;
>  }
>  
> -/* Does a dentry have some pending activity? */
> -static inline int autofs4_ispending(struct dentry *dentry)
> -{
> -	struct autofs_info *inf = autofs4_dentry_ino(dentry);
> -
> -	if (dentry->d_flags & DCACHE_AUTOFS_PENDING)
> -		return 1;
> -
> -	if (inf->flags & AUTOFS_INF_EXPIRING)
> -		return 1;
> -
> -	return 0;
> -}
> -
>  static inline void autofs4_copy_atime(struct file *src, struct file *dst)
>  {
>  	dst->f_path.dentry->d_inode->i_atime =
> diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
> index e8c55d2..90c1c87 100644
> --- a/fs/autofs4/root.c
> +++ b/fs/autofs4/root.c
> @@ -72,6 +72,14 @@ const struct inode_operations autofs4_dir_inode_operations = {
>  	.rmdir		= autofs4_dir_rmdir,
>  };
>  
> +static unsigned int autofs4_need_lookup(unsigned int flags)
> +{
> +	unsigned int res = 0;
> +	if (flags & (TRIGGER_FLAGS | TRIGGER_INTENTS))
> +		res = 1;
> +	return res;
> +}
> +
>  static int autofs4_dir_open(struct inode *inode, struct file *file)
>  {
>  	struct dentry *dentry = file->f_path.dentry;
> @@ -103,7 +111,7 @@ out:
>  	return dcache_dir_open(inode, file);
>  }
>  
> -static int try_to_fill_dentry(struct dentry *dentry, int flags)
> +static int try_to_fill_dentry(struct dentry *dentry)
>  {
>  	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
>  	struct autofs_info *ino = autofs4_dentry_ino(dentry);
> @@ -116,55 +124,18 @@ static int try_to_fill_dentry(struct dentry *dentry, int flags)
>  	 * Wait for a pending mount, triggering one if there
>  	 * isn't one already
>  	 */
> -	if (dentry->d_inode == NULL) {
> -		DPRINTK("waiting for mount name=%.*s",
> -			 dentry->d_name.len, dentry->d_name.name);
> -
> -		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
> -
> -		DPRINTK("mount done status=%d", status);
> -
> -		/* Turn this into a real negative dentry? */
> -		if (status == -ENOENT) {
> -			spin_lock(&dentry->d_lock);
> -			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
> -			spin_unlock(&dentry->d_lock);
> -			return status;
> -		} else if (status) {
> -			/* Return a negative dentry, but leave it "pending" */
> -			return status;
> -		}
> -	/* Trigger mount for path component or follow link */
> -	} else if (dentry->d_flags & DCACHE_AUTOFS_PENDING ||
> -			flags & (TRIGGER_FLAGS | TRIGGER_INTENTS) ||
> -			current->link_count) {
> -		DPRINTK("waiting for mount name=%.*s",
> -			dentry->d_name.len, dentry->d_name.name);
> +	DPRINTK("waiting for mount name=%.*s",
> +		 dentry->d_name.len, dentry->d_name.name);
>  
> -		spin_lock(&dentry->d_lock);
> -		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
> -		spin_unlock(&dentry->d_lock);
> -		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
> +	status = autofs4_wait(sbi, dentry, NFY_MOUNT);
>  
> -		DPRINTK("mount done status=%d", status);
> -
> -		if (status) {
> -			spin_lock(&dentry->d_lock);
> -			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
> -			spin_unlock(&dentry->d_lock);
> -			return status;
> -		}
> -	}
> +	DPRINTK("mount done status=%d", status);
>  
>  	/* Initialize expiry counter after successful mount */
> -	if (ino)
> +	if (ino && !status)
>  		ino->last_used = jiffies;
>  
> -	spin_lock(&dentry->d_lock);
> -	dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
> -	spin_unlock(&dentry->d_lock);
> -
> -	return 0;
> +	return status;
>  }
>  
>  /* For autofs direct mounts the follow link triggers the mount */
> @@ -216,7 +187,16 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
>  	    (!d_mountpoint(dentry) && __simple_empty(dentry))) {
>  		spin_unlock(&dcache_lock);
>  
> -		status = try_to_fill_dentry(dentry, 0);
> +		spin_lock(&dentry->d_lock);
> +		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
> +		spin_unlock(&dentry->d_lock);
> +
> +		status = try_to_fill_dentry(dentry);
> +
> +		spin_lock(&dentry->d_lock);
> +		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
> +		spin_unlock(&dentry->d_lock);
> +
>  		if (status)
>  			goto out_error;
>  
> @@ -255,42 +235,55 @@ static int autofs4_revalidate(struct dentry *dentry, struct nameidata *nd)
>  {
>  	struct inode *dir = dentry->d_parent->d_inode;
>  	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
> -	int oz_mode = autofs4_oz_mode(sbi);
> -	int flags = nd ? nd->flags : 0;
> -	int status = 1;
> -
> -	/* Pending dentry */
> -	spin_lock(&sbi->fs_lock);
> -	if (autofs4_ispending(dentry)) {
> -		/* The daemon never causes a mount to trigger */
> -		spin_unlock(&sbi->fs_lock);
> +	unsigned int flags = nd ? nd->flags : 0;
> +	int status;
>  
> -		if (oz_mode)
> -			return 1;
> +	DPRINTK("name = %.*s oz_mode = %d flags = %d",
> +		dentry->d_name.len, dentry->d_name.name,
> +		autofs4_oz_mode(sbi), nd ? nd->flags : 0);
>  
> -		/*
> -		 * If the directory has gone away due to an expire
> -		 * we have been called as ->d_revalidate() and so
> -		 * we need to return false and proceed to ->lookup().
> -		 */
> -		if (autofs4_expire_wait(dentry) == -EAGAIN)
> -			return 0;
> +	/* The daemon never causes a mount to trigger */
> +	if (autofs4_oz_mode(sbi))
> +		return 1;
>  
> +	spin_lock(&dentry->d_lock);
> +	if (dentry->d_inode == NULL || dentry->d_flags & DCACHE_AUTOFS_PENDING) {
> +		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
> +		unsigned int pending = dentry->d_flags & DCACHE_AUTOFS_PENDING;
> +		spin_unlock(&dentry->d_lock);
>  		/*
> -		 * A zero status is success otherwise we have a
> -		 * negative error code.
> +		 * If the directory mutex lock is held we must hide the
> +		 * dentry and proceed to ->lookup() to wait for mount
> +		 * completion since we cannot know whether the lock was
> +		 * taken by this code path or is held by another.
> +		 * Otherwise we can just wait for mount completion.
>  		 */
> -		status = try_to_fill_dentry(dentry, flags);
> -		if (status == 0)
> -			return 1;
> +		status = 0;
> +		if (lock_held) {
> +			/* Already pending, send to ->lookup() */
> +			d_drop(dentry);
> +		} else {
> +			/*
> +			 * The mutex isn't held so if there's a pending
> +			 * mount we wait. If not then we are on the tail
> +			 * end of a mount fail, invalidate the dentry and
> +			 * return true so we don't go to lookup again.
> +			 */
> +			if (pending) {
> +				if (autofs4_expire_wait(dentry) != -EAGAIN) {
> +					status = try_to_fill_dentry(dentry);
> +					if (!status)
> +						status = 1;
> +				}
> +			} else {
> +				d_invalidate(dentry);
> +				status = 1;
> +			}
> +		}
>  
>  		return status;
>  	}
> -	spin_unlock(&sbi->fs_lock);
> -
> -	/* Negative dentry.. invalidate if "old" */
> -	if (dentry->d_inode == NULL)
> -		return 0;
> +	spin_unlock(&dentry->d_lock);
>  
>  	/* Check for a non-mountpoint directory with no contents */
>  	spin_lock(&dcache_lock);
> @@ -299,19 +292,24 @@ static int autofs4_revalidate(struct dentry *dentry, struct nameidata *nd)
>  	    __simple_empty(dentry)) {
>  		DPRINTK("dentry=%p %.*s, emptydir",
>  			 dentry, dentry->d_name.len, dentry->d_name.name);
> -		spin_unlock(&dcache_lock);
> -
> -		/* The daemon never causes a mount to trigger */
> -		if (oz_mode)
> -			return 1;
>  
> -		/*
> -		 * A zero status is success otherwise we have a
> -		 * negative error code.
> -		 */
> -		status = try_to_fill_dentry(dentry, flags);
> -		if (status == 0)
> -			return 1;
> +		/* Always send new mount requests to lookup */
> +		status = 1;
> +		spin_lock(&sbi->fs_lock);
> +		if (autofs4_need_lookup(flags) || current->link_count) {
> +			struct autofs_info *ino = autofs4_dentry_ino(dentry);
> +			ino->flags |= AUTOFS_INF_REVALIDATE;
> +			spin_lock(&sbi->lookup_lock);
> +			if (list_empty(&ino->active))
> +				list_add(&ino->active, &sbi->active_list);
> +			spin_unlock(&sbi->lookup_lock);
> +			spin_lock(&dentry->d_lock);
> +			__d_drop(dentry);
> +			spin_unlock(&dentry->d_lock);
> +			status = 0;
> +		}
> +		spin_unlock(&sbi->fs_lock);
> +		spin_unlock(&dcache_lock);
>  
>  		return status;
>  	}
> @@ -471,6 +469,7 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
>  	struct autofs_info *ino;
>  	struct dentry *expiring, *unhashed;
>  	int oz_mode;
> +	int status = 0;
>  
>  	DPRINTK("name = %.*s",
>  		dentry->d_name.len, dentry->d_name.name);
> @@ -486,9 +485,18 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
>  		 current->pid, task_pgrp_nr(current), sbi->catatonic, oz_mode);
>  
>  	unhashed = autofs4_lookup_active(sbi, dentry->d_parent, &dentry->d_name);
> -	if (unhashed)
> +	if (unhashed) {
>  		dentry = unhashed;
> -	else {
> +		/*
> +		 * For requests from revalidate we need to rehash the dentry
> +		 * here as we don't end up calling mkdir().
> +		 */
> +		spin_lock(&sbi->fs_lock);
> +		ino = autofs4_dentry_ino(dentry);
> +		if (ino->flags & AUTOFS_INF_REVALIDATE)
> +			d_rehash(dentry);
> +		spin_unlock(&sbi->fs_lock);
> +	} else {
>  		/*
>  		 * Mark the dentry incomplete but don't hash it. We do this
>  		 * to serialize our inode creation operations (symlink and
> @@ -522,6 +530,10 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
>  	}
>  
>  	if (!oz_mode) {
> +		spin_lock(&dentry->d_lock);
> +		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
> +		spin_unlock(&dentry->d_lock);
> +
>  		mutex_unlock(&dir->i_mutex);
>  		expiring = autofs4_lookup_expiring(sbi,
>  						   dentry->d_parent,
> @@ -541,19 +553,33 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
>  			dput(expiring);
>  		}
>  
> +		status = try_to_fill_dentry(dentry);
> +
> +		mutex_lock(&dir->i_mutex);
> +
>  		spin_lock(&dentry->d_lock);
> -		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
> +		dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
>  		spin_unlock(&dentry->d_lock);
> -		if (dentry->d_op && dentry->d_op->d_revalidate)
> -			(dentry->d_op->d_revalidate)(dentry, nd);
> -		mutex_lock(&dir->i_mutex);
> +
> +		/*
> +		 * If we got a new lookup request from revaidate, remove
> +		 * it from the active queue and rehash the dentry.
> +		 */
> +		spin_lock(&sbi->fs_lock);
> +		if (ino->flags & AUTOFS_INF_REVALIDATE) {
> +			spin_lock(&sbi->lookup_lock);
> +			if (!list_empty(&ino->active))
> +				list_del_init(&ino->active);
> +			spin_unlock(&sbi->lookup_lock);
> +		}
> +		spin_unlock(&sbi->fs_lock);
>  	}
>  
>  	/*
> -	 * If we are still pending, check if we had to handle
> +	 * If we had a mount fail, check if we had to handle
>  	 * a signal. If so we can force a restart..
>  	 */
> -	if (dentry->d_flags & DCACHE_AUTOFS_PENDING) {
> +	if (status) {
>  		/* See if we were interrupted */
>  		if (signal_pending(current)) {
>  			sigset_t *sigset = &current->pending.signal;
> @@ -565,11 +591,6 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
>  			    return ERR_PTR(-ERESTARTNOINTR);
>  			}
>  		}
> -		if (!oz_mode) {
> -			spin_lock(&dentry->d_lock);
> -			dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
> -			spin_unlock(&dentry->d_lock);
> -		}
>  	}
>  
>  	/*
> @@ -599,7 +620,17 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
>  		return dentry;
>  	}
>  
> -	if (unhashed)
> +	/*
> +	 * If we had a mount failure, return status to user space.
> +	 * If the mount succeeded and we used a dentry from the
> +	 * active queue return it.
> +	 */
> +	if (status) {
> +		dentry = ERR_PTR(status);
> +		if (unhashed)
> +			dput(unhashed);
> +		return dentry;
> +	} else if (unhashed)
>  		return unhashed;
>  
>  	return NULL;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-04-03  0:58                     ` Sage Weil
@ 2009-04-03  2:00                       ` Ian Kent
  2009-04-03  3:07                         ` Sage Weil
  2009-06-22 17:15                         ` Sage Weil
  0 siblings, 2 replies; 45+ messages in thread
From: Ian Kent @ 2009-04-03  2:00 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Andreas Dilger,
	Yehuda Sadeh, H. Peter Anvin

Sage Weil wrote:
>> Latest here works OK.
>>
>> I haven't finished checking yet but it looks like the patch below works 
>> OK. I started with a 2.6.29 build with your two patches but it was a 
>> little broken so I fell back to a Fedora 2.6.27 based kernel without the
>> two revalidate pacthes to debug it. So I still need to test the result 
>> against 2.6.29 again. I also don't have any real way to test for the three 
>> process race we discussed where the revalidate isn't followed by a 
>> ->lookup() but with both of your patches applied that shouldn't be a 
>> problem (as we discussed).
>>
>> I've not run checkpatch.pl against the patch either at this stage.
> 
> That's good news...

I'm still working on this too.
I have some pressing work so it may be a while before I'm totally happy
with the patch. Didn't you say you were expecting a 2.6.31 time frame
for this?

>  
>> There is a further issue and that is regarding the autofs module.
>>
>> I can't see updating autofs for this being practical (although I haven't 
>> actually looked yet). I suspect quite a bit of work would be needed. The 
>> fact is that autofs isn't used much any more and it really should be 
>> replaced with the autofs4 module at some point. But that's a fairly tricky 
>> exercise and will likely cause some user space breakage. It will require 
>> an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
>> backward compatibility and will break for any explicit checks for the 
>> presence of the "autofs4" module.
> 
> Hmm.  Well, I assume autofs needs to work properly before this gets 
> changed, though, right?  Should I see what I can do with it?  I took a 
> quick look, and I don't think it will take too much to make it behave.  
> It looks like the main thing is to make the lookup call to try_fill_dentry 
> return any existing dentry in place of the one the vfs provides.

Yes, or be replaced by what is currently the autofs4 module. The autofs
v2 communication protocol surely can't be being used any more and the
autofs4 module supports versions 3, 4 and 5. In fact I received a mail
from HPA recently suggesting he supports doing this.

I had a quick look as well. I think you'll find it isn't quite as simple
as that. I'll have a closer look as soon as I get a chance.


Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-04-03  2:00                       ` Ian Kent
@ 2009-04-03  3:07                         ` Sage Weil
  2009-06-22 17:15                         ` Sage Weil
  1 sibling, 0 replies; 45+ messages in thread
From: Sage Weil @ 2009-04-03  3:07 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Andreas Dilger,
	Yehuda Sadeh, H. Peter Anvin

On Fri, 3 Apr 2009, Ian Kent wrote:
> Sage Weil wrote:
> >> Latest here works OK.
> >>
> >> I haven't finished checking yet but it looks like the patch below works 
> >> OK. I started with a 2.6.29 build with your two patches but it was a 
> >> little broken so I fell back to a Fedora 2.6.27 based kernel without the
> >> two revalidate pacthes to debug it. So I still need to test the result 
> >> against 2.6.29 again. I also don't have any real way to test for the three 
> >> process race we discussed where the revalidate isn't followed by a 
> >> ->lookup() but with both of your patches applied that shouldn't be a 
> >> problem (as we discussed).
> >>
> >> I've not run checkpatch.pl against the patch either at this stage.
> > 
> > That's good news...
> 
> I'm still working on this too.
> I have some pressing work so it may be a while before I'm totally happy
> with the patch. Didn't you say you were expecting a 2.6.31 time frame
> for this?

I'd be happy with 2.6.31.  I think Andrew originally had it queued for 
.30, but that was before anybody realized autofs was affected.

> >> There is a further issue and that is regarding the autofs module.
> >>
> >> I can't see updating autofs for this being practical (although I haven't 
> >> actually looked yet). I suspect quite a bit of work would be needed. The 
> >> fact is that autofs isn't used much any more and it really should be 
> >> replaced with the autofs4 module at some point. But that's a fairly tricky 
> >> exercise and will likely cause some user space breakage. It will require 
> >> an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
> >> backward compatibility and will break for any explicit checks for the 
> >> presence of the "autofs4" module.
> > 
> > Hmm.  Well, I assume autofs needs to work properly before this gets 
> > changed, though, right?  Should I see what I can do with it?  I took a 
> > quick look, and I don't think it will take too much to make it behave.  
> > It looks like the main thing is to make the lookup call to try_fill_dentry 
> > return any existing dentry in place of the one the vfs provides.
> 
> Yes, or be replaced by what is currently the autofs4 module. The autofs
> v2 communication protocol surely can't be being used any more and the
> autofs4 module supports versions 3, 4 and 5. In fact I received a mail
> from HPA recently suggesting he supports doing this.

That would certainly simplify things...

sage



> I had a quick look as well. I think you'll find it isn't quite as simple
> as that. I'll have a closer look as soon as I get a chance.
> 
> 
> Ian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-04-03  2:00                       ` Ian Kent
  2009-04-03  3:07                         ` Sage Weil
@ 2009-06-22 17:15                         ` Sage Weil
  2009-06-23  0:37                           ` Ian Kent
  1 sibling, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-06-22 17:15 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Hi Ian,

Have you had a chance to look at getting autofs4 lookup/revalidate 
adjusted so that this real_lookup() fix[1] can go in?

Please let me know if there is anything I can do to help here.  If you're 
still occupied, I'm happy to spin something up and send it your way... 
just let me know.

thanks-
sage


[1] http://marc.info/?l=linux-fsdevel&m=123749395609697&w=2


On Fri, 3 Apr 2009, Ian Kent wrote:

> Sage Weil wrote:
> >> Latest here works OK.
> >>
> >> I haven't finished checking yet but it looks like the patch below works 
> >> OK. I started with a 2.6.29 build with your two patches but it was a 
> >> little broken so I fell back to a Fedora 2.6.27 based kernel without the
> >> two revalidate pacthes to debug it. So I still need to test the result 
> >> against 2.6.29 again. I also don't have any real way to test for the three 
> >> process race we discussed where the revalidate isn't followed by a 
> >> ->lookup() but with both of your patches applied that shouldn't be a 
> >> problem (as we discussed).
> >>
> >> I've not run checkpatch.pl against the patch either at this stage.
> > 
> > That's good news...
> 
> I'm still working on this too.
> I have some pressing work so it may be a while before I'm totally happy
> with the patch. Didn't you say you were expecting a 2.6.31 time frame
> for this?
> 
> >  
> >> There is a further issue and that is regarding the autofs module.
> >>
> >> I can't see updating autofs for this being practical (although I haven't 
> >> actually looked yet). I suspect quite a bit of work would be needed. The 
> >> fact is that autofs isn't used much any more and it really should be 
> >> replaced with the autofs4 module at some point. But that's a fairly tricky 
> >> exercise and will likely cause some user space breakage. It will require 
> >> an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
> >> backward compatibility and will break for any explicit checks for the 
> >> presence of the "autofs4" module.
> > 
> > Hmm.  Well, I assume autofs needs to work properly before this gets 
> > changed, though, right?  Should I see what I can do with it?  I took a 
> > quick look, and I don't think it will take too much to make it behave.  
> > It looks like the main thing is to make the lookup call to try_fill_dentry 
> > return any existing dentry in place of the one the vfs provides.
> 
> Yes, or be replaced by what is currently the autofs4 module. The autofs
> v2 communication protocol surely can't be being used any more and the
> autofs4 module supports versions 3, 4 and 5. In fact I received a mail
> from HPA recently suggesting he supports doing this.
> 
> I had a quick look as well. I think you'll find it isn't quite as simple
> as that. I'll have a closer look as soon as I get a chance.
> 
> 
> Ian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-22 17:15                         ` Sage Weil
@ 2009-06-23  0:37                           ` Ian Kent
  2009-06-23  2:40                             ` H. Peter Anvin
                                               ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Ian Kent @ 2009-06-23  0:37 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> Hi Ian,
> 
> Have you had a chance to look at getting autofs4 lookup/revalidate 
> adjusted so that this real_lookup() fix[1] can go in?
> 
> Please let me know if there is anything I can do to help here.  If you're 
> still occupied, I'm happy to spin something up and send it your way... 
> just let me know.

Sorry, I haven't had time to do more on this.
There is also the issue of what to do about removing the autofs module
and renaming autofs4 to autofs, as this will break the autofs module.

I did start contacting people I think would want to know about this but
haven't gone further than an initial mail.

The other thing is that this patch was originally written quite a while
ago and, although it appears to work ok, I'm not sure it's quite what we
need.

Sorry for delaying you.

> 
> thanks-
> sage
> 
> 
> [1] http://marc.info/?l=linux-fsdevel&m=123749395609697&w=2
> 
> 
> On Fri, 3 Apr 2009, Ian Kent wrote:
> 
>> Sage Weil wrote:
>>>> Latest here works OK.
>>>>
>>>> I haven't finished checking yet but it looks like the patch below works 
>>>> OK. I started with a 2.6.29 build with your two patches but it was a 
>>>> little broken so I fell back to a Fedora 2.6.27 based kernel without the
>>>> two revalidate pacthes to debug it. So I still need to test the result 
>>>> against 2.6.29 again. I also don't have any real way to test for the three 
>>>> process race we discussed where the revalidate isn't followed by a 
>>>> ->lookup() but with both of your patches applied that shouldn't be a 
>>>> problem (as we discussed).
>>>>
>>>> I've not run checkpatch.pl against the patch either at this stage.
>>> That's good news...
>> I'm still working on this too.
>> I have some pressing work so it may be a while before I'm totally happy
>> with the patch. Didn't you say you were expecting a 2.6.31 time frame
>> for this?
>>
>>>  
>>>> There is a further issue and that is regarding the autofs module.
>>>>
>>>> I can't see updating autofs for this being practical (although I haven't 
>>>> actually looked yet). I suspect quite a bit of work would be needed. The 
>>>> fact is that autofs isn't used much any more and it really should be 
>>>> replaced with the autofs4 module at some point. But that's a fairly tricky 
>>>> exercise and will likely cause some user space breakage. It will require 
>>>> an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
>>>> backward compatibility and will break for any explicit checks for the 
>>>> presence of the "autofs4" module.
>>> Hmm.  Well, I assume autofs needs to work properly before this gets 
>>> changed, though, right?  Should I see what I can do with it?  I took a 
>>> quick look, and I don't think it will take too much to make it behave.  
>>> It looks like the main thing is to make the lookup call to try_fill_dentry 
>>> return any existing dentry in place of the one the vfs provides.
>> Yes, or be replaced by what is currently the autofs4 module. The autofs
>> v2 communication protocol surely can't be being used any more and the
>> autofs4 module supports versions 3, 4 and 5. In fact I received a mail
>> from HPA recently suggesting he supports doing this.
>>
>> I had a quick look as well. I think you'll find it isn't quite as simple
>> as that. I'll have a closer look as soon as I get a chance.
>>
>>
>> Ian
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-23  0:37                           ` Ian Kent
@ 2009-06-23  2:40                             ` H. Peter Anvin
  2009-06-25  7:21                               ` Ian Kent
  2009-06-23  2:42                             ` H. Peter Anvin
  2009-06-24  2:28                             ` Ian Kent
  2 siblings, 1 reply; 45+ messages in thread
From: H. Peter Anvin @ 2009-06-23  2:40 UTC (permalink / raw)
  To: Ian Kent
  Cc: Sage Weil, linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh

Ian Kent wrote:
> 
> Sorry, I haven't had time to do more on this.
> There is also the issue of what to do about removing the autofs module
> and renaming autofs4 to autofs, as this will break the autofs module.
> 

The autofs module is pretty much historic at this point.  I say just
nuke it.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-23  0:37                           ` Ian Kent
  2009-06-23  2:40                             ` H. Peter Anvin
@ 2009-06-23  2:42                             ` H. Peter Anvin
  2009-06-24  2:28                             ` Ian Kent
  2 siblings, 0 replies; 45+ messages in thread
From: H. Peter Anvin @ 2009-06-23  2:42 UTC (permalink / raw)
  To: Ian Kent
  Cc: Sage Weil, linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh

Ian Kent wrote:
> 
> Sorry, I haven't had time to do more on this.
> There is also the issue of what to do about removing the autofs module
> and renaming autofs4 to autofs, as this will break the autofs module.
> 
> I did start contacting people I think would want to know about this but
> haven't gone further than an initial mail.
> 
> The other thing is that this patch was originally written quite a while
> ago and, although it appears to work ok, I'm not sure it's quite what we
> need.
> 

For what it's worth, the v2 protocol was only used very very early on,
and the only difference between the v2 and the v3 protocol is that the
v3 protocol supported the expiration ioctl.

There is no reason to believe there are any current users of the v2
protocol, but from the kernel point of view it is a strict subset of the
v3 protocol, so adding support for it is effectively no-op.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-23  0:37                           ` Ian Kent
  2009-06-23  2:40                             ` H. Peter Anvin
  2009-06-23  2:42                             ` H. Peter Anvin
@ 2009-06-24  2:28                             ` Ian Kent
  2009-06-24  5:45                               ` Sage Weil
  2 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-06-24  2:28 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Ian Kent wrote:
> Sage Weil wrote:
>> Hi Ian,
>>
>> Have you had a chance to look at getting autofs4 lookup/revalidate 
>> adjusted so that this real_lookup() fix[1] can go in?
>>
>> Please let me know if there is anything I can do to help here.  If you're 
>> still occupied, I'm happy to spin something up and send it your way... 
>> just let me know.
> 
> Sorry, I haven't had time to do more on this.
> There is also the issue of what to do about removing the autofs module
> and renaming autofs4 to autofs, as this will break the autofs module.
> 
> I did start contacting people I think would want to know about this but
> haven't gone further than an initial mail.
> 
> The other thing is that this patch was originally written quite a while
> ago and, although it appears to work ok, I'm not sure it's quite what we
> need.

I'm continuing with this now, but there's a deadlock in there somewhere!

> 
> Sorry for delaying you.
> 
>> thanks-
>> sage
>>
>>
>> [1] http://marc.info/?l=linux-fsdevel&m=123749395609697&w=2
>>
>>
>> On Fri, 3 Apr 2009, Ian Kent wrote:
>>
>>> Sage Weil wrote:
>>>>> Latest here works OK.
>>>>>
>>>>> I haven't finished checking yet but it looks like the patch below works 
>>>>> OK. I started with a 2.6.29 build with your two patches but it was a 
>>>>> little broken so I fell back to a Fedora 2.6.27 based kernel without the
>>>>> two revalidate pacthes to debug it. So I still need to test the result 
>>>>> against 2.6.29 again. I also don't have any real way to test for the three 
>>>>> process race we discussed where the revalidate isn't followed by a 
>>>>> ->lookup() but with both of your patches applied that shouldn't be a 
>>>>> problem (as we discussed).
>>>>>
>>>>> I've not run checkpatch.pl against the patch either at this stage.
>>>> That's good news...
>>> I'm still working on this too.
>>> I have some pressing work so it may be a while before I'm totally happy
>>> with the patch. Didn't you say you were expecting a 2.6.31 time frame
>>> for this?
>>>
>>>>  
>>>>> There is a further issue and that is regarding the autofs module.
>>>>>
>>>>> I can't see updating autofs for this being practical (although I haven't 
>>>>> actually looked yet). I suspect quite a bit of work would be needed. The 
>>>>> fact is that autofs isn't used much any more and it really should be 
>>>>> replaced with the autofs4 module at some point. But that's a fairly tricky 
>>>>> exercise and will likely cause some user space breakage. It will require 
>>>>> an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
>>>>> backward compatibility and will break for any explicit checks for the 
>>>>> presence of the "autofs4" module.
>>>> Hmm.  Well, I assume autofs needs to work properly before this gets 
>>>> changed, though, right?  Should I see what I can do with it?  I took a 
>>>> quick look, and I don't think it will take too much to make it behave.  
>>>> It looks like the main thing is to make the lookup call to try_fill_dentry 
>>>> return any existing dentry in place of the one the vfs provides.
>>> Yes, or be replaced by what is currently the autofs4 module. The autofs
>>> v2 communication protocol surely can't be being used any more and the
>>> autofs4 module supports versions 3, 4 and 5. In fact I received a mail
>>> from HPA recently suggesting he supports doing this.
>>>
>>> I had a quick look as well. I think you'll find it isn't quite as simple
>>> as that. I'll have a closer look as soon as I get a chance.
>>>
>>>
>>> Ian
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
> 
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-24  2:28                             ` Ian Kent
@ 2009-06-24  5:45                               ` Sage Weil
  2009-06-24  9:17                                 ` Ian Kent
  2009-07-20  2:45                                 ` Ian Kent
  0 siblings, 2 replies; 45+ messages in thread
From: Sage Weil @ 2009-06-24  5:45 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

On Wed, 24 Jun 2009, Ian Kent wrote:
> Ian Kent wrote:
> > Sage Weil wrote:
> >> Hi Ian,
> >>
> >> Have you had a chance to look at getting autofs4 lookup/revalidate 
> >> adjusted so that this real_lookup() fix[1] can go in?
> >>
> >> Please let me know if there is anything I can do to help here.  If you're 
> >> still occupied, I'm happy to spin something up and send it your way... 
> >> just let me know.
> > 
> > Sorry, I haven't had time to do more on this.
> > There is also the issue of what to do about removing the autofs module
> > and renaming autofs4 to autofs, as this will break the autofs module.
> > 
> > I did start contacting people I think would want to know about this but
> > haven't gone further than an initial mail.
> > 
> > The other thing is that this patch was originally written quite a while
> > ago and, although it appears to work ok, I'm not sure it's quite what we
> > need.
> 
> I'm continuing with this now, but there's a deadlock in there somewhere!

Sorry, are you still working with the patch you posted a few months back?

	http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2

Looking over it, the 

+		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
...
+		if (lock_held) {
+			/* Already pending, send to ->lookup() */
+			d_drop(dentry);

bit looks highly suspect.  I'm guessing revalidate should never sleep, and 
always kick things off to ->lookup() (to do any waiting on upcall 
completion or whatever else) if the dentry isn't valid now...?

sage


> 
> > 
> > Sorry for delaying you.
> > 
> >> thanks-
> >> sage
> >>
> >>
> >> [1] http://marc.info/?l=linux-fsdevel&m=123749395609697&w=2
> >>
> >>
> >> On Fri, 3 Apr 2009, Ian Kent wrote:
> >>
> >>> Sage Weil wrote:
> >>>>> Latest here works OK.
> >>>>>
> >>>>> I haven't finished checking yet but it looks like the patch below works 
> >>>>> OK. I started with a 2.6.29 build with your two patches but it was a 
> >>>>> little broken so I fell back to a Fedora 2.6.27 based kernel without the
> >>>>> two revalidate pacthes to debug it. So I still need to test the result 
> >>>>> against 2.6.29 again. I also don't have any real way to test for the three 
> >>>>> process race we discussed where the revalidate isn't followed by a 
> >>>>> ->lookup() but with both of your patches applied that shouldn't be a 
> >>>>> problem (as we discussed).
> >>>>>
> >>>>> I've not run checkpatch.pl against the patch either at this stage.
> >>>> That's good news...
> >>> I'm still working on this too.
> >>> I have some pressing work so it may be a while before I'm totally happy
> >>> with the patch. Didn't you say you were expecting a 2.6.31 time frame
> >>> for this?
> >>>
> >>>>  
> >>>>> There is a further issue and that is regarding the autofs module.
> >>>>>
> >>>>> I can't see updating autofs for this being practical (although I haven't 
> >>>>> actually looked yet). I suspect quite a bit of work would be needed. The 
> >>>>> fact is that autofs isn't used much any more and it really should be 
> >>>>> replaced with the autofs4 module at some point. But that's a fairly tricky 
> >>>>> exercise and will likely cause some user space breakage. It will require 
> >>>>> an updated module-init-tools to add "alais autofs4 autofs" for modprobe 
> >>>>> backward compatibility and will break for any explicit checks for the 
> >>>>> presence of the "autofs4" module.
> >>>> Hmm.  Well, I assume autofs needs to work properly before this gets 
> >>>> changed, though, right?  Should I see what I can do with it?  I took a 
> >>>> quick look, and I don't think it will take too much to make it behave.  
> >>>> It looks like the main thing is to make the lookup call to try_fill_dentry 
> >>>> return any existing dentry in place of the one the vfs provides.
> >>> Yes, or be replaced by what is currently the autofs4 module. The autofs
> >>> v2 communication protocol surely can't be being used any more and the
> >>> autofs4 module supports versions 3, 4 and 5. In fact I received a mail
> >>> from HPA recently suggesting he supports doing this.
> >>>
> >>> I had a quick look as well. I think you'll find it isn't quite as simple
> >>> as that. I'll have a closer look as soon as I get a chance.
> >>>
> >>>
> >>> Ian
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>>
> > 
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-24  5:45                               ` Sage Weil
@ 2009-06-24  9:17                                 ` Ian Kent
  2009-06-24 17:46                                   ` Sage Weil
  2009-07-20  2:45                                 ` Ian Kent
  1 sibling, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-06-24  9:17 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> On Wed, 24 Jun 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> Hi Ian,
>>>>
>>>> Have you had a chance to look at getting autofs4 lookup/revalidate 
>>>> adjusted so that this real_lookup() fix[1] can go in?
>>>>
>>>> Please let me know if there is anything I can do to help here.  If you're 
>>>> still occupied, I'm happy to spin something up and send it your way... 
>>>> just let me know.
>>> Sorry, I haven't had time to do more on this.
>>> There is also the issue of what to do about removing the autofs module
>>> and renaming autofs4 to autofs, as this will break the autofs module.
>>>
>>> I did start contacting people I think would want to know about this but
>>> haven't gone further than an initial mail.
>>>
>>> The other thing is that this patch was originally written quite a while
>>> ago and, although it appears to work ok, I'm not sure it's quite what we
>>> need.
>> I'm continuing with this now, but there's a deadlock in there somewhere!
> 
> Sorry, are you still working with the patch you posted a few months back?

It had changed a little but is quite different now.
I have a somewhat better stress test now so things that don't work will
pop out.

> 
> 	http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
> 
> Looking over it, the 
> 
> +		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
> ...
> +		if (lock_held) {
> +			/* Already pending, send to ->lookup() */
> +			d_drop(dentry);
> 
> bit looks highly suspect.  I'm guessing revalidate should never sleep, and 
> always kick things off to ->lookup() (to do any waiting on upcall 
> completion or whatever else) if the dentry isn't valid now...?

Yeah, I've heard that before, ;)

And that maybe the case, but that was what I first had.

Sending everything to ->lookup() might be possible but it certainly
isn't that simple.

Waiting in ->d_revaidate() isn't that different to waiting in ->lookup()
anyway as that must always be done without the directory mutex held. If
the lock isn't held when in ->d_revalidate() I can't really see any
reason not to handle that right their, possibly preventing the need to
go to ->lookup().

There are several cases I need to deal with, apart from path walks
initiated by the daemon which don't cause any call backs, and so are
largely handled by trivially returning success. The cases are, an
expiring dentry that will go away which ->lookup() can't yet handle, an
expiring dentry that won't go away which ->lookup() should be able to
handle already, and a straight out mount request which ->lookup() should
also be able to handle. The tail end of the expire cases can progress
concurrently with a mount, which is further complicated by the two cases
of going away or not, so it's all a bit tricky.

In any case I need to get this to work without the change you proposed,
except for cases that result from the locking change, and I'm using
printks to track incorrect returns to identify those cases. So what I
need right now is consistent behaviour and I'm not quite there. Once I
have that I'll work on any issues resulting from the locking change.

A lot has changed in the autofs4 module since I first tried to do this
and I now have a fairly aggressive test, so what appeared to work before
actually doesn't and it isn't as straight forward as I hoped.

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-24  9:17                                 ` Ian Kent
@ 2009-06-24 17:46                                   ` Sage Weil
  2009-06-25  2:50                                     ` Ian Kent
  2009-06-25  4:13                                     ` Ian Kent
  0 siblings, 2 replies; 45+ messages in thread
From: Sage Weil @ 2009-06-24 17:46 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

> > 
> > 	http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
> > 
> > Looking over it, the 
> > 
> > +		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
> > ...
> > +		if (lock_held) {
> > +			/* Already pending, send to ->lookup() */
> > +			d_drop(dentry);
> > 
> > bit looks highly suspect.  I'm guessing revalidate should never sleep, and 
> > always kick things off to ->lookup() (to do any waiting on upcall 
> > completion or whatever else) if the dentry isn't valid now...?
> 
> Yeah, I've heard that before, ;)
> 
> And that maybe the case, but that was what I first had.
> 
> Sending everything to ->lookup() might be possible but it certainly
> isn't that simple.
> 
> Waiting in ->d_revaidate() isn't that different to waiting in ->lookup()
> anyway as that must always be done without the directory mutex held. If
> the lock isn't held when in ->d_revalidate() I can't really see any
> reason not to handle that right their, possibly preventing the need to
> go to ->lookup().

IIRC, the fundamental problem was that if ->d_revalidate() can be called 
either with or without the dir mutex, you won't know whether the mutex is 
locked by the caller or by some other thread (mutex_is_locked() doesn't 
tell you _who_ locked the mutex).  And if waiting in d_revalidate means 
unlocking the mutex, that can't work, because you won't know if that's 
safe.  Thus, no waiting allowed in revalidate?

> There are several cases I need to deal with, apart from path walks
> initiated by the daemon which don't cause any call backs, and so are
> largely handled by trivially returning success. The cases are, an
> expiring dentry that will go away which ->lookup() can't yet handle, an
> expiring dentry that won't go away which ->lookup() should be able to
> handle already, and a straight out mount request which ->lookup() should
> also be able to handle. The tail end of the expire cases can progress
> concurrently with a mount, which is further complicated by the two cases
> of going away or not, so it's all a bit tricky.

It sounds to me like revalidate should only return success in the trivial, 
non-blocking case.  And the ->lookup() should be able to handle all 
(other) cases.  I.e., things should still work correctly (perhaps more 
slowly) without any d_revalidate() at all.  (It still looks like the main 
change needed is for lookup to use d_obtain_alias, instead of returning 
NULL unconditionally...)

> In any case I need to get this to work without the change you proposed,
> except for cases that result from the locking change, and I'm using
> printks to track incorrect returns to identify those cases. So what I
> need right now is consistent behaviour and I'm not quite there. Once I
> have that I'll work on any issues resulting from the locking change.

I suspect you want to work with the real_lookup patch applied... I don't 
think you can get consistent behavior without it. The whole goal here is 
to remove the wait-in-revalidate workaround that was compensating for that 
bug; without either the fix or the workaround you'll get occasional 
-ENOENT.

sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-24 17:46                                   ` Sage Weil
@ 2009-06-25  2:50                                     ` Ian Kent
  2009-06-25  4:13                                     ` Ian Kent
  1 sibling, 0 replies; 45+ messages in thread
From: Ian Kent @ 2009-06-25  2:50 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
>>> 	http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
>>>
>>> Looking over it, the 
>>>
>>> +		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
>>> ...
>>> +		if (lock_held) {
>>> +			/* Already pending, send to ->lookup() */
>>> +			d_drop(dentry);
>>>
>>> bit looks highly suspect.  I'm guessing revalidate should never sleep, and 
>>> always kick things off to ->lookup() (to do any waiting on upcall 
>>> completion or whatever else) if the dentry isn't valid now...?
>> Yeah, I've heard that before, ;)
>>
>> And that maybe the case, but that was what I first had.
>>
>> Sending everything to ->lookup() might be possible but it certainly
>> isn't that simple.
>>
>> Waiting in ->d_revaidate() isn't that different to waiting in ->lookup()
>> anyway as that must always be done without the directory mutex held. If
>> the lock isn't held when in ->d_revalidate() I can't really see any
>> reason not to handle that right their, possibly preventing the need to
>> go to ->lookup().
> 
> IIRC, the fundamental problem was that if ->d_revalidate() can be called 
> either with or without the dir mutex, you won't know whether the mutex is 
> locked by the caller or by some other thread (mutex_is_locked() doesn't 
> tell you _who_ locked the mutex).  And if waiting in d_revalidate means 
> unlocking the mutex, that can't work, because you won't know if that's 
> safe.  Thus, no waiting allowed in revalidate?

Hehe, as I said, I've been through this before.

If the mutex is not held by anyone then it doesn't need to be dropped!

But now I'm thinking I'll get rid of that that check anyway, so it
doesn't really matter.

> 
>> There are several cases I need to deal with, apart from path walks
>> initiated by the daemon which don't cause any call backs, and so are
>> largely handled by trivially returning success. The cases are, an
>> expiring dentry that will go away which ->lookup() can't yet handle, an
>> expiring dentry that won't go away which ->lookup() should be able to
>> handle already, and a straight out mount request which ->lookup() should
>> also be able to handle. The tail end of the expire cases can progress
>> concurrently with a mount, which is further complicated by the two cases
>> of going away or not, so it's all a bit tricky.
> 
> It sounds to me like revalidate should only return success in the trivial, 
> non-blocking case.  And the ->lookup() should be able to handle all 
> (other) cases.  I.e., things should still work correctly (perhaps more 
> slowly) without any d_revalidate() at all.  (It still looks like the main 
> change needed is for lookup to use d_obtain_alias, instead of returning 
> NULL unconditionally...)

It doesn't return NULL unconditionally and hasn't for some time.

It cannot work at all without a d_revalidate().

Your assuming that an automount implies dentry creation at mount time
which isn't necessarily the case. The creation of a directory in the
autofs pseudo file system may be done (positive and hashed) without a
mount request, the mount request coming later. This is a case I missed
when I described them above.

> 
>> In any case I need to get this to work without the change you proposed,
>> except for cases that result from the locking change, and I'm using
>> printks to track incorrect returns to identify those cases. So what I
>> need right now is consistent behaviour and I'm not quite there. Once I
>> have that I'll work on any issues resulting from the locking change.
> 
> I suspect you want to work with the real_lookup patch applied... I don't 
> think you can get consistent behavior without it. The whole goal here is 
> to remove the wait-in-revalidate workaround that was compensating for that 
> bug; without either the fix or the workaround you'll get occasional 
> -ENOENT.

True, and I expect and watch for that.
But I'm tending to agree and am going to do that.

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-24 17:46                                   ` Sage Weil
  2009-06-25  2:50                                     ` Ian Kent
@ 2009-06-25  4:13                                     ` Ian Kent
  2009-06-25  4:49                                       ` Sage Weil
  1 sibling, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-06-25  4:13 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> 
>> There are several cases I need to deal with, apart from path walks
>> initiated by the daemon which don't cause any call backs, and so are
>> largely handled by trivially returning success. The cases are, an
>> expiring dentry that will go away which ->lookup() can't yet handle, an
>> expiring dentry that won't go away which ->lookup() should be able to
>> handle already, and a straight out mount request which ->lookup() should
>> also be able to handle. The tail end of the expire cases can progress
>> concurrently with a mount, which is further complicated by the two cases
>> of going away or not, so it's all a bit tricky.
> 
> It sounds to me like revalidate should only return success in the trivial, 
> non-blocking case.  And the ->lookup() should be able to handle all 
> (other) cases.  I.e., things should still work correctly (perhaps more 
> slowly) without any d_revalidate() at all.  (It still looks like the main 
> change needed is for lookup to use d_obtain_alias, instead of returning 
> NULL unconditionally...)

How so?

I don't understand how d_obtain_alias() can help in this situation.
Can you elaborate please?

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-25  4:13                                     ` Ian Kent
@ 2009-06-25  4:49                                       ` Sage Weil
  2009-06-25  5:52                                         ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-06-25  4:49 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

On Thu, 25 Jun 2009, Ian Kent wrote:
> Sage Weil wrote:
> > 
> >> There are several cases I need to deal with, apart from path walks
> >> initiated by the daemon which don't cause any call backs, and so are
> >> largely handled by trivially returning success. The cases are, an
> >> expiring dentry that will go away which ->lookup() can't yet handle, an
> >> expiring dentry that won't go away which ->lookup() should be able to
> >> handle already, and a straight out mount request which ->lookup() should
> >> also be able to handle. The tail end of the expire cases can progress
> >> concurrently with a mount, which is further complicated by the two cases
> >> of going away or not, so it's all a bit tricky.
> > 
> > It sounds to me like revalidate should only return success in the trivial, 
> > non-blocking case.  And the ->lookup() should be able to handle all 
> > (other) cases.  I.e., things should still work correctly (perhaps more 
> > slowly) without any d_revalidate() at all.  (It still looks like the main 
> > change needed is for lookup to use d_obtain_alias, instead of returning 
> > NULL unconditionally...)
> 
> How so?
> 
> I don't understand how d_obtain_alias() can help in this situation.
> Can you elaborate please?

Well, as you noticed I misread lookup (it doesn't always return NULL).  I 
originally just meant that lookup needs to return an existing dentry if 
there is one (found via autofs4_lookup_active?), since now it may not be 
able to count on revalidate blocking on the hashed one if it's not usable 
yet.  It looks like it's tracking mounts by dentry, not inode, so 
d_obtain_alias isn't terribly helpful.

As you can see I haven't spent much time looking at this code.  :)  I'm 
happy to do so, though, or spend some time testing to get this resolved.. 
just let me know how I can help.

sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-25  4:49                                       ` Sage Weil
@ 2009-06-25  5:52                                         ` Ian Kent
  2009-09-17  6:36                                           ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-06-25  5:52 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> On Thu, 25 Jun 2009, Ian Kent wrote:
>> Sage Weil wrote:
>>>> There are several cases I need to deal with, apart from path walks
>>>> initiated by the daemon which don't cause any call backs, and so are
>>>> largely handled by trivially returning success. The cases are, an
>>>> expiring dentry that will go away which ->lookup() can't yet handle, an
>>>> expiring dentry that won't go away which ->lookup() should be able to
>>>> handle already, and a straight out mount request which ->lookup() should
>>>> also be able to handle. The tail end of the expire cases can progress
>>>> concurrently with a mount, which is further complicated by the two cases
>>>> of going away or not, so it's all a bit tricky.
>>> It sounds to me like revalidate should only return success in the trivial, 
>>> non-blocking case.  And the ->lookup() should be able to handle all 
>>> (other) cases.  I.e., things should still work correctly (perhaps more 
>>> slowly) without any d_revalidate() at all.  (It still looks like the main 
>>> change needed is for lookup to use d_obtain_alias, instead of returning 
>>> NULL unconditionally...)
>> How so?
>>
>> I don't understand how d_obtain_alias() can help in this situation.
>> Can you elaborate please?
> 
> Well, as you noticed I misread lookup (it doesn't always return NULL).  I 
> originally just meant that lookup needs to return an existing dentry if 
> there is one (found via autofs4_lookup_active?), since now it may not be 
> able to count on revalidate blocking on the hashed one if it's not usable 
> yet.  It looks like it's tracking mounts by dentry, not inode, so 
> d_obtain_alias isn't terribly helpful.

Yep, that's right.

It gets better too.

Previously, in lookup, the call back was unconditional, but to avoid
mount storms from colour ls, now it isn't. Except for this I could just
rely on ->lookup() using a new dentry and not add the revalidating
dentry to the active list.

> 
> As you can see I haven't spent much time looking at this code.  :)  I'm 
> happy to do so, though, or spend some time testing to get this resolved.. 
> just let me know how I can help.

You can't just yet as it's completely different from the original patch.
Let me get this to a sensible point then we can both go over it in detail.

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-23  2:40                             ` H. Peter Anvin
@ 2009-06-25  7:21                               ` Ian Kent
  2009-06-25 13:41                                 ` H. Peter Anvin
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-06-25  7:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sage Weil, linux-fsdevel, Christoph Hellwig, akpm, Al Viro,
	Yehuda Sadeh, Jon Masters

H. Peter Anvin wrote:
> Ian Kent wrote:
>> Sorry, I haven't had time to do more on this.
>> There is also the issue of what to do about removing the autofs module
>> and renaming autofs4 to autofs, as this will break the autofs module.
>>
> 
> The autofs module is pretty much historic at this point.  I say just
> nuke it.

But while I sort out the details of changing autofs4 to handle the
change to path lookup locking there are a couple of things work out for
the rename of autofs4 to autofs.

It's true that removing autofs shouldn't be a big deal but existing
users of autofs4 will have a problem. Particularly if people run a newer
kernel on and older distribution.

We probably have to accept that people  doing insmod with a specific
path are going to have a problem and they shouldn't use that method
anyway. But, AFAICS using a MODULE_ALIAS() in the kernel module will
allow for the name change but doesn't seem to take account of the
directory name change. Is that correct?

There are those that may have an alias in the module-init-tools
configuration as well. But we need get rid of that practice as well as,
for a long time now, the alias hasn't been needed.

Anyone have any thoughts how we might better handle these difficulties?

Ian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-25  7:21                               ` Ian Kent
@ 2009-06-25 13:41                                 ` H. Peter Anvin
  2009-06-25 13:58                                   ` Christoph Hellwig
  0 siblings, 1 reply; 45+ messages in thread
From: H. Peter Anvin @ 2009-06-25 13:41 UTC (permalink / raw)
  To: Ian Kent
  Cc: Sage Weil, linux-fsdevel, Christoph Hellwig, akpm, Al Viro,
	Yehuda Sadeh, Jon Masters

Ian Kent wrote:
> 
> We probably have to accept that people  doing insmod with a specific
> path are going to have a problem and they shouldn't use that method
> anyway. But, AFAICS using a MODULE_ALIAS() in the kernel module will
> allow for the name change but doesn't seem to take account of the
> directory name change. Is that correct?
> 

I'm not sure what that's supposed to mean.  Generally the directory a
module appears in doesn't matter.

> There are those that may have an alias in the module-init-tools
> configuration as well. But we need get rid of that practice as well as,
> for a long time now, the alias hasn't been needed.
> 
> Anyone have any thoughts how we might better handle these difficulties?

I suspect the MODULE_ALIAS should just do it.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-25 13:41                                 ` H. Peter Anvin
@ 2009-06-25 13:58                                   ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2009-06-25 13:58 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ian Kent, Sage Weil, linux-fsdevel, Christoph Hellwig, akpm,
	Al Viro, Yehuda Sadeh, Jon Masters

On Thu, Jun 25, 2009 at 06:41:01AM -0700, H. Peter Anvin wrote:
> > There are those that may have an alias in the module-init-tools
> > configuration as well. But we need get rid of that practice as well as,
> > for a long time now, the alias hasn't been needed.
> > 
> > Anyone have any thoughts how we might better handle these difficulties?
> 
> I suspect the MODULE_ALIAS should just do it.

You'll also need to register two file_system_type structures or both
names.  Thjose two together are everything that is needed.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-24  5:45                               ` Sage Weil
  2009-06-24  9:17                                 ` Ian Kent
@ 2009-07-20  2:45                                 ` Ian Kent
  2009-07-28 22:47                                   ` Sage Weil
  1 sibling, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-07-20  2:45 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> On Wed, 24 Jun 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> Hi Ian,
>>>>
>>>> Have you had a chance to look at getting autofs4 lookup/revalidate 
>>>> adjusted so that this real_lookup() fix[1] can go in?
>>>>
>>>> Please let me know if there is anything I can do to help here.  If you're 
>>>> still occupied, I'm happy to spin something up and send it your way... 
>>>> just let me know.
>>> Sorry, I haven't had time to do more on this.
>>> There is also the issue of what to do about removing the autofs module
>>> and renaming autofs4 to autofs, as this will break the autofs module.
>>>
>>> I did start contacting people I think would want to know about this but
>>> haven't gone further than an initial mail.
>>>
>>> The other thing is that this patch was originally written quite a while
>>> ago and, although it appears to work ok, I'm not sure it's quite what we
>>> need.
>> I'm continuing with this now, but there's a deadlock in there somewhere!
> 
> Sorry, are you still working with the patch you posted a few months back?
> 
> 	http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
> 
> Looking over it, the 
> 
> +		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
> ...
> +		if (lock_held) {
> +			/* Already pending, send to ->lookup() */
> +			d_drop(dentry);
> 
> bit looks highly suspect.  I'm guessing revalidate should never sleep, and 
> always kick things off to ->lookup() (to do any waiting on upcall 
> completion or whatever else) if the dentry isn't valid now...?

I tried your suggestion and have finally come to the conclusion that it
cannot work. My own fault really, for not fully understanding why I used
the above approach in the first place.

I believe that if the mutex is not held then I "must" handle it in the
revalidate routine and if the mutex is held I "must" defer to
->lookup(). The only way to send this to ->lookup() is to drop the
dentry and rehash it in lookup and the mutex must be held over both
calls or it is possible for an execution path to skip over the lookup
call when several concurrent processes walk into the same dentry at the
same time. AFAICS it isn't possible to detect this and work around it
when sending everything to ->lookup()

This digression was quite costly in time for me but useful in improving
my understanding of the problem. I'm going to return to my original
approach, hopefully I will make better progress.

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-07-20  2:45                                 ` Ian Kent
@ 2009-07-28 22:47                                   ` Sage Weil
  2009-07-29  2:59                                     ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-07-28 22:47 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

On Mon, 20 Jul 2009, Ian Kent wrote:
> Sage Weil wrote:
> > On Wed, 24 Jun 2009, Ian Kent wrote:
> >> I'm continuing with this now, but there's a deadlock in there somewhere!
> > 
> > Sorry, are you still working with the patch you posted a few months back?
> > 
> > 	http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
> > 
> > Looking over it, the 
> > 
> > +		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
> > ...
> > +		if (lock_held) {
> > +			/* Already pending, send to ->lookup() */
> > +			d_drop(dentry);
> > 
> > bit looks highly suspect.  I'm guessing revalidate should never sleep, and 
> > always kick things off to ->lookup() (to do any waiting on upcall 
> > completion or whatever else) if the dentry isn't valid now...?
> 
> I tried your suggestion and have finally come to the conclusion that it
> cannot work. My own fault really, for not fully understanding why I used
> the above approach in the first place.
> 
> I believe that if the mutex is not held then I "must" handle it in the
> revalidate routine and if the mutex is held I "must" defer to
> ->lookup(). The only way to send this to ->lookup() is to drop the
> dentry and rehash it in lookup and the mutex must be held over both
> calls or it is possible for an execution path to skip over the lookup
> call when several concurrent processes walk into the same dentry at the
> same time. AFAICS it isn't possible to detect this and work around it
> when sending everything to ->lookup()

How is it possible for a revalidate to fail and then to skip over the 
->lookup() call?  That sounds like a problem with the VFS?  (Like the one 
I'm trying to fix... you're testing with the real_lookup patch applied, I 
hope?)

> This digression was quite costly in time for me but useful in improving
> my understanding of the problem. I'm going to return to my original
> approach, hopefully I will make better progress.

I'm sorry if I've sent you on a wild goose chase.  If you describe how 
you're reproducing the problem I can try it locally.

Thanks-
sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-07-28 22:47                                   ` Sage Weil
@ 2009-07-29  2:59                                     ` Ian Kent
  2009-07-29 16:57                                       ` Sage Weil
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-07-29  2:59 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> On Mon, 20 Jul 2009, Ian Kent wrote:
>> Sage Weil wrote:
>>> On Wed, 24 Jun 2009, Ian Kent wrote:
>>>> I'm continuing with this now, but there's a deadlock in there somewhere!
>>> Sorry, are you still working with the patch you posted a few months back?
>>>
>>> 	http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
>>>
>>> Looking over it, the 
>>>
>>> +		unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
>>> ...
>>> +		if (lock_held) {
>>> +			/* Already pending, send to ->lookup() */
>>> +			d_drop(dentry);
>>>
>>> bit looks highly suspect.  I'm guessing revalidate should never sleep, and 
>>> always kick things off to ->lookup() (to do any waiting on upcall 
>>> completion or whatever else) if the dentry isn't valid now...?
>> I tried your suggestion and have finally come to the conclusion that it
>> cannot work. My own fault really, for not fully understanding why I used
>> the above approach in the first place.
>>
>> I believe that if the mutex is not held then I "must" handle it in the
>> revalidate routine and if the mutex is held I "must" defer to
>> ->lookup(). The only way to send this to ->lookup() is to drop the
>> dentry and rehash it in lookup and the mutex must be held over both
>> calls or it is possible for an execution path to skip over the lookup
>> call when several concurrent processes walk into the same dentry at the
>> same time. AFAICS it isn't possible to detect this and work around it
>> when sending everything to ->lookup()
> 
> How is it possible for a revalidate to fail and then to skip over the 
> ->lookup() call?  That sounds like a problem with the VFS?  (Like the one 
> I'm trying to fix... you're testing with the real_lookup patch applied, I 
> hope?)

It's been hard to work out exactly what's happening but I suspect it is
due to multiple walks occurring while the the dentry hashed state
changes. Like, one process hits revalidate, drops the dentry, another
process comes along and goes straight to lookup and rehashes the dentry,
the original process, that dropped the dentry, then ends up not calling
lookup. The race only happens occasionally, and my test uses 10
processes that tend to hit the same dentrys at the same time at various
points in the mount tree over 150 iterations. Admittedly, it's a fairly
stressful test as autofs goes but it does tend to show up races quite well.

I'm now working on a method to ensure the code path that unhashed the
dentry is the one that rehashes it but that has its own set of
difficulties, partly because walks from the daemon shouldn't block and
the dentry can become unhashed at any time after other code paths have
dropped and rehashed the dentry, since the mutex is not always held
during revalidate.

> 
>> This digression was quite costly in time for me but useful in improving
>> my understanding of the problem. I'm going to return to my original
>> approach, hopefully I will make better progress.
> 
> I'm sorry if I've sent you on a wild goose chase.  If you describe how 
> you're reproducing the problem I can try it locally.

I thought that at the time but it turns out that you didn't, the race is
still present either way. I've since gone back to this approach.

I always knew this was going to be difficult but I didn't expect it to
not be possible, given that I'm using your VFS locking change. If this
continues to be a problem I may need to start thinking about some small
VFS changes to help work around the difficulty.

Ian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-07-29  2:59                                     ` Ian Kent
@ 2009-07-29 16:57                                       ` Sage Weil
  2009-07-30  0:56                                         ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-07-29 16:57 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

On Wed, 29 Jul 2009, Ian Kent wrote:
> > How is it possible for a revalidate to fail and then to skip over the 
> > ->lookup() call?  That sounds like a problem with the VFS?  (Like the one 
> > I'm trying to fix... you're testing with the real_lookup patch applied, I 
> > hope?)
> 
> It's been hard to work out exactly what's happening but I suspect it is
> due to multiple walks occurring while the the dentry hashed state
> changes. Like, one process hits revalidate, drops the dentry, another
> process comes along and goes straight to lookup and rehashes the dentry,
> the original process, that dropped the dentry, then ends up not calling
> lookup. The race only happens occasionally, and my test uses 10
> processes that tend to hit the same dentrys at the same time at various
> points in the mount tree over 150 iterations. Admittedly, it's a fairly
> stressful test as autofs goes but it does tend to show up races quite well.

If I'm reading namei.c correctly, every ->d_revalidate() that returns NULL 
will always result in a ->lookup() (unless IS_DEADDIR(inode)), without any 
regard for the dentry's hashed/unhashed status.  The exception appears to 
be a call in __link_path_walk when FS_REVAL_DOT is set in the superblock 
(nfs only):

		if (nd->path.dentry && nd->path.dentry->d_sb &&
		    (nd->path.dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
			err = -ESTALE;
			/* Note: we do not d_invalidate() */
			if (!nd->path.dentry->d_op->d_revalidate(
					nd->path.dentry, nd))
				break;

in which case userspace would see an ESTALE.

It sounds like a few well placed printks that include the pid for unhash, 
rehash, revalidate, and lookup would narrow down what is going on?

> I always knew this was going to be difficult but I didn't expect it to
> not be possible, given that I'm using your VFS locking change. If this
> continues to be a problem I may need to start thinking about some small
> VFS changes to help work around the difficulty.

Thanks for continuing to work on this!
sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-07-29 16:57                                       ` Sage Weil
@ 2009-07-30  0:56                                         ` Ian Kent
  2009-07-30 17:47                                           ` Sage Weil
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Kent @ 2009-07-30  0:56 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> On Wed, 29 Jul 2009, Ian Kent wrote:
>>> How is it possible for a revalidate to fail and then to skip over the 
>>> ->lookup() call?  That sounds like a problem with the VFS?  (Like the one 
>>> I'm trying to fix... you're testing with the real_lookup patch applied, I 
>>> hope?)
>> It's been hard to work out exactly what's happening but I suspect it is
>> due to multiple walks occurring while the the dentry hashed state
>> changes. Like, one process hits revalidate, drops the dentry, another
>> process comes along and goes straight to lookup and rehashes the dentry,
>> the original process, that dropped the dentry, then ends up not calling
>> lookup. The race only happens occasionally, and my test uses 10
>> processes that tend to hit the same dentrys at the same time at various
>> points in the mount tree over 150 iterations. Admittedly, it's a fairly
>> stressful test as autofs goes but it does tend to show up races quite well.
> 
> If I'm reading namei.c correctly, every ->d_revalidate() that returns NULL 
> will always result in a ->lookup() (unless IS_DEADDIR(inode)), without any 
> regard for the dentry's hashed/unhashed status.  The exception appears to 
> be a call in __link_path_walk when FS_REVAL_DOT is set in the superblock 
> (nfs only):
> 
> 		if (nd->path.dentry && nd->path.dentry->d_sb &&
> 		    (nd->path.dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
> 			err = -ESTALE;
> 			/* Note: we do not d_invalidate() */
> 			if (!nd->path.dentry->d_op->d_revalidate(
> 					nd->path.dentry, nd))
> 				break;
> 
> in which case userspace would see an ESTALE.
> 
> It sounds like a few well placed printks that include the pid for unhash, 
> rehash, revalidate, and lookup would narrow down what is going on?

Of course I've been doing that.

I think you need to re-read what I wrote above. Your analysis is fine if
there is only one process concurrently walking onto the dentry. If a
second process beats the revalidate process to real_lookup() and
incorrectly rehashes the dentry the following revalidate process will
not call ->lookup().

My most recent try at this seems to be working.
I need to clean up the printks and alter one of the comments and re-test
with a full test (I have been working with only half the test enabled as
it was most problematic). The testing takes a long time to run, 1 hour
10 minutes for each of two configurations and the criteria for success
in at least six consecutive successful runs, that's about thirteen
hours. But, hopefully, when that is done I'll have a series to post for
comment.

> 
>> I always knew this was going to be difficult but I didn't expect it to
>> not be possible, given that I'm using your VFS locking change. If this
>> continues to be a problem I may need to start thinking about some small
>> VFS changes to help work around the difficulty.
> 
> Thanks for continuing to work on this!
> sage


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-07-30  0:56                                         ` Ian Kent
@ 2009-07-30 17:47                                           ` Sage Weil
  2009-07-31  2:03                                             ` Ian Kent
  0 siblings, 1 reply; 45+ messages in thread
From: Sage Weil @ 2009-07-30 17:47 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

On Thu, 30 Jul 2009, Ian Kent wrote:
> Sage Weil wrote:
> > On Wed, 29 Jul 2009, Ian Kent wrote:
> >>> How is it possible for a revalidate to fail and then to skip over the 
> >>> ->lookup() call?  That sounds like a problem with the VFS?  (Like the one 
> >>> I'm trying to fix... you're testing with the real_lookup patch applied, I 
> >>> hope?)
> >> It's been hard to work out exactly what's happening but I suspect it is
> >> due to multiple walks occurring while the the dentry hashed state
> >> changes. Like, one process hits revalidate, drops the dentry, another
> >> process comes along and goes straight to lookup and rehashes the dentry,
> >> the original process, that dropped the dentry, then ends up not calling
> >> lookup. The race only happens occasionally, and my test uses 10
> >> processes that tend to hit the same dentrys at the same time at various
> >> points in the mount tree over 150 iterations. Admittedly, it's a fairly
> >> stressful test as autofs goes but it does tend to show up races quite well.
> > 
> > If I'm reading namei.c correctly, every ->d_revalidate() that returns NULL 
> > will always result in a ->lookup() (unless IS_DEADDIR(inode)), without any 
> > regard for the dentry's hashed/unhashed status.  The exception appears to 
> > be a call in __link_path_walk when FS_REVAL_DOT is set in the superblock 
> > (nfs only):
> > 
> > 		if (nd->path.dentry && nd->path.dentry->d_sb &&
> > 		    (nd->path.dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
> > 			err = -ESTALE;
> > 			/* Note: we do not d_invalidate() */
> > 			if (!nd->path.dentry->d_op->d_revalidate(
> > 					nd->path.dentry, nd))
> > 				break;
> > 
> > in which case userspace would see an ESTALE.
> > 
> > It sounds like a few well placed printks that include the pid for unhash, 
> > rehash, revalidate, and lookup would narrow down what is going on?
> 
> Of course I've been doing that.
> 
> I think you need to re-read what I wrote above. Your analysis is fine if
> there is only one process concurrently walking onto the dentry. If a
> second process beats the revalidate process to real_lookup() and
> incorrectly rehashes the dentry the following revalidate process will
> not call ->lookup().

Sorry, you're right.  I missed the d_lookup call in real_lookup.  So you 
mean something like:

do_lookup
	__d_lookup -> finds dentry
	do_revalidate -> drops dentry, returns null
	real_lookup
		take i_mutex
		d_lookup -> returns dentry rehashed by racing ->lookup()

> My most recent try at this seems to be working.
> I need to clean up the printks and alter one of the comments and re-test
> with a full test (I have been working with only half the test enabled as
> it was most problematic). The testing takes a long time to run, 1 hour
> 10 minutes for each of two configurations and the criteria for success
> in at least six consecutive successful runs, that's about thirteen
> hours. But, hopefully, when that is done I'll have a series to post for
> comment.

Great to hear.  I hope my inane blathering hasn't been totally useless! :)

Thanks-
sage

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-07-30 17:47                                           ` Sage Weil
@ 2009-07-31  2:03                                             ` Ian Kent
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Kent @ 2009-07-31  2:03 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

Sage Weil wrote:
> On Thu, 30 Jul 2009, Ian Kent wrote:
>> Sage Weil wrote:
>>> On Wed, 29 Jul 2009, Ian Kent wrote:
>>>>> How is it possible for a revalidate to fail and then to skip over the 
>>>>> ->lookup() call?  That sounds like a problem with the VFS?  (Like the one 
>>>>> I'm trying to fix... you're testing with the real_lookup patch applied, I 
>>>>> hope?)
>>>> It's been hard to work out exactly what's happening but I suspect it is
>>>> due to multiple walks occurring while the the dentry hashed state
>>>> changes. Like, one process hits revalidate, drops the dentry, another
>>>> process comes along and goes straight to lookup and rehashes the dentry,
>>>> the original process, that dropped the dentry, then ends up not calling
>>>> lookup. The race only happens occasionally, and my test uses 10
>>>> processes that tend to hit the same dentrys at the same time at various
>>>> points in the mount tree over 150 iterations. Admittedly, it's a fairly
>>>> stressful test as autofs goes but it does tend to show up races quite well.
>>> If I'm reading namei.c correctly, every ->d_revalidate() that returns NULL 
>>> will always result in a ->lookup() (unless IS_DEADDIR(inode)), without any 
>>> regard for the dentry's hashed/unhashed status.  The exception appears to 
>>> be a call in __link_path_walk when FS_REVAL_DOT is set in the superblock 
>>> (nfs only):
>>>
>>> 		if (nd->path.dentry && nd->path.dentry->d_sb &&
>>> 		    (nd->path.dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
>>> 			err = -ESTALE;
>>> 			/* Note: we do not d_invalidate() */
>>> 			if (!nd->path.dentry->d_op->d_revalidate(
>>> 					nd->path.dentry, nd))
>>> 				break;
>>>
>>> in which case userspace would see an ESTALE.
>>>
>>> It sounds like a few well placed printks that include the pid for unhash, 
>>> rehash, revalidate, and lookup would narrow down what is going on?
>> Of course I've been doing that.
>>
>> I think you need to re-read what I wrote above. Your analysis is fine if
>> there is only one process concurrently walking onto the dentry. If a
>> second process beats the revalidate process to real_lookup() and
>> incorrectly rehashes the dentry the following revalidate process will
>> not call ->lookup().
> 
> Sorry, you're right.  I missed the d_lookup call in real_lookup.  So you 
> mean something like:
> 
> do_lookup
> 	__d_lookup -> finds dentry
> 	do_revalidate -> drops dentry, returns null
> 	real_lookup
> 		take i_mutex
> 		d_lookup -> returns dentry rehashed by racing ->lookup()

Well, that's what I thought but looking again it doesn't look like that
can happen. However, the evidence I have speaks for itself, a user space
process is calling ->open() on a mount point with no mount. That should
never happen, it should have waited for the mount. And the code to
ensure that the process that dropped the dentry is the one that
re-hashes it makes the problem go away.

> 
>> My most recent try at this seems to be working.
>> I need to clean up the printks and alter one of the comments and re-test
>> with a full test (I have been working with only half the test enabled as
>> it was most problematic). The testing takes a long time to run, 1 hour
>> 10 minutes for each of two configurations and the criteria for success
>> in at least six consecutive successful runs, that's about thirteen
>> hours. But, hopefully, when that is done I'll have a series to post for
>> comment.
> 
> Great to hear.  I hope my inane blathering hasn't been totally useless! :)

Sadly, on the seventh run of my test I encountered what looks like a
deadlock. I'm working on that now but, but as I progress the testing
takes longer and longer, *sigh*.

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
  2009-06-25  5:52                                         ` Ian Kent
@ 2009-09-17  6:36                                           ` Ian Kent
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Kent @ 2009-09-17  6:36 UTC (permalink / raw)
  To: Sage Weil
  Cc: linux-fsdevel, Christoph Hellwig, akpm, Al Viro, Yehuda Sadeh,
	H. Peter Anvin

On Thu, Jun 25, 2009 at 01:52:32PM +0800, Ian Kent wrote:
> Sage Weil wrote:
> > 
> > As you can see I haven't spent much time looking at this code.  :)  I'm 
> > happy to do so, though, or spend some time testing to get this resolved.. 
> > just let me know how I can help.
> 
> You can't just yet as it's completely different from the original patch.
> Let me get this to a sensible point then we can both go over it in detail.

Just thought I'd let you know how this is going.

I had to stop work on this for a few weeks becuase it was driving me crazy.
The good news is that after returning to it I think I've got it working, at
least mostly.

The testing is lengthy as I've had to increase the number of iterations
significantly to investigate the failures. Also, I've focused on the most
problematic configuration so there may still be difficulties as the testing
goes further.

The test I'm using has two basic configurations, one is for automount maps
using the autofs pseudo mount option "nobrowse" which just means mount
points are created at mount time. The other is with the pseudo option
"browse" that just means mount point directories are pre-created. For
these two configurations there are various sub-cases so the directory
pre-creation is not always done. But that's detail.

The current state is that I'm not seeing failures from the kernel module
with the "browse" configuration, which was the main source of difficulty
to begin with. One remaining issue is that, by the time I reach around
2000 iterations, I'm seeing an increase of just over 1 second in around
28 seconds, per iteration. I'm not sure if this is a real probem or not.

Anyway, I'm hopeing the "nobrowse" configuration testing will not show
further problems. 

So, it's probably time to post my patch series to get some more eyes
looking at them. Who on the recipient list of this mail should I include
for the post?

Ian

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2009-09-17  6:36 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-19 20:16 [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Sage Weil
2009-03-19 20:16 ` [PATCH 2/2] vfs: clean up real_lookup Sage Weil
2009-03-19 20:22   ` Christoph Hellwig
2009-03-19 20:35     ` Sage Weil
2009-03-19 20:23 ` [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Christoph Hellwig
2009-03-24  4:14 ` Ian Kent
2009-03-24  4:18   ` Ian Kent
2009-03-25  4:29     ` Sage Weil
2009-03-25  6:08       ` Ian Kent
2009-03-25 16:11         ` Ian Kent
2009-03-25 19:11           ` Sage Weil
2009-03-26  2:09             ` Ian Kent
2009-03-26  3:53               ` Sage Weil
2009-03-26  8:00                 ` Ian Kent
2009-03-26 10:38                 ` Ian Kent
2009-03-29  8:53                   ` Ian Kent
2009-04-03  0:58                     ` Sage Weil
2009-04-03  2:00                       ` Ian Kent
2009-04-03  3:07                         ` Sage Weil
2009-06-22 17:15                         ` Sage Weil
2009-06-23  0:37                           ` Ian Kent
2009-06-23  2:40                             ` H. Peter Anvin
2009-06-25  7:21                               ` Ian Kent
2009-06-25 13:41                                 ` H. Peter Anvin
2009-06-25 13:58                                   ` Christoph Hellwig
2009-06-23  2:42                             ` H. Peter Anvin
2009-06-24  2:28                             ` Ian Kent
2009-06-24  5:45                               ` Sage Weil
2009-06-24  9:17                                 ` Ian Kent
2009-06-24 17:46                                   ` Sage Weil
2009-06-25  2:50                                     ` Ian Kent
2009-06-25  4:13                                     ` Ian Kent
2009-06-25  4:49                                       ` Sage Weil
2009-06-25  5:52                                         ` Ian Kent
2009-09-17  6:36                                           ` Ian Kent
2009-07-20  2:45                                 ` Ian Kent
2009-07-28 22:47                                   ` Sage Weil
2009-07-29  2:59                                     ` Ian Kent
2009-07-29 16:57                                       ` Sage Weil
2009-07-30  0:56                                         ` Ian Kent
2009-07-30 17:47                                           ` Sage Weil
2009-07-31  2:03                                             ` Ian Kent
2009-03-26  3:54               ` Ian Kent
2009-03-26  4:03                 ` Sage Weil
2009-03-26  5:07                 ` Ian Kent

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.