All of lore.kernel.org
 help / color / mirror / Atom feed
From: Badari Pulavarty <pbadari@gmail.com>
To: "William H. Taber" <wtaber@us.ibm.com>
Cc: Ian Kent <raven@themaw.net>, Ram Pai <linuxram@us.ibm.com>,
	autofs mailing list <autofs@linux.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH]autofs4: hang and proposed fix
Date: Mon, 28 Nov 2005 15:12:52 -0800	[thread overview]
Message-ID: <1133219572.27824.24.camel@localhost.localdomain> (raw)
In-Reply-To: <438B3C34.2050509@us.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3911 bytes --]

On Mon, 2005-11-28 at 12:19 -0500, William H. Taber wrote:
> Ian Kent wrote:
> 
> > My thoughts:
> > 
> > The cause of this issue is user space programs using autofs4 need to 
> > call services that must be able to take the inode semaphore. Notably 
> > sys_mkdir and sys_symlink in order to complete their task.
> > 
> > I believe that, in this case, releasing the semaphore is ok since the 
> > entry is part of the autofs filesystem and so autofs is responsible for 
> > taking care of it, provided that it is done carefully. The semaphore is 
> > meant to serialize changes being to the directory and these changes are 
> > done in autofs by asking the user space process to do it. Which are 
> > themselves serialized by the same semaphore.
> > 
> > The only tricky thing I can think of here is that care must be taken to 
> > ensure that the semaphore is not released before the DCACHE_AUTOFS_PENDING 
> > flag is set to make sure that other incoming requests are sent to the wait 
> > queue.
> > 
> > The attached patch does this and opts for a conservative approach by 
> > broadening the critical region instead of narrowing it.
> > 
> > It may also be necessary to review the return codes from revaliate but I'm 
> > only part way through that.
> > 
> > Please review and test this patch and offer further comment.
> > Sorry guys but I haven't been able to test this at all save verifying that 
> > it compiles.
> > 
> > Hopefully I haven't missed anything completely obvious ... DOH!
> > 
> > Ian
> > 
> > --- linux-2.6.15-rc1/fs/autofs4/root.c.lookup-deadlock	2005-11-17 18:58:38.000000000 +0800
> > +++ linux-2.6.15-rc1/fs/autofs4/root.c	2005-11-27 17:00:40.000000000 +0800
> > @@ -487,11 +487,8 @@ static struct dentry *autofs4_lookup(str
> >  	dentry->d_fsdata = NULL;
> >  	d_add(dentry, NULL);
> >  
> > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > -		up(&dir->i_sem);
> > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > -		down(&dir->i_sem);
> > -	}
> >  
> >  	/*
> >  	 * If we are still pending, check if we had to handle
> > --- linux-2.6.15-rc1/fs/autofs4/waitq.c.lookup-deadlock	2005-11-27 17:09:42.000000000 +0800
> > +++ linux-2.6.15-rc1/fs/autofs4/waitq.c	2005-11-27 17:17:34.000000000 +0800
> > @@ -161,6 +161,8 @@ int autofs4_wait(struct autofs_sb_info *
> >  		enum autofs_notify notify)
> >  {
> >  	struct autofs_wait_queue *wq;
> > +	struct inode *dir = dentry->d_parent->d_inode;
> > +	int i_sem_held;
> >  	char *name;
> >  	int len, status;
> >  
> > @@ -227,6 +229,14 @@ int autofs4_wait(struct autofs_sb_info *
> >  			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
> >  	}
> >  
> > +	/*
> > +	 * If we are called from lookup or lookup_hash the
> > +	 * the inode semaphore needs to be released for
> > +	 * userspace to do its thing.
> > +	 */
> > +	i_sem_held = down_trylock(&dir->i_sem);
> > +	up(&dir->i_sem);
> > +
> >  	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
> >  		int type = (notify == NFY_MOUNT ?
> >  			autofs_ptype_missing : autofs_ptype_expire_multi);
> > @@ -268,6 +278,10 @@ int autofs4_wait(struct autofs_sb_info *
> >  		DPRINTK("skipped sleeping");
> >  	}
> >  
> > +	/* Re-take the inode semaphore if it was held */
> > +	if (i_sem_held)
> > +		down(&dir->i_sem);
> > +
> >  	status = wq->status;
> >  
> >  	/* Are we the last process to need status? */
> > -
> Ian,
> I have not tested this patch but it seems to have a serious flaw.  Given 
> that do_lookup does not get the parent i_sem lock before calling 
> revalidate, you have the possibility that you are being called without 
> having gotten the lock but the lock may be held by another process.  In 
> that case you do not want to be releasing their lock while they are 
> relying on it.
> 

Here is the patch Will Taber proposed and I am posting on his behalf.

Thanks,
Badari




[-- Attachment #2: autofs.patch --]
[-- Type: text/x-patch, Size: 3822 bytes --]

This patch changes the semantics of d_revalidate so that it is always called 
with the parent i_sem lock held.  This allows the autofs4 code to release the
lock if it needs to pend.  Without this patch the autofs has a race condition
in which it pends in the revalidate code while holding the parent i_sem lock 
which prevents the mount from ever completing.  There have been other patches
proposed for this problem which check to see if the parent i_sem lock is held
before releasing it but those solutions ignore the possibility that the lock
may be held by another process.

diff -ur linux-2.6.13.3/fs/autofs4/root.c linux-2.6.13.3-autofspatch/fs/autofs4/root.c
--- linux-2.6.13.3/fs/autofs4/root.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/autofs4/root.c	2005-11-28 04:22:52.000000000 -0800
@@ -302,7 +302,9 @@
 		DPRINTK("waiting for expire %p name=%.*s",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_NONE);
+		down(&dentry->d_parent->d_inode->i_sem);
 		
 		DPRINTK("expire done status=%d", status);
 		
@@ -324,7 +326,9 @@
 		DPRINTK("waiting for mount name=%.*s",
 			 dentry->d_name.len, dentry->d_name.name);
 
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+		down(&dentry->d_parent->d_inode->i_sem);
 		 
 		DPRINTK("mount done status=%d", status);
 
@@ -351,7 +355,9 @@
 		spin_lock(&dentry->d_lock);
 		dentry->d_flags |= DCACHE_AUTOFS_PENDING;
 		spin_unlock(&dentry->d_lock);
+		up(&dentry->d_parent->d_inode->i_sem);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+		down(&dentry->d_parent->d_inode->i_sem);
 
 		DPRINTK("mount done status=%d", status);
 
diff -ur linux-2.6.13.3/fs/namei.c linux-2.6.13.3-autofspatch/fs/namei.c
--- linux-2.6.13.3/fs/namei.c	2005-10-03 16:27:35.000000000 -0700
+++ linux-2.6.13.3-autofspatch/fs/namei.c	2005-11-28 04:22:52.000000000 -0800
@@ -393,7 +393,6 @@
 	struct dentry * result;
 	struct inode *dir = parent->d_inode;
 
-	down(&dir->i_sem);
 	/*
 	 * First re-do the cached lookup just in case it was created
 	 * while we waited for the directory semaphore..
@@ -419,7 +418,6 @@
 			else
 				result = dentry;
 		}
-		up(&dir->i_sem);
 		return result;
 	}
 
@@ -427,7 +425,6 @@
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	up(&dir->i_sem);
 	if (result->d_op && result->d_op->d_revalidate) {
 		if (!result->d_op->d_revalidate(result, nd) && !d_invalidate(result)) {
 			dput(result);
@@ -676,13 +673,16 @@
 		     struct path *path)
 {
 	struct vfsmount *mnt = nd->mnt;
+	struct inode *parent = nd->dentry->d_inode;
 	struct dentry *dentry = __d_lookup(nd->dentry, name);
 
+	down(&parent->i_sem);
 	if (!dentry)
 		goto need_lookup;
 	if (dentry->d_op && dentry->d_op->d_revalidate)
 		goto need_revalidate;
 done:
+	up(&parent->i_sem);
 	path->mnt = mnt;
 	path->dentry = dentry;
 	__follow_mount(path);
@@ -703,6 +703,7 @@
 	goto need_lookup;
 
 fail:
+	up(&parent->i_sem);
 	return PTR_ERR(dentry);
 }
 
@@ -718,7 +719,7 @@
 {
 	struct path next;
 	struct inode *inode;
-	int err;
+	int err, reval;
 	unsigned int lookup_flags = nd->flags;
 	
 	while (*name=='/')
@@ -893,9 +894,17 @@
 		 */
 		if (nd->dentry && nd->dentry->d_sb &&
 		    (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
+			struct dentry *nparent;
+
 			err = -ESTALE;
 			/* Note: we do not d_invalidate() */
-			if (!nd->dentry->d_op->d_revalidate(nd->dentry, nd))
+			/* Revalidate requires us to lock the parent.
+			 */
+			nparent = nd->dentry->d_parent;
+			down(&nparent->d_inode->i_sem);
+			reval = nd->dentry->d_op->d_revalidate(nd->dentry, nd);
+			up(&nparent->d_inode->i_sem);
+			if (!reval)
 				break;
 		}
 return_base:

  reply	other threads:[~2005-11-28 23:12 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-11-16 10:17 [RFC PATCH]autofs4: hang and proposed fix Ram Pai
2005-11-16 12:41 ` [autofs] " Ian Kent
2005-11-16 16:50   ` Ram Pai
2005-11-16 22:57     ` Ian Kent
2005-11-17  1:52       ` [autofs] " Ram Pai
2005-11-17 18:50         ` Ian Kent
2005-11-17 19:19           ` William H. Taber
2005-11-17 20:39             ` Ram Pai
2005-11-17 22:31               ` William H. Taber
2005-11-18 14:57                 ` Ian Kent
2005-11-18 14:54               ` Ian Kent
2005-11-18 14:44             ` Ian Kent
2005-11-18 15:20               ` William H. Taber
2005-11-18 16:30                 ` Ian Kent
2005-11-18 17:12                   ` William H. Taber
2005-11-18 18:57                     ` Ram Pai
2005-11-18 20:08                       ` William H. Taber
2005-11-19  2:52                         ` Ian Kent
2005-11-21 16:40                           ` William H. Taber
2005-11-22 13:13                             ` Ian Kent
2005-11-22 17:48                               ` [autofs] " William H. Taber
2005-11-23 14:11                                 ` Ian Kent
2005-11-23 16:42                                   ` William H. Taber
2005-11-23 17:52                                     ` Ian Kent
2005-11-23 18:47                                       ` William H. Taber
2005-11-23 17:52                                     ` Ian Kent
2005-11-19  1:40                     ` [autofs] " Ian Kent
2005-11-16 15:22 ` Jeff Moyer
2005-11-16 15:22   ` Jeff Moyer
2005-11-16 17:00   ` [autofs] " Ram Pai
2005-11-16 18:25     ` Jeff Moyer
2005-11-16 19:24       ` William H. Taber
2005-11-16 19:51         ` Ram Pai
2005-11-27 10:47 ` Ian Kent
2005-11-28 17:19   ` William H. Taber
2005-11-28 23:12     ` Badari Pulavarty [this message]
2005-11-29 14:19       ` Ian Kent
2005-11-29 16:34         ` William H. Taber
2005-11-30 14:02           ` Ian Kent
2005-11-30 16:49             ` Badari Pulavarty
2005-11-30 17:04               ` Trond Myklebust
2005-11-30 21:10                 ` William H. Taber
2005-11-29 14:20     ` Ian Kent
2005-11-30  1:16 ` [autofs] " Jeff Moyer
2005-11-30  1:16   ` Jeff Moyer
2005-11-30  1:56   ` Trond Myklebust
2005-11-30  4:15     ` Jeff Moyer
2005-11-30  6:14       ` Trond Myklebust
2005-11-30 15:44         ` Ian Kent
2005-11-30 15:53           ` [autofs] " Trond Myklebust
2005-11-30 16:12             ` Ian Kent
2005-11-30 16:27               ` Ian Kent
2005-11-30 16:45               ` [autofs] " Trond Myklebust
2005-11-30 20:32     ` William H. Taber
2005-11-30 20:53       ` Trond Myklebust
2005-11-30 21:30         ` William H. Taber
2005-11-30 22:32           ` Trond Myklebust
2005-12-01 16:27             ` William H. Taber
2005-12-01 12:09           ` Ian Kent
2005-12-01 16:30             ` William H. Taber
2005-12-02 13:49               ` Ian Kent
2005-12-02 14:07                 ` Jeff Moyer
2005-12-02 15:21                   ` Ian Kent
2005-12-02 16:35                     ` [autofs] " Will Taber
2005-12-02 17:11                       ` Ian Kent
2005-12-02 15:34                 ` Will Taber
2005-12-02 17:29                   ` Ian Kent
2005-12-02 18:12                     ` Trond Myklebust
2005-12-04 12:56                       ` Christoph Hellwig
2005-12-04 12:57                         ` Christoph Hellwig
2005-12-04 14:58                           ` Ian Kent
2005-12-04 17:17                             ` [autofs] " Christoph Hellwig
2005-12-05 14:02                               ` Ian Kent
2005-12-06 21:20                               ` Jeff Moyer
2005-12-06 21:40                                 ` Christoph Hellwig
2005-12-06 22:37                                   ` Jeff Moyer
2005-12-07 14:52                                   ` Will Taber
2005-12-07 15:18                                     ` Christoph Hellwig
2005-12-07 15:22                                   ` Brian Long
2005-12-07 15:25                                     ` Christoph Hellwig
2005-12-07 17:46                                     ` Will Taber
2005-12-08 14:16                                       ` Ian Kent
2005-12-09 12:12                                       ` Christoph Hellwig
2005-12-09 13:33                                         ` John T. Kohl
2005-12-13 18:39                                           ` Christoph Hellwig
2005-12-04 14:56                         ` Ian Kent
2005-12-02 19:04                     ` [autofs] " Will Taber
2005-12-04  9:39                       ` Ian Kent
2005-12-02 16:04                 ` [autofs] " Jeff Moyer
2005-12-02 17:36                   ` Ian Kent
2005-12-02 18:33                     ` [autofs] " Will Taber
2005-12-04  9:52                       ` Ian Kent
2005-12-04 14:54                         ` Ian Kent
2005-12-05 15:40                           ` Ian Kent
2005-11-30 14:48   ` [autofs] " Ian Kent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1133219572.27824.24.camel@localhost.localdomain \
    --to=pbadari@gmail.com \
    --cc=autofs@linux.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linuxram@us.ibm.com \
    --cc=raven@themaw.net \
    --cc=wtaber@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.