From mboxrd@z Thu Jan 1 00:00:00 1970 From: linuxram@us.ibm.com (Ram Pai) Subject: [RFC PATCH]autofs4: hang and proposed fix Date: Wed, 16 Nov 2005 02:17:40 -0800 Message-ID: <20051116101740.GA9551@RAM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: linux-fsdevel@vger.kernel.org, linuxram@us.ibm.com Return-path: Received: from e2.ny.us.ibm.com ([32.97.182.142]:4838 "EHLO e2.ny.us.ibm.com") by vger.kernel.org with ESMTP id S932597AbVKPKV0 (ORCPT ); Wed, 16 Nov 2005 05:21:26 -0500 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id jAGALLpt022139 for ; Wed, 16 Nov 2005 05:21:21 -0500 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.8) with ESMTP id jAGALLPd120770 for ; Wed, 16 Nov 2005 05:21:21 -0500 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id jAGALLSg009726 for ; Wed, 16 Nov 2005 05:21:21 -0500 To: autofs@linux.kernel.org Content-Disposition: inline Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org Autofs4 assumes that its ->revalidate() function gets called with the parent_dentry's_inode_semaphore released. This is true mostly but not in one particular case. Process P1 calls autofs4's ->lookup(). The lookup finds that the dentry does not exist. It creates a dentry and adds to the cache. Releases the parent's inode's semaphore and than calls ->revalidate(). Process P2 meanwhile comes in and cached_lookup() gets called. It finds the dentry in the cache and finds ->revalidate() function exists. So it calls ->revalidate() holding the parent's inode's semaphore. Now the automounter daemon comes in and tries to hold the same semaphore in order to mount. But since the semaphore is held by P2 it goes to sleep. Process P1 and P2 continue waiting for the mount to complete and it never happens. Deadlock. The stack of the deadlock is as follows: ls S 00000000 0 13049 11954 (NOTLB) f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030 f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000 Call Trace: [] autofs4_wait+0x307/0x3d0 [] try_to_fill_dentry+0xf3/0x150 [] autofs4_revalidate+0x159/0x170 [] autofs4_lookup+0x110/0x150 [] __lookup_hash+0x85/0xb0 [] lookup_hash+0xa/0x10 [] lookup_one_len+0x53/0x70 [] stubfs_readdir+0x113/0x170 [stubfs] [] vfs_readdir+0x8b/0xa0 [] sys_getdents64+0x63/0xb5 [] syscall_call+0x7/0xb ls S C011B1AF 0 13050 11898 (NOTLB) f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080 06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70 f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000 Call Trace: [] autofs4_wait+0x307/0x3d0 [] try_to_fill_dentry+0xf3/0x150 [] autofs4_revalidate+0x159/0x170 [] cached_lookup+0x47/0x80 [] __lookup_hash+0x5a/0xb0 [] lookup_hash+0xa/0x10 [] lookup_one_len+0x53/0x70 [] stubfs_readdir+0x163/0x170 [stubfs] [] vfs_readdir+0x8b/0xa0 [] sys_getdents64+0x63/0xb5 [] syscall_call+0x7/0xb automount D 00000010 0 13052 13016 (NOTLB) f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34 f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70 f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001 Call Trace: [] __down+0x83/0xe0 [] __down_failed+0xa/0x10 [] .text.lock.namei+0xeb/0x1de [] sys_mkdir+0x52/0xd0 [] syscall_call+0x7/0xb BUG: soft lockup detected on CPU#0! I have coded up a tentative fix. The patch releases the semaphore in ->revalidate() function, instead of the caller of that function. Not sure if this is the right fix. Tested it and verified that the deadlock is fixed. But I am not sure if it opens up other bugs. Please validate. fs/autofs4/root.c | 26 +++++++++++++++----------- 1 files changed, 15 insertions(+), 11 deletions(-) Index: 2.6.15-rc1/fs/autofs4/root.c =================================================================== --- 2.6.15-rc1.orig/fs/autofs4/root.c +++ 2.6.15-rc1/fs/autofs4/root.c @@ -386,40 +386,47 @@ static int autofs4_revalidate(struct den struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb); int oz_mode = autofs4_oz_mode(sbi); int flags = nd ? nd->flags : 0; int status = 1; + up(&dir->i_sem); /* Pending dentry */ if (autofs4_ispending(dentry)) { if (!oz_mode) - status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags); - return status; + status = try_to_fill_dentry(dentry, dir->i_sb, + sbi, flags); + goto out; } /* Negative dentry.. invalidate if "old" */ - if (dentry->d_inode == NULL) - return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT); + if (dentry->d_inode == NULL) { + status = (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT); + goto out; + } /* Check for a non-mountpoint directory with no contents */ spin_lock(&dcache_lock); if (S_ISDIR(dentry->d_inode->i_mode) && !d_mountpoint(dentry) && list_empty(&dentry->d_subdirs)) { DPRINTK("dentry=%p %.*s, emptydir", dentry, dentry->d_name.len, dentry->d_name.name); spin_unlock(&dcache_lock); if (!oz_mode) - status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags); - return status; + status = try_to_fill_dentry(dentry, dir->i_sb, sbi, + flags); + goto out; } spin_unlock(&dcache_lock); /* Update the usage list */ if (!oz_mode) autofs4_update_usage(dentry); - return 1; +out: + down(&dir->i_sem); + return status; } static void autofs4_dentry_release(struct dentry *de) { struct autofs_info *inf; @@ -485,15 +492,12 @@ static struct dentry *autofs4_lookup(str spin_unlock(&dentry->d_lock); } dentry->d_fsdata = NULL; d_add(dentry, NULL); - if (dentry->d_op && dentry->d_op->d_revalidate) { - up(&dir->i_sem); + if (dentry->d_op && dentry->d_op->d_revalidate) (dentry->d_op->d_revalidate)(dentry, nd); - down(&dir->i_sem); - } /* * If we are still pending, check if we had to handle * a signal. If so we can force a restart.. */