[RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2
@ 2008-05-22 11:40 Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 1/4] Prepare i_mutex lockdep subclasses for locking of variable path lengths Louis Rilling
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Louis Rilling @ 2008-05-22 11:40 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

Hi all,

The following patchset is another try to make configfs recursive inode locking
lockdep-friendly. This time lockdep_off()/lockdep_on() are not inserted, but
I_MUTEX_* lockdep sub-classes are tweaked to allow some variable-length
recursion on the spirit of I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_CHILD + 1
-> ... The lockdep annotations added in configfs are a variant of a previously
submitted patch (see
http://oss.oracle.com/pipermail/ocfs2-devel/2008-May/002214.html ), with a much
lighter footprint, thanks to a suggestion from Joel.

Thanks for reviewing.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 1/4] Prepare i_mutex lockdep subclasses for locking of variable path lengths
  2008-05-22 11:40 [RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2 Louis Rilling
@ 2008-05-22 11:40 ` Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 2/4] Prepare vfs_rmdir() for further nested i_mutex locking Louis Rilling
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Louis Rilling @ 2008-05-22 11:40 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel, Louis Rilling

[-- Attachment #1: prepare-variable-path-lengths-inode-locking.patch --]
[-- Type: text/plain, Size: 1726 bytes --]

Some filesystems, like configfs, need to lock more than two regular inodes
recursively in a tree, which i_mutex lockdep subclasses do not permit.

This patch reorders the definitions (but not the semantics) of i_mutex lockdep
sub-classes so that nested i_mutex locking can be done with sub-classes
I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_CHILD + 1 -> I_MUTEX_CHILD + 2 -> ...
until MAX_LOCKDEP_SUBCLASSES - 1.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 include/linux/fs.h |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)


Index: b/include/linux/fs.h
===================================================================
--- a/include/linux/fs.h	2008-05-21 09:40:28.000000000 +0200
+++ b/include/linux/fs.h	2008-05-22 12:20:20.000000000 +0200
@@ -660,20 +660,23 @@ struct inode {
  * inode->i_mutex nesting subclasses for the lock validator:
  *
  * 0: the object of the current VFS operation
- * 1: parent
- * 2: child/target
- * 3: quota file
+ * 1: xattrs
+ * 2: quota file
+ * 3: parent
+ * 4: child/target
+ * ...: lower-level children
  *
  * The locking order between these classes is
- * parent -> child -> normal -> xattr -> quota
+ * parent -> child -> sub-children... -> normal -> xattr -> quota
  */
 enum inode_i_mutex_lock_class
 {
 	I_MUTEX_NORMAL,
+	I_MUTEX_XATTR,
+	I_MUTEX_QUOTA,
 	I_MUTEX_PARENT,
 	I_MUTEX_CHILD,
-	I_MUTEX_XATTR,
-	I_MUTEX_QUOTA
+	/* Reserved for variable paths of nested inode locks */
 };
 
 extern void inode_double_lock(struct inode *inode1, struct inode *inode2);

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 2/4] Prepare vfs_rmdir() for further nested i_mutex locking
  2008-05-22 11:40 [RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2 Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 1/4] Prepare i_mutex lockdep subclasses for locking of variable path lengths Louis Rilling
@ 2008-05-22 11:40 ` Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 3/4] configfs: Make nested default groups creations lockdep-friendly Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly Louis Rilling
  3 siblings, 0 replies; 17+ messages in thread
From: Louis Rilling @ 2008-05-22 11:40 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel, Louis Rilling

[-- Attachment #1: prepare-vfs-rmdir-for-further-nested-inode-locking.patch --]
[-- Type: text/plain, Size: 1562 bytes --]

configfs_rmdir() needs to lock recursively a whole tree of i_mutex in order to
provide userspace and client sub-systems with atomic semantics. However,
vfs_rmdir() locks the to-be-removed inode with sub-class I_MUTEX_NORMAL, which
does not allow futher I_MUTEX_CHILD + x nested locks. Luckily this is a case of
I_MUTEX_PARENT -> I_MUTEX_CHILD lock dependency pattern, where do_rmdir() takes
parent's inode with sub-class I_MUTEX_PARENT.

This patch changes the sub-class of i_mutex lock in vfs_rmdir() from
I_MUTEX_NORMAL to I_MUTEX_CHILD, which allows further nested i_mutex locking as
configfs needs. Existing lock dependencies should not be impacted since
I_MUTEX_CHILD can only preceed I_MUTEX_NORMAL in locking order, and nobody locks
with sub-class I_MUTEX_CHILD before calling vfs_rmdir().

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/namei.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/fs/namei.c
===================================================================
--- a/fs/namei.c	2008-05-21 09:40:26.000000000 +0200
+++ b/fs/namei.c	2008-05-22 12:21:27.000000000 +0200
@@ -2232,7 +2232,7 @@ int vfs_rmdir(struct inode *dir, struct 

 	DQUOT_INIT(dir);

-	mutex_lock(&dentry->d_inode->i_mutex);
+	mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
 	dentry_unhash(dentry);
 	if (d_mountpoint(dentry))
 		error = -EBUSY;

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 3/4] configfs: Make nested default groups creations lockdep-friendly
  2008-05-22 11:40 [RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2 Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 1/4] Prepare i_mutex lockdep subclasses for locking of variable path lengths Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 2/4] Prepare vfs_rmdir() for further nested i_mutex locking Louis Rilling
@ 2008-05-22 11:40 ` Louis Rilling
  2008-05-22 11:40 ` [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly Louis Rilling
  3 siblings, 0 replies; 17+ messages in thread
From: Louis Rilling @ 2008-05-22 11:40 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel, Louis Rilling

[-- Attachment #1: configfs-make-default-group-creation-lockep-friendly-2.patch --]
[-- Type: text/plain, Size: 4442 bytes --]

When creating a config_group (resp. registering a subsystem) with nested default
groups, lockdep raises a warning since it sees a lock recursion of class
I_MUTEX_CHILD in populate_groups().

This patch makes such default groups creations lockdep-friendly by increasing
the i_mutex sub-class from I_MUTEX_CHILD onwards when descending the default
groups tree.

With this patch the depth of default group trees is limited to
MAX_LOCKDEP_SUBCLASSES - I_MUTEX_CHILD - 1 == 3. This limit is removed when not
compiling lockdep.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/configfs_internal.h |    3 +
 fs/configfs/dir.c               |   61 ++++++++++++++++++++++++++++++++++------
 fs/configfs/mount.c             |    3 +
 3 files changed, 59 insertions(+), 8 deletions(-)



Index: b/fs/configfs/configfs_internal.h
===================================================================
--- a/fs/configfs/configfs_internal.h	2008-05-21 09:40:25.000000000 +0200
+++ b/fs/configfs/configfs_internal.h	2008-05-22 12:38:02.000000000 +0200
@@ -38,6 +38,9 @@ struct configfs_dirent {
 	umode_t			s_mode;
 	struct dentry		* s_dentry;
 	struct iattr		* s_iattr;
+#ifdef CONFIG_LOCKDEP
+	int 			s_lock_level;
+#endif
 };
 
 #define CONFIGFS_ROOT		0x0001
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-05-21 09:40:25.000000000 +0200
+++ b/fs/configfs/dir.c	2008-05-22 12:59:23.000000000 +0200
@@ -36,6 +36,37 @@
 
 DECLARE_RWSEM(configfs_rename_sem);
 
+#ifdef CONFIG_LOCKDEP
+static inline int set_dirent_lock_level(struct configfs_dirent *prev_sd,
+					struct configfs_dirent *sd)
+{
+	int lock_level = prev_sd->s_lock_level + 1;
+	if (lock_level + I_MUTEX_CHILD < MAX_LOCKDEP_SUBCLASSES) {
+		sd->s_lock_level = lock_level;
+		return lock_level;
+	}
+	sd->s_lock_level = -1;
+	return -ELOOP;
+}
+
+static inline void reset_dirent_lock_level(struct configfs_dirent *sd)
+{
+	sd->s_lock_level = -1;
+}
+
+#else /* CONFIG_LOCKDEP */
+
+static inline int set_dirent_lock_level(struct configfs_dirent *prev_sd,
+					struct configfs_dirent *sd)
+{
+	return 0;
+}
+
+static inline void reset_dirent_lock_level(struct configfs_dirent *sd)
+{
+}
+#endif /* CONFIG_LOCKDEP */
+
 static void configfs_d_iput(struct dentry * dentry,
 			    struct inode * inode)
 {
@@ -533,6 +564,10 @@ static int populate_groups(struct config
 {
 	struct config_group *new_group;
 	struct dentry *dentry = group->cg_item.ci_dentry;
+	struct configfs_dirent *sd = dentry->d_fsdata;
+	struct configfs_dirent *parent_sd =
+		group->cg_item.ci_parent->ci_dentry->d_fsdata;
+	int lock_level;
 	int ret = 0;
 	int i;
 
@@ -546,17 +581,27 @@ static int populate_groups(struct config
 		 * That said, taking our i_mutex is closer to mkdir
 		 * emulation, and shouldn't hurt.
 		 */
-		mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+		/* lock_level starts at zero for the non default group */
+		lock_level = set_dirent_lock_level(parent_sd, sd);
+		if (lock_level < 0) {
+			/* Too deeply nested default groups */
+			ret = lock_level;
+		} else {
+			mutex_lock_nested(&dentry->d_inode->i_mutex,
+					  I_MUTEX_CHILD + lock_level);
 
-		for (i = 0; group->default_groups[i]; i++) {
-			new_group = group->default_groups[i];
+			for (i = 0; group->default_groups[i]; i++) {
+				new_group = group->default_groups[i];
 
-			ret = create_default_group(group, new_group);
-			if (ret)
-				break;
-		}
+				ret = create_default_group(group, new_group);
+				if (ret)
+					break;
+			}
 
-		mutex_unlock(&dentry->d_inode->i_mutex);
+			mutex_unlock(&dentry->d_inode->i_mutex);
+			/* Reset for future sub-group creations */
+			reset_dirent_lock_level(sd);
+		}
 	}
 
 	if (ret)
Index: b/fs/configfs/mount.c
===================================================================
--- a/fs/configfs/mount.c	2008-05-21 09:40:25.000000000 +0200
+++ b/fs/configfs/mount.c	2008-05-22 12:38:02.000000000 +0200
@@ -64,6 +64,9 @@ static struct configfs_dirent configfs_r
 	.s_element	= &configfs_root_group.cg_item,
 	.s_type		= CONFIGFS_ROOT,
 	.s_iattr	= NULL,
+#ifdef CONFIG_LOCKDEP
+	.s_lock_level	= -1,
+#endif
 };
 
 static int configfs_fill_super(struct super_block *sb, void *data, int silent)

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly
  2008-05-22 11:40 [RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2 Louis Rilling
                   ` (2 preceding siblings ...)
  2008-05-22 11:40 ` [RFC][PATCH 3/4] configfs: Make nested default groups creations lockdep-friendly Louis Rilling
@ 2008-05-22 11:40 ` Louis Rilling
  2008-05-23 16:44   ` Louis Rilling
  3 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-05-22 11:40 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel, Louis Rilling

[-- Attachment #1: configfs-make-default-group-destruction-lockdep-friendly-2.patch --]
[-- Type: text/plain, Size: 5496 bytes --]

When destroying a config_group having multiple (nested or not) default groups,
lockdep raises a warning because multiple locks of class I_MUTEX_NORMAL are
taken in configfs_detach_prep().

This patch makes such default group destructions lockdep-friendly by increasing
the i_mutex sub-class from I_MUTEX_CHILD + 1 onwards for each default group
locked under a being-destroyed config_group.

With this patch and lockdep compiled-in, the number of default groups (whatever
their depth in the tree) under a user-created config_group (resp. registered
subsystem) is limited to MAX_LOCKDEP_SUBCLASSES - I_MUTEX_CHILD - 1 == 3. This
limit is removed when not compiling lockdep.

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-rc3 #5
---------------------------------------------
rmdir/1499 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89

but task is already holding lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296513>] vfs_rmdir+0x49/0xac

other info that might help us debug this:
2 locks held by rmdir/1499:
 #0:  (&sb->s_type->i_mutex_key#3/1){--..}, at: [<ffffffff80298138>] do_rmdir+0x82/0x108
 #1:  (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296513>] vfs_rmdir+0x49/0xac

stack backtrace:
Pid: 1499, comm: rmdir Not tainted 2.6.26-rc3 #5

Call Trace:
 [<ffffffff8024afe9>] __lock_acquire+0x8d2/0xc78
 [<ffffffff80266399>] filemap_fault+0x1cf/0x332
 [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89
 [<ffffffff8024b762>] lock_acquire+0x51/0x6c
 [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89
 [<ffffffff80248331>] debug_mutex_lock_common+0x16/0x23
 [<ffffffff805d6244>] mutex_lock_nested+0xcd/0x23b
 [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89
 [<ffffffff802d3656>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296535>] vfs_rmdir+0x6b/0xac
 [<ffffffff8029816d>] do_rmdir+0xb7/0x108
 [<ffffffff8024a2a2>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d7364>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80


Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/dir.c |   33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-05-22 12:38:02.000000000 +0200
+++ b/fs/configfs/dir.c	2008-05-22 12:49:08.000000000 +0200
@@ -49,6 +49,12 @@ static inline int set_dirent_lock_level(
 	return -ELOOP;
 }
 
+static inline void copy_dirent_lock_level(struct configfs_dirent *from,
+					  struct configfs_dirent *to)
+{
+	to->s_lock_level = from->s_lock_level;
+}
+
 static inline void reset_dirent_lock_level(struct configfs_dirent *sd)
 {
 	sd->s_lock_level = -1;
@@ -62,6 +68,11 @@ static inline int set_dirent_lock_level(
 	return 0;
 }
 
+static inline void copy_dirent_lock_level(struct configfs_dirent *from,
+					  struct configfs_dirent *to)
+{
+}
+
 static inline void reset_dirent_lock_level(struct configfs_dirent *sd)
 {
 }
@@ -372,6 +383,7 @@ static int configfs_detach_prep(struct d
 {
 	struct configfs_dirent *parent_sd = dentry->d_fsdata;
 	struct configfs_dirent *sd;
+	int lock_level;
 	int ret;
 
 	ret = -EBUSY;
@@ -383,7 +395,15 @@ static int configfs_detach_prep(struct d
 		if (sd->s_type & CONFIGFS_NOT_PINNED)
 			continue;
 		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
-			mutex_lock(&sd->s_dentry->d_inode->i_mutex);
+			lock_level = set_dirent_lock_level(parent_sd, sd);
+			if (lock_level < 0) {
+				/* Bad bad bad! We cannot remove a directory
+				 * that we let be created! */
+				ret = -ELOOP;
+				break;
+			}
+			mutex_lock_nested(&sd->s_dentry->d_inode->i_mutex,
+					  I_MUTEX_CHILD + lock_level);
 			/* Mark that we've taken i_mutex */
 			sd->s_type |= CONFIGFS_USET_DROPPING;
 
@@ -392,6 +412,10 @@ static int configfs_detach_prep(struct d
 			 * deep nesting of default_groups
 			 */
 			ret = configfs_detach_prep(sd->s_dentry);
+			/* Update parent's lock_level so that remaining
+			 * sibling children keep on globally increasing
+			 * lock_level */
+			copy_dirent_lock_level(sd, parent_sd);
 			if (!ret)
 				continue;
 		} else
@@ -1206,7 +1230,11 @@ static int configfs_rmdir(struct inode *
 		return -EINVAL;
 	}
 
+	/* The inode of the config_item being removed is already locked by
+	 * vfs_rmdir() with subclass I_MUTEX_CHILD. Account for it. */
+	set_dirent_lock_level(parent_item->ci_dentry->d_fsdata, sd);
 	ret = configfs_detach_prep(dentry);
+	reset_dirent_lock_level(sd);
 	if (ret) {
 		configfs_detach_rollback(dentry);
 		config_item_put(parent_item);
@@ -1492,10 +1520,13 @@ void configfs_unregister_subsystem(struc
 
 	mutex_lock_nested(&configfs_sb->s_root->d_inode->i_mutex,
 			  I_MUTEX_PARENT);
+	/* Sets subsys's configfs_dirent lock_level to 0 */
+	set_dirent_lock_level(configfs_sb->s_root->d_fsdata, dentry->d_fsdata);
 	mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
 	if (configfs_detach_prep(dentry)) {
 		printk(KERN_ERR "configfs: Tried to unregister non-empty subsystem!\n");
 	}
+	reset_dirent_lock_level(dentry->d_fsdata);
 	configfs_detach_group(&group->cg_item);
 	dentry->d_inode->i_flags |= S_DEAD;
 	mutex_unlock(&dentry->d_inode->i_mutex);

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly
  2008-05-22 11:40 ` [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly Louis Rilling
@ 2008-05-23 16:44   ` Louis Rilling
  2008-06-02 23:07     ` Joel Becker
  0 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-05-23 16:44 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]

Louis Rilling a écrit :
> When destroying a config_group having multiple (nested or not) default groups,
> lockdep raises a warning because multiple locks of class I_MUTEX_NORMAL are
> taken in configfs_detach_prep().
> 
> This patch makes such default group destructions lockdep-friendly by increasing
> the i_mutex sub-class from I_MUTEX_CHILD + 1 onwards for each default group
> locked under a being-destroyed config_group.

I discovered two bugs, as described below, and fixed in the attached
version of the patch. Sorry for the noise.

[...]
> 
> Index: b/fs/configfs/dir.c
> ===================================================================
> --- a/fs/configfs/dir.c	2008-05-22 12:38:02.000000000 +0200
> +++ b/fs/configfs/dir.c	2008-05-22 12:49:08.000000000 +0200
[...]
> @@ -383,7 +395,15 @@ static int configfs_detach_prep(struct d
>  		if (sd->s_type & CONFIGFS_NOT_PINNED)
>  			continue;
>  		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
> -			mutex_lock(&sd->s_dentry->d_inode->i_mutex);
> +			lock_level = set_dirent_lock_level(parent_sd, sd);
> +			if (lock_level < 0) {
> +				/* Bad bad bad! We cannot remove a directory
> +				 * that we let be created! */
> +				ret = -ELOOP;
> +				break;
> +			}
> +			mutex_lock_nested(&sd->s_dentry->d_inode->i_mutex,
> +					  I_MUTEX_CHILD + lock_level);
>  			/* Mark that we've taken i_mutex */
>  			sd->s_type |= CONFIGFS_USET_DROPPING;
>  

Here setting lock_level before acquiring the mutex may race with mkdir
(and thus configfs_attach_group()) in the default group. A corrected
version of the patch is attached.

[...]
> @@ -1206,7 +1230,11 @@ static int configfs_rmdir(struct inode *
>  		return -EINVAL;
>  	}
>  
> +	/* The inode of the config_item being removed is already locked by
> +	 * vfs_rmdir() with subclass I_MUTEX_CHILD. Account for it. */
> +	set_dirent_lock_level(parent_item->ci_dentry->d_fsdata, sd);
>  	ret = configfs_detach_prep(dentry);
> +	reset_dirent_lock_level(sd);
>  	if (ret) {
>  		configfs_detach_rollback(dentry);
>  		config_item_put(parent_item);

lock_level should be reset on rollback, since mkdir may be called again
after a failure of rmdir. Again, fixed in the new version of the patch
in attachment.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: configfs-make-default-group-destruction-lockdep-friendly-2.patch --]
[-- Type: text/x-diff, Size: 7342 bytes --]

configfs: Make multiple default_group destructions lockdep friendly

When destroying a config_group having multiple (nested or not) default groups,
lockdep raises a warning because multiple locks of class I_MUTEX_NORMAL are
taken in configfs_detach_prep().

This patch makes such default group destructions lockdep-friendly by increasing
the i_mutex sub-class from I_MUTEX_CHILD + 1 onwards for each default group
locked under a being-destroyed config_group.

With this patch and lockdep compiled-in, the number of default groups (whatever
their depth in the tree) under a user-created config_group (resp. registered
subsystem) is limited to MAX_LOCKDEP_SUBCLASSES - I_MUTEX_CHILD - 1 == 3. This
limit is removed when not compiling lockdep.

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-rc3 #5
---------------------------------------------
rmdir/1499 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89

but task is already holding lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296513>] vfs_rmdir+0x49/0xac

other info that might help us debug this:
2 locks held by rmdir/1499:
 #0:  (&sb->s_type->i_mutex_key#3/1){--..}, at: [<ffffffff80298138>] do_rmdir+0x82/0x108
 #1:  (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296513>] vfs_rmdir+0x49/0xac

stack backtrace:
Pid: 1499, comm: rmdir Not tainted 2.6.26-rc3 #5

Call Trace:
 [<ffffffff8024afe9>] __lock_acquire+0x8d2/0xc78
 [<ffffffff80266399>] filemap_fault+0x1cf/0x332
 [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89
 [<ffffffff8024b762>] lock_acquire+0x51/0x6c
 [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89
 [<ffffffff80248331>] debug_mutex_lock_common+0x16/0x23
 [<ffffffff805d6244>] mutex_lock_nested+0xcd/0x23b
 [<ffffffff802d2529>] configfs_detach_prep+0x54/0x89
 [<ffffffff802d3656>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296535>] vfs_rmdir+0x6b/0xac
 [<ffffffff8029816d>] do_rmdir+0xb7/0x108
 [<ffffffff8024a2a2>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d7364>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80


Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/dir.c |   74 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 66 insertions(+), 8 deletions(-)

Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-05-22 13:36:06.000000000 +0200
+++ b/fs/configfs/dir.c	2008-05-23 16:46:43.000000000 +0200
@@ -37,16 +37,32 @@
 DECLARE_RWSEM(configfs_rename_sem);
 
 #ifdef CONFIG_LOCKDEP
+static inline int __future_dirent_lock_level(struct configfs_dirent *prev_sd,
+					     struct configfs_dirent *sd)
+{
+	int lock_level = prev_sd->s_lock_level + 1;
+	return (lock_level + I_MUTEX_CHILD < MAX_LOCKDEP_SUBCLASSES) ?
+		lock_level : -ELOOP;
+}
+
+static inline void __set_dirent_lock_level(struct configfs_dirent *sd,
+					   int lock_level)
+{
+	sd->s_lock_level = lock_level;
+}
+
 static inline int set_dirent_lock_level(struct configfs_dirent *prev_sd,
 					struct configfs_dirent *sd)
 {
-	int lock_level = prev_sd->s_lock_level + 1;
-	if (lock_level + I_MUTEX_CHILD < MAX_LOCKDEP_SUBCLASSES) {
-		sd->s_lock_level = lock_level;
-		return lock_level;
-	}
-	sd->s_lock_level = -1;
-	return -ELOOP;
+	int lock_level = __future_dirent_lock_level(prev_sd, sd);
+	__set_dirent_lock_level(sd, lock_level < 0 ? -1 : lock_level);
+	return lock_level;
+}
+
+static inline void copy_dirent_lock_level(struct configfs_dirent *from,
+					  struct configfs_dirent *to)
+{
+	to->s_lock_level = from->s_lock_level;
 }
 
 static inline void reset_dirent_lock_level(struct configfs_dirent *sd)
@@ -56,12 +72,28 @@ static inline void reset_dirent_lock_lev
 
 #else /* CONFIG_LOCKDEP */
 
+static inline int __future_dirent_lock_level(struct configfs_dirent *prev_sd,
+					     struct configfs_dirent *sd)
+{
+	return 0;
+}
+
+static inline void __set_dirent_lock_level(struct configfs_dirent *sd,
+					   int lock_level)
+{
+}
+
 static inline int set_dirent_lock_level(struct configfs_dirent *prev_sd,
 					struct configfs_dirent *sd)
 {
 	return 0;
 }
 
+static inline void copy_dirent_lock_level(struct configfs_dirent *from,
+					  struct configfs_dirent *to)
+{
+}
+
 static inline void reset_dirent_lock_level(struct configfs_dirent *sd)
 {
 }
@@ -372,6 +404,7 @@ static int configfs_detach_prep(struct d
 {
 	struct configfs_dirent *parent_sd = dentry->d_fsdata;
 	struct configfs_dirent *sd;
+	int lock_level;
 	int ret;
 
 	ret = -EBUSY;
@@ -383,7 +416,19 @@ static int configfs_detach_prep(struct d
 		if (sd->s_type & CONFIGFS_NOT_PINNED)
 			continue;
 		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
-			mutex_lock(&sd->s_dentry->d_inode->i_mutex);
+			/* Do not set lock_level before we acquire the mutex,
+			 * otherwise a racing mkdir in sd could start from a
+			 * too high lock_level */
+			lock_level = __future_dirent_lock_level(parent_sd, sd);
+			if (lock_level < 0) {
+				/* Bad bad bad! We cannot remove a directory
+				 * that we let be created! */
+				ret = -ELOOP;
+				break;
+			}
+			mutex_lock_nested(&sd->s_dentry->d_inode->i_mutex,
+					  I_MUTEX_CHILD + lock_level);
+			__set_dirent_lock_level(sd, lock_level);
 			/* Mark that we've taken i_mutex */
 			sd->s_type |= CONFIGFS_USET_DROPPING;
 
@@ -392,6 +437,10 @@ static int configfs_detach_prep(struct d
 			 * deep nesting of default_groups
 			 */
 			ret = configfs_detach_prep(sd->s_dentry);
+			/* Update parent's lock_level so that remaining
+			 * sibling children keep on globally increasing
+			 * lock_level */
+			copy_dirent_lock_level(sd, parent_sd);
 			if (!ret)
 				continue;
 		} else
@@ -419,6 +468,7 @@ static void configfs_detach_rollback(str
 
 			if (sd->s_type & CONFIGFS_USET_DROPPING) {
 				sd->s_type &= ~CONFIGFS_USET_DROPPING;
+				reset_dirent_lock_level(sd);
 				mutex_unlock(&sd->s_dentry->d_inode->i_mutex);
 			}
 		}
@@ -1206,9 +1256,14 @@ static int configfs_rmdir(struct inode *
 		return -EINVAL;
 	}
 
+	/* The inode of the config_item being removed is already locked by
+	 * vfs_rmdir() with subclass I_MUTEX_CHILD. Account for it. */
+	set_dirent_lock_level(parent_item->ci_dentry->d_fsdata, sd);
 	ret = configfs_detach_prep(dentry);
+	reset_dirent_lock_level(sd);
 	if (ret) {
 		configfs_detach_rollback(dentry);
+		reset_dirent_lock_level(sd);
 		config_item_put(parent_item);
 		return ret;
 	}
@@ -1492,10 +1547,13 @@ void configfs_unregister_subsystem(struc
 
 	mutex_lock_nested(&configfs_sb->s_root->d_inode->i_mutex,
 			  I_MUTEX_PARENT);
+	/* Sets subsys's configfs_dirent lock_level to 0 */
+	set_dirent_lock_level(configfs_sb->s_root->d_fsdata, dentry->d_fsdata);
 	mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
 	if (configfs_detach_prep(dentry)) {
 		printk(KERN_ERR "configfs: Tried to unregister non-empty subsystem!\n");
 	}
+	reset_dirent_lock_level(dentry->d_fsdata);
 	configfs_detach_group(&group->cg_item);
 	dentry->d_inode->i_flags |= S_DEAD;
 	mutex_unlock(&dentry->d_inode->i_mutex);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly
  2008-05-23 16:44   ` Louis Rilling
@ 2008-06-02 23:07     ` Joel Becker
  2008-06-03 16:00       ` Louis Rilling
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Becker @ 2008-06-02 23:07 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

	A couple comments.
	First, put a BUG_ON() where you have BAD BAD BAD - we shouldn't
be creating a depth we can't delete.

> @@ -392,6 +437,10 @@ static int configfs_detach_prep(struct d
>  			 * deep nesting of default_groups
>  			 */
>  			ret = configfs_detach_prep(sd->s_dentry);
> +			/* Update parent's lock_level so that remaining
> +			 * sibling children keep on globally increasing
> +			 * lock_level */
> +			copy_dirent_lock_level(sd, parent_sd);
>  			if (!ret)
>  				continue;
>  		} else

	I'm not sure I get this hunk.  If our parent was 1 and we are 2,
we are copying 2 to our parent so the parent can only have other
children at 3?

Joel

-- 

Life's Little Instruction Book #267

	"Lie on your back and look at the stars."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly
  2008-06-02 23:07     ` Joel Becker
@ 2008-06-03 16:00       ` Louis Rilling
  2008-06-06 23:01         ` Joel Becker
  0 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-06-03 16:00 UTC (permalink / raw)
  To: Joel Becker; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 02, 2008 at 04:07:21PM -0700, Joel Becker wrote:
> 	A couple comments.
> 	First, put a BUG_ON() where you have BAD BAD BAD - we shouldn't
> be creating a depth we can't delete.

I think that the best way to avoid this is to use the same numbering scheme
while attaching default groups.

This would change the body of populate_groups() like this:

-	if (group->default_groups) {
+	/* lock_level starts at zero for the non default group.
+	 * Set it even if we do not take the lock, so that we can use the same
+	 * numbering scheme as for destruction time, and can prevent overload at
+	 * destruction time. */
+	lock_level = set_dirent_lock_level(parent_sd, sd);
+	if (lock_level < 0) {
+		/* Too many default groups */
+		ret = lock_level;
+	} else if (group->default_groups) {
 		/*
 		 * FYI, we're faking mkdir here
 		 * I'm not sure we need this semaphore, as we're called
 		 * from our parent's mkdir.  That holds our parent's
 		 * i_mutex, so afaik lookup cannot continue through our
 		 * parent to find us, let alone mess with our tree.
 		 * That said, taking our i_mutex is closer to mkdir
 		 * emulation, and shouldn't hurt.
 		 */
-		/* lock_level starts at zero for the non default group */
-		lock_level = set_dirent_lock_level(parent_sd, sd);
-		if (lock_level < 0) {
-			/* Too deeply nested default groups */
-			ret = lock_level;
-		} else {
 			mutex_lock_nested(&dentry->d_inode->i_mutex,
 					  I_MUTEX_CHILD + lock_level);
 
 			for (i = 0; group->default_groups[i]; i++) {
 				new_group = group->default_groups[i];
 
 				ret = create_default_group(group, new_group);
 				if (ret)
 					break;
 			}
 
 			mutex_unlock(&dentry->d_inode->i_mutex);
-			/* Reset for future sub-group creations */
-			reset_dirent_lock_level(sd);
-		}
 	}
+	if (lock_level > 0)
+		/* Update parent lock_level to keep it increasing, but not
+		 * if group is the one actually created/registered by the
+		 * user/subsystem */
+		copy_dirent_lock_level(sd, parent_sd);
+	/* Reset for future sub-group creations */
+	reset_dirent_lock_level(sd);

> 
> > @@ -392,6 +437,10 @@ static int configfs_detach_prep(struct d
> >  			 * deep nesting of default_groups
> >  			 */
> >  			ret = configfs_detach_prep(sd->s_dentry);
> > +			/* Update parent's lock_level so that remaining
> > +			 * sibling children keep on globally increasing
> > +			 * lock_level */
> > +			copy_dirent_lock_level(sd, parent_sd);
> >  			if (!ret)
> >  				continue;
> >  		} else
> 
> 	I'm not sure I get this hunk.  If our parent was 1 and we are 2,
> we are copying 2 to our parent so the parent can only have other
> children at 3?

Exactly.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly
  2008-06-03 16:00       ` Louis Rilling
@ 2008-06-06 23:01         ` Joel Becker
  2008-06-09 11:03           ` Louis Rilling
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Becker @ 2008-06-06 23:01 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Tue, Jun 03, 2008 at 06:00:34PM +0200, Louis Rilling wrote:
> On Mon, Jun 02, 2008 at 04:07:21PM -0700, Joel Becker wrote:
> > 	A couple comments.
> > 	First, put a BUG_ON() where you have BAD BAD BAD - we shouldn't
> > be creating a depth we can't delete.
> 
> I think that the best way to avoid this is to use the same numbering scheme
> while attaching default groups.

	If I'm reading this right, when we come back up from one child
chain, we update the parent to be the same as the child - this is, i
assume, to allow all the locks to be held at once.  IOW, you are trying
to have all locks in the default groups have unique lock levels,
regardless of their depth.
	This is obviously limiting on the number of default groups for
one item - it's a total cap, not a depth cap.  But I have another
concern.  We lock a particular default_group with level N, then its
child default_group with level N+1.  But how does that integrate with
VFS locking of the same mutexes?
	Say we have an group G.  It has one default group D1.  D1 has a
default group itself, D2.  So, when we populate the groups, we lock G at
MUTEX_CHILD, D1 at MUTEX_CHILD+1, and D2 at MUTEX_CHILD+2.  However,
when the VFS navigates the tree (eg, lookup() or someone attempting an
rmdir() of D2's non-default child), it will lock with _PARENT and
_CHILD, not with our subclasses.
	Am I right about this?  We won't be using the same classes as
the VFS, and thus won't be able to see about interactions between the
VFS locking and our locking?  I'd love to be wrong :-)

Joel

-- 

"The real reason GNU ls is 8-bit-clean is so that they can
 start using ISO-8859-1 option characters."
	- Christopher Davis (ckd@loiosh.kei.com)

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly
  2008-06-06 23:01         ` Joel Becker
@ 2008-06-09 11:03           ` Louis Rilling
  2008-06-09 12:54             ` [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) " Louis Rilling
  0 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-06-09 11:03 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Fri, Jun 06, 2008 at 04:01:54PM -0700, Joel Becker wrote:
> On Tue, Jun 03, 2008 at 06:00:34PM +0200, Louis Rilling wrote:
> > On Mon, Jun 02, 2008 at 04:07:21PM -0700, Joel Becker wrote:
> > > 	A couple comments.
> > > 	First, put a BUG_ON() where you have BAD BAD BAD - we shouldn't
> > > be creating a depth we can't delete.
> > 
> > I think that the best way to avoid this is to use the same numbering scheme
> > while attaching default groups.
> 
> 	If I'm reading this right, when we come back up from one child
> chain, we update the parent to be the same as the child - this is, i
> assume, to allow all the locks to be held at once.  IOW, you are trying
> to have all locks in the default groups have unique lock levels,
> regardless of their depth.

Exactly, otherwise lockdep will issue a warning as soon as one tries to remove
a config group having default groups at the same depth, because it will see two
mutexes locked with same sub-class.

> 	This is obviously limiting on the number of default groups for
> one item - it's a total cap, not a depth cap.  But I have another
> concern.  We lock a particular default_group with level N, then its
> child default_group with level N+1.  But how does that integrate with
> VFS locking of the same mutexes?
> 	Say we have an group G.  It has one default group D1.  D1 has a
> default group itself, D2.  So, when we populate the groups, we lock G at
> MUTEX_CHILD, D1 at MUTEX_CHILD+1, and D2 at MUTEX_CHILD+2.  However,
> when the VFS navigates the tree (eg, lookup() or someone attempting an
> rmdir() of D2's non-default child), it will lock with _PARENT and
> _CHILD, not with our subclasses.
> 	Am I right about this?  We won't be using the same classes as
> the VFS, and thus won't be able to see about interactions between the
> VFS locking and our locking?  I'd love to be wrong :-)

You are perfectly right, unfortunately. This is the reason why I proposed
another way that temporarily disables lockdep, and let us prove the correctness
manually (actually, this manual solution still lets lockdep verify that the
assumption about I_MUTEX_PARENT -> I_MUTEX_CHILD nesting is correct).

A real solution without disabling lockdep and that would integrate with the VFS
should make lockdep aware of lock trees (like i_mutex locks inside a same
filesystem), or more generally lock graphs, and let lockdep verify that locks of
a tree are always taken while respecting a same order. IOW, if we are able to
consistently tag the nodes of a tree with unique numbers (consistently means
that the resulting order on the nodes is never changed when adding or removing
nodes), lockdep should check that locks of the tree are always taken in
ascending tag order.
	This seems unfortunately hard (impossible?) to achieve with reasonable
constraints: lockdep should not need to add links between the locks (this would
make addition and removal of nodes error prone), and lockdep should not need to
renumber all the nodes of a tree when adding a new node.

As a conclusion, I still suggest to temporarily disable lockdep, which will have
the advantage of letting people use lockdep (for other areas) while using
configfs, because lockdep simply cannot help us with configfs hierarchical
locking right now.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
  2008-06-09 11:03           ` Louis Rilling
@ 2008-06-09 12:54             ` Louis Rilling
  2008-06-10  1:58               ` Joel Becker
  0 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-06-09 12:54 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 8728 bytes --]

Hi,

Following an intuition, I just found a deadlock resulting from the whole default
groups tree locking in configfs_detach_prep().

I can reproduce the bug with the attached patch (which just enlarges an existing
window in VFS lock_rename()) and the following procedure, assuming that configfs
is mounted under /config, and ocfs2 is loaded with cluster support:

# mkdir /config/cluster/foo
# cd /config/cluster/foo
# ln -s /bin/mv ~/test_deadlock
# ~/test_deadlock heartbeat/dead_threshold node/bar

and in another shell, right after having launched test_deadlock:

# rmdir /config/cluster/foo

First, lockdep warns as usual (see below), and after two minutes (standard task
deadlock parameters), we get the dead lock alerts:

<log>

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-rc5 #13
---------------------------------------------
rmdir/3997 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa

but task is already holding lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac

other info that might help us debug this:
2 locks held by rmdir/3997:
 #0:  (&sb->s_type->i_mutex_key#3/1){--..}, at: [<ffffffff80297c77>] do_rmdir+0x82/0x108
 #1:  (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac

stack backtrace:
Pid: 3997, comm: rmdir Not tainted 2.6.26-rc5 #13

Call Trace:
 [<ffffffff8024aa65>] __lock_acquire+0x8d2/0xc78
 [<ffffffff802495ec>] find_usage_backwards+0x9d/0xbe
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff8024b1de>] lock_acquire+0x51/0x6c
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff80247dad>] debug_mutex_lock_common+0x16/0x23
 [<ffffffff805d63a4>] mutex_lock_nested+0xcd/0x23b
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296092>] vfs_rmdir+0x6b/0xac
 [<ffffffff80297cac>] do_rmdir+0xb7/0x108
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: task test_deadlock:3996 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
test_deadlock D 0000000000000001     0  3996   3980
 ffff81007cc93d78 0000000000000046 ffff81007cc93d40 ffffffff808ed280
 ffffffff808ed280 ffff81007cc93d28 ffffffff808ed280 ffffffff808ed280
 ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cdcaa10
Call Trace:
 [<ffffffff802955e3>] lock_rename+0x11e/0x126
 [<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
 [<ffffffff802955e3>] lock_rename+0x11e/0x126
 [<ffffffff80297838>] sys_renameat+0xd7/0x21c
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: lockdep is turned off.
INFO: task rmdir:3997 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmdir         D 0000000000000000     0  3997   3986
 ffff81007cdb9dd8 0000000000000046 0000000000000000 ffffffff808ed280
 ffffffff808ed280 ffff81007cdb9d88 ffffffff808ed280 ffffffff808ed280
 ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cde0a50
Call Trace:
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296092>] vfs_rmdir+0x6b/0xac
 [<ffffffff80297cac>] do_rmdir+0xb7/0x108
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: lockdep is turned off.

</log>

The issue here is that the VFS locks the i_mutex of the source and target
directories of the rename in source -> target order (because none is ascendent
of the other one), while configfs_detach_prep() takes them in default group
order (or reverse order, I'm not sure), following the order specified by the
groups' creator.

The VFS protects itself against deadlocks of two concurrent renames with
interverted source and target directories with i_sb->s_vfs_rename_mutex. Perhaps
configfs should use the same lock before calling configfs_detach_prep()?
Or maybe configfs would better find an alternative to locking the whole
default groups tree? I strongly advocate for the latter, since this could also
solve our issues with lockdep ;)

Louis

On Mon, Jun 09, 2008 at 01:03:53PM +0200, Louis Rilling wrote:
> On Fri, Jun 06, 2008 at 04:01:54PM -0700, Joel Becker wrote:
> > On Tue, Jun 03, 2008 at 06:00:34PM +0200, Louis Rilling wrote:
> > > On Mon, Jun 02, 2008 at 04:07:21PM -0700, Joel Becker wrote:
> > > > 	A couple comments.
> > > > 	First, put a BUG_ON() where you have BAD BAD BAD - we shouldn't
> > > > be creating a depth we can't delete.
> > > 
> > > I think that the best way to avoid this is to use the same numbering scheme
> > > while attaching default groups.
> > 
> > 	If I'm reading this right, when we come back up from one child
> > chain, we update the parent to be the same as the child - this is, i
> > assume, to allow all the locks to be held at once.  IOW, you are trying
> > to have all locks in the default groups have unique lock levels,
> > regardless of their depth.
> 
> Exactly, otherwise lockdep will issue a warning as soon as one tries to remove
> a config group having default groups at the same depth, because it will see two
> mutexes locked with same sub-class.
> 
> > 	This is obviously limiting on the number of default groups for
> > one item - it's a total cap, not a depth cap.  But I have another
> > concern.  We lock a particular default_group with level N, then its
> > child default_group with level N+1.  But how does that integrate with
> > VFS locking of the same mutexes?
> > 	Say we have an group G.  It has one default group D1.  D1 has a
> > default group itself, D2.  So, when we populate the groups, we lock G at
> > MUTEX_CHILD, D1 at MUTEX_CHILD+1, and D2 at MUTEX_CHILD+2.  However,
> > when the VFS navigates the tree (eg, lookup() or someone attempting an
> > rmdir() of D2's non-default child), it will lock with _PARENT and
> > _CHILD, not with our subclasses.
> > 	Am I right about this?  We won't be using the same classes as
> > the VFS, and thus won't be able to see about interactions between the
> > VFS locking and our locking?  I'd love to be wrong :-)
> 
> You are perfectly right, unfortunately. This is the reason why I proposed
> another way that temporarily disables lockdep, and let us prove the correctness
> manually (actually, this manual solution still lets lockdep verify that the
> assumption about I_MUTEX_PARENT -> I_MUTEX_CHILD nesting is correct).
> 
> A real solution without disabling lockdep and that would integrate with the VFS
> should make lockdep aware of lock trees (like i_mutex locks inside a same
> filesystem), or more generally lock graphs, and let lockdep verify that locks of
> a tree are always taken while respecting a same order. IOW, if we are able to
> consistently tag the nodes of a tree with unique numbers (consistently means
> that the resulting order on the nodes is never changed when adding or removing
> nodes), lockdep should check that locks of the tree are always taken in
> ascending tag order.
> 	This seems unfortunately hard (impossible?) to achieve with reasonable
> constraints: lockdep should not need to add links between the locks (this would
> make addition and removal of nodes error prone), and lockdep should not need to
> renumber all the nodes of a tree when adding a new node.
> 
> As a conclusion, I still suggest to temporarily disable lockdep, which will have
> the advantage of letting people use lockdep (for other areas) while using
> configfs, because lockdep simply cannot help us with configfs hierarchical
> locking right now.
> 
> Louis
> 
> -- 
> Dr Louis Rilling			Kerlabs
> Skype: louis.rilling			Batiment Germanium
> Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
> http://www.kerlabs.com/			35700 Rennes
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: show-configfs-deadlock-with-rename.patch --]
[-- Type: text/x-diff, Size: 784 bytes --]

---
 fs/namei.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: b/fs/namei.c
===================================================================
--- a/fs/namei.c	2008-06-09 13:33:25.000000000 +0200
+++ b/fs/namei.c	2008-06-09 13:35:57.000000000 +0200
@@ -31,6 +31,7 @@
 #include <linux/file.h>
 #include <linux/fcntl.h>
 #include <linux/device_cgroup.h>
+#include <linux/jiffies.h>
 #include <asm/namei.h>
 #include <asm/uaccess.h>
 
@@ -1566,6 +1567,11 @@ struct dentry *lock_rename(struct dentry
 	}
 
 	mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
+	if (!strcmp(current->comm, "test_deadlock")) {
+		unsigned long now = jiffies;
+		while (jiffies - now < 8 * HZ)
+			cpu_relax();
+	}
 	mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
 	return NULL;
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
  2008-06-09 12:54             ` [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) " Louis Rilling
@ 2008-06-10  1:58               ` Joel Becker
  2008-06-10 10:14                 ` Louis Rilling
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Becker @ 2008-06-10  1:58 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 09, 2008 at 02:54:43PM +0200, Louis Rilling wrote:
> Following an intuition, I just found a deadlock resulting from the whole default
> groups tree locking in configfs_detach_prep().

	Ugh, thanks for catching this :-(
 
> The issue here is that the VFS locks the i_mutex of the source and target
> directories of the rename in source -> target order (because none is ascendent
> of the other one), while configfs_detach_prep() takes them in default group
> order (or reverse order, I'm not sure), following the order specified by the
> groups' creator.

	What actual targets are you renaming?  Sibling default groups?

> The VFS protects itself against deadlocks of two concurrent renames with
> interverted source and target directories with i_sb->s_vfs_rename_mutex. Perhaps
> configfs should use the same lock before calling configfs_detach_prep()?
> Or maybe configfs would better find an alternative to locking the whole
> default groups tree? I strongly advocate for the latter, since this could also
> solve our issues with lockdep ;)

	I think the former actually works nicely.  We are playing with
the subtree, and want to keep all operations out of it.  Except, of
course, that we come into rmdir() with our parent i_mutex taken, so that
violates the ordering of the rename locks, right?
	I'm not against the latter AT ALL.  I just haven't come up with
it yet - we can't remove parts of the tree, it must be all or none.
Hence, we lock them all speculatively.

Joel

-- 

Life's Little Instruction Book #15

	"Own a great stereo system."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
  2008-06-10  1:58               ` Joel Becker
@ 2008-06-10 10:14                 ` Louis Rilling
  2008-06-10 17:36                   ` Joel Becker
  0 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-06-10 10:14 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 09, 2008 at 06:58:00PM -0700, Joel Becker wrote:
> On Mon, Jun 09, 2008 at 02:54:43PM +0200, Louis Rilling wrote:
> > Following an intuition, I just found a deadlock resulting from the whole default
> > groups tree locking in configfs_detach_prep().
> 
> 	Ugh, thanks for catching this :-(
>  
> > The issue here is that the VFS locks the i_mutex of the source and target
> > directories of the rename in source -> target order (because none is ascendent
> > of the other one), while configfs_detach_prep() takes them in default group
> > order (or reverse order, I'm not sure), following the order specified by the
> > groups' creator.
> 
> 	What actual targets are you renaming?  Sibling default groups?

Actually the operation tries to rename a file in the source default group
"foo/heartbeat/" to a new entry "bar" in a sibling default group "foo/node/" of the
source default group. The operation itself is silly regarding configfs
semantics, but VFS cannot know before locking the source and target
directories...

> 
> > The VFS protects itself against deadlocks of two concurrent renames with
> > interverted source and target directories with i_sb->s_vfs_rename_mutex. Perhaps
> > configfs should use the same lock before calling configfs_detach_prep()?
> > Or maybe configfs would better find an alternative to locking the whole
> > default groups tree? I strongly advocate for the latter, since this could also
> > solve our issues with lockdep ;)
> 
> 	I think the former actually works nicely.  We are playing with
> the subtree, and want to keep all operations out of it.  Except, of
> course, that we come into rmdir() with our parent i_mutex taken, so that
> violates the ordering of the rename locks, right?

Right.
I suggested to use i_sb->s_vfs_rename_mutex, but we cannot do this from
inside configfs_rmdir(), because locking an i_mutex (as the VFS does
before calling configfs_rmdir()) before s_vfs_rename_mutex will also
deadlock with lock_rename().

> 	I'm not against the latter AT ALL.  I just haven't come up with
> it yet - we can't remove parts of the tree, it must be all or none.
> Hence, we lock them all speculatively.

I'm slowly thinking about a solution, but I don't know the VFS enough
yet, especially regarding dentry invalidation and locking. Would it be possible
to start an rmdir() by moving the group to some unreachable place? We could
probably use the mutex of the configfs subsystem and enlarge its scope to
protect against concurrent mkdir(), check for user-created items under the
group (and descendent default groups) to remove, invalidate all dentries and
actually remove the group. Do you think that something is feasible in this way?

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
  2008-06-10 10:14                 ` Louis Rilling
@ 2008-06-10 17:36                   ` Joel Becker
  2008-06-11 14:10                     ` Louis Rilling
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Becker @ 2008-06-10 17:36 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Tue, Jun 10, 2008 at 12:14:27PM +0200, Louis Rilling wrote:
> On Mon, Jun 09, 2008 at 06:58:00PM -0700, Joel Becker wrote:
> > 	I'm not against the latter AT ALL.  I just haven't come up with
> > it yet - we can't remove parts of the tree, it must be all or none.
> > Hence, we lock them all speculatively.
> 
> I'm slowly thinking about a solution, but I don't know the VFS enough
> yet, especially regarding dentry invalidation and locking. Would it be possible
> to start an rmdir() by moving the group to some unreachable place? We could
> probably use the mutex of the configfs subsystem and enlarge its scope to
> protect against concurrent mkdir(), check for user-created items under the
> group (and descendent default groups) to remove, invalidate all dentries and
> actually remove the group. Do you think that something is feasible in this way?

	Nope, because you may have live objects below you - the rmdir
should fail, and nothing should change.  Sure, you could put it back,
but in the middle there is a period where another process tries to look
at the tree and gets ENOENT.  That's not right.
	But blocking lookup another way might work.  If we keep that
rename process out of looking up its targets (blocked on a lock we hold)
it might work.
	Note, btw, that the create side (populate_groups) is safe,
because we hold the creating parent's i_mutex throughout the entire
process.
	Hey, can we use d_revalidate?  Here's the issue.  rename, when
going to lookup the objects it wants to lock, is getting them out of
cached_lookup - there dcache locking is all that protects it.  I was
first thinking we could take the dentry locks to block this out.  But
rather, why not fail d_revalidate and force a locked lookup?  So, when
we go to lock one of these groups for detaching, we also set a flag on
the configfs_dirent.  We add a configfs_d_revalidate function that
returns based on that flag - if set, revalidation is needed.  Thus, when
another process comes in to look at the object we've already locked, it
blocks waiting to find it.
	See, in do_rename, it does do_path_lookup() before actually
calling lock_rename().  It would block there waiting for our speculative
removal.  We'd either fail rmdir, and those lookups would succeed, or
we'd succeed rmdir, and the lookup fails.
	The only concern is, can the reverse happen?  We get past the
lookups in do_rename(), and then another process comes into rmdir() -
will they deadlock there?

Joel

-- 

Life's Little Instruction Book #267

	"Lie on your back and look at the stars."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
  2008-06-10 17:36                   ` Joel Becker
@ 2008-06-11 14:10                     ` Louis Rilling
  2008-06-11 17:30                       ` Louis Rilling
  0 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-06-11 14:10 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Tue, Jun 10, 2008 at 10:36:54AM -0700, Joel Becker wrote:
> On Tue, Jun 10, 2008 at 12:14:27PM +0200, Louis Rilling wrote:
> > On Mon, Jun 09, 2008 at 06:58:00PM -0700, Joel Becker wrote:
> > > 	I'm not against the latter AT ALL.  I just haven't come up with
> > > it yet - we can't remove parts of the tree, it must be all or none.
> > > Hence, we lock them all speculatively.
> > 
> > I'm slowly thinking about a solution, but I don't know the VFS enough
> > yet, especially regarding dentry invalidation and locking. Would it be possible
> > to start an rmdir() by moving the group to some unreachable place? We could
> > probably use the mutex of the configfs subsystem and enlarge its scope to
> > protect against concurrent mkdir(), check for user-created items under the
> > group (and descendent default groups) to remove, invalidate all dentries and
> > actually remove the group. Do you think that something is feasible in this way?
> 
> 	Nope, because you may have live objects below you - the rmdir
> should fail, and nothing should change.  Sure, you could put it back,
> but in the middle there is a period where another process tries to look
> at the tree and gets ENOENT.  That's not right.

Sure. I was more thinking about moving the to-be-removed group only once we ware
sure that nobody would use it anymore (thanks to the subsys mutex enlarging its
scope for instance). Anyway, see below.

> 	But blocking lookup another way might work.  If we keep that
> rename process out of looking up its targets (blocked on a lock we hold)
> it might work.
> 	Note, btw, that the create side (populate_groups) is safe,
> because we hold the creating parent's i_mutex throughout the entire
> process.

Yes, mkdir() is completely ok. In fact, I am able to formally prove its locking
correctness (from lockdep's point of view) provided that I_MUTEX_PARENT ->
I_MUTEX_CHILD is correct. I'll come back to lockdep annotations once the
rmdir() bug is fixed.

> 	Hey, can we use d_revalidate?  Here's the issue.  rename, when
> going to lookup the objects it wants to lock, is getting them out of
> cached_lookup - there dcache locking is all that protects it.  I was
> first thinking we could take the dentry locks to block this out.  But
> rather, why not fail d_revalidate and force a locked lookup?  So, when
> we go to lock one of these groups for detaching, we also set a flag on
> the configfs_dirent.  We add a configfs_d_revalidate function that
> returns based on that flag - if set, revalidation is needed.  Thus, when
> another process comes in to look at the object we've already locked, it
> blocks waiting to find it.
> 	See, in do_rename, it does do_path_lookup() before actually
> calling lock_rename().  It would block there waiting for our speculative
> removal.  We'd either fail rmdir, and those lookups would succeed, or
> we'd succeed rmdir, and the lookup fails.
> 	The only concern is, can the reverse happen?  We get past the
> lookups in do_rename(), and then another process comes into rmdir() -
> will they deadlock there?

No we cannot make d_revalidate() temporarily fail, because it will either make
do_lookup() fail (return value < 0), or will tell do_revalidate() to call
d_invalidate() (return value == 0), which will either definitely invalidate the
dentry stored in item->ci_dentry (unlikely because its use count should be > 1,
right?), or return success to do_lookup().

The good news is that we can make d_revalidate() block on whatever we use to
protect rmdir() against racing operations. I'll try to send a patch levaraging
the subsystem's mutexes later in the day.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
  2008-06-11 14:10                     ` Louis Rilling
@ 2008-06-11 17:30                       ` Louis Rilling
  2008-06-11 21:15                         ` Joel Becker
  0 siblings, 1 reply; 17+ messages in thread
From: Louis Rilling @ 2008-06-11 17:30 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Wed, Jun 11, 2008 at 04:10:10PM +0200, Louis Rilling wrote:
> On Tue, Jun 10, 2008 at 10:36:54AM -0700, Joel Becker wrote:
> > 	Hey, can we use d_revalidate?  Here's the issue.  rename, when
> > going to lookup the objects it wants to lock, is getting them out of
> > cached_lookup - there dcache locking is all that protects it.  I was
> > first thinking we could take the dentry locks to block this out.  But
> > rather, why not fail d_revalidate and force a locked lookup?  So, when
> > we go to lock one of these groups for detaching, we also set a flag on
> > the configfs_dirent.  We add a configfs_d_revalidate function that
> > returns based on that flag - if set, revalidation is needed.  Thus, when
> > another process comes in to look at the object we've already locked, it
> > blocks waiting to find it.
> > 	See, in do_rename, it does do_path_lookup() before actually
> > calling lock_rename().  It would block there waiting for our speculative
> > removal.  We'd either fail rmdir, and those lookups would succeed, or
> > we'd succeed rmdir, and the lookup fails.
> > 	The only concern is, can the reverse happen?  We get past the
> > lookups in do_rename(), and then another process comes into rmdir() -
> > will they deadlock there?
> 
> No we cannot make d_revalidate() temporarily fail, because it will either make
> do_lookup() fail (return value < 0), or will tell do_revalidate() to call
> d_invalidate() (return value == 0), which will either definitely invalidate the
> dentry stored in item->ci_dentry (unlikely because its use count should be > 1,
> right?), or return success to do_lookup().
> 
> The good news is that we can make d_revalidate() block on whatever we use to
> protect rmdir() against racing operations. I'll try to send a patch levaraging
> the subsystem's mutexes later in the day.

I've spoken too fast. Doing things in d_revalidate() (or even in
configfs_lookup()) is just too early: after they are called and before
lock_rename(), nothing can prevent rmdir() from being called.

I'm afraid that we need a way to get rid of the whole tree locking in
configfs_detach_prep(). Imagine that we protect all calls to configfs_rmdir()
and configfs_mkdir() with a global mutex (say configfs_dir_mutex). Is it still
needed to take default group's i_mutex in configfs_detach_prep() to check for
the validity of the rmdir()?
	After this check, we could set a REMOVING flag in configfs_dirent of all
default groups (could actually be done by configfs_detach_prep(), making
configfs_detach_rollback() resetting the flag), release configfs_dir_mutex, and
then call configfs_detach_group(), with detach_groups() taking default group's
i_mutex right before calling configfs_detach_group() recursively (this would
lead to the same path locking as in populate_groups() or
configfs_depend_prep()).
	On mkdir() side, once configfs_dir_mutex is acquired, we first check the
REMOVING flag, and proceed if it is not set.

Just tell me if you think that it's feasible, and I'll send you a patch if it's
ok.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
  2008-06-11 17:30                       ` Louis Rilling
@ 2008-06-11 21:15                         ` Joel Becker
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Becker @ 2008-06-11 21:15 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Wed, Jun 11, 2008 at 07:30:47PM +0200, Louis Rilling wrote:
> I've spoken too fast. Doing things in d_revalidate() (or even in
> configfs_lookup()) is just too early: after they are called and before
> lock_rename(), nothing can prevent rmdir() from being called.

	Yeah, that's what I was saying.

> I'm afraid that we need a way to get rid of the whole tree locking in
> configfs_detach_prep(). Imagine that we protect all calls to configfs_rmdir()
> and configfs_mkdir() with a global mutex (say configfs_dir_mutex). Is it still
> needed to take default group's i_mutex in configfs_detach_prep() to check for
> the validity of the rmdir()?

	That's not a bad idea.  I was basically sure we need the tree
locking to prevent other VFS operations from happening, but the more I
think about it, the only reason we do the prep locking is to check and
protect the ENOTEMPTY case.  I've bounced it off Mark too, and he
agrees.

> 	After this check, we could set a REMOVING flag in configfs_dirent of all
> default groups (could actually be done by configfs_detach_prep(), making
> configfs_detach_rollback() resetting the flag), release configfs_dir_mutex, and
> then call configfs_detach_group(), with detach_groups() taking default group's
> i_mutex right before calling configfs_detach_group() recursively (this would
> lead to the same path locking as in populate_groups() or
> configfs_depend_prep()).

	We already set the flag - CONFIGFS_USET_DROPPING.  The only
change is that you don't take i_mutex anymore.  For the static mutex,
the name 'configfs_dir_mutex' is fine.

> 	On mkdir() side, once configfs_dir_mutex is acquired, we first check the
> REMOVING flag, and proceed if it is not set.

	And if it is set, we return -ENOENT.

> Just tell me if you think that it's feasible, and I'll send you a patch if it's

	Have at!  Thanks.

Joel

-- 

"Vote early and vote often." 
        - Al Capone

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-06-11 21:16 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-22 11:40 [RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2 Louis Rilling
2008-05-22 11:40 ` [RFC][PATCH 1/4] Prepare i_mutex lockdep subclasses for locking of variable path lengths Louis Rilling
2008-05-22 11:40 ` [RFC][PATCH 2/4] Prepare vfs_rmdir() for further nested i_mutex locking Louis Rilling
2008-05-22 11:40 ` [RFC][PATCH 3/4] configfs: Make nested default groups creations lockdep-friendly Louis Rilling
2008-05-22 11:40 ` [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly Louis Rilling
2008-05-23 16:44   ` Louis Rilling
2008-06-02 23:07     ` Joel Becker
2008-06-03 16:00       ` Louis Rilling
2008-06-06 23:01         ` Joel Becker
2008-06-09 11:03           ` Louis Rilling
2008-06-09 12:54             ` [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) " Louis Rilling
2008-06-10  1:58               ` Joel Becker
2008-06-10 10:14                 ` Louis Rilling
2008-06-10 17:36                   ` Joel Becker
2008-06-11 14:10                     ` Louis Rilling
2008-06-11 17:30                       ` Louis Rilling
2008-06-11 21:15                         ` Joel Becker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).