linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3][BUGFIX] configfs: Fix deadlock rmdir() vs rename()
@ 2008-06-12 13:31 Louis Rilling
  2008-06-12 13:31 ` [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Louis Rilling @ 2008-06-12 13:31 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

Hi,

This patchset fixes the deadlock described below. I did not exactly implement
what I described earlier, since I found this more elegant solution meanwhile.
Please see the third patch header for a detailed explanation of the fix.

The following procedure can trigger a deadlock in configfs (see
http://www.ussg.iu.edu/hypermail/linux/kernel/0806.1/0380.html for a patch
that makes it easier to trigger):

# mkdir /config/cluster/foo
# cd /config/cluster/foo
# mv heartbeat/dead_threshold node/bar

and in another shell, right after having launched test_deadlock:

# rmdir /config/cluster/foo

First, lockdep warns as usual (see below), and after two minutes (standard task
deadlock parameters), we get the dead lock alerts:

<log>

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-rc5 #13
---------------------------------------------
rmdir/3997 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa

but task is already holding lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac

other info that might help us debug this:
2 locks held by rmdir/3997:
 #0:  (&sb->s_type->i_mutex_key#3/1){--..}, at: [<ffffffff80297c77>] do_rmdir+0x82/0x108
 #1:  (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac

stack backtrace:
Pid: 3997, comm: rmdir Not tainted 2.6.26-rc5 #13

Call Trace:
 [<ffffffff8024aa65>] __lock_acquire+0x8d2/0xc78
 [<ffffffff802495ec>] find_usage_backwards+0x9d/0xbe
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff8024b1de>] lock_acquire+0x51/0x6c
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff80247dad>] debug_mutex_lock_common+0x16/0x23
 [<ffffffff805d63a4>] mutex_lock_nested+0xcd/0x23b
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296092>] vfs_rmdir+0x6b/0xac
 [<ffffffff80297cac>] do_rmdir+0xb7/0x108
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: task test_deadlock:3996 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
test_deadlock D 0000000000000001     0  3996   3980
 ffff81007cc93d78 0000000000000046 ffff81007cc93d40 ffffffff808ed280
 ffffffff808ed280 ffff81007cc93d28 ffffffff808ed280 ffffffff808ed280
 ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cdcaa10
Call Trace:
 [<ffffffff802955e3>] lock_rename+0x11e/0x126
 [<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
 [<ffffffff802955e3>] lock_rename+0x11e/0x126
 [<ffffffff80297838>] sys_renameat+0xd7/0x21c
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: lockdep is turned off.
INFO: task rmdir:3997 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmdir         D 0000000000000000     0  3997   3986
 ffff81007cdb9dd8 0000000000000046 0000000000000000 ffffffff808ed280
 ffffffff808ed280 ffff81007cdb9d88 ffffffff808ed280 ffffffff808ed280
 ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cde0a50
Call Trace:
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296092>] vfs_rmdir+0x6b/0xac
 [<ffffffff80297cac>] do_rmdir+0xb7/0x108
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: lockdep is turned off.

</log>

The issue here is that the VFS locks the i_mutex of the source and target
directories of the rename in source -> target order (because none is ascendent
of the other one), while configfs_detach_prep() takes them in default group
order (or reverse order, I'm not sure), following the order specified by the
groups' creator.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-12 13:31 [PATCH 0/3][BUGFIX] configfs: Fix deadlock rmdir() vs rename() Louis Rilling
@ 2008-06-12 13:31 ` Louis Rilling
  2008-06-12 19:13   ` Joel Becker
  2008-06-12 13:31 ` [PATCH 2/3][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
  2008-06-12 13:31 ` [PATCH 3/3][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
  2 siblings, 1 reply; 14+ messages in thread
From: Louis Rilling @ 2008-06-12 13:31 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-introduce-configfs_dirent_lock.patch --]
[-- Type: text/plain, Size: 6541 bytes --]

This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
traversals against linkage mutations (add/del/move). This will allow
configfs_detach_prep() to avoid locking i_mutexes. 

Locking rules for configfs_dirent linkage mutations are the same plus the
requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
either take appropriate i_mutex as before, or take configfs_dirent_lock.

The spinlock could actually be a mutex, but the critical sections are either
O(1) or should not be too long (default groups walking in last patch).

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/configfs_internal.h |    3 +++
 fs/configfs/dir.c               |   25 +++++++++++++++++++++++++
 fs/configfs/inode.c             |    2 ++
 fs/configfs/symlink.c           |    2 ++
 4 files changed, 32 insertions(+)

Index: b/fs/configfs/configfs_internal.h
===================================================================
--- a/fs/configfs/configfs_internal.h	2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/configfs_internal.h	2008-06-12 13:44:34.000000000 +0200
@@ -26,6 +26,7 @@
 
 #include <linux/slab.h>
 #include <linux/list.h>
+#include <linux/spinlock.h>
 
 struct configfs_dirent {
 	atomic_t		s_count;
@@ -49,6 +50,8 @@ struct configfs_dirent {
 #define CONFIGFS_USET_DROPPING	0x0100
 #define CONFIGFS_NOT_PINNED	(CONFIGFS_ITEM_ATTR)
 
+extern spinlock_t configfs_dirent_lock;
+
 extern struct vfsmount * configfs_mount;
 extern struct kmem_cache *configfs_dir_cachep;
 
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-12 15:20:26.000000000 +0200
@@ -35,6 +35,11 @@
 #include "configfs_internal.h"
 
 DECLARE_RWSEM(configfs_rename_sem);
+/*
+ * Protects configfs_dirent traversals against linkage mutations
+ * Can be used as an alternative to taking the concerned i_mutex
+ */
+DEFINE_SPINLOCK(configfs_dirent_lock);
 
 static void configfs_d_iput(struct dentry * dentry,
 			    struct inode * inode)
@@ -79,7 +84,9 @@ static struct configfs_dirent *configfs_
 	atomic_set(&sd->s_count, 1);
 	INIT_LIST_HEAD(&sd->s_links);
 	INIT_LIST_HEAD(&sd->s_children);
+	spin_lock(&configfs_dirent_lock);
 	list_add(&sd->s_sibling, &parent_sd->s_children);
+	spin_unlock(&configfs_dirent_lock);
 	sd->s_element = element;
 
 	return sd;
@@ -173,7 +180,9 @@ static int create_dir(struct config_item
 		} else {
 			struct configfs_dirent *sd = d->d_fsdata;
 			if (sd) {
+				spin_lock(&configfs_dirent_lock);
 				list_del_init(&sd->s_sibling);
+				spin_unlock(&configfs_dirent_lock);
 				configfs_put(sd);
 			}
 		}
@@ -224,7 +233,9 @@ int configfs_create_link(struct configfs
 		else {
 			struct configfs_dirent *sd = dentry->d_fsdata;
 			if (sd) {
+				spin_lock(&configfs_dirent_lock);
 				list_del_init(&sd->s_sibling);
+				spin_unlock(&configfs_dirent_lock);
 				configfs_put(sd);
 			}
 		}
@@ -238,7 +249,9 @@ static void remove_dir(struct dentry * d
 	struct configfs_dirent * sd;
 
 	sd = d->d_fsdata;
+	spin_lock(&configfs_dirent_lock);
 	list_del_init(&sd->s_sibling);
+	spin_unlock(&configfs_dirent_lock);
 	configfs_put(sd);
 	if (d->d_inode)
 		simple_rmdir(parent->d_inode,d);
@@ -410,7 +423,9 @@ static void detach_attrs(struct config_i
 	list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
 		if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
 			continue;
+		spin_lock(&configfs_dirent_lock);
 		list_del_init(&sd->s_sibling);
+		spin_unlock(&configfs_dirent_lock);
 		configfs_drop_dentry(sd, dentry);
 		configfs_put(sd);
 	}
@@ -1268,7 +1283,9 @@ static int configfs_dir_close(struct ino
 	struct configfs_dirent * cursor = file->private_data;
 
 	mutex_lock(&dentry->d_inode->i_mutex);
+	spin_lock(&configfs_dirent_lock);
 	list_del_init(&cursor->s_sibling);
+	spin_unlock(&configfs_dirent_lock);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 
 	release_configfs_dirent(cursor);
@@ -1308,7 +1325,9 @@ static int configfs_readdir(struct file 
 			/* fallthrough */
 		default:
 			if (filp->f_pos == 2) {
+				spin_lock(&configfs_dirent_lock);
 				list_move(q, &parent_sd->s_children);
+				spin_unlock(&configfs_dirent_lock);
 			}
 			for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
 				struct configfs_dirent *next;
@@ -1331,7 +1350,9 @@ static int configfs_readdir(struct file 
 						 dt_type(next)) < 0)
 					return 0;
 
+				spin_lock(&configfs_dirent_lock);
 				list_move(q, p);
+				spin_unlock(&configfs_dirent_lock);
 				p = q;
 				filp->f_pos++;
 			}
@@ -1362,7 +1383,9 @@ static loff_t configfs_dir_lseek(struct 
 			struct list_head *p;
 			loff_t n = file->f_pos - 2;
 
+			spin_lock(&configfs_dirent_lock);
 			list_del(&cursor->s_sibling);
+			spin_unlock(&configfs_dirent_lock);
 			p = sd->s_children.next;
 			while (n && p != &sd->s_children) {
 				struct configfs_dirent *next;
@@ -1372,7 +1395,9 @@ static loff_t configfs_dir_lseek(struct 
 					n--;
 				p = p->next;
 			}
+			spin_lock(&configfs_dirent_lock);
 			list_add_tail(&cursor->s_sibling, p);
+			spin_unlock(&configfs_dirent_lock);
 		}
 	}
 	mutex_unlock(&dentry->d_inode->i_mutex);
Index: b/fs/configfs/inode.c
===================================================================
--- a/fs/configfs/inode.c	2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/inode.c	2008-06-12 13:44:34.000000000 +0200
@@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
 		if (!sd->s_element)
 			continue;
 		if (!strcmp(configfs_get_name(sd), name)) {
+			spin_lock(&configfs_dirent_lock);
 			list_del_init(&sd->s_sibling);
+			spin_unlock(&configfs_dirent_lock);
 			configfs_drop_dentry(sd, dir);
 			configfs_put(sd);
 			break;
Index: b/fs/configfs/symlink.c
===================================================================
--- a/fs/configfs/symlink.c	2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/symlink.c	2008-06-12 13:44:34.000000000 +0200
@@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
 	parent_item = configfs_get_config_item(dentry->d_parent);
 	type = parent_item->ci_type;
 
+	spin_lock(&configfs_dirent_lock);
 	list_del_init(&sd->s_sibling);
+	spin_unlock(&configfs_dirent_lock);
 	configfs_drop_dentry(sd, dentry->d_parent);
 	dput(dentry);
 	configfs_put(sd);

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/3][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL
  2008-06-12 13:31 [PATCH 0/3][BUGFIX] configfs: Fix deadlock rmdir() vs rename() Louis Rilling
  2008-06-12 13:31 ` [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
@ 2008-06-12 13:31 ` Louis Rilling
  2008-06-12 13:31 ` [PATCH 3/3][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
  2 siblings, 0 replies; 14+ messages in thread
From: Louis Rilling @ 2008-06-12 13:31 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-make-configfs_new_dirent-return-errno.patch --]
[-- Type: text/plain, Size: 1666 bytes --]

This patch makes configfs_new_dirent return negative error code instead of NULL,
which will be useful in the next patch to differentiate ENOMEM from ENOENT.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/dir.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-12 13:45:14.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-12 15:18:28.000000000 +0200
@@ -30,6 +30,7 @@
 #include <linux/mount.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/err.h>
 
 #include <linux/configfs.h>
 #include "configfs_internal.h"
@@ -79,7 +80,7 @@ static struct configfs_dirent *configfs_
 
 	sd = kmem_cache_zalloc(configfs_dir_cachep, GFP_KERNEL);
 	if (!sd)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	atomic_set(&sd->s_count, 1);
 	INIT_LIST_HEAD(&sd->s_links);
@@ -125,8 +126,8 @@ int configfs_make_dirent(struct configfs
 	struct configfs_dirent * sd;
 
 	sd = configfs_new_dirent(parent_sd, element);
-	if (!sd)
-		return -ENOMEM;
+	if (IS_ERR(sd))
+		return PTR_ERR(sd);
 
 	sd->s_mode = mode;
 	sd->s_type = type;
@@ -1273,7 +1274,7 @@ static int configfs_dir_open(struct inod
 	file->private_data = configfs_new_dirent(parent_sd, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 
-	return file->private_data ? 0 : -ENOMEM;
+	return IS_ERR(file->private_data) ? PTR_ERR(file->private_data) : 0;
 
 }
 

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/3][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename()
  2008-06-12 13:31 [PATCH 0/3][BUGFIX] configfs: Fix deadlock rmdir() vs rename() Louis Rilling
  2008-06-12 13:31 ` [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
  2008-06-12 13:31 ` [PATCH 2/3][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
@ 2008-06-12 13:31 ` Louis Rilling
  2 siblings, 0 replies; 14+ messages in thread
From: Louis Rilling @ 2008-06-12 13:31 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-fix-rmdir-vs-rename-deadlock.patch --]
[-- Type: text/plain, Size: 5737 bytes --]

This patch fixes the deadlock between racing sys_rename() and configfs_rmdir().

The idea is to avoid locking i_mutexes of default groups in
configfs_detach_prep(), and rely instead on the new configfs_dirent_lock to
protect against configfs_dirent's linkage mutations. To ensure that an mkdir()
racing with rmdir() will not create new items in a to-be-removed default group,
we make configfs_new_dirent() check for the CONFIGFS_USET_DROPPING flag right
before linking the new dirent, and return error if the flag is set. This makes
racing mkdir()/symlink()/dir_open() fail in places where errors could already
happen, resp. in (attach_item()|attach_group())/create_link()/new_dirent().

configfs_depend() remains safe since it locks all the path from configfs root,
and is thus mutually exclusive with rmdir().

An advantage of this is that now detach_groups() unconditionnaly takes the
default groups i_mutex, which makes it more consistent with populate_groups().

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/dir.c |   44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-12 13:45:18.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-12 13:50:10.000000000 +0200
@@ -38,6 +38,9 @@
 DECLARE_RWSEM(configfs_rename_sem);
 /*
  * Protects configfs_dirent traversals against linkage mutations
+ * Protects setting of CONFIGFS_USET_DROPPING: checking the flag
+ * unlocked is not reliable unless in detach_groups called from
+ * rmdir/unregister and from configfs_attach_group
  * Can be used as an alternative to taking the concerned i_mutex
  */
 DEFINE_SPINLOCK(configfs_dirent_lock);
@@ -86,6 +89,11 @@ static struct configfs_dirent *configfs_
 	INIT_LIST_HEAD(&sd->s_links);
 	INIT_LIST_HEAD(&sd->s_children);
 	spin_lock(&configfs_dirent_lock);
+	if (parent_sd->s_type & CONFIGFS_USET_DROPPING) {
+		spin_unlock(&configfs_dirent_lock);
+		kmem_cache_free(configfs_dir_cachep, sd);
+		return ERR_PTR(-ENOENT);
+	}
 	list_add(&sd->s_sibling, &parent_sd->s_children);
 	spin_unlock(&configfs_dirent_lock);
 	sd->s_element = element;
@@ -345,11 +353,11 @@ static struct dentry * configfs_lookup(s
 
 /*
  * Only subdirectories count here.  Files (CONFIGFS_NOT_PINNED) are
- * attributes and are removed by rmdir().  We recurse, taking i_mutex
- * on all children that are candidates for default detach.  If the
- * result is clean, then configfs_detach_group() will handle dropping
- * i_mutex.  If there is an error, the caller will clean up the i_mutex
- * holders via configfs_detach_rollback().
+ * attributes and are removed by rmdir().  We recurse, setting
+ * CONFIGFS_USET_DROPPING on all children that are candidates for
+ * default detach.
+ * If there is an error, the caller will reset the flags via
+ * configfs_detach_rollback().
  */
 static int configfs_detach_prep(struct dentry *dentry)
 {
@@ -366,8 +374,7 @@ static int configfs_detach_prep(struct d
 		if (sd->s_type & CONFIGFS_NOT_PINNED)
 			continue;
 		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
-			mutex_lock(&sd->s_dentry->d_inode->i_mutex);
-			/* Mark that we've taken i_mutex */
+			/* Mark that we're trying to drop the group */
 			sd->s_type |= CONFIGFS_USET_DROPPING;
 
 			/*
@@ -388,7 +395,7 @@ out:
 }
 
 /*
- * Walk the tree, dropping i_mutex wherever CONFIGFS_USET_DROPPING is
+ * Walk the tree, resetting CONFIGFS_USET_DROPPING wherever it was
  * set.
  */
 static void configfs_detach_rollback(struct dentry *dentry)
@@ -399,11 +406,7 @@ static void configfs_detach_rollback(str
 	list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
 		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
 			configfs_detach_rollback(sd->s_dentry);
-
-			if (sd->s_type & CONFIGFS_USET_DROPPING) {
-				sd->s_type &= ~CONFIGFS_USET_DROPPING;
-				mutex_unlock(&sd->s_dentry->d_inode->i_mutex);
-			}
+			sd->s_type &= ~CONFIGFS_USET_DROPPING;
 		}
 	}
 }
@@ -482,16 +485,12 @@ static void detach_groups(struct config_
 
 		child = sd->s_dentry;
 
+		mutex_lock(&child->d_inode->i_mutex);
+
 		configfs_detach_group(sd->s_element);
 		child->d_inode->i_flags |= S_DEAD;
 
-		/*
-		 * From rmdir/unregister, a configfs_detach_prep() pass
-		 * has taken our i_mutex for us.  Drop it.
-		 * From mkdir/register cleanup, there is no sem held.
-		 */
-		if (sd->s_type & CONFIGFS_USET_DROPPING)
-			mutex_unlock(&child->d_inode->i_mutex);
+		mutex_unlock(&child->d_inode->i_mutex);
 
 		d_delete(child);
 		dput(child);
@@ -1177,12 +1176,15 @@ static int configfs_rmdir(struct inode *
 		return -EINVAL;
 	}
 
+	spin_lock(&configfs_dirent_lock);
 	ret = configfs_detach_prep(dentry);
 	if (ret) {
 		configfs_detach_rollback(dentry);
+		spin_unlock(&configfs_dirent_lock);
 		config_item_put(parent_item);
 		return ret;
 	}
+	spin_unlock(&configfs_dirent_lock);
 
 	/* Get a working ref for the duration of this function */
 	item = configfs_get_config_item(dentry);
@@ -1474,9 +1476,11 @@ void configfs_unregister_subsystem(struc
 	mutex_lock_nested(&configfs_sb->s_root->d_inode->i_mutex,
 			  I_MUTEX_PARENT);
 	mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+	spin_lock(&configfs_dirent_lock);
 	if (configfs_detach_prep(dentry)) {
 		printk(KERN_ERR "configfs: Tried to unregister non-empty subsystem!\n");
 	}
+	spin_unlock(&configfs_dirent_lock);
 	configfs_detach_group(&group->cg_item);
 	dentry->d_inode->i_flags |= S_DEAD;
 	mutex_unlock(&dentry->d_inode->i_mutex);

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-12 13:31 ` [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
@ 2008-06-12 19:13   ` Joel Becker
  2008-06-12 22:25     ` Louis Rilling
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Becker @ 2008-06-12 19:13 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, ocfs2-devel

On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
> traversals against linkage mutations (add/del/move). This will allow
> configfs_detach_prep() to avoid locking i_mutexes. 

	I like that you expanded the scope to cover all configfs_dirent
linkages.  These are all protected by i_mutex in the current code, but
your rename fix removes that protection.

> Locking rules for configfs_dirent linkage mutations are the same plus the
> requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> either take appropriate i_mutex as before, or take configfs_dirent_lock.

	Nope, you *must* take configfs_dirent_lock now.  You've removed
i_mutex protection in the last patch.


> The spinlock could actually be a mutex, but the critical sections are either
> O(1) or should not be too long (default groups walking in last patch).

	I'm wary of someone's reasonably deep groups.  Discussing it
yesterday I'd been convinced that a mutex was good to start with.  But
given your increased scope of the lock, let's try the spinlock and see
what happens.

> +extern spinlock_t configfs_dirent_lock;

	Boy I wish this could be static to one file :-(

> @@ -79,7 +84,9 @@ static struct configfs_dirent *configfs_
>  	atomic_set(&sd->s_count, 1);
>  	INIT_LIST_HEAD(&sd->s_links);
>  	INIT_LIST_HEAD(&sd->s_children);
> +	spin_lock(&configfs_dirent_lock);
>  	list_add(&sd->s_sibling, &parent_sd->s_children);
> +	spin_unlock(&configfs_dirent_lock);
>  	sd->s_element = element;

	You need to set s_element either under the lock or before taking
the lock.  Once you've dropped the lock, someone can reach this dirent
from the parent, and should see s_element.

> @@ -173,7 +180,9 @@ static int create_dir(struct config_item
>  		} else {
>  			struct configfs_dirent *sd = d->d_fsdata;
>  			if (sd) {
> +				spin_lock(&configfs_dirent_lock);
>  				list_del_init(&sd->s_sibling);
> +				spin_unlock(&configfs_dirent_lock);
>  				configfs_put(sd);
>  			}
>  		}
> @@ -224,7 +233,9 @@ int configfs_create_link(struct configfs
>  		else {
>  			struct configfs_dirent *sd = dentry->d_fsdata;
>  			if (sd) {
> +				spin_lock(&configfs_dirent_lock);
>  				list_del_init(&sd->s_sibling);
> +				spin_unlock(&configfs_dirent_lock);
>  				configfs_put(sd);
>  			}
>  		}

	These strike me as wrong - I would think that one should either
see the old configfs_dirent tree or the new one.  That is, one would
take the dirent lock at the beginning of configfs_mkdir() and release it
at the end - so any other code that looks at the dirent tree will see an
atomic change.  Here, some other path could see the new dirent after
configfs_make_dirent() but then it disappears when configfs_create()
fails.
	If you did that, though, it'd have to be a mutex.
	Now, the only thing that sees this intermediate condition is
configfs itself.  Everyone else is protected by i_mutex.  I guess it's
OK - but can you comment that fact?  i_mutex does *not* protect
traversal of the configfs_dirent tree, but it does prevent the outside
world from seeing the intermediate states.

> @@ -238,7 +249,9 @@ static void remove_dir(struct dentry * d
>  	struct configfs_dirent * sd;
>  
>  	sd = d->d_fsdata;
> +	spin_lock(&configfs_dirent_lock);
>  	list_del_init(&sd->s_sibling);
> +	spin_unlock(&configfs_dirent_lock);
>  	configfs_put(sd);
>  	if (d->d_inode)
>  		simple_rmdir(parent->d_inode,d);
> @@ -410,7 +423,9 @@ static void detach_attrs(struct config_i
>  	list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
>  		if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
>  			continue;
> +		spin_lock(&configfs_dirent_lock);
>  		list_del_init(&sd->s_sibling);
> +		spin_unlock(&configfs_dirent_lock);
>  		configfs_drop_dentry(sd, dentry);
>  		configfs_put(sd);
>  	}
> @@ -1268,7 +1283,9 @@ static int configfs_dir_close(struct ino
>  	struct configfs_dirent * cursor = file->private_data;
>  
>  	mutex_lock(&dentry->d_inode->i_mutex);
> +	spin_lock(&configfs_dirent_lock);
>  	list_del_init(&cursor->s_sibling);
> +	spin_unlock(&configfs_dirent_lock);
>  	mutex_unlock(&dentry->d_inode->i_mutex);
>  
>  	release_configfs_dirent(cursor);
> @@ -1362,7 +1383,9 @@ static loff_t configfs_dir_lseek(struct 
>  			struct list_head *p;
>  			loff_t n = file->f_pos - 2;
>  
> +			spin_lock(&configfs_dirent_lock);
>  			list_del(&cursor->s_sibling);
> +			spin_unlock(&configfs_dirent_lock);
>  			p = sd->s_children.next;
>  			while (n && p != &sd->s_children) {
>  				struct configfs_dirent *next;
> @@ -1372,7 +1395,9 @@ static loff_t configfs_dir_lseek(struct 
>  					n--;
>  				p = p->next;
>  			}
> +			spin_lock(&configfs_dirent_lock);
>  			list_add_tail(&cursor->s_sibling, p);
> +			spin_unlock(&configfs_dirent_lock);
>  		}
>  	}
>  	mutex_unlock(&dentry->d_inode->i_mutex);
> Index: b/fs/configfs/inode.c
> ===================================================================
> --- a/fs/configfs/inode.c	2008-06-12 13:44:27.000000000 +0200
> +++ b/fs/configfs/inode.c	2008-06-12 13:44:34.000000000 +0200
> @@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
>  		if (!sd->s_element)
>  			continue;
>  		if (!strcmp(configfs_get_name(sd), name)) {
> +			spin_lock(&configfs_dirent_lock);
>  			list_del_init(&sd->s_sibling);
> +			spin_unlock(&configfs_dirent_lock);
>  			configfs_drop_dentry(sd, dir);
>  			configfs_put(sd);
>  			break;
> Index: b/fs/configfs/symlink.c
> ===================================================================
> --- a/fs/configfs/symlink.c	2008-06-12 13:44:27.000000000 +0200
> +++ b/fs/configfs/symlink.c	2008-06-12 13:44:34.000000000 +0200
> @@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
>  	parent_item = configfs_get_config_item(dentry->d_parent);
>  	type = parent_item->ci_type;
>  
> +	spin_lock(&configfs_dirent_lock);
>  	list_del_init(&sd->s_sibling);
> +	spin_unlock(&configfs_dirent_lock);
>  	configfs_drop_dentry(sd, dentry->d_parent);
>  	dput(dentry);
>  	configfs_put(sd);

	You do the lock,del(sibling),unlock so often, maybe it should be
a helper.  Then you can make configfs_dirent_lock static to dir.c.
Well, you use dirent_lock in your s_links patch, so maybe not static.

Joel

-- 

"Copy from one, it's plagiarism; copy from two, it's research."
        - Wilson Mizner

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-12 19:13   ` Joel Becker
@ 2008-06-12 22:25     ` Louis Rilling
  2008-06-13  2:41       ` Joel Becker
  0 siblings, 1 reply; 14+ messages in thread
From: Louis Rilling @ 2008-06-12 22:25 UTC (permalink / raw)
  To: Joel Becker; +Cc: linux-kernel, ocfs2-devel

On Thu, Jun 12, 2008 at 12:13:48PM -0700, Joel Becker wrote:
> On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> > This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
> > traversals against linkage mutations (add/del/move). This will allow
> > configfs_detach_prep() to avoid locking i_mutexes. 
> 
> 	I like that you expanded the scope to cover all configfs_dirent
> linkages.  These are all protected by i_mutex in the current code, but
> your rename fix removes that protection.
> 
> > Locking rules for configfs_dirent linkage mutations are the same plus the
> > requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> > either take appropriate i_mutex as before, or take configfs_dirent_lock.
> 
> 	Nope, you *must* take configfs_dirent_lock now.  You've removed
> i_mutex protection in the last patch.

Oh well. Do you mean because of CONFIGFS_USET_DROPPING being set without
i_mutex locked? This is the only mutation (except in the s_links patch) done
without i_mutex locked. I thought that actually either other
configfs_dirent traversals like readdir() and dir_lseek() would prevent
detach_prep() from succeeding because they add dirents before, or are done in
places where detach_prep() cannot do harm because new_dirent() fails whenever it
sees CONFIGFS_USET_DROPPING: detach_attrs() and detach_groups()
must ignore CONFIGFS_USET_DROPPING, depend_prep() is protected since the
whole path is locked from configfs root, lookup() can succeed since at worst its
result will be invalidated when actually detaching the default groups. The only
function for which I can not figure out is configfs_hash_and_remove(), but it is
not used.
	I admit that the case of symlink() needs an extra check to ensure
that the target is not about to be removed. The bug was already there
though, right?
	Anyway, if it looks conceptually simpler to use
configfs_dirent_lock (probably better a mutex in that case) wherever
i_mutex are supposed to protect configfs_dirent traversals, I'm ok with it.

> 
> 
> > The spinlock could actually be a mutex, but the critical sections are either
> > O(1) or should not be too long (default groups walking in last patch).
> 
> 	I'm wary of someone's reasonably deep groups.  Discussing it
> yesterday I'd been convinced that a mutex was good to start with.  But
> given your increased scope of the lock, let's try the spinlock and see
> what happens.
> 
> > +extern spinlock_t configfs_dirent_lock;
> 
> 	Boy I wish this could be static to one file :-(
> 
> > @@ -79,7 +84,9 @@ static struct configfs_dirent *configfs_
> >  	atomic_set(&sd->s_count, 1);
> >  	INIT_LIST_HEAD(&sd->s_links);
> >  	INIT_LIST_HEAD(&sd->s_children);
> > +	spin_lock(&configfs_dirent_lock);
> >  	list_add(&sd->s_sibling, &parent_sd->s_children);
> > +	spin_unlock(&configfs_dirent_lock);
> >  	sd->s_element = element;
> 
> 	You need to set s_element either under the lock or before taking
> the lock.  Once you've dropped the lock, someone can reach this dirent
> from the parent, and should see s_element.

Will do.

> 
> > @@ -173,7 +180,9 @@ static int create_dir(struct config_item
> >  		} else {
> >  			struct configfs_dirent *sd = d->d_fsdata;
> >  			if (sd) {
> > +				spin_lock(&configfs_dirent_lock);
> >  				list_del_init(&sd->s_sibling);
> > +				spin_unlock(&configfs_dirent_lock);
> >  				configfs_put(sd);
> >  			}
> >  		}
> > @@ -224,7 +233,9 @@ int configfs_create_link(struct configfs
> >  		else {
> >  			struct configfs_dirent *sd = dentry->d_fsdata;
> >  			if (sd) {
> > +				spin_lock(&configfs_dirent_lock);
> >  				list_del_init(&sd->s_sibling);
> > +				spin_unlock(&configfs_dirent_lock);
> >  				configfs_put(sd);
> >  			}
> >  		}
> 
> 	These strike me as wrong - I would think that one should either
> see the old configfs_dirent tree or the new one.  That is, one would
> take the dirent lock at the beginning of configfs_mkdir() and release it
> at the end - so any other code that looks at the dirent tree will see an
> atomic change.  Here, some other path could see the new dirent after
> configfs_make_dirent() but then it disappears when configfs_create()
> fails.
> 	If you did that, though, it'd have to be a mutex.

And we should not take other i_mutex in populate_groups() and
populate_attrs(), otherwise deadlocks could happen.

> 	Now, the only thing that sees this intermediate condition is
> configfs itself.  Everyone else is protected by i_mutex.  I guess it's
> OK - but can you comment that fact?  i_mutex does *not* protect
> traversal of the configfs_dirent tree, but it does prevent the outside
> world from seeing the intermediate states.

The only intermediate conditions that may hurt one's mind is that an
mkdir() (resp. symlink()) racing with an rmdir() can successfuly call
make_item()/make_group() (resp. allow_link()) and immediately fail when
finalizing with attach_item()/attach_group() (resp. create_link()). So, from
userspace and the VFS this seems like "mkdir foo/bar/baz" simply failed because
of "rmdir foo", while at the same time from the subsystem point of view this
seems like userspace did "mkdir foo/bar/baz; rmdir foo/bar/baz; rmdir foo".
As I pointed out in the rename fix, this however can already happen when
attach_item()/attach_group() (resp. create_link()) fails because of
memory pressure for instance.
	For the other intermediate conditions, see above about locking rules.

> 
> > @@ -238,7 +249,9 @@ static void remove_dir(struct dentry * d
> >  	struct configfs_dirent * sd;
> >  
> >  	sd = d->d_fsdata;
> > +	spin_lock(&configfs_dirent_lock);
> >  	list_del_init(&sd->s_sibling);
> > +	spin_unlock(&configfs_dirent_lock);
> >  	configfs_put(sd);
> >  	if (d->d_inode)
> >  		simple_rmdir(parent->d_inode,d);
> > @@ -410,7 +423,9 @@ static void detach_attrs(struct config_i
> >  	list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
> >  		if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
> >  			continue;
> > +		spin_lock(&configfs_dirent_lock);
> >  		list_del_init(&sd->s_sibling);
> > +		spin_unlock(&configfs_dirent_lock);
> >  		configfs_drop_dentry(sd, dentry);
> >  		configfs_put(sd);
> >  	}
> > @@ -1268,7 +1283,9 @@ static int configfs_dir_close(struct ino
> >  	struct configfs_dirent * cursor = file->private_data;
> >  
> >  	mutex_lock(&dentry->d_inode->i_mutex);
> > +	spin_lock(&configfs_dirent_lock);
> >  	list_del_init(&cursor->s_sibling);
> > +	spin_unlock(&configfs_dirent_lock);
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  
> >  	release_configfs_dirent(cursor);
> > @@ -1362,7 +1383,9 @@ static loff_t configfs_dir_lseek(struct 
> >  			struct list_head *p;
> >  			loff_t n = file->f_pos - 2;
> >  
> > +			spin_lock(&configfs_dirent_lock);
> >  			list_del(&cursor->s_sibling);
> > +			spin_unlock(&configfs_dirent_lock);
> >  			p = sd->s_children.next;
> >  			while (n && p != &sd->s_children) {
> >  				struct configfs_dirent *next;
> > @@ -1372,7 +1395,9 @@ static loff_t configfs_dir_lseek(struct 
> >  					n--;
> >  				p = p->next;
> >  			}
> > +			spin_lock(&configfs_dirent_lock);
> >  			list_add_tail(&cursor->s_sibling, p);
> > +			spin_unlock(&configfs_dirent_lock);
> >  		}
> >  	}
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> > Index: b/fs/configfs/inode.c
> > ===================================================================
> > --- a/fs/configfs/inode.c	2008-06-12 13:44:27.000000000 +0200
> > +++ b/fs/configfs/inode.c	2008-06-12 13:44:34.000000000 +0200
> > @@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
> >  		if (!sd->s_element)
> >  			continue;
> >  		if (!strcmp(configfs_get_name(sd), name)) {
> > +			spin_lock(&configfs_dirent_lock);
> >  			list_del_init(&sd->s_sibling);
> > +			spin_unlock(&configfs_dirent_lock);
> >  			configfs_drop_dentry(sd, dir);
> >  			configfs_put(sd);
> >  			break;
> > Index: b/fs/configfs/symlink.c
> > ===================================================================
> > --- a/fs/configfs/symlink.c	2008-06-12 13:44:27.000000000 +0200
> > +++ b/fs/configfs/symlink.c	2008-06-12 13:44:34.000000000 +0200
> > @@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
> >  	parent_item = configfs_get_config_item(dentry->d_parent);
> >  	type = parent_item->ci_type;
> >  
> > +	spin_lock(&configfs_dirent_lock);
> >  	list_del_init(&sd->s_sibling);
> > +	spin_unlock(&configfs_dirent_lock);
> >  	configfs_drop_dentry(sd, dentry->d_parent);
> >  	dput(dentry);
> >  	configfs_put(sd);
> 
> 	You do the lock,del(sibling),unlock so often, maybe it should be
> a helper.  Then you can make configfs_dirent_lock static to dir.c.
> Well, you use dirent_lock in your s_links patch, so maybe not static.

Yes I need it not static, or implement helpers for s_children/s_sibling
and for s_links/sl_list. And if the scope should be extended to all
cases of configfs_dirent traversals plus whatever would fix symlink()'s
target not disappearing, it should definitely not be static.

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-12 22:25     ` Louis Rilling
@ 2008-06-13  2:41       ` Joel Becker
  2008-06-13 10:45         ` Louis Rilling
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Becker @ 2008-06-13  2:41 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, ocfs2-devel

On Fri, Jun 13, 2008 at 12:25:58AM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 12:13:48PM -0700, Joel Becker wrote:
> > On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> > > Locking rules for configfs_dirent linkage mutations are the same plus the
> > > requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> > > either take appropriate i_mutex as before, or take configfs_dirent_lock.
> > 
> > 	Nope, you *must* take configfs_dirent_lock now.  You've removed
> > i_mutex protection in the last patch.
> 
> Oh well. Do you mean because of CONFIGFS_USET_DROPPING being set without
> i_mutex locked? This is the only mutation (except in the s_links patch) done
> without i_mutex locked. I thought that actually either other
> configfs_dirent traversals like readdir() and dir_lseek() would prevent
> detach_prep() from succeeding because they add dirents before, or are done in
> places where detach_prep() cannot do harm because new_dirent() fails whenever it
> sees CONFIGFS_USET_DROPPING: detach_attrs() and detach_groups()
> must ignore CONFIGFS_USET_DROPPING, depend_prep() is protected since the
> whole path is locked from configfs root, lookup() can succeed since at worst its
> result will be invalidated when actually detaching the default groups. The only
> function for which I can not figure out is configfs_hash_and_remove(), but it is
> not used.

	I don't mean that your code is wrong, I mean that the comment is
unclear.  The locking rules aren't "you can use i_mutex or dirent_lock,
take your pick".  I think you are right that configfs_detach_prep() is
safe to set dropping as it does without i_mutex.
	This is related to the discussion below about VFS visible
changes (i_mutex protection) vs subsystem internal changes (dirent_lock
protection).  The protections have different scope, but your comment
made them sound interchangable. 

> 	I admit that the case of symlink() needs an extra check to ensure
> that the target is not about to be removed. The bug was already there
> though, right?
> 	Anyway, if it looks conceptually simpler to use
> configfs_dirent_lock (probably better a mutex in that case) wherever
> i_mutex are supposed to protect configfs_dirent traversals, I'm ok with it.

	Leave it as a spinlock.
	Going over the changes, I was pretty convinced your detach_prep
was safe vis-a-vis mkdir.  You're under i_mutex for the immediate
directory, and both attach_* and detach_* are under the immediate
i_mutex when they make the change.  Also, you have your readdir and
lookup walking s_children without a lock.  I *think* that's safe, because
it's also against the immediate directory, and thus the vfs is holding
i_mutex for you.
 
> And we should not take other i_mutex in populate_groups() and
> populate_attrs(), otherwise deadlocks could happen.

	Huh, we certainly should.  perhaps you are speaking as if we
were turning dirent_lock into a mutex.  We're not turning dirent_lock
into a mutex yet.

> > 	Now, the only thing that sees this intermediate condition is
> > configfs itself.  Everyone else is protected by i_mutex.  I guess it's
> > OK - but can you comment that fact?  i_mutex does *not* protect
> > traversal of the configfs_dirent tree, but it does prevent the outside
> > world from seeing the intermediate states.
> 
> The only intermediate conditions that may hurt one's mind is that an
> mkdir() (resp. symlink()) racing with an rmdir() can successfuly call
> make_item()/make_group() (resp. allow_link()) and immediately fail when
> finalizing with attach_item()/attach_group() (resp. create_link()). So, from
> userspace and the VFS this seems like "mkdir foo/bar/baz" simply failed because
> of "rmdir foo", while at the same time from the subsystem point of view this
> seems like userspace did "mkdir foo/bar/baz; rmdir foo/bar/baz; rmdir foo".
> As I pointed out in the rename fix, this however can already happen when
> attach_item()/attach_group() (resp. create_link()) fails because of
> memory pressure for instance.

	I'm not even sure what you said here :-)

Joel

-- 

"Egotist: a person more interested in himself than in me."
         - Ambrose Bierce 

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-13  2:41       ` Joel Becker
@ 2008-06-13 10:45         ` Louis Rilling
  2008-06-13 12:09           ` Louis Rilling
  2008-06-13 20:17           ` Joel Becker
  0 siblings, 2 replies; 14+ messages in thread
From: Louis Rilling @ 2008-06-13 10:45 UTC (permalink / raw)
  To: Joel.Becker; +Cc: linux-kernel, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 6949 bytes --]

On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> On Fri, Jun 13, 2008 at 12:25:58AM +0200, Louis Rilling wrote:
> > On Thu, Jun 12, 2008 at 12:13:48PM -0700, Joel Becker wrote:
> > > On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> > > > Locking rules for configfs_dirent linkage mutations are the same plus the
> > > > requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> > > > either take appropriate i_mutex as before, or take configfs_dirent_lock.
> > > 
> > > 	Nope, you *must* take configfs_dirent_lock now.  You've removed
> > > i_mutex protection in the last patch.
> > 
> > Oh well. Do you mean because of CONFIGFS_USET_DROPPING being set without
> > i_mutex locked? This is the only mutation (except in the s_links patch) done
> > without i_mutex locked. I thought that actually either other
> > configfs_dirent traversals like readdir() and dir_lseek() would prevent
> > detach_prep() from succeeding because they add dirents before, or are done in
> > places where detach_prep() cannot do harm because new_dirent() fails whenever it
> > sees CONFIGFS_USET_DROPPING: detach_attrs() and detach_groups()
> > must ignore CONFIGFS_USET_DROPPING, depend_prep() is protected since the
> > whole path is locked from configfs root, lookup() can succeed since at worst its
> > result will be invalidated when actually detaching the default groups. The only
> > function for which I can not figure out is configfs_hash_and_remove(), but it is
> > not used.
> 
> 	I don't mean that your code is wrong, I mean that the comment is
> unclear.  The locking rules aren't "you can use i_mutex or dirent_lock,
> take your pick".  I think you are right that configfs_detach_prep() is
> safe to set dropping as it does without i_mutex.
> 	This is related to the discussion below about VFS visible
> changes (i_mutex protection) vs subsystem internal changes (dirent_lock
> protection).  The protections have different scope, but your comment
> made them sound interchangable. 

Agreed.

> 
> > 	I admit that the case of symlink() needs an extra check to ensure
> > that the target is not about to be removed. The bug was already there
> > though, right?
> > 	Anyway, if it looks conceptually simpler to use
> > configfs_dirent_lock (probably better a mutex in that case) wherever
> > i_mutex are supposed to protect configfs_dirent traversals, I'm ok with it.
> 
> 	Leave it as a spinlock.
> 	Going over the changes, I was pretty convinced your detach_prep
> was safe vis-a-vis mkdir.  You're under i_mutex for the immediate
> directory, and both attach_* and detach_* are under the immediate
> i_mutex when they make the change.  Also, you have your readdir and
> lookup walking s_children without a lock.  I *think* that's safe, because
> it's also against the immediate directory, and thus the vfs is holding
> i_mutex for you.

Unfortunately, thinking a bit more about it I found some issues with
i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
	Between detach_prep() in A and mkdir() in a default group A/B:
detach_prep() can be called in the middle of attach_group(), for instance after
having attached A/B/C, but attach_group() may then fail (because of memory
pressure for instance) while attaching C's default group A/B/C/D. This would
lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
being at best obscure: the user would have expected to either see mkdir succeed
and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
succeed because no user-created group lived under A. Solution: tag A/B with
USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
detach_prep() as long as USET_IN_MKDIR is found under A/*.
	Between rmdir() and readdir(): dir_open() might add a configfs_dirent
to a default group A/B that detach_prep() already marked with USET_DROPPING.
This could result in detach_groups() dropping the dirent and make readdir() in
A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
it is set.
	Between rmdir() and lookup(): several lookup() called under A/* while
rmdir(A) in the middle of detach_groups() could return inconsistent results (for
instance some default groups being there and some other ones not). Solution:
lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
dir, and fail with ENOENT if it is set.

I'll send a corrected patch later.

>  
> > And we should not take other i_mutex in populate_groups() and
> > populate_attrs(), otherwise deadlocks could happen.
> 
> 	Huh, we certainly should.  perhaps you are speaking as if we
> were turning dirent_lock into a mutex.  We're not turning dirent_lock
> into a mutex yet.

I was speaking as if we replaced i_mutex protection with dirent_lock
protection for a whole mkdir(), that is taking the lock before attach_* and
releasing it after.

> 
> > > 	Now, the only thing that sees this intermediate condition is
> > > configfs itself.  Everyone else is protected by i_mutex.  I guess it's
> > > OK - but can you comment that fact?  i_mutex does *not* protect
> > > traversal of the configfs_dirent tree, but it does prevent the outside
> > > world from seeing the intermediate states.
> > 
> > The only intermediate conditions that may hurt one's mind is that an
> > mkdir() (resp. symlink()) racing with an rmdir() can successfuly call
> > make_item()/make_group() (resp. allow_link()) and immediately fail when
> > finalizing with attach_item()/attach_group() (resp. create_link()). So, from
> > userspace and the VFS this seems like "mkdir foo/bar/baz" simply failed because
> > of "rmdir foo", while at the same time from the subsystem point of view this
> > seems like userspace did "mkdir foo/bar/baz; rmdir foo/bar/baz; rmdir foo".
> > As I pointed out in the rename fix, this however can already happen when
> > attach_item()/attach_group() (resp. create_link()) fails because of
> > memory pressure for instance.
> 
> 	I'm not even sure what you said here :-)

I was just saying that with i_mutex lock free detach_prep(), we have kind of
optimistic mkdir(), with conflicts resolved as error cases of attach_*.

The intermediate conditions that really matter are:
1/ the existence of partial default groups trees (I mean configfs_dirent trees)
   in the middle of attach_group() and detach_group(),
2/ the existence of default group trees that are tagged as USET_DROPPING and
   should be treated as not existing anymore.
	These intermediate conditions lead to the issues pointed out above, and
will be correctly handled (I hope) in the coming corrected patch.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-13 10:45         ` Louis Rilling
@ 2008-06-13 12:09           ` Louis Rilling
  2008-06-13 20:19             ` Joel Becker
  2008-06-13 20:17           ` Joel Becker
  1 sibling, 1 reply; 14+ messages in thread
From: Louis Rilling @ 2008-06-13 12:09 UTC (permalink / raw)
  To: Joel.Becker; +Cc: linux-kernel, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 2280 bytes --]

On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
> 
> Unfortunately, thinking a bit more about it I found some issues with
> i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> 	Between detach_prep() in A and mkdir() in a default group A/B:
> detach_prep() can be called in the middle of attach_group(), for instance after
> having attached A/B/C, but attach_group() may then fail (because of memory
> pressure for instance) while attaching C's default group A/B/C/D. This would
> lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> being at best obscure: the user would have expected to either see mkdir succeed
> and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> succeed because no user-created group lived under A. Solution: tag A/B with
> USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> detach_prep() as long as USET_IN_MKDIR is found under A/*.
> 	Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> to a default group A/B that detach_prep() already marked with USET_DROPPING.
> This could result in detach_groups() dropping the dirent and make readdir() in
> A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> it is set.
> 	Between rmdir() and lookup(): several lookup() called under A/* while
> rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> instance some default groups being there and some other ones not). Solution:
> lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> dir, and fail with ENOENT if it is set.

Oh, should probably provide some d_revalidate() also, which would return
-ENOENT for a dentry under a directory flagged with USET_DROPPING. But I'm
realizing that such "inconsistencies" (some default groups being valid in the
d_cache and some other ones not) already happen between the time detach_prep()
has flagged a default group with USET_DROPPING and the default
group is actually detached. Am I wrong?

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-13 10:45         ` Louis Rilling
  2008-06-13 12:09           ` Louis Rilling
@ 2008-06-13 20:17           ` Joel Becker
  2008-06-13 21:54             ` Louis Rilling
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Becker @ 2008-06-13 20:17 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, ocfs2-devel

Louis,
	Can I just say, you're the first person to do serious review
other than myself, and I really appreciate it :-)

On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> Unfortunately, thinking a bit more about it I found some issues with
> i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> 	Between detach_prep() in A and mkdir() in a default group A/B:
> detach_prep() can be called in the middle of attach_group(), for instance after
> having attached A/B/C, but attach_group() may then fail (because of memory
> pressure for instance) while attaching C's default group A/B/C/D. This would
> lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> being at best obscure: the user would have expected to either see mkdir succeed
> and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> succeed because no user-created group lived under A. Solution: tag A/B with
> USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> detach_prep() as long as USET_IN_MKDIR is found under A/*.

	I see what you are saying here.  I'm not sure if that is worth
the complexity - we can say "it was kind of there".  No one will ever
hit it :-)  But let me think about it more.

> 	Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> to a default group A/B that detach_prep() already marked with USET_DROPPING.
> This could result in detach_groups() dropping the dirent and make readdir() in
> A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> it is set.

	I was trying to see why this could happen, given that we can
come to this from other places - the dir could have been open before we
set USET_DROPPING.  Oh!  We actually fail rmdir with ENOTEMPTY when the
dir is open?  That's wrong.  Ignore it though - we'll fix it later.
	But back to your concern.  configfs_readdir() can't crash for
two reasons.  First, detach_groups() won't remove this dirent.  A
readdir placeholder has s_element==NULL.  Note the check in
detach_groups():

	if (!sd->s_element ||
	    !(sd->s_type & CONFIGFS_USET_DEFAULT))
		continue;

It skips our readdir placeholder, allowing us to free it in dir_close().
	There's another reason this can't be a problem.  If we get into
detach_groups(), we take i_mutex, locking out readdir().  Then we delete
the directory, setting S_DEAD.  In vfs_readdir(), they check
IS_DEADDIR() after getting i_mutex.  So they will see S_DEAD and not
call our ->readdir().  S_DEAD is important.  Someone could actually have
our default_group as their cwd.  S_DEAD prevents them from doing
anything :-)

> 	Between rmdir() and lookup(): several lookup() called under A/* while
> rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> instance some default groups being there and some other ones not). Solution:
> lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> dir, and fail with ENOENT if it is set.

	Nah, we don't care about the spurious lookups.  This is a normal
race of i_mutex.  USET_DROPPING is not a way to prevent VFS views from
changing - it's only a way to prevent new children.
	Remember, ->lookup() comes with i_mutex locking.  We hold
i_mutex during the entire delete, so they can't call ->lookup() until
we're done with a directory.  Conversely, if they win i_mutex and ->lookup()
a default group, then try to use it after we've removed it, they'll just
ENOENT.  This is evident back in do_rename().  They call lookup, which
takes and drops locks, then call lock_rename() to get the locks back.
And they can handle ENOENT at that point.

> I was speaking as if we replaced i_mutex protection with dirent_lock
> protection for a whole mkdir(), that is taking the lock before attach_* and
> releasing it after.

	Ok.  I think that's not the way to go, what you currently have
is better.

> > 	I'm not even sure what you said here :-)
> 
> I was just saying that with i_mutex lock free detach_prep(), we have kind of
> optimistic mkdir(), with conflicts resolved as error cases of attach_*.

	Basically, the concerns you had above.

> The intermediate conditions that really matter are:
> 1/ the existence of partial default groups trees (I mean configfs_dirent trees)
>    in the middle of attach_group() and detach_group(),

	This is your first case, the mkdir ENOMEM vs rmdir ENOTEMPTY.

> 2/ the existence of default group trees that are tagged as USET_DROPPING and
>    should be treated as not existing anymore.

	This is not an issue.  USET_DROPPING does *not* mean it went
away.  It means we're safe to make it go away.  We protect the actual
going-away with i_mutex.  And that's normal VFS behavior.

Joel

-- 

"I don't know anything about music. In my line you don't have
 to."
        - Elvis Presley

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-13 12:09           ` Louis Rilling
@ 2008-06-13 20:19             ` Joel Becker
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Becker @ 2008-06-13 20:19 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, ocfs2-devel

On Fri, Jun 13, 2008 at 02:09:23PM +0200, Louis Rilling wrote:
> Oh, should probably provide some d_revalidate() also, which would return
> -ENOENT for a dentry under a directory flagged with USET_DROPPING. But I'm
> realizing that such "inconsistencies" (some default groups being valid in the
> d_cache and some other ones not) already happen between the time detach_prep()
> has flagged a default group with USET_DROPPING and the default
> group is actually detached. Am I wrong?

	We don't need d_revalidate().  As I stated at the end of my last
email, USET_DROPPING does not mean 'It already went away'.  It just
means we're safe to do so, because we prevent new children.  We actually
make it go away underneath i_mutex.
	The VFS handles inconsistencies between lookup and action.  It's
part of normal operation.  Otherwise, they'd have to hold all the
i_mutexes around lookup and action.

Joel

-- 

"When choosing between two evils, I always like to try the one
 I've never tried before."
        - Mae West

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-13 20:17           ` Joel Becker
@ 2008-06-13 21:54             ` Louis Rilling
  2008-06-13 22:34               ` Joel Becker
  0 siblings, 1 reply; 14+ messages in thread
From: Louis Rilling @ 2008-06-13 21:54 UTC (permalink / raw)
  To: Joel.Becker; +Cc: linux-kernel, ocfs2-devel

On Fri, Jun 13, 2008 at 01:17:46PM -0700, Joel Becker wrote:
> Louis,
> 	Can I just say, you're the first person to do serious review
> other than myself, and I really appreciate it :-)

It's just that I use configfs in my own work, and I'm playing hard with
it, especially with modules. So I need to understand exactly what it
does, and what is possible with it.

> 
> On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
> > On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> > Unfortunately, thinking a bit more about it I found some issues with
> > i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> > 	Between detach_prep() in A and mkdir() in a default group A/B:
> > detach_prep() can be called in the middle of attach_group(), for instance after
> > having attached A/B/C, but attach_group() may then fail (because of memory
> > pressure for instance) while attaching C's default group A/B/C/D. This would
> > lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> > being at best obscure: the user would have expected to either see mkdir succeed
> > and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> > succeed because no user-created group lived under A. Solution: tag A/B with
> > USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> > detach_prep() as long as USET_IN_MKDIR is found under A/*.
> 
> 	I see what you are saying here.  I'm not sure if that is worth
> the complexity - we can say "it was kind of there".  No one will ever
> hit it :-)  But let me think about it more.

To me it's an issue only if we want to provide some atomic view to
userspace: either userspace sees a group with all of its default groups,
or it sees none. So the question is: does userspace need such atomicity?
Currently configfs provides it, so this would be a userspace visible
change if we break it.

> 
> > 	Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> > to a default group A/B that detach_prep() already marked with USET_DROPPING.
> > This could result in detach_groups() dropping the dirent and make readdir() in
> > A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> > it is set.
> 
> 	I was trying to see why this could happen, given that we can
> come to this from other places - the dir could have been open before we
> set USET_DROPPING.  Oh!  We actually fail rmdir with ENOTEMPTY when the
> dir is open?  That's wrong.  Ignore it though - we'll fix it later.
> 	But back to your concern.  configfs_readdir() can't crash for
> two reasons.  First, detach_groups() won't remove this dirent.  A
> readdir placeholder has s_element==NULL.  Note the check in
> detach_groups():
> 
> 	if (!sd->s_element ||
> 	    !(sd->s_type & CONFIGFS_USET_DEFAULT))
> 		continue;
> 
> It skips our readdir placeholder, allowing us to free it in dir_close().

I had not noticed this. Thanks for pointed it out.

> 	There's another reason this can't be a problem.  If we get into
> detach_groups(), we take i_mutex, locking out readdir().  Then we delete
> the directory, setting S_DEAD.  In vfs_readdir(), they check
> IS_DEADDIR() after getting i_mutex.  So they will see S_DEAD and not
> call our ->readdir().  S_DEAD is important.  Someone could actually have
> our default_group as their cwd.  S_DEAD prevents them from doing
> anything :-)

As I told you in a previous email, I'm missing some VFS skills, so
thanks again for the explanation.

> 
> > 	Between rmdir() and lookup(): several lookup() called under A/* while
> > rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> > instance some default groups being there and some other ones not). Solution:
> > lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> > dir, and fail with ENOENT if it is set.
> 
> 	Nah, we don't care about the spurious lookups.  This is a normal
> race of i_mutex.  USET_DROPPING is not a way to prevent VFS views from
> changing - it's only a way to prevent new children.
> 	Remember, ->lookup() comes with i_mutex locking.  We hold
> i_mutex during the entire delete, so they can't call ->lookup() until
> we're done with a directory.  Conversely, if they win i_mutex and ->lookup()
> a default group, then try to use it after we've removed it, they'll just
> ENOENT.  This is evident back in do_rename().  They call lookup, which
> takes and drops locks, then call lock_rename() to get the locks back.
> And they can handle ENOENT at that point.

Sure, my only concern is the atomic view of userspace: can userspace
tolerate that (pwd=A/B, with B a default group of A, B having default groups C
and D, and A being removed) 'ls C' returns error because default group C is
already removed and 'ls D' is ok because default group D is not removed yet?

> 
> > I was speaking as if we replaced i_mutex protection with dirent_lock
> > protection for a whole mkdir(), that is taking the lock before attach_* and
> > releasing it after.
> 
> 	Ok.  I think that's not the way to go, what you currently have
> is better.

Agreed.

> 
> > The intermediate conditions that really matter are:
> > 1/ the existence of partial default groups trees (I mean configfs_dirent trees)
> >    in the middle of attach_group() and detach_group(),
> 
> 	This is your first case, the mkdir ENOMEM vs rmdir ENOTEMPTY.

Exactly.

> 
> > 2/ the existence of default group trees that are tagged as USET_DROPPING and
> >    should be treated as not existing anymore.
> 
> 	This is not an issue.  USET_DROPPING does *not* mean it went
> away.  It means we're safe to make it go away.  We protect the actual
> going-away with i_mutex.  And that's normal VFS behavior.

Again this is the concern of atomicity from userspace point of view: to
provide such atomic view, mkdir(), lookup(), readdir(), and probably
attributes open() should just fail when done in a default group flagged with
USET_DROPPING.

Anyway, if atomicity from userspace point of view is not a concern, this
just makes things simpler, and I'm ok with it.

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-13 21:54             ` Louis Rilling
@ 2008-06-13 22:34               ` Joel Becker
  2008-06-16 11:30                 ` Louis Rilling
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Becker @ 2008-06-13 22:34 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, ocfs2-devel

On Fri, Jun 13, 2008 at 11:54:01PM +0200, Louis Rilling wrote:
> On Fri, Jun 13, 2008 at 01:17:46PM -0700, Joel Becker wrote:
> > Louis,
> > 	Can I just say, you're the first person to do serious review
> > other than myself, and I really appreciate it :-)
> 
> It's just that I use configfs in my own work, and I'm playing hard with
> it, especially with modules. So I need to understand exactly what it
> does, and what is possible with it.

	I'm all for it.  And see my other email responding to Luis about
the current concerns with configfs.

> > 	I see what you are saying here.  I'm not sure if that is worth
> > the complexity - we can say "it was kind of there".  No one will ever
> > hit it :-)  But let me think about it more.
> 
> To me it's an issue only if we want to provide some atomic view to
> userspace: either userspace sees a group with all of its default groups,
> or it sees none. So the question is: does userspace need such atomicity?
> Currently configfs provides it, so this would be a userspace visible
> change if we break it.

	People *won't* see that.  default groups are populated and
cleaned under i_mutex.  The race of mkdir vs rmdir isn't about seeing
partial default groups, it's about the ENOMEM racing the ENOTEMPTY.  It
doesn't impact lookup or other operations.  We can fix it.  I'm just not
sure it's worth the complexity (and this is an open question).

> > 	There's another reason this can't be a problem.  If we get into
> > detach_groups(), we take i_mutex, locking out readdir().  Then we delete
> > the directory, setting S_DEAD.  In vfs_readdir(), they check
> > IS_DEADDIR() after getting i_mutex.  So they will see S_DEAD and not
> > call our ->readdir().  S_DEAD is important.  Someone could actually have
> > our default_group as their cwd.  S_DEAD prevents them from doing
> > anything :-)
> 
> As I told you in a previous email, I'm missing some VFS skills, so
> thanks again for the explanation.

	Oh, don't worry about that.  There's nothing wrong with not
knowing - that's why you asked me.  Now you do :-)

> Sure, my only concern is the atomic view of userspace: can userspace
> tolerate that (pwd=A/B, with B a default group of A, B having default groups C
> and D, and A being removed) 'ls C' returns error because default group C is
> already removed and 'ls D' is ok because default group D is not removed yet?

	They can't see that.  We take i_mutex in detach_group.  This
locks out lookup and readdir.  When we're done with detach_group, all
default groups are gone.

> > > 2/ the existence of default group trees that are tagged as USET_DROPPING and
> > >    should be treated as not existing anymore.
> > 
> > 	This is not an issue.  USET_DROPPING does *not* mean it went
> > away.  It means we're safe to make it go away.  We protect the actual
> > going-away with i_mutex.  And that's normal VFS behavior.
> 
> Again this is the concern of atomicity from userspace point of view: to
> provide such atomic view, mkdir(), lookup(), readdir(), and probably
> attributes open() should just fail when done in a default group flagged with
> USET_DROPPING.

	It's not atomic, though, and never has been.  I'm not quite sure
what you are unsure of here.  Let me try to clarify a little.
	Are you worried about two separate runs of the ls(1) command?

  # ls A/B/C
  # ls A/B/D

These can't be atomic, because someone else could rmdir(1) in the
middle:

  # ls A/B/C
  # rmdir A/B
  # ls A/B/D
  ls: No such file or directory

This is perfectly normal, and there is no way to prevent it - it is
separate entrances to the system call.
	Do you mean inside one call?  That is "ls A/B" would print "C"
but not "D"?  That cannot happen, because we hold B's i_mutex during
detach_group.  So, if readdir beat us to i_mutex, it lists "C D".  If we
win, we remove both before releasing B's i_Mutex, and readdir errors
with ENOENT - we removed B.
	I'm not quite sure what inconsistency you are asking about here.

Joel

-- 

"The one important thing i have learned over the years is the
 difference between taking one's work seriously and taking one's self
 seriously.  The first is imperative and the second is disastrous."
	-Margot Fonteyn

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-13 22:34               ` Joel Becker
@ 2008-06-16 11:30                 ` Louis Rilling
  0 siblings, 0 replies; 14+ messages in thread
From: Louis Rilling @ 2008-06-16 11:30 UTC (permalink / raw)
  To: Joel.Becker; +Cc: linux-kernel, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 4114 bytes --]

On Fri, Jun 13, 2008 at 03:34:41PM -0700, Joel Becker wrote:
> > To me it's an issue only if we want to provide some atomic view to
> > userspace: either userspace sees a group with all of its default groups,
> > or it sees none. So the question is: does userspace need such atomicity?
> > Currently configfs provides it, so this would be a userspace visible
> > change if we break it.
> 
> 	People *won't* see that.  default groups are populated and
> cleaned under i_mutex.  The race of mkdir vs rmdir isn't about seeing
> partial default groups, it's about the ENOMEM racing the ENOTEMPTY.  It
> doesn't impact lookup or other operations.  We can fix it.  I'm just not
> sure it's worth the complexity (and this is an open question).

It's not that difficult to implement, you may just find it a bit ugly... I hope
to send you a corrected "rename fix" today.

> 
> > Sure, my only concern is the atomic view of userspace: can userspace
> > tolerate that (pwd=A/B, with B a default group of A, B having default groups C
> > and D, and A being removed) 'ls C' returns error because default group C is
> > already removed and 'ls D' is ok because default group D is not removed yet?
> 
> 	They can't see that.  We take i_mutex in detach_group.  This
> locks out lookup and readdir.  When we're done with detach_group, all
> default groups are gone.

If I understand correctly, lookup() is not called each time userspace does ls,
and in configfs case, it is never called for existing items since the d_cache is
populated as soon as the user creates items. So lookup() does not block 'ls'
during rmdir() (unless it is a lookup for a never accessed attribute). I think
that this is the point that invalidates all my theory about atomicity :)

> 
> > > > 2/ the existence of default group trees that are tagged as USET_DROPPING and
> > > >    should be treated as not existing anymore.
> > > 
> > > 	This is not an issue.  USET_DROPPING does *not* mean it went
> > > away.  It means we're safe to make it go away.  We protect the actual
> > > going-away with i_mutex.  And that's normal VFS behavior.
> > 
> > Again this is the concern of atomicity from userspace point of view: to
> > provide such atomic view, mkdir(), lookup(), readdir(), and probably
> > attributes open() should just fail when done in a default group flagged with
> > USET_DROPPING.
> 
> 	It's not atomic, though, and never has been.  I'm not quite sure
> what you are unsure of here.  Let me try to clarify a little.
> 	Are you worried about two separate runs of the ls(1) command?
> 
>   # ls A/B/C
>   # ls A/B/D
> 
> These can't be atomic, because someone else could rmdir(1) in the
> middle:
> 
>   # ls A/B/C
>   # rmdir A/B
>   # ls A/B/D
>   ls: No such file or directory
> 
> This is perfectly normal, and there is no way to prevent it - it is
> separate entrances to the system call.
> 	Do you mean inside one call?  That is "ls A/B" would print "C"
> but not "D"?  That cannot happen, because we hold B's i_mutex during
> detach_group.  So, if readdir beat us to i_mutex, it lists "C D".  If we
> win, we remove both before releasing B's i_Mutex, and readdir errors
> with ENOENT - we removed B.
> 	I'm not quite sure what inconsistency you are asking about here.

The scenario that made me worry was more:
process 1:
/* PWD=A/B */
# ls C
ls: No such file or directory
/* some sync between process 1 and 2 */
process 2:
/* PWD=A/D */
# ls E
/* ok */
# ls E
ls: No such file or directory

From a user's point of view, this looks as if somebody did 'rmdir A; mkdir A;
rmdir A', while there actually were only 'rmdir A'.

If there were no d_cache, this would be impossible with the current
implementation of detach_prep() locking all default groups. But with the d_cache
this has always been possible.

Anyway, I give up with this (wrong) atomicity concern.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-06-16 11:30 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-12 13:31 [PATCH 0/3][BUGFIX] configfs: Fix deadlock rmdir() vs rename() Louis Rilling
2008-06-12 13:31 ` [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
2008-06-12 19:13   ` Joel Becker
2008-06-12 22:25     ` Louis Rilling
2008-06-13  2:41       ` Joel Becker
2008-06-13 10:45         ` Louis Rilling
2008-06-13 12:09           ` Louis Rilling
2008-06-13 20:19             ` Joel Becker
2008-06-13 20:17           ` Joel Becker
2008-06-13 21:54             ` Louis Rilling
2008-06-13 22:34               ` Joel Becker
2008-06-16 11:30                 ` Louis Rilling
2008-06-12 13:31 ` [PATCH 2/3][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
2008-06-12 13:31 ` [PATCH 3/3][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).