All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] VFS prep for union mounts/writable overlays
@ 2009-12-23 23:36 Valerie Aurora
  2009-12-23 23:36 ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
  0 siblings, 1 reply; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Blunck, Alexander Viro, Valerie Aurora

Writable overlays/union mounts requires some generic VFS improvements
and changes.  The patches in this set are independent of union
mounts and should merged immediately.

Jan Blunck (6):
  autofs4: Save autofs trigger's vfsmount in super block info
  VFS: Make lookup_hash() return a struct path
  VFS: Make real_lookup() return a struct path
  VFS: Propagate mnt_flags into do_loopback
  VFS: BUG_ON() rehash of an already hashed dentry
  VFS: Remove unnecessary micro-optimization in cached_lookup()

Valerie Aurora (1):
  VFS: Add read-only users count to superblock

 fs/autofs4/autofs_i.h |    1 +
 fs/autofs4/init.c     |   11 +++-
 fs/autofs4/root.c     |    6 ++
 fs/dcache.c           |    1 +
 fs/namei.c            |  217 ++++++++++++++++++++++++++-----------------------
 fs/namespace.c        |    7 +-
 fs/super.c            |   18 ++++-
 include/linux/fs.h    |    5 +
 8 files changed, 158 insertions(+), 108 deletions(-)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2009-12-23 23:36 [PATCH 0/7] VFS prep for union mounts/writable overlays Valerie Aurora
@ 2009-12-23 23:36 ` Valerie Aurora
  2009-12-23 23:36   ` [PATCH 2/7] VFS: Make lookup_hash() return a struct path Valerie Aurora
  2010-01-02  0:44   ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Ian Kent
  0 siblings, 2 replies; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:36 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Blunck, Alexander Viro, Valerie Aurora, Ian Kent, autofs

From: Jan Blunck <jblunck@suse.de>

This is a bugfix/replacement for commit
051d381259eb57d6074d02a6ba6e90e744f1a29f:

    During a path walk if an autofs trigger is mounted on a dentry,
    when the follow_link method is called, the nameidata struct
    contains the vfsmount and mountpoint dentry of the parent mount
    while the dentry that is passed in is the root of the autofs
    trigger mount.  I believe it is impossible to get the vfsmount of
    the trigger mount, within the follow_link method, when only the
    parent vfsmount and the root dentry of the trigger mount are
    known.

The solution in this commit was to replace the path embedded in the
parent's nameidata with the path of the link itself in
__do_follow_link().  This is a relatively harmless misuse of the
field, but union mounts ran into a bug during follow_link() caused by
the nameidata containing the wrong path (we count on it being what it
is all other places - the path of the parent).

A cleaner and easier to understand solution is to save the necessary
vfsmount in the autofs superblock info when it is mounted.  Then we
can easily update the vfsmount in autofs4_follow_link().

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/autofs4/autofs_i.h |    1 +
 fs/autofs4/init.c     |   11 ++++++++++-
 fs/autofs4/root.c     |    6 ++++++
 fs/namei.c            |    7 ++-----
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 8f7cdde..db2bfce 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -130,6 +130,7 @@ struct autofs_sb_info {
 	int reghost_enabled;
 	int needs_reghost;
 	struct super_block *sb;
+	struct vfsmount *mnt;
 	struct mutex wq_mutex;
 	spinlock_t fs_lock;
 	struct autofs_wait_queue *queues; /* Wait queue pointer */
diff --git a/fs/autofs4/init.c b/fs/autofs4/init.c
index 9722e4b..5e0dcd7 100644
--- a/fs/autofs4/init.c
+++ b/fs/autofs4/init.c
@@ -17,7 +17,16 @@
 static int autofs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
+	struct autofs_sb_info *sbi;
+	int ret;
+
+	ret = get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
+	if (ret)
+		return ret;
+
+	sbi = autofs4_sbi(mnt->mnt_sb);
+	sbi->mnt = mnt;
+	return 0;
 }
 
 static struct file_system_type autofs_fs_type = {
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index b96a3c5..cb991b8 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -179,6 +179,12 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
 	DPRINTK("dentry=%p %.*s oz_mode=%d nd->flags=%d",
 		dentry, dentry->d_name.len, dentry->d_name.name, oz_mode,
 		nd->flags);
+
+	dput(nd->path.dentry);
+	mntput(nd->path.mnt);
+	nd->path.mnt = mntget(sbi->mnt);
+	nd->path.dentry = dget(dentry);
+
 	/*
 	 * For an expire of a covered direct or offset mount we need
 	 * to break out of follow_down() at the autofs mount trigger
diff --git a/fs/namei.c b/fs/namei.c
index d11f404..c768444 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -629,11 +629,8 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
 	touch_atime(path->mnt, dentry);
 	nd_set_link(nd, NULL);
 
-	if (path->mnt != nd->path.mnt) {
-		path_to_nameidata(path, nd);
-		dget(dentry);
-	}
-	mntget(path->mnt);
+	if (path->mnt == nd->path.mnt)
+		mntget(nd->path.mnt);
 	cookie = dentry->d_inode->i_op->follow_link(dentry, nd);
 	error = PTR_ERR(cookie);
 	if (!IS_ERR(cookie)) {
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/7] VFS: Make lookup_hash() return a struct path
  2009-12-23 23:36 ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
@ 2009-12-23 23:36   ` Valerie Aurora
  2009-12-23 23:36     ` [PATCH 3/7] VFS: Make real_lookup() " Valerie Aurora
  2010-01-02  0:44   ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Ian Kent
  1 sibling, 1 reply; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Blunck, Alexander Viro, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

This patch changes lookup_hash() into returning a struct path.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c |  115 ++++++++++++++++++++++++++++++-----------------------------
 1 files changed, 58 insertions(+), 57 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c768444..0ddfefb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1169,7 +1169,7 @@ static int path_lookup_open(int dfd, const char *name,
 }
 
 static struct dentry *__lookup_hash(struct qstr *name,
-		struct dentry *base, struct nameidata *nd)
+				    struct dentry *base, struct nameidata *nd)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1216,14 +1216,22 @@ out:
  * needs parent already locked. Doesn't follow mounts.
  * SMP-safe.
  */
-static struct dentry *lookup_hash(struct nameidata *nd)
+static int lookup_hash(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
 	int err;
 
 	err = inode_permission(nd->path.dentry->d_inode, MAY_EXEC);
 	if (err)
-		return ERR_PTR(err);
-	return __lookup_hash(&nd->last, nd->path.dentry, nd);
+		return err;
+	path->mnt = nd->path.mnt;
+	path->dentry =  __lookup_hash(name, nd->path.dentry, nd);
+	if (IS_ERR(path->dentry)) {
+		err = PTR_ERR(path->dentry);
+		path->dentry = NULL;
+		path->mnt = NULL;
+	}
+	return err;
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
@@ -1735,12 +1743,10 @@ struct file *do_filp_open(int dfd, const char *pathname,
 	if (flag & O_EXCL)
 		nd.flags |= LOOKUP_EXCL;
 	mutex_lock(&dir->d_inode->i_mutex);
-	path.dentry = lookup_hash(&nd);
-	path.mnt = nd.path.mnt;
+	error = lookup_hash(&nd, &nd.last, &path);
 
 do_last:
-	error = PTR_ERR(path.dentry);
-	if (IS_ERR(path.dentry)) {
+	if (error) {
 		mutex_unlock(&dir->d_inode->i_mutex);
 		goto exit;
 	}
@@ -1901,8 +1907,7 @@ do_link:
 	}
 	dir = nd.path.dentry;
 	mutex_lock(&dir->d_inode->i_mutex);
-	path.dentry = lookup_hash(&nd);
-	path.mnt = nd.path.mnt;
+	error = lookup_hash(&nd, &nd.last, &path);
 	__putname(nd.last.name);
 	goto do_last;
 }
@@ -1936,7 +1941,8 @@ EXPORT_SYMBOL(filp_open);
  */
 struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 {
-	struct dentry *dentry = ERR_PTR(-EEXIST);
+	struct path path;
+	int err;
 
 	mutex_lock_nested(&nd->path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	/*
@@ -1944,7 +1950,7 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	 * (foo/., foo/.., /////)
 	 */
 	if (nd->last_type != LAST_NORM)
-		goto fail;
+		return ERR_PTR(-EEXIST);
 	nd->flags &= ~LOOKUP_PARENT;
 	nd->flags |= LOOKUP_CREATE | LOOKUP_EXCL;
 	nd->intent.open.flags = O_EXCL;
@@ -1952,11 +1958,11 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	/*
 	 * Do the final lookup.
 	 */
-	dentry = lookup_hash(nd);
-	if (IS_ERR(dentry))
-		goto fail;
+	err = lookup_hash(nd, &nd->last, &path);
+	if (err)
+		return ERR_PTR(err);
 
-	if (dentry->d_inode)
+	if (path.dentry->d_inode)
 		goto eexist;
 	/*
 	 * Special case - lookup gave negative, but... we had foo/bar/
@@ -1965,15 +1971,14 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	 * been asking for (non-existent) directory. -ENOENT for you.
 	 */
 	if (unlikely(!is_dir && nd->last.name[nd->last.len])) {
-		dput(dentry);
-		dentry = ERR_PTR(-ENOENT);
+		dput(path.dentry);
+		return ERR_PTR(-ENOENT);
 	}
-	return dentry;
+
+	return path.dentry;
 eexist:
-	dput(dentry);
-	dentry = ERR_PTR(-EEXIST);
-fail:
-	return dentry;
+	path_put_conditional(&path, nd);
+	return ERR_PTR(-EEXIST);
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
@@ -2210,7 +2215,7 @@ static long do_rmdir(int dfd, const char __user *pathname)
 {
 	int error = 0;
 	char * name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 
 	error = user_path_parent(dfd, pathname, &nd, &name);
@@ -2232,21 +2237,20 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (error)
 		goto exit2;
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
-	error = security_path_rmdir(&nd.path, dentry);
+	error = security_path_rmdir(&nd.path, path.dentry);
 	if (error)
 		goto exit4;
-	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+	error = vfs_rmdir(nd.path.dentry->d_inode, path.dentry);
 exit4:
 	mnt_drop_write(nd.path.mnt);
 exit3:
-	dput(dentry);
+	path_put_conditional(&path, &nd);
 exit2:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 exit1:
@@ -2301,7 +2305,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 {
 	int error;
 	char *name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 	struct inode *inode = NULL;
 
@@ -2316,26 +2320,25 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (!IS_ERR(dentry)) {
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (!error) {
 		/* Why not before? Because we want correct error value */
 		if (nd.last.name[nd.last.len])
 			goto slashes;
-		inode = dentry->d_inode;
+		inode = path.dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
-		error = security_path_unlink(&nd.path, dentry);
+		error = security_path_unlink(&nd.path, path.dentry);
 		if (error)
 			goto exit3;
-		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
+		error = vfs_unlink(nd.path.dentry->d_inode, path.dentry);
 exit3:
 		mnt_drop_write(nd.path.mnt);
 	exit2:
-		dput(dentry);
+		path_put_conditional(&path, &nd);
 	}
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 	if (inode)
@@ -2346,8 +2349,8 @@ exit1:
 	return error;
 
 slashes:
-	error = !dentry->d_inode ? -ENOENT :
-		S_ISDIR(dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
+	error = !path.dentry->d_inode ? -ENOENT :
+		S_ISDIR(path.dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
 	goto exit2;
 }
 
@@ -2687,7 +2690,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 		int, newdfd, const char __user *, newname)
 {
 	struct dentry *old_dir, *new_dir;
-	struct dentry *old_dentry, *new_dentry;
+	struct path old, new;
 	struct dentry *trap;
 	struct nameidata oldnd, newnd;
 	char *from;
@@ -2721,16 +2724,15 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 
 	trap = lock_rename(new_dir, old_dir);
 
-	old_dentry = lookup_hash(&oldnd);
-	error = PTR_ERR(old_dentry);
-	if (IS_ERR(old_dentry))
+	error = lookup_hash(&oldnd, &oldnd.last, &old);
+	if (error)
 		goto exit3;
 	/* source must exist */
 	error = -ENOENT;
-	if (!old_dentry->d_inode)
+	if (!old.dentry->d_inode)
 		goto exit4;
 	/* unless the source is a directory trailing slashes give -ENOTDIR */
-	if (!S_ISDIR(old_dentry->d_inode->i_mode)) {
+	if (!S_ISDIR(old.dentry->d_inode->i_mode)) {
 		error = -ENOTDIR;
 		if (oldnd.last.name[oldnd.last.len])
 			goto exit4;
@@ -2739,32 +2741,31 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	}
 	/* source should not be ancestor of target */
 	error = -EINVAL;
-	if (old_dentry == trap)
+	if (old.dentry == trap)
 		goto exit4;
-	new_dentry = lookup_hash(&newnd);
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
+	error = lookup_hash(&newnd, &newnd.last, &new);
+	if (error)
 		goto exit4;
 	/* target should not be an ancestor of source */
 	error = -ENOTEMPTY;
-	if (new_dentry == trap)
+	if (new.dentry == trap)
 		goto exit5;
 
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
 		goto exit5;
-	error = security_path_rename(&oldnd.path, old_dentry,
-				     &newnd.path, new_dentry);
+	error = security_path_rename(&oldnd.path, old.dentry,
+				     &newnd.path, new.dentry);
 	if (error)
 		goto exit6;
-	error = vfs_rename(old_dir->d_inode, old_dentry,
-				   new_dir->d_inode, new_dentry);
+	error = vfs_rename(old_dir->d_inode, old.dentry,
+				   new_dir->d_inode, new.dentry);
 exit6:
 	mnt_drop_write(oldnd.path.mnt);
 exit5:
-	dput(new_dentry);
+	path_put_conditional(&new, &newnd);
 exit4:
-	dput(old_dentry);
+	path_put_conditional(&old, &oldnd);
 exit3:
 	unlock_rename(new_dir, old_dir);
 exit2:
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/7] VFS: Make real_lookup() return a struct path
  2009-12-23 23:36   ` [PATCH 2/7] VFS: Make lookup_hash() return a struct path Valerie Aurora
@ 2009-12-23 23:36     ` Valerie Aurora
  2009-12-23 23:37       ` [PATCH 4/7] VFS: Propagate mnt_flags into do_loopback Valerie Aurora
  0 siblings, 1 reply; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Blunck, Alexander Viro, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

This patch changes real_lookup() into returning a struct path.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c |   82 +++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0ddfefb..3f645ce 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -473,10 +473,11 @@ ok:
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static int real_lookup(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
-	struct dentry * result;
-	struct inode *dir = parent->d_inode;
+	struct inode *dir = nd->path.dentry->d_inode;
+	int res = 0;
 
 	mutex_lock(&dir->i_mutex);
 	/*
@@ -493,27 +494,36 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	 *
 	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 	 */
-	result = d_lookup(parent, name);
-	if (!result) {
+	path->dentry = d_lookup(nd->path.dentry, name);
+	path->mnt = nd->path.mnt;
+	if (!path->dentry) {
 		struct dentry *dentry;
 
 		/* Don't create child dentry for a dead directory. */
-		result = ERR_PTR(-ENOENT);
-		if (IS_DEADDIR(dir))
+		if (IS_DEADDIR(dir)) {
+			res = -ENOENT;
 			goto out_unlock;
+		}
 
-		dentry = d_alloc(parent, name);
-		result = ERR_PTR(-ENOMEM);
+		dentry = d_alloc(nd->path.dentry, name);
 		if (dentry) {
-			result = dir->i_op->lookup(dir, dentry, nd);
-			if (result)
+			path->dentry = dir->i_op->lookup(dir, dentry, nd);
+			if (path->dentry) {
 				dput(dentry);
-			else
-				result = dentry;
+				if (IS_ERR(path->dentry)) {
+					res = PTR_ERR(path->dentry);
+					path->dentry = NULL;
+					path->mnt = NULL;
+				}
+			} else
+				path->dentry = dentry;
+		} else {
+			res = -ENOMEM;
+			path->mnt = NULL;
 		}
 out_unlock:
 		mutex_unlock(&dir->i_mutex);
-		return result;
+		return res;
 	}
 
 	/*
@@ -521,12 +531,20 @@ out_unlock:
 	 * we waited on the semaphore. Need to revalidate.
 	 */
 	mutex_unlock(&dir->i_mutex);
-	if (result->d_op && result->d_op->d_revalidate) {
-		result = do_revalidate(result, nd);
-		if (!result)
-			result = ERR_PTR(-ENOENT);
+	if (path->dentry->d_op && path->dentry->d_op->d_revalidate) {
+		path->dentry = do_revalidate(path->dentry, nd);
+		if (!path->dentry) {
+			res = -ENOENT;
+			path->mnt = NULL;
+		}
+		if (IS_ERR(path->dentry)) {
+			res = PTR_ERR(path->dentry);
+			path->dentry = NULL;
+			path->mnt = NULL;
+		}
 	}
-	return result;
+
+	return res;
 }
 
 /*
@@ -793,35 +811,37 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 static int do_lookup(struct nameidata *nd, struct qstr *name,
 		     struct path *path)
 {
-	struct vfsmount *mnt = nd->path.mnt;
-	struct dentry *dentry = __d_lookup(nd->path.dentry, name);
+	int err;
 
-	if (!dentry)
+	path->dentry = __d_lookup(nd->path.dentry, name);
+	path->mnt = nd->path.mnt;
+	if (!path->dentry)
 		goto need_lookup;
-	if (dentry->d_op && dentry->d_op->d_revalidate)
+	if (path->dentry->d_op && path->dentry->d_op->d_revalidate)
 		goto need_revalidate;
+
 done:
-	path->mnt = mnt;
-	path->dentry = dentry;
 	__follow_mount(path);
 	return 0;
 
 need_lookup:
-	dentry = real_lookup(nd->path.dentry, name, nd);
-	if (IS_ERR(dentry))
+	err = real_lookup(nd, name, path);
+	if (err)
 		goto fail;
 	goto done;
 
 need_revalidate:
-	dentry = do_revalidate(dentry, nd);
-	if (!dentry)
+	path->dentry = do_revalidate(path->dentry, nd);
+	if (!path->dentry)
 		goto need_lookup;
-	if (IS_ERR(dentry))
+	if (IS_ERR(path->dentry)) {
+		err = PTR_ERR(path->dentry);
 		goto fail;
+	}
 	goto done;
 
 fail:
-	return PTR_ERR(dentry);
+	return err;
 }
 
 /*
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/7] VFS: Propagate mnt_flags into do_loopback
  2009-12-23 23:36     ` [PATCH 3/7] VFS: Make real_lookup() " Valerie Aurora
@ 2009-12-23 23:37       ` Valerie Aurora
  2009-12-23 23:37         ` [PATCH 5/7] VFS: Add read-only users count to superblock Valerie Aurora
  0 siblings, 1 reply; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:37 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Blunck, Alexander Viro, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

The mnt_flags are propagated into do_loopback(), so that they can be checked
when mounting something loopback into a union.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index bdc3cb4..2244801 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1446,8 +1446,8 @@ static int do_change_type(struct path *path, int flag)
 /*
  * do loopback mount.
  */
-static int do_loopback(struct path *path, char *old_name,
-				int recurse)
+static int do_loopback(struct path *path, char *old_name, int recurse,
+		       int mnt_flags)
 {
 	struct path old_path;
 	struct vfsmount *mnt = NULL;
@@ -1959,7 +1959,8 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&path, dev_name, flags & MS_REC);
+		retval = do_loopback(&path, dev_name, flags & MS_REC,
+				     mnt_flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&path, flags);
 	else if (flags & MS_MOVE)
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/7] VFS: Add read-only users count to superblock
  2009-12-23 23:37       ` [PATCH 4/7] VFS: Propagate mnt_flags into do_loopback Valerie Aurora
@ 2009-12-23 23:37         ` Valerie Aurora
  2009-12-23 23:37           ` [PATCH 6/7] VFS: BUG_ON() rehash of an already hashed dentry Valerie Aurora
  0 siblings, 1 reply; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:37 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Blunck, Alexander Viro, Valerie Aurora

While we can check if a file system is currently read-only, we can't
guarantee that it will stay read-only.  The file system can be
remounted read-write at any time; it's also conceivable that a file
system can be mounted a second time and converted to read-write if the
underlying fs allows it.  This is a problem for union mounts, which
require the underlying file system be read-only.  Add a read-only
users count and don't allow remounts to change the file system to
read-write or read-write mounts if there are any read-only users.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/super.c         |   18 ++++++++++++++++--
 include/linux/fs.h |    5 +++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 19eb70b..8a12bab 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -583,8 +583,10 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 	shrink_dcache_sb(sb);
 	sync_filesystem(sb);
 
-	/* If we are remounting RDONLY and current sb is read/write,
-	   make sure there are no rw files opened */
+	/*
+	 * If we are remounting RDONLY and current sb is read/write,
+	 * make sure there are no rw files opened.
+	 */
 	if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
 		if (force)
 			mark_files_ro(sb);
@@ -596,6 +598,14 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 	}
 	remount_rw = !(flags & MS_RDONLY) && (sb->s_flags & MS_RDONLY);
 
+	/*
+	 * If we are remounting read/write, deny write access if the
+	 * file system is being used by anything that requires it to
+	 * stay read-only (e.g., union mounts).
+	 */
+	if (remount_rw && sb->s_readonly_users)
+		return -EROFS;
+
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval)
@@ -953,6 +963,10 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	WARN((mnt->mnt_sb->s_maxbytes < 0), "%s set sb->s_maxbytes to "
 		"negative value (%lld)\n", type->name, mnt->mnt_sb->s_maxbytes);
 
+	error = -EROFS;
+	if (mnt->mnt_sb->s_readonly_users && !(flags & MS_RDONLY))
+		goto out_sb;
+
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	mnt->mnt_parent = mnt;
 	up_write(&mnt->mnt_sb->s_umount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2620a8c..4070eac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1380,6 +1380,11 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/*
+	 * Users who require read-only access - e.g., union mounts
+	 */
+	int s_readonly_users;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/7] VFS: BUG_ON() rehash of an already hashed dentry
  2009-12-23 23:37         ` [PATCH 5/7] VFS: Add read-only users count to superblock Valerie Aurora
@ 2009-12-23 23:37           ` Valerie Aurora
  2009-12-23 23:37             ` [PATCH 7/7] VFS: Remove unnecessary micro-optimization in cached_lookup() Valerie Aurora
  0 siblings, 1 reply; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:37 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Blunck, Alexander Viro, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

BUG_ON() rehash of an already hashed dentry.  For debugging of
dcache-related development.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a100fa3..74d4ca9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1551,6 +1551,7 @@ void d_rehash(struct dentry * entry)
 {
 	spin_lock(&dcache_lock);
 	spin_lock(&entry->d_lock);
+	BUG_ON(!d_unhashed(entry));
 	_d_rehash(entry);
 	spin_unlock(&entry->d_lock);
 	spin_unlock(&dcache_lock);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/7] VFS: Remove unnecessary micro-optimization in cached_lookup()
  2009-12-23 23:37           ` [PATCH 6/7] VFS: BUG_ON() rehash of an already hashed dentry Valerie Aurora
@ 2009-12-23 23:37             ` Valerie Aurora
  0 siblings, 0 replies; 28+ messages in thread
From: Valerie Aurora @ 2009-12-23 23:37 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Blunck, Alexander Viro, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

d_lookup() takes rename_lock which is a seq_lock.  This is so cheap
it's not worth calling lockless __d_lookup() first from
cache_lookup().  Rename cached_lookup() to cache_lookup() while we're
there.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c |   13 ++++---------
 1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3f645ce..ae200ba 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -417,15 +417,10 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
  * Internal lookup() using the new generic dcache.
  * SMP-safe
  */
-static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static struct dentry *cache_lookup(struct dentry *parent, struct qstr *name,
+				   struct nameidata *nd)
 {
-	struct dentry * dentry = __d_lookup(parent, name);
-
-	/* lockess __d_lookup may fail due to concurrent d_move() 
-	 * in some unrelated directory, so try with d_lookup
-	 */
-	if (!dentry)
-		dentry = d_lookup(parent, name);
+	struct dentry *dentry = d_lookup(parent, name);
 
 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
 		dentry = do_revalidate(dentry, nd);
@@ -1208,7 +1203,7 @@ static struct dentry *__lookup_hash(struct qstr *name,
 			goto out;
 	}
 
-	dentry = cached_lookup(base, name, nd);
+	dentry = cache_lookup(base, name, nd);
 	if (!dentry) {
 		struct dentry *new;
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2009-12-23 23:36 ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
  2009-12-23 23:36   ` [PATCH 2/7] VFS: Make lookup_hash() return a struct path Valerie Aurora
@ 2010-01-02  0:44   ` Ian Kent
  2010-01-14  5:43     ` Al Viro
  1 sibling, 1 reply; 28+ messages in thread
From: Ian Kent @ 2010-01-02  0:44 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: linux-fsdevel, Jan Blunck, Alexander Viro, autofs

On Wed, Dec 23, 2009 at 03:36:57PM -0800, Valerie Aurora wrote:
> From: Jan Blunck <jblunck@suse.de>
> 
> This is a bugfix/replacement for commit
> 051d381259eb57d6074d02a6ba6e90e744f1a29f:
> 
>     During a path walk if an autofs trigger is mounted on a dentry,
>     when the follow_link method is called, the nameidata struct
>     contains the vfsmount and mountpoint dentry of the parent mount
>     while the dentry that is passed in is the root of the autofs
>     trigger mount.  I believe it is impossible to get the vfsmount of
>     the trigger mount, within the follow_link method, when only the
>     parent vfsmount and the root dentry of the trigger mount are
>     known.
> 
> The solution in this commit was to replace the path embedded in the
> parent's nameidata with the path of the link itself in
> __do_follow_link().  This is a relatively harmless misuse of the
> field, but union mounts ran into a bug during follow_link() caused by
> the nameidata containing the wrong path (we count on it being what it
> is all other places - the path of the parent).
> 
> A cleaner and easier to understand solution is to save the necessary
> vfsmount in the autofs superblock info when it is mounted.  Then we
> can easily update the vfsmount in autofs4_follow_link().
> 
> Signed-off-by: Jan Blunck <jblunck@suse.de>
> Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Acked-by: <raven@themaw.net>

Don't know how I missed such an obvious solution when I did this.
Thanks, Ian

> Cc: Ian Kent <raven@themaw.net>
> Cc: autofs@linux.kernel.org
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/autofs4/autofs_i.h |    1 +
>  fs/autofs4/init.c     |   11 ++++++++++-
>  fs/autofs4/root.c     |    6 ++++++
>  fs/namei.c            |    7 ++-----
>  4 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index 8f7cdde..db2bfce 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -130,6 +130,7 @@ struct autofs_sb_info {
>  	int reghost_enabled;
>  	int needs_reghost;
>  	struct super_block *sb;
> +	struct vfsmount *mnt;
>  	struct mutex wq_mutex;
>  	spinlock_t fs_lock;
>  	struct autofs_wait_queue *queues; /* Wait queue pointer */
> diff --git a/fs/autofs4/init.c b/fs/autofs4/init.c
> index 9722e4b..5e0dcd7 100644
> --- a/fs/autofs4/init.c
> +++ b/fs/autofs4/init.c
> @@ -17,7 +17,16 @@
>  static int autofs_get_sb(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
>  {
> -	return get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
> +	struct autofs_sb_info *sbi;
> +	int ret;
> +
> +	ret = get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
> +	if (ret)
> +		return ret;
> +
> +	sbi = autofs4_sbi(mnt->mnt_sb);
> +	sbi->mnt = mnt;
> +	return 0;
>  }
>  
>  static struct file_system_type autofs_fs_type = {
> diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
> index b96a3c5..cb991b8 100644
> --- a/fs/autofs4/root.c
> +++ b/fs/autofs4/root.c
> @@ -179,6 +179,12 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
>  	DPRINTK("dentry=%p %.*s oz_mode=%d nd->flags=%d",
>  		dentry, dentry->d_name.len, dentry->d_name.name, oz_mode,
>  		nd->flags);
> +
> +	dput(nd->path.dentry);
> +	mntput(nd->path.mnt);
> +	nd->path.mnt = mntget(sbi->mnt);
> +	nd->path.dentry = dget(dentry);
> +
>  	/*
>  	 * For an expire of a covered direct or offset mount we need
>  	 * to break out of follow_down() at the autofs mount trigger
> diff --git a/fs/namei.c b/fs/namei.c
> index d11f404..c768444 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -629,11 +629,8 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
>  	touch_atime(path->mnt, dentry);
>  	nd_set_link(nd, NULL);
>  
> -	if (path->mnt != nd->path.mnt) {
> -		path_to_nameidata(path, nd);
> -		dget(dentry);
> -	}
> -	mntget(path->mnt);
> +	if (path->mnt == nd->path.mnt)
> +		mntget(nd->path.mnt);
>  	cookie = dentry->d_inode->i_op->follow_link(dentry, nd);
>  	error = PTR_ERR(cookie);
>  	if (!IS_ERR(cookie)) {
> -- 
> 1.5.6.5
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-02  0:44   ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Ian Kent
@ 2010-01-14  5:43     ` Al Viro
  2010-01-14 19:18       ` Valerie Aurora
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2010-01-14  5:43 UTC (permalink / raw)
  To: Ian Kent; +Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs

On Sat, Jan 02, 2010 at 08:44:25AM +0800, Ian Kent wrote:
> On Wed, Dec 23, 2009 at 03:36:57PM -0800, Valerie Aurora wrote:
> > From: Jan Blunck <jblunck@suse.de>
> > 
> > This is a bugfix/replacement for commit
> > 051d381259eb57d6074d02a6ba6e90e744f1a29f:
> > 
> >     During a path walk if an autofs trigger is mounted on a dentry,
> >     when the follow_link method is called, the nameidata struct
> >     contains the vfsmount and mountpoint dentry of the parent mount
> >     while the dentry that is passed in is the root of the autofs
> >     trigger mount.  I believe it is impossible to get the vfsmount of
> >     the trigger mount, within the follow_link method, when only the
> >     parent vfsmount and the root dentry of the trigger mount are
> >     known.
> > 
> > The solution in this commit was to replace the path embedded in the
> > parent's nameidata with the path of the link itself in
> > __do_follow_link().  This is a relatively harmless misuse of the
> > field, but union mounts ran into a bug during follow_link() caused by
> > the nameidata containing the wrong path (we count on it being what it
> > is all other places - the path of the parent).
> > 
> > A cleaner and easier to understand solution is to save the necessary
> > vfsmount in the autofs superblock info when it is mounted.  Then we
> > can easily update the vfsmount in autofs4_follow_link().
> > 
> > Signed-off-by: Jan Blunck <jblunck@suse.de>
> > Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> Acked-by: <raven@themaw.net>
> 
> Don't know how I missed such an obvious solution when I did this.
> Thanks, Ian

TBH, I don't like either variant (both the in-tree one and that).
The reason why vfsmount does *NOT* belong in superblock, TYVM: you've
messed the lifetime rules.  You can't pin it down, or the damn thing will
be impossible to kill.  OTOH, you have no promise whatsoever that superblock
won't outlive the initial vfsmount.  You might get another vfsmount over
the same thing and once the original one is gone...

So this is simply broken.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-14  5:43     ` Al Viro
@ 2010-01-14 19:18       ` Valerie Aurora
  2010-01-15  6:05         ` Ian Kent
  0 siblings, 1 reply; 28+ messages in thread
From: Valerie Aurora @ 2010-01-14 19:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Ian Kent, linux-fsdevel, Jan Blunck, autofs

On Thu, Jan 14, 2010 at 05:43:22AM +0000, Al Viro wrote:
> On Sat, Jan 02, 2010 at 08:44:25AM +0800, Ian Kent wrote:
> > On Wed, Dec 23, 2009 at 03:36:57PM -0800, Valerie Aurora wrote:
> > > From: Jan Blunck <jblunck@suse.de>
> > > 
> > > This is a bugfix/replacement for commit
> > > 051d381259eb57d6074d02a6ba6e90e744f1a29f:
> > > 
> > >     During a path walk if an autofs trigger is mounted on a dentry,
> > >     when the follow_link method is called, the nameidata struct
> > >     contains the vfsmount and mountpoint dentry of the parent mount
> > >     while the dentry that is passed in is the root of the autofs
> > >     trigger mount.  I believe it is impossible to get the vfsmount of
> > >     the trigger mount, within the follow_link method, when only the
> > >     parent vfsmount and the root dentry of the trigger mount are
> > >     known.
> > > 
> > > The solution in this commit was to replace the path embedded in the
> > > parent's nameidata with the path of the link itself in
> > > __do_follow_link().  This is a relatively harmless misuse of the
> > > field, but union mounts ran into a bug during follow_link() caused by
> > > the nameidata containing the wrong path (we count on it being what it
> > > is all other places - the path of the parent).
> > > 
> > > A cleaner and easier to understand solution is to save the necessary
> > > vfsmount in the autofs superblock info when it is mounted.  Then we
> > > can easily update the vfsmount in autofs4_follow_link().
> > > 
> > > Signed-off-by: Jan Blunck <jblunck@suse.de>
> > > Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> > Acked-by: <raven@themaw.net>
> > 
> > Don't know how I missed such an obvious solution when I did this.
> > Thanks, Ian
> 
> TBH, I don't like either variant (both the in-tree one and that).
> The reason why vfsmount does *NOT* belong in superblock, TYVM: you've
> messed the lifetime rules.  You can't pin it down, or the damn thing will
> be impossible to kill.  OTOH, you have no promise whatsoever that superblock
> won't outlive the initial vfsmount.  You might get another vfsmount over
> the same thing and once the original one is gone...
> 
> So this is simply broken.

Ian, you're the expert - any ideas?  What are the constraints here?

-VAL

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-14 19:18       ` Valerie Aurora
@ 2010-01-15  6:05         ` Ian Kent
  2010-01-15  8:03           ` Al Viro
  2010-01-15 14:55           ` David Howells
  0 siblings, 2 replies; 28+ messages in thread
From: Ian Kent @ 2010-01-15  6:05 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: Al Viro, linux-fsdevel, Jan Blunck, autofs

On 01/15/2010 03:18 AM, Valerie Aurora wrote:
> On Thu, Jan 14, 2010 at 05:43:22AM +0000, Al Viro wrote:
>> On Sat, Jan 02, 2010 at 08:44:25AM +0800, Ian Kent wrote:
>>> On Wed, Dec 23, 2009 at 03:36:57PM -0800, Valerie Aurora wrote:
>>>> From: Jan Blunck <jblunck@suse.de>
>>>>
>>>> This is a bugfix/replacement for commit
>>>> 051d381259eb57d6074d02a6ba6e90e744f1a29f:
>>>>
>>>>     During a path walk if an autofs trigger is mounted on a dentry,
>>>>     when the follow_link method is called, the nameidata struct
>>>>     contains the vfsmount and mountpoint dentry of the parent mount
>>>>     while the dentry that is passed in is the root of the autofs
>>>>     trigger mount.  I believe it is impossible to get the vfsmount of
>>>>     the trigger mount, within the follow_link method, when only the
>>>>     parent vfsmount and the root dentry of the trigger mount are
>>>>     known.
>>>>
>>>> The solution in this commit was to replace the path embedded in the
>>>> parent's nameidata with the path of the link itself in
>>>> __do_follow_link().  This is a relatively harmless misuse of the
>>>> field, but union mounts ran into a bug during follow_link() caused by
>>>> the nameidata containing the wrong path (we count on it being what it
>>>> is all other places - the path of the parent).
>>>>
>>>> A cleaner and easier to understand solution is to save the necessary
>>>> vfsmount in the autofs superblock info when it is mounted.  Then we
>>>> can easily update the vfsmount in autofs4_follow_link().
>>>>
>>>> Signed-off-by: Jan Blunck <jblunck@suse.de>
>>>> Signed-off-by: Valerie Aurora <vaurora@redhat.com>
>>> Acked-by: <raven@themaw.net>
>>>
>>> Don't know how I missed such an obvious solution when I did this.
>>> Thanks, Ian
>>
>> TBH, I don't like either variant (both the in-tree one and that).
>> The reason why vfsmount does *NOT* belong in superblock, TYVM: you've
>> messed the lifetime rules.  You can't pin it down, or the damn thing will
>> be impossible to kill.  OTOH, you have no promise whatsoever that superblock
>> won't outlive the initial vfsmount.  You might get another vfsmount over
>> the same thing and once the original one is gone...
>>
>> So this is simply broken.
> 
> Ian, you're the expert - any ideas?  What are the constraints here?

It looks to me like the struct path coming to __do_follow_link() has
what I need. A potentially big change, but using the struct path instead
of the dentry in the inode op follow_link() call would get me what I
need in my follow_link() call without having to store anything.

Ian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15  6:05         ` Ian Kent
@ 2010-01-15  8:03           ` Al Viro
  2010-01-15 17:36             ` Steve French
  2010-01-18  5:08             ` Ian Kent
  2010-01-15 14:55           ` David Howells
  1 sibling, 2 replies; 28+ messages in thread
From: Al Viro @ 2010-01-15  8:03 UTC (permalink / raw)
  To: Ian Kent
  Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs, sfrench,
	dhowells, Trond.Myklebust

[AFS, CIFS and NFS maintainers added to Cc]

On Fri, Jan 15, 2010 at 02:05:24PM +0800, Ian Kent wrote:
> >> So this is simply broken.
> > 
> > Ian, you're the expert - any ideas?  What are the constraints here?
> 
> It looks to me like the struct path coming to __do_follow_link() has
> what I need. A potentially big change, but using the struct path instead
> of the dentry in the inode op follow_link() call would get me what I
> need in my follow_link() call without having to store anything.

Yuck.  If we go for change of ->follow_link() prototype, we might as
well DTRT and split the damn thing instead of trying to overload an
unsuitable method.

Let's see what's really going on there.  AFAICT, we have several, er,
oddities in the area:
	* abuse of mnt_devname by nfs; do we ever want it to differ
(for nfs automounting purposes) for different vfsmounts over the same
superblock?
	* inconsistent behaviour when something had managed to mount
first.  Some expect the possibility of -EBUSY, some do not.
	* WTF is CIFS doing setting MNT_SHRINKABLE on *parent* vfsmount?
Blindly, not even bothering to cooperate with mnt_make_readonly() and
its ilk.  Most of all, _why_?  It's child that should be marked so, not
the parent...
	* speaking of which, what should happen to other vfsmount flags?
NFS seems to want them all inherited and MNT_SHRINKABLE set, which is
reasonable enough; CIFS sets MNT_SHRINKABLE on parent and inherits everything;
AFS can't be arsed and just sets MNT_SHRINKABLE, all flags on parent be
damned.
	* what's going on with use of intent flags in case of autofs4?

Other than that, it'd seem that we would be better off if we treated these
guys as "if we try to follow mountpoint and that sucker turns out to be
the end of the road, do something fs-specific and either try to go further
or see if we could attach the mountpoint given to us here".  Which might
be better done in VFS, with fs method returning vfsmount, NULL or ERR_PTR.

And yes, it'd mean rework of vfsmount expiry interface; we'd need to put the
vfsmount on expiry list before attaching it anywhere.  Not a big deal,
that...

Comments?  It's obviously not a solution yet and this direction might
very well turn out to be unfeasible, but let's at least see where that
might lead.  There are, after all, 4 users of that stuff.  Should be
enough to figure out what we really need and come up with a sane design...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15  6:05         ` Ian Kent
  2010-01-15  8:03           ` Al Viro
@ 2010-01-15 14:55           ` David Howells
  2010-01-15 16:58             ` Al Viro
  2010-01-15 17:08             ` David Howells
  1 sibling, 2 replies; 28+ messages in thread
From: David Howells @ 2010-01-15 14:55 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, Ian Kent, Valerie Aurora, linux-fsdevel, Jan Blunck,
	autofs, sfrench, Trond.Myklebust

Al Viro <viro@ZenIV.linux.org.uk> wrote:

> AFS can't be arsed and just sets MNT_SHRINKABLE, all flags on parent be
> damned.

Why's that wrong?  This is AFS's automounter, and it seems quite reasonable to
have the outer edges pruned to make space.

David

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15 14:55           ` David Howells
@ 2010-01-15 16:58             ` Al Viro
  2010-01-15 17:08             ` David Howells
  1 sibling, 0 replies; 28+ messages in thread
From: Al Viro @ 2010-01-15 16:58 UTC (permalink / raw)
  To: David Howells
  Cc: Ian Kent, Valerie Aurora, linux-fsdevel, Jan Blunck, autofs,
	sfrench, Trond.Myklebust

On Fri, Jan 15, 2010 at 02:55:23PM +0000, David Howells wrote:
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > AFS can't be arsed and just sets MNT_SHRINKABLE, all flags on parent be
> > damned.
> 
> Why's that wrong?  This is AFS's automounter, and it seems quite reasonable to
> have the outer edges pruned to make space.

Do you want it to inerit e.g. nosuid?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15 14:55           ` David Howells
  2010-01-15 16:58             ` Al Viro
@ 2010-01-15 17:08             ` David Howells
  2010-01-15 17:26               ` Al Viro
  1 sibling, 1 reply; 28+ messages in thread
From: David Howells @ 2010-01-15 17:08 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, Ian Kent, Valerie Aurora, linux-fsdevel, Jan Blunck,
	autofs, sfrench, Trond.Myklebust

Al Viro <viro@ZenIV.linux.org.uk> wrote:

> Do you want it to inerit e.g. nosuid?

Are you just talking about MNT_SHRINKABLE?  Or all the other flags?

Should I be passing:

	nd->path.mnt->mnt_flags | MNT_SHRINKABLE

instead?

David

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15 17:08             ` David Howells
@ 2010-01-15 17:26               ` Al Viro
  2010-01-16 10:17                 ` Al Viro
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2010-01-15 17:26 UTC (permalink / raw)
  To: David Howells
  Cc: Ian Kent, Valerie Aurora, linux-fsdevel, Jan Blunck, autofs,
	sfrench, Trond.Myklebust

On Fri, Jan 15, 2010 at 05:08:28PM +0000, David Howells wrote:
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > Do you want it to inerit e.g. nosuid?
> 
> Are you just talking about MNT_SHRINKABLE?  Or all the other flags?
> 
> Should I be passing:
> 
> 	nd->path.mnt->mnt_flags | MNT_SHRINKABLE
> 
> instead?

Maybe, maybe not...  BTW, even that leaves an unpleasant race with
mnt_make_readonly() (CIFS and NFS seem to be suffering from one).
Which flags do we want to be inherited?  Grabbing MNT_WRITE_HOLD,
for example, would obviously be a bad idea...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15  8:03           ` Al Viro
@ 2010-01-15 17:36             ` Steve French
  2010-01-18  5:08             ` Ian Kent
  1 sibling, 0 replies; 28+ messages in thread
From: Steve French @ 2010-01-15 17:36 UTC (permalink / raw)
  To: Al Viro
  Cc: autofs, dhowells, Trond.Myklebust, sfrench, Valerie Aurora,
	linux-fsdevel, Jan Blunck, Ian Kent


[-- Attachment #1.1: Type: text/plain, Size: 481 bytes --]

On Fri, Jan 15, 2010 at 2:03 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:

> [AFS, CIFS and NFS maintainers added to Cc]
>        * WTF is CIFS doing setting MNT_SHRINKABLE on *parent* vfsmount?
> Blindly, not even bothering to cooperate with mnt_make_readonly() and
> its ilk.  Most of all, _why_?  It's child that should be marked so, not
> the parent...
>

Yes. Looks like a bug. Thanks for spotting this.  Doublechecking with Igor
(who wrote this section).

-- 
Thanks,

Steve

[-- Attachment #1.2: Type: text/html, Size: 793 bytes --]

[-- Attachment #2: Type: text/plain, Size: 140 bytes --]

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15 17:26               ` Al Viro
@ 2010-01-16 10:17                 ` Al Viro
  2010-01-17 17:57                   ` Al Viro
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2010-01-16 10:17 UTC (permalink / raw)
  To: David Howells
  Cc: Ian Kent, Valerie Aurora, linux-fsdevel, Jan Blunck, autofs,
	sfrench, Trond.Myklebust

On Fri, Jan 15, 2010 at 05:26:33PM +0000, Al Viro wrote:

> Maybe, maybe not...  BTW, even that leaves an unpleasant race with
> mnt_make_readonly() (CIFS and NFS seem to be suffering from one).
> Which flags do we want to be inherited?  Grabbing MNT_WRITE_HOLD,
> for example, would obviously be a bad idea...

Oh, man...  After doing code review re locking rules for mnt_flags
and related stuff, a bunch of races had shown up.  I'll put fixes
into #for-linus tomorrow.

Shit galore:
	* may_umount() ought to take namespace_sem (shared).  Otherwise
we race with clone_mnt() doing add_list() to ->mnt_share/->mnt_slave.
	* attach_recursive_mnt() ought to take vfsmount_lock around
its loop that does set_mnt_shared(); otherwise mount --move can race
with e.g. mount -o remount.
	* do_remount() ought to take vfsmount_lock around the assigment
to mnt_flags *and* take care to leave MNT_SHARED and MNT_UNBINDABLE alone,
especially the former.
	* CIFS shouldn't step on mnt_flags
	* NFS, AFS and CIFS should *not* leave MNT_SHARED and MNT_WRITE_HOLD
in flags passed to do_add_mount(); alternatively, do_add_mount() might
trim those.
	* [unsolved, to be dealt along with per-superblock write counts]
do_remount() plays fast and loose with MNT_READONLY for !MS_BIND case.
	* [*really* unsolved] it remains to be seen whether we want to
propagate modifications of mount flags via shared subtree stuff.  For most
of those it's trivial (and arguably the right thing to do), but ro/rw is
really nasty.  Nick's mnt_want_write() implementation will need very careful
analysis.
	* [#for-next fodder] pnode.c:get_source() cleanup; right now it is
correct, but PITA to prove the correctness.  Incidentally, CL_PROPAGATION
is gone after that one.  Not #for-linus stuff, but I'll be really happier
with it for post-.33
	* [#for-next, but might go into #for-linus as well] explicit
documentation of invariants added to Documentation/filesystems/sharedsubtree.txt
Part of that can be deduced from what's already there, but I'd rather have
it explicit and one crucial bit is simply missing ("if mnt->mnt_master != NULL,
the entire mnt->mnt_share list consists of adjacent elements of mnt->mnt_slaves,
in the same order") and proving correctness without it is Not Fun(tm).  Hell,
I've spent an hour figuring out whether it's broken or not and I'd been
through all that code back when it had been written.  In details.

I'll give that another look after I get some sleep, then to public tree it
goes...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-16 10:17                 ` Al Viro
@ 2010-01-17 17:57                   ` Al Viro
  2010-01-18  4:21                     ` Ian Kent
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2010-01-17 17:57 UTC (permalink / raw)
  To: Ian Kent
  Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs, sfrench,
	Trond.Myklebust, David Howells

On Sat, Jan 16, 2010 at 10:17:14AM +0000, Al Viro wrote:
> 	* [unsolved, to be dealt along with per-superblock write counts]
> do_remount() plays fast and loose with MNT_READONLY for !MS_BIND case.
> 	* [*really* unsolved] it remains to be seen whether we want to
> propagate modifications of mount flags via shared subtree stuff.  For most
> of those it's trivial (and arguably the right thing to do), but ro/rw is
> really nasty.  Nick's mnt_want_write() implementation will need very careful
> analysis.

	Speaking of autofs4, what the hell is going on in
autofs_dev_ioctl_ismountpoint()?  Checks in there make no sense,
both the "could that dentry be negative?" and whatever the hell
it is trying to do with mnt_mountpoint.  Ian?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-17 17:57                   ` Al Viro
@ 2010-01-18  4:21                     ` Ian Kent
  2010-01-18  5:59                       ` Al Viro
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Kent @ 2010-01-18  4:21 UTC (permalink / raw)
  To: Al Viro
  Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs, sfrench,
	Trond.Myklebust, David Howells

On 01/18/2010 01:57 AM, Al Viro wrote:
> On Sat, Jan 16, 2010 at 10:17:14AM +0000, Al Viro wrote:
>> 	* [unsolved, to be dealt along with per-superblock write counts]
>> do_remount() plays fast and loose with MNT_READONLY for !MS_BIND case.
>> 	* [*really* unsolved] it remains to be seen whether we want to
>> propagate modifications of mount flags via shared subtree stuff.  For most
>> of those it's trivial (and arguably the right thing to do), but ro/rw is
>> really nasty.  Nick's mnt_want_write() implementation will need very careful
>> analysis.
> 
> 	Speaking of autofs4, what the hell is going on in
> autofs_dev_ioctl_ismountpoint()?  Checks in there make no sense,
> both the "could that dentry be negative?" and whatever the hell
> it is trying to do with mnt_mountpoint.  Ian?

In that case we may find an autofs mount that has something mounted on
top of it and user space wants to know the super of the covering mount.

If there is something mounted on top of it user space needs to know if
it is another autofs file system or some other type of file system. So
if the nameidata path, located by autofs_dev_ioctl_find_super(), is not
the top (or bottom, depending on the terminology you prefer) then we
need to follow the mount and return the magic of the thing mounted on
top of it.

Ian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-15  8:03           ` Al Viro
  2010-01-15 17:36             ` Steve French
@ 2010-01-18  5:08             ` Ian Kent
  1 sibling, 0 replies; 28+ messages in thread
From: Ian Kent @ 2010-01-18  5:08 UTC (permalink / raw)
  To: Al Viro
  Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs, sfrench,
	dhowells, Trond.Myklebust

On 01/15/2010 04:03 PM, Al Viro wrote:
> 	* what's going on with use of intent flags in case of autofs4?

It's the same old story.
autofs needs to take care not to trigger mounts for certain system calls
when we have reached the last path component. AFAIK the only way to do
that is to check the flags.

> 
> Other than that, it'd seem that we would be better off if we treated these
> guys as "if we try to follow mountpoint and that sucker turns out to be
> the end of the road, do something fs-specific and either try to go further
> or see if we could attach the mountpoint given to us here".  Which might
> be better done in VFS, with fs method returning vfsmount, NULL or ERR_PTR.

Forgive my thickness but am I understanding this correctly?

Are you just saying it would be better to add a file system method to
cater for the case where we might need to do further resolution on the
mount root we have just arrived at before continuing into it (mmm ...
now it feels like I've stated the obvious, but anyway ...)?

Would that be before following the link or after (not that it would
matter for my needs)?

Ian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-18  4:21                     ` Ian Kent
@ 2010-01-18  5:59                       ` Al Viro
  2010-01-18  9:14                         ` Ian Kent
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2010-01-18  5:59 UTC (permalink / raw)
  To: Ian Kent
  Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs, sfrench,
	Trond.Myklebust, David Howells

On Mon, Jan 18, 2010 at 12:21:09PM +0800, Ian Kent wrote:

> In that case we may find an autofs mount that has something mounted on
> top of it and user space wants to know the super of the covering mount.
> 
> If there is something mounted on top of it user space needs to know if
> it is another autofs file system or some other type of file system. So
> if the nameidata path, located by autofs_dev_ioctl_find_super(), is not
> the top (or bottom, depending on the terminology you prefer) then we
> need to follow the mount and return the magic of the thing mounted on
> top of it.

IDGI.  What you are doing there is
                if (path.mnt->mnt_mountpoint != path.mnt->mnt_root) {
                        if (follow_down(&path))
                                magic = path.mnt->mnt_sb->s_magic;
                }
and I don't think it means what you think it means.  Just what is that
mnt_mountpoint check about?  Before that point we'd found the autofs
vfsmount M that
	1) M is mounted on <name>
	2) M->mnt_sb has the right s_dev
	3) M is the closest one to root in mount tree out of vfsmounts
satisfying (1) and (2)
Now we check that
	4) the mountpoint M is attached to has dentry different from
M->mnt_root.  That's an interesting thing to check, seeing that the
only way to get it false is to have mount --bind name name, with name
not being the mountpoint before that.  And M being the result of
that mount --bind.
	5) something is mounted on top of root of M.

Then we proceed to return the s_magic of that something.  For one thing,
if there *are* several vfsmounts satisfying (1,2), which one do we really
want?  For another, what's the intent of (4)?  It looks very odd; what's
really being checked there?

In another branch we have
                if (path.dentry->d_inode &&
                    path.mnt->mnt_root == path.dentry) {
                        err = 1;
                        magic = path.dentry->d_inode->i_sb->s_magic;
                }
and AFAICT, path.dentry->d_inode == NULL is impossible there.  Besides,
path.mnt->mnt_sb->s_magic would be simpler anyway (and evaluate to
the same thing).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-18  5:59                       ` Al Viro
@ 2010-01-18  9:14                         ` Ian Kent
  2010-01-18 10:27                           ` Al Viro
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Kent @ 2010-01-18  9:14 UTC (permalink / raw)
  To: Al Viro
  Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs, sfrench,
	Trond.Myklebust, David Howells

On 01/18/2010 01:59 PM, Al Viro wrote:
> On Mon, Jan 18, 2010 at 12:21:09PM +0800, Ian Kent wrote:
> 
>> In that case we may find an autofs mount that has something mounted on
>> top of it and user space wants to know the super of the covering mount.
>>
>> If there is something mounted on top of it user space needs to know if
>> it is another autofs file system or some other type of file system. So
>> if the nameidata path, located by autofs_dev_ioctl_find_super(), is not
>> the top (or bottom, depending on the terminology you prefer) then we
>> need to follow the mount and return the magic of the thing mounted on
>> top of it.
> 
> IDGI.  What you are doing there is
>                 if (path.mnt->mnt_mountpoint != path.mnt->mnt_root) {
>                         if (follow_down(&path))
>                                 magic = path.mnt->mnt_sb->s_magic;
>                 }
> and I don't think it means what you think it means.  Just what is that
> mnt_mountpoint check about?  Before that point we'd found the autofs
> vfsmount M that
> 	1) M is mounted on <name>
> 	2) M->mnt_sb has the right s_dev
> 	3) M is the closest one to root in mount tree out of vfsmounts
> satisfying (1) and (2)
> Now we check that
> 	4) the mountpoint M is attached to has dentry different from
> M->mnt_root.  That's an interesting thing to check, seeing that the
> only way to get it false is to have mount --bind name name, with name
> not being the mountpoint before that.  And M being the result of
> that mount --bind.
> 	5) something is mounted on top of root of M.
> 
> Then we proceed to return the s_magic of that something.  For one thing,
> if there *are* several vfsmounts satisfying (1,2), which one do we really
> want?  For another, what's the intent of (4)?  It looks very odd; what's
> really being checked there?

The code here has changed a little from what I originally posted but,
assuming for now the functionality is equivalent, I can't see what the
point of that check is and now I can't remember if there was some odd
special case. follow_mount() should be sufficient. I'll fix this.

The possibility of more than one vfsmount being present is, as you say
possible, but it is not legal for autofs (and last time I checked I
concluded it wasn't possible for me to veto bind mount requests). Other
than bind mounts I'm struggling to think of a case where I would have
more than one autofs fs mount with the same s_dev.

> 
> In another branch we have
>                 if (path.dentry->d_inode &&
>                     path.mnt->mnt_root == path.dentry) {
>                         err = 1;
>                         magic = path.dentry->d_inode->i_sb->s_magic;
>                 }
> and AFAICT, path.dentry->d_inode == NULL is impossible there.  Besides,
> path.mnt->mnt_sb->s_magic would be simpler anyway (and evaluate to
> the same thing).

Yes, the dentry should always be positive here but let me think about it
a little more in case I'm missing something. And yes, using the vfsmount
super block pointer would be better. I'll fix these too.

Ian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-18  9:14                         ` Ian Kent
@ 2010-01-18 10:27                           ` Al Viro
  2010-01-18 19:35                             ` Trond Myklebust
  2010-01-19  7:05                             ` Ian Kent
  0 siblings, 2 replies; 28+ messages in thread
From: Al Viro @ 2010-01-18 10:27 UTC (permalink / raw)
  To: Ian Kent
  Cc: autofs, David Howells, Trond.Myklebust, sfrench, Valerie Aurora,
	linux-fsdevel, Jan Blunck

On Mon, Jan 18, 2010 at 05:14:58PM +0800, Ian Kent wrote:

> The possibility of more than one vfsmount being present is, as you say
> possible, but it is not legal for autofs (and last time I checked I
> concluded it wasn't possible for me to veto bind mount requests). Other
> than bind mounts I'm struggling to think of a case where I would have
> more than one autofs fs mount with the same s_dev.

That's interesting...  Other place where we go through the stack of mounts
is where we look for one with given bits in type; do we have a possibility
of multiple candidates there and which one do we really want?  Same function,
case when we have ioctlfd == -1, !autofs_type_any(type).  Current code
(as well as original) goes for one closest to root; it would certainly be
simpler to go for one on top of stack...

> Yes, the dentry should always be positive here but let me think about it
> a little more in case I'm missing something. And yes, using the vfsmount
> super block pointer would be better. I'll fix these too.

FWIW, I'd rather have that stuff in vfs tree; that's not the only patch
eliminating mnt_mountpoint use.  AFAICT, none of the uses outside of
core VFS code are legitimate (and BTW, NFSv4 one is contrary to RFC - it
should do follow_down() in loop before reporting mount_fileid instead of
just picking the immediate ancestor for one thing and I'm not at all sure
that check around that thing is correct; it ignores export path.dentry
and looks at path.mnt.root instead, which seems to be wrong).  Other places
would be just as happy with mnt/mnt->mnt_root instead of mnt->mnt_parent/
mnt->mnt_mountpoint or, worse, mnt/mnt->mnt_mountpoint.  Pohmelfs is even
nastier, with its d_path() abuses, but that's a separate story.

IMO we ought to get rid of mnt_mountpoint/mnt_parent uses outside of core
VFS and I'd rather do that in one patch queue.

Back to another question: which syscalls should and which syscalls should not 
trigger automount on the last component?  Note that it's something we'd better
be consistent about between autofs4 and cifs/afs/nfs...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-18 10:27                           ` Al Viro
@ 2010-01-18 19:35                             ` Trond Myklebust
  2010-01-25  8:16                               ` H. Peter Anvin
  2010-01-19  7:05                             ` Ian Kent
  1 sibling, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2010-01-18 19:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Ian Kent, Valerie Aurora, linux-fsdevel, Jan Blunck, autofs,
	sfrench, David Howells

On Mon, 2010-01-18 at 10:27 +0000, Al Viro wrote: 
> Back to another question: which syscalls should and which syscalls should not 
> trigger automount on the last component?  Note that it's something we'd better
> be consistent about between autofs4 and cifs/afs/nfs...

In addition to the ones that trigger automounts now:

One syscall that we've had a lot of complaints about is 'stat()'. Since
it doesn't follow symlinks, it will fail to trigger the automount, and
will return bogus values for st_dev. This again confuses some versions
of the 'du' utility that check the value of st_dev before and after
entering the subdir.
(see https://bugzilla.redhat.com/show_bug.cgi?id=431166 )

In the cifs/afs/nfs case, statfs() should really trigger automounts too.
Not sure if we want that for the autofs case.

Finally, it makes little sense for cifs/afs/nfs to allow open() on the
underlying directory, even if O_NOFOLLOW is set. Again, autofs might be
an exception due to its reliance on ioctls as a method of communication
with the kernel.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-18 10:27                           ` Al Viro
  2010-01-18 19:35                             ` Trond Myklebust
@ 2010-01-19  7:05                             ` Ian Kent
  1 sibling, 0 replies; 28+ messages in thread
From: Ian Kent @ 2010-01-19  7:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Valerie Aurora, linux-fsdevel, Jan Blunck, autofs, sfrench,
	Trond.Myklebust, David Howells

On 01/18/2010 06:27 PM, Al Viro wrote:
> On Mon, Jan 18, 2010 at 05:14:58PM +0800, Ian Kent wrote:
> 
>> The possibility of more than one vfsmount being present is, as you say
>> possible, but it is not legal for autofs (and last time I checked I
>> concluded it wasn't possible for me to veto bind mount requests). Other
>> than bind mounts I'm struggling to think of a case where I would have
>> more than one autofs fs mount with the same s_dev.
> 
> That's interesting...  Other place where we go through the stack of mounts
> is where we look for one with given bits in type; do we have a possibility
> of multiple candidates there and which one do we really want?  Same function,
> case when we have ioctlfd == -1, !autofs_type_any(type).  Current code
> (as well as original) goes for one closest to root; it would certainly be
> simpler to go for one on top of stack...

We shouldn't have multiple candidates for the same reason autofs can't
support bind mounts. The mounts in an autofs tree of mounts are tied to
a map of mounts in user space which means they are tied to a specific
path. And, we won't (or at least shouldn't unless user space uses the
facility incorrectly) have multiple mounts of the same autofs type on
the stack due to the way autofs must use these mounts.

As far as going from the top of the stack goes I don't think it makes
any real difference since the stack is small (2 or possibly 3). The
find_autofs_mount() function does go from the (terminology?) bottom of
the mount stack using follow_up() whereas the original code went from
the top using the path_lookup() to get the parent and then
follow_down(). So there is additional following with the current code
since kern_path() will go down to the bottom of the stack and then we
proceed to go back up in find_autofs_mount().

> 
>> Yes, the dentry should always be positive here but let me think about it
>> a little more in case I'm missing something. And yes, using the vfsmount
>> super block pointer would be better. I'll fix these too.
> 
> FWIW, I'd rather have that stuff in vfs tree; that's not the only patch
> eliminating mnt_mountpoint use.  AFAICT, none of the uses outside of
> core VFS code are legitimate (and BTW, NFSv4 one is contrary to RFC - it
> should do follow_down() in loop before reporting mount_fileid instead of
> just picking the immediate ancestor for one thing and I'm not at all sure
> that check around that thing is correct; it ignores export path.dentry
> and looks at path.mnt.root instead, which seems to be wrong).  Other places
> would be just as happy with mnt/mnt->mnt_root instead of mnt->mnt_parent/
> mnt->mnt_mountpoint or, worse, mnt/mnt->mnt_mountpoint.  Pohmelfs is even
> nastier, with its d_path() abuses, but that's a separate story.
> 
> IMO we ought to get rid of mnt_mountpoint/mnt_parent uses outside of core
> VFS and I'd rather do that in one patch queue.
> 
> Back to another question: which syscalls should and which syscalls should not 
> trigger automount on the last component?  Note that it's something we'd better
> be consistent about between autofs4 and cifs/afs/nfs...

That's actually a difficult question.

stat(2) is the most problematic call for much the same reasons as
described by Trond.

A (hopefully fairly simple) description of a couple of specific cases
will be useful to clarify the situation.

First, terminology, an "indirect autofs mount" is an autofs fs mount
that implements single path components as mount points within the autofs
fs directory. There are two cases, indirect mounts that are "nobrose"
and ones that "browse"able. The later is just an autofs indirect mount
with pre-created mount point directories so they can be seen by listing
the directory whereas a nobrowse mount corresponds to an autofs mount we
have seen historically where a directory listing of an autofs mount with
no current active automounted mounts returns nothing.

There is no problem with stat(2) for nobrowse autofs indirect mounts
because directories that can potentially trigger mounts simply don't
exist and the user space tools tend to behave as people expect.

But, for browse autofs indirect mounts we get into trouble. For example
a long or colour ls will stat(2) each directory to get needed
information causing all the potential mounts to be mounted. Which is an
pbvious problem for autofs maps that have more than a few entries and a
disaster for anything with more than a hundred or so entries.

Currently stat(2) is honoured for the nobrowse case (the case were we
are creating a dentry) and not for the browse case, unless of course the
mount has already been triggered. This gives rise to the stat() ->
chdir() (or other mount triggering call) -> stat() inconsistency we see
in user space utilities. In Solaris browse is the default for indirect
mounts and we see the same lies in directory listings as mounts aren't
triggered in that case either.

statfs(2) is an interesting case for autofs direct mounts.

More terminology, a "direct autofs mount" is an autofs mount that
implements a mount trigger for a given path. An autofs direct map
consists of entries with a full and a target mount and so each map entry
is a distinct mount within the host file system (mmm ... sounds a bit
like the description in the Solaris man page). When a direct mount is
crossed the corresponding mount is triggered and is mounted on top of
the autofs direct mount point.

We probably should honour statfs(2) calls but without also necessarily
honouring stat(2) calls. Currently the statfs(2) calls get caught by the
exclusion above for stat(2) calls but we don't see many complaints
because we don't list autofs mounts in /etc/mtab, where the user space
utilities usually look for their information. Once again, this approach
is similar to what we see in Solaris. In Solaris /etc/mnttab is similar
to our /proc/mounts but there is a pseudo option ("hide" or something
similar, I can't remember now) used to prevent user space utilities from
considering entries that shouldn't be reported.

So the stat(2) call is the big problem while other related system calls
(in that they use very similar lookup flags), like statfs(2) have the
opposite problem.

This probably isn't really what you wanted to hear but I've tried to
give a decent warts and all description of the difficulty.

Ian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info
  2010-01-18 19:35                             ` Trond Myklebust
@ 2010-01-25  8:16                               ` H. Peter Anvin
  0 siblings, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2010-01-25  8:16 UTC (permalink / raw)
  To: autofs

On 01/18/2010 11:35 AM, Trond Myklebust wrote:
> On Mon, 2010-01-18 at 10:27 +0000, Al Viro wrote:
>> Back to another question: which syscalls should and which syscalls should not
>> trigger automount on the last component?  Note that it's something we'd better
>> be consistent about between autofs4 and cifs/afs/nfs...
>
> In addition to the ones that trigger automounts now:
>
> One syscall that we've had a lot of complaints about is 'stat()'. Since
> it doesn't follow symlinks, it will fail to trigger the automount, and
> will return bogus values for st_dev. This again confuses some versions
> of the 'du' utility that check the value of st_dev before and after
> entering the subdir.
> (see https://bugzilla.redhat.com/show_bug.cgi?id=431166 )
>

[Sorry for late comment]

I presume you mean lstat() here.  This is *exactly* the intended 
semantics -- this prevents a GUI from looking in /home with enumeration 
enabled, and mounting *every single subvolume* in that map.

Thus the semantic: "don't automount if you wouldn't follow a symlink". 
It's perhaps not *ideal*, but it is a pretty decent approximation, easy 
to describe, and serves the purpose.

	-hpa

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2010-01-25  8:16 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-23 23:36 [PATCH 0/7] VFS prep for union mounts/writable overlays Valerie Aurora
2009-12-23 23:36 ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
2009-12-23 23:36   ` [PATCH 2/7] VFS: Make lookup_hash() return a struct path Valerie Aurora
2009-12-23 23:36     ` [PATCH 3/7] VFS: Make real_lookup() " Valerie Aurora
2009-12-23 23:37       ` [PATCH 4/7] VFS: Propagate mnt_flags into do_loopback Valerie Aurora
2009-12-23 23:37         ` [PATCH 5/7] VFS: Add read-only users count to superblock Valerie Aurora
2009-12-23 23:37           ` [PATCH 6/7] VFS: BUG_ON() rehash of an already hashed dentry Valerie Aurora
2009-12-23 23:37             ` [PATCH 7/7] VFS: Remove unnecessary micro-optimization in cached_lookup() Valerie Aurora
2010-01-02  0:44   ` [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info Ian Kent
2010-01-14  5:43     ` Al Viro
2010-01-14 19:18       ` Valerie Aurora
2010-01-15  6:05         ` Ian Kent
2010-01-15  8:03           ` Al Viro
2010-01-15 17:36             ` Steve French
2010-01-18  5:08             ` Ian Kent
2010-01-15 14:55           ` David Howells
2010-01-15 16:58             ` Al Viro
2010-01-15 17:08             ` David Howells
2010-01-15 17:26               ` Al Viro
2010-01-16 10:17                 ` Al Viro
2010-01-17 17:57                   ` Al Viro
2010-01-18  4:21                     ` Ian Kent
2010-01-18  5:59                       ` Al Viro
2010-01-18  9:14                         ` Ian Kent
2010-01-18 10:27                           ` Al Viro
2010-01-18 19:35                             ` Trond Myklebust
2010-01-25  8:16                               ` H. Peter Anvin
2010-01-19  7:05                             ` Ian Kent

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.