* [PATCH 00/34] Union mount core for review
@ 2010-09-16 22:11 Valerie Aurora
  2010-09-16 22:11 ` [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes Valerie Aurora
                   ` (34 more replies)
  0 siblings, 35 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

This series is the core mount and lookup infrastructure from union
mounts, split up into small, easily digestible, bikeshed-friendly
pieces.  All of the (non-documentation, non-whitespace) patches in
this series are less than 140 lines long.  It's like Twitter for
kernel patches.

VFS developers should be able to review each of these patches in 3
minutes or less.  If it takes you longer, email me and I'll post a
video on YouTube making fun of you.

Changes since last posted version:

 - Lower directory entries now in one array, not linked list
 - Use clone_mnt() flags to automate hard read-only counts
 - Actually enforce top-layer-mounted-only-once rule
 - Retune DNAME_INLINE_LEN_MIN to keep 64-byte alignment
 - Updated documentation

Next on the todo list:

 - Rewrite in-kernel copyup to be more awesomer
  * Prevent half-finished copies appearing after crash
  * Set correct owners/perms/etc. of copied up dirs/files
  * Remove races between parent/target lookup
  * On metadata writes, only copyup file data if it succeeds
 - Keep reviewing hybrid union patches

After I finish that (3-4 weeks?), I have nothing major left on my
to-do list.  I'm sure that will change as I get code reviews. :)

Against 2.6.35.  The rest of the series (whiteouts, fallthrus,
soon-to-be-obsolete copyup, etc.) is in branch "split_lookup"
in:

git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git

Thanks for reviewing!

-VAL

Jan Blunck (3):
  union-mount: Introduce MNT_UNION and MS_UNION flags
  union-mount: Free union stack on removal of topmost dentry from
    dcache
  union-mount: Create IS_MNT_UNION()

Valerie Aurora (31):
  VFS: Make clone_mnt() and copy_tree() return error codes
  VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree()
  VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  VFS: Add CL_MAKE_HARD_READONLY flag to clone_mnt()/copy_tree()
  union-mount: Union mounts documentation
  union-mount: Add CONFIG_UNION_MOUNT option
  union-mount: Create union_stack structure
  union-mount: Add two superblock fields for union mounts
  union-mount: Add union_alloc()
  union-mount: Add union_find_dir()
  union-mount: Create d_free_unions()
  union-mount: Create union_add_dir()
  union-mount: Add union_create_topmost_dir()
  union-mount: Create needs_lookup_union()
  union-mount: Create check_topmost_union_mnt()
  union-mount: Add clone_union_tree() and put_union_sb()
  union-mount: Create build_root_union()
  union-mount: Create prepare_mnt_union() and cleanup_mnt_union()
  union-mount: Prevent improper union-related remounts
  union-mount: Prevent topmost file system from being mounted elsewhere
  union-mount: Prevent bind mounts of union mounts
  union-mount: Implement union mount
  union-mount: Temporarily disable some syscalls
  union-mount: Basic infrastructure of __union_lookup()
  union-mount: Process negative dentries in __union_lookup()
  union-mount: Return files found in lower layers in __union_lookup()
  union-mount: Build union stack in __lookup_union()
  union-mount: Follow mount in __lookup_union()
  union-mount: Add lookup_union() wrapper for __lookup_union()
  union-mount: Add do_lookup_union() wrapper for __lookup_union()
  union-mount: Call union lookup functions in lookup path

 Documentation/filesystems/union-mounts.txt |  744 ++++++++++++++++++++++++++++
 fs/Kconfig                                 |   13 +
 fs/Makefile                                |    1 +
 fs/dcache.c                                |   14 +
 fs/namei.c                                 |  204 ++++++++
 fs/namespace.c                             |  378 ++++++++++++--
 fs/pnode.c                                 |    5 +-
 fs/pnode.h                                 |    3 +
 fs/super.c                                 |    1 +
 fs/union.c                                 |  178 +++++++
 fs/union.h                                 |   80 +++
 include/linux/dcache.h                     |   22 +-
 include/linux/fs.h                         |   13 +
 include/linux/mount.h                      |    4 +
 14 files changed, 1604 insertions(+), 56 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt
 create mode 100644 fs/union.c
 create mode 100644 fs/union.h



* [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
  2010-09-20 21:26   ` Andreas Gruenbacher
  2010-09-30  9:51   ` Miklos Szeredi
  2010-09-16 22:11 ` [PATCH 02/34] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree() Valerie Aurora
                   ` (33 subsequent siblings)
  34 siblings, 2 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

copy_tree() can theoretically fail in a case other than ENOMEM, but
always returns NULL which is interpreted by callers as -ENOMEM.
Convert to return an explicit error.  Convert clone_mnt() for
consistency and because union mounts will add new error cases.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |  111 ++++++++++++++++++++++++++++++--------------------------
 fs/pnode.c     |    5 ++-
 2 files changed, 63 insertions(+), 53 deletions(-)
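
Editor's sketch, not part of the patch: the conversion described in the
changelog relies on the kernel convention of encoding a negative errno in
the returned pointer instead of returning NULL.  The helpers below are
userspace stand-ins modeled on include/linux/err.h; struct toy_mnt,
toy_copy() and toy_bind() are hypothetical names used only to show the
caller pattern, and the real clone_mnt()/copy_tree() do far more work.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the helpers in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for error codes. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical reduced object standing in for struct vfsmount. */
struct toy_mnt {
	int unbindable;
};

/*
 * Same shape as the converted copy_tree(): every failure is reported
 * as an encoded errno, so callers can tell -EINVAL from -ENOMEM.
 */
static struct toy_mnt *toy_copy(const struct toy_mnt *old)
{
	struct toy_mnt *mnt;

	if (old->unbindable)
		return ERR_PTR(-EINVAL);

	mnt = malloc(sizeof(*mnt));
	if (!mnt)
		return ERR_PTR(-ENOMEM);

	*mnt = *old;
	return mnt;
}

/* Caller pattern, as in do_loopback() after this patch. */
static int toy_bind(const struct toy_mnt *old)
{
	struct toy_mnt *mnt = toy_copy(old);

	if (IS_ERR(mnt))
		return PTR_ERR(mnt);

	free(mnt);
	return 0;
}
```

With this shape a caller that used to hard-code -ENOMEM simply forwards
whatever error the callee chose.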

diff --git a/fs/namespace.c b/fs/namespace.c
index e1ea335..5566524 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -559,53 +559,57 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 					int flag)
 {
 	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	struct vfsmount *mnt;
+	int err;
 
-	if (mnt) {
-		if (flag & (CL_SLAVE | CL_PRIVATE))
-			mnt->mnt_group_id = 0; /* not a peer of original */
-		else
-			mnt->mnt_group_id = old->mnt_group_id;
-
-		if ((flag & CL_MAKE_SHARED) && !mnt->mnt_group_id) {
-			int err = mnt_alloc_group_id(mnt);
-			if (err)
-				goto out_free;
-		}
+	mnt = alloc_vfsmnt(old->mnt_devname);
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
 
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-
-		if (flag & CL_SLAVE) {
-			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
-			mnt->mnt_master = old;
-			CLEAR_MNT_SHARED(mnt);
-		} else if (!(flag & CL_PRIVATE)) {
-			if ((flag & CL_MAKE_SHARED) || IS_MNT_SHARED(old))
-				list_add(&mnt->mnt_share, &old->mnt_share);
-			if (IS_MNT_SLAVE(old))
-				list_add(&mnt->mnt_slave, &old->mnt_slave);
-			mnt->mnt_master = old->mnt_master;
-		}
-		if (flag & CL_MAKE_SHARED)
-			set_mnt_shared(mnt);
-
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		if (flag & CL_EXPIRE) {
-			if (!list_empty(&old->mnt_expire))
-				list_add(&mnt->mnt_expire, &old->mnt_expire);
-		}
+	if (flag & (CL_SLAVE | CL_PRIVATE))
+		mnt->mnt_group_id = 0; /* not a peer of original */
+	else
+		mnt->mnt_group_id = old->mnt_group_id;
+
+	if ((flag & CL_MAKE_SHARED) && !mnt->mnt_group_id) {
+		err = mnt_alloc_group_id(mnt);
+		if (err)
+			goto out_free;
 	}
+
+	mnt->mnt_flags = old->mnt_flags;
+	atomic_inc(&sb->s_active);
+	mnt->mnt_sb = sb;
+	mnt->mnt_root = dget(root);
+	mnt->mnt_mountpoint = mnt->mnt_root;
+	mnt->mnt_parent = mnt;
+
+	if (flag & CL_SLAVE) {
+		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
+		mnt->mnt_master = old;
+		CLEAR_MNT_SHARED(mnt);
+	} else if (!(flag & CL_PRIVATE)) {
+		if ((flag & CL_MAKE_SHARED) || IS_MNT_SHARED(old))
+			list_add(&mnt->mnt_share, &old->mnt_share);
+		if (IS_MNT_SLAVE(old))
+			list_add(&mnt->mnt_slave, &old->mnt_slave);
+		mnt->mnt_master = old->mnt_master;
+	}
+	if (flag & CL_MAKE_SHARED)
+		set_mnt_shared(mnt);
+
+	/* stick the duplicate mount on the same expiry list
+	 * as the original if that was on one */
+	if (flag & CL_EXPIRE) {
+		if (!list_empty(&old->mnt_expire))
+			list_add(&mnt->mnt_expire, &old->mnt_expire);
+	}
+
 	return mnt;
 
  out_free:
 	free_vfsmnt(mnt);
-	return NULL;
+	return ERR_PTR(err);
 }
 
 static inline void __mntput(struct vfsmount *mnt)
@@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
 	struct path path;
 
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
-		return NULL;
+		return ERR_PTR(-EINVAL);
 
 	res = q = clone_mnt(mnt, dentry, flag);
-	if (!q)
-		goto Enomem;
+	if (IS_ERR(q))
+		return q;
+
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
 
 	p = mnt;
@@ -1237,8 +1242,8 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
 			path.mnt = q;
 			path.dentry = p->mnt_mountpoint;
 			q = clone_mnt(p, p->mnt_root, flag);
-			if (!q)
-				goto Enomem;
+			if (IS_ERR(q))
+				goto out;
 			spin_lock(&vfsmount_lock);
 			list_add_tail(&q->mnt_list, &res->mnt_list);
 			attach_mnt(q, &path);
@@ -1246,7 +1251,7 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
 		}
 	}
 	return res;
-Enomem:
+out:
 	if (res) {
 		LIST_HEAD(umount_list);
 		spin_lock(&vfsmount_lock);
@@ -1254,9 +1259,11 @@ Enomem:
 		spin_unlock(&vfsmount_lock);
 		release_mounts(&umount_list);
 	}
-	return NULL;
+	return q;
 }
 
+/* Caller should check returned pointer for errors */
+
 struct vfsmount *collect_mounts(struct path *path)
 {
 	struct vfsmount *tree;
@@ -1529,14 +1536,15 @@ static int do_loopback(struct path *path, char *old_name,
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
 
-	err = -ENOMEM;
 	if (recurse)
 		mnt = copy_tree(old_path.mnt, old_path.dentry, 0);
 	else
 		mnt = clone_mnt(old_path.mnt, old_path.dentry, 0);
 
-	if (!mnt)
+	if (IS_ERR(mnt)) {
+		err = PTR_ERR(mnt);
 		goto out;
+	}
 
 	err = graft_tree(mnt, path);
 	if (err) {
@@ -2071,10 +2079,11 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
 					CL_COPY_ALL | CL_EXPIRE);
-	if (!new_ns->root) {
+	if (IS_ERR(new_ns->root)) {
+		int err = PTR_ERR(new_ns->root);
 		up_write(&namespace_sem);
 		kfree(new_ns);
-		return ERR_PTR(-ENOMEM);
+		return ERR_PTR(err);
 	}
 	spin_lock(&vfsmount_lock);
 	list_add_tail(&new_ns->list, &new_ns->root->mnt_list);
diff --git a/fs/pnode.c b/fs/pnode.c
index 5cc564a..c4358d2 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -250,8 +250,9 @@ int propagate_mnt(struct vfsmount *dest_mnt, struct dentry *dest_dentry,
 
 		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &type);
 
-		if (!(child = copy_tree(source, source->mnt_root, type))) {
-			ret = -ENOMEM;
+		child = copy_tree(source, source->mnt_root, type);
+		if (IS_ERR(child)) {
+			ret = PTR_ERR(child);
 			list_splice(tree_list, tmp_list.prev);
 			goto out;
 		}
-- 
1.6.3.3



* [PATCH 02/34] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
  2010-09-16 22:11 ` [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
  2010-09-16 22:11 ` [PATCH 03/34] VFS: Add CL_NO_SLAVE " Valerie Aurora
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Passing the CL_NO_SHARED flag to clone_mnt() causes the clone to fail
if the source mnt is shared.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |    3 +++
 fs/pnode.h     |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5566524..eeb4c22 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -562,6 +562,9 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	struct vfsmount *mnt;
 	int err;
 
+	if ((flag & CL_NO_SHARED) && (IS_MNT_SHARED(old)))
+		return ERR_PTR(-EINVAL);
+
 	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/pnode.h b/fs/pnode.h
index 1ea4ae1..bcb3c47 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -22,6 +22,7 @@
 #define CL_COPY_ALL 		0x04
 #define CL_MAKE_SHARED 		0x08
 #define CL_PRIVATE 		0x10
+#define CL_NO_SHARED 		0x20
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {
-- 
1.6.3.3



* [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
  2010-09-16 22:11 ` [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes Valerie Aurora
  2010-09-16 22:11 ` [PATCH 02/34] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree() Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
       [not found]   ` <AANLkTim1bbGrrPcFHThx3XOm8GmudQFSmFUs3NAXT5yC@mail.gmail.com>
  2010-09-16 22:11 ` [PATCH 04/34] VFS: Add CL_MAKE_HARD_READONLY " Valerie Aurora
                   ` (31 subsequent siblings)
  34 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Passing the CL_NO_SLAVE flag to clone_mnt() causes the clone
to fail if the source mnt is a slave.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |    3 +++
 fs/pnode.h     |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)
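
Editor's sketch, not part of the patch: a minimal userspace model of the
two "refuse to clone" checks that patches 02 and 03 put at the top of
clone_mnt().  The flag values are the ones defined in fs/pnode.h by these
patches; struct toy_vfsmount and toy_clone_check() are hypothetical, and
the kernel derives the shared/slave state from the real vfsmount via
IS_MNT_SHARED() and IS_MNT_SLAVE() rather than from plain ints.

```c
#include <errno.h>

/* Flag values as defined in fs/pnode.h by patches 02 and 03. */
#define CL_NO_SHARED	0x20
#define CL_NO_SLAVE	0x40

/* Hypothetical reduced mount state for illustration only. */
struct toy_vfsmount {
	int shared;	/* models IS_MNT_SHARED(old) */
	int slave;	/* models IS_MNT_SLAVE(old) */
};

/*
 * Returns 0 if a clone with these flags may proceed, or -EINVAL if
 * the source mount has a propagation type the caller ruled out.
 */
static int toy_clone_check(const struct toy_vfsmount *old, int flag)
{
	if ((flag & CL_NO_SHARED) && old->shared)
		return -EINVAL;

	if ((flag & CL_NO_SLAVE) && old->slave)
		return -EINVAL;

	return 0;
}
```

Because both checks run before any allocation, a rejected clone has
nothing to unwind.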

diff --git a/fs/namespace.c b/fs/namespace.c
index eeb4c22..6956062 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -565,6 +565,9 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	if ((flag & CL_NO_SHARED) && (IS_MNT_SHARED(old)))
 		return ERR_PTR(-EINVAL);
 
+	if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old)))
+		return ERR_PTR(-EINVAL);
+
 	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/pnode.h b/fs/pnode.h
index bcb3c47..8920e47 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -23,6 +23,7 @@
 #define CL_MAKE_SHARED 		0x08
 #define CL_PRIVATE 		0x10
 #define CL_NO_SHARED 		0x20
+#define CL_NO_SLAVE 		0x40
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {
-- 
1.6.3.3



* [PATCH 04/34] VFS: Add CL_MAKE_HARD_READONLY flag to clone_mnt()/copy_tree()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (2 preceding siblings ...)
  2010-09-16 22:11 ` [PATCH 03/34] VFS: Add CL_NO_SLAVE " Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
  2010-09-16 22:11 ` [PATCH 05/34] union-mount: Union mounts documentation Valerie Aurora
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Passing the CL_MAKE_HARD_READONLY flag to clone_mnt() causes the clone
to fail if the source superblock is not read-only.  If it is
read-only, it increments the hard read-only users and sets the
MNT_HARD_READONLY flag in the vfsmount.  When the mount is freed via
free_vfsmnt(), automatically decrement the hard read-only users count.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c        |   18 ++++++++++++++++++
 fs/pnode.h            |    1 +
 include/linux/mount.h |    1 +
 3 files changed, 20 insertions(+), 0 deletions(-)
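
Editor's sketch, not part of the patch: a single-threaded userspace model
of the hard read-only user count described in the changelog.  All toy_*
names are hypothetical.  Locking is omitted (the kernel code takes
s_umount around the counter), and toy_remount_rw() together with its
-EBUSY return is an assumption about how a remount path would consult
the counter; that path is not part of this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reduced superblock: only the state this patch touches. */
struct toy_sb {
	bool rdonly;			/* models MS_RDONLY in s_flags */
	int hard_readonly_users;	/* models s_hard_readonly_users */
};

/* Hypothetical reduced mount. */
struct toy_mount {
	struct toy_sb *sb;
	bool hard_readonly;		/* models MNT_HARD_READONLY */
};

/* Take a reference, as clone_mnt() does for CL_MAKE_HARD_READONLY. */
static int toy_get_hard_readonly(struct toy_mount *mnt, struct toy_sb *sb)
{
	if (!sb->rdonly)
		return -EBUSY;

	sb->hard_readonly_users++;
	mnt->sb = sb;
	mnt->hard_readonly = true;
	return 0;
}

/* Drop the reference, as free_vfsmnt() does when the flag is set. */
static void toy_put_hard_readonly(struct toy_mount *mnt)
{
	if (!mnt->hard_readonly)
		return;

	mnt->sb->hard_readonly_users--;
	mnt->hard_readonly = false;
}

/* Assumed consumer: refuse read-write while any hard user remains. */
static int toy_remount_rw(struct toy_sb *sb)
{
	if (sb->hard_readonly_users > 0)
		return -EBUSY;

	sb->rdonly = false;
	return 0;
}
```

Tying the decrement to the mount's own flag means every path that frees
the mount drops the reference without extra bookkeeping by the caller.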

diff --git a/fs/namespace.c b/fs/namespace.c
index 6956062..cbaa3ea 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -400,6 +400,12 @@ EXPORT_SYMBOL(simple_set_mnt);
 void free_vfsmnt(struct vfsmount *mnt)
 {
 	kfree(mnt->mnt_devname);
+	if (mnt->mnt_flags & MNT_HARD_READONLY) {
+		BUG_ON(mnt->mnt_sb->s_hard_readonly_users <= 0);
+		down_write(&mnt->mnt_sb->s_umount);
+		mnt->mnt_sb->s_hard_readonly_users--;
+		up_write(&mnt->mnt_sb->s_umount);
+	}
 	mnt_free_id(mnt);
 #ifdef CONFIG_SMP
 	free_percpu(mnt->mnt_writers);
@@ -568,6 +574,16 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old)))
 		return ERR_PTR(-EINVAL);
 
+	if (flag & CL_MAKE_HARD_READONLY) {
+		down_write(&sb->s_umount);
+		if (!(sb->s_flags & MS_RDONLY)) {
+			up_write(&sb->s_umount);
+			return ERR_PTR(-EBUSY);
+		}
+		sb->s_hard_readonly_users++;
+		up_write(&sb->s_umount);
+	}
+
 	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
 		return ERR_PTR(-ENOMEM);
@@ -603,6 +619,8 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	}
 	if (flag & CL_MAKE_SHARED)
 		set_mnt_shared(mnt);
+	if (flag & CL_MAKE_HARD_READONLY)
+		mnt->mnt_flags |= MNT_HARD_READONLY;
 
 	/* stick the duplicate mount on the same expiry list
 	 * as the original if that was on one */
diff --git a/fs/pnode.h b/fs/pnode.h
index 8920e47..dc7b468 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -24,6 +24,7 @@
 #define CL_PRIVATE 		0x10
 #define CL_NO_SHARED 		0x20
 #define CL_NO_SLAVE 		0x40
+#define CL_MAKE_HARD_READONLY	0x80
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 4bd0547..b300cf8 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -46,6 +46,7 @@ struct mnt_namespace;
 
 
 #define MNT_INTERNAL	0x4000
+#define MNT_HARD_READONLY	0x8000	/* has a hard read-only ref on the sb */
 
 struct vfsmount {
 	struct list_head mnt_hash;
-- 
1.6.3.3



* [PATCH 05/34] union-mount: Union mounts documentation
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (3 preceding siblings ...)
  2010-09-16 22:11 ` [PATCH 04/34] VFS: Add CL_MAKE_HARD_READONLY " Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
  2010-09-16 22:11 ` [PATCH 06/34] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Document design and implementation of union mounts (a.k.a. writable
overlays).

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 Documentation/filesystems/union-mounts.txt |  744 ++++++++++++++++++++++++++++
 1 files changed, 744 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt
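
Editor's sketch, not part of the patch: a toy model of the whiteout,
fallthru and opaque rules defined in the Terminology section of the
document added below.  All toy_* names are hypothetical, directories are
flat arrays of names, and the lower layers are assumed to hold only
ordinary entries.  This is one reading of the rules as stated in the
text, not the logic of __lookup_union().

```c
#include <stddef.h>
#include <string.h>

enum toy_type { TOY_FILE, TOY_WHITEOUT, TOY_FALLTHRU };

struct toy_entry {
	const char *name;
	enum toy_type type;
};

struct toy_dir {
	const struct toy_entry *entries;
	size_t count;
	int opaque;	/* only meaningful on the topmost layer */
};

static const struct toy_entry *toy_find(const struct toy_dir *dir,
					const char *name)
{
	size_t i;

	for (i = 0; i < dir->count; i++)
		if (strcmp(dir->entries[i].name, name) == 0)
			return &dir->entries[i];
	return NULL;
}

/*
 * Returns the index of the layer that supplies "name" (0 is the
 * topmost layer), or -1 if the name is not visible in the union.
 */
static int toy_union_lookup(const struct toy_dir *layers, size_t nlayers,
			    const char *name)
{
	const struct toy_entry *e;
	size_t i;

	if (nlayers == 0)
		return -1;

	e = toy_find(&layers[0], name);
	if (e && e->type == TOY_FILE)
		return 0;		/* top layer covers lower layers */
	if (e && e->type == TOY_WHITEOUT)
		return -1;		/* whiteout blocks the lookup */
	if (!e && layers[0].opaque)
		return -1;		/* opaque dir hides lower entries */

	/* Fallthru, or a miss in a non-opaque dir: travel down. */
	for (i = 1; i < nlayers; i++) {
		e = toy_find(&layers[i], name);
		if (e && e->type == TOY_FILE)
			return (int)i;
	}
	return -1;
}
```

Note that a fallthru is checked before the opaque flag, which matches the
statement that fallthrus override opaque flags.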

diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt
new file mode 100644
index 0000000..830dae6
--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,744 @@
+Union mounts (a.k.a. writable overlays)
+=======================================
+
+This document describes the architecture and current status of union
+mounts, also known as writable overlays.
+
+In this document:
+ - Overview of union mounts
+ - Terminology
+ - VFS implementation
+ - Locking strategy
+ - VFS/file system interface
+ - Userland interface
+ - NFS interaction
+ - Status
+ - Contributing to union mounts
+
+Overview
+========
+
+A union mount layers one read-write file system over one or more
+read-only file systems, with all writes going to the writable file
+system.  The namespace of both file systems appears as a combined
+whole to userland, with files and directories on the writable file
+system covering up any files or directories with matching pathnames on
+the read-only file system.  The read-write file system is the
+"topmost" or "upper" file system and the read-only file systems are
+the "lower" file systems.  A few use cases:
+
+- Root file system on CD with writes saved to hard drive (LiveCD)
+- Multiple virtual machines with the same starting root file system
+- Cluster with NFS mounted root on clients
+
+Most if not all of these problems could be solved with a COW block
+device or a clustered file system (including NFS mounts).  However, for
+some use cases, sharing is more efficient and better performing if
+done at the file system namespace level.  COW block devices only
+increase their divergence as time goes on, and a fully coherent
+writable file system is unnecessary synchronization overhead if no
+other client needs to see the writes.
+
+What union mounts are not
+-------------------------
+
+Union mounts are not a general-purpose unioning file system.  They do
+not provide a generic "union of namespaces" operation for an arbitrary
+number of file systems.  Many interesting features can be implemented
+with a generic unioning facility: dynamic insertion and removal of
+branches, write policies based on space available, online upgrade,
+etc.  Some unioning file systems that do this are UnionFS and AUFS.
+
+Terminology
+===========
+
+The main physical metaphor for union mounts is that a writable file
+system is mounted "on top" of a read-only file system.  Lookups start
+at the "topmost" read-write file system and travel "down" to the
+"bottom" read-only file system only if no blocking entry exists on the
+top layer.
+
+Topmost layer: The read-write file system.  Lookups begin here.
+
+Bottom layer: The read-only file system.  Lookups end here.
+
+Path: Combination of the vfsmount and dentry structure.
+
+Follow down: Given a path from the top layer, find the corresponding
+path on the bottom layer.
+
+Follow up: Given a path from the bottom layer, find the corresponding
+path on the top layer.
+
+Whiteout: A directory entry in the top layer that prevents lookups
+from travelling down to the bottom layer.  Created on unlink()/rmdir()
+if a corresponding directory entry exists in the bottom layer.
+
+Opaque flag: A flag on a directory in the top layer that prevents
+lookups of entries in this directory from travelling down to the
+bottom layer (unless there is an explicit fallthru entry allowing that
+for a particular entry).  Set on creation of a directory that replaces
+a whiteout, and after a directory copyup.
+
+Fallthru: A directory entry which allows lookups to "fall through" to
+the bottom layer for that exact directory entry.  This serves as a
+placeholder for directory entries from the bottom layer during
+readdir().  Fallthrus override opaque flags.
+
+File copyup: Create a file on the top layer that has the same metadata
+and contents as the file with the same pathname on the bottom layer.
+
+Directory copyup: Copy up the visible directory entries from the
+bottom layer as fallthrus in the matching top layer directory.  Mark
+the directory opaque to avoid unnecessary negative lookups on the
+bottom layer.
+
+Examples
+========
+
+What happens when I...
+
+- creat() /newfile -> creates on topmost layer
+- unlink() /oldfile -> creates a whiteout on topmost layer
+- Edit /existingfile -> copies up to top layer at open(O_WR) time
+- truncate /existingfile -> copies up to topmost layer + N bytes if specified
+- touch()/chmod()/chown()/etc. -> copies up to topmost layer
+- mkdir() /newdir -> creates on topmost layer
+- rmdir() /olddir -> creates a whiteout on topmost layer
+- mkdir() /olddir after above -> creates on topmost layer w/ opaque flag
+- readdir() /shareddir -> copies up entries from bottom layer as fallthrus
+- link() /oldfile /newlink -> copies up /oldfile, creates /newlink on topmost layer
+- symlink() /oldfile /symlink -> nothing special
+- rename() /oldfile /newfile -> copies up /oldfile to /newfile on top layer
+- rename() /olddir /newdir -> EXDEV
+- rename() /topmost_only_dir /topmost_only_dir2 -> success
+
+Getting to a root file system with union mounts:
+
+- Mount the base read-only file system as the root file system
+- Mount the read-only file system again on /newroot
+- Mount the read-write layer on /newroot:
+   # mount -o union /dev/sda /newroot
+- pivot_root to /newroot
+- Start init
+
+See scripts/pivot.sh in the UML devkit linked to from:
+
+http://valerieaurora.org/union/
+
+VFS implementation
+==================
+
+Union mounts are implemented as an integral part of the VFS, rather
+than as a VFS client file system (i.e., a stacked file system like
+unionfs or ecryptfs).  Implementing unioning inside the VFS eliminates
+the need for duplicate copies of VFS data structures, unnecessary
+indirection, and code duplication, but requires very maintainable, low
+overhead code.  Union mounts require no change to file systems serving
+as the read-only layer, and require some minor support from file
+systems serving as the read-write layer.  File systems that want to be
+the writable layer must implement the new ->whiteout() and
+->fallthru() inode operations, which create special dummy directory
+entries.
+
+The union mounts code must accomplish the following major tasks:
+
+1) Pass lookups through to the lower level file system.
+2) Copy files and directories up to the topmost layer when written.
+3) Create whiteouts and fallthrus as necessary.
+
+VFS objects and union mounts
+----------------------------
+
+First, some VFS basics:
+
+The VFS allows multiple mounts of the same file system.  For example,
+/dev/sda can be mounted at /usr and also at /mnt.  The same file
+system can be mounted read-only at one point and read-write at
+another.  Each of these mounts has its own vfsmount data structure in
+the kernel.  However, each underlying file system has exactly one
+in-kernel superblock structure no matter how many times it is mounted.
+All the separate vfsmounts for the same file system reference the same
+superblock data structure.
+
+Directory entries are cached by the VFS in dentry structures.  The VFS
+keeps one dentry structure for each file or directory in a file
+system, no matter how many times it is mounted.  Each dentry
+represents only one element of a path name.  When the VFS looks up a
+pathname (e.g., "/sbin/init"), the result is a combination of vfsmount
+and dentry.  This <mnt,dentry> pair is usually stored in a kernel
+structure named "path", which is simply two pointers, one to the
+vfsmount and one to the dentry.  A "struct path" is this structure; a
+pathname is a string like "/etc/fstab".
+
+In union mounts, a file system can only be the topmost layer for one
+union mount.  A file system can be part of multiple union mounts if it
+is a read-only layer.  So dentries in the read-only layers can be part
+of multiple unions, while a dentry in the read-write layer can only be
+part of one union.
+
+union_stack structure
+---------------------
+
+The first job of union mounts is to map directories from the topmost
+layer to directories with the same pathname in the lower layer.  That
+is, given the <mnt,dentry> pair for a directory pathname in the
+topmost layer, we need to find all the <mnt,dentry> pairs for the
+directory with the same pathname in the lower layer.  We do this with
+the union_stack structure, which is an array containing struct paths
+(mnt, dentry pointer pairs) for each directory unioned with the
+topmost directory.  The array is pointed to from the new d_union_stack
+member of struct dentry.
+
+/*
+ * The union_stack structure.  It is an array of struct paths of
+ * directories below the topmost directory in a unioned directory.  The
+ * topmost dentry has a pointer to this structure.  The topmost dentry
+ * can only be part of one union, so we can reference it from the
+ * dentry, but lower dentries can be part of multiple union stacks.
+ *
+ * The number of dirs actually allocated is kept in the superblock,
+ * s_union_count.
+ */
+struct union_stack {
+	struct path u_dirs[0];
+};
+
+This structure is flexible enough to support an arbitrary number of
+layers of unioned file systems.  Since there can be more than two
+layers, this section will talk about mapping "upper" directories to
+"lower" directories, instead of "topmost" directories to "bottom"
+directories.
+
+Traversing the union stack
+--------------------------
+
+The struct paths in the union_stack array for a particular pathname are
+called collectively the union stack for that directory.  To traverse
+the union stack, iterate through the number of layers in the union
+(stored in sb->s_union_count) with union_find_dir().  Example: freeing
+the union stack:
+
+void d_free_unions(struct dentry *topmost)
+{
+	struct path *path;
+	unsigned int i, layers = topmost->d_sb->s_union_count;
+
+	if (!IS_DIR_UNIONED(topmost))
+		return;
+
+	for (i = 0; i < layers; i++) {
+		path = union_find_dir(topmost, i);
+		if (path->mnt)
+			path_put(path);
+	}
+	kfree(topmost->d_union_stack);
+	topmost->d_union_stack = NULL;
+}
+
+Code paths
+----------
+
+Union mounts modify the following key code paths in the VFS:
+
+- mount()/umount()
+- Pathname lookup
+- Any path that modifies an existing file
+
+Mount
+-----
+
+Union mounts are created in two steps:
+
+1. Mount the read-only layer file systems read-only in the usual
+manner, all on the same mountpoint.  Submounts are permitted as long
+as they are also read-only and not shared (part of a mount propagation
+group).
+
+2. Mount the top layer with the "-o union" option at the same
+mountpoint.  All read-only file systems mounted at this mountpoint
+will be included in the union mount.
+
+The bottom layers must be read-only and the top layer must be
+read-write and support whiteouts and fallthrus.  A file system that
+supports whiteouts and fallthrus indicates this by setting the
+MS_WHITEOUT and MS_FALLTHRU flags in the superblock.  Currently, the
+top layer is forced to "noatime" to avoid a copyup on every access of
+a file.  Supporting atime with the current infrastructure would
+require a copyup on every open().  The "relatime" option would be
+equally efficient if the atime is the same or more recent than the
+mtime/ctime for every object on the read-only file system, and if the
+24-hour timeout on relatime was disabled.  However, this is probably
+not worthwhile for the majority of union mount use cases.
+
+File systems can only be union mounted at their root directories, for
+simplicity and performance.
+
+pivot_root() to a union mounted file system is supported.  The
+recommended way to get to a union mounted root file system is to boot
+with the read-only mount as the root file system, construct the union
+mount on an entirely new mount, and pivot_root() to the new union
+mount root.  Attempting to union mount the root file system later in
+boot will result in covering other file systems, e.g., /proc, which
+isn't permitted in the current code and is a bad idea anyway.
+
+Hard read-only file systems
+---------------------------
+
+Union mounts require the lower layer of the file system to be
+read-only.  However, in Linux, any individual file system may be
+mounted at multiple places in the namespace, and a file system can be
+changed from read-only to read-write while still mounted.  Thus, simply
+checking that the bottom layer is read-only at the time the writable
+overlay is mounted over it is pointless, since at any time the bottom
+layer may become read-write.
+
+We have to guarantee that a file system will be read-only for as long
+as it is the bottom layer of a union mount.  To do this, we track the
+number of hard read-only users of a file system in its VFS superblock
+structure.  When we union mount a writable overlay over a file system,
+we increment its read-only user count.  The file system can only be
+mounted read-write if its read-only users count is zero.
+
+Todo:
+
+- Support hard read-only NFS mounts.  See discussion here:
+
+  http://markmail.org/message/3mkgnvo4pswxd7lp
+
+Pathname lookup
+---------------
+
+Pathname lookup in a unioned directory traverses down the union stack
+for the parent directory, looking up each pathname element in each
+layer of the file system (according to the rules of whiteouts,
+fallthrus, and opaque flags).  At mount time, the union stack for the
+root directory of the file system is created, and the union stack
+creation for every other unioned directory in the file system is
+boot-strapped using the already-existing union stack of the
+directory's parent.  In order to simplify the code greatly, every
+visible directory on the lower file system is required to have a
+matching directory on the upper file system.  This matching directory
+is created during pathname lookup if it does not already exist.
+Therefore, each unioned directory is the child of another unioned
+directory (or is the root directory of the file system).
+
+The actual union lookup function is called in the following code
+paths:
+
+do_lookup()->do_union_lookup()->lookup_union()->__lookup_union()
+lookup_hash()->lookup_union()->__lookup_union()
+
+__lookup_union() is where the rules of whiteouts, fallthrus, and
+opaque flags are actually implemented.  __lookup_union() returns
+either the first visible dentry, or a negative dentry from the topmost
+file system if no matching dentry exists.  If it finds a directory, it
+looks up any potential matching lower layer directories.  If it finds
+a lower layer directory, it first creates the topmost dir if necessary
+via union_create_topmost_dir(), and then calls union_add_dir() to
+append the lower directory to the end of the union stack.
+
+Note that not all directories in a union mount are unioned, only those
+with matching directories on the lower layer.  The macro
+IS_DIR_UNIONED() is a cheap, constant time way to check if a directory
+is unioned, while IS_MNT_UNION() checks if the entire mount is unioned
+(and therefore whether the directory in question is potentially
+unioned).
+
+Currently, lookup of a negative dentry or a directory with no matching
+directories below it requires a lookup in every directory in the union
+stack every time it is looked up.  We could avoid subsequent lookups
+by adding the equivalent of a negative dcache entry.
+
+File copyup
+-----------
+
+Any system call that alters the data or metadata of a file on the
+bottom layer, or creates or changes a hard link to it, will trigger a
+copyup of the target file from the lower layer to the topmost layer:
+
+ - open(O_WRONLY | O_RDWR | O_APPEND)
+ - truncate()/open(O_TRUNC)
+ - link()
+ - rename()
+ - chmod()
+ - chown()/lchown()
+ - utimes()
+ - setxattr()/lsetxattr()
+
+Copyup of a file due to open(O_WRONLY) has already occurred when:
+
+ - write()
+ - ftruncate()
+ - writable mmap()
+
+The following system calls will fail on an fd opened O_RDONLY:
+
+ - fchmod()
+ - fchown()
+ - fsetxattr()
+ - futimensat()
+
+Contrary to common sense, the above system calls are defined to
+succeed on O_RDONLY fds.  The idea seems to be that the
+O_RDONLY/O_RDWR/O_WRONLY flags only apply to the actual file data, not
+to any form of metadata (times, owner, mode, or even extended
+attributes).  Applications making these system calls on O_RDONLY fds
+are correct according to the standard and work on non-union mounts.
+They will need to be rewritten (O_RDONLY -> O_RDWR) to work on union
+mounts.  We suspect this usage is uncommon.
+
+This deviation from standard is due to technical limitations of the
+union mount implementation.  Specifically, we would need to replace an
+open file descriptor from the lower layer with an open file descriptor
+for a file with matching pathname and contents on the upper layer,
+which is difficult to do.  We avoid this in other system calls by
+doing the copyup before the file is opened.  Unionfs doesn't encounter
+this problem because it creates a dummy file struct which redirects or
+fans out operations to the struct files for the underlying file
+systems.
+
+From an application's point of view, the result of an in-kernel file
+copyup is the logical equivalent of another application updating the
+file via the rename() pattern: creat() a new file, copy the data over,
+make changes to the copy, and rename() over the old version.  Any
+existing open file descriptors for that file (including those in the
+same application) refer to a now invisible object that used to have
+the same pathname.  Only opens that occur after the copyup will see
+updates to the file.
+
+Permission checks
+-----------------
+
+We want to be sure we have the correct permissions to actually succeed
+in a system call before copying a file up to avoid unnecessary IO.  At
+present, the permission check for a single system call may be spread
+out over many hundreds of lines of code (e.g., open()).  In order to
+check permissions, we occasionally need to determine if there is a
+writable overlay on top of this inode.  This requires a full path, but
+often we only have the inode at this point.  In particular,
+inode_permission() returns EROFS if the inode is on a read-only file
+system, which is the wrong answer if there is a writable overlay
+mounted on top of it.
+
+The current solution is to split out the file-system-wide permission
+checks from the per-inode permission checks.  inode_permission()
+becomes:
+
+sb_permission()
+__inode_permission()
+
+inode_permission() calls sb_permission() and __inode_permission() on
+the same path.  We create path_permission() which calls
+sb_permission() on the parent directory from the top layer, and
+__inode_permission() on the target on the lower layer.  This gets us
+the correct write permissions considering that the file will be copied
+up.
+
+Todo:
+
+  - Currently, we don't deal with differing directory permissions at
+    different levels of the stack.  This is a bug.
+
+Impact on non-union kernels and mounts
+--------------------------------------
+
+Union-related data structures, extra fields, and function calls are
+#ifdef'd out at the function/macro level with CONFIG_UNION_MOUNT in
+nearly all cases (see fs/union.h).  When CONFIG_UNION_MOUNT is
+enabled, struct dentry has one more pointer, reducing the size of
+dentry names stored in the dentry itself by 4 to 8 bytes.
+
+Todo:
+
+ - Do performance tests
+
+Locking strategy
+================
+
+The current union mount locking strategy is based on the following
+rules:
+
+* The lower layer file system is always read-only
+* The topmost file system is always read-write
+  => A file system can never be a topmost and lower layer at the same time
+
+Additionally, the topmost layer may only be mounted exactly once.
+Don't think of the topmost layer as a separate independent file
+system; when it is part of a union mount, it is only a file system in
+conjunction with the read-only bottom layer.  The read-only bottom
+layer is an independent file system in and of itself and can be
+mounted elsewhere, including as the bottom layer for another union
+mount.
+
+Thus, we may define a stable locking order in terms of top layer and
+bottom layer locks, since a top layer is never a bottom layer and a
+bottom layer is never a top layer.  Another simplifying assumption is
+that all directories in a pathname exist on the top layer, as they are
+created step-by-step during lookup.  This prevents us from ever having
+to walk backwards up the path creating directory entries, which can
+get complicated.  By implication, parent directory paths during any
+operation (rename(), unlink(), etc.) are from the top layer.  Dentries
+for directories from the bottom layer are only ever seen or used by
+the lookup code.
+
+The two major problems we avoid with the above rules are:
+
+Lock ordering: Imagine two union stacks with the same two file
+systems: A mounted over B, and B mounted over A.  Sometimes locks on
+objects in both A and B will have to be held simultaneously.  What
+order should they be acquired in?  Simply acquiring them from top to
+bottom will create a lock-ordering problem - one thread acquires lock
+on object from A and then tries for a lock on object from B, while
+another thread grabs the lock on object from B and then waits for the
+lock on object from A.  Some other lock ordering must be defined.
+
+Movement/change/disappearance of objects on multiple layers: A variety
+of nasty corner cases arise when more than one layer is changing at
+the same time.  Changes in the directory topology and their effect on
+inheritance are of special concern.  Al Viro's canonical email on the
+subject:
+
+http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html
+
+We don't try to solve any of these cases, just avoid them in the first
+place.
+
+Todo: Prevent top layer from being mounted more than once.
+
+Cross-layer interactions
+------------------------
+
+The VFS code simultaneously holds references to and/or modifies
+objects from both the top and bottom layers in the following cases:
+
+Path lookup:
+
+Grabs i_mutex on bottom layer while holding i_mutex on top layer
+directory inode.
+
+File copyup:
+
+Holds i_mutex on the parent directory from the top layer while copying
+up file from lower layer.
+
+link():
+
+File copyup of target while holding i_mutex on parent directory on top
+layer.  Followed by a normal link() operation.
+
+rename():
+
+Holds s_vfs_rename_mutex on the top layer, i_mutex of the source's
+parent dir (top layer), and i_mutex of the target's parent dir (also
+top layer) while looking up and copying the bottom layer target and
+also creating the whiteout.
+
+Notes on rename():
+
+First, renaming of directories returns EXDEV.  It's not at all
+reasonable to recursively copy directory trees and userspace has to
+handle this case anyway.  An exception is rename() of directories that
+exist only on the topmost layer; this succeeds.
+
+Rename involves three steps on a union mount: (1) copyup of the file
+from the bottom layer, (2) rename of the new top-layer copy to the
+target in the usual manner, (3) creation of a whiteout covering the
+source of the rename.
+
+Directory copyup:
+
+Directory entries are copied up on the first readdir().  We hold the
+top layer directory i_mutex throughout and sequentially acquire and
+drop the i_mutex for each lower layer directory.
+
+VFS-fs interface
+================
+
+Read-only layer: No support necessary other than enforcement of really
+really read-only semantics (done by VFS for local file systems).
+
+Writable layer: Must implement two new inode operations:
+
+int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+int (*fallthru) (struct inode *, struct dentry *);
+
+And set the MS_WHITEOUT and MS_FALLTHRU flags to indicate support of
+these operations.
+
+Todo:
+
+- Implement whiteouts and fallthrus in ext3
+- Implement whiteouts and fallthrus in btrfs
+
+Supported file systems
+----------------------
+
+Any file system can be a read-only layer.  File systems must
+explicitly support whiteouts and fallthrus in order to be a read-write
+layer.  This patch set implements whiteouts for ext2, tmpfs, and
+jffs2.  We have tested ext2, tmpfs, and iso9660 as the read-only
+layer.
+
+Todo:
+ - Test corner cases of case-insensitive/oversensitive file systems
+
+NFS interaction
+===============
+
+NFS is currently not supported as either type of layer.  NFS as
+read-only layer requires support from the server to honor the
+read-only guarantee needed for the bottom layer.  To do this, the
+server needs to revoke access to clients requesting read-only file
+systems if the exported file system is remounted read-write or
+unmounted (during which arbitrary changes can occur).  Some recent
+discussion:
+
+http://markmail.org/message/3mkgnvo4pswxd7lp
+
+NFS as the read-write layer would require implementation of the
+->whiteout() and ->fallthru() methods.  DT_WHT directory entries are
+theoretically already supported.
+
+Also, technically the requirement for a readdir() cookie that is
+stable across reboots comes only from file systems exported via NFSv2:
+
+http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html
+
+Todo:
+
+- Guarantee really really read-only on NFS exports
+- Implement whiteout()/fallthru() for NFS
+
+Userland support
+================
+
+The mount command must support the "-o union" mount option and pass
+the corresponding MS_UNION flag to the kernel.  A util-linux git
+tree with union mount support is here:
+
+git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git
+
+File system utilities must support whiteouts and fallthrus.  An
+e2fsprogs git tree with union mount support is here:
+
+git://git.kernel.org/pub/scm/fs/ext2/val/e2fsprogs.git
+
+Currently, whiteout directory entries are not returned to userland.
+While the directory type for whiteouts, DT_WHT, has been defined for
+many years, very little userland code handles them.  Userland will
+never see fallthru directory entries.
+
+Known non-POSIX behaviors
+-------------------------
+
+- Any writing system call (unlink()/chmod()/etc.) can return ENOSPC or EIO
+
+  Most programs are not tested and don't work well under conditions of
+  ENOSPC.  The solution is to add more disk space.
+
+- Link count may be wrong for files on bottom layer with > 1 link count
+
+  A file may have more than one hard link to it.  When a file with
+  multiple hard links is copied up, any other hard links pointing to
+  the same inode will remain unchanged.  If the file is looked up via
+  one of the hard links on the read-only layer, it will have the
+  original link count (which is off by one at this point).  An
+  example:
+
+  /bin/link1 -> inode 100
+  /etc/link2 -> inode 100
+
+  inode 100 will have link count 2.
+
+  # echo "blah" > /bin/link1
+
+  Now /bin/link1 will be copied up to the topmost layer.  But
+  /etc/link2 will still point to the original inode 100, and its link
+  count will still be 2.
+
+- Link count on directories will be wrong before readdir() (fixable)
+- File copyup is the logical equivalent of an update via copy +
+  rename().  Any existing open file descriptors will continue to refer
+  to the read-only copy on the bottom layer and will not see any
+  changes that occur after the copy-up.
+- rename() of directory may fail with EXDEV
+- fchmod()/fchown()/futimensat()/fsetxattr() fail on O_RDONLY fds
+
+Status
+======
+
+The current union mounts implementation is feature-complete on local
+file systems and passes an extensive union mounts test suite,
+available in the union mounts Usermode Linux-based development kit:
+
+http://valerieaurora.org/union/union_mount_devkit.tar.gz
+
+The whiteout code has had some non-trivial level of review and
+testing, but much of the code has had no external review or testing
+outside the authors' machines.
+
+The latest version is available at:
+
+git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git
+
+Check the union mounts web page for the name of the latest branch:
+
+http://valerieaurora.org/union/
+
+Todo:
+
+- Run more tests (e.g., XFS test suite)
+- Get review from VFS maintainers
+
+Non-features
+------------
+
+Features we do not currently plan to support in union mounts:
+
+Online upgrade: E.g., installing software on a file system NFS
+exported to clients while the clients are still up and running.
+Allowing the read-only bottom layer of a union mount to change
+invalidates our locking strategy.
+
+Recursive copying of directories: E.g., implementing rename() across
+layers for directories.  Doing an in-kernel copy of a single file is
+bad enough.  Recursively copying a directory is a big no-no.
+
+Read-only top layer: The readdir() strategy fundamentally requires the
+ability to create persistent directory entries on the top layer file
+system (which may be tmpfs).  However, you can union two read-only
+file systems by union mounting a third file system (such as tmpfs)
+over the two read-only file systems.  Numerous alternatives to this
+readdir() strategy (including in-kernel or in-application caching)
+exist and are compatible with union mounts with its writing-readdir()
+implementation disabled.  Creating a readdir() cookie that is stable
+across multiple readdir()s requires one of:
+
+- Write to stable storage (e.g., fallthru dentries)
+- Non-evictable kernel memory cache (doesn't handle NFS server reboot)
+- Per-application caching by glibc readdir()
+
+Often these features are supported by other unioning file systems or
+by other versions of union mounts.
+
+Contributing to union mounts
+============================
+
+The union mounts web page is here:
+
+http://valerieaurora.org/union/
+
+It links to:
+
+ - All git repositories
+ - Documentation
+ - An entire self-contained UML-based dev kit with README, etc.
+
+The best mailing list for discussing union mounts is:
+
+linux-fsdevel@vger.kernel.org
+
+http://vger.kernel.org/vger-lists.html#linux-fsdevel
+
+Thank you for reading!
-- 
1.6.3.3
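
Reviewer note on the "Hard read-only file systems" section of the
documentation above: the counting rule it describes can be sketched in
user-space C roughly as follows.  This is only an illustration under
stated assumptions, not the code in this series.  struct demo_sb and
the demo_* helpers are hypothetical names, the error codes are
placeholders, and only the field name s_hard_readonly_users is taken
from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct super_block. */
struct demo_sb {
	bool read_only;            /* is the file system currently read-only? */
	int s_hard_readonly_users; /* unions using it as a lower layer */
};

/* A writable overlay is union mounted over this file system. */
static int demo_union_attach(struct demo_sb *sb)
{
	if (!sb->read_only)
		return -EINVAL;    /* placeholder error: lower layer must be r/o */
	sb->s_hard_readonly_users++;
	return 0;
}

/* The union mount using this file system as a lower layer goes away. */
static void demo_union_detach(struct demo_sb *sb)
{
	assert(sb->s_hard_readonly_users > 0);
	sb->s_hard_readonly_users--;
}

/* Read-write remount is allowed only while no union depends on us. */
static int demo_remount_rw(struct demo_sb *sb)
{
	if (sb->s_hard_readonly_users > 0)
		return -EBUSY;     /* placeholder error code */
	sb->read_only = false;
	return 0;
}
```

The point of the counter, as opposed to a one-time check at union mount
time, is that the read-write remount path consults it for as long as
any union still uses the file system as a lower layer.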



* [PATCH 06/34] union-mount: Introduce MNT_UNION and MS_UNION flags
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (4 preceding siblings ...)
  2010-09-16 22:11 ` [PATCH 05/34] union-mount: Union mounts documentation Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
  2010-09-16 22:11 ` [PATCH 07/34] union-mount: Add CONFIG_UNION_MOUNT option Valerie Aurora
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Add a per-mountpoint flag for union mount support.  The mount command
needs additional util-linux patches to pass the flag - see:

git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c        |    5 ++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 3 files changed, 6 insertions(+), 1 deletions(-)
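
Reviewer note: the do_mount() hunk below follows the usual pattern of
translating a mount(2) MS_* flag into a per-vfsmount MNT_* flag and
then masking it out of the flags passed on to the file system.  A
minimal user-space sketch of that translation, under stated
assumptions: the DEMO_* names and demo_split_flags() are hypothetical,
the union values are copied from this patch, and the read-only values
are what I believe the 2.6.35 headers define.

```c
#include <assert.h>

/* Values copied from the hunks in this patch. */
#define DEMO_MS_UNION     256UL
#define DEMO_MNT_UNION    0x10000
/* Assumed to match the existing MS_RDONLY / MNT_READONLY definitions. */
#define DEMO_MS_RDONLY    1UL
#define DEMO_MNT_READONLY 0x40

struct demo_flags {
	unsigned long sb_flags; /* what is passed on to the file system */
	int mnt_flags;          /* what is stored in the vfsmount */
};

static struct demo_flags demo_split_flags(unsigned long flags)
{
	struct demo_flags out = { 0, 0 };

	if (flags & DEMO_MS_RDONLY)
		out.mnt_flags |= DEMO_MNT_READONLY;
	if (flags & DEMO_MS_UNION)
		out.mnt_flags |= DEMO_MNT_UNION;

	/* MS_UNION is per-mount only, so it is stripped; MS_RDONLY is kept. */
	out.sb_flags = flags & ~DEMO_MS_UNION;
	assert((out.sb_flags & DEMO_MS_UNION) == 0);
	return out;
}
```

With both flags set, the sketch yields mnt_flags of
DEMO_MNT_READONLY | DEMO_MNT_UNION and sb_flags of DEMO_MS_RDONLY only.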

diff --git a/fs/namespace.c b/fs/namespace.c
index cbaa3ea..f8d7d11 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -824,6 +824,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
 		{ MNT_STRICTATIME, ",strictatime" },
+		{ MNT_UNION, ",union" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
@@ -2047,10 +2048,12 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 		mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
 	if (flags & MS_RDONLY)
 		mnt_flags |= MNT_READONLY;
+	if (flags & MS_UNION)
+		mnt_flags |= MNT_UNION;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
 		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-		   MS_STRICTATIME);
+		   MS_STRICTATIME | MS_UNION);
 
 	if (flags & MS_REMOUNT)
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7dcb95b..2e3d745 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -192,6 +192,7 @@ struct inodes_stat_t {
 #define MS_REMOUNT	32	/* Alter flags of a mounted FS */
 #define MS_MANDLOCK	64	/* Allow mandatory locks on an FS */
 #define MS_DIRSYNC	128	/* Directory modifications are synchronous */
+#define MS_UNION	256	/* Merge namespace with FS mounted below */
 #define MS_NOATIME	1024	/* Do not update access times. */
 #define MS_NODIRATIME	2048	/* Do not update directory access times */
 #define MS_BIND		4096
diff --git a/include/linux/mount.h b/include/linux/mount.h
index b300cf8..4014ff1 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -47,6 +47,7 @@ struct mnt_namespace;
 
 #define MNT_INTERNAL	0x4000
 #define MNT_HARD_READONLY	0x8000	/* has a hard read-only ref on the sb */
+#define MNT_UNION	0x10000		/* top layer of a union mount */
 
 struct vfsmount {
 	struct list_head mnt_hash;
-- 
1.6.3.3



* [PATCH 07/34] union-mount: Add CONFIG_UNION_MOUNT option
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (5 preceding siblings ...)
  2010-09-16 22:11 ` [PATCH 06/34] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
  2010-09-16 22:11 ` [PATCH 08/34] union-mount: Create union_stack structure Valerie Aurora
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Add CONFIG_UNION_MOUNT option.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/Kconfig |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 5f85b59..47409c9 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -59,6 +59,19 @@ source "fs/notify/Kconfig"
 
 source "fs/quota/Kconfig"
 
+config UNION_MOUNT
+       bool "Union mounts (writable overlays) (EXPERIMENTAL)"
+       depends on EXPERIMENTAL
+       help
+         Union mounts allow you to mount a transparent writable
+	 layer over a read-only file system, for example, an ext3
+	 partition on a hard drive over a CD-ROM root file system
+	 image.
+
+	 See <file:Documentation/filesystems/union-mounts.txt> for details.
+
+	 If unsure, say N.
+
 source "fs/autofs/Kconfig"
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
-- 
1.6.3.3



* [PATCH 08/34] union-mount: Create union_stack structure
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (6 preceding siblings ...)
  2010-09-16 22:11 ` [PATCH 07/34] union-mount: Add CONFIG_UNION_MOUNT option Valerie Aurora
@ 2010-09-16 22:11 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 09/34] union-mount: Add two superblock fields for union mounts Valerie Aurora
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:11 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

struct union_stack records the stack of directories unioned at this
directory.  A union_stack is an array of struct paths, dynamically
allocated when the dentry for the topmost directory is created.  The
topmost dentry contains a pointer to the union_stack.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/dcache.c            |    3 ++
 fs/union.h             |   54 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |   22 +++++++++++++++++-
 3 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 fs/union.h
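
Reviewer note: the DNAME_INLINE_LEN_MIN values in the dcache.h hunk
below can be checked with simple arithmetic.  The sketch assumes that
the sizes in the existing comments (192 bytes on 64-bit, 128 bytes on
32-bit) are the total size of struct dentry, and that d_union_stack
adds exactly one pointer (8 or 4 bytes).  The demo_* helpers are
hypothetical and exist only for this calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed (non-name) bytes implied by a total size and an inline length. */
static size_t demo_fixed_bytes(size_t dentry_size, size_t inline_len)
{
	assert(inline_len <= dentry_size);
	return dentry_size - inline_len;
}

/* Bytes left for the inline name once the fixed fields are paid for. */
static size_t demo_inline_len(size_t dentry_size, size_t fixed_bytes)
{
	assert(fixed_bytes <= dentry_size);
	return dentry_size - fixed_bytes;
}
```

Under those assumptions, 64-bit has 192 - 32 = 160 fixed bytes; adding
an 8-byte pointer leaves 192 - 168 = 24 bytes of inline name, or
256 - 168 = 88 bytes if the dentry grew to 256 bytes, which matches the
"wasteful 88 bytes" in the XXX comment.  On 32-bit, 128 - 40 = 88 fixed
bytes plus a 4-byte pointer leaves 128 - 92 = 36.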

diff --git a/fs/dcache.c b/fs/dcache.c
index 2cd367a..85e2737 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -959,6 +959,9 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
 	INIT_LIST_HEAD(&dentry->d_alias);
+#ifdef CONFIG_UNION_MOUNT
+	dentry->d_union_stack = NULL;
+#endif
 
 	if (parent) {
 		dentry->d_parent = dget(parent);
diff --git a/fs/union.h b/fs/union.h
new file mode 100644
index 0000000..38b26fd
--- /dev/null
+++ b/fs/union.h
@@ -0,0 +1,54 @@
+ /*
+ * VFS-based union mounts for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007-2009 Novell Inc.
+ * Copyright (C) 2009-2010 Red Hat, Inc.
+ *
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *              Valerie Aurora <vaurora@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+#ifndef __LINUX_UNION_H
+#define __LINUX_UNION_H
+#ifdef __KERNEL__
+
+#ifdef CONFIG_UNION_MOUNT
+
+/*
+ * WARNING! Confusing terminology alert.
+ *
+ * Note that the directions "up" and "down" in union mounts are the
+ * opposite of "up" and "down" in normal VFS operation terminology.
+ * "up" in the rest of the VFS means "towards the root of the mount
+ * tree."  If you mount B on top of A, following B "up" will get you
+ * A.  In union mounts, "up" means "towards the most recently mounted
+ * layer of the union stack."  If you union mount B on top of A,
+ * following A "up" will get you to B.  Another way to put it is that
+ * "up" in the VFS means going from this mount towards the direction
+ * of its mnt->mnt_parent pointer, but "up" in union mounts means
+ * going in the opposite direction (until you run out of union
+ * layers).
+ */
+
+/*
+ * The union_stack structure.  It is an array of struct paths of
+ * directories below the topmost directory in a unioned directory.  The
+ * topmost dentry has a pointer to this structure.  The topmost dentry
+ * can only be part of one union, so we can reference it from the
+ * dentry, but lower dentries can be part of multiple union stacks.
+ *
+ * The number of dirs actually allocated is kept in the superblock,
+ * s_union_count.
+ */
+struct union_stack {
+	struct path u_dirs[0];
+};
+
+#endif	/* CONFIG_UNION_MOUNT */
+#endif	/* __KERNEL__ */
+#endif	/* __LINUX_UNION_H */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 0904716..ed8ef47 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -79,12 +79,28 @@ full_name_hash(const unsigned char *name, unsigned int len)
  * Try to keep struct dentry aligned on 64 byte cachelines (this will
  * give reasonable cacheline footprint with larger lines without the
  * large memory footprint increase).
+ *
+ * XXX DNAME_INLINE_LEN_MIN is kind of pitiful on 64bit + union
+ * mounts.  May be worth tuning up, but either we go to 256 bytes and
+ * a wasteful 88 bytes of d_iname, or we lose 64-byte alignment.
  */
 #ifdef CONFIG_64BIT
+
+#ifdef CONFIG_UNION_MOUNT
+#define DNAME_INLINE_LEN_MIN 24 /* 192 bytes */
+#else
 #define DNAME_INLINE_LEN_MIN 32 /* 192 bytes */
+#endif /* CONFIG_UNION_MOUNT */
+
+#else
+
+#ifdef CONFIG_UNION_MOUNT
+#define DNAME_INLINE_LEN_MIN 36 /* 128 bytes */
 #else
 #define DNAME_INLINE_LEN_MIN 40 /* 128 bytes */
-#endif
+#endif /* CONFIG_UNION_MOUNT */
+
+#endif /* CONFIG_64BIT */
 
 struct dentry {
 	atomic_t d_count;
@@ -100,7 +116,9 @@ struct dentry {
 	struct hlist_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
-
+#ifdef CONFIG_UNION_MOUNT
+	struct union_stack *d_union_stack;	/* dirs in union stack */
+#endif
 	struct list_head d_lru;		/* LRU list */
 	/*
 	 * d_child and d_rcu can share memory
-- 
1.6.3.3



* [PATCH 09/34] union-mount: Add two superblock fields for union mounts
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (7 preceding siblings ...)
  2010-09-16 22:11 ` [PATCH 08/34] union-mount: Create union_stack structure Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 10/34] union-mount: Add union_alloc() Valerie Aurora
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Add two fields to struct super_block to support union mounts.
s_union_lower_mnts is a pointer to a cloned vfsmount tree of all the
lower (read-only) mounts unioned with the topmost (read-write)
vfsmount.  These mounts may have submounts which will also be unioned;
hence we copy the entire vfsmount tree, not just the root vfsmounts.
s_union_count is the number of lower mounts unioned at the root of the
file system.  This count is the maximum number of directories that
will ever be unioned with a single directory.  We use it to allocate a
union stack of the correct size for each directory.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 include/linux/fs.h |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2e3d745..5e35b03 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1397,6 +1397,18 @@ struct super_block {
 	 * Decremented by free_vfsmnt() if MNT_HARD_READONLY is set.
 	 */
 	int s_hard_readonly_users;
+
+	/*
+	 * Root of the private cloned vfsmount tree of the read-only
+	 * mounts in this union (set in topmost vfsmount only)
+	 */
+	struct vfsmount *s_union_lower_mnts;
+
+	/*
+	 * Number of layers in this union, not counting the topmost or
+	 * submounts.
+	 */
+	unsigned int s_union_count;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
-- 
1.6.3.3



* [PATCH 10/34] union-mount: Add union_alloc()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (8 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 09/34] union-mount: Add two superblock fields for union mounts Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 11/34] union-mount: Add union_find_dir() Valerie Aurora
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

union_alloc() allocates a union stack with enough entries for the
maximum possible number of directories that might be unioned at this
point.

The union_stack may be larger than strictly necessary if this
directory does not exist on all layers, but allocating exactly the
right number would require keeping the number of layers in the
union_stack structure.  We optimize for the case of unioning two file
systems and keep the count of layers in the superblock.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/Makefile |    1 +
 fs/union.c  |   42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 fs/union.c
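
Reviewer note: the sizing policy described in the commit message (one
slot per lower layer, with the count kept in the superblock rather
than in each stack) can be sketched in user-space C as below.  This is
an illustration only: struct demo_path, struct demo_sb, and the demo_*
functions are hypothetical stand-ins, calloc() plays the role of
kzalloc(), and the rule that filled slots are contiguous from index 0
is my reading of "append the lower directory to the end of the union
stack" in the documentation patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct path. */
struct demo_path {
	void *mnt;
	void *dentry;
};

/* Hypothetical stand-in for the superblock field added in patch 09. */
struct demo_sb {
	unsigned int s_union_count; /* number of lower layers */
};

/* Allocate one zeroed slot per lower layer; caller frees with free(). */
static struct demo_path *demo_union_alloc(const struct demo_sb *sb)
{
	return calloc(sb->s_union_count, sizeof(struct demo_path));
}

/* Count filled slots; unused trailing slots stay zeroed. */
static unsigned int demo_union_used(const struct demo_sb *sb,
				    const struct demo_path *dirs)
{
	unsigned int i;

	for (i = 0; i < sb->s_union_count; i++)
		if (!dirs[i].dentry)
			break;
	return i;
}
```

The trade-off is the one the commit message names: a directory that
exists on fewer layers wastes a few zeroed slots, but no per-stack
length field is needed.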

diff --git a/fs/Makefile b/fs/Makefile
index e6ec1d3..936acf0 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
 obj-$(CONFIG_GENERIC_ACL)	+= generic_acl.o
 
 obj-y				+= quota/
+obj-$(CONFIG_UNION_MOUNT)	+= union.o
 
 obj-$(CONFIG_PROC_FS)		+= proc/
 obj-y				+= partitions/
diff --git a/fs/union.c b/fs/union.c
new file mode 100644
index 0000000..52a5c28
--- /dev/null
+++ b/fs/union.c
@@ -0,0 +1,42 @@
+ /*
+ * VFS-based union mounts for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007-2009 Novell Inc.
+ * Copyright (C) 2009-2010 Red Hat, Inc.
+ *
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *              Valerie Aurora <vaurora@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/fs_struct.h>
+#include <linux/slab.h>
+
+#include "union.h"
+
+/**
+ * union_alloc - allocate a union stack
+ *
+ * @topmost: path of topmost directory
+ *
+ * Allocate a union_stack large enough to contain the maximum number
+ * of layers in this union mount.
+ */
+
+static struct union_stack *union_alloc(struct path *topmost)
+{
+	unsigned int layers = topmost->dentry->d_sb->s_union_count;
+	BUG_ON(!S_ISDIR(topmost->dentry->d_inode->i_mode));
+
+	return kzalloc(sizeof(struct path) * layers, GFP_KERNEL);
+}
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 11/34] union-mount: Add union_find_dir()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (9 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 10/34] union-mount: Add union_alloc() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 12/34] union-mount: Create d_free_unions() Valerie Aurora
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

union_find_dir() returns the path of the directory at the specified
layer in a unioned directory.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/union.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/union.h b/fs/union.h
index 38b26fd..e242451 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -49,6 +49,16 @@ struct union_stack {
 	struct path u_dirs[0];
 };
 
+static inline struct path *union_find_dir(struct dentry *dentry,
+					  unsigned int layer) {
+	BUG_ON(layer >= dentry->d_sb->s_union_count);
+	return &(dentry->d_union_stack->u_dirs[layer]);
+}
+
+#else /* CONFIG_UNION_MOUNT */
+
+#define union_find_dir(x, y)		({ BUG(); (NULL); })
+
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
 #endif	/* __LINUX_UNION_H */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/34] union-mount: Create d_free_unions()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (10 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 11/34] union-mount: Add union_find_dir() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 13/34] union-mount: Free union stack on removal of topmost dentry from dcache Valerie Aurora
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

d_free_unions() frees the union stack associated with a directory.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/union.c |   25 +++++++++++++++++++++++++
 fs/union.h |    7 +++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index 52a5c28..a191bef 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -21,6 +21,7 @@
 #include <linux/mount.h>
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
+#include <linux/namei.h>
 
 #include "union.h"
 
@@ -40,3 +41,27 @@ static struct union_stack *union_alloc(struct path *topmost)
 
 	return kzalloc(sizeof(struct path) * layers, GFP_KERNEL);
 }
+
+/**
+ * d_free_unions - free all unions for this dentry
+ *
+ * @topmost - topmost dentry in the union stack to remove
+ *
+ * This must be called when freeing a dentry.
+ */
+void d_free_unions(struct dentry *topmost)
+{
+	struct path *path;
+	unsigned int i, layers = topmost->d_sb->s_union_count;
+
+	if (!IS_DIR_UNIONED(topmost))
+		return;
+
+	for (i = 0; i < layers; i++) {
+		path = union_find_dir(topmost, i);
+		if (path->mnt)
+			path_put(path);
+	}
+	kfree(topmost->d_union_stack);
+	topmost->d_union_stack = NULL;
+}
diff --git a/fs/union.h b/fs/union.h
index e242451..353f78d 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -49,6 +49,10 @@ struct union_stack {
 	struct path u_dirs[0];
 };
 
+#define IS_DIR_UNIONED(dentry)	((dentry)->d_union_stack)
+
+extern void d_free_unions(struct dentry *);
+
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
 	BUG_ON(layer >= dentry->d_sb->s_union_count);
@@ -57,6 +61,9 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 
 #else /* CONFIG_UNION_MOUNT */
 
+#define IS_DIR_UNIONED(x)		(0)
+
+#define d_free_unions(x)		do { } while (0)
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
 
 #endif	/* CONFIG_UNION_MOUNT */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 13/34] union-mount: Free union stack on removal of topmost dentry from dcache
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (11 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 12/34] union-mount: Create d_free_unions() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 14/34] union-mount: Create union_add_dir() Valerie Aurora
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

If a dentry is removed from the dentry cache because its usage count
drops to zero, its union stack is freed too.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/dcache.c    |   11 +++++++++++
 fs/namespace.c |    2 ++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 85e2737..0910ce7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -34,6 +34,7 @@
 #include <linux/fs_struct.h>
 #include <linux/hardirq.h>
 #include "internal.h"
+#include "union.h"
 
 int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
@@ -175,6 +176,7 @@ static struct dentry *d_kill(struct dentry *dentry)
 	dentry_stat.nr_dentry--;	/* For d_free, below */
 	/*drops the locks, at that point nobody can reach this dentry */
 	dentry_iput(dentry);
+	d_free_unions(dentry);
 	if (IS_ROOT(dentry))
 		parent = NULL;
 	else
@@ -695,6 +697,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
 					iput(inode);
 			}
 
+			d_free_unions(dentry);
 			d_free(dentry);
 
 			/* finished when we fall off the top of the tree,
@@ -1535,6 +1538,7 @@ void d_delete(struct dentry * dentry)
 	if (atomic_read(&dentry->d_count) == 1) {
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_iput(dentry);
+		d_free_unions(dentry);
 		fsnotify_nameremove(dentry, isdir);
 		return;
 	}
@@ -1545,6 +1549,13 @@ void d_delete(struct dentry * dentry)
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
+	/*
+	 * Remove any associated unions.  While someone still has this
+	 * directory open (ref count > 0), we could not have deleted
+	 * it unless it was empty, and therefore has no references to
+	 * directories below it.  So we don't need the unions.
+	 */
+	d_free_unions(dentry);
 	fsnotify_nameremove(dentry, isdir);
 }
 EXPORT_SYMBOL(d_delete);
diff --git a/fs/namespace.c b/fs/namespace.c
index f8d7d11..ffa5ed7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -33,6 +33,7 @@
 #include <asm/unistd.h>
 #include "pnode.h"
 #include "internal.h"
+#include "union.h"
 
 #define HASH_SHIFT ilog2(PAGE_SIZE / sizeof(struct list_head))
 #define HASH_SIZE (1UL << HASH_SHIFT)
@@ -1065,6 +1066,7 @@ void umount_tree(struct vfsmount *mnt, int propagate, struct list_head *kill)
 		propagate_umount(kill);
 
 	list_for_each_entry(p, kill, mnt_hash) {
+		d_free_unions(p->mnt_root);
 		list_del_init(&p->mnt_expire);
 		list_del_init(&p->mnt_list);
 		__touch_mnt_namespace(p->mnt_ns);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 14/34] union-mount: Create union_add_dir()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (12 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 13/34] union-mount: Free union stack on removal of topmost dentry from dcache Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 15/34] union-mount: Add union_create_topmost_dir() Valerie Aurora
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

union_add_dir() fills out the union stack for the topmost dentry with
the path of the directory in this layer of the union.
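
A minimal userspace sketch of the same idea, assuming hypothetical
stand-in types (dir_sketch, path_sketch) rather than the kernel's
dentry and path: the stack is allocated lazily on first use, and the
lower path is then copied into the slot for its layer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; strings replace vfsmount and dentry pointers. */
struct path_sketch {
	const char *mnt;
	const char *dentry;
};

struct dir_sketch {
	unsigned int layers;		/* stand-in for d_sb->s_union_count */
	struct path_sketch *stack;	/* stand-in for d_union_stack */
};

static int union_add_dir_sketch(struct dir_sketch *topmost,
				const struct path_sketch *lower,
				unsigned int layer)
{
	assert(layer < topmost->layers);

	/* Allocate lazily, the first time any lower layer is recorded. */
	if (!topmost->stack)
		topmost->stack = calloc(topmost->layers,
					sizeof(*topmost->stack));
	if (!topmost->stack)
		return -ENOMEM;

	/* Plain struct copy: the caller's reference moves into the slot. */
	topmost->stack[layer] = *lower;
	return 0;
}
```

Slots that are never filled stay zeroed, which is what lets the
teardown path skip them by testing the mnt field.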

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/union.c |   28 ++++++++++++++++++++++++++++
 fs/union.h |    2 ++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index a191bef..45552f8 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -65,3 +65,31 @@ void d_free_unions(struct dentry *topmost)
 	kfree(topmost->d_union_stack);
 	topmost->d_union_stack = NULL;
 }
+
+/**
+ * union_add_dir - Add another layer to a unioned directory
+ *
+ * @topmost - topmost directory
+ * @lower - directory in the current layer
+ * @layer - index of layer to add this at
+ *
+ * @layer counts starting at 0 for the dir below the topmost dir.
+ * Must take a reference to @lower (call path_get()) before calling
+ * this function.
+ */
+
+int union_add_dir(struct path *topmost, struct path *lower,
+		  unsigned int layer)
+{
+	struct path *path;
+	struct dentry *dentry = topmost->dentry;
+	BUG_ON(layer >= dentry->d_sb->s_union_count);
+
+	if (!dentry->d_union_stack)
+		dentry->d_union_stack = union_alloc(topmost);
+	if (!dentry->d_union_stack)
+		return -ENOMEM;
+	path = union_find_dir(dentry, layer);
+	*path = *lower;
+	return 0;
+}
diff --git a/fs/union.h b/fs/union.h
index 353f78d..bd03d67 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -52,6 +52,7 @@ struct union_stack {
 #define IS_DIR_UNIONED(dentry)	((dentry)->d_union_stack)
 
 extern void d_free_unions(struct dentry *);
+extern int union_add_dir(struct path *, struct path *, unsigned int);
 
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
@@ -64,6 +65,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 #define IS_DIR_UNIONED(x)		(0)
 
 #define d_free_unions(x)		do { } while (0)
+#define union_add_dir(x, y, z)		({ BUG(); (0); })
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
 
 #endif	/* CONFIG_UNION_MOUNT */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 15/34] union-mount: Add union_create_topmost_dir()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (13 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 14/34] union-mount: Create union_add_dir() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 16/34] union-mount: Create IS_MNT_UNION() Valerie Aurora
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

The union mount design requires that the topmost directory exist for
every single directory at the time lookup completes.  This is so that
we don't have to double back and create a whole path's worth of
directories whenever we copy up a file in a directory for the first
time.  This greatly simplifies locking and error handling.

XXX - attributes of copied-up dir are wrong

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/union.c |   34 ++++++++++++++++++++++++++++++++++
 fs/union.h |    3 +++
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index 45552f8..bc53066 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -93,3 +93,37 @@ int union_add_dir(struct path *topmost, struct path *lower,
 	*path = *lower;
 	return 0;
 }
+
+/**
+ * union_create_topmost_dir - Create a matching dir in the topmost file system
+ *
+ * @parent - parent of target on topmost layer
+ * @name - name of target
+ * @topmost - path of target on topmost layer
+ * @lower - path of source on lower layer
+ *
+ * As we look up each directory on the lower layer of a union, we
+ * create a matching directory on the topmost layer if it does not
+ * already exist.
+ *
+ * XXX - owner is wrong, set credentials properly
+ */
+
+int union_create_topmost_dir(struct path *parent, struct qstr *name,
+			     struct path *topmost, struct path *lower)
+{
+	int mode = lower->dentry->d_inode->i_mode;
+	int res;
+
+	BUG_ON(topmost->dentry->d_inode);
+
+	res = mnt_want_write(parent->mnt);
+	if (res)
+		return res;
+
+	res = vfs_mkdir(parent->dentry->d_inode, topmost->dentry, mode);
+
+	mnt_drop_write(parent->mnt);
+
+	return res;
+}
diff --git a/fs/union.h b/fs/union.h
index bd03d67..1692803 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -53,6 +53,8 @@ struct union_stack {
 
 extern void d_free_unions(struct dentry *);
 extern int union_add_dir(struct path *, struct path *, unsigned int);
+extern int union_create_topmost_dir(struct path *, struct qstr *, struct path *,
+				    struct path *);
 
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
@@ -67,6 +69,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 #define d_free_unions(x)		do { } while (0)
 #define union_add_dir(x, y, z)		({ BUG(); (0); })
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
+#define union_create_topmost_dir(w, x, y, z)	({ BUG(); (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 16/34] union-mount: Create IS_MNT_UNION()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (14 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 15/34] union-mount: Add union_create_topmost_dir() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 17/34] union-mount: Create needs_lookup_union() Valerie Aurora
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

IS_MNT_UNION() tests whether a vfsmount is a union.  Note that a
directory in a union mounted file system is not necessarily unioned.
Use IS_DIR_UNIONED() to test that.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/union.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/union.h b/fs/union.h
index 1692803..c496823 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -49,6 +49,7 @@ struct union_stack {
 	struct path u_dirs[0];
 };
 
+#define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
 #define IS_DIR_UNIONED(dentry)	((dentry)->d_union_stack)
 
 extern void d_free_unions(struct dentry *);
@@ -64,6 +65,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 
 #else /* CONFIG_UNION_MOUNT */
 
+#define IS_MNT_UNION(x)			(0)
 #define IS_DIR_UNIONED(x)		(0)
 
 #define d_free_unions(x)		do { } while (0)
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 17/34] union-mount: Create needs_lookup_union()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (15 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 16/34] union-mount: Create IS_MNT_UNION() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 18/34] union-mount: Create check_topmost_union_mnt() Valerie Aurora
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

needs_lookup_union() tests if a path could possibly require a union
lookup.
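
The decision can be sketched as a pure function over a handful of
booleans.  The lookup_sketch structure and its field names are
hypothetical; each one stands in for a kernel test such as
IS_DIR_UNIONED(), IS_ROOT() or d_is_whiteout() used by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the parent and target dentries. */
struct lookup_sketch {
	bool parent_unioned;	/* parent directory has a union stack */
	bool parent_opaque;	/* parent hides everything below it */
	bool is_root;		/* target is the root of its mount */
	bool has_stack;		/* target already has a union stack */
	bool is_whiteout;	/* target is a whiteout */
	bool is_fallthru;	/* target is a fallthru */
	bool is_positive;	/* target has an inode */
	bool is_dir;		/* that inode is a directory */
};

static bool needs_lookup_union_sketch(const struct lookup_sketch *t)
{
	if (!t->parent_unioned)
		return false;
	if (t->is_root || t->has_stack || t->is_whiteout)
		return false;
	if (t->parent_opaque && !t->is_fallthru)
		return false;
	if (t->is_positive && !t->is_dir)
		return false;
	/* A directory without a stack, or a negative dentry. */
	return true;
}
```

Every early return is a cheap way to rule a target out, so a true
result only means that union lookup might be needed.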

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/union.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/union.h |    2 ++
 2 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index bc53066..a9427bf 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -127,3 +127,52 @@ int union_create_topmost_dir(struct path *parent, struct qstr *name,
 
 	return res;
 }
+
+/**
+ * needs_lookup_union - Avoid union lookup when not necessary
+ *
+ * @parent_path: path of the parent directory
+ * @path: path of the lookup target
+ *
+ * Check to see if the target needs union lookup - that is, needs its
+ * union stack constructed.  Two cases need union lookup in a unioned
+ * parent directory: the target is a directory without a union stack,
+ * or the target is a negative dentry.
+ *
+ * Returns 0 if this dentry does not need union lookup.  Returns 1 if
+ * it is possible this dentry needs union lookup.
+ */
+
+int needs_lookup_union(struct path *parent_path, struct path *path)
+{
+	if (!IS_DIR_UNIONED(parent_path->dentry))
+		return 0;
+
+	/* Root dir union stack created at mount (if this is a unioned mnt) */
+	if (IS_ROOT(path->dentry))
+		return 0;
+
+	/* Union stack for target already created, clearly */
+	if (IS_DIR_UNIONED(path->dentry))
+		return 0;
+
+	if (d_is_whiteout(path->dentry))
+		return 0;
+
+	if (IS_OPAQUE(parent_path->dentry->d_inode) &&
+	    !d_is_fallthru(path->dentry))
+		return 0;
+
+	/* Non-directories don't need union stacks */
+	if (path->dentry->d_inode &&
+	    !S_ISDIR(path->dentry->d_inode->i_mode))
+		return 0;
+
+	/*
+	 * XXX Things with nothing below them in a union dir
+	 * (including negative dentries) must always go through union
+	 * lookup.  This is like negative dentries in the dcache.
+	 * Write some optimization for this case.
+	 */
+	return 1;
+}
diff --git a/fs/union.h b/fs/union.h
index c496823..0c8fbca 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -56,6 +56,7 @@ extern void d_free_unions(struct dentry *);
 extern int union_add_dir(struct path *, struct path *, unsigned int);
 extern int union_create_topmost_dir(struct path *, struct qstr *, struct path *,
 				    struct path *);
+extern int needs_lookup_union(struct path *, struct path *);
 
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
@@ -72,6 +73,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 #define union_add_dir(x, y, z)		({ BUG(); (0); })
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
 #define union_create_topmost_dir(w, x, y, z)	({ BUG(); (0); })
+#define needs_lookup_union(x, y)	({ (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 18/34] union-mount: Create check_topmost_union_mnt()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (16 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 17/34] union-mount: Create needs_lookup_union() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 19/34] union-mount: Add clone_union_tree() and put_union_sb() Valerie Aurora
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

check_topmost_union_mnt() checks that the topmost layer of a proposed
union mount is read-write, supports fallthrus and whiteouts, and isn't
mounted elsewhere.
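
As a rough userspace sketch of the order of these checks, assuming a
hypothetical topmost_sketch structure whose fields stand in for the
mount flags, sb->s_active, MS_WHITEOUT and MS_FALLTHRU:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the proposed topmost layer. */
struct topmost_sketch {
	bool read_only;		/* mounted read-only */
	int active_refs;	/* stand-in for sb->s_active */
	bool has_whiteouts;	/* stand-in for MS_WHITEOUT */
	bool has_fallthrus;	/* stand-in for MS_FALLTHRU */
};

static int check_topmost_sketch(const struct topmost_sketch *t)
{
	if (t->read_only)
		return -EROFS;		/* copyup needs a writable layer */
	if (t->active_refs != 1)
		return -EBUSY;		/* already mounted somewhere else */
	if (!t->has_whiteouts || !t->has_fallthrus)
		return -EINVAL;		/* file system lacks needed features */
	return 0;
}
```

Each failure maps to its own errno, so the caller can tell a busy
file system from an unsuitable one.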

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ffa5ed7..62c2f7a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1353,6 +1353,46 @@ static int invent_group_ids(struct vfsmount *mnt, bool recurse)
 	return 0;
 }
 
+/**
+ * check_topmost_union_mnt - mount-time checks for union mount
+ *
+ * @topmost_mnt: vfsmount of the topmost union file system
+ * @mnt_flags: mount flags for the topmost mount
+ *
+ * Our readdir() solution of copying up directory entries requires
+ * that the topmost layer be writeable and support whiteouts and
+ * fallthrus.  The topmost file system can't be mounted elsewhere
+ * because it's Too Hard(tm).
+ */
+
+static int check_topmost_union_mnt(struct vfsmount *topmost_mnt, int mnt_flags)
+{
+	struct super_block *sb = topmost_mnt->mnt_sb;
+#ifndef CONFIG_UNION_MOUNT
+	printk(KERN_INFO "union mount: not supported by the kernel\n");
+	return -EINVAL;
+#endif
+	if (mnt_flags & MNT_READONLY)
+		return -EROFS;
+
+	if (atomic_read(&sb->s_active) != 1) {
+		printk(KERN_INFO "union mount: topmost fs mounted elsewhere\n");
+		return -EBUSY;
+	}
+
+	if (!(sb->s_flags & MS_WHITEOUT)) {
+		printk(KERN_INFO "union mount: whiteouts not supported by fs\n");
+		return -EINVAL;
+	}
+
+	if (!(sb->s_flags & MS_FALLTHRU)) {
+		printk(KERN_INFO "union mount: fallthrus not supported by fs\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 19/34] union-mount: Add clone_union_tree() and put_union_sb()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (17 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 18/34] union-mount: Create check_topmost_union_mnt() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 20/34] union-mount: Create build_root_union() Valerie Aurora
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

A union mount clones the vfsmount tree of all of the read-only layers
of the union and keeps a reference to it in the vfsmount of the
topmost layer of the union.

clone_union_tree() takes the path of the proposed union mountpoint and
attempts to clone every vfsmount mounted at that same pathname, as
well as their submounts.  All these mounts must be read-only, not
slave, and not shared.

put_union_sb() unwinds everything clone_union_tree() does.  It is
called when the superblock is deactivated.  Thus, you can lazy unmount
a union mount and when the last reference goes away, the union will be
torn down.
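
The search for the lowest mount stacked at the same pathname can be
sketched in userspace as below.  mnt_sketch is a hypothetical stand-in
for struct vfsmount, with opaque pointers used as dentry identities;
the loop follows the same parent test that clone_union_tree() uses.

```c
#include <assert.h>

/* Hypothetical stand-in for struct vfsmount. */
struct mnt_sketch {
	struct mnt_sketch *parent;	/* itself for the namespace root */
	const void *root;		/* identity of this mount's root */
	const void *mountpoint;		/* dentry in parent we sit on */
};

/*
 * Walk down while this mount sits exactly on the root of its parent,
 * i.e. while both are mounted at the same pathname.
 */
static struct mnt_sketch *lowest_layer_sketch(struct mnt_sketch *mnt)
{
	while (mnt->parent->root == mnt->mountpoint) {
		/* The namespace root is its own parent: stop there. */
		if (mnt->parent == mnt)
			break;
		mnt = mnt->parent;
	}
	return mnt;
}
```

The mount returned here is the one that would be handed to copy_tree(),
so everything stacked above it at that pathname is cloned as well.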

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c        |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mount.h |    2 +
 2 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 62c2f7a..e966890 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1393,6 +1393,77 @@ static int check_topmost_union_mnt(struct vfsmount *topmost_mnt, int mnt_flags)
 	return 0;
 }
 
+void put_union_sb(struct super_block *sb)
+{
+       struct vfsmount *mnt = sb->s_union_lower_mnts;
+       LIST_HEAD(umount_list);
+
+       if (!mnt)
+               return;
+       spin_lock(&vfsmount_lock);
+       umount_tree(mnt, 0, &umount_list);
+       spin_unlock(&vfsmount_lock);
+       release_mounts(&umount_list);
+       sb->s_union_lower_mnts = NULL;
+       sb->s_union_count = 0;
+}
+
+/**
+ * clone_union_tree - Clone all union-able mounts at this mountpoint
+ *
+ * @topmost - vfsmount of topmost layer
+ * @mntpnt - target of union mount
+ *
+ * Given the target mountpoint of a union mount, clone all the mounts
+ * at that mountpoint (well, pathname) that qualify as a union lower
+ * layer.  Increment the hard readonly count of the lower layer
+ * superblocks.
+ *
+ * Returns error if any of the mounts or submounts mounted on or below
+ * this pathname are unsuitable for union mounting.  This means you
+ * can't construct a union mount at the root of an existing mount
+ * without unioning it.
+ *
+ * XXX - Maybe should take # of layers to go down as an argument. But
+ * how to pass this in through mount options?  All solutions look
+ * ugly.  Currently you express your intention through mounting file
+ * systems on the same mountpoint, which is pretty elegant.
+ */
+
+static int clone_union_tree(struct vfsmount *topmost, struct path *mntpnt)
+{
+	struct vfsmount *mnt, *cloned_tree;
+
+	if (!IS_ROOT(mntpnt->dentry)) {
+		printk(KERN_INFO "union mount: mount point must be a root dir\n");
+		return -EINVAL;
+	}
+
+	/* Look for the "lowest" layer to union. */
+	mnt = mntpnt->mnt;
+	while (mnt->mnt_parent->mnt_root == mnt->mnt_mountpoint) {
+		/* Got root (mnt)? */
+		if (mnt->mnt_parent == mnt)
+			break;
+		mnt = mnt->mnt_parent;
+	}
+	/*
+	 * Clone all the read-only mounts and submounts, only if they
+	 * are not shared or slave, and increment the hard read-only
+	 * users count on each one.  If this can't be done for every
+	 * mount and submount below this one, fail.
+	 */
+	cloned_tree = copy_tree(mnt, mnt->mnt_root,
+				CL_COPY_ALL | CL_PRIVATE |
+				CL_NO_SHARED | CL_NO_SLAVE |
+				CL_MAKE_HARD_READONLY);
+	if (IS_ERR(cloned_tree))
+		return PTR_ERR(cloned_tree);
+
+	topmost->mnt_sb->s_union_lower_mnts = cloned_tree;
+	return 0;
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 4014ff1..5643835 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -138,4 +138,6 @@ extern void mark_mounts_for_expiry(struct list_head *mounts);
 
 extern dev_t name_to_dev_t(char *name);
 
+extern void put_union_sb(struct super_block *sb);
+
 #endif /* _LINUX_MOUNT_H */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 20/34] union-mount: Create build_root_union()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (18 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 19/34] union-mount: Add clone_union_tree() and put_union_sb() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 21/34] union-mount: Create prepare_mnt_union() and cleanup_mnt_union() Valerie Aurora
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

During mount(), build_root_union() creates the union stack for the
root directory.  All other directory union stacks are bootstrapped
from their parents' union stacks during path lookup.
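
The two passes can be sketched in userspace as follows.  This is a
simplification under stated assumptions: mnt_sketch, its child pointer
and MAX_LAYERS are hypothetical, and a direct child link replaces the
next_mnt() walk over the cloned tree.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS 8	/* hypothetical cap, for the sketch only */

/* Hypothetical stand-in for a chain of stacked read-only mounts. */
struct mnt_sketch {
	struct mnt_sketch *parent;	/* mount below, NULL at the bottom */
	struct mnt_sketch *child;	/* mount stacked above, NULL at top */
};

/*
 * Pass 1: climb from the bottom of the cloned tree to the topmost
 * read-only mount, counting layers.  Pass 2: walk back down, so that
 * slot 0 holds the layer directly below the writable mount.
 */
static unsigned int build_root_union_sketch(struct mnt_sketch *bottom,
					    struct mnt_sketch *stack[MAX_LAYERS])
{
	struct mnt_sketch *mnt = bottom;
	unsigned int i, layers = 1;

	while (mnt->child && layers < MAX_LAYERS) {
		mnt = mnt->child;
		layers++;
	}
	for (i = 0; i < layers; i++) {
		stack[i] = mnt;
		mnt = mnt->parent;
	}
	return layers;
}
```

The layer count has to be known before the first slot is filled, which
is why the climb comes first even though the stack is ordered top down.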

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e966890..2bb4645 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1464,6 +1464,54 @@ static int clone_union_tree(struct vfsmount *topmost, struct path *mntpnt)
 	return 0;
 }
 
+/**
+ * build_root_union - Create the union stack for the root dir
+ *
+ * @topmost_mnt - vfsmount of topmost mount
+ *
+ * Build the union stack for the root dir.  Annoyingly, we have to
+ * traverse the union "up" from the root of the cloned tree to find the
+ * topmost read-only mount, and then traverse back "down" to build the
+ * stack.
+ */
+
+static int build_root_union(struct vfsmount *topmost_mnt)
+{
+	struct path lower, topmost_path;
+	struct vfsmount *mnt, *topmost_ro_mnt;
+	unsigned int i, layers = 1;
+	int err = 0;
+
+	/* Find the topmost read-only mount */
+	topmost_ro_mnt = topmost_mnt->mnt_sb->s_union_lower_mnts;
+	for (mnt = topmost_ro_mnt; mnt; mnt = next_mnt(mnt, topmost_ro_mnt)) {
+		if ((mnt->mnt_parent == topmost_ro_mnt) &&
+		    (mnt->mnt_mountpoint == topmost_ro_mnt->mnt_root)) {
+			topmost_ro_mnt = mnt;
+			layers++;
+		}
+	}
+	topmost_mnt->mnt_sb->s_union_count = layers;
+
+	/* Build the root dir's union stack from the top down */
+	topmost_path.mnt = topmost_mnt;
+	topmost_path.dentry = topmost_mnt->mnt_root;
+	mnt = topmost_ro_mnt;
+	for (i = 0; i < layers; i++) {
+		lower.mnt = mntget(mnt);
+		lower.dentry = dget(mnt->mnt_root);
+		err = union_add_dir(&topmost_path, &lower, i);
+		if (err)
+			goto out;
+		mnt = mnt->mnt_parent;
+	}
+	return 0;
+out:
+	d_free_unions(topmost_path.dentry);
+	topmost_mnt->mnt_sb->s_union_count = 0;
+	return err;
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 21/34] union-mount: Create prepare_mnt_union() and cleanup_mnt_union()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (19 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 20/34] union-mount: Create build_root_union() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 22/34] union-mount: Prevent improper union-related remounts Valerie Aurora
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

prepare_mnt_union() ties together all the mount-time checks and setup
for union mounts.  It tests the layers for suitability and builds the
root union stack.

cleanup_mnt_union() unwinds everything prepare_mnt_union() does.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2bb4645..ff83cee 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1512,6 +1512,49 @@ out:
 	return err;
 }
 
+/**
+ * prepare_mnt_union - do setup necessary for a union mount
+ *
+ * @topmost_mnt: vfsmount of topmost layer
+ * @mntpnt: path of requested mountpoint
+ *
+ * We union every underlying file system that is mounted on the same
+ * mountpoint (well, pathname), read-only, and not shared.  If we get
+ * at least one layer, we don't return an error, although we will
+ * complain in the kernel log if we hit a mount that can't be
+ * unioned.
+ *
+ * Caller needs namespace_sem, but can't have vfsmount_lock.
+ */
+
+static int prepare_mnt_union(struct vfsmount *topmost_mnt, struct path *mntpnt)
+{
+	int err;
+
+	err = check_topmost_union_mnt(topmost_mnt, topmost_mnt->mnt_flags);
+	if (err)
+		return err;
+
+	err = clone_union_tree(topmost_mnt, mntpnt);
+	if (err)
+		return err;
+
+	err = build_root_union(topmost_mnt);
+	if (err)
+		goto out;
+
+	return 0;
+out:
+	put_union_sb(topmost_mnt->mnt_sb);
+	return err;
+}
+
+static void cleanup_mnt_union(struct vfsmount *topmost_mnt)
+{
+	d_free_unions(topmost_mnt->mnt_root);
+	put_union_sb(topmost_mnt->mnt_sb);
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 22/34] union-mount: Prevent improper union-related remounts
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (20 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 21/34] union-mount: Create prepare_mnt_union() and cleanup_mnt_union() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 23/34] union-mount: Prevent topmost file system from being mounted elsewhere Valerie Aurora
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

A remount request must not (a) convert a union to a non-union (or vice
versa), or (b) make the topmost layer of a union read-only.

Note that we only have to worry about attempts to remount the vfsmount
of the topmost read-write layer of the union (the one with MNT_UNION
set).  The vfsmounts of the read-only layers are hidden in a cloned
tree hanging off the superblock of the topmost layer and aren't
visible to userspace.
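
The remount rules can be summarized as a small predicate.  This is a userspace sketch only: the EX_MNT_* values are illustrative placeholders rather than the kernel's flag values, and union_remount_allowed() is a hypothetical name.

```c
#include <stdbool.h>

/* Illustrative values only; the real MNT_* constants live in kernel headers. */
#define EX_MNT_READONLY 0x40
#define EX_MNT_UNION    0x100

/*
 * Returns true if a remount from cur_flags to new_flags would pass the
 * three checks added to do_remount() in this patch.
 */
bool union_remount_allowed(int cur_flags, int new_flags)
{
	bool is_union = cur_flags & EX_MNT_UNION;
	bool wants_union = new_flags & EX_MNT_UNION;

	/* No union -> non-union conversion, and no non-union -> union. */
	if (is_union != wants_union)
		return false;

	/* The topmost layer of a union must stay writable. */
	if (is_union && (new_flags & EX_MNT_READONLY))
		return false;

	return true;
}
```

A non-union mount can still be remounted read-only; only mounts that already carry the union flag are restricted.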

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ff83cee..61256e6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1824,6 +1824,18 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if (!check_mnt(path->mnt))
 		return -EINVAL;
 
+	if ((path->mnt->mnt_flags & MNT_UNION) &&
+	    !(mnt_flags & MNT_UNION))
+		return -EINVAL;
+
+	if ((mnt_flags & MNT_UNION) &&
+	    !(path->mnt->mnt_flags & MNT_UNION))
+		return -EINVAL;
+
+	if ((path->mnt->mnt_flags & MNT_UNION) &&
+	    (mnt_flags & MNT_READONLY))
+		return -EINVAL;
+
 	if (path->dentry != path->mnt->mnt_root)
 		return -EINVAL;
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 23/34] union-mount: Prevent topmost file system from being mounted elsewhere
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (21 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 22/34] union-mount: Prevent improper union-related remounts Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-30  9:37   ` Miklos Szeredi
  2010-09-16 22:12 ` [PATCH 24/34] union-mount: Prevent bind mounts of union mounts Valerie Aurora
                   ` (11 subsequent siblings)
  34 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

The device underlying the topmost read-write layer of a union mount
cannot be mounted anywhere else on the system.  We keep a pointer to
the union stack in the dentry of the topmost directory, so that dentry
can't be part of a different mount, since dentries are shared between
different mounts of the same device.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 61256e6..26efaf3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1998,6 +1998,11 @@ int do_add_mount(struct vfsmount *newmnt, struct path *path,
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	/* Top layers of union mounts can't be mounted elsewhere */
+	err = -EBUSY;
+	if (newmnt->mnt_sb->s_union_lower_mnts)
+		goto unlock;
+
 	newmnt->mnt_flags = mnt_flags;
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 24/34] union-mount: Prevent bind mounts of union mounts
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (22 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 23/34] union-mount: Prevent topmost file system from being mounted elsewhere Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 25/34] union-mount: Implement union mount Valerie Aurora
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Prevent bind mounts of parts of union mounts.

XXX - Bind mounting parts of union mounts is probably easy to
implement, but requires some careful thought about corner cases,
extensive testing, and some refactoring of the code.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 26efaf3..e3629dd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1761,6 +1761,12 @@ static int do_loopback(struct path *path, char *old_name,
 	err = -EINVAL;
 	if (IS_MNT_UNBINDABLE(old_path.mnt))
 		goto out;
+	/*
+	 * XXX - Mounting a subtree of a union mount elsewhere
+	 * requires careful thought and some refactoring.
+	 */
+	if (IS_MNT_UNION(old_path.mnt))
+		goto out;
 
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 25/34] union-mount: Implement union mount
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (23 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 24/34] union-mount: Prevent bind mounts of union mounts Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 26/34] union-mount: Temporarily disable some syscalls Valerie Aurora
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Up till this commit, a mount with the MS_UNION flag succeeded but
didn't actually union the file systems.  Now call the functions to
check the source mounts and create/destroy the per-vfsmount union
structures.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c |   13 ++++++++++++-
 fs/super.c     |    1 +
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e3629dd..6bbeb49 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1632,9 +1632,17 @@ static int attach_recursive_mnt(struct vfsmount *source_mnt,
 		if (err)
 			goto out;
 	}
+
+	/* parent_path means we are moving an existing unioned mount */
+	if (!parent_path && IS_MNT_UNION(source_mnt)) {
+		err = prepare_mnt_union(source_mnt, path);
+		if (err)
+			goto out_cleanup_ids;
+	}
+
 	err = propagate_mnt(dest_mnt, dest_dentry, source_mnt, &tree_list);
 	if (err)
-		goto out_cleanup_ids;
+		goto out_cleanup_union;
 
 	spin_lock(&vfsmount_lock);
 
@@ -1658,6 +1666,9 @@ static int attach_recursive_mnt(struct vfsmount *source_mnt,
 	spin_unlock(&vfsmount_lock);
 	return 0;
 
+ out_cleanup_union:
+	if (!parent_path && IS_MNT_UNION(source_mnt))
+		cleanup_mnt_union(source_mnt);
  out_cleanup_ids:
 	if (IS_MNT_SHARED(dest_mnt))
 		cleanup_group_ids(source_mnt, NULL);
diff --git a/fs/super.c b/fs/super.c
index 3f2df09..c376147 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -160,6 +160,7 @@ void deactivate_locked_super(struct super_block *s)
 	if (atomic_dec_and_test(&s->s_active)) {
 		fs->kill_sb(s);
 		put_filesystem(fs);
+		put_union_sb(s);
 		put_super(s);
 	} else {
 		up_write(&s->s_umount);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 26/34] union-mount: Temporarily disable some syscalls
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (24 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 25/34] union-mount: Implement union mount Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 27/34] union-mount: Basic infrastructure of __union_lookup() Valerie Aurora
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

After some of the following patches in this series, a few system calls
will crash the kernel if called on union-mounted file systems.
Temporarily disable rename(), unlink(), and rmdir() on unioned file
systems until they are correctly implemented by later patches.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0b6378e..2de4378 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -35,6 +35,7 @@
 #include <asm/uaccess.h>
 
 #include "internal.h"
+#include "union.h"
 
 /* [Feb-1997 T. Schoebel-Theuer]
  * Fundamental changes in the pathname lookup mechanisms (namei)
@@ -2404,6 +2405,11 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	if (error)
 		return error;
 
+	/* rmdir() on union mounts not implemented yet */
+	error = -EINVAL;
+	if (IS_DIR_UNIONED(nd.path.dentry))
+		goto exit1;
+
 	switch(nd.last_type) {
 	case LAST_DOTDOT:
 		error = -ENOTEMPTY;
@@ -2500,6 +2506,11 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	if (nd.last_type != LAST_NORM)
 		goto exit1;
 
+	/* unlink() on union mounts not implemented yet */
+	error = -EINVAL;
+	if (IS_DIR_UNIONED(nd.path.dentry))
+		goto exit1;
+
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
@@ -2890,6 +2901,12 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	if (oldnd.path.mnt != newnd.path.mnt)
 		goto exit2;
 
+	/* rename() on union mounts not implemented yet */
+	error = -EXDEV;
+	if (IS_DIR_UNIONED(oldnd.path.dentry) ||
+	    IS_DIR_UNIONED(newnd.path.dentry))
+		goto exit2;
+
 	old_dir = oldnd.path.dentry;
 	error = -EBUSY;
 	if (oldnd.last_type != LAST_NORM)
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 27/34] union-mount: Basic infrastructure of __union_lookup()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (25 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 26/34] union-mount: Temporarily disable some syscalls Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 28/34] union-mount: Process negative dentries in __union_lookup() Valerie Aurora
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Create a very simple version of union lookup.  This patch only looks
up the target in each layer of the union but does not process it in
any way.  Patches to do whiteouts, etc. follow.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2de4378..8373463 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -724,6 +724,74 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 	follow_mount(&nd->path);
 }
 
+static struct dentry *__lookup_hash(struct qstr *name, struct dentry *base,
+				    struct nameidata *nd);
+
+/*
+ * __lookup_union - Lookup and build union stack
+ *
+ * @nd - nameidata for the parent of @topmost
+ * @name - name of target
+ * @topmost - path of the target on the topmost file system
+ *
+ * Do the "union" part of lookup for @topmost - that is, look it up in
+ * the lower layers of its parent directory's union stack.  If
+ * @topmost is a directory, build its union stack.  @topmost is the
+ * path of the target in the topmost layer of the union file system.
+ * It is either a directory or a negative (non-whiteout) dentry.
+ * @topmost and its parent must have passed the needs_union_lookup()
+ * test.
+ *
+ * This function may stomp nd->path with the path of the parent
+ * directory of the lower layers, so the caller must save nd->path and
+ * restore it afterwards.
+ */
+
+static int __lookup_union(struct nameidata *nd, struct qstr *name,
+			  struct path *topmost)
+{
+	struct path lower, parent = nd->path;
+	struct path *path;
+	unsigned int i, layers = parent.dentry->d_sb->s_union_count;
+	int err = 0;
+
+	/*
+	 * Note: This loop iterates through the union stack of the
+	 * parent of the target, not the target itself.  This function
+	 * builds the union stack of the target (if any).  The union
+	 * stack of the root directory is built at mount.
+	 */
+	for (i = 0; i < layers; i++) {
+		/*
+		 * Get the parent directory for this layer and lookup
+		 * the target in it.
+		 */
+		path = union_find_dir(parent.dentry, i);
+		if (!path->mnt)
+			continue;
+
+		nd->path = *path;
+		lower.mnt = mntget(nd->path.mnt);
+		mutex_lock(&nd->path.dentry->d_inode->i_mutex);
+		lower.dentry = __lookup_hash(name, nd->path.dentry, nd);
+		mutex_unlock(&nd->path.dentry->d_inode->i_mutex);
+
+		if (IS_ERR(lower.dentry)) {
+			mntput(lower.mnt);
+			err = PTR_ERR(lower.dentry);
+			goto out_err;
+		}
+		/* XXX - do nothing, lookup rule processing in later patches */
+		path_put(&lower);
+	}
+	return 0;
+
+out_err:
+	d_free_unions(topmost->dentry);
+	path_put(&lower);
+	return err;
+}
+
 /*
  *  It's more convoluted than I'd like it to be, but... it's still fairly
  *  small and for now I'd prefer to have fast path as straight as possible.
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 28/34] union-mount: Process negative dentries in __union_lookup()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (26 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 27/34] union-mount: Basic infrastructure of __union_lookup() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 29/34] union-mount: Return files found in lower layers " Valerie Aurora
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Whiteouts end a union lookup.  So do opaque directories, unless a
specific fallthru entry exists for this name.
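
The decision for a negative dentry can be sketched as a pure function.  This is a userspace illustration; struct neg_entry and negative_entry_step() are hypothetical names that flatten the d_is_whiteout(), d_is_fallthru() and IS_OPAQUE() tests into three booleans.

```c
#include <stdbool.h>

enum lookup_step { STEP_NEXT_LAYER, STEP_DONE };

/* Hypothetical flattened view of a negative dentry found in one layer. */
struct neg_entry {
	bool whiteout;      /* d_is_whiteout() analogue */
	bool fallthru;      /* d_is_fallthru() analogue */
	bool parent_opaque; /* IS_OPAQUE() on this layer's parent directory */
};

/* Decide whether union lookup continues below this negative entry. */
enum lookup_step negative_entry_step(const struct neg_entry *e)
{
	if (e->whiteout)
		return STEP_DONE;       /* whiteout covers everything below */
	if (e->parent_opaque && !e->fallthru)
		return STEP_DONE;       /* opaque dir hides lower layers */
	return STEP_NEXT_LAYER;         /* plain negative or fallthru: keep going */
}
```

Note that a fallthru entry only matters inside an opaque directory; elsewhere it behaves like a plain negative dentry in this model.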

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8373463..f6ad8b3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -781,11 +781,32 @@ static int __lookup_union(struct nameidata *nd, struct qstr *name,
 			err = PTR_ERR(lower.dentry);
 			goto out_err;
 		}
-		/* XXX - do nothing, lookup rule processing in later patches */
+
+		/*
+		 * A negative dentry can mean several things.  A plain
+		 * negative dentry is ignored and lookup continues to
+		 * the next layer.  But a whiteout or a non-fallthru
+		 * in an opaque dir covers everything below it.
+		 */
+		if (!lower.dentry->d_inode) {
+			if (d_is_whiteout(lower.dentry))
+				goto out_lookup_done;
+			if (IS_OPAQUE(nd->path.dentry->d_inode) &&
+			    !d_is_fallthru(lower.dentry))
+				goto out_lookup_done;
+			path_put(&lower);
+			continue;
+		}
+
+		/* XXX - do nothing, more in later patches */
 		path_put(&lower);
 	}
 	return 0;
 
+out_lookup_done:
+	path_put(&lower);
+	return 0;
+
 out_err:
 	d_free_unions(topmost->dentry);
 	path_put(&lower);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 29/34] union-mount: Return files found in lower layers in __union_lookup()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (27 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 28/34] union-mount: Process negative dentries in __union_lookup() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 30/34] union-mount: Build union stack in __lookup_union() Valerie Aurora
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

If we find a file during union lookup, don't look in any lower layers
and replace the topmost path with the file's path.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f6ad8b3..c6696d8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -798,11 +798,34 @@ static int __lookup_union(struct nameidata *nd, struct qstr *name,
 			continue;
 		}
 
+		/*
+		 * Files block everything below them.  Special case:
+		 * If we find a file below a directory (which makes no
+		 * sense), just ignore the file and return the
+		 * directory above it.
+		 */
+		if (!S_ISDIR(lower.dentry->d_inode->i_mode)) {
+			if (topmost->dentry->d_inode &&
+			    S_ISDIR(topmost->dentry->d_inode->i_mode))
+				goto out_lookup_done;
+			goto out_found_file;
+		}
+
 		/* XXX - do nothing, more in later patches */
 		path_put(&lower);
 	}
 	return 0;
 
+out_found_file:
+	/*
+	 * Swap out the positive lower dentry with the negative upper
+	 * dentry for this file.  Note that the matching mntput() is done
+	 * in link_path_walk().
+	 */
+	dput(topmost->dentry);
+	*topmost = lower;
+	return 0;
+
 out_lookup_done:
 	path_put(&lower);
 	return 0;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 30/34] union-mount: Build union stack in __lookup_union()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (28 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 29/34] union-mount: Return files found in lower layers " Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 31/34] union-mount: Follow mount " Valerie Aurora
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Build the union stack for directories as we look them up.  Create the
topmost directory if it doesn't exist.
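
Putting the lookup rules together, the walk over the lower layers can be modeled in userspace roughly as follows.  Everything here (entry_kind, union_result, union_walk, MAX_LAYERS) is a hypothetical simplification: opaque directories and fallthrus are left out, and a whiteout stands for any negative entry that ends the lookup.

```c
#include <stddef.h>

enum entry_kind { ENT_NEGATIVE, ENT_WHITEOUT, ENT_FILE, ENT_DIR };

#define MAX_LAYERS 8

struct union_result {
	enum entry_kind kind;   /* what the lookup finally resolves to */
	int file_layer;         /* lower layer holding the file, or -1 */
	int stack[MAX_LAYERS];  /* lower layers unioned under a directory */
	size_t depth;           /* number of valid entries in stack[] */
	int created_topmost;    /* 1 if a topmost directory had to be created */
};

/*
 * topmost must be ENT_DIR or ENT_NEGATIVE, matching the precondition in
 * the __lookup_union() comment.  lower[0] is the layer directly below it.
 */
void union_walk(enum entry_kind topmost, const enum entry_kind *lower,
		size_t n, struct union_result *res)
{
	size_t i;

	res->kind = topmost;
	res->file_layer = -1;
	res->depth = 0;
	res->created_topmost = 0;

	for (i = 0; i < n && i < MAX_LAYERS; i++) {
		switch (lower[i]) {
		case ENT_NEGATIVE:
			continue;       /* nothing here, try the next layer */
		case ENT_WHITEOUT:
			return;         /* covers everything below */
		case ENT_FILE:
			/* A file below a directory is ignored. */
			if (res->kind != ENT_DIR) {
				res->kind = ENT_FILE;
				res->file_layer = (int)i;
			}
			return;         /* files block everything below */
		case ENT_DIR:
			/* Create the matching topmost directory on demand. */
			if (res->kind != ENT_DIR) {
				res->kind = ENT_DIR;
				res->created_topmost = 1;
			}
			res->stack[res->depth++] = (int)i;
			break;
		}
	}
}
```

For example, with a negative topmost entry and lower layers of (negative, directory, file, directory), this model creates a topmost directory, unions only the directory in layer 1, and stops at the file in layer 2.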

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c6696d8..0041334 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -811,8 +811,22 @@ static int __lookup_union(struct nameidata *nd, struct qstr *name,
 			goto out_found_file;
 		}
 
-		/* XXX - do nothing, more in later patches */
-		path_put(&lower);
+		/*
+		 * Now we know the target is a directory.  Create a
+		 * matching topmost directory if one doesn't already
+		 * exist, and add this layer's directory to the union
+		 * stack for the topmost directory.
+		 */
+		if (!topmost->dentry->d_inode) {
+			err = union_create_topmost_dir(&parent, name, topmost,
+						       &lower);
+			if (err)
+				goto out_err;
+		}
+
+		err = union_add_dir(topmost, &lower, i);
+		if (err)
+			goto out_err;
 	}
 	return 0;
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 31/34] union-mount: Follow mount in __lookup_union()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (29 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 30/34] union-mount: Build union stack in __lookup_union() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 32/34] union-mount: Add lookup_union() wrapper for __lookup_union() Valerie Aurora
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

In order for read-only layers of a union to have submounts, we have to
follow mounts on directories in union lookup.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0041334..cdff001 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -811,6 +811,8 @@ static int __lookup_union(struct nameidata *nd, struct qstr *name,
 			goto out_found_file;
 		}
 
+		follow_mount(&lower);
+
 		/*
 		 * Now we know the target is a directory.  Create a
 		 * matching topmost directory if one doesn't already
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 32/34] union-mount: Add lookup_union() wrapper for __lookup_union()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (30 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 31/34] union-mount: Follow mount " Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 33/34] union-mount: Add do_lookup_union() " Valerie Aurora
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

__lookup_union() may overwrite the parent's path in the nameidata
struct for the entry being looked up.  This is because it reuses the
same nameidata to do lookups in each of the lower layer directories.
lookup_union() saves and restores the original parent's path.
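
The save/restore pattern is simple enough to show in a few lines of userspace C.  The types and functions below (fake_path, fake_nd, fake_inner_lookup, fake_lookup_union) are hypothetical stand-ins, with integers in place of vfsmount and dentry pointers.

```c
/* Hypothetical stand-ins for struct path and struct nameidata. */
struct fake_path {
	int mnt;
	int dentry;
};

struct fake_nd {
	struct fake_path path;
};

/* Plays the role of __lookup_union(): clobbers nd->path once per layer. */
int fake_inner_lookup(struct fake_nd *nd, int layers)
{
	int i;

	for (i = 1; i <= layers; i++) {
		nd->path.mnt = 100 + i;
		nd->path.dentry = 200 + i;
	}
	return 0;
}

/* Plays the role of lookup_union(): the caller's parent path survives. */
int fake_lookup_union(struct fake_nd *nd, int layers)
{
	struct fake_path saved = nd->path; /* struct copy, as in the patch */
	int err = fake_inner_lookup(nd, layers);

	nd->path = saved;                  /* restore on success and on error */
	return err;
}
```

Restoring unconditionally keeps the wrapper's contract the same whether the inner lookup succeeds or fails.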

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cdff001..ecb1796 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -853,6 +853,33 @@ out_err:
 }
 
 /*
+ * lookup_union - revalidate and build union stack for this path
+ *
+ * We borrow the nameidata struct from the topmost layer to do the
+ * revalidation on lower dentries, replacing the topmost parent
+ * directory's path with that of the matching parent dir in each lower
+ * layer.  This wrapper for __lookup_union() saves the topmost layer's
+ * path and restores it when we are done.
+ */
+static int lookup_union(struct nameidata *nd, struct qstr *name,
+			struct path *topmost)
+{
+	struct path saved_path;
+	int err;
+
+	BUG_ON(!IS_MNT_UNION(nd->path.mnt) && !IS_MNT_UNION(topmost->mnt));
+	BUG_ON(!mutex_is_locked(&nd->path.dentry->d_inode->i_mutex));
+
+	saved_path = nd->path;
+
+	err = __lookup_union(nd, name, topmost);
+
+	nd->path = saved_path;
+
+	return err;
+}
+
+/*
  *  It's more convoluted than I'd like it to be, but... it's still fairly
  *  small and for now I'd prefer to have fast path as straight as possible.
  *  It _is_ time-critical.
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 33/34] union-mount: Add do_lookup_union() wrapper for __lookup_union()
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (31 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 32/34] union-mount: Add lookup_union() wrapper for __lookup_union() Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-16 22:12 ` [PATCH 34/34] union-mount: Call union lookup functions in lookup path Valerie Aurora
  2010-09-21  0:02 ` [PATCH -1/34] VFS: Add hard read-only users count to superblock Valerie Aurora
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

do_lookup_union() locks the parent directory and follows the mount
after lookup.  It is appropriate for calling from do_lookup().

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ecb1796..7656442 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -880,6 +880,28 @@ static int lookup_union(struct nameidata *nd, struct qstr *name,
 }
 
 /*
+ * do_lookup_union - union mount-aware part of do_lookup
+ *
+ * do_lookup()-style wrapper for lookup_union().  Follows mounts.
+ */
+
+static int do_lookup_union(struct nameidata *nd, struct qstr *name,
+			   struct path *topmost)
+{
+	struct dentry *parent = nd->path.dentry;
+	struct inode *dir = parent->d_inode;
+	int err;
+
+	mutex_lock(&dir->i_mutex);
+	err = lookup_union(nd, name, topmost);
+	mutex_unlock(&dir->i_mutex);
+
+	__follow_mount(topmost);
+
+	return err;
+}
+
+/*
  *  It's more convoluted than I'd like it to be, but... it's still fairly
  *  small and for now I'd prefer to have fast path as straight as possible.
  *  It _is_ time-critical.
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 34/34] union-mount: Call union lookup functions in lookup path
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (32 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 33/34] union-mount: Add do_lookup_union() " Valerie Aurora
@ 2010-09-16 22:12 ` Valerie Aurora
  2010-09-21  0:02 ` [PATCH -1/34] VFS: Add hard read-only users count to superblock Valerie Aurora
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-16 22:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel, Valerie Aurora

Union mounts hook into the lookup path in two places: do_lookup() and
lookup_hash().

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7656442..bbce934 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -931,6 +931,11 @@ done:
 	path->mnt = mnt;
 	path->dentry = dentry;
 	__follow_mount(path);
+	if (needs_lookup_union(&nd->path, path)) {
+		int err = do_lookup_union(nd, name, path);
+		if (err < 0)
+			return err;
+	}
 	return 0;
 
 need_lookup:
@@ -1402,8 +1407,13 @@ static int lookup_hash(struct nameidata *nd, struct qstr *name,
 		err = PTR_ERR(path->dentry);
 		path->dentry = NULL;
 		path->mnt = NULL;
+		return err;
 	}
+
+	if (needs_lookup_union(&nd->path, path))
+		err = lookup_union(nd, name, path);
 	return err;
+
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
       [not found]   ` <AANLkTim1bbGrrPcFHThx3XOm8GmudQFSmFUs3NAXT5yC@mail.gmail.com>
@ 2010-09-17  4:34     ` Ram Pai
  2010-09-17 17:15       ` Valerie Aurora
  0 siblings, 1 reply; 59+ messages in thread
From: Ram Pai @ 2010-09-17  4:34 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Ram Pai, Alexander Viro, Miklos Szeredi, Christoph Hellwig,
	Andreas Gruenbacher, Nick Piggin, linux-kernel, linux-fsdevel

On Thu, Sep 16, 2010 at 05:09:58PM -0700, Ram Pai wrote:
> On Thu, Sep 16, 2010 at 3:11 PM, Valerie Aurora <vaurora@redhat.com> wrote:
> 
> > Passing the CL_NO_SLAVE flag to clone_mnt() causes the clone
> > to fail if the source mnt is a slave.
> >
> > Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> > ---
> >  fs/namespace.c |    3 +++
> >  fs/pnode.h     |    1 +
> >  2 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index eeb4c22..6956062 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -565,6 +565,9 @@ static struct vfsmount *clone_mnt(struct vfsmount *old,
> > struct dentry *root,
> >        if ((flag & CL_NO_SHARED) && (IS_MNT_SHARED(old)))
> >                return ERR_PTR(-EINVAL);
> >
> > +       if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old)))
> > +               return ERR_PTR(-EINVAL);
> > +
> >
> 
> 
> its been a while and my memory may have corroded.  But I dont think this
> check is needed. Because cloning a 'slave mount' makes the mount a 'private
> mount' and not a 'slave mount'.

There is one case where a 'slave mount' when cloned can generate a 'slave mount', and
that is when the 'slave mount' is also a 'shared mount'. So the above check has to
be

       if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old) && IS_MNT_SHARED(old)))
               return ERR_PTR(-EINVAL);
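
To make the difference between the two checks concrete, here is a small userspace sketch.  EX_CL_NO_SLAVE is an illustrative value, and struct ex_mnt with its two booleans is a hypothetical stand-in for the IS_MNT_SLAVE() and IS_MNT_SHARED() tests on a vfsmount.

```c
#include <stdbool.h>

#define EX_CL_NO_SLAVE 0x1 /* illustrative value, not the kernel's */

/* Hypothetical stand-in for the propagation state of a vfsmount. */
struct ex_mnt {
	bool slave;  /* IS_MNT_SLAVE() analogue */
	bool shared; /* IS_MNT_SHARED() analogue */
};

/* Check as posted in the patch: reject every slave mount. */
bool rejects_as_posted(int flag, const struct ex_mnt *m)
{
	return (flag & EX_CL_NO_SLAVE) && m->slave;
}

/* Check as suggested in this reply: reject only slave-and-shared mounts. */
bool rejects_as_suggested(int flag, const struct ex_mnt *m)
{
	return (flag & EX_CL_NO_SLAVE) && m->slave && m->shared;
}
```

The two variants differ only for a mount that is a slave but not shared: the posted check rejects it, the suggested one lets it through.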

RP

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2010-09-17  4:34     ` Ram Pai
@ 2010-09-17 17:15       ` Valerie Aurora
  2010-09-20  5:25         ` Ram Pai
  0 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-09-17 17:15 UTC (permalink / raw)
  To: Ram Pai
  Cc: Ram Pai, Alexander Viro, Miklos Szeredi, Christoph Hellwig,
	Andreas Gruenbacher, Nick Piggin, linux-kernel, linux-fsdevel

On Thu, Sep 16, 2010 at 09:34:01PM -0700, Ram Pai wrote:
> On Thu, Sep 16, 2010 at 05:09:58PM -0700, Ram Pai wrote:
> > On Thu, Sep 16, 2010 at 3:11 PM, Valerie Aurora <vaurora@redhat.com> wrote:
> > 
> > > Passing the CL_NO_SLAVE flag to clone_mnt() causes the clone
> > > to fail if the source mnt is a slave.
> > >
> > > Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> > > ---
> > >  fs/namespace.c |    3 +++
> > >  fs/pnode.h     |    1 +
> > >  2 files changed, 4 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > index eeb4c22..6956062 100644
> > > --- a/fs/namespace.c
> > > +++ b/fs/namespace.c
> > > @@ -565,6 +565,9 @@ static struct vfsmount *clone_mnt(struct vfsmount *old,
> > > struct dentry *root,
> > >        if ((flag & CL_NO_SHARED) && (IS_MNT_SHARED(old)))
> > >                return ERR_PTR(-EINVAL);
> > >
> > > +       if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old)))
> > > +               return ERR_PTR(-EINVAL);
> > > +
> > >
> > 
> > 
> > its been a while and my memory may have corroded.  But I dont think this
> > check is needed. Because cloning a 'slave mount' makes the mount a 'private
> > mount' and not a 'slave mount'.
>
> There is one case where a 'slave mount' when cloned can generate a 'slave mount', and
> that is when the 'slave mount' is also a 'shared mount'. So the above check has to
> be
> 
>        if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old) && IS_MNT_SHARED(old)))
>                return ERR_PTR(-EINVAL);

Hey Ram,

I added this flag for union mounts. Union mounts can't deal with
namespace changes in the read-only layers, so we don't allow union of
read-only mounts that are the target of propagation events (shared or
slave).

We could automatically convert all slave or shared mounts into private
mounts when we clone the mounts, but that would surprise an
administrator who carefully set up their shared or slave read-only
mounts before unioning them.  So instead of silently converting slave
or shared to private, we error out.  Does that make sense?

All that being said, I debated how to do this cleanly and I'm still
not satisfied.  My goal is to both check and clone the proposed
read-only layers in one pass.  Without these flags, I had to do four
passes:

1. Find the "lowest" read-only mount at this mountpoint.
2. Check each mount for read-only, not shared, not slave.
3. Clone the subtree starting at the "lowest" mount.
4. Recheck the cloned tree for rules in #2.

One of the reasons I had to do it this way is that you can't hold
vfsmount_lock while calling copy_tree(), so the mount flags can change
between the first check in #2 and the copy_tree() in #3.  Also
sb->s_flags can change.  One of the problems with the current code is
that it can't deal with cloning existing union mounts, which we need
if we are to make bind mounts work (see do_loopback()).
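
A minimal userspace sketch of the one-pass idea described above, not
kernel code: the check on the source mount happens inside the clone
step, so a bad layer aborts the copy without a separate validation
pass.  The toy_* names and the flag values are hypothetical; only the
CL_NO_SHARED and CL_NO_SLAVE flag names they mirror come from the
patches.

```c
/* Userspace model only; none of these types are the kernel's. */
#include <assert.h>
#include <errno.h>

#define TOY_CL_NO_SHARED 0x1	/* hypothetical value */
#define TOY_CL_NO_SLAVE  0x2	/* hypothetical value */

struct toy_mnt {
	int shared;	/* models IS_MNT_SHARED() */
	int slave;	/* models IS_MNT_SLAVE() */
};

/* Check the source and copy it in the same step: 0 or -EINVAL. */
static int toy_clone(const struct toy_mnt *old, struct toy_mnt *new, int flag)
{
	if ((flag & TOY_CL_NO_SHARED) && old->shared)
		return -EINVAL;
	if ((flag & TOY_CL_NO_SLAVE) && old->slave)
		return -EINVAL;
	*new = *old;
	return 0;
}

/* One pass over the source mounts; the first bad mount aborts the copy. */
static int toy_copy_all(const struct toy_mnt *src, struct toy_mnt *dst,
			int n, int flag)
{
	for (int i = 0; i < n; i++) {
		int err = toy_clone(&src[i], &dst[i], flag);

		if (err)
			return err;
	}
	return 0;
}
```

Because the check and the copy are a single step per mount, there is
no window between "validated" and "cloned" for that mount in this
model.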

Anyway, if you have any ideas, I'm all ears.

Thanks for reviewing,

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2010-09-17 17:15       ` Valerie Aurora
@ 2010-09-20  5:25         ` Ram Pai
  2010-09-21  0:03           ` Valerie Aurora
  0 siblings, 1 reply; 59+ messages in thread
From: Ram Pai @ 2010-09-20  5:25 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Ram Pai, Alexander Viro, Miklos Szeredi, Christoph Hellwig,
	Andreas Gruenbacher, Nick Piggin, linux-kernel, linux-fsdevel

On Fri, Sep 17, 2010 at 01:15:14PM -0400, Valerie Aurora wrote:
> On Thu, Sep 16, 2010 at 09:34:01PM -0700, Ram Pai wrote:
> > On Thu, Sep 16, 2010 at 05:09:58PM -0700, Ram Pai wrote:
> > > On Thu, Sep 16, 2010 at 3:11 PM, Valerie Aurora <vaurora@redhat.com> wrote:
> > > 
> > > > Passing the CL_NO_SLAVE flag to clone_mnt() causes the clone
> > > > to fail if the source mnt is a slave.
> > > >
> > > > Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> > > > ---
> > > >  fs/namespace.c |    3 +++
> > > >  fs/pnode.h     |    1 +
> > > >  2 files changed, 4 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > index eeb4c22..6956062 100644
> > > > --- a/fs/namespace.c
> > > > +++ b/fs/namespace.c
> > > > @@ -565,6 +565,9 @@ static struct vfsmount *clone_mnt(struct vfsmount *old,
> > > > struct dentry *root,
> > > >        if ((flag & CL_NO_SHARED) && (IS_MNT_SHARED(old)))
> > > >                return ERR_PTR(-EINVAL);
> > > >
> > > > +       if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old)))
> > > > +               return ERR_PTR(-EINVAL);
> > > > +
> > > >
> > > 
> > > 
> > > It's been a while and my memory may have corroded.  But I don't think this
> > > check is needed. Because cloning a 'slave mount' makes the mount a 'private
> > > mount' and not a 'slave mount'.
> >
> > There is one case where a 'slave mount' when cloned can generate a 'slave mount', and
> > that is when the 'slave mount' is also a 'shared mount'. So the above check has to
> > be
> > 
> >        if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old) && IS_MNT_SHARED(old)))
> >                return ERR_PTR(-EINVAL);
> 
> Hey Ram,
> 
> I added this flag for union mounts. Union mounts can't deal with
> namespace changes in the read-only layers, so we don't allow union of
> read-only mounts that are the target of propagation events (shared or
> slave).
> 
> We could automatically convert all slave or shared mounts into private
> mounts when we clone the mounts, but that would surprise an
> administrator who carefully set up their shared or slave read-only
> mounts before unioning them.  So instead of silently converting slave
> or shared to private, we error out.  Does that make sense?

I understand your intentions, but I think you are making a wrong assumption.
You seem to be thinking that if a slave-mount is cloned, the new cloned
mount will also be a slave-mount and will hence receive propagations. As
per shared subtree semantics, a slave-mount when cloned will create a private
mount. Since your intention is to avoid generating any new mounts that
receive propagations, you should be checking for shared-mounts and
slave-shared-mounts because these are the two kinds of mounts that when
cloned create new mounts that receive propagation.

btw: slave-shared-mount is a mount that is shared and is also a slave of
a shared mount.

> 
> All that being said, I debated how to do this cleanly and I'm still
> not satisfied.  My goal is to both check and clone the proposed
> read-only layers in one pass.  Without these flags, I had to do four
> passes:
> 
> 1. Find the "lowest" read-only mount at this mountpoint.
> 2. Check each mount for read-only, not shared, not slave.
> 3. Clone the subtree starting at the "lowest" mount.
> 4. Recheck the cloned tree for rules in #2.
> 
> One of the reasons I had to do it this way is that you can't hold
> vfsmount_lock while calling copy_tree(), so the mount flags can change
> between the first check in #2 and the copy_tree() in #3.  Also
> sb->s_flags can change.

Isn't this whole operation done under the protection of namespace_sem?
I know that shared/slave flags can't change if the namespace_sem is held.
The same may also be true for sb->s_flags.


> One of the problems with the current code is
> that it can't deal with cloning existing union mounts, which we need
> if we are to make bind mounts work (see do_loopback()).

If I understand your union mount semantics correctly, you don't allow the
same filesystem to be union mounted rw in two different locations.  Correct?
If yes, then bind mount of a union-mount has to be disallowed.

RP

> 
> Anyway, if you have any ideas, I'm all ears.
> 
> Thanks for reviewing,
> 
> -VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-09-16 22:11 ` [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes Valerie Aurora
@ 2010-09-20 21:26   ` Andreas Gruenbacher
  2010-09-21 18:53     ` Valerie Aurora
  2010-09-30  9:51   ` Miklos Szeredi
  1 sibling, 1 reply; 59+ messages in thread
From: Andreas Gruenbacher @ 2010-09-20 21:26 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Alexander Viro, Miklos Szeredi, Christoph Hellwig, Nick Piggin,
	linux-kernel, linux-fsdevel

collect_mounts() now also returns error pointers instead of NULL upon
failure:

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 46a57b5..898da28 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -579,7 +579,7 @@ void audit_trim_trees(void)
 
 		root_mnt = collect_mounts(&path);
 		path_put(&path);
-		if (!root_mnt)
+		if (IS_ERR(root_mnt))
 			goto skip_it;
 
 		spin_lock(&hash_lock);
@@ -651,8 +651,8 @@ int audit_add_tree_rule(struct audit_krule *rule)
 		goto Err;
 	mnt = collect_mounts(&path);
 	path_put(&path);
-	if (!mnt) {
-		err = -ENOMEM;
+	if (IS_ERR(mnt)) {
+		err = PTR_ERR(mnt);
 		goto Err;
 	}
 
@@ -701,8 +701,8 @@ int audit_tag_tree(char *old, char *new)
 		return err;
 	tagged = collect_mounts(&path2);
 	path_put(&path2);
-	if (!tagged)
-		return -ENOMEM;
+	if (IS_ERR(tagged))
+		return PTR_ERR(tagged);
 
 	err = kern_path(old, 0, &path1);
 	if (err) {
-- 
1.7.3.rc2
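
For readers unfamiliar with the error-pointer convention this fix
relies on, here is a simplified userspace model of it.  The toy_*
helpers are hypothetical stand-ins written for illustration, not the
kernel's ERR_PTR()/PTR_ERR()/IS_ERR() definitions; the point is only
that an encoded error is a non-NULL pointer, so an `if (!mnt)` test
no longer catches the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095	/* errors occupy the top 4095 pointer values */

/* Encode a negative errno value in a pointer. */
static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the reserved error range; NULL is not one. */
static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

static int toy_object;

/* Stand-in for a function that used to return NULL on failure. */
static void *toy_collect(int fail)
{
	return fail ? toy_err_ptr(-ENOMEM) : (void *)&toy_object;
}

/* Caller converted the same way as the audit_tree.c hunks above. */
static int toy_caller(int fail)
{
	void *mnt = toy_collect(fail);

	if (toy_is_err(mnt))
		return (int)toy_ptr_err(mnt);
	return 0;
}
```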

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH -1/34] VFS: Add hard read-only users count to superblock
  2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
                   ` (33 preceding siblings ...)
  2010-09-16 22:12 ` [PATCH 34/34] union-mount: Call union lookup functions in lookup path Valerie Aurora
@ 2010-09-21  0:02 ` Valerie Aurora
  34 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-21  0:02 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Andreas Gruenbacher,
	Nick Piggin, linux-kernel, linux-fsdevel

On Thu, Sep 16, 2010 at 03:11:51PM -0700, Valerie Aurora wrote:
> 
> Against 2.6.35.  The rest of the series (whiteouts, fallthrus,
> soon-to-be-obsolete copyup, etc.) is in branch "split_lookup"
> in:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git

I left out the first patch in this part of the series (there are
patches before and after this one in the git branch).  Here is
patch -1 in the queue:

Subject: VFS: Add hard read-only users count to superblock

While we can check if a file system is currently read-only, we can't
guarantee that it will stay read-only.  The file system can be mounted
or remounted read-write at any time.  This is a problem for union
mounts, which require the underlying file system be read-only for the
entire duration of the union mount.

Add a hard read-only users count to the superblock.  When this count
is non-zero, don't allow any read-write mounts of this super, or any
read-write remounts of existing mounts.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>


---
 fs/super.c         |    8 ++++++++
 include/linux/fs.h |    7 +++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 938119a..3f2df09 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -108,6 +108,7 @@ out:
  */
 static inline void destroy_super(struct super_block *s)
 {
+	BUG_ON(s->s_hard_readonly_users);
 	security_sb_free(s);
 	kfree(s->s_subtype);
 	kfree(s->s_options);
@@ -550,6 +551,9 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 			return -EBUSY;
 	}
 
+	if (!(flags & MS_RDONLY) && sb->s_hard_readonly_users)
+		return -EROFS;
+
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval)
@@ -924,6 +928,10 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	WARN((mnt->mnt_sb->s_maxbytes < 0), "%s set sb->s_maxbytes to "
 		"negative value (%lld)\n", type->name, mnt->mnt_sb->s_maxbytes);
 
+	error = -EROFS;
+	if (!(flags & MS_RDONLY) && mnt->mnt_sb->s_hard_readonly_users)
+		goto out_sb;
+
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	mnt->mnt_parent = mnt;
 	up_write(&mnt->mnt_sb->s_umount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1ed7fe8..7dcb95b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1389,6 +1389,13 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/*
+	 * Number of mounts requiring that the underlying file system
+	 * never transition to read-write.  Protected by s_umount.
+	 * Decremented by free_vfsmnt() if MNT_HARD_READONLY is set.
+	 */
+	int s_hard_readonly_users;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
-- 
1.6.3.3
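
The rule this patch enforces can be sketched in userspace as follows.
This is an illustrative model, not the kernel code: struct toy_super,
TOY_MS_RDONLY and the pin/unpin helpers are hypothetical, and the real
increment/decrement sites live in other patches of the series.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MS_RDONLY 1		/* hypothetical stand-in for MS_RDONLY */

struct toy_super {
	int s_hard_readonly_users;	/* mounts that need this sb read-only */
	int s_flags;
};

/* Mirrors the added check: refuse any transition to read-write. */
static int toy_remount(struct toy_super *sb, int flags)
{
	if (!(flags & TOY_MS_RDONLY) && sb->s_hard_readonly_users)
		return -EROFS;
	sb->s_flags = flags;
	return 0;
}

/* Hypothetical helpers: a union mount pins each lower layer's sb. */
static void toy_pin_readonly(struct toy_super *sb)
{
	sb->s_hard_readonly_users++;
}

static void toy_unpin_readonly(struct toy_super *sb)
{
	sb->s_hard_readonly_users--;
}
```

While the count is non-zero a read-write remount fails with -EROFS;
once the last user is gone the same remount succeeds.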


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2010-09-20  5:25         ` Ram Pai
@ 2010-09-21  0:03           ` Valerie Aurora
  2010-09-27  5:42             ` Ram Pai
  0 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-09-21  0:03 UTC (permalink / raw)
  To: Ram Pai
  Cc: Ram Pai, Alexander Viro, Miklos Szeredi, Christoph Hellwig,
	Andreas Gruenbacher, Nick Piggin, linux-kernel, linux-fsdevel

On Sun, Sep 19, 2010 at 10:25:53PM -0700, Ram Pai wrote:
> 
> I understand your intentions, but I think you are making a wrong assumption.
> You seem to be thinking that if a slave-mount is cloned, the new cloned
> mount will also be a slave-mount and will hence receive propagations. As
> per shared subtree semantics, a slave-mount when cloned will create a private
> mount. Since your intention is to avoid generating any new mounts that
> receive propagations, you should be checking for shared-mounts and
> slave-shared-mounts because these are the two kinds of mounts that when
> cloned create new mounts that receive propagation.

No.  This isn't about the semantics of the clone mount operation.  It
is about the administrator creating a slave mount, unioning it, and
then being surprised when the unioned file system does not receive
mount propagation events.

Think of the source vfsmount tree as a set of command line arguments
for the union mount.

> > One of the reasons I had to do it this way is that you can't hold
> > vfsmount_lock while calling copy_tree(), so the mount flags can change
> > between the first check in #2 and the copy_tree() in #3.  Also
> > sb->s_flags can change.
> 
> Isn't this whole operation done under the protection of namespace_sem? 
> I know that shared/slave flags can't change if the namespace_sem is held. 
> The same may also be true for sb->s_flags.

namespace_sem only covers the shared/slave mount flags.  We also care
about MNT_READONLY, which is protected by vfsmount_lock.  sb->s_flags
is protected by sb->s_umount and not namespace_sem.

We could do the shared/slave check outside of clone_mnt(), but it
would require two passes over the source vfsmount tree.

> > One of the problems with the current code is
> > that it can't deal with cloning existing union mounts, which we need
> > if we are to make bind mounts work (see do_loopback()).
> 
> If I understand your union mount semantics correctly, you don't allow the
> same filesystem to be union mounted rw in two different locations.  Correct?
> If yes, then bind mount of a union-mount has to be disallowed.

A bind mount of part of a union file system (which is itself unioned
and includes the lower layers) doesn't have the same problems as an
independent mount of the rw layer.  Mainly, the union stack will be
the same for both.  I'm not sure it will work but it won't fail for
the same reasons as a separate mount of the rw layer alone.

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-09-20 21:26   ` Andreas Gruenbacher
@ 2010-09-21 18:53     ` Valerie Aurora
  0 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-21 18:53 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Alexander Viro, Miklos Szeredi, Christoph Hellwig, Nick Piggin,
	linux-kernel, linux-fsdevel

On Mon, Sep 20, 2010 at 11:26:42PM +0200, Andreas Gruenbacher wrote:
> collect_mounts() now also returns error pointers instead of NULL upon
> failure:
> 
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 46a57b5..898da28 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -579,7 +579,7 @@ void audit_trim_trees(void)
>  
>  		root_mnt = collect_mounts(&path);
>  		path_put(&path);
> -		if (!root_mnt)
> +		if (IS_ERR(root_mnt))
>  			goto skip_it;
>  
>  		spin_lock(&hash_lock);
> @@ -651,8 +651,8 @@ int audit_add_tree_rule(struct audit_krule *rule)
>  		goto Err;
>  	mnt = collect_mounts(&path);
>  	path_put(&path);
> -	if (!mnt) {
> -		err = -ENOMEM;
> +	if (IS_ERR(mnt)) {
> +		err = PTR_ERR(mnt);
>  		goto Err;
>  	}
>  
> @@ -701,8 +701,8 @@ int audit_tag_tree(char *old, char *new)
>  		return err;
>  	tagged = collect_mounts(&path2);
>  	path_put(&path2);
> -	if (!tagged)
> -		return -ENOMEM;
> +	if (IS_ERR(tagged))
> +		return PTR_ERR(tagged);
>  
>  	err = kern_path(old, 0, &path1);
>  	if (err) {

Thanks for the fix, folded in.

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2010-09-21  0:03           ` Valerie Aurora
@ 2010-09-27  5:42             ` Ram Pai
  2010-09-27 18:50               ` Valerie Aurora
  0 siblings, 1 reply; 59+ messages in thread
From: Ram Pai @ 2010-09-27  5:42 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Ram Pai, Ram Pai, Alexander Viro, Miklos Szeredi,
	Christoph Hellwig, Andreas Gruenbacher, Nick Piggin,
	linux-kernel, linux-fsdevel

On Mon, Sep 20, 2010 at 08:03:44PM -0400, Valerie Aurora wrote:
> On Sun, Sep 19, 2010 at 10:25:53PM -0700, Ram Pai wrote:
> > 
> > I understand your intentions, but I think you are making a wrong assumption.
> > You seem to be thinking that if a slave-mount is cloned, the new cloned
> > mount will also be a slave-mount and will hence receive propagations. As
> > per shared subtree semantics, a slave-mount when cloned will create a private
> > mount. Since your intention is to avoid generating any new mounts that
> > receive propagations, you should be checking for shared-mounts and
> > slave-shared-mounts because these are the two kinds of mounts that when
> > cloned create new mounts that receive propagation.
> 
> No.  This isn't about the semantics of the clone mount operation.  It
> is about the administrator creating a slave mount, unioning it, and
> then being surprised when the unioned file system does not receive
> mount propagation events.
> 
> Think of the source vfsmount tree as a set of command line arguments
> for the union mount.

Ok. In that case, you introduced a subtle change in the semantics of clone_mnt().
As I understand it, the flags parameter of clone_mnt() is meant to be a modifier
for the cloned mount, not a filter on the source mount.

RP

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2010-09-27  5:42             ` Ram Pai
@ 2010-09-27 18:50               ` Valerie Aurora
  2010-10-01  0:44                 ` Ram Pai
  0 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-09-27 18:50 UTC (permalink / raw)
  To: Ram Pai
  Cc: Ram Pai, Alexander Viro, Miklos Szeredi, Christoph Hellwig,
	Andreas Gruenbacher, Nick Piggin, linux-kernel, linux-fsdevel

On Sun, Sep 26, 2010 at 10:42:05PM -0700, Ram Pai wrote:
> On Mon, Sep 20, 2010 at 08:03:44PM -0400, Valerie Aurora wrote:
> > On Sun, Sep 19, 2010 at 10:25:53PM -0700, Ram Pai wrote:
> > > 
> > > I understand your intentions, but I think you are making a wrong assumption.
> > > You seem to be thinking that if a slave-mount is cloned, the new cloned
> > > mount will also be a slave-mount and will hence receive propagations. As
> > > per shared subtree semantics, a slave-mount when cloned will create a private
> > > mount. Since your intention is to avoid generating any new mounts that
> > > receive propagations, you should be checking for shared-mounts and
> > > slave-shared-mounts because these are the two kinds of mounts that when
> > > cloned create new mounts that receive propagation.
> > 
> > No.  This isn't about the semantics of the clone mount operation.  It
> > is about the administrator creating a slave mount, unioning it, and
> > then being surprised when the unioned file system does not receive
> > mount propagation events.
> > 
> > Think of the source vfsmount tree as a set of command line arguments
> > for the union mount.
> 
> Ok. In that case, you introduced a subtle change in the semantics of clone_mnt().
> As I understand it, the flags parameter of clone_mnt() is meant to be a modifier
> for the cloned mount, not a filter on the source mount.

Yes, that's it exactly.

Do you have a suggestion for writing this a different way?  We can
move it all into copy_tree() and leave clone_mnt() alone, at the cost
of a little code duplication and some acrobatics around possible
loopback support.
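
One possible shape for that alternative, as a userspace sketch only:
the clone step stays a pure copy and the tree-copy loop applies the
source filter.  All toy_* names are hypothetical and nothing here is
taken from the actual copy_tree() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_src {
	bool shared;
	bool slave;
};

/* Pure copy: never rejects a source mount. */
static void toy_plain_clone(const struct toy_src *old, struct toy_src *new)
{
	*new = *old;
}

/* The filter on the source lives outside the clone step. */
static bool toy_source_ok(const struct toy_src *m)
{
	return !m->shared && !m->slave;
}

/* Copy loop: filter first, then clone, still one pass over the tree. */
static int toy_copy_filtered(const struct toy_src *src, struct toy_src *dst,
			     int n)
{
	for (int i = 0; i < n; i++) {
		if (!toy_source_ok(&src[i]))
			return -EINVAL;
		toy_plain_clone(&src[i], &dst[i]);
	}
	return 0;
}
```

The cost mentioned above shows up as the extra predicate that every
caller wanting the filter has to invoke itself.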

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 23/34] union-mount: Prevent topmost file system from being mounted elsewhere
  2010-09-16 22:12 ` [PATCH 23/34] union-mount: Prevent topmost file system from being mounted elsewhere Valerie Aurora
@ 2010-09-30  9:37   ` Miklos Szeredi
  2010-09-30 21:47     ` Valerie Aurora
  0 siblings, 1 reply; 59+ messages in thread
From: Miklos Szeredi @ 2010-09-30  9:37 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: viro, miklos, hch, agruen, npiggin, linux-kernel, linux-fsdevel, vaurora

On Thu, 16 Sep 2010, Valerie Aurora wrote:
> The device underlying the topmost read-write layer of a file system
> cannot be mounted anywhere else on the system.  We keep a pointer to
> the union stack in the dentry of the topmost directory, so that dentry
> can't be part of a different mount, since dentries are shared between
> different mounts of the same device.
> 
> Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> ---
>  fs/namespace.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 61256e6..26efaf3 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1998,6 +1998,11 @@ int do_add_mount(struct vfsmount *newmnt, struct path *path,
>  	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
>  		goto unlock;
>  
> +	/* Top layers of union mounts can't be mounted elsewhere */
> +	err = -EBUSY;
> +	if (newmnt->mnt_sb->s_union_lower_mnts)
> +		goto unlock;
> +

This is insufficient: the super block may be mounted elsewhere later.
And no, preventing bind mounts is not enough.

BTW, what about CLONE_NEWNS?  I think it's a rather big limitation if
that doesn't work...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-09-16 22:11 ` [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes Valerie Aurora
  2010-09-20 21:26   ` Andreas Gruenbacher
@ 2010-09-30  9:51   ` Miklos Szeredi
  2010-09-30 21:41     ` Valerie Aurora
  1 sibling, 1 reply; 59+ messages in thread
From: Miklos Szeredi @ 2010-09-30  9:51 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: viro, miklos, hch, agruen, npiggin, linux-kernel, linux-fsdevel, vaurora

On Thu, 16 Sep 2010, Valerie Aurora wrote:
> copy_tree() can theoretically fail in a case other than ENOMEM, but
> always returns NULL which is interpreted by callers as -ENOMEM.
> Convert to return an explicit error.  Convert clone_mnt() for
> consistency and because union mounts will add new error cases.

I think it makes sense to push this fix to 2.6.37 independently of the
other patches.

Acked-by: Miklos Szeredi <mszeredi@suse.cz>

> 
> Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> ---
>  fs/namespace.c |  111 ++++++++++++++++++++++++++++++--------------------------
>  fs/pnode.c     |    5 ++-
>  2 files changed, 63 insertions(+), 53 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e1ea335..5566524 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -559,53 +559,57 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
>  					int flag)
>  {
>  	struct super_block *sb = old->mnt_sb;
> -	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
> +	struct vfsmount *mnt;
> +	int err;
>  
> -	if (mnt) {
> -		if (flag & (CL_SLAVE | CL_PRIVATE))
> -			mnt->mnt_group_id = 0; /* not a peer of original */
> -		else
> -			mnt->mnt_group_id = old->mnt_group_id;
> -
> -		if ((flag & CL_MAKE_SHARED) && !mnt->mnt_group_id) {
> -			int err = mnt_alloc_group_id(mnt);
> -			if (err)
> -				goto out_free;
> -		}
> +	mnt = alloc_vfsmnt(old->mnt_devname);
> +	if (!mnt)
> +		return ERR_PTR(-ENOMEM);
>  
> -		mnt->mnt_flags = old->mnt_flags;
> -		atomic_inc(&sb->s_active);
> -		mnt->mnt_sb = sb;
> -		mnt->mnt_root = dget(root);
> -		mnt->mnt_mountpoint = mnt->mnt_root;
> -		mnt->mnt_parent = mnt;
> -
> -		if (flag & CL_SLAVE) {
> -			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
> -			mnt->mnt_master = old;
> -			CLEAR_MNT_SHARED(mnt);
> -		} else if (!(flag & CL_PRIVATE)) {
> -			if ((flag & CL_MAKE_SHARED) || IS_MNT_SHARED(old))
> -				list_add(&mnt->mnt_share, &old->mnt_share);
> -			if (IS_MNT_SLAVE(old))
> -				list_add(&mnt->mnt_slave, &old->mnt_slave);
> -			mnt->mnt_master = old->mnt_master;
> -		}
> -		if (flag & CL_MAKE_SHARED)
> -			set_mnt_shared(mnt);
> -
> -		/* stick the duplicate mount on the same expiry list
> -		 * as the original if that was on one */
> -		if (flag & CL_EXPIRE) {
> -			if (!list_empty(&old->mnt_expire))
> -				list_add(&mnt->mnt_expire, &old->mnt_expire);
> -		}
> +	if (flag & (CL_SLAVE | CL_PRIVATE))
> +		mnt->mnt_group_id = 0; /* not a peer of original */
> +	else
> +		mnt->mnt_group_id = old->mnt_group_id;
> +
> +	if ((flag & CL_MAKE_SHARED) && !mnt->mnt_group_id) {
> +		err = mnt_alloc_group_id(mnt);
> +		if (err)
> +			goto out_free;
>  	}
> +
> +	mnt->mnt_flags = old->mnt_flags;
> +	atomic_inc(&sb->s_active);
> +	mnt->mnt_sb = sb;
> +	mnt->mnt_root = dget(root);
> +	mnt->mnt_mountpoint = mnt->mnt_root;
> +	mnt->mnt_parent = mnt;
> +
> +	if (flag & CL_SLAVE) {
> +		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
> +		mnt->mnt_master = old;
> +		CLEAR_MNT_SHARED(mnt);
> +	} else if (!(flag & CL_PRIVATE)) {
> +		if ((flag & CL_MAKE_SHARED) || IS_MNT_SHARED(old))
> +			list_add(&mnt->mnt_share, &old->mnt_share);
> +		if (IS_MNT_SLAVE(old))
> +			list_add(&mnt->mnt_slave, &old->mnt_slave);
> +		mnt->mnt_master = old->mnt_master;
> +	}
> +	if (flag & CL_MAKE_SHARED)
> +		set_mnt_shared(mnt);
> +
> +	/* stick the duplicate mount on the same expiry list
> +	 * as the original if that was on one */
> +	if (flag & CL_EXPIRE) {
> +		if (!list_empty(&old->mnt_expire))
> +			list_add(&mnt->mnt_expire, &old->mnt_expire);
> +	}
> +
>  	return mnt;
>  
>   out_free:
>  	free_vfsmnt(mnt);
> -	return NULL;
> +	return ERR_PTR(err);
>  }
>  
>  static inline void __mntput(struct vfsmount *mnt)
> @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
>  	struct path path;
>  
>  	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> -		return NULL;
> +		return ERR_PTR(-EINVAL);
>  
>  	res = q = clone_mnt(mnt, dentry, flag);
> -	if (!q)
> -		goto Enomem;
> +	if (IS_ERR(q))
> +		return q;
> +
>  	q->mnt_mountpoint = mnt->mnt_mountpoint;
>  
>  	p = mnt;
> @@ -1237,8 +1242,8 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
>  			path.mnt = q;
>  			path.dentry = p->mnt_mountpoint;
>  			q = clone_mnt(p, p->mnt_root, flag);
> -			if (!q)
> -				goto Enomem;
> +			if (IS_ERR(q))
> +				goto out;
>  			spin_lock(&vfsmount_lock);
>  			list_add_tail(&q->mnt_list, &res->mnt_list);
>  			attach_mnt(q, &path);
> @@ -1246,7 +1251,7 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
>  		}
>  	}
>  	return res;
> -Enomem:
> +out:
>  	if (res) {
>  		LIST_HEAD(umount_list);
>  		spin_lock(&vfsmount_lock);
> @@ -1254,9 +1259,11 @@ Enomem:
>  		spin_unlock(&vfsmount_lock);
>  		release_mounts(&umount_list);
>  	}
> -	return NULL;
> +	return q;
>  }
>  
> +/* Caller should check returned pointer for errors */
> +
>  struct vfsmount *collect_mounts(struct path *path)
>  {
>  	struct vfsmount *tree;
> @@ -1529,14 +1536,15 @@ static int do_loopback(struct path *path, char *old_name,
>  	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
>  		goto out;
>  
> -	err = -ENOMEM;
>  	if (recurse)
>  		mnt = copy_tree(old_path.mnt, old_path.dentry, 0);
>  	else
>  		mnt = clone_mnt(old_path.mnt, old_path.dentry, 0);
>  
> -	if (!mnt)
> +	if (IS_ERR(mnt)) {
> +		err = PTR_ERR(mnt);
>  		goto out;
> +	}
>  
>  	err = graft_tree(mnt, path);
>  	if (err) {
> @@ -2071,10 +2079,11 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
>  	/* First pass: copy the tree topology */
>  	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
>  					CL_COPY_ALL | CL_EXPIRE);
> -	if (!new_ns->root) {
> +	if (IS_ERR(new_ns->root)) {
> +		int err = PTR_ERR(new_ns->root);
>  		up_write(&namespace_sem);
>  		kfree(new_ns);
> -		return ERR_PTR(-ENOMEM);
> +		return ERR_PTR(err);
>  	}
>  	spin_lock(&vfsmount_lock);
>  	list_add_tail(&new_ns->list, &new_ns->root->mnt_list);
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 5cc564a..c4358d2 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -250,8 +250,9 @@ int propagate_mnt(struct vfsmount *dest_mnt, struct dentry *dest_dentry,
>  
>  		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &type);
>  
> -		if (!(child = copy_tree(source, source->mnt_root, type))) {
> -			ret = -ENOMEM;
> +		child = copy_tree(source, source->mnt_root, type);
> +		if (IS_ERR(child)) {
> +			ret = PTR_ERR(child);
>  			list_splice(tree_list, tmp_list.prev);
>  			goto out;
>  		}
> -- 
> 1.6.3.3
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-09-30  9:51   ` Miklos Szeredi
@ 2010-09-30 21:41     ` Valerie Aurora
  2010-09-30 21:44       ` Valerie Aurora
  0 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-09-30 21:41 UTC (permalink / raw)
  To: Miklos Szeredi, ram
  Cc: viro, hch, agruen, npiggin, linux-kernel, linux-fsdevel

On Thu, Sep 30, 2010 at 11:51:30AM +0200, Miklos Szeredi wrote:
> On Thu, 16 Sep 2010, Valerie Aurora wrote:
> > copy_tree() can theoretically fail in a case other than ENOMEM, but
> > always returns NULL which is interpreted by callers as -ENOMEM.
> > Convert to return an explicit error.  Convert clone_mnt() for
> > consistency and because union mounts will add new error cases.
> 
> I think it makes sense to push this fix to 2.6.37 independently of the
> other patches.
> 
> Acked-by: Miklos Szeredi <mszeredi@suse.cz>

I'm certainly not going to argue, but I spent an hour trying to
trigger the non-ENOMEM case (below) and failed - maybe it's
unreachable?

> > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> >  	struct path path;
> >  
> >  	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > -		return NULL;
> > +		return ERR_PTR(-EINVAL);

Ram, do you remember how this worked?

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-09-30 21:41     ` Valerie Aurora
@ 2010-09-30 21:44       ` Valerie Aurora
  2010-10-01  0:33         ` Ram Pai
  0 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-09-30 21:44 UTC (permalink / raw)
  To: Miklos Szeredi, Ram Pai
  Cc: viro, hch, agruen, npiggin, linux-kernel, linux-fsdevel

(Resend with correct email for Ram Pai)

On Thu, Sep 30, 2010 at 11:51:30AM +0200, Miklos Szeredi wrote:
> On Thu, 16 Sep 2010, Valerie Aurora wrote:
> > copy_tree() can theoretically fail in a case other than ENOMEM, but
> > always returns NULL which is interpreted by callers as -ENOMEM.
> > Convert to return an explicit error.  Convert clone_mnt() for
> > consistency and because union mounts will add new error cases.
> 
> I think it makes sense to push this fix to 2.6.37 independently of the
> other patches.
> 
> Acked-by: Miklos Szeredi <mszeredi@suse.cz>
 
I'm certainly not going to argue, but I spent an hour trying to
trigger the non-ENOMEM case (below) and failed - maybe it's
unreachable?

> > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> >  	struct path path;
> >  
> >  	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > -		return NULL;
> > +		return ERR_PTR(-EINVAL);

Ram, do you remember how this worked?

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 23/34] union-mount: Prevent topmost file system from being mounted elsewhere
  2010-09-30  9:37   ` Miklos Szeredi
@ 2010-09-30 21:47     ` Valerie Aurora
  0 siblings, 0 replies; 59+ messages in thread
From: Valerie Aurora @ 2010-09-30 21:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: viro, hch, agruen, npiggin, linux-kernel, linux-fsdevel

On Thu, Sep 30, 2010 at 11:37:48AM +0200, Miklos Szeredi wrote:
> On Thu, 16 Sep 2010, Valerie Aurora wrote:
> > The device underlying the topmost read-write layer of a file system
> > cannot be mounted anywhere else on the system.  We keep a pointer to
> > the union stack in the dentry of the topmost directory, so that dentry
> > can't be part of a different mount, since dentries are shared between
> > different mounts of the same device.
> > 
> > Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> > ---
> >  fs/namespace.c |    5 +++++
> >  1 files changed, 5 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 61256e6..26efaf3 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -1998,6 +1998,11 @@ int do_add_mount(struct vfsmount *newmnt, struct path *path,
> >  	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
> >  		goto unlock;
> >  
> > +	/* Top layers of union mounts can't be mounted elsewhere */
> > +	err = -EBUSY;
> > +	if (newmnt->mnt_sb->s_union_lower_mnts)
> > +		goto unlock;
> > +
> 
> This is insufficient: the super block may be mounted elsewhere later.
> And no, preventing bind mounts is not enough.

My mistake, that's a bug in the comment/commit message - s/mount/union
mount/.  The patch that prevents not-union mounts is:

    union-mount: Create check_topmost_union_mnt()
    
    check_topmost_union_mnt() checks that the topmost layer of a proposed
    union mount is read-write, supports fallthrus and whiteouts, and isn't
    mounted elsewhere.

And the patch that prevents bind mounts is:

    union-mount: Prevent bind mounts of union mounts
    
    Prevent bind mounts of parts of union mounts.
    
    XXX - Bind mounting parts of union mounts is probably easy to
    implement, but requires some careful thought about corner cases,
    extensive testing, and some refactoring of the code.

If you see any problems in those patches, I'd appreciate the comment.

> BTW, what about CLONE_NEWNS?  I think it's a rather big limitation if
> that doesn't work...

Great segue - I think the same code will make both CLONE_NEWNS and
bind mounts work.  We can allow multiple mounts of a union if it's the
exact same stack in each mount.  I will work on this.

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-09-30 21:44       ` Valerie Aurora
@ 2010-10-01  0:33         ` Ram Pai
  2010-10-01  1:58           ` Ram Pai
  0 siblings, 1 reply; 59+ messages in thread
From: Ram Pai @ 2010-10-01  0:33 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Miklos Szeredi, viro, hch, agruen, npiggin, linux-kernel, linux-fsdevel

On Thu, Sep 30, 2010 at 05:44:18PM -0400, Valerie Aurora wrote:
> (Resend with correct email for Ram Pai)
> 
> On Thu, Sep 30, 2010 at 11:51:30AM +0200, Miklos Szeredi wrote:
> > On Thu, 16 Sep 2010, Valerie Aurora wrote:
> > > copy_tree() can theoretically fail in a case other than ENOMEM, but
> > > always returns NULL which is interpreted by callers as -ENOMEM.
> > > Convert to return an explicit error.  Convert clone_mnt() for
> > > consistency and because union mounts will add new error cases.
> > 
> > I think it makes sense to push this fix to 2.6.37 independently of the
> > other patches.
> > 
> > Acked-by: Miklos Szeredi <mszeredi@suse.cz>
> 
> I'm certainly not going to argue, but I spent an hour trying to
> trigger the non-ENOMEM case (below) and failed - maybe it's
> unreachable?
> 
> > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > >  	struct path path;
> > >  
> > >  	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > -		return NULL;
> > > +		return ERR_PTR(-EINVAL);
> 
> Ram, do you remember how this worked?

Oops. That should be an OR condition. There is one other check in that
function that should also be an OR condition.

BTW: the return value has to be NULL, right? Because it's not an error
to clone an unbindable mount. Nor is it an error to not specify CL_COPY_ALL.
It just means that you want nothing in return.

RP
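
The NULL-versus-error-pointer question above may be easier to follow
with a small userspace model.  This is only a sketch under stated
assumptions, not kernel code: ERR_PTR(), IS_ERR() and PTR_ERR() are
reimplemented locally in the spirit of the helpers in
include/linux/err.h, and struct vfsmount_model, CL_COPY_ALL_MODEL,
copy_tree_null(), copy_tree_errptr() and the caller_errno_*() helpers
are hypothetical names invented for illustration.  It shows why a
caller that maps every NULL to -ENOMEM cannot tell the unbindable case
apart, while an error pointer carries the errno through.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, stripped-down mount: only the bit this sketch needs. */
struct vfsmount_model {
	int unbindable;
};

#define CL_COPY_ALL_MODEL 0x04	/* illustrative value only */

/* Old convention: every failure is reported as NULL. */
static struct vfsmount_model *copy_tree_null(struct vfsmount_model *mnt, int flag)
{
	if (!(flag & CL_COPY_ALL_MODEL) && mnt->unbindable)
		return NULL;
	return mnt;	/* stands in for the cloned tree */
}

/* Patched convention: the reason travels inside the pointer. */
static struct vfsmount_model *copy_tree_errptr(struct vfsmount_model *mnt, int flag)
{
	if (!(flag & CL_COPY_ALL_MODEL) && mnt->unbindable)
		return ERR_PTR(-EINVAL);
	return mnt;
}

/* A caller of the old convention can only guess -ENOMEM. */
static long caller_errno_null(struct vfsmount_model *mnt, int flag)
{
	return copy_tree_null(mnt, flag) ? 0 : -ENOMEM;
}

/* A caller of the patched convention reports what actually happened. */
static long caller_errno_errptr(struct vfsmount_model *mnt, int flag)
{
	struct vfsmount_model *res = copy_tree_errptr(mnt, flag);

	assert(res != NULL);	/* the patched variant never returns NULL */
	return IS_ERR(res) ? PTR_ERR(res) : 0;
}
```

Whether the unbindable case should count as an error at all is exactly
what the thread is debating; the sketch only shows what each convention
lets the caller see.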

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/34] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2010-09-27 18:50               ` Valerie Aurora
@ 2010-10-01  0:44                 ` Ram Pai
  0 siblings, 0 replies; 59+ messages in thread
From: Ram Pai @ 2010-10-01  0:44 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Ram Pai, Alexander Viro, Miklos Szeredi, Christoph Hellwig,
	Andreas Gruenbacher, Nick Piggin, linux-kernel, linux-fsdevel

On Mon, Sep 27, 2010 at 02:50:17PM -0400, Valerie Aurora wrote:
> On Sun, Sep 26, 2010 at 10:42:05PM -0700, Ram Pai wrote:
> > On Mon, Sep 20, 2010 at 08:03:44PM -0400, Valerie Aurora wrote:
> > > On Sun, Sep 19, 2010 at 10:25:53PM -0700, Ram Pai wrote:
> > > > 
> > > > I understand your intentions, but I think you are making a wrong assumption.
> > > > You seem to be thinking that if a slave-mount is cloned, the new cloned
> > > > mount will also be a slave-mount and will hence receive propagations. As
> > > > per shared subtree semantics, a slave-mount when cloned will create a private
> > > > mount. Since your intention is to avoid generating any new mounts that 
> > > > receive propagations, you should be checking for shared-mounts and
> > > > slave-shared-mounts because these are the two kinds of mounts that when
> > > > cloned create new mounts that receive propagation.
> > > 
> > > No.  This isn't about the semantics of the clone mount operation.  It
> > > is about the administrator creating a slave mount, unioning it, and
> > > then being surprised when the unioned file system does not receive
> > > mount propagation events.
> > > 
> > > Think of the source vfsmount tree as a set of command line arguments
> > > for the union mount.
> > 
> > Ok. In that case,  you introduced a subtle change in the semantics of clone_mnt().
> > As I understand it, the flags parameter of clone_mnt() are meant to be a modifier 
> > for the cloned mount, not a filter on the source mount.
> 
> Yes, that's it exactly.
> 
> Do you have a suggestion for writing this a different way?  We can
> move it all into copy_tree() and leave clone_mnt() alone, at the cost
> of a little code duplication and some acrobatics around possible
> loopback support.

I would probably lean towards a function tree_receives_propagation() similar to
tree_contains_unbindable(), where you walk down the tree and check if there
are any SLAVE or SHARED mounts.  

RP

> 
> -VAL

-- 
Ram Pai
System X Device-Driver Enablement Lead
Linux Technology Center
Beaverton OR-97006
503-5783752 t/l 7753752
linuxram@us.ibm.com
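
The tree_receives_propagation() idea suggested above could look roughly
like the following userspace sketch.  Everything here is an assumption
made for illustration: struct mnt_model is a hypothetical stand-in for
the kernel's mount structure, "shared" and "master" model the shared
and slave states, and the walk is recursive only to keep the example
short (kernel code would more likely iterate over the tree).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a mount; not the kernel's struct vfsmount. */
struct mnt_model {
	bool shared;			/* member of a peer group */
	const struct mnt_model *master;	/* non-NULL if this mount is a slave */
	struct mnt_model *child;	/* first mount attached below this one */
	struct mnt_model *sibling;	/* next mount under the same parent */
};

/* True if this single mount would receive propagation events. */
static bool mnt_receives_propagation(const struct mnt_model *mnt)
{
	return mnt->shared || mnt->master != NULL;
}

/*
 * Walk the whole tree rooted at @root and report whether any mount in it
 * is shared or slave.
 */
static bool tree_receives_propagation(const struct mnt_model *root)
{
	const struct mnt_model *c;

	assert(root != NULL);
	if (mnt_receives_propagation(root))
		return true;
	for (c = root->child; c; c = c->sibling)
		if (tree_receives_propagation(c))
			return true;
	return false;
}
```

A check of this shape filters on the source tree before any cloning
happens, which would leave the clone_mnt() flags as pure modifiers of
the cloned mount.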

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-10-01  0:33         ` Ram Pai
@ 2010-10-01  1:58           ` Ram Pai
  2010-10-01  9:12               ` Szeredi Miklos
  0 siblings, 1 reply; 59+ messages in thread
From: Ram Pai @ 2010-10-01  1:58 UTC (permalink / raw)
  To: Ram Pai
  Cc: Valerie Aurora, Miklos Szeredi, viro, hch, agruen, npiggin,
	linux-kernel, linux-fsdevel

On Thu, Sep 30, 2010 at 05:33:42PM -0700, Ram Pai wrote:
> On Thu, Sep 30, 2010 at 05:44:18PM -0400, Valerie Aurora wrote:
> > (Resend with correct email for Ram Pai)
> > 
> > On Thu, Sep 30, 2010 at 11:51:30AM +0200, Miklos Szeredi wrote:
> > > On Thu, 16 Sep 2010, Valerie Aurora wrote:
> > > > copy_tree() can theoretically fail in a case other than ENOMEM, but
> > > > always returns NULL which is interpreted by callers as -ENOMEM.
> > > > Convert to return an explicit error.  Convert clone_mnt() for
> > > > consistency and because union mounts will add new error cases.
> > > 
> > > I think it makes sense to push this fix to 2.6.37 independently of the
> > > other patches.
> > > 
> > > Acked-by: Miklos Szeredi <mszeredi@suse.cz>
> > 
> > I'm certainly not going to argue, but I spent an hour trying to
> > trigger the non-ENOMEM case (below) and failed - maybe it's
> > unreachable?
> > 
> > > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > > >  	struct path path;
> > > >  
> > > >  	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > > -		return NULL;
> > > > +		return ERR_PTR(-EINVAL);
> > 
> > Ram, do you remember how this worked?
> 
> Oops. That should be a OR condition. there is one other check in that
> function that should also be a OR condition.

I may be wrong here. I can't exactly recollect what the CL_COPY_ALL flag means. Al Viro
might remember?  If CL_COPY_ALL means to clone everything irrespective of any other
flags, then the above code seems right.


> 
> BTW: the return value has to be NULL. right? because its not an error
> to clone a unbindable mount. Nor is it an error to not specify CL_COPY_ALL.
> It just means that you want nothing in return.

In any case, I think the return value should be NULL.

RP

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-10-01  1:58           ` Ram Pai
@ 2010-10-01  9:12               ` Szeredi Miklos
  0 siblings, 0 replies; 59+ messages in thread
From: Szeredi Miklos @ 2010-10-01  9:12 UTC (permalink / raw)
  To: Ram Pai
  Cc: mszeredi2, Valerie Aurora, viro, hch, agruen, npiggin,
	linux-kernel, linux-fsdevel

On Fri, Oct 1, 2010 at 3:58 AM, Ram Pai <linuxram@us.ibm.com> wrote:
> > > > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > > > >         struct path path;
> > > > >
> > > > >         if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > > > -               return NULL;
> > > > > +               return ERR_PTR(-EINVAL);
> > >
> > > Ram, do you remember how this worked?
> >
> > Oops. That should be a OR condition. there is one other check in that
> > function that should also be a OR condition.
>
> I may be wrong here. Can't exactly recollect what CL_COPY_ALL flag means. Al Viro
> might remember?  If CL_COPY_ALL means, to clone everything irrespective of any other
> flags, then the above code seems right.

CL_COPY_ALL means clone the mount despite MNT_UNBINDABLE.  It is used
for cloning the whole namespace and for collect_mounts(), both of
which ignore MNT_UNBINDABLE.

Of the two remaining callers of copy_tree(), do_loopback already checks
MNT_UNBINDABLE on the root of the tree to be copied.

So that leaves the one in pnode.c.  That one will be called when
attaching a new mount or mount tree.  If the root of that tree is
unbindable then the propagation will fail with -ENOMEM, which is wrong:
it should simply skip the whole tree and not try to propagate.  Calls
which result in propagation are do_loopback, do_move_mount and
do_add_mount.  Of these, do_loopback and do_move_mount already check for
MNT_UNBINDABLE; do_add_mount doesn't check, but should probably just
mask out MNT_UNBINDABLE.

So in the end that check in copy_tree() should never actually trigger
and can be turned into a WARN_ON.

Additionally the propagation code should perhaps be more defensive and
skip MNT_UNBINDABLE source mounts.

Thanks,
Miklos
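
To make the flag semantics and the WARN_ON suggestion above concrete,
here is a minimal userspace sketch.  It is not the kernel
implementation: CL_COPY_ALL_MODEL, warn_on_model(), copy_tree_refuses()
and copy_tree_model() are hypothetical names, and warn_on_model() only
imitates the "log and return the condition" behaviour of the kernel's
WARN_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define CL_COPY_ALL_MODEL 0x04	/* illustrative value only */

/* Counts warnings so the behaviour of the model can be observed. */
static int warn_count;

/* Hypothetical stand-in for WARN_ON(): log on a hit, return the condition. */
static bool warn_on_model(bool cond, const char *what)
{
	assert(what != NULL);
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* The guard from the quoted hunk, written as a pure predicate. */
static bool copy_tree_refuses(int flag, bool unbindable)
{
	return !(flag & CL_COPY_ALL_MODEL) && unbindable;
}

/*
 * Model of copy_tree() with the guard demoted to a warning: callers are
 * expected to have filtered unbindable roots already, so a hit would
 * indicate a caller bug.  Returns true if the copy proceeds.
 */
static bool copy_tree_model(int flag, bool unbindable)
{
	if (warn_on_model(copy_tree_refuses(flag, unbindable),
			  "copy_tree() called on unbindable mount"))
		return false;
	return true;
}
```

With || in place of && the predicate would also refuse every bindable
mount copied without CL_COPY_ALL, so the && form is the one that matches
the description of the flag given above.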

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-10-01  9:12               ` Szeredi Miklos
  (?)
@ 2010-10-01 18:32               ` Ram Pai
  2010-10-06 18:24                 ` Valerie Aurora
  -1 siblings, 1 reply; 59+ messages in thread
From: Ram Pai @ 2010-10-01 18:32 UTC (permalink / raw)
  To: Szeredi Miklos
  Cc: mszeredi2, Valerie Aurora, viro, hch, agruen, npiggin,
	linux-kernel, linux-fsdevel

On Fri, Oct 01, 2010 at 11:12:48AM +0200, Szeredi Miklos wrote:
> On Fri, Oct 1, 2010 at 3:58 AM, Ram Pai <linuxram@us.ibm.com> wrote:
> > > > > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > > > > >         struct path path;
> > > > > >
> > > > > >         if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > > > > -               return NULL;
> > > > > > +               return ERR_PTR(-EINVAL);
> > > >
> > > > Ram, do you remember how this worked?
> > >
> > > Oops. That should be a OR condition. there is one other check in that
> > > function that should also be a OR condition.
> >
> > I may be wrong here. Can't exactly recollect what CL_COPY_ALL flag means. Al Viro
> > might remember?  If CL_COPY_ALL means, to clone everything irrespective of any other
> > flags, then the above code seems right.
> 
> CL_COPY_ALL means clone the mount despite MNT_UNBINDABLE.  It is used
> for cloning the whole namespace and for collect_mounts(), both of
> which ignore MNT_UNBINDABLE.

Ok. That reminds me when the above piece of code in copy_tree() is triggered.
It is triggered when a mount tree with an unbindable mount at its head
is moved onto a shared mount with at least one peer.

Something like this should trigger the code:

# create an unbindable mount
mkdir -p /mnt2/m1
mount --bind /mnt2/m1 /mnt2/m1
mount --make-unbindable /mnt2/m1

# create a shared mount with one peer
mkdir -p /mnt2/s1
mkdir -p /mnt2/s2
mount --bind /mnt2/s1 /mnt2/s1
mount --make-shared /mnt2/s1
mount --bind /mnt2/s1 /mnt2/s2

# move the unbindable mount onto one of the shared peers
mkdir -p /mnt2/s1/movemount
mount --move /mnt2/m1 /mnt2/s1/movemount

The last step will fail, and that is because of the above check in copy_tree().

> 
> Of the two remaining callers of copy_tree() do_loopback already checks
> MNT_UNBINDABLE on the root of the tree to be copied.
> 
> So that leaves the one in pnode.c.   That one will be called when
> attaching a new mount or mount tree.  If the root of that tree is
> unbindable then the propagation will fail with -ENOMEM which is wrong,
> it should simply skip the whole tree and not try to propagate.  

Yes.  propagate_mnt() should fail if it is unable to create clones
of the source mount for any reason. However, -ENOMEM may not be
the right return code.


> Calls
> which result in propagation are  do_loopback, do_move_mount and
> do_add_mount.  Of this do_loopback and do_move_mount already check for
> MNT_UNBINDABLE, do_add_mount doesn't check, but should probably just
> mask out MNT_UNBINDABLE.
> 
> So in the end that check in copy_tree() should never actually trigger
> and can be turned into a WARN_ON

You can do that. But then we have to catch the cases where an unbindable
mount is moved onto a shared mount. I suppose we can put a check in do_move_mount().

> 
> Additionally the propagation code should perhaps be more defensive and
> skip MNT_UNBINDABLE source mounts.

No. If we do that, I am afraid, we will end up with inconsistent peer-mount trees 
which will not resemble each other.

RP

> 
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-10-01 18:32               ` Ram Pai
@ 2010-10-06 18:24                 ` Valerie Aurora
  2010-10-12  7:41                   ` Ram Pai
  0 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-10-06 18:24 UTC (permalink / raw)
  To: Ram Pai
  Cc: Szeredi Miklos, mszeredi2, viro, hch, agruen, npiggin,
	linux-kernel, linux-fsdevel

On Fri, Oct 01, 2010 at 11:32:43AM -0700, Ram Pai wrote:
> On Fri, Oct 01, 2010 at 11:12:48AM +0200, Szeredi Miklos wrote:
> > On Fri, Oct 1, 2010 at 3:58 AM, Ram Pai <linuxram@us.ibm.com> wrote:
> > > > > > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > > > > > >         struct path path;
> > > > > > >
> > > > > > >         if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > > > > > -               return NULL;
> > > > > > > +               return ERR_PTR(-EINVAL);
> > > > >
> > > > > Ram, do you remember how this worked?
> > > >
> > > > Oops. That should be a OR condition. there is one other check in that
> > > > function that should also be a OR condition.
> > >
> > > I may be wrong here. Can't exactly recollect what CL_COPY_ALL flag means. Al Viro
> > > > might remember?  If CL_COPY_ALL means, to clone everything irrespective of any other
> > > flags, then the above code seems right.
> > 
> > CL_COPY_ALL means clone the mount despite MNT_UNBINDABLE.  It is used
> > for cloning the whole namespace and for collect_mounts(), both of
> > which ignore MNT_UNBINDABLE.
> 
> Ok. That reminds me  when the above piece of code in copy_tree() is triggered.
> It triggered when a mount tree with a unbindable mount at its head
> is moved on a shared mount with atleast one peer.
> 
> something like this should trigger the code.
> 
> # create a unbindable mount
> mkdir -p /mnt2/m1
> mount --bind /mnt2/m1 /mnt2/m1
> mount --make-unbindable /mnt2/m1
> 
> #create a shared mount with one peer.
> mkdir -p /mnt2/s1
> mkdir -p /mnt2/s2
> mount --bind /mnt2/s1 /mnt2/s1
> mount --make-shared /mnt2/s1
> mount --bind /mnt2/s1 /mnt2/s2
> 
> #move the unbindable mount to one of the shared peer
> mkdir -p /mnt2/s1/movemount
> mount --move /mnt2/m1 /mnt2/s1/movemount
> 
> the last step will fail and that is because of the above check in copy_tree()

Actually, it fails in do_move_mount(), as Miklos theorized.  I tested
it with the above in an attempt to trigger it in practice in case the
code review was wrong, but failed.

> > Of the two remaining callers of copy_tree() do_loopback already checks
> > MNT_UNBINDABLE on the root of the tree to be copied.
> > 
> > So that leaves the one in pnode.c.   That one will be called when
> > attaching a new mount or mount tree.  If the root of that tree is
> > unbindable then the propagation will fail with -ENOMEM which is wrong,
> > it should simply skip the whole tree and not try to propagate.  
> 
> Yes.  the propagation_mnt() should fail if it is unable to create clones
> of the source mount due to any reason. However -ENOMEM may not be
> the right return code. 
> 
> 
> > Calls
> > which result in propagation are  do_loopback, do_move_mount and
> > do_add_mount.  Of this do_loopback and do_move_mount already check for
> > MNT_UNBINDABLE, do_add_mount doesn't check, but should probably just
> > mask out MNT_UNBINDABLE.
> > 
> > So in the end that check in copy_tree() should never actually trigger
> > and can be turned into a WARN_ON
> 
> You can do that. But then we have to catch for the cases where a unbindable
> mount is moved on a shared mounts. I suppose we can put in a check in do_move_mount().
> > 
> > Additionally the propagation code should perhaps be more defensive and
> > skip MNT_UNBINDABLE source mounts.
> 
> No. If we do that, I am afraid, we will end up with inconsistent peer-mount trees 
> which will not resemble each other.

Any chance you have the time to do a little documentation on where
checks should be done and what flags each function expects?  Right now
the distribution and location of the checks tend to send the reader
off on false trails...

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-10-01  9:12               ` Szeredi Miklos
  (?)
  (?)
@ 2010-10-06 18:31               ` Valerie Aurora
  2010-10-07  9:42                 ` Miklos Szeredi
  -1 siblings, 1 reply; 59+ messages in thread
From: Valerie Aurora @ 2010-10-06 18:31 UTC (permalink / raw)
  To: Szeredi Miklos
  Cc: Ram Pai, mszeredi2, viro, hch, agruen, npiggin, linux-kernel,
	linux-fsdevel

On Fri, Oct 01, 2010 at 11:12:48AM +0200, Szeredi Miklos wrote:
> On Fri, Oct 1, 2010 at 3:58 AM, Ram Pai <linuxram@us.ibm.com> wrote:
> > > > > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > > > > >         struct path path;
> > > > > >
> > > > > >         if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > > > > -               return NULL;
> > > > > > +               return ERR_PTR(-EINVAL);
> > > >
> > > > Ram, do you remember how this worked?
> > >
> > > Oops. That should be a OR condition. there is one other check in that
> > > function that should also be a OR condition.
> >
> > I may be wrong here. Can't exactly recollect what CL_COPY_ALL flag means. Al Viro
> > > might remember?  If CL_COPY_ALL means, to clone everything irrespective of any other
> > flags, then the above code seems right.
> 
> CL_COPY_ALL means clone the mount despite MNT_UNBINDABLE.  It is used
> for cloning the whole namespace and for collect_mounts(), both of
> which ignore MNT_UNBINDABLE.
> 
> Of the two remaining callers of copy_tree() do_loopback already checks
> MNT_UNBINDABLE on the root of the tree to be copied.

I reviewed and tested and agree.

But I don't think this change should go into stable.  It doesn't fix
any existing bug and I don't like perturbing the code in stable for a
code cleanup.

> So that leaves the one in pnode.c.   That one will be called when
> attaching a new mount or mount tree.  If the root of that tree is
> unbindable then the propagation will fail with -ENOMEM which is wrong,
> it should simply skip the whole tree and not try to propagate.   Calls

Not try to propagate - and return an error?  Or succeed and ignore?

> which result in propagation are  do_loopback, do_move_mount and
> do_add_mount.  Of this do_loopback and do_move_mount already check for
> MNT_UNBINDABLE, do_add_mount doesn't check, but should probably just
> mask out MNT_UNBINDABLE.

Hm, if we stop trusting callers of do_add_mount(), we should probably
do a lot more than just mask this out.  Interestingly, most out-of-VFS
callers just seem to add MNT_SHRINKABLE, maybe we should export
do_add_shrinkable() instead or something like that?

> So in the end that check in copy_tree() should never actually trigger
> and can be turned into a WARN_ON

WARN_ON() makes sense.

> Additionally the propagation code should perhaps be more defensive and
> skip MNT_UNBINDABLE source mounts.

Maybe WARN_ON() here too?

I'm not going to get around to revamping this part of mount
propagation until after I finish the union mount copyup rewrite.  Is
anyone else up for doing it sooner?

-VAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-10-06 18:31               ` Valerie Aurora
@ 2010-10-07  9:42                 ` Miklos Szeredi
  0 siblings, 0 replies; 59+ messages in thread
From: Miklos Szeredi @ 2010-10-07  9:42 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: miklos, linuxram, mszeredi2, viro, hch, agruen, npiggin,
	linux-kernel, linux-fsdevel

On Wed, 6 Oct 2010, Valerie Aurora wrote:
> On Fri, Oct 01, 2010 at 11:12:48AM +0200, Szeredi Miklos wrote:
> > On Fri, Oct 1, 2010 at 3:58 AM, Ram Pai <linuxram@us.ibm.com> wrote:
> > > > > > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > > > > > >         struct path path;
> > > > > > >
> > > > > > >         if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > > > > > -               return NULL;
> > > > > > > +               return ERR_PTR(-EINVAL);
> > > > >
> > > > > Ram, do you remember how this worked?
> > > >
> > > > Oops. That should be a OR condition. there is one other check in that
> > > > function that should also be a OR condition.
> > >
> > > I may be wrong here. Can't exactly recollect what CL_COPY_ALL flag means. Al Viro
> > > > might remember?  If CL_COPY_ALL means, to clone everything irrespective of any other
> > > flags, then the above code seems right.
> > 
> > CL_COPY_ALL means clone the mount despite MNT_UNBINDABLE.  It is used
> > for cloning the whole namespace and for collect_mounts(), both of
> > which ignore MNT_UNBINDABLE.
> > 
> > Of the two remaining callers of copy_tree() do_loopback already checks
> > MNT_UNBINDABLE on the root of the tree to be copied.
> 
> I reviewed and tested and agree.
> 
> But I don't think this change should go into stable.  It doesn't fix
> any existing bug and I don't like perturbing the code in stable for a
> code cleanup.

Right.

> > So that leaves the one in pnode.c.   That one will be called when
> > attaching a new mount or mount tree.  If the root of that tree is
> > unbindable then the propagation will fail with -ENOMEM which is wrong,
> > it should simply skip the whole tree and not try to propagate.   Calls
> 
> Not try to propagate - and return an error?  Or succeed and ignore?

I thought succeed and ignore is the right answer, but I'm not sure
now.

> > which result in propagation are  do_loopback, do_move_mount and
> > do_add_mount.  Of this do_loopback and do_move_mount already check for
> > MNT_UNBINDABLE, do_add_mount doesn't check, but should probably just
> > mask out MNT_UNBINDABLE.
> 
> Hm, if we stop trusting callers of do_add_mount(), we should probably
> do a lot more than just mask this out.  Interestingly, most out-of-VFS
> callers just seem to add MNT_SHRINKABLE, maybe we should export
> do_add_shrinkable() instead or something like that?
> 
> > So in the end that check in copy_tree() should never actually trigger
> > and can be turned into a WARN_ON
> 
> WARN_ON() makes sense.
> 
> > Additionally the propagation code should perhaps be more defensive and
> > skip MNT_UNBINDABLE source mounts.
> 
> Maybe WARN_ON() here too?

Yes, I think so.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes
  2010-10-06 18:24                 ` Valerie Aurora
@ 2010-10-12  7:41                   ` Ram Pai
  0 siblings, 0 replies; 59+ messages in thread
From: Ram Pai @ 2010-10-12  7:41 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Ram Pai, Szeredi Miklos, mszeredi2, viro, hch, agruen, npiggin,
	linux-kernel, linux-fsdevel

On Wed, Oct 06, 2010 at 02:24:50PM -0400, Valerie Aurora wrote:
> On Fri, Oct 01, 2010 at 11:32:43AM -0700, Ram Pai wrote:
> > On Fri, Oct 01, 2010 at 11:12:48AM +0200, Szeredi Miklos wrote:
> > > On Fri, Oct 1, 2010 at 3:58 AM, Ram Pai <linuxram@us.ibm.com> wrote:
> > > > > > > > @@ -1212,11 +1216,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> > > > > > > >         struct path path;
> > > > > > > >
> > > > > > > >         if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
> > > > > > > > -               return NULL;
> > > > > > > > +               return ERR_PTR(-EINVAL);
> > > > > >
> > > > > > Ram, do you remember how this worked?
> > > > >
> > > > > Oops. That should be a OR condition. there is one other check in that
> > > > > function that should also be a OR condition.
> > > >
> > > > I may be wrong here. Can't exactly recollect what CL_COPY_ALL flag means. Al Viro
> > > > > might remember?  If CL_COPY_ALL means, to clone everything irrespective of any other
> > > > flags, then the above code seems right.
> > > 
> > > CL_COPY_ALL means clone the mount despite MNT_UNBINDABLE.  It is used
> > > for cloning the whole namespace and for collect_mounts(), both of
> > > which ignore MNT_UNBINDABLE.
> > 
> > Ok. That reminds me  when the above piece of code in copy_tree() is triggered.
> > It triggered when a mount tree with a unbindable mount at its head
> > is moved on a shared mount with atleast one peer.
> > 
> > something like this should trigger the code.
> > 
> > # create a unbindable mount
> > mkdir -p /mnt2/m1
> > mount --bind /mnt2/m1 /mnt2/m1
> > mount --make-unbindable /mnt2/m1
> > 
> > #create a shared mount with one peer.
> > mkdir -p /mnt2/s1
> > mkdir -p /mnt2/s2
> > mount --bind /mnt2/s1 /mnt2/s1
> > mount --make-shared /mnt2/s1
> > mount --bind /mnt2/s1 /mnt2/s2
> > 
> > #move the unbindable mount to one of the shared peer
> > mkdir -p /mnt2/s1/movemount
> > mount --move /mnt2/m1 /mnt2/s1/movemount
> > 
> > the last step will fail and that is because of the above check in copy_tree()
> 
> Actually, it fails in do_move_mount(), as Miklos theorized.  I tested
> it with the above in an attempt to trigger it in practice in case the
> code review was wrong, but failed.

Well, yes, there is a check in do_move_mount() for this case.
I was incorrect.

> 
> > > Of the two remaining callers of copy_tree() do_loopback already checks
> > > MNT_UNBINDABLE on the root of the tree to be copied.
> > > 
> > > So that leaves the one in pnode.c.   That one will be called when
> > > attaching a new mount or mount tree.  If the root of that tree is
> > > unbindable then the propagation will fail with -ENOMEM which is wrong,
> > > it should simply skip the whole tree and not try to propagate.  
> > 
> > Yes.  the propagation_mnt() should fail if it is unable to create clones
> > of the source mount due to any reason. However -ENOMEM may not be
> > the right return code. 
> > 
> > 
> > > Calls
> > > which result in propagation are  do_loopback, do_move_mount and
> > > do_add_mount.  Of this do_loopback and do_move_mount already check for
> > > MNT_UNBINDABLE, do_add_mount doesn't check, but should probably just
> > > mask out MNT_UNBINDABLE.
> > > 
> > > So in the end that check in copy_tree() should never actually trigger
> > > and can be turned into a WARN_ON
> > 
> > You can do that. But then we have to catch for the cases where a unbindable
> > mount is moved on a shared mounts. I suppose we can put in a check in do_move_mount().

Since the check is already there in do_move_mount(), I now agree with Miklos.
The check in copy_tree() does nothing but chew up a few cycles unnecessarily.
However, just to be safe, we can make it a WARN_ON.

> > > 
> > > Additionally the propagation code should perhaps be more defensive and
> > > skip MNT_UNBINDABLE source mounts.

The code is already skipping unbindable source mounts in propagate_mnt().
Miklos: did you have something else in mind here?

> > 
> > No. If we do that, I am afraid, we will end up with inconsistent peer-mount trees 
> > which will not resemble each other.
> 
> Any chance you have the time to do a little documentation on where
> checks should be done and what flags each function expects?  Right now
> the distribution and location of the checks tend to send the reader
> off on false trails...

Yes, some additional documentation is needed, given that I myself went down
wrong paths after not having looked at this code for more than 4 years.

RP

Thread overview: 59+ messages
2010-09-16 22:11 [PATCH 00/34] Union mount core for review Valerie Aurora
2010-09-16 22:11 ` [PATCH 01/34] VFS: Make clone_mnt() and copy_tree() return error codes Valerie Aurora
2010-09-20 21:26   ` Andreas Gruenbacher
2010-09-21 18:53     ` Valerie Aurora
2010-09-30  9:51   ` Miklos Szeredi
2010-09-30 21:41     ` Valerie Aurora
2010-09-30 21:44       ` Valerie Aurora
2010-10-01  0:33         ` Ram Pai
2010-10-01  1:58           ` Ram Pai
2010-10-01  9:12             ` Szeredi Miklos
2010-10-01  9:12               ` Szeredi Miklos
2010-10-01 18:32               ` Ram Pai
2010-10-06 18:24                 ` Valerie Aurora
2010-10-12  7:41                   ` Ram Pai
2010-10-06 18:31               ` Valerie Aurora
2010-10-07  9:42                 ` Miklos Szeredi
2010-09-16 22:11 ` [PATCH 02/34] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree() Valerie Aurora
2010-09-16 22:11 ` [PATCH 03/34] VFS: Add CL_NO_SLAVE " Valerie Aurora
     [not found]   ` <AANLkTim1bbGrrPcFHThx3XOm8GmudQFSmFUs3NAXT5yC@mail.gmail.com>
2010-09-17  4:34     ` Ram Pai
2010-09-17 17:15       ` Valerie Aurora
2010-09-20  5:25         ` Ram Pai
2010-09-21  0:03           ` Valerie Aurora
2010-09-27  5:42             ` Ram Pai
2010-09-27 18:50               ` Valerie Aurora
2010-10-01  0:44                 ` Ram Pai
2010-09-16 22:11 ` [PATCH 04/34] VFS: Add CL_MAKE_HARD_READONLY " Valerie Aurora
2010-09-16 22:11 ` [PATCH 05/34] union-mount: Union mounts documentation Valerie Aurora
2010-09-16 22:11 ` [PATCH 06/34] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
2010-09-16 22:11 ` [PATCH 07/34] union-mount: Add CONFIG_UNION_MOUNT option Valerie Aurora
2010-09-16 22:11 ` [PATCH 08/34] union-mount: Create union_stack structure Valerie Aurora
2010-09-16 22:12 ` [PATCH 09/34] union-mount: Add two superblock fields for union mounts Valerie Aurora
2010-09-16 22:12 ` [PATCH 10/34] union-mount: Add union_alloc() Valerie Aurora
2010-09-16 22:12 ` [PATCH 11/34] union-mount: Add union_find_dir() Valerie Aurora
2010-09-16 22:12 ` [PATCH 12/34] union-mount: Create d_free_unions() Valerie Aurora
2010-09-16 22:12 ` [PATCH 13/34] union-mount: Free union stack on removal of topmost dentry from dcache Valerie Aurora
2010-09-16 22:12 ` [PATCH 14/34] union-mount: Create union_add_dir() Valerie Aurora
2010-09-16 22:12 ` [PATCH 15/34] union-mount: Add union_create_topmost_dir() Valerie Aurora
2010-09-16 22:12 ` [PATCH 16/34] union-mount: Create IS_MNT_UNION() Valerie Aurora
2010-09-16 22:12 ` [PATCH 17/34] union-mount: Create needs_lookup_union() Valerie Aurora
2010-09-16 22:12 ` [PATCH 18/34] union-mount: Create check_topmost_union_mnt() Valerie Aurora
2010-09-16 22:12 ` [PATCH 19/34] union-mount: Add clone_union_tree() and put_union_sb() Valerie Aurora
2010-09-16 22:12 ` [PATCH 20/34] union-mount: Create build_root_union() Valerie Aurora
2010-09-16 22:12 ` [PATCH 21/34] union-mount: Create prepare_mnt_union() and cleanup_mnt_union() Valerie Aurora
2010-09-16 22:12 ` [PATCH 22/34] union-mount: Prevent improper union-related remounts Valerie Aurora
2010-09-16 22:12 ` [PATCH 23/34] union-mount: Prevent topmost file system from being mounted elsewhere Valerie Aurora
2010-09-30  9:37   ` Miklos Szeredi
2010-09-30 21:47     ` Valerie Aurora
2010-09-16 22:12 ` [PATCH 24/34] union-mount: Prevent bind mounts of union mounts Valerie Aurora
2010-09-16 22:12 ` [PATCH 25/34] union-mount: Implement union mount Valerie Aurora
2010-09-16 22:12 ` [PATCH 26/34] union-mount: Temporarily disable some syscalls Valerie Aurora
2010-09-16 22:12 ` [PATCH 27/34] union-mount: Basic infrastructure of __union_lookup() Valerie Aurora
2010-09-16 22:12 ` [PATCH 28/34] union-mount: Process negative dentries in __union_lookup() Valerie Aurora
2010-09-16 22:12 ` [PATCH 29/34] union-mount: Return files found in lower layers " Valerie Aurora
2010-09-16 22:12 ` [PATCH 30/34] union-mount: Build union stack in __lookup_union() Valerie Aurora
2010-09-16 22:12 ` [PATCH 31/34] union-mount: Follow mount " Valerie Aurora
2010-09-16 22:12 ` [PATCH 32/34] union-mount: Add lookup_union() wrapper for __lookup_union() Valerie Aurora
2010-09-16 22:12 ` [PATCH 33/34] union-mount: Add do_lookup_union() " Valerie Aurora
2010-09-16 22:12 ` [PATCH 34/34] union-mount: Call union lookup functions in lookup path Valerie Aurora
2010-09-21  0:02 ` [PATCH -1/34] VFS: Add hard read-only users count to superblock Valerie Aurora
