All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] Union mount core rewrite v1
@ 2010-03-02 22:11 Valerie Aurora
  2010-03-02 22:11 ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
  2010-03-03 17:28 ` [RFC PATCH 0/6] Union mount core rewrite v1 Miklos Szeredi
  0 siblings, 2 replies; 18+ messages in thread
From: Valerie Aurora @ 2010-03-02 22:11 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Valerie Aurora

This is a major rewrite of parts of union mounts, in particular the
pathname lookup code.  For more info about union mounts, see:

http://valerieaurora.org/union/

The previous code had two important problems fixed in this series:

- On file open, is_unionized() grabs vfsmount lock and walks up the
  mount tree even for non-union mounts.

- Pathname lookup required three cut-n-pasted versions of two complex
  functions, one for each of cached/real/"hashed" lookups.

This rewrite reduces the additional cost of a non-union lookup in a
CONFIG_UNION_MOUNT kernel to either 1 or 2 mount flag tests (but adds
the requirement that file systems be unioned only at their root
directories).  This rewrite implements lookup with one lookup_union()
function for all types of lookups.

This posted patch series includes only the union lookup, mount, and
readdir patches and not the relatively uncontroversial whiteout and
fallthru code.  Rewrite of the in-kernel file copyup is next on my
todo list.

The entire patch set is against Viro's for-next branch, and available
at:

git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git

Branch no_copyup.  Please review!  Thanks,

-VAL

Jan Blunck (3):
  union-mount: Introduce union_mount structure and basic operations
  union-mount: Drive the union cache via dcache
  union-mount: Call do_whiteout() on unlink and rmdir in unions

Valerie Aurora (3):
  union-mount: Implement union lookup
  union-mount: Support for mounting union mount file systems
  union-mount: Copy up directory entries on first readdir()

 fs/Kconfig             |   13 +
 fs/Makefile            |    1 +
 fs/dcache.c            |   17 ++
 fs/namei.c             |  199 +++++++++++++++-
 fs/namespace.c         |  123 ++++++++++-
 fs/readdir.c           |    9 +
 fs/union.c             |  629 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |   28 +++
 include/linux/mount.h  |    3 +
 include/linux/union.h  |   74 ++++++
 10 files changed, 1094 insertions(+), 2 deletions(-)
 create mode 100644 fs/union.c
 create mode 100644 include/linux/union.h


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations
  2010-03-02 22:11 [RFC PATCH 0/6] Union mount core rewrite v1 Valerie Aurora
@ 2010-03-02 22:11 ` Valerie Aurora
  2010-03-02 22:11   ` [PATCH 2/6] union-mount: Drive the union cache via dcache Valerie Aurora
  2010-03-03 17:33   ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Miklos Szeredi
  2010-03-03 17:28 ` [RFC PATCH 0/6] Union mount core rewrite v1 Miklos Szeredi
  1 sibling, 2 replies; 18+ messages in thread
From: Valerie Aurora @ 2010-03-02 22:11 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Jan Blunck, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

This patch adds the basic structures and operations of VFS-based union
mounts (but not the ability to mount or lookup unioned file systems).
Each directory in a unioned file system has an associated union stack
created when the directory is first looked up.  The union stack is a
structure kept in a hash table indexed by mount and dentry of the
directory; thus, specific paths are unioned, not dentries alone.  The
union stack keeps a pointer to the upper path and the lower path and
can be looked up by either path.

This particular version of union mounts is based on ideas by Jan
Blunck, Bharata Rao, and many others.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/Kconfig             |   13 ++
 fs/Makefile            |    1 +
 fs/dcache.c            |    4 +
 fs/union.c             |  290 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |   20 ++++
 include/linux/mount.h  |    3 +
 include/linux/union.h  |   54 +++++++++
 7 files changed, 385 insertions(+), 0 deletions(-)
 create mode 100644 fs/union.c
 create mode 100644 include/linux/union.h

diff --git a/fs/Kconfig b/fs/Kconfig
index 64d44ef..303186b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -59,6 +59,19 @@ source "fs/notify/Kconfig"
 
 source "fs/quota/Kconfig"
 
+config UNION_MOUNT
+       bool "Writable overlays (union mounts) (EXPERIMENTAL)"
+       depends on EXPERIMENTAL
+       help
+         Writable overlays allow you to mount a transparent writable
+	 layer over a read-only file system, for example, an ext3
+	 partition on a hard drive over a CD-ROM root file system
+	 image.
+
+	 See <file:Documentation/filesystems/union-mounts.txt> for details.
+
+	 If unsure, say N.
+
 source "fs/autofs/Kconfig"
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index af6d047..4ed672e 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
 obj-$(CONFIG_GENERIC_ACL)	+= generic_acl.o
 
 obj-y				+= quota/
+obj-$(CONFIG_UNION_MOUNT)	+= union.o
 
 obj-$(CONFIG_PROC_FS)		+= proc/
 obj-y				+= partitions/
diff --git a/fs/dcache.c b/fs/dcache.c
index d14c304..0c2dd32 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -960,6 +960,10 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
 	INIT_LIST_HEAD(&dentry->d_alias);
+#ifdef CONFIG_UNION_MOUNT
+	INIT_LIST_HEAD(&dentry->d_unions);
+	dentry->d_unionized = 0;
+#endif
 
 	if (parent) {
 		dentry->d_parent = dget(parent);
diff --git a/fs/union.c b/fs/union.c
new file mode 100644
index 0000000..2e005d9
--- /dev/null
+++ b/fs/union.c
@@ -0,0 +1,290 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007-2009 Novell Inc.
+ *
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/hash.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/fs_struct.h>
+#include <linux/union.h>
+
+/*
+ * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
+ * should try to make this good - I've just made it work.
+ */
+static unsigned int union_hash_mask __read_mostly;
+static unsigned int union_hash_shift __read_mostly;
+static struct hlist_head *union_hashtable __read_mostly;
+static unsigned int union_rhash_mask __read_mostly;
+static unsigned int union_rhash_shift __read_mostly;
+static struct hlist_head *union_rhashtable __read_mostly;
+
+/*
+ * Locking Rules:
+ * - dcache_lock (for union_rlookup() only)
+ * - union_lock
+ */
+DEFINE_SPINLOCK(union_lock);
+
+static struct kmem_cache *union_cache __read_mostly;
+
+static unsigned long hash(struct dentry *dentry, struct vfsmount *mnt)
+{
+	unsigned long tmp;
+
+	tmp = ((unsigned long)mnt * (unsigned long)dentry) ^
+		(GOLDEN_RATIO_PRIME + (unsigned long)mnt) / L1_CACHE_BYTES;
+	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> union_hash_shift);
+	return tmp & union_hash_mask;
+}
+
+static __initdata unsigned long union_hash_entries;
+
+static int __init set_union_hash_entries(char *str)
+{
+	if (!str)
+		return 0;
+	union_hash_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+
+__setup("union_hash_entries=", set_union_hash_entries);
+
+static int __init init_union(void)
+{
+	int loop;
+
+	union_cache = KMEM_CACHE(union_mount, SLAB_PANIC | SLAB_MEM_SPREAD);
+	union_hashtable = alloc_large_system_hash("Union-cache",
+						  sizeof(struct hlist_head),
+						  union_hash_entries,
+						  14,
+						  0,
+						  &union_hash_shift,
+						  &union_hash_mask,
+						  0);
+
+	for (loop = 0; loop < (1 << union_hash_shift); loop++)
+		INIT_HLIST_HEAD(&union_hashtable[loop]);
+
+
+	union_rhashtable = alloc_large_system_hash("rUnion-cache",
+						  sizeof(struct hlist_head),
+						  union_hash_entries,
+						  14,
+						  0,
+						  &union_rhash_shift,
+						  &union_rhash_mask,
+						  0);
+
+	for (loop = 0; loop < (1 << union_rhash_shift); loop++)
+		INIT_HLIST_HEAD(&union_rhashtable[loop]);
+
+	return 0;
+}
+
+fs_initcall(init_union);
+
+struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
+				struct dentry *next, struct vfsmount *next_mnt)
+{
+	struct union_mount *um;
+
+	BUG_ON(!S_ISDIR(this->d_inode->i_mode));
+	BUG_ON(!S_ISDIR(next->d_inode->i_mode));
+
+	um = kmem_cache_alloc(union_cache, GFP_ATOMIC);
+	if (!um)
+		return NULL;
+
+	atomic_set(&um->u_count, 1);
+	INIT_LIST_HEAD(&um->u_unions);
+	INIT_HLIST_NODE(&um->u_hash);
+	INIT_HLIST_NODE(&um->u_rhash);
+
+	um->u_this.mnt = this_mnt;
+	um->u_this.dentry = this;
+	um->u_next.mnt = mntget(next_mnt);
+	um->u_next.dentry = dget(next);
+
+	return um;
+}
+
+struct union_mount *union_get(struct union_mount *um)
+{
+	BUG_ON(!atomic_read(&um->u_count));
+	atomic_inc(&um->u_count);
+	return um;
+}
+
+static int __union_put(struct union_mount *um)
+{
+	if (!atomic_dec_and_test(&um->u_count))
+		return 0;
+
+	BUG_ON(!hlist_unhashed(&um->u_hash));
+	BUG_ON(!hlist_unhashed(&um->u_rhash));
+
+	kmem_cache_free(union_cache, um);
+	return 1;
+}
+
+void union_put(struct union_mount *um)
+{
+	struct path tmp = um->u_next;
+
+	if (__union_put(um))
+		path_put(&tmp);
+}
+
+static void __union_hash(struct union_mount *um)
+{
+	hlist_add_head(&um->u_hash, union_hashtable +
+		       hash(um->u_this.dentry, um->u_this.mnt));
+	hlist_add_head(&um->u_rhash, union_rhashtable +
+		       hash(um->u_next.dentry, um->u_next.mnt));
+}
+
+static void __union_unhash(struct union_mount *um)
+{
+	hlist_del_init(&um->u_hash);
+	hlist_del_init(&um->u_rhash);
+}
+
+struct union_mount *union_lookup(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct hlist_head *head = union_hashtable + hash(dentry, mnt);
+	struct hlist_node *node;
+	struct union_mount *um;
+
+	hlist_for_each_entry(um, node, head, u_hash) {
+		if ((um->u_this.dentry == dentry) &&
+		    (um->u_this.mnt == mnt))
+			return um;
+	}
+
+	return NULL;
+}
+
+struct union_mount *union_rlookup(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct hlist_head *head = union_rhashtable + hash(dentry, mnt);
+	struct hlist_node *node;
+	struct union_mount *um;
+
+	hlist_for_each_entry(um, node, head, u_rhash) {
+		if ((um->u_next.dentry == dentry) &&
+		    (um->u_next.mnt == mnt))
+			return um;
+	}
+
+	return NULL;
+}
+
+/*
+ * append_to_union - add a path to the bottom of the union stack
+ *
+ * Allocate and attach a union cache entry linking the new, upper
+ * mnt/dentry to the "covered" matching lower mnt/dentry.  It's okay
+ * if the union cache entry already exists.
+ */
+
+int append_to_union(struct vfsmount *upper_mnt, struct dentry *upper_dentry,
+		    struct vfsmount *lower_mnt, struct dentry *lower_dentry)
+{
+	struct union_mount *new, *um;
+
+	BUG_ON(!S_ISDIR(upper_dentry->d_inode->i_mode));
+	BUG_ON(!S_ISDIR(lower_dentry->d_inode->i_mode));
+
+	/* Common case is that it's already been created, do a lookup first */
+
+	spin_lock(&union_lock);
+	um = union_lookup(upper_dentry, upper_mnt);
+	if (um) {
+		BUG_ON((um->u_next.dentry != lower_dentry) ||
+		       (um->u_next.mnt != lower_mnt));
+		spin_unlock(&union_lock);
+		return 0;
+	}
+	spin_unlock(&union_lock);
+
+	new = union_alloc(upper_dentry, upper_mnt, lower_dentry, lower_mnt);
+	if (!new)
+		return -ENOMEM;
+
+	spin_lock(&union_lock);
+	um = union_lookup(upper_dentry, upper_mnt);
+	if (um) {
+		/* Someone added it while we were allocating, no problem */
+		BUG_ON((um->u_next.dentry != lower_dentry) ||
+		       (um->u_next.mnt != lower_mnt));
+		spin_unlock(&union_lock);
+		union_put(new);
+		return 0;
+	}
+	__union_hash(new);
+	spin_unlock(&union_lock);
+	return 0;
+}
+
+/*
+ * WARNING! Confusing terminology alert.
+ *
+ * Note that the directions "up" and "down" in union mounts are the
+ * opposite of "up" and "down" in normal VFS operation terminology.
+ * "up" in the rest of the VFS means "towards the root of the mount
+ * tree."  If you mount B on top of A, following B "up" will get you
+ * A.  In union mounts, "up" means "towards the most recently mounted
+ * layer of the union stack."  If you union mount B on top of A,
+ * following A "up" will get you to B.  Another way to put it is that
+ * "up" in the VFS means going from this mount towards the direction
+ * of its mnt->mnt_parent pointer, but "up" in union mounts means
+ * going in the opposite direction (until you run out of union
+ * layers).
+ */
+
+/*
+ * union_down_one - get the next lower directory in the union stack
+ *
+ * This is called to traverse the union stack from the given layer to
+ * the next lower layer. union_down_one() is called by various
+ * lookup functions that are aware of union mounts.
+ *
+ * Returns non-zero if followed to the next lower layer, zero otherwise.
+ *
+ * See note on up/down terminology above.
+ */
+int union_down_one(struct vfsmount **mnt, struct dentry **dentry)
+{
+	struct union_mount *um;
+
+	if (!IS_MNT_UNION(*mnt))
+		return 0;
+
+	spin_lock(&union_lock);
+	um = union_lookup(*dentry, *mnt);
+	spin_unlock(&union_lock);
+	if (um) {
+		path_get(&um->u_next);
+		dput(*dentry);
+		*dentry = um->u_next.dentry;
+		mntput(*mnt);
+		*mnt = um->u_next.mnt;
+		return 1;
+	}
+	return 0;
+}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index e035c51..d6c1da2 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -101,6 +101,26 @@ struct dentry {
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
 
+#ifdef CONFIG_UNION_MOUNT
+	/*
+	 * Stacks of union mount structures are connected to dentries
+	 * through the d_unions field.  If this list is not empty,
+	 * then this dentry is part of a unioned directory stack.
+	 * Protected by union_lock.
+	 */
+	struct list_head d_unions;	/* list of union_mount's */
+	/*
+	 * If d_unionized is set, then this dentry is referenced by
+	 * the u_next field of a union mount structure - that is, it
+	 * is a dentry for a lower layer of a union.  d_unionized is
+	 * NOT set in the dentry for the topmost layer of a union.
+	 *
+	 * d_unionized would be better renamed to d_union_lower or
+	 * d_union_ref.
+	 */
+	unsigned int d_unionized;	/* unions referencing this dentry */
+#endif
+
 	struct list_head d_lru;		/* LRU list */
 	/*
 	 * d_child and d_rcu can share memory
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d42be54..85bb75d 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -64,6 +64,9 @@ struct vfsmount {
 	struct list_head mnt_slave_list;/* list of slave mounts */
 	struct list_head mnt_slave;	/* slave list entry */
 	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
+#ifdef CONFIG_UNION_MOUNT
+	struct list_head mnt_unions;	/* list of union_mount structures */
+#endif
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	int mnt_id;			/* mount identifier */
 	int mnt_group_id;		/* peer group identifier */
diff --git a/include/linux/union.h b/include/linux/union.h
new file mode 100644
index 0000000..71dc35a
--- /dev/null
+++ b/include/linux/union.h
@@ -0,0 +1,54 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007 Novell Inc.
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef __LINUX_UNION_H
+#define __LINUX_UNION_H
+#ifdef __KERNEL__
+
+#include <linux/list.h>
+#include <asm/atomic.h>
+
+struct dentry;
+struct vfsmount;
+
+#ifdef CONFIG_UNION_MOUNT
+
+/*
+ * The union mount structure.
+ */
+struct union_mount {
+	atomic_t u_count;		/* reference count */
+	struct list_head u_unions;	/* list head for d_unions */
+	struct list_head u_list;	/* list head for mnt_unions */
+	struct hlist_node u_hash;	/* list head for searching */
+	struct hlist_node u_rhash;	/* list head for reverse searching */
+
+	struct path u_this;		/* this is me */
+	struct path u_next;		/* this is what I overlay */
+};
+
+#define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
+
+extern int append_to_union(struct vfsmount *, struct dentry *,
+			   struct vfsmount *, struct dentry *);
+extern int union_down_one(struct vfsmount **, struct dentry **);
+
+#else /* CONFIG_UNION_MOUNT */
+
+#define IS_MNT_UNION(x)			(0)
+#define append_to_union(x1, y1, x2, y2)	({ BUG(); (0); })
+#define union_down_one(x, y)		({ (0); })
+
+#endif	/* CONFIG_UNION_MOUNT */
+#endif	/* __KERNEL__ */
+#endif	/* __LINUX_UNION_H */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] union-mount: Drive the union cache via dcache
  2010-03-02 22:11 ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
@ 2010-03-02 22:11   ` Valerie Aurora
  2010-03-02 22:11     ` [PATCH 3/6] union-mount: Implement union lookup Valerie Aurora
  2010-03-03 17:35     ` [PATCH 2/6] union-mount: Drive the union cache via dcache Miklos Szeredi
  2010-03-03 17:33   ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Miklos Szeredi
  1 sibling, 2 replies; 18+ messages in thread
From: Valerie Aurora @ 2010-03-02 22:11 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Jan Blunck, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

If a dentry is removed from dentry cache because its usage count drops to
zero, the references to the underlying layer of the unions the dentry is in
are dropped too. Therefore the union cache is driven by the dentry cache.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/dcache.c            |   13 +++++++++++
 fs/union.c             |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |    8 ++++++
 include/linux/union.h  |    4 +++
 4 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 0c2dd32..fc37f7a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -18,6 +18,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
+#include <linux/union.h>
 #include <linux/fsnotify.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -175,6 +176,8 @@ static struct dentry *d_kill(struct dentry *dentry)
 	dentry_stat.nr_dentry--;	/* For d_free, below */
 	/*drops the locks, at that point nobody can reach this dentry */
 	dentry_iput(dentry);
+	/* If the dentry was in an union delete them */
+	shrink_d_unions(dentry);
 	if (IS_ROOT(dentry))
 		parent = NULL;
 	else
@@ -696,6 +699,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
 					iput(inode);
 			}
 
+			shrink_d_unions(dentry);
 			d_free(dentry);
 
 			/* finished when we fall off the top of the tree,
@@ -1536,7 +1540,9 @@ void d_delete(struct dentry * dentry)
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
 	if (atomic_read(&dentry->d_count) == 1) {
+		__d_drop_unions(dentry);
 		dentry_iput(dentry);
+		shrink_d_unions(dentry);
 		fsnotify_nameremove(dentry, isdir);
 		return;
 	}
@@ -1547,6 +1553,13 @@ void d_delete(struct dentry * dentry)
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
+	/*
+	 * Remove any associated unions.  While someone still has this
+	 * directory open (ref count > 0), we could not have deleted
+	 * it unless it was empty, and therefore has no references to
+	 * directories below it.  So we don't need the unions.
+	 */
+	shrink_d_unions(dentry);
 	fsnotify_nameremove(dentry, isdir);
 }
 EXPORT_SYMBOL(d_delete);
diff --git a/fs/union.c b/fs/union.c
index 2e005d9..182ca91 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -14,6 +14,7 @@
 
 #include <linux/bootmem.h>
 #include <linux/init.h>
+#include <linux/module.h>
 #include <linux/types.h>
 #include <linux/hash.h>
 #include <linux/fs.h>
@@ -236,6 +237,8 @@ int append_to_union(struct vfsmount *upper_mnt, struct dentry *upper_dentry,
 		union_put(new);
 		return 0;
 	}
+	list_add(&new->u_unions, &upper_dentry->d_unions);
+	lower_dentry->d_unionized++;
 	__union_hash(new);
 	spin_unlock(&union_lock);
 	return 0;
@@ -288,3 +291,56 @@ int union_down_one(struct vfsmount **mnt, struct dentry **dentry)
 	}
 	return 0;
 }
+
+/**
+ * __d_drop_unions  -  remove all this dentry's unions from the union hash table
+ *
+ * @dentry - topmost dentry in the union stack to remove
+ *
+ * This must be called after unhashing a dentry. This is called with
+ * dcache_lock held and unhashes all the unions this dentry is
+ * attached to.
+ */
+void __d_drop_unions(struct dentry *dentry)
+{
+	struct union_mount *this, *next;
+
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions)
+		__union_unhash(this);
+	spin_unlock(&union_lock);
+}
+EXPORT_SYMBOL_GPL(__d_drop_unions);
+
+/*
+ * This must be called after __d_drop_unions() without holding any locks.
+ * Note: The dentry might still be reachable via a lookup but at that time it
+ * already a negative dentry. Otherwise it would be unhashed. The union_mount
+ * structure itself is still reachable through mnt->mnt_unions (which we
+ * protect against with union_lock).
+ *
+ * We were worried about a recursive dput() call through:
+ *
+ * dput()->d_kill()->shrink_d_unions()->union_put()->dput()
+ *
+ * But this path can only be reached if the dentry is unhashed when we
+ * enter the first dput(), and it can only be unhashed if it was
+ * rmdir()'d, and d_delete() calls shrink_d_unions() for us.
+ */
+void shrink_d_unions(struct dentry *dentry)
+{
+	struct union_mount *this, *next;
+
+repeat:
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
+		BUG_ON(!hlist_unhashed(&this->u_hash));
+		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_unions);
+		this->u_next.dentry->d_unionized--;
+		spin_unlock(&union_lock);
+		union_put(this);
+		goto repeat;
+	}
+	spin_unlock(&union_lock);
+}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index d6c1da2..bfa8f97 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -227,12 +227,20 @@ extern seqlock_t rename_lock;
  * __d_drop requires dentry->d_lock.
  */
 
+#ifdef CONFIG_UNION_MOUNT
+extern void __d_drop_unions(struct dentry *);
+#endif
+
 static inline void __d_drop(struct dentry *dentry)
 {
 	if (!(dentry->d_flags & DCACHE_UNHASHED)) {
 		dentry->d_flags |= DCACHE_UNHASHED;
 		hlist_del_rcu(&dentry->d_hash);
 	}
+#ifdef CONFIG_UNION_MOUNT
+	/* remove dentry from the union hashtable */
+	__d_drop_unions(dentry);
+#endif
 }
 
 static inline void d_drop(struct dentry *dentry)
diff --git a/include/linux/union.h b/include/linux/union.h
index 71dc35a..0ab0a2f 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -42,12 +42,16 @@ struct union_mount {
 extern int append_to_union(struct vfsmount *, struct dentry *,
 			   struct vfsmount *, struct dentry *);
 extern int union_down_one(struct vfsmount **, struct dentry **);
+extern void __d_drop_unions(struct dentry *);
+extern void shrink_d_unions(struct dentry *);
 
 #else /* CONFIG_UNION_MOUNT */
 
 #define IS_MNT_UNION(x)			(0)
 #define append_to_union(x1, y1, x2, y2)	({ BUG(); (0); })
 #define union_down_one(x, y)		({ (0); })
+#define __d_drop_unions(x)		do { } while (0)
+#define shrink_d_unions(x)		do { } while (0)
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] union-mount: Implement union lookup
  2010-03-02 22:11   ` [PATCH 2/6] union-mount: Drive the union cache via dcache Valerie Aurora
@ 2010-03-02 22:11     ` Valerie Aurora
  2010-03-02 22:11       ` [PATCH 4/6] union-mount: Support for mounting union mount file systems Valerie Aurora
  2010-03-03 17:35     ` [PATCH 2/6] union-mount: Drive the union cache via dcache Miklos Szeredi
  1 sibling, 1 reply; 18+ messages in thread
From: Valerie Aurora @ 2010-03-02 22:11 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Valerie Aurora

Implement unioned directories, whiteouts, and fallthrus in pathname
lookup routines.  do_lookup() and lookup_hash() call lookup_union()
after looking up the dentry from the top-level file system.
lookup_union() is centered around __lookup_hash(), which does cached
and/or real lookups and revalidates each dentry in the union stack.

The added cost to a non-union mount pathname lookup in a
CONFIG_UNION_MOUNT kernel is either one or two mount flag tests per
pathname component, in needs_union_lookup().

XXX - implement negative union cache entries
---
 fs/namei.c            |  191 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/union.c            |   67 +++++++++++++++++
 include/linux/union.h |    9 +++
 3 files changed, 266 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fee4326..85d95fc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -33,6 +33,7 @@
 #include <linux/fcntl.h>
 #include <linux/device_cgroup.h>
 #include <linux/fs_struct.h>
+#include <linux/union.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -739,6 +740,181 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 	follow_mount(&nd->path);
 }
 
+static struct dentry *__lookup_hash(struct qstr *name, struct dentry *base,
+				    struct nameidata *nd);
+
+/*
+ * __lookup_union - Given a path from the topmost layer, lookup and
+ * revalidate each dentry in its union stack, building it if necessary
+ *
+ * @nd - nameidata for the parent of @topmost
+ * @name - pathname from this element on
+ * @topmost - path of the topmost matching dentry
+ *
+ * Given the nameidata and the path of the topmost dentry for this
+ * pathname, lookup, revalidate, and build the associated union stack.
+ * @topmost must be either a negative dentry or a directory.
+ *
+ * This function is called both to build a new union stack and to
+ * revalidate a pre-existing union stack.  So we must cope with
+ * already existing union cache entries.
+ *
+ * This function may stomp nd->path with the path of the parent
+ * directory of lower layer, so the caller must save nd->path and
+ * restore it afterwards.  You probably want to use lookup_union(),
+ * not __lookup_union().
+ */
+
+static int __lookup_union(struct nameidata *nd, struct qstr *name,
+			  struct path *topmost)
+{
+	struct path parent = nd->path;
+	struct dentry *dentry;
+	struct path upper;
+	struct path lower;
+	int err = 0;
+
+	if (d_is_whiteout(topmost->dentry))
+		return 0;
+
+	if (IS_OPAQUE(nd->path.dentry->d_inode) &&
+	    !d_is_fallthru(topmost->dentry))
+		return 0;
+
+	/* upper is the most recent positive dentry or topmost negative */
+	upper.dentry = dget(topmost->dentry);
+	upper.mnt = mntget(topmost->mnt);
+
+	/* union_down_one() drops a reference, take one */
+	path_get(&nd->path);
+
+	/* Traverse the parent dir's union stack looking for this name */
+	while (union_down_one(&nd->path.mnt, &nd->path.dentry)) {
+		/* Lookup and revalidate the child dentry */
+		lower.mnt = nd->path.mnt;
+		lower.dentry = __lookup_hash(name, nd->path.dentry, nd);
+
+		if (IS_ERR(lower.dentry)) {
+			err = PTR_ERR(lower.dentry);
+			break;
+		}
+
+		if (d_is_whiteout(lower.dentry)) {
+			dput(lower.dentry);
+			break;
+		}
+
+		if (IS_OPAQUE(nd->path.dentry->d_inode) &&
+		    !d_is_fallthru(lower.dentry))
+			break;
+
+		if (!lower.dentry->d_inode) {
+			dput(lower.dentry);
+			continue;
+		}
+
+		/*
+		 * You can't union a file with a directory!  Note that
+		 * if the topmost directory entry is positive, then it
+		 * will be a directory at this point.
+		 */
+		if (topmost->dentry->d_inode &&
+		    !S_ISDIR(lower.dentry->d_inode->i_mode)) {
+			dput(lower.dentry);
+			break;
+		}
+
+		/* Non-dir entries block anything below, so bail out */
+		if (!S_ISDIR(lower.dentry->d_inode->i_mode)) {
+			dput(topmost->dentry);
+			topmost->dentry = lower.dentry;
+			/* mntput() of previous topmost done in link_path_walk() */
+			topmost->mnt = mntget(lower.mnt);
+			break;
+		}
+
+		/* The topmost directory must always exist.  Create if necessary. */
+		if (!topmost->dentry->d_inode) {
+			dentry = union_create_topmost_dir(&parent, name, &lower);
+			if (IS_ERR(dentry)) {
+				err = PTR_ERR(dentry);
+				dput(lower.dentry);
+				break;
+			}
+			dput(topmost->dentry);
+			topmost->dentry = dentry;
+			dput(upper.dentry);
+			upper.dentry = dget(dentry);
+		}
+
+		/* Add new dentry to the union stack (handles already-created case) */
+		err = append_to_union(upper.mnt, upper.dentry, lower.mnt, lower.dentry);
+		if (err) {
+			dput(lower.dentry);
+			break;
+		}
+
+		path_put(&upper);
+		upper.mnt = mntget(lower.mnt);
+		upper.dentry = lower.dentry;
+	}
+	path_put(&nd->path);
+	path_put(&upper);
+
+	return err;
+}
+
+/*
+ * lookup_union - revalidate and build union stack for this path
+ *
+ * We borrow the nameidata struct from the topmost layer to do the
+ * revalidation on lower dentries, replacing the topmost parent
+ * directory's path with that of the matching parent dir in each lower
+ * layer.  This wrapper for __lookup_union() saves the topmost layer's
+ * path and restores it when we are done.
+ */
+static int lookup_union(struct nameidata *nd, struct qstr *name,
+			struct path *topmost)
+{
+	struct path saved_path;
+	int err;
+
+	BUG_ON(!IS_MNT_UNION(nd->path.mnt) && !IS_MNT_UNION(topmost->mnt));
+	BUG_ON(!mutex_is_locked(&nd->path.dentry->d_inode->i_mutex));
+
+	saved_path = nd->path;
+	path_get(&saved_path);
+
+	err = __lookup_union(nd, name, topmost);
+
+	nd->path = saved_path;
+	path_put(&saved_path);
+
+	return err;
+}
+
+/*
+ * do_union_lookup - union mount-aware part of do_lookup
+ *
+ * do_lookup()-style wrapper for lookup_union().  Follows mounts.
+ */
+
+static int do_union_lookup(struct nameidata *nd, struct qstr *name,
+			   struct path *topmost)
+{
+	struct dentry *parent = nd->path.dentry;
+	struct inode *dir = parent->d_inode;
+	int err;
+
+	mutex_lock(&dir->i_mutex);
+	err = lookup_union(nd, name, topmost);
+	mutex_unlock(&dir->i_mutex);
+
+	__follow_mount(topmost);
+
+	return err;
+}
+
 /*
  *  It's more convoluted than I'd like it to be, but... it's still fairly
  *  small and for now I'd prefer to have fast path as straight as possible.
@@ -769,6 +945,11 @@ done:
 	path->mnt = mnt;
 	path->dentry = dentry;
 	__follow_mount(path);
+	if (needs_union_lookup(nd->path.mnt, path)) {
+		int err = do_union_lookup(nd, name, path);
+		if (err < 0)
+			return err;
+	}
 	return 0;
 
 need_lookup:
@@ -1231,8 +1412,13 @@ static int lookup_hash(struct nameidata *nd, struct qstr *name,
 		err = PTR_ERR(path->dentry);
 		path->dentry = NULL;
 		path->mnt = NULL;
+		return err;
 	}
+
+	if (needs_union_lookup(nd->path.mnt, path))
+		err = lookup_union(nd, name, path);
 	return err;
+
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
@@ -2947,7 +3133,10 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	error = -EXDEV;
 	if (oldnd.path.mnt != newnd.path.mnt)
 		goto exit2;
-
+	/* Rename on union mounts not implemented yet */
+	/* XXX much harsher check than necessary - can do some renames */
+	if (IS_UNIONED_DIR(&oldnd.path) || IS_UNIONED_DIR(&newnd.path))
+		goto exit2;
 	old_dir = oldnd.path.dentry;
 	error = -EBUSY;
 	if (oldnd.last_type != LAST_NORM)
diff --git a/fs/union.c b/fs/union.c
index 182ca91..6823081 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -21,6 +21,7 @@
 #include <linux/mount.h>
 #include <linux/fs_struct.h>
 #include <linux/union.h>
+#include <linux/namei.h>
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -196,6 +197,43 @@ struct union_mount *union_rlookup(struct dentry *dentry, struct vfsmount *mnt)
 }
 
 /*
+ * needs_union_lookup - Does this path need a union lookup?
+ *
+ * @parent_mnt - parent mnt, usually from associated nameidata (nd->path.mnt)
+ * @path - path of potential child union directory
+ *
+ * Short-circuit union operations on paths that can't possibly be
+ * unioned directories or don't need union lookup.
+ */
+
+int needs_union_lookup(struct vfsmount *parent_mnt, struct path *path)
+{
+	/* If this is the root of a mount, ignore the parent */
+	if (IS_ROOT(path->dentry) && !IS_MNT_UNION(path->mnt))
+		return 0;
+
+	/* The child could be from a lower layer, check the parent mnt */
+	if (!IS_MNT_UNION(parent_mnt))
+		return 0;
+
+	/* Only directories can be unioned */
+	if (path->dentry->d_inode &&
+	    !S_ISDIR(path->dentry->d_inode->i_mode))
+		return 0;
+
+	/*
+	 * XXX - A negative dentry for a directory in a unioned
+	 * directory could have a matching directory below it.  Or it
+	 * could not.  Either way, all we have is a negative dentry.
+	 * As a result, negative dentries with unioned parents always
+	 * have to go through a full union lookup.  This can be
+	 * avoided by adding a negative union cache entry for the
+	 * negative dentry.
+	 */
+	return 1;
+}
+
+/*
  * append_to_union - add a path to the bottom of the union stack
  *
  * Allocate and attach a union cache entry linking the new, upper
@@ -344,3 +382,32 @@ repeat:
 	}
 	spin_unlock(&union_lock);
 }
+
+/*
+ * union_create_topmost_dir - Create a matching dir in the topmost file system
+ */
+
+struct dentry * union_create_topmost_dir(struct path *parent, struct qstr *name,
+					 struct path *lower)
+{
+	struct dentry *dentry;
+	int mode = lower->dentry->d_inode->i_mode;
+	int res;
+
+	res = mnt_want_write(parent->mnt);
+	if (res)
+		return ERR_PTR(res);
+
+	dentry = lookup_one_len(name->name, parent->dentry, name->len);
+	if (IS_ERR(dentry))
+		goto out;
+
+	res = vfs_mkdir(parent->dentry->d_inode, dentry, mode);
+	if (res) {
+		dput(dentry);
+		goto out;
+	}
+out:
+	mnt_drop_write(parent->mnt);
+	return dentry;
+}
diff --git a/include/linux/union.h b/include/linux/union.h
index 0ab0a2f..938b15a 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -38,20 +38,29 @@ struct union_mount {
 };
 
 #define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
+#define IS_UNIONED_DIR(path)	(IS_MNT_UNION((path)->mnt) && \
+				 ((path)->dentry->d_unionized || \
+				  !list_empty(&(path)->dentry->d_unions)))
 
+extern int needs_union_lookup(struct vfsmount *, struct path *);
 extern int append_to_union(struct vfsmount *, struct dentry *,
 			   struct vfsmount *, struct dentry *);
 extern int union_down_one(struct vfsmount **, struct dentry **);
 extern void __d_drop_unions(struct dentry *);
 extern void shrink_d_unions(struct dentry *);
+extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
+						struct path *);
 
 #else /* CONFIG_UNION_MOUNT */
 
 #define IS_MNT_UNION(x)			(0)
+#define IS_UNIONED_DIR(x)		(0)
+#define needs_union_lookup(x, y)	({ (0); })
 #define append_to_union(x1, y1, x2, y2)	({ BUG(); (0); })
 #define union_down_one(x, y)		({ (0); })
 #define __d_drop_unions(x)		do { } while (0)
 #define shrink_d_unions(x)		do { } while (0)
+#define union_create_topmost_dir(x, y, z)	({ BUG(); (NULL); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] union-mount: Support for mounting union mount file systems
  2010-03-02 22:11     ` [PATCH 3/6] union-mount: Implement union lookup Valerie Aurora
@ 2010-03-02 22:11       ` Valerie Aurora
  2010-03-02 22:11         ` [PATCH 5/6] union-mount: Call do_whiteout() on unlink and rmdir in unions Valerie Aurora
  0 siblings, 1 reply; 18+ messages in thread
From: Valerie Aurora @ 2010-03-02 22:11 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Valerie Aurora, Jan Blunck

Create and tear down union mount structures on mount.  Check
requirements for union mounts.

Thanks to Felix Fietkau <nbd@openwrt.org> for a bug fix.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c        |  123 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/union.c            |   56 ++++++++++++++++++++++
 include/linux/union.h |    5 ++
 3 files changed, 183 insertions(+), 1 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index fc56bf7..c994173 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -29,6 +29,7 @@
 #include <linux/log2.h>
 #include <linux/idr.h>
 #include <linux/fs_struct.h>
+#include <linux/union.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include "pnode.h"
@@ -157,6 +158,9 @@ struct vfsmount *alloc_vfsmnt(const char *name)
 #else
 		mnt->mnt_writers = 0;
 #endif
+#ifdef CONFIG_UNION_MOUNT
+		INIT_LIST_HEAD(&mnt->mnt_unions);
+#endif
 	}
 	return mnt;
 
@@ -492,6 +496,7 @@ static void __touch_mnt_namespace(struct mnt_namespace *ns)
 
 static void detach_mnt(struct vfsmount *mnt, struct path *old_path)
 {
+	detach_mnt_union(mnt);
 	old_path->dentry = mnt->mnt_mountpoint;
 	old_path->mnt = mnt->mnt_parent;
 	mnt->mnt_parent = mnt;
@@ -515,6 +520,7 @@ static void attach_mnt(struct vfsmount *mnt, struct path *path)
 	list_add_tail(&mnt->mnt_hash, mount_hashtable +
 			hash(path->mnt, path->dentry));
 	list_add_tail(&mnt->mnt_child, &path->mnt->mnt_mounts);
+	attach_mnt_union(mnt, path->mnt, path->dentry);
 }
 
 /*
@@ -537,6 +543,7 @@ static void commit_tree(struct vfsmount *mnt)
 	list_add_tail(&mnt->mnt_hash, mount_hashtable +
 				hash(parent, mnt->mnt_mountpoint));
 	list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	attach_mnt_union(mnt, mnt->mnt_parent, mnt->mnt_mountpoint);
 	touch_mnt_namespace(n);
 }
 
@@ -1025,6 +1032,7 @@ void release_mounts(struct list_head *head)
 			struct dentry *dentry;
 			struct vfsmount *m;
 			spin_lock(&vfsmount_lock);
+			detach_mnt_union(mnt);
 			dentry = mnt->mnt_mountpoint;
 			m = mnt->mnt_parent;
 			mnt->mnt_mountpoint = mnt->mnt_root;
@@ -1143,6 +1151,13 @@ static int do_umount(struct vfsmount *mnt, int flags)
 	spin_unlock(&vfsmount_lock);
 	if (retval)
 		security_sb_umount_busy(mnt);
+	/*
+	 * If this was a union mount, we are no longer a read-only
+	 * user on the underlying mount.
+	 */
+	if (mnt->mnt_flags & MNT_UNION)
+		dec_hard_readonly_users(mnt->mnt_parent);
+
 	up_write(&namespace_sem);
 	release_mounts(&umount_list);
 	return retval;
@@ -1483,6 +1498,17 @@ static int do_change_type(struct path *path, int flag)
 		return -EINVAL;
 
 	down_write(&namespace_sem);
+
+	/*
+	 * Mounts of file systems with read-only users can't deal with
+	 * mount/umount propagation events - it's the moral equivalent
+	 * of rm -rf dir/ or the like.
+	 */
+	if (sb_is_hard_readonly(mnt->mnt_sb)) {
+		err = -EROFS;
+		goto out_unlock;
+	}
+
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
 		if (err)
@@ -1500,6 +1526,77 @@ static int do_change_type(struct path *path, int flag)
 }
 
 /*
+ * Mount-time check of upper and lower layer file systems to see if we
+ * can union mount one on the other.
+ *
+ * Note on union mounts and mount event propagation: The lower
+ * layer(s) of a union mount must not have any changes to its
+ * namespace.  Therefore, it must not be part of any mount event
+ * propagation group - i.e., shared or slave.  MNT_SHARED and
+ * MNT_SLAVE are not set at mount, but in do_change_type(), which
+ * prevents setting these flags on file systems with read-only users,
+ * which includes the lower layer(s) of a union mount.
+ */
+
+static int
+check_union_mnt(struct path *mntpnt, struct vfsmount *topmost_mnt, int mnt_flags)
+{
+	struct vfsmount *lower_mnt = mntpnt->mnt;
+
+	if (!(mnt_flags & MNT_UNION))
+		return 0;
+
+#ifndef CONFIG_UNION_MOUNT
+	return -EINVAL;
+#endif
+	/*
+	 * We can't deal with namespace changes in the lower layers of
+	 * a union, so the lower layer must be read-only.  Note that
+	 * we could possibly convert a read-write unioned mount into a
+	 * read-only mount here, which would give us a way to union
+	 * more than one layer with separate mount commands.  But
+	 * first we have to solve the locking order problems with more
+	 * than two layers of union.
+	 */
+	if (!(lower_mnt->mnt_sb->s_flags & MS_RDONLY))
+		return -EBUSY;
+
+	/*
+	 * WRITEME: For simplicity, the lower layer can't have
+	 * submounts.  If there's a good reason, we could recursively
+	 * check the whole subtree for read-only-ness, etc. and it
+	 * would probably work fine.
+	 */
+	if (!list_empty(&lower_mnt->mnt_mounts))
+		return -EBUSY;
+
+	/*
+	 * Only permit unioning of file systems at their root
+	 * directories.  This allows us to mark entire mounts as
+	 * unioned.  Otherwise we must slowly and expensively work our
+	 * way up a path looking for a unioned directory before we
+	 * know if a path is from a unioned lower layer.
+	 */
+
+	if (!IS_ROOT(mntpnt->dentry))
+		return -EINVAL;
+
+	/*
+	 * Topmost layer must be writable to support our readdir()
+	 * solution of copying up all lower level entries to the
+	 * topmost layer.
+	 */
+	if (mnt_flags & MNT_READONLY)
+		return -EROFS;
+
+	/* Topmost file system must support whiteouts and fallthrus. */
+	if (!(topmost_mnt->mnt_sb->s_flags & MS_WHITEOUT))
+		return -EINVAL;
+
+	return 0;
+}
+
+/*
  * do loopback mount.
  */
 static int do_loopback(struct path *path, char *old_name,
@@ -1520,6 +1617,9 @@ static int do_loopback(struct path *path, char *old_name,
 	err = -EINVAL;
 	if (IS_MNT_UNBINDABLE(old_path.mnt))
 		goto out;
+	/* Mount part of a union mount elsewhere? The mind boggles. */
+	if (IS_MNT_UNION(old_path.mnt))
+		goto out;
 
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
@@ -1541,7 +1641,6 @@ static int do_loopback(struct path *path, char *old_name,
 		spin_unlock(&vfsmount_lock);
 		release_mounts(&umount_list);
 	}
-
 out:
 	up_write(&namespace_sem);
 	path_put(&old_path);
@@ -1582,6 +1681,9 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if (!check_mnt(path->mnt))
 		return -EINVAL;
 
+	if (mnt_flags & MNT_UNION)
+		return -EINVAL;
+
 	if (path->dentry != path->mnt->mnt_root)
 		return -EINVAL;
 
@@ -1634,6 +1736,9 @@ static int do_move_mount(struct path *path, char *old_name)
 	while (d_mountpoint(path->dentry) &&
 	       follow_down(path))
 		;
+	/* Get the lowest layer of a union mount to move the whole stack */
+	while (union_down_one(&old_path.mnt, &old_path.dentry))
+		;
 	err = -EINVAL;
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
@@ -1746,10 +1851,18 @@ int do_add_mount(struct vfsmount *newmnt, struct path *path,
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	err = check_union_mnt(path, newmnt, mnt_flags);
+	if (err)
+		goto unlock;
+
 	newmnt->mnt_flags = mnt_flags;
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
 
+	/* Union mounts require the lower layer to always be read-only */
+	if (mnt_flags & MNT_UNION)
+		inc_hard_readonly_users(newmnt->mnt_parent);
+
 	if (fslist) /* add to the specified expiration list */
 		list_add_tail(&newmnt->mnt_expire, fslist);
 
@@ -2260,6 +2373,14 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	if (d_unlinked(old.dentry))
 		goto out2;
 	error = -EBUSY;
+	/*
+	 * We want the bottom-most layer of a union mount here - if we
+	 * move that around, all the layers on top move with it.
+	 */
+	while (union_down_one(&new.mnt, &new.dentry))
+		;
+	while (union_down_one(&root.mnt, &root.dentry))
+		;
 	if (new.mnt == root.mnt ||
 	    old.mnt == root.mnt)
 		goto out2; /* loop, on the same file system  */
diff --git a/fs/union.c b/fs/union.c
index 6823081..ed852e5 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -114,6 +114,7 @@ struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
 
 	atomic_set(&um->u_count, 1);
 	INIT_LIST_HEAD(&um->u_unions);
+	INIT_LIST_HEAD(&um->u_list);
 	INIT_HLIST_NODE(&um->u_hash);
 	INIT_HLIST_NODE(&um->u_rhash);
 
@@ -275,6 +276,7 @@ int append_to_union(struct vfsmount *upper_mnt, struct dentry *upper_dentry,
 		union_put(new);
 		return 0;
 	}
+	list_add(&new->u_list, &upper_mnt->mnt_unions);
 	list_add(&new->u_unions, &upper_dentry->d_unions);
 	lower_dentry->d_unionized++;
 	__union_hash(new);
@@ -374,6 +376,7 @@ repeat:
 	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
 		BUG_ON(!hlist_unhashed(&this->u_hash));
 		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_list);
 		list_del(&this->u_unions);
 		this->u_next.dentry->d_unionized--;
 		spin_unlock(&union_lock);
@@ -384,6 +387,59 @@ repeat:
 }
 
 /*
+ * Remove all union_mounts structures belonging to this vfsmount from the
+ * union lookup hashtable and so on ...
+ */
+void shrink_mnt_unions(struct vfsmount *mnt)
+{
+	struct union_mount *this, *next;
+
+repeat:
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &mnt->mnt_unions, u_list) {
+		if (this->u_this.dentry == mnt->mnt_root)
+			continue;
+		__union_unhash(this);
+		list_del(&this->u_list);
+		list_del(&this->u_unions);
+		this->u_next.dentry->d_unionized--;
+		spin_unlock(&union_lock);
+		union_put(this);
+		goto repeat;
+	}
+	spin_unlock(&union_lock);
+}
+
+int attach_mnt_union(struct vfsmount *mnt, struct vfsmount *dest_mnt,
+		     struct dentry *dest_dentry)
+{
+	if (!IS_MNT_UNION(mnt))
+		return 0;
+
+	return append_to_union(mnt, mnt->mnt_root, dest_mnt, dest_dentry);
+}
+
+void detach_mnt_union(struct vfsmount *mnt)
+{
+	struct union_mount *um;
+
+	if (!IS_MNT_UNION(mnt))
+		return;
+
+	shrink_mnt_unions(mnt);
+
+	spin_lock(&union_lock);
+	um = union_lookup(mnt->mnt_root, mnt);
+	__union_unhash(um);
+	list_del(&um->u_list);
+	list_del(&um->u_unions);
+	um->u_next.dentry->d_unionized--;
+	spin_unlock(&union_lock);
+	union_put(um);
+	return;
+}
+
+/*
  * union_create_topmost_dir - Create a matching dir in the topmost file system
  */
 
diff --git a/include/linux/union.h b/include/linux/union.h
index 938b15a..6eaeae8 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -50,6 +50,9 @@ extern void __d_drop_unions(struct dentry *);
 extern void shrink_d_unions(struct dentry *);
 extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
 						struct path *);
+extern int attach_mnt_union(struct vfsmount *, struct vfsmount *,
+			    struct dentry *);
+extern void detach_mnt_union(struct vfsmount *);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -61,6 +64,8 @@ extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
 #define __d_drop_unions(x)		do { } while (0)
 #define shrink_d_unions(x)		do { } while (0)
 #define union_create_topmost_dir(x, y, z)	({ BUG(); (NULL); })
+#define attach_mnt_union(x, y, z)	do { } while (0)
+#define detach_mnt_union(x)		do { } while (0)
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] union-mount: Call do_whiteout() on unlink and rmdir in unions
  2010-03-02 22:11       ` [PATCH 4/6] union-mount: Support for mounting union mount file systems Valerie Aurora
@ 2010-03-02 22:11         ` Valerie Aurora
  2010-03-02 22:11           ` [PATCH 6/6] union-mount: Copy up directory entries on first readdir() Valerie Aurora
  0 siblings, 1 reply; 18+ messages in thread
From: Valerie Aurora @ 2010-03-02 22:11 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Jan Blunck, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Call do_whiteout() when removing files and directories from a union
mounted file system.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 85d95fc..9797074 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2670,6 +2670,10 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
+	if (IS_UNIONED_DIR(&nd.path)) {
+		error = do_whiteout(&nd, &path, 1);
+		goto exit4;
+	}
 	error = security_path_rmdir(&nd.path, path.dentry);
 	if (error)
 		goto exit4;
@@ -2758,6 +2762,10 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
+		if (IS_UNIONED_DIR(&nd.path)) {
+			error = do_whiteout(&nd, &path, 0);
+			goto exit3;
+		}
 		error = security_path_unlink(&nd.path, path.dentry);
 		if (error)
 			goto exit3;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] union-mount: Copy up directory entries on first readdir()
  2010-03-02 22:11         ` [PATCH 5/6] union-mount: Call do_whiteout() on unlink and rmdir in unions Valerie Aurora
@ 2010-03-02 22:11           ` Valerie Aurora
  2010-03-03 21:53             ` Multiple read-only layers in union mounts (was Re: [PATCH 6/6] union-mount: Copy up directory entries on first readdir()) Valerie Aurora
  0 siblings, 1 reply; 18+ messages in thread
From: Valerie Aurora @ 2010-03-02 22:11 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Valerie Aurora, Felix Fietkau

readdir() in union mounts is implemented by copying up all visible
directory entries from the lower level directories to the topmost
directory.  Directory entries that refer to lower level file system
objects are marked as "fallthru" in the topmost directory.

Thanks to Felix Fietkau <nbd@openwrt.org> for a bug fix.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
---
 fs/readdir.c          |    9 +++
 fs/union.c            |  160 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/union.h |    2 +
 3 files changed, 171 insertions(+), 0 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index 3a48491..da71515 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -16,6 +16,8 @@
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/unistd.h>
+#include <linux/union.h>
+#include <linux/mount.h>
 
 #include <asm/uaccess.h>
 
@@ -36,9 +38,16 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
 
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
+		if (IS_UNIONED_DIR(&file->f_path) && !IS_OPAQUE(inode)) {
+			res = union_copyup_dir(&file->f_path);
+			if (res)
+				goto out_unlock;
+		}
+
 		res = file->f_op->readdir(file, buf, filler);
 		file_accessed(file);
 	}
+out_unlock:
 	mutex_unlock(&inode->i_mutex);
 out:
 	return res;
diff --git a/fs/union.c b/fs/union.c
index ed852e5..b66989a 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2007-2009 Novell Inc.
  *
  *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *              Valerie Aurora <vaurora@redhat.com>
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the Free
@@ -22,6 +23,8 @@
 #include <linux/fs_struct.h>
 #include <linux/union.h>
 #include <linux/namei.h>
+#include <linux/file.h>
+#include <linux/security.h>
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -467,3 +470,160 @@ out:
 	mnt_drop_write(parent->mnt);
 	return dentry;
 }
+
+/**
+ * union_copyup_dir_one - copy up a single directory entry
+ *
+ * Individual directory entry copyup function for union_copyup_dir.
+ * We get the entries from higher level layers first.
+ */
+
+static int union_copyup_dir_one(void *buf, const char *name, int namlen,
+				loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct dentry *topmost_dentry = (struct dentry *) buf;
+	struct dentry *dentry;
+	int err = 0;
+
+	switch (namlen) {
+	case 2:
+		if (name[1] != '.')
+			break;
+	case 1:
+		if (name[0] != '.')
+			break;
+		return 0;
+	}
+
+	/* Lookup this entry in the topmost directory */
+	dentry = lookup_one_len(name, topmost_dentry, namlen);
+
+	if (IS_ERR(dentry)) {
+		printk(KERN_WARNING "%s: error looking up %s\n", __func__,
+		       dentry->d_name.name);
+		err = PTR_ERR(dentry);
+		goto out;
+	}
+
+	/*
+	 * If the entry already exists, one of the following is true:
+	 * it was already copied up (due to an earlier lookup), an
+	 * entry with the same name already exists on the topmost file
+	 * system, it is a whiteout, or it is a fallthru.  In each
+	 * case, the top level entry masks any entries from lower file
+	 * systems, so don't copy up this entry.
+	 */
+	if (dentry->d_inode || d_is_whiteout(dentry) || d_is_fallthru(dentry))
+		goto out_dput;
+
+	/*
+	 * If the entry doesn't exist, create a fallthru entry in the
+	 * topmost file system.  All possible directory types are
+	 * used, so each file system must implement its own way of
+	 * storing a fallthru entry.
+	 */
+	err = topmost_dentry->d_inode->i_op->fallthru(topmost_dentry->d_inode,
+						      dentry);
+out_dput:
+	dput(dentry);
+out:
+	return err;
+}
+
+/**
+ * union_copyup_dir - copy up low-level directory entries to topmost dir
+ *
+ * readdir() is difficult to support on union file systems for two
+ * reasons: We must eliminate duplicates and apply whiteouts, and we
+ * must return something in f_pos that lets us restart in the same
+ * place when we return.  Our solution is to, on first readdir() of
+ * the directory, copy up all visible entries from the low-level file
+ * systems and mark the entries that refer to low-level file system
+ * objects as "fallthru" entries.
+ *
+ * Locking strategy: We hold the topmost dir's i_mutex on entry.  We
+ * grab the i_mutex on lower directories one by one.  So the locking
+ * order is:
+ *
+ * Writable/topmost layers > Read-only/lower layers
+ *
+ * So there is no problem with lock ordering for union stacks with
+ * multiple lower layers.  E.g.:
+ *
+ * (topmost) A->B->C (bottom)
+ * (topmost) D->C->B (bottom)
+ *
+ * (Not that we support more than two layers at the moment.)
+ */
+
+int union_copyup_dir(struct path *topmost_path)
+{
+	struct dentry *topmost_dentry = topmost_path->dentry;
+	struct path path = *topmost_path;
+	int res = 0;
+
+	BUG_ON(IS_OPAQUE(topmost_dentry->d_inode));
+
+	res = mnt_want_write(topmost_path->mnt);
+	if (res)
+		return res;
+	/*
+	 * Mark this dir opaque to show that we have already copied up
+	 * the lower entries.  Only fallthru entries pass through to
+	 * the underlying file system.
+	 */
+	topmost_dentry->d_inode->i_flags |= S_OPAQUE;
+	mark_inode_dirty(topmost_dentry->d_inode);
+
+	path_get(&path);
+	while (union_down_one(&path.mnt, &path.dentry)) {
+		struct file * ftmp;
+		struct inode * inode;
+
+		/* XXX Permit fallthrus on lower-level? Would need to
+		 * pass in opaque flag to union_copyup_dir_one() and
+		 * only copy up fallthru entries there.  We allow
+		 * fallthrus in lower level opaque directories on
+		 * lookup, so for consistency we should do one or the
+		 * other in both places. */
+		if (IS_OPAQUE(path.dentry->d_inode))
+			break;
+
+		/* dentry_open() doesn't get a path reference itself */
+		path_get(&path);
+		ftmp = dentry_open(path.dentry, path.mnt,
+				   O_RDONLY | O_DIRECTORY | O_NOATIME,
+				   current_cred());
+		if (IS_ERR(ftmp)) {
+			printk (KERN_ERR "unable to open dir %s for "
+				"directory copyup: %ld\n",
+				path.dentry->d_name.name, PTR_ERR(ftmp));
+			path_put(&path);
+			continue;
+		}
+
+		inode = path.dentry->d_inode;
+		mutex_lock(&inode->i_mutex);
+
+		res = -ENOENT;
+		if (IS_DEADDIR(inode))
+			goto out_fput;
+		/*
+		 * Read the whole directory, calling our directory
+		 * entry copyup function on each entry.  Pass in the
+		 * topmost dentry as our private data so we can create
+		 * new entries in the topmost directory.
+		 */
+		res = ftmp->f_op->readdir(ftmp, topmost_dentry,
+					  union_copyup_dir_one);
+out_fput:
+		mutex_unlock(&inode->i_mutex);
+		fput(ftmp);
+
+		if (res)
+			break;
+	}
+	path_put(&path);
+	mnt_drop_write(topmost_path->mnt);
+	return res;
+}
diff --git a/include/linux/union.h b/include/linux/union.h
index 6eaeae8..11ebc1d 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -53,6 +53,7 @@ extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
 extern int attach_mnt_union(struct vfsmount *, struct vfsmount *,
 			    struct dentry *);
 extern void detach_mnt_union(struct vfsmount *);
+extern int union_copyup_dir(struct path *);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -66,6 +67,7 @@ extern void detach_mnt_union(struct vfsmount *);
 #define union_create_topmost_dir(x, y, z)	({ BUG(); (NULL); })
 #define attach_mnt_union(x, y, z)	do { } while (0)
 #define detach_mnt_union(x)		do { } while (0)
+#define union_copyup_dir(x)		({ BUG(); (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/6] Union mount core rewrite v1
  2010-03-02 22:11 [RFC PATCH 0/6] Union mount core rewrite v1 Valerie Aurora
  2010-03-02 22:11 ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
@ 2010-03-03 17:28 ` Miklos Szeredi
  2010-03-03 20:31   ` Valerie Aurora
  1 sibling, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2010-03-03 17:28 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: viro, hch, linux-fsdevel, linux-kernel, vaurora

On Tue,  2 Mar 2010, Valerie Aurora wrote:
> This is a major rewrite of parts of union mounts, in particular the
> pathname lookup code.  For more info about union mounts, see:
> 
> http://valerieaurora.org/union/
> 
> The previous code had two important problems fixed in this series:
> 
> - On file open, is_unionized() grabs vfsmount lock and walks up the
>   mount tree even for non-union mounts.
> 
> - Pathname lookup required three cut-n-pasted versions of two complex
>   functions, one for each of cached/real/"hashed" lookups.

Looks much better :)

> This rewrite reduces the additional cost of a non-union lookup in a
> CONFIG_UNION_MOUNT kernel to either 1 or 2 mount flag tests (but adds
> the requirement that file systems be unioned only at their root
> directories).  This rewrite implements lookup with one lookup_union()
> function for all types of lookups.
> 
> This posted patch series includes only the union lookup, mount, and
> readdir patches and not the relatively uncontroversial whiteout and
> fallthru code.

Special inode/dentry flags (whiteout, fallthrough, opaque) are not
trivially the right solution:

 - they are invisible from userspace, new APIs are necessary to manipulate them
 - they are difficult to support on network filesystems
 - they are not useful for anything other than union mounts/filesystems

Extended attributes are a more standard way to add such info to files.
Hard links would allow sharing a single inode for all whiteout entries
and one for all fallthrough entries.

Have these options been considered as an alternative?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations
  2010-03-02 22:11 ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
  2010-03-02 22:11   ` [PATCH 2/6] union-mount: Drive the union cache via dcache Valerie Aurora
@ 2010-03-03 17:33   ` Miklos Szeredi
  2010-03-03 20:45     ` Valerie Aurora
  1 sibling, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2010-03-03 17:33 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: viro, hch, linux-fsdevel, linux-kernel, jblunck, vaurora

On Tue,  2 Mar 2010, Valerie Aurora wrote:
> From: Jan Blunck <jblunck@suse.de>
> 
> This patch adds the basic structures and operations of VFS-based union
> mounts (but not the ability to mount or lookup unioned file systems).
> Each directory in a unioned file system has an associated union stack
> created when the directory is first looked up.  The union stack is a
> structure kept in a hash table indexed by mount and dentry of the
> directory; thus, specific paths are unioned, not dentries alone.  The
> union stack keeps a pointer to the upper path and the lower path and
> can be looked up by either path.
> 
> This particular version of union mounts is based on ideas by Jan
> Blunck, Bharata Rao, and many others.
> 
> Signed-off-by: Jan Blunck <jblunck@suse.de>
> Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> ---
>  fs/Kconfig             |   13 ++
>  fs/Makefile            |    1 +
>  fs/dcache.c            |    4 +
>  fs/union.c             |  290 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dcache.h |   20 ++++
>  include/linux/mount.h  |    3 +
>  include/linux/union.h  |   54 +++++++++
>  7 files changed, 385 insertions(+), 0 deletions(-)
>  create mode 100644 fs/union.c
>  create mode 100644 include/linux/union.h
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 64d44ef..303186b 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -59,6 +59,19 @@ source "fs/notify/Kconfig"
>  
>  source "fs/quota/Kconfig"
>  
> +config UNION_MOUNT
> +       bool "Writable overlays (union mounts) (EXPERIMENTAL)"
> +       depends on EXPERIMENTAL
> +       help
> +         Writable overlays allow you to mount a transparent writable
> +	 layer over a read-only file system, for example, an ext3
> +	 partition on a hard drive over a CD-ROM root file system
> +	 image.
> +
> +	 See <file:Documentation/filesystems/union-mounts.txt> for details.
> +
> +	 If unsure, say N.
> +
>  source "fs/autofs/Kconfig"
>  source "fs/autofs4/Kconfig"
>  source "fs/fuse/Kconfig"
> diff --git a/fs/Makefile b/fs/Makefile
> index af6d047..4ed672e 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -52,6 +52,7 @@ obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
>  obj-$(CONFIG_GENERIC_ACL)	+= generic_acl.o
>  
>  obj-y				+= quota/
> +obj-$(CONFIG_UNION_MOUNT)	+= union.o
>  
>  obj-$(CONFIG_PROC_FS)		+= proc/
>  obj-y				+= partitions/
> diff --git a/fs/dcache.c b/fs/dcache.c
> index d14c304..0c2dd32 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -960,6 +960,10 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
>  	INIT_LIST_HEAD(&dentry->d_lru);
>  	INIT_LIST_HEAD(&dentry->d_subdirs);
>  	INIT_LIST_HEAD(&dentry->d_alias);
> +#ifdef CONFIG_UNION_MOUNT
> +	INIT_LIST_HEAD(&dentry->d_unions);
> +	dentry->d_unionized = 0;
> +#endif
>  
>  	if (parent) {
>  		dentry->d_parent = dget(parent);
> diff --git a/fs/union.c b/fs/union.c
> new file mode 100644
> index 0000000..2e005d9
> --- /dev/null
> +++ b/fs/union.c
> @@ -0,0 +1,290 @@
> +/*
> + * VFS based union mount for Linux
> + *
> + * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
> + * Copyright (C) 2007-2009 Novell Inc.
> + *
> + *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; either version 2 of the License, or (at your option)
> + * any later version.
> + */
> +
> +#include <linux/bootmem.h>
> +#include <linux/init.h>
> +#include <linux/types.h>
> +#include <linux/hash.h>
> +#include <linux/fs.h>
> +#include <linux/mount.h>
> +#include <linux/fs_struct.h>
> +#include <linux/union.h>
> +
> +/*
> + * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
> + * should try to make this good - I've just made it work.
> + */
> +static unsigned int union_hash_mask __read_mostly;
> +static unsigned int union_hash_shift __read_mostly;
> +static struct hlist_head *union_hashtable __read_mostly;
> +static unsigned int union_rhash_mask __read_mostly;
> +static unsigned int union_rhash_shift __read_mostly;
> +static struct hlist_head *union_rhashtable __read_mostly;
> +
> +/*
> + * Locking Rules:
> + * - dcache_lock (for union_rlookup() only)
> + * - union_lock
> + */
> +DEFINE_SPINLOCK(union_lock);
> +
> +static struct kmem_cache *union_cache __read_mostly;
> +
> +static unsigned long hash(struct dentry *dentry, struct vfsmount *mnt)
> +{
> +	unsigned long tmp;
> +
> +	tmp = ((unsigned long)mnt * (unsigned long)dentry) ^
> +		(GOLDEN_RATIO_PRIME + (unsigned long)mnt) / L1_CACHE_BYTES;
> +	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> union_hash_shift);
> +	return tmp & union_hash_mask;
> +}
> +
> +static __initdata unsigned long union_hash_entries;
> +
> +static int __init set_union_hash_entries(char *str)
> +{
> +	if (!str)
> +		return 0;
> +	union_hash_entries = simple_strtoul(str, &str, 0);
> +	return 1;
> +}
> +
> +__setup("union_hash_entries=", set_union_hash_entries);
> +
> +static int __init init_union(void)
> +{
> +	int loop;
> +
> +	union_cache = KMEM_CACHE(union_mount, SLAB_PANIC | SLAB_MEM_SPREAD);
> +	union_hashtable = alloc_large_system_hash("Union-cache",
> +						  sizeof(struct hlist_head),
> +						  union_hash_entries,
> +						  14,
> +						  0,
> +						  &union_hash_shift,
> +						  &union_hash_mask,
> +						  0);
> +
> +	for (loop = 0; loop < (1 << union_hash_shift); loop++)
> +		INIT_HLIST_HEAD(&union_hashtable[loop]);
> +
> +
> +	union_rhashtable = alloc_large_system_hash("rUnion-cache",
> +						  sizeof(struct hlist_head),
> +						  union_hash_entries,
> +						  14,
> +						  0,
> +						  &union_rhash_shift,
> +						  &union_rhash_mask,
> +						  0);
> +
> +	for (loop = 0; loop < (1 << union_rhash_shift); loop++)
> +		INIT_HLIST_HEAD(&union_rhashtable[loop]);
> +
> +	return 0;
> +}
> +
> +fs_initcall(init_union);
> +
> +struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
> +				struct dentry *next, struct vfsmount *next_mnt)


Why doesn't union_alloc, append_to_union, union_lookup,
union_down_one, etc use "struct path *" arg instead of separate
vfsmount and dentry pointers?


> +{
> +	struct union_mount *um;
> +
> +	BUG_ON(!S_ISDIR(this->d_inode->i_mode));
> +	BUG_ON(!S_ISDIR(next->d_inode->i_mode));
> +
> +	um = kmem_cache_alloc(union_cache, GFP_ATOMIC);
> +	if (!um)
> +		return NULL;
> +
> +	atomic_set(&um->u_count, 1);

Why is u_count not a "struct kref"?


> +	INIT_LIST_HEAD(&um->u_unions);
> +	INIT_HLIST_NODE(&um->u_hash);
> +	INIT_HLIST_NODE(&um->u_rhash);
> +
> +	um->u_this.mnt = this_mnt;
> +	um->u_this.dentry = this;
> +	um->u_next.mnt = mntget(next_mnt);
> +	um->u_next.dentry = dget(next);
> +
> +	return um;
> +}
> +
> +struct union_mount *union_get(struct union_mount *um)
> +{
> +	BUG_ON(!atomic_read(&um->u_count));
> +	atomic_inc(&um->u_count);
> +	return um;
> +}
> +
> +static int __union_put(struct union_mount *um)
> +{
> +	if (!atomic_dec_and_test(&um->u_count))
> +		return 0;
> +
> +	BUG_ON(!hlist_unhashed(&um->u_hash));
> +	BUG_ON(!hlist_unhashed(&um->u_rhash));
> +
> +	kmem_cache_free(union_cache, um);
> +	return 1;
> +}
> +
> +void union_put(struct union_mount *um)
> +{
> +	struct path tmp = um->u_next;
> +
> +	if (__union_put(um))
> +		path_put(&tmp);
> +}
> +
> +static void __union_hash(struct union_mount *um)
> +{
> +	hlist_add_head(&um->u_hash, union_hashtable +
> +		       hash(um->u_this.dentry, um->u_this.mnt));
> +	hlist_add_head(&um->u_rhash, union_rhashtable +
> +		       hash(um->u_next.dentry, um->u_next.mnt));
> +}
> +
> +static void __union_unhash(struct union_mount *um)
> +{
> +	hlist_del_init(&um->u_hash);
> +	hlist_del_init(&um->u_rhash);
> +}
> +
> +struct union_mount *union_lookup(struct dentry *dentry, struct vfsmount *mnt)
> +{
> +	struct hlist_head *head = union_hashtable + hash(dentry, mnt);
> +	struct hlist_node *node;
> +	struct union_mount *um;
> +
> +	hlist_for_each_entry(um, node, head, u_hash) {
> +		if ((um->u_this.dentry == dentry) &&
> +		    (um->u_this.mnt == mnt))
> +			return um;
> +	}
> +
> +	return NULL;
> +}
> +
> +struct union_mount *union_rlookup(struct dentry *dentry, struct vfsmount *mnt)
> +{
> +	struct hlist_head *head = union_rhashtable + hash(dentry, mnt);
> +	struct hlist_node *node;
> +	struct union_mount *um;
> +
> +	hlist_for_each_entry(um, node, head, u_rhash) {
> +		if ((um->u_next.dentry == dentry) &&
> +		    (um->u_next.mnt == mnt))
> +			return um;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * append_to_union - add a path to the bottom of the union stack
> + *
> + * Allocate and attach a union cache entry linking the new, upper
> + * mnt/dentry to the "covered" matching lower mnt/dentry.  It's okay
> + * if the union cache entry already exists.
> + */
> +
> +int append_to_union(struct vfsmount *upper_mnt, struct dentry *upper_dentry,
> +		    struct vfsmount *lower_mnt, struct dentry *lower_dentry)
> +{
> +	struct union_mount *new, *um;
> +
> +	BUG_ON(!S_ISDIR(upper_dentry->d_inode->i_mode));
> +	BUG_ON(!S_ISDIR(lower_dentry->d_inode->i_mode));
> +
> +	/* Common case is that it's already been created, do a lookup first */
> +
> +	spin_lock(&union_lock);
> +	um = union_lookup(upper_dentry, upper_mnt);
> +	if (um) {
> +		BUG_ON((um->u_next.dentry != lower_dentry) ||
> +		       (um->u_next.mnt != lower_mnt));
> +		spin_unlock(&union_lock);
> +		return 0;
> +	}
> +	spin_unlock(&union_lock);
> +
> +	new = union_alloc(upper_dentry, upper_mnt, lower_dentry, lower_mnt);
> +	if (!new)
> +		return -ENOMEM;
> +
> +	spin_lock(&union_lock);
> +	um = union_lookup(upper_dentry, upper_mnt);
> +	if (um) {
> +		/* Someone added it while we were allocating, no problem */
> +		BUG_ON((um->u_next.dentry != lower_dentry) ||
> +		       (um->u_next.mnt != lower_mnt));
> +		spin_unlock(&union_lock);
> +		union_put(new);
> +		return 0;
> +	}
> +	__union_hash(new);
> +	spin_unlock(&union_lock);
> +	return 0;
> +}
> +
> +/*
> + * WARNING! Confusing terminology alert.
> + *
> + * Note that the directions "up" and "down" in union mounts are the
> + * opposite of "up" and "down" in normal VFS operation terminology.
> + * "up" in the rest of the VFS means "towards the root of the mount
> + * tree."  If you mount B on top of A, following B "up" will get you
> + * A.  In union mounts, "up" means "towards the most recently mounted
> + * layer of the union stack."  If you union mount B on top of A,
> + * following A "up" will get you to B.  Another way to put it is that
> + * "up" in the VFS means going from this mount towards the direction
> + * of its mnt->mnt_parent pointer, but "up" in union mounts means
> + * going in the opposite direction (until you run out of union
> + * layers).
> + */

So if this is confusing, why not use a different terminology for union
layers?  Like "next" and "prev" like it is already used in the
structures.

> +
> +/*
> + * union_down_one - get the next lower directory in the union stack
> + *
> + * This is called to traverse the union stack from the given layer to
> + * the next lower layer. union_down_one() is called by various
> + * lookup functions that are aware of union mounts.
> + *
> + * Returns non-zero if followed to the next lower layer, zero otherwise.
> + *
> + * See note on up/down terminology above.
> + */
> +int union_down_one(struct vfsmount **mnt, struct dentry **dentry)
> +{
> +	struct union_mount *um;
> +
> +	if (!IS_MNT_UNION(*mnt))
> +		return 0;
> +
> +	spin_lock(&union_lock);
> +	um = union_lookup(*dentry, *mnt);
> +	spin_unlock(&union_lock);
> +	if (um) {
> +		path_get(&um->u_next);
> +		dput(*dentry);
> +		*dentry = um->u_next.dentry;
> +		mntput(*mnt);
> +		*mnt = um->u_next.mnt;
> +		return 1;
> +	}
> +	return 0;
> +}
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index e035c51..d6c1da2 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -101,6 +101,26 @@ struct dentry {
>  	struct dentry *d_parent;	/* parent directory */
>  	struct qstr d_name;
>  
> +#ifdef CONFIG_UNION_MOUNT
> +	/*
> +	 * Stacks of union mount structures are connected to dentries
> +	 * through the d_unions field.  If this list is not empty,
> +	 * then this dentry is part of a unioned directory stack.
> +	 * Protected by union_lock.
> +	 */
> +	struct list_head d_unions;	/* list of union_mount's */
> +	/*
> +	 * If d_unionized is set, then this dentry is referenced by
> +	 * the u_next field of a union mount structure - that is, it
> +	 * is a dentry for a lower layer of a union.  d_unionized is
> +	 * NOT set in the dentry for the topmost layer of a union.
> +	 *
> +	 * d_unionized would be better renamed to d_union_lower or
> +	 * d_union_ref.
> +	 */
> +	unsigned int d_unionized;	/* unions referencing this dentry */
> +#endif
> +
>  	struct list_head d_lru;		/* LRU list */
>  	/*
>  	 * d_child and d_rcu can share memory
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index d42be54..85bb75d 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -64,6 +64,9 @@ struct vfsmount {
>  	struct list_head mnt_slave_list;/* list of slave mounts */
>  	struct list_head mnt_slave;	/* slave list entry */
>  	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
> +#ifdef CONFIG_UNION_MOUNT
> +	struct list_head mnt_unions;	/* list of union_mount structures */
> +#endif
>  	struct mnt_namespace *mnt_ns;	/* containing namespace */
>  	int mnt_id;			/* mount identifier */
>  	int mnt_group_id;		/* peer group identifier */
> diff --git a/include/linux/union.h b/include/linux/union.h
> new file mode 100644
> index 0000000..71dc35a
> --- /dev/null
> +++ b/include/linux/union.h
> @@ -0,0 +1,54 @@
> +/*
> + * VFS based union mount for Linux
> + *
> + * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
> + * Copyright (C) 2007 Novell Inc.
> + *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; either version 2 of the License, or (at your option)
> + * any later version.
> + *
> + */
> +#ifndef __LINUX_UNION_H
> +#define __LINUX_UNION_H
> +#ifdef __KERNEL__
> +
> +#include <linux/list.h>
> +#include <asm/atomic.h>
> +
> +struct dentry;
> +struct vfsmount;
> +
> +#ifdef CONFIG_UNION_MOUNT
> +
> +/*
> + * The union mount structure.
> + */
> +struct union_mount {
> +	atomic_t u_count;		/* reference count */
> +	struct list_head u_unions;	/* list head for d_unions */
> +	struct list_head u_list;	/* list head for mnt_unions */
> +	struct hlist_node u_hash;	/* list head for searching */
> +	struct hlist_node u_rhash;	/* list head for reverse searching */
> +
> +	struct path u_this;		/* this is me */
> +	struct path u_next;		/* this is what I overlay */
> +};
> +
> +#define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
> +
> +extern int append_to_union(struct vfsmount *, struct dentry *,
> +			   struct vfsmount *, struct dentry *);
> +extern int union_down_one(struct vfsmount **, struct dentry **);
> +
> +#else /* CONFIG_UNION_MOUNT */
> +
> +#define IS_MNT_UNION(x)			(0)
> +#define append_to_union(x1, y1, x2, y2)	({ BUG(); (0); })
> +#define union_down_one(x, y)		({ (0); })
> +
> +#endif	/* CONFIG_UNION_MOUNT */
> +#endif	/* __KERNEL__ */
> +#endif	/* __LINUX_UNION_H */
> -- 
> 1.5.6.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] union-mount: Drive the union cache via dcache
  2010-03-02 22:11   ` [PATCH 2/6] union-mount: Drive the union cache via dcache Valerie Aurora
  2010-03-02 22:11     ` [PATCH 3/6] union-mount: Implement union lookup Valerie Aurora
@ 2010-03-03 17:35     ` Miklos Szeredi
  2010-03-03 21:49       ` Valerie Aurora
  1 sibling, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2010-03-03 17:35 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: viro, hch, linux-fsdevel, linux-kernel, jblunck, vaurora

On Tue,  2 Mar 2010, Valerie Aurora wrote:
> From: Jan Blunck <jblunck@suse.de>
> 
> If a dentry is removed from dentry cache because its usage count drops to
> zero, the references to the underlying layer of the unions the dentry is in
> are dropped too. Therefore the union cache is driven by the dentry cache.
> 
> Signed-off-by: Jan Blunck <jblunck@suse.de>
> Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> ---
>  fs/dcache.c            |   13 +++++++++++
>  fs/union.c             |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dcache.h |    8 ++++++
>  include/linux/union.h  |    4 +++
>  4 files changed, 81 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 0c2dd32..fc37f7a 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -18,6 +18,7 @@
>  #include <linux/string.h>
>  #include <linux/mm.h>
>  #include <linux/fs.h>
> +#include <linux/union.h>
>  #include <linux/fsnotify.h>
>  #include <linux/slab.h>
>  #include <linux/init.h>
> @@ -175,6 +176,8 @@ static struct dentry *d_kill(struct dentry *dentry)
>  	dentry_stat.nr_dentry--;	/* For d_free, below */
>  	/*drops the locks, at that point nobody can reach this dentry */
>  	dentry_iput(dentry);
> +	/* If the dentry was in an union delete them */
> +	shrink_d_unions(dentry);
>  	if (IS_ROOT(dentry))
>  		parent = NULL;
>  	else
> @@ -696,6 +699,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
>  					iput(inode);
>  			}
>  
> +			shrink_d_unions(dentry);
>  			d_free(dentry);
>  
>  			/* finished when we fall off the top of the tree,
> @@ -1536,7 +1540,9 @@ void d_delete(struct dentry * dentry)
>  	spin_lock(&dentry->d_lock);
>  	isdir = S_ISDIR(dentry->d_inode->i_mode);
>  	if (atomic_read(&dentry->d_count) == 1) {
> +		__d_drop_unions(dentry);
>  		dentry_iput(dentry);
> +		shrink_d_unions(dentry);
>  		fsnotify_nameremove(dentry, isdir);
>  		return;
>  	}
> @@ -1547,6 +1553,13 @@ void d_delete(struct dentry * dentry)
>  	spin_unlock(&dentry->d_lock);
>  	spin_unlock(&dcache_lock);
>  
> +	/*
> +	 * Remove any associated unions.  While someone still has this
> +	 * directory open (ref count > 0), we could not have deleted
> +	 * it unless it was empty, and therefore has no references to
> +	 * directories below it.  So we don't need the unions.
> +	 */
> +	shrink_d_unions(dentry);
>  	fsnotify_nameremove(dentry, isdir);
>  }
>  EXPORT_SYMBOL(d_delete);
> diff --git a/fs/union.c b/fs/union.c
> index 2e005d9..182ca91 100644
> --- a/fs/union.c
> +++ b/fs/union.c
> @@ -14,6 +14,7 @@
>  
>  #include <linux/bootmem.h>
>  #include <linux/init.h>
> +#include <linux/module.h>
>  #include <linux/types.h>
>  #include <linux/hash.h>
>  #include <linux/fs.h>
> @@ -236,6 +237,8 @@ int append_to_union(struct vfsmount *upper_mnt, struct dentry *upper_dentry,
>  		union_put(new);
>  		return 0;
>  	}
> +	list_add(&new->u_unions, &upper_dentry->d_unions);
> +	lower_dentry->d_unionized++;
>  	__union_hash(new);
>  	spin_unlock(&union_lock);
>  	return 0;
> @@ -288,3 +291,56 @@ int union_down_one(struct vfsmount **mnt, struct dentry **dentry)
>  	}
>  	return 0;
>  }
> +
> +/**
> + * __d_drop_unions  -  remove all this dentry's unions from the union hash table
> + *
> + * @dentry - topmost dentry in the union stack to remove
> + *
> + * This must be called after unhashing a dentry. This is called with
> + * dcache_lock held and unhashes all the unions this dentry is
> + * attached to.
> + */
> +void __d_drop_unions(struct dentry *dentry)
> +{
> +	struct union_mount *this, *next;
> +
> +	spin_lock(&union_lock);
> +	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions)
> +		__union_unhash(this);
> +	spin_unlock(&union_lock);
> +}
> +EXPORT_SYMBOL_GPL(__d_drop_unions);
> +
> +/*
> + * This must be called after __d_drop_unions() without holding any locks.
> + * Note: The dentry might still be reachable via a lookup but at that time it
> + * already a negative dentry. Otherwise it would be unhashed. The union_mount
> + * structure itself is still reachable through mnt->mnt_unions (which we
> + * protect against with union_lock).
> + *
> + * We were worried about a recursive dput() call through:
> + *
> + * dput()->d_kill()->shrink_d_unions()->union_put()->dput()
> + *
> + * But this path can only be reached if the dentry is unhashed when we
> + * enter the first dput(), and it can only be unhashed if it was
> + * rmdir()'d, and d_delete() calls shrink_d_unions() for us.
> + */
> +void shrink_d_unions(struct dentry *dentry)
> +{
> +	struct union_mount *this, *next;
> +
> +repeat:
> +	spin_lock(&union_lock);
> +	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
> +		BUG_ON(!hlist_unhashed(&this->u_hash));
> +		BUG_ON(!hlist_unhashed(&this->u_rhash));
> +		list_del(&this->u_unions);
> +		this->u_next.dentry->d_unionized--;
> +		spin_unlock(&union_lock);
> +		union_put(this);
> +		goto repeat;

This loop is weird.  That list_for_each_entry_safe is just used to
initialize "this", since it unconditionally restarts from the
beginning.

> +	}
> +	spin_unlock(&union_lock);
> +}
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index d6c1da2..bfa8f97 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -227,12 +227,20 @@ extern seqlock_t rename_lock;
>   * __d_drop requires dentry->d_lock.
>   */
>  
> +#ifdef CONFIG_UNION_MOUNT
> +extern void __d_drop_unions(struct dentry *);
> +#endif
> +
>  static inline void __d_drop(struct dentry *dentry)
>  {
>  	if (!(dentry->d_flags & DCACHE_UNHASHED)) {
>  		dentry->d_flags |= DCACHE_UNHASHED;
>  		hlist_del_rcu(&dentry->d_hash);
>  	}
> +#ifdef CONFIG_UNION_MOUNT
> +	/* remove dentry from the union hashtable */
> +	__d_drop_unions(dentry);
> +#endif
>  }
>  
>  static inline void d_drop(struct dentry *dentry)
> diff --git a/include/linux/union.h b/include/linux/union.h
> index 71dc35a..0ab0a2f 100644
> --- a/include/linux/union.h
> +++ b/include/linux/union.h
> @@ -42,12 +42,16 @@ struct union_mount {
>  extern int append_to_union(struct vfsmount *, struct dentry *,
>  			   struct vfsmount *, struct dentry *);
>  extern int union_down_one(struct vfsmount **, struct dentry **);
> +extern void __d_drop_unions(struct dentry *);
> +extern void shrink_d_unions(struct dentry *);
>  
>  #else /* CONFIG_UNION_MOUNT */
>  
>  #define IS_MNT_UNION(x)			(0)
>  #define append_to_union(x1, y1, x2, y2)	({ BUG(); (0); })
>  #define union_down_one(x, y)		({ (0); })
> +#define __d_drop_unions(x)		do { } while (0)
> +#define shrink_d_unions(x)		do { } while (0)
>  
>  #endif	/* CONFIG_UNION_MOUNT */
>  #endif	/* __KERNEL__ */
> -- 
> 1.5.6.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/6] Union mount core rewrite v1
  2010-03-03 17:28 ` [RFC PATCH 0/6] Union mount core rewrite v1 Miklos Szeredi
@ 2010-03-03 20:31   ` Valerie Aurora
  0 siblings, 0 replies; 18+ messages in thread
From: Valerie Aurora @ 2010-03-03 20:31 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: viro, hch, linux-fsdevel, linux-kernel

On Wed, Mar 03, 2010 at 06:28:49PM +0100, Miklos Szeredi wrote:
> On Tue,  2 Mar 2010, Valerie Aurora wrote:
> > This is a major rewrite of parts of union mounts, in particular the
> > pathname lookup code.  For more info about union mounts, see:
> > 
> > http://valerieaurora.org/union/
> > 
> > The previous code had two important problems fixed in this series:
> > 
> > - On file open, is_unionized() grabs vfsmount lock and walks up the
> >   mount tree even for non-union mounts.
> > 
> > - Pathname lookup required three cut-n-pasted versions of two complex
> >   functions, one for each of cached/real/"hashed" lookups.
> 
> Looks much better :)

Thanks for the review. :)

> > This rewrite reduces the additional cost of a non-union lookup in a
> > CONFIG_UNION_MOUNT kernel to either 1 or 2 mount flag tests (but adds
> > the requirement that file systems be unioned only at their root
> > directories).  This rewrite implements lookup with one lookup_union()
> > function for all types of lookups.
> > 
> > This posted patch series includes only the union lookup, mount, and
> > readdir patches and not the relatively uncontroversial whiteout and
> > fallthru code.
> 
> Special inode/dentry flags (whiteout, fallthrough, opaque) are not
> trivially the right solution:
> 
>  - they are invisible from userspace, new APIs are necessary to manipulate them
>  - they are difficult to support on network filesystems
>  - they are not useful for anything other than union mounts/filesystems
> 
> Extended attributes are a more standard way to add such info to files.
> Hard links would allow sharing a single inode for all whiteout entries
> and one for all fallthrough entries.
> 
> Have these options been considered as an alternative?

Thanks for bringing that up!  We have considered them and aren't sure
what the best solution is - all the options, including our current
example implementations, have serious drawbacks.  Fortunately, the
implementation of whiteouts, etc. is fairly separate from the main
body of union mount code.  For example, I'm pretty sure you could
implement whiteouts, fallthrus, and opaque flags in ext2 as system
extended attributes or as hard links (or as crazy special symlinks)
without changing any other non-ext2 patches.  We already switched once
from encoding whiteouts with reserved inode numbers in ext3 to
whiteouts as a new dentry type flag in ext2 and that didn't affect
other patches.  In general, I am happy with whatever the maintainer of
the underlying file system prefers.

The advantage of the system extended attribute approach is that any
file system with xattrs also supports union mounts.  It might be worth
implementing for that reason alone.

-VAL

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations
  2010-03-03 17:33   ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Miklos Szeredi
@ 2010-03-03 20:45     ` Valerie Aurora
  2010-03-04 16:24       ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Valerie Aurora @ 2010-03-03 20:45 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: viro, hch, linux-fsdevel, linux-kernel, jblunck

On Wed, Mar 03, 2010 at 06:33:20PM +0100, Miklos Szeredi wrote:
> On Tue,  2 Mar 2010, Valerie Aurora wrote:
> > +struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
> > +				struct dentry *next, struct vfsmount *next_mnt)
> 
> 
> Why doesn't union_alloc, append_to_union, union_lookup,
> union_down_one, etc use "struct path *" arg instead of separate
> vfsmount and dentry pointers?

I'd prefer that too, but it isn't a clear win.  For append_to_union(),
the reason is that we call it when a file system is mounted, using mnt
and mnt->mnt_root as the first args:

int attach_mnt_union(struct vfsmount *mnt, struct vfsmount *dest_mnt,
		     struct dentry *dest_dentry)
{
	if (!IS_MNT_UNION(mnt))
		return 0;

	return append_to_union(mnt, mnt->mnt_root, dest_mnt, dest_dentry);
}

Same thing happens in detach_mnt_union() with union_lookup().  That
trickles down into the rest.  I suppose I could create a temporary
path variable for those two functions and then we'd be paths
everywhere else.  What do you think?

> > +     um = kmem_cache_alloc(union_cache, GFP_ATOMIC);
> > +     if (!um)
> > +             return NULL;
> > +
> > +     atomic_set(&um->u_count, 1);
> 
> Why is u_count not a "struct kref"?

We stole this from the inode cache code, so for the same reason inodes
have i_count as atomic_t instead of a kref (whatever that is). :)

> > > +/*
> > + * WARNING! Confusing terminology alert.
> > + *
> > + * Note that the directions "up" and "down" in union mounts are the
> > + * opposite of "up" and "down" in normal VFS operation terminology.
> > + * "up" in the rest of the VFS means "towards the root of the mount
> > + * tree."  If you mount B on top of A, following B "up" will get you
> > + * A.  In union mounts, "up" means "towards the most recently mounted
> > + * layer of the union stack."  If you union mount B on top of A,
> > + * following A "up" will get you to B.  Another way to put it is that
> > + * "up" in the VFS means going from this mount towards the direction
> > + * of its mnt->mnt_parent pointer, but "up" in union mounts means
> > + * going in the opposite direction (until you run out of union
> > + * layers).
> > + */
> 
> So if this is confusing, why not use a different terminology for union
> layers?  Like "next" and "prev" like it is already used in the
> structures.

Unfortunately, "upper" and "lower" are fairly well established
concepts in layering file systems and seem to be the most natural way
for programmers to think about unioned file systems.  It's only the
VFS (which most people never touch) that uses "up" and "down" in the
opposite sense.  I think the better path is to replace "next" and
"prev" in the structure.

-VAL

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] union-mount: Drive the union cache via dcache
  2010-03-03 17:35     ` [PATCH 2/6] union-mount: Drive the union cache via dcache Miklos Szeredi
@ 2010-03-03 21:49       ` Valerie Aurora
  2010-03-04 16:34         ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Valerie Aurora @ 2010-03-03 21:49 UTC (permalink / raw)
  To: Miklos Szeredi, jblunck; +Cc: viro, hch, linux-fsdevel, linux-kernel

On Wed, Mar 03, 2010 at 06:35:55PM +0100, Miklos Szeredi wrote:
> On Tue,  2 Mar 2010, Valerie Aurora wrote:
> > +/*
> > + * This must be called after __d_drop_unions() without holding any locks.
> > + * Note: The dentry might still be reachable via a lookup but at that time it
> > + * already a negative dentry. Otherwise it would be unhashed. The union_mount
> > + * structure itself is still reachable through mnt->mnt_unions (which we
> > + * protect against with union_lock).
> > + *
> > + * We were worried about a recursive dput() call through:
> > + *
> > + * dput()->d_kill()->shrink_d_unions()->union_put()->dput()
> > + *
> > + * But this path can only be reached if the dentry is unhashed when we
> > + * enter the first dput(), and it can only be unhashed if it was
> > + * rmdir()'d, and d_delete() calls shrink_d_unions() for us.
> > + */
> > +void shrink_d_unions(struct dentry *dentry)
> > +{
> > +	struct union_mount *this, *next;
> > +
> > +repeat:
> > +	spin_lock(&union_lock);
> > +	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
> > +		BUG_ON(!hlist_unhashed(&this->u_hash));
> > +		BUG_ON(!hlist_unhashed(&this->u_rhash));
> > +		list_del(&this->u_unions);
> > +		this->u_next.dentry->d_unionized--;
> > +		spin_unlock(&union_lock);
> > +		union_put(this);
> > +		goto repeat;
> 
> This loop is weird.  That list_for_each_entry_safe is just used to
> initialize "this", since it unconditionally restarts from the
> beginning.

This loop is definitely weird, but the alternative is so simple
(replace the goto with a spin_lock()) that I suspect Jan had a reason
to write it this way.  Jan, do you recall?

Unfortunately it has not been tested on more than one element in the
list (although multiple layers now seem a possibility with the current
code).

-VAL

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Multiple read-only layers in union mounts (was Re: [PATCH 6/6] union-mount: Copy up directory entries on first readdir())
  2010-03-02 22:11           ` [PATCH 6/6] union-mount: Copy up directory entries on first readdir() Valerie Aurora
@ 2010-03-03 21:53             ` Valerie Aurora
  0 siblings, 0 replies; 18+ messages in thread
From: Valerie Aurora @ 2010-03-03 21:53 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Felix Fietkau

I think we can now have multiple read-only layers in a union without
lock ordering problems.  From the comment to union_copyup_dir():

/**
 * union_copyup_dir - copy up low-level directory entries to topmost dir
 *
 * readdir() is difficult to support on union file systems for two
 * reasons: We must eliminate duplicates and apply whiteouts, and we
 * must return something in f_pos that lets us restart in the same
 * place when we return.  Our solution is to, on first readdir() of
 * the directory, copy up all visible entries from the low-level file
 * systems and mark the entries that refer to low-level file system
 * objects as "fallthru" entries.
 *
 * Locking strategy: We hold the topmost dir's i_mutex on entry.  We
 * grab the i_mutex on lower directories one by one.  So the locking
 * order is:
 *
 * Writable/topmost layers > Read-only/lower layers
 *
 * So there is no problem with lock ordering for union stacks with
 * multiple lower layers.  E.g.:
 *
 * (topmost) A->B->C (bottom)
 * (topmost) D->C->B (bottom)
 *
 * (Not that we support more than two layers at the moment.)
 */

Does this seem correct?

-VAL

On Tue, Mar 02, 2010 at 02:11:29PM -0800, Valerie Aurora wrote:
> readdir() in union mounts is implemented by copying up all visible
> directory entries from the lower level directories to the topmost
> directory.  Directory entries that refer to lower level file system
> objects are marked as "fallthru" in the topmost directory.
> 
> Thanks to Felix Fietkau <nbd@openwrt.org> for a bug fix.
> 
> Signed-off-by: Valerie Aurora <vaurora@redhat.com>
> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> ---
>  fs/readdir.c          |    9 +++
>  fs/union.c            |  160 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/union.h |    2 +
>  3 files changed, 171 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/readdir.c b/fs/readdir.c
> index 3a48491..da71515 100644
> --- a/fs/readdir.c
> +++ b/fs/readdir.c
> @@ -16,6 +16,8 @@
>  #include <linux/security.h>
>  #include <linux/syscalls.h>
>  #include <linux/unistd.h>
> +#include <linux/union.h>
> +#include <linux/mount.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -36,9 +38,16 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
>  
>  	res = -ENOENT;
>  	if (!IS_DEADDIR(inode)) {
> +		if (IS_UNIONED_DIR(&file->f_path) && !IS_OPAQUE(inode)) {
> +			res = union_copyup_dir(&file->f_path);
> +			if (res)
> +				goto out_unlock;
> +		}
> +
>  		res = file->f_op->readdir(file, buf, filler);
>  		file_accessed(file);
>  	}
> +out_unlock:
>  	mutex_unlock(&inode->i_mutex);
>  out:
>  	return res;
> diff --git a/fs/union.c b/fs/union.c
> index ed852e5..b66989a 100644
> --- a/fs/union.c
> +++ b/fs/union.c
> @@ -5,6 +5,7 @@
>   * Copyright (C) 2007-2009 Novell Inc.
>   *
>   *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
> + *              Valerie Aurora <vaurora@redhat.com>
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms of the GNU General Public License as published by the Free
> @@ -22,6 +23,8 @@
>  #include <linux/fs_struct.h>
>  #include <linux/union.h>
>  #include <linux/namei.h>
> +#include <linux/file.h>
> +#include <linux/security.h>
>  
>  /*
>   * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
> @@ -467,3 +470,160 @@ out:
>  	mnt_drop_write(parent->mnt);
>  	return dentry;
>  }
> +
> +/**
> + * union_copyup_dir_one - copy up a single directory entry
> + *
> + * Individual directory entry copyup function for union_copyup_dir.
> + * We get the entries from higher level layers first.
> + */
> +
> +static int union_copyup_dir_one(void *buf, const char *name, int namlen,
> +				loff_t offset, u64 ino, unsigned int d_type)
> +{
> +	struct dentry *topmost_dentry = (struct dentry *) buf;
> +	struct dentry *dentry;
> +	int err = 0;
> +
> +	switch (namlen) {
> +	case 2:
> +		if (name[1] != '.')
> +			break;
> +	case 1:
> +		if (name[0] != '.')
> +			break;
> +		return 0;
> +	}
> +
> +	/* Lookup this entry in the topmost directory */
> +	dentry = lookup_one_len(name, topmost_dentry, namlen);
> +
> +	if (IS_ERR(dentry)) {
> +		printk(KERN_WARNING "%s: error looking up %s\n", __func__,
> +		       dentry->d_name.name);
> +		err = PTR_ERR(dentry);
> +		goto out;
> +	}
> +
> +	/*
> +	 * If the entry already exists, one of the following is true:
> +	 * it was already copied up (due to an earlier lookup), an
> +	 * entry with the same name already exists on the topmost file
> +	 * system, it is a whiteout, or it is a fallthru.  In each
> +	 * case, the top level entry masks any entries from lower file
> +	 * systems, so don't copy up this entry.
> +	 */
> +	if (dentry->d_inode || d_is_whiteout(dentry) || d_is_fallthru(dentry))
> +		goto out_dput;
> +
> +	/*
> +	 * If the entry doesn't exist, create a fallthru entry in the
> +	 * topmost file system.  All possible directory types are
> +	 * used, so each file system must implement its own way of
> +	 * storing a fallthru entry.
> +	 */
> +	err = topmost_dentry->d_inode->i_op->fallthru(topmost_dentry->d_inode,
> +						      dentry);
> +out_dput:
> +	dput(dentry);
> +out:
> +	return err;
> +}
> +
> +/**
> + * union_copyup_dir - copy up low-level directory entries to topmost dir
> + *
> + * readdir() is difficult to support on union file systems for two
> + * reasons: We must eliminate duplicates and apply whiteouts, and we
> + * must return something in f_pos that lets us restart in the same
> + * place when we return.  Our solution is to, on first readdir() of
> + * the directory, copy up all visible entries from the low-level file
> + * systems and mark the entries that refer to low-level file system
> + * objects as "fallthru" entries.
> + *
> + * Locking strategy: We hold the topmost dir's i_mutex on entry.  We
> + * grab the i_mutex on lower directories one by one.  So the locking
> + * order is:
> + *
> + * Writable/topmost layers > Read-only/lower layers
> + *
> + * So there is no problem with lock ordering for union stacks with
> + * multiple lower layers.  E.g.:
> + *
> + * (topmost) A->B->C (bottom)
> + * (topmost) D->C->B (bottom)
> + *
> + * (Not that we support more than two layers at the moment.)
> + */
> +
> +int union_copyup_dir(struct path *topmost_path)
> +{
> +	struct dentry *topmost_dentry = topmost_path->dentry;
> +	struct path path = *topmost_path;
> +	int res = 0;
> +
> +	BUG_ON(IS_OPAQUE(topmost_dentry->d_inode));
> +
> +	res = mnt_want_write(topmost_path->mnt);
> +	if (res)
> +		return res;
> +	/*
> +	 * Mark this dir opaque to show that we have already copied up
> +	 * the lower entries.  Only fallthru entries pass through to
> +	 * the underlying file system.
> +	 */
> +	topmost_dentry->d_inode->i_flags |= S_OPAQUE;
> +	mark_inode_dirty(topmost_dentry->d_inode);
> +
> +	path_get(&path);
> +	while (union_down_one(&path.mnt, &path.dentry)) {
> +		struct file * ftmp;
> +		struct inode * inode;
> +
> +		/* XXX Permit fallthrus on lower-level? Would need to
> +		 * pass in opaque flag to union_copyup_dir_one() and
> +		 * only copy up fallthru entries there.  We allow
> +		 * fallthrus in lower level opaque directories on
> +		 * lookup, so for consistency we should do one or the
> +		 * other in both places. */
> +		if (IS_OPAQUE(path.dentry->d_inode))
> +			break;
> +
> +		/* dentry_open() doesn't get a path reference itself */
> +		path_get(&path);
> +		ftmp = dentry_open(path.dentry, path.mnt,
> +				   O_RDONLY | O_DIRECTORY | O_NOATIME,
> +				   current_cred());
> +		if (IS_ERR(ftmp)) {
> +			printk (KERN_ERR "unable to open dir %s for "
> +				"directory copyup: %ld\n",
> +				path.dentry->d_name.name, PTR_ERR(ftmp));
> +			path_put(&path);
> +			continue;
> +		}
> +
> +		inode = path.dentry->d_inode;
> +		mutex_lock(&inode->i_mutex);
> +
> +		res = -ENOENT;
> +		if (IS_DEADDIR(inode))
> +			goto out_fput;
> +		/*
> +		 * Read the whole directory, calling our directory
> +		 * entry copyup function on each entry.  Pass in the
> +		 * topmost dentry as our private data so we can create
> +		 * new entries in the topmost directory.
> +		 */
> +		res = ftmp->f_op->readdir(ftmp, topmost_dentry,
> +					  union_copyup_dir_one);
> +out_fput:
> +		mutex_unlock(&inode->i_mutex);
> +		fput(ftmp);
> +
> +		if (res)
> +			break;
> +	}
> +	path_put(&path);
> +	mnt_drop_write(topmost_path->mnt);
> +	return res;
> +}
> diff --git a/include/linux/union.h b/include/linux/union.h
> index 6eaeae8..11ebc1d 100644
> --- a/include/linux/union.h
> +++ b/include/linux/union.h
> @@ -53,6 +53,7 @@ extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
>  extern int attach_mnt_union(struct vfsmount *, struct vfsmount *,
>  			    struct dentry *);
>  extern void detach_mnt_union(struct vfsmount *);
> +extern int union_copyup_dir(struct path *);
>  
>  #else /* CONFIG_UNION_MOUNT */
>  
> @@ -66,6 +67,7 @@ extern void detach_mnt_union(struct vfsmount *);
>  #define union_create_topmost_dir(x, y, z)	({ BUG(); (NULL); })
>  #define attach_mnt_union(x, y, z)	do { } while (0)
>  #define detach_mnt_union(x)		do { } while (0)
> +#define union_copyup_dir(x)		({ BUG(); (0); })
>  
>  #endif	/* CONFIG_UNION_MOUNT */
>  #endif	/* __KERNEL__ */
> -- 
> 1.5.6.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations
  2010-03-03 20:45     ` Valerie Aurora
@ 2010-03-04 16:24       ` Miklos Szeredi
  2010-03-09 19:49         ` Valerie Aurora
  0 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2010-03-04 16:24 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: miklos, viro, hch, linux-fsdevel, linux-kernel, jblunck

On Wed, 3 Mar 2010, Valerie Aurora wrote:
> On Wed, Mar 03, 2010 at 06:33:20PM +0100, Miklos Szeredi wrote:
> > On Tue,  2 Mar 2010, Valerie Aurora wrote:
> > > +struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
> > > +				struct dentry *next, struct vfsmount *next_mnt)
> > 
> > 
> > Why doesn't union_alloc, append_to_union, union_lookup,
> > union_down_one, etc use "struct path *" arg instead of separate
> > vfsmount and dentry pointers?
> 
> I'd prefer that too, but it isn't a clear win.  For append_to_union(),
> the reason is that we call it when a file system is mounted, using mnt
> and mnt->mnt_root as the first args:
> 
> int attach_mnt_union(struct vfsmount *mnt, struct vfsmount *dest_mnt,
> 		     struct dentry *dest_dentry)
> {
> 	if (!IS_MNT_UNION(mnt))
> 		return 0;
> 
> 	return append_to_union(mnt, mnt->mnt_root, dest_mnt, dest_dentry);
> }
> 
> Same thing happens in detach_mnt_union() with union_lookup().  That
> trickles down into the rest.  I suppose I could create a temporary
> path variable for those two functions and then we'd be paths
> everywhere else.  What do you think?

If it's just two temporary vars, then IMO it's a win.  It's much
easier to read the functions if it has half the arguments.

> 
> > > +     um = kmem_cache_alloc(union_cache, GFP_ATOMIC);
> > > +     if (!um)
> > > +             return NULL;
> > > +
> > > +     atomic_set(&um->u_count, 1);
> > 
> > Why is u_count not a "struct kref"?
> 
> We stole this from the inode cache code, so for the same reason inodes
> have i_count as atomic_t instead of a kref (whatever that is). :)

i_count does some tricky things.  If you just want plain an simple
refcounting then you should be using krefs.

> > > > +/*
> > > + * WARNING! Confusing terminology alert.
> > > + *
> > > + * Note that the directions "up" and "down" in union mounts are the
> > > + * opposite of "up" and "down" in normal VFS operation terminology.
> > > + * "up" in the rest of the VFS means "towards the root of the mount
> > > + * tree."  If you mount B on top of A, following B "up" will get you
> > > + * A.  In union mounts, "up" means "towards the most recently mounted
> > > + * layer of the union stack."  If you union mount B on top of A,
> > > + * following A "up" will get you to B.  Another way to put it is that
> > > + * "up" in the VFS means going from this mount towards the direction
> > > + * of its mnt->mnt_parent pointer, but "up" in union mounts means
> > > + * going in the opposite direction (until you run out of union
> > > + * layers).
> > > + */
> > 
> > So if this is confusing, why not use a different terminology for union
> > layers?  Like "next" and "prev" like it is already used in the
> > structures.
> 
> Unfortunately, "upper" and "lower" are fairly well established
> concepts in layering file systems and seem to be the most natural way
> for programmers to think about unioned file systems.  It's only the
> VFS (which most people never touch) that uses "up" and "down" in the
> opposite sense.  I think the better path is to replace "next" and
> "prev" in the structure.

Okay.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] union-mount: Drive the union cache via dcache
  2010-03-03 21:49       ` Valerie Aurora
@ 2010-03-04 16:34         ` Miklos Szeredi
  2010-03-09 19:22           ` Valerie Aurora
  0 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2010-03-04 16:34 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: miklos, jblunck, viro, hch, linux-fsdevel, linux-kernel

On Wed, 3 Mar 2010, Valerie Aurora wrote:
> On Wed, Mar 03, 2010 at 06:35:55PM +0100, Miklos Szeredi wrote:
> > On Tue,  2 Mar 2010, Valerie Aurora wrote:
> > > +/*
> > > + * This must be called after __d_drop_unions() without holding any locks.
> > > + * Note: The dentry might still be reachable via a lookup but at that time it
> > > + * already a negative dentry. Otherwise it would be unhashed. The union_mount
> > > + * structure itself is still reachable through mnt->mnt_unions (which we
> > > + * protect against with union_lock).
> > > + *
> > > + * We were worried about a recursive dput() call through:
> > > + *
> > > + * dput()->d_kill()->shrink_d_unions()->union_put()->dput()
> > > + *
> > > + * But this path can only be reached if the dentry is unhashed when we
> > > + * enter the first dput(), and it can only be unhashed if it was
> > > + * rmdir()'d, and d_delete() calls shrink_d_unions() for us.
> > > + */
> > > +void shrink_d_unions(struct dentry *dentry)
> > > +{
> > > +	struct union_mount *this, *next;
> > > +
> > > +repeat:
> > > +	spin_lock(&union_lock);
> > > +	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
> > > +		BUG_ON(!hlist_unhashed(&this->u_hash));
> > > +		BUG_ON(!hlist_unhashed(&this->u_rhash));
> > > +		list_del(&this->u_unions);
> > > +		this->u_next.dentry->d_unionized--;
> > > +		spin_unlock(&union_lock);
> > > +		union_put(this);
> > > +		goto repeat;
> > 
> > This loop is weird.  That list_for_each_entry_safe is just used to
> > initialize "this", since it unconditionally restarts from the
> > beginning.
> 
> This loop is definitely weird, but the alternative is so simple
> (replace the goto with a spin_lock()) that I suspect Jan had a reason
> to write it this way.  Jan, do you recall?

Something like the following is equivalent but more readable:

	struct list_head *head = &dentry->d_unions;

	spin_lock(&union_lock);
	while (!list_empty(head) {
		this = list_entry(head->next, struct union_mount, u_unions);
		...
		spin_unlock(&union_lock);
		union_put(this);
		spin_lock(&union_lock);
	}
	spin_unlock(&union_lock);

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] union-mount: Drive the union cache via dcache
  2010-03-04 16:34         ` Miklos Szeredi
@ 2010-03-09 19:22           ` Valerie Aurora
  0 siblings, 0 replies; 18+ messages in thread
From: Valerie Aurora @ 2010-03-09 19:22 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jblunck, viro, hch, linux-fsdevel, linux-kernel

On Thu, Mar 04, 2010 at 05:34:13PM +0100, Miklos Szeredi wrote:
> On Wed, 3 Mar 2010, Valerie Aurora wrote:
> > On Wed, Mar 03, 2010 at 06:35:55PM +0100, Miklos Szeredi wrote:
> > > On Tue,  2 Mar 2010, Valerie Aurora wrote:
> > > > +/*
> > > > + * This must be called after __d_drop_unions() without holding any locks.
> > > > + * Note: The dentry might still be reachable via a lookup but at that time it
> > > > + * already a negative dentry. Otherwise it would be unhashed. The union_mount
> > > > + * structure itself is still reachable through mnt->mnt_unions (which we
> > > > + * protect against with union_lock).
> > > > + *
> > > > + * We were worried about a recursive dput() call through:
> > > > + *
> > > > + * dput()->d_kill()->shrink_d_unions()->union_put()->dput()
> > > > + *
> > > > + * But this path can only be reached if the dentry is unhashed when we
> > > > + * enter the first dput(), and it can only be unhashed if it was
> > > > + * rmdir()'d, and d_delete() calls shrink_d_unions() for us.
> > > > + */
> > > > +void shrink_d_unions(struct dentry *dentry)
> > > > +{
> > > > +	struct union_mount *this, *next;
> > > > +
> > > > +repeat:
> > > > +	spin_lock(&union_lock);
> > > > +	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
> > > > +		BUG_ON(!hlist_unhashed(&this->u_hash));
> > > > +		BUG_ON(!hlist_unhashed(&this->u_rhash));
> > > > +		list_del(&this->u_unions);
> > > > +		this->u_next.dentry->d_unionized--;
> > > > +		spin_unlock(&union_lock);
> > > > +		union_put(this);
> > > > +		goto repeat;
> > > 
> > > This loop is weird.  That list_for_each_entry_safe is just used to
> > > initialize "this", since it unconditionally restarts from the
> > > beginning.
> > 
> > This loop is definitely weird, but the alternative is so simple
> > (replace the goto with a spin_lock()) that I suspect Jan had a reason
> > to write it this way.  Jan, do you recall?
> 
> Something like the following is equivalent but more readable:
> 
> 	struct list_head *head = &dentry->d_unions;
> 
> 	spin_lock(&union_lock);
> 	while (!list_empty(head) {
> 		this = list_entry(head->next, struct union_mount, u_unions);
> 		...
> 		spin_unlock(&union_lock);
> 		union_put(this);
> 		spin_lock(&union_lock);
> 	}
> 	spin_unlock(&union_lock);

Okay, I will rewrite it along those lines.  Thanks,

-VAL

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations
  2010-03-04 16:24       ` Miklos Szeredi
@ 2010-03-09 19:49         ` Valerie Aurora
  0 siblings, 0 replies; 18+ messages in thread
From: Valerie Aurora @ 2010-03-09 19:49 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: viro, hch, linux-fsdevel, linux-kernel, jblunck

On Thu, Mar 04, 2010 at 05:24:57PM +0100, Miklos Szeredi wrote:
> On Wed, 3 Mar 2010, Valerie Aurora wrote:
> > On Wed, Mar 03, 2010 at 06:33:20PM +0100, Miklos Szeredi wrote:
> > > On Tue,  2 Mar 2010, Valerie Aurora wrote:
> > > > +struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
> > > > +				struct dentry *next, struct vfsmount *next_mnt)
> > > 
> > > 
> > > Why doesn't union_alloc, append_to_union, union_lookup,
> > > union_down_one, etc use "struct path *" arg instead of separate
> > > vfsmount and dentry pointers?
> > 
> > I'd prefer that too, but it isn't a clear win.  For append_to_union(),
> > the reason is that we call it when a file system is mounted, using mnt
> > and mnt->mnt_root as the first args:
> > 
> > int attach_mnt_union(struct vfsmount *mnt, struct vfsmount *dest_mnt,
> > 		     struct dentry *dest_dentry)
> > {
> > 	if (!IS_MNT_UNION(mnt))
> > 		return 0;
> > 
> > 	return append_to_union(mnt, mnt->mnt_root, dest_mnt, dest_dentry);
> > }
> > 
> > Same thing happens in detach_mnt_union() with union_lookup().  That
> > trickles down into the rest.  I suppose I could create a temporary
> > path variable for those two functions and then we'd be paths
> > everywhere else.  What do you think?
> 
> If it's just two temporary vars, then IMO it's a win.  It's much
> easier to read the functions if it has half the arguments.

I agree, I'll make that change.

> > > > +     um = kmem_cache_alloc(union_cache, GFP_ATOMIC);
> > > > +     if (!um)
> > > > +             return NULL;
> > > > +
> > > > +     atomic_set(&um->u_count, 1);
> > > 
> > > Why is u_count not a "struct kref"?
> > 
> > We stole this from the inode cache code, so for the same reason inodes
> > have i_count as atomic_t instead of a kref (whatever that is). :)
> 
> i_count does some tricky things.  If you just want plain an simple
> refcounting then you should be using krefs.

Could you elaborate more?  I don't see what's so tricky about an
atomic counter.

Thanks,

-VAL

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-03-09 19:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-02 22:11 [RFC PATCH 0/6] Union mount core rewrite v1 Valerie Aurora
2010-03-02 22:11 ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
2010-03-02 22:11   ` [PATCH 2/6] union-mount: Drive the union cache via dcache Valerie Aurora
2010-03-02 22:11     ` [PATCH 3/6] union-mount: Implement union lookup Valerie Aurora
2010-03-02 22:11       ` [PATCH 4/6] union-mount: Support for mounting union mount file systems Valerie Aurora
2010-03-02 22:11         ` [PATCH 5/6] union-mount: Call do_whiteout() on unlink and rmdir in unions Valerie Aurora
2010-03-02 22:11           ` [PATCH 6/6] union-mount: Copy up directory entries on first readdir() Valerie Aurora
2010-03-03 21:53             ` Multiple read-only layers in union mounts (was Re: [PATCH 6/6] union-mount: Copy up directory entries on first readdir()) Valerie Aurora
2010-03-03 17:35     ` [PATCH 2/6] union-mount: Drive the union cache via dcache Miklos Szeredi
2010-03-03 21:49       ` Valerie Aurora
2010-03-04 16:34         ` Miklos Szeredi
2010-03-09 19:22           ` Valerie Aurora
2010-03-03 17:33   ` [PATCH 1/6] union-mount: Introduce union_mount structure and basic operations Miklos Szeredi
2010-03-03 20:45     ` Valerie Aurora
2010-03-04 16:24       ` Miklos Szeredi
2010-03-09 19:49         ` Valerie Aurora
2010-03-03 17:28 ` [RFC PATCH 0/6] Union mount core rewrite v1 Miklos Szeredi
2010-03-03 20:31   ` Valerie Aurora

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.