linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 00/32] VFS based Union Mount (V3)
@ 2009-05-18 16:08 Jan Blunck
  2009-05-18 16:08 ` [PATCH 01/32] atomic: Only take lock when the counter drops to zero on UP as well Jan Blunck
                   ` (35 more replies)
  0 siblings, 36 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:08 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Here is another post of the VFS based union mount implementation.

Traditionally the mount operation is opaque: the content of the mount point,
the directory on which the file system is mounted, is hidden by the content of
the mounted file system's root directory until the file system is unmounted
again. Unlike the traditional UNIX mount mechanism, which hides the contents
of the mount point, a union mount presents a view as if both file systems were
merged together. Although only the topmost layer of the mount stack can be
altered, a transparent file system mount makes it appear as if any file can be
created, modified or deleted.

Most people know the concepts and features of union mounts from other
operating systems like Sun's Translucent Filesystem, Plan9 or BSD. For an
in-depth review of union mounts and other unioning file systems, see:

http://lwn.net/Articles/324291/
http://lwn.net/Articles/325369/
http://lwn.net/Articles/327738/

Here are the key features of this implementation:
- completely VFS based
- does not change the namespace stacking
- directory listings have duplicate entries removed in the kernel
- writable unions: only the topmost file system layer may be writable
- writable unions: new whiteout filetype handled inside the kernel
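
The lookup rules implied by the whiteout and opaque-directory features can be
illustrated with a small user-space model. This is only a sketch under
assumptions: struct layer, struct entry and union_lookup() are hypothetical
names invented for the illustration, and the real implementation works on
dentries and vfsmounts inside the VFS rather than on arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry kinds for this model; not the kernel's file types. */
enum entry_kind { ENT_FILE, ENT_WHITEOUT };

struct entry {
	const char *name;
	enum entry_kind kind;
};

struct layer {
	const struct entry *entries;
	size_t nr_entries;
	int opaque;	/* models S_OPAQUE: do not look below this layer */
};

/*
 * Look up @name in a stack of layers, layers[0] being the topmost.
 * Returns the index of the layer that provides the name, or -1 if the
 * name does not exist in the merged view.
 */
int union_lookup(const struct layer *layers, size_t nr_layers,
		 const char *name)
{
	size_t i, j;

	assert(layers != NULL || nr_layers == 0);
	for (i = 0; i < nr_layers; i++) {
		for (j = 0; j < layers[i].nr_entries; j++) {
			const struct entry *e = &layers[i].entries[j];

			if (strcmp(e->name, name) != 0)
				continue;
			/* A whiteout hides the name in all lower layers. */
			return e->kind == ENT_WHITEOUT ? -1 : (int)i;
		}
		/* An opaque directory hides everything below it. */
		if (layers[i].opaque)
			break;
	}
	return -1;
}
```

In this model a name that exists only in a lower layer is found there, a
whiteout in the top layer makes the same name disappear from the merged view,
and marking the top layer opaque hides every lower entry.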

Major changes since last post:
- Updated the whiteout patches:
  - DCACHE_WHITEOUT flag set on a negative dentry
  - uses filetype instead of reserved inode number on EXT2
- Copy-up directories during lookup
- Implemented fallthru support for in-kernel readdir() as proposed by
  Valerie Aurora (Henson)
- Bugfixes

Valerie updated the HOWTO page and the UML disk image. You can find her
instructions on how to test-drive the code here:

http://valerieaurora.org/union/

The following patches apply on 2.6.29. Comments are welcome!

Cheers,
Jan


Jan Blunck (26):
  atomic: Only take lock when the counter drops to zero on UP as well
  VFS: BUG() if somebody tries to rehash an already hashed dentry
  VFS: propagate mnt_flags into do_loopback
  VFS: Make lookup_hash() return a struct path
  VFS: Remove unnecessary micro-optimization in cached_lookup()
  VFS: Make real_lookup() return a struct path
  VFS: Introduce dput() variant that maintains a kill-list
  whiteout: Don't return information about whiteouts to userspace
  whiteout: Add vfs_whiteout() and whiteout inode operation
  whiteout: Set S_OPAQUE inode flag when creating directories
  whiteout: Add whiteout support to tmpfs
  whiteout: Split of ext2_append_link() from ext2_add_link()
  whiteout: Add whiteout support to ext2
  whiteout: Add path_whiteout() helper
  union-mount: Documentation
  union-mount: Introduce MNT_UNION and MS_UNION flags
  union-mount: Introduce union_mount structure
  union-mount: Drive the union cache via dcache
  union-mount: Some checks during namespace changes
  union-mount: Changes to the namespace handling
  union-mount: Make lookup work for union-mounted file systems
  union-mount: stop lookup when directory has S_OPAQUE flag set
  union-mount: stop lookup when finding a whiteout
  union-mount: in-kernel file copy between union mounted filesystems
  union-mount: check for logically empty directory (FIXME)
  union-mount: call do_whiteout() on unlink and rmdir

Valerie Aurora (Henson) (6):
  union-mount: Always create topmost directory on open
  union-mount: Basic fallthru definitions
  union mount: Support for fallthru entries in union mount lookup
  union mount: ext2 fallthru support
  union-mount: tmpfs fallthru support
  union-mount: Copy up directory entries on first readdir()

 Documentation/filesystems/union-mounts.txt |  187 +++++
 fs/Kconfig                                 |    8 +
 fs/Makefile                                |    2 +
 fs/compat.c                                |    9 +
 fs/dcache.c                                |  143 ++++-
 fs/ext2/dir.c                              |  242 ++++++-
 fs/ext2/ext2.h                             |    4 +
 fs/ext2/inode.c                            |   11 +-
 fs/ext2/namei.c                            |   85 +++-
 fs/ext2/super.c                            |    8 +
 fs/libfs.c                                 |   18 +-
 fs/namei.c                                 | 1009 +++++++++++++++++++++++++---
 fs/namespace.c                             |   54 ++-
 fs/nfsd/nfs3xdr.c                          |    5 +
 fs/nfsd/nfs4xdr.c                          |    2 +-
 fs/nfsd/nfsxdr.c                           |    4 +
 fs/readdir.c                               |   25 +
 fs/union.c                                 |  948 ++++++++++++++++++++++++++
 include/linux/dcache.h                     |   30 +-
 include/linux/ext2_fs.h                    |    5 +
 include/linux/fs.h                         |    7 +
 include/linux/mount.h                      |    4 +
 include/linux/namei.h                      |    6 +
 include/linux/union.h                      |   82 +++
 lib/dec_and_lock.c                         |    3 +-
 mm/shmem.c                                 |  195 +++++-
 26 files changed, 2927 insertions(+), 169 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt
 create mode 100644 fs/union.c
 create mode 100644 include/linux/union.h



* [PATCH 01/32] atomic: Only take lock when the counter drops to zero on UP as well
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
@ 2009-05-18 16:08 ` Jan Blunck
  2009-05-18 16:08 ` [PATCH 02/32] VFS: BUG() if somebody tries to rehash an already hashed dentry Jan Blunck
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:08 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

I think it is wrong to unconditionally take the lock before calling
atomic_dec_and_test() in _atomic_dec_and_lock(). This will deadlock in
situations where it is known that the counter will not reach zero (e.g. when
holding another reference to the same object) and the lock is already taken.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 lib/dec_and_lock.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/lib/dec_and_lock.c b/lib/dec_and_lock.c
index a65c314..e73822a 100644
--- a/lib/dec_and_lock.c
+++ b/lib/dec_and_lock.c
@@ -19,11 +19,10 @@
  */
 int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
 {
-#ifdef CONFIG_SMP
 	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
 	if (atomic_add_unless(atomic, -1, 1))
 		return 0;
-#endif
+
 	/* Otherwise do it the slow way */
 	spin_lock(lock);
 	if (atomic_dec_and_test(atomic))
-- 
1.6.1.3
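
The behaviour this patch enables on UP can be modelled in user space with C11
atomics. This is a sketch, not the kernel code: struct toy_lock, add_unless()
and dec_and_lock() are hypothetical stand-ins for spinlock_t,
atomic_add_unless() and _atomic_dec_and_lock(), and the toy lock asserts
instead of spinning so that a double acquisition is visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy non-recursive lock: acquiring it twice models the deadlock. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* a real spinlock would spin forever here */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Add @a to *v unless *v == @u; returns true if the add happened. */
bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Returns true, with @lock held, only if the counter dropped to zero. */
bool dec_and_lock(atomic_int *v, struct toy_lock *lock)
{
	/* Fast path: the counter stays above zero, the lock is not touched. */
	if (add_unless(v, -1, 1))
		return false;

	/* Slow path: the counter may reach zero, so decrement under the lock. */
	toy_lock_acquire(lock);
	if (atomic_fetch_sub(v, 1) == 1)
		return true;
	toy_lock_release(lock);
	return false;
}
```

With the fast path always compiled in, a caller that already holds the lock
and knows it owns a second reference can drop one reference without touching
the lock at all.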



* [PATCH 02/32] VFS: BUG() if somebody tries to rehash an already hashed dentry
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
  2009-05-18 16:08 ` [PATCH 01/32] atomic: Only take lock when the counter drops to zero on UP as well Jan Blunck
@ 2009-05-18 16:08 ` Jan Blunck
  2009-05-18 16:08 ` [PATCH 03/32] VFS: propagate mnt_flags into do_loopback Jan Blunck
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:08 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Break early when somebody tries to rehash an already hashed dentry.
Otherwise this leads to interesting corruptions in the dcache hash table
later on.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/dcache.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 07e2d4a..085f527 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1562,6 +1562,7 @@ void d_rehash(struct dentry * entry)
 {
 	spin_lock(&dcache_lock);
 	spin_lock(&entry->d_lock);
+	BUG_ON(!d_unhashed(entry));
 	_d_rehash(entry);
 	spin_unlock(&entry->d_lock);
 	spin_unlock(&dcache_lock);
-- 
1.6.1.3
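
Why rehashing an already hashed entry corrupts a hash chain can be seen in a
simplified model. Everything here is an assumption made for illustration:
struct hnode, struct bucket and bucket_add() are hypothetical and use a plain
singly linked list, not the kernel's hlist, and assert() plays the role of
BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

struct hnode {
	struct hnode *next;
	int hashed;		/* nonzero while the node is on a chain */
};

struct bucket {
	struct hnode *first;
};

int hnode_unhashed(const struct hnode *n)
{
	return !n->hashed;
}

/* Insert @n at the head of @b; @n must not be on any chain yet. */
void bucket_add(struct bucket *b, struct hnode *n)
{
	/* Mirrors the added check: refuse to hash a node twice. */
	assert(hnode_unhashed(n));
	n->next = b->first;
	b->first = n;
	n->hashed = 1;
}

/* Count the nodes on the chain. */
size_t bucket_len(const struct bucket *b)
{
	const struct hnode *p;
	size_t len = 0;

	for (p = b->first; p; p = p->next)
		len++;
	return len;
}
```

Without the check, adding the node that is currently at the head a second
time would set its next pointer to itself, and every later walk of the chain
would loop forever.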



* [PATCH 03/32] VFS: propagate mnt_flags into do_loopback
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
  2009-05-18 16:08 ` [PATCH 01/32] atomic: Only take lock when the counter drops to zero on UP as well Jan Blunck
  2009-05-18 16:08 ` [PATCH 02/32] VFS: BUG() if somebody tries to rehash an already hashed dentry Jan Blunck
@ 2009-05-18 16:08 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 04/32] VFS: Make lookup_hash() return a struct path Jan Blunck
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:08 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

The mnt_flags are propagated into do_loopback(), so that they can be checked
when mounting something loopback into a union.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namespace.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 06f8e63..f0a5ce7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1461,8 +1461,8 @@ static int do_change_type(struct path *path, int flag)
 /*
  * do loopback mount.
  */
-static int do_loopback(struct path *path, char *old_name,
-				int recurse)
+static int do_loopback(struct path *path, char *old_name, int recurse,
+		       int mnt_flags)
 {
 	struct path old_path;
 	struct vfsmount *mnt = NULL;
@@ -1952,7 +1952,8 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&path, dev_name, flags & MS_REC);
+		retval = do_loopback(&path, dev_name, flags & MS_REC,
+				     mnt_flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&path, flags);
 	else if (flags & MS_MOVE)
-- 
1.6.1.3



* [PATCH 04/32] VFS: Make lookup_hash() return a struct path
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (2 preceding siblings ...)
  2009-05-18 16:08 ` [PATCH 03/32] VFS: propagate mnt_flags into do_loopback Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 05/32] VFS: Remove unnecessary micro-optimization in cached_lookup() Jan Blunck
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch changes lookup_hash() into returning a struct path.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |  114 +++++++++++++++++++++++++++++++----------------------------
 1 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bbc15c2..081aef1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1149,7 +1149,7 @@ int path_lookup_open(int dfd, const char *name, unsigned int lookup_flags,
 }
 
 static struct dentry *__lookup_hash(struct qstr *name,
-		struct dentry *base, struct nameidata *nd)
+				    struct dentry *base, struct nameidata *nd)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1196,14 +1196,22 @@ out:
  * needs parent already locked. Doesn't follow mounts.
  * SMP-safe.
  */
-static struct dentry *lookup_hash(struct nameidata *nd)
+static int lookup_hash(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
 	int err;
 
 	err = inode_permission(nd->path.dentry->d_inode, MAY_EXEC);
 	if (err)
-		return ERR_PTR(err);
-	return __lookup_hash(&nd->last, nd->path.dentry, nd);
+		return err;
+	path->mnt = nd->path.mnt;
+	path->dentry =  __lookup_hash(name, nd->path.dentry, nd);
+	if (IS_ERR(path->dentry)) {
+		err = PTR_ERR(path->dentry);
+		path->dentry = NULL;
+		path->mnt = NULL;
+	}
+	return err;
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
@@ -1690,12 +1698,10 @@ struct file *do_filp_open(int dfd, const char *pathname,
 	if (flag & O_EXCL)
 		nd.flags |= LOOKUP_EXCL;
 	mutex_lock(&dir->d_inode->i_mutex);
-	path.dentry = lookup_hash(&nd);
-	path.mnt = nd.path.mnt;
+	error = lookup_hash(&nd, &nd.last, &path);
 
 do_last:
-	error = PTR_ERR(path.dentry);
-	if (IS_ERR(path.dentry)) {
+	if (error) {
 		mutex_unlock(&dir->d_inode->i_mutex);
 		goto exit;
 	}
@@ -1841,8 +1847,7 @@ do_link:
 	}
 	dir = nd.path.dentry;
 	mutex_lock(&dir->d_inode->i_mutex);
-	path.dentry = lookup_hash(&nd);
-	path.mnt = nd.path.mnt;
+	error = lookup_hash(&nd, &nd.last, &path);
 	__putname(nd.last.name);
 	goto do_last;
 }
@@ -1876,7 +1881,8 @@ EXPORT_SYMBOL(filp_open);
  */
 struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 {
-	struct dentry *dentry = ERR_PTR(-EEXIST);
+	struct path path = { .dentry = ERR_PTR(-EEXIST) } ;
+	int err;
 
 	mutex_lock_nested(&nd->path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	/*
@@ -1892,11 +1898,13 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	/*
 	 * Do the final lookup.
 	 */
-	dentry = lookup_hash(nd);
-	if (IS_ERR(dentry))
+	err = lookup_hash(nd, &nd->last, &path);
+	if (err) {
+		path.dentry = ERR_PTR(err);
 		goto fail;
+	}
 
-	if (dentry->d_inode)
+	if (path.dentry->d_inode)
 		goto eexist;
 	/*
 	 * Special case - lookup gave negative, but... we had foo/bar/
@@ -1905,15 +1913,17 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	 * been asking for (non-existent) directory. -ENOENT for you.
 	 */
 	if (unlikely(!is_dir && nd->last.name[nd->last.len])) {
-		dput(dentry);
-		dentry = ERR_PTR(-ENOENT);
+		path_put_conditional(&path, nd);
+		path.dentry = ERR_PTR(-ENOENT);
 	}
-	return dentry;
+	if (nd->path.mnt != path.mnt)
+		mntput(path.mnt);
+	return path.dentry;
 eexist:
-	dput(dentry);
-	dentry = ERR_PTR(-EEXIST);
+	path_put_conditional(&path, nd);
+	path.dentry = ERR_PTR(-EEXIST);
 fail:
-	return dentry;
+	return path.dentry;
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
@@ -2150,7 +2160,7 @@ static long do_rmdir(int dfd, const char __user *pathname)
 {
 	int error = 0;
 	char * name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 
 	error = user_path_parent(dfd, pathname, &nd, &name);
@@ -2172,21 +2182,20 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (error)
 		goto exit2;
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
-	error = security_path_rmdir(&nd.path, dentry);
+	error = security_path_rmdir(&nd.path, path.dentry);
 	if (error)
 		goto exit4;
-	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+	error = vfs_rmdir(nd.path.dentry->d_inode, path.dentry);
 exit4:
 	mnt_drop_write(nd.path.mnt);
 exit3:
-	dput(dentry);
+	path_put_conditional(&path, &nd);
 exit2:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 exit1:
@@ -2241,7 +2250,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 {
 	int error;
 	char *name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 	struct inode *inode = NULL;
 
@@ -2256,26 +2265,25 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (!IS_ERR(dentry)) {
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (!error) {
 		/* Why not before? Because we want correct error value */
 		if (nd.last.name[nd.last.len])
 			goto slashes;
-		inode = dentry->d_inode;
+		inode = path.dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
-		error = security_path_unlink(&nd.path, dentry);
+		error = security_path_unlink(&nd.path, path.dentry);
 		if (error)
 			goto exit3;
-		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
+		error = vfs_unlink(nd.path.dentry->d_inode, path.dentry);
 exit3:
 		mnt_drop_write(nd.path.mnt);
 	exit2:
-		dput(dentry);
+		path_put_conditional(&path, &nd);
 	}
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 	if (inode)
@@ -2286,8 +2294,8 @@ exit1:
 	return error;
 
 slashes:
-	error = !dentry->d_inode ? -ENOENT :
-		S_ISDIR(dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
+	error = !path.dentry->d_inode ? -ENOENT :
+		S_ISDIR(path.dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
 	goto exit2;
 }
 
@@ -2627,7 +2635,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 		int, newdfd, const char __user *, newname)
 {
 	struct dentry *old_dir, *new_dir;
-	struct dentry *old_dentry, *new_dentry;
+	struct path old, new;
 	struct dentry *trap;
 	struct nameidata oldnd, newnd;
 	char *from;
@@ -2661,16 +2669,15 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 
 	trap = lock_rename(new_dir, old_dir);
 
-	old_dentry = lookup_hash(&oldnd);
-	error = PTR_ERR(old_dentry);
-	if (IS_ERR(old_dentry))
+	error = lookup_hash(&oldnd, &oldnd.last, &old);
+	if (error)
 		goto exit3;
 	/* source must exist */
 	error = -ENOENT;
-	if (!old_dentry->d_inode)
+	if (!old.dentry->d_inode)
 		goto exit4;
 	/* unless the source is a directory trailing slashes give -ENOTDIR */
-	if (!S_ISDIR(old_dentry->d_inode->i_mode)) {
+	if (!S_ISDIR(old.dentry->d_inode->i_mode)) {
 		error = -ENOTDIR;
 		if (oldnd.last.name[oldnd.last.len])
 			goto exit4;
@@ -2679,32 +2686,31 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	}
 	/* source should not be ancestor of target */
 	error = -EINVAL;
-	if (old_dentry == trap)
+	if (old.dentry == trap)
 		goto exit4;
-	new_dentry = lookup_hash(&newnd);
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
+	error = lookup_hash(&newnd, &newnd.last, &new);
+	if (error)
 		goto exit4;
 	/* target should not be an ancestor of source */
 	error = -ENOTEMPTY;
-	if (new_dentry == trap)
+	if (new.dentry == trap)
 		goto exit5;
 
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
 		goto exit5;
-	error = security_path_rename(&oldnd.path, old_dentry,
-				     &newnd.path, new_dentry);
+	error = security_path_rename(&oldnd.path, old.dentry,
+				     &newnd.path, new.dentry);
 	if (error)
 		goto exit6;
-	error = vfs_rename(old_dir->d_inode, old_dentry,
-				   new_dir->d_inode, new_dentry);
+	error = vfs_rename(old_dir->d_inode, old.dentry,
+				   new_dir->d_inode, new.dentry);
 exit6:
 	mnt_drop_write(oldnd.path.mnt);
 exit5:
-	dput(new_dentry);
+	path_put_conditional(&new, &newnd);
 exit4:
-	dput(old_dentry);
+	path_put_conditional(&old, &oldnd);
 exit3:
 	unlock_rename(new_dir, old_dir);
 exit2:
-- 
1.6.1.3
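
The calling-convention change in this patch, from "return a dentry or an
ERR_PTR()" to "return an error code and fill in a struct path", can be
sketched in user space. The helpers below re-create ERR_PTR(), PTR_ERR() and
IS_ERR() for illustration only; the struct definitions are stripped-down
models, and old_style_lookup() and new_style_lookup() are hypothetical names,
not functions from fs/namei.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Minimal models of the kernel types; the real ones have many more fields. */
struct dentry { const char *name; };
struct vfsmount { int id; };
struct path { struct vfsmount *mnt; struct dentry *dentry; };

/* Old convention: the result and the error share one return value. */
struct dentry *old_style_lookup(struct dentry *found, int err)
{
	return err ? ERR_PTR(err) : found;
}

/* New convention: error code returned, result passed back through @path. */
int new_style_lookup(struct vfsmount *mnt, struct dentry *found, int err,
		     struct path *path)
{
	path->mnt = mnt;
	path->dentry = old_style_lookup(found, err);
	if (IS_ERR(path->dentry)) {
		err = (int)PTR_ERR(path->dentry);
		/* Never hand an error pointer back inside the path. */
		path->dentry = NULL;
		path->mnt = NULL;
		return err;
	}
	return 0;
}
```

Returning a struct path lets a caller receive a dentry together with the
vfsmount it belongs to, which a bare dentry pointer cannot express.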



* [PATCH 05/32] VFS: Remove unnecessary micro-optimization in cached_lookup()
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (3 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 04/32] VFS: Make lookup_hash() return a struct path Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 06/32] VFS: Make real_lookup() return a struct path Jan Blunck
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

d_lookup() takes rename_lock, which is a seqlock.  This is so cheap
that it's not worth calling the lockless __d_lookup() first from
cached_lookup().  Rename cached_lookup() to cache_lookup() while we're
there.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |   13 ++++---------
 1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 081aef1..a9dd19b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -402,15 +402,10 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
  * Internal lookup() using the new generic dcache.
  * SMP-safe
  */
-static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static struct dentry *cache_lookup(struct dentry *parent, struct qstr *name,
+				   struct nameidata *nd)
 {
-	struct dentry * dentry = __d_lookup(parent, name);
-
-	/* lockess __d_lookup may fail due to concurrent d_move() 
-	 * in some unrelated directory, so try with d_lookup
-	 */
-	if (!dentry)
-		dentry = d_lookup(parent, name);
+	struct dentry *dentry = d_lookup(parent, name);
 
 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
 		dentry = do_revalidate(dentry, nd);
@@ -1168,7 +1163,7 @@ static struct dentry *__lookup_hash(struct qstr *name,
 			goto out;
 	}
 
-	dentry = cached_lookup(base, name, nd);
+	dentry = cache_lookup(base, name, nd);
 	if (!dentry) {
 		struct dentry *new;
 
-- 
1.6.1.3
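
The reason a seqlock is cheap for readers can be shown with a minimal
user-space model of its read side. This is a sketch under assumptions:
struct seq_model and the seq_*() functions are hypothetical names, writers are
assumed to be serialized by some other means, and the default sequentially
consistent C11 atomics are used for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a sequence counter. */
struct seq_model {
	atomic_uint sequence;	/* odd while a writer is active */
};

/* Start a read section: wait for an even value and remember it. */
unsigned int seq_read_begin(struct seq_model *s)
{
	unsigned int seq;

	while ((seq = atomic_load(&s->sequence)) & 1)
		;	/* a writer is in progress, try again */
	return seq;
}

/* Nonzero if a writer ran since seq_read_begin() returned @start. */
int seq_read_retry(struct seq_model *s, unsigned int start)
{
	return atomic_load(&s->sequence) != start;
}

void seq_write_begin(struct seq_model *s)
{
	unsigned int old = atomic_fetch_add(&s->sequence, 1);

	assert((old & 1) == 0);	/* writers must not overlap */
	(void)old;
}

void seq_write_end(struct seq_model *s)
{
	atomic_fetch_add(&s->sequence, 1);
}
```

A reader only loads the counter twice and repeats its work if the value
changed in between, so an uncontended read section never writes to shared
memory.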



* [PATCH 06/32] VFS: Make real_lookup() return a struct path
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (4 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 05/32] VFS: Remove unnecessary micro-optimization in cached_lookup() Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 07/32] VFS: Introduce dput() variant that maintains a kill-list Jan Blunck
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch changes real_lookup() into returning a struct path.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |   82 +++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a9dd19b..e19fa2b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -460,10 +460,11 @@ ok:
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static int real_lookup(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
-	struct dentry * result;
-	struct inode *dir = parent->d_inode;
+	struct inode *dir = nd->path.dentry->d_inode;
+	int res = 0;
 
 	mutex_lock(&dir->i_mutex);
 	/*
@@ -480,27 +481,36 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	 *
 	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 	 */
-	result = d_lookup(parent, name);
-	if (!result) {
+	path->dentry = d_lookup(nd->path.dentry, name);
+	path->mnt = nd->path.mnt;
+	if (!path->dentry) {
 		struct dentry *dentry;
 
 		/* Don't create child dentry for a dead directory. */
-		result = ERR_PTR(-ENOENT);
-		if (IS_DEADDIR(dir))
+		if (IS_DEADDIR(dir)) {
+			res = -ENOENT;
 			goto out_unlock;
+		}
 
-		dentry = d_alloc(parent, name);
-		result = ERR_PTR(-ENOMEM);
+		dentry = d_alloc(nd->path.dentry, name);
 		if (dentry) {
-			result = dir->i_op->lookup(dir, dentry, nd);
-			if (result)
+			path->dentry = dir->i_op->lookup(dir, dentry, nd);
+			if (path->dentry) {
 				dput(dentry);
-			else
-				result = dentry;
+				if (IS_ERR(path->dentry)) {
+					res = PTR_ERR(path->dentry);
+					path->dentry = NULL;
+					path->mnt = NULL;
+				}
+			} else
+				path->dentry = dentry;
+		} else {
+			res = -ENOMEM;
+			path->mnt = NULL;
 		}
 out_unlock:
 		mutex_unlock(&dir->i_mutex);
-		return result;
+		return res;
 	}
 
 	/*
@@ -508,12 +518,20 @@ out_unlock:
 	 * we waited on the semaphore. Need to revalidate.
 	 */
 	mutex_unlock(&dir->i_mutex);
-	if (result->d_op && result->d_op->d_revalidate) {
-		result = do_revalidate(result, nd);
-		if (!result)
-			result = ERR_PTR(-ENOENT);
+	if (path->dentry->d_op && path->dentry->d_op->d_revalidate) {
+		path->dentry = do_revalidate(path->dentry, nd);
+		if (!path->dentry) {
+			res = -ENOENT;
+			path->mnt = NULL;
+		}
+		if (IS_ERR(path->dentry)) {
+			res = PTR_ERR(path->dentry);
+			path->dentry = NULL;
+			path->mnt = NULL;
+		}
 	}
-	return result;
+
+	return res;
 }
 
 /*
@@ -779,35 +797,37 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 static int do_lookup(struct nameidata *nd, struct qstr *name,
 		     struct path *path)
 {
-	struct vfsmount *mnt = nd->path.mnt;
-	struct dentry *dentry = __d_lookup(nd->path.dentry, name);
+	int err;
 
-	if (!dentry)
+	path->dentry = __d_lookup(nd->path.dentry, name);
+	path->mnt = nd->path.mnt;
+	if (!path->dentry)
 		goto need_lookup;
-	if (dentry->d_op && dentry->d_op->d_revalidate)
+	if (path->dentry->d_op && path->dentry->d_op->d_revalidate)
 		goto need_revalidate;
+
 done:
-	path->mnt = mnt;
-	path->dentry = dentry;
 	__follow_mount(path);
 	return 0;
 
 need_lookup:
-	dentry = real_lookup(nd->path.dentry, name, nd);
-	if (IS_ERR(dentry))
+	err = real_lookup(nd, name, path);
+	if (err)
 		goto fail;
 	goto done;
 
 need_revalidate:
-	dentry = do_revalidate(dentry, nd);
-	if (!dentry)
+	path->dentry = do_revalidate(path->dentry, nd);
+	if (!path->dentry)
 		goto need_lookup;
-	if (IS_ERR(dentry))
+	if (IS_ERR(path->dentry)) {
+		err = PTR_ERR(path->dentry);
 		goto fail;
+	}
 	goto done;
 
 fail:
-	return PTR_ERR(dentry);
+	return err;
 }
 
 /*
-- 
1.6.1.3



* [PATCH 07/32] VFS: Introduce dput() variant that maintains a kill-list
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (5 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 06/32] VFS: Make real_lookup() return a struct path Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 08/32] whiteout: Don't return information about whiteouts to userspace Jan Blunck
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch introduces a new variant of dput(). It is needed to prevent a
recursive call to dput() from the union mount code.

  void __dput(struct dentry *dentry, struct list_head *list, int greedy);
  struct dentry *__d_kill(struct dentry *dentry, struct list_head *list,
			  int greedy);

__dput() works mostly like the original dput() did. The main difference is
that if the greedy argument is zero, it will put the parent on a special
list instead of trying to get rid of it directly.

Therefore the union mount code can safely call __dput() when it wants to get
rid of underlying dentry references during a dput(). After calling __dput()
or __d_kill(), the caller must make sure that __d_kill_final() is called on
all dentries on the kill list. __d_kill_final() does the actual
dentry_iput() work and also drops the reference to the parent.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/dcache.c |  115 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 085f527..8bfbcd7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -157,14 +157,19 @@ static void dentry_lru_del_init(struct dentry *dentry)
 }
 
 /**
- * d_kill - kill dentry and return parent
+ * __d_kill - kill dentry and return parent
  * @dentry: dentry to kill
+ * @list: kill list
+ * @greedy: return parent instead of putting it on the kill list
  *
  * The dentry must already be unhashed and removed from the LRU.
  *
- * If this is the root of the dentry tree, return NULL.
+ * If this is the root of the dentry tree, return NULL. If greedy is zero, we
+ * put the parent of this dentry on the kill list instead. The callers must
+ * make sure that __d_kill_final() is called on all dentries on the kill list.
  */
-static struct dentry *d_kill(struct dentry *dentry)
+static struct dentry *__d_kill(struct dentry *dentry, struct list_head *list,
+			       int greedy)
 	__releases(dentry->d_lock)
 	__releases(dcache_lock)
 {
@@ -172,6 +177,20 @@ static struct dentry *d_kill(struct dentry *dentry)
 
 	list_del(&dentry->d_u.d_child);
 	dentry_stat.nr_dentry--;	/* For d_free, below */
+
+	/*
+	 * If we are not greedy we just put this on a list for later processing
+	 * (follow up to parent, releasing of inode and freeing dentry memory).
+	 */
+	if (!greedy) {
+		list_del_init(&dentry->d_alias);
+		/* at this point nobody can reach this dentry */
+		list_add(&dentry->d_lru, list);
+		spin_unlock(&dentry->d_lock);
+		spin_unlock(&dcache_lock);
+		return NULL;
+	}
+
 	/*drops the locks, at that point nobody can reach this dentry */
 	dentry_iput(dentry);
 	if (IS_ROOT(dentry))
@@ -182,6 +201,54 @@ static struct dentry *d_kill(struct dentry *dentry)
 	return parent;
 }
 
+void __dput(struct dentry *, struct list_head *, int);
+
+static void __d_kill_final(struct dentry *dentry, struct list_head *list)
+{
+	struct dentry *parent;
+	struct inode *inode = dentry->d_inode;
+
+	if (inode) {
+		dentry->d_inode = NULL;
+		if (!inode->i_nlink)
+			fsnotify_inoderemove(inode);
+		if (dentry->d_op && dentry->d_op->d_iput)
+			dentry->d_op->d_iput(dentry, inode);
+		else
+			iput(inode);
+	}
+
+	if (IS_ROOT(dentry))
+		parent = NULL;
+	else
+		parent = dentry->d_parent;
+	d_free(dentry);
+	__dput(parent, list, 1);
+}
+
+/**
+ * d_kill - kill dentry and return parent
+ * @dentry: dentry to kill
+ *
+ * The dentry must already be unhashed and removed from the LRU.
+ *
+ * If this is the root of the dentry tree, return NULL.
+ */
+static struct dentry *d_kill(struct dentry *dentry)
+{
+	LIST_HEAD(mortuary);
+	struct dentry *parent;
+
+	parent = __d_kill(dentry, &mortuary, 1);
+	while (!list_empty(&mortuary)) {
+		dentry = list_entry(mortuary.next, struct dentry, d_lru);
+		list_del(&dentry->d_lru);
+		__d_kill_final(dentry, &mortuary);
+	}
+
+	return parent;
+}
+
 /* 
  * This is dput
  *
@@ -199,19 +266,24 @@ static struct dentry *d_kill(struct dentry *dentry)
  * Real recursion would eat up our stack space.
  */
 
-/*
- * dput - release a dentry
- * @dentry: dentry to release 
+/**
+ * __dput - release a dentry
+ * @dentry: dentry to release
+ * @list: kill list argument for __d_kill()
+ * @greedy: greedy argument for __d_kill()
  *
  * Release a dentry. This will drop the usage count and if appropriate
  * call the dentry unlink method as well as removing it from the queues and
  * releasing its resources. If the parent dentries were scheduled for release
- * they too may now get deleted.
+ * they too may now get deleted if @greedy is not zero. Otherwise parent is
+ * added to the kill list. The callers must make sure that __d_kill_final() is
+ * called on all dentries on the kill list.
+ *
+ * You probably want to use dput() instead.
  *
  * no dcache lock, please.
  */
-
-void dput(struct dentry *dentry)
+void __dput(struct dentry *dentry, struct list_head *list, int greedy)
 {
 	if (!dentry)
 		return;
@@ -252,12 +324,35 @@ unhash_it:
 kill_it:
 	/* if dentry was on the d_lru list delete it from there */
 	dentry_lru_del(dentry);
-	dentry = d_kill(dentry);
+	dentry = __d_kill(dentry, list, greedy);
 	if (dentry)
 		goto repeat;
 }
 
 /**
+ * dput - release a dentry
+ * @dentry: dentry to release
+ *
+ * Release a dentry. This will drop the usage count and if appropriate
+ * call the dentry unlink method as well as removing it from the queues and
+ * releasing its resources. If the parent dentries were scheduled for release
+ * they too may now get deleted.
+ *
+ * no dcache lock, please.
+ */
+void dput(struct dentry *dentry)
+{
+	LIST_HEAD(mortuary);
+
+	__dput(dentry, &mortuary, 1);
+	while (!list_empty(&mortuary)) {
+		dentry = list_entry(mortuary.next, struct dentry, d_lru);
+		list_del(&dentry->d_lru);
+		__d_kill_final(dentry, &mortuary);
+	}
+}
+
+/**
  * d_invalidate - invalidate a dentry
  * @dentry: dentry to invalidate
  *
-- 
1.6.1.3



* [PATCH 08/32] whiteout: Don't return information about whiteouts to userspace
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (6 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 07/32] VFS: Introduce dput() variant that maintains a kill-list Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 09/32] whiteout: Add vfs_whiteout() and whiteout inode operation Jan Blunck
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Userspace isn't ready to handle another file type. Therefore this patch makes
readdir() and the other directory-listing paths (the compat and nfsd entry
encoders) skip over any whiteout directory entries they find.
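
The filtering rule can be illustrated outside the kernel. The following is a
minimal userspace sketch, not the kernel code: the struct, the function and
the SKETCH_* constants are hypothetical names introduced for illustration,
and the numeric values are assumed to mirror the usual DT_REG, DT_DIR and
DT_WHT values from <dirent.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror DT_DIR, DT_REG and DT_WHT from <dirent.h>. */
#define SKETCH_DT_DIR	4
#define SKETCH_DT_REG	8
#define SKETCH_DT_WHT	14

/* Hypothetical stand-in for one entry handed to a filldir-style callback. */
struct sketch_entry {
	const char *name;
	unsigned char d_type;
};

/*
 * Copy entries from in[] to out[], skipping whiteouts. This mimics the
 * patched callbacks, which report success for a whiteout without
 * emitting anything. Returns the number of entries copied.
 */
static size_t sketch_fill(const struct sketch_entry *in, size_t n,
			  struct sketch_entry *out, size_t cap)
{
	size_t i, used = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n && used < cap; i++) {
		if (in[i].d_type == SKETCH_DT_WHT)
			continue;	/* whiteout: never shown to userspace */
		out[used++] = in[i];
	}
	return used;
}
```

Returning success instead of an error matters here: the directory walk
continues with the next entry, so a whiteout neither terminates the listing
nor consumes space in the user buffer.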

Signed-off-by: Jan Blunck <j.blunck@tu-harburg.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/compat.c       |    9 +++++++++
 fs/nfsd/nfs3xdr.c |    5 +++++
 fs/nfsd/nfs4xdr.c |    2 +-
 fs/nfsd/nfsxdr.c  |    4 ++++
 fs/readdir.c      |    9 +++++++++
 5 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index d0145ca..9b83e4b 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -828,6 +828,9 @@ static int compat_fillonedir(void *__buf, const char *name, int namlen,
 	struct compat_old_linux_dirent __user *dirent;
 	compat_ulong_t d_ino;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	if (buf->result)
 		return -EINVAL;
 	d_ino = ino;
@@ -899,6 +902,9 @@ static int compat_filldir(void *__buf, const char *name, int namlen,
 	compat_ulong_t d_ino;
 	int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 2, sizeof(compat_long_t));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
@@ -988,6 +994,9 @@ static int compat_filldir64(void * __buf, const char * name, int namlen, loff_t
 	int reclen = ALIGN(jj + namlen + 1, sizeof(u64));
 	u64 off;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 17d0dd9..06c67af 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -883,6 +883,11 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
 	int		elen;		/* estimated entry length in words */
 	int		num_entry_words = 0;	/* actual number of words */
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
+
 	if (cd->offset) {
 		u64 offset64 = offset;
 
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 9250067..b001ed5 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1908,7 +1908,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	__be32 nfserr = nfserr_toosmall;
 
 	/* In nfsv4, "." and ".." never make it onto the wire.. */
-	if (name && isdotent(name, namlen)) {
+	if (d_type == DT_WHT || (name && isdotent(name, namlen))) {
 		cd->common.err = nfs_ok;
 		return 0;
 	}
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index afd08e2..a7d622c 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -513,6 +513,10 @@ nfssvc_encode_entry(void *ccdv, const char *name,
 			namlen, name, offset, ino);
 	 */
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
 	if (offset > ~((u32) 0)) {
 		cd->common.err = nfserr_fbig;
 		return -EINVAL;
diff --git a/fs/readdir.c b/fs/readdir.c
index 7723401..3a48491 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -77,6 +77,9 @@ static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset
 	struct old_linux_dirent __user * dirent;
 	unsigned long d_ino;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	if (buf->result)
 		return -EINVAL;
 	d_ino = ino;
@@ -154,6 +157,9 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
 	unsigned long d_ino;
 	int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 2, sizeof(long));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
@@ -239,6 +245,9 @@ static int filldir64(void * __buf, const char * name, int namlen, loff_t offset,
 	struct getdents_callback64 * buf = (struct getdents_callback64 *) __buf;
 	int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 1, sizeof(u64));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
-- 
1.6.1.3



* [PATCH 09/32] whiteout: Add vfs_whiteout() and whiteout inode operation
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (7 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 08/32] whiteout: Don't return information about whiteouts to userspace Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 10/32] whiteout: Set S_OPAQUE inode flag when creating directories Jan Blunck
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Simply white-out a given directory entry. This functionality is usually used
in the sense of unlink. Therefore the given dentry can still be in use and
contain an in-use inode. The filesystem's inode operation has to do what
unlink or rmdir would do in that case. Since the dentry might still be in use,
we have to provide a fresh unhashed dentry that is used as the whiteout
dentry instead. The given dentry is dropped and the whiteout dentry is
rehashed instead.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/dcache.c            |    4 +-
 fs/namei.c             |  104 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |    7 +++-
 include/linux/fs.h     |    3 +
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 8bfbcd7..9260c99 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1076,8 +1076,10 @@ struct dentry *d_alloc_name(struct dentry *parent, const char *name)
 /* the caller must hold dcache_lock */
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
-	if (inode)
+	if (inode) {
+		dentry->d_flags &= ~DCACHE_WHITEOUT;
 		list_add(&dentry->d_alias, &inode->i_dentry);
+	}
 	dentry->d_inode = inode;
 	fsnotify_d_instantiate(dentry, inode);
 }
diff --git a/fs/namei.c b/fs/namei.c
index e19fa2b..9cab708 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2111,6 +2111,110 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, int, mode)
 	return sys_mkdirat(AT_FDCWD, pathname, mode);
 }
 
+
+/* Checks on the victim for whiteout */
+static inline int may_whiteout(struct inode *dir, struct dentry *victim,
+			       int isdir)
+{
+	int err;
+
+	/* from may_create() */
+	if (IS_DEADDIR(dir))
+		return -ENOENT;
+	err = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	if (err)
+		return err;
+
+	/* from may_delete() */
+	if (IS_APPEND(dir))
+		return -EPERM;
+	if (!victim->d_inode)
+		return 0;
+	if (check_sticky(dir, victim->d_inode) ||
+	    IS_APPEND(victim->d_inode) ||
+	    IS_IMMUTABLE(victim->d_inode))
+		return -EPERM;
+	if (isdir) {
+		if (!S_ISDIR(victim->d_inode->i_mode))
+			return -ENOTDIR;
+		if (IS_ROOT(victim))
+			return -EBUSY;
+	} else if (S_ISDIR(victim->d_inode->i_mode))
+		return -EISDIR;
+	if (victim->d_flags & DCACHE_NFSFS_RENAMED)
+		return -EBUSY;
+	return 0;
+}
+
+/**
+ * vfs_whiteout - create a white-out for the given directory entry
+ * @dir: parent inode
+ * @dentry: directory entry to white-out
+ *
+ * Simply white-out a given directory entry. This functionality is usually used
+ * in the sense of unlink. Therefore the given dentry can still be in-use and
+ * contains an in-use inode. The filesystem has to do what unlink or rmdir
+ * would in that case. Since the dentry still might be in-use we have to
+ * provide a fresh unhashed dentry that whiteout can fill the new inode into.
+ * In that case the given dentry is dropped and the fresh dentry containing the
+ * whiteout is rehashed instead. If the given dentry is unused, the whiteout
+ * inode is instantiated into it instead.
+ *
+ * After this returns with success, don't make any assumptions about the inode.
+ * Just dput() the dentry.
+ */
+int vfs_whiteout(struct inode *dir, struct dentry *dentry, int isdir)
+{
+	int err;
+	struct inode *old_inode = dentry->d_inode;
+	struct dentry *parent, *whiteout;
+
+	err = may_whiteout(dir, dentry, isdir);
+	if (err)
+		return err;
+
+	BUG_ON(dentry->d_parent->d_inode != dir);
+
+	if (!dir->i_op || !dir->i_op->whiteout)
+		return -EOPNOTSUPP;
+
+	if (old_inode) {
+		vfs_dq_init(dir);
+
+		mutex_lock(&old_inode->i_mutex);
+		if (isdir)
+			dentry_unhash(dentry);
+		if (d_mountpoint(dentry))
+			err = -EBUSY;
+		else {
+			if (isdir)
+				err = security_inode_rmdir(dir, dentry);
+			else
+				err = security_inode_unlink(dir, dentry);
+		}
+	}
+
+	parent = dget_parent(dentry);
+	whiteout = d_alloc_name(parent, dentry->d_name.name);
+
+	if (!err)
+		err = dir->i_op->whiteout(dir, dentry, whiteout);
+
+	if (old_inode) {
+		mutex_unlock(&old_inode->i_mutex);
+		if (!err) {
+			fsnotify_link_count(old_inode);
+			d_delete(dentry);
+		}
+		if (isdir)
+			dput(dentry);
+	}
+
+	dput(whiteout);
+	dput(parent);
+	return err;
+}
+
 /*
  * We try to drop the dentry early: we should have
  * a usage count of 2 if we're the only user of this
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c66d224..e00e95b 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -181,8 +181,8 @@ d_iput:		no		no		no       yes
 #define DCACHE_UNHASHED		0x0010	
 
 #define DCACHE_INOTIFY_PARENT_WATCHED	0x0020 /* Parent inode is watched */
-
 #define DCACHE_COOKIE		0x0040	/* For use by dcookie subsystem */
+#define DCACHE_WHITEOUT		0x0080	/* This negative dentry is a whiteout */
 
 extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
@@ -351,6 +351,11 @@ static inline int d_unhashed(struct dentry *dentry)
 	return (dentry->d_flags & DCACHE_UNHASHED);
 }
 
+static inline int d_is_whiteout(struct dentry *dentry)
+{
+	return (dentry->d_flags & DCACHE_WHITEOUT);
+}
+
 static inline struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 92734c0..5950616 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -141,6 +141,7 @@ struct inodes_stat_t {
 #define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
+#define MS_WHITEOUT	(1<<26) /* fs does support white-out filetype */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
@@ -1241,6 +1242,7 @@ extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
+extern int vfs_whiteout(struct inode *, struct dentry *, int);
 
 /*
  * VFS dentry helper functions.
@@ -1345,6 +1347,7 @@ struct inode_operations {
 	int (*mkdir) (struct inode *,struct dentry *,int);
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
-- 
1.6.1.3



* [PATCH 10/32] whiteout: Set S_OPAQUE inode flag when creating directories
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (8 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 09/32] whiteout: Add vfs_whiteout() and whiteout inode operation Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 11/32] whiteout: Add whiteout support to tmpfs Jan Blunck
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

In a union directory we don't want the directories on lower layers of the
union to "show through". To prevent the contents of underlying directories
from magically showing up after a mkdir(), we set the S_OPAQUE flag on
directories that are created where a whiteout existed before.
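
As a toy model of the intended semantics (not kernel code), the decision can
be sketched as below. The enum and both functions are hypothetical names; the
assumption that a non-opaque top-layer directory lets the lower directory
contribute entries follows from the description above rather than from code
shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of one name in the top (writable) layer. */
enum sketch_top {
	SKETCH_TOP_ABSENT,	/* no entry in the top layer */
	SKETCH_TOP_WHITEOUT,	/* the name was deleted in the union */
	SKETCH_TOP_DIR,		/* a directory exists in the top layer */
};

/*
 * Does the lower layer's directory of the same name still contribute
 * entries to the union view?
 */
static bool sketch_lower_shows_through(enum sketch_top top, bool top_opaque)
{
	switch (top) {
	case SKETCH_TOP_ABSENT:
		return true;		/* nothing covers the lower layer */
	case SKETCH_TOP_WHITEOUT:
		return false;		/* the whiteout hides the name */
	case SKETCH_TOP_DIR:
		return !top_opaque;	/* opaque directories do not merge */
	}
	return false;
}

/* A mkdir() over a whiteout yields an opaque directory. */
static bool sketch_mkdir_opaque(enum sketch_top before)
{
	assert(before != SKETCH_TOP_DIR);	/* mkdir would fail with EEXIST */
	return before == SKETCH_TOP_WHITEOUT;
}
```

In this model, a directory recreated over a whiteout starts out empty, which
matches what a user expects after rmdir followed by mkdir.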

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c         |   11 ++++++++++-
 include/linux/fs.h |    3 +++
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9cab708..fe58172 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2050,6 +2050,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, int, mode, unsigned, dev)
 int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 {
 	int error = may_create(dir, dentry);
+	int opaque = 0;
 
 	if (error)
 		return error;
@@ -2062,10 +2063,18 @@ int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 	if (error)
 		return error;
 
+	if (d_is_whiteout(dentry))
+		opaque = 1;
+
 	DQUOT_INIT(dir);
 	error = dir->i_op->mkdir(dir, dentry, mode);
-	if (!error)
+	if (!error) {
 		fsnotify_mkdir(dir, dentry);
+		if (opaque) {
+			dentry->d_inode->i_flags |= S_OPAQUE;
+			mark_inode_dirty(dentry->d_inode);
+		}
+	}
 	return error;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5950616..841bc1d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -168,6 +168,7 @@ struct inodes_stat_t {
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE	512	/* Inode is fs-internal */
+#define S_OPAQUE	1024	/* Directory is opaque */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -203,6 +204,8 @@ struct inodes_stat_t {
 #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
 #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
 
+#define IS_OPAQUE(inode)	((inode)->i_flags & S_OPAQUE)
+
 /* the read-only stuff doesn't really belong here, but any other place is
    probably as bad and I don't want to create yet another include file. */
 
-- 
1.6.1.3



* [PATCH 11/32] whiteout: Add whiteout support to tmpfs
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (9 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 10/32] whiteout: Set S_OPAQUE inode flag when creating directories Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 12/32] whiteout: Split of ext2_append_link() from ext2_add_link() Jan Blunck
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch adds support for whiteouts to tmpfs.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 mm/shmem.c |  150 ++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 136 insertions(+), 14 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 4103a23..b2e3904 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1774,6 +1774,76 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return 0;
 }
 
+static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
+static int shmem_unlink(struct inode *dir, struct dentry *dentry);
+
+/*
+ * This is the whiteout support for tmpfs. It uses one singleton whiteout
+ * inode per superblock thus it is very similar to shmem_link().
+ */
+static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
+			  struct dentry *new_dentry)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+	struct dentry *dentry;
+
+	if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+		return -EPERM;
+
+	/* This gives us a properly initialized negative dentry */
+	dentry = simple_lookup(dir, new_dentry, NULL);
+	if (dentry && IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	/*
+	 * No ordinary (disk based) filesystem counts whiteouts as inodes;
+	 * but each new link needs a new dentry, pinning lowmem, and
+	 * tmpfs dentries cannot be pruned until they are unlinked.
+	 */
+	if (sbinfo->max_inodes) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return -ENOSPC;
+		}
+		sbinfo->free_inodes--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+
+	if (old_dentry->d_inode) {
+		if (S_ISDIR(old_dentry->d_inode->i_mode))
+			shmem_rmdir(dir, old_dentry);
+		else
+			shmem_unlink(dir, old_dentry);
+	}
+
+	dir->i_size += BOGO_DIRENT_SIZE;
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	/* Extra pinning count for the created dentry */
+	dget(new_dentry);
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	return 0;
+}
+
+static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
+				struct inode *inode)
+{
+	if (d_is_whiteout(dentry)) {
+		/* Re-using an existing whiteout */
+		shmem_free_inode(dir->i_sb);
+		if (S_ISDIR(inode->i_mode))
+			inode->i_flags |= S_OPAQUE;
+	} else {
+		/* New dentry */
+		dir->i_size += BOGO_DIRENT_SIZE;
+		dget(dentry); /* Extra count - pin the dentry in core */
+	}
+	/* Will clear DCACHE_WHITEOUT flag */
+	d_instantiate(dentry, inode);
+
+}
 /*
  * File creation. Allocate an inode, and we're done..
  */
@@ -1798,15 +1868,16 @@ shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 			iput(inode);
 			return error;
 		}
+
 		if (dir->i_mode & S_ISGID) {
 			inode->i_gid = dir->i_gid;
 			if (S_ISDIR(mode))
 				inode->i_mode |= S_ISGID;
 		}
-		dir->i_size += BOGO_DIRENT_SIZE;
+
+		shmem_d_instantiate(dir, dentry, inode);
+
 		dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-		d_instantiate(dentry, inode);
-		dget(dentry); /* Extra count - pin the dentry in core */
 	}
 	return error;
 }
@@ -1844,12 +1915,11 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
 	if (ret)
 		goto out;
 
-	dir->i_size += BOGO_DIRENT_SIZE;
+	shmem_d_instantiate(dir, dentry, inode);
+
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
 	inc_nlink(inode);
 	atomic_inc(&inode->i_count);	/* New dentry reference */
-	dget(dentry);		/* Extra pinning count for the created dentry */
-	d_instantiate(dentry, inode);
 out:
 	return ret;
 }
@@ -1858,21 +1928,61 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
-		shmem_free_inode(inode->i_sb);
+	if (d_is_whiteout(dentry) || (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
+		shmem_free_inode(dir->i_sb);
 
+	if (inode) {
+		inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+		drop_nlink(inode);
+	}
 	dir->i_size -= BOGO_DIRENT_SIZE;
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - this does all the work */
 	return 0;
 }
 
+static void shmem_dir_unlink_whiteouts(struct inode *dir, struct dentry *dentry)
+{
+	if (!dentry->d_inode)
+		return;
+
+	/* Remove whiteouts from logically empty directory */
+	if (S_ISDIR(dentry->d_inode->i_mode) &&
+	    dentry->d_inode->i_sb->s_flags & MS_WHITEOUT) {
+		struct dentry *child, *next;
+		LIST_HEAD(list);
+
+		spin_lock(&dcache_lock);
+		list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
+			spin_lock(&child->d_lock);
+			if (d_is_whiteout(child)) {
+				__d_drop(child);
+				if (!list_empty(&child->d_lru)) {
+					list_del(&child->d_lru);
+					dentry_stat.nr_unused--;
+				}
+				list_add(&child->d_lru, &list);
+			}
+			spin_unlock(&child->d_lock);
+		}
+		spin_unlock(&dcache_lock);
+
+		list_for_each_entry_safe(child, next, &list, d_lru) {
+			spin_lock(&child->d_lock);
+			list_del_init(&child->d_lru);
+			spin_unlock(&child->d_lock);
+
+			shmem_unlink(dentry->d_inode, child);
+		}
+	}
+}
+
 static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	if (!simple_empty(dentry))
 		return -ENOTEMPTY;
 
+	/* Remove whiteouts from logically empty directory */
+	shmem_dir_unlink_whiteouts(dir, dentry);
 	drop_nlink(dentry->d_inode);
 	drop_nlink(dir);
 	return shmem_unlink(dir, dentry);
@@ -1881,7 +1991,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 /*
  * The VFS layer already does all the dentry stuff for rename,
  * we just have to decrement the usage count for the target if
- * it exists so that the VFS layer correctly free's it when it
+ * it exists so that the VFS layer correctly frees it when it
  * gets overwritten.
  */
 static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)
@@ -1892,7 +2002,12 @@ static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct
 	if (!simple_empty(new_dentry))
 		return -ENOTEMPTY;
 
+	if (d_is_whiteout(new_dentry))
+		shmem_unlink(new_dir, new_dentry);
+
 	if (new_dentry->d_inode) {
+		/* Remove whiteouts from logically empty directory */
+		shmem_dir_unlink_whiteouts(new_dir, new_dentry);
 		(void) shmem_unlink(new_dir, new_dentry);
 		if (they_are_dirs)
 			drop_nlink(old_dir);
@@ -1957,12 +2072,12 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 		set_page_dirty(page);
 		page_cache_release(page);
 	}
+
+	shmem_d_instantiate(dir, dentry, inode);
+
 	if (dir->i_mode & S_ISGID)
 		inode->i_gid = dir->i_gid;
-	dir->i_size += BOGO_DIRENT_SIZE;
 	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	d_instantiate(dentry, inode);
-	dget(dentry);
 	return 0;
 }
 
@@ -2343,6 +2458,12 @@ static int shmem_fill_super(struct super_block *sb,
 	if (!root)
 		goto failed_iput;
 	sb->s_root = root;
+
+#ifdef CONFIG_TMPFS
+	if (!(sb->s_flags & MS_NOUSER))
+		sb->s_flags |= MS_WHITEOUT;
+#endif
+
 	return 0;
 
 failed_iput:
@@ -2447,6 +2568,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.rmdir		= shmem_rmdir,
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
+	.whiteout       = shmem_whiteout,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
-- 
1.6.1.3



* [PATCH 12/32] whiteout: Split of ext2_append_link() from ext2_add_link()
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (10 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 11/32] whiteout: Add whiteout support to tmpfs Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 13/32] whiteout: Add whiteout support to ext2 Jan Blunck
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

ext2_append_entry() is split off from ext2_add_link(). It is later used to
find or append the directory entry that is turned into a whiteout.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/ext2/dir.c |   70 ++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 2999d72..e4689e3 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -471,9 +471,10 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 }
 
 /*
- *	Parent is locked.
+ * Find or append a given dentry to the parent directory
  */
-int ext2_add_link (struct dentry *dentry, struct inode *inode)
+static ext2_dirent * ext2_append_entry(struct dentry * dentry,
+				       struct page ** page)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	const char *name = dentry->d_name.name;
@@ -481,13 +482,10 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 	unsigned chunk_size = ext2_chunk_size(dir);
 	unsigned reclen = EXT2_DIR_REC_LEN(namelen);
 	unsigned short rec_len, name_len;
-	struct page *page = NULL;
-	ext2_dirent * de;
+	ext2_dirent * de = NULL;
 	unsigned long npages = dir_pages(dir);
 	unsigned long n;
 	char *kaddr;
-	loff_t pos;
-	int err;
 
 	/*
 	 * We take care of directory expansion in the same loop.
@@ -497,20 +495,19 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 	for (n = 0; n <= npages; n++) {
 		char *dir_end;
 
-		page = ext2_get_page(dir, n, 0);
-		err = PTR_ERR(page);
-		if (IS_ERR(page))
+		*page = ext2_get_page(dir, n, 0);
+		de = ERR_PTR(PTR_ERR(*page));
+		if (IS_ERR(*page))
 			goto out;
-		lock_page(page);
-		kaddr = page_address(page);
+		lock_page(*page);
+		kaddr = page_address(*page);
 		dir_end = kaddr + ext2_last_byte(dir, n);
 		de = (ext2_dirent *)kaddr;
 		kaddr += PAGE_CACHE_SIZE - reclen;
 		while ((char *)de <= kaddr) {
 			if ((char *)de == dir_end) {
 				/* We hit i_size */
-				name_len = 0;
-				rec_len = chunk_size;
+				de->name_len = 0;
 				de->rec_len = ext2_rec_len_to_disk(chunk_size);
 				de->inode = 0;
 				goto got_it;
@@ -518,12 +515,11 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 			if (de->rec_len == 0) {
 				ext2_error(dir->i_sb, __func__,
 					"zero-length directory entry");
-				err = -EIO;
+				de = ERR_PTR(-EIO);
 				goto out_unlock;
 			}
-			err = -EEXIST;
 			if (ext2_match (namelen, name, de))
-				goto out_unlock;
+				goto got_it;
 			name_len = EXT2_DIR_REC_LEN(de->name_len);
 			rec_len = ext2_rec_len_from_disk(de->rec_len);
 			if (!de->inode && rec_len >= reclen)
@@ -532,13 +528,48 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 				goto got_it;
 			de = (ext2_dirent *) ((char *) de + rec_len);
 		}
-		unlock_page(page);
-		ext2_put_page(page);
+		unlock_page(*page);
+		ext2_put_page(*page);
 	}
+
 	BUG();
-	return -EINVAL;
 
 got_it:
+	return de;
+	/* OFFSET_CACHE */
+out_unlock:
+	unlock_page(*page);
+	ext2_put_page(*page);
+out:
+	return de;
+}
+
+/*
+ *	Parent is locked.
+ */
+int ext2_add_link (struct dentry *dentry, struct inode *inode)
+{
+	struct inode *dir = dentry->d_parent->d_inode;
+	const char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned short rec_len, name_len;
+	ext2_dirent * de;
+	struct page *page;
+	loff_t pos;
+	int err;
+
+	de = ext2_append_entry(dentry, &page);
+	if (IS_ERR(de))
+		return PTR_ERR(de);
+
+	err = -EEXIST;
+	if (ext2_match (namelen, name, de))
+		goto out_unlock;
+
+got_it:
+	name_len = EXT2_DIR_REC_LEN(de->name_len);
+	rec_len = ext2_rec_len_from_disk(de->rec_len);
+
 	pos = page_offset(page) +
 		(char*)de - (char*)page_address(page);
 	err = __ext2_write_begin(NULL, page->mapping, pos, rec_len, 0,
@@ -562,7 +593,6 @@ got_it:
 	/* OFFSET_CACHE */
 out_put:
 	ext2_put_page(page);
-out:
 	return err;
 out_unlock:
 	unlock_page(page);
-- 
1.6.1.3



* [PATCH 13/32] whiteout: Add whiteout support to ext2
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (11 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 12/32] whiteout: Split of ext2_append_link() from ext2_add_link() Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 14/32] whiteout: Add path_whiteout() helper Jan Blunck
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch adds whiteout support to EXT2. A whiteout is an empty directory
entry (inode == 0) with the file type set to EXT2_FT_WHT, so it still occupies
space in its directory. Because whiteouts are implemented as a file type, the
EXT2_FEATURE_INCOMPAT_FILETYPE flag must be set.
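
The on-disk distinction can be sketched as follows. This is an illustrative
userspace model, not the ext2 code: the struct is a simplified stand-in for
struct ext2_dir_entry_2, all sketch_* and SKETCH_* names are hypothetical, and
SKETCH_FT_WHT uses a placeholder value (the real EXT2_FT_WHT comes from the
include/linux/ext2_fs.h hunk of this patch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FT_WHT 8	/* placeholder value, assumed for illustration */

/* Simplified stand-in for the fields of an ext2 directory entry. */
struct sketch_dirent {
	uint32_t inode;		/* 0 means no inode is attached */
	uint8_t name_len;
	uint8_t file_type;
};

enum sketch_kind { SKETCH_FREE, SKETCH_WHITEOUT, SKETCH_LIVE };

/* Classify an entry the way the patched lookup code tells them apart. */
static enum sketch_kind sketch_classify(const struct sketch_dirent *de)
{
	assert(de != NULL);
	if (de->inode)
		return SKETCH_LIVE;
	if (de->file_type == SKETCH_FT_WHT)
		return SKETCH_WHITEOUT;	/* inode == 0, but the name is in use */
	return SKETCH_FREE;		/* reusable slot */
}

/* Only free slots may be skipped by name matching or reused blindly. */
static int sketch_slot_is_reusable(const struct sketch_dirent *de)
{
	return sketch_classify(de) == SKETCH_FREE;
}
```

This is why the patch has to touch both ext2_match() and the free-slot check
in ext2_append_entry(): before it, inode == 0 alone meant "unused".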

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/ext2/dir.c           |   96 +++++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/ext2.h          |    3 +
 fs/ext2/inode.c         |   11 ++++-
 fs/ext2/namei.c         |   65 ++++++++++++++++++++++++++++++-
 fs/ext2/super.c         |    8 ++++
 include/linux/ext2_fs.h |    4 ++
 6 files changed, 177 insertions(+), 10 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index e4689e3..5b499ad 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -219,7 +219,7 @@ static inline int ext2_match (int len, const char * const name,
 {
 	if (len != de->name_len)
 		return 0;
-	if (!de->inode)
+	if (!de->inode && (de->file_type != EXT2_FT_WHT))
 		return 0;
 	return !memcmp(name, de->name, len);
 }
@@ -255,6 +255,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
 	[EXT2_FT_FIFO]		= DT_FIFO,
 	[EXT2_FT_SOCK]		= DT_SOCK,
 	[EXT2_FT_SYMLINK]	= DT_LNK,
+	[EXT2_FT_WHT]		= DT_WHT,
 };
 
 #define S_SHIFT 12
@@ -448,6 +449,26 @@ ino_t ext2_inode_by_name(struct inode *dir, struct qstr *child)
 	return res;
 }
 
+/* Special version for filetype based whiteout support */
+ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
+{
+	ino_t res = 0;
+	struct ext2_dir_entry_2 *de;
+	struct page *page;
+
+	de = ext2_find_entry (dir, &dentry->d_name, &page);
+	if (de) {
+		res = le32_to_cpu(de->inode);
+		if (!res && de->file_type == EXT2_FT_WHT) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_WHITEOUT;
+			spin_unlock(&dentry->d_lock);
+		}
+		ext2_put_page(page);
+	}
+	return res;
+}
+
 /* Releases the page */
 void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 			struct page *page, struct inode *inode)
@@ -522,7 +543,8 @@ static ext2_dirent * ext2_append_entry(struct dentry * dentry,
 				goto got_it;
 			name_len = EXT2_DIR_REC_LEN(de->name_len);
 			rec_len = ext2_rec_len_from_disk(de->rec_len);
-			if (!de->inode && rec_len >= reclen)
+			if (!de->inode && (de->file_type != EXT2_FT_WHT) &&
+			    (rec_len >= reclen))
 				goto got_it;
 			if (rec_len >= name_len + reclen)
 				goto got_it;
@@ -563,8 +585,11 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 		return PTR_ERR(de);
 
 	err = -EEXIST;
-	if (ext2_match (namelen, name, de))
+	if (ext2_match (namelen, name, de)) {
+		if (de->file_type == EXT2_FT_WHT)
+			goto got_it;
 		goto out_unlock;
+	}
 
 got_it:
 	name_len = EXT2_DIR_REC_LEN(de->name_len);
@@ -576,7 +601,8 @@ got_it:
 							&page, NULL);
 	if (err)
 		goto out_unlock;
-	if (de->inode) {
+	if (de->inode || ((de->file_type == EXT2_FT_WHT) &&
+			  !ext2_match (namelen, name, de))) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
 		de->rec_len = ext2_rec_len_to_disk(name_len);
@@ -645,6 +671,68 @@ out:
 	return err;
 }
 
+int ext2_whiteout_entry (struct inode * dir, struct dentry * dentry,
+			 struct ext2_dir_entry_2 * de, struct page * page)
+{
+	const char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned short rec_len, name_len;
+	loff_t pos;
+	int err;
+
+	if (!de) {
+		de = ext2_append_entry(dentry, &page);
+		BUG_ON(!de);
+	}
+
+	err = -EEXIST;
+	if (ext2_match (namelen, name, de) &&
+	    (de->file_type == EXT2_FT_WHT)) {
+		ext2_error(dir->i_sb, __func__,
+			   "entry is already a whiteout in directory #%lu",
+			   dir->i_ino);
+		goto out_unlock;
+	}
+
+	name_len = EXT2_DIR_REC_LEN(de->name_len);
+	rec_len = ext2_rec_len_from_disk(de->rec_len);
+
+	pos = page_offset(page) +
+		(char*)de - (char*)page_address(page);
+	err = __ext2_write_begin(NULL, page->mapping, pos, rec_len, 0,
+							&page, NULL);
+	if (err)
+		goto out_unlock;
+	/*
+	 * We whiteout an existing entry. Do what ext2_delete_entry() would do,
+	 * except that we don't need to merge with the previous entry since
+	 * we are going to reuse it.
+	 */
+	if (ext2_match (namelen, name, de))
+		de->inode = 0;
+	if (de->inode || (de->file_type == EXT2_FT_WHT)) {
+		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
+		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
+		de->rec_len = ext2_rec_len_to_disk(name_len);
+		de = de1;
+	}
+	de->name_len = namelen;
+	memcpy(de->name, name, namelen);
+	de->inode = 0;
+	de->file_type = EXT2_FT_WHT;
+	err = ext2_commit_chunk(page, pos, rec_len);
+	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
+	mark_inode_dirty(dir);
+	/* OFFSET_CACHE */
+out_put:
+	ext2_put_page(page);
+	return err;
+out_unlock:
+	unlock_page(page);
+	goto out_put;
+}
+
 /*
  * Set the first fragment of directory.
  */
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 3203042..ec9a0bd 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -106,9 +106,12 @@ extern void ext2_rsv_window_add(struct super_block *sb, struct ext2_reserve_wind
 /* dir.c */
 extern int ext2_add_link (struct dentry *, struct inode *);
 extern ino_t ext2_inode_by_name(struct inode *, struct qstr *);
+extern ino_t ext2_inode_by_dentry(struct inode *, struct dentry *);
 extern int ext2_make_empty(struct inode *, struct inode *);
 extern struct ext2_dir_entry_2 * ext2_find_entry (struct inode *,struct qstr *, struct page **);
 extern int ext2_delete_entry (struct ext2_dir_entry_2 *, struct page *);
+extern int ext2_whiteout_entry (struct inode *, struct dentry *,
+				struct ext2_dir_entry_2 *, struct page *);
 extern int ext2_empty_dir (struct inode *);
 extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 23fff2f..ae467c4 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1156,7 +1156,8 @@ void ext2_set_inode_flags(struct inode *inode)
 {
 	unsigned int flags = EXT2_I(inode)->i_flags;
 
-	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
+	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
+			    S_OPAQUE);
 	if (flags & EXT2_SYNC_FL)
 		inode->i_flags |= S_SYNC;
 	if (flags & EXT2_APPEND_FL)
@@ -1167,6 +1168,8 @@ void ext2_set_inode_flags(struct inode *inode)
 		inode->i_flags |= S_NOATIME;
 	if (flags & EXT2_DIRSYNC_FL)
 		inode->i_flags |= S_DIRSYNC;
+	if (flags & EXT2_OPAQUE_FL)
+		inode->i_flags |= S_OPAQUE;
 }
 
 /* Propagate flags from i_flags to EXT2_I(inode)->i_flags */
@@ -1174,8 +1177,8 @@ void ext2_get_inode_flags(struct ext2_inode_info *ei)
 {
 	unsigned int flags = ei->vfs_inode.i_flags;
 
-	ei->i_flags &= ~(EXT2_SYNC_FL|EXT2_APPEND_FL|
-			EXT2_IMMUTABLE_FL|EXT2_NOATIME_FL|EXT2_DIRSYNC_FL);
+	ei->i_flags &= ~(EXT2_SYNC_FL|EXT2_APPEND_FL|EXT2_IMMUTABLE_FL|
+			 EXT2_NOATIME_FL|EXT2_DIRSYNC_FL|EXT2_OPAQUE_FL);
 	if (flags & S_SYNC)
 		ei->i_flags |= EXT2_SYNC_FL;
 	if (flags & S_APPEND)
@@ -1186,6 +1189,8 @@ void ext2_get_inode_flags(struct ext2_inode_info *ei)
 		ei->i_flags |= EXT2_NOATIME_FL;
 	if (flags & S_DIRSYNC)
 		ei->i_flags |= EXT2_DIRSYNC_FL;
+	if (flags & S_OPAQUE)
+		ei->i_flags |= EXT2_OPAQUE_FL;
 }
 
 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 90ea179..58107ff 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -54,15 +54,16 @@ static inline int ext2_add_nondir(struct dentry *dentry, struct inode *inode)
  * Methods themselves.
  */
 
-static struct dentry *ext2_lookup(struct inode * dir, struct dentry *dentry, struct nameidata *nd)
+static struct dentry *ext2_lookup(struct inode * dir, struct dentry *dentry,
+				  struct nameidata *nd)
 {
 	struct inode * inode;
 	ino_t ino;
-	
+
 	if (dentry->d_name.len > EXT2_NAME_LEN)
 		return ERR_PTR(-ENAMETOOLONG);
 
-	ino = ext2_inode_by_name(dir, &dentry->d_name);
+	ino = ext2_inode_by_dentry(dir, dentry);
 	inode = NULL;
 	if (ino) {
 		inode = ext2_iget(dir->i_sb, ino);
@@ -222,6 +223,10 @@ static int ext2_mkdir(struct inode * dir, struct dentry * dentry, int mode)
 	else
 		inode->i_mapping->a_ops = &ext2_aops;
 
+	/* if we call mkdir on a whiteout create an opaque directory */
+	if (dentry->d_flags & DCACHE_WHITEOUT)
+		inode->i_flags |= S_OPAQUE;
+
 	inode_inc_link_count(inode);
 
 	err = ext2_make_empty(inode, dir);
@@ -285,6 +290,59 @@ static int ext2_rmdir (struct inode * dir, struct dentry *dentry)
 	return err;
 }
 
+/*
+ * Create a whiteout for the dentry
+ */
+static int ext2_whiteout(struct inode *dir, struct dentry *dentry,
+			 struct dentry *new_dentry)
+{
+	struct inode * inode = dentry->d_inode;
+	struct ext2_dir_entry_2 * de = NULL;
+	struct page * page;
+	int err = -ENOTEMPTY;
+
+	if (!EXT2_HAS_INCOMPAT_FEATURE(dir->i_sb,
+				       EXT2_FEATURE_INCOMPAT_FILETYPE)) {
+		ext2_error (dir->i_sb, "ext2_whiteout",
+			    "can't set whiteout filetype");
+		err = -EPERM;
+		goto out;
+	}
+
+	if (inode) {
+		if (S_ISDIR(inode->i_mode) && !ext2_empty_dir(inode))
+			goto out;
+
+		err = -ENOENT;
+		de = ext2_find_entry (dir, &dentry->d_name, &page);
+		if (!de)
+			goto out;
+		lock_page(page);
+	}
+
+	err = ext2_whiteout_entry (dir, dentry, de, page);
+	if (err)
+		goto out;
+
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	d_add(new_dentry, NULL);
+
+	if (inode) {
+		inode->i_ctime = dir->i_ctime;
+		inode_dec_link_count(inode);
+		if (S_ISDIR(inode->i_mode)) {
+			inode->i_size = 0;
+			inode_dec_link_count(inode);
+			inode_dec_link_count(dir);
+		}
+	}
+	err = 0;
+out:
+	return err;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct inode * new_dir,	struct dentry * new_dentry )
 {
@@ -379,6 +437,7 @@ const struct inode_operations ext2_dir_inode_operations = {
 	.mkdir		= ext2_mkdir,
 	.rmdir		= ext2_rmdir,
 	.mknod		= ext2_mknod,
+	.whiteout	= ext2_whiteout,
 	.rename		= ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7c6e360..dd35668 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1071,6 +1071,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 	if (EXT2_HAS_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_HAS_JOURNAL))
 		ext2_warning(sb, __func__,
 			"mounting ext3 filesystem as ext2");
+
+	/*
+	 * If this filesystem can store the filetype it has support for
+	 * whiteouts as well.
+	 */
+	if (EXT2_HAS_INCOMPAT_FEATURE(sb, EXT2_FEATURE_INCOMPAT_FILETYPE))
+		sb->s_flags |= MS_WHITEOUT;
+
 	ext2_setup_super (sb, es, sb->s_flags & MS_RDONLY);
 	return 0;
 
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 121720d..bd10826 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -189,6 +189,7 @@ struct ext2_group_desc
 #define EXT2_NOTAIL_FL			FS_NOTAIL_FL	/* file tail should not be merged */
 #define EXT2_DIRSYNC_FL			FS_DIRSYNC_FL	/* dirsync behaviour (directories only) */
 #define EXT2_TOPDIR_FL			FS_TOPDIR_FL	/* Top of directory hierarchies*/
+#define EXT2_OPAQUE_FL			0x00040000
 #define EXT2_RESERVED_FL		FS_RESERVED_FL	/* reserved for ext2 lib */
 
 #define EXT2_FL_USER_VISIBLE		FS_FL_USER_VISIBLE	/* User visible flags */
@@ -503,10 +504,12 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_RECOVER		0x0004
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	0x0008
 #define EXT2_FEATURE_INCOMPAT_META_BG		0x0010
+#define EXT2_FEATURE_INCOMPAT_WHITEOUT		0x0020
 #define EXT2_FEATURE_INCOMPAT_ANY		0xffffffff
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE| \
+					 EXT2_FEATURE_INCOMPAT_WHITEOUT| \
 					 EXT2_FEATURE_INCOMPAT_META_BG)
 #define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
@@ -573,6 +576,7 @@ enum {
 	EXT2_FT_FIFO,
 	EXT2_FT_SOCK,
 	EXT2_FT_SYMLINK,
+	EXT2_FT_WHT,
 	EXT2_FT_MAX
 };
 
-- 
1.6.1.3



* [PATCH 14/32] whiteout: Add path_whiteout() helper
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (12 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 13/32] whiteout: Add whiteout support to ext2 Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 15/32] union-mount: Documentation Jan Blunck
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Add a path_whiteout() helper for vfs_whiteout().

Signed-off-by: Jan Blunck <jblunck@suse.org>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c         |   15 ++++++++++++++-
 include/linux/fs.h |    1 -
 2 files changed, 14 insertions(+), 2 deletions(-)
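
Note (illustration only, not part of the patch): path_whiteout() brackets
the operation with mnt_want_write() and mnt_drop_write(), so write access
is dropped again only if it was granted. The user-space sketch below models
that pattern. All model_* names are hypothetical, and returning -EROFS for
a read-only mount is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a vfsmount's write accounting. */
struct model_mnt {
	int readonly;	/* non-zero: write access is refused */
	int writers;	/* number of write references currently held */
};

static int model_want_write(struct model_mnt *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

static void model_drop_write(struct model_mnt *m)
{
	assert(m->writers > 0);
	m->writers--;
}

typedef int (*model_op)(void *arg);

/* Sample operation: counts how often it was invoked. */
static int model_count_call(void *arg)
{
	++*(int *)arg;
	return 0;
}

/*
 * Same shape as path_whiteout(): take write access, run the operation,
 * drop write access. The operation never runs if access is refused.
 */
static int model_with_write_access(struct model_mnt *m, model_op op, void *arg)
{
	int error = model_want_write(m);

	if (!error) {
		error = op(arg);
		model_drop_write(m);
	}

	return error;
}
```

The point of the helper is that callers can no longer forget either half of
the bracket, which is presumably why vfs_whiteout() becomes static here.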

diff --git a/fs/namei.c b/fs/namei.c
index fe58172..9dc51b0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2172,7 +2172,7 @@ static inline int may_whiteout(struct inode *dir, struct dentry *victim,
  * After this returns with success, don't make any assumptions about the inode.
  * Just dput() it dentry.
  */
-int vfs_whiteout(struct inode *dir, struct dentry *dentry, int isdir)
+static int vfs_whiteout(struct inode *dir, struct dentry *dentry, int isdir)
 {
 	int err;
 	struct inode *old_inode = dentry->d_inode;
@@ -2224,6 +2224,19 @@ int vfs_whiteout(struct inode *dir, struct dentry *dentry, int isdir)
 	return err;
 }
 
+int path_whiteout(struct path *dir_path, struct dentry *dentry, int isdir)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		error = vfs_whiteout(dir_path->dentry->d_inode, dentry, isdir);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_whiteout);
+
 /*
  * We try to drop the dentry early: we should have
  * a usage count of 2 if we're the only user of this
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 841bc1d..f5ca398 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1245,7 +1245,6 @@ extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
-extern int vfs_whiteout(struct inode *, struct dentry *, int);
 
 /*
  * VFS dentry helper functions.
-- 
1.6.1.3



* [PATCH 15/32] union-mount: Documentation
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (13 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 14/32] whiteout: Add path_whiteout() helper Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-25  6:25   ` hooanon05
  2009-05-18 16:09 ` [PATCH 16/32] union-mount: Introduce MNT_UNION and MS_UNION flags Jan Blunck
                   ` (20 subsequent siblings)
  35 siblings, 1 reply; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Add simple documentation about union mounting in general and this
implementation in particular.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 Documentation/filesystems/union-mounts.txt |  187 ++++++++++++++++++++++++++++
 1 files changed, 187 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt
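
Note (illustration only, not part of the patch): section 3 of the new
document describes how whiteouts, opaque directories and fallthrus decide
whether a name is visible in a union. The user-space sketch below models
those rules for a single directory that exists on every layer. It is not
the kernel lookup path; all model_* names are hypothetical and layers[0] is
assumed to be the topmost layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum model_kind { MODEL_FILE, MODEL_WHITEOUT, MODEL_FALLTHRU };

struct model_entry {
	const char *name;
	enum model_kind kind;
};

/* One directory on one layer of the union stack. */
struct model_dir {
	int opaque;			/* hides all lower layers */
	size_t nr;
	const struct model_entry *entries;
};

static const struct model_entry *model_find(const struct model_dir *d,
					     const char *name)
{
	for (size_t i = 0; i < d->nr; i++)
		if (strcmp(d->entries[i].name, name) == 0)
			return &d->entries[i];
	return NULL;
}

/*
 * Returns the index of the layer that provides 'name', or -1 if the name
 * is not visible in the union.
 */
static int model_union_lookup(const struct model_dir *layers,
			      size_t nr_layers, const char *name)
{
	assert(layers != NULL && name != NULL);

	for (size_t i = 0; i < nr_layers; i++) {
		const struct model_entry *e = model_find(&layers[i], name);

		if (e && e->kind == MODEL_FILE)
			return (int)i;	/* real object on this layer */
		if (e && e->kind == MODEL_WHITEOUT)
			return -1;	/* name is explicitly deleted */
		if (!e && layers[i].opaque)
			return -1;	/* opaque directory: stop here */
		/* fallthru, or plain miss in a transparent directory */
	}
	return -1;
}
```

A fallthru entry is the only way to reach a lower layer through an opaque
directory, which matches the role the document gives it for readdir().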

diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt
new file mode 100644
index 0000000..15bb9d5
--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,187 @@
+VFS based Union Mounts
+----------------------
+
+ 1. What are "Union Mounts"
+ 2. The Union Stack
+ 3. Whiteouts, Opaque Directories, and Fallthrus
+ 4. Copy-up
+ 5. Directory Reading
+ 6. Known Problems
+ 7. References
+
+-------------------------------------------------------------------------------
+
+1. What are "Union Mounts"
+==========================
+
+Please note: this is NOT about UnionFS and it is NOT derived work!
+
+Traditionally the mount operation is opaque, which means that the content of
+the mount point, the directory on which the file system is mounted, is hidden
+by the content of the mounted file system's root directory until the file
+system is unmounted again. Unlike the traditional UNIX mount mechanism, which
+hides the contents of the mount point, a union mount presents a view as if
+both file systems were merged together. Although only the topmost layer of the
+mount stack can be altered, it appears as if transparent file system mounts
+allow any file to be created, modified or deleted.
+
+Most people know the concepts and features of union mounts from other
+operating systems like Sun's Translucent Filesystem, Plan9 or BSD. For an
+in-depth review of union mounts and other unioning file systems, see:
+
+http://lwn.net/Articles/324291/
+http://lwn.net/Articles/325369/
+http://lwn.net/Articles/327738/
+
+Here are the key features of this implementation:
+- completely VFS based
+- does not change the namespace stacking
+- directory listings have duplicate entries removed in the kernel
+- writable unions: only the topmost file system layer may be writable
+- writable unions: new whiteout filetype handled inside the kernel
+
+-------------------------------------------------------------------------------
+
+2. The Union Stack
+==================
+
+The mounted file systems are organized in the "file system hierarchy" (tree of
+vfsmount structures), which keeps track of the stacking of file systems
+upon each other. The per-directory view of the file system hierarchy is called
+the "mount stack" and reflects the order of the file systems mounted on a
+specific directory.
+
+Union mounts present a single unified view of the contents of two or more file
+systems as if they were merged together. Since the information about which
+file system objects are part of a unified view is not directly available from
+the file system hierarchy, a new structure is needed. The file system objects
+that are part of a unified view are ordered in a so-called "union stack". Only
+directories can be part of a unified view.
+
+The link between two layers of the union stack is maintained using the
+union_mount structure (#include <linux/union.h>):
+
+struct union_mount {
+       atomic_t u_count;               /* reference count */
+       struct mutex u_mutex;
+       struct list_head u_unions;      /* list head for d_unions */
+       struct hlist_node u_hash;       /* list head for searching */
+       struct hlist_node u_rhash;      /* list head for reverse searching */
+
+       struct path u_this;             /* this is me */
+       struct path u_next;             /* this is what I overlay */
+};
+
+The union_mount structure holds a reference (dget,mntget) to the next lower
+layer of the union stack. Since a dentry can be part of multiple unions
+(e.g. with bind mounts) they are tied together via the d_unions field of the
+dentry structure.
+
+All union_mount structures are cached in two hash tables, one for lookups of
+the next lower layer of the union stack and one for reverse lookups of the
+next upper layer of the union stack. The reverse lookup is necessary to
+resolve CWD relative path lookups. For calculation of the hash value, the
+(dentry,vfsmount) pair is used. The u_this field is used for the hash table
+which is used in forward lookups and the u_next field for the reverse lookups.
+
+During every new mount (or mount propagation), a new union_mount structure is
+allocated. A reference to the mountpoint's vfsmount and dentry is taken and
+stored in the u_next field.  In almost the same manner a union_mount
+structure is created during the first lookup of a directory within a
+union mount point. In this case the lookup proceeds to all lower layers of the
+union. Therefore the complete union stack is constructed during lookups.
+
+The union_mount structures of a dentry are destroyed when the dentry itself
+is destroyed. Therefore the dentry cache indirectly drives the union_mount
+cache, just as it does for inodes. Please note that lower layer
+union_mount structures are kept in memory until the topmost dentry is
+destroyed.
+
+-------------------------------------------------------------------------------
+
+3. Whiteouts, Opaque Directories, and Fallthrus
+===============================================
+
+The whiteout filetype isn't new. It has been there for quite some time now
+but Linux's VFS hasn't used it yet. With the availability of union mount code
+inside the VFS the whiteout filetype becomes important for supporting writable
+union mounts. For read-only union mounts, support for whiteouts or
+copy-on-open is not necessary.
+
+Whiteouts have the same function as negative dentries: they describe a
+filename which isn't there. The creation of whiteouts needs low-level
+filesystem support. At the time of writing, whiteout support is available
+for tmpfs, ext2 and ext3. The VFS is extended to make the
+whiteout handling transparent to all its users. The whiteouts are not
+visible to user-space.
+
+What happens when we create a directory that was previously whited-out? We
+don't want the directory entries from underlying filesystems to suddenly appear
+in the newly created directory.  So we mark the directory opaque (the file
+system must support storage of the opaque flag).
+
+Fallthrus are directory entries that override the opaque flag on a directory
+for that specific directory entry name (the lookup "falls through" to the next
+layer of the union mount).  Fallthrus are mainly useful for implementing
+readdir().
+
+-------------------------------------------------------------------------------
+
+4. Copy-up
+==========
+
+Any write to an object on any layer other than the topmost triggers a copy-up
+of the object to the topmost file system. For regular files, the copy-up
+happens when the file is opened for writing.
+
+Directories are copied up on open, regardless of intent to write, to simplify
+copy-up of any object located below it in the namespace. Otherwise we have to
+walk the entire pathname to create intermediate directories whenever we do a
+copy-up. This is the same approach as BSD union mounts and uses a negligible
+amount of disk space.  Note that the actual directory entries themselves are
+not copied up from the lower levels until (a) the directory is written to, or
+(b) the first readdir() of the directory (more on that later).
+
+Rename across different levels of the union is implemented as a copy-up
+operation for regular files. Rename of directories simply returns EXDEV, the
+same as if we tried to rename across different mounts. Most applications have
+to handle this case anyway. Some applications do not expect EXDEV on
+rename operations within the same directory, but these applications will also
+be broken with bind mounts.
+
+-------------------------------------------------------------------------------
+
+5. Directory Reading
+====================
+
+readdir() is somewhat difficult to implement in a unioning file system. We must
+eliminate duplicates, apply whiteouts, and start up readdir() where we left
+off, given a single f_pos value. Our solution is to copy up all the directory
+entries to the topmost directory the first time readdir() is called on a
+directory. During this copy-up, we skip duplicates and entries covered by
+whiteouts, and then create fallthru entries for each remaining visible dentry.
+Then we mark the whole directory opaque. From then on, we just use the topmost
+file system's normal readdir() operation.
+
+-------------------------------------------------------------------------------
+
+6. Known Problems
+=================
+
+- copyup() for other filetypes than reg and dir (e.g. for chown() on devices)
+- symlinks are untested
+
+-------------------------------------------------------------------------------
+
+7. References
+=============
+
+[1] http://marc.info/?l=linux-fsdevel&m=96035682927821&w=2
+[2] http://marc.info/?l=linux-fsdevel&m=117681527820133&w=2
+[3] http://marc.info/?l=linux-fsdevel&m=117913503200362&w=2
+[4] http://marc.info/?l=linux-fsdevel&m=118231827024394&w=2
+
+Authors:
+Jan Blunck <jblunck@suse.de>
+Bharata B Rao <bharata@linux.vnet.ibm.com>
+Valerie Aurora <vaurora@redhat.com>
-- 
1.6.1.3



* [PATCH 16/32] union-mount: Introduce MNT_UNION and MS_UNION flags
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (14 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 15/32] union-mount: Documentation Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 17/32] union-mount: Introduce union_mount structure Jan Blunck
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Add a per-mountpoint flag for union mount support. This requires additional
patches to util-linux; see:

git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namespace.c        |    6 +++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 3 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f0a5ce7..53998f2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -780,6 +780,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
+		{ MNT_UNION, ",union" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
@@ -1934,9 +1935,12 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 		mnt_flags |= MNT_RELATIME;
 	if (flags & MS_RDONLY)
 		mnt_flags |= MNT_READONLY;
+	if (flags & MS_UNION)
+		mnt_flags |= MNT_UNION;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT);
+		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_KERNMOUNT |
+		   MS_UNION);
 
 	/* ... and get the mountpoint */
 	retval = kern_path(dir_name, LOOKUP_FOLLOW, &path);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5ca398..7f07768 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -125,6 +125,7 @@ struct inodes_stat_t {
 #define MS_REMOUNT	32	/* Alter flags of a mounted FS */
 #define MS_MANDLOCK	64	/* Allow mandatory locks on an FS */
 #define MS_DIRSYNC	128	/* Directory modifications are synchronous */
+#define MS_UNION	256
 #define MS_NOATIME	1024	/* Do not update access times. */
 #define MS_NODIRATIME	2048	/* Do not update directory access times */
 #define MS_BIND		4096
diff --git a/include/linux/mount.h b/include/linux/mount.h
index cab2a85..bd35fb8 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -34,6 +34,7 @@ struct mnt_namespace;
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
 #define MNT_PNODE_MASK	0x3000	/* propagation flag mask */
+#define MNT_UNION	0x4000	/* if the vfsmount is a union mount */
 
 struct vfsmount {
 	struct list_head mnt_hash;
-- 
1.6.1.3



* [PATCH 17/32] union-mount: Introduce union_mount structure
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (15 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 16/32] union-mount: Introduce MNT_UNION and MS_UNION flags Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 18/32] union-mount: Drive the union cache via dcache Jan Blunck
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch adds the basic structures of VFS based union mounts. It is a new
implementation based on some of my old ideas that influenced Bharata B Rao
<bharata@linux.vnet.ibm.com> who came up with the proposal to let the
union_mount struct only point to the next layer in the union stack. I rewrote
nearly all of the central patches around lookup and the dcache interaction.

Advantages of the new implementation:
- the new union stack is no longer tied directly to one dentry
- the union stack enables dentries to be part of more than one union
  (bind mounts)
- it is unnecessary to traverse the union stack when de/referencing a dentry
- caching of union stack information still driven by dentry cache

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/Kconfig             |    8 +
 fs/Makefile            |    2 +
 fs/dcache.c            |    4 +
 fs/union.c             |  331 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |    9 ++
 include/linux/union.h  |   61 +++++++++
 6 files changed, 415 insertions(+), 0 deletions(-)
 create mode 100644 fs/union.c
 create mode 100644 include/linux/union.h
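
Note (illustration only, not part of the patch): each union_mount links one
(dentry, vfsmount) pair to the pair it overlays, so a whole union stack is
walked by repeated forward lookups on u_this. The user-space sketch below
models that with a linear table instead of the two hash tables. All model_*
names are hypothetical, and locking and reference counting are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct path: only pointer identity matters in this model. */
struct model_path {
	const void *dentry;
	const void *mnt;
};

/* Stand-in for struct union_mount, reduced to the two linked paths. */
struct model_union {
	struct model_path u_this;	/* this is me */
	struct model_path u_next;	/* this is what I overlay */
};

static int model_path_equal(struct model_path a, struct model_path b)
{
	return a.dentry == b.dentry && a.mnt == b.mnt;
}

/* Forward lookup, keyed on u_this like union_lookup(). */
static const struct model_union *model_lookup(const struct model_union *tab,
					       size_t nr, struct model_path p)
{
	for (size_t i = 0; i < nr; i++)
		if (model_path_equal(tab[i].u_this, p))
			return &tab[i];
	return NULL;
}

/* Reverse lookup, keyed on u_next like union_rlookup(). */
static const struct model_union *model_rlookup(const struct model_union *tab,
						size_t nr, struct model_path p)
{
	for (size_t i = 0; i < nr; i++)
		if (model_path_equal(tab[i].u_next, p))
			return &tab[i];
	return NULL;
}

/* Number of layers reachable from 'top', counting 'top' itself. */
static size_t model_stack_depth(const struct model_union *tab, size_t nr,
				struct model_path top)
{
	const struct model_union *um;
	size_t depth = 1;

	while ((um = model_lookup(tab, nr, top)) != NULL) {
		assert(depth <= nr);	/* a longer walk would mean a cycle */
		top = um->u_next;
		depth++;
	}
	return depth;
}
```

Because each structure only points at the next layer, the same lower path
can appear as u_next in several entries, which is how a dentry can be part
of more than one union.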

diff --git a/fs/Kconfig b/fs/Kconfig
index 93945dd..55cb4f2 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -56,6 +56,14 @@ endif # BLOCK
 
 source "fs/notify/Kconfig"
 
+config UNION_MOUNT
+       bool "Union mount support (EXPERIMENTAL)"
+       depends on EXPERIMENTAL
+       ---help---
+         If you say Y here, you will be able to mount file systems as
+         union mount stacks. This is a VFS based implementation and
+         should work with all file systems. If unsure, say N.
+
 config QUOTA
 	bool "Quota support"
 	help
diff --git a/fs/Makefile b/fs/Makefile
index dc20db3..d4ac5bb 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -51,6 +51,8 @@ obj-$(CONFIG_FS_POSIX_ACL)	+= posix_acl.o xattr_acl.o
 obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
 obj-$(CONFIG_GENERIC_ACL)	+= generic_acl.o
 
+obj-$(CONFIG_UNION_MOUNT)	+= union.o
+
 obj-$(CONFIG_QUOTA)		+= dquot.o
 obj-$(CONFIG_QFMT_V1)		+= quota_v1.o
 obj-$(CONFIG_QFMT_V2)		+= quota_v2.o
diff --git a/fs/dcache.c b/fs/dcache.c
index 9260c99..37e286e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1046,6 +1046,10 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
 	INIT_LIST_HEAD(&dentry->d_alias);
+#ifdef CONFIG_UNION_MOUNT
+	INIT_LIST_HEAD(&dentry->d_unions);
+	dentry->d_unionized = 0;
+#endif
 
 	if (parent) {
 		dentry->d_parent = dget(parent);
diff --git a/fs/union.c b/fs/union.c
new file mode 100644
index 0000000..f53ef5f
--- /dev/null
+++ b/fs/union.c
@@ -0,0 +1,331 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007-2009 Novell Inc.
+ *
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/hash.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/union.h>
+
+/*
+ * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
+ * should try to make this good - I've just made it work.
+ */
+static unsigned int union_hash_mask __read_mostly;
+static unsigned int union_hash_shift __read_mostly;
+static struct hlist_head *union_hashtable __read_mostly;
+static unsigned int union_rhash_mask __read_mostly;
+static unsigned int union_rhash_shift __read_mostly;
+static struct hlist_head *union_rhashtable __read_mostly;
+
+/*
+ * Locking Rules:
+ * - dcache_lock (for union_rlookup() only)
+ * - union_lock
+ */
+DEFINE_SPINLOCK(union_lock);
+
+static struct kmem_cache *union_cache __read_mostly;
+
+static unsigned long hash(struct dentry *dentry, struct vfsmount *mnt)
+{
+	unsigned long tmp;
+
+	tmp = ((unsigned long)mnt * (unsigned long)dentry) ^
+		(GOLDEN_RATIO_PRIME + (unsigned long)mnt) / L1_CACHE_BYTES;
+	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> union_hash_shift);
+	return tmp & union_hash_mask;
+}
+
+static __initdata unsigned long union_hash_entries;
+
+static int __init set_union_hash_entries(char *str)
+{
+	if (!str)
+		return 0;
+	union_hash_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+
+__setup("union_hash_entries=", set_union_hash_entries);
+
+static int __init init_union(void)
+{
+	int loop;
+
+	union_cache = KMEM_CACHE(union_mount, SLAB_PANIC | SLAB_MEM_SPREAD);
+	union_hashtable = alloc_large_system_hash("Union-cache",
+						  sizeof(struct hlist_head),
+						  union_hash_entries,
+						  14,
+						  0,
+						  &union_hash_shift,
+						  &union_hash_mask,
+						  0);
+
+	for (loop = 0; loop < (1 << union_hash_shift); loop++)
+		INIT_HLIST_HEAD(&union_hashtable[loop]);
+
+
+	union_rhashtable = alloc_large_system_hash("rUnion-cache",
+						  sizeof(struct hlist_head),
+						  union_hash_entries,
+						  14,
+						  0,
+						  &union_rhash_shift,
+						  &union_rhash_mask,
+						  0);
+
+	for (loop = 0; loop < (1 << union_rhash_shift); loop++)
+		INIT_HLIST_HEAD(&union_rhashtable[loop]);
+
+	return 0;
+}
+
+fs_initcall(init_union);
+
+struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
+				struct dentry *next, struct vfsmount *next_mnt)
+{
+	struct union_mount *um;
+
+	BUG_ON(!S_ISDIR(this->d_inode->i_mode));
+	BUG_ON(!S_ISDIR(next->d_inode->i_mode));
+
+	um = kmem_cache_alloc(union_cache, GFP_ATOMIC);
+	if (!um)
+		return NULL;
+
+	atomic_set(&um->u_count, 1);
+	INIT_LIST_HEAD(&um->u_unions);
+	INIT_HLIST_NODE(&um->u_hash);
+	INIT_HLIST_NODE(&um->u_rhash);
+
+	um->u_this.mnt = this_mnt;
+	um->u_this.dentry = this;
+	um->u_next.mnt = mntget(next_mnt);
+	um->u_next.dentry = dget(next);
+
+	return um;
+}
+
+struct union_mount *union_get(struct union_mount *um)
+{
+	BUG_ON(!atomic_read(&um->u_count));
+	atomic_inc(&um->u_count);
+	return um;
+}
+
+static int __union_put(struct union_mount *um)
+{
+	if (!atomic_dec_and_test(&um->u_count))
+		return 0;
+
+	BUG_ON(!hlist_unhashed(&um->u_hash));
+	BUG_ON(!hlist_unhashed(&um->u_rhash));
+
+	kmem_cache_free(union_cache, um);
+	return 1;
+}
+
+void union_put(struct union_mount *um)
+{
+	struct path tmp = um->u_next;
+
+	if (__union_put(um))
+		path_put(&tmp);
+}
+
+static void __union_hash(struct union_mount *um)
+{
+	hlist_add_head(&um->u_hash, union_hashtable +
+		       hash(um->u_this.dentry, um->u_this.mnt));
+	hlist_add_head(&um->u_rhash, union_rhashtable +
+		       hash(um->u_next.dentry, um->u_next.mnt));
+}
+
+static void __union_unhash(struct union_mount *um)
+{
+	hlist_del_init(&um->u_hash);
+	hlist_del_init(&um->u_rhash);
+}
+
+struct union_mount *union_lookup(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct hlist_head *head = union_hashtable + hash(dentry, mnt);
+	struct hlist_node *node;
+	struct union_mount *um;
+
+	hlist_for_each_entry(um, node, head, u_hash) {
+		if ((um->u_this.dentry == dentry) &&
+		    (um->u_this.mnt == mnt))
+			return um;
+	}
+
+	return NULL;
+}
+
+struct union_mount *union_rlookup(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct hlist_head *head = union_rhashtable + hash(dentry, mnt);
+	struct hlist_node *node;
+	struct union_mount *um;
+
+	hlist_for_each_entry(um, node, head, u_rhash) {
+		if ((um->u_next.dentry == dentry) &&
+		    (um->u_next.mnt == mnt))
+			return um;
+	}
+
+	return NULL;
+}
+
+/*
+ * is_unionized - check if a dentry lives on a union mounted file system
+ *
+ * This tests if a dentry is living on an union mounted file system by walking
+ * the file system hierarchy.
+ */
+int is_unionized(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct path this = { .mnt = mntget(mnt),
+			     .dentry = dget(dentry) };
+	struct vfsmount *tmp;
+
+	do {
+		/* check if there is an union mounted on top of us */
+		spin_lock(&vfsmount_lock);
+		list_for_each_entry(tmp, &this.mnt->mnt_mounts, mnt_child) {
+			if (!(tmp->mnt_flags & MNT_UNION))
+				continue;
+			/* Isn't this a bug? */
+			if (this.dentry->d_sb != tmp->mnt_mountpoint->d_sb)
+				continue;
+			if (is_subdir(this.dentry, tmp->mnt_mountpoint)) {
+				spin_unlock(&vfsmount_lock);
+				path_put(&this);
+				return 1;
+			}
+		}
+		spin_unlock(&vfsmount_lock);
+
+		/* check our mountpoint next */
+		tmp = mntget(this.mnt->mnt_parent);
+		dput(this.dentry);
+		this.dentry = dget(this.mnt->mnt_mountpoint);
+		mntput(this.mnt);
+		this.mnt = tmp;
+	} while (this.mnt != this.mnt->mnt_parent);
+
+	path_put(&this);
+	return 0;
+}
+
+int append_to_union(struct vfsmount *mnt, struct dentry *dentry,
+		    struct vfsmount *dest_mnt, struct dentry *dest_dentry)
+{
+	struct union_mount *this, *um;
+
+	BUG_ON(!IS_MNT_UNION(mnt));
+
+	this = union_alloc(dentry, mnt, dest_dentry, dest_mnt);
+	if (!this)
+		return -ENOMEM;
+
+	spin_lock(&union_lock);
+	um = union_lookup(dentry, mnt);
+	if (um) {
+		BUG_ON((um->u_next.dentry != dest_dentry) ||
+		       (um->u_next.mnt != dest_mnt));
+		spin_unlock(&union_lock);
+		union_put(this);
+		return 0;
+	}
+	__union_hash(this);
+	spin_unlock(&union_lock);
+	return 0;
+}
+
+/*
+ * follow_union_down - follow the union stack one layer down
+ *
+ * This is called to traverse the union stack from one layer to the next
+ * overlayed one. follow_union_down() is called by various lookup functions
+ * that are aware of union mounts.
+ *
+ * Returns non-zero if followed to the next layer, zero otherwise.
+ */
+int follow_union_down(struct vfsmount **mnt, struct dentry **dentry)
+{
+	struct union_mount *um;
+
+	if (!IS_MNT_UNION(*mnt))
+		return 0;
+
+	spin_lock(&union_lock);
+	um = union_lookup(*dentry, *mnt);
+	spin_unlock(&union_lock);
+	if (um) {
+		path_get(&um->u_next);
+		dput(*dentry);
+		*dentry = um->u_next.dentry;
+		mntput(*mnt);
+		*mnt = um->u_next.mnt;
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * follow_union_mount - follow the union stack to the topmost layer
+ *
+ * This is called to traverse the union stack to the topmost layer. This is
+ * necessary for following parent pointers in a union mount.
+ *
+ * Returns non-zero if followed to the topmost layer, zero otherwise.
+ */
+int follow_union_mount(struct vfsmount **mnt, struct dentry **dentry)
+{
+	struct union_mount *um;
+	int res = 0;
+
+	while (IS_UNION(*dentry)) {
+		spin_lock(&dcache_lock);
+		spin_lock(&union_lock);
+		um = union_rlookup(*dentry, *mnt);
+		if (um)
+			path_get(&um->u_this);
+		spin_unlock(&union_lock);
+		spin_unlock(&dcache_lock);
+
+		/*
+		 * Q: Aaargh, how do I validate the topmost dentry pointer?
+		 * A: Eeeeasy! We took the dcache_lock and union_lock. Since
+		 *    this protects from any dput'ng going on, we know that the
+		 *    dentry is valid since the union is unhashed under
+		 *    dcache_lock too.
+		 */
+		if (!um)
+			break;
+		dput(*dentry);
+		*dentry = um->u_this.dentry;
+		mntput(*mnt);
+		*mnt = um->u_this.mnt;
+		res = 1;
+	}
+
+	return res;
+}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index e00e95b..056b356 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -101,6 +101,15 @@ struct dentry {
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
 
+#ifdef CONFIG_UNION_MOUNT
+	/*
+	 * The following fields are used by the VFS based union mount
+	 * implementation. Both are protected by union_lock!
+	 */
+	struct list_head d_unions;	/* list of union_mount's */
+	unsigned int d_unionized;	/* unions referencing this dentry */
+#endif
+
 	struct list_head d_lru;		/* LRU list */
 	/*
 	 * d_child and d_rcu can share memory
diff --git a/include/linux/union.h b/include/linux/union.h
new file mode 100644
index 0000000..0c85312
--- /dev/null
+++ b/include/linux/union.h
@@ -0,0 +1,61 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007 Novell Inc.
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef __LINUX_UNION_H
+#define __LINUX_UNION_H
+#ifdef __KERNEL__
+
+#include <linux/list.h>
+#include <asm/atomic.h>
+
+struct dentry;
+struct vfsmount;
+
+#ifdef CONFIG_UNION_MOUNT
+
+/*
+ * The new union mount structure.
+ */
+struct union_mount {
+	atomic_t u_count;		/* reference count */
+	struct mutex u_mutex;
+	struct list_head u_unions;	/* list head for d_unions */
+	struct hlist_node u_hash;	/* list head for searching */
+	struct hlist_node u_rhash;	/* list head for reverse searching */
+
+	struct path u_this;		/* this is me */
+	struct path u_next;		/* this is what I overlay */
+};
+
+#define IS_UNION(dentry)	(!list_empty(&(dentry)->d_unions) || \
+				 (dentry)->d_unionized)
+#define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
+
+extern int is_unionized(struct dentry *, struct vfsmount *);
+extern int append_to_union(struct vfsmount *, struct dentry *,
+			   struct vfsmount *, struct dentry *);
+extern int follow_union_down(struct vfsmount **, struct dentry **);
+extern int follow_union_mount(struct vfsmount **, struct dentry **);
+
+#else /* CONFIG_UNION_MOUNT */
+
+#define IS_UNION(x)			(0)
+#define IS_MNT_UNION(x)			(0)
+#define is_unionized(x, y)		(0)
+#define append_to_union(x1, y1, x2, y2)	({ BUG(); (0); })
+#define follow_union_down(x, y)		({ (0); })
+#define follow_union_mount(x, y)	({ (0); })
+
+#endif	/* CONFIG_UNION_MOUNT */
+#endif	/* __KERNEL__ */
+#endif	/* __LINUX_UNION_H */
-- 
1.6.1.3


* [PATCH 18/32] union-mount: Drive the union cache via dcache
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (16 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 17/32] union-mount: Introduce union_mount structure Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 19/32] union-mount: Some checks during namespace changes Jan Blunck
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

If a dentry is removed from the dentry cache because its usage count drops to
zero, the references to the underlying layer of the unions the dentry is in
are dropped too. Therefore the union cache is driven by the dentry cache.
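
As a rough illustration of that lifetime rule, here is a minimal userspace
sketch. It is not the kernel code: the toy_* names and structures are
hypothetical stand-ins for d_count, d_unions, d_unionized and u_next, and
locking, hashing and the actual freeing of dentries are left out. It only
shows that dropping the last reference to an upper dentry also drops the
references its unions hold on the lower layers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration, not the kernel structures. */
struct toy_union;

struct toy_dentry {
	int refcount;              /* plays the role of d_count */
	struct toy_union *unions;  /* plays the role of d_unions */
	unsigned int unionized;    /* plays the role of d_unionized */
};

struct toy_union {
	struct toy_union *next_on_dentry; /* chain of unions owned by the upper dentry */
	struct toy_dentry *lower;         /* plays the role of u_next.dentry */
};

void toy_dput(struct toy_dentry *d);

/* Link upper -> lower; the union takes a reference on the lower dentry. */
int toy_link(struct toy_dentry *upper, struct toy_dentry *lower)
{
	struct toy_union *um = malloc(sizeof(*um));

	if (!um)
		return -1;
	lower->refcount++;
	lower->unionized++;
	um->lower = lower;
	um->next_on_dentry = upper->unions;
	upper->unions = um;
	return 0;
}

/* Drop every union owned by d, releasing the references on lower layers. */
void toy_shrink_unions(struct toy_dentry *d)
{
	while (d->unions) {
		struct toy_union *um = d->unions;
		struct toy_dentry *lower = um->lower;

		d->unions = um->next_on_dentry;
		lower->unionized--;
		free(um);
		toy_dput(lower); /* may cascade further down the stack */
	}
}

/* When the count reaches zero, the unions of this dentry go away too. */
void toy_dput(struct toy_dentry *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		toy_shrink_unions(d);
}
```

In this sketch a three-layer stack unwinds from a single toy_dput() on the
topmost dentry, provided nobody else holds references to the lower ones.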

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/dcache.c            |   10 ++++++-
 fs/union.c             |   74 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |    8 +++++
 include/linux/union.h  |    6 ++++
 4 files changed, 97 insertions(+), 1 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 37e286e..b6fb688 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -19,6 +19,7 @@
 #include <linux/mm.h>
 #include <linux/fdtable.h>
 #include <linux/fs.h>
+#include <linux/union.h>
 #include <linux/fsnotify.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -188,11 +189,14 @@ static struct dentry *__d_kill(struct dentry *dentry, struct list_head *list,
 		list_add(&dentry->d_lru, list);
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
+		__shrink_d_unions(dentry, list);
 		return NULL;
 	}
 
-	/*drops the locks, at that point nobody can reach this dentry */
+	/* drops the locks, at that point nobody can reach this dentry */
 	dentry_iput(dentry);
+	/* If the dentry was part of any unions, drop them */
+	__shrink_d_unions(dentry, list);
 	if (IS_ROOT(dentry))
 		parent = NULL;
 	else
@@ -784,6 +788,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
 					iput(inode);
 			}
 
+			shrink_d_unions(dentry);
 			d_free(dentry);
 
 			/* finished when we fall off the top of the tree,
@@ -1626,7 +1631,9 @@ void d_delete(struct dentry * dentry)
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
 	if (atomic_read(&dentry->d_count) == 1) {
+		__d_drop_unions(dentry);
 		dentry_iput(dentry);
+		shrink_d_unions(dentry);
 		fsnotify_nameremove(dentry, isdir);
 		return;
 	}
@@ -1637,6 +1644,7 @@ void d_delete(struct dentry * dentry)
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
+	shrink_d_unions(dentry);
 	fsnotify_nameremove(dentry, isdir);
 }
 
diff --git a/fs/union.c b/fs/union.c
index f53ef5f..dd8a8cb 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -14,6 +14,7 @@
 
 #include <linux/bootmem.h>
 #include <linux/init.h>
+#include <linux/module.h>
 #include <linux/types.h>
 #include <linux/hash.h>
 #include <linux/fs.h>
@@ -254,6 +255,8 @@ int append_to_union(struct vfsmount *mnt, struct dentry *dentry,
 		union_put(this);
 		return 0;
 	}
+	list_add(&this->u_unions, &dentry->d_unions);
+	dest_dentry->d_unionized++;
 	__union_hash(this);
 	spin_unlock(&union_lock);
 	return 0;
@@ -329,3 +332,74 @@ int follow_union_mount(struct vfsmount **mnt, struct dentry **dentry)
 
 	return res;
 }
+
+/*
+ * This must be called when unhashing a dentry. It is called with dcache_lock
+ * held and unhashes all unions this dentry is in.
+ */
+void __d_drop_unions(struct dentry *dentry)
+{
+	struct union_mount *this, *next;
+
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions)
+		__union_unhash(this);
+	spin_unlock(&union_lock);
+}
+EXPORT_SYMBOL_GPL(__d_drop_unions);
+
+/*
+ * This must be called after __d_drop_unions() without holding any locks.
+ * Note: The dentry might still be reachable via a lookup but at that time it
+ * is already a negative dentry. Otherwise it would be unhashed. The union_mount
+ * structure itself is still reachable through mnt->mnt_unions (which we
+ * protect against with union_lock).
+ */
+void shrink_d_unions(struct dentry *dentry)
+{
+	struct union_mount *this, *next;
+
+repeat:
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
+		BUG_ON(!hlist_unhashed(&this->u_hash));
+		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_unions);
+		this->u_next.dentry->d_unionized--;
+		spin_unlock(&union_lock);
+		union_put(this);
+		goto repeat;
+	}
+	spin_unlock(&union_lock);
+}
+
+extern void __dput(struct dentry *, struct list_head *, int);
+
+/*
+ * This is the special variant for use in dput() only.
+ */
+void __shrink_d_unions(struct dentry *dentry, struct list_head *list)
+{
+	struct union_mount *this, *next;
+
+	BUG_ON(!d_unhashed(dentry));
+
+repeat:
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
+		struct dentry *n_dentry = this->u_next.dentry;
+		struct vfsmount *n_mnt = this->u_next.mnt;
+
+		BUG_ON(!hlist_unhashed(&this->u_hash));
+		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_unions);
+		this->u_next.dentry->d_unionized--;
+		spin_unlock(&union_lock);
+		if (__union_put(this)) {
+			__dput(n_dentry, list, 0);
+			mntput(n_mnt);
+		}
+		goto repeat;
+	}
+	spin_unlock(&union_lock);
+}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 056b356..7930b07 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -212,12 +212,20 @@ extern seqlock_t rename_lock;
  * __d_drop requires dentry->d_lock.
  */
 
+#ifdef CONFIG_UNION_MOUNT
+extern void __d_drop_unions(struct dentry *);
+#endif
+
 static inline void __d_drop(struct dentry *dentry)
 {
 	if (!(dentry->d_flags & DCACHE_UNHASHED)) {
 		dentry->d_flags |= DCACHE_UNHASHED;
 		hlist_del_rcu(&dentry->d_hash);
 	}
+#ifdef CONFIG_UNION_MOUNT
+	/* remove dentry from the union hashtable */
+	__d_drop_unions(dentry);
+#endif
 }
 
 static inline void d_drop(struct dentry *dentry)
diff --git a/include/linux/union.h b/include/linux/union.h
index 0c85312..b035a82 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -46,6 +46,9 @@ extern int append_to_union(struct vfsmount *, struct dentry *,
 			   struct vfsmount *, struct dentry *);
 extern int follow_union_down(struct vfsmount **, struct dentry **);
 extern int follow_union_mount(struct vfsmount **, struct dentry **);
+extern void __d_drop_unions(struct dentry *);
+extern void shrink_d_unions(struct dentry *);
+extern void __shrink_d_unions(struct dentry *, struct list_head *);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -55,6 +58,9 @@ extern int follow_union_mount(struct vfsmount **, struct dentry **);
 #define append_to_union(x1, y1, x2, y2)	({ BUG(); (0); })
 #define follow_union_down(x, y)		({ (0); })
 #define follow_union_mount(x, y)	({ (0); })
+#define __d_drop_unions(x)		do { } while (0)
+#define shrink_d_unions(x)		do { } while (0)
+#define __shrink_d_unions(x,y)		do { } while (0)
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.1.3



* [PATCH 19/32] union-mount: Some checks during namespace changes
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (17 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 18/32] union-mount: Drive the union cache via dcache Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 20/32] union-mount: Changes to the namespace handling Jan Blunck
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Add some additional checks when mounting something into a union.
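
The checks boil down to two rules, sketched below in plain userspace C. This
is only an illustration under stated assumptions: the TOY_* flag values and
TOY_ENOTSUPP are made-up stand-ins for MNT_UNION, MS_RDONLY, MS_WHITEOUT and
the kernel's ENOTSUPP, and the two helper functions are hypothetical. A union
mount needs a superblock that is either read-only or whiteout-capable, and
moving a mount to or from a union is refused.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative values only; they are not the kernel's definitions. */
#define TOY_MNT_UNION   0x1
#define TOY_MS_RDONLY   0x1
#define TOY_MS_WHITEOUT 0x2
#define TOY_ENOTSUPP    524

/*
 * A union mount is acceptable only if the file system is read-only or
 * knows about whiteouts; non-union mounts are not affected.
 */
int toy_union_mount_allowed(int mnt_flags, unsigned long sb_flags)
{
	if (!(mnt_flags & TOY_MNT_UNION))
		return 0;
	if (!(sb_flags & (TOY_MS_WHITEOUT | TOY_MS_RDONLY)))
		return -TOY_ENOTSUPP;
	return 0;
}

/* Moving a mount is refused if either end is a union mount. */
int toy_move_allowed(int old_mnt_flags, int new_mnt_flags)
{
	if ((old_mnt_flags | new_mnt_flags) & TOY_MNT_UNION)
		return -EINVAL;
	return 0;
}
```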

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namespace.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 53998f2..4128d99 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -27,6 +27,7 @@
 #include <linux/ramfs.h>
 #include <linux/log2.h>
 #include <linux/idr.h>
+#include <linux/union.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include "pnode.h"
@@ -1442,6 +1443,10 @@ static int do_change_type(struct path *path, int flag)
 	if (path->dentry != path->mnt->mnt_root)
 		return -EINVAL;
 
+	/* Don't change the type of union mounts */
+	if (IS_MNT_UNION(path->mnt))
+		return -EINVAL;
+
 	down_write(&namespace_sem);
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
@@ -1493,6 +1498,18 @@ static int do_loopback(struct path *path, char *old_name, int recurse,
 	if (!mnt)
 		goto out;
 
+	/*
+	 * Unions can't be writable if the filesystem doesn't know about
+	 * whiteouts
+	 */
+	err = -ENOTSUPP;
+	if ((mnt_flags & MNT_UNION) &&
+	    !(mnt->mnt_sb->s_flags & (MS_WHITEOUT|MS_RDONLY)))
+		goto out;
+
+	if (mnt_flags & MNT_UNION)
+		mnt->mnt_flags |= MNT_UNION;
+
 	err = graft_tree(mnt, path);
 	if (err) {
 		LIST_HEAD(umount_list);
@@ -1586,6 +1603,13 @@ static int do_move_mount(struct path *path, char *old_name)
 	if (err)
 		return err;
 
+	/* moving to or from a union mount is not supported */
+	err = -EINVAL;
+	if (IS_MNT_UNION(path->mnt))
+		goto exit;
+	if (IS_MNT_UNION(old_path.mnt))
+		goto exit;
+
 	down_write(&namespace_sem);
 	while (d_mountpoint(path->dentry) &&
 	       follow_down(&path->mnt, &path->dentry))
@@ -1643,6 +1667,7 @@ out:
 	up_write(&namespace_sem);
 	if (!err)
 		path_put(&parent_path);
+exit:
 	path_put(&old_path);
 	return err;
 }
@@ -1698,6 +1723,15 @@ int do_add_mount(struct vfsmount *newmnt, struct path *path,
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	/*
+	 * Unions can't be writable if the filesystem doesn't know about
+	 * whiteouts
+	 */
+	err = -ENOTSUPP;
+	if ((mnt_flags & MNT_UNION) &&
+	    !(newmnt->mnt_sb->s_flags & (MS_WHITEOUT|MS_RDONLY)))
+		goto unlock;
+
 	newmnt->mnt_flags = mnt_flags;
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
-- 
1.6.1.3



* [PATCH 20/32] union-mount: Changes to the namespace handling
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (18 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 19/32] union-mount: Some checks during namespace changes Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems Jan Blunck
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Create the proper struct union_mount when mounting something into a
union. If the topmost filesystem isn't capable of handling the white-out
filetype it can only be mounted read-only.
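
The attach/detach pairing can be pictured with the following userspace
sketch. It is a simplification with hypothetical names: integer ids replace
the vfsmount and dentry pointers, a small fixed array replaces the union
hashtable, and reference counting and locking are omitted. Attaching a union
mount records a link from its root to the mountpoint it covers, following
the union down uses that link, and detaching removes it again.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MNT_UNION 0x1
#define TOY_MAX_LINKS 8

/* Integer ids stand in for vfsmount and dentry pointers. */
struct toy_path { int mnt; int dentry; };

/* One link: "upper" overlays "lower" (u_this and u_next in the patch). */
struct toy_link { int used; struct toy_path upper, lower; };

static struct toy_link toy_links[TOY_MAX_LINKS];

static struct toy_link *toy_lookup(struct toy_path upper)
{
	size_t i;

	for (i = 0; i < TOY_MAX_LINKS; i++)
		if (toy_links[i].used &&
		    toy_links[i].upper.mnt == upper.mnt &&
		    toy_links[i].upper.dentry == upper.dentry)
			return &toy_links[i];
	return NULL;
}

/* On attach, a union mount links its root to the covered mountpoint. */
int toy_attach(int mnt_flags, struct toy_path root, struct toy_path mountpoint)
{
	size_t i;

	if (!(mnt_flags & TOY_MNT_UNION))
		return 0;
	if (toy_lookup(root))
		return 0; /* already linked */
	for (i = 0; i < TOY_MAX_LINKS; i++) {
		if (!toy_links[i].used) {
			toy_links[i].used = 1;
			toy_links[i].upper = root;
			toy_links[i].lower = mountpoint;
			return 0;
		}
	}
	return -1; /* table full */
}

/* On detach, the link for the mount root is removed if it exists. */
void toy_detach(int mnt_flags, struct toy_path root)
{
	struct toy_link *l;

	if (!(mnt_flags & TOY_MNT_UNION))
		return;
	l = toy_lookup(root);
	if (l)
		l->used = 0;
}

/* Step one layer down; returns non-zero if the path was changed. */
int toy_follow_down(struct toy_path *p)
{
	struct toy_link *l = toy_lookup(*p);

	if (!l)
		return 0;
	*p = l->lower;
	return 1;
}
```

Note that toy_detach() tolerates a missing link, which is a choice made for
this sketch and not a statement about what the patch does.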

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namespace.c        |    7 ++++++
 fs/union.c            |   57 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mount.h |    3 ++
 include/linux/union.h |   10 +++++++-
 4 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 4128d99..22aabc5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -131,6 +131,9 @@ struct vfsmount *alloc_vfsmnt(const char *name)
 		INIT_LIST_HEAD(&mnt->mnt_share);
 		INIT_LIST_HEAD(&mnt->mnt_slave_list);
 		INIT_LIST_HEAD(&mnt->mnt_slave);
+#ifdef CONFIG_UNION_MOUNT
+		INIT_LIST_HEAD(&mnt->mnt_unions);
+#endif
 		atomic_set(&mnt->__mnt_writers, 0);
 	}
 	return mnt;
@@ -476,6 +479,7 @@ static void __touch_mnt_namespace(struct mnt_namespace *ns)
 
 static void detach_mnt(struct vfsmount *mnt, struct path *old_path)
 {
+	detach_mnt_union(mnt);
 	old_path->dentry = mnt->mnt_mountpoint;
 	old_path->mnt = mnt->mnt_parent;
 	mnt->mnt_parent = mnt;
@@ -499,6 +503,7 @@ static void attach_mnt(struct vfsmount *mnt, struct path *path)
 	list_add_tail(&mnt->mnt_hash, mount_hashtable +
 			hash(path->mnt, path->dentry));
 	list_add_tail(&mnt->mnt_child, &path->mnt->mnt_mounts);
+	attach_mnt_union(mnt, path->mnt, path->dentry);
 }
 
 /*
@@ -521,6 +526,7 @@ static void commit_tree(struct vfsmount *mnt)
 	list_add_tail(&mnt->mnt_hash, mount_hashtable +
 				hash(parent, mnt->mnt_mountpoint));
 	list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	attach_mnt_union(mnt, mnt->mnt_parent, mnt->mnt_mountpoint);
 	touch_mnt_namespace(n);
 }
 
@@ -996,6 +1002,7 @@ void release_mounts(struct list_head *head)
 			struct dentry *dentry;
 			struct vfsmount *m;
 			spin_lock(&vfsmount_lock);
+			detach_mnt_union(mnt);
 			dentry = mnt->mnt_mountpoint;
 			m = mnt->mnt_parent;
 			mnt->mnt_mountpoint = mnt->mnt_root;
diff --git a/fs/union.c b/fs/union.c
index dd8a8cb..6e220bd 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -112,6 +112,7 @@ struct union_mount *union_alloc(struct dentry *this, struct vfsmount *this_mnt,
 
 	atomic_set(&um->u_count, 1);
 	INIT_LIST_HEAD(&um->u_unions);
+	INIT_LIST_HEAD(&um->u_list);
 	INIT_HLIST_NODE(&um->u_hash);
 	INIT_HLIST_NODE(&um->u_rhash);
 
@@ -255,6 +256,7 @@ int append_to_union(struct vfsmount *mnt, struct dentry *dentry,
 		union_put(this);
 		return 0;
 	}
+	list_add(&this->u_list, &mnt->mnt_unions);
 	list_add(&this->u_unions, &dentry->d_unions);
 	dest_dentry->d_unionized++;
 	__union_hash(this);
@@ -364,6 +366,7 @@ repeat:
 	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
 		BUG_ON(!hlist_unhashed(&this->u_hash));
 		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_list);
 		list_del(&this->u_unions);
 		this->u_next.dentry->d_unionized--;
 		spin_unlock(&union_lock);
@@ -392,6 +395,7 @@ repeat:
 
 		BUG_ON(!hlist_unhashed(&this->u_hash));
 		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_list);
 		list_del(&this->u_unions);
 		this->u_next.dentry->d_unionized--;
 		spin_unlock(&union_lock);
@@ -403,3 +407,56 @@ repeat:
 	}
 	spin_unlock(&union_lock);
 }
+
+/*
+ * Remove all union_mounts structures belonging to this vfsmount from the
+ * union lookup hashtable and so on ...
+ */
+void shrink_mnt_unions(struct vfsmount *mnt)
+{
+	struct union_mount *this, *next;
+
+repeat:
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &mnt->mnt_unions, u_list) {
+		if (this->u_this.dentry == mnt->mnt_root)
+			continue;
+		__union_unhash(this);
+		list_del(&this->u_list);
+		list_del(&this->u_unions);
+		this->u_next.dentry->d_unionized--;
+		spin_unlock(&union_lock);
+		union_put(this);
+		goto repeat;
+	}
+	spin_unlock(&union_lock);
+}
+
+int attach_mnt_union(struct vfsmount *mnt, struct vfsmount *dest_mnt,
+		     struct dentry *dest_dentry)
+{
+	if (!IS_MNT_UNION(mnt))
+		return 0;
+
+	return append_to_union(mnt, mnt->mnt_root, dest_mnt, dest_dentry);
+}
+
+void detach_mnt_union(struct vfsmount *mnt)
+{
+	struct union_mount *um;
+
+	if (!IS_MNT_UNION(mnt))
+		return;
+
+	shrink_mnt_unions(mnt);
+
+	spin_lock(&union_lock);
+	um = union_lookup(mnt->mnt_root, mnt);
+	__union_unhash(um);
+	list_del(&um->u_list);
+	list_del(&um->u_unions);
+	um->u_next.dentry->d_unionized--;
+	spin_unlock(&union_lock);
+	union_put(um);
+	return;
+}
diff --git a/include/linux/mount.h b/include/linux/mount.h
index bd35fb8..6f7dda7 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -53,6 +53,9 @@ struct vfsmount {
 	struct list_head mnt_slave_list;/* list of slave mounts */
 	struct list_head mnt_slave;	/* slave list entry */
 	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
+#ifdef CONFIG_UNION_MOUNT
+	struct list_head mnt_unions;	/* list of union_mount structures */
+#endif
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	int mnt_id;			/* mount identifier */
 	int mnt_group_id;		/* peer group identifier */
diff --git a/include/linux/union.h b/include/linux/union.h
index b035a82..0b6f356 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -30,8 +30,9 @@ struct union_mount {
 	atomic_t u_count;		/* reference count */
 	struct mutex u_mutex;
 	struct list_head u_unions;	/* list head for d_unions */
-	struct hlist_node u_hash;	/* list head for searching */
-	struct hlist_node u_rhash;	/* list head for reverse searching */
+	struct list_head u_list;	/* list head for mnt_unions */
+	struct hlist_node u_hash;	/* list head for searching */
+	struct hlist_node u_rhash;	/* list head for reverse searching */
 
 	struct path u_this;		/* this is me */
 	struct path u_next;		/* this is what I overlay */
@@ -49,6 +50,9 @@ extern int follow_union_mount(struct vfsmount **, struct dentry **);
 extern void __d_drop_unions(struct dentry *);
 extern void shrink_d_unions(struct dentry *);
 extern void __shrink_d_unions(struct dentry *, struct list_head *);
+extern int attach_mnt_union(struct vfsmount *, struct vfsmount *,
+			    struct dentry *);
+extern void detach_mnt_union(struct vfsmount *);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -61,6 +65,8 @@ extern void __shrink_d_unions(struct dentry *, struct list_head *);
 #define __d_drop_unions(x)		do { } while (0)
 #define shrink_d_unions(x)		do { } while (0)
 #define __shrink_d_unions(x,y)		do { } while (0)
+#define attach_mnt_union(x, y, z)	do { } while (0)
+#define detach_mnt_union(x)		do { } while (0)
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.1.3


* [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (19 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 20/32] union-mount: Changes to the namespace handling Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-19 16:15   ` Miklos Szeredi
  2009-05-18 16:09 ` [PATCH 22/32] union-mount: stop lookup when directory has S_OPAQUE flag set Jan Blunck
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

On union-mounted file systems the lookup function must also visit lower layers
of the union stack when doing a lookup. This patch adds support for
union mounts to cached lookups and real lookups.

We have 3 different styles of lookup functions now:
- multiple pathname components, follows mounts, follows unions, follows
  symlinks
- single pathname component, doesn't follow mounts, follows unions, doesn't
  follow symlinks
- single pathname component, doesn't follow mounts, doesn't follow unions,
  doesn't follow symlinks
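
The "topmost" rule used by the union-aware lookups can be sketched in
userspace C as follows. This is a simplified model with hypothetical names:
each layer is a flat array of names, an entry with positive == 0 plays the
role of a negative dentry, and whiteouts, opaque directories, revalidation
and the fallback from the cached to the real lookup are not modeled. The
first positive entry found while walking down the layers wins; if there is
none, the topmost negative entry is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* positive == 0 models a negative dentry (name known not to exist). */
struct toy_entry { const char *name; int positive; };
struct toy_layer { const struct toy_entry *entries; size_t count; };

static const struct toy_entry *toy_layer_find(const struct toy_layer *layer,
					       const char *name)
{
	size_t i;

	for (i = 0; i < layer->count; i++)
		if (strcmp(layer->entries[i].name, name) == 0)
			return &layer->entries[i];
	return NULL;
}

/*
 * Walk the layers from top (index 0) to bottom. Return the first positive
 * entry, else the topmost negative entry, else NULL. On a non-NULL result,
 * *layer_out is set to the index of the layer the entry was found on.
 */
const struct toy_entry *toy_union_lookup(const struct toy_layer *layers,
					 size_t nlayers, const char *name,
					 size_t *layer_out)
{
	const struct toy_entry *topmost_negative = NULL;
	size_t negative_layer = 0;
	size_t i;

	for (i = 0; i < nlayers; i++) {
		const struct toy_entry *e = toy_layer_find(&layers[i], name);

		if (!e)
			continue;
		if (e->positive) {
			*layer_out = i;
			return e;
		}
		if (!topmost_negative) {
			topmost_negative = e;
			negative_layer = i;
		}
	}
	if (topmost_negative)
		*layer_out = negative_layer;
	return topmost_negative;
}
```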

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c            |  470 ++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/namei.h |    6 +
 2 files changed, 468 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9dc51b0..2bb8a22 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -31,6 +31,7 @@
 #include <linux/file.h>
 #include <linux/fcntl.h>
 #include <linux/device_cgroup.h>
+#include <linux/union.h>
 #include <asm/uaccess.h>
 
 #define ACC_MODE(x) ("\000\004\002\006"[(x)&O_ACCMODE])
@@ -413,6 +414,173 @@ static struct dentry *cache_lookup(struct dentry *parent, struct qstr *name,
 	return dentry;
 }
 
+/**
+ * __cache_lookup_topmost - lookup the topmost (non-)negative dentry
+ *
+ * @nd - parent's nameidata
+ * @name - pathname part to lookup
+ * @path - found dentry for pathname part
+ *
+ * This is used for union mount lookups from dcache. The first non-negative
+ * dentry is searched on all layers of the union stack. Otherwise the topmost
+ * negative dentry is returned.
+ */
+static int __cache_lookup_topmost(struct nameidata *nd, struct qstr *name,
+				  struct path *path)
+{
+	struct dentry *dentry;
+
+	dentry = d_lookup(nd->path.dentry, name);
+	if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
+		dentry = do_revalidate(dentry, nd);
+
+	/*
+	 * Remember the topmost negative dentry in case we don't find anything
+	 */
+	path->dentry = dentry;
+	path->mnt = dentry ? nd->path.mnt : NULL;
+
+	if (!dentry || dentry->d_inode)
+		return !dentry;
+
+	/* look for the first non-negative dentry */
+
+	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
+		dentry = d_hash_and_lookup(nd->path.dentry, name);
+
+		/*
+		 * If parts of the union stack are not in the dcache we need
+		 * to do a real lookup
+		 */
+		if (!dentry)
+			goto out_dput;
+
+		/*
+		 * If parts of the union don't survive the revalidation we
+		 * need to do a real lookup
+		 */
+		if (dentry->d_op && dentry->d_op->d_revalidate) {
+			dentry = do_revalidate(dentry, nd);
+			if (!dentry)
+				goto out_dput;
+		}
+
+		if (dentry->d_inode)
+			goto out_dput;
+
+		dput(dentry);
+	}
+
+	return !dentry;
+
+out_dput:
+	dput(path->dentry);
+	path->dentry = dentry;
+	path->mnt = dentry ? mntget(nd->path.mnt) : NULL;
+	return !dentry;
+}
+
+/**
+ * __cache_lookup_build_union - build the union stack for this part,
+ * cached version
+ *
+ * This is called after you have the topmost dentry in @path.
+ */
+static int __cache_lookup_build_union(struct nameidata *nd, struct qstr *name,
+				      struct path *path)
+{
+	struct path last = *path;
+	struct dentry *dentry;
+
+	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
+		dentry = d_hash_and_lookup(nd->path.dentry, name);
+		if (!dentry)
+			return 1;
+
+		if (dentry->d_op && dentry->d_op->d_revalidate) {
+			dentry = do_revalidate(dentry, nd);
+			if (!dentry)
+				return 1;
+		}
+
+		if (!dentry->d_inode) {
+			dput(dentry);
+			continue;
+		}
+
+		/* only directories can be part of a union stack */
+		if (!S_ISDIR(dentry->d_inode->i_mode)) {
+			dput(dentry);
+			break;
+		}
+
+		/* Add the newly discovered dir to the union stack */
+		append_to_union(last.mnt, last.dentry, nd->path.mnt, dentry);
+
+		if (last.dentry != path->dentry)
+			path_put(&last);
+		last.dentry = dentry;
+		last.mnt = mntget(nd->path.mnt);
+	}
+
+	if (last.dentry != path->dentry)
+		path_put(&last);
+
+	return 0;
+}
+
+/**
+ * cache_lookup_union - lookup a single pathname part from dcache
+ *
+ * This is a union mount capable version of what d_lookup() & revalidate()
+ * would do. This function returns a valid (union) dentry on success.
+ *
+ * Remember: On failure it means that parts of the union aren't cached. You
+ * should call real_lookup() afterwards to find the proper (union) dentry.
+ */
+static int cache_lookup_union(struct nameidata *nd, struct qstr *name,
+			      struct path *path)
+{
+	int res;
+
+	if (!IS_MNT_UNION(nd->path.mnt)) {
+		path->dentry = cache_lookup(nd->path.dentry, name, nd);
+		path->mnt = path->dentry ? nd->path.mnt : NULL;
+		res = path->dentry ? 0 : 1;
+	} else {
+		struct path safe = {
+			.dentry = nd->path.dentry,
+			.mnt = nd->path.mnt
+		};
+
+		path_get(&safe);
+		res = __cache_lookup_topmost(nd, name, path);
+		if (res)
+			goto out;
+
+		/* only directories can be part of a union stack */
+		if (!path->dentry->d_inode ||
+		    !S_ISDIR(path->dentry->d_inode->i_mode))
+			goto out;
+
+		/* Build the union stack for this part */
+		res = __cache_lookup_build_union(nd, name, path);
+		if (res) {
+			dput(path->dentry);
+			if (path->mnt != safe.mnt)
+				mntput(path->mnt);
+			goto out;
+		}
+
+out:
+		path_put(&nd->path);
+		nd->path.dentry = safe.dentry;
+		nd->path.mnt = safe.mnt;
+	}
+
+	return res;
+}
+
 /*
  * Short-cut version of permission(), for calling by
  * path_walk(), when dcache lock is held.  Combines parts
@@ -534,6 +702,146 @@ out_unlock:
 	return res;
 }
 
+/**
+ * __real_lookup_topmost - lookup topmost dentry, non-cached version
+ *
+ * If we reach a dentry with restricted access, we just stop the lookup
+ * because we shouldn't see through that dentry. Same thing for dentry
+ * type mismatch and whiteouts.
+ *
+ * FIXME:
+ * - handle DT_WHT
+ * - handle union stacks in use
+ * - handle union stacks mounted upon union stacks
+ * - avoid unnecessary allocations of union locks
+ */
+static int __real_lookup_topmost(struct nameidata *nd, struct qstr *name,
+				 struct path *path)
+{
+	struct path next;
+	int err;
+
+	err = real_lookup(nd, name, path);
+	if (err)
+		return err;
+
+	if (path->dentry->d_inode)
+		return 0;
+
+	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
+		name->hash = full_name_hash(name->name, name->len);
+		if (nd->path.dentry->d_op && nd->path.dentry->d_op->d_hash) {
+			err = nd->path.dentry->d_op->d_hash(nd->path.dentry,
+							    name);
+			if (err < 0)
+				goto out;
+		}
+
+		err = real_lookup(nd, name, &next);
+		if (err)
+			goto out;
+
+		if (next.dentry->d_inode) {
+			dput(path->dentry);
+			mntget(next.mnt);
+			*path = next;
+			goto out;
+		}
+
+		dput(next.dentry);
+	}
+out:
+	if (err)
+		dput(path->dentry);
+	return err;
+}
+
+/**
+ * __real_lookup_build_union - build the union stack for this pathname
+ * part, non-cached version
+ *
+ * Called when not all parts of the union stack are in cache
+ */
+
+static int __real_lookup_build_union(struct nameidata *nd, struct qstr *name,
+				     struct path *path)
+{
+	struct path last = *path;
+	struct path next;
+	int err = 0;
+
+	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
+		/* We need to recompute the hash for lower layer lookups */
+		name->hash = full_name_hash(name->name, name->len);
+		if (nd->path.dentry->d_op && nd->path.dentry->d_op->d_hash) {
+			err = nd->path.dentry->d_op->d_hash(nd->path.dentry,
+							    name);
+			if (err < 0)
+				goto out;
+		}
+
+		err = real_lookup(nd, name, &next);
+		if (err)
+			goto out;
+
+		if (!next.dentry->d_inode) {
+			dput(next.dentry);
+			continue;
+		}
+
+		/* only directories can be part of a union stack */
+		if (!S_ISDIR(next.dentry->d_inode->i_mode)) {
+			dput(next.dentry);
+			break;
+		}
+
+		/* now we know we found something "real" */
+		append_to_union(last.mnt, last.dentry, next.mnt, next.dentry);
+
+		if (last.dentry != path->dentry)
+			path_put(&last);
+		last.dentry = next.dentry;
+		last.mnt = mntget(next.mnt);
+	}
+
+	if (last.dentry != path->dentry)
+		path_put(&last);
+out:
+	return err;
+}
+
+static int real_lookup_union(struct nameidata *nd, struct qstr *name,
+			     struct path *path)
+{
+	struct path safe = { .dentry = nd->path.dentry, .mnt = nd->path.mnt };
+	int res;
+
+	path_get(&safe);
+	res = __real_lookup_topmost(nd, name, path);
+	if (res)
+		goto out;
+
+	/* only directories can be part of a union stack */
+	if (!path->dentry->d_inode ||
+	    !S_ISDIR(path->dentry->d_inode->i_mode))
+		goto out;
+
+	/* Build the union stack for this part */
+	res = __real_lookup_build_union(nd, name, path);
+	if (res) {
+		dput(path->dentry);
+		if (path->mnt != safe.mnt)
+			mntput(path->mnt);
+		goto out;
+	}
+
+out:
+	path_put(&nd->path);
+	nd->path.dentry = safe.dentry;
+	nd->path.mnt = safe.mnt;
+	return res;
+}
+
 /*
  * Wrapper to retry pathname resolution whenever the underlying
  * file system returns an ESTALE.
@@ -787,6 +1095,7 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 		nd->path.mnt = parent;
 	}
 	follow_mount(&nd->path.mnt, &nd->path.dentry);
+	follow_union_mount(&nd->path.mnt, &nd->path.dentry);
 }
 
 /*
@@ -799,6 +1108,9 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
 {
 	int err;
 
+	if (IS_MNT_UNION(nd->path.mnt))
+		goto need_union_lookup;
+
 	path->dentry = __d_lookup(nd->path.dentry, name);
 	path->mnt = nd->path.mnt;
 	if (!path->dentry)
@@ -807,7 +1119,12 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
 		goto need_revalidate;
 
 done:
-	__follow_mount(path);
+	if (nd->path.mnt != path->mnt) {
+		nd->um_flags |= LAST_LOWLEVEL;
+		follow_mount(&path->mnt, &path->dentry);
+	} else
+		__follow_mount(path);
+	follow_union_mount(&path->mnt, &path->dentry);
 	return 0;
 
 need_lookup:
@@ -816,6 +1133,16 @@ need_lookup:
 		goto fail;
 	goto done;
 
+need_union_lookup:
+	err = cache_lookup_union(nd, name, path);
+	if (!err && path->dentry)
+		goto done;
+
+	err = real_lookup_union(nd, name, path);
+	if (err)
+		goto fail;
+	goto done;
+
 need_revalidate:
 	path->dentry = do_revalidate(path->dentry, nd);
 	if (!path->dentry)
@@ -854,6 +1181,8 @@ static int __link_path_walk(const char *name, struct nameidata *nd)
 	if (nd->depth)
 		lookup_flags = LOOKUP_FOLLOW | (nd->flags & LOOKUP_CONTINUE);
 
+	follow_union_mount(&nd->path.mnt, &nd->path.dentry);
+
 	/* At this point we know we have a real path component. */
 	for(;;) {
 		unsigned long hash;
@@ -1038,6 +1367,7 @@ static int do_path_lookup(int dfd, const char *name,
 
 	nd->last_type = LAST_ROOT; /* if there are only slashes... */
 	nd->flags = flags;
+	nd->um_flags = 0;
 	nd->depth = 0;
 
 	if (*name=='/') {
@@ -1229,6 +1559,130 @@ static int lookup_hash(struct nameidata *nd, struct qstr *name,
 	return err;
 }
 
+static int __hash_lookup_topmost(struct nameidata *nd, struct qstr *name,
+				 struct path *path)
+{
+	struct path next;
+	int err;
+
+	err = lookup_hash(nd, name, path);
+	if (err)
+		return err;
+
+	if (path->dentry->d_inode)
+		return 0;
+
+	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
+		name->hash = full_name_hash(name->name, name->len);
+		if (nd->path.dentry->d_op && nd->path.dentry->d_op->d_hash) {
+			err = nd->path.dentry->d_op->d_hash(nd->path.dentry,
+							    name);
+			if (err < 0)
+				goto out;
+		}
+
+		mutex_lock(&nd->path.dentry->d_inode->i_mutex);
+		err = lookup_hash(nd, name, &next);
+		mutex_unlock(&nd->path.dentry->d_inode->i_mutex);
+		if (err)
+			goto out;
+
+		if (next.dentry->d_inode) {
+			dput(path->dentry);
+			mntget(next.mnt);
+			*path = next;
+			goto out;
+		}
+
+		dput(next.dentry);
+	}
+out:
+	if (err)
+		dput(path->dentry);
+	return err;
+}
+
+static int __hash_lookup_build_union(struct nameidata *nd, struct qstr *name,
+				     struct path *path)
+{
+	struct path last = *path;
+	struct path next;
+	int err = 0;
+
+	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
+		/* We need to recompute the hash for lower layer lookups */
+		name->hash = full_name_hash(name->name, name->len);
+		if (nd->path.dentry->d_op && nd->path.dentry->d_op->d_hash) {
+			err = nd->path.dentry->d_op->d_hash(nd->path.dentry,
+							    name);
+			if (err < 0)
+				goto out;
+		}
+
+		mutex_lock(&nd->path.dentry->d_inode->i_mutex);
+		err = lookup_hash(nd, name, &next);
+		mutex_unlock(&nd->path.dentry->d_inode->i_mutex);
+		if (err)
+			goto out;
+
+		if (!next.dentry->d_inode) {
+			dput(next.dentry);
+			continue;
+		}
+
+		/* only directories can be part of a union stack */
+		if (!S_ISDIR(next.dentry->d_inode->i_mode)) {
+			dput(next.dentry);
+			break;
+		}
+
+		/* now we know we found something "real" */
+		append_to_union(last.mnt, last.dentry, next.mnt, next.dentry);
+
+		if (last.dentry != path->dentry)
+			path_put(&last);
+		last.dentry = next.dentry;
+		last.mnt = mntget(next.mnt);
+	}
+
+	if (last.dentry != path->dentry)
+		path_put(&last);
+out:
+	return err;
+}
+
+static int hash_lookup_union(struct nameidata *nd, struct qstr *name,
+			     struct path *path)
+{
+	struct path safe = { .dentry = nd->path.dentry, .mnt = nd->path.mnt };
+	int res;
+
+	path_get(&safe);
+	res = __hash_lookup_topmost(nd, name, path);
+	if (res)
+		goto out;
+
+	/* only directories can be part of a union stack */
+	if (!path->dentry->d_inode ||
+	    !S_ISDIR(path->dentry->d_inode->i_mode))
+		goto out;
+
+	/* Build the union stack for this part */
+	res = __hash_lookup_build_union(nd, name, path);
+	if (res) {
+		dput(path->dentry);
+		if (path->mnt != safe.mnt)
+			mntput(path->mnt);
+		goto out;
+	}
+
+out:
+	path_put(&nd->path);
+	nd->path.dentry = safe.dentry;
+	nd->path.mnt = safe.mnt;
+	return res;
+}
+
 static int __lookup_one_len(const char *name, struct qstr *this,
 		struct dentry *base, int len)
 {
@@ -1713,7 +2167,7 @@ struct file *do_filp_open(int dfd, const char *pathname,
 	if (flag & O_EXCL)
 		nd.flags |= LOOKUP_EXCL;
 	mutex_lock(&dir->d_inode->i_mutex);
-	error = lookup_hash(&nd, &nd.last, &path);
+	error = hash_lookup_union(&nd, &nd.last, &path);
 
 do_last:
 	if (error) {
@@ -1862,7 +2316,7 @@ do_link:
 	}
 	dir = nd.path.dentry;
 	mutex_lock(&dir->d_inode->i_mutex);
-	error = lookup_hash(&nd, &nd.last, &path);
+	error = hash_lookup_union(&nd, &nd.last, &path);
 	__putname(nd.last.name);
 	goto do_last;
 }
@@ -1913,7 +2367,7 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	/*
 	 * Do the final lookup.
 	 */
-	err = lookup_hash(nd, &nd->last, &path);
+	err = hash_lookup_union(nd, &nd->last, &path);
 	if (err) {
 		path.dentry = ERR_PTR(err);
 		goto fail;
@@ -2323,7 +2777,7 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	error = lookup_hash(&nd, &nd.last, &path);
+	error = hash_lookup_union(&nd, &nd.last, &path);
 	if (error)
 		goto exit2;
 	error = mnt_want_write(nd.path.mnt);
@@ -2406,7 +2860,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	error = lookup_hash(&nd, &nd.last, &path);
+	error = hash_lookup_union(&nd, &nd.last, &path);
 	if (!error) {
 		/* Why not before? Because we want correct error value */
 		if (nd.last.name[nd.last.len])
@@ -2810,7 +3264,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 
 	trap = lock_rename(new_dir, old_dir);
 
-	error = lookup_hash(&oldnd, &oldnd.last, &old);
+	error = hash_lookup_union(&oldnd, &oldnd.last, &old);
 	if (error)
 		goto exit3;
 	/* source must exist */
@@ -2829,7 +3283,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	error = -EINVAL;
 	if (old.dentry == trap)
 		goto exit4;
-	error = lookup_hash(&newnd, &newnd.last, &new);
+	error = hash_lookup_union(&newnd, &newnd.last, &new);
 	if (error)
 		goto exit4;
 	/* target should not be an ancestor of source */
diff --git a/include/linux/namei.h b/include/linux/namei.h
index fc2e035..e465cc7 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -19,6 +19,7 @@ struct nameidata {
 	struct path	path;
 	struct qstr	last;
 	unsigned int	flags;
+	unsigned int	um_flags;
 	int		last_type;
 	unsigned	depth;
 	char *saved_names[MAX_NESTED_LINKS + 1];
@@ -34,6 +35,9 @@ struct nameidata {
  */
 enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 
+#define LAST_UNION             0x01
+#define LAST_LOWLEVEL          0x02
+
 /*
  * The bitmask for a lookup event:
  *  - follow links at the end
@@ -48,6 +52,8 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_CONTINUE		 4
 #define LOOKUP_PARENT		16
 #define LOOKUP_REVAL		64
+#define LOOKUP_TOPMOST	       128
+
 /*
  * Intent data
  */
-- 
1.6.1.3



* [PATCH 22/32] union-mount: stop lookup when directory has S_OPAQUE flag set
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (20 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 23/32] union-mount: stop lookup when finding a whiteout Jan Blunck
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Honor the S_OPAQUE flag in the union path lookup.
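
As a rough illustration of the rule (a userspace model, not the kernel code
in this patch; struct layer_dir and union_stack_depth() are hypothetical
names), building the union stack for one directory can be sketched as walking
the layers from the top and stopping after the first opaque directory:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one directory on one layer of a union. */
struct layer_dir {
	bool exists;	/* the name resolves to a directory on this layer */
	bool opaque;	/* models IS_OPAQUE(): hide every layer below */
};

/*
 * Count how many layers contribute to the merged directory. Index 0 is the
 * topmost layer. Layers where the directory is missing are skipped, and the
 * walk stops after the first opaque directory.
 */
static size_t union_stack_depth(const struct layer_dir *layers, size_t n)
{
	size_t depth = 0;

	assert(layers != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!layers[i].exists)
			continue;	/* negative here: keep looking below */
		depth++;
		if (layers[i].opaque)
			break;		/* opaque directory ends the stack */
	}
	return depth;
}
```

In this model an opaque topmost directory yields a stack of depth one, which
corresponds to skipping the union lookup entirely.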

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2bb8a22..83cc5ea 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -521,6 +521,9 @@ static int __cache_lookup_build_union(struct nameidata *nd, struct qstr *name,
 			path_put(&last);
 		last.dentry = dentry;
 		last.mnt = mntget(nd->path.mnt);
+
+		if (IS_OPAQUE(last.dentry->d_inode))
+			break;
 	}
 
 	if (last.dentry != path->dentry)
@@ -560,7 +563,8 @@ static int cache_lookup_union(struct nameidata *nd, struct qstr *name,
 
 		/* only directories can be part of a union stack */
 		if (!path->dentry->d_inode ||
-		    !S_ISDIR(path->dentry->d_inode->i_mode))
+		    !S_ISDIR(path->dentry->d_inode->i_mode) ||
+		    IS_OPAQUE(path->dentry->d_inode))
 			goto out;
 
 		/* Build the union stack for this part */
@@ -802,6 +806,9 @@ static int __real_lookup_build_union(struct nameidata *nd, struct qstr *name,
 			path_put(&last);
 		last.dentry = next.dentry;
 		last.mnt = mntget(next.mnt);
+
+		if (IS_OPAQUE(last.dentry->d_inode))
+			break;
 	}
 
 	if (last.dentry != path->dentry)
@@ -823,7 +830,8 @@ static int real_lookup_union(struct nameidata *nd, struct qstr *name,
 
 	/* only directories can be part of a union stack */
 	if (!path->dentry->d_inode ||
-	    !S_ISDIR(path->dentry->d_inode->i_mode))
+	    !S_ISDIR(path->dentry->d_inode->i_mode) ||
+	    IS_OPAQUE(path->dentry->d_inode))
 		goto out;
 
 	/* Build the union stack for this part */
@@ -1108,7 +1116,7 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
 {
 	int err;
 
-	if (IS_MNT_UNION(nd->path.mnt))
+	if (IS_MNT_UNION(nd->path.mnt) && !IS_OPAQUE(nd->path.dentry->d_inode))
 		goto need_union_lookup;
 
 	path->dentry = __d_lookup(nd->path.dentry, name);
@@ -1643,6 +1651,9 @@ static int __hash_lookup_build_union(struct nameidata *nd, struct qstr *name,
 			path_put(&last);
 		last.dentry = next.dentry;
 		last.mnt = mntget(next.mnt);
+
+		if (IS_OPAQUE(last.dentry->d_inode))
+			break;
 	}
 
 	if (last.dentry != path->dentry)
-- 
1.6.1.3



* [PATCH 23/32] union-mount: stop lookup when finding a whiteout
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (21 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 22/32] union-mount: stop lookup when directory has S_OPAQUE flag set Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 24/32] union-mount: in-kernel file copy between union mounted filesystems Jan Blunck
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Stop the lookup if we find a whiteout during union path lookup.
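
The intended semantics can be sketched with a small userspace model (this is
an assumption-laden illustration, not the patch itself; enum entry_state and
union_lookup_name() are made-up names). A name is searched from the topmost
layer downwards; a positive entry wins, and a whiteout hides anything below:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state of one name on one layer of the union. */
enum entry_state {
	ENTRY_NEGATIVE,	/* name does not exist on this layer */
	ENTRY_POSITIVE,	/* name exists on this layer */
	ENTRY_WHITEOUT	/* name was deleted: hide all lower layers */
};

/*
 * Return the index of the layer that provides the name (0 is topmost), or
 * -1 if the name is not visible in the union.
 */
static int union_lookup_name(const enum entry_state *layers, int n)
{
	assert(n >= 0);
	assert(layers != NULL || n == 0);

	for (int i = 0; i < n; i++) {
		switch (layers[i]) {
		case ENTRY_POSITIVE:
			return i;	/* first positive entry wins */
		case ENTRY_WHITEOUT:
			return -1;	/* stop: lower layers are hidden */
		case ENTRY_NEGATIVE:
			break;		/* keep walking down */
		}
	}
	return -1;
}
```

Note that the kernel code returns the whiteout dentry itself to the caller
rather than an index; the model only captures where the walk stops.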

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |   30 ++++++++++++++++++++++--------
 1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 83cc5ea..9c38df3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -440,10 +440,10 @@ static int __cache_lookup_topmost(struct nameidata *nd, struct qstr *name,
 	path->dentry = dentry;
 	path->mnt = dentry ? nd->path.mnt : NULL;
 
-	if (!dentry || dentry->d_inode)
+	if (!dentry || (dentry->d_inode || d_is_whiteout(dentry)))
 		return !dentry;
 
-	/* look for the first non-negative dentry */
+	/* look for the first non-negative or whiteout dentry */
 
 	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
 		dentry = d_hash_and_lookup(nd->path.dentry, name);
@@ -465,7 +465,7 @@ static int __cache_lookup_topmost(struct nameidata *nd, struct qstr *name,
 				goto out_dput;
 		}
 
-		if (dentry->d_inode)
+		if (dentry->d_inode || d_is_whiteout(dentry))
 			goto out_dput;
 
 		dput(dentry);
@@ -503,6 +503,11 @@ static int __cache_lookup_build_union(struct nameidata *nd, struct qstr *name,
 				return 1;
 		}
 
+		if (d_is_whiteout(dentry)) {
+			dput(dentry);
+			break;
+		}
+
 		if (!dentry->d_inode) {
 			dput(dentry);
 			continue;
@@ -714,7 +719,6 @@ out_unlock:
  * type mismatch and whiteouts.
  *
  * FIXME:
- * - handle DT_WHT
  * - handle union stacks in use
  * - handle union stacks mounted upon union stacks
  * - avoid unnecessary allocations of union locks
@@ -729,7 +733,7 @@ static int __real_lookup_topmost(struct nameidata *nd, struct qstr *name,
 	if (err)
 		return err;
 
-	if (path->dentry->d_inode)
+	if (path->dentry->d_inode || d_is_whiteout(path->dentry))
 		return 0;
 
 	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
@@ -745,7 +749,7 @@ static int __real_lookup_topmost(struct nameidata *nd, struct qstr *name,
 		if (err)
 			goto out;
 
-		if (next.dentry->d_inode) {
+		if (next.dentry->d_inode || d_is_whiteout(next.dentry)) {
 			dput(path->dentry);
 			mntget(next.mnt);
 			*path = next;
@@ -788,6 +792,11 @@ static int __real_lookup_build_union(struct nameidata *nd, struct qstr *name,
 		if (err)
 			goto out;
 
+		if (d_is_whiteout(next.dentry)) {
+			dput(next.dentry);
+			break;
+		}
+
 		if (!next.dentry->d_inode) {
 			dput(next.dentry);
 			continue;
@@ -1577,7 +1586,7 @@ static int __hash_lookup_topmost(struct nameidata *nd, struct qstr *name,
 	if (err)
 		return err;
 
-	if (path->dentry->d_inode)
+	if (path->dentry->d_inode || d_is_whiteout(path->dentry))
 		return 0;
 
 	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
@@ -1595,7 +1604,7 @@ static int __hash_lookup_topmost(struct nameidata *nd, struct qstr *name,
 		if (err)
 			goto out;
 
-		if (next.dentry->d_inode) {
+		if (next.dentry->d_inode || d_is_whiteout(next.dentry)) {
 			dput(path->dentry);
 			mntget(next.mnt);
 			*path = next;
@@ -1633,6 +1642,11 @@ static int __hash_lookup_build_union(struct nameidata *nd, struct qstr *name,
 		if (err)
 			goto out;
 
+		if (d_is_whiteout(next.dentry)) {
+			dput(next.dentry);
+			break;
+		}
+
 		if (!next.dentry->d_inode) {
 			dput(next.dentry);
 			continue;
-- 
1.6.1.3



* [PATCH 24/32] union-mount: in-kernel file copy between union mounted filesystems
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (22 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 23/32] union-mount: stop lookup when finding a whiteout Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 25/32] union-mount: check for logically empty directory (FIXME) Jan Blunck
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch introduces in-kernel file copy between union mounted
filesystems. When a file that resides on a lower (and thus read-only) layer of
the union stack is opened for writing, it is first copied to the topmost
union layer.

This patch uses do_splice_direct() for the in-kernel file copy.
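
The copy-up rule can be modelled in userspace roughly as follows. This is
only a sketch under simplifying assumptions (two layers, file contents held
in a fixed buffer); struct model_file, struct model_union and
model_open_for_write() are hypothetical and do not exist in the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_DATA 64

/* Hypothetical in-memory stand-in for a regular file on one layer. */
struct model_file {
	bool present;
	size_t len;
	char data[MODEL_MAX_DATA];
};

/* Two-layer union: a writable top layer over a read-only lower layer. */
struct model_union {
	struct model_file top;
	struct model_file lower;
};

/*
 * Open the file for writing. If it only exists on the lower layer, copy it
 * to the top layer first so that the lower layer is never modified.
 * Returns NULL if the file exists on neither layer.
 */
static struct model_file *model_open_for_write(struct model_union *u)
{
	assert(u != NULL);

	if (!u->top.present) {
		if (!u->lower.present)
			return NULL;
		assert(u->lower.len <= MODEL_MAX_DATA);
		memcpy(u->top.data, u->lower.data, u->lower.len);
		u->top.len = u->lower.len;
		u->top.present = true;
	}
	return &u->top;
}
```

Writes through the returned pointer only touch the top copy, which is the
property the copy-up is meant to provide.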

Signed-off-by: Bharata B Rao <bharata@in.ibm.com>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c            |   63 +++++++++-
 fs/union.c            |  320 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/union.h |    7 +
 3 files changed, 386 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9c38df3..91486bd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1044,7 +1044,7 @@ static int __follow_mount(struct path *path)
 	return res;
 }
 
-static void follow_mount(struct vfsmount **mnt, struct dentry **dentry)
+void follow_mount(struct vfsmount **mnt, struct dentry **dentry)
 {
 	while (d_mountpoint(*dentry)) {
 		struct vfsmount *mounted = lookup_mnt(*mnt, *dentry);
@@ -1265,6 +1265,21 @@ static int __link_path_walk(const char *name, struct nameidata *nd)
 		if (err)
 			break;
 
+		if ((nd->flags & LOOKUP_TOPMOST) &&
+		    (nd->um_flags & LAST_LOWLEVEL)) {
+			struct dentry *dentry;
+
+			dentry = union_create_topmost(nd, &this, &next);
+			if (IS_ERR(dentry)) {
+				err = PTR_ERR(dentry);
+				goto out_dput;
+			}
+			path_put_conditional(&next, nd);
+			next.mnt = nd->path.mnt;
+			next.dentry = dentry;
+			nd->um_flags &= ~LAST_LOWLEVEL;
+		}
+
 		err = -ENOENT;
 		inode = next.dentry->d_inode;
 		if (!inode)
@@ -1314,6 +1329,22 @@ last_component:
 		err = do_lookup(nd, &this, &next);
 		if (err)
 			break;
+
+		if ((nd->flags & LOOKUP_TOPMOST) &&
+		    (nd->um_flags & LAST_LOWLEVEL)) {
+			struct dentry *dentry;
+
+			dentry = union_create_topmost(nd, &this, &next);
+			if (IS_ERR(dentry)) {
+				err = PTR_ERR(dentry);
+				goto out_dput;
+			}
+			path_put_conditional(&next, nd);
+			next.mnt = nd->path.mnt;
+			next.dentry = dentry;
+			nd->um_flags &= ~LAST_LOWLEVEL;
+		}
+
 		inode = next.dentry->d_inode;
 		if ((lookup_flags & LOOKUP_FOLLOW)
 		    && inode && inode->i_op->follow_link) {
@@ -1676,7 +1707,7 @@ out:
 	return err;
 }
 
-static int hash_lookup_union(struct nameidata *nd, struct qstr *name,
+int hash_lookup_union(struct nameidata *nd, struct qstr *name,
 			     struct path *path)
 {
 	struct path safe = { .dentry = nd->path.dentry, .mnt = nd->path.mnt };
@@ -2160,6 +2191,13 @@ struct file *do_filp_open(int dfd, const char *pathname,
 					 &nd, flag);
 		if (error)
 			return ERR_PTR(error);
+		if (unlikely((flag & FMODE_WRITE) &&
+			     is_unionized(nd.path.dentry, nd.path.mnt) &&
+			     S_ISREG(nd.path.dentry->d_inode->i_mode))) {
+			error = union_copyup(&nd, flag);
+			if (error)
+				return ERR_PTR(error);
+		}
 		goto ok;
 	}
 
@@ -2249,10 +2287,21 @@ do_last:
 	if (path.dentry->d_inode->i_op->follow_link)
 		goto do_link;
 
-	path_to_nameidata(&path, &nd);
 	error = -EISDIR;
 	if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
-		goto exit;
+		goto exit_dput;
+
+	/*
+	 * If this file is on a lower layer of the union stack, copy it to the
+	 * topmost layer before opening it
+	 */
+	if (path.dentry->d_inode && (path.dentry->d_parent != dir)) {
+		error = __union_copyup(&path, &nd, &path);
+		if (error)
+			goto exit_dput;
+	}
+
+	path_to_nameidata(&path, &nd);
 ok:
 	/*
 	 * Consider:
@@ -3315,6 +3364,12 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	error = -ENOTEMPTY;
 	if (new.dentry == trap)
 		goto exit5;
+	/* renaming on unions is left to user space */
+	error = -EXDEV;
+	if (is_unionized(oldnd.path.dentry, oldnd.path.mnt))
+		goto exit5;
+	if (is_unionized(newnd.path.dentry, newnd.path.mnt))
+		goto exit5;
 
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
diff --git a/fs/union.c b/fs/union.c
index 6e220bd..d21fe5f 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -20,6 +20,14 @@
 #include <linux/fs.h>
 #include <linux/mount.h>
 #include <linux/union.h>
+#include <linux/namei.h>
+#include <linux/file.h>
+#include <linux/mm.h>
+#include <linux/quotaops.h>
+#include <linux/dnotify.h>
+#include <linux/security.h>
+#include <linux/pipe_fs_i.h>
+#include <linux/splice.h>
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -336,6 +344,318 @@ int follow_union_mount(struct vfsmount **mnt, struct dentry **dentry)
 }
 
 /*
+ * Union mount copyup support
+ */
+
+extern int hash_lookup_union(struct nameidata *, struct qstr *, struct path *);
+extern void follow_mount(struct vfsmount **, struct dentry **);
+
+/*
+ * union_relookup_topmost - lookup and create the topmost path to dentry
+ * @nd: pointer to nameidata
+ * @flags: lookup flags
+ */
+static int union_relookup_topmost(struct nameidata *nd, int flags)
+{
+	int err;
+	char *kbuf, *name;
+	struct nameidata this;
+
+	kbuf = (char *)__get_free_page(GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	name = d_path(&nd->path, kbuf, PAGE_SIZE);
+	err = PTR_ERR(name);
+	if (IS_ERR(name))
+		goto free_page;
+
+	err = path_lookup(name, flags|LOOKUP_CREATE|LOOKUP_TOPMOST, &this);
+	if (err)
+		goto free_page;
+
+	path_put(&nd->path);
+	nd->path.dentry = this.path.dentry;
+	nd->path.mnt = this.path.mnt;
+
+	/*
+	 * the nd->flags should be unchanged
+	 */
+	BUG_ON(this.um_flags & LAST_LOWLEVEL);
+	nd->um_flags &= ~LAST_LOWLEVEL;
+ free_page:
+	free_page((unsigned long)kbuf);
+	return err;
+}
+
+static void __update_fs_pwd(struct path *path, struct dentry *dentry,
+			    struct vfsmount *mnt)
+{
+	struct path old = { NULL, NULL };
+
+	write_lock(&current->fs->lock);
+	if (current->fs->pwd.dentry == path->dentry) {
+		old = current->fs->pwd;
+		path_get(&current->fs->pwd);
+	}
+	write_unlock(&current->fs->lock);
+
+	if (old.dentry)
+		path_put(&old);
+
+	return;
+}
+
+/*
+ * union_create_topmost - create the topmost path component
+ * @nd: pointer to nameidata of the base directory
+ * @name: pointer to file name
+ * @path: pointer to path of the overlaid file
+ *
+ * This is called by __link_path_walk() to create the directories on a path
+ * when it is called with LOOKUP_TOPMOST.
+ */
+struct dentry *union_create_topmost(struct nameidata *nd, struct qstr *name,
+				    struct path *path)
+{
+	struct dentry *dentry, *parent = nd->path.dentry;
+	int res, mode = path->dentry->d_inode->i_mode;
+
+	if (parent->d_sb == path->dentry->d_sb)
+		return ERR_PTR(-EEXIST);
+
+	mutex_lock(&parent->d_inode->i_mutex);
+	dentry = lookup_one_len(name->name, nd->path.dentry, name->len);
+	if (IS_ERR(dentry))
+		goto out_unlock;
+
+	switch (mode & S_IFMT) {
+	case S_IFREG:
+		/*
+		 * FIXME: Does this make any sense in this case?
+		 * Special case - lookup gave negative, but... we had foo/bar/
+		 * From the vfs_mknod() POV we just have a negative dentry -
+		 * all is fine. Let's be bastards - you had / on the end, you've
+		 * been asking for (non-existent) directory. -ENOENT for you.
+		 */
+		if (name->name[name->len] && !dentry->d_inode) {
+			dput(dentry);
+			dentry = ERR_PTR(-ENOENT);
+			goto out_unlock;
+		}
+
+		res = vfs_create(parent->d_inode, dentry, mode, nd);
+		if (res) {
+			dput(dentry);
+			dentry = ERR_PTR(res);
+			goto out_unlock;
+		}
+		break;
+	case S_IFDIR:
+		res = vfs_mkdir(parent->d_inode, dentry, mode);
+		if (res) {
+			dput(dentry);
+			dentry = ERR_PTR(res);
+			goto out_unlock;
+		}
+
+		res = append_to_union(nd->path.mnt, dentry, path->mnt,
+				      path->dentry);
+		if (res) {
+			dput(dentry);
+			dentry = ERR_PTR(res);
+			goto out_unlock;
+		}
+		break;
+	default:
+		dput(dentry);
+		dentry = ERR_PTR(-EINVAL);
+		goto out_unlock;
+	}
+
+	/* FIXME: Really necessary ??? */
+/*	__update_fs_pwd(path, dentry, nd->path.mnt); */
+
+ out_unlock:
+	mutex_unlock(&parent->d_inode->i_mutex);
+	return dentry;
+}
+
+static int union_copy_file(struct dentry *old_dentry, struct vfsmount *old_mnt,
+			   struct dentry *new_dentry, struct vfsmount *new_mnt)
+{
+	int ret;
+	loff_t size;
+	loff_t offset;
+	struct file *old_file, *new_file;
+	const struct cred *cred = current_cred();
+
+	dget(old_dentry);
+	mntget(old_mnt);
+	old_file = dentry_open(old_dentry, old_mnt, O_RDONLY, cred);
+	if (IS_ERR(old_file))
+		return PTR_ERR(old_file);
+
+	dget(new_dentry);
+	mntget(new_mnt);
+	new_file = dentry_open(new_dentry, new_mnt, O_WRONLY, cred);
+	ret = PTR_ERR(new_file);
+	if (IS_ERR(new_file))
+		goto fput_old;
+
+	size = i_size_read(old_file->f_path.dentry->d_inode);
+	if (((size_t)size != size) || ((ssize_t)size != size)) {
+		ret = -EFBIG;
+		goto fput_new;
+	}
+
+	offset = 0;
+	ret = do_splice_direct(old_file, &offset, new_file, size,
+			       SPLICE_F_MOVE);
+	if (ret >= 0)
+		ret = 0;
+ fput_new:
+	fput(new_file);
+ fput_old:
+	fput(old_file);
+	return ret;
+}
+
+/**
+ * __union_copyup - copy a file to the topmost directory
+ * @old: pointer to path of the old file name
+ * @new_nd: pointer to nameidata of the topmost directory
+ * @new: pointer to path of the new file name
+ *
+ * The topmost directory @new_nd must already be locked. Creates the topmost
+ * file if it doesn't exist yet.
+ */
+int __union_copyup(struct path *old, struct nameidata *new_nd, struct path *new)
+{
+	struct dentry *dentry;
+	int error;
+
+	/* Maybe this should be -EINVAL */
+	if (S_ISDIR(old->dentry->d_inode->i_mode))
+		return -EISDIR;
+
+	if (new_nd->path.dentry != new->dentry->d_parent) {
+		dentry = lookup_one_len(new->dentry->d_name.name,
+					new_nd->path.dentry,
+					new->dentry->d_name.len);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
+		error = -EEXIST;
+		if (dentry->d_inode)
+			goto out_dput;
+	} else
+		dentry = dget(new->dentry);
+
+	if (!dentry->d_inode) {
+		error = vfs_create(new_nd->path.dentry->d_inode, dentry,
+				   old->dentry->d_inode->i_mode, new_nd);
+		if (error)
+			goto out_dput;
+	}
+
+	error = union_copy_file(old->dentry, old->mnt, dentry,
+				new_nd->path.mnt);
+	if (error) {
+		/* FIXME: are there return values we should not BUG() on? */
+		BUG_ON(vfs_unlink(new_nd->path.dentry->d_inode, dentry));
+		goto out_dput;
+	}
+
+	dput(new->dentry);
+	new->dentry = dentry;
+	if (new->mnt != new_nd->path.mnt)
+		mntput(new->mnt);
+	new->mnt = new_nd->path.mnt;
+	return error;
+
+out_dput:
+	dput(dentry);
+	return error;
+}
+
+/*
+ * union_copyup - copy a file to the topmost layer of the union stack
+ * @nd: nameidata pointer to the file
+ * @flags: flags given to open_namei
+ */
+int union_copyup(struct nameidata *nd, int flags)
+{
+	struct qstr this;
+	char *name;
+	struct dentry *dir;
+	struct path path;
+	int err;
+
+	if (!is_unionized(nd->path.dentry, nd->path.mnt))
+		return 0;
+	if (!S_ISREG(nd->path.dentry->d_inode->i_mode))
+		return 0;
+
+	/* save the name for hash_lookup_union() */
+	this.len = nd->path.dentry->d_name.len;
+	this.hash = nd->path.dentry->d_name.hash;
+	name = kmalloc(this.len + 1, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+	this.name = name;
+	memcpy(name, nd->path.dentry->d_name.name, nd->path.dentry->d_name.len);
+	name[this.len] = 0;
+
+	err = union_relookup_topmost(nd, nd->flags|LOOKUP_PARENT);
+	if (err) {
+		kfree(name);
+		return err;
+	}
+	nd->flags &= ~LOOKUP_PARENT;
+
+	dir = nd->path.dentry;
+	mutex_lock(&dir->d_inode->i_mutex);
+	err = hash_lookup_union(nd, &this, &path);
+	mutex_unlock(&dir->d_inode->i_mutex);
+	kfree(name);
+	if (err)
+		return err;
+
+	err = -ENOENT;
+	if (!path.dentry->d_inode)
+		goto exit_dput;
+
+	/* Necessary?! I guess not ... */
+	follow_mount(&path.mnt, &path.dentry);
+
+	err = -ENOENT;
+	if (!path.dentry->d_inode)
+		goto exit_dput;
+
+	err = -EISDIR;
+	if (!S_ISREG(path.dentry->d_inode->i_mode))
+		goto exit_dput;
+
+	if (path.dentry->d_parent != nd->path.dentry) {
+		err = __union_copyup(&path, nd, &path);
+		if (err)
+			goto exit_dput;
+	}
+
+	dput(nd->path.dentry);
+	if (nd->path.mnt != path.mnt)
+		mntput(nd->path.mnt);
+	nd->path = path;
+	return 0;
+
+exit_dput:
+	dput(path.dentry);
+	if (path.mnt != nd->path.mnt)
+		mntput(path.mnt);
+	return err;
+}
+
+/*
  * This must be called when unhashing a dentry. This is called with dcache_lock
  * and unhashes all unions this dentry is in.
  */
diff --git a/include/linux/union.h b/include/linux/union.h
index 0b6f356..405baa9 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -53,6 +53,10 @@ extern void __shrink_d_unions(struct dentry *, struct list_head *);
 extern int attach_mnt_union(struct vfsmount *, struct vfsmount *,
 			    struct dentry *);
 extern void detach_mnt_union(struct vfsmount *);
+extern struct dentry *union_create_topmost(struct nameidata *, struct qstr *,
+					   struct path *);
+extern int __union_copyup(struct path *, struct nameidata *, struct path *);
+extern int union_copyup(struct nameidata *, int);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -67,6 +71,9 @@ extern void detach_mnt_union(struct vfsmount *);
 #define __shrink_d_unions(x,y)		do { } while (0)
 #define attach_mnt_union(x, y, z)	do { } while (0)
 #define detach_mnt_union(x)		do { } while (0)
+#define union_create_topmost(x, y, z)	({ BUG(); (NULL); })
+#define __union_copyup(x, y, z)		({ BUG(); (0); })
+#define union_copyup(x, y)		({ (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.1.3

* [PATCH 25/32] union-mount: check for logically empty directory (FIXME)
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (23 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 24/32] union-mount: in-kernel file copy between union mounted filesystems Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 26/32] union-mount: call do_whiteout() on unlink and rmdir Jan Blunck
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

This patch abuses readdir() to check whether a union directory is logically
empty, i.e. whether it contains nothing but ".", ".." and whiteouts. The
proper fix is to populate the topmost directory with fallthru entries first
and then ask the filesystem whether the directory is empty.
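
For illustration, the emptiness rule can be modelled in userspace roughly as
below. This is only a sketch, not the kernel code: the sk_* names and the
d_type values are made up for the example, and names are assumed to be
NUL-terminated, which the real filldir callback cannot rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's d_type values. */
enum sk_dtype { SK_DT_DIR = 4, SK_DT_REG = 8, SK_DT_WHT = 14 };

struct sk_dirent {
	const char *name;	/* NUL-terminated in this sketch */
	enum sk_dtype d_type;
};

/* Returns 1 if the name is "." or "..". */
static int sk_is_dot_or_dotdot(const char *name)
{
	return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/*
 * Returns 1 if every entry is ".", ".." or a whiteout, i.e. the
 * directory is logically empty; returns 0 otherwise.
 */
int sk_logically_empty(const struct sk_dirent *ents, size_t n)
{
	size_t i;

	assert(ents != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (sk_is_dot_or_dotdot(ents[i].name))
			continue;
		if (ents[i].d_type == SK_DT_WHT)
			continue;
		return 0;	/* a real, visible entry */
	}
	return 1;
}
```

A directory holding only whiteouts therefore counts as empty, which is what
rmdir on a union needs.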

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 91486bd..78eb973 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2766,6 +2766,91 @@ int path_whiteout(struct path *dir_path, struct dentry *dentry, int isdir)
 EXPORT_SYMBOL(path_whiteout);
 
 /*
+ * This is abusing readdir to check if a union directory is logically empty.
+ * Al Viro barfed when he saw this, but Val said: "Well, at this point I'm
+ * aiming for working, pretty can come later"
+ */
+static int filldir_is_empty(void *__buf, const char *name, int namlen,
+			    loff_t offset, u64 ino, unsigned int d_type)
+{
+	int *is_empty = (int *)__buf;
+
+	switch (namlen) {
+	case 2:
+		if (name[1] != '.')
+			break;
+	case 1:
+		if (name[0] != '.')
+			break;
+		return 0;
+	}
+
+	if (d_type == DT_WHT)
+		return 0;
+
+	(*is_empty) = 0;
+	return 0;
+}
+
+static int directory_is_empty(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct file *file;
+	int err;
+	int is_empty = 1;
+
+	BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
+
+	/* references for the file pointer */
+	dget(dentry);
+	mntget(mnt);
+
+	file = dentry_open(dentry, mnt, O_RDONLY, current_cred());
+	if (IS_ERR(file))
+		return 0;
+
+	err = vfs_readdir(file, filldir_is_empty, &is_empty);
+
+	fput(file);
+	return is_empty;
+}
+
+static int do_whiteout(struct nameidata *nd, struct path *path, int isdir)
+{
+	struct path safe = { .dentry = dget(nd->path.dentry),
+			     .mnt = mntget(nd->path.mnt) };
+	struct dentry *dentry = path->dentry;
+	int err;
+
+	err = may_whiteout(nd->path.dentry->d_inode, dentry, isdir);
+	if (err)
+		goto out;
+
+	err = -ENOTEMPTY;
+	if (isdir && !directory_is_empty(path->dentry, path->mnt))
+		goto out;
+
+	if (nd->path.dentry != dentry->d_parent) {
+		dentry = __lookup_hash(&path->dentry->d_name, nd->path.dentry,
+				       nd);
+		err = PTR_ERR(dentry);
+		if (IS_ERR(dentry))
+			goto out;
+
+		dput(path->dentry);
+		if (path->mnt != safe.mnt)
+			mntput(path->mnt);
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+	}
+
+	err = vfs_whiteout(nd->path.dentry->d_inode, dentry, isdir);
+
+out:
+	path_put(&safe);
+	return err;
+}
+
+/*
  * We try to drop the dentry early: we should have
  * a usage count of 2 if we're the only user of this
  * dentry, and if that is true (possibly after pruning
-- 
1.6.1.3

* [PATCH 26/32] union-mount: call do_whiteout() on unlink and rmdir
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (24 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 25/32] union-mount: check for logically empty directory (FIXME) Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 27/32] union-mount: Always create topmost directory on open Jan Blunck
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

Call do_whiteout() when removing files and directories.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 78eb973..4d68597 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2825,6 +2825,10 @@ static int do_whiteout(struct nameidata *nd, struct path *path, int isdir)
 	if (err)
 		goto out;
 
+	err = -ENOENT;
+	if (!dentry->d_inode)
+		goto out;
+
 	err = -ENOTEMPTY;
 	if (isdir && !directory_is_empty(path->dentry, path->mnt))
 		goto out;
@@ -2939,6 +2943,10 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	error = hash_lookup_union(&nd, &nd.last, &path);
 	if (error)
 		goto exit2;
+	if (is_unionized(nd.path.dentry, nd.path.mnt)) {
+		error = do_whiteout(&nd, &path, 1);
+		goto exit3;
+	}
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
@@ -3027,6 +3035,10 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 		inode = path.dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
+		if (is_unionized(nd.path.dentry, nd.path.mnt)) {
+			error = do_whiteout(&nd, &path, 0);
+			goto exit2;
+		}
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
-- 
1.6.1.3


* [PATCH 27/32] union-mount: Always create topmost directory on open
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (25 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 26/32] union-mount: call do_whiteout() on unlink and rmdir Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 28/32] union-mount: Basic fallthru definitions Jan Blunck
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

From: Valerie Aurora (Henson) <vaurora@redhat.com>

When we open a directory, unconditionally create a matching directory
in the topmost layer.  This way we don't have to go back and create all
the directories on the path to an element when we want to copy it up.
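
The resulting decision can be summarised by the small sketch below. It is
illustrative only: the SK_* flag values and the sk_need_topmost() helper are
invented for this example and merely mirror the condition used in
__link_path_walk().

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define SK_LOOKUP_TOPMOST	0x1	/* caller asked for a topmost dentry */
#define SK_LAST_LOWLEVEL	0x2	/* last lookup ended in a lower layer */

/*
 * Returns 1 if a matching entry should be created in the topmost
 * layer: the lookup ended in a lower layer, and the result is either
 * a directory or the caller explicitly requested a topmost entry.
 */
int sk_need_topmost(unsigned int flags, unsigned int um_flags, int is_dir)
{
	assert(is_dir == 0 || is_dir == 1);

	if (!(um_flags & SK_LAST_LOWLEVEL))
		return 0;
	return is_dir || (flags & SK_LOOKUP_TOPMOST) != 0;
}
```

Before this change only the LOOKUP_TOPMOST case triggered the creation; the
is_dir term is what makes it unconditional for directories.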

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/namei.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4d68597..684619c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1265,8 +1265,9 @@ static int __link_path_walk(const char *name, struct nameidata *nd)
 		if (err)
 			break;
 
-		if ((nd->flags & LOOKUP_TOPMOST) &&
-		    (nd->um_flags & LAST_LOWLEVEL)) {
+		if ((nd->um_flags & LAST_LOWLEVEL) &&
+		    (S_ISDIR(next.dentry->d_inode->i_mode) ||
+		     (nd->flags & LOOKUP_TOPMOST))) {
 			struct dentry *dentry;
 
 			dentry = union_create_topmost(nd, &this, &next);
@@ -1330,8 +1331,9 @@ last_component:
 		if (err)
 			break;
 
-		if ((nd->flags & LOOKUP_TOPMOST) &&
-		    (nd->um_flags & LAST_LOWLEVEL)) {
+		if ((nd->um_flags & LAST_LOWLEVEL) &&
+		    (S_ISDIR(next.dentry->d_inode->i_mode) ||
+		     (nd->flags & LOOKUP_TOPMOST))) {
 			struct dentry *dentry;
 
 			dentry = union_create_topmost(nd, &this, &next);
-- 
1.6.1.3

* [PATCH 28/32] union-mount: Basic fallthru definitions
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (26 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 27/32] union-mount: Always create topmost directory on open Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 29/32] union mount: Support for fallthru entries in union mount lookup Jan Blunck
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

From: Valerie Aurora (Henson) <vaurora@redhat.com>

Define the fallthru dcache flag and file system op.

Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 include/linux/dcache.h |    6 ++++++
 include/linux/fs.h     |    1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 7930b07..9534813 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -192,6 +192,7 @@ d_iput:		no		no		no       yes
 #define DCACHE_INOTIFY_PARENT_WATCHED	0x0020 /* Parent inode is watched */
 #define DCACHE_COOKIE		0x0040	/* For use by dcookie subsystem */
 #define DCACHE_WHITEOUT		0x0080	/* This negative dentry is a whiteout */
+#define DCACHE_FALLTHRU		0x0100	/* Keep looking in the file system below */
 
 extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
@@ -373,6 +374,11 @@ static inline int d_is_whiteout(struct dentry *dentry)
 	return (dentry->d_flags & DCACHE_WHITEOUT);
 }
 
+static inline int d_is_fallthru(struct dentry *dentry)
+{
+	return (dentry->d_flags & DCACHE_FALLTHRU);
+}
+
 static inline struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07768..dd9c859 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1351,6 +1351,7 @@ struct inode_operations {
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
 	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+	int (*fallthru) (struct inode *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
-- 
1.6.1.3


* [PATCH 29/32] union mount: Support for fallthru entries in union mount lookup
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (27 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 28/32] union-mount: Basic fallthru definitions Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 30/32] union mount: ext2 fallthru support Jan Blunck
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

From: Valerie Aurora (Henson) <vaurora@redhat.com>

A fallthru directory entry overrides the opaque flag for its parent
directory (for this directory entry only).  Before, we stopped
building the union stack when we encountered an opaque directory; now
we include directories below opaque directories in the union stack and
check for opacity during lookup.
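
As a rough userspace model of these rules (a sketch under simplifying
assumptions, not the dcache code; all sk_* types and names are hypothetical),
a lookup walks the stack from the top and stops as follows:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sk_kind { SK_FILE, SK_WHITEOUT, SK_FALLTHRU };

struct sk_entry {
	const char *name;
	enum sk_kind kind;
};

struct sk_layer {
	int opaque;			/* directory carries the opaque flag */
	const struct sk_entry *ents;
	size_t nents;
};

/* Find a name in one layer; NULL means "negative" in that layer. */
static const struct sk_entry *sk_find(const struct sk_layer *l,
				      const char *name)
{
	size_t i;

	for (i = 0; i < l->nents; i++)
		if (strcmp(l->ents[i].name, name) == 0)
			return &l->ents[i];
	return NULL;
}

/*
 * Walk the stack from the topmost layer (index 0) downwards.
 * Returns the index of the layer that provides the name, or -1
 * if the name does not exist in the union.
 */
int sk_union_lookup(const struct sk_layer *stack, size_t nlayers,
		    const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < nlayers; i++) {
		const struct sk_entry *e = sk_find(&stack[i], name);

		if (e && e->kind == SK_FILE)
			return (int)i;
		if (e && e->kind == SK_WHITEOUT)
			return -1;
		/* opaque stops the walk unless this entry is a fallthru */
		if (stack[i].opaque && !(e && e->kind == SK_FALLTHRU))
			return -1;
	}
	return -1;
}
```

With an opaque top layer holding a fallthru for "a", a lookup of "a" still
reaches the lower layer, while a name that is absent from the top layer is
hidden.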

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/dcache.c |    7 +++----
 fs/namei.c  |   59 +++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b6fb688..844a76a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1086,7 +1086,7 @@ struct dentry *d_alloc_name(struct dentry *parent, const char *name)
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
 	if (inode) {
-		dentry->d_flags &= ~DCACHE_WHITEOUT;
+		dentry->d_flags &= ~(DCACHE_WHITEOUT|DCACHE_FALLTHRU);
 		list_add(&dentry->d_alias, &inode->i_dentry);
 	}
 	dentry->d_inode = inode;
@@ -1650,9 +1650,8 @@ void d_delete(struct dentry * dentry)
 
 static void __d_rehash(struct dentry * entry, struct hlist_head *list)
 {
-
- 	entry->d_flags &= ~DCACHE_UNHASHED;
- 	hlist_add_head_rcu(&entry->d_hash, list);
+	entry->d_flags &= ~DCACHE_UNHASHED;
+	hlist_add_head_rcu(&entry->d_hash, list);
 }
 
 static void _d_rehash(struct dentry * entry)
diff --git a/fs/namei.c b/fs/namei.c
index 684619c..1f96a66 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -414,6 +414,28 @@ static struct dentry *cache_lookup(struct dentry *parent, struct qstr *name,
 	return dentry;
 }
 
+/*
+ * Theory of operation for opaque, whiteout, and fallthru:
+ *
+ * whiteout: Unconditionally stop lookup here - ENOENT
+ *
+ * opaque: Don't lookup in directories lower in the union stack
+ *
+ * fallthru: While looking up an entry, ignore the opaque flag for the
+ * current directory only.
+ *
+ * A union stack is a linked list of directory dentries which appear
+ * in the same place in the namespace.  When constructing the union
+ * stack, we include directories below opaque directories so that we
+ * can properly handle fallthrus.  All non-fallthru lookups have to
+ * check for the opaque flag on the parent directory and obey it.
+ *
+ * In general, the code pattern is to look up the topmost entry
+ * first (either the first visible non-negative dentry or a negative
+ * dentry in the topmost layer of the union), then build the union
+ * stack for the newly looked-up entry (if it is a directory).
+ */
+
 /**
  * __cache_lookup_topmost - lookup the topmost (non-)negative dentry
  *
@@ -443,6 +465,10 @@ static int __cache_lookup_topmost(struct nameidata *nd, struct qstr *name,
 	if (!dentry || (dentry->d_inode || d_is_whiteout(dentry)))
 		return !dentry;
 
+	/* Keep going through opaque directories if we found a fallthru */
+	if (IS_OPAQUE(nd->path.dentry->d_inode) && !d_is_fallthru(dentry))
+		return !dentry;
+
 	/* look for the first non-negative or whiteout dentry */
 
 	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
@@ -468,6 +494,10 @@ static int __cache_lookup_topmost(struct nameidata *nd, struct qstr *name,
 		if (dentry->d_inode || d_is_whiteout(dentry))
 			goto out_dput;
 
+		/* Stop the lookup on opaque parent and non-fallthru child */
+		if (IS_OPAQUE(nd->path.dentry->d_inode) && !d_is_fallthru(dentry))
+			goto out_dput;
+
 		dput(dentry);
 	}
 
@@ -526,9 +556,6 @@ static int __cache_lookup_build_union(struct nameidata *nd, struct qstr *name,
 			path_put(&last);
 		last.dentry = dentry;
 		last.mnt = mntget(nd->path.mnt);
-
-		if (IS_OPAQUE(last.dentry->d_inode))
-			break;
 	}
 
 	if (last.dentry != path->dentry)
@@ -568,8 +595,7 @@ static int cache_lookup_union(struct nameidata *nd, struct qstr *name,
 
 		/* only directories can be part of a union stack */
 		if (!path->dentry->d_inode ||
-		    !S_ISDIR(path->dentry->d_inode->i_mode) ||
-		    IS_OPAQUE(path->dentry->d_inode))
+		    !S_ISDIR(path->dentry->d_inode->i_mode))
 			goto out;
 
 		/* Build the union stack for this part */
@@ -736,6 +762,9 @@ static int __real_lookup_topmost(struct nameidata *nd, struct qstr *name,
 	if (path->dentry->d_inode || d_is_whiteout(path->dentry))
 		return 0;
 
+	if (IS_OPAQUE(nd->path.dentry->d_inode) && !d_is_fallthru(path->dentry))
+		return 0;
+
 	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
 		name->hash = full_name_hash(name->name, name->len);
 		if (nd->path.dentry->d_op && nd->path.dentry->d_op->d_hash) {
@@ -756,6 +785,9 @@ static int __real_lookup_topmost(struct nameidata *nd, struct qstr *name,
 			goto out;
 		}
 
+		if (IS_OPAQUE(nd->path.dentry->d_inode) && !d_is_fallthru(next.dentry))
+			goto out;
+
 		dput(next.dentry);
 	}
 out:
@@ -815,9 +847,6 @@ static int __real_lookup_build_union(struct nameidata *nd, struct qstr *name,
 			path_put(&last);
 		last.dentry = next.dentry;
 		last.mnt = mntget(next.mnt);
-
-		if (IS_OPAQUE(last.dentry->d_inode))
-			break;
 	}
 
 	if (last.dentry != path->dentry)
@@ -839,8 +868,7 @@ static int real_lookup_union(struct nameidata *nd, struct qstr *name,
 
 	/* only directories can be part of a union stack */
 	if (!path->dentry->d_inode ||
-	    !S_ISDIR(path->dentry->d_inode->i_mode) ||
-	    IS_OPAQUE(path->dentry->d_inode))
+	    !S_ISDIR(path->dentry->d_inode->i_mode))
 		goto out;
 
 	/* Build the union stack for this part */
@@ -1125,7 +1153,7 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
 {
 	int err;
 
-	if (IS_MNT_UNION(nd->path.mnt) && !IS_OPAQUE(nd->path.dentry->d_inode))
+	if (IS_MNT_UNION(nd->path.mnt))
 		goto need_union_lookup;
 
 	path->dentry = __d_lookup(nd->path.dentry, name);
@@ -1622,6 +1650,9 @@ static int __hash_lookup_topmost(struct nameidata *nd, struct qstr *name,
 	if (path->dentry->d_inode || d_is_whiteout(path->dentry))
 		return 0;
 
+	if (IS_OPAQUE(nd->path.dentry->d_inode) && !d_is_fallthru(path->dentry))
+		return 0;
+
 	while (follow_union_down(&nd->path.mnt, &nd->path.dentry)) {
 		name->hash = full_name_hash(name->name, name->len);
 		if (nd->path.dentry->d_op && nd->path.dentry->d_op->d_hash) {
@@ -1644,6 +1675,9 @@ static int __hash_lookup_topmost(struct nameidata *nd, struct qstr *name,
 			goto out;
 		}
 
+		if (IS_OPAQUE(nd->path.dentry->d_inode) && !d_is_fallthru(next.dentry))
+			goto out;
+
 		dput(next.dentry);
 	}
 out:
@@ -1698,9 +1732,6 @@ static int __hash_lookup_build_union(struct nameidata *nd, struct qstr *name,
 			path_put(&last);
 		last.dentry = next.dentry;
 		last.mnt = mntget(next.mnt);
-
-		if (IS_OPAQUE(last.dentry->d_inode))
-			break;
 	}
 
 	if (last.dentry != path->dentry)
-- 
1.6.1.3


* [PATCH 30/32] union mount: ext2 fallthru support
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (28 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 29/32] union mount: Support for fallthru entries in union mount lookup Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:32   ` Andreas Dilger
  2009-05-18 16:09 ` [PATCH 31/32] union-mount: tmpfs " Jan Blunck
                   ` (5 subsequent siblings)
  35 siblings, 1 reply; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

From: Valerie Aurora (Henson) <vaurora@redhat.com>

Add support for fallthru directory entries to ext2.
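
The key on-disk rule is that an entry with inode 0 is normally free space,
except when its file type marks it as a whiteout or a fallthru. A simplified
userspace sketch of the matching logic follows; the sk_* names are
hypothetical, and the numeric type values are an assumption (the two new
types placed directly after the symlink type).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: the two new types follow the symlink type (7). */
enum {
	SK_FT_UNKNOWN	= 0,
	SK_FT_REG_FILE	= 1,
	SK_FT_WHT	= 8,
	SK_FT_FALLTHRU	= 9
};

/* Simplified stand-in for an ext2 directory entry. */
struct sk_ext2_dirent {
	uint32_t inode;		/* 0 for unused, whiteout and fallthru */
	uint8_t name_len;
	uint8_t file_type;
	char name[255];
};

/*
 * Returns 1 if the entry is live and carries the given name.
 * An entry with inode 0 is only live when it is a whiteout or
 * a fallthru.
 */
int sk_match(size_t len, const char *name, const struct sk_ext2_dirent *de)
{
	assert(de != NULL);

	if (len != de->name_len)
		return 0;
	if (!de->inode && de->file_type != SK_FT_WHT &&
	    de->file_type != SK_FT_FALLTHRU)
		return 0;
	return memcmp(name, de->name, len) == 0;
}
```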

Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
Signed-off-by: Jan Blunck <jblunck@suse.de>
---
 fs/ext2/dir.c           |   86 ++++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/ext2.h          |    1 +
 fs/ext2/namei.c         |   20 +++++++++++
 include/linux/ext2_fs.h |    1 +
 4 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 5b499ad..b9380fe 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -219,7 +219,8 @@ static inline int ext2_match (int len, const char * const name,
 {
 	if (len != de->name_len)
 		return 0;
-	if (!de->inode && (de->file_type != EXT2_FT_WHT))
+	if (!de->inode && ((de->file_type != EXT2_FT_WHT) &&
+			   (de->file_type != EXT2_FT_FALLTHRU)))
 		return 0;
 	return !memcmp(name, de->name, len);
 }
@@ -256,6 +257,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
 	[EXT2_FT_SOCK]		= DT_SOCK,
 	[EXT2_FT_SYMLINK]	= DT_LNK,
 	[EXT2_FT_WHT]		= DT_WHT,
+	[EXT2_FT_FALLTHRU]	= DT_UNKNOWN,
 };
 
 #define S_SHIFT 12
@@ -342,6 +344,18 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
 					ext2_put_page(page);
 					return 0;
 				}
+			} else if (de->file_type == EXT2_FT_FALLTHRU) {
+				int over;
+				unsigned char d_type = DT_UNKNOWN;
+
+				offset = (char *)de - kaddr;
+				over = filldir(dirent, de->name, de->name_len,
+						(n<<PAGE_CACHE_SHIFT) | offset,
+						123, d_type);
+				if (over) {
+					ext2_put_page(page);
+					return 0;
+				}
 			}
 			filp->f_pos += ext2_rec_len_from_disk(de->rec_len);
 		}
@@ -463,6 +477,10 @@ ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
 			spin_lock(&dentry->d_lock);
 			dentry->d_flags |= DCACHE_WHITEOUT;
 			spin_unlock(&dentry->d_lock);
+		} else if(!res && de->file_type == EXT2_FT_FALLTHRU) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_FALLTHRU;
+			spin_unlock(&dentry->d_lock);
 		}
 		ext2_put_page(page);
 	}
@@ -531,6 +549,7 @@ static ext2_dirent * ext2_append_entry(struct dentry * dentry,
 				de->name_len = 0;
 				de->rec_len = ext2_rec_len_to_disk(chunk_size);
 				de->inode = 0;
+				de->file_type = 0;
 				goto got_it;
 			}
 			if (de->rec_len == 0) {
@@ -544,6 +563,7 @@ static ext2_dirent * ext2_append_entry(struct dentry * dentry,
 			name_len = EXT2_DIR_REC_LEN(de->name_len);
 			rec_len = ext2_rec_len_from_disk(de->rec_len);
 			if (!de->inode && (de->file_type != EXT2_FT_WHT) &&
+			    (de->file_type != EXT2_FT_FALLTHRU) &&
 			    (rec_len >= reclen))
 				goto got_it;
 			if (rec_len >= name_len + reclen)
@@ -586,7 +606,8 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 
 	err = -EEXIST;
 	if (ext2_match (namelen, name, de)) {
-		if (de->file_type == EXT2_FT_WHT)
+		if ((de->file_type == EXT2_FT_WHT) ||
+		    (de->file_type == EXT2_FT_FALLTHRU))
 			goto got_it;
 		goto out_unlock;
 	}
@@ -601,7 +622,8 @@ got_it:
 							&page, NULL);
 	if (err)
 		goto out_unlock;
-	if (de->inode || ((de->file_type == EXT2_FT_WHT) &&
+	if (de->inode || (((de->file_type == EXT2_FT_WHT) ||
+			   (de->file_type == EXT2_FT_FALLTHRU)) &&
 			  !ext2_match (namelen, name, de))) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
@@ -626,6 +648,60 @@ out_unlock:
 }
 
 /*
+ * Create a fallthru entry.
+ */
+int ext2_fallthru_entry (struct inode *dir, struct dentry *dentry)
+{
+	const char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned short rec_len, name_len;
+	ext2_dirent * de;
+	struct page *page;
+	loff_t pos;
+	int err;
+
+	de = ext2_append_entry(dentry, &page);
+	if (IS_ERR(de))
+		return PTR_ERR(de);
+
+	err = -EEXIST;
+	if (ext2_match (namelen, name, de))
+		goto out_unlock;
+
+	name_len = EXT2_DIR_REC_LEN(de->name_len);
+	rec_len = ext2_rec_len_from_disk(de->rec_len);
+
+	pos = page_offset(page) +
+		(char*)de - (char*)page_address(page);
+	err = __ext2_write_begin(NULL, page->mapping, pos, rec_len, 0,
+							&page, NULL);
+	if (err)
+		goto out_unlock;
+	if (de->inode || (de->file_type == EXT2_FT_WHT) ||
+	    (de->file_type == EXT2_FT_FALLTHRU)) {
+		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
+		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
+		de->rec_len = ext2_rec_len_to_disk(name_len);
+		de = de1;
+	}
+	de->name_len = namelen;
+	memcpy(de->name, name, namelen);
+	de->inode = 0;
+	de->file_type = EXT2_FT_FALLTHRU;
+	err = ext2_commit_chunk(page, pos, rec_len);
+	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
+	mark_inode_dirty(dir);
+	/* OFFSET_CACHE */
+out_put:
+	ext2_put_page(page);
+	return err;
+out_unlock:
+	unlock_page(page);
+	goto out_put;
+}
+
+/*
  * ext2_delete_entry deletes a directory entry by merging it with the
  * previous entry. Page is up-to-date. Releases the page.
  */
@@ -710,7 +786,9 @@ int ext2_whiteout_entry (struct inode * dir, struct dentry * dentry,
 	 */
 	if (ext2_match (namelen, name, de))
 		de->inode = 0;
-	if (de->inode || (de->file_type == EXT2_FT_WHT)) {
+	if (de->inode || (((de->file_type == EXT2_FT_WHT) ||
+			   (de->file_type == EXT2_FT_FALLTHRU)) &&
+			  !ext2_match (namelen, name, de))) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
 		de->rec_len = ext2_rec_len_to_disk(name_len);
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index ec9a0bd..363b0fa 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -112,6 +112,7 @@ extern struct ext2_dir_entry_2 * ext2_find_entry (struct inode *,struct qstr *,
 extern int ext2_delete_entry (struct ext2_dir_entry_2 *, struct page *);
 extern int ext2_whiteout_entry (struct inode *, struct dentry *,
 				struct ext2_dir_entry_2 *, struct page *);
+extern int ext2_fallthru_entry (struct inode *, struct dentry *);
 extern int ext2_empty_dir (struct inode *);
 extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *);
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 58107ff..5bdf990 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -325,6 +325,7 @@ static int ext2_whiteout(struct inode *dir, struct dentry *dentry,
 		goto out;
 
 	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags &= ~DCACHE_FALLTHRU;
 	new_dentry->d_flags |= DCACHE_WHITEOUT;
 	spin_unlock(&new_dentry->d_lock);
 	d_add(new_dentry, NULL);
@@ -343,6 +344,24 @@ out:
 	return err;
 }
 
+/*
+ * Create a fallthru entry.
+ */
+static int ext2_fallthru (struct inode *dir, struct dentry *dentry)
+{
+	int err;
+
+	err = ext2_fallthru_entry(dir, dentry);
+	if (err)
+		return err;
+
+	d_instantiate(dentry, NULL);
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+	return 0;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct inode * new_dir,	struct dentry * new_dentry )
 {
@@ -438,6 +457,7 @@ const struct inode_operations ext2_dir_inode_operations = {
 	.rmdir		= ext2_rmdir,
 	.mknod		= ext2_mknod,
 	.whiteout	= ext2_whiteout,
+	.fallthru	= ext2_fallthru,
 	.rename		= ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index bd10826..f6b68ec 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -577,6 +577,7 @@ enum {
 	EXT2_FT_SOCK,
 	EXT2_FT_SYMLINK,
 	EXT2_FT_WHT,
+	EXT2_FT_FALLTHRU,
 	EXT2_FT_MAX
 };
 
-- 
1.6.1.3


* [PATCH 31/32] union-mount: tmpfs fallthru support
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (29 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 30/32] union mount: ext2 fallthru support Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 16:09 ` [PATCH 32/32] union-mount: Copy up directory entries on first readdir() Jan Blunck
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

From: Valerie Aurora (Henson) <vaurora@redhat.com>

Add support for fallthru entries to tmpfs.

Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/dcache.c |    4 ++-
 fs/libfs.c  |   18 ++++++++++++++--
 mm/shmem.c  |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 844a76a..2d4c24e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2305,7 +2305,9 @@ resume:
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
-		if (d_unhashed(dentry)||!dentry->d_inode)
+		if (d_unhashed(dentry)||(!dentry->d_inode &&
+					 !d_is_whiteout(dentry) &&
+					 !d_is_fallthru(dentry)))
 			continue;
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
diff --git a/fs/libfs.c b/fs/libfs.c
index 49b4409..a362077 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -131,6 +131,7 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
 	struct dentry *cursor = filp->private_data;
 	struct list_head *p, *q = &cursor->d_u.d_child;
 	ino_t ino;
+	int d_type;
 	int i = filp->f_pos;
 
 	switch (i) {
@@ -156,14 +157,25 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
 			for (p=q->next; p != &dentry->d_subdirs; p=p->next) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_u.d_child);
-				if (d_unhashed(next) || !next->d_inode)
+				if (d_unhashed(next) || (!next->d_inode && !d_is_fallthru(next)))
 					continue;
 
+				if (d_is_fallthru(next)) {
+					/* XXX Make up things we can
+					 * only get out of the inode.
+					 * Should probably really do a
+					 * lookup instead. */
+					ino = 100; /* XXX Made up number of no significance */
+					d_type = DT_UNKNOWN;
+				} else {
+					ino = next->d_inode->i_ino;
+					d_type = dt_type(next->d_inode);
+				}
+
 				spin_unlock(&dcache_lock);
 				if (filldir(dirent, next->d_name.name, 
 					    next->d_name.len, filp->f_pos, 
-					    next->d_inode->i_ino, 
-					    dt_type(next->d_inode)) < 0)
+					    ino, d_type) < 0)
 					return 0;
 				spin_lock(&dcache_lock);
 				/* next is still alive */
diff --git a/mm/shmem.c b/mm/shmem.c
index b2e3904..f8284ea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1778,8 +1778,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
 static int shmem_unlink(struct inode *dir, struct dentry *dentry);
 
 /*
- * This is the whiteout support for tmpfs. It uses one singleton whiteout
- * inode per superblock thus it is very similar to shmem_link().
+ * Create a dentry to signify a whiteout.
  */
 static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 			  struct dentry *new_dentry)
@@ -1810,8 +1809,8 @@ static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 		spin_unlock(&sbinfo->stat_lock);
 	}
 
-	if (old_dentry->d_inode) {
-		if (S_ISDIR(old_dentry->d_inode->i_mode))
+	if (old_dentry->d_inode || d_is_fallthru(old_dentry)) {
+		if (old_dentry->d_inode && S_ISDIR(old_dentry->d_inode->i_mode))
 			shmem_rmdir(dir, old_dentry);
 		else
 			shmem_unlink(dir, old_dentry);
@@ -1828,6 +1827,48 @@ static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 }
 
 static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
+				struct inode *inode);
+
+/*
+ * Create a dentry to signify a fallthru.  A fallthru lets us read the
+ * low-level dentries into the dcache once on the first readdir() and
+ * then
+ */
+static int shmem_fallthru(struct inode *dir, struct dentry *dentry)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+
+	/* FIXME: this is stupid */
+	if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+		return -EPERM;
+
+	if (dentry->d_inode || d_is_fallthru(dentry) || d_is_whiteout(dentry))
+		return -EEXIST;
+
+	/*
+	 * Each new link needs a new dentry, pinning lowmem, and tmpfs
+	 * dentries cannot be pruned until they are unlinked.
+	 */
+	if (sbinfo->max_inodes) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return -ENOSPC;
+		}
+		sbinfo->free_inodes--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+
+	shmem_d_instantiate(dir, dentry, NULL);
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+	return 0;
+}
+
+static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
 				struct inode *inode)
 {
 	if (d_is_whiteout(dentry)) {
@@ -1835,14 +1876,15 @@ static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
 		shmem_free_inode(dir->i_sb);
 		if (S_ISDIR(inode->i_mode))
 			inode->i_mode |= S_OPAQUE;
+	} else if (d_is_fallthru(dentry)) {
+		shmem_free_inode(dir->i_sb);
 	} else {
 		/* New dentry */
 		dir->i_size += BOGO_DIRENT_SIZE;
 		dget(dentry); /* Extra count - pin the dentry in core */
 	}
-	/* Will clear DCACHE_WHITEOUT flag */
+	/* Will clear DCACHE_WHITEOUT and DCACHE_FALLTHRU flags */
 	d_instantiate(dentry, inode);
-
 }
 /*
  * File creation. Allocate an inode, and we're done..
@@ -1928,7 +1970,8 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (d_is_whiteout(dentry) || (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
+	if (d_is_whiteout(dentry) || d_is_fallthru(dentry) ||
+	    (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
 		shmem_free_inode(dir->i_sb);
 
 	if (inode) {
@@ -1954,7 +1997,8 @@ static void shmem_dir_unlink_whiteouts(struct inode *dir, struct dentry *dentry)
 		spin_lock(&dcache_lock);
 		list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
 			spin_lock(&child->d_lock);
-			if (d_is_whiteout(child)) {
+			/* Unlink fallthrus too */
+			if (d_is_whiteout(child) || d_is_fallthru(child)) {
 				__d_drop(child);
 				if (!list_empty(&child->d_lru)) {
 					list_del(&child->d_lru);
@@ -2569,6 +2613,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
 	.whiteout       = shmem_whiteout,
+	.fallthru       = shmem_fallthru,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
-- 
1.6.1.3

* [PATCH 32/32] union-mount: Copy up directory entries on first readdir()
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (30 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 31/32] union-mount: tmpfs " Jan Blunck
@ 2009-05-18 16:09 ` Jan Blunck
  2009-05-18 20:40 ` [PATCH] Userland for VFS based Union Mount (V3) Valerie Aurora
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-18 16:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: viro, bharata, dwmw2, mszeredi, vaurora

From: Valerie Aurora (Henson) <vaurora@redhat.com>

readdir() in union mounts is implemented by copying up all visible
directory entries from the lower level directories to the topmost
directory.  Directory entries that refer to lower level file system
objects are marked as "fallthru" in the topmost directory.

Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 fs/readdir.c          |   16 +++++
 fs/union.c            |  166 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/union.h |    2 +
 3 files changed, 184 insertions(+), 0 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index 3a48491..903104a 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -16,6 +16,7 @@
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/unistd.h>
+#include <linux/union.h>
 
 #include <asm/uaccess.h>
 
@@ -36,9 +37,24 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
 
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
+		/*
+		 * XXX Think harder about locking for
+		 * union_copyup_dir.  Currently we lock the topmost
+		 * directory and hold that lock while sequentially
+		 * acquiring and dropping locks for the directories
+		 * below this one in the union stack.
+		 */
+		if (is_unionized(file->f_path.dentry, file->f_path.mnt) &&
+		    !IS_OPAQUE(inode)) {
+			res = union_copyup_dir(&file->f_path);
+			if (res)
+				goto out_unlock;
+		}
+
 		res = file->f_op->readdir(file, buf, filler);
 		file_accessed(file);
 	}
+out_unlock:
 	mutex_unlock(&inode->i_mutex);
 out:
 	return res;
diff --git a/fs/union.c b/fs/union.c
index d21fe5f..0c3c000 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2007-2009 Novell Inc.
  *
  *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *              Valerie Aurora <vaurora@redhat.com>
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the Free
@@ -780,3 +781,168 @@ void detach_mnt_union(struct vfsmount *mnt)
 	union_put(um);
 	return;
 }
+
+/**
+ * union_copyup_dir_one - copy up a single directory entry
+ *
+ * Individual directory entry copyup function for union_copyup_dir.
+ * We get the entries from higher level layers first.
+ */
+
+static int union_copyup_dir_one(void *buf, const char *name, int namlen,
+				loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct dentry *topmost_dentry = (struct dentry *) buf;
+	struct dentry *dentry;
+	int err = 0;
+
+	switch (namlen) {
+	case 2:
+		if (name[1] != '.')
+			break;
+	case 1:
+		if (name[0] != '.')
+			break;
+		return 0;
+	}
+
+	/* Lookup this entry in the topmost directory */
+	dentry = lookup_one_len(name, topmost_dentry, namlen);
+
+	if (IS_ERR(dentry)) {
+		printk(KERN_INFO "error looking up %.*s\n", namlen, name);
+		goto out;
+	}
+
+	/*
+	 * If the entry already exists, one of the following is true:
+	 * it was already copied up (due to an earlier lookup), an
+	 * entry with the same name already exists on the topmost file
+	 * system, it is a whiteout, or it is a fallthru.  In each
+	 * case, the top level entry masks any entries from lower file
+	 * systems, so don't copy up this entry.
+	 */
+	if (dentry->d_inode || d_is_whiteout(dentry) ||
+	    d_is_fallthru(dentry)) {
+		printk(KERN_INFO "skipping copy of %s\n", dentry->d_name.name);
+		goto out_dput;
+	}
+
+	/*
+	 * If the entry doesn't exist, create a fallthru entry in the
+	 * topmost file system.  All possible directory types are
+	 * used, so each file system must implement its own way of
+	 * storing a fallthru entry.
+	 */
+	printk(KERN_INFO "creating fallthru for %s\n", dentry->d_name.name);
+	err = topmost_dentry->d_inode->i_op->fallthru(topmost_dentry->d_inode,
+						      dentry);
+	/* FIXME */
+	BUG_ON(err);
+	/*
+	 * At this point, we have a negative dentry marked as fallthru
+	 * in the cache.  We could potentially lookup the entry lower
+	 * level file system and turn this into a positive dentry
+	 * right now, but it is not clear that would be a performance
+	 * win and adds more opportunities to fail.
+	 */
+out_dput:
+	dput(dentry);
+out:
+	return 0;
+}
+
+/**
+ * union_copyup_dir - copy up low-level directory entries to topmost dir
+ *
+ * readdir() is difficult to support on union file systems for two
+ * reasons: We must eliminate duplicates and apply whiteouts, and we
+ * must return something in f_pos that lets us restart in the same
+ * place when we return.  Our solution is to, on first readdir() of
+ * the directory, copy up all visible entries from the low-level file
+ * systems and mark the entries that refer to low-level file system
+ * objects as "fallthru" entries.
+ */
+
+int union_copyup_dir(struct path *topmost_path)
+{
+	struct dentry *topmost_dentry = topmost_path->dentry;
+	struct path path = *topmost_path;
+	int res = 0;
+
+	/*
+	 * Skip opaque dirs.
+	 */
+	if (IS_OPAQUE(topmost_dentry->d_inode))
+		return 0;
+
+	/*
+	 * Mark this dir opaque to show that we have already copied up
+	 * the lower entries.  Only fallthru entries pass through to
+	 * the underlying file system.
+	 *
+	 * XXX Deal with the lower file system changing.  This could
+	 * be through running a tool over the top level file system to
+	 * make directories transparent again, or we could check the
+	 * mtime of the underlying directory.
+	 */
+
+	topmost_dentry->d_inode->i_flags |= S_OPAQUE;
+	mark_inode_dirty(topmost_dentry->d_inode);
+
+	/*
+	 * Loop through each dir on each level copying up the entries
+	 * to the topmost.
+	 */
+
+	/* Don't drop the caller's reference to the topmost path */
+	path_get(&path);
+	while (follow_union_down(&path.mnt, &path.dentry)) {
+		struct file * ftmp;
+		struct inode * inode;
+
+		/* XXX Permit fallthrus on lower-level? Would need to
+		 * pass in opaque flag to union_copyup_dir_one() and
+		 * only copy up fallthru entries there.  We allow
+		 * fallthrus in lower level opaque directories on
+		 * lookup, so for consistency we should do one or the
+		 * other in both places. */
+		if (IS_OPAQUE(path.dentry->d_inode))
+			break;
+
+		/* dentry_open() doesn't get a path reference itself */
+		path_get(&path);
+		ftmp = dentry_open(path.dentry, path.mnt,
+				   O_RDONLY | O_DIRECTORY | O_NOATIME,
+				   current_cred());
+		if (IS_ERR(ftmp)) {
+			printk (KERN_ERR "unable to open dir %s for "
+				"directory copyup: %ld\n",
+				path.dentry->d_name.name, PTR_ERR(ftmp));
+			continue;
+		}
+
+		inode = path.dentry->d_inode;
+		mutex_lock(&inode->i_mutex);
+
+		res = -ENOENT;
+		if (IS_DEADDIR(inode))
+			goto out_fput;
+		/*
+		 * Read the whole directory, calling our directory
+		 * entry copyup function on each entry.  Pass in the
+		 * topmost dentry as our private data so we can create
+		 * new entries in the topmost directory.
+		 */
+		res = ftmp->f_op->readdir(ftmp, topmost_dentry,
+					  union_copyup_dir_one);
+out_fput:
+		mutex_unlock(&inode->i_mutex);
+		fput(ftmp);
+
+		if (res)
+			break;
+	}
+	path_put(&path);
+	return res;
+}
diff --git a/include/linux/union.h b/include/linux/union.h
index 405baa9..a0656b3 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -57,6 +57,7 @@ extern struct dentry *union_create_topmost(struct nameidata *, struct qstr *,
 					   struct path *);
 extern int __union_copyup(struct path *, struct nameidata *, struct path *);
 extern int union_copyup(struct nameidata *, int);
+extern int union_copyup_dir(struct path *path);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -74,6 +75,7 @@ extern int union_copyup(struct nameidata *, int);
 #define union_create_topmost(x, y, z)	({ BUG(); (NULL); })
 #define __union_copyup(x, y, z)		({ BUG(); (0); })
 #define union_copyup(x, y)		({ (0); })
+#define union_copyup_dir(x)		({ BUG(); (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.1.3


* Re: [PATCH 30/32] union mount: ext2 fallthru support
  2009-05-18 16:09 ` [PATCH 30/32] union mount: ext2 fallthru support Jan Blunck
@ 2009-05-18 16:32   ` Andreas Dilger
  2009-05-19  9:42     ` Jan Blunck
  0 siblings, 1 reply; 68+ messages in thread
From: Andreas Dilger @ 2009-05-18 16:32 UTC (permalink / raw)
  To: Jan Blunck
  Cc: viro, bharata, dwmw2, mszeredi, vaurora, linux-kernel, linux-fsdevel

On May 18, 2009  18:09 +0200, Jan Blunck wrote:
> diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
> index bd10826..f6b68ec 100644
> --- a/include/linux/ext2_fs.h
> +++ b/include/linux/ext2_fs.h
> @@ -577,6 +577,7 @@ enum {
>  	EXT2_FT_SOCK,
>  	EXT2_FT_SYMLINK,
>  	EXT2_FT_WHT,
> +	EXT2_FT_FALLTHRU,
>  	EXT2_FT_MAX

The EXT2_FT_WHT is not declared in e2fsprogs::lib/ext2fs/ext2_fs.h
so you risk hitting a conflict here if someone isn't looking at
the "should be left alone" ext2 code.  Secondly, it is somewhat
dangerous to use a straight enum here, because this will reassign
values of later variables if one of the earlier ones is removed.

For enums like this that require specific constant on-disk values
I prefer being safe:

enum {
	EXT2_FT_UNKNOWN  = 0,
	EXT2_FT_REG_FILE = 1,
	EXT2_FT_DIR      = 2,
	EXT2_FT_CHRDEV   = 3,
	EXT2_FT_BLKDEV   = 4,
	EXT2_FT_FIFO     = 5,
	EXT2_FT_SOCK     = 6,
	EXT2_FT_SYMLINK  = 7,
	EXT2_FT_WHT      = 8,
	EXT2_FT_FALLTHRU = 9,
	EXT2_FT_MAX
};


It probably also makes sense to include a patch for ext3/ext4 to ensure
these values are not used by some unrelated feature.
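
The renumbering hazard can be sketched in plain C11. The type and constant
names below (ft_implicit, FTI_*, FTT_*, FTE_*) are invented for illustration
and are not ext2 identifiers. The sketch shows how dropping one entry from an
implicitly numbered list shifts every later value, while an explicitly
numbered list stays stable.

```c
#include <assert.h>

/* Hypothetical file-type list with implicit numbering. */
enum ft_implicit {
	FTI_UNKNOWN,
	FTI_REG_FILE,
	FTI_DIR,
	FTI_WHT,	/* 3 */
	FTI_FALLTHRU,	/* 4 */
	FTI_MAX
};

/* The same list after FTI_WHT is dropped: later values shift down. */
enum ft_trimmed {
	FTT_UNKNOWN,
	FTT_REG_FILE,
	FTT_DIR,
	FTT_FALLTHRU,	/* now 3, the value the whiteout type used to have */
	FTT_MAX
};

/* Explicit numbering: retiring an entry leaves the others untouched. */
enum ft_explicit {
	FTE_UNKNOWN  = 0,
	FTE_REG_FILE = 1,
	FTE_DIR      = 2,
	/* 3 is retired and never reused */
	FTE_FALLTHRU = 4,
	FTE_MAX
};

/* Compile-time guards that pin the values stored on disk. */
static_assert(FTE_FALLTHRU == 4, "on-disk value must not change");
static_assert((int)FTT_FALLTHRU == (int)FTI_WHT,
	      "implicit list was silently renumbered");

/* Returns 1 if an on-disk type byte is one of the explicit types. */
int ft_explicit_known(unsigned char on_disk)
{
	return on_disk == FTE_UNKNOWN || on_disk == FTE_REG_FILE ||
	       on_disk == FTE_DIR || on_disk == FTE_FALLTHRU;
}
```

With explicit values, a directory entry written before the change still
decodes to the same type afterwards, which is the property an on-disk format
needs.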

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

* [PATCH] Userland for VFS based Union Mount (V3)
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (31 preceding siblings ...)
  2009-05-18 16:09 ` [PATCH 32/32] union-mount: Copy up directory entries on first readdir() Jan Blunck
@ 2009-05-18 20:40 ` Valerie Aurora
  2009-05-21 13:53   ` Andreas Dilger
  2009-05-19  9:48 ` [PATCH 00/32] " Miklos Szeredi
                   ` (2 subsequent siblings)
  35 siblings, 1 reply; 68+ messages in thread
From: Valerie Aurora @ 2009-05-18 20:40 UTC (permalink / raw)
  To: Jan Blunck; +Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

The VFS union mount patches require some changes to util-linux and
e2fsprogs to support the union mount option and the ext2 whiteout
feature flag.  We are not submitting them for formal review at this
time, but the patches are below for quick reference.  They are also
available in git repos linked to from the Union Mount HOWTO page:

http://valerieaurora.org/union/
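
As a rough sketch of what the util-linux change amounts to, the following
stand-alone C models a table that maps option names to mount flag bits. The
names demo_opt, demo_opts and demo_parse_opts are hypothetical and do not
appear in mount.c. The 0x100 value for the union flag is the one proposed in
the patch in this mail; it is an assumption on any kernel without these
patches.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MS_SYNCHRONOUS	0x10	/* same value as MS_SYNCHRONOUS */
#define DEMO_MS_UNION		0x100	/* value proposed by the patch; assumption elsewhere */

/* Hypothetical, simplified stand-in for mount's option table. */
struct demo_opt {
	const char *name;
	unsigned long mask;
};

static const struct demo_opt demo_opts[] = {
	{ "sync",  DEMO_MS_SYNCHRONOUS },
	{ "union", DEMO_MS_UNION },
};

/*
 * Translate a comma-separated option string into a flag word.
 * Options not present in the table are ignored in this sketch.
 */
unsigned long demo_parse_opts(const char *opts)
{
	unsigned long flags = 0;
	const char *p = opts;

	assert(opts != NULL);
	while (*p) {
		size_t len = strcspn(p, ",");
		size_t i;

		for (i = 0; i < sizeof(demo_opts) / sizeof(demo_opts[0]); i++) {
			if (strlen(demo_opts[i].name) == len &&
			    strncmp(p, demo_opts[i].name, len) == 0)
				flags |= demo_opts[i].mask;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return flags;
}
```

The resulting word is what would be handed to mount(2) as its flags argument.
A kernel without the union mount patches does not treat that bit as a union
request.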

-VAL

From f24983a535b99a9f764b01855c2e51fc32984195 Mon Sep 17 00:00:00 2001
From: Valerie Aurora Henson <vaurora@redhat.com>
Date: Sat, 21 Mar 2009 20:56:57 -0700
Subject: [PATCH 1/1] union mount patches from:

ftp://ftp.suse.com/pub/people/jblunck/union-mount/util-linux-2.13-union_mount.diff
---
 mount/mount.c           |    5 +++++
 mount/mount_constants.h |    3 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mount/mount.c b/mount/mount.c
index 9cbc466..9bf766b 100644
--- a/mount/mount.c
+++ b/mount/mount.c
@@ -138,6 +138,7 @@ static const struct opt_map opt_map[] = {
   { "sync",	0, 0, MS_SYNCHRONOUS},	/* synchronous I/O */
   { "async",	0, 1, MS_SYNCHRONOUS},	/* asynchronous I/O */
   { "dirsync",	0, 0, MS_DIRSYNC},	/* synchronous directory modifications */
+  { "union",	0, 0, MS_UNION  },	/* Union mount */
   { "remount",  0, 0, MS_REMOUNT},      /* Alter flags of mounted FS */
   { "bind",	0, 0, MS_BIND   },	/* Remount part of tree elsewhere */
   { "rbind",	0, 0, MS_BIND|MS_REC }, /* Idem, plus mounted subtrees */
@@ -1638,6 +1639,7 @@ static struct option longopts[] = {
 	{ "make-rslave", 0, 0, 141 },
 	{ "make-rprivate", 0, 0, 142 },
 	{ "make-runbindable", 0, 0, 143 },
+	{ "union", 0, 0, 144 },
 	{ "internal-only", 0, 0, 'i' },
 	{ NULL, 0, 0, 0 }
 };
@@ -1929,6 +1931,9 @@ main(int argc, char *argv[]) {
 		case 143:
 			mounttype = (MS_UNBINDABLE | MS_REC);
 			break;
+		case 144: /* union */
+			mounttype = MS_UNION;
+			break;
 
 		case '?':
 		default:
diff --git a/mount/mount_constants.h b/mount/mount_constants.h
index dc3ca27..fb4c663 100644
--- a/mount/mount_constants.h
+++ b/mount/mount_constants.h
@@ -39,6 +39,9 @@ flags had been set; if we have a union with more than one element - fail;
 if we have a stack or plain mount - mount atop of it, forming a stack. */
 #define	MS_OVER		0x200	/* 512 */
 #endif
+#ifndef MS_UNION
+#define MS_UNION	0x100	/* 256: Mount on top of a union */
+#endif
 #ifndef MS_NOATIME
 #define MS_NOATIME	0x400	/* 1024: Do not update access times. */
 #endif
-- 
1.6.0.6

From 548331c4129420bf61b67b8019e99e4799b0c421 Mon Sep 17 00:00:00 2001
From: Valerie Aurora Henson <vaurora@redhat.com>
Date: Sat, 21 Mar 2009 12:51:33 -0700
Subject: [PATCH 1/1] union mount patches from:

ftp://ftp.suse.com/pub/people/jblunck/union-mount/e2fsprogs-1.40.2-whiteout.diff
---
 e2fsck/e2fsck.c      |    1 +
 e2fsck/e2fsck.h      |    1 +
 e2fsck/pass1.c       |    7 +++++++
 e2fsck/pass2.c       |    4 +++-
 e2fsck/util.c        |    3 +++
 lib/e2p/feature.c    |    2 ++
 lib/ext2fs/ext2_fs.h |    7 +++++--
 lib/ext2fs/ext2fs.h  |    6 +++++-
 misc/tune2fs.8.in    |   13 ++++++++++++-
 misc/tune2fs.c       |   26 +++++++++++++++++++++++++-
 10 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index 2ba72c8..c1835ab 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -145,6 +145,7 @@ errcode_t e2fsck_reset_context(e2fsck_t ctx)
 	ctx->fs_total_count = 0;
 	ctx->fs_badblocks_count = 0;
 	ctx->fs_sockets_count = 0;
+	ctx->fs_whiteouts_count = 0;
 	ctx->fs_ind_count = 0;
 	ctx->fs_dind_count = 0;
 	ctx->fs_tind_count = 0;
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 96b83da..9cd017f 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -319,6 +319,7 @@ struct e2fsck_struct {
 	__u32 fs_total_count;
 	__u32 fs_badblocks_count;
 	__u32 fs_sockets_count;
+	__u32 fs_whiteouts_count;
 	__u32 fs_ind_count;
 	__u32 fs_dind_count;
 	__u32 fs_tind_count;
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index bed1ec8..6fbf60f 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -905,6 +905,13 @@ void e2fsck_pass1(e2fsck_t ctx)
 			check_immutable(ctx, &pctx);
 			check_size(ctx, &pctx);
 			ctx->fs_sockets_count++;
+		} else if ((ctx->fs->super->s_feature_incompat &
+			    EXT2_FEATURE_INCOMPAT_WHITEOUT) &&
+			   LINUX_S_ISWHT (inode->i_mode) &&
+			   e2fsck_pass1_check_device_inode(fs, inode)) {
+			check_immutable(ctx, &pctx);
+			check_size(ctx, &pctx);
+			ctx->fs_whiteouts_count++;
 		} else
 			mark_inode_bad(ctx, ino);
 		if (inode->i_block[EXT2_IND_BLOCK])
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 5e088e2..3a6a996 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1210,7 +1210,9 @@ extern int e2fsck_process_bad_inode(e2fsck_t ctx, ext2_ino_t dir,
 	if (!LINUX_S_ISDIR(inode.i_mode) && !LINUX_S_ISREG(inode.i_mode) &&
 	    !LINUX_S_ISCHR(inode.i_mode) && !LINUX_S_ISBLK(inode.i_mode) &&
 	    !LINUX_S_ISLNK(inode.i_mode) && !LINUX_S_ISFIFO(inode.i_mode) &&
-	    !(LINUX_S_ISSOCK(inode.i_mode)))
+	    !LINUX_S_ISSOCK(inode.i_mode) &&
+	    !((ctx->fs->super->s_feature_incompat &
+	       EXT2_FEATURE_INCOMPAT_WHITEOUT) && LINUX_S_ISWHT(inode.i_mode)))
 		problem = PR_2_BAD_MODE;
 	else if (LINUX_S_ISCHR(inode.i_mode)
 		 && !e2fsck_pass1_check_device_inode(fs, &inode))
diff --git a/e2fsck/util.c b/e2fsck/util.c
index f761ebb..5e431f8 100644
--- a/e2fsck/util.c
+++ b/e2fsck/util.c
@@ -500,5 +500,8 @@ int ext2_file_type(unsigned int mode)
 	if (LINUX_S_ISSOCK(mode))
 		return EXT2_FT_SOCK;
 	
+	if (LINUX_S_ISWHT(mode))
+		return EXT2_FT_WHT;
+
 	return 0;
 }
diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
index fe7e65a..c6780d5 100644
--- a/lib/e2p/feature.c
+++ b/lib/e2p/feature.c
@@ -63,6 +63,8 @@ static struct feature feature_list[] = {
 			"extents" },
 	{	E2P_FEATURE_INCOMPAT, EXT2_FEATURE_INCOMPAT_META_BG,
 			"meta_bg" },
+	{       E2P_FEATURE_INCOMPAT, EXT2_FEATURE_INCOMPAT_WHITEOUT,
+			"whiteout" },
 	{	E2P_FEATURE_INCOMPAT, EXT3_FEATURE_INCOMPAT_EXTENTS,
 			"extent" },
 	{	E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_64BIT,
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index a316665..3275c39 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -637,13 +637,15 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_RECOVER		0x0004 /* Needs recovery */
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	0x0008 /* Journal device */
 #define EXT2_FEATURE_INCOMPAT_META_BG		0x0010
+#define EXT2_FEATURE_INCOMPAT_WHITEOUT		0x0020
 #define EXT3_FEATURE_INCOMPAT_EXTENTS		0x0040
 #define EXT4_FEATURE_INCOMPAT_64BIT		0x0080
 #define EXT4_FEATURE_INCOMPAT_MMP		0x0100
 
 
 #define EXT2_FEATURE_COMPAT_SUPP	0
-#define EXT2_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE)
+#define EXT2_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE| \
+					 EXT2_FEATURE_INCOMPAT_WHITEOUT)
 #define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
 					 EXT2_FEATURE_RO_COMPAT_BTREE_DIR)
@@ -705,8 +707,9 @@ struct ext2_dir_entry_2 {
 #define EXT2_FT_FIFO		5
 #define EXT2_FT_SOCK		6
 #define EXT2_FT_SYMLINK		7
+#define EXT2_FT_WHT		8
 
-#define EXT2_FT_MAX		8
+#define EXT2_FT_MAX		9
 
 /*
  * EXT2_DIR_PAD defines the directory entries boundaries
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 7645210..82094d3 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -357,8 +357,9 @@ typedef struct ext2_struct_inode_scan *ext2_inode_scan;
  * non-Linux system.
  */
 #define LINUX_S_IFMT  00170000
+#define LINUX_S_IFWHT  0160000
 #define LINUX_S_IFSOCK 0140000
-#define LINUX_S_IFLNK	 0120000
+#define LINUX_S_IFLNK  0120000
 #define LINUX_S_IFREG  0100000
 #define LINUX_S_IFBLK  0060000
 #define LINUX_S_IFDIR  0040000
@@ -390,6 +391,7 @@ typedef struct ext2_struct_inode_scan *ext2_inode_scan;
 #define LINUX_S_ISBLK(m)	(((m) & LINUX_S_IFMT) == LINUX_S_IFBLK)
 #define LINUX_S_ISFIFO(m)	(((m) & LINUX_S_IFMT) == LINUX_S_IFIFO)
 #define LINUX_S_ISSOCK(m)	(((m) & LINUX_S_IFMT) == LINUX_S_IFSOCK)
+#define LINUX_S_ISWHT(m)	(((m) & LINUX_S_IFMT) == LINUX_S_IFWHT)
 
 /*
  * ext2 size of an inode
@@ -449,12 +451,14 @@ typedef struct ext2_icount *ext2_icount_t;
  #warning "Compression suFEATURE_INCOMPAT_WHITEOUT|\
 					 EXT3_FEATURE_INCOMPAT_JOURNAL_DEV|\
 					 EXT2_FEATURE_INCOMPAT_META_BG|\
 					 EXT3_FEATURE_INCOMPAT_RECOVER)
diff --git a/misc/tune2fs.8.in b/misc/tune2fs.8.in
index 2e617db..b2542a9 100644
--- a/misc/tune2fs.8.in
+++ b/misc/tune2fs.8.in
@@ -392,12 +392,18 @@ option.
 .TP
 .B sparse_super
 Limit the number of backup superblocks to save space on large filesystems.
+.TP
+.B whiteout
+For union mounted filesystems support the whiteout filetype to store metadata
+about removed files in a union.
 .RE
 .IP
 After setting or clearing 
 .B sparse_super
-and 
+,
 .B filetype 
+or
+.B whiteout
 filesystem features,
 .BR e2fsck (8)
 must be run on the filesystem to return the filesystem to a consistent state.
@@ -415,6 +421,11 @@ Linux kernels before 2.0.39 and many 2.1 series kernels do not support
 the filesystems that use any of these features.
 Enabling certain filesystem features may prevent the filesystem from
 being mounted by kernels which do not support those features.
+.IP
+.B Warning:
+Linux kernels without union mount patches do not support the whiteout
+filesystem feature. Enabling this feature prevents the filesystem from
+being mounted by kernels without union mount support.
 .TP
 .BI \-r " reserved-blocks-count"
 Set the number of reserved filesystem blocks.
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 833b994..bad9736 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -97,7 +97,8 @@ static void usage(void)
 static __u32 ok_features[3] = {
 	EXT3_FEATURE_COMPAT_HAS_JOURNAL |
 		EXT2_FEATURE_COMPAT_DIR_INDEX,	/* Compat */
-	EXT2_FEATURE_INCOMPAT_FILETYPE,		/* Incompat */
+	EXT2_FEATURE_INCOMPAT_FILETYPE |
+		EXT2_FEATURE_INCOMPAT_WHITEOUT,	/* Incompat */
 	EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER	/* R/O compat */
 };
 
@@ -283,6 +284,7 @@ static void update_feature_set(ext2_filsys fs, char *features)
 {
 	int sparse, old_sparse, filetype, old_filetype;
 	int journal, old_journal, dxdir, old_dxdir;
+	int whiteout, old_whiteout;
 	struct ext2_super_block *sb= fs->super;
 	__u32	old_compat, old_incompat, old_ro_compat;
 
@@ -298,6 +300,8 @@ static void update_feature_set(ext2_filsys fs, char *features)
 		EXT3_FEATURE_COMPAT_HAS_JOURNAL;
 	old_dxdir = sb->s_feature_compat &
 		EXT2_FEATURE_COMPAT_DIR_INDEX;
+	old_whiteout = sb->s_feature_incompat &
+		EXT2_FEATURE_INCOMPAT_WHITEOUT;
 	if (e2p_edit_feature(features, &sb->s_feature_compat,
 			     ok_features)) {
 		fprintf(stderr, _("Invalid filesystem option set: %s\n"),
@@ -312,6 +316,8 @@ static void update_feature_set(ext2_filsys fs, char *features)
 		EXT3_FEATURE_COMPAT_HAS_JOURNAL;
 	dxdir = sb->s_feature_compat &
 		EXT2_FEATURE_COMPAT_DIR_INDEX;
+	whiteout = sb->s_feature_incompat &
+		EXT2_FEATURE_INCOMPAT_WHITEOUT;
 	if (old_journal && !journal) {
 		if ((mount_flags & EXT2_MF_MOUNTED) &&
 		    !(mount_flags & EXT2_MF_READONLY)) {
@@ -352,6 +358,24 @@ static void update_feature_set(ext2_filsys fs, char *features)
 		if (uuid_is_null((unsigned char *) sb->s_hash_seed))
 			uuid_generate((unsigned char *) sb->s_hash_seed);
 	}
+	if (old_whiteout && !whiteout) {
+		if (mount_flags & EXT2_MF_MOUNTED) {
+			fputs(_("The whiteout flag may only be "
+				"cleared when the filesystem is\n"
+				"unmounted.\n"), stderr);
+			exit(1);
+		}
+		sb->s_state &= ~EXT2_VALID_FS;
+		printf(_("\nWhiteout superblock flag cleared.  %s"),
+		       _(please_fsck));
+	}
+	if (whiteout && !old_whiteout) {
+		//sb->s_feature_incompat |=
+		//	EXT2_FEATURE_INCOMPAT_WHITEOUT;
+		sb->s_state &= ~EXT2_VALID_FS;
+		printf(_("\nWhiteout superblock flag set.  %s"),
+		       _(please_fsck));
+	}
 
 	if (sb->s_rev_level == EXT2_GOOD_OLD_REV &&
 	    (sb->s_feature_compat || sb->s_feature_ro_compat ||
-- 
1.6.0.6



* Re: [PATCH 30/32] union mount: ext2 fallthru support
  2009-05-18 16:32   ` Andreas Dilger
@ 2009-05-19  9:42     ` Jan Blunck
  2009-05-19 14:05       ` Andreas Dilger
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Blunck @ 2009-05-19  9:42 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: viro, bharata, dwmw2, mszeredi, vaurora, linux-kernel, linux-fsdevel

On Mon, May 18, Andreas Dilger wrote:

> For enums like this that require specific constant on-disk values
> I prefer being safe:
> 
> enum {
> 	EXT2_FT_UNKNOWN  = 0,
> 	EXT2_FT_REG_FILE = 1,
> 	EXT2_FT_DIR	 = 2,
> 	EXT2_FT_CHRDEV   = 3,
> 	EXT2_FT_BLKDEV   = 4,
> 	EXT2_FT_FIFO     = 5,
> 	EXT2_FT_SOCK     = 6,
> 	EXT2_FT_SYMLINK  = 7,
> 	EXT2_FT_WHT      = 8,
> 	EXT2_FT_FALLTHRU = 9,
>  	EXT2_FT_MAX
> 
> 
> It probably also makes sense to include a patch for ext3/ext4 to ensure
> these values are not used by some unrelated feature.

In ext3 these are preprocessor defines. IIRC defines and enums are identical
for C (both an int) so I leave this untouched and just add the new filetypes,
right?
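
A small C11 check of that recollection, with made-up names (DEMO_FT_*,
demo_ft, demo_dirent) rather than the real ext3 ones: an integer-literal
macro and an enumeration constant both have type int, and either kind can be
stored in a one-byte type field as long as the values agree.

```c
#include <assert.h>
#include <stdint.h>

/* Preprocessor style (illustrative names, not the ext3 definitions). */
#define DEMO_FT_WHT		8
#define DEMO_FT_FALLTHRU	9

/* Enum style carrying the same explicit values. */
enum demo_ft {
	DEMO_ENUM_FT_WHT      = 8,
	DEMO_ENUM_FT_FALLTHRU = 9,
};

/* Both kinds of constant are plain ints in C. */
static_assert(sizeof(DEMO_FT_WHT) == sizeof(int), "macro literal is an int");
static_assert(sizeof(DEMO_ENUM_FT_WHT) == sizeof(int),
	      "enumeration constant is an int");
static_assert(DEMO_FT_FALLTHRU == DEMO_ENUM_FT_FALLTHRU, "values agree");

/* Simplified stand-in for a directory entry's one-byte type field. */
struct demo_dirent {
	uint8_t file_type;
};

/* Store a type given either as a macro or as an enum constant. */
void demo_set_type(struct demo_dirent *de, int type)
{
	assert(type >= 0 && type <= UINT8_MAX);
	de->file_type = (uint8_t)type;
}
```

The two styles only stay interchangeable while every value is spelled out;
the difference shows up when an implicitly numbered enum is edited.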

Thanks,
Jan

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (32 preceding siblings ...)
  2009-05-18 20:40 ` [PATCH] Userland for VFS based Union Mount (V3) Valerie Aurora
@ 2009-05-19  9:48 ` Miklos Szeredi
  2009-05-19 10:29   ` Jan Blunck
  2009-05-19 17:23   ` Valerie Aurora
  2009-05-21 12:54 ` Jan Rekorajski
  2009-06-04 11:38 ` Scott James Remnant
  35 siblings, 2 replies; 68+ messages in thread
From: Miklos Szeredi @ 2009-05-19  9:48 UTC (permalink / raw)
  To: jblunck
  Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi, vaurora

On Mon, 18 May 2009, Jan Blunck wrote:
> Here is another post of the VFS based union mount implementation.
> 
> Traditionally the mount operation is opaque, which means that the content of
> the mount point, the directory where the file system is mounted on, is hidden
> by the content of the mounted file system's root directory until the file
> system is unmounted again. Unlike the traditional UNIX mount mechanism, that
> hides the contents of the mount point, a union mount presents a view as if
> both filesystems are merged together. Although only the topmost layer of the
> mount stack can be altered, it appears as if transparent file system mounts
> allow any file to be created, modified or deleted.
> 
> Most people know the concepts and features of union mounts from other
> operating systems like Sun's Translucent Filesystem, Plan9 or BSD. For an
> in-depth review of union mounts and other unioning file systems, see:
> 
> http://lwn.net/Articles/324291/
> http://lwn.net/Articles/325369/
> http://lwn.net/Articles/327738/
> 
> Here are the key features of this implementation:
> - completely VFS based
> - does not change the namespace stacking
> - directory listings have duplicate entries removed in the kernel
> - writable unions: only the topmost file system layer may be writable
> - writable unions: new whiteout filetype handled inside the kernel
> 
> Major changes since last post:
> - Updated the whiteout patches:
>   - DCACHE_WHITEOUT flag set on a negative dentry
>   - uses filetype instead of reserved inode number on EXT2
> - Copy-up directories during lookup
> - Implemented fallthru support for in-kernel readdir() as proposed by
>   Valerie Aurora (Henson)

Does this copy up directories persistently?  If so, does this
implementation no longer support a union of all read-only branches?

Thanks,
Miklos

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19  9:48 ` [PATCH 00/32] " Miklos Szeredi
@ 2009-05-19 10:29   ` Jan Blunck
  2009-05-19 10:35     ` Miklos Szeredi
  2009-05-19 17:23   ` Valerie Aurora
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Blunck @ 2009-05-19 10:29 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi, vaurora

On Tue, May 19, Miklos Szeredi wrote:

> > Major changes since last post:
> > - Updated the whiteout patches:
> >   - DCACHE_WHITEOUT flag set on a negative dentry
> >   - uses filetype instead of reserved inode number on EXT2
> > - Copy-up directories during lookup
> > - Implemented fallthru support for in-kernel readdir() as proposed by
> >   Valerie Aurora (Henson)
> 
> Does this copy up directories persistently?  If so, does this
> implementation no longer support a union of all read-only branches?

The directory in the topmost filesystem is created during lookup. The contents
of the directory aren't copied up persistently at that point in time. Therefore
you have an empty directory in the topmost filesystem after the lookup. This
was necessary to get rid of the union_relookup_topmost() calls during create,
mknod, mkdir etc.

When readdir is called, the topmost directory is filled up with fallthru
entries which are persistently stored. This is only necessary to get readdir
right wrt POSIX. During lookup the fallthru dentry, which is in fact a special
negative dentry, is ignored and therefore the lookup continues on the lower
filesystem.
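
The lookup behaviour described above can be modelled in user space. This is
only a sketch under assumptions: the types and functions (demo_kind,
demo_layer, demo_union_lookup) are hypothetical, and the rule that a miss in
an opaque directory ends the search is my reading of the "only fallthru
entries pass through" comment in union_copyup_dir().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_kind { DEMO_REGULAR, DEMO_WHITEOUT, DEMO_FALLTHRU };

struct demo_entry {
	const char *name;
	enum demo_kind kind;
};

/* One directory in the union stack; layer 0 is the topmost. */
struct demo_layer {
	const struct demo_entry *entries;
	size_t count;
	int opaque;	/* set once the lower entries were copied up */
};

static const struct demo_entry *demo_find(const struct demo_layer *layer,
					   const char *name)
{
	size_t i;

	for (i = 0; i < layer->count; i++)
		if (strcmp(layer->entries[i].name, name) == 0)
			return &layer->entries[i];
	return NULL;
}

/*
 * Returns the index of the layer that provides 'name',
 * or -1 if the name is not visible in the union.
 */
int demo_union_lookup(const struct demo_layer *layers, size_t nlayers,
		      const char *name)
{
	size_t i;

	assert(layers != NULL && name != NULL);
	for (i = 0; i < nlayers; i++) {
		const struct demo_entry *e = demo_find(&layers[i], name);

		if (e && e->kind == DEMO_REGULAR)
			return (int)i;	/* real object found here */
		if (e && e->kind == DEMO_WHITEOUT)
			return -1;	/* whiteout hides everything below */
		if (!e && layers[i].opaque)
			return -1;	/* opaque dir: nothing shines through */
		/* fallthru, or a miss in a transparent dir: keep descending */
	}
	return -1;
}
```

A fallthru entry therefore behaves like a marker saying "this name exists,
but look further down for the object itself".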

Hope that clarifies your questions,
Jan

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19 10:29   ` Jan Blunck
@ 2009-05-19 10:35     ` Miklos Szeredi
  2009-05-19 10:39       ` Jan Blunck
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2009-05-19 10:35 UTC (permalink / raw)
  To: jblunck
  Cc: miklos, linux-kernel, linux-fsdevel, viro, bharata, dwmw2,
	mszeredi, vaurora

On Tue, 19 May 2009, Jan Blunck wrote:
> The directory in the topmost filesystem is created during
> lookup. The contents of the directory aren't copied up persistently
> at that point in time. Therefore you have an empty directory in the
> topmost filesystem after the lookup. This was necessary to get rid
> of the union_relookup_topmost() calls during create, mknod, mkdir
> etc.
> 
> When readdir is called, the topmost directory is filled up with
> fallthru entries which are persistently stored. This is only
> necessary to get readdir right wrt POSIX. During lookup the fallthru
> dentry, which is in fact a special negative dentry, is ignored and
> therefore the lookup continues on the lower filesystem.

So this means that the topmost branch always needs to be writable,
right?  It isn't possible to make a union of two iso9660 filesystems,
for example?

Thanks,
Miklos

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19 10:35     ` Miklos Szeredi
@ 2009-05-19 10:39       ` Jan Blunck
  2009-05-19 11:54         ` Arnd Bergmann
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Blunck @ 2009-05-19 10:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi, vaurora

On Tue, May 19, Miklos Szeredi wrote:

> On Tue, 19 May 2009, Jan Blunck wrote:
> > The directory in the topmost filesystem is created during
> > lookup. The contents of the directory aren't copied up persistently
> > at that point in time. Therefore you have an empty directory in the
> > topmost filesystem after the lookup. This was necessary to get rid
> > of the union_relookup_topmost() calls during create, mknod, mkdir
> > etc.
> > 
> > When readdir is called, the topmost directory is filled up with
> > fallthru entries which are persistently stored. This is only
> > necessary to get readdir right wrt POSIX. During lookup the fallthru
> > dentry, which is in fact a special negative dentry, is ignored and
> > therefore the lookup continues on the lower filesystem.
> 
> So this means that the topmost branch always needs to be writable,
> right?  It isn't possible to make a union of two iso9660 filesystems,
> for example?

Exactly. Although, you can do that with the help of tmpfs on top of the two
iso9660 filesystems. Or by adding fake write support to iso9660 ...

I know that this seems to be suboptimal but it is the cost that the POSIX
correct readdir comes with.
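
Note: the lookup rule described above (a fallthru entry is ignored and the
lookup continues on the lower filesystem) can be modelled in user-space C
roughly as below. This is only a sketch under stated assumptions, not code
from the patch series: the types, the names (entry_kind, union_lookup) and
the flat in-memory layers are all hypothetical, and the whiteout case is
included on the assumption that a whiteout hides the name in lower layers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry kinds; illustrative names, not kernel symbols. */
enum entry_kind { ENT_FILE, ENT_WHITEOUT, ENT_FALLTHRU };

struct entry {
	const char *name;
	enum entry_kind kind;
};

struct layer {
	const struct entry *entries;
	size_t count;
};

/* Linear search of one layer; returns NULL if the name is absent. */
static const struct entry *layer_find(const struct layer *l, const char *name)
{
	for (size_t i = 0; i < l->count; i++)
		if (strcmp(l->entries[i].name, name) == 0)
			return &l->entries[i];
	return NULL;
}

/*
 * Walk the stack from the topmost layer (index 0) downwards.
 * Returns the index of the layer that provides the name, or -1.
 */
static int union_lookup(const struct layer *stack, size_t nlayers,
			const char *name)
{
	assert(stack != NULL && name != NULL);
	for (size_t i = 0; i < nlayers; i++) {
		const struct entry *e = layer_find(&stack[i], name);

		if (e == NULL || e->kind == ENT_FALLTHRU)
			continue;	/* nothing here: try the next lower layer */
		if (e->kind == ENT_WHITEOUT)
			return -1;	/* assumed: whiteout hides lower layers */
		return (int)i;
	}
	return -1;
}
```

In this model a fallthru entry behaves exactly like a missing entry during
lookup; it only matters for readdir, which is the point made above.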

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19 10:39       ` Jan Blunck
@ 2009-05-19 11:54         ` Arnd Bergmann
  2009-05-19 12:15           ` Jan Blunck
  0 siblings, 1 reply; 68+ messages in thread
From: Arnd Bergmann @ 2009-05-19 11:54 UTC (permalink / raw)
  To: Jan Blunck
  Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, viro, bharata,
	dwmw2, mszeredi, vaurora

On Tuesday 19 May 2009, Jan Blunck wrote:
> > So this means that the topmost branch always needs to be writable,
> > right?  It isn't possible to make a union of two iso9660 filesystems,
> > for example?
> 
> Exactly. Although, you can do that with the help of tmpfs on top of the two
> iso9660 filesystems.

But how do you get there? You can mount the tmpfs on top of two iso9660
file systems, but it seems that you wouldn't be able to get the two
stacked on top of each other in the first place.

Also, by mounting a tmpfs on top, wouldn't you violate the requirement
for persistent inode numbers again?

> Or by adding fake write support to iso9660 ... 

This would work, but you'd have to do this for each file system if you want
to be able to use it as the top of the union while backed by a read-only
block device or when you don't want it to be written.

	Arnd <><

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19 11:54         ` Arnd Bergmann
@ 2009-05-19 12:15           ` Jan Blunck
  2009-05-19 12:21             ` Arnd Bergmann
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Blunck @ 2009-05-19 12:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, viro, bharata,
	dwmw2, mszeredi, vaurora

On Tue, May 19, Arnd Bergmann wrote:

> On Tuesday 19 May 2009, Jan Blunck wrote:
> > > So this means that the topmost branch always needs to be writable,
> > > right?  It isn't possible to make a union of two iso9660 filesystems,
> > > for example?
> > 
> > Exactly. Although, you can do that with the help of tmpfs on top of the two
> > iso9660 filesystems.
> 
> But how do you get there? You can mount the tmpfs on top of two iso9660
> file systems, but it seems that you wouldn't be able to get the two
> stacked on top of each other in the first place.

Well, at the moment you can stack them but readdir will fail every time you
call it ... I think this is just a question of policy if we want to allow that
or not.

> Also, by mounting a tmpfs on top, wouldn't you violate the requirement
> for persistent inode numbers again?

There is no requirement for persistent unique inode numbers except if you want
to export the union again. This is something that is out of scope of this
implementation. If you are going to export a union mounted filesystem, you
only export the topmost filesystem.

> > Or by adding fake write support to iso9660 ... 
> 
> This would work, but you'd have to do this for each file system if you want
> to be able to use it as the top of the union while backed by a read-only
> block device or when you don't want it to be written.

I know that the requirement for the topmost filesystem to be able to create
directories and fill them with fallthrus is an unattractive one. On the other
hand this is the cost that you have to pay at the moment to get this kind of
functionality. This implementation will not help with all use-cases. Its focus
is to get certain use-cases right.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19 12:15           ` Jan Blunck
@ 2009-05-19 12:21             ` Arnd Bergmann
  2009-05-19 13:10               ` Jan Blunck
  0 siblings, 1 reply; 68+ messages in thread
From: Arnd Bergmann @ 2009-05-19 12:21 UTC (permalink / raw)
  To: Jan Blunck
  Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, viro, bharata,
	dwmw2, mszeredi, vaurora

On Tuesday 19 May 2009, Jan Blunck wrote:
> On Tue, May 19, Arnd Bergmann wrote:
> > This would work, but you'd have to do this for each file system if you want
> > to be able to use it as the top of the union while backed by a read-only
> > block device or when you don't want it to be written.
> 
> I know that the requirement for the topmost filesystem to be able to create
> directories and fill them with fallthrus is an unattractive one. On the other
> hand this is the cost that you have to pay at the moment to get this kind of
> functionality. This implementation will not help with all use-cases. Its focus
> is to get certain use-cases right.

So what would go wrong if you only made them persistent for writable file
systems, but allowed fallthrough dentries to be discarded for read-only
file systems? As long as the lower layers don't change, you should still
be able to reconstruct the same dentries every time you do a readdir, right?

	Arnd <><

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19 12:21             ` Arnd Bergmann
@ 2009-05-19 13:10               ` Jan Blunck
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-19 13:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, viro, bharata,
	dwmw2, mszeredi, vaurora

On Tue, May 19, Arnd Bergmann wrote:

> 
> So what would go wrong if you only made them persistent for writable file
> systems, but allowed fallthrough dentries to be discarded for read-only
> file systems? As long as the lower layers don't change, you should still
> be able to reconstruct the same dentries every time you do a readdir, right?
> 

This can work if you do this when the last file descriptor on the directory is
closed. I have a similar patch around for tmpfs readdir (the SLES8 glibc is
seeking on readdir, so it broke some of our build servers that use tmpfs).

One idea would be to separate the "fallthru in tmpfs" handling into its own
library so that it can be shared by filesystems. This would become something
similar to the On-Disk-Format (ODF) approach that Erez Zadok followed with
UnionFS.
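
Note: Arnd's point that the merged listing can be rebuilt on every readdir
as long as the lower layers do not change can be illustrated with a small
user-space model. This is a sketch only, not the in-kernel readdir: struct
listing and union_readdir are hypothetical names, whiteouts are ignored, and
the duplicate check is a simple quadratic scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One directory's entries in one layer (hypothetical model type). */
struct listing {
	const char **names;
	size_t count;
};

/* Returns 1 if `name` is already among the first `n` output names. */
static int already_listed(const char **out, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(out[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Merge the listings of `nlayers` layers (topmost first) into `out`,
 * skipping names that an upper layer already supplied.  Returns the
 * number of names written, never more than `cap`.
 */
static size_t union_readdir(const struct listing *layers, size_t nlayers,
			    const char **out, size_t cap)
{
	size_t n = 0;

	assert(layers != NULL && out != NULL);
	for (size_t l = 0; l < nlayers; l++) {
		for (size_t i = 0; i < layers[l].count; i++) {
			const char *name = layers[l].names[i];

			if (n < cap && !already_listed(out, n, name))
				out[n++] = name;
		}
	}
	return n;
}
```

Because the result depends only on the layer contents and their order, two
calls over unchanged layers produce the same sequence, which is what a
discard-and-rebuild scheme for read-only top layers would rely on.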

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 30/32] union mount: ext2 fallthru support
  2009-05-19  9:42     ` Jan Blunck
@ 2009-05-19 14:05       ` Andreas Dilger
  2009-05-19 16:13         ` Jan Blunck
  0 siblings, 1 reply; 68+ messages in thread
From: Andreas Dilger @ 2009-05-19 14:05 UTC (permalink / raw)
  To: Jan Blunck
  Cc: viro, bharata, dwmw2, mszeredi, vaurora, linux-kernel, linux-fsdevel

On May 19, 2009  11:42 +0200, Jan Blunck wrote:
> On Mon, May 18, Andreas Dilger wrote:
> > For enums like this that require specific constant on-disk values
> > I prefer being safe:
> > 
> > enum {
> > 	EXT2_FT_UNKNOWN  = 0,
> > 	EXT2_FT_REG_FILE = 1,
> > 	EXT2_FT_DIR	 = 2,
> > 	EXT2_FT_CHRDEV   = 3,
> > 	EXT2_FT_BLKDEV   = 4,
> > 	EXT2_FT_FIFO     = 5,
> > 	EXT2_FT_SOCK     = 6,
> > 	EXT2_FT_SYMLINK  = 7,
> > 	EXT2_FT_WHT      = 8,
> > 	EXT2_FT_FALLTHRU = 9,
> > 	EXT2_FT_MAX
> > };
> > 
> > It probably also makes sense to include a patch for ext3/ext4 to ensure
> > these values are not used by some unrelated feature.
> 
> In ext3 these are preprocessor defines. IIRC defines and enums are identical
> for C (both an int) so I leave this untouched and just add the new filetypes,
> right?

The problem is - what happens if, for whatever reason, EXT2_FT_WHT is
removed?  In a regular enum EXT2_FT_FALLTHRU would get the old value for
EXT2_FT_WHT (=8).  Alternatively, someone might accidentally add a value
before EXT2_FT_WHT because this isn't in the upstream e2fsprogs[*], and
this would push the values of EXT2_FT_WHT and EXT2_FT_FALLTHRU up.

That is why, when using enums for on-disk or "external" interfaces, I
prefer that the values are explicitly specified.  It also makes it more
clear when reading the code that these values are static and should not
be changed, instead of just a grouping of related constants.


[*] should be the canonical resource for new on-disk assignments, IMHO

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 30/32] union mount: ext2 fallthru support
  2009-05-19 14:05       ` Andreas Dilger
@ 2009-05-19 16:13         ` Jan Blunck
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Blunck @ 2009-05-19 16:13 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: viro, bharata, dwmw2, mszeredi, vaurora, linux-kernel, linux-fsdevel

On Tue, May 19, Andreas Dilger wrote:

> On May 19, 2009  11:42 +0200, Jan Blunck wrote:
> > On Mon, May 18, Andreas Dilger wrote:
> > > For enums like this that require specific constant on-disk values
> > > I prefer being safe:
> > > 
> > > enum {
> > > 	EXT2_FT_UNKNOWN  = 0,
> > > 	EXT2_FT_REG_FILE = 1,
> > > 	EXT2_FT_DIR	 = 2,
> > > 	EXT2_FT_CHRDEV   = 3,
> > > 	EXT2_FT_BLKDEV   = 4,
> > > 	EXT2_FT_FIFO     = 5,
> > > 	EXT2_FT_SOCK     = 6,
> > > 	EXT2_FT_SYMLINK  = 7,
> > > 	EXT2_FT_WHT      = 8,
> > > 	EXT2_FT_FALLTHRU = 9,
> > > 	EXT2_FT_MAX
> > > };
> > > 
> > > It probably also makes sense to include a patch for ext3/ext4 to ensure
> > > these values are not used by some unrelated feature.
> > 
> > In ext3 these are preprocessor defines. IIRC defines and enums are identical
> > for C (both an int) so I leave this untouched and just add the new filetypes,
> > right?
> 
> The problem is - what happens if, for whatever reason, EXT2_FT_WHT is
> removed?  In a regular enum EXT2_FT_FALLTHRU would get the old value for
> EXT2_FT_WHT (=8).  Alternatively, someone might accidentally add a value
> before EXT2_FT_WHT because this isn't in the upstream e2fsprogs[*], and
> this would push the values of EXT2_FT_WHT and EXT2_FT_FALLTHRU up.
> 
> That is why, when using enums for on-disk or "external" interfaces, I
> prefer that the values are explicitly specified.  It also makes it more
> clear when reading the code that these values are static and should not
> be changed, instead of just a grouping of related constants.

Yes. I totally understand your concerns. I submitted a patch already. I just
wanted to make clear that ext3 and ext4 define the filetypes by preprocessor
defines. The defines do not have the problem since they explicitly assign
numbers as well.
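
Note: one way to make the "these values are on-disk and must not move" rule
checkable is to pair the explicit values with compile-time assertions. The
sketch below uses the values quoted in this thread and C11 _Static_assert;
it is an illustration, not part of any submitted patch, and in kernel code
the analogous check would normally be written with BUILD_BUG_ON().

```c
#include <assert.h>

/* Values as quoted in the thread; explicit so that removing or inserting
 * an entry cannot silently renumber the ones that follow it. */
enum {
	EXT2_FT_UNKNOWN  = 0,
	EXT2_FT_REG_FILE = 1,
	EXT2_FT_DIR      = 2,
	EXT2_FT_CHRDEV   = 3,
	EXT2_FT_BLKDEV   = 4,
	EXT2_FT_FIFO     = 5,
	EXT2_FT_SOCK     = 6,
	EXT2_FT_SYMLINK  = 7,
	EXT2_FT_WHT      = 8,
	EXT2_FT_FALLTHRU = 9,
	EXT2_FT_MAX		/* one past the last assigned type */
};

/* Fail the build if anyone renumbers the new on-disk types. */
_Static_assert(EXT2_FT_WHT == 8, "EXT2_FT_WHT is an on-disk value");
_Static_assert(EXT2_FT_FALLTHRU == 9, "EXT2_FT_FALLTHRU is an on-disk value");
_Static_assert(EXT2_FT_MAX == 10, "update EXT2_FT_MAX users when adding types");
```

With the assertions in place, the removal scenario described above becomes a
build failure instead of a silent on-disk format change.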

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems
  2009-05-18 16:09 ` [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems Jan Blunck
@ 2009-05-19 16:15   ` Miklos Szeredi
  2009-05-19 17:30     ` Valerie Aurora
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2009-05-19 16:15 UTC (permalink / raw)
  To: jblunck
  Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi, vaurora

On Mon, 18 May 2009, Jan Blunck wrote:
> On union-mounted file systems the lookup function must also visit lower layers
> of the union-stack when doing a lookup. This patch adds support for
> union-mounts to cached lookups and real lookups.
> 
> We have 3 different styles of lookup functions now:
> - multiple pathname components, follow mounts, follow union, follow symlinks
> - single pathname component, doesn't follow mounts, follow union, doesn't
>   follow symlinks
> - single pathname component, doesn't follow mounts, doesn't follow unions,
>   doesn't follow symlinks

Ugh...  I do wonder if this could be done in a less complicated way,
there does seem to be a fair amount of duplication between these
functions.

Worse, it looks like there are still i_mutex lock ordering issues
(__hash_lookup_topmost()/__hash_lookup_build_union()).  What happens
if two separate unions of two filesystems are built where the order of
branches is reversed?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19  9:48 ` [PATCH 00/32] " Miklos Szeredi
  2009-05-19 10:29   ` Jan Blunck
@ 2009-05-19 17:23   ` Valerie Aurora
  2009-05-20  9:05     ` Miklos Szeredi
  1 sibling, 1 reply; 68+ messages in thread
From: Valerie Aurora @ 2009-05-19 17:23 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jblunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On Tue, May 19, 2009 at 11:48:00AM +0200, Miklos Szeredi wrote:
> On Mon, 18 May 2009, Jan Blunck wrote:
> > Here is another post of the VFS based union mount implementation.
> > 
> > Traditionally the mount operation is opaque, which means that the content of
> > the mount point, the directory where the file system is mounted on, is hidden
> > by the content of the mounted file system's root directory until the file
> > system is unmounted again. Unlike the traditional UNIX mount mechanism, that
> > hides the contents of the mount point, a union mount presents a view as if
> > both filesystems are merged together. Although only the topmost layer of the
> > mount stack can be altered, it appears as if transparent file system mounts
> > allow any file to be created, modified or deleted.
> > 
> > Most people know the concepts and features of union mounts from other
> > operating systems like Sun's Translucent Filesystem, Plan9 or BSD. For an
> > in-depth review of union mounts and other unioning file systems, see:
> > 
> > http://lwn.net/Articles/324291/
> > http://lwn.net/Articles/325369/
> > http://lwn.net/Articles/327738/
> > 
> > Here are the key features of this implementation:
> > - completely VFS based
> > - does not change the namespace stacking
> > - directory listings have duplicate entries removed in the kernel
> > - writable unions: only the topmost file system layer may be writable
> > - writable unions: new whiteout filetype handled inside the kernel
> > 
> > Major changes since last post:
> > - Updated the whiteout patches:
> >   - DCACHE_WHITEOUT flag set on a negative dentry
> >   - uses filetype instead of reserved inode number on EXT2
> > - Copy-up directories during lookup
> > - Implemented fallthru support for in-kernel readdir() as proposed by
> >   Valerie Aurora (Henson)
> 
> Does this copy up directories persistently?  If so, does this
> implementation no longer support a union of all read-only branches?

As Jan said, readdir() of read-only unioned file systems works with a
tmpfs top layer.  If you think about it, this is the exact equivalent
of the version of union mounts which used the in-kernel caching
approach - except that it's better, because it reuses existing code
and caches between readdir() calls.  Cool, huh?

Do you have ideas for how to do this better?  I particularly would
like to be able to get rid of the tmpfs dentries when no one is
readdir()ing on that directory.

-VAL

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems
  2009-05-19 16:15   ` Miklos Szeredi
@ 2009-05-19 17:30     ` Valerie Aurora
  2009-05-20 10:21       ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Valerie Aurora @ 2009-05-19 17:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jblunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On Tue, May 19, 2009 at 06:15:52PM +0200, Miklos Szeredi wrote:
> On Mon, 18 May 2009, Jan Blunck wrote:
> > On union-mounted file systems the lookup function must also visit lower layers
> > of the union-stack when doing a lookup. This patch adds support for
> > union-mounts to cached lookups and real lookups.
> > 
> > We have 3 different styles of lookup functions now:
> > - multiple pathname components, follow mounts, follow union, follow symlinks
> > - single pathname component, doesn't follow mounts, follow union, doesn't
> >   follow symlinks
> > - single pathname component, doesn't follow mounts, doesn't follow unions,
> >   doesn't follow symlinks
> 
> Ugh...  I do wonder if this could be done in a less complicated way,
> there does seem to be a fair amount of duplication between these
> functions.

Yeah, I agree.  My best idea so far is not very good - have one
skeleton function and pass in function pointers for the
lookup_topmost() and build_union() functions.  Do you have any ideas?

> Worse, it looks like there are still i_mutex lock ordering issues
> (__hash_lookup_topmost()/__hash_lookup_build_union()).  What happens
> if two separate unions of two filesystems are built where the order of
> branches is reversed?

We have a similar problem in union_copyup_dir().  Hm, thinking about
this, only one of the file systems can actually change while we are
doing work.  That might help us get out of the lock ordering problems.
Thoughts?

-VAL
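
Note: the "one skeleton function plus function pointers" idea mentioned
above could look roughly like the following user-space sketch. Everything
here is hypothetical (struct lookup_ctx, the callback typedefs and the stub
callbacks); it only shows the shape of the refactoring, where passing NULL
for build_union gives the "doesn't follow unions" variant.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-lookup state shared by the callbacks. */
struct lookup_ctx {
	int topmost_done;
	int union_built;
};

typedef int (*lookup_topmost_fn)(struct lookup_ctx *ctx);
typedef int (*build_union_fn)(struct lookup_ctx *ctx);

/*
 * Shared skeleton: look up on the topmost layer, then optionally visit
 * the lower layers.  A NULL build_union means "do not follow the union".
 */
static int lookup_skeleton(struct lookup_ctx *ctx,
			   lookup_topmost_fn lookup_topmost,
			   build_union_fn build_union)
{
	int err;

	assert(ctx != NULL && lookup_topmost != NULL);
	err = lookup_topmost(ctx);
	if (err)
		return err;
	if (build_union != NULL)
		return build_union(ctx);
	return 0;
}

/* Stub callbacks standing in for the cached/real lookup variants. */
static int topmost_stub(struct lookup_ctx *ctx)
{
	ctx->topmost_done = 1;
	return 0;
}

static int topmost_fail_stub(struct lookup_ctx *ctx)
{
	(void)ctx;
	return -ENOENT;
}

static int build_union_stub(struct lookup_ctx *ctx)
{
	ctx->union_built = 1;
	return 0;
}
```

The cost of this shape is an indirect call per step, which is one reason it
is described above as "not very good".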

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-19 17:23   ` Valerie Aurora
@ 2009-05-20  9:05     ` Miklos Szeredi
  2009-06-08 19:44       ` Valerie Aurora
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2009-05-20  9:05 UTC (permalink / raw)
  To: vaurora
  Cc: miklos, jblunck, linux-kernel, linux-fsdevel, viro, bharata,
	dwmw2, mszeredi

On Tue, 19 May 2009, Valerie Aurora wrote:
> As Jan said, readdir() of read-only unioned file systems works with a
> tmpfs top layer.  If you think about it, this is the exact equivalent
> of the version of union mounts which used the in-kernel caching
> approach - except that it's better, because it reuses existing code
> and caches between readdir() calls.  Cool, huh?

Yeah... OTOH tmpfs is probably a way too heavyweight solution for
cases where memory is short, and union mounts would typically be used
on such systems.

The big reason why kernel implementation of readdir is hard is that
unswappable kernel memory needs to be used for caching directory
contents while the directory is open.  Well, tmpfs does the same,
dentries and inodes are _not_ swappable, and they gobble up memory.

So where's the advantage over implementing a thin deduplicating and
caching layer for union mounts?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems
  2009-05-19 17:30     ` Valerie Aurora
@ 2009-05-20 10:21       ` Miklos Szeredi
  0 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2009-05-20 10:21 UTC (permalink / raw)
  To: vaurora
  Cc: miklos, jblunck, linux-kernel, linux-fsdevel, viro, bharata,
	dwmw2, mszeredi

On Tue, 19 May 2009, Valerie Aurora wrote:
> On Tue, May 19, 2009 at 06:15:52PM +0200, Miklos Szeredi wrote:

> > Worse, it looks like there are still i_mutex lock ordering issues
> > (__hash_lookup_topmost()/__hash_lookup_build_union()).  What happens
> > if two separate unions of two filesystems are built where the order of
> > branches is reversed?
> 
> We have a similar problem in union_copyup_dir().  Hm, thinking about
> this, only one of the file systems can actually change while we are
> doing work.  That might help us get out of the lock ordering problems.
> Thoughts?

Right, we talked about this with Jan, and came basically to the same
conclusion.  The lookup on the lower branches needs to be separate
from the atomic lookup/create on the top branch.  Which means some
restructuring in the callers...

I'm not sure how all this could be simplified, I don't yet even
understand what all these different lookup functions are meant to do
(header comments might help).

Thanks,
Miklos
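
Note: the conclusion above (keep the lookup on the lower branches separate
from the atomic lookup/create on the top branch) means no code path ever
holds the locks of two branches at once, so reversed branch order in two
unions cannot produce an AB/BA deadlock. The sketch below models that with
pthread mutexes standing in for i_mutex. It is a user-space illustration
with hypothetical names, not the restructuring that was actually
implemented, and the held-lock counters are single-threaded bookkeeping for
the example only.

```c
#include <assert.h>
#include <pthread.h>

struct branch {
	pthread_mutex_t lock;	/* stands in for the directory's i_mutex */
	int has_name;		/* 1 if the looked-up name exists here */
};

static int locks_held;		/* bookkeeping for the sketch only */
static int max_locks_held;

static void branch_lock(struct branch *b)
{
	pthread_mutex_lock(&b->lock);
	if (++locks_held > max_locks_held)
		max_locks_held = locks_held;
}

static void branch_unlock(struct branch *b)
{
	locks_held--;
	pthread_mutex_unlock(&b->lock);
}

/* Step 1: inspect the lower branch on its own, then drop its lock. */
static int lower_has_name(struct branch *lower)
{
	int found;

	branch_lock(lower);
	found = lower->has_name;
	branch_unlock(lower);
	return found;
}

/* Step 2: lookup/create on the top branch under the top lock only.
 * Returns 1 if an entry was created on top, 0 otherwise. */
static int lookup_or_create_top(struct branch *top, struct branch *lower)
{
	int in_lower = lower_has_name(lower);
	int created = 0;

	assert(top != lower);
	branch_lock(top);
	if (!top->has_name && in_lower) {
		top->has_name = 1;	/* e.g. create the directory on top */
		created = 1;
	}
	branch_unlock(top);
	return created;
}
```

Using the lower-branch result after dropping its lock is only safe under
the observation made earlier in the thread that just one of the file
systems can change while the work is being done.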

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (33 preceding siblings ...)
  2009-05-19  9:48 ` [PATCH 00/32] " Miklos Szeredi
@ 2009-05-21 12:54 ` Jan Rekorajski
  2009-06-08 19:57   ` Valerie Aurora
  2009-06-04 11:38 ` Scott James Remnant
  35 siblings, 1 reply; 68+ messages in thread
From: Jan Rekorajski @ 2009-05-21 12:54 UTC (permalink / raw)
  To: Jan Blunck
  Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi, vaurora

On Mon, 18 May 2009, Jan Blunck wrote:

> Here is another post of the VFS based union mount implementation.

Is there any chance this will support NFS? I can union-mount tmpfs over
nfs mounted fs, but if I try to mount --union two NFS filesystems I
always get -EBUSY on second mount on the same mountpoint.

Something along these lines:

It doesn't matter whether I use --union on the first mount; the result is
always the same.

mount <--union> -t nfs server:/export/system /mnt
OK
mount --union -t nfs server:/export/profile /mnt
mount.nfs: /mnt is busy or already mounted

I patched mount.nfs so it knows about MS_UNION, and strace shows me that
it passes that flag to kernel.

Jan
-- 
Jan Rekorajski            |  ALL SUSPECTS ARE GUILTY. PERIOD!
baggins<at>mimuw.edu.pl   |  OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY?
BOFH, MANIAC              |                   -- TROOPS by Kevin Rubio

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] Userland for VFS based Union Mount (V3)
  2009-05-18 20:40 ` [PATCH] Userland for VFS based Union Mount (V3) Valerie Aurora
@ 2009-05-21 13:53   ` Andreas Dilger
  2009-06-18  3:22     ` Valerie Aurora
  0 siblings, 1 reply; 68+ messages in thread
From: Andreas Dilger @ 2009-05-21 13:53 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Jan Blunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On May 18, 2009  16:40 -0400, Valerie Aurora wrote:
> @@ -705,8 +707,9 @@ struct ext2_dir_entry_2 {
>  #define EXT2_FT_FIFO		5
>  #define EXT2_FT_SOCK		6
>  #define EXT2_FT_SYMLINK		7
> +#define EXT2_FT_WHT		8
>  
> -#define EXT2_FT_MAX		8
> +#define EXT2_FT_MAX		9

What about the EXT2_FT_FALLTHRU used in the union mount patches?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 15/32] union-mount: Documentation
  2009-05-18 16:09 ` [PATCH 15/32] union-mount: Documentation Jan Blunck
@ 2009-05-25  6:25   ` hooanon05
  2009-05-25  8:03     ` Arnd Bergmann
  0 siblings, 1 reply; 68+ messages in thread
From: hooanon05 @ 2009-05-25  6:25 UTC (permalink / raw)
  To: Jan Blunck
  Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi, vaurora


Jan Blunck:
> +Rename across different levels of the union is implemented as a copy-up
> +operation for regular files. Rename of directories simply returns EXDEV, the
> +same as if we tried to rename across different mounts. Most applications have
> +to handle this case anyway. Some applications do not expect EXDEV on
> +rename operations within the same directory, but these applications will also
> +be broken with bind mounts.

Is renaming a regular file supported?
According to the change in "[PATCH 24/32] union-mount: in-kernel file
copy between union mounted filesystems", every rename under union seems
to be rejected.

link(2) may be similar.
When a "fileA" exists in the lower fs only, link("fileA", "new_file")
will return EXDEV too.


J. R. Okajima

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 15/32] union-mount: Documentation
  2009-05-25  6:25   ` hooanon05
@ 2009-05-25  8:03     ` Arnd Bergmann
  2009-05-25  8:43       ` hooanon05
  0 siblings, 1 reply; 68+ messages in thread
From: Arnd Bergmann @ 2009-05-25  8:03 UTC (permalink / raw)
  To: hooanon05
  Cc: Jan Blunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2,
	mszeredi, vaurora

On Monday 25 May 2009, hooanon05@yahoo.co.jp wrote:
> Is renaming a regular file supported?
> According to the change in "[PATCH 24/32] union-mount: in-kernel file
> copy between union mounted filesystems", every rename under union seems
> to be rejected.

Right, but that is consistent with how the kernel would treat a
rename from one mount point to another, and tools like 'mv'
can handle this in user space.

	Arnd <><

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 15/32] union-mount: Documentation
  2009-05-25  8:03     ` Arnd Bergmann
@ 2009-05-25  8:43       ` hooanon05
  2009-06-18 19:05         ` Valerie Aurora
  0 siblings, 1 reply; 68+ messages in thread
From: hooanon05 @ 2009-05-25  8:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jan Blunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2,
	mszeredi, vaurora


Arnd Bergmann:
> Right, but that is consistent with how the kernel would treat a
> rename from one mount point to another, and tools like 'mv'
> can handle this in user space.

Yes, that is the description in the union mount document.
While it says that renaming a regular file is implemented, the code
actually differs.
----------------------------------------
+Rename across different levels of the union is implemented as a copy-up
+operation for regular files. Rename of directories simply returns EXDEV, the
+same as if we tried to rename across different mounts. Most applications have
	:::
----------------------------------------
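
Note: for reference, the user-space handling of EXDEV that Arnd alludes to
is the usual "try rename(), fall back to copy and unlink" pattern. The
sketch below is a minimal version for regular files using only standard
rename(), stdio and unlink(); move_file and copy_file are hypothetical
helper names, this is not the actual mv implementation, and it does not
preserve permissions, ownership or timestamps.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
	char buf[4096];
	size_t n;
	int ok = 1;
	FILE *in = fopen(src, "rb");
	FILE *out;

	if (in == NULL)
		return -1;
	out = fopen(dst, "wb");
	if (out == NULL) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			ok = 0;
			break;
		}
	}
	if (ferror(in))
		ok = 0;
	fclose(in);
	if (fclose(out) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Rename src to dst; if the kernel refuses with EXDEV (as it does across
 * mounts), fall back to copying the data and removing the source.
 */
static int move_file(const char *src, const char *dst)
{
	assert(src != NULL && dst != NULL);
	if (rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -1;
	if (copy_file(src, dst) != 0)
		return -1;
	return unlink(src);
}
```

Unlike rename(), the fallback path is not atomic: a crash between the copy
and the unlink leaves both files behind.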

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
                   ` (34 preceding siblings ...)
  2009-05-21 12:54 ` Jan Rekorajski
@ 2009-06-04 11:38 ` Scott James Remnant
  2009-06-09 22:15   ` Valerie Aurora
  35 siblings, 1 reply; 68+ messages in thread
From: Scott James Remnant @ 2009-06-04 11:38 UTC (permalink / raw)
  To: Jan Blunck
  Cc: linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi, vaurora

On Mon, 2009-05-18 at 18:08 +0200, Jan Blunck wrote:

> Here is another post of the VFS based union mount implementation.
> 
Awesome work, this may just get us out of a tight spot with our LiveCD.
A switch to a devmapper/snapshot based implementation lost the property
that we could rsync the filesystem (squashfs has been nicely rsyncable).

What kind of testing do you need doing, and where would you like the bug
reports/patches? :-)

Scott
-- 
Scott James Remnant
scott@ubuntu.com


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-20  9:05     ` Miklos Szeredi
@ 2009-06-08 19:44       ` Valerie Aurora
  2009-06-16 15:19         ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Valerie Aurora @ 2009-06-08 19:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jblunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On Wed, May 20, 2009 at 11:05:27AM +0200, Miklos Szeredi wrote:
> On Tue, 19 May 2009, Valerie Aurora wrote:
> > As Jan said, readdir() of read-only unioned file systems works with a
> > tmpfs top layer.  If you think about it, this is the exact equivalent
> > of the version of union mounts which used the in-kernel caching
> > approach - except that it's better, because it reuses existing code
> > and caches between readdir() calls.  Cool, huh?
> 
> Yeah... OTOH tmpfs is probably a way too heavyweight solution for
> cases where memory is short, and union mounts would typically be used
> on such systems.

(Sorry for the delay - I've been on vacation.)

Hm, my intuition is that a tmpfs mount would be fairly lightweight in
terms of memory - the main overhead over the barebones solution would
be one superblock and vfsmount struct per mount.  What am I missing?

> The big reason why kernel implementation of readdir is hard is that
> unswappable kernel memory needs to be used for caching directory
> contents while the directory is open.  Well, tmpfs does the same,
> dentries and inodes are _not_ swappable, and they gobble up memory.

That's a good point.  It seemed to me that it wouldn't be too
difficult to make those entries evictable - drop a reference count and
set the ->d_release to mark the directory as needing rebuilding.  What
do you think?

> So where's the advantage over implementing a thin deduplicating and
> caching layer for union mounts?
> 
> Thanks,
> Miklos

-VAL

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-05-21 12:54 ` Jan Rekorajski
@ 2009-06-08 19:57   ` Valerie Aurora
  2009-06-08 22:44     ` Jan Rekorajski
  0 siblings, 1 reply; 68+ messages in thread
From: Valerie Aurora @ 2009-06-08 19:57 UTC (permalink / raw)
  To: Jan Rekorajski, Jan Blunck, linux-kernel, linux-fsdevel, viro,
	bharata, dwm

On Thu, May 21, 2009 at 02:54:19PM +0200, Jan Rekorajski wrote:
> On Mon, 18 May 2009, Jan Blunck wrote:
> 
> > Here is another post of the VFS based union mount implementation.
> 
> Is there any chance this will support NFS? I can union-mount tmpfs over

NFS as the read-only layer ought to work.  NFS as the read-write layer
is still up in the air.

> nfs mounted fs, but if I try to mount --union two NFS filesystems I
> always get -EBUSY on second mount on the same mountpoint.
> 
> Something along these lines:
> 
> It doesn't matter whether I use --union on the first mount; the result is
> always the same.
> 
> mount <--union> -t nfs server:/export/system /mnt
> OK
> mount --union -t nfs server:/export/profile /mnt
> mount.nfs: /mnt is busy or already mounted
> 
> I patched mount.nfs so it knows about MS_UNION, and strace shows me that
> it passes that flag to kernel.

FYI, using --union on the first mount will make it union with the
local directory below it.  The --union option is not needed when you
mount the lower read-only layer.

You'll get -EBUSY on the second mount of any NFS file system over
another - try it again with the --union flag.  Support for NFS on NFS
union mount would have to change this.

-VAL

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-06-08 19:57   ` Valerie Aurora
@ 2009-06-08 22:44     ` Jan Rekorajski
  2009-06-08 22:48       ` Valerie Aurora
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Rekorajski @ 2009-06-08 22:44 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Jan Blunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On Mon, 08 Jun 2009, Valerie Aurora wrote:

> On Thu, May 21, 2009 at 02:54:19PM +0200, Jan Rekorajski wrote:
> > On Mon, 18 May 2009, Jan Blunck wrote:
> > 
> > > Here is another post of the VFS based union mount implementation.
> > 
> > Is there any chance this will support NFS? I can union-mount tmpfs over
> 
> NFS as the read-only layer ought to work.  NFS as the read-write layer
> is still up in the air.

As I don't need rw NFS, I didn't even try that :)

> > nfs mounted fs, but if I try to mount --union two NFS filesystems I
> > always get -EBUSY on second mount on the same mountpoint.
> > 
> > Something along these lines:
> > 
> > It doesn't matter whether I use --union on the first mount; the result is
> > always the same.
> > 
> > mount <--union> -t nfs server:/export/system /mnt
> > OK
> > mount --union -t nfs server:/export/profile /mnt
> > mount.nfs: /mnt is busy or already mounted
> > 
> > I patched mount.nfs so it knows about MS_UNION, and strace shows me that
> > it passes that flag to kernel.
> 
> FYI, using --union on the first mount will make it union with the
> local directory below it.  The --union option is not needed when you
> mount the lower read-only layer.

Thanks for the clarification.

> You'll get -EBUSY on the second mount of any NFS file system over
> another - try it again with the --union flag.  Support for NFS on NFS
> union mount would have to change this.

I did just that, --union didn't change standard NFS behaviour.

mount -t nfs server:/export/system /mnt
mount --union -t nfs server:/export/profile /mnt
mount.nfs: /mnt is busy or already mounted

I did an experiment by using a different IP of the server (same machine)
when mounting the second fs; the mount worked then, but 'ls -1 /mnt' oopsed.
I can reproduce this and send you the oops next week.

-- 
Jan Rekorajski            |  ALL SUSPECTS ARE GUILTY. PERIOD!
baggins<at>mimuw.edu.pl   |  OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY?
BOFH, MANIAC              |                   -- TROOPS by Kevin Rubio


* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-06-08 22:44     ` Jan Rekorajski
@ 2009-06-08 22:48       ` Valerie Aurora
  2009-06-15  9:55         ` Jan Rekorajski
  0 siblings, 1 reply; 68+ messages in thread
From: Valerie Aurora @ 2009-06-08 22:48 UTC (permalink / raw)
  To: Jan Rekorajski, Jan Blunck, linux-kernel, linux-fsdevel, viro,
	bharata, dwm

On Tue, Jun 09, 2009 at 12:44:06AM +0200, Jan Rekorajski wrote:
> On Mon, 08 Jun 2009, Valerie Aurora wrote:
> 
> > On Thu, May 21, 2009 at 02:54:19PM +0200, Jan Rekorajski wrote:
> > > On Mon, 18 May 2009, Jan Blunck wrote:
> > > 
> > > > Here is another post of the VFS based union mount implementation.
> > > 
> > > Is there any chance this will support NFS? I can union-mount tmpfs over
> > 
> > NFS as the read-only layer ought to work.  NFS as the read-write layer
> > is still up in the air.
> 
> As I don't need rw NFS, I didn't even try that :)
> 
> > > nfs mounted fs, but if I try to mount --union two NFS filesystems I
> > > always get -EBUSY on second mount on the same mountpoint.
> > > 
> > > Something along these lines:
> > > 
> > > doesn't matter if I use --union on first mount, the result is always the
> > > same.
> > > 
> > > mount <--union> -t nfs server:/export/system /mnt
> > > OK
> > > mount --union -t nfs server:/export/profile /mnt
> > > mount.nfs: /mnt is busy or already mounted
> > > 
> > > I patched mount.nfs so it knows about MS_UNION, and strace shows me that
> > > it passes that flag to kernel.
> > 
> > FYI, using --union on the first mount will make it union with the
> > local directory below it.  The --union option is not needed when you
> > mount the lower read-only layer.
> 
> Thanks for clarification.
> 
> > You'll get -EBUSY on the second mount of any NFS file system over
> > another - try it again with the --union flag.  Support for NFS on NFS
> > union mount would have to change this.
> 
> I did just that, --union didn't change standard NFS behaviour.

Er, excuse me - I meant to type "try it again WITHOUT the --union
flag."  My apologies!

> 
> mount -t nfs server:/export/system /mnt
> mount --union -t nfs server:/export/profile /mnt
> mount.nfs: /mnt is busy or already mounted
> 
> I did an experiment by using different IP of the server (same machine)
> when mounting the second fs, mount worked then, but 'ls -1 /mnt' oopsed.
> I can reproduce this and send you the oops next week.

Interesting!  Does this happen without the --union flag?

-VAL


* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-06-04 11:38 ` Scott James Remnant
@ 2009-06-09 22:15   ` Valerie Aurora
  0 siblings, 0 replies; 68+ messages in thread
From: Valerie Aurora @ 2009-06-09 22:15 UTC (permalink / raw)
  To: Scott James Remnant
  Cc: Jan Blunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On Thu, Jun 04, 2009 at 12:38:50PM +0100, Scott James Remnant wrote:
> On Mon, 2009-05-18 at 18:08 +0200, Jan Blunck wrote:
> 
> > Here is another post of the VFS based union mount implementation.
> > 
> Awesome work, this may just get us out of a tight spot with our LiveCD.
> A switch to a devmapper/snapshot based implementation lost the property
> that we could rsync the filesystem (squashfs has been nicely rsyncable).
> 
> What kind of testing do you need doing, and where would you like the bug
> reports/patches? :-)

Just try to use it to do real work and I'm sure it will break real
good. :) Bug reports and patches to linux-fsdevel@vger.kernel.org and
cc'd to Jan and me.

Thanks!

-VAL


* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-06-08 22:48       ` Valerie Aurora
@ 2009-06-15  9:55         ` Jan Rekorajski
  2009-06-18  3:23           ` Valerie Aurora
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Rekorajski @ 2009-06-15  9:55 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Jan Blunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On Mon, 08 Jun 2009, Valerie Aurora wrote:

> On Tue, Jun 09, 2009 at 12:44:06AM +0200, Jan Rekorajski wrote:
> > On Mon, 08 Jun 2009, Valerie Aurora wrote:
> > 
> > > You'll get -EBUSY on the second mount of any NFS file system over
> > > another - try it again with the --union flag.  Support for NFS on NFS
> > > union mount would have to change this.
> > 
> > I did just that, --union didn't change standard NFS behaviour.
> 
> Er, excuse me - I meant to type "try it again WITHOUT the --union
> flag."  My apologies!

I did, see below.

> > 
> > mount -t nfs server:/export/system /mnt
> > mount --union -t nfs server:/export/profile /mnt
> > mount.nfs: /mnt is busy or already mounted
> > 
> > I did an experiment by using different IP of the server (same machine)
> > when mounting the second fs, mount worked then, but 'ls -1 /mnt' oopsed.
> > I can reproduce this and send you the oops next week.
> 
> Interesting!  Does this happen without the --union flag?

Filesystems are exported ro, all mounts nfs3,ro,tcp.

Without --union:

mount -t nfs 10.1.0.4:/nfs/system /mnt -oro,nolock,vers=3,tcp
OK

mount -t nfs 10.1.0.4:/nfs/profile /mnt -oro,nolock,vers=3,tcp
-EBUSY (as expected)

mount -t nfs 10.1.0.3:/nfs/profile /mnt -oro,nolock,vers=3,tcp
 (notice different IP - but it's the same machine)
Works, just overmounts /mnt; ls shows the contents of /nfs/profile

Now, --union:

mount -t nfs 10.1.0.4:/nfs/system /mnt -oro,nolock,vers=3,tcp
OK

mount --union -t nfs 10.1.0.4:/nfs/profile /mnt -oro,nolock,vers=3,tcp
-EBUSY (/mnt busy or already mounted)

mount --union -t nfs 10.1.0.3:/nfs/profile /mnt -oro,nolock,vers=3,tcp
 (notice different IP - but it's the same machine)
mount command works, ls Oopses:

[   61.766392] creating fallthru for opt
[   61.766417] BUG: unable to handle kernel NULL pointer dereference at (null)
[   61.766433] IP: [<(null)>] (null)
[   61.766482] *pdpt = 000000001e0e9001 *pde = 0000000000000000 
[   61.767324] Oops: 0010 [#1] PREEMPT SMP 
[   61.767324] last sysfs file: /sys/kernel/uevent_seqnum
[   61.767324] Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc sch_sfq thermal processor thermal_sys rtc_cmos rtc_core i2c_piix4 e1000 ac psmouse rtc_lib hwmon pcspkr button i2c_core sg sr_mod serio_raw evdev cdrom
[   61.767324] 
[   61.767324] Pid: 2460, comm: ls Not tainted (2.6.29.3 #1) VirtualBox
[   61.767324] EIP: 0060:[<00000000>] EFLAGS: 00010286 CPU: 0
[   61.767324] EIP is at 0x0
[   61.767324] EAX: df6d23c0 EBX: df69d1b0 ECX: e105d3e0 EDX: df69d630
[   61.767324] ESI: df69d630 EDI: de16d300 EBP: de1b1d18 ESP: de1b1d04
[   61.767324]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[   61.767324] Process ls (pid: 2460, ti=de1b0000 task=de0ee3f0 task.ti=de1b0000)
[   61.767324] Stack:
[   61.767324]  c02d03d9 c054f57d df69d698 00000002 00000000 de1b1d7c e10357f0 00000002
[   61.767324]  00000000 0180677d 00000000 0000000a c02d0340 df69d1b0 00000000 00000002
[   61.767324]  0180677d 00000000 df6d2128 de16d300 de1b1f08 de1b1f08 df69d5a0 de1b1ed4
[   61.767324] Call Trace:
[   61.767324]  [<c02d03d9>] union_copyup_dir_one+0x99/0xc0
[   61.767324]  [<e10357f0>] nfs_do_filldir+0x210/0x570 [nfs]
[   61.767324]  [<c02d0340>] union_copyup_dir_one+0x0/0xc0
[   61.767324]  [<e1036072>] nfs_readdir+0x522/0xa10 [nfs]
[   61.767324]  [<c02d0340>] union_copyup_dir_one+0x0/0xc0
[   61.767324]  [<c0293502>] __mem_cgroup_commit_charge+0x42/0x100
[   61.767324]  [<c0293c07>] mem_cgroup_charge_common+0x57/0x70
[   61.767324]  [<e1039f18>] __put_nfs_open_context+0x28/0xb0 [nfs]
[   61.767324]  [<e0a6fbe2>] rpcauth_lookup_credcache+0x152/0x1f0 [sunrpc]
[   61.767324]  [<e0a6f92d>] rpcauth_lookupcred+0x5d/0xb0 [sunrpc]
[   61.767324]  [<e1049c10>] nfs3_decode_dirent+0x0/0x220 [nfs]
[   61.767324]  [<c02d0ad1>] union_copyup_dir+0x111/0x160
[   61.767324]  [<c02a6190>] filldir64+0x0/0x110
[   61.767324]  [<c02a6575>] vfs_readdir+0xd5/0xf0
[   61.767324]  [<c02a65fd>] sys_getdents64+0x6d/0xc0
[   61.767324]  [<c02033da>] syscall_call+0x7/0xb
[   61.767324] Code:  Bad EIP value.
[   61.767324] EIP: [<00000000>] 0x0 SS:ESP 0068:de1b1d04
[   61.816597] ---[ end trace 60fb13bae2f23426 ]---

-- 
Jan Rekorajski            |  ALL SUSPECTS ARE GUILTY. PERIOD!
baggins<at>mimuw.edu.pl   |  OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY?
BOFH, MANIAC              |                   -- TROOPS by Kevin Rubio
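
Reading the trace above: EIP is 0x0 and the first caller listed is
union_copyup_dir_one+0x99, which is what a call through a NULL function
pointer looks like. The thread does not establish which pointer was NULL.
One plausible guess, given the "creating fallthru" message printed just
before the oops, is that the topmost layer (NFS in this test) provides no
fallthru operation. The user-space sketch below only illustrates the
defensive check under that assumption; struct layer_ops, copyup_one() and
demo_fallthru() are made-up names for illustration, not kernel APIs.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-layer operations table (stand-in for inode operations). */
struct layer_ops {
	int (*fallthru)(const char *name);	/* may be NULL if unsupported */
};

/*
 * Create a fallthru entry for one directory entry, but refuse cleanly when
 * the layer has no such operation instead of jumping to address 0.
 */
int copyup_one(const struct layer_ops *ops, const char *name)
{
	if (ops == NULL || ops->fallthru == NULL)
		return -EOPNOTSUPP;
	return ops->fallthru(name);
}

/* Trivial operation used only to exercise the supported path. */
int demo_fallthru(const char *name)
{
	return name != NULL ? 0 : -EINVAL;
}
```

With a table whose fallthru member is NULL, copyup_one() returns
-EOPNOTSUPP rather than crashing; whether an error or a silent skip is the
right policy for readdir on a union is a separate design question.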


* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-06-08 19:44       ` Valerie Aurora
@ 2009-06-16 15:19         ` Miklos Szeredi
  0 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2009-06-16 15:19 UTC (permalink / raw)
  To: vaurora
  Cc: miklos, jblunck, linux-kernel, linux-fsdevel, viro, bharata,
	dwmw2, mszeredi

On Mon, 8 Jun 2009, Valerie Aurora wrote:
> On Wed, May 20, 2009 at 11:05:27AM +0200, Miklos Szeredi wrote:

> > The big reason why kernel implementation of readdir is hard is that
> > unswappable kernel memory needs to be used for caching directory
> > contents while the directory is open.  Well, tmpfs does the same,
> > dentries and inodes are _not_ swappable, and they gobble up memory.
> 
> That's a good point.  It seemed to me that it wouldn't be too
> difficult to make those entries evictable - drop a reference count and
> set the ->d_release to mark the directory as needing rebuilding.  What
> do you think?

AFAICS, there are nontrivial problems to deal with:

If the directory is still open, child dentries must not go away.

If the directory is closed, and at least one child is evicted, then the
whole directory is unusable and needs to be rebuilt on next readdir.

If we can solve those in a non-racy way then it might work.  I suspect
however, that some additional code in union-mounts that adds all this
functionality without reusing tmpfs would actually be simpler to
implement.

Thanks,
Miklos
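
The two constraints Miklos lists can be written down as a small state
model. This is a single-threaded user-space sketch, so it deliberately
ignores the locking that makes the real problem hard; struct dir_cache and
the dir_cache_*() helpers are hypothetical names, not kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a cached union directory. */
struct dir_cache {
	int open_count;		/* open file descriptions on the directory */
	int nr_children;	/* cached child entries still present */
	bool complete;		/* false once any child has been evicted */
};

/*
 * Rule 1: children may only go away while nobody has the directory open.
 * Rule 2: losing a single child makes the whole cached listing unusable.
 */
bool dir_cache_evict_child(struct dir_cache *dc)
{
	if (dc->open_count > 0 || dc->nr_children == 0)
		return false;
	dc->nr_children--;
	dc->complete = false;
	return true;
}

/* The next readdir has to rebuild everything if the cache is incomplete. */
bool dir_cache_needs_rebuild(const struct dir_cache *dc)
{
	return !dc->complete;
}

/* Repopulate the cache with nr_entries children and mark it usable again. */
void dir_cache_rebuild(struct dir_cache *dc, int nr_entries)
{
	assert(nr_entries >= 0);
	dc->nr_children = nr_entries;
	dc->complete = true;
}
```

In this model an eviction attempt on an open directory is refused, and one
successful eviction on a closed directory forces a full rebuild on the next
readdir, which matches the two conditions stated in the mail.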


* Re: [PATCH] Userland for VFS based Union Mount (V3)
  2009-05-21 13:53   ` Andreas Dilger
@ 2009-06-18  3:22     ` Valerie Aurora
  0 siblings, 0 replies; 68+ messages in thread
From: Valerie Aurora @ 2009-06-18  3:22 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Jan Blunck, linux-kernel, linux-fsdevel, viro, bharata, dwmw2, mszeredi

On Thu, May 21, 2009 at 09:53:15AM -0400, Andreas Dilger wrote:
> On May 18, 2009  16:40 -0400, Valerie Aurora wrote:
> > @@ -705,8 +707,9 @@ struct ext2_dir_entry_2 {
> >  #define EXT2_FT_FIFO		5
> >  #define EXT2_FT_SOCK		6
> >  #define EXT2_FT_SYMLINK		7
> > +#define EXT2_FT_WHT		8
> >  
> > -#define EXT2_FT_MAX		8
> > +#define EXT2_FT_MAX		9
> 
> What about the EXT2_FT_FALLTHROUGH used in the union mount patches?

We're UNIX engineers, we don't include unnecessary letters. :)

Thanks, I added EXT2_FT_FALLTHRU and made it possible to specify "-O
whiteout" in my latest patchset.

-VAL


* Re: [PATCH 00/32] VFS based Union Mount (V3)
  2009-06-15  9:55         ` Jan Rekorajski
@ 2009-06-18  3:23           ` Valerie Aurora
  0 siblings, 0 replies; 68+ messages in thread
From: Valerie Aurora @ 2009-06-18  3:23 UTC (permalink / raw)
  To: Jan Rekorajski, Jan Blunck, linux-kernel, linux-fsdevel, viro,
	bharata, dwm

On Mon, Jun 15, 2009 at 11:55:28AM +0200, Jan Rekorajski wrote:
> On Mon, 08 Jun 2009, Valerie Aurora wrote:
> 
> > On Tue, Jun 09, 2009 at 12:44:06AM +0200, Jan Rekorajski wrote:
> > > On Mon, 08 Jun 2009, Valerie Aurora wrote:
> > > 
> > > > You'll get -EBUSY on the second mount of any NFS file system over
> > > > another - try it again with the --union flag.  Support for NFS on NFS
> > > > union mount would have to change this.
> > > 
> > > I did just that, --union didn't change standard NFS behaviour.
> > 
> > Er, excuse me - I meant to type "try it again WITHOUT the --union
> > flag."  My apologies!
> 
> I did, see below.
> 
> > > 
> > > mount -t nfs server:/export/system /mnt
> > > mount --union -t nfs server:/export/profile /mnt
> > > mount.nfs: /mnt is busy or already mounted
> > > 
> > > I did an experiment by using different IP of the server (same machine)
> > > when mounting the second fs, mount worked then, but 'ls -1 /mnt' oopsed.
> > > I can reproduce this and send you the oops next week.
> > 
> > Interesting!  Does this happen without the --union flag?
> 
> Filesystems are exported ro, all mounts nfs3,ro,tcp.
> 
> Without --union:
> 
> mount -t nfs 10.1.0.4:/nfs/system /mnt -oro,nolock,vers=3,tcp
> OK
> 
> mount -t nfs 10.1.0.4:/nfs/profile /mnt -oro,nolock,vers=3,tcp
> -EBUSY (as expected)
> 
> mount -t nfs 10.1.0.3:/nfs/profile /mnt -oro,nolock,vers=3,tcp
>  (notice different IP - but it's the same machine)
> Works, just overmounts /mnt, ls shown contents of /nfs/profile
> 
> Now, --union:
> 
> mount -t nfs 10.1.0.4:/nfs/system /mnt -oro,nolock,vers=3,tcp
> OK
> 
> mount --union -t nfs 10.1.0.4:/nfs/profile /mnt -oro,nolock,vers=3,tcp
> -EBUSY (/mnt busy or already mounted)
> 
> mount --union -t nfs 10.1.0.3:/nfs/profile /mnt -oro,nolock,vers=3,tcp
>  (notice different IP - but it's the same machine)
> mount command works, ls Oopses:

Sorry for the delay in reply!  This is on our to-do list, thanks for
doing a little more testing.

-VAL


* Re: [PATCH 15/32] union-mount: Documentation
  2009-05-25  8:43       ` hooanon05
@ 2009-06-18 19:05         ` Valerie Aurora
  2009-06-19  1:53           ` hooanon05
  0 siblings, 1 reply; 68+ messages in thread
From: Valerie Aurora @ 2009-06-18 19:05 UTC (permalink / raw)
  To: hooanon05
  Cc: Arnd Bergmann, Jan Blunck, linux-kernel, linux-fsdevel, viro,
	bharata, dwmw2, mszeredi

On Mon, May 25, 2009 at 05:43:10PM +0900, hooanon05@yahoo.co.jp wrote:
> 
> Arnd Bergmann:
> > Right, but that is consistent with how the kernel would treat a
> > rename from one mount point to another, and tools like 'mv'
> > can handle this in user space.
> 
> Yes, that is the description in the union mount document.
> While it says to rename a regular file is implemented, the code differs
> actually.
> ----------------------------------------
> +Rename across different levels of the union is implemented as a copy-up
> +operation for regular files. Rename of directories simply returns EXDEV, the
> +same as if we tried to rename across different mounts. Most applications have
> 	:::
> ----------------------------------------

Ah, we did implement that in an earlier version.  I don't know if we
dropped the patch by accident or on purpose, but the original version
is below.  We will either put this feature back or fix the
documentation.  Thanks!

-VAL

Subject: union-mount: copyup on rename

Add copyup renaming of regular files on union mounts. Directories are still
lazily copied with the help of user-space.

Signed-off-by: Jan Blunck <jblunck@suse.de>
---
 fs/namei.c |  131 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 124 insertions(+), 7 deletions(-)

Index: b/fs/namei.c
===================================================================
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1532,6 +1532,8 @@ static int do_path_lookup(int dfd, const
 		nd->path = fs->pwd;
 		path_get(&fs->pwd);
 		read_unlock(&fs->lock);
+		/* Force a union_relookup() */
+		nd->um_flags = LAST_LOWLEVEL;
 	} else {
 		struct dentry *dentry;
 
@@ -3684,6 +3686,97 @@ int vfs_rename(struct inode *old_dir, st
 	return error;
 }
 
+int vfs_rename_union(struct nameidata *oldnd, struct path *old,
+		     struct nameidata *newnd, struct path *new)
+{
+	struct inode *old_dir = oldnd->path.dentry->d_inode;
+	struct inode *new_dir = newnd->path.dentry->d_inode;
+	struct qstr old_name;
+	char *name;
+	struct dentry *dentry;
+	int error;
+
+	if (old->dentry->d_inode == new->dentry->d_inode)
+		return 0;
+
+	error = may_whiteout(old->dentry, 0);
+	if (error)
+		return error;
+	if (!old_dir->i_op || !old_dir->i_op->whiteout)
+		return -EPERM;
+
+	if (!new->dentry->d_inode)
+		error = may_create(new_dir, new->dentry, NULL);
+	else
+		error = may_delete(new_dir, new->dentry, 0);
+	if (error)
+		return error;
+
+	DQUOT_INIT(old_dir);
+	DQUOT_INIT(new_dir);
+
+	error = security_inode_rename(old_dir, old->dentry,
+				      new_dir, new->dentry);
+	if (error)
+		return error;
+
+	error = -EBUSY;
+	if (d_mountpoint(old->dentry) || d_mountpoint(new->dentry))
+		return error;
+
+	error = -ENOMEM;
+	name = kmalloc(old->dentry->d_name.len + 1, GFP_KERNEL);
+	if (!name)
+		return error;
+	strncpy(name, old->dentry->d_name.name, old->dentry->d_name.len);
+	name[old->dentry->d_name.len] = 0;
+	old_name.len = old->dentry->d_name.len;
+	old_name.hash = old->dentry->d_name.hash;
+	old_name.name = name;
+
+	/* possibly delete the existing new file */
+	if ((newnd->path.dentry == new->dentry->d_parent) && new->dentry->d_inode) {
+		/* FIXME: inode may be truncated while we hold a lock */
+		error = vfs_unlink(new_dir, new->dentry);
+		if (error)
+			goto freename;
+
+		dentry = __lookup_hash(&new->dentry->d_name,
+				       newnd->path.dentry, newnd);
+		if (IS_ERR(dentry))
+			goto freename;
+
+		dput(new->dentry);
+		new->dentry = dentry;
+	}
+
+	/* copyup to the new file */
+	error = __union_copyup(old, newnd, new);
+	if (error)
+		goto freename;
+
+	/* whiteout the old file */
+	dentry = __lookup_hash(&old_name, oldnd->path.dentry, oldnd);
+	error = PTR_ERR(dentry);
+	if (IS_ERR(dentry))
+		goto freename;
+	error = vfs_whiteout(old_dir, dentry);
+	dput(dentry);
+
+	/* FIXME: This is actually unlink() && create() ... */
+/*
+	if (!error) {
+		const char *new_name = old_dentry->d_name.name;
+		fsnotify_move(old_dir, new_dir, old_name.name, new_name, 0,
+			      new_dentry->d_inode, old_dentry->d_inode);
+	}
+*/
+freename:
+	kfree(old_name.name);
+	return error;
+}
+
+
 static int do_rename(int olddfd, const char *oldname,
 			int newdfd, const char *newname)
 {
@@ -3701,10 +3794,7 @@ static int do_rename(int olddfd, const c
 	if (error)
 		goto exit1;
 
-	error = -EXDEV;
-	if (oldnd.path.mnt != newnd.path.mnt)
-		goto exit2;
-
+lock:
 	old_dir = oldnd.path.dentry;
 	error = -EBUSY;
 	if (oldnd.last_type != LAST_NORM)
@@ -3742,12 +3832,39 @@ static int do_rename(int olddfd, const c
 	error = -ENOTEMPTY;
 	if (new.dentry == trap)
 		goto exit5;
-	/* renaming on unions is done by the user-space */
+	/* renaming of directories on unions is done by the user-space */
+	error = -EXDEV;
+	if (is_unionized(oldnd.path.dentry, oldnd.path.mnt) &&
+	    S_ISDIR(old.dentry->d_inode->i_mode))
+		goto exit5;
+	/* renaming of other files on unions is done by copyup */
+	if ((is_unionized(oldnd.path.dentry, oldnd.path.mnt) &&
+	     (oldnd.um_flags & LAST_LOWLEVEL)) ||
+	    (is_unionized(newnd.path.dentry, newnd.path.mnt) &&
+	     (newnd.um_flags & LAST_LOWLEVEL))) {
+		path_put_conditional(&new, &newnd);
+		path_put_conditional(&old, &oldnd);
+		unlock_rename(new_dir, old_dir);
+		error = union_relookup_topmost(&oldnd,
+					       oldnd.flags & ~LOOKUP_PARENT);
+		if (error)
+			goto exit2;
+		error = union_relookup_topmost(&newnd,
+					       newnd.flags & ~LOOKUP_PARENT);
+		if (error)
+			goto exit2;
+		goto lock;
+	}
+
 	error = -EXDEV;
-	if (is_unionized(oldnd.path.dentry, oldnd.path.mnt))
+	if (oldnd.path.mnt != newnd.path.mnt)
 		goto exit5;
-	if (is_unionized(newnd.path.dentry, newnd.path.mnt))
+
+	if (is_unionized(oldnd.path.dentry, oldnd.path.mnt) &&
+	    (old.dentry->d_parent != oldnd.path.dentry)) {
+		error = vfs_rename_union(&oldnd, &old, &newnd, &new);
 		goto exit5;
+	}
 
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
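
As discussed in this subthread, a directory rename on a union returns
EXDEV and tools like 'mv' are expected to handle that in user space. For a
regular file the usual fallback is copy-then-unlink, roughly as sketched
below. This is a minimal illustration, not what mv actually does:
move_file() and copy_file() are hypothetical helper names, and the sketch
does not preserve mode, ownership or timestamps.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 with errno set. */
int copy_file(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t n;
	int in, out, saved;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
	if (out < 0) {
		saved = errno;
		close(in);
		errno = saved;
		return -1;
	}
	while ((n = read(in, buf, sizeof(buf))) > 0) {
		char *p = buf;

		/* write() may be short, so loop until the chunk is out */
		while (n > 0) {
			ssize_t w = write(out, p, (size_t)n);

			if (w < 0) {
				n = -1;
				goto done;
			}
			p += w;
			n -= w;
		}
	}
done:
	/* here n is 0 on success and -1 after a read or write error */
	saved = errno;
	close(in);
	if (close(out) < 0 && n == 0) {
		n = -1;
		saved = errno;
	}
	errno = saved;
	return n < 0 ? -1 : 0;
}

/* rename(2) first; fall back to copy + unlink only on EXDEV. */
int move_file(const char *oldpath, const char *newpath)
{
	if (rename(oldpath, newpath) == 0)
		return 0;
	if (errno != EXDEV)
		return -1;
	if (copy_file(oldpath, newpath) < 0)
		return -1;
	return unlink(oldpath);
}
```

The fallback is not atomic: a crash between the copy and the unlink leaves
both names in place, which is one reason doing the copy-up inside the
kernel, as the patch above does for regular files, is attractive.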



* Re: [PATCH 15/32] union-mount: Documentation
  2009-06-18 19:05         ` Valerie Aurora
@ 2009-06-19  1:53           ` hooanon05
  0 siblings, 0 replies; 68+ messages in thread
From: hooanon05 @ 2009-06-19  1:53 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Arnd Bergmann, Jan Blunck, linux-kernel, linux-fsdevel, viro,
	bharata, dwmw2, mszeredi


Valerie Aurora:
> Ah, we did implement that in an earlier version.  I don't know if we
> dropped the patch by accident or on purpose, but the original version
> is below.  We will either put this feature back or fix the
> documentation.  Thanks!

I see, it was implemented originally.
How about link(2)? Was it implemented too?
I am afraid current unionmount does not support link when the source
file exists on the lower fs.


J. R. Okajima


end of thread, other threads:[~2009-06-19  1:54 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
2009-05-18 16:08 ` [PATCH 01/32] atomic: Only take lock when the counter drops to zero on UP as well Jan Blunck
2009-05-18 16:08 ` [PATCH 02/32] VFS: BUG() if somebody tries to rehash an already hashed dentry Jan Blunck
2009-05-18 16:08 ` [PATCH 03/32] VFS: propagate mnt_flags into do_loopback Jan Blunck
2009-05-18 16:09 ` [PATCH 04/32] VFS: Make lookup_hash() return a struct path Jan Blunck
2009-05-18 16:09 ` [PATCH 05/32] VFS: Remove unnecessary micro-optimization in cached_lookup() Jan Blunck
2009-05-18 16:09 ` [PATCH 06/32] VFS: Make real_lookup() return a struct path Jan Blunck
2009-05-18 16:09 ` [PATCH 07/32] VFS: Introduce dput() variant that maintains a kill-list Jan Blunck
2009-05-18 16:09 ` [PATCH 08/32] whiteout: Don't return information about whiteouts to userspace Jan Blunck
2009-05-18 16:09 ` [PATCH 09/32] whiteout: Add vfs_whiteout() and whiteout inode operation Jan Blunck
2009-05-18 16:09 ` [PATCH 10/32] whiteout: Set S_OPAQUE inode flag when creating directories Jan Blunck
2009-05-18 16:09 ` [PATCH 11/32] whiteout: Add whiteout support to tmpfs Jan Blunck
2009-05-18 16:09 ` [PATCH 12/32] whiteout: Split of ext2_append_link() from ext2_add_link() Jan Blunck
2009-05-18 16:09 ` [PATCH 13/32] whiteout: Add whiteout support to ext2 Jan Blunck
2009-05-18 16:09 ` [PATCH 14/32] whiteout: Add path_whiteout() helper Jan Blunck
2009-05-18 16:09 ` [PATCH 15/32] union-mount: Documentation Jan Blunck
2009-05-25  6:25   ` hooanon05
2009-05-25  8:03     ` Arnd Bergmann
2009-05-25  8:43       ` hooanon05
2009-06-18 19:05         ` Valerie Aurora
2009-06-19  1:53           ` hooanon05
2009-05-18 16:09 ` [PATCH 16/32] union-mount: Introduce MNT_UNION and MS_UNION flags Jan Blunck
2009-05-18 16:09 ` [PATCH 17/32] union-mount: Introduce union_mount structure Jan Blunck
2009-05-18 16:09 ` [PATCH 18/32] union-mount: Drive the union cache via dcache Jan Blunck
2009-05-18 16:09 ` [PATCH 19/32] union-mount: Some checks during namespace changes Jan Blunck
2009-05-18 16:09 ` [PATCH 20/32] union-mount: Changes to the namespace handling Jan Blunck
2009-05-18 16:09 ` [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems Jan Blunck
2009-05-19 16:15   ` Miklos Szeredi
2009-05-19 17:30     ` Valerie Aurora
2009-05-20 10:21       ` Miklos Szeredi
2009-05-18 16:09 ` [PATCH 22/32] union-mount: stop lookup when directory has S_OPAQUE flag set Jan Blunck
2009-05-18 16:09 ` [PATCH 23/32] union-mount: stop lookup when finding a whiteout Jan Blunck
2009-05-18 16:09 ` [PATCH 24/32] union-mount: in-kernel file copy between union mounted filesystems Jan Blunck
2009-05-18 16:09 ` [PATCH 25/32] union-mount: check for logically empty directory (FIXME) Jan Blunck
2009-05-18 16:09 ` [PATCH 26/32] union-mount: call do_whiteout() on unlink and rmdir Jan Blunck
2009-05-18 16:09 ` [PATCH 27/32] union-mount: Always create topmost directory on open Jan Blunck
2009-05-18 16:09 ` [PATCH 28/32] union-mount: Basic fallthru definitions Jan Blunck
2009-05-18 16:09 ` [PATCH 29/32] union mount: Support for fallthru entries in union mount lookup Jan Blunck
2009-05-18 16:09 ` [PATCH 30/32] union mount: ext2 fallthru support Jan Blunck
2009-05-18 16:32   ` Andreas Dilger
2009-05-19  9:42     ` Jan Blunck
2009-05-19 14:05       ` Andreas Dilger
2009-05-19 16:13         ` Jan Blunck
2009-05-18 16:09 ` [PATCH 31/32] union-mount: tmpfs " Jan Blunck
2009-05-18 16:09 ` [PATCH 32/32] union-mount: Copy up directory entries on first readdir() Jan Blunck
2009-05-18 20:40 ` [PATCH] Userland for VFS based Union Mount (V3) Valerie Aurora
2009-05-21 13:53   ` Andreas Dilger
2009-06-18  3:22     ` Valerie Aurora
2009-05-19  9:48 ` [PATCH 00/32] " Miklos Szeredi
2009-05-19 10:29   ` Jan Blunck
2009-05-19 10:35     ` Miklos Szeredi
2009-05-19 10:39       ` Jan Blunck
2009-05-19 11:54         ` Arnd Bergmann
2009-05-19 12:15           ` Jan Blunck
2009-05-19 12:21             ` Arnd Bergmann
2009-05-19 13:10               ` Jan Blunck
2009-05-19 17:23   ` Valerie Aurora
2009-05-20  9:05     ` Miklos Szeredi
2009-06-08 19:44       ` Valerie Aurora
2009-06-16 15:19         ` Miklos Szeredi
2009-05-21 12:54 ` Jan Rekorajski
2009-06-08 19:57   ` Valerie Aurora
2009-06-08 22:44     ` Jan Rekorajski
2009-06-08 22:48       ` Valerie Aurora
2009-06-15  9:55         ` Jan Rekorajski
2009-06-18  3:23           ` Valerie Aurora
2009-06-04 11:38 ` Scott James Remnant
2009-06-09 22:15   ` Valerie Aurora
