* [RFC PATCH 00/39] Union mounts with xattrs
@ 2010-05-03 23:11 Valerie Aurora
  2010-05-03 23:12 ` [PATCH 01/39] VFS: Comment follow_mount() and friends Valerie Aurora
                   ` (38 more replies)
  0 siblings, 39 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:11 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

This release of union mounts includes:

- Updated Documentation/filesystems/union-mounts.txt
- Support for extended attributes

This version is feature-complete for local file systems.  Al Viro and
I will be reviewing it together this week, and a new version
incorporating his comments will be out as soon as I can implement them.

Patches are against 2.6.34-rc6.  See branch "xattr" in:

git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git

-VAL

Felix Fietkau (2):
  whiteout: jffs2 whiteout support
  fallthru: jffs2 fallthru support

Jan Blunck (13):
  VFS: Make lookup_hash() return a struct path
  autofs4: Save autofs trigger's vfsmount in super block info
  whiteout/NFSD: Don't return information about whiteouts to userspace
  whiteout: Add vfs_whiteout() and whiteout inode operation
  whiteout: Set S_OPAQUE inode flag when creating directories
  whiteout: Allow removal of a directory with whiteouts
  whiteout: tmpfs whiteout support
  whiteout: Split of ext2_append_link() from ext2_add_link()
  whiteout: ext2 whiteout support
  union-mount: Introduce MNT_UNION and MS_UNION flags
  union-mount: Introduce union_mount structure and basic operations
  union-mount: Drive the union cache via dcache
  union-mount: Call do_whiteout() on unlink and rmdir in unions

Valerie Aurora (24):
  VFS: Comment follow_mount() and friends
  VFS: Add read-only users count to superblock
  fallthru: Basic fallthru definitions
  fallthru: ext2 fallthru support
  fallthru: tmpfs fallthru support
  union-mount: Union mounts documentation
  union-mount: Implement union lookup
  union-mount: Support for mounting union mount file systems
  union-mount: Copy up directory entries on first readdir()
  VFS: Split inode_permission() and create path_permission()
  VFS: Create user_path_nd() to lookup both parent and target
  union-mount: In-kernel copyup routines
  union-mount: In-kernel copyup of xattrs
  union-mount: Implement union-aware access()/faccessat()
  union-mount: Implement union-aware link()
  union-mount: Implement union-aware rename()
  union-mount: Implement union-aware writable open()
  union-mount: Implement union-aware chown()
  union-mount: Implement union-aware truncate()
  union-mount: Implement union-aware chmod()/fchmodat()
  union-mount: Implement union-aware lchown()
  union-mount: Implement union-aware utimensat()
  union-mount: Implement union-aware setxattr()
  union-mount: Implement union-aware lsetxattr()

 Documentation/filesystems/union-mounts.txt |  899 ++++++++++++++++++++++++++
 Documentation/filesystems/vfs.txt          |   16 +-
 fs/Kconfig                                 |   13 +
 fs/Makefile                                |    1 +
 fs/autofs4/autofs_i.h                      |    1 +
 fs/autofs4/init.c                          |   11 +-
 fs/autofs4/root.c                          |    6 +
 fs/compat.c                                |    9 +
 fs/dcache.c                                |   35 +-
 fs/ext2/dir.c                              |  248 +++++++-
 fs/ext2/ext2.h                             |    4 +
 fs/ext2/inode.c                            |   11 +-
 fs/ext2/namei.c                            |   89 +++-
 fs/ext2/super.c                            |    6 +
 fs/jffs2/dir.c                             |  104 +++-
 fs/jffs2/fs.c                              |    4 +
 fs/jffs2/super.c                           |    2 +-
 fs/libfs.c                                 |   21 +-
 fs/namei.c                                 |  844 ++++++++++++++++++++++---
 fs/namespace.c                             |  162 +++++-
 fs/nfsd/nfs3xdr.c                          |    5 +
 fs/nfsd/nfs4xdr.c                          |    5 +
 fs/nfsd/nfsxdr.c                           |    4 +
 fs/open.c                                  |  116 +++-
 fs/readdir.c                               |   18 +
 fs/super.c                                 |   23 +
 fs/union.c                                 |  950 ++++++++++++++++++++++++++++
 fs/utimes.c                                |   14 +-
 fs/xattr.c                                 |   64 ++-
 include/linux/dcache.h                     |   38 ++-
 include/linux/ext2_fs.h                    |    5 +
 include/linux/fs.h                         |   16 +
 include/linux/jffs2.h                      |    8 +
 include/linux/mount.h                      |    7 +-
 include/linux/namei.h                      |    2 +
 include/linux/union.h                      |   77 +++
 mm/shmem.c                                 |  195 ++++++-
 37 files changed, 3854 insertions(+), 179 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt
 create mode 100644 fs/union.c
 create mode 100644 include/linux/union.h



* [PATCH 01/39] VFS: Comment follow_mount() and friends
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 02/39] VFS: Make lookup_hash() return a struct path Valerie Aurora
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, Alexander Viro

Add comments to the VFS follow_mount() family of functions describing
what the directions "up" and "down" mean and how ref counts are handled.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c     |   43 +++++++++++++++++++++++++++++++++++++++----
 fs/namespace.c |   16 ++++++++++++++--
 2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..dda6b7e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -596,6 +596,17 @@ loop:
 	return err;
 }
 
+/*
+ * follow_up - Find the mountpoint of path's vfsmount
+ *
+ * Given a path, find the mountpoint of its source file system.
+ * Replace @path with the path of the mountpoint in the parent mount.
+ * Up is towards /.
+ *
+ * Return 1 if we went up a level and 0 if we were already at the
+ * root.
+ */
+
 int follow_up(struct path *path)
 {
 	struct vfsmount *parent;
@@ -616,8 +627,22 @@ int follow_up(struct path *path)
 	return 1;
 }
 
-/* no need for dcache_lock, as serialization is taken care in
- * namespace.c
+/*
+ * __follow_mount - Return the most recent mount at this mountpoint
+ *
+ * Given a mountpoint, find the most recently mounted file system at
+ * this mountpoint and return the path to its root dentry.  This is
+ * the file system that is visible, and it is in the direction of VFS
+ * "down" - away from the root of the mount tree.  See comments to
+ * lookup_mnt() for an example of "down."
+ *
+ * Does not decrement the refcount on the given mount even if it
+ * follows it to another mount and returns that path instead.
+ *
+ * Returns 0 if path was unchanged, 1 if we followed it to another mount.
+ *
+ * No need for dcache_lock, as serialization is taken care in
+ * namespace.c.
  */
 static int __follow_mount(struct path *path)
 {
@@ -636,6 +661,12 @@ static int __follow_mount(struct path *path)
 	return res;
 }
 
+/*
+ * Like __follow_mount, but no return value and drops references to
+ * both mnt and dentry of the given path if it follows to another
+ * mount.
+ */
+
 static void follow_mount(struct path *path)
 {
 	while (d_mountpoint(path->dentry)) {
@@ -649,8 +680,12 @@ static void follow_mount(struct path *path)
 	}
 }
 
-/* no need for dcache_lock, as serialization is taken care in
- * namespace.c
+/*
+ * Like follow_mount(), but traverses only one layer instead of
+ * continuing until it runs out.
+ *
+ * No need for dcache_lock, as serialization is taken care in
+ * namespace.c.
  */
 int follow_down(struct path *path)
 {
diff --git a/fs/namespace.c b/fs/namespace.c
index 8174c8a..1cd59a0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -433,8 +433,20 @@ struct vfsmount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry,
 }
 
 /*
- * lookup_mnt increments the ref count before returning
- * the vfsmount struct.
+ * lookup_mnt - Return the first child mount mounted at path
+ *
+ * "First" means first mounted chronologically.  If you create the
+ * following mounts:
+ *
+ * mount /dev/sda1 /mnt
+ * mount /dev/sda2 /mnt
+ * mount /dev/sda3 /mnt
+ *
+ * Then lookup_mnt() on the base /mnt dentry in the root mount will
+ * return successively the root dentry and vfsmount of /dev/sda1, then
+ * /dev/sda2, then /dev/sda3, then NULL.
+ *
+ * lookup_mnt takes a reference to the found vfsmount.
  */
 struct vfsmount *lookup_mnt(struct path *path)
 {
-- 
1.6.3.3
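
The stacking behaviour described in the lookup_mnt() and follow_mount()
comments above can be modelled in user space.  What follows is only a
sketch, not kernel code: the toy_* names and the flat mount table are
assumptions made for illustration, the rootfs entry stands in for the
/mnt dentry in the root mount, and reference counting is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: one entry per mount, recording what it covers. */
struct toy_mount {
	const char *dev;		/* device name, for illustration */
	const struct toy_mount *covers;	/* mount this one is stacked on */
};

/* The three mounts from the lookup_mnt() comment, in mount order. */
static const struct toy_mount toy_tbl[] = {
	{ "rootfs",    NULL },
	{ "/dev/sda1", &toy_tbl[0] },
	{ "/dev/sda2", &toy_tbl[1] },
	{ "/dev/sda3", &toy_tbl[2] },
};

/*
 * In the spirit of lookup_mnt(): return the mount stacked directly
 * on top of @on, or NULL if nothing is mounted there.
 */
static const struct toy_mount *
toy_lookup_mnt(const struct toy_mount *tbl, size_t n,
	       const struct toy_mount *on)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++)
		if (tbl[i].covers == on)
			return &tbl[i];
	return NULL;
}

/*
 * In the spirit of follow_mount(): keep going "down" until we reach
 * the topmost (most recently mounted, visible) file system.
 */
static const struct toy_mount *
toy_follow_mount(const struct toy_mount *tbl, size_t n,
		 const struct toy_mount *start)
{
	const struct toy_mount *next;

	while ((next = toy_lookup_mnt(tbl, n, start)) != NULL)
		start = next;
	return start;
}
```

Calling toy_lookup_mnt() repeatedly on its own result walks one layer
at a time, which is the follow_down() behaviour; toy_follow_mount()
loops until the top of the stack.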



* [PATCH 02/39] VFS: Make lookup_hash() return a struct path
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
  2010-05-03 23:12 ` [PATCH 01/39] VFS: Comment follow_mount() and friends Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 03/39] VFS: Add read-only users count to superblock Valerie Aurora
                   ` (36 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, Alexander Viro

From: Jan Blunck <jblunck@suse.de>

This patch changes lookup_hash() to return an error code and to pass
the result back in a struct path.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c |  113 ++++++++++++++++++++++++++++++-----------------------------
 1 files changed, 57 insertions(+), 56 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dda6b7e..219da2b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1155,7 +1155,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
 }
 
 static struct dentry *__lookup_hash(struct qstr *name,
-		struct dentry *base, struct nameidata *nd)
+				    struct dentry *base, struct nameidata *nd)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1212,14 +1212,22 @@ out:
  * needs parent already locked. Doesn't follow mounts.
  * SMP-safe.
  */
-static struct dentry *lookup_hash(struct nameidata *nd)
+static int lookup_hash(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
 	int err;
 
 	err = exec_permission(nd->path.dentry->d_inode);
 	if (err)
-		return ERR_PTR(err);
-	return __lookup_hash(&nd->last, nd->path.dentry, nd);
+		return err;
+	path->mnt = nd->path.mnt;
+	path->dentry =  __lookup_hash(name, nd->path.dentry, nd);
+	if (IS_ERR(path->dentry)) {
+		err = PTR_ERR(path->dentry);
+		path->dentry = NULL;
+		path->mnt = NULL;
+	}
+	return err;
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
@@ -1701,12 +1709,9 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 
 	/* OK, it's O_CREAT */
 	mutex_lock(&dir->d_inode->i_mutex);
+	error = lookup_hash(nd, &nd->last, path);
 
-	path->dentry = lookup_hash(nd);
-	path->mnt = nd->path.mnt;
-
-	error = PTR_ERR(path->dentry);
-	if (IS_ERR(path->dentry)) {
+	if (error) {
 		mutex_unlock(&dir->d_inode->i_mutex);
 		goto exit;
 	}
@@ -1956,7 +1961,8 @@ EXPORT_SYMBOL(filp_open);
  */
 struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 {
-	struct dentry *dentry = ERR_PTR(-EEXIST);
+	struct path path;
+	int err;
 
 	mutex_lock_nested(&nd->path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	/*
@@ -1964,7 +1970,7 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	 * (foo/., foo/.., /////)
 	 */
 	if (nd->last_type != LAST_NORM)
-		goto fail;
+		return ERR_PTR(-EEXIST);
 	nd->flags &= ~LOOKUP_PARENT;
 	nd->flags |= LOOKUP_CREATE | LOOKUP_EXCL;
 	nd->intent.open.flags = O_EXCL;
@@ -1972,11 +1978,11 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	/*
 	 * Do the final lookup.
 	 */
-	dentry = lookup_hash(nd);
-	if (IS_ERR(dentry))
-		goto fail;
+	err = lookup_hash(nd, &nd->last, &path);
+	if (err)
+		return ERR_PTR(err);
 
-	if (dentry->d_inode)
+	if (path.dentry->d_inode)
 		goto eexist;
 	/*
 	 * Special case - lookup gave negative, but... we had foo/bar/
@@ -1985,15 +1991,14 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	 * been asking for (non-existent) directory. -ENOENT for you.
 	 */
 	if (unlikely(!is_dir && nd->last.name[nd->last.len])) {
-		dput(dentry);
-		dentry = ERR_PTR(-ENOENT);
+		dput(path.dentry);
+		return ERR_PTR(-ENOENT);
 	}
-	return dentry;
+
+	return path.dentry;
 eexist:
-	dput(dentry);
-	dentry = ERR_PTR(-EEXIST);
-fail:
-	return dentry;
+	path_put_conditional(&path, nd);
+	return ERR_PTR(-EEXIST);
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
@@ -2226,7 +2231,7 @@ static long do_rmdir(int dfd, const char __user *pathname)
 {
 	int error = 0;
 	char * name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 
 	error = user_path_parent(dfd, pathname, &nd, &name);
@@ -2248,21 +2253,20 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (error)
 		goto exit2;
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
-	error = security_path_rmdir(&nd.path, dentry);
+	error = security_path_rmdir(&nd.path, path.dentry);
 	if (error)
 		goto exit4;
-	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+	error = vfs_rmdir(nd.path.dentry->d_inode, path.dentry);
 exit4:
 	mnt_drop_write(nd.path.mnt);
 exit3:
-	dput(dentry);
+	path_put_conditional(&path, &nd);
 exit2:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 exit1:
@@ -2318,7 +2322,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 {
 	int error;
 	char *name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 	struct inode *inode = NULL;
 
@@ -2333,26 +2337,25 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (!IS_ERR(dentry)) {
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (!error) {
 		/* Why not before? Because we want correct error value */
 		if (nd.last.name[nd.last.len])
 			goto slashes;
-		inode = dentry->d_inode;
+		inode = path.dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
-		error = security_path_unlink(&nd.path, dentry);
+		error = security_path_unlink(&nd.path, path.dentry);
 		if (error)
 			goto exit3;
-		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
+		error = vfs_unlink(nd.path.dentry->d_inode, path.dentry);
 exit3:
 		mnt_drop_write(nd.path.mnt);
 	exit2:
-		dput(dentry);
+		path_put_conditional(&path, &nd);
 	}
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 	if (inode)
@@ -2363,8 +2366,8 @@ exit1:
 	return error;
 
 slashes:
-	error = !dentry->d_inode ? -ENOENT :
-		S_ISDIR(dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
+	error = !path.dentry->d_inode ? -ENOENT :
+		S_ISDIR(path.dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
 	goto exit2;
 }
 
@@ -2699,7 +2702,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 		int, newdfd, const char __user *, newname)
 {
 	struct dentry *old_dir, *new_dir;
-	struct dentry *old_dentry, *new_dentry;
+	struct path old, new;
 	struct dentry *trap;
 	struct nameidata oldnd, newnd;
 	char *from;
@@ -2733,16 +2736,15 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 
 	trap = lock_rename(new_dir, old_dir);
 
-	old_dentry = lookup_hash(&oldnd);
-	error = PTR_ERR(old_dentry);
-	if (IS_ERR(old_dentry))
+	error = lookup_hash(&oldnd, &oldnd.last, &old);
+	if (error)
 		goto exit3;
 	/* source must exist */
 	error = -ENOENT;
-	if (!old_dentry->d_inode)
+	if (!old.dentry->d_inode)
 		goto exit4;
 	/* unless the source is a directory trailing slashes give -ENOTDIR */
-	if (!S_ISDIR(old_dentry->d_inode->i_mode)) {
+	if (!S_ISDIR(old.dentry->d_inode->i_mode)) {
 		error = -ENOTDIR;
 		if (oldnd.last.name[oldnd.last.len])
 			goto exit4;
@@ -2751,32 +2753,31 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	}
 	/* source should not be ancestor of target */
 	error = -EINVAL;
-	if (old_dentry == trap)
+	if (old.dentry == trap)
 		goto exit4;
-	new_dentry = lookup_hash(&newnd);
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
+	error = lookup_hash(&newnd, &newnd.last, &new);
+	if (error)
 		goto exit4;
 	/* target should not be an ancestor of source */
 	error = -ENOTEMPTY;
-	if (new_dentry == trap)
+	if (new.dentry == trap)
 		goto exit5;
 
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
 		goto exit5;
-	error = security_path_rename(&oldnd.path, old_dentry,
-				     &newnd.path, new_dentry);
+	error = security_path_rename(&oldnd.path, old.dentry,
+				     &newnd.path, new.dentry);
 	if (error)
 		goto exit6;
-	error = vfs_rename(old_dir->d_inode, old_dentry,
-				   new_dir->d_inode, new_dentry);
+	error = vfs_rename(old_dir->d_inode, old.dentry,
+				   new_dir->d_inode, new.dentry);
 exit6:
 	mnt_drop_write(oldnd.path.mnt);
 exit5:
-	dput(new_dentry);
+	path_put_conditional(&new, &newnd);
 exit4:
-	dput(old_dentry);
+	path_put_conditional(&old, &oldnd);
 exit3:
 	unlock_rename(new_dir, old_dir);
 exit2:
-- 
1.6.3.3
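
The change of calling convention in this patch (an ERR_PTR-encoded
dentry replaced by an int return plus a struct path out-parameter) can
be illustrated outside the kernel.  The sketch below is an assumption-
laden model, not the kernel implementation: the toy_* helpers stand in
for ERR_PTR()/PTR_ERR()/IS_ERR(), and the name-length limit exists only
to give the lookup a way to fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ERRNO 4095

/* User-space stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_dentry { const char *name; };
struct toy_mnt { int id; };
struct toy_path {
	struct toy_mnt *mnt;
	struct toy_dentry *dentry;
};

/* Old style: the result is either a dentry or an encoded errno. */
static struct toy_dentry *toy_raw_lookup(const char *name)
{
	static struct toy_dentry found;

	if (strlen(name) > 8)	/* arbitrary limit for the example */
		return toy_err_ptr(-ENAMETOOLONG);
	found.name = name;
	return &found;
}

/* New style: errno as the return value, result through @path. */
static int toy_lookup_hash(struct toy_mnt *mnt, const char *name,
			   struct toy_path *path)
{
	int err = 0;

	assert(path != NULL);
	path->mnt = mnt;
	path->dentry = toy_raw_lookup(name);
	if (toy_is_err(path->dentry)) {
		err = (int)toy_ptr_err(path->dentry);
		/* Never hand an error pointer back to the caller. */
		path->dentry = NULL;
		path->mnt = NULL;
	}
	return err;
}
```

With this shape a caller tests a plain int instead of pairing IS_ERR()
with PTR_ERR(), and gets the vfsmount alongside the dentry, which is
what the later union mount patches in this series rely on.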



* [PATCH 03/39] VFS: Add read-only users count to superblock
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
  2010-05-03 23:12 ` [PATCH 01/39] VFS: Comment follow_mount() and friends Valerie Aurora
  2010-05-03 23:12 ` [PATCH 02/39] VFS: Make lookup_hash() return a struct path Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 04/39] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
                   ` (35 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, Alexander Viro

While we can check if a file system is currently read-only, we can't
guarantee that it will stay read-only.  The file system can be
remounted read-write at any time; it's also conceivable that a file
system can be mounted a second time and converted to read-write if the
underlying fs allows it.  This is a problem for union mounts, which
require the underlying file system be read-only.  Add a read-only
users count, and refuse both remounts to read-write and new read-write
mounts while there are any read-only users.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c     |   11 +++++++++++
 fs/super.c         |   23 +++++++++++++++++++++++
 include/linux/fs.h |    8 ++++++++
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1cd59a0..9a40282 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -200,6 +200,17 @@ int __mnt_is_readonly(struct vfsmount *mnt)
 }
 EXPORT_SYMBOL_GPL(__mnt_is_readonly);
 
+static void inc_hard_readonly_users(struct vfsmount *mnt)
+{
+	mnt->mnt_sb->s_hard_readonly_users++;
+}
+
+static void dec_hard_readonly_users(struct vfsmount *mnt)
+{
+	BUG_ON(mnt->mnt_sb->s_hard_readonly_users == 0);
+	mnt->mnt_sb->s_hard_readonly_users--;
+}
+
 static inline void inc_mnt_writers(struct vfsmount *mnt)
 {
 #ifdef CONFIG_SMP
diff --git a/fs/super.c b/fs/super.c
index 1527e6a..6add39b 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -118,6 +118,7 @@ out:
  */
 static inline void destroy_super(struct super_block *s)
 {
+	BUG_ON(s->s_hard_readonly_users);
 	security_sb_free(s);
 	kfree(s->s_subtype);
 	kfree(s->s_options);
@@ -557,6 +558,21 @@ out:
 	return err;
 }
 
+/*
+ * Some uses of file systems require that they never be mounted
+ * read-write anywhere (e.g., the lower layers of union mounts must
+ * always be read-only).  If there are any of these "hard" read-only
+ * mounts, don't permit a transition to read-write.
+ *
+ * Must be called while holding the namespace lock.
+ */
+
+int sb_is_hard_readonly(struct super_block *sb)
+{
+	return sb->s_hard_readonly_users ? 1 : 0;
+}
+EXPORT_SYMBOL(sb_is_hard_readonly);
+
 /**
  *	do_remount_sb - asks filesystem to change mount options.
  *	@sb:	superblock in question
@@ -599,6 +615,9 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 			return -EBUSY;
 	}
 
+	if (remount_rw && sb_is_hard_readonly(sb))
+		return -EROFS;
+
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval)
@@ -972,6 +991,10 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	WARN((mnt->mnt_sb->s_maxbytes < 0), "%s set sb->s_maxbytes to "
 		"negative value (%lld)\n", type->name, mnt->mnt_sb->s_maxbytes);
 
+	error = -EROFS;
+	if (!(flags & MS_RDONLY) && sb_is_hard_readonly(mnt->mnt_sb))
+		goto out_sb;
+
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	mnt->mnt_parent = mnt;
 	up_write(&mnt->mnt_sb->s_umount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44f35ae..d7ef72a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1383,6 +1383,13 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/*
+	 * Some mounts require that the underlying file system never
+	 * transition to read-write.  They mark the sb itself as
+	 * read-only.
+	 */
+	int s_hard_readonly_users;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
@@ -1767,6 +1774,7 @@ extern int get_sb_nodev(struct file_system_type *fs_type,
 	int (*fill_super)(struct super_block *, void *, int),
 	struct vfsmount *mnt);
 void generic_shutdown_super(struct super_block *sb);
+int sb_is_hard_readonly(struct super_block *sb);
 void kill_block_super(struct super_block *sb);
 void kill_anon_super(struct super_block *sb);
 void kill_litter_super(struct super_block *sb);
-- 
1.6.3.3
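
The counting rule this patch introduces can be sketched in user space
as follows.  This is a simplified model under stated assumptions, not
the kernel code: struct toy_sb and the toy_* functions are hypothetical
names, locking is omitted, and only the remount path is modelled.

```c
#include <assert.h>
#include <errno.h>

/* Toy superblock: just the two fields the rule depends on. */
struct toy_sb {
	int rdonly;		 /* 1 while the fs is read-only */
	int hard_readonly_users; /* users needing it to stay that way */
};

static void toy_inc_hard_readonly_users(struct toy_sb *sb)
{
	sb->hard_readonly_users++;
}

static void toy_dec_hard_readonly_users(struct toy_sb *sb)
{
	/* Mirrors the BUG_ON() in the patch: underflow is a bug. */
	assert(sb->hard_readonly_users > 0);
	sb->hard_readonly_users--;
}

/* Refuse a read-only to read-write transition while users remain. */
static int toy_remount(struct toy_sb *sb, int want_rdonly)
{
	int remount_rw = sb->rdonly && !want_rdonly;

	if (remount_rw && sb->hard_readonly_users)
		return -EROFS;
	sb->rdonly = want_rdonly;
	return 0;
}
```

A union mount would take a count on each lower layer when it is set up
and drop it on unmount; in between, toy_remount(sb, 0) fails with
-EROFS and leaves the state unchanged.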



* [PATCH 04/39] autofs4: Save autofs trigger's vfsmount in super block info
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (2 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 03/39] VFS: Add read-only users count to superblock Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace Valerie Aurora
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, autofs, Alexander Viro

From: Jan Blunck <jblunck@suse.de>

XXX - This is broken and included just to make union mounts work.  See
discussion at:

http://kerneltrap.org/mailarchive/linux-fsdevel/2010/1/15/6708053/thread

Original commit message:

This is a bugfix/replacement for commit
051d381259eb57d6074d02a6ba6e90e744f1a29f:

    During a path walk if an autofs trigger is mounted on a dentry,
    when the follow_link method is called, the nameidata struct
    contains the vfsmount and mountpoint dentry of the parent mount
    while the dentry that is passed in is the root of the autofs
    trigger mount.  I believe it is impossible to get the vfsmount of
    the trigger mount, within the follow_link method, when only the
    parent vfsmount and the root dentry of the trigger mount are
    known.

The solution in this commit was to replace the path embedded in the
parent's nameidata with the path of the link itself in
__do_follow_link().  This is a relatively harmless misuse of the
field, but union mounts ran into a bug during follow_link() caused by
the nameidata containing the wrong path (we count on it being what it
is in all other places - the path of the parent).

A cleaner and easier to understand solution is to save the necessary
vfsmount in the autofs superblock info when it is mounted.  Then we
can easily update the vfsmount in autofs4_follow_link().

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/autofs4/autofs_i.h |    1 +
 fs/autofs4/init.c     |   11 ++++++++++-
 fs/autofs4/root.c     |    6 ++++++
 fs/namei.c            |    7 ++-----
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 3d283ab..de3af64 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -133,6 +133,7 @@ struct autofs_sb_info {
 	int reghost_enabled;
 	int needs_reghost;
 	struct super_block *sb;
+	struct vfsmount *mnt;
 	struct mutex wq_mutex;
 	spinlock_t fs_lock;
 	struct autofs_wait_queue *queues; /* Wait queue pointer */
diff --git a/fs/autofs4/init.c b/fs/autofs4/init.c
index 9722e4b..5e0dcd7 100644
--- a/fs/autofs4/init.c
+++ b/fs/autofs4/init.c
@@ -17,7 +17,16 @@
 static int autofs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
+	struct autofs_sb_info *sbi;
+	int ret;
+
+	ret = get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
+	if (ret)
+		return ret;
+
+	sbi = autofs4_sbi(mnt->mnt_sb);
+	sbi->mnt = mnt;
+	return 0;
 }
 
 static struct file_system_type autofs_fs_type = {
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 109a6c6..54d2857 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -220,6 +220,12 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
 	DPRINTK("dentry=%p %.*s oz_mode=%d nd->flags=%d",
 		dentry, dentry->d_name.len, dentry->d_name.name, oz_mode,
 		nd->flags);
+
+	dput(nd->path.dentry);
+	mntput(nd->path.mnt);
+	nd->path.mnt = mntget(sbi->mnt);
+	nd->path.dentry = dget(dentry);
+
 	/*
 	 * For an expire of a covered direct or offset mount we need
 	 * to break out of follow_down() at the autofs mount trigger
diff --git a/fs/namei.c b/fs/namei.c
index 219da2b..304aa05 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -538,11 +538,8 @@ __do_follow_link(struct path *path, struct nameidata *nd, void **p)
 	touch_atime(path->mnt, dentry);
 	nd_set_link(nd, NULL);
 
-	if (path->mnt != nd->path.mnt) {
-		path_to_nameidata(path, nd);
-		dget(dentry);
-	}
-	mntget(path->mnt);
+	if (path->mnt == nd->path.mnt)
+		mntget(nd->path.mnt);
 	nd->last_type = LAST_BIND;
 	*p = dentry->d_inode->i_op->follow_link(dentry, nd);
 	error = PTR_ERR(*p);
-- 
1.6.3.3



* [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (3 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 04/39] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:37     ` Neil Brown
  2010-05-03 23:12 ` [PATCH 06/39] whiteout: Add vfs_whiteout() and whiteout inode operation Valerie Aurora
                   ` (33 subsequent siblings)
  38 siblings, 1 reply; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	David Woodhouse, Valerie Aurora, linux-nfs, J. Bruce Fields,
	Neil Brown

From: Jan Blunck <jblunck@suse.de>

Userspace isn't ready for handling another file type, so silently drop
whiteout directory entries before they leave the kernel.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: linux-nfs@vger.kernel.org
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
---
 fs/compat.c       |    9 +++++++++
 fs/nfsd/nfs3xdr.c |    5 +++++
 fs/nfsd/nfs4xdr.c |    5 +++++
 fs/nfsd/nfsxdr.c  |    4 ++++
 fs/readdir.c      |    9 +++++++++
 5 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 4b6ed03..adec661 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -839,6 +839,9 @@ static int compat_fillonedir(void *__buf, const char *name, int namlen,
 	struct compat_old_linux_dirent __user *dirent;
 	compat_ulong_t d_ino;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	if (buf->result)
 		return -EINVAL;
 	d_ino = ino;
@@ -910,6 +913,9 @@ static int compat_filldir(void *__buf, const char *name, int namlen,
 	compat_ulong_t d_ino;
 	int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 2, sizeof(compat_long_t));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
@@ -999,6 +1005,9 @@ static int compat_filldir64(void * __buf, const char * name, int namlen, loff_t
 	int reclen = ALIGN(jj + namlen + 1, sizeof(u64));
 	u64 off;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 2a533a0..9b96f5a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -885,6 +885,11 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
 	int		elen;		/* estimated entry length in words */
 	int		num_entry_words = 0;	/* actual number of words */
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
+
 	if (cd->offset) {
 		u64 offset64 = offset;
 
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 34ccf81..2ddf144 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2269,6 +2269,11 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 		return 0;
 	}
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
+
 	if (cd->offset)
 		xdr_encode_hyper(cd->offset, (u64) offset);
 
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index 4ce005d..0e57d4b 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -503,6 +503,10 @@ nfssvc_encode_entry(void *ccdv, const char *name,
 			namlen, name, offset, ino);
 	 */
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
 	if (offset > ~((u32) 0)) {
 		cd->common.err = nfserr_fbig;
 		return -EINVAL;
diff --git a/fs/readdir.c b/fs/readdir.c
index 7723401..3a48491 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -77,6 +77,9 @@ static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset
 	struct old_linux_dirent __user * dirent;
 	unsigned long d_ino;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	if (buf->result)
 		return -EINVAL;
 	d_ino = ino;
@@ -154,6 +157,9 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
 	unsigned long d_ino;
 	int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 2, sizeof(long));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
@@ -239,6 +245,9 @@ static int filldir64(void * __buf, const char * name, int namlen, loff_t offset,
 	struct getdents_callback64 * buf = (struct getdents_callback64 *) __buf;
 	int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 1, sizeof(u64));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
-- 
1.6.3.3



* [PATCH 06/39] whiteout: Add vfs_whiteout() and whiteout inode operation
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (4 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 07/39] whiteout: Set S_OPAQUE inode flag when creating directories Valerie Aurora
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	David Woodhouse, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Whiteout a given directory entry.  File systems that support whiteouts
must implement the new ->whiteout() directory inode operation.
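
To make the intended semantics concrete, here is a minimal userspace
sketch of a two-layer union where a whiteout in the top layer hides a
name in the bottom layer.  Everything in it (the toy_* names, the
fixed-size layer array) is hypothetical and only models the behaviour
described above; it is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry kinds for this sketch; these are not kernel types. */
enum toy_kind { TOY_FILE, TOY_WHITEOUT };

struct toy_entry {
	const char *name;	/* caller keeps the string alive */
	enum toy_kind kind;
};

struct toy_layer {
	struct toy_entry entries[8];
	size_t count;
};

/* Find an entry by name in one layer, or return NULL if there is none. */
struct toy_entry *toy_find(struct toy_layer *layer, const char *name)
{
	size_t i;

	for (i = 0; i < layer->count; i++)
		if (strcmp(layer->entries[i].name, name) == 0)
			return &layer->entries[i];
	return NULL;
}

/*
 * White out "name" in the top layer.  Case (a): the entry exists, so it
 * is replaced by a whiteout.  Case (b): no entry exists, so only the
 * whiteout is created.  Returns 0 on success, -1 if the layer is full.
 */
int toy_whiteout(struct toy_layer *top, const char *name)
{
	struct toy_entry *e;

	assert(top != NULL && name != NULL);

	e = toy_find(top, name);
	if (e) {
		e->kind = TOY_WHITEOUT;
		return 0;
	}
	if (top->count == sizeof(top->entries) / sizeof(top->entries[0]))
		return -1;
	top->entries[top->count].name = name;
	top->entries[top->count].kind = TOY_WHITEOUT;
	top->count++;
	return 0;
}

/*
 * Union lookup: returns 1 if "name" is visible, 0 otherwise.  A whiteout
 * in the top layer stops the lookup from dropping down to the bottom.
 */
int toy_union_lookup(struct toy_layer *top, struct toy_layer *bottom,
		     const char *name)
{
	struct toy_entry *e = toy_find(top, name);

	if (e)
		return e->kind != TOY_WHITEOUT;
	return toy_find(bottom, name) != NULL;
}
```

With a file "a" present only in the bottom layer, toy_union_lookup()
finds it until toy_whiteout(&top, "a") is called, after which the same
lookup reports that the name does not exist.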

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 Documentation/filesystems/vfs.txt |   10 +++-
 fs/dcache.c                       |    4 +-
 fs/namei.c                        |  133 +++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h            |    6 ++
 include/linux/fs.h                |    2 +
 5 files changed, 153 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 3de2f32..8846b4f 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -308,7 +308,7 @@ struct inode_operations
 -----------------------
 
 This describes how the VFS can manipulate an inode in your
-filesystem. As of kernel 2.6.22, the following members are defined:
+filesystem. As of kernel 2.6.33, the following members are defined:
 
 struct inode_operations {
 	int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
@@ -319,6 +319,7 @@ struct inode_operations {
 	int (*mkdir) (struct inode *,struct dentry *,int);
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
@@ -382,6 +383,13 @@ otherwise noted.
 	will probably need to call d_instantiate() just as you would
 	in the create() method
 
+  whiteout: called by the rmdir(2) and unlink(2) system calls on a
+        layered file system.  Only required if you want to support
+        whiteouts.  The first dentry passed in is for the entry being
+        whited out: positive if it exists, negative otherwise.  The
+        second is the dentry for the whiteout itself.  This method
+        must unlink() or rmdir() the original entry if it exists.
+
   rename: called by the rename(2) system call to rename the object to
 	have the parent and name given by the second inode and dentry.
 
diff --git a/fs/dcache.c b/fs/dcache.c
index f1358e5..265015d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -992,8 +992,10 @@ EXPORT_SYMBOL(d_alloc_name);
 /* the caller must hold dcache_lock */
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
-	if (inode)
+	if (inode) {
+		dentry->d_flags &= ~DCACHE_WHITEOUT;
 		list_add(&dentry->d_alias, &inode->i_dentry);
+	}
 	dentry->d_inode = inode;
 	fsnotify_d_instantiate(dentry, inode);
 }
diff --git a/fs/namei.c b/fs/namei.c
index 304aa05..2c0681f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2167,6 +2167,139 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, int, mode)
 }
 
 /*
+ * Checks on the victim for whiteout.  We must both be able to delete
+ * the victim directory entry (if it exists) and create a new
+ * directory entry, so this function is a combination of the checks
+ * from may_create() and may_delete().
+ */
+static inline int may_whiteout(struct inode *dir, struct dentry *victim,
+			       int isdir)
+{
+	int err;
+
+	/*
+	 * From may_create().  We don't have to do this check for a
+	 * simple delete because the directory must exist if we are
+	 * trying to delete something from it.  For a whiteout, the
+	 * dir may be empty and thus potentially unlinked by this point.
+	 */
+	if (IS_DEADDIR(dir))
+		return -ENOENT;
+	err = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	if (err)
+		return err;
+
+	/* From may_delete(). */
+	if (IS_APPEND(dir))
+		return -EPERM;
+	if (!victim->d_inode)
+		return 0;
+	if (check_sticky(dir, victim->d_inode) ||
+	    IS_APPEND(victim->d_inode) ||
+	    IS_IMMUTABLE(victim->d_inode))
+		return -EPERM;
+	if (isdir) {
+		if (!S_ISDIR(victim->d_inode->i_mode))
+			return -ENOTDIR;
+		if (IS_ROOT(victim))
+			return -EBUSY;
+	} else if (S_ISDIR(victim->d_inode->i_mode))
+		return -EISDIR;
+	if (victim->d_flags & DCACHE_NFSFS_RENAMED)
+		return -EBUSY;
+	return 0;
+}
+
+/**
+ * vfs_whiteout - create a whiteout for the given directory entry
+ * @dir: parent inode
+ * @old_dentry: directory entry to whiteout
+ *
+ * Create a whiteout for the given directory entry.  A whiteout
+ * prevents lookup from dropping down to a lower layer of a union
+ * mounted file system.
+ *
+ * There are two important cases: (a) The directory entry to be
+ * whited-out may already exist, in which case it must first be
+ * deleted before we create the whiteout, and (b) no such directory
+ * entry exists and we only have to create the whiteout itself.
+ *
+ * The caller must pass in a dentry for the directory entry to be
+ * whited-out - a positive one if it exists, and a negative if not.
+ * When this function returns, the caller should dput() the old, now
+ * defunct dentry it passed in.  The dentry for the whiteout itself is
+ * created inside this function.
+ */
+static int vfs_whiteout(struct inode *dir, struct dentry *old_dentry, int isdir)
+{
+	int err;
+	struct inode *old_inode = old_dentry->d_inode;
+	struct dentry *parent, *whiteout;
+
+	err = may_whiteout(dir, old_dentry, isdir);
+	if (err)
+		return err;
+
+	BUG_ON(old_dentry->d_parent->d_inode != dir);
+
+	if (!dir->i_op || !dir->i_op->whiteout)
+		return -EOPNOTSUPP;
+
+	/*
+	 * If the old dentry is positive, then we have to delete this
+	 * entry before we create the whiteout.  The file system
+	 * ->whiteout() op does the actual delete, but we do all the
+	 * VFS-level checks and changes here.
+	 */
+	if (old_inode) {
+		mutex_lock(&old_inode->i_mutex);
+		if (isdir)
+			dentry_unhash(old_dentry);
+		if (d_mountpoint(old_dentry))
+			err = -EBUSY;
+		else {
+			if (isdir)
+				err = security_inode_rmdir(dir, old_dentry);
+			else
+				err = security_inode_unlink(dir, old_dentry);
+		}
+	}
+
+	parent = dget_parent(old_dentry);
+	whiteout = d_alloc_name(parent, old_dentry->d_name.name);
+
+	if (!err)
+		err = dir->i_op->whiteout(dir, old_dentry, whiteout);
+
+	if (old_inode) {
+		mutex_unlock(&old_inode->i_mutex);
+		if (!err) {
+			fsnotify_link_count(old_inode);
+			d_delete(old_dentry);
+		}
+		if (isdir)
+			dput(old_dentry);
+	}
+
+	dput(whiteout);
+	dput(parent);
+	return err;
+}
+
+int path_whiteout(struct path *dir_path, struct dentry *dentry, int isdir)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		error = vfs_whiteout(dir_path->dentry->d_inode, dentry, isdir);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_whiteout);
+
+/*
  * We try to drop the dentry early: we should have
  * a usage count of 2 if we're the only user of this
  * dentry, and if that is true (possibly after pruning
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 30b93b2..7648b49 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -183,6 +183,7 @@ d_iput:		no		no		no       yes
 #define DCACHE_INOTIFY_PARENT_WATCHED	0x0020 /* Parent inode is watched by inotify */
 
 #define DCACHE_COOKIE		0x0040	/* For use by dcookie subsystem */
+#define DCACHE_WHITEOUT		0x0080	/* This negative dentry is a whiteout */
 
 #define DCACHE_FSNOTIFY_PARENT_WATCHED	0x0080 /* Parent inode is watched by some fsnotify listener */
 
@@ -358,6 +359,11 @@ static inline int d_unlinked(struct dentry *dentry)
 	return d_unhashed(dentry) && !IS_ROOT(dentry);
 }
 
+static inline int d_is_whiteout(struct dentry *dentry)
+{
+	return (dentry->d_flags & DCACHE_WHITEOUT);
+}
+
 static inline struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ef72a..7afdbd4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -209,6 +209,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_WHITEOUT	(1<<25) /* FS supports whiteout filetype */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
@@ -1527,6 +1528,7 @@ struct inode_operations {
 	int (*mkdir) (struct inode *,struct dentry *,int);
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
-- 
1.6.3.3



* [PATCH 07/39] whiteout: Set S_OPAQUE inode flag when creating directories
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (5 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 06/39] whiteout: Add vfs_whiteout() and whiteout inode operation Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 08/39] whiteout: Allow removal of a directory with whiteouts Valerie Aurora
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

In a union directory, we don't want directories on lower layers of the
union to "show through".  To prevent the contents of underlying
directories from magically showing up after a mkdir(), we set the S_OPAQUE
flag on directories that are created where a whiteout existed before.
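
As a rough illustration of the rule above, here is a small userspace
sketch.  The toy_* names are hypothetical, the flag value is merely
copied from the S_OPAQUE definition in this patch, and the entry counts
assume that the two layers share no names.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric value as S_OPAQUE in this patch, applied to a toy flags word. */
#define TOY_S_OPAQUE	1024

struct toy_dir {
	unsigned int flags;
	size_t nr_entries;	/* entries stored in this layer only */
};

/*
 * Create a new, empty top-layer directory.  If the name was covered by
 * a whiteout before the mkdir, mark the new directory opaque.
 */
void toy_mkdir(struct toy_dir *dir, int was_whiteout)
{
	assert(dir != NULL);

	dir->flags = 0;
	dir->nr_entries = 0;
	if (was_whiteout)
		dir->flags |= TOY_S_OPAQUE;
}

/*
 * Number of entries visible through the union.  An opaque top-layer
 * directory hides everything in the lower directory of the same name.
 */
size_t toy_visible_entries(const struct toy_dir *top,
			   const struct toy_dir *lower)
{
	if (top->flags & TOY_S_OPAQUE)
		return top->nr_entries;
	return top->nr_entries + lower->nr_entries;
}
```

A directory created over a whiteout starts out with zero visible
entries even when the lower directory of the same name holds files; a
directory created without a preceding whiteout still shows them.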

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c         |   11 ++++++++++-
 include/linux/fs.h |    3 +++
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2c0681f..ce32e66 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2106,6 +2106,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, int, mode, unsigned, dev)
 int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 {
 	int error = may_create(dir, dentry);
+	int opaque = 0;
 
 	if (error)
 		return error;
@@ -2118,9 +2119,17 @@ int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 	if (error)
 		return error;
 
+	if (d_is_whiteout(dentry))
+		opaque = 1;
+
 	error = dir->i_op->mkdir(dir, dentry, mode);
-	if (!error)
+	if (!error) {
 		fsnotify_mkdir(dir, dentry);
+		if (opaque) {
+			dentry->d_inode->i_flags |= S_OPAQUE;
+			mark_inode_dirty(dentry->d_inode);
+		}
+	}
 	return error;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7afdbd4..e9aa650 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -236,6 +236,7 @@ struct inodes_stat_t {
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE	512	/* Inode is fs-internal */
+#define S_OPAQUE	1024	/* Directory is opaque */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -271,6 +272,8 @@ struct inodes_stat_t {
 #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
 #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
 
+#define IS_OPAQUE(inode)	((inode)->i_flags & S_OPAQUE)
+
 /* the read-only stuff doesn't really belong here, but any other place is
    probably as bad and I don't want to create yet another include file. */
 
-- 
1.6.3.3



* [PATCH 08/39] whiteout: Allow removal of a directory with whiteouts
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (6 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 07/39] whiteout: Set S_OPAQUE inode flag when creating directories Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12   ` Valerie Aurora
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

do_whiteout() allows removal of a directory when it has whiteouts but
is logically empty.

XXX - This patch abuses readdir() to check if the union directory is
logically empty - that is, all the entries are whiteouts (or "." or
"..").  Currently, we have no clean VFS interface to ask the lower
file system if a directory is empty.

Fixes:
 - Add ->is_directory_empty() op
 - Add is_directory_empty flag to dentry (ugly dcache populate)
 - Ask underlying fs to remove it and look for an error return
 - (your idea here)
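
For reference, the readdir-based emptiness test described above boils
down to the following userspace sketch.  The toy_* names and the
array-of-entries interface are hypothetical stand-ins for the filldir
callback; the d_type values match DT_REG and DT_WHT from <dirent.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values match DT_REG and DT_WHT; redefined so the sketch is standalone. */
#define TOY_DT_REG	8
#define TOY_DT_WHT	14

struct toy_dirent {
	const char *name;
	unsigned char d_type;
};

/* Returns 1 if this entry does not count against logical emptiness. */
int toy_entry_is_ignorable(const struct toy_dirent *de)
{
	if (strcmp(de->name, ".") == 0 || strcmp(de->name, "..") == 0)
		return 1;
	return de->d_type == TOY_DT_WHT;
}

/*
 * A directory is logically empty if every entry is ".", "..", or a
 * whiteout.  Returns 1 if empty, 0 otherwise.
 */
int toy_directory_is_empty(const struct toy_dirent *ents, size_t n)
{
	size_t i;

	assert(ents != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (!toy_entry_is_ignorable(&ents[i]))
			return 0;
	return 1;
}
```

A directory holding only ".", ".." and whiteouts is reported as empty,
while a single regular file makes it non-empty.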

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |   88 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ce32e66..7e2c31f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2309,6 +2309,94 @@ int path_whiteout(struct path *dir_path, struct dentry *dentry, int isdir)
 EXPORT_SYMBOL(path_whiteout);
 
 /*
+ * XXX - We are abusing readdir to check if a union directory is
+ * logically empty.
+ */
+static int filldir_is_empty(void *__buf, const char *name, int namlen,
+			    loff_t offset, u64 ino, unsigned int d_type)
+{
+	int *is_empty = (int *)__buf;
+
+	switch (namlen) {
+	case 2:
+		if (name[1] != '.')
+			break;
+	case 1:
+		if (name[0] != '.')
+			break;
+		return 0;
+	}
+
+	if (d_type == DT_WHT)
+		return 0;
+
+	(*is_empty) = 0;
+	return 0;
+}
+
+static int directory_is_empty(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct file *file;
+	int err;
+	int is_empty = 1;
+
+	BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
+
+	/* references for the file pointer */
+	dget(dentry);
+	mntget(mnt);
+
+	file = dentry_open(dentry, mnt, O_RDONLY, current_cred());
+	if (IS_ERR(file))
+		return 0;
+
+	err = vfs_readdir(file, filldir_is_empty, &is_empty);
+
+	fput(file);
+	return is_empty;
+}
+
+static int do_whiteout(struct nameidata *nd, struct path *path, int isdir)
+{
+	struct path safe = { .dentry = dget(nd->path.dentry),
+			     .mnt = mntget(nd->path.mnt) };
+	struct dentry *dentry = path->dentry;
+	int err;
+
+	err = may_whiteout(nd->path.dentry->d_inode, dentry, isdir);
+	if (err)
+		goto out;
+
+	err = -ENOENT;
+	if (!dentry->d_inode)
+		goto out;
+
+	err = -ENOTEMPTY;
+	if (isdir && !directory_is_empty(path->dentry, path->mnt))
+		goto out;
+
+	if (nd->path.dentry != dentry->d_parent) {
+		dentry = __lookup_hash(&path->dentry->d_name, nd->path.dentry,
+				       nd);
+		err = PTR_ERR(dentry);
+		if (IS_ERR(dentry))
+			goto out;
+
+		dput(path->dentry);
+		if (path->mnt != safe.mnt)
+			mntput(path->mnt);
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+	}
+
+	err = vfs_whiteout(nd->path.dentry->d_inode, dentry, isdir);
+
+out:
+	path_put(&safe);
+	return err;
+}
+
+/*
  * We try to drop the dentry early: we should have
  * a usage count of 2 if we're the only user of this
  * dentry, and if that is true (possibly after pruning
-- 
1.6.3.3



* [PATCH 09/39] whiteout: tmpfs whiteout support
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
@ 2010-05-03 23:12   ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 02/39] VFS: Make lookup_hash() return a struct path Valerie Aurora
                     ` (37 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	David Woodhouse, Valerie Aurora, Hugh Dickins, linux-mm

From: Jan Blunck <jblunck@suse.de>

Add support for whiteout dentries to tmpfs.  This includes adding
support for whiteouts to d_genocide(), which is called to tear down
pinned tmpfs dentries.  Whiteouts have to be persistent, so they have
a pinning extra ref count that needs to be dropped by d_genocide().
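
The ref counting rule can be modelled in userspace as below.  This is
only a sketch under simplified assumptions: the toy_* names are
hypothetical, there is no locking or tree walk, and each pinned dentry
is assumed to hold exactly one pinning reference.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dentry {
	int refcount;
	int has_inode;		/* positive dentry */
	int is_whiteout;	/* negative, but pinned like a positive one */
};

/* Positive dentries and whiteouts both carry the extra pinning ref. */
int toy_is_pinned(const struct toy_dentry *d)
{
	return d->has_inode || d->is_whiteout;
}

/*
 * Teardown pass: drop the pinning ref on positive dentries and on
 * whiteouts, and skip plain negative dentries, which were never
 * pinned.  Returns the number of refs dropped.
 */
size_t toy_genocide(struct toy_dentry *dentries, size_t n)
{
	size_t i, dropped = 0;

	assert(dentries != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!toy_is_pinned(&dentries[i]))
			continue;
		assert(dentries[i].refcount > 0);
		dentries[i].refcount--;
		dropped++;
	}
	return dropped;
}
```

Without the whiteout case in toy_is_pinned(), the whiteout would keep
its pinning ref after teardown and could never be evicted.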

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: linux-mm@kvack.org
---
 fs/dcache.c |   13 +++++-
 mm/shmem.c  |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 147 insertions(+), 15 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 265015d..3b0e525 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2229,7 +2229,18 @@ resume:
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
-		if (d_unhashed(dentry)||!dentry->d_inode)
+		/*
+		 * Skip unhashed and negative dentries, but process
+		 * positive dentries and whiteouts.  A whiteout looks
+		 * kind of like a negative dentry for purposes of
+		 * lookup, but it has an extra pinning ref count
+		 * because it can't be evicted like a negative dentry
+		 * can.  What we care about here is ref counts - and
+		 * we need to drop the ref count on a whiteout before
+		 * we can evict it.
+		 */
+		if (d_unhashed(dentry)||(!dentry->d_inode &&
+					 !d_is_whiteout(dentry)))
 			continue;
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
diff --git a/mm/shmem.c b/mm/shmem.c
index eef4ebe..c58ecf4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1805,6 +1805,76 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return 0;
 }
 
+static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
+static int shmem_unlink(struct inode *dir, struct dentry *dentry);
+
+/*
+ * This is the whiteout support for tmpfs. It uses one singleton whiteout
+ * inode per superblock thus it is very similar to shmem_link().
+ */
+static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
+			  struct dentry *new_dentry)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+	struct dentry *dentry;
+
+	if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+		return -EPERM;
+
+	/* This gives us a properly initialized negative dentry */
+	dentry = simple_lookup(dir, new_dentry, NULL);
+	if (dentry && IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	/*
+	 * No ordinary (disk based) filesystem counts whiteouts as inodes;
+	 * but each new link needs a new dentry, pinning lowmem, and
+	 * tmpfs dentries cannot be pruned until they are unlinked.
+	 */
+	if (sbinfo->max_inodes) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return -ENOSPC;
+		}
+		sbinfo->free_inodes--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+
+	if (old_dentry->d_inode) {
+		if (S_ISDIR(old_dentry->d_inode->i_mode))
+			shmem_rmdir(dir, old_dentry);
+		else
+			shmem_unlink(dir, old_dentry);
+	}
+
+	dir->i_size += BOGO_DIRENT_SIZE;
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	/* Extra pinning count for the created dentry */
+	dget(new_dentry);
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	return 0;
+}
+
+static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
+				struct inode *inode)
+{
+	if (d_is_whiteout(dentry)) {
+		/* Re-using an existing whiteout */
+		shmem_free_inode(dir->i_sb);
+		if (S_ISDIR(inode->i_mode))
+			inode->i_flags |= S_OPAQUE;
+	} else {
+		/* New dentry */
+		dir->i_size += BOGO_DIRENT_SIZE;
+		dget(dentry); /* Extra count - pin the dentry in core */
+	}
+	/* Will clear DCACHE_WHITEOUT flag */
+	d_instantiate(dentry, inode);
+
+}
 /*
  * File creation. Allocate an inode, and we're done..
  */
@@ -1838,10 +1908,10 @@ shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 			if (S_ISDIR(mode))
 				inode->i_mode |= S_ISGID;
 		}
-		dir->i_size += BOGO_DIRENT_SIZE;
+
+		shmem_d_instantiate(dir, dentry, inode);
+
 		dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-		d_instantiate(dentry, inode);
-		dget(dentry); /* Extra count - pin the dentry in core */
 	}
 	return error;
 }
@@ -1879,12 +1949,11 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
 	if (ret)
 		goto out;
 
-	dir->i_size += BOGO_DIRENT_SIZE;
+	shmem_d_instantiate(dir, dentry, inode);
+
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
 	inc_nlink(inode);
 	atomic_inc(&inode->i_count);	/* New dentry reference */
-	dget(dentry);		/* Extra pinning count for the created dentry */
-	d_instantiate(dentry, inode);
 out:
 	return ret;
 }
@@ -1893,21 +1962,61 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
-		shmem_free_inode(inode->i_sb);
+	if (d_is_whiteout(dentry) || (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
+		shmem_free_inode(dir->i_sb);
 
+	if (inode) {
+		inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+		drop_nlink(inode);
+	}
 	dir->i_size -= BOGO_DIRENT_SIZE;
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - this does all the work */
 	return 0;
 }
 
+static void shmem_dir_unlink_whiteouts(struct inode *dir, struct dentry *dentry)
+{
+	if (!dentry->d_inode)
+		return;
+
+	/* Remove whiteouts from a logically empty directory */
+	if (S_ISDIR(dentry->d_inode->i_mode) &&
+	    dentry->d_inode->i_sb->s_flags & MS_WHITEOUT) {
+		struct dentry *child, *next;
+		LIST_HEAD(list);
+
+		spin_lock(&dcache_lock);
+		list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
+			spin_lock(&child->d_lock);
+			if (d_is_whiteout(child)) {
+				__d_drop(child);
+				if (!list_empty(&child->d_lru)) {
+					list_del(&child->d_lru);
+					dentry_stat.nr_unused--;
+				}
+				list_add(&child->d_lru, &list);
+			}
+			spin_unlock(&child->d_lock);
+		}
+		spin_unlock(&dcache_lock);
+
+		list_for_each_entry_safe(child, next, &list, d_lru) {
+			spin_lock(&child->d_lock);
+			list_del_init(&child->d_lru);
+			spin_unlock(&child->d_lock);
+
+			shmem_unlink(dentry->d_inode, child);
+		}
+	}
+}
+
 static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	if (!simple_empty(dentry))
 		return -ENOTEMPTY;
 
+	/* Remove whiteouts from a logically empty directory */
+	shmem_dir_unlink_whiteouts(dir, dentry);
 	drop_nlink(dentry->d_inode);
 	drop_nlink(dir);
 	return shmem_unlink(dir, dentry);
@@ -1916,7 +2025,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 /*
  * The VFS layer already does all the dentry stuff for rename,
  * we just have to decrement the usage count for the target if
- * it exists so that the VFS layer correctly free's it when it
+ * it exists so that the VFS layer correctly frees it when it
  * gets overwritten.
  */
 static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)
@@ -1927,7 +2036,12 @@ static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct
 	if (!simple_empty(new_dentry))
 		return -ENOTEMPTY;
 
+	if (d_is_whiteout(new_dentry))
+		shmem_unlink(new_dir, new_dentry);
+
 	if (new_dentry->d_inode) {
+		/* Remove whiteouts from a logically empty directory */
+		shmem_dir_unlink_whiteouts(new_dir, new_dentry);
 		(void) shmem_unlink(new_dir, new_dentry);
 		if (they_are_dirs)
 			drop_nlink(old_dir);
@@ -1992,12 +2106,12 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 		unlock_page(page);
 		page_cache_release(page);
 	}
+
+	shmem_d_instantiate(dir, dentry, inode);
+
 	if (dir->i_mode & S_ISGID)
 		inode->i_gid = dir->i_gid;
-	dir->i_size += BOGO_DIRENT_SIZE;
 	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	d_instantiate(dentry, inode);
-	dget(dentry);
 	return 0;
 }
 
@@ -2375,6 +2489,12 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	if (!root)
 		goto failed_iput;
 	sb->s_root = root;
+
+#ifdef CONFIG_TMPFS
+	if (!(sb->s_flags & MS_NOUSER))
+		sb->s_flags |= MS_WHITEOUT;
+#endif
+
 	return 0;
 
 failed_iput:
@@ -2475,6 +2595,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.rmdir		= shmem_rmdir,
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
+	.whiteout       = shmem_whiteout,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
-- 
1.6.3.3



* [PATCH 09/39] whiteout: tmpfs whiteout support
@ 2010-05-03 23:12   ` Valerie Aurora
  0 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	David Woodhouse, Valerie Aurora, Hugh Dickins, linux-mm

From: Jan Blunck <jblunck@suse.de>

Add support for whiteout dentries to tmpfs.  This includes adding
support for whiteouts to d_genocide(), which is called to tear down
pinned tmpfs dentries.  Whiteouts have to be persistent, so they have
a pinning extra ref count that needs to be dropped by d_genocide().

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: linux-mm@kvack.org
---
 fs/dcache.c |   13 +++++-
 mm/shmem.c  |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 147 insertions(+), 15 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 265015d..3b0e525 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2229,7 +2229,18 @@ resume:
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
-		if (d_unhashed(dentry)||!dentry->d_inode)
+		/*
+		 * Skip unhashed and negative dentries, but process
+		 * positive dentries and whiteouts.  A whiteout looks
+		 * kind of like a negative dentry for purposes of
+		 * lookup, but it has an extra pinning ref count
+		 * because it can't be evicted like a negative dentry
+		 * can.  What we care about here is ref counts - and
+		 * we need to drop the ref count on a whiteout before
+		 * we can evict it.
+		 */
+		if (d_unhashed(dentry)||(!dentry->d_inode &&
+					 !d_is_whiteout(dentry)))
 			continue;
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
diff --git a/mm/shmem.c b/mm/shmem.c
index eef4ebe..c58ecf4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1805,6 +1805,76 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return 0;
 }
 
+static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
+static int shmem_unlink(struct inode *dir, struct dentry *dentry);
+
+/*
+ * This is the whiteout support for tmpfs. It uses one singleton whiteout
+ * inode per superblock thus it is very similar to shmem_link().
+ */
+static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
+			  struct dentry *new_dentry)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+	struct dentry *dentry;
+
+	if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+		return -EPERM;
+
+	/* This gives us a properly initialized negative dentry */
+	dentry = simple_lookup(dir, new_dentry, NULL);
+	if (dentry && IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	/*
+	 * No ordinary (disk based) filesystem counts whiteouts as inodes;
+	 * but each new link needs a new dentry, pinning lowmem, and
+	 * tmpfs dentries cannot be pruned until they are unlinked.
+	 */
+	if (sbinfo->max_inodes) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return -ENOSPC;
+		}
+		sbinfo->free_inodes--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+
+	if (old_dentry->d_inode) {
+		if (S_ISDIR(old_dentry->d_inode->i_mode))
+			shmem_rmdir(dir, old_dentry);
+		else
+			shmem_unlink(dir, old_dentry);
+	}
+
+	dir->i_size += BOGO_DIRENT_SIZE;
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	/* Extra pinning count for the created dentry */
+	dget(new_dentry);
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	return 0;
+}
+
+static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
+				struct inode *inode)
+{
+	if (d_is_whiteout(dentry)) {
+		/* Re-using an existing whiteout */
+		shmem_free_inode(dir->i_sb);
+		if (S_ISDIR(inode->i_mode))
+			inode->i_flags |= S_OPAQUE;
+	} else {
+		/* New dentry */
+		dir->i_size += BOGO_DIRENT_SIZE;
+		dget(dentry); /* Extra count - pin the dentry in core */
+	}
+	/* Will clear DCACHE_WHITEOUT flag */
+	d_instantiate(dentry, inode);
+
+}
 /*
  * File creation. Allocate an inode, and we're done..
  */
@@ -1838,10 +1908,10 @@ shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 			if (S_ISDIR(mode))
 				inode->i_mode |= S_ISGID;
 		}
-		dir->i_size += BOGO_DIRENT_SIZE;
+
+		shmem_d_instantiate(dir, dentry, inode);
+
 		dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-		d_instantiate(dentry, inode);
-		dget(dentry); /* Extra count - pin the dentry in core */
 	}
 	return error;
 }
@@ -1879,12 +1949,11 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
 	if (ret)
 		goto out;
 
-	dir->i_size += BOGO_DIRENT_SIZE;
+	shmem_d_instantiate(dir, dentry, inode);
+
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
 	inc_nlink(inode);
 	atomic_inc(&inode->i_count);	/* New dentry reference */
-	dget(dentry);		/* Extra pinning count for the created dentry */
-	d_instantiate(dentry, inode);
 out:
 	return ret;
 }
@@ -1893,21 +1962,61 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
-		shmem_free_inode(inode->i_sb);
+	if (d_is_whiteout(dentry) || (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
+		shmem_free_inode(dir->i_sb);
 
+	if (inode) {
+		inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+		drop_nlink(inode);
+	}
 	dir->i_size -= BOGO_DIRENT_SIZE;
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - this does all the work */
 	return 0;
 }
 
+static void shmem_dir_unlink_whiteouts(struct inode *dir, struct dentry *dentry)
+{
+	if (!dentry->d_inode)
+		return;
+
+	/* Remove whiteouts from a logically empty directory */
+	if (S_ISDIR(dentry->d_inode->i_mode) &&
+	    dentry->d_inode->i_sb->s_flags & MS_WHITEOUT) {
+		struct dentry *child, *next;
+		LIST_HEAD(list);
+
+		spin_lock(&dcache_lock);
+		list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
+			spin_lock(&child->d_lock);
+			if (d_is_whiteout(child)) {
+				__d_drop(child);
+				if (!list_empty(&child->d_lru)) {
+					list_del(&child->d_lru);
+					dentry_stat.nr_unused--;
+				}
+				list_add(&child->d_lru, &list);
+			}
+			spin_unlock(&child->d_lock);
+		}
+		spin_unlock(&dcache_lock);
+
+		list_for_each_entry_safe(child, next, &list, d_lru) {
+			spin_lock(&child->d_lock);
+			list_del_init(&child->d_lru);
+			spin_unlock(&child->d_lock);
+
+			shmem_unlink(dentry->d_inode, child);
+		}
+	}
+}
+
 static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	if (!simple_empty(dentry))
 		return -ENOTEMPTY;
 
+	/* Remove whiteouts from a logically empty directory */
+	shmem_dir_unlink_whiteouts(dir, dentry);
 	drop_nlink(dentry->d_inode);
 	drop_nlink(dir);
 	return shmem_unlink(dir, dentry);
@@ -1916,7 +2025,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 /*
  * The VFS layer already does all the dentry stuff for rename,
  * we just have to decrement the usage count for the target if
- * it exists so that the VFS layer correctly free's it when it
+ * it exists so that the VFS layer correctly frees it when it
  * gets overwritten.
  */
 static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)
@@ -1927,7 +2036,12 @@ static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct
 	if (!simple_empty(new_dentry))
 		return -ENOTEMPTY;
 
+	if (d_is_whiteout(new_dentry))
+		shmem_unlink(new_dir, new_dentry);
+
 	if (new_dentry->d_inode) {
+		/* Remove whiteouts from a logically empty directory */
+		shmem_dir_unlink_whiteouts(new_dir, new_dentry);
 		(void) shmem_unlink(new_dir, new_dentry);
 		if (they_are_dirs)
 			drop_nlink(old_dir);
@@ -1992,12 +2106,12 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 		unlock_page(page);
 		page_cache_release(page);
 	}
+
+	shmem_d_instantiate(dir, dentry, inode);
+
 	if (dir->i_mode & S_ISGID)
 		inode->i_gid = dir->i_gid;
-	dir->i_size += BOGO_DIRENT_SIZE;
 	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	d_instantiate(dentry, inode);
-	dget(dentry);
 	return 0;
 }
 
@@ -2375,6 +2489,12 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	if (!root)
 		goto failed_iput;
 	sb->s_root = root;
+
+#ifdef CONFIG_TMPFS
+	if (!(sb->s_flags & MS_NOUSER))
+		sb->s_flags |= MS_WHITEOUT;
+#endif
+
 	return 0;
 
 failed_iput:
@@ -2475,6 +2595,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.rmdir		= shmem_rmdir,
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
+	.whiteout       = shmem_whiteout,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 10/39] whiteout: Split of ext2_append_link() from ext2_add_link()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (8 preceding siblings ...)
  2010-05-03 23:12   ` Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 11/39] whiteout: ext2 whiteout support Valerie Aurora
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, Theodore Tso, linux-ext4

From: Jan Blunck <jblunck@suse.de>

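Split ext2_append_entry() (called ext2_append_link() in the subject)
off from ext2_add_link().  The new helper finds an existing directory
entry for the dentry or appends space for a new one; a later patch
uses it to create whiteouts.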
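
The helper in the diff below reports failure through its return value:
it hands back either a directory entry or an ERR_PTR()-encoded errno,
and ext2_add_link() tests the result with IS_ERR()/PTR_ERR().  A rough
userspace sketch of that convention follows.  Every sketch_* name is
hypothetical, and the slot array is a simplification of the on-disk
directory walk, not the ext2 code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *sketch_err_ptr(long error)
{
	return (void *)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	/* Error codes occupy the top SKETCH_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical directory slot; not the ext2 on-disk layout. */
struct sketch_slot {
	int used;
	char name[16];
};

/*
 * Return the slot already holding @name, else the first free slot,
 * else an encoded error.  The caller decides what a match means
 * (an -EEXIST failure for a link, reuse for a whiteout).
 */
static struct sketch_slot *sketch_find_or_append(struct sketch_slot *slots,
						 size_t n, const char *name)
{
	struct sketch_slot *free_slot = NULL;
	size_t i;

	assert(slots != NULL && name != NULL);

	if (strlen(name) >= sizeof(slots[0].name))
		return sketch_err_ptr(-ENAMETOOLONG);

	for (i = 0; i < n; i++) {
		if (slots[i].used && strcmp(slots[i].name, name) == 0)
			return &slots[i];
		if (!slots[i].used && !free_slot)
			free_slot = &slots[i];
	}

	if (!free_slot)
		return sketch_err_ptr(-ENOSPC);
	return free_slot;
}
```

With this convention a failed call never returns NULL, so callers
check the result with the IS_ERR() equivalent rather than comparing
against NULL.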

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
---
 fs/ext2/dir.c |   70 ++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 7516957..57207a9 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -472,9 +472,10 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 }
 
 /*
- *	Parent is locked.
+ * Find or append a given dentry to the parent directory
  */
-int ext2_add_link (struct dentry *dentry, struct inode *inode)
+static ext2_dirent * ext2_append_entry(struct dentry * dentry,
+				       struct page ** page)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	const char *name = dentry->d_name.name;
@@ -482,13 +483,10 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 	unsigned chunk_size = ext2_chunk_size(dir);
 	unsigned reclen = EXT2_DIR_REC_LEN(namelen);
 	unsigned short rec_len, name_len;
-	struct page *page = NULL;
-	ext2_dirent * de;
+	ext2_dirent * de = NULL;
 	unsigned long npages = dir_pages(dir);
 	unsigned long n;
 	char *kaddr;
-	loff_t pos;
-	int err;
 
 	/*
 	 * We take care of directory expansion in the same loop.
@@ -498,20 +496,19 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 	for (n = 0; n <= npages; n++) {
 		char *dir_end;
 
-		page = ext2_get_page(dir, n, 0);
-		err = PTR_ERR(page);
-		if (IS_ERR(page))
+		*page = ext2_get_page(dir, n, 0);
+		de = ERR_PTR(PTR_ERR(*page));
+		if (IS_ERR(*page))
 			goto out;
-		lock_page(page);
-		kaddr = page_address(page);
+		lock_page(*page);
+		kaddr = page_address(*page);
 		dir_end = kaddr + ext2_last_byte(dir, n);
 		de = (ext2_dirent *)kaddr;
 		kaddr += PAGE_CACHE_SIZE - reclen;
 		while ((char *)de <= kaddr) {
 			if ((char *)de == dir_end) {
 				/* We hit i_size */
-				name_len = 0;
-				rec_len = chunk_size;
+				de->name_len = 0;
 				de->rec_len = ext2_rec_len_to_disk(chunk_size);
 				de->inode = 0;
 				goto got_it;
@@ -519,12 +516,11 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 			if (de->rec_len == 0) {
 				ext2_error(dir->i_sb, __func__,
 					"zero-length directory entry");
-				err = -EIO;
+				de = ERR_PTR(-EIO);
 				goto out_unlock;
 			}
-			err = -EEXIST;
 			if (ext2_match (namelen, name, de))
-				goto out_unlock;
+				goto got_it;
 			name_len = EXT2_DIR_REC_LEN(de->name_len);
 			rec_len = ext2_rec_len_from_disk(de->rec_len);
 			if (!de->inode && rec_len >= reclen)
@@ -533,13 +529,48 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 				goto got_it;
 			de = (ext2_dirent *) ((char *) de + rec_len);
 		}
-		unlock_page(page);
-		ext2_put_page(page);
+		unlock_page(*page);
+		ext2_put_page(*page);
 	}
+
 	BUG();
-	return -EINVAL;
 
 got_it:
+	return de;
+	/* OFFSET_CACHE */
+out_unlock:
+	unlock_page(*page);
+	ext2_put_page(*page);
+out:
+	return de;
+}
+
+/*
+ *	Parent is locked.
+ */
+int ext2_add_link (struct dentry *dentry, struct inode *inode)
+{
+	struct inode *dir = dentry->d_parent->d_inode;
+	const char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned short rec_len, name_len;
+	ext2_dirent * de;
+	struct page *page;
+	loff_t pos;
+	int err;
+
+	de = ext2_append_entry(dentry, &page);
+	if (IS_ERR(de))
+		return PTR_ERR(de);
+
+	err = -EEXIST;
+	if (ext2_match (namelen, name, de))
+		goto out_unlock;
+
+got_it:
+	name_len = EXT2_DIR_REC_LEN(de->name_len);
+	rec_len = ext2_rec_len_from_disk(de->rec_len);
+
 	pos = page_offset(page) +
 		(char*)de - (char*)page_address(page);
 	err = __ext2_write_begin(NULL, page->mapping, pos, rec_len, 0,
@@ -563,7 +594,6 @@ got_it:
 	/* OFFSET_CACHE */
 out_put:
 	ext2_put_page(page);
-out:
 	return err;
 out_unlock:
 	unlock_page(page);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 11/39] whiteout: ext2 whiteout support
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (9 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 10/39] whiteout: Split of ext2_append_link() from ext2_add_link() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12   ` Valerie Aurora
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, Theodore Tso, linux-ext4

From: Jan Blunck <jblunck@suse.de>

This patch adds whiteout support to ext2. A whiteout is an empty directory
entry (inode == 0) with the file type set to EXT2_FT_WHT, so each whiteout
occupies space in its directory. Because whiteouts are implemented as a
file type, the file system must have the EXT2_FEATURE_INCOMPAT_FILETYPE
feature set. MS_WHITEOUT is only set on the superblock when the new
EXT2_FEATURE_INCOMPAT_WHITEOUT feature is present.
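
As a rough illustration of the representation described above, the
userspace sketch below mirrors the patched ext2_match(): an unused slot
(inode == 0) never matches a name, but a whiteout (inode == 0 with the
whiteout file type) does.  The struct is a simplified, hypothetical
stand-in for struct ext2_dir_entry_2, and all sketch_* names are made
up for this example; only the value 8 for the whiteout type is taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct ext2_dir_entry_2 (hypothetical). */
struct sketch_dirent {
	uint32_t inode;		/* 0 for unused slots and for whiteouts */
	uint16_t rec_len;
	uint8_t  name_len;
	uint8_t  file_type;
	char     name[255];
};

enum {
	SKETCH_FT_UNKNOWN  = 0,
	SKETCH_FT_REG_FILE = 1,
	SKETCH_FT_WHT      = 8,	/* same value as EXT2_FT_WHT in the patch */
};

/* A whiteout has no inode but carries the whiteout file type. */
static int sketch_is_whiteout(const struct sketch_dirent *de)
{
	assert(de != NULL);
	return de->inode == 0 && de->file_type == SKETCH_FT_WHT;
}

/* Name comparison that skips unused slots but still sees whiteouts. */
static int sketch_match(size_t len, const char *name,
			const struct sketch_dirent *de)
{
	assert(de != NULL);
	if (len != de->name_len)
		return 0;
	if (!de->inode && de->file_type != SKETCH_FT_WHT)
		return 0;
	return memcmp(name, de->name, len) == 0;
}
```

A lookup built on this kind of match can tell "name is whited out"
apart from "name was never there", which is what lets
ext2_inode_by_dentry() set DCACHE_WHITEOUT on the dentry.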

XXX - Whiteouts could be implemented as special symbolic links

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
---
 fs/ext2/dir.c           |   96 +++++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/ext2.h          |    3 +
 fs/ext2/inode.c         |   11 ++++-
 fs/ext2/namei.c         |   67 +++++++++++++++++++++++++++++++-
 fs/ext2/super.c         |    6 +++
 include/linux/ext2_fs.h |    4 ++
 6 files changed, 177 insertions(+), 10 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 57207a9..030bd46 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -219,7 +219,7 @@ static inline int ext2_match (int len, const char * const name,
 {
 	if (len != de->name_len)
 		return 0;
-	if (!de->inode)
+	if (!de->inode && (de->file_type != EXT2_FT_WHT))
 		return 0;
 	return !memcmp(name, de->name, len);
 }
@@ -255,6 +255,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
 	[EXT2_FT_FIFO]		= DT_FIFO,
 	[EXT2_FT_SOCK]		= DT_SOCK,
 	[EXT2_FT_SYMLINK]	= DT_LNK,
+	[EXT2_FT_WHT]		= DT_WHT,
 };
 
 #define S_SHIFT 12
@@ -448,6 +449,26 @@ ino_t ext2_inode_by_name(struct inode *dir, struct qstr *child)
 	return res;
 }
 
+/* Special version for filetype based whiteout support */
+ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
+{
+	ino_t res = 0;
+	struct ext2_dir_entry_2 *de;
+	struct page *page;
+
+	de = ext2_find_entry (dir, &dentry->d_name, &page);
+	if (de) {
+		res = le32_to_cpu(de->inode);
+		if (!res && de->file_type == EXT2_FT_WHT) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_WHITEOUT;
+			spin_unlock(&dentry->d_lock);
+		}
+		ext2_put_page(page);
+	}
+	return res;
+}
+
 /* Releases the page */
 void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 		   struct page *page, struct inode *inode, int update_times)
@@ -523,7 +544,8 @@ static ext2_dirent * ext2_append_entry(struct dentry * dentry,
 				goto got_it;
 			name_len = EXT2_DIR_REC_LEN(de->name_len);
 			rec_len = ext2_rec_len_from_disk(de->rec_len);
-			if (!de->inode && rec_len >= reclen)
+			if (!de->inode && (de->file_type != EXT2_FT_WHT) &&
+			    (rec_len >= reclen))
 				goto got_it;
 			if (rec_len >= name_len + reclen)
 				goto got_it;
@@ -564,8 +586,11 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 		return PTR_ERR(de);
 
 	err = -EEXIST;
-	if (ext2_match (namelen, name, de))
+	if (ext2_match (namelen, name, de)) {
+		if (de->file_type == EXT2_FT_WHT)
+			goto got_it;
 		goto out_unlock;
+	}
 
 got_it:
 	name_len = EXT2_DIR_REC_LEN(de->name_len);
@@ -577,7 +602,8 @@ got_it:
 							&page, NULL);
 	if (err)
 		goto out_unlock;
-	if (de->inode) {
+	if (de->inode || ((de->file_type == EXT2_FT_WHT) &&
+			  !ext2_match (namelen, name, de))) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
 		de->rec_len = ext2_rec_len_to_disk(name_len);
@@ -646,6 +672,68 @@ out:
 	return err;
 }
 
+int ext2_whiteout_entry (struct inode * dir, struct dentry * dentry,
+			 struct ext2_dir_entry_2 * de, struct page * page)
+{
+	const char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned short rec_len, name_len;
+	loff_t pos;
+	int err;
+
+	if (!de) {
+		de = ext2_append_entry(dentry, &page);
+		BUG_ON(!de);
+	}
+
+	err = -EEXIST;
+	if (ext2_match (namelen, name, de) &&
+	    (de->file_type == EXT2_FT_WHT)) {
+		ext2_error(dir->i_sb, __func__,
+			   "entry is already a whiteout in directory #%lu",
+			   dir->i_ino);
+		goto out_unlock;
+	}
+
+	name_len = EXT2_DIR_REC_LEN(de->name_len);
+	rec_len = ext2_rec_len_from_disk(de->rec_len);
+
+	pos = page_offset(page) +
+		(char*)de - (char*)page_address(page);
+	err = __ext2_write_begin(NULL, page->mapping, pos, rec_len, 0,
+							&page, NULL);
+	if (err)
+		goto out_unlock;
+	/*
+	 * We whiteout an existing entry. Do what ext2_delete_entry() would do,
+	 * except that we don't need to merge with the previous entry since
+	 * we are going to reuse it.
+	 */
+	if (ext2_match (namelen, name, de))
+		de->inode = 0;
+	if (de->inode || (de->file_type == EXT2_FT_WHT)) {
+		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
+		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
+		de->rec_len = ext2_rec_len_to_disk(name_len);
+		de = de1;
+	}
+	de->name_len = namelen;
+	memcpy(de->name, name, namelen);
+	de->inode = 0;
+	de->file_type = EXT2_FT_WHT;
+	err = ext2_commit_chunk(page, pos, rec_len);
+	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
+	mark_inode_dirty(dir);
+	/* OFFSET_CACHE */
+out_put:
+	ext2_put_page(page);
+	return err;
+out_unlock:
+	unlock_page(page);
+	goto out_put;
+}
+
 /*
  * Set the first fragment of directory.
  */
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 0b038e4..44d190c 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -102,9 +102,12 @@ extern void ext2_rsv_window_add(struct super_block *sb, struct ext2_reserve_wind
 /* dir.c */
 extern int ext2_add_link (struct dentry *, struct inode *);
 extern ino_t ext2_inode_by_name(struct inode *, struct qstr *);
+extern ino_t ext2_inode_by_dentry(struct inode *, struct dentry *);
 extern int ext2_make_empty(struct inode *, struct inode *);
 extern struct ext2_dir_entry_2 * ext2_find_entry (struct inode *,struct qstr *, struct page **);
 extern int ext2_delete_entry (struct ext2_dir_entry_2 *, struct page *);
+extern int ext2_whiteout_entry (struct inode *, struct dentry *,
+				struct ext2_dir_entry_2 *, struct page *);
 extern int ext2_empty_dir (struct inode *);
 extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *, int);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index fc13cc1..5ad2cbb 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1184,7 +1184,8 @@ void ext2_set_inode_flags(struct inode *inode)
 {
 	unsigned int flags = EXT2_I(inode)->i_flags;
 
-	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
+	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
+			    S_OPAQUE);
 	if (flags & EXT2_SYNC_FL)
 		inode->i_flags |= S_SYNC;
 	if (flags & EXT2_APPEND_FL)
@@ -1195,6 +1196,8 @@ void ext2_set_inode_flags(struct inode *inode)
 		inode->i_flags |= S_NOATIME;
 	if (flags & EXT2_DIRSYNC_FL)
 		inode->i_flags |= S_DIRSYNC;
+	if (flags & EXT2_OPAQUE_FL)
+		inode->i_flags |= S_OPAQUE;
 }
 
 /* Propagate flags from i_flags to EXT2_I(inode)->i_flags */
@@ -1202,8 +1205,8 @@ void ext2_get_inode_flags(struct ext2_inode_info *ei)
 {
 	unsigned int flags = ei->vfs_inode.i_flags;
 
-	ei->i_flags &= ~(EXT2_SYNC_FL|EXT2_APPEND_FL|
-			EXT2_IMMUTABLE_FL|EXT2_NOATIME_FL|EXT2_DIRSYNC_FL);
+	ei->i_flags &= ~(EXT2_SYNC_FL|EXT2_APPEND_FL|EXT2_IMMUTABLE_FL|
+			 EXT2_NOATIME_FL|EXT2_DIRSYNC_FL|EXT2_OPAQUE_FL);
 	if (flags & S_SYNC)
 		ei->i_flags |= EXT2_SYNC_FL;
 	if (flags & S_APPEND)
@@ -1214,6 +1217,8 @@ void ext2_get_inode_flags(struct ext2_inode_info *ei)
 		ei->i_flags |= EXT2_NOATIME_FL;
 	if (flags & S_DIRSYNC)
 		ei->i_flags |= EXT2_DIRSYNC_FL;
+	if (flags & S_OPAQUE)
+		ei->i_flags |= EXT2_OPAQUE_FL;
 }
 
 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 71efb0e..12195a5 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -55,15 +55,16 @@ static inline int ext2_add_nondir(struct dentry *dentry, struct inode *inode)
  * Methods themselves.
  */
 
-static struct dentry *ext2_lookup(struct inode * dir, struct dentry *dentry, struct nameidata *nd)
+static struct dentry *ext2_lookup(struct inode * dir, struct dentry *dentry,
+				  struct nameidata *nd)
 {
 	struct inode * inode;
 	ino_t ino;
-	
+
 	if (dentry->d_name.len > EXT2_NAME_LEN)
 		return ERR_PTR(-ENAMETOOLONG);
 
-	ino = ext2_inode_by_name(dir, &dentry->d_name);
+	ino = ext2_inode_by_dentry(dir, dentry);
 	inode = NULL;
 	if (ino) {
 		inode = ext2_iget(dir->i_sb, ino);
@@ -242,6 +243,10 @@ static int ext2_mkdir(struct inode * dir, struct dentry * dentry, int mode)
 	else
 		inode->i_mapping->a_ops = &ext2_aops;
 
+	/* if we call mkdir on a whiteout create an opaque directory */
+	if (dentry->d_flags & DCACHE_WHITEOUT)
+		inode->i_flags |= S_OPAQUE;
+
 	inode_inc_link_count(inode);
 
 	err = ext2_make_empty(inode, dir);
@@ -307,6 +312,61 @@ static int ext2_rmdir (struct inode * dir, struct dentry *dentry)
 	return err;
 }
 
+/*
+ * Create a whiteout for the dentry
+ */
+static int ext2_whiteout(struct inode *dir, struct dentry *dentry,
+			 struct dentry *new_dentry)
+{
+	struct inode * inode = dentry->d_inode;
+	struct ext2_dir_entry_2 * de = NULL;
+	struct page * page;
+	int err = -ENOTEMPTY;
+
+	if (!EXT2_HAS_INCOMPAT_FEATURE(dir->i_sb,
+				       EXT2_FEATURE_INCOMPAT_FILETYPE)) {
+		ext2_error (dir->i_sb, "ext2_whiteout",
+			    "can't set whiteout filetype");
+		err = -EPERM;
+		goto out;
+	}
+
+	dquot_initialize(dir);
+
+	if (inode) {
+		if (S_ISDIR(inode->i_mode) && !ext2_empty_dir(inode))
+			goto out;
+
+		err = -ENOENT;
+		de = ext2_find_entry (dir, &dentry->d_name, &page);
+		if (!de)
+			goto out;
+		lock_page(page);
+	}
+
+	err = ext2_whiteout_entry (dir, dentry, de, page);
+	if (err)
+		goto out;
+
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	d_add(new_dentry, NULL);
+
+	if (inode) {
+		inode->i_ctime = dir->i_ctime;
+		inode_dec_link_count(inode);
+		if (S_ISDIR(inode->i_mode)) {
+			inode->i_size = 0;
+			inode_dec_link_count(inode);
+			inode_dec_link_count(dir);
+		}
+	}
+	err = 0;
+out:
+	return err;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct inode * new_dir,	struct dentry * new_dentry )
 {
@@ -409,6 +469,7 @@ const struct inode_operations ext2_dir_inode_operations = {
 	.mkdir		= ext2_mkdir,
 	.rmdir		= ext2_rmdir,
 	.mknod		= ext2_mknod,
+	.whiteout	= ext2_whiteout,
 	.rename		= ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 42e4a30..000ee17 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1079,6 +1079,12 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 	if (EXT2_HAS_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_HAS_JOURNAL))
 		ext2_msg(sb, KERN_WARNING,
 			"warning: mounting ext3 filesystem as ext2");
+	/*
+	 * Whiteouts (and fallthrus) require explicit whiteout support.
+	 */
+	if (EXT2_HAS_INCOMPAT_FEATURE(sb, EXT2_FEATURE_INCOMPAT_WHITEOUT))
+		sb->s_flags |= MS_WHITEOUT;
+
 	ext2_setup_super (sb, es, sb->s_flags & MS_RDONLY);
 	return 0;
 
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 2dfa707..20468bd 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -189,6 +189,7 @@ struct ext2_group_desc
 #define EXT2_NOTAIL_FL			FS_NOTAIL_FL	/* file tail should not be merged */
 #define EXT2_DIRSYNC_FL			FS_DIRSYNC_FL	/* dirsync behaviour (directories only) */
 #define EXT2_TOPDIR_FL			FS_TOPDIR_FL	/* Top of directory hierarchies*/
+#define EXT2_OPAQUE_FL			0x00040000
 #define EXT2_RESERVED_FL		FS_RESERVED_FL	/* reserved for ext2 lib */
 
 #define EXT2_FL_USER_VISIBLE		FS_FL_USER_VISIBLE	/* User visible flags */
@@ -503,10 +504,12 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_RECOVER		0x0004
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	0x0008
 #define EXT2_FEATURE_INCOMPAT_META_BG		0x0010
+#define EXT2_FEATURE_INCOMPAT_WHITEOUT		0x0020
 #define EXT2_FEATURE_INCOMPAT_ANY		0xffffffff
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE| \
+					 EXT2_FEATURE_INCOMPAT_WHITEOUT| \
 					 EXT2_FEATURE_INCOMPAT_META_BG)
 #define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
@@ -573,6 +576,7 @@ enum {
 	EXT2_FT_FIFO		= 5,
 	EXT2_FT_SOCK		= 6,
 	EXT2_FT_SYMLINK		= 7,
+	EXT2_FT_WHT		= 8,
 	EXT2_FT_MAX
 };
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 12/39] whiteout: jffs2 whiteout support
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
  2010-05-03 23:12 ` [PATCH 01/39] VFS: Comment follow_mount() and friends Valerie Aurora
@ 2010-05-03 23:12   ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 03/39] VFS: Add read-only users count to superblock Valerie Aurora
                     ` (36 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Felix Fietkau, Valerie Aurora, David Woodhouse, linux-mtd

From: Felix Fietkau <nbd@openwrt.org>

Add support for whiteout dentries to jffs2.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
---
 fs/jffs2/dir.c        |   72 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/jffs2/fs.c         |    4 +++
 fs/jffs2/super.c      |    2 +-
 include/linux/jffs2.h |    2 +
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 7aa4417..c259193 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -34,6 +34,8 @@ static int jffs2_mknod (struct inode *,struct dentry *,int,dev_t);
 static int jffs2_rename (struct inode *, struct dentry *,
 			 struct inode *, struct dentry *);
 
+static int jffs2_whiteout (struct inode *, struct dentry *, struct dentry *);
+
 const struct file_operations jffs2_dir_operations =
 {
 	.read =		generic_read_dir,
@@ -56,6 +58,7 @@ const struct inode_operations jffs2_dir_inode_operations =
 	.mknod =	jffs2_mknod,
 	.rename =	jffs2_rename,
 	.check_acl =	jffs2_check_acl,
+	.whiteout =     jffs2_whiteout,
 	.setattr =	jffs2_setattr,
 	.setxattr =	jffs2_setxattr,
 	.getxattr =	jffs2_getxattr,
@@ -98,8 +101,14 @@ static struct dentry *jffs2_lookup(struct inode *dir_i, struct dentry *target,
 			fd = fd_list;
 		}
 	}
-	if (fd)
-		ino = fd->ino;
+	if (fd) {
+		spin_lock(&target->d_lock);
+		if (fd->type == DT_WHT)
+			target->d_flags |= DCACHE_WHITEOUT;
+		else
+			ino = fd->ino;
+		spin_unlock(&target->d_lock);
+	}
 	mutex_unlock(&dir_f->sem);
 	if (ino) {
 		inode = jffs2_iget(dir_i->i_sb, ino);
@@ -498,6 +507,11 @@ static int jffs2_mkdir (struct inode *dir_i, struct dentry *dentry, int mode)
 		return PTR_ERR(inode);
 	}
 
+	if (dentry->d_flags & DCACHE_WHITEOUT) {
+		inode->i_flags |= S_OPAQUE;
+		ri->flags = cpu_to_je16(JFFS2_INO_FLAG_OPAQUE);
+	}
+
 	inode->i_op = &jffs2_dir_inode_operations;
 	inode->i_fop = &jffs2_dir_operations;
 
@@ -779,6 +793,60 @@ static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, int mode, de
 	return 0;
 }
 
+static int jffs2_whiteout (struct inode *dir, struct dentry *old_dentry,
+			   struct dentry *new_dentry)
+{
+	struct jffs2_sb_info *c = JFFS2_SB_INFO(dir->i_sb);
+	struct jffs2_inode_info *victim_f = NULL;
+	uint32_t now;
+	int ret;
+
+	/* If it's a directory, then check whether it is really empty */
+	if (new_dentry->d_inode) {
+		victim_f = JFFS2_INODE_INFO(old_dentry->d_inode);
+		if (S_ISDIR(old_dentry->d_inode->i_mode)) {
+			struct jffs2_full_dirent *fd;
+
+			mutex_lock(&victim_f->sem);
+			for (fd = victim_f->dents; fd; fd = fd->next) {
+				if (fd->ino) {
+					mutex_unlock(&victim_f->sem);
+					return -ENOTEMPTY;
+				}
+			}
+			mutex_unlock(&victim_f->sem);
+		}
+	}
+
+	now = get_seconds();
+	ret = jffs2_do_link(c, JFFS2_INODE_INFO(dir), 0, DT_WHT,
+			    new_dentry->d_name.name, new_dentry->d_name.len, now);
+	if (ret)
+		return ret;
+
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	d_add(new_dentry, NULL);
+
+	if (victim_f) {
+		/* There was a victim. Kill it off nicely */
+		drop_nlink(old_dentry->d_inode);
+		/* Don't oops if the victim was a dirent pointing to an
+		   inode which didn't exist. */
+		if (victim_f->inocache) {
+			mutex_lock(&victim_f->sem);
+			if (S_ISDIR(old_dentry->d_inode->i_mode))
+				victim_f->inocache->pino_nlink = 0;
+			else
+				victim_f->inocache->pino_nlink--;
+			mutex_unlock(&victim_f->sem);
+		}
+	}
+
+	return 0;
+}
+
 static int jffs2_rename (struct inode *old_dir_i, struct dentry *old_dentry,
 			 struct inode *new_dir_i, struct dentry *new_dentry)
 {
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index 3451a81..c1e333c 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -301,6 +301,10 @@ struct inode *jffs2_iget(struct super_block *sb, unsigned long ino)
 
 		inode->i_op = &jffs2_dir_inode_operations;
 		inode->i_fop = &jffs2_dir_operations;
+
+		if (je16_to_cpu(latest_node.flags) & JFFS2_INO_FLAG_OPAQUE)
+			inode->i_flags |= S_OPAQUE;
+
 		break;
 	}
 	case S_IFREG:
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 9a80e8e..c12cd1c 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -172,7 +172,7 @@ static int jffs2_fill_super(struct super_block *sb, void *data, int silent)
 
 	sb->s_op = &jffs2_super_operations;
 	sb->s_export_op = &jffs2_export_ops;
-	sb->s_flags = sb->s_flags | MS_NOATIME;
+	sb->s_flags = sb->s_flags | MS_NOATIME | MS_WHITEOUT;
 	sb->s_xattr = jffs2_xattr_handlers;
 #ifdef CONFIG_JFFS2_FS_POSIX_ACL
 	sb->s_flags |= MS_POSIXACL;
diff --git a/include/linux/jffs2.h b/include/linux/jffs2.h
index 2b32d63..65533bb 100644
--- a/include/linux/jffs2.h
+++ b/include/linux/jffs2.h
@@ -87,6 +87,8 @@
 #define JFFS2_INO_FLAG_USERCOMPR  2	/* User has requested a specific
 					   compression type */
 
+#define JFFS2_INO_FLAG_OPAQUE     4	/* Directory is opaque (for union mounts) */
+
 
 /* These can go once we've made sure we've caught all uses without
    byteswapping */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 13/39] fallthru: Basic fallthru definitions
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (11 preceding siblings ...)
  2010-05-03 23:12   ` Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 14/39] fallthru: ext2 fallthru support Valerie Aurora
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Define the fallthru dcache flag and file system op.  Mask out the
DCACHE_FALLTHRU flag on dentry creation.  Actual users and changes to
lookup come in later patches.
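
A minimal userspace sketch of the flag handling described above.  The
flag values are copied from the dcache.h hunk below; struct toy_dentry
and the toy_* helpers are hypothetical stand-ins for illustration, not
kernel API.

```c
#include <stddef.h>

/* Values as defined in the include/linux/dcache.h hunk of this patch. */
#define DCACHE_WHITEOUT 0x0080
#define DCACHE_FALLTHRU 0x0100

/* Hypothetical stand-in for struct dentry; only the fields used here. */
struct toy_dentry {
	unsigned int d_flags;
	const void *d_inode;	/* NULL means a negative dentry */
};

static int toy_is_whiteout(const struct toy_dentry *d)
{
	return (d->d_flags & DCACHE_WHITEOUT) != 0;
}

static int toy_is_fallthru(const struct toy_dentry *d)
{
	return (d->d_flags & DCACHE_FALLTHRU) != 0;
}

/*
 * Mirrors the masking done in __d_instantiate(): once a real inode is
 * attached, the dentry can be neither a whiteout nor a fallthru.
 */
static void toy_instantiate(struct toy_dentry *d, const void *inode)
{
	if (inode)
		d->d_flags &= ~(DCACHE_WHITEOUT | DCACHE_FALLTHRU);
	d->d_inode = inode;
}
```

Calling toy_instantiate() with a non-NULL inode on a dentry marked
DCACHE_FALLTHRU leaves both marks cleared; passing NULL leaves the
flags untouched.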

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 Documentation/filesystems/vfs.txt |    6 ++++++
 fs/dcache.c                       |    2 +-
 include/linux/dcache.h            |    6 ++++++
 include/linux/fs.h                |    1 +
 4 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 8846b4f..29f3476 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -320,6 +320,7 @@ struct inode_operations {
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
 	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+	int (*fallthru) (struct inode *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
@@ -390,6 +391,11 @@ otherwise noted.
         second is the dentry for the whiteout itself.  This method
         must unlink() or rmdir() the original entry if it exists.
 
+  fallthru: called by the readdir(2) system call on a layered file
+        system.  Only required if you want to support fallthrus.
+        Fallthrus are place-holders for directory entries visible from
+        a lower level file system.
+
   rename: called by the rename(2) system call to rename the object to
 	have the parent and name given by the second inode and dentry.
 
diff --git a/fs/dcache.c b/fs/dcache.c
index 3b0e525..b76f9e4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -993,7 +993,7 @@ EXPORT_SYMBOL(d_alloc_name);
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
 	if (inode) {
-		dentry->d_flags &= ~DCACHE_WHITEOUT;
+		dentry->d_flags &= ~(DCACHE_WHITEOUT|DCACHE_FALLTHRU);
 		list_add(&dentry->d_alias, &inode->i_dentry);
 	}
 	dentry->d_inode = inode;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 7648b49..e035c51 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -184,6 +184,7 @@ d_iput:		no		no		no       yes
 
 #define DCACHE_COOKIE		0x0040	/* For use by dcookie subsystem */
 #define DCACHE_WHITEOUT		0x0080	/* This negative dentry is a whiteout */
+#define DCACHE_FALLTHRU		0x0100	/* Keep looking in the file system below */
 
 #define DCACHE_FSNOTIFY_PARENT_WATCHED	0x0080 /* Parent inode is watched by some fsnotify listener */
 
@@ -364,6 +365,11 @@ static inline int d_is_whiteout(struct dentry *dentry)
 	return (dentry->d_flags & DCACHE_WHITEOUT);
 }
 
+static inline int d_is_fallthru(struct dentry *dentry)
+{
+	return (dentry->d_flags & DCACHE_FALLTHRU);
+}
+
 static inline struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e9aa650..b59cd7b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1532,6 +1532,7 @@ struct inode_operations {
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
 	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+	int (*fallthru) (struct inode *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 14/39] fallthru: ext2 fallthru support
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (12 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 13/39] fallthru: Basic fallthru definitions Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12   ` Valerie Aurora
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, Theodore Tso, linux-ext4

Add support for fallthru directory entries to ext2.

XXX - Makes up inode number for fallthru entry
XXX - Might be better implemented as special symlinks
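
As a rough illustration of the name-matching rule this patch extends
in ext2_match(): an on-disk entry with inode 0 normally counts as
deleted, but it still matches by name when its type is whiteout or
fallthru.  The sketch below is userspace-only; struct toy_dirent and
toy_match() are hypothetical, and only the numeric type values (8 and
9) are taken from the ext2_fs.h hunk.

```c
#include <string.h>

/* Type values as in the include/linux/ext2_fs.h hunk of this patch. */
enum {
	TOY_FT_UNKNOWN  = 0,
	TOY_FT_WHT      = 8,
	TOY_FT_FALLTHRU = 9,
};

/* Hypothetical, simplified directory entry. */
struct toy_dirent {
	unsigned int inode;		/* 0 for deleted, whiteout, fallthru */
	unsigned char name_len;
	unsigned char file_type;
	char name[255];
};

/* Returns 1 if the entry is live (or a placeholder) and the name matches. */
static int toy_match(int len, const char *name, const struct toy_dirent *de)
{
	if (len != de->name_len)
		return 0;
	/* inode 0 is only meaningful for whiteout and fallthru entries */
	if (!de->inode && de->file_type != TOY_FT_WHT &&
	    de->file_type != TOY_FT_FALLTHRU)
		return 0;
	return !memcmp(name, de->name, len);
}
```

With this rule a fallthru entry named "foo" is found by a lookup of
"foo" even though it carries no inode number, while a plain deleted
entry with the same name is skipped.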

Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Jan Blunck <jblunck@suse.de>
---
 fs/ext2/dir.c           |   92 ++++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/ext2.h          |    1 +
 fs/ext2/namei.c         |   22 +++++++++++
 include/linux/ext2_fs.h |    1 +
 4 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 030bd46..f3b4aff 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -219,7 +219,8 @@ static inline int ext2_match (int len, const char * const name,
 {
 	if (len != de->name_len)
 		return 0;
-	if (!de->inode && (de->file_type != EXT2_FT_WHT))
+	if (!de->inode && ((de->file_type != EXT2_FT_WHT) &&
+			   (de->file_type != EXT2_FT_FALLTHRU)))
 		return 0;
 	return !memcmp(name, de->name, len);
 }
@@ -256,6 +257,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
 	[EXT2_FT_SOCK]		= DT_SOCK,
 	[EXT2_FT_SYMLINK]	= DT_LNK,
 	[EXT2_FT_WHT]		= DT_WHT,
+	[EXT2_FT_FALLTHRU]	= DT_UNKNOWN,
 };
 
 #define S_SHIFT 12
@@ -342,6 +344,24 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
 					ext2_put_page(page);
 					return 0;
 				}
+			} else if (de->file_type == EXT2_FT_FALLTHRU) {
+				int over;
+				unsigned char d_type = DT_UNKNOWN;
+
+				offset = (char *)de - kaddr;
+				/* XXX We don't know the inode number
+				 * of the directory entry in the
+				 * underlying file system.  Should
+				 * look it up, either on fallthru
+				 * creation at first readdir or now at
+				 * filldir time. */
+				over = filldir(dirent, de->name, de->name_len,
+					       (n<<PAGE_CACHE_SHIFT) | offset,
+					       123 /* Made up ino */, d_type);
+				if (over) {
+					ext2_put_page(page);
+					return 0;
+				}
 			}
 			filp->f_pos += ext2_rec_len_from_disk(de->rec_len);
 		}
@@ -463,6 +483,10 @@ ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
 			spin_lock(&dentry->d_lock);
 			dentry->d_flags |= DCACHE_WHITEOUT;
 			spin_unlock(&dentry->d_lock);
+		} else if (!res && de->file_type == EXT2_FT_FALLTHRU) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_FALLTHRU;
+			spin_unlock(&dentry->d_lock);
 		}
 		ext2_put_page(page);
 	}
@@ -532,6 +556,7 @@ static ext2_dirent * ext2_append_entry(struct dentry * dentry,
 				de->name_len = 0;
 				de->rec_len = ext2_rec_len_to_disk(chunk_size);
 				de->inode = 0;
+				de->file_type = 0;
 				goto got_it;
 			}
 			if (de->rec_len == 0) {
@@ -545,6 +570,7 @@ static ext2_dirent * ext2_append_entry(struct dentry * dentry,
 			name_len = EXT2_DIR_REC_LEN(de->name_len);
 			rec_len = ext2_rec_len_from_disk(de->rec_len);
 			if (!de->inode && (de->file_type != EXT2_FT_WHT) &&
+			    (de->file_type != EXT2_FT_FALLTHRU) &&
 			    (rec_len >= reclen))
 				goto got_it;
 			if (rec_len >= name_len + reclen)
@@ -587,7 +613,8 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 
 	err = -EEXIST;
 	if (ext2_match (namelen, name, de)) {
-		if (de->file_type == EXT2_FT_WHT)
+		if ((de->file_type == EXT2_FT_WHT) ||
+		    (de->file_type == EXT2_FT_FALLTHRU))
 			goto got_it;
 		goto out_unlock;
 	}
@@ -602,7 +629,8 @@ got_it:
 							&page, NULL);
 	if (err)
 		goto out_unlock;
-	if (de->inode || ((de->file_type == EXT2_FT_WHT) &&
+	if (de->inode || (((de->file_type == EXT2_FT_WHT) ||
+			   (de->file_type == EXT2_FT_FALLTHRU)) &&
 			  !ext2_match (namelen, name, de))) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
@@ -627,6 +655,60 @@ out_unlock:
 }
 
 /*
+ * Create a fallthru entry.
+ */
+int ext2_fallthru_entry (struct inode *dir, struct dentry *dentry)
+{
+	const char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned short rec_len, name_len;
+	ext2_dirent * de;
+	struct page *page;
+	loff_t pos;
+	int err;
+
+	de = ext2_append_entry(dentry, &page);
+	if (IS_ERR(de))
+		return PTR_ERR(de);
+
+	err = -EEXIST;
+	if (ext2_match (namelen, name, de))
+		goto out_unlock;
+
+	name_len = EXT2_DIR_REC_LEN(de->name_len);
+	rec_len = ext2_rec_len_from_disk(de->rec_len);
+
+	pos = page_offset(page) +
+		(char*)de - (char*)page_address(page);
+	err = __ext2_write_begin(NULL, page->mapping, pos, rec_len, 0,
+							&page, NULL);
+	if (err)
+		goto out_unlock;
+	if (de->inode || (de->file_type == EXT2_FT_WHT) ||
+	    (de->file_type == EXT2_FT_FALLTHRU)) {
+		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
+		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
+		de->rec_len = ext2_rec_len_to_disk(name_len);
+		de = de1;
+	}
+	de->name_len = namelen;
+	memcpy(de->name, name, namelen);
+	de->inode = 0;
+	de->file_type = EXT2_FT_FALLTHRU;
+	err = ext2_commit_chunk(page, pos, rec_len);
+	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
+	mark_inode_dirty(dir);
+	/* OFFSET_CACHE */
+out_put:
+	ext2_put_page(page);
+	return err;
+out_unlock:
+	unlock_page(page);
+	goto out_put;
+}
+
+/*
  * ext2_delete_entry deletes a directory entry by merging it with the
  * previous entry. Page is up-to-date. Releases the page.
  */
@@ -711,7 +793,9 @@ int ext2_whiteout_entry (struct inode * dir, struct dentry * dentry,
 	 */
 	if (ext2_match (namelen, name, de))
 		de->inode = 0;
-	if (de->inode || (de->file_type == EXT2_FT_WHT)) {
+	if (de->inode || (((de->file_type == EXT2_FT_WHT) ||
+			   (de->file_type == EXT2_FT_FALLTHRU)) &&
+			  !ext2_match (namelen, name, de))) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
 		de->rec_len = ext2_rec_len_to_disk(name_len);
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 44d190c..2fa32b3 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -108,6 +108,7 @@ extern struct ext2_dir_entry_2 * ext2_find_entry (struct inode *,struct qstr *,
 extern int ext2_delete_entry (struct ext2_dir_entry_2 *, struct page *);
 extern int ext2_whiteout_entry (struct inode *, struct dentry *,
 				struct ext2_dir_entry_2 *, struct page *);
+extern int ext2_fallthru_entry (struct inode *, struct dentry *);
 extern int ext2_empty_dir (struct inode *);
 extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *, int);
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 12195a5..f28154c 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -349,6 +349,7 @@ static int ext2_whiteout(struct inode *dir, struct dentry *dentry,
 		goto out;
 
 	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags &= ~DCACHE_FALLTHRU;
 	new_dentry->d_flags |= DCACHE_WHITEOUT;
 	spin_unlock(&new_dentry->d_lock);
 	d_add(new_dentry, NULL);
@@ -367,6 +368,26 @@ out:
 	return err;
 }
 
+/*
+ * Create a fallthru entry.
+ */
+static int ext2_fallthru (struct inode *dir, struct dentry *dentry)
+{
+	int err;
+
+	dquot_initialize(dir);
+
+	err = ext2_fallthru_entry(dir, dentry);
+	if (err)
+		return err;
+
+	d_instantiate(dentry, NULL);
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+	return 0;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct inode * new_dir,	struct dentry * new_dentry )
 {
@@ -470,6 +491,7 @@ const struct inode_operations ext2_dir_inode_operations = {
 	.rmdir		= ext2_rmdir,
 	.mknod		= ext2_mknod,
 	.whiteout	= ext2_whiteout,
+	.fallthru	= ext2_fallthru,
 	.rename		= ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 20468bd..cb3d400 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -577,6 +577,7 @@ enum {
 	EXT2_FT_SOCK		= 6,
 	EXT2_FT_SYMLINK		= 7,
 	EXT2_FT_WHT		= 8,
+	EXT2_FT_FALLTHRU	= 9,
 	EXT2_FT_MAX
 };
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 15/39] fallthru: jffs2 fallthru support
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
  2010-05-03 23:12 ` [PATCH 01/39] VFS: Comment follow_mount() and friends Valerie Aurora
@ 2010-05-03 23:12   ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 03/39] VFS: Add read-only users count to superblock Valerie Aurora
                     ` (36 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Felix Fietkau, David Woodhouse, linux-mtd, Valerie Aurora

From: Felix Fietkau <nbd@openwrt.org>

Add support for fallthru dentries to jffs2.
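
A small userspace sketch of the three-way classification that lookup
needs once fallthrus exist, assuming each dirent type is meant to be
handled exclusively: whiteouts and fallthrus only set a dentry flag,
and every other type yields an inode number.  toy_classify() and
struct toy_result are hypothetical; DT_WHT is 14 in the kernel headers
and the fallthru type is DT_WHT + 1 as in the jffs2.h hunk below.

```c
#define DCACHE_WHITEOUT 0x0080
#define DCACHE_FALLTHRU 0x0100

#define TOY_DT_WHT      14			/* same value as DT_WHT */
#define TOY_DT_FALLTHRU (TOY_DT_WHT + 1)	/* as JFFS2_DT_FALLTHRU */

/* Hypothetical result: flags for the dentry, inode number to look up. */
struct toy_result {
	unsigned int d_flags;
	unsigned int ino;	/* 0 means "no inode, negative dentry" */
};

static struct toy_result toy_classify(unsigned char type, unsigned int ino)
{
	struct toy_result r = { 0, 0 };

	switch (type) {
	case TOY_DT_WHT:
		r.d_flags |= DCACHE_WHITEOUT;
		break;		/* a whiteout is not also a fallthru */
	case TOY_DT_FALLTHRU:
		r.d_flags |= DCACHE_FALLTHRU;
		break;		/* placeholder only, no inode */
	default:
		r.ino = ino;
		break;
	}
	return r;
}
```

Without the break statements a whiteout would also pick up the
fallthru flag, so the sketch terminates every case explicitly.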

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/jffs2/dir.c        |   36 +++++++++++++++++++++++++++++++++---
 include/linux/jffs2.h |    6 ++++++
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index c259193..98397b3 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -35,6 +35,7 @@ static int jffs2_rename (struct inode *, struct dentry *,
 			 struct inode *, struct dentry *);
 
 static int jffs2_whiteout (struct inode *, struct dentry *, struct dentry *);
+static int jffs2_fallthru (struct inode *, struct dentry *);
 
 const struct file_operations jffs2_dir_operations =
 {
@@ -59,6 +60,7 @@ const struct inode_operations jffs2_dir_inode_operations =
 	.rename =	jffs2_rename,
 	.check_acl =	jffs2_check_acl,
 	.whiteout =     jffs2_whiteout,
+	.fallthru =     jffs2_fallthru,
 	.setattr =	jffs2_setattr,
 	.setxattr =	jffs2_setxattr,
 	.getxattr =	jffs2_getxattr,
@@ -103,10 +105,16 @@ static struct dentry *jffs2_lookup(struct inode *dir_i, struct dentry *target,
 	}
 	if (fd) {
 		spin_lock(&target->d_lock);
-		if (fd->type == DT_WHT)
+		switch (fd->type) {
+		case DT_WHT:
 			target->d_flags |= DCACHE_WHITEOUT;
-		else
+			break;
+		case JFFS2_DT_FALLTHRU:
+			target->d_flags |= DCACHE_FALLTHRU;
+			break;
+		default:
 			ino = fd->ino;
+		}
 		spin_unlock(&target->d_lock);
 	}
 	mutex_unlock(&dir_f->sem);
@@ -164,7 +170,10 @@ static int jffs2_readdir(struct file *filp, void *dirent, filldir_t filldir)
 				  fd->name, fd->ino, fd->type, curofs, offset));
 			continue;
 		}
-		if (!fd->ino) {
+		if (fd->type == JFFS2_DT_FALLTHRU)
+			/* XXX Should really do a lookup for the real inode number here */
+			fd->ino = 100;
+		else if (!fd->ino && (fd->type != DT_WHT)) {
 			D2(printk(KERN_DEBUG "Skipping deletion dirent \"%s\"\n", fd->name));
 			offset++;
 			continue;
@@ -793,6 +802,26 @@ static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, int mode, de
 	return 0;
 }
 
+static int jffs2_fallthru (struct inode *dir, struct dentry *dentry)
+{
+	struct jffs2_sb_info *c = JFFS2_SB_INFO(dir->i_sb);
+	uint32_t now;
+	int ret;
+
+	now = get_seconds();
+	ret = jffs2_do_link(c, JFFS2_INODE_INFO(dir), 0, DT_UNKNOWN,
+			    dentry->d_name.name, dentry->d_name.len, now);
+	if (ret)
+		return ret;
+
+	d_instantiate(dentry, NULL);
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+
+	return 0;
+}
+
 static int jffs2_whiteout (struct inode *dir, struct dentry *old_dentry,
 			   struct dentry *new_dentry)
 {
@@ -825,6 +854,7 @@ static int jffs2_whiteout (struct inode *dir, struct dentry *old_dentry,
 		return ret;
 
 	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags &= ~DCACHE_FALLTHRU;
 	new_dentry->d_flags |= DCACHE_WHITEOUT;
 	spin_unlock(&new_dentry->d_lock);
 	d_add(new_dentry, NULL);
diff --git a/include/linux/jffs2.h b/include/linux/jffs2.h
index 65533bb..dbe8c93 100644
--- a/include/linux/jffs2.h
+++ b/include/linux/jffs2.h
@@ -114,6 +114,12 @@ struct jffs2_unknown_node
 	jint32_t hdr_crc;
 };
 
+/*
+ * Non-standard directory entry type(s), for on-disk use
+ */
+
+#define                JFFS2_DT_FALLTHRU       (DT_WHT + 1)
+
 struct jffs2_raw_dirent
 {
 	jint16_t magic;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 16/39] fallthru: tmpfs fallthru support
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (14 preceding siblings ...)
  2010-05-03 23:12   ` Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 17/39] union-mount: Union mounts documentation Valerie Aurora
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Add support for fallthru directory entries to tmpfs.

XXX - Makes up inode number for dirent
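
A userspace sketch of how a dcache-backed readdir could pick the inode
number and d_type for each child under this scheme: fallthru dentries
have no inode, so they report a placeholder number (123, the made-up
value used in this patch) and an unknown type, while ordinary negative
dentries are skipped.  All toy_* names are hypothetical.

```c
#include <stddef.h>

#define DCACHE_FALLTHRU     0x0100
#define TOY_DT_UNKNOWN      0
#define TOY_PLACEHOLDER_INO 123	/* made-up ino, as in the patch */

/* Hypothetical, simplified inode and dentry. */
struct toy_inode {
	unsigned long i_ino;
	unsigned char d_type;
};

struct toy_dentry {
	unsigned int d_flags;
	const struct toy_inode *d_inode;	/* NULL if negative */
};

/*
 * Returns 1 and fills *ino and *d_type if the child should be emitted,
 * or 0 if it should be skipped.
 */
static int toy_pick_dirent(const struct toy_dentry *d,
			   unsigned long *ino, unsigned char *d_type)
{
	if (d->d_flags & DCACHE_FALLTHRU) {
		*ino = TOY_PLACEHOLDER_INO;
		*d_type = TOY_DT_UNKNOWN;
		return 1;
	}
	if (!d->d_inode)
		return 0;	/* plain negative dentry: nothing to report */
	*ino = d->d_inode->i_ino;
	*d_type = d->d_inode->d_type;
	return 1;
}
```

A real implementation would presumably look up the inode number in the
lower file system instead of using a constant, as the XXX note above
says.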

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/dcache.c |    3 +-
 fs/libfs.c  |   21 +++++++++++++++++--
 mm/shmem.c  |   60 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b76f9e4..1575af4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2240,7 +2240,8 @@ resume:
 		 * we can evict it.
 		 */
 		if (d_unhashed(dentry)||(!dentry->d_inode &&
-					 !d_is_whiteout(dentry)))
+					 !d_is_whiteout(dentry) &&
+					 !d_is_fallthru(dentry)))
 			continue;
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
diff --git a/fs/libfs.c b/fs/libfs.c
index ea9a6cc..2b28ca9 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -134,6 +134,7 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
 	struct dentry *cursor = filp->private_data;
 	struct list_head *p, *q = &cursor->d_u.d_child;
 	ino_t ino;
+	int d_type;
 	int i = filp->f_pos;
 
 	switch (i) {
@@ -159,14 +160,28 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
 			for (p=q->next; p != &dentry->d_subdirs; p=p->next) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_u.d_child);
-				if (d_unhashed(next) || !next->d_inode)
+				if (d_unhashed(next) || (!next->d_inode && !d_is_fallthru(next)))
 					continue;
 
+				if (d_is_fallthru(next)) {
+					/* XXX We don't know the inode
+					 * number of the directory
+					 * entry in the underlying
+					 * file system.  Should look
+					 * it up, either on fallthru
+					 * creation at first readdir
+					 * or now at filldir time. */
+					ino = 123; /* Made up ino */
+					d_type = DT_UNKNOWN;
+				} else {
+					ino = next->d_inode->i_ino;
+					d_type = dt_type(next->d_inode);
+				}
+
 				spin_unlock(&dcache_lock);
 				if (filldir(dirent, next->d_name.name, 
 					    next->d_name.len, filp->f_pos, 
-					    next->d_inode->i_ino, 
-					    dt_type(next->d_inode)) < 0)
+					    ino, d_type) < 0)
 					return 0;
 				spin_lock(&dcache_lock);
 				/* next is still alive */
diff --git a/mm/shmem.c b/mm/shmem.c
index c58ecf4..163957b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1809,8 +1809,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
 static int shmem_unlink(struct inode *dir, struct dentry *dentry);
 
 /*
- * This is the whiteout support for tmpfs. It uses one singleton whiteout
- * inode per superblock thus it is very similar to shmem_link().
+ * Create a dentry to signify a whiteout.
  */
 static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 			  struct dentry *new_dentry)
@@ -1841,8 +1840,10 @@ static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 		spin_unlock(&sbinfo->stat_lock);
 	}
 
-	if (old_dentry->d_inode) {
-		if (S_ISDIR(old_dentry->d_inode->i_mode))
+	if (old_dentry->d_inode || d_is_fallthru(old_dentry)) {
+		/* A fallthru for a dir is treated like a regular link */
+		if (old_dentry->d_inode &&
+		    S_ISDIR(old_dentry->d_inode->i_mode))
 			shmem_rmdir(dir, old_dentry);
 		else
 			shmem_unlink(dir, old_dentry);
@@ -1859,6 +1860,48 @@ static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 }
 
 static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
+				struct inode *inode);
+
+/*
+ * Create a dentry to signify a fallthru.  A fallthru in tmpfs is the
+ * logical equivalent of an in-kernel readdir() cache.  It can't be
+ * deleted until the file system is unmounted.
+ */
+static int shmem_fallthru(struct inode *dir, struct dentry *dentry)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+
+	/* FIXME: this is stupid */
+	if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+		return -EPERM;
+
+	if (dentry->d_inode || d_is_fallthru(dentry) || d_is_whiteout(dentry))
+		return -EEXIST;
+
+	/*
+	 * Each new link needs a new dentry, pinning lowmem, and tmpfs
+	 * dentries cannot be pruned until they are unlinked.
+	 */
+	if (sbinfo->max_inodes) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return -ENOSPC;
+		}
+		sbinfo->free_inodes--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+
+	shmem_d_instantiate(dir, dentry, NULL);
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+	return 0;
+}
+
+static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
 				struct inode *inode)
 {
 	if (d_is_whiteout(dentry)) {
@@ -1866,14 +1909,15 @@ static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
 		shmem_free_inode(dir->i_sb);
 		if (S_ISDIR(inode->i_mode))
 			inode->i_mode |= S_OPAQUE;
+	} else if (d_is_fallthru(dentry)) {
+		shmem_free_inode(dir->i_sb);
 	} else {
 		/* New dentry */
 		dir->i_size += BOGO_DIRENT_SIZE;
 		dget(dentry); /* Extra count - pin the dentry in core */
 	}
-	/* Will clear DCACHE_WHITEOUT flag */
+	/* Will clear DCACHE_WHITEOUT and DCACHE_FALLTHRU flags */
 	d_instantiate(dentry, inode);
-
 }
 /*
  * File creation. Allocate an inode, and we're done..
@@ -1962,7 +2006,8 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (d_is_whiteout(dentry) || (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
+	if (d_is_whiteout(dentry) || d_is_fallthru(dentry) ||
+	    (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
 		shmem_free_inode(dir->i_sb);
 
 	if (inode) {
@@ -2596,6 +2641,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
 	.whiteout       = shmem_whiteout,
+	.fallthru       = shmem_fallthru,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
-- 
1.6.3.3



* [PATCH 17/39] union-mount: Union mounts documentation
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (15 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 16/39] fallthru: tmpfs " Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-04  1:54   ` Valdis.Kletnieks
  2010-05-04 21:12   ` Jamie Lokier
  2010-05-03 23:12 ` [PATCH 18/39] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
                   ` (21 subsequent siblings)
  38 siblings, 2 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Document design and implementation of union mounts (a.k.a. writable
overlays).
---
 Documentation/filesystems/union-mounts.txt |  899 ++++++++++++++++++++++++++++
 1 files changed, 899 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt

diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt
new file mode 100644
index 0000000..ba830e8
--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,899 @@
+Union mounts (a.k.a. writable overlays)
+=======================================
+
+This document describes the architecture and current status of union
+mounts, also known as writable overlays.
+
+In this document:
+ - Overview of union mounts
+ - Terminology
+ - VFS implementation
+ - Locking strategy
+ - VFS/file system interface
+ - Userland interface
+ - NFS interaction
+ - Status
+ - Contributing to union mounts
+
+Overview
+========
+
+A union mount layers one read-write file system over one read-only
+file system, with all writes going to the writable file system.  The
+namespace of both file systems appears as a combined whole to
+userland, with files and directories on the writable file system
+covering up any files or directories with matching pathnames on the
+read-only file system.  The read-write file system is the "topmost"
+or "upper" file system and the read-only file system is the "lower"
+file system.  A few use cases:
+
+- Root file system on CD with writes saved to hard drive (LiveCD)
+- Multiple virtual machines with the same starting root file system
+- Cluster with NFS mounted root on clients
+
+Most if not all of these problems could be solved with a COW block
+device or a clustered file system (including NFS mounts).  However, for
+some use cases, sharing is more efficient and better performing if
+done at the file system namespace level.  COW block devices only
+increase their divergence as time goes on, and a fully coherent
+writable file system is unnecessary synchronization overhead if no
+other client needs to see the writes.
+
+What union mounts are not
+-------------------------
+
+Union mounts are not a general-purpose unioning file system.  They do
+not provide a generic "union of namespaces" operation for an arbitrary
+number of file systems.  Many interesting features can be implemented
+with a generic unioning facility: unioning of more than two file
+systems, dynamic insertion and removal of branches, online upgrade,
+etc.  Some unioning file systems that do this are UnionFS and AUFS.
+
+File systems can only be union mounted at their mountpoints, and the
+lower level file system cannot have any submounts.
+
+Terminology
+===========
+
+The main physical metaphor for union mounts is that a writable file
+system is mounted "on top" of a read-only file system.  Lookups start
+at the "topmost" read-write file system and travel "down" to the
+"bottom" read-only file system only if no blocking entry exists on the
+top layer.
+
+Topmost layer: The read-write file system.  Lookups begin here.
+
+Bottom layer: The read-only file system.  Lookups end here.
+
+Path: Combination of the vfsmount and dentry structure.
+
+Follow down: Given a path from the top layer, find the corresponding
+path on the bottom layer.
+
+Follow up: Given a path from the bottom layer, find the corresponding
+path on the top layer.
+
+Whiteout: A directory entry in the top layer that prevents lookups
+from travelling down to the bottom layer.  Created on unlink()/rmdir()
+if a corresponding directory entry exists in the bottom layer.
+
+Opaque flag: A flag on a directory in the top layer that prevents
+lookups of entries in this directory from travelling down to the
+bottom layer (unless there is an explicit fallthru entry allowing that
+for a particular entry).  Set on creation of a directory that replaces
+a whiteout, and after a directory copyup.
+
+Fallthru: A directory entry which allows lookups to "fall through" to
+the bottom layer for that exact directory entry.  This serves as a
+placeholder for directory entries from the bottom layer during
+readdir().  Fallthrus override opaque flags.
+
+File copyup: Create a file on the top layer that has the same metadata
+and contents as the file with the same pathname on the bottom layer.
+
+Directory copyup: Copy up the visible directory entries from the
+bottom layer as fallthrus in the matching top layer directory.  Mark
+the directory opaque to avoid unnecessary negative lookups on the
+bottom layer.
+
+Examples
+========
+
+What happens when I...
+
+- creat() /newfile -> creates on topmost layer
+- unlink() /oldfile -> creates a whiteout on topmost layer
+- Edit /existingfile -> copies up to top layer at open(O_WRONLY/O_RDWR) time
+- truncate /existingfile -> copies up to topmost layer + N bytes if specified
+- touch()/chmod()/chown()/etc. -> copies up to topmost layer
+- mkdir() /newdir -> creates on topmost layer
+- rmdir() /olddir -> creates a whiteout on topmost layer
+- mkdir() /olddir after above -> creates on topmost layer w/ opaque flag
+- readdir() /shareddir -> copies up entries from bottom layer as fallthrus
+- link() /oldfile /newlink -> copies up /oldfile, creates /newlink on topmost layer
+- symlink() /oldfile /symlink -> nothing special
+- rename() /oldfile /newfile -> copies up /oldfile to /newfile on top layer
+- rename() /olddir /newdir -> EXDEV
+- rename() /topmost_only_dir /topmost_only_dir2 -> success
+
+Getting to a root file system with union mounts:
+
+- Mount the base read-only file system as the root file system
+- Mount the read-only file system again on /newroot
+- Mount the read-write layer on /newroot:
+   # mount -o union /dev/sda /newroot
+- pivot_root to /newroot
+- Start init
+
+See scripts/pivot.sh in the UML devkit linked to from:
+
+http://valerieaurora.org/union/
+
+VFS implementation
+==================
+
+Union mounts are implemented as an integral part of the VFS, rather
+than as a VFS client file system (i.e., a stacked file system like
+unionfs or ecryptfs).  Implementing unioning inside the VFS eliminates
+the need for duplicate copies of VFS data structures, unnecessary
+indirection, and code duplication, but requires very maintainable,
+low-to-zero overhead code.  Union mounts require no change to file
+systems serving as the read-only layer, and require some minor
+support from file systems serving as the read-write layer.  File
+systems that want to be the writable layer must implement the new
+->whiteout() and ->fallthru() inode operations, which create special
+dummy directory entries.
+
+The union mounts code must accomplish the following major tasks:
+
+1) Pass lookups through to the lower level file system.
+2) Copy files and directories up to the topmost layer when written.
+3) Create whiteouts and fallthrus as necessary.
+
+VFS objects and union mounts
+----------------------------
+
+First, some VFS basics:
+
+The VFS allows multiple mounts of the same file system.  For example,
+/dev/sda can be mounted at /usr and also at /mnt.  The same file
+system can be mounted read-only at one point and read-write at
+another.  Each of these mounts has its own vfsmount data structure in
+the kernel.  However, each underlying file system has exactly one
+in-kernel superblock structure no matter how many times it is mounted.
+All the separate vfsmounts for the same file system reference the same
+superblock data structure.
+
+Directory entries are cached by the VFS in dentry structures.  The VFS
+keeps one dentry structure for each file or directory in a file
+system, no matter how many times it is mounted.  Each dentry
+represents only one element of a path name.  When the VFS looks up a
+pathname (e.g., "/sbin/init"), the result is a combination of vfsmount
+and dentry.  This <mnt,dentry> pair is usually stored in a kernel
+structure named "path", which is simply two pointers, one to the
+vfsmount and one to the dentry.  A "struct path" is this structure; a
+pathname is a string like "/etc/fstab".
+
+As an example, given:
+
+/dev/sda mounted on /mnt
+/dev/sda mounted on /mnt2
+
+A pathname lookup for "/mnt/etc" will yield the pair:
+
+<vfsmount for /mnt, dentry for "etc" on /dev/sda>
+
+A pathname lookup for "/mnt2/etc" will yield the pair:
+
+<vfsmount for /mnt2, dentry for "etc" on /dev/sda>
+
+The dentry in both cases will be the exact same structure in memory.
+
+A union mount maps <mnt,dentry> pairs from the file system mounted on
+the "top" to <mnt,dentry> pairs from the file system on the "bottom."
+The same dentry can be a member of more than one union mount.  For
+example, given:
+
+/dev/sdb union mounted on top of /dev/sda on /mnt/union1
+/dev/sdc union mounted on top of /dev/sda on /mnt/union2
+
+The dentry for the directory "etc/" on /dev/sda will be part of two union
+mount mappings:
+
+<vfsmount for /dev/sdb on /mnt/union1, dentry for "etc" on /dev/sdb>
+ |
+ v
+<vfsmount for /dev/sda on /mnt/union1, dentry for "etc" on /dev/sda>
+
+And:
+
+<vfsmount for /dev/sdc on /mnt/union2, dentry for "etc" on /dev/sdc>
+ |
+ v
+<vfsmount for /dev/sda on /mnt/union2, dentry for "etc" on /dev/sda>
+
+All of this is to say that we require a full <mnt,dentry> pair to
+accomplish any union mount tasks like copying a file to the topmost
+layer or looking up a directory entry in a lower layer.  A dentry
+alone is not sufficient, since it can be part of several different
+union mounts.
+
+union_dir structure
+-------------------
+
+The first job of union mounts is to map directories from the topmost
+layer to directories with the same pathname in the lower layer.  That
+is, we need to map the <mnt,dentry> pair for a given directory
+pathname in the topmost layer to the <mnt,dentry> pair for the
+directory with the same pathname in the lower layer.  We do this with
+the union_dir structure:
+
+struct union_dir {
+	atomic_t u_count;		/* reference count */
+	struct list_head u_unions;	/* list head for d_unions */
+	struct list_head u_list;	/* list head for mnt_unions */
+	struct hlist_node u_hash;	/* list head for searching */
+	struct hlist_node u_rhash;	/* list head for reverse searching */
+
+	struct path u_upper;		/* this is me */
+	struct path u_lower;		/* this is what I overlay */
+};
+
+This structure is flexible enough to support an arbitrary number of
+layers of unioned file systems, not just the current two-layer
+implementation.  As such, this section will talk about mapping "upper"
+directories to "lower" directories, instead of "topmost" directories
+to "bottom" directories.
+
+At the time of a union mount, we allocate a union_dir structure to map
+the root directory of the upper layer to the root directory of the
+lower layer.  In pseudo-code:
+
+u_upper = <upper mnt,dentry for "/">
+u_lower = <lower mnt,dentry for "/">
+
+This union_dir structure is then added to the union cache hash table,
+linked through u_hash, where it can be looked up via union_lookup()
+with the <upper mnt,dentry> pair as the key.  A reverse lookup is also
+included (union_rlookup() using the <lower mnt,dentry> pair, linked
+through u_rhash) but is not currently used.
+
+The union_dir is also added to the list of union_dir structures that
+reference this dentry as the topmost dentry.  This list is linked
+through the u_unions member in struct union_dir and the new d_unions
+member in struct dentry.  The new d_union_lower_count member in struct
+dentry is a reference count showing how many unions reference this
+dentry through u_lower - that is, how many mounts this dentry is a
+lower dentry for.
+
+struct dentry {
+[...]
+#ifdef CONFIG_UNION_MOUNT
+	/*
+	 * Union mount structures that reference this dentry as the
+	 * upper layer are linked through the d_unions field.  If this
+	 * list is not empty, then this dentry is part of a unioned
+	 * directory stack.  Protected by union_lock.
+	 */
+	struct list_head d_unions;
+	/*
+	 * Reference count of union_dirs with this dentry in the
+	 * u_lower field of a union mount structure - that is, it is a
+	 * dentry for a lower layer of a union.  This count is NOT
+	 * incremented for the dentry that is part of the topmost
+	 * layer of a union.
+	 */
+	unsigned int d_union_lower_count;
+#endif
+[...]
+};
+
+Each union_dir is also linked through the new mnt_unions member in the
+vfsmount structure of the upper mount:
+
+struct vfsmount {
+[...]
+#ifdef CONFIG_UNION_MOUNT
+	struct list_head mnt_unions;	/* list of union_dir structures */
+#endif
+[...]
+};
+
+Traversing the union stack
+--------------------------
+
+The set of union_dir structures referring to a particular pathname are
+called collectively the union stack for that directory. (In the
+current code, only two layers and one union mount structure per path
+are allowed, but multiple layers are possible.) Note that in a union
+stack, none of the union_dir structures reference each other directly.
+Each union_dir struct records the relationship between two
+<mnt,dentry> pairs, the upper pair and the lower pair.  If a third
+layer existed, you would traverse from the top layer to the second
+layer by calling union_lookup() on the top layer's <mnt,dentry> pair.
+This would return the union_dir struct with u_upper pointing to the
+top layer's <mnt,dentry>.  Next you would take u_lower, which points
+to the second layer's <mnt,dentry> and call union_lookup() on that,
+which would return the union_dir mapping the second layer's
+<mnt,dentry> to the third layer's <mnt,dentry>.
+
+To traverse "down" the union stack one layer, use union_down_one().
+Currently, we never traverse the union stack "up" except as part of
+the normal VFS follow_mount() operation.  follow_mount() is what lets
+us traverse from the directory serving as mountpoint to the root
+directory of the file system mounted at that mountpoint.  Traversing
+the union stack "up" introduces lock ordering problems and generally
+complicates the code to the point of unmaintainability.  Currently,
+union mounts performs all its tasks as it traverses the union stack
+exactly once, going "down" in the union mounts terminology.
+
+Code paths
+----------
+
+Union mounts modify the following key code paths in the VFS:
+
+- mount()/umount()
+- Pathname lookup
+- Any path that modifies an existing file
+
+Mount
+-----
+
+Union mounts are created in two steps:
+
+1. Mount the bottom layer file system read-only in the usual manner.
+2. Mount the top layer with the "-o union" option at the same mountpoint.
+
+The bottom layer must be read-only and the top layer must be
+read-write and support whiteouts and fallthrus.  A file system that
+supports whiteouts and fallthrus indicates this by setting the
+MS_WHITEOUT flag in the superblock.  Currently, the top layer is
+forced to "noatime" to avoid a copyup on every access of a file.
+Supporting atime with the current infrastructure would require a
+copyup on every open().  The "relatime" option would be equally
+efficient if the atime is the same as or more recent than the mtime/ctime
+for every object on the read-only file system, and if the 24-hour
+timeout on relatime was disabled.  However, this is probably not
+worthwhile for the majority of union mount use cases.
+
+The current step-by-step method of mounting union file systems won't
+work for three or more layers.  Say you want to union mount three file
+systems on /mnt/union:
+
+/dev/bottom - read-only bottom layer
+/dev/middle - read-only middle layer
+/dev/topmost - read-write topmost layer
+
+First you mount the bottom layer read-only:
+
+mount -o ro /dev/bottom /mnt/union
+
+Then you want to mount the middle layer also read-only, but union
+mounts requires that the top layer be read-write in order to support
+readdir() correctly:
+
+mount -o ro,union /dev/middle /mnt/union # WON'T WORK, fails
+
+The other approach is to mount the middle layer as read-write, but
+then the third mount of the topmost layer will fail because the
+underlying layer is not read-only:
+
+mount -o union /dev/middle /mnt/union
+mount -o union /dev/topmost /mnt/union # WON'T WORK, fails
+
+Two obvious options present themselves:
+
+1) Automatically attempt to convert the covered layer to read-only
+status.  In this case, the mount of /dev/topmost would attempt to
+atomically remount /dev/middle as read-only during sys_mount().  If it
+succeeds, it would go on to mount /dev/topmost as read-write and
+unioned.  This would actually be a usability improvement, since the
+administrator need not remember to mount the lower layers read-only.
+
+2) Execute the mount of all three layers in one system call by passing
+a mount option that is a string describing all the devices to be
+unioned together.  This is ugly for obvious reasons: string parsing in
+the kernel, poor error granularity, need to unwind complicated state
+if the mount fails partway through the stack.
+
+The lower layer file system must not have any submounts - other file
+systems mounted at points in the lower file system's namespace.  File
+systems can only be union mounted at their root directories.  Without
+this restriction, some VFS operations must always do a union_lookup()
+- requiring a global lock - in order to find out if a path is
+potentially unioned.  With this restriction, we can tell if a path is
+potentially unioned by checking a flag in the vfsmount.
+
+pivot_root() to a union mounted file system is supported.  The
+recommended way to get to a union mounted root file system is to boot
+with the read-only mount as the root file system, construct the union
+mount on an entirely new mount, and pivot_root() to the new union
+mount root.  Attempting to union mount the root file system later in
+boot will result in covering other file systems, e.g., /proc, which
+isn't permitted in the current code and is a bad idea anyway.
+
+Hard read-only file systems
+---------------------------
+
+Union mounts require the lower layer of the file system to be
+read-only.  However, in Linux, any individual file system may be
+mounted at multiple places in the namespace, and a file system can be
+changed from read-only to read-write while still mounted.  Thus, simply
+checking that the bottom layer is read-only at the time the writable
+overlay is mounted over it is pointless, since at any time the bottom
+layer may become read-write.
+
+We have to guarantee that a file system will be read-only for as long
+as it is the bottom layer of a union mount.  To do this, we track the
+number of hard read-only users of a file system in its VFS superblock
+structure.  When we union mount a writable overlay over a file system,
+we increment its read-only user count.  The file system can only be
+mounted read-write if its read-only users count is zero.
+
+Todo:
+
+- Support hard read-only NFS mounts.  See discussion here:
+
+  http://markmail.org/message/3mkgnvo4pswxd7lp
+
+Pathname lookup
+---------------
+
+Pathname lookup in a unioned directory traverses down the union stack
+for the parent directory, looking up each pathname element in each
+layer of the file system (according to the rules of whiteouts,
+fallthrus, and opaque flags).  At mount time, the union stack for the
+root directory of the file system is created, and the union stack
+creation for every other unioned directory in the file system is
+boot-strapped using the already-existing union stack of the
+directory's parent.  In order to simplify the code greatly, every
+visible directory on the lower file system is required to have a
+matching directory on the upper file system.  This matching directory
+is created during pathname lookup if it does not already exist.
+Therefore, each unioned directory is the child of another unioned
+directory (or is the root directory of the file system).
+
+As a high-level example, consider lookup of the lower layer file
+"/mnt/union/lower_subdir/lower_file" in the union of /dev/lower and
+/dev/upper, starting with the <mnt,dentry> pair for the root
+directory of the union mount.
+
+First, we look up "lower_subdir" in the parent directory, "/".  Since
+this is the root directory for the mount, it already has a union stack
+constructed, consisting of one struct union_dir in the union hash
+table, filled out with:
+
+um->u_upper = <upper mnt,dentry for "/">
+um->u_lower = <lower mnt,dentry for "/">
+
+Using union_down_one(), we traverse the union stack for "/", looking
+up "lower_subdir" in the "/" directory for /dev/upper, and then in
+/dev/lower.  "lower_subdir" only exists in the lower layer, so we
+create a matching directory in the upper layer, and then allocate and
+fill out a union_dir struct that maps these directories to each other:
+
+um->u_upper = <upper mnt,dentry for "lower_subdir">
+um->u_lower = <lower mnt,dentry for "lower_subdir">
+
+Now lookup proceeds with the <upper mnt,dentry> for "lower_subdir" and
+the pathname element "lower_file".  We look up "lower_file" in the
+upper layer directory, finding no match.  Since this is a unioned
+directory, we call union_down_one() on the <upper mnt,dentry for
+"lower_subdir">, which looks up the union_dir structure we just
+created and returns the <lower mnt,dentry> pair.  We then look up
+"lower_file" in the lower layer directory, which succeeds.  Unlike
+directories, files are not copied up at lookup time, so pathname
+lookup for "/mnt/union/lower_subdir/lower_file" is now complete with
+the final struct path of <lower mnt,dentry for "lower_file">.
+
+At a finer level of detail, the actual union lookup function is called
+in the following code paths:
+
+do_lookup()->do_union_lookup()->lookup_union()->__lookup_union()
+lookup_hash()->lookup_union()->__lookup_union()
+
+__lookup_union() is where the rules of whiteouts, fallthrus, and
+opaque flags are actually implemented.  __lookup_union() returns
+either the first visible dentry, or a negative dentry from the topmost
+file system if no matching dentry exists.  If it finds a directory, it
+looks up any potential matching lower layer directories.  If it finds
+a lower layer directory, it calls append_to_union() on the pair of
+directories.  append_to_union() looks up the upper path in the union
+cache and if no union cache entry already exists, it creates one.
+
+Note that not all directories in a union mount are unioned, only those
+with matching directories on the lower layer.  The macro
+IS_UNIONED_DIR() is a cheap, constant time way to check if a directory
+is unioned, while IS_MNT_UNION() checks if the entire mount is unioned
+(and therefore whether the directory in question is potentially
+unioned).
+
+Currently, lookup of a negative dentry in a unioned directory requires
+a lookup in every directory in the union stack every time it is looked
+up.  We could avoid subsequent lookups by adding a negative union
+cache entry, exactly the way negative dentries are cached.
+
+File copyup
+-----------
+
+Any system call that alters the data or metadata of a file on the
+bottom layer, or creates or changes a hard link to it, will trigger a
+copyup of the target file from the lower layer to the topmost layer:
+
+ - open(O_WRONLY | O_RDWR | O_APPEND | O_DIRECT)
+ - truncate()/open(O_TRUNC)
+ - link()
+ - rename()
+ - chmod()
+ - chown()/lchown()
+ - utimes()
+ - setxattr()/lsetxattr()
+
+Copyup of a file due to open(O_WRONLY) has already occurred when:
+
+ - write()
+ - ftruncate()
+ - writable mmap()
+
+The following system calls will fail on an fd opened O_RDONLY:
+
+ - fchmod()
+ - fchown()
+ - fsetxattr()
+ - futimensat()
+
+Contrary to common sense, the above system calls are defined to
+succeed on O_RDONLY fds.  The idea seems to be that the
+O_RDONLY/O_RDWR/O_WRONLY flags only apply to the actual file data, not
+to any form of metadata (times, owner, mode, or even extended
+attributes).  Applications making these system calls on O_RDONLY fds
+are correct according to the standard and work on non-union-mounts.
+They will need to be rewritten (O_RDONLY -> O_RDWR) to work on union
+mounts.  We suspect this usage is uncommon.
+
+This deviation from the standard is due to technical limitations of the
+union mount implementation.  Specifically, we would need to replace an
+open file descriptor from the lower layer with an open file descriptor
+for a file with matching pathname and contents on the upper layer,
+which is difficult to do.  We avoid this in other system calls by
+doing the copyup before the file is opened.  Unionfs doesn't encounter
+this problem because it creates a dummy file struct which redirects or
+fans out operations to the struct files for the underlying file
+systems.
+
+From an application's point of view, the result of an in-kernel file
+copyup is the logical equivalent of another application updating the
+file via the rename() pattern: creat() a new file, copy the data over,
+make changes to the copy, and rename() over the old version.  Any
+existing open file descriptors for that file (including those in the
+same application) refer to a now invisible object that used to have
+the same pathname.  Only opens that occur after the copyup will see
+updates to the file.
+
+Permission checks
+-----------------
+
+We want to be sure we have the correct permissions to actually succeed
+in a system call before copying a file up, to avoid unnecessary I/O.  At
+present, the permission check for a single system call may be spread
+out over many hundreds of lines of code (e.g., open()).  In order to
+check permissions, we occasionally need to determine if there is a
+writable overlay on top of this inode.  This requires a full path, but
+often we only have the inode at this point.  In particular,
+inode_permission() returns EROFS if the inode is on a read-only file
+system, which is the wrong answer if there is a writable overlay
+mounted on top of it.
+
+Another trouble-maker is may_open(), which both checks permissions for
+open AND truncates the file if O_TRUNC is specified.  It doesn't make
+any sense to copy up the file and then let may_open() truncate it, but
+we can't copy it after may_open() truncates it either.  The current
+ugly hack is to pass the full nameidata to may_open() and copyup
+inside may_open().
+
+Some solutions:
+
+- Create __inode_permission() and pass it a flag telling it whether or
+  not to check for a read-only fs.  Create union_permission() which
+  takes a path, checks for a union mount, and sets the rofs flag.
+  Place the file copyup call after all the permission checks are
+  completed.  Push down the full path into the functions that need it
+  and currently only take the dentry or inode.
+
+- For each instance in which we might want to copyup, move permission
+  checks into a new function and call it from a level at which we
+  still have the full path.  Pass it an "ignore read-only fs" flag if
+  the file is on a union mount.  Pass around the ignore-rofs flag
+  inside the function doing permission checks.  If all the permission
+  checks complete successfully, copyup the file.  Would require moving
+  truncate out of may_open().
+
+Todo:
+ - On truncate, only copy up the N bytes of file data requested
+ - Make sure above handles truncate beyond EOF correctly
+ - File copyup on chown()/chmod()/chattr() etc.
+ - File copyup on open(O_APPEND)
+ - File copyup on open(O_DIRECT)
+
+Impact on non-union kernels and mounts
+--------------------------------------
+
+Union-related data structures, extra fields, and function calls are
+#ifdef'd out at the function/macro level with CONFIG_UNION_MOUNT in
+nearly all cases (see include/linux/union.h).
+
+Todo:
+
+ - Do performance tests
+
+Locking strategy
+================
+
+The current union mount locking strategy is based on the following
+rules:
+
+* Exactly two file systems are unioned
+* The bottom file system is always read-only
+* The top file system is always read-write
+  => A file system can never be a top and a bottom layer at the same time
+
+Additionally, the top layer may only be mounted exactly once.  Don't
+think of the top layer as a separate independent file system; when it
+is part of a union mount, it is only a file system in conjunction with
+the read-only bottom layer.  The read-only bottom layer is an
+independent file system in and of itself and can be mounted elsewhere,
+including as the bottom layer for another union mount.
+
+Thus, we may define a stable locking order in terms of top layer and
+bottom layer locks, since a top layer is never a bottom layer and a
+bottom layer is never a top layer.  Another simplifying assumption is
+that all directories in a pathname exist on the top layer, as they are
+created step-by-step during lookup.  This prevents us from ever having
+to walk backwards up the path creating directory entries, which can
+get complicated.  By implication, parent directory paths during any
+operation (rename(), unlink(), etc.) are from the top layer.  Dentries
+for directories from the bottom layer are only ever seen or used by
+the lookup code.
+
+The two major problems we avoid with the above rules are:
+
+Lock ordering: Imagine two union stacks with the same two file
+systems: A mounted over B, and B mounted over A.  Sometimes locks on
+objects in both A and B will have to be held simultaneously.  What
+order should they be acquired in?  Simply acquiring them from top to
+bottom will create a lock-ordering problem - one thread acquires lock
+on object from A and then tries for a lock on object from B, while
+another thread grabs the lock on object from B and then waits for the
+lock on object from A.  Some other lock ordering must be defined.
+
+Movement/change/disappearance of objects on multiple layers: A variety
+of nasty corner cases arise when more than one layer is changing at
+the same time.  Changes in the directory topology and their effect on
+inheritance are of special concern.  Al Viro's canonical email on the
+subject:
+
+http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html
+
+We don't try to solve any of these cases, just avoid them in the first
+place.
+
+Todo: Prevent top layer from being mounted more than once.
+
+Cross-layer interactions
+------------------------
+
+The VFS code simultaneously holds references to and/or modifies
+objects from both the top and bottom layers in the following cases:
+
+Path lookup:
+
+Grabs i_mutex on bottom layer while holding i_mutex on top layer
+directory inode.
+
+File copyup:
+
+Holds i_mutex on the parent directory from the top layer while copying
+up file from lower layer.
+
+link():
+
+File copyup of target while holding i_mutex on parent directory on top
+layer.  Followed by a normal link() operation.
+
+rename():
+
+Holds s_vfs_rename_mutex on the top layer, i_mutex of the source's
+parent dir (top layer), and i_mutex of the target's parent dir (also
+top layer) while looking up and copying the bottom layer target and
+also creating the whiteout.
+
+Notes on rename():
+
+First, renaming of directories returns EXDEV.  It's not at all
+reasonable to recursively copy directory trees and userspace has to
+handle this case anyway.  An exception is rename() of directories that
+exist only on the topmost layer; this succeeds.
+
+Rename involves three steps on a union mount: (1) copyup of the file
+from the bottom layer, (2) rename of the new top-layer copy to the
+target in the usual manner, (3) creation of a whiteout covering the
+source of the rename.
+
+Directory copyup:
+
+Directory entries are copied up on the first readdir().  We hold the
+top layer directory i_mutex throughout and sequentially acquire and
+drop the i_mutex for each lower layer directory.
+
+VFS-fs interface
+================
+
+Read-only layer: No support necessary other than enforcement of really
+really read-only semantics (done by VFS for local file systems).
+
+Writable layer: Must implement two new inode operations:
+
+int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+int (*fallthru) (struct inode *, struct dentry *);
+
+And set the MS_WHITEOUT flag to indicate support of these operations.
+
+Todo:
+
+- Decide what to return in d_ino of struct dirent
+  - As Miklos Szeredi points out, the inode number from the underlying
+    fs is from a different inode "namespace" and doesn't have any
+    useful meaning in the top layer fs.
+- Implement whiteouts and fallthrus in ext3
+- Implement whiteouts and fallthrus in btrfs
+
+Supported file systems
+----------------------
+
+Any file system can be a read-only layer.  File systems must
+explicitly support whiteouts and fallthrus in order to be a read-write
+layer.  This patch set implements whiteouts for ext2, tmpfs, and
+jffs2.  We have tested ext2, tmpfs, and iso9660 as the read-only
+layer.
+
+Todo:
+ - Test corner cases of case-insensitive/oversensitive file systems
+
+NFS interaction
+===============
+
+NFS is currently not supported as either type of layer.  NFS as
+read-only layer requires support from the server to honor the
+read-only guarantee needed for the bottom layer.  To do this, the
+server needs to revoke access to clients requesting read-only file
+systems if the exported file system is remounted read-write or
+unmounted (during which arbitrary changes can occur).  Some recent
+discussion:
+
+http://markmail.org/message/3mkgnvo4pswxd7lp
+
+NFS as the read-write layer would require implementation of the
+->whiteout() and ->fallthru() methods.  DT_WHT directory entries are
+theoretically already supported.
+
+Also, technically the requirement for a readdir() cookie that is
+stable across reboots comes only from file systems exported via NFSv2:
+
+http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html
+
+Todo:
+
+- Guarantee really really read-only on NFS exports
+- Implement whiteout()/fallthru() for NFS
+
+Userland support
+================
+
+The mount command must support the "-o union" mount option and pass
+the corresponding MS_UNION flag to the kernel.  A util-linux git
+tree with union mount support is here:
+
+git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git
+
+File system utilities must support whiteouts and fallthrus.  An
+e2fsprogs git tree with union mount support is here:
+
+git://git.kernel.org/pub/scm/fs/ext2/val/e2fsprogs.git
+
+Currently, whiteout directory entries are not returned to userland.
+While the directory type for whiteouts, DT_WHT, has been defined for
+many years, very little userland code handles them.  Userland will
+never see fallthru directory entries.
+
+Known non-POSIX behaviors
+-------------------------
+
+- Any writing system call (unlink()/chmod()/etc.) can return ENOSPC or EIO
+- Link count may be wrong for files on bottom layer with > 1 link count
+- Link count on directories will be wrong before readdir() (fixable)
+- File copyup is the logical equivalent of an update via copy +
+  rename().  Any existing open file descriptors will continue to refer
+  to the read-only copy on the bottom layer and will not see any
+  changes that occur after the copy-up.
+- rename() of directory fails with EXDEV
+- inode number in d_ino of struct dirent will be wrong for fallthrus
+- fchmod()/fchown()/futimensat()/fsetattr() fail on O_RDONLY fds
+
+Status
+======
+
+The current union mounts implementation is feature-complete on local
+file systems and passes an extensive union mounts test suite,
+available in the union mounts Usermode Linux-based development kit:
+
+http://valerieaurora.org/union/union_mount_devkit.tar.gz
+
+The whiteout code has had some non-trivial level of review and
+testing, but the majority of the rest of the code has had no external
+review or testing outside the authors' machines.
+
+The latest version is available at:
+
+git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git
+
+Check the union mounts web page for the name of the latest branch:
+
+http://valerieaurora.org/union/
+
+Todo:
+
+- Run more tests (e.g., XFS test suite)
+- Get review from VFS maintainers
+
+Non-features
+------------
+
+Features we do not currently plan to support in union mounts:
+
+Online upgrade: E.g., installing software on a file system NFS
+exported to clients while the clients are still up and running.
+Allowing the read-only bottom layer of a union mount to change
+invalidates our locking strategy.
+
+Recursive copying of directories: E.g., implementing rename() across
+layers for directories.  Doing an in-kernel copy of a single file is
+bad enough.  Recursively copying a directory is a big no-no.
+
+Read-only top layer: The readdir() strategy fundamentally requires the
+ability to create persistent directory entries on the top layer file
+system (which may be tmpfs).  Numerous alternatives (including
+in-kernel or in-application caching) exist and are compatible with
+union mounts with its writing-readdir() implementation disabled.
+Creating a readdir() cookie that is stable across multiple readdir()s
+requires one of:
+
+- Write to stable storage (e.g., fallthru dentries)
+- Non-evictable kernel memory cache (doesn't handle NFS server reboot)
+- Per-application caching by glibc readdir()
+
+Aggregation of multiple read-only file systems: We are beginning to
+see how to implement this but it doesn't currently work.
+
+Often these features are supported by other unioning file systems or
+by other versions of union mounts.
+
+Contributing to union mounts
+============================
+
+The union mounts web page is here:
+
+http://valerieaurora.org/union/
+
+It links to:
+
+ - All git repositories
+ - Documentation
+ - An entire self-contained UML-based dev kit with README, etc.
+
+The best mailing list for discussing union mounts is:
+
+linux-fsdevel@vger.kernel.org
+
+http://vger.kernel.org/vger-lists.html#linux-fsdevel
+
+Thank you for reading!
-- 
1.6.3.3



* [PATCH 18/39] union-mount: Introduce MNT_UNION and MS_UNION flags
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (16 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 17/39] union-mount: Union mounts documentation Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 19/39] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Miklos Szeredi, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Add per mountpoint flag for Union Mount support. You need additional patches
to util-linux for that to work - see:

git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git
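For readers skimming the hunks, here is a rough userspace sketch of
what the do_mount() change amounts to: the MS_UNION bit from mount(2)
is translated into MNT_UNION on the vfsmount and then cleared from the
flags that are passed on.  The two constants are copied from this
patch; split_union_flag() is a hypothetical helper written only for
illustration and does not exist in the kernel.

```c
/* Values copied from the hunks in this patch. */
#define MS_UNION	256	/* mount(2) flag: merge namespace with FS mounted below */
#define MNT_UNION	0x4000	/* vfsmount flag: this vfsmount is a union mount */

/*
 * Hypothetical helper, not part of the patch.  It models only the
 * union bit: set MNT_UNION in the per-mount flags if MS_UNION was
 * requested, then strip MS_UNION from the mount(2) flags so that
 * later code does not see it.
 */
static int split_union_flag(unsigned long *flags, int mnt_flags)
{
	if (*flags & MS_UNION)
		mnt_flags |= MNT_UNION;

	*flags &= ~(unsigned long)MS_UNION;
	return mnt_flags;
}
```

The real do_mount() clears a whole set of MS_* bits in one mask; the
sketch leaves every bit other than MS_UNION untouched.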

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c        |    5 ++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    4 ++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9a40282..5e4b27b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -808,6 +808,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
 		{ MNT_STRICTATIME, ",strictatime" },
+		{ MNT_UNION, ",union" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
@@ -2018,10 +2019,12 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 		mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
 	if (flags & MS_RDONLY)
 		mnt_flags |= MNT_READONLY;
+	if (flags & MS_UNION)
+		mnt_flags |= MNT_UNION;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
 		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-		   MS_STRICTATIME);
+		   MS_STRICTATIME | MS_UNION);
 
 	if (flags & MS_REMOUNT)
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b59cd7b..dbd9881 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -192,6 +192,7 @@ struct inodes_stat_t {
 #define MS_REMOUNT	32	/* Alter flags of a mounted FS */
 #define MS_MANDLOCK	64	/* Allow mandatory locks on an FS */
 #define MS_DIRSYNC	128	/* Directory modifications are synchronous */
+#define MS_UNION	256	/* Merge namespace with FS mounted below */
 #define MS_NOATIME	1024	/* Do not update access times. */
 #define MS_NODIRATIME	2048	/* Do not update directory access times */
 #define MS_BIND		4096
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 4bd0547..f6b714c 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -43,9 +43,9 @@ struct mnt_namespace;
  */
 #define MNT_SHARED_MASK	(MNT_UNBINDABLE)
 #define MNT_PROPAGATION_MASK	(MNT_SHARED | MNT_UNBINDABLE)
+#define MNT_UNION	0x4000	/* if the vfsmount is a union mount */
 
-
-#define MNT_INTERNAL	0x4000
+#define MNT_INTERNAL	0x8000
 
 struct vfsmount {
 	struct list_head mnt_hash;
-- 
1.6.3.3



* [PATCH 19/39] union-mount: Introduce union_mount structure and basic operations
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (17 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 18/39] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 20/39] union-mount: Drive the union cache via dcache Valerie Aurora
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

This patch adds the basic structures and operations of VFS-based union
mounts (but not the ability to mount or lookup unioned file systems).
Each directory in a unioned file system has an associated union stack
created when the directory is first looked up.  The union stack is a
structure kept in a hash table indexed by mount and dentry of the
directory; thus, specific paths are unioned, not dentries alone.  The
union stack keeps a pointer to the upper path and the lower path and
can be looked up by either path.
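As a rough illustration of the "looked up by either path" idea, here
is a small userspace sketch of one entry linked into two hash tables,
one keyed by the upper (mnt, dentry) pair and one keyed by the lower
pair.  All toy_* names, the bucket count, and the hash function are
assumptions made for this example; the kernel code uses hlists, its own
hash(), and union_lock.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 64	/* hypothetical; the patch sizes its tables at boot */

struct toy_path {
	const void *mnt;
	const void *dentry;
};

struct toy_union_dir {
	struct toy_path upper;			/* the layer on top */
	struct toy_path lower;			/* the layer it overlays */
	struct toy_union_dir *next_by_upper;	/* chain in by_upper[] */
	struct toy_union_dir *next_by_lower;	/* chain in by_lower[] */
};

static struct toy_union_dir *by_upper[TOY_BUCKETS];
static struct toy_union_dir *by_lower[TOY_BUCKETS];

/* The key is the (mnt, dentry) pair, never the dentry alone. */
static unsigned int toy_hash(const struct toy_path *p)
{
	uintptr_t h = (uintptr_t)p->mnt ^ ((uintptr_t)p->dentry >> 4);

	return (unsigned int)(h % TOY_BUCKETS);
}

static int toy_same(const struct toy_path *a, const struct toy_path *b)
{
	return a->mnt == b->mnt && a->dentry == b->dentry;
}

/* Link one entry into both tables. */
static void toy_insert(struct toy_union_dir *ud)
{
	unsigned int u = toy_hash(&ud->upper);
	unsigned int l = toy_hash(&ud->lower);

	ud->next_by_upper = by_upper[u];
	by_upper[u] = ud;
	ud->next_by_lower = by_lower[l];
	by_lower[l] = ud;
}

static struct toy_union_dir *toy_lookup_upper(const struct toy_path *p)
{
	struct toy_union_dir *ud;

	for (ud = by_upper[toy_hash(p)]; ud; ud = ud->next_by_upper)
		if (toy_same(&ud->upper, p))
			return ud;
	return NULL;
}

static struct toy_union_dir *toy_lookup_lower(const struct toy_path *p)
{
	struct toy_union_dir *ud;

	for (ud = by_lower[toy_hash(p)]; ud; ud = ud->next_by_lower)
		if (toy_same(&ud->lower, p))
			return ud;
	return NULL;
}
```

Because the mount is part of the key, the same dentry reached through
a different mount does not find the entry.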

This particular version of union mounts is based on ideas by Jan
Blunck, Bharata Rao, and many others.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/Kconfig             |   13 ++
 fs/Makefile            |    1 +
 fs/dcache.c            |    4 +
 fs/union.c             |  289 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |   18 +++-
 include/linux/mount.h  |    3 +
 include/linux/union.h  |   53 +++++++++
 7 files changed, 380 insertions(+), 1 deletions(-)
 create mode 100644 fs/union.c
 create mode 100644 include/linux/union.h

diff --git a/fs/Kconfig b/fs/Kconfig
index 5f85b59..360227d 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -59,6 +59,19 @@ source "fs/notify/Kconfig"
 
 source "fs/quota/Kconfig"
 
+config UNION_MOUNT
+       bool "Writable overlays (union mounts) (EXPERIMENTAL)"
+       depends on EXPERIMENTAL
+       help
+         Writable overlays allow you to mount a transparent writable
+	 layer over a read-only file system, for example, an ext3
+	 partition on a hard drive over a CD-ROM root file system
+	 image.
+
+	 See <file:Documentation/filesystems/union-mounts.txt> for details.
+
+	 If unsure, say N.
+
 source "fs/autofs/Kconfig"
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 97f340f..1949af2 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
 obj-$(CONFIG_GENERIC_ACL)	+= generic_acl.o
 
 obj-y				+= quota/
+obj-$(CONFIG_UNION_MOUNT)	+= union.o
 
 obj-$(CONFIG_PROC_FS)		+= proc/
 obj-y				+= partitions/
diff --git a/fs/dcache.c b/fs/dcache.c
index 1575af4..7b47f53 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -960,6 +960,10 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
 	INIT_LIST_HEAD(&dentry->d_alias);
+#ifdef CONFIG_UNION_MOUNT
+	INIT_LIST_HEAD(&dentry->d_unions);
+	dentry->d_union_lower_count = 0;
+#endif
 
 	if (parent) {
 		dentry->d_parent = dget(parent);
diff --git a/fs/union.c b/fs/union.c
new file mode 100644
index 0000000..4377cf4
--- /dev/null
+++ b/fs/union.c
@@ -0,0 +1,289 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007-2009 Novell Inc.
+ *
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/hash.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/fs_struct.h>
+#include <linux/slab.h>
+#include <linux/union.h>
+
+/*
+ * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
+ * should try to make this good - I've just made it work.
+ */
+static unsigned int union_hash_mask __read_mostly;
+static unsigned int union_hash_shift __read_mostly;
+static struct hlist_head *union_hashtable __read_mostly;
+static unsigned int union_rhash_mask __read_mostly;
+static unsigned int union_rhash_shift __read_mostly;
+static struct hlist_head *union_rhashtable __read_mostly;
+
+/*
+ * Locking Rules:
+ * - dcache_lock (for union_rlookup() only)
+ * - union_lock
+ */
+DEFINE_SPINLOCK(union_lock);
+
+static struct kmem_cache *union_cache __read_mostly;
+
+static unsigned long hash(struct dentry *dentry, struct vfsmount *mnt)
+{
+	unsigned long tmp;
+
+	tmp = ((unsigned long)mnt * (unsigned long)dentry) ^
+		(GOLDEN_RATIO_PRIME + (unsigned long)mnt) / L1_CACHE_BYTES;
+	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> union_hash_shift);
+	return tmp & union_hash_mask;
+}
+
+static __initdata unsigned long union_hash_entries;
+
+static int __init set_union_hash_entries(char *str)
+{
+	if (!str)
+		return 0;
+	union_hash_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+
+__setup("union_hash_entries=", set_union_hash_entries);
+
+static int __init init_union(void)
+{
+	int loop;
+
+	union_cache = KMEM_CACHE(union_dir, SLAB_PANIC | SLAB_MEM_SPREAD);
+	union_hashtable = alloc_large_system_hash("Union-cache",
+						  sizeof(struct hlist_head),
+						  union_hash_entries,
+						  14,
+						  0,
+						  &union_hash_shift,
+						  &union_hash_mask,
+						  0);
+
+	for (loop = 0; loop < (1 << union_hash_shift); loop++)
+		INIT_HLIST_HEAD(&union_hashtable[loop]);
+
+
+	union_rhashtable = alloc_large_system_hash("rUnion-cache",
+						  sizeof(struct hlist_head),
+						  union_hash_entries,
+						  14,
+						  0,
+						  &union_rhash_shift,
+						  &union_rhash_mask,
+						  0);
+
+	for (loop = 0; loop < (1 << union_rhash_shift); loop++)
+		INIT_HLIST_HEAD(&union_rhashtable[loop]);
+
+	return 0;
+}
+
+fs_initcall(init_union);
+
+static struct union_dir *union_alloc(struct path *upper, struct path *lower)
+{
+	struct union_dir *ud;
+
+	BUG_ON(!S_ISDIR(upper->dentry->d_inode->i_mode));
+	BUG_ON(!S_ISDIR(lower->dentry->d_inode->i_mode));
+
+	ud = kmem_cache_alloc(union_cache, GFP_ATOMIC);
+	if (!ud)
+		return NULL;
+
+	atomic_set(&ud->u_count, 1);
+	INIT_LIST_HEAD(&ud->u_unions);
+	INIT_HLIST_NODE(&ud->u_hash);
+	INIT_HLIST_NODE(&ud->u_rhash);
+
+	ud->u_upper.mnt = upper->mnt;
+	ud->u_upper.dentry = upper->dentry;
+	ud->u_lower.mnt = mntget(lower->mnt);
+	ud->u_lower.dentry = dget(lower->dentry);
+
+	return ud;
+}
+
+struct union_dir *union_get(struct union_dir *ud)
+{
+	BUG_ON(!atomic_read(&ud->u_count));
+	atomic_inc(&ud->u_count);
+	return ud;
+}
+
+static int __union_put(struct union_dir *ud)
+{
+	if (!atomic_dec_and_test(&ud->u_count))
+		return 0;
+
+	BUG_ON(!hlist_unhashed(&ud->u_hash));
+	BUG_ON(!hlist_unhashed(&ud->u_rhash));
+
+	kmem_cache_free(union_cache, ud);
+	return 1;
+}
+
+void union_put(struct union_dir *ud)
+{
+	struct path tmp = ud->u_lower;
+
+	if (__union_put(ud))
+		path_put(&tmp);
+}
+
+static void __union_hash(struct union_dir *ud)
+{
+	hlist_add_head(&ud->u_hash, union_hashtable +
+		       hash(ud->u_upper.dentry, ud->u_upper.mnt));
+	hlist_add_head(&ud->u_rhash, union_rhashtable +
+		       hash(ud->u_lower.dentry, ud->u_lower.mnt));
+}
+
+static void __union_unhash(struct union_dir *ud)
+{
+	hlist_del_init(&ud->u_hash);
+	hlist_del_init(&ud->u_rhash);
+}
+
+static struct union_dir *union_cache_lookup(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct hlist_head *head = union_hashtable + hash(dentry, mnt);
+	struct hlist_node *node;
+	struct union_dir *ud;
+
+	hlist_for_each_entry(ud, node, head, u_hash) {
+		if ((ud->u_upper.dentry == dentry) &&
+		    (ud->u_upper.mnt == mnt))
+			return ud;
+	}
+
+	return NULL;
+}
+
+static struct union_dir *union_cache_rlookup(struct dentry *dentry, struct vfsmount *mnt)
+{
+	struct hlist_head *head = union_rhashtable + hash(dentry, mnt);
+	struct hlist_node *node;
+	struct union_dir *ud;
+
+	hlist_for_each_entry(ud, node, head, u_rhash) {
+		if ((ud->u_lower.dentry == dentry) &&
+		    (ud->u_lower.mnt == mnt))
+			return ud;
+	}
+
+	return NULL;
+}
+
+/*
+ * append_to_union - add a path to the bottom of the union stack
+ *
+ * Allocate and attach a union cache entry linking the new, upper
+ * mnt/dentry to the "covered" matching lower mnt/dentry.  It's okay
+ * if the union cache entry already exists.
+ */
+
+int append_to_union(struct path *upper, struct path *lower)
+{
+	struct union_dir *new, *ud;
+
+	BUG_ON(!S_ISDIR(upper->dentry->d_inode->i_mode));
+	BUG_ON(!S_ISDIR(lower->dentry->d_inode->i_mode));
+
+	/* Common case is that it's already been created, do a lookup first */
+
+	spin_lock(&union_lock);
+	ud = union_cache_lookup(upper->dentry, upper->mnt);
+	if (ud) {
+		BUG_ON((ud->u_lower.dentry != lower->dentry) ||
+		       (ud->u_lower.mnt != lower->mnt));
+		spin_unlock(&union_lock);
+		return 0;
+	}
+	spin_unlock(&union_lock);
+
+	new = union_alloc(upper, lower);
+	if (!new)
+		return -ENOMEM;
+
+	spin_lock(&union_lock);
+	ud = union_cache_lookup(upper->dentry, upper->mnt);
+	if (ud) {
+		/* Someone added it while we were allocating, no problem */
+		BUG_ON((ud->u_lower.dentry != lower->dentry) ||
+		       (ud->u_lower.mnt != lower->mnt));
+		spin_unlock(&union_lock);
+		union_put(new);
+		return 0;
+	}
+	__union_hash(new);
+	spin_unlock(&union_lock);
+	return 0;
+}
+
+/*
+ * WARNING! Confusing terminology alert.
+ *
+ * Note that the directions "up" and "down" in union mounts are the
+ * opposite of "up" and "down" in normal VFS operation terminology.
+ * "up" in the rest of the VFS means "towards the root of the mount
+ * tree."  If you mount B on top of A, following B "up" will get you
+ * A.  In union mounts, "up" means "towards the most recently mounted
+ * layer of the union stack."  If you union mount B on top of A,
+ * following A "up" will get you to B.  Another way to put it is that
+ * "up" in the VFS means going from this mount towards the direction
+ * of its mnt->mnt_parent pointer, but "up" in union mounts means
+ * going in the opposite direction (until you run out of union
+ * layers).
+ */
+
+/*
+ * union_down_one - get the next lower directory in the union stack
+ *
+ * This is called to traverse the union stack from the given layer to
+ * the next lower layer. union_down_one() is called by various
+ * lookup functions that are aware of union mounts.
+ *
+ * Returns non-zero if followed to the next lower layer, zero otherwise.
+ *
+ * See note on up/down terminology above.
+ */
+int union_down_one(struct vfsmount **mnt, struct dentry **dentry)
+{
+	struct union_dir *ud;
+
+	if (!IS_MNT_UNION(*mnt))
+		return 0;
+
+	spin_lock(&union_lock);
+	ud = union_cache_lookup(*dentry, *mnt);
+	spin_unlock(&union_lock);
+	if (ud) {
+		path_get(&ud->u_lower);
+		dput(*dentry);
+		*dentry = ud->u_lower.dentry;
+		mntput(*mnt);
+		*mnt = ud->u_lower.mnt;
+		return 1;
+	}
+	return 0;
+}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index e035c51..1745881 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -100,7 +100,23 @@ struct dentry {
 	struct hlist_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
-
+#ifdef CONFIG_UNION_MOUNT
+	/*
+	 * Union mount structures that reference this dentry as the
+	 * upper layer are linked through the d_unions field.  If this
+	 * list is not empty, then this dentry is part of a unioned
+	 * directory stack.  Protected by union_lock.
+	 */
+	struct list_head d_unions;
+	/*
+	 * Reference count of union_dirs with this dentry in the
+	 * u_lower field of a union mount structure - that is, it is a
+	 * dentry for a lower layer of a union.  This count is NOT
+	 * incremented for the dentry that is part of the topmost
+	 * layer of a union.
+	 */
+	unsigned int d_union_lower_count;
+#endif
 	struct list_head d_lru;		/* LRU list */
 	/*
 	 * d_child and d_rcu can share memory
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f6b714c..0517114 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -64,6 +64,9 @@ struct vfsmount {
 	struct list_head mnt_slave_list;/* list of slave mounts */
 	struct list_head mnt_slave;	/* slave list entry */
 	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
+#ifdef CONFIG_UNION_MOUNT
+	struct list_head mnt_unions;	/* list of union_mount structures */
+#endif
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	int mnt_id;			/* mount identifier */
 	int mnt_group_id;		/* peer group identifier */
diff --git a/include/linux/union.h b/include/linux/union.h
new file mode 100644
index 0000000..d66beb7
--- /dev/null
+++ b/include/linux/union.h
@@ -0,0 +1,53 @@
+/*
+ * VFS based union mount for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007 Novell Inc.
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef __LINUX_UNION_H
+#define __LINUX_UNION_H
+#ifdef __KERNEL__
+
+#include <linux/list.h>
+#include <asm/atomic.h>
+
+struct dentry;
+struct vfsmount;
+
+#ifdef CONFIG_UNION_MOUNT
+
+/*
+ * The union mount structure.
+ */
+struct union_dir {
+	atomic_t u_count;		/* reference count */
+	struct list_head u_unions;	/* list head for d_unions */
+	struct list_head u_list;	/* list head for mnt_unions */
+	struct hlist_node u_hash;	/* list head for searching */
+	struct hlist_node u_rhash;	/* list head for reverse searching */
+
+	struct path u_upper;		/* this is me */
+	struct path u_lower;		/* this is what I overlay */
+};
+
+#define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
+
+extern int append_to_union(struct path *, struct path*);
+extern int union_down_one(struct vfsmount **, struct dentry **);
+
+#else /* CONFIG_UNION_MOUNT */
+
+#define IS_MNT_UNION(x)			(0)
+#define append_to_union(x, y)		({ BUG(); (0); })
+#define union_down_one(x, y)		({ (0); })
+
+#endif	/* CONFIG_UNION_MOUNT */
+#endif	/* __KERNEL__ */
+#endif	/* __LINUX_UNION_H */
-- 
1.6.3.3



* [PATCH 20/39] union-mount: Drive the union cache via dcache
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (18 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 19/39] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 21/39] union-mount: Implement union lookup Valerie Aurora
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

If a dentry is removed from dentry cache because its usage count drops to
zero, the references to the underlying layer of the unions the dentry is in
are dropped too. Therefore the union cache is driven by the dentry cache.
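A simplified userspace model of that lifetime rule, for illustration
only: each union entry hanging off an upper dentry pins its lower
dentry with one reference, and killing the upper dentry walks its
union list and drops those references.  The toy_* types and functions
are hypothetical and ignore locking, hashing, and vfsmount references.

```c
#include <stddef.h>

struct toy_union;

struct toy_dentry {
	int refcount;			/* users of this dentry */
	int freed;			/* set once the last user is gone */
	struct toy_union *unions;	/* entries where this dentry is the upper */
};

struct toy_union {
	struct toy_dentry *lower;	/* pinned by one reference */
	struct toy_union *next;
};

static void toy_dput(struct toy_dentry *d);

/* Detach every union entry of @d and drop the lower references. */
static void toy_shrink_unions(struct toy_dentry *d)
{
	while (d->unions) {
		struct toy_union *u = d->unions;

		d->unions = u->next;
		u->next = NULL;
		toy_dput(u->lower);
	}
}

/* Drop one reference; on the last one, tear down the unions too. */
static void toy_dput(struct toy_dentry *d)
{
	if (--d->refcount > 0)
		return;

	toy_shrink_unions(d);
	d->freed = 1;
}
```

In this model a lower dentry that is still referenced elsewhere
survives; it only loses the reference that the union entry held.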

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/dcache.c            |   13 +++++++++++
 fs/union.c             |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |    8 ++++++
 include/linux/mount.h  |    2 +-
 include/linux/union.h  |    4 +++
 5 files changed, 82 insertions(+), 1 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 7b47f53..322c1f7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -18,6 +18,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
+#include <linux/union.h>
 #include <linux/fsnotify.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -175,6 +176,8 @@ static struct dentry *d_kill(struct dentry *dentry)
 	dentry_stat.nr_dentry--;	/* For d_free, below */
 	/*drops the locks, at that point nobody can reach this dentry */
 	dentry_iput(dentry);
+	/* If the dentry was in a union, delete its union entries */
+	shrink_d_unions(dentry);
 	if (IS_ROOT(dentry))
 		parent = NULL;
 	else
@@ -696,6 +699,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
 					iput(inode);
 			}
 
+			shrink_d_unions(dentry);
 			d_free(dentry);
 
 			/* finished when we fall off the top of the tree,
@@ -1535,7 +1539,9 @@ void d_delete(struct dentry * dentry)
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
 	if (atomic_read(&dentry->d_count) == 1) {
+		__d_drop_unions(dentry);
 		dentry_iput(dentry);
+		shrink_d_unions(dentry);
 		fsnotify_nameremove(dentry, isdir);
 		return;
 	}
@@ -1546,6 +1552,13 @@ void d_delete(struct dentry * dentry)
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
+	/*
+	 * Remove any associated unions.  While someone still has this
+	 * directory open (ref count > 0), we could not have deleted
+	 * it unless it was empty, and therefore has no references to
+	 * directories below it.  So we don't need the unions.
+	 */
+	shrink_d_unions(dentry);
 	fsnotify_nameremove(dentry, isdir);
 }
 EXPORT_SYMBOL(d_delete);
diff --git a/fs/union.c b/fs/union.c
index 4377cf4..eb664e6 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -14,6 +14,7 @@
 
 #include <linux/bootmem.h>
 #include <linux/init.h>
+#include <linux/module.h>
 #include <linux/types.h>
 #include <linux/hash.h>
 #include <linux/fs.h>
@@ -235,6 +236,8 @@ int append_to_union(struct path *upper, struct path *lower)
 		union_put(new);
 		return 0;
 	}
+	list_add(&new->u_unions, &upper->dentry->d_unions);
+	lower->dentry->d_union_lower_count++;
 	__union_hash(new);
 	spin_unlock(&union_lock);
 	return 0;
@@ -287,3 +290,56 @@ int union_down_one(struct vfsmount **mnt, struct dentry **dentry)
 	}
 	return 0;
 }
+
+/**
+ * __d_drop_unions  -  remove all this dentry's unions from the union hash table
+ *
+ * @dentry - topmost dentry in the union stack to remove
+ *
+ * This must be called after unhashing a dentry. This is called with
+ * dcache_lock held and unhashes all the unions this dentry is
+ * attached to.
+ */
+void __d_drop_unions(struct dentry *dentry)
+{
+	struct union_dir *this, *next;
+
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions)
+		__union_unhash(this);
+	spin_unlock(&union_lock);
+}
+EXPORT_SYMBOL_GPL(__d_drop_unions);
+
+/*
+ * This must be called after __d_drop_unions() without holding any
+ * locks.  Note: The dentry might still be reachable via a lookup but
+ * at that time it is already a negative dentry. Otherwise it would be
+ * unhashed. The union_dir structure itself is still reachable through
+ * mnt->mnt_unions (which we protect against with union_lock).
+ *
+ * We were worried about a recursive dput() call through:
+ *
+ * dput()->d_kill()->shrink_d_unions()->union_put()->dput()
+ *
+ * But this path can only be reached if the dentry is unhashed when we
+ * enter the first dput(), and it can only be unhashed if it was
+ * rmdir()'d, and d_delete() calls shrink_d_unions() for us.
+ */
+void shrink_d_unions(struct dentry *dentry)
+{
+	struct union_dir *this, *next;
+
+repeat:
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
+		BUG_ON(!hlist_unhashed(&this->u_hash));
+		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_unions);
+		this->u_lower.dentry->d_union_lower_count--;
+		spin_unlock(&union_lock);
+		union_put(this);
+		goto repeat;
+	}
+	spin_unlock(&union_lock);
+}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 1745881..31656e9 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -223,12 +223,20 @@ extern seqlock_t rename_lock;
  * __d_drop requires dentry->d_lock.
  */
 
+#ifdef CONFIG_UNION_MOUNT
+extern void __d_drop_unions(struct dentry *);
+#endif
+
 static inline void __d_drop(struct dentry *dentry)
 {
 	if (!(dentry->d_flags & DCACHE_UNHASHED)) {
 		dentry->d_flags |= DCACHE_UNHASHED;
 		hlist_del_rcu(&dentry->d_hash);
 	}
+#ifdef CONFIG_UNION_MOUNT
+	/* remove dentry from the union hashtable */
+	__d_drop_unions(dentry);
+#endif
 }
 
 static inline void d_drop(struct dentry *dentry)
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 0517114..13a3818 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -65,7 +65,7 @@ struct vfsmount {
 	struct list_head mnt_slave;	/* slave list entry */
 	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
 #ifdef CONFIG_UNION_MOUNT
-	struct list_head mnt_unions;	/* list of union_mount structures */
+	struct list_head mnt_unions;	/* list of union_dir structures */
 #endif
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	int mnt_id;			/* mount identifier */
diff --git a/include/linux/union.h b/include/linux/union.h
index d66beb7..70b2adb 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -41,12 +41,16 @@ struct union_dir {
 
 extern int append_to_union(struct path *, struct path*);
 extern int union_down_one(struct vfsmount **, struct dentry **);
+extern void __d_drop_unions(struct dentry *);
+extern void shrink_d_unions(struct dentry *);
 
 #else /* CONFIG_UNION_MOUNT */
 
 #define IS_MNT_UNION(x)			(0)
 #define append_to_union(x, y)		({ BUG(); (0); })
 #define union_down_one(x, y)		({ (0); })
+#define __d_drop_unions(x)		do { } while (0)
+#define shrink_d_unions(x)		do { } while (0)
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.3.3



* [PATCH 21/39] union-mount: Implement union lookup
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (19 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 20/39] union-mount: Drive the union cache via dcache Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 22/39] union-mount: Support for mounting union mount file systems Valerie Aurora
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Implement unioned directories, whiteouts, and fallthrus in pathname
lookup routines.  do_lookup() and lookup_hash() call lookup_union()
after looking up the dentry from the top-level file system.
lookup_union() is centered around __lookup_hash(), which does cached
and/or real lookups and revalidates each dentry in the union stack.

The added cost to a non-union mount pathname lookup in a
CONFIG_UNION_MOUNT kernel is either one or two mount flag tests per
pathname component, in needs_union_lookup().

XXX - implement negative union cache entries
---
 fs/namei.c            |  199 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/union.c            |   67 +++++++++++++++++
 include/linux/union.h |    9 ++
 3 files changed, 274 insertions(+), 1 deletions(-)
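
The decision order in __lookup_union() can be hard to follow from the diff
alone, so here is a simplified user-space model of it.  This is a sketch under
stated assumptions, not the kernel implementation: the types and the function
model_union_lookup() are hypothetical, each layer is reduced to "what does it
hold for this name" plus "is the parent directory opaque", and reference
counting, revalidation, and error handling are left out.  Layer 0 is the
topmost layer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of what one layer holds for a name. */
enum model_kind {
	MODEL_NEGATIVE,	/* no entry with this name */
	MODEL_WHITEOUT,	/* name was deleted; hides everything below */
	MODEL_FALLTHRU,	/* placeholder meaning "look in the layer below" */
	MODEL_FILE,	/* any non-directory */
	MODEL_DIR
};

struct model_layer {
	enum model_kind entry;	/* what this layer has for the name */
	int parent_opaque;	/* parent directory in this layer is opaque */
};

struct model_result {
	enum model_kind visible;	/* what the lookup ends up returning */
	int source_layer;		/* layer supplying it, -1 if none */
	int unioned_dirs;		/* lower directories added to the stack */
};

static struct model_result
model_union_lookup(const struct model_layer *layers, size_t nr_layers)
{
	struct model_result res = { MODEL_NEGATIVE, -1, 0 };
	size_t i;

	if (nr_layers == 0)
		return res;

	/* A topmost whiteout or non-directory ends the lookup immediately. */
	if (layers[0].entry == MODEL_WHITEOUT)
		return res;
	if (layers[0].entry == MODEL_FILE || layers[0].entry == MODEL_DIR) {
		res.visible = layers[0].entry;
		res.source_layer = 0;
		if (layers[0].entry == MODEL_FILE)
			return res;
	}
	/* An opaque parent hides lower layers unless the entry is a fallthru. */
	if (layers[0].parent_opaque && layers[0].entry != MODEL_FALLTHRU)
		return res;

	for (i = 1; i < nr_layers; i++) {
		enum model_kind lower = layers[i].entry;

		if (lower == MODEL_WHITEOUT)
			break;
		if (layers[i].parent_opaque && lower != MODEL_FALLTHRU)
			break;
		if (lower == MODEL_NEGATIVE || lower == MODEL_FALLTHRU)
			continue;
		if (lower == MODEL_FILE) {
			/* A file never unions with a directory above it. */
			if (res.visible != MODEL_DIR) {
				res.visible = MODEL_FILE;
				res.source_layer = (int)i;
			}
			break;
		}
		/* Lower entry is a directory: the topmost one must exist. */
		if (res.visible != MODEL_DIR) {
			res.visible = MODEL_DIR;	/* models union_create_topmost_dir() */
			res.source_layer = 0;
		}
		res.unioned_dirs++;
	}
	return res;
}
```

In this model a negative topmost entry over a lower directory reports the
directory as coming from layer 0, which mirrors the rule in the patch that
the topmost directory must always exist before anything is appended below it.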

diff --git a/fs/namei.c b/fs/namei.c
index 7e2c31f..a72187b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -32,6 +32,7 @@
 #include <linux/fcntl.h>
 #include <linux/device_cgroup.h>
 #include <linux/fs_struct.h>
+#include <linux/union.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -722,6 +723,189 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 	follow_mount(&nd->path);
 }
 
+static struct dentry *__lookup_hash(struct qstr *name, struct dentry *base,
+				    struct nameidata *nd);
+
+/*
+ * __lookup_union - Given a path from the topmost layer, lookup and
+ * revalidate each dentry in its union stack, building it if necessary
+ *
+ * @nd - nameidata for the parent of @topmost
+ * @name - pathname from this element on
+ * @topmost - path of the topmost matching dentry
+ *
+ * Given the nameidata and the path of the topmost dentry for this
+ * pathname, lookup, revalidate, and build the associated union stack.
+ * @topmost must be either a negative dentry or a directory.
+ *
+ * This function is called both to build a new union stack and to
+ * revalidate a pre-existing union stack.  So we must cope with
+ * already existing union cache entries.
+ *
+ * This function may stomp nd->path with the path of the parent
+ * directory of a lower layer, so the caller must save nd->path and
+ * restore it afterwards.  You probably want to use lookup_union(),
+ * not __lookup_union().
+ */
+
+static int __lookup_union(struct nameidata *nd, struct qstr *name,
+			  struct path *topmost)
+{
+	struct path parent = nd->path;
+	struct dentry *dentry;
+	struct path upper;
+	struct path lower;
+	int err = 0;
+
+	if (d_is_whiteout(topmost->dentry))
+		return 0;
+
+	if (IS_OPAQUE(nd->path.dentry->d_inode) &&
+	    !d_is_fallthru(topmost->dentry))
+		return 0;
+
+	/* upper is the most recent positive dentry or topmost negative */
+	upper.dentry = dget(topmost->dentry);
+	upper.mnt = mntget(topmost->mnt);
+
+	/* union_down_one() drops a reference, take one */
+	path_get(&nd->path);
+
+	/* Traverse the parent dir's union stack looking for this name */
+	while (union_down_one(&nd->path.mnt, &nd->path.dentry)) {
+		/* Lookup and revalidate the child dentry */
+		lower.mnt = nd->path.mnt;
+		lower.dentry = __lookup_hash(name, nd->path.dentry, nd);
+
+		if (IS_ERR(lower.dentry)) {
+			err = PTR_ERR(lower.dentry);
+			break;
+		}
+
+		if (d_is_whiteout(lower.dentry)) {
+			dput(lower.dentry);
+			break;
+		}
+
+		if (IS_OPAQUE(nd->path.dentry->d_inode) &&
+		    !d_is_fallthru(lower.dentry))
+			break;
+
+		if (!lower.dentry->d_inode) {
+			dput(lower.dentry);
+			continue;
+		}
+
+		/*
+		 * You can't union a file with a directory!  Note that
+		 * if the topmost directory entry is positive, then it
+		 * will be a directory at this point.
+		 */
+		if (topmost->dentry->d_inode &&
+		    !S_ISDIR(lower.dentry->d_inode->i_mode)) {
+			dput(lower.dentry);
+			break;
+		}
+
+		/* Non-dir entries block anything below, so bail out */
+		if (!S_ISDIR(lower.dentry->d_inode->i_mode)) {
+			dput(topmost->dentry);
+			topmost->dentry = lower.dentry;
+			/*
+			 * mntput() of previous topmost done in
+			 * link_path_walk()
+			 */
+			topmost->mnt = mntget(lower.mnt);
+			break;
+		}
+
+		/* The topmost directory must always exist. */
+		if (!topmost->dentry->d_inode) {
+			dentry = union_create_topmost_dir(&parent, name,
+							  &lower);
+			if (IS_ERR(dentry)) {
+				err = PTR_ERR(dentry);
+				dput(lower.dentry);
+				break;
+			}
+			dput(topmost->dentry);
+			topmost->dentry = dentry;
+			dput(upper.dentry);
+			upper.dentry = dget(dentry);
+		}
+
+		/*
+		 * Add new dentry to the union stack.  It's okay if
+		 * we've already added it, append_to_union() can
+		 * handle that case.
+		 */
+		err = append_to_union(&upper, &lower);
+		if (err) {
+			dput(lower.dentry);
+			break;
+		}
+
+		path_put(&upper);
+		upper.mnt = mntget(lower.mnt);
+		upper.dentry = lower.dentry;
+	}
+	path_put(&nd->path);
+	path_put(&upper);
+
+	return err;
+}
+
+/*
+ * lookup_union - revalidate and build union stack for this path
+ *
+ * We borrow the nameidata struct from the topmost layer to do the
+ * revalidation on lower dentries, replacing the topmost parent
+ * directory's path with that of the matching parent dir in each lower
+ * layer.  This wrapper for __lookup_union() saves the topmost layer's
+ * path and restores it when we are done.
+ */
+static int lookup_union(struct nameidata *nd, struct qstr *name,
+			struct path *topmost)
+{
+	struct path saved_path;
+	int err;
+
+	BUG_ON(!IS_MNT_UNION(nd->path.mnt) && !IS_MNT_UNION(topmost->mnt));
+	BUG_ON(!mutex_is_locked(&nd->path.dentry->d_inode->i_mutex));
+
+	saved_path = nd->path;
+	path_get(&saved_path);
+
+	err = __lookup_union(nd, name, topmost);
+
+	nd->path = saved_path;
+	path_put(&saved_path);
+
+	return err;
+}
+
+/*
+ * do_union_lookup - union mount-aware part of do_lookup
+ *
+ * do_lookup()-style wrapper for lookup_union().  Follows mounts.
+ */
+
+static int do_union_lookup(struct nameidata *nd, struct qstr *name,
+			   struct path *topmost)
+{
+	struct dentry *parent = nd->path.dentry;
+	struct inode *dir = parent->d_inode;
+	int err;
+
+	mutex_lock(&dir->i_mutex);
+	err = lookup_union(nd, name, topmost);
+	mutex_unlock(&dir->i_mutex);
+
+	__follow_mount(topmost);
+
+	return err;
+}
+
 /*
  *  It's more convoluted than I'd like it to be, but... it's still fairly
  *  small and for now I'd prefer to have fast path as straight as possible.
@@ -752,6 +936,11 @@ done:
 	path->mnt = mnt;
 	path->dentry = dentry;
 	__follow_mount(path);
+	if (needs_union_lookup(nd->path.mnt, path)) {
+		int err = do_union_lookup(nd, name, path);
+		if (err < 0)
+			return err;
+	}
 	return 0;
 
 need_lookup:
@@ -1223,8 +1412,13 @@ static int lookup_hash(struct nameidata *nd, struct qstr *name,
 		err = PTR_ERR(path->dentry);
 		path->dentry = NULL;
 		path->mnt = NULL;
+		return err;
 	}
+
+	if (needs_union_lookup(nd->path.mnt, path))
+		err = lookup_union(nd, name, path);
 	return err;
+
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
@@ -2947,7 +3141,10 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	error = -EXDEV;
 	if (oldnd.path.mnt != newnd.path.mnt)
 		goto exit2;
-
+	/* Rename on union mounts not implemented yet */
+	/* XXX much harsher check than necessary - can do some renames */
+	if (IS_UNIONED_DIR(&oldnd.path) || IS_UNIONED_DIR(&newnd.path))
+		goto exit2;
 	old_dir = oldnd.path.dentry;
 	error = -EBUSY;
 	if (oldnd.last_type != LAST_NORM)
diff --git a/fs/union.c b/fs/union.c
index eb664e6..f42c490 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -22,6 +22,7 @@
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
 #include <linux/union.h>
+#include <linux/namei.h>
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -196,6 +197,43 @@ static struct union_dir *union_cache_rlookup(struct dentry *dentry, struct vfsmo
 }
 
 /*
+ * needs_union_lookup - Does this path need a union lookup?
+ *
+ * @parent_mnt - parent mnt, usually from associated nameidata (nd->path.mnt)
+ * @path - path of potential child union directory
+ *
+ * Short-circuit union operations on paths that can't possibly be
+ * unioned directories or don't need union lookup.
+ */
+
+int needs_union_lookup(struct vfsmount *parent_mnt, struct path *path)
+{
+	/* If this is the root of a mount, ignore the parent */
+	if (IS_ROOT(path->dentry) && !IS_MNT_UNION(path->mnt))
+		return 0;
+
+	/* The child could be from a lower layer, check the parent mnt */
+	if (!IS_MNT_UNION(parent_mnt))
+		return 0;
+
+	/* Only directories can be unioned */
+	if (path->dentry->d_inode &&
+	    !S_ISDIR(path->dentry->d_inode->i_mode))
+		return 0;
+
+	/*
+	 * XXX - A negative dentry for a directory in a unioned
+	 * directory could have a matching directory below it.  Or it
+	 * could not.  Either way, all we have is a negative dentry.
+	 * As a result, negative dentries with unioned parents always
+	 * have to go through a full union lookup.  This can be
+	 * avoided by adding a negative union cache entry for the
+	 * negative dentry.
+	 */
+	return 1;
+}
+
+/*
  * append_to_union - add a path to the bottom of the union stack
  *
  * Allocate and attach a union cache entry linking the new, upper
@@ -343,3 +381,32 @@ repeat:
 	}
 	spin_unlock(&union_lock);
 }
+
+/*
+ * union_create_topmost_dir - Create a matching dir in the topmost file system
+ */
+
+struct dentry * union_create_topmost_dir(struct path *parent, struct qstr *name,
+					 struct path *lower)
+{
+	struct dentry *dentry;
+	int mode = lower->dentry->d_inode->i_mode;
+	int res;
+
+	res = mnt_want_write(parent->mnt);
+	if (res)
+		return ERR_PTR(res);
+
+	dentry = lookup_one_len(name->name, parent->dentry, name->len);
+	if (IS_ERR(dentry))
+		goto out;
+
+	res = vfs_mkdir(parent->dentry->d_inode, dentry, mode);
+	if (res) {
+		dput(dentry);
+		dentry = ERR_PTR(res);	/* don't return a released dentry */
+	}
+out:
+	mnt_drop_write(parent->mnt);
+	return dentry;
+}
diff --git a/include/linux/union.h b/include/linux/union.h
index 70b2adb..24608b2 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -38,19 +38,28 @@ struct union_dir {
 };
 
 #define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
+#define IS_UNIONED_DIR(path)	(IS_MNT_UNION((path)->mnt) && \
+				 ((path)->dentry->d_union_lower_count || \
+				  !list_empty(&(path)->dentry->d_unions)))
 
+extern int needs_union_lookup(struct vfsmount *, struct path *);
 extern int append_to_union(struct path *, struct path*);
 extern int union_down_one(struct vfsmount **, struct dentry **);
 extern void __d_drop_unions(struct dentry *);
 extern void shrink_d_unions(struct dentry *);
+extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
+						struct path *);
 
 #else /* CONFIG_UNION_MOUNT */
 
 #define IS_MNT_UNION(x)			(0)
+#define IS_UNIONED_DIR(x)		(0)
+#define needs_union_lookup(x, y)	({ (0); })
 #define append_to_union(x, y)		({ BUG(); (0); })
 #define union_down_one(x, y)		({ (0); })
 #define __d_drop_unions(x)		do { } while (0)
 #define shrink_d_unions(x)		do { } while (0)
+#define union_create_topmost_dir(x, y, z)	({ BUG(); (NULL); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.3.3



* [PATCH 22/39] union-mount: Support for mounting union mount file systems
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (20 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 21/39] union-mount: Implement union lookup Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 23/39] union-mount: Call do_whiteout() on unlink and rmdir in unions Valerie Aurora
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Create and tear down union mount structures on mount.  Check
requirements for union mounts.

Thanks to Felix Fietkau <nbd@openwrt.org> for a bug fix.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namespace.c        |  130 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/union.c            |   63 ++++++++++++++++++++++++
 include/linux/union.h |    4 ++
 3 files changed, 196 insertions(+), 1 deletions(-)
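
As a quick reference for the mount-time requirements, the sketch below
restates the checks in check_union_mnt() as a user-space function over a
handful of booleans, in the same order and with the same error codes as the
patch.  The struct model_mount_request and model_check_union_mnt() are
hypothetical names used only for this illustration; the real function reads
these facts from the vfsmount, superblock, and mount flags.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical summary of the facts check_union_mnt() inspects. */
struct model_mount_request {
	int wants_union;		/* MNT_UNION requested */
	int lower_sb_readonly;		/* lower superblock has MS_RDONLY */
	int lower_has_submounts;	/* lower mnt_mounts list is non-empty */
	int mountpoint_is_root;		/* mountpoint dentry passes IS_ROOT() */
	int topmost_readonly;		/* MNT_READONLY requested for the top */
	int topmost_has_whiteouts;	/* topmost superblock has MS_WHITEOUT */
};

/* Returns 0 if the union mount may proceed, or a negative errno value. */
static int model_check_union_mnt(const struct model_mount_request *req)
{
	if (!req->wants_union)
		return 0;		/* ordinary mount, nothing to check */
	if (!req->lower_sb_readonly)
		return -EBUSY;		/* lower layer must be read-only */
	if (req->lower_has_submounts)
		return -EBUSY;		/* lower layer may not have submounts */
	if (!req->mountpoint_is_root)
		return -EINVAL;		/* only union at a root directory */
	if (req->topmost_readonly)
		return -EROFS;		/* readdir() copyup needs a writable top */
	if (!req->topmost_has_whiteouts)
		return -EINVAL;		/* top needs whiteouts and fallthrus */
	return 0;
}
```

Because the checks run in a fixed order, a request that violates several
requirements reports only the first one it hits.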

diff --git a/fs/namespace.c b/fs/namespace.c
index 5e4b27b..e19a432 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -29,6 +29,7 @@
 #include <linux/log2.h>
 #include <linux/idr.h>
 #include <linux/fs_struct.h>
+#include <linux/union.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include "pnode.h"
@@ -157,6 +158,9 @@ struct vfsmount *alloc_vfsmnt(const char *name)
 #else
 		mnt->mnt_writers = 0;
 #endif
+#ifdef CONFIG_UNION_MOUNT
+		INIT_LIST_HEAD(&mnt->mnt_unions);
+#endif
 	}
 	return mnt;
 
@@ -492,6 +496,7 @@ static void __touch_mnt_namespace(struct mnt_namespace *ns)
 
 static void detach_mnt(struct vfsmount *mnt, struct path *old_path)
 {
+	detach_mnt_union(mnt);
 	old_path->dentry = mnt->mnt_mountpoint;
 	old_path->mnt = mnt->mnt_parent;
 	mnt->mnt_parent = mnt;
@@ -515,6 +520,7 @@ static void attach_mnt(struct vfsmount *mnt, struct path *path)
 	list_add_tail(&mnt->mnt_hash, mount_hashtable +
 			hash(path->mnt, path->dentry));
 	list_add_tail(&mnt->mnt_child, &path->mnt->mnt_mounts);
+	attach_mnt_union(mnt, path->mnt);
 }
 
 /*
@@ -537,6 +543,7 @@ static void commit_tree(struct vfsmount *mnt)
 	list_add_tail(&mnt->mnt_hash, mount_hashtable +
 				hash(parent, mnt->mnt_mountpoint));
 	list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	attach_mnt_union(mnt, parent);
 	touch_mnt_namespace(n);
 }
 
@@ -1025,6 +1032,7 @@ void release_mounts(struct list_head *head)
 			struct dentry *dentry;
 			struct vfsmount *m;
 			spin_lock(&vfsmount_lock);
+			detach_mnt_union(mnt);
 			dentry = mnt->mnt_mountpoint;
 			m = mnt->mnt_parent;
 			mnt->mnt_mountpoint = mnt->mnt_root;
@@ -1139,6 +1147,12 @@ static int do_umount(struct vfsmount *mnt, int flags)
 		if (!list_empty(&mnt->mnt_list))
 			umount_tree(mnt, 1, &umount_list);
 		retval = 0;
+		/*
+		 * If this was a union mount, we are no longer a
+		 * read-only user on the underlying mount.
+		 */
+		if (mnt->mnt_flags & MNT_UNION)
+			dec_hard_readonly_users(mnt->mnt_parent);
 	}
 	spin_unlock(&vfsmount_lock);
 	if (retval)
@@ -1490,6 +1504,17 @@ static int do_change_type(struct path *path, int flag)
 		return -EINVAL;
 
 	down_write(&namespace_sem);
+
+	/*
+	 * Mounts of file systems with read-only users can't deal with
+	 * mount/umount propagation events - it's the moral equivalent
+	 * of rm -rf dir/ or the like.
+	 */
+	if (sb_is_hard_readonly(mnt->mnt_sb)) {
+		err = -EROFS;
+		goto out_unlock;
+	}
+
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
 		if (err)
@@ -1507,6 +1532,77 @@ static int do_change_type(struct path *path, int flag)
 }
 
 /*
+ * Mount-time check of upper and lower layer file systems to see if we
+ * can union mount one on the other.
+ *
+ * Note on union mounts and mount event propagation: The lower
+ * layer(s) of a union mount must not have any changes to its
+ * namespace.  Therefore, it must not be part of any mount event
+ * propagation group - i.e., shared or slave.  MNT_SHARED and
+ * MNT_SLAVE are not set at mount, but in do_change_type(), which
+ * prevents setting these flags on file systems with read-only users,
+ * which includes the lower layer(s) of a union mount.
+ */
+
+static int
+check_union_mnt(struct path *mntpnt, struct vfsmount *topmost_mnt, int mnt_flags)
+{
+	struct vfsmount *lower_mnt = mntpnt->mnt;
+
+	if (!(mnt_flags & MNT_UNION))
+		return 0;
+
+#ifndef CONFIG_UNION_MOUNT
+	return -EINVAL;
+#endif
+	/*
+	 * We can't deal with namespace changes in the lower layers of
+	 * a union, so the lower layer must be read-only.  Note that
+	 * we could possibly convert a read-write unioned mount into a
+	 * read-only mount here, which would give us a way to union
+	 * more than one layer with separate mount commands.  But
+	 * first we have to solve the locking order problems with more
+	 * than two layers of union.
+	 */
+	if (!(lower_mnt->mnt_sb->s_flags & MS_RDONLY))
+		return -EBUSY;
+
+	/*
+	 * WRITEME: For simplicity, the lower layer can't have
+	 * submounts.  If there's a good reason, we could recursively
+	 * check the whole subtree for read-only-ness, etc. and it
+	 * would probably work fine.
+	 */
+	if (!list_empty(&lower_mnt->mnt_mounts))
+		return -EBUSY;
+
+	/*
+	 * Only permit unioning of file systems at their root
+	 * directories.  This allows us to mark entire mounts as
+	 * unioned.  Otherwise we must slowly and expensively work our
+	 * way up a path looking for a unioned directory before we
+	 * know if a path is from a unioned lower layer.
+	 */
+
+	if (!IS_ROOT(mntpnt->dentry))
+		return -EINVAL;
+
+	/*
+	 * Topmost layer must be writable to support our readdir()
+	 * solution of copying up all lower level entries to the
+	 * topmost layer.
+	 */
+	if (mnt_flags & MNT_READONLY)
+		return -EROFS;
+
+	/* Topmost file system must support whiteouts and fallthrus. */
+	if (!(topmost_mnt->mnt_sb->s_flags & MS_WHITEOUT))
+		return -EINVAL;
+
+	return 0;
+}
+
+/*
  * do loopback mount.
  */
 static int do_loopback(struct path *path, char *old_name,
@@ -1527,6 +1623,9 @@ static int do_loopback(struct path *path, char *old_name,
 	err = -EINVAL;
 	if (IS_MNT_UNBINDABLE(old_path.mnt))
 		goto out;
+	/* Mount part of a union mount elsewhere? The mind boggles. */
+	if (IS_MNT_UNION(old_path.mnt))
+		goto out;
 
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
@@ -1548,7 +1647,6 @@ static int do_loopback(struct path *path, char *old_name,
 		spin_unlock(&vfsmount_lock);
 		release_mounts(&umount_list);
 	}
-
 out:
 	up_write(&namespace_sem);
 	path_put(&old_path);
@@ -1589,6 +1687,17 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if (!check_mnt(path->mnt))
 		return -EINVAL;
 
+	if (mnt_flags & MNT_UNION)
+		return -EINVAL;
+
+	if ((path->mnt->mnt_flags & MNT_UNION) &&
+	    !(mnt_flags & MNT_UNION))
+		return -EINVAL;
+
+	if ((path->mnt->mnt_flags & MNT_UNION) &&
+	    (mnt_flags & MNT_READONLY))
+		return -EINVAL;
+
 	if (path->dentry != path->mnt->mnt_root)
 		return -EINVAL;
 
@@ -1641,6 +1750,9 @@ static int do_move_mount(struct path *path, char *old_name)
 	while (d_mountpoint(path->dentry) &&
 	       follow_down(path))
 		;
+	/* Get the lowest layer of a union mount to move the whole stack */
+	while (union_down_one(&old_path.mnt, &old_path.dentry))
+		;
 	err = -EINVAL;
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
@@ -1753,10 +1865,18 @@ int do_add_mount(struct vfsmount *newmnt, struct path *path,
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	err = check_union_mnt(path, newmnt, mnt_flags);
+	if (err)
+		goto unlock;
+
 	newmnt->mnt_flags = mnt_flags;
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
 
+	/* Union mounts require the lower layer to always be read-only */
+	if (mnt_flags & MNT_UNION)
+		inc_hard_readonly_users(newmnt->mnt_parent);
+
 	if (fslist) /* add to the specified expiration list */
 		list_add_tail(&newmnt->mnt_expire, fslist);
 
@@ -2267,6 +2387,14 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	if (d_unlinked(old.dentry))
 		goto out2;
 	error = -EBUSY;
+	/*
+	 * We want the bottom-most layer of a union mount here - if we
+	 * move that around, all the layers on top move with it.
+	 */
+	while (union_down_one(&new.mnt, &new.dentry))
+		;
+	while (union_down_one(&root.mnt, &root.dentry))
+		;
 	if (new.mnt == root.mnt ||
 	    old.mnt == root.mnt)
 		goto out2; /* loop, on the same file system  */
diff --git a/fs/union.c b/fs/union.c
index f42c490..ee831a8 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -114,6 +114,7 @@ static struct union_dir *union_alloc(struct path *upper, struct path *lower)
 
 	atomic_set(&ud->u_count, 1);
 	INIT_LIST_HEAD(&ud->u_unions);
+	INIT_LIST_HEAD(&ud->u_list);
 	INIT_HLIST_NODE(&ud->u_hash);
 	INIT_HLIST_NODE(&ud->u_rhash);
 
@@ -274,6 +275,7 @@ int append_to_union(struct path *upper, struct path *lower)
 		union_put(new);
 		return 0;
 	}
+	list_add(&new->u_list, &upper->mnt->mnt_unions);
 	list_add(&new->u_unions, &upper->dentry->d_unions);
 	lower->dentry->d_union_lower_count++;
 	__union_hash(new);
@@ -373,6 +375,7 @@ repeat:
 	list_for_each_entry_safe(this, next, &dentry->d_unions, u_unions) {
 		BUG_ON(!hlist_unhashed(&this->u_hash));
 		BUG_ON(!hlist_unhashed(&this->u_rhash));
+		list_del(&this->u_list);
 		list_del(&this->u_unions);
 		this->u_lower.dentry->d_union_lower_count--;
 		spin_unlock(&union_lock);
@@ -383,6 +386,66 @@ repeat:
 }
 
 /*
+ * Unhash, unlink, and release every union_dir belonging to this vfsmount
+ * except the one for the mount root (detach_mnt_union() handles that).
+ */
+void shrink_mnt_unions(struct vfsmount *mnt)
+{
+	struct union_dir *this, *next;
+
+repeat:
+	spin_lock(&union_lock);
+	list_for_each_entry_safe(this, next, &mnt->mnt_unions, u_list) {
+		if (this->u_upper.dentry == mnt->mnt_root)
+			continue;
+		__union_unhash(this);
+		list_del(&this->u_list);
+		list_del(&this->u_unions);
+		this->u_lower.dentry->d_union_lower_count--;
+		spin_unlock(&union_lock);
+		union_put(this);
+		goto repeat;
+	}
+	spin_unlock(&union_lock);
+}
+
+int attach_mnt_union(struct vfsmount *upper_mnt, struct vfsmount *lower_mnt)
+{
+	struct path upper, lower;
+	if (!IS_MNT_UNION(upper_mnt))
+		return 0;
+
+	/* Make a union of the root dirs of the upper and lower mounts */
+	upper.mnt = upper_mnt;
+	upper.dentry = upper_mnt->mnt_root;
+
+	lower.mnt = lower_mnt;
+	lower.dentry = lower_mnt->mnt_root;
+
+	return append_to_union(&upper, &lower);
+}
+
+void detach_mnt_union(struct vfsmount *mnt)
+{
+	struct union_dir *ud;
+
+	if (!IS_MNT_UNION(mnt))
+		return;
+
+	shrink_mnt_unions(mnt);
+
+	spin_lock(&union_lock);
+	ud = union_cache_lookup(mnt->mnt_root, mnt);
+	__union_unhash(ud);
+	list_del(&ud->u_list);
+	list_del(&ud->u_unions);
+	ud->u_lower.dentry->d_union_lower_count--;
+	spin_unlock(&union_lock);
+	union_put(ud);
+	return;
+}
+
+/*
  * union_create_topmost_dir - Create a matching dir in the topmost file system
  */
 
diff --git a/include/linux/union.h b/include/linux/union.h
index 24608b2..1aaaa38 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -49,6 +49,8 @@ extern void __d_drop_unions(struct dentry *);
 extern void shrink_d_unions(struct dentry *);
 extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
 						struct path *);
+extern int attach_mnt_union(struct vfsmount *, struct vfsmount *);
+extern void detach_mnt_union(struct vfsmount *);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -60,6 +62,8 @@ extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
 #define __d_drop_unions(x)		do { } while (0)
 #define shrink_d_unions(x)		do { } while (0)
 #define union_create_topmost_dir(x, y, z)	({ BUG(); (NULL); })
+#define attach_mnt_union(x, y)		do { } while (0)
+#define detach_mnt_union(x)		do { } while (0)
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.3.3



* [PATCH 23/39] union-mount: Call do_whiteout() on unlink and rmdir in unions
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (21 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 22/39] union-mount: Support for mounting union mount file systems Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 24/39] union-mount: Copy up directory entries on first readdir() Valerie Aurora
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Call do_whiteout() when removing files and directories from a union
mounted file system.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
 fs/namei.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a72187b..a02b118 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2680,6 +2680,10 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
+	if (IS_UNIONED_DIR(&nd.path)) {
+		error = do_whiteout(&nd, &path, 1);
+		goto exit4;
+	}
 	error = security_path_rmdir(&nd.path, path.dentry);
 	if (error)
 		goto exit4;
@@ -2769,6 +2773,10 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
+		if (IS_UNIONED_DIR(&nd.path)) {
+			error = do_whiteout(&nd, &path, 0);
+			goto exit3;
+		}
 		error = security_path_unlink(&nd.path, path.dentry);
 		if (error)
 			goto exit3;
-- 
1.6.3.3



* [PATCH 24/39] union-mount: Copy up directory entries on first readdir()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (22 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 23/39] union-mount: Call do_whiteout() on unlink and rmdir in unions Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 25/39] VFS: Split inode_permission() and create path_permission() Valerie Aurora
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora, Felix Fietkau

readdir() in union mounts is implemented by copying up all visible
directory entries from the lower level directories to the topmost
directory.  Directory entries that refer to lower level file system
objects are marked as "fallthru" in the topmost directory.

Thanks to Felix Fietkau <nbd@openwrt.org> for a bug fix.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
---
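Note (not part of the patch): a small userspace sketch of the copy-up
rule described in the commit message, assuming directories are
modelled as fixed-size arrays of names.  dir_model, model_find() and
model_copyup_dir() are hypothetical names.  Unlike the kernel code,
which sets S_OPAQUE before reading the lower directories, this model
marks the directory opaque only after a successful pass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 16

enum kind { KIND_FILE, KIND_WHITEOUT, KIND_FALLTHRU };

struct dirent_model {
	const char *name;
	enum kind kind;
};

/* Hypothetical model of the topmost directory. */
struct dir_model {
	struct dirent_model ents[MODEL_MAX];
	size_t n;
	bool opaque;	/* set once lower entries have been copied up */
};

static bool is_dot_or_dotdot(const char *name)
{
	return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

static struct dirent_model *model_find(struct dir_model *d, const char *name)
{
	for (size_t i = 0; i < d->n; i++)
		if (strcmp(d->ents[i].name, name) == 0)
			return &d->ents[i];
	return NULL;
}

/*
 * Add a fallthru entry to @top for every lower name that the top
 * directory does not already contain.  Any existing top entry (file,
 * whiteout or fallthru) masks the lower one.  Returns 0 on success,
 * -1 if the model directory is full.
 */
static int model_copyup_dir(struct dir_model *top,
			    const char *const *lower, size_t nlower)
{
	if (top->opaque)
		return 0;	/* already copied up */

	for (size_t i = 0; i < nlower; i++) {
		if (is_dot_or_dotdot(lower[i]) || model_find(top, lower[i]))
			continue;
		if (top->n == MODEL_MAX)
			return -1;
		top->ents[top->n].name = lower[i];
		top->ents[top->n].kind = KIND_FALLTHRU;
		top->n++;
	}
	top->opaque = true;
	return 0;
}
```

After the first call the directory is opaque, so later calls return
immediately and readdir() only has to walk the topmost directory.
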
 fs/readdir.c          |    9 +++
 fs/union.c            |  157 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/union.h |    2 +
 3 files changed, 168 insertions(+), 0 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index 3a48491..da71515 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -16,6 +16,8 @@
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/unistd.h>
+#include <linux/union.h>
+#include <linux/mount.h>
 
 #include <asm/uaccess.h>
 
@@ -36,9 +38,16 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
 
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
+		if (IS_UNIONED_DIR(&file->f_path) && !IS_OPAQUE(inode)) {
+			res = union_copyup_dir(&file->f_path);
+			if (res)
+				goto out_unlock;
+		}
+
 		res = file->f_op->readdir(file, buf, filler);
 		file_accessed(file);
 	}
+out_unlock:
 	mutex_unlock(&inode->i_mutex);
 out:
 	return res;
diff --git a/fs/union.c b/fs/union.c
index ee831a8..7e05698 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2007-2009 Novell Inc.
  *
  *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *              Valerie Aurora <vaurora@redhat.com>
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the Free
@@ -23,6 +24,8 @@
 #include <linux/slab.h>
 #include <linux/union.h>
 #include <linux/namei.h>
+#include <linux/file.h>
+#include <linux/security.h>
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -473,3 +476,157 @@ out:
 	mnt_drop_write(parent->mnt);
 	return dentry;
 }
+
+/**
+ * union_copyup_dir_one - copy up a single directory entry
+ *
+ * Individual directory entry copyup function for union_copyup_dir.
+ * We get the entries from higher level layers first.
+ */
+
+static int union_copyup_dir_one(void *buf, const char *name, int namlen,
+				loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct dentry *topmost_dentry = (struct dentry *) buf;
+	struct dentry *dentry;
+	int err = 0;
+
+	switch (namlen) {
+	case 2:
+		if (name[1] != '.')
+			break;
+	case 1:
+		if (name[0] != '.')
+			break;
+		return 0;
+	}
+
+	/* Lookup this entry in the topmost directory */
+	dentry = lookup_one_len(name, topmost_dentry, namlen);
+
+	if (IS_ERR(dentry)) {
+		printk(KERN_WARNING "%s: error looking up %.*s\n", __func__,
+		       namlen, name);
+		err = PTR_ERR(dentry);
+		goto out;
+	}
+
+	/*
+	 * If the entry already exists, one of the following is true:
+	 * it was already copied up (due to an earlier lookup), an
+	 * entry with the same name already exists on the topmost file
+	 * system, it is a whiteout, or it is a fallthru.  In each
+	 * case, the top level entry masks any entries from lower file
+	 * systems, so don't copy up this entry.
+	 */
+	if (dentry->d_inode || d_is_whiteout(dentry) || d_is_fallthru(dentry))
+		goto out_dput;
+
+	/*
+	 * If the entry doesn't exist, create a fallthru entry in the
+	 * topmost file system.  All possible directory types are
+	 * used, so each file system must implement its own way of
+	 * storing a fallthru entry.
+	 */
+	err = topmost_dentry->d_inode->i_op->fallthru(topmost_dentry->d_inode,
+						      dentry);
+out_dput:
+	dput(dentry);
+out:
+	return err;
+}
+
+/**
+ * union_copyup_dir - copy up low-level directory entries to topmost dir
+ *
+ * readdir() is difficult to support on union file systems for two
+ * reasons: We must eliminate duplicates and apply whiteouts, and we
+ * must return something in f_pos that lets us restart in the same
+ * place when we return.  Our solution is to, on first readdir() of
+ * the directory, copy up all visible entries from the low-level file
+ * systems and mark the entries that refer to low-level file system
+ * objects as "fallthru" entries.
+ *
+ * Locking strategy: We hold the topmost dir's i_mutex on entry.  We
+ * grab the i_mutex on lower directories one by one.  So the locking
+ * order is:
+ *
+ * Writable/topmost layers > Read-only/lower layers
+ *
+ * So there is no problem with lock ordering for union stacks with
+ * multiple lower layers.  E.g.:
+ *
+ * (topmost) A->B->C (bottom)
+ * (topmost) D->C->B (bottom)
+ *
+ * (Not that we support more than two layers at the moment.)
+ */
+
+int union_copyup_dir(struct path *topmost_path)
+{
+	struct dentry *topmost_dentry = topmost_path->dentry;
+	struct path path = *topmost_path;
+	int res = 0;
+
+	BUG_ON(IS_OPAQUE(topmost_dentry->d_inode));
+
+	res = mnt_want_write(topmost_path->mnt);
+	if (res)
+		return res;
+	/*
+	 * Mark this dir opaque to show that we have already copied up
+	 * the lower entries.  Only fallthru entries pass through to
+	 * the underlying file system.
+	 */
+	topmost_dentry->d_inode->i_flags |= S_OPAQUE;
+	mark_inode_dirty(topmost_dentry->d_inode);
+
+	path_get(&path);
+	while (union_down_one(&path.mnt, &path.dentry)) {
+		struct file * ftmp;
+		struct inode * inode;
+
+		/* dentry_open() doesn't get a path reference itself */
+		path_get(&path);
+		ftmp = dentry_open(path.dentry, path.mnt,
+				   O_RDONLY | O_DIRECTORY | O_NOATIME,
+				   current_cred());
+		if (IS_ERR(ftmp)) {
+			printk (KERN_ERR "unable to open dir %s for "
+				"directory copyup: %ld\n",
+				path.dentry->d_name.name, PTR_ERR(ftmp));
+			path_put(&path);
+			continue;
+		}
+
+		inode = path.dentry->d_inode;
+		mutex_lock(&inode->i_mutex);
+
+		res = -ENOENT;
+		if (IS_DEADDIR(inode))
+			goto out_fput;
+		/*
+		 * Read the whole directory, calling our directory
+		 * entry copyup function on each entry.  Pass in the
+		 * topmost dentry as our private data so we can create
+		 * new entries in the topmost directory.
+		 */
+		res = ftmp->f_op->readdir(ftmp, topmost_dentry,
+					  union_copyup_dir_one);
+out_fput:
+		mutex_unlock(&inode->i_mutex);
+		fput(ftmp);
+
+		if (res)
+			break;
+
+		/* XXX Should process directories below an opaque
+		 * directory in case there are fallthrus in it */
+		if (IS_OPAQUE(path.dentry->d_inode))
+			break;
+
+	}
+	path_put(&path);
+	mnt_drop_write(topmost_path->mnt);
+	return res;
+}
diff --git a/include/linux/union.h b/include/linux/union.h
index 1aaaa38..fdd46f6 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -51,6 +51,7 @@ extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
 						struct path *);
 extern int attach_mnt_union(struct vfsmount *, struct vfsmount *);
 extern void detach_mnt_union(struct vfsmount *);
+extern int union_copyup_dir(struct path *);
 
 #else /* CONFIG_UNION_MOUNT */
 
@@ -64,6 +65,7 @@ extern void detach_mnt_union(struct vfsmount *);
 #define union_create_topmost_dir(x, y, z)	({ BUG(); (NULL); })
 #define attach_mnt_union(x, y)		do { } while (0)
 #define detach_mnt_union(x)		do { } while (0)
+#define union_copyup_dir(x)		({ BUG(); (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.3.3



* [PATCH 25/39] VFS: Split inode_permission() and create path_permission()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (23 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 24/39] union-mount: Copy up directory entries on first readdir() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 26/39] VFS: Create user_path_nd() to lookup both parent and target Valerie Aurora
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Split inode_permission() into inode and file-system-dependent parts.
Create path_permission() to check permission based on the path to the
inode.  This is for union mounts, in which an inode can be located on
a read-only lower layer file system but is still writable, since we
will copy it up to the writable top layer file system.  So in that
case, we want to ignore MS_RDONLY on the lower layer.  To make this
decision, we must know the path (vfsmount, dentry) of both the target
and its parent.
---
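Note (not part of the patch): a userspace sketch of the decision
path_permission() makes, assuming each mount is reduced to two flags.
mnt_model, model_sb_permission() and model_path_permission() are
hypothetical names.  The file-type test and the inode-level checks
done by __inode_permission() are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a mount: only the two flags that matter here. */
struct mnt_model {
	bool read_only;	/* superblock is mounted read-only */
	bool is_union;	/* mount is the top of a union */
};

/* File-system-wide check: nobody writes to a read-only file system. */
static int model_sb_permission(const struct mnt_model *mnt, bool want_write)
{
	if (want_write && mnt->read_only)
		return -EROFS;
	return 0;
}

/*
 * If the parent directory is in a union mount, the target will be
 * copied up to the parent's (topmost) file system before it is
 * written, so the parent's read-only state is the one that counts.
 */
static int model_path_permission(const struct mnt_model *target_mnt,
				 const struct mnt_model *parent_mnt,
				 bool want_write)
{
	const struct mnt_model *mnt;

	mnt = parent_mnt->is_union ? parent_mnt : target_mnt;
	return model_sb_permission(mnt, want_write);
}
```

With a writable union top and a read-only lower layer, a write check
on a lower-layer file succeeds; outside a union the same file yields
-EROFS.
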
 fs/namei.c         |   92 ++++++++++++++++++++++++++++++++++++++++++++--------
 include/linux/fs.h |    1 +
 2 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a02b118..92a4ff2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -241,29 +241,20 @@ int generic_permission(struct inode *inode, int mask,
 }
 
 /**
- * inode_permission  -  check for access rights to a given inode
+ * __inode_permission  -  check for access rights to a given inode
  * @inode:	inode to check permission on
  * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
  * Used to check for read/write/execute permissions on an inode.
- * We use "fsuid" for this, letting us set arbitrary permissions
- * for filesystem access without changing the "normal" uids which
- * are used for other things.
+ *
+ * This does not check for a read-only file system.  You probably want
+ * inode_permission().
  */
-int inode_permission(struct inode *inode, int mask)
+static int __inode_permission(struct inode *inode, int mask)
 {
 	int retval;
 
 	if (mask & MAY_WRITE) {
-		umode_t mode = inode->i_mode;
-
-		/*
-		 * Nobody gets write access to a read-only fs.
-		 */
-		if (IS_RDONLY(inode) &&
-		    (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
-			return -EROFS;
-
 		/*
 		 * Nobody gets write access to an immutable file.
 		 */
@@ -288,6 +279,79 @@ int inode_permission(struct inode *inode, int mask)
 }
 
 /**
+ * sb_permission  -  check superblock-level permissions
+ * @sb: superblock of inode to check permission on
+ * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Separate out file-system wide checks from inode-specific permission
+ * checks.  In particular, union mounts want to check the read-only
+ * status of the top-level file system, not the lower.
+ */
+int sb_permission(struct super_block *sb, struct inode *inode, int mask)
+{
+	if (mask & MAY_WRITE) {
+		umode_t mode = inode->i_mode;
+
+		/*
+		 * Nobody gets write access to a read-only fs.
+		 */
+		if ((sb->s_flags & MS_RDONLY) &&
+		    (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
+			return -EROFS;
+	}
+	return 0;
+}
+
+/**
+ * inode_permission  -  check for access rights to a given inode
+ * @inode:	inode to check permission on
+ * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Used to check for read/write/execute permissions on an inode.
+ * We use "fsuid" for this, letting us set arbitrary permissions
+ * for filesystem access without changing the "normal" uids which
+ * are used for other things.
+ */
+int inode_permission(struct inode *inode, int mask)
+{
+	int retval;
+
+	retval = sb_permission(inode->i_sb, inode, mask);
+	if (retval)
+		return retval;
+	return __inode_permission(inode, mask);
+}
+
+/**
+ * path_permission - check for inode access rights depending on path
+ * @path: path of inode to check
+ * @parent_path: path of inode's parent
+ * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Like inode_permission, but used to check for permission when the
+ * file may potentially be copied up between union layers.
+ */
+
+int path_permission(struct path *path, struct path *parent_path, int mask)
+{
+	struct vfsmount *mnt;
+	int retval;
+
+	/* Catch some reversal of args */
+	BUG_ON(!S_ISDIR(parent_path->dentry->d_inode->i_mode));
+
+	if (IS_MNT_UNION(parent_path->mnt))
+		mnt = parent_path->mnt;
+	else
+		mnt = path->mnt;
+
+	retval = sb_permission(mnt->mnt_sb, path->dentry->d_inode, mask);
+	if (retval)
+		return retval;
+	return __inode_permission(path->dentry->d_inode, mask);
+}
+
+/**
  * file_permission  -  check for additional access rights to a given file
  * @file:	file to check access rights for
  * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dbd9881..3e2f8ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2114,6 +2114,7 @@ extern sector_t bmap(struct inode *, sector_t);
 #endif
 extern int notify_change(struct dentry *, struct iattr *);
 extern int inode_permission(struct inode *, int);
+extern int path_permission(struct path *, struct path *, int);
 extern int generic_permission(struct inode *, int,
 		int (*check_acl)(struct inode *, int));
 
-- 
1.6.3.3



* [PATCH 26/39] VFS: Create user_path_nd() to lookup both parent and target
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (24 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 25/39] VFS: Split inode_permission() and create path_permission() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 27/39] union-mount: In-kernel copyup routines Valerie Aurora
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Proof-of-concept implementation of user_path_nd().  Look up both the
parent and the target of a user-supplied filename, so that both can be
passed later to the union copyup routines.
---
 fs/namei.c            |   31 +++++++++++++++++++++++++++++++
 include/linux/namei.h |    2 ++
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 92a4ff2..0c52042 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1571,6 +1571,37 @@ static int user_path_parent(int dfd, const char __user *path,
 	return error;
 }
 
+int user_path_nd(int dfd, const char __user *filename,
+			 unsigned flags, struct nameidata *parent_nd,
+			 struct path *child, char **tmp)
+{
+	struct nameidata child_nd;
+	char *s = getname(filename);
+	int error;
+
+	if (IS_ERR(s))
+		return PTR_ERR(s);
+
+	/* Lookup parent */
+	error = do_path_lookup(dfd, s, LOOKUP_PARENT, parent_nd);
+	if (error)
+		goto out_putname;
+
+	/* Lookup child - XXX optimize, racy */
+	error = do_path_lookup(dfd, s, flags, &child_nd);
+	if (error)
+		goto out_path_put;
+	*child = child_nd.path;
+	*tmp = s;
+	return 0;
+
+out_path_put:
+	path_put(&parent_nd->path);
+out_putname:
+	putname(s);
+	return error;
+}
+
 /*
  * It's inline, so penalty for filesystems that don't use sticky bit is
  * minimal.
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 05b441d..83dc8b5 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -58,6 +58,8 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_RENAME_TARGET	0x0800
 
 extern int user_path_at(int, const char __user *, unsigned, struct path *);
+extern int user_path_nd(int, const char __user *, unsigned,
+			struct nameidata *, struct path *, char **);
 
 #define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path)
 #define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path)
-- 
1.6.3.3



* [PATCH 27/39] union-mount: In-kernel copyup routines
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (25 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 26/39] VFS: Create user_path_nd() to lookup both parent and target Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-04  1:40   ` Valdis.Kletnieks
  2010-05-03 23:12 ` [PATCH 28/39] union-mount: In-kernel copyup of xattrs Valerie Aurora
                   ` (11 subsequent siblings)
  38 siblings, 1 reply; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

When a file on the read-only layer of a union mount is altered, it
must be copied up to the topmost read-write layer.  This patch creates
union_copyup() and its supporting routines.
---
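Note (not part of the patch): a userspace sketch of the data copy
step, assuming plain file descriptors and a read()/write() loop in
place of do_splice_direct().  model_copyup_data() is a hypothetical
name.  Like union_copyup_data(), it copies only the first len bytes
and does nothing when len is zero.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Copy the first @len bytes from @in_fd to @out_fd.
 * Returns 0 on success or a negative errno value.
 */
static int model_copyup_data(int in_fd, int out_fd, size_t len)
{
	char buf[4096];

	while (len > 0) {
		size_t want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t got = read(in_fd, buf, want);

		if (got < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (got == 0)
			return -EIO;	/* source ended before len bytes */

		/* write() may be short, so loop until the chunk is out */
		for (ssize_t off = 0; off < got; ) {
			ssize_t put = write(out_fd, buf + off,
					    (size_t)(got - off));

			if (put < 0) {
				if (errno == EINTR)
					continue;
				return -errno;
			}
			off += put;
		}
		len -= (size_t)got;
	}
	return 0;
}
```

The return value is defined on every path, including the case where
all requested bytes were copied.
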
 fs/union.c            |  244 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/union.h |    7 +-
 2 files changed, 250 insertions(+), 1 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index 7e05698..d52c7c0 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -26,6 +26,7 @@
 #include <linux/namei.h>
 #include <linux/file.h>
 #include <linux/security.h>
+#include <linux/splice.h>
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -630,3 +631,246 @@ out_fput:
 	mnt_drop_write(topmost_path->mnt);
 	return res;
 }
+
+/**
+ * union_create_file
+ *
+ * @nd: nameidata for source file
+ * @old: path of the source file
+ * @new: negative dentry for the new file
+ *
+ * Must already have mnt_want_write() on the mnt and the parent's
+ * i_mutex.
+ */
+
+static int union_create_file(struct nameidata *nd, struct path *old,
+			     struct dentry *new)
+{
+	struct path *parent = &nd->path;
+	BUG_ON(!mutex_is_locked(&parent->dentry->d_inode->i_mutex));
+
+	return vfs_create(parent->dentry->d_inode, new,
+			  old->dentry->d_inode->i_mode, nd);
+}
+
+/**
+ * union_create_symlink
+ *
+ * @nd: nameidata for source symlink
+ * @old: path of the source symlink
+ * @new: negative dentry for the new symlink
+ *
+ * Must already have mnt_want_write() on the mnt and the parent's
+ * i_mutex.
+ */
+
+static int union_create_symlink(struct nameidata *nd, struct path *old,
+				struct dentry *new)
+{
+	void *cookie;
+	int error;
+
+	BUG_ON(!mutex_is_locked(&nd->path.dentry->d_inode->i_mutex));
+	/*
+	 * We want the contents of this symlink, not to follow it, so
+	 * this is modeled on generic_readlink() rather than
+	 * do_follow_link().
+	 */
+	nd->depth = 0;
+	cookie = old->dentry->d_inode->i_op->follow_link(old->dentry, nd);
+	if (IS_ERR(cookie))
+		return PTR_ERR(cookie);
+	/* Create a copy of the link on the top layer */
+	error = vfs_symlink(nd->path.dentry->d_inode, new,
+			    nd_get_link(nd));
+	if (old->dentry->d_inode->i_op->put_link)
+		old->dentry->d_inode->i_op->put_link(old->dentry, nd, cookie);
+	return error;
+}
+
+/**
+ * union_copyup_data - Copy up len bytes of old's data to new
+ *
+ * @old: source file
+ * @new: target file
+ * @len: number of bytes to copy
+ */
+
+static int union_copyup_data(struct path *old, struct vfsmount *new_mnt,
+			     struct dentry *new_dentry, size_t len)
+{
+	struct file *old_file;
+	struct file *new_file;
+	const struct cred *cred = current_cred();
+	loff_t offset = 0;
+	long bytes;
+	int error = 0;
+
+	if (len == 0)
+		return 0;
+
+	/* Get reference to balance later fput() */
+	path_get(old);
+	old_file = dentry_open(old->dentry, old->mnt, O_RDONLY, cred);
+	if (IS_ERR(old_file))
+		return PTR_ERR(old_file);
+
+	dget(new_dentry);
+	mntget(new_mnt);
+	new_file = dentry_open(new_dentry, new_mnt, O_WRONLY, cred);
+	if (IS_ERR(new_file)) {
+		error = PTR_ERR(new_file);
+		goto out_fput;
+	}
+
+	bytes = do_splice_direct(old_file, &offset, new_file, len,
+				 SPLICE_F_MOVE);
+	if (bytes < 0)
+		error = bytes;
+
+	fput(new_file);
+out_fput:
+	fput(old_file);
+	return error;
+}
+
+/**
+ * __union_copyup_len - Copy up a file and len bytes of data
+ *
+ * @nd: nameidata for the parent directory
+ * @path: path of file to be copied up
+ * @len: number of bytes of file data to copy up
+ *
+ * Parent's i_mutex must be held by caller.  Newly copied up path is
+ * returned in @path and original is path_put().
+ */
+
+static int __union_copyup_len(struct nameidata *nd, struct path *path,
+			      size_t len)
+{
+	struct path *parent = &nd->path;
+	struct dentry *dentry;
+	int error;
+
+	BUG_ON(!mutex_is_locked(&parent->dentry->d_inode->i_mutex));
+
+	dentry = lookup_one_len(path->dentry->d_name.name, parent->dentry,
+				path->dentry->d_name.len);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	if (dentry->d_inode) {
+		/*
+		 * We raced with someone else and "lost."  That's
+		 * okay, they did all the work of copying up the file.
+		 * Note that currently data copyup happens under the
+		 * parent dir's i_mutex.  If we move it outside that,
+		 * we'll need some way of waiting for the data copyup
+		 * to complete here.
+		 */
+		error = 0;
+		goto out_newpath;
+	}
+	if (S_ISREG(path->dentry->d_inode->i_mode)) {
+		/* Create file */
+		error = union_create_file(nd, path, dentry);
+		if (error)
+			goto out_dput;
+		/* Copyup data */
+		error = union_copyup_data(path, parent->mnt, dentry, len);
+	} else {
+		BUG_ON(!S_ISLNK(path->dentry->d_inode->i_mode));
+		error = union_create_symlink(nd, path, dentry);
+	}
+	if (error) {
+		/* Most likely error: ENOSPC */
+		vfs_unlink(parent->dentry->d_inode, dentry);
+		goto out_dput;
+	}
+	/* XXX Copyup xattrs and any other dangly bits */
+out_newpath:
+	/* path_put() of original must happen before we copy in new */
+	path_put(path);
+	path->dentry = dentry;
+	path->mnt = mntget(parent->mnt);
+	return error;
+out_dput:
+	/* Don't path_put(path), let caller unwind */
+	dput(dentry);
+	return error;
+}
+
+/**
+ * do_union_copyup_len - Copy up a file given its path (and its parent's)
+ *
+ * @nd: nameidata for the parent directory
+ * @path: path of file to be copied up; updated to the new copy's path
+ *        if a copyup takes place
+ * @copy_all: if set, copy all of the file's data and ignore @len
+ * @len: if @copy_all is not set, number of bytes of file data to copy up
+ */
+
+int do_union_copyup_len(struct nameidata *nd, struct path *path, int copy_all,
+			size_t len)
+{
+	struct path *parent = &nd->path;
+	int error;
+
+	if (!IS_UNIONED_DIR(parent))
+		return 0;
+	if (parent->mnt == path->mnt)
+		return 0;
+	if (!S_ISREG(path->dentry->d_inode->i_mode) &&
+	    !S_ISLNK(path->dentry->d_inode->i_mode))
+		return 0;
+
+	BUG_ON(!S_ISDIR(parent->dentry->d_inode->i_mode));
+
+	mutex_lock(&parent->dentry->d_inode->i_mutex);
+	error = -ENOENT;
+	if (IS_DEADDIR(parent->dentry->d_inode))
+		goto out_unlock;
+
+	if (copy_all && S_ISREG(path->dentry->d_inode->i_mode)) {
+		error = -EFBIG;
+		len = i_size_read(path->dentry->d_inode);
+		if (((size_t)len != len) || ((ssize_t)len != len))
+			goto out_unlock;
+	}
+
+	error = __union_copyup_len(nd, path, len);
+
+out_unlock:
+	mutex_unlock(&parent->dentry->d_inode->i_mutex);
+	return error;
+}
+
+/*
+ * Helper function to copy up all of a file
+ */
+int union_copyup(struct nameidata *nd, struct path *path)
+{
+	return do_union_copyup_len(nd, path, 1, 0);
+}
+
+/*
+ * Unlocked helper function to copy up all of a file
+ */
+int __union_copyup(struct nameidata *nd, struct path *path)
+{
+	size_t len;
+	len = i_size_read(path->dentry->d_inode);
+	if (((size_t)len != len) || ((ssize_t)len != len))
+		return -EFBIG;
+
+	return __union_copyup_len(nd, path, len);
+}
+
+/*
+ * Helper function to copy up part of a file
+ */
+int union_copyup_len(struct nameidata *nd, struct path *path, size_t len)
+{
+	return do_union_copyup_len(nd, path, 0, len);
+}
+
diff --git a/include/linux/union.h b/include/linux/union.h
index fdd46f6..fbfea1d 100644
--- a/include/linux/union.h
+++ b/include/linux/union.h
@@ -52,7 +52,9 @@ extern struct dentry * union_create_topmost_dir(struct path *, struct qstr *,
 extern int attach_mnt_union(struct vfsmount *, struct vfsmount *);
 extern void detach_mnt_union(struct vfsmount *);
 extern int union_copyup_dir(struct path *);
-
+extern int union_copyup(struct nameidata *, struct path *);
+extern int __union_copyup(struct nameidata *, struct path *);
+extern int union_copyup_len(struct nameidata *, struct path *, size_t len);
 #else /* CONFIG_UNION_MOUNT */
 
 #define IS_MNT_UNION(x)			(0)
@@ -66,6 +68,9 @@ extern int union_copyup_dir(struct path *);
 #define attach_mnt_union(x, y)		do { } while (0)
 #define detach_mnt_union(x)		do { } while (0)
 #define union_copyup_dir(x)		({ BUG(); (0); })
+#define union_copyup(x, y)		({ BUG(); (0); })
+#define __union_copyup(x, y)		({ BUG(); (0); })
+#define union_copyup_len(x, y, z)	({ BUG(); (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.6.3.3



* [PATCH 28/39] union-mount: In-kernel copyup of xattrs
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (26 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 27/39] union-mount: In-kernel copyup routines Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 29/39] union-mount: Implement union-aware access()/faccessat() Valerie Aurora
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Copy up extended attributes as well as file data.
---
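Note (not part of the patch): a userspace sketch of how a listxattr()
style buffer is walked.  The buffer holds attribute names back to
back, each terminated by a NUL byte.  model_for_each_xattr_name() and
model_count_name() are hypothetical names.  Unlike the loop in
union_copyup_xattr(), the sketch refuses to read past the end of the
buffer if the last name is not terminated.

```c
#include <stddef.h>
#include <string.h>

/* Callback invoked once per attribute name; non-zero stops the walk. */
typedef int (*xattr_name_fn)(const char *name, void *ctx);

/*
 * Walk a buffer of NUL-terminated names of total size @list_size.
 * Returns 0 on success, -1 if the final name is unterminated, or the
 * first non-zero value returned by @fn.
 */
static int model_for_each_xattr_name(const char *list, size_t list_size,
				     xattr_name_fn fn, void *ctx)
{
	const char *name = list;
	const char *end = list + list_size;

	while (name < end) {
		const char *nul = memchr(name, '\0', (size_t)(end - name));
		int err;

		if (!nul)
			return -1;	/* no terminator inside the buffer */
		err = fn(name, ctx);
		if (err)
			return err;
		name = nul + 1;
	}
	return 0;
}

/* Example callback: count the names seen so far. */
static int model_count_name(const char *name, void *ctx)
{
	(void)name;
	++*(size_t *)ctx;
	return 0;
}
```

A real copy routine would replace the counting callback with one that
reads each value from the source and sets it on the target.
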
 fs/union.c |   74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index d52c7c0..abc964a 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -27,6 +27,7 @@
 #include <linux/file.h>
 #include <linux/security.h>
 #include <linux/splice.h>
+#include <linux/xattr.h>
 
 /*
  * This is borrowed from fs/inode.c. The hashtable for lookups. Somebody
@@ -449,6 +450,72 @@ void detach_mnt_union(struct vfsmount *mnt)
 	return;
 }
 
+/**
+ * union_copyup_xattr
+ *
+ * @old: dentry of original file
+ * @new: dentry of new copy
+ *
+ * Copy up extended attributes from the original file to the new one.
+ *
+ * XXX - Permissions?  For now, copying up every xattr.
+ */
+
+static int union_copyup_xattr(struct dentry *old, struct dentry *new)
+{
+	ssize_t list_size, size;
+	char *buf, *name, *value;
+	int error;
+
+	/* Check for xattr support */
+	if (!old->d_inode->i_op->getxattr ||
+	    !new->d_inode->i_op->getxattr)
+		return 0;
+
+	/* Find out how big the list of xattrs is */
+	list_size = vfs_listxattr(old, NULL, 0);
+	if (list_size <= 0)
+		return list_size;
+
+	/* Allocate memory for the list */
+	buf = kzalloc(list_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* Allocate memory for the xattr's value */
+	error = -ENOMEM;
+	value = kmalloc(XATTR_SIZE_MAX, GFP_KERNEL);
+	if (!value)
+		goto out;
+
+	/* Actually get the list of xattrs */
+	list_size = vfs_listxattr(old, buf, list_size);
+	if (list_size <= 0) {
+		error = list_size;
+		goto out_free_value;
+	}
+
+	for (name = buf; name < (buf + list_size); name += strlen(name) + 1) {
+		/* XXX Locking? old is on read-only fs */
+		size = vfs_getxattr(old, name, value, XATTR_SIZE_MAX);
+		if (size <= 0) {
+			error = size;
+			goto out_free_value;
+		}
+		/* XXX do we really need to check for size overflow? */
+		/* XXX locks new dentry, lock ordering problems? */
+		error = vfs_setxattr(new, name, value, size, 0);
+		if (error)
+			goto out_free_value;
+	}
+
+out_free_value:
+	kfree(value);
+out:
+	kfree(buf);
+	return error;
+}
+
 /*
  * union_create_topmost_dir - Create a matching dir in the topmost file system
  */
@@ -473,6 +540,10 @@ struct dentry * union_create_topmost_dir(struct path *parent, struct qstr *name,
 		dput(dentry);
 		goto out;
 	}
+
+	res = union_copyup_xattr(lower->dentry, dentry);
+	if (res)
+		dput(dentry);
 out:
 	mnt_drop_write(parent->mnt);
 	return dentry;
@@ -788,6 +859,9 @@ static int __union_copyup_len(struct nameidata *nd, struct path *path,
 		goto out_dput;
 	}
 	/* XXX Copyup xattrs and any other dangly bits */
+	error = union_copyup_xattr(path->dentry, dentry);
+	if (error)
+		goto out_dput;
 out_newpath:
 	/* path_put() of original must happen before we copy in new */
 	path_put(path);
-- 
1.6.3.3



* [PATCH 29/39] union-mount: Implement union-aware access()/faccessat()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (27 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 28/39] union-mount: In-kernel copyup of xattrs Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 30/39] union-mount: Implement union-aware link() Valerie Aurora
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

For union mounts, an access check on a file located on the lower layer
incorrectly returns EROFS.  To fix this, use the new path_permission()
call, which ignores a read-only lower layer file system if the target
will be copied up to the topmost file system.
---
 fs/open.c |   21 +++++++++++++++++----
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 74e5cd9..cb39b9d 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -30,6 +30,7 @@
 #include <linux/falloc.h>
 #include <linux/fs_struct.h>
 #include <linux/ima.h>
+#include <linux/union.h>
 
 #include "internal.h"
 
@@ -454,7 +455,10 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 	const struct cred *old_cred;
 	struct cred *override_cred;
 	struct path path;
+	struct nameidata nd;
+	struct vfsmount *mnt;
 	struct inode *inode;
+	char *tmp;
 	int res;
 
 	if (mode & ~S_IRWXO)	/* where's F_OK, X_OK, W_OK, R_OK? */
@@ -478,10 +482,17 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 
 	old_cred = override_creds(override_cred);
 
-	res = user_path_at(dfd, filename, LOOKUP_FOLLOW, &path);
+	res = user_path_nd(dfd, filename, LOOKUP_FOLLOW,
+				   &nd, &path, &tmp);
 	if (res)
 		goto out;
 
+	/* For union mounts, use the topmost mnt's permissions */
+	if (IS_UNIONED_DIR(&nd.path))
+		mnt = nd.path.mnt;
+	else
+		mnt = path.mnt;
+
 	inode = path.dentry->d_inode;
 
 	if ((mode & MAY_EXEC) && S_ISREG(inode->i_mode)) {
@@ -490,11 +501,11 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 		 * with the "noexec" flag.
 		 */
 		res = -EACCES;
-		if (path.mnt->mnt_flags & MNT_NOEXEC)
+		if (mnt->mnt_flags & MNT_NOEXEC)
 			goto out_path_release;
 	}
 
-	res = inode_permission(inode, mode | MAY_ACCESS);
+	res = path_permission(&path, &nd.path, mode | MAY_ACCESS);
 	/* SuS v2 requires we report a read only fs too */
 	if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
 		goto out_path_release;
@@ -508,11 +519,13 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 	 * inherently racy and know that the fs may change
 	 * state before we even see this result.
 	 */
-	if (__mnt_is_readonly(path.mnt))
+	if (__mnt_is_readonly(mnt))
 		res = -EROFS;
 
 out_path_release:
 	path_put(&path);
+	path_put(&nd.path);
+	putname(tmp);
 out:
 	revert_creds(old_cred);
 	put_cred(override_cred);
-- 
1.6.3.3



* [PATCH 30/39] union-mount: Implement union-aware link()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (28 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 29/39] union-mount: Implement union-aware access()/faccessat() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 31/39] union-mount: Implement union-aware rename() Valerie Aurora
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

---
 fs/namei.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0c52042..d85d7f1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3029,16 +3029,18 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 {
 	struct dentry *new_dentry;
 	struct nameidata nd;
+	struct nameidata old_nd;
 	struct path old_path;
 	int error;
 	char *to;
+	char *from;
 
 	if ((flags & ~AT_SYMLINK_FOLLOW) != 0)
 		return -EINVAL;
 
-	error = user_path_at(olddfd, oldname,
+	error = user_path_nd(olddfd, oldname,
 			     flags & AT_SYMLINK_FOLLOW ? LOOKUP_FOLLOW : 0,
-			     &old_path);
+			     &old_nd, &old_path, &from);
 	if (error)
 		return error;
 
@@ -3046,8 +3048,20 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	if (error)
 		goto out;
 	error = -EXDEV;
-	if (old_path.mnt != nd.path.mnt)
-		goto out_release;
+	if (old_path.mnt != nd.path.mnt) {
+		if (IS_UNIONED_DIR(&old_nd.path) &&
+		    (old_nd.path.mnt == nd.path.mnt)) {
+			error = mnt_want_write(old_nd.path.mnt);
+			if (error)
+				goto out_release;
+			error = union_copyup(&old_nd, &old_path);
+			mnt_drop_write(old_nd.path.mnt);
+			if (error)
+				goto out_release;
+		} else {
+			goto out_release;
+		}
+	}
 	new_dentry = lookup_create(&nd, 0);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
@@ -3070,6 +3084,8 @@ out_release:
 	putname(to);
 out:
 	path_put(&old_path);
+	path_put(&old_nd.path);
+	putname(from);
 
 	return error;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 31/39] union-mount: Implement union-aware rename()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (29 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 30/39] union-mount: Implement union-aware link() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 32/39] union-mount: Implement union-aware writable open() Valerie Aurora
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

On rename() of a file on union mount, copyup and whiteout the source
file.  Both are done under the rename mutex.  I believe this is
actually atomic.

XXX - May not need to do file copyup under the lock.
---
 fs/namei.c |   75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d85d7f1..b00ece9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3243,6 +3243,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 {
 	struct dentry *old_dir, *new_dir;
 	struct path old, new;
+	struct path to_whiteout = {NULL, NULL};
 	struct dentry *trap;
 	struct nameidata oldnd, newnd;
 	char *from;
@@ -3258,12 +3259,9 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 		goto exit1;
 
 	error = -EXDEV;
+	/* Union mounts will pass below test - dirs always on topmost */
 	if (oldnd.path.mnt != newnd.path.mnt)
 		goto exit2;
-	/* Rename on union mounts not implemented yet */
-	/* XXX much harsher check than necessary - can do some renames */
-	if (IS_UNIONED_DIR(&oldnd.path) || IS_UNIONED_DIR(&newnd.path))
-		goto exit2;
 	old_dir = oldnd.path.dentry;
 	error = -EBUSY;
 	if (oldnd.last_type != LAST_NORM)
@@ -3286,7 +3284,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	error = -ENOENT;
 	if (!old.dentry->d_inode)
 		goto exit4;
-	/* unless the source is a directory trailing slashes give -ENOTDIR */
+	/* unless the source is a directory, trailing slashes give -ENOTDIR */
 	if (!S_ISDIR(old.dentry->d_inode->i_mode)) {
 		error = -ENOTDIR;
 		if (oldnd.last.name[oldnd.last.len])
@@ -3298,6 +3296,11 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	error = -EINVAL;
 	if (old.dentry == trap)
 		goto exit4;
+	error = -EXDEV;
+	/* Can't rename a directory from a lower layer */
+	if (IS_UNIONED_DIR(&oldnd.path) &&
+	    IS_UNIONED_DIR(&old))
+		goto exit4;
 	error = lookup_hash(&newnd, &newnd.last, &new);
 	if (error)
 		goto exit4;
@@ -3305,6 +3308,48 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	error = -ENOTEMPTY;
 	if (new.dentry == trap)
 		goto exit5;
+	error = -EXDEV;
+	/* Can't rename over directories on the lower layer */
+	if (IS_UNIONED_DIR(&newnd.path) &&
+	    IS_UNIONED_DIR(&new))
+		goto exit4;
+
+	/* If source is on lower layer, copy up */
+	if (IS_UNIONED_DIR(&oldnd.path) &&
+	    (old.mnt != oldnd.path.mnt)) {
+		/* Save the lower path to avoid a second lookup for whiteout */
+		to_whiteout.dentry = dget(old.dentry);
+		to_whiteout.mnt = mntget(old.mnt);
+		error = __union_copyup(&oldnd, &old);
+		if (error)
+			goto exit5;
+	}
+
+	/* If target is on lower layer, get negative dentry for topmost */
+	if (IS_UNIONED_DIR(&newnd.path) &&
+	    (new.mnt != newnd.path.mnt)) {
+		struct dentry *dentry;
+		/*
+		 * At this point, source and target are both files,
+		 * the source is on the topmost layer, and the target
+		 * is on a lower layer.  We want the target dentry to
+		 * disappear from the namespace, and give vfs_rename a
+		 * negative dentry from the topmost layer.
+		 */
+		/* We already did lookup once, no need to check perm */
+		dentry = __lookup_hash(&newnd.last, newnd.path.dentry, &newnd);
+		if (IS_ERR(dentry)) {
+			error = PTR_ERR(dentry);
+			goto exit5;
+		}
+		/* We no longer need the lower target dentry.  It
+		 * definitely should be removed from the hash table */
+		/* XXX what about failure case? */
+		d_delete(new.dentry);
+		mntput(new.mnt);
+		new.mnt = mntget(newnd.path.mnt);
+		new.dentry = dentry;
+	}
 
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
@@ -3315,6 +3360,26 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 		goto exit6;
 	error = vfs_rename(old_dir->d_inode, old.dentry,
 				   new_dir->d_inode, new.dentry);
+	if (error)
+		goto exit6;
+	/* Now whiteout the source */
+	if (IS_UNIONED_DIR(&oldnd.path)) {
+		if (!to_whiteout.dentry) {
+			struct dentry *dentry;
+			/* We could have exposed a lower level entry */
+			dentry = __lookup_hash(&oldnd.last, oldnd.path.dentry, &oldnd);
+			if (IS_ERR(dentry)) {
+				error = PTR_ERR(dentry);
+				goto exit6;
+			}
+			to_whiteout.dentry = dentry;
+			to_whiteout.mnt = mntget(oldnd.path.mnt);
+		}
+
+		if (to_whiteout.dentry->d_inode)
+			error = do_whiteout(&oldnd, &to_whiteout, 0);
+		path_put(&to_whiteout);
+	}
 exit6:
 	mnt_drop_write(oldnd.path.mnt);
 exit5:
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 32/39] union-mount: Implement union-aware writable open()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (30 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 31/39] union-mount: Implement union-aware rename() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 33/39] union-mount: Implement union-aware chown() Valerie Aurora
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Copy up a file when opened with write permissions.  Does not copy up
the file data when O_TRUNC is specified.
---
 fs/namei.c |   28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b00ece9..78c9b87 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1937,6 +1937,24 @@ exit:
 	return ERR_PTR(error);
 }
 
+static int open_union_copyup(struct nameidata *nd, struct path *path,
+			     int open_flag)
+{
+	struct vfsmount *oldmnt = path->mnt;
+	int error;
+
+	if (open_flag & O_TRUNC)
+		error = union_copyup_len(nd, path, 0);
+	else
+		error = union_copyup(nd, path);
+	if (error)
+		return error;
+	if (oldmnt != path->mnt)
+		mntput(nd->path.mnt);
+
+	return error;
+}
+
 static struct file *do_last(struct nameidata *nd, struct path *path,
 			    int open_flag, int acc_mode,
 			    int mode, const char *pathname)
@@ -1988,6 +2006,11 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 			if (!path->dentry->d_inode->i_op->lookup)
 				goto exit_dput;
 		}
+		if (acc_mode & MAY_WRITE) {
+			error = open_union_copyup(nd, path, open_flag);
+			if (error)
+				goto exit_dput;
+		}
 		path_to_nameidata(path, nd);
 		audit_inode(pathname, nd->path.dentry);
 		goto ok;
@@ -2059,6 +2082,11 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 	if (path->dentry->d_inode->i_op->follow_link)
 		return NULL;
 
+	if (acc_mode & MAY_WRITE) {
+		error = open_union_copyup(nd, path, open_flag);
+		if (error)
+			goto exit_dput;
+	}
 	path_to_nameidata(path, nd);
 	error = -EISDIR;
 	if (S_ISDIR(path->dentry->d_inode->i_mode))
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 33/39] union-mount: Implement union-aware chown()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (31 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 32/39] union-mount: Implement union-aware writable open() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 34/39] union-mount: Implement union-aware truncate() Valerie Aurora
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

Proof-of-concept implementation of chown() for union mounts.
---
 fs/open.c |   23 ++++++++++++++++++++---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index cb39b9d..808d32a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -718,18 +718,35 @@ static int chown_common(struct path *path, uid_t user, gid_t group)
 SYSCALL_DEFINE3(chown, const char __user *, filename, uid_t, user, gid_t, group)
 {
 	struct path path;
+	struct nameidata nd;
+	struct vfsmount *mnt;
+	char *tmp;
 	int error;
 
-	error = user_path(filename, &path);
+	error = user_path_nd(AT_FDCWD, filename, LOOKUP_FOLLOW,
+				     &nd, &path, &tmp);
 	if (error)
 		goto out;
-	error = mnt_want_write(path.mnt);
+
+	if (IS_UNIONED_DIR(&nd.path))
+		mnt = nd.path.mnt;
+	else
+		mnt = path.mnt;
+
+	error = mnt_want_write(mnt);
 	if (error)
 		goto out_release;
+
+	error = union_copyup(&nd, &path);
+	if (error)
+		goto out_drop_write;
 	error = chown_common(&path, user, group);
-	mnt_drop_write(path.mnt);
+out_drop_write:
+	mnt_drop_write(mnt);
 out_release:
 	path_put(&path);
+	path_put(&nd.path);
+	putname(tmp);
 out:
 	return error;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 34/39] union-mount: Implement union-aware truncate()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (32 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 33/39] union-mount: Implement union-aware chown() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 35/39] union-mount: Implement union-aware chmod()/fchmodat() Valerie Aurora
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

---
 fs/open.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 808d32a..fa3ecae 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -230,14 +230,17 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 static long do_sys_truncate(const char __user *pathname, loff_t length)
 {
 	struct path path;
+	struct nameidata nd;
+	struct vfsmount *mnt;
 	struct inode *inode;
+	char *tmp;
 	int error;
 
 	error = -EINVAL;
 	if (length < 0)	/* sorry, but loff_t says... */
 		goto out;
 
-	error = user_path(pathname, &path);
+	error = user_path_nd(AT_FDCWD, pathname, 0, &nd, &path, &tmp);
 	if (error)
 		goto out;
 	inode = path.dentry->d_inode;
@@ -251,11 +254,16 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
 	if (!S_ISREG(inode->i_mode))
 		goto dput_and_out;
 
-	error = mnt_want_write(path.mnt);
+	if (IS_UNIONED_DIR(&nd.path))
+		mnt = nd.path.mnt;
+	else
+		mnt = path.mnt;
+
+	error = mnt_want_write(mnt);
 	if (error)
 		goto dput_and_out;
 
-	error = inode_permission(inode, MAY_WRITE);
+	error = path_permission(&path, &nd.path, MAY_WRITE);
 	if (error)
 		goto mnt_drop_write_and_out;
 
@@ -263,6 +271,12 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
 	if (IS_APPEND(inode))
 		goto mnt_drop_write_and_out;
 
+	error = union_copyup_len(&nd, &path, length);
+	if (error)
+		goto mnt_drop_write_and_out;
+
+	/* path may have changed after copyup */
+	inode = path.dentry->d_inode;
 	error = get_write_access(inode);
 	if (error)
 		goto mnt_drop_write_and_out;
@@ -284,9 +298,11 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
 put_write_and_out:
 	put_write_access(inode);
 mnt_drop_write_and_out:
-	mnt_drop_write(path.mnt);
+	mnt_drop_write(mnt);
 dput_and_out:
 	path_put(&path);
+	path_put(&nd.path);
+	putname(tmp);
 out:
 	return error;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 35/39] union-mount: Implement union-aware chmod()/fchmodat()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (33 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 34/39] union-mount: Implement union-aware truncate() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 36/39] union-mount: Implement union-aware lchown() Valerie Aurora
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

---
 fs/open.c |   25 +++++++++++++++++++++----
 1 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index fa3ecae..866041c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -669,18 +669,32 @@ out:
 SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, mode_t, mode)
 {
 	struct path path;
+	struct nameidata nd;
+	struct vfsmount *mnt;
 	struct inode *inode;
+	char *tmp;
 	int error;
 	struct iattr newattrs;
 
-	error = user_path_at(dfd, filename, LOOKUP_FOLLOW, &path);
+	error = user_path_nd(dfd, filename, LOOKUP_FOLLOW, &nd,
+				     &path, &tmp);
 	if (error)
 		goto out;
-	inode = path.dentry->d_inode;
 
-	error = mnt_want_write(path.mnt);
+	if (IS_UNIONED_DIR(&nd.path))
+		mnt = nd.path.mnt;
+	else
+		mnt = path.mnt;
+
+	error = mnt_want_write(mnt);
 	if (error)
 		goto dput_and_out;
+
+	error = union_copyup(&nd, &path);
+	if (error)
+		goto mnt_drop_write_and_out;
+
+	inode = path.dentry->d_inode;
 	mutex_lock(&inode->i_mutex);
 	error = security_path_chmod(path.dentry, path.mnt, mode);
 	if (error)
@@ -692,9 +706,12 @@ SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, mode_t, mode)
 	error = notify_change(path.dentry, &newattrs);
 out_unlock:
 	mutex_unlock(&inode->i_mutex);
-	mnt_drop_write(path.mnt);
+mnt_drop_write_and_out:
+	mnt_drop_write(mnt);
 dput_and_out:
 	path_put(&path);
+	path_put(&nd.path);
+	putname(tmp);
 out:
 	return error;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 36/39] union-mount: Implement union-aware lchown()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (34 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 35/39] union-mount: Implement union-aware chmod()/fchmodat() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 37/39] union-mount: Implement union-aware utimensat() Valerie Aurora
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

---
 fs/open.c |   23 ++++++++++++++++++++---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 866041c..e9266e2 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -812,18 +812,35 @@ out:
 SYSCALL_DEFINE3(lchown, const char __user *, filename, uid_t, user, gid_t, group)
 {
 	struct path path;
+	struct nameidata nd;
+	struct vfsmount *mnt;
+	char *tmp;
 	int error;
 
-	error = user_lpath(filename, &path);
+	error = user_path_nd(AT_FDCWD, filename, 0, &nd, &path, &tmp);
 	if (error)
 		goto out;
-	error = mnt_want_write(path.mnt);
+
+	if (IS_UNIONED_DIR(&nd.path))
+		mnt = nd.path.mnt;
+	else
+		mnt = path.mnt;
+
+	error = mnt_want_write(mnt);
 	if (error)
 		goto out_release;
+
+	error = union_copyup(&nd, &path);
+	if (error)
+		goto out_drop_write;
+
 	error = chown_common(&path, user, group);
-	mnt_drop_write(path.mnt);
+out_drop_write:
+	mnt_drop_write(mnt);
 out_release:
 	path_put(&path);
+	path_put(&nd.path);
+	putname(tmp);
 out:
 	return error;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 37/39] union-mount: Implement union-aware utimensat()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (35 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 36/39] union-mount: Implement union-aware lchown() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 38/39] union-mount: Implement union-aware setxattr() Valerie Aurora
  2010-05-03 23:12 ` [PATCH 39/39] union-mount: Implement union-aware lsetxattr() Valerie Aurora
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

---
 fs/utimes.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/utimes.c b/fs/utimes.c
index e4c75db..fb32111 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -8,6 +8,8 @@
 #include <linux/stat.h>
 #include <linux/utime.h>
 #include <linux/syscalls.h>
+#include <linux/union.h>
+#include <linux/slab.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 
@@ -152,18 +154,26 @@ long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags
 		error = utimes_common(&file->f_path, times);
 		fput(file);
 	} else {
+		struct nameidata nd;
+		char *tmp;
 		struct path path;
 		int lookup_flags = 0;
 
 		if (!(flags & AT_SYMLINK_NOFOLLOW))
 			lookup_flags |= LOOKUP_FOLLOW;
 
-		error = user_path_at(dfd, filename, lookup_flags, &path);
+		error = user_path_nd(dfd, filename, lookup_flags, &nd, &path,
+				     &tmp);
 		if (error)
 			goto out;
 
-		error = utimes_common(&path, times);
+		error = union_copyup(&nd, &path);
+
+		if (!error)
+			error = utimes_common(&path, times);
 		path_put(&path);
+		path_put(&nd.path);
+		putname(tmp);
 	}
 
 out:
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 38/39] union-mount: Implement union-aware setxattr()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (36 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 37/39] union-mount: Implement union-aware utimensat() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  2010-05-03 23:12 ` [PATCH 39/39] union-mount: Implement union-aware lsetxattr() Valerie Aurora
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

---
 fs/xattr.c |   33 +++++++++++++++++++++++++++------
 1 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 46f87e8..f930ab7 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -18,6 +18,7 @@
 #include <linux/module.h>
 #include <linux/fsnotify.h>
 #include <linux/audit.h>
+#include <linux/union.h>
 #include <asm/uaccess.h>
 
 
@@ -281,17 +282,37 @@ SYSCALL_DEFINE5(setxattr, const char __user *, pathname,
 		size_t, size, int, flags)
 {
 	struct path path;
+	struct nameidata nd;
+	struct vfsmount *mnt;
+	char *tmp;
 	int error;
 
-	error = user_path(pathname, &path);
+	error = user_path_nd(AT_FDCWD, pathname, LOOKUP_FOLLOW, &nd, &path,
+			     &tmp);
 	if (error)
 		return error;
-	error = mnt_want_write(path.mnt);
-	if (!error) {
-		error = setxattr(path.dentry, name, value, size, flags);
-		mnt_drop_write(path.mnt);
-	}
+
+	if (IS_UNIONED_DIR(&nd.path))
+		mnt = nd.path.mnt;
+	else
+		mnt = path.mnt;
+
+	error = mnt_want_write(mnt);
+	if (error)
+		goto out;
+
+	error = union_copyup(&nd, &path);
+	if (error)
+		goto out_drop_write;
+
+	error = setxattr(path.dentry, name, value, size, flags);
+
+out_drop_write:
+	mnt_drop_write(mnt);
+out:
 	path_put(&path);
+	path_put(&nd.path);
+	putname(tmp);
 	return error;
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 39/39] union-mount: Implement union-aware lsetxattr()
  2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
                   ` (37 preceding siblings ...)
  2010-05-03 23:12 ` [PATCH 38/39] union-mount: Implement union-aware setxattr() Valerie Aurora
@ 2010-05-03 23:12 ` Valerie Aurora
  38 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-03 23:12 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Jan Blunck,
	Valerie Aurora

---
 fs/xattr.c |   31 +++++++++++++++++++++++++------
 1 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index f930ab7..0a56629 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -321,17 +321,36 @@ SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
 		size_t, size, int, flags)
 {
 	struct path path;
+	struct nameidata nd;
+	struct vfsmount *mnt;
+	char *tmp;
 	int error;
 
-	error = user_lpath(pathname, &path);
+	error = user_path_nd(AT_FDCWD, pathname, 0, &nd, &path, &tmp);
 	if (error)
 		return error;
-	error = mnt_want_write(path.mnt);
-	if (!error) {
-		error = setxattr(path.dentry, name, value, size, flags);
-		mnt_drop_write(path.mnt);
-	}
+
+	if (IS_UNIONED_DIR(&nd.path))
+		mnt = nd.path.mnt;
+	else
+		mnt = path.mnt;
+
+	error = mnt_want_write(mnt);
+	if (error)
+		goto out;
+
+	error = union_copyup(&nd, &path);
+	if (error)
+		goto out_drop_write;
+
+	error = setxattr(path.dentry, name, value, size, flags);
+
+out_drop_write:
+	mnt_drop_write(mnt);
+out:
 	path_put(&path);
+	path_put(&nd.path);
+	putname(tmp);
 	return error;
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace
@ 2010-05-03 23:37     ` Neil Brown
  0 siblings, 0 replies; 57+ messages in thread
From: Neil Brown @ 2010-05-03 23:37 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck, David Woodhouse, linux-nfs, J. Bruce Fields

On Mon,  3 May 2010 16:12:04 -0700
Valerie Aurora <vaurora@redhat.com> wrote:

> From: Jan Blunck <jblunck@suse.de>
> 
> Userspace isn't ready for handling another file type, so silently drop
> whiteout directory entries before they leave the kernel.

Feels very intrusive, doesn't it....

Have you considered something like the following?

NeilBrown

diff --git a/fs/readdir.c b/fs/readdir.c
index 7723401..4c5b347 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -19,10 +19,26 @@
 
 #include <asm/uaccess.h>
 
+struct readdir_info {
+	filldir_t filler;
+	void *data;
+};
+
+static int white_out(void *vrdi, const char *name, int namlen,
+		     loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct readdir_info *rdi = vrdi;
+	if (d_type == DT_WHT)
+		return 0;
+	return rdi->filler(rdi->data, name, namlen, offset, ino, d_type);
+}
+
 int vfs_readdir(struct file *file, filldir_t filler, void *buf)
 {
 	struct inode *inode = file->f_path.dentry->d_inode;
 	int res = -ENOTDIR;
+	struct readdir_info rdi = { filler, buf };
+
 	if (!file->f_op || !file->f_op->readdir)
 		goto out;
 
@@ -36,7 +52,7 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
 
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
-		res = file->f_op->readdir(file, buf, filler);
+		res = file->f_op->readdir(file, &rdi, white_out);
 		file_accessed(file);
 	}
 	mutex_unlock(&inode->i_mutex);

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 27/39] union-mount: In-kernel copyup routines
  2010-05-03 23:12 ` [PATCH 27/39] union-mount: In-kernel copyup routines Valerie Aurora
@ 2010-05-04  1:40   ` Valdis.Kletnieks
  2010-05-07 14:45     ` Valerie Aurora
  0 siblings, 1 reply; 57+ messages in thread
From: Valdis.Kletnieks @ 2010-05-04  1:40 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck

[-- Attachment #1: Type: text/plain, Size: 1606 bytes --]

On Mon, 03 May 2010 16:12:26 PDT, Valerie Aurora said:
> When a file on the read-only layer of a union mount is altered, it
> must be copied up to the topmost read-write layer.  This patch creates
> union_copyup() and its supporting routines.
> ---
>  fs/union.c            |  244 +++++++++++++++++++++++++++++++++++++++++++++++

> +/**
> + * union_copyup_data - Copy up len bytes of old's data to new
> + *
> + * @old: source file
> + * @new: target file
> + * @len: number of bytes to copy
> + */
> +
> +static int union_copyup_data(struct path *old, struct vfsmount *new_mnt,
> +			     struct dentry *new_dentry, size_t len)
> +{
> +	struct file *old_file;
> +	struct file *new_file;
> +	const struct cred *cred = current_cred();
> +	loff_t offset = 0;
> +	long bytes;
> +	int error;

Should this be 'int error = 0;' ?

> +
> +	if (len == 0)
> +		return 0;
> +
> +	/* Get reference to balance later fput() */
> +	path_get(old);
> +	old_file = dentry_open(old->dentry, old->mnt, O_RDONLY, cred);
> +	if (IS_ERR(old_file))
> +		return PTR_ERR(old_file);
> +
> +	dget(new_dentry);
> +	mntget(new_mnt);
> +	new_file = dentry_open(new_dentry, new_mnt, O_WRONLY, cred);
> +	if (IS_ERR(new_file)) {
> +		error = PTR_ERR(new_file);
> +		goto out_fput;
> +	}
> +
> +	bytes = do_splice_direct(old_file, &offset, new_file, len,
> +				 SPLICE_F_MOVE);
> +	if (bytes < 0)
> +		error = bytes;
> +
> +	fput(new_file);
> +out_fput:
> +	fput(old_file);
> +	return error;
> +}

because otherwise if do_splice_direct() returns a non-negative value,
we can hit 'return error;' without ever having set error to anything?
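
For illustration, a minimal userspace sketch of the initialization being
suggested.  fake_splice() is a hypothetical stand-in for do_splice_direct(),
and copy_data() only mirrors the tail of union_copyup_data(); neither is
code from the patch.

```c
/*
 * Hypothetical stand-in for do_splice_direct(): returns the number of
 * bytes moved, or a negative errno-style value on failure.
 */
static long fake_splice(long result)
{
	return result;
}

/*
 * Mirrors the control flow at the end of union_copyup_data().  Because
 * error starts at 0, the success path returns a defined value.
 */
int copy_data(long splice_result)
{
	long bytes;
	int error = 0;

	bytes = fake_splice(splice_result);
	if (bytes < 0)
		error = (int)bytes;

	return error;
}
```

Without the initializer, the non-negative case would return whatever
happened to be in error.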


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/39] union-mount: Union mounts documentation
  2010-05-03 23:12 ` [PATCH 17/39] union-mount: Union mounts documentation Valerie Aurora
@ 2010-05-04  1:54   ` Valdis.Kletnieks
  2010-05-05 13:06     ` Valerie Aurora
  2010-05-04 21:12   ` Jamie Lokier
  1 sibling, 1 reply; 57+ messages in thread
From: Valdis.Kletnieks @ 2010-05-04  1:54 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck

[-- Attachment #1: Type: text/plain, Size: 1400 bytes --]

On Mon, 03 May 2010 16:12:16 PDT, Valerie Aurora said:

> +File copyup
> +-----------
> +
> +Any system call that alters the data or metadata of a file on the
> +bottom layer, or creates or changes a hard link to it will trigger a
> +copyup of the target file from the lower layer to the topmost layer
> +
> + - open(O_WRITE | O_RDWR | O_APPEND | O_DIRECT)
> + - truncate()/open(O_TRUNC)
> + - link()
> + - rename()
> + - chmod()
> + - chown()/lchown()
> + - utimes()
> + - setxattr()/lsetxattr()

I spent some time looking at patch 27 trying to figure it out for myself,
but my lack of splice()-fu doomed me. :)

A few quick questions:

1) For calls like chmod() that only touch the metadata, does it still
trigger a copyup of the data, or just the affected metadata?

2) Is the copyup of data synchronous or async done in the background?
The comments in union_copyup_len() about "We raced with someone else"
imply this is synchronous - if so, probably a note should be made that
an open() may take a little while under some conditions.  There's a *lot* of
code out there that assumes that open() calls are *really* cheap.

I wonder how many programs don't correctly deal with an ENOSPC on open() of
an already existing file.

(The answers probably don't matter unless somebody ends up invoking a
copyup of a gigabyte file, which of course implies one of my users will end up
doing exactly that. :)
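
As a rough sketch of what a caller might do if open() of an existing file
can fail with ENOSPC because of a synchronous copyup: open_for_write() below
is a hypothetical helper, not part of the patch set, and it uses only the
standard open(2) and errno interfaces.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Open an existing file for writing.  Returns the file descriptor on
 * success or a negative errno value on failure.  ENOSPC is reported
 * separately because callers rarely expect it without O_CREAT.
 */
int open_for_write(const char *pathname)
{
	int fd = open(pathname, O_WRONLY);
	int saved;

	if (fd >= 0)
		return fd;

	/* Save errno before calling anything that might overwrite it. */
	saved = errno;
	if (saved == ENOSPC)
		fprintf(stderr, "%s: no space left to open for writing\n",
			pathname);
	else
		fprintf(stderr, "%s: %s\n", pathname, strerror(saved));

	return -saved;
}
```

Whether the union mount code actually returns ENOSPC from open() in this
situation is an assumption taken from the question above, not something the
sketch verifies.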

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/39] union-mount: Union mounts documentation
  2010-05-03 23:12 ` [PATCH 17/39] union-mount: Union mounts documentation Valerie Aurora
  2010-05-04  1:54   ` Valdis.Kletnieks
@ 2010-05-04 21:12   ` Jamie Lokier
  2010-05-05 13:19     ` Valerie Aurora
  1 sibling, 1 reply; 57+ messages in thread
From: Jamie Lokier @ 2010-05-04 21:12 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck

Valerie Aurora wrote:
> +File copyup: Create a file on the top layer that has the same metadata
> +and contents as the file with the same pathname on the bottom layer.

Can copyup be interrupted?  E.g. if I chmod an 80GB file, will the
chmod() system call pause for a couple of hours, or can I control-C it?

> +This deviation from standard is due to technical limitations of the
> +union mount implementation.  Specifically, we would need to replace an
> +open file descriptor from the lower layer with an open file descriptor
> +for a file with matching pathname and contents on the upper layer,
> +which is difficult to do.  We avoid this in other system calls by
> +doing the copyup before the file is opened.  Unionfs doesn't encounter
> +this problem because it creates a dummy file struct which redirects or
> +fans out operations to the struct files for the underlying file
> +systems.
> +
> +From an application's point of view, the result of an in-kernel file
> +copyup is the logical equivalent of another application updating the
> +file via the rename() pattern: creat() a new file, copy the data over,
> +make changes to the copy, and rename() over the old version.  Any
> +existing open file descriptors for that file (including those in the
> +same application) refer to a now invisible object that used to have
> +the same pathname.  Only opens that occur after the copyup will see
> +updates to the file.

Does it apply the same permission checks that a program doing
copy+rename would have to pass?  I guess that is just write access to
the directory.

Does it effectively "rename" all hard links referring to the file, to
point to the new version, or does it only affect the path that was
used by the writer/modifier, leaving the other links to continue
referring to the original file?

> + - File copyup on open(O_DIRECT)

Why is O_DIRECT relevant?  O_DIRECT doesn't imply writing, and
copy+rename behaviour is the same with O_DIRECT as not.

Some programs use O_DIRECT to read very large files, without intending
they will ever be modified.  For example, qemu using O_DIRECT to
access a disk image backing file.

> +NFS interaction
> +===============
> +
> +NFS is currently not supported as either type of layer.  NFS as
> +read-only layer requires support from the server to honor the
> +read-only guarantee needed for the bottom layer.  To do this, the
> +server needs to revoke access to clients requesting read-only file
> +systems if the exported file system is remounted read-write or
> +unmounted (during which arbitrary changes can occur).  Some recent
> +discussion:
> +
> +http://markmail.org/message/3mkgnvo4pswxd7lp
> +
> +NFS as the read-write layer would require implementation of the
> +->whiteout() and ->fallthru() methods.  DT_WHT directory entries are
> +theoretically already supported.
> +
> +Also, technically the requirement for a readdir() cookie that is
> +stable across reboots comes only from file systems exported via NFSv2:
> +
> +http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html
> +
> +Todo:
> +
> +- Guarantee really really read-only on NFS exports
> +- Implement whiteout()/fallthru() for NFS

I'm finding it hard to imagine _guaranteeing_ really read-only.  All
you can guarantee is that the NFS says it is read-only.

For example, a userspace NFS server cannot prevent the filesystem it's
serving from changing.

Is this not a problem with other network filesystems like CIFS, P9, FUSE?

> +Known non-POSIX behaviors
> +-------------------------
> +
> +- Link count may be wrong for files on bottom layer with > 1 link count

Can you say a bit more about what will be seen?

> +- File copyup is the logical equivalent of an update via copy +
> +  rename().  Any existing open file descriptors will continue to refer
> +  to the read-only copy on the bottom layer and will not see any
> +  changes that occur after the copy-up.

I can imagine some database-like programs getting confused by that.

Maybe it would be better to fail copyup operations when the file is
currently open O_RDONLY by anyone, analogous to the way writable
mounts are refused when any union holds it read-only?

Are there uses likely to be broken by that behaviour?

Thanks,
-- Jamie

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/39] union-mount: Union mounts documentation
  2010-05-04  1:54   ` Valdis.Kletnieks
@ 2010-05-05 13:06     ` Valerie Aurora
  0 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-05 13:06 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck

On Mon, May 03, 2010 at 09:54:12PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 03 May 2010 16:12:16 PDT, Valerie Aurora said:
> 
> > +File copyup
> > +-----------
> > +
> > +Any system call that alters the data or metadata of a file on the
> > +bottom layer, or creates or changes a hard link to it will trigger a
> > +copyup of the target file from the lower layer to the topmost layer
> > +
> > + - open(O_WRITE | O_RDWR | O_APPEND | O_DIRECT)
> > + - truncate()/open(O_TRUNC)
> > + - link()
> > + - rename()
> > + - chmod()
> > + - chown()/lchown()
> > + - utimes()
> > + - setxattr()/lsetxattr()
> 
> I spent some time looking at patch 27 trying to figure it out for myself,
> but my lack of splice()-fu doomed me. :)
> 
> A few quick questions:
> 
> 1) For calls like chmod() that only touch the metadata, does it still
> trigger a copyup of the data, or just the affected metadata?

Yes, it copies up the whole file.  Right now there's no concept of
part of the file on one layer and part on another.

> 2) Is the copyup of data synchronous, or is it done asynchronously in the
> background?  The comments in union_copyup_len() about "We raced with someone
> else" imply this is synchronous - if so, probably a note should be made that
> an open() may take a little while under some conditions.  There's a *lot* of
> code out there that assumes that open() calls are *really* cheap.

It's synchronous.  Code that assumes open() calls are cheap will have
to be rewritten. :)

> I wonder how many programs don't correctly deal with an ENOSPC on open() of
> an already existing file.

I'm not too worried about that - how many programs deal correctly with
ENOSPC when it is normally returned?
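
Because the copyup is synchronous and happens inside open(), a writable
open() of an existing file on a union mount can fail with ENOSPC when the
top layer is full.  A minimal userspace sketch of a caller that reports
that case separately follows; open_for_write() is a hypothetical helper
for illustration, not something from the patch set.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an existing file for writing and return
 * either a file descriptor or a negative errno value.
 */
static int open_for_write(const char *path)
{
	int fd, err;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd >= 0)
		return fd;

	err = errno;	/* save errno before fprintf() can change it */
	if (err == ENOSPC)
		/* On a union mount this would mean the copyup ran out of room. */
		fprintf(stderr, "%s: no space left to open for writing\n", path);
	else
		fprintf(stderr, "%s: %s\n", path, strerror(err));
	return -err;
}
```

A caller would check for a negative return value before using the
descriptor, rather than assuming open() of an existing file cannot fail
for lack of space.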

-VAL

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/39] union-mount: Union mounts documentation
  2010-05-04 21:12   ` Jamie Lokier
@ 2010-05-05 13:19     ` Valerie Aurora
  0 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-05 13:19 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck

On Tue, May 04, 2010 at 10:12:09PM +0100, Jamie Lokier wrote:
> Valerie Aurora wrote:
> > +File copyup: Create a file on the top layer that has the same metadata
> > +and contents as the file with the same pathname on the bottom layer.
> 
> Can copyup be interrupted?  E.g. if I chmod an 80GB file, will the
> chmod() system call pause for a couple of hours, or can I control-C it?

The right behavior is that you should be able to control-C it, but I
doubt that currently works.  Let me look into testing and implementing
this.

> > +This deviation from standard is due to technical limitations of the
> > +union mount implementation.  Specifically, we would need to replace an
> > +open file descriptor from the lower layer with an open file descriptor
> > +for a file with matching pathname and contents on the upper layer,
> > +which is difficult to do.  We avoid this in other system calls by
> > +doing the copyup before the file is opened.  Unionfs doesn't encounter
> > +this problem because it creates a dummy file struct which redirects or
> > +fans out operations to the struct files for the underlying file
> > +systems.
> > +
> > +From an application's point of view, the result of an in-kernel file
> > +copyup is the logical equivalent of another application updating the
> > +file via the rename() pattern: creat() a new file, copy the data over,
> > +make changes to the copy, and rename() over the old version.  Any
> > +existing open file descriptors for that file (including those in the
> > +same application) refer to a now invisible object that used to have
> > +the same pathname.  Only opens that occur after the copyup will see
> > +updates to the file.
> 
> Does it apply the same permission checks that a program doing
> copy+rename would have to pass?  I guess that is just write access to
> the directory.

Yes.

> Does it effectively "rename" all hard links referring to the file, to
> point to the new version, or does it only affect the path that was
> used by the writer/modifier, leaving the other links to continue
> referring to the original file?

In order to update all the hard links to a file, we would have to walk
the entire file system searching for links with a matching inode
number and copy them up too.  We're never going to do a
file-system-wide walk, so we won't do that.  The other hard links
still point to the old copy of the file.  We hope applications don't
commonly depend on this.
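
The same effect can be seen on an ordinary file system with the copy +
rename() pattern, which is the documented equivalent of a copyup.  The
sketch below is a userspace illustration only, not union mount code; the
helper names and file names are made up for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) a file and write len bytes; returns 0 on success. */
static int write_file(const char *path, const char *text, size_t len)
{
	ssize_t n;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	n = write(fd, text, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}

/*
 * Create a file with two hard links in dir, replace one name via
 * copy + rename(), then compare inode numbers.  Returns 1 if the two
 * names now refer to different files, 0 if not, -1 on error.
 */
static int links_diverge_after_rename(const char *dir)
{
	char a[4096], b[4096], tmp[4096];
	struct stat sa, sb;

	assert(dir != NULL);
	snprintf(a, sizeof(a), "%s/copyup_link_a", dir);
	snprintf(b, sizeof(b), "%s/copyup_link_b", dir);
	snprintf(tmp, sizeof(tmp), "%s/copyup_link_tmp", dir);
	unlink(a);
	unlink(b);
	unlink(tmp);

	if (write_file(a, "old\n", 4) != 0 || link(a, b) != 0)
		return -1;

	/* Stand-in for the copyup: new file, then rename() over one name. */
	if (write_file(tmp, "new\n", 4) != 0 || rename(tmp, a) != 0)
		return -1;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	unlink(a);
	unlink(b);
	return sa.st_ino != sb.st_ino;
}
```

After the rename(), only the first name sees the new contents; the
second link still names the old inode, which is the situation described
above for the other hard links on the lower layer.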

> > + - File copyup on open(O_DIRECT)
> 
> Why is O_DIRECT relevant?  O_DIRECT doesn't imply writing, and
> copy+rename behaviour is the same with O_DIRECT as not.
> 
> Some programs use O_DIRECT to read very large files, without intending
> they will ever be modified.  For example, qemu using O_DIRECT to
> access a disk image backing file.

You're right, this is a mistake.

> > +NFS interaction
> > +===============
> > +
> > +NFS is currently not supported as either type of layer.  NFS as
> > +read-only layer requires support from the server to honor the
> > +read-only guarantee needed for the bottom layer.  To do this, the
> > +server needs to revoke access to clients requesting read-only file
> > +systems if the exported file system is remounted read-write or
> > +unmounted (during which arbitrary changes can occur).  Some recent
> > +discussion:
> > +
> > +http://markmail.org/message/3mkgnvo4pswxd7lp
> > +
> > +NFS as the read-write layer would require implementation of the
> > +->whiteout() and ->fallthru() methods.  DT_WHT directory entries are
> > +theoretically already supported.
> > +
> > +Also, technically the requirement for a readdir() cookie that is
> > +stable across reboots comes only from file systems exported via NFSv2:
> > +
> > +http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html
> > +
> > +Todo:
> > +
> > +- Guarantee really really read-only on NFS exports
> > +- Implement whiteout()/fallthru() for NFS
> 
> I'm finding it hard to imagine _guaranteeing_ really read-only.  All
> you can guarantee is that the NFS says it is read-only.
> 
> For example, a userspace NFS server cannot prevent the filesystem it's
> serving from changing.

We're discussing how to detect this now.

> Is this not a problem with other network filesystems like CIFS, P9, FUSE?

Each file system that wants to support union mounts will need to
implement the features necessary for that layer (hard read-only for
the lower layer, whiteouts and fallthrus for the upper layer).

> > +Known non-POSIX behaviors
> > +-------------------------
> > +
> > +- Link count may be wrong for files on bottom layer with > 1 link count
> 
> Can you say a bit more about what will be seen?

Sure, I'll write up an example.

> > +- File copyup is the logical equivalent of an update via copy +
> > +  rename().  Any existing open file descriptors will continue to refer
> > +  to the read-only copy on the bottom layer and will not see any
> > +  changes that occur after the copy-up.
> 
> I can imagine some database-like programs getting confused by that.
> 
> Maybe it would be better to fail copyup operations when the file is
> currently open O_RDONLY by anyone, analogous to the way writable
> mounts are refused when any union holds it read-only?
> 
> Are there uses likely to be broken by that behaviour?

That's an interesting question.  In general, this seems like a bad
idea - any process can prevent another process from writing to a file
by opening it.  This is like chmod'ing it to 444.
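
For reference, the behavior being discussed (an existing read-only
descriptor that keeps seeing the old file) can be reproduced on an
ordinary file system with copy + rename().  This is a userspace sketch
under that assumption, with made-up helper and file names, not code
from the patch set.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create (or truncate) a file and write len bytes; returns 0 on success. */
static int write_file(const char *path, const char *text, size_t len)
{
	ssize_t n;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	n = write(fd, text, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}

/*
 * Hold a read-only descriptor open while the file is replaced via
 * copy + rename(), then read through the old descriptor.  Returns 1 if
 * it still sees the old contents, 0 if it sees something else, -1 on
 * error.
 */
static int old_fd_sees_old_data(const char *dir)
{
	char path[4096], tmp[4096], buf[8] = "";
	ssize_t n;
	int rfd;

	assert(dir != NULL);
	snprintf(path, sizeof(path), "%s/copyup_fd_demo", dir);
	snprintf(tmp, sizeof(tmp), "%s/copyup_fd_demo.tmp", dir);

	if (write_file(path, "old", 3) != 0)
		return -1;
	rfd = open(path, O_RDONLY);	/* the reader that stays open */
	if (rfd < 0)
		return -1;

	/* Stand-in for the copyup: write a new file and rename() it over. */
	if (write_file(tmp, "new", 3) != 0 || rename(tmp, path) != 0) {
		close(rfd);
		return -1;
	}

	n = read(rfd, buf, sizeof(buf) - 1);
	close(rfd);
	unlink(path);
	if (n < 0)
		return -1;
	return strcmp(buf, "old") == 0;
}
```

A reader that wants to notice the replacement has to reopen the path
(or compare fstat() on its descriptor against stat() on the path); the
old descriptor alone gives no indication that the name now refers to a
different file.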

-VAL

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace
  2010-05-03 23:37     ` Neil Brown
  (?)
@ 2010-05-06 18:01     ` Valerie Aurora
  2010-05-06 21:18         ` Neil Brown
  -1 siblings, 1 reply; 57+ messages in thread
From: Valerie Aurora @ 2010-05-06 18:01 UTC (permalink / raw)
  To: Neil Brown
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck, David Woodhouse, linux-nfs, J. Bruce Fields

On Tue, May 04, 2010 at 09:37:31AM +1000, Neil Brown wrote:
> On Mon,  3 May 2010 16:12:04 -0700
> Valerie Aurora <vaurora@redhat.com> wrote:
> 
> > From: Jan Blunck <jblunck@suse.de>
> > 
> > Userspace isn't ready for handling another file type, so silently drop
> > whiteout directory entries before they leave the kernel.
> 
> Feels very intrusive doesn't it....
> 
> Have you considered something like the following?

Hrm, I see how that could be more elegant, but I'd rather avoid yet
another layer of function pointer passing around.  This code is
already hard enough to review...

-VAL

> NeilBrown
> 
> diff --git a/fs/readdir.c b/fs/readdir.c
> index 7723401..4c5b347 100644
> --- a/fs/readdir.c
> +++ b/fs/readdir.c
> @@ -19,10 +19,26 @@
>  
>  #include <asm/uaccess.h>
>  
> +struct readdir_info {
> +	filldir_t filler;
> +	void *data;
> +};
> +
> +static int white_out(void *vrdi, const char *name, int namlen,
> +		     loff_t offset, u64 ino, unsigned int d_type)
> +{
> +	struct readdir_info *rdi = vrdi;
> +	if (d_type == DT_WHT)
> +		return 0;
> +	return rdi->filler(rdi->data, name, namlen, offset, ino, d_type);
> +}
> +
>  int vfs_readdir(struct file *file, filldir_t filler, void *buf)
>  {
>  	struct inode *inode = file->f_path.dentry->d_inode;
>  	int res = -ENOTDIR;
> +	struct readdir_info rdi = { filler, buf };
> +
>  	if (!file->f_op || !file->f_op->readdir)
>  		goto out;
>  
> @@ -36,7 +52,7 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
>  
>  	res = -ENOENT;
>  	if (!IS_DEADDIR(inode)) {
> -		res = file->f_op->readdir(file, buf, filler);
> +		res = file->f_op->readdir(file, &rdi, white_out);
>  		file_accessed(file);
>  	}
>  	mutex_unlock(&inode->i_mutex);

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace
  2010-05-06 18:01     ` Valerie Aurora
@ 2010-05-06 21:18         ` Neil Brown
  0 siblings, 0 replies; 57+ messages in thread
From: Neil Brown @ 2010-05-06 21:18 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck, David Woodhouse, linux-nfs, J. Bruce Fields

On Thu, 6 May 2010 14:01:51 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> On Tue, May 04, 2010 at 09:37:31AM +1000, Neil Brown wrote:
> > On Mon,  3 May 2010 16:12:04 -0700
> > Valerie Aurora <vaurora@redhat.com> wrote:
> > 
> > > From: Jan Blunck <jblunck@suse.de>
> > > 
> > > Userspace isn't ready for handling another file type, so silently drop
> > > whiteout directory entries before they leave the kernel.
> > 
> > Feels very intrusive doesn't it....
> > 
> > Have you considered something like the following?
> 
> Hrm, I see how that could be more elegant, but I'd rather avoid yet
> another layer of function pointer passing around.  This code is
> already hard enough to review...

 Yes, the extra indirection is a bit of a negative, but I don't think this
 patch is harder to review than the alternate.
 From a numerical perspective, with this patch you only need to look at the
 various places that ->readdir is called to be sure it is always correct.
 There are about 3.  With the original you need to look at every filldir
 function.  Jan has found 9.

 And from a maintainability perspective, I think my approach is safer.  Given
 that there are 9 filldir functions already, the chance that a need will be
 found for another seems good, and the chance that the coder will know to
 check for DT_WHT is at best even.  Conversely if another call to ->readdir
 were added it is likely that nothing would need to be done.

 Of course just counting things doesn't give a complete picture but I think
 it can be indicative.
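
 The shape of the wrapper approach can be shown outside the kernel with a
 simplified callback type.  This is only a sketch: fill_fn, the EX_DT_*
 constants, fake_readdir(), filtered_readdir() and count_entry() are
 stand-ins invented for the example, and the real filldir_t takes more
 arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric values as DT_REG and DT_WHT in <dirent.h>. */
#define EX_DT_REG 8
#define EX_DT_WHT 14

/* Simplified stand-in for filldir_t. */
typedef int (*fill_fn)(void *data, const char *name, unsigned int d_type);

struct readdir_info {
	fill_fn filler;	/* the caller's real callback */
	void *data;	/* the caller's real buffer */
};

/* Wrapper callback: drop whiteouts, forward everything else. */
static int skip_whiteouts(void *vrdi, const char *name, unsigned int d_type)
{
	struct readdir_info *rdi = vrdi;

	assert(rdi->filler != NULL);
	if (d_type == EX_DT_WHT)
		return 0;
	return rdi->filler(rdi->data, name, d_type);
}

struct entry {
	const char *name;
	unsigned int d_type;
};

/* Stand-in for a file system's ->readdir: feeds every entry to fill(). */
static int fake_readdir(const struct entry *ents, size_t n, void *buf,
			fill_fn fill)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int res = fill(buf, ents[i].name, ents[i].d_type);

		if (res)
			return res;
	}
	return 0;
}

/* Stand-in for vfs_readdir(): the only place that knows about the filter. */
static int filtered_readdir(const struct entry *ents, size_t n,
			    fill_fn filler, void *buf)
{
	struct readdir_info rdi = { filler, buf };

	return fake_readdir(ents, n, &rdi, skip_whiteouts);
}

/* Example caller callback: counts the entries it is shown. */
static int count_entry(void *data, const char *name, unsigned int d_type)
{
	(void)name;
	(void)d_type;
	++*(int *)data;
	return 0;
}
```

 A new callback such as count_entry() never has to test for whiteouts
 itself, which is the maintainability point made above.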

NeilBrown


> 
> -VAL
> 
> > NeilBrown
> > 
> > diff --git a/fs/readdir.c b/fs/readdir.c
> > index 7723401..4c5b347 100644
> > --- a/fs/readdir.c
> > +++ b/fs/readdir.c
> > @@ -19,10 +19,26 @@
> >  
> >  #include <asm/uaccess.h>
> >  
> > +struct readdir_info {
> > +	filldir_t filler;
> > +	void *data;
> > +};
> > +
> > +static int white_out(void *vrdi, const char *name, int namlen,
> > +		     loff_t offset, u64 ino, unsigned int d_type)
> > +{
> > +	struct readdir_info *rdi = vrdi;
> > +	if (d_type == DT_WHT)
> > +		return 0;
> > +	return rdi->filler(rdi->data, name, namlen, offset, ino, d_type);
> > +}
> > +
> >  int vfs_readdir(struct file *file, filldir_t filler, void *buf)
> >  {
> >  	struct inode *inode = file->f_path.dentry->d_inode;
> >  	int res = -ENOTDIR;
> > +	struct readdir_info rdi = { filler, buf };
> > +
> >  	if (!file->f_op || !file->f_op->readdir)
> >  		goto out;
> >  
> > @@ -36,7 +52,7 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
> >  
> >  	res = -ENOENT;
> >  	if (!IS_DEADDIR(inode)) {
> > -		res = file->f_op->readdir(file, buf, filler);
> > +		res = file->f_op->readdir(file, &rdi, white_out);
> >  		file_accessed(file);
> >  	}
> >  	mutex_unlock(&inode->i_mutex);


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 27/39] union-mount: In-kernel copyup routines
  2010-05-04  1:40   ` Valdis.Kletnieks
@ 2010-05-07 14:45     ` Valerie Aurora
  0 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-07 14:45 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck

On Mon, May 03, 2010 at 09:40:04PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 03 May 2010 16:12:26 PDT, Valerie Aurora said:
> > When a file on the read-only layer of a union mount is altered, it
> > must be copied up to the topmost read-write layer.  This patch creates
> > union_copyup() and its supporting routines.
> > ---
> >  fs/union.c            |  244 +++++++++++++++++++++++++++++++++++++++++++++++
> 
> > +/**
> > + * union_copyup_data - Copy up len bytes of old's data to new
> > + *
> > + * @old: source file
> > + * @new: target file
> > + * @len: number of bytes to copy
> > + */
> > +
> > +static int union_copyup_data(struct path *old, struct vfsmount *new_mnt,
> > +			     struct dentry *new_dentry, size_t len)
> > +{
> > +	struct file *old_file;
> > +	struct file *new_file;
> > +	const struct cred *cred = current_cred();
> > +	loff_t offset = 0;
> > +	long bytes;
> > +	int error;
> 
> Should this be 'int error = 0;' ?
> 
> > +
> > +	if (len == 0)
> > +		return 0;
> > +
> > +	/* Get reference to balance later fput() */
> > +	path_get(old);
> > +	old_file = dentry_open(old->dentry, old->mnt, O_RDONLY, cred);
> > +	if (IS_ERR(old_file))
> > +		return PTR_ERR(old_file);
> > +
> > +	dget(new_dentry);
> > +	mntget(new_mnt);
> > +	new_file = dentry_open(new_dentry, new_mnt, O_WRONLY, cred);
> > +	if (IS_ERR(new_file)) {
> > +		error = PTR_ERR(new_file);
> > +		goto out_fput;
> > +	}
> > +
> > +	bytes = do_splice_direct(old_file, &offset, new_file, len,
> > +				 SPLICE_F_MOVE);
> > +	if (bytes < 0)
> > +		error = bytes;
> > +
> > +	fput(new_file);
> > +out_fput:
> > +	fput(old_file);
> > +	return error;
> > +}
> 
> because otherwise if do_splice_direct() returns a non-negative value,
> we can hit 'return error;' without ever having set error to anything?

Yes, I think you're right.  Thanks!
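
A userspace analogue of the same control flow, with the initialization
in place, might look like the sketch below.  copy_data() is a
hypothetical stand-in that uses read()/write() instead of
do_splice_direct(); it is meant only to show why error needs a defined
value on the success path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy up to len bytes from old_path to new_path.
 * Returns 0 on success or a negative errno value.
 */
static int copy_data(const char *old_path, const char *new_path, size_t len)
{
	char buf[4096];
	int old_fd, new_fd;
	int error = 0;	/* success unless a step below fails */

	assert(old_path != NULL && new_path != NULL);

	if (len == 0)
		return 0;

	old_fd = open(old_path, O_RDONLY);
	if (old_fd < 0)
		return -errno;

	new_fd = open(new_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (new_fd < 0) {
		error = -errno;
		goto out_close;
	}

	while (len > 0) {
		size_t want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t got = read(old_fd, buf, want);
		ssize_t put;

		if (got < 0) {
			error = -errno;
			break;
		}
		if (got == 0)	/* end of file before len bytes */
			break;
		put = write(new_fd, buf, (size_t)got);
		if (put != got) {
			error = put < 0 ? -errno : -EIO;
			break;
		}
		len -= (size_t)got;
	}

	close(new_fd);
out_close:
	close(old_fd);
	return error;	/* still 0 if every step succeeded */
}
```

Without the "= 0", the return on the success path would read an
indeterminate value, which is the problem pointed out above.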

-VAL

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace
  2010-05-06 21:18         ` Neil Brown
  (?)
@ 2010-05-17 19:51         ` Valerie Aurora
  -1 siblings, 0 replies; 57+ messages in thread
From: Valerie Aurora @ 2010-05-17 19:51 UTC (permalink / raw)
  To: Neil Brown
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Christoph Hellwig,
	Jan Blunck, David Woodhouse, linux-nfs, J. Bruce Fields

On Fri, May 07, 2010 at 07:18:08AM +1000, Neil Brown wrote:
> On Thu, 6 May 2010 14:01:51 -0400
> Valerie Aurora <vaurora@redhat.com> wrote:
> 
> > On Tue, May 04, 2010 at 09:37:31AM +1000, Neil Brown wrote:
> > > On Mon,  3 May 2010 16:12:04 -0700
> > > Valerie Aurora <vaurora@redhat.com> wrote:
> > > 
> > > > From: Jan Blunck <jblunck@suse.de>
> > > > 
> > > > Userspace isn't ready for handling another file type, so silently drop
> > > > whiteout directory entries before they leave the kernel.
> > > 
> > > Feels very intrusive doesn't it....
> > > 
> > > Have you considered something like the following?
> > 
> > Hrm, I see how that could be more elegant, but I'd rather avoid yet
> > another layer of function pointer passing around.  This code is
> > already hard enough to review...
> 
>  Yes, the extra indirection is a bit of a negative, but I don't think this
>  patch is harder to review than the alternate.
>  From a numerical perspective, with this patch you only need to look at the
>  various places that ->readdir is called to be sure it is always correct.
>  There are about 3.  With the original you need to look at every filldir
>  function.  Jan has found 9.
> 
>  And from a maintainability perspective, I think my approach is safer.  Given
>  that there are 9 filldir functions already, the chance that a need will be
>  found for another seems good, and the chance that the coder will know to
>  check for DT_WHT is at best even.  Conversely if another call to ->readdir
>  were added it is likely that nothing would need to be done.
> 
>  Of course just counting things doesn't give a complete picture but I think
>  it can be indicative.

Okay, good points.  Let me try it out after getting this next rewrite done.

Thanks,

-VAL

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2010-05-17 19:52 UTC | newest]

Thread overview: 57+ messages
2010-05-03 23:11 [RFC PATCH 00/39] Union mounts with xattrs Valerie Aurora
2010-05-03 23:12 ` [PATCH 01/39] VFS: Comment follow_mount() and friends Valerie Aurora
2010-05-03 23:12 ` [PATCH 02/39] VFS: Make lookup_hash() return a struct path Valerie Aurora
2010-05-03 23:12 ` [PATCH 03/39] VFS: Add read-only users count to superblock Valerie Aurora
2010-05-03 23:12 ` [PATCH 04/39] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
2010-05-03 23:12 ` [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace Valerie Aurora
2010-05-03 23:37   ` Neil Brown
2010-05-03 23:37     ` Neil Brown
2010-05-06 18:01     ` Valerie Aurora
2010-05-06 21:18       ` Neil Brown
2010-05-06 21:18         ` Neil Brown
2010-05-17 19:51         ` Valerie Aurora
2010-05-03 23:12 ` [PATCH 06/39] whiteout: Add vfs_whiteout() and whiteout inode operation Valerie Aurora
2010-05-03 23:12 ` [PATCH 07/39] whiteout: Set S_OPAQUE inode flag when creating directories Valerie Aurora
2010-05-03 23:12 ` [PATCH 08/39] whiteout: Allow removal of a directory with whiteouts Valerie Aurora
2010-05-03 23:12 ` [PATCH 09/39] whiteout: tmpfs whiteout support Valerie Aurora
2010-05-03 23:12   ` Valerie Aurora
2010-05-03 23:12 ` [PATCH 10/39] whiteout: Split of ext2_append_link() from ext2_add_link() Valerie Aurora
2010-05-03 23:12 ` [PATCH 11/39] whiteout: ext2 whiteout support Valerie Aurora
2010-05-03 23:12 ` [PATCH 12/39] whiteout: jffs2 " Valerie Aurora
2010-05-03 23:12   ` Valerie Aurora
2010-05-03 23:12   ` Valerie Aurora
2010-05-03 23:12 ` [PATCH 13/39] fallthru: Basic fallthru definitions Valerie Aurora
2010-05-03 23:12 ` [PATCH 14/39] fallthru: ext2 fallthru support Valerie Aurora
2010-05-03 23:12 ` [PATCH 15/39] fallthru: jffs2 " Valerie Aurora
2010-05-03 23:12   ` Valerie Aurora
2010-05-03 23:12   ` Valerie Aurora
2010-05-03 23:12 ` [PATCH 16/39] fallthru: tmpfs " Valerie Aurora
2010-05-03 23:12 ` [PATCH 17/39] union-mount: Union mounts documentation Valerie Aurora
2010-05-04  1:54   ` Valdis.Kletnieks
2010-05-05 13:06     ` Valerie Aurora
2010-05-04 21:12   ` Jamie Lokier
2010-05-05 13:19     ` Valerie Aurora
2010-05-03 23:12 ` [PATCH 18/39] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
2010-05-03 23:12 ` [PATCH 19/39] union-mount: Introduce union_mount structure and basic operations Valerie Aurora
2010-05-03 23:12 ` [PATCH 20/39] union-mount: Drive the union cache via dcache Valerie Aurora
2010-05-03 23:12 ` [PATCH 21/39] union-mount: Implement union lookup Valerie Aurora
2010-05-03 23:12 ` [PATCH 22/39] union-mount: Support for mounting union mount file systems Valerie Aurora
2010-05-03 23:12 ` [PATCH 23/39] union-mount: Call do_whiteout() on unlink and rmdir in unions Valerie Aurora
2010-05-03 23:12 ` [PATCH 24/39] union-mount: Copy up directory entries on first readdir() Valerie Aurora
2010-05-03 23:12 ` [PATCH 25/39] VFS: Split inode_permission() and create path_permission() Valerie Aurora
2010-05-03 23:12 ` [PATCH 26/39] VFS: Create user_path_nd() to lookup both parent and target Valerie Aurora
2010-05-03 23:12 ` [PATCH 27/39] union-mount: In-kernel copyup routines Valerie Aurora
2010-05-04  1:40   ` Valdis.Kletnieks
2010-05-07 14:45     ` Valerie Aurora
2010-05-03 23:12 ` [PATCH 28/39] union-mount: In-kernel copyup of xattrs Valerie Aurora
2010-05-03 23:12 ` [PATCH 29/39] union-mount: Implement union-aware access()/faccessat() Valerie Aurora
2010-05-03 23:12 ` [PATCH 30/39] union-mount: Implement union-aware link() Valerie Aurora
2010-05-03 23:12 ` [PATCH 31/39] union-mount: Implement union-aware rename() Valerie Aurora
2010-05-03 23:12 ` [PATCH 32/39] union-mount: Implement union-aware writable open() Valerie Aurora
2010-05-03 23:12 ` [PATCH 33/39] union-mount: Implement union-aware chown() Valerie Aurora
2010-05-03 23:12 ` [PATCH 34/39] union-mount: Implement union-aware truncate() Valerie Aurora
2010-05-03 23:12 ` [PATCH 35/39] union-mount: Implement union-aware chmod()/fchmodat() Valerie Aurora
2010-05-03 23:12 ` [PATCH 36/39] union-mount: Implement union-aware lchown() Valerie Aurora
2010-05-03 23:12 ` [PATCH 37/39] union-mount: Implement union-aware utimensat() Valerie Aurora
2010-05-03 23:12 ` [PATCH 38/39] union-mount: Implement union-aware setxattr() Valerie Aurora
2010-05-03 23:12 ` [PATCH 39/39] union-mount: Implement union-aware lsetxattr() Valerie Aurora
