All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] hybrid union filesystem prototype
@ 2010-08-26 18:33 Miklos Szeredi
  2010-08-26 18:33 ` [PATCH 1/5] vfs: implement open "forwarding" Miklos Szeredi
                   ` (5 more replies)
  0 siblings, 6 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-26 18:33 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro, jblunck, hch

This is the result of my experimentation with trying to do
union-mounts like semantics with a filesystem.  The implementation is
far from perfect, but I think the concept does sort of work.  See the
patch header for the union filesystem.

VFS modifications necessary to make it work:

  - allow f_op->open() to return a different file
  - pass dentry to i_op->permission() instead of inode
  - hack to vfs_rename() to allow rename to same inode

Git tree is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git union-hybrid

Thanks,
Miklos
-- 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/5] vfs: implement open "forwarding"
  2010-08-26 18:33 [PATCH 0/5] hybrid union filesystem prototype Miklos Szeredi
@ 2010-08-26 18:33 ` Miklos Szeredi
  2010-08-26 18:33 ` [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-26 18:33 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro, jblunck, hch

[-- Attachment #1: vfs-open-redirect.patch --]
[-- Type: text/plain, Size: 2648 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add a new file operation f_op->open_other().  This acts just like
f_op->open() except the return value can be another open struct file
pointer.  In that case the original file is discarded and the
replacement file is used instead.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/open.c          |   23 +++++++++++++++++------
 include/linux/fs.h |    1 +
 2 files changed, 18 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/open.c	2010-08-19 09:46:27.000000000 +0200
@@ -657,6 +657,7 @@ static struct file *__dentry_open(struct
 					const struct cred *cred)
 {
 	struct inode *inode;
+	struct file *ret;
 	int error;
 
 	f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
@@ -664,6 +665,7 @@ static struct file *__dentry_open(struct
 	inode = dentry->d_inode;
 	if (f->f_mode & FMODE_WRITE) {
 		error = __get_file_write_access(inode, mnt);
+		ret = ERR_PTR(error);
 		if (error)
 			goto cleanup_file;
 		if (!special_file(inode->i_mode))
@@ -678,15 +680,24 @@ static struct file *__dentry_open(struct
 	file_sb_list_add(f, inode->i_sb);
 
 	error = security_dentry_open(f, cred);
+	ret = ERR_PTR(error);
 	if (error)
 		goto cleanup_all;
 
-	if (!open && f->f_op)
-		open = f->f_op->open;
-	if (open) {
-		error = open(inode, f);
-		if (error)
+	if (!open && f->f_op && f->f_op->open_other) {
+		/* NULL means keep f, non-error non-null means replace */
+		ret = f->f_op->open_other(f);
+		if (IS_ERR(ret) || ret != NULL)
 			goto cleanup_all;
+	} else {
+		if (!open && f->f_op)
+			open = f->f_op->open;
+		if (open) {
+			error = open(inode, f);
+			ret = ERR_PTR(error);
+			if (error)
+				goto cleanup_all;
+		}
 	}
 	ima_counts_get(f);
 
@@ -728,7 +739,7 @@ cleanup_file:
 	put_filp(f);
 	dput(dentry);
 	mntput(mnt);
-	return ERR_PTR(error);
+	return ret;
 }
 
 /**
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-08-19 09:46:15.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-08-19 09:46:27.000000000 +0200
@@ -1494,6 +1494,7 @@ struct file_operations {
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
+	struct file *(*open_other) (struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, int datasync);

-- 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode
  2010-08-26 18:33 [PATCH 0/5] hybrid union filesystem prototype Miklos Szeredi
  2010-08-26 18:33 ` [PATCH 1/5] vfs: implement open "forwarding" Miklos Szeredi
@ 2010-08-26 18:33 ` Miklos Szeredi
  2010-08-26 20:24   ` David P. Quigley
  2010-08-26 18:33 ` [PATCH 3/5] vfs: add flag to allow rename to same inode Miklos Szeredi
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-26 18:33 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro, jblunck, hch

[-- Attachment #1: vfs-permission-dentry.patch --]
[-- Type: text/plain, Size: 35440 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Like most other inode operations ->permission() should take a dentry
instead of an inode.  This is necessary for filesystems which operate
on names not on inodes.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/afs/internal.h                  |    2 +-
 fs/afs/security.c                  |    3 ++-
 fs/bad_inode.c                     |    2 +-
 fs/btrfs/inode.c                   |    4 +++-
 fs/btrfs/ioctl.c                   |    8 ++++----
 fs/ceph/inode.c                    |    3 ++-
 fs/ceph/super.h                    |    2 +-
 fs/cifs/cifsfs.c                   |    3 ++-
 fs/coda/dir.c                      |    3 ++-
 fs/coda/pioctl.c                   |    4 ++--
 fs/ecryptfs/inode.c                |    4 ++--
 fs/fuse/dir.c                      |    3 ++-
 fs/gfs2/ops_inode.c                |   11 ++++++++---
 fs/hostfs/hostfs_kern.c            |    3 ++-
 fs/logfs/dir.c                     |    6 ------
 fs/namei.c                         |   37 ++++++++++++++++++++-----------------
 fs/namespace.c                     |    2 +-
 fs/nfs/dir.c                       |    3 ++-
 fs/nfsd/nfsfh.c                    |    2 +-
 fs/nfsd/vfs.c                      |    4 ++--
 fs/nilfs2/nilfs.h                  |    2 +-
 fs/notify/fanotify/fanotify_user.c |    2 +-
 fs/notify/inotify/inotify_user.c   |    2 +-
 fs/ocfs2/file.c                    |    3 ++-
 fs/ocfs2/file.h                    |    2 +-
 fs/ocfs2/refcounttree.c            |    4 ++--
 fs/open.c                          |   10 +++++-----
 fs/proc/base.c                     |    3 ++-
 fs/proc/proc_sysctl.c              |    3 ++-
 fs/reiserfs/xattr.c                |    4 +++-
 fs/smbfs/file.c                    |    4 ++--
 fs/sysfs/inode.c                   |    3 ++-
 fs/sysfs/sysfs.h                   |    2 +-
 fs/utimes.c                        |    2 +-
 fs/xattr.c                         |   12 +++++++-----
 include/linux/coda_linux.h         |    2 +-
 include/linux/fs.h                 |    4 ++--
 include/linux/nfs_fs.h             |    2 +-
 include/linux/reiserfs_xattr.h     |    2 +-
 ipc/mqueue.c                       |    2 +-
 net/unix/af_unix.c                 |    2 +-
 41 files changed, 100 insertions(+), 81 deletions(-)

Index: linux-2.6/fs/btrfs/ioctl.c
===================================================================
--- linux-2.6.orig/fs/btrfs/ioctl.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/btrfs/ioctl.c	2010-08-19 09:46:31.000000000 +0200
@@ -396,13 +396,13 @@ fail:
 }
 
 /* copy of may_create in fs/namei.c() */
-static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
+static inline int btrfs_may_create(struct dentry *dir, struct dentry *child)
 {
 	if (child->d_inode)
 		return -EEXIST;
-	if (IS_DEADDIR(dir))
+	if (IS_DEADDIR(dir->d_inode))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	return dentry_permission(dir, MAY_WRITE | MAY_EXEC);
 }
 
 /*
@@ -433,7 +433,7 @@ static noinline int btrfs_mksubvol(struc
 	if (error)
 		goto out_dput;
 
-	error = btrfs_may_create(dir, dentry);
+	error = btrfs_may_create(parent->dentry, dentry);
 	if (error)
 		goto out_drop_write;
 
Index: linux-2.6/fs/ecryptfs/inode.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ecryptfs/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -958,9 +958,9 @@ int ecryptfs_truncate(struct dentry *den
 }
 
 static int
-ecryptfs_permission(struct inode *inode, int mask)
+ecryptfs_permission(struct dentry *dentry, int mask)
 {
-	return inode_permission(ecryptfs_inode_to_lower(inode), mask);
+	return dentry_permission(ecryptfs_dentry_to_lower(dentry), mask);
 }
 
 /**
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2010-08-19 09:46:15.000000000 +0200
+++ linux-2.6/fs/namei.c	2010-08-19 09:46:31.000000000 +0200
@@ -240,17 +240,18 @@ int generic_permission(struct inode *ino
 }
 
 /**
- * inode_permission  -  check for access rights to a given inode
- * @inode:	inode to check permission on
+ * dentry_permission  -  check for access rights to a given dentry
+ * @dentry:	dentry to check permission on
  * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
- * Used to check for read/write/execute permissions on an inode.
+ * Used to check for read/write/execute permissions on an dentry.
  * We use "fsuid" for this, letting us set arbitrary permissions
  * for filesystem access without changing the "normal" uids which
  * are used for other things.
  */
-int inode_permission(struct inode *inode, int mask)
+int dentry_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int retval;
 
 	if (mask & MAY_WRITE) {
@@ -271,7 +272,7 @@ int inode_permission(struct inode *inode
 	}
 
 	if (inode->i_op->permission)
-		retval = inode->i_op->permission(inode, mask);
+		retval = inode->i_op->permission(dentry, mask);
 	else
 		retval = generic_permission(inode, mask, inode->i_op->check_acl);
 
@@ -295,11 +296,11 @@ int inode_permission(struct inode *inode
  *
  * Note:
  *	Do not use this function in new code.  All access checks should
- *	be done using inode_permission().
+ *	be done using dentry_permission().
  */
 int file_permission(struct file *file, int mask)
 {
-	return inode_permission(file->f_path.dentry->d_inode, mask);
+	return dentry_permission(file->f_path.dentry, mask);
 }
 
 /*
@@ -459,12 +460,13 @@ force_reval_path(struct path *path, stru
  * short-cut DAC fails, then call ->permission() to do more
  * complete permission check.
  */
-static int exec_permission(struct inode *inode)
+static int exec_permission(struct dentry *dentry)
 {
 	int ret;
+	struct inode *inode = dentry->d_inode;
 
 	if (inode->i_op->permission) {
-		ret = inode->i_op->permission(inode, MAY_EXEC);
+		ret = inode->i_op->permission(dentry, MAY_EXEC);
 		if (!ret)
 			goto ok;
 		return ret;
@@ -837,7 +839,7 @@ static int link_path_walk(const char *na
 		unsigned int c;
 
 		nd->flags |= LOOKUP_CONTINUE;
-		err = exec_permission(inode);
+		err = exec_permission(nd->path.dentry);
  		if (err)
 			break;
 
@@ -1163,7 +1165,7 @@ static struct dentry *lookup_hash(struct
 {
 	int err;
 
-	err = exec_permission(nd->path.dentry->d_inode);
+	err = exec_permission(nd->path.dentry);
 	if (err)
 		return ERR_PTR(err);
 	return __lookup_hash(&nd->last, nd->path.dentry, nd);
@@ -1213,7 +1215,7 @@ struct dentry *lookup_one_len(const char
 	if (err)
 		return ERR_PTR(err);
 
-	err = exec_permission(base->d_inode);
+	err = exec_permission(base);
 	if (err)
 		return ERR_PTR(err);
 	return __lookup_hash(&this, base, NULL);
@@ -1301,7 +1303,7 @@ static int may_delete(struct inode *dir,
 	BUG_ON(victim->d_parent->d_inode != dir);
 	audit_inode_child(victim, dir);
 
-	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	error = dentry_permission(victim->d_parent, MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 	if (IS_APPEND(dir))
@@ -1337,7 +1339,8 @@ static inline int may_create(struct inod
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	BUG_ON(child->d_parent->d_inode != dir);
+	return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
 }
 
 /*
@@ -1430,7 +1433,7 @@ int may_open(struct path *path, int acc_
 		break;
 	}
 
-	error = inode_permission(inode, acc_mode);
+	error = dentry_permission(dentry, acc_mode);
 	if (error)
 		return error;
 
@@ -2545,7 +2548,7 @@ static int vfs_rename_dir(struct inode *
 	 * we'll need to flip '..'.
 	 */
 	if (new_dir != old_dir) {
-		error = inode_permission(old_dentry->d_inode, MAY_WRITE);
+		error = dentry_permission(old_dentry, MAY_WRITE);
 		if (error)
 			return error;
 	}
@@ -2900,7 +2903,7 @@ EXPORT_SYMBOL(page_symlink_inode_operati
 EXPORT_SYMBOL(path_lookup);
 EXPORT_SYMBOL(kern_path);
 EXPORT_SYMBOL(vfs_path_lookup);
-EXPORT_SYMBOL(inode_permission);
+EXPORT_SYMBOL(dentry_permission);
 EXPORT_SYMBOL(file_permission);
 EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(vfs_create);
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-08-19 09:46:31.000000000 +0200
@@ -1230,7 +1230,7 @@ static int mount_is_safe(struct path *pa
 		if (current_uid() != path->dentry->d_inode->i_uid)
 			return -EPERM;
 	}
-	if (inode_permission(path->dentry->d_inode, MAY_WRITE))
+	if (dentry_permission(path->dentry, MAY_WRITE))
 		return -EPERM;
 	return 0;
 #endif
Index: linux-2.6/fs/nfsd/nfsfh.c
===================================================================
--- linux-2.6.orig/fs/nfsd/nfsfh.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/nfsd/nfsfh.c	2010-08-19 09:46:31.000000000 +0200
@@ -38,7 +38,7 @@ static int nfsd_acceptable(void *expv, s
 		/* make sure parents give x permission to user */
 		int err;
 		parent = dget_parent(tdentry);
-		err = inode_permission(parent->d_inode, MAY_EXEC);
+		err = dentry_permission(parent, MAY_EXEC);
 		if (err < 0) {
 			dput(parent);
 			break;
Index: linux-2.6/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.orig/fs/nfsd/vfs.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/nfsd/vfs.c	2010-08-19 09:46:31.000000000 +0200
@@ -2124,12 +2124,12 @@ nfsd_permission(struct svc_rqst *rqstp,
 		return 0;
 
 	/* This assumes  NFSD_MAY_{READ,WRITE,EXEC} == MAY_{READ,WRITE,EXEC} */
-	err = inode_permission(inode, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
+	err = dentry_permission(dentry, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
 
 	/* Allow read access to binaries even when mode 111 */
 	if (err == -EACCES && S_ISREG(inode->i_mode) &&
 	    acc == (NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE))
-		err = inode_permission(inode, MAY_EXEC);
+		err = dentry_permission(dentry, MAY_EXEC);
 
 	return err? nfserrno(err) : 0;
 }
Index: linux-2.6/fs/notify/fanotify/fanotify_user.c
===================================================================
--- linux-2.6.orig/fs/notify/fanotify/fanotify_user.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/notify/fanotify/fanotify_user.c	2010-08-19 09:46:31.000000000 +0200
@@ -454,7 +454,7 @@ static int fanotify_find_path(int dfd, c
 	}
 
 	/* you can only watch an inode if you have read permissions on it */
-	ret = inode_permission(path->dentry->d_inode, MAY_READ);
+	ret = dentry_permission(path->dentry, MAY_READ);
 	if (ret)
 		path_put(path);
 out:
Index: linux-2.6/fs/notify/inotify/inotify_user.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify_user.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/notify/inotify/inotify_user.c	2010-08-19 09:46:31.000000000 +0200
@@ -358,7 +358,7 @@ static int inotify_find_inode(const char
 	if (error)
 		return error;
 	/* you can only watch an inode if you have read permissions on it */
-	error = inode_permission(path->dentry->d_inode, MAY_READ);
+	error = dentry_permission(path->dentry, MAY_READ);
 	if (error)
 		path_put(path);
 	return error;
Index: linux-2.6/fs/ocfs2/refcounttree.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/refcounttree.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ocfs2/refcounttree.c	2010-08-19 09:46:31.000000000 +0200
@@ -4322,7 +4322,7 @@ static inline int ocfs2_may_create(struc
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
 }
 
 /* copied from user_path_parent. */
@@ -4395,7 +4395,7 @@ static int ocfs2_vfs_reflink(struct dent
 	 * file.
 	 */
 	if (!preserve) {
-		error = inode_permission(inode, MAY_READ);
+		error = dentry_permission(old_dentry, MAY_READ);
 		if (error)
 			return error;
 	}
Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c	2010-08-19 09:46:27.000000000 +0200
+++ linux-2.6/fs/open.c	2010-08-19 09:46:31.000000000 +0200
@@ -89,7 +89,7 @@ static long do_sys_truncate(const char _
 	if (error)
 		goto dput_and_out;
 
-	error = inode_permission(inode, MAY_WRITE);
+	error = dentry_permission(path.dentry, MAY_WRITE);
 	if (error)
 		goto mnt_drop_write_and_out;
 
@@ -328,7 +328,7 @@ SYSCALL_DEFINE3(faccessat, int, dfd, con
 			goto out_path_release;
 	}
 
-	res = inode_permission(inode, mode | MAY_ACCESS);
+	res = dentry_permission(path.dentry, mode | MAY_ACCESS);
 	/* SuS v2 requires we report a read only fs too */
 	if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
 		goto out_path_release;
@@ -367,7 +367,7 @@ SYSCALL_DEFINE1(chdir, const char __user
 	if (error)
 		goto out;
 
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
@@ -396,7 +396,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd
 	if (!S_ISDIR(inode->i_mode))
 		goto out_putf;
 
-	error = inode_permission(inode, MAY_EXEC | MAY_CHDIR);
+	error = dentry_permission(file->f_path.dentry, MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &file->f_path);
 out_putf:
@@ -414,7 +414,7 @@ SYSCALL_DEFINE1(chroot, const char __use
 	if (error)
 		goto out;
 
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
Index: linux-2.6/fs/utimes.c
===================================================================
--- linux-2.6.orig/fs/utimes.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/utimes.c	2010-08-19 09:46:31.000000000 +0200
@@ -96,7 +96,7 @@ static int utimes_common(struct path *pa
 			goto mnt_drop_write_and_out;
 
 		if (!is_owner_or_cap(inode)) {
-			error = inode_permission(inode, MAY_WRITE);
+			error = dentry_permission(path->dentry, MAY_WRITE);
 			if (error)
 				goto mnt_drop_write_and_out;
 		}
Index: linux-2.6/fs/xattr.c
===================================================================
--- linux-2.6.orig/fs/xattr.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/xattr.c	2010-08-19 09:46:31.000000000 +0200
@@ -26,8 +26,10 @@
  * because different namespaces have very different rules.
  */
 static int
-xattr_permission(struct inode *inode, const char *name, int mask)
+xattr_permission(struct dentry *dentry, const char *name, int mask)
 {
+	struct inode *inode = dentry->d_inode;
+
 	/*
 	 * We can never set or remove an extended attribute on a read-only
 	 * filesystem  or on an immutable / append-only inode.
@@ -63,7 +65,7 @@ xattr_permission(struct inode *inode, co
 			return -EPERM;
 	}
 
-	return inode_permission(inode, mask);
+	return dentry_permission(dentry, mask);
 }
 
 /**
@@ -115,7 +117,7 @@ vfs_setxattr(struct dentry *dentry, cons
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(dentry, name, MAY_WRITE);
 	if (error)
 		return error;
 
@@ -165,7 +167,7 @@ vfs_getxattr(struct dentry *dentry, cons
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_READ);
+	error = xattr_permission(dentry, name, MAY_READ);
 	if (error)
 		return error;
 
@@ -224,7 +226,7 @@ vfs_removexattr(struct dentry *dentry, c
 	if (!inode->i_op->removexattr)
 		return -EOPNOTSUPP;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(dentry, name, MAY_WRITE);
 	if (error)
 		return error;
 
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-08-19 09:46:27.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-08-19 09:46:31.000000000 +0200
@@ -1525,7 +1525,7 @@ struct inode_operations {
 	void * (*follow_link) (struct dentry *, struct nameidata *);
 	void (*put_link) (struct dentry *, struct nameidata *, void *);
 	void (*truncate) (struct inode *);
-	int (*permission) (struct inode *, int);
+	int (*permission) (struct dentry *, int);
 	int (*check_acl)(struct inode *, int);
 	int (*setattr) (struct dentry *, struct iattr *);
 	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
@@ -2111,7 +2111,7 @@ extern void emergency_remount(void);
 extern sector_t bmap(struct inode *, sector_t);
 #endif
 extern int notify_change(struct dentry *, struct iattr *);
-extern int inode_permission(struct inode *, int);
+extern int dentry_permission(struct dentry *, int);
 extern int generic_permission(struct inode *, int,
 		int (*check_acl)(struct inode *, int));
 
Index: linux-2.6/ipc/mqueue.c
===================================================================
--- linux-2.6.orig/ipc/mqueue.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/ipc/mqueue.c	2010-08-19 09:46:31.000000000 +0200
@@ -656,7 +656,7 @@ static struct file *do_open(struct ipc_n
 		goto err;
 	}
 
-	if (inode_permission(dentry->d_inode, oflag2acc[oflag & O_ACCMODE])) {
+	if (dentry_permission(dentry, oflag2acc[oflag & O_ACCMODE])) {
 		ret = -EACCES;
 		goto err;
 	}
Index: linux-2.6/net/unix/af_unix.c
===================================================================
--- linux-2.6.orig/net/unix/af_unix.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/net/unix/af_unix.c	2010-08-19 09:46:31.000000000 +0200
@@ -748,7 +748,7 @@ static struct sock *unix_find_other(stru
 		if (err)
 			goto fail;
 		inode = path.dentry->d_inode;
-		err = inode_permission(inode, MAY_WRITE);
+		err = dentry_permission(path.dentry, MAY_WRITE);
 		if (err)
 			goto put_fail;
 
Index: linux-2.6/fs/afs/internal.h
===================================================================
--- linux-2.6.orig/fs/afs/internal.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/afs/internal.h	2010-08-19 09:46:31.000000000 +0200
@@ -624,7 +624,7 @@ extern void afs_clear_permits(struct afs
 extern void afs_cache_permit(struct afs_vnode *, struct key *, long);
 extern void afs_zap_permits(struct rcu_head *);
 extern struct key *afs_request_key(struct afs_cell *);
-extern int afs_permission(struct inode *, int);
+extern int afs_permission(struct dentry *, int);
 
 /*
  * server.c
Index: linux-2.6/fs/afs/security.c
===================================================================
--- linux-2.6.orig/fs/afs/security.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/afs/security.c	2010-08-19 09:46:31.000000000 +0200
@@ -285,8 +285,9 @@ static int afs_check_permit(struct afs_v
  * - AFS ACLs are attached to directories only, and a file is controlled by its
  *   parent directory's ACL
  */
-int afs_permission(struct inode *inode, int mask)
+int afs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	afs_access_t uninitialized_var(access);
 	struct key *key;
Index: linux-2.6/fs/bad_inode.c
===================================================================
--- linux-2.6.orig/fs/bad_inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/bad_inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -229,7 +229,7 @@ static int bad_inode_readlink(struct den
 	return -EIO;
 }
 
-static int bad_inode_permission(struct inode *inode, int mask)
+static int bad_inode_permission(struct dentry *dentry, int mask)
 {
 	return -EIO;
 }
Index: linux-2.6/fs/btrfs/inode.c
===================================================================
--- linux-2.6.orig/fs/btrfs/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/btrfs/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -6922,8 +6922,10 @@ static int btrfs_set_page_dirty(struct p
 	return __set_page_dirty_nobuffers(page);
 }
 
-static int btrfs_permission(struct inode *inode, int mask)
+static int btrfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
+
 	if ((BTRFS_I(inode)->flags & BTRFS_INODE_READONLY) && (mask & MAY_WRITE))
 		return -EACCES;
 	return generic_permission(inode, mask, btrfs_check_acl);
Index: linux-2.6/fs/ceph/inode.c
===================================================================
--- linux-2.6.orig/fs/ceph/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ceph/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -1757,8 +1757,9 @@ int ceph_do_getattr(struct inode *inode,
  * Check inode permissions.  We verify we have a valid value for
  * the AUTH cap, then call the generic handler.
  */
-int ceph_permission(struct inode *inode, int mask)
+int ceph_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int err = ceph_do_getattr(inode, CEPH_CAP_AUTH_SHARED);
 
 	if (!err)
Index: linux-2.6/fs/ceph/super.h
===================================================================
--- linux-2.6.orig/fs/ceph/super.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ceph/super.h	2010-08-19 09:46:31.000000000 +0200
@@ -776,7 +776,7 @@ extern void ceph_queue_invalidate(struct
 extern void ceph_queue_writeback(struct inode *inode);
 
 extern int ceph_do_getattr(struct inode *inode, int mask);
-extern int ceph_permission(struct inode *inode, int mask);
+extern int ceph_permission(struct dentry *dentry, int mask);
 extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
 extern int ceph_getattr(struct vfsmount *mnt, struct dentry *dentry,
 			struct kstat *stat);
Index: linux-2.6/fs/cifs/cifsfs.c
===================================================================
--- linux-2.6.orig/fs/cifs/cifsfs.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/cifs/cifsfs.c	2010-08-19 09:46:31.000000000 +0200
@@ -269,8 +269,9 @@ cifs_statfs(struct dentry *dentry, struc
 	return 0;
 }
 
-static int cifs_permission(struct inode *inode, int mask)
+static int cifs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct cifs_sb_info *cifs_sb;
 
 	cifs_sb = CIFS_SB(inode->i_sb);
Index: linux-2.6/fs/coda/dir.c
===================================================================
--- linux-2.6.orig/fs/coda/dir.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/coda/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -138,8 +138,9 @@ exit:
 }
 
 
-int coda_permission(struct inode *inode, int mask)
+int coda_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
         int error = 0;
 
 	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
Index: linux-2.6/fs/fuse/dir.c
===================================================================
--- linux-2.6.orig/fs/fuse/dir.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/fuse/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -981,8 +981,9 @@ static int fuse_access(struct inode *ino
  * access request is sent.  Execute permission is still checked
  * locally based on file mode.
  */
-static int fuse_permission(struct inode *inode, int mask)
+static int fuse_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	bool refreshed = false;
 	int err = 0;
Index: linux-2.6/fs/gfs2/ops_inode.c
===================================================================
--- linux-2.6.orig/fs/gfs2/ops_inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/gfs2/ops_inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -1071,6 +1071,11 @@ int gfs2_permission(struct inode *inode,
 	return error;
 }
 
+static int gfs2_dentry_permission(struct dentry *dentry, int mask)
+{
+	return gfs2_permission(dentry->d_inode, mask);
+}
+
 /*
  * XXX(truncate): the truncate_setsize calls should be moved to the end.
  */
@@ -1344,7 +1349,7 @@ out:
 }
 
 const struct inode_operations gfs2_file_iops = {
-	.permission = gfs2_permission,
+	.permission = gfs2_dentry_permission,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
@@ -1364,7 +1369,7 @@ const struct inode_operations gfs2_dir_i
 	.rmdir = gfs2_rmdir,
 	.mknod = gfs2_mknod,
 	.rename = gfs2_rename,
-	.permission = gfs2_permission,
+	.permission = gfs2_dentry_permission,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
@@ -1378,7 +1383,7 @@ const struct inode_operations gfs2_symli
 	.readlink = generic_readlink,
 	.follow_link = gfs2_follow_link,
 	.put_link = gfs2_put_link,
-	.permission = gfs2_permission,
+	.permission = gfs2_dentry_permission,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
Index: linux-2.6/fs/coda/pioctl.c
===================================================================
--- linux-2.6.orig/fs/coda/pioctl.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/coda/pioctl.c	2010-08-19 09:46:31.000000000 +0200
@@ -26,7 +26,7 @@
 #include <linux/smp_lock.h>
 
 /* pioctl ops */
-static int coda_ioctl_permission(struct inode *inode, int mask);
+static int coda_ioctl_permission(struct dentry *dentry, int mask);
 static long coda_pioctl(struct file *filp, unsigned int cmd,
 			unsigned long user_data);
 
@@ -42,7 +42,7 @@ const struct file_operations coda_ioctl_
 };
 
 /* the coda pioctl inode ops */
-static int coda_ioctl_permission(struct inode *inode, int mask)
+static int coda_ioctl_permission(struct dentry *dentry, int mask)
 {
 	return (mask & MAY_EXEC) ? -EACCES : 0;
 }
Index: linux-2.6/fs/hostfs/hostfs_kern.c
===================================================================
--- linux-2.6.orig/fs/hostfs/hostfs_kern.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/hostfs/hostfs_kern.c	2010-08-19 09:46:31.000000000 +0200
@@ -746,8 +746,9 @@ int hostfs_rename(struct inode *from_ino
 	return err;
 }
 
-int hostfs_permission(struct inode *ino, int desired)
+static int hostfs_permission(struct dentry *dentry, int desired)
 {
+	struct inode *ino = dentry->d_inode;
 	char *name;
 	int r = 0, w = 0, x = 0, err;
 
Index: linux-2.6/fs/logfs/dir.c
===================================================================
--- linux-2.6.orig/fs/logfs/dir.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/logfs/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -555,11 +555,6 @@ static int logfs_symlink(struct inode *d
 	return __logfs_create(dir, dentry, inode, target, destlen);
 }
 
-static int logfs_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, NULL);
-}
-
 static int logfs_link(struct dentry *old_dentry, struct inode *dir,
 		struct dentry *dentry)
 {
@@ -818,7 +813,6 @@ const struct inode_operations logfs_dir_
 	.mknod		= logfs_mknod,
 	.rename		= logfs_rename,
 	.rmdir		= logfs_rmdir,
-	.permission	= logfs_permission,
 	.symlink	= logfs_symlink,
 	.unlink		= logfs_unlink,
 };
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/nfs/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -1941,8 +1941,9 @@ int nfs_may_open(struct inode *inode, st
 	return nfs_do_access(inode, cred, nfs_open_permission_mask(openflags));
 }
 
-int nfs_permission(struct inode *inode, int mask)
+int nfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct rpc_cred *cred;
 	int res = 0;
 
Index: linux-2.6/fs/nilfs2/nilfs.h
===================================================================
--- linux-2.6.orig/fs/nilfs2/nilfs.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/nilfs2/nilfs.h	2010-08-19 09:46:31.000000000 +0200
@@ -200,7 +200,7 @@ static inline struct inode *nilfs_dat_in
  */
 #ifdef CONFIG_NILFS_POSIX_ACL
 #error "NILFS: not yet supported POSIX ACL"
-extern int nilfs_permission(struct inode *, int, struct nameidata *);
+extern int nilfs_permission(struct dentry *, int);
 extern int nilfs_acl_chmod(struct inode *);
 extern int nilfs_init_acl(struct inode *, struct inode *);
 #else
Index: linux-2.6/fs/ocfs2/file.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ocfs2/file.c	2010-08-19 09:46:31.000000000 +0200
@@ -1310,8 +1310,9 @@ bail:
 	return err;
 }
 
-int ocfs2_permission(struct inode *inode, int mask)
+int ocfs2_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int ret;
 
 	mlog_entry_void();
Index: linux-2.6/fs/ocfs2/file.h
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ocfs2/file.h	2010-08-19 09:46:31.000000000 +0200
@@ -61,7 +61,7 @@ int ocfs2_zero_extend(struct inode *inod
 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
 int ocfs2_getattr(struct vfsmount *mnt, struct dentry *dentry,
 		  struct kstat *stat);
-int ocfs2_permission(struct inode *inode, int mask);
+int ocfs2_permission(struct dentry *dentry, int mask);
 
 int ocfs2_should_update_atime(struct inode *inode,
 			      struct vfsmount *vfsmnt);
Index: linux-2.6/fs/proc/base.c
===================================================================
--- linux-2.6.orig/fs/proc/base.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/proc/base.c	2010-08-19 09:46:31.000000000 +0200
@@ -2050,8 +2050,9 @@ static const struct file_operations proc
  * /proc/pid/fd needs a special permission handler so that a process can still
  * access /proc/self/fd after it has executed a setuid().
  */
-static int proc_fd_permission(struct inode *inode, int mask)
+static int proc_fd_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int rv;
 
 	rv = generic_permission(inode, mask, NULL);
Index: linux-2.6/fs/proc/proc_sysctl.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_sysctl.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/proc/proc_sysctl.c	2010-08-19 09:46:31.000000000 +0200
@@ -292,12 +292,13 @@ out:
 	return ret;
 }
 
-static int proc_sys_permission(struct inode *inode, int mask)
+static int proc_sys_permission(struct dentry *dentry, int mask)
 {
 	/*
 	 * sysctl entries that are not writeable,
 	 * are _NOT_ writeable, capabilities or not.
 	 */
+	struct inode *inode = dentry->d_inode;
 	struct ctl_table_header *head;
 	struct ctl_table *table;
 	int error;
Index: linux-2.6/fs/reiserfs/xattr.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/xattr.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/reiserfs/xattr.c	2010-08-19 09:46:31.000000000 +0200
@@ -954,8 +954,10 @@ static int xattr_mount_check(struct supe
 	return 0;
 }
 
-int reiserfs_permission(struct inode *inode, int mask)
+int reiserfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
+
 	/*
 	 * We don't do permission checks on the internal objects.
 	 * Permissions are determined by the "owning" object.
Index: linux-2.6/fs/smbfs/file.c
===================================================================
--- linux-2.6.orig/fs/smbfs/file.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/smbfs/file.c	2010-08-19 09:46:31.000000000 +0200
@@ -408,9 +408,9 @@ smb_file_release(struct inode *inode, st
  * privileges, so we need our own check for this.
  */
 static int
-smb_file_permission(struct inode *inode, int mask)
+smb_file_permission(struct dentry *dentry, int mask)
 {
-	int mode = inode->i_mode;
+	int mode = dentry->d_inode->i_mode;
 	int error = 0;
 
 	VERBOSE("mode=%x, mask=%x\n", mode, mask);
Index: linux-2.6/fs/sysfs/inode.c
===================================================================
--- linux-2.6.orig/fs/sysfs/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/sysfs/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -348,8 +348,9 @@ int sysfs_hash_and_remove(struct sysfs_d
 		return -ENOENT;
 }
 
-int sysfs_permission(struct inode *inode, int mask)
+int sysfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct sysfs_dirent *sd = inode->i_private;
 
 	mutex_lock(&sysfs_mutex);
Index: linux-2.6/fs/sysfs/sysfs.h
===================================================================
--- linux-2.6.orig/fs/sysfs/sysfs.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/sysfs/sysfs.h	2010-08-19 09:46:31.000000000 +0200
@@ -200,7 +200,7 @@ static inline void __sysfs_put(struct sy
 struct inode *sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd);
 void sysfs_evict_inode(struct inode *inode);
 int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
-int sysfs_permission(struct inode *inode, int mask);
+int sysfs_permission(struct dentry *dentry, int mask);
 int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
 int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
 int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
Index: linux-2.6/include/linux/coda_linux.h
===================================================================
--- linux-2.6.orig/include/linux/coda_linux.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/include/linux/coda_linux.h	2010-08-19 09:46:31.000000000 +0200
@@ -37,7 +37,7 @@ extern const struct file_operations coda
 /* operations shared over more than one file */
 int coda_open(struct inode *i, struct file *f);
 int coda_release(struct inode *i, struct file *f);
-int coda_permission(struct inode *inode, int mask);
+int coda_permission(struct dentry *dentry, int mask);
 int coda_revalidate_inode(struct dentry *);
 int coda_getattr(struct vfsmount *, struct dentry *, struct kstat *);
 int coda_setattr(struct dentry *, struct iattr *);
Index: linux-2.6/include/linux/nfs_fs.h
===================================================================
--- linux-2.6.orig/include/linux/nfs_fs.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/include/linux/nfs_fs.h	2010-08-19 09:46:31.000000000 +0200
@@ -348,7 +348,7 @@ extern int nfs_refresh_inode(struct inod
 extern int nfs_post_op_update_inode(struct inode *inode, struct nfs_fattr *fattr);
 extern int nfs_post_op_update_inode_force_wcc(struct inode *inode, struct nfs_fattr *fattr);
 extern int nfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
-extern int nfs_permission(struct inode *, int);
+extern int nfs_permission(struct dentry *, int);
 extern int nfs_open(struct inode *, struct file *);
 extern int nfs_release(struct inode *, struct file *);
 extern int nfs_attribute_timeout(struct inode *inode);
Index: linux-2.6/include/linux/reiserfs_xattr.h
===================================================================
--- linux-2.6.orig/include/linux/reiserfs_xattr.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/include/linux/reiserfs_xattr.h	2010-08-19 09:46:31.000000000 +0200
@@ -41,7 +41,7 @@ int reiserfs_xattr_init(struct super_blo
 int reiserfs_lookup_privroot(struct super_block *sb);
 int reiserfs_delete_xattrs(struct inode *inode);
 int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs);
-int reiserfs_permission(struct inode *inode, int mask);
+int reiserfs_permission(struct dentry *dentry, int mask);
 
 #ifdef CONFIG_REISERFS_FS_XATTR
 #define has_xattr_dir(inode) (REISERFS_I(inode)->i_flags & i_has_xattr_dir)

-- 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 3/5] vfs: add flag to allow rename to same inode
  2010-08-26 18:33 [PATCH 0/5] hybrid union filesystem prototype Miklos Szeredi
  2010-08-26 18:33 ` [PATCH 1/5] vfs: implement open "forwarding" Miklos Szeredi
  2010-08-26 18:33 ` [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
@ 2010-08-26 18:33 ` Miklos Szeredi
  2010-08-26 18:33 ` [PATCH 4/5] vfs: export do_splice_direct() to modules Miklos Szeredi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-26 18:33 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro, jblunck, hch

[-- Attachment #1: vfs-fs_rename_self_allow.patch --]
[-- Type: text/plain, Size: 1523 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Hybrid union filesystem uses dummy inodes for non-directories.  Allow
rename to work in this case despite the inode being the same.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namei.c         |    4 +++-
 include/linux/fs.h |    1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-08-25 14:19:34.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-08-25 14:19:53.000000000 +0200
@@ -179,6 +179,7 @@ struct inodes_stat_t {
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
 					 */
+#define FS_RENAME_SELF_ALLOW	65536	/* Allow rename to same inode */
 
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2010-08-25 10:19:53.000000000 +0200
+++ linux-2.6/fs/namei.c	2010-08-25 14:22:56.000000000 +0200
@@ -2620,8 +2620,10 @@ int vfs_rename(struct inode *old_dir, st
 	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
 	const unsigned char *old_name;
 
-	if (old_dentry->d_inode == new_dentry->d_inode)
+	if (old_dentry->d_inode == new_dentry->d_inode &&
+	    !(old_dir->i_sb->s_type->fs_flags & FS_RENAME_SELF_ALLOW)) {
  		return 0;
+	}
  
 	error = may_delete(old_dir, old_dentry, is_dir);
 	if (error)

-- 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 4/5] vfs: export do_splice_direct() to modules
  2010-08-26 18:33 [PATCH 0/5] hybrid union filesystem prototype Miklos Szeredi
                   ` (2 preceding siblings ...)
  2010-08-26 18:33 ` [PATCH 3/5] vfs: add flag to allow rename to same inode Miklos Szeredi
@ 2010-08-26 18:33 ` Miklos Szeredi
  2010-08-26 18:33 ` [PATCH 5/5] union: hybrid union filesystem prototype Miklos Szeredi
  2010-08-27  7:05 ` [PATCH 0/5] " Neil Brown
  5 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-26 18:33 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro, jblunck, hch

[-- Attachment #1: vfs-export-do_splice_direct.patch --]
[-- Type: text/plain, Size: 678 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Export do_splice_direct() to modules.  Needed by hybrid union
filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/splice.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2010-08-13 16:07:00.000000000 +0200
+++ linux-2.6/fs/splice.c	2010-08-25 18:59:08.000000000 +0200
@@ -1307,6 +1307,7 @@ long do_splice_direct(struct file *in, l
 
 	return ret;
 }
+EXPORT_SYMBOL(do_splice_direct);
 
 static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,

-- 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 5/5] union: hybrid union filesystem prototype
  2010-08-26 18:33 [PATCH 0/5] hybrid union filesystem prototype Miklos Szeredi
                   ` (3 preceding siblings ...)
  2010-08-26 18:33 ` [PATCH 4/5] vfs: export do_splice_direct() to modules Miklos Szeredi
@ 2010-08-26 18:33 ` Miklos Szeredi
  2010-09-01 21:42   ` Valerie Aurora
  2010-09-02 21:42   ` Valerie Aurora
  2010-08-27  7:05 ` [PATCH 0/5] " Neil Brown
  5 siblings, 2 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-26 18:33 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro, jblunck, hch

[-- Attachment #1: union.patch --]
[-- Type: text/plain, Size: 43020 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

This union filesystem is a hybrid of entirely filesystem based
(unionfs, aufs) and entierly VFS based (union mounts) solutions.

The dentry tree is duplicated from the underlying filesystems, this
enables fast cached lookups without adding special support into the
VFS.  This uses slightly more memory than union mounts, but dentries
are relatively small.

Inode structures are only duplicated for directories.  Regular files,
symlinks and special files each share a single inode.  This means that
locking victim for unlink is a quasi-filesystem lock, which is
suboptimal, but could be worked around in the VFS.

Opening non directories results in the open forwarded to the
underlying filesystem.  This makes the behavior very similar to union
mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
descriptors).

Usage:

  mount -t union -olowerdir=/union/lower,upperdir=/union/upper union /mnt/union

Supported:

 - all operations

Missing:

 - upgrade credentials for copy-up
 - ensure that filesystems part of the union are not modified outside
   the union
 - optimize directory merging and caching

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/Kconfig        |    1 
 fs/Makefile       |    1 
 fs/union/Kconfig  |    4 
 fs/union/Makefile |    5 
 fs/union/union.c  | 1714 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1725 insertions(+)

Index: linux-2.6/fs/union/union.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/union/union.c	2010-08-26 19:12:09.000000000 +0200
@@ -0,0 +1,1714 @@
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/sched.h>
+#include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/splice.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+
+MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
+MODULE_DESCRIPTION("Union filesystem");
+MODULE_LICENSE("GPL");
+
+struct union_fs {
+	struct inode *symlink_inode;
+	struct inode *regular_inode;
+	struct inode *special_inode;
+};
+
+struct union_entry {
+	struct path upperpath;
+	struct path lowerpath;
+	bool opaque;
+};
+
+static const char *union_whiteout_xattr = "trusted.union.whiteout";
+static const char *union_opaque_xattr = "trusted.union.opaque";
+
+static struct path *union_path(struct union_entry *ue)
+{
+	return ue->upperpath.dentry ? &ue->upperpath : &ue->lowerpath;
+}
+
+static struct file *path_open(struct path *path, int flags)
+{
+	const struct cred *cred = current_cred();
+
+	path_get(path);
+	return dentry_open(path->dentry, path->mnt, flags, cred);
+}
+
+static int union_real_readdir(struct file *file, void *buf, filldir_t filler)
+{
+	int err;
+	struct file *realfile = file->private_data;
+
+	err = vfs_readdir(realfile, filler, buf);
+	file->f_pos = realfile->f_pos;
+
+	return err;
+}
+
+static loff_t union_real_llseek(struct file *file, loff_t offset, int origin)
+{
+	loff_t res;
+	struct file *realfile = file->private_data;
+
+	res = generic_file_llseek(realfile, offset, origin);
+	file->f_pos = realfile->f_pos;
+
+	return res;
+}
+
+static int union_real_dir_fsync(struct file *file, int datasync)
+{
+	struct file *realfile = file->private_data;
+	return vfs_fsync(realfile, datasync);
+}
+
+static int union_real_dir_release(struct inode *inode, struct file *file)
+{
+	struct file *realfile = file->private_data;
+
+	fput(realfile);
+	return 0;
+}
+
+static const struct file_operations union_real_dir_operations = {
+	.read		= generic_read_dir,
+	.readdir	= union_real_readdir,
+	.llseek		= union_real_llseek,
+	.fsync		= union_real_dir_fsync,
+	.release	= union_real_dir_release,
+};
+
+static int union_real_dir_open(struct file *file)
+{
+	struct union_entry *ue = file->f_path.dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+	struct file *realfile;
+
+	realfile = path_open(realpath, file->f_flags);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+
+	file->private_data = realfile;
+	file->f_op = &union_real_dir_operations;
+
+	return 0;
+}
+
+static int union_dir_open(struct inode *inode, struct file *file)
+{
+	struct union_entry *ue = file->f_path.dentry->d_fsdata;
+
+	if (ue->upperpath.dentry && ue->lowerpath.dentry)
+		return 0;
+	else
+		return union_real_dir_open(file);
+}
+
+static bool union_is_whiteout(struct dentry *dentry)
+{
+	int res;
+	char val;
+
+	if (!dentry)
+		return false;
+	if (!dentry->d_inode)
+		return false;
+	if (!S_ISLNK(dentry->d_inode->i_mode))
+		return false;
+
+	res = vfs_getxattr(dentry, union_whiteout_xattr, &val, 1);
+	if (res == 1 && val == 'y')
+		return true;
+
+	return false;
+}
+
+static bool union_is_opaquedir(struct dentry *dentry)
+{
+	int res;
+	char val;
+
+	if (!S_ISDIR(dentry->d_inode->i_mode))
+		return false;
+
+	res = vfs_getxattr(dentry, union_opaque_xattr, &val, 1);
+	if (res == 1 && val == 'y')
+		return true;
+
+	return false;
+}
+
+struct union_cache_entry {
+	struct union_cache_entry *next;
+	struct qstr name;
+	unsigned int type;
+	u64 ino;
+	bool is_whiteout;
+};
+
+struct union_cache_callback {
+	struct union_cache_entry *list;
+	struct union_cache_entry **endp;
+	struct path path;
+	int count;
+};
+
+static int union_cache_add_entry(struct union_cache_callback *cb,
+				 const char *name, int namelen, u64 ino,
+				 unsigned int d_type, bool is_whiteout)
+{
+	struct union_cache_entry *p;
+
+	p = kmalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	p->name.name = kstrndup(name, namelen, GFP_KERNEL);
+	if (!p->name.name) {
+		kfree(p);
+		return -ENOMEM;
+	}
+	p->name.len = namelen;
+	p->name.hash = 0;
+	p->type = d_type;
+	p->ino = ino;
+	p->is_whiteout = is_whiteout;
+	p->next = NULL;
+	*cb->endp = p;
+	cb->endp = &p->next;
+
+	return 0;
+}
+
+static void union_cache_free(struct union_cache_entry *p)
+{
+	while (p) {
+		struct union_cache_entry *next = p->next;
+
+		kfree(p->name.name);
+		kfree(p);
+		p = next;
+	}
+}
+
+static int union_cache_find_entry(struct union_cache_entry *start,
+				  const char *name, int namelen)
+{
+	struct union_cache_entry *p;
+	int ret = 0;
+
+	for (p = start; p; p = p->next) {
+		if (p->name.len != namelen)
+			continue;
+		if (strncmp(p->name.name, name, namelen) == 0) {
+			ret = 1;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int union_fill_lower(void *buf, const char *name, int namlen,
+			    loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct union_cache_callback *cb = buf;
+
+	cb->count++;
+	if (!union_cache_find_entry(cb->list, name, namlen))
+		union_cache_add_entry(cb, name, namlen, ino, d_type, false);
+
+	return 0;
+}
+
+static int union_fill_upper(void *buf, const char *name, int namlen,
+			    loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct union_cache_callback *cb = buf;
+	bool is_whiteout = false;
+
+	cb->count++;
+	if (d_type == DT_LNK) {
+		struct dentry *dentry;
+
+		dentry = lookup_one_len(name, cb->path.dentry, strlen(name));
+		if (!IS_ERR(dentry)) {
+			is_whiteout = union_is_whiteout(dentry);
+			dput(dentry);
+		}
+	}
+	union_cache_add_entry(cb, name, namlen, ino, d_type, is_whiteout);
+
+	return 0;
+}
+
+static int union_fill_cache(struct path *realpath,
+			    struct union_cache_callback *cb,
+			    filldir_t filler)
+{
+	struct file *realfile;
+	int err;
+
+	realfile = path_open(realpath, O_RDONLY | O_DIRECTORY);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+
+	do {
+		cb->count = 0;
+		err = vfs_readdir(realfile, filler, cb);
+	} while (err >= 0 && cb->count);
+	fput(realfile);
+
+	if (err < 0) {
+		union_cache_free(cb->list);
+		cb->list = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+static int union_readdir(struct file *file, void *buf, filldir_t filler)
+{
+	struct union_entry *ue = file->f_path.dentry->d_fsdata;
+	struct union_cache_entry *union_cache = file->private_data;
+	struct union_cache_entry *p;
+	loff_t off;
+	int res = 0;
+
+	if (!file->f_pos && union_cache) {
+		union_cache_free(union_cache);
+		union_cache = NULL;
+	}
+
+	if (!union_cache) {
+		struct union_cache_callback cb;
+
+		cb.list = NULL;
+		cb.endp = &cb.list;
+		cb.path = ue->upperpath;
+
+		res = union_fill_cache(&ue->upperpath, &cb, union_fill_upper);
+		if (!res) {
+			res = union_fill_cache(&ue->lowerpath, &cb,
+					       union_fill_lower);
+		}
+		if (res)
+			return res;
+
+		union_cache = cb.list;
+		file->private_data = union_cache;
+	}
+
+	off = 0;
+	for (p = union_cache; p; p = p->next) {
+		int over;
+
+		if (p->is_whiteout)
+			continue;
+
+		off++;
+		if (off <= file->f_pos)
+			continue;
+
+		over = filler(buf, p->name.name, p->name.len, off - 1,
+			      p->ino, p->type);
+		if (over)
+			break;
+
+		file->f_pos = off;
+	}
+
+	return res;
+}
+
+static int union_dir_fsync(struct file *file, int datasync)
+{
+	int err;
+	struct union_entry *ue = file->f_path.dentry->d_fsdata;
+	struct file *realfile;
+
+	realfile = path_open(&ue->upperpath, O_RDONLY);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+
+	err = vfs_fsync(realfile, datasync);
+	fput(realfile);
+
+	return err;
+}
+
+static int union_dir_release(struct inode *inode, struct file *file)
+{
+	struct union_cache_entry *union_cache = file->private_data;
+
+	union_cache_free(union_cache);
+
+	return 0;
+}
+
+static const struct file_operations union_dir_operations = {
+	.read		= generic_read_dir,
+	.open		= union_dir_open,
+	.readdir	= union_readdir,
+	.llseek		= generic_file_llseek,
+	.fsync		= union_dir_fsync,
+	.release	= union_dir_release,
+};
+
+static const struct inode_operations union_dir_inode_operations;
+
+static void union_dentry_release(struct dentry *dentry)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+
+	if (ue) {
+		path_put(&ue->upperpath);
+		path_put(&ue->lowerpath);
+		kfree(ue);
+	}
+}
+
+static void union_dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+
+	path_put(&ue->upperpath);
+	path_put(&ue->lowerpath);
+	ue->upperpath.dentry = NULL;
+	ue->upperpath.mnt = NULL;
+	ue->lowerpath.dentry = NULL;
+	ue->lowerpath.mnt = NULL;
+	iput(inode);
+}
+
+static const struct dentry_operations union_dentry_operations = {
+	.d_release = union_dentry_release,
+	.d_iput = union_dentry_iput,
+};
+
+static struct inode *union_new_inode(struct super_block *sb, umode_t mode)
+{
+	struct union_fs *ufs = sb->s_fs_info;
+	struct inode *inode;
+
+	switch (mode & S_IFMT) {
+	case S_IFDIR:
+		inode = new_inode(sb);
+		inode->i_flags |= S_NOATIME|S_NOCMTIME;
+		inode->i_op = &union_dir_inode_operations;
+		inode->i_fop = &union_dir_operations;
+		inode->i_mode = mode & S_IFMT;
+		break;
+
+	case S_IFLNK:
+		inode = ufs->symlink_inode;
+		atomic_inc(&inode->i_count);
+		break;
+
+	case S_IFREG:
+		inode = ufs->regular_inode;
+		atomic_inc(&inode->i_count);
+		break;
+
+	case S_IFSOCK:
+	case S_IFBLK:
+	case S_IFCHR:
+	case S_IFIFO:
+		inode = ufs->special_inode;
+		atomic_inc(&inode->i_count);
+		break;
+
+	default:
+		WARN(1, "illegal file type: %i\n", mode & S_IFMT);
+		inode = NULL;
+	}
+
+	return inode;
+
+}
+
+static struct dentry *union_lookup_real(struct dentry *dir, struct qstr *name)
+{
+	struct dentry *dentry;
+
+	mutex_lock(&dir->d_inode->i_mutex);
+	dentry = lookup_one_len(name->name, dir, name->len);
+	mutex_unlock(&dir->d_inode->i_mutex);
+
+	if (IS_ERR(dentry)) {
+		if (PTR_ERR(dentry) == -ENOENT)
+			dentry = NULL;
+	} else if (!dentry->d_inode) {
+		dput(dentry);
+		dentry = NULL;
+	}
+	return dentry;
+}
+
+static struct dentry *union_lookup(struct inode *dir, struct dentry *dentry,
+				   struct nameidata *nd)
+{
+	struct union_entry *pue = dentry->d_parent->d_fsdata;
+	struct union_entry *ue;
+	struct dentry *upperdir = pue->upperpath.dentry;
+	struct dentry *upperdentry = NULL;
+	struct dentry *lowerdir = pue->lowerpath.dentry;
+	struct dentry *lowerdentry = NULL;
+	struct inode *inode = NULL;
+	int err;
+
+	err = -ENOMEM;
+	ue = kzalloc(sizeof(struct union_entry), GFP_KERNEL);
+	if (!ue)
+		goto out;
+
+	if (upperdir) {
+		upperdentry = union_lookup_real(upperdir, &dentry->d_name);
+		err = PTR_ERR(upperdentry);
+		if (IS_ERR(upperdentry))
+			goto out_free;
+
+		if (upperdentry) {
+			if (union_is_opaquedir(upperdentry))
+				ue->opaque = true;
+			else if (union_is_whiteout(upperdentry)) {
+				dput(upperdentry);
+				upperdentry = NULL;
+				ue->opaque = true;
+			}
+		}
+	}
+	if (lowerdir && !ue->opaque) {
+		lowerdentry = union_lookup_real(lowerdir, &dentry->d_name);
+		if (IS_ERR(lowerdentry)) {
+			err = PTR_ERR(lowerdentry);
+			dput(upperdentry);
+			goto out_free;
+		}
+	}
+
+	if (lowerdentry && upperdentry &&
+	    (!S_ISDIR(upperdentry->d_inode->i_mode) ||
+	     !S_ISDIR(lowerdentry->d_inode->i_mode))) {
+		dput(lowerdentry);
+		lowerdentry = NULL;
+		ue->opaque = true;
+	}
+
+	if (lowerdentry || upperdentry) {
+		struct dentry *realdentry;
+
+		realdentry = upperdentry ? upperdentry : lowerdentry;
+		inode = union_new_inode(dir->i_sb, realdentry->d_inode->i_mode);
+		if (!inode)
+			goto out_dput;
+	}
+
+	if (upperdentry) {
+		ue->upperpath.mnt = pue->upperpath.mnt;
+		ue->upperpath.dentry = upperdentry;
+		path_get(&ue->upperpath);
+		dput(upperdentry);
+	}
+	if (lowerdentry) {
+		ue->lowerpath.mnt = pue->lowerpath.mnt;
+		ue->lowerpath.dentry = lowerdentry;
+		path_get(&ue->lowerpath);
+		dput(lowerdentry);
+	}
+
+	d_add(dentry, inode);
+	dentry->d_fsdata = ue;
+	dentry->d_op = &union_dentry_operations;
+
+	return NULL;
+
+out_dput:
+	dput(upperdentry);
+	dput(lowerdentry);
+out_free:
+	kfree(ue);
+out:
+	return ERR_PTR(err);
+}
+
+static int union_copy_up_xattr(struct dentry *old, struct dentry *new)
+{
+	ssize_t list_size, size;
+	char *buf, *name, *value;
+	int error;
+
+	if (!old->d_inode->i_op->getxattr ||
+	    !new->d_inode->i_op->getxattr)
+		return 0;
+
+	list_size = vfs_listxattr(old, NULL, 0);
+	if (list_size <= 0)
+		return list_size;
+
+	buf = kzalloc(list_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	error = -ENOMEM;
+	value = kmalloc(XATTR_SIZE_MAX, GFP_KERNEL);
+	if (!value)
+		goto out;
+
+	list_size = vfs_listxattr(old, buf, list_size);
+	if (list_size <= 0) {
+		error = list_size;
+		goto out_free_value;
+	}
+
+	for (name = buf; name < (buf + list_size); name += strlen(name) + 1) {
+		size = vfs_getxattr(old, name, value, XATTR_SIZE_MAX);
+		if (size <= 0) {
+			error = size;
+			goto out_free_value;
+		}
+		error = vfs_setxattr(new, name, value, size, 0);
+		if (error)
+			goto out_free_value;
+	}
+
+out_free_value:
+	kfree(value);
+out:
+	kfree(buf);
+	return error;
+}
+
+static int union_copy_up_data(struct path *old, struct path *new, size_t len)
+{
+	struct file *old_file;
+	struct file *new_file;
+	loff_t offset = 0;
+	long bytes;
+	int error = 0;
+
+	if (len == 0)
+		return 0;
+
+	old_file = path_open(old, O_RDONLY);
+	if (IS_ERR(old_file))
+		return PTR_ERR(old_file);
+
+	new_file = path_open(new, O_WRONLY);
+	if (IS_ERR(new_file)) {
+		error = PTR_ERR(new_file);
+		goto out_fput;
+	}
+
+	bytes = do_splice_direct(old_file, &offset, new_file, len,
+				 SPLICE_F_MOVE);
+	if (bytes < 0)
+		error = bytes;
+
+	fput(new_file);
+out_fput:
+	fput(old_file);
+	return error;
+}
+
+static struct dentry *union_lookup_create(struct union_entry *ue,
+					  struct union_entry *pue,
+					  struct qstr *name)
+{
+	int err;
+	struct inode *upperdir = pue->upperpath.dentry->d_inode;
+	struct dentry *newdentry;
+
+	newdentry = lookup_one_len(name->name, pue->upperpath.dentry, name->len);
+	if (IS_ERR(newdentry))
+		return newdentry;
+
+	if (ue->opaque) {
+		err = -EINVAL;
+		if (WARN_ON(!union_is_whiteout(newdentry)))
+			goto out_dput;
+
+		err = vfs_unlink(upperdir, newdentry);
+		if (err)
+			goto out_dput;
+
+		dput(newdentry);
+		newdentry = lookup_one_len(name->name, pue->upperpath.dentry, name->len);
+		if (IS_ERR(newdentry))
+			return newdentry;
+	}
+
+	err = -EEXIST;
+	if (newdentry->d_inode)
+		goto out_dput;
+
+	return newdentry;
+
+out_dput:
+	dput(newdentry);
+	return ERR_PTR(err);
+}
+
+static int union_upper_create(struct dentry *dentry, struct iattr *attr,
+			      dev_t rdev, const char *link, struct path *src)
+{
+	int err;
+	int attr_update = ATTR_UID | ATTR_GID | ATTR_ATIME_SET | ATTR_MTIME_SET;
+	struct dentry *parent = dget_parent(dentry);
+	struct union_entry *ue = dentry->d_fsdata;
+	struct union_entry *pue = parent->d_fsdata;
+	struct inode *upperdir = pue->upperpath.dentry->d_inode;
+	struct dentry *newdentry;
+	struct path newpath;
+
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+
+	/*
+	 * Using upper filesystem locking to protect against copy up
+	 * racing with rename (rename means the copy up was already
+	 * successful).
+	 */
+	err = -EEXIST;
+	if (dentry->d_parent != parent)
+		goto out_unlock;
+
+	newdentry = union_lookup_create(ue, pue, &dentry->d_name);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out_unlock;
+
+	newpath.dentry = newdentry;
+	newpath.mnt = pue->upperpath.mnt;
+
+	switch (attr->ia_mode & S_IFMT) {
+	case S_IFREG:
+		if (src)
+			WARN_ON(!(attr->ia_valid & ATTR_SIZE));
+		else
+			WARN_ON((attr->ia_valid & ATTR_SIZE));
+
+		err = vfs_create(upperdir, newdentry, attr->ia_mode, NULL);
+		if (!err && src) {
+			attr_update |= ATTR_SIZE;
+			err = union_copy_up_data(src, &newpath, attr->ia_size);
+		}
+		break;
+
+	case S_IFDIR:
+		err = vfs_mkdir(upperdir, newdentry, attr->ia_mode);
+		break;
+
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFIFO:
+	case S_IFSOCK:
+		err = vfs_mknod(upperdir, newdentry, attr->ia_mode, rdev);
+		break;
+
+	case S_IFLNK:
+		err = vfs_symlink(upperdir, newdentry, link);
+		break;
+
+	default:
+		err = -EPERM;
+	}
+
+	if (!err && (attr->ia_valid & attr_update)) {
+		mutex_lock(&newdentry->d_inode->i_mutex);
+		err = notify_change(newdentry, attr);
+		mutex_unlock(&newdentry->d_inode->i_mutex);
+	}
+
+	if (!err && src)
+		err = union_copy_up_xattr(src->dentry, newdentry);
+
+	if (!err && ue->opaque && S_ISDIR(newdentry->d_inode->i_mode))
+		err = vfs_setxattr(newdentry, union_opaque_xattr, "y", 1, 0);
+
+	if (!err) {
+		ue->upperpath = newpath;
+		path_get(&ue->upperpath);
+		/* FIXME: release lowerpath? */
+		if (ue->lowerpath.dentry)
+			ue->opaque = true;
+	} else if (!ue->opaque) {
+		vfs_unlink(upperdir, newdentry);
+	}
+
+	dput(newdentry);
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+	dput(parent);
+
+	return err;
+}
+
+static char *union_read_symlink(struct path *path)
+{
+	int res;
+	char *buf;
+	struct inode *inode = path->dentry->d_inode;
+	mm_segment_t old_fs;
+
+	res = -EINVAL;
+	if (!inode->i_op->readlink)
+		goto err;
+
+	res = -ENOMEM;
+	buf = (char *) __get_free_page(GFP_KERNEL);
+	if (!buf)
+		goto err;
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+	/* The cast to a user pointer is valid due to the set_fs() */
+	res = inode->i_op->readlink(path->dentry,
+				    (char __user *)buf, PAGE_SIZE - 1);
+	set_fs(old_fs);
+	if (res < 0) {
+		free_page((unsigned long) buf);
+		goto err;
+	}
+	buf[res] = '\0';
+
+	return buf;
+
+err:
+	return ERR_PTR(res);
+}
+
+static int union_copy_up_one(struct dentry *dentry, struct iattr *attr)
+{
+	int err;
+	struct union_entry *ue = dentry->d_fsdata;
+	struct inode *lowerinode = ue->lowerpath.dentry->d_inode;
+	struct iattr newattr = *attr;
+	char *link = NULL;
+
+	/* FIXME: use getattr() instead of accessing attributes directly */
+
+	/* ATTR_KILL_S*ID trumps ATTR_MODE */
+	if (newattr.ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID)) {
+		newattr.ia_mode = lowerinode->i_mode;
+		if (newattr.ia_valid & ATTR_KILL_SUID)
+			newattr.ia_mode &= ~S_ISUID;
+		if (newattr.ia_valid & ATTR_KILL_SGID)
+			newattr.ia_mode &= ~S_ISGID;
+		newattr.ia_valid &= ~(ATTR_KILL_SUID | ATTR_KILL_SGID);
+	} else if (!(newattr.ia_valid & ATTR_MODE)) {
+		newattr.ia_mode = lowerinode->i_mode;
+		newattr.ia_valid |= ATTR_MODE;
+	} else {
+		newattr.ia_mode &= 07777;
+		newattr.ia_mode |= lowerinode->i_mode & S_IFMT;
+	}
+	if (!(newattr.ia_valid & ATTR_UID)) {
+		newattr.ia_uid = lowerinode->i_uid;
+		newattr.ia_valid |= ATTR_UID;
+	}
+	if (!(newattr.ia_valid & ATTR_GID)) {
+		newattr.ia_gid = lowerinode->i_gid;
+		newattr.ia_valid |= ATTR_GID;
+	}
+	if (!(newattr.ia_valid & ATTR_SIZE)) {
+		newattr.ia_size = lowerinode->i_size;
+		newattr.ia_valid |= ATTR_SIZE;
+	}
+	if (!(newattr.ia_valid & ATTR_ATIME)) {
+		newattr.ia_atime = lowerinode->i_atime;
+		newattr.ia_valid |= ATTR_ATIME | ATTR_ATIME_SET;
+	}
+	if (!(newattr.ia_valid & ATTR_MTIME)) {
+		newattr.ia_mtime = lowerinode->i_mtime;
+		newattr.ia_valid |= ATTR_MTIME | ATTR_MTIME_SET;
+	}
+
+	newattr.ia_valid &= ~ATTR_CTIME;
+
+	if (S_ISLNK(lowerinode->i_mode)) {
+		link = union_read_symlink(&ue->lowerpath);
+		if (IS_ERR(link))
+			return PTR_ERR(link);
+	}
+
+	err = union_upper_create(dentry, &newattr, lowerinode->i_rdev, link,
+				 &ue->lowerpath);
+
+	if (link)
+		free_page((unsigned long) link);
+
+	/* Already copied up? */
+	if (err == -EEXIST && ue->upperpath.dentry)
+		err = 0;
+
+	return err;
+}
+
+static int union_copy_up_attr(struct dentry *dentry, struct iattr *attr)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	int err = 0;
+
+	while (!err && !ue->upperpath.dentry) {
+		struct dentry *next = dget(dentry);
+		struct dentry *parent;
+
+		/* find the topmost dentry not yet copied up */
+		for (;;) {
+			struct union_entry *pue;
+
+			parent = dget_parent(next);
+			pue = parent->d_fsdata;
+
+			if (pue->upperpath.dentry)
+				break;
+
+			dput(next);
+			next = parent;
+		}
+		if (next == dentry) {
+			err = union_copy_up_one(next, attr);
+		} else {
+			struct iattr noattr = { .ia_valid = 0 };
+			err = union_copy_up_one(next, &noattr);
+		}
+		dput(parent);
+		dput(next);
+	}
+	return err;
+}
+
+static int union_copy_up(struct dentry *dentry)
+{
+	struct iattr noattr = { .ia_valid = 0 };
+
+	return union_copy_up_attr(dentry, &noattr);
+}
+
+static int union_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	struct inode *inode;
+	struct union_entry *ue = dentry->d_fsdata;
+	int err;
+
+	if (!ue->upperpath.dentry)
+		return union_copy_up_attr(dentry, attr);
+
+	inode = ue->upperpath.dentry->d_inode;
+
+	mutex_lock(&inode->i_mutex);
+	err = notify_change(ue->upperpath.dentry, attr);
+	mutex_unlock(&inode->i_mutex);
+
+	return err;
+}
+
+static int union_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			 struct kstat *stat)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+
+	return vfs_getattr(realpath->mnt, realpath->dentry, stat);
+}
+
+static int union_dir_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			 struct kstat *stat)
+{
+	int err;
+	struct union_entry *ue = dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+
+	err = vfs_getattr(realpath->mnt, realpath->dentry, stat);
+
+	/* FIXME: better st_ino management */
+	stat->ino = dentry->d_inode->i_ino;
+
+	return err;
+}
+
+static int union_permission(struct dentry *dentry, int mask)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct inode *inode;
+	int err;
+
+	if (ue->upperpath.dentry)
+		return dentry_permission(ue->upperpath.dentry, mask);
+
+	inode = ue->lowerpath.dentry->d_inode;
+	if (!(mask & MAY_WRITE) || special_file(inode->i_mode))
+		return dentry_permission(ue->lowerpath.dentry, mask);
+
+	/* Don't check for read-only fs */
+	if (mask & MAY_WRITE) {
+		if (IS_IMMUTABLE(inode))
+			return -EACCES;
+	}
+
+	if (inode->i_op->permission)
+		err = inode->i_op->permission(ue->lowerpath.dentry, mask);
+	else
+		err = generic_permission(inode, mask, inode->i_op->check_acl);
+
+	if (err)
+		return err;
+
+	return security_inode_permission(inode, mask);
+}
+
+static int union_create_object(struct dentry *dentry, int mode, dev_t rdev,
+			       const char *link)
+{
+	int err;
+	struct inode *inode;
+	struct union_entry *ue;
+	struct iattr attr = {
+		.ia_valid = ATTR_MODE,
+		.ia_mode = mode,
+	};
+
+	ue = dentry->d_fsdata;
+
+	err = -ENOMEM;
+	inode = union_new_inode(dentry->d_sb, mode);
+	if (!inode)
+		goto out;
+
+	err = union_copy_up(dentry->d_parent);
+	if (err)
+		goto out_iput;
+
+	err = union_upper_create(dentry, &attr, rdev, link, NULL);
+	if (err)
+		goto out_iput;
+
+	d_instantiate(dentry, inode);
+
+	return 0;
+
+out_iput:
+	iput(inode);
+out:
+	return err;
+}
+
+static int union_create(struct inode *dir, struct dentry *dentry, int mode,
+			struct nameidata *nd)
+{
+	return union_create_object(dentry, (mode & 07777) | S_IFREG, 0, NULL);
+}
+
+static int union_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+	return union_create_object(dentry, (mode & 07777) | S_IFDIR, 0, NULL);
+}
+
+static int union_mknod(struct inode *dir, struct dentry *dentry, int mode,
+		       dev_t rdev)
+{
+	return union_create_object(dentry, mode, rdev, NULL);
+}
+
+static int union_symlink(struct inode *dir, struct dentry *dentry,
+			 const char *link)
+{
+	return union_create_object(dentry, S_IFLNK, 0, link);
+}
+
+static void *union_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+	struct inode *realinode = realpath->dentry->d_inode;
+
+	if (!realinode->i_op->follow_link) {
+		path_put(&nd->path);
+		nd->path = *realpath;
+		path_get(&nd->path);
+		return NULL;
+	}
+
+	return realinode->i_op->follow_link(realpath->dentry, nd);
+}
+
+static void union_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+	struct inode *realinode = realpath->dentry->d_inode;
+
+	if (!realinode->i_op->follow_link || !realinode->i_op->put_link)
+		return;
+
+	realinode->i_op->put_link(realpath->dentry, nd, c);
+}
+
+static int union_readlink(struct dentry *dentry, char __user *buf, int bufsiz)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+	struct inode *realinode = realpath->dentry->d_inode;
+
+	if (!realinode->i_op->readlink)
+		return -EINVAL;
+
+	touch_atime(realpath->mnt, realpath->dentry);
+	return realinode->i_op->readlink(realpath->dentry, buf, bufsiz);
+}
+
+static int union_whiteout(struct dentry *dentry)
+{
+	int err;
+	struct union_entry *pue = dentry->d_parent->d_fsdata;
+	struct dentry *newdentry;
+
+	newdentry = lookup_one_len(dentry->d_name.name, pue->upperpath.dentry,
+				   dentry->d_name.len);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out;
+
+	if (WARN_ON(newdentry->d_inode))
+		goto out_dput;
+
+	err = vfs_symlink(pue->upperpath.dentry->d_inode, newdentry,
+			  "(union-whiteout)");
+	if (err)
+		goto out_dput;
+
+	err = vfs_setxattr(newdentry, union_whiteout_xattr, "y", 1, 0);
+
+out_dput:
+	dput(newdentry);
+out:
+	return err;
+}
+
+static int union_unlink(struct inode *dir, struct dentry *dentry)
+{
+	int err;
+	struct union_entry *ue = dentry->d_fsdata;
+	struct union_entry *pue;
+	struct inode *upperdir;
+
+	err = union_copy_up(dentry->d_parent);
+	if (err)
+		return err;
+
+	pue = dentry->d_parent->d_fsdata;
+	upperdir = pue->upperpath.dentry->d_inode;
+
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	if (ue->upperpath.dentry) {
+		err = vfs_unlink(upperdir, ue->upperpath.dentry);
+		if (err)
+			goto out_unlock;
+	} else {
+		ue->opaque = true;
+	}
+
+	if (ue->opaque)
+		err = union_whiteout(dentry);
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+
+	return err;
+}
+
+static int union_check_empty_dir(struct dentry *dentry)
+{
+	int err;
+	struct union_entry *ue = dentry->d_fsdata;
+	struct union_cache_callback cb;
+	struct union_cache_entry *p;
+
+	cb.list = NULL;
+	cb.endp = &cb.list;
+	cb.path = ue->upperpath;
+
+	if (ue->upperpath.dentry) {
+		err = union_fill_cache(&ue->upperpath, &cb, union_fill_upper);
+		if (err)
+			return err;
+	}
+	err = union_fill_cache(&ue->lowerpath, &cb, union_fill_lower);
+	if (err)
+		return err;
+
+	err = 0;
+	for (p = cb.list; p; p = p->next) {
+		if (p->is_whiteout)
+			continue;
+
+		if (p->name.name[0] == '.') {
+			if (p->name.len == 1)
+				continue;
+			if (p->name.len == 2 && p->name.name[1] == '.')
+				continue;
+		}
+		err = -ENOTEMPTY;
+		break;
+	}
+
+	union_cache_free(cb.list);
+
+	return err;
+}
+
+static int union_unlink_whiteout(void *buf, const char *name, int namlen,
+				 loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct union_cache_callback *cb = buf;
+
+	cb->count++;
+	if (d_type == DT_LNK) {
+		int err;
+		struct dentry *dentry;
+
+		dentry = lookup_one_len(name, cb->path.dentry, strlen(name));
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
+
+		err = vfs_unlink(cb->path.dentry->d_inode, dentry);
+		dput(dentry);
+
+		return err;
+	}
+
+	return 0;
+}
+
+static int union_remove_whiteouts(struct dentry *dentry)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct union_cache_callback cb;
+
+	if (!ue->upperpath.dentry)
+		return 0;
+
+	cb.path = ue->upperpath;
+	return union_fill_cache(&ue->upperpath, &cb, union_unlink_whiteout);
+}
+
+static int union_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	int err;
+	struct union_entry *ue = dentry->d_fsdata;
+	struct union_entry *pue;
+	struct inode *upperdir;
+
+	if (ue->lowerpath.dentry) {
+		err = union_check_empty_dir(dentry);
+		if (err)
+			return err;
+
+		err = union_copy_up(dentry->d_parent);
+		if (err)
+			return err;
+
+		err = union_remove_whiteouts(dentry);
+		if (err)
+			return err;
+	}
+
+	pue = dentry->d_parent->d_fsdata;
+	upperdir = pue->upperpath.dentry->d_inode;
+
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	if (ue->upperpath.dentry) {
+		err = vfs_rmdir(upperdir, ue->upperpath.dentry);
+		if (err)
+			goto out_unlock;
+	}
+	if (ue->lowerpath.dentry)
+		ue->opaque = true;
+
+	if (ue->opaque)
+		err = union_whiteout(dentry);
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+
+	return err;
+}
+
+static int union_link(struct dentry *old, struct inode *newdir,
+		      struct dentry *new)
+{
+	int err;
+	struct dentry *newdentry;
+	struct union_entry *new_ue = new->d_fsdata;
+	struct union_entry *old_ue = old->d_fsdata;
+	struct union_entry *pue = new->d_parent->d_fsdata;
+	struct inode *upperdir;
+
+	err = union_copy_up(old);
+	if (err)
+		goto out;
+
+	err = union_copy_up(new->d_parent);
+	if (err)
+		goto out;
+
+	upperdir = pue->upperpath.dentry->d_inode;
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	newdentry = union_lookup_create(new_ue, pue, &new->d_name);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out_unlock;
+
+	err = vfs_link(old_ue->upperpath.dentry, upperdir, newdentry);
+	if (!err) {
+		struct inode *inode = old->d_inode;
+
+		atomic_inc(&inode->i_count);
+		d_instantiate(new, inode);
+
+		new_ue->upperpath.dentry = newdentry;
+		new_ue->upperpath.mnt = pue->upperpath.mnt;
+		path_get(&new_ue->upperpath);
+	}
+	dput(newdentry);
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+out:
+	return err;
+
+}
+
+static int union_rename(struct inode *olddir, struct dentry *old,
+			struct inode *newdir, struct dentry *new)
+{
+	int err;
+	struct union_entry *old_ue = old->d_fsdata;
+	struct union_entry *new_ue = new->d_fsdata;
+	struct union_entry *old_pue = old->d_parent->d_fsdata;
+	struct union_entry *new_pue = new->d_parent->d_fsdata;
+	struct dentry *old_upperdir;
+	struct dentry *new_upperdir;
+	struct dentry *olddentry;
+	struct dentry *newdentry;
+	struct dentry *trap;
+	bool prev_opaque;
+
+	/* Don't copy up directory trees */
+	if (old_ue->lowerpath.dentry &&
+	    S_ISDIR(old_ue->lowerpath.dentry->d_inode->i_mode))
+		return -EXDEV;
+
+	if (new_ue->lowerpath.dentry &&
+	    S_ISDIR(new_ue->lowerpath.dentry->d_inode->i_mode)) {
+		err = union_check_empty_dir(new);
+		if (err)
+			return err;
+	}
+
+	err = union_copy_up(old);
+	if (err)
+		return err;
+
+	err = union_copy_up(new->d_parent);
+	if (err)
+		return err;
+
+	if (new_ue->lowerpath.dentry &&
+	    S_ISDIR(new_ue->lowerpath.dentry->d_inode->i_mode)) {
+		err = union_remove_whiteouts(new);
+		if (err)
+			return err;
+	}
+
+	old_upperdir = old_pue->upperpath.dentry;
+	new_upperdir = new_pue->upperpath.dentry;
+	trap = lock_rename(new_upperdir, old_upperdir);
+
+	olddentry = old_ue->upperpath.dentry;
+	newdentry = dget(new_ue->upperpath.dentry);
+	if (!newdentry) {
+		newdentry = union_lookup_create(new_ue, new_pue, &new->d_name);
+		err = PTR_ERR(newdentry);
+		if (IS_ERR(newdentry))
+			goto out_unlock;
+	}
+
+	err = -EINVAL;
+	if (WARN_ON(olddentry == trap))
+		goto out_dput;
+	if (WARN_ON(newdentry == trap))
+		goto out_dput;
+
+	err = vfs_rename(old_upperdir->d_inode, olddentry,
+			 new_upperdir->d_inode, newdentry);
+
+	if (!err) {
+		prev_opaque = old_ue->opaque;
+		old_ue->opaque = new_ue->opaque || new_ue->lowerpath.dentry;
+		if (prev_opaque)
+			err = union_whiteout(old);
+		if (!err && S_ISDIR(olddentry->d_inode->i_mode)) {
+			if (prev_opaque && !old_ue->opaque)
+				vfs_removexattr(olddentry, union_opaque_xattr);
+			if (!prev_opaque && old_ue->opaque)
+				err = vfs_setxattr(olddentry, union_opaque_xattr, "y", 1, 0);
+		}
+	}
+
+out_dput:
+	dput(newdentry);
+out_unlock:
+	unlock_rename(new_upperdir, old_upperdir);
+	return err;
+}
+
+static bool union_is_private_xattr(const char *name)
+{
+	return strncmp(name, "trusted.union.", 14) == 0;
+}
+
+static int union_setxattr(struct dentry *dentry, const char *name,
+			  const void *value, size_t size, int flags)
+{
+	int err;
+	struct union_entry *ue = dentry->d_fsdata;
+
+	if (union_is_private_xattr(name))
+		return -ENODATA;
+
+	if (!ue->upperpath.dentry) {
+		err = union_copy_up(dentry);
+		if (err)
+			return err;
+	}
+
+	return vfs_setxattr(ue->upperpath.dentry, name, value, size, flags);
+}
+
+static ssize_t union_getxattr(struct dentry *dentry, const char *name,
+			      void *value, size_t size)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+
+	if (union_is_private_xattr(name))
+		return -ENODATA;
+
+	return vfs_getxattr(realpath->dentry, name, value, size);
+}
+
+static ssize_t union_listxattr(struct dentry *dentry, char *list, size_t size)
+{
+	struct union_entry *ue = dentry->d_fsdata;
+	struct path *realpath = union_path(ue);
+
+	/* FIXME: filter out private xattrs */
+	return vfs_listxattr(realpath->dentry, list, size);
+}
+
+static int union_removexattr(struct dentry *dentry, const char *name)
+{
+	int err;
+	struct union_entry *ue = dentry->d_fsdata;
+
+	if (union_is_private_xattr(name))
+		return -ENODATA;
+
+	if (!ue->upperpath.dentry) {
+		err = vfs_getxattr(ue->lowerpath.dentry, name, NULL, 0);
+		if (err < 0)
+			return err;
+
+		err = union_copy_up(dentry);
+		if (err)
+			return err;
+	}
+
+	return vfs_removexattr(ue->upperpath.dentry, name);
+}
+
+static const struct inode_operations union_dir_inode_operations = {
+	.lookup		= union_lookup,
+	.mkdir		= union_mkdir,
+	.symlink	= union_symlink,
+	.unlink		= union_unlink,
+	.rmdir		= union_rmdir,
+	.rename		= union_rename,
+	.link		= union_link,
+	.setattr	= union_setattr,
+	.create		= union_create,
+	.mknod		= union_mknod,
+	.permission	= union_permission,
+	.getattr	= union_dir_getattr,
+	.setxattr	= union_setxattr,
+	.getxattr	= union_getxattr,
+	.listxattr	= union_listxattr,
+	.removexattr	= union_removexattr,
+};
+
+static const struct inode_operations union_file_inode_operations = {
+	.setattr	= union_setattr,
+	.permission	= union_permission,
+	.getattr	= union_getattr,
+	.setxattr	= union_setxattr,
+	.getxattr	= union_getxattr,
+	.listxattr	= union_listxattr,
+	.removexattr	= union_removexattr,
+};
+
+static const struct inode_operations union_symlink_inode_operations = {
+	.setattr	= union_setattr,
+	.follow_link	= union_follow_link,
+	.put_link	= union_put_link,
+	.readlink	= union_readlink,
+	.getattr	= union_getattr,
+	.setxattr	= union_setxattr,
+	.getxattr	= union_getxattr,
+	.listxattr	= union_listxattr,
+	.removexattr	= union_removexattr,
+};
+
+static bool union_open_need_copy_up(struct file *file, struct union_entry *ue)
+{
+	if (ue->upperpath.dentry)
+		return false;
+
+	if (special_file(ue->lowerpath.dentry->d_inode->i_mode))
+		return false;
+
+	if (!(file->f_mode & FMODE_WRITE) && !(file->f_flags & O_TRUNC))
+		return false;
+
+	return true;
+}
+
+static struct file *union_open(struct file *file)
+{
+	struct dentry *dentry = file->f_path.dentry;
+	struct union_entry *ue = dentry->d_fsdata;
+	int err;
+
+	if (union_open_need_copy_up(file, ue)) {
+		err = union_copy_up(dentry);
+		if (err)
+			return ERR_PTR(err);
+	}
+	return path_open(union_path(ue), file->f_flags);
+}
+
+static const struct file_operations union_file_operations = {
+	.open_other	= union_open,
+};
+
+static void union_put_super(struct super_block *sb)
+{
+	struct union_fs *ufs = sb->s_fs_info;
+
+	iput(ufs->symlink_inode);
+	iput(ufs->regular_inode);
+	iput(ufs->special_inode);
+	kfree(ufs);
+}
+
+static const struct super_operations union_super_operations = {
+	.put_super	= union_put_super,
+};
+
+struct union_config {
+	char *lowerdir;
+	char *upperdir;
+};
+
+enum {
+	Opt_lowerdir,
+	Opt_upperdir,
+	Opt_err,
+};
+
+static const match_table_t union_tokens = {
+	{Opt_lowerdir,			"lowerdir=%s"},
+	{Opt_upperdir,			"upperdir=%s"},
+	{Opt_err,			NULL}
+};
+
+static int union_parse_opt(char *opt, struct union_config *config)
+{
+	char *p;
+
+	config->upperdir = NULL;
+	config->lowerdir = NULL;
+
+	while ((p = strsep(&opt, ",")) != NULL) {
+		int token;
+		substring_t args[MAX_OPT_ARGS];
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, union_tokens, args);
+		switch (token) {
+		case Opt_upperdir:
+			kfree(config->upperdir);
+			config->upperdir = match_strdup(&args[0]);
+			if (!config->upperdir)
+				return -ENOMEM;
+			break;
+
+		case Opt_lowerdir:
+			kfree(config->lowerdir);
+			config->lowerdir = match_strdup(&args[0]);
+			if (!config->lowerdir)
+				return -ENOMEM;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int union_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct inode *root_inode;
+	struct dentry *root_dentry;
+	struct union_entry *ue;
+	struct union_fs *ufs;
+	struct union_config config;
+	int err;
+
+	err = union_parse_opt((char *) data, &config);
+	if (err)
+		goto out;
+
+	err = -EINVAL;
+	if (!config.upperdir || !config.lowerdir)
+		goto out_free_config;
+
+	err = -ENOMEM;
+	ufs = kmalloc(sizeof(struct union_fs), GFP_KERNEL);
+	if (!ufs)
+		goto out_free_config;
+
+	ufs->symlink_inode = new_inode(sb);
+	if (!ufs->symlink_inode)
+		goto out_free_ufs;
+
+	ufs->regular_inode = new_inode(sb);
+	if (!ufs->regular_inode)
+		goto out_put_symlink_inode;
+
+	ufs->special_inode = new_inode(sb);
+	if (!ufs->special_inode)
+		goto out_put_regular_inode;
+
+	ufs->symlink_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+	ufs->symlink_inode->i_mode = S_IFLNK;
+	ufs->symlink_inode->i_op = &union_symlink_inode_operations;
+
+	ufs->regular_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+	ufs->regular_inode->i_mode = S_IFREG;
+	ufs->regular_inode->i_op = &union_file_inode_operations;
+	ufs->regular_inode->i_fop = &union_file_operations;
+
+	ufs->special_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+	ufs->special_inode->i_mode = S_IFSOCK;
+	ufs->special_inode->i_op = &union_file_inode_operations;
+	ufs->special_inode->i_fop = &union_file_operations;
+
+	root_inode = union_new_inode(sb, S_IFDIR);
+	if (!root_inode)
+		goto out_put_special_inode;
+
+	ue = kzalloc(sizeof(struct union_entry), GFP_KERNEL);
+	if (ue == NULL)
+		goto out_put_root;
+
+	err = kern_path(config.upperdir, LOOKUP_FOLLOW, &ue->upperpath);
+	if (err)
+		goto out_free_ue;
+
+	err = kern_path(config.lowerdir, LOOKUP_FOLLOW, &ue->lowerpath);
+	if (err)
+		goto out_put_upperpath;
+
+	err = -ENOTDIR;
+	if (!S_ISDIR(ue->upperpath.dentry->d_inode->i_mode) ||
+	    !S_ISDIR(ue->lowerpath.dentry->d_inode->i_mode))
+		goto out_put_lowerpath;
+
+	/* FIXME: mnt_want_write() */
+
+	err = -ENOMEM;
+	root_dentry = d_alloc_root(root_inode);
+	if (!root_dentry)
+		goto out_put_lowerpath;
+
+	root_dentry->d_fsdata = ue;
+	root_dentry->d_op = &union_dentry_operations;
+
+	sb->s_op = &union_super_operations;
+	sb->s_root = root_dentry;
+	sb->s_fs_info = ufs;
+
+	return 0;
+
+out_put_lowerpath:
+	path_put(&ue->lowerpath);
+out_put_upperpath:
+	path_put(&ue->upperpath);
+out_free_ue:
+	kfree(ue);
+out_put_root:
+	iput(root_inode);
+out_put_special_inode:
+	iput(ufs->special_inode);
+out_put_regular_inode:
+	iput(ufs->regular_inode);
+out_put_symlink_inode:
+	iput(ufs->symlink_inode);
+out_free_ufs:
+	kfree(ufs);
+out_free_config:
+	kfree(config.lowerdir);
+	kfree(config.upperdir);
+out:
+	return err;
+}
+
+static int union_get_sb(struct file_system_type *fs_type,
+			int flags, const char *dev_name,
+			void *raw_data, struct vfsmount *mnt)
+{
+	return get_sb_nodev(fs_type, flags, raw_data, union_fill_super, mnt);
+}
+
+static struct file_system_type union_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "union",
+	.fs_flags	= FS_RENAME_SELF_ALLOW,
+	.get_sb		= union_get_sb,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init union_init(void)
+{
+	return register_filesystem(&union_fs_type);
+}
+
+static void __exit union_exit(void)
+{
+	unregister_filesystem(&union_fs_type);
+}
+
+module_init(union_init);
+module_exit(union_exit);
Index: linux-2.6/fs/Kconfig
===================================================================
--- linux-2.6.orig/fs/Kconfig	2010-08-26 19:05:53.000000000 +0200
+++ linux-2.6/fs/Kconfig	2010-08-26 19:11:27.000000000 +0200
@@ -62,6 +62,7 @@ source "fs/quota/Kconfig"
 source "fs/autofs/Kconfig"
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
+source "fs/union/Kconfig"
 
 config CUSE
 	tristate "Character device in Userspace support"
Index: linux-2.6/fs/Makefile
===================================================================
--- linux-2.6.orig/fs/Makefile	2010-08-26 19:05:53.000000000 +0200
+++ linux-2.6/fs/Makefile	2010-08-26 19:11:27.000000000 +0200
@@ -108,6 +108,7 @@ obj-$(CONFIG_AUTOFS_FS)		+= autofs/
 obj-$(CONFIG_AUTOFS4_FS)	+= autofs4/
 obj-$(CONFIG_ADFS_FS)		+= adfs/
 obj-$(CONFIG_FUSE_FS)		+= fuse/
+obj-$(CONFIG_UNION_FS)		+= union/
 obj-$(CONFIG_UDF_FS)		+= udf/
 obj-$(CONFIG_SUN_OPENPROMFS)	+= openpromfs/
 obj-$(CONFIG_OMFS_FS)		+= omfs/
Index: linux-2.6/fs/union/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/union/Kconfig	2010-08-26 19:11:27.000000000 +0200
@@ -0,0 +1,4 @@
+config UNION_FS
+	tristate "Union filesystem support"
+	help
+	  Add support for union filesystem.
Index: linux-2.6/fs/union/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/union/Makefile	2010-08-26 19:11:27.000000000 +0200
@@ -0,0 +1,5 @@
+#
+# Makefile for the union filesystem.
+#
+
+obj-$(CONFIG_UNION_FS) += union.o

-- 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode
  2010-08-26 18:33 ` [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
@ 2010-08-26 20:24   ` David P. Quigley
  2010-08-27  4:11     ` Neil Brown
  0 siblings, 1 reply; 49+ messages in thread
From: David P. Quigley @ 2010-08-26 20:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, vaurora, neilb, viro, jblunck, hch

I may be missing something but I looked at your patch series and I see
no good reason for this patch at all. You just churned a lot of code for
something that you don't even have a need for in the patch set. Your
only two new callers of this function could just as easily have used the
inode since it isn't doing anything special with the dentry. It actually
pulls the inode out of it and uses it in generic_permission and
security_inode_permission. If you are going to change this you should
also change generic_permission as well. Honestly I'd rather see the
dentry requirement removed from inode operations instead but
unfortunately this isn't possible as I found out with my attempts to
remove the dentry requirement for get/setxattr

On Thu, 2010-08-26 at 20:33 +0200, Miklos Szeredi wrote:
> plain text document attachment (vfs-permission-dentry.patch)
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Like most other inode operations ->permission() should take a dentry
> instead of an inode.  This is necessary for filesystems which operate
> on names not on inodes.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>  fs/afs/internal.h                  |    2 +-
>  fs/afs/security.c                  |    3 ++-
>  fs/bad_inode.c                     |    2 +-
>  fs/btrfs/inode.c                   |    4 +++-
>  fs/btrfs/ioctl.c                   |    8 ++++----
>  fs/ceph/inode.c                    |    3 ++-
>  fs/ceph/super.h                    |    2 +-
>  fs/cifs/cifsfs.c                   |    3 ++-
>  fs/coda/dir.c                      |    3 ++-
>  fs/coda/pioctl.c                   |    4 ++--
>  fs/ecryptfs/inode.c                |    4 ++--
>  fs/fuse/dir.c                      |    3 ++-
>  fs/gfs2/ops_inode.c                |   11 ++++++++---
>  fs/hostfs/hostfs_kern.c            |    3 ++-
>  fs/logfs/dir.c                     |    6 ------
>  fs/namei.c                         |   37 ++++++++++++++++++++-----------------
>  fs/namespace.c                     |    2 +-
>  fs/nfs/dir.c                       |    3 ++-
>  fs/nfsd/nfsfh.c                    |    2 +-
>  fs/nfsd/vfs.c                      |    4 ++--
>  fs/nilfs2/nilfs.h                  |    2 +-
>  fs/notify/fanotify/fanotify_user.c |    2 +-
>  fs/notify/inotify/inotify_user.c   |    2 +-
>  fs/ocfs2/file.c                    |    3 ++-
>  fs/ocfs2/file.h                    |    2 +-
>  fs/ocfs2/refcounttree.c            |    4 ++--
>  fs/open.c                          |   10 +++++-----
>  fs/proc/base.c                     |    3 ++-
>  fs/proc/proc_sysctl.c              |    3 ++-
>  fs/reiserfs/xattr.c                |    4 +++-
>  fs/smbfs/file.c                    |    4 ++--
>  fs/sysfs/inode.c                   |    3 ++-
>  fs/sysfs/sysfs.h                   |    2 +-
>  fs/utimes.c                        |    2 +-
>  fs/xattr.c                         |   12 +++++++-----
>  include/linux/coda_linux.h         |    2 +-
>  include/linux/fs.h                 |    4 ++--
>  include/linux/nfs_fs.h             |    2 +-
>  include/linux/reiserfs_xattr.h     |    2 +-
>  ipc/mqueue.c                       |    2 +-
>  net/unix/af_unix.c                 |    2 +-
>  41 files changed, 100 insertions(+), 81 deletions(-)
> 
> Index: linux-2.6/fs/btrfs/ioctl.c
> ===================================================================
> --- linux-2.6.orig/fs/btrfs/ioctl.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/btrfs/ioctl.c	2010-08-19 09:46:31.000000000 +0200
> @@ -396,13 +396,13 @@ fail:
>  }
>  
>  /* copy of may_create in fs/namei.c() */
> -static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
> +static inline int btrfs_may_create(struct dentry *dir, struct dentry *child)
>  {
>  	if (child->d_inode)
>  		return -EEXIST;
> -	if (IS_DEADDIR(dir))
> +	if (IS_DEADDIR(dir->d_inode))
>  		return -ENOENT;
> -	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
> +	return dentry_permission(dir, MAY_WRITE | MAY_EXEC);
>  }
>  
>  /*
> @@ -433,7 +433,7 @@ static noinline int btrfs_mksubvol(struc
>  	if (error)
>  		goto out_dput;
>  
> -	error = btrfs_may_create(dir, dentry);
> +	error = btrfs_may_create(parent->dentry, dentry);
>  	if (error)
>  		goto out_drop_write;
>  
> Index: linux-2.6/fs/ecryptfs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/ecryptfs/inode.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/ecryptfs/inode.c	2010-08-19 09:46:31.000000000 +0200
> @@ -958,9 +958,9 @@ int ecryptfs_truncate(struct dentry *den
>  }
>  
>  static int
> -ecryptfs_permission(struct inode *inode, int mask)
> +ecryptfs_permission(struct dentry *dentry, int mask)
>  {
> -	return inode_permission(ecryptfs_inode_to_lower(inode), mask);
> +	return dentry_permission(ecryptfs_dentry_to_lower(dentry), mask);
>  }
>  
>  /**
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c	2010-08-19 09:46:15.000000000 +0200
> +++ linux-2.6/fs/namei.c	2010-08-19 09:46:31.000000000 +0200
> @@ -240,17 +240,18 @@ int generic_permission(struct inode *ino
>  }
>  
>  /**
> - * inode_permission  -  check for access rights to a given inode
> - * @inode:	inode to check permission on
> + * dentry_permission  -  check for access rights to a given dentry
> + * @dentry:	dentry to check permission on
>   * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
>   *
> - * Used to check for read/write/execute permissions on an inode.
> + * Used to check for read/write/execute permissions on an dentry.
>   * We use "fsuid" for this, letting us set arbitrary permissions
>   * for filesystem access without changing the "normal" uids which
>   * are used for other things.
>   */
> -int inode_permission(struct inode *inode, int mask)
> +int dentry_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	int retval;
>  
>  	if (mask & MAY_WRITE) {
> @@ -271,7 +272,7 @@ int inode_permission(struct inode *inode
>  	}
>  
>  	if (inode->i_op->permission)
> -		retval = inode->i_op->permission(inode, mask);
> +		retval = inode->i_op->permission(dentry, mask);
>  	else
>  		retval = generic_permission(inode, mask, inode->i_op->check_acl);
>  
> @@ -295,11 +296,11 @@ int inode_permission(struct inode *inode
>   *
>   * Note:
>   *	Do not use this function in new code.  All access checks should
> - *	be done using inode_permission().
> + *	be done using dentry_permission().
>   */
>  int file_permission(struct file *file, int mask)
>  {
> -	return inode_permission(file->f_path.dentry->d_inode, mask);
> +	return dentry_permission(file->f_path.dentry, mask);
>  }
>  
>  /*
> @@ -459,12 +460,13 @@ force_reval_path(struct path *path, stru
>   * short-cut DAC fails, then call ->permission() to do more
>   * complete permission check.
>   */
> -static int exec_permission(struct inode *inode)
> +static int exec_permission(struct dentry *dentry)
>  {
>  	int ret;
> +	struct inode *inode = dentry->d_inode;
>  
>  	if (inode->i_op->permission) {
> -		ret = inode->i_op->permission(inode, MAY_EXEC);
> +		ret = inode->i_op->permission(dentry, MAY_EXEC);
>  		if (!ret)
>  			goto ok;
>  		return ret;
> @@ -837,7 +839,7 @@ static int link_path_walk(const char *na
>  		unsigned int c;
>  
>  		nd->flags |= LOOKUP_CONTINUE;
> -		err = exec_permission(inode);
> +		err = exec_permission(nd->path.dentry);
>   		if (err)
>  			break;
>  
> @@ -1163,7 +1165,7 @@ static struct dentry *lookup_hash(struct
>  {
>  	int err;
>  
> -	err = exec_permission(nd->path.dentry->d_inode);
> +	err = exec_permission(nd->path.dentry);
>  	if (err)
>  		return ERR_PTR(err);
>  	return __lookup_hash(&nd->last, nd->path.dentry, nd);
> @@ -1213,7 +1215,7 @@ struct dentry *lookup_one_len(const char
>  	if (err)
>  		return ERR_PTR(err);
>  
> -	err = exec_permission(base->d_inode);
> +	err = exec_permission(base);
>  	if (err)
>  		return ERR_PTR(err);
>  	return __lookup_hash(&this, base, NULL);
> @@ -1301,7 +1303,7 @@ static int may_delete(struct inode *dir,
>  	BUG_ON(victim->d_parent->d_inode != dir);
>  	audit_inode_child(victim, dir);
>  
> -	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
> +	error = dentry_permission(victim->d_parent, MAY_WRITE | MAY_EXEC);
>  	if (error)
>  		return error;
>  	if (IS_APPEND(dir))
> @@ -1337,7 +1339,8 @@ static inline int may_create(struct inod
>  		return -EEXIST;
>  	if (IS_DEADDIR(dir))
>  		return -ENOENT;
> -	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
> +	BUG_ON(child->d_parent->d_inode != dir);
> +	return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
>  }
>  
>  /*
> @@ -1430,7 +1433,7 @@ int may_open(struct path *path, int acc_
>  		break;
>  	}
>  
> -	error = inode_permission(inode, acc_mode);
> +	error = dentry_permission(dentry, acc_mode);
>  	if (error)
>  		return error;
>  
> @@ -2545,7 +2548,7 @@ static int vfs_rename_dir(struct inode *
>  	 * we'll need to flip '..'.
>  	 */
>  	if (new_dir != old_dir) {
> -		error = inode_permission(old_dentry->d_inode, MAY_WRITE);
> +		error = dentry_permission(old_dentry, MAY_WRITE);
>  		if (error)
>  			return error;
>  	}
> @@ -2900,7 +2903,7 @@ EXPORT_SYMBOL(page_symlink_inode_operati
>  EXPORT_SYMBOL(path_lookup);
>  EXPORT_SYMBOL(kern_path);
>  EXPORT_SYMBOL(vfs_path_lookup);
> -EXPORT_SYMBOL(inode_permission);
> +EXPORT_SYMBOL(dentry_permission);
>  EXPORT_SYMBOL(file_permission);
>  EXPORT_SYMBOL(unlock_rename);
>  EXPORT_SYMBOL(vfs_create);
> Index: linux-2.6/fs/namespace.c
> ===================================================================
> --- linux-2.6.orig/fs/namespace.c	2010-08-19 09:45:50.000000000 +0200
> +++ linux-2.6/fs/namespace.c	2010-08-19 09:46:31.000000000 +0200
> @@ -1230,7 +1230,7 @@ static int mount_is_safe(struct path *pa
>  		if (current_uid() != path->dentry->d_inode->i_uid)
>  			return -EPERM;
>  	}
> -	if (inode_permission(path->dentry->d_inode, MAY_WRITE))
> +	if (dentry_permission(path->dentry, MAY_WRITE))
>  		return -EPERM;
>  	return 0;
>  #endif
> Index: linux-2.6/fs/nfsd/nfsfh.c
> ===================================================================
> --- linux-2.6.orig/fs/nfsd/nfsfh.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/nfsd/nfsfh.c	2010-08-19 09:46:31.000000000 +0200
> @@ -38,7 +38,7 @@ static int nfsd_acceptable(void *expv, s
>  		/* make sure parents give x permission to user */
>  		int err;
>  		parent = dget_parent(tdentry);
> -		err = inode_permission(parent->d_inode, MAY_EXEC);
> +		err = dentry_permission(parent, MAY_EXEC);
>  		if (err < 0) {
>  			dput(parent);
>  			break;
> Index: linux-2.6/fs/nfsd/vfs.c
> ===================================================================
> --- linux-2.6.orig/fs/nfsd/vfs.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/nfsd/vfs.c	2010-08-19 09:46:31.000000000 +0200
> @@ -2124,12 +2124,12 @@ nfsd_permission(struct svc_rqst *rqstp,
>  		return 0;
>  
>  	/* This assumes  NFSD_MAY_{READ,WRITE,EXEC} == MAY_{READ,WRITE,EXEC} */
> -	err = inode_permission(inode, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
> +	err = dentry_permission(dentry, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
>  
>  	/* Allow read access to binaries even when mode 111 */
>  	if (err == -EACCES && S_ISREG(inode->i_mode) &&
>  	    acc == (NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE))
> -		err = inode_permission(inode, MAY_EXEC);
> +		err = dentry_permission(dentry, MAY_EXEC);
>  
>  	return err? nfserrno(err) : 0;
>  }
> Index: linux-2.6/fs/notify/fanotify/fanotify_user.c
> ===================================================================
> --- linux-2.6.orig/fs/notify/fanotify/fanotify_user.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/notify/fanotify/fanotify_user.c	2010-08-19 09:46:31.000000000 +0200
> @@ -454,7 +454,7 @@ static int fanotify_find_path(int dfd, c
>  	}
>  
>  	/* you can only watch an inode if you have read permissions on it */
> -	ret = inode_permission(path->dentry->d_inode, MAY_READ);
> +	ret = dentry_permission(path->dentry, MAY_READ);
>  	if (ret)
>  		path_put(path);
>  out:
> Index: linux-2.6/fs/notify/inotify/inotify_user.c
> ===================================================================
> --- linux-2.6.orig/fs/notify/inotify/inotify_user.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/notify/inotify/inotify_user.c	2010-08-19 09:46:31.000000000 +0200
> @@ -358,7 +358,7 @@ static int inotify_find_inode(const char
>  	if (error)
>  		return error;
>  	/* you can only watch an inode if you have read permissions on it */
> -	error = inode_permission(path->dentry->d_inode, MAY_READ);
> +	error = dentry_permission(path->dentry, MAY_READ);
>  	if (error)
>  		path_put(path);
>  	return error;
> Index: linux-2.6/fs/ocfs2/refcounttree.c
> ===================================================================
> --- linux-2.6.orig/fs/ocfs2/refcounttree.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/ocfs2/refcounttree.c	2010-08-19 09:46:31.000000000 +0200
> @@ -4322,7 +4322,7 @@ static inline int ocfs2_may_create(struc
>  		return -EEXIST;
>  	if (IS_DEADDIR(dir))
>  		return -ENOENT;
> -	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
> +	return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
>  }
>  
>  /* copied from user_path_parent. */
> @@ -4395,7 +4395,7 @@ static int ocfs2_vfs_reflink(struct dent
>  	 * file.
>  	 */
>  	if (!preserve) {
> -		error = inode_permission(inode, MAY_READ);
> +		error = dentry_permission(old_dentry, MAY_READ);
>  		if (error)
>  			return error;
>  	}
> Index: linux-2.6/fs/open.c
> ===================================================================
> --- linux-2.6.orig/fs/open.c	2010-08-19 09:46:27.000000000 +0200
> +++ linux-2.6/fs/open.c	2010-08-19 09:46:31.000000000 +0200
> @@ -89,7 +89,7 @@ static long do_sys_truncate(const char _
>  	if (error)
>  		goto dput_and_out;
>  
> -	error = inode_permission(inode, MAY_WRITE);
> +	error = dentry_permission(path.dentry, MAY_WRITE);
>  	if (error)
>  		goto mnt_drop_write_and_out;
>  
> @@ -328,7 +328,7 @@ SYSCALL_DEFINE3(faccessat, int, dfd, con
>  			goto out_path_release;
>  	}
>  
> -	res = inode_permission(inode, mode | MAY_ACCESS);
> +	res = dentry_permission(path.dentry, mode | MAY_ACCESS);
>  	/* SuS v2 requires we report a read only fs too */
>  	if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
>  		goto out_path_release;
> @@ -367,7 +367,7 @@ SYSCALL_DEFINE1(chdir, const char __user
>  	if (error)
>  		goto out;
>  
> -	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
> +	error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
>  	if (error)
>  		goto dput_and_out;
>  
> @@ -396,7 +396,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd
>  	if (!S_ISDIR(inode->i_mode))
>  		goto out_putf;
>  
> -	error = inode_permission(inode, MAY_EXEC | MAY_CHDIR);
> +	error = dentry_permission(file->f_path.dentry, MAY_EXEC | MAY_CHDIR);
>  	if (!error)
>  		set_fs_pwd(current->fs, &file->f_path);
>  out_putf:
> @@ -414,7 +414,7 @@ SYSCALL_DEFINE1(chroot, const char __use
>  	if (error)
>  		goto out;
>  
> -	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
> +	error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
>  	if (error)
>  		goto dput_and_out;
>  
> Index: linux-2.6/fs/utimes.c
> ===================================================================
> --- linux-2.6.orig/fs/utimes.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/utimes.c	2010-08-19 09:46:31.000000000 +0200
> @@ -96,7 +96,7 @@ static int utimes_common(struct path *pa
>  			goto mnt_drop_write_and_out;
>  
>  		if (!is_owner_or_cap(inode)) {
> -			error = inode_permission(inode, MAY_WRITE);
> +			error = dentry_permission(path->dentry, MAY_WRITE);
>  			if (error)
>  				goto mnt_drop_write_and_out;
>  		}
> Index: linux-2.6/fs/xattr.c
> ===================================================================
> --- linux-2.6.orig/fs/xattr.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/xattr.c	2010-08-19 09:46:31.000000000 +0200
> @@ -26,8 +26,10 @@
>   * because different namespaces have very different rules.
>   */
>  static int
> -xattr_permission(struct inode *inode, const char *name, int mask)
> +xattr_permission(struct dentry *dentry, const char *name, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
> +
>  	/*
>  	 * We can never set or remove an extended attribute on a read-only
>  	 * filesystem  or on an immutable / append-only inode.
> @@ -63,7 +65,7 @@ xattr_permission(struct inode *inode, co
>  			return -EPERM;
>  	}
>  
> -	return inode_permission(inode, mask);
> +	return dentry_permission(dentry, mask);
>  }
>  
>  /**
> @@ -115,7 +117,7 @@ vfs_setxattr(struct dentry *dentry, cons
>  	struct inode *inode = dentry->d_inode;
>  	int error;
>  
> -	error = xattr_permission(inode, name, MAY_WRITE);
> +	error = xattr_permission(dentry, name, MAY_WRITE);
>  	if (error)
>  		return error;
>  
> @@ -165,7 +167,7 @@ vfs_getxattr(struct dentry *dentry, cons
>  	struct inode *inode = dentry->d_inode;
>  	int error;
>  
> -	error = xattr_permission(inode, name, MAY_READ);
> +	error = xattr_permission(dentry, name, MAY_READ);
>  	if (error)
>  		return error;
>  
> @@ -224,7 +226,7 @@ vfs_removexattr(struct dentry *dentry, c
>  	if (!inode->i_op->removexattr)
>  		return -EOPNOTSUPP;
>  
> -	error = xattr_permission(inode, name, MAY_WRITE);
> +	error = xattr_permission(dentry, name, MAY_WRITE);
>  	if (error)
>  		return error;
>  
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2010-08-19 09:46:27.000000000 +0200
> +++ linux-2.6/include/linux/fs.h	2010-08-19 09:46:31.000000000 +0200
> @@ -1525,7 +1525,7 @@ struct inode_operations {
>  	void * (*follow_link) (struct dentry *, struct nameidata *);
>  	void (*put_link) (struct dentry *, struct nameidata *, void *);
>  	void (*truncate) (struct inode *);
> -	int (*permission) (struct inode *, int);
> +	int (*permission) (struct dentry *, int);
>  	int (*check_acl)(struct inode *, int);
>  	int (*setattr) (struct dentry *, struct iattr *);
>  	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
> @@ -2111,7 +2111,7 @@ extern void emergency_remount(void);
>  extern sector_t bmap(struct inode *, sector_t);
>  #endif
>  extern int notify_change(struct dentry *, struct iattr *);
> -extern int inode_permission(struct inode *, int);
> +extern int dentry_permission(struct dentry *, int);
>  extern int generic_permission(struct inode *, int,
>  		int (*check_acl)(struct inode *, int));
>  
> Index: linux-2.6/ipc/mqueue.c
> ===================================================================
> --- linux-2.6.orig/ipc/mqueue.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/ipc/mqueue.c	2010-08-19 09:46:31.000000000 +0200
> @@ -656,7 +656,7 @@ static struct file *do_open(struct ipc_n
>  		goto err;
>  	}
>  
> -	if (inode_permission(dentry->d_inode, oflag2acc[oflag & O_ACCMODE])) {
> +	if (dentry_permission(dentry, oflag2acc[oflag & O_ACCMODE])) {
>  		ret = -EACCES;
>  		goto err;
>  	}
> Index: linux-2.6/net/unix/af_unix.c
> ===================================================================
> --- linux-2.6.orig/net/unix/af_unix.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/net/unix/af_unix.c	2010-08-19 09:46:31.000000000 +0200
> @@ -748,7 +748,7 @@ static struct sock *unix_find_other(stru
>  		if (err)
>  			goto fail;
>  		inode = path.dentry->d_inode;
> -		err = inode_permission(inode, MAY_WRITE);
> +		err = dentry_permission(path.dentry, MAY_WRITE);
>  		if (err)
>  			goto put_fail;
>  
> Index: linux-2.6/fs/afs/internal.h
> ===================================================================
> --- linux-2.6.orig/fs/afs/internal.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/afs/internal.h	2010-08-19 09:46:31.000000000 +0200
> @@ -624,7 +624,7 @@ extern void afs_clear_permits(struct afs
>  extern void afs_cache_permit(struct afs_vnode *, struct key *, long);
>  extern void afs_zap_permits(struct rcu_head *);
>  extern struct key *afs_request_key(struct afs_cell *);
> -extern int afs_permission(struct inode *, int);
> +extern int afs_permission(struct dentry *, int);
>  
>  /*
>   * server.c
> Index: linux-2.6/fs/afs/security.c
> ===================================================================
> --- linux-2.6.orig/fs/afs/security.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/afs/security.c	2010-08-19 09:46:31.000000000 +0200
> @@ -285,8 +285,9 @@ static int afs_check_permit(struct afs_v
>   * - AFS ACLs are attached to directories only, and a file is controlled by its
>   *   parent directory's ACL
>   */
> -int afs_permission(struct inode *inode, int mask)
> +int afs_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	struct afs_vnode *vnode = AFS_FS_I(inode);
>  	afs_access_t uninitialized_var(access);
>  	struct key *key;
> Index: linux-2.6/fs/bad_inode.c
> ===================================================================
> --- linux-2.6.orig/fs/bad_inode.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/bad_inode.c	2010-08-19 09:46:31.000000000 +0200
> @@ -229,7 +229,7 @@ static int bad_inode_readlink(struct den
>  	return -EIO;
>  }
>  
> -static int bad_inode_permission(struct inode *inode, int mask)
> +static int bad_inode_permission(struct dentry *dentry, int mask)
>  {
>  	return -EIO;
>  }
> Index: linux-2.6/fs/btrfs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/btrfs/inode.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/btrfs/inode.c	2010-08-19 09:46:31.000000000 +0200
> @@ -6922,8 +6922,10 @@ static int btrfs_set_page_dirty(struct p
>  	return __set_page_dirty_nobuffers(page);
>  }
>  
> -static int btrfs_permission(struct inode *inode, int mask)
> +static int btrfs_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
> +
>  	if ((BTRFS_I(inode)->flags & BTRFS_INODE_READONLY) && (mask & MAY_WRITE))
>  		return -EACCES;
>  	return generic_permission(inode, mask, btrfs_check_acl);
> Index: linux-2.6/fs/ceph/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/ceph/inode.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/ceph/inode.c	2010-08-19 09:46:31.000000000 +0200
> @@ -1757,8 +1757,9 @@ int ceph_do_getattr(struct inode *inode,
>   * Check inode permissions.  We verify we have a valid value for
>   * the AUTH cap, then call the generic handler.
>   */
> -int ceph_permission(struct inode *inode, int mask)
> +int ceph_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	int err = ceph_do_getattr(inode, CEPH_CAP_AUTH_SHARED);
>  
>  	if (!err)
> Index: linux-2.6/fs/ceph/super.h
> ===================================================================
> --- linux-2.6.orig/fs/ceph/super.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/ceph/super.h	2010-08-19 09:46:31.000000000 +0200
> @@ -776,7 +776,7 @@ extern void ceph_queue_invalidate(struct
>  extern void ceph_queue_writeback(struct inode *inode);
>  
>  extern int ceph_do_getattr(struct inode *inode, int mask);
> -extern int ceph_permission(struct inode *inode, int mask);
> +extern int ceph_permission(struct dentry *dentry, int mask);
>  extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
>  extern int ceph_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  			struct kstat *stat);
> Index: linux-2.6/fs/cifs/cifsfs.c
> ===================================================================
> --- linux-2.6.orig/fs/cifs/cifsfs.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/cifs/cifsfs.c	2010-08-19 09:46:31.000000000 +0200
> @@ -269,8 +269,9 @@ cifs_statfs(struct dentry *dentry, struc
>  	return 0;
>  }
>  
> -static int cifs_permission(struct inode *inode, int mask)
> +static int cifs_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	struct cifs_sb_info *cifs_sb;
>  
>  	cifs_sb = CIFS_SB(inode->i_sb);
> Index: linux-2.6/fs/coda/dir.c
> ===================================================================
> --- linux-2.6.orig/fs/coda/dir.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/coda/dir.c	2010-08-19 09:46:31.000000000 +0200
> @@ -138,8 +138,9 @@ exit:
>  }
>  
> 
> -int coda_permission(struct inode *inode, int mask)
> +int coda_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>          int error = 0;
>  
>  	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
> Index: linux-2.6/fs/fuse/dir.c
> ===================================================================
> --- linux-2.6.orig/fs/fuse/dir.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/fuse/dir.c	2010-08-19 09:46:31.000000000 +0200
> @@ -981,8 +981,9 @@ static int fuse_access(struct inode *ino
>   * access request is sent.  Execute permission is still checked
>   * locally based on file mode.
>   */
> -static int fuse_permission(struct inode *inode, int mask)
> +static int fuse_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  	bool refreshed = false;
>  	int err = 0;
> Index: linux-2.6/fs/gfs2/ops_inode.c
> ===================================================================
> --- linux-2.6.orig/fs/gfs2/ops_inode.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/gfs2/ops_inode.c	2010-08-19 09:46:31.000000000 +0200
> @@ -1071,6 +1071,11 @@ int gfs2_permission(struct inode *inode,
>  	return error;
>  }
>  
> +static int gfs2_dentry_permission(struct dentry *dentry, int mask)
> +{
> +	return gfs2_permission(dentry->d_inode, mask);
> +}
> +
>  /*
>   * XXX(truncate): the truncate_setsize calls should be moved to the end.
>   */
> @@ -1344,7 +1349,7 @@ out:
>  }
>  
>  const struct inode_operations gfs2_file_iops = {
> -	.permission = gfs2_permission,
> +	.permission = gfs2_dentry_permission,
>  	.setattr = gfs2_setattr,
>  	.getattr = gfs2_getattr,
>  	.setxattr = gfs2_setxattr,
> @@ -1364,7 +1369,7 @@ const struct inode_operations gfs2_dir_i
>  	.rmdir = gfs2_rmdir,
>  	.mknod = gfs2_mknod,
>  	.rename = gfs2_rename,
> -	.permission = gfs2_permission,
> +	.permission = gfs2_dentry_permission,
>  	.setattr = gfs2_setattr,
>  	.getattr = gfs2_getattr,
>  	.setxattr = gfs2_setxattr,
> @@ -1378,7 +1383,7 @@ const struct inode_operations gfs2_symli
>  	.readlink = generic_readlink,
>  	.follow_link = gfs2_follow_link,
>  	.put_link = gfs2_put_link,
> -	.permission = gfs2_permission,
> +	.permission = gfs2_dentry_permission,
>  	.setattr = gfs2_setattr,
>  	.getattr = gfs2_getattr,
>  	.setxattr = gfs2_setxattr,
> Index: linux-2.6/fs/coda/pioctl.c
> ===================================================================
> --- linux-2.6.orig/fs/coda/pioctl.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/coda/pioctl.c	2010-08-19 09:46:31.000000000 +0200
> @@ -26,7 +26,7 @@
>  #include <linux/smp_lock.h>
>  
>  /* pioctl ops */
> -static int coda_ioctl_permission(struct inode *inode, int mask);
> +static int coda_ioctl_permission(struct dentry *dentry, int mask);
>  static long coda_pioctl(struct file *filp, unsigned int cmd,
>  			unsigned long user_data);
>  
> @@ -42,7 +42,7 @@ const struct file_operations coda_ioctl_
>  };
>  
>  /* the coda pioctl inode ops */
> -static int coda_ioctl_permission(struct inode *inode, int mask)
> +static int coda_ioctl_permission(struct dentry *dentry, int mask)
>  {
>  	return (mask & MAY_EXEC) ? -EACCES : 0;
>  }
> Index: linux-2.6/fs/hostfs/hostfs_kern.c
> ===================================================================
> --- linux-2.6.orig/fs/hostfs/hostfs_kern.c	2010-08-19 09:45:50.000000000 +0200
> +++ linux-2.6/fs/hostfs/hostfs_kern.c	2010-08-19 09:46:31.000000000 +0200
> @@ -746,8 +746,9 @@ int hostfs_rename(struct inode *from_ino
>  	return err;
>  }
>  
> -int hostfs_permission(struct inode *ino, int desired)
> +static int hostfs_permission(struct dentry *dentry, int desired)
>  {
> +	struct inode *ino = dentry->d_inode;
>  	char *name;
>  	int r = 0, w = 0, x = 0, err;
>  
> Index: linux-2.6/fs/logfs/dir.c
> ===================================================================
> --- linux-2.6.orig/fs/logfs/dir.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/logfs/dir.c	2010-08-19 09:46:31.000000000 +0200
> @@ -555,11 +555,6 @@ static int logfs_symlink(struct inode *d
>  	return __logfs_create(dir, dentry, inode, target, destlen);
>  }
>  
> -static int logfs_permission(struct inode *inode, int mask)
> -{
> -	return generic_permission(inode, mask, NULL);
> -}
> -
>  static int logfs_link(struct dentry *old_dentry, struct inode *dir,
>  		struct dentry *dentry)
>  {
> @@ -818,7 +813,6 @@ const struct inode_operations logfs_dir_
>  	.mknod		= logfs_mknod,
>  	.rename		= logfs_rename,
>  	.rmdir		= logfs_rmdir,
> -	.permission	= logfs_permission,
>  	.symlink	= logfs_symlink,
>  	.unlink		= logfs_unlink,
>  };
> Index: linux-2.6/fs/nfs/dir.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/dir.c	2010-08-19 09:45:50.000000000 +0200
> +++ linux-2.6/fs/nfs/dir.c	2010-08-19 09:46:31.000000000 +0200
> @@ -1941,8 +1941,9 @@ int nfs_may_open(struct inode *inode, st
>  	return nfs_do_access(inode, cred, nfs_open_permission_mask(openflags));
>  }
>  
> -int nfs_permission(struct inode *inode, int mask)
> +int nfs_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	struct rpc_cred *cred;
>  	int res = 0;
>  
> Index: linux-2.6/fs/nilfs2/nilfs.h
> ===================================================================
> --- linux-2.6.orig/fs/nilfs2/nilfs.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/nilfs2/nilfs.h	2010-08-19 09:46:31.000000000 +0200
> @@ -200,7 +200,7 @@ static inline struct inode *nilfs_dat_in
>   */
>  #ifdef CONFIG_NILFS_POSIX_ACL
>  #error "NILFS: not yet supported POSIX ACL"
> -extern int nilfs_permission(struct inode *, int, struct nameidata *);
> +extern int nilfs_permission(struct dentry *, int);
>  extern int nilfs_acl_chmod(struct inode *);
>  extern int nilfs_init_acl(struct inode *, struct inode *);
>  #else
> Index: linux-2.6/fs/ocfs2/file.c
> ===================================================================
> --- linux-2.6.orig/fs/ocfs2/file.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/ocfs2/file.c	2010-08-19 09:46:31.000000000 +0200
> @@ -1310,8 +1310,9 @@ bail:
>  	return err;
>  }
>  
> -int ocfs2_permission(struct inode *inode, int mask)
> +int ocfs2_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	int ret;
>  
>  	mlog_entry_void();
> Index: linux-2.6/fs/ocfs2/file.h
> ===================================================================
> --- linux-2.6.orig/fs/ocfs2/file.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/ocfs2/file.h	2010-08-19 09:46:31.000000000 +0200
> @@ -61,7 +61,7 @@ int ocfs2_zero_extend(struct inode *inod
>  int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
>  int ocfs2_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  		  struct kstat *stat);
> -int ocfs2_permission(struct inode *inode, int mask);
> +int ocfs2_permission(struct dentry *dentry, int mask);
>  
>  int ocfs2_should_update_atime(struct inode *inode,
>  			      struct vfsmount *vfsmnt);
> Index: linux-2.6/fs/proc/base.c
> ===================================================================
> --- linux-2.6.orig/fs/proc/base.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/proc/base.c	2010-08-19 09:46:31.000000000 +0200
> @@ -2050,8 +2050,9 @@ static const struct file_operations proc
>   * /proc/pid/fd needs a special permission handler so that a process can still
>   * access /proc/self/fd after it has executed a setuid().
>   */
> -static int proc_fd_permission(struct inode *inode, int mask)
> +static int proc_fd_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	int rv;
>  
>  	rv = generic_permission(inode, mask, NULL);
> Index: linux-2.6/fs/proc/proc_sysctl.c
> ===================================================================
> --- linux-2.6.orig/fs/proc/proc_sysctl.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/proc/proc_sysctl.c	2010-08-19 09:46:31.000000000 +0200
> @@ -292,12 +292,13 @@ out:
>  	return ret;
>  }
>  
> -static int proc_sys_permission(struct inode *inode, int mask)
> +static int proc_sys_permission(struct dentry *dentry, int mask)
>  {
>  	/*
>  	 * sysctl entries that are not writeable,
>  	 * are _NOT_ writeable, capabilities or not.
>  	 */
> +	struct inode *inode = dentry->d_inode;
>  	struct ctl_table_header *head;
>  	struct ctl_table *table;
>  	int error;
> Index: linux-2.6/fs/reiserfs/xattr.c
> ===================================================================
> --- linux-2.6.orig/fs/reiserfs/xattr.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/reiserfs/xattr.c	2010-08-19 09:46:31.000000000 +0200
> @@ -954,8 +954,10 @@ static int xattr_mount_check(struct supe
>  	return 0;
>  }
>  
> -int reiserfs_permission(struct inode *inode, int mask)
> +int reiserfs_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
> +
>  	/*
>  	 * We don't do permission checks on the internal objects.
>  	 * Permissions are determined by the "owning" object.
> Index: linux-2.6/fs/smbfs/file.c
> ===================================================================
> --- linux-2.6.orig/fs/smbfs/file.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/smbfs/file.c	2010-08-19 09:46:31.000000000 +0200
> @@ -408,9 +408,9 @@ smb_file_release(struct inode *inode, st
>   * privileges, so we need our own check for this.
>   */
>  static int
> -smb_file_permission(struct inode *inode, int mask)
> +smb_file_permission(struct dentry *dentry, int mask)
>  {
> -	int mode = inode->i_mode;
> +	int mode = dentry->d_inode->i_mode;
>  	int error = 0;
>  
>  	VERBOSE("mode=%x, mask=%x\n", mode, mask);
> Index: linux-2.6/fs/sysfs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/sysfs/inode.c	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/sysfs/inode.c	2010-08-19 09:46:31.000000000 +0200
> @@ -348,8 +348,9 @@ int sysfs_hash_and_remove(struct sysfs_d
>  		return -ENOENT;
>  }
>  
> -int sysfs_permission(struct inode *inode, int mask)
> +int sysfs_permission(struct dentry *dentry, int mask)
>  {
> +	struct inode *inode = dentry->d_inode;
>  	struct sysfs_dirent *sd = inode->i_private;
>  
>  	mutex_lock(&sysfs_mutex);
> Index: linux-2.6/fs/sysfs/sysfs.h
> ===================================================================
> --- linux-2.6.orig/fs/sysfs/sysfs.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/fs/sysfs/sysfs.h	2010-08-19 09:46:31.000000000 +0200
> @@ -200,7 +200,7 @@ static inline void __sysfs_put(struct sy
>  struct inode *sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd);
>  void sysfs_evict_inode(struct inode *inode);
>  int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
> -int sysfs_permission(struct inode *inode, int mask);
> +int sysfs_permission(struct dentry *dentry, int mask);
>  int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
>  int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
>  int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> Index: linux-2.6/include/linux/coda_linux.h
> ===================================================================
> --- linux-2.6.orig/include/linux/coda_linux.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/include/linux/coda_linux.h	2010-08-19 09:46:31.000000000 +0200
> @@ -37,7 +37,7 @@ extern const struct file_operations coda
>  /* operations shared over more than one file */
>  int coda_open(struct inode *i, struct file *f);
>  int coda_release(struct inode *i, struct file *f);
> -int coda_permission(struct inode *inode, int mask);
> +int coda_permission(struct dentry *dentry, int mask);
>  int coda_revalidate_inode(struct dentry *);
>  int coda_getattr(struct vfsmount *, struct dentry *, struct kstat *);
>  int coda_setattr(struct dentry *, struct iattr *);
> Index: linux-2.6/include/linux/nfs_fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/nfs_fs.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/include/linux/nfs_fs.h	2010-08-19 09:46:31.000000000 +0200
> @@ -348,7 +348,7 @@ extern int nfs_refresh_inode(struct inod
>  extern int nfs_post_op_update_inode(struct inode *inode, struct nfs_fattr *fattr);
>  extern int nfs_post_op_update_inode_force_wcc(struct inode *inode, struct nfs_fattr *fattr);
>  extern int nfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
> -extern int nfs_permission(struct inode *, int);
> +extern int nfs_permission(struct dentry *, int);
>  extern int nfs_open(struct inode *, struct file *);
>  extern int nfs_release(struct inode *, struct file *);
>  extern int nfs_attribute_timeout(struct inode *inode);
> Index: linux-2.6/include/linux/reiserfs_xattr.h
> ===================================================================
> --- linux-2.6.orig/include/linux/reiserfs_xattr.h	2010-08-19 09:45:30.000000000 +0200
> +++ linux-2.6/include/linux/reiserfs_xattr.h	2010-08-19 09:46:31.000000000 +0200
> @@ -41,7 +41,7 @@ int reiserfs_xattr_init(struct super_blo
>  int reiserfs_lookup_privroot(struct super_block *sb);
>  int reiserfs_delete_xattrs(struct inode *inode);
>  int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs);
> -int reiserfs_permission(struct inode *inode, int mask);
> +int reiserfs_permission(struct dentry *dentry, int mask);
>  
>  #ifdef CONFIG_REISERFS_FS_XATTR
>  #define has_xattr_dir(inode) (REISERFS_I(inode)->i_flags & i_has_xattr_dir)
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode
  2010-08-26 20:24   ` David P. Quigley
@ 2010-08-27  4:11     ` Neil Brown
  2010-08-27 18:13       ` David P. Quigley
  2010-08-27 18:31       ` David P. Quigley
  0 siblings, 2 replies; 49+ messages in thread
From: Neil Brown @ 2010-08-27  4:11 UTC (permalink / raw)
  To: David P. Quigley
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Thu, 26 Aug 2010 16:24:02 -0400
"David P. Quigley" <dpquigl@tycho.nsa.gov> wrote:

> I may be missing something but I looked at your patch series and I see
> no good reason for this patch at all. You just churned a lot of code for
> something that you don't even have a need for in the patch set. Your
> only two new callers of this function could just as easily have used the
> inode since it isn't doing anything special with the dentry. It actually
> pulls the inode out of it and uses it in generic_permission and
> security_inode_permission. If you are going to change this you should
> also change generic_permission as well. Honestly I'd rather see the
> dentry requirement removed from inode operations instead but
> unfortunately this isn't possible as I found out with my attempts to
> remove the dentry requirement for get/setxattr


union_permission needs the dentry to get access to d_fsdata, which caches the
upperpath and lowerpath which were found at lookup time.

Is that what you missed?

NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-26 18:33 [PATCH 0/5] hybrid union filesystem prototype Miklos Szeredi
                   ` (4 preceding siblings ...)
  2010-08-26 18:33 ` [PATCH 5/5] union: hybrid union filesystem prototype Miklos Szeredi
@ 2010-08-27  7:05 ` Neil Brown
  2010-08-27  8:47   ` Miklos Szeredi
  5 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-08-27  7:05 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Thu, 26 Aug 2010 20:33:40 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> This is the result of my experimentation with trying to do
> union-mounts like semantics with a filesystem.  The implementation is
> far from perfect, but I think the concept does sort of work.  See the
> patch header for the union filesystem.
> 
> VFS modifications necessary to make it work:
> 
>   - allow f_op->open() to return a different file
>   - pass dentry to i_op->permission() instead of inode
>   - hack to vfs_rename() to allow rename to same inode

Hi Miklos,

 My first problem with this that there isn't nearly enough documentation.
So I offer the following to fix this problem.  Please correct anything that I
got glaringly wrong.  I don't claim it is at all complete, but touches on the
things that I thought were interesting.

NeilBrown



Hybrid Union Filesystem
=======================

This document describes a prototype for a new approach to providing
union-filesystem functionality in Linux.
A union-filesystem tries to present the union of two different filesystems as
though it were a single filesystem.  The result will inevitably fail to look
exactly like a normal filesystem for various technical reasons.  The
expectation is that many use cases will be able to ignore these differences.

This approach is 'hybrid' because the objects that appear in the filesystem
do not all appear to belong to that filesystem.  In many case an object
accessed in the hybrid-union will be indistinguishable from accessing the
corresponding object from the original filesystem.  This is most obvious
from the 'st_dev' field returned by stat(2).  Some objects will report an
st_dev from one original filesystem, some from the other, none will report an
st_dev from the union itself.  Similarly st_ino will only be unique when
combined with st_dev, and both of these can change over the lifetime of an
object.
Many applications and tools ignore these values and will not be affected.

Upper and Lower
---------------

A union filesystem combines two filesystems - an 'upper' filesystem and a
'lower' filesystem.  Note that while in set theory, 'union' is a commutative
operation, in filesystems it is not - the two filesystems are treated
differently.  When a name exists in both filesystems, the object in the
'upper' filesystem is visible while the object in the 'lower' filesystem is
either hidden or, in the case of directories, merged with the 'upper' object.

It would be more correct to refer to an upper and lower 'directory tree'
rather than 'filesystem' as it is quite possible for both directory trees to
be in the same filesystem and there is no requirement that the root of a
filesystem be given for either upper or lower.

The lower filesystem can be any filesystem supported by Linux and does not
need to be writable.  It could even be another union.
The upper filesystem will normally be writeable and if it is it must support
the creation of trusted.* extended attributes, and must provide valid d_type
in readdir responses, at least for symbolic links - so NFS is not suitable.

A read-only union of two read-only filesystems is support for any filesystem
type.

Directories
-----------

Unioning mainly involved directories.  If a given name appears in both upper
ad lower filesystems and refers to a non-directory in either, then the lower
object is hidden - the name refers only to the upper object.

Where both upper and lower objects are directories, a merged directory is
formed.

At mount time, the two directories given as mount options are combined
into a merged directory.  Then whenever a lookup is requested in such a
merged directory, the lookup is performed in each actual directory and the
combined result is cached in the dentry belonging to the union filesystem.
If both actual lookups find directories, both are stored and a merged
directory is create, otherwise only one is stored: the upper if it exists,
else the lower.

Only the lists of names from directories are merged.  Other content such as
metadata and extended attributes are reported for the upper directory only.
These attributes of the lower directory are hidden.

whiteouts and opaque directories
--------------------------------

In order to support rm and rmdir without changing the lower filesystem, a
union filesystem needs to record in the upper filesystem that files have been
removed.  This is done using whiteouts and opaque directories (non-directories
are always opaque).

The hybrid union filesystem uses extended attributes with a "trusted.union."
prefix to record these details.

A whiteout is created as a symbolic link with target "(union-whiteout)" and
with xattr "trusted.union.whiteout" set to "y".  When a whiteout is found in
the upper level of a merged directory, any matching name in the lower level
is ignored, and the whiteout itself is also hidden.

A directory is made opaque by setting the xattr "trusted.union.opaque" to "y".
Where the upper filesystem contains an opaque directory, any directory in the
lower filesystem with the same name is ignored.

readdir
-------

When a 'readdir' request is made on a merged directory, the upper and lower
directories are each read and the name lists merged in the obvious way (upper
is read first, then lower - entries that already exist are not re-added).
This merged name list is cached in the 'struct file' and so remains as long as
the file is kept open.  If the directory is opened and read by two processes
at the same time, they will each have separate caches.
A seekdir to the start of the directory (offset 0) followed by a readdir will
cause the cache to be discarded and rebuilt.

This means that changes to the merged directory do not appear while a
directory is being read.  This is unlikely to be noticed by many programs.

seek offsets are assigned sequentially when the directories are read.  Thus
if
  - read part of a directory
  - remember an offset, and close the directory
  - re-open the directory some time later
  - seek to the remembered offset

there may be little correlation between the old and new locations in the list
of filenames, particularly if anything has changed in the directory.

Readdir on directories that are not merged is simply handled by the
underlying directory (upper or lower).


Non-directories
---------------

Objects that are not directories (files, symlinks, device-special files etc)
are presented either from the upper or lower filesystem as appropriate.
When a file in the lower filesystem is accessed in a way the requires
write-access; such as opening for write access, changing some metadata etc,
the file is first copied from the lower filesystem to the upper filesystem
(copy_up).  Note that creating a hard-link also requires copy-up, though of
course creation of a symlink does not.

The copy_up process first makes sure that the containing directory exists
in the upper filesystem - creating it and any parents as necessary.  It then
creates the object with the same metadata (owner, mode, mtime, symlink-target
etc) and then if the object is a file, the data is copied from the lower to
the upper filesystem.  Finally any extended attributes are copied up.

Once the copy_up is complete, the hybrid-union filesystem simply provides
direct access to the newly created file in the upper filesystem - future
operations on the file are barely noticed by the hybrid-union (though an
operation on the name of the file such as rename or unlink will of course be
noticed and handled).

Changes to underlying filesystems
---------------------------------

The hybrid-union filesystem does not insist that the upper and lower
filesystems that it combines remain read-only.  This is certainly preferred
but it is non-trivial to enforce (e.g. for NFS) and so cannot be depended on.
A system integrator should ensure the underlying filesystems are not changed
- except through the hybrid union - if they want predictable behaviour, or
must accept the consequences.
The hybrid-union filesystem only ensures that unexpected changes do not cause
a system crash or serious corruption to either underlying filesystem.

The exact behaviour of the hybrid-union is best described in terms from the
'dcache' and as objects can expire from the 'dcache' at unpredictable times,
the exact behaviour when the underlying filesystems change is in
general unpredictable.

Each dcache entry in the hybrid union holds a reference to a corresponding
dcache entry in upper or lower filesystem or, in the case of merged
directories, in both the upper and lower filesystems.

If a referenced file is rename directly in one of the filesystems, that
renaming will not be noticed in the hybrid union until the referencing dcache
entry expires.

If a referenced file is unlinked from a local filesystem, it could similarly
still be accessibly through the hybrid while the dcache entry persists.  If a
referenced file is on an NFS filesystem and is unlinked remotely, the file in
the hybrid-union will become 'stale' returning ESTALE for most accesses.  It
will not disappear until the dcache entry expires from the dcache.

If a referenced file is changed, either contents or metadata, that change
will be immediately visible in the hybrid union.

If the contents of a directory are changed, then a readdir will not see that
change unless it opens the file or seeks to the start of the file after the
change was made.  Then it will be visible.

Changes will *not* be visible to a lookup if the name that is changed is
currently cached in the the dcache, whether positive or negative.
e.g. If a name is looked for in the hybrid union and not found, a negative
dcache entry will be created.  If the same name is directly created in the
upper filesystem it will not immediately become visible to the hybrid, though
it will be visible in a readdir.  This could be confusing, and could
probably be fixed using dcache revalidation.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-27  7:05 ` [PATCH 0/5] " Neil Brown
@ 2010-08-27  8:47   ` Miklos Szeredi
  2010-08-27 11:35     ` Neil Brown
  0 siblings, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-27  8:47 UTC (permalink / raw)
  To: Neil Brown
  Cc: miklos, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

Hi Neil,

On Fri, 27 Aug 2010, Neil Brown wrote:
>  My first problem with this that there isn't nearly enough documentation.
> So I offer the following to fix this problem.  Please correct anything that I
> got glaringly wrong.  I don't claim it is at all complete, but touches on the
> things that I thought were interesting.

Whoa, thank you very much.  I'll add it to the patchset with some
fixes (see below).

> Hybrid Union Filesystem
> =======================
> 
> This document describes a prototype for a new approach to providing
> union-filesystem functionality in Linux.
> A union-filesystem tries to present the union of two different filesystems as
> though it were a single filesystem.  The result will inevitably fail to look
> exactly like a normal filesystem for various technical reasons.  The
> expectation is that many use cases will be able to ignore these differences.
> 
> This approach is 'hybrid' because the objects that appear in the filesystem
> do not all appear to belong to that filesystem.  In many case an object
> accessed in the hybrid-union will be indistinguishable from accessing the
> corresponding object from the original filesystem.  This is most obvious
> from the 'st_dev' field returned by stat(2).  Some objects will report an
> st_dev from one original filesystem, some from the other, none will report an
> st_dev from the union itself.

Hmm, that's a bug.  Directories actually come from the union itself
for various reasons, and it does report the correct st_ino but not
st_dev.  Fixed.

>  Similarly st_ino will only be unique when
> combined with st_dev, and both of these can change over the lifetime of an
> object.
> Many applications and tools ignore these values and will not be affected.
> 
> Upper and Lower
> ---------------
> 
> A union filesystem combines two filesystems - an 'upper' filesystem and a
> 'lower' filesystem.  Note that while in set theory, 'union' is a commutative
> operation, in filesystems it is not - the two filesystems are treated
> differently.  When a name exists in both filesystems, the object in the
> 'upper' filesystem is visible while the object in the 'lower' filesystem is
> either hidden or, in the case of directories, merged with the 'upper' object.
> 
> It would be more correct to refer to an upper and lower 'directory tree'
> rather than 'filesystem' as it is quite possible for both directory trees to
> be in the same filesystem and there is no requirement that the root of a
> filesystem be given for either upper or lower.
> 
> The lower filesystem can be any filesystem supported by Linux and does not
> need to be writable.  It could even be another union.

Almost.  Xattr namespace issues currently prevent that, but with
escaping it could be worked around if necessary.

> The upper filesystem will normally be writeable and if it is it must support
> the creation of trusted.* extended attributes, and must provide valid d_type
> in readdir responses, at least for symbolic links - so NFS is not suitable.
> 
> A read-only union of two read-only filesystems is support for any filesystem
> type.

s/is support for/may use/

> 
> Directories
> -----------
> 
> Unioning mainly involved directories.  If a given name appears in both upper
> ad lower filesystems and refers to a non-directory in either, then the lower
> object is hidden - the name refers only to the upper object.
> 
> Where both upper and lower objects are directories, a merged directory is
> formed.
> 
> At mount time, the two directories given as mount options are combined
> into a merged directory.  Then whenever a lookup is requested in such a
> merged directory, the lookup is performed in each actual directory and the
> combined result is cached in the dentry belonging to the union filesystem.
> If both actual lookups find directories, both are stored and a merged
> directory is create, otherwise only one is stored: the upper if it exists,
> else the lower.
> 
> Only the lists of names from directories are merged.  Other content such as
> metadata and extended attributes are reported for the upper directory only.
> These attributes of the lower directory are hidden.
> 
> whiteouts and opaque directories
> --------------------------------
> 
> In order to support rm and rmdir without changing the lower filesystem, a
> union filesystem needs to record in the upper filesystem that files have been
> removed.  This is done using whiteouts and opaque directories (non-directories
> are always opaque).
> 
> The hybrid union filesystem uses extended attributes with a "trusted.union."
> prefix to record these details.
> 
> A whiteout is created as a symbolic link with target "(union-whiteout)" and
> with xattr "trusted.union.whiteout" set to "y".  When a whiteout is found in
> the upper level of a merged directory, any matching name in the lower level
> is ignored, and the whiteout itself is also hidden.
> 
> A directory is made opaque by setting the xattr "trusted.union.opaque" to "y".
> Where the upper filesystem contains an opaque directory, any directory in the
> lower filesystem with the same name is ignored.
> 
> readdir
> -------
> 
> When a 'readdir' request is made on a merged directory, the upper and lower
> directories are each read and the name lists merged in the obvious way (upper
> is read first, then lower - entries that already exist are not re-added).
> This merged name list is cached in the 'struct file' and so remains as long as
> the file is kept open.  If the directory is opened and read by two processes
> at the same time, they will each have separate caches.
> A seekdir to the start of the directory (offset 0) followed by a readdir will
> cause the cache to be discarded and rebuilt.
> 
> This means that changes to the merged directory do not appear while a
> directory is being read.  This is unlikely to be noticed by many programs.
> 
> seek offsets are assigned sequentially when the directories are read.  Thus
> if
>   - read part of a directory
>   - remember an offset, and close the directory
>   - re-open the directory some time later
>   - seek to the remembered offset
> 
> there may be little correlation between the old and new locations in the list
> of filenames, particularly if anything has changed in the directory.
> 
> Readdir on directories that are not merged is simply handled by the
> underlying directory (upper or lower).
> 
> 
> Non-directories
> ---------------
> 
> Objects that are not directories (files, symlinks, device-special files etc)
> are presented either from the upper or lower filesystem as appropriate.
> When a file in the lower filesystem is accessed in a way the requires
> write-access; such as opening for write access, changing some metadata etc,
> the file is first copied from the lower filesystem to the upper filesystem
> (copy_up).  Note that creating a hard-link also requires copy-up, though of
> course creation of a symlink does not.
> 
> The copy_up process first makes sure that the containing directory exists
> in the upper filesystem - creating it and any parents as necessary.  It then
> creates the object with the same metadata (owner, mode, mtime, symlink-target
> etc) and then if the object is a file, the data is copied from the lower to
> the upper filesystem.  Finally any extended attributes are copied up.
> 
> Once the copy_up is complete, the hybrid-union filesystem simply provides
> direct access to the newly created file in the upper filesystem - future
> operations on the file are barely noticed by the hybrid-union (though an
> operation on the name of the file such as rename or unlink will of course be
> noticed and handled).
> 
> Changes to underlying filesystems
> ---------------------------------
> 

For now I refuse to even think about what happens in this case.

The easiest way out of this mess might simply be to enforce exclusive
modification to the underlying filesystems on a local level, same as
the union mount strategy.  For NFS and other remote filesystems we
either

 a) add some way to enforce it,
 b) live with the consequences if not enforced on the system level, or
 c) disallow them to be part of the union.


> The hybrid-union filesystem does not insist that the upper and lower
> filesystems that it combines remain read-only.  This is certainly preferred
> but it is non-trivial to enforce (e.g. for NFS) and so cannot be depended on.
> A system integrator should ensure the underlying filesystems are not changed
> - except through the hybrid union - if they want predictable behaviour, or
> must accept the consequences.
> The hybrid-union filesystem only ensures that unexpected changes do not cause
> a system crash or serious corruption to either underlying filesystem.
> 
> The exact behaviour of the hybrid-union is best described in terms from the
> 'dcache' and as objects can expire from the 'dcache' at unpredictable times,
> the exact behaviour when the underlying filesystems change is in
> general unpredictable.
> 
> Each dcache entry in the hybrid union holds a reference to a corresponding
> dcache entry in upper or lower filesystem or, in the case of merged
> directories, in both the upper and lower filesystems.
> 
> If a referenced file is rename directly in one of the filesystems, that
> renaming will not be noticed in the hybrid union until the referencing dcache
> entry expires.
> 
> If a referenced file is unlinked from a local filesystem, it could similarly
> still be accessibly through the hybrid while the dcache entry persists.  If a
> referenced file is on an NFS filesystem and is unlinked remotely, the file in
> the hybrid-union will become 'stale' returning ESTALE for most accesses.  It
> will not disappear until the dcache entry expires from the dcache.
> 
> If a referenced file is changed, either contents or metadata, that change
> will be immediately visible in the hybrid union.
> 
> If the contents of a directory are changed, then a readdir will not see that
> change unless it opens the file or seeks to the start of the file after the
> change was made.  Then it will be visible.
> 
> Changes will *not* be visible to a lookup if the name that is changed is
> currently cached in the the dcache, whether positive or negative.
> e.g. If a name is looked for in the hybrid union and not found, a negative
> dcache entry will be created.  If the same name is directly created in the
> upper filesystem it will not immediately become visible to the hybrid, though
> it will be visible in a readdir.  This could be confusing, and could
> probably be fixed using dcache revalidation.
> 

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-27  8:47   ` Miklos Szeredi
@ 2010-08-27 11:35     ` Neil Brown
  2010-08-27 16:53       ` Miklos Szeredi
  2010-08-30 18:38       ` Valerie Aurora
  0 siblings, 2 replies; 49+ messages in thread
From: Neil Brown @ 2010-08-27 11:35 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Fri, 27 Aug 2010 10:47:39 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> Hi Neil,
> 
> On Fri, 27 Aug 2010, Neil Brown wrote:
> >  My first problem with this that there isn't nearly enough documentation.
> > So I offer the following to fix this problem.  Please correct anything that I
> > got glaringly wrong.  I don't claim it is at all complete, but touches on the
> > things that I thought were interesting.
> 
> Whoa, thank you very much.  I'll add it to the patchset with some
> fixes (see below).
> 
> > Hybrid Union Filesystem
> > =======================
> > 
> > This document describes a prototype for a new approach to providing
> > union-filesystem functionality in Linux.
> > A union-filesystem tries to present the union of two different filesystems as
> > though it were a single filesystem.  The result will inevitably fail to look
> > exactly like a normal filesystem for various technical reasons.  The
> > expectation is that many use cases will be able to ignore these differences.
> > 
> > This approach is 'hybrid' because the objects that appear in the filesystem
> > do not all appear to belong to that filesystem.  In many case an object
> > accessed in the hybrid-union will be indistinguishable from accessing the
> > corresponding object from the original filesystem.  This is most obvious
> > from the 'st_dev' field returned by stat(2).  Some objects will report an
> > st_dev from one original filesystem, some from the other, none will report an
> > st_dev from the union itself.
> 
> Hmm, that's a bug.  Directories actually come from the union itself
> for various reasons, and it does report the correct st_ino but not
> st_dev.  Fixed.

I was wondering why you even bothered to fiddle st_ino.  Given that lots of
other things come from one fs or the other, having the merged directories
appear to be from the upper filesystem doesn't seem like a problem.  I don't
really care either way though.

> > 
> > The lower filesystem can be any filesystem supported by Linux and does not
> > need to be writable.  It could even be another union.
> 
> Almost.  Xattr namespace issues currently prevent that, but with
> escaping it could be worked around if necessary.

But you never access the xattrs for the lower directory so that shouldn't
matter.
Have a union for the upper fs would certainly be sufficiently 'interesting'
to explicitly forbid.

> > 
> > Changes to underlying filesystems
> > ---------------------------------
> > 
> 
> For now I refuse to even think about what happens in this case.
> 
> The easiest way out of this mess might simply be to enforce exclusive
> modification to the underlying filesystems on a local level, same as
> the union mount strategy.  For NFS and other remote filesystems we
> either
> 
>  a) add some way to enforce it,
>  b) live with the consequences if not enforced on the system level, or
>  c) disallow them to be part of the union.
> 

I actually think that your approach can work quite will with either the
upper or lower changing independently.  Certainly it can produce some odd
situations, but even NFS can do that (though maybe not quite so odd).

It think the best approach would be to handle the few that can be handled and
explicitly document the rest - people might even find them useful.


Anyway, here is my next instalment which are the review comments, now that I
have some documentation to read :-)

In no particular order:

1/ You call union_remove_whiteouts on an upper directory once you have
   determined that the merged directory is empty, which implies that the only
   things in the the upper directory are whiteouts.
   union_remove_whiteouts calls union_unlink_writeout on every entry.
   It checks that each entry is a DT_LNK - which we assume it must be - but
   doesn't check the the inode there is really a whiteout.
   It seem inconsistent to do the one check but not the other.

   As this could race with independent changes on the upper filesystem, I
   thick it would be safest to at least check the i_mode and i_size of the
   dentry->d_inode that is found, and possible do a readlink as well to
   ensure we only delete whiteouts.

1a/ A minor optimisation for union_is_white would be to check i_size matches
    size of (union-whiteout)

2/ It bothers me that the 'dev_name' arg to union_get_sb is unused, yet there
   are mandatory options.  I think it would be nice if dev_name were required
   to be lower,upper and options were left for other things.
   So the typical usage would be:

       mount -t union /path/to/lower,/path/to/upper /mount/point

3/ I think it would be great if you made use of d_revalidate to handle some
   of the worst problems caused by underlying filesystem changes.  If you
   cache i_mtime and i_version of the parent in the dentry and re-do the
   lookup if either is different from the directory you could hide most of
   the issues mentioned in the doco about dentries expiring.  And if you
   called d_revalidate in the underlying filesystem at the same time you could
   probably hide all the rest.

4/ The problem you mentioned about ->i_mutex which is taken during unlink
   being quasi-global could be eased somewhat by simply having a small pool
   of inodes and assigning them to dentries pseudo-randomly.  Or possibly
   having one 'file' inode per directory (that might be a bit wasteful
   though).

5/ I wonder if it would be useful to recognise 'fallthrough' objects (much 
   like whiteouts but inverted) and to optionally (based on mount option) copy
   up any directory on readdir and fill it with fall-throughs for any lower
   name that isn't in the upper.  This helps with enormous directories
   (though not if upper is RAMFS of course) and ensures a stable directory
   cookie.


While I like the idea of being able to work with changeable filesystems and
think most scenarios can be handled adequately, there is one that I'm not
comfortable with.
If you open a file readonly and get a handle on the file in the lower
filesystem, and then you fchmod the handle, it will work but will change the
lower filesystem - which you don't really want (I think).
The only way to avoid this currently is to require the lower vfsmnt to be
mounted readonly.  That is probably acceptable (it can be mounted read-write
elsewhere) but I'd like it to be easier than that.
An alternate would be to teach mnt_want_write_file about some new flag which
marks the 'struct file' as 'really-readonly' and have union_open set that
flag.  Note sure if I really like that or not.
Probably for now just:

6/  require  __mnt_is_readonly(lowerpath.mnt) at mount time.

Finally, I think it is really important to document all the non-standard
aspects of the unioned filesystem in some detail and suggest work-around for
potentially problematic behaviour.
The fact that a read-only open can get a lower-filesystem filehandle has a
lot of interesting consequences.
If you take a read-lock, it doesn't stop an other process writing to the file
(though it does stop the file from changing!).
If you request a DNOTIFY on a directory, you will not see when things get
added.
If you take a lease on a file it won't be broken by someone trying to write.

So if you:
  - create a union.
  - start a daemon that watches for changes
  - make changes

The daemon could never notice.  Of course if you stop everything and restart
it will then work smoothly, as the 'make-changes' will have copied-up the
directory.

Most such confusion can be easily avoided by 'touching' any file or directory
that will be monitored for changes or will be involved in locking.  Having to
do that after creating a union is a hassle, but only a minor hassle I think,
and may not actually be needed at all in many cases.

On the whole, I really like this approach.  It strikes a good balance between
simplicity and functionality.  It doesn't provide perfect semantics, but I
think it provides good-enough semantics and a very moderate cost.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-27 11:35     ` Neil Brown
@ 2010-08-27 16:53       ` Miklos Szeredi
  2010-08-29  4:42         ` Neil Brown
  2010-08-30 18:38       ` Valerie Aurora
  1 sibling, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-27 16:53 UTC (permalink / raw)
  To: Neil Brown
  Cc: miklos, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Fri, 27 Aug 2010, Neil Brown wrote:
> I was wondering why you even bothered to fiddle st_ino.  Given that lots of
> other things come from one fs or the other, having the merged directories
> appear to be from the upper filesystem doesn't seem like a problem.  I don't
> really care either way though.

"rm -rf" complains if st_ino of a directory changes spontaneously.

> > > The lower filesystem can be any filesystem supported by Linux and does not
> > > need to be writable.  It could even be another union.
> > 
> > Almost.  Xattr namespace issues currently prevent that, but with
> > escaping it could be worked around if necessary.
> 
> But you never access the xattrs for the lower directory so that shouldn't
> matter.

Ah, right.  Another small issue is that currently unionfs accesses
inode->i_* from the underlying filesystems instead of calling
->getattr(), which will break if the underlying fs is a union with its
dummy inodes.  But that should be easy to fix.

> I actually think that your approach can work quite will with either the
> upper or lower changing independently.  Certainly it can produce some odd
> situations, but even NFS can do that (though maybe not quite so odd).
> 
> It think the best approach would be to handle the few that can be handled and
> explicitly document the rest - people might even find them useful.

I think it's best to leave that stuff until someone actually cares.
The "people might find it useful" argument is not strong enough to put
nontrivial effort into thinking about all the weird corner cases.

> Anyway, here is my next instalment which are the review comments, now that I
> have some documentation to read :-)
> 
> In no particular order:
> 
> 1/ You call union_remove_whiteouts on an upper directory once you have
>    determined that the merged directory is empty, which implies that the only
>    things in the the upper directory are whiteouts.
>    union_remove_whiteouts calls union_unlink_writeout on every entry.
>    It checks that each entry is a DT_LNK - which we assume it must be - but
>    doesn't check the the inode there is really a whiteout.
>    It seem inconsistent to do the one check but not the other.

The DT_LNK check is done to filter out "." and "..".  Added comment.

>    As this could race with independent changes on the upper filesystem, I
>    thick it would be safest to at least check the i_mode and i_size of the
>    dentry->d_inode that is found, and possible do a readlink as well to
>    ensure we only delete whiteouts.
> 
> 1a/ A minor optimisation for union_is_white would be to check i_size matches
>     size of (union-whiteout)

I'm not fond of relying on inode->i_* members directly as unionfs
itself doesn't play by those rules.  But maybe it's OK here as
anything wanting to be an upper filesystem will be sufficiently
"normal" for this to work.

Fixed.

> 2/ It bothers me that the 'dev_name' arg to union_get_sb is unused, yet there
>    are mandatory options.  I think it would be nice if dev_name were required
>    to be lower,upper and options were left for other things.
>    So the typical usage would be:
> 
>        mount -t union /path/to/lower,/path/to/upper /mount/point
> 

I think that's a matter of taste.  The 'dev_name' argument is just a
specialized option, and when that option needs a structure like your
example then IMO it's better to just move it to normal options.

> 3/ I think it would be great if you made use of d_revalidate to handle some
>    of the worst problems caused by underlying filesystem changes.  If you
>    cache i_mtime and i_version of the parent in the dentry and re-do the
>    lookup if either is different from the directory you could hide most of
>    the issues mentioned in the doco about dentries expiring.  And if you
>    called d_revalidate in the underlying filesystem at the same time you could
>    probably hide all the rest.

As I said, I'd leave it until someone actually needs this.

> 
> 4/ The problem you mentioned about ->i_mutex which is taken during unlink
>    being quasi-global could be eased somewhat by simply having a small pool
>    of inodes and assigning them to dentries pseudo-randomly.  Or possibly
>    having one 'file' inode per directory (that might be a bit wasteful
>    though).

That's an idea, but I'm inclined just to add some hacks to the VFS to
omit the locking if some inode flag is set.

> 5/ I wonder if it would be useful to recognise 'fallthrough' objects (much 
>    like whiteouts but inverted) and to optionally (based on mount option) copy
>    up any directory on readdir and fill it with fall-throughs for any lower
>    name that isn't in the upper.  This helps with enormous directories
>    (though not if upper is RAMFS of course) and ensures a stable directory
>    cookie.

I'm not sure if the stable directory cookie problem is important
enough.  AFAIR some ancient versions of libc relied on directory
seeking and also some weird apps might, but anything sane will not
touch that interface (and I'm hoping someday we can get rid of it for
good).

As for caching large directories, I think that's best done with the
page cache, not by permanently copying up the contents to the upper
directory.

> While I like the idea of being able to work with changeable filesystems and
> think most scenarios can be handled adequately, there is one that I'm not
> comfortable with.
> If you open a file readonly and get a handle on the file in the lower
> filesystem, and then you fchmod the handle, it will work but will change the
> lower filesystem - which you don't really want (I think).
> The only way to avoid this currently is to require the lower vfsmnt to be
> mounted readonly.  That is probably acceptable (it can be mounted read-write
> elsewhere) but I'd like it to be easier than that.
> An alternate would be to teach mnt_want_write_file about some new flag which
> marks the 'struct file' as 'really-readonly' and have union_open set that
> flag.  Note sure if I really like that or not.
> Probably for now just:
> 
> 6/  require  __mnt_is_readonly(lowerpath.mnt) at mount time.

Right, that's at the front of the todo list.

> Finally, I think it is really important to document all the non-standard
> aspects of the unioned filesystem in some detail and suggest work-around for
> potentially problematic behaviour.

I agree completely.  I just tend to write code first and documentation
later (or as late as possibly can) so your contribution in this area
really warms my heart :)

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode
  2010-08-27  4:11     ` Neil Brown
@ 2010-08-27 18:13       ` David P. Quigley
  2010-08-27 19:21         ` Valerie Aurora
  2010-08-27 18:31       ` David P. Quigley
  1 sibling, 1 reply; 49+ messages in thread
From: David P. Quigley @ 2010-08-27 18:13 UTC (permalink / raw)
  To: Neil Brown
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Fri, 2010-08-27 at 14:11 +1000, Neil Brown wrote:
> On Thu, 26 Aug 2010 16:24:02 -0400
> "David P. Quigley" <dpquigl@tycho.nsa.gov> wrote:
> 
> > I may be missing something but I looked at your patch series and I see
> > no good reason for this patch at all. You just churned a lot of code for
> > something that you don't even have a need for in the patch set. Your
> > only two new callers of this function could just as easily have used the
> > inode since it isn't doing anything special with the dentry. It actually
> > pulls the inode out of it and uses it in generic_permission and
> > security_inode_permission. If you are going to change this you should
> > also change generic_permission as well. Honestly I'd rather see the
> > dentry requirement removed from inode operations instead but
> > unfortunately this isn't possible as I found out with my attempts to
> > remove the dentry requirement for get/setxattr
> 
> 
> union_permission needs the dentry to get access to d_fsdata, which caches the
> upperpath and lowerpath which were found at lookup time.
> 
> Is that what you missed?
> 

You're correct I missed the line where that was being pulled out of the
dentry. The better question for me would be why do it this way as
opposed to what the union file systems are doing. If neither UnionFS or
AUFS are having to make this change so I'd like a much better
explination for this change. I'm not seeing enough information in the
form of why he designed the prototype this way to justify a change that
the other implementations don't seem to need.

Dave


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode
  2010-08-27  4:11     ` Neil Brown
  2010-08-27 18:13       ` David P. Quigley
@ 2010-08-27 18:31       ` David P. Quigley
  1 sibling, 0 replies; 49+ messages in thread
From: David P. Quigley @ 2010-08-27 18:31 UTC (permalink / raw)
  To: Neil Brown
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

It seems that your reply to [0/5] has the description that I was asking
for in my last response.

On Fri, 2010-08-27 at 14:11 +1000, Neil Brown wrote:
> On Thu, 26 Aug 2010 16:24:02 -0400
> "David P. Quigley" <dpquigl@tycho.nsa.gov> wrote:
> 
> > I may be missing something but I looked at your patch series and I see
> > no good reason for this patch at all. You just churned a lot of code for
> > something that you don't even have a need for in the patch set. Your
> > only two new callers of this function could just as easily have used the
> > inode since it isn't doing anything special with the dentry. It actually
> > pulls the inode out of it and uses it in generic_permission and
> > security_inode_permission. If you are going to change this you should
> > also change generic_permission as well. Honestly I'd rather see the
> > dentry requirement removed from inode operations instead but
> > unfortunately this isn't possible as I found out with my attempts to
> > remove the dentry requirement for get/setxattr
> 
> 
> union_permission needs the dentry to get access to d_fsdata, which caches the
> upperpath and lowerpath which were found at lookup time.
> 
> Is that what you missed?
> 
> NeilBrown
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode
  2010-08-27 18:13       ` David P. Quigley
@ 2010-08-27 19:21         ` Valerie Aurora
  0 siblings, 0 replies; 49+ messages in thread
From: Valerie Aurora @ 2010-08-27 19:21 UTC (permalink / raw)
  To: David P. Quigley
  Cc: Neil Brown, Miklos Szeredi, linux-fsdevel, linux-kernel, viro,
	jblunck, hch

On Fri, Aug 27, 2010 at 02:13:51PM -0400, David P. Quigley wrote:
> On Fri, 2010-08-27 at 14:11 +1000, Neil Brown wrote:
> > On Thu, 26 Aug 2010 16:24:02 -0400
> > "David P. Quigley" <dpquigl@tycho.nsa.gov> wrote:
> > 
> > > I may be missing something but I looked at your patch series and I see
> > > no good reason for this patch at all. You just churned a lot of code for
> > > something that you don't even have a need for in the patch set. Your
> > > only two new callers of this function could just as easily have used the
> > > inode since it isn't doing anything special with the dentry. It actually
> > > pulls the inode out of it and uses it in generic_permission and
> > > security_inode_permission. If you are going to change this you should
> > > also change generic_permission as well. Honestly I'd rather see the
> > > dentry requirement removed from inode operations instead but
> > > unfortunately this isn't possible as I found out with my attempts to
> > > remove the dentry requirement for get/setxattr
> > 
> > 
> > union_permission needs the dentry to get access to d_fsdata, which caches the
> > upperpath and lowerpath which were found at lookup time.
> > 
> > Is that what you missed?
> > 
> 
> You're correct I missed the line where that was being pulled out of the
> dentry. The better question for me would be why do it this way as
> opposed to what the union file systems are doing. If neither UnionFS or
> AUFS are having to make this change so I'd like a much better
> explination for this change. I'm not seeing enough information in the
> form of why he designed the prototype this way to justify a change that
> the other implementations don't seem to need.

For reference, union mounts needs something like this.  The current
patch set creates a inode_permission() variant for use in the VFS in
cases where we may copy up.  The superblock of the target is read-only
and we have to ignore that - if the topmost parent directory is
unioned.

Here's the patch, which I suspect needs to be rewritten:

commit 72a84672ccf2a923bac120e3ac75ac106d36b4ae
Author: Valerie Aurora <vaurora@redhat.com>
Date:   Sat Mar 20 17:34:53 2010 -0700

    VFS: Split inode_permission() and create path_permission()
    
    Split inode_permission() into inode and file-system-dependent parts.
    Create path_permission() to check permission based on the path to the
    inode.  This is for union mounts, in which an inode can be located on
    a read-only lower layer file system but is still writable, since we
    will copy it up to the writable top layer file system.  So in that
    case, we want to ignore MS_RDONLY on the lower layer.  To make this
    decision, we must know the path (vfsmount, dentry) of both the target
    and its parent.
    
    XXX - so ugly!

diff --git a/fs/namei.c b/fs/namei.c
index 1358a89..2b2bf41 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -241,29 +241,20 @@ int generic_permission(struct inode *inode, int mask,
 }
 
 /**
- * inode_permission  -  check for access rights to a given inode
+ * __inode_permission  -  check for access rights to a given inode
  * @inode:	inode to check permission on
  * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
  * Used to check for read/write/execute permissions on an inode.
- * We use "fsuid" for this, letting us set arbitrary permissions
- * for filesystem access without changing the "normal" uids which
- * are used for other things.
+ *
+ * This does not check for a read-only file system.  You probably want
+ * inode_permission().
  */
-int inode_permission(struct inode *inode, int mask)
+static int __inode_permission(struct inode *inode, int mask)
 {
 	int retval;
 
 	if (mask & MAY_WRITE) {
-		umode_t mode = inode->i_mode;
-
-		/*
-		 * Nobody gets write access to a read-only fs.
-		 */
-		if (IS_RDONLY(inode) &&
-		    (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
-			return -EROFS;
-
 		/*
 		 * Nobody gets write access to an immutable file.
 		 */
@@ -288,6 +279,79 @@ int inode_permission(struct inode *inode, int mask)
 }
 
 /**
+ * sb_permission  -  check superblock-level permissions
+ * @sb: superblock of inode to check permission on
+ * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Separate out file-system wide checks from inode-specific permission
+ * checks.  In particular, union mounts want to check the read-only
+ * status of the top-level file system, not the lower.
+ */
+int sb_permission(struct super_block *sb, struct inode *inode, int mask)
+{
+	if (mask & MAY_WRITE) {
+		umode_t mode = inode->i_mode;
+
+		/*
+		 * Nobody gets write access to a read-only fs.
+		 */
+		if ((sb->s_flags & MS_RDONLY) &&
+		    (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
+			return -EROFS;
+	}
+	return 0;
+}
+
+/**
+ * inode_permission  -  check for access rights to a given inode
+ * @inode:	inode to check permission on
+ *
+ * Used to check for read/write/execute permissions on an inode.
+ * We use "fsuid" for this, letting us set arbitrary permissions
+ * for filesystem access without changing the "normal" uids which
+ * are used for other things.
+ */
+int inode_permission(struct inode *inode, int mask)
+{
+	int retval;
+
+	retval = sb_permission(inode->i_sb, inode, mask);
+	if (retval)
+		return retval;
+	return __inode_permission(inode, mask);
+}
+
+/**
+ * path_permission - check for inode access rights depending on path
+ * @path: path of inode to check
+ * @parent_path: path of inode's parent
+ * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Like inode_permission, but used to check for permission when the
+ * file may potentially be copied up between union layers.
+ */
+
+int path_permission(struct path *path, struct path *parent_path, int mask)
+{
+	struct vfsmount *mnt;
+	int retval;
+
+	/* Catch some reversal of args */
+	BUG_ON(!S_ISDIR(parent_path->dentry->d_inode->i_mode));
+
+	if (IS_MNT_UNION(parent_path->mnt))
+		mnt = parent_path->mnt;
+	else
+		mnt = path->mnt;
+
+	retval = sb_permission(mnt->mnt_sb, path->dentry->d_inode, mask);
+	if (retval)
+		return retval;
+	return __inode_permission(path->dentry->d_inode, mask);
+}
+
+/**
  * file_permission  -  check for additional access rights to a given file
  * @file:	file to check access rights for
  * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 78eca89..998adae 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2139,6 +2139,7 @@ extern sector_t bmap(struct inode *, sector_t);
 #endif
 extern int notify_change(struct dentry *, struct iattr *);
 extern int inode_permission(struct inode *, int);
+extern int path_permission(struct path *, struct path *, int);
 extern int generic_permission(struct inode *, int,
 		int (*check_acl)(struct inode *, int));
 extern int generic_readdir_fallthru(struct dentry *topmost_dentry, const char *name,

-VAL

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-27 16:53       ` Miklos Szeredi
@ 2010-08-29  4:42         ` Neil Brown
  2010-08-30 10:18           ` Miklos Szeredi
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-08-29  4:42 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Fri, 27 Aug 2010 18:53:45 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Fri, 27 Aug 2010, Neil Brown wrote:
> > I was wondering why you even bothered to fiddle st_ino.  Given that lots of
> > other things come from one fs or the other, having the merged directories
> > appear to be from the upper filesystem doesn't seem like a problem.  I don't
> > really care either way though.
> 
> "rm -rf" complains if st_ino of a directory changes spontaneously.

I see...
Doesn't that mean that you must always present a merged-directory if the
lower directory exists.
Otherwise if you "rm -r" in a directory that only exists in the lower
filesystem, the inode will change on the first 'unlink' call.
??

> 
> > > > The lower filesystem can be any filesystem supported by Linux and does not
> > > > need to be writable.  It could even be another union.
> > > 
> > > Almost.  Xattr namespace issues currently prevent that, but with
> > > escaping it could be worked around if necessary.
> > 
> > But you never access the xattrs for the lower directory so that shouldn't
> > matter.
> 
> Ah, right.  Another small issue is that currently unionfs accesses
> inode->i_* from the underlying filesystems instead of calling
> ->getattr(), which will break if the underlying fs is a union with its
> dummy inodes.  But that should be easy to fix.
> 
> > I actually think that your approach can work quite will with either the
> > upper or lower changing independently.  Certainly it can produce some odd
> > situations, but even NFS can do that (though maybe not quite so odd).
> > 
> > It think the best approach would be to handle the few that can be handled and
> > explicitly document the rest - people might even find them useful.
> 
> I think it's best to leave that stuff until someone actually cares.
> The "people might find it useful" argument is not strong enough to put
> nontrivial effort into thinking about all the weird corner cases.

My thought on this is that in order to describe the behaviour of the
filesystem accurately you need to think about all the weird corner cases.

Then it becomes a question of: is it easier to document the weird behaviour,
or change the code so the documentation will be easier to understand?
Some cases will go one way, some the other.

But as you suggest this isn't urgent.

> 
> > Anyway, here is my next instalment which are the review comments, now that I
> > have some documentation to read :-)
> > 
> > In no particular order:
> > 
> > 1/ You call union_remove_whiteouts on an upper directory once you have
> >    determined that the merged directory is empty, which implies that the only
> >    things in the the upper directory are whiteouts.
> >    union_remove_whiteouts calls union_unlink_writeout on every entry.
> >    It checks that each entry is a DT_LNK - which we assume it must be - but
> >    doesn't check the the inode there is really a whiteout.
> >    It seem inconsistent to do the one check but not the other.
> 
> The DT_LNK check is done to filter out "." and "..".  Added comment.
> 
> >    As this could race with independent changes on the upper filesystem, I
> >    thick it would be safest to at least check the i_mode and i_size of the
> >    dentry->d_inode that is found, and possible do a readlink as well to
> >    ensure we only delete whiteouts.
> > 
> > 1a/ A minor optimisation for union_is_white would be to check i_size matches
> >     size of (union-whiteout)
> 
> I'm not fond of relying on inode->i_* members directly as unionfs
> itself doesn't play by those rules.  But maybe it's OK here as
> anything wanting to be an upper filesystem will be sufficiently
> "normal" for this to work.
> 
> Fixed.

I see your point and I think you are right.  In most cases the filesystem or
some support-library it calls should be the only thing to look at i_*
directly.
An exception seems to be that (i_mode & S_IFMT) is often treated and publicly
viewable.  But I agree that assuming i_size means something particular is
dangerous.


> > 
> > 4/ The problem you mentioned about ->i_mutex which is taken during unlink
> >    being quasi-global could be eased somewhat by simply having a small pool
> >    of inodes and assigning them to dentries pseudo-randomly.  Or possibly
> >    having one 'file' inode per directory (that might be a bit wasteful
> >    though).
> 
> That's an idea, but I'm inclined just to add some hacks to the VFS to
> omit the locking if some inode flag is set.

Every new flag you add is another sign of weakness in the VFS....

Hmmm... vfs_unlink checks if the victim is a mountpoint, which suggests that
you can mount on a non-directory.  I'm not sure if I knew that.
I wonder how this union fs would cope if a file in the upper filesystem were
mounted-on....I suspect it would work correctly.



Two other thoughts.
My comment about set-theory unions being commutative set me thinking.  I
really don't think "union" is the right name for this thing.  There is
nothing about it which really fits that proper definition of a union.
whiteouts mean that even the list of names in a directory is not the union of
the lists of names in the upper and lower directories.
"overlay" is a much more accurate name.  But union seems to be the name
that is most used.  I wonder if it is too late to change that.

Also, dnotify (and presumably inotifiy) doesn't work reliably in the current
implementation.
It works for opaque directories and those which don't have a namesake in the
lower filesystem, but for others it never notifies of changes to any files in
the directory.
This is because dnotify will set an fsnotifier on the merged-directory in the
union-fs, but the change happens to the file in the upper fs, so
fsnotify_parent will only call notifiers on the parent in the upper fs.

I think the way to fix this would involve the union-fs putting a notifier on
the upper dir to match whatever is on the merged-dir.  However the filesystem
currently isn't told when notifiers are attach to an inode.  I think it would
be good to add an inode_operation which is called whenever the notifiers are
changed.  This would also allow a networked filesystem to report notification
if the protocol supported it.
Does that seem reasonable?

NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-29  4:42         ` Neil Brown
@ 2010-08-30 10:18           ` Miklos Szeredi
  2010-08-30 11:40             ` Neil Brown
                               ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-30 10:18 UTC (permalink / raw)
  To: Neil Brown
  Cc: miklos, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Sun, 29 Aug 2010, Neil Brown wrote:
> On Fri, 27 Aug 2010 18:53:45 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > On Fri, 27 Aug 2010, Neil Brown wrote:
> > > I was wondering why you even bothered to fiddle st_ino.  Given
> > > that lots of other things come from one fs or the other, having
> > > the merged directories appear to be from the upper filesystem
> > > doesn't seem like a problem.  I don't really care either way
> > > though.
> > 
> > "rm -rf" complains if st_ino of a directory changes spontaneously.
> 
> I see...
> Doesn't that mean that you must always present a merged-directory if the
> lower directory exists.

Directory opens are never "forwarded" to the lower filesystem like
other opens.  One reason is that lookups continuing from the file's
path need to be on the union filesystem instead on one of the
underlying filesystems.

> Otherwise if you "rm -r" in a directory that only exists in the lower
> filesystem, the inode will change on the first 'unlink' call.
> ??

Right.

Since the union filesystem doesn't have "real" inodes the st_ino on
union directories are stable only as long as the inode is in the
cache.  That's not a problem for "rm -rf" because the ancestor
directory will always have a reference through one of its children.

> > I think it's best to leave that stuff until someone actually cares.
> > The "people might find it useful" argument is not strong enough to put
> > nontrivial effort into thinking about all the weird corner cases.
> 
> My thought on this is that in order to describe the behaviour of the
> filesystem accurately you need to think about all the weird corner cases.
> 
> Then it becomes a question of: is it easier to document the weird behaviour,
> or change the code so the documentation will be easier to understand?
> Some cases will go one way, some the other.
> 
> But as you suggest this isn't urgent.

You didn't mention one possibility: add limitations that prevent the
weird corner cases arising.  I believe this is the simplest option.


> > > 4/ The problem you mentioned about ->i_mutex which is taken during unlink
> > >    being quasi-global could be eased somewhat by simply having a small pool
> > >    of inodes and assigning them to dentries pseudo-randomly.  Or possibly
> > >    having one 'file' inode per directory (that might be a bit wasteful
> > >    though).
> > 
> > That's an idea, but I'm inclined just to add some hacks to the VFS to
> > omit the locking if some inode flag is set.
> 
> Every new flag you add is another sign of weakness in the VFS....
> 
> Hmmm... vfs_unlink checks if the victim is a mountpoint, which suggests that
> you can mount on a non-directory.  I'm not sure if I knew that.

mount --bind has always worked on non-directories too.  It's a useful
feature.

> I wonder how this union fs would cope if a file in the upper filesystem were
> mounted-on....I suspect it would work correctly.

Mounts on the union fs work as expected, they don't care about inodes
except that directories cannot cover non-directories and vice-versa.

Mounts on the upper and lower fs are currenly ignored.  It's an
interesting question whether union fs should handle mount trees on the
underlying filesystems or not.


> My comment about set-theory unions being commutative set me thinking.  I
> really don't think "union" is the right name for this thing.  There is
> nothing about it which really fits that proper definition of a union.
> whiteouts mean that even the list of names in a directory is not the union of
> the lists of names in the upper and lower directories.
> "overlay" is a much more accurate name.  But union seems to be the name
> that is most used.  I wonder if it is too late to change that.

We could call it overlayfs.  People learn new names quickly :)

> Also, dnotify (and presumably inotifiy) doesn't work reliably in the current
> implementation.
> It works for opaque directories and those which don't have a namesake in the
> lower filesystem, but for others it never notifies of changes to any files in
> the directory.
> This is because dnotify will set an fsnotifier on the merged-directory in the
> union-fs, but the change happens to the file in the upper fs, so
> fsnotify_parent will only call notifiers on the parent in the upper fs.

I think *notify will work correctly, every modificaton will be
notified on both the union fs and the upper fs.  But I haven't tested
this yet.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 10:18           ` Miklos Szeredi
@ 2010-08-30 11:40             ` Neil Brown
  2010-08-30 12:20               ` Miklos Szeredi
  2010-09-01  4:33               ` Neil Brown
  2010-08-31 19:29             ` Valerie Aurora
  2010-09-02 13:15             ` Jan Engelhardt
  2 siblings, 2 replies; 49+ messages in thread
From: Neil Brown @ 2010-08-30 11:40 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Mon, 30 Aug 2010 12:18:11 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Sun, 29 Aug 2010, Neil Brown wrote:
> > On Fri, 27 Aug 2010 18:53:45 +0200
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > 
> > > On Fri, 27 Aug 2010, Neil Brown wrote:
> > > > I was wondering why you even bothered to fiddle st_ino.  Given
> > > > that lots of other things come from one fs or the other, having
> > > > the merged directories appear to be from the upper filesystem
> > > > doesn't seem like a problem.  I don't really care either way
> > > > though.
> > > 
> > > "rm -rf" complains if st_ino of a directory changes spontaneously.
> > 
> > I see...
> > Doesn't that mean that you must always present a merged-directory if the
> > lower directory exists.
> 
> Directory opens are never "forwarded" to the lower filesystem like
> other opens.  One reason is that lookups continuing from the file's
> path need to be on the union filesystem instead on one of the
> underlying filesystems.

You're right - I misread the code there.

> > > I think it's best to leave that stuff until someone actually cares.
> > > The "people might find it useful" argument is not strong enough to put
> > > nontrivial effort into thinking about all the weird corner cases.
> > 
> > My thought on this is that in order to describe the behaviour of the
> > filesystem accurately you need to think about all the weird corner cases.
> > 
> > Then it becomes a question of: is it easier to document the weird behaviour,
> > or change the code so the documentation will be easier to understand?
> > Some cases will go one way, some the other.
> > 
> > But as you suggest this isn't urgent.
> 
> You didn't mention one possibility: add limitations that prevent the
> weird corner cases arising.  I believe this is the simplest option.

Val has been following that approach and asking if it is possible to make an
NFS filesystem really-truly read-only. i.e. no changes.
I don't believe it is.

But I won't pursue this further without patches.


> > My comment about set-theory unions being commutative set me thinking.  I
> > really don't think "union" is the right name for this thing.  There is
> > nothing about it which really fits that proper definition of a union.
> > whiteouts mean that even the list of names in a directory is not the union of
> > the lists of names in the upper and lower directories.
> > "overlay" is a much more accurate name.  But union seems to be the name
> > that is most used.  I wonder if it is too late to change that.
> 
> We could call it overlayfs.  People learn new names quickly :)

+1

> 
> > Also, dnotify (and presumably inotifiy) doesn't work reliably in the current
> > implementation.
> > It works for opaque directories and those which don't have a namesake in the
> > lower filesystem, but for others it never notifies of changes to any files in
> > the directory.
> > This is because dnotify will set an fsnotifier on the merged-directory in the
> > union-fs, but the change happens to the file in the upper fs, so
> > fsnotify_parent will only call notifiers on the parent in the upper fs.
> 
> I think *notify will work correctly, every modificaton will be
> notified on both the union fs and the upper fs.  But I haven't tested
> this yet.

I tried.  It doesn't.
To be precise:  directory changes like file creation (even creation of a file
that already exists) get notified, but purely file-based event like modifying
the contents of the file don't generate events back to the overlayfs
directory.
Because an open (for write) of a file is passed through to the upper
filesystem, the notifications of modification through that open only go to the
upper filesystem.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 11:40             ` Neil Brown
@ 2010-08-30 12:20               ` Miklos Szeredi
  2010-08-31 19:18                 ` Valerie Aurora
  2010-09-01  4:33               ` Neil Brown
  1 sibling, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-30 12:20 UTC (permalink / raw)
  To: Neil Brown
  Cc: miklos, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Mon, 30 Aug 2010, Neil Brown wrote:

> > You didn't mention one possibility: add limitations that prevent the
> > weird corner cases arising.  I believe this is the simplest option.
> 
> Val has been following that approach and asking if it is possible to make an
> NFS filesystem really-truly read-only. i.e. no changes.
> I don't believe it is.

Perhaps it doesn't matter.  The nasty cases can be prevented by just
disallowing local modification.  For the rest NFS will return ESTALE:
"though luck, why didn't you follow the rules?"

> > I think *notify will work correctly, every modificaton will be
> > notified on both the union fs and the upper fs.  But I haven't tested
> > this yet.
> 
> I tried.  It doesn't.
> To be precise:  directory changes like file creation (even creation of a file
> that already exists) get notified, but purely file-based event like modifying
> the contents of the file don't generate events back to the overlayfs
> directory.
> Because an open (for write) of a file is passed through to the upper
> filesystem, the notifications of modification through that open only go to the
> upper filesystem.

Ah, right.

> > > I think the way to fix this would involve the union-fs putting a
> > > notifier on the upper dir to match whatever is on the
> > > merged-dir.  However the filesystem currently isn't told when
> > > notifiers are attach to an inode.  I think it would be good to
> > > add an inode_operation which is called whenever the notifiers
> > > are changed.  This would also allow a networked filesystem to
> > > report notification if the protocol supported it.  Does that
> > > seem reasonable?

In that light this sounds entirely reasonable.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-27 11:35     ` Neil Brown
  2010-08-27 16:53       ` Miklos Szeredi
@ 2010-08-30 18:38       ` Valerie Aurora
  2010-08-30 23:12         ` Neil Brown
  1 sibling, 1 reply; 49+ messages in thread
From: Valerie Aurora @ 2010-08-30 18:38 UTC (permalink / raw)
  To: Neil Brown
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Fri, Aug 27, 2010 at 09:35:02PM +1000, Neil Brown wrote:
> On Fri, 27 Aug 2010 10:47:39 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > Changes to underlying filesystems
> > > ---------------------------------
> > > 
> > 
> > For now I refuse to even think about what happens in this case.
> > 
> > The easiest way out of this mess might simply be to enforce exclusive
> > modification to the underlying filesystems on a local level, same as
> > the union mount strategy.  For NFS and other remote filesystems we
> > either
> > 
> >  a) add some way to enforce it,
> >  b) live with the consequences if not enforced on the system level, or
> >  c) disallow them to be part of the union.
> > 
> 
> I actually think that your approach can work quite will with either the
> upper or lower changing independently.  Certainly it can produce some odd
> situations, but even NFS can do that (though maybe not quite so odd).

I'm very curious about your thoughts on how to handle the lower layer
changing.  Al Viro's comments:

http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html

Do you see something we're missing?

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 18:38       ` Valerie Aurora
@ 2010-08-30 23:12         ` Neil Brown
  2010-08-31 11:00           ` Miklos Szeredi
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-08-30 23:12 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Mon, 30 Aug 2010 14:38:43 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> On Fri, Aug 27, 2010 at 09:35:02PM +1000, Neil Brown wrote:
> > On Fri, 27 Aug 2010 10:47:39 +0200
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > Changes to underlying filesystems
> > > > ---------------------------------
> > > > 
> > > 
> > > For now I refuse to even think about what happens in this case.
> > > 
> > > The easiest way out of this mess might simply be to enforce exclusive
> > > modification to the underlying filesystems on a local level, same as
> > > the union mount strategy.  For NFS and other remote filesystems we
> > > either
> > > 
> > >  a) add some way to enforce it,
> > >  b) live with the consequences if not enforced on the system level, or
> > >  c) disallow them to be part of the union.
> > > 
> > 
> > I actually think that your approach can work quite will with either the
> > upper or lower changing independently.  Certainly it can produce some odd
> > situations, but even NFS can do that (though maybe not quite so odd).
> 
> I'm very curious about your thoughts on how to handle the lower layer
> changing.  Al Viro's comments:
> 
> http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html
> 
> Do you see something we're missing?
> 

That would be:

> If you allow a mix of old and new mappings, you can easily run into the
> situations when at some moment X1 covers Y1, X2 covers Y2, X2 is a descendent
> of X1 and Y1 is a descendent of Y2. You *really* don't want to go there -
> if nothing else, defining behaviour of copyup in face of that insanity
> will be very painful.

in particular - right?

The current proposal doesn't do copy-up of directory trees - or even of
directories - which is what I assume the reference to copyup refers to, so
that is a non-issue (i.e. where a tree copy-up might be needed, EXDEV is
returned).

For the [XY][12] issue, let's try to make it a bit more concrete.

Suppose /L is the lower tree, /U is the upper tree and /O is the overlay
mount point.
Suppose further that /U/a/b/c/d/e/f/g exists and (for now) /L/a doesn't.

Then /O/a/b/c/d/e/f/g can appear to 'cover' the above path.

So I open
    /U/a/b/c  and /U/a/b/c/d/e/f
and
    /O/a/b/c  and /O/a/b/c/d/e/f

These are Y1 Y2 X1 X2.  Currently Xn covers Yn.

To get the situation Al describes we would need to perform some manipulations
in /U e.g.
   mv /U/a/b/c/d/e  /U/a/e
   mv /U/a/b /U/a/e/f/g/b
Now each dir still has the same basename as before and there is only one
child per dir, and the longest path is
     /U/a/e/f/g/b/c/d
So now Y1 (/U/../c) is a descendent of Y2 (/U/../f), and the dentries
attached to the 4 open fds haven't seen any local change.

So: is this a problem?  It may seem a bit confusing to someone who doesn't
understand what is happening, but we define that as not being a problem (to
avoid confusion: don't change U or L).
The important questions are:  Can it cause corruption, and can it cause a
deadlock?

If we walk down the directory tree from either X1 or X2, we will see
  c/d  or f/g/b/c/d
respectively.  Both paths will terminate - no loops.  The 'd' below X2 will be
a different dentry than the 'd' above X2 despite the fact that they both
reference the same 'd' under /U.
If we try a 'renameat' - the only thing that I can imagine that would cause
fs corruption, we use the same lock_rename and vfs_rename calls on the /U
filesystem as a direct syscall would, so any attempt to create a disconnected
loop will fail.
It is probable that the upperpath.dentry should be revalidated after getting
the lock to ensure it is still hashed etc, but that would be part of the
revalidation code that I would propose to add.  So I fail to see how any
filesystem corruption could happen.

For locking: we take locks in the /U filesystem while holding locks in the /O
overlay.  There is a clear ordering here so there should be no room for
deadlock.
If the overlay filesystem were to support overlaying a mount-tree on a
mount-tree rather than just a filesystem on a filesystem I imagine that it
would not be too hard to create some weird loops.  However the current
proposal ignores mountpoints and just overlays one filesystem on another, so
the overlay is certain to be separate from, and well-ordered with respect to,
the upper and lower filesystems.

If the long path were on /L rather than /U I don't think anything would be
different.
If the paths were shorter and you managed to directly swap a parent and a
child, overlayfs would be able to notice that during revalidate and - if
necessary due to unpleasant races - would simply return -EBUSY.

It is a fairly key aspect of this proposal that we feel free to return errors
for situations that are too hard.  -EXDEV for non-opaque directory renames is
one such case.  -EBUSY when racing with changes in the covered filesystems
would be another.

I haven't got revalidation code yet, but when {upper,lower}path.dentry were
non-null, it would check they are still hashed, have the same name as the
main dentry, and have the correct corresponding parent.  If they are NULL, we
check that the corresponding parent inode is still NULL.
If not, we repeat union_lookup - or fail or whatever seems appropriate.

Inside any locks that we take on the upper filesystem to perform some
directory manipulations we would repeat the checks (on upperpath only) and
return -EBUSY if the dentries have become 'invalid' since the recent
revalidate.

So I don't see a problem.  Maybe I'm not seeing something that Al does, or
maybe the current proposal is sufficiently different from the one Al was
reviewing that the problems he observed simply don't exist.

I've reviewed your recent discussion with J. R. Okajima and it doesn't help me
see any locking issues more clearly.  Maybe if you could identify a specific
vfs_* call which could issue lock requests in a dangerous order and is not
protected by the revalidation I described above, that might help me see more
clearly.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 23:12         ` Neil Brown
@ 2010-08-31 11:00           ` Miklos Szeredi
  2010-08-31 11:24             ` Neil Brown
  0 siblings, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-08-31 11:00 UTC (permalink / raw)
  To: Neil Brown
  Cc: vaurora, miklos, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Tue, 31 Aug 2010, Neil Brown wrote:
> So: is this a problem?  It may seem a bit confusing to someone who doesn't
> understand what is happening, but we define that as not being a problem (to
> avoid confusion: don't change U or L).
> The important questions are:  Can it cause corruption, and can it cause a
> deadlock?

No, I don't think this design will do that.  So it might be enough
just to document that online modification of upper or lower
filesystems results in undefined behavior.

But to prevent accidental damage, it's prudent (at least by default)
to enforce the no-modification policy.

Why do you think this feature of allowing modification is important?
Lets take some typical use cases:

 - live cd: lower layer is hard r/o, upper layer makes no sense to
   modify online

 - thin client: lower layer is static except upgrades, which need
   special tools to support and is done offline, upper layer makes no
   sense to modify online

Do you have some cases in mind where it makes at least a little sense
to allow online modification of the underlying filesystems?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-31 11:00           ` Miklos Szeredi
@ 2010-08-31 11:24             ` Neil Brown
  2010-08-31 15:05                 ` Kyle Moffett
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-08-31 11:24 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: vaurora, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Tue, 31 Aug 2010 13:00:45 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Tue, 31 Aug 2010, Neil Brown wrote:
> > So: is this a problem?  It may seem a bit confusing to someone who doesn't
> > understand what is happening, but we define that as not being a problem (to
> > avoid confusion: don't change U or L).
> > The important questions are:  Can it cause corruption, and can it cause a
> > deadlock?
> 
> No, I don't think this design will do that.  So it might be enough
> just to document that online modification of upper or lower
> filesystems results in undefined behavior.
> 
> But to prevent accidental damage, it's prudent (at least by default)
> to enforce the no-modification policy.
> 
> Why do you think this feature of allowing modification is important?
> Lets take some typical use cases:
> 
>  - live cd: lower layer is hard r/o, upper layer makes no sense to
>    modify online
> 
>  - thin client: lower layer is static except upgrades, which need
>    special tools to support and is done offline, upper layer makes no
>    sense to modify online
> 
> Do you have some cases in mind where it makes at least a little sense
> to allow online modification of the underlying filesystems?

No, I don't have a particular use case in mind that would take advantage of
the layers being directly modifiable.  But I know that sys-admins can be very
ingenious and may well come up with something clever.

My point is more that I don't think that is it *possible* to prevent changes
to the underlying filesystem (NFS being the prime example) so if there are
easy steps we can take to make the behaviour of overlayfs more predictable in
those cases, we should.

Further I think that insisting that the underlying filesystems remain
unchangeable is overly restrictive.  If I were not allowed to perform an
overlay mount on a read/write lower filesystem, that would make it
significantly harder to explore the possibilities of overlayfs and experiment
with it.

I certainly don't think we should put a lot of work or a lot of code into
making it work "perfectly" in any sense at all.  But if there are *easy*
things to do that allow us to avoid some weird behaviours, then I think it is
worth that effort to do it.

NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-31 11:24             ` Neil Brown
@ 2010-08-31 15:05                 ` Kyle Moffett
  0 siblings, 0 replies; 49+ messages in thread
From: Kyle Moffett @ 2010-08-31 15:05 UTC (permalink / raw)
  To: Neil Brown
  Cc: Miklos Szeredi, vaurora, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Tue, Aug 31, 2010 at 07:24, Neil Brown <neilb@suse.de> wrote:
> On Tue, 31 Aug 2010 13:00:45 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote:
>> No, I don't think this design will do that.  So it might be enough
>> just to document that online modification of upper or lower
>> filesystems results in undefined behavior.
>>
>> But to prevent accidental damage, it's prudent (at least by default)
>> to enforce the no-modification policy.
>>
>> Why do you think this feature of allowing modification is important?
>> Lets take some typical use cases:
>>
>>  - live cd: lower layer is hard r/o, upper layer makes no sense to
>>    modify online
>>
>>  - thin client: lower layer is static except upgrades, which need
>>    special tools to support and is done offline, upper layer makes no
>>    sense to modify online
>>
>> Do you have some cases in mind where it makes at least a little sense
>> to allow online modification of the underlying filesystems?
>
> No, I don't have a particular use case in mind that would take advantage of
> the layers being directly modifiable.  But I know that sys-admins can be very
> ingenious and may well come up with something clever.
>
> My point is more that I don't think that is it *possible* to prevent changes
> to the underlying filesystem (NFS being the prime example) so if there are
> easy steps we can take to make the behaviour of overlayfs more predictable in
> those cases, we should.

There's certainly already weird behaviors you can cause by regular
filesystem over-mounts on NFS.  For example, I have an NFS server that
exports a "/srv/git" directory; if I was to do the following actions
on a client:

# mkdir /srv/git
# mount -t nfs myserver:/srv/git /srv/git
# mkdir /srv/git/mnt
# mount -t ext3 /dev/sda3 /srv/git/mnt

And then from the server I were to:
# rmdir /srv/git/mnt

Terrible terrible things would happen... by which I mean I can no
longer access or unmount that filesystem from the client.  That use
case in particular seems to be much worse than your regular unionfs
example even, and it's easily possible today (even by accident).

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
@ 2010-08-31 15:05                 ` Kyle Moffett
  0 siblings, 0 replies; 49+ messages in thread
From: Kyle Moffett @ 2010-08-31 15:05 UTC (permalink / raw)
  To: Neil Brown
  Cc: Miklos Szeredi, vaurora, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Tue, Aug 31, 2010 at 07:24, Neil Brown <neilb@suse.de> wrote:
> On Tue, 31 Aug 2010 13:00:45 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote:
>> No, I don't think this design will do that.  So it might be enough
>> just to document that online modification of upper or lower
>> filesystems results in undefined behavior.
>>
>> But to prevent accidental damage, it's prudent (at least by default)
>> to enforce the no-modification policy.
>>
>> Why do you think this feature of allowing modification is important?
>> Lets take some typical use cases:
>>
>>  - live cd: lower layer is hard r/o, upper layer makes no sense to
>>    modify online
>>
>>  - thin client: lower layer is static except upgrades, which need
>>    special tools to support and is done offline, upper layer makes no
>>    sense to modify online
>>
>> Do you have some cases in mind where it makes at least a little sense
>> to allow online modification of the underlying filesystems?
>
> No, I don't have a particular use case in mind that would take advantage of
> the layers being directly modifiable.  But I know that sys-admins can be very
> ingenious and may well come up with something clever.
>
> My point is more that I don't think that is it *possible* to prevent changes
> to the underlying filesystem (NFS being the prime example) so if there are
> easy steps we can take to make the behaviour of overlayfs more predictable in
> those cases, we should.

There's certainly already weird behaviors you can cause by regular
filesystem over-mounts on NFS.  For example, I have an NFS server that
exports a "/srv/git" directory; if I was to do the following actions
on a client:

# mkdir /srv/git
# mount -t nfs myserver:/srv/git /srv/git
# mkdir /srv/git/mnt
# mount -t ext3 /dev/sda3 /srv/git/mnt

And then from the server I were to:
# rmdir /srv/git/mnt

Terrible terrible things would happen... by which I mean I can no
longer access or unmount that filesystem from the client.  That use
case in particular seems to be much worse than your regular unionfs
example even, and it's easily possible today (even by accident).

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 12:20               ` Miklos Szeredi
@ 2010-08-31 19:18                 ` Valerie Aurora
  2010-08-31 20:19                   ` Trond Myklebust
  0 siblings, 1 reply; 49+ messages in thread
From: Valerie Aurora @ 2010-08-31 19:18 UTC (permalink / raw)
  To: Miklos Szeredi, Trond Myklebust, J. Bruce Fields
  Cc: Neil Brown, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Mon, Aug 30, 2010 at 02:20:47PM +0200, Miklos Szeredi wrote:
> On Mon, 30 Aug 2010, Neil Brown wrote:
> 
> > Val has been following that approach and asking if it is possible to make an
> > NFS filesystem really-truly read-only. i.e. no changes.
> > I don't believe it is.
> 
> Perhaps it doesn't matter.  The nasty cases can be prevented by just
> disallowing local modification.  For the rest NFS will return ESTALE:
> "though luck, why didn't you follow the rules?"

I agree: Ask the server to keep it read-only, but also detect if it
lied to prevent kernel bugs on the client.

Is detecting ESTALE and failing the mount sufficient to detect all
cases of a cached directory being altered?  I keep trying to trap an
NFS developer and beat the answer out of him but they usually get hung
up on the impossibility of 100% enforcement of the read-only server
option. (Agreed, impossible, just give the sysadmin a mount option so
that it doesn't happen accidentally.)

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 10:18           ` Miklos Szeredi
  2010-08-30 11:40             ` Neil Brown
@ 2010-08-31 19:29             ` Valerie Aurora
  2010-09-02 13:15             ` Jan Engelhardt
  2 siblings, 0 replies; 49+ messages in thread
From: Valerie Aurora @ 2010-08-31 19:29 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Neil Brown, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Mon, Aug 30, 2010 at 12:18:11PM +0200, Miklos Szeredi wrote:
> On Sun, 29 Aug 2010, Neil Brown wrote:
> 
> > My comment about set-theory unions being commutative set me thinking.  I
> > really don't think "union" is the right name for this thing.  There is
> > nothing about it which really fits that proper definition of a union.
> > whiteouts mean that even the list of names in a directory is not the union of
> > the lists of names in the upper and lower directories.
> > "overlay" is a much more accurate name.  But union seems to be the name
> > that is most used.  I wonder if it is too late to change that.
> 
> We could call it overlayfs.  People learn new names quickly :)

Union mounts was named "writable overlays" for one release in an
attempt to get away from the "arbitrary union of file systems" idea.
I think it helped, but went back to union mounts since it was more
familiar and made prettier function names.

The config option for union mounts says:

Union mounts allow you to mount a transparent writable layer over a
read-only file system, for example, an ext3 partition on a hard drive
over a CD-ROM root file system image.

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-31 19:18                 ` Valerie Aurora
@ 2010-08-31 20:19                   ` Trond Myklebust
  2010-09-01  1:56                     ` Valerie Aurora
  0 siblings, 1 reply; 49+ messages in thread
From: Trond Myklebust @ 2010-08-31 20:19 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Miklos Szeredi, J. Bruce Fields, Neil Brown, linux-fsdevel,
	linux-kernel, viro, jblunck, hch

On Tue, 2010-08-31 at 15:18 -0400, Valerie Aurora wrote:
> On Mon, Aug 30, 2010 at 02:20:47PM +0200, Miklos Szeredi wrote:
> > On Mon, 30 Aug 2010, Neil Brown wrote:
> > 
> > > Val has been following that approach and asking if it is possible to make an
> > > NFS filesystem really-truly read-only. i.e. no changes.
> > > I don't believe it is.
> > 
> > Perhaps it doesn't matter.  The nasty cases can be prevented by just
> > disallowing local modification.  For the rest NFS will return ESTALE:
> > "though luck, why didn't you follow the rules?"
> 
> I agree: Ask the server to keep it read-only, but also detect if it
> lied to prevent kernel bugs on the client.
> 
> Is detecting ESTALE and failing the mount sufficient to detect all
> cases of a cached directory being altered?

No. Files can be altered without being unlinked.

>   I keep trying to trap an
> NFS developer and beat the answer out of him but they usually get hung
> up on the impossibility of 100% enforcement of the read-only server
> option. (Agreed, impossible, just give the sysadmin a mount option so
> that it doesn't happen accidentally.)

Remind me again why mounting the filesystem '-oro' on the server (and
possibly exporting it 'ro') isn't an option?

Trond

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-31 15:05                 ` Kyle Moffett
  (?)
@ 2010-08-31 20:36                 ` Valerie Aurora
  -1 siblings, 0 replies; 49+ messages in thread
From: Valerie Aurora @ 2010-08-31 20:36 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Neil Brown, Miklos Szeredi, linux-fsdevel, linux-kernel, viro,
	jblunck, hch

On Tue, Aug 31, 2010 at 11:05:18AM -0400, Kyle Moffett wrote:
> On Tue, Aug 31, 2010 at 07:24, Neil Brown <neilb@suse.de> wrote:
> > On Tue, 31 Aug 2010 13:00:45 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote:
> >> No, I don't think this design will do that. ??So it might be enough
> >> just to document that online modification of upper or lower
> >> filesystems results in undefined behavior.
> >>
> >> But to prevent accidental damage, it's prudent (at least by default)
> >> to enforce the no-modification policy.
> >>
> >> Why do you think this feature of allowing modification is important?
> >> Lets take some typical use cases:
> >>
> >> ??- live cd: lower layer is hard r/o, upper layer makes no sense to
> >> ?? ??modify online
> >>
> >> ??- thin client: lower layer is static except upgrades, which need
> >> ?? ??special tools to support and is done offline, upper layer makes no
> >> ?? ??sense to modify online
> >>
> >> Do you have some cases in mind where it makes at least a little sense
> >> to allow online modification of the underlying filesystems?
> >
> > No, I don't have a particular use case in mind that would take advantage of
> > the layers being directly modifiable. ??But I know that sys-admins can be very
> > ingenious and may well come up with something clever.
> >
> > My point is more that I don't think that is it *possible* to prevent changes
> > to the underlying filesystem (NFS being the prime example) so if there are
> > easy steps we can take to make the behaviour of overlayfs more predictable in
> > those cases, we should.
> 
> There's certainly already weird behaviors you can cause by regular
> filesystem over-mounts on NFS.  For example, I have an NFS server that
> exports a "/srv/git" directory; if I was to do the following actions
> on a client:
> 
> # mkdir /srv/git
> # mount -t nfs myserver:/srv/git /srv/git
> # mkdir /srv/git/mnt
> # mount -t ext3 /dev/sda3 /srv/git/mnt
> 
> And then from the server I were to:
> # rmdir /srv/git/mnt
> 
> Terrible terrible things would happen... by which I mean I can no
> longer access or unmount that filesystem from the client.  That use
> case in particular seems to be much worse than your regular unionfs
> example even, and it's easily possible today (even by accident).

While this definitely sucks, the concern in this case with unioning
file systems is a deadlock or kernel panic, not just "weird" behavior
or inability to unmount a file system.  Although in general I like the
standard for union behavior as "not as bad as NFS." :)

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-31 20:19                   ` Trond Myklebust
@ 2010-09-01  1:56                     ` Valerie Aurora
  2010-09-01  4:04                       ` Trond Myklebust
  0 siblings, 1 reply; 49+ messages in thread
From: Valerie Aurora @ 2010-09-01  1:56 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Miklos Szeredi, J. Bruce Fields, Neil Brown, linux-fsdevel,
	linux-kernel, viro, jblunck, hch

On Tue, Aug 31, 2010 at 04:19:47PM -0400, Trond Myklebust wrote:
> On Tue, 2010-08-31 at 15:18 -0400, Valerie Aurora wrote:
> > On Mon, Aug 30, 2010 at 02:20:47PM +0200, Miklos Szeredi wrote:
> > > On Mon, 30 Aug 2010, Neil Brown wrote:
> > > 
> > > > Val has been following that approach and asking if it is possible to make an
> > > > NFS filesystem really-truly read-only. i.e. no changes.
> > > > I don't believe it is.
> > > 
> > > Perhaps it doesn't matter.  The nasty cases can be prevented by just
> > > disallowing local modification.  For the rest NFS will return ESTALE:
> > > "though luck, why didn't you follow the rules?"
> > 
> > I agree: Ask the server to keep it read-only, but also detect if it
> > lied to prevent kernel bugs on the client.
> > 
> > Is detecting ESTALE and failing the mount sufficient to detect all
> > cases of a cached directory being altered?
> 
> No. Files can be altered without being unlinked.

Argh.  Do generation numbers and/or mtime help us here?

> >   I keep trying to trap an
> > NFS developer and beat the answer out of him but they usually get hung
> > up on the impossibility of 100% enforcement of the read-only server
> > option. (Agreed, impossible, just give the sysadmin a mount option so
> > that it doesn't happen accidentally.)
> 
> Remind me again why mounting the filesystem '-oro' on the server (and
> possibly exporting it 'ro') isn't an option?

The "hard read only" export option is to, at minimum, make it
difficult to *accidentally* remount the file system on the server as
read-write while it is exported as a union lower layer.

True, the NFS server can mount the file system "-o ro" - and then any
random other mount(8) on the server can come along and "-o
remount,rw".  So if we have an NFS server option that uses the new
hard read-only feature, this at least makes the admin go change the
NFS mount options or unexport it before writing to the exported union
lower layer and hosing all the union mount clients.

It's the difference between:

# mount -o remount,rw /exports/union_mount_root
 [thousands of union mounted clients hang]

And:

# mount -o remount,rw /exports/union_mount_root
mount: /exports/union_mount_root is hard read-only
# emacs /etc/exports
  [edit out union/hard readonly option]
# exportfs -r
 [thousands of union mounted clients hang]

But heck, system administration is hard, what's a little more rope?
Here, hold this gun while I position your foot...

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-01  1:56                     ` Valerie Aurora
@ 2010-09-01  4:04                       ` Trond Myklebust
  0 siblings, 0 replies; 49+ messages in thread
From: Trond Myklebust @ 2010-09-01  4:04 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Miklos Szeredi, J. Bruce Fields, Neil Brown, linux-fsdevel,
	linux-kernel, viro, jblunck, hch

On Tue, 2010-08-31 at 21:56 -0400, Valerie Aurora wrote:
> On Tue, Aug 31, 2010 at 04:19:47PM -0400, Trond Myklebust wrote:
> > On Tue, 2010-08-31 at 15:18 -0400, Valerie Aurora wrote:
> > > On Mon, Aug 30, 2010 at 02:20:47PM +0200, Miklos Szeredi wrote:
> > > > On Mon, 30 Aug 2010, Neil Brown wrote:
> > > > 
> > > > > Val has been following that approach and asking if it is possible to make an
> > > > > NFS filesystem really-truly read-only. i.e. no changes.
> > > > > I don't believe it is.
> > > > 
> > > > Perhaps it doesn't matter.  The nasty cases can be prevented by just
> > > > disallowing local modification.  For the rest NFS will return ESTALE:
> > > > "though luck, why didn't you follow the rules?"
> > > 
> > > I agree: Ask the server to keep it read-only, but also detect if it
> > > lied to prevent kernel bugs on the client.
> > > 
> > > Is detecting ESTALE and failing the mount sufficient to detect all
> > > cases of a cached directory being altered?
> > 
> > No. Files can be altered without being unlinked.
> 
> Argh.  Do generation numbers and/or mtime help us here?

Up to a point. ext2/3 still has 1 second mtime resolution. There is no
way that can detect all modifications. The i_version can work for ext4,
if people remember to turn it on, and if they use NFSv4.

> > >   I keep trying to trap an
> > > NFS developer and beat the answer out of him but they usually get hung
> > > up on the impossibility of 100% enforcement of the read-only server
> > > option. (Agreed, impossible, just give the sysadmin a mount option so
> > > that it doesn't happen accidentally.)
> > 
> > Remind me again why mounting the filesystem '-oro' on the server (and
> > possibly exporting it 'ro') isn't an option?
> 
> The "hard read only" export option is to, at minimum, make it
> difficult to *accidentally* remount the file system on the server as
> read-write while it is exported as a union lower layer.
> 
> True, the NFS server can mount the file system "-o ro" - and then any
> random other mount(8) on the server can come along and "-o
> remount,rw".  So if we have an NFS server option that uses the new
> hard read-only feature, this at least makes the admin go change the
> NFS mount options or unexport it before writing to the exported union
> lower layer and hosing all the union mount clients.
> 
> It's the difference between:
> 
> # mount -o remount,rw /exports/union_mount_root
>  [thousands of union mounted clients hang]
> 
> And:
> 
> # mount -o remount,rw /exports/union_mount_root
> mount: /exports/union_mount_root is hard read-only
> # emacs /etc/exports
>   [edit out union/hard readonly option]
> # exportfs -r
>  [thousands of union mounted clients hang]
> 
> But heck, system administration is hard, what's a little more rope?
> Here, hold this gun while I position your foot...

...quite.

The point we keep making is that this can work if administrators do the
right thing. If they don't, then there is pretty much nothing we can do
to stop them from podiatric self-mutilation...

Just tell them what they are supposed to do, and if they don't then
they're on their own.

   Trond

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 11:40             ` Neil Brown
  2010-08-30 12:20               ` Miklos Szeredi
@ 2010-09-01  4:33               ` Neil Brown
  2010-09-01 20:11                 ` Miklos Szeredi
  1 sibling, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-09-01  4:33 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Mon, 30 Aug 2010 21:40:27 +1000
Neil Brown <neilb@suse.de> wrote:

> > > > I think it's best to leave that stuff until someone actually cares.
> > > > The "people might find it useful" argument is not strong enough to put
> > > > nontrivial effort into thinking about all the weird corner cases.
> > > 
> > > My thought on this is that in order to describe the behaviour of the
> > > filesystem accurately you need to think about all the weird corner cases.
> > > 
> > > Then it becomes a question of: is it easier to document the weird behaviour,
> > > or change the code so the documentation will be easier to understand?
> > > Some cases will go one way, some the other.
> > > 
> > > But as you suggest this isn't urgent.
> > 
> > You didn't mention one possibility: add limitations that prevent the
> > weird corner cases arising.  I believe this is the simplest option.
> 
> Val has been following that approach and asking if it is possible to make an
> NFS filesystem really-truly read-only. i.e. no changes.
> I don't believe it is.
> 
> But I won't pursue this further without patches.
> 

And here is a patch.  It isn't really complete, but I have done enough for
today.  It at least shows what I am trying to do.

To see a clear indication of the effect, try the following script before and
after the patch..

-----------------------
#!/bin/sh

# L = lower U = upper O = overlay

umount /tmp/O
rm -rf /tmp/[LUO]

mkdir /tmp/{L,U,O}

for i in 1 2 3; do > /tmp/L/$i ; done

mount -t union -o upperdir=/tmp/U,lowerdir=/tmp/L pointlessword /tmp/O

ls -l /tmp/O/foo
ls -l /tmp/O
> /tmp/L/foo
ls -l /tmp/O
-------------------------


The last 'ls -l' report a non-existent 'foo' without the patch.  With the
patch it more correctly doesn't report anything at all about foo.


No patch change-log yet because I'm not suggesting it be included, just
reviewed.

NeilBrown

diff --git a/fs/union/union.c b/fs/union/union.c
index e19fe62..434d152 100644
--- a/fs/union/union.c
+++ b/fs/union/union.c
@@ -392,9 +392,160 @@ static void union_dentry_iput(struct dentry *dentry, struct inode *inode)
 	iput(inode);
 }
 
+static int union_still_valid(struct dentry *dentry,
+			     struct dentry *parent, struct dentry *child)
+{
+	/* dentry is in the union filesystem
+	 * child is the corresponding dentry in the upper to lower layer
+	 * parent is the corresponding dentry of dentry's parent in the same layer
+	 *
+	 * If child is NULL, the parent must be NULL or negative.
+	 * Otherwise child must be hashed, have parent as the d_parent, and
+	 * have the same name as dentry
+	 *
+	 */
+	struct qstr *qstr;
+	int rv;
+
+	if (child == NULL)
+		return (parent == NULL || parent->d_inode == NULL);
+
+	if (child->d_parent != parent)
+		return 0;
+	if (d_unhashed(child))
+		return 0;
+
+	/* Unfortunately we need d_lock to compare names */
+	spin_lock(&dentry->d_lock);
+	spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
+	qstr = &child->d_name;
+	if (parent->d_op && parent->d_op->d_compare)
+		rv = ! (parent->d_op->d_compare(parent, qstr, &dentry->d_name));
+	else
+		rv =  (qstr->len == dentry->d_name.len &&
+		       memcmp(qstr->name, dentry->d_name.name, qstr->len) == 0);
+
+	spin_unlock(&child->d_lock);
+	spin_unlock(&dentry->d_lock);
+	return rv;
+}
+
+static struct dentry *union_lookup_real(struct dentry *dir, struct qstr *name);
+static struct inode *union_new_inode(struct super_block *sb, umode_t mode);
+
+static struct union_entry do_union_lookup(struct inode *dir, struct dentry *dentry,
+					   struct nameidata *nd)
+{
+	struct union_entry *pue = dentry->d_parent->d_fsdata;
+	struct union_entry ue;
+	struct dentry *upperdir = pue->upperpath.dentry;
+	struct dentry *upperdentry = NULL;
+	struct dentry *lowerdir = pue->lowerpath.dentry;
+	struct dentry *lowerdentry = NULL;
+	int err;
+
+	memset(&ue, 0, sizeof(ue));
+
+	if (upperdir) {
+		upperdentry = union_lookup_real(upperdir, &dentry->d_name);
+		err = PTR_ERR(upperdentry);
+		if (IS_ERR(upperdentry))
+			goto out;
+
+		if (upperdentry) {
+			if (union_is_opaquedir(upperdentry))
+				ue.opaque = true;
+			else if (union_is_whiteout(upperdentry)) {
+				dput(upperdentry);
+				upperdentry = NULL;
+				ue.opaque = true;
+			}
+		}
+	}
+	if (lowerdir && !ue.opaque) {
+		lowerdentry = union_lookup_real(lowerdir, &dentry->d_name);
+		if (IS_ERR(lowerdentry)) {
+			err = PTR_ERR(lowerdentry);
+			dput(upperdentry);
+			goto out;
+		}
+	}
+
+	if (lowerdentry && upperdentry &&
+	    (!S_ISDIR(upperdentry->d_inode->i_mode) ||
+	     !S_ISDIR(lowerdentry->d_inode->i_mode))) {
+		dput(lowerdentry);
+		lowerdentry = NULL;
+		ue.opaque = true;
+	}
+
+	if (upperdentry) {
+		ue.upperpath.mnt = pue->upperpath.mnt;
+		ue.upperpath.dentry = upperdentry;
+		path_get(&ue.upperpath);
+		dput(upperdentry);
+	}
+	if (lowerdentry) {
+		ue.lowerpath.mnt = pue->lowerpath.mnt;
+		ue.lowerpath.dentry = lowerdentry;
+		path_get(&ue.lowerpath);
+		dput(lowerdentry);
+	}
+
+	return ue;
+
+out:
+	ue.upperpath.dentry = ERR_PTR(err);
+	return ue;
+}
+
+static int union_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+	/* We need to revalidate this dentry if the upper parent might have
+	 * changed.
+	 */
+	struct dentry *parent = dget_parent(dentry);
+	struct union_entry *ue = dentry->d_fsdata;
+	struct union_entry *pue = parent->d_fsdata;
+	struct union_entry nue;
+	int rv = 1;
+
+	if (union_still_valid(dentry, pue->upperpath.dentry,  ue->upperpath.dentry) &&
+	    union_still_valid(dentry, pue->lowerpath.dentry,  ue->lowerpath.dentry))
+		goto out;
+
+	/* If dentry is not a directory, it is safe to fail */
+	rv = 0;
+	if (dentry->d_inode == NULL || !S_ISDIR(dentry->d_inode->i_mode))
+		goto out;
+
+	/* If this dentry or a child is in use by someone else, it cannot be
+	 * invalidated so the best thing to do is re-lookup the names
+	 */
+	mutex_lock(&parent->d_inode->i_mutex);
+	if (dentry->d_parent != parent)
+		goto out_unlock;
+	rv = 1;
+	nue = do_union_lookup(parent->d_inode, dentry, nd);
+	if (IS_ERR(nue.upperpath.dentry))
+		/* Leave the dentry unchanged, but return the error */
+		rv = PTR_ERR(nue.upperpath.dentry);
+	else {
+		path_put(&ue->upperpath);
+		path_put(&ue->lowerpath);
+		*ue = nue;
+	}
+out_unlock:
+	mutex_unlock(&parent->d_inode->i_mutex);
+out:
+	dput(parent);
+	return rv;
+}
+
 static const struct dentry_operations union_dentry_operations = {
 	.d_release = union_dentry_release,
 	.d_iput = union_dentry_iput,
+	.d_revalidate = union_dentry_revalidate,
 };
 
 static struct inode *union_new_inode(struct super_block *sb, umode_t mode)
@@ -459,12 +610,7 @@ static struct dentry *union_lookup_real(struct dentry *dir, struct qstr *name)
 static struct dentry *union_lookup(struct inode *dir, struct dentry *dentry,
 				   struct nameidata *nd)
 {
-	struct union_entry *pue = dentry->d_parent->d_fsdata;
 	struct union_entry *ue;
-	struct dentry *upperdir = pue->upperpath.dentry;
-	struct dentry *upperdentry = NULL;
-	struct dentry *lowerdir = pue->lowerpath.dentry;
-	struct dentry *lowerdentry = NULL;
 	struct inode *inode = NULL;
 	int err;
 
@@ -472,71 +618,29 @@ static struct dentry *union_lookup(struct inode *dir, struct dentry *dentry,
 	ue = kzalloc(sizeof(struct union_entry), GFP_KERNEL);
 	if (!ue)
 		goto out;
-
-	if (upperdir) {
-		upperdentry = union_lookup_real(upperdir, &dentry->d_name);
-		err = PTR_ERR(upperdentry);
-		if (IS_ERR(upperdentry))
-			goto out_free;
-
-		if (upperdentry) {
-			if (union_is_opaquedir(upperdentry))
-				ue->opaque = true;
-			else if (union_is_whiteout(upperdentry)) {
-				dput(upperdentry);
-				upperdentry = NULL;
-				ue->opaque = true;
-			}
-		}
-	}
-	if (lowerdir && !ue->opaque) {
-		lowerdentry = union_lookup_real(lowerdir, &dentry->d_name);
-		if (IS_ERR(lowerdentry)) {
-			err = PTR_ERR(lowerdentry);
-			dput(upperdentry);
-			goto out_free;
-		}
+	*ue = do_union_lookup(dir, dentry, nd);
+	if (IS_ERR(ue->upperpath.dentry)) {
+		err = PTR_ERR(ue->upperpath.dentry);
+		goto out_free;
 	}
-
-	if (lowerdentry && upperdentry &&
-	    (!S_ISDIR(upperdentry->d_inode->i_mode) ||
-	     !S_ISDIR(lowerdentry->d_inode->i_mode))) {
-		dput(lowerdentry);
-		lowerdentry = NULL;
-		ue->opaque = true;
-	}
-
-	if (lowerdentry || upperdentry) {
+	if (ue->upperpath.dentry || ue->lowerpath.dentry) {
 		struct dentry *realdentry;
 
-		realdentry = upperdentry ? upperdentry : lowerdentry;
+		realdentry = ue->upperpath.dentry ? : ue->lowerpath.dentry;
 		inode = union_new_inode(dir->i_sb, realdentry->d_inode->i_mode);
-		if (!inode)
-			goto out_dput;
-	}
-
-	if (upperdentry) {
-		ue->upperpath.mnt = pue->upperpath.mnt;
-		ue->upperpath.dentry = upperdentry;
-		path_get(&ue->upperpath);
-		dput(upperdentry);
-	}
-	if (lowerdentry) {
-		ue->lowerpath.mnt = pue->lowerpath.mnt;
-		ue->lowerpath.dentry = lowerdentry;
-		path_get(&ue->lowerpath);
-		dput(lowerdentry);
+		err = -ENOMEM;
+		if (!inode) {
+			path_put(&ue->upperpath);
+			path_put(&ue->lowerpath);
+			goto out_free;
+		}
 	}
-
 	d_add(dentry, inode);
 	dentry->d_fsdata = ue;
 	dentry->d_op = &union_dentry_operations;
 
 	return NULL;
 
-out_dput:
-	dput(upperdentry);
-	dput(lowerdentry);
 out_free:
 	kfree(ue);
 out:

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-01  4:33               ` Neil Brown
@ 2010-09-01 20:11                 ` Miklos Szeredi
  0 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-09-01 20:11 UTC (permalink / raw)
  To: Neil Brown
  Cc: miklos, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Wed, 1 Sep 2010, Neil Brown wrote:
> On Mon, 30 Aug 2010 21:40:27 +1000
> And here is a patch.  It isn't really complete, but I have done enough for
> today.  It at least shows what I am trying to do.

Thanks.

Okay, I see what it's trying to do.  And I can accept the part which
validates that the underlying dentry trees still match the union tree
(wouldn't it be simpler to just d_lookup() and check if the child
matches?)

However redoing the lookup and changing the upperpath and lowerpath
for directories is no good.  Upperpath and lowerpath are constant,
changing them would be like changing dentry->d_inode, the dentry would
suddenly become something else.  That's crazy.

Making sure that evething is in a sane state and erroring out if not
sounds a workable alternative to enforcing no change blindly.
Although no change should probably be the default (e.g. unless a 
"-o dont_enforce_no_change" mount option is given).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-08-26 18:33 ` [PATCH 5/5] union: hybrid union filesystem prototype Miklos Szeredi
@ 2010-09-01 21:42   ` Valerie Aurora
  2010-09-02  9:19     ` Miklos Szeredi
  2010-09-02 21:42   ` Valerie Aurora
  1 sibling, 1 reply; 49+ messages in thread
From: Valerie Aurora @ 2010-09-01 21:42 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, neilb, viro, jblunck, hch

On Thu, Aug 26, 2010 at 08:33:45PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> This union filesystem is a hybrid of entirely filesystem based
> (unionfs, aufs) and entierly VFS based (union mounts) solutions.

This is elegant and readable code.  I am still reviewing it but have a
few comments now.

> +static int union_upper_create(struct dentry *dentry, struct iattr *attr,
> +			      dev_t rdev, const char *link, struct path *src)
> +{
> +	int err;
> +	int attr_update = ATTR_UID | ATTR_GID | ATTR_ATIME_SET | ATTR_MTIME_SET;
> +	struct dentry *parent = dget_parent(dentry);
> +	struct union_entry *ue = dentry->d_fsdata;
> +	struct union_entry *pue = parent->d_fsdata;
> +	struct inode *upperdir = pue->upperpath.dentry->d_inode;
> +	struct dentry *newdentry;
> +	struct path newpath;
> +
> +	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
> +
> +	/*
> +	 * Using upper filesystem locking to protect against copy up
> +	 * racing with rename (rename means the copy up was already
> +	 * successful).
> +	 */
> +	err = -EEXIST;
> +	if (dentry->d_parent != parent)
> +		goto out_unlock;
> +
> +	newdentry = union_lookup_create(ue, pue, &dentry->d_name);
> +	err = PTR_ERR(newdentry);
> +	if (IS_ERR(newdentry))
> +		goto out_unlock;
> +
> +	newpath.dentry = newdentry;
> +	newpath.mnt = pue->upperpath.mnt;
> +
> +	switch (attr->ia_mode & S_IFMT) {
> +	case S_IFREG:
> +		if (src)
> +			WARN_ON(!(attr->ia_valid & ATTR_SIZE));
> +		else
> +			WARN_ON((attr->ia_valid & ATTR_SIZE));
> +
> +		err = vfs_create(upperdir, newdentry, attr->ia_mode, NULL);

Passing a NULL namiedata pointer to vfs_create() is a convenient
temporary hack, but unfortunately NFS, ceph, etc. still use the
nameidata passed to vfs_create() and other ops.

The way union mounts gets a valid nameidata is by doing the create in
the VFS before calling file system ops that may trigger a copyup,
while we still have the original nameidata.  This is one of the major
reasons union mounts lives in the VFS.

A lot of my conversations about union mounts with Al go like this:

Al: "Rewrite it this way."
Val: "But then how do we get the nameidata?"
Al: "Arrrrrrrrrrrrrggggh."

Can you think of a way to construct a good nameidata for these
implicit copyups?  That might be a solution.

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-09-01 21:42   ` Valerie Aurora
@ 2010-09-02  9:19     ` Miklos Szeredi
  2010-09-02 21:33       ` Valerie Aurora
  0 siblings, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-09-02  9:19 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: miklos, linux-fsdevel, linux-kernel, neilb, viro, jblunck, hch

On Wed, 1 Sep 2010, Valerie Aurora wrote:
> > +
> > +		err = vfs_create(upperdir, newdentry, attr->ia_mode, NULL);
> 
> Passing a NULL namiedata pointer to vfs_create() is a convenient
> temporary hack, but unfortunately NFS, ceph, etc. still use the
> nameidata passed to vfs_create() and other ops.
> 
> The way union mounts gets a valid nameidata is by doing the create in
> the VFS before calling file system ops that may trigger a copyup,
> while we still have the original nameidata.  This is one of the major
> reasons union mounts lives in the VFS.

Not a big deal, just set up nd as if this was a single component
lookup.  The previous version did it like this:

+       struct nameidata nd = {
+               .last_type = LAST_NORM,
+               .last = *name,
+       };
+
+       nd.path = pue->upperpath;
+       path_get(&nd.path);
+
+       newdentry = lookup_create(&nd, S_ISDIR(attr->ia_mode));

But that's not a solution to the NFS suckage, it's just a workaround.

"Fortunately" NFS isn't good for a writable layer of a union for other
reasons, so this isn't a big concern at the moment.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-08-30 10:18           ` Miklos Szeredi
  2010-08-30 11:40             ` Neil Brown
  2010-08-31 19:29             ` Valerie Aurora
@ 2010-09-02 13:15             ` Jan Engelhardt
  2010-09-02 13:32               ` Neil Brown
  2 siblings, 1 reply; 49+ messages in thread
From: Jan Engelhardt @ 2010-09-02 13:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Neil Brown, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch


On Monday 2010-08-30 12:18, Miklos Szeredi wrote:
>
>> My comment about set-theory unions being commutative set me thinking.  I
>> really don't think "union" is the right name for this thing.  There is
>> nothing about it which really fits that proper definition of a union.
>
>We could call it overlayfs.  People learn new names quickly :)

There is a much larger issue that you should be very well aware about —

"The name wanted to be a clever acronym for "Filesystem in
USErspace", but it turned out to be an unfortunate choice. The author
has since vowed never to name a project after a common term, not even
anything found more than a handful of times on Google."

overlayfs already exists. Right next to fuse on sourceforge...


Oh and I what I like to see is support for multiple readonly branches :)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-02 13:15             ` Jan Engelhardt
@ 2010-09-02 13:32               ` Neil Brown
  2010-09-02 14:25                 ` Jan Engelhardt
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-09-02 13:32 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Thu, 2 Sep 2010 15:15:37 +0200 (CEST)
Jan Engelhardt <jengelh@medozas.de> wrote:

> 
> On Monday 2010-08-30 12:18, Miklos Szeredi wrote:
> >
> >> My comment about set-theory unions being commutative set me thinking.  I
> >> really don't think "union" is the right name for this thing.  There is
> >> nothing about it which really fits that proper definition of a union.
> >
> >We could call it overlayfs.  People learn new names quickly :)
> 
> There is a much larger issue that you should be very well aware about —
> 
> "The name wanted to be a clever acronym for "Filesystem in
> USErspace", but it turned out to be an unfortunate choice. The author
> has since vowed never to name a project after a common term, not even
> anything found more than a handful of times on Google."
> 
> overlayfs already exists. Right next to fuse on sourceforge...

lol


No, I mean it.  "Linux Over-Lays".  :-)

> 
> 
> Oh and I what I like to see is support for multiple readonly branches :)

I think we very nearly have that, assuming I understand your requirement
correctly.
The lower filesystem can itself be an overlay, providing it is mounted
read-only.

So if /mnt/ro1 /mnt/ro2 /mnt/ro3 are all read-only branches then

 mount -o ro,lowerdir=/mnt/ro1,upperdir=/mnt/ro2 meaninglessstring /mnt/ov1
 mount -o ro,lowerdir=/mnt/ov1,upperdir=/mnt/ro3 meaninglessstring /mnt/ov2
 mount -o lowerdir=/mnt/ov2,upperdir=/mnt/rw ignoreme /mnt/overlay

and /mnt/overlay will be the combination of 3 read-only filesystems and one
writable one.
(this doesn't work with the code as-is, but it is really just a few bug-fixes
away).

NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-02 13:32               ` Neil Brown
@ 2010-09-02 14:25                 ` Jan Engelhardt
  2010-09-02 14:28                   ` Miklos Szeredi
  2010-09-23 13:18                   ` Jan Engelhardt
  0 siblings, 2 replies; 49+ messages in thread
From: Jan Engelhardt @ 2010-09-02 14:25 UTC (permalink / raw)
  To: Neil Brown
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch


On Thursday 2010-09-02 15:32, Neil Brown wrote:
>> 
>> Oh and I what I like to see is support for multiple readonly branches :)
>
>I think we very nearly have that, assuming I understand your requirement
>correctly.
>The lower filesystem can itself be an overlay, providing it is mounted
>read-only.
>
>So if /mnt/ro1 /mnt/ro2 /mnt/ro3 are all read-only branches then
>
> mount -o ro,lowerdir=/mnt/ro1,upperdir=/mnt/ro2 meaninglessstring /mnt/ov1
> mount -o ro,lowerdir=/mnt/ov1,upperdir=/mnt/ro3 meaninglessstring /mnt/ov2
> mount -o lowerdir=/mnt/ov2,upperdir=/mnt/rw ignoreme /mnt/overlay
>
>and /mnt/overlay will be the combination of 3 read-only filesystems and one
>writable one.
>(this doesn't work with the code as-is, but it is really just a few bug-fixes
>away).

There ought to be a reason that other implementations offer
doing multiple branches with a single vfsmount.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-02 14:25                 ` Jan Engelhardt
@ 2010-09-02 14:28                   ` Miklos Szeredi
  2010-09-08 19:47                     ` David P. Quigley
  2010-09-23 13:18                   ` Jan Engelhardt
  1 sibling, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-09-02 14:28 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: neilb, miklos, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch

On Thu, 2 Sep 2010, Jan Engelhardt wrote:
> There ought to be a reason that other implementations offer
> doing multiple branches with a single vfsmount.

Overdesign.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-09-02  9:19     ` Miklos Szeredi
@ 2010-09-02 21:33       ` Valerie Aurora
  2010-09-03  5:10         ` Neil Brown
  2010-09-03  8:52         ` Miklos Szeredi
  0 siblings, 2 replies; 49+ messages in thread
From: Valerie Aurora @ 2010-09-02 21:33 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, neilb, viro, jblunck, hch

On Thu, Sep 02, 2010 at 11:19:41AM +0200, Miklos Szeredi wrote:
> On Wed, 1 Sep 2010, Valerie Aurora wrote:
> > > +
> > > +		err = vfs_create(upperdir, newdentry, attr->ia_mode, NULL);
> > 
> > Passing a NULL namiedata pointer to vfs_create() is a convenient
> > temporary hack, but unfortunately NFS, ceph, etc. still use the
> > nameidata passed to vfs_create() and other ops.
> > 
> > The way union mounts gets a valid nameidata is by doing the create in
> > the VFS before calling file system ops that may trigger a copyup,
> > while we still have the original nameidata.  This is one of the major
> > reasons union mounts lives in the VFS.
> 
> Not a big deal, just set up nd as if this was a single component
> lookup.  The previous version did it like this:
> 
> +       struct nameidata nd = {
> +               .last_type = LAST_NORM,
> +               .last = *name,
> +       };
> +
> +       nd.path = pue->upperpath;
> +       path_get(&nd.path);
> +
> +       newdentry = lookup_create(&nd, S_ISDIR(attr->ia_mode));
> 
> But that's not a solution to the NFS suckage, it's just a workaround.

Hm, I suspect it's more complicated than this.  I looked at how
unionfs does it in init_lower_nd() and it requires poking around in
VFS internal details in the file system implementation.  So unioning
code is not in the VFS, but VFS code is in the union fs.  Progress?  I
dunno.

> "Fortunately" NFS isn't good for a writable layer of a union for other
> reasons, so this isn't a big concern at the moment.

It's the long-term effect on the code structure that concerns me more.

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-08-26 18:33 ` [PATCH 5/5] union: hybrid union filesystem prototype Miklos Szeredi
  2010-09-01 21:42   ` Valerie Aurora
@ 2010-09-02 21:42   ` Valerie Aurora
  2010-09-03 12:31     ` Miklos Szeredi
  1 sibling, 1 reply; 49+ messages in thread
From: Valerie Aurora @ 2010-09-02 21:42 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, neilb, viro, jblunck, hch

On Thu, Aug 26, 2010 at 08:33:45PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> This union filesystem is a hybrid of entirely filesystem based
> (unionfs, aufs) and entierly VFS based (union mounts) solutions.
> 
> The dentry tree is duplicated from the underlying filesystems, this
> enables fast cached lookups without adding special support into the
> VFS.  This uses slightly more memory than union mounts, but dentries
> are relatively small.
> 
> Inode structures are only duplicated for directories.  Regular files,
> symlinks and special files each share a single inode.  This means that
> locking victim for unlink is a quasi-filesystem lock, which is
> suboptimal, but could be worked around in the VFS.
> 
> Opening non directories results in the open forwarded to the
> underlying filesystem.  This makes the behavior very similar to union
> mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
> descriptors).
> 
> Usage:
> 
>   mount -t union -olowerdir=/union/lower,upperdir=/union/upper union /mnt/union
> 
> Supported:
> 
>  - all operations
> 
> Missing:
> 
>  - upgrade credentials for copy-up
>  - ensure that filesystems part of the union are not modified outside
>    the union

Just a note that the infrastructure I wrote to do this last bullet
point for union mounts (hard read-only count plus mount checks) is
completely compatible with hybrid union fs.

-VAL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-09-02 21:33       ` Valerie Aurora
@ 2010-09-03  5:10         ` Neil Brown
  2010-09-03  9:16           ` Miklos Szeredi
  2010-09-03  8:52         ` Miklos Szeredi
  1 sibling, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-09-03  5:10 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Thu, 2 Sep 2010 17:33:15 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> On Thu, Sep 02, 2010 at 11:19:41AM +0200, Miklos Szeredi wrote:
> > On Wed, 1 Sep 2010, Valerie Aurora wrote:
> > > > +
> > > > +		err = vfs_create(upperdir, newdentry, attr->ia_mode, NULL);
> > > 
> > > Passing a NULL namiedata pointer to vfs_create() is a convenient
> > > temporary hack, but unfortunately NFS, ceph, etc. still use the
> > > nameidata passed to vfs_create() and other ops.
> > > 
> > > The way union mounts gets a valid nameidata is by doing the create in
> > > the VFS before calling file system ops that may trigger a copyup,
> > > while we still have the original nameidata.  This is one of the major
> > > reasons union mounts lives in the VFS.
> > 
> > Not a big deal, just set up nd as if this was a single component
> > lookup.  The previous version did it like this:
> > 
> > +       struct nameidata nd = {
> > +               .last_type = LAST_NORM,
> > +               .last = *name,
> > +       };
> > +
> > +       nd.path = pue->upperpath;
> > +       path_get(&nd.path);
> > +
> > +       newdentry = lookup_create(&nd, S_ISDIR(attr->ia_mode));
> > 
> > But that's not a solution to the NFS suckage, it's just a workaround.
> 
> Hm, I suspect it's more complicated than this.  I looked at how
> unionfs does it in init_lower_nd() and it requires poking around in
> VFS internal details in the file system implementation.  So unioning
> code is not in the VFS, but VFS code is in the union fs.  Progress?  I
> dunno.

Slightly off-topic, but my personal definition of 'progress' in this context
would be giving more control to the filesystems rather than the VFS telling
them how they have to behave.  The VFS should largely be a library that the
filesystems can call on to do common tasks, but where they can augment what
libVFS does, or just ignore it as they choose.  This would be more like the
model of the page-cache.  It is really easy for a filesystem to use the
pagecache to store file content, and really easy for it to do something else
if that works better.

In this particular situation - where unionfs has a dentry and want to copy
that file to a different dentry, I think what we really want to do is call
the section of code in the middle of do_filp_open, roughly from the "We have
the parent and last component"  comment to the do_last() call.  If that could
be factored out and exported it would get close to what we want.

I had a look at NFS and ceph, and they want to see LOOKUP_CREATE and
LOOPUP_OPEN set, and want the intent.open.file to exist.  do_filp_open can do
all that for you.


> 
> > "Fortunately" NFS isn't good for a writable layer of a union for other
> > reasons, so this isn't a big concern at the moment.
> 
> It's the long-term effect on the code structure that concerns me more.

Code structure:  absolutely agree this is important.  But I don't think it 
    needs to be a problem - just refactor 'VFS" code and call into it.
    (I note that nfsd always passes a NULL nameidata - when refactoring that
    code it would be worth aiming to make it usable by nfsd too).

NFS as writable layer:  Not a concern at the moment, no.  But I think it is
   worth keeping it in mind.
   The biggest problem is, I think, the lack of xattrs which are currently
   needed for whiteout and opaque.
   I think there would be little cost in allowing a symlink to
   (union-whiteout) to be treated as a whiteout even though it has no xattrs
   (maybe as a mount option).
   For opaque you would need a somewhat less-elegant work around. e.g. if the
   directory contains a symlink to (union-opaque) called ._.union_opaque,
   then that symlink is hidden, and the directory is opaque.  This could be
   enabled by that same mount option.
   This might not be as efficient as xattrs, but then people don't use
   networked filesystems for their speed - they have other benefits.

NeilBrown


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-09-02 21:33       ` Valerie Aurora
  2010-09-03  5:10         ` Neil Brown
@ 2010-09-03  8:52         ` Miklos Szeredi
  1 sibling, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-09-03  8:52 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: miklos, linux-fsdevel, linux-kernel, neilb, viro, jblunck, hch

On Thu, 2 Sep 2010, Valerie Aurora wrote:
> Hm, I suspect it's more complicated than this.  I looked at how
> unionfs does it in init_lower_nd() and it requires poking around in
> VFS internal details in the file system implementation.  So unioning
> code is not in the VFS, but VFS code is in the union fs.  Progress?  I
> dunno.

Definitely not progress.

Progress would be if the intents mess would go away.  Which Al's been
working on for what, one, two years?

Until that's cleared up it's pointless to add hacks like that (which
in actual fact don't make much sense anyway, NFS will work fine
without the intents stuff most of the time, only perhaps
suboptimally).

> > "Fortunately" NFS isn't good for a writable layer of a union for other
> > reasons, so this isn't a big concern at the moment.
> 
> It's the long-term effect on the code structure that concerns me more.

Nameidata in any filesystem operation should not be in the long term
design structure (well, in follow_link() it was OK while it was used
recursiveley, now it's flattened out, it would also be cleaner if it
didn't get the nameidata).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-09-03  5:10         ` Neil Brown
@ 2010-09-03  9:16           ` Miklos Szeredi
  2010-09-09 16:02             ` David P. Quigley
  0 siblings, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2010-09-03  9:16 UTC (permalink / raw)
  To: Neil Brown
  Cc: vaurora, miklos, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Fri, 3 Sep 2010, Neil Brown wrote:
> Slightly off-topic, but my personal definition of 'progress' in this context
> would be giving more control to the filesystems rather than the VFS telling
> them how they have to behave.  The VFS should largely be a library that the
> filesystems can call on to do common tasks, but where they can augment what
> libVFS does, or just ignore it as they choose.  This would be more like the
> model of the page-cache.  It is really easy for a filesystem to use the
> pagecache to store file content, and really easy for it to do something else
> if that works better.
> 
> In this particular situation - where unionfs has a dentry and want to copy
> that file to a different dentry, I think what we really want to do is call
> the section of code in the middle of do_filp_open, roughly from the "We have
> the parent and last component"  comment to the do_last() call.  If that could
> be factored out and exported it would get close to what we want.
> 
> I had a look at NFS and ceph, and they want to see LOOKUP_CREATE and
> LOOPUP_OPEN set, and want the intent.open.file to exist.  do_filp_open can do
> all that for you.

Right, the difference between current open and what NFS wants is that
the current open is an inode based operation (like getattr).  The open
NFS wants is a name based operation (like create).

Unfortunately symlinks complicate that to a great extent.  Which means
this new operation really becomes a cobination of follow_link, create
and open.


> > > "Fortunately" NFS isn't good for a writable layer of a union for other
> > > reasons, so this isn't a big concern at the moment.
> > 
> > It's the long-term effect on the code structure that concerns me more.
> 
> Code structure:  absolutely agree this is important.  But I don't think it 
>     needs to be a problem - just refactor 'VFS" code and call into it.
>     (I note that nfsd always passes a NULL nameidata - when refactoring that
>     code it would be worth aiming to make it usable by nfsd too).
> 
> NFS as writable layer:  Not a concern at the moment, no.  But I think it is
>    worth keeping it in mind.
>    The biggest problem is, I think, the lack of xattrs which are currently
>    needed for whiteout and opaque.

There was a patch that seem to have been generally liked, don't know
what happened to it:

  http://lwn.net/Articles/353831/

>    I think there would be little cost in allowing a symlink to
>    (union-whiteout) to be treated as a whiteout even though it has no xattrs
>    (maybe as a mount option).
>    For opaque you would need a somewhat less-elegant work around. e.g. if the
>    directory contains a symlink to (union-opaque) called ._.union_opaque,
>    then that symlink is hidden, and the directory is opaque.  This could be
>    enabled by that same mount option.
>    This might not be as efficient as xattrs, but then people don't use
>    networked filesystems for their speed - they have other benefits.

I think unionfs/aufs do something like that.  Having namespace
pollution is ugly, but well, we can live with that.

But that's again something I'd think about when someone actually needs
it.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-09-02 21:42   ` Valerie Aurora
@ 2010-09-03 12:31     ` Miklos Szeredi
  0 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2010-09-03 12:31 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: miklos, linux-fsdevel, linux-kernel, neilb, viro, jblunck, hch

On Thu, 2 Sep 2010, Valerie Aurora wrote:
> > Missing:
> > 
> >  - upgrade credentials for copy-up
> >  - ensure that filesystems part of the union are not modified outside
> >    the union
> 
> Just a note that the infrastructure I wrote to do this last bullet
> point for union mounts (hard read-only count plus mount checks) is
> completely compatible with hybrid union fs.

Thanks.

I've used some infrastructure (like parts of the copy up code) from
union mounts.

One way to generalize the read-only counts infrastructure is not to
require MS_RDONLY on the superblock, but count the number of
read-write mounts of the sb.  If it's zero then that's equivalent to
MS_RDONLY.

For the writable layer the same can be used, except the overlay
filesystem can get a single private read-write mount.  This means that
the upper layer can also be mounted multiple times, but cannot be part
of other unions and cannot be mounted read write to userspace.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-02 14:28                   ` Miklos Szeredi
@ 2010-09-08 19:47                     ` David P. Quigley
  0 siblings, 0 replies; 49+ messages in thread
From: David P. Quigley @ 2010-09-08 19:47 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Engelhardt, neilb, linux-fsdevel, linux-kernel, vaurora,
	viro, jblunck, hch

On Thu, 2010-09-02 at 16:28 +0200, Miklos Szeredi wrote:
> On Thu, 2 Sep 2010, Jan Engelhardt wrote:
> > There ought to be a reason that other implementations offer
> > doing multiple branches with a single vfsmount.
> 
> Overdesign.
> 
> Thanks,
> Miklos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Actually it's not overdesign. We support an arbitrary number of branches
with only the requirement that the top most branch be writable unless
you want the entire mount to be read-only. That is a feature that people
seem to have wanted for the many years that we have been working on
UnionFS. You have the limitation that you only support two branches
which is why you have to use the hack of making unions of unions of
unions where we would just add the 3 branches and mark them ro instead.
This kind of functionality makes administration more straight forward
when you are performing a task such as creating an RPM repository from a
series of loop back mounted images. I also used UnionFS to create a
mechanism for providing unified home directories on polyinstantiated MLS
systems (which is essentially a unified namespace with extra namespace
manipulation rules). I'd actually look at what UnionFS and AUFS have
done before you write their decisions off with snippy one word answers.

Dave


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/5] union: hybrid union filesystem prototype
  2010-09-03  9:16           ` Miklos Szeredi
@ 2010-09-09 16:02             ` David P. Quigley
  0 siblings, 0 replies; 49+ messages in thread
From: David P. Quigley @ 2010-09-09 16:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Neil Brown, vaurora, linux-fsdevel, linux-kernel, viro, jblunck, hch

On Fri, 2010-09-03 at 11:16 +0200, Miklos Szeredi wrote:
> On Fri, 3 Sep 2010, Neil Brown wrote:
> > Slightly off-topic, but my personal definition of 'progress' in this context
> > would be giving more control to the filesystems rather than the VFS telling
> > them how they have to behave.  The VFS should largely be a library that the
> > filesystems can call on to do common tasks, but where they can augment what
> > libVFS does, or just ignore it as they choose.  This would be more like the
> > model of the page-cache.  It is really easy for a filesystem to use the
> > pagecache to store file content, and really easy for it to do something else
> > if that works better.
> > 
> > In this particular situation - where unionfs has a dentry and want to copy
> > that file to a different dentry, I think what we really want to do is call
> > the section of code in the middle of do_filp_open, roughly from the "We have
> > the parent and last component"  comment to the do_last() call.  If that could
> > be factored out and exported it would get close to what we want.
> > 
> > I had a look at NFS and ceph, and they want to see LOOKUP_CREATE and
> > LOOPUP_OPEN set, and want the intent.open.file to exist.  do_filp_open can do
> > all that for you.
> 
> Right, the difference between current open and what NFS wants is that
> the current open is an inode based operation (like getattr).  The open
> NFS wants is a name based operation (like create).
> 
> Unfortunately symlinks complicate that to a great extent.  Which means
> this new operation really becomes a cobination of follow_link, create
> and open.
> 
> 
> > > > "Fortunately" NFS isn't good for a writable layer of a union for other
> > > > reasons, so this isn't a big concern at the moment.
> > > 
> > > It's the long-term effect on the code structure that concerns me more.
> > 
> > Code structure:  absolutely agree this is important.  But I don't think it 
> >     needs to be a problem - just refactor 'VFS" code and call into it.
> >     (I note that nfsd always passes a NULL nameidata - when refactoring that
> >     code it would be worth aiming to make it usable by nfsd too).
> > 
> > NFS as writable layer:  Not a concern at the moment, no.  But I think it is
> >    worth keeping it in mind.
> >    The biggest problem is, I think, the lack of xattrs which are currently
> >    needed for whiteout and opaque.
> 
> There was a patch that seem to have been generally liked, don't know
> what happened to it:
> 
>   http://lwn.net/Articles/353831/
> 

James spent the time to implement the side band protocol for xattrs in
NFSv3 but then hit some resistance with it recently. The thing is we are
trying to push people towards NFSv4 and the xattr solution James was
proposing was to allow people to use SELinux with NFSv3. We had a
meeting with several of the parties and it was determined that it would
be better to focus our efforts on NFSv4 and standardizing security label
support for NFSv4 than to push for the xattr solution with NFSv3. We
received the same requirements of a standards document that has gone
through the IETF process for the NFSv3 xattr solution as well. Since we
have been working on security labels with NFSv4 for almost 3 years now
and are pretty far along it was determined that it would be better to
spend cycles on that instead.


> >    I think there would be little cost in allowing a symlink to
> >    (union-whiteout) to be treated as a whiteout even though it has no xattrs
> >    (maybe as a mount option).
> >    For opaque you would need a somewhat less-elegant work around. e.g. if the
> >    directory contains a symlink to (union-opaque) called ._.union_opaque,
> >    then that symlink is hidden, and the directory is opaque.  This could be
> >    enabled by that same mount option.
> >    This might not be as efficient as xattrs, but then people don't use
> >    networked filesystems for their speed - they have other benefits.
> 
> I think unionfs/aufs do something like that.  Having namespace
> pollution is ugly, but well, we can live with that.

I agree with this as well. Back in 2006 we spoke with Ted Tso at OLS and
I believe he wasn't happy with the namespace pollution either. He
suggested that we create an on disk metadata format for unionfs that
would be used. If you look at the unionfs webpage there is an ODF branch
of the tree. This allows you to store whiteout and opaqueness data in a
separate file so that you don't have to pollute the namespace. This
might have some issues with locking and concurrency though. I didn't
work on this component so I'm unsure how it works. You might want to
look at that and see if it is worth trying to use again. 

> 
> But that's again something I'd think about when someone actually needs
> it.
> 
> Thanks,
> Miklos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-02 14:25                 ` Jan Engelhardt
  2010-09-02 14:28                   ` Miklos Szeredi
@ 2010-09-23 13:18                   ` Jan Engelhardt
  2010-09-23 19:22                     ` Valerie Aurora
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Engelhardt @ 2010-09-23 13:18 UTC (permalink / raw)
  To: Neil Brown
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro, jblunck, hch


On ??, Miklos Szeredi wrote:
On Thursday 2010-09-02 16:25, Jan Engelhardt wrote:
>>
>>There ought to be a reason that other implementations offer
>>doing multiple branches with a single vfsmount.
>
>overdesign

However, those implementations offer changing the branch configuration
on the fly, e.g. pulling "out" a branch in the middle. With
"two-branches only" overlayfs, a multi-branch layout would be
multiple vfsmounts describable by something like "(((a b) c) d)".
Say you want to remove branch B. (Supposedly) works today in
aufs/unionfs. How do you pull it off with overlayfs? You can't
umount the (a b) vfsmount because it is used by c.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/5] hybrid union filesystem prototype
  2010-09-23 13:18                   ` Jan Engelhardt
@ 2010-09-23 19:22                     ` Valerie Aurora
  0 siblings, 0 replies; 49+ messages in thread
From: Valerie Aurora @ 2010-09-23 19:22 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Neil Brown, Miklos Szeredi, linux-fsdevel, linux-kernel, viro,
	jblunck, hch

On Thu, Sep 23, 2010 at 03:18:57PM +0200, Jan Engelhardt wrote:
> 
> On ??, Miklos Szeredi wrote:
> On Thursday 2010-09-02 16:25, Jan Engelhardt wrote:
> >>
> >>There ought to be a reason that other implementations offer
> >>doing multiple branches with a single vfsmount.
> >
> >overdesign
> 
> However, those implementations offer changing the branch configuration
> on the fly, e.g. pulling "out" a branch in the middle. With

Seriously.  What is the use case that justifies the complexity of
implementing this feature (correctly)?  We can all think of
theoretical use cases, but which ones are worth the cost?

> "two-branches only" overlayfs, a multi-branch layout would be
> multiple vfsmounts describable by something like "(((a b) c) d)".
> Say you want to remove branch B. (Supposedly) works today in

There's a difference between "works" and "is correct."  Maybe it is
correct; if so, let's see the code review from VFS experts and some
good stress testing.

-VAL

> aufs/unionfs. How do you pull it off with overlayfs? You can't
> umount the (a b) vfsmount because it is used by c.

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2010-09-23 19:23 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-26 18:33 [PATCH 0/5] hybrid union filesystem prototype Miklos Szeredi
2010-08-26 18:33 ` [PATCH 1/5] vfs: implement open "forwarding" Miklos Szeredi
2010-08-26 18:33 ` [PATCH 2/5] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
2010-08-26 20:24   ` David P. Quigley
2010-08-27  4:11     ` Neil Brown
2010-08-27 18:13       ` David P. Quigley
2010-08-27 19:21         ` Valerie Aurora
2010-08-27 18:31       ` David P. Quigley
2010-08-26 18:33 ` [PATCH 3/5] vfs: add flag to allow rename to same inode Miklos Szeredi
2010-08-26 18:33 ` [PATCH 4/5] vfs: export do_splice_direct() to modules Miklos Szeredi
2010-08-26 18:33 ` [PATCH 5/5] union: hybrid union filesystem prototype Miklos Szeredi
2010-09-01 21:42   ` Valerie Aurora
2010-09-02  9:19     ` Miklos Szeredi
2010-09-02 21:33       ` Valerie Aurora
2010-09-03  5:10         ` Neil Brown
2010-09-03  9:16           ` Miklos Szeredi
2010-09-09 16:02             ` David P. Quigley
2010-09-03  8:52         ` Miklos Szeredi
2010-09-02 21:42   ` Valerie Aurora
2010-09-03 12:31     ` Miklos Szeredi
2010-08-27  7:05 ` [PATCH 0/5] " Neil Brown
2010-08-27  8:47   ` Miklos Szeredi
2010-08-27 11:35     ` Neil Brown
2010-08-27 16:53       ` Miklos Szeredi
2010-08-29  4:42         ` Neil Brown
2010-08-30 10:18           ` Miklos Szeredi
2010-08-30 11:40             ` Neil Brown
2010-08-30 12:20               ` Miklos Szeredi
2010-08-31 19:18                 ` Valerie Aurora
2010-08-31 20:19                   ` Trond Myklebust
2010-09-01  1:56                     ` Valerie Aurora
2010-09-01  4:04                       ` Trond Myklebust
2010-09-01  4:33               ` Neil Brown
2010-09-01 20:11                 ` Miklos Szeredi
2010-08-31 19:29             ` Valerie Aurora
2010-09-02 13:15             ` Jan Engelhardt
2010-09-02 13:32               ` Neil Brown
2010-09-02 14:25                 ` Jan Engelhardt
2010-09-02 14:28                   ` Miklos Szeredi
2010-09-08 19:47                     ` David P. Quigley
2010-09-23 13:18                   ` Jan Engelhardt
2010-09-23 19:22                     ` Valerie Aurora
2010-08-30 18:38       ` Valerie Aurora
2010-08-30 23:12         ` Neil Brown
2010-08-31 11:00           ` Miklos Szeredi
2010-08-31 11:24             ` Neil Brown
2010-08-31 15:05               ` Kyle Moffett
2010-08-31 15:05                 ` Kyle Moffett
2010-08-31 20:36                 ` Valerie Aurora

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.