[PATCH 0/6] overlay filesystem prototype

linux-kernel.vger.kernel.org archive mirror

* [PATCH 0/6] overlay filesystem prototype
@ 2010-09-03 13:41 Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 1/6] vfs: implement open "forwarding" Miklos Szeredi
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Miklos Szeredi @ 2010-09-03 13:41 UTC
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

Updated patches follow.

Changes since the last version:

 - rename "hybrid union filesystem" to "overlay filesystem" (overlayfs)

 - added documentation written by Neil

 - correct st_dev for directories (reported by Neil)

 - use getattr() to get attributes from the underlying filesystems;
   this means an overlay filesystem can now itself be the lower,
   read-only layer of another overlay

 - listxattr filters out private extended attributes

 - get a write reference on the upper layer at mount time, unless the
   overlay itself is mounted read-only

 - raise capabilities for copy up and for dealing with whiteouts and
   opaque directories, so the overlay now works for non-root users as well

 - "rm -rf" didn't work correctly in all cases if the directory was
   copied up between opendir and the first readdir; this is now fixed
   (and the directory operations have been consolidated)

 - simplified copy up; this broke the optimization for truncate and
   open(O_TRUNC): the file is now copied up only to be truncated
   immediately (will fix)

 - st_nlink for merged directories is set to 1.  This is an "illegal"
   value that normal filesystems never report, but some filesystems use
   it to indicate that the number of subdirectories is unknown.
   Utilities (find, ...) seem to tolerate this well.

 - misc fixes I forgot about
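
For context on the st_nlink item: on traditional Unix filesystems a
directory's link count is 2 plus its number of subdirectories, and GNU
find can use that to avoid stat()ing every entry (its -noleaf option
turns the optimization off).  The sketch below, with hypothetical
helpers subdir_hint() and can_skip_remaining(), shows how a find-like
walker could treat a count of 1 as "unknown" and fall back to checking
every entry.  It illustrates the convention only; it is not code from
find or from this series.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Hypothetical helper.  On traditional Unix filesystems a directory has
 * st_nlink == 2 + <number of subdirectories>: one link from its entry in
 * the parent, one from its own "." and one from each subdirectory's "..".
 * Returns the implied subdirectory count, or -1 when the link count
 * carries no such information (st_nlink == 1, as the overlay now reports
 * for merged directories).
 */
static long subdir_hint(nlink_t nlink)
{
	if (nlink < 2)
		return -1;
	return (long)(nlink - 2);
}

/*
 * Hypothetical walker decision: once as many subdirectories have been
 * seen as the link count promises, the remaining entries cannot be
 * directories and need not be stat()ed.  With an unknown count every
 * entry still has to be checked.
 */
static int can_skip_remaining(nlink_t nlink, long subdirs_seen)
{
	long expected = subdir_hint(nlink);

	return expected >= 0 && subdirs_seen >= expected;
}
```

With st_nlink == 1 the walker simply never takes the shortcut, which is
why tolerant utilities keep working on merged directories.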

Git tree is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs

Thanks,
Miklos


* [PATCH 1/6] vfs: implement open "forwarding"
  2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
@ 2010-09-03 13:41 ` Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 2/6] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Miklos Szeredi @ 2010-09-03 13:41 UTC
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

[-- Attachment #1: vfs-open-redirect.patch --]
[-- Type: text/plain, Size: 2648 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

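Add a new file operation, f_op->open_other().  This acts just like
f_op->open() except that it takes only the struct file and returns a
struct file pointer.  NULL means the original file is kept and an
ERR_PTR() value signals an error.  Any other value is another open
struct file: in that case the original file is discarded and the
replacement file is used instead.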
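
To make the return convention concrete, here is a minimal userspace
sketch, not kernel code.  ERR_PTR()/IS_ERR()/PTR_ERR() are local
re-creations of the kernel's linux/err.h helpers, and struct file,
struct file_operations, model_dentry_open() and the three example
->open_other() callbacks are hypothetical, stripped-down stand-ins.
Only the NULL / error / replacement rule is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers: a negative
 * errno is encoded as a pointer in the last 4095 bytes of the address space. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily simplified structures (the real ->open() also
 * takes the inode). */
struct file;
struct file_operations {
	int (*open)(struct file *);
	struct file *(*open_other)(struct file *);
};
struct file {
	const struct file_operations *f_op;
	int discarded;	/* set when the model "releases" this file */
};

/* Model of the dispatch this patch adds to __dentry_open(). */
static struct file *model_dentry_open(struct file *f)
{
	struct file *ret;

	if (f->f_op && f->f_op->open_other) {
		ret = f->f_op->open_other(f);
		if (ret == NULL)
			return f;		/* NULL: keep the original file */
		f->discarded = 1;	/* error or replacement: drop f */
		return ret;		/* ERR_PTR() value or replacement file */
	}
	if (f->f_op && f->f_op->open) {
		int error = f->f_op->open(f);

		if (error) {
			f->discarded = 1;
			return ERR_PTR(error);
		}
	}
	return f;
}

/* Example ->open_other() implementations (all hypothetical). */
static struct file lower_file;	/* e.g. an already-open file on a lower layer */

static struct file *redirect_open(struct file *f) { (void)f; return &lower_file; }
static struct file *keep_open(struct file *f) { (void)f; return NULL; }
static struct file *deny_open(struct file *f) { (void)f; return ERR_PTR(-EACCES); }

static const struct file_operations redirect_fops = { .open_other = redirect_open };
static const struct file_operations keep_fops = { .open_other = keep_open };
static const struct file_operations deny_fops = { .open_other = deny_open };
```

In this model both an error and a replacement cause the original file to
be dropped, mirroring the shared "goto cleanup_all" in the patch; only a
NULL return hands the original file back to the caller.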

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/open.c          |   23 +++++++++++++++++------
 include/linux/fs.h |    1 +
 2 files changed, 18 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/open.c	2010-08-19 09:46:27.000000000 +0200
@@ -657,6 +657,7 @@ static struct file *__dentry_open(struct
 					const struct cred *cred)
 {
 	struct inode *inode;
+	struct file *ret;
 	int error;
 
 	f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
@@ -664,6 +665,7 @@ static struct file *__dentry_open(struct
 	inode = dentry->d_inode;
 	if (f->f_mode & FMODE_WRITE) {
 		error = __get_file_write_access(inode, mnt);
+		ret = ERR_PTR(error);
 		if (error)
 			goto cleanup_file;
 		if (!special_file(inode->i_mode))
@@ -678,15 +680,24 @@ static struct file *__dentry_open(struct
 	file_sb_list_add(f, inode->i_sb);
 
 	error = security_dentry_open(f, cred);
+	ret = ERR_PTR(error);
 	if (error)
 		goto cleanup_all;
 
-	if (!open && f->f_op)
-		open = f->f_op->open;
-	if (open) {
-		error = open(inode, f);
-		if (error)
+	if (!open && f->f_op && f->f_op->open_other) {
+		/* NULL means keep f, non-error non-null means replace */
+		ret = f->f_op->open_other(f);
+		if (IS_ERR(ret) || ret != NULL)
 			goto cleanup_all;
+	} else {
+		if (!open && f->f_op)
+			open = f->f_op->open;
+		if (open) {
+			error = open(inode, f);
+			ret = ERR_PTR(error);
+			if (error)
+				goto cleanup_all;
+		}
 	}
 	ima_counts_get(f);
 
@@ -728,7 +739,7 @@ cleanup_file:
 	put_filp(f);
 	dput(dentry);
 	mntput(mnt);
-	return ERR_PTR(error);
+	return ret;
 }
 
 /**
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-08-19 09:46:15.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-08-19 09:46:27.000000000 +0200
@@ -1494,6 +1494,7 @@ struct file_operations {
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
+	struct file *(*open_other) (struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, int datasync);

-- 


* [PATCH 2/6] vfs: make i_op->permission take a dentry instead of an inode
  2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 1/6] vfs: implement open "forwarding" Miklos Szeredi
@ 2010-09-03 13:41 ` Miklos Szeredi
  2010-09-17 13:14   ` Aneesh Kumar K. V
  2010-09-03 13:41 ` [PATCH 3/6] vfs: add flag to allow rename to same inode Miklos Szeredi
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2010-09-03 13:41 UTC
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

[-- Attachment #1: vfs-permission-dentry.patch --]
[-- Type: text/plain, Size: 35440 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

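Like most other inode operations, ->permission() should take a dentry
instead of an inode.  This is necessary for filesystems which operate
on names, not on inodes.  inode_permission() is renamed to
dentry_permission() accordingly, and its callers and the
->permission() implementations are updated to match.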
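
The effect of the signature change can be sketched in userspace.
Everything below is a hypothetical, simplified model: the structures
keep only the fields used, model_generic_permission() checks just the
"other" rwx bits instead of calling the real generic_permission(), and
the write/read-only checks at the top of dentry_permission() are left
out.  The MAY_* values mirror the kernel's.  It shows the dispatch in
dentry_permission() and how a stacking filesystem can forward the check
to a lower-layer dentry, much as ecryptfs_permission() does in this
patch via ecryptfs_dentry_to_lower().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAY_EXEC  0x1	/* same values as the kernel's MAY_* bits */
#define MAY_WRITE 0x2
#define MAY_READ  0x4

struct dentry;
struct inode_operations {
	int (*permission)(struct dentry *, int);	/* new signature */
};
struct inode {
	unsigned int i_mode;	/* only the "other" rwx bits are modelled */
	const struct inode_operations *i_op;
};
struct dentry {
	struct inode *d_inode;
	struct dentry *lower;	/* hypothetical: dentry on the lower layer */
};

/* Stand-in for generic_permission(): r=4, w=2, x=1 line up with MAY_*. */
static int model_generic_permission(struct inode *inode, int mask)
{
	int allowed = inode->i_mode & 7;

	return (mask & 7 & ~allowed) ? -EACCES : 0;
}

/* Shape of dentry_permission() after the patch: hand the dentry to the
 * filesystem if it has ->permission(), else fall back to the inode check. */
static int model_dentry_permission(struct dentry *dentry, int mask)
{
	struct inode *inode = dentry->d_inode;

	if (inode->i_op && inode->i_op->permission)
		return inode->i_op->permission(dentry, mask);
	return model_generic_permission(inode, mask);
}

/* A stacking filesystem forwards the check to the lower dentry. */
static int stacked_permission(struct dentry *dentry, int mask)
{
	return model_dentry_permission(dentry->lower, mask);
}

static const struct inode_operations stacked_iops = {
	.permission = stacked_permission,
};
```

In this model a check on the upper dentry is decided entirely by the
lower inode's mode, so a write check fails even when the upper inode's
own mode would allow it.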

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/afs/internal.h                  |    2 +-
 fs/afs/security.c                  |    3 ++-
 fs/bad_inode.c                     |    2 +-
 fs/btrfs/inode.c                   |    4 +++-
 fs/btrfs/ioctl.c                   |    8 ++++----
 fs/ceph/inode.c                    |    3 ++-
 fs/ceph/super.h                    |    2 +-
 fs/cifs/cifsfs.c                   |    3 ++-
 fs/coda/dir.c                      |    3 ++-
 fs/coda/pioctl.c                   |    4 ++--
 fs/ecryptfs/inode.c                |    4 ++--
 fs/fuse/dir.c                      |    3 ++-
 fs/gfs2/ops_inode.c                |   11 ++++++++---
 fs/hostfs/hostfs_kern.c            |    3 ++-
 fs/logfs/dir.c                     |    6 ------
 fs/namei.c                         |   37 ++++++++++++++++++++-----------------
 fs/namespace.c                     |    2 +-
 fs/nfs/dir.c                       |    3 ++-
 fs/nfsd/nfsfh.c                    |    2 +-
 fs/nfsd/vfs.c                      |    4 ++--
 fs/nilfs2/nilfs.h                  |    2 +-
 fs/notify/fanotify/fanotify_user.c |    2 +-
 fs/notify/inotify/inotify_user.c   |    2 +-
 fs/ocfs2/file.c                    |    3 ++-
 fs/ocfs2/file.h                    |    2 +-
 fs/ocfs2/refcounttree.c            |    4 ++--
 fs/open.c                          |   10 +++++-----
 fs/proc/base.c                     |    3 ++-
 fs/proc/proc_sysctl.c              |    3 ++-
 fs/reiserfs/xattr.c                |    4 +++-
 fs/smbfs/file.c                    |    4 ++--
 fs/sysfs/inode.c                   |    3 ++-
 fs/sysfs/sysfs.h                   |    2 +-
 fs/utimes.c                        |    2 +-
 fs/xattr.c                         |   12 +++++++-----
 include/linux/coda_linux.h         |    2 +-
 include/linux/fs.h                 |    4 ++--
 include/linux/nfs_fs.h             |    2 +-
 include/linux/reiserfs_xattr.h     |    2 +-
 ipc/mqueue.c                       |    2 +-
 net/unix/af_unix.c                 |    2 +-
 41 files changed, 100 insertions(+), 81 deletions(-)

Index: linux-2.6/fs/btrfs/ioctl.c
===================================================================
--- linux-2.6.orig/fs/btrfs/ioctl.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/btrfs/ioctl.c	2010-08-19 09:46:31.000000000 +0200
@@ -396,13 +396,13 @@ fail:
 }
 
 /* copy of may_create in fs/namei.c() */
-static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
+static inline int btrfs_may_create(struct dentry *dir, struct dentry *child)
 {
 	if (child->d_inode)
 		return -EEXIST;
-	if (IS_DEADDIR(dir))
+	if (IS_DEADDIR(dir->d_inode))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	return dentry_permission(dir, MAY_WRITE | MAY_EXEC);
 }
 
 /*
@@ -433,7 +433,7 @@ static noinline int btrfs_mksubvol(struc
 	if (error)
 		goto out_dput;
 
-	error = btrfs_may_create(dir, dentry);
+	error = btrfs_may_create(parent->dentry, dentry);
 	if (error)
 		goto out_drop_write;
 
Index: linux-2.6/fs/ecryptfs/inode.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ecryptfs/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -958,9 +958,9 @@ int ecryptfs_truncate(struct dentry *den
 }
 
 static int
-ecryptfs_permission(struct inode *inode, int mask)
+ecryptfs_permission(struct dentry *dentry, int mask)
 {
-	return inode_permission(ecryptfs_inode_to_lower(inode), mask);
+	return dentry_permission(ecryptfs_dentry_to_lower(dentry), mask);
 }
 
 /**
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2010-08-19 09:46:15.000000000 +0200
+++ linux-2.6/fs/namei.c	2010-08-19 09:46:31.000000000 +0200
@@ -240,17 +240,18 @@ int generic_permission(struct inode *ino
 }
 
 /**
- * inode_permission  -  check for access rights to a given inode
- * @inode:	inode to check permission on
+ * dentry_permission  -  check for access rights to a given dentry
+ * @dentry:	dentry to check permission on
  * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
- * Used to check for read/write/execute permissions on an inode.
+ * Used to check for read/write/execute permissions on a dentry.
  * We use "fsuid" for this, letting us set arbitrary permissions
  * for filesystem access without changing the "normal" uids which
  * are used for other things.
  */
-int inode_permission(struct inode *inode, int mask)
+int dentry_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int retval;
 
 	if (mask & MAY_WRITE) {
@@ -271,7 +272,7 @@ int inode_permission(struct inode *inode
 	}
 
 	if (inode->i_op->permission)
-		retval = inode->i_op->permission(inode, mask);
+		retval = inode->i_op->permission(dentry, mask);
 	else
 		retval = generic_permission(inode, mask, inode->i_op->check_acl);
 
@@ -295,11 +296,11 @@ int inode_permission(struct inode *inode
  *
  * Note:
  *	Do not use this function in new code.  All access checks should
- *	be done using inode_permission().
+ *	be done using dentry_permission().
  */
 int file_permission(struct file *file, int mask)
 {
-	return inode_permission(file->f_path.dentry->d_inode, mask);
+	return dentry_permission(file->f_path.dentry, mask);
 }
 
 /*
@@ -459,12 +460,13 @@ force_reval_path(struct path *path, stru
  * short-cut DAC fails, then call ->permission() to do more
  * complete permission check.
  */
-static int exec_permission(struct inode *inode)
+static int exec_permission(struct dentry *dentry)
 {
 	int ret;
+	struct inode *inode = dentry->d_inode;
 
 	if (inode->i_op->permission) {
-		ret = inode->i_op->permission(inode, MAY_EXEC);
+		ret = inode->i_op->permission(dentry, MAY_EXEC);
 		if (!ret)
 			goto ok;
 		return ret;
@@ -837,7 +839,7 @@ static int link_path_walk(const char *na
 		unsigned int c;
 
 		nd->flags |= LOOKUP_CONTINUE;
-		err = exec_permission(inode);
+		err = exec_permission(nd->path.dentry);
  		if (err)
 			break;
 
@@ -1163,7 +1165,7 @@ static struct dentry *lookup_hash(struct
 {
 	int err;
 
-	err = exec_permission(nd->path.dentry->d_inode);
+	err = exec_permission(nd->path.dentry);
 	if (err)
 		return ERR_PTR(err);
 	return __lookup_hash(&nd->last, nd->path.dentry, nd);
@@ -1213,7 +1215,7 @@ struct dentry *lookup_one_len(const char
 	if (err)
 		return ERR_PTR(err);
 
-	err = exec_permission(base->d_inode);
+	err = exec_permission(base);
 	if (err)
 		return ERR_PTR(err);
 	return __lookup_hash(&this, base, NULL);
@@ -1301,7 +1303,7 @@ static int may_delete(struct inode *dir,
 	BUG_ON(victim->d_parent->d_inode != dir);
 	audit_inode_child(victim, dir);
 
-	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	error = dentry_permission(victim->d_parent, MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 	if (IS_APPEND(dir))
@@ -1337,7 +1339,8 @@ static inline int may_create(struct inod
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	BUG_ON(child->d_parent->d_inode != dir);
+	return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
 }
 
 /*
@@ -1430,7 +1433,7 @@ int may_open(struct path *path, int acc_
 		break;
 	}
 
-	error = inode_permission(inode, acc_mode);
+	error = dentry_permission(dentry, acc_mode);
 	if (error)
 		return error;
 
@@ -2545,7 +2548,7 @@ static int vfs_rename_dir(struct inode *
 	 * we'll need to flip '..'.
 	 */
 	if (new_dir != old_dir) {
-		error = inode_permission(old_dentry->d_inode, MAY_WRITE);
+		error = dentry_permission(old_dentry, MAY_WRITE);
 		if (error)
 			return error;
 	}
@@ -2900,7 +2903,7 @@ EXPORT_SYMBOL(page_symlink_inode_operati
 EXPORT_SYMBOL(path_lookup);
 EXPORT_SYMBOL(kern_path);
 EXPORT_SYMBOL(vfs_path_lookup);
-EXPORT_SYMBOL(inode_permission);
+EXPORT_SYMBOL(dentry_permission);
 EXPORT_SYMBOL(file_permission);
 EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(vfs_create);
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-08-19 09:46:31.000000000 +0200
@@ -1230,7 +1230,7 @@ static int mount_is_safe(struct path *pa
 		if (current_uid() != path->dentry->d_inode->i_uid)
 			return -EPERM;
 	}
-	if (inode_permission(path->dentry->d_inode, MAY_WRITE))
+	if (dentry_permission(path->dentry, MAY_WRITE))
 		return -EPERM;
 	return 0;
 #endif
Index: linux-2.6/fs/nfsd/nfsfh.c
===================================================================
--- linux-2.6.orig/fs/nfsd/nfsfh.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/nfsd/nfsfh.c	2010-08-19 09:46:31.000000000 +0200
@@ -38,7 +38,7 @@ static int nfsd_acceptable(void *expv, s
 		/* make sure parents give x permission to user */
 		int err;
 		parent = dget_parent(tdentry);
-		err = inode_permission(parent->d_inode, MAY_EXEC);
+		err = dentry_permission(parent, MAY_EXEC);
 		if (err < 0) {
 			dput(parent);
 			break;
Index: linux-2.6/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.orig/fs/nfsd/vfs.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/nfsd/vfs.c	2010-08-19 09:46:31.000000000 +0200
@@ -2124,12 +2124,12 @@ nfsd_permission(struct svc_rqst *rqstp,
 		return 0;
 
 	/* This assumes  NFSD_MAY_{READ,WRITE,EXEC} == MAY_{READ,WRITE,EXEC} */
-	err = inode_permission(inode, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
+	err = dentry_permission(dentry, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
 
 	/* Allow read access to binaries even when mode 111 */
 	if (err == -EACCES && S_ISREG(inode->i_mode) &&
 	    acc == (NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE))
-		err = inode_permission(inode, MAY_EXEC);
+		err = dentry_permission(dentry, MAY_EXEC);
 
 	return err? nfserrno(err) : 0;
 }
Index: linux-2.6/fs/notify/fanotify/fanotify_user.c
===================================================================
--- linux-2.6.orig/fs/notify/fanotify/fanotify_user.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/notify/fanotify/fanotify_user.c	2010-08-19 09:46:31.000000000 +0200
@@ -454,7 +454,7 @@ static int fanotify_find_path(int dfd, c
 	}
 
 	/* you can only watch an inode if you have read permissions on it */
-	ret = inode_permission(path->dentry->d_inode, MAY_READ);
+	ret = dentry_permission(path->dentry, MAY_READ);
 	if (ret)
 		path_put(path);
 out:
Index: linux-2.6/fs/notify/inotify/inotify_user.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify_user.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/notify/inotify/inotify_user.c	2010-08-19 09:46:31.000000000 +0200
@@ -358,7 +358,7 @@ static int inotify_find_inode(const char
 	if (error)
 		return error;
 	/* you can only watch an inode if you have read permissions on it */
-	error = inode_permission(path->dentry->d_inode, MAY_READ);
+	error = dentry_permission(path->dentry, MAY_READ);
 	if (error)
 		path_put(path);
 	return error;
Index: linux-2.6/fs/ocfs2/refcounttree.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/refcounttree.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ocfs2/refcounttree.c	2010-08-19 09:46:31.000000000 +0200
@@ -4322,7 +4322,7 @@ static inline int ocfs2_may_create(struc
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
 }
 
 /* copied from user_path_parent. */
@@ -4395,7 +4395,7 @@ static int ocfs2_vfs_reflink(struct dent
 	 * file.
 	 */
 	if (!preserve) {
-		error = inode_permission(inode, MAY_READ);
+		error = dentry_permission(old_dentry, MAY_READ);
 		if (error)
 			return error;
 	}
Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c	2010-08-19 09:46:27.000000000 +0200
+++ linux-2.6/fs/open.c	2010-08-19 09:46:31.000000000 +0200
@@ -89,7 +89,7 @@ static long do_sys_truncate(const char _
 	if (error)
 		goto dput_and_out;
 
-	error = inode_permission(inode, MAY_WRITE);
+	error = dentry_permission(path.dentry, MAY_WRITE);
 	if (error)
 		goto mnt_drop_write_and_out;
 
@@ -328,7 +328,7 @@ SYSCALL_DEFINE3(faccessat, int, dfd, con
 			goto out_path_release;
 	}
 
-	res = inode_permission(inode, mode | MAY_ACCESS);
+	res = dentry_permission(path.dentry, mode | MAY_ACCESS);
 	/* SuS v2 requires we report a read only fs too */
 	if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
 		goto out_path_release;
@@ -367,7 +367,7 @@ SYSCALL_DEFINE1(chdir, const char __user
 	if (error)
 		goto out;
 
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
@@ -396,7 +396,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd
 	if (!S_ISDIR(inode->i_mode))
 		goto out_putf;
 
-	error = inode_permission(inode, MAY_EXEC | MAY_CHDIR);
+	error = dentry_permission(file->f_path.dentry, MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &file->f_path);
 out_putf:
@@ -414,7 +414,7 @@ SYSCALL_DEFINE1(chroot, const char __use
 	if (error)
 		goto out;
 
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
Index: linux-2.6/fs/utimes.c
===================================================================
--- linux-2.6.orig/fs/utimes.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/utimes.c	2010-08-19 09:46:31.000000000 +0200
@@ -96,7 +96,7 @@ static int utimes_common(struct path *pa
 			goto mnt_drop_write_and_out;
 
 		if (!is_owner_or_cap(inode)) {
-			error = inode_permission(inode, MAY_WRITE);
+			error = dentry_permission(path->dentry, MAY_WRITE);
 			if (error)
 				goto mnt_drop_write_and_out;
 		}
Index: linux-2.6/fs/xattr.c
===================================================================
--- linux-2.6.orig/fs/xattr.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/xattr.c	2010-08-19 09:46:31.000000000 +0200
@@ -26,8 +26,10 @@
  * because different namespaces have very different rules.
  */
 static int
-xattr_permission(struct inode *inode, const char *name, int mask)
+xattr_permission(struct dentry *dentry, const char *name, int mask)
 {
+	struct inode *inode = dentry->d_inode;
+
 	/*
 	 * We can never set or remove an extended attribute on a read-only
 	 * filesystem  or on an immutable / append-only inode.
@@ -63,7 +65,7 @@ xattr_permission(struct inode *inode, co
 			return -EPERM;
 	}
 
-	return inode_permission(inode, mask);
+	return dentry_permission(dentry, mask);
 }
 
 /**
@@ -115,7 +117,7 @@ vfs_setxattr(struct dentry *dentry, cons
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(dentry, name, MAY_WRITE);
 	if (error)
 		return error;
 
@@ -165,7 +167,7 @@ vfs_getxattr(struct dentry *dentry, cons
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_READ);
+	error = xattr_permission(dentry, name, MAY_READ);
 	if (error)
 		return error;
 
@@ -224,7 +226,7 @@ vfs_removexattr(struct dentry *dentry, c
 	if (!inode->i_op->removexattr)
 		return -EOPNOTSUPP;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(dentry, name, MAY_WRITE);
 	if (error)
 		return error;
 
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-08-19 09:46:27.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-08-19 09:46:31.000000000 +0200
@@ -1525,7 +1525,7 @@ struct inode_operations {
 	void * (*follow_link) (struct dentry *, struct nameidata *);
 	void (*put_link) (struct dentry *, struct nameidata *, void *);
 	void (*truncate) (struct inode *);
-	int (*permission) (struct inode *, int);
+	int (*permission) (struct dentry *, int);
 	int (*check_acl)(struct inode *, int);
 	int (*setattr) (struct dentry *, struct iattr *);
 	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
@@ -2111,7 +2111,7 @@ extern void emergency_remount(void);
 extern sector_t bmap(struct inode *, sector_t);
 #endif
 extern int notify_change(struct dentry *, struct iattr *);
-extern int inode_permission(struct inode *, int);
+extern int dentry_permission(struct dentry *, int);
 extern int generic_permission(struct inode *, int,
 		int (*check_acl)(struct inode *, int));
 
Index: linux-2.6/ipc/mqueue.c
===================================================================
--- linux-2.6.orig/ipc/mqueue.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/ipc/mqueue.c	2010-08-19 09:46:31.000000000 +0200
@@ -656,7 +656,7 @@ static struct file *do_open(struct ipc_n
 		goto err;
 	}
 
-	if (inode_permission(dentry->d_inode, oflag2acc[oflag & O_ACCMODE])) {
+	if (dentry_permission(dentry, oflag2acc[oflag & O_ACCMODE])) {
 		ret = -EACCES;
 		goto err;
 	}
Index: linux-2.6/net/unix/af_unix.c
===================================================================
--- linux-2.6.orig/net/unix/af_unix.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/net/unix/af_unix.c	2010-08-19 09:46:31.000000000 +0200
@@ -748,7 +748,7 @@ static struct sock *unix_find_other(stru
 		if (err)
 			goto fail;
 		inode = path.dentry->d_inode;
-		err = inode_permission(inode, MAY_WRITE);
+		err = dentry_permission(path.dentry, MAY_WRITE);
 		if (err)
 			goto put_fail;
 
Index: linux-2.6/fs/afs/internal.h
===================================================================
--- linux-2.6.orig/fs/afs/internal.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/afs/internal.h	2010-08-19 09:46:31.000000000 +0200
@@ -624,7 +624,7 @@ extern void afs_clear_permits(struct afs
 extern void afs_cache_permit(struct afs_vnode *, struct key *, long);
 extern void afs_zap_permits(struct rcu_head *);
 extern struct key *afs_request_key(struct afs_cell *);
-extern int afs_permission(struct inode *, int);
+extern int afs_permission(struct dentry *, int);
 
 /*
  * server.c
Index: linux-2.6/fs/afs/security.c
===================================================================
--- linux-2.6.orig/fs/afs/security.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/afs/security.c	2010-08-19 09:46:31.000000000 +0200
@@ -285,8 +285,9 @@ static int afs_check_permit(struct afs_v
  * - AFS ACLs are attached to directories only, and a file is controlled by its
  *   parent directory's ACL
  */
-int afs_permission(struct inode *inode, int mask)
+int afs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	afs_access_t uninitialized_var(access);
 	struct key *key;
Index: linux-2.6/fs/bad_inode.c
===================================================================
--- linux-2.6.orig/fs/bad_inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/bad_inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -229,7 +229,7 @@ static int bad_inode_readlink(struct den
 	return -EIO;
 }
 
-static int bad_inode_permission(struct inode *inode, int mask)
+static int bad_inode_permission(struct dentry *dentry, int mask)
 {
 	return -EIO;
 }
Index: linux-2.6/fs/btrfs/inode.c
===================================================================
--- linux-2.6.orig/fs/btrfs/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/btrfs/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -6922,8 +6922,10 @@ static int btrfs_set_page_dirty(struct p
 	return __set_page_dirty_nobuffers(page);
 }
 
-static int btrfs_permission(struct inode *inode, int mask)
+static int btrfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
+
 	if ((BTRFS_I(inode)->flags & BTRFS_INODE_READONLY) && (mask & MAY_WRITE))
 		return -EACCES;
 	return generic_permission(inode, mask, btrfs_check_acl);
Index: linux-2.6/fs/ceph/inode.c
===================================================================
--- linux-2.6.orig/fs/ceph/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ceph/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -1757,8 +1757,9 @@ int ceph_do_getattr(struct inode *inode,
  * Check inode permissions.  We verify we have a valid value for
  * the AUTH cap, then call the generic handler.
  */
-int ceph_permission(struct inode *inode, int mask)
+int ceph_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int err = ceph_do_getattr(inode, CEPH_CAP_AUTH_SHARED);
 
 	if (!err)
Index: linux-2.6/fs/ceph/super.h
===================================================================
--- linux-2.6.orig/fs/ceph/super.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ceph/super.h	2010-08-19 09:46:31.000000000 +0200
@@ -776,7 +776,7 @@ extern void ceph_queue_invalidate(struct
 extern void ceph_queue_writeback(struct inode *inode);
 
 extern int ceph_do_getattr(struct inode *inode, int mask);
-extern int ceph_permission(struct inode *inode, int mask);
+extern int ceph_permission(struct dentry *dentry, int mask);
 extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
 extern int ceph_getattr(struct vfsmount *mnt, struct dentry *dentry,
 			struct kstat *stat);
Index: linux-2.6/fs/cifs/cifsfs.c
===================================================================
--- linux-2.6.orig/fs/cifs/cifsfs.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/cifs/cifsfs.c	2010-08-19 09:46:31.000000000 +0200
@@ -269,8 +269,9 @@ cifs_statfs(struct dentry *dentry, struc
 	return 0;
 }
 
-static int cifs_permission(struct inode *inode, int mask)
+static int cifs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct cifs_sb_info *cifs_sb;
 
 	cifs_sb = CIFS_SB(inode->i_sb);
Index: linux-2.6/fs/coda/dir.c
===================================================================
--- linux-2.6.orig/fs/coda/dir.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/coda/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -138,8 +138,9 @@ exit:
 }
 
 
-int coda_permission(struct inode *inode, int mask)
+int coda_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
         int error = 0;
 
 	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
Index: linux-2.6/fs/fuse/dir.c
===================================================================
--- linux-2.6.orig/fs/fuse/dir.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/fuse/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -981,8 +981,9 @@ static int fuse_access(struct inode *ino
  * access request is sent.  Execute permission is still checked
  * locally based on file mode.
  */
-static int fuse_permission(struct inode *inode, int mask)
+static int fuse_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	bool refreshed = false;
 	int err = 0;
Index: linux-2.6/fs/gfs2/ops_inode.c
===================================================================
--- linux-2.6.orig/fs/gfs2/ops_inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/gfs2/ops_inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -1071,6 +1071,11 @@ int gfs2_permission(struct inode *inode,
 	return error;
 }
 
+static int gfs2_dentry_permission(struct dentry *dentry, int mask)
+{
+	return gfs2_permission(dentry->d_inode, mask);
+}
+
 /*
  * XXX(truncate): the truncate_setsize calls should be moved to the end.
  */
@@ -1344,7 +1349,7 @@ out:
 }
 
 const struct inode_operations gfs2_file_iops = {
-	.permission = gfs2_permission,
+	.permission = gfs2_dentry_permission,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
@@ -1364,7 +1369,7 @@ const struct inode_operations gfs2_dir_i
 	.rmdir = gfs2_rmdir,
 	.mknod = gfs2_mknod,
 	.rename = gfs2_rename,
-	.permission = gfs2_permission,
+	.permission = gfs2_dentry_permission,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
@@ -1378,7 +1383,7 @@ const struct inode_operations gfs2_symli
 	.readlink = generic_readlink,
 	.follow_link = gfs2_follow_link,
 	.put_link = gfs2_put_link,
-	.permission = gfs2_permission,
+	.permission = gfs2_dentry_permission,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
Index: linux-2.6/fs/coda/pioctl.c
===================================================================
--- linux-2.6.orig/fs/coda/pioctl.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/coda/pioctl.c	2010-08-19 09:46:31.000000000 +0200
@@ -26,7 +26,7 @@
 #include <linux/smp_lock.h>
 
 /* pioctl ops */
-static int coda_ioctl_permission(struct inode *inode, int mask);
+static int coda_ioctl_permission(struct dentry *dentry, int mask);
 static long coda_pioctl(struct file *filp, unsigned int cmd,
 			unsigned long user_data);
 
@@ -42,7 +42,7 @@ const struct file_operations coda_ioctl_
 };
 
 /* the coda pioctl inode ops */
-static int coda_ioctl_permission(struct inode *inode, int mask)
+static int coda_ioctl_permission(struct dentry *dentry, int mask)
 {
 	return (mask & MAY_EXEC) ? -EACCES : 0;
 }
Index: linux-2.6/fs/hostfs/hostfs_kern.c
===================================================================
--- linux-2.6.orig/fs/hostfs/hostfs_kern.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/hostfs/hostfs_kern.c	2010-08-19 09:46:31.000000000 +0200
@@ -746,8 +746,9 @@ int hostfs_rename(struct inode *from_ino
 	return err;
 }
 
-int hostfs_permission(struct inode *ino, int desired)
+static int hostfs_permission(struct dentry *dentry, int desired)
 {
+	struct inode *ino = dentry->d_inode;
 	char *name;
 	int r = 0, w = 0, x = 0, err;
 
Index: linux-2.6/fs/logfs/dir.c
===================================================================
--- linux-2.6.orig/fs/logfs/dir.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/logfs/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -555,11 +555,6 @@ static int logfs_symlink(struct inode *d
 	return __logfs_create(dir, dentry, inode, target, destlen);
 }
 
-static int logfs_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, NULL);
-}
-
 static int logfs_link(struct dentry *old_dentry, struct inode *dir,
 		struct dentry *dentry)
 {
@@ -818,7 +813,6 @@ const struct inode_operations logfs_dir_
 	.mknod		= logfs_mknod,
 	.rename		= logfs_rename,
 	.rmdir		= logfs_rmdir,
-	.permission	= logfs_permission,
 	.symlink	= logfs_symlink,
 	.unlink		= logfs_unlink,
 };
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c	2010-08-19 09:45:50.000000000 +0200
+++ linux-2.6/fs/nfs/dir.c	2010-08-19 09:46:31.000000000 +0200
@@ -1941,8 +1941,9 @@ int nfs_may_open(struct inode *inode, st
 	return nfs_do_access(inode, cred, nfs_open_permission_mask(openflags));
 }
 
-int nfs_permission(struct inode *inode, int mask)
+int nfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct rpc_cred *cred;
 	int res = 0;
 
Index: linux-2.6/fs/nilfs2/nilfs.h
===================================================================
--- linux-2.6.orig/fs/nilfs2/nilfs.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/nilfs2/nilfs.h	2010-08-19 09:46:31.000000000 +0200
@@ -200,7 +200,7 @@ static inline struct inode *nilfs_dat_in
  */
 #ifdef CONFIG_NILFS_POSIX_ACL
 #error "NILFS: not yet supported POSIX ACL"
-extern int nilfs_permission(struct inode *, int, struct nameidata *);
+extern int nilfs_permission(struct dentry *, int);
 extern int nilfs_acl_chmod(struct inode *);
 extern int nilfs_init_acl(struct inode *, struct inode *);
 #else
Index: linux-2.6/fs/ocfs2/file.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ocfs2/file.c	2010-08-19 09:46:31.000000000 +0200
@@ -1310,8 +1310,9 @@ bail:
 	return err;
 }
 
-int ocfs2_permission(struct inode *inode, int mask)
+int ocfs2_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int ret;
 
 	mlog_entry_void();
Index: linux-2.6/fs/ocfs2/file.h
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/ocfs2/file.h	2010-08-19 09:46:31.000000000 +0200
@@ -61,7 +61,7 @@ int ocfs2_zero_extend(struct inode *inod
 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
 int ocfs2_getattr(struct vfsmount *mnt, struct dentry *dentry,
 		  struct kstat *stat);
-int ocfs2_permission(struct inode *inode, int mask);
+int ocfs2_permission(struct dentry *dentry, int mask);
 
 int ocfs2_should_update_atime(struct inode *inode,
 			      struct vfsmount *vfsmnt);
Index: linux-2.6/fs/proc/base.c
===================================================================
--- linux-2.6.orig/fs/proc/base.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/proc/base.c	2010-08-19 09:46:31.000000000 +0200
@@ -2050,8 +2050,9 @@ static const struct file_operations proc
  * /proc/pid/fd needs a special permission handler so that a process can still
  * access /proc/self/fd after it has executed a setuid().
  */
-static int proc_fd_permission(struct inode *inode, int mask)
+static int proc_fd_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	int rv;
 
 	rv = generic_permission(inode, mask, NULL);
Index: linux-2.6/fs/proc/proc_sysctl.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_sysctl.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/proc/proc_sysctl.c	2010-08-19 09:46:31.000000000 +0200
@@ -292,12 +292,13 @@ out:
 	return ret;
 }
 
-static int proc_sys_permission(struct inode *inode, int mask)
+static int proc_sys_permission(struct dentry *dentry, int mask)
 {
 	/*
 	 * sysctl entries that are not writeable,
 	 * are _NOT_ writeable, capabilities or not.
 	 */
+	struct inode *inode = dentry->d_inode;
 	struct ctl_table_header *head;
 	struct ctl_table *table;
 	int error;
Index: linux-2.6/fs/reiserfs/xattr.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/xattr.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/reiserfs/xattr.c	2010-08-19 09:46:31.000000000 +0200
@@ -954,8 +954,10 @@ static int xattr_mount_check(struct supe
 	return 0;
 }
 
-int reiserfs_permission(struct inode *inode, int mask)
+int reiserfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
+
 	/*
 	 * We don't do permission checks on the internal objects.
 	 * Permissions are determined by the "owning" object.
Index: linux-2.6/fs/smbfs/file.c
===================================================================
--- linux-2.6.orig/fs/smbfs/file.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/smbfs/file.c	2010-08-19 09:46:31.000000000 +0200
@@ -408,9 +408,9 @@ smb_file_release(struct inode *inode, st
  * privileges, so we need our own check for this.
  */
 static int
-smb_file_permission(struct inode *inode, int mask)
+smb_file_permission(struct dentry *dentry, int mask)
 {
-	int mode = inode->i_mode;
+	int mode = dentry->d_inode->i_mode;
 	int error = 0;
 
 	VERBOSE("mode=%x, mask=%x\n", mode, mask);
Index: linux-2.6/fs/sysfs/inode.c
===================================================================
--- linux-2.6.orig/fs/sysfs/inode.c	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/sysfs/inode.c	2010-08-19 09:46:31.000000000 +0200
@@ -348,8 +348,9 @@ int sysfs_hash_and_remove(struct sysfs_d
 		return -ENOENT;
 }
 
-int sysfs_permission(struct inode *inode, int mask)
+int sysfs_permission(struct dentry *dentry, int mask)
 {
+	struct inode *inode = dentry->d_inode;
 	struct sysfs_dirent *sd = inode->i_private;
 
 	mutex_lock(&sysfs_mutex);
Index: linux-2.6/fs/sysfs/sysfs.h
===================================================================
--- linux-2.6.orig/fs/sysfs/sysfs.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/fs/sysfs/sysfs.h	2010-08-19 09:46:31.000000000 +0200
@@ -200,7 +200,7 @@ static inline void __sysfs_put(struct sy
 struct inode *sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd);
 void sysfs_evict_inode(struct inode *inode);
 int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
-int sysfs_permission(struct inode *inode, int mask);
+int sysfs_permission(struct dentry *dentry, int mask);
 int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
 int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
 int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
Index: linux-2.6/include/linux/coda_linux.h
===================================================================
--- linux-2.6.orig/include/linux/coda_linux.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/include/linux/coda_linux.h	2010-08-19 09:46:31.000000000 +0200
@@ -37,7 +37,7 @@ extern const struct file_operations coda
 /* operations shared over more than one file */
 int coda_open(struct inode *i, struct file *f);
 int coda_release(struct inode *i, struct file *f);
-int coda_permission(struct inode *inode, int mask);
+int coda_permission(struct dentry *dentry, int mask);
 int coda_revalidate_inode(struct dentry *);
 int coda_getattr(struct vfsmount *, struct dentry *, struct kstat *);
 int coda_setattr(struct dentry *, struct iattr *);
Index: linux-2.6/include/linux/nfs_fs.h
===================================================================
--- linux-2.6.orig/include/linux/nfs_fs.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/include/linux/nfs_fs.h	2010-08-19 09:46:31.000000000 +0200
@@ -348,7 +348,7 @@ extern int nfs_refresh_inode(struct inod
 extern int nfs_post_op_update_inode(struct inode *inode, struct nfs_fattr *fattr);
 extern int nfs_post_op_update_inode_force_wcc(struct inode *inode, struct nfs_fattr *fattr);
 extern int nfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
-extern int nfs_permission(struct inode *, int);
+extern int nfs_permission(struct dentry *, int);
 extern int nfs_open(struct inode *, struct file *);
 extern int nfs_release(struct inode *, struct file *);
 extern int nfs_attribute_timeout(struct inode *inode);
Index: linux-2.6/include/linux/reiserfs_xattr.h
===================================================================
--- linux-2.6.orig/include/linux/reiserfs_xattr.h	2010-08-19 09:45:30.000000000 +0200
+++ linux-2.6/include/linux/reiserfs_xattr.h	2010-08-19 09:46:31.000000000 +0200
@@ -41,7 +41,7 @@ int reiserfs_xattr_init(struct super_blo
 int reiserfs_lookup_privroot(struct super_block *sb);
 int reiserfs_delete_xattrs(struct inode *inode);
 int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs);
-int reiserfs_permission(struct inode *inode, int mask);
+int reiserfs_permission(struct dentry *dentry, int mask);
 
 #ifdef CONFIG_REISERFS_FS_XATTR
 #define has_xattr_dir(inode) (REISERFS_I(inode)->i_flags & i_has_xattr_dir)

-- 

* [PATCH 3/6] vfs: add flag to allow rename to same inode
  2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 1/6] vfs: implement open "forwarding" Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 2/6] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
@ 2010-09-03 13:41 ` Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 4/6] vfs: export do_splice_direct() to modules Miklos Szeredi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Miklos Szeredi @ 2010-09-03 13:41 UTC
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

[-- Attachment #1: vfs-fs_rename_self_allow.patch --]
[-- Type: text/plain, Size: 1522 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

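The overlay filesystem uses dummy inodes for non-directories: all files
of one type share a single inode, so two different files can have the
same inode.  Allow rename to proceed in this case instead of treating
it as a rename to self (a no-op).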

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namei.c         |    4 +++-
 include/linux/fs.h |    1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-08-25 14:19:34.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-08-25 14:19:53.000000000 +0200
@@ -179,6 +179,7 @@ struct inodes_stat_t {
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
 					 */
+#define FS_RENAME_SELF_ALLOW	65536	/* Allow rename to same inode */
 
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2010-08-25 10:19:53.000000000 +0200
+++ linux-2.6/fs/namei.c	2010-08-25 14:22:56.000000000 +0200
@@ -2620,8 +2620,10 @@ int vfs_rename(struct inode *old_dir, st
 	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
 	const unsigned char *old_name;
 
-	if (old_dentry->d_inode == new_dentry->d_inode)
+	if (old_dentry->d_inode == new_dentry->d_inode &&
+	    !(old_dir->i_sb->s_type->fs_flags & FS_RENAME_SELF_ALLOW)) {
  		return 0;
+	}
  
 	error = may_delete(old_dir, old_dentry, is_dir);
 	if (error)

-- 

* [PATCH 4/6] vfs: export do_splice_direct() to modules
  2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
                   ` (2 preceding siblings ...)
  2010-09-03 13:41 ` [PATCH 3/6] vfs: add flag to allow rename to same inode Miklos Szeredi
@ 2010-09-03 13:41 ` Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 5/6] overlay: hybrid overlay filesystem prototype Miklos Szeredi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Miklos Szeredi @ 2010-09-03 13:41 UTC
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

[-- Attachment #1: vfs-export-do_splice_direct.patch --]
[-- Type: text/plain, Size: 673 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Export do_splice_direct() to modules.  The overlay filesystem needs it
to copy file data up to the upper layer.
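do_splice_direct() copies between two open files entirely inside the kernel, which is what ovl_copy_up_data() in patch 5/6 uses it for. As a userspace analogue only (sendfile(2) is a different interface, swapped in here for illustration), the same copy-up loop might look like this. It assumes Linux 2.6.33 or later, where sendfile() accepts a regular file as the destination.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the first len bytes of in_fd to out_fd's current position
 * without staging the data in a userspace buffer.  Unlike the single
 * do_splice_direct() call in ovl_copy_up_data(), this loops because
 * sendfile() may transfer fewer bytes than requested.
 * Returns 0 on success or a negative errno value.
 */
static int copy_data(int in_fd, int out_fd, off_t len)
{
	off_t offset = 0;	/* advanced by sendfile(); in_fd's own offset is untouched */

	assert(len >= 0);
	while (offset < len) {
		ssize_t n = sendfile(out_fd, in_fd, &offset,
				     (size_t)(len - offset));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (n == 0)
			return -EIO;	/* source shorter than len (arbitrary choice) */
	}
	return 0;
}
```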

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/splice.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2010-08-13 16:07:00.000000000 +0200
+++ linux-2.6/fs/splice.c	2010-08-25 18:59:08.000000000 +0200
@@ -1307,6 +1307,7 @@ long do_splice_direct(struct file *in, l
 
 	return ret;
 }
+EXPORT_SYMBOL(do_splice_direct);
 
 static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,

-- 

* [PATCH 5/6] overlay: hybrid overlay filesystem prototype
  2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
                   ` (3 preceding siblings ...)
  2010-09-03 13:41 ` [PATCH 4/6] vfs: export do_splice_direct() to modules Miklos Szeredi
@ 2010-09-03 13:41 ` Miklos Szeredi
  2010-09-03 13:41 ` [PATCH 6/6] overlay: overlay filesystem documentation Miklos Szeredi
  2010-09-05 10:37 ` [PATCH 0/6] overlay filesystem prototype J. R. Okajima
  6 siblings, 0 replies; 11+ messages in thread
From: Miklos Szeredi @ 2010-09-03 13:41 UTC
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

[-- Attachment #1: overlayfs.patch --]
[-- Type: text/plain, Size: 46528 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

This overlay filesystem is a hybrid of entirely filesystem-based
(unionfs, aufs) and entirely VFS-based (union mounts) solutions.

The dentry tree is duplicated from the underlying filesystems; this
enables fast cached lookups without adding special support to the
VFS.  This uses slightly more memory than union mounts, but dentries
are relatively small.
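Each overlay dentry records the real upper and/or lower dentry it stands for (struct ovl_entry in the patch below). The choice ovl_lookup() makes about which of the two to keep can be sketched as a pure function. The enum and function names are hypothetical, and the real code detects whiteouts and opaque directories through trusted.overlay.* xattrs rather than a precomputed kind.

```c
#include <assert.h>
#include <stdbool.h>

/* What a name resolves to in one layer (a simplification). */
enum ovl_kind {
	OVL_ABSENT,	/* no such name */
	OVL_WHITEOUT,	/* upper only: symlink with trusted.overlay.whiteout="y" */
	OVL_OPAQUE_DIR,	/* upper only: dir with trusted.overlay.opaque="y" */
	OVL_DIR,
	OVL_NONDIR,
};

struct ovl_layers { bool upper; bool lower; };

static bool is_dir(enum ovl_kind k)
{
	return k == OVL_DIR || k == OVL_OPAQUE_DIR;
}

/* Which real dentries the overlay dentry keeps references to. */
static struct ovl_layers ovl_resolve(enum ovl_kind upper, enum ovl_kind lower)
{
	struct ovl_layers r = { false, false };

	/* Whiteouts and opaque markers are only checked in the upper layer. */
	assert(lower != OVL_WHITEOUT && lower != OVL_OPAQUE_DIR);

	if (upper == OVL_WHITEOUT)
		return r;	/* hides any lower entry: negative dentry */
	r.upper = upper != OVL_ABSENT;
	if (upper == OVL_OPAQUE_DIR || lower == OVL_ABSENT)
		return r;	/* lower not consulted, or nothing there */
	if (r.upper && !(is_dir(upper) && is_dir(lower)))
		return r;	/* a non-directory is involved: upper wins */
	r.lower = true;		/* lower only, or both dirs to be merged */
	return r;
}
```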

Inode structures are only duplicated for directories.  Regular files,
symlinks and special files share one inode per file type.  This means
that locking the victim inode for unlink acts as a quasi
filesystem-wide lock, which is suboptimal but could be worked around
in the VFS.
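The inode sharing described above corresponds to the switch in ovl_new_inode(). A toy userspace version, with hypothetical types and a plain int standing in for i_count, shows that every regular file gets the same inode while each directory gets a fresh one. The shared inodes are assumed to be set up at mount time with a count of at least 1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical stand-ins; "count" plays the role of i_count. */
struct toy_inode { mode_t mode; int count; };

struct toy_ovl_fs {
	struct toy_inode symlink_inode;	/* shared by all symlinks */
	struct toy_inode regular_inode;	/* shared by all regular files */
	struct toy_inode special_inode;	/* shared by devices, FIFOs, sockets */
};

/* Mirrors the switch in ovl_new_inode(): only directories get their own inode. */
static struct toy_inode *toy_new_inode(struct toy_ovl_fs *ufs, mode_t mode)
{
	struct toy_inode *inode;

	switch (mode & S_IFMT) {
	case S_IFDIR:
		inode = calloc(1, sizeof(*inode));
		if (!inode)	/* the patch uses new_inode()'s result unchecked */
			return NULL;
		inode->mode = S_IFDIR;
		inode->count = 1;
		return inode;
	case S_IFLNK:
		inode = &ufs->symlink_inode;
		break;
	case S_IFREG:
		inode = &ufs->regular_inode;
		break;
	case S_IFSOCK:
	case S_IFBLK:
	case S_IFCHR:
	case S_IFIFO:
		inode = &ufs->special_inode;
		break;
	default:
		return NULL;	/* illegal file type */
	}
	assert(inode->count > 0);	/* assumed initialised at mount time */
	inode->count++;			/* like atomic_inc(&inode->i_count) */
	return inode;
}
```

Because the inode alone cannot tell two non-directories apart, the rename change in patch 3/6 is needed, and presumably also the dentry-based ->permission() in patch 2/6.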

Opening a non-directory forwards the open to the underlying
filesystem.  This makes the behavior very similar to union mounts
(with the same limitations regarding fchmod/fchown on O_RDONLY file
descriptors).

Usage:

  mount -t overlay -olowerdir=/lower,upperdir=/upper overlay /mnt
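The same mount can be done programmatically with mount(2). This sketch assumes a kernel carrying this patch set and CAP_SYS_ADMIN. The helper names are hypothetical and the option string is not escaped, so paths containing commas are assumed not to occur.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Format the option string used in the mount(8) example above. */
static int ovl_format_opts(char *buf, size_t size,
			   const char *lowerdir, const char *upperdir)
{
	int n = snprintf(buf, size, "lowerdir=%s,upperdir=%s",
			 lowerdir, upperdir);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;	/* -1: error or truncated */
}

/*
 * mount(2) equivalent of
 *   mount -t overlay -olowerdir=LOWER,upperdir=UPPER overlay TARGET
 */
static int ovl_mount(const char *lowerdir, const char *upperdir,
		     const char *target)
{
	char opts[4096];

	if (ovl_format_opts(opts, sizeof(opts), lowerdir, upperdir) < 0)
		return -1;
	return mount("overlay", target, "overlay", 0, opts);
}
```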

Supported:

 - all operations

Missing:

 - ensure that filesystems that are part of the overlay are not
   modified outside the overlay
 - optimize directory merging and caching

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/Kconfig               |    1 
 fs/Makefile              |    1 
 fs/overlayfs/Kconfig     |    4 
 fs/overlayfs/Makefile    |    5 
 fs/overlayfs/overlayfs.c | 1890 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1901 insertions(+)
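For merged directories, ovl_readdir() below builds a cache in two passes: ovl_fill_upper() records every upper entry and marks whiteouts, then ovl_fill_lower() appends lower entries whose names are not already cached. Whiteouts are skipped when the cache is replayed to the caller. A simplified model over plain arrays (hypothetical types and names) shows the resulting order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one cached upper directory entry. */
struct toy_dirent {
	const char *name;
	bool is_whiteout;
};

/*
 * Upper entries come first in their original order, then lower entries
 * whose names were not seen in the upper layer.  Whiteouts still hide
 * same-named lower entries but are never emitted.  Lower names are
 * assumed unique, so unlike ovl_fill_lower() earlier lower names are
 * not re-searched.  Writes at most cap names to out; returns the count.
 */
static size_t toy_merge_dir(const struct toy_dirent *upper, size_t nupper,
			    const char *const *lower, size_t nlower,
			    const char **out, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < nupper && n < cap; i++)
		if (!upper[i].is_whiteout)
			out[n++] = upper[i].name;

	for (size_t j = 0; j < nlower && n < cap; j++) {
		bool seen = false;

		/* Linear search, as in ovl_cache_find_entry(): O(upper * lower). */
		for (size_t i = 0; i < nupper && !seen; i++)
			seen = strcmp(upper[i].name, lower[j]) == 0;
		if (!seen)
			out[n++] = lower[j];
	}
	assert(n <= cap);
	return n;
}
```

The linear name search makes the merge quadratic in directory size, which is presumably one reason the description lists directory merging as still to be optimized.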

Index: linux-2.6/fs/overlayfs/overlayfs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/overlayfs/overlayfs.c	2010-09-03 14:46:07.000000000 +0200
@@ -0,0 +1,1890 @@
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/sched.h>
+#include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/splice.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+
+MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
+MODULE_DESCRIPTION("Overlay filesystem");
+MODULE_LICENSE("GPL");
+
+struct ovl_fs {
+	struct inode *symlink_inode;
+	struct inode *regular_inode;
+	struct inode *special_inode;
+	struct vfsmount *upper_mnt;
+};
+
+struct ovl_entry {
+	struct path upperpath;
+	struct path lowerpath;
+	bool opaque;
+};
+
+static const char *ovl_whiteout_xattr = "trusted.overlay.whiteout";
+static const char *ovl_opaque_xattr = "trusted.overlay.opaque";
+static const char *ovl_whiteout_symlink = "(overlay-whiteout)";
+
+static struct path *ovl_path(struct ovl_entry *ue)
+{
+	return ue->upperpath.dentry ? &ue->upperpath : &ue->lowerpath;
+}
+
+static struct file *path_open(struct path *path, int flags)
+{
+	const struct cred *cred = current_cred();
+
+	path_get(path);
+	return dentry_open(path->dentry, path->mnt, flags, cred);
+}
+
+static bool ovl_is_whiteout(struct dentry *dentry)
+{
+	int res;
+	char val;
+
+	if (!dentry)
+		return false;
+	if (!dentry->d_inode)
+		return false;
+	if (!S_ISLNK(dentry->d_inode->i_mode))
+		return false;
+
+	res = vfs_getxattr(dentry, ovl_whiteout_xattr, &val, 1);
+	if (res == 1 && val == 'y')
+		return true;
+
+	return false;
+}
+
+static bool ovl_is_opaquedir(struct dentry *dentry)
+{
+	int res;
+	char val;
+
+	if (!S_ISDIR(dentry->d_inode->i_mode))
+		return false;
+
+	res = vfs_getxattr(dentry, ovl_opaque_xattr, &val, 1);
+	if (res == 1 && val == 'y')
+		return true;
+
+	return false;
+}
+
+struct ovl_cache_entry {
+	struct ovl_cache_entry *next;
+	struct qstr name;
+	unsigned int type;
+	u64 ino;
+	bool is_whiteout;
+};
+
+struct ovl_cache_callback {
+	struct ovl_cache_entry *list;
+	struct ovl_cache_entry **endp;
+	struct path path;
+	int count;
+	int err;
+};
+
+struct ovl_dir_file {
+	bool is_real;
+	struct ovl_cache_entry *cache;
+	struct file *realfile;
+};
+
+static int ovl_cache_add_entry(struct ovl_cache_callback *cb,
+				 const char *name, int namelen, u64 ino,
+				 unsigned int d_type, bool is_whiteout)
+{
+	struct ovl_cache_entry *p;
+
+	p = kmalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	p->name.name = kstrndup(name, namelen, GFP_KERNEL);
+	if (!p->name.name) {
+		kfree(p);
+		return -ENOMEM;
+	}
+	p->name.len = namelen;
+	p->name.hash = 0;
+	p->type = d_type;
+	p->ino = ino;
+	p->is_whiteout = is_whiteout;
+	p->next = NULL;
+	*cb->endp = p;
+	cb->endp = &p->next;
+
+	return 0;
+}
+
+static void ovl_cache_free(struct ovl_cache_entry *p)
+{
+	while (p) {
+		struct ovl_cache_entry *next = p->next;
+
+		kfree(p->name.name);
+		kfree(p);
+		p = next;
+	}
+}
+
+static int ovl_cache_find_entry(struct ovl_cache_entry *start,
+				  const char *name, int namelen)
+{
+	struct ovl_cache_entry *p;
+	int ret = 0;
+
+	for (p = start; p; p = p->next) {
+		if (p->name.len != namelen)
+			continue;
+		if (strncmp(p->name.name, name, namelen) == 0) {
+			ret = 1;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int ovl_fill_lower(void *buf, const char *name, int namlen,
+			    loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct ovl_cache_callback *cb = buf;
+
+	cb->count++;
+	if (!ovl_cache_find_entry(cb->list, name, namlen))
+		cb->err = ovl_cache_add_entry(cb, name, namlen, ino, d_type, false);
+
+	return cb->err;
+}
+
+static int ovl_fill_upper(void *buf, const char *name, int namlen,
+			  loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct ovl_cache_callback *cb = buf;
+	bool is_whiteout = false;
+
+	cb->count++;
+	if (d_type == DT_LNK) {
+		struct dentry *dentry;
+
+		dentry = lookup_one_len(name, cb->path.dentry, namlen);
+		if (IS_ERR(dentry)) {
+			cb->err = PTR_ERR(dentry);
+			goto out;
+		}
+		is_whiteout = ovl_is_whiteout(dentry);
+		dput(dentry);
+	}
+
+	cb->err = ovl_cache_add_entry(cb, name, namlen, ino, d_type, is_whiteout);
+
+out:
+	return cb->err;
+}
+
+static int ovl_fill_cache(struct path *realpath, struct ovl_cache_callback *cb,
+			  filldir_t filler)
+{
+	const struct cred *old_cred;
+	struct cred *override_cred;
+	struct file *realfile;
+	int err;
+
+	realfile = path_open(realpath, O_RDONLY | O_DIRECTORY);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+
+	err = -ENOMEM;
+	override_cred = prepare_creds();
+	if (override_cred) {
+		/*
+		 * CAP_SYS_ADMIN for getxattr
+		 * CAP_DAC_OVERRIDE for lookup and unlink
+		 */
+		cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+		cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+		old_cred = override_creds(override_cred);
+
+		do {
+			cb->count = 0;
+			cb->err = 0;
+			err = vfs_readdir(realfile, filler, cb);
+			if (err >= 0)
+				err = cb->err;
+		} while (!err && cb->count);
+
+		revert_creds(old_cred);
+		put_cred(override_cred);
+	}
+	fput(realfile);
+
+	if (err) {
+		ovl_cache_free(cb->list);
+		cb->list = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+static int ovl_readdir(struct file *file, void *buf, filldir_t filler)
+{
+	struct ovl_dir_file *od = file->private_data;
+	struct ovl_entry *ue = file->f_path.dentry->d_fsdata;
+	struct ovl_cache_entry *p;
+	loff_t off;
+	int res = 0;
+
+	if (!file->f_pos) {
+		ovl_cache_free(od->cache);
+		od->cache = NULL;
+		od->is_real = false;
+	}
+
+	if (od->is_real || !ue->lowerpath.dentry || !ue->upperpath.dentry) {
+		od->is_real = true;
+		res = vfs_readdir(od->realfile, filler, buf);
+		file->f_pos = od->realfile->f_pos;
+
+		return res;
+	}
+
+	if (!od->cache) {
+		struct ovl_cache_callback cb = {
+			.list = NULL,
+			.endp = &cb.list,
+			.path = ue->upperpath,
+		};
+
+		res = ovl_fill_cache(&ue->upperpath, &cb, ovl_fill_upper);
+		if (!res) {
+			res = ovl_fill_cache(&ue->lowerpath, &cb,
+					     ovl_fill_lower);
+		}
+		if (res)
+			return res;
+
+		od->cache = cb.list;
+	}
+
+	off = 0;
+	for (p = od->cache; p; p = p->next) {
+		int over;
+
+		if (p->is_whiteout)
+			continue;
+
+		off++;
+		if (off <= file->f_pos)
+			continue;
+
+		over = filler(buf, p->name.name, p->name.len, off - 1,
+			      p->ino, p->type);
+		if (over)
+			break;
+
+		file->f_pos = off;
+	}
+
+	return res;
+}
+
+static loff_t ovl_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+	loff_t res;
+	struct ovl_dir_file *od = file->private_data;
+
+	res = generic_file_llseek(od->realfile, offset, origin);
+	file->f_pos = od->realfile->f_pos;
+
+	return res;
+}
+
+static int ovl_dir_fsync(struct file *file, int datasync)
+{
+	struct ovl_dir_file *od = file->private_data;
+
+	return vfs_fsync(od->realfile, datasync);
+}
+
+static int ovl_dir_release(struct inode *inode, struct file *file)
+{
+	struct ovl_dir_file *od = file->private_data;
+
+	ovl_cache_free(od->cache);
+	fput(od->realfile);
+	kfree(od);
+
+	return 0;
+}
+
+static int ovl_dir_open(struct inode *inode, struct file *file)
+{
+	int err;
+	struct ovl_entry *ue = file->f_path.dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+	struct ovl_dir_file *od;
+
+	od = kzalloc(sizeof(struct ovl_dir_file), GFP_KERNEL);
+	if (!od)
+		return -ENOMEM;
+
+	od->realfile = path_open(realpath, file->f_flags);
+	if (IS_ERR(od->realfile)) {
+		err = PTR_ERR(od->realfile);
+		kfree(od);
+		return err;
+	}
+
+	file->private_data = od;
+
+	return 0;
+}
+
+static const struct file_operations ovl_dir_operations = {
+	.read		= generic_read_dir,
+	.open		= ovl_dir_open,
+	.readdir	= ovl_readdir,
+	.llseek		= ovl_dir_llseek,
+	.fsync		= ovl_dir_fsync,
+	.release	= ovl_dir_release,
+};
+
+static const struct inode_operations ovl_dir_inode_operations;
+
+static void ovl_dentry_release(struct dentry *dentry)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+
+	if (ue) {
+		path_put(&ue->upperpath);
+		path_put(&ue->lowerpath);
+		kfree(ue);
+	}
+}
+
+static void ovl_dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+
+	path_put(&ue->upperpath);
+	path_put(&ue->lowerpath);
+	ue->upperpath.dentry = NULL;
+	ue->upperpath.mnt = NULL;
+	ue->lowerpath.dentry = NULL;
+	ue->lowerpath.mnt = NULL;
+	iput(inode);
+}
+
+static const struct dentry_operations ovl_dentry_operations = {
+	.d_release = ovl_dentry_release,
+	.d_iput = ovl_dentry_iput,
+};
+
+static struct inode *ovl_new_inode(struct super_block *sb, umode_t mode)
+{
+	struct ovl_fs *ufs = sb->s_fs_info;
+	struct inode *inode;
+
+	switch (mode & S_IFMT) {
+	case S_IFDIR:
+		inode = new_inode(sb);
+		inode->i_flags |= S_NOATIME|S_NOCMTIME;
+		inode->i_op = &ovl_dir_inode_operations;
+		inode->i_fop = &ovl_dir_operations;
+		inode->i_mode = S_IFDIR;
+		break;
+
+	case S_IFLNK:
+		inode = ufs->symlink_inode;
+		atomic_inc(&inode->i_count);
+		break;
+
+	case S_IFREG:
+		inode = ufs->regular_inode;
+		atomic_inc(&inode->i_count);
+		break;
+
+	case S_IFSOCK:
+	case S_IFBLK:
+	case S_IFCHR:
+	case S_IFIFO:
+		inode = ufs->special_inode;
+		atomic_inc(&inode->i_count);
+		break;
+
+	default:
+		WARN(1, "illegal file type: %i\n", mode & S_IFMT);
+		inode = NULL;
+	}
+
+	return inode;
+
+}
+
+static struct dentry *ovl_lookup_real(struct dentry *dir, struct qstr *name)
+{
+	struct dentry *dentry;
+
+	mutex_lock(&dir->d_inode->i_mutex);
+	dentry = lookup_one_len(name->name, dir, name->len);
+	mutex_unlock(&dir->d_inode->i_mutex);
+
+	if (IS_ERR(dentry)) {
+		if (PTR_ERR(dentry) == -ENOENT)
+			dentry = NULL;
+	} else if (!dentry->d_inode) {
+		dput(dentry);
+		dentry = NULL;
+	}
+	return dentry;
+}
+
+static struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
+				   struct nameidata *nd)
+{
+	struct ovl_entry *pue = dentry->d_parent->d_fsdata;
+	struct ovl_entry *ue;
+	struct dentry *upperdir = pue->upperpath.dentry;
+	struct dentry *upperdentry = NULL;
+	struct dentry *lowerdir = pue->lowerpath.dentry;
+	struct dentry *lowerdentry = NULL;
+	struct inode *inode = NULL;
+	int err;
+
+	err = -ENOMEM;
+	ue = kzalloc(sizeof(struct ovl_entry), GFP_KERNEL);
+	if (!ue)
+		goto out;
+
+	if (upperdir) {
+		upperdentry = ovl_lookup_real(upperdir, &dentry->d_name);
+		err = PTR_ERR(upperdentry);
+		if (IS_ERR(upperdentry))
+			goto out_free;
+
+		if (upperdentry) {
+			const struct cred *old_cred;
+			struct cred *override_cred;
+
+			err = -ENOMEM;
+			override_cred = prepare_creds();
+			if (!override_cred)
+				goto out_dput;
+
+			/* CAP_SYS_ADMIN needed for getxattr */
+			cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+			old_cred = override_creds(override_cred);
+
+			if (ovl_is_opaquedir(upperdentry)) {
+				ue->opaque = true;
+			} else if (ovl_is_whiteout(upperdentry)) {
+				dput(upperdentry);
+				upperdentry = NULL;
+				ue->opaque = true;
+			}
+			revert_creds(old_cred);
+			put_cred(override_cred);
+		}
+	}
+	if (lowerdir && !ue->opaque) {
+		lowerdentry = ovl_lookup_real(lowerdir, &dentry->d_name);
+		if (IS_ERR(lowerdentry)) {
+			err = PTR_ERR(lowerdentry);
+			dput(upperdentry);
+			goto out_free;
+		}
+	}
+
+	if (lowerdentry && upperdentry &&
+	    (!S_ISDIR(upperdentry->d_inode->i_mode) ||
+	     !S_ISDIR(lowerdentry->d_inode->i_mode))) {
+		dput(lowerdentry);
+		lowerdentry = NULL;
+		ue->opaque = true;
+	}
+
+	if (lowerdentry || upperdentry) {
+		struct dentry *realdentry;
+
+		realdentry = upperdentry ? upperdentry : lowerdentry;
+		inode = ovl_new_inode(dir->i_sb, realdentry->d_inode->i_mode);
+		if (!inode)
+			goto out_dput;
+	}
+
+	if (upperdentry) {
+		ue->upperpath.mnt = pue->upperpath.mnt;
+		ue->upperpath.dentry = upperdentry;
+		path_get(&ue->upperpath);
+		dput(upperdentry);
+	}
+	if (lowerdentry) {
+		ue->lowerpath.mnt = pue->lowerpath.mnt;
+		ue->lowerpath.dentry = lowerdentry;
+		path_get(&ue->lowerpath);
+		dput(lowerdentry);
+	}
+
+	d_add(dentry, inode);
+	dentry->d_fsdata = ue;
+	dentry->d_op = &ovl_dentry_operations;
+
+	return NULL;
+
+out_dput:
+	dput(upperdentry);
+	dput(lowerdentry);
+out_free:
+	kfree(ue);
+out:
+	return ERR_PTR(err);
+}
+
+static int ovl_copy_up_xattr(struct dentry *old, struct dentry *new)
+{
+	ssize_t list_size, size;
+	char *buf, *name, *value;
+	int error;
+
+	if (!old->d_inode->i_op->getxattr ||
+	    !new->d_inode->i_op->getxattr)
+		return 0;
+
+	list_size = vfs_listxattr(old, NULL, 0);
+	if (list_size <= 0)
+		return list_size;
+
+	buf = kzalloc(list_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	error = -ENOMEM;
+	value = kmalloc(XATTR_SIZE_MAX, GFP_KERNEL);
+	if (!value)
+		goto out;
+
+	list_size = vfs_listxattr(old, buf, list_size);
+	if (list_size <= 0) {
+		error = list_size;
+		goto out_free_value;
+	}
+
+	for (name = buf; name < (buf + list_size); name += strlen(name) + 1) {
+		size = vfs_getxattr(old, name, value, XATTR_SIZE_MAX);
+		if (size <= 0) {
+			error = size;
+			goto out_free_value;
+		}
+		error = vfs_setxattr(new, name, value, size, 0);
+		if (error)
+			goto out_free_value;
+	}
+
+out_free_value:
+	kfree(value);
+out:
+	kfree(buf);
+	return error;
+}
+
+static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
+{
+	struct file *old_file;
+	struct file *new_file;
+	loff_t offset = 0;
+	long bytes;
+	int error = 0;
+
+	if (len == 0)
+		return 0;
+
+	old_file = path_open(old, O_RDONLY);
+	if (IS_ERR(old_file))
+		return PTR_ERR(old_file);
+
+	new_file = path_open(new, O_WRONLY);
+	if (IS_ERR(new_file)) {
+		error = PTR_ERR(new_file);
+		goto out_fput;
+	}
+
+	/* FIXME: do_splice_direct() can't copy >4G */
+	/* FIXME: allow kill signal to abort */
+	/* FIXME: sparse files */
+	bytes = do_splice_direct(old_file, &offset, new_file, len,
+				 SPLICE_F_MOVE);
+	if (bytes < 0)
+		error = bytes;
+
+	fput(new_file);
+out_fput:
+	fput(old_file);
+	return error;
+}
+
+static struct dentry *ovl_lookup_create(struct ovl_entry *ue,
+					struct ovl_entry *pue,
+					struct qstr *name)
+{
+	int err;
+	struct inode *upperdir = pue->upperpath.dentry->d_inode;
+	struct dentry *newdentry;
+
+	newdentry = lookup_one_len(name->name, pue->upperpath.dentry, name->len);
+	if (IS_ERR(newdentry))
+		return newdentry;
+
+	if (ue->opaque) {
+		const struct cred *old_cred;
+		struct cred *override_cred;
+
+		err = -ENOMEM;
+		override_cred = prepare_creds();
+		if (!override_cred)
+			goto out_dput;
+
+		/*
+		 * CAP_SYS_ADMIN for getxattr
+		 * CAP_FOWNER for unlink in sticky directory
+		 */
+		cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+		cap_raise(override_cred->cap_effective, CAP_FOWNER);
+		old_cred = override_creds(override_cred);
+
+		err = -ESTALE;
+		if (ovl_is_whiteout(newdentry))
+			err = vfs_unlink(upperdir, newdentry);
+
+		revert_creds(old_cred);
+		put_cred(override_cred);
+		if (err)
+			goto out_dput;
+
+		dput(newdentry);
+		newdentry = lookup_one_len(name->name, pue->upperpath.dentry, name->len);
+		if (IS_ERR(newdentry))
+			return newdentry;
+	}
+
+	err = -EEXIST;
+	if (newdentry->d_inode)
+		goto out_dput;
+
+	return newdentry;
+
+out_dput:
+	dput(newdentry);
+	return ERR_PTR(err);
+}
+
+static int ovl_upper_create(struct dentry *parent, struct dentry *dentry,
+			    struct kstat *stat, const char *link,
+			    struct path *newpath)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_entry *pue = parent->d_fsdata;
+	struct inode *upperdir = pue->upperpath.dentry->d_inode;
+	struct dentry *newdentry;
+
+	newdentry = ovl_lookup_create(ue, pue, &dentry->d_name);
+	if (IS_ERR(newdentry))
+		return PTR_ERR(newdentry);
+
+	switch (stat->mode & S_IFMT) {
+	case S_IFREG:
+		err = vfs_create(upperdir, newdentry, stat->mode, NULL);
+		break;
+
+	case S_IFDIR:
+		err = vfs_mkdir(upperdir, newdentry, stat->mode);
+		break;
+
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFIFO:
+	case S_IFSOCK:
+		err = vfs_mknod(upperdir, newdentry, stat->mode, stat->rdev);
+		break;
+
+	case S_IFLNK:
+		err = vfs_symlink(upperdir, newdentry, link);
+		break;
+
+	default:
+		err = -EPERM;
+	}
+	if (!err) {
+		newpath->dentry = newdentry;
+		newpath->mnt = pue->upperpath.mnt;
+		path_get(newpath);
+	}
+
+	dput(newdentry);
+	return err;
+}
+
+static char *ovl_read_symlink(struct path *path)
+{
+	int res;
+	char *buf;
+	struct inode *inode = path->dentry->d_inode;
+	mm_segment_t old_fs;
+
+	res = -EINVAL;
+	if (!inode->i_op->readlink)
+		goto err;
+
+	res = -ENOMEM;
+	buf = (char *) __get_free_page(GFP_KERNEL);
+	if (!buf)
+		goto err;
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+	/* The cast to a user pointer is valid due to the set_fs() */
+	res = inode->i_op->readlink(path->dentry,
+				    (char __user *)buf, PAGE_SIZE - 1);
+	set_fs(old_fs);
+	if (res < 0) {
+		free_page((unsigned long) buf);
+		goto err;
+	}
+	buf[res] = '\0';
+
+	return buf;
+
+err:
+	return ERR_PTR(res);
+}
+
+static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
+{
+	struct iattr attr = {
+		.ia_valid = ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET,
+		.ia_atime = stat->atime,
+		.ia_mtime = stat->mtime,
+	};
+
+	return notify_change(upperdentry, &attr);
+}
+
+static int ovl_set_mode(struct dentry *upperdentry, umode_t mode)
+{
+	struct iattr attr = {
+		.ia_valid = ATTR_MODE,
+		.ia_mode = mode,
+	};
+
+	return notify_change(upperdentry, &attr);
+}
+
+static int ovl_set_opaque(struct dentry *upperdentry)
+{
+	int err;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	override_cred = prepare_creds();
+	if (!override_cred)
+		return -ENOMEM;
+
+	/* CAP_SYS_ADMIN for setxattr of "trusted" namespace */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	old_cred = override_creds(override_cred);
+	err = vfs_setxattr(upperdentry, ovl_opaque_xattr, "y", 1, 0);
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+	return err;
+}
+
+static int ovl_remove_opaque(struct dentry *upperdentry)
+{
+	int err;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	override_cred = prepare_creds();
+	if (!override_cred)
+		return -ENOMEM;
+
+	/* CAP_SYS_ADMIN for removexattr of "trusted" namespace */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	old_cred = override_creds(override_cred);
+	err = vfs_removexattr(upperdentry, ovl_opaque_xattr);
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+	return err;
+}
+
+static int ovl_copy_up_locked(struct dentry *parent, struct dentry *dentry,
+			      struct kstat *pstat, struct kstat *stat,
+			      const char *link)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_entry *pue = parent->d_fsdata;
+	struct path newpath;
+	umode_t mode = stat->mode;
+
+	/*
+	 * Using upper filesystem locking to protect against copy up
+	 * racing with rename (rename means the copy up was already
+	 * successful).
+	 */
+	if (dentry->d_parent != parent) {
+		if (WARN_ON(!ue->upperpath.dentry))
+			return -ESTALE;
+
+		return 0;
+	}
+	/* Can't properly set mode on creation because of the umask */
+	stat->mode &= S_IFMT;
+
+	err = ovl_upper_create(parent, dentry, stat, link, &newpath);
+	if (err) {
+		/* Already copied up? */
+		if (err == -EEXIST && ue->upperpath.dentry)
+			return 0;
+
+		return err;
+	}
+
+	if (S_ISREG(stat->mode)) {
+		err = ovl_copy_up_data(&ue->lowerpath, &newpath, stat->size);
+		if (err)
+			goto out_path_put;
+	}
+
+	err = ovl_copy_up_xattr(ue->lowerpath.dentry, newpath.dentry);
+	if (err)
+		goto out_path_put;
+
+	if (ue->opaque && S_ISDIR(stat->mode)) {
+		err = ovl_set_opaque(newpath.dentry);
+		if (err)
+			goto out_path_put;
+	}
+
+	mutex_lock(&newpath.dentry->d_inode->i_mutex);
+	err = ovl_set_mode(newpath.dentry, mode);
+	if (!err)
+		err = ovl_set_timestamps(newpath.dentry, stat);
+	mutex_unlock(&newpath.dentry->d_inode->i_mutex);
+	if (err)
+		goto out_path_put;
+
+	/* Restore timestamps on parent (best effort) */
+	ovl_set_timestamps(pue->upperpath.dentry, pstat);
+
+	ue->upperpath = newpath;
+	/* FIXME: release lowerpath? */
+	if (ue->lowerpath.dentry)
+		ue->opaque = true;
+
+	return 0;
+
+out_path_put:
+	path_put(&newpath);
+	return err;
+}
+
+static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry)
+{
+	int err;
+	struct kstat stat;
+	struct kstat pstat;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_entry *pue = parent->d_fsdata;
+	struct inode *upperdir = pue->upperpath.dentry->d_inode;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+	char *link = NULL;
+
+	err = vfs_getattr(ue->lowerpath.mnt, ue->lowerpath.dentry, &stat);
+	if (err)
+		return err;
+
+	err = vfs_getattr(pue->upperpath.mnt, pue->upperpath.dentry, &pstat);
+	if (err)
+		return err;
+
+	if (S_ISLNK(stat.mode)) {
+		link = ovl_read_symlink(&ue->lowerpath);
+		if (IS_ERR(link))
+			return PTR_ERR(link);
+	}
+
+	err = -ENOMEM;
+	override_cred = prepare_creds();
+	if (!override_cred)
+		goto out_free_link;
+
+	override_cred->fsuid = stat.uid;
+	override_cred->fsgid = stat.gid;
+	/*
+	 * CAP_SYS_ADMIN for copying up extended attributes
+	 * CAP_DAC_OVERRIDE for create
+	 * CAP_FOWNER for chmod, timestamp update
+	 * CAP_FSETID for chmod
+	 * CAP_MKNOD for mknod
+	 */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+	cap_raise(override_cred->cap_effective, CAP_FOWNER);
+	cap_raise(override_cred->cap_effective, CAP_FSETID);
+	cap_raise(override_cred->cap_effective, CAP_MKNOD);
+	old_cred = override_creds(override_cred);
+
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	err = ovl_copy_up_locked(parent, dentry, &pstat, &stat, link);
+	mutex_unlock(&upperdir->i_mutex);
+
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+out_free_link:
+	if (link)
+		free_page((unsigned long) link);
+
+	return err;
+}
+
+static int ovl_copy_up(struct dentry *dentry)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	int err;
+
+	err = 0;
+	while (!err && !ue->upperpath.dentry) {
+		struct dentry *next = dget(dentry);
+		struct dentry *parent;
+
+		/* find the topmost dentry not yet copied up */
+		for (;;) {
+			struct ovl_entry *pue;
+
+			parent = dget_parent(next);
+			pue = parent->d_fsdata;
+
+			if (pue->upperpath.dentry)
+				break;
+
+			dput(next);
+			next = parent;
+		}
+		err = ovl_copy_up_one(parent, next);
+
+		dput(parent);
+		dput(next);
+	}
+
+	return err;
+}
+
+static int ovl_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	struct inode *inode;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	int err;
+
+	/* FIXME: handle truncate efficiently */
+	err = ovl_copy_up(dentry);
+	if (err)
+		return err;
+
+	inode = ue->upperpath.dentry->d_inode;
+
+	mutex_lock(&inode->i_mutex);
+	err = notify_change(ue->upperpath.dentry, attr);
+	mutex_unlock(&inode->i_mutex);
+
+	return err;
+}
+
+static int ovl_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			 struct kstat *stat)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+
+	return vfs_getattr(realpath->mnt, realpath->dentry, stat);
+}
+
+static int ovl_dir_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			 struct kstat *stat)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+
+	err = vfs_getattr(realpath->mnt, realpath->dentry, stat);
+
+	stat->dev = dentry->d_sb->s_dev;
+	stat->ino = dentry->d_inode->i_ino;
+
+	/*
+	 * It's probably not worth it to count subdirs to get the
+	 * correct link count.  nlink=1 seems to pacify 'find' and
+	 * other utilities.
+	 */
+	if (ue->lowerpath.dentry && ue->upperpath.dentry)
+		stat->nlink = 1;
+
+	return err;
+}
+
+static int ovl_permission(struct dentry *dentry, int mask)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct inode *inode;
+	int err;
+
+	if (ue->upperpath.dentry)
+		return dentry_permission(ue->upperpath.dentry, mask);
+
+	inode = ue->lowerpath.dentry->d_inode;
+	if (!(mask & MAY_WRITE) || special_file(inode->i_mode))
+		return dentry_permission(ue->lowerpath.dentry, mask);
+
+	/* Skip the read-only fs check: writes go to the upper layer */
+	if (mask & MAY_WRITE) {
+		if (IS_IMMUTABLE(inode))
+			return -EACCES;
+	}
+
+	if (inode->i_op->permission)
+		err = inode->i_op->permission(ue->lowerpath.dentry, mask);
+	else
+		err = generic_permission(inode, mask, inode->i_op->check_acl);
+
+	if (err)
+		return err;
+
+	return security_inode_permission(inode, mask);
+}
+
+static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
+			       const char *link)
+{
+	int err;
+	struct inode *inode;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_entry *pue = dentry->d_parent->d_fsdata;
+	struct inode *upperdir;
+	struct path newpath;
+	struct kstat stat = {
+		.mode = mode,
+		.rdev = rdev,
+	};
+
+	err = -ENOMEM;
+	inode = ovl_new_inode(dentry->d_sb, mode);
+	if (!inode)
+		goto out;
+
+	err = ovl_copy_up(dentry->d_parent);
+	if (err)
+		goto out_iput;
+
+	upperdir = pue->upperpath.dentry->d_inode;
+
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	err = ovl_upper_create(dentry->d_parent, dentry, &stat, link,
+			       &newpath);
+	if (err)
+		goto out_unlock;
+
+	if (ue->opaque && S_ISDIR(mode)) {
+		err = ovl_set_opaque(newpath.dentry);
+		if (err) {
+			path_put(&newpath);
+			goto out_unlock;
+		}
+	}
+	ue->upperpath = newpath;
+	d_instantiate(dentry, inode);
+	inode = NULL;
+
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+out_iput:
+	iput(inode);
+out:
+	return err;
+}
+
+static int ovl_create(struct inode *dir, struct dentry *dentry, int mode,
+			struct nameidata *nd)
+{
+	return ovl_create_object(dentry, (mode & 07777) | S_IFREG, 0, NULL);
+}
+
+static int ovl_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+	return ovl_create_object(dentry, (mode & 07777) | S_IFDIR, 0, NULL);
+}
+
+static int ovl_mknod(struct inode *dir, struct dentry *dentry, int mode,
+		       dev_t rdev)
+{
+	return ovl_create_object(dentry, mode, rdev, NULL);
+}
+
+static int ovl_symlink(struct inode *dir, struct dentry *dentry,
+			 const char *link)
+{
+	return ovl_create_object(dentry, S_IFLNK, 0, link);
+}
+
+static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+	struct inode *realinode = realpath->dentry->d_inode;
+
+	if (WARN_ON(!realinode->i_op->follow_link))
+		return ERR_PTR(-EPERM);
+
+	return realinode->i_op->follow_link(realpath->dentry, nd);
+}
+
+static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+	struct inode *realinode = realpath->dentry->d_inode;
+
+	if (realinode->i_op->put_link)
+		realinode->i_op->put_link(realpath->dentry, nd, c);
+}
+
+static int ovl_readlink(struct dentry *dentry, char __user *buf, int bufsiz)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+	struct inode *realinode = realpath->dentry->d_inode;
+
+	if (!realinode->i_op->readlink)
+		return -EINVAL;
+
+	touch_atime(realpath->mnt, realpath->dentry);
+	return realinode->i_op->readlink(realpath->dentry, buf, bufsiz);
+}
+
+static int ovl_whiteout(struct dentry *dentry)
+{
+	int err;
+	struct ovl_entry *pue = dentry->d_parent->d_fsdata;
+	struct dentry *newdentry;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	err = -ENOMEM;
+	override_cred = prepare_creds();
+	if (!override_cred)
+		goto out;
+
+	/*
+	 * CAP_SYS_ADMIN for setxattr
+	 * CAP_DAC_OVERRIDE for symlink creation
+	 */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+	override_cred->fsuid = 0;
+	override_cred->fsgid = 0;
+	old_cred = override_creds(override_cred);
+
+	newdentry = lookup_one_len(dentry->d_name.name, pue->upperpath.dentry,
+				   dentry->d_name.len);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out_put_cred;
+
+	err = -ESTALE;
+	if (WARN_ON(newdentry->d_inode))
+		goto out_dput;
+
+	err = vfs_symlink(pue->upperpath.dentry->d_inode, newdentry,
+			  ovl_whiteout_symlink);
+	if (err)
+		goto out_dput;
+
+	err = vfs_setxattr(newdentry, ovl_whiteout_xattr, "y", 1, 0);
+
+out_dput:
+	dput(newdentry);
+out_put_cred:
+	revert_creds(old_cred);
+	put_cred(override_cred);
+out:
+	return err;
+}
+
+static int ovl_unlink(struct inode *dir, struct dentry *dentry)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_entry *pue;
+	struct inode *upperdir;
+
+	err = ovl_copy_up(dentry->d_parent);
+	if (err)
+		return err;
+
+	pue = dentry->d_parent->d_fsdata;
+	upperdir = pue->upperpath.dentry->d_inode;
+
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	if (ue->upperpath.dentry) {
+		err = vfs_unlink(upperdir, ue->upperpath.dentry);
+		if (err)
+			goto out_unlock;
+	} else {
+		ue->opaque = true;
+	}
+
+	if (ue->opaque)
+		err = ovl_whiteout(dentry);
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+
+	return err;
+}
+
+static int ovl_check_empty_dir(struct dentry *dentry)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_cache_entry *p;
+	struct ovl_cache_callback cb = {
+		.list = NULL,
+		.endp = &cb.list,
+		.path = ue->upperpath,
+	};
+
+	if (ue->upperpath.dentry) {
+		err = ovl_fill_cache(&ue->upperpath, &cb, ovl_fill_upper);
+		if (err)
+			return err;
+	}
+	err = ovl_fill_cache(&ue->lowerpath, &cb, ovl_fill_lower);
+	if (err)
+		return err;
+
+	err = 0;
+	for (p = cb.list; p; p = p->next) {
+		if (p->is_whiteout)
+			continue;
+
+		if (p->name.name[0] == '.') {
+			if (p->name.len == 1)
+				continue;
+			if (p->name.len == 2 && p->name.name[1] == '.')
+				continue;
+		}
+		err = -ENOTEMPTY;
+		break;
+	}
+
+	ovl_cache_free(cb.list);
+
+	return err;
+}
+
+static int ovl_unlink_whiteout(void *buf, const char *name, int namlen,
+				 loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct ovl_cache_callback *cb = buf;
+
+	cb->count++;
+	/* whiteouts are symlinks; testing d_type also skips "." and ".." */
+	if (d_type == DT_LNK) {
+		struct dentry *dentry;
+
+		dentry = lookup_one_len(name, cb->path.dentry, namlen);
+		if (IS_ERR(dentry)) {
+			cb->err = PTR_ERR(dentry);
+		} else {
+			cb->err = vfs_unlink(cb->path.dentry->d_inode, dentry);
+			dput(dentry);
+		}
+	}
+
+	return cb->err;
+}
+
+static int ovl_remove_whiteouts(struct dentry *dentry)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_cache_callback cb = {
+		.list = NULL,
+		.path = ue->upperpath,
+	};
+
+	if (!ue->upperpath.dentry)
+		return 0;
+
+	return ovl_fill_cache(&ue->upperpath, &cb, ovl_unlink_whiteout);
+}
+
+static int ovl_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct ovl_entry *pue;
+	struct inode *upperdir;
+
+	if (ue->lowerpath.dentry) {
+		err = ovl_check_empty_dir(dentry);
+		if (err)
+			return err;
+
+		err = ovl_copy_up(dentry->d_parent);
+		if (err)
+			return err;
+
+		err = ovl_remove_whiteouts(dentry);
+		if (err)
+			return err;
+	}
+
+	pue = dentry->d_parent->d_fsdata;
+	upperdir = pue->upperpath.dentry->d_inode;
+
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	if (ue->upperpath.dentry) {
+		err = vfs_rmdir(upperdir, ue->upperpath.dentry);
+		if (err)
+			goto out_unlock;
+	}
+	if (ue->lowerpath.dentry)
+		ue->opaque = true;
+
+	if (ue->opaque)
+		err = ovl_whiteout(dentry);
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+
+	return err;
+}
+
+static int ovl_link(struct dentry *old, struct inode *newdir,
+		      struct dentry *new)
+{
+	int err;
+	struct dentry *newdentry;
+	struct ovl_entry *new_ue = new->d_fsdata;
+	struct ovl_entry *old_ue = old->d_fsdata;
+	struct ovl_entry *pue = new->d_parent->d_fsdata;
+	struct inode *upperdir;
+
+	err = ovl_copy_up(old);
+	if (err)
+		goto out;
+
+	err = ovl_copy_up(new->d_parent);
+	if (err)
+		goto out;
+
+	upperdir = pue->upperpath.dentry->d_inode;
+	mutex_lock_nested(&upperdir->i_mutex, I_MUTEX_PARENT);
+	newdentry = ovl_lookup_create(new_ue, pue, &new->d_name);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out_unlock;
+
+	err = vfs_link(old_ue->upperpath.dentry, upperdir, newdentry);
+	if (!err) {
+		struct inode *inode = old->d_inode;
+
+		atomic_inc(&inode->i_count);
+		d_instantiate(new, inode);
+
+		new_ue->upperpath.dentry = newdentry;
+		new_ue->upperpath.mnt = pue->upperpath.mnt;
+		path_get(&new_ue->upperpath);
+	}
+	dput(newdentry);
+out_unlock:
+	mutex_unlock(&upperdir->i_mutex);
+out:
+	return err;
+
+}
+
+static int ovl_rename(struct inode *olddir, struct dentry *old,
+			struct inode *newdir, struct dentry *new)
+{
+	int err;
+	struct ovl_entry *old_ue = old->d_fsdata;
+	struct ovl_entry *new_ue = new->d_fsdata;
+	struct ovl_entry *old_pue = old->d_parent->d_fsdata;
+	struct ovl_entry *new_pue = new->d_parent->d_fsdata;
+	struct dentry *old_upperdir;
+	struct dentry *new_upperdir;
+	struct dentry *olddentry;
+	struct dentry *newdentry;
+	struct dentry *trap;
+	bool prev_opaque;
+
+	/* Don't copy up directory trees */
+	if (old_ue->lowerpath.dentry &&
+	    S_ISDIR(old_ue->lowerpath.dentry->d_inode->i_mode))
+		return -EXDEV;
+
+	if (new_ue->lowerpath.dentry &&
+	    S_ISDIR(new_ue->lowerpath.dentry->d_inode->i_mode)) {
+		err = ovl_check_empty_dir(new);
+		if (err)
+			return err;
+	}
+
+	err = ovl_copy_up(old);
+	if (err)
+		return err;
+
+	err = ovl_copy_up(new->d_parent);
+	if (err)
+		return err;
+
+	if (new_ue->lowerpath.dentry &&
+	    S_ISDIR(new_ue->lowerpath.dentry->d_inode->i_mode)) {
+		err = ovl_remove_whiteouts(new);
+		if (err)
+			return err;
+	}
+
+	old_upperdir = old_pue->upperpath.dentry;
+	new_upperdir = new_pue->upperpath.dentry;
+	trap = lock_rename(new_upperdir, old_upperdir);
+
+	olddentry = old_ue->upperpath.dentry;
+	newdentry = dget(new_ue->upperpath.dentry);
+	if (!newdentry) {
+		newdentry = ovl_lookup_create(new_ue, new_pue, &new->d_name);
+		err = PTR_ERR(newdentry);
+		if (IS_ERR(newdentry))
+			goto out_unlock;
+	}
+
+	err = -ESTALE;
+	if (WARN_ON(olddentry == trap))
+		goto out_dput;
+	if (WARN_ON(newdentry == trap))
+		goto out_dput;
+
+	err = vfs_rename(old_upperdir->d_inode, olddentry,
+			 new_upperdir->d_inode, newdentry);
+
+	if (!err) {
+		prev_opaque = old_ue->opaque;
+		old_ue->opaque = new_ue->opaque || new_ue->lowerpath.dentry;
+		if (prev_opaque)
+			err = ovl_whiteout(old);
+		if (!err && S_ISDIR(olddentry->d_inode->i_mode)) {
+			if (prev_opaque && !old_ue->opaque)
+				ovl_remove_opaque(olddentry);
+			if (!prev_opaque && old_ue->opaque)
+				err = ovl_set_opaque(olddentry);
+		}
+	}
+
+out_dput:
+	dput(newdentry);
+out_unlock:
+	unlock_rename(new_upperdir, old_upperdir);
+	return err;
+}
+
+static bool ovl_is_private_xattr(const char *name)
+{
+	return strncmp(name, "trusted.overlay.", 16) == 0;
+}
+
+static int ovl_setxattr(struct dentry *dentry, const char *name,
+			  const void *value, size_t size, int flags)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+
+	if (ovl_is_private_xattr(name))
+		return -ENODATA;
+
+	if (!ue->upperpath.dentry) {
+		err = ovl_copy_up(dentry);
+		if (err)
+			return err;
+	}
+
+	return vfs_setxattr(ue->upperpath.dentry, name, value, size, flags);
+}
+
+static ssize_t ovl_getxattr(struct dentry *dentry, const char *name,
+			      void *value, size_t size)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+
+	if (ovl_is_private_xattr(name))
+		return -ENODATA;
+
+	return vfs_getxattr(realpath->dentry, name, value, size);
+}
+
+static ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size)
+{
+	struct ovl_entry *ue = dentry->d_fsdata;
+	struct path *realpath = ovl_path(ue);
+	ssize_t res;
+	int off;
+
+	res = vfs_listxattr(realpath->dentry, list, size);
+	if (res <= 0 || size == 0)
+		return res;
+
+	/* filter out private xattrs */
+	for (off = 0; off < res;) {
+		char *s = list + off;
+		size_t slen = strlen(s) + 1;
+
+		BUG_ON(off + slen > res);
+
+		if (ovl_is_private_xattr(s)) {
+			res -= slen;
+			memmove(s, s + slen, res - off);
+		} else {
+			off += slen;
+		}
+	}
+
+	return res;
+}
+
+static int ovl_removexattr(struct dentry *dentry, const char *name)
+{
+	int err;
+	struct ovl_entry *ue = dentry->d_fsdata;
+
+	if (ovl_is_private_xattr(name))
+		return -ENODATA;
+
+	if (!ue->upperpath.dentry) {
+		err = vfs_getxattr(ue->lowerpath.dentry, name, NULL, 0);
+		if (err < 0)
+			return err;
+
+		err = ovl_copy_up(dentry);
+		if (err)
+			return err;
+	}
+
+	return vfs_removexattr(ue->upperpath.dentry, name);
+}
+
+static const struct inode_operations ovl_dir_inode_operations = {
+	.lookup		= ovl_lookup,
+	.mkdir		= ovl_mkdir,
+	.symlink	= ovl_symlink,
+	.unlink		= ovl_unlink,
+	.rmdir		= ovl_rmdir,
+	.rename		= ovl_rename,
+	.link		= ovl_link,
+	.setattr	= ovl_setattr,
+	.create		= ovl_create,
+	.mknod		= ovl_mknod,
+	.permission	= ovl_permission,
+	.getattr	= ovl_dir_getattr,
+	.setxattr	= ovl_setxattr,
+	.getxattr	= ovl_getxattr,
+	.listxattr	= ovl_listxattr,
+	.removexattr	= ovl_removexattr,
+};
+
+static const struct inode_operations ovl_file_inode_operations = {
+	.setattr	= ovl_setattr,
+	.permission	= ovl_permission,
+	.getattr	= ovl_getattr,
+	.setxattr	= ovl_setxattr,
+	.getxattr	= ovl_getxattr,
+	.listxattr	= ovl_listxattr,
+	.removexattr	= ovl_removexattr,
+};
+
+static const struct inode_operations ovl_symlink_inode_operations = {
+	.setattr	= ovl_setattr,
+	.follow_link	= ovl_follow_link,
+	.put_link	= ovl_put_link,
+	.readlink	= ovl_readlink,
+	.getattr	= ovl_getattr,
+	.setxattr	= ovl_setxattr,
+	.getxattr	= ovl_getxattr,
+	.listxattr	= ovl_listxattr,
+	.removexattr	= ovl_removexattr,
+};
+
+static bool ovl_open_need_copy_up(struct file *file, struct ovl_entry *ue)
+{
+	if (ue->upperpath.dentry)
+		return false;
+
+	if (special_file(ue->lowerpath.dentry->d_inode->i_mode))
+		return false;
+
+	if (!(file->f_mode & FMODE_WRITE) && !(file->f_flags & O_TRUNC))
+		return false;
+
+	return true;
+}
+
+static struct file *ovl_open(struct file *file)
+{
+	struct dentry *dentry = file->f_path.dentry;
+	struct ovl_entry *ue = dentry->d_fsdata;
+	int err;
+
+	if (ovl_open_need_copy_up(file, ue)) {
+		err = ovl_copy_up(dentry);
+		if (err)
+			return ERR_PTR(err);
+	}
+	return path_open(ovl_path(ue), file->f_flags);
+}
+
+static const struct file_operations ovl_file_operations = {
+	.open_other	= ovl_open,
+};
+
+static void ovl_put_super(struct super_block *sb)
+{
+	struct ovl_fs *ufs = sb->s_fs_info;
+
+	if (!(sb->s_flags & MS_RDONLY))
+		mnt_drop_write(ufs->upper_mnt);
+
+	mntput(ufs->upper_mnt);
+
+	iput(ufs->symlink_inode);
+	iput(ufs->regular_inode);
+	iput(ufs->special_inode);
+	kfree(ufs);
+}
+
+static const struct super_operations ovl_super_operations = {
+	.put_super	= ovl_put_super,
+};
+
+struct ovl_config {
+	char *lowerdir;
+	char *upperdir;
+};
+
+enum {
+	Opt_lowerdir,
+	Opt_upperdir,
+	Opt_err,
+};
+
+static const match_table_t ovl_tokens = {
+	{Opt_lowerdir,			"lowerdir=%s"},
+	{Opt_upperdir,			"upperdir=%s"},
+	{Opt_err,			NULL}
+};
+
+static int ovl_parse_opt(char *opt, struct ovl_config *config)
+{
+	char *p;
+
+	config->upperdir = NULL;
+	config->lowerdir = NULL;
+
+	while ((p = strsep(&opt, ",")) != NULL) {
+		int token;
+		substring_t args[MAX_OPT_ARGS];
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, ovl_tokens, args);
+		switch (token) {
+		case Opt_upperdir:
+			kfree(config->upperdir);
+			config->upperdir = match_strdup(&args[0]);
+			if (!config->upperdir)
+				return -ENOMEM;
+			break;
+
+		case Opt_lowerdir:
+			kfree(config->lowerdir);
+			config->lowerdir = match_strdup(&args[0]);
+			if (!config->lowerdir)
+				return -ENOMEM;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int ovl_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct inode *root_inode;
+	struct dentry *root_dentry;
+	struct ovl_entry *ue;
+	struct ovl_fs *ufs;
+	struct ovl_config config;
+	int err;
+
+	err = ovl_parse_opt((char *) data, &config);
+	if (err)
+		goto out_free_config;
+
+	err = -EINVAL;
+	if (!config.upperdir || !config.lowerdir)
+		goto out_free_config;
+
+	err = -ENOMEM;
+	ufs = kmalloc(sizeof(struct ovl_fs), GFP_KERNEL);
+	if (!ufs)
+		goto out_free_config;
+
+	ufs->symlink_inode = new_inode(sb);
+	if (!ufs->symlink_inode)
+		goto out_free_ufs;
+
+	ufs->regular_inode = new_inode(sb);
+	if (!ufs->regular_inode)
+		goto out_put_symlink_inode;
+
+	ufs->special_inode = new_inode(sb);
+	if (!ufs->special_inode)
+		goto out_put_regular_inode;
+
+	ufs->symlink_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+	ufs->symlink_inode->i_mode = S_IFLNK;
+	ufs->symlink_inode->i_op = &ovl_symlink_inode_operations;
+
+	ufs->regular_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+	ufs->regular_inode->i_mode = S_IFREG;
+	ufs->regular_inode->i_op = &ovl_file_inode_operations;
+	ufs->regular_inode->i_fop = &ovl_file_operations;
+
+	ufs->special_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+	ufs->special_inode->i_mode = S_IFSOCK;
+	ufs->special_inode->i_op = &ovl_file_inode_operations;
+	ufs->special_inode->i_fop = &ovl_file_operations;
+
+	root_inode = ovl_new_inode(sb, S_IFDIR);
+	if (!root_inode)
+		goto out_put_special_inode;
+
+	ue = kzalloc(sizeof(struct ovl_entry), GFP_KERNEL);
+	if (ue == NULL)
+		goto out_put_root;
+
+	err = kern_path(config.upperdir, LOOKUP_FOLLOW, &ue->upperpath);
+	if (err)
+		goto out_free_ue;
+
+	err = kern_path(config.lowerdir, LOOKUP_FOLLOW, &ue->lowerpath);
+	if (err)
+		goto out_put_upperpath;
+
+	err = -ENOTDIR;
+	if (!S_ISDIR(ue->upperpath.dentry->d_inode->i_mode) ||
+	    !S_ISDIR(ue->lowerpath.dentry->d_inode->i_mode))
+		goto out_put_lowerpath;
+
+	if (!(sb->s_flags & MS_RDONLY)) {
+		err = mnt_want_write(ue->upperpath.mnt);
+		if (err)
+			goto out_put_lowerpath;
+	}
+
+	err = -ENOMEM;
+	root_dentry = d_alloc_root(root_inode);
+	if (!root_dentry)
+		goto out_drop_write;
+
+	root_dentry->d_fsdata = ue;
+	root_dentry->d_op = &ovl_dentry_operations;
+
+	ufs->upper_mnt = mntget(ue->upperpath.mnt);
+	sb->s_op = &ovl_super_operations;
+	sb->s_root = root_dentry;
+	sb->s_fs_info = ufs;
+	kfree(config.lowerdir);
+	kfree(config.upperdir);
+	return 0;
+
+out_drop_write:
+	if (!(sb->s_flags & MS_RDONLY))
+		mnt_drop_write(ue->upperpath.mnt);
+out_put_lowerpath:
+	path_put(&ue->lowerpath);
+out_put_upperpath:
+	path_put(&ue->upperpath);
+out_free_ue:
+	kfree(ue);
+out_put_root:
+	iput(root_inode);
+out_put_special_inode:
+	iput(ufs->special_inode);
+out_put_regular_inode:
+	iput(ufs->regular_inode);
+out_put_symlink_inode:
+	iput(ufs->symlink_inode);
+out_free_ufs:
+	kfree(ufs);
+out_free_config:
+	kfree(config.lowerdir);
+	kfree(config.upperdir);
+out:
+	return err;
+}
+
+static int ovl_get_sb(struct file_system_type *fs_type,
+			int flags, const char *dev_name,
+			void *raw_data, struct vfsmount *mnt)
+{
+	return get_sb_nodev(fs_type, flags, raw_data, ovl_fill_super, mnt);
+}
+
+static struct file_system_type ovl_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "overlayfs",
+	.fs_flags	= FS_RENAME_SELF_ALLOW,
+	.get_sb		= ovl_get_sb,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init ovl_init(void)
+{
+	return register_filesystem(&ovl_fs_type);
+}
+
+static void __exit ovl_exit(void)
+{
+	unregister_filesystem(&ovl_fs_type);
+}
+
+module_init(ovl_init);
+module_exit(ovl_exit);
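As a rough illustration outside the kernel, the following userspace C sketch mirrors the in-place filtering that ovl_listxattr() above performs on the NUL-separated name list returned by vfs_listxattr(): each private name is removed by sliding the rest of the buffer down over it. The helpers `is_private_xattr()` and `filter_private_xattrs()` are hypothetical names used only for this sketch, and the prefix test compares against the full "trusted.overlay." string.

```c
#include <string.h>
#include <sys/types.h>

/* Prefix of the xattrs overlayfs reserves for its own bookkeeping. */
#define OVL_PRIVATE_PREFIX "trusted.overlay."

/* Hypothetical helper: nonzero if @name starts with the full prefix. */
static int is_private_xattr(const char *name)
{
	return strncmp(name, OVL_PRIVATE_PREFIX,
		       sizeof(OVL_PRIVATE_PREFIX) - 1) == 0;
}

/*
 * Hypothetical helper: drop private names from a listxattr(2)-style
 * buffer in place.  @list holds @len bytes of NUL-terminated names
 * packed back to back.  Returns the new length of the buffer.
 */
static ssize_t filter_private_xattrs(char *list, ssize_t len)
{
	ssize_t off = 0;

	while (off < len) {
		char *s = list + off;
		size_t slen = strlen(s) + 1;	/* name plus its NUL */

		if (is_private_xattr(s)) {
			len -= (ssize_t)slen;
			/* slide the remaining names down over this one */
			memmove(s, s + slen, (size_t)(len - off));
		} else {
			off += (ssize_t)slen;
		}
	}
	return len;
}
```

Because the buffer is compacted in place, the caller only needs the returned length; no second allocation is required, which matches the kernel code reusing the caller-supplied buffer.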
Index: linux-2.6/fs/Kconfig
===================================================================
--- linux-2.6.orig/fs/Kconfig	2010-09-03 14:45:57.000000000 +0200
+++ linux-2.6/fs/Kconfig	2010-09-03 14:46:00.000000000 +0200
@@ -62,6 +62,7 @@ source "fs/quota/Kconfig"
 source "fs/autofs/Kconfig"
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
+source "fs/overlayfs/Kconfig"
 
 config CUSE
 	tristate "Character device in Userspace support"
Index: linux-2.6/fs/Makefile
===================================================================
--- linux-2.6.orig/fs/Makefile	2010-09-03 14:45:57.000000000 +0200
+++ linux-2.6/fs/Makefile	2010-09-03 14:46:00.000000000 +0200
@@ -108,6 +108,7 @@ obj-$(CONFIG_AUTOFS_FS)		+= autofs/
 obj-$(CONFIG_AUTOFS4_FS)	+= autofs4/
 obj-$(CONFIG_ADFS_FS)		+= adfs/
 obj-$(CONFIG_FUSE_FS)		+= fuse/
+obj-$(CONFIG_OVERLAYFS_FS)	+= overlayfs/
 obj-$(CONFIG_UDF_FS)		+= udf/
 obj-$(CONFIG_SUN_OPENPROMFS)	+= openpromfs/
 obj-$(CONFIG_OMFS_FS)		+= omfs/
Index: linux-2.6/fs/overlayfs/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/overlayfs/Kconfig	2010-09-03 14:46:00.000000000 +0200
@@ -0,0 +1,4 @@
+config OVERLAYFS_FS
+	tristate "Overlay filesystem support"
+	help
+	  Add support for the overlay filesystem.
Index: linux-2.6/fs/overlayfs/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/overlayfs/Makefile	2010-09-03 14:46:00.000000000 +0200
@@ -0,0 +1,5 @@
+#
+# Makefile for the overlay filesystem.
+#
+
+obj-$(CONFIG_OVERLAYFS_FS) += overlayfs.o
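The whiteout convention used by ovl_whiteout() in this patch (a symlink with a fixed target, plus the trusted.overlay.whiteout xattr) can be partly checked from userspace. The C sketch below assumes the "(overlay-whiteout)" target string given in the overlayfs documentation; `looks_like_whiteout()` is a hypothetical helper. It deliberately skips the xattr check, because reading trusted.* attributes requires CAP_SYS_ADMIN, so an ordinary symlink that happens to have the same target would be misreported as a whiteout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Whiteout symlink target, as given in overlayfs.txt (assumed here). */
#define OVL_WHITEOUT_TARGET "(overlay-whiteout)"

/*
 * Hypothetical helper: return 1 if @path is a symlink pointing at the
 * whiteout target, 0 if it is anything else, and -1 if readlink()
 * fails for a reason other than "not a symlink" (EINVAL).
 * The trusted.overlay.whiteout xattr is NOT checked (needs privilege).
 */
static int looks_like_whiteout(const char *path)
{
	/* one spare byte so a longer target cannot match after truncation */
	char buf[sizeof(OVL_WHITEOUT_TARGET)];
	ssize_t n = readlink(path, buf, sizeof(buf));

	if (n < 0)
		return errno == EINVAL ? 0 : -1;

	/* readlink() does not NUL-terminate, so compare by length */
	return (size_t)n == sizeof(OVL_WHITEOUT_TARGET) - 1 &&
	       memcmp(buf, OVL_WHITEOUT_TARGET, (size_t)n) == 0;
}
```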

-- 

* [PATCH 6/6] overlay: overlay filesystem documentation
  2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
                   ` (4 preceding siblings ...)
  2010-09-03 13:41 ` [PATCH 5/6] overlay: hybrid overlay filesystem prototype Miklos Szeredi
@ 2010-09-03 13:41 ` Miklos Szeredi
  2010-09-05 10:37 ` [PATCH 0/6] overlay filesystem prototype J. R. Okajima
  6 siblings, 0 replies; 11+ messages in thread
From: Miklos Szeredi @ 2010-09-03 13:41 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

[-- Attachment #1: overlayfs-documentation.patch --]
[-- Type: text/plain, Size: 7769 bytes --]

From: Neil Brown <neilb@suse.de>

Document the overlay filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 Documentation/filesystems/overlayfs.txt |  162 ++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)
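The documentation added by this patch requires a writable upper filesystem to report a valid d_type from readdir, at least for symbolic links. A quick userspace probe for that is sketched below, assuming glibc's `d_type` field and `DT_*` constants (which are not part of POSIX); `dir_reports_dtype()` is a hypothetical helper. A result of 1 on one sample directory is only a hint, since a filesystem may fill in d_type for some entry types and not for others.

```c
#define _DEFAULT_SOURCE		/* glibc: expose d_type and DT_* */
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical probe: return 1 if every entry of @path other than "."
 * and ".." reports a d_type other than DT_UNKNOWN, 0 if at least one
 * entry reports DT_UNKNOWN, and -1 if the directory cannot be opened.
 */
static int dir_reports_dtype(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *de;
	int ok = 1;

	if (!d)
		return -1;

	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		if (de->d_type == DT_UNKNOWN) {
			/* a caller would have to fall back to lstat() */
			ok = 0;
			break;
		}
	}
	closedir(d);
	return ok;
}
```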

Index: linux-2.6/Documentation/filesystems/overlayfs.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/filesystems/overlayfs.txt	2010-08-31 18:41:33.000000000 +0200
@@ -0,0 +1,162 @@
+Written by: Neil Brown <neilb@suse.de>
+
+Overlay Filesystem
+==================
+
+This document describes a prototype for a new approach to providing
+union-filesystem functionality in Linux.  A union-filesystem tries to
+present the union of two different filesystems as though it were a
+single filesystem.  The result will inevitably fail to look exactly
+like a normal filesystem for various technical reasons.  The
+expectation is that many use cases will be able to ignore these
+differences.
+
+This approach is 'hybrid' because the objects that appear in the
+filesystem do not all appear to belong to that filesystem.  In many
+cases an object accessed in the union will be indistinguishable
+from the corresponding object accessed in the original filesystem.
+This is most obvious from the 'st_dev' field returned by stat(2).
+Some objects will report an st_dev from one original filesystem, some
+from the other, and directories will report an st_dev from the union
+itself.  Similarly, st_ino will only be unique when combined with
+st_dev, and both of these can change over the lifetime of a
+non-directory object.  Many applications and tools ignore these values
+and will not be affected.
+
+Upper and Lower
+---------------
+
+An overlay filesystem combines two filesystems - an 'upper' filesystem
+and a 'lower' filesystem.  Note that while in set theory, 'union' is a
+commutative operation, in filesystems it is not - the two filesystems
+are treated differently.  When a name exists in both filesystems, the
+object in the 'upper' filesystem is visible while the object in the
+'lower' filesystem is either hidden or, in the case of directories,
+merged with the 'upper' object.
+
+It would be more correct to refer to an upper and lower 'directory
+tree' rather than 'filesystem' as it is quite possible for both
+directory trees to be in the same filesystem and there is no
+requirement that the root of a filesystem be given for either upper or
+lower.
+
+The lower filesystem can be any filesystem supported by Linux and does
+not need to be writable.  It can even be another overlayfs, used as a
+read-only lower layer.  The upper filesystem will normally be
+writable, and if it is, it must support the creation of trusted.*
+extended attributes and must provide a valid d_type in readdir
+responses, at least for symbolic links, so NFS is not suitable as
+the upper filesystem.
+
+A read-only union of two read-only filesystems may use any filesystem
+type.
+
+Directories
+-----------
+
+Unioning mainly involves directories.  If a given name appears in both
+upper and lower filesystems and refers to a non-directory in either,
+then the lower object is hidden - the name refers only to the upper
+object.
+
+Where both upper and lower objects are directories, a merged directory
+is formed.
+
+At mount time, the two directories given as mount options are combined
+into a merged directory.  Then whenever a lookup is requested in such
+a merged directory, the lookup is performed in each actual directory
+and the combined result is cached in the dentry belonging to the overlay
+filesystem.  If both actual lookups find directories, both are stored
+and a merged directory is created; otherwise only one is stored: the
+upper if it exists, else the lower.
+
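The rule for what a lookup stores can be written as a small decision function. This is a hypothetical model rather than overlayfs code: the enum and function names are invented for illustration, and whiteouts and opaque directories (described below) are deliberately left out.

```c
/* Hypothetical model types; not taken from the overlayfs source. */
enum obj_type { OBJ_NONE, OBJ_FILE, OBJ_DIR };

enum lookup_result {
	LOOKUP_NEGATIVE,	/* name exists in neither layer */
	LOOKUP_UPPER,		/* only the upper object is stored */
	LOOKUP_LOWER,		/* only the lower object is stored */
	LOOKUP_MERGED		/* both are directories: merged directory */
};

/* Decide what a lookup in a merged directory keeps for one name. */
static enum lookup_result ovl_model_lookup(enum obj_type upper,
					   enum obj_type lower)
{
	if (upper == OBJ_DIR && lower == OBJ_DIR)
		return LOOKUP_MERGED;
	if (upper != OBJ_NONE)
		return LOOKUP_UPPER;	/* upper hides any lower object */
	if (lower != OBJ_NONE)
		return LOOKUP_LOWER;
	return LOOKUP_NEGATIVE;
}
```

A file in the upper layer hides a lower directory of the same name, and vice versa; only the directory/directory case produces a merge.
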
+Only the lists of names from directories are merged.  Other content
+such as metadata and extended attributes are reported for the upper
+directory only.  These attributes of the lower directory are hidden.
+
+Whiteouts and opaque directories
+--------------------------------
+
+In order to support rm and rmdir without changing the lower
+filesystem, an overlay filesystem needs to record in the upper filesystem
+that files have been removed.  This is done using whiteouts and opaque
+directories (non-directories are always opaque).
+
+The overlay filesystem uses extended attributes with a
+"trusted.overlay." prefix to record these details.
+
+A whiteout is created as a symbolic link with target
+"(overlay-whiteout)" and with xattr "trusted.overlay.whiteout" set to "y".
+When a whiteout is found in the upper level of a merged directory, any
+matching name in the lower level is ignored, and the whiteout itself
+is also hidden.
+
+A directory is made opaque by setting the xattr "trusted.overlay.opaque"
+to "y".  Where the upper filesystem contains an opaque directory, any
+directory in the lower filesystem with the same name is ignored.
+
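A userspace sketch of recognising these markers in an upper directory, assuming exactly the layout described above: a symlink to "(overlay-whiteout)" plus the whiteout xattr, and the opaque xattr on a directory. The helper names are hypothetical and do not come from the overlayfs source; checking both the symlink target and the xattr is a conservative choice made here. Reading trusted.* attributes normally requires CAP_SYS_ADMIN, so an unprivileged caller will simply see neither marker.

```c
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

#define OVL_WHITEOUT_TARGET	"(overlay-whiteout)"
#define OVL_XATTR_WHITEOUT	"trusted.overlay.whiteout"
#define OVL_XATTR_OPAQUE	"trusted.overlay.opaque"

/* True if xattr 'name' on 'path' (symlinks not followed) is exactly "y". */
static int xattr_is_y(const char *path, const char *name)
{
	char val[2];
	ssize_t len = lgetxattr(path, name, val, sizeof(val));

	return len == 1 && val[0] == 'y';
}

/* Whiteout: a symlink with the magic target AND the whiteout xattr. */
static int is_whiteout(const char *path)
{
	struct stat st;
	char target[sizeof(OVL_WHITEOUT_TARGET)];
	ssize_t len;

	if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
		return 0;
	/* A longer target fills the whole buffer and fails the length test. */
	len = readlink(path, target, sizeof(target));
	if (len != (ssize_t)strlen(OVL_WHITEOUT_TARGET) ||
	    memcmp(target, OVL_WHITEOUT_TARGET, len) != 0)
		return 0;
	return xattr_is_y(path, OVL_XATTR_WHITEOUT);
}

/* Opaque directory: a directory carrying the opaque xattr. */
static int is_opaque_dir(const char *path)
{
	struct stat st;

	if (lstat(path, &st) != 0 || !S_ISDIR(st.st_mode))
		return 0;
	return xattr_is_y(path, OVL_XATTR_OPAQUE);
}
```
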
+readdir
+-------
+
+When a 'readdir' request is made on a merged directory, the upper and
+lower directories are each read and the name lists merged in the
+obvious way (upper is read first, then lower - entries that already
+exist are not re-added).  This merged name list is cached in the
+'struct file' and so remains as long as the file is kept open.  If the
+directory is opened and read by two processes at the same time, they
+will each have separate caches.  A seekdir to the start of the
+directory (offset 0) followed by a readdir will cause the cache to be
+discarded and rebuilt.
+
+This means that changes to the merged directory do not appear while a
+directory is being read.  This is unlikely to be noticed by many
+programs.
+
+Seek offsets are assigned sequentially when the directories are read.
+Thus if a program does the following:
+  - reads part of a directory
+  - remembers an offset, and closes the directory
+  - re-opens the directory some time later
+  - seeks to the remembered offset
+
+there may be little correlation between the old and new locations in
+the list of filenames, particularly if anything has changed in the
+directory.
+
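The merge and the sequential offsets can be modelled with plain arrays. This is a hypothetical sketch, not the overlayfs readdir code: struct model_dirent and the helpers are invented names, opaque directories are not modelled, and linear searches stand in for whatever lookup structure a real cache would use. The index of a name in the output array plays the role of its seek offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one entry read from a layer. */
struct model_dirent {
	const char *name;
	bool whiteout;		/* only meaningful for upper-layer entries */
};

static bool name_listed(const char **list, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return true;
	return false;
}

static bool whited_out(const struct model_dirent *upper, size_t nu,
		       const char *name)
{
	size_t i;

	for (i = 0; i < nu; i++)
		if (upper[i].whiteout && strcmp(upper[i].name, name) == 0)
			return true;
	return false;
}

/*
 * Upper entries first, then lower entries that are neither already listed
 * nor hidden by an upper whiteout; whiteouts themselves are never listed.
 * Returns the number of names stored in out[] (at most cap).
 */
static size_t merge_dir(const struct model_dirent *upper, size_t nu,
			const struct model_dirent *lower, size_t nl,
			const char **out, size_t cap)
{
	size_t i, n = 0;

	for (i = 0; i < nu && n < cap; i++)
		if (!upper[i].whiteout && !name_listed(out, n, upper[i].name))
			out[n++] = upper[i].name;

	for (i = 0; i < nl && n < cap; i++)
		if (!name_listed(out, n, lower[i].name) &&
		    !whited_out(upper, nu, lower[i].name))
			out[n++] = lower[i].name;
	return n;
}
```

With upper = {a, whiteout "gone", b} and lower = {b, gone, c}, the merged list is a, b, c, so "c" sits at offset 2. If "a" is then removed from the upper layer and the list is rebuilt, "c" moves to offset 1, which is the kind of drift a remembered offset can suffer.
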
+Readdir on directories that are not merged is simply handled by the
+underlying directory (upper or lower).
+
+
+Non-directories
+---------------
+
+Objects that are not directories (files, symlinks, device-special
+files etc) are presented either from the upper or lower filesystem as
+appropriate.  When a file in the lower filesystem is accessed in a way
+that requires write access, such as opening for write access or
+changing some metadata, the file is first copied from the lower
+filesystem to the upper filesystem (copy_up).  Note that creating a
+hard link also requires copy_up, though of course creating a symlink
+does not.
+
+The copy_up process first makes sure that the containing directory
+exists in the upper filesystem - creating it and any parents as
+necessary.  It then creates the object with the same metadata (owner,
+mode, mtime, symlink-target etc) and then if the object is a file, the
+data is copied from the lower to the upper filesystem.  Finally any
+extended attributes are copied up.
+
+Once the copy_up is complete, the overlay filesystem simply
+provides direct access to the newly created file in the upper
+filesystem - future operations on the file are barely noticed by the
+overlay filesystem (though an operation on the name of the file such as
+rename or unlink will of course be noticed and handled).
+
+Changes to underlying filesystems
+---------------------------------
+
+Offline changes to either the upper or the lower tree, made while the
+overlay is not mounted, are allowed.
+
+Changes to the underlying filesystems while part of a mounted overlay
+filesystem are not allowed.  This is not yet enforced, but will be in
+the future.

-- 

* Re: [PATCH 0/6] overlay filesystem prototype
  2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
                   ` (5 preceding siblings ...)
  2010-09-03 13:41 ` [PATCH 6/6] overlay: overlay filesystem documentation Miklos Szeredi
@ 2010-09-05 10:37 ` J. R. Okajima
  2010-09-05 11:44   ` Neil Brown
  6 siblings, 1 reply; 11+ messages in thread
From: J. R. Okajima @ 2010-09-05 10:37 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, vaurora, neilb, viro


Miklos Szeredi:
> Changes since the last version:
	:::
>  - get write ref on the upper layer on mount unless the overlay
>    itself is mounted read-only

I think it a good approach.
Although it may be harmless, write-ref will not be put when a user
executes,
- mount -o ro /overlay
- umount /overlay
It will be easy to fix by implementing s_op->remount().


>  - raise capabilities for copy up, dealing with whiteouts and opaque
>    directories.  Now the overlay works for non-root users as well

Interesting approach.
But is it safe? If a multi-threaded app or a signal handler shares
the credential, then it may gain an incorrect capability.


J. R. Okajima

* Re: [PATCH 0/6] overlay filesystem prototype
  2010-09-05 10:37 ` [PATCH 0/6] overlay filesystem prototype J. R. Okajima
@ 2010-09-05 11:44   ` Neil Brown
  2010-09-05 12:08     ` J. R. Okajima
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-09-05 11:44 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro

On Sun, 05 Sep 2010 19:37:10 +0900
"J. R. Okajima" <hooanon05@yahoo.co.jp> wrote:

> 
> Miklos Szeredi:
> > Changes since the last version:
> 	:::
> >  - get write ref on the upper layer on mount unless the overlay
> >    itself is mounted read-only
> 
> I think it a good approach.
> Although it may be harmless, write-ref will not be put when a user
> executes,
> - mount -o ro /overlay
             ^remount,   I assume
> - umount /overlay
> It will be easy to fix by implementing s_op->remount().

Something like this?

(I have a few other patches queued up, but haven't tested anything properly
yet).

NeilBrown

>From 3a9e1d4f07c5d6fd18cc165537107dd31233ec1f Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Sat, 4 Sep 2010 09:17:54 +1000
Subject: [PATCH] ovl: minimal remount support.

As overlayfs reflects the 'readonly' mount status in write-access to
the upper filesystem, we must handle remount and either drop or take
write access when the ro status changes.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/fs/overlayfs/overlayfs.c b/fs/overlayfs/overlayfs.c
index 0ddfeec..4e032e8 100644
--- a/fs/overlayfs/overlayfs.c
+++ b/fs/overlayfs/overlayfs.c
@@ -1685,8 +1685,28 @@ static void ovl_put_super(struct super_block *sb)
 	kfree(ufs);
 }
 
+static int ovl_remount_fs(struct super_block *sb, int *flagsp, char *data)
+{
+	int flags = *flagsp;
+	struct ovl_fs *ufs = sb->s_fs_info;
+
+	/* When remounting rw or ro, we need to adjust the write access to the
+	 * upper fs.
+	 */
+	if (((flags ^ sb->s_flags) & MS_RDONLY) == 0)
+		/* No change to readonly status */
+		return 0;
+
+	if (flags & MS_RDONLY) {
+		mnt_drop_write(ufs->upper_mnt);
+		return 0;
+	} else
+		return mnt_want_write(ufs->upper_mnt);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.put_super	= ovl_put_super,
+	.remount_fs	= ovl_remount_fs,
 };
 
 struct ovl_config {

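The test at the top of ovl_remount_fs() uses XOR to detect a change in the read-only bit: (flags ^ sb->s_flags) has a bit set only where the old and new flags differ. Below is a hypothetical userspace model of the same bookkeeping; struct model_ovl and model_remount are invented names, and a plain counter stands in for the write access taken with mnt_want_write() and released with mnt_drop_write(). In the kernel the VFS, not the filesystem, updates sb->s_flags after ->remount_fs returns, so the model performs that step itself.

```c
#include <sys/mount.h>	/* MS_RDONLY */

/* Hypothetical stand-in for the overlay superblock state. */
struct model_ovl {
	unsigned long s_flags;	/* current mount flags */
	int write_refs;		/* write access held on the upper mount */
};

static int model_remount(struct model_ovl *ofs, unsigned long flags)
{
	/* XOR keeps only the bits that differ between old and new flags. */
	if (((flags ^ ofs->s_flags) & MS_RDONLY) == 0)
		return 0;		/* read-only status unchanged */

	if (flags & MS_RDONLY)
		ofs->write_refs--;	/* rw -> ro: like mnt_drop_write() */
	else
		ofs->write_refs++;	/* ro -> rw: like mnt_want_write() */

	/* Done by the VFS in the kernel, after ->remount_fs returns. */
	ofs->s_flags = (ofs->s_flags & ~(unsigned long)MS_RDONLY) |
		       (flags & MS_RDONLY);
	return 0;
}
```

Repeating a remount with the same read-only setting leaves the counter untouched, so write access is taken or dropped exactly once per actual transition.
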
* Re: [PATCH 0/6] overlay filesystem prototype
  2010-09-05 11:44   ` Neil Brown
@ 2010-09-05 12:08     ` J. R. Okajima
  0 siblings, 0 replies; 11+ messages in thread
From: J. R. Okajima @ 2010-09-05 12:08 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, vaurora, viro


Neil Brown:
> > Although it may be harmless, write-ref will not be put when a user
> > executes,
> > - mount -o ro /overlay
>              ^remount,   I assume

Right.


> Something like this?

Exactly.


J. R. Okajima

* Re: [PATCH 2/6] vfs: make i_op->permission take a dentry instead of an inode
  2010-09-03 13:41 ` [PATCH 2/6] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
@ 2010-09-17 13:14   ` Aneesh Kumar K. V
  0 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K. V @ 2010-09-17 13:14 UTC (permalink / raw)
  To: Miklos Szeredi, linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro

On Fri, 03 Sep 2010 15:41:18 +0200, Miklos Szeredi <miklos@szeredi.hu> wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Like most other inode operations ->permission() should take a dentry
> instead of an inode.  This is necessary for filesystems which operate
> on names not on inodes.
> 

This change will also help the 9P patch series I am doing:
http://article.gmane.org/gmane.linux.kernel/1032788

Currently ACL values are fetched from the server as a part of inode
initialization. We can't do it in inode operations->permission because
we need a dentry to do 9P operations. Having inode operations->permission
take a dentry instead of an inode helps there.

-aneesh

Thread overview: 11+ messages
2010-09-03 13:41 [PATCH 0/6] overlay filesystem prototype Miklos Szeredi
2010-09-03 13:41 ` [PATCH 1/6] vfs: implement open "forwarding" Miklos Szeredi
2010-09-03 13:41 ` [PATCH 2/6] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
2010-09-17 13:14   ` Aneesh Kumar K. V
2010-09-03 13:41 ` [PATCH 3/6] vfs: add flag to allow rename to same inode Miklos Szeredi
2010-09-03 13:41 ` [PATCH 4/6] vfs: export do_splice_direct() to modules Miklos Szeredi
2010-09-03 13:41 ` [PATCH 5/6] overlay: hybrid overlay filesystem prototype Miklos Szeredi
2010-09-03 13:41 ` [PATCH 6/6] overlay: overlay filesystem documentation Miklos Szeredi
2010-09-05 10:37 ` [PATCH 0/6] overlay filesystem prototype J. R. Okajima
2010-09-05 11:44   ` Neil Brown
2010-09-05 12:08     ` J. R. Okajima
