All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/20] Overlayfs inodes index
@ 2017-06-07  7:51 Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 01/20] vfs: introduce inode 'inuse' lock Amir Goldstein
                   ` (22 more replies)
  0 siblings, 23 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Miklos,

This set is an independent series a top of current overlayfs-next.
I yanked all the dependencies from previous postings (consistent d_ino,
dir_lock, verify_lower) and included everything needed in this posting.

This work introcuding the inodes index opt-in feature, which provides:
- Hardlinks are not broken on copy up
- Infrastructure for overlayfs NFS export

Hardlink copy up tests including concurrent copy up of lower hardlinks
are available on my xfstests dev branch [1].

There are some more TODO items before this is ready for v4.13, but I'd
love to hear what you think about the direction this has taken so far.

Thanks,
Amir.

TODO:
- Consistency of lower and upper hardlinks (*)
- Cleanup stale and orphan index entries on mount (**)
- Document the inodes index feature

(*) When any lower hardlink has been copied up, we get the indexed
    upper inode on lookup of all lower hardlinks and since they all
    share the same overlay inode, they have the same 'realinode'.
    Opening the lower hardlinks for read gives that lower inode
    and not the indexed upper 'realinode'. The tests in [1] demostrate
    this problem.
(**) A stale index entry has a missing or stale 'origin' xattr. An
     orphan index entry has nlink 1 and all lower hardlinks are covered.

v2:
- Rebase on top of overalyfs-next with all dependencies
- Remove 'verify_lower' dependency and simplify
- Lookup index dentry on ovl_lookup()
- Don't hash broken overlay hardlinks by origin
- Fix nlink count of overlay inodes
- Constant st_ino for indexed hardlinks

v1:
- Upper/work dir exclusive lock
- Introduce verify_lower mount option
- Hash overlay inodes by origin
- Introduce inodes index feature
- Copy up hardlinks using inodes index

[1] https://github.com/amir73il/xfstests/commits/overlayfs-devel

Amir Goldstein (20):
  vfs: introduce inode 'inuse' lock
  ovl: get exclusive ownership on upper/work dirs
  ovl: relax same fs constrain for ovl_check_origin()
  ovl: generalize ovl_create_workdir()
  ovl: introduce the inodes index dir feature
  ovl: verify upper root dir matches lower root dir
  ovl: verify index dir matches upper dir
  ovl: lookup index entry for non-dir
  ovl: move inode helpers to inode.c
  ovl: use ovl_inode_init() for initializing new inode
  ovl: hash overlay non-dir inodes by copy up origin inode
  ovl: fix nlink leak in ovl_rename()
  ovl: adjust overlay inode nlink for indexed inodes
  ovl: defer upper dir lock to tempfile link
  ovl: factor out ovl_copy_up_inode() helper
  ovl: generalize ovl_copy_up_locked() using actors
  ovl: generalize ovl_copy_up_one() using actors
  ovl: implement index dir copy up method
  ovl: handle race of concurrent lower hardlinks copy up
  ovl: constant inode number for hardlinks

 fs/inode.c               |  74 ++++++
 fs/overlayfs/Kconfig     |  20 ++
 fs/overlayfs/copy_up.c   | 648 ++++++++++++++++++++++++++++++++++++++---------
 fs/overlayfs/dir.c       |  23 +-
 fs/overlayfs/inode.c     | 138 +++++++++-
 fs/overlayfs/namei.c     | 241 +++++++++++++++---
 fs/overlayfs/overlayfs.h |  52 ++--
 fs/overlayfs/ovl_entry.h |   7 +
 fs/overlayfs/super.c     | 194 ++++++++++++--
 fs/overlayfs/util.c      |  45 ++--
 include/linux/fs.h       |  16 ++
 11 files changed, 1235 insertions(+), 223 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 01/20] vfs: introduce inode 'inuse' lock
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 02/20] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Added an i_state flag I_INUSE and helpers to set/clear/test and
wait until it is cleared.

The 'inuse' lock is an 'advisory' inode lock, that can be used to extend
exclusive create protection beyond parent->i_mutex lock among cooperating
users.

This is going to be used by overlayfs to get exclusive ownership on upper
and work dirs among overlayfs mounts and to sychronize concurrent copy up
of lower hardlinks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/inode.c         | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h | 16 ++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index db5914783a71..546cd503148a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2120,3 +2120,77 @@ struct timespec current_time(struct inode *inode)
 	return timespec_trunc(now, inode->i_sb->s_time_gran);
 }
 EXPORT_SYMBOL(current_time);
+
+/**
+ * inode_inuse_trylock - try to get an exclusive 'inuse' lock on inode
+ * @inode: inode being locked
+ *
+ * The 'inuse' lock is an 'advisory' lock that can be used to extend exclusive
+ * create protection beyond parent->i_mutex lock among cooperating users.
+ * Used by overlayfs to get exclusive ownership on upper and work dirs among
+ * overlayfs mounts.
+ *
+ * Caller must hold a reference to inode to prevent it from being freed while
+ * it is marked inuse.
+ *
+ * Return true if I_INUSE flag was set by this call.
+ */
+bool inode_inuse_trylock(struct inode *inode)
+{
+	bool locked = false;
+
+	spin_lock(&inode->i_lock);
+	if (!WARN_ON(!atomic_read(&inode->i_count)) &&
+	    !WARN_ON(inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) &&
+	    !(inode->i_state & I_INUSE)) {
+		inode->i_state |= I_INUSE;
+		locked = true;
+	}
+	spin_unlock(&inode->i_lock);
+	return locked;
+}
+EXPORT_SYMBOL(inode_inuse_trylock);
+
+/**
+ * inode_inuse_unlock - release exclusive 'inuse' lock
+ * @inode:	inode inuse to unlock
+ *
+ * Clear the I_INUSE state and wake up any waiters.
+ *
+ * Caller must hold a reference to inode and must have successfully marked
+ * the inode 'inuse' prior to this call.
+ */
+void inode_inuse_unlock(struct inode *inode)
+{
+	spin_lock(&inode->i_lock);
+	WARN_ON(!atomic_read(&inode->i_count));
+	WARN_ON(inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE));
+	WARN_ON(!(inode->i_state & I_INUSE));
+	inode->i_state &= ~I_INUSE;
+	smp_mb();
+	wake_up_bit(&inode->i_state, __I_INUSE);
+	spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(inode_inuse_unlock);
+
+/**
+ * wait_on_inode_inuse - wait for release of exclusive 'inuse' lock
+ * @inode:	inode inuse to wait on
+ *
+ * Can be used in combination with parent i_mutex, to protect access to a
+ * newly created inode, until that inode has been properly initialized by
+ * the user that grabbed the 'inuse' exclusive lock after creating the inode.
+ *
+ * Caller must hold a reference to inode to prevent waiting on an inode that
+ * is not 'inuse' and is already being freed.
+ *
+ * Return 0 if the 'inuse' bit is clear or has been cleared while waiting.
+ */
+int wait_on_inode_inuse(struct inode *inode, unsigned mode)
+{
+	if (WARN_ON(!atomic_read(&inode->i_count)))
+		return -EINVAL;
+	might_sleep();
+	return wait_on_bit(&inode->i_state, __I_INUSE, mode);
+}
+EXPORT_SYMBOL(wait_on_inode_inuse);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aab10f93ef23..e064612b45ef 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1929,6 +1929,12 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
  *			wb stat updates to grab mapping->tree_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
+ * I_INUSE		An 'advisory' bit to get exclusive ownership on inode
+ *			using inode_inuse_trylock().  It can be used to extend
+ *			exclusive create protection beyond parent->i_mutex lock.
+ *			Used by overlayfs to get exclusive ownership on upper
+ *			and work dirs among overlayfs mounts.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -1949,6 +1955,8 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 #define __I_DIRTY_TIME_EXPIRED	12
 #define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
 #define I_WB_SWITCH		(1 << 13)
+#define __I_INUSE		14
+#define I_INUSE			(1 << __I_INUSE)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
@@ -3258,5 +3266,13 @@ static inline bool dir_relax_shared(struct inode *inode)
 
 extern bool path_noexec(const struct path *path);
 extern void inode_nohighmem(struct inode *inode);
+extern bool inode_inuse_trylock(struct inode *inode);
+extern void inode_inuse_unlock(struct inode *inode);
+extern int wait_on_inode_inuse(struct inode *inode, unsigned mode);
+
+static inline bool inode_inuse(struct inode *inode)
+{
+	return inode->i_state & I_INUSE;
+}
 
 #endif /* _LINUX_FS_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 02/20] ovl: get exclusive ownership on upper/work dirs
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 01/20] vfs: introduce inode 'inuse' lock Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 03/20] ovl: relax same fs constrain for ovl_check_origin() Amir Goldstein
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Bad things can happen if several concurrent overlay mounts try to
use the same upperdir/workdir path.

Try to get the 'inuse' advisory lock on upperdir and workdir.
Fail mount if another overlay mount instance or another user
holds the 'inuse' lock on these directories.

Note that this provides no protection for concurrent overlay
mount that use overlapping (i.e. descendant) upper/work dirs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/ovl_entry.h |  3 +++
 fs/overlayfs/super.c     | 41 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 34bc4a9f5c61..b0e7ee2ae398 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -21,6 +21,9 @@ struct ovl_fs {
 	struct vfsmount *upper_mnt;
 	unsigned numlower;
 	struct vfsmount **lower_mnt;
+	/* workbasedir is the path at workdir= mount option */
+	struct dentry *workbasedir;
+	/* workdir is the 'work' directory under workbasedir */
 	struct dentry *workdir;
 	long namelen;
 	/* pathnames of lower and upper dirs, for show_options */
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4882ffb37bae..476f021baf2a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -165,12 +165,28 @@ static const struct dentry_operations ovl_reval_dentry_operations = {
 	.d_weak_revalidate = ovl_dentry_weak_revalidate,
 };
 
+/* Get exclusive ownership on upper/work dir among overlay mounts */
+static bool ovl_dir_lock(struct dentry *dentry)
+{
+	return inode_inuse_trylock(d_inode(dentry));
+}
+
+static void ovl_dir_unlock(struct dentry *dentry)
+{
+	if (dentry)
+		inode_inuse_unlock(d_inode(dentry));
+}
+
 static void ovl_put_super(struct super_block *sb)
 {
 	struct ovl_fs *ufs = sb->s_fs_info;
 	unsigned i;
 
 	dput(ufs->workdir);
+	ovl_dir_unlock(ufs->workbasedir);
+	dput(ufs->workbasedir);
+	if (ufs->upper_mnt)
+		ovl_dir_unlock(ufs->upper_mnt->mnt_root);
 	mntput(ufs->upper_mnt);
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
@@ -788,9 +804,15 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (err)
 			goto out_put_upperpath;
 
+		err = -EBUSY;
+		if (!ovl_dir_lock(upperpath.dentry)) {
+			pr_err("overlayfs: upperdir is in-use by another mount\n");
+			goto out_put_upperpath;
+		}
+
 		err = ovl_mount_dir(ufs->config.workdir, &workpath);
 		if (err)
-			goto out_put_upperpath;
+			goto out_unlock_upperdentry;
 
 		err = -EINVAL;
 		if (upperpath.mnt != workpath.mnt) {
@@ -801,12 +823,20 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			pr_err("overlayfs: workdir and upperdir must be separate subtrees\n");
 			goto out_put_workpath;
 		}
+
+		err = -EBUSY;
+		if (!ovl_dir_lock(workpath.dentry)) {
+			pr_err("overlayfs: workdir is in-use by another mount\n");
+			goto out_put_workpath;
+		}
+
+		ufs->workbasedir = workpath.dentry;
 		sb->s_stack_depth = upperpath.mnt->mnt_sb->s_stack_depth;
 	}
 	err = -ENOMEM;
 	lowertmp = kstrdup(ufs->config.lowerdir, GFP_KERNEL);
 	if (!lowertmp)
-		goto out_put_workpath;
+		goto out_unlock_workdentry;
 
 	err = -EINVAL;
 	stacklen = ovl_split_lowerdirs(lowertmp);
@@ -849,6 +879,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			pr_err("overlayfs: failed to clone upperpath\n");
 			goto out_put_lowerpath;
 		}
+
 		/* Don't inherit atime flags */
 		ufs->upper_mnt->mnt_flags &= ~(MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME);
 
@@ -971,7 +1002,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	mntput(upperpath.mnt);
 	for (i = 0; i < numlower; i++)
 		mntput(stack[i].mnt);
-	path_put(&workpath);
+	mntput(workpath.mnt);
 	kfree(lowertmp);
 
 	if (upperpath.dentry) {
@@ -1011,8 +1042,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(stack);
 out_free_lowertmp:
 	kfree(lowertmp);
+out_unlock_workdentry:
+	ovl_dir_unlock(workpath.dentry);
 out_put_workpath:
 	path_put(&workpath);
+out_unlock_upperdentry:
+	ovl_dir_unlock(upperpath.dentry);
 out_put_upperpath:
 	path_put(&upperpath);
 out_free_config:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 03/20] ovl: relax same fs constrain for ovl_check_origin()
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 01/20] vfs: introduce inode 'inuse' lock Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 02/20] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 04/20] ovl: generalize ovl_create_workdir() Amir Goldstein
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

For the case of all layers not on the same fs, try to decode the copy up
origin file handle on any of the lower layers.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c | 43 +++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 1ec1499fc672..4ca6061f7bfa 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -272,28 +272,24 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
 			    struct path **stackp, unsigned int *ctrp)
 {
-	struct super_block *same_sb = ovl_same_sb(dentry->d_sb);
 	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
 	struct vfsmount *mnt;
-	struct dentry *origin;
+	struct dentry *origin = NULL;
+	int i;
 
-	if (!same_sb || !roe->numlower)
-		return 0;
 
-       /*
-	* Since all layers are on the same fs, we use the first layer for
-	* decoding the file handle.  We may get a disconnected dentry,
-	* which is fine, because we only need to hold the origin inode in
-	* cache and use its inode number.  We may even get a connected dentry,
-	* that is not under the first layer's root.  That is also fine for
-	* using it's inode number - it's the same as if we held a reference
-	* to a dentry in first layer that was moved under us.
-	*/
-	mnt = roe->lowerstack[0].mnt;
-
-	origin = ovl_get_origin(upperdentry, mnt);
-	if (IS_ERR_OR_NULL(origin))
-		return PTR_ERR(origin);
+	for (i = 0; i < roe->numlower; i++) {
+		mnt = roe->lowerstack[i].mnt;
+		origin = ovl_get_origin(upperdentry, mnt);
+		if (IS_ERR(origin))
+			return PTR_ERR(origin);
+
+		if (origin)
+			break;
+	}
+
+	if (!origin)
+		return 0;
 
 	BUG_ON(*stackp || *ctrp);
 	*stackp = kmalloc(sizeof(struct path), GFP_TEMPORARY);
@@ -301,6 +297,7 @@ static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
 		dput(origin);
 		return -ENOMEM;
 	}
+
 	**stackp = (struct path) { .dentry = origin, .mnt = mnt };
 	*ctrp = 1;
 
@@ -375,6 +372,16 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 		if (upperdentry && !d.is_dir) {
 			BUG_ON(!d.stop || d.redirect);
+			/*
+			 * Lookup copy up origin by decoding origin file handle.
+			 * We may get a disconnected dentry, which is fine,
+			 * because we only need to hold the origin inode in
+			 * cache and use its inode number.  We may even get a
+			 * connected dentry, that is not under any of the lower
+			 * layers root.  That is also fine for using it's inode
+			 * number - it's the same as if we held a reference
+			 * to a dentry in lower layer that was moved under us.
+			 */
 			err = ovl_check_origin(dentry, upperdentry,
 					       &stack, &ctr);
 			if (err)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 04/20] ovl: generalize ovl_create_workdir()
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (2 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 03/20] ovl: relax same fs constrain for ovl_check_origin() Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 05/20] ovl: introduce the inodes index dir feature Amir Goldstein
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Pass in the subdir name to create and specify if subdir is persistent
or if it should be cleaned up on every mount.

Move fallback to readonly mount on failure to create dir and print of error
message into the helper.

This function is going to be used for creating the persistent 'index' dir
under workbasedir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/super.c | 45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 476f021baf2a..5114627799e9 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -395,22 +395,25 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 
 #define OVL_WORKDIR_NAME "work"
 
-static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
-					 struct dentry *dentry)
+static struct dentry *ovl_workdir_create(struct super_block *sb,
+					 struct ovl_fs *ufs,
+					 struct dentry *dentry,
+					 const char *name, bool persist)
 {
-	struct inode *dir = dentry->d_inode;
-	struct dentry *work;
+	struct vfsmount *mnt = ufs->upper_mnt;
+	struct inode *dir = NULL;
+	struct dentry *work = NULL;
 	int err;
 	bool retried = false;
 
 	err = mnt_want_write(mnt);
 	if (err)
-		return ERR_PTR(err);
+		goto out_err;
 
+	dir = d_inode(dentry);
 	inode_lock_nested(dir, I_MUTEX_PARENT);
 retry:
-	work = lookup_one_len(OVL_WORKDIR_NAME, dentry,
-			      strlen(OVL_WORKDIR_NAME));
+	work = lookup_one_len(name, dentry, strlen(name));
 
 	if (!IS_ERR(work)) {
 		struct iattr attr = {
@@ -423,6 +426,9 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
 			if (retried)
 				goto out_dput;
 
+			if (persist)
+				goto out_unlock;
+
 			retried = true;
 			ovl_workdir_cleanup(dir, mnt, work, 0);
 			dput(work);
@@ -462,16 +468,25 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
 		inode_unlock(work->d_inode);
 		if (err)
 			goto out_dput;
+	} else {
+		err = PTR_ERR(work);
+		goto out_err;
 	}
+
 out_unlock:
-	inode_unlock(dir);
+	if (dir)
+		inode_unlock(dir);
 	mnt_drop_write(mnt);
 
 	return work;
 
 out_dput:
 	dput(work);
-	work = ERR_PTR(err);
+out_err:
+	pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
+		ufs->config.workdir, name, -err);
+	sb->s_flags |= MS_RDONLY;
+	work = NULL;
 	goto out_unlock;
 }
 
@@ -885,14 +900,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 		sb->s_time_gran = ufs->upper_mnt->mnt_sb->s_time_gran;
 
-		ufs->workdir = ovl_workdir_create(ufs->upper_mnt, workpath.dentry);
+		ufs->workdir = ovl_workdir_create(sb, ufs, workpath.dentry,
+						  OVL_WORKDIR_NAME, false);
 		err = PTR_ERR(ufs->workdir);
-		if (IS_ERR(ufs->workdir)) {
-			pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
-				ufs->config.workdir, OVL_WORKDIR_NAME, -err);
-			sb->s_flags |= MS_RDONLY;
-			ufs->workdir = NULL;
-		}
+		if (IS_ERR(ufs->workdir))
+			goto out_put_upper_mnt;
 
 		/*
 		 * Upper should support d_type, else whiteouts are visible.
@@ -1035,6 +1047,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(ufs->lower_mnt);
 out_put_workdir:
 	dput(ufs->workdir);
+out_put_upper_mnt:
 	mntput(ufs->upper_mnt);
 out_put_lowerpath:
 	for (i = 0; i < numlower; i++)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 05/20] ovl: introduce the inodes index dir feature
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (3 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 04/20] ovl: generalize ovl_create_workdir() Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 06/20] ovl: verify upper root dir matches lower root dir Amir Goldstein
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Create the index dir on mount. The index dir will contain hardlinks to
upper inodes, named after the hex representation of their origin lower
inodes.

The index dir is going to be used to prevent breaking lower hardlinks
on copy up and to implement overlayfs NFS export.

Because the feature is not fully backward compat, enabling the feature
is opt-in by config/module/mount option.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/Kconfig     | 20 ++++++++++++++++++
 fs/overlayfs/copy_up.c   | 10 ++++-----
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/ovl_entry.h |  3 +++
 fs/overlayfs/super.c     | 54 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/overlayfs/util.c      | 15 ++++++++++++++
 6 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index c0c9683934b7..b7241f273c36 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -23,3 +23,23 @@ config OVERLAY_FS_REDIRECT_DIR
 	  Note, that redirects are not backward compatible.  That is, mounting
 	  an overlay which has redirects on a kernel that doesn't support this
 	  feature will have unexpected results.
+
+config OVERLAY_FS_INDEX
+	bool "Overlayfs: turn on inodes index feature by default"
+	depends on OVERLAY_FS
+	help
+	  If this config option is enabled then overlay filesystems will use
+	  the inodes index dir to map lower inodes to upper inodes by default.
+	  In this case it is still possible to turn off index globally with the
+	  "index=off" module option or on a filesystem instance basis with the
+	  "index=off" mount option.
+
+	  The inodes index feature prevents breaking of lower hardlinks on copy
+	  up and enables NFS export.
+
+	  Note, that the inodes index feature is read-only backward compatible.
+	  That is, mounting an overlay which has an index dir on a kernel that
+	  doesn't support this feature read-only, will not have any negative
+	  outcomes.  However, mounting the same overlay with an old kernel
+	  read-write and then mounting it again with a new kernel, will have
+	  unexpected results.
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 7a44533f4bbf..d3047fdc0377 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -233,12 +233,13 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
-static struct ovl_fh *ovl_encode_fh(struct dentry *lower, uuid_be *uuid)
+static struct ovl_fh *ovl_encode_fh(struct dentry *lower)
 {
 	struct ovl_fh *fh;
 	int fh_type, fh_len, dwords;
 	void *buf;
 	int buflen = MAX_HANDLE_SZ;
+	uuid_be *uuid = (uuid_be *) &lower->d_sb->s_uuid;
 
 	buf = kmalloc(buflen, GFP_TEMPORARY);
 	if (!buf)
@@ -283,8 +284,6 @@ static struct ovl_fh *ovl_encode_fh(struct dentry *lower, uuid_be *uuid)
 static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 			  struct dentry *upper)
 {
-	struct super_block *sb = lower->d_sb;
-	uuid_be *uuid = (uuid_be *) &sb->s_uuid;
 	const struct ovl_fh *fh = NULL;
 	int err;
 
@@ -293,9 +292,8 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	 * so we can use the overlay.origin xattr to distignuish between a copy
 	 * up and a pure upper inode.
 	 */
-	if (sb->s_export_op && sb->s_export_op->fh_to_dentry &&
-	    uuid_be_cmp(*uuid, NULL_UUID_BE)) {
-		fh = ovl_encode_fh(lower, uuid);
+	if (ovl_can_decode_fh(lower->d_sb)) {
+		fh = ovl_encode_fh(lower);
 		if (IS_ERR(fh))
 			return PTR_ERR(fh);
 	}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 513e25e56eed..9ec93063c3a8 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -191,6 +191,8 @@ void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
 struct super_block *ovl_same_sb(struct super_block *sb);
+bool ovl_can_decode_fh(struct super_block *sb);
+struct dentry *ovl_indexdir(struct super_block *sb);
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
 bool ovl_dentry_remote(struct dentry *dentry);
 bool ovl_dentry_weird(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index b0e7ee2ae398..ddd4b0a874a9 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -14,6 +14,7 @@ struct ovl_config {
 	char *workdir;
 	bool default_permissions;
 	bool redirect_dir;
+	bool index;
 };
 
 /* private information held for overlayfs's superblock */
@@ -25,6 +26,8 @@ struct ovl_fs {
 	struct dentry *workbasedir;
 	/* workdir is the 'work' directory under workbasedir */
 	struct dentry *workdir;
+	/* index directory listing overlay inodes by origin file handle */
+	struct dentry *indexdir;
 	long namelen;
 	/* pathnames of lower and upper dirs, for show_options */
 	struct ovl_config config;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 5114627799e9..02edc5ab1df3 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -34,6 +34,11 @@ module_param_named(redirect_dir, ovl_redirect_dir_def, bool, 0644);
 MODULE_PARM_DESC(ovl_redirect_dir_def,
 		 "Default to on or off for the redirect_dir feature");
 
+static bool ovl_index_def = IS_ENABLED(CONFIG_OVERLAY_FS_INDEX);
+module_param_named(index, ovl_index_def, bool, 0644);
+MODULE_PARM_DESC(ovl_index_def,
+		 "Default to on or off for the inodes index feature");
+
 static void ovl_dentry_release(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
@@ -182,6 +187,7 @@ static void ovl_put_super(struct super_block *sb)
 	struct ovl_fs *ufs = sb->s_fs_info;
 	unsigned i;
 
+	dput(ufs->indexdir);
 	dput(ufs->workdir);
 	ovl_dir_unlock(ufs->workbasedir);
 	dput(ufs->workbasedir);
@@ -265,6 +271,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	if (ufs->config.redirect_dir != ovl_redirect_dir_def)
 		seq_printf(m, ",redirect_dir=%s",
 			   ufs->config.redirect_dir ? "on" : "off");
+	if (ufs->config.index != ovl_index_def)
+		seq_printf(m, ",index=%s",
+			   ufs->config.index ? "on" : "off");
 	return 0;
 }
 
@@ -294,6 +303,8 @@ enum {
 	OPT_DEFAULT_PERMISSIONS,
 	OPT_REDIRECT_DIR_ON,
 	OPT_REDIRECT_DIR_OFF,
+	OPT_INDEX_ON,
+	OPT_INDEX_OFF,
 	OPT_ERR,
 };
 
@@ -304,6 +315,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_DEFAULT_PERMISSIONS,	"default_permissions"},
 	{OPT_REDIRECT_DIR_ON,		"redirect_dir=on"},
 	{OPT_REDIRECT_DIR_OFF,		"redirect_dir=off"},
+	{OPT_INDEX_ON,			"index=on"},
+	{OPT_INDEX_OFF,			"index=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -376,6 +389,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->redirect_dir = false;
 			break;
 
+		case OPT_INDEX_ON:
+			config->index = true;
+			break;
+
+		case OPT_INDEX_OFF:
+			config->index = false;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;
@@ -394,6 +415,7 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 }
 
 #define OVL_WORKDIR_NAME "work"
+#define OVL_INDEXDIR_NAME "index"
 
 static struct dentry *ovl_workdir_create(struct super_block *sb,
 					 struct ovl_fs *ufs,
@@ -785,6 +807,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	init_waitqueue_head(&ufs->copyup_wq);
 	ufs->config.redirect_dir = ovl_redirect_dir_def;
+	ufs->config.index = ovl_index_def;
 	err = ovl_parse_opt((char *) data, &ufs->config);
 	if (err)
 		goto out_free_config;
@@ -947,6 +970,13 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			} else {
 				vfs_removexattr(ufs->workdir, OVL_XATTR_OPAQUE);
 			}
+
+			/* Check if upper/work fs supports file handles */
+			if (ufs->config.index &&
+			    !ovl_can_decode_fh(ufs->workdir->d_sb)) {
+				ufs->config.index = false;
+				pr_warn("overlayfs: upper fs does not support NFS export.\n");
+			}
 		}
 	}
 
@@ -976,6 +1006,15 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			ufs->same_sb = mnt->mnt_sb;
 		else if (ufs->same_sb != mnt->mnt_sb)
 			ufs->same_sb = NULL;
+
+		/*
+		 * The inodes index feature needs to encode and decode file
+		 * handles, so it requires that all layers support NFS export.
+		 */
+		if (ufs->config.index && !ovl_can_decode_fh(mnt->mnt_sb)) {
+			ufs->config.index = false;
+			pr_warn("overlayfs: lower fs does not support NFS export.\n");
+		}
 	}
 
 	/* If the upper fs is nonexistent, we mark overlayfs r/o too */
@@ -984,6 +1023,17 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else if (ufs->upper_mnt->mnt_sb != ufs->same_sb)
 		ufs->same_sb = NULL;
 
+	if (!(sb->s_flags & MS_RDONLY) && ufs->config.index) {
+		ufs->indexdir = ovl_workdir_create(sb, ufs, workpath.dentry,
+						   OVL_INDEXDIR_NAME, true);
+		err = PTR_ERR(ufs->indexdir);
+		if (IS_ERR(ufs->indexdir))
+			goto out_put_lower_mnt;
+
+		if (!ufs->indexdir)
+			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
+	}
+
 	if (remote)
 		sb->s_d_op = &ovl_reval_dentry_operations;
 	else
@@ -991,7 +1041,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	ufs->creator_cred = cred = prepare_creds();
 	if (!cred)
-		goto out_put_lower_mnt;
+		goto out_put_indexdir;
 
 	/* Never override disk quota limits or use reserved space */
 	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
@@ -1041,6 +1091,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(oe);
 out_put_cred:
 	put_cred(ufs->creator_cred);
+out_put_indexdir:
+	dput(ufs->indexdir);
 out_put_lower_mnt:
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 809048913889..8e2f308056f1 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -12,6 +12,8 @@
 #include <linux/slab.h>
 #include <linux/cred.h>
 #include <linux/xattr.h>
+#include <linux/exportfs.h>
+#include <linux/uuid.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -47,6 +49,19 @@ struct super_block *ovl_same_sb(struct super_block *sb)
 	return ofs->same_sb;
 }
 
+bool ovl_can_decode_fh(struct super_block *sb)
+{
+	return (sb->s_export_op && sb->s_export_op->fh_to_dentry &&
+		uuid_be_cmp(*(uuid_be *) &sb->s_uuid, NULL_UUID_BE));
+}
+
+struct dentry *ovl_indexdir(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	return ofs->indexdir;
+}
+
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower)
 {
 	size_t size = offsetof(struct ovl_entry, lowerstack[numlower]);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 06/20] ovl: verify upper root dir matches lower root dir
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (4 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 05/20] ovl: introduce the inodes index dir feature Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 07/20] ovl: verify index dir matches upper dir Amir Goldstein
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

When inodes index feature is enabled, verify that the file handle stored
in upper root dir matches the lower root dir or fail to mount.

If upper root dir has no stored file handle, encode and store the lower
root dir file handle in overlay.origin xattr.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   |  2 +-
 fs/overlayfs/namei.c     | 84 +++++++++++++++++++++++++++++++++++++++++-------
 fs/overlayfs/overlayfs.h |  3 ++
 fs/overlayfs/super.c     | 42 ++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 12 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index d3047fdc0377..ee0b894e1d99 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -233,7 +233,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
-static struct ovl_fh *ovl_encode_fh(struct dentry *lower)
+struct ovl_fh *ovl_encode_fh(struct dentry *lower)
 {
 	struct ovl_fh *fh;
 	int fh_type, fh_len, dwords;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 4ca6061f7bfa..1b8017a57a02 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -88,13 +88,11 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
 	return 1;
 }
 
-static struct dentry *ovl_get_origin(struct dentry *dentry,
-				     struct vfsmount *mnt)
+static struct ovl_fh *ovl_get_origin_fh(struct dentry *dentry,
+					struct vfsmount *mnt)
 {
 	int res;
 	struct ovl_fh *fh = NULL;
-	struct dentry *origin = NULL;
-	int bytes;
 
 	res = vfs_getxattr(dentry, OVL_XATTR_ORIGIN, NULL, 0);
 	if (res < 0) {
@@ -106,7 +104,7 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	if (res == 0)
 		return NULL;
 
-	fh  = kzalloc(res, GFP_TEMPORARY);
+	fh = kzalloc(res, GFP_TEMPORARY);
 	if (!fh)
 		return ERR_PTR(-ENOMEM);
 
@@ -129,8 +127,6 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	    (fh->flags & OVL_FH_FLAG_BIG_ENDIAN) != OVL_FH_FLAG_CPU_ENDIAN)
 		goto out;
 
-	bytes = (fh->len - offsetof(struct ovl_fh, fid));
-
 	/*
 	 * Make sure that the stored uuid matches the uuid of the lower
 	 * layer where file handle will be decoded.
@@ -138,6 +134,31 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	if (uuid_be_cmp(fh->uuid, *(uuid_be *) &mnt->mnt_sb->s_uuid))
 		goto out;
 
+	return fh;
+
+out:
+	kfree(fh);
+	return NULL;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed to get origin (%i)\n", res);
+	goto out;
+invalid:
+	pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", res, fh);
+	goto out;
+}
+
+static struct dentry *ovl_get_origin(struct dentry *dentry,
+				     struct vfsmount *mnt)
+{
+	struct dentry *origin = NULL;
+	struct ovl_fh *fh = ovl_get_origin_fh(dentry, mnt);
+	int bytes;
+
+	if (!fh)
+		return NULL;
+
+	bytes = (fh->len - offsetof(struct ovl_fh, fid));
 	origin = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
 				    bytes >> 2, (int)fh->type,
 				    ovl_acceptable, NULL);
@@ -159,11 +180,8 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	kfree(fh);
 	return origin;
 
-fail:
-	pr_warn_ratelimited("overlayfs: failed to get origin (%i)\n", res);
-	goto out;
 invalid:
-	pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", res, fh);
+	pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", fh->len, fh);
 	goto out;
 }
 
@@ -305,6 +323,50 @@ static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
 }
 
 /*
+ * Verify that an inode matches the origin file handle stored in upper inode.
+ * Return 0 on match, -ESTALE on mismatch, < 0 on error.
+ */
+int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
+		      struct dentry *origin)
+{
+	struct inode *inode = NULL;
+	struct ovl_fh *fh = NULL;
+	struct ovl_fh *ofh = ovl_get_origin_fh(dentry, mnt);
+	int err;
+
+	/* Fail verification with no warning if no valid origin fh */
+	if (!ofh)
+		return -ENODATA;
+
+	if (IS_ERR(ofh)) {
+		err = PTR_ERR(ofh);
+		goto fail;
+	}
+
+	fh = ovl_encode_fh(origin);
+	if (IS_ERR(fh)) {
+		err = PTR_ERR(fh);
+		fh = NULL;
+		goto fail;
+	} else if (fh->len != ofh->len || memcmp(fh, ofh, fh->len)) {
+		err = -ESTALE;
+		goto fail;
+	}
+
+	err = 0;
+out:
+	kfree(ofh);
+	kfree(fh);
+	return err;
+
+fail:
+	inode = d_inode(origin);
+	pr_warn_ratelimited("overlayfs: failed to verify origin (ino=%lu, err=%i) - were layers copied?\n",
+			    inode ? inode->i_ino : 0, err);
+	goto out;
+}
+
+/*
  * Returns next layer in stack starting from top.
  * Returns -1 if this is the last layer.
  */
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 9ec93063c3a8..1eb8250464be 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -235,6 +235,8 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 
 
 /* namei.c */
+int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
+		      struct dentry *origin);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path, int *idxp);
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags);
 bool ovl_lower_positive(struct dentry *dentry);
@@ -293,3 +295,4 @@ int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_flags(struct dentry *dentry, int flags);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
+struct ovl_fh *ovl_encode_fh(struct dentry *lower);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 02edc5ab1df3..137371419657 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -414,6 +414,41 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 	return 0;
 }
 
+/*
+ * Verify that stored file handle in dir matches origin.
+ * If dir has no stored file handle, encode and store origin file handle.
+ */
+static int ovl_verify_set_origin(struct dentry *dir, struct vfsmount *mnt,
+				 struct dentry *origin, const char *name)
+{
+	const struct ovl_fh *fh = NULL;
+	int err;
+
+	err = ovl_verify_origin(dir, mnt, origin);
+	if (!err)
+		return 0;
+
+	if (err != -ENODATA)
+		goto fail;
+
+	fh = ovl_encode_fh(origin);
+	err = PTR_ERR(fh);
+	if (IS_ERR(fh))
+		goto fail;
+	err = ovl_do_setxattr(dir, OVL_XATTR_ORIGIN, fh, fh->len, 0);
+	if (err)
+		goto fail;
+
+out:
+	kfree(fh);
+	return err;
+
+fail:
+	pr_err("overlayfs: failed to verify %s dir. (err=%i)\n",
+	       name, err);
+	goto out;
+}
+
 #define OVL_WORKDIR_NAME "work"
 #define OVL_INDEXDIR_NAME "index"
 
@@ -1024,6 +1059,13 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		ufs->same_sb = NULL;
 
 	if (!(sb->s_flags & MS_RDONLY) && ufs->config.index) {
+		/* Verify lower root is upper root origin */
+		err = ovl_verify_set_origin(upperpath.dentry,
+					    ufs->lower_mnt[0], stack[0].dentry,
+					    "lower root");
+		if (err)
+			goto out_put_lower_mnt;
+
 		ufs->indexdir = ovl_workdir_create(sb, ufs, workpath.dentry,
 						   OVL_INDEXDIR_NAME, true);
 		err = PTR_ERR(ufs->indexdir);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 07/20] ovl: verify index dir matches upper dir
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (5 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 06/20] ovl: verify upper root dir matches lower root dir Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 08/20] ovl: lookup index entry for non-dir Amir Goldstein
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

An index dir contains persistent hardlinks to files in upper dir.
Therefore, we must never mount an existing index dir with a differnt
upper dir.

Store the upper root dir file handle in index dir inode when index
dir is created and verify the file handle before using an existing
index dir on mount.

Add an 'is_upper' flag to the overlay file handle encoding and set it
when encoding the upper root file handle. This is not critical for index
dir verification, but it is good practice towards a standard overlayfs
file handle format for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 12 ++++++++++--
 fs/overlayfs/namei.c     |  4 ++--
 fs/overlayfs/overlayfs.h |  6 ++++--
 fs/overlayfs/super.c     | 21 ++++++++++++++++-----
 4 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index ee0b894e1d99..eedc26e15dad 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -233,7 +233,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
-struct ovl_fh *ovl_encode_fh(struct dentry *lower)
+struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper)
 {
 	struct ovl_fh *fh;
 	int fh_type, fh_len, dwords;
@@ -272,6 +272,14 @@ struct ovl_fh *ovl_encode_fh(struct dentry *lower)
 	fh->magic = OVL_FH_MAGIC;
 	fh->type = fh_type;
 	fh->flags = OVL_FH_FLAG_CPU_ENDIAN;
+	/*
+	 * When we will want to decode an overlay dentry from this handle
+	 * and all layers are on the same fs, if we get a disconncted real
+	 * dentry when we decode fid, the only way to tell if we should assign
+	 * it to upperdentry or to lowerstack is by checking this flag.
+	 */
+	if (is_upper)
+		fh->flags |= OVL_FH_FLAG_PATH_UPPER;
 	fh->len = fh_len;
 	fh->uuid = *uuid;
 	memcpy(fh->fid, buf, buflen);
@@ -293,7 +301,7 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	 * up and a pure upper inode.
 	 */
 	if (ovl_can_decode_fh(lower->d_sb)) {
-		fh = ovl_encode_fh(lower);
+		fh = ovl_encode_fh(lower, false);
 		if (IS_ERR(fh))
 			return PTR_ERR(fh);
 	}
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 1b8017a57a02..61f4988527f4 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -327,7 +327,7 @@ static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
  * Return 0 on match, -ESTALE on mismatch, < 0 on error.
  */
 int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
-		      struct dentry *origin)
+		      struct dentry *origin, bool is_upper)
 {
 	struct inode *inode = NULL;
 	struct ovl_fh *fh = NULL;
@@ -343,7 +343,7 @@ int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
 		goto fail;
 	}
 
-	fh = ovl_encode_fh(origin);
+	fh = ovl_encode_fh(origin, is_upper);
 	if (IS_ERR(fh)) {
 		err = PTR_ERR(fh);
 		fh = NULL;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 1eb8250464be..6fd7c9e748c1 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -38,6 +38,8 @@ enum ovl_path_type {
 /* CPU byte order required for fid decoding:  */
 #define OVL_FH_FLAG_BIG_ENDIAN	(1 << 0)
 #define OVL_FH_FLAG_ANY_ENDIAN	(1 << 1)
+/* Is the real inode encoded in fid an upper inode? */
+#define OVL_FH_FLAG_PATH_UPPER	(1 << 2)
 
 #define OVL_FH_FLAG_ALL (OVL_FH_FLAG_BIG_ENDIAN | OVL_FH_FLAG_ANY_ENDIAN)
 
@@ -236,7 +238,7 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 
 /* namei.c */
 int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
-		      struct dentry *origin);
+		      struct dentry *origin, bool is_upper);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path, int *idxp);
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags);
 bool ovl_lower_positive(struct dentry *dentry);
@@ -295,4 +297,4 @@ int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_flags(struct dentry *dentry, int flags);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
-struct ovl_fh *ovl_encode_fh(struct dentry *lower);
+struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 137371419657..b411b6d5f7b1 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -419,19 +419,20 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
  * If dir has no stored file handle, encode and store origin file handle.
  */
 static int ovl_verify_set_origin(struct dentry *dir, struct vfsmount *mnt,
-				 struct dentry *origin, const char *name)
+				 struct dentry *origin, const char *name,
+				 bool is_upper)
 {
 	const struct ovl_fh *fh = NULL;
 	int err;
 
-	err = ovl_verify_origin(dir, mnt, origin);
+	err = ovl_verify_origin(dir, mnt, origin, is_upper);
 	if (!err)
 		return 0;
 
 	if (err != -ENODATA)
 		goto fail;
 
-	fh = ovl_encode_fh(origin);
+	fh = ovl_encode_fh(origin, is_upper);
 	err = PTR_ERR(fh);
 	if (IS_ERR(fh))
 		goto fail;
@@ -1062,7 +1063,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		/* Verify lower root is upper root origin */
 		err = ovl_verify_set_origin(upperpath.dentry,
 					    ufs->lower_mnt[0], stack[0].dentry,
-					    "lower root");
+					    "lower root", false);
 		if (err)
 			goto out_put_lower_mnt;
 
@@ -1072,8 +1073,18 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (IS_ERR(ufs->indexdir))
 			goto out_put_lower_mnt;
 
-		if (!ufs->indexdir)
+		if (ufs->indexdir) {
+			/* Verify upper root is index dir origin */
+			err = ovl_verify_set_origin(ufs->indexdir,
+						    ufs->upper_mnt,
+						    upperpath.dentry,
+						    "upper root", true);
+			if (err)
+				goto out_put_indexdir;
+		} else {
 			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
+		}
+
 	}
 
 	if (remote)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (6 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 07/20] ovl: verify index dir matches upper dir Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-08 12:11   ` Miklos Szeredi
  2017-06-07  7:51 ` [PATCH v2 09/20] ovl: move inode helpers to inode.c Amir Goldstein
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

When inodes index feature is enabled, lookup in indexdir for the index
entry of lower real inode or copy up origin inode. The index entry name
is the hex representation of the lower inode file handle.

If the index dentry in negative, then either no lower aliases have been
copied up yet, or aliases have been copied up in older kernels and are
not indexed.

If the index dentry for a copy up origin inode is positive, but points
to an inode different than the upper inode, then either the upper inode
has been copied up and not indexed or it was indexed, but since then
index dir was cleared. Either way, that index cannot be used to indentify
the overlay inode.

If a negative index dentry is found or a positive dentry that matches the
upper inode, store it in the overlay dentry to be used later. A non-upper
overlay dentry may still have a positive index from copy up of another
lower hardlink. This situation can be tested with the path type macros
OVL_TYPE_INDEX(type) && !OVL_TYPE_UPPER(type).

This is going to be used to prevent breaking hardlinks on copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c     | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |  3 ++
 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/super.c     |  1 +
 fs/overlayfs/util.c      | 11 ++++++
 5 files changed, 105 insertions(+)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 61f4988527f4..d204488bf23c 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -367,6 +367,84 @@ int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
 }
 
 /*
+ * Lookup in indexdir for the index entry of a lower real inode or a copy up
+ * origin inode. The index entry name is the hex representation of the lower
+ * inode file handle.
+ *
+ * If the index dentry in negative, then either no lower aliases have been
+ * copied up yet, or aliases have been copied up in older kernels and are
+ * not indexed.
+ *
+ * If the index dentry for a copy up origin inode is positive, but points
+ * to an inode different than the upper inode, then either the upper inode
+ * has been copied up and not indexed or it was indexed, but since then
+ * index dir was cleared. Either way, that index cannot be used to indentify
+ * the overlay inode.
+ */
+static int ovl_lookup_index(struct dentry *dentry, struct dentry *upper,
+			    struct dentry *lower, struct dentry **indexp)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_fh *fh;
+	struct dentry *index = NULL;
+	struct inode *inode;
+	char *s, *name = NULL;
+	long namelen = 0;
+	int err;
+
+	if (WARN_ON(!ofs->indexdir))
+		return -ENOENT;
+
+	fh = ovl_encode_fh(lower, false);
+	if (IS_ERR(fh)) {
+		err = PTR_ERR(fh);
+		fh = NULL;
+		goto fail;
+	}
+
+	err = -ENAMETOOLONG;
+	namelen = fh->len * 2;
+	if (namelen > ofs->namelen)
+		goto fail;
+
+	err = -ENOMEM;
+	name = kzalloc(namelen + 1, GFP_TEMPORARY);
+	if (!name)
+		goto fail;
+
+	s  = bin2hex(name, fh, fh->len);
+	namelen = s - name;
+
+	index = lookup_one_len_unlocked(name, ofs->indexdir, namelen);
+	if (IS_ERR(index)) {
+		err = PTR_ERR(index);
+		goto fail;
+	}
+
+	if (upper && d_inode(index) != d_inode(upper)) {
+		inode = d_inode(index);
+		pr_debug("ovl_lookup_index: upper with origin not indexed (%pd2, ino=%lu)\n",
+			 upper, inode ? inode->i_ino : 0);
+		dput(index);
+		index = NULL;
+	}
+
+	*indexp = index;
+	err = 0;
+
+out:
+	kfree(fh);
+	kfree(name);
+	return err;
+
+fail:
+	inode = d_inode(lower);
+	pr_warn_ratelimited("overlayfs: failed inode index lookup (ino=%lu, key=%*s, err=%i); mount with '-o index=off' to disable inodes index.\n",
+			    inode ? inode->i_ino : 0, (int)namelen, name, err);
+	goto out;
+}
+
+/*
  * Returns next layer in stack starting from top.
  * Returns -1 if this is the last layer.
  */
@@ -400,6 +478,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
 	struct path *stack = NULL;
 	struct dentry *upperdir, *upperdentry = NULL;
+	struct dentry *indexdentry = NULL;
 	unsigned int ctr = 0;
 	struct inode *inode = NULL;
 	bool upperopaque = false;
@@ -500,6 +579,14 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 	}
 
+	/* Lookup index by lower inode and verify it matches upper inode */
+	if (ctr && !d.is_dir && ovl_indexdir(dentry->d_sb)) {
+		err = ovl_lookup_index(dentry, upperdentry, stack[0].dentry,
+				       &indexdentry);
+		if (err)
+			goto out_put;
+	}
+
 	oe = ovl_alloc_entry(ctr);
 	err = -ENOMEM;
 	if (!oe)
@@ -531,6 +618,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	oe->impure = upperimpure;
 	oe->redirect = upperredirect;
 	oe->__upperdentry = upperdentry;
+	oe->indexdentry = indexdentry;
 	memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr);
 	kfree(stack);
 	kfree(d.redirect);
@@ -542,6 +630,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 out_free_oe:
 	kfree(oe);
 out_put:
+	dput(indexdentry);
 	for (i = 0; i < ctr; i++)
 		dput(stack[i].dentry);
 	kfree(stack);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6fd7c9e748c1..47d62c585c96 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -14,11 +14,13 @@ enum ovl_path_type {
 	__OVL_PATH_UPPER	= (1 << 0),
 	__OVL_PATH_MERGE	= (1 << 1),
 	__OVL_PATH_ORIGIN	= (1 << 2),
+	__OVL_PATH_INDEX	= (1 << 3),
 };
 
 #define OVL_TYPE_UPPER(type)	((type) & __OVL_PATH_UPPER)
 #define OVL_TYPE_MERGE(type)	((type) & __OVL_PATH_MERGE)
 #define OVL_TYPE_ORIGIN(type)	((type) & __OVL_PATH_ORIGIN)
+#define OVL_TYPE_INDEX(type)	((type) & __OVL_PATH_INDEX)
 
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
@@ -204,6 +206,7 @@ void ovl_path_lower(struct dentry *dentry, struct path *path);
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
 struct dentry *ovl_dentry_upper(struct dentry *dentry);
 struct dentry *ovl_dentry_lower(struct dentry *dentry);
+struct dentry *ovl_dentry_index(struct dentry *dentry);
 struct dentry *ovl_dentry_real(struct dentry *dentry);
 struct ovl_dir_cache *ovl_dir_cache(struct dentry *dentry);
 void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index ddd4b0a874a9..dc08ce5c4ac0 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -43,6 +43,7 @@ struct ovl_fs {
 /* private information held for every overlayfs dentry */
 struct ovl_entry {
 	struct dentry *__upperdentry;
+	struct dentry *indexdentry;
 	struct ovl_dir_cache *cache;
 	union {
 		struct {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b411b6d5f7b1..b53c08f70b1a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -47,6 +47,7 @@ static void ovl_dentry_release(struct dentry *dentry)
 		unsigned int i;
 
 		dput(oe->__upperdentry);
+		dput(oe->indexdentry);
 		kfree(oe->redirect);
 		for (i = 0; i < oe->numlower; i++)
 			dput(oe->lowerstack[i].dentry);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 8e2f308056f1..6048f204be07 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -108,6 +108,10 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
 		if (oe->numlower > 1)
 			type |= __OVL_PATH_MERGE;
 	}
+	/* Non-upper can have a positive index dentry from another hardlink */
+	if (oe->indexdentry && d_inode(oe->indexdentry))
+		type |= __OVL_PATH_INDEX;
+
 	return type;
 }
 
@@ -158,6 +162,13 @@ struct dentry *ovl_dentry_lower(struct dentry *dentry)
 	return __ovl_dentry_lower(oe);
 }
 
+struct dentry *ovl_dentry_index(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return oe->indexdentry;
+}
+
 struct dentry *ovl_dentry_real(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 09/20] ovl: move inode helpers to inode.c
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (7 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 08/20] ovl: lookup index entry for non-dir Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 10/20] ovl: use ovl_inode_init() for initializing new inode Amir Goldstein
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c     | 16 ++++++++++++++++
 fs/overlayfs/overlayfs.h | 31 ++++++++++++++++---------------
 fs/overlayfs/util.c      | 16 ----------------
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index d613e2c41242..ab394510736e 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -451,6 +451,22 @@ struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev)
 	return inode;
 }
 
+void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
+{
+	WRITE_ONCE(inode->i_private, (unsigned long) realinode |
+		   (is_upper ? OVL_ISUPPER_MASK : 0));
+}
+
+void ovl_inode_update(struct inode *inode, struct inode *upperinode)
+{
+	WARN_ON(!upperinode);
+	WARN_ON(!inode_unhashed(inode));
+	WRITE_ONCE(inode->i_private,
+		   (unsigned long) upperinode | OVL_ISUPPER_MASK);
+	if (!S_ISDIR(upperinode->i_mode))
+		__insert_inode_hash(inode, (unsigned long) upperinode);
+}
+
 static int ovl_inode_test(struct inode *inode, void *data)
 {
 	return ovl_inode_real(inode, NULL) == data;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 47d62c585c96..c4669b6b0e20 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -64,8 +64,6 @@ struct ovl_fh {
 	u8 fid[0];	/* file identifier */
 } __packed;
 
-#define OVL_ISUPPER_MASK 1UL
-
 static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	int err = vfs_rmdir(dir, dentry);
@@ -179,16 +177,6 @@ static inline struct dentry *ovl_do_tmpfile(struct dentry *dentry, umode_t mode)
 	return ret;
 }
 
-static inline struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
-{
-	unsigned long x = (unsigned long) READ_ONCE(inode->i_private);
-
-	if (is_upper)
-		*is_upper = x & OVL_ISUPPER_MASK;
-
-	return (struct inode *) (x & ~OVL_ISUPPER_MASK);
-}
-
 /* util.c */
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
@@ -218,9 +206,6 @@ bool ovl_redirect_dir(struct super_block *sb);
 const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
-void ovl_inode_init(struct inode *inode, struct inode *realinode,
-		    bool is_upper);
-void ovl_inode_update(struct inode *inode, struct inode *upperinode);
 void ovl_dentry_version_inc(struct dentry *dentry);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
@@ -271,7 +256,23 @@ int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
+void ovl_inode_init(struct inode *inode, struct inode *realinode,
+		    bool is_upper);
+void ovl_inode_update(struct inode *inode, struct inode *upperinode);
 struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode);
+
+#define OVL_ISUPPER_MASK 1UL
+
+static inline struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
+{
+	unsigned long x = (unsigned long) READ_ONCE(inode->i_private);
+
+	if (is_upper)
+		*is_upper = x & OVL_ISUPPER_MASK;
+
+	return (struct inode *) (x & ~OVL_ISUPPER_MASK);
+}
+
 static inline void ovl_copyattr(struct inode *from, struct inode *to)
 {
 	to->i_uid = from->i_uid;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 6048f204be07..c42eb2eb9313 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -256,22 +256,6 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 	oe->__upperdentry = upperdentry;
 }
 
-void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
-{
-	WRITE_ONCE(inode->i_private, (unsigned long) realinode |
-		   (is_upper ? OVL_ISUPPER_MASK : 0));
-}
-
-void ovl_inode_update(struct inode *inode, struct inode *upperinode)
-{
-	WARN_ON(!upperinode);
-	WARN_ON(!inode_unhashed(inode));
-	WRITE_ONCE(inode->i_private,
-		   (unsigned long) upperinode | OVL_ISUPPER_MASK);
-	if (!S_ISDIR(upperinode->i_mode))
-		__insert_inode_hash(inode, (unsigned long) upperinode);
-}
-
 void ovl_dentry_version_inc(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 10/20] ovl: use ovl_inode_init() for initializing new inode
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (8 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 09/20] ovl: move inode helpers to inode.c Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 11/20] ovl: hash overlay non-dir inodes by copy up origin inode Amir Goldstein
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

In preparation for hashing all overlay inodes, modify the helper
ovl_inode_init() to hash a new non-dir upper overlay inode.

Use this helper for initializing new upper inode in ovl_instantiate()
instead of ovl_inode_update(), because the implementations for new inode
case and copy up case are going to diverge.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c   |  2 +-
 fs/overlayfs/inode.c | 22 +++++++++++++++++-----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a63a71656e9b..4f0096fee276 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -156,7 +156,7 @@ static void ovl_instantiate(struct dentry *dentry, struct inode *inode,
 	ovl_dentry_version_inc(dentry->d_parent);
 	ovl_dentry_update(dentry, newdentry);
 	if (!hardlink) {
-		ovl_inode_update(inode, d_inode(newdentry));
+		ovl_inode_init(inode, d_inode(newdentry), true);
 		ovl_copyattr(newdentry->d_inode, inode);
 	} else {
 		WARN_ON(ovl_inode_real(inode, NULL) != d_inode(newdentry));
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index ab394510736e..55f4df8c3cf1 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -451,20 +451,32 @@ struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev)
 	return inode;
 }
 
-void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
+static void ovl_insert_inode_hash(struct inode *inode, struct inode *realinode)
+{
+	WARN_ON(!inode_unhashed(inode));
+	__insert_inode_hash(inode, (unsigned long) realinode);
+}
+
+static void ovl_inode_set_real(struct inode *inode, struct inode *realinode,
+			       bool is_upper)
 {
 	WRITE_ONCE(inode->i_private, (unsigned long) realinode |
 		   (is_upper ? OVL_ISUPPER_MASK : 0));
 }
 
+void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
+{
+	ovl_inode_set_real(inode, realinode, is_upper);
+	if (is_upper && !S_ISDIR(realinode->i_mode))
+		ovl_insert_inode_hash(inode, realinode);
+}
+
 void ovl_inode_update(struct inode *inode, struct inode *upperinode)
 {
 	WARN_ON(!upperinode);
-	WARN_ON(!inode_unhashed(inode));
-	WRITE_ONCE(inode->i_private,
-		   (unsigned long) upperinode | OVL_ISUPPER_MASK);
+	ovl_inode_set_real(inode, upperinode, true);
 	if (!S_ISDIR(upperinode->i_mode))
-		__insert_inode_hash(inode, (unsigned long) upperinode);
+		ovl_insert_inode_hash(inode, upperinode);
 }
 
 static int ovl_inode_test(struct inode *inode, void *data)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 11/20] ovl: hash overlay non-dir inodes by copy up origin inode
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (9 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 10/20] ovl: use ovl_inode_init() for initializing new inode Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 12/20] ovl: fix nlink leak in ovl_rename() Amir Goldstein
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

When inodes index feature is enabled, hash all non-dir inodes by the
address of the copy up origin inode if it is indexed.

Non-upper overlay inodes are hashed by the lower real inode, which is
the copy up origin to be. The lower (copy up origin) inode in stored in
the unused field i_data.private_data of the overlay inode.

This change makes all lower hardlinks and their indexed copy ups be
represented by a single overlay inode and is needed for vfs inode
consistency after hardlinks are no longer broken on copy up.

When hashing a non-upper overlay inode and an index entry already exists
from another lower alias copy up, set the overlay realinode to the indexed
upper inode. This is needed to make the overlay realinode invariant to
the order of lookup between two lower aliases, when only one fo them was
copied up.

Because overlay dentries of lower hardlink aliases have the same overlay
inode, a non indexed copy up of those lower aliases will cause a conflict
when trying to update the realinode to the broken upper hardlink.
A non indexed copy up of an alias that lost in this conflict will return
-EEXIST and drop the overlay dentry. The next lookup of that broken
upper hardlink will return as upper entry with a new overlay inode.

This conflict is going to be handled more gracefully by following patches.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   |  6 +++-
 fs/overlayfs/inode.c     | 84 ++++++++++++++++++++++++++++++++++++++++++------
 fs/overlayfs/namei.c     | 25 ++++++++++++--
 fs/overlayfs/overlayfs.h | 13 ++++++--
 4 files changed, 114 insertions(+), 14 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index eedc26e15dad..ae18824c7944 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -418,7 +418,11 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 
 	newdentry = dget(tmpfile ? upper : temp);
 	ovl_dentry_update(dentry, newdentry);
-	ovl_inode_update(d_inode(dentry), d_inode(newdentry));
+	err = ovl_inode_update(d_inode(dentry), d_inode(newdentry));
+	if (err) {
+		/* Broken hardlink - drop cache and return error */
+		d_drop(dentry);
+	}
 
 	/* Restore timestamps on parent (best effort) */
 	ovl_set_timestamps(upperdir, pstat);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 55f4df8c3cf1..1f8276d7df32 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -471,32 +471,98 @@ void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
 		ovl_insert_inode_hash(inode, realinode);
 }
 
-void ovl_inode_update(struct inode *inode, struct inode *upperinode)
+/*
+ * When inodes index is enabled, we hash all non-dir inodes by the address
+ * of the lower origin inode. We need to take care on concurrent copy up of
+ * different lower hardlinks, that only one alias can set the upper real inode.
+ * Copy up of an alias that lost the ovl_inode_update() race will get -EEXIST
+ * and needs to d_drop() the overlay dentry of that alias, so the next
+ * ovl_lookup() will initialize a new overlay inode for the broken hardlink.
+ */
+int ovl_inode_update(struct inode *inode, struct inode *upperinode)
 {
+	bool is_upper;
+	struct inode *realinode;
+
 	WARN_ON(!upperinode);
-	ovl_inode_set_real(inode, upperinode, true);
-	if (!S_ISDIR(upperinode->i_mode))
+	spin_lock(&inode->i_lock);
+	realinode = ovl_inode_real(inode, &is_upper);
+	if (!is_upper)
+		ovl_inode_set_real(inode, upperinode, true);
+	spin_unlock(&inode->i_lock);
+	if (is_upper && realinode != upperinode)
+		return -EEXIST;
+	/* When inodes index is enabled, inode is hashed before copy up */
+	if (!S_ISDIR(upperinode->i_mode) && !ovl_indexdir(inode->i_sb))
 		ovl_insert_inode_hash(inode, upperinode);
+	return 0;
+}
+
+/* Store copy up origin inode in unused field i_data.private_data */
+static void ovl_inode_set_orig(struct inode *inode, struct inode *originode)
+{
+	inode->i_data.private_data = originode;
+}
+
+static struct inode *ovl_inode_orig(struct inode *inode)
+{
+	return (struct inode *) inode->i_data.private_data;
 }
 
 static int ovl_inode_test(struct inode *inode, void *data)
 {
-	return ovl_inode_real(inode, NULL) == data;
+	struct ovl_inode_info *oi = data;
+	bool is_upper;
+	struct inode *realinode = ovl_inode_real(inode, &is_upper);
+
+	if (realinode == oi->realinode) {
+		WARN_ON(is_upper != oi->is_upper);
+		return true;
+	}
+
+	/*
+	 * Return same overlay inode when looking up by lower real inode, but
+	 * an existing overlay inode, that is hashed by the same lower origin
+	 * inode, has already been updated on copy up to a real upper inode.
+	 */
+	return ovl_indexdir(inode->i_sb) && !oi->is_upper &&
+		is_upper && ovl_inode_orig(inode) == oi->realinode;
 }
 
 static int ovl_inode_set(struct inode *inode, void *data)
 {
-	inode->i_private = (void *) (((unsigned long) data) | OVL_ISUPPER_MASK);
+	struct ovl_inode_info *oi = data;
+
+	ovl_inode_set_real(inode, oi->realinode, oi->is_upper);
+	ovl_inode_set_orig(inode, oi->originode);
 	return 0;
 }
 
-struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode)
-
+struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *oi)
 {
+	struct inode *realinode = oi->realinode;
+	unsigned long hashval = (unsigned long) realinode;
 	struct inode *inode;
 
-	inode = iget5_locked(sb, (unsigned long) realinode,
-			     ovl_inode_test, ovl_inode_set, realinode);
+	/*
+	 * With inodes index feature enabled, overlay inodes hash key is the
+	 * address of the copy up origin inode if it is indexed.
+	 * When hashing a non-upper overlay inode, origin has to be set to
+	 * the real lower inode, which is the copy up origin inode to be.
+	 * When hashing a non-upper overlay inode and index points to an upper
+	 * inode (from another lower alias copy up), set the real inode to the
+	 * indexed upper inode.
+	 */
+	if (oi->originode && oi->originode != realinode) {
+		WARN_ON(!ovl_indexdir(sb) || !oi->is_upper);
+		hashval = (unsigned long) oi->originode;
+	}
+	if (oi->index && d_inode(oi->index)) {
+		WARN_ON(oi->is_upper && d_inode(oi->index) != realinode);
+		realinode = d_inode(oi->index);
+	}
+
+	inode = iget5_locked(sb, hashval, ovl_inode_test, ovl_inode_set, oi);
 	if (inode && inode->i_state & I_NEW) {
 		ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
 		set_nlink(inode, realinode->i_nlink);
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index d204488bf23c..3f0f429798ef 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -595,13 +595,34 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (upperdentry || ctr) {
 		struct dentry *realdentry;
 		struct inode *realinode;
+		struct inode *originode = NULL;
 
 		realdentry = upperdentry ? upperdentry : stack[0].dentry;
 		realinode = d_inode(realdentry);
+		/*
+		 * When inodes index is enabled, we hash all non-dir inodes
+		 * by the address of the copy up origin inode if it is indexed
+		 * or by the address of the non-upper real inode.
+		 * When inodes index is disabled, or if origin is not indexed,
+		 * we hash non-dir upper inodes by the addess of the real inode.
+		 * Regardless of the inode we use as hash key, we always store
+		 * the real (upper most) inode in i_private field.
+		 */
+		if (indexdentry) {
+			BUG_ON(!ctr);
+			originode = d_inode(stack[0].dentry);
+		}
 
 		err = -ENOMEM;
-		if (upperdentry && !d_is_dir(upperdentry)) {
-			inode = ovl_get_inode(dentry->d_sb, realinode);
+		if (!d.is_dir && (upperdentry || originode)) {
+			struct ovl_inode_info info = {
+				.realinode = realinode,
+				.originode = originode,
+				.index = indexdentry,
+				.is_upper = !!upperdentry,
+			};
+
+			inode = ovl_get_inode(dentry->d_sb, &info);
 		} else {
 			inode = ovl_new_inode(dentry->d_sb, realinode->i_mode,
 					      realinode->i_rdev);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c4669b6b0e20..beac2d858689 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -258,8 +258,17 @@ bool ovl_is_private_xattr(const char *name);
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
 void ovl_inode_init(struct inode *inode, struct inode *realinode,
 		    bool is_upper);
-void ovl_inode_update(struct inode *inode, struct inode *upperinode);
-struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode);
+int ovl_inode_update(struct inode *inode, struct inode *upperinode);
+
+/* information used to lookup an overlayfs inode */
+struct ovl_inode_info {
+	struct inode *realinode;
+	struct inode *originode;
+	struct dentry *index;
+	bool is_upper;
+};
+
+struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *oi);
 
 #define OVL_ISUPPER_MASK 1UL
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 12/20] ovl: fix nlink leak in ovl_rename()
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (10 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 11/20] ovl: hash overlay non-dir inodes by copy up origin inode Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 13/20] ovl: adjust overlay inode nlink for indexed inodes Amir Goldstein
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

This patch fixes an overlay inode nlink leak in the case where
ovl_rename() renames over a non-dir.

This is not so critical, because overlay inode doesn't rely on
nlink dropping to zero for inode deletion.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 4f0096fee276..a60c075f62f6 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -891,6 +891,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	bool old_opaque;
 	bool new_opaque;
 	bool cleanup_whiteout = false;
+	bool new_drop_nlink = false;
 	bool overwrite = !(flags & RENAME_EXCHANGE);
 	bool is_dir = d_is_dir(old);
 	bool new_is_dir = d_is_dir(new);
@@ -952,6 +953,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			flags |= RENAME_EXCHANGE;
 			cleanup_whiteout = true;
 		}
+		if (!new_is_dir && new->d_inode)
+			new_drop_nlink = true;
 	}
 
 	old_upperdir = ovl_dentry_upper(old->d_parent);
@@ -1046,6 +1049,9 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	if (cleanup_whiteout)
 		ovl_cleanup(old_upperdir->d_inode, newdentry);
 
+	if (new_drop_nlink)
+		drop_nlink(new->d_inode);
+
 	ovl_dentry_version_inc(old->d_parent);
 	ovl_dentry_version_inc(new->d_parent);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 13/20] ovl: adjust overlay inode nlink for indexed inodes
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (11 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 12/20] ovl: fix nlink leak in ovl_rename() Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 14/20] ovl: defer upper dir lock to tempfile link Amir Goldstein
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

If overlay inode has an indexed realinode, decrement 1 nlink for the
index entry.  Overlay inode nlink does not account for all the lower
hardlinks that have not been copied up yet.  Those will be added to nlink
as they get copied up and won't decrement nlink when they get unlinked
or renamed over.

The important thing to take care of is that overlay inode nlink doesn't
drop to zero when there are still upper hardlinks or non covered
lower hardlinks.

An overlay inode zero nlink stands for an orphan indexed inode. This is
the case where some of the lower hardlinks were copied up, modified, and
then all copied up upper hardlinks has been unlinked, but there are still
non covered lower hardlinks.

Return the overlay inode nlinks for indexed upper inodes on stat(2).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c   | 17 +++++++++++++----
 fs/overlayfs/inode.c | 25 ++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a60c075f62f6..3bb226ffa552 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -733,9 +733,16 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir)
 	return err;
 }
 
+static bool ovl_type_indexed_lower(struct dentry *dentry)
+{
+	enum ovl_path_type type = ovl_path_type(dentry);
+
+	return !OVL_TYPE_UPPER(type) && OVL_TYPE_INDEX(type);
+}
+
 static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 {
-	enum ovl_path_type type;
+	bool indexed_lower;
 	int err;
 	const struct cred *old_cred;
 
@@ -747,7 +754,8 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 	if (err)
 		goto out_drop_write;
 
-	type = ovl_path_type(dentry);
+	/* An indexed lower hardlink is not counted in overlay nlink */
+	indexed_lower = ovl_type_indexed_lower(dentry);
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	if (!ovl_lower_positive(dentry))
@@ -758,7 +766,7 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 	if (!err) {
 		if (is_dir)
 			clear_nlink(dentry->d_inode);
-		else
+		else if (!indexed_lower)
 			drop_nlink(dentry->d_inode);
 	}
 out_drop_write:
@@ -953,7 +961,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			flags |= RENAME_EXCHANGE;
 			cleanup_whiteout = true;
 		}
-		if (!new_is_dir && new->d_inode)
+		/* An indexed lower hardlink is not counted in overlay nlink */
+		if (!new_is_dir && new->d_inode && !ovl_type_indexed_lower(new))
 			new_drop_nlink = true;
 	}
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1f8276d7df32..6e85a2a7fcdc 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -126,6 +126,15 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	if (is_dir && OVL_TYPE_MERGE(type))
 		stat->nlink = 1;
 
+	/*
+	 * Return the overlay inode nlinks for indexed upper inodes.
+	 * Overlay nlink accounts for all upper hardlinks excluding the
+	 * index hardlink entry.
+	 * TODO: add count of non-covered lower hardlinks.
+	 */
+	if (!is_dir && OVL_TYPE_UPPER(type) && OVL_TYPE_INDEX(type))
+		stat->nlink = dentry->d_inode->i_nlink;
+
 out:
 	revert_creds(old_cred);
 
@@ -542,6 +551,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *oi)
 {
 	struct inode *realinode = oi->realinode;
 	unsigned long hashval = (unsigned long) realinode;
+	unsigned int nlink = realinode->i_nlink;
 	struct inode *inode;
 
 	/*
@@ -560,12 +570,25 @@ struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *oi)
 	if (oi->index && d_inode(oi->index)) {
 		WARN_ON(oi->is_upper && d_inode(oi->index) != realinode);
 		realinode = d_inode(oi->index);
+		/*
+		 * Decrement 1 nlink for the index entry. Overlay inode nlink
+		 * does not account for all the lower hardlinks that have not
+		 * been copied up yet. Those will be added to nlink as they get
+		 * copied up and won't decrement nlink when they get unlinked
+		 * or renamed over.
+		 * An overlay inode zero nlink stands for an orphan indexed
+		 * inode - an inode that was copied up, modified, and then the
+		 * copied up alias has been unlinked.
+		 */
+		nlink = realinode->i_nlink;
+		if (!WARN_ON(!nlink))
+			nlink--;
 	}
 
 	inode = iget5_locked(sb, hashval, ovl_inode_test, ovl_inode_set, oi);
 	if (inode && inode->i_state & I_NEW) {
 		ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
-		set_nlink(inode, realinode->i_nlink);
+		set_nlink(inode, nlink);
 		unlock_new_inode(inode);
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 14/20] ovl: defer upper dir lock to tempfile link
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (12 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 13/20] ovl: adjust overlay inode nlink for indexed inodes Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 15/20] ovl: factor out ovl_copy_up_inode() helper Amir Goldstein
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On copy up of regular file using an O_TMPFILE, lock upper dir only
before linking the tempfile in place.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 35 ++++++++++++++++++-----------------
 fs/overlayfs/util.c    |  3 ++-
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index ae18824c7944..f6570bc127e7 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -336,8 +336,13 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 		.link = link
 	};
 
-	upper = lookup_one_len(dentry->d_name.name, upperdir,
-			       dentry->d_name.len);
+	if (tmpfile) {
+		upper = lookup_one_len_unlocked(dentry->d_name.name, upperdir,
+						dentry->d_name.len);
+	} else {
+		upper = lookup_one_len(dentry->d_name.name, upperdir,
+				       dentry->d_name.len);
+	}
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto out;
@@ -377,16 +382,7 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 		BUG_ON(upperpath.dentry != NULL);
 		upperpath.dentry = temp;
 
-		if (tmpfile) {
-			inode_unlock(udir);
-			err = ovl_copy_up_data(lowerpath, &upperpath,
-					       stat->size);
-			inode_lock_nested(udir, I_MUTEX_PARENT);
-		} else {
-			err = ovl_copy_up_data(lowerpath, &upperpath,
-					       stat->size);
-		}
-
+		err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
 		if (err)
 			goto out_cleanup;
 	}
@@ -409,10 +405,16 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	if (err)
 		goto out_cleanup;
 
-	if (tmpfile)
+	if (tmpfile) {
+		inode_lock_nested(udir, I_MUTEX_PARENT);
 		err = ovl_do_link(temp, udir, upper, true);
-	else
+		if (!err)
+			ovl_set_timestamps(upperdir, pstat);
+		pstat = NULL;
+		inode_unlock(udir);
+	} else {
 		err = ovl_do_rename(wdir, temp, udir, upper, 0);
+	}
 	if (err)
 		goto out_cleanup;
 
@@ -425,7 +427,8 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	}
 
 	/* Restore timestamps on parent (best effort) */
-	ovl_set_timestamps(upperdir, pstat);
+	if (pstat)
+		ovl_set_timestamps(upperdir, pstat);
 out2:
 	dput(temp);
 out1:
@@ -496,10 +499,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			goto out_done;
 		}
 
-		inode_lock_nested(upperdir->d_inode, I_MUTEX_PARENT);
 		err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath,
 					 stat, link, &pstat, true);
-		inode_unlock(upperdir->d_inode);
 		ovl_copy_up_end(dentry);
 		goto out_done;
 	}
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index c42eb2eb9313..a28bf55a0d48 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -246,7 +246,8 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 
-	WARN_ON(!inode_is_locked(upperdentry->d_parent->d_inode));
+	WARN_ON(!inode_is_locked(upperdentry->d_parent->d_inode) &&
+		!oe->copying);
 	WARN_ON(oe->__upperdentry);
 	/*
 	 * Make sure upperdentry is consistent before making it visible to
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 15/20] ovl: factor out ovl_copy_up_inode() helper
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (13 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 14/20] ovl: defer upper dir lock to tempfile link Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 16/20] ovl: generalize ovl_copy_up_locked() using actors Amir Goldstein
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Factor out helper for copying lower inode data and metadata to temp
upper inode, that is common to copy up using O_TMPFILE and workdir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 64 +++++++++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index f6570bc127e7..e642a51f3be4 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -316,6 +316,42 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	return err;
 }
 
+static int ovl_copy_up_inode(struct dentry *dentry, struct dentry *temp,
+			     struct path *lowerpath, struct kstat *stat)
+{
+	int err;
+
+	if (S_ISREG(stat->mode)) {
+		struct path upperpath;
+
+		ovl_path_upper(dentry, &upperpath);
+		BUG_ON(upperpath.dentry != NULL);
+		upperpath.dentry = temp;
+
+		err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
+		if (err)
+			return err;
+	}
+
+	err = ovl_copy_xattr(lowerpath->dentry, temp);
+	if (err)
+		return err;
+
+	inode_lock(temp->d_inode);
+	err = ovl_set_attr(temp, stat);
+	inode_unlock(temp->d_inode);
+	if (err)
+		return err;
+
+	/*
+	 * Store identifier of lower inode in upper inode xattr to
+	 * allow lookup of the copy up origin inode.
+	 */
+	err = ovl_set_origin(dentry, lowerpath->dentry, temp);
+
+	return err;
+}
+
 static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 			      struct dentry *dentry, struct path *lowerpath,
 			      struct kstat *stat, const char *link,
@@ -375,33 +411,7 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	if (err)
 		goto out2;
 
-	if (S_ISREG(stat->mode)) {
-		struct path upperpath;
-
-		ovl_path_upper(dentry, &upperpath);
-		BUG_ON(upperpath.dentry != NULL);
-		upperpath.dentry = temp;
-
-		err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
-		if (err)
-			goto out_cleanup;
-	}
-
-	err = ovl_copy_xattr(lowerpath->dentry, temp);
-	if (err)
-		goto out_cleanup;
-
-	inode_lock(temp->d_inode);
-	err = ovl_set_attr(temp, stat);
-	inode_unlock(temp->d_inode);
-	if (err)
-		goto out_cleanup;
-
-	/*
-	 * Store identifier of lower inode in upper inode xattr to
-	 * allow lookup of the copy up origin inode.
-	 */
-	err = ovl_set_origin(dentry, lowerpath->dentry, temp);
+	err = ovl_copy_up_inode(dentry, temp, lowerpath, stat);
 	if (err)
 		goto out_cleanup;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 16/20] ovl: generalize ovl_copy_up_locked() using actors
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (14 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 15/20] ovl: factor out ovl_copy_up_inode() helper Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 17/20] ovl: generalize ovl_copy_up_one() " Amir Goldstein
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

ovl_copy_up_locked() is implementing two different copy up methods in one
function with many conditional statements.

Split the function into actors: aquire,prepare,commit,cancel,release.
Implement the actors prepare,commit,cancel for the 'workdir' and
'tmpfile' copy up methods.

The aquire,release code for the two methods is still open coded in
ovl_copy_up_one(). Next patch will implement the aquire,release actors.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 280 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 202 insertions(+), 78 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index e642a51f3be4..b01be9517de9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -352,104 +352,227 @@ static int ovl_copy_up_inode(struct dentry *dentry, struct dentry *temp,
 	return err;
 }
 
-static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
-			      struct dentry *dentry, struct path *lowerpath,
-			      struct kstat *stat, const char *link,
-			      struct kstat *pstat, bool tmpfile)
+/*
+ * Context and operations for copying up a single lower file.
+ */
+struct ovl_copy_up_ctx {
+	struct dentry *dentry;
+	struct path *lowerpath;
+	struct kstat *stat;
+	struct kstat pstat;
+	const char *link;
+	struct dentry *upperdir;
+	struct dentry *tempdir;
+	struct dentry *upper;
+	struct dentry *temp;
+};
+
+struct ovl_copy_up_ops {
+	int (*aquire)(struct ovl_copy_up_ctx *);
+	int (*prepare)(struct ovl_copy_up_ctx *);
+	int (*commit)(struct ovl_copy_up_ctx *);
+	int (*cancel)(struct ovl_copy_up_ctx *);
+	int (*release)(struct ovl_copy_up_ctx *);
+};
+
+/*
+ * Copy up operations using workdir.
+ * Upper file is created in workdir, copied and moved to upperdir.
+ */
+static int ovl_copy_up_workdir_aquire(struct ovl_copy_up_ctx *ctx)
+{
+	return 0;
+}
+
+static int ovl_copy_up_workdir_prepare(struct ovl_copy_up_ctx *ctx)
 {
-	struct inode *wdir = workdir->d_inode;
-	struct inode *udir = upperdir->d_inode;
-	struct dentry *newdentry = NULL;
 	struct dentry *upper = NULL;
 	struct dentry *temp = NULL;
 	int err;
-	const struct cred *old_creds = NULL;
-	struct cred *new_creds = NULL;
 	struct cattr cattr = {
 		/* Can't properly set mode on creation because of the umask */
-		.mode = stat->mode & S_IFMT,
-		.rdev = stat->rdev,
-		.link = link
+		.mode = ctx->stat->mode & S_IFMT,
+		.rdev = ctx->stat->rdev,
+		.link = ctx->link,
 	};
 
-	if (tmpfile) {
-		upper = lookup_one_len_unlocked(dentry->d_name.name, upperdir,
-						dentry->d_name.len);
-	} else {
-		upper = lookup_one_len(dentry->d_name.name, upperdir,
-				       dentry->d_name.len);
-	}
+
+	upper = lookup_one_len(ctx->dentry->d_name.name, ctx->upperdir,
+			       ctx->dentry->d_name.len);
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto out;
 
+	temp = ovl_lookup_temp(ctx->tempdir);
+	if (IS_ERR(temp)) {
+		err = PTR_ERR(temp);
+		goto out_dput_upper;
+	}
+
+	err = ovl_create_real(d_inode(ctx->tempdir), temp, &cattr, NULL, true);
+	if (err)
+		goto out_dput_temp;
+
+	ctx->upper = upper;
+	ctx->temp = temp;
+	return 0;
+
+out_dput_temp:
+	dput(temp);
+out_dput_upper:
+	dput(upper);
+out:
+	return err;
+}
+
+static int ovl_copy_up_workdir_commit(struct ovl_copy_up_ctx *ctx)
+{
+	int err;
+
+	err = ovl_do_rename(d_inode(ctx->tempdir), ctx->temp,
+			    d_inode(ctx->upperdir), ctx->upper, 0);
+	if (err)
+		return err;
+
+	/* After rename, ctx->temp is the upper entry we will use */
+	swap(ctx->temp, ctx->upper);
+
+	/* Restore timestamps on parent (best effort) */
+	ovl_set_timestamps(ctx->upperdir, &ctx->pstat);
+
+	return err;
+}
+
+static int ovl_copy_up_workdir_cancel(struct ovl_copy_up_ctx *ctx)
+{
+	ovl_cleanup(d_inode(ctx->tempdir), ctx->temp);
+	return 0;
+}
+
+static int ovl_copy_up_workdir_release(struct ovl_copy_up_ctx *ctx)
+{
+	return 0;
+}
+
+static const struct ovl_copy_up_ops ovl_copy_up_workdir_ops = {
+	.aquire = ovl_copy_up_workdir_aquire,
+	.prepare = ovl_copy_up_workdir_prepare,
+	.commit = ovl_copy_up_workdir_commit,
+	.cancel = ovl_copy_up_workdir_cancel,
+	.release = ovl_copy_up_workdir_release,
+};
+
+/*
+ * Copy up operations using O_TMPFILE.
+ * Upper file is created unlinked, copied and linked to upperdir.
+ */
+static int ovl_copy_up_tmpfile_aquire(struct ovl_copy_up_ctx *ctx)
+{
+	return 0;
+}
+
+static int ovl_copy_up_tmpfile_prepare(struct ovl_copy_up_ctx *ctx)
+{
+	struct dentry *upper;
+	struct dentry *temp;
+
+	upper = lookup_one_len_unlocked(ctx->dentry->d_name.name, ctx->upperdir,
+					ctx->dentry->d_name.len);
+	if (IS_ERR(upper))
+		return PTR_ERR(upper);
+
+	temp = ovl_do_tmpfile(ctx->upperdir, ctx->stat->mode);
+	if (IS_ERR(temp)) {
+		dput(upper);
+		return PTR_ERR(temp);
+	}
+
+	ctx->upper = upper;
+	ctx->temp = temp;
+	return 0;
+}
+
+static int ovl_copy_up_tmpfile_commit(struct ovl_copy_up_ctx *ctx)
+{
+	int err;
+
+	inode_lock_nested(d_inode(ctx->upperdir), I_MUTEX_PARENT);
+	/* link the sucker ;) */
+	err = ovl_do_link(ctx->temp, d_inode(ctx->upperdir), ctx->upper, true);
+	/* Restore timestamps on parent (best effort) */
+	if (!err)
+		ovl_set_timestamps(ctx->upperdir, &ctx->pstat);
+	inode_unlock(d_inode(ctx->upperdir));
+
+	return err;
+}
+
+static int ovl_copy_up_tmpfile_cancel(struct ovl_copy_up_ctx *ctx)
+{
+	return 0;
+}
+
+static int ovl_copy_up_tmpfile_release(struct ovl_copy_up_ctx *ctx)
+{
+	return 0;
+}
+
+static const struct ovl_copy_up_ops ovl_copy_up_tmpfile_ops = {
+	.aquire = ovl_copy_up_tmpfile_aquire,
+	.prepare = ovl_copy_up_tmpfile_prepare,
+	.commit = ovl_copy_up_tmpfile_commit,
+	.cancel = ovl_copy_up_tmpfile_cancel,
+	.release = ovl_copy_up_tmpfile_release,
+};
+
+static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
+			      const struct ovl_copy_up_ops *ops)
+{
+	struct dentry *dentry = ctx->dentry;
+	const struct cred *old_creds = NULL;
+	struct cred *new_creds = NULL;
+	int err;
+
 	err = security_inode_copy_up(dentry, &new_creds);
 	if (err < 0)
-		goto out1;
+		return err;
 
 	if (new_creds)
 		old_creds = override_creds(new_creds);
 
-	if (tmpfile)
-		temp = ovl_do_tmpfile(upperdir, stat->mode);
-	else
-		temp = ovl_lookup_temp(workdir);
-	err = 0;
-	if (IS_ERR(temp)) {
-		err = PTR_ERR(temp);
-		temp = NULL;
-	}
-
-	if (!err && !tmpfile)
-		err = ovl_create_real(wdir, temp, &cattr, NULL, true);
+	ctx->upper = ctx->temp = NULL;
+	err = ops->prepare(ctx);
 
 	if (new_creds) {
 		revert_creds(old_creds);
 		put_cred(new_creds);
 	}
-
 	if (err)
-		goto out2;
+		goto out;
 
-	err = ovl_copy_up_inode(dentry, temp, lowerpath, stat);
+	err = ovl_copy_up_inode(dentry, ctx->temp, ctx->lowerpath, ctx->stat);
 	if (err)
-		goto out_cleanup;
+		goto out_cancel;
 
-	if (tmpfile) {
-		inode_lock_nested(udir, I_MUTEX_PARENT);
-		err = ovl_do_link(temp, udir, upper, true);
-		if (!err)
-			ovl_set_timestamps(upperdir, pstat);
-		pstat = NULL;
-		inode_unlock(udir);
-	} else {
-		err = ovl_do_rename(wdir, temp, udir, upper, 0);
-	}
+	err = ops->commit(ctx);
 	if (err)
-		goto out_cleanup;
+		goto out_cancel;
 
-	newdentry = dget(tmpfile ? upper : temp);
-	ovl_dentry_update(dentry, newdentry);
-	err = ovl_inode_update(d_inode(dentry), d_inode(newdentry));
+	ovl_dentry_update(dentry, dget(ctx->upper));
+	err = ovl_inode_update(d_inode(dentry), d_inode(ctx->upper));
 	if (err) {
 		/* Broken hardlink - drop cache and return error */
 		d_drop(dentry);
 	}
 
-	/* Restore timestamps on parent (best effort) */
-	if (pstat)
-		ovl_set_timestamps(upperdir, pstat);
-out2:
-	dput(temp);
-out1:
-	dput(upper);
 out:
+	dput(ctx->temp);
+	dput(ctx->upper);
 	return err;
 
-out_cleanup:
-	if (!tmpfile)
-		ovl_cleanup(wdir, temp);
-	goto out2;
+out_cancel:
+	ops->cancel(ctx);
+	goto out;
 }
 
 /*
@@ -465,37 +588,40 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			   struct path *lowerpath, struct kstat *stat)
 {
 	DEFINE_DELAYED_CALL(done);
-	struct dentry *workdir = ovl_workdir(dentry);
 	int err;
-	struct kstat pstat;
 	struct path parentpath;
 	struct dentry *lowerdentry = lowerpath->dentry;
-	struct dentry *upperdir;
-	const char *link = NULL;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_copy_up_ctx ctx = {
+		.dentry = dentry,
+		.lowerpath = lowerpath,
+		.stat = stat,
+		.link = NULL,
+		.tempdir = ovl_workdir(dentry),
+	};
 
-	if (WARN_ON(!workdir))
+	if (WARN_ON(!ctx.tempdir))
 		return -EROFS;
 
 	ovl_do_check_copy_up(lowerdentry);
 
 	ovl_path_upper(parent, &parentpath);
-	upperdir = parentpath.dentry;
+	ctx.upperdir = parentpath.dentry;
 
 	/* Mark parent "impure" because it may now contain non-pure upper */
-	err = ovl_set_impure(parent, upperdir);
+	err = ovl_set_impure(parent, ctx.upperdir);
 	if (err)
 		return err;
 
-	err = vfs_getattr(&parentpath, &pstat,
+	err = vfs_getattr(&parentpath, &ctx.pstat,
 			  STATX_ATIME | STATX_MTIME, AT_STATX_SYNC_AS_STAT);
 	if (err)
 		return err;
 
 	if (S_ISLNK(stat->mode)) {
-		link = vfs_get_link(lowerdentry, &done);
-		if (IS_ERR(link))
-			return PTR_ERR(link);
+		ctx.link = vfs_get_link(lowerdentry, &done);
+		if (IS_ERR(ctx.link))
+			return PTR_ERR(ctx.link);
 	}
 
 	/* Should we copyup with O_TMPFILE or with workdir? */
@@ -509,14 +635,13 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			goto out_done;
 		}
 
-		err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath,
-					 stat, link, &pstat, true);
+		err = ovl_copy_up_locked(&ctx, &ovl_copy_up_tmpfile_ops);
 		ovl_copy_up_end(dentry);
 		goto out_done;
 	}
 
 	err = -EIO;
-	if (lock_rename(workdir, upperdir) != NULL) {
+	if (lock_rename(ctx.tempdir, ctx.upperdir) != NULL) {
 		pr_err("overlayfs: failed to lock workdir+upperdir\n");
 		goto out_unlock;
 	}
@@ -526,10 +651,9 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		goto out_unlock;
 	}
 
-	err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath,
-				 stat, link, &pstat, false);
+	err = ovl_copy_up_locked(&ctx, &ovl_copy_up_workdir_ops);
 out_unlock:
-	unlock_rename(workdir, upperdir);
+	unlock_rename(ctx.tempdir, ctx.upperdir);
 out_done:
 	do_delayed_call(&done);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 17/20] ovl: generalize ovl_copy_up_one() using actors
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (15 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 16/20] ovl: generalize ovl_copy_up_locked() using actors Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 18/20] ovl: implement index dir copy up method Amir Goldstein
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

ovl_copy_up_one() is implementing two different locking methods
for regular files and non-regular files.

Implement the actors aquire,release for the 'workdir' and 'tmpfile'
copy up methods and use them to generalize locking in ovl_copy_up_one().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 59 ++++++++++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b01be9517de9..843ba9ca7bfc 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -381,7 +381,23 @@ struct ovl_copy_up_ops {
  */
 static int ovl_copy_up_workdir_aquire(struct ovl_copy_up_ctx *ctx)
 {
+	int err = -EIO;
+
+	if (lock_rename(ctx->tempdir, ctx->upperdir) != NULL) {
+		pr_err("overlayfs: failed to lock workdir+upperdir\n");
+		goto out_unlock;
+	}
+	if (ovl_dentry_upper(ctx->dentry)) {
+		/* Raced with another copy-up? */
+		err = 1;
+		goto out_unlock;
+	}
+
 	return 0;
+
+out_unlock:
+	unlock_rename(ctx->tempdir, ctx->upperdir);
+	return err;
 }
 
 static int ovl_copy_up_workdir_prepare(struct ovl_copy_up_ctx *ctx)
@@ -451,6 +467,7 @@ static int ovl_copy_up_workdir_cancel(struct ovl_copy_up_ctx *ctx)
 
 static int ovl_copy_up_workdir_release(struct ovl_copy_up_ctx *ctx)
 {
+	unlock_rename(ctx->tempdir, ctx->upperdir);
 	return 0;
 }
 
@@ -468,7 +485,7 @@ static const struct ovl_copy_up_ops ovl_copy_up_workdir_ops = {
  */
 static int ovl_copy_up_tmpfile_aquire(struct ovl_copy_up_ctx *ctx)
 {
-	return 0;
+	return ovl_copy_up_start(ctx->dentry);
 }
 
 static int ovl_copy_up_tmpfile_prepare(struct ovl_copy_up_ctx *ctx)
@@ -514,6 +531,7 @@ static int ovl_copy_up_tmpfile_cancel(struct ovl_copy_up_ctx *ctx)
 
 static int ovl_copy_up_tmpfile_release(struct ovl_copy_up_ctx *ctx)
 {
+	ovl_copy_up_end(ctx->dentry);
 	return 0;
 }
 
@@ -599,6 +617,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		.link = NULL,
 		.tempdir = ovl_workdir(dentry),
 	};
+	const struct ovl_copy_up_ops *ops;
 
 	if (WARN_ON(!ctx.tempdir))
 		return -EROFS;
@@ -625,35 +644,23 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	}
 
 	/* Should we copyup with O_TMPFILE or with workdir? */
-	if (S_ISREG(stat->mode) && ofs->tmpfile) {
-		err = ovl_copy_up_start(dentry);
-		/* err < 0: interrupted, err > 0: raced with another copy-up */
-		if (unlikely(err)) {
-			pr_debug("ovl_copy_up_start(%pd2) = %i\n", dentry, err);
-			if (err > 0)
-				err = 0;
-			goto out_done;
-		}
-
-		err = ovl_copy_up_locked(&ctx, &ovl_copy_up_tmpfile_ops);
-		ovl_copy_up_end(dentry);
+	if (S_ISREG(stat->mode) && ofs->tmpfile)
+		ops = &ovl_copy_up_tmpfile_ops;
+	else
+		ops = &ovl_copy_up_workdir_ops;
+
+	err = ops->aquire(&ctx);
+	/* err < 0: interrupted, err > 0: raced with another copy-up */
+	if (unlikely(err)) {
+		pr_debug("%s(%pd2): aquire = %i\n", __func__, dentry, err);
+		if (err > 0)
+			err = 0;
 		goto out_done;
 	}
 
-	err = -EIO;
-	if (lock_rename(ctx.tempdir, ctx.upperdir) != NULL) {
-		pr_err("overlayfs: failed to lock workdir+upperdir\n");
-		goto out_unlock;
-	}
-	if (ovl_dentry_upper(dentry)) {
-		/* Raced with another copy-up?  Nothing to do, then... */
-		err = 0;
-		goto out_unlock;
-	}
+	err = ovl_copy_up_locked(&ctx, ops);
+	ops->release(&ctx);
 
-	err = ovl_copy_up_locked(&ctx, &ovl_copy_up_workdir_ops);
-out_unlock:
-	unlock_rename(ctx.tempdir, ctx.upperdir);
 out_done:
 	do_delayed_call(&done);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 18/20] ovl: implement index dir copy up method
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (16 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 17/20] ovl: generalize ovl_copy_up_one() " Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 19/20] ovl: handle race of concurrent lower hardlinks copy up Amir Goldstein
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Implement a copy up method using index dir to prevent breaking lower
hardlinks on copy up.

This method requires that the inodes index dir feature was enabled and
that all underlying fs support NFS export.

On the first lower hardlink copy up, upper file is created in index dir,
named after the hex representation of the lower origin inode file handle.
On the second lower hardlink copy up, upper file is found in index dir,
by the same lower handle key.
On either case, the upper indexed inode is then linked to the copy up
upper path.

The index entry remains linked for future lower hardlink copy up and for
lower to upper inode map, that is needed for exporting overlayfs to NFS.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 200 insertions(+), 9 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 843ba9ca7bfc..f08220c82c77 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -345,7 +345,8 @@ static int ovl_copy_up_inode(struct dentry *dentry, struct dentry *temp,
 
 	/*
 	 * Store identifier of lower inode in upper inode xattr to
-	 * allow lookup of the copy up origin inode.
+	 * allow lookup of the copy up origin inode. We do this last
+	 * to bless the inode, in case it was created in the index dir.
 	 */
 	err = ovl_set_origin(dentry, lowerpath->dentry, temp);
 
@@ -365,6 +366,7 @@ struct ovl_copy_up_ctx {
 	struct dentry *tempdir;
 	struct dentry *upper;
 	struct dentry *temp;
+	bool created;
 };
 
 struct ovl_copy_up_ops {
@@ -543,6 +545,187 @@ static const struct ovl_copy_up_ops ovl_copy_up_tmpfile_ops = {
 	.release = ovl_copy_up_tmpfile_release,
 };
 
+/*
+ * Copy up operations using index dir.
+ * Upper file is created in index dir, copied and linked to upperdir.
+ * The index entry remains to be used for mapping lower inode to upper inode.
+ */
+static int ovl_copy_up_indexdir_aquire(struct ovl_copy_up_ctx *ctx)
+{
+	/* TODO: handle race of concurrent lower hardlinks copy up */
+	return ovl_copy_up_start(ctx->dentry);
+}
+
+static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
+{
+	struct dentry *upper;
+	struct dentry *index;
+	struct inode *inode;
+	int err;
+	struct cattr cattr = {
+		/* Can't properly set mode on creation because of the umask */
+		.mode = ctx->stat->mode & S_IFMT,
+		.rdev = ctx->stat->rdev,
+		.link = ctx->link,
+	};
+
+	upper = lookup_one_len_unlocked(ctx->dentry->d_name.name, ctx->upperdir,
+					ctx->dentry->d_name.len);
+	if (IS_ERR(upper))
+		return PTR_ERR(upper);
+
+	index = dget(ovl_dentry_index(ctx->dentry));
+	BUG_ON(!index);
+	inode = d_inode(index);
+	if (inode) {
+		/* Another lower hardlink already copied-up? */
+		err = -EEXIST;
+		if ((inode->i_mode & S_IFMT) != cattr.mode)
+			goto out_dput;
+
+		err = -ENOENT;
+		if (!inode->i_nlink)
+			goto out_dput;
+
+		/*
+		 * Verify that found index is a copy up of lower inode.
+		 * If index inode doesn't point back to lower inode via
+		 * origin file handle, then this is either an in-progress
+		 * copy up or leftover from index dir before copying layers.
+		 * In both cases, we cannot use this index and must fail the
+		 * copy up. The in-progress case will return -EEXISTS and the
+		 * leftover case will return -ESTALE.
+		 */
+		err = ovl_verify_origin(index, ctx->lowerpath->mnt,
+					ctx->lowerpath->dentry, false);
+		if (err) {
+			if (err == -ENODATA)
+				err = -EEXIST;
+			goto out_dput;
+		}
+
+		if (inode->i_nlink < 2) {
+			/*
+			 * An orphan index inode can be created by copying up
+			 * a lower hardlink alias and then unlinking it. From
+			 * overlayfs perspective, this inode may still live if
+			 * there are more lower hardlinks and it should contain
+			 * the data of the upper inode that was unlinked. So if
+			 * an orphan inode is found in the index dir and we
+			 * should reuse it on copy up of another lower alias.
+			 *
+			 * TODO: keep account of nlink incremented by copy up
+			 * and account of nlink decremented by lower cover up.
+			 * When copyup_nlink + coverup_nlink == origin_nlink
+			 * and index_nlink == 1, need to replace the index entry
+			 * with a whiteout because all overlay references to the
+			 * index are gone.
+			 */
+			pr_warn_ratelimited("overlayfs: linking to orphan upper (%pd2, ino=%lu)\n",
+					    index, inode->i_ino);
+		}
+
+		/* Link to existing upper without copying lower */
+		err = 1;
+		goto out;
+	}
+
+	inode_lock_nested(d_inode(ctx->tempdir), I_MUTEX_PARENT);
+	err = ovl_create_real(d_inode(ctx->tempdir), index, &cattr, NULL, true);
+	if (!err)
+		ctx->created = true;
+	inode_unlock(d_inode(ctx->tempdir));
+	if (err)
+		goto out_dput;
+
+out:
+	ctx->upper = upper;
+	ctx->temp = index;
+	return err;
+
+out_dput:
+	pr_warn_ratelimited("overlayfs: failed to create index entry (%pd2, ino=%lu, err=%i)\n",
+			    index, inode ? inode->i_ino : 0, err);
+	pr_warn_ratelimited("overlayfs: try clearing index dir or mounting with '-o index=off' to disable inodes index.\n");
+	dput(upper);
+	dput(index);
+	/* In case a bad/deleted index inode is cached in overlay dentry */
+	d_drop(ctx->dentry);
+	return err;
+}
+
+static int ovl_copy_up_indexdir_commit(struct ovl_copy_up_ctx *ctx)
+{
+	int err;
+
+	inode_lock_nested(d_inode(ctx->upperdir), I_MUTEX_PARENT);
+	/* link the sucker ;) */
+	err = ovl_do_link(ctx->temp, d_inode(ctx->upperdir), ctx->upper, true);
+	/*
+	 * Overlay inode nlink doesn't account for lower hardlinks that haven't
+	 * been copied up, so we need to update it on copy up. Otherwise, user
+	 * could decrement nlink below zero by unlinking copied up uppers.
+	 * On the first copy up, we set nlink to 1 (excluding the index entry)
+	 * and on following copy ups we increment it. In between ovl_link
+	 * could add more upper hardlinks and increment nlink as well.
+	 */
+	if (!err) {
+		if (ctx->created)
+			set_nlink(d_inode(ctx->dentry), 1);
+		else
+			inc_nlink(d_inode(ctx->dentry));
+		/* Restore timestamps on parent (best effort) */
+		ovl_set_timestamps(ctx->upperdir, &ctx->pstat);
+	}
+	inode_unlock(d_inode(ctx->upperdir));
+
+	return err;
+}
+
+static int ovl_copy_up_indexdir_cancel(struct ovl_copy_up_ctx *ctx)
+{
+	struct inode *inode = d_inode(ctx->temp);
+
+	if (WARN_ON(!inode))
+		return 0;
+
+	/* Cleanup prepared index entry only if we created it */
+	if (!ctx->created)
+		return 0;
+
+	inode_lock_nested(d_inode(ctx->tempdir), I_MUTEX_PARENT);
+
+	/*
+	 * We must not cleanup an already hardlinked index.
+	 */
+	if (inode->i_nlink != 1)
+		goto out_unlock;
+
+	pr_warn_ratelimited("overlayfs: cleanup bad index (%pd2, ino=%lu)\n",
+			    ctx->temp, inode->i_ino);
+	ovl_cleanup(d_inode(ctx->tempdir), ctx->temp);
+	/* Bad index inode is still cached in overlay dentry */
+	d_drop(ctx->dentry);
+
+out_unlock:
+	inode_unlock(d_inode(ctx->tempdir));
+	return 0;
+}
+
+static int ovl_copy_up_indexdir_release(struct ovl_copy_up_ctx *ctx)
+{
+	ovl_copy_up_end(ctx->dentry);
+	return 0;
+}
+
+static const struct ovl_copy_up_ops ovl_copy_up_indexdir_ops = {
+	.aquire = ovl_copy_up_indexdir_aquire,
+	.prepare = ovl_copy_up_indexdir_prepare,
+	.commit = ovl_copy_up_indexdir_commit,
+	.cancel = ovl_copy_up_indexdir_cancel,
+	.release = ovl_copy_up_indexdir_release,
+};
+
 static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
 			      const struct ovl_copy_up_ops *ops)
 {
@@ -565,12 +748,16 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
 		revert_creds(old_creds);
 		put_cred(new_creds);
 	}
-	if (err)
+	if (err < 0)
 		goto out;
 
-	err = ovl_copy_up_inode(dentry, ctx->temp, ctx->lowerpath, ctx->stat);
-	if (err)
-		goto out_cancel;
+	/* err == 1 means we found an existing hardlinked upper inode */
+	if (!err) {
+		err = ovl_copy_up_inode(dentry, ctx->temp, ctx->lowerpath,
+					ctx->stat);
+		if (err)
+			goto out_cancel;
+	}
 
 	err = ops->commit(ctx);
 	if (err)
@@ -584,8 +771,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
 	}
 
 out:
-	dput(ctx->temp);
 	dput(ctx->upper);
+	dput(ctx->temp);
 	return err;
 
 out_cancel:
@@ -610,12 +797,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	struct path parentpath;
 	struct dentry *lowerdentry = lowerpath->dentry;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	bool indexed = ovl_dentry_index(dentry);
 	struct ovl_copy_up_ctx ctx = {
 		.dentry = dentry,
 		.lowerpath = lowerpath,
 		.stat = stat,
 		.link = NULL,
-		.tempdir = ovl_workdir(dentry),
+		.tempdir = indexed ? ovl_indexdir(dentry->d_sb) :
+				     ovl_workdir(dentry),
 	};
 	const struct ovl_copy_up_ops *ops;
 
@@ -643,8 +832,10 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			return PTR_ERR(ctx.link);
 	}
 
-	/* Should we copyup with O_TMPFILE or with workdir? */
-	if (S_ISREG(stat->mode) && ofs->tmpfile)
+	/* Should we copyup with O_TMPFILE with indexdir or with workdir? */
+	if (indexed)
+		ops = &ovl_copy_up_indexdir_ops;
+	else if (S_ISREG(stat->mode) && ofs->tmpfile)
 		ops = &ovl_copy_up_tmpfile_ops;
 	else
 		ops = &ovl_copy_up_workdir_ops;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 19/20] ovl: handle race of concurrent lower hardlinks copy up
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (17 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 18/20] ovl: implement index dir copy up method Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:51 ` [PATCH v2 20/20] ovl: constant inode number for hardlinks Amir Goldstein
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Make sure that task B does not link to an index inode created by
task A, before task A completed copying the inode data.
Task A marks the index inode 'inuse' and task B waits for 'inuse' flag
to be cleared.  Take care of the race between task A canceling the
copy up and unlinking the index inode and task B trying to link to it.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 85 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 69 insertions(+), 16 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index f08220c82c77..f8acf9c8b345 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -552,16 +552,30 @@ static const struct ovl_copy_up_ops ovl_copy_up_tmpfile_ops = {
  */
 static int ovl_copy_up_indexdir_aquire(struct ovl_copy_up_ctx *ctx)
 {
-	/* TODO: handle race of concurrent lower hardlinks copy up */
 	return ovl_copy_up_start(ctx->dentry);
 }
 
+/*
+ * Set ctx->temp to a positive dentry with the index inode.
+ *
+ * Return 0 if entry was created by us, in which case, we also got
+ * inode_inuse lock and we will release it on copy_up_commit after
+ * copying the inode data.
+ *
+ * Return 1 if we found an index inode, in which case, we do not need
+ * to copy the inode data.
+ *
+ * May return -EEXISTS/-ENOENT/-EBUSY on various races of concurrent
+ * lower hardlinks copy up on the same lower inode.
+ */
 static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 {
+	struct inode *dir = d_inode(ctx->tempdir);
 	struct dentry *upper;
 	struct dentry *index;
 	struct inode *inode;
 	int err;
+	bool retried = false;
 	struct cattr cattr = {
 		/* Can't properly set mode on creation because of the umask */
 		.mode = ctx->stat->mode & S_IFMT,
@@ -576,6 +590,7 @@ static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 
 	index = dget(ovl_dentry_index(ctx->dentry));
 	BUG_ON(!index);
+retry:
 	inode = d_inode(index);
 	if (inode) {
 		/* Another lower hardlink already copied-up? */
@@ -583,18 +598,32 @@ static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 		if ((inode->i_mode & S_IFMT) != cattr.mode)
 			goto out_dput;
 
-		err = -ENOENT;
+		/*
+		 * We need to be carefull not to link with a copy in-progress
+		 * index inode. Testing 'inuse' bit with indexdir i_mutex held
+		 * protect us from the time of creating the index entry below
+		 * to the time of ovl_copy_up_indexdir_commit. We also need to
+		 * test i_nlink with indexdir i_mutex held to make sure that
+		 * index has not been unlinked by ovl_copy_up_indexdir_cancel,
+		 * right after clearing 'inuse' bit.
+		 */
+		inode_lock_nested(dir, I_MUTEX_PARENT);
 		if (!inode->i_nlink)
+			err = -ENOENT;
+		else
+			err = wait_on_inode_inuse(inode, TASK_KILLABLE);
+		inode_unlock(dir);
+		if (err)
 			goto out_dput;
 
 		/*
 		 * Verify that found index is a copy up of lower inode.
 		 * If index inode doesn't point back to lower inode via
-		 * origin file handle, then this is either an in-progress
-		 * copy up or leftover from index dir before copying layers.
+		 * origin file handle, then this is either a leftover from
+		 * failed copy up or an index dir entry before copying layers.
 		 * In both cases, we cannot use this index and must fail the
-		 * copy up. The in-progress case will return -EEXISTS and the
-		 * leftover case will return -ESTALE.
+		 * copy up. The failed copy up case will return -EEXISTS and
+		 * the copying layers case will return -ESTALE.
 		 */
 		err = ovl_verify_origin(index, ctx->lowerpath->mnt,
 					ctx->lowerpath->dentry, false);
@@ -630,11 +659,29 @@ static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 		goto out;
 	}
 
-	inode_lock_nested(d_inode(ctx->tempdir), I_MUTEX_PARENT);
-	err = ovl_create_real(d_inode(ctx->tempdir), index, &cattr, NULL, true);
+	/* Raced on creating index and not found on retry? */
+	err = -ENOENT;
+	if (retried)
+		goto out_dput;
+
+	inode_lock_nested(dir, I_MUTEX_PARENT);
+	err = ovl_create_real(dir, index, &cattr, NULL, true);
 	if (!err)
 		ctx->created = true;
-	inode_unlock(d_inode(ctx->tempdir));
+	/*
+	 * Mark index inode 'inuse' before copying inode data to avoid racing
+	 * with copy up of another lower hardlink of the same lower inode.
+	 */
+	if (!err && !inode_inuse_trylock(index->d_inode))
+		err = -EBUSY;
+	inode_unlock(dir);
+
+	/* Raced with copy up of another lower hardlink of the same inode? */
+	if (err == -EEXIST) {
+		retried = true;
+		goto retry;
+	}
+
 	if (err)
 		goto out_dput;
 
@@ -679,6 +726,10 @@ static int ovl_copy_up_indexdir_commit(struct ovl_copy_up_ctx *ctx)
 	}
 	inode_unlock(d_inode(ctx->upperdir));
 
+	/* Allow other lower hardlinks to link with our creation */
+	if (ctx->created)
+		inode_inuse_unlock(d_inode(ctx->temp));
+
 	return err;
 }
 
@@ -696,18 +747,20 @@ static int ovl_copy_up_indexdir_cancel(struct ovl_copy_up_ctx *ctx)
 	inode_lock_nested(d_inode(ctx->tempdir), I_MUTEX_PARENT);
 
 	/*
-	 * We must not cleanup an already hardlinked index.
+	 * We must clear 'inuse' flag before unlink, but we need to take care
+	 * because this could allow another copy up to race with this cleanup,
+	 * find the unborn index inode that looks like an orphan and link it,
+	 * before we unlink the unborn index entry. So before linking an index
+	 * inode we must take indexdir i_mutex, wait for 'inuse' flag to be
+	 * cleared and test that inode i_nlink is positive.
 	 */
-	if (inode->i_nlink != 1)
-		goto out_unlock;
-
-	pr_warn_ratelimited("overlayfs: cleanup bad index (%pd2, ino=%lu)\n",
-			    ctx->temp, inode->i_ino);
+	inode_inuse_unlock(inode);
+	pr_warn_ratelimited("overlayfs: cleanup bad index (%pd2, ino=%lu, nlink=%u)\n",
+			    ctx->temp, inode->i_ino, inode->i_nlink);
 	ovl_cleanup(d_inode(ctx->tempdir), ctx->temp);
 	/* Bad index inode is still cached in overlay dentry */
 	d_drop(ctx->dentry);
 
-out_unlock:
 	inode_unlock(d_inode(ctx->tempdir));
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 20/20] ovl: constant inode number for hardlinks
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (18 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 19/20] ovl: handle race of concurrent lower hardlinks copy up Amir Goldstein
@ 2017-06-07  7:51 ` Amir Goldstein
  2017-06-07  7:54 ` [PATCH v2 00/20] Overlayfs inodes index Miklos Szeredi
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

When inodes index feature is enabled, it is safe to return the copy up
origin st_ino for indexed upper hardlinks, because all indexed upper
hardlinks are represented by the same overlay inode as the copy up
origin.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 6e85a2a7fcdc..8a30dc6cd42d 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -96,11 +96,14 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 
 			WARN_ON_ONCE(stat->dev != lowerstat.dev);
 			/*
-			 * Lower hardlinks are broken on copy up to different
+			 * Lower hardlinks may be broken on copy up to different
 			 * upper files, so we cannot use the lower origin st_ino
 			 * for those different files, even for the same fs case.
+			 * Indexed upper inodes are safe and non-hardlinked
+			 * origin inodes are safe.
 			 */
-			if (is_dir || lowerstat.nlink == 1)
+			if (is_dir || lowerstat.nlink == 1 ||
+			    OVL_TYPE_INDEX(type))
 				stat->ino = lowerstat.ino;
 		}
 		stat->dev = dentry->d_sb->s_dev;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 00/20] Overlayfs inodes index
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (19 preceding siblings ...)
  2017-06-07  7:51 ` [PATCH v2 20/20] ovl: constant inode number for hardlinks Amir Goldstein
@ 2017-06-07  7:54 ` Miklos Szeredi
  2017-06-07  7:58   ` Amir Goldstein
  2017-06-07 14:58 ` Amir Goldstein
  2017-06-07 17:17 ` [PATCH v2 00/20] Overlayfs inodes index J. Bruce Fields
  22 siblings, 1 reply; 41+ messages in thread
From: Miklos Szeredi @ 2017-06-07  7:54 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, Jun 7, 2017 at 9:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Miklos,
>
> This set is an independent series a top of current overlayfs-next.
> I yanked all the dependencies from previous postings (consistent d_ino,
> dir_lock, verify_lower) and included everything needed in this posting.
>
> This work introcuding the inodes index opt-in feature, which provides:
> - Hardlinks are not broken on copy up
> - Infrastructure for overlayfs NFS export
>
> Hardlink copy up tests including concurrent copy up of lower hardlinks
> are available on my xfstests dev branch [1].
>
> There are some more TODO items before this is ready for v4.13, but I'd
> love to hear what you think about the direction this has taken so far.

Do you have a public branch for this?

Thanks,
Miklos

>
> Thanks,
> Amir.
>
> TODO:
> - Consistency of lower and upper hardlinks (*)
> - Cleanup stale and orphan index entries on mount (**)
> - Document the inodes index feature
>
> (*) When any lower hardlink has been copied up, we get the indexed
>     upper inode on lookup of all lower hardlinks and since they all
>     share the same overlay inode, they have the same 'realinode'.
>     Opening the lower hardlinks for read gives that lower inode
>     and not the indexed upper 'realinode'. The tests in [1] demostrate
>     this problem.
> (**) A stale index entry has a missing or stale 'origin' xattr. An
>      orphan index entry has nlink 1 and all lower hardlinks are covered.
>
> v2:
> - Rebase on top of overalyfs-next with all dependencies
> - Remove 'verify_lower' dependency and simplify
> - Lookup index dentry on ovl_lookup()
> - Don't hash broken overlay hardlinks by origin
> - Fix nlink count of overlay inodes
> - Constant st_ino for indexed hardlinks
>
> v1:
> - Upper/work dir exclusive lock
> - Introduce verify_lower mount option
> - Hash overlay inodes by origin
> - Introduce inodes index feature
> - Copy up hardlinks using inodes index
>
> [1] https://github.com/amir73il/xfstests/commits/overlayfs-devel
>
> Amir Goldstein (20):
>   vfs: introduce inode 'inuse' lock
>   ovl: get exclusive ownership on upper/work dirs
>   ovl: relax same fs constrain for ovl_check_origin()
>   ovl: generalize ovl_create_workdir()
>   ovl: introduce the inodes index dir feature
>   ovl: verify upper root dir matches lower root dir
>   ovl: verify index dir matches upper dir
>   ovl: lookup index entry for non-dir
>   ovl: move inode helpers to inode.c
>   ovl: use ovl_inode_init() for initializing new inode
>   ovl: hash overlay non-dir inodes by copy up origin inode
>   ovl: fix nlink leak in ovl_rename()
>   ovl: adjust overlay inode nlink for indexed inodes
>   ovl: defer upper dir lock to tempfile link
>   ovl: factor out ovl_copy_up_inode() helper
>   ovl: generalize ovl_copy_up_locked() using actors
>   ovl: generalize ovl_copy_up_one() using actors
>   ovl: implement index dir copy up method
>   ovl: handle race of concurrent lower hardlinks copy up
>   ovl: constant inode number for hardlinks
>
>  fs/inode.c               |  74 ++++++
>  fs/overlayfs/Kconfig     |  20 ++
>  fs/overlayfs/copy_up.c   | 648 ++++++++++++++++++++++++++++++++++++++---------
>  fs/overlayfs/dir.c       |  23 +-
>  fs/overlayfs/inode.c     | 138 +++++++++-
>  fs/overlayfs/namei.c     | 241 +++++++++++++++---
>  fs/overlayfs/overlayfs.h |  52 ++--
>  fs/overlayfs/ovl_entry.h |   7 +
>  fs/overlayfs/super.c     | 194 ++++++++++++--
>  fs/overlayfs/util.c      |  45 ++--
>  include/linux/fs.h       |  16 ++
>  11 files changed, 1235 insertions(+), 223 deletions(-)
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 00/20] Overlayfs inodes index
  2017-06-07  7:54 ` [PATCH v2 00/20] Overlayfs inodes index Miklos Szeredi
@ 2017-06-07  7:58   ` Amir Goldstein
  0 siblings, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07  7:58 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, Jun 7, 2017 at 10:54 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Jun 7, 2017 at 9:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Miklos,
>>
>> This set is an independent series a top of current overlayfs-next.
>> I yanked all the dependencies from previous postings (consistent d_ino,
>> dir_lock, verify_lower) and included everything needed in this posting.
>>
>> This work introcuding the inodes index opt-in feature, which provides:
>> - Hardlinks are not broken on copy up
>> - Infrastructure for overlayfs NFS export
>>
>> Hardlink copy up tests including concurrent copy up of lower hardlinks
>> are available on my xfstests dev branch [1].
>>
>> There are some more TODO items before this is ready for v4.13, but I'd
>> love to hear what you think about the direction this has taken so far.
>
> Do you have a public branch for this?

https://github.com/amir73il/linux/commits/ovl-hardlinks

(it includes 4 unrelated vfs/tmpfs patches)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 00/20] Overlayfs inodes index
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (20 preceding siblings ...)
  2017-06-07  7:54 ` [PATCH v2 00/20] Overlayfs inodes index Miklos Szeredi
@ 2017-06-07 14:58 ` Amir Goldstein
  2017-06-08 15:00   ` [PATCH v2 21/23] ovl: use inodes index on readonly mount Amir Goldstein
  2017-06-07 17:17 ` [PATCH v2 00/20] Overlayfs inodes index J. Bruce Fields
  22 siblings, 1 reply; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07 14:58 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, overlayfs, linux-fsdevel

On Wed, Jun 7, 2017 at 10:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Miklos,
>
> This set is an independent series a top of current overlayfs-next.
> I yanked all the dependencies from previous postings (consistent d_ino,
> dir_lock, verify_lower) and included everything needed in this posting.
>
> This work introcuding the inodes index opt-in feature, which provides:
> - Hardlinks are not broken on copy up
> - Infrastructure for overlayfs NFS export
>
> Hardlink copy up tests including concurrent copy up of lower hardlinks
> are available on my xfstests dev branch [1].
>
> There are some more TODO items before this is ready for v4.13, but I'd
> love to hear what you think about the direction this has taken so far.
>
> Thanks,
> Amir.
>
> TODO:
> - Consistency of lower and upper hardlinks (*)
> - Cleanup stale and orphan index entries on mount (**)
> - Document the inodes index feature
>
> (*) When any lower hardlink has been copied up, we get the indexed
>     upper inode on lookup of all lower hardlinks and since they all
>     share the same overlay inode, they have the same 'realinode'.
>     Opening the lower hardlinks for read gives that lower inode
>     and not the indexed upper 'realinode'. The tests in [1] demostrate
>     this problem.

Miklos,

I have fix for the lower/upper hardlink inconsistency issue.
Pushed the patches to branch:
https://github.com/amir73il/linux/commits/ovl-hardlinks-rocopyup

It now passes all the xfstests hardlink copy up tests [1].
Note that you will need to use my unionmount-testsuite dev branch [2]
to test with inodes index enabled.
The one test that fails the constant inode verification on v4.12-rc1
(due to hardlink non constant ino) now passes:
./run --ov=1 --samefs rename-mass-5

Thanks,
Amir.

[1] https://github.com/amir73il/xfstests/commits/overlayfs-devel
[2] https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 00/20] Overlayfs inodes index
  2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
                   ` (21 preceding siblings ...)
  2017-06-07 14:58 ` Amir Goldstein
@ 2017-06-07 17:17 ` J. Bruce Fields
  2017-06-07 18:36   ` Amir Goldstein
  22 siblings, 1 reply; 41+ messages in thread
From: J. Bruce Fields @ 2017-06-07 17:17 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Jun 07, 2017 at 10:51:04AM +0300, Amir Goldstein wrote:
> Miklos,
> 
> This set is an independent series a top of current overlayfs-next.
> I yanked all the dependencies from previous postings (consistent d_ino,
> dir_lock, verify_lower) and included everything needed in this posting.
> 
> This work introcuding the inodes index opt-in feature, which provides:
> - Hardlinks are not broken on copy up
> - Infrastructure for overlayfs NFS export

Minor suggestion: for the lazy readers among us, a quick summary of the
implementation might be helpful.  (Could take most of it from just
05/20?)

--b.

> 
> Hardlink copy up tests including concurrent copy up of lower hardlinks
> are available on my xfstests dev branch [1].
> 
> There are some more TODO items before this is ready for v4.13, but I'd
> love to hear what you think about the direction this has taken so far.
> 
> Thanks,
> Amir.
> 
> TODO:
> - Consistency of lower and upper hardlinks (*)
> - Cleanup stale and orphan index entries on mount (**)
> - Document the inodes index feature
> 
> (*) When any lower hardlink has been copied up, we get the indexed
>     upper inode on lookup of all lower hardlinks and since they all
>     share the same overlay inode, they have the same 'realinode'.
>     Opening the lower hardlinks for read gives that lower inode
>     and not the indexed upper 'realinode'. The tests in [1] demostrate
>     this problem.
> (**) A stale index entry has a missing or stale 'origin' xattr. An
>      orphan index entry has nlink 1 and all lower hardlinks are covered.
> 
> v2:
> - Rebase on top of overalyfs-next with all dependencies
> - Remove 'verify_lower' dependency and simplify
> - Lookup index dentry on ovl_lookup()
> - Don't hash broken overlay hardlinks by origin
> - Fix nlink count of overlay inodes
> - Constant st_ino for indexed hardlinks
> 
> v1:
> - Upper/work dir exclusive lock
> - Introduce verify_lower mount option
> - Hash overlay inodes by origin
> - Introduce inodes index feature
> - Copy up hardlinks using inodes index
> 
> [1] https://github.com/amir73il/xfstests/commits/overlayfs-devel
> 
> Amir Goldstein (20):
>   vfs: introduce inode 'inuse' lock
>   ovl: get exclusive ownership on upper/work dirs
>   ovl: relax same fs constrain for ovl_check_origin()
>   ovl: generalize ovl_create_workdir()
>   ovl: introduce the inodes index dir feature
>   ovl: verify upper root dir matches lower root dir
>   ovl: verify index dir matches upper dir
>   ovl: lookup index entry for non-dir
>   ovl: move inode helpers to inode.c
>   ovl: use ovl_inode_init() for initializing new inode
>   ovl: hash overlay non-dir inodes by copy up origin inode
>   ovl: fix nlink leak in ovl_rename()
>   ovl: adjust overlay inode nlink for indexed inodes
>   ovl: defer upper dir lock to tempfile link
>   ovl: factor out ovl_copy_up_inode() helper
>   ovl: generalize ovl_copy_up_locked() using actors
>   ovl: generalize ovl_copy_up_one() using actors
>   ovl: implement index dir copy up method
>   ovl: handle race of concurrent lower hardlinks copy up
>   ovl: constant inode number for hardlinks
> 
>  fs/inode.c               |  74 ++++++
>  fs/overlayfs/Kconfig     |  20 ++
>  fs/overlayfs/copy_up.c   | 648 ++++++++++++++++++++++++++++++++++++++---------
>  fs/overlayfs/dir.c       |  23 +-
>  fs/overlayfs/inode.c     | 138 +++++++++-
>  fs/overlayfs/namei.c     | 241 +++++++++++++++---
>  fs/overlayfs/overlayfs.h |  52 ++--
>  fs/overlayfs/ovl_entry.h |   7 +
>  fs/overlayfs/super.c     | 194 ++++++++++++--
>  fs/overlayfs/util.c      |  45 ++--
>  include/linux/fs.h       |  16 ++
>  11 files changed, 1235 insertions(+), 223 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 00/20] Overlayfs inodes index
  2017-06-07 17:17 ` [PATCH v2 00/20] Overlayfs inodes index J. Bruce Fields
@ 2017-06-07 18:36   ` Amir Goldstein
  2017-06-07 18:59     ` J. Bruce Fields
  0 siblings, 1 reply; 41+ messages in thread
From: Amir Goldstein @ 2017-06-07 18:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Miklos Szeredi, Al Viro, overlayfs, linux-fsdevel

On Wed, Jun 7, 2017 at 8:17 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Jun 07, 2017 at 10:51:04AM +0300, Amir Goldstein wrote:
>> Miklos,
>>
>> This set is an independent series a top of current overlayfs-next.
>> I yanked all the dependencies from previous postings (consistent d_ino,
>> dir_lock, verify_lower) and included everything needed in this posting.
>>
>> This work introcuding the inodes index opt-in feature, which provides:
>> - Hardlinks are not broken on copy up
>> - Infrastructure for overlayfs NFS export
>
> Minor suggestion: for the lazy readers among us, a quick summary of the
> implementation might be helpful.  (Could take most of it from just
> 05/20?)
>

Heh, for the *very* lazy reader, I'll paste the referred snip from 05/20:
"The index dir will contain hardlinks to upper inodes, named after the
hex representation of their origin lower inodes.
The index dir is going to be used to prevent breaking lower hardlinks
on copy up and to implement overlayfs NFS export."

That's me being very very brief about the what and the why, but nothing
about the how, so here goes:

The challenge with exporting overlayfs handled is that an object can start
its life as a lower inode and transition into an upper inode (copy up).
When the object is lower, the only way to encode it is by encoding the
underlying fs lower inode, say encode(/lower/foo)=0x0001.
The gist of the inodes index concept is that when decoding the handle,
we need to check if the object was copied up.
When an object is copied up, it creates an index entry at:
workdir/index/0001, which is a hardlink to /upper/foo,
so we have a way to get from the lower handle to the upper inode if it
exists.

The same lower -> upper mapping also helps herding a lower hardlinks
bundle with their upper hardlinks bundle and refer to all as a single inode
object. Most of the gritty details are in patches {08,11,18}/20.

With this series, which is aiming for v4.13, NFS export is not there yet,
because lower directories are not yet indexed (and more).
I do, however, have a POC of non-persistent overlay NFS export [1]
with a detailed documentation patch about the missing bits for persistent
handles. I tested this POC with xfstest generic/426.

Hope this summary was quick enough ;-)

Cheers,
Amir.

[1] https://github.com/amir73il/linux/commits/ovl-nfs-export-wip

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 00/20] Overlayfs inodes index
  2017-06-07 18:36   ` Amir Goldstein
@ 2017-06-07 18:59     ` J. Bruce Fields
  0 siblings, 0 replies; 41+ messages in thread
From: J. Bruce Fields @ 2017-06-07 18:59 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, overlayfs, linux-fsdevel

On Wed, Jun 07, 2017 at 09:36:46PM +0300, Amir Goldstein wrote:
> On Wed, Jun 7, 2017 at 8:17 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Wed, Jun 07, 2017 at 10:51:04AM +0300, Amir Goldstein wrote:
> >> Miklos,
> >>
> >> This set is an independent series a top of current overlayfs-next.
> >> I yanked all the dependencies from previous postings (consistent d_ino,
> >> dir_lock, verify_lower) and included everything needed in this posting.
> >>
> >> This work introcuding the inodes index opt-in feature, which provides:
> >> - Hardlinks are not broken on copy up
> >> - Infrastructure for overlayfs NFS export
> >
> > Minor suggestion: for the lazy readers among us, a quick summary of the
> > implementation might be helpful.  (Could take most of it from just
> > 05/20?)
> >
> 
> Heh, for the *very* lazy reader, I'll paste the referred snip from 05/20:
> "The index dir will contain hardlinks to upper inodes, named after the
> hex representation of their origin lower inodes.
> The index dir is going to be used to prevent breaking lower hardlinks
> on copy up and to implement overlayfs NFS export."
> 
> That's me being very very brief about the what and the why, but nothing
> about the how, so here goes:
> 
> The challenge with exporting overlayfs handled is that an object can start
> its life as a lower inode and transition into an upper inode (copy up).
> When the object is lower, the only way to encode it is by encoding the
> underlying fs lower inode, say encode(/lower/foo)=0x0001.
> The gist of the inodes index concept is that when decoding the handle,
> we need to check if the object was copied up.
> When an object is copied up, it creates an index entry at:
> workdir/index/0001, which is a hardlink to /upper/foo,
> so we have a way to get from the lower handle to the upper inode if it
> exists.
> 
> The same lower -> upper mapping also helps herding a lower hardlinks
> bundle with their upper hardlinks bundle and refer to all as a single inode
> object. Most of the gritty details are in patches {08,11,18}/20.
> 
> With this series, which is aiming for v4.13, NFS export is not there yet,
> because lower directories are not yet indexed (and more).
> I do, however, have a POC of non-persistent overlay NFS export [1]
> with a detailed documentation patch about the missing bits for persistent
> handles. I tested this POC with xfstest generic/426.
> 
> Hope this summary was quick enough ;-)

Yes, thanks!

> 
> Cheers,
> Amir.
> 
> [1] https://github.com/amir73il/linux/commits/ovl-nfs-export-wip

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-07  7:51 ` [PATCH v2 08/20] ovl: lookup index entry for non-dir Amir Goldstein
@ 2017-06-08 12:11   ` Miklos Szeredi
  2017-06-08 14:48     ` Amir Goldstein
  0 siblings, 1 reply; 41+ messages in thread
From: Miklos Szeredi @ 2017-06-08 12:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, Jun 7, 2017 at 9:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> When inodes index feature is enabled, lookup in indexdir for the index
> entry of lower real inode or copy up origin inode. The index entry name
> is the hex representation of the lower inode file handle.
>
> If the index dentry in negative, then either no lower aliases have been
> copied up yet, or aliases have been copied up in older kernels and are
> not indexed.
>
> If the index dentry for a copy up origin inode is positive, but points
> to an inode different than the upper inode, then either the upper inode
> has been copied up and not indexed or it was indexed, but since then
> index dir was cleared. Either way, that index cannot be used to indentify
> the overlay inode.
>
> If a negative index dentry is found or a positive dentry that matches the
> upper inode, store it in the overlay dentry to be used later. A non-upper
> overlay dentry may still have a positive index from copy up of another
> lower hardlink. This situation can be tested with the path type macros
> OVL_TYPE_INDEX(type) && !OVL_TYPE_UPPER(type).
>
> This is going to be used to prevent breaking hardlinks on copy up.

Why is a negative index (or any index, for that matter) stored in the
overlay dentry?  This seems a big waste, since the index dentry will
be allocated for all lower files, yet never used unless copied up.

Index is used:

  - at lookup need to find any copied up alias
  - at copyup need to set up new index

In both cases we can just do a fresh lookup in the index dir with
locks held (which is a good idea anyway).

Something related: should upperdentry of aliases be hardlinked to the
index on lookup and copy-up?  Or should the "hardlink-up" be deferred
until rename?  I think updating as soon as possible is the simpler of
the two.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-08 12:11   ` Miklos Szeredi
@ 2017-06-08 14:48     ` Amir Goldstein
  2017-06-08 15:17       ` Miklos Szeredi
  0 siblings, 1 reply; 41+ messages in thread
From: Amir Goldstein @ 2017-06-08 14:48 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Thu, Jun 8, 2017 at 3:11 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Jun 7, 2017 at 9:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> When inodes index feature is enabled, lookup in indexdir for the index
>> entry of lower real inode or copy up origin inode. The index entry name
>> is the hex representation of the lower inode file handle.
>>
>> If the index dentry in negative, then either no lower aliases have been
>> copied up yet, or aliases have been copied up in older kernels and are
>> not indexed.
>>
>> If the index dentry for a copy up origin inode is positive, but points
>> to an inode different than the upper inode, then either the upper inode
>> has been copied up and not indexed or it was indexed, but since then
>> index dir was cleared. Either way, that index cannot be used to indentify
>> the overlay inode.
>>
>> If a negative index dentry is found or a positive dentry that matches the
>> upper inode, store it in the overlay dentry to be used later. A non-upper
>> overlay dentry may still have a positive index from copy up of another
>> lower hardlink. This situation can be tested with the path type macros
>> OVL_TYPE_INDEX(type) && !OVL_TYPE_UPPER(type).
>>
>> This is going to be used to prevent breaking hardlinks on copy up.
>
> Why is a negative index (or any index, for that matter) stored in the
> overlay dentry?

One of the reasons it is stored in the overlay dentry is to be able to
test OVL_TYPE_INDEX() on the dentry, for example, in:
"ovl: adjust overlay inode nlink for indexed inodes".

OVL_TYPE_INDEX() is only interested in positive index, so it may seem
like a better idea to save memory and store the index dentry on lookup
only if it is positive, but then OVL_TYPE_INDEX() will be wrong for lower
aliases that became lower indexed aliases due to another alias copy up.
This is important for reasons that are mentioned below.

There may be an opportunity for a massive optimization though.
For all practical purpose, ovl_type_indexed_lower() should always be
false for lower with nlink == 1, so there is no reason so store the negative
index dentry in that case.

> This seems a big waste, since the index dentry will
> be allocated for all lower files, yet never used unless copied up.
>
> Index is used:
>
>   - at lookup need to find any copied up alias
>   - at copyup need to set up new index

So it has several more subtle uses:
- When whiteout a lower aliases, we need to count down nlink to
  know when we can unlink the an orphan index (TODO)
- Same when renaming over a lower indexed alias
- The lower hardlinks copy up on read [1] is another big user

>
> In both cases we can just do a fresh lookup in the index dir with
> locks held (which is a good idea anyway).
>
> Something related: should upperdentry of aliases be hardlinked to the
> index on lookup and copy-up?  Or should the "hardlink-up" be deferred
> until rename?  I think updating as soon as possible is the simpler of
> the two.
>

Agree that updating "as soon as possible" is simpler.
I went even further with "as soon as possible" with lower hardlinks copy
up on read [1]. There is a subtle inconsistency between ovl_inode_real()
and ovl_{dentry,path}_real() of those lower indexed aliases that is hard
to deal with if we are not hardlinking up on the first access.

But there is another reason we cannot defer indexing until rename -
We need to be able to decode a non-dir file handle if its parent was
renamed, so the index has to be there even if the the upper file itself
was not renamed.

I guess that means we will also need to index directories on copy up.
Too bad. I though it was enough to index directories on rename.

Amir.

[1] https://github.com/amir73il/linux/commits/ovl-hardlinks-rocopyup

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 21/23] ovl: use inodes index on readonly mount
  2017-06-07 14:58 ` Amir Goldstein
@ 2017-06-08 15:00   ` Amir Goldstein
  2017-06-08 15:00     ` [PATCH v2 22/23] ovl: move copy up helpers to copy_up.c Amir Goldstein
  2017-06-08 15:00     ` [PATCH v2 23/23] ovl: copy up on read operations on indexed lower Amir Goldstein
  0 siblings, 2 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-08 15:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

The inodes index is needed also with readonly mount to find indexed
lower hardlinks. Also, a readonly overlay mount can be remounted
readonly and then overlay dentries in cache should already have
the index entry cached.

Therefore, we need to disable the inodes index feature only in case
there are no upper/work dirs, and not when mounting readonly per
user request.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/super.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b53c08f70b1a..4cf18850b4de 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -251,6 +251,12 @@ static int ovl_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return err;
 }
 
+/* Will this overlay be forced to mount/remount ro? */
+static bool ovl_force_readonly(struct ovl_fs *ufs)
+{
+	return (!ufs->upper_mnt || !ufs->workdir);
+}
+
 /**
  * ovl_show_options
  *
@@ -282,7 +288,7 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 {
 	struct ovl_fs *ufs = sb->s_fs_info;
 
-	if (!(*flags & MS_RDONLY) && (!ufs->upper_mnt || !ufs->workdir))
+	if (!(*flags & MS_RDONLY) && ovl_force_readonly(ufs))
 		return -EROFS;
 
 	return 0;
@@ -1060,7 +1066,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else if (ufs->upper_mnt->mnt_sb != ufs->same_sb)
 		ufs->same_sb = NULL;
 
-	if (!(sb->s_flags & MS_RDONLY) && ufs->config.index) {
+	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
 		/* Verify lower root is upper root origin */
 		err = ovl_verify_set_origin(upperpath.dentry,
 					    ufs->lower_mnt[0], stack[0].dentry,
@@ -1085,8 +1091,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		} else {
 			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
 		}
-
 	}
+	/* Show index=off/on in /proc/mounts for any of the reasons above */
+	if (!ufs->indexdir)
+		ufs->config.index = false;
 
 	if (remote)
 		sb->s_d_op = &ovl_reval_dentry_operations;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 22/23] ovl: move copy up helpers to copy_up.c
  2017-06-08 15:00   ` [PATCH v2 21/23] ovl: use inodes index on readonly mount Amir Goldstein
@ 2017-06-08 15:00     ` Amir Goldstein
  2017-06-08 15:00     ` [PATCH v2 23/23] ovl: copy up on read operations on indexed lower Amir Goldstein
  1 sibling, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-08 15:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 65 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/overlayfs/inode.c     | 33 ------------------------
 fs/overlayfs/overlayfs.h |  5 +---
 fs/overlayfs/util.c      | 30 ----------------------
 4 files changed, 65 insertions(+), 68 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index f8acf9c8b345..8a297cfc33fb 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -353,6 +353,36 @@ static int ovl_copy_up_inode(struct dentry *dentry, struct dentry *temp,
 	return err;
 }
 
+static int ovl_copy_up_start(struct dentry *dentry)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_entry *oe = dentry->d_fsdata;
+	int err;
+
+	spin_lock(&ofs->copyup_wq.lock);
+	err = wait_event_interruptible_locked(ofs->copyup_wq, !oe->copying);
+	if (!err) {
+		if (oe->__upperdentry)
+			err = 1; /* Already copied up */
+		else
+			oe->copying = true;
+	}
+	spin_unlock(&ofs->copyup_wq.lock);
+
+	return err;
+}
+
+static void ovl_copy_up_end(struct dentry *dentry)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	spin_lock(&ofs->copyup_wq.lock);
+	oe->copying = false;
+	wake_up_locked(&ofs->copyup_wq);
+	spin_unlock(&ofs->copyup_wq.lock);
+}
+
 /*
  * Context and operations for copying up a single lower file.
  */
@@ -911,7 +941,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	return err;
 }
 
-int ovl_copy_up_flags(struct dentry *dentry, int flags)
+static int ovl_copy_up_flags(struct dentry *dentry, int flags)
 {
 	int err = 0;
 	const struct cred *old_cred = ovl_override_creds(dentry->d_sb);
@@ -960,3 +990,36 @@ int ovl_copy_up(struct dentry *dentry)
 {
 	return ovl_copy_up_flags(dentry, 0);
 }
+
+static bool ovl_open_need_copy_up(int flags, enum ovl_path_type type,
+				  struct dentry *realdentry)
+{
+	if (OVL_TYPE_UPPER(type))
+		return false;
+
+	if (special_file(realdentry->d_inode->i_mode))
+		return false;
+
+	if (!(OPEN_FMODE(flags) & FMODE_WRITE) && !(flags & O_TRUNC))
+		return false;
+
+	return true;
+}
+
+int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
+{
+	int err = 0;
+	struct path realpath;
+	enum ovl_path_type type;
+
+	type = ovl_path_real(dentry, &realpath);
+	if (ovl_open_need_copy_up(file_flags, type, realpath.dentry)) {
+		err = ovl_want_write(dentry);
+		if (!err) {
+			err = ovl_copy_up_flags(dentry, file_flags);
+			ovl_drop_write(dentry);
+		}
+	}
+
+	return err;
+}
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 8a30dc6cd42d..db2a63bd8bb9 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -312,39 +312,6 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
 	return acl;
 }
 
-static bool ovl_open_need_copy_up(int flags, enum ovl_path_type type,
-				  struct dentry *realdentry)
-{
-	if (OVL_TYPE_UPPER(type))
-		return false;
-
-	if (special_file(realdentry->d_inode->i_mode))
-		return false;
-
-	if (!(OPEN_FMODE(flags) & FMODE_WRITE) && !(flags & O_TRUNC))
-		return false;
-
-	return true;
-}
-
-int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
-{
-	int err = 0;
-	struct path realpath;
-	enum ovl_path_type type;
-
-	type = ovl_path_real(dentry, &realpath);
-	if (ovl_open_need_copy_up(file_flags, type, realpath.dentry)) {
-		err = ovl_want_write(dentry);
-		if (!err) {
-			err = ovl_copy_up_flags(dentry, file_flags);
-			ovl_drop_write(dentry);
-		}
-	}
-
-	return err;
-}
-
 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags)
 {
 	struct dentry *alias;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index beac2d858689..434870f5bb4b 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -210,8 +210,6 @@ void ovl_dentry_version_inc(struct dentry *dentry);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
 struct file *ovl_path_open(struct path *path, int flags);
-int ovl_copy_up_start(struct dentry *dentry);
-void ovl_copy_up_end(struct dentry *dentry);
 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
 int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
 		       const char *name, const void *value, size_t size,
@@ -251,7 +249,6 @@ int ovl_xattr_get(struct dentry *dentry, const char *name,
 		  void *value, size_t size);
 ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size);
 struct posix_acl *ovl_get_acl(struct inode *inode, int type);
-int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
@@ -307,7 +304,7 @@ void ovl_cleanup(struct inode *dir, struct dentry *dentry);
 
 /* copy_up.c */
 int ovl_copy_up(struct dentry *dentry);
-int ovl_copy_up_flags(struct dentry *dentry, int flags);
+int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
 struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index a28bf55a0d48..fa6c2ae4a747 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -285,36 +285,6 @@ struct file *ovl_path_open(struct path *path, int flags)
 	return dentry_open(path, flags | O_NOATIME, current_cred());
 }
 
-int ovl_copy_up_start(struct dentry *dentry)
-{
-	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
-	struct ovl_entry *oe = dentry->d_fsdata;
-	int err;
-
-	spin_lock(&ofs->copyup_wq.lock);
-	err = wait_event_interruptible_locked(ofs->copyup_wq, !oe->copying);
-	if (!err) {
-		if (oe->__upperdentry)
-			err = 1; /* Already copied up */
-		else
-			oe->copying = true;
-	}
-	spin_unlock(&ofs->copyup_wq.lock);
-
-	return err;
-}
-
-void ovl_copy_up_end(struct dentry *dentry)
-{
-	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	spin_lock(&ofs->copyup_wq.lock);
-	oe->copying = false;
-	wake_up_locked(&ofs->copyup_wq);
-	spin_unlock(&ofs->copyup_wq.lock);
-}
-
 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name)
 {
 	int res;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 23/23] ovl: copy up on read operations on indexed lower
  2017-06-08 15:00   ` [PATCH v2 21/23] ovl: use inodes index on readonly mount Amir Goldstein
  2017-06-08 15:00     ` [PATCH v2 22/23] ovl: move copy up helpers to copy_up.c Amir Goldstein
@ 2017-06-08 15:00     ` Amir Goldstein
  1 sibling, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-08 15:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

With inodes index feature, all lower and upper hardlinks point to
the same overlay inode. However, when a lower hardlink is accessed
for read operation, the real inode operated on is not the same inode
as the real inode for read operation on an upper hardlink.

When accessing a lower hardlink for read, which is already indexed by
an earlier upper hardlink copy up, call ovl_copy_up() to link the
indexed upper on top of the lower hardlink and then operate on the
upper real inode to avoid this inconsistency.

The following test demonstrates the upper/lower hardlinks inconsistency:

$ echo -n a > /lower/foo
$ ln /lower/foo /lower/bar
$ cd /mnt
$ tail foo bar # both aliases are ro lower
==> foo <==
a
==> bar <==
a

$ echo -n b >> foo
$ tail foo bar # foo is rw upper, bar is ro lower
==> foo <==
ab
==> bar <==
a

$ echo -n c >> bar
$ tail foo bar # both aliases are rw upper
==> foo <==
abc
==> bar <==
abc

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 28 ++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/super.c     |  4 +++-
 fs/overlayfs/util.c      | 12 +++++++++++-
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 8a297cfc33fb..27aabed25680 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -1023,3 +1023,31 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
 
 	return err;
 }
+
+/* Copy up on read ops of non-dir indexed lower */
+int ovl_maybe_ro_copy_up(struct dentry *dentry)
+{
+	enum ovl_path_type type = ovl_path_type(dentry);
+	int err;
+
+	if (WARN_ON(!d_inode(dentry)) || d_is_dir(dentry) ||
+	    OVL_TYPE_UPPER(type) || !OVL_TYPE_INDEX(type))
+		return 0;
+
+	err = ovl_want_write(dentry);
+	if (err)
+		goto fail;
+
+	err = ovl_copy_up(dentry);
+	ovl_drop_write(dentry);
+
+	if (err)
+		goto fail;
+
+	return 0;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed copy up on read (%pd2, err=%i)\n",
+			    dentry, err);
+	return err;
+}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 434870f5bb4b..10df85d41546 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -305,6 +305,7 @@ void ovl_cleanup(struct inode *dir, struct dentry *dentry);
 /* copy_up.c */
 int ovl_copy_up(struct dentry *dentry);
 int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
+int ovl_maybe_ro_copy_up(struct dentry *dentry);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
 struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4cf18850b4de..5ef9ce7489be 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -76,6 +76,8 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 				 unsigned int open_flags)
 {
 	struct dentry *real;
+	/* Copy up on open for read of indexed lower */
+	bool rocopyup = !inode && ovl_indexdir(dentry->d_sb);
 	int err;
 
 	if (!d_is_reg(dentry)) {
@@ -87,7 +89,7 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 	if (d_is_negative(dentry))
 		return dentry;
 
-	if (open_flags) {
+	if (open_flags || rocopyup) {
 		err = ovl_open_maybe_copy_up(dentry, open_flags);
 		if (err)
 			return ERR_PTR(err);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index fa6c2ae4a747..bad3df557cc6 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -14,6 +14,7 @@
 #include <linux/xattr.h>
 #include <linux/exportfs.h>
 #include <linux/uuid.h>
+#include <linux/ratelimit.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -131,10 +132,15 @@ void ovl_path_lower(struct dentry *dentry, struct path *path)
 	*path = oe->numlower ? oe->lowerstack[0] : (struct path) { };
 }
 
+/* Caller must not hold ovl_want_write(dentry) */
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path)
 {
-	enum ovl_path_type type = ovl_path_type(dentry);
+	enum ovl_path_type type;
 
+	/* best effort copy up indexed lower */
+	ovl_maybe_ro_copy_up(dentry);
+
+	type = ovl_path_type(dentry);
 	if (!OVL_TYPE_UPPER(type))
 		ovl_path_lower(dentry, path);
 	else
@@ -169,11 +175,15 @@ struct dentry *ovl_dentry_index(struct dentry *dentry)
 	return oe->indexdentry;
 }
 
+/* Caller must not hold ovl_want_write(dentry) */
 struct dentry *ovl_dentry_real(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 	struct dentry *realdentry;
 
+	/* Best effort copy up of indexed lower */
+	ovl_maybe_ro_copy_up(dentry);
+
 	realdentry = ovl_upperdentry_dereference(oe);
 	if (!realdentry)
 		realdentry = __ovl_dentry_lower(oe);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-08 14:48     ` Amir Goldstein
@ 2017-06-08 15:17       ` Miklos Szeredi
  2017-06-08 16:09         ` Amir Goldstein
  0 siblings, 1 reply; 41+ messages in thread
From: Miklos Szeredi @ 2017-06-08 15:17 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Thu, Jun 8, 2017 at 4:48 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jun 8, 2017 at 3:11 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Jun 7, 2017 at 9:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> When inodes index feature is enabled, lookup in indexdir for the index
>>> entry of lower real inode or copy up origin inode. The index entry name
>>> is the hex representation of the lower inode file handle.
>>>
>>> If the index dentry in negative, then either no lower aliases have been
>>> copied up yet, or aliases have been copied up in older kernels and are
>>> not indexed.
>>>
>>> If the index dentry for a copy up origin inode is positive, but points
>>> to an inode different than the upper inode, then either the upper inode
>>> has been copied up and not indexed or it was indexed, but since then
>>> index dir was cleared. Either way, that index cannot be used to indentify
>>> the overlay inode.
>>>
>>> If a negative index dentry is found or a positive dentry that matches the
>>> upper inode, store it in the overlay dentry to be used later. A non-upper
>>> overlay dentry may still have a positive index from copy up of another
>>> lower hardlink. This situation can be tested with the path type macros
>>> OVL_TYPE_INDEX(type) && !OVL_TYPE_UPPER(type).
>>>
>>> This is going to be used to prevent breaking hardlinks on copy up.
>>
>> Why is a negative index (or any index, for that matter) stored in the
>> overlay dentry?
>
> One of the reasons it is stored in the overlay dentry is to be able to
> test OVL_TYPE_INDEX() on the dentry, for example, in:
> "ovl: adjust overlay inode nlink for indexed inodes".

OVL_TYPE_INDEX() is just a bool.  When we do the *first* copy up of a
file with multiple links we need to update all the aliases anyway:
i.e. do the hardlink-ups, update ->upperdentry, etc.

> OVL_TYPE_INDEX() is only interested in positive index, so it may seem
> like a better idea to save memory and store the index dentry on lookup
> only if it is positive, but then OVL_TYPE_INDEX() will be wrong for lower
> aliases that became lower indexed aliases due to another alias copy up.
> This is important for reasons that are mentioned below.
>
> There may be an opportunity for a massive optimization though.
> For all practical purpose, ovl_type_indexed_lower() should always be
> false for lower with nlink == 1, so there is no reason so store the negative
> index dentry in that case.
>
>> This seems a big waste, since the index dentry will
>> be allocated for all lower files, yet never used unless copied up.
>>
>> Index is used:
>>
>>   - at lookup need to find any copied up alias
>>   - at copyup need to set up new index
>
> So it has several more subtle uses:
> - When whiteout a lower aliases, we need to count down nlink to
>   know when we can unlink the an orphan index (TODO)

If we do the link-up early (at lookup) then whiteout won't need
special casing.  The link-up would be unnecessary in this case, but
delaying it will just cause headaches.

> - Same when renaming over a lower indexed alias

And the same case here.

> - The lower hardlinks copy up on read [1] is another big user

Again, doing it on lookup will be earlier than at read, so no issue.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-08 15:17       ` Miklos Szeredi
@ 2017-06-08 16:09         ` Amir Goldstein
  2017-06-09  8:43           ` Miklos Szeredi
  0 siblings, 1 reply; 41+ messages in thread
From: Amir Goldstein @ 2017-06-08 16:09 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Thu, Jun 8, 2017 at 6:17 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jun 8, 2017 at 4:48 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, Jun 8, 2017 at 3:11 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Wed, Jun 7, 2017 at 9:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> When inodes index feature is enabled, lookup in indexdir for the index
>>>> entry of lower real inode or copy up origin inode. The index entry name
>>>> is the hex representation of the lower inode file handle.
>>>>
>>>> If the index dentry in negative, then either no lower aliases have been
>>>> copied up yet, or aliases have been copied up in older kernels and are
>>>> not indexed.
>>>>
>>>> If the index dentry for a copy up origin inode is positive, but points
>>>> to an inode different than the upper inode, then either the upper inode
>>>> has been copied up and not indexed or it was indexed, but since then
>>>> index dir was cleared. Either way, that index cannot be used to indentify
>>>> the overlay inode.
>>>>
>>>> If a negative index dentry is found or a positive dentry that matches the
>>>> upper inode, store it in the overlay dentry to be used later. A non-upper
>>>> overlay dentry may still have a positive index from copy up of another
>>>> lower hardlink. This situation can be tested with the path type macros
>>>> OVL_TYPE_INDEX(type) && !OVL_TYPE_UPPER(type).
>>>>
>>>> This is going to be used to prevent breaking hardlinks on copy up.
>>>
>>> Why is a negative index (or any index, for that matter) stored in the
>>> overlay dentry?
>>
>> One of the reasons it is stored in the overlay dentry is to be able to
>> test OVL_TYPE_INDEX() on the dentry, for example, in:
>> "ovl: adjust overlay inode nlink for indexed inodes".
>
> OVL_TYPE_INDEX() is just a bool.  When we do the *first* copy up of a
> file with multiple links we need to update all the aliases anyway:
> i.e. do the hardlink-ups, update ->upperdentry, etc.
>

You mean iterate all existing the overlay inode dentry cache aliases?
Makes sense. together with hardlink-up on lookup, this will make
ovl_type_indexed_lower() impossible. I guess we need not worry about
unhashed/disconnected overlay aliases right now?
although we might need to revisit this assumption with NFS export.

>>
>>> This seems a big waste, since the index dentry will
>>> be allocated for all lower files, yet never used unless copied up.
>>>
>>> Index is used:
>>>
>>>   - at lookup need to find any copied up alias
>>>   - at copyup need to set up new index
>>
>> So it has several more subtle uses:
>> - When whiteout a lower aliases, we need to count down nlink to
>>   know when we can unlink the an orphan index (TODO)
>
> If we do the link-up early (at lookup) then whiteout won't need
> special casing.  The link-up would be unnecessary in this case, but
> delaying it will just cause headaches.
>

While on the subject, maybe you also have an idea about how to
account for whiteouts *before* the first copy up:
If we have N lower hardlinks, delete/whiteout N-1 then copy up
the Nth and then delete the Nth, we need to store the negative nlink
count (N-1) before the index exists to know that we can turn the
orphan index into a whiteout (so NFS decode will guaranty ESTALE).

One though I had was to keep an "anti-index" dir with whiteouts
that are covering the anti-indexed lower.
On mount and on ovl_inode_evict, need to test:
origin.nlink == anti-index.nlink && index.nlink == 1 and unlink
the (positive) index entry. NFS decode can find the anti-index
compare origin.nlink == anti-index.nlink and reach the right
conclusion.

I couldn't find an appropriate solution for rename over though.

>> - Same when renaming over a lower indexed alias
>
> And the same case here.
>
>> - The lower hardlinks copy up on read [1] is another big user
>
> Again, doing it on lookup will be earlier than at read, so no issue.
>

Yes, that would be much better.
I'll rework the patches.

Amir.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-08 16:09         ` Amir Goldstein
@ 2017-06-09  8:43           ` Miklos Szeredi
  2017-06-09  9:38             ` Amir Goldstein
  0 siblings, 1 reply; 41+ messages in thread
From: Miklos Szeredi @ 2017-06-09  8:43 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Thu, Jun 8, 2017 at 6:09 PM, Amir Goldstein <amir73il@gmail.com> wrote:

> You mean iterate all existing the overlay inode dentry cache aliases?

Yes.

And locking is needed.  The INUSE flag could provide that, but I'd
prefer something local to the overlay instance (i.e. in ovl_fs).  The
reason is that lower layers could be shared and then separate
instances would be interfering with each other, which is not nice.
First version could just do a single mutex in ovl_fs.  Could refine
that to an array of locks hashed by origin inode, or something.

>
>>>
>>>> This seems a big waste, since the index dentry will
>>>> be allocated for all lower files, yet never used unless copied up.
>>>>
>>>> Index is used:
>>>>
>>>>   - at lookup need to find any copied up alias
>>>>   - at copyup need to set up new index
>>>
>>> So it has several more subtle uses:
>>> - When whiteout a lower aliases, we need to count down nlink to
>>>   know when we can unlink the an orphan index (TODO)
>>
>> If we do the link-up early (at lookup) then whiteout won't need
>> special casing.  The link-up would be unnecessary in this case, but
>> delaying it will just cause headaches.
>>
>
> While on the subject, maybe you also have an idea about how to
> account for whiteouts *before* the first copy up:

Ah, another special case.  We could un-special it by doing the copy-up
to index anyway before commencing with the delete (only for i_nlink >
1, obviously).  Bit of a hack, but would make things simpler.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-09  8:43           ` Miklos Szeredi
@ 2017-06-09  9:38             ` Amir Goldstein
  2017-06-09 11:49               ` Miklos Szeredi
  0 siblings, 1 reply; 41+ messages in thread
From: Amir Goldstein @ 2017-06-09  9:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jun 9, 2017 at 11:43 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jun 8, 2017 at 6:09 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> You mean iterate all existing the overlay inode dentry cache aliases?
>
> Yes.
>
> And locking is needed.  The INUSE flag could provide that, but I'd
> prefer something local to the overlay instance (i.e. in ovl_fs).  The
> reason is that lower layers could be shared and then separate
> instances would be interfering with each other, which is not nice.
> First version could just do a single mutex in ovl_fs.  Could refine
> that to an array of locks hashed by origin inode, or something.
>

Then how about inuse_lock the overlay inode?
It is already unique among all lower/upper aliases.
In fact,  oe->copying and ovl_{start,end}_copy_up could probably
be moved from the overlay dentry to the overlay inode.
It should work the same for the old use cases and provide the needed
(granular) locking for hardlinks copy-aliases-up.
I would just need to implement inode_inuse_lock().
Am I missing something?

>>
>> While on the subject, maybe you also have an idea about how to
>> account for whiteouts *before* the first copy up:
>
> Ah, another special case.  We could un-special it by doing the copy-up
> to index anyway before commencing with the delete (only for i_nlink >
> 1, obviously).  Bit of a hack, but would make things simpler.
>

This certainly works for my use case of clone up, so I don't mind wiring up
this workaround. We would still need to keep persistent count of copy ups
(e.g. in trusted.overlay.nlink of upper), because the condition to whiteout the
index on mount/evict is (origin->i_nlink == trusted.overlay.nlink &&
index->i_nlink == 1), but I was planning to do that anyway.

<soliciting> copy-up-on-delete is also classic use case for metadata-copy-up.
And then we can proceed to metadata-index-copy-up on open for read to
solve the consistent ro/rw fd.
Did you get a chance to look at the page mapping sharing?
Hey, I already set i_data.privae_data to the origin inode.
All you need to do is steel the pages ;-) </soliciting>

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-09  9:38             ` Amir Goldstein
@ 2017-06-09 11:49               ` Miklos Szeredi
  2017-06-09 13:14                 ` Miklos Szeredi
  0 siblings, 1 reply; 41+ messages in thread
From: Miklos Szeredi @ 2017-06-09 11:49 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jun 9, 2017 at 11:38 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Jun 9, 2017 at 11:43 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jun 8, 2017 at 6:09 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> You mean iterate all existing the overlay inode dentry cache aliases?
>>
>> Yes.
>>
>> And locking is needed.  The INUSE flag could provide that, but I'd
>> prefer something local to the overlay instance (i.e. in ovl_fs).  The
>> reason is that lower layers could be shared and then separate
>> instances would be interfering with each other, which is not nice.
>> First version could just do a single mutex in ovl_fs.  Could refine
>> that to an array of locks hashed by origin inode, or something.
>>
>
> Then how about inuse_lock the overlay inode?
> It is already unique among all lower/upper aliases.
> In fact,  oe->copying and ovl_{start,end}_copy_up could probably
> be moved from the overlay dentry to the overlay inode.
> It should work the same for the old use cases and provide the needed
> (granular) locking for hardlinks copy-aliases-up.

I like that.

> I would just need to implement inode_inuse_lock().

Just put the mutex in the new struct ovl_inode?

>
>>>
>>> While on the subject, maybe you also have an idea about how to
>>> account for whiteouts *before* the first copy up:
>>
>> Ah, another special case.  We could un-special it by doing the copy-up
>> to index anyway before commencing with the delete (only for i_nlink >
>> 1, obviously).  Bit of a hack, but would make things simpler.
>>
>
> This certainly works for my use case of clone up, so I don't mind wiring up
> this workaround. We would still need to keep persistent count of copy ups
> (e.g. in trusted.overlay.nlink of upper), because the condition to whiteout the
> index on mount/evict is (origin->i_nlink == trusted.overlay.nlink &&
> index->i_nlink == 1), but I was planning to do that anyway.

Yes.

The challenge is in accounting the number of link-ups atomically with
the link-up itself.  No ideas off-hand.

> <soliciting> copy-up-on-delete is also classic use case for metadata-copy-up.
> And then we can proceed to metadata-index-copy-up on open for read to
> solve the consistent ro/rw fd.
> Did you get a chance to look at the page mapping sharing?

Not yet.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-09 11:49               ` Miklos Szeredi
@ 2017-06-09 13:14                 ` Miklos Szeredi
  2017-06-09 13:24                   ` Amir Goldstein
  2017-06-09 22:56                   ` Amir Goldstein
  0 siblings, 2 replies; 41+ messages in thread
From: Miklos Szeredi @ 2017-06-09 13:14 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jun 9, 2017 at 1:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Jun 9, 2017 at 11:38 AM, Amir Goldstein <amir73il@gmail.com> wrote:

> The challenge is in accounting the number of link-ups atomically with
> the link-up itself.  No ideas off-hand.

Following would work I think:

- copy up to indexdir (it now has st_nlink == 1)
- set overlay.nlinkup to "1-2"
  + the first number indicates the target number of linkups
  + the second indicates the target st_nlink on the file
- fsync the file (hopefully this writes the xattrs as well as the data)
- link the file from indexdir to upper
- set overlay.nlinkup to "1"
   + this indicates that the linkup was finished and number of linkups is 1.

There are 3 cases when we find a file with overlay.nlinkup:

- just one number: nothing to do
- two numbers and second number == st_nlink: replace with just the linkup value
- two numbers and second number != st_nlink: replace with decremented
linkup value

So the atomicity is guaranteed by st_nlink becoming in sync with the
target value.

Would need to ensure that ovl_link() is not performed in parallel with
the linkup procedure.

Thanks,
Miklos

 - power fai


overlay.nlinkup

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-09 13:14                 ` Miklos Szeredi
@ 2017-06-09 13:24                   ` Amir Goldstein
  2017-06-09 13:29                     ` Miklos Szeredi
  2017-06-09 22:56                   ` Amir Goldstein
  1 sibling, 1 reply; 41+ messages in thread
From: Amir Goldstein @ 2017-06-09 13:24 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jun 9, 2017 at 4:14 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Jun 9, 2017 at 1:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Jun 9, 2017 at 11:38 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> The challenge is in accounting the number of link-ups atomically with
>> the link-up itself.  No ideas off-hand.
>
> Following would work I think:
>
> - copy up to indexdir (it now has st_nlink == 1)
> - set overlay.nlinkup to "1-2"
>   + the first number indicates the target number of linkups
>   + the second indicates the target st_nlink on the file
> - fsync the file (hopefully this writes the xattrs as well as the data)
> - link the file from indexdir to upper
> - set overlay.nlinkup to "1"
>    + this indicates that the linkup was finished and number of linkups is 1.
>
> There are 3 cases when we find a file with overlay.nlinkup:
>
> - just one number: nothing to do
> - two numbers and second number == st_nlink: replace with just the linkup value
> - two numbers and second number != st_nlink: replace with decremented
> linkup value
>
> So the atomicity is guaranteed by st_nlink becoming in sync with the
> target value.
>
> Would need to ensure that ovl_link() is not performed in parallel with
> the linkup procedure.
>

Yep, I started to write you the same thing :)
I though of using a separate xattr for the transnational values,
and delete it after link up, but I guess single xattr is cleaner ??

I guess the new ovl_inode mutex could be used for blocking ovl_link().

Just to be clear:
- we are going to allocate ovl_inode from our own slab cache
- move from using the i_private inode fields to ovl_inode fields:
  {realinode, originode, nlinkup, lock}
- use ovl_inode mutex instead of wait queue and for hardlink up

Correct?

Amir.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-09 13:24                   ` Amir Goldstein
@ 2017-06-09 13:29                     ` Miklos Szeredi
  0 siblings, 0 replies; 41+ messages in thread
From: Miklos Szeredi @ 2017-06-09 13:29 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jun 9, 2017 at 3:24 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Jun 9, 2017 at 4:14 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Jun 9, 2017 at 1:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Fri, Jun 9, 2017 at 11:38 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> The challenge is in accounting the number of link-ups atomically with
>>> the link-up itself.  No ideas off-hand.
>>
>> Following would work I think:
>>
>> - copy up to indexdir (it now has st_nlink == 1)
>> - set overlay.nlinkup to "1-2"
>>   + the first number indicates the target number of linkups
>>   + the second indicates the target st_nlink on the file
>> - fsync the file (hopefully this writes the xattrs as well as the data)
>> - link the file from indexdir to upper
>> - set overlay.nlinkup to "1"
>>    + this indicates that the linkup was finished and number of linkups is 1.
>>
>> There are 3 cases when we find a file with overlay.nlinkup:
>>
>> - just one number: nothing to do
>> - two numbers and second number == st_nlink: replace with just the linkup value
>> - two numbers and second number != st_nlink: replace with decremented
>> linkup value
>>
>> So the atomicity is guaranteed by st_nlink becoming in sync with the
>> target value.
>>
>> Would need to ensure that ovl_link() is not performed in parallel with
>> the linkup procedure.
>>
>
> Yep, I started to write you the same thing :)
> I though of using a separate xattr for the transnational values,
> and delete it after link up, but I guess single xattr is cleaner ??
>
> I guess the new ovl_inode mutex could be used for blocking ovl_link().
>
> Just to be clear:
> - we are going to allocate ovl_inode from our own slab cache
> - move from using the i_private inode fields to ovl_inode fields:
>   {realinode, originode, nlinkup, lock}
> - use ovl_inode mutex instead of wait queue and for hardlink up
>
> Correct?

Sounds good.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/20] ovl: lookup index entry for non-dir
  2017-06-09 13:14                 ` Miklos Szeredi
  2017-06-09 13:24                   ` Amir Goldstein
@ 2017-06-09 22:56                   ` Amir Goldstein
  1 sibling, 0 replies; 41+ messages in thread
From: Amir Goldstein @ 2017-06-09 22:56 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jun 9, 2017 at 4:14 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Jun 9, 2017 at 1:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Jun 9, 2017 at 11:38 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> The challenge is in accounting the number of link-ups atomically with
>> the link-up itself.  No ideas off-hand.
>
> Following would work I think:
>
> - copy up to indexdir (it now has st_nlink == 1)
> - set overlay.nlinkup to "1-2"
>   + the first number indicates the target number of linkups
>   + the second indicates the target st_nlink on the file
> - fsync the file (hopefully this writes the xattrs as well as the data)
> - link the file from indexdir to upper
> - set overlay.nlinkup to "1"
>    + this indicates that the linkup was finished and number of linkups is 1.
>

Or like this:

struct ovl_nlink_adjust {
  int nlink_diff;
  bool from_lower;
}

Init overlay inode:
overlay_nlink := (from_lower ? lower_nlink : upper_nlink) + nlink_diff

Copy/link up:
- take ovl_inode lock
- set overlay.nlink { (overlay_nlink - lower_link), true }
- link index (won't change diff from lower inode nlink on fail/success)

Unlink/link/rename:
- take ovl_inode lock
- set overlay.nlink { (overlay_nlink - upper_nlink), false }
- unlink/link/rename (won't change diff from upper nlink on fail/success)

I don't think fsync matters to this game.
I'd be surprised if the nlink changing operations (link/unlink/rename) can be
reordered with the setxattr operations on a power fail safe fs.

Amir.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2017-06-09 22:56 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-07  7:51 [PATCH v2 00/20] Overlayfs inodes index Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 01/20] vfs: introduce inode 'inuse' lock Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 02/20] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 03/20] ovl: relax same fs constrain for ovl_check_origin() Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 04/20] ovl: generalize ovl_create_workdir() Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 05/20] ovl: introduce the inodes index dir feature Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 06/20] ovl: verify upper root dir matches lower root dir Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 07/20] ovl: verify index dir matches upper dir Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 08/20] ovl: lookup index entry for non-dir Amir Goldstein
2017-06-08 12:11   ` Miklos Szeredi
2017-06-08 14:48     ` Amir Goldstein
2017-06-08 15:17       ` Miklos Szeredi
2017-06-08 16:09         ` Amir Goldstein
2017-06-09  8:43           ` Miklos Szeredi
2017-06-09  9:38             ` Amir Goldstein
2017-06-09 11:49               ` Miklos Szeredi
2017-06-09 13:14                 ` Miklos Szeredi
2017-06-09 13:24                   ` Amir Goldstein
2017-06-09 13:29                     ` Miklos Szeredi
2017-06-09 22:56                   ` Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 09/20] ovl: move inode helpers to inode.c Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 10/20] ovl: use ovl_inode_init() for initializing new inode Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 11/20] ovl: hash overlay non-dir inodes by copy up origin inode Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 12/20] ovl: fix nlink leak in ovl_rename() Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 13/20] ovl: adjust overlay inode nlink for indexed inodes Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 14/20] ovl: defer upper dir lock to tempfile link Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 15/20] ovl: factor out ovl_copy_up_inode() helper Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 16/20] ovl: generalize ovl_copy_up_locked() using actors Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 17/20] ovl: generalize ovl_copy_up_one() " Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 18/20] ovl: implement index dir copy up method Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 19/20] ovl: handle race of concurrent lower hardlinks copy up Amir Goldstein
2017-06-07  7:51 ` [PATCH v2 20/20] ovl: constant inode number for hardlinks Amir Goldstein
2017-06-07  7:54 ` [PATCH v2 00/20] Overlayfs inodes index Miklos Szeredi
2017-06-07  7:58   ` Amir Goldstein
2017-06-07 14:58 ` Amir Goldstein
2017-06-08 15:00   ` [PATCH v2 21/23] ovl: use inodes index on readonly mount Amir Goldstein
2017-06-08 15:00     ` [PATCH v2 22/23] ovl: move copy up helpers to copy_up.c Amir Goldstein
2017-06-08 15:00     ` [PATCH v2 23/23] ovl: copy up on read operations on indexed lower Amir Goldstein
2017-06-07 17:17 ` [PATCH v2 00/20] Overlayfs inodes index J. Bruce Fields
2017-06-07 18:36   ` Amir Goldstein
2017-06-07 18:59     ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.