All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 00/25] Overlayfs inodes index
@ 2017-06-21 12:28 Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 01/25] vfs: introduce inode 'inuse' lock Amir Goldstein
                   ` (26 more replies)
  0 siblings, 27 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel

Miklos,

This is it.
All done on my end, so unless you have any rejects that you want me
to fix, you may start mangling.

I've added another xfstest for nlink accounting and may add another
one that mangles with lower hardlinks to get negative overlay nlink.
It's a behavior I observed with lower/upper stress xfstest overlay/019
and fixed in patch 25.

Given the state of affairs with consistent d_ino patches, I suggest that
we focus on herding the inode index patches towards linux-next before
anything else.

Thanks,
Amir.


This work introduces the inodes index opt-in feature, which provides:
- Hardlinks are not broken on copy up
- Infrastructure for overlayfs NFS export

The work is available on my ovl-hardlinks.v4 branch [1].
For the curious, documentation about how overlay NFS export would work
with the inodes index is available in the linked commit [2].

The most significant change w.r.t. vfs is that with inodes index,
overlay inodes are hashed and unique throughout the lifecycle of an
overlay object (i.e. across copy up). This is required for not breaking
hardlinks on copy up.

Hardlink copy up tests including coverage for concurrent copy up of
lower hardlinks, lower/upper hardlinks consistency and union nlink
accounting are available on my xfstests dev branch [3].

This work also fixes constant st_ino for samefs case for lower hardlinks,
so the few constant st_ino tests that fail on v4.12-rc1 due to copy up of
lower hardlinks now pass [3][4].

Changes since v3:
- Persistent overlay nlink accounting
- Cleanup index when overlay nlink drops to zero
- Cleanup stale and orphan index entries on mount
- Update documentation w.r.t inodes index feature

Changes v1..v2:
- Introduce ovl_inode and ovl_inode mutex
- Synchronize copy up with ovl_inode mutex
- Don't store index dentry in overlay dentry
- Consistency of lower/upper hardlinks by link-up on lookup
- Preemptive copy up before lower hardlink unlink

[1] https://github.com/amir73il/linux/commits/ovl-hardlinks.v4
[2] https://github.com/amir73il/linux/commits/ovl-nfs-export 
[3] https://github.com/amir73il/xfstests/commits/overlayfs-devel
[4] https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel


Amir Goldstein (25):
  vfs: introduce inode 'inuse' lock
  ovl: get exclusive ownership on upper/work dirs
  ovl: relax same fs constrain for ovl_check_origin()
  ovl: generalize ovl_create_workdir()
  ovl: introduce the inodes index dir feature
  ovl: verify upper root dir matches lower root dir
  ovl: verify index dir matches upper dir
  ovl: store path type in dentry
  ovl: cram dentry state booleans into type flags
  ovl: lookup index entry for copy up origin
  ovl: cleanup bad and stale index entries on mount
  ovl: allocate an ovl_inode struct
  ovl: store upper/lower real inode in ovl_inode_info
  ovl: use ovl_inode_init() for initializing new inode
  ovl: hash overlay non-dir inodes by copy up origin inode
  ovl: defer upper dir lock to tempfile link
  ovl: factor out ovl_copy_up_inode() helper
  ovl: generalize ovl_copy_up_locked() using actors
  ovl: generalize ovl_copy_up_one() using actors
  ovl: use ovl_inode mutex to synchronize concurrent copy up
  ovl: implement index dir copy up method
  ovl: link up indexed lower hardlink on lookup
  ovl: fix nlink leak in ovl_rename()
  ovl: persistent overlay inode nlink for indexed inodes
  ovl: cleanup orphan index entries

 Documentation/filesystems/overlayfs.txt |   6 +-
 fs/inode.c                              |  50 +++
 fs/overlayfs/Kconfig                    |  20 +
 fs/overlayfs/copy_up.c                  | 625 +++++++++++++++++++++++++-------
 fs/overlayfs/dir.c                      | 164 ++++++++-
 fs/overlayfs/inode.c                    | 160 +++++++-
 fs/overlayfs/namei.c                    | 382 ++++++++++++++++---
 fs/overlayfs/overlayfs.h                |  47 ++-
 fs/overlayfs/ovl_entry.h                |  37 +-
 fs/overlayfs/readdir.c                  |  50 +++
 fs/overlayfs/super.c                    | 286 +++++++++++++--
 fs/overlayfs/util.c                     | 164 +++++++--
 include/linux/fs.h                      |  14 +
 13 files changed, 1723 insertions(+), 282 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v4 01/25] vfs: introduce inode 'inuse' lock
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 02/25] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Added an i_state flag I_INUSE and helpers to set/clear/test the bit.

The 'inuse' lock is an 'advisory' inode lock, that can be used to extend
exclusive create protection beyond parent->i_mutex lock among cooperating
users.

This is going to be used by overlayfs to get exclusive ownership on upper
and work dirs among overlayfs mounts.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/inode.c         | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h | 14 ++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index db5914783a71..6a41b27467eb 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2120,3 +2120,53 @@ struct timespec current_time(struct inode *inode)
 	return timespec_trunc(now, inode->i_sb->s_time_gran);
 }
 EXPORT_SYMBOL(current_time);
+
+/**
+ * inode_inuse_trylock - try to get an exclusive 'inuse' lock on inode
+ * @inode: inode being locked
+ *
+ * The 'inuse' lock is an 'advisory' lock that can be used to extend exclusive
+ * create protection beyond parent->i_mutex lock among cooperating users.
+ * Used by overlayfs to get exclusive ownership on upper and work dirs among
+ * overlayfs mounts.
+ *
+ * Caller must hold a reference to inode to prevent it from being freed while
+ * it is marked inuse.
+ *
+ * Return true if I_INUSE flag was set by this call.
+ */
+bool inode_inuse_trylock(struct inode *inode)
+{
+	bool locked = false;
+
+	spin_lock(&inode->i_lock);
+	if (!WARN_ON(!atomic_read(&inode->i_count)) &&
+	    !WARN_ON(inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) &&
+	    !(inode->i_state & I_INUSE)) {
+		inode->i_state |= I_INUSE;
+		locked = true;
+	}
+	spin_unlock(&inode->i_lock);
+	return locked;
+}
+EXPORT_SYMBOL(inode_inuse_trylock);
+
+/**
+ * inode_inuse_unlock - release exclusive 'inuse' lock
+ * @inode:	inode inuse to unlock
+ *
+ * Clear the I_INUSE state.
+ *
+ * Caller must hold a reference to inode and must have successfully marked
+ * the inode 'inuse' prior to this call.
+ */
+void inode_inuse_unlock(struct inode *inode)
+{
+	spin_lock(&inode->i_lock);
+	WARN_ON(!atomic_read(&inode->i_count));
+	WARN_ON(inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE));
+	WARN_ON(!(inode->i_state & I_INUSE));
+	inode->i_state &= ~I_INUSE;
+	spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(inode_inuse_unlock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aab10f93ef23..532802120258 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1929,6 +1929,12 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
  *			wb stat updates to grab mapping->tree_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
+ * I_INUSE		An 'advisory' bit to get exclusive ownership on inode
+ *			using inode_inuse_trylock().  It can be used to extend
+ *			exclusive create protection beyond parent->i_mutex lock.
+ *			Used by overlayfs to get exclusive ownership on upper
+ *			and work dirs among overlayfs mounts.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -1949,6 +1955,7 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 #define __I_DIRTY_TIME_EXPIRED	12
 #define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
 #define I_WB_SWITCH		(1 << 13)
+#define I_INUSE			(1 << 14)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
@@ -3258,5 +3265,12 @@ static inline bool dir_relax_shared(struct inode *inode)
 
 extern bool path_noexec(const struct path *path);
 extern void inode_nohighmem(struct inode *inode);
+extern bool inode_inuse_trylock(struct inode *inode);
+extern void inode_inuse_unlock(struct inode *inode);
+
+static inline bool inode_inuse(struct inode *inode)
+{
+	return inode->i_state & I_INUSE;
+}
 
 #endif /* _LINUX_FS_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 02/25] ovl: get exclusive ownership on upper/work dirs
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 01/25] vfs: introduce inode 'inuse' lock Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 03/25] ovl: relax same fs constrain for ovl_check_origin() Amir Goldstein
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Bad things can happen if several concurrent overlay mounts try to
use the same upperdir/workdir path.

Try to get the 'inuse' advisory lock on upperdir and workdir.
Fail mount if another overlay mount instance or another user
holds the 'inuse' lock on these directories.

Note that this provides no protection for concurrent overlay
mount that use overlapping (i.e. descendant) upper/work dirs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/ovl_entry.h |  3 +++
 fs/overlayfs/super.c     | 41 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 34bc4a9f5c61..b0e7ee2ae398 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -21,6 +21,9 @@ struct ovl_fs {
 	struct vfsmount *upper_mnt;
 	unsigned numlower;
 	struct vfsmount **lower_mnt;
+	/* workbasedir is the path at workdir= mount option */
+	struct dentry *workbasedir;
+	/* workdir is the 'work' directory under workbasedir */
 	struct dentry *workdir;
 	long namelen;
 	/* pathnames of lower and upper dirs, for show_options */
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4882ffb37bae..476f021baf2a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -165,12 +165,28 @@ static const struct dentry_operations ovl_reval_dentry_operations = {
 	.d_weak_revalidate = ovl_dentry_weak_revalidate,
 };
 
+/* Get exclusive ownership on upper/work dir among overlay mounts */
+static bool ovl_dir_lock(struct dentry *dentry)
+{
+	return inode_inuse_trylock(d_inode(dentry));
+}
+
+static void ovl_dir_unlock(struct dentry *dentry)
+{
+	if (dentry)
+		inode_inuse_unlock(d_inode(dentry));
+}
+
 static void ovl_put_super(struct super_block *sb)
 {
 	struct ovl_fs *ufs = sb->s_fs_info;
 	unsigned i;
 
 	dput(ufs->workdir);
+	ovl_dir_unlock(ufs->workbasedir);
+	dput(ufs->workbasedir);
+	if (ufs->upper_mnt)
+		ovl_dir_unlock(ufs->upper_mnt->mnt_root);
 	mntput(ufs->upper_mnt);
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
@@ -788,9 +804,15 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (err)
 			goto out_put_upperpath;
 
+		err = -EBUSY;
+		if (!ovl_dir_lock(upperpath.dentry)) {
+			pr_err("overlayfs: upperdir is in-use by another mount\n");
+			goto out_put_upperpath;
+		}
+
 		err = ovl_mount_dir(ufs->config.workdir, &workpath);
 		if (err)
-			goto out_put_upperpath;
+			goto out_unlock_upperdentry;
 
 		err = -EINVAL;
 		if (upperpath.mnt != workpath.mnt) {
@@ -801,12 +823,20 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			pr_err("overlayfs: workdir and upperdir must be separate subtrees\n");
 			goto out_put_workpath;
 		}
+
+		err = -EBUSY;
+		if (!ovl_dir_lock(workpath.dentry)) {
+			pr_err("overlayfs: workdir is in-use by another mount\n");
+			goto out_put_workpath;
+		}
+
+		ufs->workbasedir = workpath.dentry;
 		sb->s_stack_depth = upperpath.mnt->mnt_sb->s_stack_depth;
 	}
 	err = -ENOMEM;
 	lowertmp = kstrdup(ufs->config.lowerdir, GFP_KERNEL);
 	if (!lowertmp)
-		goto out_put_workpath;
+		goto out_unlock_workdentry;
 
 	err = -EINVAL;
 	stacklen = ovl_split_lowerdirs(lowertmp);
@@ -849,6 +879,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			pr_err("overlayfs: failed to clone upperpath\n");
 			goto out_put_lowerpath;
 		}
+
 		/* Don't inherit atime flags */
 		ufs->upper_mnt->mnt_flags &= ~(MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME);
 
@@ -971,7 +1002,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	mntput(upperpath.mnt);
 	for (i = 0; i < numlower; i++)
 		mntput(stack[i].mnt);
-	path_put(&workpath);
+	mntput(workpath.mnt);
 	kfree(lowertmp);
 
 	if (upperpath.dentry) {
@@ -1011,8 +1042,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(stack);
 out_free_lowertmp:
 	kfree(lowertmp);
+out_unlock_workdentry:
+	ovl_dir_unlock(workpath.dentry);
 out_put_workpath:
 	path_put(&workpath);
+out_unlock_upperdentry:
+	ovl_dir_unlock(upperpath.dentry);
 out_put_upperpath:
 	path_put(&upperpath);
 out_free_config:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 03/25] ovl: relax same fs constrain for ovl_check_origin()
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 01/25] vfs: introduce inode 'inuse' lock Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 02/25] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 04/25] ovl: generalize ovl_create_workdir() Amir Goldstein
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

For the case of all layers not on the same fs, try to decode the copy up
origin file handle on any of the lower layers.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c | 43 +++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 1ec1499fc672..4ca6061f7bfa 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -272,28 +272,24 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
 			    struct path **stackp, unsigned int *ctrp)
 {
-	struct super_block *same_sb = ovl_same_sb(dentry->d_sb);
 	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
 	struct vfsmount *mnt;
-	struct dentry *origin;
+	struct dentry *origin = NULL;
+	int i;
 
-	if (!same_sb || !roe->numlower)
-		return 0;
 
-       /*
-	* Since all layers are on the same fs, we use the first layer for
-	* decoding the file handle.  We may get a disconnected dentry,
-	* which is fine, because we only need to hold the origin inode in
-	* cache and use its inode number.  We may even get a connected dentry,
-	* that is not under the first layer's root.  That is also fine for
-	* using it's inode number - it's the same as if we held a reference
-	* to a dentry in first layer that was moved under us.
-	*/
-	mnt = roe->lowerstack[0].mnt;
-
-	origin = ovl_get_origin(upperdentry, mnt);
-	if (IS_ERR_OR_NULL(origin))
-		return PTR_ERR(origin);
+	for (i = 0; i < roe->numlower; i++) {
+		mnt = roe->lowerstack[i].mnt;
+		origin = ovl_get_origin(upperdentry, mnt);
+		if (IS_ERR(origin))
+			return PTR_ERR(origin);
+
+		if (origin)
+			break;
+	}
+
+	if (!origin)
+		return 0;
 
 	BUG_ON(*stackp || *ctrp);
 	*stackp = kmalloc(sizeof(struct path), GFP_TEMPORARY);
@@ -301,6 +297,7 @@ static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
 		dput(origin);
 		return -ENOMEM;
 	}
+
 	**stackp = (struct path) { .dentry = origin, .mnt = mnt };
 	*ctrp = 1;
 
@@ -375,6 +372,16 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 		if (upperdentry && !d.is_dir) {
 			BUG_ON(!d.stop || d.redirect);
+			/*
+			 * Lookup copy up origin by decoding origin file handle.
+			 * We may get a disconnected dentry, which is fine,
+			 * because we only need to hold the origin inode in
+			 * cache and use its inode number.  We may even get a
+			 * connected dentry, that is not under any of the lower
+			 * layers root.  That is also fine for using it's inode
+			 * number - it's the same as if we held a reference
+			 * to a dentry in lower layer that was moved under us.
+			 */
 			err = ovl_check_origin(dentry, upperdentry,
 					       &stack, &ctr);
 			if (err)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 04/25] ovl: generalize ovl_create_workdir()
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (2 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 03/25] ovl: relax same fs constrain for ovl_check_origin() Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 05/25] ovl: introduce the inodes index dir feature Amir Goldstein
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Pass in the subdir name to create and specify if subdir is persistent
or if it should be cleaned up on every mount.

Move fallback to readonly mount on failure to create dir and print of error
message into the helper.

This function is going to be used for creating the persistent 'index' dir
under workbasedir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/super.c | 45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 476f021baf2a..5114627799e9 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -395,22 +395,25 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 
 #define OVL_WORKDIR_NAME "work"
 
-static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
-					 struct dentry *dentry)
+static struct dentry *ovl_workdir_create(struct super_block *sb,
+					 struct ovl_fs *ufs,
+					 struct dentry *dentry,
+					 const char *name, bool persist)
 {
-	struct inode *dir = dentry->d_inode;
-	struct dentry *work;
+	struct vfsmount *mnt = ufs->upper_mnt;
+	struct inode *dir = NULL;
+	struct dentry *work = NULL;
 	int err;
 	bool retried = false;
 
 	err = mnt_want_write(mnt);
 	if (err)
-		return ERR_PTR(err);
+		goto out_err;
 
+	dir = d_inode(dentry);
 	inode_lock_nested(dir, I_MUTEX_PARENT);
 retry:
-	work = lookup_one_len(OVL_WORKDIR_NAME, dentry,
-			      strlen(OVL_WORKDIR_NAME));
+	work = lookup_one_len(name, dentry, strlen(name));
 
 	if (!IS_ERR(work)) {
 		struct iattr attr = {
@@ -423,6 +426,9 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
 			if (retried)
 				goto out_dput;
 
+			if (persist)
+				goto out_unlock;
+
 			retried = true;
 			ovl_workdir_cleanup(dir, mnt, work, 0);
 			dput(work);
@@ -462,16 +468,25 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
 		inode_unlock(work->d_inode);
 		if (err)
 			goto out_dput;
+	} else {
+		err = PTR_ERR(work);
+		goto out_err;
 	}
+
 out_unlock:
-	inode_unlock(dir);
+	if (dir)
+		inode_unlock(dir);
 	mnt_drop_write(mnt);
 
 	return work;
 
 out_dput:
 	dput(work);
-	work = ERR_PTR(err);
+out_err:
+	pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
+		ufs->config.workdir, name, -err);
+	sb->s_flags |= MS_RDONLY;
+	work = NULL;
 	goto out_unlock;
 }
 
@@ -885,14 +900,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 		sb->s_time_gran = ufs->upper_mnt->mnt_sb->s_time_gran;
 
-		ufs->workdir = ovl_workdir_create(ufs->upper_mnt, workpath.dentry);
+		ufs->workdir = ovl_workdir_create(sb, ufs, workpath.dentry,
+						  OVL_WORKDIR_NAME, false);
 		err = PTR_ERR(ufs->workdir);
-		if (IS_ERR(ufs->workdir)) {
-			pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
-				ufs->config.workdir, OVL_WORKDIR_NAME, -err);
-			sb->s_flags |= MS_RDONLY;
-			ufs->workdir = NULL;
-		}
+		if (IS_ERR(ufs->workdir))
+			goto out_put_upper_mnt;
 
 		/*
 		 * Upper should support d_type, else whiteouts are visible.
@@ -1035,6 +1047,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(ufs->lower_mnt);
 out_put_workdir:
 	dput(ufs->workdir);
+out_put_upper_mnt:
 	mntput(ufs->upper_mnt);
 out_put_lowerpath:
 	for (i = 0; i < numlower; i++)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 05/25] ovl: introduce the inodes index dir feature
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (3 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 04/25] ovl: generalize ovl_create_workdir() Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 06/25] ovl: verify upper root dir matches lower root dir Amir Goldstein
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Create the index dir on mount. The index dir will contain hardlinks to
upper inodes, named after the hex representation of their origin lower
inodes.

The index dir is going to be used to prevent breaking lower hardlinks
on copy up and to implement overlayfs NFS export.

Because the feature is not fully backward compat, enabling the feature
is opt-in by config/module/mount option.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/Kconfig     | 20 +++++++++++++++
 fs/overlayfs/copy_up.c   | 10 +++-----
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/ovl_entry.h |  3 +++
 fs/overlayfs/super.c     | 66 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/overlayfs/util.c      | 15 +++++++++++
 6 files changed, 108 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index c0c9683934b7..b7241f273c36 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -23,3 +23,23 @@ config OVERLAY_FS_REDIRECT_DIR
 	  Note, that redirects are not backward compatible.  That is, mounting
 	  an overlay which has redirects on a kernel that doesn't support this
 	  feature will have unexpected results.
+
+config OVERLAY_FS_INDEX
+	bool "Overlayfs: turn on inodes index feature by default"
+	depends on OVERLAY_FS
+	help
+	  If this config option is enabled then overlay filesystems will use
+	  the inodes index dir to map lower inodes to upper inodes by default.
+	  In this case it is still possible to turn off index globally with the
+	  "index=off" module option or on a filesystem instance basis with the
+	  "index=off" mount option.
+
+	  The inodes index feature prevents breaking of lower hardlinks on copy
+	  up and enables NFS export.
+
+	  Note, that the inodes index feature is read-only backward compatible.
+	  That is, mounting an overlay which has an index dir on a kernel that
+	  doesn't support this feature read-only, will not have any negative
+	  outcomes.  However, mounting the same overlay with an old kernel
+	  read-write and then mounting it again with a new kernel, will have
+	  unexpected results.
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 7a44533f4bbf..d3047fdc0377 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -233,12 +233,13 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
-static struct ovl_fh *ovl_encode_fh(struct dentry *lower, uuid_be *uuid)
+static struct ovl_fh *ovl_encode_fh(struct dentry *lower)
 {
 	struct ovl_fh *fh;
 	int fh_type, fh_len, dwords;
 	void *buf;
 	int buflen = MAX_HANDLE_SZ;
+	uuid_be *uuid = (uuid_be *) &lower->d_sb->s_uuid;
 
 	buf = kmalloc(buflen, GFP_TEMPORARY);
 	if (!buf)
@@ -283,8 +284,6 @@ static struct ovl_fh *ovl_encode_fh(struct dentry *lower, uuid_be *uuid)
 static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 			  struct dentry *upper)
 {
-	struct super_block *sb = lower->d_sb;
-	uuid_be *uuid = (uuid_be *) &sb->s_uuid;
 	const struct ovl_fh *fh = NULL;
 	int err;
 
@@ -293,9 +292,8 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	 * so we can use the overlay.origin xattr to distignuish between a copy
 	 * up and a pure upper inode.
 	 */
-	if (sb->s_export_op && sb->s_export_op->fh_to_dentry &&
-	    uuid_be_cmp(*uuid, NULL_UUID_BE)) {
-		fh = ovl_encode_fh(lower, uuid);
+	if (ovl_can_decode_fh(lower->d_sb)) {
+		fh = ovl_encode_fh(lower);
 		if (IS_ERR(fh))
 			return PTR_ERR(fh);
 	}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 513e25e56eed..9ec93063c3a8 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -191,6 +191,8 @@ void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
 struct super_block *ovl_same_sb(struct super_block *sb);
+bool ovl_can_decode_fh(struct super_block *sb);
+struct dentry *ovl_indexdir(struct super_block *sb);
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
 bool ovl_dentry_remote(struct dentry *dentry);
 bool ovl_dentry_weird(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index b0e7ee2ae398..ddd4b0a874a9 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -14,6 +14,7 @@ struct ovl_config {
 	char *workdir;
 	bool default_permissions;
 	bool redirect_dir;
+	bool index;
 };
 
 /* private information held for overlayfs's superblock */
@@ -25,6 +26,8 @@ struct ovl_fs {
 	struct dentry *workbasedir;
 	/* workdir is the 'work' directory under workbasedir */
 	struct dentry *workdir;
+	/* index directory listing overlay inodes by origin file handle */
+	struct dentry *indexdir;
 	long namelen;
 	/* pathnames of lower and upper dirs, for show_options */
 	struct ovl_config config;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 5114627799e9..d92c7486bdf2 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -34,6 +34,11 @@ module_param_named(redirect_dir, ovl_redirect_dir_def, bool, 0644);
 MODULE_PARM_DESC(ovl_redirect_dir_def,
 		 "Default to on or off for the redirect_dir feature");
 
+static bool ovl_index_def = IS_ENABLED(CONFIG_OVERLAY_FS_INDEX);
+module_param_named(index, ovl_index_def, bool, 0644);
+MODULE_PARM_DESC(ovl_index_def,
+		 "Default to on or off for the inodes index feature");
+
 static void ovl_dentry_release(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
@@ -182,6 +187,7 @@ static void ovl_put_super(struct super_block *sb)
 	struct ovl_fs *ufs = sb->s_fs_info;
 	unsigned i;
 
+	dput(ufs->indexdir);
 	dput(ufs->workdir);
 	ovl_dir_unlock(ufs->workbasedir);
 	dput(ufs->workbasedir);
@@ -244,6 +250,12 @@ static int ovl_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return err;
 }
 
+/* Will this overlay be forced to mount/remount ro? */
+static bool ovl_force_readonly(struct ovl_fs *ufs)
+{
+	return (!ufs->upper_mnt || !ufs->workdir);
+}
+
 /**
  * ovl_show_options
  *
@@ -265,6 +277,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	if (ufs->config.redirect_dir != ovl_redirect_dir_def)
 		seq_printf(m, ",redirect_dir=%s",
 			   ufs->config.redirect_dir ? "on" : "off");
+	if (ufs->config.index != ovl_index_def)
+		seq_printf(m, ",index=%s",
+			   ufs->config.index ? "on" : "off");
 	return 0;
 }
 
@@ -272,7 +287,7 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 {
 	struct ovl_fs *ufs = sb->s_fs_info;
 
-	if (!(*flags & MS_RDONLY) && (!ufs->upper_mnt || !ufs->workdir))
+	if (!(*flags & MS_RDONLY) && ovl_force_readonly(ufs))
 		return -EROFS;
 
 	return 0;
@@ -294,6 +309,8 @@ enum {
 	OPT_DEFAULT_PERMISSIONS,
 	OPT_REDIRECT_DIR_ON,
 	OPT_REDIRECT_DIR_OFF,
+	OPT_INDEX_ON,
+	OPT_INDEX_OFF,
 	OPT_ERR,
 };
 
@@ -304,6 +321,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_DEFAULT_PERMISSIONS,	"default_permissions"},
 	{OPT_REDIRECT_DIR_ON,		"redirect_dir=on"},
 	{OPT_REDIRECT_DIR_OFF,		"redirect_dir=off"},
+	{OPT_INDEX_ON,			"index=on"},
+	{OPT_INDEX_OFF,			"index=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -376,6 +395,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->redirect_dir = false;
 			break;
 
+		case OPT_INDEX_ON:
+			config->index = true;
+			break;
+
+		case OPT_INDEX_OFF:
+			config->index = false;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;
@@ -394,6 +421,7 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 }
 
 #define OVL_WORKDIR_NAME "work"
+#define OVL_INDEXDIR_NAME "index"
 
 static struct dentry *ovl_workdir_create(struct super_block *sb,
 					 struct ovl_fs *ufs,
@@ -785,6 +813,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	init_waitqueue_head(&ufs->copyup_wq);
 	ufs->config.redirect_dir = ovl_redirect_dir_def;
+	ufs->config.index = ovl_index_def;
 	err = ovl_parse_opt((char *) data, &ufs->config);
 	if (err)
 		goto out_free_config;
@@ -947,6 +976,13 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			} else {
 				vfs_removexattr(ufs->workdir, OVL_XATTR_OPAQUE);
 			}
+
+			/* Check if upper/work fs supports file handles */
+			if (ufs->config.index &&
+			    !ovl_can_decode_fh(ufs->workdir->d_sb)) {
+				ufs->config.index = false;
+				pr_warn("overlayfs: upper fs does not support NFS export.\n");
+			}
 		}
 	}
 
@@ -976,6 +1012,15 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			ufs->same_sb = mnt->mnt_sb;
 		else if (ufs->same_sb != mnt->mnt_sb)
 			ufs->same_sb = NULL;
+
+		/*
+		 * The inodes index feature needs to encode and decode file
+		 * handles, so it requires that all layers support NFS export.
+		 */
+		if (ufs->config.index && !ovl_can_decode_fh(mnt->mnt_sb)) {
+			ufs->config.index = false;
+			pr_warn("overlayfs: lower fs does not support NFS export.\n");
+		}
 	}
 
 	/* If the upper fs is nonexistent, we mark overlayfs r/o too */
@@ -984,6 +1029,21 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else if (ufs->upper_mnt->mnt_sb != ufs->same_sb)
 		ufs->same_sb = NULL;
 
+	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
+		ufs->indexdir = ovl_workdir_create(sb, ufs, workpath.dentry,
+						   OVL_INDEXDIR_NAME, true);
+		err = PTR_ERR(ufs->indexdir);
+		if (IS_ERR(ufs->indexdir))
+			goto out_put_lower_mnt;
+
+		if (!ufs->indexdir)
+			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
+	}
+
+	/* Show index=off/on in /proc/mounts for any of the reasons above */
+	if (!ufs->indexdir)
+		ufs->config.index = false;
+
 	if (remote)
 		sb->s_d_op = &ovl_reval_dentry_operations;
 	else
@@ -991,7 +1051,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	ufs->creator_cred = cred = prepare_creds();
 	if (!cred)
-		goto out_put_lower_mnt;
+		goto out_put_indexdir;
 
 	/* Never override disk quota limits or use reserved space */
 	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
@@ -1041,6 +1101,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(oe);
 out_put_cred:
 	put_cred(ufs->creator_cred);
+out_put_indexdir:
+	dput(ufs->indexdir);
 out_put_lower_mnt:
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 809048913889..8e2f308056f1 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -12,6 +12,8 @@
 #include <linux/slab.h>
 #include <linux/cred.h>
 #include <linux/xattr.h>
+#include <linux/exportfs.h>
+#include <linux/uuid.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -47,6 +49,19 @@ struct super_block *ovl_same_sb(struct super_block *sb)
 	return ofs->same_sb;
 }
 
+bool ovl_can_decode_fh(struct super_block *sb)
+{
+	return (sb->s_export_op && sb->s_export_op->fh_to_dentry &&
+		uuid_be_cmp(*(uuid_be *) &sb->s_uuid, NULL_UUID_BE));
+}
+
+struct dentry *ovl_indexdir(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	return ofs->indexdir;
+}
+
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower)
 {
 	size_t size = offsetof(struct ovl_entry, lowerstack[numlower]);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 06/25] ovl: verify upper root dir matches lower root dir
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (4 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 05/25] ovl: introduce the inodes index dir feature Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 07/25] ovl: verify index dir matches upper dir Amir Goldstein
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

When inodes index feature is enabled, verify that the file handle stored
in upper root dir matches the lower root dir or fail to mount.

If upper root dir has no stored file handle, encode and store the lower
root dir file handle in overlay.origin xattr.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   |   2 +-
 fs/overlayfs/namei.c     | 103 ++++++++++++++++++++++++++++++++++++++++-------
 fs/overlayfs/overlayfs.h |   3 ++
 fs/overlayfs/super.c     |   8 ++++
 4 files changed, 101 insertions(+), 15 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index d3047fdc0377..ee0b894e1d99 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -233,7 +233,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
-static struct ovl_fh *ovl_encode_fh(struct dentry *lower)
+struct ovl_fh *ovl_encode_fh(struct dentry *lower)
 {
 	struct ovl_fh *fh;
 	int fh_type, fh_len, dwords;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 4ca6061f7bfa..c08e334c879c 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -88,13 +88,10 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
 	return 1;
 }
 
-static struct dentry *ovl_get_origin(struct dentry *dentry,
-				     struct vfsmount *mnt)
+static struct ovl_fh *ovl_get_origin_fh(struct dentry *dentry)
 {
 	int res;
 	struct ovl_fh *fh = NULL;
-	struct dentry *origin = NULL;
-	int bytes;
 
 	res = vfs_getxattr(dentry, OVL_XATTR_ORIGIN, NULL, 0);
 	if (res < 0) {
@@ -106,7 +103,7 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	if (res == 0)
 		return NULL;
 
-	fh  = kzalloc(res, GFP_TEMPORARY);
+	fh = kzalloc(res, GFP_TEMPORARY);
 	if (!fh)
 		return ERR_PTR(-ENOMEM);
 
@@ -129,7 +126,29 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	    (fh->flags & OVL_FH_FLAG_BIG_ENDIAN) != OVL_FH_FLAG_CPU_ENDIAN)
 		goto out;
 
-	bytes = (fh->len - offsetof(struct ovl_fh, fid));
+	return fh;
+
+out:
+	kfree(fh);
+	return NULL;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed to get origin (%i)\n", res);
+	goto out;
+invalid:
+	pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", res, fh);
+	goto out;
+}
+
+static struct dentry *ovl_get_origin(struct dentry *dentry,
+				     struct vfsmount *mnt)
+{
+	struct dentry *origin = NULL;
+	struct ovl_fh *fh = ovl_get_origin_fh(dentry);
+	int bytes;
+
+	if (IS_ERR_OR_NULL(fh))
+		return (struct dentry *)fh;
 
 	/*
 	 * Make sure that the stored uuid matches the uuid of the lower
@@ -138,6 +157,7 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	if (uuid_be_cmp(fh->uuid, *(uuid_be *) &mnt->mnt_sb->s_uuid))
 		goto out;
 
+	bytes = (fh->len - offsetof(struct ovl_fh, fid));
 	origin = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
 				    bytes >> 2, (int)fh->type,
 				    ovl_acceptable, NULL);
@@ -149,21 +169,17 @@ static struct dentry *ovl_get_origin(struct dentry *dentry,
 	}
 
 	if (ovl_dentry_weird(origin) ||
-	    ((d_inode(origin)->i_mode ^ d_inode(dentry)->i_mode) & S_IFMT)) {
-		dput(origin);
-		origin = NULL;
+	    ((d_inode(origin)->i_mode ^ d_inode(dentry)->i_mode) & S_IFMT))
 		goto invalid;
-	}
 
 out:
 	kfree(fh);
 	return origin;
 
-fail:
-	pr_warn_ratelimited("overlayfs: failed to get origin (%i)\n", res);
-	goto out;
 invalid:
-	pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", res, fh);
+	pr_warn_ratelimited("overlayfs: invalid origin (%pd2)\n", origin);
+	dput(origin);
+	origin = NULL;
 	goto out;
 }
 
@@ -305,6 +321,65 @@ static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
 }
 
 /*
+ * Verify that @fh matches the origin file handle stored in OVL_XATTR_ORIGIN.
+ * Return 0 on match, -ESTALE on mismatch, < 0 on error.
+ */
+static int ovl_verify_origin_fh(struct dentry *dentry, struct ovl_fh *fh)
+{
+	struct ovl_fh *ofh = ovl_get_origin_fh(dentry);
+	int err = 0;
+
+	if (!ofh)
+		return -ENODATA;
+
+	if (IS_ERR(ofh))
+		return PTR_ERR(ofh);
+
+	if (fh->len != ofh->len || memcmp(fh, ofh, fh->len))
+		err = -ESTALE;
+
+	kfree(ofh);
+	return err;
+}
+
+/*
+ * Verify that an inode matches the origin file handle stored in upper inode.
+ *
+ * If @set is true and there is no stored file handle, encode and store origin
+ * file handle in OVL_XATTR_ORIGIN.
+ *
+ * Return 0 on match, -ESTALE on mismatch, < 0 on error.
+ */
+int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
+		      struct dentry *origin, bool set)
+{
+	struct inode *inode = NULL;
+	struct ovl_fh *fh = NULL;
+	int err;
+
+	fh = ovl_encode_fh(origin);
+	err = PTR_ERR(fh);
+	if (IS_ERR(fh))
+		goto fail;
+
+	err = ovl_verify_origin_fh(dentry, fh);
+	if (set && err == -ENODATA)
+		err = ovl_do_setxattr(dentry, OVL_XATTR_ORIGIN, fh, fh->len, 0);
+	if (err)
+		goto fail;
+
+out:
+	kfree(fh);
+	return err;
+
+fail:
+	inode = d_inode(origin);
+	pr_warn_ratelimited("overlayfs: failed to verify origin (%pd2, ino=%lu, err=%i)\n",
+			    origin, inode ? inode->i_ino : 0, err);
+	goto out;
+}
+
+/*
  * Returns next layer in stack starting from top.
  * Returns -1 if this is the last layer.
  */
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 9ec93063c3a8..12d96a77c217 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -235,6 +235,8 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 
 
 /* namei.c */
+int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
+		      struct dentry *origin, bool set);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path, int *idxp);
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags);
 bool ovl_lower_positive(struct dentry *dentry);
@@ -293,3 +295,4 @@ int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_flags(struct dentry *dentry, int flags);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
+struct ovl_fh *ovl_encode_fh(struct dentry *lower);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d92c7486bdf2..7c621bb265db 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1030,6 +1030,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		ufs->same_sb = NULL;
 
 	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
+		/* Verify lower root is upper root origin */
+		err = ovl_verify_origin(upperpath.dentry, ufs->lower_mnt[0],
+					stack[0].dentry, true);
+		if (err) {
+			pr_err("overlayfs: failed to verify upper root origin\n");
+			goto out_put_lower_mnt;
+		}
+
 		ufs->indexdir = ovl_workdir_create(sb, ufs, workpath.dentry,
 						   OVL_INDEXDIR_NAME, true);
 		err = PTR_ERR(ufs->indexdir);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 07/25] ovl: verify index dir matches upper dir
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (5 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 06/25] ovl: verify upper root dir matches lower root dir Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 08/25] ovl: store path type in dentry Amir Goldstein
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

An index dir contains persistent hardlinks to files in upper dir.
Therefore, we must never mount an existing index dir with a differnt
upper dir.

Store the upper root dir file handle in index dir inode when index
dir is created and verify the file handle before using an existing
index dir on mount.

Add an 'is_upper' flag to the overlay file handle encoding and set it
when encoding the upper root file handle. This is not critical for index
dir verification, but it is good practice towards a standard overlayfs
file handle format for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 12 ++++++++++--
 fs/overlayfs/namei.c     |  4 ++--
 fs/overlayfs/overlayfs.h |  6 ++++--
 fs/overlayfs/super.c     | 13 +++++++++++--
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index ee0b894e1d99..eedc26e15dad 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -233,7 +233,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
-struct ovl_fh *ovl_encode_fh(struct dentry *lower)
+struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper)
 {
 	struct ovl_fh *fh;
 	int fh_type, fh_len, dwords;
@@ -272,6 +272,14 @@ struct ovl_fh *ovl_encode_fh(struct dentry *lower)
 	fh->magic = OVL_FH_MAGIC;
 	fh->type = fh_type;
 	fh->flags = OVL_FH_FLAG_CPU_ENDIAN;
+	/*
+	 * When we will want to decode an overlay dentry from this handle
+	 * and all layers are on the same fs, if we get a disconncted real
+	 * dentry when we decode fid, the only way to tell if we should assign
+	 * it to upperdentry or to lowerstack is by checking this flag.
+	 */
+	if (is_upper)
+		fh->flags |= OVL_FH_FLAG_PATH_UPPER;
 	fh->len = fh_len;
 	fh->uuid = *uuid;
 	memcpy(fh->fid, buf, buflen);
@@ -293,7 +301,7 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	 * up and a pure upper inode.
 	 */
 	if (ovl_can_decode_fh(lower->d_sb)) {
-		fh = ovl_encode_fh(lower);
+		fh = ovl_encode_fh(lower, false);
 		if (IS_ERR(fh))
 			return PTR_ERR(fh);
 	}
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index c08e334c879c..400ad20ceb15 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -351,13 +351,13 @@ static int ovl_verify_origin_fh(struct dentry *dentry, struct ovl_fh *fh)
  * Return 0 on match, -ESTALE on mismatch, < 0 on error.
  */
 int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
-		      struct dentry *origin, bool set)
+		      struct dentry *origin, bool is_upper, bool set)
 {
 	struct inode *inode = NULL;
 	struct ovl_fh *fh = NULL;
 	int err;
 
-	fh = ovl_encode_fh(origin);
+	fh = ovl_encode_fh(origin, is_upper);
 	err = PTR_ERR(fh);
 	if (IS_ERR(fh))
 		goto fail;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 12d96a77c217..612b6e45c64b 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -38,6 +38,8 @@ enum ovl_path_type {
 /* CPU byte order required for fid decoding:  */
 #define OVL_FH_FLAG_BIG_ENDIAN	(1 << 0)
 #define OVL_FH_FLAG_ANY_ENDIAN	(1 << 1)
+/* Is the real inode encoded in fid an upper inode? */
+#define OVL_FH_FLAG_PATH_UPPER	(1 << 2)
 
 #define OVL_FH_FLAG_ALL (OVL_FH_FLAG_BIG_ENDIAN | OVL_FH_FLAG_ANY_ENDIAN)
 
@@ -236,7 +238,7 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 
 /* namei.c */
 int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
-		      struct dentry *origin, bool set);
+		      struct dentry *origin, bool is_upper, bool set);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path, int *idxp);
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags);
 bool ovl_lower_positive(struct dentry *dentry);
@@ -295,4 +297,4 @@ int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_flags(struct dentry *dentry, int flags);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
-struct ovl_fh *ovl_encode_fh(struct dentry *lower);
+struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 7c621bb265db..a9de7d1893e2 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1032,7 +1032,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
 		/* Verify lower root is upper root origin */
 		err = ovl_verify_origin(upperpath.dentry, ufs->lower_mnt[0],
-					stack[0].dentry, true);
+					stack[0].dentry, false, true);
 		if (err) {
 			pr_err("overlayfs: failed to verify upper root origin\n");
 			goto out_put_lower_mnt;
@@ -1044,8 +1044,17 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (IS_ERR(ufs->indexdir))
 			goto out_put_lower_mnt;
 
-		if (!ufs->indexdir)
+		if (ufs->indexdir) {
+			/* Verify upper root is index dir origin */
+			err = ovl_verify_origin(ufs->indexdir, ufs->upper_mnt,
+						upperpath.dentry, true, true);
+			if (err)
+				pr_err("overlayfs: failed to verify index dir origin\n");
+		}
+		if (err || !ufs->indexdir)
 			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
+		if (err)
+			goto out_put_indexdir;
 	}
 
 	/* Show index=off/on in /proc/mounts for any of the reasons above */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 08/25] ovl: store path type in dentry
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (6 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 07/25] ovl: verify index dir matches upper dir Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 09/25] ovl: cram dentry state booleans into type flags Amir Goldstein
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

We would like to be able to set type flags during lookup (i.e. INDEX)
instead of deducing them from other state information.

Store the type value in ovl_entry and update the UPPER/MERGE/ORIGIN
flags when needed, so ovl_path_type() just returns the stored value.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c     |  1 +
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/ovl_entry.h |  3 +++
 fs/overlayfs/super.c     |  1 +
 fs/overlayfs/util.c      | 20 +++++++++++++++++---
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 400ad20ceb15..6e827a2b1cf0 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -548,6 +548,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	kfree(stack);
 	kfree(d.redirect);
 	dentry->d_fsdata = oe;
+	ovl_update_type(dentry, d.is_dir);
 	d_add(dentry, inode);
 
 	return NULL;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 612b6e45c64b..09852c879046 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -199,6 +199,7 @@ struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
 bool ovl_dentry_remote(struct dentry *dentry);
 bool ovl_dentry_weird(struct dentry *dentry);
 enum ovl_path_type ovl_path_type(struct dentry *dentry);
+void ovl_update_type(struct dentry *dentry, bool is_dir);
 void ovl_path_upper(struct dentry *dentry, struct path *path);
 void ovl_path_lower(struct dentry *dentry, struct path *path);
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index ddd4b0a874a9..e94fa2638faf 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -40,6 +40,8 @@ struct ovl_fs {
 	struct super_block *same_sb;
 };
 
+enum ovl_path_type;
+
 /* private information held for every overlayfs dentry */
 struct ovl_entry {
 	struct dentry *__upperdentry;
@@ -54,6 +56,7 @@ struct ovl_entry {
 		};
 		struct rcu_head rcu;
 	};
+	enum ovl_path_type __type;
 	unsigned numlower;
 	struct path lowerstack[];
 };
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a9de7d1893e2..c72d79749652 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1105,6 +1105,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(stack);
 
 	root_dentry->d_fsdata = oe;
+	ovl_update_type(root_dentry, true);
 
 	realinode = d_inode(ovl_dentry_real(root_dentry));
 	ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 8e2f308056f1..8e743c76aa03 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -91,24 +91,37 @@ bool ovl_dentry_weird(struct dentry *dentry)
 enum ovl_path_type ovl_path_type(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return READ_ONCE(oe->__type);
+}
+
+void ovl_update_type(struct dentry *dentry, bool is_dir)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
 	enum ovl_path_type type = 0;
 
+	spin_lock(&dentry->d_lock);
 	if (oe->__upperdentry) {
 		type = __OVL_PATH_UPPER;
-
 		/*
 		 * Non-dir dentry can hold lower dentry of its copy up origin.
 		 */
 		if (oe->numlower) {
 			type |= __OVL_PATH_ORIGIN;
-			if (d_is_dir(dentry))
+			if (is_dir)
 				type |= __OVL_PATH_MERGE;
 		}
 	} else {
 		if (oe->numlower > 1)
 			type |= __OVL_PATH_MERGE;
 	}
-	return type;
+
+	/*
+	 * The UPPER/MERGE/ORIGIN flags can never be cleared during the
+	 * lifetime of a dentry, so don't bother masking them out first.
+	 */
+	oe->__type |= type;
+	spin_unlock(&dentry->d_lock);
 }
 
 void ovl_path_upper(struct dentry *dentry, struct path *path)
@@ -243,6 +256,7 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 	 */
 	smp_wmb();
 	oe->__upperdentry = upperdentry;
+	ovl_update_type(dentry, d_is_dir(dentry));
 }
 
 void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 09/25] ovl: cram dentry state booleans into type flags
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (7 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 08/25] ovl: store path type in dentry Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 10/25] ovl: lookup index entry for copy up origin Amir Goldstein
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Like the other type flags, 'opaque' and 'impure' are states discovered in
lookup, set on certain ops (e.g.: create, rename) and never cleared.

We would like to add more state info to ovl_entry soon (indexed) and
and this state info would be added as type flags. It makes little sense
to have the 'opaque' and 'impure' states occupy a boolean each, so store
these states as type flags.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c     | 15 +++++++--------
 fs/overlayfs/overlayfs.h |  4 ++++
 fs/overlayfs/ovl_entry.h |  2 --
 fs/overlayfs/super.c     |  7 +++----
 fs/overlayfs/util.c      | 25 ++++++++++++++++---------
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 6e827a2b1cf0..17170d89e012 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -415,8 +415,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct dentry *upperdir, *upperdentry = NULL;
 	unsigned int ctr = 0;
 	struct inode *inode = NULL;
-	bool upperopaque = false;
-	bool upperimpure = false;
+	enum ovl_path_type type = 0;
 	char *upperredirect = NULL;
 	struct dentry *this;
 	unsigned int i;
@@ -470,9 +469,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			if (d.redirect[0] == '/')
 				poe = roe;
 		}
-		upperopaque = d.opaque;
-		if (upperdentry && d.is_dir)
-			upperimpure = ovl_is_impuredir(upperdentry);
+		if (d.opaque)
+			type |= __OVL_PATH_OPAQUE;
+		if (upperdentry && d.is_dir && ovl_is_impuredir(upperdentry))
+			type |= __OVL_PATH_IMPURE;
 	}
 
 	if (!d.stop && poe->numlower) {
@@ -540,8 +540,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	}
 
 	revert_creds(old_cred);
-	oe->opaque = upperopaque;
-	oe->impure = upperimpure;
+	oe->__type = type;
 	oe->redirect = upperredirect;
 	oe->__upperdentry = upperdentry;
 	memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr);
@@ -582,7 +581,7 @@ bool ovl_lower_positive(struct dentry *dentry)
 	 * whiteout.
 	 */
 	if (!dentry->d_inode)
-		return oe->opaque;
+		return OVL_TYPE_OPAQUE(oe->__type);
 
 	/* Negative upper -> positive lower */
 	if (!oe->__upperdentry)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 09852c879046..6b98c5fe7d21 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -14,11 +14,15 @@ enum ovl_path_type {
 	__OVL_PATH_UPPER	= (1 << 0),
 	__OVL_PATH_MERGE	= (1 << 1),
 	__OVL_PATH_ORIGIN	= (1 << 2),
+	__OVL_PATH_OPAQUE	= (1 << 3),
+	__OVL_PATH_IMPURE	= (1 << 4),
 };
 
 #define OVL_TYPE_UPPER(type)	((type) & __OVL_PATH_UPPER)
 #define OVL_TYPE_MERGE(type)	((type) & __OVL_PATH_MERGE)
 #define OVL_TYPE_ORIGIN(type)	((type) & __OVL_PATH_ORIGIN)
+#define OVL_TYPE_OPAQUE(type)	((type) & __OVL_PATH_OPAQUE)
+#define OVL_TYPE_IMPURE(type)	((type) & __OVL_PATH_IMPURE)
 
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index e94fa2638faf..6de8725cc38a 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -50,8 +50,6 @@ struct ovl_entry {
 		struct {
 			u64 version;
 			const char *redirect;
-			bool opaque;
-			bool impure;
 			bool copying;
 		};
 		struct rcu_head rcu;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index c72d79749652..2e6062a073d4 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1094,10 +1094,9 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	mntput(workpath.mnt);
 	kfree(lowertmp);
 
-	if (upperpath.dentry) {
-		oe->__upperdentry = upperpath.dentry;
-		oe->impure = ovl_is_impuredir(upperpath.dentry);
-	}
+	oe->__upperdentry = upperpath.dentry;
+	if (upperpath.dentry && ovl_is_impuredir(upperpath.dentry))
+		oe->__type |= __OVL_PATH_IMPURE;
 	for (i = 0; i < numlower; i++) {
 		oe->lowerstack[i].dentry = stack[i].dentry;
 		oe->lowerstack[i].mnt = ufs->lower_mnt[i];
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 8e743c76aa03..9d2f3bb70e61 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -199,15 +199,12 @@ void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache)
 
 bool ovl_dentry_is_opaque(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
-	return oe->opaque;
+	return OVL_TYPE_OPAQUE(ovl_path_type(dentry));
 }
 
 bool ovl_dentry_is_impure(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	return oe->impure;
+	return OVL_TYPE_IMPURE(ovl_path_type(dentry));
 }
 
 bool ovl_dentry_is_whiteout(struct dentry *dentry)
@@ -219,7 +216,18 @@ void ovl_dentry_set_opaque(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 
-	oe->opaque = true;
+	spin_lock(&dentry->d_lock);
+	oe->__type |= __OVL_PATH_OPAQUE;
+	spin_unlock(&dentry->d_lock);
+}
+
+static void ovl_dentry_set_impure(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	spin_lock(&dentry->d_lock);
+	oe->__type |= __OVL_PATH_IMPURE;
+	spin_unlock(&dentry->d_lock);
 }
 
 bool ovl_redirect_dir(struct super_block *sb)
@@ -372,9 +380,8 @@ int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
 int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry)
 {
 	int err;
-	struct ovl_entry *oe = dentry->d_fsdata;
 
-	if (oe->impure)
+	if (ovl_dentry_is_impure(dentry))
 		return 0;
 
 	/*
@@ -384,7 +391,7 @@ int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry)
 	err = ovl_check_setxattr(dentry, upperdentry, OVL_XATTR_IMPURE,
 				 "y", 1, 0);
 	if (!err)
-		oe->impure = true;
+		ovl_dentry_set_impure(dentry);
 
 	return err;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 10/25] ovl: lookup index entry for copy up origin
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (8 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 09/25] ovl: cram dentry state booleans into type flags Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 11/25] ovl: cleanup bad and stale index entries on mount Amir Goldstein
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

When inodes index feature is enabled, lookup in indexdir for the index
entry of lower real inode or copy up origin inode. The index entry name
is the hex representation of the lower inode file handle.

If the index dentry in negative, then either no lower aliases have been
copied up yet, or aliases have been copied up in older kernels and are
not indexed.

If the index dentry for a copy up origin inode is positive, but points
to an inode different than the upper inode, then either the upper inode
has been copied up and not indexed or it was indexed, but since then
index dir was cleared. Either way, that index cannot be used to indentify
the overlay inode.

If a positive dentry that matches the upper inode was found, then it is
safe to use the copy up origin st_ino for upper hardlinks, because all
indexed upper hardlinks are represented by the same overlay inode as the
copy up origin.

Set the INDEX type flag on an indexed upper dentry. A non-upper dentry
may also have a positive index from copy up of another lower hardlink.
This situation will be handled by following patches.

Index lookup is going to be used to prevent breaking hardlinks on copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c     |   8 +++-
 fs/overlayfs/namei.c     | 105 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |   4 ++
 3 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index d613e2c41242..b78993abdeea 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -96,11 +96,15 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 
 			WARN_ON_ONCE(stat->dev != lowerstat.dev);
 			/*
-			 * Lower hardlinks are broken on copy up to different
+			 * Lower hardlinks may be broken on copy up to different
 			 * upper files, so we cannot use the lower origin st_ino
 			 * for those different files, even for the same fs case.
+			 * With inodes index enabled, it is safe to use st_ino
+			 * of an indexed hardlinked origin. The index validates
+			 * that the upper hardlink is not broken.
 			 */
-			if (is_dir || lowerstat.nlink == 1)
+			if (is_dir || lowerstat.nlink == 1 ||
+			    OVL_TYPE_INDEX(type))
 				stat->ino = lowerstat.ino;
 		}
 		stat->dev = dentry->d_sb->s_dev;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 17170d89e012..debe489fbb47 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -380,6 +380,84 @@ int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
 }
 
 /*
+ * Lookup in indexdir for the index entry of a lower real inode or a copy up
+ * origin inode. The index entry name is the hex representation of the lower
+ * inode file handle.
+ *
+ * If the index dentry in negative, then either no lower aliases have been
+ * copied up yet, or aliases have been copied up in older kernels and are
+ * not indexed.
+ *
+ * If the index dentry for a copy up origin inode is positive, but points
+ * to an inode different than the upper inode, then either the upper inode
+ * has been copied up and not indexed or it was indexed, but since then
+ * index dir was cleared. Either way, that index cannot be used to indentify
+ * the overlay inode.
+ */
+int ovl_lookup_index(struct dentry *dentry, struct dentry *upper,
+		     struct dentry *lower, struct dentry **indexp)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_fh *fh;
+	struct dentry *index = NULL;
+	struct inode *inode;
+	char *s, *name = NULL;
+	long namelen = 0;
+	int err;
+
+	if (WARN_ON(!ofs->indexdir))
+		return -ENOENT;
+
+	fh = ovl_encode_fh(lower, false);
+	if (IS_ERR(fh)) {
+		err = PTR_ERR(fh);
+		fh = NULL;
+		goto fail;
+	}
+
+	err = -ENAMETOOLONG;
+	namelen = fh->len * 2;
+	if (namelen > ofs->namelen)
+		goto fail;
+
+	err = -ENOMEM;
+	name = kzalloc(namelen + 1, GFP_TEMPORARY);
+	if (!name)
+		goto fail;
+
+	s  = bin2hex(name, fh, fh->len);
+	namelen = s - name;
+
+	index = lookup_one_len_unlocked(name, ofs->indexdir, namelen);
+	if (IS_ERR(index)) {
+		err = PTR_ERR(index);
+		goto fail;
+	}
+
+	if (upper && d_inode(index) != d_inode(upper)) {
+		inode = d_inode(index);
+		pr_debug("ovl_lookup_index: upper with origin not indexed (%pd2, ino=%lu)\n",
+			 upper, inode ? inode->i_ino : 0);
+		dput(index);
+		index = NULL;
+	}
+
+	*indexp = index;
+	err = 0;
+
+out:
+	kfree(fh);
+	kfree(name);
+	return err;
+
+fail:
+	inode = d_inode(lower);
+	pr_warn_ratelimited("overlayfs: failed inode index lookup (ino=%lu, key=%*s, err=%i); mount with '-o index=off' to disable inodes index.\n",
+			    inode ? inode->i_ino : 0, (int)namelen, name, err);
+	goto out;
+}
+
+/*
  * Returns next layer in stack starting from top.
  * Returns -1 if this is the last layer.
  */
@@ -413,6 +491,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
 	struct path *stack = NULL;
 	struct dentry *upperdir, *upperdentry = NULL;
+	struct dentry *index = NULL;
 	unsigned int ctr = 0;
 	struct inode *inode = NULL;
 	enum ovl_path_type type = 0;
@@ -513,6 +592,30 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 	}
 
+	/* Lookup index by lower inode and verify it matches upper inode */
+	if (ctr && !d.is_dir && ovl_indexdir(dentry->d_sb)) {
+		struct dentry *origin = stack[0].dentry;
+
+		err = ovl_lookup_index(dentry, upperdentry, origin, &index);
+		if (err)
+			goto out_put;
+
+		if (!upperdentry) {
+			/* TODO: handle lookup of lower indexed entries */
+		} else if (index && d_inode(index)) {
+			/* Vertified indexed upper */
+			type |= __OVL_PATH_INDEX;
+		} else if (d_inode(origin)->i_nlink > 1) {
+			/*
+			 * Non-indexed upper of lower hardlink is a broken
+			 * hardlink - disown origin, because we cannot use
+			 * origin st_ino for a broken hardlink.
+			 */
+			dput(origin);
+			ctr = 0;
+		}
+	}
+
 	oe = ovl_alloc_entry(ctr);
 	err = -ENOMEM;
 	if (!oe)
@@ -544,6 +647,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	oe->redirect = upperredirect;
 	oe->__upperdentry = upperdentry;
 	memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr);
+	dput(index);
 	kfree(stack);
 	kfree(d.redirect);
 	dentry->d_fsdata = oe;
@@ -555,6 +659,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 out_free_oe:
 	kfree(oe);
 out_put:
+	dput(index);
 	for (i = 0; i < ctr; i++)
 		dput(stack[i].dentry);
 	kfree(stack);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6b98c5fe7d21..596b728f394c 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -16,6 +16,7 @@ enum ovl_path_type {
 	__OVL_PATH_ORIGIN	= (1 << 2),
 	__OVL_PATH_OPAQUE	= (1 << 3),
 	__OVL_PATH_IMPURE	= (1 << 4),
+	__OVL_PATH_INDEX	= (1 << 5),
 };
 
 #define OVL_TYPE_UPPER(type)	((type) & __OVL_PATH_UPPER)
@@ -23,6 +24,7 @@ enum ovl_path_type {
 #define OVL_TYPE_ORIGIN(type)	((type) & __OVL_PATH_ORIGIN)
 #define OVL_TYPE_OPAQUE(type)	((type) & __OVL_PATH_OPAQUE)
 #define OVL_TYPE_IMPURE(type)	((type) & __OVL_PATH_IMPURE)
+#define OVL_TYPE_INDEX(type)	((type) & __OVL_PATH_INDEX)
 
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
@@ -244,6 +246,8 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 /* namei.c */
 int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
 		      struct dentry *origin, bool is_upper, bool set);
+int ovl_lookup_index(struct dentry *dentry, struct dentry *upper,
+		     struct dentry *lower, struct dentry **indexp);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path, int *idxp);
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags);
 bool ovl_lower_positive(struct dentry *dentry);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 11/25] ovl: cleanup bad and stale index entries on mount
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (9 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 10/25] ovl: lookup index entry for copy up origin Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 12/25] ovl: allocate an ovl_inode struct Amir Goldstein
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Bad index entries are entries whose name does not match the
origin file handle stored in trusted.overlay.origin xattr.
Bad index entries could be a result of a system power off in
the middle of copy up.

Stale index entries are entries whose origin file handle is
stale. Stale index entries could be a result of copying layers
or removing lower entries while the overlay is not mounted.
The case of copying layers should be detected earlier by the
verification of upper root dir origin and index dir origin.

Both bad and stale index entries are detected and removed
on mount.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c       |  4 ++-
 fs/overlayfs/namei.c     | 74 ++++++++++++++++++++++++++++++++++++++++++------
 fs/overlayfs/overlayfs.h |  6 +++-
 fs/overlayfs/readdir.c   | 50 ++++++++++++++++++++++++++++++++
 fs/overlayfs/super.c     |  6 ++++
 5 files changed, 130 insertions(+), 10 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a63a71656e9b..822be8647d88 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -24,7 +24,7 @@ module_param_named(redirect_max, ovl_redirect_max, ushort, 0644);
 MODULE_PARM_DESC(ovl_redirect_max,
 		 "Maximum length of absolute redirect xattr value");
 
-void ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
+int ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
 {
 	int err;
 
@@ -39,6 +39,8 @@ void ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
 		pr_err("overlayfs: cleanup of '%pd2' failed (%i)\n",
 		       wdentry, err);
 	}
+
+	return err;
 }
 
 struct dentry *ovl_lookup_temp(struct dentry *workdir)
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index debe489fbb47..261ea51199c6 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -285,17 +285,17 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 }
 
 
-static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
+static int ovl_check_origin(struct dentry *upperdentry,
+			    struct path *lowerstack, unsigned int numlower,
 			    struct path **stackp, unsigned int *ctrp)
 {
-	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
 	struct vfsmount *mnt;
 	struct dentry *origin = NULL;
 	int i;
 
 
-	for (i = 0; i < roe->numlower; i++) {
-		mnt = roe->lowerstack[i].mnt;
+	for (i = 0; i < numlower; i++) {
+		mnt = lowerstack[i].mnt;
 		origin = ovl_get_origin(upperdentry, mnt);
 		if (IS_ERR(origin))
 			return PTR_ERR(origin);
@@ -307,8 +307,9 @@ static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry,
 	if (!origin)
 		return 0;
 
-	BUG_ON(*stackp || *ctrp);
-	*stackp = kmalloc(sizeof(struct path), GFP_TEMPORARY);
+	BUG_ON(*ctrp);
+	if (!*stackp)
+		*stackp = kmalloc(sizeof(struct path), GFP_TEMPORARY);
 	if (!*stackp) {
 		dput(origin);
 		return -ENOMEM;
@@ -380,6 +381,63 @@ int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
 }
 
 /*
+ * Verify that an index entry name matches the origin file handle stored in
+ * OVL_XATTR_ORIGIN and that origin file handle can be decoded to lower path.
+ * Return 0 on match, -ESTALE on mismatch or stale origin, < 0 on error.
+ */
+int ovl_verify_index(struct dentry *index, struct path *lowerstack,
+		     unsigned int numlower)
+{
+	struct ovl_fh *fh = NULL;
+	size_t len;
+	struct path origin = { };
+	struct path *stack = &origin;
+	unsigned int ctr = 0;
+	int err;
+
+	if (!d_inode(index))
+		return 0;
+
+	err = -EISDIR;
+	if (d_is_dir(index))
+		goto fail;
+
+	err = -EINVAL;
+	if (index->d_name.len < sizeof(struct ovl_fh)*2)
+		goto fail;
+
+	err = -ENOMEM;
+	len = index->d_name.len / 2;
+	fh = kzalloc(len, GFP_TEMPORARY);
+	if (!fh)
+		goto fail;
+
+	err = -EINVAL;
+	if (hex2bin((u8 *)fh, index->d_name.name, len) || len != fh->len)
+		goto fail;
+
+	err = ovl_verify_origin_fh(index, fh);
+	if (err)
+		goto fail;
+
+	err = ovl_check_origin(index, lowerstack, numlower, &stack, &ctr);
+	if (!err && !ctr)
+		err = -ESTALE;
+	if (err)
+		goto fail;
+
+	dput(origin.dentry);
+out:
+	kfree(fh);
+	return err;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed to verify index (%pd2, err=%i)\n",
+			    index, err);
+	goto out;
+}
+
+/*
  * Lookup in indexdir for the index entry of a lower real inode or a copy up
  * origin inode. The index entry name is the hex representation of the lower
  * inode file handle.
@@ -535,8 +593,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			 * number - it's the same as if we held a reference
 			 * to a dentry in lower layer that was moved under us.
 			 */
-			err = ovl_check_origin(dentry, upperdentry,
-					       &stack, &ctr);
+			err = ovl_check_origin(upperdentry, roe->lowerstack,
+					       roe->numlower, &stack, &ctr);
 			if (err)
 				goto out;
 		}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 596b728f394c..a92af10fd369 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -246,6 +246,8 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 /* namei.c */
 int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
 		      struct dentry *origin, bool is_upper, bool set);
+int ovl_verify_index(struct dentry *index, struct path *lowerstack,
+		     unsigned int numlower);
 int ovl_lookup_index(struct dentry *dentry, struct dentry *upper,
 		     struct dentry *lower, struct dentry **indexp);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path, int *idxp);
@@ -260,6 +262,8 @@ void ovl_cache_free(struct list_head *list);
 int ovl_check_d_type_supported(struct path *realpath);
 void ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
 			 struct dentry *dentry, int level);
+int ovl_indexdir_cleanup(struct dentry *dentry, struct vfsmount *mnt,
+			 struct path *lowerstack, unsigned int numlower);
 
 /* inode.c */
 int ovl_setattr(struct dentry *dentry, struct iattr *attr);
@@ -299,7 +303,7 @@ struct cattr {
 int ovl_create_real(struct inode *dir, struct dentry *newdentry,
 		    struct cattr *attr,
 		    struct dentry *hardlink, bool debug);
-void ovl_cleanup(struct inode *dir, struct dentry *dentry);
+int ovl_cleanup(struct inode *dir, struct dentry *dentry);
 
 /* copy_up.c */
 int ovl_copy_up(struct dentry *dentry);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 3de5ebe7e7b9..e59d3e1c9173 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -786,3 +786,53 @@ void ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
 		ovl_cleanup(dir, dentry);
 	}
 }
+
+int ovl_indexdir_cleanup(struct dentry *dentry, struct vfsmount *mnt,
+			 struct path *lowerstack, unsigned int numlower)
+{
+	int err;
+	struct inode *dir = dentry->d_inode;
+	struct path path = { .mnt = mnt, .dentry = dentry };
+	LIST_HEAD(list);
+	struct ovl_cache_entry *p;
+	struct ovl_readdir_data rdd = {
+		.ctx.actor = ovl_fill_merge,
+		.dentry = NULL,
+		.list = &list,
+		.root = RB_ROOT,
+		.is_lowest = false,
+	};
+
+	err = ovl_dir_read(&path, &rdd);
+	if (err)
+		goto out;
+
+	inode_lock_nested(dir, I_MUTEX_PARENT);
+	list_for_each_entry(p, &list, l_node) {
+		struct dentry *index;
+
+		if (p->name[0] == '.') {
+			if (p->len == 1)
+				continue;
+			if (p->len == 2 && p->name[1] == '.')
+				continue;
+		}
+		index = lookup_one_len(p->name, dentry, p->len);
+		if (IS_ERR(index)) {
+			err = PTR_ERR(index);
+			break;
+		}
+		if (ovl_verify_index(index, lowerstack, numlower)) {
+			err = ovl_cleanup(dir, index);
+			if (err)
+				break;
+		}
+		dput(index);
+	}
+	inode_unlock(dir);
+out:
+	ovl_cache_free(&list);
+	if (err)
+		pr_err("overlayfs: failed index dir cleanup (%i)\n", err);
+	return err;
+}
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 2e6062a073d4..b3ac406654bd 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1050,6 +1050,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 						upperpath.dentry, true, true);
 			if (err)
 				pr_err("overlayfs: failed to verify index dir origin\n");
+
+			/* Cleanup bad/stale index entries */
+			if (!err)
+				err = ovl_indexdir_cleanup(ufs->indexdir,
+							   ufs->upper_mnt,
+							   stack, numlower);
 		}
 		if (err || !ufs->indexdir)
 			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 12/25] ovl: allocate an ovl_inode struct
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (10 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 11/25] ovl: cleanup bad and stale index entries on mount Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 13/25] ovl: store upper/lower real inode in ovl_inode_info Amir Goldstein
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

We need some more space to store overlay inode data in memory,
so allocate overlay inodes from a slab of struct ovl_inode.

The ovl_inode struct includes a mutex that is going to be used
for synchronizing copy up of lower hardlinks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/ovl_entry.h | 12 ++++++++++
 fs/overlayfs/super.c     | 62 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 6de8725cc38a..6c36fa70abe4 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -65,3 +65,15 @@ static inline struct dentry *ovl_upperdentry_dereference(struct ovl_entry *oe)
 {
 	return lockless_dereference(oe->__upperdentry);
 }
+
+struct ovl_inode {
+	/* keep this first */
+	struct inode vfs_inode;
+	/* synchronize copy up and more */
+	struct mutex oi_lock;
+};
+
+static inline struct ovl_inode *OVL_I(struct inode *inode)
+{
+	return (struct ovl_inode *) inode;
+}
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b3ac406654bd..d32ae4c00f9c 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -170,6 +170,34 @@ static const struct dentry_operations ovl_reval_dentry_operations = {
 	.d_weak_revalidate = ovl_dentry_weak_revalidate,
 };
 
+static struct kmem_cache *ovl_inode_cachep;
+
+static struct inode *ovl_alloc_inode(struct super_block *sb)
+{
+	struct inode *inode;
+
+	inode = kmem_cache_alloc(ovl_inode_cachep, GFP_KERNEL);
+	if (!inode)
+		return NULL;
+
+	mutex_init(&OVL_I(inode)->oi_lock);
+
+	return inode;
+}
+
+static void ovl_i_callback(struct rcu_head *head)
+{
+	struct inode *inode = container_of(head, struct inode, i_rcu);
+
+	kmem_cache_free(ovl_inode_cachep, inode);
+}
+
+static void ovl_destroy_inode(struct inode *inode)
+{
+	mutex_destroy(&OVL_I(inode)->oi_lock);
+	call_rcu(&inode->i_rcu, ovl_i_callback);
+}
+
 /* Get exclusive ownership on upper/work dir among overlay mounts */
 static bool ovl_dir_lock(struct dentry *dentry)
 {
@@ -294,12 +322,14 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 }
 
 static const struct super_operations ovl_super_operations = {
+	.alloc_inode	= ovl_alloc_inode,
+	.destroy_inode	= ovl_destroy_inode,
+	.drop_inode	= generic_delete_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
 	.show_options	= ovl_show_options,
 	.remount_fs	= ovl_remount,
-	.drop_inode	= generic_delete_inode,
 };
 
 enum {
@@ -1171,14 +1201,42 @@ static struct file_system_type ovl_fs_type = {
 };
 MODULE_ALIAS_FS("overlay");
 
+static void ovl_inode_init_once(void *foo)
+{
+	struct inode *inode = foo;
+
+	inode_init_once(inode);
+}
+
 static int __init ovl_init(void)
 {
-	return register_filesystem(&ovl_fs_type);
+	int err;
+
+	ovl_inode_cachep = kmem_cache_create("ovl_inode",
+					     sizeof(struct ovl_inode), 0,
+					     SLAB_HWCACHE_ALIGN|SLAB_ACCOUNT,
+					     ovl_inode_init_once);
+	if (ovl_inode_cachep == NULL)
+		return -ENOMEM;
+
+	err = register_filesystem(&ovl_fs_type);
+	if (err)
+		kmem_cache_destroy(ovl_inode_cachep);
+
+	return err;
 }
 
 static void __exit ovl_exit(void)
 {
 	unregister_filesystem(&ovl_fs_type);
+
+	/*
+	 * Make sure all delayed rcu free inodes are flushed before we
+	 * destroy cache.
+	 */
+	rcu_barrier();
+	kmem_cache_destroy(ovl_inode_cachep);
+
 }
 
 module_init(ovl_init);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 13/25] ovl: store upper/lower real inode in ovl_inode_info
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (11 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 12/25] ovl: allocate an ovl_inode struct Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 14/25] ovl: use ovl_inode_init() for initializing new inode Amir Goldstein
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Introduce ovl_inode_info struct that is embedded in ovl_inode
and contains a reference to lowerinode and/or upperinode.

Storing the upper/lower real inode in ovl_inode_info replaces
the method of storing realinode & ISUPPER flag in vfs inode
i_private field.

This will be used for hashing overlay inodes before copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c     |  5 +++--
 fs/overlayfs/overlayfs.h | 13 +------------
 fs/overlayfs/ovl_entry.h | 12 ++++++++++++
 fs/overlayfs/super.c     |  1 +
 fs/overlayfs/util.c      | 28 ++++++++++++++++++++++++----
 5 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b78993abdeea..49492b982726 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -13,6 +13,7 @@
 #include <linux/xattr.h>
 #include <linux/posix_acl.h>
 #include "overlayfs.h"
+#include "ovl_entry.h"
 
 int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 {
@@ -457,12 +458,12 @@ struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev)
 
 static int ovl_inode_test(struct inode *inode, void *data)
 {
-	return ovl_inode_real(inode, NULL) == data;
+	return OVL_I_INFO(inode)->__upperinode == data;
 }
 
 static int ovl_inode_set(struct inode *inode, void *data)
 {
-	inode->i_private = (void *) (((unsigned long) data) | OVL_ISUPPER_MASK);
+	OVL_I_INFO(inode)->__upperinode = data;
 	return 0;
 }
 
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index a92af10fd369..35cefa9e9de6 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -68,8 +68,6 @@ struct ovl_fh {
 	u8 fid[0];	/* file identifier */
 } __packed;
 
-#define OVL_ISUPPER_MASK 1UL
-
 static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	int err = vfs_rmdir(dir, dentry);
@@ -183,16 +181,6 @@ static inline struct dentry *ovl_do_tmpfile(struct dentry *dentry, umode_t mode)
 	return ret;
 }
 
-static inline struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
-{
-	unsigned long x = (unsigned long) READ_ONCE(inode->i_private);
-
-	if (is_upper)
-		*is_upper = x & OVL_ISUPPER_MASK;
-
-	return (struct inode *) (x & ~OVL_ISUPPER_MASK);
-}
-
 /* util.c */
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
@@ -224,6 +212,7 @@ void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
 void ovl_inode_init(struct inode *inode, struct inode *realinode,
 		    bool is_upper);
+struct inode *ovl_inode_real(struct inode *inode, bool *is_upper);
 void ovl_inode_update(struct inode *inode, struct inode *upperinode);
 void ovl_dentry_version_inc(struct dentry *dentry);
 u64 ovl_dentry_version_get(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 6c36fa70abe4..6e240bc5ac1d 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -66,9 +66,16 @@ static inline struct dentry *ovl_upperdentry_dereference(struct ovl_entry *oe)
 	return lockless_dereference(oe->__upperdentry);
 }
 
+/* private information embedded in every overlayfs inode */
+struct ovl_inode_info {
+	struct inode *__upperinode;
+	struct inode *lowerinode;
+};
+
 struct ovl_inode {
 	/* keep this first */
 	struct inode vfs_inode;
+	struct ovl_inode_info info;
 	/* synchronize copy up and more */
 	struct mutex oi_lock;
 };
@@ -77,3 +84,8 @@ static inline struct ovl_inode *OVL_I(struct inode *inode)
 {
 	return (struct ovl_inode *) inode;
 }
+
+static inline struct ovl_inode_info *OVL_I_INFO(struct inode *inode)
+{
+	return &OVL_I(inode)->info;
+}
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d32ae4c00f9c..33ff179e32d7 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -181,6 +181,7 @@ static struct inode *ovl_alloc_inode(struct super_block *sb)
 		return NULL;
 
 	mutex_init(&OVL_I(inode)->oi_lock);
+	memset(OVL_I_INFO(inode), 0, sizeof(struct ovl_inode_info));
 
 	return inode;
 }
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 9d2f3bb70e61..916d4e5719a8 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -269,16 +269,36 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 
 void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
 {
-	WRITE_ONCE(inode->i_private, (unsigned long) realinode |
-		   (is_upper ? OVL_ISUPPER_MASK : 0));
+	struct ovl_inode_info *oi = OVL_I_INFO(inode);
+
+	if (is_upper) {
+		oi->__upperinode = realinode;
+		oi->lowerinode = NULL;
+	} else {
+		oi->__upperinode = NULL;
+		oi->lowerinode = realinode;
+	}
+}
+
+struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
+{
+	struct ovl_inode_info *oi = OVL_I_INFO(inode);
+	struct inode *realinode;
+
+	realinode = READ_ONCE(oi->__upperinode);
+	if (is_upper)
+		*is_upper = !!realinode;
+	if (!realinode)
+		realinode = oi->lowerinode;
+
+	return realinode;
 }
 
 void ovl_inode_update(struct inode *inode, struct inode *upperinode)
 {
 	WARN_ON(!upperinode);
 	WARN_ON(!inode_unhashed(inode));
-	WRITE_ONCE(inode->i_private,
-		   (unsigned long) upperinode | OVL_ISUPPER_MASK);
+	WRITE_ONCE(OVL_I_INFO(inode)->__upperinode, upperinode);
 	if (!S_ISDIR(upperinode->i_mode))
 		__insert_inode_hash(inode, (unsigned long) upperinode);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 14/25] ovl: use ovl_inode_init() for initializing new inode
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (12 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 13/25] ovl: store upper/lower real inode in ovl_inode_info Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 15/25] ovl: hash overlay non-dir inodes by copy up origin inode Amir Goldstein
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

In preparation for hashing all overlay inodes, modify the helper
ovl_inode_init() to hash a new non-dir upper overlay inode.

Use this helper for initializing new upper inode in ovl_instantiate()
instead of ovl_inode_update(), because the implementations for new inode
case and copy up case are going to diverge.

Also move ovl_copyattr() into ovl_inode_init() and ovl_inode_get(),
so attributes will be copied only for new inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c   |  3 +--
 fs/overlayfs/inode.c |  1 +
 fs/overlayfs/namei.c |  1 -
 fs/overlayfs/util.c  | 11 ++++++++++-
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 822be8647d88..7bcb5a19e514 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -158,8 +158,7 @@ static void ovl_instantiate(struct dentry *dentry, struct inode *inode,
 	ovl_dentry_version_inc(dentry->d_parent);
 	ovl_dentry_update(dentry, newdentry);
 	if (!hardlink) {
-		ovl_inode_update(inode, d_inode(newdentry));
-		ovl_copyattr(newdentry->d_inode, inode);
+		ovl_inode_init(inode, d_inode(newdentry), true);
 	} else {
 		WARN_ON(ovl_inode_real(inode, NULL) != d_inode(newdentry));
 		inc_nlink(inode);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 49492b982726..6f0ec691d8d5 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -477,6 +477,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode)
 	if (inode && inode->i_state & I_NEW) {
 		ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
 		set_nlink(inode, realinode->i_nlink);
+		ovl_copyattr(realinode, inode);
 		unlock_new_inode(inode);
 	}
 
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 261ea51199c6..29f57fe6d921 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -697,7 +697,6 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 		if (!inode)
 			goto out_free_oe;
-		ovl_copyattr(realdentry->d_inode, inode);
 	}
 
 	revert_creds(old_cred);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 916d4e5719a8..1769cc81a50f 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -267,6 +267,12 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 	ovl_update_type(dentry, d_is_dir(dentry));
 }
 
+static void ovl_insert_inode_hash(struct inode *inode, struct inode *realinode)
+{
+	WARN_ON(!inode_unhashed(inode));
+	__insert_inode_hash(inode, (unsigned long) realinode);
+}
+
 void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
 {
 	struct ovl_inode_info *oi = OVL_I_INFO(inode);
@@ -274,10 +280,13 @@ void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
 	if (is_upper) {
 		oi->__upperinode = realinode;
 		oi->lowerinode = NULL;
+		if (!S_ISDIR(realinode->i_mode))
+			ovl_insert_inode_hash(inode, realinode);
 	} else {
 		oi->__upperinode = NULL;
 		oi->lowerinode = realinode;
 	}
+	ovl_copyattr(realinode, inode);
 }
 
 struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
@@ -300,7 +309,7 @@ void ovl_inode_update(struct inode *inode, struct inode *upperinode)
 	WARN_ON(!inode_unhashed(inode));
 	WRITE_ONCE(OVL_I_INFO(inode)->__upperinode, upperinode);
 	if (!S_ISDIR(upperinode->i_mode))
-		__insert_inode_hash(inode, (unsigned long) upperinode);
+		ovl_insert_inode_hash(inode, upperinode);
 }
 
 void ovl_dentry_version_inc(struct dentry *dentry)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 15/25] ovl: hash overlay non-dir inodes by copy up origin inode
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (13 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 14/25] ovl: use ovl_inode_init() for initializing new inode Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 16/25] ovl: defer upper dir lock to tempfile link Amir Goldstein
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

When inodes index feature is enabled, hash all non-dir inodes by the
address of the copy up origin inode if origin inode is indexed.

Non-upper overlay inodes are hashed by the lower real inode, which is
the copy up origin to be.

This change makes all lower hardlinks and their indexed copy ups be
represented by a single overlay inode and is needed for vfs inode
consistency after hardlinks are no longer broken on copy up.

Because overlay dentries of lower hardlink aliases have the same overlay
inode, a non indexed copy up of those lower aliases will cause a conflict
when trying to update the realinode to the broken upper hardlink.
A non indexed copy up of an alias that lost in this conflict will return
-EEXIST and drop the overlay dentry. The next lookup of that broken
upper hardlink will return as upper entry with a new overlay inode.

This conflict is going to be handled more gracefully by following patches.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   |  6 +++++-
 fs/overlayfs/inode.c     | 49 ++++++++++++++++++++++++++++++++++++++++++------
 fs/overlayfs/namei.c     | 22 ++++++++++++++++++++--
 fs/overlayfs/overlayfs.h |  7 +++++--
 fs/overlayfs/util.c      | 33 ++++++++++++++++++++++++++++----
 5 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index eedc26e15dad..ae18824c7944 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -418,7 +418,11 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 
 	newdentry = dget(tmpfile ? upper : temp);
 	ovl_dentry_update(dentry, newdentry);
-	ovl_inode_update(d_inode(dentry), d_inode(newdentry));
+	err = ovl_inode_update(d_inode(dentry), d_inode(newdentry));
+	if (err) {
+		/* Broken hardlink - drop cache and return error */
+		d_drop(dentry);
+	}
 
 	/* Restore timestamps on parent (best effort) */
 	ovl_set_timestamps(upperdir, pstat);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 6f0ec691d8d5..28f9a8cc0f61 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -458,22 +458,59 @@ struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev)
 
 static int ovl_inode_test(struct inode *inode, void *data)
 {
-	return OVL_I_INFO(inode)->__upperinode == data;
+	struct ovl_inode_info *oi = OVL_I_INFO(inode);
+	struct ovl_inode_info *info = data;
+	bool indexed = ovl_indexdir(inode->i_sb);
+
+	if (indexed && info->lowerinode != oi->lowerinode)
+		return false;
+
+	if (info->__upperinode == oi->__upperinode)
+		return true;
+
+	/*
+	 * Return same overlay inode when looking up by lower real inode, but
+	 * an indexed overlay inode, that is hashed by the same lower origin
+	 * inode, has already been updated on copy up to a real upper inode.
+	 */
+	return indexed && !info->__upperinode;
 }
 
 static int ovl_inode_set(struct inode *inode, void *data)
 {
-	OVL_I_INFO(inode)->__upperinode = data;
+	struct ovl_inode_info *oi = OVL_I_INFO(inode);
+	struct ovl_inode_info *info = data;
+
+	oi->__upperinode = info->__upperinode;
+	oi->lowerinode = info->lowerinode;
+
 	return 0;
 }
 
-struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode)
-
+struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *info)
 {
+	struct inode *realinode = info->__upperinode;
+	unsigned long hashval = (unsigned long) realinode;
 	struct inode *inode;
 
-	inode = iget5_locked(sb, (unsigned long) realinode,
-			     ovl_inode_test, ovl_inode_set, realinode);
+	/*
+	 * With inodes index feature enabled, overlay inodes hash key is the
+	 * address of the copy up origin lower inode if it is indexed.
+	 * When hashing a non-upper overlay inode, origin has to be set to
+	 * the real lower inode, which is the copy up origin inode to be.
+	 * Regardless of the inode we use as hash key, we always use the
+	 * uppermost real inode to initialize ovl_inode.
+	 */
+	if (info->lowerinode) {
+		WARN_ON(!ovl_indexdir(sb));
+		hashval = (unsigned long) info->lowerinode;
+		if (!realinode)
+			realinode = info->lowerinode;
+	} else if (WARN_ON(!realinode)) {
+		return NULL;
+	}
+
+	inode = iget5_locked(sb, hashval, ovl_inode_test, ovl_inode_set, info);
 	if (inode && inode->i_state & I_NEW) {
 		ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
 		set_nlink(inode, realinode->i_nlink);
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 29f57fe6d921..08154dad725e 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -682,13 +682,31 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (upperdentry || ctr) {
 		struct dentry *realdentry;
 		struct inode *realinode;
+		struct ovl_inode_info info = { };
 
 		realdentry = upperdentry ? upperdentry : stack[0].dentry;
 		realinode = d_inode(realdentry);
+		/*
+		 * When inodes index is enabled, we hash upper non-dir inodes
+		 * by the address of the copy up origin inode if origin inode
+		 * is indexed (positive index) and we hash non-upper non-dir
+		 * inodes by the address of the lower inode (negative index).
+		 *
+		 * When inodes index is disabled, or if origin is not indexed,
+		 * we hash non-dir upper inodes by the address of upper inode.
+		 */
+		if (upperdentry) {
+			BUG_ON(index && realinode != d_inode(index));
+			info.__upperinode = realinode;
+		}
+		if (index) {
+			BUG_ON(!ctr);
+			info.lowerinode = d_inode(stack[0].dentry);
+		}
 
 		err = -ENOMEM;
-		if (upperdentry && !d_is_dir(upperdentry)) {
-			inode = ovl_get_inode(dentry->d_sb, realinode);
+		if ((upperdentry || index) && !d.is_dir) {
+			inode = ovl_get_inode(dentry->d_sb, &info);
 		} else {
 			inode = ovl_new_inode(dentry->d_sb, realinode->i_mode,
 					      realinode->i_rdev);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 35cefa9e9de6..79cee43eb84d 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -213,7 +213,7 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
 void ovl_inode_init(struct inode *inode, struct inode *realinode,
 		    bool is_upper);
 struct inode *ovl_inode_real(struct inode *inode, bool *is_upper);
-void ovl_inode_update(struct inode *inode, struct inode *upperinode);
+int ovl_inode_update(struct inode *inode, struct inode *upperinode);
 void ovl_dentry_version_inc(struct dentry *dentry);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
@@ -270,7 +270,10 @@ int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
-struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode);
+struct ovl_inode_info;
+struct inode *ovl_get_inode(struct super_block *sb,
+			    struct ovl_inode_info *info);
+
 static inline void ovl_copyattr(struct inode *from, struct inode *to)
 {
 	to->i_uid = from->i_uid;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 1769cc81a50f..f3a22566eaa1 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -303,13 +303,38 @@ struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
 	return realinode;
 }
 
-void ovl_inode_update(struct inode *inode, struct inode *upperinode)
+/*
+ * When inodes index is enabled, we hash all non-dir inodes by the address
+ * of the lower origin inode. We need to take care on concurrent copy up of
+ * different lower hardlinks, that only one alias can set the upper real inode.
+ * Copy up of an alias that lost the ovl_inode_update() race will get -EEXIST
+ * and needs to d_drop() the overlay dentry of that alias, so the next
+ * ovl_lookup() will initialize a new overlay inode for the broken hardlink.
+ */
+int ovl_inode_update(struct inode *inode, struct inode *upperinode)
 {
+	struct ovl_inode_info *oi = OVL_I_INFO(inode);
+	struct inode *realinode;
+	bool indexed = ovl_indexdir(inode->i_sb);
+
 	WARN_ON(!upperinode);
-	WARN_ON(!inode_unhashed(inode));
-	WRITE_ONCE(OVL_I_INFO(inode)->__upperinode, upperinode);
-	if (!S_ISDIR(upperinode->i_mode))
+
+	spin_lock(&inode->i_lock);
+	realinode = oi->__upperinode;
+	if (!realinode)
+		oi->__upperinode = upperinode;
+	spin_unlock(&inode->i_lock);
+
+	if (realinode && realinode != upperinode) {
+		WARN_ON(!indexed);
+		return -EEXIST;
+	}
+
+	/* When inodes index is enabled, inode is hashed before copy up */
+	if (!S_ISDIR(upperinode->i_mode) && !indexed)
 		ovl_insert_inode_hash(inode, upperinode);
+
+	return 0;
 }
 
 void ovl_dentry_version_inc(struct dentry *dentry)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 16/25] ovl: defer upper dir lock to tempfile link
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (14 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 15/25] ovl: hash overlay non-dir inodes by copy up origin inode Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 17/25] ovl: factor out ovl_copy_up_inode() helper Amir Goldstein
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

On copy up of regular file using an O_TMPFILE, lock upper dir only
before linking the tempfile in place.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 35 ++++++++++++++++++-----------------
 fs/overlayfs/util.c    |  3 ++-
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index ae18824c7944..f6570bc127e7 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -336,8 +336,13 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 		.link = link
 	};
 
-	upper = lookup_one_len(dentry->d_name.name, upperdir,
-			       dentry->d_name.len);
+	if (tmpfile) {
+		upper = lookup_one_len_unlocked(dentry->d_name.name, upperdir,
+						dentry->d_name.len);
+	} else {
+		upper = lookup_one_len(dentry->d_name.name, upperdir,
+				       dentry->d_name.len);
+	}
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto out;
@@ -377,16 +382,7 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 		BUG_ON(upperpath.dentry != NULL);
 		upperpath.dentry = temp;
 
-		if (tmpfile) {
-			inode_unlock(udir);
-			err = ovl_copy_up_data(lowerpath, &upperpath,
-					       stat->size);
-			inode_lock_nested(udir, I_MUTEX_PARENT);
-		} else {
-			err = ovl_copy_up_data(lowerpath, &upperpath,
-					       stat->size);
-		}
-
+		err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
 		if (err)
 			goto out_cleanup;
 	}
@@ -409,10 +405,16 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	if (err)
 		goto out_cleanup;
 
-	if (tmpfile)
+	if (tmpfile) {
+		inode_lock_nested(udir, I_MUTEX_PARENT);
 		err = ovl_do_link(temp, udir, upper, true);
-	else
+		if (!err)
+			ovl_set_timestamps(upperdir, pstat);
+		pstat = NULL;
+		inode_unlock(udir);
+	} else {
 		err = ovl_do_rename(wdir, temp, udir, upper, 0);
+	}
 	if (err)
 		goto out_cleanup;
 
@@ -425,7 +427,8 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	}
 
 	/* Restore timestamps on parent (best effort) */
-	ovl_set_timestamps(upperdir, pstat);
+	if (pstat)
+		ovl_set_timestamps(upperdir, pstat);
 out2:
 	dput(temp);
 out1:
@@ -496,10 +499,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			goto out_done;
 		}
 
-		inode_lock_nested(upperdir->d_inode, I_MUTEX_PARENT);
 		err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath,
 					 stat, link, &pstat, true);
-		inode_unlock(upperdir->d_inode);
 		ovl_copy_up_end(dentry);
 		goto out_done;
 	}
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f3a22566eaa1..f35f87823827 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -256,7 +256,8 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 
-	WARN_ON(!inode_is_locked(upperdentry->d_parent->d_inode));
+	WARN_ON(!inode_is_locked(upperdentry->d_parent->d_inode) &&
+		!oe->copying);
 	WARN_ON(oe->__upperdentry);
 	/*
 	 * Make sure upperdentry is consistent before making it visible to
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 17/25] ovl: factor out ovl_copy_up_inode() helper
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (15 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 16/25] ovl: defer upper dir lock to tempfile link Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 18/25] ovl: generalize ovl_copy_up_locked() using actors Amir Goldstein
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Factor out helper for copying lower inode data and metadata to temp
upper inode, that is common to copy up using O_TMPFILE and workdir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 64 +++++++++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index f6570bc127e7..e642a51f3be4 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -316,6 +316,42 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	return err;
 }
 
+static int ovl_copy_up_inode(struct dentry *dentry, struct dentry *temp,
+			     struct path *lowerpath, struct kstat *stat)
+{
+	int err;
+
+	if (S_ISREG(stat->mode)) {
+		struct path upperpath;
+
+		ovl_path_upper(dentry, &upperpath);
+		BUG_ON(upperpath.dentry != NULL);
+		upperpath.dentry = temp;
+
+		err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
+		if (err)
+			return err;
+	}
+
+	err = ovl_copy_xattr(lowerpath->dentry, temp);
+	if (err)
+		return err;
+
+	inode_lock(temp->d_inode);
+	err = ovl_set_attr(temp, stat);
+	inode_unlock(temp->d_inode);
+	if (err)
+		return err;
+
+	/*
+	 * Store identifier of lower inode in upper inode xattr to
+	 * allow lookup of the copy up origin inode.
+	 */
+	err = ovl_set_origin(dentry, lowerpath->dentry, temp);
+
+	return err;
+}
+
 static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 			      struct dentry *dentry, struct path *lowerpath,
 			      struct kstat *stat, const char *link,
@@ -375,33 +411,7 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	if (err)
 		goto out2;
 
-	if (S_ISREG(stat->mode)) {
-		struct path upperpath;
-
-		ovl_path_upper(dentry, &upperpath);
-		BUG_ON(upperpath.dentry != NULL);
-		upperpath.dentry = temp;
-
-		err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
-		if (err)
-			goto out_cleanup;
-	}
-
-	err = ovl_copy_xattr(lowerpath->dentry, temp);
-	if (err)
-		goto out_cleanup;
-
-	inode_lock(temp->d_inode);
-	err = ovl_set_attr(temp, stat);
-	inode_unlock(temp->d_inode);
-	if (err)
-		goto out_cleanup;
-
-	/*
-	 * Store identifier of lower inode in upper inode xattr to
-	 * allow lookup of the copy up origin inode.
-	 */
-	err = ovl_set_origin(dentry, lowerpath->dentry, temp);
+	err = ovl_copy_up_inode(dentry, temp, lowerpath, stat);
 	if (err)
 		goto out_cleanup;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 18/25] ovl: generalize ovl_copy_up_locked() using actors
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (16 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 17/25] ovl: factor out ovl_copy_up_inode() helper Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 19/25] ovl: generalize ovl_copy_up_one() " Amir Goldstein
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

ovl_copy_up_locked() is implementing two different copy up methods in one
function with many conditional statements.

Split the function into actors: prepare,commit,cancel.
Implement the actors for the 'workdir' and 'tmpfile' copy up methods.

The aquire,release code for the two methods is still open coded in
ovl_copy_up_one(). Next patch will implement the aquire,release actors.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 250 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 172 insertions(+), 78 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index e642a51f3be4..39e982acc6c3 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -352,104 +352,197 @@ static int ovl_copy_up_inode(struct dentry *dentry, struct dentry *temp,
 	return err;
 }
 
-static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
-			      struct dentry *dentry, struct path *lowerpath,
-			      struct kstat *stat, const char *link,
-			      struct kstat *pstat, bool tmpfile)
+/*
+ * Context and operations for copying up a single lower file.
+ */
+struct ovl_copy_up_ctx {
+	struct dentry *dentry;
+	struct path *lowerpath;
+	struct kstat *stat;
+	struct kstat pstat;
+	const char *link;
+	struct dentry *upperdir;
+	struct dentry *tempdir;
+	struct dentry *upper;
+	struct dentry *temp;
+};
+
+struct ovl_copy_up_ops {
+	int (*prepare)(struct ovl_copy_up_ctx *);
+	int (*commit)(struct ovl_copy_up_ctx *);
+	void (*cancel)(struct ovl_copy_up_ctx *);
+};
+
+/*
+ * Copy up operations using workdir.
+ * Upper file is created in workdir, copied and moved to upperdir.
+ */
+static int ovl_copy_up_workdir_prepare(struct ovl_copy_up_ctx *ctx)
 {
-	struct inode *wdir = workdir->d_inode;
-	struct inode *udir = upperdir->d_inode;
-	struct dentry *newdentry = NULL;
 	struct dentry *upper = NULL;
 	struct dentry *temp = NULL;
 	int err;
-	const struct cred *old_creds = NULL;
-	struct cred *new_creds = NULL;
 	struct cattr cattr = {
 		/* Can't properly set mode on creation because of the umask */
-		.mode = stat->mode & S_IFMT,
-		.rdev = stat->rdev,
-		.link = link
+		.mode = ctx->stat->mode & S_IFMT,
+		.rdev = ctx->stat->rdev,
+		.link = ctx->link,
 	};
 
-	if (tmpfile) {
-		upper = lookup_one_len_unlocked(dentry->d_name.name, upperdir,
-						dentry->d_name.len);
-	} else {
-		upper = lookup_one_len(dentry->d_name.name, upperdir,
-				       dentry->d_name.len);
-	}
+
+	upper = lookup_one_len(ctx->dentry->d_name.name, ctx->upperdir,
+			       ctx->dentry->d_name.len);
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto out;
 
+	temp = ovl_lookup_temp(ctx->tempdir);
+	if (IS_ERR(temp)) {
+		err = PTR_ERR(temp);
+		goto out_dput_upper;
+	}
+
+	err = ovl_create_real(d_inode(ctx->tempdir), temp, &cattr, NULL, true);
+	if (err)
+		goto out_dput_temp;
+
+	ctx->upper = upper;
+	ctx->temp = temp;
+	return 0;
+
+out_dput_temp:
+	dput(temp);
+out_dput_upper:
+	dput(upper);
+out:
+	return err;
+}
+
+static int ovl_copy_up_workdir_commit(struct ovl_copy_up_ctx *ctx)
+{
+	int err;
+
+	err = ovl_do_rename(d_inode(ctx->tempdir), ctx->temp,
+			    d_inode(ctx->upperdir), ctx->upper, 0);
+	if (err)
+		return err;
+
+	/* After rename, ctx->temp is the upper entry we will use */
+	swap(ctx->temp, ctx->upper);
+
+	/* Restore timestamps on parent (best effort) */
+	ovl_set_timestamps(ctx->upperdir, &ctx->pstat);
+
+	return err;
+}
+
+static void ovl_copy_up_workdir_cancel(struct ovl_copy_up_ctx *ctx)
+{
+	ovl_cleanup(d_inode(ctx->tempdir), ctx->temp);
+}
+
+static const struct ovl_copy_up_ops ovl_copy_up_workdir_ops = {
+	.prepare = ovl_copy_up_workdir_prepare,
+	.commit = ovl_copy_up_workdir_commit,
+	.cancel = ovl_copy_up_workdir_cancel,
+};
+
+/*
+ * Copy up operations using O_TMPFILE.
+ * Upper file is created unlinked, copied and linked to upperdir.
+ */
+static int ovl_copy_up_tmpfile_prepare(struct ovl_copy_up_ctx *ctx)
+{
+	struct dentry *upper;
+	struct dentry *temp;
+
+	upper = lookup_one_len_unlocked(ctx->dentry->d_name.name, ctx->upperdir,
+					ctx->dentry->d_name.len);
+	if (IS_ERR(upper))
+		return PTR_ERR(upper);
+
+	temp = ovl_do_tmpfile(ctx->upperdir, ctx->stat->mode);
+	if (IS_ERR(temp)) {
+		dput(upper);
+		return PTR_ERR(temp);
+	}
+
+	ctx->upper = upper;
+	ctx->temp = temp;
+	return 0;
+}
+
+static int ovl_copy_up_tmpfile_commit(struct ovl_copy_up_ctx *ctx)
+{
+	int err;
+
+	inode_lock_nested(d_inode(ctx->upperdir), I_MUTEX_PARENT);
+	/* link the sucker ;) */
+	err = ovl_do_link(ctx->temp, d_inode(ctx->upperdir), ctx->upper, true);
+	/* Restore timestamps on parent (best effort) */
+	if (!err)
+		ovl_set_timestamps(ctx->upperdir, &ctx->pstat);
+	inode_unlock(d_inode(ctx->upperdir));
+
+	return err;
+}
+
+static void ovl_copy_up_tmpfile_cancel(struct ovl_copy_up_ctx *ctx) { }
+
+static const struct ovl_copy_up_ops ovl_copy_up_tmpfile_ops = {
+	.prepare = ovl_copy_up_tmpfile_prepare,
+	.commit = ovl_copy_up_tmpfile_commit,
+	.cancel = ovl_copy_up_tmpfile_cancel,
+};
+
+static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
+			      const struct ovl_copy_up_ops *ops)
+{
+	struct dentry *dentry = ctx->dentry;
+	const struct cred *old_creds = NULL;
+	struct cred *new_creds = NULL;
+	int err;
+
 	err = security_inode_copy_up(dentry, &new_creds);
 	if (err < 0)
-		goto out1;
+		return err;
 
 	if (new_creds)
 		old_creds = override_creds(new_creds);
 
-	if (tmpfile)
-		temp = ovl_do_tmpfile(upperdir, stat->mode);
-	else
-		temp = ovl_lookup_temp(workdir);
-	err = 0;
-	if (IS_ERR(temp)) {
-		err = PTR_ERR(temp);
-		temp = NULL;
-	}
-
-	if (!err && !tmpfile)
-		err = ovl_create_real(wdir, temp, &cattr, NULL, true);
+	ctx->upper = ctx->temp = NULL;
+	err = ops->prepare(ctx);
 
 	if (new_creds) {
 		revert_creds(old_creds);
 		put_cred(new_creds);
 	}
-
 	if (err)
-		goto out2;
+		goto out;
 
-	err = ovl_copy_up_inode(dentry, temp, lowerpath, stat);
+	err = ovl_copy_up_inode(dentry, ctx->temp, ctx->lowerpath, ctx->stat);
 	if (err)
-		goto out_cleanup;
+		goto out_cancel;
 
-	if (tmpfile) {
-		inode_lock_nested(udir, I_MUTEX_PARENT);
-		err = ovl_do_link(temp, udir, upper, true);
-		if (!err)
-			ovl_set_timestamps(upperdir, pstat);
-		pstat = NULL;
-		inode_unlock(udir);
-	} else {
-		err = ovl_do_rename(wdir, temp, udir, upper, 0);
-	}
+	err = ops->commit(ctx);
 	if (err)
-		goto out_cleanup;
+		goto out_cancel;
 
-	newdentry = dget(tmpfile ? upper : temp);
-	ovl_dentry_update(dentry, newdentry);
-	err = ovl_inode_update(d_inode(dentry), d_inode(newdentry));
+	ovl_dentry_update(dentry, dget(ctx->upper));
+	err = ovl_inode_update(d_inode(dentry), d_inode(ctx->upper));
 	if (err) {
 		/* Broken hardlink - drop cache and return error */
 		d_drop(dentry);
 	}
 
-	/* Restore timestamps on parent (best effort) */
-	if (pstat)
-		ovl_set_timestamps(upperdir, pstat);
-out2:
-	dput(temp);
-out1:
-	dput(upper);
 out:
+	dput(ctx->temp);
+	dput(ctx->upper);
 	return err;
 
-out_cleanup:
-	if (!tmpfile)
-		ovl_cleanup(wdir, temp);
-	goto out2;
+out_cancel:
+	ops->cancel(ctx);
+	goto out;
 }
 
 /*
@@ -465,37 +558,40 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			   struct path *lowerpath, struct kstat *stat)
 {
 	DEFINE_DELAYED_CALL(done);
-	struct dentry *workdir = ovl_workdir(dentry);
 	int err;
-	struct kstat pstat;
 	struct path parentpath;
 	struct dentry *lowerdentry = lowerpath->dentry;
-	struct dentry *upperdir;
-	const char *link = NULL;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_copy_up_ctx ctx = {
+		.dentry = dentry,
+		.lowerpath = lowerpath,
+		.stat = stat,
+		.link = NULL,
+		.tempdir = ovl_workdir(dentry),
+	};
 
-	if (WARN_ON(!workdir))
+	if (WARN_ON(!ctx.tempdir))
 		return -EROFS;
 
 	ovl_do_check_copy_up(lowerdentry);
 
 	ovl_path_upper(parent, &parentpath);
-	upperdir = parentpath.dentry;
+	ctx.upperdir = parentpath.dentry;
 
 	/* Mark parent "impure" because it may now contain non-pure upper */
-	err = ovl_set_impure(parent, upperdir);
+	err = ovl_set_impure(parent, ctx.upperdir);
 	if (err)
 		return err;
 
-	err = vfs_getattr(&parentpath, &pstat,
+	err = vfs_getattr(&parentpath, &ctx.pstat,
 			  STATX_ATIME | STATX_MTIME, AT_STATX_SYNC_AS_STAT);
 	if (err)
 		return err;
 
 	if (S_ISLNK(stat->mode)) {
-		link = vfs_get_link(lowerdentry, &done);
-		if (IS_ERR(link))
-			return PTR_ERR(link);
+		ctx.link = vfs_get_link(lowerdentry, &done);
+		if (IS_ERR(ctx.link))
+			return PTR_ERR(ctx.link);
 	}
 
 	/* Should we copyup with O_TMPFILE or with workdir? */
@@ -509,14 +605,13 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			goto out_done;
 		}
 
-		err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath,
-					 stat, link, &pstat, true);
+		err = ovl_copy_up_locked(&ctx, &ovl_copy_up_tmpfile_ops);
 		ovl_copy_up_end(dentry);
 		goto out_done;
 	}
 
 	err = -EIO;
-	if (lock_rename(workdir, upperdir) != NULL) {
+	if (lock_rename(ctx.tempdir, ctx.upperdir) != NULL) {
 		pr_err("overlayfs: failed to lock workdir+upperdir\n");
 		goto out_unlock;
 	}
@@ -526,10 +621,9 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		goto out_unlock;
 	}
 
-	err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath,
-				 stat, link, &pstat, false);
+	err = ovl_copy_up_locked(&ctx, &ovl_copy_up_workdir_ops);
 out_unlock:
-	unlock_rename(workdir, upperdir);
+	unlock_rename(ctx.tempdir, ctx.upperdir);
 out_done:
 	do_delayed_call(&done);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 19/25] ovl: generalize ovl_copy_up_one() using actors
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (17 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 18/25] ovl: generalize ovl_copy_up_locked() using actors Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 20/25] ovl: use ovl_inode mutex to synchronize concurrent copy up Amir Goldstein
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

ovl_copy_up_one() is implementing two different locking methods
for regular files and non-regular files.

Implement the actors aquire,release for the 'workdir' and 'tmpfile'
copy up methods and use them to generalize locking in ovl_copy_up_one().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 81 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 25 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 39e982acc6c3..452560a17d7e 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -368,15 +368,38 @@ struct ovl_copy_up_ctx {
 };
 
 struct ovl_copy_up_ops {
+	int (*aquire)(struct ovl_copy_up_ctx *);
 	int (*prepare)(struct ovl_copy_up_ctx *);
 	int (*commit)(struct ovl_copy_up_ctx *);
 	void (*cancel)(struct ovl_copy_up_ctx *);
+	void (*release)(struct ovl_copy_up_ctx *);
 };
 
 /*
  * Copy up operations using workdir.
  * Upper file is created in workdir, copied and moved to upperdir.
  */
+static int ovl_copy_up_workdir_aquire(struct ovl_copy_up_ctx *ctx)
+{
+	int err = -EIO;
+
+	if (lock_rename(ctx->tempdir, ctx->upperdir) != NULL) {
+		pr_err("overlayfs: failed to lock workdir+upperdir\n");
+		goto out_unlock;
+	}
+	if (ovl_dentry_upper(ctx->dentry)) {
+		/* Raced with another copy-up? */
+		err = 1;
+		goto out_unlock;
+	}
+
+	return 0;
+
+out_unlock:
+	unlock_rename(ctx->tempdir, ctx->upperdir);
+	return err;
+}
+
 static int ovl_copy_up_workdir_prepare(struct ovl_copy_up_ctx *ctx)
 {
 	struct dentry *upper = NULL;
@@ -441,16 +464,28 @@ static void ovl_copy_up_workdir_cancel(struct ovl_copy_up_ctx *ctx)
 	ovl_cleanup(d_inode(ctx->tempdir), ctx->temp);
 }
 
+static void ovl_copy_up_workdir_release(struct ovl_copy_up_ctx *ctx)
+{
+	unlock_rename(ctx->tempdir, ctx->upperdir);
+}
+
 static const struct ovl_copy_up_ops ovl_copy_up_workdir_ops = {
+	.aquire = ovl_copy_up_workdir_aquire,
 	.prepare = ovl_copy_up_workdir_prepare,
 	.commit = ovl_copy_up_workdir_commit,
 	.cancel = ovl_copy_up_workdir_cancel,
+	.release = ovl_copy_up_workdir_release,
 };
 
 /*
  * Copy up operations using O_TMPFILE.
  * Upper file is created unlinked, copied and linked to upperdir.
  */
+static int ovl_copy_up_tmpfile_aquire(struct ovl_copy_up_ctx *ctx)
+{
+	return ovl_copy_up_start(ctx->dentry);
+}
+
 static int ovl_copy_up_tmpfile_prepare(struct ovl_copy_up_ctx *ctx)
 {
 	struct dentry *upper;
@@ -489,10 +524,17 @@ static int ovl_copy_up_tmpfile_commit(struct ovl_copy_up_ctx *ctx)
 
 static void ovl_copy_up_tmpfile_cancel(struct ovl_copy_up_ctx *ctx) { }
 
+static void ovl_copy_up_tmpfile_release(struct ovl_copy_up_ctx *ctx)
+{
+	ovl_copy_up_end(ctx->dentry);
+}
+
 static const struct ovl_copy_up_ops ovl_copy_up_tmpfile_ops = {
+	.aquire = ovl_copy_up_tmpfile_aquire,
 	.prepare = ovl_copy_up_tmpfile_prepare,
 	.commit = ovl_copy_up_tmpfile_commit,
 	.cancel = ovl_copy_up_tmpfile_cancel,
+	.release = ovl_copy_up_tmpfile_release,
 };
 
 static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
@@ -569,6 +611,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		.link = NULL,
 		.tempdir = ovl_workdir(dentry),
 	};
+	const struct ovl_copy_up_ops *ops;
 
 	if (WARN_ON(!ctx.tempdir))
 		return -EROFS;
@@ -595,35 +638,23 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	}
 
 	/* Should we copyup with O_TMPFILE or with workdir? */
-	if (S_ISREG(stat->mode) && ofs->tmpfile) {
-		err = ovl_copy_up_start(dentry);
-		/* err < 0: interrupted, err > 0: raced with another copy-up */
-		if (unlikely(err)) {
-			pr_debug("ovl_copy_up_start(%pd2) = %i\n", dentry, err);
-			if (err > 0)
-				err = 0;
-			goto out_done;
-		}
-
-		err = ovl_copy_up_locked(&ctx, &ovl_copy_up_tmpfile_ops);
-		ovl_copy_up_end(dentry);
+	if (S_ISREG(stat->mode) && ofs->tmpfile)
+		ops = &ovl_copy_up_tmpfile_ops;
+	else
+		ops = &ovl_copy_up_workdir_ops;
+
+	err = ops->aquire(&ctx);
+	/* err < 0: interrupted, err > 0: raced with another copy-up */
+	if (unlikely(err)) {
+		pr_debug("%s(%pd2): aquire = %i\n", __func__, dentry, err);
+		if (err > 0)
+			err = 0;
 		goto out_done;
 	}
 
-	err = -EIO;
-	if (lock_rename(ctx.tempdir, ctx.upperdir) != NULL) {
-		pr_err("overlayfs: failed to lock workdir+upperdir\n");
-		goto out_unlock;
-	}
-	if (ovl_dentry_upper(dentry)) {
-		/* Raced with another copy-up?  Nothing to do, then... */
-		err = 0;
-		goto out_unlock;
-	}
+	err = ovl_copy_up_locked(&ctx, ops);
+	ops->release(&ctx);
 
-	err = ovl_copy_up_locked(&ctx, &ovl_copy_up_workdir_ops);
-out_unlock:
-	unlock_rename(ctx.tempdir, ctx.upperdir);
 out_done:
 	do_delayed_call(&done);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 20/25] ovl: use ovl_inode mutex to synchronize concurrent copy up
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (18 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 19/25] ovl: generalize ovl_copy_up_one() " Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 21/25] ovl: implement index dir copy up method Amir Goldstein
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Use the new ovl_inode mutex to synchonize concurrent copy up
instead of the super block copy up workqueue.

Moving the synchronization object from the overlay dentry to
the overlay inode is needed for synchonizing concurrent copy up
of lower hardlinks to the same upper inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/ovl_entry.h |  2 --
 fs/overlayfs/super.c     |  1 -
 fs/overlayfs/util.c      | 24 ++++++------------------
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 6e240bc5ac1d..288f52f24755 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -35,7 +35,6 @@ struct ovl_fs {
 	const struct cred *creator_cred;
 	bool tmpfile;
 	bool noxattr;
-	wait_queue_head_t copyup_wq;
 	/* sb common to all layers */
 	struct super_block *same_sb;
 };
@@ -50,7 +49,6 @@ struct ovl_entry {
 		struct {
 			u64 version;
 			const char *redirect;
-			bool copying;
 		};
 		struct rcu_head rcu;
 	};
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 33ff179e32d7..144354b5fcf1 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -842,7 +842,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ufs)
 		goto out;
 
-	init_waitqueue_head(&ufs->copyup_wq);
 	ufs->config.redirect_dir = ovl_redirect_dir_def;
 	ufs->config.index = ovl_index_def;
 	err = ovl_parse_opt((char *) data, &ufs->config);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f35f87823827..a28fd9709648 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -257,7 +257,7 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 	struct ovl_entry *oe = dentry->d_fsdata;
 
 	WARN_ON(!inode_is_locked(upperdentry->d_parent->d_inode) &&
-		!oe->copying);
+		!mutex_is_locked(&OVL_I(d_inode(dentry))->oi_lock));
 	WARN_ON(oe->__upperdentry);
 	/*
 	 * Make sure upperdentry is consistent before making it visible to
@@ -368,32 +368,20 @@ struct file *ovl_path_open(struct path *path, int flags)
 
 int ovl_copy_up_start(struct dentry *dentry)
 {
-	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
-	struct ovl_entry *oe = dentry->d_fsdata;
 	int err;
 
-	spin_lock(&ofs->copyup_wq.lock);
-	err = wait_event_interruptible_locked(ofs->copyup_wq, !oe->copying);
-	if (!err) {
-		if (oe->__upperdentry)
-			err = 1; /* Already copied up */
-		else
-			oe->copying = true;
+	err = mutex_lock_interruptible(&OVL_I(d_inode(dentry))->oi_lock);
+	if (!err && ovl_dentry_upper(dentry)) {
+		err = 1; /* Already copied up */
+		mutex_unlock(&OVL_I(d_inode(dentry))->oi_lock);
 	}
-	spin_unlock(&ofs->copyup_wq.lock);
 
 	return err;
 }
 
 void ovl_copy_up_end(struct dentry *dentry)
 {
-	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	spin_lock(&ofs->copyup_wq.lock);
-	oe->copying = false;
-	wake_up_locked(&ofs->copyup_wq);
-	spin_unlock(&ofs->copyup_wq.lock);
+	mutex_unlock(&OVL_I(d_inode(dentry))->oi_lock);
 }
 
 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 21/25] ovl: implement index dir copy up method
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (19 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 20/25] ovl: use ovl_inode mutex to synchronize concurrent copy up Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 22/25] ovl: link up indexed lower hardlink on lookup Amir Goldstein
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

Implement a copy up method for non-dir objects using index dir to
prevent breaking lower hardlinks on copy up.

This method requires that the inodes index dir feature was enabled and
that all underlying fs support NFS export.

On the first lower hardlink copy up, upper file is created in index dir,
named after the hex representation of the lower origin inode file handle.
On the second lower hardlink copy up, upper file is found in index dir,
by the same lower handle key.
On either case, the upper indexed inode is then linked to the copy up
upper path.

The index entry remains linked for future lower hardlink copy up and for
lower to upper inode map, that is needed for exporting overlayfs to NFS.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 Documentation/filesystems/overlayfs.txt |   6 +-
 fs/overlayfs/copy_up.c                  | 211 ++++++++++++++++++++++++++++++--
 fs/overlayfs/overlayfs.h                |   1 +
 fs/overlayfs/util.c                     |   9 ++
 4 files changed, 216 insertions(+), 11 deletions(-)

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index c9e884b52698..8599fc9addbb 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -210,9 +210,9 @@ filesystem, so both st_dev and st_ino of the file may change.
 
 Any open files referring to this inode will access the old data.
 
-If a file with multiple hard links is copied up, then this will
-"break" the link.  Changes will not be propagated to other names
-referring to the same inode.
+Unless "inodes index" feature is enabled, if a file with multiple hard
+links is copied up, then this will "break" the link and changes will not
+be propagated to other names referring to the same inode.
 
 Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
 directory will fail with EXDEV.
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 452560a17d7e..acff8b9e6d1b 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -345,7 +345,8 @@ static int ovl_copy_up_inode(struct dentry *dentry, struct dentry *temp,
 
 	/*
 	 * Store identifier of lower inode in upper inode xattr to
-	 * allow lookup of the copy up origin inode.
+	 * allow lookup of the copy up origin inode. We do this last
+	 * to bless the inode, in case it was created in the index dir.
 	 */
 	err = ovl_set_origin(dentry, lowerpath->dentry, temp);
 
@@ -365,6 +366,7 @@ struct ovl_copy_up_ctx {
 	struct dentry *tempdir;
 	struct dentry *upper;
 	struct dentry *temp;
+	bool created;
 };
 
 struct ovl_copy_up_ops {
@@ -537,6 +539,191 @@ static const struct ovl_copy_up_ops ovl_copy_up_tmpfile_ops = {
 	.release = ovl_copy_up_tmpfile_release,
 };
 
+/*
+ * Copy up operations using index dir.
+ * Upper file is created in index dir, copied and linked to upperdir.
+ * The index entry remains to be used for mapping lower inode to upper inode.
+ */
+static int ovl_copy_up_indexdir_aquire(struct ovl_copy_up_ctx *ctx)
+{
+	return ovl_copy_up_start(ctx->dentry);
+}
+
+/*
+ * Set ctx->temp to a positive dentry with the index inode.
+ *
+ * Return 0 if entry was created by us and we need to copy the inode data.
+ *
+ * Return 1 if we found an index inode, in which case, we do not need
+ * to copy the inode data.
+ *
+ * May return -EEXISTS/-ENOENT if corrupt index entries are found.
+ */
+static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
+{
+	struct dentry *upper;
+	struct dentry *index = NULL;
+	struct inode *inode;
+	int err;
+	struct cattr cattr = {
+		/* Can't properly set mode on creation because of the umask */
+		.mode = ctx->stat->mode & S_IFMT,
+		.rdev = ctx->stat->rdev,
+		.link = ctx->link,
+	};
+
+	upper = lookup_one_len_unlocked(ctx->dentry->d_name.name, ctx->upperdir,
+					ctx->dentry->d_name.len);
+	if (IS_ERR(upper))
+		return PTR_ERR(upper);
+
+	err = ovl_lookup_index(ctx->dentry, NULL, ctx->lowerpath->dentry,
+			       &index);
+	if (err)
+		goto out_dput;
+
+	inode = d_inode(index);
+	if (inode) {
+		/* Another lower hardlink already copied-up? */
+		err = -EEXIST;
+		if ((inode->i_mode & S_IFMT) != cattr.mode)
+			goto out_dput;
+
+		err = -ENOENT;
+		if (!inode->i_nlink)
+			goto out_dput;
+
+		/*
+		 * Verify that found index is a copy up of lower inode.
+		 * If index inode doesn't point back to lower inode via
+		 * origin file handle, then this is either a leftover from
+		 * failed copy up or an index dir entry before copying layers.
+		 * In both cases, we cannot use this index and must fail the
+		 * copy up. The failed copy up case will return -EEXISTS and
+		 * the copying layers case will return -ESTALE.
+		 */
+		err = ovl_verify_origin(index, ctx->lowerpath->mnt,
+					ctx->lowerpath->dentry, false, false);
+		if (err) {
+			if (err == -ENODATA)
+				err = -EEXIST;
+			goto out_dput;
+		}
+
+		if (inode->i_nlink < 2) {
+			/*
+			 * An orphan index inode can be created by copying up
+			 * a lower hardlink alias and then unlinking it. From
+			 * overlayfs perspective, this inode may still live if
+			 * there are more lower hardlinks and it should contain
+			 * the data of the upper inode that was unlinked. So if
+			 * an orphan inode is found in the index dir and we
+			 * should reuse it on copy up of another lower alias.
+			 *
+			 * TODO: keep account of nlink incremented by copy up
+			 * and account of nlink decremented by lower cover up.
+			 * When copyup_nlink + coverup_nlink == origin_nlink
+			 * and index_nlink == 1, need to remove the index entry
+			 * because all overlay references to the index are gone.
+			 */
+			pr_warn_ratelimited("overlayfs: linking to orphan upper (%pd2, ino=%lu)\n",
+					    index, inode->i_ino);
+		}
+
+		/* Link to existing upper without copying lower */
+		err = 1;
+		goto out;
+	}
+
+	inode_lock_nested(d_inode(ctx->tempdir), I_MUTEX_PARENT);
+	err = ovl_create_real(d_inode(ctx->tempdir), index, &cattr, NULL, true);
+	inode_unlock(d_inode(ctx->tempdir));
+	if (err)
+		goto out_dput;
+
+	ctx->created = true;
+out:
+	ctx->upper = upper;
+	ctx->temp = index;
+	return err;
+
+out_dput:
+	pr_warn_ratelimited("overlayfs: failed to create index entry (%pd2, ino=%lu, err=%i)\n",
+			    index ? : ctx->lowerpath->dentry,
+			    inode ? inode->i_ino : 0, err);
+	pr_warn_ratelimited("overlayfs: try clearing index dir or mounting with '-o index=off' to disable inodes index.\n");
+	dput(upper);
+	dput(index);
+	return err;
+}
+
+static int ovl_copy_up_indexdir_commit(struct ovl_copy_up_ctx *ctx)
+{
+	int err;
+
+	inode_lock_nested(d_inode(ctx->upperdir), I_MUTEX_PARENT);
+	/* link the sucker ;) */
+	err = ovl_do_link(ctx->temp, d_inode(ctx->upperdir), ctx->upper, true);
+	/* Restore timestamps on parent (best effort) */
+	if (!err)
+		ovl_set_timestamps(ctx->upperdir, &ctx->pstat);
+	inode_unlock(d_inode(ctx->upperdir));
+
+	if (err)
+		goto out;
+
+	/*
+	 * Overlay inode nlink doesn't account for lower hardlinks that haven't
+	 * been copied up, so we need to update it on copy up. Otherwise, user
+	 * could decrement nlink below zero by unlinking copied up uppers.
+	 * On the first copy up, we set overlay inode nlink to upper inode nlink
+	 * and on following copy ups we increment it. In between, ovl_link()
+	 * could add more upper hardlinks and increment overlay nlink as well.
+	 */
+	if (ctx->created)
+		set_nlink(d_inode(ctx->dentry), d_inode(ctx->temp)->i_nlink);
+	else
+		inc_nlink(d_inode(ctx->dentry));
+
+	/* We can mark dentry is indexed before updating upperdentry */
+	ovl_dentry_set_indexed(ctx->dentry);
+
+out:
+	return err;
+}
+
+static void ovl_copy_up_indexdir_cancel(struct ovl_copy_up_ctx *ctx)
+{
+	struct inode *inode = d_inode(ctx->temp);
+
+	if (WARN_ON(!inode))
+		return;
+
+	/* Cleanup prepared index entry only if we created it */
+	if (!ctx->created)
+		return;
+
+	pr_warn_ratelimited("overlayfs: cleanup bad index (%pd2, ino=%lu, nlink=%u)\n",
+			    ctx->temp, inode->i_ino, inode->i_nlink);
+
+	inode_lock_nested(d_inode(ctx->tempdir), I_MUTEX_PARENT);
+	ovl_cleanup(d_inode(ctx->tempdir), ctx->temp);
+	inode_unlock(d_inode(ctx->tempdir));
+}
+
+static void ovl_copy_up_indexdir_release(struct ovl_copy_up_ctx *ctx)
+{
+	ovl_copy_up_end(ctx->dentry);
+}
+
+static const struct ovl_copy_up_ops ovl_copy_up_indexdir_ops = {
+	.aquire = ovl_copy_up_indexdir_aquire,
+	.prepare = ovl_copy_up_indexdir_prepare,
+	.commit = ovl_copy_up_indexdir_commit,
+	.cancel = ovl_copy_up_indexdir_cancel,
+	.release = ovl_copy_up_indexdir_release,
+};
+
 static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
 			      const struct ovl_copy_up_ops *ops)
 {
@@ -559,12 +746,16 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *ctx,
 		revert_creds(old_creds);
 		put_cred(new_creds);
 	}
-	if (err)
+	if (err < 0)
 		goto out;
 
-	err = ovl_copy_up_inode(dentry, ctx->temp, ctx->lowerpath, ctx->stat);
-	if (err)
-		goto out_cancel;
+	/* err == 1 means we found an existing hardlinked upper inode */
+	if (!err) {
+		err = ovl_copy_up_inode(dentry, ctx->temp, ctx->lowerpath,
+					ctx->stat);
+		if (err)
+			goto out_cancel;
+	}
 
 	err = ops->commit(ctx);
 	if (err)
@@ -604,12 +795,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	struct path parentpath;
 	struct dentry *lowerdentry = lowerpath->dentry;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	bool indexed = ovl_indexdir(dentry->d_sb) && !S_ISDIR(stat->mode);
 	struct ovl_copy_up_ctx ctx = {
 		.dentry = dentry,
 		.lowerpath = lowerpath,
 		.stat = stat,
 		.link = NULL,
-		.tempdir = ovl_workdir(dentry),
+		.tempdir = indexed ? ovl_indexdir(dentry->d_sb) :
+				     ovl_workdir(dentry),
 	};
 	const struct ovl_copy_up_ops *ops;
 
@@ -637,8 +830,10 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			return PTR_ERR(ctx.link);
 	}
 
-	/* Should we copyup with O_TMPFILE or with workdir? */
-	if (S_ISREG(stat->mode) && ofs->tmpfile)
+	/* Should we copyup with O_TMPFILE, with indexdir or with workdir? */
+	if (indexed)
+		ops = &ovl_copy_up_indexdir_ops;
+	else if (S_ISREG(stat->mode) && ofs->tmpfile)
 		ops = &ovl_copy_up_tmpfile_ops;
 	else
 		ops = &ovl_copy_up_workdir_ops;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 79cee43eb84d..250b3d42da86 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -206,6 +206,7 @@ bool ovl_dentry_is_opaque(struct dentry *dentry);
 bool ovl_dentry_is_impure(struct dentry *dentry);
 bool ovl_dentry_is_whiteout(struct dentry *dentry);
 void ovl_dentry_set_opaque(struct dentry *dentry);
+void ovl_dentry_set_indexed(struct dentry *dentry);
 bool ovl_redirect_dir(struct super_block *sb);
 const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index a28fd9709648..1d6284083e16 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -230,6 +230,15 @@ static void ovl_dentry_set_impure(struct dentry *dentry)
 	spin_unlock(&dentry->d_lock);
 }
 
+void ovl_dentry_set_indexed(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	spin_lock(&dentry->d_lock);
+	oe->__type |= __OVL_PATH_INDEX;
+	spin_unlock(&dentry->d_lock);
+}
+
 bool ovl_redirect_dir(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 22/25] ovl: link up indexed lower hardlink on lookup
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (20 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 21/25] ovl: implement index dir copy up method Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 23/25] ovl: fix nlink leak in ovl_rename() Amir Goldstein
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

With inodes index feature, all lower and upper hardlinks point to
the same overlay inode. However, when a lower hardlink is accessed
for read operation, the real inode operated on is not the same inode
as the real inode for read operation on an upper hardlink.

When lookup finds a lower hardlink, which is already indexed by
an earlier upper hardlink copy up, call ovl_copy_up() to link the
indexed upper on top of the lower hardlink and then operate on the
upper real inode to avoid this inconsistency.

Invalidate a lower indexed dentry on dcache lookup, so ovl_lookup()
is called to perform the index link up.

The following test demonstrates the upper/lower hardlinks inconsistency:

$ echo -n a > /lower/foo
$ ln /lower/foo /lower/bar
$ cd /mnt
$ echo -n b >> foo
$ tail foo bar # foo is indexed upper, bar is indexed lower
==> foo <==
ab
==> bar <==
a

$ echo -n c >> bar
$ tail foo bar # both aliases are indexed upper
==> foo <==
abc
==> bar <==
abc

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c | 20 ++++++++++++++++++--
 fs/overlayfs/super.c | 39 +++++++++++++++++++++++++++++++++------
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 08154dad725e..c6b986b68105 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -659,7 +659,14 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			goto out_put;
 
 		if (!upperdentry) {
-			/* TODO: handle lookup of lower indexed entries */
+			/*
+			 * At this point, if we find a positive index, we cannot
+			 * tell if the index entry has been created by a copy up
+			 * in progress, because we don't have the overlay inode
+			 * and we don't hold the overlay inode oi_lock. So we
+			 * will treat this entry as non-indexed lower and will
+			 * try to link it up before returning from lookup.
+			 */
 		} else if (index && d_inode(index)) {
 			/* Vertified indexed upper */
 			type |= __OVL_PATH_INDEX;
@@ -722,13 +729,22 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	oe->redirect = upperredirect;
 	oe->__upperdentry = upperdentry;
 	memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr);
-	dput(index);
 	kfree(stack);
 	kfree(d.redirect);
 	dentry->d_fsdata = oe;
 	ovl_update_type(dentry, d.is_dir);
 	d_add(dentry, inode);
 
+	/* Link up indexed lower early for consistent overlay hardlinks */
+	if (index && d_inode(index) && !upperdentry) {
+		err = ovl_copy_up(dentry);
+		if (err) {
+			pr_warn_ratelimited("overlayfs: failed link up to index (%pd2, index=%pd2, err=%i)\n",
+					    dentry, index, err);
+		}
+	}
+	dput(index);
+
 	return NULL;
 
 out_free_oe:
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 144354b5fcf1..d6604bbe0a66 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -117,11 +117,43 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 	return dentry;
 }
 
+static int ovl_dentry_indexed_revalidate(struct dentry *dentry,
+					 unsigned int flags)
+{
+	enum ovl_path_type type = ovl_path_type(dentry);
+	bool is_upper;
+
+	if (!ovl_indexdir(dentry->d_sb) ||
+	    d_is_dir(dentry) || d_is_negative(dentry))
+		return 1;
+
+	/*
+	 * Invalidate lower hardlink after it has been indexed by copy up
+	 * of another lower alias. ovl_lookup will trigger copy up of this
+	 * path and link the upper path to the upper index inode.
+	 */
+	ovl_inode_real(d_inode(dentry), &is_upper);
+	if (is_upper && !OVL_TYPE_UPPER(type))
+		return 0;
+
+	return 1;
+}
+
+static const struct dentry_operations ovl_dentry_operations = {
+	.d_release = ovl_dentry_release,
+	.d_real = ovl_d_real,
+	.d_revalidate = ovl_dentry_indexed_revalidate,
+};
+
 static int ovl_dentry_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 	unsigned int i;
-	int ret = 1;
+	int ret;
+
+	ret = ovl_dentry_indexed_revalidate(dentry, flags);
+	if (ret < 1)
+		return ret;
 
 	for (i = 0; i < oe->numlower; i++) {
 		struct dentry *d = oe->lowerstack[i].dentry;
@@ -158,11 +190,6 @@ static int ovl_dentry_weak_revalidate(struct dentry *dentry, unsigned int flags)
 	return ret;
 }
 
-static const struct dentry_operations ovl_dentry_operations = {
-	.d_release = ovl_dentry_release,
-	.d_real = ovl_d_real,
-};
-
 static const struct dentry_operations ovl_reval_dentry_operations = {
 	.d_release = ovl_dentry_release,
 	.d_real = ovl_d_real,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 23/25] ovl: fix nlink leak in ovl_rename()
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (21 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 22/25] ovl: link up indexed lower hardlink on lookup Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 24/25] ovl: persistent overlay inode nlink for indexed inodes Amir Goldstein
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

This patch fixes an overlay inode nlink leak in the case where
ovl_rename() renames over a non-dir.

This is not so critical, because overlay inode doesn't rely on
nlink dropping to zero for inode deletion.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 7bcb5a19e514..45120e9d8bbd 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -892,6 +892,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	bool old_opaque;
 	bool new_opaque;
 	bool cleanup_whiteout = false;
+	bool new_drop_nlink = false;
 	bool overwrite = !(flags & RENAME_EXCHANGE);
 	bool is_dir = d_is_dir(old);
 	bool new_is_dir = d_is_dir(new);
@@ -927,6 +928,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		err = ovl_copy_up(new);
 		if (err)
 			goto out_drop_write;
+	} else if (!new_is_dir && new->d_inode) {
+		new_drop_nlink = true;
 	}
 
 	old_cred = ovl_override_creds(old->d_sb);
@@ -1047,6 +1050,9 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	if (cleanup_whiteout)
 		ovl_cleanup(old_upperdir->d_inode, newdentry);
 
+	if (new_drop_nlink)
+		drop_nlink(new->d_inode);
+
 	ovl_dentry_version_inc(old->d_parent);
 	ovl_dentry_version_inc(new->d_parent);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 24/25] ovl: persistent overlay inode nlink for indexed inodes
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (22 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 23/25] ovl: fix nlink leak in ovl_rename() Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-23 11:34   ` Amir Goldstein
  2017-06-21 12:28 ` [PATCH v4 25/25] ovl: cleanup orphan index entries Amir Goldstein
                   ` (2 subsequent siblings)
  26 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

With inodes index enabled, an overlay inode nlink counts the union of upper
and non-covered lower hardlinks. During the lifetime of a non-pure upper
inode, the following nlink modifying operations can happen:

1. Lower hardlink copy up
2. Upper hardlink created, unlinked or renamed over
3. Lower hardlink whiteout or renamed over

For the first, copy up case, the union nlink does not change, whether the
operation succeeds or fails, but the upper inode nlink may change.
Therefore, before copy up, we store the union nlink value relative to the
lower inode nlink in the index inode xattr trusted.overlay.nlink.

For the second, upper hardlink case, the union nlink should be incremented
or decremented IFF the operation succeeds, aligned with nlink change of the
upper inode. Therefore, before link/unlink/rename, we store the union nlink
value relative to the upper inode nlink in the index inode.

For the last, lower cover up case, we simplify things by preceding the
whiteout or cover up with copy up. This makes sure that there is an index
upper inode where the nlink xattr can be stored before the copied up upper
entry is unlink.

Return the overlay inode nlinks for indexed upper inodes on stat(2).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   |  88 +++++++++++++++++++++++++++-------------
 fs/overlayfs/dir.c       |  80 +++++++++++++++++++++++++++++++++++-
 fs/overlayfs/inode.c     | 103 ++++++++++++++++++++++++++++++++++++++++++++++-
 fs/overlayfs/namei.c     |   2 +-
 fs/overlayfs/overlayfs.h |   6 ++-
 5 files changed, 245 insertions(+), 34 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index acff8b9e6d1b..bbc16880b32d 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -610,24 +610,21 @@ static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 			goto out_dput;
 		}
 
-		if (inode->i_nlink < 2) {
+		err = -ENOENT;
+		if (ctx->dentry->d_inode->i_nlink == 0) {
 			/*
 			 * An orphan index inode can be created by copying up
-			 * a lower hardlink alias and then unlinking it. From
-			 * overlayfs perspective, this inode may still live if
-			 * there are more lower hardlinks and it should contain
-			 * the data of the upper inode that was unlinked. So if
-			 * an orphan inode is found in the index dir and we
-			 * should reuse it on copy up of another lower alias.
-			 *
-			 * TODO: keep account of nlink incremented by copy up
-			 * and account of nlink decremented by lower cover up.
-			 * When copyup_nlink + coverup_nlink == origin_nlink
-			 * and index_nlink == 1, need to remove the index entry
-			 * because all overlay references to the index are gone.
+			 * all lower hardlinks and then unlinking all upper
+			 * hardlinks. The overlay inode may still be alive if
+			 * it is referenced from an open file descriptor, but
+			 * there should be no more copy ups that link to the
+			 * index inode.
+			 * Orphan index inodes should be cleaned up on mount
+			 * and when overlay inode nlink drops to zero.
 			 */
-			pr_warn_ratelimited("overlayfs: linking to orphan upper (%pd2, ino=%lu)\n",
-					    index, inode->i_ino);
+			pr_warn_ratelimited("overlayfs: not linking to orphan index (%pd2, nlink=%u)\n",
+					    index, inode->i_nlink);
+			goto out_dput;
 		}
 
 		/* Link to existing upper without copying lower */
@@ -643,6 +640,15 @@ static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 
 	ctx->created = true;
 out:
+	/*
+	 * The overlay inode nlink does not change on copy up whether the
+	 * operation succeeds or fails, but the upper inode nlink may change.
+	 * Therefore, before copy up, we store the union nlink value relative
+	 * to the lower inode nlink in an index inode xattr. We will store it
+	 * again relative to index inode nlink at copy up commit or cancel.
+	 */
+	ovl_set_nlink(d_inode(ctx->dentry), index, false);
+
 	ctx->upper = upper;
 	ctx->temp = index;
 	return err;
@@ -657,10 +663,45 @@ static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 	return err;
 }
 
+static int ovl_fsync_index(struct dentry *dentry, struct dentry *index)
+{
+	struct file *file;
+	struct path upperpath;
+	int err;
+
+	ovl_path_upper(dentry, &upperpath);
+	BUG_ON(upperpath.dentry != NULL);
+	upperpath.dentry = index;
+
+	file = ovl_path_open(&upperpath, O_LARGEFILE | O_WRONLY);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	err = vfs_fsync(file, 0);
+
+	fput(file);
+	return err;
+}
+
 static int ovl_copy_up_indexdir_commit(struct ovl_copy_up_ctx *ctx)
 {
 	int err;
 
+	/*
+	 * fsync index before "link-up" to guaranty that nlink xattr is stored
+	 * on-disk. Non empty regular files are fsynced on ovl_copy_up_data().
+	 * This does not cover "link-up" of non-regular files.
+	 *
+	 * XXX: Is this really needed? I think that on a journalled file system
+	 * inode xattr change cannot be re-ordered with the same inode's nlink
+	 * change, because they are both metadata changes of that inode.
+	 */
+	if ((!ctx->created || !ctx->stat->size) && S_ISREG(ctx->stat->mode)) {
+		err = ovl_fsync_index(ctx->dentry, ctx->temp);
+		if (err)
+			goto out;
+	}
+
 	inode_lock_nested(d_inode(ctx->upperdir), I_MUTEX_PARENT);
 	/* link the sucker ;) */
 	err = ovl_do_link(ctx->temp, d_inode(ctx->upperdir), ctx->upper, true);
@@ -672,18 +713,8 @@ static int ovl_copy_up_indexdir_commit(struct ovl_copy_up_ctx *ctx)
 	if (err)
 		goto out;
 
-	/*
-	 * Overlay inode nlink doesn't account for lower hardlinks that haven't
-	 * been copied up, so we need to update it on copy up. Otherwise, user
-	 * could decrement nlink below zero by unlinking copied up uppers.
-	 * On the first copy up, we set overlay inode nlink to upper inode nlink
-	 * and on following copy ups we increment it. In between, ovl_link()
-	 * could add more upper hardlinks and increment overlay nlink as well.
-	 */
-	if (ctx->created)
-		set_nlink(d_inode(ctx->dentry), d_inode(ctx->temp)->i_nlink);
-	else
-		inc_nlink(d_inode(ctx->dentry));
+	/* Store the union nlink value relative to index inode nlink */
+	ovl_set_nlink(d_inode(ctx->dentry), ctx->temp, true);
 
 	/* We can mark dentry is indexed before updating upperdentry */
 	ovl_dentry_set_indexed(ctx->dentry);
@@ -699,6 +730,9 @@ static void ovl_copy_up_indexdir_cancel(struct ovl_copy_up_ctx *ctx)
 	if (WARN_ON(!inode))
 		return;
 
+	/* Store the union nlink value relative to index inode nlink */
+	ovl_set_nlink(d_inode(ctx->dentry), ctx->temp, true);
+
 	/* Cleanup prepared index entry only if we created it */
 	if (!ctx->created)
 		return;
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 45120e9d8bbd..c9ba057ccfa3 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -18,6 +18,7 @@
 #include <linux/atomic.h>
 #include <linux/ratelimit.h>
 #include "overlayfs.h"
+#include "ovl_entry.h"
 
 static unsigned short ovl_redirect_max = 256;
 module_param_named(redirect_max, ovl_redirect_max, ushort, 0644);
@@ -585,6 +586,64 @@ static int ovl_symlink(struct inode *dir, struct dentry *dentry,
 	return ovl_create_object(dentry, S_IFLNK, 0, link);
 }
 
+/*
+ * Operations that change overlay inode and upper inode nlink need to be
+ * synchronized with copy up for persistent nlink accounting.
+ */
+static int ovl_nlink_start(struct dentry *dentry)
+{
+	enum ovl_path_type type = ovl_path_type(dentry);
+	int err;
+	const struct cred *old_cred;
+
+	/*
+	 * With inodes index is enabled, we store the union overlay nlink
+	 * in an xattr on the index inode. When whiting out lower hardlinks
+	 * we need to decrement the overlay persistent nlink, but before the
+	 * first copy up, we have no upper index inode to store the xattr.
+	 *
+	 * As a workaround, before whiteout/rename over of a lower hardlink,
+	 * copy up to create the upper index. Creating the upper index will
+	 * initialize the overlay nlink, so it could be dropped if unlink
+	 * or rename succeeds.
+	 *
+	 * TODO: implement metadata only index copy up when called with
+	 *       ovl_copy_up_flags(dentry, O_PATH).
+	 */
+	if (ovl_indexdir(dentry->d_sb) && !OVL_TYPE_UPPER(type) &&
+	    ovl_dentry_lower(dentry)->d_inode->i_nlink > 1) {
+		err = ovl_copy_up(dentry);
+		if (err)
+			return err;
+
+		type = ovl_path_type(dentry);
+	}
+
+	err = mutex_lock_interruptible(&OVL_I(d_inode(dentry))->oi_lock);
+	if (err)
+		return err;
+
+	if (!OVL_TYPE_INDEX(type) || !OVL_TYPE_UPPER(type))
+		return 0;
+
+	old_cred = ovl_override_creds(dentry->d_sb);
+	/*
+	 * The overlay inode nlink should be incremented/decremented IFF the
+	 * upper operation succeeds, along with nlink change of upper inode.
+	 * Therefore, before link/unlink/rename, we store the union nlink
+	 * value relative to the upper inode nlink in an upper inode xattr.
+	 */
+	ovl_set_nlink(d_inode(dentry), ovl_dentry_upper(dentry), true);
+	revert_creds(old_cred);
+
+	return 0;
+}
+
+static void ovl_nlink_end(struct dentry *dentry)
+{
+	mutex_unlock(&OVL_I(d_inode(dentry))->oi_lock);
+}
+
 static int ovl_link(struct dentry *old, struct inode *newdir,
 		    struct dentry *new)
 {
@@ -599,6 +658,10 @@ static int ovl_link(struct dentry *old, struct inode *newdir,
 	if (err)
 		goto out_drop_write;
 
+	err = ovl_nlink_start(old);
+	if (err)
+		goto out_drop_write;
+
 	inode = d_inode(old);
 	ihold(inode);
 
@@ -606,6 +669,7 @@ static int ovl_link(struct dentry *old, struct inode *newdir,
 	if (err)
 		iput(inode);
 
+	ovl_nlink_end(old);
 out_drop_write:
 	ovl_drop_write(old);
 out:
@@ -736,7 +800,6 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir)
 
 static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 {
-	enum ovl_path_type type;
 	int err;
 	const struct cred *old_cred;
 
@@ -748,7 +811,9 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 	if (err)
 		goto out_drop_write;
 
-	type = ovl_path_type(dentry);
+	err = ovl_nlink_start(dentry);
+	if (err)
+		goto out_drop_write;
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	if (!ovl_lower_positive(dentry))
@@ -762,6 +827,8 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 		else
 			drop_nlink(dentry->d_inode);
 	}
+
+	ovl_nlink_end(dentry);
 out_drop_write:
 	ovl_drop_write(dentry);
 out:
@@ -929,6 +996,10 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		if (err)
 			goto out_drop_write;
 	} else if (!new_is_dir && new->d_inode) {
+		err = ovl_nlink_start(new);
+		if (err)
+			goto out_drop_write;
+
 		new_drop_nlink = true;
 	}
 
@@ -1064,6 +1135,11 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	unlock_rename(new_upperdir, old_upperdir);
 out_revert_creds:
 	revert_creds(old_cred);
+	/*
+	 * Release oi_lock after rename lock.
+	 */
+	if (new_drop_nlink)
+		ovl_nlink_end(new);
 out_drop_write:
 	ovl_drop_write(old);
 out:
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 28f9a8cc0f61..b83d7e387a02 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -12,6 +12,7 @@
 #include <linux/cred.h>
 #include <linux/xattr.h>
 #include <linux/posix_acl.h>
+#include <linux/ratelimit.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -131,6 +132,15 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	if (is_dir && OVL_TYPE_MERGE(type))
 		stat->nlink = 1;
 
+	/*
+	 * Return the overlay inode nlinks for indexed upper inodes.
+	 * Overlay inode nlink counts the union of the upper hardlinks
+	 * and non-covered lower hardlinks. It does not include the upper
+	 * index hardlink.
+	 */
+	if (!is_dir && OVL_TYPE_UPPER(type) && OVL_TYPE_INDEX(type))
+		stat->nlink = dentry->d_inode->i_nlink;
+
 out:
 	revert_creds(old_cred);
 
@@ -445,6 +455,93 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev)
 	}
 }
 
+/*
+ * With inodes index enabled, an overlay inode nlink counts the union of upper
+ * hardlinks and non-covered lower hardlinks. During the lifetime of a non-pure
+ * upper inode, the following nlink modifying operations can happen:
+ *
+ * 1. Lower hardlink copy up
+ * 2. Upper hardlink created, unlinked or renamed over
+ * 3. Lower hardlink whiteout or renamed over
+ *
+ * For the first, copy up case, the union nlink does not change, whether the
+ * operation succeeds or fails, but the upper inode nlink may change.
+ * Therefore, before copy up, we store the union nlink value relative to the
+ * lower inode nlink in the index inode xattr trusted.overlay.nlink.
+ *
+ * For the second, upper hardlink case, the union nlink should be incremented
+ * or decremented IFF the operation succeeds, aligned with nlink change of the
+ * upper inode. Therefore, before link/unlink/rename, we store the union nlink
+ * value relative to the upper inode nlink in the index inode.
+ *
+ * For the last, lower cover up case, we simplify things by preceding the
+ * whiteout or cover up with copy up. This makes sure that there is an index
+ * upper inode where the nlink xattr can be stored before the copied up upper
+ * entry is unlink.
+ */
+#define OVL_NLINK_ADD_UPPER	(1 << 0)
+
+/* On-disk format for indexed nlink */
+struct ovl_nlink {
+	__be32 nlink_add;
+	u8 flags;
+} __packed;
+
+/* Called must hold OVL_I(inode)->oi_lock */
+int ovl_set_nlink(struct inode *inode, struct dentry *index, bool add_upper)
+{
+	struct ovl_inode_info *oi = OVL_I_INFO(inode);
+	struct ovl_nlink onlink;
+	unsigned int nlink_base;
+
+	if (add_upper) {
+		/* oi->__upperinode may be NULL after failed copy up */
+		nlink_base = index->d_inode->i_nlink;
+		onlink.flags = OVL_NLINK_ADD_UPPER;
+	} else {
+		nlink_base = oi->lowerinode->i_nlink;
+		onlink.flags = 0;
+	}
+	onlink.nlink_add = cpu_to_be32(inode->i_nlink - nlink_base);
+
+	return ovl_do_setxattr(index, OVL_XATTR_NLINK,
+			       &onlink, sizeof(onlink), 0);
+}
+
+static unsigned int ovl_get_nlink(struct ovl_inode_info *info,
+				  struct dentry *index, unsigned int real_nlink)
+{
+	struct ovl_nlink onlink;
+	__s32 nlink_add = 0;
+	int res;
+
+	if (!index || !index->d_inode)
+		return real_nlink;
+
+	res = vfs_getxattr(index, OVL_XATTR_NLINK, &onlink, sizeof(onlink));
+	if (res < sizeof(onlink))
+		goto fail;
+
+	if (onlink.flags & OVL_NLINK_ADD_UPPER) {
+		/* info->__upperinode may be NULL for indexed lower */
+		nlink_add = index->d_inode->i_nlink;
+	} else {
+		nlink_add = info->lowerinode->i_nlink;
+	}
+
+	nlink_add += (__s32)be32_to_cpu(onlink.nlink_add);
+	if (nlink_add < 0)
+		goto fail;
+
+	return nlink_add;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed to get index nlink (%pd2, ino=%lu, nlink=%u, nlink_add=%d, res=%i)\n",
+			    index, index->d_inode->i_ino,
+			    index->d_inode->i_nlink, nlink_add, res);
+	return real_nlink;
+}
+
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev)
 {
 	struct inode *inode;
@@ -487,7 +584,8 @@ static int ovl_inode_set(struct inode *inode, void *data)
 	return 0;
 }
 
-struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *info)
+struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *info,
+			    struct dentry *index)
 {
 	struct inode *realinode = info->__upperinode;
 	unsigned long hashval = (unsigned long) realinode;
@@ -513,7 +611,8 @@ struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *info)
 	inode = iget5_locked(sb, hashval, ovl_inode_test, ovl_inode_set, info);
 	if (inode && inode->i_state & I_NEW) {
 		ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
-		set_nlink(inode, realinode->i_nlink);
+		set_nlink(inode, ovl_get_nlink(OVL_I_INFO(inode), index,
+					       realinode->i_nlink));
 		ovl_copyattr(realinode, inode);
 		unlock_new_inode(inode);
 	}
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index c6b986b68105..5ff5f82f6503 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -713,7 +713,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 
 		err = -ENOMEM;
 		if ((upperdentry || index) && !d.is_dir) {
-			inode = ovl_get_inode(dentry->d_sb, &info);
+			inode = ovl_get_inode(dentry->d_sb, &info, index);
 		} else {
 			inode = ovl_new_inode(dentry->d_sb, realinode->i_mode,
 					      realinode->i_rdev);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 250b3d42da86..82c1d51b7b63 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -31,6 +31,7 @@ enum ovl_path_type {
 #define OVL_XATTR_REDIRECT OVL_XATTR_PREFIX "redirect"
 #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin"
 #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
+#define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
 
 /*
  * The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
@@ -270,10 +271,11 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
+int ovl_set_nlink(struct inode *inode, struct dentry *index, bool add_upper);
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
 struct ovl_inode_info;
-struct inode *ovl_get_inode(struct super_block *sb,
-			    struct ovl_inode_info *info);
+struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *info,
+			    struct dentry *index);
 
 static inline void ovl_copyattr(struct inode *from, struct inode *to)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 25/25] ovl: cleanup orphan index entries
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (23 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 24/25] ovl: persistent overlay inode nlink for indexed inodes Amir Goldstein
@ 2017-06-21 12:28 ` Amir Goldstein
  2017-06-21 16:45   ` Amir Goldstein
  2017-06-21 17:02 ` [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
  2017-06-22 10:18 ` [PATCH v4 26/25] ovl: document copying layers restrictions with " Amir Goldstein
  26 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 12:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

index entry should live only as long as there are upper or lower
hardlinks.

Cleanup orphan index entries on mount and when dropping the last
overlay inode nlink.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c       | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c     |  4 +--
 fs/overlayfs/namei.c     |  6 ++++
 fs/overlayfs/overlayfs.h |  4 ++-
 fs/overlayfs/super.c     |  2 +-
 5 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index c9ba057ccfa3..58c9212458fc 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -84,6 +84,64 @@ static struct dentry *ovl_whiteout(struct dentry *workdir,
 	return whiteout;
 }
 
+/* Called must hold OVL_I(inode)->oi_lock */
+static int ovl_cleanup_index(struct dentry *dentry)
+{
+	struct inode *dir = ovl_indexdir(dentry->d_sb)->d_inode;
+	struct dentry *lower;
+	struct dentry *index = NULL;
+	struct inode *inode;
+	int err;
+
+	/*
+	 * dentry may already be unhashed, but it still holds a reference to
+	 * the lower/upper dentries from before an unlink operation.
+	 */
+	lower = ovl_dentry_lower(dentry);
+	err = -ESTALE;
+	if (WARN_ON(!lower))
+		goto fail;
+
+	err = ovl_lookup_index(dentry, ovl_dentry_upper(dentry), lower, &index);
+	if (err)
+		goto fail;
+
+	err = -ENOENT;
+	if (WARN_ON(!index || !index->d_inode))
+		goto fail;
+
+	inode = d_inode(index);
+	err = -EEXIST;
+	if (inode->i_nlink != 1) {
+		pr_warn_ratelimited("overlayfs: cleanup linked index (%pd2, ino=%lu, nlink=%u)\n",
+				    index, inode->i_ino, inode->i_nlink);
+		/*
+		 * We either have a bug with persistent union nlink or a lower
+		 * hardlink was added while overlay is mounted. Adding a lower
+		 * hardlink and then unlinking all overlay hardlinks would drop
+		 * overlay nlink to zero before all upper inodes are unlinked.
+		 * As a safety measure, when that situation is detected, set
+		 * the overlay nlink to the index inode nlink minus one for the
+		 * index entry itself.
+		 */
+		set_nlink(d_inode(dentry), inode->i_nlink - 1);
+		goto fail;
+	}
+
+	inode_lock_nested(dir, I_MUTEX_PARENT);
+	/* TODO: whiteout instead of cleanup to block future open by handle */
+	err = ovl_cleanup(dir, index);
+	inode_unlock(dir);
+
+out:
+	dput(index);
+	return err;
+
+fail:
+	pr_err("overlayfs: cleanup index of '%pd2' failed (%i)\n", dentry, err);
+	goto out;
+}
+
 int ovl_create_real(struct inode *dir, struct dentry *newdentry,
 		    struct cattr *attr, struct dentry *hardlink, bool debug)
 {
@@ -641,6 +699,17 @@ static int ovl_nlink_start(struct dentry *dentry)
 
 static void ovl_nlink_end(struct dentry *dentry)
 {
+	enum ovl_path_type type = ovl_path_type(dentry);
+
+	/* Unlink the index inode on last overlay inode unlink */
+	if (OVL_TYPE_INDEX(type) && d_inode(dentry)->i_nlink == 0) {
+		const struct cred *old_cred;
+
+		old_cred = ovl_override_creds(dentry->d_sb);
+		ovl_cleanup_index(dentry);
+		revert_creds(old_cred);
+	}
+
 	mutex_unlock(&OVL_I(d_inode(dentry))->oi_lock);
 }
 
@@ -1137,6 +1206,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	revert_creds(old_cred);
 	/*
 	 * Release oi_lock after rename lock.
+	 * Orphan index cleanup may need to aquire index dir mutex, which is
+	 * the same lockdep class/fstype as the upper dir rename lock.
 	 */
 	if (new_drop_nlink)
 		ovl_nlink_end(new);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b83d7e387a02..d21e4450b5a9 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -508,8 +508,8 @@ int ovl_set_nlink(struct inode *inode, struct dentry *index, bool add_upper)
 			       &onlink, sizeof(onlink), 0);
 }
 
-static unsigned int ovl_get_nlink(struct ovl_inode_info *info,
-				  struct dentry *index, unsigned int real_nlink)
+unsigned int ovl_get_nlink(struct ovl_inode_info *info, struct dentry *index,
+			   unsigned int real_nlink)
 {
 	struct ovl_nlink onlink;
 	__s32 nlink_add = 0;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 5ff5f82f6503..aef123a441dd 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -393,6 +393,7 @@ int ovl_verify_index(struct dentry *index, struct path *lowerstack,
 	struct path origin = { };
 	struct path *stack = &origin;
 	unsigned int ctr = 0;
+	struct ovl_inode_info info = { };
 	int err;
 
 	if (!d_inode(index))
@@ -426,6 +427,11 @@ int ovl_verify_index(struct dentry *index, struct path *lowerstack,
 	if (err)
 		goto fail;
 
+	/* Check if index is orphan and don't warn before cleaning it */
+	info.lowerinode = d_inode(origin.dentry);
+	if (ovl_get_nlink(&info, index, 0) == 0)
+		err = -ENOENT;
+
 	dput(origin.dentry);
 out:
 	kfree(fh);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 82c1d51b7b63..89eea7954420 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -271,9 +271,11 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
+struct ovl_inode_info;
 int ovl_set_nlink(struct inode *inode, struct dentry *index, bool add_upper);
+unsigned int ovl_get_nlink(struct ovl_inode_info *info, struct dentry *index,
+			   unsigned int real_nlink);
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
-struct ovl_inode_info;
 struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *info,
 			    struct dentry *index);
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d6604bbe0a66..793667f47a04 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1108,7 +1108,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			if (err)
 				pr_err("overlayfs: failed to verify index dir origin\n");
 
-			/* Cleanup bad/stale index entries */
+			/* Cleanup bad/stale/orphan index entries */
 			if (!err)
 				err = ovl_indexdir_cleanup(ufs->indexdir,
 							   ufs->upper_mnt,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 25/25] ovl: cleanup orphan index entries
  2017-06-21 12:28 ` [PATCH v4 25/25] ovl: cleanup orphan index entries Amir Goldstein
@ 2017-06-21 16:45   ` Amir Goldstein
  0 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 16:45 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

index entry should live only as long as there are upper or lower
hardlinks.

Cleanup orphan index entries on mount and when dropping the last
overlay inode nlink.

When about to cleanup or link up to orphan index and the index inode
nlink > 1, admit that something went wrong and adjust overlay nlink
to index inode nlink - 1 to prevent it from dropping below zero.
This could happen when adding lower hardlinks underneath a mounted
overlay and then trying to unlink them.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 21 ++++++++++++--
 fs/overlayfs/dir.c       | 72 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c     |  4 +--
 fs/overlayfs/namei.c     |  6 ++++
 fs/overlayfs/overlayfs.h |  4 ++-
 fs/overlayfs/super.c     |  2 +-
 6 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index bbc16880b32d..642957f5bed9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -622,9 +622,24 @@ static int ovl_copy_up_indexdir_prepare(struct ovl_copy_up_ctx *ctx)
 			 * Orphan index inodes should be cleaned up on mount
 			 * and when overlay inode nlink drops to zero.
 			 */
-			pr_warn_ratelimited("overlayfs: not linking to orphan index (%pd2, nlink=%u)\n",
-					    index, inode->i_nlink);
-			goto out_dput;
+			pr_warn_ratelimited("overlayfs: link-up orphan index (%pd2, ino=%lu, nlink=%u)\n",
+					    index, inode->i_ino,
+					    inode->i_nlink);
+
+			if (inode->i_nlink == 1)
+				goto out_dput;
+
+			/*
+			 * If index is has nlink > 1, then we either have a bug
+			 * with persistent union nlink or a lower hardlink was
+			 * added while overlay is mounted. Adding a lower
+			 * hardlink and then unlinking all overlay hardlinks
+			 * would drop overlay nlink to zero before all upper
+			 * inodes are unlinked. As a safety measure, when that
+			 * situation is detected, set the overlay nlink to the
+			 * index inode nlink minus one for the index entry.
+			 */
+			set_nlink(d_inode(ctx->dentry), inode->i_nlink - 1);
 		}
 
 		/* Link to existing upper without copying lower */
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index c9ba057ccfa3..b4cd8de1ce0c 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -84,6 +84,65 @@ static struct dentry *ovl_whiteout(struct dentry *workdir,
 	return whiteout;
 }
 
+/* Called must hold OVL_I(inode)->oi_lock */
+static int ovl_cleanup_index(struct dentry *dentry)
+{
+	struct inode *dir = ovl_indexdir(dentry->d_sb)->d_inode;
+	struct dentry *lower;
+	struct dentry *index = NULL;
+	struct inode *inode;
+	int err;
+
+	/*
+	 * dentry may already be unhashed, but it still holds a reference to
+	 * the lower/upper dentries from before an unlink operation.
+	 */
+	lower = ovl_dentry_lower(dentry);
+	err = -ESTALE;
+	if (WARN_ON(!lower))
+		goto fail;
+
+	err = ovl_lookup_index(dentry, ovl_dentry_upper(dentry), lower, &index);
+	if (err)
+		goto fail;
+
+	err = -ENOENT;
+	if (WARN_ON(!index || !index->d_inode))
+		goto fail;
+
+	inode = d_inode(index);
+	err = -EEXIST;
+	if (inode->i_nlink != 1) {
+		pr_warn_ratelimited("overlayfs: cleanup linked index (%pd2, ino=%lu, nlink=%u)\n",
+				    index, inode->i_ino, inode->i_nlink);
+		/*
+		 * We either have a bug with persistent union nlink or a lower
+		 * hardlink was added while overlay is mounted. Adding a lower
+		 * hardlink and then unlinking all overlay hardlinks would drop
+		 * overlay nlink to zero before all upper inodes are unlinked.
+		 * As a safety measure, when that situation is detected, set
+		 * the overlay nlink to the index inode nlink minus one for the
+		 * index entry itself.
+		 */
+		set_nlink(d_inode(dentry), inode->i_nlink - 1);
+		ovl_set_nlink(d_inode(dentry), index, true);
+		goto fail;
+	}
+
+	inode_lock_nested(dir, I_MUTEX_PARENT);
+	/* TODO: whiteout instead of cleanup to block future open by handle */
+	err = ovl_cleanup(dir, index);
+	inode_unlock(dir);
+
+out:
+	dput(index);
+	return err;
+
+fail:
+	pr_err("overlayfs: cleanup index of '%pd2' failed (%i)\n", dentry, err);
+	goto out;
+}
+
 int ovl_create_real(struct inode *dir, struct dentry *newdentry,
 		    struct cattr *attr, struct dentry *hardlink, bool debug)
 {
@@ -641,6 +700,17 @@ static int ovl_nlink_start(struct dentry *dentry)
 
 static void ovl_nlink_end(struct dentry *dentry)
 {
+	enum ovl_path_type type = ovl_path_type(dentry);
+
+	/* Unlink the index inode on last overlay inode unlink */
+	if (OVL_TYPE_INDEX(type) && d_inode(dentry)->i_nlink == 0) {
+		const struct cred *old_cred;
+
+		old_cred = ovl_override_creds(dentry->d_sb);
+		ovl_cleanup_index(dentry);
+		revert_creds(old_cred);
+	}
+
 	mutex_unlock(&OVL_I(d_inode(dentry))->oi_lock);
 }
 
@@ -1137,6 +1207,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	revert_creds(old_cred);
 	/*
 	 * Release oi_lock after rename lock.
+	 * Orphan index cleanup may need to aquire index dir mutex, which is
+	 * the same lockdep class/fstype as the upper dir rename lock.
 	 */
 	if (new_drop_nlink)
 		ovl_nlink_end(new);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b83d7e387a02..d21e4450b5a9 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -508,8 +508,8 @@ int ovl_set_nlink(struct inode *inode, struct dentry *index, bool add_upper)
 			       &onlink, sizeof(onlink), 0);
 }
 
-static unsigned int ovl_get_nlink(struct ovl_inode_info *info,
-				  struct dentry *index, unsigned int real_nlink)
+unsigned int ovl_get_nlink(struct ovl_inode_info *info, struct dentry *index,
+			   unsigned int real_nlink)
 {
 	struct ovl_nlink onlink;
 	__s32 nlink_add = 0;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 5ff5f82f6503..aef123a441dd 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -393,6 +393,7 @@ int ovl_verify_index(struct dentry *index, struct path *lowerstack,
 	struct path origin = { };
 	struct path *stack = &origin;
 	unsigned int ctr = 0;
+	struct ovl_inode_info info = { };
 	int err;
 
 	if (!d_inode(index))
@@ -426,6 +427,11 @@ int ovl_verify_index(struct dentry *index, struct path *lowerstack,
 	if (err)
 		goto fail;
 
+	/* Check if index is orphan and don't warn before cleaning it */
+	info.lowerinode = d_inode(origin.dentry);
+	if (ovl_get_nlink(&info, index, 0) == 0)
+		err = -ENOENT;
+
 	dput(origin.dentry);
 out:
 	kfree(fh);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 82c1d51b7b63..89eea7954420 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -271,9 +271,11 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
+struct ovl_inode_info;
 int ovl_set_nlink(struct inode *inode, struct dentry *index, bool add_upper);
+unsigned int ovl_get_nlink(struct ovl_inode_info *info, struct dentry *index,
+			   unsigned int real_nlink);
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
-struct ovl_inode_info;
 struct inode *ovl_get_inode(struct super_block *sb, struct ovl_inode_info *info,
 			    struct dentry *index);
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d6604bbe0a66..793667f47a04 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1108,7 +1108,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			if (err)
 				pr_err("overlayfs: failed to verify index dir origin\n");
 
-			/* Cleanup bad/stale index entries */
+			/* Cleanup bad/stale/orphan index entries */
 			if (!err)
 				err = ovl_indexdir_cleanup(ufs->indexdir,
 							   ufs->upper_mnt,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 00/25] Overlayfs inodes index
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (24 preceding siblings ...)
  2017-06-21 12:28 ` [PATCH v4 25/25] ovl: cleanup orphan index entries Amir Goldstein
@ 2017-06-21 17:02 ` Amir Goldstein
  2017-06-21 20:03   ` Amir Goldstein
  2017-06-22 10:18 ` [PATCH v4 26/25] ovl: document copying layers restrictions with " Amir Goldstein
  26 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 17:02 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel

On Wed, Jun 21, 2017 at 3:28 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Miklos,
>
> This is it.
> All done on my end, so unless you have any rejects that you want me
> to fix, you may start mangling.
>
> I've added another xfstest for nlink accounting and may add another
> one that mangles with lower hardlinks to get negative overlay nlink.
> It's a behavior I observed with lower/upper stress xfstest overlay/019
> and fixed in patch 25.

FYI1: I wrote that test and pushed to my xfstests dev branch.
The test found another corner case of lower hardlinks mangling.
I fixed that other case and re-posted patch 25 and force pushed
ovl-hardlinks.v4.

FYI2: Occasionally, I get the WARN_ON from drop_nlink() when
running the generic xfstest fsstress test generic/013:

generic/013 [16:50:00]run fstests generic/013 at 2017-06-21 16:50:00
------------[ cut here ]------------
WARNING: CPU: 0 PID: 10920 at
/home/amir/build/src/linux/fs/inode.c:282 drop_nlink+0x10/0x29
Call Trace:
 ovl_do_remove.part.2+0x353/0x3bd
 ovl_unlink+0x23/0x26
 vfs_unlink+0xde/0x17d
 do_unlinkat+0x10a/0x215

Since this is not an overlay specific test, all fsstress operation
are on pure upper, so I'm not sure how my patches could break
this test case. Because it happens rarely, its going to take me time
to bisect the problem.

My only immediate suspect is "ovl: fix nlink leak in ovl_rename()",
which is a generic change that drops nlink. Could you please take
a closer look at this patch to see if I missed something.
I will try to reproduce the issue with just this one patch.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 00/25] Overlayfs inodes index
  2017-06-21 17:02 ` [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
@ 2017-06-21 20:03   ` Amir Goldstein
  0 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-21 20:03 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel

On Wed, Jun 21, 2017 at 8:02 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jun 21, 2017 at 3:28 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Miklos,
>>
>> This is it.
>> All done on my end, so unless you have any rejects that you want me
>> to fix, you may start mangling.
>>
>> I've added another xfstest for nlink accounting and may add another
>> one that mangles with lower hardlinks to get negative overlay nlink.
>> It's a behavior I observed with lower/upper stress xfstest overlay/019
>> and fixed in patch 25.
>
> FYI1: I wrote that test and pushed to my xfstests dev branch.
> The test found another corner case of lower hardlinks mangling.
> I fixed that other case and re-posted patch 25 and force pushed
> ovl-hardlinks.v4.
>
> FYI2: Occasionally, I get the WARN_ON from drop_nlink() when
> running the generic xfstest fsstress test generic/013:
>
> generic/013 [16:50:00]run fstests generic/013 at 2017-06-21 16:50:00
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 10920 at
> /home/amir/build/src/linux/fs/inode.c:282 drop_nlink+0x10/0x29
> Call Trace:
>  ovl_do_remove.part.2+0x353/0x3bd
>  ovl_unlink+0x23/0x26
>  vfs_unlink+0xde/0x17d
>  do_unlinkat+0x10a/0x215
>
> Since this is not an overlay specific test, all fsstress operation
> are on pure upper, so I'm not sure how my patches could break
> this test case. Because it happens rarely, its going to take me time
> to bisect the problem.
>
> My only immediate suspect is "ovl: fix nlink leak in ovl_rename()",
> which is a generic change that drops nlink. Could you please take
> a closer look at this patch to see if I missed something.
> I will try to reproduce the issue with just this one patch.
>

This patch is not to blame, nor are any of the other patches.
The problem goes back at least as far as kernel v4.10.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v4 26/25] ovl: document copying layers restrictions with inodes index
  2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
                   ` (25 preceding siblings ...)
  2017-06-21 17:02 ` [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
@ 2017-06-22 10:18 ` Amir Goldstein
  26 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-22 10:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel

The inodes index feature introduces a behavior change - on mount,
upper root origin file handle is verified to match the lower root dir.
This implies that copied layers cannot be mounted with the inodes index
feature enabled, without explicitly removing the upper dir origin xattr
and the index dir.

The inodes index feature is required to support:
- Prevent breaking hardlinks on copy up
- NFS export support (upcoming)
- Overlayfs snapshots (POC)

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 Documentation/filesystems/overlayfs.txt | 34 +++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Miklos,

I wrote in v4 posting that I updated the documentation, but forgot to send
out the documentation patch, so here goes.

There is one more behavior change that is related to indexing of directories
in v4.14, which we should consider including in v4.13, so v4.14 won't change
behavior of index enabled lookup.

The change is to verify lower of merge dir by origin and is available
on the following commit along with updated documentation [1]. I will post
this extra patch for your consideration after I am done running some more
tests.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/ovl-verify-dir

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 8599fc9addbb..f63a8552bb81 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -201,6 +201,40 @@ rightmost one and going left.  In the above example lower1 will be the
 top, lower2 the middle and lower3 the bottom layer.
 
 
+Sharing and copying layers
+--------------------------
+
+Lower layers may be shared among several overlay mounts and that is indeed
+a very common practice.  An overlay mount may use the same lower layer
+path as another overlay mount and it may use a lower layer path that is
+beneath or above the path of another overlay lower layer path.
+
+Using an upper layer path and/or a workdir path that are already used by
+another overlay mount is not allowed and will fail with EBUSY.  Using
+partially overlapping paths is not allowed but will not fail with EBUSY.
+
+Mounting an overlay using an upper layer path, where the upper layer path
+was previously used by another mounted overlay in combination with a
+different lower layer path, is allowed, unless the "inodes index" feature
+is enabled.
+
+With the "inodes index" feature, on the first time mount, an NFS file
+handle of the lower layer root directory, along with the UUID of the lower
+filesystem, are encoded and stored in the "trusted.overlay.origin" extended
+attribute on the upper layer root directory.  On subsequent mount attempts,
+the lower root directory file handle and lower filesystem UUID are compared
+to the stored origin in upper root directory.  On failure to verify the
+lower root origin, mount will fail with ESTALE.  An overlayfs mount with
+"inodes index" enabled will fail with EOPNOTSUPP if the lower filesystem
+does not support NFS export, lower filesystem does not have a valid UUID or
+if the upper filesystem does not support extended attributes.
+
+It is quite a common practice to copy overlay layers to a different
+directory tree on the same or different underlying filesystem, and even
+to a different machine.  With the "inodes index" feature, trying to mount
+the copied layers will fail the verification of the lower root file handle.
+
+
 Non-standard behavior
 ---------------------
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 24/25] ovl: persistent overlay inode nlink for indexed inodes
  2017-06-21 12:28 ` [PATCH v4 24/25] ovl: persistent overlay inode nlink for indexed inodes Amir Goldstein
@ 2017-06-23 11:34   ` Amir Goldstein
  0 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-06-23 11:34 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs

On Wed, Jun 21, 2017 at 3:28 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> With inodes index enabled, an overlay inode nlink counts the union of upper
> and non-covered lower hardlinks. During the lifetime of a non-pure upper
> inode, the following nlink modifying operations can happen:
>
> 1. Lower hardlink copy up
> 2. Upper hardlink created, unlinked or renamed over
> 3. Lower hardlink whiteout or renamed over
>
> For the first, copy up case, the union nlink does not change, whether the
> operation succeeds or fails, but the upper inode nlink may change.
> Therefore, before copy up, we store the union nlink value relative to the
> lower inode nlink in the index inode xattr trusted.overlay.nlink.
>
> For the second, upper hardlink case, the union nlink should be incremented
> or decremented IFF the operation succeeds, aligned with nlink change of the
> upper inode. Therefore, before link/unlink/rename, we store the union nlink
> value relative to the upper inode nlink in the index inode.
>
> For the last, lower cover up case, we simplify things by preceding the
> whiteout or cover up with copy up. This makes sure that there is an index
> upper inode where the nlink xattr can be stored before the copied up upper
> entry is unlink.
>
> Return the overlay inode nlinks for indexed upper inodes on stat(2).
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/copy_up.c   |  88 +++++++++++++++++++++++++++-------------
>  fs/overlayfs/dir.c       |  80 +++++++++++++++++++++++++++++++++++-
>  fs/overlayfs/inode.c     | 103 ++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/overlayfs/namei.c     |   2 +-
>  fs/overlayfs/overlayfs.h |   6 ++-
>  5 files changed, 245 insertions(+), 34 deletions(-)
>
...
> +
> +static unsigned int ovl_get_nlink(struct ovl_inode_info *info,
> +                                 struct dentry *index, unsigned int real_nlink)
> +{
> +       struct ovl_nlink onlink;
> +       __s32 nlink_add = 0;
> +       int res;
> +
> +       if (!index || !index->d_inode)
> +               return real_nlink;
> +
> +       res = vfs_getxattr(index, OVL_XATTR_NLINK, &onlink, sizeof(onlink));
> +       if (res < sizeof(onlink))
> +               goto fail;
> +

I tested cleanup of orphans due to missing nlink xattr.
Apparently, ((int)-ENODATAT < sizeof(onlink)) is false..
    if (res != sizeof(onlink)) works better

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-06-23 11:34 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-21 12:28 [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 01/25] vfs: introduce inode 'inuse' lock Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 02/25] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 03/25] ovl: relax same fs constrain for ovl_check_origin() Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 04/25] ovl: generalize ovl_create_workdir() Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 05/25] ovl: introduce the inodes index dir feature Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 06/25] ovl: verify upper root dir matches lower root dir Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 07/25] ovl: verify index dir matches upper dir Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 08/25] ovl: store path type in dentry Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 09/25] ovl: cram dentry state booleans into type flags Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 10/25] ovl: lookup index entry for copy up origin Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 11/25] ovl: cleanup bad and stale index entries on mount Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 12/25] ovl: allocate an ovl_inode struct Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 13/25] ovl: store upper/lower real inode in ovl_inode_info Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 14/25] ovl: use ovl_inode_init() for initializing new inode Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 15/25] ovl: hash overlay non-dir inodes by copy up origin inode Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 16/25] ovl: defer upper dir lock to tempfile link Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 17/25] ovl: factor out ovl_copy_up_inode() helper Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 18/25] ovl: generalize ovl_copy_up_locked() using actors Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 19/25] ovl: generalize ovl_copy_up_one() " Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 20/25] ovl: use ovl_inode mutex to synchronize concurrent copy up Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 21/25] ovl: implement index dir copy up method Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 22/25] ovl: link up indexed lower hardlink on lookup Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 23/25] ovl: fix nlink leak in ovl_rename() Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 24/25] ovl: persistent overlay inode nlink for indexed inodes Amir Goldstein
2017-06-23 11:34   ` Amir Goldstein
2017-06-21 12:28 ` [PATCH v4 25/25] ovl: cleanup orphan index entries Amir Goldstein
2017-06-21 16:45   ` Amir Goldstein
2017-06-21 17:02 ` [PATCH v4 00/25] Overlayfs inodes index Amir Goldstein
2017-06-21 20:03   ` Amir Goldstein
2017-06-22 10:18 ` [PATCH v4 26/25] ovl: document copying layers restrictions with " Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.