All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs
@ 2017-11-02 20:38 Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 1/8] ovl: move include of ovl_entry.h into overlayfs.h Amir Goldstein
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

Miklos,

This version is cleaner and better tested.
I have a WIP for xfstest for the new cases, will clean it up and post
it later.

Changes since v6:
- Reorganize d_ino patches to: 1. merge dir cache and 2. impure dir cache
- Explain better why we need to return pseudo st_dev for lower
- Fix crash on building impure cache for pure lower
- Reset cache on change of dir from is_real to !is_real

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/ovl-nonsamefs-v7

Amir Goldstein (5):
  ovl: move include of ovl_entry.h into overlayfs.h
  ovl: relax same fs constraint for constant st_ino
  ovl: recalc d_ino for dir cache in non-samefs case
  ovl: update cache version of impure parent on rename
  ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case

Chandan Rajendra (3):
  ovl: re-structure overlay lower layers in-memory
  ovl: allocate anonymous devs for lowerdirs
  ovl: return anonymous st_dev for lower inodes

 fs/overlayfs/copy_up.c   |  1 -
 fs/overlayfs/dir.c       | 16 ++++++----
 fs/overlayfs/inode.c     | 51 +++++++++++++++++++++---------
 fs/overlayfs/namei.c     | 53 ++++++++++++++++++-------------
 fs/overlayfs/overlayfs.h |  7 +++--
 fs/overlayfs/ovl_entry.h | 14 +++++++--
 fs/overlayfs/readdir.c   | 64 +++++++++++++++++++++++++++-----------
 fs/overlayfs/super.c     | 81 ++++++++++++++++++++++++++++++------------------
 fs/overlayfs/util.c      | 17 +++++++---
 9 files changed, 202 insertions(+), 102 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v7 1/8] ovl: move include of ovl_entry.h into overlayfs.h
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 2/8] ovl: re-structure overlay lower layers in-memory Amir Goldstein
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

Most overlayfs c files already explicitly include ovl_entry.h
to use overlay entry struct definitions and upcoming changes
are going to require even more c files to include this header.

All overlayfs c files include overlayfs.h and overlayfs.h itself
refers to some structs defined in ovl_entry.h, so it seems more
logic to include ovl_entry.h from overlayfs.h than from c files.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 1 -
 fs/overlayfs/inode.c     | 1 -
 fs/overlayfs/namei.c     | 1 -
 fs/overlayfs/overlayfs.h | 1 +
 fs/overlayfs/super.c     | 1 -
 fs/overlayfs/util.c      | 1 -
 6 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index c441f9387a1b..9311d183b532 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -22,7 +22,6 @@
 #include <linux/ratelimit.h>
 #include <linux/exportfs.h>
 #include "overlayfs.h"
-#include "ovl_entry.h"
 
 #define OVL_COPY_UP_CHUNK_SIZE (1 << 20)
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 2949448d0c7d..b772e5a2a730 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -14,7 +14,6 @@
 #include <linux/posix_acl.h>
 #include <linux/ratelimit.h>
 #include "overlayfs.h"
-#include "ovl_entry.h"
 
 int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 {
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index a12dc10bf726..505a4b8902fc 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -15,7 +15,6 @@
 #include <linux/mount.h>
 #include <linux/exportfs.h>
 #include "overlayfs.h"
-#include "ovl_entry.h"
 
 struct ovl_lookup_data {
 	struct qstr name;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index d53157ccf0d7..1cf3bdd193a4 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -9,6 +9,7 @@
 
 #include <linux/kernel.h>
 #include <linux/uuid.h>
+#include "ovl_entry.h"
 
 enum ovl_path_type {
 	__OVL_PATH_UPPER	= (1 << 0),
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index f5738e96a052..8702803ba328 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -18,7 +18,6 @@
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
 #include "overlayfs.h"
-#include "ovl_entry.h"
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
 MODULE_DESCRIPTION("Overlay filesystem");
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 51ca8bd16009..9158d17bb320 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -17,7 +17,6 @@
 #include <linux/namei.h>
 #include <linux/ratelimit.h>
 #include "overlayfs.h"
-#include "ovl_entry.h"
 
 int ovl_want_write(struct dentry *dentry)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 2/8] ovl: re-structure overlay lower layers in-memory
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 1/8] ovl: move include of ovl_entry.h into overlayfs.h Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 3/8] ovl: allocate anonymous devs for lowerdirs Amir Goldstein
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

From: Chandan Rajendra <chandan@linux.vnet.ibm.com>

Define new structures to represent overlay instance lower layers and
overlay merge dir lower layers to make room for storing more per layer
information in-memory.

Instead of keeping the fs instance lower layers in an array of struct
vfsmount, keep them in an array of new struct ovl_layer, that has a
pointer to struct vfsmount.

Instead of keeping the dentry lower layers in an array of struct path,
keep them in an array of new struct ovl_path, that has a pointer to
struct dentry and to struct ovl_layer.

Add a small helper to find the fs layer id that correspopnds to a lower
struct ovl_path and use it in ovl_lookup().

[amir: split re-structure from anonymous bdev patch]

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c     | 52 ++++++++++++++++++++++++----------------
 fs/overlayfs/overlayfs.h |  4 ++--
 fs/overlayfs/ovl_entry.h | 13 ++++++++--
 fs/overlayfs/readdir.c   |  4 ++--
 fs/overlayfs/super.c     | 62 ++++++++++++++++++++++++++----------------------
 fs/overlayfs/util.c      |  7 +++++-
 6 files changed, 86 insertions(+), 56 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 505a4b8902fc..6cc3ece3417d 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -285,16 +285,15 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 
 
 static int ovl_check_origin(struct dentry *upperdentry,
-			    struct path *lowerstack, unsigned int numlower,
-			    struct path **stackp, unsigned int *ctrp)
+			    struct ovl_path *lower, unsigned int numlower,
+			    struct ovl_path **stackp, unsigned int *ctrp)
 {
 	struct vfsmount *mnt;
 	struct dentry *origin = NULL;
 	int i;
 
-
 	for (i = 0; i < numlower; i++) {
-		mnt = lowerstack[i].mnt;
+		mnt = lower[i].layer->mnt;
 		origin = ovl_get_origin(upperdentry, mnt);
 		if (IS_ERR(origin))
 			return PTR_ERR(origin);
@@ -308,12 +307,12 @@ static int ovl_check_origin(struct dentry *upperdentry,
 
 	BUG_ON(*ctrp);
 	if (!*stackp)
-		*stackp = kmalloc(sizeof(struct path), GFP_KERNEL);
+		*stackp = kmalloc(sizeof(struct ovl_path), GFP_KERNEL);
 	if (!*stackp) {
 		dput(origin);
 		return -ENOMEM;
 	}
-	**stackp = (struct path) { .dentry = origin, .mnt = mnt };
+	**stackp = (struct ovl_path){.dentry = origin, .layer = lower[i].layer};
 	*ctrp = 1;
 
 	return 0;
@@ -383,13 +382,13 @@ int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
  * OVL_XATTR_ORIGIN and that origin file handle can be decoded to lower path.
  * Return 0 on match, -ESTALE on mismatch or stale origin, < 0 on error.
  */
-int ovl_verify_index(struct dentry *index, struct path *lowerstack,
+int ovl_verify_index(struct dentry *index, struct ovl_path *lower,
 		     unsigned int numlower)
 {
 	struct ovl_fh *fh = NULL;
 	size_t len;
-	struct path origin = { };
-	struct path *stack = &origin;
+	struct ovl_path origin = { };
+	struct ovl_path *stack = &origin;
 	unsigned int ctr = 0;
 	int err;
 
@@ -428,7 +427,7 @@ int ovl_verify_index(struct dentry *index, struct path *lowerstack,
 	if (err)
 		goto fail;
 
-	err = ovl_check_origin(index, lowerstack, numlower, &stack, &ctr);
+	err = ovl_check_origin(index, lower, numlower, &stack, &ctr);
 	if (!err && !ctr)
 		err = -ESTALE;
 	if (err)
@@ -567,11 +566,24 @@ int ovl_path_next(int idx, struct dentry *dentry, struct path *path)
 		idx++;
 	}
 	BUG_ON(idx > oe->numlower);
-	*path = oe->lowerstack[idx - 1];
+	path->dentry = oe->lowerstack[idx - 1].dentry;
+	path->mnt = oe->lowerstack[idx - 1].layer->mnt;
 
 	return (idx < oe->numlower) ? idx + 1 : -1;
 }
 
+static int ovl_find_layer(struct ovl_fs *ofs, struct ovl_path *path)
+{
+	int i;
+
+	for (i = 0; i < ofs->numlower; i++) {
+		if (ofs->lower_layers[i].mnt == path->layer->mnt)
+			break;
+	}
+
+	return i;
+}
+
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			  unsigned int flags)
 {
@@ -580,7 +592,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 	struct ovl_entry *poe = dentry->d_parent->d_fsdata;
 	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
-	struct path *stack = NULL;
+	struct ovl_path *stack = NULL;
 	struct dentry *upperdir, *upperdentry = NULL;
 	struct dentry *index = NULL;
 	unsigned int ctr = 0;
@@ -645,17 +657,17 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 
 	if (!d.stop && poe->numlower) {
 		err = -ENOMEM;
-		stack = kcalloc(ofs->numlower, sizeof(struct path),
+		stack = kcalloc(ofs->numlower, sizeof(struct ovl_path),
 				GFP_KERNEL);
 		if (!stack)
 			goto out_put_upper;
 	}
 
 	for (i = 0; !d.stop && i < poe->numlower; i++) {
-		struct path lowerpath = poe->lowerstack[i];
+		struct ovl_path lower = poe->lowerstack[i];
 
 		d.last = i == poe->numlower - 1;
-		err = ovl_lookup_layer(lowerpath.dentry, &d, &this);
+		err = ovl_lookup_layer(lower.dentry, &d, &this);
 		if (err)
 			goto out_put;
 
@@ -663,7 +675,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			continue;
 
 		stack[ctr].dentry = this;
-		stack[ctr].mnt = lowerpath.mnt;
+		stack[ctr].layer = lower.layer;
 		ctr++;
 
 		if (d.stop)
@@ -673,10 +685,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			poe = roe;
 
 			/* Find the current layer on the root dentry */
-			for (i = 0; i < poe->numlower; i++)
-				if (poe->lowerstack[i].mnt == lowerpath.mnt)
-					break;
-			if (WARN_ON(i == poe->numlower))
+			i = ovl_find_layer(ofs, &lower);
+			if (WARN_ON(i == ofs->numlower))
 				break;
 		}
 	}
@@ -699,7 +709,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		goto out_put;
 
 	oe->opaque = upperopaque;
-	memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr);
+	memcpy(oe->lowerstack, stack, sizeof(struct ovl_path) * ctr);
 	dentry->d_fsdata = oe;
 
 	if (upperdentry)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 1cf3bdd193a4..cefe5a97d048 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -251,7 +251,7 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 /* namei.c */
 int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt,
 		      struct dentry *origin, bool is_upper, bool set);
-int ovl_verify_index(struct dentry *index, struct path *lowerstack,
+int ovl_verify_index(struct dentry *index, struct ovl_path *lower,
 		     unsigned int numlower);
 int ovl_get_index_name(struct dentry *origin, struct qstr *name);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path);
@@ -268,7 +268,7 @@ int ovl_check_d_type_supported(struct path *realpath);
 void ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
 			 struct dentry *dentry, int level);
 int ovl_indexdir_cleanup(struct dentry *dentry, struct vfsmount *mnt,
-			 struct path *lowerstack, unsigned int numlower);
+			 struct ovl_path *lower, unsigned int numlower);
 
 /* inode.c */
 int ovl_set_nlink_upper(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 25d9b5adcd42..1e28329b5db8 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -17,11 +17,20 @@ struct ovl_config {
 	bool index;
 };
 
+struct ovl_layer {
+	struct vfsmount *mnt;
+};
+
+struct ovl_path {
+	struct ovl_layer *layer;
+	struct dentry *dentry;
+};
+
 /* private information held for overlayfs's superblock */
 struct ovl_fs {
 	struct vfsmount *upper_mnt;
 	unsigned numlower;
-	struct vfsmount **lower_mnt;
+	struct ovl_layer *lower_layers;
 	/* workbasedir is the path at workdir= mount option */
 	struct dentry *workbasedir;
 	/* workdir is the 'work' directory under workbasedir */
@@ -52,7 +61,7 @@ struct ovl_entry {
 		struct rcu_head rcu;
 	};
 	unsigned numlower;
-	struct path lowerstack[];
+	struct ovl_path lowerstack[];
 };
 
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 333de85d6de4..4f2479b6ef91 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1019,7 +1019,7 @@ void ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
 }
 
 int ovl_indexdir_cleanup(struct dentry *dentry, struct vfsmount *mnt,
-			 struct path *lowerstack, unsigned int numlower)
+			 struct ovl_path *lower, unsigned int numlower)
 {
 	int err;
 	struct dentry *index = NULL;
@@ -1054,7 +1054,7 @@ int ovl_indexdir_cleanup(struct dentry *dentry, struct vfsmount *mnt,
 			index = NULL;
 			break;
 		}
-		err = ovl_verify_index(index, lowerstack, numlower);
+		err = ovl_verify_index(index, lower, numlower);
 		/* Cleanup stale and orphan index entries */
 		if (err && (err == -ESTALE || err == -ENOENT))
 			err = ovl_cleanup(dir, index);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 8702803ba328..e6e13d9588d1 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -220,8 +220,8 @@ static void ovl_put_super(struct super_block *sb)
 		ovl_inuse_unlock(ufs->upper_mnt->mnt_root);
 	mntput(ufs->upper_mnt);
 	for (i = 0; i < ufs->numlower; i++)
-		mntput(ufs->lower_mnt[i]);
-	kfree(ufs->lower_mnt);
+		mntput(ufs->lower_layers[i].mnt);
+	kfree(ufs->lower_layers);
 
 	kfree(ufs->config.lowerdir);
 	kfree(ufs->config.upperdir);
@@ -1026,24 +1026,26 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	}
 
 	err = -ENOMEM;
-	ufs->lower_mnt = kcalloc(numlower, sizeof(struct vfsmount *), GFP_KERNEL);
-	if (ufs->lower_mnt == NULL)
+	ufs->lower_layers = kcalloc(numlower, sizeof(struct ovl_layer),
+				    GFP_KERNEL);
+	if (ufs->lower_layers == NULL)
 		goto out_put_workdir;
 	for (i = 0; i < numlower; i++) {
-		struct vfsmount *mnt = clone_private_mount(&stack[i]);
+		struct vfsmount *mnt;
 
+		mnt = clone_private_mount(&stack[i]);
 		err = PTR_ERR(mnt);
 		if (IS_ERR(mnt)) {
 			pr_err("overlayfs: failed to clone lowerpath\n");
-			goto out_put_lower_mnt;
+			goto out_put_lower_layers;
 		}
 		/*
-		 * Make lower_mnt R/O.  That way fchmod/fchown on lower file
+		 * Make lower layers R/O.  That way fchmod/fchown on lower file
 		 * will fail instead of modifying lower fs.
 		 */
 		mnt->mnt_flags |= MNT_READONLY | MNT_NOATIME;
 
-		ufs->lower_mnt[ufs->numlower] = mnt;
+		ufs->lower_layers[ufs->numlower].mnt = mnt;
 		ufs->numlower++;
 
 		/* Check if all lower layers are on same sb */
@@ -1059,13 +1061,25 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else if (ufs->upper_mnt->mnt_sb != ufs->same_sb)
 		ufs->same_sb = NULL;
 
+	err = -ENOMEM;
+	oe = ovl_alloc_entry(numlower);
+	if (!oe)
+		goto out_put_lower_layers;
+
+	for (i = 0; i < numlower; i++) {
+		oe->lowerstack[i].dentry = stack[i].dentry;
+		oe->lowerstack[i].layer = &(ufs->lower_layers[i]);
+	}
+
 	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
 		/* Verify lower root is upper root origin */
-		err = ovl_verify_origin(upperpath.dentry, ufs->lower_mnt[0],
-					stack[0].dentry, false, true);
+		err = ovl_verify_origin(upperpath.dentry,
+					oe->lowerstack[0].layer->mnt,
+					oe->lowerstack[0].dentry,
+					false, true);
 		if (err) {
 			pr_err("overlayfs: failed to verify upper root origin\n");
-			goto out_put_lower_mnt;
+			goto out_free_oe;
 		}
 
 		ufs->indexdir = ovl_workdir_create(sb, ufs, workpath.dentry,
@@ -1081,7 +1095,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			if (!err)
 				err = ovl_indexdir_cleanup(ufs->indexdir,
 							   ufs->upper_mnt,
-							   stack, numlower);
+							   oe->lowerstack,
+							   numlower);
 		}
 		if (err || !ufs->indexdir)
 			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
@@ -1106,11 +1121,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	/* Never override disk quota limits or use reserved space */
 	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
 
-	err = -ENOMEM;
-	oe = ovl_alloc_entry(numlower);
-	if (!oe)
-		goto out_put_cred;
-
 	sb->s_magic = OVERLAYFS_SUPER_MAGIC;
 	sb->s_op = &ovl_super_operations;
 	sb->s_xattr = ovl_xattr_handlers;
@@ -1119,11 +1129,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	root_dentry = d_make_root(ovl_new_inode(sb, S_IFDIR, 0));
 	if (!root_dentry)
-		goto out_free_oe;
+		goto out_put_cred;
 
 	mntput(upperpath.mnt);
 	for (i = 0; i < numlower; i++)
 		mntput(stack[i].mnt);
+	kfree(stack);
 	mntput(workpath.mnt);
 	kfree(lowertmp);
 
@@ -1132,11 +1143,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (ovl_is_impuredir(upperpath.dentry))
 			ovl_set_flag(OVL_IMPURE, d_inode(root_dentry));
 	}
-	for (i = 0; i < numlower; i++) {
-		oe->lowerstack[i].dentry = stack[i].dentry;
-		oe->lowerstack[i].mnt = ufs->lower_mnt[i];
-	}
-	kfree(stack);
 
 	root_dentry->d_fsdata = oe;
 
@@ -1147,16 +1153,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	return 0;
 
-out_free_oe:
-	kfree(oe);
 out_put_cred:
 	put_cred(ufs->creator_cred);
 out_put_indexdir:
 	dput(ufs->indexdir);
-out_put_lower_mnt:
+out_free_oe:
+	kfree(oe);
+out_put_lower_layers:
 	for (i = 0; i < ufs->numlower; i++)
-		mntput(ufs->lower_mnt[i]);
-	kfree(ufs->lower_mnt);
+		mntput(ufs->lower_layers[i].mnt);
+	kfree(ufs->lower_layers);
 out_put_workdir:
 	dput(ufs->workdir);
 	mntput(ufs->upper_mnt);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 9158d17bb320..d6bb1c9f5e7a 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -124,7 +124,12 @@ void ovl_path_lower(struct dentry *dentry, struct path *path)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 
-	*path = oe->numlower ? oe->lowerstack[0] : (struct path) { };
+	if (oe->numlower) {
+		path->mnt = oe->lowerstack[0].layer->mnt;
+		path->dentry = oe->lowerstack[0].dentry;
+	} else {
+		*path = (struct path) { };
+	}
 }
 
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 3/8] ovl: allocate anonymous devs for lowerdirs
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 1/8] ovl: move include of ovl_entry.h into overlayfs.h Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 2/8] ovl: re-structure overlay lower layers in-memory Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 4/8] ovl: return anonymous st_dev for lower inodes Amir Goldstein
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

From: Chandan Rajendra <chandan@linux.vnet.ibm.com>

Generate unique values of st_dev per lower layer for non-samefs
overlay mount. The unique values are obtained by allocating anonymous
bdevs for each of the lowerdirs in the overlayfs instance.

The anonymous bdev is going to be returned by stat(2) for lowerdir
non-dir entries in non-samefs case.

[amir: split from ovl_getattr() and re-structure patches]

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/super.c     | 18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1e28329b5db8..93eb6a044dd2 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -19,6 +19,7 @@ struct ovl_config {
 
 struct ovl_layer {
 	struct vfsmount *mnt;
+	dev_t pseudo_dev;
 };
 
 struct ovl_path {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index e6e13d9588d1..323dfd7a0236 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -219,8 +219,10 @@ static void ovl_put_super(struct super_block *sb)
 	if (ufs->upper_mnt && ufs->upperdir_locked)
 		ovl_inuse_unlock(ufs->upper_mnt->mnt_root);
 	mntput(ufs->upper_mnt);
-	for (i = 0; i < ufs->numlower; i++)
+	for (i = 0; i < ufs->numlower; i++) {
 		mntput(ufs->lower_layers[i].mnt);
+		free_anon_bdev(ufs->lower_layers[i].pseudo_dev);
+	}
 	kfree(ufs->lower_layers);
 
 	kfree(ufs->config.lowerdir);
@@ -1032,11 +1034,19 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		goto out_put_workdir;
 	for (i = 0; i < numlower; i++) {
 		struct vfsmount *mnt;
+		dev_t dev;
+
+		err = get_anon_bdev(&dev);
+		if (err) {
+			pr_err("overlayfs: failed to get anonymous bdev for lowerpath\n");
+			goto out_put_lower_layers;
+		}
 
 		mnt = clone_private_mount(&stack[i]);
 		err = PTR_ERR(mnt);
 		if (IS_ERR(mnt)) {
 			pr_err("overlayfs: failed to clone lowerpath\n");
+			free_anon_bdev(dev);
 			goto out_put_lower_layers;
 		}
 		/*
@@ -1046,6 +1056,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		mnt->mnt_flags |= MNT_READONLY | MNT_NOATIME;
 
 		ufs->lower_layers[ufs->numlower].mnt = mnt;
+		ufs->lower_layers[ufs->numlower].pseudo_dev = dev;
 		ufs->numlower++;
 
 		/* Check if all lower layers are on same sb */
@@ -1160,8 +1171,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 out_free_oe:
 	kfree(oe);
 out_put_lower_layers:
-	for (i = 0; i < ufs->numlower; i++)
+	for (i = 0; i < ufs->numlower; i++) {
+		if (ufs->lower_layers[i].mnt)
+			free_anon_bdev(ufs->lower_layers[i].pseudo_dev);
 		mntput(ufs->lower_layers[i].mnt);
+	}
 	kfree(ufs->lower_layers);
 out_put_workdir:
 	dput(ufs->workdir);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 4/8] ovl: return anonymous st_dev for lower inodes
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
                   ` (2 preceding siblings ...)
  2017-11-02 20:38 ` [PATCH v7 3/8] ovl: allocate anonymous devs for lowerdirs Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino Amir Goldstein
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

From: Chandan Rajendra <chandan@linux.vnet.ibm.com>

For non-samefs setup, to make sure that st_dev/st_ino pair is unique
across the system, we return a unique anonymous st_dev for stat(2)
of lower layer inode.

A following patch is going to fix constant st_dev/st_ino across copy up
by returning origin st_dev/st_ino for copied up objects.

If the st_dev/st_ino for copied up object would have been the same as
that of the real underlying lower file, running diff on underlying lower
file and overlay copied up file would result in diff reporting that the
2 files are equal when in fact, they may have different content.

[amir: simplify ovl_get_pseudo_dev()
       split from allocate anonymous bdev patch]

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b772e5a2a730..607a50c50042 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -15,6 +15,14 @@
 #include <linux/ratelimit.h>
 #include "overlayfs.h"
 
+
+static dev_t ovl_get_pseudo_dev(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return oe->lowerstack[0].layer->pseudo_dev;
+}
+
 int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	int err;
@@ -121,6 +129,13 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 		 */
 		stat->dev = dentry->d_sb->s_dev;
 		stat->ino = dentry->d_inode->i_ino;
+	} else if (!OVL_TYPE_UPPER(type)) {
+		/*
+		 * For non-samefs setup, to make sure that st_dev/st_ino pair
+		 * is unique across the system, we use a unique anonymous
+		 * st_dev for lower layer inode.
+		 */
+		stat->dev = ovl_get_pseudo_dev(dentry);
 	}
 
 	/*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
                   ` (3 preceding siblings ...)
  2017-11-02 20:38 ` [PATCH v7 4/8] ovl: return anonymous st_dev for lower inodes Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-06 20:43   ` Vivek Goyal
  2017-11-02 20:38 ` [PATCH v7 6/8] ovl: recalc d_ino for dir cache in non-samefs case Amir Goldstein
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

For the case of all layers not on the same fs, return the copy up origin
inode st_dev/st_ino for non-dir from stat(2).

This guaranties constant st_dev/st_ino for non-dir across copy up.
Like the same fs case, st_ino of non-dir is also persistent.

If the st_dev/st_ino for copied up object would have been the same as
that of the real underlying lower file, running diff on underlying lower
file and overlay copied up file would result in diff reporting that the
two files are equal when in fact, they may have different content.

Therefore, unlike the same fs case, st_dev is not persistent because it
uses the unique anonymous bdev allocated for the lower layer.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c | 49 +++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 607a50c50042..d06de98808f6 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -74,6 +74,7 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	struct path realpath;
 	const struct cred *old_cred;
 	bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
+	bool samefs = ovl_same_sb(dentry->d_sb);
 	int err;
 
 	type = ovl_path_real(dentry, &realpath);
@@ -83,16 +84,13 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 		goto out;
 
 	/*
-	 * When all layers are on the same fs, all real inode number are
-	 * unique, so we use the overlay st_dev, which is friendly to du -x.
-	 *
-	 * We also use st_ino of the copy up origin, if we know it.
-	 * This guaranties constant st_dev/st_ino across copy up.
+	 * For non-dir or same fs, we use st_ino of the copy up origin, if we
+	 * know it. This guaranties constant st_dev/st_ino across copy up.
 	 *
 	 * If filesystem supports NFS export ops, this also guaranties
 	 * persistent st_ino across mount cycle.
 	 */
-	if (ovl_same_sb(dentry->d_sb)) {
+	if (!is_dir || samefs) {
 		if (OVL_TYPE_ORIGIN(type)) {
 			struct kstat lowerstat;
 			u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
@@ -103,7 +101,6 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 			if (err)
 				goto out;
 
-			WARN_ON_ONCE(stat->dev != lowerstat.dev);
 			/*
 			 * Lower hardlinks may be broken on copy up to different
 			 * upper files, so we cannot use the lower origin st_ino
@@ -115,27 +112,39 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 			if (is_dir || lowerstat.nlink == 1 ||
 			    ovl_test_flag(OVL_INDEX, d_inode(dentry)))
 				stat->ino = lowerstat.ino;
+
+			if (samefs)
+				WARN_ON_ONCE(stat->dev != lowerstat.dev);
+			else
+				stat->dev = ovl_get_pseudo_dev(dentry);
 		}
-		stat->dev = dentry->d_sb->s_dev;
-	} else if (is_dir) {
+		if (samefs) {
+			/*
+			 * When all layers are on the same fs, all real inode
+			 * number are unique, so we use the overlay st_dev,
+			 * which is friendly to du -x.
+			 */
+			stat->dev = dentry->d_sb->s_dev;
+		} else if (!OVL_TYPE_UPPER(type)) {
+			/*
+			 * For non-samefs setup, to make sure that st_dev/st_ino
+			 * pair is unique across the system, we use a unique
+			 * anonymous st_dev for lower layer inode.
+			 */
+			stat->dev = ovl_get_pseudo_dev(dentry);
+		}
+	} else {
 		/*
-		 * If not all layers are on the same fs the pair {real st_ino;
-		 * overlay st_dev} is not unique, so use the non persistent
-		 * overlay st_ino.
-		 *
 		 * Always use the overlay st_dev for directories, so 'find
 		 * -xdev' will scan the entire overlay mount and won't cross the
 		 * overlay mount boundaries.
+		 *
+		 * If not all layers are on the same fs the pair {real st_ino;
+		 * overlay st_dev} is not unique, so use the non persistent
+		 * overlay st_ino for directories.
 		 */
 		stat->dev = dentry->d_sb->s_dev;
 		stat->ino = dentry->d_inode->i_ino;
-	} else if (!OVL_TYPE_UPPER(type)) {
-		/*
-		 * For non-samefs setup, to make sure that st_dev/st_ino pair
-		 * is unique across the system, we use a unique anonymous
-		 * st_dev for lower layer inode.
-		 */
-		stat->dev = ovl_get_pseudo_dev(dentry);
 	}
 
 	/*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 6/8] ovl: recalc d_ino for dir cache in non-samefs case
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
                   ` (4 preceding siblings ...)
  2017-11-02 20:38 ` [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 7/8] ovl: update cache version of impure parent on rename Amir Goldstein
  2017-11-02 20:38 ` [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case Amir Goldstein
  7 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

st_ino returned by stat(2) for dirs on non-samefs is the non persistent
overlay i_ino. Update the dir cache with volatile overlay i_ino values.
Overlay dir cache and inode cache may get out of sync after child inodes
exit and re-enter inode cache. This means that user may see inconsistent
st_ino/d_ino, but user can already see inconsistent st_ino of same object
in different times, so this is better than nothing.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/readdir.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 4f2479b6ef91..4ec95806285e 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -120,6 +120,10 @@ static bool ovl_calc_d_ino(struct ovl_readdir_data *rdd,
 	if (!rdd->dentry)
 		return false;
 
+	/* Always recalc d_ino from i_ino for dir on non-samefs */
+	if (p->type == DT_DIR && !ovl_same_sb(rdd->dentry->d_sb))
+		return true;
+
 	/* Always recalc d_ino for parent */
 	if (strcmp(p->name, "..") == 0)
 		return true;
@@ -459,9 +463,6 @@ static int ovl_cache_update_ino(struct path *path, struct ovl_cache_entry *p)
 	u64 ino = p->real_ino;
 	int err = 0;
 
-	if (!ovl_same_sb(dir->d_sb))
-		goto out;
-
 	if (p->name[0] == '.') {
 		if (p->len == 1) {
 			this = dget(dir);
@@ -484,6 +485,19 @@ static int ovl_cache_update_ino(struct path *path, struct ovl_cache_entry *p)
 	}
 
 get:
+	/*
+	 * st_ino for dirs on non-samefs is the non persistent overlay i_ino.
+	 * Update the dir cache with volatile overlay i_ino, even though
+	 * dir cache and inode cache may get out of sync after child inodes
+	 * exit and re-enter inode cache. This means that user may see
+	 * inconsistent st_ino/d_ino, but user can already see inconsistent
+	 * st_ino of same object in different times.
+	 */
+	if (p->type == DT_DIR && !ovl_same_sb(dir->d_sb)) {
+		ino = this->d_inode->i_ino;
+		goto out;
+	}
+
 	type = ovl_path_type(this);
 	if (OVL_TYPE_ORIGIN(type)) {
 		struct kstat stat;
@@ -494,7 +508,8 @@ static int ovl_cache_update_ino(struct path *path, struct ovl_cache_entry *p)
 		if (err)
 			goto fail;
 
-		WARN_ON_ONCE(dir->d_sb->s_dev != stat.dev);
+		if (ovl_same_sb(dir->d_sb))
+			WARN_ON_ONCE(dir->d_sb->s_dev != stat.dev);
 		ino = stat.ino;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 7/8] ovl: update cache version of impure parent on rename
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
                   ` (5 preceding siblings ...)
  2017-11-02 20:38 ` [PATCH v7 6/8] ovl: recalc d_ino for dir cache in non-samefs case Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-06  9:17   ` Miklos Szeredi
  2017-11-02 20:38 ` [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case Amir Goldstein
  7 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

ovl_rename() updates dir cache version for impure old parent if an entry
with copy up origin is moved into old parent, but it did not update
cache version if the entry moved out of old parent has a copy up origin.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index ef533198be45..26aef3b5f007 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -1075,9 +1075,10 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			drop_nlink(d_inode(new));
 	}
 
-	ovl_dentry_version_inc(old->d_parent,
-			       !overwrite && ovl_type_origin(new));
+	ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old));
 	ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
+	if (!overwrite)
+		ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new));
 
 out_dput:
 	dput(newdentry);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
                   ` (6 preceding siblings ...)
  2017-11-02 20:38 ` [PATCH v7 7/8] ovl: update cache version of impure parent on rename Amir Goldstein
@ 2017-11-02 20:38 ` Amir Goldstein
  2017-11-06  9:35   ` Miklos Szeredi
  7 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-02 20:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

For non-samefs case, all directory entries are "impure" because the
st_ino value returned from stat(2) is not the st_ino value of the
real inode.

In order to provide the same d_ino values for the directory type entries,
we never iterate the real dir directly in non-samefs case.

ovl_fill_real() has been modified to return the overlay inode number
as d_ino of the "." and ".." entries.

Build the impure dir cache for non-samefs case even if there is no
"impure" xattr on dir, but if real dir has nlink > 2 or nlink == 1,
which indicates that is has subdirs.

Update non-merge dir cache version on every change in subdir entry
and reset the dir cache after copy up when directory becomes !is_real.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c       | 17 ++++++++++-------
 fs/overlayfs/overlayfs.h |  2 +-
 fs/overlayfs/readdir.c   | 37 +++++++++++++++++++++++++------------
 fs/overlayfs/util.c      |  9 ++++++---
 4 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 26aef3b5f007..f2ba8ab99581 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -155,7 +155,7 @@ static int ovl_set_opaque(struct dentry *dentry, struct dentry *upperdentry)
 static void ovl_instantiate(struct dentry *dentry, struct inode *inode,
 			    struct dentry *newdentry, bool hardlink)
 {
-	ovl_dentry_version_inc(dentry->d_parent, false);
+	ovl_dentry_version_inc(dentry->d_parent, false, S_ISDIR(inode->i_mode));
 	ovl_dentry_set_upper_alias(dentry);
 	if (!hardlink) {
 		ovl_inode_update(inode, newdentry);
@@ -676,7 +676,7 @@ static int ovl_remove_and_whiteout(struct dentry *dentry, bool is_dir)
 	if (flags)
 		ovl_cleanup(wdir, upper);
 
-	ovl_dentry_version_inc(dentry->d_parent, true);
+	ovl_dentry_version_inc(dentry->d_parent, true, is_dir);
 out_d_drop:
 	d_drop(dentry);
 	dput(whiteout);
@@ -727,7 +727,8 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir)
 		err = vfs_rmdir(dir, upper);
 	else
 		err = vfs_unlink(dir, upper, NULL);
-	ovl_dentry_version_inc(dentry->d_parent, ovl_type_origin(dentry));
+	ovl_dentry_version_inc(dentry->d_parent, ovl_type_origin(dentry),
+			       is_dir);
 
 	/*
 	 * Keeping this dentry hashed would mean having to release
@@ -1075,10 +1076,12 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			drop_nlink(d_inode(new));
 	}
 
-	ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old));
-	ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
-	if (!overwrite)
-		ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new));
+	ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old), is_dir);
+	ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old), is_dir);
+	if (!overwrite) {
+		ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new),
+				       new_is_dir);
+	}
 
 out_dput:
 	dput(newdentry);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index cefe5a97d048..17fc4622dfba 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -221,7 +221,7 @@ void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
 void ovl_inode_init(struct inode *inode, struct dentry *upperdentry,
 		    struct dentry *lowerdentry);
 void ovl_inode_update(struct inode *inode, struct dentry *upperdentry);
-void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
+void ovl_dentry_version_inc(struct dentry *dentry, bool impurity, bool subdir);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
 struct file *ovl_path_open(struct path *path, int flags);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 4ec95806285e..56320b927c5d 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -344,14 +344,14 @@ static void ovl_dir_reset(struct file *file)
 	struct ovl_dir_file *od = file->private_data;
 	struct ovl_dir_cache *cache = od->cache;
 	struct dentry *dentry = file->f_path.dentry;
-	bool is_real;
+	bool is_real = ovl_dir_is_real(dentry);
 
-	if (cache && ovl_dentry_version_get(dentry) != cache->version) {
+	if (cache && (ovl_dentry_version_get(dentry) != cache->version ||
+		      od->is_real != is_real)) {
 		ovl_cache_put(od, dentry);
 		od->cache = NULL;
 		od->cursor = NULL;
 	}
-	is_real = ovl_dir_is_real(dentry);
 	if (od->is_real != is_real) {
 		/* is_real can only become false when dir is copied up */
 		if (WARN_ON(is_real))
@@ -455,7 +455,6 @@ static struct ovl_dir_cache *ovl_cache_get(struct dentry *dentry)
  * sure that d_ino will be consistent with st_ino from stat(2).
  */
 static int ovl_cache_update_ino(struct path *path, struct ovl_cache_entry *p)
-
 {
 	struct dentry *dir = path->dentry;
 	struct dentry *this = NULL;
@@ -557,7 +556,7 @@ static int ovl_dir_read_impure(struct path *path,  struct list_head *list,
 
 	INIT_LIST_HEAD(list);
 	*root = RB_ROOT;
-	ovl_path_upper(path->dentry, &realpath);
+	ovl_path_real(path->dentry, &realpath);
 
 	err = ovl_dir_read(&realpath, &rdd);
 	if (err)
@@ -592,6 +591,7 @@ static struct ovl_dir_cache *ovl_cache_get_impure(struct path *path)
 {
 	int res;
 	struct dentry *dentry = path->dentry;
+	struct dentry *upper = ovl_dentry_upper(dentry);
 	struct ovl_dir_cache *cache;
 
 	cache = ovl_dir_cache(d_inode(dentry));
@@ -612,9 +612,9 @@ static struct ovl_dir_cache *ovl_cache_get_impure(struct path *path)
 		kfree(cache);
 		return ERR_PTR(res);
 	}
-	if (list_empty(&cache->entries)) {
+	if (upper && list_empty(&cache->entries)) {
 		/* Good oportunity to get rid of an unnecessary "impure" flag */
-		ovl_do_removexattr(ovl_dentry_upper(dentry), OVL_XATTR_IMPURE);
+		ovl_do_removexattr(upper, OVL_XATTR_IMPURE);
 		ovl_clear_flag(OVL_IMPURE, d_inode(dentry));
 		kfree(cache);
 		return NULL;
@@ -631,6 +631,7 @@ struct ovl_readdir_translate {
 	struct ovl_dir_cache *cache;
 	struct dir_context ctx;
 	u64 parent_ino;
+	u64 current_ino;
 };
 
 static int ovl_fill_real(struct dir_context *ctx, const char *name,
@@ -643,6 +644,8 @@ static int ovl_fill_real(struct dir_context *ctx, const char *name,
 
 	if (rdt->parent_ino && strcmp(name, "..") == 0)
 		ino = rdt->parent_ino;
+	else if (rdt->current_ino && namelen == 1 && name[0] == '.')
+		ino = rdt->current_ino;
 	else if (rdt->cache) {
 		struct ovl_cache_entry *p;
 
@@ -659,12 +662,16 @@ static int ovl_iterate_real(struct file *file, struct dir_context *ctx)
 	int err;
 	struct ovl_dir_file *od = file->private_data;
 	struct dentry *dir = file->f_path.dentry;
+	struct inode *realinode = d_inode(od->realfile->f_path.dentry);
 	struct ovl_readdir_translate rdt = {
 		.ctx.actor = ovl_fill_real,
 		.orig_ctx = ctx,
 	};
 
-	if (OVL_TYPE_MERGE(ovl_path_type(dir->d_parent))) {
+	if (!ovl_same_sb(dir->d_sb)) {
+		rdt.current_ino = dir->d_inode->i_ino;
+		rdt.parent_ino = dir->d_parent->d_inode->i_ino;
+	} else if (OVL_TYPE_MERGE(ovl_path_type(dir->d_parent))) {
 		struct kstat stat;
 		struct path statpath = file->f_path;
 
@@ -677,7 +684,13 @@ static int ovl_iterate_real(struct file *file, struct dir_context *ctx)
 		rdt.parent_ino = stat.ino;
 	}
 
-	if (ovl_test_flag(OVL_IMPURE, d_inode(dir))) {
+	/*
+	 * If dir is impure, we need to cache st_ino values of copy up origins.
+	 * For non-samefs, if dir nlink is either 1 (=any) or > 2, then it may
+	 * have subdirs and we need to cache their non persistent st_ino values.
+	 */
+	if (ovl_test_flag(OVL_IMPURE, d_inode(dir)) ||
+	    (!ovl_same_sb(dir->d_sb) && realinode->i_nlink != 2)) {
 		rdt.cache = ovl_cache_get_impure(&file->f_path);
 		if (IS_ERR(rdt.cache))
 			return PTR_ERR(rdt.cache);
@@ -703,9 +716,9 @@ static int ovl_iterate(struct file *file, struct dir_context *ctx)
 		 * dir is impure then need to adjust d_ino for copied up
 		 * entries.
 		 */
-		if (ovl_same_sb(dentry->d_sb) &&
-		    (ovl_test_flag(OVL_IMPURE, d_inode(dentry)) ||
-		     OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent)))) {
+		if (ovl_test_flag(OVL_IMPURE, d_inode(dentry)) ||
+		     OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent)) ||
+		     !ovl_same_sb(dentry->d_sb)) {
 			return ovl_iterate_real(file, ctx);
 		}
 		return iterate_dir(od->realfile, ctx);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index d6bb1c9f5e7a..b4d729dff66b 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -279,7 +279,7 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
 	}
 }
 
-void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
+void ovl_dentry_version_inc(struct dentry *dentry, bool impurity, bool subdir)
 {
 	struct inode *inode = d_inode(dentry);
 
@@ -288,9 +288,12 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
 	 * Version is used by readdir code to keep cache consistent.  For merge
 	 * dirs all changes need to be noted.  For non-merge dirs, cache only
 	 * contains impure (ones which have been copied up and have origins)
-	 * entries, so only need to note changes to impure entries.
+	 * entries, so only need to note changes to impure entries.  For
+	 * non-merge dirs in non-samefs case, we need to also note changes for
+	 * all sub-dirs, because we cache all overlay dir inode numbers.
 	 */
-	if (OVL_TYPE_MERGE(ovl_path_type(dentry)) || impurity)
+	if (OVL_TYPE_MERGE(ovl_path_type(dentry)) || impurity ||
+	    (subdir && !ovl_same_sb(dentry->d_sb)))
 		OVL_I(inode)->version++;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 7/8] ovl: update cache version of impure parent on rename
  2017-11-02 20:38 ` [PATCH v7 7/8] ovl: update cache version of impure parent on rename Amir Goldstein
@ 2017-11-06  9:17   ` Miklos Szeredi
  2017-11-06 10:06     ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2017-11-06  9:17 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> ovl_rename() updates dir cache version for impure old parent if an entry
> with copy up origin is moved into old parent, but it did not update
> cache version if the entry moved out of old parent has a copy up origin.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/dir.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index ef533198be45..26aef3b5f007 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -1075,9 +1075,10 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>                         drop_nlink(d_inode(new));
>         }
>
> -       ovl_dentry_version_inc(old->d_parent,
> -                              !overwrite && ovl_type_origin(new));
> +       ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old));
>         ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
> +       if (!overwrite)
> +               ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new));

How about the opposite (i.e. newdir losing impurity)?  That can happen two ways:

  - overwriting copied up file (new) with pure one (old)
  - exchange copied up file (new) with pure one (old)

Also I'd merge the two version increments into one, although that
doesn't really matter:

-    ovl_dentry_version_inc(old->d_parent,
-                   !overwrite && ovl_type_origin(new));
-    ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
+    ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old) ||
+                   (!overwrite && ovl_type_origin(new)));
+    ovl_dentry_version_inc(new->d_parent,
+                   ovl_type_origin(old) || ovl_type_origin(new));

Fixed up the patch, but tell me if I'm missing something here.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-02 20:38 ` [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case Amir Goldstein
@ 2017-11-06  9:35   ` Miklos Szeredi
  2017-11-06 10:08     ` Amir Goldstein
  2017-11-06 11:45     ` Amir Goldstein
  0 siblings, 2 replies; 23+ messages in thread
From: Miklos Szeredi @ 2017-11-06  9:35 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> For non-samefs case, all directory entries are "impure" because the
> st_ino value returned from stat(2) is not the st_ino value of the
> real inode.
>
> In order to provide the same d_ino values for the directory type entries,
> we never iterate the real dir directly in non-samefs case.
>
> ovl_fill_real() has been modified to return the overlay inode number
> as d_ino of the "." and ".." entries.
>
> Build the impure dir cache for non-samefs case even if there is no
> "impure" xattr on dir, but if real dir has nlink > 2 or nlink == 1,
> which indicates that is has subdirs.
>
> Update non-merge dir cache version on every change in subdir entry
> and reset the dir cache after copy up when directory becomes !is_real.
>

I don't really like this.  Non-samefs d_ino will always be hacky,
whether consistent with st_ino or not (it loses the uniqueness
property of d_ino, that is guaranteed in a normal filesystem).

Spending possibly large amounts of memory and CPU cycles to achieve
consistency is dubious.   Especially since nobody seems to be
complaining about it.

So, I'd just drop 6/8 and 8/8.

And I'm still hoping we can get some infrastructure for partitioning
the ino space (i.e. filesystem can report that it will only use N bits
from the available 64).  This would allow properly solving the
non-samefs case without adding hacks.

Thanks,
Miklos


> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/dir.c       | 17 ++++++++++-------
>  fs/overlayfs/overlayfs.h |  2 +-
>  fs/overlayfs/readdir.c   | 37 +++++++++++++++++++++++++------------
>  fs/overlayfs/util.c      |  9 ++++++---
>  4 files changed, 42 insertions(+), 23 deletions(-)
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 26aef3b5f007..f2ba8ab99581 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -155,7 +155,7 @@ static int ovl_set_opaque(struct dentry *dentry, struct dentry *upperdentry)
>  static void ovl_instantiate(struct dentry *dentry, struct inode *inode,
>                             struct dentry *newdentry, bool hardlink)
>  {
> -       ovl_dentry_version_inc(dentry->d_parent, false);
> +       ovl_dentry_version_inc(dentry->d_parent, false, S_ISDIR(inode->i_mode));
>         ovl_dentry_set_upper_alias(dentry);
>         if (!hardlink) {
>                 ovl_inode_update(inode, newdentry);
> @@ -676,7 +676,7 @@ static int ovl_remove_and_whiteout(struct dentry *dentry, bool is_dir)
>         if (flags)
>                 ovl_cleanup(wdir, upper);
>
> -       ovl_dentry_version_inc(dentry->d_parent, true);
> +       ovl_dentry_version_inc(dentry->d_parent, true, is_dir);
>  out_d_drop:
>         d_drop(dentry);
>         dput(whiteout);
> @@ -727,7 +727,8 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir)
>                 err = vfs_rmdir(dir, upper);
>         else
>                 err = vfs_unlink(dir, upper, NULL);
> -       ovl_dentry_version_inc(dentry->d_parent, ovl_type_origin(dentry));
> +       ovl_dentry_version_inc(dentry->d_parent, ovl_type_origin(dentry),
> +                              is_dir);
>
>         /*
>          * Keeping this dentry hashed would mean having to release
> @@ -1075,10 +1076,12 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>                         drop_nlink(d_inode(new));
>         }
>
> -       ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old));
> -       ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
> -       if (!overwrite)
> -               ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new));
> +       ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old), is_dir);
> +       ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old), is_dir);
> +       if (!overwrite) {
> +               ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new),
> +                                      new_is_dir);
> +       }
>
>  out_dput:
>         dput(newdentry);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index cefe5a97d048..17fc4622dfba 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -221,7 +221,7 @@ void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
>  void ovl_inode_init(struct inode *inode, struct dentry *upperdentry,
>                     struct dentry *lowerdentry);
>  void ovl_inode_update(struct inode *inode, struct dentry *upperdentry);
> -void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
> +void ovl_dentry_version_inc(struct dentry *dentry, bool impurity, bool subdir);
>  u64 ovl_dentry_version_get(struct dentry *dentry);
>  bool ovl_is_whiteout(struct dentry *dentry);
>  struct file *ovl_path_open(struct path *path, int flags);
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 4ec95806285e..56320b927c5d 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -344,14 +344,14 @@ static void ovl_dir_reset(struct file *file)
>         struct ovl_dir_file *od = file->private_data;
>         struct ovl_dir_cache *cache = od->cache;
>         struct dentry *dentry = file->f_path.dentry;
> -       bool is_real;
> +       bool is_real = ovl_dir_is_real(dentry);
>
> -       if (cache && ovl_dentry_version_get(dentry) != cache->version) {
> +       if (cache && (ovl_dentry_version_get(dentry) != cache->version ||
> +                     od->is_real != is_real)) {
>                 ovl_cache_put(od, dentry);
>                 od->cache = NULL;
>                 od->cursor = NULL;
>         }
> -       is_real = ovl_dir_is_real(dentry);
>         if (od->is_real != is_real) {
>                 /* is_real can only become false when dir is copied up */
>                 if (WARN_ON(is_real))
> @@ -455,7 +455,6 @@ static struct ovl_dir_cache *ovl_cache_get(struct dentry *dentry)
>   * sure that d_ino will be consistent with st_ino from stat(2).
>   */
>  static int ovl_cache_update_ino(struct path *path, struct ovl_cache_entry *p)
> -
>  {
>         struct dentry *dir = path->dentry;
>         struct dentry *this = NULL;
> @@ -557,7 +556,7 @@ static int ovl_dir_read_impure(struct path *path,  struct list_head *list,
>
>         INIT_LIST_HEAD(list);
>         *root = RB_ROOT;
> -       ovl_path_upper(path->dentry, &realpath);
> +       ovl_path_real(path->dentry, &realpath);
>
>         err = ovl_dir_read(&realpath, &rdd);
>         if (err)
> @@ -592,6 +591,7 @@ static struct ovl_dir_cache *ovl_cache_get_impure(struct path *path)
>  {
>         int res;
>         struct dentry *dentry = path->dentry;
> +       struct dentry *upper = ovl_dentry_upper(dentry);
>         struct ovl_dir_cache *cache;
>
>         cache = ovl_dir_cache(d_inode(dentry));
> @@ -612,9 +612,9 @@ static struct ovl_dir_cache *ovl_cache_get_impure(struct path *path)
>                 kfree(cache);
>                 return ERR_PTR(res);
>         }
> -       if (list_empty(&cache->entries)) {
> +       if (upper && list_empty(&cache->entries)) {
>                 /* Good oportunity to get rid of an unnecessary "impure" flag */
> -               ovl_do_removexattr(ovl_dentry_upper(dentry), OVL_XATTR_IMPURE);
> +               ovl_do_removexattr(upper, OVL_XATTR_IMPURE);
>                 ovl_clear_flag(OVL_IMPURE, d_inode(dentry));
>                 kfree(cache);
>                 return NULL;
> @@ -631,6 +631,7 @@ struct ovl_readdir_translate {
>         struct ovl_dir_cache *cache;
>         struct dir_context ctx;
>         u64 parent_ino;
> +       u64 current_ino;
>  };
>
>  static int ovl_fill_real(struct dir_context *ctx, const char *name,
> @@ -643,6 +644,8 @@ static int ovl_fill_real(struct dir_context *ctx, const char *name,
>
>         if (rdt->parent_ino && strcmp(name, "..") == 0)
>                 ino = rdt->parent_ino;
> +       else if (rdt->current_ino && namelen == 1 && name[0] == '.')
> +               ino = rdt->current_ino;
>         else if (rdt->cache) {
>                 struct ovl_cache_entry *p;
>
> @@ -659,12 +662,16 @@ static int ovl_iterate_real(struct file *file, struct dir_context *ctx)
>         int err;
>         struct ovl_dir_file *od = file->private_data;
>         struct dentry *dir = file->f_path.dentry;
> +       struct inode *realinode = d_inode(od->realfile->f_path.dentry);
>         struct ovl_readdir_translate rdt = {
>                 .ctx.actor = ovl_fill_real,
>                 .orig_ctx = ctx,
>         };
>
> -       if (OVL_TYPE_MERGE(ovl_path_type(dir->d_parent))) {
> +       if (!ovl_same_sb(dir->d_sb)) {
> +               rdt.current_ino = dir->d_inode->i_ino;
> +               rdt.parent_ino = dir->d_parent->d_inode->i_ino;
> +       } else if (OVL_TYPE_MERGE(ovl_path_type(dir->d_parent))) {
>                 struct kstat stat;
>                 struct path statpath = file->f_path;
>
> @@ -677,7 +684,13 @@ static int ovl_iterate_real(struct file *file, struct dir_context *ctx)
>                 rdt.parent_ino = stat.ino;
>         }
>
> -       if (ovl_test_flag(OVL_IMPURE, d_inode(dir))) {
> +       /*
> +        * If dir is impure, we need to cache st_ino values of copy up origins.
> +        * For non-samefs, if dir nlink is either 1 (=any) or > 2, then it may
> +        * have subdirs and we need to cache their non persistent st_ino values.
> +        */
> +       if (ovl_test_flag(OVL_IMPURE, d_inode(dir)) ||
> +           (!ovl_same_sb(dir->d_sb) && realinode->i_nlink != 2)) {
>                 rdt.cache = ovl_cache_get_impure(&file->f_path);
>                 if (IS_ERR(rdt.cache))
>                         return PTR_ERR(rdt.cache);
> @@ -703,9 +716,9 @@ static int ovl_iterate(struct file *file, struct dir_context *ctx)
>                  * dir is impure then need to adjust d_ino for copied up
>                  * entries.
>                  */
> -               if (ovl_same_sb(dentry->d_sb) &&
> -                   (ovl_test_flag(OVL_IMPURE, d_inode(dentry)) ||
> -                    OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent)))) {
> +               if (ovl_test_flag(OVL_IMPURE, d_inode(dentry)) ||
> +                    OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent)) ||
> +                    !ovl_same_sb(dentry->d_sb)) {
>                         return ovl_iterate_real(file, ctx);
>                 }
>                 return iterate_dir(od->realfile, ctx);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index d6bb1c9f5e7a..b4d729dff66b 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -279,7 +279,7 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
>         }
>  }
>
> -void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
> +void ovl_dentry_version_inc(struct dentry *dentry, bool impurity, bool subdir)
>  {
>         struct inode *inode = d_inode(dentry);
>
> @@ -288,9 +288,12 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
>          * Version is used by readdir code to keep cache consistent.  For merge
>          * dirs all changes need to be noted.  For non-merge dirs, cache only
>          * contains impure (ones which have been copied up and have origins)
> -        * entries, so only need to note changes to impure entries.
> +        * entries, so only need to note changes to impure entries.  For
> +        * non-merge dirs in non-samefs case, we need to also note changes for
> +        * all sub-dirs, because we cache all overlay dir inode numbers.
>          */
> -       if (OVL_TYPE_MERGE(ovl_path_type(dentry)) || impurity)
> +       if (OVL_TYPE_MERGE(ovl_path_type(dentry)) || impurity ||
> +           (subdir && !ovl_same_sb(dentry->d_sb)))
>                 OVL_I(inode)->version++;
>  }
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 7/8] ovl: update cache version of impure parent on rename
  2017-11-06  9:17   ` Miklos Szeredi
@ 2017-11-06 10:06     ` Amir Goldstein
  2017-11-06 10:39       ` Miklos Szeredi
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 10:06 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 11:17 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> ovl_rename() updates dir cache version for impure old parent if an entry
>> with copy up origin is moved into old parent, but it did not update
>> cache version if the entry moved out of old parent has a copy up origin.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/dir.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
>> index ef533198be45..26aef3b5f007 100644
>> --- a/fs/overlayfs/dir.c
>> +++ b/fs/overlayfs/dir.c
>> @@ -1075,9 +1075,10 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>>                         drop_nlink(d_inode(new));
>>         }
>>
>> -       ovl_dentry_version_inc(old->d_parent,
>> -                              !overwrite && ovl_type_origin(new));
>> +       ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old));
>>         ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
>> +       if (!overwrite)
>> +               ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new));
>
> How about the opposite (i.e. newdir losing impurity)?  That can happen two ways:
>
>   - overwriting copied up file (new) with pure one (old)
>   - exchange copied up file (new) with pure one (old)
>
> Also I'd merge the two version increments into one, although that
> doesn't really matter:
>
> -    ovl_dentry_version_inc(old->d_parent,
> -                   !overwrite && ovl_type_origin(new));
> -    ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
> +    ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old) ||
> +                   (!overwrite && ovl_type_origin(new)));
> +    ovl_dentry_version_inc(new->d_parent,
> +                   ovl_type_origin(old) || ovl_type_origin(new));
>
> Fixed up the patch, but tell me if I'm missing something here.
>

I think you are.
Both my version and your version are wrong because the argument
'impurity' should really be 'impurity_change'

so if we have:
oldimpurity = ovl_type_origin(old);
newimpurity = d_inode(new) ? ovl_type_origin(new) : false;

version increment should really be something like this:
ovl_dentry_version_inc(old->d_parent, oldimpurity ^ (overwrite ?
newimpurity : false));
ovl_dentry_version_inc(new->d_parent, oldimpurity ^ newimpurity);


I think...

Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-06  9:35   ` Miklos Szeredi
@ 2017-11-06 10:08     ` Amir Goldstein
  2017-11-06 10:54       ` Amir Goldstein
  2017-11-06 11:45     ` Amir Goldstein
  1 sibling, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 10:08 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 11:35 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> For non-samefs case, all directory entries are "impure" because the
>> st_ino value returned from stat(2) is not the st_ino value of the
>> real inode.
>>
>> In order to provide the same d_ino values for the directory type entries,
>> we never iterate the real dir directly in non-samefs case.
>>
>> ovl_fill_real() has been modified to return the overlay inode number
>> as d_ino of the "." and ".." entries.
>>
>> Build the impure dir cache for non-samefs case even if there is no
>> "impure" xattr on dir, but if real dir has nlink > 2 or nlink == 1,
>> which indicates that is has subdirs.
>>
>> Update non-merge dir cache version on every change in subdir entry
>> and reset the dir cache after copy up when directory becomes !is_real.
>>
>
> I don't really like this.  Non-samefs d_ino will always be hacky,
> whether consistent with st_ino or not (it loses the uniqueness
> property of d_ino, that is guaranteed in a normal filesystem).
>
> Spending possibly large amounts of memory and CPU cycles to achieve
> consistency is dubious.   Especially since nobody seems to be
> complaining about it.
>
> So, I'd just drop 6/8 and 8/8.
>

Sure, that's fine by me.
I need to fix the get_name() issue regardless.

FYI, will be posting some extra patches for whiteout expose shortly
(i.e. restore "origin" on lookup).

Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 7/8] ovl: update cache version of impure parent on rename
  2017-11-06 10:06     ` Amir Goldstein
@ 2017-11-06 10:39       ` Miklos Szeredi
  2017-11-06 10:59         ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2017-11-06 10:39 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 11:06 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 11:17 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> ovl_rename() updates dir cache version for impure old parent if an entry
>>> with copy up origin is moved into old parent, but it did not update
>>> cache version if the entry moved out of old parent has a copy up origin.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/dir.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
>>> index ef533198be45..26aef3b5f007 100644
>>> --- a/fs/overlayfs/dir.c
>>> +++ b/fs/overlayfs/dir.c
>>> @@ -1075,9 +1075,10 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>>>                         drop_nlink(d_inode(new));
>>>         }
>>>
>>> -       ovl_dentry_version_inc(old->d_parent,
>>> -                              !overwrite && ovl_type_origin(new));
>>> +       ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old));
>>>         ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
>>> +       if (!overwrite)
>>> +               ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new));
>>
>> How about the opposite (i.e. newdir losing impurity)?  That can happen two ways:
>>
>>   - overwriting copied up file (new) with pure one (old)
>>   - exchange copied up file (new) with pure one (old)
>>
>> Also I'd merge the two version increments into one, although that
>> doesn't really matter:
>>
>> -    ovl_dentry_version_inc(old->d_parent,
>> -                   !overwrite && ovl_type_origin(new));
>> -    ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
>> +    ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old) ||
>> +                   (!overwrite && ovl_type_origin(new)));
>> +    ovl_dentry_version_inc(new->d_parent,
>> +                   ovl_type_origin(old) || ovl_type_origin(new));
>>
>> Fixed up the patch, but tell me if I'm missing something here.
>>
>
> I think you are.
> Both my version and your version are wrong because the argument
> 'impurity' should really be 'impurity_change'
>
> so if we have:
> oldimpurity = ovl_type_origin(old);
> newimpurity = d_inode(new) ? ovl_type_origin(new) : false;
>
> version increment should really be something like this:
> ovl_dentry_version_inc(old->d_parent, oldimpurity ^ (overwrite ?
> newimpurity : false));
> ovl_dentry_version_inc(new->d_parent, oldimpurity ^ newimpurity);
>
>
> I think...

We are not only interested in change of state, but rather if the cache
is valid or not.  If an impurity is removed and another added, then
that needs to be noted as well.

Note: strictly speaking we don't need to invalidate the cache on entry
removal (that's why things worked fine before this patch).  And in
fact recalculating the cache on every removal of an impurity might
well cause worse performance than not doing that.  So this could be
further optimized to differentiate between removal and addition of
impurity and only "garbage collect" old caches occasionally.  But lets
not get lost in such details for now.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-06 10:08     ` Amir Goldstein
@ 2017-11-06 10:54       ` Amir Goldstein
  0 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 10:54 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 12:08 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 11:35 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> For non-samefs case, all directory entries are "impure" because the
>>> st_ino value returned from stat(2) is not the st_ino value of the
>>> real inode.
>>>
>>> In order to provide the same d_ino values for the directory type entries,
>>> we never iterate the real dir directly in non-samefs case.
>>>
>>> ovl_fill_real() has been modified to return the overlay inode number
>>> as d_ino of the "." and ".." entries.
>>>
>>> Build the impure dir cache for non-samefs case even if there is no
>>> "impure" xattr on dir, but if real dir has nlink > 2 or nlink == 1,
>>> which indicates that is has subdirs.
>>>
>>> Update non-merge dir cache version on every change in subdir entry
>>> and reset the dir cache after copy up when directory becomes !is_real.
>>>
>>
>> I don't really like this.  Non-samefs d_ino will always be hacky,
>> whether consistent with st_ino or not (it loses the uniqueness
>> property of d_ino, that is guaranteed in a normal filesystem).
>>
>> Spending possibly large amounts of memory and CPU cycles to achieve
>> consistency is dubious.   Especially since nobody seems to be
>> complaining about it.
>>
>> So, I'd just drop 6/8 and 8/8.
>>
>
> Sure, that's fine by me.
> I need to fix the get_name() issue regardless.
>

On second thought, we should keep Chandan's change to
return correct d_ino for ".".
I will post a trimmed down and fixed version of his patch instead of 8/8.

Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 7/8] ovl: update cache version of impure parent on rename
  2017-11-06 10:39       ` Miklos Szeredi
@ 2017-11-06 10:59         ` Amir Goldstein
  0 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 10:59 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Nov 6, 2017 at 11:06 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Mon, Nov 6, 2017 at 11:17 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> ovl_rename() updates dir cache version for impure old parent if an entry
>>>> with copy up origin is moved into old parent, but it did not update
>>>> cache version if the entry moved out of old parent has a copy up origin.
>>>>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>> ---
>>>>  fs/overlayfs/dir.c | 5 +++--
>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
>>>> index ef533198be45..26aef3b5f007 100644
>>>> --- a/fs/overlayfs/dir.c
>>>> +++ b/fs/overlayfs/dir.c
>>>> @@ -1075,9 +1075,10 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>>>>                         drop_nlink(d_inode(new));
>>>>         }
>>>>
>>>> -       ovl_dentry_version_inc(old->d_parent,
>>>> -                              !overwrite && ovl_type_origin(new));
>>>> +       ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old));
>>>>         ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
>>>> +       if (!overwrite)
>>>> +               ovl_dentry_version_inc(old->d_parent, ovl_type_origin(new));
>>>
>>> How about the opposite (i.e. newdir losing impurity)?  That can happen two ways:
>>>
>>>   - overwriting copied up file (new) with pure one (old)
>>>   - exchange copied up file (new) with pure one (old)
>>>
>>> Also I'd merge the two version increments into one, although that
>>> doesn't really matter:
>>>
>>> -    ovl_dentry_version_inc(old->d_parent,
>>> -                   !overwrite && ovl_type_origin(new));
>>> -    ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old));
>>> +    ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old) ||
>>> +                   (!overwrite && ovl_type_origin(new)));
>>> +    ovl_dentry_version_inc(new->d_parent,
>>> +                   ovl_type_origin(old) || ovl_type_origin(new));
>>>
>>> Fixed up the patch, but tell me if I'm missing something here.
>>>
>>
>> I think you are.
>> Both my version and your version are wrong because the argument
>> 'impurity' should really be 'impurity_change'
>>
>> so if we have:
>> oldimpurity = ovl_type_origin(old);
>> newimpurity = d_inode(new) ? ovl_type_origin(new) : false;
>>
>> version increment should really be something like this:
>> ovl_dentry_version_inc(old->d_parent, oldimpurity ^ (overwrite ?
>> newimpurity : false));
>> ovl_dentry_version_inc(new->d_parent, oldimpurity ^ newimpurity);
>>
>>
>> I think...
>
> We are not only interested in change of state, but rather if the cache
> is valid or not.  If an impurity is removed and another added, then
> that needs to be noted as well.

Right... so you only have a bug in your patch calling ovl_type_origin(new)
without checking d_inode(new) for increment of new->d_parent version.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-06  9:35   ` Miklos Szeredi
  2017-11-06 10:08     ` Amir Goldstein
@ 2017-11-06 11:45     ` Amir Goldstein
  2017-11-06 13:57       ` Miklos Szeredi
  1 sibling, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 11:45 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 11:35 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Nov 2, 2017 at 9:38 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> For non-samefs case, all directory entries are "impure" because the
>> st_ino value returned from stat(2) is not the st_ino value of the
>> real inode.
>>
>> In order to provide the same d_ino values for the directory type entries,
>> we never iterate the real dir directly in non-samefs case.
>>
>> ovl_fill_real() has been modified to return the overlay inode number
>> as d_ino of the "." and ".." entries.
>>
>> Build the impure dir cache for non-samefs case even if there is no
>> "impure" xattr on dir, but if real dir has nlink > 2 or nlink == 1,
>> which indicates that is has subdirs.
>>
>> Update non-merge dir cache version on every change in subdir entry
>> and reset the dir cache after copy up when directory becomes !is_real.
>>
>
> I don't really like this.  Non-samefs d_ino will always be hacky,
> whether consistent with st_ino or not (it loses the uniqueness
> property of d_ino, that is guaranteed in a normal filesystem).
>
> Spending possibly large amounts of memory and CPU cycles to achieve
> consistency is dubious.   Especially since nobody seems to be
> complaining about it.
>
> So, I'd just drop 6/8 and 8/8.
>
> And I'm still hoping we can get some infrastructure for partitioning
> the ino space (i.e. filesystem can report that it will only use N bits
> from the available 64).  This would allow properly solving the
> non-samefs case without adding hacks.
>

Oh! there is a ridiculously simple way for some file systems to know
that they are using 32 bit inodes.
We already call ovl_can_decode_fh() for upper and all lower layers.
All we need to do is actually encode the layer directory and check
returned fh_type.

If fh_type is FILEID_INO32_GEN, bingo!
That turns out to be true to all file systems that use the default
implementation of .encode_fh, including the ext* family, squashfs
and more.

XFS with mount option inode32 will also encode fh_type
FILEID_INO32_GEN, unless file system size is very large, so that
gives users a way to easily opt-in for constant st_ino when xfs
is involved.

I'll see if I can sketch a patch.

Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-06 11:45     ` Amir Goldstein
@ 2017-11-06 13:57       ` Miklos Szeredi
  2017-11-06 14:30         ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2017-11-06 13:57 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 12:45 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 11:35 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:

>> And I'm still hoping we can get some infrastructure for partitioning
>> the ino space (i.e. filesystem can report that it will only use N bits
>> from the available 64).  This would allow properly solving the
>> non-samefs case without adding hacks.
>>
>
> Oh! there is a ridiculously simple way for some file systems to know
> that they are using 32 bit inodes.
> We already call ovl_can_decode_fh() for upper and all lower layers.
> All we need to do is actually encode the layer directory and check
> returned fh_type.
>
> If fh_type is FILEID_INO32_GEN, bingo!
> That turns out to be true to all file systems that use the default
> implementation of .encode_fh, including the ext* family, squashfs
> and more.

That's good.  However testing FILEID_INO32_GEN is not particularly
reliable since that constant is defined as "1" and AFAIR there are
other uses of "1" as type, just not with this name.  And nothing
guarantees that all those uses are for 32bit inode numbers, although
that may be the case.

I'd be more happy with an explicit bit count in struct super_block
indicating the number of bits in use.

Such an interface wouldn't limit filesystem implementors in any way.
They'd just be more flexibly declare the needs they have.

Also there is possibly a usecase for a large fs (non 32-bit inum) as
overlayfs upper layer.   I would guess it wouldn't be too difficult to
add an "inumbits=N" parameter to mkfs.xfs for that case.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-06 13:57       ` Miklos Szeredi
@ 2017-11-06 14:30         ` Amir Goldstein
  2017-11-06 14:52           ` Miklos Szeredi
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 14:30 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 3:57 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Nov 6, 2017 at 12:45 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Mon, Nov 6, 2017 at 11:35 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>>> And I'm still hoping we can get some infrastructure for partitioning
>>> the ino space (i.e. filesystem can report that it will only use N bits
>>> from the available 64).  This would allow properly solving the
>>> non-samefs case without adding hacks.
>>>
>>
>> Oh! there is a ridiculously simple way for some file systems to know
>> that they are using 32 bit inodes.
>> We already call ovl_can_decode_fh() for upper and all lower layers.
>> All we need to do is actually encode the layer directory and check
>> returned fh_type.
>>
>> If fh_type is FILEID_INO32_GEN, bingo!
>> That turns out to be true to all file systems that use the default
>> implementation of .encode_fh, including the ext* family, squashfs
>> and more.
>
> That's good.  However testing FILEID_INO32_GEN is not particularly
> reliable since that constant is defined as "1" and AFAIR there are
> other uses of "1" as type, just not with this name.  And nothing
> guarantees that all those uses are for 32bit inode numbers, although
> that may be the case.
>

Right... although for most fs I mentioned, default .encode_fh *is* in fact
a guaranty for 32bit inodes, so we can start with that and add explicit
number of bits to super block later.

> I'd be more happy with an explicit bit count in struct super_block
> indicating the number of bits in use.
>
> Such an interface wouldn't limit filesystem implementors in any way.
> They'd just be more flexibly declare the needs they have.
>
> Also there is possibly a usecase for a large fs (non 32-bit inum) as
> overlayfs upper layer.   I would guess it wouldn't be too difficult to
> add an "inumbits=N" parameter to mkfs.xfs for that case.
>

Cannot set inumbits, because they are already determined by fs
geometry (MSB bits are AG #).
The good news is that you can always know the max inumbits of
any existing xfs, so it would be possible for admin to mount overlay
with mount option upperinumbits=.
The bad news is that I think any xfs can potentially be grown to
use all inode bits, so it is unlikely that xfs will be willing to make
any inumbits commitment in super block besides the already
supported case of inode32 mount option.

Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case
  2017-11-06 14:30         ` Amir Goldstein
@ 2017-11-06 14:52           ` Miklos Szeredi
  0 siblings, 0 replies; 23+ messages in thread
From: Miklos Szeredi @ 2017-11-06 14:52 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chandan Rajendra, Vivek Goyal, linux-unionfs

On Mon, Nov 6, 2017 at 3:30 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 3:57 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Nov 6, 2017 at 12:45 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Mon, Nov 6, 2017 at 11:35 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>>> And I'm still hoping we can get some infrastructure for partitioning
>>>> the ino space (i.e. filesystem can report that it will only use N bits
>>>> from the available 64).  This would allow properly solving the
>>>> non-samefs case without adding hacks.
>>>>
>>>
>>> Oh! there is a ridiculously simple way for some file systems to know
>>> that they are using 32 bit inodes.
>>> We already call ovl_can_decode_fh() for upper and all lower layers.
>>> All we need to do is actually encode the layer directory and check
>>> returned fh_type.
>>>
>>> If fh_type is FILEID_INO32_GEN, bingo!
>>> That turns out to be true to all file systems that use the default
>>> implementation of .encode_fh, including the ext* family, squashfs
>>> and more.
>>
>> That's good.  However testing FILEID_INO32_GEN is not particularly
>> reliable since that constant is defined as "1" and AFAIR there are
>> other uses of "1" as type, just not with this name.  And nothing
>> guarantees that all those uses are for 32bit inode numbers, although
>> that may be the case.
>>
>
> Right... although for most fs I mentioned, default .encode_fh *is* in fact
> a guaranty for 32bit inodes, so we can start with that and add explicit
> number of bits to super block later.
>
>> I'd be more happy with an explicit bit count in struct super_block
>> indicating the number of bits in use.
>>
>> Such an interface wouldn't limit filesystem implementors in any way.
>> They'd just be more flexibly declare the needs they have.
>>
>> Also there is possibly a usecase for a large fs (non 32-bit inum) as
>> overlayfs upper layer.   I would guess it wouldn't be too difficult to
>> add an "inumbits=N" parameter to mkfs.xfs for that case.
>>
>
> Cannot set inumbits, because they are already determined by fs
> geometry (MSB bits are AG #).
> The good news is that you can always know the max inumbits of
> any existing xfs, so it would be possible for admin to mount overlay
> with mount option upperinumbits=.

Right.  That's plenty enough.

> The bad news is that I think any xfs can potentially be grown to
> use all inode bits, so it is unlikely that xfs will be willing to make
> any inumbits commitment in super block besides the already
> supported case of inode32 mount option.

Sure, and I don't think there's a need for xfs to declare an upper
max.  What we need is

 - xfs reports maximum inum based on *current* size
 - overlayfs reports maximum inum allowed for layers based on settings
 - inum space mediator ensures that things remain consistent

It may be a bit more complicated than just a single count in super
block, but I don't see any theoretical problems with such an approach.

And we can start with just a count that is either 32 or 64, since that
is what we have now, and it may be enough for some time yet.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino
  2017-11-02 20:38 ` [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino Amir Goldstein
@ 2017-11-06 20:43   ` Vivek Goyal
  2017-11-06 21:06     ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Vivek Goyal @ 2017-11-06 20:43 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Chandan Rajendra, linux-unionfs

On Thu, Nov 02, 2017 at 10:38:08PM +0200, Amir Goldstein wrote:
> For the case of all layers not on the same fs, return the copy up origin
> inode st_dev/st_ino for non-dir from stat(2).
> 
> This guaranties constant st_dev/st_ino for non-dir across copy up.
> Like the same fs case, st_ino of non-dir is also persistent.
> 
> If the st_dev/st_ino for copied up object would have been the same as
> that of the real underlying lower file, running diff on underlying lower
> file and overlay copied up file would result in diff reporting that the
> two files are equal when in fact, they may have different content.
> 
> Therefore, unlike the same fs case, st_dev is not persistent because it
> uses the unique anonymous bdev allocated for the lower layer.

I have a stupid question. I thought st_dev is allocated dynamically and
is not persistent across reboot. Is that not the case? If yes, then
it is not persistent for same fs case as well. If yes, above does not
make sense.

Vivek

> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/inode.c | 49 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 29 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 607a50c50042..d06de98808f6 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -74,6 +74,7 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>  	struct path realpath;
>  	const struct cred *old_cred;
>  	bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
> +	bool samefs = ovl_same_sb(dentry->d_sb);
>  	int err;
>  
>  	type = ovl_path_real(dentry, &realpath);
> @@ -83,16 +84,13 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>  		goto out;
>  
>  	/*
> -	 * When all layers are on the same fs, all real inode number are
> -	 * unique, so we use the overlay st_dev, which is friendly to du -x.
> -	 *
> -	 * We also use st_ino of the copy up origin, if we know it.
> -	 * This guaranties constant st_dev/st_ino across copy up.
> +	 * For non-dir or same fs, we use st_ino of the copy up origin, if we
> +	 * know it. This guaranties constant st_dev/st_ino across copy up.
>  	 *
>  	 * If filesystem supports NFS export ops, this also guaranties
>  	 * persistent st_ino across mount cycle.
>  	 */
> -	if (ovl_same_sb(dentry->d_sb)) {
> +	if (!is_dir || samefs) {
>  		if (OVL_TYPE_ORIGIN(type)) {
>  			struct kstat lowerstat;
>  			u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
> @@ -103,7 +101,6 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>  			if (err)
>  				goto out;
>  
> -			WARN_ON_ONCE(stat->dev != lowerstat.dev);
>  			/*
>  			 * Lower hardlinks may be broken on copy up to different
>  			 * upper files, so we cannot use the lower origin st_ino
> @@ -115,27 +112,39 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>  			if (is_dir || lowerstat.nlink == 1 ||
>  			    ovl_test_flag(OVL_INDEX, d_inode(dentry)))
>  				stat->ino = lowerstat.ino;
> +
> +			if (samefs)
> +				WARN_ON_ONCE(stat->dev != lowerstat.dev);
> +			else
> +				stat->dev = ovl_get_pseudo_dev(dentry);
>  		}
> -		stat->dev = dentry->d_sb->s_dev;
> -	} else if (is_dir) {
> +		if (samefs) {
> +			/*
> +			 * When all layers are on the same fs, all real inode
> +			 * number are unique, so we use the overlay st_dev,
> +			 * which is friendly to du -x.
> +			 */
> +			stat->dev = dentry->d_sb->s_dev;
> +		} else if (!OVL_TYPE_UPPER(type)) {
> +			/*
> +			 * For non-samefs setup, to make sure that st_dev/st_ino
> +			 * pair is unique across the system, we use a unique
> +			 * anonymous st_dev for lower layer inode.
> +			 */
> +			stat->dev = ovl_get_pseudo_dev(dentry);
> +		}
> +	} else {
>  		/*
> -		 * If not all layers are on the same fs the pair {real st_ino;
> -		 * overlay st_dev} is not unique, so use the non persistent
> -		 * overlay st_ino.
> -		 *
>  		 * Always use the overlay st_dev for directories, so 'find
>  		 * -xdev' will scan the entire overlay mount and won't cross the
>  		 * overlay mount boundaries.
> +		 *
> +		 * If not all layers are on the same fs the pair {real st_ino;
> +		 * overlay st_dev} is not unique, so use the non persistent
> +		 * overlay st_ino for directories.
>  		 */
>  		stat->dev = dentry->d_sb->s_dev;
>  		stat->ino = dentry->d_inode->i_ino;
> -	} else if (!OVL_TYPE_UPPER(type)) {
> -		/*
> -		 * For non-samefs setup, to make sure that st_dev/st_ino pair
> -		 * is unique across the system, we use a unique anonymous
> -		 * st_dev for lower layer inode.
> -		 */
> -		stat->dev = ovl_get_pseudo_dev(dentry);
>  	}
>  
>  	/*
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino
  2017-11-06 20:43   ` Vivek Goyal
@ 2017-11-06 21:06     ` Amir Goldstein
  2017-11-06 21:10       ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 21:06 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Chandan Rajendra, overlayfs

On Mon, Nov 6, 2017 at 10:43 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Nov 02, 2017 at 10:38:08PM +0200, Amir Goldstein wrote:
>> For the case of all layers not on the same fs, return the copy up origin
>> inode st_dev/st_ino for non-dir from stat(2).
>>
>> This guaranties constant st_dev/st_ino for non-dir across copy up.
>> Like the same fs case, st_ino of non-dir is also persistent.
>>
>> If the st_dev/st_ino for copied up object would have been the same as
>> that of the real underlying lower file, running diff on underlying lower
>> file and overlay copied up file would result in diff reporting that the
>> two files are equal when in fact, they may have different content.
>>
>> Therefore, unlike the same fs case, st_dev is not persistent because it
>> uses the unique anonymous bdev allocated for the lower layer.
>
> I have a stupid question. I thought st_dev is allocated dynamically and
> is not persistent across reboot. Is that not the case? If yes, then
> it is not persistent for same fs case as well. If yes, above does not
> make sense.
>

True. That's a garbage text. Will delete it for v8.
Thanks!
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino
  2017-11-06 21:06     ` Amir Goldstein
@ 2017-11-06 21:10       ` Amir Goldstein
  0 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2017-11-06 21:10 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Chandan Rajendra, overlayfs

On Mon, Nov 6, 2017 at 11:06 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 10:43 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Thu, Nov 02, 2017 at 10:38:08PM +0200, Amir Goldstein wrote:
>>> For the case of all layers not on the same fs, return the copy up origin
>>> inode st_dev/st_ino for non-dir from stat(2).
>>>
>>> This guaranties constant st_dev/st_ino for non-dir across copy up.
>>> Like the same fs case, st_ino of non-dir is also persistent.
>>>
>>> If the st_dev/st_ino for copied up object would have been the same as
>>> that of the real underlying lower file, running diff on underlying lower
>>> file and overlay copied up file would result in diff reporting that the
>>> two files are equal when in fact, they may have different content.
>>>
>>> Therefore, unlike the same fs case, st_dev is not persistent because it
>>> uses the unique anonymous bdev allocated for the lower layer.
>>
>> I have a stupid question. I thought st_dev is allocated dynamically and
>> is not persistent across reboot. Is that not the case? If yes, then
>> it is not persistent for same fs case as well. If yes, above does not
>> make sense.
>>
>
> True. That's a garbage text. Will delete it for v8.

Replaced with:

Therefore, unlike the same fs case, st_dev is not uniform across all
overlay object, which is less friednly to du -x.

Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-11-06 21:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02 20:38 [PATCH v7 0/8] Overlayfs: constant st_ino/d_ino for non-samefs Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 1/8] ovl: move include of ovl_entry.h into overlayfs.h Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 2/8] ovl: re-structure overlay lower layers in-memory Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 3/8] ovl: allocate anonymous devs for lowerdirs Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 4/8] ovl: return anonymous st_dev for lower inodes Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 5/8] ovl: relax same fs constraint for constant st_ino Amir Goldstein
2017-11-06 20:43   ` Vivek Goyal
2017-11-06 21:06     ` Amir Goldstein
2017-11-06 21:10       ` Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 6/8] ovl: recalc d_ino for dir cache in non-samefs case Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 7/8] ovl: update cache version of impure parent on rename Amir Goldstein
2017-11-06  9:17   ` Miklos Szeredi
2017-11-06 10:06     ` Amir Goldstein
2017-11-06 10:39       ` Miklos Szeredi
2017-11-06 10:59         ` Amir Goldstein
2017-11-02 20:38 ` [PATCH v7 8/8] ovl: ovl_iterate_real() for all pure upper/lower in non-samefs case Amir Goldstein
2017-11-06  9:35   ` Miklos Szeredi
2017-11-06 10:08     ` Amir Goldstein
2017-11-06 10:54       ` Amir Goldstein
2017-11-06 11:45     ` Amir Goldstein
2017-11-06 13:57       ` Miklos Szeredi
2017-11-06 14:30         ` Amir Goldstein
2017-11-06 14:52           ` Miklos Szeredi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.