* [PATCH v2 00/11] overlayfs constant inode numbers
@ 2017-04-24  9:14 Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 01/11] ovl: store path type in dentry Amir Goldstein
                   ` (14 more replies)
  0 siblings, 15 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

Miklos,

Following your comments on the 'stable inodes' series from last week,
this series fixes constant inode numbers for stat(2) with any layer
configuration.

For the case of all *lower* layers on same fs that supports NFS export,
redirect by file handle will be used to optimize the lookup of the copy
up origin of non-dir inode.

For the case of *all* layers on same fs, overlayfs also gains:
- Persistent inode numbers for directories
- Correct results for du -x

Consistency of stat(2) st_ino with readdir(3) d_ino is NOT addressed by
this series. It will be addressed for the 'samefs' configuration by the
follow up 'stable inode' work, which is also going to address preserving
hardlinks on copy up.

This series is available for testing on [1].
unionmount-testsuite needs a small fix patch for layers_check() [2].
Tested the following layer configurations:
 ./run --ov{,=0,=10} {,--samefs}

Tested constant inode numbers with xfstest overlay/017 and added a check
for persistent directory inode numbers across mount cycle [3].

You have already reviewed most of the patches in this series at one time or
another, and your comments have been addressed. Some other patches are trivial.
The only patches that probably need a closer look are the 2 lookup
patches (5-6).

The implementation of lookup of a merged dir with a combination of redirect
by fh from upper and redirect by name in mid layer is more complicated.
Because this case is not strictly needed for this series, I simplified
things a bit and restricted lookup by fh to these cases:
1. Non directory (lookup of copy up origin)
2. Merge directory when ofs->numlower == 1

This restriction may be relaxed later on if we want to handle lookup by fh
with fallback to lookup by path for merge dirs.

What do you say? ... Too late for v4.12?

Amir.

[1] https://github.com/amir73il/linux/commits/ovl-constino
[2] https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
[3] https://github.com/amir73il/xfstests/commits/overlayfs-devel

Amir Goldstein (11):
  ovl: store path type in dentry
  ovl: cram opaque boolean into type flags
  ovl: check if all layers are on the same fs
  ovl: store file handle of lower inode on copy up
  ovl: lookup redirect by file handle
  ovl: lookup non-dir inode copy up origin
  ovl: set the COPYUP type flag for non-dirs
  ovl: redirect non-dir by path on rename
  ovl: constant st_ino/st_dev across copy up
  ovl: persistent inode number for directories
  ovl: fix du --one-file-system on overlay mount

 fs/overlayfs/copy_up.c   |  98 +++++++++++++++++++++
 fs/overlayfs/dir.c       |  28 +++++-
 fs/overlayfs/inode.c     |  21 ++++-
 fs/overlayfs/namei.c     | 216 +++++++++++++++++++++++++++++++++++++++++------
 fs/overlayfs/overlayfs.h |  23 +++++
 fs/overlayfs/ovl_entry.h |   9 +-
 fs/overlayfs/super.c     |  21 +++++
 fs/overlayfs/util.c      |  83 ++++++++++++++++--
 8 files changed, 461 insertions(+), 38 deletions(-)

-- 
2.7.4


* [PATCH v2 01/11] ovl: store path type in dentry
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24 12:59   ` Vivek Goyal
  2017-04-24  9:14 ` [PATCH v2 02/11] ovl: cram opaque boolean into type flags Amir Goldstein
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

We would like to add more state info to ovl_entry soon (for const ino)
and this state info would be added as type flags.

Store the type value in ovl_entry and update the UPPER and MERGE type
flags when needed, so ovl_path_type() just returns the stored value.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
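Note for readers: the flag rule described in the commit message above
can be sketched in userspace as follows. This is a simplified model and
not the kernel code: struct entry_model and update_type() are
hypothetical names, has_upper stands in for a non-NULL __upperdentry,
and the smp_wmb()/smp_rmb() pairing from the patch is deliberately left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values match the patch; the shorter names are illustrative. */
enum path_type {
	PATH_UPPER = 1 << 0,
	PATH_MERGE = 1 << 1,
};

/* Hypothetical stand-in for struct ovl_entry: only the fields the rule reads. */
struct entry_model {
	bool has_upper;		/* models oe->__upperdentry != NULL */
	unsigned int numlower;	/* models oe->numlower */
	unsigned int type;	/* models oe->__type */
};

/* Recompute UPPER/MERGE and preserve every other bit of the stored type. */
static unsigned int update_type(struct entry_model *e, bool is_dir)
{
	unsigned int type;

	assert(e != NULL);
	type = e->type & ~(unsigned int)(PATH_UPPER | PATH_MERGE);

	if (e->has_upper) {
		type |= PATH_UPPER;
		/* A copied-up non-dir may still hold its lower dentry. */
		if (e->numlower && is_dir)
			type |= PATH_MERGE;
	} else if (e->numlower > 1) {
		type |= PATH_MERGE;
	}
	e->type = type;
	return type;
}
```

Because unrelated bits survive the update, later patches can add more
state flags to the same field without touching this function.
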
 fs/overlayfs/namei.c     |  1 +
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/ovl_entry.h |  3 +++
 fs/overlayfs/super.c     |  1 +
 fs/overlayfs/util.c      | 30 ++++++++++++++++++++++++------
 5 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index b8b0778..8788fd7 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -338,6 +338,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	kfree(stack);
 	kfree(d.redirect);
 	dentry->d_fsdata = oe;
+	ovl_update_type(dentry, d.is_dir);
 	d_add(dentry, inode);
 
 	return NULL;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 741dc0b..e90a548 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -155,6 +155,7 @@ struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
 bool ovl_dentry_remote(struct dentry *dentry);
 bool ovl_dentry_weird(struct dentry *dentry);
 enum ovl_path_type ovl_path_type(struct dentry *dentry);
+enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir);
 void ovl_path_upper(struct dentry *dentry, struct path *path);
 void ovl_path_lower(struct dentry *dentry, struct path *path);
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 59614fa..293be5f 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -31,6 +31,8 @@ struct ovl_fs {
 	wait_queue_head_t copyup_wq;
 };
 
+enum ovl_path_type;
+
 /* private information held for every overlayfs dentry */
 struct ovl_entry {
 	struct dentry *__upperdentry;
@@ -44,6 +46,7 @@ struct ovl_entry {
 		};
 		struct rcu_head rcu;
 	};
+	enum ovl_path_type __type;
 	unsigned numlower;
 	struct path lowerstack[];
 };
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index c072a0c..671bac0 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(stack);
 
 	root_dentry->d_fsdata = oe;
+	ovl_update_type(root_dentry, true);
 
 	realinode = d_inode(ovl_dentry_real(root_dentry));
 	ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 1953986..6a857fb 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
 enum ovl_path_type ovl_path_type(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
-	enum ovl_path_type type = 0;
+	enum ovl_path_type type = oe->__type;
 
-	if (oe->__upperdentry) {
-		type = __OVL_PATH_UPPER;
+	/* Matches smp_wmb() in ovl_update_type() */
+	smp_rmb();
+	return type;
+}
+
+enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	enum ovl_path_type type = oe->__type;
 
+	/* Update UPPER/MERGE flags and preserve the rest */
+	type &= ~(__OVL_PATH_UPPER | __OVL_PATH_MERGE);
+	if (oe->__upperdentry) {
+		type |= __OVL_PATH_UPPER;
 		/*
-		 * Non-dir dentry can hold lower dentry from previous
-		 * location.
+		 * Non-dir dentry can hold lower dentry from before
+		 * copy-up.
 		 */
-		if (oe->numlower && d_is_dir(dentry))
+		if (oe->numlower && is_dir)
 			type |= __OVL_PATH_MERGE;
 	} else {
 		if (oe->numlower > 1)
 			type |= __OVL_PATH_MERGE;
 	}
+	/*
+	 * Make sure type is consistent with __upperdentry before making it
+	 * visible to ovl_path_type().
+	 */
+	smp_wmb();
+	oe->__type = type;
 	return type;
 }
 
@@ -220,6 +237,7 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 	 */
 	smp_wmb();
 	oe->__upperdentry = upperdentry;
+	ovl_update_type(dentry, d_is_dir(dentry));
 }
 
 void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
-- 
2.7.4


* [PATCH v2 02/11] ovl: cram opaque boolean into type flags
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 01/11] ovl: store path type in dentry Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 03/11] ovl: check if all layers are on the same fs Amir Goldstein
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

We are going to add more state info to ovl_entry soon (for const ino)
and this state info would be added as type flags.

It makes sense to treat 'opaque' in a similar way, so instead of using
a boolean member in ovl_entry use a type bit to represent opaqueness.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c     | 9 +++++----
 fs/overlayfs/overlayfs.h | 2 ++
 fs/overlayfs/ovl_entry.h | 1 -
 fs/overlayfs/util.c      | 5 +++--
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 8788fd7..d660177 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -224,7 +224,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct dentry *upperdir, *upperdentry = NULL;
 	unsigned int ctr = 0;
 	struct inode *inode = NULL;
-	bool upperopaque = false;
+	enum ovl_path_type type = 0;
 	char *upperredirect = NULL;
 	struct dentry *this;
 	unsigned int i;
@@ -261,7 +261,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			if (d.redirect[0] == '/')
 				poe = dentry->d_sb->s_root->d_fsdata;
 		}
-		upperopaque = d.opaque;
+		if (d.opaque)
+			type |= __OVL_PATH_OPAQUE;
 	}
 
 	if (!d.stop && poe->numlower) {
@@ -331,7 +332,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	}
 
 	revert_creds(old_cred);
-	oe->opaque = upperopaque;
+	oe->__type = type;
 	oe->redirect = upperredirect;
 	oe->__upperdentry = upperdentry;
 	memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr);
@@ -372,7 +373,7 @@ bool ovl_lower_positive(struct dentry *dentry)
 	 * whiteout.
 	 */
 	if (!dentry->d_inode)
-		return oe->opaque;
+		return OVL_TYPE_OPAQUE(oe->__type);
 
 	/* Negative upper -> positive lower */
 	if (!oe->__upperdentry)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index e90a548..9420101 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -12,10 +12,12 @@
 enum ovl_path_type {
 	__OVL_PATH_UPPER	= (1 << 0),
 	__OVL_PATH_MERGE	= (1 << 1),
+	__OVL_PATH_OPAQUE	= (1 << 2),
 };
 
 #define OVL_TYPE_UPPER(type)	((type) & __OVL_PATH_UPPER)
 #define OVL_TYPE_MERGE(type)	((type) & __OVL_PATH_MERGE)
+#define OVL_TYPE_OPAQUE(type)	((type) & __OVL_PATH_OPAQUE)
 
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 293be5f..12c4922 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -41,7 +41,6 @@ struct ovl_entry {
 		struct {
 			u64 version;
 			const char *redirect;
-			bool opaque;
 			bool copying;
 		};
 		struct rcu_head rcu;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 6a857fb..dce4141 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -181,7 +181,8 @@ void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache)
 bool ovl_dentry_is_opaque(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
-	return oe->opaque;
+
+	return OVL_TYPE_OPAQUE(oe->__type);
 }
 
 bool ovl_dentry_is_whiteout(struct dentry *dentry)
@@ -193,7 +194,7 @@ void ovl_dentry_set_opaque(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 
-	oe->opaque = true;
+	oe->__type |= __OVL_PATH_OPAQUE;
 }
 
 bool ovl_redirect_dir(struct super_block *sb)
-- 
2.7.4


* [PATCH v2 03/11] ovl: check if all layers are on the same fs
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 01/11] ovl: store path type in dentry Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 02/11] ovl: cram opaque boolean into type flags Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

Some features can only work when all lower layers are on the same fs
and some features require that upper layer is also on the same fs.
Test those conditions during mount time, so features can check them later.

Add helper ovl_same_lower_sb() to return the common super block in case
all lower layers are on the same fs and helper ovl_same_sb() to return
the super block common to all layers.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
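Note for readers: the mount-time check described above can be sketched
in userspace as follows. This is a minimal model under stated
assumptions: struct sb_model is a hypothetical stand-in for
struct super_block (only pointer identity matters), and the two helpers
mirror the logic added to ovl_fill_super(), not the accessor functions
themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct super_block; only identity is compared. */
struct sb_model {
	int id;
};

/* Return the sb shared by all lower layers, or NULL if any layer differs. */
static const struct sb_model *same_lower_sb(const struct sb_model *const *lower,
					    size_t numlower)
{
	const struct sb_model *same = NULL;

	assert(lower != NULL);
	for (size_t i = 0; i < numlower; i++) {
		if (i == 0)
			same = lower[i];
		else if (same != lower[i])
			same = NULL;	/* stays NULL for all later layers */
	}
	return same;
}

/* Return the sb shared by upper and all lower layers, or NULL. */
static const struct sb_model *same_sb(const struct sb_model *upper,
				      const struct sb_model *same_lower)
{
	/* Without an upper layer the patch leaves same_sb unset. */
	if (!upper)
		return NULL;
	return upper == same_lower ? same_lower : NULL;
}
```

Once a mismatch resets the candidate to NULL, it cannot match any later
layer, so a single pass over the lower layers is enough.
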
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/ovl_entry.h |  3 +++
 fs/overlayfs/super.c     |  9 +++++++++
 fs/overlayfs/util.c      | 14 ++++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 9420101..48d0dae 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -153,6 +153,8 @@ int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
+struct super_block *ovl_same_lower_sb(struct super_block *sb);
+struct super_block *ovl_same_sb(struct super_block *sb);
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
 bool ovl_dentry_remote(struct dentry *dentry);
 bool ovl_dentry_weird(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 12c4922..41708bf 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -29,6 +29,9 @@ struct ovl_fs {
 	const struct cred *creator_cred;
 	bool tmpfile;
 	wait_queue_head_t copyup_wq;
+	/* sb common to all (or all lower) layers */
+	struct super_block *same_lower_sb;
+	struct super_block *same_sb;
 };
 
 enum ovl_path_type;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 671bac0..b8830ee 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -898,6 +898,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	ufs->lower_mnt = kcalloc(numlower, sizeof(struct vfsmount *), GFP_KERNEL);
 	if (ufs->lower_mnt == NULL)
 		goto out_put_workdir;
+
 	for (i = 0; i < numlower; i++) {
 		struct vfsmount *mnt = clone_private_mount(&stack[i]);
 
@@ -914,11 +915,19 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 		ufs->lower_mnt[ufs->numlower] = mnt;
 		ufs->numlower++;
+
+		/* Check if all lower layers are on same sb */
+		if (i == 0)
+			ufs->same_lower_sb = mnt->mnt_sb;
+		else if (ufs->same_lower_sb != mnt->mnt_sb)
+			ufs->same_lower_sb = NULL;
 	}
 
 	/* If the upper fs is nonexistent, we mark overlayfs r/o too */
 	if (!ufs->upper_mnt)
 		sb->s_flags |= MS_RDONLY;
+	else if (ufs->upper_mnt->mnt_sb == ufs->same_lower_sb)
+		ufs->same_sb = ufs->same_lower_sb;
 
 	if (remote)
 		sb->s_d_op = &ovl_reval_dentry_operations;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index dce4141..43dcdf5 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -41,6 +41,20 @@ const struct cred *ovl_override_creds(struct super_block *sb)
 	return override_creds(ofs->creator_cred);
 }
 
+struct super_block *ovl_same_lower_sb(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	return ofs->same_lower_sb;
+}
+
+struct super_block *ovl_same_sb(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	return ofs->same_sb;
+}
+
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower)
 {
 	size_t size = offsetof(struct ovl_entry, lowerstack[numlower]);
-- 
2.7.4


* [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (2 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 03/11] ovl: check if all layers are on the same fs Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24 13:32   ` kbuild test robot
                     ` (2 more replies)
  2017-04-24  9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
                   ` (10 subsequent siblings)
  14 siblings, 3 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

Sometimes it is interesting to know if an upper file is pure
upper or a copy up target, and if it is a copy up target, it
may be interesting to find the copy up origin.

This will be used to preserve lower inode numbers across copy up.

Store the lower inode file handle in upper inode xattr overlay.fh
on copy up to use it later for these cases.

On failure to encode lower file handle, store an invalid 'null'
handle, so we can always use the overlay.fh xattr to distinguish
between a copy up and a pure upper inode.

If lower fs does not support NFS export ops or if not all lower
layers are on the same fs, don't try to encode a lower file handle
and use the 'null' handle instead.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
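Note for readers: the xattr payload described above (a 4-byte header
followed by the file identifier, with a 'null' handle when encoding is
not possible) can be sketched in userspace as follows. The header layout
and the version/magic values come from struct ovl_fh in this patch, and
0xff is the kernel's FILEID_INVALID. The names fh_pack(), fh_classify()
and enum fh_kind are hypothetical, and the header validation shown here
is an assumption, not the check the series implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FH_VERSION	0	/* OVL_FH_VERSION in the patch */
#define FH_MAGIC	0xfb	/* OVL_FH_MAGIC in the patch */
#define FID_INVALID	0xff	/* value of FILEID_INVALID */
#define FH_HDR_LEN	4	/* version, magic, len, type */

enum fh_kind { FH_BAD, FH_NULL, FH_REAL };

/* Write header + fid into out; return total length, or 0 if it does not fit. */
static size_t fh_pack(uint8_t *out, size_t outlen, uint8_t type,
		      const uint8_t *fid, size_t fidlen)
{
	size_t len = FH_HDR_LEN + fidlen;

	assert(out != NULL);
	/* len is stored in one byte, so it must fit in uint8_t. */
	if (len > outlen || len > UINT8_MAX)
		return 0;
	out[0] = FH_VERSION;
	out[1] = FH_MAGIC;
	out[2] = (uint8_t)len;
	out[3] = type;
	if (fidlen)
		memcpy(out + FH_HDR_LEN, fid, fidlen);
	return len;
}

/* Tell a decodable handle from a 'null' handle or a corrupt xattr value. */
static enum fh_kind fh_classify(const uint8_t *buf, size_t buflen)
{
	if (buflen < FH_HDR_LEN || buf[0] != FH_VERSION ||
	    buf[1] != FH_MAGIC || buf[2] != buflen)
		return FH_BAD;
	/* Copied up, but the origin cannot be found by handle. */
	if (buflen == FH_HDR_LEN || buf[3] == FID_INVALID)
		return FH_NULL;
	return FH_REAL;
}
```

Either FH_NULL or FH_REAL marks the inode as a copy up target; only
FH_REAL can be used to locate the copy up origin.
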
 fs/overlayfs/copy_up.c   | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h | 15 ++++++++
 fs/overlayfs/ovl_entry.h |  2 +
 fs/overlayfs/super.c     | 11 ++++++
 fs/overlayfs/util.c      | 14 +++++++
 5 files changed, 140 insertions(+)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 906ea6c..1a967b9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -20,6 +20,7 @@
 #include <linux/namei.h>
 #include <linux/fdtable.h>
 #include <linux/ratelimit.h>
+#include <linux/exportfs.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -232,6 +233,95 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
+static struct ovl_fh *ovl_get_fh(struct dentry *lower)
+{
+	const struct export_operations *nop = lower->d_sb->s_export_op;
+	struct ovl_fh *fh;
+	int fh_type, fh_len, dwords;
+	void *buf = NULL;
+	void *ret = NULL;
+	int buflen = MAX_HANDLE_SZ;
+	int err;
+
+	/* Do not encode file handle if we cannot decode it later */
+	err = -EOPNOTSUPP;
+	if (!nop || !nop->fh_to_dentry)
+		goto out_err;
+
+	err = -ENOMEM;
+	buf = kmalloc(buflen, GFP_TEMPORARY);
+	if (!buf)
+		goto out_err;
+
+	fh = buf;
+	dwords = (buflen - offsetof(struct ovl_fh, fid)) >> 2;
+	fh_type = exportfs_encode_fh(lower,
+				     (struct fid *)fh->fid,
+				     &dwords, 0);
+	fh_len = (dwords << 2) + offsetof(struct ovl_fh, fid);
+
+	err = -EOVERFLOW;
+	if (fh_len > buflen || fh_type <= 0 || fh_type == FILEID_INVALID)
+		goto out_err;
+
+	fh->version = OVL_FH_VERSION;
+	fh->magic = OVL_FH_MAGIC;
+	fh->type = fh_type;
+	fh->len = fh_len;
+
+	err = -ENOMEM;
+	ret = kmalloc(fh_len, GFP_KERNEL);
+	if (!ret)
+		goto out_err;
+
+	memcpy(ret, buf, fh_len);
+
+	kfree(buf);
+	return ret;
+
+out_err:
+	pr_warn_ratelimited("overlay: failed to get redirect fh (%i)\n", err);
+	kfree(buf);
+	kfree(ret);
+	return ERR_PTR(err);
+}
+
+static struct ovl_fh null_fh = {
+	.version = OVL_FH_VERSION,
+	.magic = OVL_FH_MAGIC,
+	.type = FILEID_INVALID,
+	.len = sizeof(struct ovl_fh),
+};
+
+static int ovl_set_lower_fh(struct dentry *dentry, struct dentry *upper)
+{
+	int err;
+	const struct ovl_fh *fh = NULL;
+
+	if (ovl_redirect_fh(dentry->d_sb))
+		fh = ovl_get_fh(ovl_dentry_lower(dentry));
+	/*
+	 * On failure to encode lower fh, store an invalid 'null' fh, so
+	 * we can always use the overlay.fh xattr to distinguish between
+	 * a copy up and a pure upper inode.  If lower fs does not support
+	 * encoding fh, don't try to encode again.
+	 */
+	err = PTR_ERR(fh);
+	if (IS_ERR_OR_NULL(fh)) {
+		if (err == -EOPNOTSUPP) {
+			pr_warn("overlay: file handle not supported by lower - turning off redirect_fh\n");
+			ovl_clear_redirect_fh(dentry->d_sb);
+		}
+		fh = &null_fh;
+	}
+
+	err = ovl_do_setxattr(upper, OVL_XATTR_FH, fh, fh->len, 0);
+
+	if (fh != &null_fh)
+		kfree(fh);
+	return err;
+}
+
 static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 			      struct dentry *dentry, struct path *lowerpath,
 			      struct kstat *stat, const char *link,
@@ -316,6 +406,14 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	if (err)
 		goto out_cleanup;
 
+	/*
+	 * Store file handle of lower inode in upper inode xattr to
+	 * allow lookup of the copy up origin inode.
+	 */
+	err = ovl_set_lower_fh(dentry, temp);
+	if (err)
+		goto out_cleanup;
+
 	if (tmpfile)
 		err = ovl_do_link(temp, udir, upper, true);
 	else
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 48d0dae..c3cfbc5 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -22,6 +22,7 @@ enum ovl_path_type {
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
 #define OVL_XATTR_REDIRECT OVL_XATTR_PREFIX "redirect"
+#define OVL_XATTR_FH OVL_XATTR_PREFIX "fh"
 
 #define OVL_ISUPPER_MASK 1UL
 
@@ -148,6 +149,18 @@ static inline struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
 	return (struct inode *) (x & ~OVL_ISUPPER_MASK);
 }
 
+/* redirect data format for redirect by file handle */
+struct ovl_fh {
+	unsigned char version;	/* 0 */
+	unsigned char magic;	/* 0xfb */
+	unsigned char len;	/* size of this header + size of fid */
+	unsigned char type;	/* fid_type of fid */
+	unsigned char fid[0];	/* file identifier */
+} __packed;
+
+#define OVL_FH_VERSION	0
+#define OVL_FH_MAGIC	0xfb
+
 /* util.c */
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
@@ -175,6 +188,8 @@ bool ovl_redirect_dir(struct super_block *sb);
 void ovl_clear_redirect_dir(struct super_block *sb);
 const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
+bool ovl_redirect_fh(struct super_block *sb);
+void ovl_clear_redirect_fh(struct super_block *sb);
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
 void ovl_inode_init(struct inode *inode, struct inode *realinode,
 		    bool is_upper);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 41708bf..2172dc5 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -32,6 +32,8 @@ struct ovl_fs {
 	/* sb common to all (or all lower) layers */
 	struct super_block *same_lower_sb;
 	struct super_block *same_sb;
+	/* redirect by file handle */
+	bool redirect_fh;
 };
 
 enum ovl_path_type;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b8830ee..34632ec 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -17,6 +17,7 @@
 #include <linux/statfs.h>
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
+#include <linux/exportfs.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -929,6 +930,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else if (ufs->upper_mnt->mnt_sb == ufs->same_lower_sb)
 		ufs->same_sb = ufs->same_lower_sb;
 
+	/*
+	 * Redirect by file handle is used to find a lower entry in one of the
+	 * lower layers,  so the handle must be unique across all lower layers.
+	 * Therefore, enable redirect by file handle, only if all lower layers
+	 * are on the same sb which supports lookup by file handles.
+	 */
+	if (ufs->same_lower_sb && ufs->same_lower_sb->s_export_op &&
+	    ufs->same_lower_sb->s_export_op->fh_to_dentry)
+		ufs->redirect_fh = true;
+
 	if (remote)
 		sb->s_d_op = &ovl_reval_dentry_operations;
 	else
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 43dcdf5..b3bc117 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -240,6 +240,20 @@ void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect)
 	oe->redirect = redirect;
 }
 
+bool ovl_redirect_fh(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	return ofs->redirect_fh;
+}
+
+void ovl_clear_redirect_fh(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	ofs->redirect_fh = false;
+}
+
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
-- 
2.7.4


* [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (3 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-25  8:10   ` Amir Goldstein
  2017-04-25 15:13   ` Miklos Szeredi
  2017-04-24  9:14 ` [PATCH v2 06/11] ovl: lookup non-dir inode copy up origin Amir Goldstein
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

When overlay.fh xattr is found in a directory inode, instead of lookup
of the dentry in next lower layer by name, first try to get it by calling
exportfs_decode_fh().

If lookup by file handle in the lower layer fails, fall back to lookup
by name with or without path redirect.

For now we only support following by file handle from upper if there is a
single lower layer, because fallback from lookup by file handle to lookup
by path in mid layers is not yet implemented.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
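Note for readers: the lookup order described above (handle first, name
as fallback, handle ignored with more than one lower layer) can be
sketched in userspace as follows. This is only a model: struct obj_model
and all three functions are hypothetical, a flat array stands in for the
lower layer, and a failed handle search stands in for -ESTALE from
exportfs_decode_fh().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a lower-layer object: a stable handle plus a name. */
struct obj_model {
	unsigned long handle;
	const char *name;
};

static const struct obj_model *find_by_handle(const struct obj_model *objs,
					      size_t n, unsigned long handle)
{
	for (size_t i = 0; i < n; i++)
		if (objs[i].handle == handle)
			return &objs[i];
	return NULL;	/* models a stale handle */
}

static const struct obj_model *find_by_name(const struct obj_model *objs,
					    size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(objs[i].name, name) == 0)
			return &objs[i];
	return NULL;
}

/* handle may be NULL, which models an upper dentry without overlay.fh. */
static const struct obj_model *lookup_lower(const struct obj_model *objs,
					    size_t n, unsigned int numlower,
					    const unsigned long *handle,
					    const char *name)
{
	assert(name != NULL);

	/* The patch only follows a handle with a single lower layer. */
	if (numlower > 1)
		handle = NULL;

	if (handle) {
		const struct obj_model *found;

		found = find_by_handle(objs, n, *handle);
		if (found)
			return found;
		/* Stale handle: fall through to lookup by name. */
	}
	return find_by_name(objs, n, name);
}
```

A valid handle wins over the name, so in this model the origin is still
found when its name no longer matches.
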
 fs/overlayfs/namei.c     | 185 +++++++++++++++++++++++++++++++++++++++++++----
 fs/overlayfs/overlayfs.h |   1 +
 fs/overlayfs/util.c      |  14 ++++
 3 files changed, 186 insertions(+), 14 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index d660177..0d1cc8f 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -9,9 +9,11 @@
 
 #include <linux/fs.h>
 #include <linux/cred.h>
+#include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/xattr.h>
 #include <linux/ratelimit.h>
+#include <linux/exportfs.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -21,7 +23,10 @@ struct ovl_lookup_data {
 	bool opaque;
 	bool stop;
 	bool last;
-	char *redirect;
+	bool by_path;		/* redirect by path */
+	bool by_fh;		/* redirect by file handle */
+	char *redirect;		/* path to follow */
+	struct ovl_fh *fh;	/* file handle to follow */
 };
 
 static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
@@ -81,6 +86,42 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
 	goto err_free;
 }
 
+static int ovl_check_redirect_fh(struct dentry *dentry,
+				 struct ovl_lookup_data *d)
+{
+	int res;
+	void *buf = NULL;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_FH, NULL, 0);
+	if (res < 0) {
+		if (res == -ENODATA || res == -EOPNOTSUPP)
+			return 0;
+		goto fail;
+	}
+	buf = kzalloc(res, GFP_TEMPORARY);
+	if (!buf)
+		return -ENOMEM;
+
+	if (res == 0)
+		goto fail;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_FH, buf, res);
+	if (res < 0 || !ovl_redirect_fh_ok(buf, res))
+		goto fail;
+
+	kfree(d->fh);
+	d->fh = buf;
+
+	return 0;
+
+err_free:
+	kfree(buf);
+	return 0;
+fail:
+	pr_warn_ratelimited("overlayfs: failed to get file handle (%i)\n", res);
+	goto err_free;
+}
+
 static bool ovl_is_opaquedir(struct dentry *dentry)
 {
 	int res;
@@ -96,22 +137,81 @@ static bool ovl_is_opaquedir(struct dentry *dentry)
 	return false;
 }
 
+/* Check if p1 is connected with a chain of hashed dentries to p2 */
+static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
+{
+	struct dentry *p;
+
+	for (p = p2; !IS_ROOT(p); p = p->d_parent) {
+		if (d_unhashed(p))
+			return false;
+		if (p->d_parent == p1)
+			return true;
+	}
+	return false;
+}
+
+/* Check if dentry is reachable from mnt via path lookup */
+static int ovl_dentry_under_mnt(void *ctx, struct dentry *dentry)
+{
+	struct vfsmount *mnt = ctx;
+
+	return ovl_is_lookable(mnt->mnt_root, dentry);
+}
+
+static struct dentry *ovl_lookup_fh(struct vfsmount *mnt,
+				    const struct ovl_fh *fh)
+{
+	int bytes = (fh->len - offsetof(struct ovl_fh, fid));
+
+	/*
+	 * When redirect_fh is disabled, 'invalid' file handles are stored
+	 * to indicate that this entry has been copied up.
+	 */
+	if (!bytes || (int)fh->type == FILEID_INVALID)
+		return ERR_PTR(-ESTALE);
+
+	/*
+	 * Several layers can be on the same fs and decoded dentry may be in
+	 * either one of those layers. We are looking for a match of dentry
+	 * and mnt to find out to which layer the decoded dentry belongs.
+	 */
+	return exportfs_decode_fh(mnt, (struct fid *)fh->fid,
+				  bytes >> 2, (int)fh->type,
+				  ovl_dentry_under_mnt, mnt);
+}
+
 static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 			     const char *name, unsigned int namelen,
 			     size_t prelen, const char *post,
-			     struct dentry **ret)
+			     struct vfsmount *mnt, struct dentry **ret)
 {
 	struct dentry *this;
 	int err;
 
-	this = lookup_one_len_unlocked(name, base, namelen);
+	/*
+	 * Lookup of upper is with null d->fh.
+	 * Lookup of lower is either by_fh with non-null d->fh
+	 * or by_path with null d->fh.
+	 */
+	if (d->fh)
+		this = ovl_lookup_fh(mnt, d->fh);
+	else
+		this = lookup_one_len_unlocked(name, base, namelen);
 	if (IS_ERR(this)) {
 		err = PTR_ERR(this);
 		this = NULL;
 		if (err == -ENOENT || err == -ENAMETOOLONG)
 			goto out;
+		if (d->fh && err == -ESTALE)
+			goto out;
 		goto out_err;
 	}
+
+	/* If found by file handle - don't follow that handle again */
+	kfree(d->fh);
+	d->fh = NULL;
+
 	if (!this->d_inode)
 		goto put_and_out;
 
@@ -135,9 +235,18 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 		d->stop = d->opaque = true;
 		goto out;
 	}
-	err = ovl_check_redirect(this, d, prelen, post);
-	if (err)
-		goto out_err;
+	if (d->last)
+		goto out;
+	if (d->by_path) {
+		err = ovl_check_redirect(this, d, prelen, post);
+		if (err)
+			goto out_err;
+	}
+	if (d->by_fh) {
+		err = ovl_check_redirect_fh(this, d);
+		if (err)
+			goto out_err;
+	}
 out:
 	*ret = this;
 	return 0;
@@ -152,6 +261,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 	return err;
 }
 
+static int ovl_lookup_layer_fh(struct path *path, struct ovl_lookup_data *d,
+			       struct dentry **ret)
+{
+	return ovl_lookup_single(path->dentry, d, "", 0, 0, "", path->mnt, ret);
+}
+
 static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 			    struct dentry **ret)
 {
@@ -162,7 +277,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 
 	if (d->name.name[0] != '/')
 		return ovl_lookup_single(base, d, d->name.name, d->name.len,
-					 0, "", ret);
+					 0, "", NULL, ret);
 
 	while (!IS_ERR_OR_NULL(base) && d_can_lookup(base)) {
 		const char *s = d->name.name + d->name.len - rem;
@@ -175,7 +290,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 			return -EIO;
 
 		err = ovl_lookup_single(base, d, s, thislen,
-					d->name.len - rem, next, &base);
+					d->name.len - rem, next, NULL, &base);
 		dput(dentry);
 		if (err)
 			return err;
@@ -220,6 +335,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	const struct cred *old_cred;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 	struct ovl_entry *poe = dentry->d_parent->d_fsdata;
+	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
 	struct path *stack = NULL;
 	struct dentry *upperdir, *upperdentry = NULL;
 	unsigned int ctr = 0;
@@ -235,7 +351,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		.opaque = false,
 		.stop = false,
 		.last = !poe->numlower,
+		.by_path = true,
 		.redirect = NULL,
+		.by_fh = true,
+		.fh = NULL,
 	};
 
 	if (dentry->d_name.len > ofs->namelen)
@@ -259,13 +378,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			if (!upperredirect)
 				goto out_put_upper;
 			if (d.redirect[0] == '/')
-				poe = dentry->d_sb->s_root->d_fsdata;
+				poe = roe;
 		}
 		if (d.opaque)
 			type |= __OVL_PATH_OPAQUE;
 	}
 
-	if (!d.stop && poe->numlower) {
+	/*
+	 * For now we only support lower by fh in single layer, because
+	 * fallback from lookup by fh to lookup by path in mid layers for
+	 * merge directory is not yet implemented.
+	 */
+	if (!ofs->redirect_fh || ofs->numlower > 1) {
+		kfree(d.fh);
+		d.fh = NULL;
+	}
+
+	if (!d.stop && (poe->numlower || d.fh)) {
 		err = -ENOMEM;
 		stack = kcalloc(ofs->numlower, sizeof(struct path),
 				GFP_TEMPORARY);
@@ -273,6 +402,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			goto out_put_upper;
 	}
 
+	/* Try to lookup lower layers by file handle */
+	d.by_path = false;
+	for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
+		struct path lowerpath = poe->lowerstack[i];
+
+		d.last = i == poe->numlower - 1;
+		err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
+		if (err)
+			goto out_put;
+
+		if (!this)
+			continue;
+
+		stack[ctr].dentry = this;
+		stack[ctr].mnt = lowerpath.mnt;
+		ctr++;
+		/*
+		 * Found by fh - won't lookup by path.
+		 * TODO: set d.redirect to dentry_path(this),
+		 *       so lookup can continue by path.
+		 */
+		d.stop = true;
+	}
+
+	/* Fallback to lookup lower layers by path */
+	d.by_path = true;
+	d.by_fh = false;
+	kfree(d.fh);
+	d.fh = NULL;
 	for (i = 0; !d.stop && i < poe->numlower; i++) {
 		struct path lowerpath = poe->lowerstack[i];
 
@@ -291,10 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		if (d.stop)
 			break;
 
-		if (d.redirect &&
-		    d.redirect[0] == '/' &&
-		    poe != dentry->d_sb->s_root->d_fsdata) {
-			poe = dentry->d_sb->s_root->d_fsdata;
+		if (d.redirect && d.redirect[0] == '/' && poe != roe) {
+			poe = roe;
 
 			/* Find the current layer on the root dentry */
 			for (i = 0; i < poe->numlower; i++)
@@ -354,6 +510,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	dput(upperdentry);
 	kfree(upperredirect);
 out:
+	kfree(d.fh);
 	kfree(d.redirect);
 	revert_creds(old_cred);
 	return ERR_PTR(err);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c3cfbc5..08002ce 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -190,6 +190,7 @@ const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
 bool ovl_redirect_fh(struct super_block *sb);
 void ovl_clear_redirect_fh(struct super_block *sb);
+bool ovl_redirect_fh_ok(const char *redirect, size_t size);
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
 void ovl_inode_init(struct inode *inode, struct inode *realinode,
 		    bool is_upper);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index b3bc117..dba9753 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -254,6 +254,20 @@ void ovl_clear_redirect_fh(struct super_block *sb)
 	ofs->redirect_fh = false;
 }
 
+bool ovl_redirect_fh_ok(const char *redirect, size_t size)
+{
+	struct ovl_fh *fh = (void *)redirect;
+
+	if (size < sizeof(struct ovl_fh) || size < fh->len)
+		return false;
+
+	if (fh->version > OVL_FH_VERSION ||
+	    fh->magic != OVL_FH_MAGIC)
+		return false;
+
+	return true;
+}
+
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
-- 
2.7.4

* [PATCH v2 06/11] ovl: lookup non-dir inode copy up origin
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (4 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs Amir Goldstein
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

When a non-directory upper has an overlay.fh xattr, look it up in the
lower layers by file handle or by path to find the copy up origin inode.

Until this change a non-dir dentry could have had oe->numlower == 1
with oe->lowerstack[0] pointing at the copy up origin path, right
after copy up, but not when a non-dir dentry was created by ovl_lookup().

After this change, a non-dir dentry could be pointing at the copy up
origin after ovl_lookup(), as long as the copy up was done by overlayfs
that had redirect_fh support.

Non-dir entries that were copied up by overlayfs without redirect_fh
support will look the same as pure upper non-dir entries.

This is going to be used for persistent inode numbers across copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 0d1cc8f..318092a 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -225,15 +225,16 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 		goto put_and_out;
 	}
 	if (!d_can_lookup(this)) {
-		d->stop = true;
-		if (d->is_dir)
+		if (d->is_dir) {
+			d->stop = true;
 			goto put_and_out;
-		goto out;
-	}
-	d->is_dir = true;
-	if (!d->last && ovl_is_opaquedir(this)) {
-		d->stop = d->opaque = true;
-		goto out;
+		}
+	} else {
+		d->is_dir = true;
+		if (!d->last && ovl_is_opaquedir(this)) {
+			d->stop = d->opaque = true;
+			goto out;
+		}
 	}
 	if (d->last)
 		goto out;
@@ -247,6 +248,9 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 		if (err)
 			goto out_err;
 	}
+	/* No redirect for non-dir means pure upper */
+	if (!d->is_dir)
+		d->stop = !d->fh && !d->redirect;
 out:
 	*ret = this;
 	return 0;
@@ -385,11 +389,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	}
 
 	/*
-	 * For now we only support lower by fh in single layer, because
-	 * fallback from lookup by fh to lookup by path in mid layers for
-	 * merge directory is not yet implemented.
+	 * For now we only support lookup by fh in single layer for directory,
+	 * because fallback from lookup by fh to lookup by path in mid layers
+	 * for merge directory is not yet implemented.
 	 */
-	if (!ofs->redirect_fh || ofs->numlower > 1) {
+	if (!ofs->redirect_fh || (d.is_dir && ofs->numlower > 1)) {
 		kfree(d.fh);
 		d.fh = NULL;
 	}
@@ -402,7 +406,6 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			goto out_put_upper;
 	}
 
-	/* Try to lookup lower layers by file handle */
 	d.by_path = false;
 	for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
 		struct path lowerpath = poe->lowerstack[i];
@@ -446,7 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		stack[ctr].mnt = lowerpath.mnt;
 		ctr++;
 
-		if (d.stop)
+		/* Do not follow non-dir copy up origin more than once */
+		if (d.stop || !d.is_dir)
 			break;
 
 		if (d.redirect && d.redirect[0] == '/' && poe != roe) {
-- 
2.7.4

* [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (5 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 06/11] ovl: lookup non-dir inode copy up origin Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-26 14:40   ` Miklos Szeredi
  2017-04-24  9:14 ` [PATCH v2 08/11] ovl: redirect non-dir by path on rename Amir Goldstein
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
Define a new type flag OVL_TYPE_COPYUP to indicate that an entry is
a target of a copy up.

For directory entries COPYUP = MERGE && UPPER. For non-dir entries
non zero oe->numlower implies COPYUP, but COPYUP does not imply
non zero oe->numlower.  COPYUP can also be set on lookup when detecting
an overlay.fh xattr on a non-dir, even if that fh cannot be followed.
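
The flag rules described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not the
kernel code: struct model_entry, its fields and model_path_type() are
hypothetical names, and the has_fh_xattr field folds the lookup-time
check of overlay.fh into the same function for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in the overlayfs.h hunk of this patch. */
enum ovl_path_type {
	__OVL_PATH_UPPER  = (1 << 0),
	__OVL_PATH_MERGE  = (1 << 1),
	__OVL_PATH_OPAQUE = (1 << 2),
	__OVL_PATH_COPYUP = (1 << 3),
};

/* Hypothetical stand-in for the parts of struct ovl_entry used here. */
struct model_entry {
	bool has_upper;        /* models oe->__upperdentry != NULL */
	unsigned int numlower; /* models oe->numlower */
	bool has_fh_xattr;     /* models overlay.fh seen on upper at lookup */
};

static unsigned int model_path_type(const struct model_entry *oe, bool is_dir)
{
	unsigned int type = 0;

	assert(oe != NULL);
	if (oe->has_upper) {
		type |= __OVL_PATH_UPPER;
		/* Non zero numlower implies COPYUP; for dirs also MERGE. */
		if (oe->numlower) {
			type |= __OVL_PATH_COPYUP;
			if (is_dir)
				type |= __OVL_PATH_MERGE;
		}
		/* COPYUP without numlower: fh xattr present, not followed. */
		if (oe->has_fh_xattr)
			type |= __OVL_PATH_COPYUP;
	} else if (oe->numlower > 1) {
		type |= __OVL_PATH_MERGE;
	}
	return type;
}
```

In this model a copied up non-dir with numlower == 1 and a non-dir whose
overlay.fh could not be followed (numlower == 0) both end up with
UPPER | COPYUP, while a pure upper entry gets UPPER only.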

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c     |  3 +++
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/util.c      | 12 ++++++++----
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 318092a..73a8879 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -386,6 +386,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 		if (d.opaque)
 			type |= __OVL_PATH_OPAQUE;
+		/* overlay.fh xattr implies this is a copy up */
+		if (d.fh)
+			type |= __OVL_PATH_COPYUP;
 	}
 
 	/*
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 08002ce..d0bb538 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -13,11 +13,13 @@ enum ovl_path_type {
 	__OVL_PATH_UPPER	= (1 << 0),
 	__OVL_PATH_MERGE	= (1 << 1),
 	__OVL_PATH_OPAQUE	= (1 << 2),
+	__OVL_PATH_COPYUP	= (1 << 3),
 };
 
 #define OVL_TYPE_UPPER(type)	((type) & __OVL_PATH_UPPER)
 #define OVL_TYPE_MERGE(type)	((type) & __OVL_PATH_MERGE)
 #define OVL_TYPE_OPAQUE(type)	((type) & __OVL_PATH_OPAQUE)
+#define OVL_TYPE_COPYUP(type)	((type) & __OVL_PATH_COPYUP)
 
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index dba9753..89789bc 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -101,11 +101,15 @@ enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
 	if (oe->__upperdentry) {
 		type |= __OVL_PATH_UPPER;
 		/*
-		 * Non-dir dentry can hold lower dentry from before
-		 * copy-up.
+		 * oe->numlower implies a copy up, but copy up does not imply
+		 * oe->numlower.  It can also be set on lookup when detecting
+		 * an overlay.fh xattr on a non-dir that cannot be followed.
 		 */
-		if (oe->numlower && is_dir)
-			type |= __OVL_PATH_MERGE;
+		if (oe->numlower) {
+			type |= __OVL_PATH_COPYUP;
+			if (is_dir)
+				type |= __OVL_PATH_MERGE;
+		}
 	} else {
 		if (oe->numlower > 1)
 			type |= __OVL_PATH_MERGE;
-- 
2.7.4

* [PATCH v2 08/11] ovl: redirect non-dir by path on rename
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (6 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 09/11] ovl: constant st_ino/st_dev across copy up Amir Goldstein
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

When a non-dir COPYUP type entry is being renamed, set its
overlay.redirect xattr, just the same as when renaming a lower
or merge directory.

This will be used to find the copy up original of non-dir inodes
in case the lower layers do not support lookup by file handle.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 6515796..edfe3df 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -795,6 +795,13 @@ static bool ovl_type_merge_or_lower(struct dentry *dentry)
 	return OVL_TYPE_MERGE(type) || !OVL_TYPE_UPPER(type);
 }
 
+static bool ovl_type_copyup(struct dentry *dentry)
+{
+	enum ovl_path_type type = ovl_path_type(dentry);
+
+	return OVL_TYPE_COPYUP(type);
+}
+
 static bool ovl_can_move(struct dentry *dentry)
 {
 	return ovl_redirect_dir(dentry->d_sb) ||
@@ -1022,6 +1029,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			err = ovl_set_opaque(old, olddentry);
 		if (err)
 			goto out_dput;
+	} else if (ovl_type_copyup(old)) {
+		err = ovl_set_redirect(old, samedir);
 	}
 	if (!overwrite && new_is_dir) {
 		if (ovl_type_merge_or_lower(new))
@@ -1030,6 +1039,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			err = ovl_set_opaque(new, newdentry);
 		if (err)
 			goto out_dput;
+	} else if (!overwrite && ovl_type_copyup(new)) {
+		err = ovl_set_redirect(new, samedir);
 	}
 
 	err = ovl_do_rename(old_upperdir->d_inode, olddentry,
-- 
2.7.4

* [PATCH v2 09/11] ovl: constant st_ino/st_dev across copy up
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (7 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 08/11] ovl: redirect non-dir by path on rename Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 10/11] ovl: persistent and constant inode number for directories Amir Goldstein
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

When getting attributes for overlay inode of path type COPYUP,
get the inode and dev numbers from the copy up origin inode.

This results in constant and persistent st_ino/st_dev representation
of files in overlay mount before and after copy up as well as after
mount cycle.
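
As a rough userspace sketch of the rule above (hypothetical names, not
the kernel implementation): stat values come from the real inode first
and are then overridden from the copy up origin when the entry is of
type COPYUP and a lower inode is known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the few kstat/inode fields involved. */
struct model_stat {
	unsigned long dev;
	unsigned long ino;
};

struct model_inode {
	unsigned long dev; /* models d_sb->s_dev */
	unsigned long ino; /* models d_inode->i_ino */
};

/*
 * real:  the inode that vfs_getattr() would be called on (upper if it
 *        exists, otherwise lower).
 * lower: the copy up origin, or NULL if it is not known.
 */
static void model_getattr(const struct model_inode *real,
			  const struct model_inode *lower,
			  bool type_copyup, struct model_stat *stat)
{
	assert(real != NULL && stat != NULL);

	stat->dev = real->dev;
	stat->ino = real->ino;

	/* COPYUP with a known origin: report the origin's numbers. */
	if (type_copyup && lower) {
		stat->dev = lower->dev;
		stat->ino = lower->ino;
	}
}
```

With this model, a file reports the lower dev/ino both before copy up
(real == lower) and after copy up (real == upper, origin still known),
which is the "constant" property the patch is after.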

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 17b8418..3615a52 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -60,15 +60,25 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 static int ovl_getattr(const struct path *path, struct kstat *stat,
 		       u32 request_mask, unsigned int flags)
 {
-	struct dentry *dentry = path->dentry;
+	struct dentry *lower, *dentry = path->dentry;
 	struct path realpath;
 	const struct cred *old_cred;
+	enum ovl_path_type type;
 	int err;
 
-	ovl_path_real(dentry, &realpath);
+	type = ovl_path_real(dentry, &realpath);
 	old_cred = ovl_override_creds(dentry->d_sb);
 	err = vfs_getattr(&realpath, stat, request_mask, flags);
 	revert_creds(old_cred);
+	if (err)
+		return err;
+
+	lower = ovl_dentry_lower(dentry);
+	if (OVL_TYPE_COPYUP(type) && lower) {
+		stat->dev = lower->d_sb->s_dev;
+		stat->ino = lower->d_inode->i_ino;
+	}
+
 	return err;
 }
 
-- 
2.7.4

* [PATCH v2 10/11] ovl: persistent and constant inode number for directories
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (8 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 09/11] ovl: constant st_ino/st_dev across copy up Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24  9:14 ` [PATCH v2 11/11] ovl: fix du --one-file-system on overlay mount Amir Goldstein
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

stat(2) on overlay directories reports the overlay temp inode
number, which is constant across copy up, but is not persistent.

When all layers are on the same fs, report the uppermost lower inode
(a.k.a. stable inode) number for directories.

This inode number is persistent, unique across the overlay mount and
constant across copy up.
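
The selection of the directory inode number can be sketched as follows.
This is a simplified userspace model with hypothetical names
(struct model_dir, model_dir_ino()); it assumes, as in the hunk below,
that stat->ino already holds the real inode number from vfs_getattr()
before the override is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what ovl_dir_getattr() has at hand. */
struct model_dir {
	bool upper;                /* OVL_TYPE_UPPER(type) */
	bool merge;                /* OVL_TYPE_MERGE(type) */
	unsigned long overlay_ino; /* non persistent overlay inode number */
	unsigned long real_ino;    /* ino of the real dir (upper or lower) */
	unsigned long lower_ino;   /* ino of the uppermost lower dir */
};

static unsigned long model_dir_ino(const struct model_dir *d, bool same_sb)
{
	assert(d != NULL);

	/* Layers on different fs: real ino is not unique, use overlay ino. */
	if (!same_sb)
		return d->overlay_ino;
	/* Merged with an upper dir: report the stable (lower) ino. */
	if (d->upper && d->merge)
		return d->lower_ino;
	/* Pure upper or lower only: the real ino is already stable. */
	return d->real_ino;
}
```

For a lower-only directory the real ino is the lower ino, so the value
reported in the samefs case stays the same once the directory is copied
up and becomes a merge directory.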

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index edfe3df..6106649 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -154,8 +154,23 @@ static int ovl_dir_getattr(const struct path *path, struct kstat *stat,
 	if (err)
 		return err;
 
+	/*
+	 * Always use the overlay bdev for directories, so 'find -xdev' will
+	 * scan the entire overlay mount and won't cross the overlay mount
+	 * boundaries.
+	 */
 	stat->dev = dentry->d_sb->s_dev;
-	stat->ino = dentry->d_inode->i_ino;
+	/*
+	 * When all layers are not on the same fs, the pair real inode numbers
+	 * and overlay bdev is not unique, so use the non persistent overlay
+	 * inode number.
+	 * When all layers are on the same fs, use the stable inode number,
+	 * which is persistent, unique and constant across copy up.
+	 */
+	if (!ovl_same_sb(dentry->d_sb))
+		stat->ino = dentry->d_inode->i_ino;
+	else if (OVL_TYPE_UPPER(type) && OVL_TYPE_MERGE(type))
+		stat->ino = ovl_dentry_lower(dentry)->d_inode->i_ino;
 
 	/*
 	 * It's probably not worth it to count subdirs to get the
-- 
2.7.4

* [PATCH v2 11/11] ovl: fix du --one-file-system on overlay mount
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (9 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 10/11] ovl: persistent and constant inode number for directories Amir Goldstein
@ 2017-04-24  9:14 ` Amir Goldstein
  2017-04-24 18:40 ` [PATCH v2 12/12] ovl: persistent inode numbers for hardlinks Amir Goldstein
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

Overlay directory inodes report overlay bdev to stat(2).
Overlay non-dir inodes report real bdev and real ino to stat(2).

Due to the different bdev values for dir and non-dir inodes, when executing
the command du -x on an overlay mount, the result is wrong because non-dirs
are not accounted for in the overlay bdev usage.

The reasons for this bdev inconsistency are:
1. The overlay ino is not persistent, so real ino is used for non-dirs
2. The tuple overlay bdev and real ino is not unique, so real bdev is
   used for non-dirs

In case all overlay layers are on the same underlying fs, the tuple
from reason 2 above is unique, so use this tuple for non-dirs to get
the correct result from du -x.
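
To illustrate the effect described above, here is a deliberately
simplified userspace model. It is not how du(1) is implemented; it only
assumes that a 'du -x' style walk counts an entry when its st_dev
matches the st_dev of the mount root. All names (struct model_node,
model_st_dev(), model_du_x()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one overlay inode as seen by stat(2). */
struct model_node {
	bool is_dir;
	unsigned long real_dev; /* st_dev of the underlying real inode */
	unsigned long blocks;   /* usage attributed to this node */
};

/*
 * st_dev reported by the overlay: directories always report the overlay
 * bdev; non-dirs report it only when all layers are on the same fs.
 */
static unsigned long model_st_dev(const struct model_node *n, bool same_sb,
				  unsigned long overlay_dev)
{
	if (n->is_dir || same_sb)
		return overlay_dev;
	return n->real_dev;
}

/* Count only nodes whose st_dev matches the root directory's st_dev. */
static unsigned long model_du_x(const struct model_node *nodes, size_t count,
				bool same_sb, unsigned long overlay_dev)
{
	unsigned long total = 0;
	size_t i;

	assert(nodes != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (model_st_dev(&nodes[i], same_sb, overlay_dev) == overlay_dev)
			total += nodes[i].blocks;
	return total;
}
```

Passing same_sb == false to the model reproduces the behavior before
this patch, where only directories are accounted for; same_sb == true
corresponds to the samefs case after the patch.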

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3615a52..39c3bb0 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -78,6 +78,13 @@ static int ovl_getattr(const struct path *path, struct kstat *stat,
 		stat->dev = lower->d_sb->s_dev;
 		stat->ino = lower->d_inode->i_ino;
 	}
+	/*
+	 * When all layers are on same fs, the tuple overlay bdev
+	 * and real inode ino is unique, so it is preferred to expose
+	 * overlay bdev for overlay inodes for things like du -x.
+	 */
+	if (ovl_same_sb(dentry->d_sb))
+		stat->dev = dentry->d_sb->s_dev;
 
 	return err;
 }
-- 
2.7.4

* Re: [PATCH v2 01/11] ovl: store path type in dentry
  2017-04-24  9:14 ` [PATCH v2 01/11] ovl: store path type in dentry Amir Goldstein
@ 2017-04-24 12:59   ` Vivek Goyal
  2017-04-24 13:10     ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-24 12:59 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:

[..]
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index c072a0c..671bac0 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>  	kfree(stack);
>  
>  	root_dentry->d_fsdata = oe;
> +	ovl_update_type(root_dentry, true);
>  
>  	realinode = d_inode(ovl_dentry_real(root_dentry));
>  	ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 1953986..6a857fb 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
>  enum ovl_path_type ovl_path_type(struct dentry *dentry)
>  {
>  	struct ovl_entry *oe = dentry->d_fsdata;
> -	enum ovl_path_type type = 0;
> +	enum ovl_path_type type = oe->__type;
>  
> -	if (oe->__upperdentry) {
> -		type = __OVL_PATH_UPPER;
> +	/* Matches smp_wmb() in ovl_update_type() */
> +	smp_rmb();
> +	return type;

Hi Amir,

I never manage to understand barriers so I will ask. Why is this barrier
required, and what can go wrong if we don't use it?

Vivek

* Re: [PATCH v2 01/11] ovl: store path type in dentry
  2017-04-24 12:59   ` Vivek Goyal
@ 2017-04-24 13:10     ` Amir Goldstein
  2017-04-24 13:36       ` Vivek Goyal
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 13:10 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 3:59 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:
>
> [..]
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index c072a0c..671bac0 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >       kfree(stack);
> >
> >       root_dentry->d_fsdata = oe;
> > +     ovl_update_type(root_dentry, true);
> >
> >       realinode = d_inode(ovl_dentry_real(root_dentry));
> >       ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 1953986..6a857fb 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
> >  enum ovl_path_type ovl_path_type(struct dentry *dentry)
> >  {
> >       struct ovl_entry *oe = dentry->d_fsdata;
> > -     enum ovl_path_type type = 0;
> > +     enum ovl_path_type type = oe->__type;
> >
> > -     if (oe->__upperdentry) {
> > -             type = __OVL_PATH_UPPER;
> > +     /* Matches smp_wmb() in ovl_update_type() */
> > +     smp_rmb();
> > +     return type;
>
> Hi Amir,
>
> I never manage to understand barriers so I will ask. Why is this barrier
> required, and what can go wrong if we don't use it?
>

Hi Vivek,

Miklos was kind enough to answer that question for me when he made
the comment about the missing memory barrier on v1 of the patch:
http://www.spinics.net/lists/linux-unionfs/msg01687.html

Whether or not I got it right, we shall see shortly ;-)

Amir.

* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-24  9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
@ 2017-04-24 13:32   ` kbuild test robot
  2017-04-24 13:57     ` Amir Goldstein
  2017-04-25 14:53   ` Miklos Szeredi
  2017-04-26  9:39   ` Miklos Szeredi
  2 siblings, 1 reply; 69+ messages in thread
From: kbuild test robot @ 2017-04-24 13:32 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: kbuild-all, Miklos Szeredi, Vivek Goyal, Al Viro, linux-unionfs,
	linux-fsdevel

Hi Amir,

[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on v4.11-rc8 next-20170424]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/overlayfs-constant-inode-numbers/20170424-175555
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next


coccinelle warnings: (new ones prefixed by >>)

>> fs/overlayfs/copy_up.c:309:7-14: ERROR: PTR_ERR applied after initialization to constant on line 299

vim +309 fs/overlayfs/copy_up.c

   293		.len = sizeof(struct ovl_fh),
   294	};
   295	
   296	static int ovl_set_lower_fh(struct dentry *dentry, struct dentry *upper)
   297	{
   298		int err;
 > 299		const struct ovl_fh *fh = NULL;
   300	
   301		if (ovl_redirect_fh(dentry->d_sb))
   302			fh = ovl_get_fh(ovl_dentry_lower(dentry));
   303		/*
   304		 * On failure to encode lower fh, store an invalid 'null' fh, so
   305		 * we can always use the overlay.fh xattr to distignuish between
   306		 * a copy up and a pure upper inode.  If lower fs does not support
   307		 * encoding fh, don't try to encode again.
   308		 */
 > 309		err = PTR_ERR(fh);
   310		if (IS_ERR_OR_NULL(fh)) {
   311			if (err == -EOPNOTSUPP) {
   312				pr_warn("overlay: file handle not supported by lower - turning off redirect_fh\n");

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

* Re: [PATCH v2 01/11] ovl: store path type in dentry
  2017-04-24 13:10     ` Amir Goldstein
@ 2017-04-24 13:36       ` Vivek Goyal
  2017-04-24 13:41         ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-24 13:36 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 04:10:30PM +0300, Amir Goldstein wrote:
> On Mon, Apr 24, 2017 at 3:59 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:
> >
> > [..]
> > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > index c072a0c..671bac0 100644
> > > --- a/fs/overlayfs/super.c
> > > +++ b/fs/overlayfs/super.c
> > > @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > >       kfree(stack);
> > >
> > >       root_dentry->d_fsdata = oe;
> > > +     ovl_update_type(root_dentry, true);
> > >
> > >       realinode = d_inode(ovl_dentry_real(root_dentry));
> > >       ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
> > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > > index 1953986..6a857fb 100644
> > > --- a/fs/overlayfs/util.c
> > > +++ b/fs/overlayfs/util.c
> > > @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
> > >  enum ovl_path_type ovl_path_type(struct dentry *dentry)
> > >  {
> > >       struct ovl_entry *oe = dentry->d_fsdata;
> > > -     enum ovl_path_type type = 0;
> > > +     enum ovl_path_type type = oe->__type;
> > >
> > > -     if (oe->__upperdentry) {
> > > -             type = __OVL_PATH_UPPER;
> > > +     /* Matches smp_wmb() in ovl_update_type() */
> > > +     smp_rmb();
> > > +     return type;
> >
> > Hi Amir,
> >
> > I never manage to understand barriers so I will ask. Why is this barrier
> > required, and what can go wrong if we don't use it?
> >
> 
> Hi Vivek,
> 
> Miklos was kind enough to answer that question for me when he made
> the comment about the missing memory barrier on v1 of the patch:
> http://www.spinics.net/lists/linux-unionfs/msg01687.html
> 
> Whether or not I got it right, we shall see shortly ;-)

Hi Amir,

Thanks. OK, so we are making sure that if another CPU sees the updated
->type, then it is guaranteed to see the updated ->upperdentry as well.

I feel it is worth putting a shortened version of Miklos's explanation
in the comments. It will help us recall why we put the barrier there.
But maybe it is just me; maybe it is obvious to others.

Vivek

* Re: [PATCH v2 01/11] ovl: store path type in dentry
  2017-04-24 13:36       ` Vivek Goyal
@ 2017-04-24 13:41         ` Amir Goldstein
  0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 13:41 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 4:36 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 04:10:30PM +0300, Amir Goldstein wrote:
>> On Mon, Apr 24, 2017 at 3:59 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >
>> > On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:
>> >
>> > [..]
>> > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> > > index c072a0c..671bac0 100644
>> > > --- a/fs/overlayfs/super.c
>> > > +++ b/fs/overlayfs/super.c
>> > > @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>> > >       kfree(stack);
>> > >
>> > >       root_dentry->d_fsdata = oe;
>> > > +     ovl_update_type(root_dentry, true);
>> > >
>> > >       realinode = d_inode(ovl_dentry_real(root_dentry));
>> > >       ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
>> > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> > > index 1953986..6a857fb 100644
>> > > --- a/fs/overlayfs/util.c
>> > > +++ b/fs/overlayfs/util.c
>> > > @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
>> > >  enum ovl_path_type ovl_path_type(struct dentry *dentry)
>> > >  {
>> > >       struct ovl_entry *oe = dentry->d_fsdata;
>> > > -     enum ovl_path_type type = 0;
>> > > +     enum ovl_path_type type = oe->__type;
>> > >
>> > > -     if (oe->__upperdentry) {
>> > > -             type = __OVL_PATH_UPPER;
>> > > +     /* Matches smp_wmb() in ovl_update_type() */
>> > > +     smp_rmb();
>> > > +     return type;
>> >
>> > Hi Amir,
>> >
>> > I never manage to understand barriers so I will ask. Why is this barrier
>> > required, and what can go wrong if we don't use it?
>> >
>>
>> Hi Vivek,
>>
>> Miklos was kind enough to answer that question for me when he made
>> the comment about the missing memory barrier on v1 of the patch:
>> http://www.spinics.net/lists/linux-unionfs/msg01687.html
>>
>> Whether or not I got it right, we shall see shortly ;-)
>
> Hi Amir,
>
> Thanks. OK, so we are making sure that if another CPU sees the updated
> ->type, then it is guaranteed to see the updated ->upperdentry as well.
>
> I feel it is worth putting a shortened version of Miklos's explanation
> in the comments. It will help us recall why we put the barrier there.
> But maybe it is just me; maybe it is obvious to others.
>

I reckon memory barriers are obvious to few.
However, I think the documentation I left is quite standard practice:
- smp_rmb() has a comment to point to the matching smp_wmb()
- smp_wmb() has a comment to explain what the barrier protects:
 * Make sure type is consistent with __upperdentry before making it
 * visible to ovl_path_type()
(i.e. to lockless readers of oe->__type)
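
For readers less familiar with the pattern, the pairing can be sketched
in userspace with C11 atomics. This is only an analogy under stated
assumptions: release/acquire ordering stands in for the smp_wmb() /
smp_rmb() pair, and struct model_entry, model_update_type() and
model_upper_if_typed() are hypothetical names, not the overlayfs code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_PATH_UPPER (1u << 0)

/* Hypothetical stand-in for the two struct ovl_entry fields involved. */
struct model_entry {
	void *upperdentry; /* models oe->__upperdentry */
	atomic_uint type;  /* models oe->__type */
};

/*
 * Writer: set upperdentry first, then publish the type. The release
 * store keeps the upperdentry write ordered before the type becomes
 * visible (the role played by smp_wmb() in ovl_update_type()).
 */
static void model_update_type(struct model_entry *oe, void *upper)
{
	assert(oe != NULL);
	oe->upperdentry = upper;
	atomic_store_explicit(&oe->type, MODEL_PATH_UPPER,
			      memory_order_release);
}

/*
 * Lockless reader: load the type first. The acquire load guarantees
 * that a reader which sees the UPPER flag also sees the upperdentry
 * written before it (the role played by smp_rmb() in ovl_path_type()).
 */
static void *model_upper_if_typed(struct model_entry *oe)
{
	unsigned int type;

	assert(oe != NULL);
	type = atomic_load_explicit(&oe->type, memory_order_acquire);
	if (!(type & MODEL_PATH_UPPER))
		return NULL;
	return oe->upperdentry;
}
```

Without the ordering, a reader on another CPU could observe the new
type while still seeing the old (NULL) upperdentry, which is the
inconsistency the barrier pair is meant to rule out.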

Amir.

* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-24 13:32   ` kbuild test robot
@ 2017-04-24 13:57     ` Amir Goldstein
  0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 13:57 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Miklos Szeredi, Vivek Goyal, Al Viro, linux-unionfs,
	linux-fsdevel

On Mon, Apr 24, 2017 at 4:32 PM, kbuild test robot <lkp@intel.com> wrote:
> Hi Amir,
>
> [auto build test WARNING on miklos-vfs/overlayfs-next]
> [also build test WARNING on v4.11-rc8 next-20170424]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/overlayfs-constant-inode-numbers/20170424-175555
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
>
>
> coccinelle warnings: (new ones prefixed by >>)
>
>>> fs/overlayfs/copy_up.c:309:7-14: ERROR: PTR_ERR applied after initialization to constant on line 299

Why is this wrong?
The pointer being tested is not const; only the data it references is.

Anyway, this pointed out another thing worth a warning:
The static variable null_fh was meant to be const.
I wonder if coccinelle has a way to figure out this pattern?
I guess not.
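
One possible reading of the warning (an assumption on my part, not something
the report states) is that fh may still hold its NULL initializer when
PTR_ERR() is applied to it. A minimal userspace sketch of the err.h helpers,
using hypothetical sketch_* names rather than the kernel's own definitions,
shows what happens in that case:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095	/* same bound the kernel uses for error pointers */

/* Encode a negative errno value as a pointer */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode a pointer back into a long; a NULL pointer decodes to 0 */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top SKETCH_MAX_ERRNO addresses */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static inline bool sketch_is_err_or_null(const void *ptr)
{
	return !ptr || sketch_is_err(ptr);
}
```

With fh == NULL the decoded err is 0, so the IS_ERR_OR_NULL() branch is
entered but the err == -EOPNOTSUPP test inside it is false. That looks like
the intended behavior here, since NULL means redirect_fh is off.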

> vim +309 fs/overlayfs/copy_up.c
>
>    289  static struct ovl_fh null_fh = {
>    290          .version = OVL_FH_VERSION,
>    291          .magic = OVL_FH_MAGIC,
>    292          .type = FILEID_INVALID,
>    293          .len = sizeof(struct ovl_fh),
>    294  };
>    295
>    296  static int ovl_set_lower_fh(struct dentry *dentry, struct dentry *upper)
>    297  {
>    298          int err;
>  > 299          const struct ovl_fh *fh = NULL;
>    300
>    301          if (ovl_redirect_fh(dentry->d_sb))
>    302                  fh = ovl_get_fh(ovl_dentry_lower(dentry));
>    303          /*
>    304           * On failure to encode lower fh, store an invalid 'null' fh, so
>    305           * we can always use the overlay.fh xattr to distinguish between
>    306           * a copy up and a pure upper inode.  If lower fs does not support
>    307           * encoding fh, don't try to encode again.
>    308           */
>  > 309          err = PTR_ERR(fh);
>    310          if (IS_ERR_OR_NULL(fh)) {
>    311                  if (err == -EOPNOTSUPP) {
>    312                          pr_warn("overlay: file handle not supported by lower - turning off redirect_fh\n");
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* [PATCH v2 12/12] ovl: persistent inode numbers for hardlinks
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (10 preceding siblings ...)
  2017-04-24  9:14 ` [PATCH v2 11/11] ovl: fix du --one-file-system on overlay mount Amir Goldstein
@ 2017-04-24 18:40 ` Amir Goldstein
  2017-04-24 18:51 ` [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 18:40 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

An upper type non directory dentry that is a copy up target
should have a reference to its lower copy up origin.

There are three ways for an upper type dentry to be instantiated:
1. A lower type dentry that is being copied up
2. An entry that is found in upper dir by ovl_lookup()
3. A negative dentry is hardlinked to an upper type dentry

In the first case, the lower reference is set before copy up.
In the second case, the lower reference is found by ovl_lookup().
In the last case of hardlinked upper dentry, it is not easy to
update the lower reference of the negative dentry.  Instead,
drop the newly hardlinked negative dentry from dcache and let
the next access call ovl_lookup() to find its lower reference.

This makes sure that the inode number reported by stat(2) after
the hardlink is created is the same inode number that will be
reported by stat(2) after mount cycle, which is the inode number
of the lower copy up origin of the hardlink source.

NOTE that this does not fix breaking of lower hardlinks on copy
up, but it will result in stat(2) reporting the same inode number
for all the upper broken hardlinks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/dir.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 03854abf7..6ef35e8 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -197,6 +197,9 @@ static void ovl_instantiate(struct dentry *dentry, struct inode *inode,
 		inc_nlink(inode);
 	}
 	d_instantiate(dentry, inode);
+	/* Force lookup of new upper hardlink to find its lower */
+	if (hardlink)
+		d_drop(dentry);
 }
 
 static bool ovl_type_merge(struct dentry *dentry)
-- 
2.7.4


* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (11 preceding siblings ...)
  2017-04-24 18:40 ` [PATCH v2 12/12] ovl: persistent inode numbers for hardlinks Amir Goldstein
@ 2017-04-24 18:51 ` Amir Goldstein
  2017-04-25 11:52 ` Vivek Goyal
  2017-04-25 12:16 ` Vivek Goyal
  14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 18:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 12:14 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Miklos,
>
> Following your comments on the 'stable inodes' series from last week,
> this series fixes constant inode numbers for stat(2) with any layer
> configuration.
>
> For the case of all *lower* layers on same fs that supports NFS export,
> redirect by file handle will be used to optimize the lookup of the copy
> up origin of non-dir inode.
>
> For the case of *all* layers on same fs, overlayfs also gains:
> - Persistent inode numbers for directories
> - Correct results for du -x
>
> Consistency of stat(2) st_ino with readdir(3) d_ino is NOT addressed by
> this series. It will be addressed for the 'samefs' configuration by the
> follow up 'stable inode' work, which is also going to address preserving
> hardlinks on copy up.
>
> This series is available for testing on [1].
> unionmount-testsuite needs a small fix patch for layers_check() [2].
> Tested the following layer configurations:
>  ./run --ov{,=0,=10} {,--samefs}

Miklos,

I instrumented unionmount-testsuite to cycle the mount and compare the inode
number with the pre copy up inode number after rename and link operations [2].

It found one bug w.r.t. inode number of hardlinks, i.e.:
  ./run --ov hard-link
/mnt/a/no_foo100: inode number wrong (got 18833, want 17406)

I posted a 12th patch ("ovl: persistent inode numbers for hardlinks")
that fixes this issue, but I am not sure about 2 things:
1. The fix may not be so elegant (d_drop after d_instantiate)
2. Should broken hardlinks report the same inode number?

Without patch 12, broken hardlinks do report the same (lower) ino,
but only after ovl_lookup(). After ovl_link() the target reports the upper ino.
Patch 12 fixes the ovl_link() case, but I'm not sure if that is the
desired outcome.
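
The kind of check involved can be approximated from userspace with stat(2).
The following is only a sketch with hypothetical helper names; it is not the
code from the instrumented test suite.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths report the same
 * st_dev/st_ino pair, 0 if they differ, -1 if either stat(2) fails.
 */
static int sketch_same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Hypothetical helper: returns 1 if path still reports the inode number
 * recorded earlier (e.g. before copy up or before a mount cycle),
 * 0 if it changed, -1 if stat(2) fails.
 */
static int sketch_ino_unchanged(const char *path, ino_t want)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return st.st_ino == want;
}
```

If patch 12 behaves as described, calling sketch_same_inode() on the hardlink
source and target on the overlay mount should give the same answer right
after link(2) and again after a mount cycle.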

>
> Tested constant inode numbers with xfstest overlay/017 and added a check
> for persistent directory inode numbers across mount cycle [3].
>
> Most of the patches in this series you already reviewed at one time or another
> and have your comments already addressed. Some other patches are trivial.
> Probably the only patches you need to take a closer look at are the 2 lookup
> patches (5-6).
>
> The implementation of lookup of a merged dir with a combination of redirect
> by fh from upper and redirect by name in mid layer is more complicated.
> Because this case is not strictly needed for this series, I simplified
> things a bit and restricted lookup by fh to those cases:
> 1. Non directory (lookup of copy up origin)
> 2. Merge directory when ofs->numlower == 1
>
> This restriction may be relaxed later on if we want to handle lookup by fh
> with fallback to lookup by path for merge dirs.
>
> What do you say? ... Too late for v4.12?
>
> Amir.
>
> [1] https://github.com/amir73il/linux/commits/ovl-constino
> [2] https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
> [3] https://github.com/amir73il/xfstests/commits/overlayfs-devel
>
> Amir Goldstein (11):
>   ovl: store path type in dentry
>   ovl: cram opaque boolean into type flags
>   ovl: check if all layers are on the same fs
>   ovl: store file handle of lower inode on copy up
>   ovl: lookup redirect by file handle
>   ovl: lookup non-dir inode copy up origin
>   ovl: set the COPYUP type flag for non-dirs
>   ovl: redirect non-dir by path on rename
>   ovl: constant st_ino/st_dev across copy up
>   ovl: persistent inode number for directories
>   ovl: fix du --one-file-system on overlay mount
>
>  fs/overlayfs/copy_up.c   |  98 +++++++++++++++++++++
>  fs/overlayfs/dir.c       |  28 +++++-
>  fs/overlayfs/inode.c     |  21 ++++-
>  fs/overlayfs/namei.c     | 216 +++++++++++++++++++++++++++++++++++++++++------
>  fs/overlayfs/overlayfs.h |  23 +++++
>  fs/overlayfs/ovl_entry.h |   9 +-
>  fs/overlayfs/super.c     |  21 +++++
>  fs/overlayfs/util.c      |  83 ++++++++++++++++--
>  8 files changed, 461 insertions(+), 38 deletions(-)
>
> --
> 2.7.4
>


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-24  9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
@ 2017-04-25  8:10   ` Amir Goldstein
  2017-04-25 15:13   ` Miklos Szeredi
  1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25  8:10 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 12:14 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> When overlay.fh xattr is found in a directory inode, instead of lookup
> of the dentry in next lower layer by name, first try to get it by calling
> exportfs_decode_fh().
>
> On failure to lookup by file handle to lower layer, fall back to lookup
> by name with or without path redirect.
>
> For now we only support following by file handle from upper if there is a
> single lower layer, because fallback from lookup by file handle to lookup
> by path in mid layers is not yet implemented.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/namei.c     | 185 +++++++++++++++++++++++++++++++++++++++++++----
>  fs/overlayfs/overlayfs.h |   1 +
>  fs/overlayfs/util.c      |  14 ++++
>  3 files changed, 186 insertions(+), 14 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index d660177..0d1cc8f 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -9,9 +9,11 @@
>
>  #include <linux/fs.h>
>  #include <linux/cred.h>
> +#include <linux/mount.h>
>  #include <linux/namei.h>
>  #include <linux/xattr.h>
>  #include <linux/ratelimit.h>
> +#include <linux/exportfs.h>
>  #include "overlayfs.h"
>  #include "ovl_entry.h"
>
> @@ -21,7 +23,10 @@ struct ovl_lookup_data {
>         bool opaque;
>         bool stop;
>         bool last;
> -       char *redirect;
> +       bool by_path;           /* redirect by path */
> +       bool by_fh;             /* redirect by file handle */
> +       char *redirect;         /* path to follow */
> +       struct ovl_fh *fh;      /* file handle to follow */
>  };
>
>  static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> @@ -81,6 +86,42 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
>         goto err_free;
>  }
>
> +static int ovl_check_redirect_fh(struct dentry *dentry,
> +                                struct ovl_lookup_data *d)
> +{
> +       int res;
> +       void *buf = NULL;
> +
> +       res = vfs_getxattr(dentry, OVL_XATTR_FH, NULL, 0);
> +       if (res < 0) {
> +               if (res == -ENODATA || res == -EOPNOTSUPP)
> +                       return 0;
> +               goto fail;
> +       }
> +       buf = kzalloc(res, GFP_TEMPORARY);
> +       if (!buf)
> +               return -ENOMEM;
> +
> +       if (res == 0)
> +               goto fail;
> +
> +       res = vfs_getxattr(dentry, OVL_XATTR_FH, buf, res);
> +       if (res < 0 || !ovl_redirect_fh_ok(buf, res))
> +               goto fail;
> +
> +       kfree(d->fh);
> +       d->fh = buf;
> +
> +       return 0;
> +
> +err_free:
> +       kfree(buf);
> +       return 0;
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to get file handle (%i)\n", res);
> +       goto err_free;
> +}
> +
>  static bool ovl_is_opaquedir(struct dentry *dentry)
>  {
>         int res;
> @@ -96,22 +137,81 @@ static bool ovl_is_opaquedir(struct dentry *dentry)
>         return false;
>  }
>
> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
> +{
> +       struct dentry *p;
> +
> +       for (p = p2; !IS_ROOT(p); p = p->d_parent) {
> +               if (d_unhashed(p))
> +                       return false;
> +               if (p->d_parent == p1)
> +                       return true;
> +       }
> +       return false;
> +}
> +
> +/* Check if dentry is reachable from mnt via path lookup */
> +static int ovl_dentry_under_mnt(void *ctx, struct dentry *dentry)
> +{
> +       struct vfsmount *mnt = ctx;
> +
> +       return ovl_is_lookable(mnt->mnt_root, dentry);
> +}
> +
> +static struct dentry *ovl_lookup_fh(struct vfsmount *mnt,
> +                                   const struct ovl_fh *fh)
> +{
> +       int bytes = (fh->len - offsetof(struct ovl_fh, fid));
> +
> +       /*
> +        * When redirect_fh is disabled, 'invalid' file handles are stored
> +        * to indicate that this entry has been copied up.
> +        */
> +       if (!bytes || (int)fh->type == FILEID_INVALID)
> +               return ERR_PTR(-ESTALE);
> +
> +       /*
> +        * Several layers can be on the same fs and decoded dentry may be in
> +        * either one of those layers. We are looking for a match of dentry
> +        * and mnt to find out to which layer the decoded dentry belongs to.
> +        */
> +       return exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> +                                 bytes >> 2, (int)fh->type,
> +                                 ovl_dentry_under_mnt, mnt);
> +}
> +
>  static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>                              const char *name, unsigned int namelen,
>                              size_t prelen, const char *post,
> -                            struct dentry **ret)
> +                            struct vfsmount *mnt, struct dentry **ret)
>  {
>         struct dentry *this;
>         int err;
>
> -       this = lookup_one_len_unlocked(name, base, namelen);
> +       /*
> +        * Lookup of upper is with null d->fh.
> +        * Lookup of lower is either by_fh with non-null d->fh
> +        * or by_path with null d->fh.
> +        */
> +       if (d->fh)
> +               this = ovl_lookup_fh(mnt, d->fh);
> +       else
> +               this = lookup_one_len_unlocked(name, base, namelen);
>         if (IS_ERR(this)) {
>                 err = PTR_ERR(this);
>                 this = NULL;
>                 if (err == -ENOENT || err == -ENAMETOOLONG)
>                         goto out;
> +               if (d->fh && err == -ESTALE)
> +                       goto out;
>                 goto out_err;
>         }
> +
> +       /* If found by file handle - don't follow that handle again */
> +       kfree(d->fh);
> +       d->fh = NULL;
> +
>         if (!this->d_inode)
>                 goto put_and_out;
>
> @@ -135,9 +235,18 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>                 d->stop = d->opaque = true;
>                 goto out;
>         }
> -       err = ovl_check_redirect(this, d, prelen, post);
> -       if (err)
> -               goto out_err;
> +       if (d->last)
> +               goto out;
> +       if (d->by_path) {
> +               err = ovl_check_redirect(this, d, prelen, post);
> +               if (err)
> +                       goto out_err;
> +       }
> +       if (d->by_fh) {
> +               err = ovl_check_redirect_fh(this, d);
> +               if (err)
> +                       goto out_err;
> +       }
>  out:
>         *ret = this;
>         return 0;
> @@ -152,6 +261,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>         return err;
>  }
>
> +static int ovl_lookup_layer_fh(struct path *path, struct ovl_lookup_data *d,
> +                              struct dentry **ret)
> +{
> +       return ovl_lookup_single(path->dentry, d, "", 0, 0, "", path->mnt, ret);
> +}
> +
>  static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>                             struct dentry **ret)
>  {
> @@ -162,7 +277,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>
>         if (d->name.name[0] != '/')
>                 return ovl_lookup_single(base, d, d->name.name, d->name.len,
> -                                        0, "", ret);
> +                                        0, "", NULL, ret);
>
>         while (!IS_ERR_OR_NULL(base) && d_can_lookup(base)) {
>                 const char *s = d->name.name + d->name.len - rem;
> @@ -175,7 +290,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>                         return -EIO;
>
>                 err = ovl_lookup_single(base, d, s, thislen,
> -                                       d->name.len - rem, next, &base);
> +                                       d->name.len - rem, next, NULL, &base);
>                 dput(dentry);
>                 if (err)
>                         return err;
> @@ -220,6 +335,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         const struct cred *old_cred;
>         struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
>         struct ovl_entry *poe = dentry->d_parent->d_fsdata;
> +       struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
>         struct path *stack = NULL;
>         struct dentry *upperdir, *upperdentry = NULL;
>         unsigned int ctr = 0;
> @@ -235,7 +351,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 .opaque = false,
>                 .stop = false,
>                 .last = !poe->numlower,
> +               .by_path = true,
>                 .redirect = NULL,
> +               .by_fh = true,
> +               .fh = NULL,
>         };
>
>         if (dentry->d_name.len > ofs->namelen)
> @@ -259,13 +378,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                         if (!upperredirect)
>                                 goto out_put_upper;
>                         if (d.redirect[0] == '/')
> -                               poe = dentry->d_sb->s_root->d_fsdata;
> +                               poe = roe;
>                 }
>                 if (d.opaque)
>                         type |= __OVL_PATH_OPAQUE;
>         }
>
> -       if (!d.stop && poe->numlower) {
> +       /*
> +        * For now we only support lower by fh in single layer, because
> +        * fallback from lookup by fh to lookup by path in mid layers for
> +        * merge directory is not yet implemented.
> +        */
> +       if (!ofs->redirect_fh || ofs->numlower > 1) {
> +               kfree(d.fh);
> +               d.fh = NULL;
> +       }
> +
> +       if (!d.stop && (poe->numlower || d.fh)) {
>                 err = -ENOMEM;
>                 stack = kcalloc(ofs->numlower, sizeof(struct path),
>                                 GFP_TEMPORARY);
> @@ -273,6 +402,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                         goto out_put_upper;
>         }
>
> +       /* Try to lookup lower layers by file handle */
> +       d.by_path = false;
> +       for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
> +               struct path lowerpath = poe->lowerstack[i];
> +
> +               d.last = i == poe->numlower - 1;

Copy&paste bug: this should be s/poe/roe/ in the two lines above that use
poe (the lowerstack[i] and numlower accesses).
It matters especially when lower files are moved into an opaque dir.
I am improving xfstest overlay/017 to cover this case.

> +               err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
> +               if (err)
> +                       goto out_put;
> +
> +               if (!this)
> +                       continue;
> +
> +               stack[ctr].dentry = this;
> +               stack[ctr].mnt = lowerpath.mnt;
> +               ctr++;
> +               /*
> +                * Found by fh - won't lookup by path.
> +                * TODO: set d.redirect to dentry_path(this),
> +                *       so lookup can continue by path.
> +                */
> +               d.stop = true;
> +       }
> +
> +       /* Fallback to lookup lower layers by path */
> +       d.by_path = true;
> +       d.by_fh = false;
> +       kfree(d.fh);
> +       d.fh = NULL;
>         for (i = 0; !d.stop && i < poe->numlower; i++) {
>                 struct path lowerpath = poe->lowerstack[i];
>
> @@ -291,10 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 if (d.stop)
>                         break;
>
> -               if (d.redirect &&
> -                   d.redirect[0] == '/' &&
> -                   poe != dentry->d_sb->s_root->d_fsdata) {
> -                       poe = dentry->d_sb->s_root->d_fsdata;
> +               if (d.redirect && d.redirect[0] == '/' && poe != roe) {
> +                       poe = roe;
>
>                         /* Find the current layer on the root dentry */
>                         for (i = 0; i < poe->numlower; i++)
> @@ -354,6 +510,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         dput(upperdentry);
>         kfree(upperredirect);
>  out:
> +       kfree(d.fh);
>         kfree(d.redirect);
>         revert_creds(old_cred);
>         return ERR_PTR(err);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c3cfbc5..08002ce 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -190,6 +190,7 @@ const char *ovl_dentry_get_redirect(struct dentry *dentry);
>  void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
>  bool ovl_redirect_fh(struct super_block *sb);
>  void ovl_clear_redirect_fh(struct super_block *sb);
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size);
>  void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
>  void ovl_inode_init(struct inode *inode, struct inode *realinode,
>                     bool is_upper);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index b3bc117..dba9753 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -254,6 +254,20 @@ void ovl_clear_redirect_fh(struct super_block *sb)
>         ofs->redirect_fh = false;
>  }
>
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size)
> +{
> +       struct ovl_fh *fh = (void *)redirect;
> +
> +       if (size < sizeof(struct ovl_fh) || size < fh->len)
> +               return false;
> +
> +       if (fh->version > OVL_FH_VERSION ||
> +           fh->magic != OVL_FH_MAGIC)
> +               return false;
> +
> +       return true;
> +}
> +
>  void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
>  {
>         struct ovl_entry *oe = dentry->d_fsdata;
> --
> 2.7.4
>


* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (12 preceding siblings ...)
  2017-04-24 18:51 ` [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
@ 2017-04-25 11:52 ` Vivek Goyal
  2017-04-25 12:05   ` Amir Goldstein
  2017-04-25 12:16 ` Vivek Goyal
  14 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 11:52 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> Miklos,
> 
> Following your comments on the 'stable inodes' series from last week,
> this series fixes constant inode numbers for stat(2) with any layer
> configuration.
> 
> For the case of all *lower* layers on same fs that supports NFS export,
> redirect by file handle will be used to optimize the lookup of the copy
> up origin of non-dir inode.
> 
> For the case of *all* layers on same fs, overlayfs also gains:
> - Persistent inode numbers for directories
> - Correct results for du -x
> 
> Consistency of stat(2) st_ino with readdir(3) d_ino is NOT addressed by
> this series. It will be addressed for the 'samefs' configuration by the
> follow up 'stable inode' work, which is also going to address preserving
> hardlinks on copy up.

Hi Amir,

Don't we also need to update Documentation/filesystems/overlayfs.txt to
reflect the new semantics of reporting st_dev and st_ino?

Vivek


* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 11:52 ` Vivek Goyal
@ 2017-04-25 12:05   ` Amir Goldstein
  0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 12:05 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 2:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> Miklos,
>>
>> Following your comments on the 'stable inodes' series from last week,
>> this series fixes constant inode numbers for stat(2) with any layer
>> configuration.
>>
>> For the case of all *lower* layers on same fs that supports NFS export,
>> redirect by file handle will be used to optimize the lookup of the copy
>> up origin of non-dir inode.
>>
>> For the case of *all* layers on same fs, overlayfs also gains:
>> - Persistent inode numbers for directories
>> - Correct results for du -x
>>
>> Consistency of stat(2) st_ino with readdir(3) d_ino is NOT addressed by
>> this series. It will be addressed for the 'samefs' configuration by the
>> follow up 'stable inode' work, which is also going to address preserving
>> hardlinks on copy up.
>
> Hi Amir,
>
> Don't we also need to update Documentation/filesystems/overlayfs.txt to
> reflect the new semantics of reporting st_dev and st_ino?
>

Of course we need to! I'll do that.
Thanks!


* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-24  9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
                   ` (13 preceding siblings ...)
  2017-04-25 11:52 ` Vivek Goyal
@ 2017-04-25 12:16 ` Vivek Goyal
  2017-04-25 12:41   ` Amir Goldstein
  14 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 12:16 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> Miklos,
> 
> Following your comments on the 'stable inodes' series from last week,
> this series fixes constant inode numbers for stat(2) with any layer
> configuration.
> 
> For the case of all *lower* layers on same fs that supports NFS export,
> redirect by file handle will be used to optimize the lookup of the copy
> up origin of non-dir inode.

I was trying to run unionmount-testsuite (original from dhowells) and I
disabled layer check. Looks like empty directory rename test fails.

***
*** ./run --ov --ts=0 rename-empty-dir
***
TEST rename-empty-dir.py:10: Rename empty dir and rename back
 ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
 /mnt/a/empty100: Unexpected error: Invalid cross-device link

Vivek


* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 12:16 ` Vivek Goyal
@ 2017-04-25 12:41   ` Amir Goldstein
  2017-04-25 12:52     ` Vivek Goyal
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 12:41 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> Miklos,
>>
>> Following your comments on the 'stable inodes' series from last week,
>> this series fixes constant inode numbers for stat(2) with any layer
>> configuration.
>>
>> For the case of all *lower* layers on same fs that supports NFS export,
>> redirect by file handle will be used to optimize the lookup of the copy
>> up origin of non-dir inode.
>
> I was trying to run unionmount-testsuite (original from dhowells) and I
> disabled layer check. Looks like empty directory rename test fails.
>
> ***
> *** ./run --ov --ts=0 rename-empty-dir
> ***
> TEST rename-empty-dir.py:10: Rename empty dir and rename back
>  ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
>  /mnt/a/empty100: Unexpected error: Invalid cross-device link
>

Strange... I can't find any recent code where this used to work.
It certainly doesn't look like it should work with kernel v4.10
and redirect_dir=off.
I couldn't find the point of regression by looking at the change log.
You'd need to bisect to find the regressing patch.

Are you not compiling kernel with redirect_dir?
CONFIG_OVERLAY_FS_REDIRECT_DIR=y

I guess not. If you do compile or mount with -o redirect_dir=on,
you will need some minimal patches to unionmount-testsuite
that set the expectations correctly for directory rename.

The last stable branch I have from testing v4.10 is this:
https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir

But you may as well take my most recent branch for testing const ino:
https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel

Amir.


* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 12:41   ` Amir Goldstein
@ 2017-04-25 12:52     ` Vivek Goyal
  2017-04-25 13:23       ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 12:52 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> >> Miklos,
> >>
> >> Following your comments on the 'stable inodes' series from last week,
> >> this series fixes constant inode numbers for stat(2) with any layer
> >> configuration.
> >>
> >> For the case of all *lower* layers on same fs that supports NFS export,
> >> redirect by file handle will be used to optimize the lookup of the copy
> >> up origin of non-dir inode.
> >
> > I was trying to run unionmount-testsuite (original from dhowells) and I
> > disabled layer check. Looks like empty directory rename test fails.
> >
> > ***
> > *** ./run --ov --ts=0 rename-empty-dir
> > ***
> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
> >  ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
> >  /mnt/a/empty100: Unexpected error: Invalid cross-device link
> >
> 
> Strange... I can't find any recent code where this used to work.
> It certainly doesn't look like it should work with kernel v4.10
> and redirect_dir=off.
> I couldn't find the point of regression by looking at the change log.
> You'd need to bisect to find the regressing patch.
> 
> Are you not compiling kernel with redirect_dir?
> CONFIG_OVERLAY_FS_REDIRECT_DIR=y

I noticed that I am running with REDIRECT_DIR=n. 

I also re-ran the tests without your patches and test is still broken. So
it is not due to your current patch series.

It has been a long time since I ran these tests. I suspect that we might
have changed this behavior during the redirect directory patches.

So the question is: is this a regression or expected behavior? That is, with
REDIRECT_DIR=n, renames of an empty directory are denied too.

> 
> I guess not. If you do compile or mount with -o redirect_dir=on,
> you will need some minimal patches to unionmount-testsuite
> that set the expectations correctly for directory rename.
> 
> The last stable branch I have from testing v4.10 is this:
> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
> 
> But you may as well take my most recent branch for testing const ino:
> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel

I guess I should start using your copy of unionmount-testsuite.

Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 12:52     ` Vivek Goyal
@ 2017-04-25 13:23       ` Amir Goldstein
  2017-04-25 13:29         ` Vivek Goyal
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 13:23 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
>> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> >> Miklos,
>> >>
>> >> Following your comments on the 'stable inodes' series from last week,
>> >> this series fixes constant inode numbers for stat(2) with any layer
>> >> configuration.
>> >>
>> >> For the case of all *lower* layers on same fs that supports NFS export,
>> >> redirect by file handle will be used to optimize the lookup of the copy
>> >> up origin of non-dir inode.
>> >
>> > I was trying to run unionmount-testsuite (original from dhowells) and I
>> > disabled layer check. Looks like empty directory rename test fails.
>> >
>> > ***
>> > *** ./run --ov --ts=0 rename-empty-dir
>> > ***
>> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
>> >  ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
>> >  /mnt/a/empty100: Unexpected error: Invalid cross-device link
>> >
>>
>> Strange... I can't find code in recent times when this used to work
>> It certainly doesn't look like it should work with kernel v4.10
>> and redirect_dir=off.
>> I couldn't find the point of regression by looking at the change log.
>> You'd need to bisect to find the regression patch.
>>
>> Are you not compiling kernel with redirect_dir?
>> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
>
> I noticed that I am running with REDIRECT_DIR=n.
>
> I also re-ran the tests without your patches and test is still broken. So
> it is not due to your current patch series.
>
> It has been long time since I ran these tests. I suspect that we might
> have changed this behavior during redirect directory patches.
>
> So question is, is this a regression or expected behavior. That is with
> REDIRECT_DIR=n, renames of empty directory will be denied too.
>

It must be a regression, although I can't think why anyone would care.
If one really cares about renaming lower empty directories, why not enable
REDIRECT_DIR?

>>
>> I guess not. If you do compile or mount with -o redirect_dir=on,
>> you will need some minimal patches to unionmount-testsuite
>> that set the expectations correctly for directory rename.
>>
>> The last stable branch I have from testing v4.10 is this:
>> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
>>
>> But you may as well take my most recent branch for testing const ino:
>> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
>
> I guess I should start using your copy of unionmount-testsuite.
>
> Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 13:23       ` Amir Goldstein
@ 2017-04-25 13:29         ` Vivek Goyal
  2017-04-25 13:49           ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 13:29 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> >> >> Miklos,
> >> >>
> >> >> Following your comments on the 'stable inodes' series from last week,
> >> >> this series fixes constant inode numbers for stat(2) with any layer
> >> >> configuration.
> >> >>
> >> >> For the case of all *lower* layers on same fs that supports NFS export,
> >> >> redirect by file handle will be used to optimize the lookup of the copy
> >> >> up origin of non-dir inode.
> >> >
> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
> >> > disabled layer check. Looks like empty directory rename test fails.
> >> >
> >> > ***
> >> > *** ./run --ov --ts=0 rename-empty-dir
> >> > ***
> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
> >> >  ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
> >> >  /mnt/a/empty100: Unexpected error: Invalid cross-device link
> >> >
> >>
> >> Strange... I can't find code in recent times when this used to work
> >> It certainly doesn't look like it should work with kernel v4.10
> >> and redirect_dir=off.
> >> I couldn't find the point of regression by looking at the change log.
> >> You'd need to bisect to find the regression patch.
> >>
> >> Are you not compiling kernel with redirect_dir?
> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
> >
> > I noticed that I am running with REDIRECT_DIR=n.
> >
> > I also re-ran the tests without your patches and test is still broken. So
> > it is not due to your current patch series.
> >
> > It has been long time since I ran these tests. I suspect that we might
> > have changed this behavior during redirect directory patches.
> >
> > So question is, is this a regression or expected behavior. That is with
> > REDIRECT_DIR=n, renames of empty directory will be denied too.
> >
> 
> It must be a regression, although I can't think why anyone would care.
> If one really cares about renaming lower empty directories, why not enable
> REDIRECT_DIR?

I will enable it now. I just had an old config and ran into this.

But this does raise the point that unionmount-testsuite needs to be
maintained somewhere, so that it acts as a baseline to figure out if
new patches broke some existing tests.

I can go by the tree you are maintaining but currently that's broken too
with REDIRECT_DIR=n.

Vivek

> 
> >>
> >> I guess not. If you do compile or mount with -o redirect_dir=on,
> >> you will need some minimal patches to unionmount-testsuite
> >> that set the expectations correctly for directory rename.
> >>
> >> The last stable branch I have from testing v4.10 is this:
> >> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
> >>
> >> But you may as well take my most recent branch for testing const ino:
> >> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
> >
> > I guess I should start using your copy of unionmount-testsuite.
> >
> > Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 13:29         ` Vivek Goyal
@ 2017-04-25 13:49           ` Amir Goldstein
  2017-04-25 13:53             ` Vivek Goyal
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 13:49 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 4:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
>> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
>> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> >> >> Miklos,
>> >> >>
>> >> >> Following your comments on the 'stable inodes' series from last week,
>> >> >> this series fixes constant inode numbers for stat(2) with any layer
>> >> >> configuration.
>> >> >>
>> >> >> For the case of all *lower* layers on same fs that supports NFS export,
>> >> >> redirect by file handle will be used to optimize the lookup of the copy
>> >> >> up origin of non-dir inode.
>> >> >
>> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
>> >> > disabled layer check. Looks like empty directory rename test fails.
>> >> >
>> >> > ***
>> >> > *** ./run --ov --ts=0 rename-empty-dir
>> >> > ***
>> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
>> >> >  ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
>> >> >  /mnt/a/empty100: Unexpected error: Invalid cross-device link
>> >> >
>> >>
>> >> Strange... I can't find code in recent times when this used to work
>> >> It certainly doesn't look like it should work with kernel v4.10
>> >> and redirect_dir=off.
>> >> I couldn't find the point of regression by looking at the change log.
>> >> You'd need to bisect to find the regression patch.
>> >>
>> >> Are you not compiling kernel with redirect_dir?
>> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
>> >
>> > I noticed that I am running with REDIRECT_DIR=n.
>> >
>> > I also re-ran the tests without your patches and test is still broken. So
>> > it is not due to your current patch series.
>> >
>> > It has been long time since I ran these tests. I suspect that we might
>> > have changed this behavior during redirect directory patches.
>> >
>> > So question is, is this a regression or expected behavior. That is with
>> > REDIRECT_DIR=n, renames of empty directory will be denied too.
>> >
>>
>> It must be a regression, although I can't think why anyone would care.
>> If one really cares about renaming lower empty directories, why not enable
>> REDIRECT_DIR?
>
> I will enable it now. I just had an old config and ran into this.
>
> But this does raise the question unionmount-testsuite need to be
> maintained somewhere so that it acts as a baseline to figure out if
> new patches broke some existing tests.
>
> I can go by the tree you are maintaining but currently that's broken too
> with REDIRECT_DIR=n.
>

Right.
I have given some thought to the best way to handle this.
Probably need a test flag --noredirect. I'll add this to my TODO...

BTW, I try to keep the branch overlayfs-devel up to date for testing
latest features. It could be rebased, but I'll make an effort not to.
If there is a need for a more stable non-rewindable branch, let me know.

>
>>
>> >>
>> >> I guess not. If you do compile or mount with -o redirect_dir=on,
>> >> you will need some minimal patches to unionmount-testsuite
>> >> that set the expectations correctly for directory rename.
>> >>
>> >> The last stable branch I have from testing v4.10 is this:
>> >> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
>> >>
>> >> But you may as well take my most recent branch for testing const ino:
>> >> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
>> >
>> > I guess I should start using your copy of unionmount-testsuite.
>> >
>> > Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 13:49           ` Amir Goldstein
@ 2017-04-25 13:53             ` Vivek Goyal
  2017-04-25 14:20               ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 13:53 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel, David Howells

On Tue, Apr 25, 2017 at 04:49:00PM +0300, Amir Goldstein wrote:
> On Tue, Apr 25, 2017 at 4:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
> >> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
> >> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> >> >> >> Miklos,
> >> >> >>
> >> >> >> Following your comments on the 'stable inodes' series from last week,
> >> >> >> this series fixes constant inode numbers for stat(2) with any layer
> >> >> >> configuration.
> >> >> >>
> >> >> >> For the case of all *lower* layers on same fs that supports NFS export,
> >> >> >> redirect by file handle will be used to optimize the lookup of the copy
> >> >> >> up origin of non-dir inode.
> >> >> >
> >> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
> >> >> > disabled layer check. Looks like empty directory rename test fails.
> >> >> >
> >> >> > ***
> >> >> > *** ./run --ov --ts=0 rename-empty-dir
> >> >> > ***
> >> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
> >> >> >  ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
> >> >> >  /mnt/a/empty100: Unexpected error: Invalid cross-device link
> >> >> >
> >> >>
> >> >> Strange... I can't find code in recent times when this used to work
> >> >> It certainly doesn't look like it should work with kernel v4.10
> >> >> and redirect_dir=off.
> >> >> I couldn't find the point of regression by looking at the change log.
> >> >> You'd need to bisect to find the regression patch.
> >> >>
> >> >> Are you not compiling kernel with redirect_dir?
> >> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
> >> >
> >> > I noticed that I am running with REDIRECT_DIR=n.
> >> >
> >> > I also re-ran the tests without your patches and test is still broken. So
> >> > it is not due to your current patch series.
> >> >
> >> > It has been long time since I ran these tests. I suspect that we might
> >> > have changed this behavior during redirect directory patches.
> >> >
> >> > So question is, is this a regression or expected behavior. That is with
> >> > REDIRECT_DIR=n, renames of empty directory will be denied too.
> >> >
> >>
> >> It must be a regression, although I can't think why anyone would care.
> >> If one really cares about renaming lower empty directories, why not enable
> >> REDIRECT_DIR?
> >
> > I will enable it now. I just had an old config and ran into this.
> >
> > But this does raise the question unionmount-testsuite need to be
> > maintained somewhere so that it acts as a baseline to figure out if
> > new patches broke some existing tests.
> >
> > I can go by the tree you are maintaining but currently that's broken too
> > with REDIRECT_DIR=n.
> >
> 
> Right.
> I have given some thought to the best way to handle this.
> Probably need a test flag --noredirect. I'll add this to my TODO...
> 
> BTW, I try to keep the branch overlayfs-devel up to date for testing
> latest features. It could be rebased, but I'll make an effort not to.
> If there is a need for a more stable non-rewindable branch, let me know.

I think it would be good if you maintain the "master" branch of your tree up
to date and hopefully that's stable so that later git pull does not talk
about conflicts. We can then use your tree for setting a baseline and
detecting regressions.

CCing Dave Howells, in case he is interested in continuing to update his
tree as overlayfs kernel development takes place.

Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 00/11] overlayfs constant inode numbers
  2017-04-25 13:53             ` Vivek Goyal
@ 2017-04-25 14:20               ` Amir Goldstein
  0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 14:20 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel, David Howells

On Tue, Apr 25, 2017 at 4:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 25, 2017 at 04:49:00PM +0300, Amir Goldstein wrote:
>> On Tue, Apr 25, 2017 at 4:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
>> >> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
>> >> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> >> >> >> Miklos,
>> >> >> >>
>> >> >> >> Following your comments on the 'stable inodes' series from last week,
>> >> >> >> this series fixes constant inode numbers for stat(2) with any layer
>> >> >> >> configuration.
>> >> >> >>
>> >> >> >> For the case of all *lower* layers on same fs that supports NFS export,
>> >> >> >> redirect by file handle will be used to optimize the lookup of the copy
>> >> >> >> up origin of non-dir inode.
>> >> >> >
>> >> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
>> >> >> > disabled layer check. Looks like empty directory rename test fails.
>> >> >> >
>> >> >> > ***
>> >> >> > *** ./run --ov --ts=0 rename-empty-dir
>> >> >> > ***
>> >> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
>> >> >> >  ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
>> >> >> >  /mnt/a/empty100: Unexpected error: Invalid cross-device link
>> >> >> >
>> >> >>
>> >> >> Strange... I can't find code in recent times when this used to work
>> >> >> It certainly doesn't look like it should work with kernel v4.10
>> >> >> and redirect_dir=off.
>> >> >> I couldn't find the point of regression by looking at the change log.
>> >> >> You'd need to bisect to find the regression patch.
>> >> >>
>> >> >> Are you not compiling kernel with redirect_dir?
>> >> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
>> >> >
>> >> > I noticed that I am running with REDIRECT_DIR=n.
>> >> >
>> >> > I also re-ran the tests without your patches and test is still broken. So
>> >> > it is not due to your current patch series.
>> >> >
>> >> > It has been long time since I ran these tests. I suspect that we might
>> >> > have changed this behavior during redirect directory patches.
>> >> >
>> >> > So question is, is this a regression or expected behavior. That is with
>> >> > REDIRECT_DIR=n, renames of empty directory will be denied too.
>> >> >
>> >>
>> >> It must be a regression, although I can't think why anyone would care.
>> >> If one really cares about renaming lower empty directories, why not enable
>> >> REDIRECT_DIR?
>> >
>> > I will enable it now. I just had an old config and ran into this.
>> >
>> > But this does raise the question unionmount-testsuite need to be
>> > maintained somewhere so that it acts as a baseline to figure out if
>> > new patches broke some existing tests.
>> >
>> > I can go by the tree you are maintaining but currently that's broken too
>> > with REDIRECT_DIR=n.
>> >
>>
>> Right.
>> I have given some thought to the best way to handle this.
>> Probably need a test flag --noredirect. I'll add this to my TODO...
>>
>> BTW, I try to keep the branch overlayfs-devel up to date for testing
>> latest features. It could be rebased, but I'll make an effort not to.
>> If there is a need for a more stable non-rewindable branch, let me know.
>
> I think it would be good if you maintain the "master" branch of your tree up
> to date and hopefully that's stable so that later git pull does not talk
> about conflicts. We can then use your tree for setting a baseline and
> detecting regressions.
>
> CCing Dave Howells, in case he is interested in continuing to update his
> tree as overlayfs kernel development takes place.
>

OK. Declaring branch master on my tree 'ff-only':
https://github.com/amir73il/unionmount-testsuite/tree/master

Last commit is set to:
060af33 run --ov --samefs uses lower/upper on same fs

This commit also contains instructions on how to set up unionmount-testsuite
on a non-tmpfs filesystem, which is very useful for staying in touch with
reality.

It is recommended to test at least with the following flag combinations:
./run --ov # tmpfs not same for lower/upper
./run --ov=0 # same as above with cycle mount after mkdir/rename
./run --ov --samefs # tmpfs or configured base fs, same for lower and upper
./run --ov=0 --samefs # same as above with cycle mount after mkdir/rename

Mind you that testing constant inode work still requires branch overlayfs-devel
with the fix to check_layers() and more goodies.

Amir.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-24  9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
  2017-04-24 13:32   ` kbuild test robot
@ 2017-04-25 14:53   ` Miklos Szeredi
  2017-04-26  5:47     ` Amir Goldstein
  2017-04-26  9:39   ` Miklos Szeredi
  2 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-25 14:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Sometimes it is interesting to know if an upper file is pure
> upper or a copy up target, and if it is a copy up target, it
> may be interesting to find the copy up origin.
>
> This will be used to preserve lower inode numbers across copy up.
>
> Store the lower inode file handle in upper inode xattr overlay.fh
> on copy up to use it later for these cases.
>
> On failure to encode lower file handle, store an invalid 'null'
> handle, so we can always use the overlay.fh xattr to distinguish
> between a copy up and a pure upper inode.
>
> If lower fs does not support NFS export ops or if not all lower
> layers are on the same fs, don't try to encode a lower file handle
> and use the 'null' handle instead.

Decoding an fh on the wrong fs is going to result in "interesting"
possibilities, so I think we should be storing some kind of identifier
about the layer from the very start.

The trivial way to do that would be to encode the filesystem's UUID
into the stored fh.  Problem seems to be that only ext4 is setting
sb->s_uuid.  Probably not too hard to fix the others.

When decoding, this is trivial to check in the samefs case, but we'd need a
table for the uuid->layer lookup for the non-samefs case. But that can
wait, I'd be content with just having the infrastructure there and
just using it to verify the handle for now.
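
To make the "verify the handle" idea concrete, here is a minimal userspace
sketch, not the patch's actual layout. It assumes a hypothetical header,
struct sketch_fh, that carries the UUID of the filesystem the handle was
encoded on, and a hypothetical helper, sketch_fh_matches_fs(), that refuses
to go on to decoding when the UUID does not match. None of these names exist
in the kernel or in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_UUID_LEN 16

/* Hypothetical stored handle; NOT the struct ovl_fh from the patch. */
struct sketch_fh {
	uint8_t len;                   /* total size of the stored blob */
	uint8_t type;                  /* fid type reported by the encoder */
	uint8_t uuid[SKETCH_UUID_LEN]; /* UUID of the fs the handle came from */
	uint8_t fid[];                 /* opaque file identifier bytes */
};

/*
 * Return true only if the blob is well formed and was encoded on the
 * filesystem identified by fs_uuid. A caller would skip decoding (and
 * fall back to lookup by name) when this returns false.
 */
static bool sketch_fh_matches_fs(const struct sketch_fh *fh, size_t size,
				 const uint8_t fs_uuid[SKETCH_UUID_LEN])
{
	assert(fh != NULL && fs_uuid != NULL);

	/* Reject truncated blobs and blobs whose recorded length is wrong. */
	if (size < sizeof(*fh) || fh->len != size)
		return false;

	return memcmp(fh->uuid, fs_uuid, SKETCH_UUID_LEN) == 0;
}
```

In the samefs case a single comparison like this would be enough. The
non-samefs case would additionally need the uuid->layer table mentioned
above, which this sketch does not attempt.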

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-24  9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
  2017-04-25  8:10   ` Amir Goldstein
@ 2017-04-25 15:13   ` Miklos Szeredi
  2017-04-25 17:41     ` Amir Goldstein
  1 sibling, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-25 15:13 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> When overlay.fh xattr is found in a directory inode, instead of lookup
> of the dentry in next lower layer by name, first try to get it by calling
> exportfs_decode_fh().
>
> On failure to lookup by file handle to lower layer, fall back to lookup
> by name with or without path redirect.
>
> For now we only support following by file handle from upper if there is a
> single lower layer, because fallback from lookup by file handle to lookup
> by path in mid layers is not yet implemented.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/namei.c     | 185 +++++++++++++++++++++++++++++++++++++++++++----
>  fs/overlayfs/overlayfs.h |   1 +
>  fs/overlayfs/util.c      |  14 ++++
>  3 files changed, 186 insertions(+), 14 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index d660177..0d1cc8f 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -9,9 +9,11 @@
>
>  #include <linux/fs.h>
>  #include <linux/cred.h>
> +#include <linux/mount.h>
>  #include <linux/namei.h>
>  #include <linux/xattr.h>
>  #include <linux/ratelimit.h>
> +#include <linux/exportfs.h>
>  #include "overlayfs.h"
>  #include "ovl_entry.h"
>
> @@ -21,7 +23,10 @@ struct ovl_lookup_data {
>         bool opaque;
>         bool stop;
>         bool last;
> -       char *redirect;
> +       bool by_path;           /* redirect by path */
> +       bool by_fh;             /* redirect by file handle */
> +       char *redirect;         /* path to follow */
> +       struct ovl_fh *fh;      /* file handle to follow */
>  };
>
>  static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> @@ -81,6 +86,42 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
>         goto err_free;
>  }
>
> +static int ovl_check_redirect_fh(struct dentry *dentry,
> +                                struct ovl_lookup_data *d)
> +{
> +       int res;
> +       void *buf = NULL;
> +
> +       res = vfs_getxattr(dentry, OVL_XATTR_FH, NULL, 0);
> +       if (res < 0) {
> +               if (res == -ENODATA || res == -EOPNOTSUPP)
> +                       return 0;
> +               goto fail;
> +       }
> +       buf = kzalloc(res, GFP_TEMPORARY);
> +       if (!buf)
> +               return -ENOMEM;
> +
> +       if (res == 0)
> +               goto fail;
> +
> +       res = vfs_getxattr(dentry, OVL_XATTR_FH, buf, res);
> +       if (res < 0 || !ovl_redirect_fh_ok(buf, res))
> +               goto fail;
> +
> +       kfree(d->fh);
> +       d->fh = buf;
> +
> +       return 0;
> +
> +err_free:
> +       kfree(buf);
> +       return 0;
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to get file handle (%i)\n", res);
> +       goto err_free;
> +}
> +
>  static bool ovl_is_opaquedir(struct dentry *dentry)
>  {
>         int res;
> @@ -96,22 +137,81 @@ static bool ovl_is_opaquedir(struct dentry *dentry)
>         return false;
>  }
>
> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
> +{
> +       struct dentry *p;
> +
> +       for (p = p2; !IS_ROOT(p); p = p->d_parent) {
> +               if (d_unhashed(p))
> +                       return false;
> +               if (p->d_parent == p1)
> +                       return true;
> +       }
> +       return false;
> +}

Walking the dentry tree without RCU protection is dangerous and broken.

I'm also wondering if there's a better way to find the layer (e.g.
store the layer index in the handle as well).

> +
> +/* Check if dentry is reachable from mnt via path lookup */
> +static int ovl_dentry_under_mnt(void *ctx, struct dentry *dentry)
> +{
> +       struct vfsmount *mnt = ctx;
> +
> +       return ovl_is_lookable(mnt->mnt_root, dentry);
> +}
> +
> +static struct dentry *ovl_lookup_fh(struct vfsmount *mnt,
> +                                   const struct ovl_fh *fh)
> +{
> +       int bytes = (fh->len - offsetof(struct ovl_fh, fid));
> +
> +       /*
> +        * When redirect_fh is disabled, 'invalid' file handles are stored
> +        * to indicate that this entry has been copied up.
> +        */
> +       if (!bytes || (int)fh->type == FILEID_INVALID)
> +               return ERR_PTR(-ESTALE);
> +
> +       /*
> +        * Several layers can be on the same fs and decoded dentry may be in
> +        * either one of those layers. We are looking for a match of dentry
> +        * and mnt to find out to which layer the decoded dentry belongs to.
> +        */
> +       return exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> +                                 bytes >> 2, (int)fh->type,
> +                                 ovl_dentry_under_mnt, mnt);
> +}
> +
>  static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>                              const char *name, unsigned int namelen,
>                              size_t prelen, const char *post,
> -                            struct dentry **ret)
> +                            struct vfsmount *mnt, struct dentry **ret)

I think it would be better to split this function into path and fh
variants and extract the common parts into helper(s).

>  {
>         struct dentry *this;
>         int err;
>
> -       this = lookup_one_len_unlocked(name, base, namelen);
> +       /*
> +        * Lookup of upper is with null d->fh.
> +        * Lookup of lower is either by_fh with non-null d->fh
> +        * or by_path with null d->fh.
> +        */
> +       if (d->fh)
> +               this = ovl_lookup_fh(mnt, d->fh);
> +       else
> +               this = lookup_one_len_unlocked(name, base, namelen);
>         if (IS_ERR(this)) {
>                 err = PTR_ERR(this);
>                 this = NULL;
>                 if (err == -ENOENT || err == -ENAMETOOLONG)
>                         goto out;
> +               if (d->fh && err == -ESTALE)
> +                       goto out;
>                 goto out_err;
>         }
> +
> +       /* If found by file handle - don't follow that handle again */
> +       kfree(d->fh);
> +       d->fh = NULL;
> +
>         if (!this->d_inode)
>                 goto put_and_out;
>
> @@ -135,9 +235,18 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>                 d->stop = d->opaque = true;
>                 goto out;
>         }
> -       err = ovl_check_redirect(this, d, prelen, post);
> -       if (err)
> -               goto out_err;
> +       if (d->last)
> +               goto out;
> +       if (d->by_path) {
> +               err = ovl_check_redirect(this, d, prelen, post);
> +               if (err)
> +                       goto out_err;
> +       }
> +       if (d->by_fh) {
> +               err = ovl_check_redirect_fh(this, d);
> +               if (err)
> +                       goto out_err;
> +       }
>  out:
>         *ret = this;
>         return 0;
> @@ -152,6 +261,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>         return err;
>  }
>
> +static int ovl_lookup_layer_fh(struct path *path, struct ovl_lookup_data *d,
> +                              struct dentry **ret)
> +{
> +       return ovl_lookup_single(path->dentry, d, "", 0, 0, "", path->mnt, ret);
> +}
> +
>  static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>                             struct dentry **ret)
>  {
> @@ -162,7 +277,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>
>         if (d->name.name[0] != '/')
>                 return ovl_lookup_single(base, d, d->name.name, d->name.len,
> -                                        0, "", ret);
> +                                        0, "", NULL, ret);
>
>         while (!IS_ERR_OR_NULL(base) && d_can_lookup(base)) {
>                 const char *s = d->name.name + d->name.len - rem;
> @@ -175,7 +290,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>                         return -EIO;
>
>                 err = ovl_lookup_single(base, d, s, thislen,
> -                                       d->name.len - rem, next, &base);
> +                                       d->name.len - rem, next, NULL, &base);
>                 dput(dentry);
>                 if (err)
>                         return err;
> @@ -220,6 +335,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         const struct cred *old_cred;
>         struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
>         struct ovl_entry *poe = dentry->d_parent->d_fsdata;
> +       struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
>         struct path *stack = NULL;
>         struct dentry *upperdir, *upperdentry = NULL;
>         unsigned int ctr = 0;
> @@ -235,7 +351,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 .opaque = false,
>                 .stop = false,
>                 .last = !poe->numlower,
> +               .by_path = true,
>                 .redirect = NULL,
> +               .by_fh = true,
> +               .fh = NULL,
>         };
>
>         if (dentry->d_name.len > ofs->namelen)
> @@ -259,13 +378,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                         if (!upperredirect)
>                                 goto out_put_upper;
>                         if (d.redirect[0] == '/')
> -                               poe = dentry->d_sb->s_root->d_fsdata;
> +                               poe = roe;
>                 }
>                 if (d.opaque)
>                         type |= __OVL_PATH_OPAQUE;
>         }
>
> -       if (!d.stop && poe->numlower) {
> +       /*
> +        * For now we only support lower by fh in single layer, because
> +        * fallback from lookup by fh to lookup by path in mid layers for
> +        * merge directory is not yet implemented.
> +        */
> +       if (!ofs->redirect_fh || ofs->numlower > 1) {
> +               kfree(d.fh);
> +               d.fh = NULL;
> +       }
> +
> +       if (!d.stop && (poe->numlower || d.fh)) {
>                 err = -ENOMEM;
>                 stack = kcalloc(ofs->numlower, sizeof(struct path),
>                                 GFP_TEMPORARY);
> @@ -273,6 +402,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                         goto out_put_upper;
>         }
>
> +       /* Try to lookup lower layers by file handle */
> +       d.by_path = false;
> +       for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
> +               struct path lowerpath = poe->lowerstack[i];
> +
> +               d.last = i == poe->numlower - 1;
> +               err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
> +               if (err)
> +                       goto out_put;
> +
> +               if (!this)
> +                       continue;
> +
> +               stack[ctr].dentry = this;
> +               stack[ctr].mnt = lowerpath.mnt;
> +               ctr++;
> +               /*
> +                * Found by fh - won't lookup by path.
> +                * TODO: set d.redirect to dentry_path(this),
> +                *       so lookup can continue by path.
> +                */
> +               d.stop = true;
> +       }
> +
> +       /* Fallback to lookup lower layers by path */
> +       d.by_path = true;
> +       d.by_fh = false;
> +       kfree(d.fh);
> +       d.fh = NULL;
>         for (i = 0; !d.stop && i < poe->numlower; i++) {
>                 struct path lowerpath = poe->lowerstack[i];
>
> @@ -291,10 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 if (d.stop)
>                         break;
>
> -               if (d.redirect &&
> -                   d.redirect[0] == '/' &&
> -                   poe != dentry->d_sb->s_root->d_fsdata) {
> -                       poe = dentry->d_sb->s_root->d_fsdata;
> +               if (d.redirect && d.redirect[0] == '/' && poe != roe) {
> +                       poe = roe;
>
>                         /* Find the current layer on the root dentry */
>                         for (i = 0; i < poe->numlower; i++)
> @@ -354,6 +510,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         dput(upperdentry);
>         kfree(upperredirect);
>  out:
> +       kfree(d.fh);
>         kfree(d.redirect);
>         revert_creds(old_cred);
>         return ERR_PTR(err);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c3cfbc5..08002ce 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -190,6 +190,7 @@ const char *ovl_dentry_get_redirect(struct dentry *dentry);
>  void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
>  bool ovl_redirect_fh(struct super_block *sb);
>  void ovl_clear_redirect_fh(struct super_block *sb);
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size);
>  void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
>  void ovl_inode_init(struct inode *inode, struct inode *realinode,
>                     bool is_upper);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index b3bc117..dba9753 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -254,6 +254,20 @@ void ovl_clear_redirect_fh(struct super_block *sb)
>         ofs->redirect_fh = false;
>  }
>
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size)
> +{
> +       struct ovl_fh *fh = (void *)redirect;
> +
> +       if (size < sizeof(struct ovl_fh) || size < fh->len)
> +               return false;
> +
> +       if (fh->version > OVL_FH_VERSION ||
> +           fh->magic != OVL_FH_MAGIC)
> +               return false;
> +
> +       return true;
> +}
> +
>  void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
>  {
>         struct ovl_entry *oe = dentry->d_fsdata;
> --
> 2.7.4
>


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-25 15:13   ` Miklos Szeredi
@ 2017-04-25 17:41     ` Amir Goldstein
  2017-04-25 19:11       ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 17:41 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 6:13 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> When overlay.fh xattr is found in a directory inode, instead of lookup
>> of the dentry in next lower layer by name, first try to get it by calling
>> exportfs_decode_fh().
>>
>> On failure to lookup by file handle to lower layer, fall back to lookup
>> by name with or without path redirect.
>>
>> For now we only support following by file handle from upper if there is a
>> single lower layer, because fallback from lookup by file handle to lookup
>> by path in mid layers is not yet implemented.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/namei.c     | 185 +++++++++++++++++++++++++++++++++++++++++++----
>>  fs/overlayfs/overlayfs.h |   1 +
>>  fs/overlayfs/util.c      |  14 ++++
>>  3 files changed, 186 insertions(+), 14 deletions(-)
>>
[...]
>>
>> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
>> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
>> +{
>> +       struct dentry *p;
>> +
>> +       for (p = p2; !IS_ROOT(p); p = p->d_parent) {
>> +               if (d_unhashed(p))
>> +                       return false;
>> +               if (p->d_parent == p1)
>> +                       return true;
>> +       }
>> +       return false;
>> +}
>
> Walking the dentry tree without RCU protection is dangerous and broken.
>

I wonder if is_subdir() would be correct here?
Or I could just follow its lead to implement the parent walk correctly.
I did want to verify that the found dentry is not only 'connected' to
root, but also 'lookable', because I don't want to find a deleted file
when looking in lower layers.
Maybe that was too much and in any case, I could just verify that
the decoded dentry itself is hashed.

> I'm also wondering if there's a better way to find the layer

The purpose of this test is not only to find the layer, but
also to verify that the found inode is linked under the layer root.
I think that decode_fh() will always be able to create a disconnected
dentry if decoding an inode that is on the same sb as the layer where
fh was encoded. I'm pretty sure this was what I found in my initial
tests which made me write the broken ovl_is_lookable().

> (e.g. store the layer index in the handle as well).
>

But the layer index is a volatile number that can change.
I would like to be able to find by fh also when more layers are added
to the stack.

The only thing I can think of is to store sb_uuid+layer_root_fh+lower_fh.
At mount, we build a hash of the lower sb_uuid (save same_lower_uuid
for now).
At lookup, we first find lower_sb by uuid (verify same_lower_uuid for now),
then decode lower_root by root_fh, then find lower_mnt by lower_root,
then decode lower_fh with lower_mnt.

Sound reasonable?


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-25 17:41     ` Amir Goldstein
@ 2017-04-25 19:11       ` Amir Goldstein
  2017-04-26  9:06         ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 19:11 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 8:41 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 6:13 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> When overlay.fh xattr is found in a directory inode, instead of lookup
>>> of the dentry in next lower layer by name, first try to get it by calling
>>> exportfs_decode_fh().
>>>
>>> On failure to lookup by file handle to lower layer, fall back to lookup
>>> by name with or without path redirect.
>>>
>>> For now we only support following by file handle from upper if there is a
>>> single lower layer, because fallback from lookup by file handle to lookup
>>> by path in mid layers is not yet implemented.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/namei.c     | 185 +++++++++++++++++++++++++++++++++++++++++++----
>>>  fs/overlayfs/overlayfs.h |   1 +
>>>  fs/overlayfs/util.c      |  14 ++++
>>>  3 files changed, 186 insertions(+), 14 deletions(-)
>>>
> [...]
>>>
>>> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
>>> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
>>> +{
>>> +       struct dentry *p;
>>> +
>>> +       for (p = p2; !IS_ROOT(p); p = p->d_parent) {
>>> +               if (d_unhashed(p))
>>> +                       return false;
>>> +               if (p->d_parent == p1)
>>> +                       return true;
>>> +       }
>>> +       return false;
>>> +}
>>
>> Walking the dentry tree without RCU protection is dangerous and broken.
>>
>
> I wonder if is_subdir() would be correct here?
> Or I could just follow its lead to implement the parent walk correctly.
> I did want to verify that the found dentry is not only 'connected' to
> root, but also 'lookable', because I don't want to find a deleted file
> when looking in lower layers.
> Maybe that was too much and in any case, I could just verify that
> the decoded dentry itself is hashed.
>
>> I'm also wondering if there's a better way to find the layer
>
> The purpose of this test is not only to find the layer, but
> also to verify that the found inode is linked under the layer root.
> I think that decode_fh() will always be able to create a disconnected
> dentry if decoding an inode that is on the same sb as the layer where
> fh was encoded. I'm pretty sure this was what I found in my initial
> tests which made me write the broken ovl_is_lookable().
>
>> (e.g. store the layer index in the handle as well).
>>
>
> But the layer index is a volatile number that can change.
> I would like to be able to find by fh also when more layers are added
> to the stack.
>
> The only thing I can think of is to store sb_uuid+layer_root_fh+lower_fh.
> At mount, we build a hash of the lower sb_uuid (save same_lower_uuid
> for now).
> At lookup, we first find lower_sb by uuid (verify same_lower_uuid for now),
> then decode lower_root by root_fh, then find lower_mnt by lower_root,
> then decode lower_fh with lower_mnt.
>
> Sound reasonable?

Or maybe like this:

At mount time either set or verify the xattr in upper layer root inode:
overlay.root.$i [i:=0..numlower-1] - ovl_root_id of lower layer i
ovl_root_id includes for each layer:
- sb uuid
- fh of root inode

If mount was able to set or verify that all ovl_root_id[i] match their
respective lower layer sb and root inode, then redirect_fh can be enabled,
otherwise it is disabled.

With redirect_fh enabled, it is safe to lookup by the lower layer index,
root fh and lower inode fh.
With redirect_fh enabled, it is safe to store handles on copy up along
with lower layer index and root fh.

A lower layer can be used and reused by any number of overlay mounts
at different layer index.

An upper layer can be reused in an overlay mount with either copied lower
layers or with different lower stack and will have redirect_fh disabled.

An upper layer can be rotated as lower layer, because file handles are
never followed from a lower layer. Constant inode numbers code does
not need to follow by fh from lower layers.

With this scheme, there is no need to store or match sb_uuid for
every single copy up and every single lookup by fh.
There is no need to 'lookup' the layer, just use the index and compare
the root_fh.

It is quite safe from following handles to wrong fs, except if user copies
parts of an upper layer (without the layer root), but doing something like
that is equivalent to a user that takes down an NFS server, brings up
a server with the same network address and exports the same share
name from a different filesystem.

Maybe the chances are more slim, but the same interesting things could
happen.
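
To make the mount time check above concrete, here is a rough userspace
sketch (not kernel code). The names toy_root_id, toy_root_id_match() and
toy_redirect_fh_allowed() are made up for illustration, and the 20 byte
handle limit is only a placeholder. It assumes the stored overlay.root.$i
values and the values recomputed from the lower layers have already been
read into arrays; redirect_fh stays enabled only if every pair matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_UUID_LEN 16
#define TOY_FH_MAX   20	/* placeholder size, not taken from the patches */

/* Hypothetical in-memory form of one overlay.root.$i value */
struct toy_root_id {
	unsigned char uuid[TOY_UUID_LEN];	/* sb uuid of the lower layer */
	unsigned char root_fh[TOY_FH_MAX];	/* fh of the layer root inode */
	size_t root_fh_len;
};

/* Two root ids match when both the uuid and the root fh bytes are equal */
static bool toy_root_id_match(const struct toy_root_id *a,
			      const struct toy_root_id *b)
{
	return a->root_fh_len == b->root_fh_len &&
	       a->root_fh_len <= TOY_FH_MAX &&
	       memcmp(a->uuid, b->uuid, TOY_UUID_LEN) == 0 &&
	       memcmp(a->root_fh, b->root_fh, a->root_fh_len) == 0;
}

/* redirect_fh is allowed only if every lower layer verifies */
static bool toy_redirect_fh_allowed(const struct toy_root_id *stored,
				    const struct toy_root_id *actual,
				    size_t numlower)
{
	size_t i;

	assert(stored && actual);
	for (i = 0; i < numlower; i++)
		if (!toy_root_id_match(&stored[i], &actual[i]))
			return false;
	return true;
}
```

A single mismatching layer disables redirect_fh for the whole mount in
this sketch, which is the conservative reading of the rule above.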

Amir.


* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-25 14:53   ` Miklos Szeredi
@ 2017-04-26  5:47     ` Amir Goldstein
  2017-04-26  9:21       ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26  5:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Sometimes it is interesting to know if an upper file is pure
>> upper or a copy up target, and if it is a copy up target, it
>> may be interesting to find the copy up origin.
>>
>> This will be used to preserve lower inode numbers across copy up.
>>
>> Store the lower inode file handle in upper inode xattr overlay.fh
>> on copy up to use it later for these cases.
>>
>> On failure to encode lower file handle, store an invalid 'null'
>> handle, so we can always use the overlay.fh xattr to distinguish
>> between a copy up and a pure upper inode.
>>
>> If lower fs does not support NFS export ops or if not all lower
>> layers are on the same fs, don't try to encode a lower file handle
>> and use the 'null' handle instead.
>
> Decoding fh on wrong fs is going to result in "interesting"
> possibilities, so I think we should be storing some kind of identifier
> about the layer from the very start.
>
> The trivial way to do that would be to encode the filesystem's UUID
> into the stored fh.  Problem seems to be that only ext4 is setting
> sb->s_uuid.  Probably not too hard to fix the others.
>

xfs supports sb->s_export_op->get_uuid() (and seems to be the only
fs that supports exportfs block ops). It may be more appropriate
for our use case (universal unique file handle) to use this API
and add support for it in other fs.
We can also use the existence of sb->s_export_op->get_uuid
as a promise for a persistent/exportable sb uuid instead of assuming
that sb->s_uuid has such properties.

> When decoding, trivial to check in the samefs case, but we'd need a
> table for the uuid->layer lookup for the non-samefs case. But that can
> wait, I'd be content with just having the infrastructure there and
> just using it to verify the handle for now.
>

Sounds good.
I'll do the same_lower_sb implementation for v3.

Thanks,
Amir.


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-25 19:11       ` Amir Goldstein
@ 2017-04-26  9:06         ` Miklos Szeredi
  2017-04-26  9:40           ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26  9:06 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Apr 25, 2017 at 9:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Or maybe like this:
>
> At mount time either set or verify the xattr in upper layer root inode:
> overlay.root.$i [i:=0..numlower-1] - ovl_root_id of lower layer i
> ovl_root_id includes for each layer:
> - sb uuid
> - fh of root inode
>
> If mount was able to set or verify that all ovl_root_id[i] match their
> respective lower layer sb and root inode, then redirect_fh can be enabled,
> otherwise it is disabled.
>
> With redirect_fh enabled, it is safe to lookup by the lower layer index,
> root fh and lower inode fh.
> With redirect_fh enabled, it is safe to store handles on copy up along
> with lower layer index and root fh.
>
> A lower layer can be used and reused by any number of overlay mounts
> at different layer index.
>
> An upper layer can be reused in an overlay mount with either copied lower
> layers or with different lower stack and will have redirect_fh disabled.
>
> An upper layer can be rotated as lower layer, because file handles are
> never followed from a lower layer. Constant inode numbers code does
> not need to follow by fh from lower layers.
>
> With this scheme, there is no need to store or match sb_uuid for
> every single copy up and every single lookup by fh.
> There is no need to 'lookup' the layer, just use the index and compare
> the root_fh.
>
> It is quite safe from following handles to wrong fs, except if user copies
> parts of an upper layer (without the layer root), but doing something like
> that is equivalent to a user that takes down an NFS server, brings up
> a server with the same network address and exports the same share
> name from a different filesystem.
>
> Maybe the chances are more slim, but the same interesting things could
> happen.

Checking UUID would be O(1) and very fast, so I wouldn't worry about
that.  Using is_subdir() to verify the layer is O(depth) but still
very fast.  I don't think that's an issue either.

Using is_subdir() to find the layer would be O(depth*numlower).  But
we can optimize that if we really want to:  have a function that
returns the first ancestor of a dentry that is a layer root (marked
with a flag).  Then we just need to map that dentry to the layer,
which can be done with a hash table or whatever.

And anyway uncached lookup will be slow, and we are only doing this
for copied up files and directories.  So I don't think we need to
worry too much about optimizing this.

So for now let's just go with the original patch but replace
ovl_is_lookable() with is_subdir().
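
For reference, a small userspace model of the two walks discussed above
(this is not the kernel's is_subdir(), which takes the proper locks and
uses RCU). The toy_dentry type, its is_layer_root flag and both helper
names are hypothetical. The first helper is the O(depth) "is this under
that layer root" check; the second is the possible optimization: walk
up once and stop at the first ancestor marked as a layer root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: only a parent link and a layer-root mark */
struct toy_dentry {
	struct toy_dentry *parent;	/* fs root points to itself */
	bool is_layer_root;		/* hypothetical flag set at mount */
	int layer;			/* meaningful only for layer roots */
};

/* O(depth): true if 'child' is 'ancestor' itself or lives below it */
static bool toy_is_subdir(const struct toy_dentry *child,
			  const struct toy_dentry *ancestor)
{
	const struct toy_dentry *p = child;

	assert(child && ancestor);
	for (;;) {
		if (p == ancestor)
			return true;
		if (p->parent == p)	/* reached the fs root */
			return false;
		p = p->parent;
	}
}

/* O(depth): first ancestor (or self) marked as a layer root, else NULL */
static const struct toy_dentry *toy_layer_root(const struct toy_dentry *d)
{
	assert(d);
	for (;;) {
		if (d->is_layer_root)
			return d;
		if (d->parent == d)
			return NULL;
		d = d->parent;
	}
}
```

Calling toy_is_subdir() once per layer gives the O(depth*numlower)
cost; toy_layer_root() needs one walk regardless of numlower, and the
returned node can then be mapped to its layer.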

Thanks,
Miklos


* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-26  5:47     ` Amir Goldstein
@ 2017-04-26  9:21       ` Miklos Szeredi
  2017-04-26  9:27         ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26  9:21 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 7:47 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Sometimes it is interesting to know if an upper file is pure
>>> upper or a copy up target, and if it is a copy up target, it
>>> may be interesting to find the copy up origin.
>>>
>>> This will be used to preserve lower inode numbers across copy up.
>>>
>>> Store the lower inode file handle in upper inode xattr overlay.fh
>>> on copy up to use it later for these cases.
>>>
>>> On failure to encode lower file handle, store an invalid 'null'
>>> handle, so we can always use the overlay.fh xattr to distinguish
>>> between a copy up and a pure upper inode.
>>>
>>> If lower fs does not support NFS export ops or if not all lower
>>> layers are on the same fs, don't try to encode a lower file handle
>>> and use the 'null' handle instead.
>>
>> Decoding fh on wrong fs is going to result in "interesting"
>> possibilities, so I think we should be storing some kind of identifier
>> about the layer from the very start.
>>
>> The trivial way to do that would be to encode the filesystem's UUID
>> into the stored fh.  Problem seems to be that only ext4 is setting
>> sb->s_uuid.  Probably not too hard to fix the others.
>>
>
> xfs supports sb->s_export_op->get_uuid() (and seems to be the only
> fs that supports exportfs block ops). It may be more appropriate
> for our use case (universal unique file handle) to use this API
> and add support for it in other fs.
> We can also use the existence of sb->s_export_op->get_uuid
> as a promise for a persistent/exportable sb uuid instead of assuming
> that sb->s_uuid has such properties.

Right, if ->get_uuid() could be made to work on all exportable fs,
then that would be good.

The "offset" argument worries me a little.   And we'd need to get rid
of the printk in the xfs code (or move it to pnfsd, which is where it
belongs).

Thanks,
Miklos


* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-26  9:21       ` Miklos Szeredi
@ 2017-04-26  9:27         ` Amir Goldstein
  2017-04-26  9:35           ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26  9:27 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 12:21 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 7:47 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Tue, Apr 25, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Sometimes it is interesting to know if an upper file is pure
>>>> upper or a copy up target, and if it is a copy up target, it
>>>> may be interesting to find the copy up origin.
>>>>
>>>> This will be used to preserve lower inode numbers across copy up.
>>>>
>>>> Store the lower inode file handle in upper inode xattr overlay.fh
>>>> on copy up to use it later for these cases.
>>>>
>>>> On failure to encode lower file handle, store an invalid 'null'
>>>> handle, so we can always use the overlay.fh xattr to distinguish
>>>> between a copy up and a pure upper inode.
>>>>
>>>> If lower fs does not support NFS export ops or if not all lower
>>>> layers are on the same fs, don't try to encode a lower file handle
>>>> and use the 'null' handle instead.
>>>
>>> Decoding fh on wrong fs is going to result in "interesting"
>>> possibilities, so I think we should be storing some kind of identifier
>>> about the layer from the very start.
>>>
>>> The trivial way to do that would be to encode the filesystem's UUID
>>> into the stored fh.  Problem seems to be that only ext4 is setting
>>> sb->s_uuid.  Probably not too hard to fix the others.
>>>
>>
>> xfs supports sb->s_export_op->get_uuid() (and seems to be the only
>> fs that supports exportfs block ops). It may be more appropriate
>> for our use case (universal unique file handle) to use this API
>> and add support for it in other fs.
>> We can also use the existence of sb->s_export_op->get_uuid
>> as a promise for a persistent/exportable sb uuid instead of assuming
>> that sb->s_uuid has such properties.
>
> Right, if ->get_uuid() could be made to work on all exportable fs,
> then that would be good.
>
> The "offset" argument worries me a little.   And we'd need to get rid
> of the printk in the xfs code (or move it to pnfsd, which is where it
> belongs).
>

The offset argument is discard-able, it gives you more information
than we need.
Another problem is that ->get_uuid for xfs is compiled out by default
without CONFIG_PNFSD, although this could be changed.

Anyway, I have a very simple patch for xfs to set sb->s_uuid.
btrfs has several uuid's (i.e. subvolumes) on the same sb struct IIUC,
so need to see how to handle this.

Amir.


* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-26  9:27         ` Amir Goldstein
@ 2017-04-26  9:35           ` Miklos Szeredi
  0 siblings, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26  9:35 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 11:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:

>
> The offset argument is discard-able, it gives you more information
> than we need.

Sure, the problem with that is what should a filesystem put there
which cannot provide such an offset?   Is it optional?  What value
indicates invalid offset?

> Another problem is that ->get_uuid for xfs is compiled out by default
> without CONFIG_PNFSD, although this could be changed.
>
> Anyway, I have a very simple patch for xfs to set sb->s_uuid.
> btrfs has several uuid's (i.e. subvolumes) on the same sb struct IIUC,
> so need to see how to handle this.

Yes, actually btrfs wants some sort of lightweight superblock for
subvolumes with just s_dev and s_uuid.

Not sure what we can do about that for now...

Thanks,
Miklos


* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-24  9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
  2017-04-24 13:32   ` kbuild test robot
  2017-04-25 14:53   ` Miklos Szeredi
@ 2017-04-26  9:39   ` Miklos Szeredi
  2017-04-26  9:53     ` Amir Goldstein
  2 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26  9:39 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Sometimes it is interesting to know if an upper file is pure
> upper or a copy up target, and if it is a copy up target, it
> may be interesting to find the copy up origin.
>
> This will be used to preserve lower inode numbers across copy up.
>
> Store the lower inode file handle in upper inode xattr overlay.fh
> on copy up to use it later for these cases.
>
> On failure to encode lower file handle, store an invalid 'null'
> handle, so we can always use the overlay.fh xattr to distinguish
> between a copy up and a pure upper inode.
>
> If lower fs does not support NFS export ops or if not all lower
> layers are on the same fs, don't try to encode a lower file handle
> and use the 'null' handle instead.

One other question regarding this:  do we want  to store the handle of
the next file in the copy up chain or the handle of the original file?

This patch seems to do the "next file" thing.  For directories,
obviously that's what we want, but for files...

Thanks,
Miklos


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-26  9:06         ` Miklos Szeredi
@ 2017-04-26  9:40           ` Amir Goldstein
  2017-04-26  9:55             ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26  9:40 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 12:06 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Apr 25, 2017 at 9:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Or maybe like this:
>>
>> At mount time either set or verify the xattr in upper layer root inode:
>> overlay.root.$i [i:=0..numlower-1] - ovl_root_id of lower layer i
>> ovl_root_id includes for each layer:
>> - sb uuid
>> - fh of root inode
>>
>> If mount was able to set or verify that all ovl_root_id[i] match their
>> respective lower layer sb and root inode, then redirect_fh can be enabled,
>> otherwise it is disabled.
>>
>> With redirect_fh enabled, it is safe to lookup by the lower layer index,
>> root fh and lower inode fh.
>> With redirect_fh enabled, it is safe to store handles on copy up along
>> with lower layer index and root fh.
>>
>> A lower layer can be used and reused by any number of overlay mounts
>> at different layer index.
>>
>> An upper layer can be reused in an overlay mount with either copied lower
>> layers or with different lower stack and will have redirect_fh disabled.
>>
>> An upper layer can be rotated as lower layer, because file handles are
>> never followed from a lower layer. Constant inode numbers code does
>> not need to follow by fh from lower layers.
>>
>> With this scheme, there is no need to store or match sb_uuid for
>> every single copy up and every single lookup by fh.
>> There is no need to 'lookup' the layer, just use the index and compare
>> the root_fh.
>>
>> It is quite safe from following handles to wrong fs, except if user copies
>> parts of an upper layer (without the layer root), but doing something like
>> that is equivalent to a user that takes down an NFS server, brings up
>> a server with the same network address and exports the same share
>> name from a different filesystem.
>>
>> Maybe the chances are more slim, but the same interesting things could
>> happen.
>
> Checking UUID would be O(1) and very fast, so I wouldn't worry about
> that.  Using is_subdir() to verify the layer is O(depth) but still
> very fast.  I don't think that's an issue either.
>
> Using is_subdir() to find the layer would be O(depth*numlower).  But
> we can optimize that if we really want to:  have a function that
> returns the first ancestor of a dentry that is a layer root (marked
> with a flag).  Then we just need to map that dentry to the layer,
> which can be done with a hash table or whatever.
>
> And anyway uncached lookup will be slow, and we are only doing this
> for copied up files and directories.  So I don't think we need to
> worry too much about optimizing this.
>
> So for now let's just go with the original patch but replace
> ovl_is_lookable() with is_subdir().
>

Just to see that I understand you correctly.

I am now working on storing the following:

/*
 * The tuple origin.{fh,root,uuid} is a universally unique identifier
 * for a copy up origin, where:
 * origin.fh    - exported file handle of the lower file
 * origin.root  - exported file handle of the lower layer root
 * origin.uuid  - uuid of the lower filesystem
 *
 * origin.{fh,root} are stored in format of a variable length binary blob
 * with struct ovl_fh header (total blob size up to 20 bytes).
 * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
 */

I intend to implement lookup as follows:
- compare(origin.uuid, same_lower_sb->s_uuid)
# layer root dentries cannot be DCACHE_DISCONNECTED, so
# exportfs_decode_fh ignores mnt arg and returns the cached dentry
- root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
- find layer where lowerstack[layer].dentry == root
- this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)

is_subdir() is NOT needed for decoding the layer root
is_subdir() is optional for decoding the lower file, because
it is not needed to identify the layer

The lookup is O(numlower)+O(depth), where the O(depth) part is
just a precaution.
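
To make the layer-identification steps above concrete, here is a minimal
userspace sketch. It is only a model under stated assumptions, not kernel
code: struct layer, struct origin and find_layer() are hypothetical names,
a layer root dentry is modeled as an opaque integer id, and decoding
origin.root is assumed to have already produced that id.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical stand-in for one lowerstack[] entry. */
struct layer {
	unsigned long root_id;	/* models the pinned layer root dentry */
};

/* Hypothetical stand-in for the stored origin.{root,uuid} pair. */
struct origin {
	unsigned char uuid[UUID_LEN];	/* origin.uuid */
	unsigned long root_id;		/* what decoding origin.root yields */
};

/*
 * Return the index of the lower layer that the origin belongs to, or -1.
 * Step 1 is an O(1) uuid compare, step 2 is an O(numlower) scan.
 */
static int find_layer(const struct layer *lowerstack, size_t numlower,
		      const unsigned char *sb_uuid, const struct origin *o)
{
	size_t i;

	assert(lowerstack || numlower == 0);

	if (memcmp(sb_uuid, o->uuid, UUID_LEN) != 0)
		return -1;	/* origin comes from a different filesystem */

	for (i = 0; i < numlower; i++) {
		if (lowerstack[i].root_id == o->root_id)
			return (int)i;
	}
	return -1;	/* that layer root is not part of the current stack */
}
```

Only after the layer index is known would the file handle itself be decoded
against that layer's mount, which is where the optional O(depth) check comes in.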

Amir.


* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-26  9:39   ` Miklos Szeredi
@ 2017-04-26  9:53     ` Amir Goldstein
  2017-04-26  9:57       ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26  9:53 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Sometimes it is interesting to know if an upper file is pure
>> upper or a copy up target, and if it is a copy up target, it
>> may be interesting to find the copy up origin.
>>
>> This will be used to preserve lower inode numbers across copy up.
>>
>> Store the lower inode file handle in upper inode xattr overlay.fh
>> on copy up to use it later for these cases.
>>
>> On failure to encode lower file handle, store an invalid 'null'
>> handle, so we can always use the overlay.fh xattr to distinguish
>> between a copy up and a pure upper inode.
>>
>> If lower fs does not support NFS export ops or if not all lower
>> layers are on the same fs, don't try to encode a lower file handle
>> and use the 'null' handle instead.
>
> One other question regarding this:  do we want  to store the handle of
> the next file in the copy up chain or the handle of the original file?
>
> This patch seems to do the "next file" thing.  For directories,
> obviously that's what we want, but for files...
>

What I found when working on this is that any file below the uppermost
lower is of zero interest to us.

So I defined 'stable inode' and we only need to lookup stable inode:
Stable := uppermost lower (or upper if numlower == 0)
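
As a rough illustration of that definition, here is a userspace sketch.
The struct and function names are hypothetical, and it assumes the lower
inode numbers are held in an array whose first element is the uppermost
lower; it models only the selection rule, nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an overlay entry: inode numbers only. */
struct entry_model {
	unsigned long upper_ino;	/* 0 if there is no upper */
	const unsigned long *lower_ino;	/* lower_ino[0] is the uppermost lower */
	size_t numlower;
};

/* Stable := uppermost lower (or upper if numlower == 0) */
static unsigned long stable_ino(const struct entry_model *e)
{
	/* an entry must have at least one of upper or lower */
	assert(e->numlower > 0 || e->upper_ino != 0);

	if (e->numlower > 0)
		return e->lower_ino[0];
	return e->upper_ino;
}
```

Anything below lower_ino[0] is never consulted, which is the "zero interest"
point made above.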

For NFS export, Stable fh is unique enough, because
when rotating upper layer or any change of layer stack configuration,
NFS handles may become stale and this is fine.

Inode numbers are guaranteed to remain constant and persistent
as long as upper is not rotated.
Rotating upper will change stable inode numbers and this is fine
(regard it as cpio/tar of the filesystem).

Hardlinks will be preserved as long as lower stack configuration
doesn't change.
When upper is rotated the copy up hardlink bunch will be broken
from the non-copy-up hardlink bunch, which is quite a minor
concern IMO (cpio/tar don't always preserve hardlinks).

Amir.


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-26  9:40           ` Amir Goldstein
@ 2017-04-26  9:55             ` Miklos Szeredi
  2017-04-26 10:17               ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26  9:55 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:

> Just to see that I understand you correctly.
>
> I am now working on storing the following:
>
> /*
>  * The tuple origin.{fh,root,uuid} is a universal unique identifier
>  * for a copy up origin, where:
>  * origin.fh    - exported file handle of the lower file
>  * origin.root - exported file handle of the lower layer root
>  * origin.uuid  - uuid of the lower filesystem

I wouldn't even store origin.root.

>  *
>  * origin.{fh,root} are stored in format of a variable length binary blob
>  * with struct ovl_fh header (total blob size up to 20 bytes).
>  * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>  */
>
> I intend to implement lookup as follows:
> - compare(origin.uuid, same_lower_sb->s_uuid)
> # layer root dentries cannot be DCACHE_DISCONNECTED, so
> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
> - find layer where lowerstack[layer].dentry == root
> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>
> is_subdir() is NOT needed for decoding the layer root
> is_subdir() is optional for decoding the lower file, because
> it is not needed to identify the layer

Hmm, we can just force exportfs_decode_fh() to return a connected
dentry (return false from *acceptable() if the dentry is disconnected)
before going on to iterate the layers to see which one contains it.

Thanks,
Miklos


* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
  2017-04-26  9:53     ` Amir Goldstein
@ 2017-04-26  9:57       ` Miklos Szeredi
  0 siblings, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26  9:57 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 11:53 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Sometimes it is interesting to know if an upper file is pure
>>> upper or a copy up target, and if it is a copy up target, it
>>> may be interesting to find the copy up origin.
>>>
>>> This will be used to preserve lower inode numbers across copy up.
>>>
>>> Store the lower inode file handle in upper inode xattr overlay.fh
>>> on copy up to use it later for these cases.
>>>
>>> On failure to encode lower file handle, store an invalid 'null'
>>> handle, so we can always use the overlay.fh xattr to distinguish
>>> between a copy up and a pure upper inode.
>>>
>>> If lower fs does not support NFS export ops or if not all lower
>>> layers are on the same fs, don't try to encode a lower file handle
>>> and use the 'null' handle instead.
>>
>> One other question regarding this:  do we want  to store the handle of
>> the next file in the copy up chain or the handle of the original file?
>>
>> This patch seems to do the "next file" thing.  For directories,
>> obviously that's what we want, but for files...
>>
>
> What I found when working on this is that any file below the uppermost
> lower is of zero interest to us.
>
> So I defined 'stable inode' and we only need to lookup stable inode:
> Stable := uppermost lower (or upper if numlower == 0)
>
> For NFS export, Stable fh is unique enough, because
> when rotating upper layer or any change of layer stack configuration,
> NFS handles may become stale and this is fine.
>
> Inode numbers are guaranteed to remain constant and persistent
> as long as upper is not rotated.
> Rotating upper will change stable inode numbers and this is fine
> (regard it as cpio/tar of the filesystem).
>
> Hardlinks will be preserved as long as lower stack configuration
> doesn't change.
> When upper is rotated the copy up hardlink bunch will be broken
> from the non-copy-up hardlink bunch, which is quite a minor
> concern IMO (cpio/tar don't always preserve hardlinks).

Okay, makes sense.

Thanks,
Miklos


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-26  9:55             ` Miklos Szeredi
@ 2017-04-26 10:17               ` Amir Goldstein
  2017-04-26 12:15                 ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 10:17 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Just to see that I understand you correctly.
>>
>> I am now working on storing the following:
>>
>> /*
>>  * The tuple origin.{fh,root,uuid} is a universal unique identifier
>>  * for a copy up origin, where:
>>  * origin.fh    - exported file handle of the lower file
>>  * origin.root - exported file handle of the lower layer root
>>  * origin.uuid  - uuid of the lower filesystem
>
> I wouldn't even store origin.root.
>
>>  *
>>  * origin.{fh,root} are stored in format of a variable length binary blob
>>  * with struct ovl_fh header (total blob size up to 20 bytes).
>>  * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>>  */
>>
>> I intend to implement lookup as follows:
>> - compare(origin.uuid, same_lower_sb->s_uuid)
>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>> - find layer where lowerstack[layer].dentry == root
>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>
>> is_subdir() is NOT needed for decoding the layer root
>> is_subdir() is optional for decoding the lower file, because
>> it is not needed to identify the layer
>
> Hmm, we can just force exportfs_decode_fh() to return a connected
> dentry (return false from *acceptable() if the dentry is disconnected)
> before going on to iterate the layers to see which one contains it.
>

Hmm, this might work, but to quote from exportfs_decode_fh():
"It's not a directory.  Life is a little more complicated."

IIUC, 'connected' means 'connected to sb root', and not
'connected to mnt root', so in the optimal case where
all lower dentries are cached,  exportfs_decode_fh() will return
a connected dentry for every fh we give it regardless of the
mnt argument, so we will have to use is_subdir() to find the
right layer, which brings us back to O(numlower*depth)
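
To show where the O(numlower*depth) comes from, here is a small userspace
model of an is_subdir()-style search. The names (struct node, is_under(),
find_layer_by_walk()) are made up for the illustration, and a dentry is
reduced to a parent pointer; it is a sketch of the cost, not of the real
dcache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the parent link matters here. */
struct node {
	const struct node *parent;	/* NULL at the sb root */
};

/* Walk up from 'd' looking for 'ancestor': O(depth). */
static int is_under(const struct node *d, const struct node *ancestor)
{
	for (; d; d = d->parent) {
		if (d == ancestor)
			return 1;
	}
	return 0;
}

/* Try every layer root in turn: O(numlower * depth) in the worst case. */
static int find_layer_by_walk(const struct node *const *roots,
			      size_t numlower, const struct node *d)
{
	size_t i;

	assert(roots || numlower == 0);

	for (i = 0; i < numlower; i++) {
		if (is_under(d, roots[i]))
			return (int)i;
	}
	return -1;	/* connected to the sb root, but under no layer */
}
```

A dentry that is connected to the sb root but lies outside every layer still
costs a full walk per layer before the search gives up.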

With the extra cost of storing the deducible information origin.root,
we will have less complex and more efficient lookup code.

Let me try and implement it and see if I am right.
We can always discard origin.root from v4 if it turns
out to be unhelpful.

Amir.


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-26 10:17               ` Amir Goldstein
@ 2017-04-26 12:15                 ` Miklos Szeredi
  2017-04-26 14:51                   ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 12:15 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 12:17 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> Just to see that I understand you correctly.
>>>
>>> I am now working on storing the following:
>>>
>>> /*
>>>  * The tuple origin.{fh,root,uuid} is a universal unique identifier
>>>  * for a copy up origin, where:
>>>  * origin.fh    - exported file handle of the lower file
>>>  * origin.root - exported file handle of the lower layer root
>>>  * origin.uuid  - uuid of the lower filesystem
>>
>> I wouldn't even store origin.root.
>>
>>>  *
>>>  * origin.{fh,root} are stored in format of a variable length binary blob
>>>  * with struct ovl_fh header (total blob size up to 20 bytes).
>>>  * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>>>  */
>>>
>>> I intend to implement lookup as follows:
>>> - compare(origin.uuid, same_lower_sb->s_uuid)
>>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>>> - find layer where lowerstack[layer].dentry == root
>>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>>
>>> is_subdir() is NOT needed for decoding the layer root
>>> is_subdir() is optional for decoding the lower file, because
>>> it is not needed to identify the layer
>>
>> Hmm, we can just force exportfs_decode_fh() to return a connected
>> dentry (return false from *acceptable() if the dentry is disconnected)
>> before going on to iterate the layers to see which one contains it.
>>
>
> Hmm, this might work, but to quote from exportfs_decode_fh():
> "It's not a directory.  Life is a little more complicated."
>
> IIUC, 'connected' means 'connected to sb root', and not
> 'connected to mnt root', so in the optimal case where
> all lower dentries are cached,  exportfs_decode_fh() will return
> a connected dentry for every fh we give it regardless of the
> mnt argument, so we will have to use is_subdir() to find the
> right layer, which brings us back to O(numlower*depth)

It just means that we might have to make up an artificial mount which
has its root at the sb root to be able to decode the handle into a
connected one.

>
> With the extra cost of storing the deducible information origin.root,
> we will have less complex and more efficient lookup code.
>
> Let me try and implement it and see if I am right.
> We can always discard origin.root from v4 if it turns
> out to be unhelpful.

I don't have good feelings about storing the root fh just because we
don't special case the layer root anywhere yet, and I wouldn't want to
do that unless there's a good reason.

Thanks,
Miklos


* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
  2017-04-24  9:14 ` [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs Amir Goldstein
@ 2017-04-26 14:40   ` Miklos Szeredi
  2017-04-26 14:53     ` Miklos Szeredi
  2017-04-26 14:57     ` Amir Goldstein
  0 siblings, 2 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 14:40 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
> Define a new type flag OVL_TYPE_COPYUP to indicate that an entry is
> a target of a copy up.
>
> For directory entries COPYUP = MERGE && UPPER. For non-dir entries
> non zero oe->numlower implies COPYUP, but COPYUP does not imply
> non zero oe->numlower.  COPYUP can also be set on lookup when detecting
> an overlay.fh xattr on a non-dir, even if that fh cannot be followed.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/namei.c     |  3 +++
>  fs/overlayfs/overlayfs.h |  2 ++
>  fs/overlayfs/util.c      | 12 ++++++++----
>  3 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index 318092a..73a8879 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -386,6 +386,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 }
>                 if (d.opaque)
>                         type |= __OVL_PATH_OPAQUE;
> +               /* overlay.fh xattr implies this is a copy up */
> +               if (d.fh)
> +                       type |= __OVL_PATH_COPYUP;
>         }
>
>         /*
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 08002ce..d0bb538 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -13,11 +13,13 @@ enum ovl_path_type {
>         __OVL_PATH_UPPER        = (1 << 0),
>         __OVL_PATH_MERGE        = (1 << 1),
>         __OVL_PATH_OPAQUE       = (1 << 2),
> +       __OVL_PATH_COPYUP       = (1 << 3),
>  };
>
>  #define OVL_TYPE_UPPER(type)   ((type) & __OVL_PATH_UPPER)
>  #define OVL_TYPE_MERGE(type)   ((type) & __OVL_PATH_MERGE)
>  #define OVL_TYPE_OPAQUE(type)  ((type) & __OVL_PATH_OPAQUE)
> +#define OVL_TYPE_COPYUP(type)  ((type) & __OVL_PATH_COPYUP)
>
>  #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
>  #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index dba9753..89789bc 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -101,11 +101,15 @@ enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
>         if (oe->__upperdentry) {
>                 type |= __OVL_PATH_UPPER;
>                 /*
> -                * Non-dir dentry can hold lower dentry from before
> -                * copy-up.
> +                * oe->numlower implies a copy up, but copy up does not imply
> +                * oe->numlower.  It can also be set on lookup when detecting
> +                * an overlay.fh xattr on a non-dir that cannot be followed.

The code looks fine, but I don't understand the comment.  Why would we
set COPYUP flag when the fh cannot be followed?

The reason I think the COPYUP vs. MERGE distinction is needed is the
ovl_check_empty_and_clear() thing.  It starts with a merged directory
with some whiteouts in it and exchanges it with an empty and opaque
directory.   Normally the empty directory will be deleted immediately,
but if something fails during the deletion, then it will remain there.
  The overlay is left in a consistent state, but the association with
the original inode should still remain, so it will have COPYUP but not
MERGE.
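
A userspace sketch of the flag rules from the commit message, applied to
this leftover-directory case. The enum and struct names are hypothetical
and only entries that have an upper are modeled; treat it as a way to
check the reasoning, not as the ovl_update_type() implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum path_type {
	PATH_UPPER  = 1 << 0,
	PATH_MERGE  = 1 << 1,
	PATH_OPAQUE = 1 << 2,
	PATH_COPYUP = 1 << 3,
};

/* Hypothetical per-entry state gathered at lookup time. */
struct entry_state {
	bool is_dir;
	bool has_upper;
	bool opaque;
	bool has_fh;		/* overlay.fh xattr found on the upper */
	unsigned int numlower;
};

static int model_type(const struct entry_state *s)
{
	int type = 0;

	assert(s);
	if (!s->has_upper)
		return type;	/* lower-only entries are outside this sketch */

	type |= PATH_UPPER;
	if (s->opaque)
		type |= PATH_OPAQUE;
	if (s->numlower) {
		/* non zero numlower implies a copy up ... */
		type |= PATH_COPYUP;
		/* ... and, for directories, a merge */
		if (s->is_dir)
			type |= PATH_MERGE;
	}
	/* overlay.fh implies a copy up even with nothing to merge */
	if (s->has_fh)
		type |= PATH_COPYUP;
	return type;
}
```

With is_dir, has_upper, opaque and has_fh set and numlower == 0, the result
has COPYUP but not MERGE, which is the state described above.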

Now the current code is actually broken, because we leave the old,
replaced directory in __upperdentry as well as the rest of the lower
stack.  So should the deletion fail after the replacement things won't
work properly.

I think we can fix that by replacing __upperdentry.  Luckily we are
under inode lock, so protected against concurrent readdir or creation
inside the directory.  Then we have lifetime problems.  Until now a
positive __upperdentry was assumed to have a lifetime equal to that of
the overlay dentry.  We'd need an old_upperdentry to save it.  I think
that's it, but maybe there are other issues.

Thanks,
Miklos


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-26 12:15                 ` Miklos Szeredi
@ 2017-04-26 14:51                   ` Amir Goldstein
  2017-04-27  6:27                     ` Amir Goldstein
  2017-04-27  7:40                     ` Miklos Szeredi
  0 siblings, 2 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 14:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 3:15 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 12:17 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>
>>>> Just to see that I understand you correctly.
>>>>
>>>> I am now working on storing the following:
>>>>
>>>> /*
>>>>  * The tuple origin.{fh,root,uuid} is a universal unique identifier
>>>>  * for a copy up origin, where:
>>>>  * origin.fh    - exported file handle of the lower file
>>>>  * origin.root - exported file handle of the lower layer root
>>>>  * origin.uuid  - uuid of the lower filesystem
>>>
>>> I wouldn't even store origin.root.
>>>
>>>>  *
>>>>  * origin.{fh,root} are stored in format of a variable length binary blob
>>>>  * with struct ovl_fh header (total blob size up to 20 bytes).
>>>>  * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>>>>  */
>>>>
>>>> I intend to implement lookup as follows:
>>>> - compare(origin.uuid, same_lower_sb->s_uuid)
>>>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>>>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>>>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>>>> - find layer where lowerstack[layer].dentry == root
>>>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>>>
>>>> is_subdir() is NOT needed for decoding the layer root
>>>> is_subdir() is optional for decoding the lower file, because
>>>> it is not needed to identify the layer
>>>
>>> Hmm, we can just force exportfs_decode_fh() to return a connected
>>> dentry (return false from *acceptable() if the dentry is disconnected)
>>> before going on to iterate the layers to see which one contains it.
>>>
>>
>> Hmm, this might work, but to quote from exportfs_decode_fh():
>> "It's not a directory.  Life is a little more complicated."
>>
>> IIUC, 'connected' means 'connected to sb root', and not
>> 'connected to mnt root', so in the optimal case where
>> all lower dentries are cached,  exportfs_decode_fh() will return
>> a connected dentry for every fh we give it regardless of the
>> mnt argument, so we will have to use is_subdir() to find the
>> right layer, which brings us back to O(numlower*depth)
>
> It just means that we might have to make up an artificial mount which
> has its root at the sb root to be able to decode the handle into a
> connected one.
>

I'm not sure I understand what this artificial mount buys us.

>>
>> With the extra cost of storing the deducible information origin.root,
>> we will have less complex and more efficient lookup code.
>>
>> Let me try and implement it and see if I am right.
>> We can always discard origin.root from v4 if it turns
>> out to be unhelpful.
>
> I don't have good feelings about storing the root fh just because we
> don't special case the layer root anywhere yet, and I wouldn't want to
> do that unless there's a good reason.
>

There are a few reasons for origin.root, not sure if they are good:
1. lookup is O(numlower+depth) instead of O(numlower*depth)
2. origin.uuid validates that we are still on the same sb
    origin.root validates that we are still using the same lower dirs
    and that files from old lower were not moved around to find themselves
    inside a different lower dir
3. hardlinks between layers (!!!) will still get to the right layer

I personally think that reason #1 is the important one, but I think we
disagree on the technical details of exportfs_decode_fh() and we
need to sort this out.

Here is my untested implementation of find layer by uuid/rootfh
with the relevant comments. Maybe it helps you point out what
I am missing or what you are missing:

/* Find lower layer index by layer root file handle and uuid */
static int ovl_find_layer_by_fh(struct dentry *dentry,
                                struct ovl_lookup_data *d)
{
        struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
        struct super_block *lower_sb = ovl_same_lower_sb(dentry->d_sb);
        struct dentry *this;
        int i;

        /*
         * For now, we only support lookup by fh for all lower layers on the
         * same sb.  Not all filesystems set sb->s_uuid.  For those who don't
         * this code will compare zeros, which at least ensures us that the
         * file handles are not crossing from filesystem with sb->s_uuid to
         * a filesystem without sb->s_uuid and vice versa.
         */
        if (!lower_sb || memcmp(lower_sb->s_uuid, &d->uuid, sizeof(d->uuid)))
                return -1;

        /*
         * Layer root dentries are pinned, there are no aliases for dirs, and
         * all lower layers are on the same sb.  If rootfh is correct,
         * exportfs_decode_fh() will find it in dcache and return the only
         * instance, regardless of the mnt argument and we can compare the
         * returned pointer with the pointers in lowerstack.
         */
        this = ovl_decode_fh(roe->lowerstack[0].mnt, d->rootfh, ovl_is_dir);
        if (IS_ERR(this))
                return -1;

        for (i = 0; i < roe->numlower; i++) {
                if (this == roe->lowerstack[i].dentry)
                        break;
        }

        dput(this);
        return i < roe->numlower ? i : -1;
}

Amir.


* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
  2017-04-26 14:40   ` Miklos Szeredi
@ 2017-04-26 14:53     ` Miklos Szeredi
  2017-04-26 15:02       ` Amir Goldstein
  2017-04-26 14:57     ` Amir Goldstein
  1 sibling, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 14:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:

> The reason I think the COPYUP vs. MERGE distinction is needed is the
> ovl_check_empty_and_clear() thing.  It starts with a merged directory
> with some whiteouts in it and exchanges it with an empty and opaque
> directory.   Normally the empty directory will be deleted immediately,
> but if something fails during the deletion, then it will remain there.
>   The overlay is left in a consistent state, but the association with
> the original inode should still remain, so it will have COPYUP but not
> MERGE.

One more thought: we could introduce a separate "overlay.merge"
attribute that is the exact opposite of "overlay.opaque".
"overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
imply "overlay.merge".

It would allow us to optionally get rid of "overlay.opaque" when back
compatibility is not needed.

It would also allow a new feature: on metadata only updates of regular
files we wouldn't need to copy up the data.

Thanks,
Miklos


* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
  2017-04-26 14:40   ` Miklos Szeredi
  2017-04-26 14:53     ` Miklos Szeredi
@ 2017-04-26 14:57     ` Amir Goldstein
  1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 14:57 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 5:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
>> Define a new type flag OVL_TYPE_COPYUP to indicate that an entry is
>> a target of a copy up.
>>
>> For directory entries COPYUP = MERGE && UPPER. For non-dir entries
>> non zero oe->numlower implies COPYUP, but COPYUP does not imply
>> non zero oe->numlower.  COPYUP can also be set on lookup when detecting
>> an overlay.fh xattr on a non-dir, even if that fh cannot be followed.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/namei.c     |  3 +++
>>  fs/overlayfs/overlayfs.h |  2 ++
>>  fs/overlayfs/util.c      | 12 ++++++++----
>>  3 files changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
>> index 318092a..73a8879 100644
>> --- a/fs/overlayfs/namei.c
>> +++ b/fs/overlayfs/namei.c
>> @@ -386,6 +386,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>>                 }
>>                 if (d.opaque)
>>                         type |= __OVL_PATH_OPAQUE;
>> +               /* overlay.fh xattr implies this is a copy up */
>> +               if (d.fh)
>> +                       type |= __OVL_PATH_COPYUP;
>>         }
>>
>>         /*
>> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
>> index 08002ce..d0bb538 100644
>> --- a/fs/overlayfs/overlayfs.h
>> +++ b/fs/overlayfs/overlayfs.h
>> @@ -13,11 +13,13 @@ enum ovl_path_type {
>>         __OVL_PATH_UPPER        = (1 << 0),
>>         __OVL_PATH_MERGE        = (1 << 1),
>>         __OVL_PATH_OPAQUE       = (1 << 2),
>> +       __OVL_PATH_COPYUP       = (1 << 3),
>>  };
>>
>>  #define OVL_TYPE_UPPER(type)   ((type) & __OVL_PATH_UPPER)
>>  #define OVL_TYPE_MERGE(type)   ((type) & __OVL_PATH_MERGE)
>>  #define OVL_TYPE_OPAQUE(type)  ((type) & __OVL_PATH_OPAQUE)
>> +#define OVL_TYPE_COPYUP(type)  ((type) & __OVL_PATH_COPYUP)
>>
>>  #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
>>  #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
>> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> index dba9753..89789bc 100644
>> --- a/fs/overlayfs/util.c
>> +++ b/fs/overlayfs/util.c
>> @@ -101,11 +101,15 @@ enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
>>         if (oe->__upperdentry) {
>>                 type |= __OVL_PATH_UPPER;
>>                 /*
>> -                * Non-dir dentry can hold lower dentry from before
>> -                * copy-up.
>> +                * oe->numlower implies a copy up, but copy up does not imply
>> +                * oe->numlower.  It can also be set on lookup when detecting
>> +                * an overlay.fh xattr on a non-dir that cannot be followed.
>
> The code looks fine, but I don't understand the comment.  Why would we
> set COPYUP flag when the fh cannot be followed?
>

See patch #8 ovl: redirect non-dir by path on rename

overlay.fh *is* the indication of a copy up, and non-dir copy ups
should be redirected on rename.

With this, copying the layers will not break the constant inode property.
Upper files will continue to follow by path to origin and report the
new (post copy) stable inode number.


* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
  2017-04-26 14:53     ` Miklos Szeredi
@ 2017-04-26 15:02       ` Amir Goldstein
  2017-04-26 18:51         ` Amir Goldstein
  2017-04-27  9:32         ` Miklos Szeredi
  0 siblings, 2 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 15:02 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> The reason I think the COPYUP vs. MERGE distinction is needed is the
>> ovl_check_empty_and_clear() thing.  It starts with a merged directory
>> with some whiteouts in it and exchanges it with an empty and opaque
>> directory.   Normally the empty directory will be deleted immediately,
>> but if something fails during the deletion, then it will remain there.
>>   The overlay is left in a consistent state, but the association with
>> the original inode should still remain, so it will have COPYUP but not
>> MERGE.
>
> One more thought: we could introduce a separate "overlay.merge"
> attribute that is the exact opposite of "overlay.opaque".
> "overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
> imply "overlay.merge".
>
> It would allow us to optionally get rid of "overlay.opaque" when back
> compatibility is not needed.
>
> It would also allow a new feature: on metadata only updates of regular
> files we wouldn't need to copy up the data.
>

So you intend to set overlay.merge for non-dir?
How is it different from overlay.fh then?
With its new name, overlay.origin.fh indicates that there is a copy
up origin below us. Either directly below us, or at overlay.redirect.
We can also try to follow to origin by fh, but that is only an optimization -
an important optimization IMO, because file renames are more common
than dir renames, and looking up the stable inode in a deep directory
with many layers will be much more efficient by fh.

Are we understanding each other w.r.t. overlay.merge vs overlay.fh?


* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
  2017-04-26 15:02       ` Amir Goldstein
@ 2017-04-26 18:51         ` Amir Goldstein
  2017-04-27  9:32         ` Miklos Szeredi
  1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 18:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 6:02 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>> The reason I think the COPYUP vs. MERGE distinction is needed is the
>>> ovl_check_empty_and_clear() thing.  It starts with a merged directory
>>> with some whiteouts in it and exchanges it with an empty and opaque
>>> directory.   Normally the empty directory will be deleted immediately,
>>> but if something fails during the deletion, then it will remain there.
>>>   The overlay is left in a consistent state, but the association with
>>> the original inode should still remain, so it will have COPYUP but not
>>> MERGE.
>>
>> One more thought: we could introduce a separate "overlay.merge"
>> attribute that is the exact opposite of "overlay.opaque".
>> "overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
>> imply "overlay.merge".
>>
>> It would allow us to optionally get rid of "overlay.opaque" when back
>> compatibility is not needed.
>>
>> It would also allow a new feature: on metadata only updates of regular
>> files we wouldn't need to copy up the data.
>>
>
> So you intend to set overlay.merge for non-dir?
> How is it different from overlay.fh then?
> With it's new name, overlay.origin.fh indicates that there is a copy
> up origin below us. Either directly below us, or at overlay.redirect.
> We can also try to follow to origin by fh, but that is only an optimization -

I misspoke - redirect_fh to origin is not only an optimization.
Although renames do not depend on redirect_fh, hardlinks do.

As I learned from improved unionmount-testsuite:

./run --ov=1 hard-link
...
 ./run --link /mnt/a/no_foo110 /mnt/a/foo110
mount -t overlay overlay /mnt
-olowerdir=/upper/0:/lower,upperdir=/upper/1,workdir=/upper/work
sh (8035): drop_caches: 3
/mnt/a/foo110: inode number wrong (got 68908, want 68898)

This error happens in non-samefs case when there is more than 1 lower layer
and redirect_fh is disabled.

It happens after a link and mount cycle because the linked upper file does not
know how to look up the lower origin.

The error does not happen with samefs and with single lower fs, i.e.:
./run --ov=0 hard-link
./run --ov=1 --samefs hard-link

Because in those cases, all the upper hardlinks follow to origin by fh
and report the same inode number.

I think this calls for setting overlay.redirect also on the target of
ovl_link()??

Amir.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-26 14:51                   ` Amir Goldstein
@ 2017-04-27  6:27                     ` Amir Goldstein
  2017-04-27  7:48                       ` Miklos Szeredi
  2017-04-27  7:40                     ` Miklos Szeredi
  1 sibling, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-27  6:27 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 5:51 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 3:15 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 12:17 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>
>>>>> Just to see that I understand you correctly.
>>>>>
>>>>> I am now working on storing the following:
>>>>>
>>>>> /*
>>>>>  * The tuple origin.{fh,layer,uuid} is a universal unique identifier
>>>>>  * for a copy up origin, where:
>>>>>  * origin.fh    - exported file handle of the lower file
>>>>>  * origin.root - exported file handle of the lower layer root
>>>>>  * origin.uuid  - uuid of the lower filesystem
>>>>
>>>> I wouldn't even store origin.root.
>>>>
>>>>>  *
>>>>>  * origin.{fh,root} are stored in format of a variable length binary blob
>>>>>  * with struct ovl_fh header (total blob size up to 20 bytes).
>>>>>  * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>>>>>  */
>>>>>
>>>>> I intend to implement lookup as follows:
>>>>> - compare(origin.uuid, same_lower_sb->s_uuid)
>>>>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>>>>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>>>>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>>>>> - find layer where lowerstack[layer].dentry == root
>>>>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>>>>
>>>>> is_subdir() is NOT needed for decoding the layer root
>>>>> is_subdir() is optional for decoding the lower file, because
>>>>> it is not needed to identify the layer
>>>>
>>>> Hmm, we can just force exportfs_decode_fh() to return a connected
>>>> dentry (return false from *acceptable() if the dentry is disconnected)
>>>> before going on to iterate the layers to see which one contains it.
>>>>
>>>
>>> Hmm, this might work, but to quote from exportfs_decode_fh():
>>> "It's not a directory.  Life is a little more complicated."
>>>
>>> IIUC, 'connected' means 'connected to sb root', and not
>>> 'connected to mnt root', so in the optimal case where
>>> all lower dentries are cached,  exportfs_decode_fh() will return
>>> a connected dentry for every fh we give it regardless of the
>>> mnt argument, so we will have to use is_subdir() to find the
>>> right layer, which brings us back to O(numlower*depth)
>>
>> It just means that we might have to make up an artificial mount which
>> has its root at the sb root to be able to decode the handle into a
>> connected one.
>>
>
> I'm not sure I understand what this artificial mount buys us.

Let me try to explain the problem with a worst-case, but not
improbable example:

Suppose I have an overlay with a deep file at /a/b/c/.../z
Suppose the layers are at /old/{lower,upper}, I copy them
over to /new/{lower,upper} and mount the overlay at the new path.

Suppose that dcache is fully populated under /new and fully
evicted under /old.

When trying to decode the file handle for z, exportfs_decode_fh()
will call the file system to actually read all directories a..z from disk
in order to reconnect the dentry of old z all the way up to /old
and it will do that *before* calling the acceptable() callback.

Alternatively, if we first try to decode the file handle for /old/lower,
decoding will be very fast (most likely already in cache). It will not
match any layer of the new mount, so we will not have to go on to
decode z and read all directories a..z from disk.

This is why and how I implemented lookup by origin.{root+fh}
in v3 patch set.
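
To make the two-step idea concrete, here is a rough userspace model in C.
It is not the kernel code: struct layer_model, struct origin_model and
find_layer_by_root() are names made up for illustration, and plain integer
ids stand in for the dentries that decoding the file handles would return.
What it tries to show is that the layer is identified with one cheap
comparison per layer, before the deep file handle is ever decoded.

```c
#include <string.h>

#define MODEL_UUID_LEN 16

/* Hypothetical stand-in for one lowerstack[] entry: the layer root identity. */
struct layer_model {
	unsigned long root_id;
};

/* Hypothetical stand-in for the stored origin.{uuid,root,fh} tuple. */
struct origin_model {
	unsigned char uuid[MODEL_UUID_LEN];
	unsigned long root_id;	/* what decoding origin.root would yield */
	unsigned long file_id;	/* what decoding origin.fh would yield */
};

/*
 * Return the index of the layer whose root matches the origin, or -1.
 * The uuid check rejects origins recorded on another filesystem; the loop
 * costs O(numlower) and never touches the (possibly deep) file itself.
 */
int find_layer_by_root(const unsigned char *sb_uuid,
		       const struct layer_model *layers, int numlower,
		       const struct origin_model *origin)
{
	int i;

	if (memcmp(sb_uuid, origin->uuid, MODEL_UUID_LEN) != 0)
		return -1;

	for (i = 0; i < numlower; i++) {
		if (layers[i].root_id == origin->root_id)
			return i;
	}
	return -1;
}
```

In the copied-layers example, the origin recorded under /old carries a
root id that matches none of the /new layers, so the function returns -1
and the expensive decode of z is skipped.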

>
>>>
>>> With the extra cost of storing the deducible information origin.root,
>>> we will have less complex and more efficient lookup code.
>>>
>>> Let me try and implement it and see if I am right.
>>> We can always discard origin.root from v4 if it turns
>>> out to be unhelpful.
>>
>> I don't have good feelings about storing the root fh just because we
>> don't special case the layer root anywhere yet, and I wouldn't want to
>> do that unless there's a good reason.
>>

Wait, what do you mean by "we don't special case the layer root?"
Do you mean that we could mount an overlay at a subdir path?
i.e. in the example below, we could mount an overlay with
upperdir=/new/upper/a/b/c,lowerdir=/new/lower/a/b/c?

If this is what you mean then it is not true that we don't special case
layer root. We do it with path redirect relative to layer root.
If anything, we should be storing origin.root along with overlay.redirect
in order to verify that we are not redirecting into the wrong relative
path.

>
> There are a few reasons for origin.root, not sure if they are good:
> 1. lookup is O(numlower+depth) instead of O(numlower*depth)
> 2. origin.uuid validates that we are still on the same sb
>     origin.root validates that we are still using the same lower dirs
>     and that files from old lower were not moved around to find themselves
>     inside a different lower dir
> 3. hardlinks between layers (!!!) will still get to the right layer
>
> I personally think that reason #1 is the important one, but I think we
> disagree on the technical details of exportfs_decode_fh() and we
> need to sort this out.
>
> Here is my untested implementation of find layer by uuid/rootfh
> with the relevant comments. Maybe it helps you point out what
> I am missing or what you are missing:
>
> /* Find lower layer index by layer root file handle and uuid */
> static int ovl_find_layer_by_fh(struct dentry *dentry, struct
> ovl_lookup_data *d)
> {
>         struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
>         struct super_block *lower_sb = ovl_same_lower_sb(dentry->d_sb);
>         struct dentry *this;
>         int i;
>
>         /*
>          * For now, we only support lookup by fh for all lower layers on the
>          * same sb.  Not all filesystems set sb->s_uuid.  For those who don't
>          * this code will compare zeros, which at least ensures us that the
>          * file handles are not crossing from filesystem with sb->s_uuid to
>          * a filesystem without sb->s_uuid and vice versa.
>          */
>         if (!lower_sb || memcmp(lower_sb->s_uuid, &d->uuid, sizeof(d->uuid)))
>                 return -1;
>
>         /*
>          * Layer root dentries are pinned, there are no aliases for dirs, and
>          * all lower layers are on the same sb.  If rootfh is correct,
>          * exportfs_decode_fh() will find it in dcache and return the only
>          * instance, regardless of the mnt argument and we can compare the
>          * returned pointer with the pointers in lowerstack.
>          */
>         this = ovl_decode_fh(roe->lowerstack[0].mnt, d->rootfh, ovl_is_dir);
>         if (IS_ERR(this))
>                 return -1;
>
>         for (i = 0; i < roe->numlower; i++) {
>                 if (this == roe->lowerstack[i].dentry)
>                         break;
>         }
>
>         dput(this);
>         return i < roe->numlower ? i : -1;
> }
>
> Amir.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-26 14:51                   ` Amir Goldstein
  2017-04-27  6:27                     ` Amir Goldstein
@ 2017-04-27  7:40                     ` Miklos Szeredi
  1 sibling, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27  7:40 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 4:51 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 3:15 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:

>> I don't have good feelings about storing the root fh just because we
>> don't special case the layer root anywhere yet, and I wouldn't want to
>> do that unless there's a good reason.
>>
>
> There are a few reasons for origin.root, not sure if they are good:
> 1. lookup is O(numlower+depth) instead of O(numlower*depth)

We can optimize to O(numlower+depth) even without origin.root.

> 2. origin.uuid validates that we are still on the same sb
>     origin.root validates that we are still using the same lower dirs
>     and that files from old lower were not moved around to find themselves
>     inside a different lower dir

Parent is encoded in the fh, so that makes it resistant to moving. See
the exportfs_get_name() trickery to get a non-dir connected.  It's
needed whether we have origin.root or not.  And yes, it's pretty
heavyweight.   Wondering if it's worth the trouble, since we are not
actually going to use the lower inode for anything else than getting
the inode number.  And then we could just store the inode number
instead of the fh, and be rid of this mess.

If file is moved to another layer by moving an ancestor directory then
we won't detect that.  Question is: do we care?  It's definitely in
the "you messed with lower dirs, you keep the pieces" territory.

> 3. hardlinks between layers (!!!) will still get to the right layer

Even without origin.root it should get the right layer, since we are
encoding the parent in the fh.

> I personally think that reason #1 is the important one, but I think we
> disagree on the technical details of exportfs_decode_fh() and we
> need to sort this out.
>
> Here is my untested implementation of find layer by uuid/rootfh
> with the relevant comments. Maybe it helps you point out what
> I am missing or what you are missing:

Yeah, it simplifies the implementation.  But implementation is
secondary to interface...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-27  6:27                     ` Amir Goldstein
@ 2017-04-27  7:48                       ` Miklos Szeredi
  2017-04-27  9:22                         ` Amir Goldstein
  2017-04-27  9:26                         ` Miklos Szeredi
  0 siblings, 2 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27  7:48 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Thu, Apr 27, 2017 at 8:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:

> Let me try to explain the problem with a worse case, but not
> improbable example:
>
> Suppose I have an overlay with deep file at /a/b/c/.../z
> Suppose the layers are at /old/{lower,upper} I copy them
> over to /new/{lower,upper} and mount the overlay at new path.
>
> Suppose that dcache is fully populated under /new and fully
> evicted under /old.
>
> When trying to decode the file handle for z, exportfs_decode_fh()
> will call the file system to actually read all directories a..z from disk
> in order to reconnect the dentry of old z all the way up to /old
> and it will do that *before* calling the acceptable() callback.
>
> Alternatively, if we first try to decode the file handle for /old/lower,
> decoding will be very fast (most likely already in cache) and we will
> not have to continue to decoding z and reading all directories a..z
> from disk.

To answer my own question in the prev mail: we need to decode the fh
and not just blindly use the inum to prevent issues with
copied/mutilated/etc lower layers.

And yes, in the copied case decoding origin.root first would be a good
optimization that couldn't be done without it.

> Wait, what do you mean by "we don't special case the layer root?"
> Do you mean that we could mount an overlay at a subdir path?
> i.e. in the example below, we could mount an overlay with
> upperdir=/new/upper/a/b/c,lowerdir=/new/lower/a/b/c?
>
> If this is what you mean then it is not true that we don't special case
> layer root. We do it with path redirect relative to layer root.
> If anything, we should be storing origin.root along with overlay.redirect
> in order to verify that we are not redirecting into the wrong relative
> path.

Yeah, you're right, we are special casing layer root.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-27  7:48                       ` Miklos Szeredi
@ 2017-04-27  9:22                         ` Amir Goldstein
  2017-04-27  9:26                         ` Miklos Szeredi
  1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-27  9:22 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Thu, Apr 27, 2017 at 10:48 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Apr 27, 2017 at 8:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Let me try to explain the problem with a worse case, but not
>> improbable example:
>>
>> Suppose I have an overlay with deep file at /a/b/c/.../z
>> Suppose the layers are at /old/{lower,upper} I copy them
>> over to /new/{lower,upper} and mount the overlay at new path.
>>
>> Suppose that dcache is fully populated under /new and fully
>> evicted under /old.
>>
>> When trying to decode the file handle for z, exportfs_decode_fh()
>> will call the file system to actually read all directories a..z from disk
>> in order to reconnect the dentry of old z all the way up to /old
>> and it will do that *before* calling the acceptable() callback.
>>
>> Alternatively, if we first try to decode the file handle for /old/lower,
>> decoding will be very fast (most likely already in cache) and we will
>> not have to continue to decoding z and reading all directories a..z
>> from disk.
>
> To answer my own question in the prev mail: we need to decode the fh
> and not just blindly use the inum to prevent issues with
> copied/mutilited/etc lower layers.
>

I was going to refer you to this example when reading your question
in the prev email. That's what we get for no read/write barriers in emails ;-)

> And yes, in the copied case decoding origin.root first would be a good
> optimization that couldn't be done without it.
>

Good, so we seem to have an agreement w.r.t. the lookup fh patch.

I've already applied a change to disable redirect_fh if lower s_uuid is
zeros and I verified that it works as expected by running the hard-link
constant inode test that relies on redirect_fh over xfs mounted with
-o nouuid.
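
The zero-uuid check amounts to something like the sketch below. This is a
userspace illustration, not the code from the patch: uuid_all_zeros() and
redirect_fh_allowed() are made-up helper names, and the 16-byte length is
the raw s_uuid size mentioned earlier in this thread.

```c
#include <stddef.h>

#define MODEL_UUID_LEN 16

/* Return 1 if every byte of the 16-byte uuid is zero, 0 otherwise. */
int uuid_all_zeros(const unsigned char *uuid)
{
	size_t i;

	for (i = 0; i < MODEL_UUID_LEN; i++) {
		if (uuid[i] != 0)
			return 0;
	}
	return 1;
}

/* Hypothetical policy: only follow origin by file handle with a real uuid. */
int redirect_fh_allowed(const unsigned char *lower_uuid)
{
	return !uuid_all_zeros(lower_uuid);
}
```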

I will be posting the enhanced xfstest for constant inodes later today.

Let me know when you are done reviewing the series, so I can rework it
with the binary blob change you requested.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-27  7:48                       ` Miklos Szeredi
  2017-04-27  9:22                         ` Amir Goldstein
@ 2017-04-27  9:26                         ` Miklos Szeredi
       [not found]                           ` <CAOQ4uxiweaqzR3eT-StgtDFAHBuYhGRvAJE6v=XpH33MevpmoA@mail.gmail.com>
  1 sibling, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27  9:26 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Thu, Apr 27, 2017 at 9:48 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Apr 27, 2017 at 8:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Let me try to explain the problem with a worse case, but not
>> improbable example:
>>
>> Suppose I have an overlay with deep file at /a/b/c/.../z
>> Suppose the layers are at /old/{lower,upper} I copy them
>> over to /new/{lower,upper} and mount the overlay at new path.
>>
>> Suppose that dcache is fully populated under /new and fully
>> evicted under /old.
>>
>> When trying to decode the file handle for z, exportfs_decode_fh()
>> will call the file system to actually read all directories a..z from disk
>> in order to reconnect the dentry of old z all the way up to /old
>> and it will do that *before* calling the acceptable() callback.
>>
>> Alternatively, if we first try to decode the file handle for /old/lower,
>> decoding will be very fast (most likely already in cache) and we will
>> not have to continue to decoding z and reading all directories a..z
>> from disk.
>
> To answer my own question in the prev mail: we need to decode the fh
> and not just blindly use the inum to prevent issues with
> copied/mutilited/etc lower layers.

Hmm, this is absurd.  Why are we going to all this trouble to find the
origin inode through decoding the file handle when this thing was meant
to be an *optimization*?  Without redirect, we can look up origin just
like we do for merge dirs.  Way faster than decoding a connected
dentry, which is going to result in a readdir of the parent directory
and whatnot.  The only thing we need is a bool "was this copied" flag.

For moved files, decoding the fh might be an optimization over walking
the redirect, but that depends on various factors, and it might also
be a lot slower...  But it's needed for the snapshot case, right?

Am I missing something?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
  2017-04-26 15:02       ` Amir Goldstein
  2017-04-26 18:51         ` Amir Goldstein
@ 2017-04-27  9:32         ` Miklos Szeredi
  1 sibling, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27  9:32 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Wed, Apr 26, 2017 at 5:02 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>> The reason I think the COPYUP vs. MERGE distinction is needed is the
>>> ovl_check_empty_and_clear() thing.  It starts with a merged directory
>>> with some whiteouts in it and exchanges it with an empty and opaque
>>> directory.   Normally the empty directory will be deleted immediately,
>>> but if something fails during the deletion, then it will remain there.
>>>   The overlay is left in a consistent state, but the association with
>>> the original inode should still remain, so it will have COPYUP but not
>>> MERGE.
>>
>> One more thought: we could introduce a separate "overlay.merge"
>> attribute that is the exact opposite of "overlay.opaque".
>> "overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
>> imply "overlay.merge".
>>
>> It would allow us to optionally get rid of "overlay.opaque" when back
>> compatibility is not needed.
>>
>> It would also allow a new feature: on metadata only updates of regular
>> files we wouldn't need to copy up the data.
>>
>
> So you intend to set overlay.merge for non-dir?

Nope, not by default.

> How is it different from overlay.fh then?

It would make sense for regular files for the non-samefs or non-clone
fs cases if only metadata (attr, xattr) are modified but data is not.
We'd create an empty file with the copied up metadata and
"overlay.merge" set indicating that the data I/O should still be
redirected to the origin, while metadata is kept in the copied up
file.  This can be upgraded to a fully copied-up file later.

Not something for this series, obviously.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
       [not found]                             ` <CAJfpegtTJmcLVrLOeQbhu4Q6sM0Mi_FRgr+vStF0k95QsWm5uQ@mail.gmail.com>
@ 2017-04-27 13:53                               ` Amir Goldstein
  2017-04-27 14:46                                 ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-27 13:53 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

[Adding back CC list after I unintentionally dropped it]

On Thu, Apr 27, 2017 at 4:11 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Apr 27, 2017 at 11:53 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, Apr 27, 2017 at 12:26 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>>> Hmm, this is absurd.  Why are we going to all this trouble to find the
>>> origin inode though decoding the file handle when this thing was meant
>>> to be an *optimization*?  Without redirect, we can look up origin just
>>> like we do for merge dirs.  Way faster than decoding a connected
>>> dentry, which is going to result in a readdir of the parent directory
>>> and whatnot.  The only thing we need is a bool "was this copied" flag.
>>>
>>
>> Yes, that is what happens in non-same-lower-fs case.
>> (overlay.fh is a fancy 4 bytes binary boolean blob)
>>
>>> For moved files, decoding the fh might be an optimization over walking
>>> the redirect, but that depends on a various factors, and it might also
>>> be a lot slower...
>>
>> I think we can safely disable by_fh for non-dirs if redirect is not set.
>> redirect_fh will be a bit more efficient with large numlower, but maybe
>> that is less interesting.
>>
>>> But it's needed for the snapshot case, right?
>>
>> It sure is! but the redirect_fh infrastructure is also needed for the next
>> part of the work of NFS export and preserved hardlinks, although in
>> this case it may be used to follow forward and not backwards.
>>
>> It also makes overlayfs more robust to lower layer changes, so less
>> chance for bugs related to mangled lower fs.
>>
>>>
>>> Am I missing something?
>>>
>>
>> Not really. Without the context of this being the first part of a 2 parts
>> work, the amount of hassle that redirect_fh brings to this series should
>> have raised your eyebrows as it did now.
>>
>> I started with the 'stable inode' series for same-fs only case which
>> provides a lot more than just constant inodes.
>>
>> You commented that we should try harder to do constant inodes
>> for non-samefs case, so I yanked out the parts relevant only to
>> constant inodes and relaxed the samefs constraints.
>>
>> Redirect_fh was left in, because:
>> 1. It was always the basis of the larger work, so it was easier to leave it
>>     like this than to have patches that add it later.
>> 2. It has some merit for optimization and for some constant inode
>>     cases with hardlinks
>> 3. It serves the full work for this code to be reviewed and tested
>>     soon, which is what is happening now.
>>
>> That said, if you feel strongly about it, I could either:
>> - Yank it out and re-introduce in the next part
>> - Leave it in with redirect_fh disabled by default and enabled
>>   by mount option, so at least people could test it and see how
>>   it performs compared to redirect by path in different workloads.
>>
>> Thoughts?
>
> Maybe you misunderstood.  I wasn't saying we don't need
> overlay.origin. I'm saying that it doesn't make sense to use
> overlay.origin for lookup.
>
> Summing up what we know:
>
>  - for constant inode we need a bool flag indicating that this is a
> copied up file or directory,  with that flag, together with redirect
> for regular files, we can do constant inode for samefs and non-samefs
> case.  That bool flag can be the existence of the overlay.origin xattr
>

Agree.

>  - hardlinks: need to set overlay.redirect  when hardlink is created
> from a copied up file; similar to what we do on rename
>

Partly agree.
1. This is not atomic, because hardlink is not created in workdir.
2. Reverse mapping will take care of this anyway. Remember?
    there is an extra hardlink in workdir/inodes which has overlay.redirect
    set on first alias copy up

>  - hardlink un-breaking: need reverse mapping (from lower to overlay);
> not in the scope of this patchset
>

Agree.

>  - NFS export: need reverse mapping; not in the scope of this patchset
>

Agree.

>  - for the snapshotfs case we need a way to keep the overlay in sync
> with a changing lower layer.  It's impossible to atomically update
> overlay.redirect together with the location of the lower file;
> overlay.origin can fix that
>

Correct.

>  - for non-redirect case looking up by overlay.origin is almost surely
> a pessimization
>

Not sure about that.
For directories by fh and by name are probably on par -
At most one lookup of ".." compared to one lookup of d.name.

For non-dir there is a way that is better than both (see below).

>  - lookup by overlay.origin can work if overlay.redirect would be too
> long to fit in the max xattr len
>

Sure.

>  - for the redirect case looking up by overlay.origin may be faster,
> but may be much slower; hard to determine which to use
>

Here I disagree.
I claim that it is always faster to find the lower (disconnected) dentry
by fh, much faster in fact. See explanation at the bottom.

>  - overlay.origin can be used to verify if the lower path looked up
> with overlay.redirect is indeed the same file/directory that was
> originally copied up.  Not sure if this is useful for anything else
> than the snapshotfs case.
>

It's true. Comparing file handles would be quite easy.

> So my conclusion is that unless we must (snapshotfs, overflow in
> overlay.redirect) we should not be using overlay.origin for looking up
> the lower file.
>

I can live with that. Comparing the lookup result to origin.fh can be
optional based on some 'strict' mount option and resort to lookup by
fh when compare fails.
But we need to reconsider in light of my new suggestion, because
I do believe that finding the lower inode of non-dir is always a win
by fh.
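
The verify-then-fall-back policy could look roughly like the model below.
Everything here is hypothetical: struct fh_model is a simplified stand-in
for the stored blob (assumed to hold at most 20 bytes of handle), and
choose_lower() only models the decision, it performs no lookup itself.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_FH_MAX 20

/* Hypothetical, simplified file handle blob: length plus opaque bytes. */
struct fh_model {
	unsigned char len;
	unsigned char bytes[MODEL_FH_MAX];
};

/* Two handles match when they have the same length and the same bytes. */
int fh_equal(const struct fh_model *a, const struct fh_model *b)
{
	if (a->len > MODEL_FH_MAX || a->len != b->len)
		return 0;
	return memcmp(a->bytes, b->bytes, a->len) == 0;
}

enum lookup_choice {
	USE_PATH_UNVERIFIED,	/* non-strict: trust the path lookup */
	USE_PATH_VERIFIED,	/* strict: path result matches origin.fh */
	USE_FH_FALLBACK,	/* strict: mismatch or miss, decode origin.fh */
};

/*
 * Decide which lower to use.  found_by_path is the handle of whatever the
 * plain (or redirected) path lookup returned, or NULL if it found nothing.
 */
enum lookup_choice choose_lower(const struct fh_model *origin,
				const struct fh_model *found_by_path,
				int strict)
{
	if (!strict)
		return USE_PATH_UNVERIFIED;
	if (found_by_path && fh_equal(origin, found_by_path))
		return USE_PATH_VERIFIED;
	return USE_FH_FALLBACK;
}
```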

> Even for snapshotfs it might make sense to start with plain lookup
> (with or without overlay.redirect) using overlay.origin to verify and
> falling back to lookup by overlay.origin only on mismatch (and
> updating overlay.redirect).
>
> Do you disagree?
>

So I managed to confuse myself about the technical facts of decoding,
because I was used to dealing only with decoding of dir handles
(for snapshots) and just now added decoding of non-dir.

When decoding a directory handle, it is always being connected up to
root. It sounds harsh, but in fact I think it will always be faster or on par
with regular lookup by path, because:
When looking up backwards until the first connected ancestor, filesystem
always has to read the ".." entry.
When looking up forward by path, the filesystem needs to read entries
from the connected ancestor by name, and that is most likely indexed no
better than the ".." entry of the backwards lookup.

When decoding directories you also want to get a connected dentry and
verifying is_subdir() makes sense.

HOWEVER, and this is the big thing that I missed, when decoding a non-dir
we DON'T need to get a connected dentry.
It's perfectly fine to get a disconnected alias and getting a disconnected
alias is always O(1) for the filesystem.
The only thing we really need from this alias is to know its inode number
(and to know that it is still valid).

So if we encode non-connectable fh for non-dir (like knfsd does by default)
then:
1. decoding them will always be faster than any other lookup method
2. we cannot verify is_subdir() - so what?

What's the worst thing that can happen if the decoded entry is not under
the layer anymore? We only use its inode number, and the only thing we
need to know is that it is unique within the lower layers inode namespace
and we don't need is_subdir() for that.

But I just realized something very very bad about non-samefs case.
We must use made up st_dev for lower layers, we can certainly no
longer use the real lower st_dev.
If we do, then we will have 2 files in 2 different overlay mounts,
who have the same lower inode but 2 different upper inodes with
different content and those 2 overlay files will have the same
st_dev/st_ino.

I just found that in my debian based  kvm-xfstests machine, diff
reports 2 broken hardlinks with different content as equal, because
they have the same st_dev/st_ino.
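
For reference, the kind of identity check that trips here can be sketched
in userspace C as below. This is only an illustration of comparing the
(st_dev, st_ino) pair from stat(2); same_file() and same_file_demo() are
made-up names, and I am not claiming this is literally what diff does
internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return 1 if both paths report the same (st_dev, st_ino) pair, 0 if they
 * differ, -1 if either stat(2) call fails.
 */
int same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Usage example: a real hard link must compare equal and an unrelated
 * file must not.  Returns 1 on success, 0 on mismatch, -1 on setup error.
 */
int same_file_demo(void)
{
	char orig[] = "/tmp/ovl_orig_XXXXXX";
	char other[] = "/tmp/ovl_other_XXXXXX";
	char alias[64];
	int fd, ret = -1;

	fd = mkstemp(orig);
	if (fd < 0)
		return -1;
	close(fd);

	fd = mkstemp(other);
	if (fd < 0)
		goto out_orig;
	close(fd);

	snprintf(alias, sizeof(alias), "%s.link", orig);
	if (link(orig, alias) != 0)
		goto out_other;

	ret = same_file(orig, alias) == 1 && same_file(orig, other) == 0;
	unlink(alias);
out_other:
	unlink(other);
out_orig:
	unlink(orig);
	return ret;
}
```

Any tool relying on a check like same_file() will treat the two broken
hardlinks as one file as long as overlayfs reports the real lower st_dev
together with the lower st_ino for both of them.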

So in conclusion:
1. Encode non-connectable file handles for non-dirs
2. Always try to lookup non-dir by fh first - it's O(1)
3. Non-samefs needs fake st_dev before reporting constant inodes
4. Broken hardlinks should NOT report same inode

Urgh this was long!

Do you agree with my analysis of decoding complexity and the conclusions?

Amir.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-27 13:53                               ` Amir Goldstein
@ 2017-04-27 14:46                                 ` Miklos Szeredi
  2017-04-27 16:08                                   ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27 14:46 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Thu, Apr 27, 2017 at 3:53 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Apr 27, 2017 at 4:11 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:

>>  - hardlinks: need to set overlay.redirect  when hardlink is created
>> from a copied up file; similar to what we do on rename
>>
>
> Partly agree.
> 1. This is not atomic, because hardlink is not created in workdir.

Doesn't matter; we can set overlay.redirect before we do the hardlink.
If the hardlink fails, we are left with the redirect, but that's not a
problem.

> 2. Reverse mapping will take care of this anyway. Remember?
>     there is an extra hardlink in workdir/inodes with has overlay.redirect
>     set on first alias copy up

Yeah, but I don't really see why we'd need to set overlay.redirect on
copy-up.  When reverse mapping (i.e. trying to reconstruct the overlay
dentry from the lower fh)  we'll just do the same thing as for normal
lookup: use the path from the upper root to the dentry and look up
each component in the lower layers (taking into account any
overlay.redirect encountered).

>>  - for non-redirect case looking up by overlay.origin is almost surely
>> a pessimization
>>
>
> Not sure about that.
> For directories by fh and by name are probably on par -
> At most one lookup of ".." compared to one lookup of d.name.
>
> For non-dir there is a better way that is better than both (see below).

Not that simple (see below).

> So I managed to confuse myself about the technical facts of decoding,
> because I was used to dealing only with decoding of dir handles
> (for snapshots) and just now added decoding of non-dir.
>
> When decoding a directory handle, it is always being connected up to
> root. It sounds harsh, but in fact I think it will always be faster or on par
> with regular lookup by path, because:
> When looking up backwards until the first connected ancestor, filesystem
> always has to read the ".." entry.
> When looking up forward by path, then filesystem need to read entries
> from connected ancestor by name and that is most likely indexed only
> worse than the ".." entry of the backwards lookup.

Problem is there's more going on than just lookup of "..".  In fact it
*must* entail the lookup of "name" as well, because that's the way the
dentry gets connected.  There's an even bigger snag: where do we get
the name?  There's a ->get_name() export op, but most fs don't define
it, and the default action is to iterate the parent dir and find the
one matching our inum.  There goes the performance...
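
As a rough userspace analogue of that fallback (an assumption made for
illustration, not the kernel code), finding a name from an inode number
means walking every entry of the parent directory until one matches.
find_name_by_ino() is a made-up helper; it uses fstatat() on each entry
rather than trusting d_ino.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: scan "parent" for an entry whose (dev, ino)
 * matches. On success the entry name is copied into buf and true is
 * returned. The cost grows with the number of entries in the directory.
 */
static bool find_name_by_ino(const char *parent, dev_t dev, ino_t ino,
			     char *buf, size_t len)
{
	DIR *dir = opendir(parent);
	struct dirent *de;
	bool found = false;

	if (!dir)
		return false;

	while (!found && (de = readdir(dir)) != NULL) {
		struct stat st;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		/* Entries that cannot be stat'ed are simply skipped. */
		if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW))
			continue;
		if (st.st_dev == dev && st.st_ino == ino) {
			snprintf(buf, len, "%s", de->d_name);
			found = true;
		}
	}
	closedir(dir);
	return found;
}
```

A name lookup in the other direction can usually use the directory's
index, which is why the scan above is the expensive direction.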

That's why I'm saying it will almost certainly be slower.  Exception
might be the cached case, but even there lookup by inum might be
slower than the super optimized cached path lookup of a few filenames.
Since we are looking up the overlay dentry, which isn't cached at this
point, why would the lower ones be?

> When decoding directories you also want to get a connected dentry and
> verifying is_subdir() makes sense.
>
> HOWEVER, and this is big thing that I missed, when decoding a non-dir
> we DON'T need to get a connected dentry.
> It's perfectly fine to get a disconnected alias and getting a disconnected
> alias is always O(1) for the filesystem.

It's O(1) but so is a single component lookup (case without
overlay.redirect).  In the cold cache case both will be slow, since
most of the time will be spent on getting the inode from disk.  In the
hot cache case, odds are the name lookup will win, since it's the more
optimized codepath...

> The only thing we really need from this alias is to know its inode number
> (and to know that it is still valid).
>
> So if we encode non-connectable fh for non-dir (like knfsd does by default)
> then:
> 1. decoding them will always be faster than any other lookup method
> 2. we cannot verify is_subdir() - so what?
>
> What's the worst thing that can happen if the decoded entry is not under
> the layer anymore? We only use its inode number, and the only thing we
> need to know is that it is unique within the lower layers inode namespace
> and we don't need is_subdir() for that.

Okay.

>
> But I just realized something very very bad about non-samefs case.
> We must use made up st_dev for lower layers, we can certainly no
> longer use the real lower st_dev.
> If we do, then we will have 2 files in 2 different overlay mounts,
> who have the same lower inode but 2 different upper inodes with
> different content and those 2 overlay files will have the same
> st_dev/st_ino.

Yeah, I told you about this issue a couple of mails back.

>
> I just found that on my Debian-based kvm-xfstests machine, diff
> reports 2 broken hardlinks with different content as equal, because
> they have the same st_dev/st_ino.
>
> So in conclusion:
> 1. Encode non-connectable file handles to non-dir

Fine by me.

> 2. Always try to lookup non-dir by fh first - it's O(1)

Well... for simplicity sake, okay.  Probably not a big loss.

> 3. Non-samefs needs fake st_dev before reporting constant inodes

Yes.

> 4. Broken hardlinks should NOT report same inode

Yes.

Thanks,
Miklos


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-27 14:46                                 ` Miklos Szeredi
@ 2017-04-27 16:08                                   ` Amir Goldstein
  2017-04-28  7:25                                     ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-27 16:08 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel

On Thu, Apr 27, 2017 at 5:46 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Apr 27, 2017 at 3:53 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, Apr 27, 2017 at 4:11 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>>>  - hardlinks: need to set overlay.redirect  when hardlink is created
>>> from a copied up file; similar to what we do on rename
>>>
>>
>> Partly agree.
>> 1. This is not atomic, because hardlink is not created in workdir.
>
> Doesn't matter; we can set overlay.redirect before we do the hardlink.
> If the hardlink fails, we are left with the redirect, but that's not a
> problem.
>

Right...

>> 2. Reverse mapping will take care of this anyway. Remember?
>>     there is an extra hardlink in workdir/inodes which has overlay.redirect
>>     set on first alias copy up
>
> Yeah, but I don't really see why we'd need to set overlay.redirect on
> copy-up.  When reverse mapping (i.e. trying to reconstruct the overlay
> dentry from the lower fh)  we'll just do the same thing as for normal
> lookup: use the path from the upper root to the dentry and look up
> each component in the lower layers (taking into account any
> overlay.redirect encountered).

I was thinking we need overlay.redirect to find a lower alias by path
from any upper alias, so I knew we should have overlay.redirect in the upper
inode, but it's true that we can set it on the first ovl_link() and not at first
copy up time.


>
>>>  - for non-redirect case looking up by overlay.origin is almost surely
>>> a pessimization
>>>
>>
>> Not sure about that.
>> For directories by fh and by name are probably on par -
>> At most one lookup of ".." compared to one lookup of d.name.
>>
>> For non-dir there is a better way that is better than both (see below).
>
> Not that simple (see below).

"Not that simple" applies to directories (and I agree), but
not to a non-connected non-dir.


>
> Problem is there's more going on than just lookup of "..".  In fact it
> *must* entail the lookup of "name" as well, because that's the way the
> dentry gets connected.  There's an even bigger snag: where do we get
> the name?  There's a ->get_name() export op, but most fs don't define
> it, and the default action is to iterate the parent dir and find the
> one matching our inum.  There goes the performance...
>
> That's why I'm saying it will almost certainly be slower.  Exception
> might be the cached case, but even there lookup by inum might be
> slower than the super optimized cached path lookup of a few filenames.
> Since we are looking up the overlay dentry, which isn't cached at this
> point, why would the lower ones be?

Agree for directories, so we should not be looking up directories by fh.
It's hard to do for numlayers > 1 anyway.
I will see about comparing origin.fh with dir found by path - it may
be the way to go for snapshots.

>
>> When decoding directories you also want to get a connected dentry and
>> verifying is_subdir() makes sense.
>>
>> HOWEVER, and this is big thing that I missed, when decoding a non-dir
>> we DON'T need to get a connected dentry.
>> It's perfectly fine to get a disconnected alias and getting a disconnected
>> alias is always O(1) for the filesystem.
>
> It's O(1) but so is a single component lookup (case without
> overlay.redirect).  In the cold cache case both will be slow, since
> most of the time will be spent on getting the inode from disk.  In the
> hot cache case, odds are the name lookup will win, since it's the more
> optimized codepath...
>

Correct.

To complete the picture, here is how lookup by inode is better than
lookup by redirect path.

Lookup of an inode should always be quite fast for the filesystem. Even
with a cold inode/dentry cache, inodes are easy to find by index, so the
worst case is O(1): one inode block read from disk at a known location.

Compared to redirect by path of N elements this is much, much better;
that costs O(N): N synchronous reads of inode blocks plus N directory
blocks from disk.
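
The comparison can be written down as a back-of-the-envelope model. The
function names are made up for illustration, and the model assumes the
cold cache worst case with one block read per inode and one per
directory lookup, which real filesystems may beat or exceed.

```c
/* Decoding a non-dir file handle: one inode block at a known location. */
static unsigned int cold_reads_by_fh(void)
{
	return 1;
}

/*
 * Following a redirect path of n components: for each component, one
 * directory block to find the name plus one inode block.
 */
static unsigned int cold_reads_by_path(unsigned int n)
{
	return 2 * n;
}
```

For a redirect of 4 components that is 8 dependent reads against 1. For
a single component without redirect the gap shrinks to 2 against 1, in
line with the "probably not a big loss" assessment earlier in the thread.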


>
>>
>> But I just realized something very very bad about non-samefs case.
>> We must use made up st_dev for lower layers, we can certainly no
>> longer use the real lower st_dev.
>> If we do, then we will have 2 files in 2 different overlay mounts,
>> who have the same lower inode but 2 different upper inodes with
>> different content and those 2 overlay files will have the same
>> st_dev/st_ino.
>
> Yeah, I told you about this issue a couple of mails back.
>

Yes you did. I guess I thought you were referring to the case where not
all lower layers are on the same sb. Then we need a unique st_dev per
lower layer because they don't share the same inode namespace.

I thought that 'all lower on same fs' was ok to use the same_lower_sb
as st_dev, as is the case now for lower type entries, but it is not ok.


>>
>> I just found that on my Debian-based kvm-xfstests machine, diff
>> reports 2 broken hardlinks with different content as equal, because
>> they have the same st_dev/st_ino.
>>
>> So in conclusion:
>> 1. Encode non-connectable file handles to non-dir
>
> Fine by me.
>
>> 2. Always try to lookup non-dir by fh first - it's O(1)
>
> Well... for simplicity sake, okay.  Probably not a big loss.
>
>> 3. Non-samefs needs fake st_dev before reporting constant inodes
>
> Yes.
>
>> 4. Broken hardlinks should NOT report same inode
>
> Yes.
>

So to list everything for v4 in one place:

5. store uuid together with lower fh inside struct ovl_fh (in overlay.origin)
There does not seem to be a reason to store root fh though for non-dir
and it's not relevant for lookup of dir for snapshot case either (single
lower layer case)
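
A sketch of what item 5 could look like, under loud assumptions: the
layout below is illustrative only and is not the on-disk format of
struct ovl_fh, and the reading that the uuid is there so a handle is
only decoded on the filesystem that encoded it is my interpretation.
All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/* Illustrative record: fs uuid stored next to the encoded handle. */
struct origin_rec {
	uint8_t uuid[UUID_LEN];	/* uuid of the fs that encoded fh */
	uint8_t fh_len;		/* number of valid bytes in fh */
	uint8_t fh[64];		/* opaque file handle bytes */
};

static bool uuid_is_null(const uint8_t *uuid)
{
	static const uint8_t zero[UUID_LEN];

	return !memcmp(uuid, zero, UUID_LEN);
}

/*
 * Only try to decode the handle when the candidate filesystem has a
 * non-null uuid that matches the one stored with the handle.
 */
static bool origin_matches_fs(const struct origin_rec *rec,
			      const uint8_t *fs_uuid)
{
	if (uuid_is_null(fs_uuid) || uuid_is_null(rec->uuid))
		return false;
	return !memcmp(rec->uuid, fs_uuid, UUID_LEN);
}
```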

Thanks,
Amir.


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-27 16:08                                   ` Amir Goldstein
@ 2017-04-28  7:25                                     ` Amir Goldstein
  2017-04-28  7:55                                       ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-28  7:25 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, Vivek Goyal

[Removed some CC from list to reduce spam over side question]

On Thu, Apr 27, 2017 at 7:08 PM, Amir Goldstein <amir73il@gmail.com> wrote:

>
> So to list everything for v4 in one place:
>
> 5. store uuid together with lower fh inside struct ovl_fh (in overlay.origin)
> There does not seem to be a reason to store root fh though for non-dir
> and it's not relevant for lookup of dir for snapshot case either (single
> lower layer case)
>

Miklos,

I see you are online so I might as well ask now again to make sure
(because our weekends are not aligned).

With all recent conclusions, do you see a reason to keep origin root fh?

For snapshots I need just one thing -
Verify that origin.fh matches the lower of merge dir that was found by path.
The verification is very cheap (only encode the found dentry), so we may
do it in any configuration, just don't know how to act upon it.

What to do in case verification fails may need a configuration option.
For snapshots I need a 'strict' policy meaning that "stale lower" equals
"implicit opaque", but that will not do the right thing for copied layers case.

The way I have it now in my snapshot patches is to overload the redirect_dir
mount option and add a value redirect_dir=fh. The build time and module
options are still boolean, but -o redirect_dir=fh sets config->redirect_dir=true
and config->redirect_fh=true.
config->redirect_fh can later be set to false if the prerequisites (samefs etc)
don't apply.

I may need to separate the general ofs->redirect_fh capability from the mount
configuration (i.e. config->redirect_dir_fh or make
config->redirect_dir an enum).

I could also add more policy options for redirect_dir, i.e.:
off - pre v4.10 compat
on - v4.10 compat (path only)
path - same as on, just to explicitly mention for when knowingly copying layers
fh - snapshot case, fh must be verified
auto - (the default?) best effort w.r.t lower dir renames -
lookup by path, verify fh, if fails try to lookup by fh, if fails use
path result anyway.
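
To make the proposal concrete, here is a sketch of how such values could
be parsed and how the "auto" fallback order could look. Everything here
is hypothetical: the enum, the function names, and the use of small
integers as stand-ins for "the directory we found" are illustration
only, and these option values are merely proposed in this mail.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical modes for the proposed redirect_dir values. */
enum redirect_mode {
	REDIRECT_OFF,
	REDIRECT_PATH,		/* "on" and "path" behave the same */
	REDIRECT_FH,
	REDIRECT_AUTO,
	REDIRECT_INVALID = -1,
};

static enum redirect_mode parse_redirect_dir(const char *val)
{
	if (!strcmp(val, "off"))
		return REDIRECT_OFF;
	if (!strcmp(val, "on") || !strcmp(val, "path"))
		return REDIRECT_PATH;
	if (!strcmp(val, "fh"))
		return REDIRECT_FH;
	if (!strcmp(val, "auto"))
		return REDIRECT_AUTO;
	return REDIRECT_INVALID;
}

/*
 * "auto": take the path result if its fh verifies, otherwise try the
 * result of lookup by fh (negative means not found), otherwise fall
 * back to the path result anyway.
 */
static int auto_pick(int by_path, bool fh_verified, int by_fh)
{
	if (fh_verified)
		return by_path;
	if (by_fh >= 0)
		return by_fh;
	return by_path;
}
```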

I realize you prefer the "minimum configuration" policy, but I'm afraid we
are at a crossroads where we have to let the user decide. No?

Amir.


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-28  7:25                                     ` Amir Goldstein
@ 2017-04-28  7:55                                       ` Miklos Szeredi
  2017-04-28  8:15                                         ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-28  7:55 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-unionfs, Vivek Goyal

On Fri, Apr 28, 2017 at 9:25 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> With all recent conclusions, do you see a reason to keep origin root fh?

No, I'm quite happy if we don't :)

> For snapshots I need just one thing -
> Verify that origin.fh matches the lower of merge dir that was found by path.
> The verification is very cheap (only encode the found dentry), so we may
> do it in any configuration, just don't know how to act upon it.
>
> What to do in case verification fails may need configuration option.
> For snapshots I need a 'strict' policy meaning that "stale lower" equals
> "implicit opaque", but that will not do the right thing for copied layers case.
>
> The way I have it now in my snapshot patches is overload on the redirect_dir
> mount option and add a value redirect_dir=fh. The build time and module
> options are still boolean, but -o redirect_dir=fh sets config->redirect_dir=true
> and config->redirect_fh=true.
> config->redirect_fh can later be set to false if the prerequisite (samefs etc)
> don't apply.
>
> I may need to separate the general ofs->redirect_fh capability from the mount
> configuration (i.e. config->redirect_dir_fh or make
> config->redirect_dir an enum).
>
> I could also add more policy options for redirect_dir, i.e.:
> off - pre v4.10 compat
> on - v4.10 compat (path only)
> path - same as on, just to explicitly mention for when knowingly copying layers
> fh - snapshot case, fh must be verified
> auto - (the default?) best effort w.r.t lower dir renames -
> lookup by path, verify fh, if fails try to lookup by fh, if fails use
> path result anyway.
>
> I realize you prefer the "minimum configuration" policy, but I'm afraid we
> are at a crossroads where we have to let the user decide. No?

Is this only about dirs though?

For now I'd just add a "verify_lower" option defaulting to off and not
expand "redirect_dir".  That should take care of the snapshot case,
right?
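
A minimal sketch of what such a check could boil down to, assuming the
"stale lower equals implicit opaque" policy quoted above. The struct is
a simplified stand-in for an encoded file handle, not the kernel layout,
and check_lower() and the other names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an encoded file handle. */
struct fh_blob {
	int type;
	size_t len;
	unsigned char bytes[32];
};

enum lower_verdict {
	LOWER_USE,	/* merge with the lower dir found by path */
	LOWER_IGNORE,	/* stale lower: treat the upper dir as opaque */
};

static bool fh_equal(const struct fh_blob *a, const struct fh_blob *b)
{
	return a->type == b->type && a->len == b->len &&
	       a->len <= sizeof(a->bytes) &&
	       !memcmp(a->bytes, b->bytes, a->len);
}

/*
 * stored: handle recorded in overlay.origin at copy up time.
 * found:  handle encoded from the lower dir returned by path lookup.
 * With verification off the path result is always used.
 */
static enum lower_verdict check_lower(bool verify,
				      const struct fh_blob *stored,
				      const struct fh_blob *found)
{
	if (!verify || fh_equal(stored, found))
		return LOWER_USE;
	return LOWER_IGNORE;
}
```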

The more complicated things can come later.

Thanks,
Miklos


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-28  7:55                                       ` Miklos Szeredi
@ 2017-04-28  8:15                                         ` Amir Goldstein
  2017-04-28  9:37                                           ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-28  8:15 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, Vivek Goyal

On Fri, Apr 28, 2017 at 10:55 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 28, 2017 at 9:25 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> With all recent conclusions, do you see a reason to keep origin root fh?
>
> No, I'm quite happy if we don't :)
>
>> For snapshots I need just one thing -
>> Verify that origin.fh matches the lower of merge dir that was found by path.
>> The verification is very cheap (only encode the found dentry), so we may
>> do it in any configuration, just don't know how to act upon it.
>>
>> What to do in case verification fails may need configuration option.
>> For snapshots I need a 'strict' policy meaning that "stale lower" equals
>> "implicit opaque", but that will not do the right thing for copied layers case.
>>
>> The way I have it now in my snapshot patches is overload on the redirect_dir
>> mount option and add a value redirect_dir=fh. The build time and module
>> options are still boolean, but -o redirect_dir=fh sets config->redirect_dir=true
>> and config->redirect_fh=true.
>> config->redirect_fh can later be set to false if the prerequisite (samefs etc)
>> don't apply.
>>
>> I may need to separate the general ofs->redirect_fh capability from the mount
>> configuration (i.e. config->redirect_dir_fh or make
>> config->redirect_dir an enum).
>>
>> I could also add more policy options for redirect_dir, i.e.:
>> off - pre v4.10 compat
>> on - v4.10 compat (path only)
>> path - same as on, just to explicitly mention for when knowingly copying layers
>> fh - snapshot case, fh must be verified
>> auto - (the default?) best effort w.r.t lower dir renames -
>> lookup by path, verify fh, if fails try to lookup by fh, if fails use
>> path result anyway.
>>
>> I realize you prefer the "minimum configuration" policy, but I'm afraid we
>> are at a crossroads where we have to let the user decide. No?
>
> Is this only about dirs though?

Well, we could add configuration options to decide if and how to follow
and verify fh for non-dir, but:

1. We agreed that trying to follow fh for non-dir is a no-lose situation
    for !redirect and hot cache case and a probable win for redirect with
    cold case
2. For snapshots the behavior should be the same -
    use the lower ino if you can find and hold lower inode
    use upper ino otherwise (i.e. snapshots are not binding the inode
    numbers forever)

So I see no reason, going forward, to provide a user configuration
for the lookup of non-dir.
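
Point 2 amounts to a very small rule. A sketch, with hypothetical names
that are not taken from the patches:

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical inputs to the st_ino decision for a non-dir. */
struct ino_src {
	bool have_lower;	/* lower inode was found and is held */
	ino_t lower_ino;
	ino_t upper_ino;
};

/* Report the lower ino when it is available, the upper ino otherwise. */
static ino_t report_ino(const struct ino_src *src)
{
	return src->have_lower ? src->lower_ino : src->upper_ino;
}
```

The same rule applies with or without snapshots, so it needs no mount
option of its own.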

>
> For now I'd just add a "verify_lower" option defaulting to off and not
> expand "redirect_dir".  That should take care of the snapshot case,
> right?
>

Right.

Thanks,
Amir.


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-28  8:15                                         ` Amir Goldstein
@ 2017-04-28  9:37                                           ` Miklos Szeredi
  2017-04-28  9:57                                             ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-28  9:37 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-unionfs, Vivek Goyal

On Fri, Apr 28, 2017 at 10:15 AM, Amir Goldstein <amir73il@gmail.com> wrote:

> Well, we could add configuration options to decide if and how to follow
> and verify fh for non-dir, but:
>
> 1. We agreed that trying to follow fh for non-dir is a no-lose situation
>     for !redirect and hot cache case and a probable win for redirect with
>     cold case

Okay.  That also means that redirect is not actually needed for
non-dir.  Well, except for the weird case of having to reconstruct a
reverse mapping after copying the layers in order to properly handle
copy up of hardlinks on the lower layer.  But let's not care about that
for now (or ever, probably).

> 2. For snapshots the behavior should be the same -
>     use the lower ino if you can find and hold lower inode
>     use upper ino otherwise (i.e. snapshots are not binding the inode
>     numbers forever)
>
> So I see no reason, going forward, to provide a user configuration
> for the lookup of non-dir.

Good.

Thanks,
Miklos


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-28  9:37                                           ` Miklos Szeredi
@ 2017-04-28  9:57                                             ` Amir Goldstein
  2017-04-28 10:05                                               ` Miklos Szeredi
  0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-28  9:57 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, Vivek Goyal

On Fri, Apr 28, 2017 at 12:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 28, 2017 at 10:15 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Well, we could add configuration options to decide if and how to follow
>> and verify fh for non-dir, but:
>>
>> 1. We agreed that trying to follow fh for non-dir is a no-lose situation
>>     for !redirect and hot cache case and a probable win for redirect with
>>     cold case
>
> Okay.  That also means that redirect is not actually needed for
> non-dir.  Well, except for the weird case of having to reconstruct a
> reverse mapping after copying the layers in order to properly handle
> copy up of hardlinks on the lower layer.  But let's not care about that
> for now (or ever, probably).
>

It's needed for when we can't look up by fh:
- lower has NULL uuid
- !same_lower_sb (may be relaxed going forward)

Heh, it's hard to keep track of it all ;-)

Once all the dust has settled, I'll sit down to write a 'Redirect by file
handle' section for Documentation/filesystems/overlayfs.txt that will
summarize all cases.

Amir.


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-28  9:57                                             ` Amir Goldstein
@ 2017-04-28 10:05                                               ` Miklos Szeredi
  2017-04-28 10:45                                                 ` Amir Goldstein
  0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-28 10:05 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-unionfs, Vivek Goyal

On Fri, Apr 28, 2017 at 11:57 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Apr 28, 2017 at 12:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Apr 28, 2017 at 10:15 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> Well, we could add configuration options to decide if and how to follow
>>> and verify fh for non-dir, but:
>>>
>>> 1. We agreed that trying to follow fh for non-dir is a no-lose situation
>>>     for !redirect and hot cache case and a probable win for redirect with
>>>     cold case
>>
>> Okay.  That also means that redirect is not actually needed for
>> non-dir.  Well, except for the weird case of having to reconstruct a
>> reverse mapping after copying the layers in order to properly handle
>> copy up of hardlinks on the lower layer.  But let's not care about that
>> for now (or ever, probably).
>>
>
> It's needed for when we can't lookup by fh:
> - lower has NULL uuid
> - !same_lower_sb (may be relaxed going forward)
>
> Heh, it's hard to keep track of it all ;-)

Since this is only for constant inode, we really shouldn't need to
care about any of the above cases.

Let's keep things as simple as possible.

Thanks,
Miklos


* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
  2017-04-28 10:05                                               ` Miklos Szeredi
@ 2017-04-28 10:45                                                 ` Amir Goldstein
  0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-28 10:45 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, Vivek Goyal

On Fri, Apr 28, 2017 at 1:05 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 28, 2017 at 11:57 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Fri, Apr 28, 2017 at 12:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Fri, Apr 28, 2017 at 10:15 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>
>>>> Well, we could add configuration options to decide if and how to follow
>>>> and verify fh for non-dir, but:
>>>>
>>>> 1. We agreed that trying to follow fh for non-dir is a no-lose situation
>>>>     for !redirect and hot cache case and a probable win for redirect with
>>>>     cold case
>>>
>>> Okay.  That also means that redirect is not actually needed for
>>> non-dir.  Well, except for the weird case of having to reconstruct a
>>> reverse mapping after copying the layers in order to properly handle
>>> copy up of hardlinks on the lower layer.  But let's not care about that
>>> for now (or ever, probably).
>>>
>>
>> It's needed for when we can't lookup by fh:
>> - lower has NULL uuid
>> - !same_lower_sb (may be relaxed going forward)
>>
>> Heh, it's hard to keep track of it all ;-)
>
> Since this is only for constant inode, we really shouldn't need to
> care about any of the above cases.

It's also going to be needed for preserving hardlinks
(even without copying layers) for !same_fs_with_uuid case.
But that is not fixed by this series, so I can introduce
"redirect non-dir on ovl_link() and ovl_rename()"
in the next series.

>
> Let's keep things as simple as possible.
>

OK, so I'll handle lookup for all layers on samefs with uuid
and leave the rest for later.

I will also store overlay.origin in any configuration, which
may be FILEID_INVALID for lowers that don't support exportfs,
so we can make use of it later.

Amir.

