All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data
@ 2017-10-10 15:32 Vivek Goyal
  2017-10-10 15:32 ` [PATCH 1/9] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check Vivek Goyal
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

Hi,

This is V3 of patches.

Changes from V2:

- Got rid of boolean in ovl_getattr(). This will be changed again once
  chandan's patches are merged (Amir).

- Moved index=on check much later in ovl_fill_super(). (Amir)

- Fixed an inode leak in case of returning -ESTALE.

- Added another patch to add read and write barrier around
  setting/resetting of OVL_METACOPY bit. As we do lockless access in
  ovl_copy_up_flags(), I think without this we can be subject to races.

Original description
--------------------
In one of the recent converstions, people mentioned that chown/chmod
lead to copy up files as well as data. We could optimize it so that
only metadata is copied up during chown/chmod and data is copied up when
file is opened for WRITE.

This optimization potentially could be useful with containers and user
namespaces. In popular scenario, people end up doing chown() on whole
image directory tree based on container mappings. And this chown copies
up everything, breaking sharing of page cache between containers.

With these patches, only metadat is copied up during chown() and if file
is opened for READ, d_real() returns lower dentry/inode. That way,
different containers can still continue to use page cache. That's the
use case I have in mind.

Basically, I am relying on storing OVL_XATTR_ORIGIN in upper inode
during copy up. I use that information to get to lower inode later and
do data copy up when necessary.

I also store OVL_XATTR_METACOPY in upper inode to mark that only
metadata has been copied up and data copy up still might be required.

Any feedback is helpful.

Thanks
Vivek

Vivek Goyal (9):
  ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check
  ovl: During copy up, first copy up metadata and then data
  ovl: Provide a mount option metacopy=on/off for metadata copyup
  ovl: Copy up only metadata during copy up where it makes sense
  ovl: Set xattr OVL_XATTR_METACOPY on upper file
  ovl: Fix ovl_getattr() to get number of blocks from lower
  ovl: Introduce read/write barriers around metacopy flag update
  ovl: Set OVL_METACOPY flag during ovl_lookup()
  ovl: Return lower dentry if only metadata copy up took place

 fs/overlayfs/Kconfig     |   9 ++++
 fs/overlayfs/copy_up.c   | 114 ++++++++++++++++++++++++++++++++++++++---------
 fs/overlayfs/inode.c     |  12 ++++-
 fs/overlayfs/namei.c     |  40 +++++++++++++++++
 fs/overlayfs/overlayfs.h |   8 +++-
 fs/overlayfs/ovl_entry.h |   1 +
 fs/overlayfs/super.c     |  36 +++++++++++++++
 fs/overlayfs/util.c      |  53 +++++++++++++++++-----
 8 files changed, 237 insertions(+), 36 deletions(-)

-- 
2.13.5

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/9] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-10 15:32 ` [PATCH 2/9] ovl: During copy up, first copy up metadata and then data Vivek Goyal
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

At mount time we check if upper supports xattr or not and set ofs->noxattr
field accordingly. But in ovl_check_setxattr() still seems to have logic
which expects that ovl_do_setxattr() can return -EOPNOTSUPP. Not sure when
and how can we hit this code. Feels redundant to me. So I am removing
this code.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/util.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index b9b239fa5cfd..a4ce1c9944f0 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -348,21 +348,12 @@ int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
 		       const char *name, const void *value, size_t size,
 		       int xerr)
 {
-	int err;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 
 	if (ofs->noxattr)
 		return xerr;
 
-	err = ovl_do_setxattr(upperdentry, name, value, size, 0);
-
-	if (err == -EOPNOTSUPP) {
-		pr_warn("overlayfs: cannot set %s xattr on upper\n", name);
-		ofs->noxattr = true;
-		return xerr;
-	}
-
-	return err;
+	return ovl_do_setxattr(upperdentry, name, value, size, 0);
 }
 
 int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/9] ovl: During copy up, first copy up metadata and then data
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
  2017-10-10 15:32 ` [PATCH 1/9] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-10 15:32 ` [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

This just helps with later patches where after copying up metadata, we
skip data copying step, if needed.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index c441f9387a1b..b626584bb7b4 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -445,6 +445,23 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 {
 	int err;
 
+	err = ovl_copy_xattr(c->lowerpath.dentry, temp);
+	if (err)
+		return err;
+
+	/*
+	 * Store identifier of lower inode in upper inode xattr to
+	 * allow lookup of the copy up origin inode.
+	 *
+	 * Don't set origin when we are breaking the association with a lower
+	 * hard link.
+	 */
+	if (c->origin) {
+		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
+		if (err)
+			return err;
+	}
+
 	if (S_ISREG(c->stat.mode)) {
 		struct path upperpath;
 
@@ -457,29 +474,12 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
-	err = ovl_copy_xattr(c->lowerpath.dentry, temp);
-	if (err)
-		return err;
-
 	inode_lock(temp->d_inode);
 	err = ovl_set_attr(temp, &c->stat);
 	inode_unlock(temp->d_inode);
 	if (err)
 		return err;
 
-	/*
-	 * Store identifier of lower inode in upper inode xattr to
-	 * allow lookup of the copy up origin inode.
-	 *
-	 * Don't set origin when we are breaking the association with a lower
-	 * hard link.
-	 */
-	if (c->origin) {
-		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
-		if (err)
-			return err;
-	}
-
 	return 0;
 }
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
  2017-10-10 15:32 ` [PATCH 1/9] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check Vivek Goyal
  2017-10-10 15:32 ` [PATCH 2/9] ovl: During copy up, first copy up metadata and then data Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-11  1:36   ` Amir Goldstein
  2017-10-10 15:32 ` [PATCH 4/9] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

By default metadata only copy up is disabled. Provide a mount option so
that users can choose one way or other.

Also metadata copyup is conditional on index=on. If index=off and user
specifies metacopy=on, it goes back to metacopy=off and a warning is
printed.

Also provide a kernel config and module option to enable/disable
metacopy feature.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/Kconfig     |  9 +++++++++
 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/super.c     | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index cbfc196e5dc5..94d4c61719c8 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -43,3 +43,12 @@ config OVERLAY_FS_INDEX
 	  outcomes.  However, mounting the same overlay with an old kernel
 	  read-write and then mounting it again with a new kernel, will have
 	  unexpected results.
+
+config OVERLAY_FS_METACOPY
+	bool "Overlayfs: turn on metadata only copy up feature by default"
+	depends on OVERLAY_FS
+	depends on OVERLAY_FS_INDEX
+	help
+	  If this config option is enabled then overlay filesystems will
+	  copy up only metadata where appropriate and data copy up will
+	  happen when a file is opended for WRITE operation.
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 25d9b5adcd42..6806f0b0fbc2 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -15,6 +15,7 @@ struct ovl_config {
 	bool default_permissions;
 	bool redirect_dir;
 	bool index;
+	bool metacopy;
 };
 
 /* private information held for overlayfs's superblock */
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 092d150643c1..6f4c32e49298 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -39,6 +39,11 @@ module_param_named(index, ovl_index_def, bool, 0644);
 MODULE_PARM_DESC(ovl_index_def,
 		 "Default to on or off for the inodes index feature");
 
+static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
+module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
+MODULE_PARM_DESC(ovl_metacopy_def,
+		 "Default to on or off for the metadata only copy up feature");
+
 static void ovl_dentry_release(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
@@ -303,6 +308,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	if (ufs->config.index != ovl_index_def)
 		seq_printf(m, ",index=%s",
 			   ufs->config.index ? "on" : "off");
+	if (ufs->config.metacopy != ovl_metacopy_def)
+		seq_printf(m, ",metacopy=%s",
+			   ufs->config.metacopy ? "on" : "off");
 	return 0;
 }
 
@@ -336,6 +344,8 @@ enum {
 	OPT_REDIRECT_DIR_OFF,
 	OPT_INDEX_ON,
 	OPT_INDEX_OFF,
+	OPT_METACOPY_ON,
+	OPT_METACOPY_OFF,
 	OPT_ERR,
 };
 
@@ -348,6 +358,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_REDIRECT_DIR_OFF,		"redirect_dir=off"},
 	{OPT_INDEX_ON,			"index=on"},
 	{OPT_INDEX_OFF,			"index=off"},
+	{OPT_METACOPY_ON,		"metacopy=on"},
+	{OPT_METACOPY_OFF,		"metacopy=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -428,6 +440,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->index = false;
 			break;
 
+		case OPT_METACOPY_ON:
+			config->metacopy = true;
+			break;
+
+		case OPT_METACOPY_OFF:
+			config->metacopy = false;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;
@@ -847,6 +867,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	ufs->config.redirect_dir = ovl_redirect_dir_def;
 	ufs->config.index = ovl_index_def;
+	if (ovl_metacopy_def && !ovl_index_def) {
+		pr_warn("overlayfs: metadata copy up can not be enabled by default as index feature is not enabled by default.\n");
+		ovl_metacopy_def = false;
+	}
+	ufs->config.metacopy = ovl_metacopy_def;
+
 	err = ovl_parse_opt((char *) data, &ufs->config);
 	if (err)
 		goto out_free_config;
@@ -1091,6 +1117,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ufs->indexdir)
 		ufs->config.index = false;
 
+	/* As of now metacopy=on is dependent on index=on */
+	if (ufs->config.metacopy && !ufs->config.index) {
+		pr_warn("overlayfs: can not enable metadata only copyup with index=off. Falling back to metacopy=off\n");
+		ufs->config.metacopy = false;
+	}
+
 	if (remote)
 		sb->s_d_op = &ovl_reval_dentry_operations;
 	else
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/9] ovl: Copy up only metadata during copy up where it makes sense
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (2 preceding siblings ...)
  2017-10-10 15:32 ` [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-10 15:32 ` [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file Vivek Goyal
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

If it makes sense to copy up only metadata during copy up, do it. This
is done for regular files which are not opened for WRITE and have origin
being saved.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b626584bb7b4..ebd404629c6d 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -306,11 +306,8 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 			return PTR_ERR(fh);
 	}
 
-	/*
-	 * Do not fail when upper doesn't support xattrs.
-	 */
 	err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh,
-				 fh ? fh->len : 0, 0);
+				 fh ? fh->len : 0, -EOPNOTSUPP);
 	kfree(fh);
 
 	return err;
@@ -328,6 +325,7 @@ struct ovl_copy_up_ctx {
 	struct dentry *workdir;
 	bool tmpfile;
 	bool origin;
+	bool metadata_only;
 };
 
 static int ovl_link_up(struct ovl_copy_up_ctx *c)
@@ -458,11 +456,15 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 	 */
 	if (c->origin) {
 		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
-		if (err)
-			return err;
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+			/* Upper does not support xattr. Copy up data as well */
+			c->metadata_only = false;
+		}
 	}
 
-	if (S_ISREG(c->stat.mode)) {
+	if (S_ISREG(c->stat.mode) && !c->metadata_only) {
 		struct path upperpath;
 
 		ovl_path_upper(c->dentry, &upperpath);
@@ -592,10 +594,12 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	int err;
 	DEFINE_DELAYED_CALL(done);
 	struct path parentpath;
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 	struct ovl_copy_up_ctx ctx = {
 		.parent = parent,
 		.dentry = dentry,
 		.workdir = ovl_workdir(dentry),
+		.metadata_only = ofs->config.metacopy,
 	};
 
 	if (WARN_ON(!ctx.workdir))
@@ -616,9 +620,17 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	if (err)
 		return err;
 
+	if (!S_ISREG(ctx.stat.mode))
+		ctx.metadata_only = false;
+
 	/* maybe truncate regular file. this has no effect on dirs */
-	if (flags & O_TRUNC)
+	if (flags & O_TRUNC) {
 		ctx.stat.size = 0;
+		ctx.metadata_only = false;
+	}
+
+	if (flags & (OPEN_FMODE(flags) & FMODE_WRITE))
+		ctx.metadata_only = false;
 
 	if (S_ISLNK(ctx.stat.mode)) {
 		ctx.link = vfs_get_link(ctx.lowerpath.dentry, &done);
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (3 preceding siblings ...)
  2017-10-10 15:32 ` [PATCH 4/9] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-10 17:03   ` Amir Goldstein
  2017-10-10 15:32 ` [PATCH 6/9] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

Presence of OVL_XATTR_METACOPY reflects that file has been copied up
with metadata only and data needs to be copied up later from lower.
So this xattr is set when a metadata copy takes place and cleared when
data copy takes place.

We also use a bit in ovl_inode->flags to cache OVL_METACOPY which reflects
whether ovl inode has only metadata copied up.

ovl_set_size() code has been taken from Amir's patch.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c   | 65 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/overlayfs/inode.c     |  3 ++-
 fs/overlayfs/overlayfs.h |  5 +++-
 fs/overlayfs/util.c      | 21 ++++++++++++++--
 4 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index ebd404629c6d..682852a78163 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -196,6 +196,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 	return error;
 }
 
+static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
+{
+	struct iattr attr = {
+		.ia_valid = ATTR_SIZE,
+		.ia_size = stat->size,
+	};
+
+	return notify_change(upperdentry, &attr, NULL);
+}
+
 static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
 {
 	struct iattr attr = {
@@ -439,9 +449,31 @@ static int ovl_get_tmpfile(struct ovl_copy_up_ctx *c, struct dentry **tempp)
 	goto out;
 }
 
+static int ovl_copy_up_data_inode(struct ovl_copy_up_ctx *c)
+{
+	struct path upperpath;
+	int err;
+
+	ovl_path_upper(c->dentry, &upperpath);
+	BUG_ON(upperpath.dentry == NULL);
+
+	err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
+	if (err)
+		return err;
+
+	err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
+	if (err)
+		return err;
+
+	smp_wmb();
+	ovl_clear_flag(OVL_METACOPY, d_inode(c->dentry));
+	return err;
+}
+
 static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 {
 	int err;
+	bool set_size = false;
 
 	err = ovl_copy_xattr(c->lowerpath.dentry, temp);
 	if (err)
@@ -476,8 +508,29 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
+	if (c->metadata_only) {
+		err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
+					 NULL, 0, -EOPNOTSUPP);
+		if (err)
+			return err;
+
+		set_size = true;
+	}
+
 	inode_lock(temp->d_inode);
+	if (set_size) {
+		err = ovl_set_size(temp, &c->stat);
+		if (err) {
+			inode_unlock(temp->d_inode);
+			return err;
+		}
+	}
+
 	err = ovl_set_attr(temp, &c->stat);
+	if (err) {
+		inode_unlock(temp->d_inode);
+		return err;
+	}
 	inode_unlock(temp->d_inode);
 	if (err)
 		return err;
@@ -511,6 +564,10 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
 		goto out_cleanup;
 
 	ovl_inode_update(d_inode(c->dentry), newdentry);
+	if (c->metadata_only) {
+		smp_wmb();
+		ovl_set_flag(OVL_METACOPY, d_inode(c->dentry));
+	}
 out:
 	dput(temp);
 	return err;
@@ -639,7 +696,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	}
 	ovl_do_check_copy_up(ctx.lowerpath.dentry);
 
-	err = ovl_copy_up_start(dentry);
+	err = ovl_copy_up_start(dentry, flags);
 	/* err < 0: interrupted, err > 0: raced with another copy-up */
 	if (unlikely(err)) {
 		if (err > 0)
@@ -649,6 +706,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			err = ovl_do_copy_up(&ctx);
 		if (!err && !ovl_dentry_has_upper_alias(dentry))
 			err = ovl_link_up(&ctx);
+		if (!err && ovl_dentry_needs_data_copy_up(dentry, flags))
+			err = ovl_copy_up_data_inode(&ctx);
 		ovl_copy_up_end(dentry);
 	}
 	do_delayed_call(&done);
@@ -679,8 +738,10 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 		 *      with rename.
 		 */
 		if (ovl_dentry_upper(dentry) &&
-		    ovl_dentry_has_upper_alias(dentry))
+		    ovl_dentry_has_upper_alias(dentry) &&
+		    !ovl_dentry_needs_data_copy_up(dentry, flags)) {
 			break;
+		}
 
 		next = dget(dentry);
 		/* find the topmost dentry not yet copied up */
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index a619addecafc..e5825b8948e0 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -320,7 +320,8 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
 static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
 {
 	if (ovl_dentry_upper(dentry) &&
-	    ovl_dentry_has_upper_alias(dentry))
+	    ovl_dentry_has_upper_alias(dentry) &&
+	    !ovl_dentry_needs_data_copy_up(dentry, flags))
 		return false;
 
 	if (special_file(d_inode(dentry)->i_mode))
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c706a6f99928..773f5ad75729 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -26,10 +26,12 @@ enum ovl_path_type {
 #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin"
 #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
 #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
+#define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
 
 enum ovl_flag {
 	OVL_IMPURE,
 	OVL_INDEX,
+	OVL_METACOPY,
 };
 
 /*
@@ -211,6 +213,7 @@ bool ovl_dentry_is_whiteout(struct dentry *dentry);
 void ovl_dentry_set_opaque(struct dentry *dentry);
 bool ovl_dentry_has_upper_alias(struct dentry *dentry);
 void ovl_dentry_set_upper_alias(struct dentry *dentry);
+bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
 bool ovl_redirect_dir(struct super_block *sb);
 const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
@@ -221,7 +224,7 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
 struct file *ovl_path_open(struct path *path, int flags);
-int ovl_copy_up_start(struct dentry *dentry);
+int ovl_copy_up_start(struct dentry *dentry, int flags);
 void ovl_copy_up_end(struct dentry *dentry);
 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
 int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index a4ce1c9944f0..91c542d1a39d 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -227,6 +227,22 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
 	oe->has_upper = true;
 }
 
+bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
+	if (!S_ISREG(d_inode(dentry)->i_mode))
+		return false;
+
+	if (!flags)
+		return false;
+
+	if (!(OPEN_FMODE(flags) & FMODE_WRITE))
+		return false;
+
+	if (!ovl_test_flag(OVL_METACOPY, d_inode(dentry)))
+		return false;
+
+	return true;
+}
+
 bool ovl_redirect_dir(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
@@ -310,13 +326,14 @@ struct file *ovl_path_open(struct path *path, int flags)
 	return dentry_open(path, flags | O_NOATIME, current_cred());
 }
 
-int ovl_copy_up_start(struct dentry *dentry)
+int ovl_copy_up_start(struct dentry *dentry, int flags)
 {
 	struct ovl_inode *oi = OVL_I(d_inode(dentry));
 	int err;
 
 	err = mutex_lock_interruptible(&oi->lock);
-	if (!err && ovl_dentry_has_upper_alias(dentry)) {
+	if (!err && ovl_dentry_has_upper_alias(dentry) &&
+	    !ovl_dentry_needs_data_copy_up(dentry, flags)) {
 		err = 1; /* Already copied up */
 		mutex_unlock(&oi->lock);
 	}
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 6/9] ovl: Fix ovl_getattr() to get number of blocks from lower
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (4 preceding siblings ...)
  2017-10-10 15:32 ` [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-10 15:32 ` [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

If an inode has been copied up metadata only, then we need to query the
number of blocks from lower and fill up the stat->st_blocks.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/inode.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index e5825b8948e0..d333fd48659f 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,6 +140,15 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	if (!is_dir && ovl_test_flag(OVL_INDEX, d_inode(dentry)))
 		stat->nlink = dentry->d_inode->i_nlink;
 
+	if (ovl_test_flag(OVL_METACOPY, d_inode(dentry))) {
+		struct kstat lowerstat;
+
+		ovl_path_lower(dentry, &realpath);
+		err = vfs_getattr(&realpath, &lowerstat, STATX_BLOCKS, flags);
+		if (err)
+			goto out;
+		stat->blocks = lowerstat.blocks;
+	}
 out:
 	revert_creds(old_cred);
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (5 preceding siblings ...)
  2017-10-10 15:32 ` [PATCH 6/9] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-10 17:12   ` Amir Goldstein
  2017-10-10 15:32 ` [PATCH 8/9] ovl: Set OVL_METACOPY flag during ovl_lookup() Vivek Goyal
  2017-10-10 15:32 ` [PATCH 9/9] ovl: Return lower dentry if only metadata copy up took place Vivek Goyal
  8 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

We can access and check metacopy flag outside ovl_inode->lock. IOW, say
a file was meta copied up in the past and it has metacopy flag set. Now
a data copy up operation in progress. Say another thread reads state of
this flag and if flag clearing is visible before file has been fully
copied up, that can cause problems.

	CPU1				CPU2
ovl_copy_up_flags()			acquire(oi->lock)
 ovl_dentry_needs_data_copy_up()	  ovl_copy_up_data_inode()
   ovl_test_metacopy_flag()		    ovl_copy_up_data()
					    ovl_clear_metacopy_flag()
					release(oi->lock)

Say CPU2 is copying up data and in the end clears metacopy flag. But if
CPU1 perceives clearing of metacopy before actual data copy up operation
being finished, that can become a problem. We want this clearing of flag
to be visible only if all previous write operations have finished.

Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
and smp_rmb() on metacopy flag test operation.

May be some other lock or barrier is already covering it. But I am not sure
what that is and is it obvious enough that we will not break it in future.

So hence trying to be safe here and introducing barriers explicitly for
metacopy bit.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c   |  9 +++------
 fs/overlayfs/overlayfs.h |  3 +++
 fs/overlayfs/util.c      | 23 ++++++++++++++++++++++-
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 682852a78163..b10269b80803 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -465,8 +465,7 @@ static int ovl_copy_up_data_inode(struct ovl_copy_up_ctx *c)
 	if (err)
 		return err;
 
-	smp_wmb();
-	ovl_clear_flag(OVL_METACOPY, d_inode(c->dentry));
+	ovl_clear_metacopy_flag(d_inode(c->dentry));
 	return err;
 }
 
@@ -564,10 +563,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
 		goto out_cleanup;
 
 	ovl_inode_update(d_inode(c->dentry), newdentry);
-	if (c->metadata_only) {
-		smp_wmb();
-		ovl_set_flag(OVL_METACOPY, d_inode(c->dentry));
-	}
+	if (c->metadata_only)
+		ovl_set_metacopy_flag(d_inode(c->dentry));
 out:
 	dput(temp);
 	return err;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 773f5ad75729..df3f513de3cc 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -234,6 +234,9 @@ int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry);
 void ovl_set_flag(unsigned long flag, struct inode *inode);
 void ovl_clear_flag(unsigned long flag, struct inode *inode);
 bool ovl_test_flag(unsigned long flag, struct inode *inode);
+void ovl_set_metacopy_flag(struct inode *inode);
+void ovl_clear_metacopy_flag(struct inode *inode);
+bool ovl_test_metacopy_flag(struct inode *inode);
 bool ovl_inuse_trylock(struct dentry *dentry);
 void ovl_inuse_unlock(struct dentry *dentry);
 int ovl_nlink_start(struct dentry *dentry, bool *locked);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 91c542d1a39d..000f78b92a72 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -227,6 +227,27 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
 	oe->has_upper = true;
 }
 
+bool ovl_test_metacopy_flag(struct inode *inode)
+{
+	/* Pairs with smp_wmb() in ovl_set_metacopy_flag() */
+	smp_rmb();
+	return ovl_test_flag(OVL_METACOPY, inode);
+}
+
+void ovl_set_metacopy_flag(struct inode *inode)
+{
+	/* Pairs with smp_rmb() in ovl_test_metacopy_flag() */
+	smp_wmb();
+	ovl_set_flag(OVL_METACOPY, inode);
+}
+
+void ovl_clear_metacopy_flag(struct inode *inode)
+{
+	/* Pairs with smp_rmb() in ovl_test_metacopy_flag() */
+	smp_wmb();
+	ovl_clear_flag(OVL_METACOPY, inode);
+}
+
 bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
 	if (!S_ISREG(d_inode(dentry)->i_mode))
 		return false;
@@ -237,7 +258,7 @@ bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
 	if (!(OPEN_FMODE(flags) & FMODE_WRITE))
 		return false;
 
-	if (!ovl_test_flag(OVL_METACOPY, d_inode(dentry)))
+	if (!ovl_test_metacopy_flag(d_inode(dentry)))
 		return false;
 
 	return true;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 8/9] ovl: Set OVL_METACOPY flag during ovl_lookup()
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (6 preceding siblings ...)
  2017-10-10 15:32 ` [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  2017-10-10 15:32 ` [PATCH 9/9] ovl: Return lower dentry if only metadata copy up took place Vivek Goyal
  8 siblings, 0 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

During lookup, check for presence of OVL_XATTR_METACOPY and if present,
set OVL_METACOPY bit in flags.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/namei.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 654bea1a5ac9..85328e1088d8 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -26,6 +26,24 @@ struct ovl_lookup_data {
 	char *redirect;
 };
 
+/* err < 0, 0 if no metacopy xattr, 1 if metacopy xattr found */
+static int ovl_check_metacopy(struct dentry *dentry)
+{
+	int res;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_METACOPY, NULL, 0);
+	if (res < 0) {
+		if (res == -ENODATA || res == -EOPNOTSUPP)
+			return 0;
+		goto out;
+	}
+
+	return 1;
+out:
+	pr_warn_ratelimited("overlayfs: failed to get metacopy (%i)\n", res);
+	return res;
+}
+
 static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
 			      size_t prelen, const char *post)
 {
@@ -592,6 +610,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct dentry *this;
 	unsigned int i;
 	int err;
+	bool metacopy = false;
 	struct ovl_lookup_data d = {
 		.name = dentry->d_name,
 		.is_dir = false,
@@ -632,6 +651,12 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 					       roe->numlower, &stack, &ctr);
 			if (err)
 				goto out;
+
+			err = ovl_check_metacopy(upperdentry);
+			if (err < 0)
+				goto out;
+			if (err == 1)
+				metacopy = true;
 		}
 
 		if (d.redirect) {
@@ -717,6 +742,19 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		OVL_I(inode)->redirect = upperredirect;
 		if (index)
 			ovl_set_flag(OVL_INDEX, inode);
+
+		if (metacopy) {
+			/*
+			 * Found an upper with metacopy set but at the same
+			 * time there is no lower dentry. Something is not
+			 * right.
+			 */
+			if (!ctr) {
+				err = -ESTALE;
+				goto out_put_inode;
+			}
+			ovl_set_flag(OVL_METACOPY, inode);
+		}
 	}
 
 	revert_creds(old_cred);
@@ -727,6 +765,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 
 	return NULL;
 
+out_put_inode:
+	iput(inode);
 out_free_oe:
 	dentry->d_fsdata = NULL;
 	kfree(oe);
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 9/9] ovl: Return lower dentry if only metadata copy up took place
  2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (7 preceding siblings ...)
  2017-10-10 15:32 ` [PATCH 8/9] ovl: Set OVL_METACOPY flag during ovl_lookup() Vivek Goyal
@ 2017-10-10 15:32 ` Vivek Goyal
  8 siblings, 0 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-10 15:32 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal, ebiederm

Upper dentry inode does not have data. So return lower dentry if upper
is only a metadata copy.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/super.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 6f4c32e49298..fa9d0032fd03 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -101,10 +101,14 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 			err = ovl_check_append_only(d_inode(real), open_flags);
 			if (err)
 				return ERR_PTR(err);
+
+			if (ovl_test_flag(OVL_METACOPY, d_inode(dentry)))
+				goto lower;
 		}
 		return real;
 	}
 
+lower:
 	real = ovl_dentry_lower(dentry);
 	if (!real)
 		goto bug;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file
  2017-10-10 15:32 ` [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file Vivek Goyal
@ 2017-10-10 17:03   ` Amir Goldstein
  2017-10-11 20:16     ` Vivek Goyal
  0 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-10 17:03 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Presence of OVL_XATTR_METACOPY reflects that file has been copied up
> with metadata only and data needs to be copied up later from lower.
> So this xattr is set when a metadata copy takes place and cleared when
> data copy takes place.
>
> We also use a bit in ovl_inode->flags to cache OVL_METACOPY which reflects
> whether ovl inode has only metadata copied up.
>
> ovl_set_size() code has been taken from Amir's patch.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/copy_up.c   | 65 ++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/overlayfs/inode.c     |  3 ++-
>  fs/overlayfs/overlayfs.h |  5 +++-
>  fs/overlayfs/util.c      | 21 ++++++++++++++--
>  4 files changed, 88 insertions(+), 6 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index ebd404629c6d..682852a78163 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -196,6 +196,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>         return error;
>  }
>
> +static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
> +{
> +       struct iattr attr = {
> +               .ia_valid = ATTR_SIZE,
> +               .ia_size = stat->size,
> +       };
> +
> +       return notify_change(upperdentry, &attr, NULL);
> +}
> +
>  static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
>  {
>         struct iattr attr = {
> @@ -439,9 +449,31 @@ static int ovl_get_tmpfile(struct ovl_copy_up_ctx *c, struct dentry **tempp)
>         goto out;
>  }
>
> +static int ovl_copy_up_data_inode(struct ovl_copy_up_ctx *c)
> +{
> +       struct path upperpath;
> +       int err;
> +
> +       ovl_path_upper(c->dentry, &upperpath);
> +       BUG_ON(upperpath.dentry == NULL);
> +
> +       err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
> +       if (err)
> +               return err;
> +
> +       err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
> +       if (err)
> +               return err;
> +
> +       smp_wmb();
> +       ovl_clear_flag(OVL_METACOPY, d_inode(c->dentry));
> +       return err;
> +}
> +
>  static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>  {
>         int err;
> +       bool set_size = false;
>
>         err = ovl_copy_xattr(c->lowerpath.dentry, temp);
>         if (err)
> @@ -476,8 +508,29 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>                         return err;
>         }
>
> +       if (c->metadata_only) {
> +               err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
> +                                        NULL, 0, -EOPNOTSUPP);
> +               if (err)
> +                       return err;
> +
> +               set_size = true;
> +       }
> +
>         inode_lock(temp->d_inode);
> +       if (set_size) {
> +               err = ovl_set_size(temp, &c->stat);
> +               if (err) {
> +                       inode_unlock(temp->d_inode);
> +                       return err;
> +               }

This extra if err block can be avoided with if !err
Before ovl set attr.

> +       }
> +
>         err = ovl_set_attr(temp, &c->stat);
> +       if (err) {
> +               inode_unlock(temp->d_inode);
> +               return err;
> +       }

This extra if err block adds nothing.

>         inode_unlock(temp->d_inode);
>         if (err)
>                 return err;
> @@ -511,6 +564,10 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
>                 goto out_cleanup;
>
>         ovl_inode_update(d_inode(c->dentry), newdentry);
> +       if (c->metadata_only) {
> +               smp_wmb();

This wmb does not address the only problem imo.
The problem is you must not allow inode to appear to have
An upper dentry before you made sure metacopy flag is visible.
So first you need to set metacopy flag before updating upper.
Then you also need to test them is correct order with rmb.

> +               ovl_set_flag(OVL_METACOPY, d_inode(c->dentry));
> +       }
>  out:
>         dput(temp);
>         return err;
> @@ -639,7 +696,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>         }
>         ovl_do_check_copy_up(ctx.lowerpath.dentry);
>
> -       err = ovl_copy_up_start(dentry);
> +       err = ovl_copy_up_start(dentry, flags);
>         /* err < 0: interrupted, err > 0: raced with another copy-up */
>         if (unlikely(err)) {
>                 if (err > 0)
> @@ -649,6 +706,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>                         err = ovl_do_copy_up(&ctx);
>                 if (!err && !ovl_dentry_has_upper_alias(dentry))
>                         err = ovl_link_up(&ctx);
> +               if (!err && ovl_dentry_needs_data_copy_up(dentry, flags))
> +                       err = ovl_copy_up_data_inode(&ctx);
>                 ovl_copy_up_end(dentry);
>         }
>         do_delayed_call(&done);
> @@ -679,8 +738,10 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
>                  *      with rename.
>                  */
>                 if (ovl_dentry_upper(dentry) &&
> -                   ovl_dentry_has_upper_alias(dentry))
> +                   ovl_dentry_has_upper_alias(dentry) &&
> +                   !ovl_dentry_needs_data_copy_up(dentry, flags)) {
>                         break;
> +               }
>
>                 next = dget(dentry);
>                 /* find the topmost dentry not yet copied up */
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index a619addecafc..e5825b8948e0 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -320,7 +320,8 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
>  static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
>  {
>         if (ovl_dentry_upper(dentry) &&
> -           ovl_dentry_has_upper_alias(dentry))
> +           ovl_dentry_has_upper_alias(dentry) &&
> +           !ovl_dentry_needs_data_copy_up(dentry, flags))
>                 return false;
>
>         if (special_file(d_inode(dentry)->i_mode))
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c706a6f99928..773f5ad75729 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -26,10 +26,12 @@ enum ovl_path_type {
>  #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin"
>  #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
>  #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
> +#define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
>
>  enum ovl_flag {
>         OVL_IMPURE,
>         OVL_INDEX,
> +       OVL_METACOPY,
>  };
>
>  /*
> @@ -211,6 +213,7 @@ bool ovl_dentry_is_whiteout(struct dentry *dentry);
>  void ovl_dentry_set_opaque(struct dentry *dentry);
>  bool ovl_dentry_has_upper_alias(struct dentry *dentry);
>  void ovl_dentry_set_upper_alias(struct dentry *dentry);
> +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
>  bool ovl_redirect_dir(struct super_block *sb);
>  const char *ovl_dentry_get_redirect(struct dentry *dentry);
>  void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
> @@ -221,7 +224,7 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
>  u64 ovl_dentry_version_get(struct dentry *dentry);
>  bool ovl_is_whiteout(struct dentry *dentry);
>  struct file *ovl_path_open(struct path *path, int flags);
> -int ovl_copy_up_start(struct dentry *dentry);
> +int ovl_copy_up_start(struct dentry *dentry, int flags);
>  void ovl_copy_up_end(struct dentry *dentry);
>  bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
>  int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index a4ce1c9944f0..91c542d1a39d 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -227,6 +227,22 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
>         oe->has_upper = true;
>  }
>
> +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> +       if (!S_ISREG(d_inode(dentry)->i_mode))
> +               return false;
> +
> +       if (!flags)
> +               return false;
> +
> +       if (!(OPEN_FMODE(flags) & FMODE_WRITE))
> +               return false;
> +
> +       if (!ovl_test_flag(OVL_METACOPY, d_inode(dentry)))
> +               return false;
> +
> +       return true;
> +}
> +
>  bool ovl_redirect_dir(struct super_block *sb)
>  {
>         struct ovl_fs *ofs = sb->s_fs_info;
> @@ -310,13 +326,14 @@ struct file *ovl_path_open(struct path *path, int flags)
>         return dentry_open(path, flags | O_NOATIME, current_cred());
>  }
>
> -int ovl_copy_up_start(struct dentry *dentry)
> +int ovl_copy_up_start(struct dentry *dentry, int flags)
>  {
>         struct ovl_inode *oi = OVL_I(d_inode(dentry));
>         int err;
>
>         err = mutex_lock_interruptible(&oi->lock);
> -       if (!err && ovl_dentry_has_upper_alias(dentry)) {
> +       if (!err && ovl_dentry_has_upper_alias(dentry) &&
> +           !ovl_dentry_needs_data_copy_up(dentry, flags)) {
>                 err = 1; /* Already copied up */
>                 mutex_unlock(&oi->lock);
>         }
> --
> 2.13.5
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-10 15:32 ` [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
@ 2017-10-10 17:12   ` Amir Goldstein
  2017-10-11 20:27     ` Vivek Goyal
  0 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-10 17:12 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> We can access and check metacopy flag outside ovl_inode->lock. IOW, say
> a file was meta copied up in the past and it has metacopy flag set. Now
> a data copy up operation in progress. Say another thread reads state of
> this flag and if flag clearing is visible before file has been fully
> copied up, that can cause problems.
>
>         CPU1                            CPU2
> ovl_copy_up_flags()                     acquire(oi->lock)
>  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data_inode()
>    ovl_test_metacopy_flag()                 ovl_copy_up_data()
>                                             ovl_clear_metacopy_flag()
>                                         release(oi->lock)
>
> Say CPU2 is copying up data and in the end clears metacopy flag. But if
> CPU1 perceives clearing of metacopy before actual data copy up operation
> being finished, that can become a problem. We want this clearing of flag
> to be visible only if all previous write operations have finished.
>
> Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
> and smp_rmb() on metacopy flag test operation.
>
> May be some other lock or barrier is already covering it. But I am not sure
> what that is and is it obvious enough that we will not break it in future.
>
> So hence trying to be safe here and introducing barriers explicitly for
> metacopy bit.

Please revisit your changes after addressing my comment on patch 5.
IIUC the flow you describe above should be addressed by testing
metacopy flag again under ovl inode lock.

The only subtle part imo is making metacopy flag visible
Before making upper dentry visible.
Clearing metacopy flag should not be a problem imo,
As it comes after fsync, but didn't look closely.

>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/copy_up.c   |  9 +++------
>  fs/overlayfs/overlayfs.h |  3 +++
>  fs/overlayfs/util.c      | 23 ++++++++++++++++++++++-
>  3 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 682852a78163..b10269b80803 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -465,8 +465,7 @@ static int ovl_copy_up_data_inode(struct ovl_copy_up_ctx *c)
>         if (err)
>                 return err;
>
> -       smp_wmb();
> -       ovl_clear_flag(OVL_METACOPY, d_inode(c->dentry));
> +       ovl_clear_metacopy_flag(d_inode(c->dentry));
>         return err;
>  }
>
> @@ -564,10 +563,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
>                 goto out_cleanup;
>
>         ovl_inode_update(d_inode(c->dentry), newdentry);
> -       if (c->metadata_only) {
> -               smp_wmb();
> -               ovl_set_flag(OVL_METACOPY, d_inode(c->dentry));
> -       }
> +       if (c->metadata_only)
> +               ovl_set_metacopy_flag(d_inode(c->dentry));
>  out:
>         dput(temp);
>         return err;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 773f5ad75729..df3f513de3cc 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -234,6 +234,9 @@ int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry);
>  void ovl_set_flag(unsigned long flag, struct inode *inode);
>  void ovl_clear_flag(unsigned long flag, struct inode *inode);
>  bool ovl_test_flag(unsigned long flag, struct inode *inode);
> +void ovl_set_metacopy_flag(struct inode *inode);
> +void ovl_clear_metacopy_flag(struct inode *inode);
> +bool ovl_test_metacopy_flag(struct inode *inode);
>  bool ovl_inuse_trylock(struct dentry *dentry);
>  void ovl_inuse_unlock(struct dentry *dentry);
>  int ovl_nlink_start(struct dentry *dentry, bool *locked);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 91c542d1a39d..000f78b92a72 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -227,6 +227,27 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
>         oe->has_upper = true;
>  }
>
> +bool ovl_test_metacopy_flag(struct inode *inode)
> +{
> +       /* Pairs with smp_wmb() in ovl_set_metacopy_flag() */
> +       smp_rmb();
> +       return ovl_test_flag(OVL_METACOPY, inode);
> +}
> +
> +void ovl_set_metacopy_flag(struct inode *inode)
> +{
> +       /* Pairs with smp_rmb() in ovl_test_metacopy_flag() */
> +       smp_wmb();
> +       ovl_set_flag(OVL_METACOPY, inode);
> +}
> +
> +void ovl_clear_metacopy_flag(struct inode *inode)
> +{
> +       /* Pairs with smp_rmb() in ovl_test_metacopy_flag() */
> +       smp_wmb();
> +       ovl_clear_flag(OVL_METACOPY, inode);
> +}
> +
>  bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
>         if (!S_ISREG(d_inode(dentry)->i_mode))
>                 return false;
> @@ -237,7 +258,7 @@ bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
>         if (!(OPEN_FMODE(flags) & FMODE_WRITE))
>                 return false;
>
> -       if (!ovl_test_flag(OVL_METACOPY, d_inode(dentry)))
> +       if (!ovl_test_metacopy_flag(d_inode(dentry)))
>                 return false;
>
>         return true;
> --
> 2.13.5
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-10 15:32 ` [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
@ 2017-10-11  1:36   ` Amir Goldstein
  2017-10-11 13:57     ` Vivek Goyal
  0 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-11  1:36 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> By default metadata only copy up is disabled. Provide a mount option so
> that users can choose one way or other.
>
> Also metadata copyup is conditional on index=on. If index=off and user
> specifies metacopy=on, it goes back to metacopy=off and a warning is
> printed.
>

So this commit does not explain why the features are dependent
And when I think about it again, they should not be dependent.

I had several arguments for having metacopy rely on index, but
You shot down the idea to index all regular files for performance
Of index cleanup on mount.

Now the only remaining reason is not setting same origin for
Broken upper hardlinks when index is off, the solution should
Be to not metacopy lower hardlinks when index is off.
IMO it's a minor trade-off for keeping the features independent.

> Also provide a kernel config and module option to enable/disable
> metacopy feature.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/Kconfig     |  9 +++++++++
>  fs/overlayfs/ovl_entry.h |  1 +
>  fs/overlayfs/super.c     | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 42 insertions(+)
>
> diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
> index cbfc196e5dc5..94d4c61719c8 100644
> --- a/fs/overlayfs/Kconfig
> +++ b/fs/overlayfs/Kconfig
> @@ -43,3 +43,12 @@ config OVERLAY_FS_INDEX
>           outcomes.  However, mounting the same overlay with an old kernel
>           read-write and then mounting it again with a new kernel, will have
>           unexpected results.
> +
> +config OVERLAY_FS_METACOPY
> +       bool "Overlayfs: turn on metadata only copy up feature by default"
> +       depends on OVERLAY_FS
> +       depends on OVERLAY_FS_INDEX
> +       help
> +         If this config option is enabled then overlay filesystems will
> +         copy up only metadata where appropriate and data copy up will
> +         happen when a file is opended for WRITE operation.
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 25d9b5adcd42..6806f0b0fbc2 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -15,6 +15,7 @@ struct ovl_config {
>         bool default_permissions;
>         bool redirect_dir;
>         bool index;
> +       bool metacopy;
>  };
>
>  /* private information held for overlayfs's superblock */
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 092d150643c1..6f4c32e49298 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -39,6 +39,11 @@ module_param_named(index, ovl_index_def, bool, 0644);
>  MODULE_PARM_DESC(ovl_index_def,
>                  "Default to on or off for the inodes index feature");
>
> +static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
> +module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
> +MODULE_PARM_DESC(ovl_metacopy_def,
> +                "Default to on or off for the metadata only copy up feature");
> +
>  static void ovl_dentry_release(struct dentry *dentry)
>  {
>         struct ovl_entry *oe = dentry->d_fsdata;
> @@ -303,6 +308,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
>         if (ufs->config.index != ovl_index_def)
>                 seq_printf(m, ",index=%s",
>                            ufs->config.index ? "on" : "off");
> +       if (ufs->config.metacopy != ovl_metacopy_def)
> +               seq_printf(m, ",metacopy=%s",
> +                          ufs->config.metacopy ? "on" : "off");
>         return 0;
>  }
>
> @@ -336,6 +344,8 @@ enum {
>         OPT_REDIRECT_DIR_OFF,
>         OPT_INDEX_ON,
>         OPT_INDEX_OFF,
> +       OPT_METACOPY_ON,
> +       OPT_METACOPY_OFF,
>         OPT_ERR,
>  };
>
> @@ -348,6 +358,8 @@ static const match_table_t ovl_tokens = {
>         {OPT_REDIRECT_DIR_OFF,          "redirect_dir=off"},
>         {OPT_INDEX_ON,                  "index=on"},
>         {OPT_INDEX_OFF,                 "index=off"},
> +       {OPT_METACOPY_ON,               "metacopy=on"},
> +       {OPT_METACOPY_OFF,              "metacopy=off"},
>         {OPT_ERR,                       NULL}
>  };
>
> @@ -428,6 +440,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
>                         config->index = false;
>                         break;
>
> +               case OPT_METACOPY_ON:
> +                       config->metacopy = true;
> +                       break;
> +
> +               case OPT_METACOPY_OFF:
> +                       config->metacopy = false;
> +                       break;
> +
>                 default:
>                         pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
>                         return -EINVAL;
> @@ -847,6 +867,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>
>         ufs->config.redirect_dir = ovl_redirect_dir_def;
>         ufs->config.index = ovl_index_def;
> +       if (ovl_metacopy_def && !ovl_index_def) {
> +               pr_warn("overlayfs: metadata copy up can not be enabled by default as index feature is not enabled by default.\n");
> +               ovl_metacopy_def = false;
> +       }
> +       ufs->config.metacopy = ovl_metacopy_def;
> +
>         err = ovl_parse_opt((char *) data, &ufs->config);
>         if (err)
>                 goto out_free_config;
> @@ -1091,6 +1117,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>         if (!ufs->indexdir)
>                 ufs->config.index = false;
>
> +       /* As of now metacopy=on is dependent on index=on */
> +       if (ufs->config.metacopy && !ufs->config.index) {
> +               pr_warn("overlayfs: can not enable metadata only copyup with index=off. Falling back to metacopy=off\n");
> +               ufs->config.metacopy = false;
> +       }
> +
>         if (remote)
>                 sb->s_d_op = &ovl_reval_dentry_operations;
>         else
> --
> 2.13.5
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-11  1:36   ` Amir Goldstein
@ 2017-10-11 13:57     ` Vivek Goyal
  2017-10-11 16:29       ` Amir Goldstein
  0 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-11 13:57 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 04:36:46AM +0300, Amir Goldstein wrote:
> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > By default metadata only copy up is disabled. Provide a mount option so
> > that users can choose one way or other.
> >
> > Also metadata copyup is conditional on index=on. If index=off and user
> > specifies metacopy=on, it goes back to metacopy=off and a warning is
> > printed.
> >
> 
> So this commit does not explain why the features are dependent
> And when I think about it again, they should not be dependent.
> 
> I had several arguments for having metacopy rely on index, but
> You shot down the idea to index all regular files for performance
> Of index cleanup on mount.
> 
> Now the only remaining reason is not setting same origin for
> Broken upper hardlinks when index is off, the solution should
> Be to not metacopy lower hardlinks when index is off.
> IMO it's a minor trade-off for keeping the features independent.

Hi Amir,

Hey, you were the one who pushed me hard to make it dependent on
index. :-) I was more than happy to keep it as independent
functionality.

Anyway, I did not understand what's the problem with borken upper
hardlinks when index=off. If two upper hardlinks (broken), can 
point to same ORIGIN on lower, they will just copy up same data.

IOW, it does not seem to be more broken then what it is now just
becase of metadata copy up or because of both upper pointing to
same ORIGIN? What am I missing.

Vivek

> 
> > Also provide a kernel config and module option to enable/disable
> > metacopy feature.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/Kconfig     |  9 +++++++++
> >  fs/overlayfs/ovl_entry.h |  1 +
> >  fs/overlayfs/super.c     | 32 ++++++++++++++++++++++++++++++++
> >  3 files changed, 42 insertions(+)
> >
> > diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
> > index cbfc196e5dc5..94d4c61719c8 100644
> > --- a/fs/overlayfs/Kconfig
> > +++ b/fs/overlayfs/Kconfig
> > @@ -43,3 +43,12 @@ config OVERLAY_FS_INDEX
> >           outcomes.  However, mounting the same overlay with an old kernel
> >           read-write and then mounting it again with a new kernel, will have
> >           unexpected results.
> > +
> > +config OVERLAY_FS_METACOPY
> > +       bool "Overlayfs: turn on metadata only copy up feature by default"
> > +       depends on OVERLAY_FS
> > +       depends on OVERLAY_FS_INDEX
> > +       help
> > +         If this config option is enabled then overlay filesystems will
> > +         copy up only metadata where appropriate and data copy up will
> > +         happen when a file is opended for WRITE operation.
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index 25d9b5adcd42..6806f0b0fbc2 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -15,6 +15,7 @@ struct ovl_config {
> >         bool default_permissions;
> >         bool redirect_dir;
> >         bool index;
> > +       bool metacopy;
> >  };
> >
> >  /* private information held for overlayfs's superblock */
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 092d150643c1..6f4c32e49298 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -39,6 +39,11 @@ module_param_named(index, ovl_index_def, bool, 0644);
> >  MODULE_PARM_DESC(ovl_index_def,
> >                  "Default to on or off for the inodes index feature");
> >
> > +static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
> > +module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
> > +MODULE_PARM_DESC(ovl_metacopy_def,
> > +                "Default to on or off for the metadata only copy up feature");
> > +
> >  static void ovl_dentry_release(struct dentry *dentry)
> >  {
> >         struct ovl_entry *oe = dentry->d_fsdata;
> > @@ -303,6 +308,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
> >         if (ufs->config.index != ovl_index_def)
> >                 seq_printf(m, ",index=%s",
> >                            ufs->config.index ? "on" : "off");
> > +       if (ufs->config.metacopy != ovl_metacopy_def)
> > +               seq_printf(m, ",metacopy=%s",
> > +                          ufs->config.metacopy ? "on" : "off");
> >         return 0;
> >  }
> >
> > @@ -336,6 +344,8 @@ enum {
> >         OPT_REDIRECT_DIR_OFF,
> >         OPT_INDEX_ON,
> >         OPT_INDEX_OFF,
> > +       OPT_METACOPY_ON,
> > +       OPT_METACOPY_OFF,
> >         OPT_ERR,
> >  };
> >
> > @@ -348,6 +358,8 @@ static const match_table_t ovl_tokens = {
> >         {OPT_REDIRECT_DIR_OFF,          "redirect_dir=off"},
> >         {OPT_INDEX_ON,                  "index=on"},
> >         {OPT_INDEX_OFF,                 "index=off"},
> > +       {OPT_METACOPY_ON,               "metacopy=on"},
> > +       {OPT_METACOPY_OFF,              "metacopy=off"},
> >         {OPT_ERR,                       NULL}
> >  };
> >
> > @@ -428,6 +440,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
> >                         config->index = false;
> >                         break;
> >
> > +               case OPT_METACOPY_ON:
> > +                       config->metacopy = true;
> > +                       break;
> > +
> > +               case OPT_METACOPY_OFF:
> > +                       config->metacopy = false;
> > +                       break;
> > +
> >                 default:
> >                         pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
> >                         return -EINVAL;
> > @@ -847,6 +867,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >
> >         ufs->config.redirect_dir = ovl_redirect_dir_def;
> >         ufs->config.index = ovl_index_def;
> > +       if (ovl_metacopy_def && !ovl_index_def) {
> > +               pr_warn("overlayfs: metadata copy up can not be enabled by default as index feature is not enabled by default.\n");
> > +               ovl_metacopy_def = false;
> > +       }
> > +       ufs->config.metacopy = ovl_metacopy_def;
> > +
> >         err = ovl_parse_opt((char *) data, &ufs->config);
> >         if (err)
> >                 goto out_free_config;
> > @@ -1091,6 +1117,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >         if (!ufs->indexdir)
> >                 ufs->config.index = false;
> >
> > +       /* As of now metacopy=on is dependent on index=on */
> > +       if (ufs->config.metacopy && !ufs->config.index) {
> > +               pr_warn("overlayfs: can not enable metadata only copyup with index=off. Falling back to metacopy=off\n");
> > +               ufs->config.metacopy = false;
> > +       }
> > +
> >         if (remote)
> >                 sb->s_d_op = &ovl_reval_dentry_operations;
> >         else
> > --
> > 2.13.5
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-11 13:57     ` Vivek Goyal
@ 2017-10-11 16:29       ` Amir Goldstein
  2017-10-11 16:53         ` Vivek Goyal
  0 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-11 16:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 4:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Oct 11, 2017 at 04:36:46AM +0300, Amir Goldstein wrote:
>> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > By default metadata only copy up is disabled. Provide a mount option so
>> > that users can choose one way or other.
>> >
>> > Also metadata copyup is conditional on index=on. If index=off and user
>> > specifies metacopy=on, it goes back to metacopy=off and a warning is
>> > printed.
>> >
>>
>> So this commit does not explain why the features are dependent
>> And when I think about it again, they should not be dependent.
>>
>> I had several arguments for having metacopy rely on index, but
>> You shot down the idea to index all regular files for performance
>> Of index cleanup on mount.
>>
>> Now the only remaining reason is not setting same origin for
>> Broken upper hardlinks when index is off, the solution should
>> Be to not metacopy lower hardlinks when index is off.
>> IMO it's a minor trade-off for keeping the features independent.
>
> Hi Amir,
>
> Hey, you were the one who pushed me hard to make it dependent on
> index. :-) I was more than happy to keep it as independent
> functionality.

Yes I was. Just being honest about the options.
There are arguments both ways.
The big commonality is that both features are not friendly to copying
layers, but metacopy can survive copy upper layer and using same lower
layer.

>
> Anyway, I did not understand what's the problem with borken upper
> hardlinks when index=off. If two upper hardlinks (broken), can
> point to same ORIGIN on lower, they will just copy up same data.
>
> IOW, it does not seem to be more broken then what it is now just
> becase of metadata copy up or because of both upper pointing to
> same ORIGIN? What am I missing.
>
It is not broken now because we intentionally do NOT set origin when
copying up lower hardlinks and index is disabled.
You must not break this rule, so instead you can avoid metacopy for
lower hardlinks when index is disabled.
>
>>
>> > Also provide a kernel config and module option to enable/disable
>> > metacopy feature.
>> >
>> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> > ---
>> >  fs/overlayfs/Kconfig     |  9 +++++++++
>> >  fs/overlayfs/ovl_entry.h |  1 +
>> >  fs/overlayfs/super.c     | 32 ++++++++++++++++++++++++++++++++
>> >  3 files changed, 42 insertions(+)
>> >
>> > diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
>> > index cbfc196e5dc5..94d4c61719c8 100644
>> > --- a/fs/overlayfs/Kconfig
>> > +++ b/fs/overlayfs/Kconfig
>> > @@ -43,3 +43,12 @@ config OVERLAY_FS_INDEX
>> >           outcomes.  However, mounting the same overlay with an old kernel
>> >           read-write and then mounting it again with a new kernel, will have
>> >           unexpected results.
>> > +
>> > +config OVERLAY_FS_METACOPY
>> > +       bool "Overlayfs: turn on metadata only copy up feature by default"
>> > +       depends on OVERLAY_FS
>> > +       depends on OVERLAY_FS_INDEX
>> > +       help
>> > +         If this config option is enabled then overlay filesystems will
>> > +         copy up only metadata where appropriate and data copy up will
>> > +         happen when a file is opended for WRITE operation.
>> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
>> > index 25d9b5adcd42..6806f0b0fbc2 100644
>> > --- a/fs/overlayfs/ovl_entry.h
>> > +++ b/fs/overlayfs/ovl_entry.h
>> > @@ -15,6 +15,7 @@ struct ovl_config {
>> >         bool default_permissions;
>> >         bool redirect_dir;
>> >         bool index;
>> > +       bool metacopy;
>> >  };
>> >
>> >  /* private information held for overlayfs's superblock */
>> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> > index 092d150643c1..6f4c32e49298 100644
>> > --- a/fs/overlayfs/super.c
>> > +++ b/fs/overlayfs/super.c
>> > @@ -39,6 +39,11 @@ module_param_named(index, ovl_index_def, bool, 0644);
>> >  MODULE_PARM_DESC(ovl_index_def,
>> >                  "Default to on or off for the inodes index feature");
>> >
>> > +static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
>> > +module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
>> > +MODULE_PARM_DESC(ovl_metacopy_def,
>> > +                "Default to on or off for the metadata only copy up feature");
>> > +
>> >  static void ovl_dentry_release(struct dentry *dentry)
>> >  {
>> >         struct ovl_entry *oe = dentry->d_fsdata;
>> > @@ -303,6 +308,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
>> >         if (ufs->config.index != ovl_index_def)
>> >                 seq_printf(m, ",index=%s",
>> >                            ufs->config.index ? "on" : "off");
>> > +       if (ufs->config.metacopy != ovl_metacopy_def)
>> > +               seq_printf(m, ",metacopy=%s",
>> > +                          ufs->config.metacopy ? "on" : "off");
>> >         return 0;
>> >  }
>> >
>> > @@ -336,6 +344,8 @@ enum {
>> >         OPT_REDIRECT_DIR_OFF,
>> >         OPT_INDEX_ON,
>> >         OPT_INDEX_OFF,
>> > +       OPT_METACOPY_ON,
>> > +       OPT_METACOPY_OFF,
>> >         OPT_ERR,
>> >  };
>> >
>> > @@ -348,6 +358,8 @@ static const match_table_t ovl_tokens = {
>> >         {OPT_REDIRECT_DIR_OFF,          "redirect_dir=off"},
>> >         {OPT_INDEX_ON,                  "index=on"},
>> >         {OPT_INDEX_OFF,                 "index=off"},
>> > +       {OPT_METACOPY_ON,               "metacopy=on"},
>> > +       {OPT_METACOPY_OFF,              "metacopy=off"},
>> >         {OPT_ERR,                       NULL}
>> >  };
>> >
>> > @@ -428,6 +440,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
>> >                         config->index = false;
>> >                         break;
>> >
>> > +               case OPT_METACOPY_ON:
>> > +                       config->metacopy = true;
>> > +                       break;
>> > +
>> > +               case OPT_METACOPY_OFF:
>> > +                       config->metacopy = false;
>> > +                       break;
>> > +
>> >                 default:
>> >                         pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
>> >                         return -EINVAL;
>> > @@ -847,6 +867,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>> >
>> >         ufs->config.redirect_dir = ovl_redirect_dir_def;
>> >         ufs->config.index = ovl_index_def;
>> > +       if (ovl_metacopy_def && !ovl_index_def) {
>> > +               pr_warn("overlayfs: metadata copy up can not be enabled by default as index feature is not enabled by default.\n");
>> > +               ovl_metacopy_def = false;
>> > +       }
>> > +       ufs->config.metacopy = ovl_metacopy_def;
>> > +
>> >         err = ovl_parse_opt((char *) data, &ufs->config);
>> >         if (err)
>> >                 goto out_free_config;
>> > @@ -1091,6 +1117,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>> >         if (!ufs->indexdir)
>> >                 ufs->config.index = false;
>> >
>> > +       /* As of now metacopy=on is dependent on index=on */
>> > +       if (ufs->config.metacopy && !ufs->config.index) {
>> > +               pr_warn("overlayfs: can not enable metadata only copyup with index=off. Falling back to metacopy=off\n");
>> > +               ufs->config.metacopy = false;
>> > +       }
>> > +
>> >         if (remote)
>> >                 sb->s_d_op = &ovl_reval_dentry_operations;
>> >         else
>> > --
>> > 2.13.5
>> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-11 16:29       ` Amir Goldstein
@ 2017-10-11 16:53         ` Vivek Goyal
  2017-10-11 17:36           ` Amir Goldstein
  0 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-11 16:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 07:29:38PM +0300, Amir Goldstein wrote:

[..]
> >
> > Anyway, I did not understand what's the problem with borken upper
> > hardlinks when index=off. If two upper hardlinks (broken), can
> > point to same ORIGIN on lower, they will just copy up same data.
> >
> > IOW, it does not seem to be more broken then what it is now just
> > becase of metadata copy up or because of both upper pointing to
> > same ORIGIN? What am I missing.
> >
> It is not broken now because we intentionally do NOT set origin when
> copying up lower hardlinks and index is disabled.
> You must not break this rule, so instead you can avoid metacopy for
> lower hardlinks when index is disabled.

Hi Amir,

Ok, I am open to change it so that if index=off, lower hardlinks are not
copied up metadata only.

But please help me understand that what *additional* thing breaks if origin
is set when index=off. 

Say lower has a file foo.txt and another hard link foo-link.txt.
(say index=off, metacopy=off).

Say, I open foo-link.txt for WRITE, and it gets copied up. Now there is
no link between foo-link.txt and foo.txt and if a user opens foo.txt for
READ, it does not see updates to foo-link.txt. So this is the broken
behavior of hardlinks with index=off, as of today.

Now I go back to original state and this time do same operation with
(indxex=off, metacopy=on). Then "chown foo-link.txt" will only copy
metadata and set origin to lower file. Say I also chown "foo.txt", that
will do the same thing. Now both foo.txt and foo-link.txt will have
ORIGIN pointing to same lower inode. If any of the file is opened for
WRITE, data will be copied up from same origin.

So setting same ORIGIN on two upper files, does not seem to break anything
additional, AFAICS. What am I missing.

Vivek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-11 16:53         ` Vivek Goyal
@ 2017-10-11 17:36           ` Amir Goldstein
  2017-10-11 18:34             ` Vivek Goyal
  0 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-11 17:36 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 7:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Oct 11, 2017 at 07:29:38PM +0300, Amir Goldstein wrote:
>
> [..]
>> >
>> > Anyway, I did not understand what's the problem with borken upper
>> > hardlinks when index=off. If two upper hardlinks (broken), can
>> > point to same ORIGIN on lower, they will just copy up same data.
>> >
>> > IOW, it does not seem to be more broken then what it is now just
>> > becase of metadata copy up or because of both upper pointing to
>> > same ORIGIN? What am I missing.
>> >
>> It is not broken now because we intentionally do NOT set origin when
>> copying up lower hardlinks and index is disabled.
>> You must not break this rule, so instead you can avoid metacopy for
>> lower hardlinks when index is disabled.
>
> Hi Amir,
>
> Ok, I am open to change it so that if index=off, lower hardlinks are not
> copied up metadata only.
>
> But please help me understand that what *additional* thing breaks if origin
> is set when index=off.
>
> Say lower has a file foo.txt and another hard link foo-link.txt.
> (say index=off, metacopy=off).
>
> Say, I open foo-link.txt for WRITE, and it gets copied up. Now there is
> no link between foo-link.txt and foo.txt and if a user opens foo.txt for
> READ, it does not see updates to foo-link.txt. So this is the broken
> behavior of hardlinks with index=off, as of today.
>
> Now I go back to original state and this time do same operation with
> (indxex=off, metacopy=on). Then "chown foo-link.txt" will only copy
> metadata and set origin to lower file. Say I also chown "foo.txt", that
> will do the same thing. Now both foo.txt and foo-link.txt will have
> ORIGIN pointing to same lower inode. If any of the file is opened for
> WRITE, data will be copied up from same origin.
>
> So setting same ORIGIN on two upper files, does not seem to break anything
> additional, AFAICS. What am I missing.
>

Sorry for brevity of my response earlier. I was hopping on a plane.
The problem is a bit convoluted to explain, but here goes.

ORIGIN was introduced to implement constant st_ino across copy up.
stat(2) of an overlay file with ORIGIN (both layer in same fs) returns
st_ino of origin inode.

When copy up breaks lower hardlink to *different* upper inodes,
then those broken aliases MUST NOT return the same st_ino,
because they are different files with different data.

Only when index is enabled and copy up does not break lower
hardlink, it is legal to set ORIGIN for every upper alias, because
all upper aliases can and should return same st_ino (the origin st_ino)
from stat(2).

So that is the simplest way to explain why you MUST NOT set ORIGIN
on copy up of lower hardlink when index is disabled and therefore, you
cannot use ORIGIN for metacopy.

As I wrote in some email, you could set METAORIGIN xattr in this case
just for the use of metacopy, but I rather not have this special case.

Beyond the problem stated above with st_ino of broken hardlinks,
if broken hardlinks have the same ORIGIN, this also breaks indexing
assumptions and can lead to EIO on lookup, but no need to get into that.

Hope this is clear now,
Cheers,
Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-11 17:36           ` Amir Goldstein
@ 2017-10-11 18:34             ` Vivek Goyal
  2017-10-11 20:29               ` Amir Goldstein
  0 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-11 18:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 08:36:03PM +0300, Amir Goldstein wrote:
> On Wed, Oct 11, 2017 at 7:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Oct 11, 2017 at 07:29:38PM +0300, Amir Goldstein wrote:
> >
> > [..]
> >> >
> >> > Anyway, I did not understand what's the problem with borken upper
> >> > hardlinks when index=off. If two upper hardlinks (broken), can
> >> > point to same ORIGIN on lower, they will just copy up same data.
> >> >
> >> > IOW, it does not seem to be more broken then what it is now just
> >> > becase of metadata copy up or because of both upper pointing to
> >> > same ORIGIN? What am I missing.
> >> >
> >> It is not broken now because we intentionally do NOT set origin when
> >> copying up lower hardlinks and index is disabled.
> >> You must not break this rule, so instead you can avoid metacopy for
> >> lower hardlinks when index is disabled.
> >
> > Hi Amir,
> >
> > Ok, I am open to change it so that if index=off, lower hardlinks are not
> > copied up metadata only.
> >
> > But please help me understand that what *additional* thing breaks if origin
> > is set when index=off.
> >
> > Say lower has a file foo.txt and another hard link foo-link.txt.
> > (say index=off, metacopy=off).
> >
> > Say, I open foo-link.txt for WRITE, and it gets copied up. Now there is
> > no link between foo-link.txt and foo.txt and if a user opens foo.txt for
> > READ, it does not see updates to foo-link.txt. So this is the broken
> > behavior of hardlinks with index=off, as of today.
> >
> > Now I go back to original state and this time do same operation with
> > (indxex=off, metacopy=on). Then "chown foo-link.txt" will only copy
> > metadata and set origin to lower file. Say I also chown "foo.txt", that
> > will do the same thing. Now both foo.txt and foo-link.txt will have
> > ORIGIN pointing to same lower inode. If any of the file is opened for
> > WRITE, data will be copied up from same origin.
> >
> > So setting same ORIGIN on two upper files, does not seem to break anything
> > additional, AFAICS. What am I missing.
> >
> 
> Sorry for brevity of my response earlier. I was hopping on a plane.
> The problem is a bit convoluted to explain, but here goes.
> 
> ORIGIN was introduced to implement constant st_ino across copy up.
> stat(2) of an overlay file with ORIGIN (both layer in same fs) returns
> st_ino of origin inode.
> 
> When copy up breaks lower hardlink to *different* upper inodes,
> then those broken aliases MUST NOT return the same st_ino,
> because they are different files with different data.

Hi Amir,

I thought current code already takes care of this situation.

ovl_getattr() {
          /*
            * Lower hardlinks may be broken on copy up to different
            * upper files, so we cannot use the lower origin st_ino
            * for those different files, even for the same fs case.
            * With inodes index enabled, it is safe to use st_ino
            * of an indexed hardlinked origin. The index validates
            * that the upper hardlink is not broken.
            */
           if (is_dir || lowerstat.nlink == 1 ||
               ovl_test_flag(OVL_INDEX, d_inode(dentry)))
                   stat->ino = lowerstat.ino;
}

So if lower had nlink > 1, then its inode number will not be used if
indexing was not enabled. If that's the case, two upper pointing to
same origin should not be a problem?

I even tested it. Wrote a patch to force setting origin even on hard
links with index=off.

Before copy up, both the files (testfile.txt and link-file.txt) had
same inode numbers. The moment file is copied up it uses its real
inode number on upper (and not the one retrieved from lower). IOW,
origin inode number is discarded if index=off. If that's the case
then setting ORIGIN with broken hard link should be just fine. It
can still be used for metadata copyup while it can't be used with
index=off.

In fact this sounds better to me. Make use of ORIGIN information only if
it is safe. With broken hardlinks it is not safe to make use of this
information so ignore ORIGIN. But it is still ok to make use of this
information from metadata copyup point of view.

> 
> Only when index is enabled and copy up does not break lower
> hardlink, it is legal to set ORIGIN for every upper alias, because
> all upper aliases can and should return same st_ino (the origin st_ino)
> from stat(2).
> 
> So that is the simplest way to explain why you MUST NOT set ORIGIN
> on copy up of lower hardlink when index is disabled and therefore, you
> cannot use ORIGIN for metacopy.
> 
> As I wrote in some email, you could set METAORIGIN xattr in this case
> just for the use of metacopy, but I rather not have this special case.
> 
> Beyond the problem stated above with st_ino of broken hardlinks,
> if broken hardlinks have the same ORIGIN, this also breaks indexing
> assumptions and can lead to EIO on lookup, but no need to get into that.

I guess you are referring to ovl_lookup_index().

ovl_lookup_index() {
	index = lookup_one_len_unlocked(name.name, ofs->indexdir, name.len);
        if (d_is_negative(index)) {
                if (upper && d_inode(origin)->i_nlink > 1) {
                        pr_warn_ratelimited("overlayfs: hard link with origin but no index (ino=%lu).\n",
                                            d_inode(origin)->i_ino);
                        goto fail;
                }

	}
}

So this is failing if we find a dentry which has origin but no index. And
this will hit if a overlay was mounted with index=off, hard link copied up
and then remounted with index=on. In that case it will return -EIO.

My question is, that does this have to be an error. If we are supporting
this use case, where index can be turned on later, then can we just warn
and continue? Or set OVL_XATTR_INDEX on upper to indicate that this
upper should have an index.

I mean ORIGIN started with the exclusive purpose of inode number stability.
But this is sort of infrastructure which keeps track of ORIGIN of copy
up source and can be used for metadata copy up as well. So indexing should
not put restrictions on what files ORIGIN can be set on. Instead both
metacopy and index should not make use of ORIGIN where they this things
are broken for them.

Thoughts?

Vivek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file
  2017-10-10 17:03   ` Amir Goldstein
@ 2017-10-11 20:16     ` Vivek Goyal
  2017-10-11 20:44       ` Amir Goldstein
  0 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-11 20:16 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Tue, Oct 10, 2017 at 08:03:05PM +0300, Amir Goldstein wrote:

[..]
> > @@ -511,6 +564,10 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
> >                 goto out_cleanup;
> >
> >         ovl_inode_update(d_inode(c->dentry), newdentry);
> > +       if (c->metadata_only) {
> > +               smp_wmb();
> 
> This wmb does not address the only problem imo.
> The problem is you must not allow inode to appear to have
> An upper dentry before you made sure metacopy flag is visible.
> So first you need to set metacopy flag before updating upper.
> Then you also need to test them is correct order with rmb.

Ok, so conceptually it makse sense to me that OVL_METACOPY flag should
be set before upper dentry is made visible.

So ovl_set_flag(OVL_METACOPY) should move before ovl_inode_update(). 
ovl_inode_update() already has a smp_wmb(). So write side is probably
fine.

But read side, ovl_upperdentry_dereference() only has data dependency
barrier and not smp_rmb(). (lockless_dereference(oi->__upperdentry)). I
am not sure if that's sufficient for ovl_inode->flags read or not. May
be we need to replace data dependency barrier with smp_rmb() in in
ovl_upperdentry_dereference()?

Even if we do this, I think we need barriers around setting/resetting
of OVL_METACOPY for the reset case. I will explain this in other patch.

Vivek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-10 17:12   ` Amir Goldstein
@ 2017-10-11 20:27     ` Vivek Goyal
  2017-10-11 21:08       ` Amir Goldstein
  0 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-11 20:27 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Tue, Oct 10, 2017 at 08:12:00PM +0300, Amir Goldstein wrote:
> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > We can access and check metacopy flag outside ovl_inode->lock. IOW, say
> > a file was meta copied up in the past and it has metacopy flag set. Now
> > a data copy up operation in progress. Say another thread reads state of
> > this flag and if flag clearing is visible before file has been fully
> > copied up, that can cause problems.
> >
> >         CPU1                            CPU2
> > ovl_copy_up_flags()                     acquire(oi->lock)
> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data_inode()
> >    ovl_test_metacopy_flag()                 ovl_copy_up_data()
> >                                             ovl_clear_metacopy_flag()
> >                                         release(oi->lock)
> >
> > Say CPU2 is copying up data and in the end clears metacopy flag. But if
> > CPU1 perceives clearing of metacopy before actual data copy up operation
> > being finished, that can become a problem. We want this clearing of flag
> > to be visible only if all previous write operations have finished.
> >
> > Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
> > and smp_rmb() on metacopy flag test operation.
> >
> > May be some other lock or barrier is already covering it. But I am not sure
> > what that is and is it obvious enough that we will not break it in future.
> >
> > So hence trying to be safe here and introducing barriers explicitly for
> > metacopy bit.
> 
> Please revisit your changes after addressing my comment on patch 5.

Assume, we added smp_rmb() in ovl_upperdentry_dereference(), to make
sure by the time upperdentry is visible, OVL_METACOPY is visible too.

> IIUC the flow you describe above should be addressed by testing
> metacopy flag again under ovl inode lock.
> 
> The only subtle part imo is making metacopy flag visible
> Before making upper dentry visible.
> Clearing metacopy flag should not be a problem imo,
> As it comes after fsync, but didn't look closely.

Say a file has been metadata copied up only. So upper is visible and
OVL_METACOPY flag is set.

Now, what happens if two paths try to data copy up the file. Say first
path gets the oi->lock and starts doing data copy. Second path enters
ovl_copy_up_flags() and checks if data copy up is required or not. If
OVL_METACOPY is still set, then second process will block on oi->lock
and once first process exits will check OVL_METACOPY again under lock
and bail out. So that is not a problem.

What about the other case. That is OVL_METACOPY gets cleared before
data copy operation is complete. If I don't introduce smp_wmb()/smp_rmb()
around resetting of OVL_METACOPY, what will make sure that
ovl_copy_up_flags() will see OVL_METACOPY=0 only after data copy up is
complete. Of course we will not like to return to caller of
ovl_copy_up_flags() while data copy up is still going on on a different
cpu.

Adding a smp_rmb()/smp_wmb() pair around OVL_METACOPY should solve the
above and that's what this patch is about.

(I noticed that there is a bug in current patch. I should first LOAD
 OVL_METACOPY state, then do smp_rmb() and then return to caller with
 loaded state.).

Vivek


> 
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/copy_up.c   |  9 +++------
> >  fs/overlayfs/overlayfs.h |  3 +++
> >  fs/overlayfs/util.c      | 23 ++++++++++++++++++++++-
> >  3 files changed, 28 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 682852a78163..b10269b80803 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -465,8 +465,7 @@ static int ovl_copy_up_data_inode(struct ovl_copy_up_ctx *c)
> >         if (err)
> >                 return err;
> >
> > -       smp_wmb();
> > -       ovl_clear_flag(OVL_METACOPY, d_inode(c->dentry));
> > +       ovl_clear_metacopy_flag(d_inode(c->dentry));
> >         return err;
> >  }
> >
> > @@ -564,10 +563,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
> >                 goto out_cleanup;
> >
> >         ovl_inode_update(d_inode(c->dentry), newdentry);
> > -       if (c->metadata_only) {
> > -               smp_wmb();
> > -               ovl_set_flag(OVL_METACOPY, d_inode(c->dentry));
> > -       }
> > +       if (c->metadata_only)
> > +               ovl_set_metacopy_flag(d_inode(c->dentry));
> >  out:
> >         dput(temp);
> >         return err;
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 773f5ad75729..df3f513de3cc 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -234,6 +234,9 @@ int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry);
> >  void ovl_set_flag(unsigned long flag, struct inode *inode);
> >  void ovl_clear_flag(unsigned long flag, struct inode *inode);
> >  bool ovl_test_flag(unsigned long flag, struct inode *inode);
> > +void ovl_set_metacopy_flag(struct inode *inode);
> > +void ovl_clear_metacopy_flag(struct inode *inode);
> > +bool ovl_test_metacopy_flag(struct inode *inode);
> >  bool ovl_inuse_trylock(struct dentry *dentry);
> >  void ovl_inuse_unlock(struct dentry *dentry);
> >  int ovl_nlink_start(struct dentry *dentry, bool *locked);
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 91c542d1a39d..000f78b92a72 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -227,6 +227,27 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
> >         oe->has_upper = true;
> >  }
> >
> > +bool ovl_test_metacopy_flag(struct inode *inode)
> > +{
> > +       /* Pairs with smp_wmb() in ovl_set_metacopy_flag() */
> > +       smp_rmb();
> > +       return ovl_test_flag(OVL_METACOPY, inode);
> > +}
> > +
> > +void ovl_set_metacopy_flag(struct inode *inode)
> > +{
> > +       /* Pairs with smp_rmb() in ovl_test_metacopy_flag() */
> > +       smp_wmb();
> > +       ovl_set_flag(OVL_METACOPY, inode);
> > +}
> > +
> > +void ovl_clear_metacopy_flag(struct inode *inode)
> > +{
> > +       /* Pairs with smp_rmb() in ovl_test_metacopy_flag() */
> > +       smp_wmb();
> > +       ovl_clear_flag(OVL_METACOPY, inode);
> > +}
> > +
> >  bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> >         if (!S_ISREG(d_inode(dentry)->i_mode))
> >                 return false;
> > @@ -237,7 +258,7 @@ bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> >         if (!(OPEN_FMODE(flags) & FMODE_WRITE))
> >                 return false;
> >
> > -       if (!ovl_test_flag(OVL_METACOPY, d_inode(dentry)))
> > +       if (!ovl_test_metacopy_flag(d_inode(dentry)))
> >                 return false;
> >
> >         return true;
> > --
> > 2.13.5
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-11 18:34             ` Vivek Goyal
@ 2017-10-11 20:29               ` Amir Goldstein
  2017-10-12 13:23                 ` Vivek Goyal
  0 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-11 20:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 9:34 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Oct 11, 2017 at 08:36:03PM +0300, Amir Goldstein wrote:
>> On Wed, Oct 11, 2017 at 7:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Wed, Oct 11, 2017 at 07:29:38PM +0300, Amir Goldstein wrote:
>> >
>> > [..]
>> >> >
>> >> > Anyway, I did not understand what's the problem with borken upper
>> >> > hardlinks when index=off. If two upper hardlinks (broken), can
>> >> > point to same ORIGIN on lower, they will just copy up same data.
>> >> >
>> >> > IOW, it does not seem to be more broken then what it is now just
>> >> > becase of metadata copy up or because of both upper pointing to
>> >> > same ORIGIN? What am I missing.
>> >> >
>> >> It is not broken now because we intentionally do NOT set origin when
>> >> copying up lower hardlinks and index is disabled.
>> >> You must not break this rule, so instead you can avoid metacopy for
>> >> lower hardlinks when index is disabled.
>> >
>> > Hi Amir,
>> >
>> > Ok, I am open to change it so that if index=off, lower hardlinks are not
>> > copied up metadata only.
>> >
>> > But please help me understand that what *additional* thing breaks if origin
>> > is set when index=off.
>> >
>> > Say lower has a file foo.txt and another hard link foo-link.txt.
>> > (say index=off, metacopy=off).
>> >
>> > Say, I open foo-link.txt for WRITE, and it gets copied up. Now there is
>> > no link between foo-link.txt and foo.txt and if a user opens foo.txt for
>> > READ, it does not see updates to foo-link.txt. So this is the broken
>> > behavior of hardlinks with index=off, as of today.
>> >
>> > Now I go back to original state and this time do same operation with
>> > (indxex=off, metacopy=on). Then "chown foo-link.txt" will only copy
>> > metadata and set origin to lower file. Say I also chown "foo.txt", that
>> > will do the same thing. Now both foo.txt and foo-link.txt will have
>> > ORIGIN pointing to same lower inode. If any of the file is opened for
>> > WRITE, data will be copied up from same origin.
>> >
>> > So setting same ORIGIN on two upper files, does not seem to break anything
>> > additional, AFAICS. What am I missing.
>> >
>>
>> Sorry for brevity of my response earlier. I was hopping on a plane.
>> The problem is a bit convoluted to explain, but here goes.
>>
>> ORIGIN was introduced to implement constant st_ino across copy up.
>> stat(2) of an overlay file with ORIGIN (both layer in same fs) returns
>> st_ino of origin inode.
>>
>> When copy up breaks lower hardlink to *different* upper inodes,
>> then those broken aliases MUST NOT return the same st_ino,
>> because they are different files with different data.
>
> Hi Amir,
>
> I thought current code already takes care of this situation.
>
> ovl_getattr() {
>           /*
>             * Lower hardlinks may be broken on copy up to different
>             * upper files, so we cannot use the lower origin st_ino
>             * for those different files, even for the same fs case.
>             * With inodes index enabled, it is safe to use st_ino
>             * of an indexed hardlinked origin. The index validates
>             * that the upper hardlink is not broken.
>             */
>            if (is_dir || lowerstat.nlink == 1 ||
>                ovl_test_flag(OVL_INDEX, d_inode(dentry)))
>                    stat->ino = lowerstat.ino;
> }
>
> So if lower had nlink > 1, then its inode number will not be used if
> indexing was not enabled. If that's the case, two upper pointing to
> same origin should not be a problem?
>
> I even tested it. Wrote a patch to force setting origin even on hard
> links with index=off.
>
> Before copy up, both the files (testfile.txt and link-file.txt) had
> same inode numbers. The moment file is copied up it uses its real
> inode number on upper (and not the one retrieved from lower). IOW,
> origin inode number is discarded if index=off. If that's the case
> then setting ORIGIN with broken hard link should be just fine. It
> can still be used for metadata copyup while it can't be used with
> index=off.
>
> In fact this sounds better to me. Make use of ORIGIN information only if
> it is safe. With broken hardlinks it is not safe to make use of this
> information so ignore ORIGIN. But it is still ok to make use of this
> information from metadata copyup point of view.
>

Yes, all this is true. I was trying to avoid the full explanation regrading
index inconsistency, but I ended up giving an incomplete story.

>>
>> Only when index is enabled and copy up does not break lower
>> hardlink, it is legal to set ORIGIN for every upper alias, because
>> all upper aliases can and should return same st_ino (the origin st_ino)
>> from stat(2).
>>
>> So that is the simplest way to explain why you MUST NOT set ORIGIN
>> on copy up of lower hardlink when index is disabled and therefore, you
>> cannot use ORIGIN for metacopy.
>>
>> As I wrote in some email, you could set METAORIGIN xattr in this case
>> just for the use of metacopy, but I rather not have this special case.
>>
>> Beyond the problem stated above with st_ino of broken hardlinks,
>> if broken hardlinks have the same ORIGIN, this also breaks indexing
>> assumptions and can lead to EIO on lookup, but no need to get into that.
>
> I guess you are referring to ovl_lookup_index().
>
>  () {
>         index = lookup_one_len_unlocked(name.name, ofs->indexdir, name.len);
>         if (d_is_negative(index)) {
>                 if (upper && d_inode(origin)->i_nlink > 1) {
>                         pr_warn_ratelimited("overlayfs: hard link with origin but no index (ino=%lu).\n",
>                                             d_inode(origin)->i_ino);
>                         goto fail;
>                 }
>
>         }
> }
>
> So this is failing if we find a dentry which has origin but no index. And
> this will hit if a overlay was mounted with index=off, hard link copied up
> and then remounted with index=on. In that case it will return -EIO.
>
> My question is, that does this have to be an error. If we are supporting
> this use case, where index can be turned on later, then can we just warn
> and continue? Or set OVL_XATTR_INDEX on upper to indicate that this
> upper should have an index.
>
> I mean ORIGIN started with the exclusive purpose of inode number stability.
> But this is sort of infrastructure which keeps track of ORIGIN of copy
> up source and can be used for metadata copy up as well. So indexing should
> not put restrictions on what files ORIGIN can be set on. Instead both
> metacopy and index should not make use of ORIGIN where they this things
> are broken for them.
>
> Thoughts?
>

It is interesting to note that the commit that introduced this limitation
fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink")
was merged as a "last minute" (rc7) fix patch before final v4.12
and had this text in commit message:
"We can relax this in the future when we are able to index upper object by
    origin."

I am not sure it is easy to relax this limitation, but I may be wrong.
I'll take another swing at writing the full story. Hope I will get it
right this time...

Index is a 1-to-1 mapping from origin *inode* to upper *inode*.
Highlighting *inode* because there may be many lower or upper
aliases of that inode.

The index key (its name) is the file handle of origin inode.
The index itself is an upper alias (link) of upper inode.

If more than 1 upper inode point to the same origin inode,
then index cannot be consistent for both upper aliases.
This would actually be a common scenario if hardlinks are broken
due to index=off (or kernel v4.12 without the rc7 fix)
and then index is turned on and more lower aliases are copied up.

On lookup, we can detected that index found by origin
file handle is not a hardlink of upper inode and ignore it instead of
returning EIO on:
        if (upper && d_inode(upper) != inode) {
                pr_warn_ratelimited("overlayfs: wrong index found
(index=%pd2, ino=%lu, upper ino=%lu).\n",
                                    index, inode->i_ino, d_inode(upper)->i_ino);
                goto fail;
        }

And we could have allowed for "hard link with origin but no index"
instead of returning EIO.

In those cases, we would need to get a hashed overlay inode by address
of upper inode, instead of by address of origin inode and we would have to
not set the OVL_INDEX flag on lookup.

I guess one of the reasons we did not do it on first version of index feature
is that we wanted to keep things simple and we did not have a good enough
reason to set ORIGIN for broken hardlinks.

So I am now open to the possibility that we may be able to set ORIGIN
for broken hardlinks without breaking anything, but will need to see patches
before I can say if we missed something. Maybe Miklos can see something
that I have missed.

In any case, for the sake of simplicity, I wouldn't bother doing metacopy up
of lower hardlinks in first version of metacopy feature.

Thanks for insisting on a good answer (not sure I have provided one yet)
Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file
  2017-10-11 20:16     ` Vivek Goyal
@ 2017-10-11 20:44       ` Amir Goldstein
  0 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-10-11 20:44 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 11:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Oct 10, 2017 at 08:03:05PM +0300, Amir Goldstein wrote:
>
> [..]
>> > @@ -511,6 +564,10 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
>> >                 goto out_cleanup;
>> >
>> >         ovl_inode_update(d_inode(c->dentry), newdentry);
>> > +       if (c->metadata_only) {
>> > +               smp_wmb();
>>
>> This wmb does not address the only problem imo.
>> The problem is you must not allow inode to appear to have
>> An upper dentry before you made sure metacopy flag is visible.
>> So first you need to set metacopy flag before updating upper.
>> Then you also need to test them is correct order with rmb.
>
> Ok, so conceptually it makse sense to me that OVL_METACOPY flag should
> be set before upper dentry is made visible.
>
> So ovl_set_flag(OVL_METACOPY) should move before ovl_inode_update().
> ovl_inode_update() already has a smp_wmb(). So write side is probably
> fine.
>
> But read side, ovl_upperdentry_dereference() only has data dependency
> barrier and not smp_rmb(). (lockless_dereference(oi->__upperdentry)). I
> am not sure if that's sufficient for ovl_inode->flags read or not. May
> be we need to replace data dependency barrier with smp_rmb() in in
> ovl_upperdentry_dereference()?

I think we do need smp_rmb() between ovl_upperdentry_dereference()
and testing OVL_METACOPY, but not inside ovl_upperdentry_dereference()
somewhere in d_real() path maybe in ovl_open_need_copy_up()

>
> Even if we do this, I think we need barriers around setting/resetting
> of OVL_METACOPY for the reset case. I will explain this in other patch.
>

I think the spinlock in atomic bit ops of set_bit/clear_bit is got you covered
w.r.t. memory barriers. Only need to worry about lockless test_bit() IMO.

Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-11 20:27     ` Vivek Goyal
@ 2017-10-11 21:08       ` Amir Goldstein
  2017-10-13 18:27         ` Vivek Goyal
  0 siblings, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-11 21:08 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 11:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Oct 10, 2017 at 08:12:00PM +0300, Amir Goldstein wrote:
>> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > We can access and check metacopy flag outside ovl_inode->lock. IOW, say
>> > a file was meta copied up in the past and it has metacopy flag set. Now
>> > a data copy up operation in progress. Say another thread reads state of
>> > this flag and if flag clearing is visible before file has been fully
>> > copied up, that can cause problems.
>> >
>> >         CPU1                            CPU2
>> > ovl_copy_up_flags()                     acquire(oi->lock)
>> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data_inode()
>> >    ovl_test_metacopy_flag()                 ovl_copy_up_data()
>> >                                             ovl_clear_metacopy_flag()
>> >                                         release(oi->lock)
>> >
>> > Say CPU2 is copying up data and in the end clears metacopy flag. But if
>> > CPU1 perceives clearing of metacopy before actual data copy up operation
>> > being finished, that can become a problem. We want this clearing of flag
>> > to be visible only if all previous write operations have finished.
>> >
>> > Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
>> > and smp_rmb() on metacopy flag test operation.
>> >
>> > May be some other lock or barrier is already covering it. But I am not sure
>> > what that is and is it obvious enough that we will not break it in future.
>> >
>> > So hence trying to be safe here and introducing barriers explicitly for
>> > metacopy bit.
>>
>> Please revisit your changes after addressing my comment on patch 5.
>
> Assume, we added smp_rmb() in ovl_upperdentry_dereference(), to make
> sure by the time upperdentry is visible, OVL_METACOPY is visible too.
>
>> IIUC the flow you describe above should be addressed by testing
>> metacopy flag again under ovl inode lock.
>>
>> The only subtle part imo is making metacopy flag visible
>> Before making upper dentry visible.
>> Clearing metacopy flag should not be a problem imo,
>> As it comes after fsync, but didn't look closely.
>
> Say a file has been metadata copied up only. So upper is visible and
> OVL_METACOPY flag is set.
>
> Now, what happens if two paths try to data copy up the file. Say first
> path gets the oi->lock and starts doing data copy. Second path enters
> ovl_copy_up_flags() and checks if data copy up is required or not. If
> OVL_METACOPY is still set, then second process will block on oi->lock
> and once first process exits will check OVL_METACOPY again under lock
> and bail out. So that is not a problem.
>
> What about the other case. That is OVL_METACOPY gets cleared before
> data copy operation is complete. If I don't introduce smp_wmb()/smp_rmb()
> around resetting of OVL_METACOPY, what will make sure that
> ovl_copy_up_flags() will see OVL_METACOPY=0 only after data copy up is
> complete. Of course we will not like to return to caller of
> ovl_copy_up_flags() while data copy up is still going on on a different
> cpu.
>

vfs_removexattr() between data copy up and clearing OVL_METACOPY
takes inode lock.
IIUC, any lock/unlock has a full memory barrier.

Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-11 20:29               ` Amir Goldstein
@ 2017-10-12 13:23                 ` Vivek Goyal
  2017-10-12 13:39                   ` Amir Goldstein
  0 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-12 13:23 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Wed, Oct 11, 2017 at 11:29:02PM +0300, Amir Goldstein wrote:
> On Wed, Oct 11, 2017 at 9:34 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Oct 11, 2017 at 08:36:03PM +0300, Amir Goldstein wrote:
> >> On Wed, Oct 11, 2017 at 7:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Wed, Oct 11, 2017 at 07:29:38PM +0300, Amir Goldstein wrote:
> >> >
> >> > [..]
> >> >> >
> >> >> > Anyway, I did not understand what's the problem with borken upper
> >> >> > hardlinks when index=off. If two upper hardlinks (broken), can
> >> >> > point to same ORIGIN on lower, they will just copy up same data.
> >> >> >
> >> >> > IOW, it does not seem to be more broken then what it is now just
> >> >> > becase of metadata copy up or because of both upper pointing to
> >> >> > same ORIGIN? What am I missing.
> >> >> >
> >> >> It is not broken now because we intentionally do NOT set origin when
> >> >> copying up lower hardlinks and index is disabled.
> >> >> You must not break this rule, so instead you can avoid metacopy for
> >> >> lower hardlinks when index is disabled.
> >> >
> >> > Hi Amir,
> >> >
> >> > Ok, I am open to change it so that if index=off, lower hardlinks are not
> >> > copied up metadata only.
> >> >
> >> > But please help me understand that what *additional* thing breaks if origin
> >> > is set when index=off.
> >> >
> >> > Say lower has a file foo.txt and another hard link foo-link.txt.
> >> > (say index=off, metacopy=off).
> >> >
> >> > Say, I open foo-link.txt for WRITE, and it gets copied up. Now there is
> >> > no link between foo-link.txt and foo.txt and if a user opens foo.txt for
> >> > READ, it does not see updates to foo-link.txt. So this is the broken
> >> > behavior of hardlinks with index=off, as of today.
> >> >
> >> > Now I go back to original state and this time do same operation with
> >> > (indxex=off, metacopy=on). Then "chown foo-link.txt" will only copy
> >> > metadata and set origin to lower file. Say I also chown "foo.txt", that
> >> > will do the same thing. Now both foo.txt and foo-link.txt will have
> >> > ORIGIN pointing to same lower inode. If any of the file is opened for
> >> > WRITE, data will be copied up from same origin.
> >> >
> >> > So setting same ORIGIN on two upper files, does not seem to break anything
> >> > additional, AFAICS. What am I missing.
> >> >
> >>
> >> Sorry for brevity of my response earlier. I was hopping on a plane.
> >> The problem is a bit convoluted to explain, but here goes.
> >>
> >> ORIGIN was introduced to implement constant st_ino across copy up.
> >> stat(2) of an overlay file with ORIGIN (both layer in same fs) returns
> >> st_ino of origin inode.
> >>
> >> When copy up breaks lower hardlink to *different* upper inodes,
> >> then those broken aliases MUST NOT return the same st_ino,
> >> because they are different files with different data.
> >
> > Hi Amir,
> >
> > I thought current code already takes care of this situation.
> >
> > ovl_getattr() {
> >           /*
> >             * Lower hardlinks may be broken on copy up to different
> >             * upper files, so we cannot use the lower origin st_ino
> >             * for those different files, even for the same fs case.
> >             * With inodes index enabled, it is safe to use st_ino
> >             * of an indexed hardlinked origin. The index validates
> >             * that the upper hardlink is not broken.
> >             */
> >            if (is_dir || lowerstat.nlink == 1 ||
> >                ovl_test_flag(OVL_INDEX, d_inode(dentry)))
> >                    stat->ino = lowerstat.ino;
> > }
> >
> > So if lower had nlink > 1, then its inode number will not be used if
> > indexing was not enabled. If that's the case, two upper pointing to
> > same origin should not be a problem?
> >
> > I even tested it. Wrote a patch to force setting origin even on hard
> > links with index=off.
> >
> > Before copy up, both the files (testfile.txt and link-file.txt) had
> > same inode numbers. The moment file is copied up it uses its real
> > inode number on upper (and not the one retrieved from lower). IOW,
> > origin inode number is discarded if index=off. If that's the case
> > then setting ORIGIN with broken hard link should be just fine. It
> > can still be used for metadata copyup while it can't be used with
> > index=off.
> >
> > In fact this sounds better to me. Make use of ORIGIN information only if
> > it is safe. With broken hardlinks it is not safe to make use of this
> > information so ignore ORIGIN. But it is still ok to make use of this
> > information from metadata copyup point of view.
> >
> 
> Yes, all this is true. I was trying to avoid the full explanation regrading
> index inconsistency, but I ended up giving an incomplete story.
> 
> >>
> >> Only when index is enabled and copy up does not break lower
> >> hardlink, it is legal to set ORIGIN for every upper alias, because
> >> all upper aliases can and should return same st_ino (the origin st_ino)
> >> from stat(2).
> >>
> >> So that is the simplest way to explain why you MUST NOT set ORIGIN
> >> on copy up of lower hardlink when index is disabled and therefore, you
> >> cannot use ORIGIN for metacopy.
> >>
> >> As I wrote in some email, you could set METAORIGIN xattr in this case
> >> just for the use of metacopy, but I rather not have this special case.
> >>
> >> Beyond the problem stated above with st_ino of broken hardlinks,
> >> if broken hardlinks have the same ORIGIN, this also breaks indexing
> >> assumptions and can lead to EIO on lookup, but no need to get into that.
> >
> > I guess you are referring to ovl_lookup_index().
> >
> >  () {
> >         index = lookup_one_len_unlocked(name.name, ofs->indexdir, name.len);
> >         if (d_is_negative(index)) {
> >                 if (upper && d_inode(origin)->i_nlink > 1) {
> >                         pr_warn_ratelimited("overlayfs: hard link with origin but no index (ino=%lu).\n",
> >                                             d_inode(origin)->i_ino);
> >                         goto fail;
> >                 }
> >
> >         }
> > }
> >
> > So this is failing if we find a dentry which has origin but no index. And
> > this will hit if a overlay was mounted with index=off, hard link copied up
> > and then remounted with index=on. In that case it will return -EIO.
> >
> > My question is, that does this have to be an error. If we are supporting
> > this use case, where index can be turned on later, then can we just warn
> > and continue? Or set OVL_XATTR_INDEX on upper to indicate that this
> > upper should have an index.
> >
> > I mean ORIGIN started with the exclusive purpose of inode number stability.
> > But this is sort of infrastructure which keeps track of ORIGIN of copy
> > up source and can be used for metadata copy up as well. So indexing should
> > not put restrictions on what files ORIGIN can be set on. Instead both
> > metacopy and index should not make use of ORIGIN where they this things
> > are broken for them.
> >
> > Thoughts?
> >
> 
> It is interesting to note that the commit that introduced this limitation
> fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink")
> was merged as a "last minute" (rc7) fix patch before final v4.12
> and had this text in commit message:
> "We can relax this in the future when we are able to index upper object by
>     origin."
> 
> I am not sure it is easy to relax this limitation, but I may be wrong.
> I'll take another swing at writing the full story. Hope I will get it
> right this time...
> 
> Index is a 1-to-1 mapping from origin *inode* to upper *inode*.
> Highlighting *inode* because there may be many lower or upper
> aliases of that inode.
> 
> The index key (its name) is the file handle of origin inode.
> The index itself is an upper alias (link) of upper inode.
> 
> If more than 1 upper inode point to the same origin inode,
> then index cannot be consistent for both upper aliases.
> This would actually be a common scenario if hardlinks are broken
> due to index=off (or kernel v4.12 without the rc7 fix)
> and then index is turned on and more lower aliases are copied up.
> 
> On lookup, we can detected that index found by origin
> file handle is not a hardlink of upper inode and ignore it instead of
> returning EIO on:
>         if (upper && d_inode(upper) != inode) {
>                 pr_warn_ratelimited("overlayfs: wrong index found
> (index=%pd2, ino=%lu, upper ino=%lu).\n",
>                                     index, inode->i_ino, d_inode(upper)->i_ino);
>                 goto fail;
>         }
> 
> And we could have allowed for "hard link with origin but no index"
> instead of returning EIO.
> 
> In those cases, we would need to get a hashed overlay inode by address
> of upper inode, instead of by address of origin inode and we would have to
> not set the OVL_INDEX flag on lookup.
> 
> I guess one of the reasons we did not do it on first version of index feature
> is that we wanted to keep things simple and we did not have a good enough
> reason to set ORIGIN for broken hardlinks.
> 
> So I am now open to the possibility that we may be able to set ORIGIN
> for broken hardlinks without breaking anything, but will need to see patches
> before I can say if we missed something. Maybe Miklos can see something
> that I have missed.
> 
> In any case, for the sake of simplicity, I wouldn't bother doing metacopy up
> of lower hardlinks in first version of metacopy feature.

Ok, I can see that current code will return -EIO if index is enabled after
some copy up have taken place with index=off. So for now, I will disable
metadata copyup on hardlinks if index=off. 

But it would be nice if we can move away from this restriction.

> 
> Thanks for insisting on a good answer (not sure I have provided one yet)

I think I understand it now. Thank you.

Vivek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-12 13:23                 ` Vivek Goyal
@ 2017-10-12 13:39                   ` Amir Goldstein
  0 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-10-12 13:39 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Thu, Oct 12, 2017 at 4:23 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Oct 11, 2017 at 11:29:02PM +0300, Amir Goldstein wrote:
>> On Wed, Oct 11, 2017 at 9:34 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
...
>> > So this is failing if we find a dentry which has origin but no index. And
>> > this will hit if a overlay was mounted with index=off, hard link copied up
>> > and then remounted with index=on. In that case it will return -EIO.
>> >
>> > My question is, that does this have to be an error. If we are supporting
>> > this use case, where index can be turned on later, then can we just warn
>> > and continue? Or set OVL_XATTR_INDEX on upper to indicate that this
>> > upper should have an index.
>> >
>> > I mean ORIGIN started with the exclusive purpose of inode number stability.
>> > But this is sort of infrastructure which keeps track of ORIGIN of copy
>> > up source and can be used for metadata copy up as well. So indexing should
>> > not put restrictions on what files ORIGIN can be set on. Instead both
>> > metacopy and index should not make use of ORIGIN where they this things
>> > are broken for them.
>> >
>> > Thoughts?
>> >
>>
>> It is interesting to note that the commit that introduced this limitation
>> fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink")
>> was merged as a "last minute" (rc7) fix patch before final v4.12
>> and had this text in commit message:
>> "We can relax this in the future when we are able to index upper object by
>>     origin."
>>
>> I am not sure it is easy to relax this limitation, but I may be wrong.
>> I'll take another swing at writing the full story. Hope I will get it
>> right this time...
>>
>> Index is a 1-to-1 mapping from origin *inode* to upper *inode*.
>> Highlighting *inode* because there may be many lower or upper
>> aliases of that inode.
>>
>> The index key (its name) is the file handle of origin inode.
>> The index itself is an upper alias (link) of upper inode.
>>
>> If more than 1 upper inode point to the same origin inode,
>> then index cannot be consistent for both upper aliases.
>> This would actually be a common scenario if hardlinks are broken
>> due to index=off (or kernel v4.12 without the rc7 fix)
>> and then index is turned on and more lower aliases are copied up.
>>
>> On lookup, we can detected that index found by origin
>> file handle is not a hardlink of upper inode and ignore it instead of
>> returning EIO on:
>>         if (upper && d_inode(upper) != inode) {
>>                 pr_warn_ratelimited("overlayfs: wrong index found
>> (index=%pd2, ino=%lu, upper ino=%lu).\n",
>>                                     index, inode->i_ino, d_inode(upper)->i_ino);
>>                 goto fail;
>>         }
>>
>> And we could have allowed for "hard link with origin but no index"
>> instead of returning EIO.
>>
>> In those cases, we would need to get a hashed overlay inode by address
>> of upper inode, instead of by address of origin inode and we would have to
>> not set the OVL_INDEX flag on lookup.
>>
>> I guess one of the reasons we did not do it on first version of index feature
>> is that we wanted to keep things simple and we did not have a good enough
>> reason to set ORIGIN for broken hardlinks.
>>
>> So I am now open to the possibility that we may be able to set ORIGIN
>> for broken hardlinks without breaking anything, but will need to see patches
>> before I can say if we missed something. Maybe Miklos can see something
>> that I have missed.
>>
>> In any case, for the sake of simplicity, I wouldn't bother doing metacopy up
>> of lower hardlinks in first version of metacopy feature.
>
> Ok, I can see that current code will return -EIO if index is enabled after
> some copy up have taken place with index=off. So for now, I will disable
> metadata copyup on hardlinks if index=off.
>

Well, if it takes me so much effort to hardly convince myself that this
behavior is correct, then I may be trying to rationalize a bug...

The current EIO behavior is not nice so say the least.
The protection of fbaf94ee3cd5 ("ovl: don't set origin on broken lower
hardlink")
doesn't hold if lower is not a hardlink when it is copied up (with
either index=off/on)
and then lower is hardlinked while overlay is offline.

> But it would be nice if we can move away from this restriction.
>

I've already written the xfstest of the use case above and will try to get
rid of those EIO.

Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-11 21:08       ` Amir Goldstein
@ 2017-10-13 18:27         ` Vivek Goyal
  2017-10-14  6:05           ` Amir Goldstein
  0 siblings, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-13 18:27 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Thu, Oct 12, 2017 at 12:08:04AM +0300, Amir Goldstein wrote:
> On Wed, Oct 11, 2017 at 11:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Oct 10, 2017 at 08:12:00PM +0300, Amir Goldstein wrote:
> >> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > We can access and check metacopy flag outside ovl_inode->lock. IOW, say
> >> > a file was meta copied up in the past and it has metacopy flag set. Now
> >> > a data copy up operation in progress. Say another thread reads state of
> >> > this flag and if flag clearing is visible before file has been fully
> >> > copied up, that can cause problems.
> >> >
> >> >         CPU1                            CPU2
> >> > ovl_copy_up_flags()                     acquire(oi->lock)
> >> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data_inode()
> >> >    ovl_test_metacopy_flag()                 ovl_copy_up_data()
> >> >                                             ovl_clear_metacopy_flag()
> >> >                                         release(oi->lock)
> >> >
> >> > Say CPU2 is copying up data and in the end clears metacopy flag. But if
> >> > CPU1 perceives clearing of metacopy before actual data copy up operation
> >> > being finished, that can become a problem. We want this clearing of flag
> >> > to be visible only if all previous write operations have finished.
> >> >
> >> > Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
> >> > and smp_rmb() on metacopy flag test operation.
> >> >
> >> > May be some other lock or barrier is already covering it. But I am not sure
> >> > what that is and is it obvious enough that we will not break it in future.
> >> >
> >> > So hence trying to be safe here and introducing barriers explicitly for
> >> > metacopy bit.
> >>
> >> Please revisit your changes after addressing my comment on patch 5.
> >
> > Assume, we added smp_rmb() in ovl_upperdentry_dereference(), to make
> > sure by the time upperdentry is visible, OVL_METACOPY is visible too.
> >
> >> IIUC the flow you describe above should be addressed by testing
> >> metacopy flag again under ovl inode lock.
> >>
> >> The only subtle part imo is making metacopy flag visible
> >> Before making upper dentry visible.
> >> Clearing metacopy flag should not be a problem imo,
> >> As it comes after fsync, but didn't look closely.
> >
> > Say a file has been metadata copied up only. So upper is visible and
> > OVL_METACOPY flag is set.
> >
> > Now, what happens if two paths try to data copy up the file. Say first
> > path gets the oi->lock and starts doing data copy. Second path enters
> > ovl_copy_up_flags() and checks if data copy up is required or not. If
> > OVL_METACOPY is still set, then second process will block on oi->lock
> > and once first process exits will check OVL_METACOPY again under lock
> > and bail out. So that is not a problem.
> >
> > What about the other case. That is OVL_METACOPY gets cleared before
> > data copy operation is complete. If I don't introduce smp_wmb()/smp_rmb()
> > around resetting of OVL_METACOPY, what will make sure that
> > ovl_copy_up_flags() will see OVL_METACOPY=0 only after data copy up is
> > complete. Of course we will not like to return to caller of
> > ovl_copy_up_flags() while data copy up is still going on on a different
> > cpu.
> >
> 
> vfs_removexattr() between data copy up and clearing OVL_METACOPY
> takes inode lock.
> IIUC, any lock/unlock has a full memory barrier.

When I read memory-barrier.txt, it seems to suggest that RELEASE can let
instructions outside critical region sneak into critical region. If that's
the case, actually clearing of metaflag can happen before xattr actually
got removed. Though I can't think what will go wrong in that case.

Also will all this work without other cpu having any matching barrier?
Again, I am not barrier expert and I am trying to understand these and
one of the messages seems to be that most of the time they have to be
paired. So in this case, we are taking inode lock/unlock on one cpu
but cpu which is checking value of OVL_METACOPY, really does not have
any matching acquire/release or some other kind of barrier.

Vivek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-13 18:27         ` Vivek Goyal
@ 2017-10-14  6:05           ` Amir Goldstein
  2017-10-14  7:00             ` Amir Goldstein
  2017-10-16 13:24             ` Vivek Goyal
  0 siblings, 2 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-10-14  6:05 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Fri, Oct 13, 2017 at 9:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Oct 12, 2017 at 12:08:04AM +0300, Amir Goldstein wrote:
>> On Wed, Oct 11, 2017 at 11:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Tue, Oct 10, 2017 at 08:12:00PM +0300, Amir Goldstein wrote:
>> >> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > We can access and check metacopy flag outside ovl_inode->lock. IOW, say
>> >> > a file was meta copied up in the past and it has metacopy flag set. Now
>> >> > a data copy up operation in progress. Say another thread reads state of
>> >> > this flag and if flag clearing is visible before file has been fully
>> >> > copied up, that can cause problems.
>> >> >
>> >> >         CPU1                            CPU2
>> >> > ovl_copy_up_flags()                     acquire(oi->lock)
>> >> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data_inode()
>> >> >    ovl_test_metacopy_flag()                 ovl_copy_up_data()
>> >> >                                             ovl_clear_metacopy_flag()
>> >> >                                         release(oi->lock)
>> >> >
>> >> > Say CPU2 is copying up data and in the end clears metacopy flag. But if
>> >> > CPU1 perceives clearing of metacopy before actual data copy up operation
>> >> > being finished, that can become a problem. We want this clearing of flag
>> >> > to be visible only if all previous write operations have finished.
>> >> >
>> >> > Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
>> >> > and smp_rmb() on metacopy flag test operation.
>> >> >
>> >> > May be some other lock or barrier is already covering it. But I am not sure
>> >> > what that is and is it obvious enough that we will not break it in future.
>> >> >
>> >> > So hence trying to be safe here and introducing barriers explicitly for
>> >> > metacopy bit.
>> >>
>> >> Please revisit your changes after addressing my comment on patch 5.
>> >
>> > Assume, we added smp_rmb() in ovl_upperdentry_dereference(), to make
>> > sure by the time upperdentry is visible, OVL_METACOPY is visible too.
>> >
>> >> IIUC the flow you describe above should be addressed by testing
>> >> metacopy flag again under ovl inode lock.
>> >>
>> >> The only subtle part imo is making metacopy flag visible
>> >> Before making upper dentry visible.
>> >> Clearing metacopy flag should not be a problem imo,
>> >> As it comes after fsync, but didn't look closely.
>> >
>> > Say a file has been metadata copied up only. So upper is visible and
>> > OVL_METACOPY flag is set.
>> >
>> > Now, what happens if two paths try to data copy up the file. Say first
>> > path gets the oi->lock and starts doing data copy. Second path enters
>> > ovl_copy_up_flags() and checks if data copy up is required or not. If
>> > OVL_METACOPY is still set, then second process will block on oi->lock
>> > and once first process exits will check OVL_METACOPY again under lock
>> > and bail out. So that is not a problem.
>> >
>> > What about the other case. That is OVL_METACOPY gets cleared before
>> > data copy operation is complete. If I don't introduce smp_wmb()/smp_rmb()
>> > around resetting of OVL_METACOPY, what will make sure that
>> > ovl_copy_up_flags() will see OVL_METACOPY=0 only after data copy up is
>> > complete. Of course we will not like to return to caller of
>> > ovl_copy_up_flags() while data copy up is still going on on a different
>> > cpu.
>> >
>>
>> vfs_removexattr() between data copy up and clearing OVL_METACOPY
>> takes inode lock.
>> IIUC, any lock/unlock has a full memory barrier.
>
> When I read memory-barrier.txt, it seems to suggest that RELEASE can let
> instructions outside critical region sneak into critical region. If that's
> the case, actually clearing of metaflag can happen before xattr actually
> got removed. Though I can't think what will go wrong in that case.

clear_bit takes a spinlock so I *think* clearing of flag cannot sneak
before removing xattr.

>
> Also will all this work without other cpu having any matching barrier?
> Again, I am not barrier expert and I am trying to understand these and

Me neither, trying to understand this along with you.

> one of the messages seems to be that most of the time they have to be
> paired. So in this case, we are taking inode lock/unlock on one cpu
> but cpu which is checking value of OVL_METACOPY, really does not have
> any matching acquire/release or some other kind of barrier.
>

Other CPU needs rmb before checking flag to guarantee the order
of setting METACOPY flag before setting upperdentry.

If other CPU sees positive upperdentry and the METACOPY flag cleared,
then I *think* it should be safe to return the upper inode from d_real(),
because RELEASE on inode lock should have the proper guaranties
for consistency of inode data structures in memory.
Also, copying data is followed by fsync operation.
I did not check individual file system implementation of fsync, but I
would be very surprised if fsync wasn't a good enough barrier for all
the inode data structures in memory.

I realize this last paragraph is hand waving, because it refers to
vague "inode data structures" entity as if this entity is all protected by
inode lock, which is not the case? but I believe the proper proof is not
far off. Will need Miklos to reaffirm my statements.

Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-14  6:05           ` Amir Goldstein
@ 2017-10-14  7:00             ` Amir Goldstein
  2017-10-16 13:24               ` Vivek Goyal
  2017-10-16 13:24             ` Vivek Goyal
  1 sibling, 1 reply; 31+ messages in thread
From: Amir Goldstein @ 2017-10-14  7:00 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Sat, Oct 14, 2017 at 9:05 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Oct 13, 2017 at 9:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Thu, Oct 12, 2017 at 12:08:04AM +0300, Amir Goldstein wrote:
>>> On Wed, Oct 11, 2017 at 11:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>> > On Tue, Oct 10, 2017 at 08:12:00PM +0300, Amir Goldstein wrote:
>>> >> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>> >> > We can access and check metacopy flag outside ovl_inode->lock. IOW, say
>>> >> > a file was meta copied up in the past and it has metacopy flag set. Now
>>> >> > a data copy up operation in progress. Say another thread reads state of
>>> >> > this flag and if flag clearing is visible before file has been fully
>>> >> > copied up, that can cause problems.
>>> >> >
>>> >> >         CPU1                            CPU2
>>> >> > ovl_copy_up_flags()                     acquire(oi->lock)
>>> >> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data_inode()
>>> >> >    ovl_test_metacopy_flag()                 ovl_copy_up_data()
>>> >> >                                             ovl_clear_metacopy_flag()
>>> >> >                                         release(oi->lock)
>>> >> >
>>> >> > Say CPU2 is copying up data and in the end clears metacopy flag. But if
>>> >> > CPU1 perceives clearing of metacopy before actual data copy up operation
>>> >> > being finished, that can become a problem. We want this clearing of flag
>>> >> > to be visible only if all previous write operations have finished.
>>> >> >
>>> >> > Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
>>> >> > and smp_rmb() on metacopy flag test operation.
>>> >> >
>>> >> > May be some other lock or barrier is already covering it. But I am not sure
>>> >> > what that is and is it obvious enough that we will not break it in future.
>>> >> >
>>> >> > So hence trying to be safe here and introducing barriers explicitly for
>>> >> > metacopy bit.
>>> >>
>>> >> Please revisit your changes after addressing my comment on patch 5.
>>> >
>>> > Assume, we added smp_rmb() in ovl_upperdentry_dereference(), to make
>>> > sure by the time upperdentry is visible, OVL_METACOPY is visible too.
>>> >
>>> >> IIUC the flow you describe above should be addressed by testing
>>> >> metacopy flag again under ovl inode lock.
>>> >>
>>> >> The only subtle part imo is making metacopy flag visible
>>> >> Before making upper dentry visible.
>>> >> Clearing metacopy flag should not be a problem imo,
>>> >> As it comes after fsync, but didn't look closely.
>>> >
>>> > Say a file has been metadata copied up only. So upper is visible and
>>> > OVL_METACOPY flag is set.
>>> >
>>> > Now, what happens if two paths try to data copy up the file. Say first
>>> > path gets the oi->lock and starts doing data copy. Second path enters
>>> > ovl_copy_up_flags() and checks if data copy up is required or not. If
>>> > OVL_METACOPY is still set, then second process will block on oi->lock
>>> > and once first process exits will check OVL_METACOPY again under lock
>>> > and bail out. So that is not a problem.
>>> >
>>> > What about the other case. That is OVL_METACOPY gets cleared before
>>> > data copy operation is complete. If I don't introduce smp_wmb()/smp_rmb()
>>> > around resetting of OVL_METACOPY, what will make sure that
>>> > ovl_copy_up_flags() will see OVL_METACOPY=0 only after data copy up is
>>> > complete. Of course we will not like to return to caller of
>>> > ovl_copy_up_flags() while data copy up is still going on on a different
>>> > cpu.
>>> >
>>>
>>> vfs_removexattr() between data copy up and clearing OVL_METACOPY
>>> takes inode lock.
>>> IIUC, any lock/unlock has a full memory barrier.
>>
>> When I read memory-barrier.txt, it seems to suggest that RELEASE can let
>> instructions outside critical region sneak into critical region. If that's
>> the case, actually clearing of metaflag can happen before xattr actually
>> got removed. Though I can't think what will go wrong in that case.
>
> clear_bit takes a spinlock so I *think* clearing of flag cannot sneak
> before removing xattr.
>
>>
>> Also will all this work without other cpu having any matching barrier?
>> Again, I am not barrier expert and I am trying to understand these and
>
> Me neither, trying to understand this along with you.
>
>> one of the messages seems to be that most of the time they have to be
>> paired. So in this case, we are taking inode lock/unlock on one cpu
>> but cpu which is checking value of OVL_METACOPY, really does not have
>> any matching acquire/release or some other kind of barrier.
>>
>
> Other CPU needs rmb before checking flag to guarantee the order
> of setting METACOPY flag before setting upperdentry.
>
> If other CPU sees positive upperdentry and the METACOPY flag cleared,
> then I *think* it should be safe to return the upper inode from d_real(),
> because RELEASE on inode lock should have the proper guaranties
> for consistency of inode data structures in memory.

Nah, smp_wmb before clearing METACOPY is correct, just like
smp_wmb in ovl_inode_update(). probably should copy the same comment.
But that was already there before this wrappers patch, so I don't see the
need for this patch.

Amir.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-14  7:00             ` Amir Goldstein
@ 2017-10-16 13:24               ` Vivek Goyal
  0 siblings, 0 replies; 31+ messages in thread
From: Vivek Goyal @ 2017-10-16 13:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Sat, Oct 14, 2017 at 10:00:44AM +0300, Amir Goldstein wrote:
> On Sat, Oct 14, 2017 at 9:05 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Fri, Oct 13, 2017 at 9:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> On Thu, Oct 12, 2017 at 12:08:04AM +0300, Amir Goldstein wrote:
> >>> On Wed, Oct 11, 2017 at 11:27 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >>> > On Tue, Oct 10, 2017 at 08:12:00PM +0300, Amir Goldstein wrote:
> >>> >> On Tue, Oct 10, 2017 at 6:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >>> >> > We can access and check metacopy flag outside ovl_inode->lock. IOW, say
> >>> >> > a file was meta copied up in the past and it has metacopy flag set. Now
> >>> >> > a data copy up operation in progress. Say another thread reads state of
> >>> >> > this flag and if flag clearing is visible before file has been fully
> >>> >> > copied up, that can cause problems.
> >>> >> >
> >>> >> >         CPU1                            CPU2
> >>> >> > ovl_copy_up_flags()                     acquire(oi->lock)
> >>> >> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data_inode()
> >>> >> >    ovl_test_metacopy_flag()                 ovl_copy_up_data()
> >>> >> >                                             ovl_clear_metacopy_flag()
> >>> >> >                                         release(oi->lock)
> >>> >> >
> >>> >> > Say CPU2 is copying up data and in the end clears metacopy flag. But if
> >>> >> > CPU1 perceives clearing of metacopy before actual data copy up operation
> >>> >> > being finished, that can become a problem. We want this clearing of flag
> >>> >> > to be visible only if all previous write operations have finished.
> >>> >> >
> >>> >> > Hence this patch introduces smp_wmb() on metacopy flag set/clear operation
> >>> >> > and smp_rmb() on metacopy flag test operation.
> >>> >> >
> >>> >> > May be some other lock or barrier is already covering it. But I am not sure
> >>> >> > what that is and is it obvious enough that we will not break it in future.
> >>> >> >
> >>> >> > So hence trying to be safe here and introducing barriers explicitly for
> >>> >> > metacopy bit.
> >>> >>
> >>> >> Please revisit your changes after addressing my comment on patch 5.
> >>> >
> >>> > Assume, we added smp_rmb() in ovl_upperdentry_dereference(), to make
> >>> > sure by the time upperdentry is visible, OVL_METACOPY is visible too.
> >>> >
> >>> >> IIUC the flow you describe above should be addressed by testing
> >>> >> metacopy flag again under ovl inode lock.
> >>> >>
> >>> >> The only subtle part imo is making metacopy flag visible
> >>> >> Before making upper dentry visible.
> >>> >> Clearing metacopy flag should not be a problem imo,
> >>> >> As it comes after fsync, but didn't look closely.
> >>> >
> >>> > Say a file has been metadata copied up only. So upper is visible and
> >>> > OVL_METACOPY flag is set.
> >>> >
> >>> > Now, what happens if two paths try to data copy up the file. Say first
> >>> > path gets the oi->lock and starts doing data copy. Second path enters
> >>> > ovl_copy_up_flags() and checks if data copy up is required or not. If
> >>> > OVL_METACOPY is still set, then second process will block on oi->lock
> >>> > and once first process exits will check OVL_METACOPY again under lock
> >>> > and bail out. So that is not a problem.
> >>> >
> >>> > What about the other case. That is OVL_METACOPY gets cleared before
> >>> > data copy operation is complete. If I don't introduce smp_wmb()/smp_rmb()
> >>> > around resetting of OVL_METACOPY, what will make sure that
> >>> > ovl_copy_up_flags() will see OVL_METACOPY=0 only after data copy up is
> >>> > complete. Of course we will not like to return to caller of
> >>> > ovl_copy_up_flags() while data copy up is still going on on a different
> >>> > cpu.
> >>> >
> >>>
> >>> vfs_removexattr() between data copy up and clearing OVL_METACOPY
> >>> takes inode lock.
> >>> IIUC, any lock/unlock has a full memory barrier.
> >>
> >> When I read memory-barrier.txt, it seems to suggest that RELEASE can let
> >> instructions outside critical region sneak into critical region. If that's
> >> the case, actually clearing of metaflag can happen before xattr actually
> >> got removed. Though I can't think what will go wrong in that case.
> >
> > clear_bit takes a spinlock so I *think* clearing of flag cannot sneak
> > before removing xattr.
> >
> >>
> >> Also will all this work without other cpu having any matching barrier?
> >> Again, I am not barrier expert and I am trying to understand these and
> >
> > Me neither, trying to understand this along with you.
> >
> >> one of the messages seems to be that most of the time they have to be
> >> paired. So in this case, we are taking inode lock/unlock on one cpu
> >> but cpu which is checking value of OVL_METACOPY, really does not have
> >> any matching acquire/release or some other kind of barrier.
> >>
> >
> > Other CPU needs rmb before checking flag to guarantee the order
> > of setting METACOPY flag before setting upperdentry.
> >
> > If other CPU sees positive upperdentry and the METACOPY flag cleared,
> > then I *think* it should be safe to return the upper inode from d_real(),
> > because RELEASE on inode lock should have the proper guaranties
> > for consistency of inode data structures in memory.
> 
> Nah, smp_wmb before clearing METACOPY is correct, just like
> smp_wmb in ovl_inode_update(). probably should copy the same comment.

Atleast it feels safe (after reading memory-barrier.txt).

> But that was already there before this wrappers patch, so I don't see the
> need for this patch.

Actually, that was a patch organization mistake. I wanted to keep all
logic related to barriers around OVL_METACOPY flags in a single patch
with proper commit message. So that if later something is broken, we can
look at commit message and remember why did it do it this way.

Also, I thought doing this through wrapper functions makes lot of sense 
because it becomes easier to understand barrier pairing as well as chances
of making mistakes later come down. Very similar to wrappers used by
ovl_inode_update() and ovl_dentry_upper_dereference(). One can easily
understand what's the barrier pattern used here and what's being
protected.

So I really perfer to wrap barrier logic in wrappers. I will give this
patch series another try taking care of all other comments and then we
can have another round of conversation around this barrier logic.

Vivek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-14  6:05           ` Amir Goldstein
  2017-10-14  7:00             ` Amir Goldstein
@ 2017-10-16 13:24             ` Vivek Goyal
  2017-10-16 13:31               ` Amir Goldstein
  1 sibling, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2017-10-16 13:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Sat, Oct 14, 2017 at 09:05:17AM +0300, Amir Goldstein wrote:

[..]
> > When I read memory-barrier.txt, it seems to suggest that RELEASE can let
> > instructions outside critical region sneak into critical region. If that's
> > the case, actually clearing of metaflag can happen before xattr actually
> > got removed. Though I can't think what will go wrong in that case.
> 
> clear_bit takes a spinlock so I *think* clearing of flag cannot sneak
> before removing xattr.

Which spinlock does clear_bit take? Are you referring to ovl_inode->lock mutex?

Vivek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-16 13:24             ` Vivek Goyal
@ 2017-10-16 13:31               ` Amir Goldstein
  0 siblings, 0 replies; 31+ messages in thread
From: Amir Goldstein @ 2017-10-16 13:31 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi, Eric W. Biederman

On Mon, Oct 16, 2017 at 4:24 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Sat, Oct 14, 2017 at 09:05:17AM +0300, Amir Goldstein wrote:
>
> [..]
>> > When I read memory-barrier.txt, it seems to suggest that RELEASE can let
>> > instructions outside critical region sneak into critical region. If that's
>> > the case, actually clearing of metaflag can happen before xattr actually
>> > got removed. Though I can't think what will go wrong in that case.
>>
>> clear_bit takes a spinlock so I *think* clearing of flag cannot sneak
>> before removing xattr.
>
> Which spinlock does clear_bit take? Are you referring to ovl_inode->lock mutex?
>
See set_bit() and clear_bit() in
include/asm-generic/bitops/atomic.h

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-10-16 13:31 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10 15:32 [RFC PATCH 0/9][V3] overlayfs: Delayed copy up of data Vivek Goyal
2017-10-10 15:32 ` [PATCH 1/9] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check Vivek Goyal
2017-10-10 15:32 ` [PATCH 2/9] ovl: During copy up, first copy up metadata and then data Vivek Goyal
2017-10-10 15:32 ` [PATCH 3/9] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
2017-10-11  1:36   ` Amir Goldstein
2017-10-11 13:57     ` Vivek Goyal
2017-10-11 16:29       ` Amir Goldstein
2017-10-11 16:53         ` Vivek Goyal
2017-10-11 17:36           ` Amir Goldstein
2017-10-11 18:34             ` Vivek Goyal
2017-10-11 20:29               ` Amir Goldstein
2017-10-12 13:23                 ` Vivek Goyal
2017-10-12 13:39                   ` Amir Goldstein
2017-10-10 15:32 ` [PATCH 4/9] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
2017-10-10 15:32 ` [PATCH 5/9] ovl: Set xattr OVL_XATTR_METACOPY on upper file Vivek Goyal
2017-10-10 17:03   ` Amir Goldstein
2017-10-11 20:16     ` Vivek Goyal
2017-10-11 20:44       ` Amir Goldstein
2017-10-10 15:32 ` [PATCH 6/9] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
2017-10-10 15:32 ` [PATCH 7/9] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
2017-10-10 17:12   ` Amir Goldstein
2017-10-11 20:27     ` Vivek Goyal
2017-10-11 21:08       ` Amir Goldstein
2017-10-13 18:27         ` Vivek Goyal
2017-10-14  6:05           ` Amir Goldstein
2017-10-14  7:00             ` Amir Goldstein
2017-10-16 13:24               ` Vivek Goyal
2017-10-16 13:24             ` Vivek Goyal
2017-10-16 13:31               ` Amir Goldstein
2017-10-10 15:32 ` [PATCH 8/9] ovl: Set OVL_METACOPY flag during ovl_lookup() Vivek Goyal
2017-10-10 15:32 ` [PATCH 9/9] ovl: Return lower dentry if only metadata copy up took place Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.