All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data
@ 2017-10-25 19:09 Vivek Goyal
  2017-10-25 19:09 ` [PATCH 01/13] ovl: Put upperdentry if ovl_check_origin() fails Vivek Goyal
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

Hi,

Here is the V5 of patches. Following are changes from V4.

- Flipped the meaning of bit. Now OVL_METACOPY is called OVL_UPPERDATA and
  dropped a barrier patch which implemented ordering requirements between
  oi->flags and oi->__upperdentry.

- Now ovl_d_real() does return upper dentry if upper dentry is metacopy only
  dentry.

- Added a fix to ovl_lookup() goto statement.

- Renamed ->metadata_only to ->metacopy

- Replaced BUG_ON() with WARN_ON() in ovl_copy_up_meta_inode_data()

- Added WARN_ON(!OVL_TYPE_ORIGIN(type)) in ovl_getattr()  

- Took care of bunch of code organization comments from Amir.

Original Description
--------------------
In one of the recent converstions, people mentioned that chown/chmod
lead to copy up files as well as data. We could optimize it so that
only metadata is copied up during chown/chmod and data is copied up when
file is opened for WRITE.

This optimization potentially could be useful with containers and user
namespaces. In popular scenario, people end up doing chown() on whole
image directory tree based on container mappings. And this chown copies
up everything, breaking sharing of page cache between containers.

With these patches, only metadat is copied up during chown() and if file
is opened for READ, d_real() returns lower dentry/inode. That way,
different containers can still continue to use page cache. That's the
use case I have in mind.

Basically, I am relying on storing OVL_XATTR_ORIGIN in upper inode
during copy up. I use that information to get to lower inode later and
do data copy up when necessary.

I also store OVL_XATTR_METACOPY in upper inode to mark that only
metadata has been copied up and data copy up still might be required.

Any feedback is helpful.

Thanks
Vivek

Vivek Goyal (13):
  ovl: Put upperdentry if ovl_check_origin() fails
  ovl: Create origin xattr on copy up for all files
  ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check
  ovl: Provide a mount option metacopy=on/off for metadata copyup
  ovl: During copy up, first copy up metadata and then data
  ovl: Copy up only metadata during copy up where it makes sense
  ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  ovl: Fix ovl_getattr() to get number of blocks from lower
  ovl: Set OVL_UPPERDATA flag during ovl_lookup()
  ovl: Return lower dentry if only metadata copy up took place
  ovl: Introduce read/write barriers around metacopy flag update
  ovl: Do not export metacopy only upper dentry
  ovl: Enable metadata only feature

 fs/overlayfs/Kconfig     |   8 ++++
 fs/overlayfs/copy_up.c   | 116 +++++++++++++++++++++++++++++++++++------------
 fs/overlayfs/inode.c     |  15 +++++-
 fs/overlayfs/namei.c     |  38 +++++++++++++++-
 fs/overlayfs/overlayfs.h |   6 ++-
 fs/overlayfs/ovl_entry.h |   1 +
 fs/overlayfs/super.c     |  67 +++++++++++++++++++++++++--
 fs/overlayfs/util.c      |  54 +++++++++++++++++-----
 8 files changed, 257 insertions(+), 48 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 01/13] ovl: Put upperdentry if ovl_check_origin() fails
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-25 20:08   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 02/13] ovl: Create origin xattr on copy up for all files Vivek Goyal
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

If ovl_check_origin() fails, we should put upperdentry. We have a reference
on it by now. So goto out_put_upper instead of out.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/namei.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 88ff1da..aec543d 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -633,7 +633,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			err = ovl_check_origin(upperdentry, roe->lowerstack,
 					       roe->numlower, &stack, &ctr);
 			if (err)
-				goto out;
+				goto out_put_upper;
 		}
 
 		if (d.redirect) {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 02/13] ovl: Create origin xattr on copy up for all files
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
  2017-10-25 19:09 ` [PATCH 01/13] ovl: Put upperdentry if ovl_check_origin() fails Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  5:31   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 03/13] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check Vivek Goyal
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

Right now my understanding is that origin xattr is created for all copied
up files if index=on. And if index=off, then we create it for all type
of files except hardlinks (nlink != 1).

With metadata only copy up, I will still require origin xattr to copy up
data later, so create it even for hardlinks even with index=off.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index c441f93..8a8f538 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -470,9 +470,6 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 	/*
 	 * Store identifier of lower inode in upper inode xattr to
 	 * allow lookup of the copy up origin inode.
-	 *
-	 * Don't set origin when we are breaking the association with a lower
-	 * hard link.
 	 */
 	if (c->origin) {
 		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
@@ -538,9 +535,6 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)
 	    c->stat.nlink > 1)
 		indexed = true;
 
-	if (S_ISDIR(c->stat.mode) || c->stat.nlink == 1 || indexed)
-		c->origin = true;
-
 	if (indexed) {
 		c->destdir = ovl_indexdir(c->dentry->d_sb);
 		err = ovl_get_index_name(c->lowerpath.dentry, &c->destname);
@@ -596,6 +590,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		.parent = parent,
 		.dentry = dentry,
 		.workdir = ovl_workdir(dentry),
+		.origin = true,
 	};
 
 	if (WARN_ON(!ctx.workdir))
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 03/13] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
  2017-10-25 19:09 ` [PATCH 01/13] ovl: Put upperdentry if ovl_check_origin() fails Vivek Goyal
  2017-10-25 19:09 ` [PATCH 02/13] ovl: Create origin xattr on copy up for all files Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-25 19:09 ` [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

At mount time we check if upper supports xattr or not and set ofs->noxattr
field accordingly. But in ovl_check_setxattr() still seems to have logic
which expects that ovl_do_setxattr() can return -EOPNOTSUPP. Not sure when
and how can we hit this code. Feels redundant to me. So I am removing
this code.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/util.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index b9b239f..a4ce1c9 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -348,21 +348,12 @@ int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
 		       const char *name, const void *value, size_t size,
 		       int xerr)
 {
-	int err;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 
 	if (ofs->noxattr)
 		return xerr;
 
-	err = ovl_do_setxattr(upperdentry, name, value, size, 0);
-
-	if (err == -EOPNOTSUPP) {
-		pr_warn("overlayfs: cannot set %s xattr on upper\n", name);
-		ofs->noxattr = true;
-		return xerr;
-	}
-
-	return err;
+	return ovl_do_setxattr(upperdentry, name, value, size, 0);
 }
 
 int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (2 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 03/13] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  5:39   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 05/13] ovl: During copy up, first copy up metadata and then data Vivek Goyal
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

By default metadata only copy up is disabled. Provide a mount option so
that users can choose one way or other.

Also provide a kernel config and module option to enable/disable
metacopy feature.

Like index feature, when overlay is mounted, on root upper directory we
set ORIGIN which points to lower. And at later mount time it is verified
again. This hopes to get the configuration right. But this does only so
much as we don't verify all the lowers. So it is possible that a lower is
missing and later data copy up fails.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/Kconfig     |  8 ++++++++
 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/super.c     | 38 +++++++++++++++++++++++++++++++++++---
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index cbfc196..17a0b17 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -43,3 +43,11 @@ config OVERLAY_FS_INDEX
 	  outcomes.  However, mounting the same overlay with an old kernel
 	  read-write and then mounting it again with a new kernel, will have
 	  unexpected results.
+
+config OVERLAY_FS_METACOPY
+	bool "Overlayfs: turn on metadata only copy up feature by default"
+	depends on OVERLAY_FS
+	help
+	  If this config option is enabled then overlay filesystems will
+	  copy up only metadata where appropriate and data copy up will
+	  happen when a file is opended for WRITE operation.
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 25d9b5a..6806f0b 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -15,6 +15,7 @@ struct ovl_config {
 	bool default_permissions;
 	bool redirect_dir;
 	bool index;
+	bool metacopy;
 };
 
 /* private information held for overlayfs's superblock */
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 092d150..32e3d4b 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -39,6 +39,11 @@ module_param_named(index, ovl_index_def, bool, 0644);
 MODULE_PARM_DESC(ovl_index_def,
 		 "Default to on or off for the inodes index feature");
 
+static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
+module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
+MODULE_PARM_DESC(ovl_metacopy_def,
+		 "Default to on or off for the metadata only copy up feature");
+
 static void ovl_dentry_release(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
@@ -303,6 +308,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	if (ufs->config.index != ovl_index_def)
 		seq_printf(m, ",index=%s",
 			   ufs->config.index ? "on" : "off");
+	if (ufs->config.metacopy != ovl_metacopy_def)
+		seq_printf(m, ",metacopy=%s",
+			   ufs->config.metacopy ? "on" : "off");
 	return 0;
 }
 
@@ -336,6 +344,8 @@ enum {
 	OPT_REDIRECT_DIR_OFF,
 	OPT_INDEX_ON,
 	OPT_INDEX_OFF,
+	OPT_METACOPY_ON,
+	OPT_METACOPY_OFF,
 	OPT_ERR,
 };
 
@@ -348,6 +358,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_REDIRECT_DIR_OFF,		"redirect_dir=off"},
 	{OPT_INDEX_ON,			"index=on"},
 	{OPT_INDEX_OFF,			"index=off"},
+	{OPT_METACOPY_ON,		"metacopy=on"},
+	{OPT_METACOPY_OFF,		"metacopy=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -428,6 +440,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->index = false;
 			break;
 
+		case OPT_METACOPY_ON:
+			config->metacopy = true;
+			break;
+
+		case OPT_METACOPY_OFF:
+			config->metacopy = false;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;
@@ -644,9 +664,16 @@ static int ovl_lower_dir(const char *name, struct path *path,
 	 * The inodes index feature needs to encode and decode file
 	 * handles, so it requires that all layers support them.
 	 */
-	if (ofs->config.index && !ovl_can_decode_fh(path->dentry->d_sb)) {
+	if ((ofs->config.index || ofs->config.metacopy) &&
+	     !ovl_can_decode_fh(path->dentry->d_sb)) {
+		if (ofs->config.index)
+			pr_warn("overlayfs: fs on '%s' does not support file handles, falling back to index=off.\n", name);
+
+		if (ofs->config.metacopy)
+			pr_warn("overlayfs: fs on '%s' does not support file handles, falling back to metacopy=off.\n", name);
+
 		ofs->config.index = false;
-		pr_warn("overlayfs: fs on '%s' does not support file handles, falling back to index=off.\n", name);
+		ofs->config.metacopy = false;
 	}
 
 	return 0;
@@ -847,6 +874,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	ufs->config.redirect_dir = ovl_redirect_dir_def;
 	ufs->config.index = ovl_index_def;
+	ufs->config.metacopy = ovl_metacopy_def;
+
 	err = ovl_parse_opt((char *) data, &ufs->config);
 	if (err)
 		goto out_free_config;
@@ -1057,7 +1086,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else if (ufs->upper_mnt->mnt_sb != ufs->same_sb)
 		ufs->same_sb = NULL;
 
-	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
+	if (!(ovl_force_readonly(ufs)) &&
+	      (ufs->config.index || ufs->config.metacopy)) {
 		/* Verify lower root is upper root origin */
 		err = ovl_verify_origin(upperpath.dentry, ufs->lower_mnt[0],
 					stack[0].dentry, false, true);
@@ -1065,7 +1095,9 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			pr_err("overlayfs: failed to verify upper root origin\n");
 			goto out_put_lower_mnt;
 		}
+	}
 
+	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
 		ufs->indexdir = ovl_workdir_create(sb, ufs, workpath.dentry,
 						   OVL_INDEXDIR_NAME, true);
 		if (ufs->indexdir) {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 05/13] ovl: During copy up, first copy up metadata and then data
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (3 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  5:42   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 06/13] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

Just a little re-ordering of code. This helps with next patch where after
copying up metadata, we skip data copying step, if needed.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 8a8f538..4f580ec 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -445,6 +445,21 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 {
 	int err;
 
+	err = ovl_copy_xattr(c->lowerpath.dentry, temp);
+	if (err)
+		return err;
+
+	/*
+	 * Store identifier of lower inode in upper inode xattr to
+	 * allow lookup of the copy up origin inode.
+	 *
+	 */
+	if (c->origin) {
+		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
+		if (err)
+			return err;
+	}
+
 	if (S_ISREG(c->stat.mode)) {
 		struct path upperpath;
 
@@ -457,27 +472,11 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
-	err = ovl_copy_xattr(c->lowerpath.dentry, temp);
-	if (err)
-		return err;
-
 	inode_lock(temp->d_inode);
 	err = ovl_set_attr(temp, &c->stat);
 	inode_unlock(temp->d_inode);
-	if (err)
-		return err;
-
-	/*
-	 * Store identifier of lower inode in upper inode xattr to
-	 * allow lookup of the copy up origin inode.
-	 */
-	if (c->origin) {
-		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
-		if (err)
-			return err;
-	}
 
-	return 0;
+	return err;
 }
 
 static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 06/13] ovl: Copy up only metadata during copy up where it makes sense
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (4 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 05/13] ovl: During copy up, first copy up metadata and then data Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-25 19:09 ` [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

If it makes sense to copy up only metadata during copy up, do it. This
is done for regular files which are not opened for WRITE and have origin
being saved.

Right now ->metacopy is set to 0 hard coded. Last patch in the series will
set it based on config option and enable the feature.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 4f580ec..2936284 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -306,11 +306,8 @@ static int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 			return PTR_ERR(fh);
 	}
 
-	/*
-	 * Do not fail when upper doesn't support xattrs.
-	 */
 	err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh,
-				 fh ? fh->len : 0, 0);
+				 fh ? fh->len : 0, -EOPNOTSUPP);
 	kfree(fh);
 
 	return err;
@@ -328,6 +325,7 @@ struct ovl_copy_up_ctx {
 	struct dentry *workdir;
 	bool tmpfile;
 	bool origin;
+	bool metacopy;
 };
 
 static int ovl_link_up(struct ovl_copy_up_ctx *c)
@@ -456,11 +454,15 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 	 */
 	if (c->origin) {
 		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
-		if (err)
-			return err;
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+			/* Upper does not support xattr. Copy up data as well */
+			c->metacopy = false;
+		}
 	}
 
-	if (S_ISREG(c->stat.mode)) {
+	if (S_ISREG(c->stat.mode) && !c->metacopy) {
 		struct path upperpath;
 
 		ovl_path_upper(c->dentry, &upperpath);
@@ -590,6 +592,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		.dentry = dentry,
 		.workdir = ovl_workdir(dentry),
 		.origin = true,
+		.metacopy = false,
 	};
 
 	if (WARN_ON(!ctx.workdir))
@@ -610,9 +613,17 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	if (err)
 		return err;
 
+	if (!S_ISREG(ctx.stat.mode))
+		ctx.metacopy = false;
+
 	/* maybe truncate regular file. this has no effect on dirs */
-	if (flags & O_TRUNC)
+	if (flags & O_TRUNC) {
 		ctx.stat.size = 0;
+		ctx.metacopy = false;
+	}
+
+	if (flags & (OPEN_FMODE(flags) & FMODE_WRITE))
+		ctx.metacopy = false;
 
 	if (S_ISLNK(ctx.stat.mode)) {
 		ctx.link = vfs_get_link(ctx.lowerpath.dentry, &done);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (5 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 06/13] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  6:04   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 08/13] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

Now we will have the capability to have upper inodes which might be only
metadata copy up and data is still on lower inode. So add a new xattr
OVL_XATTR_METACOPY to distinguish between two cases.

Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata
only and and data will be copied up later from lower origin.
So this xattr is set when a metadata copy takes place and cleared when
data copy takes place.

We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
whether ovl inode has data or not (as opposed to metadata only copy up).

Note, OVL_UPPERDATA is set only on those upper inodes which can possibly
be metacopy only inodes. For example, inode should be a regular file and
there needs to be a lower dentry.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c   | 55 +++++++++++++++++++++++++++++++++++++++++++++---
 fs/overlayfs/inode.c     |  3 ++-
 fs/overlayfs/overlayfs.h |  6 +++++-
 fs/overlayfs/util.c      | 34 ++++++++++++++++++++++++++++--
 4 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 2936284..a6cda02 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -196,6 +196,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 	return error;
 }
 
+static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
+{
+	struct iattr attr = {
+		.ia_valid = ATTR_SIZE,
+		.ia_size = stat->size,
+	};
+
+	return notify_change(upperdentry, &attr, NULL);
+}
+
 static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
 {
 	struct iattr attr = {
@@ -439,6 +449,28 @@ static int ovl_get_tmpfile(struct ovl_copy_up_ctx *c, struct dentry **tempp)
 	goto out;
 }
 
+/* Copy up data of an inode which was copied up metadata only in the past. */
+static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
+{
+	struct path upperpath;
+	int err;
+
+	ovl_path_upper(c->dentry, &upperpath);
+	if (WARN_ON(upperpath.dentry == NULL))
+		return -EIO;
+
+	err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
+	if (err)
+		return err;
+
+	err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
+	if (err)
+		return err;
+
+	ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
+	return err;
+}
+
 static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 {
 	int err;
@@ -474,8 +506,19 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
+	if (c->metacopy) {
+		err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
+					 NULL, 0, -EOPNOTSUPP);
+		if (err)
+			return err;
+	}
+
+
 	inode_lock(temp->d_inode);
-	err = ovl_set_attr(temp, &c->stat);
+	if (c->metacopy)
+		err = ovl_set_size(temp, &c->stat);
+	if (!err)
+		err = ovl_set_attr(temp, &c->stat);
 	inode_unlock(temp->d_inode);
 
 	return err;
@@ -506,6 +549,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
 	if (err)
 		goto out_cleanup;
 
+	if (!c->metacopy && S_ISREG(d_inode(c->dentry)->i_mode))
+		ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
 	ovl_inode_update(d_inode(c->dentry), newdentry);
 out:
 	dput(temp);
@@ -632,7 +677,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	}
 	ovl_do_check_copy_up(ctx.lowerpath.dentry);
 
-	err = ovl_copy_up_start(dentry);
+	err = ovl_copy_up_start(dentry, flags);
 	/* err < 0: interrupted, err > 0: raced with another copy-up */
 	if (unlikely(err)) {
 		if (err > 0)
@@ -642,6 +687,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			err = ovl_do_copy_up(&ctx);
 		if (!err && !ovl_dentry_has_upper_alias(dentry))
 			err = ovl_link_up(&ctx);
+		if (!err && ovl_dentry_needs_data_copy_up(dentry, flags))
+			err = ovl_copy_up_meta_inode_data(&ctx);
 		ovl_copy_up_end(dentry);
 	}
 	do_delayed_call(&done);
@@ -672,8 +719,10 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 		 *      with rename.
 		 */
 		if (ovl_dentry_upper(dentry) &&
-		    ovl_dentry_has_upper_alias(dentry))
+		    ovl_dentry_has_upper_alias(dentry) &&
+		    !ovl_dentry_needs_data_copy_up(dentry, flags)) {
 			break;
+		}
 
 		next = dget(dentry);
 		/* find the topmost dentry not yet copied up */
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 321511e..1b4b42c 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -320,7 +320,8 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
 static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
 {
 	if (ovl_dentry_upper(dentry) &&
-	    ovl_dentry_has_upper_alias(dentry))
+	    ovl_dentry_has_upper_alias(dentry) &&
+	    !ovl_dentry_needs_data_copy_up(dentry, flags))
 		return false;
 
 	if (special_file(d_inode(dentry)->i_mode))
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index d9a0edd..714abf9 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -26,10 +26,12 @@ enum ovl_path_type {
 #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin"
 #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
 #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
+#define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
 
 enum ovl_flag {
 	OVL_IMPURE,
 	OVL_INDEX,
+	OVL_UPPERDATA,
 };
 
 /*
@@ -211,6 +213,8 @@ bool ovl_dentry_is_whiteout(struct dentry *dentry);
 void ovl_dentry_set_opaque(struct dentry *dentry);
 bool ovl_dentry_has_upper_alias(struct dentry *dentry);
 void ovl_dentry_set_upper_alias(struct dentry *dentry);
+bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
+bool ovl_dentry_check_upperdata(struct dentry *dentry);
 bool ovl_redirect_dir(struct super_block *sb);
 const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
@@ -221,7 +225,7 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
 struct file *ovl_path_open(struct path *path, int flags);
-int ovl_copy_up_start(struct dentry *dentry);
+int ovl_copy_up_start(struct dentry *dentry, int flags);
 void ovl_copy_up_end(struct dentry *dentry);
 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
 int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index a4ce1c9..ef720a9 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -227,6 +227,35 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
 	oe->has_upper = true;
 }
 
+bool ovl_dentry_check_upperdata(struct dentry *dentry) {
+	if (!S_ISREG(d_inode(dentry)->i_mode))
+		return false;
+
+	if (!ovl_dentry_lower(dentry))
+		return false;
+
+	return true;
+}
+
+bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
+	if (!ovl_dentry_check_upperdata(dentry))
+		return false;
+
+	if (!S_ISREG(d_inode(dentry)->i_mode))
+		return false;
+
+	if (!flags)
+		return false;
+
+	if (!(OPEN_FMODE(flags) & FMODE_WRITE))
+		return false;
+
+	if (likely(ovl_test_flag(OVL_UPPERDATA, d_inode(dentry))))
+		return false;
+
+	return true;
+}
+
 bool ovl_redirect_dir(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
@@ -310,13 +339,14 @@ struct file *ovl_path_open(struct path *path, int flags)
 	return dentry_open(path, flags | O_NOATIME, current_cred());
 }
 
-int ovl_copy_up_start(struct dentry *dentry)
+int ovl_copy_up_start(struct dentry *dentry, int flags)
 {
 	struct ovl_inode *oi = OVL_I(d_inode(dentry));
 	int err;
 
 	err = mutex_lock_interruptible(&oi->lock);
-	if (!err && ovl_dentry_has_upper_alias(dentry)) {
+	if (!err && ovl_dentry_has_upper_alias(dentry) &&
+	    !ovl_dentry_needs_data_copy_up(dentry, flags)) {
 		err = 1; /* Already copied up */
 		mutex_unlock(&oi->lock);
 	}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 08/13] ovl: Fix ovl_getattr() to get number of blocks from lower
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (6 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  6:12   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 09/13] ovl: Set OVL_UPPERDATA flag during ovl_lookup() Vivek Goyal
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

If an inode has been copied up metadata only, then we need to query the
number of blocks from lower and fill up the stat->st_blocks.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/inode.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1b4b42c..dcc012c 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,6 +140,18 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	if (!is_dir && ovl_test_flag(OVL_INDEX, d_inode(dentry)))
 		stat->nlink = dentry->d_inode->i_nlink;
 
+	if (ovl_dentry_upper(dentry) && ovl_dentry_check_upperdata(dentry) &&
+	    !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry))) {
+		struct kstat lowerstat;
+
+		if (WARN_ON(!OVL_TYPE_ORIGIN(type)))
+			goto out;
+		ovl_path_lower(dentry, &realpath);
+		err = vfs_getattr(&realpath, &lowerstat, STATX_BLOCKS, flags);
+		if (err)
+			goto out;
+		stat->blocks = lowerstat.blocks;
+	}
 out:
 	revert_creds(old_cred);
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 09/13] ovl: Set OVL_UPPERDATA flag during ovl_lookup()
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (7 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 08/13] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  6:19   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 10/13] ovl: Return lower dentry if only metadata copy up took place Vivek Goyal
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
set OVL_UPPERDATA bit in flags. This is only done for upper inode with a
corresponding lower dentry and file needs to be a regular file. Basically
any file which is eligible for metadata only copy up.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/namei.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index aec543d..52cd6ca 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -26,6 +26,24 @@ struct ovl_lookup_data {
 	char *redirect;
 };
 
+/* err < 0, 0 if no metacopy xattr, 1 if metacopy xattr found */
+static int ovl_check_metacopy(struct dentry *dentry)
+{
+	int res;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_METACOPY, NULL, 0);
+	if (res < 0) {
+		if (res == -ENODATA || res == -EOPNOTSUPP)
+			return 0;
+		goto out;
+	}
+
+	return 1;
+out:
+	pr_warn_ratelimited("overlayfs: failed to get metacopy (%i)\n", res);
+	return res;
+}
+
 static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
 			      size_t prelen, const char *post)
 {
@@ -594,6 +612,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct dentry *this;
 	unsigned int i;
 	int err;
+	bool metacopy = false;
 	struct ovl_lookup_data d = {
 		.name = dentry->d_name,
 		.is_dir = false,
@@ -634,6 +653,20 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 					       roe->numlower, &stack, &ctr);
 			if (err)
 				goto out_put_upper;
+
+			err = ovl_check_metacopy(upperdentry);
+			metacopy = err;
+			if (err < 0)
+				goto out_put_upper;
+			if (metacopy && !ctr) {
+				/*
+				 * Found an upper with metacopy set but at the
+				 * same time there is no lower dentry. Something
+				 * is not right.
+				 */
+				err = -ESTALE;
+				goto out_put_upper;
+			}
 		}
 
 		if (d.redirect) {
@@ -719,6 +752,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		OVL_I(inode)->redirect = upperredirect;
 		if (index)
 			ovl_set_flag(OVL_INDEX, inode);
+
+		if (upperdentry && ctr && S_ISREG(inode->i_mode) && !metacopy)
+			ovl_set_flag(OVL_UPPERDATA, inode);
 	}
 
 	revert_creds(old_cred);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 10/13] ovl: Return lower dentry if only metadata copy up took place
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (8 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 09/13] ovl: Set OVL_UPPERDATA flag during ovl_lookup() Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-25 19:09 ` [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

Upper dentry inode does not have data. So return lower dentry if upper
is only a metadata copy.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/super.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 32e3d4b..4cf1f98 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -101,10 +101,15 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 			err = ovl_check_append_only(d_inode(real), open_flags);
 			if (err)
 				return ERR_PTR(err);
+
+			if (ovl_dentry_check_upperdata(dentry) &&
+			    !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
+				goto lower;
 		}
 		return real;
 	}
 
+lower:
 	real = ovl_dentry_lower(dentry);
 	if (!real)
 		goto bug;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (9 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 10/13] ovl: Return lower dentry if only metadata copy up took place Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  6:34   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 12/13] ovl: Do not export metacopy only upper dentry Vivek Goyal
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

If a file is copied up metadata only and later when same file is opened
for WRITE, then data copy up takes place. We copy up data, remove METACOPY
xattr and then set the UPPERDATA flag in ovl_entry->flags. While all
these operations happen with oi->lock held, read side of oi->flags is
lockless. That is another thread on another cpu can check if UPPERDATA
flag is set or not.

So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
another cpu sees UPPERDATA flag set, then it should be guaranteed that
effects of data copy up and remove xattr operations are also visible.

For example.

	CPU1				CPU2
ovl_copy_up_flags()			acquire(oi->lock)
 ovl_dentry_needs_data_copy_up()	  ovl_copy_up_data()
   ovl_test_flag(OVL_UPPERDATA)		  vfs_removexattr()
					  ovl_set_flag(OVL_UPPERDATA)
					release(oi->lock)

Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
CPU1 perceives the effects of setting UPPERDATA flag but not effects of
preceeding operations, that would be a problem.

Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
and smp_rmb() on UPPERDATA flag test operation.

May be some other lock or barrier is already covering it. But I am not sure
what that is and is it obvious enough that we will not break it in future.

So hence trying to be safe here and introducing barriers explicitly for
UPPERDATA flag/bit.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c |  7 ++++++-
 fs/overlayfs/super.c   | 13 ++++++++++---
 fs/overlayfs/util.c    | 11 ++++++++++-
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index a6cda02..4876ae4 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -466,7 +466,12 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
 	err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
 	if (err)
 		return err;
-
+	/*
+	 * Pairs with smp_rmb() in ovl_dentry_needs_data_copy_up(). Make sure
+	 * if OVL_UPPERDATA flag is visible, then all the write operations
+	 * before it are visible as well.
+	 */
+	smp_wmb();
 	ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
 	return err;
 }
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4cf1f98..e97dccb 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -102,9 +102,16 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 			if (err)
 				return ERR_PTR(err);
 
-			if (ovl_dentry_check_upperdata(dentry) &&
-			    !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
-				goto lower;
+			if (ovl_dentry_check_upperdata(dentry)) {
+				if (!ovl_test_flag(OVL_UPPERDATA,
+				    d_inode(dentry)))
+					goto lower;
+				/*
+				 * Pairs with smp_wmb in
+				 * ovl_copy_up_meta_inode_data()
+				 */
+				smp_rmb();
+			}
 		}
 		return real;
 	}
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index ef720a9..d0f3bf7 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -238,6 +238,8 @@ bool ovl_dentry_check_upperdata(struct dentry *dentry) {
 }
 
 bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
+	bool upperdata;
+
 	if (!ovl_dentry_check_upperdata(dentry))
 		return false;
 
@@ -250,7 +252,14 @@ bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
 	if (!(OPEN_FMODE(flags) & FMODE_WRITE))
 		return false;
 
-	if (likely(ovl_test_flag(OVL_UPPERDATA, d_inode(dentry))))
+	upperdata = ovl_test_flag(OVL_UPPERDATA, d_inode(dentry));
+	/*
+	 * Pairs with smp_wmb() in ovl_copy_up_meta_inode_data(). Make sure
+	 * if setting of OVL_UPPERDATA is visible, then effects of writes
+	 * before that are visible too.
+	 */
+	smp_rmb();
+	if (upperdata)
 		return false;
 
 	return true;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/13] ovl: Do not export metacopy only upper dentry
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (10 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  6:54   ` Amir Goldstein
  2017-10-25 19:09 ` [PATCH 13/13] ovl: Enable metadata only feature Vivek Goyal
  2017-10-26  7:18 ` [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Amir Goldstein
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

d_real() can make a upper metacopy dentry/inode visible to the vfs layer.
This is something new and vfs layer does not know that this inode contains
only metadata and not data. And this could break things.

So to be safe, do not export metacopy only dentry/inode to vfs using d_real().

If d_real() is called with flag D_REAL_UPPER, return upper dentry only if
it has data (flag OVL_UPPERDATA is set).

Similiarly, if d_real(inode=X) is called, a warning is emitted if returned
dentry/inode does not have OVL_UPPERDATA set. This should not happen as
we never made this metacopy inode visible to vfs so nobody should be calling
overlayfs back with inode=metacopy_inode.

I scanned the code and I don't think it breaks any of the existing code.
There are two users of D_REAL_UPPER. may_write_real() and
update_ovl_inode_times().

may_write_real(), will get an NULL dentry if upper inode is metacopy only
and it will return -EPERM. Effectively, we are disallowing modifications
to metacopy only inode from this interface. Though there is opportunity
to improve it. (Allow chattr on metacopy inodes).

update_ovl_inode_times() gets inode mtime and ctime from real inode. It
should not be broken for metacopy inode as well for following reasons.

- For any metadata operations (setattr, acl etc), overlay always calls
  ovl_copyattr() and updates ovl inode mtime and ctime. So there is no
  need to update mtime and ctime int his case. Its already updated.

- For metadata inode, mtime should be same as lower and not change. (data
  can't be modified on metadata inode without copyup).

- For file writes, ctime and mtime will be updated. But in that case
  first data will be copied up and this will not be a metadata inode
  anymore. And furthr call to d_real(D_REAL_UPPER) will return upper
  inode and new mtime and ctime will be obtainable.

So atime updates should work just fine for metacopy inodes.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/super.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index e97dccb..dc8909a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -80,8 +80,18 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 	struct dentry *real;
 	int err;
 
-	if (flags & D_REAL_UPPER)
-		return ovl_dentry_upper(dentry);
+	if (flags & D_REAL_UPPER) {
+		real = ovl_dentry_upper(dentry);
+		if (!real)
+			return NULL;
+		if (!ovl_dentry_check_upperdata(dentry))
+			return real;
+		if (!ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
+			return NULL;
+		/* Pairs with smp_wmb() in ovl_copy_up_meta_inode_data() */
+		smp_rmb();
+		return real;
+	}
 
 	if (!d_is_reg(dentry)) {
 		if (!inode || inode == d_inode(dentry))
@@ -113,6 +123,9 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 				smp_rmb();
 			}
 		}
+
+		WARN_ON(ovl_dentry_check_upperdata(dentry) &&
+			!ovl_test_flag((OVL_UPPERDATA), d_inode(dentry)));
 		return real;
 	}
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/13] ovl: Enable metadata only feature
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (11 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 12/13] ovl: Do not export metacopy only upper dentry Vivek Goyal
@ 2017-10-25 19:09 ` Vivek Goyal
  2017-10-26  7:07   ` Amir Goldstein
  2017-10-26  7:18 ` [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Amir Goldstein
  13 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-25 19:09 UTC (permalink / raw)
  To: linux-unionfs; +Cc: amir73il, miklos, vgoyal

All the bits are in patches before this. So it is time to enable the
metadata only copy up feature.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 4876ae4..3ce35e1 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -637,12 +637,13 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	int err;
 	DEFINE_DELAYED_CALL(done);
 	struct path parentpath;
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 	struct ovl_copy_up_ctx ctx = {
 		.parent = parent,
 		.dentry = dentry,
 		.workdir = ovl_workdir(dentry),
 		.origin = true,
-		.metacopy = false,
+		.metacopy = ofs->config.metacopy,
 	};
 
 	if (WARN_ON(!ctx.workdir))
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/13] ovl: Put upperdentry if ovl_check_origin() fails
  2017-10-25 19:09 ` [PATCH 01/13] ovl: Put upperdentry if ovl_check_origin() fails Vivek Goyal
@ 2017-10-25 20:08   ` Amir Goldstein
  0 siblings, 0 replies; 44+ messages in thread
From: Amir Goldstein @ 2017-10-25 20:08 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> If ovl_check_origin() fails, we should put upperdentry. We have a reference
> on it by now. So goto out_put_upper instead of out.
>

This one needs CC stable and Fixes: tags

> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/namei.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index 88ff1da..aec543d 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -633,7 +633,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                         err = ovl_check_origin(upperdentry, roe->lowerstack,
>                                                roe->numlower, &stack, &ctr);
>                         if (err)
> -                               goto out;
> +                               goto out_put_upper;
>                 }
>
>                 if (d.redirect) {
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/13] ovl: Create origin xattr on copy up for all files
  2017-10-25 19:09 ` [PATCH 02/13] ovl: Create origin xattr on copy up for all files Vivek Goyal
@ 2017-10-26  5:31   ` Amir Goldstein
  2017-10-26 12:53     ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  5:31 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Right now my understanding is that origin xattr is created for all copied
> up files if index=on. And if index=off, then we create it for all type
> of files except hardlinks (nlink != 1).
>
> With metadata only copy up, I will still require origin xattr to copy up
> data later, so create it even for hardlinks even with index=off.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/copy_up.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index c441f93..8a8f538 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -470,9 +470,6 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>         /*
>          * Store identifier of lower inode in upper inode xattr to
>          * allow lookup of the copy up origin inode.
> -        *
> -        * Don't set origin when we are breaking the association with a lower
> -        * hard link.
>          */
>         if (c->origin) {
>                 err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
> @@ -538,9 +535,6 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)
>             c->stat.nlink > 1)
>                 indexed = true;
>
> -       if (S_ISDIR(c->stat.mode) || c->stat.nlink == 1 || indexed)
> -               c->origin = true;
> -
>         if (indexed) {
>                 c->destdir = ovl_indexdir(c->dentry->d_sb);
>                 err = ovl_get_index_name(c->lowerpath.dentry, &c->destname);
> @@ -596,6 +590,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>                 .parent = parent,
>                 .dentry = dentry,
>                 .workdir = ovl_workdir(dentry),
> +               .origin = true,

If its always true, please remove the condition variable, but if it
were up to me,
I would set it to true only when metacopy is enabled for now and maybe relax it
unconditionally later on

>         };
>
>         if (WARN_ON(!ctx.workdir))
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-25 19:09 ` [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
@ 2017-10-26  5:39   ` Amir Goldstein
  2017-10-26 13:15     ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  5:39 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> By default metadata only copy up is disabled. Provide a mount option so
> that users can choose one way or other.
>
> Also provide a kernel config and module option to enable/disable
> metacopy feature.
>
> Like index feature, when overlay is mounted, on root upper directory we
> set ORIGIN which points to lower. And at later mount time it is verified
> again. This hopes to get the configuration right. But this does only so
> much as we don't verify all the lowers. So it is possible that a lower is
> missing and later data copy up fails.

Since you bother to specify the motivation for verifying lower root dir,
FYI, the main motivation was to detect the case of copied layers.
I still hold the opinion that new incompat features should enforce
exclusive dir lock, so withholding Reviewed-by on this one for now.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/13] ovl: During copy up, first copy up metadata and then data
  2017-10-25 19:09 ` [PATCH 05/13] ovl: During copy up, first copy up metadata and then data Vivek Goyal
@ 2017-10-26  5:42   ` Amir Goldstein
  2017-10-26 13:19     ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  5:42 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Just a little re-ordering of code. This helps with next patch where after
> copying up metadata, we skip data copying step, if needed.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

Please add
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

if you post again with no changes to patch,
so help me avoid re-review

Thanks

> ---
>  fs/overlayfs/copy_up.c | 33 ++++++++++++++++-----------------
>  1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 8a8f538..4f580ec 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -445,6 +445,21 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>  {
>         int err;
>
> +       err = ovl_copy_xattr(c->lowerpath.dentry, temp);
> +       if (err)
> +               return err;
> +
> +       /*
> +        * Store identifier of lower inode in upper inode xattr to
> +        * allow lookup of the copy up origin inode.
> +        *
> +        */
> +       if (c->origin) {
> +               err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
> +               if (err)
> +                       return err;
> +       }
> +
>         if (S_ISREG(c->stat.mode)) {
>                 struct path upperpath;
>
> @@ -457,27 +472,11 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>                         return err;
>         }
>
> -       err = ovl_copy_xattr(c->lowerpath.dentry, temp);
> -       if (err)
> -               return err;
> -
>         inode_lock(temp->d_inode);
>         err = ovl_set_attr(temp, &c->stat);
>         inode_unlock(temp->d_inode);
> -       if (err)
> -               return err;
> -
> -       /*
> -        * Store identifier of lower inode in upper inode xattr to
> -        * allow lookup of the copy up origin inode.
> -        */
> -       if (c->origin) {
> -               err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
> -               if (err)
> -                       return err;
> -       }
>
> -       return 0;
> +       return err;
>  }
>
>  static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2017-10-25 19:09 ` [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
@ 2017-10-26  6:04   ` Amir Goldstein
  2017-10-26 13:53     ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  6:04 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Now we will have the capability to have upper inodes which might be only
> metadata copy up and data is still on lower inode. So add a new xattr
> OVL_XATTR_METACOPY to distinguish between two cases.
>
> Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata
> only and and data will be copied up later from lower origin.
> So this xattr is set when a metadata copy takes place and cleared when
> data copy takes place.
>
> We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
> whether ovl inode has data or not (as opposed to metadata only copy up).
>
> Note, OVL_UPPERDATA is set only on those upper inodes which can possibly
> be metacopy only inodes. For example, inode should be a regular file and
> there needs to be a lower dentry.

Why? is it not simpler and better for readability to set it in all
cases except metacopy?
Doesn't matter much, but seems to me that code will look better.
That is not to say you don't need a helper "should I check the flag".

>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/copy_up.c   | 55 +++++++++++++++++++++++++++++++++++++++++++++---
>  fs/overlayfs/inode.c     |  3 ++-
>  fs/overlayfs/overlayfs.h |  6 +++++-
>  fs/overlayfs/util.c      | 34 ++++++++++++++++++++++++++++--
>  4 files changed, 91 insertions(+), 7 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 2936284..a6cda02 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -196,6 +196,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>         return error;
>  }
>
> +static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
> +{
> +       struct iattr attr = {
> +               .ia_valid = ATTR_SIZE,
> +               .ia_size = stat->size,
> +       };
> +
> +       return notify_change(upperdentry, &attr, NULL);
> +}
> +
>  static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
>  {
>         struct iattr attr = {
> @@ -439,6 +449,28 @@ static int ovl_get_tmpfile(struct ovl_copy_up_ctx *c, struct dentry **tempp)
>         goto out;
>  }
>
> +/* Copy up data of an inode which was copied up metadata only in the past. */
> +static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
> +{
> +       struct path upperpath;
> +       int err;
> +
> +       ovl_path_upper(c->dentry, &upperpath);
> +       if (WARN_ON(upperpath.dentry == NULL))
> +               return -EIO;
> +
> +       err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
> +       if (err)
> +               return err;
> +
> +       err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
> +       if (err)
> +               return err;
> +
> +       ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
> +       return err;
> +}
> +
>  static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>  {
>         int err;
> @@ -474,8 +506,19 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>                         return err;
>         }
>
> +       if (c->metacopy) {
> +               err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
> +                                        NULL, 0, -EOPNOTSUPP);
> +               if (err)
> +                       return err;
> +       }
> +
> +
>         inode_lock(temp->d_inode);
> -       err = ovl_set_attr(temp, &c->stat);
> +       if (c->metacopy)
> +               err = ovl_set_size(temp, &c->stat);
> +       if (!err)
> +               err = ovl_set_attr(temp, &c->stat);
>         inode_unlock(temp->d_inode);
>
>         return err;
> @@ -506,6 +549,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
>         if (err)
>                 goto out_cleanup;
>
> +       if (!c->metacopy && S_ISREG(d_inode(c->dentry)->i_mode))
> +               ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
>         ovl_inode_update(d_inode(c->dentry), newdentry);
>  out:
>         dput(temp);
> @@ -632,7 +677,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>         }
>         ovl_do_check_copy_up(ctx.lowerpath.dentry);
>
> -       err = ovl_copy_up_start(dentry);
> +       err = ovl_copy_up_start(dentry, flags);
>         /* err < 0: interrupted, err > 0: raced with another copy-up */
>         if (unlikely(err)) {
>                 if (err > 0)
> @@ -642,6 +687,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>                         err = ovl_do_copy_up(&ctx);
>                 if (!err && !ovl_dentry_has_upper_alias(dentry))
>                         err = ovl_link_up(&ctx);
> +               if (!err && ovl_dentry_needs_data_copy_up(dentry, flags))
> +                       err = ovl_copy_up_meta_inode_data(&ctx);
>                 ovl_copy_up_end(dentry);
>         }
>         do_delayed_call(&done);
> @@ -672,8 +719,10 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
>                  *      with rename.
>                  */
>                 if (ovl_dentry_upper(dentry) &&
> -                   ovl_dentry_has_upper_alias(dentry))
> +                   ovl_dentry_has_upper_alias(dentry) &&
> +                   !ovl_dentry_needs_data_copy_up(dentry, flags)) {
>                         break;
> +               }
>
>                 next = dget(dentry);
>                 /* find the topmost dentry not yet copied up */
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 321511e..1b4b42c 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -320,7 +320,8 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
>  static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
>  {
>         if (ovl_dentry_upper(dentry) &&
> -           ovl_dentry_has_upper_alias(dentry))
> +           ovl_dentry_has_upper_alias(dentry) &&
> +           !ovl_dentry_needs_data_copy_up(dentry, flags))
>                 return false;
>
>         if (special_file(d_inode(dentry)->i_mode))
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index d9a0edd..714abf9 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -26,10 +26,12 @@ enum ovl_path_type {
>  #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin"
>  #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
>  #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
> +#define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
>
>  enum ovl_flag {
>         OVL_IMPURE,
>         OVL_INDEX,
> +       OVL_UPPERDATA,
>  };
>
>  /*
> @@ -211,6 +213,8 @@ bool ovl_dentry_is_whiteout(struct dentry *dentry);
>  void ovl_dentry_set_opaque(struct dentry *dentry);
>  bool ovl_dentry_has_upper_alias(struct dentry *dentry);
>  void ovl_dentry_set_upper_alias(struct dentry *dentry);
> +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
> +bool ovl_dentry_check_upperdata(struct dentry *dentry);
>  bool ovl_redirect_dir(struct super_block *sb);
>  const char *ovl_dentry_get_redirect(struct dentry *dentry);
>  void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
> @@ -221,7 +225,7 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
>  u64 ovl_dentry_version_get(struct dentry *dentry);
>  bool ovl_is_whiteout(struct dentry *dentry);
>  struct file *ovl_path_open(struct path *path, int flags);
> -int ovl_copy_up_start(struct dentry *dentry);
> +int ovl_copy_up_start(struct dentry *dentry, int flags);
>  void ovl_copy_up_end(struct dentry *dentry);
>  bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
>  int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index a4ce1c9..ef720a9 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -227,6 +227,35 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
>         oe->has_upper = true;
>  }
>
> +bool ovl_dentry_check_upperdata(struct dentry *dentry) {

Not a good helper name IMO. it reads to me "check if upperdata is set"
and it should read "should I check upperdata flag"

> +       if (!S_ISREG(d_inode(dentry)->i_mode))
> +               return false;
> +
> +       if (!ovl_dentry_lower(dentry))
> +               return false;
> +
> +       return true;
> +}
> +
> +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> +       if (!ovl_dentry_check_upperdata(dentry))
> +               return false;
> +
> +       if (!S_ISREG(d_inode(dentry)->i_mode))
> +               return false;

Isn't this redundant to the helper check?

> +
> +       if (!flags)
> +               return false;
> +
> +       if (!(OPEN_FMODE(flags) & FMODE_WRITE))
> +               return false;
> +
> +       if (likely(ovl_test_flag(OVL_UPPERDATA, d_inode(dentry))))
> +               return false;
> +
> +       return true;
> +}
> +
>  bool ovl_redirect_dir(struct super_block *sb)
>  {
>         struct ovl_fs *ofs = sb->s_fs_info;
> @@ -310,13 +339,14 @@ struct file *ovl_path_open(struct path *path, int flags)
>         return dentry_open(path, flags | O_NOATIME, current_cred());
>  }
>
> -int ovl_copy_up_start(struct dentry *dentry)
> +int ovl_copy_up_start(struct dentry *dentry, int flags)
>  {
>         struct ovl_inode *oi = OVL_I(d_inode(dentry));
>         int err;
>
>         err = mutex_lock_interruptible(&oi->lock);
> -       if (!err && ovl_dentry_has_upper_alias(dentry)) {
> +       if (!err && ovl_dentry_has_upper_alias(dentry) &&
> +           !ovl_dentry_needs_data_copy_up(dentry, flags)) {
>                 err = 1; /* Already copied up */
>                 mutex_unlock(&oi->lock);
>         }
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 08/13] ovl: Fix ovl_getattr() to get number of blocks from lower
  2017-10-25 19:09 ` [PATCH 08/13] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
@ 2017-10-26  6:12   ` Amir Goldstein
  0 siblings, 0 replies; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  6:12 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> If an inode has been copied up metadata only, then we need to query the
> number of blocks from lower and fill up the stat->st_blocks.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/inode.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 1b4b42c..dcc012c 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -140,6 +140,18 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>         if (!is_dir && ovl_test_flag(OVL_INDEX, d_inode(dentry)))
>                 stat->nlink = dentry->d_inode->i_nlink;
>
> +       if (ovl_dentry_upper(dentry) && ovl_dentry_check_upperdata(dentry) &&
> +           !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry))) {
> +               struct kstat lowerstat;
> +
> +               if (WARN_ON(!OVL_TYPE_ORIGIN(type)))
> +                       goto out;

Unnecessary complicated. OVL_TYPE_ORIGIN(type) already tells you that you
have both an upper and a lower so if you set OVL_UPPERDATA for all objects
that can safely access upper inode data, you will need just:

if (OVL_TYPE_ORIGIN(type) && !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))

or test flag first and keep the WARN_ON for OVL_UPPERDATA && !OVL_TYPE_ORIGIN


> +               ovl_path_lower(dentry, &realpath);
> +               err = vfs_getattr(&realpath, &lowerstat, STATX_BLOCKS, flags);
> +               if (err)
> +                       goto out;
> +               stat->blocks = lowerstat.blocks;
> +       }
>  out:
>         revert_creds(old_cred);
>
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 09/13] ovl: Set OVL_UPPERDATA flag during ovl_lookup()
  2017-10-25 19:09 ` [PATCH 09/13] ovl: Set OVL_UPPERDATA flag during ovl_lookup() Vivek Goyal
@ 2017-10-26  6:19   ` Amir Goldstein
  2017-10-26 18:04     ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  6:19 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
> set OVL_UPPERDATA bit in flags. This is only done for upper inode with a
> corresponding lower dentry and file needs to be a regular file. Basically
> any file which is eligible for metadata only copy up.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/namei.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index aec543d..52cd6ca 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -26,6 +26,24 @@ struct ovl_lookup_data {
>         char *redirect;
>  };
>
> +/* err < 0, 0 if no metacopy xattr, 1 if metacopy xattr found */
> +static int ovl_check_metacopy(struct dentry *dentry)
> +{
> +       int res;
> +
> +       res = vfs_getxattr(dentry, OVL_XATTR_METACOPY, NULL, 0);
> +       if (res < 0) {
> +               if (res == -ENODATA || res == -EOPNOTSUPP)
> +                       return 0;
> +               goto out;
> +       }
> +
> +       return 1;
> +out:
> +       pr_warn_ratelimited("overlayfs: failed to get metacopy (%i)\n", res);
> +       return res;
> +}
> +
>  static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
>                               size_t prelen, const char *post)
>  {
> @@ -594,6 +612,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         struct dentry *this;
>         unsigned int i;
>         int err;
> +       bool metacopy = false;
>         struct ovl_lookup_data d = {
>                 .name = dentry->d_name,
>                 .is_dir = false,
> @@ -634,6 +653,20 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                                                roe->numlower, &stack, &ctr);
>                         if (err)
>                                 goto out_put_upper;
> +
> +                       err = ovl_check_metacopy(upperdentry);

You should avoid checking metacopy xattr for file types that are not
eligable.

> +                       metacopy = err;
> +                       if (err < 0)
> +                               goto out_put_upper;
> +                       if (metacopy && !ctr) {
> +                               /*
> +                                * Found an upper with metacopy set but at the
> +                                * same time there is no lower dentry. Something
> +                                * is not right.
> +                                */
> +                               err = -ESTALE;
> +                               goto out_put_upper;
> +                       }
>                 }
>
>                 if (d.redirect) {
> @@ -719,6 +752,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 OVL_I(inode)->redirect = upperredirect;
>                 if (index)
>                         ovl_set_flag(OVL_INDEX, inode);
> +
> +               if (upperdentry && ctr && S_ISREG(inode->i_mode) && !metacopy)

Simpler: if (upperdentry && !metacopy)

> +                       ovl_set_flag(OVL_UPPERDATA, inode);
>         }
>
>         revert_creds(old_cred);
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-25 19:09 ` [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
@ 2017-10-26  6:34   ` Amir Goldstein
  2017-10-26 17:54     ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  6:34 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> If a file is copied up metadata only and later when same file is opened
> for WRITE, then data copy up takes place. We copy up data, remove METACOPY
> xattr and then set the UPPERDATA flag in ovl_entry->flags. While all
> these operations happen with oi->lock held, read side of oi->flags is
> lockless. That is another thread on another cpu can check if UPPERDATA
> flag is set or not.
>
> So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
> another cpu sees UPPERDATA flag set, then it should be guaranteed that
> effects of data copy up and remove xattr operations are also visible.
>
> For example.
>
>         CPU1                            CPU2
> ovl_copy_up_flags()                     acquire(oi->lock)
>  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data()
>    ovl_test_flag(OVL_UPPERDATA)           vfs_removexattr()
>                                           ovl_set_flag(OVL_UPPERDATA)
>                                         release(oi->lock)
>
> Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
> CPU1 perceives the effects of setting UPPERDATA flag but not effects of
> preceeding operations, that would be a problem.

Why would that be a problem?
What can go wrong?
If you try to answer the question instead of referring to a vague "problem"
you will see that only the ovl_d_real() code path can be a problem and maybe
(I did not check) ovl_getattr. Please change your example above to ovl_d_real()
code path of CPU1

>
> Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
> and smp_rmb() on UPPERDATA flag test operation.
>
> May be some other lock or barrier is already covering it. But I am not sure
> what that is and is it obvious enough that we will not break it in future.
>
> So hence trying to be safe here and introducing barriers explicitly for
> UPPERDATA flag/bit.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/copy_up.c |  7 ++++++-
>  fs/overlayfs/super.c   | 13 ++++++++++---
>  fs/overlayfs/util.c    | 11 ++++++++++-
>  3 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index a6cda02..4876ae4 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -466,7 +466,12 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
>         err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
>         if (err)
>                 return err;
> -
> +       /*
> +        * Pairs with smp_rmb() in ovl_dentry_needs_data_copy_up(). Make sure

Nope. only pairs with smp_rmpb() in ovl_d_real() (or in a new helper
you need to create)


> +        * if OVL_UPPERDATA flag is visible, then all the write operations
> +        * before it are visible as well.
> +        */
> +       smp_wmb();
>         ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
>         return err;
>  }
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 4cf1f98..e97dccb 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -102,9 +102,16 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
>                         if (err)
>                                 return ERR_PTR(err);
>
> -                       if (ovl_dentry_check_upperdata(dentry) &&
> -                           !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
> -                               goto lower;
> +                       if (ovl_dentry_check_upperdata(dentry)) {
> +                               if (!ovl_test_flag(OVL_UPPERDATA,
> +                                   d_inode(dentry)))
> +                                       goto lower;
> +                               /*
> +                                * Pairs with smp_wmb in
> +                                * ovl_copy_up_meta_inode_data()
> +                                */
> +                               smp_rmb();
> +                       }
>                 }
>                 return real;
>         }
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index ef720a9..d0f3bf7 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -238,6 +238,8 @@ bool ovl_dentry_check_upperdata(struct dentry *dentry) {
>  }
>
>  bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> +       bool upperdata;
> +
>         if (!ovl_dentry_check_upperdata(dentry))
>                 return false;
>
> @@ -250,7 +252,14 @@ bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
>         if (!(OPEN_FMODE(flags) & FMODE_WRITE))
>                 return false;
>
> -       if (likely(ovl_test_flag(OVL_UPPERDATA, d_inode(dentry))))
> +       upperdata = ovl_test_flag(OVL_UPPERDATA, d_inode(dentry));
> +       /*
> +        * Pairs with smp_wmb() in ovl_copy_up_meta_inode_data(). Make sure
> +        * if setting of OVL_UPPERDATA is visible, then effects of writes
> +        * before that are visible too.
> +        */
> +       smp_rmb();
> +       if (upperdata)

Nope. smp_rmb() is not needed here, because most of the places that
use this helper
will take a lock and call it again under lock.
You may need an explicit smp_rmb() also in getattr() though, so you
can create a new
helper that does exactly what the hunk in ovl_d_real does and reuse
the helper in ovl_getattr

>                 return false;
>
>         return true;
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/13] ovl: Do not export metacopy only upper dentry
  2017-10-25 19:09 ` [PATCH 12/13] ovl: Do not export metacopy only upper dentry Vivek Goyal
@ 2017-10-26  6:54   ` Amir Goldstein
  2017-10-26  6:54     ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  6:54 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> d_real() can make a upper metacopy dentry/inode visible to the vfs layer.
> This is something new and vfs layer does not know that this inode contains
> only metadata and not data. And this could break things.
>
> So to be safe, do not export metacopy only dentry/inode to vfs using d_real().
>
> If d_real() is called with flag D_REAL_UPPER, return upper dentry only if
> it has data (flag OVL_UPPERDATA is set).
>
> Similiarly, if d_real(inode=X) is called, a warning is emitted if returned
> dentry/inode does not have OVL_UPPERDATA set. This should not happen as
> we never made this metacopy inode visible to vfs so nobody should be calling
> overlayfs back with inode=metacopy_inode.
>
> I scanned the code and I don't think it breaks any of the existing code.
> There are two users of D_REAL_UPPER. may_write_real() and
> update_ovl_inode_times().
>
> may_write_real(), will get an NULL dentry if upper inode is metacopy only
> and it will return -EPERM. Effectively, we are disallowing modifications
> to metacopy only inode from this interface. Though there is opportunity
> to improve it. (Allow chattr on metacopy inodes).
>
> update_ovl_inode_times() gets inode mtime and ctime from real inode. It
> should not be broken for metacopy inode as well for following reasons.
>
> - For any metadata operations (setattr, acl etc), overlay always calls
>   ovl_copyattr() and updates ovl inode mtime and ctime. So there is no
>   need to update mtime and ctime int his case. Its already updated.
>
> - For metadata inode, mtime should be same as lower and not change. (data
>   can't be modified on metadata inode without copyup).
>
> - For file writes, ctime and mtime will be updated. But in that case
>   first data will be copied up and this will not be a metadata inode
>   anymore. And furthr call to d_real(D_REAL_UPPER) will return upper
>   inode and new mtime and ctime will be obtainable.
>
> So atime updates should work just fine for metacopy inodes.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/super.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index e97dccb..dc8909a 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -80,8 +80,18 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
>         struct dentry *real;
>         int err;
>
> -       if (flags & D_REAL_UPPER)
> -               return ovl_dentry_upper(dentry);
> +       if (flags & D_REAL_UPPER) {
> +               real = ovl_dentry_upper(dentry);
> +               if (!real)
> +                       return NULL;


bool ovl_dentry_is_metacopy(dentry) {
> +               if (!ovl_dentry_check_upperdata(dentry))
> +                       return false;
> +               if (!ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
> +                       return true;
> +               /* Pairs with smp_wmb() in ovl_copy_up_meta_inode_data() */
> +               smp_rmb();
                    return false;
}

to be used 2 times in ovl_d_real() and in ovl_getattr()


> +               return real;
> +       }
>
>         if (!d_is_reg(dentry)) {
>                 if (!inode || inode == d_inode(dentry))
> @@ -113,6 +123,9 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
>                                 smp_rmb();
>                         }
>                 }
> +
> +               WARN_ON(ovl_dentry_check_upperdata(dentry) &&
> +                       !ovl_test_flag((OVL_UPPERDATA), d_inode(dentry)));

Instead of checking flag twice, check it once above the if (!inode), e.g:

bool metacopy = ovl_dentry_is_metacopy(dentry);
if (!inode) {
        ...
        if (unlikely(metacopy))
                goto lower;
} else if (unlikely(metacopy))
        goto bug;
}

>                 return real;
>         }
>
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/13] ovl: Do not export metacopy only upper dentry
  2017-10-26  6:54   ` Amir Goldstein
@ 2017-10-26  6:54     ` Amir Goldstein
  0 siblings, 0 replies; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  6:54 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 9:54 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> d_real() can make a upper metacopy dentry/inode visible to the vfs layer.
>> This is something new and vfs layer does not know that this inode contains
>> only metadata and not data. And this could break things.
>>
>> So to be safe, do not export metacopy only dentry/inode to vfs using d_real().

Also please change subject and this line to "do not expose..."

>>
>> If d_real() is called with flag D_REAL_UPPER, return upper dentry only if
>> it has data (flag OVL_UPPERDATA is set).
>>
>> Similiarly, if d_real(inode=X) is called, a warning is emitted if returned
>> dentry/inode does not have OVL_UPPERDATA set. This should not happen as
>> we never made this metacopy inode visible to vfs so nobody should be calling
>> overlayfs back with inode=metacopy_inode.
>>
>> I scanned the code and I don't think it breaks any of the existing code.
>> There are two users of D_REAL_UPPER. may_write_real() and
>> update_ovl_inode_times().
>>
>> may_write_real(), will get an NULL dentry if upper inode is metacopy only
>> and it will return -EPERM. Effectively, we are disallowing modifications
>> to metacopy only inode from this interface. Though there is opportunity
>> to improve it. (Allow chattr on metacopy inodes).
>>
>> update_ovl_inode_times() gets inode mtime and ctime from real inode. It
>> should not be broken for metacopy inode as well for following reasons.
>>
>> - For any metadata operations (setattr, acl etc), overlay always calls
>>   ovl_copyattr() and updates ovl inode mtime and ctime. So there is no
>>   need to update mtime and ctime int his case. Its already updated.
>>
>> - For metadata inode, mtime should be same as lower and not change. (data
>>   can't be modified on metadata inode without copyup).
>>
>> - For file writes, ctime and mtime will be updated. But in that case
>>   first data will be copied up and this will not be a metadata inode
>>   anymore. And furthr call to d_real(D_REAL_UPPER) will return upper
>>   inode and new mtime and ctime will be obtainable.
>>
>> So atime updates should work just fine for metacopy inodes.
>>
>> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> ---
>>  fs/overlayfs/super.c | 17 +++++++++++++++--
>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> index e97dccb..dc8909a 100644
>> --- a/fs/overlayfs/super.c
>> +++ b/fs/overlayfs/super.c
>> @@ -80,8 +80,18 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
>>         struct dentry *real;
>>         int err;
>>
>> -       if (flags & D_REAL_UPPER)
>> -               return ovl_dentry_upper(dentry);
>> +       if (flags & D_REAL_UPPER) {
>> +               real = ovl_dentry_upper(dentry);
>> +               if (!real)
>> +                       return NULL;
>
>
> bool ovl_dentry_is_metacopy(dentry) {
>> +               if (!ovl_dentry_check_upperdata(dentry))
>> +                       return false;
>> +               if (!ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
>> +                       return true;
>> +               /* Pairs with smp_wmb() in ovl_copy_up_meta_inode_data() */
>> +               smp_rmb();
>                     return false;
> }
>
> to be used 2 times in ovl_d_real() and in ovl_getattr()
>
>
>> +               return real;
>> +       }
>>
>>         if (!d_is_reg(dentry)) {
>>                 if (!inode || inode == d_inode(dentry))
>> @@ -113,6 +123,9 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
>>                                 smp_rmb();
>>                         }
>>                 }
>> +
>> +               WARN_ON(ovl_dentry_check_upperdata(dentry) &&
>> +                       !ovl_test_flag((OVL_UPPERDATA), d_inode(dentry)));
>
> Instead of checking flag twice, check it once above the if (!inode), e.g:
>
> bool metacopy = ovl_dentry_is_metacopy(dentry);
> if (!inode) {
>         ...
>         if (unlikely(metacopy))
>                 goto lower;
> } else if (unlikely(metacopy))
>         goto bug;
> }
>
>>                 return real;
>>         }
>>
>> --
>> 2.5.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 13/13] ovl: Enable metadata only feature
  2017-10-25 19:09 ` [PATCH 13/13] ovl: Enable metadata only feature Vivek Goyal
@ 2017-10-26  7:07   ` Amir Goldstein
  2017-10-26 18:19     ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  7:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> All the bits are in patches before this. So it is time to enable the
> metadata only copy up feature.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/copy_up.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 4876ae4..3ce35e1 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -637,12 +637,13 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>         int err;
>         DEFINE_DELAYED_CALL(done);
>         struct path parentpath;
> +       struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
>         struct ovl_copy_up_ctx ctx = {
>                 .parent = parent,
>                 .dentry = dentry,
>                 .workdir = ovl_workdir(dentry),
>                 .origin = true,
> -               .metacopy = false,
> +               .metacopy = ofs->config.metacopy,

This should have been better
.metacopy = ovl_dentry_needs_data_copy_up(dentry)
to begin with and the helper would start with

if (true) return; (with a TODO comment)

That this patch would fix to
if (!ofs->config.metacopy) return

Then you don't need to check ISREG and write flags below.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data
  2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (12 preceding siblings ...)
  2017-10-25 19:09 ` [PATCH 13/13] ovl: Enable metadata only feature Vivek Goyal
@ 2017-10-26  7:18 ` Amir Goldstein
  2017-10-27 16:40   ` Vivek Goyal
  13 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26  7:18 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Hi,
>
> Here is the V5 of patches. Following are changes from V4.

Please use git format-patch -v6 next time you post.
It tags all the patch subjects so it is easier to find the former
revision of specific patch.

>
> - Flipped the meaning of bit. Now OVL_METACOPY is called OVL_UPPERDATA and
>   dropped a barrier patch which implemented ordering requirements between
>   oi->flags and oi->__upperdentry.
>
> - Now ovl_d_real() does return upper dentry if upper dentry is metacopy only
>   dentry.

doesn't return upper dentry

But I am still not convinced that metacopy upper inode is not accessed in places
it shouldn't be accessed by vfs, via other APIs.

For example, ovl_permission(), ovl_getattr(), ovl_setattr(), ovl_listxattr(),
ovl_get_acl() will all access metacopy inode and that outcome will be desired.
But you did not provide a complete list of vfs APIs and verify for each
that accessing the metacopy upper via various variants of ovl_dentry_real()
is either not possible (because access requires a rw fd) or is the
desired outcome.

>
> - Added a fix to ovl_lookup() goto statement.

This patch is independent of this series AFAICT your patches don't
depend on it in.
Please post it separately with Fixes tag and CC stable

>
> - Renamed ->metadata_only to ->metacopy
>
> - Replaced BUG_ON() with WARN_ON() in ovl_copy_up_meta_inode_data()
>
> - Added WARN_ON(!OVL_TYPE_ORIGIN(type)) in ovl_getattr()
>
> - Took care of bunch of code organization comments from Amir.
>
> Original Description
> --------------------
> In one of the recent converstions, people mentioned that chown/chmod
> lead to copy up files as well as data. We could optimize it so that
> only metadata is copied up during chown/chmod and data is copied up when
> file is opened for WRITE.
>
> This optimization potentially could be useful with containers and user
> namespaces. In popular scenario, people end up doing chown() on whole
> image directory tree based on container mappings. And this chown copies
> up everything, breaking sharing of page cache between containers.
>
> With these patches, only metadat is copied up during chown() and if file
> is opened for READ, d_real() returns lower dentry/inode. That way,
> different containers can still continue to use page cache. That's the
> use case I have in mind.
>
> Basically, I am relying on storing OVL_XATTR_ORIGIN in upper inode
> during copy up. I use that information to get to lower inode later and
> do data copy up when necessary.
>
> I also store OVL_XATTR_METACOPY in upper inode to mark that only
> metadata has been copied up and data copy up still might be required.
>
> Any feedback is helpful.
>
> Thanks
> Vivek
>
> Vivek Goyal (13):
>   ovl: Put upperdentry if ovl_check_origin() fails
>   ovl: Create origin xattr on copy up for all files
>   ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check
>   ovl: Provide a mount option metacopy=on/off for metadata copyup
>   ovl: During copy up, first copy up metadata and then data
>   ovl: Copy up only metadata during copy up where it makes sense
>   ovl: A new xattr OVL_XATTR_METACOPY for file on upper
>   ovl: Fix ovl_getattr() to get number of blocks from lower
>   ovl: Set OVL_UPPERDATA flag during ovl_lookup()
>   ovl: Return lower dentry if only metadata copy up took place
>   ovl: Introduce read/write barriers around metacopy flag update
>   ovl: Do not export metacopy only upper dentry
>   ovl: Enable metadata only feature
>
>  fs/overlayfs/Kconfig     |   8 ++++
>  fs/overlayfs/copy_up.c   | 116 +++++++++++++++++++++++++++++++++++------------
>  fs/overlayfs/inode.c     |  15 +++++-
>  fs/overlayfs/namei.c     |  38 +++++++++++++++-
>  fs/overlayfs/overlayfs.h |   6 ++-
>  fs/overlayfs/ovl_entry.h |   1 +
>  fs/overlayfs/super.c     |  67 +++++++++++++++++++++++++--
>  fs/overlayfs/util.c      |  54 +++++++++++++++++-----
>  8 files changed, 257 insertions(+), 48 deletions(-)
>
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/13] ovl: Create origin xattr on copy up for all files
  2017-10-26  5:31   ` Amir Goldstein
@ 2017-10-26 12:53     ` Vivek Goyal
  0 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 12:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 08:31:48AM +0300, Amir Goldstein wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Right now my understanding is that origin xattr is created for all copied
> > up files if index=on. And if index=off, then we create it for all type
> > of files except hardlinks (nlink != 1).
> >
> > With metadata only copy up, I will still require origin xattr to copy up
> > data later, so create it even for hardlinks even with index=off.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/copy_up.c | 7 +------
> >  1 file changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index c441f93..8a8f538 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -470,9 +470,6 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
> >         /*
> >          * Store identifier of lower inode in upper inode xattr to
> >          * allow lookup of the copy up origin inode.
> > -        *
> > -        * Don't set origin when we are breaking the association with a lower
> > -        * hard link.
> >          */
> >         if (c->origin) {
> >                 err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
> > @@ -538,9 +535,6 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)
> >             c->stat.nlink > 1)
> >                 indexed = true;
> >
> > -       if (S_ISDIR(c->stat.mode) || c->stat.nlink == 1 || indexed)
> > -               c->origin = true;
> > -
> >         if (indexed) {
> >                 c->destdir = ovl_indexdir(c->dentry->d_sb);
> >                 err = ovl_get_index_name(c->lowerpath.dentry, &c->destname);
> > @@ -596,6 +590,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
> >                 .parent = parent,
> >                 .dentry = dentry,
> >                 .workdir = ovl_workdir(dentry),
> > +               .origin = true,
> 
> If its always true, please remove the condition variable,

Right. I will remove it.

> but if it
> were up to me,
> I would set it to true only when metacopy is enabled for now and maybe relax it
> unconditionally later on

I rather not make it metacopy dependent. If there are issus with
this, they better come out now and don't want these issues to be visible
only if metacopy=on. It becomes another corner case to handle then. Want
to avoid it.

Vivek

> 
> >         };
> >
> >         if (WARN_ON(!ctx.workdir))
> > --
> > 2.5.5
> >

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-26  5:39   ` Amir Goldstein
@ 2017-10-26 13:15     ` Vivek Goyal
  2017-10-26 13:57       ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 13:15 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 08:39:28AM +0300, Amir Goldstein wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > By default metadata only copy up is disabled. Provide a mount option so
> > that users can choose one way or other.
> >
> > Also provide a kernel config and module option to enable/disable
> > metacopy feature.
> >
> > Like index feature, when overlay is mounted, on root upper directory we
> > set ORIGIN which points to lower. And at later mount time it is verified
> > again. This hopes to get the configuration right. But this does only so
> > much as we don't verify all the lowers. So it is possible that a lower is
> > missing and later data copy up fails.
> 
> Since you bother to specify the motivation for verifying lower root dir,
> FYI, the main motivation was to detect the case of copied layers.

What are copied layers? You mean I remove original layer and bring in
another one with same top dir name?

But this tests only one layer, right. So if there are multiple lower
layers, I can still copy/remove or do something else with other layers
and it will not detect.

Also I should be able to retain top dir name same and replace anything
underneath and that will not be detected either.

So it provides only so much of protection, IIUC.

> I still hold the opinion that new incompat features should enforce
> exclusive dir lock, so withholding Reviewed-by on this one for now.

I am really not happy about enforcing exlusive dir lock because
this breaks userspace. 

I like the idea of making it hard to get configuration wrong, but overlay
was like this since the beginning and how many issues have we faced
that users ended up sharing upper. Atleast I have not come across
even a single one.

If that's the case, I would rahter wait that a proper fix comes along
(finding existing super block and returning that) and then harden it.

If I enforce it now, I most likely can't even use it with current
container run time because there is a possibility they can leak
mount points. So what's the point of introducing this feature.

Miklos, what do you think?

Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/13] ovl: During copy up, first copy up metadata and then data
  2017-10-26  5:42   ` Amir Goldstein
@ 2017-10-26 13:19     ` Vivek Goyal
  0 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 13:19 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 08:42:08AM +0300, Amir Goldstein wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Just a little re-ordering of code. This helps with next patch where after
> > copying up metadata, we skip data copying step, if needed.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> 
> Please add
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> 
> if you post again with no changes to patch,
> so help me avoid re-review

Sure will do. This patch had changed slightly (I started returing err
instead of 0). That's probably I did not include it. 

Vivek

> 
> Thanks
> 
> > ---
> >  fs/overlayfs/copy_up.c | 33 ++++++++++++++++-----------------
> >  1 file changed, 16 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 8a8f538..4f580ec 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -445,6 +445,21 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
> >  {
> >         int err;
> >
> > +       err = ovl_copy_xattr(c->lowerpath.dentry, temp);
> > +       if (err)
> > +               return err;
> > +
> > +       /*
> > +        * Store identifier of lower inode in upper inode xattr to
> > +        * allow lookup of the copy up origin inode.
> > +        *
> > +        */
> > +       if (c->origin) {
> > +               err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
> > +               if (err)
> > +                       return err;
> > +       }
> > +
> >         if (S_ISREG(c->stat.mode)) {
> >                 struct path upperpath;
> >
> > @@ -457,27 +472,11 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
> >                         return err;
> >         }
> >
> > -       err = ovl_copy_xattr(c->lowerpath.dentry, temp);
> > -       if (err)
> > -               return err;
> > -
> >         inode_lock(temp->d_inode);
> >         err = ovl_set_attr(temp, &c->stat);
> >         inode_unlock(temp->d_inode);
> > -       if (err)
> > -               return err;
> > -
> > -       /*
> > -        * Store identifier of lower inode in upper inode xattr to
> > -        * allow lookup of the copy up origin inode.
> > -        */
> > -       if (c->origin) {
> > -               err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
> > -               if (err)
> > -                       return err;
> > -       }
> >
> > -       return 0;
> > +       return err;
> >  }
> >
> >  static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
> > --
> > 2.5.5
> >

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2017-10-26  6:04   ` Amir Goldstein
@ 2017-10-26 13:53     ` Vivek Goyal
  2017-10-26 14:14       ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 13:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 09:04:17AM +0300, Amir Goldstein wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Now we will have the capability to have upper inodes which might be only
> > metadata copy up and data is still on lower inode. So add a new xattr
> > OVL_XATTR_METACOPY to distinguish between two cases.
> >
> > Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata
> > only and and data will be copied up later from lower origin.
> > So this xattr is set when a metadata copy takes place and cleared when
> > data copy takes place.
> >
> > We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
> > whether ovl inode has data or not (as opposed to metadata only copy up).
> >
> > Note, OVL_UPPERDATA is set only on those upper inodes which can possibly
> > be metacopy only inodes. For example, inode should be a regular file and
> > there needs to be a lower dentry.
> 
> Why? is it not simpler and better for readability to set it in all
> cases except metacopy?

I think it was re-introducing ordering requirement w.r.t oi->__upperdentry
and that's why I avoided it doing for all type of files.

For example, in file/dir creation path when a new dentry is created, we
need to make sure OVL_UPPERDATA is visible on other cpus before
oi->__upperdentry is visible. Otherwise other cpus might try to copy up this
file for which there might not be any lower and all the bad things can happen.

For example, 

          CPU1					CPU2
   ovl_instantiate() {			   ovl_d_real() {
     ovl_dentry_set_upper_alias()	    ovl_open_maybe_copy_up()
     set OVL_UPPERDATA 			     ovl_dentry_needs_data_copy_up   
     smp_wmb();				      test OVL_UPPERDATA
     oi->__upperdentry = upperdentry
   }

So if cpu1 is creating a new file and cpu2 does ovl_d_real() and it sees
upper dentry but not OVL_UPPERDATA, it will try to copy up file (which
might not be on lower at all).

Similarly, in ovl_d_real() we check OVL_UPPERDATA one more time and if
its not set, we return lower dentry instead of upper. It can race there
as well and try to return a lower which does not exist (for a newly
created file). 

So, IIUC, this was bringing back ordering requirement and hence I limited
this flag.

BTW, how did oi->has_upper handle this requirement. IIUC, that can run
into same issue. It is possible that CPU2 does not see ->has_upper for
newly created file while it has oi->__upperdentry. That means it can
try to copy up a non-existent file as well.

> Doesn't matter much, but seems to me that code will look better.
> That is not to say you don't need a helper "should I check the flag".

I am not very happy with helper either. But it felt less bad option 
when it came to re-introducing barrier. So I chose this path.

> 
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/copy_up.c   | 55 +++++++++++++++++++++++++++++++++++++++++++++---
> >  fs/overlayfs/inode.c     |  3 ++-
> >  fs/overlayfs/overlayfs.h |  6 +++++-
> >  fs/overlayfs/util.c      | 34 ++++++++++++++++++++++++++++--
> >  4 files changed, 91 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 2936284..a6cda02 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -196,6 +196,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
> >         return error;
> >  }
> >
> > +static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
> > +{
> > +       struct iattr attr = {
> > +               .ia_valid = ATTR_SIZE,
> > +               .ia_size = stat->size,
> > +       };
> > +
> > +       return notify_change(upperdentry, &attr, NULL);
> > +}
> > +
> >  static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
> >  {
> >         struct iattr attr = {
> > @@ -439,6 +449,28 @@ static int ovl_get_tmpfile(struct ovl_copy_up_ctx *c, struct dentry **tempp)
> >         goto out;
> >  }
> >
> > +/* Copy up data of an inode which was copied up metadata only in the past. */
> > +static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
> > +{
> > +       struct path upperpath;
> > +       int err;
> > +
> > +       ovl_path_upper(c->dentry, &upperpath);
> > +       if (WARN_ON(upperpath.dentry == NULL))
> > +               return -EIO;
> > +
> > +       err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
> > +       if (err)
> > +               return err;
> > +
> > +       err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
> > +       if (err)
> > +               return err;
> > +
> > +       ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
> > +       return err;
> > +}
> > +
> >  static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
> >  {
> >         int err;
> > @@ -474,8 +506,19 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
> >                         return err;
> >         }
> >
> > +       if (c->metacopy) {
> > +               err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
> > +                                        NULL, 0, -EOPNOTSUPP);
> > +               if (err)
> > +                       return err;
> > +       }
> > +
> > +
> >         inode_lock(temp->d_inode);
> > -       err = ovl_set_attr(temp, &c->stat);
> > +       if (c->metacopy)
> > +               err = ovl_set_size(temp, &c->stat);
> > +       if (!err)
> > +               err = ovl_set_attr(temp, &c->stat);
> >         inode_unlock(temp->d_inode);
> >
> >         return err;
> > @@ -506,6 +549,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
> >         if (err)
> >                 goto out_cleanup;
> >
> > +       if (!c->metacopy && S_ISREG(d_inode(c->dentry)->i_mode))
> > +               ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
> >         ovl_inode_update(d_inode(c->dentry), newdentry);
> >  out:
> >         dput(temp);
> > @@ -632,7 +677,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
> >         }
> >         ovl_do_check_copy_up(ctx.lowerpath.dentry);
> >
> > -       err = ovl_copy_up_start(dentry);
> > +       err = ovl_copy_up_start(dentry, flags);
> >         /* err < 0: interrupted, err > 0: raced with another copy-up */
> >         if (unlikely(err)) {
> >                 if (err > 0)
> > @@ -642,6 +687,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
> >                         err = ovl_do_copy_up(&ctx);
> >                 if (!err && !ovl_dentry_has_upper_alias(dentry))
> >                         err = ovl_link_up(&ctx);
> > +               if (!err && ovl_dentry_needs_data_copy_up(dentry, flags))
> > +                       err = ovl_copy_up_meta_inode_data(&ctx);
> >                 ovl_copy_up_end(dentry);
> >         }
> >         do_delayed_call(&done);
> > @@ -672,8 +719,10 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
> >                  *      with rename.
> >                  */
> >                 if (ovl_dentry_upper(dentry) &&
> > -                   ovl_dentry_has_upper_alias(dentry))
> > +                   ovl_dentry_has_upper_alias(dentry) &&
> > +                   !ovl_dentry_needs_data_copy_up(dentry, flags)) {
> >                         break;
> > +               }
> >
> >                 next = dget(dentry);
> >                 /* find the topmost dentry not yet copied up */
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index 321511e..1b4b42c 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -320,7 +320,8 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
> >  static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
> >  {
> >         if (ovl_dentry_upper(dentry) &&
> > -           ovl_dentry_has_upper_alias(dentry))
> > +           ovl_dentry_has_upper_alias(dentry) &&
> > +           !ovl_dentry_needs_data_copy_up(dentry, flags))
> >                 return false;
> >
> >         if (special_file(d_inode(dentry)->i_mode))
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index d9a0edd..714abf9 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -26,10 +26,12 @@ enum ovl_path_type {
> >  #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin"
> >  #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
> >  #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
> > +#define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
> >
> >  enum ovl_flag {
> >         OVL_IMPURE,
> >         OVL_INDEX,
> > +       OVL_UPPERDATA,
> >  };
> >
> >  /*
> > @@ -211,6 +213,8 @@ bool ovl_dentry_is_whiteout(struct dentry *dentry);
> >  void ovl_dentry_set_opaque(struct dentry *dentry);
> >  bool ovl_dentry_has_upper_alias(struct dentry *dentry);
> >  void ovl_dentry_set_upper_alias(struct dentry *dentry);
> > +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
> > +bool ovl_dentry_check_upperdata(struct dentry *dentry);
> >  bool ovl_redirect_dir(struct super_block *sb);
> >  const char *ovl_dentry_get_redirect(struct dentry *dentry);
> >  void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
> > @@ -221,7 +225,7 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
> >  u64 ovl_dentry_version_get(struct dentry *dentry);
> >  bool ovl_is_whiteout(struct dentry *dentry);
> >  struct file *ovl_path_open(struct path *path, int flags);
> > -int ovl_copy_up_start(struct dentry *dentry);
> > +int ovl_copy_up_start(struct dentry *dentry, int flags);
> >  void ovl_copy_up_end(struct dentry *dentry);
> >  bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
> >  int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index a4ce1c9..ef720a9 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -227,6 +227,35 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
> >         oe->has_upper = true;
> >  }
> >
> > +bool ovl_dentry_check_upperdata(struct dentry *dentry) {
> 
> Not a good helper name IMO. it reads to me "check if upperdata is set"
> and it should read "should I check upperdata flag"

How about ovl_should_check_upperdata() instead.

> 
> > +       if (!S_ISREG(d_inode(dentry)->i_mode))
> > +               return false;
> > +
> > +       if (!ovl_dentry_lower(dentry))
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> > +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> > +       if (!ovl_dentry_check_upperdata(dentry))
> > +               return false;
> > +
> > +       if (!S_ISREG(d_inode(dentry)->i_mode))
> > +               return false;
> 
> Isn't this redundant to the helper check?

Yes. this is redundant now. Forgot to remove it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2017-10-26 13:15     ` Vivek Goyal
@ 2017-10-26 13:57       ` Amir Goldstein
  0 siblings, 0 replies; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26 13:57 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 4:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Oct 26, 2017 at 08:39:28AM +0300, Amir Goldstein wrote:
>> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > By default metadata only copy up is disabled. Provide a mount option so
>> > that users can choose one way or other.
>> >
>> > Also provide a kernel config and module option to enable/disable
>> > metacopy feature.
>> >
>> > Like index feature, when overlay is mounted, on root upper directory we
>> > set ORIGIN which points to lower. And at later mount time it is verified
>> > again. This hopes to get the configuration right. But this does only so
>> > much as we don't verify all the lowers. So it is possible that a lower is
>> > missing and later data copy up fails.
>>
>> Since you bother to specify the motivation for verifying lower root dir,
>> FYI, the main motivation was to detect the case of copied layers.
>
> What are copied layers? You mean I remove original layer and bring in
> another one with same top dir name?
>

I mean admin is out of disk space so she cp -a some folders
between volumes or maybe copying all layers a side to backup
a machine.

There is a section in Documentation/filesystems/overlayfs.txt
"Sharing and copying layers" where I wrote:
"It is quite a common practice to copy overlay layers to a different
directory tree on the same or different underlying filesystem, and even
to a different machine"

This was my impression.
Please fix me if you think this statement is not accurate.


> But this tests only one layer, right. So if there are multiple lower
> layers, I can still copy/remove or do something else with other layers
> and it will not detect.
>
> Also I should be able to retain top dir name same and replace anything
> underneath and that will not be detected either.
>
> So it provides only so much of protection, IIUC.

It provides merely a big red sign if user used to practice
"copying layers", user cannot do that anymore with index=on
and she cannot do that with metacopy=on either.
the test is there as a heuristic to detect that use case and alert the user.

I would rephrase commit message to something like:
"Like index feature, we verify on mount that upper root is not being
 reused with a different lower root. This hopes to get the configuration
 right and detect the copied layers use case. But this does only..."

Less 'what' more 'why'.

>
>> I still hold the opinion that new incompat features should enforce
>> exclusive dir lock, so withholding Reviewed-by on this one for now.
>
> I am really not happy about enforcing exlusive dir lock because
> this breaks userspace.
>
> I like the idea of making it hard to get configuration wrong, but overlay
> was like this since the beginning and how many issues have we faced
> that users ended up sharing upper. Atleast I have not come across
> even a single one.
>
> If that's the case, I would rahter wait that a proper fix comes along
> (finding existing super block and returning that) and then harden it.
>
> If I enforce it now, I most likely can't even use it with current
> container run time because there is a possibility they can leak
> mount points. So what's the point of introducing this feature.
>
> Miklos, what do you think?
>
> Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2017-10-26 13:53     ` Vivek Goyal
@ 2017-10-26 14:14       ` Amir Goldstein
  2017-10-26 14:34         ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26 14:14 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 4:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Oct 26, 2017 at 09:04:17AM +0300, Amir Goldstein wrote:
>> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > Now we will have the capability to have upper inodes which might be only
>> > metadata copy up and data is still on lower inode. So add a new xattr
>> > OVL_XATTR_METACOPY to distinguish between two cases.
>> >
>> > Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata
>> > only and and data will be copied up later from lower origin.
>> > So this xattr is set when a metadata copy takes place and cleared when
>> > data copy takes place.
>> >
>> > We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
>> > whether ovl inode has data or not (as opposed to metadata only copy up).
>> >
>> > Note, OVL_UPPERDATA is set only on those upper inodes which can possibly
>> > be metacopy only inodes. For example, inode should be a regular file and
>> > there needs to be a lower dentry.
>>
>> Why? is it not simpler and better for readability to set it in all
>> cases except metacopy?
>
> I think it was re-introducing ordering requirement w.r.t oi->__upperdentry
> and that's why I avoided it doing for all type of files.
>
> For example, in file/dir creation path when a new dentry is created, we
> need to make sure OVL_UPPERDATA is visible on other cpus before
> oi->__upperdentry is visible. Otherwise other cpus might try to copy up this
> file for which there might not be any lower and all the bad things can happen.
>
> For example,
>
>           CPU1                                  CPU2
>    ovl_instantiate() {                     ovl_d_real() {
>      ovl_dentry_set_upper_alias()           ovl_open_maybe_copy_up()
>      set OVL_UPPERDATA                       ovl_dentry_needs_data_copy_up
>      smp_wmb();                               test OVL_UPPERDATA
>      oi->__upperdentry = upperdentry
>    }
>
> So if cpu1 is creating a new file and cpu2 does ovl_d_real() and it sees
> upper dentry but not OVL_UPPERDATA, it will try to copy up file (which
> might not be on lower at all).
>
> Similarly, in ovl_d_real() we check OVL_UPPERDATA one more time and if
> its not set, we return lower dentry instead of upper. It can race there
> as well and try to return a lower which does not exist (for a newly
> created file).
>
> So, IIUC, this was bringing back ordering requirement and hence I limited
> this flag.
>

Fine. but use ovl_should_check_upperdata() in read path (CPU2) anyway
so there is little point in checking whether or not you need to set the flag.
Better just set it unconditionally on copy up (!metacopy) and on
ovl_instantiate().

> BTW, how did oi->has_upper handle this requirement. IIUC, that can run
> into same issue. It is possible that CPU2 does not see ->has_upper for
> newly created file while it has oi->__upperdentry. That means it can
> try to copy up a non-existent file as well.

I think you are referring to the "false negative" case in Miklos's comment
in ovl_copy_up_flags(). This is the simplest case because both __upperdentry
and has_alias are checked again under oi->lock in ovl_copy_up_one().

>
>> Doesn't matter much, but seems to me that code will look better.
>> That is not to say you don't need a helper "should I check the flag".
>
> I am not very happy with helper either. But it felt less bad option
> when it came to re-introducing barrier. So I chose this path.
>
>>
>> >
>> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> > ---
>> >  fs/overlayfs/copy_up.c   | 55 +++++++++++++++++++++++++++++++++++++++++++++---
>> >  fs/overlayfs/inode.c     |  3 ++-
>> >  fs/overlayfs/overlayfs.h |  6 +++++-
>> >  fs/overlayfs/util.c      | 34 ++++++++++++++++++++++++++++--
>> >  4 files changed, 91 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> > index 2936284..a6cda02 100644
>> > --- a/fs/overlayfs/copy_up.c
>> > +++ b/fs/overlayfs/copy_up.c
>> > @@ -196,6 +196,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>> >         return error;
>> >  }
>> >
>> > +static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
>> > +{
>> > +       struct iattr attr = {
>> > +               .ia_valid = ATTR_SIZE,
>> > +               .ia_size = stat->size,
>> > +       };
>> > +
>> > +       return notify_change(upperdentry, &attr, NULL);
>> > +}
>> > +
>> >  static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
>> >  {
>> >         struct iattr attr = {
>> > @@ -439,6 +449,28 @@ static int ovl_get_tmpfile(struct ovl_copy_up_ctx *c, struct dentry **tempp)
>> >         goto out;
>> >  }
>> >
>> > +/* Copy up data of an inode which was copied up metadata only in the past. */
>> > +static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
>> > +{
>> > +       struct path upperpath;
>> > +       int err;
>> > +
>> > +       ovl_path_upper(c->dentry, &upperpath);
>> > +       if (WARN_ON(upperpath.dentry == NULL))
>> > +               return -EIO;
>> > +
>> > +       err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
>> > +       if (err)
>> > +               return err;
>> > +
>> > +       err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
>> > +       if (err)
>> > +               return err;
>> > +
>> > +       ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
>> > +       return err;
>> > +}
>> > +
>> >  static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>> >  {
>> >         int err;
>> > @@ -474,8 +506,19 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>> >                         return err;
>> >         }
>> >
>> > +       if (c->metacopy) {
>> > +               err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
>> > +                                        NULL, 0, -EOPNOTSUPP);
>> > +               if (err)
>> > +                       return err;
>> > +       }
>> > +
>> > +
>> >         inode_lock(temp->d_inode);
>> > -       err = ovl_set_attr(temp, &c->stat);
>> > +       if (c->metacopy)
>> > +               err = ovl_set_size(temp, &c->stat);
>> > +       if (!err)
>> > +               err = ovl_set_attr(temp, &c->stat);
>> >         inode_unlock(temp->d_inode);
>> >
>> >         return err;
>> > @@ -506,6 +549,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
>> >         if (err)
>> >                 goto out_cleanup;
>> >
>> > +       if (!c->metacopy && S_ISREG(d_inode(c->dentry)->i_mode))
>> > +               ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
>> >         ovl_inode_update(d_inode(c->dentry), newdentry);
>> >  out:
>> >         dput(temp);
>> > @@ -632,7 +677,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>> >         }
>> >         ovl_do_check_copy_up(ctx.lowerpath.dentry);
>> >
>> > -       err = ovl_copy_up_start(dentry);
>> > +       err = ovl_copy_up_start(dentry, flags);
>> >         /* err < 0: interrupted, err > 0: raced with another copy-up */
>> >         if (unlikely(err)) {
>> >                 if (err > 0)
>> > @@ -642,6 +687,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>> >                         err = ovl_do_copy_up(&ctx);
>> >                 if (!err && !ovl_dentry_has_upper_alias(dentry))
>> >                         err = ovl_link_up(&ctx);
>> > +               if (!err && ovl_dentry_needs_data_copy_up(dentry, flags))
>> > +                       err = ovl_copy_up_meta_inode_data(&ctx);
>> >                 ovl_copy_up_end(dentry);
>> >         }
>> >         do_delayed_call(&done);
>> > @@ -672,8 +719,10 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
>> >                  *      with rename.
>> >                  */
>> >                 if (ovl_dentry_upper(dentry) &&
>> > -                   ovl_dentry_has_upper_alias(dentry))
>> > +                   ovl_dentry_has_upper_alias(dentry) &&
>> > +                   !ovl_dentry_needs_data_copy_up(dentry, flags)) {
>> >                         break;
>> > +               }
>> >
>> >                 next = dget(dentry);
>> >                 /* find the topmost dentry not yet copied up */
>> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
>> > index 321511e..1b4b42c 100644
>> > --- a/fs/overlayfs/inode.c
>> > +++ b/fs/overlayfs/inode.c
>> > @@ -320,7 +320,8 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
>> >  static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
>> >  {
>> >         if (ovl_dentry_upper(dentry) &&
>> > -           ovl_dentry_has_upper_alias(dentry))
>> > +           ovl_dentry_has_upper_alias(dentry) &&
>> > +           !ovl_dentry_needs_data_copy_up(dentry, flags))
>> >                 return false;
>> >
>> >         if (special_file(d_inode(dentry)->i_mode))
>> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
>> > index d9a0edd..714abf9 100644
>> > --- a/fs/overlayfs/overlayfs.h
>> > +++ b/fs/overlayfs/overlayfs.h
>> > @@ -26,10 +26,12 @@ enum ovl_path_type {
>> >  #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin"
>> >  #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
>> >  #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
>> > +#define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
>> >
>> >  enum ovl_flag {
>> >         OVL_IMPURE,
>> >         OVL_INDEX,
>> > +       OVL_UPPERDATA,
>> >  };
>> >
>> >  /*
>> > @@ -211,6 +213,8 @@ bool ovl_dentry_is_whiteout(struct dentry *dentry);
>> >  void ovl_dentry_set_opaque(struct dentry *dentry);
>> >  bool ovl_dentry_has_upper_alias(struct dentry *dentry);
>> >  void ovl_dentry_set_upper_alias(struct dentry *dentry);
>> > +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
>> > +bool ovl_dentry_check_upperdata(struct dentry *dentry);
>> >  bool ovl_redirect_dir(struct super_block *sb);
>> >  const char *ovl_dentry_get_redirect(struct dentry *dentry);
>> >  void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
>> > @@ -221,7 +225,7 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
>> >  u64 ovl_dentry_version_get(struct dentry *dentry);
>> >  bool ovl_is_whiteout(struct dentry *dentry);
>> >  struct file *ovl_path_open(struct path *path, int flags);
>> > -int ovl_copy_up_start(struct dentry *dentry);
>> > +int ovl_copy_up_start(struct dentry *dentry, int flags);
>> >  void ovl_copy_up_end(struct dentry *dentry);
>> >  bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
>> >  int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
>> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> > index a4ce1c9..ef720a9 100644
>> > --- a/fs/overlayfs/util.c
>> > +++ b/fs/overlayfs/util.c
>> > @@ -227,6 +227,35 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
>> >         oe->has_upper = true;
>> >  }
>> >
>> > +bool ovl_dentry_check_upperdata(struct dentry *dentry) {
>>
>> Not a good helper name IMO. it reads to me "check if upperdata is set"
>> and it should read "should I check upperdata flag"
>
> How about ovl_should_check_upperdata() instead.
>

OK. and make sure to add if (ofs->config.metacopy)

>>
>> > +       if (!S_ISREG(d_inode(dentry)->i_mode))
>> > +               return false;
>> > +
>> > +       if (!ovl_dentry_lower(dentry))
>> > +               return false;
>> > +
>> > +       return true;
>> > +}
>> > +
>> > +bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
>> > +       if (!ovl_dentry_check_upperdata(dentry))
>> > +               return false;
>> > +
>> > +       if (!S_ISREG(d_inode(dentry)->i_mode))
>> > +               return false;
>>
>> Isn't this redundant to the helper check?
>
> Yes. this is redundant now. Forgot to remove it.
>
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2017-10-26 14:14       ` Amir Goldstein
@ 2017-10-26 14:34         ` Vivek Goyal
  2017-10-26 16:11           ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 14:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 05:14:57PM +0300, Amir Goldstein wrote:
> On Thu, Oct 26, 2017 at 4:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Oct 26, 2017 at 09:04:17AM +0300, Amir Goldstein wrote:
> >> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > Now we will have the capability to have upper inodes which might be only
> >> > metadata copy up and data is still on lower inode. So add a new xattr
> >> > OVL_XATTR_METACOPY to distinguish between two cases.
> >> >
> >> > Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata
> >> > only and and data will be copied up later from lower origin.
> >> > So this xattr is set when a metadata copy takes place and cleared when
> >> > data copy takes place.
> >> >
> >> > We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
> >> > whether ovl inode has data or not (as opposed to metadata only copy up).
> >> >
> >> > Note, OVL_UPPERDATA is set only on those upper inodes which can possibly
> >> > be metacopy only inodes. For example, inode should be a regular file and
> >> > there needs to be a lower dentry.
> >>
> >> Why? is it not simpler and better for readability to set it in all
> >> cases except metacopy?
> >
> > I think it was re-introducing ordering requirement w.r.t oi->__upperdentry
> > and that's why I avoided it doing for all type of files.
> >
> > For example, in file/dir creation path when a new dentry is created, we
> > need to make sure OVL_UPPERDATA is visible on other cpus before
> > oi->__upperdentry is visible. Otherwise other cpus might try to copy up this
> > file for which there might not be any lower and all the bad things can happen.
> >
> > For example,
> >
> >           CPU1                                  CPU2
> >    ovl_instantiate() {                     ovl_d_real() {
> >      ovl_dentry_set_upper_alias()           ovl_open_maybe_copy_up()
> >      set OVL_UPPERDATA                       ovl_dentry_needs_data_copy_up
> >      smp_wmb();                               test OVL_UPPERDATA
> >      oi->__upperdentry = upperdentry
> >    }
> >
> > So if cpu1 is creating a new file and cpu2 does ovl_d_real() and it sees
> > upper dentry but not OVL_UPPERDATA, it will try to copy up file (which
> > might not be on lower at all).
> >
> > Similarly, in ovl_d_real() we check OVL_UPPERDATA one more time and if
> > its not set, we return lower dentry instead of upper. It can race there
> > as well and try to return a lower which does not exist (for a newly
> > created file).
> >
> > So, IIUC, this was bringing back ordering requirement and hence I limited
> > this flag.
> >
> 
> Fine. but use ovl_should_check_upperdata() in read path (CPU2) anyway
> so there is little point in checking whether or not you need to set the flag.
> Better just set it unconditionally on copy up (!metacopy) and on
> ovl_instantiate().

Ok. Why not keep it symmetric in WRITE and READ paths. That way it is
easier to read and understand code. Making it asymmetric will only
increase confusion. If we are never going to check a flag, why to 
set it to begin with.

Anyway, I don't have strong opinion about it. If you like this better,
I am fine.

> 
> > BTW, how did oi->has_upper handle this requirement. IIUC, that can run
> > into same issue. It is possible that CPU2 does not see ->has_upper for
> > newly created file while it has oi->__upperdentry. That means it can
> > try to copy up a non-existent file as well.
> 
> I think you are referring to the "false negative" case in Miklos's comment
> in ovl_copy_up_flags(). This is the simplest case because both __upperdentry
> and has_alias are checked again under oi->lock in ovl_copy_up_one().

But before we take lock, we start doing operations on lower and you might
get null pointer dereference. (there will not be any lower for an upper
only file).

ovl_copy_up_one() {
	ovl_path_lower(dentry, &ctx.lowerpath);
	err = vfs_getattr(&ctx.lowerpath, &ctx.stat,
                          STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
	ovl_do_check_copy_up(ctx.lowerpath.dentry);
	ovl_copy_up_start();    <--- Now we take lock. 
}

[..]
> >> > +bool ovl_dentry_check_upperdata(struct dentry *dentry) {
> >>
> >> Not a good helper name IMO. it reads to me "check if upperdata is set"
> >> and it should read "should I check upperdata flag"
> >
> > How about ovl_should_check_upperdata() instead.
> >
> 
> OK. and make sure to add if (ofs->config.metacopy)

Hmm... I am not sure about checking for ofs->config.metacopy. The way
this feature is designed right now, is that if metacopy=off, then any
new copyup will not happen metacopy only. But if there are existing
metacopy only files, then these will still be data copied up.

So say somebody mounts with metacopy=on and bunch of files are copied up
metadata only. Now overlay is remounted with metacopy=off, then this will
continue to work. No new copy ups will be metadata only but existing
metadata only copied up files will continue to work.

That means, even if metacopy=off, I need to check for UPPERDATA flag
properly and trigger a data copy up if need be.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2017-10-26 14:34         ` Vivek Goyal
@ 2017-10-26 16:11           ` Amir Goldstein
  2017-10-27  4:28             ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-26 16:11 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 5:34 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Oct 26, 2017 at 05:14:57PM +0300, Amir Goldstein wrote:
>> On Thu, Oct 26, 2017 at 4:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Thu, Oct 26, 2017 at 09:04:17AM +0300, Amir Goldstein wrote:
>> >> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > Now we will have the capability to have upper inodes which might be only
>> >> > metadata copy up and data is still on lower inode. So add a new xattr
>> >> > OVL_XATTR_METACOPY to distinguish between two cases.
>> >> >
>> >> > Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata
>> >> > only and and data will be copied up later from lower origin.
>> >> > So this xattr is set when a metadata copy takes place and cleared when
>> >> > data copy takes place.
>> >> >
>> >> > We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
>> >> > whether ovl inode has data or not (as opposed to metadata only copy up).
>> >> >
>> >> > Note, OVL_UPPERDATA is set only on those upper inodes which can possibly
>> >> > be metacopy only inodes. For example, inode should be a regular file and
>> >> > there needs to be a lower dentry.
>> >>
>> >> Why? is it not simpler and better for readability to set it in all
>> >> cases except metacopy?
>> >
>> > I think it was re-introducing ordering requirement w.r.t oi->__upperdentry
>> > and that's why I avoided it doing for all type of files.
>> >
>> > For example, in file/dir creation path when a new dentry is created, we
>> > need to make sure OVL_UPPERDATA is visible on other cpus before
>> > oi->__upperdentry is visible. Otherwise other cpus might try to copy up this
>> > file for which there might not be any lower and all the bad things can happen.
>> >
>> > For example,
>> >
>> >           CPU1                                  CPU2
>> >    ovl_instantiate() {                     ovl_d_real() {
>> >      ovl_dentry_set_upper_alias()           ovl_open_maybe_copy_up()
>> >      set OVL_UPPERDATA                       ovl_dentry_needs_data_copy_up
>> >      smp_wmb();                               test OVL_UPPERDATA
>> >      oi->__upperdentry = upperdentry
>> >    }
>> >
>> > So if cpu1 is creating a new file and cpu2 does ovl_d_real() and it sees
>> > upper dentry but not OVL_UPPERDATA, it will try to copy up file (which
>> > might not be on lower at all).
>> >
>> > Similarly, in ovl_d_real() we check OVL_UPPERDATA one more time and if
>> > its not set, we return lower dentry instead of upper. It can race there
>> > as well and try to return a lower which does not exist (for a newly
>> > created file).
>> >
>> > So, IIUC, this was bringing back ordering requirement and hence I limited
>> > this flag.
>> >
>>
>> Fine. but use ovl_should_check_upperdata() in read path (CPU2) anyway
>> so there is little point in checking whether or not you need to set the flag.
>> Better just set it unconditionally on copy up (!metacopy) and on
>> ovl_instantiate().
>
> Ok. Why not keep it symmetric in WRITE and READ paths. That way it is
> easier to read and understand code. Making it asymmetric will only
> increase confusion. If we are never going to check a flag, why to
> set it to begin with.
>
> Anyway, I don't have strong opinion about it. If you like this better,
> I am fine.
>

Please just makes sure the resulting code is simplest.
Don't add extra if's if setting the flag is not harmful.

>>
>> > BTW, how did oi->has_upper handle this requirement. IIUC, that can run
>> > into same issue. It is possible that CPU2 does not see ->has_upper for
>> > newly created file while it has oi->__upperdentry. That means it can
>> > try to copy up a non-existent file as well.
>>
>> I think you are referring to the "false negative" case in Miklos's comment
>> in ovl_copy_up_flags(). This is the simplest case because both __upperdentry
>> and has_alias are checked again under oi->lock in ovl_copy_up_one().
>
> But before we take lock, we start doing operations on lower and you might
> get null pointer dereference. (there will not be any lower for an upper
> only file).
>
> ovl_copy_up_one() {
>         ovl_path_lower(dentry, &ctx.lowerpath);
>         err = vfs_getattr(&ctx.lowerpath, &ctx.stat,
>                           STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
>         ovl_do_check_copy_up(ctx.lowerpath.dentry);
>         ovl_copy_up_start();    <--- Now we take lock.
> }
>

Ah, I see it now.

I guess there are barriers to make sure that dentry is consistent before
adding it to dcache and finding it in lookup, but just guessing. Miklos?

> [..]
>> >> > +bool ovl_dentry_check_upperdata(struct dentry *dentry) {
>> >>
>> >> Not a good helper name IMO. it reads to me "check if upperdata is set"
>> >> and it should read "should I check upperdata flag"
>> >
>> > How about ovl_should_check_upperdata() instead.
>> >
>>
>> OK. and make sure to add if (ofs->config.metacopy)
>
> Hmm... I am not sure about checking for ofs->config.metacopy. The way
> this feature is designed right now, is that if metacopy=off, then any
> new copyup will not happen metacopy only. But if there are existing
> metacopy only files, then these will still be data copied up.
>
> So say somebody mounts with metacopy=on and bunch of files are copied up
> metadata only. Now overlay is remounted with metacopy=off, then this will
> continue to work. No new copy ups will be metadata only but existing
> metadata only copied up files will continue to work.
>
> That means, even if metacopy=off, I need to check for UPPERDATA flag
> properly and trigger a data copy up if need be.
>

I see. In that case, we can employ "filesystem enabled features"
terminology, which can tell us if a feature was ever enabled. This is what
my "overlay index features" patches are trying to address.

But its ok to ignore this for metacopy for now.

Amir.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-26  6:34   ` Amir Goldstein
@ 2017-10-26 17:54     ` Vivek Goyal
  2017-10-27  4:35       ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 17:54 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 09:34:15AM +0300, Amir Goldstein wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > If a file is copied up metadata only and later when same file is opened
> > for WRITE, then data copy up takes place. We copy up data, remove METACOPY
> > xattr and then set the UPPERDATA flag in ovl_entry->flags. While all
> > these operations happen with oi->lock held, read side of oi->flags is
> > lockless. That is another thread on another cpu can check if UPPERDATA
> > flag is set or not.
> >
> > So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
> > another cpu sees UPPERDATA flag set, then it should be guaranteed that
> > effects of data copy up and remove xattr operations are also visible.
> >
> > For example.
> >
> >         CPU1                            CPU2
> > ovl_copy_up_flags()                     acquire(oi->lock)
> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data()
> >    ovl_test_flag(OVL_UPPERDATA)           vfs_removexattr()
> >                                           ovl_set_flag(OVL_UPPERDATA)
> >                                         release(oi->lock)
> >
> > Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
> > CPU1 perceives the effects of setting UPPERDATA flag but not effects of
> > preceeding operations, that would be a problem.
> 
> Why would that be a problem?
> What can go wrong?

That's a good question. I really don't have a concrete example where I can
say this this can go wrong. Can you think of something.

> If you try to answer the question instead of referring to a vague "problem"
> you will see that only the ovl_d_real() code path can be a problem.

Right. And ovl_copy_up_flags() will be called from ovl_d_real().  Will
update it to show cover more of parent chain.

> and maybe
> (I did not check) ovl_getattr. Please change your example above to ovl_d_real()
> code path of CPU1

Will do.

I looked at ovl_getattr() and can't think why smp_rmb() is needed there.
We check UPPERDATA in the end if flag is not visible, then we do stat
on lower. Which should be fine as if other cpu is doing copy up, there
are no guarantees that ovl_getattr() will see updates.

And if UPPERDATA is set, then we don't do anything and simply return, so
that should not matter either. In d_real() we return upperdentry so we
need to make sure it is stable that's why smp_rmb(). In ovl_getattr()
we don't return upper dentry, so smp_rmb() is probably not required.

> 
> >
> > Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
> > and smp_rmb() on UPPERDATA flag test operation.
> >
> > May be some other lock or barrier is already covering it. But I am not sure
> > what that is and is it obvious enough that we will not break it in future.
> >
> > So hence trying to be safe here and introducing barriers explicitly for
> > UPPERDATA flag/bit.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/copy_up.c |  7 ++++++-
> >  fs/overlayfs/super.c   | 13 ++++++++++---
> >  fs/overlayfs/util.c    | 11 ++++++++++-
> >  3 files changed, 26 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index a6cda02..4876ae4 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -466,7 +466,12 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
> >         err= vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
> >         if (err)
> >                 return err;
> > -
> > +       /*
> > +        * Pairs with smp_rmb() in ovl_dentry_needs_data_copy_up(). Make sure
> 
> Nope. only pairs with smp_rmpb() in ovl_d_real() (or in a new helper
> you need to create)

Please see further down about my argument that why we should retain 
smp_rmb() in ovl_dentry_needs_data_copy_up().

> 
> 
> > +        * if OVL_UPPERDATA flag is visible, then all the write operations
> > +        * before it are visible as well.
> > +        */
> > +       smp_wmb();
> >         ovl_set_flag(OVL_UPPERDATA, d_inode(c->dentry));
> >         return err;
> >  }
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 4cf1f98..e97dccb 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -102,9 +102,16 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
> >                         if (err)
> >                                 return ERR_PTR(err);
> >
> > -                       if (ovl_dentry_check_upperdata(dentry) &&
> > -                           !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
> > -                               goto lower;
> > +                       if (ovl_dentry_check_upperdata(dentry)) {
> > +                               if (!ovl_test_flag(OVL_UPPERDATA,
> > +                                   d_inode(dentry)))
> > +                                       goto lower;
> > +                               /*
> > +                                * Pairs with smp_wmb in
> > +                                * ovl_copy_up_meta_inode_data()
> > +                                */
> > +                               smp_rmb();
> > +                       }
> >                 }
> >                 return real;
> >         }
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index ef720a9..d0f3bf7 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -238,6 +238,8 @@ bool ovl_dentry_check_upperdata(struct dentry *dentry) {
> >  }
> >
> >  bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> > +       bool upperdata;
> > +
> >         if (!ovl_dentry_check_upperdata(dentry))
> >                 return false;
> >
> > @@ -250,7 +252,14 @@ bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) {
> >         if (!(OPEN_FMODE(flags) & FMODE_WRITE))
> >                 return false;
> >
> > -       if (likely(ovl_test_flag(OVL_UPPERDATA, d_inode(dentry))))
> > +       upperdata = ovl_test_flag(OVL_UPPERDATA, d_inode(dentry));
> > +       /*
> > +        * Pairs with smp_wmb() in ovl_copy_up_meta_inode_data(). Make sure
> > +        * if setting of OVL_UPPERDATA is visible, then effects of writes
> > +        * before that are visible too.
> > +        */
> > +       smp_rmb();
> > +       if (upperdata)
> 
> Nope. smp_rmb() is not needed here, because most of the places that
> use this helper
> will take a lock and call it again under lock.

When you say "lock" you are referring to oi->lock, right?

If yes, I see 3 callsites of ovl_dentry_needs_data_copy_up() right now and
two of them are lockless(). Calls from ovl_d_real() and ovl_copy_up_flags()
are lockless while call from ovl_copy_up_one() is locked.

ovl_d_real() is one example of lockless access. There are others. Anybody
whole call ovl_copy_up() will call this lockless. That's a different
thing that ovl_copy_up() right now does not specify WRITE flag so data
copy up will not take place. But that's an internal detail of meaning of
the bit at this point of time. 

I would rather place barrier right next to bit/flag which is being
protectd, instead of putting it somewhere far up in the call chain. That
makes understanding code hard at the same time possibility of of error
increases.

IOW, this is no different from ovl_dentry_upper() where data dependency
barrier is placed right next to pointer which is being protected. And
now ovl_dentry_upper() is called both from lockless and locked code.

> You may need an explicit smp_rmb() also in getattr() though, so you
> can create a new
> helper that does exactly what the hunk in ovl_d_real does and reuse
> the helper in ovl_getattr
> 

Right. getattr() seems to be racy right now. getattr() should either
return number of blocks from lower (if metacopy only) or from upper
(after data copy has stablized). But not anything in between.

And right now, I think multiple races are possible.

		CPU1			CPU2
	ovl_getattr()
	vfs_getattr(upper)
					data_copy_up_finished;
					smp_wmb()
					OVL_UPPERDATA=1
	test OVL_UPPERDATA=1
	smp_rmb()
	return

So when we did vfs_getattr() on upper first time, it could be any number
of blocks (either 0 or intermediate state). In that case should always
return blocks from lower (I think).

So that probably means that OVL_UPPERDATA should be checked early,
possibly with smp_rmb() and then decision should be made in advance
whether to query lower or not.

I will fix it. Thanks for bringing it up.

Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 09/13] ovl: Set OVL_UPPERDATA flag during ovl_lookup()
  2017-10-26  6:19   ` Amir Goldstein
@ 2017-10-26 18:04     ` Vivek Goyal
  0 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 18:04 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 09:19:38AM +0300, Amir Goldstein wrote:
[..]
> > @@ -634,6 +653,20 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                                                roe->numlower, &stack, &ctr);
> >                         if (err)
> >                                 goto out_put_upper;
> > +
> > +                       err = ovl_check_metacopy(upperdentry);
> 
> You should avoid checking metacopy xattr for file types that are not
> eligable.

Sure. Will do.

Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 13/13] ovl: Enable metadata only feature
  2017-10-26  7:07   ` Amir Goldstein
@ 2017-10-26 18:19     ` Vivek Goyal
  0 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-26 18:19 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 10:07:39AM +0300, Amir Goldstein wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > All the bits are in patches before this. So it is time to enable the
> > metadata only copy up feature.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/copy_up.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 4876ae4..3ce35e1 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -637,12 +637,13 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
> >         int err;
> >         DEFINE_DELAYED_CALL(done);
> >         struct path parentpath;
> > +       struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
> >         struct ovl_copy_up_ctx ctx = {
> >                 .parent = parent,
> >                 .dentry = dentry,
> >                 .workdir = ovl_workdir(dentry),
> >                 .origin = true,
> > -               .metacopy = false,
> > +               .metacopy = ofs->config.metacopy,
> 
> This should have been better
> .metacopy = ovl_dentry_needs_data_copy_up(dentry)

You mean ovl_dentry_needs_meta_copy_up(). Ok, I will introduce a
helper for this and get rid of other code.

> to begin with and the helper would start with
> 
> if (true) return; (with a TODO comment)
> 
> That this patch would fix to
> if (!ofs->config.metacopy) return
> 
> Then you don't need to check ISREG and write flags below.

Will do.

Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2017-10-26 16:11           ` Amir Goldstein
@ 2017-10-27  4:28             ` Amir Goldstein
  0 siblings, 0 replies; 44+ messages in thread
From: Amir Goldstein @ 2017-10-27  4:28 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 7:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Oct 26, 2017 at 5:34 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Thu, Oct 26, 2017 at 05:14:57PM +0300, Amir Goldstein wrote:
>>> On Thu, Oct 26, 2017 at 4:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>> > On Thu, Oct 26, 2017 at 09:04:17AM +0300, Amir Goldstein wrote:
>>> >> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>> >> > Now we will have the capability to have upper inodes which might be only
>>> >> > metadata copy up and data is still on lower inode. So add a new xattr
>>> >> > OVL_XATTR_METACOPY to distinguish between two cases.
>>> >> >
>>> >> > Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata
>>> >> > only and and data will be copied up later from lower origin.
>>> >> > So this xattr is set when a metadata copy takes place and cleared when
>>> >> > data copy takes place.
>>> >> >
>>> >> > We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
>>> >> > whether ovl inode has data or not (as opposed to metadata only copy up).
>>> >> >
>>> >> > Note, OVL_UPPERDATA is set only on those upper inodes which can possibly
>>> >> > be metacopy only inodes. For example, inode should be a regular file and
>>> >> > there needs to be a lower dentry.
>>> >>
>>> >> Why? is it not simpler and better for readability to set it in all
>>> >> cases except metacopy?
>>> >
>>> > I think it was re-introducing ordering requirement w.r.t oi->__upperdentry
>>> > and that's why I avoided it doing for all type of files.
>>> >
>>> > For example, in file/dir creation path when a new dentry is created, we
>>> > need to make sure OVL_UPPERDATA is visible on other cpus before
>>> > oi->__upperdentry is visible. Otherwise other cpus might try to copy up this
>>> > file for which there might not be any lower and all the bad things can happen.
>>> >
>>> > For example,
>>> >
>>> >           CPU1                                  CPU2
>>> >    ovl_instantiate() {                     ovl_d_real() {
>>> >      ovl_dentry_set_upper_alias()           ovl_open_maybe_copy_up()
>>> >      set OVL_UPPERDATA                       ovl_dentry_needs_data_copy_up
>>> >      smp_wmb();                               test OVL_UPPERDATA
>>> >      oi->__upperdentry = upperdentry
>>> >    }
>>> >
>>> > So if cpu1 is creating a new file and cpu2 does ovl_d_real() and it sees
>>> > upper dentry but not OVL_UPPERDATA, it will try to copy up file (which
>>> > might not be on lower at all).
>>> >
>>> > Similarly, in ovl_d_real() we check OVL_UPPERDATA one more time and if
>>> > its not set, we return lower dentry instead of upper. It can race there
>>> > as well and try to return a lower which does not exist (for a newly
>>> > created file).
>>> >
>>> > So, IIUC, this was bringing back ordering requirement and hence I limited
>>> > this flag.
>>> >
>>>
>>> Fine. but use ovl_should_check_upperdata() in read path (CPU2) anyway
>>> so there is little point in checking whether or not you need to set the flag.
>>> Better just set it unconditionally on copy up (!metacopy) and on
>>> ovl_instantiate().
>>
>> Ok. Why not keep it symmetric in WRITE and READ paths. That way it is
>> easier to read and understand code. Making it asymmetric will only
>> increase confusion. If we are never going to check a flag, why to
>> set it to begin with.
>>
>> Anyway, I don't have strong opinion about it. If you like this better,
>> I am fine.
>>
>
> Please just makes sure the resulting code is simplest.
> Don't add extra if's if setting the flag is not harmful.
>
>>>
>>> > BTW, how did oi->has_upper handle this requirement. IIUC, that can run
>>> > into same issue. It is possible that CPU2 does not see ->has_upper for
>>> > newly created file while it has oi->__upperdentry. That means it can
>>> > try to copy up a non-existent file as well.
>>>
>>> I think you are referring to the "false negative" case in Miklos's comment
>>> in ovl_copy_up_flags(). This is the simplest case because both __upperdentry
>>> and has_alias are checked again under oi->lock in ovl_copy_up_one().
>>
>> But before we take lock, we start doing operations on lower and you might
>> get null pointer dereference. (there will not be any lower for an upper
>> only file).
>>
>> ovl_copy_up_one() {
>>         ovl_path_lower(dentry, &ctx.lowerpath);
>>         err = vfs_getattr(&ctx.lowerpath, &ctx.stat,
>>                           STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
>>         ovl_do_check_copy_up(ctx.lowerpath.dentry);
>>         ovl_copy_up_start();    <--- Now we take lock.
>> }
>>
>
> Ah, I see it now.
>
> I guess there are barriers to make sure that dentry is consistent before
> adding it to dcache and finding it in lookup, but just guessing. Miklos?
>

Vivek,

The false positive case you point out of pure upper entry that will attempt
copy up doesn't require memory barriers to fix. A simple sanity check of
non-NULL lowerpath is enough and return success from ovl_copy_up_one()
If you make this check with WARN_ON() we will know if we had a missing
barrier for has_upper for create upper case, but I don't even know if we care.

Also the check for "should copy up" is starting to grow.
Now it looks like this:
        if (!ovl_dentry_upper(dentry) ||
            (!ovl_dentry_has_upper_alias(dentry))

and it repeats at least twice in d_real() lockless path.
Now you want to add another condition and my NFS export patches
add another condition (dentry->d_flags & DCACHE_DISCONNECTED)
so it is about time to introduce a helper for the lockless path.

Amir.


Amir.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-26 17:54     ` Vivek Goyal
@ 2017-10-27  4:35       ` Amir Goldstein
  2017-10-27 13:14         ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-27  4:35 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 8:54 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Oct 26, 2017 at 09:34:15AM +0300, Amir Goldstein wrote:
>> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > If a file is copied up metadata only and later when same file is opened
>> > for WRITE, then data copy up takes place. We copy up data, remove METACOPY
>> > xattr and then set the UPPERDATA flag in ovl_entry->flags. While all
>> > these operations happen with oi->lock held, read side of oi->flags is
>> > lockless. That is another thread on another cpu can check if UPPERDATA
>> > flag is set or not.
>> >
>> > So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
>> > another cpu sees UPPERDATA flag set, then it should be guaranteed that
>> > effects of data copy up and remove xattr operations are also visible.
>> >
>> > For example.
>> >
>> >         CPU1                            CPU2
>> > ovl_copy_up_flags()                     acquire(oi->lock)
>> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data()
>> >    ovl_test_flag(OVL_UPPERDATA)           vfs_removexattr()
>> >                                           ovl_set_flag(OVL_UPPERDATA)
>> >                                         release(oi->lock)
>> >
>> > Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
>> > CPU1 perceives the effects of setting UPPERDATA flag but not effects of
>> > preceeding operations, that would be a problem.
>>
>> Why would that be a problem?
>> What can go wrong?
>
> That's a good question. I really don't have a concrete example where I can
> say this this can go wrong. Can you think of something.
>
>> If you try to answer the question instead of referring to a vague "problem"
>> you will see that only the ovl_d_real() code path can be a problem.
>
> Right. And ovl_copy_up_flags() will be called from ovl_d_real().  Will
> update it to show cover more of parent chain.
>

That is not what I meant.
I don't see any negative outcome that could happen from false view
of OVL_UPPERDATA flag in ovl_copy_up_flags().
The race with ovl_instantiate() you pointed out is interesting, but easy to
solve by checking for non-NULL lower.

The path of ovl_d_real() that exposes upper inode to early is the only
path I see that can cause problems, as well as perhaps ovl_getattr(),
which may expose wrong blocks.

Amir.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update
  2017-10-27  4:35       ` Amir Goldstein
@ 2017-10-27 13:14         ` Vivek Goyal
  0 siblings, 0 replies; 44+ messages in thread
From: Vivek Goyal @ 2017-10-27 13:14 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Oct 27, 2017 at 07:35:00AM +0300, Amir Goldstein wrote:
> On Thu, Oct 26, 2017 at 8:54 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Oct 26, 2017 at 09:34:15AM +0300, Amir Goldstein wrote:
> >> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > If a file is copied up metadata only and later when same file is opened
> >> > for WRITE, then data copy up takes place. We copy up data, remove METACOPY
> >> > xattr and then set the UPPERDATA flag in ovl_entry->flags. While all
> >> > these operations happen with oi->lock held, read side of oi->flags is
> >> > lockless. That is another thread on another cpu can check if UPPERDATA
> >> > flag is set or not.
> >> >
> >> > So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
> >> > another cpu sees UPPERDATA flag set, then it should be guaranteed that
> >> > effects of data copy up and remove xattr operations are also visible.
> >> >
> >> > For example.
> >> >
> >> >         CPU1                            CPU2
> >> > ovl_copy_up_flags()                     acquire(oi->lock)
> >> >  ovl_dentry_needs_data_copy_up()          ovl_copy_up_data()
> >> >    ovl_test_flag(OVL_UPPERDATA)           vfs_removexattr()
> >> >                                           ovl_set_flag(OVL_UPPERDATA)
> >> >                                         release(oi->lock)
> >> >
> >> > Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
> >> > CPU1 perceives the effects of setting UPPERDATA flag but not effects of
> >> > preceeding operations, that would be a problem.
> >>
> >> Why would that be a problem?
> >> What can go wrong?
> >
> > That's a good question. I really don't have a concrete example where I can
> > say this this can go wrong. Can you think of something.
> >
> >> If you try to answer the question instead of referring to a vague "problem"
> >> you will see that only the ovl_d_real() code path can be a problem.
> >
> > Right. And ovl_copy_up_flags() will be called from ovl_d_real().  Will
> > update it to show cover more of parent chain.
> >
> 
> That is not what I meant.
> I don't see any negative outcome that could happen from false view
> of OVL_UPPERDATA flag in ovl_copy_up_flags().

Sure, but barrier needs to be near to variable being protected by barrier.
My point is that if you move them out in ovl_d_real() instead, then it
is hard to understand what's being protected by barrier.

> The race with ovl_instantiate() you pointed out is interesting, but easy to
> solve by checking for non-NULL lower.

Yep it is. But it does not give me a good feeling because if barriers are
protecting something, it should protect in all cases. Now we have a
special case where barrier is not protecting a field fully and that means
we will rely on some other field. And that makes understanding locking
such a difficult task.

> 
> The path of ovl_d_real() that exposes upper inode to early is the only
> path I see that can cause problems, as well as perhaps ovl_getattr(),
> which may expose wrong blocks.

Right. At this point of time ovl_d_real() and ovl_getattr() seem to be
only two paths which access OVL_UPPERDATA in lockless manner.

Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data
  2017-10-26  7:18 ` [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Amir Goldstein
@ 2017-10-27 16:40   ` Vivek Goyal
  2017-10-28 14:50     ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-27 16:40 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Oct 26, 2017 at 10:18:19AM +0300, Amir Goldstein wrote:
> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Hi,
> >
> > Here is the V5 of patches. Following are changes from V4.
> 
> Please use git format-patch -v6 next time you post.
> It tags all the patch subjects so it is easier to find the former
> revision of specific patch.

Will do.

> 
> >
> > - Flipped the meaning of bit. Now OVL_METACOPY is called OVL_UPPERDATA and
> >   dropped a barrier patch which implemented ordering requirements between
> >   oi->flags and oi->__upperdentry.
> >
> > - Now ovl_d_real() does return upper dentry if upper dentry is metacopy only
> >   dentry.
> 
> doesn't return upper dentry
> 
> But I am still not convinced that metacopy upper inode is not accessed in places
> it shouldn't be accessed by vfs, via other APIs.
> 
> For example, ovl_permission(), ovl_getattr(), ovl_setattr(), ovl_listxattr(),
> ovl_get_acl() will all access metacopy inode and that outcome will be desired.
> But you did not provide a complete list of vfs APIs and verify for each
> that accessing the metacopy upper via various variants of ovl_dentry_real()
> is either not possible (because access requires a rw fd) or is the
> desired outcome.

Thanks for asking this. I have been looking at code and trying to figure
out if there is anything which is broken. But this also requires the
eyeballs from all the experienced filesystem folks. I have tried
going through all overlayfs interfaces and listing my thoughts below.
AFAICS, nothing seems to be broken.

- I am assuming that ovl_dir_operations{} are not impacted by this at
  all. Metadata copy up affects only regular files and does not touch any
  of directory operations.

Operations internal to overlayfs
================================
1. ovl_setattr()

   Will do metadata copy up and work on upper inode.

2. inode_permission()

   Will do permission checks on upper metacopy inode. Which seems to be
   right thing.

3. ovl_getattr()

   Will do getattr on metacopy inode and will query lower for number of
   blocks. Seems ok.

4. vfs_listxattr()

   Will retrieve xattr list from upper metacopy inode. Seems right.

5. ovl_get_acl()

   Will retrieve acl from upper metacopy inode. Seems right.

6. ovl_update_time()

   Will update atime on upper metacopy inode. Seems right.

7. ovl_get_link()

   Should work on upper inode. For symlinks upper will never be metadata inode
   as this is not a regular file. So this should be fine

8. ovl_lookup()

   Will set the upperdentry to metacopy inode. Seems right.

9. ovl_mkdir()

   Shouldn't be affected as directories can't be metacopy only.

10. ovl_symlink()

   Shouldn't be affected as symlinks can't be metacopy only.

11. ovl_unlink()

   This should remove upper metacopy dentry/inode and replace with whiteout.

12. ovl_rmdir()

   Should not be impacted. There are no metadata copy up for dir inodes.

13. ovl_rename()

   Will copy up files metadata only (as needed) and then do rename. Data
   will be copied up later when files are opened for WRITE. Can't think
   of anything which will be broken due to metadata only inode.

14. ovl_link()

   Will copy up old file metadata only and then create link. Or create
   link if metadata inode already exists on upper. Feels it should not
   be impacted either.

15. ovl_create()

   Will create a new object. No metadata only inode involved.

16. ovl_mknod()

   Will create a new object. No metadata only inode involved.


d_real() and friends
===================
1. vfs_truncate() (fs/open.c)
{
  upperdentry = d_real(path->dentry, NULL, O_WRONLY, 0);
}

This will enforce full copy up. So no behavior change.

2. vfs_open() (fs/open.c)
{
  dentry *dentry = d_real(path->dentry, NULL, file->f_flags, 0)
}

This will get dentry based on flags. If file is opened for READ, then
one will get lower dentry for metacopy inode case. This should be
fine. Can't see anything wrong yet.

3. update_ovl_inode_times() (fs/inode.c)
{
        upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
}

This will return NULL for metacopy inode (for regular files). And that should
not impact atime updates for following reason.

- mtime does not change for metacopy inode. So overlay inode and upper inode
  should have same mtime.

- ctime is kept in sync with overlay inode for all metadata related operations
  which happen on metadata inode (through overlay).

- Whenever mtime or ctime because file has been written to, that time full
  copy up will take place and this will return upperdentry (and not NULL).


4. may_write_real() (fs/namespace.c)
{
        upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
}

This can get a NULL for metadata only inode and that means return -EPERM. IOW,
may_write_real() will always return -EPERM on metadata only inode and these
will be treated as if these are files on lower/.

Now question is does it break something. This will not allow write access to
a file which is metadata only and it makes sense to me. One can not write
to lower file. One could argue that metadata update should be allowed but
current API does not distinguish between the two. So it will be denied.

There are many users of it and I will spend some time and try to understand
each and every use case.

5. file_dentry() (include/linux/fs.h)
{
        return d_real(file->f_path.dentry, file_inode(file), 0, 0);
}

This is trying to find dentry matching with file->f_inode. For metadata inodes
->f_inode will return lower inode which is expected.

6. d_real_inode() (include/linux/dcache.h)
{
        d_backing_inode(d_real((struct dentry *) dentry, NULL, 0, 0));

}
 only inode, this should return lower dentry and lower inode.
Question is, does it break anything.

Only place d_real_inode() seems to be being used is in fs/locks.c

check_conflicting_open() {
        if ((arg == F_RDLCK) &&
           (atomic_read(&d_real_inode(dentry)->i_writecount) > 0))
                return -EAGAIN;
}

i_writecount will be updated only on upper file. So if there is metadata
only inode, then i_writecount should be zero both on upper and lower. So
this case is similar to lower inode, IIUC and should not be a problem.


> >
> > - Added a fix to ovl_lookup() goto statement.
> 
> This patch is independent of this series AFAICT your patches don't
> depend on it in.
> Please post it separately with Fixes tag and CC stable

Will do.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data
  2017-10-27 16:40   ` Vivek Goyal
@ 2017-10-28 14:50     ` Amir Goldstein
  2017-10-31 13:39       ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Amir Goldstein @ 2017-10-28 14:50 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Fri, Oct 27, 2017 at 7:40 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Oct 26, 2017 at 10:18:19AM +0300, Amir Goldstein wrote:
>> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > Hi,
>> >
>> > Here is the V5 of patches. Following are changes from V4.
>>
>> Please use git format-patch -v6 next time you post.
>> It tags all the patch subjects so it is easier to find the former
>> revision of specific patch.
>
> Will do.
>
>>
>> >
>> > - Flipped the meaning of bit. Now OVL_METACOPY is called OVL_UPPERDATA and
>> >   dropped a barrier patch which implemented ordering requirements between
>> >   oi->flags and oi->__upperdentry.
>> >
>> > - Now ovl_d_real() does return upper dentry if upper dentry is metacopy only
>> >   dentry.
>>
>> doesn't return upper dentry
>>
>> But I am still not convinced that metacopy upper inode is not accessed in places
>> it shouldn't be accessed by vfs, via other APIs.
>>
>> For example, ovl_permission(), ovl_getattr(), ovl_setattr(), ovl_listxattr(),
>> ovl_get_acl() will all access metacopy inode and that outcome will be desired.
>> But you did not provide a complete list of vfs APIs and verify for each
>> that accessing the metacopy upper via various variants of ovl_dentry_real()
>> is either not possible (because access requires a rw fd) or is the
>> desired outcome.
>
> Thanks for asking this. I have been looking at code and trying to figure
> out if there is anything which is broken. But this also requires the
> eyeballs from all the experienced filesystem folks. I have tried
> going through all overlayfs interfaces and listing my thoughts below.
> AFAICS, nothing seems to be broken.

That's a very good start.
Now I wonder if the metacopy feature can break userspace apps that relied on
metadata operations doing copy up to avoid the RDONLY/RDWR fd inconsistency,
even if they escaped the inconsistency by accident so far.
I guess there is only one one to know...

>
> - I am assuming that ovl_dir_operations{} are not impacted by this at
>   all. Metadata copy up affects only regular files and does not touch any
>   of directory operations.
>
> Operations internal to overlayfs
> ================================
> 1. ovl_setattr()
>
>    Will do metadata copy up and work on upper inode.
>
> 2. inode_permission()
>
>    Will do permission checks on upper metacopy inode. Which seems to be
>    right thing.
>
> 3. ovl_getattr()
>
>    Will do getattr on metacopy inode and will query lower for number of
>    blocks. Seems ok.

Hmm it seems ok for stat(2), not sure about statx().
If upper metacopy is not encrypted (it has no data) and lower
is encrypted, not sure what should be the result of querying
the STATX_ATTR_ENCRYPTED attribute.
But that is just an example showing how fragile this can, when
'metadata' is not a well defined term.

Not saying that this is a show stopper, but something to consider.

>
> 4. vfs_listxattr()
>
>    Will retrieve xattr list from upper metacopy inode. Seems right.
>
> 5. ovl_get_acl()
>
>    Will retrieve acl from upper metacopy inode. Seems right.
>

You missed ovl_posix_acl_xattr_set/get().
They are hiding well ;-)
But they seem right anyway.

> 6. ovl_update_time()
>
>    Will update atime on upper metacopy inode. Seems right.
>
> 7. ovl_get_link()
>
>    Should work on upper inode. For symlinks upper will never be metadata inode
>    as this is not a regular file. So this should be fine
>
> 8. ovl_lookup()
>
>    Will set the upperdentry to metacopy inode. Seems right.
>
> 9. ovl_mkdir()
>
>    Shouldn't be affected as directories can't be metacopy only.
>
> 10. ovl_symlink()
>
>    Shouldn't be affected as symlinks can't be metacopy only.
>
> 11. ovl_unlink()
>
>    This should remove upper metacopy dentry/inode and replace with whiteout.
>
> 12. ovl_rmdir()
>
>    Should not be impacted. There are no metadata copy up for dir inodes.
>
> 13. ovl_rename()
>
>    Will copy up files metadata only (as needed) and then do rename. Data
>    will be copied up later when files are opened for WRITE. Can't think
>    of anything which will be broken due to metadata only inode.
>
> 14. ovl_link()
>
>    Will copy up old file metadata only and then create link. Or create
>    link if metadata inode already exists on upper. Feels it should not
>    be impacted either.
>
> 15. ovl_create()
>
>    Will create a new object. No metadata only inode involved.
>
> 16. ovl_mknod()
>
>    Will create a new object. No metadata only inode involved.
>
>
> d_real() and friends
> ===================
> 1. vfs_truncate() (fs/open.c)
> {
>   upperdentry = d_real(path->dentry, NULL, O_WRONLY, 0);
> }
>
> This will enforce full copy up. So no behavior change.
>
> 2. vfs_open() (fs/open.c)
> {
>   dentry *dentry = d_real(path->dentry, NULL, file->f_flags, 0)
> }
>
> This will get dentry based on flags. If file is opened for READ, then
> one will get lower dentry for metacopy inode case. This should be
> fine. Can't see anything wrong yet.
>
> 3. update_ovl_inode_times() (fs/inode.c)
> {
>         upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
> }
>
> This will return NULL for metacopy inode (for regular files). And that should
> not impact atime updates for following reason.
>
> - mtime does not change for metacopy inode. So overlay inode and upper inode
>   should have same mtime.
>
> - ctime is kept in sync with overlay inode for all metadata related operations
>   which happen on metadata inode (through overlay).
>
> - Whenever mtime or ctime because file has been written to, that time full
>   copy up will take place and this will return upperdentry (and not NULL).
>
>
> 4. may_write_real() (fs/namespace.c)
> {
>         upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
> }
>
> This can get a NULL for metadata only inode and that means return -EPERM. IOW,
> may_write_real() will always return -EPERM on metadata only inode and these
> will be treated as if these are files on lower/.
>
> Now question is does it break something. This will not allow write access to
> a file which is metadata only and it makes sense to me. One can not write
> to lower file. One could argue that metadata update should be allowed but
> current API does not distinguish between the two. So it will be denied.
>
> There are many users of it and I will spend some time and try to understand
> each and every use case.
>
> 5. file_dentry() (include/linux/fs.h)
> {
>         return d_real(file->f_path.dentry, file_inode(file), 0, 0);
> }
>
> This is trying to find dentry matching with file->f_inode. For metadata inodes
> ->f_inode will return lower inode which is expected.
>
> 6. d_real_inode() (include/linux/dcache.h)
> {
>         d_backing_inode(d_real((struct dentry *) dentry, NULL, 0, 0));
>
> }
>  only inode, this should return lower dentry and lower inode.
> Question is, does it break anything.
>
> Only place d_real_inode() seems to be being used is in fs/locks.c
>
> check_conflicting_open() {
>         if ((arg == F_RDLCK) &&
>            (atomic_read(&d_real_inode(dentry)->i_writecount) > 0))
>                 return -EAGAIN;
> }
>
> i_writecount will be updated only on upper file. So if there is metadata
> only inode, then i_writecount should be zero both on upper and lower. So
> this case is similar to lower inode, IIUC and should not be a problem.
>
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data
  2017-10-28 14:50     ` Amir Goldstein
@ 2017-10-31 13:39       ` Vivek Goyal
  2017-10-31 13:56         ` Amir Goldstein
  0 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2017-10-31 13:39 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Sat, Oct 28, 2017 at 05:50:19PM +0300, Amir Goldstein wrote:
> On Fri, Oct 27, 2017 at 7:40 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Oct 26, 2017 at 10:18:19AM +0300, Amir Goldstein wrote:
> >> On Wed, Oct 25, 2017 at 10:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > Hi,
> >> >
> >> > Here is the V5 of patches. Following are changes from V4.
> >>
> >> Please use git format-patch -v6 next time you post.
> >> It tags all the patch subjects so it is easier to find the former
> >> revision of specific patch.
> >
> > Will do.
> >
> >>
> >> >
> >> > - Flipped the meaning of bit. Now OVL_METACOPY is called OVL_UPPERDATA and
> >> >   dropped a barrier patch which implemented ordering requirements between
> >> >   oi->flags and oi->__upperdentry.
> >> >
> >> > - Now ovl_d_real() does return upper dentry if upper dentry is metacopy only
> >> >   dentry.
> >>
> >> doesn't return upper dentry
> >>
> >> But I am still not convinced that metacopy upper inode is not accessed in places
> >> it shouldn't be accessed by vfs, via other APIs.
> >>
> >> For example, ovl_permission(), ovl_getattr(), ovl_setattr(), ovl_listxattr(),
> >> ovl_get_acl() will all access metacopy inode and that outcome will be desired.
> >> But you did not provide a complete list of vfs APIs and verify for each
> >> that accessing the metacopy upper via various variants of ovl_dentry_real()
> >> is either not possible (because access requires a rw fd) or is the
> >> desired outcome.
> >
> > Thanks for asking this. I have been looking at code and trying to figure
> > out if there is anything which is broken. But this also requires the
> > eyeballs from all the experienced filesystem folks. I have tried
> > going through all overlayfs interfaces and listing my thoughts below.
> > AFAICS, nothing seems to be broken.
> 
> That's a very good start.
> Now I wonder if the metacopy feature can break userspace apps that relied on
> metadata operations doing copy up to avoid the RDONLY/RDWR fd inconsistency,
> even if they escaped the inconsistency by accident so far.

There is a small possibility that somebody was relying on this. But we
are keeping metacopy=off by default. So this should not be a backward
compatibility issue. Those who chose to enable it, they can't rely on
file being copied up on changing file attr. 

> I guess there is only one one to know...

Not sure what does this mean.

> 
> >
> > - I am assuming that ovl_dir_operations{} are not impacted by this at
> >   all. Metadata copy up affects only regular files and does not touch any
> >   of directory operations.
> >
> > Operations internal to overlayfs
> > ================================
> > 1. ovl_setattr()
> >
> >    Will do metadata copy up and work on upper inode.
> >
> > 2. inode_permission()
> >
> >    Will do permission checks on upper metacopy inode. Which seems to be
> >    right thing.
> >
> > 3. ovl_getattr()
> >
> >    Will do getattr on metacopy inode and will query lower for number of
> >    blocks. Seems ok.
> 
> Hmm it seems ok for stat(2), not sure about statx().
> If upper metacopy is not encrypted (it has no data) and lower
> is encrypted, not sure what should be the result of querying
> the STATX_ATTR_ENCRYPTED attribute.

That's a good point. I think we have similar issue for directories as
well. In case of merged dir, statx() will return STATX_ATTR_ENCRYPTED
attribute from top of dir stack. So if lower dir is encrypted while
upper is not, STATX_ATTR_ENCRYPTED will not be reported.

Having said that, I feel it makes sense to report STATX_ATTR_ENCRYPTED
for metadata only files (if these files are encrypted on lower). So
while file metadata is not encrypted in this case, but file data is
encrypted. We are anyway querying lower for number of blocks. We could
just query STATX_ATTR_ENCRYPTED as well and report it to user.

> But that is just an example showing how fragile this can, when
> 'metadata' is not a well defined term.
> 
> Not saying that this is a show stopper, but something to consider.
> 
> >
> > 4. vfs_listxattr()
> >
> >    Will retrieve xattr list from upper metacopy inode. Seems right.
> >
> > 5. ovl_get_acl()
> >
> >    Will retrieve acl from upper metacopy inode. Seems right.
> >
> 
> You missed ovl_posix_acl_xattr_set/get().
> They are hiding well ;-)
> But they seem right anyway.

Right. I missed these. Thanks for pointing out. Yes, these should be fine
too.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data
  2017-10-31 13:39       ` Vivek Goyal
@ 2017-10-31 13:56         ` Amir Goldstein
  0 siblings, 0 replies; 44+ messages in thread
From: Amir Goldstein @ 2017-10-31 13:56 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Tue, Oct 31, 2017 at 3:39 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Sat, Oct 28, 2017 at 05:50:19PM +0300, Amir Goldstein wrote:
...
>>
>> That's a very good start.
>> Now I wonder if the metacopy feature can break userspace apps that relied on
>> metadata operations doing copy up to avoid the RDONLY/RDWR fd inconsistency,
>> even if they escaped the inconsistency by accident so far.
>
> There is a small possibility that somebody was relying on this. But we
> are keeping metacopy=off by default. So this should not be a backward
> compatibility issue. Those who chose to enable it, they can't rely on
> file being copied up on changing file attr.
>

Yeh. that's right.

>> I guess there is only one one to know...
>
> Not sure what does this mean.
>

Deploy the feature and wait for the shouts...

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2017-10-31 13:56 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-25 19:09 [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Vivek Goyal
2017-10-25 19:09 ` [PATCH 01/13] ovl: Put upperdentry if ovl_check_origin() fails Vivek Goyal
2017-10-25 20:08   ` Amir Goldstein
2017-10-25 19:09 ` [PATCH 02/13] ovl: Create origin xattr on copy up for all files Vivek Goyal
2017-10-26  5:31   ` Amir Goldstein
2017-10-26 12:53     ` Vivek Goyal
2017-10-25 19:09 ` [PATCH 03/13] ovl: ovl_check_setxattr() get rid of redundant -EOPNOTSUPP check Vivek Goyal
2017-10-25 19:09 ` [PATCH 04/13] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
2017-10-26  5:39   ` Amir Goldstein
2017-10-26 13:15     ` Vivek Goyal
2017-10-26 13:57       ` Amir Goldstein
2017-10-25 19:09 ` [PATCH 05/13] ovl: During copy up, first copy up metadata and then data Vivek Goyal
2017-10-26  5:42   ` Amir Goldstein
2017-10-26 13:19     ` Vivek Goyal
2017-10-25 19:09 ` [PATCH 06/13] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
2017-10-25 19:09 ` [PATCH 07/13] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
2017-10-26  6:04   ` Amir Goldstein
2017-10-26 13:53     ` Vivek Goyal
2017-10-26 14:14       ` Amir Goldstein
2017-10-26 14:34         ` Vivek Goyal
2017-10-26 16:11           ` Amir Goldstein
2017-10-27  4:28             ` Amir Goldstein
2017-10-25 19:09 ` [PATCH 08/13] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
2017-10-26  6:12   ` Amir Goldstein
2017-10-25 19:09 ` [PATCH 09/13] ovl: Set OVL_UPPERDATA flag during ovl_lookup() Vivek Goyal
2017-10-26  6:19   ` Amir Goldstein
2017-10-26 18:04     ` Vivek Goyal
2017-10-25 19:09 ` [PATCH 10/13] ovl: Return lower dentry if only metadata copy up took place Vivek Goyal
2017-10-25 19:09 ` [PATCH 11/13] ovl: Introduce read/write barriers around metacopy flag update Vivek Goyal
2017-10-26  6:34   ` Amir Goldstein
2017-10-26 17:54     ` Vivek Goyal
2017-10-27  4:35       ` Amir Goldstein
2017-10-27 13:14         ` Vivek Goyal
2017-10-25 19:09 ` [PATCH 12/13] ovl: Do not export metacopy only upper dentry Vivek Goyal
2017-10-26  6:54   ` Amir Goldstein
2017-10-26  6:54     ` Amir Goldstein
2017-10-25 19:09 ` [PATCH 13/13] ovl: Enable metadata only feature Vivek Goyal
2017-10-26  7:07   ` Amir Goldstein
2017-10-26 18:19     ` Vivek Goyal
2017-10-26  7:18 ` [RFC PATCH 00/13][V5] overlayfs: Delayed copy up of data Amir Goldstein
2017-10-27 16:40   ` Vivek Goyal
2017-10-28 14:50     ` Amir Goldstein
2017-10-31 13:39       ` Vivek Goyal
2017-10-31 13:56         ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.