* [PATCH v13 00/28] overlayfs: Delayed copy up of data
@ 2018-03-29 19:38 Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 01/28] ovl: Set OVL_INDEX flag in ovl_get_inode() Vivek Goyal
                   ` (27 more replies)
  0 siblings, 28 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Hi,

Here is V13 of the metadata only copy up patches.

These apply on top of 4.16.0-rc7 kernel + "ovl: Bunch of ovl_lookup() path fixes" patch series.

Patches are also available here.

https://github.com/rhvgoyal/linux/commits/metacopy-v13

Changes from V12:

- Primarily took care of Amir's comments.
- Redirects are now placed on regular files as well, and that opened up a
  bunch of races. Fixed those issues.
- Posted a few ovl_lookup() fixes and rebased these patches on top of
  the fixes. ("ovl: Bunch of ovl_lookup() path fixes")
- A metacopy file with upper is now also marked as OVL_TYPE_MERGED, and the
  rename code makes use of that assumption. (Suggested by Amir.)
- Fixed an issue where OVL_TYPE_ORIGIN was being set even for broken
  hardlinks with metacopy.
- Fixed races w.r.t. the remove redirect path.
- During rename, set a relative redirect if nlink=1; otherwise set an
  absolute redirect on metacopy files.

Any comments and feedback are welcome.

Thanks
Vivek

Vivek Goyal (28):
  ovl: Set OVL_INDEX flag in ovl_get_inode()
  ovl: Initialize ovl_inode->redirect in ovl_get_inode()
  ovl: Rename local variable locked to new_locked
  ovl: Provide a mount option metacopy=on/off for metadata copyup
  ovl: During copy up, first copy up metadata and then data
  ovl: Move the copy up helpers to copy_up.c
  ovl: Copy up only metadata during copy up where it makes sense
  ovl: Add helper ovl_already_copied_up()
  ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  ovl: Copy up meta inode data from lowest data inode
  ovl: Fix ovl_getattr() to get number of blocks from lower
  ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
  ovl: Do not expose metacopy only dentry from d_real()
  ovl: Move some of ovl_nlink_start() functionality in ovl_nlink_prep()
  ovl: Create locked version of ovl_nlink_start() and ovl_nlink_end()
  ovl: During rename lock both source and target ovl_inode
  ovl: Check redirects for metacopy files
  ovl: Treat metacopy dentries as type OVL_PATH_MERGE
  ovl: Do not set dentry type ORIGIN for broken hardlinks
  ovl: Set redirect on metacopy files upon rename
  ovl: Set redirect on upper inode when it is linked
  ovl: Remove redirect when data of a metacopy file is copied up
  ovl: Do not error if REDIRECT XATTR is missing
  ovl: Use out_err instead of out_nomem
  ovl: Re-check redirect xattr during inode initialization
  ovl: Verify a data dentry has been found for metacopy inode
  ovl: Enable metadata only feature

 Documentation/filesystems/overlayfs.txt |  30 +++-
 fs/overlayfs/Kconfig                    |  19 +++
 fs/overlayfs/copy_up.c                  | 165 +++++++++++++++----
 fs/overlayfs/dir.c                      | 166 ++++++++++++++++---
 fs/overlayfs/export.c                   |   7 +-
 fs/overlayfs/inode.c                    | 140 +++++++++++-----
 fs/overlayfs/namei.c                    | 121 ++++++++++----
 fs/overlayfs/overlayfs.h                |  30 +++-
 fs/overlayfs/ovl_entry.h                |   1 +
 fs/overlayfs/super.c                    |  62 ++++++-
 fs/overlayfs/util.c                     | 281 +++++++++++++++++++++++++++++---
 11 files changed, 861 insertions(+), 161 deletions(-)

-- 
2.13.6


* [PATCH v13 01/28] ovl: Set OVL_INDEX flag in ovl_get_inode()
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  4:59   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 02/28] ovl: Initialize ovl_inode->redirect " Vivek Goyal
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

All the inode/ovl_inode initialization should happen in ovl_get_inode().
This is especially useful when multiple dentries point to the same
inode and the inode is already in the cache. In that case, we don't have to
initialize OVL_INDEX again. We don't even take ovl_inode->lock in ovl_lookup(),
and that can run into issues in the case of a shared inode.

So move the OVL_INDEX flag setting into ovl_get_inode().

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/export.c | 3 ---
 fs/overlayfs/inode.c  | 3 +++
 fs/overlayfs/namei.c  | 2 --
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 87bd4148f4fb..9868c173068e 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -311,9 +311,6 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 		return ERR_CAST(inode);
 	}
 
-	if (index)
-		ovl_set_flag(OVL_INDEX, inode);
-
 	dentry = d_find_any_alias(inode);
 	if (!dentry) {
 		dentry = d_alloc_anon(inode->i_sb);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3b1bd469accd..06ef9a7a7d1a 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -760,6 +760,9 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	if (upperdentry && ovl_is_impuredir(upperdentry))
 		ovl_set_flag(OVL_IMPURE, inode);
 
+	if (index)
+		ovl_set_flag(OVL_INDEX, inode);
+
 	/* Check for non-merge dir that may have whiteouts */
 	if (is_dir) {
 		if (((upperdentry && lowerdentry) || numlower > 1) ||
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index f55c00bf9e17..ec92fa2f7d5f 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -1005,8 +1005,6 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			goto out_free_oe;
 
 		OVL_I(inode)->redirect = upperredirect;
-		if (index)
-			ovl_set_flag(OVL_INDEX, inode);
 	}
 
 	revert_creds(old_cred);
-- 
2.13.6


* [PATCH v13 02/28] ovl: Initialize ovl_inode->redirect in ovl_get_inode()
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 01/28] ovl: Set OVL_INDEX flag in ovl_get_inode() Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  4:57   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 03/28] ovl: Rename local variable locked to new_locked Vivek Goyal
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

All the inode/ovl_inode properties should ideally be initialized in
ovl_get_inode(). Otherwise this can be a problem in cases where
multiple dentries point to the same inode (hardlinks, disconnected dentries).

As of now this is not a problem, as redirects are used only for directories,
which don't share inodes. But soon I want to use redirects for regular files
as well, and there it will become an issue.

Hence, move ->redirect initialization into ovl_get_inode().
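
The ownership rule this implies can be sketched in userspace. This is only
a rough model, not the kernel code: struct model_inode, model_cache and
model_get_inode() are hypothetical names, and a linked list stands in for
the inode cache. The get-inode helper takes ownership of the redirect
string, stores it on a newly created inode, and frees it when the inode was
already cached (mirroring the kfree(redirect) on the cached path in the
diff below). Leaving ownership with the caller on allocation failure is a
choice made for this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an overlay inode; not the kernel struct. */
struct model_inode {
	unsigned long key;		/* identifies the underlying real inode */
	bool has_index;			/* models the OVL_INDEX flag */
	char *redirect;			/* owned by the inode once stored */
	struct model_inode *next;
};

/* Hypothetical inode cache: a simple singly linked list. */
static struct model_inode *model_cache;

/*
 * Look up or create the inode for @key.
 *
 * On success this function owns @redirect: it is stored on a newly
 * created inode, or freed if the inode was already in the cache.
 * On allocation failure NULL is returned and the caller keeps
 * ownership of @redirect (an assumption of this sketch).
 */
static struct model_inode *model_get_inode(unsigned long key, bool index,
					   char *redirect)
{
	struct model_inode *inode;

	for (inode = model_cache; inode; inode = inode->next) {
		if (inode->key == key) {
			/* Cache hit: already initialized, drop the duplicate. */
			free(redirect);
			return inode;
		}
	}

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;

	/* All per-inode state is initialized in one place, exactly once. */
	inode->key = key;
	inode->has_index = index;
	inode->redirect = redirect;
	inode->next = model_cache;
	model_cache = inode;
	return inode;
}
```

A second lookup of the same key returns the cached inode untouched, so a
second dentry (a hardlink, say) cannot overwrite state that the first
lookup already set.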

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/export.c    | 2 +-
 fs/overlayfs/inode.c     | 5 ++++-
 fs/overlayfs/namei.c     | 4 +---
 fs/overlayfs/overlayfs.h | 2 +-
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 9868c173068e..e668329f7361 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -305,7 +305,7 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 	if (d_is_dir(upper ?: lower))
 		return ERR_PTR(-EIO);
 
-	inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower);
+	inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL);
 	if (IS_ERR(inode)) {
 		dput(upper);
 		return ERR_CAST(inode);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 06ef9a7a7d1a..2e7ee09c7831 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -704,7 +704,7 @@ static bool ovl_hash_bylower(struct super_block *sb, struct dentry *upper,
 
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			    struct dentry *lowerdentry, struct dentry *index,
-			    unsigned int numlower)
+			    unsigned int numlower, char *redirect)
 {
 	struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
 	struct inode *inode;
@@ -741,6 +741,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			}
 
 			dput(upperdentry);
+			kfree(redirect);
 			goto out;
 		}
 
@@ -763,6 +764,8 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	if (index)
 		ovl_set_flag(OVL_INDEX, inode);
 
+	OVL_I(inode)->redirect = redirect;
+
 	/* Check for non-merge dir that may have whiteouts */
 	if (is_dir) {
 		if (((upperdentry && lowerdentry) || numlower > 1) ||
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index ec92fa2f7d5f..0b325e65864c 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -999,12 +999,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		if (ctr)
 			origin = stack[0].dentry;
 		inode = ovl_get_inode(dentry->d_sb, upperdentry, origin, index,
-				      ctr);
+				      ctr, upperredirect);
 		err = PTR_ERR(inode);
 		if (IS_ERR(inode))
 			goto out_free_oe;
-
-		OVL_I(inode)->redirect = upperredirect;
 	}
 
 	revert_creds(old_cred);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 225ff1171147..a65ce7fd1b6e 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -330,7 +330,7 @@ struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real,
 			       bool is_upper);
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			    struct dentry *lowerdentry, struct dentry *index,
-			    unsigned int numlower);
+			    unsigned int numlower, char *redirect);
 static inline void ovl_copyattr(struct inode *from, struct inode *to)
 {
 	to->i_uid = from->i_uid;
-- 
2.13.6



* [PATCH v13 03/28] ovl: Rename local variable locked to new_locked
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 01/28] ovl: Set OVL_INDEX flag in ovl_get_inode() Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 02/28] ovl: Initialize ovl_inode->redirect " Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  4:58   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

As "locked" tracks the mutex lock state of the "new" dentry inode rather than
the "old" dentry inode, name the variable "new_locked" instead.

Soon I will need to lock the old dentry inode as well and will track that
state in the variable "locked".

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 839709c7803a..8a38f3136547 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -892,7 +892,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		      unsigned int flags)
 {
 	int err;
-	bool locked = false;
+	bool new_locked = false;
 	struct dentry *old_upperdir;
 	struct dentry *new_upperdir;
 	struct dentry *olddentry;
@@ -959,7 +959,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		if (err)
 			goto out_drop_write;
 	} else {
-		err = ovl_nlink_start(new, &locked);
+		err = ovl_nlink_start(new, &new_locked);
 		if (err)
 			goto out_drop_write;
 	}
@@ -1087,7 +1087,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	unlock_rename(new_upperdir, old_upperdir);
 out_revert_creds:
 	revert_creds(old_cred);
-	ovl_nlink_end(new, locked);
+	ovl_nlink_end(new, new_locked);
 out_drop_write:
 	ovl_drop_write(old);
 out:
-- 
2.13.6


* [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (2 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 03/28] ovl: Rename local variable locked to new_locked Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  4:52   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 05/28] ovl: During copy up, first copy up metadata and then data Vivek Goyal
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

By default metadata only copy up is disabled. Provide a mount option so
that users can choose one way or the other.

Also provide a kernel config and module option to enable/disable the
metacopy feature.

The metacopy feature requires redirect_dir=on when upper is present. Otherwise,
it requires at least redirect_dir=follow.

Like the index feature, we verify on mount that the upper root is not being
reused with a different lower root. This hopes to get the configuration
right and detect the copied layers use case. But this does only so
much, as we don't verify all the lowers. So it is possible that a lower is
missing and a later data copy up fails.
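
The option fallbacks added in super.c below can be summarized with a small
userspace model. It is a sketch under stated assumptions: struct
model_config and model_resolve_metacopy() are hypothetical names, the
fields only loosely mirror struct ovl_config, and the several mount-time
checks are collapsed into a single function.

```c
#include <stdbool.h>

/* Hypothetical mirror of the config fields relevant to metacopy. */
struct model_config {
	bool has_upper;		/* an upperdir was given */
	bool upper_xattr;	/* upper fs supports xattrs */
	bool redirect_dir;	/* redirect_dir=on */
	bool redirect_follow;	/* redirect_dir=follow (or on) */
	bool nfs_export;	/* nfs_export=on */
	bool metacopy;		/* metacopy=on requested */
};

/* Apply the fallbacks described in the commit message, in mount order. */
static void model_resolve_metacopy(struct model_config *c)
{
	if (c->has_upper) {
		/* Without xattr support on upper, metacopy cannot work. */
		if (!c->upper_xattr)
			c->metacopy = false;
		/* With an upper layer, redirect_dir=on is required. */
		if (c->metacopy && !c->redirect_dir)
			c->metacopy = false;
	} else if (c->metacopy && !c->redirect_follow) {
		/* Lower-only mount: following redirects is enough. */
		c->metacopy = false;
	}

	/* If metacopy survived, it wins over NFS export. */
	if (c->metacopy && c->nfs_export)
		c->nfs_export = false;
}
```

In this model, the nfs_export check comes last, so NFS export is only
turned off when metacopy is still enabled after the earlier fallbacks.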

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 Documentation/filesystems/overlayfs.txt | 30 ++++++++++++++++++++++++-
 fs/overlayfs/Kconfig                    | 19 ++++++++++++++++
 fs/overlayfs/ovl_entry.h                |  1 +
 fs/overlayfs/super.c                    | 40 ++++++++++++++++++++++++++++++++-
 4 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 6ea1e64d1464..b7720e61973c 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -249,6 +249,30 @@ rightmost one and going left.  In the above example lower1 will be the
 top, lower2 the middle and lower3 the bottom layer.
 
 
+Metadata only copyup
+--------------------
+
+When metadata only copy up feature is enabled, overlayfs will only copy
+up metadata (as opposed to whole file), when a metadata specific operation
+like chown/chmod is performed. Full file will be copied up later when
+file is opened for WRITE operation.
+
+IOW, this is delayed data copy up operation and data is copied up when
+there is a need to actually modify data.
+
+There are multiple ways to enable/disable this feature. A config option
+CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature
+by default. Or one can enable/disable it at module load time with module
+parameter metacopy=on/off. Lastly, there is also a per mount option
+metacopy=on/off to enable/disable this feature per mount.
+
+Do not use metacopy=on with untrusted upper/lower directories. Otherwise
+it is possible that an attacker can create a handcrafted file with
+appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower
+pointed by REDIRECT. This should not be possible on local system as setting
+"trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
+for untrusted layers like from a pen drive.
+
 Sharing and copying layers
 --------------------------
 
@@ -267,7 +291,7 @@ though it will not result in a crash or deadlock.
 Mounting an overlay using an upper layer path, where the upper layer path
 was previously used by another mounted overlay in combination with a
 different lower layer path, is allowed, unless the "inodes index" feature
-is enabled.
+or "metadata only copyup" feature is enabled.
 
 With the "inodes index" feature, on the first time mount, an NFS file
 handle of the lower layer root directory, along with the UUID of the lower
@@ -280,6 +304,10 @@ lower root origin, mount will fail with ESTALE.  An overlayfs mount with
 does not support NFS export, lower filesystem does not have a valid UUID or
 if the upper filesystem does not support extended attributes.
 
+For "metadata only copyup" feature there is no verification mechanism at
+mount time. So if same upper is mounted with different set of lower, mount
+probably will succeed but expect the unexpected later on. So don't do it.
+
 It is quite a common practice to copy overlay layers to a different
 directory tree on the same or different underlying filesystem, and even
 to a different machine.  With the "inodes index" feature, trying to mount
diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index ce6ff5a0a6e4..7d9650c9c075 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -86,3 +86,22 @@ config OVERLAY_FS_NFS_EXPORT
 	  case basis with the "nfs_export=on" mount option.
 
 	  Say N unless you fully understand the consequences.
+
+config OVERLAY_FS_METACOPY
+	bool "Overlayfs: turn on metadata only copy up feature by default"
+	depends on OVERLAY_FS
+	depends on !OVERLAY_FS_NFS_EXPORT
+	select OVERLAY_FS_REDIRECT_DIR
+	help
+	  If this config option is enabled then overlay filesystems will
+	  copy up only metadata where appropriate and data copy up will
+	  happen when a file is opened for WRITE operation. It is still
+	  possible to turn off this feature globally with the "metacopy=off"
+	  module option or on a filesystem instance basis with the
+	  "metacopy=off" mount option.
+
+	  Note, that this feature is not backward compatible.  That is,
+	  mounting an overlay which has metacopy only inodes on a kernel
+	  that doesn't support this feature will have unexpected results.
+
+	  If unsure, say N.
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index bfef6edcc111..7dc55628080d 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -18,6 +18,7 @@ struct ovl_config {
 	const char *redirect_mode;
 	bool index;
 	bool nfs_export;
+	bool metacopy;
 };
 
 struct ovl_layer {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 7c24619ae7fc..ddff54fa9e85 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -58,6 +58,11 @@ static void ovl_entry_stack_free(struct ovl_entry *oe)
 		dput(oe->lowerstack[i].dentry);
 }
 
+static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
+module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
+MODULE_PARM_DESC(ovl_metacopy_def,
+		 "Default to on or off for the metadata only copy up feature");
+
 static void ovl_dentry_release(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
@@ -350,6 +355,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	if (ofs->config.nfs_export != ovl_nfs_export_def)
 		seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ?
 						"on" : "off");
+	if (ofs->config.metacopy != ovl_metacopy_def)
+		seq_printf(m, ",metacopy=%s",
+			   ofs->config.metacopy ? "on" : "off");
 	return 0;
 }
 
@@ -384,6 +392,8 @@ enum {
 	OPT_INDEX_OFF,
 	OPT_NFS_EXPORT_ON,
 	OPT_NFS_EXPORT_OFF,
+	OPT_METACOPY_ON,
+	OPT_METACOPY_OFF,
 	OPT_ERR,
 };
 
@@ -397,6 +407,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_INDEX_OFF,			"index=off"},
 	{OPT_NFS_EXPORT_ON,		"nfs_export=on"},
 	{OPT_NFS_EXPORT_OFF,		"nfs_export=off"},
+	{OPT_METACOPY_ON,		"metacopy=on"},
+	{OPT_METACOPY_OFF,		"metacopy=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -511,6 +523,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->nfs_export = false;
 			break;
 
+		case OPT_METACOPY_ON:
+			config->metacopy = true;
+			break;
+
+		case OPT_METACOPY_OFF:
+			config->metacopy = false;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;
@@ -993,7 +1013,8 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
 	if (err) {
 		ofs->noxattr = true;
 		ofs->config.index = false;
-		pr_warn("overlayfs: upper fs does not support xattr, falling back to index=off.\n");
+		ofs->config.metacopy = false;
+		pr_warn("overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off.\n");
 		err = 0;
 	} else {
 		vfs_removexattr(ofs->workdir, OVL_XATTR_OPAQUE);
@@ -1012,6 +1033,11 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
 		ofs->config.nfs_export = false;
 	}
 
+	/* metacopy feature with upper requires redirect_dir=on */
+	if (ofs->config.metacopy && !ofs->config.redirect_dir) {
+		pr_warn("overlayfs: metadata only copyup requires \"redirect_dir=on\", falling back to metacopy=off.\n");
+		ofs->config.metacopy = false;
+	}
 out:
 	mnt_drop_write(mnt);
 	return err;
@@ -1188,6 +1214,12 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
 		ofs->config.nfs_export = false;
 	}
 
+	if (!ofs->config.upperdir && ofs->config.metacopy &&
+	    !ofs->config.redirect_follow) {
+		ofs->config.metacopy = false;
+		pr_warn("overlayfs: metadata only copyup requires \"redirect_dir=follow\" on non-upper mount, falling back to metacopy=off.\n");
+	}
+
 	err = -ENOMEM;
 	stack = kcalloc(stacklen, sizeof(struct path), GFP_KERNEL);
 	if (!stack)
@@ -1263,6 +1295,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	ofs->config.index = ovl_index_def;
 	ofs->config.nfs_export = ovl_nfs_export_def;
+	ofs->config.metacopy = ovl_metacopy_def;
 	err = ovl_parse_opt((char *) data, &ofs->config);
 	if (err)
 		goto out_err;
@@ -1331,6 +1364,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		}
 	}
 
+	if (ofs->config.metacopy && ofs->config.nfs_export) {
+		pr_warn("overlayfs: Metadata copy up requires NFS export disabled, falling back to nfs_export=off.\n");
+		ofs->config.nfs_export = false;
+	}
+
 	if (ofs->config.nfs_export)
 		sb->s_export_op = &ovl_export_operations;
 
-- 
2.13.6



* [PATCH v13 05/28] ovl: During copy up, first copy up metadata and then data
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (3 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 06/28] ovl: Move the copy up helpers to copy_up.c Vivek Goyal
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Just a little re-ordering of code. This helps with next patch where after
copying up metadata, we skip data copying step, if needed.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index d855f508fa20..2de4ab3254a4 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -536,28 +536,10 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 {
 	int err;
 
-	if (S_ISREG(c->stat.mode)) {
-		struct path upperpath;
-
-		ovl_path_upper(c->dentry, &upperpath);
-		BUG_ON(upperpath.dentry != NULL);
-		upperpath.dentry = temp;
-
-		err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
-		if (err)
-			return err;
-	}
-
 	err = ovl_copy_xattr(c->lowerpath.dentry, temp);
 	if (err)
 		return err;
 
-	inode_lock(temp->d_inode);
-	err = ovl_set_attr(temp, &c->stat);
-	inode_unlock(temp->d_inode);
-	if (err)
-		return err;
-
 	/*
 	 * Store identifier of lower inode in upper inode xattr to
 	 * allow lookup of the copy up origin inode.
@@ -571,7 +553,23 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
-	return 0;
+	if (S_ISREG(c->stat.mode)) {
+		struct path upperpath;
+
+		ovl_path_upper(c->dentry, &upperpath);
+		BUG_ON(upperpath.dentry != NULL);
+		upperpath.dentry = temp;
+
+		err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
+		if (err)
+			return err;
+	}
+
+	inode_lock(temp->d_inode);
+	err = ovl_set_attr(temp, &c->stat);
+	inode_unlock(temp->d_inode);
+
+	return err;
 }
 
 static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
-- 
2.13.6



* [PATCH v13 06/28] ovl: Move the copy up helpers to copy_up.c
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (4 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 05/28] ovl: During copy up, first copy up metadata and then data Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 07/28] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Right now two copy up helpers are in inode.c. Amir suggested it might
be better to move these to copy_up.c.

There will be one more related function, which will come in a later patch.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c   | 32 ++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c     | 32 --------------------------------
 fs/overlayfs/overlayfs.h |  2 +-
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 2de4ab3254a4..98403b1d40c2 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -826,6 +826,38 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 	return err;
 }
 
+static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
+{
+	/* Copy up of disconnected dentry does not set upper alias */
+	if (ovl_dentry_upper(dentry) &&
+	    (ovl_dentry_has_upper_alias(dentry) ||
+	     (dentry->d_flags & DCACHE_DISCONNECTED)))
+		return false;
+
+	if (special_file(d_inode(dentry)->i_mode))
+		return false;
+
+	if (!(OPEN_FMODE(flags) & FMODE_WRITE) && !(flags & O_TRUNC))
+		return false;
+
+	return true;
+}
+
+int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
+{
+	int err = 0;
+
+	if (ovl_open_need_copy_up(dentry, file_flags)) {
+		err = ovl_want_write(dentry);
+		if (!err) {
+			err = ovl_copy_up_flags(dentry, file_flags);
+			ovl_drop_write(dentry);
+		}
+	}
+
+	return err;
+}
+
 int ovl_copy_up(struct dentry *dentry)
 {
 	return ovl_copy_up_flags(dentry, 0);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 2e7ee09c7831..3991a890b464 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -349,38 +349,6 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
 	return acl;
 }
 
-static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
-{
-	/* Copy up of disconnected dentry does not set upper alias */
-	if (ovl_dentry_upper(dentry) &&
-	    (ovl_dentry_has_upper_alias(dentry) ||
-	     (dentry->d_flags & DCACHE_DISCONNECTED)))
-		return false;
-
-	if (special_file(d_inode(dentry)->i_mode))
-		return false;
-
-	if (!(OPEN_FMODE(flags) & FMODE_WRITE) && !(flags & O_TRUNC))
-		return false;
-
-	return true;
-}
-
-int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
-{
-	int err = 0;
-
-	if (ovl_open_need_copy_up(dentry, file_flags)) {
-		err = ovl_want_write(dentry);
-		if (!err) {
-			err = ovl_copy_up_flags(dentry, file_flags);
-			ovl_drop_write(dentry);
-		}
-	}
-
-	return err;
-}
-
 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags)
 {
 	struct dentry *alias;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index a65ce7fd1b6e..7c57820524f4 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -321,7 +321,6 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 		  void *value, size_t size);
 ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size);
 struct posix_acl *ovl_get_acl(struct inode *inode, int type);
-int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
@@ -359,6 +358,7 @@ int ovl_cleanup(struct inode *dir, struct dentry *dentry);
 /* copy_up.c */
 int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_flags(struct dentry *dentry, int flags);
+int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
 struct ovl_fh *ovl_encode_fh(struct dentry *real, bool is_upper);
-- 
2.13.6


* [PATCH v13 07/28] ovl: Copy up only metadata during copy up where it makes sense
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (5 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 06/28] ovl: Move the copy up helpers to copy_up.c Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 08/28] ovl: Add helper ovl_already_copied_up() Vivek Goyal
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

If it makes sense to copy up only metadata during copy up, do it. This
is done for regular files which are not opened for WRITE and for which the
origin is being saved.

Right now ->metacopy is always set to 0. The last patch in the series will
remove the hard coded statement and enable the metacopy feature.

Currently ovl_set_origin() returns success even if the specified xattr could
not be set due to -EOPNOTSUPP returned by upper. IOW, the ovl_set_origin()
operation is optional and the copy up operation continues.

With metadata copy up, ovl_set_origin() can't be optional. We need to know
whether origin could be set or not. If it could not be set, then either disable
metacopy or abort the copy up operation. I have taken the approach of
disabling metacopy for this inode.

Normally we should not run into this path, as metacopy will not be enabled
if upper does not support xattr. This check is done during mount. This
path can only be hit if the checks during mount pass but at some later point
upper still returns -EOPNOTSUPP. So it's a safety mechanism to handle
that unexpected -EOPNOTSUPP.
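
For illustration, the decision can be modelled in userspace roughly as
follows. This is a sketch, not the kernel helper: model_need_meta_copy_up()
is a hypothetical name, the is_regular flag replaces the S_ISREG(mode)
check, and the kernel's OPEN_FMODE(flags) & FMODE_WRITE test is
approximated with the userspace access mode bits. It also models the
feature as already enabled, whereas the helper in this patch still
returns false unconditionally.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Decide whether a copy up may skip the data and copy only metadata.
 *
 * @metacopy_enabled: models ofs->config.metacopy
 * @is_regular:       models S_ISREG(mode)
 * @flags:            open(2) flags, or 0 when the copy up was not
 *                    triggered by an open (e.g. chown/chmod)
 */
static bool model_need_meta_copy_up(bool metacopy_enabled, bool is_regular,
				    int flags)
{
	if (!metacopy_enabled)
		return false;

	/* Only regular files have data worth deferring. */
	if (!is_regular)
		return false;

	/* Opening for write or truncating needs the data right away. */
	if (flags && (((flags & O_ACCMODE) != O_RDONLY) || (flags & O_TRUNC)))
		return false;

	return true;
}
```

A metadata operation such as chown maps to flags == 0 here and qualifies
for a metadata only copy up, while an open with O_WRONLY, O_RDWR or
O_TRUNC forces a full copy up.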

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 98403b1d40c2..45e396a0fd6a 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -232,6 +232,26 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
+static bool ovl_need_meta_copy_up(struct dentry *dentry, umode_t mode,
+				  int flags)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+
+	/* TODO: Will enable metacopy in last patch of series */
+	return false;
+
+	if (!ofs->config.metacopy)
+		return false;
+
+	if (!S_ISREG(mode))
+		return false;
+
+	if (flags && ((OPEN_FMODE(flags) & FMODE_WRITE) || (flags & O_TRUNC)))
+		return false;
+
+	return true;
+}
+
 struct ovl_fh *ovl_encode_fh(struct dentry *real, bool is_upper)
 {
 	struct ovl_fh *fh;
@@ -416,6 +436,7 @@ struct ovl_copy_up_ctx {
 	bool tmpfile;
 	bool origin;
 	bool indexed;
+	bool metacopy;
 };
 
 static int ovl_link_up(struct ovl_copy_up_ctx *c)
@@ -553,7 +574,7 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
-	if (S_ISREG(c->stat.mode)) {
+	if (S_ISREG(c->stat.mode) && !c->metacopy) {
 		struct path upperpath;
 
 		ovl_path_upper(c->dentry, &upperpath);
@@ -729,6 +750,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	if (err)
 		return err;
 
+	ctx.metacopy = ovl_need_meta_copy_up(dentry, ctx.stat.mode, flags);
+
 	if (parent) {
 		ovl_path_upper(parent, &parentpath);
 		ctx.destdir = parentpath.dentry;
-- 
2.13.6



* [PATCH v13 08/28] ovl: Add helper ovl_already_copied_up()
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (6 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 07/28] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 09/28] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

There are a couple of places where we need to know if a file is already
copied up (in a lockless manner). Right now it's open coded and there are
only two conditions to check. Soon this patch series will introduce another
condition to check and Amir wants to introduce one more. So introduce
a helper to check this so that the code is easier to read.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 20 ++------------------
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 26 +++++++++++++++++++++++++-
 3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 45e396a0fd6a..8d9af7fdc8a4 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -810,21 +810,7 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 		struct dentry *next;
 		struct dentry *parent = NULL;
 
-		/*
-		 * Check if copy-up has happened as well as for upper alias (in
-		 * case of hard links) is there.
-		 *
-		 * Both checks are lockless:
-		 *  - false negatives: will recheck under oi->lock
-		 *  - false positives:
-		 *    + ovl_dentry_upper() uses memory barriers to ensure the
-		 *      upper dentry is up-to-date
-		 *    + ovl_dentry_has_upper_alias() relies on locking of
-		 *      upper parent i_rwsem to prevent reordering copy-up
-		 *      with rename.
-		 */
-		if (ovl_dentry_upper(dentry) &&
-		    (ovl_dentry_has_upper_alias(dentry) || disconnected))
+		if (ovl_already_copied_up(dentry))
 			break;
 
 		next = dget(dentry);
@@ -852,9 +838,7 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
 {
 	/* Copy up of disconnected dentry does not set upper alias */
-	if (ovl_dentry_upper(dentry) &&
-	    (ovl_dentry_has_upper_alias(dentry) ||
-	     (dentry->d_flags & DCACHE_DISCONNECTED)))
+	if (ovl_already_copied_up(dentry))
 		return false;
 
 	if (special_file(d_inode(dentry)->i_mode))
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 7c57820524f4..c7c9d717d546 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -242,6 +242,7 @@ bool ovl_is_whiteout(struct dentry *dentry);
 struct file *ovl_path_open(struct path *path, int flags);
 int ovl_copy_up_start(struct dentry *dentry);
 void ovl_copy_up_end(struct dentry *dentry);
+bool ovl_already_copied_up(struct dentry *dentry);
 bool ovl_check_origin_xattr(struct dentry *dentry);
 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
 int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 930784a26623..4e537eeb594e 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -339,13 +339,37 @@ struct file *ovl_path_open(struct path *path, int flags)
 	return dentry_open(path, flags | O_NOATIME, current_cred());
 }
 
+bool ovl_already_copied_up(struct dentry *dentry)
+{
+	bool disconnected = dentry->d_flags & DCACHE_DISCONNECTED;
+
+	/*
+	 * Check if copy-up has happened as well as for upper alias (in
+	 * case of hard links) is there.
+	 *
+	 * Both checks are lockless:
+	 *  - false negatives: will recheck under oi->lock
+	 *  - false positives:
+	 *    + ovl_dentry_upper() uses memory barriers to ensure the
+	 *      upper dentry is up-to-date
+	 *    + ovl_dentry_has_upper_alias() relies on locking of
+	 *      upper parent i_rwsem to prevent reordering copy-up
+	 *      with rename.
+	 */
+	if (ovl_dentry_upper(dentry) &&
+	    (ovl_dentry_has_upper_alias(dentry) || disconnected))
+		return true;
+
+	return false;
+}
+
 int ovl_copy_up_start(struct dentry *dentry)
 {
 	struct ovl_inode *oi = OVL_I(d_inode(dentry));
 	int err;
 
 	err = mutex_lock_interruptible(&oi->lock);
-	if (!err && ovl_dentry_has_upper_alias(dentry)) {
+	if (!err && ovl_already_copied_up(dentry)) {
 		err = 1; /* Already copied up */
 		mutex_unlock(&oi->lock);
 	}
-- 
2.13.6



* [PATCH v13 09/28] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (7 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 08/28] ovl: Add helper ovl_already_copied_up() Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-04-11 15:10   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry Vivek Goyal
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Now we will have the capability to have upper inodes which might be only
metadata copy up and data is still on lower inode. So add a new xattr
OVL_XATTR_METACOPY to distinguish between two cases.

Presence of OVL_XATTR_METACOPY reflects that file has been copied up
metadata only and data will be copied up later from lower origin.
So this xattr is set when a metadata copy takes place and cleared when
data copy takes place.

We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
whether ovl inode has data or not (as opposed to metadata only copy up).

If a file is copied up metadata only and later when same file is opened
for WRITE, then data copy up takes place. We copy up data, remove METACOPY
xattr and then set the UPPERDATA flag in ovl_inode->flags. While all
these operations happen with oi->lock held, read side of oi->flags can be
lockless. That is another thread on another cpu can check if UPPERDATA
flag is set or not.

So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
another cpu sees UPPERDATA flag set, then it should be guaranteed that
effects of data copy up and remove xattr operations are also visible.

For example.

	CPU1				CPU2
ovl_d_real()				acquire(oi->lock)
 ovl_open_maybe_copy_up()                ovl_copy_up_data()
  ovl_open_need_copy_up()		 vfs_removexattr()
   ovl_already_copied_up()
    ovl_dentry_needs_data_copy_up()	 ovl_set_flag(OVL_UPPERDATA)
     ovl_test_flag(OVL_UPPERDATA)       release(oi->lock)

Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
CPU1 perceives the effects of setting UPPERDATA flag but not the effects
of preceding operations (e.g. an upper that is not fully copied up), it
will be a problem.

Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
and smp_rmb() on UPPERDATA flag test operation.

Maybe some other lock or barrier is already covering it. But I am not sure
what that is, or whether it is obvious enough that we will not break it in
the future.

Hence I am trying to be safe here and introducing barriers explicitly for
the UPPERDATA flag/bit.
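
As a rough userspace analogy of the pairing described above (not the
kernel code), the same publish pattern can be written with C11 fences.
struct demo_inode, demo_set_upperdata() and demo_has_upperdata() are
hypothetical names. A release fence stands in for smp_wmb() and an
acquire fence for smp_rmb(); the C11 fences are somewhat stronger than
the kernel barriers, so this only illustrates the ordering idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an overlay inode; not the kernel's struct. */
struct demo_inode {
	int data_bytes;          /* stands in for "upper data fully copied" */
	bool has_metacopy_xattr; /* stands in for the METACOPY xattr */
	atomic_bool upperdata;   /* stands in for the OVL_UPPERDATA bit */
};

/* Writer side: finish the copy, drop the marker, then publish the flag. */
static void demo_set_upperdata(struct demo_inode *oi, int bytes)
{
	oi->data_bytes = bytes;
	oi->has_metacopy_xattr = false;
	/* Plays the role of smp_wmb(): earlier stores before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&oi->upperdata, true, memory_order_relaxed);
}

/* Reader side: lockless test of the flag. */
static bool demo_has_upperdata(struct demo_inode *oi)
{
	if (!atomic_load_explicit(&oi->upperdata, memory_order_relaxed))
		return false;
	/* Plays the role of smp_rmb(): later loads after the flag load. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

A reader that gets true from demo_has_upperdata() is then guaranteed to
also observe the stores the writer made before publishing the flag.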

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c   | 56 ++++++++++++++++++++++++++++++----
 fs/overlayfs/dir.c       |  1 +
 fs/overlayfs/overlayfs.h | 18 +++++++++--
 fs/overlayfs/super.c     |  1 +
 fs/overlayfs/util.c      | 78 +++++++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 143 insertions(+), 11 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 8d9af7fdc8a4..9801ae7baa5d 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -195,6 +195,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 	return error;
 }
 
+static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
+{
+	struct iattr attr = {
+		.ia_valid = ATTR_SIZE,
+		.ia_size = stat->size,
+	};
+
+	return notify_change(upperdentry, &attr, NULL);
+}
+
 static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
 {
 	struct iattr attr = {
@@ -586,8 +596,18 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
+	if (c->metacopy) {
+		err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
+					 NULL, 0, -EOPNOTSUPP);
+		if (err)
+			return err;
+	}
+
 	inode_lock(temp->d_inode);
-	err = ovl_set_attr(temp, &c->stat);
+	if (c->metacopy)
+		err = ovl_set_size(temp, &c->stat);
+	if (!err)
+		err = ovl_set_attr(temp, &c->stat);
 	inode_unlock(temp->d_inode);
 
 	return err;
@@ -625,6 +645,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
 	if (err)
 		goto out_cleanup;
 
+	if (!c->metacopy)
+		ovl_set_upperdata(d_inode(c->dentry));
 	inode = d_inode(c->dentry);
 	ovl_inode_update(inode, newdentry);
 	if (S_ISDIR(inode->i_mode))
@@ -729,6 +751,28 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)
 	return err;
 }
 
+/* Copy up data of an inode which was copied up metadata only in the past. */
+static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
+{
+	struct path upperpath;
+	int err;
+
+	ovl_path_upper(c->dentry, &upperpath);
+	if (WARN_ON(upperpath.dentry == NULL))
+		return -EIO;
+
+	err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
+	if (err)
+		return err;
+
+	err = vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
+	if (err)
+		return err;
+
+	ovl_set_upperdata(d_inode(c->dentry));
+	return err;
+}
+
 static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			   int flags)
 {
@@ -775,7 +819,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	}
 	ovl_do_check_copy_up(ctx.lowerpath.dentry);
 
-	err = ovl_copy_up_start(dentry);
+	err = ovl_copy_up_start(dentry, flags);
 	/* err < 0: interrupted, err > 0: raced with another copy-up */
 	if (unlikely(err)) {
 		if (err > 0)
@@ -785,6 +829,8 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			err = ovl_do_copy_up(&ctx);
 		if (!err && parent && !ovl_dentry_has_upper_alias(dentry))
 			err = ovl_link_up(&ctx);
+		if (!err && ovl_dentry_needs_data_copy_up_locked(dentry, flags))
+			err = ovl_copy_up_meta_inode_data(&ctx);
 		ovl_copy_up_end(dentry);
 	}
 	do_delayed_call(&done);
@@ -810,7 +856,7 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 		struct dentry *next;
 		struct dentry *parent = NULL;
 
-		if (ovl_already_copied_up(dentry))
+		if (ovl_already_copied_up(dentry, flags))
 			break;
 
 		next = dget(dentry);
@@ -838,13 +884,13 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
 {
 	/* Copy up of disconnected dentry does not set upper alias */
-	if (ovl_already_copied_up(dentry))
+	if (ovl_already_copied_up(dentry, flags))
 		return false;
 
 	if (special_file(d_inode(dentry)->i_mode))
 		return false;
 
-	if (!(OPEN_FMODE(flags) & FMODE_WRITE) && !(flags & O_TRUNC))
+	if (!ovl_open_flags_need_copy_up(flags))
 		return false;
 
 	return true;
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 8a38f3136547..7617a03acc30 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -189,6 +189,7 @@ static void ovl_instantiate(struct dentry *dentry, struct inode *inode,
 	ovl_dentry_version_inc(dentry->d_parent, false);
 	ovl_dentry_set_upper_alias(dentry);
 	if (!hardlink) {
+		ovl_set_upperdata(inode);
 		ovl_inode_update(inode, newdentry);
 		ovl_copyattr(newdentry->d_inode, inode);
 	} else {
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c7c9d717d546..8608f3ec8f44 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -28,6 +28,7 @@ enum ovl_path_type {
 #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure"
 #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
 #define OVL_XATTR_UPPER OVL_XATTR_PREFIX "upper"
+#define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
 
 enum ovl_inode_flag {
 	/* Pure upper dir that may contain non pure upper entries */
@@ -35,6 +36,7 @@ enum ovl_inode_flag {
 	/* Non-merge dir that may contain whiteout entries */
 	OVL_WHITEOUTS,
 	OVL_INDEX,
+	OVL_UPPERDATA,
 };
 
 enum ovl_entry_flag {
@@ -196,6 +198,14 @@ static inline struct dentry *ovl_do_tmpfile(struct dentry *dentry, umode_t mode)
 	return ret;
 }
 
+static inline bool ovl_open_flags_need_copy_up(int flags)
+{
+	if (!flags)
+		return false;
+
+	return ((OPEN_FMODE(flags) & FMODE_WRITE) || (flags & O_TRUNC));
+}
+
 /* util.c */
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
@@ -230,6 +240,10 @@ bool ovl_dentry_is_whiteout(struct dentry *dentry);
 void ovl_dentry_set_opaque(struct dentry *dentry);
 bool ovl_dentry_has_upper_alias(struct dentry *dentry);
 void ovl_dentry_set_upper_alias(struct dentry *dentry);
+bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
+bool ovl_dentry_needs_data_copy_up_locked(struct dentry *dentry, int flags);
+bool ovl_has_upperdata(struct dentry *dentry);
+void ovl_set_upperdata(struct inode *inode);
 bool ovl_redirect_dir(struct super_block *sb);
 const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
@@ -240,9 +254,9 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
 struct file *ovl_path_open(struct path *path, int flags);
-int ovl_copy_up_start(struct dentry *dentry);
+int ovl_copy_up_start(struct dentry *dentry, int flags);
 void ovl_copy_up_end(struct dentry *dentry);
-bool ovl_already_copied_up(struct dentry *dentry);
+bool ovl_already_copied_up(struct dentry *dentry, int flags);
 bool ovl_check_origin_xattr(struct dentry *dentry);
 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name);
 int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index ddff54fa9e85..f77acb8d632a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1398,6 +1398,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	/* Root is always merge -> can have whiteouts */
 	ovl_set_flag(OVL_WHITEOUTS, d_inode(root_dentry));
 	ovl_dentry_set_flag(OVL_E_CONNECTED, root_dentry);
+	ovl_set_upperdata(d_inode(root_dentry));
 	ovl_inode_init(d_inode(root_dentry), upperpath.dentry,
 		       ovl_dentry_lower(root_dentry));
 
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 4e537eeb594e..7929cc872df6 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -256,6 +256,62 @@ void ovl_dentry_set_upper_alias(struct dentry *dentry)
 	ovl_dentry_set_flag(OVL_E_UPPER_ALIAS, dentry);
 }
 
+static bool ovl_should_check_upperdata(struct dentry *dentry)
+{
+	if (!S_ISREG(d_inode(dentry)->i_mode))
+		return false;
+
+	if (!ovl_dentry_lower(dentry))
+		return false;
+
+	return true;
+}
+
+bool ovl_has_upperdata(struct dentry *dentry)
+{
+	if (!ovl_should_check_upperdata(dentry))
+		return true;
+
+	if (!ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)))
+		return false;
+	/*
+	 * Pairs with smp_wmb() in ovl_set_upperdata(). Main user of
+	 * ovl_has_upperdata() is ovl_copy_up_meta_inode_data(). Make sure
+	 * if setting of OVL_UPPERDATA is visible, then effects of writes
+	 * before that are visible too.
+	 */
+	smp_rmb();
+	return true;
+}
+
+void ovl_set_upperdata(struct inode *inode)
+{
+	/*
+	 * Pairs with smp_rmb() in ovl_has_upperdata(). Make sure
+	 * if OVL_UPPERDATA flag is visible, then effects of write operations
+	 * before it are visible as well.
+	 */
+	smp_wmb();
+	ovl_set_flag(OVL_UPPERDATA, inode);
+}
+
+/* Caller should hold ovl_inode->lock */
+bool ovl_dentry_needs_data_copy_up_locked(struct dentry *dentry, int flags)
+{
+	if (!ovl_open_flags_need_copy_up(flags))
+		return false;
+
+	return !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry));
+}
+
+bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags)
+{
+	if (!ovl_open_flags_need_copy_up(flags))
+		return false;
+
+	return !ovl_has_upperdata(dentry);
+}
+
 bool ovl_redirect_dir(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
@@ -339,7 +395,20 @@ struct file *ovl_path_open(struct path *path, int flags)
 	return dentry_open(path, flags | O_NOATIME, current_cred());
 }
 
-bool ovl_already_copied_up(struct dentry *dentry)
+/* Caller should hold ovl_inode->lock */
+static bool ovl_already_copied_up_locked(struct dentry *dentry, int flags)
+{
+	bool disconnected = dentry->d_flags & DCACHE_DISCONNECTED;
+
+	if (ovl_dentry_upper(dentry) &&
+	    (ovl_dentry_has_upper_alias(dentry) || disconnected) &&
+	    !ovl_dentry_needs_data_copy_up_locked(dentry, flags))
+		return true;
+
+	return false;
+}
+
+bool ovl_already_copied_up(struct dentry *dentry, int flags)
 {
 	bool disconnected = dentry->d_flags & DCACHE_DISCONNECTED;
 
@@ -357,19 +426,20 @@ bool ovl_already_copied_up(struct dentry *dentry)
 	 *      with rename.
 	 */
 	if (ovl_dentry_upper(dentry) &&
-	    (ovl_dentry_has_upper_alias(dentry) || disconnected))
+	    (ovl_dentry_has_upper_alias(dentry) || disconnected) &&
+	    !ovl_dentry_needs_data_copy_up(dentry, flags))
 		return true;
 
 	return false;
 }
 
-int ovl_copy_up_start(struct dentry *dentry)
+int ovl_copy_up_start(struct dentry *dentry, int flags)
 {
 	struct ovl_inode *oi = OVL_I(d_inode(dentry));
 	int err;
 
 	err = mutex_lock_interruptible(&oi->lock);
-	if (!err && ovl_already_copied_up(dentry)) {
+	if (!err && ovl_already_copied_up_locked(dentry, flags)) {
 		err = 1; /* Already copied up */
 		mutex_unlock(&oi->lock);
 	}
-- 
2.13.6



* [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (8 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 09/28] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  5:49   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 11/28] ovl: Copy up meta inode data from lowest data inode Vivek Goyal
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
It also allows for presence of metacopy dentries in lower layer.

During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
set OVL_UPPERDATA bit in flags.

We don't support the metacopy feature with nfs_export. So in nfs_export
code, we set the OVL_UPPERDATA flag unconditionally if an upper inode exists.

Do not follow metacopy origin if we find a metacopy only inode and metacopy
feature is not enabled for that mount. Like redirect, this can have security
implications where an attacker could hand craft upper and try to gain
access to file on lower which it should not have to begin with.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/export.c    |  3 ++
 fs/overlayfs/inode.c     |  6 +++-
 fs/overlayfs/namei.c     | 90 ++++++++++++++++++++++++++++++++++++++++++------
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 22 ++++++++++++
 5 files changed, 110 insertions(+), 12 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index e668329f7361..1c233096e59c 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -311,6 +311,9 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 		return ERR_CAST(inode);
 	}
 
+	if (upper)
+		ovl_set_flag(OVL_UPPERDATA, inode);
+
 	dentry = d_find_any_alias(inode);
 	if (!dentry) {
 		dentry = d_alloc_anon(inode->i_sb);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3991a890b464..e1dbfed0c449 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -677,7 +677,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
 	struct inode *inode;
 	bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
-	bool is_dir;
+	bool is_dir, metacopy = false;
 
 	if (!realinode)
 		realinode = d_inode(lowerdentry);
@@ -732,6 +732,10 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	if (index)
 		ovl_set_flag(OVL_INDEX, inode);
 
+	metacopy = ovl_check_metacopy_xattr(upperdentry ?: lowerdentry);
+	if (upperdentry && !metacopy)
+		ovl_set_flag(OVL_UPPERDATA, inode);
+
 	OVL_I(inode)->redirect = redirect;
 
 	/* Check for non-merge dir that may have whiteouts */
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 0b325e65864c..1dba89e9543f 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -24,6 +24,7 @@ struct ovl_lookup_data {
 	bool stop;
 	bool last;
 	char *redirect;
+	bool metacopy;
 };
 
 static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
@@ -252,9 +253,13 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 		goto put_and_out;
 	}
 	if (!d_can_lookup(this)) {
-		d->stop = true;
 		if (d->is_dir)
 			goto put_and_out;
+		err = ovl_check_metacopy_xattr(this);
+		if (err < 0)
+			goto out_err;
+		d->stop = !err;
+		d->metacopy = !!err;
 		goto out;
 	}
 	if (last_element)
@@ -815,7 +820,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 	struct ovl_entry *poe = dentry->d_parent->d_fsdata;
 	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
-	struct ovl_path *stack = NULL;
+	struct ovl_path *stack = NULL, *origin_path = NULL;
 	struct dentry *upperdir, *upperdentry = NULL;
 	struct dentry *origin = NULL;
 	struct dentry *index = NULL;
@@ -826,6 +831,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct dentry *this;
 	unsigned int i;
 	int err;
+	bool metacopy = false;
 	struct ovl_lookup_data d = {
 		.name = dentry->d_name,
 		.is_dir = false,
@@ -833,6 +839,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		.stop = false,
 		.last = ofs->config.redirect_follow ? false : !poe->numlower,
 		.redirect = NULL,
+		.metacopy = false,
 	};
 
 	if (dentry->d_name.len > ofs->namelen)
@@ -851,7 +858,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			goto out;
 		}
 		if (upperdentry && !d.is_dir) {
-			BUG_ON(!d.stop || d.redirect);
+			unsigned int origin_ctr = 0;
+			BUG_ON(d.redirect);
 			/*
 			 * Lookup copy up origin by decoding origin file handle.
 			 * We may get a disconnected dentry, which is fine,
@@ -862,16 +870,20 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			 * number - it's the same as if we held a reference
 			 * to a dentry in lower layer that was moved under us.
 			 */
-			err = ovl_check_origin(ofs, upperdentry, &stack, &ctr);
+			err = ovl_check_origin(ofs, upperdentry, &origin_path,
+					       &origin_ctr);
 			if (err)
 				goto out_put_upper;
+
+			if (d.metacopy)
+				metacopy = true;
 		}
 
 		if (d.redirect) {
 			err = -ENOMEM;
 			upperredirect = kstrdup(d.redirect, GFP_KERNEL);
 			if (!upperredirect)
-				goto out_put_upper;
+				goto out_put_origin;
 			if (d.redirect[0] == '/')
 				poe = roe;
 		}
@@ -883,7 +895,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		stack = kcalloc(ofs->numlower, sizeof(struct ovl_path),
 				GFP_KERNEL);
 		if (!stack)
-			goto out_put_upper;
+			goto out_put_origin;
 	}
 
 	for (i = 0; !d.stop && i < poe->numlower; i++) {
@@ -905,7 +917,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		 * If no origin fh is stored in upper of a merge dir, store fh
 		 * of lower dir and set upper parent "impure".
 		 */
-		if (upperdentry && !ctr && !ofs->noxattr) {
+		if (upperdentry && !ctr && !ofs->noxattr && d.is_dir) {
 			err = ovl_fix_origin(dentry, this, upperdentry);
 			if (err) {
 				dput(this);
@@ -917,16 +929,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		 * When "verify_lower" feature is enabled, do not merge with a
 		 * lower dir that does not match a stored origin xattr. In any
 		 * case, only verified origin is used for index lookup.
+		 *
+		 * For non-dir dentry, make sure dentry found by lookup
+		 * matches the origin stored in upper. Otherwise its an
+		 * error.
 		 */
-		if (upperdentry && !ctr && ovl_verify_lower(dentry->d_sb)) {
+		if (upperdentry && !ctr &&
+		    ((d.is_dir && ovl_verify_lower(dentry->d_sb)) ||
+		     (!d.is_dir && origin_path))) {
 			err = ovl_verify_origin(upperdentry, this, false);
 			if (err) {
 				dput(this);
-				break;
+				if (d.is_dir)
+					break;
+				goto out_put;
 			}
-
 			/* Bless lower dir as verified origin */
-			origin = this;
+			if (d.is_dir)
+				origin = this;
+		}
+
+		if (d.metacopy)
+			metacopy = true;
+		/*
+		 * Do not store intermediate metacopy dentries in chain,
+		 * except top most lower metacopy dentry
+		 */
+		if (d.metacopy && ctr) {
+			dput(this);
+			continue;
 		}
 
 		stack[ctr].dentry = this;
@@ -960,6 +991,34 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 	}
 
+	if (metacopy) {
+		BUG_ON(d.is_dir);
+		/*
+		 * Found a metacopy dentry but did not find corresponding
+		 * data dentry
+		 */
+		if (d.metacopy) {
+			err = -ESTALE;
+			goto out_put;
+		}
+
+		err = -EPERM;
+		if (!ofs->config.metacopy) {
+			pr_warn_ratelimited("overlay: refusing to follow"
+					    " metacopy origin for (%pd2)\n",
+					    dentry);
+			goto out_put;
+		}
+	} else if (!d.is_dir && upperdentry && !ctr && origin_path) {
+		if (WARN_ON(stack != NULL)) {
+			err = -EIO;
+			goto out_put;
+		}
+		stack = origin_path;
+		ctr = 1;
+		origin_path = NULL;
+	}
+
 	/*
 	 * Lookup index by lower inode and verify it matches upper inode.
 	 * We only trust dir index if we verified that lower dir matches
@@ -1006,6 +1065,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	}
 
 	revert_creds(old_cred);
+	if (origin_path) {
+		dput(origin_path->dentry);
+		kfree(origin_path);
+	}
 	dput(index);
 	kfree(stack);
 	kfree(d.redirect);
@@ -1019,6 +1082,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	for (i = 0; i < ctr; i++)
 		dput(stack[i].dentry);
 	kfree(stack);
+out_put_origin:
+	if (origin_path) {
+		dput(origin_path->dentry);
+		kfree(origin_path);
+	}
 out_put_upper:
 	dput(upperdentry);
 	kfree(upperredirect);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 8608f3ec8f44..9f2541c26b0a 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -272,6 +272,7 @@ bool ovl_need_index(struct dentry *dentry);
 int ovl_nlink_start(struct dentry *dentry, bool *locked);
 void ovl_nlink_end(struct dentry *dentry, bool locked);
 int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
+int ovl_check_metacopy_xattr(struct dentry *dentry);
 
 static inline bool ovl_is_impuredir(struct dentry *dentry)
 {
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 7929cc872df6..f6a2b75bae35 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -740,3 +740,25 @@ int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir)
 	pr_err("overlayfs: failed to lock workdir+upperdir\n");
 	return -EIO;
 }
+
+/* err < 0, 0 if no metacopy xattr, 1 if metacopy xattr found */
+int ovl_check_metacopy_xattr(struct dentry *dentry)
+{
+	int res;
+
+	/* Only regular files can have metacopy xattr */
+	if (!S_ISREG(d_inode(dentry)->i_mode))
+		return 0;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_METACOPY, NULL, 0);
+	if (res < 0) {
+		if (res == -ENODATA || res == -EOPNOTSUPP)
+			return 0;
+		goto out;
+	}
+
+	return 1;
+out:
+	pr_warn_ratelimited("overlayfs: failed to get metacopy (%i)\n", res);
+	return res;
+}
-- 
2.13.6



* [PATCH v13 11/28] ovl: Copy up meta inode data from lowest data inode
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (9 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

So far lower could not be a meta inode. So whenever it was time to copy
up data of a meta inode, we could copy it up from top most lower dentry.

But now lower itself can be a metacopy inode. That means data copy up
needs to take place from a data inode in metacopy inode chain. Find
lower data inode in the chain and use that for data copy up.

Introduce a helper called ovl_path_lowerdata() to find the lower
data inode in the chain.
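
The idea of walking a metacopy chain down to its data inode can be
sketched in plain C as below. This is only an illustration under
assumptions: struct demo_layer_entry and demo_find_lowerdata() are
hypothetical, and the helper in this patch does not scan the chain. It
takes the last entry of the lower stack, relying on lookup having
already placed the data dentry there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-layer entry: topmost layer first. */
struct demo_layer_entry {
	const char *layer;
	bool metacopy;	/* true: metadata only, data lives further down */
};

/*
 * Return the index of the first entry, scanning top to bottom, that
 * actually carries data, or -1 if the chain has only metacopy entries.
 */
static int demo_find_lowerdata(const struct demo_layer_entry *chain, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!chain[i].metacopy)
			return (int)i;
	}
	return -1;
}
```

A chain made up only of metacopy entries has no data to copy up from,
which is why the sketch reports -1 for it.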

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c   | 14 ++++++++++----
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 14 ++++++++++++++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 9801ae7baa5d..0c8d2755bd25 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -585,13 +585,15 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 	}
 
 	if (S_ISREG(c->stat.mode) && !c->metacopy) {
-		struct path upperpath;
+		struct path upperpath, datapath;
 
 		ovl_path_upper(c->dentry, &upperpath);
 		BUG_ON(upperpath.dentry != NULL);
 		upperpath.dentry = temp;
 
-		err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
+		ovl_path_lowerdata(c->dentry, &datapath);
+		BUG_ON(datapath.dentry == NULL);
+		err = ovl_copy_up_data(&datapath, &upperpath, c->stat.size);
 		if (err)
 			return err;
 	}
@@ -754,14 +756,18 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)
 /* Copy up data of an inode which was copied up metadata only in the past. */
 static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
 {
-	struct path upperpath;
+	struct path upperpath, datapath;
 	int err;
 
 	ovl_path_upper(c->dentry, &upperpath);
 	if (WARN_ON(upperpath.dentry == NULL))
 		return -EIO;
 
-	err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size);
+	ovl_path_lowerdata(c->dentry, &datapath);
+	if (WARN_ON(datapath.dentry == NULL))
+		return -EIO;
+
+	err = ovl_copy_up_data(&datapath, &upperpath, c->stat.size);
 	if (err)
 		return err;
 
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 9f2541c26b0a..7a9b6de16c89 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -222,6 +222,7 @@ bool ovl_dentry_weird(struct dentry *dentry);
 enum ovl_path_type ovl_path_type(struct dentry *dentry);
 void ovl_path_upper(struct dentry *dentry, struct path *path);
 void ovl_path_lower(struct dentry *dentry, struct path *path);
+void ovl_path_lowerdata(struct dentry *dentry, struct path *path);
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
 struct dentry *ovl_dentry_upper(struct dentry *dentry);
 struct dentry *ovl_dentry_lower(struct dentry *dentry);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f6a2b75bae35..2a467f9fcbfa 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -148,6 +148,20 @@ void ovl_path_lower(struct dentry *dentry, struct path *path)
 	}
 }
 
+void ovl_path_lowerdata(struct dentry *dentry, struct path *path)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	int idx = oe->numlower - 1;
+
+	if (!oe->numlower) {
+		*path = (struct path) { };
+		return;
+	}
+
+	path->mnt = oe->lowerstack[idx].layer->mnt;
+	path->dentry = oe->lowerstack[idx].dentry;
+}
+
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path)
 {
 	enum ovl_path_type type = ovl_path_type(dentry);
-- 
2.13.6

* [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (10 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 11/28] ovl: Copy up meta inode data from lowest data inode Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  9:24   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 13/28] ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry Vivek Goyal
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

If an inode has been copied up metadata only, then we need to query the
number of blocks from lower and fill up the stat->st_blocks.

We need to be careful about races where we are doing stat on one CPU and
data copy up is taking place on another CPU. We want to return
stat->st_blocks either from lower or stable upper and not something in
between. Hence, ovl_has_upperdata() is called first to figure out whether
block reporting will take place from lower or upper.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/inode.c     | 17 ++++++++++++++++-
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 16 ++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index e1dbfed0c449..cd03f3e642fd 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -76,6 +76,9 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
 	bool samefs = ovl_same_sb(dentry->d_sb);
 	int err;
+	bool metacopy = false;
+
+	metacopy = ovl_is_metacopy_dentry(dentry);
 
 	type = ovl_path_real(dentry, &realpath);
 	old_cred = ovl_override_creds(dentry->d_sb);
@@ -93,7 +96,8 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	if (!is_dir || samefs) {
 		if (OVL_TYPE_ORIGIN(type)) {
 			struct kstat lowerstat;
-			u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
+			u32 lowermask = STATX_INO | STATX_BLOCKS |
+					(!is_dir ? STATX_NLINK : 0);
 
 			ovl_path_lower(dentry, &realpath);
 			err = vfs_getattr(&realpath, &lowerstat,
@@ -126,6 +130,17 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 			else
 				stat->dev = ovl_get_pseudo_dev(dentry);
 		}
+		if (metacopy) {
+			struct kstat lowerdatastat;
+			u32 lowermask = STATX_BLOCKS;
+
+			ovl_path_lowerdata(dentry, &realpath);
+			err = vfs_getattr(&realpath, &lowerdatastat,
+					  lowermask, flags);
+			if (err)
+				goto out;
+			stat->blocks = lowerdatastat.blocks;
+		}
 		if (samefs) {
 			/*
 			 * When all layers are on the same fs, all real inode
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 7a9b6de16c89..4eb4b765887f 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -274,6 +274,7 @@ int ovl_nlink_start(struct dentry *dentry, bool *locked);
 void ovl_nlink_end(struct dentry *dentry, bool locked);
 int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
 int ovl_check_metacopy_xattr(struct dentry *dentry);
+bool ovl_is_metacopy_dentry(struct dentry *dentry);
 
 static inline bool ovl_is_impuredir(struct dentry *dentry)
 {
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 2a467f9fcbfa..fd98329e820c 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -776,3 +776,19 @@ int ovl_check_metacopy_xattr(struct dentry *dentry)
 	pr_warn_ratelimited("overlayfs: failed to get metacopy (%i)\n", res);
 	return res;
 }
+
+bool ovl_is_metacopy_dentry(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	if (!d_is_reg(dentry))
+		return false;
+
+	if (ovl_dentry_upper(dentry)) {
+		if (!ovl_has_upperdata(dentry))
+			return true;
+		return false;
+	}
+
+	return (oe->numlower > 1);
+}
-- 
2.13.6

* [PATCH v13 13/28] ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (11 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  6:01   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 14/28] ovl: Do not expose metacopy only dentry from d_real() Vivek Goyal
                   ` (14 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Now we have the notion of data dentry and metacopy dentry. ovl_dentry_lower()
will return lower dentry at idx 0, but it could be either data or metacopy
dentry. Now we support metacopy dentries in lower layers so it is possible
that lowerstack[0] is metacopy dentry while lowerstack[1] is actual data
dentry.

So add a helper which returns the lowermost dentry, which is supposed to
be the data dentry.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 4eb4b765887f..214d9f08c574 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -226,6 +226,7 @@ void ovl_path_lowerdata(struct dentry *dentry, struct path *path);
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
 struct dentry *ovl_dentry_upper(struct dentry *dentry);
 struct dentry *ovl_dentry_lower(struct dentry *dentry);
+struct dentry *ovl_dentry_lowerdata(struct dentry *dentry);
 struct dentry *ovl_dentry_real(struct dentry *dentry);
 struct dentry *ovl_i_dentry_upper(struct inode *inode);
 struct inode *ovl_inode_upper(struct inode *inode);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index fd98329e820c..394674c4c820 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -186,6 +186,20 @@ struct dentry *ovl_dentry_lower(struct dentry *dentry)
 	return oe->numlower ? oe->lowerstack[0].dentry : NULL;
 }
 
+/*
+ * ovl_dentry_lower() could return either a data dentry or metacopy dentry
+ * depending on what is stored in lowerstack[0]. At times we need to find
+ * lower dentry which has data (and not metacopy dentry). This helper
+ * returns the lower data dentry.
+ */
+struct dentry *ovl_dentry_lowerdata(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	int idx = oe->numlower - 1;
+
+	return idx >= 0 ? oe->lowerstack[idx].dentry : NULL;
+}
+
 struct dentry *ovl_dentry_real(struct dentry *dentry)
 {
 	return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry);
-- 
2.13.6


* [PATCH v13 14/28] ovl: Do not expose metacopy only dentry from d_real()
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (12 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 13/28] ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 15/28] ovl: Move some of ovl_nlink_start() functionality in ovl_nlink_prep() Vivek Goyal
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

d_real() can make an upper or lower metacopy dentry/inode visible to the vfs
layer. This is something new and vfs layer does not know that this inode
contains only metadata and not data. And this could break things.

So to be safe, do not expose metacopy only dentry/inode to vfs using
d_real().

IOW, d_real() will not return a metacopy dentry. Instead, it will return
the dentry corresponding to the lower data dentry/inode which has file data.

For regular d_real() call (inode == NULL, D_REAL_UPPER not set), if upper
dentry inode is metacopy only and does not have data, return lower dentry.

If d_real() is called with flag D_REAL_UPPER, return upper dentry only if
it has data (flag OVL_UPPERDATA is set).

Similarly, if d_real(inode=X) is called, a warning is emitted if returned
dentry/inode does not have OVL_UPPERDATA set. This should not happen as
we never made this metacopy inode visible to vfs so nobody should be
calling overlayfs back with inode=metacopy_inode.

I scanned the code and I don't think it breaks any of the existing code.
There are two users of D_REAL_UPPER. may_write_real() and
update_ovl_inode_times().

may_write_real() will get a NULL dentry if upper inode is metacopy only
and it will return -EPERM. Effectively, we are disallowing modifications
to metacopy only inode from this interface. Though there is opportunity
to improve it. (Allow chattr on metacopy inodes).

update_ovl_inode_times() gets inode mtime and ctime from real inode. It
should not be broken for metacopy inode as well for following reasons.

- For any metadata operations (setattr, acl etc), overlay always calls
  ovl_copyattr() and updates ovl inode mtime and ctime. So there is no
  need to update mtime and ctime in this case. It's already updated, hence
  even if d_real(D_REAL_UPPER) returns nil, it should be fine.

- For metadata inode, mtime should be same as lower and not change. (data
  can't be modified on metadata inode without copyup). IOW, mtime of
  ovl dentry should be same as mtime of underlying metadata inode on upper
  always. So there is no need to update it.

- For file writes, ctime and mtime will be updated. But in that case
  first data will be copied up and this will not be a metadata inode
  anymore. And further call to d_real(D_REAL_UPPER) will return upper
  inode and new mtime and ctime will be obtainable.

So atime updates should work just fine for metacopy inodes. I think only
corner case is if somehow underlying filesystem changes ctime of upper
metadata inode without overlay knowing about it. Not sure how that
can happen. If somebody is affected by that, then we probably can implement
another flag which will allow caller to get metacopy inode as well.
Something like d_real(D_REAL_UPPER | D_METACOPY). And that should solve
this issue.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/super.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index f77acb8d632a..a5f796452c9f 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -96,8 +96,14 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 	struct dentry *real;
 	int err;
 
-	if (flags & D_REAL_UPPER)
-		return ovl_dentry_upper(dentry);
+	if (flags & D_REAL_UPPER) {
+		real = ovl_dentry_upper(dentry);
+		if (!real)
+			return NULL;
+		if (!ovl_has_upperdata(dentry))
+			return NULL;
+		return real;
+	}
 
 	if (!d_is_reg(dentry)) {
 		if (!inode || inode == d_inode(dentry))
@@ -113,15 +119,22 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 
 	real = ovl_dentry_upper(dentry);
 	if (real && (!inode || inode == d_inode(real))) {
+		bool metacopy = !ovl_has_upperdata(dentry);
 		if (!inode) {
 			err = ovl_check_append_only(d_inode(real), open_flags);
 			if (err)
 				return ERR_PTR(err);
-		}
+
+			if (unlikely(metacopy))
+				goto lower;
+		} else if (unlikely(metacopy))
+			goto bug;
+
 		return real;
 	}
 
-	real = ovl_dentry_lower(dentry);
+lower:
+	real = ovl_dentry_lowerdata(dentry);
 	if (!real)
 		goto bug;
 
-- 
2.13.6


* [PATCH v13 15/28] ovl: Move some of ovl_nlink_start() functionality in ovl_nlink_prep()
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (13 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 14/28] ovl: Do not expose metacopy only dentry from d_real() Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  6:23   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 16/28] ovl: Create locked version of ovl_nlink_start() and ovl_nlink_end() Vivek Goyal
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Soon I want to write patches to enable redirects on non-dir files. That means
it is possible that we have to deal with the case where multiple dentries
might be sharing inode and ovl_inode->redirect field setting/resetting
will need to be protected by taking ovl_inode->lock. Current dentry based
locking alone will not be sufficient for this case.

As of now, nlink based code takes ovl_inode->lock in some cases. For redirect
case, during ovl_rename() I might have to take ovl_inode->lock both on
old as well as new ovl_inode. And that means that I need to make sure
there are no deadlocks.

I want to separate out logic for taking lock in a new function. Hence will
need to break down ovl_nlink_start() a bit.

ovl_nlink_start() does the copy up and then takes ovl_inode->lock. Move
copy up related portions in a separate function called ovl_nlink_prep().

This patch is just code reorganization and no functional change.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/dir.c       | 14 ++++++++++++++
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 26 ++++++++++++++++++--------
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 7617a03acc30..1f003be4a19e 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -595,6 +595,10 @@ static int ovl_link(struct dentry *old, struct inode *newdir,
 	if (err)
 		goto out_drop_write;
 
+	err = ovl_nlink_prep(old);
+	if (err)
+		goto out_drop_write;
+
 	err = ovl_nlink_start(old, &locked);
 	if (err)
 		goto out_drop_write;
@@ -752,6 +756,10 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 	if (err)
 		goto out_drop_write;
 
+	err = ovl_nlink_prep(dentry);
+	if (err)
+		goto out_drop_write;
+
 	err = ovl_nlink_start(dentry, &locked);
 	if (err)
 		goto out_drop_write;
@@ -960,6 +968,12 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		if (err)
 			goto out_drop_write;
 	} else {
+		err = ovl_nlink_prep(new);
+		if (err)
+			goto out_drop_write;
+	}
+
+	if (overwrite) {
 		err = ovl_nlink_start(new, &new_locked);
 		if (err)
 			goto out_drop_write;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 214d9f08c574..aa5b0c121fc7 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -271,6 +271,7 @@ bool ovl_test_flag(unsigned long flag, struct inode *inode);
 bool ovl_inuse_trylock(struct dentry *dentry);
 void ovl_inuse_unlock(struct dentry *dentry);
 bool ovl_need_index(struct dentry *dentry);
+int ovl_nlink_prep(struct dentry *dentry);
 int ovl_nlink_start(struct dentry *dentry, bool *locked);
 void ovl_nlink_end(struct dentry *dentry, bool locked);
 int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 394674c4c820..ed93e233894f 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -675,15 +675,9 @@ static void ovl_cleanup_index(struct dentry *dentry)
 	goto out;
 }
 
-/*
- * Operations that change overlay inode and upper inode nlink need to be
- * synchronized with copy up for persistent nlink accounting.
- */
-int ovl_nlink_start(struct dentry *dentry, bool *locked)
+int ovl_nlink_prep(struct dentry *dentry)
 {
-	struct ovl_inode *oi = OVL_I(d_inode(dentry));
-	const struct cred *old_cred;
-	int err;
+	int err = 0;
 
 	if (!d_inode(dentry))
 		return 0;
@@ -708,6 +702,22 @@ int ovl_nlink_start(struct dentry *dentry, bool *locked)
 			return err;
 	}
 
+	return err;
+}
+
+/*
+ * Operations that change overlay inode and upper inode nlink need to be
+ * synchronized with copy up for persistent nlink accounting.
+ */
+int ovl_nlink_start(struct dentry *dentry, bool *locked)
+{
+	struct ovl_inode *oi = OVL_I(d_inode(dentry));
+	const struct cred *old_cred;
+	int err;
+
+	if (!d_inode(dentry))
+		return 0;
+
 	err = mutex_lock_interruptible(&oi->lock);
 	if (err)
 		return err;
-- 
2.13.6


* [PATCH v13 16/28] ovl: Create locked version of ovl_nlink_start() and ovl_nlink_end()
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (14 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 15/28] ovl: Move some of ovl_nlink_start() functionality in ovl_nlink_prep() Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  6:28   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 17/28] ovl: During rename lock both source and target ovl_inode Vivek Goyal
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Soon in ovl_rename() I will introduce a function to take lock both on old
and new ovl_inode. That means I don't want ovl_nlink_start() to take lock
and ovl_nlink_end() to drop lock. It will be done by other functions.

So create ovl_nlink_start_locked() and ovl_nlink_end_locked() which assume
that by the time they are called, lock has already been taken and these
don't deal with taking/releasing locks.

These will be used in following patches.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/util.c      | 50 +++++++++++++++++++++++++++++-------------------
 2 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index aa5b0c121fc7..429713653b3b 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -273,7 +273,9 @@ void ovl_inuse_unlock(struct dentry *dentry);
 bool ovl_need_index(struct dentry *dentry);
 int ovl_nlink_prep(struct dentry *dentry);
 int ovl_nlink_start(struct dentry *dentry, bool *locked);
+int ovl_nlink_start_locked(struct dentry *dentry);
 void ovl_nlink_end(struct dentry *dentry, bool locked);
+void ovl_nlink_end_locked(struct dentry *dentry);
 int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
 int ovl_check_metacopy_xattr(struct dentry *dentry);
 bool ovl_is_metacopy_dentry(struct dentry *dentry);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index ed93e233894f..927960aa57ee 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -709,21 +709,13 @@ int ovl_nlink_prep(struct dentry *dentry)
  * Operations that change overlay inode and upper inode nlink need to be
  * synchronized with copy up for persistent nlink accounting.
  */
-int ovl_nlink_start(struct dentry *dentry, bool *locked)
+int ovl_nlink_start_locked(struct dentry *dentry)
 {
-	struct ovl_inode *oi = OVL_I(d_inode(dentry));
 	const struct cred *old_cred;
 	int err;
 
-	if (!d_inode(dentry))
-		return 0;
-
-	err = mutex_lock_interruptible(&oi->lock);
-	if (err)
-		return err;
-
 	if (d_is_dir(dentry) || !ovl_test_flag(OVL_INDEX, d_inode(dentry)))
-		goto out;
+		return 0;
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	/*
@@ -734,8 +726,22 @@ int ovl_nlink_start(struct dentry *dentry, bool *locked)
 	 */
 	err = ovl_set_nlink_upper(dentry);
 	revert_creds(old_cred);
+	return err;
+}
 
-out:
+int ovl_nlink_start(struct dentry *dentry, bool *locked)
+{
+	struct ovl_inode *oi = OVL_I(d_inode(dentry));
+	int err;
+
+	if (!d_inode(dentry))
+		return 0;
+
+	err = mutex_lock_interruptible(&oi->lock);
+	if (err)
+		return err;
+
+	err = ovl_nlink_start_locked(dentry);
 	if (err)
 		mutex_unlock(&oi->lock);
 	else
@@ -744,18 +750,22 @@ int ovl_nlink_start(struct dentry *dentry, bool *locked)
 	return err;
 }
 
-void ovl_nlink_end(struct dentry *dentry, bool locked)
+void ovl_nlink_end_locked(struct dentry *dentry)
 {
-	if (locked) {
-		if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) &&
-		    d_inode(dentry)->i_nlink == 0) {
-			const struct cred *old_cred;
+	if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) &&
+	    d_inode(dentry)->i_nlink == 0) {
+		const struct cred *old_cred;
 
-			old_cred = ovl_override_creds(dentry->d_sb);
-			ovl_cleanup_index(dentry);
-			revert_creds(old_cred);
-		}
+		old_cred = ovl_override_creds(dentry->d_sb);
+		ovl_cleanup_index(dentry);
+		revert_creds(old_cred);
+	}
+}
 
+void ovl_nlink_end(struct dentry *dentry, bool locked)
+{
+	if (locked) {
+		ovl_nlink_end_locked(dentry);
 		mutex_unlock(&OVL_I(d_inode(dentry))->lock);
 	}
 }
-- 
2.13.6

* [PATCH v13 17/28] ovl: During rename lock both source and target ovl_inode
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (15 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 16/28] ovl: Create locked version of ovl_nlink_start() and ovl_nlink_end() Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  6:50   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 18/28] ovl: Check redirects for metacopy files Vivek Goyal
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

In some cases we need to take both source (old) and target (new) dentry
ovl_inode->lock. This patch adds support for that. Locks are taken in
order of increasing inode address to avoid deadlock. This code has been
taken from lock_two_nondirectories().
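
The address-ordering rule can be illustrated with a toy userspace model.
Everything here is hypothetical (model_inode, lock_trace, model_lock_two)
and a plain flag stands in for the mutex; the sketch only shows that the
acquisition order depends on the addresses and not on the argument order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an ovl_inode; the flag replaces a real mutex. */
struct model_inode {
	int lock_held;
};

/* Records the order in which locks were taken, for inspection. */
struct lock_trace {
	struct model_inode *order[2];
	size_t count;
};

static void model_lock(struct model_inode *mi, struct lock_trace *t)
{
	assert(!mi->lock_held);
	mi->lock_held = 1;
	t->order[t->count++] = mi;
}

/* Always take the lower address first; lock a shared inode only once. */
static void model_lock_two(struct model_inode *a, struct model_inode *b,
			   struct lock_trace *t)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct model_inode *tmp = a;

		a = b;
		b = tmp;
	}
	if (a)
		model_lock(a, t);
	if (b && b != a)
		model_lock(b, t);
}
```

Two callers passing the same pair in opposite order end up taking the
locks in the same sequence, so neither can hold one lock while waiting
for the other in reverse.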

As of now, metacopy needs this lock if we are planning to update redirect
information on source/target ovl_inode. nlink related accounting takes this
lock on target for the case of overwrite.

One change w.r.t existing code is mutex_lock() vs mutex_lock_interruptible().
Now ovl_rename() code path has switched to mutex_lock() variant for
ovl_inode->lock. There is no particular reason, I just found it easier
to reuse lock_two_nondirectories(), that's why. Hoping it's not a concern.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/dir.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 82 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 1f003be4a19e..3ea052b6bac7 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -896,12 +896,84 @@ static int ovl_set_redirect(struct dentry *dentry, bool samedir)
 	return err;
 }
 
+static void ovl_lock_two_nondirectories(struct ovl_inode *oi1,
+				       struct ovl_inode *oi2)
+{
+	if (oi1 > oi2)
+		swap(oi1, oi2);
+
+	if (oi1)
+		mutex_lock(&oi1->lock);
+	if (oi2 && oi2 != oi1)
+		mutex_lock_nested(&oi2->lock, I_MUTEX_NONDIR2);
+}
+
+static void ovl_rename_lock_ovl_inodes(struct dentry *old, struct dentry *new,
+				       bool overwrite, bool *old_locked,
+				       bool *new_locked)
+{
+	bool lock_old = false, lock_new = false;
+	struct ovl_inode *oi1 = NULL, *oi2 = NULL;
+
+	/*
+	 * First determine which inodes need to be locked.
+	 *
+	 * If "old" is metacopy, oi->redirect will be set. So we need lock
+	 * on old ovl_inode. If "new" is metacopy, and it is not being
+	 * overwritten, then oi->redirect will be set and we will need
+	 * lock on new ovl_inode.
+	 */
+
+	if (ovl_is_metacopy_dentry(old))
+		lock_old = true;
+
+	if (!overwrite && ovl_is_metacopy_dentry(new))
+		lock_new = true;
+	/*
+	 * For nlink, if we are overwriting and new->inode is not empty,
+	 * then we need to take lock on new ovl_inode.
+	 */
+	if (overwrite && d_inode(new))
+		lock_new = true;
+
+	oi1 = OVL_I(d_inode(old));
+	if (lock_new)
+		oi2 = OVL_I(d_inode(new));
+
+	if (lock_old && lock_new) {
+		ovl_lock_two_nondirectories(oi1, oi2);
+		*old_locked = *new_locked = true;
+		return;
+	}
+
+	if (lock_old) {
+		mutex_lock(&oi1->lock);
+		*old_locked = true;
+		return;
+	}
+
+	if (lock_new) {
+		mutex_lock(&oi2->lock);
+		*new_locked = true;
+	}
+}
+
+static void ovl_rename_unlock_ovl_inodes(struct dentry *old, struct dentry *new,
+				         bool old_locked, bool new_locked)
+{
+	if (old_locked)
+		mutex_unlock(&OVL_I(d_inode(old))->lock);
+
+	if (new_locked && d_inode(old) != d_inode(new))
+		mutex_unlock(&OVL_I(d_inode(new))->lock);
+}
+
 static int ovl_rename(struct inode *olddir, struct dentry *old,
 		      struct inode *newdir, struct dentry *new,
 		      unsigned int flags)
 {
 	int err;
-	bool new_locked = false;
+	bool old_locked = false, new_locked = false;
 	struct dentry *old_upperdir;
 	struct dentry *new_upperdir;
 	struct dentry *olddentry;
@@ -973,10 +1045,12 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			goto out_drop_write;
 	}
 
-	if (overwrite) {
-		err = ovl_nlink_start(new, &new_locked);
+	ovl_rename_lock_ovl_inodes(old, new, overwrite, &old_locked,
+				   &new_locked);
+	if (overwrite && new_locked) {
+		err = ovl_nlink_start_locked(new);
 		if (err)
-			goto out_drop_write;
+			goto out_unlock_inodes;
 	}
 
 	old_cred = ovl_override_creds(old->d_sb);
@@ -1102,7 +1176,10 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 	unlock_rename(new_upperdir, old_upperdir);
 out_revert_creds:
 	revert_creds(old_cred);
-	ovl_nlink_end(new, new_locked);
+	if (new_locked)
+		ovl_nlink_end_locked(new);
+out_unlock_inodes:
+	ovl_rename_unlock_ovl_inodes(old, new, old_locked, new_locked);
 out_drop_write:
 	ovl_drop_write(old);
 out:
-- 
2.13.6


* [PATCH v13 18/28] ovl: Check redirects for metacopy files
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (16 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 17/28] ovl: During rename lock both source and target ovl_inode Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30 10:02   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 19/28] ovl: Treat metacopy dentries as type OVL_PATH_MERGE Vivek Goyal
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Right now we rely on path based lookup for data origin of metacopy upper.
This will work only if upper has not been renamed. We solved this problem
already for merged directories using redirect. Use same logic for metacopy
files.

This patch just goes on to check redirects for metacopy files.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/namei.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 1dba89e9543f..a7a9588f64b7 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -260,18 +260,19 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 			goto out_err;
 		d->stop = !err;
 		d->metacopy = !!err;
-		goto out;
-	}
-	if (last_element)
-		d->is_dir = true;
-	if (d->last)
-		goto out;
-
-	if (ovl_is_opaquedir(this)) {
-		d->stop = true;
+		if (!d->metacopy)
+			goto out;
+	} else {
 		if (last_element)
-			d->opaque = true;
-		goto out;
+			d->is_dir = true;
+		if (d->last)
+			goto out;
+		if (ovl_is_opaquedir(this)) {
+			d->stop = true;
+			if (last_element)
+				d->opaque = true;
+			goto out;
+		}
 	}
 	err = ovl_check_redirect(this, d, prelen, post);
 	if (err)
@@ -859,7 +860,6 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		}
 		if (upperdentry && !d.is_dir) {
 			unsigned int origin_ctr = 0;
-			BUG_ON(d.redirect);
 			/*
 			 * Lookup copy up origin by decoding origin file handle.
 			 * We may get a disconnected dentry, which is fine,
-- 
2.13.6


* [PATCH v13 19/28] ovl: Treat metacopy dentries as type OVL_PATH_MERGE
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (17 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 18/28] ovl: Check redirects for metacopy files Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  6:52   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks Vivek Goyal
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Right now OVL_PATH_MERGE is used only for merged directories.
But conceptually, a metacopy dentry (backed by a lower data dentry) is
a merged entity as well.

So mark metacopy dentries as OVL_PATH_MERGE and ovl_rename() makes use
of this property later to set redirect on a metacopy file.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 927960aa57ee..29f7336ade88 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -118,7 +118,7 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
 		 */
 		if (oe->numlower) {
 			type |= __OVL_PATH_ORIGIN;
-			if (d_is_dir(dentry))
+			if (d_is_dir(dentry) || !ovl_has_upperdata(dentry))
 				type |= __OVL_PATH_MERGE;
 		}
 	} else {
-- 
2.13.6


* [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (18 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 19/28] ovl: Treat metacopy dentries as type OVL_PATH_MERGE Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  9:54   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename Vivek Goyal
                   ` (7 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

If a dentry has a copy up origin, we set the flag OVL_PATH_ORIGIN. So far
this decision was easy: we only had to check oe->numlower, and if it was
non-zero, we knew there was a copy up origin. (For non-dir we installed
the origin dentry in lowerstack[0].)

But we don't create the ORIGIN xattr for broken hardlinks (index=off). And
with the metacopy feature it is possible that we will still install
lowerstack[0]. But that is the lower data dentry of the metacopy upper of
a broken hardlink, and the ORIGIN xattr is not set.

So to differentiate between the two cases, do not set OVL_PATH_ORIGIN if
we have a broken hardlink.
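
To illustrate the intended flag logic, here is a minimal user-space sketch.
It is not the kernel code: struct sketch_dentry, the SKETCH_PATH_* values
and both helper names are hypothetical stand-ins, and it assumes the dentry
already has an upper.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the __OVL_PATH_* bits; values are illustrative. */
enum sketch_path_type {
	SKETCH_PATH_UPPER  = 1 << 0,
	SKETCH_PATH_MERGE  = 1 << 1,
	SKETCH_PATH_ORIGIN = 1 << 2,
};

/* Simplified view of an upper dentry; not the kernel's struct dentry. */
struct sketch_dentry {
	unsigned int numlower;	/* entries in the lower stack */
	unsigned int nlink;	/* link count of the inode */
	bool indexed;		/* models the OVL_INDEX inode flag */
	bool is_dir;
	bool has_upperdata;	/* false for a metacopy upper */
};

/* A hardlink that was copied up without an index entry (index=off). */
static bool sketch_broken_hardlink(const struct sketch_dentry *d)
{
	return d->nlink > 1 && !d->indexed;
}

/* Path type of a dentry that is assumed to have an upper dentry. */
static int sketch_upper_path_type(const struct sketch_dentry *d)
{
	int type = SKETCH_PATH_UPPER;

	if (d->numlower) {
		/* No ORIGIN xattr exists for broken hardlinks. */
		if (!sketch_broken_hardlink(d))
			type |= SKETCH_PATH_ORIGIN;
		/* Merged dirs and metacopy files are both "merged". */
		if (d->is_dir || !d->has_upperdata)
			type |= SKETCH_PATH_MERGE;
	}
	return type;
}
```

With this model, a metacopy upper of a broken hardlink comes out as
UPPER|MERGE without ORIGIN, which is the case this patch is about.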

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/util.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 29f7336ade88..961d65bd25c9 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -117,7 +117,14 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
 		 * Non-dir dentry can hold lower dentry of its copy up origin.
 		 */
 		if (oe->numlower) {
-			type |= __OVL_PATH_ORIGIN;
+			/*
+			 * ORIGIN is created for everything except broken
+			 * hardlinks
+			 */
+			if (!(d_inode(dentry)->i_nlink > 1 &&
+			    !ovl_test_flag(OVL_INDEX, d_inode(dentry))))
+				type |= __OVL_PATH_ORIGIN;
+
 			if (d_is_dir(dentry) || !ovl_has_upperdata(dentry))
 				type |= __OVL_PATH_MERGE;
 		}
-- 
2.13.6


* [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (19 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  7:31   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 22/28] ovl: Set redirect on upper inode when it is linked Vivek Goyal
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Set redirect on metacopy files upon rename. This will help find data dentry
in lower dirs.
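
The choice between a relative and an absolute redirect can be restated as a
small standalone predicate. This is only a sketch of the rule; the function
name and its plain parameters are made up for illustration and do not exist
in the kernel.

```c
#include <stdbool.h>

/*
 * Returns true when a relative redirect is sufficient.
 * Hypothetical helper: the arguments replace the dentry/inode lookups.
 */
static bool sketch_relative_redirect(bool is_dir, unsigned int nlink,
				     bool samedir)
{
	/* Directories: relative only if the rename stays in one dir. */
	if (is_dir)
		return samedir;

	/* Hardlinked non-dir: other links may live in other dirs. */
	if (nlink > 1)
		return false;

	return samedir;
}
```

So a same-directory rename of a file with nlink == 1 gets a relative
redirect, while any file with nlink > 1 gets an absolute one.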

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/dir.c | 50 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 3ea052b6bac7..7c0a02d9f6bd 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -968,6 +968,27 @@ static void ovl_rename_unlock_ovl_inodes(struct dentry *old, struct dentry *new,
 		mutex_unlock(&OVL_I(d_inode(new))->lock);
 }
 
+static bool ovl_relative_redirect(struct dentry *dentry, bool samedir)
+{
+	if (d_is_dir(dentry))
+		return samedir;
+
+	/*
+	 * For non-dir hardlinked files, we need absolute redirects
+	 * in general as two upper hardlinks could be in different
+	 * dirs. We could put a relative redirect now and convert
+	 * it to absolute redirect later. But when nlink > 1 and
+	 * indexing is on, that means relative redirect needs to be
+	 * converted to absolute during copy up of another lower
+	 * hardlink as well.
+	 *
+	 * So without optimizing too much, just check if non-dir
+	 * has nlink > 1 or not. If yes, set absolute redirect to
+	 * begin with.
+	 */
+	return (d_inode(dentry)->i_nlink > 1 ? false : samedir);
+}
+
 static int ovl_rename(struct inode *olddir, struct dentry *old,
 		      struct inode *newdir, struct dentry *new,
 		      unsigned int flags)
@@ -1131,22 +1152,25 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		goto out_dput;
 
 	err = 0;
-	if (is_dir) {
-		if (ovl_type_merge_or_lower(old))
-			err = ovl_set_redirect(old, samedir);
-		else if (!old_opaque && ovl_type_merge(new->d_parent))
-			err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
-		if (err)
-			goto out_dput;
-	}
-	if (!overwrite && new_is_dir) {
+	if (ovl_type_merge_or_lower(old))
+		err = ovl_set_redirect(old,
+				       ovl_relative_redirect(old, samedir));
+	else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
+		err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
+
+	if (err)
+		goto out_dput;
+
+	if (!overwrite) {
 		if (ovl_type_merge_or_lower(new))
-			err = ovl_set_redirect(new, samedir);
-		else if (!new_opaque && ovl_type_merge(old->d_parent))
+			err = ovl_set_redirect(new, ovl_relative_redirect(new,
+					       samedir));
+		else if (new_is_dir && !new_opaque &&
+			 ovl_type_merge(old->d_parent))
 			err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
-		if (err)
-			goto out_dput;
 	}
+	if (err)
+		goto out_dput;
 
 	err = ovl_do_rename(old_upperdir->d_inode, olddentry,
 			    new_upperdir->d_inode, newdentry, flags);
-- 
2.13.6


* [PATCH v13 22/28] ovl: Set redirect on upper inode when it is linked
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (20 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  7:04   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 23/28] ovl: Remove redirect when data of a metacopy file is copied up Vivek Goyal
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

When we create a hardlink to a metacopy upper file, first set the redirect
on that inode. Path based lookup will not work with the newly created link,
and the redirect will solve that issue.

Also use an absolute redirect, as the two hardlinks could be in different
directories and a relative redirect will not work.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/dir.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 7c0a02d9f6bd..ccbe061fc4ba 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -24,6 +24,8 @@ module_param_named(redirect_max, ovl_redirect_max, ushort, 0644);
 MODULE_PARM_DESC(ovl_redirect_max,
 		 "Maximum length of absolute redirect xattr value");
 
+static int ovl_set_redirect(struct dentry *dentry, bool samedir);
+
 int ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
 {
 	int err;
@@ -468,6 +470,9 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 	const struct cred *old_cred;
 	struct cred *override_cred;
 	struct dentry *parent = dentry->d_parent;
+	struct dentry *hardlink_upper;
+
+	hardlink_upper = hardlink ? ovl_dentry_upper(hardlink) : NULL;
 
 	err = ovl_copy_up(parent);
 	if (err)
@@ -502,12 +507,18 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 		put_cred(override_creds(override_cred));
 		put_cred(override_cred);
 
+		if (hardlink && ovl_is_metacopy_dentry(hardlink)) {
+			err = ovl_set_redirect(hardlink, false);
+			if (err)
+				goto out_revert_creds;
+		}
+
 		if (!ovl_dentry_is_whiteout(dentry))
 			err = ovl_create_upper(dentry, inode, attr,
-						hardlink);
+					       hardlink_upper);
 		else
 			err = ovl_create_over_whiteout(dentry, inode, attr,
-							hardlink);
+					               hardlink_upper);
 	}
 out_revert_creds:
 	revert_creds(old_cred);
@@ -606,8 +617,7 @@ static int ovl_link(struct dentry *old, struct inode *newdir,
 	inode = d_inode(old);
 	ihold(inode);
 
-	err = ovl_create_or_link(new, inode, NULL, ovl_dentry_upper(old),
-				 ovl_type_origin(old));
+	err = ovl_create_or_link(new, inode, NULL, old, ovl_type_origin(old));
 	if (err)
 		iput(inode);
 
-- 
2.13.6


* [PATCH v13 23/28] ovl: Remove redirect when data of a metacopy file is copied up
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (21 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 22/28] ovl: Set redirect on upper inode when it is linked Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 24/28] ovl: Do not error if REDIRECT XATTR is missing Vivek Goyal
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

When a metacopy file is no longer a metacopy and data has been copied up,
remove REDIRECT xattr. It's not needed anymore.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/copy_up.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 0c8d2755bd25..fbaf5fbfdf88 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -774,9 +774,18 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
 	err = vfs_removexattr(upperpath.dentry, OVL_XATTR_METACOPY);
 	if (err)
 		return err;
+	/*
+	 * A metacopy file does not need redirect xattr once data has
+	 * been copied up.
+	 */
+	err = vfs_removexattr(upperpath.dentry, OVL_XATTR_REDIRECT);
+	if (err && err != -ENODATA && err != -EOPNOTSUPP)
+		return err;
 
+	/* We must be holding ovl_inode->lock if we are here */
+	ovl_dentry_set_redirect(c->dentry, NULL);
 	ovl_set_upperdata(d_inode(c->dentry));
-	return err;
+	return 0;
 }
 
 static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
-- 
2.13.6


* [PATCH v13 24/28] ovl: Do not error if REDIRECT XATTR is missing
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (22 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 23/28] ovl: Remove redirect when data of a metacopy file is copied up Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  7:41   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 25/28] ovl: Use out_err insteada of out_nomem Vivek Goyal
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Currently we first call vfs_getxattr() to get the size of the REDIRECT xattr
and then make another vfs_getxattr() call to read the xattr into a buffer.

This is all part of ovl_lookup() and we do not hold ovl_inode->lock. That
means it is possible that the inode got copied up on another cpu and is not
a metacopy inode anymore, which also means the REDIRECT xattr got removed.
So while the first call to vfs_getxattr() succeeded, the second call can
fail with -ENODATA.

Do not error out if -ENODATA is received from the second call. With metacopy
enabled, it is not an error case anymore.
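
The race can be modelled in user space. The sketch below assumes a
hypothetical in-memory xattr (struct sketch_xattr) whose value can be made
to disappear after a given number of calls; sketch_getxattr() only mimics
the size-query convention and is not a real API.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one xattr; value == NULL means absent. */
struct sketch_xattr {
	const char *value;
	int remove_after_calls;	/* simulated racing removal; < 0 disables */
	int calls;
};

/* Mimics the getxattr convention: a NULL buf asks for the size only. */
static long sketch_getxattr(struct sketch_xattr *x, char *buf, size_t size)
{
	size_t len;

	if (x->remove_after_calls >= 0 && x->calls >= x->remove_after_calls)
		x->value = NULL;
	x->calls++;

	if (!x->value)
		return -ENODATA;
	len = strlen(x->value);
	if (!buf)
		return (long)len;
	if (size < len)
		return -ERANGE;
	memcpy(buf, x->value, len);
	return (long)len;
}

/*
 * Returns 0 with *out set to a malloc'ed string, 0 with *out == NULL when
 * the xattr is absent (including when it vanishes between the two calls),
 * or a negative errno for any other failure.
 */
static int sketch_read_redirect(struct sketch_xattr *x, char **out)
{
	long res;
	char *buf;

	*out = NULL;
	res = sketch_getxattr(x, NULL, 0);
	if (res == -ENODATA)
		return 0;
	if (res < 0)
		return (int)res;

	buf = calloc((size_t)res + 1, 1);
	if (!buf)
		return -ENOMEM;

	res = sketch_getxattr(x, buf, (size_t)res);
	if (res < 0) {
		free(buf);
		/* Removed by a parallel data copy up: not an error. */
		return res == -ENODATA ? 0 : (int)res;
	}
	*out = buf;
	return 0;
}
```

Setting remove_after_calls to 1 reproduces the case described above: the
size query succeeds, the read fails with -ENODATA, and the caller simply
sees "no redirect".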

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/namei.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index a7a9588f64b7..a97666245726 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -47,8 +47,15 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
 		goto invalid;
 
 	res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, buf, res);
-	if (res < 0)
+	if (res < 0) {
+		/*
+		 * Redirect xattr can be removed if a parallel data
+		 * copy up took place.
+		 */
+		if (res == -ENODATA)
+			goto err_free;
 		goto fail;
+	}
 	if (res == 0)
 		goto invalid;
 	if (buf[0] == '/') {
-- 
2.13.6


* [PATCH v13 25/28] ovl: Use out_err insteada of out_nomem
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (23 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 24/28] ovl: Do not error if REDIRECT XATTR is missing Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  7:35   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization Vivek Goyal
                   ` (2 subsequent siblings)
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

Right now we use goto out_nomem which assumes error code is -ENOMEM. But
there are other errors returned like -ESTALE as well. So instead of out_nomem,
use out_err which will do ERR_PTR(err). That way one can put the error code
in err and jump to out_err.

This is just code reorganization and no change of functionality.

I am about to add more code and this organization helps laying more code
and error paths on top of it.
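
For readers less familiar with the idiom, here is a user-space sketch of a
single out_err label that carries an arbitrary error code. The sketch_*
helpers are simplified imitations of ERR_PTR()/PTR_ERR()/IS_ERR() written
for this example; struct sketch_inode and the two failure switches are
hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Simplified imitations of the kernel's error-pointer helpers. */
static void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_inode { int verified; };

/* The two flags force the failure paths so they can be exercised. */
static struct sketch_inode *sketch_get_inode(int alloc_fails, int verify_fails)
{
	struct sketch_inode *inode;
	int err = -ENOMEM;	/* default for allocation failures */

	inode = alloc_fails ? NULL : calloc(1, sizeof(*inode));
	if (!inode)
		goto out_err;

	if (verify_fails) {
		free(inode);
		err = -ESTALE;	/* any code can now reach out_err */
		goto out_err;
	}
	inode->verified = 1;
out:
	return inode;

out_err:
	inode = sketch_err_ptr(err);
	goto out;
}
```

New error paths only need to set err and jump to out_err, which is the
property the following patches rely on.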

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/inode.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index cd03f3e642fd..3dccfa1ee123 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -693,6 +693,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	struct inode *inode;
 	bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
 	bool is_dir, metacopy = false;
+	int err = -ENOMEM;
 
 	if (!realinode)
 		realinode = d_inode(lowerdentry);
@@ -710,7 +711,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 		inode = iget5_locked(sb, (unsigned long) key,
 				     ovl_inode_test, ovl_inode_set, key);
 		if (!inode)
-			goto out_nomem;
+			goto out_err;
 		if (!(inode->i_state & I_NEW)) {
 			/*
 			 * Verify that the underlying files stored in the inode
@@ -719,8 +720,8 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			if (!ovl_verify_inode(inode, lowerdentry, upperdentry,
 					      true)) {
 				iput(inode);
-				inode = ERR_PTR(-ESTALE);
-				goto out;
+				err = -ESTALE;
+				goto out_err;
 			}
 
 			dput(upperdentry);
@@ -735,8 +736,10 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	} else {
 		/* Lower hardlink that will be broken on copy up */
 		inode = new_inode(sb);
-		if (!inode)
-			goto out_nomem;
+		if (!inode) {
+			err = -ENOMEM;
+			goto out_err;
+		}
 	}
 	ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
 	ovl_inode_init(inode, upperdentry, lowerdentry);
@@ -766,7 +769,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 out:
 	return inode;
 
-out_nomem:
-	inode = ERR_PTR(-ENOMEM);
+out_err:
+	inode = ERR_PTR(err);
 	goto out;
 }
-- 
2.13.6

* [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (24 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 25/28] ovl: Use out_err insteada of out_nomem Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30  8:56   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode Vivek Goyal
  2018-03-29 19:38 ` [PATCH v13 28/28] ovl: Enable metadata only feature Vivek Goyal
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

So far a redirect could be placed on directories only, and now it can be
placed on regular files as well. It can also be completely removed when
a metacopy file's data is copied up. That means if a redirect is present
during ovl_lookup(), it could be gone by the time ovl_get_inode() happens.

Or it is possible that ovl_lookup() does not see a redirect while a rename
is taking place on a hard link, and that rename places a redirect. Then by
the time ovl_lookup() calls ovl_get_inode(), it sets
ovl_inode->redirect = NULL (assuming the inode got flushed out of cache and
was newly allocated).

IOW, because we check and process the redirect without locks in
ovl_lookup(), many possibilities open up for regular files. So for such
cases, do not use the redirect provided by the caller. Instead query it and
install it in ovl_inode->redirect.
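
The re-read redirect is validated the same way as in lookup. As a rough
user-space sketch of that format check (the function name is hypothetical
and strcspn() stands in for the kernel's strchrnul()):

```c
#include <stdbool.h>
#include <string.h>

/*
 * A redirect is either an absolute path whose components are all
 * non-empty ("/a/b"), or a single relative name without any '/'.
 */
static bool sketch_redirect_is_valid(const char *redirect)
{
	const char *s, *next;

	/* An empty value is never valid. */
	if (redirect[0] == '\0')
		return false;

	/* Relative redirect: a bare name, no slashes allowed. */
	if (redirect[0] != '/')
		return strchr(redirect, '/') == NULL;

	/* Absolute redirect: every '/' must start a non-empty name. */
	for (s = redirect; *s == '/'; s = next) {
		s++;
		next = s + strcspn(s, "/");
		if (next == s)
			return false;
	}
	return true;
}
```

Under these rules "/foo.txt" and "foo.txt" pass, while "/", "/foo/",
"//foo" and "dir/foo" are rejected.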

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/inode.c     | 19 ++++++++++++++++++-
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3dccfa1ee123..6a0c85699024 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -694,6 +694,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
 	bool is_dir, metacopy = false;
 	int err = -ENOMEM;
+	char *new_redirect = NULL;
 
 	if (!realinode)
 		realinode = d_inode(lowerdentry);
@@ -754,7 +755,18 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	if (upperdentry && !metacopy)
 		ovl_set_flag(OVL_UPPERDATA, inode);
 
-	OVL_I(inode)->redirect = redirect;
+	if (!metacopy) {
+		OVL_I(inode)->redirect = redirect;
+		redirect = NULL;
+	} else if (upperdentry) {
+		new_redirect = ovl_get_redirect_xattr(upperdentry);
+		if (IS_ERR(new_redirect)) {
+			err = PTR_ERR(new_redirect);
+			goto out_err_inode;
+		}
+		OVL_I(inode)->redirect = new_redirect;
+		new_redirect = NULL;
+	}
 
 	/* Check for non-merge dir that may have whiteouts */
 	if (is_dir) {
@@ -764,11 +776,16 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 		}
 	}
 
+	kfree(redirect);
 	if (inode->i_state & I_NEW)
 		unlock_new_inode(inode);
 out:
 	return inode;
 
+out_err_inode:
+	if (inode->i_state & I_NEW)
+		unlock_new_inode(inode);
+	iput(inode);
 out_err:
 	inode = ERR_PTR(err);
 	goto out;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 429713653b3b..a3bee7619fbb 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -279,6 +279,7 @@ void ovl_nlink_end_locked(struct dentry *dentry);
 int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
 int ovl_check_metacopy_xattr(struct dentry *dentry);
 bool ovl_is_metacopy_dentry(struct dentry *dentry);
+char *ovl_get_redirect_xattr(struct dentry *dentry);
 
 static inline bool ovl_is_impuredir(struct dentry *dentry)
 {
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 961d65bd25c9..3d090b6f9fc2 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -833,3 +833,45 @@ bool ovl_is_metacopy_dentry(struct dentry *dentry)
 
 	return (oe->numlower > 1);
 }
+
+char *ovl_get_redirect_xattr(struct dentry *dentry)
+{
+	int res;
+	char *s, *next, *buf = NULL;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, NULL, 0);
+	if (res < 0) {
+		if (res == -ENODATA || res == -EOPNOTSUPP)
+			return NULL;
+		return ERR_PTR(res);
+	}
+
+	buf = kzalloc(res + 1, GFP_KERNEL);
+	if (!buf)
+		return ERR_PTR(-ENOMEM);
+
+	res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, buf, res);
+	if (res < 0) {
+		kfree(buf);
+		return ERR_PTR(res);
+        }
+	if (res == 0)
+		goto invalid;
+
+	if (buf[0] == '/') {
+		for (s = buf; *s++ == '/'; s = next) {
+			next = strchrnul(s, '/');
+			if (s == next)
+				goto invalid;
+		}
+	} else {
+		if (strchr(buf, '/') != NULL)
+			goto invalid;
+	}
+
+	return buf;
+invalid:
+	pr_warn_ratelimited("overlayfs: invalid redirect (%s)\n", buf);
+	kfree(buf);
+	return ERR_PTR(-EINVAL);
+}
-- 
2.13.6

* [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (25 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  2018-03-30 10:53   ` Amir Goldstein
  2018-03-29 19:38 ` [PATCH v13 28/28] ovl: Enable metadata only feature Vivek Goyal
  27 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

If we find an upper metacopy inode, make sure we also found the associated
data dentry in lower. Otherwise the copy up operation will fail later.

There are two cases where this can happen. First case is that somehow
data file was removed from lower layer. Other case is that REDIRECT
xattr was removed due to copy up of file on another cpu (when inode is
shared between two dentries) and hence ovl_lookup() could not find the
correct dentry.

First case is an error while second case is not error. If file has been
copied up, then it does not matter if data dentry was found or not.

Redirect removal is protected using ovl_inode->lock and ovl_lookup() does
not have access to that lock. So to differentiate between these two
cases, take appropriate inode lock in ovl_get_inode() and make sure a
data dentry has been found for metacopy inode. Otherwise lookup failed
and it's an error.

For example, say two files are hardlinked, foo.txt and bar.txt. Say foo.txt
is renamed to foo-renamed.txt and gets copied up metadata only. This will
also put a redirect "/foo.txt" on the hardlink inode. Now assume
foo-renamed.txt is opened for write and is undergoing data copy up on one
cpu and bar.txt is undergoing ovl_lookup() on another cpu. The data copy up
path will remove the REDIRECT and METACOPY xattrs. It is possible that the
METACOPY xattr was visible to ovl_lookup() but the REDIRECT xattr was gone
by that time. That means no data dentry will be found, but at the same time
the inode is no longer a metacopy inode, so a data dentry is not required.
This is not an error case. But if the inode is still metacopy and the data
dentry was not found, this is an error case (maybe because the underlying
layer changed). Fix it by returning -ESTALE.

If inode was found in cache, then take ovl_inode->lock before checking
status of inode. If inode has been allocated, then it is returned with
inode lock anyway and other threads will block on that lock, so no need
to take ovl_inode->lock.
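
The resulting decision can be summarised as a pure function. This is a
sketch only: struct sketch_lookup_state and its fields are invented for the
example, and the locking that protects the upperdata flag in the kernel is
deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what lookup and the inode know. */
struct sketch_lookup_state {
	bool metacopy_enabled;	/* models the metacopy mount option */
	bool is_regular;	/* only regular files can be metacopy */
	bool data_dentry_found;	/* lookup reached a lower data dentry */
	bool has_upperdata;	/* models OVL_UPPERDATA on the inode */
};

/* Returns false when the lookup result should be treated as stale. */
static bool sketch_metacopy_data_ok(const struct sketch_lookup_state *s)
{
	/* A data dentry was found, nothing more to verify. */
	if (s->data_dentry_found)
		return true;

	/* Feature off: no metadata only copy up can exist. */
	if (!s->metacopy_enabled)
		return true;

	if (!s->is_regular)
		return true;

	/*
	 * No data dentry: fine only if data was copied up meanwhile,
	 * i.e. the inode is no longer a metacopy inode.
	 */
	return s->has_upperdata;
}
```

The only combination that ends in -ESTALE is: metacopy enabled, regular
file, no data dentry found, and the inode still lacks upper data.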

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/export.c    |  3 ++-
 fs/overlayfs/inode.c     | 49 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/overlayfs/namei.c     | 14 ++++----------
 fs/overlayfs/overlayfs.h |  3 ++-
 4 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 1c233096e59c..e8575d4d2c77 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -305,7 +305,8 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 	if (d_is_dir(upper ?: lower))
 		return ERR_PTR(-EIO);
 
-	inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL);
+	inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL,
+			      false);
 	if (IS_ERR(inode)) {
 		dput(upper);
 		return ERR_CAST(inode);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 6a0c85699024..7e30f4a7cdd9 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -685,9 +685,42 @@ static bool ovl_hash_bylower(struct super_block *sb, struct dentry *upper,
 	return true;
 }
 
+static bool ovl_verify_metacopy_data(struct super_block *sb,
+				     struct inode *inode, bool metacopydata)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct ovl_inode *oi = OVL_I(inode);
+	bool metacopy = false;
+
+	/* A metacopy data dentry was found. So no need to do further checks */
+	if (metacopydata)
+		return true;
+
+	/*
+	 * Metacopy feature not enabled. No metadata data copy up should take
+	 * place. So no further checks needed.
+	 */
+	if (!ofs->config.metacopy)
+		return true;
+
+	if (!S_ISREG(inode->i_mode))
+		return true;
+
+	/*
+	 * Metacopy feature is enabled and we have not found metacopy data
+	 * dentry. Make sure this inode is not metacopy inode.
+	 */
+	mutex_lock(&oi->lock);
+	metacopy = !ovl_test_flag(OVL_UPPERDATA, inode);
+	mutex_unlock(&oi->lock);
+
+	return !metacopy;
+}
+
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			    struct dentry *lowerdentry, struct dentry *index,
-			    unsigned int numlower, char *redirect)
+			    unsigned int numlower, char *redirect,
+			    bool metacopydata)
 {
 	struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
 	struct inode *inode;
@@ -725,6 +758,15 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 				goto out_err;
 			}
 
+			/* Verify data dentry was found for metacopy dentry */
+			if (upperdentry &&
+			    !ovl_verify_metacopy_data(sb, inode,
+				                      metacopydata)) {
+				iput(inode);
+				inode = ERR_PTR(-ESTALE);
+				goto out;
+			}
+
 			dput(upperdentry);
 			kfree(redirect);
 			goto out;
@@ -755,6 +797,11 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	if (upperdentry && !metacopy)
 		ovl_set_flag(OVL_UPPERDATA, inode);
 
+	if (upperdentry && metacopy && !metacopydata) {
+		err = -ESTALE;
+		goto out_err_inode;
+	}
+
 	if (!metacopy) {
 		OVL_I(inode)->redirect = redirect;
 		redirect = NULL;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index a97666245726..8be37b0be6fd 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -839,7 +839,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct dentry *this;
 	unsigned int i;
 	int err;
-	bool metacopy = false;
+	bool metacopy = false, metacopydata = false;
 	struct ovl_lookup_data d = {
 		.name = dentry->d_name,
 		.is_dir = false,
@@ -958,6 +958,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 
 		if (d.metacopy)
 			metacopy = true;
+		else
+			metacopydata = true;
 		/*
 		 * Do not store intermediate metacopy dentries in chain,
 		 * except top most lower metacopy dentry
@@ -1000,14 +1002,6 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 
 	if (metacopy) {
 		BUG_ON(d.is_dir);
-		/*
-		 * Found a metacopy dentry but did not find corresponding
-		 * data dentry
-		 */
-		if (d.metacopy) {
-			err = -ESTALE;
-			goto out_put;
-		}
 
 		err = -EPERM;
 		if (!ofs->config.metacopy) {
@@ -1065,7 +1059,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		if (ctr)
 			origin = stack[0].dentry;
 		inode = ovl_get_inode(dentry->d_sb, upperdentry, origin, index,
-				      ctr, upperredirect);
+				      ctr, upperredirect, metacopydata);
 		err = PTR_ERR(inode);
 		if (IS_ERR(inode))
 			goto out_free_oe;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index a3bee7619fbb..2330253d7e25 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -352,7 +352,8 @@ struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real,
 			       bool is_upper);
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			    struct dentry *lowerdentry, struct dentry *index,
-			    unsigned int numlower, char *redirect);
+			    unsigned int numlower, char *redirect,
+			    bool metacopydata);
 static inline void ovl_copyattr(struct inode *from, struct inode *to)
 {
 	to->i_uid = from->i_uid;
-- 
2.13.6


* [PATCH v13 28/28] ovl: Enable metadata only feature
  2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
                   ` (26 preceding siblings ...)
  2018-03-29 19:38 ` [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode Vivek Goyal
@ 2018-03-29 19:38 ` Vivek Goyal
  27 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-03-29 19:38 UTC (permalink / raw)
  To: linux-unionfs; +Cc: miklos, amir73il, vgoyal

All the bits are in patches before this. So it is time to enable the
metadata only copy up feature.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index fbaf5fbfdf88..bfdd7dbb4ad7 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -247,9 +247,6 @@ static bool ovl_need_meta_copy_up(struct dentry *dentry, umode_t mode,
 {
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 
-	/* TODO: Will enable metacopy in last patch of series */
-	return false;
-
 	if (!ofs->config.metacopy)
 		return false;
 
-- 
2.13.6

* Re: [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2018-03-29 19:38 ` [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
@ 2018-03-30  4:52   ` Amir Goldstein
  2018-04-02 13:56     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  4:52 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> By default metadata only copy up is disabled. Provide a mount option so
> that users can choose one way or other.
>
> Also provide a kernel config and module option to enable/disable
> metacopy feature.
>
> metacopy feature requires redirect_dir=on when upper is present. Otherwise,
> it requires redirect_dir=follow at least.
>
> Like index feature, we verify on mount that upper root is not being
> reused with a different lower root.

I don't see that in the patch

> This hopes to get the configuration
> right and detect the copied layers use case. But this does only so
> much as we don't verify all the lowers. So it is possible that a lower is
> missing and later data copy up fails.
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  Documentation/filesystems/overlayfs.txt | 30 ++++++++++++++++++++++++-
>  fs/overlayfs/Kconfig                    | 19 ++++++++++++++++
>  fs/overlayfs/ovl_entry.h                |  1 +
>  fs/overlayfs/super.c                    | 40 ++++++++++++++++++++++++++++++++-
>  4 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
> index 6ea1e64d1464..b7720e61973c 100644
> --- a/Documentation/filesystems/overlayfs.txt
> +++ b/Documentation/filesystems/overlayfs.txt
> @@ -249,6 +249,30 @@ rightmost one and going left.  In the above example lower1 will be the
>  top, lower2 the middle and lower3 the bottom layer.
>
>
> +Metadata only copyup
> +--------------------
> +
> +When metadata only copy up feature is enabled, overlayfs will only copy
> +up metadata (as opposed to whole file), when a metadata specific operation
> +like chown/chmod is performed. Full file will be copied up later when
> +file is opened for WRITE operation.
> +
> +IOW, this is delayed data copy up operation and data is copied up when
> +there is a need to actually modify data.
> +
> +There are multiple ways to enable/disable this feature. A config option
> +CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature
> +by default. Or one can enable/disable it at module load time with module
> +parameter metacopy=on/off. Lastly, there is also a per mount option
> +metacopy=on/off to enable/disable this feature per mount.
> +
> +Do not use metacopy=on with untrusted upper/lower directories. Otherwise
> +it is possible that an attacker can create a handcrafted file with
> +appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower
> +pointed by REDIRECT. This should not be possible on local system as setting
> +"trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
> +for untrusted layers like from a pen drive.
> +
>  Sharing and copying layers
>  --------------------------
>
> @@ -267,7 +291,7 @@ though it will not result in a crash or deadlock.
>  Mounting an overlay using an upper layer path, where the upper layer path
>  was previously used by another mounted overlay in combination with a
>  different lower layer path, is allowed, unless the "inodes index" feature
> -is enabled.
> +or "metadata only copyup" feature is enabled.
>
>  With the "inodes index" feature, on the first time mount, an NFS file
>  handle of the lower layer root directory, along with the UUID of the lower
> @@ -280,6 +304,10 @@ lower root origin, mount will fail with ESTALE.  An overlayfs mount with
>  does not support NFS export, lower filesystem does not have a valid UUID or
>  if the upper filesystem does not support extended attributes.
>
> +For "metadata only copyup" feature there is no verification mechanism at
> +mount time. So if same upper is mounted with different set of lower, mount
> +probably will succeed but expect the unexpected later on. So don't do it.
> +
>  It is quite a common practice to copy overlay layers to a different
>  directory tree on the same or different underlying filesystem, and even
>  to a different machine.  With the "inodes index" feature, trying to mount
> diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
> index ce6ff5a0a6e4..7d9650c9c075 100644
> --- a/fs/overlayfs/Kconfig
> +++ b/fs/overlayfs/Kconfig
> @@ -86,3 +86,22 @@ config OVERLAY_FS_NFS_EXPORT
>           case basis with the "nfs_export=on" mount option.
>
>           Say N unless you fully understand the consequences.
> +
> +config OVERLAY_FS_METACOPY
> +       bool "Overlayfs: turn on metadata only copy up feature by default"
> +       depends on OVERLAY_FS
> +       depends on !OVERLAY_FS_NFS_EXPORT

The code lets metacopy win and turns nfs_export off, so to match it the
dependency should be the other way around:
OVERLAY_FS_NFS_EXPORT depends on !OVERLAY_FS_METACOPY

> +       select OVERLAY_FS_REDIRECT_DIR

At first glance I thought this should be
"depends on OVERLAY_FS_REDIRECT_DIR", like the test in the code,
but I see why select makes sense in the context of config options.
It makes me wonder whether NFS_EXPORT should also select INDEX.

I know why I didn't implement that logic in the code: we do not distinguish
between an explicit "index=off" mount option and no mount option at all
when the default is "index=off". The former should disable nfs_export,
but the latter should enable index.
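
To illustrate the distinction, here is an untested userspace sketch (not
overlayfs code; the enum, struct and function names are all made up) of
remembering whether the option was given explicitly:

```c
#include <stdbool.h>

/* Hypothetical tri-state: did the user pass index=... at mount time? */
enum sketch_opt { SKETCH_UNSET, SKETCH_OFF, SKETCH_ON };

struct sketch_index_cfg {
	enum sketch_opt index_opt;	/* what the user asked for, if anything */
	bool index;			/* effective value */
	bool nfs_export;		/* effective value */
};

/*
 * Resolve the index/nfs_export dependency once parsing is done:
 *  - an explicit index=off wins and disables nfs_export
 *  - no index option at all lets nfs_export=on turn index on
 */
static void sketch_resolve_index(struct sketch_index_cfg *c, bool index_def)
{
	if (c->index_opt == SKETCH_UNSET)
		c->index = index_def;
	else
		c->index = (c->index_opt == SKETCH_ON);

	if (!c->nfs_export || c->index)
		return;

	if (c->index_opt == SKETCH_OFF)
		c->nfs_export = false;	/* user explicitly refused index */
	else
		c->index = true;	/* only the default was off */
}
```

With only a bool for index, the two cases in the last branch cannot be
told apart, which is the limitation described above.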

> +       help
> +         If this config option is enabled then overlay filesystems will
> +         copy up only metadata where appropriate and data copy up will
> +         happen when a file is opened for WRITE operation. It is still
> +         possible to turn off this feature globally with the "metacopy=off"
> +         module option or on a filesystem instance basis with the
> +         "metacopy=off" mount option.
> +
> +         Note, that this feature is not backward compatible.  That is,
> +         mounting an overlay which has metacopy only inodes on a kernel
> +         that doesn't support this feature will have unexpected results.
> +
> +         If unsure, say N.
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index bfef6edcc111..7dc55628080d 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -18,6 +18,7 @@ struct ovl_config {
>         const char *redirect_mode;
>         bool index;
>         bool nfs_export;
> +       bool metacopy;
>  };
>
>  struct ovl_layer {
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 7c24619ae7fc..ddff54fa9e85 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -58,6 +58,11 @@ static void ovl_entry_stack_free(struct ovl_entry *oe)
>                 dput(oe->lowerstack[i].dentry);
>  }
>
> +static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
> +module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
> +MODULE_PARM_DESC(ovl_metacopy_def,
> +                "Default to on or off for the metadata only copy up feature");
> +
>  static void ovl_dentry_release(struct dentry *dentry)
>  {
>         struct ovl_entry *oe = dentry->d_fsdata;
> @@ -350,6 +355,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
>         if (ofs->config.nfs_export != ovl_nfs_export_def)
>                 seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ?
>                                                 "on" : "off");
> +       if (ofs->config.metacopy != ovl_metacopy_def)
> +               seq_printf(m, ",metacopy=%s",
> +                          ofs->config.metacopy ? "on" : "off");
>         return 0;
>  }
>
> @@ -384,6 +392,8 @@ enum {
>         OPT_INDEX_OFF,
>         OPT_NFS_EXPORT_ON,
>         OPT_NFS_EXPORT_OFF,
> +       OPT_METACOPY_ON,
> +       OPT_METACOPY_OFF,
>         OPT_ERR,
>  };
>
> @@ -397,6 +407,8 @@ static const match_table_t ovl_tokens = {
>         {OPT_INDEX_OFF,                 "index=off"},
>         {OPT_NFS_EXPORT_ON,             "nfs_export=on"},
>         {OPT_NFS_EXPORT_OFF,            "nfs_export=off"},
> +       {OPT_METACOPY_ON,               "metacopy=on"},
> +       {OPT_METACOPY_OFF,              "metacopy=off"},
>         {OPT_ERR,                       NULL}
>  };
>
> @@ -511,6 +523,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
>                         config->nfs_export = false;
>                         break;
>
> +               case OPT_METACOPY_ON:
> +                       config->metacopy = true;
> +                       break;
> +
> +               case OPT_METACOPY_OFF:
> +                       config->metacopy = false;
> +                       break;
> +
>                 default:
>                         pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
>                         return -EINVAL;
> @@ -993,7 +1013,8 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
>         if (err) {
>                 ofs->noxattr = true;
>                 ofs->config.index = false;
> -               pr_warn("overlayfs: upper fs does not support xattr, falling back to index=off.\n");
> +               ofs->config.metacopy = false;
> +               pr_warn("overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off.\n");
>                 err = 0;
>         } else {
>                 vfs_removexattr(ofs->workdir, OVL_XATTR_OPAQUE);
> @@ -1012,6 +1033,11 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
>                 ofs->config.nfs_export = false;
>         }
>
> +       /* metacopy feature with upper requires redirect_dir=on */
> +       if (ofs->config.metacopy && !ofs->config.redirect_dir) {
> +               pr_warn("overlayfs: metadata only copyup requires \"redirect_dir=on\", falling back to metacopy=off.\n");
> +               ofs->config.metacopy = false;
> +       }

Please move all these scattered tests that depend only on parsed config
values to the end of ovl_parse_redirect_mode().
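
Roughly what I have in mind, as an untested userspace sketch. The struct
and function names are made up for illustration; the three rules are the
config-only fallbacks from this patch (the xattr check is not one of
them, since it needs the upper filesystem):

```c
#include <stdbool.h>

/* Hypothetical subset of the parsed mount configuration. */
struct sketch_deps {
	bool has_upperdir;
	bool redirect_dir;
	bool redirect_follow;
	bool metacopy;
	bool nfs_export;
};

/*
 * Run once, after all mount options have been parsed. Every rule looks
 * only at parsed values, so none of them has to wait for the layers to
 * be set up.
 */
static void sketch_resolve_deps(struct sketch_deps *c)
{
	/* metacopy with an upper layer requires redirect_dir=on */
	if (c->metacopy && c->has_upperdir && !c->redirect_dir)
		c->metacopy = false;

	/* metacopy on a non-upper mount requires redirect_dir=follow */
	if (c->metacopy && !c->has_upperdir && !c->redirect_follow)
		c->metacopy = false;

	/* NFS export is not supported together with metacopy */
	if (c->metacopy && c->nfs_export)
		c->nfs_export = false;
}
```

The order matters: the nfs_export rule has to come after the two rules
that may already have turned metacopy off.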

>  out:
>         mnt_drop_write(mnt);
>         return err;
> @@ -1188,6 +1214,12 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
>                 ofs->config.nfs_export = false;
>         }
>
> +       if (!ofs->config.upperdir && ofs->config.metacopy &&
> +           !ofs->config.redirect_follow) {
> +               ofs->config.metacopy = false;
> +               pr_warn("overlayfs: metadata only copyup requires \"redirect_dir=follow\" on non-upper mount, falling back to metacopy=off.\n");
> +       }
> +
>         err = -ENOMEM;
>         stack = kcalloc(stacklen, sizeof(struct path), GFP_KERNEL);
>         if (!stack)
> @@ -1263,6 +1295,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>
>         ofs->config.index = ovl_index_def;
>         ofs->config.nfs_export = ovl_nfs_export_def;
> +       ofs->config.metacopy = ovl_metacopy_def;
>         err = ovl_parse_opt((char *) data, &ofs->config);
>         if (err)
>                 goto out_err;
> @@ -1331,6 +1364,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>                 }
>         }
>
> +       if (ofs->config.metacopy && ofs->config.nfs_export) {
> +               pr_warn("overlayfs: Metadata copy up requires NFS export disabled, falling back to nfs_export=off.\n");

"NFS export is not supported with metadata only copy up, ..."

Thanks,
Amir.

* Re: [PATCH v13 02/28] ovl: Initialize ovl_inode->redirect in ovl_get_inode()
  2018-03-29 19:38 ` [PATCH v13 02/28] ovl: Initialize ovl_inode->redirect " Vivek Goyal
@ 2018-03-30  4:57   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  4:57 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> All the inode/ovl_inode properties ideally should be initialized in
> ovl_get_inode(). Otherwise this can be a problem for the cases where
> multiple dentries point to same inode (hardlinks, disconnected dentries).
>
> As of now this is not a problem as redirects are used only for directories
> which don't share inode. But soon I want to use redirects for regular files
> also and there it will become an issue.
>
> Hence, move ->redirect initialization in ovl_get_inode().
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/overlayfs/export.c    | 2 +-
>  fs/overlayfs/inode.c     | 5 ++++-
>  fs/overlayfs/namei.c     | 4 +---
>  fs/overlayfs/overlayfs.h | 2 +-
>  4 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 9868c173068e..e668329f7361 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -305,7 +305,7 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>         if (d_is_dir(upper ?: lower))
>                 return ERR_PTR(-EIO);
>
> -       inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower);

I'd leave a TODO comment here about fixing metacopy + NFS export support.
You may add the comment in a later patch, just don't forget.

> +       inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL);
>         if (IS_ERR(inode)) {
>                 dput(upper);
>                 return ERR_CAST(inode);
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 06ef9a7a7d1a..2e7ee09c7831 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -704,7 +704,7 @@ static bool ovl_hash_bylower(struct super_block *sb, struct dentry *upper,
>
>  struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>                             struct dentry *lowerdentry, struct dentry *index,
> -                           unsigned int numlower)
> +                           unsigned int numlower, char *redirect)
>  {
>         struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
>         struct inode *inode;
> @@ -741,6 +741,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>                         }
>
>                         dput(upperdentry);
> +                       kfree(redirect);
>                         goto out;
>                 }
>
> @@ -763,6 +764,8 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         if (index)
>                 ovl_set_flag(OVL_INDEX, inode);
>
> +       OVL_I(inode)->redirect = redirect;
> +
>         /* Check for non-merge dir that may have whiteouts */
>         if (is_dir) {
>                 if (((upperdentry && lowerdentry) || numlower > 1) ||
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index ec92fa2f7d5f..0b325e65864c 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -999,12 +999,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 if (ctr)
>                         origin = stack[0].dentry;
>                 inode = ovl_get_inode(dentry->d_sb, upperdentry, origin, index,
> -                                     ctr);
> +                                     ctr, upperredirect);
>                 err = PTR_ERR(inode);
>                 if (IS_ERR(inode))
>                         goto out_free_oe;
> -
> -               OVL_I(inode)->redirect = upperredirect;
>         }
>
>         revert_creds(old_cred);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 225ff1171147..a65ce7fd1b6e 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -330,7 +330,7 @@ struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real,
>                                bool is_upper);
>  struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>                             struct dentry *lowerdentry, struct dentry *index,
> -                           unsigned int numlower);
> +                           unsigned int numlower, char *redirect);
>  static inline void ovl_copyattr(struct inode *from, struct inode *to)
>  {
>         to->i_uid = from->i_uid;
> --
> 2.13.6
>

* Re: [PATCH v13 03/28] ovl: Rename local variable locked to new_locked
  2018-03-29 19:38 ` [PATCH v13 03/28] ovl: Rename local variable locked to new_locked Vivek Goyal
@ 2018-03-30  4:58   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  4:58 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> As "locked" tracks the mutex lock state of "new" dentry inode instead of
> "old" dentry inode, name the variable "new_locked" instead.
>
> Soon I will need to lock old dentry inode lock as well and will track that
> state in variable "locked".
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/overlayfs/dir.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 839709c7803a..8a38f3136547 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -892,7 +892,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>                       unsigned int flags)
>  {
>         int err;
> -       bool locked = false;
> +       bool new_locked = false;
>         struct dentry *old_upperdir;
>         struct dentry *new_upperdir;
>         struct dentry *olddentry;
> @@ -959,7 +959,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>                 if (err)
>                         goto out_drop_write;
>         } else {
> -               err = ovl_nlink_start(new, &locked);
> +               err = ovl_nlink_start(new, &new_locked);
>                 if (err)
>                         goto out_drop_write;
>         }
> @@ -1087,7 +1087,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>         unlock_rename(new_upperdir, old_upperdir);
>  out_revert_creds:
>         revert_creds(old_cred);
> -       ovl_nlink_end(new, locked);
> +       ovl_nlink_end(new, new_locked);
>  out_drop_write:
>         ovl_drop_write(old);
>  out:
> --
> 2.13.6
>

* Re: [PATCH v13 01/28] ovl: Set OVL_INDEX flag in ovl_get_inode()
  2018-03-29 19:38 ` [PATCH v13 01/28] ovl: Set OVL_INDEX flag in ovl_get_inode() Vivek Goyal
@ 2018-03-30  4:59   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  4:59 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> All the inode/ovl_inode initialization should happen in ovl_get_inode().
> This is especially useful when multiple dentries are pointing to same
> inode and inode is already in the cash. In that case, we don't have to

s/in the cash/in cache/

> initialize the OVL_INDEX. We don't even take ovl_inode->lock in ovl_lookup()
> and that can run into issues for the case of shared inode.
>
> So move OVL_INDEX flag setting in ovl_get_inode().
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/overlayfs/export.c | 3 ---
>  fs/overlayfs/inode.c  | 3 +++
>  fs/overlayfs/namei.c  | 2 --
>  3 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 87bd4148f4fb..9868c173068e 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -311,9 +311,6 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>                 return ERR_CAST(inode);
>         }
>
> -       if (index)
> -               ovl_set_flag(OVL_INDEX, inode);
> -
>         dentry = d_find_any_alias(inode);
>         if (!dentry) {
>                 dentry = d_alloc_anon(inode->i_sb);
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 3b1bd469accd..06ef9a7a7d1a 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -760,6 +760,9 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         if (upperdentry && ovl_is_impuredir(upperdentry))
>                 ovl_set_flag(OVL_IMPURE, inode);
>
> +       if (index)
> +               ovl_set_flag(OVL_INDEX, inode);
> +
>         /* Check for non-merge dir that may have whiteouts */
>         if (is_dir) {
>                 if (((upperdentry && lowerdentry) || numlower > 1) ||
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index f55c00bf9e17..ec92fa2f7d5f 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -1005,8 +1005,6 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                         goto out_free_oe;
>
>                 OVL_I(inode)->redirect = upperredirect;
> -               if (index)
> -                       ovl_set_flag(OVL_INDEX, inode);
>         }
>
>         revert_creds(old_cred);
> --
> 2.13.6
>

* Re: [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  2018-03-29 19:38 ` [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry Vivek Goyal
@ 2018-03-30  5:49   ` Amir Goldstein
  2018-03-30  9:12     ` Amir Goldstein
  2018-04-02 15:06     ` Vivek Goyal
  0 siblings, 2 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  5:49 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
> It also allows for presence of metacopy dentries in lower layer.
>
> During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
> set OVL_UPPERDATA bit in flags.
>
> We don't support metacopy feature with nfs_export. So in nfs_export code,
> we set OVL_UPPERDATA flag set unconditionally if upper inode exists.
>
> Do not follow metacopy origin if we find a metacopy only inode and metacopy
> feature is not enabled for that mount. Like redirect, this can have security
> implications where an attacker could hand craft upper and try to gain
> access to file on lower which it should not have to begin with.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/export.c    |  3 ++
>  fs/overlayfs/inode.c     |  6 +++-
>  fs/overlayfs/namei.c     | 90 ++++++++++++++++++++++++++++++++++++++++++------
>  fs/overlayfs/overlayfs.h |  1 +
>  fs/overlayfs/util.c      | 22 ++++++++++++
>  5 files changed, 110 insertions(+), 12 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index e668329f7361..1c233096e59c 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -311,6 +311,9 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>                 return ERR_CAST(inode);
>         }
>
> +       if (upper)
> +               ovl_set_flag(OVL_UPPERDATA, inode);
> +
>         dentry = d_find_any_alias(inode);
>         if (!dentry) {
>                 dentry = d_alloc_anon(inode->i_sb);
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 3991a890b464..e1dbfed0c449 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -677,7 +677,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
>         struct inode *inode;
>         bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
> -       bool is_dir;
> +       bool is_dir, metacopy = false;
>
>         if (!realinode)
>                 realinode = d_inode(lowerdentry);
> @@ -732,6 +732,10 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         if (index)
>                 ovl_set_flag(OVL_INDEX, inode);
>
> +       metacopy = ovl_check_metacopy_xattr(upperdentry ?: lowerdentry);

No reason to check metacopy on lowerdentry.

> +       if (upperdentry && !metacopy)
> +               ovl_set_flag(OVL_UPPERDATA, inode);
> +
>         OVL_I(inode)->redirect = redirect;
>
>         /* Check for non-merge dir that may have whiteouts */
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index 0b325e65864c..1dba89e9543f 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -24,6 +24,7 @@ struct ovl_lookup_data {
>         bool stop;
>         bool last;
>         char *redirect;
> +       bool metacopy;
>  };
>
>  static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> @@ -252,9 +253,13 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>                 goto put_and_out;
>         }
>         if (!d_can_lookup(this)) {
> -               d->stop = true;

Upper was a dir. You look in lower and find a non-dir. You need to stop
going to the next layer; goto put_and_out alone won't do that.

Similarly, you need to handle the case where a dir is found below a
non-dir with metacopy.
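
Something along these lines, as an untested userspace model of the
per-layer decision. The types and names are invented for the sketch, and
ignoring the mismatched entry is just one possible way to handle it:

```c
#include <stdbool.h>

/* Hypothetical kinds of entry a single-layer lookup can find. */
enum sketch_kind { SK_DIR, SK_FILE_DATA, SK_FILE_METACOPY };

/* Hypothetical stand-in for the state carried from layer to layer. */
struct sketch_lookup {
	bool is_dir;	/* a directory was found in a higher layer */
	bool metacopy;	/* a metacopy file was found in a higher layer */
	bool stop;	/* do not look into further layers */
};

/* Returns true if the entry found in this layer should be used. */
static bool sketch_lookup_single(struct sketch_lookup *d,
				 enum sketch_kind found)
{
	if (found != SK_DIR) {
		if (d->is_dir) {
			/* non-dir below a dir: ignore it and stop */
			d->stop = true;
			return false;
		}
		d->metacopy = (found == SK_FILE_METACOPY);
		d->stop = !d->metacopy;	/* data found, chain is complete */
		return true;
	}

	if (d->metacopy) {
		/* dir below a metacopy file: it cannot supply data, stop */
		d->stop = true;
		return false;
	}

	d->is_dir = true;
	return true;
}
```

In the second case d->metacopy stays set, so the caller still sees a
metacopy dentry without a matching data dentry and can fail the lookup.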

>                 if (d->is_dir)
>                         goto put_and_out;
> +               err = ovl_check_metacopy_xattr(this);
> +               if (err < 0)
> +                       goto out_err;
> +               d->stop = !err;
> +               d->metacopy = !!err;
>                 goto out;
>         }
>         if (last_element)
> @@ -815,7 +820,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
>         struct ovl_entry *poe = dentry->d_parent->d_fsdata;
>         struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
> -       struct ovl_path *stack = NULL;
> +       struct ovl_path *stack = NULL, *origin_path = NULL;
>         struct dentry *upperdir, *upperdentry = NULL;
>         struct dentry *origin = NULL;
>         struct dentry *index = NULL;
> @@ -826,6 +831,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         struct dentry *this;
>         unsigned int i;
>         int err;
> +       bool metacopy = false;
>         struct ovl_lookup_data d = {
>                 .name = dentry->d_name,
>                 .is_dir = false,
> @@ -833,6 +839,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 .stop = false,
>                 .last = ofs->config.redirect_follow ? false : !poe->numlower,
>                 .redirect = NULL,
> +               .metacopy = false,
>         };
>
>         if (dentry->d_name.len > ofs->namelen)
> @@ -851,7 +858,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                         goto out;
>                 }
>                 if (upperdentry && !d.is_dir) {
> -                       BUG_ON(!d.stop || d.redirect);
> +                       unsigned int origin_ctr = 0;
> +                       BUG_ON(d.redirect);
>                         /*
>                          * Lookup copy up origin by decoding origin file handle.
>                          * We may get a disconnected dentry, which is fine,
> @@ -862,16 +870,20 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                          * number - it's the same as if we held a reference
>                          * to a dentry in lower layer that was moved under us.
>                          */
> -                       err = ovl_check_origin(ofs, upperdentry, &stack, &ctr);
> +                       err = ovl_check_origin(ofs, upperdentry, &origin_path,
> +                                              &origin_ctr);
>                         if (err)
>                                 goto out_put_upper;
> +
> +                       if (d.metacopy)
> +                               metacopy = true;
>                 }
>
>                 if (d.redirect) {
>                         err = -ENOMEM;
>                         upperredirect = kstrdup(d.redirect, GFP_KERNEL);
>                         if (!upperredirect)
> -                               goto out_put_upper;
> +                               goto out_put_origin;
>                         if (d.redirect[0] == '/')
>                                 poe = roe;
>                 }
> @@ -883,7 +895,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 stack = kcalloc(ofs->numlower, sizeof(struct ovl_path),
>                                 GFP_KERNEL);
>                 if (!stack)
> -                       goto out_put_upper;
> +                       goto out_put_origin;
>         }
>
>         for (i = 0; !d.stop && i < poe->numlower; i++) {
> @@ -905,7 +917,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                  * If no origin fh is stored in upper of a merge dir, store fh
>                  * of lower dir and set upper parent "impure".
>                  */
> -               if (upperdentry && !ctr && !ofs->noxattr) {
> +               if (upperdentry && !ctr && !ofs->noxattr && d.is_dir) {
>                         err = ovl_fix_origin(dentry, this, upperdentry);
>                         if (err) {
>                                 dput(this);
> @@ -917,16 +929,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                  * When "verify_lower" feature is enabled, do not merge with a
>                  * lower dir that does not match a stored origin xattr. In any
>                  * case, only verified origin is used for index lookup.
> +                *
> +                * For non-dir dentry, make sure dentry found by lookup
> +                * matches the origin stored in upper. Otherwise its an
> +                * error.
>                  */
> -               if (upperdentry && !ctr && ovl_verify_lower(dentry->d_sb)) {
> +               if (upperdentry && !ctr &&
> +                   ((d.is_dir && ovl_verify_lower(dentry->d_sb)) ||
> +                    (!d.is_dir && origin_path))) {
>                         err = ovl_verify_origin(upperdentry, this, false);
>                         if (err) {
>                                 dput(this);
> -                               break;
> +                               if (d.is_dir)
> +                                       break;
> +                               goto out_put;
>                         }
> -
>                         /* Bless lower dir as verified origin */
> -                       origin = this;
> +                       if (d.is_dir)
> +                               origin = this;

It's ok to bless a verified non-dir as well.
It is going to be blessed anyway, just above the index lookup, if ctr > 0.

> +               }
> +
> +               if (d.metacopy)
> +                       metacopy = true;
> +               /*
> +                * Do not store intermediate metacopy dentries in chain,
> +                * except top most lower metacopy dentry
> +                */
> +               if (d.metacopy && ctr) {
> +                       dput(this);
> +                       continue;
>                 }
>
>                 stack[ctr].dentry = this;
> @@ -960,6 +991,34 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>                 }
>         }
>
> +       if (metacopy) {
> +               BUG_ON(d.is_dir);

Yeah, I think that really is a bug, because you need to detect
the case of a dir in a lower layer under a metacopy in the upper layer
and do something about it.

> +               /*
> +                * Found a metacopy dentry but did not find corresponding
> +                * data dentry
> +                */
> +               if (d.metacopy) {
> +                       err = -ESTALE;
> +                       goto out_put;
> +               }
> +
> +               err = -EPERM;
> +               if (!ofs->config.metacopy) {
> +                       pr_warn_ratelimited("overlay: refusing to follow"
> +                                           " metacopy origin for (%pd2)\n",
> +                                           dentry);
> +                       goto out_put;
> +               }
> +       } else if (!d.is_dir && upperdentry && !ctr && origin_path) {
> +               if (WARN_ON(stack != NULL)) {
> +                       err = -EIO;
> +                       goto out_put;
> +               }
> +               stack = origin_path;
> +               ctr = 1;
> +               origin_path = NULL;
> +       }
> +
>         /*
>          * Lookup index by lower inode and verify it matches upper inode.
>          * We only trust dir index if we verified that lower dir matches
> @@ -1006,6 +1065,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         }
>
>         revert_creds(old_cred);
> +       if (origin_path) {
> +               dput(origin_path->dentry);
> +               kfree(origin_path);
> +       }
>         dput(index);
>         kfree(stack);
>         kfree(d.redirect);
> @@ -1019,6 +1082,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         for (i = 0; i < ctr; i++)
>                 dput(stack[i].dentry);
>         kfree(stack);
> +out_put_origin:
> +       if (origin_path) {
> +               dput(origin_path->dentry);
> +               kfree(origin_path);
> +       }

There is no need for the new goto label.
Just add this under the existing out_put_upper label.

Thanks,
Amir.

* Re: [PATCH v13 13/28] ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
  2018-03-29 19:38 ` [PATCH v13 13/28] ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry Vivek Goyal
@ 2018-03-30  6:01   ` Amir Goldstein
  2018-04-02 15:08     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  6:01 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Now we have the notion of data dentry and metacopy dentry. ovl_dentry_lower()
> will return lower dentry at idx 0, but it could be either data or metacopy

will return the uppermost lower dentry (idx 0 means upper).

> dentry. Now we support metacopy dentries in lower layers so it is possible
> that lowerstack[0] is metacopy dentry while lowerstack[1] is actual data
> dentry.
>
> So add an helper which returns lowest most dentry which is supposed to be
> data dentry.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/overlayfs/overlayfs.h |  1 +
>  fs/overlayfs/util.c      | 14 ++++++++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 4eb4b765887f..214d9f08c574 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -226,6 +226,7 @@ void ovl_path_lowerdata(struct dentry *dentry, struct path *path);
>  enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
>  struct dentry *ovl_dentry_upper(struct dentry *dentry);
>  struct dentry *ovl_dentry_lower(struct dentry *dentry);
> +struct dentry *ovl_dentry_lowerdata(struct dentry *dentry);
>  struct dentry *ovl_dentry_real(struct dentry *dentry);
>  struct dentry *ovl_i_dentry_upper(struct inode *inode);
>  struct inode *ovl_inode_upper(struct inode *inode);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index fd98329e820c..394674c4c820 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -186,6 +186,20 @@ struct dentry *ovl_dentry_lower(struct dentry *dentry)
>         return oe->numlower ? oe->lowerstack[0].dentry : NULL;
>  }
>
> +/*
> + * ovl_dentry_lower() could return either a data dentry or metacopy dentry
> + * depending on what is stored in lowerstack[0]. At times we need to find
> + * lower dentry which has data (and not metacopy dentry). This helper
> + * returns the lower data dentry.
> + */
> +struct dentry *ovl_dentry_lowerdata(struct dentry *dentry)
> +{
> +       struct ovl_entry *oe = dentry->d_fsdata;
> +       int idx = oe->numlower - 1;

Please stick with convention of layer->idx that idx 0 is upper.

> +
> +       return idx >= 0 ? oe->lowerstack[idx].dentry : NULL;
> +}
> +
>  struct dentry *ovl_dentry_real(struct dentry *dentry)
>  {
>         return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry);
> --
> 2.13.6
>
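
The helper in this patch picks the last entry of the lower stack. A minimal
userspace sketch of that selection, written without a local named idx so it
cannot be confused with the layer->idx numbering in which 0 denotes upper.
The struct and function names are illustrative stand-ins, and whether this
is the fix the review comment has in mind is an assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dentry and struct ovl_entry. */
struct dentry_demo {
	const char *name;
};

struct entry_demo {
	unsigned int numlower;
	struct dentry_demo *lowerstack[4];
};

/*
 * Return the bottom-most lower dentry, which is expected to carry the data,
 * or NULL when there are no lower layers at all.
 */
static struct dentry_demo *lowerdata_demo(const struct entry_demo *oe)
{
	assert(oe);
	return oe->numlower ? oe->lowerstack[oe->numlower - 1] : NULL;
}
```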

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 15/28] ovl: Move some of ovl_nlink_start() functionality in ovl_nlink_prep()
  2018-03-29 19:38 ` [PATCH v13 15/28] ovl: Move some of ovl_nlink_start() functionality in ovl_nlink_prep() Vivek Goyal
@ 2018-03-30  6:23   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  6:23 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Soon I want to write patches to enable redirects on non-dir files. That means
> it is possible that we have to deal with the case where multiple dentries
> might be sharing inode and ovl_inode->redirect field setting/resetting
> will need to be protected by taking ovl_inode->lock. Current dentry based
> locking alone will not be sufficient for this case.
>
> As of now, nlink based code takes ovl_inode->lock in some cases. For redirect
> case, during ovl_rename() I might have to take ovl_inode->lock both on
> old as well as new ovl_inode. And that means that I need to make sure
> there are no deadlocks.
>
> I want to separate out logic for taking lock in a new function. Hence will
> need to break down ovl_nlink_start() a bit.
>
> ovl_nlink_start() does the copy up and then takes ovl_inode->lock. Move
> copy up related portions in a separate function called ovl_nlink_prep().

The sensible thing to do is to call two helpers from ovl_nlink_start(),
ovl_nlink_prep() and ovl_nlink_???(). Then you can use the helpers directly
in rename without needing to change all the other places.
The changes to the other places do not make the code better; they make
it worse.
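
A minimal userspace sketch of that layout, assuming a wrapper that keeps its
existing signature and simply calls the two steps in order. The second
helper's name is left open above, so nlink_lock_demo() is a purely
hypothetical placeholder, as are the mock struct and its fields:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the per-inode state touched by the two steps. */
struct inode_demo {
	bool copied_up;
	bool locked;
};

/* Step 1: the copy up part (ovl_nlink_prep() in the patch). */
static int nlink_prep_demo(struct inode_demo *inode)
{
	inode->copied_up = true;
	return 0;
}

/* Step 2: the locking part. The name is a placeholder, not a proposal. */
static int nlink_lock_demo(struct inode_demo *inode, bool *locked)
{
	inode->locked = true;
	*locked = true;
	return 0;
}

/*
 * Existing entry point: callers other than rename keep calling this and
 * do not need to change.
 */
static int nlink_start_demo(struct inode_demo *inode, bool *locked)
{
	int err;

	assert(inode && locked);
	*locked = false;

	err = nlink_prep_demo(inode);
	if (err)
		return err;

	return nlink_lock_demo(inode, locked);
}
```

Only rename would then call the two steps separately, with its own locking
in between.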

>
> This patch is just code reorganization and no functional change.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/dir.c       | 14 ++++++++++++++
>  fs/overlayfs/overlayfs.h |  1 +
>  fs/overlayfs/util.c      | 26 ++++++++++++++++++--------
>  3 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 7617a03acc30..1f003be4a19e 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -595,6 +595,10 @@ static int ovl_link(struct dentry *old, struct inode *newdir,
>         if (err)
>                 goto out_drop_write;
>
> +       err = ovl_nlink_prep(old);
> +       if (err)
> +               goto out_drop_write;
> +
>         err = ovl_nlink_start(old, &locked);
>         if (err)
>                 goto out_drop_write;
> @@ -752,6 +756,10 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
>         if (err)
>                 goto out_drop_write;
>
> +       err = ovl_nlink_prep(dentry);
> +       if (err)
> +               goto out_drop_write;
> +
>         err = ovl_nlink_start(dentry, &locked);
>         if (err)
>                 goto out_drop_write;
> @@ -960,6 +968,12 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>                 if (err)
>                         goto out_drop_write;
>         } else {
> +               err = ovl_nlink_prep(new);
> +               if (err)
> +                       goto out_drop_write;
> +       }
> +
> +       if (overwrite) {
>                 err = ovl_nlink_start(new, &new_locked);
>                 if (err)
>                         goto out_drop_write;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 214d9f08c574..aa5b0c121fc7 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -271,6 +271,7 @@ bool ovl_test_flag(unsigned long flag, struct inode *inode);
>  bool ovl_inuse_trylock(struct dentry *dentry);
>  void ovl_inuse_unlock(struct dentry *dentry);
>  bool ovl_need_index(struct dentry *dentry);
> +int ovl_nlink_prep(struct dentry *dentry);
>  int ovl_nlink_start(struct dentry *dentry, bool *locked);
>  void ovl_nlink_end(struct dentry *dentry, bool locked);
>  int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 394674c4c820..ed93e233894f 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -675,15 +675,9 @@ static void ovl_cleanup_index(struct dentry *dentry)
>         goto out;
>  }
>
> -/*
> - * Operations that change overlay inode and upper inode nlink need to be
> - * synchronized with copy up for persistent nlink accounting.
> - */
> -int ovl_nlink_start(struct dentry *dentry, bool *locked)
> +int ovl_nlink_prep(struct dentry *dentry)
>  {
> -       struct ovl_inode *oi = OVL_I(d_inode(dentry));
> -       const struct cred *old_cred;
> -       int err;
> +       int err = 0;
>
>         if (!d_inode(dentry))
>                 return 0;
> @@ -708,6 +702,22 @@ int ovl_nlink_start(struct dentry *dentry, bool *locked)
>                         return err;
>         }
>
> +       return err;
> +}
> +
> +/*
> + * Operations that change overlay inode and upper inode nlink need to be
> + * synchronized with copy up for persistent nlink accounting.
> + */
> +int ovl_nlink_start(struct dentry *dentry, bool *locked)
> +{
> +       struct ovl_inode *oi = OVL_I(d_inode(dentry));
> +       const struct cred *old_cred;
> +       int err;
> +
> +       if (!d_inode(dentry))
> +               return 0;
> +
>         err = mutex_lock_interruptible(&oi->lock);
>         if (err)
>                 return err;
> --
> 2.13.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 16/28] ovl: Create locked version of ovl_nlink_start() and ovl_nlink_end()
  2018-03-29 19:38 ` [PATCH v13 16/28] ovl: Create locked version of ovl_nlink_start() and ovl_nlink_end() Vivek Goyal
@ 2018-03-30  6:28   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  6:28 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Soon in ovl_rename() I will introduce a function to take lock both on old
> and new ovl_inode. That means I don't want ovl_nlink_start() to take lock
> and ovl_nlink_end() to drop lock. It will be done by other functions.
>
> So create ovl_nlink_start_locked() and ovl_nlink_end_locked() which assume
> that by the time they are called, lock has already been taken and these
> don't deal with taking/releasing locks.
>
> These will be used in following patches.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/overlayfs/overlayfs.h |  2 ++
>  fs/overlayfs/util.c      | 50 +++++++++++++++++++++++++++++-------------------
>  2 files changed, 32 insertions(+), 20 deletions(-)
>
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index aa5b0c121fc7..429713653b3b 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -273,7 +273,9 @@ void ovl_inuse_unlock(struct dentry *dentry);
>  bool ovl_need_index(struct dentry *dentry);
>  int ovl_nlink_prep(struct dentry *dentry);
>  int ovl_nlink_start(struct dentry *dentry, bool *locked);
> +int ovl_nlink_start_locked(struct dentry *dentry);
>  void ovl_nlink_end(struct dentry *dentry, bool locked);
> +void ovl_nlink_end_locked(struct dentry *dentry);
>  int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
>  int ovl_check_metacopy_xattr(struct dentry *dentry);
>  bool ovl_is_metacopy_dentry(struct dentry *dentry);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index ed93e233894f..927960aa57ee 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -709,21 +709,13 @@ int ovl_nlink_prep(struct dentry *dentry)
>   * Operations that change overlay inode and upper inode nlink need to be
>   * synchronized with copy up for persistent nlink accounting.
>   */
> -int ovl_nlink_start(struct dentry *dentry, bool *locked)
> +int ovl_nlink_start_locked(struct dentry *dentry)
>  {
> -       struct ovl_inode *oi = OVL_I(d_inode(dentry));
>         const struct cred *old_cred;
>         int err;
>
> -       if (!d_inode(dentry))
> -               return 0;
> -
> -       err = mutex_lock_interruptible(&oi->lock);
> -       if (err)
> -               return err;
> -
>         if (d_is_dir(dentry) || !ovl_test_flag(OVL_INDEX, d_inode(dentry)))
> -               goto out;
> +               return 0;
>
>         old_cred = ovl_override_creds(dentry->d_sb);
>         /*
> @@ -734,8 +726,22 @@ int ovl_nlink_start(struct dentry *dentry, bool *locked)
>          */
>         err = ovl_set_nlink_upper(dentry);
>         revert_creds(old_cred);
> +       return err;
> +}
>
> -out:
> +int ovl_nlink_start(struct dentry *dentry, bool *locked)
> +{
> +       struct ovl_inode *oi = OVL_I(d_inode(dentry));
> +       int err;
> +
> +       if (!d_inode(dentry))
> +               return 0;
> +
> +       err = mutex_lock_interruptible(&oi->lock);
> +       if (err)
> +               return err;
> +
> +       err = ovl_nlink_start_locked(dentry);
>         if (err)
>                 mutex_unlock(&oi->lock);
>         else
> @@ -744,18 +750,22 @@ int ovl_nlink_start(struct dentry *dentry, bool *locked)
>         return err;
>  }
>
> -void ovl_nlink_end(struct dentry *dentry, bool locked)
> +void ovl_nlink_end_locked(struct dentry *dentry)
>  {
> -       if (locked) {
> -               if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) &&
> -                   d_inode(dentry)->i_nlink == 0) {
> -                       const struct cred *old_cred;
> +       if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) &&
> +           d_inode(dentry)->i_nlink == 0) {
> +               const struct cred *old_cred;
>
> -                       old_cred = ovl_override_creds(dentry->d_sb);
> -                       ovl_cleanup_index(dentry);
> -                       revert_creds(old_cred);
> -               }
> +               old_cred = ovl_override_creds(dentry->d_sb);
> +               ovl_cleanup_index(dentry);
> +               revert_creds(old_cred);
> +       }
> +}
>
> +void ovl_nlink_end(struct dentry *dentry, bool locked)
> +{
> +       if (locked) {
> +               ovl_nlink_end_locked(dentry);
>                 mutex_unlock(&OVL_I(d_inode(dentry))->lock);
>         }
>  }
> --
> 2.13.6
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 17/28] ovl: During rename lock both source and target ovl_inode
  2018-03-29 19:38 ` [PATCH v13 17/28] ovl: During rename lock both source and target ovl_inode Vivek Goyal
@ 2018-03-30  6:50   ` Amir Goldstein
  2018-04-02 17:34     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  6:50 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> In some cases we need to take both source (old) and target(new) dentry
> ovl_inode->lock. This patch adds support for that. Locks are taken in
> order of increasing inode address to avoid deadlock. This code has been
> taken from lock_two_nondirectories().
>
> As of now, metacopy needs this lock if we are planning to update redirect
> information on source/target ovl_inode. nlink related accounting takes this
> lock on target for the case of overwrite.
>
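
The address-ordering rule described in the commit message can be sketched
in userspace as follows. This is an illustration only: struct lock_demo is
a mock lock that records acquisition order, not a kernel mutex, and all
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mock lock: records whether it is held and when it was acquired. */
struct lock_demo {
	int held;
	int seq;	/* acquisition order, for illustration only */
};

static int next_seq;

static void lock_demo_acquire(struct lock_demo *l)
{
	assert(!l->held);
	l->held = 1;
	l->seq = ++next_seq;
}

/*
 * Take both locks in order of increasing address, so that two callers
 * passing the same pair in opposite order cannot deadlock.
 */
static void lock_two_demo(struct lock_demo *a, struct lock_demo *b)
{
	if (a == b) {
		lock_demo_acquire(a);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct lock_demo *tmp = a;

		a = b;
		b = tmp;
	}
	lock_demo_acquire(a);
	lock_demo_acquire(b);
}

static void unlock_two_demo(struct lock_demo *a, struct lock_demo *b)
{
	a->held = 0;
	b->held = 0;
}
```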

This is overly complicated.

Here is how this could be simplified.

Setting redirects does NOT need to happen under all the complicated rename
locks. It can happen before everything else, because it is harmless to set
a redirect and then not rename afterwards.

I would split out a helper, ovl_rename_prep(), which can also take care of
copy ups and check ovl_can_rename(), and then set redirect if needed on old
and/or new, locking them one at a time.
You can take the ovl_inode->lock inside ovl_set_redirect(), and you can also
override credentials or do whatever else is needed as a result of moving
set redirect earlier.
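
A rough userspace sketch of the locking shape proposed here, assuming the
lock is taken and dropped inside the set-redirect step so that at most one
inode lock is ever held. The mock lock, the counters and the redirect
strings are all hypothetical and exist only to make the property visible:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of an ovl_inode with its lock and redirect field. */
struct inode_demo {
	int lock_held;
	const char *redirect;
};

static int locks_held_now;
static int max_locks_held;

static void inode_lock_demo(struct inode_demo *inode)
{
	assert(!inode->lock_held);
	inode->lock_held = 1;
	if (++locks_held_now > max_locks_held)
		max_locks_held = locks_held_now;
}

static void inode_unlock_demo(struct inode_demo *inode)
{
	inode->lock_held = 0;
	locks_held_now--;
}

/* The lock lives entirely inside the set-redirect step. */
static int set_redirect_demo(struct inode_demo *inode, const char *redirect)
{
	inode_lock_demo(inode);
	inode->redirect = redirect;
	inode_unlock_demo(inode);
	return 0;
}

/* Runs before any rename locks: old first, then new, one at a time. */
static int rename_prep_demo(struct inode_demo *old_inode,
			    struct inode_demo *new_inode)
{
	int err = set_redirect_demo(old_inode, "/old/path");

	if (err)
		return err;
	if (new_inode)
		err = set_redirect_demo(new_inode, "/new/path");
	return err;
}
```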

You probably don't need all the nlink_start re-factoring with this change.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 19/28] ovl: Treat metacopy dentries as type OVL_PATH_MERGE
  2018-03-29 19:38 ` [PATCH v13 19/28] ovl: Treat metacopy dentries as type OVL_PATH_MERGE Vivek Goyal
@ 2018-03-30  6:52   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  6:52 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Right now OVL_PATH_MERGE is used only for merged directories.
> But conceptually, a metacopy dentry (backed by a lower data dentry) is
> a merged entity as well.
>
> So mark metacopy dentries as OVL_PATH_MERGE and ovl_rename() makes use
> of this property later to set redirect on a metacopy file.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/overlayfs/util.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 927960aa57ee..29f7336ade88 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -118,7 +118,7 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
>                  */
>                 if (oe->numlower) {
>                         type |= __OVL_PATH_ORIGIN;
> -                       if (d_is_dir(dentry))
> +                       if (d_is_dir(dentry) || !ovl_has_upperdata(dentry))
>                                 type |= __OVL_PATH_MERGE;
>                 }
>         } else {
> --
> 2.13.6
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 22/28] ovl: Set redirect on upper inode when it is linked
  2018-03-29 19:38 ` [PATCH v13 22/28] ovl: Set redirect on upper inode when it is linked Vivek Goyal
@ 2018-03-30  7:04   ` Amir Goldstein
  2018-04-11 15:59     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  7:04 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> When we create a hardlink to a metacopy upper file, first set the redirect
> on that inode. Path based lookup will not work with newly created link
> and redirect will solve that issue.
>
> Also use absolute redirect as two hardlinks could be in different directories
> and relative redirect will not work.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/dir.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 7c0a02d9f6bd..ccbe061fc4ba 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -24,6 +24,8 @@ module_param_named(redirect_max, ovl_redirect_max, ushort, 0644);
>  MODULE_PARM_DESC(ovl_redirect_max,
>                  "Maximum length of absolute redirect xattr value");
>
> +static int ovl_set_redirect(struct dentry *dentry, bool samedir);
> +
>  int ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
>  {
>         int err;
> @@ -468,6 +470,9 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
>         const struct cred *old_cred;
>         struct cred *override_cred;
>         struct dentry *parent = dentry->d_parent;
> +       struct dentry *hardlink_upper;
> +
> +       hardlink_upper = hardlink ? ovl_dentry_upper(hardlink) : NULL;
>
>         err = ovl_copy_up(parent);
>         if (err)
> @@ -502,12 +507,18 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
>                 put_cred(override_creds(override_cred));
>                 put_cred(override_cred);
>
> +               if (hardlink && ovl_is_metacopy_dentry(hardlink)) {
> +                       err = ovl_set_redirect(hardlink, false);

Like with rename, redirect could be set much sooner in ovl_link()
if all the locks were contained within ovl_set_redirect().
I think code will be simpler overall, but can't say for sure...


> +                       if (err)
> +                               goto out_revert_creds;
> +               }
> +
>                 if (!ovl_dentry_is_whiteout(dentry))
>                         err = ovl_create_upper(dentry, inode, attr,
> -                                               hardlink);
> +                                              hardlink_upper);
>                 else
>                         err = ovl_create_over_whiteout(dentry, inode, attr,
> -                                                       hardlink);
> +                                                      hardlink_upper);
>         }
>  out_revert_creds:
>         revert_creds(old_cred);
> @@ -606,8 +617,7 @@ static int ovl_link(struct dentry *old, struct inode *newdir,
>         inode = d_inode(old);
>         ihold(inode);
>
> -       err = ovl_create_or_link(new, inode, NULL, ovl_dentry_upper(old),
> -                                ovl_type_origin(old));
> +       err = ovl_create_or_link(new, inode, NULL, old, ovl_type_origin(old));
>         if (err)
>                 iput(inode);
>
> --
> 2.13.6
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename
  2018-03-29 19:38 ` [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename Vivek Goyal
@ 2018-03-30  7:31   ` Amir Goldstein
  2018-04-11 15:12     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  7:31 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Set redirect on metacopy files upon rename. This will help find data dentry
> in lower dirs.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/dir.c | 50 +++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 37 insertions(+), 13 deletions(-)
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 3ea052b6bac7..7c0a02d9f6bd 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -968,6 +968,27 @@ static void ovl_rename_unlock_ovl_inodes(struct dentry *old, struct dentry *new,
>                 mutex_unlock(&OVL_I(d_inode(new))->lock);
>  }
>
> +static bool ovl_relative_redirect(struct dentry *dentry, bool samedir)
> +{
> +       if (d_is_dir(dentry))
> +               return samedir;
> +
> +       /*
> +        * For non-dir hardlinked files, we need absolute redirects
> +        * in general as two upper hardlinks could be in different
> +        * dirs. We could put a relative redirect now and convert
> +        * it to absolute redirect later. But when nlink > 1 and
> +        * indexing is on, that means relative redirect needs to be
> +        * converted to absolute during copy up of another lower
> +        * hardlink as well.
> +        *
> +        * So without optimizing too much, just check if non-dir
> +        * has nlink > 1 or not. If yes, set absolute redirect to
> +        * begin with.
> +        */
> +       return (d_inode(dentry)->i_nlink > 1 ? false : samedir);

I don't think this is wrong, but I don't like relying on the overlay inode
nlink. I wonder if you should separate the case of indexed and
non-indexed inode.

> +}
> +
>  static int ovl_rename(struct inode *olddir, struct dentry *old,
>                       struct inode *newdir, struct dentry *new,
>                       unsigned int flags)
> @@ -1131,22 +1152,25 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
>                 goto out_dput;
>
>         err = 0;
> -       if (is_dir) {
> -               if (ovl_type_merge_or_lower(old))
> -                       err = ovl_set_redirect(old, samedir);
> -               else if (!old_opaque && ovl_type_merge(new->d_parent))
> -                       err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
> -               if (err)
> -                       goto out_dput;
> -       }
> -       if (!overwrite && new_is_dir) {
> +       if (ovl_type_merge_or_lower(old))
> +               err = ovl_set_redirect(old,
> +                                      ovl_relative_redirect(old, samedir));
> +       else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
> +               err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
> +
> +       if (err)
> +               goto out_dput;
> +
> +       if (!overwrite) {
>                 if (ovl_type_merge_or_lower(new))
> -                       err = ovl_set_redirect(new, samedir);
> -               else if (!new_opaque && ovl_type_merge(old->d_parent))
> +                       err = ovl_set_redirect(new, ovl_relative_redirect(new,
> +                                              samedir));
> +               else if (new_is_dir && !new_opaque &&
> +                        ovl_type_merge(old->d_parent))
>                         err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
> -               if (err)
> -                       goto out_dput;
>         }
> +       if (err)
> +               goto out_dput;
>
>         err = ovl_do_rename(old_upperdir->d_inode, olddentry,
>                             new_upperdir->d_inode, newdentry, flags);
> --
> 2.13.6
>
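
For reference, the decision made by ovl_relative_redirect() in this patch
can be restated as a small userspace function. This sketch only mirrors the
logic as posted; it does not attempt the indexed/non-indexed split raised
in the review, and the function and parameter names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Directories get a relative redirect whenever the rename stays in the same
 * directory. Non-directories additionally need nlink == 1, because other
 * hardlinks may live in different directories.
 */
static bool relative_redirect_demo(bool is_dir, unsigned int nlink,
				   bool samedir)
{
	if (is_dir)
		return samedir;

	return nlink > 1 ? false : samedir;
}

/* Walks the cases that matter; aborts if any expectation is violated. */
static void relative_redirect_selfcheck(void)
{
	assert(relative_redirect_demo(true, 1, true));    /* dir, same dir */
	assert(!relative_redirect_demo(true, 1, false));  /* dir, cross dir */
	assert(relative_redirect_demo(false, 1, true));   /* file, nlink 1 */
	assert(!relative_redirect_demo(false, 2, true));  /* hardlinked file */
	assert(!relative_redirect_demo(false, 1, false)); /* file, cross dir */
}
```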

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 25/28] ovl: Use out_err insteada of out_nomem
  2018-03-29 19:38 ` [PATCH v13 25/28] ovl: Use out_err insteada of out_nomem Vivek Goyal
@ 2018-03-30  7:35   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  7:35 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Right now we use goto out_nomem which assumes error code is -ENOMEM. But
> there are other errors returned like -ESTALE as well. So instead of out_nomem,
> use out_err which will do ERR_PTR(err). That way one can put error code
> in err and jump to out_err.
>
> This is just code reorganization and no change of functionality.
>
> I am about to add more code and this organization helps laying more code
> and error paths on top of it.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/overlayfs/inode.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index cd03f3e642fd..3dccfa1ee123 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -693,6 +693,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         struct inode *inode;
>         bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
>         bool is_dir, metacopy = false;
> +       int err = -ENOMEM;
>
>         if (!realinode)
>                 realinode = d_inode(lowerdentry);
> @@ -710,7 +711,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>                 inode = iget5_locked(sb, (unsigned long) key,
>                                      ovl_inode_test, ovl_inode_set, key);
>                 if (!inode)
> -                       goto out_nomem;
> +                       goto out_err;
>                 if (!(inode->i_state & I_NEW)) {
>                         /*
>                          * Verify that the underlying files stored in the inode
> @@ -719,8 +720,8 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>                         if (!ovl_verify_inode(inode, lowerdentry, upperdentry,
>                                               true)) {
>                                 iput(inode);
> -                               inode = ERR_PTR(-ESTALE);
> -                               goto out;
> +                               err = -ESTALE;
> +                               goto out_err;
>                         }
>
>                         dput(upperdentry);
> @@ -735,8 +736,10 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         } else {
>                 /* Lower hardlink that will be broken on copy up */
>                 inode = new_inode(sb);
> -               if (!inode)
> -                       goto out_nomem;
> +               if (!inode) {
> +                       err = -ENOMEM;

I reckon Miklos prefers setting the err value before the condition.
Same with ESTALE above.

> +                       goto out_err;
> +               }
>         }
>         ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
>         ovl_inode_init(inode, upperdentry, lowerdentry);
> @@ -766,7 +769,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>  out:
>         return inode;
>
> -out_nomem:
> -       inode = ERR_PTR(-ENOMEM);
> +out_err:
> +       inode = ERR_PTR(err);
>         goto out;
>  }
> --
> 2.13.6
>
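
The style preference mentioned in the review, assigning err before the test
and not inside the failure branch, might look like this in a userspace
sketch. The allocator callbacks and get_demo() are hypothetical stand-ins
used only to exercise both error paths:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int object_demo;

/* Hypothetical allocators: one that succeeds and one that fails. */
static void *alloc_ok_demo(void)
{
	return &object_demo;
}

static void *alloc_fail_demo(void)
{
	return NULL;
}

/*
 * err is set ahead of each check, so every failure branch is a bare goto
 * and the single out_err label reports whatever err currently holds.
 */
static int get_demo(void *(*alloc)(void), int stale, void **out)
{
	void *obj;
	int err;

	assert(alloc && out);

	err = -ENOMEM;
	obj = alloc();
	if (!obj)
		goto out_err;

	err = -ESTALE;
	if (stale)
		goto out_err;

	*out = obj;
	return 0;

out_err:
	*out = NULL;
	return err;
}
```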

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 24/28] ovl: Do not error if REDIRECT XATTR is missing
  2018-03-29 19:38 ` [PATCH v13 24/28] ovl: Do not error if REDIRECT XATTR is missing Vivek Goyal
@ 2018-03-30  7:41   ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  7:41 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Currently we first call vfs_getxattr() to get size of REDIRECT xattr and
> then make another call vfs_getxattr() to read the xattr in buffer.
>
> This is all part of ovl_lookup() and we do not have ovl_inode->lock. That
> means it is possible that inode got copied up on another cpu and is not
> a metacopy inode anymore. And that also means REDIRECT xattr got removed.
> And that means that while first call to vfs_getxattr() succeeded, second
> call can fail with -ENODATA.
>
> Do not error out if -ENODATA is received from second call. With metacopy
> enabled, it is not an error case anymore.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
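
The two-call pattern and the race it is exposed to can be modelled in
userspace. In this sketch the getter callback stands in for vfs_getxattr(),
and racing_getter_demo() simulates the xattr vanishing between the size
query and the read. All names are hypothetical; the point is only that
-ENODATA on the second call is treated as "no redirect", not as a failure:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for vfs_getxattr(): returns the value size or a negative errno. */
typedef long (*getxattr_fn)(void *ctx, char *buf, size_t size);

/* Test double: the xattr exists and never changes. */
static long stable_getter_demo(void *ctx, char *buf, size_t size)
{
	static const char val[] = "/a/b";

	(void)ctx;
	if (size == 0)
		return 4;
	memcpy(buf, val, 4);
	return 4;
}

/* Test double: the xattr is removed after the size query. */
static long racing_getter_demo(void *ctx, char *buf, size_t size)
{
	int *calls = ctx;

	(void)buf;
	(void)size;
	return (*calls)++ == 0 ? 5 : -ENODATA;
}

/*
 * Returns 0 with *out set to a malloc'ed string, 0 with *out == NULL when
 * the xattr is absent or disappeared in between, or a negative errno.
 */
static int read_redirect_demo(getxattr_fn get, void *ctx, char **out)
{
	char *buf;
	long res;

	*out = NULL;
	res = get(ctx, NULL, 0);	/* first call: size query */
	if (res == -ENODATA)
		return 0;
	if (res < 0)
		return (int)res;

	buf = calloc((size_t)res + 1, 1);
	if (!buf)
		return -ENOMEM;

	res = get(ctx, buf, (size_t)res);	/* second call: read value */
	if (res < 0) {
		free(buf);
		/* Removed by a parallel data copy up: not an error. */
		return res == -ENODATA ? 0 : (int)res;
	}
	buf[res] = '\0';
	*out = buf;
	return 0;
}
```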

> ---
>  fs/overlayfs/namei.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index a7a9588f64b7..a97666245726 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -47,8 +47,15 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
>                 goto invalid;
>
>         res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, buf, res);
> -       if (res < 0)
> +       if (res < 0) {
> +               /*
> +                * Redirect xattr can be removed if a parallel data
> +                * copy up took place.
> +                */
> +               if (res == -ENODATA)
> +                       goto err_free;
>                 goto fail;
> +       }
>         if (res == 0)
>                 goto invalid;
>         if (buf[0] == '/') {
> --
> 2.13.6
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization
  2018-03-29 19:38 ` [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization Vivek Goyal
@ 2018-03-30  8:56   ` Amir Goldstein
  2018-04-02 19:35     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  8:56 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> So far redirect could be placed on directories only and now it can be
> placed on regular files as well. Also it could be completely removed
> when a metacopy copy up file's data is copied up. That means if a redirect
> is present during ovl_lookup(), it could be gone by the time ovl_get_inode()
> happens.
>

There is a bit of a mess in the assumptions.

If the inode is pure upper or indexed origin, then the alleged race ends up
in !(inode->i_state & I_NEW) and you discard redirect anyway.

If the inode is non-indexed copyup, then it is a different inode on disk
and different struct ovl_inode in memory than the inode of the copy up
we are allegedly racing with (they are broken hardlinks), so there is no
issue.

> Or it is possible that ovl_lookup() does not see a redirect and a rename
> is taking place on a hard link and that places a redirect. And by the
> time ovl_lookup() calls ovl_get_inode(), it sets ovl_inode->redirect = NULL
> (Assume inode got flushed out of cache and was allocated new).

Same as above.

I am not saying there are no races between lookup and rename/link,
but IMO the text above does not describe them or prove that they exist.

>
> IOW, because we check and process redirect without locks in ovl_lookup(),
> many possibilities open up for regular files. So for such cases, do not
> use the redirect provided by the caller. Instead query it and install
> in ovl_inode->redirect.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/inode.c     | 19 ++++++++++++++++++-
>  fs/overlayfs/overlayfs.h |  1 +
>  fs/overlayfs/util.c      | 42 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 3dccfa1ee123..6a0c85699024 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -694,6 +694,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
>         bool is_dir, metacopy = false;
>         int err = -ENOMEM;
> +       char *new_redirect = NULL;
>
>         if (!realinode)
>                 realinode = d_inode(lowerdentry);
> @@ -754,7 +755,18 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>         if (upperdentry && !metacopy)
>                 ovl_set_flag(OVL_UPPERDATA, inode);
>
> -       OVL_I(inode)->redirect = redirect;
> +       if (!metacopy) {
> +               OVL_I(inode)->redirect = redirect;
> +               redirect = NULL;
> +       } else if (upperdentry) {
> +               new_redirect = ovl_get_redirect_xattr(upperdentry);
> +               if (IS_ERR(new_redirect)) {
> +                       err = PTR_ERR(new_redirect);
> +                       goto out_err_inode;
> +               }
> +               OVL_I(inode)->redirect = new_redirect;
> +               new_redirect = NULL;
> +       }
>
>         /* Check for non-merge dir that may have whiteouts */
>         if (is_dir) {
> @@ -764,11 +776,16 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
>                 }
>         }
>
> +       kfree(redirect);
>         if (inode->i_state & I_NEW)
>                 unlock_new_inode(inode);
>  out:
>         return inode;
>
> +out_err_inode:
> +       if (inode->i_state & I_NEW)
> +               unlock_new_inode(inode);
> +       iput(inode);
>  out_err:
>         inode = ERR_PTR(err);
>         goto out;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 429713653b3b..a3bee7619fbb 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -279,6 +279,7 @@ void ovl_nlink_end_locked(struct dentry *dentry);
>  int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
>  int ovl_check_metacopy_xattr(struct dentry *dentry);
>  bool ovl_is_metacopy_dentry(struct dentry *dentry);
> +char *ovl_get_redirect_xattr(struct dentry *dentry);
>
>  static inline bool ovl_is_impuredir(struct dentry *dentry)
>  {
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 961d65bd25c9..3d090b6f9fc2 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -833,3 +833,45 @@ bool ovl_is_metacopy_dentry(struct dentry *dentry)
>
>         return (oe->numlower > 1);
>  }
> +
> +char *ovl_get_redirect_xattr(struct dentry *dentry)
> +{
> +       int res;
> +       char *s, *next, *buf = NULL;
> +
> +       res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, NULL, 0);
> +       if (res < 0) {
> +               if (res == -ENODATA || res == -EOPNOTSUPP)
> +                       return NULL;
> +               return ERR_PTR(res);
> +       }
> +
> +       buf = kzalloc(res + 1, GFP_KERNEL);
> +       if (!buf)
> +               return ERR_PTR(-ENOMEM);
> +
> +       res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, buf, res);
> +       if (res < 0) {
> +               kfree(buf);
> +               return ERR_PTR(res);
> +        }
> +       if (res == 0)
> +               goto invalid;
> +
> +       if (buf[0] == '/') {
> +               for (s = buf; *s++ == '/'; s = next) {
> +                       next = strchrnul(s, '/');
> +                       if (s == next)
> +                               goto invalid;
> +               }
> +       } else {
> +               if (strchr(buf, '/') != NULL)
> +                       goto invalid;
> +       }
> +
> +       return buf;
> +invalid:
> +       pr_warn_ratelimited("overlayfs: invalid redirect (%s)\n", buf);
> +       kfree(buf);
> +       return ERR_PTR(-EINVAL);
> +}
> --
> 2.13.6
>

If you really end up needing this helper, you should use it from lookup as well.
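
For reference, the validation rules that the quoted helper applies to a
redirect value can be sketched in plain userspace C. This is only an
illustration under assumptions: redirect_is_valid() and next_slash() are
hypothetical names, not overlayfs functions, and the sketch leaves out the
xattr read, the allocation and the error reporting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's strchrnul(): first '/' or the NUL. */
static const char *next_slash(const char *s)
{
	while (*s && *s != '/')
		s++;
	return s;
}

/*
 * Hypothetical userspace mirror of the checks in the quoted helper:
 *  - an empty value is invalid;
 *  - an absolute redirect ("/a/b") must not contain empty components,
 *    so "/", "//a" and "/a/" are rejected;
 *  - a relative redirect must be a single name without any '/'.
 */
static bool redirect_is_valid(const char *buf)
{
	const char *s, *next;

	if (buf[0] == '\0')
		return false;

	if (buf[0] != '/')
		return strchr(buf, '/') == NULL;

	for (s = buf; *s == '/'; s = next) {
		s++;			/* skip the separator */
		next = next_slash(s);
		if (s == next)		/* empty path component */
			return false;
	}
	return true;
}
```

Having one such predicate shared by lookup and inode initialization would keep
both paths agreeing on what counts as a valid redirect.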

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  2018-03-30  5:49   ` Amir Goldstein
@ 2018-03-30  9:12     ` Amir Goldstein
  2018-04-02 19:45       ` Vivek Goyal
  2018-04-02 15:06     ` Vivek Goyal
  1 sibling, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  9:12 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 8:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
>> It also allows for presence of metacopy dentries in lower layer.
>>
>> During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
>> set OVL_UPPERDATA bit in flags.
>>
>> We don't support metacopy feature with nfs_export. So in nfs_export code,
>> we set OVL_UPPERDATA flag set unconditionally if upper inode exists.
>>
>> Do not follow metacopy origin if we find a metacopy only inode and metacopy
>> feature is not enabled for that mount. Like redirect, this can have security
>> implications where an attacker could hand craft upper and try to gain
>> access to file on lower which it should not have to begin with.
>>
>> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> ---
[...]

>> @@ -917,16 +929,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>>                  * When "verify_lower" feature is enabled, do not merge with a
>>                  * lower dir that does not match a stored origin xattr. In any
>>                  * case, only verified origin is used for index lookup.
>> +                *
>> +                * For non-dir dentry, make sure dentry found by lookup
>> +                * matches the origin stored in upper. Otherwise its an
>> +                * error.
>>                  */
>> -               if (upperdentry && !ctr && ovl_verify_lower(dentry->d_sb)) {
>> +               if (upperdentry && !ctr &&
>> +                   ((d.is_dir && ovl_verify_lower(dentry->d_sb)) ||
>> +                    (!d.is_dir && origin_path))) {
>>                         err = ovl_verify_origin(upperdentry, this, false);
>>                         if (err) {
>>                                 dput(this);
>> -                               break;
>> +                               if (d.is_dir)
>> +                                       break;
>> +                               goto out_put;
>>                         }
>> -
>>                         /* Bless lower dir as verified origin */
>> -                       origin = this;
>> +                       if (d.is_dir)
>> +                               origin = this;
>
> It's ok to bless verified non-dir as well.
> It is going to be blessed anyway, just above index lookup if ctr > 0.

>
>> +               }
>> +
>> +               if (d.metacopy)
>> +                       metacopy = true;
>> +               /*
>> +                * Do not store intermediate metacopy dentries in chain,
>> +                * except top most lower metacopy dentry
>> +                */
>> +               if (d.metacopy && ctr) {
>> +                       dput(this);
>> +                       continue;
>>                 }
>>
>>                 stack[ctr].dentry = this;
>> @@ -960,6 +991,34 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>>                 }
>>         }
>>
>> +       if (metacopy) {
>> +               BUG_ON(d.is_dir);
>
> Yeh, I think that is really a bug, because you need to detect
> the case of dir in lower layer under metacopy in upper layer
> and do something about it.
>
>> +               /*
>> +                * Found a metacopy dentry but did not find corresponding
>> +                * data dentry
>> +                */
>> +               if (d.metacopy) {
>> +                       err = -ESTALE;
>> +                       goto out_put;
>> +               }
>> +
>> +               err = -EPERM;
>> +               if (!ofs->config.metacopy) {
>> +                       pr_warn_ratelimited("overlay: refusing to follow"
>> +                                           " metacopy origin for (%pd2)\n",
>> +                                           dentry);
>> +                       goto out_put;
>> +               }

In the (!origin_path) case, lower was followed by name/redirect and not
verified by origin, so we should not look up the index of a non-dir below.
Right now non-dir index entries assume that the origin xattr exists
and matches the entry name. We may be able to relax that in
the future using non-dir redirect, but we are not there yet.


>> +       } else if (!d.is_dir && upperdentry && !ctr && origin_path) {
>> +               if (WARN_ON(stack != NULL)) {
>> +                       err = -EIO;
>> +                       goto out_put;
>> +               }
>> +               stack = origin_path;
>> +               ctr = 1;
>> +               origin_path = NULL;
>> +       }
>> +
>>         /*
>>          * Lookup index by lower inode and verify it matches upper inode.
>>          * We only trust dir index if we verified that lower dir matches
         * origin, otherwise dir index entries may be inconsistent and we
         * ignore them. Always lookup index of non-dir and non-upper.
         */
        if (ctr && (!upperdentry || !d.is_dir))
                origin = stack[0].dentry;

So this condition needs to be fixed.
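
One possible reading of the fix, sketched as a plain C predicate. The names
struct lookup_state, origin_verified and should_use_index_origin() are
hypothetical and are not taken from the patch. The only assumption is that
lookup records whether the lower of a non-dir upper was verified by origin
rather than merely followed by name/redirect.

```c
#include <stdbool.h>

/* Hypothetical summary of the lookup state at the point of the check. */
struct lookup_state {
	unsigned int ctr;	/* number of lower dentries found */
	bool has_upper;		/* upperdentry != NULL */
	bool is_dir;		/* d.is_dir */
	bool origin_verified;	/* lower matched the origin xattr of upper */
};

/*
 * Decide whether stack[0] may be used as the origin for index lookup.
 * Non-upper entries always qualify. A non-dir upper qualifies only if its
 * lower was verified by origin. A dir upper is not handled here, because
 * its origin is blessed earlier, in the verify_lower path.
 */
static bool should_use_index_origin(const struct lookup_state *st)
{
	if (!st->ctr)
		return false;
	if (!st->has_upper)
		return true;
	if (st->is_dir)
		return false;
	return st->origin_verified;
}
```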

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower
  2018-03-29 19:38 ` [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
@ 2018-03-30  9:24   ` Amir Goldstein
  2018-04-02 20:11     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  9:24 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> If an inode has been copied up metadata only, then we need to query the
> number of blocks from lower and fill up the stat->st_blocks.
>
> We need to be careful about races where we are doing stat on one cpu and
> data copy up is taking place on other cpu. We want to return
> stat->st_blocks either from lower or stable upper and not something in
> between. Hence, ovl_has_upperdata() is called first to figure out whether
> block reporting will take place from lower or upper.
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/inode.c     | 17 ++++++++++++++++-
>  fs/overlayfs/overlayfs.h |  1 +
>  fs/overlayfs/util.c      | 16 ++++++++++++++++
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index e1dbfed0c449..cd03f3e642fd 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -76,6 +76,9 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>         bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
>         bool samefs = ovl_same_sb(dentry->d_sb);
>         int err;
> +       bool metacopy = false;
> +
> +       metacopy = ovl_is_metacopy_dentry(dentry);
>
>         type = ovl_path_real(dentry, &realpath);
>         old_cred = ovl_override_creds(dentry->d_sb);
> @@ -93,7 +96,8 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>         if (!is_dir || samefs) {
>                 if (OVL_TYPE_ORIGIN(type)) {
>                         struct kstat lowerstat;
> -                       u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
> +                       u32 lowermask = STATX_INO | STATX_BLOCKS |
> +                                       (!is_dir ? STATX_NLINK : 0);

Leftover.

But I think you could do with one vfs_getattr():

         if (!is_dir || samefs) {
                   u32 lowermask = metacopy ? STATX_BLOCKS : 0;
                   if (OVL_TYPE_ORIGIN(type))
                           lowermask |= STATX_INO | (!is_dir ? STATX_NLINK : 0);
                   if (lowermask) {
                           ovl_path_lower(dentry, &realpath);
                           err = vfs_getattr(&realpath, &lowerstat,

...
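
The mask computation suggested above can be tried out in isolation. The
following is a userspace sketch, not kernel code: lower_request_mask() and the
SK_STATX_* names are hypothetical, and the bit values are repeated from the
STATX_* definitions in <linux/stat.h> so that the sketch builds without kernel
headers. A result of 0 means no getattr on lower is needed at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as STATX_NLINK, STATX_INO and STATX_BLOCKS in <linux/stat.h>. */
#define SK_STATX_NLINK	0x00000004U
#define SK_STATX_INO	0x00000100U
#define SK_STATX_BLOCKS	0x00000400U

/*
 * Build the request mask for a single getattr on the lower path:
 * blocks are needed only for a metacopy inode, ino (and nlink for
 * non-dirs) only when the dentry has a copy up origin.
 */
static uint32_t lower_request_mask(bool metacopy, bool has_origin, bool is_dir)
{
	uint32_t lowermask = metacopy ? SK_STATX_BLOCKS : 0;

	if (has_origin)
		lowermask |= SK_STATX_INO | (!is_dir ? SK_STATX_NLINK : 0);

	return lowermask;
}
```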

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-03-29 19:38 ` [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks Vivek Goyal
@ 2018-03-30  9:54   ` Amir Goldstein
  2018-04-10 14:00     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30  9:54 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far
> this decision was easy that we had to check only for oe->numlower
> and if it is non-zero, we knew there is copy up origin. (For non-dir
> we installed origin dentry in lowerstack[0]).
>
> But we don't create ORIGIN xattr for broken hardlinks (index=off). And
> with metacopy feature it is possible that we will still install
> lowerstack[0]. But that's lower data dentry of metacopy upper of broken
> hardlink and ORIGIN xattr is not set.
>
> So to differentiate between the two cases, do not set OVL_PATH_ORIGIN if
> we have a broken hardlink.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/util.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 29f7336ade88..961d65bd25c9 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -117,7 +117,14 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
>                  * Non-dir dentry can hold lower dentry of its copy up origin.

This comment needs updating with metacopy.

>                  */
>                 if (oe->numlower) {
> -                       type |= __OVL_PATH_ORIGIN;
> +                       /*
> +                        * ORIGIN is created for everyting except broken
> +                        * hardlinks
> +                        */
> +                       if (!(d_inode(dentry)->i_nlink > 1 &&
> +                           !ovl_test_flag(OVL_INDEX, d_inode(dentry))))
> +                               type |= __OVL_PATH_ORIGIN;
> +

I don't like relying on overlay nlink. It is not reliable.
And I think you missed the directory case.
The information you need was available during lookup
and we did not keep it (was lower verified by origin fh).
Most likely what we need to do is store an OVL_ORIGIN inode flag
during lookup and then use ovl_test_flag(OVL_ORIGIN)
in place of OVL_TYPE_ORIGIN(type).

If I am not mistaken, you could set flag OVL_ORIGIN in
ovl_get_inode() IFF (upperdentry && bylower), but to be honest
the rules became so complicated that I can't say for sure.
At least I concentrated all the rules in one helper ovl_hash_bylower(),
so I hope that helps.
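
A rough userspace sketch of the flag-based approach, only to make the idea
concrete. Everything here is hypothetical: struct sk_inode, SK_OVL_ORIGIN,
sk_init_origin_flag() and sk_has_origin() stand in for the ovl_inode flags and
are not existing overlayfs names, and the rule (has_upper && bylower) is the
tentative one from above, not a verified one.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for a new OVL_ORIGIN inode flag. */
enum { SK_OVL_ORIGIN = 1 << 0 };

/* Minimal stand-in for the overlay inode; only the flags word matters here. */
struct sk_inode {
	unsigned long flags;
};

/*
 * Record at inode initialization time whether the inode has a verified
 * copy up origin, using the information that lookup already has.
 */
static void sk_init_origin_flag(struct sk_inode *inode, bool has_upper,
				bool bylower)
{
	if (has_upper && bylower)
		inode->flags |= SK_OVL_ORIGIN;
}

/* Later users test the stored flag instead of inferring it from nlink. */
static bool sk_has_origin(const struct sk_inode *inode)
{
	return (inode->flags & SK_OVL_ORIGIN) != 0;
}
```

The point of the design is that the decision is made once, where the lookup
result is known, and later code paths do not need to guess from nlink.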

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 18/28] ovl: Check redirects for metacopy files
  2018-03-29 19:38 ` [PATCH v13 18/28] ovl: Check redirects for metacopy files Vivek Goyal
@ 2018-03-30 10:02   ` Amir Goldstein
  2018-04-02 20:29     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30 10:02 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Right now we rely on path based lookup for data origin of metacopy upper.
> This will work only if upper has not been renamed. We solved this problem
> already for merged directories using redirect. Use same logic for metacopy
> files.
>
> This patch just goes on to check redirects for metacopy files.
>

The patch changes the logic very subtly, but it is really hard to
follow because of convoluted diff. Please make changes that don't
change logic in a separate patch.

> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/namei.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index 1dba89e9543f..a7a9588f64b7 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -260,18 +260,19 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>                         goto out_err;
>                 d->stop = !err;
>                 d->metacopy = !!err;
> -               goto out;
> -       }
> -       if (last_element)
> -               d->is_dir = true;
> -       if (d->last)
> -               goto out;
> -
> -       if (ovl_is_opaquedir(this)) {
> -               d->stop = true;
> +               if (!d->metacopy)
> +                       goto out;
> +       } else {
>                 if (last_element)
> -                       d->opaque = true;
> -               goto out;
> +                       d->is_dir = true;
> +               if (d->last)
> +                       goto out;

If I am not mistaken, the d->last test is relevant to both dir and non-dir,
because it also prevents the unneeded redirect check.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-03-29 19:38 ` [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode Vivek Goyal
@ 2018-03-30 10:53   ` Amir Goldstein
  2018-04-02 12:39     ` Vivek Goyal
  2018-04-04 12:29     ` Vivek Goyal
  0 siblings, 2 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-03-30 10:53 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> If we find a upper metacopy inode, make sure we also found associated data
> dentry in lower. Otherwise copy up operation later will fail.
>
> There are two cases where this can happen. First case is that somehow
> data file was removed from lower layer. Other case is that REDIRECT
> xattr was removed due to copy up of file on another cpu (when inode is
> shared between two dentries) and hence ovl_lookup() could not find the
> correct dentry.
>

Remind me again why we remove REDIRECT xattr?
Is it a must for functionality or just for being boy scouts?
I would prefer to avoid having to deal with races of this sort.
You can cleanup REDIRECT for non-dir that is not metacopy
on lookup when finding a I_NEW inode.


> First case is an error while second case is not error. If file has been
> copied up, then it does not matter if data dentry was found or not.
>
> Redirect removal is protected using ovl_inode->lock and ovl_lookup() does
> not have access to that lock. So to differentiate between these two
> cases, take appropriate inode lock in ovl_get_inode() and make sure a
> data dentry has been found for metacopy inode. Otherwise lookup failed
> and its an error.
>
> For example, say two files are hardlinked, foo.txt and bar.txt. Say foo.txt
> is renamed to foo-renamed.txt and gets copied up metadata only. This will also
> put a redirect "/foo.txt" on the hardlink inode. Now assume foo-renamed.txt
> is opened for write and is undergoing data copy up on one cpu and bar.txt
> is undergoing ovl_lookup() on another cpu. Data copy up path will remove
> REDIRECT and METACOPY xattr. It is possible that METACOPY xattr is
> visible to ovl_lookup() but the REDIRECT xattr was gone by that time.
> That means no data dentry will be found but at the same time now inode
> is not metacopy inode. So data dentry is not required. So this is not
> error case. But if inode was still metacopy but data dentry was not found
> this is error case. (May be due to underlying layer changed). Fix it by
> returning -ESTALE.
>
> If inode was found in cache, then take ovl_inode->lock before checking
> status of inode. If inode has been allocated, then it is returned with
> inode lock anyway and other threads will block on that lock, so no need
> to take ovl_inode->lock.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/export.c    |  3 ++-
>  fs/overlayfs/inode.c     | 49 +++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/overlayfs/namei.c     | 14 ++++----------
>  fs/overlayfs/overlayfs.h |  3 ++-
>  4 files changed, 56 insertions(+), 13 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 1c233096e59c..e8575d4d2c77 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -305,7 +305,8 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>         if (d_is_dir(upper ?: lower))
>                 return ERR_PTR(-EIO);
>
> -       inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL);
> +       inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL,
> +                             false);
>         if (IS_ERR(inode)) {
>                 dput(upper);
>                 return ERR_CAST(inode);
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 6a0c85699024..7e30f4a7cdd9 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -685,9 +685,42 @@ static bool ovl_hash_bylower(struct super_block *sb, struct dentry *upper,
>         return true;
>  }
>
> +static bool ovl_verify_metacopy_data(struct super_block *sb,
> +                                    struct inode *inode, bool metacopydata)
> +{
> +       struct ovl_fs *ofs = sb->s_fs_info;
> +       struct ovl_inode *oi = OVL_I(inode);
> +       bool metacopy = false;
> +
> +       /* A metacopy data dentry was found. So no need to do further checks */
> +       if (metacopydata)
> +               return true;
> +
> +       /*
> +        * Metacopy feature not enabled. No metadata data copy up should take
> +        * place. So no further checks needed.
> +        */
> +       if (!ofs->config.metacopy)
> +               return true;
> +
> +       if (!S_ISREG(inode->i_mode))
> +               return true;
> +
> +       /*
> +        * Metacopy feature is enabled and we have not found metacopy data
> +        * dentry. Make sure this inode is not metacopy inode.
> +        */
> +       mutex_lock(&oi->lock);
> +       metacopy = !ovl_test_flag(OVL_UPPERDATA, inode);
> +       mutex_unlock(&oi->lock);
> +

FYI, the reason you need mutex_lock_interruptible is so that if another CPU
is busy with a large copy up, the process that does lookup() can
be interrupted by the user.
This is not likely to be a problem here, because the race condition suggests that
copy up of data is complete by now, but nevertheless.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-03-30 10:53   ` Amir Goldstein
@ 2018-04-02 12:39     ` Vivek Goyal
  2018-04-04 12:29     ` Vivek Goyal
  1 sibling, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 12:39 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > If we find a upper metacopy inode, make sure we also found associated data
> > dentry in lower. Otherwise copy up operation later will fail.
> >
> > There are two cases where this can happen. First case is that somehow
> > data file was removed from lower layer. Other case is that REDIRECT
> > xattr was removed due to copy up of file on another cpu (when inode is
> > shared between two dentries) and hence ovl_lookup() could not find the
> > correct dentry.
> >
> 
> Remind me again why we remove REDIRECT xattr?
> Is it a must for functionality or just for being boy scouts?

Cleaning REDIRECT is not a must for the functionality. It is just
cleanup, as that redirect is not needed anymore.

> I would prefer to avoid having to deal with races of this sort.
> You can cleanup REDIRECT for non-dir that is not metacopy
> on lookup when finding a I_NEW inode.

Hmm.., ok, that sounds reasonable too. Can look into it.

> 
> 
> > First case is an error while second case is not error. If file has been
> > copied up, then it does not matter if data dentry was found or not.
> >
> > Redirect removal is protected using ovl_inode->lock and ovl_lookup() does
> > not have access to that lock. So to differentiate between these two
> > cases, take appropriate inode lock in ovl_get_inode() and make sure a
> > data dentry has been found for metacopy inode. Otherwise lookup failed
> > and its an error.
> >
> > For example, say two files are hardlinked, foo.txt and bar.txt. Say foo.txt
> > is renamed to foo-renamed.txt and gets copied up metadata only. This will also
> > put a redirect "/foo.txt" on the hardlink inode. Now assume foo-renamed.txt
> > is opened for write and is undergoing data copy up on one cpu and bar.txt
> > is undergoing ovl_lookup() on another cpu. Data copy up path will remove
> > REDIRECT and METACOPY xattr. It is possible that METACOPY xattr is
> > visible to ovl_lookup() but the REDIRECT xattr was gone by that time.
> > That means no data dentry will be found but at the same time now inode
> > is not metacopy inode. So data dentry is not required. So this is not
> > error case. But if inode was still metacopy but data dentry was not found
> > this is error case. (May be due to underlying layer changed). Fix it by
> > returning -ESTALE.
> >
> > If inode was found in cache, then take ovl_inode->lock before checking
> > status of inode. If inode has been allocated, then it is returned with
> > inode lock anyway and other threads will block on that lock, so no need
> > to take ovl_inode->lock.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/export.c    |  3 ++-
> >  fs/overlayfs/inode.c     | 49 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  fs/overlayfs/namei.c     | 14 ++++----------
> >  fs/overlayfs/overlayfs.h |  3 ++-
> >  4 files changed, 56 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> > index 1c233096e59c..e8575d4d2c77 100644
> > --- a/fs/overlayfs/export.c
> > +++ b/fs/overlayfs/export.c
> > @@ -305,7 +305,8 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
> >         if (d_is_dir(upper ?: lower))
> >                 return ERR_PTR(-EIO);
> >
> > -       inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL);
> > +       inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower, NULL,
> > +                             false);
> >         if (IS_ERR(inode)) {
> >                 dput(upper);
> >                 return ERR_CAST(inode);
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index 6a0c85699024..7e30f4a7cdd9 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -685,9 +685,42 @@ static bool ovl_hash_bylower(struct super_block *sb, struct dentry *upper,
> >         return true;
> >  }
> >
> > +static bool ovl_verify_metacopy_data(struct super_block *sb,
> > +                                    struct inode *inode, bool metacopydata)
> > +{
> > +       struct ovl_fs *ofs = sb->s_fs_info;
> > +       struct ovl_inode *oi = OVL_I(inode);
> > +       bool metacopy = false;
> > +
> > +       /* A metacopy data dentry was found. So no need to do further checks */
> > +       if (metacopydata)
> > +               return true;
> > +
> > +       /*
> > +        * Metacopy feature not enabled. No metadata data copy up should take
> > +        * place. So no further checks needed.
> > +        */
> > +       if (!ofs->config.metacopy)
> > +               return true;
> > +
> > +       if (!S_ISREG(inode->i_mode))
> > +               return true;
> > +
> > +       /*
> > +        * Metacopy feature is enabled and we have not found metacopy data
> > +        * dentry. Make sure this inode is not metacopy inode.
> > +        */
> > +       mutex_lock(&oi->lock);
> > +       metacopy = !ovl_test_flag(OVL_UPPERDATA, inode);
> > +       mutex_unlock(&oi->lock);
> > +
> 
> FYI, the reason you need mutex_lock_interruptible is so if other CPU
> is busy with large copy up, the process that does lookup() can
> be interrupted by user.
> This is not likely to be a problem here, because race condition suggests that
> copy up of data is complete by now, but nevertheless.

Ok, good to know.

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2018-03-30  4:52   ` Amir Goldstein
@ 2018-04-02 13:56     ` Vivek Goyal
  2018-04-05 20:16       ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 13:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 07:52:17AM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > By default metadata only copy up is disabled. Provide a mount option so
> > that users can choose one way or other.
> >
> > Also provide a kernel config and module option to enable/disable
> > metacopy feature.
> >
> > metacopy feature requires redirect_dir=on when upper is present. Otherwise,
> > it requires redirect_dir=follow atleast.
> >
> > Like index feature, we verify on mount that upper root is not being
> > reused with a different lower root.
> 
> I don't see that in the patch

Oh.., this is leftover from previous patches. Will remove this comment.
I have completely got rid of dealing with ORIGIN when moving to
REDIRECT based lookup.

> 
> > This hopes to get the configuration
> > right and detect the copied layers use case. But this does only so
> > much as we don't verify all the lowers. So it is possible that a lower is
> > missing and later data copy up fails.
> >
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  Documentation/filesystems/overlayfs.txt | 30 ++++++++++++++++++++++++-
> >  fs/overlayfs/Kconfig                    | 19 ++++++++++++++++
> >  fs/overlayfs/ovl_entry.h                |  1 +
> >  fs/overlayfs/super.c                    | 40 ++++++++++++++++++++++++++++++++-
> >  4 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
> > index 6ea1e64d1464..b7720e61973c 100644
> > --- a/Documentation/filesystems/overlayfs.txt
> > +++ b/Documentation/filesystems/overlayfs.txt
> > @@ -249,6 +249,30 @@ rightmost one and going left.  In the above example lower1 will be the
> >  top, lower2 the middle and lower3 the bottom layer.
> >
> >
> > +Metadata only copyup
> > +--------------------
> > +
> > +When metadata only copy up feature is enabled, overlayfs will only copy
> > +up metadata (as opposed to whole file), when a metadata specific operation
> > +like chown/chmod is performed. Full file will be copied up later when
> > +file is opened for WRITE operation.
> > +
> > +IOW, this is delayed data copy up operation and data is copied up when
> > +there is a need to actually modify data.
> > +
> > +There are multiple ways to enable/disable this feature. A config option
> > +CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature
> > +by default. Or one can enable/disable it at module load time with module
> > +parameter metacopy=on/off. Lastly, there is also a per mount option
> > +metacopy=on/off to enable/disable this feature per mount.
> > +
> > +Do not use metacopy=on with untrusted upper/lower directories. Otherwise
> > +it is possible that an attacker can create a handcrafted file with
> > +appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower
> > +pointed by REDIRECT. This should not be possible on local system as setting
> > +"trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
> > +for untrusted layers like from a pen drive.
> > +
> >  Sharing and copying layers
> >  --------------------------
> >
> > @@ -267,7 +291,7 @@ though it will not result in a crash or deadlock.
> >  Mounting an overlay using an upper layer path, where the upper layer path
> >  was previously used by another mounted overlay in combination with a
> >  different lower layer path, is allowed, unless the "inodes index" feature
> > -is enabled.
> > +or "metadata only copyup" feature is enabled.
> >
> >  With the "inodes index" feature, on the first time mount, an NFS file
> >  handle of the lower layer root directory, along with the UUID of the lower
> > @@ -280,6 +304,10 @@ lower root origin, mount will fail with ESTALE.  An overlayfs mount with
> >  does not support NFS export, lower filesystem does not have a valid UUID or
> >  if the upper filesystem does not support extended attributes.
> >
> > +For "metadata only copyup" feature there is no verification mechanism at
> > +mount time. So if same upper is mounted with different set of lower, mount
> > +probably will succeed but expect the unexpected later on. So don't do it.
> > +
> >  It is quite a common practice to copy overlay layers to a different
> >  directory tree on the same or different underlying filesystem, and even
> >  to a different machine.  With the "inodes index" feature, trying to mount
> > diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
> > index ce6ff5a0a6e4..7d9650c9c075 100644
> > --- a/fs/overlayfs/Kconfig
> > +++ b/fs/overlayfs/Kconfig
> > @@ -86,3 +86,22 @@ config OVERLAY_FS_NFS_EXPORT
> >           case basis with the "nfs_export=on" mount option.
> >
> >           Say N unless you fully understand the consequences.
> > +
> > +config OVERLAY_FS_METACOPY
> > +       bool "Overlayfs: turn on metadata only copy up feature by default"
> > +       depends on OVERLAY_FS
> > +       depends on !OVERLAY_FS_NFS_EXPORT
> 
> Like the test in the code, the dependency should be
> OVERLAY_FS_NFS_EXPORT depends on !OVERLAY_FS_METACOPY

Ok, makes sense. Will change.

> 
> > +       select OVERLAY_FS_REDIRECT_DIR
> 
> At first glance, I thought this should be
> depends on OVERLAY_FS_REDIRECT_DIR,
> like in the code
> But I see why select makes sense in the context of config options.
> Makes me wonder if NFS_EXPORT should also select INDEX

I think it makes sense to select INDEX if user enables NFS_EXPORT.

> 
> I know why I didn't do this logic in the code, because we do not distinguish
> in the code between explicit mount option "index=off" and no mount option
> at all when default is "index=off". The former should disable nfs_export
> but the latter should enable index.
> 
> > +       help
> > +         If this config option is enabled then overlay filesystems will
> > +         copy up only metadata where appropriate and data copy up will
> > +         happen when a file is opened for WRITE operation. It is still
> > +         possible to turn off this feature globally with the "metacopy=off"
> > +         module option or on a filesystem instance basis with the
> > +         "metacopy=off" mount option.
> > +
> > +         Note, that this feature is not backward compatible.  That is,
> > +         mounting an overlay which has metacopy only inodes on a kernel
> > +         that doesn't support this feature will have unexpected results.
> > +
> > +         If unsure, say N.
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index bfef6edcc111..7dc55628080d 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -18,6 +18,7 @@ struct ovl_config {
> >         const char *redirect_mode;
> >         bool index;
> >         bool nfs_export;
> > +       bool metacopy;
> >  };
> >
> >  struct ovl_layer {
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 7c24619ae7fc..ddff54fa9e85 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -58,6 +58,11 @@ static void ovl_entry_stack_free(struct ovl_entry *oe)
> >                 dput(oe->lowerstack[i].dentry);
> >  }
> >
> > +static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
> > +module_param_named(metacopy, ovl_metacopy_def, bool, 0644);
> > +MODULE_PARM_DESC(ovl_metacopy_def,
> > +                "Default to on or off for the metadata only copy up feature");
> > +
> >  static void ovl_dentry_release(struct dentry *dentry)
> >  {
> >         struct ovl_entry *oe = dentry->d_fsdata;
> > @@ -350,6 +355,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
> >         if (ofs->config.nfs_export != ovl_nfs_export_def)
> >                 seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ?
> >                                                 "on" : "off");
> > +       if (ofs->config.metacopy != ovl_metacopy_def)
> > +               seq_printf(m, ",metacopy=%s",
> > +                          ofs->config.metacopy ? "on" : "off");
> >         return 0;
> >  }
> >
> > @@ -384,6 +392,8 @@ enum {
> >         OPT_INDEX_OFF,
> >         OPT_NFS_EXPORT_ON,
> >         OPT_NFS_EXPORT_OFF,
> > +       OPT_METACOPY_ON,
> > +       OPT_METACOPY_OFF,
> >         OPT_ERR,
> >  };
> >
> > @@ -397,6 +407,8 @@ static const match_table_t ovl_tokens = {
> >         {OPT_INDEX_OFF,                 "index=off"},
> >         {OPT_NFS_EXPORT_ON,             "nfs_export=on"},
> >         {OPT_NFS_EXPORT_OFF,            "nfs_export=off"},
> > +       {OPT_METACOPY_ON,               "metacopy=on"},
> > +       {OPT_METACOPY_OFF,              "metacopy=off"},
> >         {OPT_ERR,                       NULL}
> >  };
> >
> > @@ -511,6 +523,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
> >                         config->nfs_export = false;
> >                         break;
> >
> > +               case OPT_METACOPY_ON:
> > +                       config->metacopy = true;
> > +                       break;
> > +
> > +               case OPT_METACOPY_OFF:
> > +                       config->metacopy = false;
> > +                       break;
> > +
> >                 default:
> >                         pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
> >                         return -EINVAL;
> > @@ -993,7 +1013,8 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
> >         if (err) {
> >                 ofs->noxattr = true;
> >                 ofs->config.index = false;
> > -               pr_warn("overlayfs: upper fs does not support xattr, falling back to index=off.\n");
> > +               ofs->config.metacopy = false;
> > +               pr_warn("overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off.\n");
> >                 err = 0;
> >         } else {
> >                 vfs_removexattr(ofs->workdir, OVL_XATTR_OPAQUE);
> > @@ -1012,6 +1033,11 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
> >                 ofs->config.nfs_export = false;
> >         }
> >
> > +       /* metacopy feature with upper requires redirect_dir=on */
> > +       if (ofs->config.metacopy && !ofs->config.redirect_dir) {
> > +               pr_warn("overlayfs: metadata only copyup requires \"redirect_dir=on\", falling back to metacopy=off.\n");
> > +               ofs->config.metacopy = false;
> > +       }
> 
> Please move all these scattered tests that depend only on parsed config
> values at the end of ovl_parse_redirect_mode().

Will do.
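
For reference, a rough user-space sketch of what gathering these fallbacks into one place could look like. This is only a model under stated assumptions: struct cfg_model and cfg_resolve() are made-up names, not kernel code, and the three rules are simply the ones from the hunks quoted in this mail (metacopy with upper needs redirect_dir=on, metacopy without upper needs redirects to be followed, and metacopy wins over nfs_export).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the parsed mount options (not struct ovl_config). */
struct cfg_model {
	bool has_upper;		/* upperdir= was given */
	bool redirect_dir;	/* redirect_dir=on */
	bool redirect_follow;	/* redirects are followed on lookup */
	bool nfs_export;
	bool metacopy;
};

/*
 * Apply every fallback that depends only on parsed option values.
 * Returns the number of options that had to be changed.
 */
static int cfg_resolve(struct cfg_model *c)
{
	int changed = 0;

	/* metacopy with an upper layer requires redirect_dir=on */
	if (c->metacopy && c->has_upper && !c->redirect_dir) {
		fprintf(stderr, "model: falling back to metacopy=off\n");
		c->metacopy = false;
		changed++;
	}

	/* metacopy on a non-upper mount requires following redirects */
	if (c->metacopy && !c->has_upper && !c->redirect_follow) {
		fprintf(stderr, "model: falling back to metacopy=off\n");
		c->metacopy = false;
		changed++;
	}

	/* NFS export is not supported with metadata only copy up */
	if (c->metacopy && c->nfs_export) {
		fprintf(stderr, "model: falling back to nfs_export=off\n");
		c->nfs_export = false;
		changed++;
	}

	return changed;
}
```

The order matters in this model: once metacopy has been turned off by one of the first two checks, nfs_export is left alone.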

> 
> >  out:
> >         mnt_drop_write(mnt);
> >         return err;
> > @@ -1188,6 +1214,12 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
> >                 ofs->config.nfs_export = false;
> >         }
> >
> > +       if (!ofs->config.upperdir && ofs->config.metacopy &&
> > +           !ofs->config.redirect_follow) {
> > +               ofs->config.metacopy = false;
> > +               pr_warn("overlayfs: metadata only copyup requires \"redirect_dir=follow\" on non-upper mount, falling back to metacopy=off.\n");
> > +       }
> > +
> >         err = -ENOMEM;
> >         stack = kcalloc(stacklen, sizeof(struct path), GFP_KERNEL);
> >         if (!stack)
> > @@ -1263,6 +1295,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >
> >         ofs->config.index = ovl_index_def;
> >         ofs->config.nfs_export = ovl_nfs_export_def;
> > +       ofs->config.metacopy = ovl_metacopy_def;
> >         err = ovl_parse_opt((char *) data, &ofs->config);
> >         if (err)
> >                 goto out_err;
> > @@ -1331,6 +1364,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >                 }
> >         }
> >
> > +       if (ofs->config.metacopy && ofs->config.nfs_export) {
> > +               pr_warn("overlayfs: Metadata copy up requires NFS export disabled, falling back to nfs_export=off.\n");
> 
> "NFS export is not supported with metadata only copy up, ..."

Will change.

Vivek


* Re: [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  2018-03-30  5:49   ` Amir Goldstein
  2018-03-30  9:12     ` Amir Goldstein
@ 2018-04-02 15:06     ` Vivek Goyal
  1 sibling, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 15:06 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 08:49:26AM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
> > It also allows for presence of metacopy dentries in lower layer.
> >
> > During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
> > set OVL_UPPERDATA bit in flags.
> >
> > We don't support metacopy feature with nfs_export. So in nfs_export code,
> > we set OVL_UPPERDATA flag unconditionally if upper inode exists.
> >
> > Do not follow metacopy origin if we find a metacopy only inode and metacopy
> > feature is not enabled for that mount. Like redirect, this can have security
> > implications where an attacker could hand craft upper and try to gain
> > access to file on lower which it should not have to begin with.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/export.c    |  3 ++
> >  fs/overlayfs/inode.c     |  6 +++-
> >  fs/overlayfs/namei.c     | 90 ++++++++++++++++++++++++++++++++++++++++++------
> >  fs/overlayfs/overlayfs.h |  1 +
> >  fs/overlayfs/util.c      | 22 ++++++++++++
> >  5 files changed, 110 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> > index e668329f7361..1c233096e59c 100644
> > --- a/fs/overlayfs/export.c
> > +++ b/fs/overlayfs/export.c
> > @@ -311,6 +311,9 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
> >                 return ERR_CAST(inode);
> >         }
> >
> > +       if (upper)
> > +               ovl_set_flag(OVL_UPPERDATA, inode);
> > +
> >         dentry = d_find_any_alias(inode);
> >         if (!dentry) {
> >                 dentry = d_alloc_anon(inode->i_sb);
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index 3991a890b464..e1dbfed0c449 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -677,7 +677,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
> >         struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
> >         struct inode *inode;
> >         bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
> > -       bool is_dir;
> > +       bool is_dir, metacopy = false;
> >
> >         if (!realinode)
> >                 realinode = d_inode(lowerdentry);
> > @@ -732,6 +732,10 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
> >         if (index)
> >                 ovl_set_flag(OVL_INDEX, inode);
> >
> > +       metacopy = ovl_check_metacopy_xattr(upperdentry ?: lowerdentry);
> 
> No reason to check metacopy on lowerdentry.

Right. Will change it.

> 
> > +       if (upperdentry && !metacopy)
> > +               ovl_set_flag(OVL_UPPERDATA, inode);
> > +
> >         OVL_I(inode)->redirect = redirect;
> >
> >         /* Check for non-merge dir that may have whiteouts */
> > diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> > index 0b325e65864c..1dba89e9543f 100644
> > --- a/fs/overlayfs/namei.c
> > +++ b/fs/overlayfs/namei.c
> > @@ -24,6 +24,7 @@ struct ovl_lookup_data {
> >         bool stop;
> >         bool last;
> >         char *redirect;
> > +       bool metacopy;
> >  };
> >
> >  static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> > @@ -252,9 +253,13 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> >                 goto put_and_out;
> >         }
> >         if (!d_can_lookup(this)) {
> > -               d->stop = true;
> 
> upper was a dir. You look in lower and find a non-dir. you need to stop
> going to next layer. goto put_and_out won't do that.

Ok, will set d->stop = true before "goto put_and_out" to handle this
case.
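
To make the intended behavior concrete, here is a small user-space model of the non-dir branch after that change. It is a sketch, not the kernel function: struct lookup_model and model_found_nondir() are hypothetical names, and the metacopy xattr probe is reduced to an int argument (negative for an error, 0 for absent, 1 for present).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-lookup state. */
struct lookup_model {
	bool is_dir;	/* a layer above already yielded a directory */
	bool stop;	/* do not descend into further layers */
	bool metacopy;	/* the entry just found is metacopy only */
};

enum { FOUND_SKIP = 0, FOUND_USE = 1 };

/*
 * Called when the entry found in this layer is not a directory.
 * Returns FOUND_SKIP, FOUND_USE or a negative error from the probe.
 */
static int model_found_nondir(struct lookup_model *d, int metacopy_xattr)
{
	if (d->is_dir) {
		/* Non-dir under a dir: ignore it and stop the walk. */
		d->stop = true;
		return FOUND_SKIP;
	}
	if (metacopy_xattr < 0)
		return metacopy_xattr;

	/* Keep descending only while we are still looking for data. */
	d->stop = !metacopy_xattr;
	d->metacopy = !!metacopy_xattr;
	return FOUND_USE;
}
```

The opposite case Amir raises (a directory found below a metacopy non-dir) is not covered by this model.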

> 
> Similarly, you need to handle the case where dir is found below
> non-dir with metacopy.

Ok, will look into it.

> 
> >                 if (d->is_dir)
> >                         goto put_and_out;
> > +               err = ovl_check_metacopy_xattr(this);
> > +               if (err < 0)
> > +                       goto out_err;
> > +               d->stop = !err;
> > +               d->metacopy = !!err;
> >                 goto out;
> >         }
> >         if (last_element)
> > @@ -815,7 +820,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >         struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
> >         struct ovl_entry *poe = dentry->d_parent->d_fsdata;
> >         struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
> > -       struct ovl_path *stack = NULL;
> > +       struct ovl_path *stack = NULL, *origin_path = NULL;
> >         struct dentry *upperdir, *upperdentry = NULL;
> >         struct dentry *origin = NULL;
> >         struct dentry *index = NULL;
> > @@ -826,6 +831,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >         struct dentry *this;
> >         unsigned int i;
> >         int err;
> > +       bool metacopy = false;
> >         struct ovl_lookup_data d = {
> >                 .name = dentry->d_name,
> >                 .is_dir = false,
> > @@ -833,6 +839,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                 .stop = false,
> >                 .last = ofs->config.redirect_follow ? false : !poe->numlower,
> >                 .redirect = NULL,
> > +               .metacopy = false,
> >         };
> >
> >         if (dentry->d_name.len > ofs->namelen)
> > @@ -851,7 +858,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                         goto out;
> >                 }
> >                 if (upperdentry && !d.is_dir) {
> > -                       BUG_ON(!d.stop || d.redirect);
> > +                       unsigned int origin_ctr = 0;
> > +                       BUG_ON(d.redirect);
> >                         /*
> >                          * Lookup copy up origin by decoding origin file handle.
> >                          * We may get a disconnected dentry, which is fine,
> > @@ -862,16 +870,20 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                          * number - it's the same as if we held a reference
> >                          * to a dentry in lower layer that was moved under us.
> >                          */
> > -                       err = ovl_check_origin(ofs, upperdentry, &stack, &ctr);
> > +                       err = ovl_check_origin(ofs, upperdentry, &origin_path,
> > +                                              &origin_ctr);
> >                         if (err)
> >                                 goto out_put_upper;
> > +
> > +                       if (d.metacopy)
> > +                               metacopy = true;
> >                 }
> >
> >                 if (d.redirect) {
> >                         err = -ENOMEM;
> >                         upperredirect = kstrdup(d.redirect, GFP_KERNEL);
> >                         if (!upperredirect)
> > -                               goto out_put_upper;
> > +                               goto out_put_origin;
> >                         if (d.redirect[0] == '/')
> >                                 poe = roe;
> >                 }
> > @@ -883,7 +895,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                 stack = kcalloc(ofs->numlower, sizeof(struct ovl_path),
> >                                 GFP_KERNEL);
> >                 if (!stack)
> > -                       goto out_put_upper;
> > +                       goto out_put_origin;
> >         }
> >
> >         for (i = 0; !d.stop && i < poe->numlower; i++) {
> > @@ -905,7 +917,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                  * If no origin fh is stored in upper of a merge dir, store fh
> >                  * of lower dir and set upper parent "impure".
> >                  */
> > -               if (upperdentry && !ctr && !ofs->noxattr) {
> > +               if (upperdentry && !ctr && !ofs->noxattr && d.is_dir) {
> >                         err = ovl_fix_origin(dentry, this, upperdentry);
> >                         if (err) {
> >                                 dput(this);
> > @@ -917,16 +929,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                  * When "verify_lower" feature is enabled, do not merge with a
> >                  * lower dir that does not match a stored origin xattr. In any
> >                  * case, only verified origin is used for index lookup.
> > +                *
> > +                * For non-dir dentry, make sure dentry found by lookup
> > +                * matches the origin stored in upper. Otherwise it's an
> > +                * error.
> >                  */
> > -               if (upperdentry && !ctr && ovl_verify_lower(dentry->d_sb)) {
> > +               if (upperdentry && !ctr &&
> > +                   ((d.is_dir && ovl_verify_lower(dentry->d_sb)) ||
> > +                    (!d.is_dir && origin_path))) {
> >                         err = ovl_verify_origin(upperdentry, this, false);
> >                         if (err) {
> >                                 dput(this);
> > -                               break;
> > +                               if (d.is_dir)
> > +                                       break;
> > +                               goto out_put;
> >                         }
> > -
> >                         /* Bless lower dir as verified origin */
> > -                       origin = this;
> > +                       if (d.is_dir)
> > +                               origin = this;
> 
> It's ok to bless verified non-dir as well.
> It is going to be blessed anyway, just above index lookup if ctr > 0.

Hmm, it makes sense to trust the origin once we have verified it, for
both dir and non-dir. Will do.

> 
> > +               }
> > +
> > +               if (d.metacopy)
> > +                       metacopy = true;
> > +               /*
> > +                * Do not store intermediate metacopy dentries in chain,
> > +                * except top most lower metacopy dentry
> > +                */
> > +               if (d.metacopy && ctr) {
> > +                       dput(this);
> > +                       continue;
> >                 }
> >
> >                 stack[ctr].dentry = this;
> > @@ -960,6 +991,34 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >                 }
> >         }
> >
> > +       if (metacopy) {
> > +               BUG_ON(d.is_dir);
> 
> Yeh, I think that is really a bug, because you need to detect
> the case of dir in lower layer under metacopy in upper layer
> and do something about it.
> 
> > +               /*
> > +                * Found a metacopy dentry but did not find corresponding
> > +                * data dentry
> > +                */
> > +               if (d.metacopy) {
> > +                       err = -ESTALE;
> > +                       goto out_put;
> > +               }
> > +
> > +               err = -EPERM;
> > +               if (!ofs->config.metacopy) {
> > +                       pr_warn_ratelimited("overlay: refusing to follow"
> > +                                           " metacopy origin for (%pd2)\n",
> > +                                           dentry);
> > +                       goto out_put;
> > +               }
> > +       } else if (!d.is_dir && upperdentry && !ctr && origin_path) {
> > +               if (WARN_ON(stack != NULL)) {
> > +                       err = -EIO;
> > +                       goto out_put;
> > +               }
> > +               stack = origin_path;
> > +               ctr = 1;
> > +               origin_path = NULL;
> > +       }
> > +
> >         /*
> >          * Lookup index by lower inode and verify it matches upper inode.
> >          * We only trust dir index if we verified that lower dir matches
> > @@ -1006,6 +1065,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >         }
> >
> >         revert_creds(old_cred);
> > +       if (origin_path) {
> > +               dput(origin_path->dentry);
> > +               kfree(origin_path);
> > +       }
> >         dput(index);
> >         kfree(stack);
> >         kfree(d.redirect);
> > @@ -1019,6 +1082,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >         for (i = 0; i < ctr; i++)
> >                 dput(stack[i].dentry);
> >         kfree(stack);
> > +out_put_origin:
> > +       if (origin_path) {
> > +               dput(origin_path->dentry);
> > +               kfree(origin_path);
> > +       }
> 
> There is no need for the new goto label.
> Just add this in existing out_put_upper label.

Ok, I should be able to club this with out_put_upper label.

Vivek

> 
> Thanks,
> Amir,


* Re: [PATCH v13 13/28] ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
  2018-03-30  6:01   ` Amir Goldstein
@ 2018-04-02 15:08     ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 15:08 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 09:01:12AM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Now we have the notion of data dentry and metacopy dentry. ovl_dentry_lower()
> > will return lower dentry at idx 0, but it could be either data or metacopy
> 
> will return the upper most lower dentry. (idx 0 means upper).

I am explaining the behavior of the current ovl_dentry_lower() helper here
(and not that of the new helper).
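
A small user-space model of the two helpers may make the distinction clearer. The names (struct entry_model, model_lower(), model_lowerdata()) are made up for illustration and layers are represented by plain strings. The sketch also avoids a local variable called idx, which is one possible way to sidestep the naming clash with layer->idx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the lower stack of an overlay dentry. */
struct entry_model {
	unsigned int numlower;
	const char *lowerstack[4];	/* [0] is the uppermost lower layer */
};

/* Existing behavior: uppermost lower entry, data or metacopy. */
static const char *model_lower(const struct entry_model *oe)
{
	return oe->numlower ? oe->lowerstack[0] : NULL;
}

/* New helper: lowermost entry, which is expected to hold the data. */
static const char *model_lowerdata(const struct entry_model *oe)
{
	return oe->numlower ? oe->lowerstack[oe->numlower - 1] : NULL;
}
```

With a single lower entry both helpers return the same thing; they only differ when a metacopy entry sits above the data entry.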
> 
> > dentry. Now we support metacopy dentries in lower layers so it is possible
> > that lowerstack[0] is metacopy dentry while lowerstack[1] is actual data
> > dentry.
> >
> > So add a helper which returns the lowermost dentry, which is supposed to be
> > data dentry.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> 
> > ---
> >  fs/overlayfs/overlayfs.h |  1 +
> >  fs/overlayfs/util.c      | 14 ++++++++++++++
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 4eb4b765887f..214d9f08c574 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -226,6 +226,7 @@ void ovl_path_lowerdata(struct dentry *dentry, struct path *path);
> >  enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
> >  struct dentry *ovl_dentry_upper(struct dentry *dentry);
> >  struct dentry *ovl_dentry_lower(struct dentry *dentry);
> > +struct dentry *ovl_dentry_lowerdata(struct dentry *dentry);
> >  struct dentry *ovl_dentry_real(struct dentry *dentry);
> >  struct dentry *ovl_i_dentry_upper(struct inode *inode);
> >  struct inode *ovl_inode_upper(struct inode *inode);
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index fd98329e820c..394674c4c820 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -186,6 +186,20 @@ struct dentry *ovl_dentry_lower(struct dentry *dentry)
> >         return oe->numlower ? oe->lowerstack[0].dentry : NULL;
> >  }
> >
> > +/*
> > + * ovl_dentry_lower() could return either a data dentry or metacopy dentry
> > + * depending on what is stored in lowerstack[0]. At times we need to find
> > + * lower dentry which has data (and not metacopy dentry). This helper
> > + * returns the lower data dentry.
> > + */
> > +struct dentry *ovl_dentry_lowerdata(struct dentry *dentry)
> > +{
> > +       struct ovl_entry *oe = dentry->d_fsdata;
> > +       int idx = oe->numlower - 1;
> 
> Please stick with convention of layer->idx that idx 0 is upper.
> 
> > +
> > +       return idx >= 0 ? oe->lowerstack[idx].dentry : NULL;
> > +}
> > +
> >  struct dentry *ovl_dentry_real(struct dentry *dentry)
> >  {
> >         return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry);
> > --
> > 2.13.6
> >


* Re: [PATCH v13 17/28] ovl: During rename lock both source and target ovl_inode
  2018-03-30  6:50   ` Amir Goldstein
@ 2018-04-02 17:34     ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 17:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 09:50:53AM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > In some cases we need to take both source (old) and target(new) dentry
> > ovl_inode->lock. This patch adds support for that. Locks are taken in
> > order of increasing inode address to avoid deadlock. This code has been
> > taken from lock_two_nondirectories().
> >
> > As of now, metacopy needs this lock if we are planning to update redirect
> > information on source/target ovl_inode. nlink related accounting takes this
> > lock on target for the case of overwrite.
> >
> 
> This is very over complicated.
> 
> Here is how this could be simplified.
> 
> Setting redirects does NOT need to happen under all the complicated rename
> locks. Setting redirects can happen completely before everything else
> because it is harmless to set redirect and not rename afterwards.

Hmm, so basically set redirects before doing lock_rename(new_upperdir,
old_upperdir), and I guess even before ovl_set_nlink_start() takes the
inode lock. That way we don't have to hold two inode locks together; we
should be able to take one inode lock at a time, set the redirect and
release the lock.

I spent some time trying to spot anything obviously wrong with this
but could not find anything. So I will try to implement it. If
this works, I like the idea of simplifying the locking a bit. Right
now, trying to take two inode locks makes it complicated.
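
A toy user-space model of that ordering, just to spell out the property being relied on. Nothing here is kernel API: struct inode_model, model_set_redirect() and model_rename_prep() are hypothetical, and the "lock" is a flag plus a counter that records the largest number of locks ever held at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an overlay inode with its own lock. */
struct inode_model {
	bool locked;
	const char *redirect;
};

static int locks_held;		/* locks currently held in the model */
static int max_locks_held;	/* high-water mark of locks_held */

static void model_lock(struct inode_model *inode)
{
	assert(!inode->locked);
	inode->locked = true;
	if (++locks_held > max_locks_held)
		max_locks_held = locks_held;
}

static void model_unlock(struct inode_model *inode)
{
	assert(inode->locked);
	inode->locked = false;
	locks_held--;
}

/* Take the inode's own lock, set the redirect, drop the lock. */
static void model_set_redirect(struct inode_model *inode, const char *path)
{
	model_lock(inode);
	inode->redirect = path;
	model_unlock(inode);
}

/*
 * Runs before any rename locks are taken. Old and new are handled
 * one at a time, so no lock ordering between them is needed.
 */
static void model_rename_prep(struct inode_model *old, const char *old_path,
			      struct inode_model *new, const char *new_path)
{
	model_set_redirect(old, old_path);
	if (new)
		model_set_redirect(new, new_path);
}
```

As Amir points out, a redirect that was set without the rename following it is harmless, which is what makes it acceptable to do this step outside the rename locks.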

Vivek
> 
> I would split out a helper ovl_rename_prep() it can also take care of
> copy ups and check ovl_can_rename() and then set redirect if needed
> on old and/or new locking them one at a time.
> You can take the ovl_inode->lock inside ovl_set_redirect() and you can also
> override credentials or whatever is needed due to moving set redirect
> earlier.

> 
> You probably don't need all the nlink_start re-factoring with this change.
> 
> Thanks,
> Amir.


* Re: [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization
  2018-03-30  8:56   ` Amir Goldstein
@ 2018-04-02 19:35     ` Vivek Goyal
  2018-04-02 20:25       ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 19:35 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 11:56:42AM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > So far redirect could be placed on directories only and now it can be
> > placed on regular files as well. Also it could be completely removed
> > when a metacopy copy up file's data is copied up. That means if a redirect
> > is present during ovl_lookup(), it could be gone by the time ovl_get_inode()
> > happens.
> >
> 
> There is a bit of a mess in the assumptions.
> 
> If the inode is pure upper or indexed origin, then the alleged race ends up
> in !(inode->i_state & I_NEW) and you discard redirect anyway.

Can't these also happen when I_NEW is set? I mean, the inode could be
flushed out of cache. Say cpu1 is doing ovl_lookup() and the thread gets
blocked, while another cpu does a copy up of the file, removes the
redirect, and the inode gets flushed out of cache. Now cpu1 resumes
execution and creates a new inode; doesn't it need to re-check whether
the redirect is still present?

> 
> If the inode is non-indexed copyup, then it is a different inode on disk
> and different struct ovl_inode in memory than the inode of the copy up
> we are allegedly racing with (they are broken hardlinks), so there is no
> issue.

Agreed that in case of broken hardlinks this race does not exist. But
do we really want to optimize it here? 

> 
> > Or it is possible that ovl_lookup() does not see a redirect and a rename
> > is taking place on a hard link and that places a redirect. And by the
> > time ovl_lookup() calls ovl_get_inode(), it sets ovl_inode->redirect = NULL
> > (Assume inode got flushed out of cache and was allocated new).
> 
> Same as above.
> 
> I am not saying there are no races between lookup and rename/link,
> but IMO the text above does not describe them or prove that they exist.
> 

I can try to give more details. But I think if inode gets flushed out
of cache, then we need to query redirect info again.
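
A tiny user-space model of the idea in the quoted inode.c hunk, for the case where a rename places a redirect after lookup has already looked. All names (struct upper_model, struct ovl_inode_model, model_init_inode()) are hypothetical, and the xattrs are plain struct fields here: for a metacopy upper the redirect is read again at inode init time, otherwise the value handed over by lookup is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical on-disk state of the upper file. */
struct upper_model {
	bool metacopy_xattr;		/* metacopy xattr present */
	const char *redirect_xattr;	/* redirect xattr, NULL if absent */
};

/* Hypothetical in-memory overlay inode. */
struct ovl_inode_model {
	const char *redirect;
};

/*
 * lookup_redirect is what lookup saw earlier without holding any lock.
 * For a metacopy upper it may be stale, so it is ignored and the
 * current on-disk value is installed instead.
 */
static void model_init_inode(struct ovl_inode_model *oi,
			     const struct upper_model *upper,
			     const char *lookup_redirect)
{
	bool metacopy = upper && upper->metacopy_xattr;

	if (!metacopy)
		oi->redirect = lookup_redirect;
	else
		oi->redirect = upper->redirect_xattr;
}
```

Whether the re-check is needed at all in the broken hardlink case is the open question in this thread; the model does not try to settle that.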

Vivek
> >
> > IOW, because we check and process redirect without locks in ovl_lookup(),
> > many possibilities open up for regular files. So for such cases, do not
> > use the redirect provided by the caller. Instead query it and install
> > in ovl_inode->redirect.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/inode.c     | 19 ++++++++++++++++++-
> >  fs/overlayfs/overlayfs.h |  1 +
> >  fs/overlayfs/util.c      | 42 ++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index 3dccfa1ee123..6a0c85699024 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -694,6 +694,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
> >         bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
> >         bool is_dir, metacopy = false;
> >         int err = -ENOMEM;
> > +       char *new_redirect = NULL;
> >
> >         if (!realinode)
> >                 realinode = d_inode(lowerdentry);
> > @@ -754,7 +755,18 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
> >         if (upperdentry && !metacopy)
> >                 ovl_set_flag(OVL_UPPERDATA, inode);
> >
> > -       OVL_I(inode)->redirect = redirect;
> > +       if (!metacopy) {
> > +               OVL_I(inode)->redirect = redirect;
> > +               redirect = NULL;
> > +       } else if (upperdentry) {
> > +               new_redirect = ovl_get_redirect_xattr(upperdentry);
> > +               if (IS_ERR(new_redirect)) {
> > +                       err = PTR_ERR(new_redirect);
> > +                       goto out_err_inode;
> > +               }
> > +               OVL_I(inode)->redirect = new_redirect;
> > +               new_redirect = NULL;
> > +       }
> >
> >         /* Check for non-merge dir that may have whiteouts */
> >         if (is_dir) {
> > @@ -764,11 +776,16 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
> >                 }
> >         }
> >
> > +       kfree(redirect);
> >         if (inode->i_state & I_NEW)
> >                 unlock_new_inode(inode);
> >  out:
> >         return inode;
> >
> > +out_err_inode:
> > +       if (inode->i_state & I_NEW)
> > +               unlock_new_inode(inode);
> > +       iput(inode);
> >  out_err:
> >         inode = ERR_PTR(err);
> >         goto out;
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 429713653b3b..a3bee7619fbb 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -279,6 +279,7 @@ void ovl_nlink_end_locked(struct dentry *dentry);
> >  int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
> >  int ovl_check_metacopy_xattr(struct dentry *dentry);
> >  bool ovl_is_metacopy_dentry(struct dentry *dentry);
> > +char *ovl_get_redirect_xattr(struct dentry *dentry);
> >
> >  static inline bool ovl_is_impuredir(struct dentry *dentry)
> >  {
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 961d65bd25c9..3d090b6f9fc2 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -833,3 +833,45 @@ bool ovl_is_metacopy_dentry(struct dentry *dentry)
> >
> >         return (oe->numlower > 1);
> >  }
> > +
> > +char *ovl_get_redirect_xattr(struct dentry *dentry)
> > +{
> > +       int res;
> > +       char *s, *next, *buf = NULL;
> > +
> > +       res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, NULL, 0);
> > +       if (res < 0) {
> > +               if (res == -ENODATA || res == -EOPNOTSUPP)
> > +                       return NULL;
> > +               return ERR_PTR(res);
> > +       }
> > +
> > +       buf = kzalloc(res + 1, GFP_KERNEL);
> > +       if (!buf)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, buf, res);
> > +       if (res < 0) {
> > +               kfree(buf);
> > +               return ERR_PTR(res);
> > +        }
> > +       if (res == 0)
> > +               goto invalid;
> > +
> > +       if (buf[0] == '/') {
> > +               for (s = buf; *s++ == '/'; s = next) {
> > +                       next = strchrnul(s, '/');
> > +                       if (s == next)
> > +                               goto invalid;
> > +               }
> > +       } else {
> > +               if (strchr(buf, '/') != NULL)
> > +                       goto invalid;
> > +       }
> > +
> > +       return buf;
> > +invalid:
> > +       pr_warn_ratelimited("overlayfs: invalid redirect (%s)\n", buf);
> > +       kfree(buf);
> > +       return ERR_PTR(-EINVAL);
> > +}
> > --
> > 2.13.6
> >
> 
> If you really end up needing this helper, you should use it from lookup as well.
> 
> Thanks,
> Amir.
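
The checks that ovl_get_redirect_xattr() applies to the xattr value in the
patch quoted above (reject an empty value, reject empty path components in
an absolute redirect, reject any '/' in a relative one) can be tried out in
userspace. A minimal sketch, assuming a hypothetical helper name
redirect_is_valid() and a plain C string in place of the xattr buffer. It is
not kernel code, and it uses strcspn() where the patch uses strchrnul():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace model of the redirect validation.
 * Accepts either an absolute path with no empty components,
 * or a single relative name that contains no '/'.
 */
bool redirect_is_valid(const char *buf)
{
	const char *s;

	assert(buf != NULL);

	/* An empty value is never a valid redirect. */
	if (buf[0] == '\0')
		return false;

	/* Relative redirect: must be a single name. */
	if (buf[0] != '/')
		return strchr(buf, '/') == NULL;

	/* Absolute redirect: every '/' must be followed by a non-empty name. */
	s = buf;
	while (*s == '/') {
		size_t len;

		s++;
		len = strcspn(s, "/");
		if (len == 0)
			return false;
		s += len;
	}
	return true;
}
```

Under these rules "/lower/dir" and "name" are accepted, while "/", "/a//b",
"/a/" and "a/b" are rejected.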


* Re: [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  2018-03-30  9:12     ` Amir Goldstein
@ 2018-04-02 19:45       ` Vivek Goyal
  2018-04-02 20:07         ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 19:45 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 12:12:18PM +0300, Amir Goldstein wrote:
> On Fri, Mar 30, 2018 at 8:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
> >> It also allows for presence of metacopy dentries in lower layer.
> >>
> >> During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
> >> set OVL_UPPERDATA bit in flags.
> >>
> >> We don't support metacopy feature with nfs_export. So in nfs_export code,
> >> we set the OVL_UPPERDATA flag unconditionally if upper inode exists.
> >>
> >> Do not follow metacopy origin if we find a metacopy only inode and metacopy
> >> feature is not enabled for that mount. Like redirect, this can have security
> >> implications where an attacker could hand craft upper and try to gain
> >> access to file on lower which it should not have to begin with.
> >>
> >> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >> ---
> [...]
> 
> >> @@ -917,16 +929,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >>                  * When "verify_lower" feature is enabled, do not merge with a
> >>                  * lower dir that does not match a stored origin xattr. In any
> >>                  * case, only verified origin is used for index lookup.
> >> +                *
> >> +                * For non-dir dentry, make sure dentry found by lookup
> >> +                * matches the origin stored in upper. Otherwise its an
> >> +                * error.
> >>                  */
> >> -               if (upperdentry && !ctr && ovl_verify_lower(dentry->d_sb)) {
> >> +               if (upperdentry && !ctr &&
> >> +                   ((d.is_dir && ovl_verify_lower(dentry->d_sb)) ||
> >> +                    (!d.is_dir && origin_path))) {
> >>                         err = ovl_verify_origin(upperdentry, this, false);
> >>                         if (err) {
> >>                                 dput(this);
> >> -                               break;
> >> +                               if (d.is_dir)
> >> +                                       break;
> >> +                               goto out_put;
> >>                         }
> >> -
> >>                         /* Bless lower dir as verified origin */
> >> -                       origin = this;
> >> +                       if (d.is_dir)
> >> +                               origin = this;
> >
> > It's ok to bless verified non-dir as well.
> > It is going to be blessed anyway, just above index lookup if ctr > 0.
> 
> >
> >> +               }
> >> +
> >> +               if (d.metacopy)
> >> +                       metacopy = true;
> >> +               /*
> >> +                * Do not store intermediate metacopy dentries in chain,
> >> +                * except top most lower metacopy dentry
> >> +                */
> >> +               if (d.metacopy && ctr) {
> >> +                       dput(this);
> >> +                       continue;
> >>                 }
> >>
> >>                 stack[ctr].dentry = this;
> >> @@ -960,6 +991,34 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> >>                 }
> >>         }
> >>
> >> +       if (metacopy) {
> >> +               BUG_ON(d.is_dir);
> >
> > Yeh, I think that is really a bug, because you need to detect
> > the case of dir in lower layer under metacopy in upper layer
> > and do something about it.
> >
> >> +               /*
> >> +                * Found a metacopy dentry but did not find corresponding
> >> +                * data dentry
> >> +                */
> >> +               if (d.metacopy) {
> >> +                       err = -ESTALE;
> >> +                       goto out_put;
> >> +               }
> >> +
> >> +               err = -EPERM;
> >> +               if (!ofs->config.metacopy) {
> >> +                       pr_warn_ratelimited("overlay: refusing to follow"
> >> +                                           " metacopy origin for (%pd2)\n",
> >> +                                           dentry);
> >> +                       goto out_put;
> >> +               }
> 
> In the (!origin_path) case, lower was followed by name/redirect and not
> verified by origin, so we should not look up the index of non-dir below.
> Right now non-dir index entries assume that origin xattr exists
> and matches the entry name. We may be able to relax that in
> the future using non-dir redirect, but we are not there yet.

Ok.

> 
> 
> >> +       } else if (!d.is_dir && upperdentry && !ctr && origin_path) {
> >> +               if (WARN_ON(stack != NULL)) {
> >> +                       err = -EIO;
> >> +                       goto out_put;
> >> +               }
> >> +               stack = origin_path;
> >> +               ctr = 1;
> >> +               origin_path = NULL;
> >> +       }
> >> +
> >>         /*
> >>          * Lookup index by lower inode and verify it matches upper inode.
> >>          * We only trust dir index if we verified that lower dir matches
>          * origin, otherwise dir index entries may be inconsistent and we
>          * ignore them. Always lookup index of non-dir and non-upper.
>          */
>         if (ctr && (!upperdentry || !d.is_dir))
>                 origin = stack[0].dentry;
> 
> So this condition needs to be fixed.

Ok. So I could make it like the directory case. That is, for
metacopy files, bless origin only if it was verified, and modify
the above condition as follows.

         if (ctr && (!upperdentry || (!d.is_dir && !metacopy)))
                 origin = stack[0].dentry;

That way, for a metacopy dentry we will search for the index only if
origin was verified.

Vivek
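
The effect of the proposed condition can be tabulated outside the kernel.
A minimal sketch, assuming a hypothetical function use_stack0_as_origin()
whose parameters stand in for ctr, upperdentry, d.is_dir and metacopy. An
origin that was already blessed by ovl_verify_origin() earlier in the loop
is not modelled here:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Models:  if (ctr && (!upperdentry || (!d.is_dir && !metacopy)))
 * Returns true when stack[0] would be taken as origin for the index
 * lookup without having been verified against the origin xattr.
 */
bool use_stack0_as_origin(unsigned int ctr, bool has_upper,
			  bool is_dir, bool metacopy)
{
	return ctr && (!has_upper || (!is_dir && !metacopy));
}

/* Example cases, checked with assert(). */
void origin_examples(void)
{
	/* Pure lower file: always looked up by stack[0]. */
	assert(use_stack0_as_origin(1, false, false, false));
	/* Regular (non-metacopy) upper file with a lower: same as before. */
	assert(use_stack0_as_origin(1, true, false, false));
	/* Metacopy upper file: no unverified origin. */
	assert(!use_stack0_as_origin(1, true, false, true));
	/* Upper dir: unchanged, needs a verified origin. */
	assert(!use_stack0_as_origin(1, true, true, false));
	/* Nothing found in lower layers. */
	assert(!use_stack0_as_origin(0, true, false, false));
}
```

Only the metacopy-with-upper row differs from the old condition
(ctr && (!upperdentry || !d.is_dir)), which returned true for it.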


* Re: [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  2018-04-02 19:45       ` Vivek Goyal
@ 2018-04-02 20:07         ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-04-02 20:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Mon, Apr 2, 2018 at 10:45 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 12:12:18PM +0300, Amir Goldstein wrote:
>> On Fri, Mar 30, 2018 at 8:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> > On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
>> >> It also allows for presence of metacopy dentries in lower layer.
>> >>
>> >> During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
>> >> set OVL_UPPERDATA bit in flags.
>> >>
>> >> We don't support metacopy feature with nfs_export. So in nfs_export code,
>> >> we set the OVL_UPPERDATA flag unconditionally if upper inode exists.
>> >>
>> >> Do not follow metacopy origin if we find a metacopy only inode and metacopy
>> >> feature is not enabled for that mount. Like redirect, this can have security
>> >> implications where an attacker could hand craft upper and try to gain
>> >> access to file on lower which it should not have to begin with.
>> >>
>> >> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> >> ---
>> [...]
>>
>> >> @@ -917,16 +929,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>> >>                  * When "verify_lower" feature is enabled, do not merge with a
>> >>                  * lower dir that does not match a stored origin xattr. In any
>> >>                  * case, only verified origin is used for index lookup.
>> >> +                *
>> >> +                * For non-dir dentry, make sure dentry found by lookup
>> >> +                * matches the origin stored in upper. Otherwise its an
>> >> +                * error.
>> >>                  */
>> >> -               if (upperdentry && !ctr && ovl_verify_lower(dentry->d_sb)) {
>> >> +               if (upperdentry && !ctr &&
>> >> +                   ((d.is_dir && ovl_verify_lower(dentry->d_sb)) ||
>> >> +                    (!d.is_dir && origin_path))) {
>> >>                         err = ovl_verify_origin(upperdentry, this, false);
>> >>                         if (err) {
>> >>                                 dput(this);
>> >> -                               break;
>> >> +                               if (d.is_dir)
>> >> +                                       break;
>> >> +                               goto out_put;
>> >>                         }
>> >> -
>> >>                         /* Bless lower dir as verified origin */
>> >> -                       origin = this;
>> >> +                       if (d.is_dir)
>> >> +                               origin = this;
>> >
>> > It's ok to bless verified non-dir as well.
>> > It is going to be blessed anyway, just above index lookup if ctr > 0.
>>
>> >
>> >> +               }
>> >> +
>> >> +               if (d.metacopy)
>> >> +                       metacopy = true;
>> >> +               /*
>> >> +                * Do not store intermediate metacopy dentries in chain,
>> >> +                * except top most lower metacopy dentry
>> >> +                */
>> >> +               if (d.metacopy && ctr) {
>> >> +                       dput(this);
>> >> +                       continue;
>> >>                 }
>> >>
>> >>                 stack[ctr].dentry = this;
>> >> @@ -960,6 +991,34 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>> >>                 }
>> >>         }
>> >>
>> >> +       if (metacopy) {
>> >> +               BUG_ON(d.is_dir);
>> >
>> > Yeh, I think that is really a bug, because you need to detect
>> > the case of dir in lower layer under metacopy in upper layer
>> > and do something about it.
>> >
>> >> +               /*
>> >> +                * Found a metacopy dentry but did not find corresponding
>> >> +                * data dentry
>> >> +                */
>> >> +               if (d.metacopy) {
>> >> +                       err = -ESTALE;
>> >> +                       goto out_put;
>> >> +               }
>> >> +
>> >> +               err = -EPERM;
>> >> +               if (!ofs->config.metacopy) {
>> >> +                       pr_warn_ratelimited("overlay: refusing to follow"
>> >> +                                           " metacopy origin for (%pd2)\n",
>> >> +                                           dentry);
>> >> +                       goto out_put;
>> >> +               }
>>
>> In the (!origin_path) case, lower was followed by name/redirect and not
>> verified by origin, so we should not look up the index of non-dir below.
>> Right now non-dir index entries assume that origin xattr exists
>> and matches the entry name. We may be able to relax that in
>> the future using non-dir redirect, but we are not there yet.
>
> Ok.
>
>>
>>
>> >> +       } else if (!d.is_dir && upperdentry && !ctr && origin_path) {
>> >> +               if (WARN_ON(stack != NULL)) {
>> >> +                       err = -EIO;
>> >> +                       goto out_put;
>> >> +               }
>> >> +               stack = origin_path;
>> >> +               ctr = 1;
>> >> +               origin_path = NULL;
>> >> +       }
>> >> +
>> >>         /*
>> >>          * Lookup index by lower inode and verify it matches upper inode.
>> >>          * We only trust dir index if we verified that lower dir matches
>>          * origin, otherwise dir index entries may be inconsistent and we
>>          * ignore them. Always lookup index of non-dir and non-upper.
>>          */
>>         if (ctr && (!upperdentry || !d.is_dir))
>>                 origin = stack[0].dentry;
>>
>> So this condition needs to be fixed.
>
> Ok. So I could make it like the directory case. That is, for
> metacopy files, bless origin only if it was verified, and modify
> the above condition as follows.
>
>          if (ctr && (!upperdentry || (!d.is_dir && !metacopy)))
>                  origin = stack[0].dentry;
>
> That way, for a metacopy dentry we will search for the index only if
> origin was verified.
>

Correct.

Thanks,
Amir.


* Re: [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower
  2018-03-30  9:24   ` Amir Goldstein
@ 2018-04-02 20:11     ` Vivek Goyal
  2018-04-02 20:27       ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 20:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 12:24:15PM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > If an inode has been copied up metadata only, then we need to query the
> > number of blocks from lower and fill up the stat->st_blocks.
> >
> > We need to be careful about races where we are doing stat on one cpu and
> > data copy up is taking place on other cpu. We want to return
> > stat->st_blocks either from lower or stable upper and not something in
> > between. Hence, ovl_has_upperdata() is called first to figure out whether
> > block reporting will take place from lower or upper.
> >
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/inode.c     | 17 ++++++++++++++++-
> >  fs/overlayfs/overlayfs.h |  1 +
> >  fs/overlayfs/util.c      | 16 ++++++++++++++++
> >  3 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index e1dbfed0c449..cd03f3e642fd 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -76,6 +76,9 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
> >         bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
> >         bool samefs = ovl_same_sb(dentry->d_sb);
> >         int err;
> > +       bool metacopy = false;
> > +
> > +       metacopy = ovl_is_metacopy_dentry(dentry);
> >
> >         type = ovl_path_real(dentry, &realpath);
> >         old_cred = ovl_override_creds(dentry->d_sb);
> > @@ -93,7 +96,8 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
> >         if (!is_dir || samefs) {
> >                 if (OVL_TYPE_ORIGIN(type)) {
> >                         struct kstat lowerstat;
> > -                       u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
> > +                       u32 lowermask = STATX_INO | STATX_BLOCKS |
> > +                                       (!is_dir ? STATX_NLINK : 0);
> 
> Leftover.
> 
> But I think you could do with one vfs_getattr():
> 
>          if (!is_dir || samefs) {
>                    u32 lowermask = metacopy ? STATX_BLOCKS : 0;
>                    if (OVL_TYPE_ORIGIN(type))
>                            lowermask |= STATX_INO | (!is_dir ? STATX_NLINK : 0);
>                    if (lowermask) {
>                            ovl_path_lower(dentry, &realpath);
>                            err = vfs_getattr(&realpath, &lowerstat,

We now support mid-layer metacopy dentries. So we will need to do
vfs_getattr() on the lowest data dentry for data blocks, while the rest of
the info (inode number and nlink) comes from the topmost lower dentry.

So I don't think we can get rid of extra vfs_getattr()?

Vivek
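
The split described above, with inode number and nlink taken from the
topmost lower dentry and the block count taken from the lowest data dentry,
can be modelled with plain structs. A minimal sketch, assuming hypothetical
types lower_stat and merged_stat and a lower stack ordered from topmost
(index 0) to the data dentry (last index). The real code would call
vfs_getattr() on each path, and only uses the lower ino/nlink in the cases
ovl_getattr() already handles:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kstat of one lower layer entry. */
struct lower_stat {
	uint64_t ino;
	uint32_t nlink;
	uint64_t blocks;
};

/* Hypothetical stand-in for the kstat returned to the caller. */
struct merged_stat {
	uint64_t ino;
	uint32_t nlink;
	uint64_t blocks;
};

/*
 * stack[0] is the topmost lower dentry, stack[numlower - 1] is the
 * lowest one, which holds the data when the entries above it are
 * metacopy only.
 */
void fill_from_lower(const struct lower_stat *stack, size_t numlower,
		     struct merged_stat *out)
{
	assert(stack != NULL && out != NULL && numlower > 0);

	/* Identity comes from the topmost lower dentry. */
	out->ino = stack[0].ino;
	out->nlink = stack[0].nlink;

	/* Block usage comes from the lowest (data) dentry. */
	out->blocks = stack[numlower - 1].blocks;
}
```

With a two-entry stack (a mid-layer metacopy dentry on top of the data
dentry) the two reads hit different entries, which is why a single
vfs_getattr() call does not cover both.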


* Re: [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization
  2018-04-02 19:35     ` Vivek Goyal
@ 2018-04-02 20:25       ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-04-02 20:25 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Mon, Apr 2, 2018 at 10:35 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 11:56:42AM +0300, Amir Goldstein wrote:
>> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > So far redirect could be placed on directories only and now it can be
>> > placed on regular files as well. Also it could be completely removed
>> > when a metacopy copy up file's data is copied up. That means if a redirect
>> > is present during ovl_lookup(), it could be gone by the time ovl_get_inode()
>> > happens.
>> >
>>
>> There is a bit of a mess in the assumptions.
>>
>> If the inode is pure upper or indexed origin, then the alleged race ends up
>> in !(inode->i_state & I_NEW) and you discard redirect anyway.
>
> Can't these also happen when I_NEW=true? I mean the inode could be flushed
> out of cache. Say one cpu is doing ovl_lookup() and the thread got blocked
> while another cpu did copy up of the file, removed the redirect, and the
> inode got flushed out of cache. Now cpu1 resumes execution and creates
> a new inode, but it needs to re-check whether the redirect is still
> present or not?
>

Not sure. Best to drop the idea of removing REDIRECT on copy up
and be done with it.

>>
>> If the inode is non-indexed copyup, then it is a different inode on disk
>> and different struct ovl_inode in memory than the inode of the copy up
>> we are allegedly racing with (they are broken hardlinks), so there is no
>> issue.
>
> Agreed that in case of broken hardlinks this race does not exist. But
> do we really want to optimize it here?
>

Well, if you are going to cleanup REDIRECT only on I_NEW,
then no need to special case broken hardlinks.

Thanks,
Amir.


* Re: [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower
  2018-04-02 20:11     ` Vivek Goyal
@ 2018-04-02 20:27       ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-04-02 20:27 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Mon, Apr 2, 2018 at 11:11 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 12:24:15PM +0300, Amir Goldstein wrote:
>> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > If an inode has been copied up metadata only, then we need to query the
>> > number of blocks from lower and fill up the stat->st_blocks.
>> >
>> > We need to be careful about races where we are doing stat on one cpu and
>> > data copy up is taking place on other cpu. We want to return
>> > stat->st_blocks either from lower or stable upper and not something in
>> > between. Hence, ovl_has_upperdata() is called first to figure out whether
>> > block reporting will take place from lower or upper.
>> >
>> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> > ---
>> >  fs/overlayfs/inode.c     | 17 ++++++++++++++++-
>> >  fs/overlayfs/overlayfs.h |  1 +
>> >  fs/overlayfs/util.c      | 16 ++++++++++++++++
>> >  3 files changed, 33 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
>> > index e1dbfed0c449..cd03f3e642fd 100644
>> > --- a/fs/overlayfs/inode.c
>> > +++ b/fs/overlayfs/inode.c
>> > @@ -76,6 +76,9 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>> >         bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
>> >         bool samefs = ovl_same_sb(dentry->d_sb);
>> >         int err;
>> > +       bool metacopy = false;
>> > +
>> > +       metacopy = ovl_is_metacopy_dentry(dentry);
>> >
>> >         type = ovl_path_real(dentry, &realpath);
>> >         old_cred = ovl_override_creds(dentry->d_sb);
>> > @@ -93,7 +96,8 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
>> >         if (!is_dir || samefs) {
>> >                 if (OVL_TYPE_ORIGIN(type)) {
>> >                         struct kstat lowerstat;
>> > -                       u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
>> > +                       u32 lowermask = STATX_INO | STATX_BLOCKS |
>> > +                                       (!is_dir ? STATX_NLINK : 0);
>>
>> Leftover.
>>
>> But I think you could do with one vfs_getattr():
>>
>>          if (!is_dir || samefs) {
>>                    u32 lowermask = metacopy ? STATX_BLOCKS : 0;
>>                    if (OVL_TYPE_ORIGIN(type))
>>                            lowermask |= STATX_INO | (!is_dir ? STATX_NLINK : 0);
>>                    if (lowermask) {
>>                            ovl_path_lower(dentry, &realpath);
>>                            err = vfs_getattr(&realpath, &lowerstat,
>
> We now support mid-layer metacopy dentries. So we will need to do
> vfs_getattr() on the lowest data dentry for data blocks, while the rest of
> the info (inode number and nlink) comes from the topmost lower dentry.
>
> So I don't think we can get rid of extra vfs_getattr()?
>

Right. I didn't notice that subtle detail. Better to amend the commit
message to mention it.

Thanks,
Amir.


* Re: [PATCH v13 18/28] ovl: Check redirects for metacopy files
  2018-03-30 10:02   ` Amir Goldstein
@ 2018-04-02 20:29     ` Vivek Goyal
  2018-04-03  5:44       ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-02 20:29 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 01:02:32PM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Right now we rely on path based lookup for data origin of metacopy upper.
> > This will work only if upper has not been renamed. We solved this problem
> > already for merged directories using redirect. Use same logic for metacopy
> > files.
> >
> > This patch just goes on to check redirects for metacopy files.
> >
> 
> The patch changes the logic very subtly, but it is really hard to
> follow because of convoluted diff. Please make changes that don't
> change logic in a separate patch.
> 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/namei.c | 24 ++++++++++++------------
> >  1 file changed, 12 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> > index 1dba89e9543f..a7a9588f64b7 100644
> > --- a/fs/overlayfs/namei.c
> > +++ b/fs/overlayfs/namei.c
> > @@ -260,18 +260,19 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> >                         goto out_err;
> >                 d->stop = !err;
> >                 d->metacopy = !!err;
> > -               goto out;
> > -       }
> > -       if (last_element)
> > -               d->is_dir = true;
> > -       if (d->last)
> > -               goto out;
> > -
> > -       if (ovl_is_opaquedir(this)) {
> > -               d->stop = true;
> > +               if (!d->metacopy)
> > +                       goto out;
> > +       } else {
> >                 if (last_element)
> > -                       d->opaque = true;
> > -               goto out;
> > +                       d->is_dir = true;
> > +               if (d->last)
> > +                       goto out;
> 
> If I am not mistaken, the d->last test is relevant to both dir and non-dir,
> because it also prevents the unneeded redirect check.

It is just an optimization which we do only for directories as of now
and this patch does not change that behavior. For non-dirs we never
check redirects anyway as of now. 

One could argue that this optimization should be introduced for metacopy
files as well. I could do that. It is just that I have not focused much
on optimizations. This series has been growing and dragging on long
enough that I want to focus on correctness first and worry about
optimizations later.

Vivek
> 
> Thanks,
> Amir.


* Re: [PATCH v13 18/28] ovl: Check redirects for metacopy files
  2018-04-02 20:29     ` Vivek Goyal
@ 2018-04-03  5:44       ` Amir Goldstein
  2018-04-03 12:31         ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-03  5:44 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Mon, Apr 2, 2018 at 11:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 01:02:32PM +0300, Amir Goldstein wrote:
>> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > Right now we rely on path based lookup for data origin of metacopy upper.
>> > This will work only if upper has not been renamed. We solved this problem
>> > already for merged directories using redirect. Use same logic for metacopy
>> > files.
>> >
>> > This patch just goes on to check redirects for metacopy files.
>> >
>>
>> The patch changes the logic very subtly, but it is really hard to
>> follow because of convoluted diff. Please make changes that don't
>> change logic in a separate patch.
>>
>> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> > ---
>> >  fs/overlayfs/namei.c | 24 ++++++++++++------------
>> >  1 file changed, 12 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
>> > index 1dba89e9543f..a7a9588f64b7 100644
>> > --- a/fs/overlayfs/namei.c
>> > +++ b/fs/overlayfs/namei.c
>> > @@ -260,18 +260,19 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
>> >                         goto out_err;
>> >                 d->stop = !err;
>> >                 d->metacopy = !!err;
>> > -               goto out;
>> > -       }
>> > -       if (last_element)
>> > -               d->is_dir = true;
>> > -       if (d->last)
>> > -               goto out;
>> > -
>> > -       if (ovl_is_opaquedir(this)) {
>> > -               d->stop = true;
>> > +               if (!d->metacopy)
>> > +                       goto out;
>> > +       } else {
>> >                 if (last_element)
>> > -                       d->opaque = true;
>> > -               goto out;
>> > +                       d->is_dir = true;
>> > +               if (d->last)
>> > +                       goto out;
>>
>> If I am not mistaken, the d->last test is relevant to both dir and non-dir,
>> because it also prevents the unneeded redirect check.
>
> It is just an optimization which we do only for directories as of now
> and this patch does not change that behavior. For non-dirs we never
> check redirects anyway as of now.

This is a moot argument.
Of course this patch changes behavior.
It adds the ability to redirect non-dir and it should add the same logic as
for dir.

>
> One could argue that this optimization should be introduced for metacopy
> files as well. I could do that. It is just that I have not focused much
> on optimizations. This series has been growing and dragging on long
> enough that I want to focus on correctness first and worry about
> optimizations later.

I hear your frustration, but I think the patch set is progressing very nicely
(and got a few prep patches already in overlayfs-next), so this isn't going
to break you ;-)

 if (!d->metacopy || d->last)
         goto out;

Thanks,
Amir.
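
Put together with the directory handling in the hunk quoted above, the
suggestion amounts to one rule for when ovl_lookup_single() goes on to
check for a redirect. A minimal sketch, assuming a hypothetical predicate
should_check_redirect() with booleans standing in for the dentry type,
d->metacopy, d->last and the result of ovl_is_opaquedir():

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when lookup should go on to read the redirect xattr.
 * - Nothing to follow on the last layer, for dirs and non-dirs alike.
 * - A non-dir only follows a redirect if it is a metacopy.
 * - An opaque dir stops the lookup, so no redirect is needed.
 */
bool should_check_redirect(bool is_dir, bool metacopy, bool last, bool opaque)
{
	if (last)
		return false;
	if (!is_dir)
		return metacopy;
	return !opaque;
}

/* Example cases, checked with assert(). */
void redirect_examples(void)
{
	/* Metacopy file on a middle layer: follow the redirect. */
	assert(should_check_redirect(false, true, false, false));
	/* Metacopy file on the last layer: nothing below to follow. */
	assert(!should_check_redirect(false, true, true, false));
	/* Regular file with data: lookup stops here. */
	assert(!should_check_redirect(false, false, false, false));
	/* Merge dir on a middle layer: follow the redirect. */
	assert(should_check_redirect(true, false, false, false));
	/* Opaque dir: lookup stops here. */
	assert(!should_check_redirect(true, false, false, true));
}
```

The non-dir branch corresponds to "if (!d->metacopy || d->last) goto out;".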


* Re: [PATCH v13 18/28] ovl: Check redirects for metacopy files
  2018-04-03  5:44       ` Amir Goldstein
@ 2018-04-03 12:31         ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-03 12:31 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Tue, Apr 03, 2018 at 08:44:33AM +0300, Amir Goldstein wrote:
> On Mon, Apr 2, 2018 at 11:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Mar 30, 2018 at 01:02:32PM +0300, Amir Goldstein wrote:
> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > Right now we rely on path based lookup for data origin of metacopy upper.
> >> > This will work only if upper has not been renamed. We solved this problem
> >> > already for merged directories using redirect. Use same logic for metacopy
> >> > files.
> >> >
> >> > This patch just goes on to check redirects for metacopy files.
> >> >
> >>
> >> The patch changes the logic very subtly, but it is really hard to
> >> follow because of convoluted diff. Please make changes that don't
> >> change logic in a separate patch.
> >>
> >> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >> > ---
> >> >  fs/overlayfs/namei.c | 24 ++++++++++++------------
> >> >  1 file changed, 12 insertions(+), 12 deletions(-)
> >> >
> >> > diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> >> > index 1dba89e9543f..a7a9588f64b7 100644
> >> > --- a/fs/overlayfs/namei.c
> >> > +++ b/fs/overlayfs/namei.c
> >> > @@ -260,18 +260,19 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> >> >                         goto out_err;
> >> >                 d->stop = !err;
> >> >                 d->metacopy = !!err;
> >> > -               goto out;
> >> > -       }
> >> > -       if (last_element)
> >> > -               d->is_dir = true;
> >> > -       if (d->last)
> >> > -               goto out;
> >> > -
> >> > -       if (ovl_is_opaquedir(this)) {
> >> > -               d->stop = true;
> >> > +               if (!d->metacopy)
> >> > +                       goto out;
> >> > +       } else {
> >> >                 if (last_element)
> >> > -                       d->opaque = true;
> >> > -               goto out;
> >> > +                       d->is_dir = true;
> >> > +               if (d->last)
> >> > +                       goto out;
> >>
> >> If I am not mistaken, the d->last test is relevant to both dir and non-dir,
> >> because it also prevents the unneeded redirect check.
> >
> > It is just an optimization which we do only for directories as of now
> > and this patch does not change that behavior. For non-dirs we never
> > check redirects anyway as of now.
> 
> This is a moot argument.
> Of course this patch changes behavior.
> It adds the ability to redirect non-dir and it should add the same logic as
> for dir.
> 
> >
> > One could argue that this optimization should be introduced for metacopy
> > files as well. I could do that. Just that I did not focus too much
> > on optimizations a lot. This series has been growing and dragging
> > enough, that I want to focus on correctness first. And worry about
> > optimizations later.
> 
> I hear your frustration, but I think the patch set is progressing very nicely
> (and got a few prep patches already in overlayfs-next), so this isn't going
> to break you ;-)
> 
>  if (!d->metacopy || d->last)
>          goto out;

Ok, I will add this.

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-03-30 10:53   ` Amir Goldstein
  2018-04-02 12:39     ` Vivek Goyal
@ 2018-04-04 12:29     ` Vivek Goyal
  2018-04-04 12:51       ` Amir Goldstein
  1 sibling, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-04 12:29 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > If we find a upper metacopy inode, make sure we also found associated data
> > dentry in lower. Otherwise copy up operation later will fail.
> >
> > There are two cases where this can happen. First case is that somehow
> > data file was removed from lower layer. Other case is that REDIRECT
> > xattr was removed due to copy up of file on another cpu (when inode is
> > shared between two dentries) and hence ovl_lookup() could not find the
> > correct dentry.
> >
> 
> Remind me again why we remove REDIRECT xattr?
> Is it a must for functionality or just for being boy scouts?
> I would prefer to avoid having to deal with races of this sort.
> You can cleanup REDIRECT for non-dir that is not metacopy
> on lookup when finding a I_NEW inode.

Ok, thinking more about it. If we were to clean REDIRECT on lookup when
finding I_NEW inode, that means we will have to always do
vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
That does not sound like a very good idea. It's unnecessary overhead in
the lookup path.

IOW, I think removing REDIRECT and doing appropriate locking around
ovl_inode->redirect is probably better.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-04 12:29     ` Vivek Goyal
@ 2018-04-04 12:51       ` Amir Goldstein
  2018-04-04 13:21         ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-04 12:51 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Apr 4, 2018 at 3:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
>> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > If we find a upper metacopy inode, make sure we also found associated data
>> > dentry in lower. Otherwise copy up operation later will fail.
>> >
>> > There are two cases where this can happen. First case is that somehow
>> > data file was removed from lower layer. Other case is that REDIRECT
>> > xattr was removed due to copy up of file on another cpu (when inode is
>> > shared between two dentries) and hence ovl_lookup() could not find the
>> > correct dentry.
>> >
>>
>> Remind me again why we remove REDIRECT xattr?
>> Is it a must for functionality or just for being boy scouts?
>> I would prefer to avoid having to deal with races of this sort.
>> You can cleanup REDIRECT for non-dir that is not metacopy
>> on lookup when finding a I_NEW inode.
>
> Ok, thinking more about it. If we were to clean REDIRECT on lookup when
> finding I_NEW inode, that means we will have to always do
> vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
> That does not sound like a very good idea. Its unnecessary overhead in
> lookup path.
>
> IOW, I think removing REDIRECT and doing appropriate locking around
> ovl_inode->redirect is probably better.
>

Here is what I propose.
During lookup, you anyway check REDIRECT and check METACOPY
on upper and then call ovl_get_inode() with upper redirect and upper
metacopy information.

IF this is a new inode AND both REDIRECT and METACOPY were
found on upper THEN it is safe to remove REDIRECT xattr.

Maybe I am missing something, but I don't see where the extra overhead
is, beyond the overhead already there for metacopy enabled lookup.

OTOH, I don't see what cleaning REDIRECT gets us in the first place.
During lookup, REDIRECT does not affect non metacopy non-dir,
because we skip ovl_check_redirect().
REDIRECT could actually be useful for reconstructing ORIGIN xattr
and index after copying layers, so I am not sure it's a good thing to remove it
at all. After all, redirects are pretty rare as it is.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-04 12:51       ` Amir Goldstein
@ 2018-04-04 13:21         ` Vivek Goyal
  2018-04-04 15:51           ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-04 13:21 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Wed, Apr 04, 2018 at 03:51:37PM +0300, Amir Goldstein wrote:
> On Wed, Apr 4, 2018 at 3:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > If we find a upper metacopy inode, make sure we also found associated data
> >> > dentry in lower. Otherwise copy up operation later will fail.
> >> >
> >> > There are two cases where this can happen. First case is that somehow
> >> > data file was removed from lower layer. Other case is that REDIRECT
> >> > xattr was removed due to copy up of file on another cpu (when inode is
> >> > shared between two dentries) and hence ovl_lookup() could not find the
> >> > correct dentry.
> >> >
> >>
> >> Remind me again why we remove REDIRECT xattr?
> >> Is it a must for functionality or just for being boy scouts?
> >> I would prefer to avoid having to deal with races of this sort.
> >> You can cleanup REDIRECT for non-dir that is not metacopy
> >> on lookup when finding a I_NEW inode.
> >
> > Ok, thinking more about it. If we were to clean REDIRECT on lookup when
> > finding I_NEW inode, that means we will have to always do
> > vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
> > That does not sound like a very good idea. Its unnecessary overhead in
> > lookup path.
> >
> > IOW, I think removing REDIRECT and doing appropriate locking around
> > ovl_inode->redirect is probably better.
> >
> 
> Here is what I propose.
> During lookup, you anyway check REDIRECT and check METACOPY
> on upper and then call ovl_get_inode() with upper redirect and upper
> metacopy information.

We check for upperredirect only if metacopy xattr is found. Otherwise
we skip checking for redirect.

https://github.com/rhvgoyal/linux/blob/metacopy-v13/fs/overlayfs/namei.c#L270

> 
> IF this is a new inode AND both REDIRECT and METACOPY were
> found on upper THEN it is safe to remove REDIRECT xattr.

If both METACOPY and REDIRECT were found, then we should not remove
REDIRECT. That REDIRECT is still useful. REDIRECT should be removed
only if METACOPY is not found and REDIRECT is found (on a non-dir file).
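
The rule stated above can be written down as a small predicate. This is a
minimal sketch in userspace C; struct upper_xattrs and redirect_is_stale() are
hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical summary of what a lookup saw on the upper inode. */
struct upper_xattrs {
	bool is_dir;
	bool has_metacopy; /* METACOPY xattr present */
	bool has_redirect; /* REDIRECT xattr present */
};

/* True when REDIRECT is a leftover that could be cleaned up: a non-dir
 * that has REDIRECT but no METACOPY. The rule only concerns non-dirs,
 * so directories are never reported as stale here. */
bool redirect_is_stale(const struct upper_xattrs *x)
{
	if (x->is_dir)
		return false;
	return x->has_redirect && !x->has_metacopy;
}
```

Note that evaluating this needs has_redirect for every non-dir, which is the
extra REDIRECT check on all non-dir files that the overhead concern is about.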

> 
> Maybe I am missing something, but I don't see where the extra overhead
> is, beyond the overhead already there for metacopy enabled lookup.

Given we don't check for REDIRECT if upper is not METACOPY, that means
we will have to always check for REDIRECT in ovl_get_inode() and add
the unnecessary overhead (To all non-dir files).

> 
> OTOH, I don't see what cleaning REDIRECT gets us in the first place.
> During lookup, REDIRECT does not affect non metacopy non-dir,
> because we skip ovl_check_redirect().
> REDIRECT could actually be useful for reconstructing ORIGIN xattr
> and index after copying layers, so not sure its a good thing to remove it
> at all. After all, redirects are pretty rare as it is.

I see it as an unnecessary xattr and feel that cleaning it up is a
good idea. Given we are thinking of packing the REDIRECT xattr in the tar file
for the layer backup and restore case, it makes even more sense to clean
it up; otherwise it shows up everywhere unnecessarily. I personally
think it is always a good idea to clean up information you don't need
anymore, instead of letting it sit around.

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-04 13:21         ` Vivek Goyal
@ 2018-04-04 15:51           ` Amir Goldstein
  2018-04-05 14:37             ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-04 15:51 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Apr 4, 2018 at 4:21 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 04, 2018 at 03:51:37PM +0300, Amir Goldstein wrote:
>> On Wed, Apr 4, 2018 at 3:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
>> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > If we find a upper metacopy inode, make sure we also found associated data
>> >> > dentry in lower. Otherwise copy up operation later will fail.
>> >> >
>> >> > There are two cases where this can happen. First case is that somehow
>> >> > data file was removed from lower layer. Other case is that REDIRECT
>> >> > xattr was removed due to copy up of file on another cpu (when inode is
>> >> > shared between two dentries) and hence ovl_lookup() could not find the
>> >> > correct dentry.
>> >> >
>> >>
>> >> Remind me again why we remove REDIRECT xattr?
>> >> Is it a must for functionality or just for being boy scouts?
>> >> I would prefer to avoid having to deal with races of this sort.
>> >> You can cleanup REDIRECT for non-dir that is not metacopy
>> >> on lookup when finding a I_NEW inode.
>> >
>> > Ok, thinking more about it. If we were to clean REDIRECT on lookup when
>> > finding I_NEW inode, that means we will have to always do
>> > vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
>> > That does not sound like a very good idea. Its unnecessary overhead in
>> > lookup path.
>> >
>> > IOW, I think removing REDIRECT and doing appropriate locking around
>> > ovl_inode->redirect is probably better.
>> >
>>
>> Here is what I propose.
>> During lookup, you anyway check REDIRECT and check METACOPY
>> on upper and then call ovl_get_inode() with upper redirect and upper
>> metacopy information.
>
> We check for upperredirect only if metacopy xattr is found. Otherwise
> we skip checking for redirect.
>
> https://github.com/rhvgoyal/linux/blob/metacopy-v13/fs/overlayfs/namei.c#L270
>
>>
>> IF this is a new inode AND both REDIRECT and METACOPY were
>> found on upper THEN it is safe to remove REDIRECT xattr.
>
> If both METACOPY and REDIRECT were found, then we should not remove
> REDIRECT. That REDIRECT is still useful. REDIRECT should be removed
> only if METACOPY is not found and REDIRECT is found (on a non-dir file).
>
>>
>> Maybe I am missing something, but I don't see where the extra overhead
>> is, beyond the overhead already there for metacopy enabled lookup.
>
> Given we don't check for REDIRECT if upper is not METACOPY, that means
> we will have to always check for REDIRECT in ovl_get_inode() and add
> the unnecessary overhead (To all non-dir files).
>

I see.

>>
>> OTOH, I don't see what cleaning REDIRECT gets us in the first place.
>> During lookup, REDIRECT does not affect non metacopy non-dir,
>> because we skip ovl_check_redirect().
>> REDIRECT could actually be useful for reconstructing ORIGIN xattr
>> and index after copying layers, so not sure its a good thing to remove it
>> at all. After all, redirects are pretty rare as it is.
>
> I see it as unnecessary xattr present and feel that cleaning it is a
> good idea. Given we are thinking of packing REDIRECT xattr in tar file
> for layer backup and restore case, it makes even more sense to clean
> it up otherwise it shows up every where unnecessarily. I personally
> think it is always a good idea to cleanup information you don't need
> anymore, instead of letting it sit around.
>

Look. I have no objection to cleaning REDIRECT, but I am saying it
is tricky, so I think it is going to cost you a few more patches and maybe
a few more review cycles, so I advised against it.

But here is another idea: store the redirect string in the METACOPY
xattr, this way, removal of METACOPY xattr atomically cleans up
REDIRECT and in lookup, only need to check METACOPY xattr
(exists but empty means no redirect).
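
To make the three states of such a combined xattr concrete, here is a minimal
userspace sketch. The value layout, struct metacopy_info and
parse_metacopy_value() are assumptions made for illustration only; nothing
here is an existing on-disk format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded form of a combined METACOPY/REDIRECT xattr. */
struct metacopy_info {
	bool metacopy;      /* the xattr exists */
	bool has_redirect;  /* the xattr value is non-empty */
	char redirect[256]; /* NUL-terminated copy of the value */
};

/* len < 0 models "xattr absent", len == 0 models "present but empty".
 * Returns 0 on success, -1 if the value does not fit in the buffer. */
int parse_metacopy_value(const char *value, long len,
			 struct metacopy_info *out)
{
	memset(out, 0, sizeof(*out));
	if (len < 0)
		return 0; /* not a metacopy file */
	out->metacopy = true;
	if (len == 0)
		return 0; /* metacopy, found at the same path */
	if ((size_t)len >= sizeof(out->redirect))
		return -1;
	memcpy(out->redirect, value, (size_t)len);
	out->redirect[len] = '\0';
	out->has_redirect = true;
	return 0;
}
```

With this layout a single xattr read answers both questions, and removing the
xattr drops the metacopy marker and the redirect together.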

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-04 15:51           ` Amir Goldstein
@ 2018-04-05 14:37             ` Vivek Goyal
  2018-04-05 18:22               ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-05 14:37 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Wed, Apr 04, 2018 at 06:51:57PM +0300, Amir Goldstein wrote:
> On Wed, Apr 4, 2018 at 4:21 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Apr 04, 2018 at 03:51:37PM +0300, Amir Goldstein wrote:
> >> On Wed, Apr 4, 2018 at 3:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
> >> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> >> > If we find a upper metacopy inode, make sure we also found associated data
> >> >> > dentry in lower. Otherwise copy up operation later will fail.
> >> >> >
> >> >> > There are two cases where this can happen. First case is that somehow
> >> >> > data file was removed from lower layer. Other case is that REDIRECT
> >> >> > xattr was removed due to copy up of file on another cpu (when inode is
> >> >> > shared between two dentries) and hence ovl_lookup() could not find the
> >> >> > correct dentry.
> >> >> >
> >> >>
> >> >> Remind me again why we remove REDIRECT xattr?
> >> >> Is it a must for functionality or just for being boy scouts?
> >> >> I would prefer to avoid having to deal with races of this sort.
> >> >> You can cleanup REDIRECT for non-dir that is not metacopy
> >> >> on lookup when finding a I_NEW inode.
> >> >
> >> > Ok, thinking more about it. If we were to clean REDIRECT on lookup when
> >> > finding I_NEW inode, that means we will have to always do
> >> > vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
> >> > That does not sound like a very good idea. Its unnecessary overhead in
> >> > lookup path.
> >> >
> >> > IOW, I think removing REDIRECT and doing appropriate locking around
> >> > ovl_inode->redirect is probably better.
> >> >
> >>
> >> Here is what I propose.
> >> During lookup, you anyway check REDIRECT and check METACOPY
> >> on upper and then call ovl_get_inode() with upper redirect and upper
> >> metacopy information.
> >
> > We check for upperredirect only if metacopy xattr is found. Otherwise
> > we skip checking for redirect.
> >
> > https://github.com/rhvgoyal/linux/blob/metacopy-v13/fs/overlayfs/namei.c#L270
> >
> >>
> >> IF this is a new inode AND both REDIRECT and METACOPY were
> >> found on upper THEN it is safe to remove REDIRECT xattr.
> >
> > If both METACOPY and REDIRECT were found, then we should not remove
> > REDIRECT. That REDIRECT is still useful. REDIRECT should be removed
> > only if METACOPY is not found and REDIRECT is found (on a non-dir file).
> >
> >>
> >> Maybe I am missing something, but I don't see where the extra overhead
> >> is, beyond the overhead already there for metacopy enabled lookup.
> >
> > Given we don't check for REDIRECT if upper is not METACOPY, that means
> > we will have to always check for REDIRECT in ovl_get_inode() and add
> > the unnecessary overhead (To all non-dir files).
> >
> 
> I see.
> 
> >>
> >> OTOH, I don't see what cleaning REDIRECT gets us in the first place.
> >> During lookup, REDIRECT does not affect non metacopy non-dir,
> >> because we skip ovl_check_redirect().
> >> REDIRECT could actually be useful for reconstructing ORIGIN xattr
> >> and index after copying layers, so not sure its a good thing to remove it
> >> at all. After all, redirects are pretty rare as it is.
> >
> > I see it as unnecessary xattr present and feel that cleaning it is a
> > good idea. Given we are thinking of packing REDIRECT xattr in tar file
> > for layer backup and restore case, it makes even more sense to clean
> > it up otherwise it shows up every where unnecessarily. I personally
> > think it is always a good idea to cleanup information you don't need
> > anymore, instead of letting it sit around.
> >
> 
> Look. I have no objection to cleaning REDIRECT, but I am saying it
> is tricky, so I think it is going to cost you a few more patches and maybe
> a few more review cycles, so I advised against it.

Hi Amir,

Anyway, I doubt these patches are going to be merged for 4.17. So
I am fine if it takes a few more revisions to review it properly. Doing
it properly is more important. (Despite the fact that I am a little
exhausted now. :-))

> 
> But here is another idea: store the redirect string in the METACOPY
> xattr, this way, removal of METACOPY xattr atomically cleans up
> REDIRECT and in lookup, only need to check METACOPY xattr
> (exists but empty means no redirect).

I don't like this much. I had thought about it but did not pursue it.

- First of all, I don't like that REDIRECT for dir and non-dir will be
  stored differently. 

- Secondly, the xattr is just one piece. We also need to protect the
  ovl_inode->redirect field, and this will not solve that issue. That issue
  can be solved only if we provide proper locking so that readers and
  writers of ovl_inode->redirect don't race and run into unexpected
  surprises.

Given VFS locking does not protect against copy up path races with
rename(), I think the right solution for this problem is to protect
against this race by taking ovl_inode->lock. I think this is something
future readers can understand and build more functionality on top of.

If we are primarily worried about races against copy up for redirect, then
we probably don't have to double lock both ovl_inodes. As you suggested,
I should be able to move out set_redirect() earlier in rename
path and take one lock at a time. That should simplify the locking
logic a bit. How about this instead?
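
As a rough illustration of the single-lock model, here is a userspace sketch
that uses a pthread mutex in place of ovl_inode->lock. struct model_inode and
the model_*() helpers are hypothetical and only mimic the idea; they are not
the kernel implementation.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the overlay inode. */
struct model_inode {
	pthread_mutex_t lock;
	char *redirect; /* NULL means no redirect */
};

void model_inode_init(struct model_inode *oi)
{
	pthread_mutex_init(&oi->lock, NULL);
	oi->redirect = NULL;
}

/* Writer side (rename setting a redirect, copy up clearing it).
 * Pass NULL to clear. Returns 0 on success, -1 if allocation fails. */
int model_set_redirect(struct model_inode *oi, const char *path)
{
	char *copy = NULL;

	if (path) {
		copy = malloc(strlen(path) + 1);
		if (!copy)
			return -1;
		strcpy(copy, path);
	}
	pthread_mutex_lock(&oi->lock);
	free(oi->redirect);
	oi->redirect = copy;
	pthread_mutex_unlock(&oi->lock);
	return 0;
}

/* Reader side: copy the string out under the lock so the caller never
 * holds a pointer that a writer may free. Returns 1 if a redirect was
 * copied, 0 if there is none, -1 if buf is too small. */
int model_get_redirect(struct model_inode *oi, char *buf, size_t size)
{
	int ret = 0;

	pthread_mutex_lock(&oi->lock);
	if (oi->redirect) {
		if (strlen(oi->redirect) < size) {
			strcpy(buf, oi->redirect);
			ret = 1;
		} else {
			ret = -1;
		}
	}
	pthread_mutex_unlock(&oi->lock);
	return ret;
}
```

A rename that touches two inodes would call model_set_redirect() on each in
turn, so only one lock is held at any time and no lock ordering between the
two inodes is needed.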

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-05 14:37             ` Vivek Goyal
@ 2018-04-05 18:22               ` Vivek Goyal
  2018-04-05 19:58                 ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-05 18:22 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Apr 05, 2018 at 10:37:57AM -0400, Vivek Goyal wrote:
> On Wed, Apr 04, 2018 at 06:51:57PM +0300, Amir Goldstein wrote:
> > On Wed, Apr 4, 2018 at 4:21 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > On Wed, Apr 04, 2018 at 03:51:37PM +0300, Amir Goldstein wrote:
> > >> On Wed, Apr 4, 2018 at 3:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > >> > On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
> > >> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > >> >> > If we find a upper metacopy inode, make sure we also found associated data
> > >> >> > dentry in lower. Otherwise copy up operation later will fail.
> > >> >> >
> > >> >> > There are two cases where this can happen. First case is that somehow
> > >> >> > data file was removed from lower layer. Other case is that REDIRECT
> > >> >> > xattr was removed due to copy up of file on another cpu (when inode is
> > >> >> > shared between two dentries) and hence ovl_lookup() could not find the
> > >> >> > correct dentry.
> > >> >> >
> > >> >>
> > >> >> Remind me again why we remove REDIRECT xattr?
> > >> >> Is it a must for functionality or just for being boy scouts?
> > >> >> I would prefer to avoid having to deal with races of this sort.
> > >> >> You can cleanup REDIRECT for non-dir that is not metacopy
> > >> >> on lookup when finding a I_NEW inode.
> > >> >
> > >> > Ok, thinking more about it. If we were to clean REDIRECT on lookup when
> > >> > finding I_NEW inode, that means we will have to always do
> > >> > vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
> > >> > That does not sound like a very good idea. Its unnecessary overhead in
> > >> > lookup path.
> > >> >
> > >> > IOW, I think removing REDIRECT and doing appropriate locking around
> > >> > ovl_inode->redirect is probably better.
> > >> >
> > >>
> > >> Here is what I propose.
> > >> During lookup, you anyway check REDIRECT and check METACOPY
> > >> on upper and then call ovl_get_inode() with upper redirect and upper
> > >> metacopy information.
> > >
> > > We check for upperredirect only if metacopy xattr is found. Otherwise
> > > we skip checking for redirect.
> > >
> > > https://github.com/rhvgoyal/linux/blob/metacopy-v13/fs/overlayfs/namei.c#L270
> > >
> > >>
> > >> IF this is a new inode AND both REDIRECT and METACOPY were
> > >> found on upper THEN it is safe to remove REDIRECT xattr.
> > >
> > > If both METACOPY and REDIRECT were found, then we should not remove
> > > REDIRECT. That REDIRECT is still useful. REDIRECT should be removed
> > > only if METACOPY is not found and REDIRECT is found (on a non-dir file).
> > >
> > >>
> > >> Maybe I am missing something, but I don't see where the extra overhead
> > >> is, beyond the overhead already there for metacopy enabled lookup.
> > >
> > > Given we don't check for REDIRECT if upper is not METACOPY, that means
> > > we will have to always check for REDIRECT in ovl_get_inode() and add
> > > the unnecessary overhead (To all non-dir files).
> > >
> > 
> > I see.
> > 
> > >>
> > >> OTOH, I don't see what cleaning REDIRECT gets us in the first place.
> > >> During lookup, REDIRECT does not affect non metacopy non-dir,
> > >> because we skip ovl_check_redirect().
> > >> REDIRECT could actually be useful for reconstructing ORIGIN xattr
> > >> and index after copying layers, so not sure its a good thing to remove it
> > >> at all. After all, redirects are pretty rare as it is.
> > >
> > > I see it as unnecessary xattr present and feel that cleaning it is a
> > > good idea. Given we are thinking of packing REDIRECT xattr in tar file
> > > for layer backup and restore case, it makes even more sense to clean
> > > it up otherwise it shows up every where unnecessarily. I personally
> > > think it is always a good idea to cleanup information you don't need
> > > anymore, instead of letting it sit around.
> > >
> > 
> > Look. I have no objection to cleaning REDIRECT, but I am saying it
> > is tricky, so I think it is going to cost you a few more patches and maybe
> > a few more review cycles, so I advised against it.
> 
> Hi Amir,
> 
> Anyway, I doubt these patches are going to be merged for 4.17. So 
> I am fine if it takes few more revisions to properly review it. Doing
> it properly is more important. (Despite the fact that I am little
> exhausted now. :-))
> 
> > 
> > But here is another idea: store the redirect string in the METACOPY
> > xattr, this way, removal of METACOPY xattr atomically cleans up
> > REDIRECT and in lookup, only need to check METACOPY xattr
> > (exists but empty means no redirect).
> 
> I don't like this much. I had thought about it but did not pursue it.
> 
> - First of all, I don't like that REDIRECT for dir and non-dir will be
>   stored differently. 
> 
> - Secondly, the xattr is just one piece. We also need to protect the
>   ovl_inode->redirect field, and this will not solve that issue. That issue
>   can be solved only if we provide proper locking so that readers and
>   writers of ovl_inode->redirect don't race and run into unexpected
>   surprises.
> 
> Given VFS locking does not protect against copy up path races with
> rename(), I think that right solution for this problem is to protect
> against this race by taking ovl_inode->lock. I think this is something
> future readers can understand and build more functionality on top.
> 
> If we are primarily worried about races against copy up for redirect, then
> we probably don't have to double lock both ovl_inodes. As you suggested,
> I should be able to move out set_redirect() earlier in rename
> path and take one lock at a time. That should simplify the locking
> logic a bit. How about this instead?

If you don't like this ovl_inode->lock locking model, I guess for now
I could live with not removing REDIRECT after copy up. Once that gets
merged, I could do one patch series just to clean up REDIRECT after copy
up and do proper locking.

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-05 18:22               ` Vivek Goyal
@ 2018-04-05 19:58                 ` Amir Goldstein
  2018-04-05 20:45                   ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-05 19:58 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Apr 5, 2018 at 9:22 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Apr 05, 2018 at 10:37:57AM -0400, Vivek Goyal wrote:
>> On Wed, Apr 04, 2018 at 06:51:57PM +0300, Amir Goldstein wrote:
>> > On Wed, Apr 4, 2018 at 4:21 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > > On Wed, Apr 04, 2018 at 03:51:37PM +0300, Amir Goldstein wrote:
>> > >> On Wed, Apr 4, 2018 at 3:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > >> > On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
>> > >> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > >> >> > If we find a upper metacopy inode, make sure we also found associated data
>> > >> >> > dentry in lower. Otherwise copy up operation later will fail.
>> > >> >> >
>> > >> >> > There are two cases where this can happen. First case is that somehow
>> > >> >> > data file was removed from lower layer. Other case is that REDIRECT
>> > >> >> > xattr was removed due to copy up of file on another cpu (when inode is
>> > >> >> > shared between two dentries) and hence ovl_lookup() could not find the
>> > >> >> > correct dentry.
>> > >> >> >
>> > >> >>
>> > >> >> Remind me again why we remove REDIRECT xattr?
>> > >> >> Is it a must for functionality or just for being boy scouts?
>> > >> >> I would prefer to avoid having to deal with races of this sort.
>> > >> >> You can cleanup REDIRECT for non-dir that is not metacopy
>> > >> >> on lookup when finding a I_NEW inode.
>> > >> >
>> > >> > Ok, thinking more about it. If we were to clean REDIRECT on lookup when
>> > >> > finding I_NEW inode, that means we will have to always do
>> > >> > vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
>> > >> > That does not sound like a very good idea. Its unnecessary overhead in
>> > >> > lookup path.
>> > >> >
>> > >> > IOW, I think removing REDIRECT and doing appropriate locking around
>> > >> > ovl_inode->redirect is probably better.
>> > >> >
>> > >>
>> > >> Here is what I propose.
>> > >> During lookup, you anyway check REDIRECT and check METACOPY
>> > >> on upper and then call ovl_get_inode() with upper redirect and upper
>> > >> metacopy information.
>> > >
>> > > We check for upperredirect only if metacopy xattr is found. Otherwise
>> > > we skip checking for redirect.
>> > >
>> > > https://github.com/rhvgoyal/linux/blob/metacopy-v13/fs/overlayfs/namei.c#L270
>> > >
>> > >>
>> > >> IF this is a new inode AND both REDIRECT and METACOPY were
>> > >> found on upper THEN it is safe to remove REDIRECT xattr.
>> > >
>> > > If both METACOPY and REDIRECT were found, then we should not remove
>> > > REDIRECT. That REDIRECT is still useful. REDIRECT should be removed
>> > > only if METACOPY is not found and REDIRECT is found (on a non-dir file).
>> > >
>> > >>
>> > >> Maybe I am missing something, but I don't see where the extra overhead
>> > >> is, beyond the overhead already there for metacopy enabled lookup.
>> > >
>> > > Given we don't check for REDIRECT if upper is not METACOPY, that means
>> > > we will have to always check for REDIRECT in ovl_get_inode() and add
>> > > the unnecessary overhead (To all non-dir files).
>> > >
>> >
>> > I see.
>> >
>> > >>
>> > >> OTOH, I don't see what cleaning REDIRECT gets us in the first place.
>> > >> During lookup, REDIRECT does not affect non metacopy non-dir,
>> > >> because we skip ovl_check_redirect().
>> > >> REDIRECT could actually be useful for reconstructing ORIGIN xattr
>> > >> and index after copying layers, so not sure its a good thing to remove it
>> > >> at all. After all, redirects are pretty rare as it is.
>> > >
>> > > I see it as unnecessary xattr present and feel that cleaning it is a
>> > > good idea. Given we are thinking of packing REDIRECT xattr in tar file
>> > > for layer backup and restore case, it makes even more sense to clean
>> > > it up otherwise it shows up every where unnecessarily. I personally
>> > > think it is always a good idea to cleanup information you don't need
>> > > anymore, instead of letting it sit around.
>> > >
>> >
>> > Look. I have no objection to cleaning REDIRECT, but I am saying it
>> > is tricky, so I think it is going to cost you a few more patches and maybe
>> > a few more review cycles, so I advised against it.
>>
>> Hi Amir,
>>
>> Anyway, I doubt these patches are going to be merged for 4.17. So
>> I am fine if it takes few more revisions to properly review it. Doing
>> it properly is more important. (Despite the fact that I am little
>> exhausted now. :-))
>>
>> >
>> > But here is another idea: store the redirect string in the METACOPY
>> > xattr, this way, removal of METACOPY xattr atomically cleans up
>> > REDIRECT and in lookup, only need to check METACOPY xattr
>> > (exists but empty means no redirect).
>>
>> I don't like this much. I had thought about it but did not pursue it.
>>
>> - First of all, I don't like that REDIRECT for dir and non-dir will be
>>   stored differently.
>>
>> - Secondly, xattr is just one piece. We also need to protect
>>   ovl_inode->redirect field and this will not solve that issue. That issue
>>   can be solved only if we provide proper locking so that readers and
>>   writers of ovl_inode->redirect don't race and run into unexpected
>>   surprises.
>>
>> Given VFS locking does not protect against copy up path races with
>> rename(), I think that right solution for this problem is to protect
>> against this race by taking ovl_inode->lock. I think this is something
>> future readers can understand and build more functionality on top.
>>
>> If we are primarily worried about races against copy up for redirect, then
>> we probably don't have to double lock both ovl_inodes. As you suggested,
>> I should be able to move out set_redirect() earlier in rename
>> path and take one lock at a time. That should simplify the locking
>> logic a bit. How about this instead?
>
> If you don't like this locking ovl_inode->lock model, I guess for now
> I could live with not removing REDIRECT after copy up. Once that gets
> merged, I could do one patch series just to clean up REDIRECT after copy
> up and do proper locking.
>

I think we are confusing two different things in this discussion.
Locking ovl_inode->lock for changing redirect is something that should
stay in the patch set IMO and should be simplified by moving
set redirect before taking rename locks on two inodes.

I was asking about removal of REDIRECT because this patch
(ovl_verify_metacopy_data) is a bit tricky for me to review and
I still don't feel confident about it.

My intuition says we could go other ways as well:
- unite METACOPY/REDIRECT xattr (we can call the united
xattr REDIRECT and not METACOPY)
- memory barriers between setting/clearing/checking
METACOPY/REDIRECT (there is already a barrier for setting
upperdata flag, so that's half the work already).

We can either have this discussion now or, as you suggested,
leave it to a following patch set. Rule of thumb - if this is v13
with 28 patches, it might not be a bad idea to defer 2 patches
and reduce complexity...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2018-04-02 13:56     ` Vivek Goyal
@ 2018-04-05 20:16       ` Amir Goldstein
  2018-04-06 13:51         ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-05 20:16 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Mon, Apr 2, 2018 at 4:56 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 07:52:17AM +0300, Amir Goldstein wrote:
>> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > By default metadata only copy up is disabled. Provide a mount option so
>> > that users can choose one way or other.
>> >
>> > Also provide a kernel config and module option to enable/disable
>> > metacopy feature.
>> >
>> > metacopy feature requires redirect_dir=on when upper is present. Otherwise,
>> > it requires redirect_dir=follow atleast.
>> >
>> > Like index feature, we verify on mount that upper root is not being
>> > reused with a different lower root.
>>
>> I don't see that in the patch
>
> Oh.., this is leftover from previous patches. Will remove this comment.
> I have completely got rid of dealing with ORIGIN when moving to
> REDIRECT based lookup.
>
>>
>> > This hopes to get the configuration
>> > right and detect the copied layers use case. But this does only so
>> > much as we don't verify all the lowers. So it is possible that a lower is
>> > missing and later data copy up fails.
>> >
>> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> > ---
>> >  Documentation/filesystems/overlayfs.txt | 30 ++++++++++++++++++++++++-
>> >  fs/overlayfs/Kconfig                    | 19 ++++++++++++++++
>> >  fs/overlayfs/ovl_entry.h                |  1 +
>> >  fs/overlayfs/super.c                    | 40 ++++++++++++++++++++++++++++++++-
>> >  4 files changed, 88 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
>> > index 6ea1e64d1464..b7720e61973c 100644
>> > --- a/Documentation/filesystems/overlayfs.txt
>> > +++ b/Documentation/filesystems/overlayfs.txt
>> > @@ -249,6 +249,30 @@ rightmost one and going left.  In the above example lower1 will be the
>> >  top, lower2 the middle and lower3 the bottom layer.
>> >
>> >
>> > +Metadata only copyup
>> > +--------------------
>> > +
>> > +When metadata only copy up feature is enabled, overlayfs will only copy
>> > +up metadata (as opposed to whole file), when a metadata specific operation
>> > +like chown/chmod is performed. Full file will be copied up later when
>> > +file is opened for WRITE operation.
>> > +
>> > +IOW, this is delayed data copy up operation and data is copied up when
>> > +there is a need to actually modify data.
>> > +

Vivek,

Some nitpicking here.
This document is about the only decent documentation for overlayfs.
I always try to live up to the standards set by Neil who started it.
Abbreviations (IOW) don't seem right to me in this context.
Also, you'll notice that the form "copyup" does not appear in this document.
It may sound petty, but if people wish to read a coherent document about
overlayfs, we would be wise to maintain coherency in the phrases we use
throughout the document.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-05 19:58                 ` Amir Goldstein
@ 2018-04-05 20:45                   ` Vivek Goyal
  2018-04-06  9:46                     ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-05 20:45 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Apr 05, 2018 at 10:58:58PM +0300, Amir Goldstein wrote:
> On Thu, Apr 5, 2018 at 9:22 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Apr 05, 2018 at 10:37:57AM -0400, Vivek Goyal wrote:
> >> On Wed, Apr 04, 2018 at 06:51:57PM +0300, Amir Goldstein wrote:
> >> > On Wed, Apr 4, 2018 at 4:21 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > > On Wed, Apr 04, 2018 at 03:51:37PM +0300, Amir Goldstein wrote:
> >> > >> On Wed, Apr 4, 2018 at 3:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > >> > On Fri, Mar 30, 2018 at 01:53:24PM +0300, Amir Goldstein wrote:
> >> > >> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > >> >> > If we find a upper metacopy inode, make sure we also found associated data
> >> > >> >> > dentry in lower. Otherwise copy up operation later will fail.
> >> > >> >> >
> >> > >> >> > There are two cases where this can happen. First case is that somehow
> >> > >> >> > data file was removed from lower layer. Other case is that REDIRECT
> >> > >> >> > xattr was removed due to copy up of file on another cpu (when inode is
> >> > >> >> > shared between two dentries) and hence ovl_lookup() could not find the
> >> > >> >> > correct dentry.
> >> > >> >> >
> >> > >> >>
> >> > >> >> Remind me again why we remove REDIRECT xattr?
> >> > >> >> Is it a must for functionality or just for being boy scouts?
> >> > >> >> I would prefer to avoid having to deal with races of this sort.
> >> > >> >> You can cleanup REDIRECT for non-dir that is not metacopy
> >> > >> >> on lookup when finding a I_NEW inode.
> >> > >> >
> >> > >> > Ok, thinking more about it. If we were to clean REDIRECT on lookup when
> >> > >> > finding I_NEW inode, that means we will have to always do
> >> > >> > vfs_removexattr(OVL_XATTR_REDIRECT) on all non-metacopy non-dir files.
> >> > >> > That does not sound like a very good idea. Its unnecessary overhead in
> >> > >> > lookup path.
> >> > >> >
> >> > >> > IOW, I think removing REDIRECT and doing appropriate locking around
> >> > >> > ovl_inode->redirect is probably better.
> >> > >> >
> >> > >>
> >> > >> Here is what I propose.
> >> > >> During lookup, you anyway check REDIRECT and check METACOPY
> >> > >> on upper and then call ovl_get_inode() with upper redirect and upper
> >> > >> metacopy information.
> >> > >
> >> > > We check for upperredirect only if metacopy xattr is found. Otherwise
> >> > > we skip checking for redirect.
> >> > >
> >> > > https://github.com/rhvgoyal/linux/blob/metacopy-v13/fs/overlayfs/namei.c#L270
> >> > >
> >> > >>
> >> > >> IF this is a new inode AND both REDIRECT and METACOPY were
> >> > >> found on upper THEN it is safe to remove REDIRECT xattr.
> >> > >
> >> > > If both METACOPY and REDIRECT were found, then we should not remove
> >> > > REDIRECT. That REDIRECT is still useful. REDIRECT should be removed
> >> > > only if METACOPY is not found and REDIRECT is found (on a non-dir file).
> >> > >
> >> > >>
> >> > >> Maybe I am missing something, but I don't see where the extra overhead
> >> > >> is, beyond the overhead already there for metacopy enabled lookup.
> >> > >
> >> > > Given we don't check for REDIRECT if upper is not METACOPY, that means
> >> > > we will have to always check for REDIRECT in ovl_get_inode() and add
> >> > > the unnecessary overhead (To all non-dir files).
> >> > >
> >> >
> >> > I see.
> >> >
> >> > >>
> >> > >> OTOH, I don't see what cleaning REDIRECT gets us in the first place.
> >> > >> During lookup, REDIRECT does not affect non metacopy non-dir,
> >> > >> because we skip ovl_check_redirect().
> >> > >> REDIRECT could actually be useful for reconstructing ORIGIN xattr
> >> > >> and index after copying layers, so not sure its a good thing to remove it
> >> > >> at all. After all, redirects are pretty rare as it is.
> >> > >
> >> > > I see it as unnecessary xattr present and feel that cleaning it is a
> >> > > good idea. Given we are thinking of packing REDIRECT xattr in tar file
> >> > > for layer backup and restore case, it makes even more sense to clean
> >> > > it up otherwise it shows up every where unnecessarily. I personally
> >> > > think it is always a good idea to cleanup information you don't need
> >> > > anymore, instead of letting it sit around.
> >> > >
> >> >
> >> > Look. I have no objection to cleaning REDIRECT, but I am saying it
> >> > is tricky, so I think it is going to cost you a few more patches and maybe
> >> > a few more review cycles, so I advised against it.
> >>
> >> Hi Amir,
> >>
> >> Anyway, I doubt these patches are going to be merged for 4.17. So
> >> I am fine if it takes few more revisions to properly review it. Doing
> >> it properly is more important. (Despite the fact that I am little
> >> exhausted now. :-))
> >>
> >> >
> >> > But here is another idea: store the redirect string in the METACOPY
> >> > xattr, this way, removal of METACOPY xattr atomically cleans up
> >> > REDIRECT and in lookup, only need to check METACOPY xattr
> >> > (exists but empty means no redirect).
> >>
> >> I don't like this much. I had thought about it but did not pursue it.
> >>
> >> - First of all, I don't like that REDIRECT for dir and non-dir will be
> >>   stored differently.
> >>
> >> - Secondly, xattr is just one piece. We also need to protect
> >>   ovl_inode->redirect field and this will not solve that issue. That issue
> >>   can be solved only if we provide proper locking so that readers and
> >>   writers of ovl_inode->redirect don't race and run into unexpected
> >>   surprises.
> >>
> >> Given VFS locking does not protect against copy up path races with
> >> rename(), I think that right solution for this problem is to protect
> >> against this race by taking ovl_inode->lock. I think this is something
> >> future readers can understand and build more functionality on top.
> >>
> >> If we are primarily worried about races against copy up for redirect, then
> >> we probably don't have to double lock both ovl_inodes. As you suggested,
> >> I should be able to move out set_redirect() earlier in rename
> >> path and take one lock at a time. That should simplify the locking
> >> logic a bit. How about this instead?
> >
> > If you don't like this locking ovl_inode->lock model, I guess for now
> > I could live with not removing REDIRECT after copy up. Once that gets
> > merged, I could do one patch series just to clean up REDIRECT after copy
> > up and do proper locking.
> >
> 
> I think we are confusing two different things in this discussion.
> Locking ovl_inode->lock for changing redirect is something that should
> stay in the patch set IMO and should be simplified by moving
> set redirect before taking rename locks on two inodes.
> 
> I was asking about removal of REDIRECT because this patch
> (ovl_verify_metacopy_data) is a bit tricky for me to review and
> I still don't feel confident about it.
> 
> My intuition says we could go other ways as well:
> - unite METACOPY/REDIRECT xattr (we can call the united
> xattr REDIRECT and not METACOPY)
> - memory barriers between setting/clearing/checking
> METACOPY/REDIRECT (there is already a barrier for setting
> upperdata flag, so that's half the work already).
> 
> We can either have this discussion now or, as you suggested,
> leave it to a following patch set. Rule of thumb - if this is v13
> with 28 patches, it might not be a bad idea to defer 2 patches
> and reduce complexity...
> 

Hi Amir,

Ok, let me drop removing REDIRECT from the patchset to reduce
complexity. Let's first try to get a basic version in.

Now coming to the question of locking ovl_inode->lock. If REDIRECTS
are never removed and only upgraded (from relative to absolute), my
understanding is that current VFS locking is sufficient to prevent
races. Only rename and link path set redirects and both the paths take
inode locks and that means two set redirects can't make progress in
parallel on an inode.
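
A minimal userspace sketch of the "never removed, only upgraded" rule
described above. This is not the overlayfs code: the function name
model_redirect_needs_update() and its parameters are hypothetical, and it
only models the decision of whether a new redirect has to be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not a kernel function.
 *
 * cur:      current redirect string, or NULL if none is set
 * want_abs: true if the caller needs an absolute redirect
 *
 * Returns true when a new redirect value has to be written.
 */
bool model_redirect_needs_update(const char *cur, bool want_abs)
{
	/* An empty redirect string is treated as invalid in this model. */
	assert(cur == NULL || cur[0] != '\0');

	/* No redirect yet: one has to be set. */
	if (cur == NULL)
		return true;

	/* Already absolute: never downgraded, never removed. */
	if (cur[0] == '/')
		return false;

	/* Relative redirect: rewrite only when upgrading to absolute. */
	return want_abs;
}
```

With this rule a redirect value only ever moves from "unset" to "relative"
to "absolute", which is the property the argument above relies on.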

When it comes to ovl_lookup(), we will set ovl_inode->redirect only for
the case of I_NEW. So there should be no races there either.

So far we had to add locking due to the copy up path, and now the copy up path
will not touch the redirect xattr. I probably will not even clear
ovl_inode->redirect as we are not removing REDIRECT.

Do you see any other path racing and still needing ovl_inode->lock?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-05 20:45                   ` Vivek Goyal
@ 2018-04-06  9:46                     ` Amir Goldstein
  2018-04-06 15:37                       ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-06  9:46 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Apr 5, 2018 at 11:45 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
[...]
>> >
>> > If you don't like this locking ovl_inode->lock model, I guess for now
>> > I could live with not removing REDIRECT after copy up. Once that gets
>> > merged, I could do one patch series just to clean up REDIRECT after copy
>> > up and do proper locking.
>> >
>>
>> I think we are confusing two different things in this discussion.
>> Locking ovl_inode->lock for changing redirect is something that should
>> stay in the patch set IMO and should be simplified by moving
>> set redirect before taking rename locks on two inodes.
>>
>> I was asking about removal of REDIRECT because this patch
>> (ovl_verify_metacopy_data) is a bit tricky for me to review and
>> I still don't feel confident about it.
>>
>> My intuition says we could go other ways as well:
>> - unite METACOPY/REDIRECT xattr (we can call the united
>> xattr REDIRECT and not METACOPY)
>> - memory barriers between setting/clearing/checking
>> METACOPY/REDIRECT (there is already a barrier for setting
>> upperdata flag, so that's half the work already).
>>
>> We can either have this discussion now or, as you suggested,
>> leave it to a following patch set. Rule of thumb - if this is v13
>> with 28 patches, it might not be a bad idea to defer 2 patches
>> and reduce complexity...
>>
>
> Hi Amir,
>
> Ok, let me drop removing REDIRECT from the patchset to reduce
> complexity. Let's first try to get a basic version in.
>
> Now coming to the question of locking ovl_inode->lock. If REDIRECTS
> are never removed and only upgraded (from relative to absolute), my
> understanding is that current VFS locking is sufficient to prevent
> races. Only rename and link path set redirects and both the paths take
> inode locks and that means two set redirects can't make progress in
> parallel on an inode.
>
> When it comes to ovl_lookup(), we will set ovl_inode->redirect only for
> the case of I_NEW. So there should be no races there either.
>
> So far we had to add locking due to the copy up path, and now the copy up path
> will not touch the redirect xattr. I probably will not even clear
> ovl_inode->redirect as we are not removing REDIRECT.
>
> Do you see any other path racing and still needing ovl_inode->lock?
>

Let's see.

Current code is taking A->d_lock in ovl_set_redirect(A) to protect
against racing with redirect path traversal in ovl_get_redirect(A/B/c).

For the purpose of protecting ancestors redirect path, d_lock is
sufficient and is needed anyway to access d_name.

Also any rename in the filesystem is *also* serialized with
ovl_sb->s_vfs_rename_mutex. That would make the change I suggested
to move ovl_set_redirect() before lock_rename() a bit tricky.
You may say that taking d_lock in ovl_set_redirect() is not strictly
needed, but I guess it is better coding (or maybe something I am
missing).

Now when adding ovl_set_redirect() for non-dir and in ovl_link() the
plot thickens, because:
1. link is not serialized with ovl_sb->s_vfs_rename_mutex lock,
so path traversal in ovl_get_redirect() is unstable
2. 'old' dentry in ovl_link() can be disconnected, so there is
no path for ovl_get_redirect() at all.

The only way to get a disconnected dentry is with nfs export, so for
now, it is enough to WARN_ON and return error in ovl_set_redirect()
for (dentry->d_flags & DCACHE_DISCONNECTED)
with a comment referring to nfs_export+metacopy support.
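
A rough userspace sketch of that check, under stated assumptions: the
struct, the MODEL_DISCONNECTED flag and model_check_connected() are
stand-ins invented for illustration, fprintf() plays the role of WARN_ON(),
and -EIO is an assumed error code since none is named here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for DCACHE_DISCONNECTED; the value is illustrative. */
#define MODEL_DISCONNECTED 0x1u

/* Hypothetical stand-in for the one dentry field this check looks at. */
struct model_dentry_flags {
	unsigned int d_flags;
};

/*
 * Model of the guard at the top of a set-redirect helper: refuse to build
 * a redirect for a dentry that has no path to the root.
 * Returns 0 when the dentry is connected, a negative errno otherwise.
 */
int model_check_connected(const struct model_dentry_flags *dentry)
{
	assert(dentry != NULL);

	if (dentry->d_flags & MODEL_DISCONNECTED) {
		/* Stand-in for WARN_ON(); nfs_export+metacopy is not supported. */
		fprintf(stderr, "redirect requested on disconnected dentry\n");
		return -EIO; /* assumed error code */
	}
	return 0;
}
```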

Maybe the simplest solution w.r.t stabilizing path traversal
is to move ovl_set_redirect() above lock_rename()
as I suggested and use a new lock ofs->ovl_rename_mutex
inside ovl_get_redirect() to protect the !samedir traversal
and also take that lock before lock_rename() in ovl_rename().

Back to the original question: how should ovl_inode->redirect
be protected, if it needs protection at all?
I think setting redirect needs the protection of ofs->ovl_rename_mutex.
Can either take it just around ovl_dentry_set_redirect(), probably
instead of d_lock, or take the ovl_rename_mutex inside
ovl_set_redirect() and around ovl_get_redirect() instead of inside
ovl_get_redirect().

Hmm, I hope I got this right. I advise getting feedback from Miklos
on this design before you proceed.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup
  2018-04-05 20:16       ` Amir Goldstein
@ 2018-04-06 13:51         ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-06 13:51 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Thu, Apr 05, 2018 at 11:16:36PM +0300, Amir Goldstein wrote:
> On Mon, Apr 2, 2018 at 4:56 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Mar 30, 2018 at 07:52:17AM +0300, Amir Goldstein wrote:
> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > By default metadata only copy up is disabled. Provide a mount option so
> >> > that users can choose one way or other.
> >> >
> >> > Also provide a kernel config and module option to enable/disable
> >> > metacopy feature.
> >> >
> >> > metacopy feature requires redirect_dir=on when upper is present. Otherwise,
> >> > it requires redirect_dir=follow atleast.
> >> >
> >> > Like index feature, we verify on mount that upper root is not being
> >> > reused with a different lower root.
> >>
> >> I don't see that in the patch
> >
> > Oh.., this is leftover from previous patches. Will remove this comment.
> > I have completely got rid of dealing with ORIGIN when moving to
> > REDIRECT based lookup.
> >
> >>
> >> > This hopes to get the configuration
> >> > right and detect the copied layers use case. But this does only so
> >> > much as we don't verify all the lowers. So it is possible that a lower is
> >> > missing and later data copy up fails.
> >> >
> >> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> >> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >> > ---
> >> >  Documentation/filesystems/overlayfs.txt | 30 ++++++++++++++++++++++++-
> >> >  fs/overlayfs/Kconfig                    | 19 ++++++++++++++++
> >> >  fs/overlayfs/ovl_entry.h                |  1 +
> >> >  fs/overlayfs/super.c                    | 40 ++++++++++++++++++++++++++++++++-
> >> >  4 files changed, 88 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
> >> > index 6ea1e64d1464..b7720e61973c 100644
> >> > --- a/Documentation/filesystems/overlayfs.txt
> >> > +++ b/Documentation/filesystems/overlayfs.txt
> >> > @@ -249,6 +249,30 @@ rightmost one and going left.  In the above example lower1 will be the
> >> >  top, lower2 the middle and lower3 the bottom layer.
> >> >
> >> >
> >> > +Metadata only copyup
> >> > +--------------------
> >> > +
> >> > +When metadata only copy up feature is enabled, overlayfs will only copy
> >> > +up metadata (as opposed to whole file), when a metadata specific operation
> >> > +like chown/chmod is performed. Full file will be copied up later when
> >> > +file is opened for WRITE operation.
> >> > +
> >> > +IOW, this is delayed data copy up operation and data is copied up when
> >> > +there is a need to actually modify data.
> >> > +
> 
> Vivek,
> 
> Some nitpicking here.
> This document is about the only decent documentation for overlayfs.
> I always try to live up to the standards set by Neil who started it.
> Abbreviations (IOW) don't seem right to me in this context.
> Also, you'll notice that the form "copyup" does not appear in this document.
> It may sound petty, but if people wish to read a coherent document about
> overlayfs, we would be wise to maintain coherency in the phrases we use
> throughout the document.

Ok, I will remove IOW. Also will replace "copyup" with "copy up".

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-06  9:46                     ` Amir Goldstein
@ 2018-04-06 15:37                       ` Vivek Goyal
  2018-04-06 16:21                         ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-06 15:37 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Apr 06, 2018 at 12:46:31PM +0300, Amir Goldstein wrote:
> On Thu, Apr 5, 2018 at 11:45 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> [...]
> >> >
> >> > If you don't like this locking ovl_inode->lock model, I guess for now
> >> > I could live with not removing REDIRECT after copy up. Once that gets
> >> > merged, I could do one patch series just to clean up REDIRECT after copy
> >> > up and do proper locking.
> >> >
> >>
> >> I think we are confusing two different things in this discussion.
> >> Locking ovl_inode->lock for changing redirect is something that should
> >> stay in the patch set IMO and should be simplified by moving
> >> set redirect before taking rename locks on two inodes.
> >>
> >> I was asking about removal of REDIRECT because this patch
> >> (ovl_verify_metacopy_data) is a bit tricky for me to review and
> >> I still don't feel confident about it.
> >>
> >> My intuition says we could go other ways as well:
> >> - unite METACOPY/REDIRECT xattr (we can call the united
> >> xattr REDIRECT and not METACOPY)
> >> - memory barriers between setting/clearing/checking
> >> METACOPY/REDIRECT (there is already a barrier for setting
> >> upperdata flag, so that's half the work already).
> >>
> >> We can either have this discussion now or, as you suggested,
> >> leave it to a following patch set. Rule of thumb - if this is v13
> >> with 28 patches, it might not be a bad idea to defer 2 patches
> >> and reduce complexity...
> >>
> >
> > Hi Amir,
> >
> > Ok, let me drop removing REDIRECT from the patchset to reduce
> > complexity. Let's first try to get a basic version in.
> >
> > Now coming to the question of locking ovl_inode->lock. If REDIRECTS
> > are never removed and only upgraded (from relative to absolute), my
> > understanding is that current VFS locking is sufficient to prevent
> > races. Only rename and link path set redirects and both the paths take
> > inode locks and that means two set redirects can't make progress in
> > parallel on an inode.
> >
> > When it comes to ovl_lookup(), we will set ovl_inode->redirect only for
> > the case of I_NEW. So there should be no races there either.
> >
> > So far we had to add locking due to the copy up path, and now the copy up path
> > will not touch the redirect xattr. I probably will not even clear
> > ovl_inode->redirect as we are not removing REDIRECT.
> >
> > Do you see any other path racing and still needing ovl_inode->lock?
> >
> 
> Let's see.
> 
> Current code is taking A->d_lock in ovl_set_redirect(A) to protect
> against racing with redirect path traversal in ovl_get_redirect(A/B/c).
> 
> For the purpose of protecting ancestors redirect path, d_lock is
> sufficient and is needed anyway to access d_name.
> 
> Also any rename in the filesystem is *also* serialized with
> ovl_sb->s_vfs_rename_mutex. That would make the change I suggested
> to move ovl_set_redirect() before lock_rename() a bit tricky.
> You may say that taking d_lock in ovl_set_redirect() is not strictly
> needed, but I guess it is better coding (or maybe something I am
> missing).
> 
> Now when adding ovl_set_redirect() for non-dir and in ovl_link() the
> plot thickens, because:
> 1. link is not serialized with ovl_sb->s_vfs_rename_mutex lock,
> so path traversal in ovl_get_redirect() is unstable
> 2. 'old' dentry in ovl_link() can be disconnected, so there is
> no path for ovl_get_redirect() at all.
> 
> The only way to get a disconnected dentry is with nfs export, so for
> now, it is enough to WARN_ON and return error in ovl_set_redirect()
> for (dentry->d_flags & DCACHE_DISCONNECTED)
> with a comment referring to nfs_export+metacopy support.

Hi Amir,

I can put this warning for DCACHE_DISCONNECTED dentries.

Now, the most interesting part for me here is why we need to
stop/synchronize other renames in the system while some
set_redirect/get_redirect operation is taking place. That's the part I
don't understand. When I look at the current code, I feel d_lock seems to
be good enough to make sure that ovl_get_redirect() works fine while
parallel renames are progressing on another cpu.

So a simple example: say I am creating a link bar/foo-link.txt to a file
foo/foo.txt and that triggers setting an absolute redirect on foo.txt,
and we will call ovl_get_redirect() and traverse the tree up to the root.
Now say a part of the tree is also being renamed. Say foo/ is being
renamed to alpha/. I am wondering whether d_lock is not enough to make sure
this is not a problem.

We always set redirect first and then do rename. That means d_parent
should be changed only after redirect has been set on a dentry. And
that should guarantee that if ovl_get_redirect() sees the new parent,
then parent_dentry->redirect has been already set. If it sees the dentry
before rename, then a redirect might be there; if not, the dentry name would
be used and it will also see the old parent and continue traversal
up.
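
To make the example concrete, here is a simplified userspace model of the
upward walk. It is only a sketch: struct model_dentry and
model_get_abs_redirect() are hypothetical names, there is no locking or
reference counting, and it only shows how the path is assembled from
either a stored redirect or the dentry name at each level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for a dentry; NOT the kernel's struct dentry. */
struct model_dentry {
	const char *name;                  /* role of d_name.name */
	const char *redirect;              /* role of ovl_inode->redirect, or NULL */
	const struct model_dentry *parent; /* role of d_parent; root points to itself */
};

/*
 * Build an absolute redirect for 'dentry' by walking up to the root and
 * filling 'buf' from its end. At each level the stored redirect is used if
 * present, otherwise the dentry name. Returns a pointer into 'buf', or NULL
 * if the path does not fit.
 */
const char *model_get_abs_redirect(const struct model_dentry *dentry,
				   char *buf, size_t size)
{
	const struct model_dentry *d;
	size_t pos;

	assert(buf != NULL && size > 0);
	pos = size - 1;
	buf[pos] = '\0';

	for (d = dentry; d->parent != d; d = d->parent) {
		const char *name = d->redirect ? d->redirect : d->name;
		size_t len = strlen(name);

		/* Room for the component, plus a '/' unless it starts with one. */
		if (len + (name[0] != '/') > pos)
			return NULL;

		pos -= len;
		memcpy(&buf[pos], name, len);

		/* An absolute redirect already names the full path: stop. */
		if (buf[pos] == '/')
			break;

		buf[--pos] = '/';
	}
	return &buf[pos];
}
```

In this model, walking foo.txt under foo/ (no redirect) and walking it under
alpha/ (with redirect "/foo" already set) both produce "/foo/foo.txt",
which is the ordering argument made above: as long as the redirect is set
before d_parent changes, either view gives the same result.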

Is there anything I am missing? 

> 
> Maybe the simplest solution w.r.t stabilizing path traversal
> is to move ovl_set_redirect() above lock_rename()
> as I suggested and use a new lock ofs->ovl_rename_mutex
> inside ovl_get_redirect() to protect the !samedir traversal
> and also take that lock before lock_rename() in ovl_rename().

I already made changes to move ovl_set_redirect() above lock_rename()
in my copy. I still can't see the need of ofs->ovl_rename_mutex. Please
help me understand the need with an example.

> 
> Back to the original question: how should ovl_inode->redirect
> be protected if at all it needs protection?

At this point I am thinking that at most we need to just take ovl_inode->lock
to protect ovl_inode->redirect. It just makes it a little safer and
relatively easier to understand.

Thanks
Vivek

> I think setting redirect needs the protection of ofs->ovl_rename_mutex.
> Can either take it just around ovl_dentry_set_redirect(), probably
> instead of d_lock, or take the ovl_rename_mutex inside
> ovl_set_redirect() and around ovl_get_redirect() instead of inside
> ovl_get_redirect().
> 
> Hmm, I hope I got this right. I advise to get feedback from Miklos
> to this design before you proceed.
> 
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-06 15:37                       ` Vivek Goyal
@ 2018-04-06 16:21                         ` Amir Goldstein
  2018-04-06 17:32                           ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-06 16:21 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Fri, Apr 6, 2018 at 6:37 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Apr 06, 2018 at 12:46:31PM +0300, Amir Goldstein wrote:
>> On Thu, Apr 5, 2018 at 11:45 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> [...]
>> >> >
>> >> > If you don't like this locking ovl_inode->lock model, I guess for now
>> >> > I could live with not removing REDIRECT after copy up. Once that gets
>> >> > merged, I could do one patch series just to clean up REDIRECT after copy
>> >> > up and do proper locking.
>> >> >
>> >>
>> >> I think we are confusing two different things in this discussion.
>> >> Locking ovl_inode->lock for changing redirect is something that should
>> >> stay in the patch set IMO and should be simplified by moving
>> >> set redirect before taking rename locks on two inodes.
>> >>
>> >> I was asking about removal of REDIRECT because this patch
>> >> (ovl_verify_metacopy_data) is a bit tricky for me to review and
>> >> I still don't feel confident about it.
>> >>
>> >> My intuition says we could go other ways as well:
>> >> - unite METACOPY/REDIRECT xattr (we can call the united
>> >> xattr REDIRECT and not METACOPY)
>> >> - memory barriers between setting/clearing/checking
>> >> METACOPY/REDIRECT (there is already a barrier for setting
>> >> upperdata flag, so that's half the work already).
>> >>
>> >> We can either have this discussion now or, as you suggested,
>> >> leave it to a following patch set. Rule of thumb - if this is v13
>> >> with 28 patches, it might not be a bad idea to defer 2 patches
>> >> and reduce complexity...
>> >>
>> >
>> > Hi Amir,
>> >
>> > Ok, let me drop removing REDIRECT from the patchset to reduce
>> > complexity. Let's first try to get a basic version in.
>> >
>> > Now coming to the question of locking ovl_inode->lock. If REDIRECTS
>> > are never removed and only upgraded (from relative to absolute), my
>> > understanding is that current VFS locking is sufficient to prevent
>> > races. Only rename and link path set redirects and both the paths take
>> > inode locks and that means two set redirects can't make progress in
>> > parallel on an inode.
>> >
>> > When it comes to ovl_lookup(), we will set ovl_inode->redirect only for
>> > the case of I_NEW. So there should be no races there either.
>> >
>> > So far we had to add locking due to the copy up path, and now the copy up path
>> > will not touch the redirect xattr. I probably will not even clear
>> > ovl_inode->redirect as we are not removing REDIRECT.
>> >
>> > Do you see any other path racing and still needing ovl_inode->lock?
>> >
>>
>> Let's see.
>>
>> Current code is taking A->d_lock in ovl_set_redirect(A) to protect
>> against racing with redirect path traversal in ovl_get_redirect(A/B/c).
>>
>> For the purpose of protecting ancestors redirect path, d_lock is
>> sufficient and is needed anyway to access d_name.
>>
>> Also any rename in the filesystem is *also* serialized with
>> ovl_sb->s_vfs_rename_mutex. That would make the change I suggested
>> to move ovl_set_redirect() before lock_rename() a bit tricky.
>> You may say that taking d_lock in ovl_set_redirect() is not strictly
>> needed, but I guess it is better coding (or maybe something I am
>> missing).
>>
>> Now when adding ovl_set_redirect() for non-dir and in ovl_link() the
>> plot thickens, because:
>> 1. link is not serialized with ovl_sb->s_vfs_rename_mutex lock,
>> so path traversal in ovl_get_redirect() in unstable
>> 2. 'old' dentry in ovl_link() can be disconnected, so there is
>> no path for ovl_get_redirect() at all.
>>
>> The only way to get a disconnected dentry is with nfs export, so for
>> now, it is enough to WARN_ON and return error in ovl_set_redirect()
>> for (dentry->d_flags & DCACHE_DISCONNECTED)
>> with a comment referring to nfs_export+metacopy support.
>
> Hi Amir,
>
> I can put this warning for DCACHE_DISCONNECTED dentries.
>
> Now, most interesting part for me here is that why do we need to
> stop/synchronize other renames in the system while some set_redirect/get
> _redirect operation is taking place. That's the part I don't understand.
> When I am looking at current code, I feel d_lock, seems to be good enough
> to make sure that ovl_get_redirect() works fine when parallel renames
> progressing on other cpu.

Right.

>
> So a simple example, is say I am creating a link bar/foo-link.txt to a file
> foo/foo.txt and that triggers setting absolute redirect on foo.txt
> and we will call ovl_get_direct() and  traverse the tree up till root.
> Now say a part of the tree is also being renamed. Say foo/ is being
> renamed to alpha/. I am wondering is d_lock is not enough to make sure
> this is not a problem.
>
> We always set redirect first and then do rename. That means d_parent
> should be changed only after redirect has been set on a dentry. And
> that should guarantee that if ovl_get_redirect() sees new parent,
> then parent_dentry->redirect has been already set. If it sees dentry
> before rename, then redirect might be there if not, dentry name would
> be used and it will also see the old parent and continue traversal
> up.
>
> Is there anything I am missing?

No. I think you got it right.
I was confused.

>
>>
>> Maybe the simplest solution w.r.t stabilizing path traversal
>> is to move ovl_set_redirect() above lock_rename()
>> as I suggested and use a new lock ofs->ovl_rename_mutex
>> inside ovl_get_redirect() to protect the !samedir traversal
>> and also take that lock before lock_rename() in ovl_rename().
>
> I already made changes to move ovl_set_redirect() above lock_rename()
> in my copy. I still can't see the need of ofs->ovl_rename_mutex. Please
> help me understand the need with an example.
>

No need.

>>
>> Back to the original question: how should ovl_inode->redirect
>> be protected if at all it needs protection?
>
> At this point I am thinking at max we need to just take ovl_inode->lock
> to protect ovl_inode->redirect. It just makes it little safer and
> relatively easier to understand.
>

I agree. I don't think ovl_inode->lock is needed, but I would
add a comment explaining why (VFS inode lock).

The problem is that we cannot get rid of d_lock for path traversal,
and I don't see how we can easily take the ovl dentry d_lock together
with ovl_inode->lock: the latter is a sleeping lock, yet layering-wise
it is logically below the overlay dentry lock.

So better not to add ovl_inode->lock.
Sorry for the noise.

I hope moving set redirect before lock_rename simplified your
patches, so this whole exercise served a point.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-06 16:21                         ` Amir Goldstein
@ 2018-04-06 17:32                           ` Vivek Goyal
  2018-04-06 20:10                             ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-06 17:32 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Apr 06, 2018 at 07:21:07PM +0300, Amir Goldstein wrote:
> On Fri, Apr 6, 2018 at 6:37 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Apr 06, 2018 at 12:46:31PM +0300, Amir Goldstein wrote:
> >> On Thu, Apr 5, 2018 at 11:45 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> [...]
> >> >> >
> >> >> > If you don't like this locking ovl_inode->lock model, I guess for now
> >> >> > I could live with not removing REDIRECT after copy up. Once that gets
> >> >> > merged, I could do one patch series just to clean up REDIRECT after copy
> >> >> > up and do proper locking.
> >> >> >
> >> >>
> >> >> I think we are confusing two different things in this discussion.
> >> >> Locking ovl_inode->lock for changing redirect is something that should
> >> >> stay in the patch set IMO and should be simplified by moving
> >> >> set redirect before taking rename locks on two inodes.
> >> >>
> >> >> I was asking about removal of REDIRECT because this patch
> >> >> (ovl_verify_metacopy_data) is a bit tricky for me to review and
> >> >> I still don't feel confident about it.
> >> >>
> >> >> My intuition says we could go other ways as well:
> >> >> - unite METACOPY/REDIRECT xattr (we can call the unite
> >> >> xattr REDIRECT and not METACOPY)
> >> >> - memory barriers between setting/clearing/checking
> >> >> METACOPY/REDIRECT (there is already a barrier for setting
> >> >> upperdata flag, so that's half the work already.
> >> >>
> >> >> We can either have this discussion now or, as you suggested
> >> >> leave it to a following patch set. Rule of thumb - if this is v13
> >> >> with 28 patches, might not be a bad idea to deffer 2 patches
> >> >> and reduce complexity...
> >> >>
> >> >
> >> > Hi Amir,
> >> >
> >> > Ok, let me drop removing REDIRECT from the patchset to reduce
> >> > complexity. Lets first try to get in a basic version in.
> >> >
> >> > Now coming to the question of locking ovl_inode->lock. If REDIRECTS
> >> > are never removed and only upgraded (from relative to absolute), my
> >> > understanding is that current VFS locking is sufficient to prevent
> >> > races. Only rename and link path set redirects and both the paths take
> >> > inode locks and that means two set redirects can't make progress in
> >> > parallel on an inode.
> >> >
> >> > When it comes to ovl_lookup(), we will set ovl_inode->redirect only for
> >> > the case of I_NEW. So no races should be there as well.
> >> >
> >> > So far we had to add locking due to copy up path and now copy up path
> >> > will not touch redirect xattr. I probably will not even clear
> >> > ovl_inode->inode redirect as we are not removing REDIRECT.
> >> >
> >> > Do you see any other path racing and still needed ovl_inode->lock?
> >> >
> >>
> >> Let's see.
> >>
> >> Current code is taking A->d_lock in ovl_set_redirect(A) to protect
> >> against racing with redirect path traversal in ovl_get_redirect(A/B/c).
> >>
> >> For the purpose of protecting ancestors redirect path, d_lock is
> >> sufficient and is needed anyway to access d_name.
> >>
> >> Also any rename in the filesystem is *also* serialized with
> >> ovl_sb->s_vfs_rename_mutex. That would make the change I suggested
> >> to move ovl_set_redirect() before lock_rename() a bit tricky.
> >> You may say that taking d_lock in ovl_set_redirect() is not strictly
> >> needed, but I guess it is better coding (or maybe something I am
> >> missing).
> >>
> >> Now when adding ovl_set_redirect() for non-dir and in ovl_link() the
> >> plot thickens, because:
> >> 1. link is not serialized with ovl_sb->s_vfs_rename_mutex lock,
> >> so path traversal in ovl_get_redirect() in unstable
> >> 2. 'old' dentry in ovl_link() can be disconnected, so there is
> >> no path for ovl_get_redirect() at all.
> >>
> >> The only way to get a disconnected dentry is with nfs export, so for
> >> now, it is enough to WARN_ON and return error in ovl_set_redirect()
> >> for (dentry->d_flags & DCACHE_DISCONNECTED)
> >> with a comment referring to nfs_export+metacopy support.
> >
> > Hi Amir,
> >
> > I can put this warning for DCACHE_DISCONNECTED dentries.
> >
> > Now, most interesting part for me here is that why do we need to
> > stop/synchronize other renames in the system while some set_redirect/get
> > _redirect operation is taking place. That's the part I don't understand.
> > When I am looking at current code, I feel d_lock, seems to be good enough
> > to make sure that ovl_get_redirect() works fine when parallel renames
> > progressing on other cpu.
> 
> Right.
> 
> >
> > So a simple example, is say I am creating a link bar/foo-link.txt to a file
> > foo/foo.txt and that triggers setting absolute redirect on foo.txt
> > and we will call ovl_get_direct() and  traverse the tree up till root.
> > Now say a part of the tree is also being renamed. Say foo/ is being
> > renamed to alpha/. I am wondering is d_lock is not enough to make sure
> > this is not a problem.
> >
> > We always set redirect first and then do rename. That means d_parent
> > should be changed only after redirect has been set on a dentry. And
> > that should guarantee that if ovl_get_redirect() sees new parent,
> > then parent_dentry->redirect has been already set. If it sees dentry
> > before rename, then redirect might be there if not, dentry name would
> > be used and it will also see the old parent and continue traversal
> > up.
> >
> > Is there anything I am missing?
> 
> No. I think you got it right.
> I was confused.
> 
> >
> >>
> >> Maybe the simplest solution w.r.t stabilizing path traversal
> >> is to move ovl_set_redirect() above lock_rename()
> >> as I suggested and use a new lock ofs->ovl_rename_mutex
> >> inside ovl_get_redirect() to protect the !samedir traversal
> >> and also take that lock before lock_rename() in ovl_rename().
> >
> > I already made changes to move ovl_set_redirect() above lock_rename()
> > in my copy. I still can't see the need of ofs->ovl_rename_mutex. Please
> > help me understand the need with an example.
> >
> 
> No need.
> 
> >>
> >> Back to the original question: how should ovl_inode->redirect
> >> be protected if at all it needs protection?
> >
> > At this point I am thinking at max we need to just take ovl_inode->lock
> > to protect ovl_inode->redirect. It just makes it little safer and
> > relatively easier to understand.
> >
> 
> I agree. I don't think ovl_inode->lock is needed, but I would
> add a comment explaining why (VFS inode lock).
> 
> The problem is that we cannot get rid of d_lock for path traversal
> and I don't see how we can easily take both ovl dentry d_lock with
> ovl_inode->lock, because the latter is a sleeping lock, but layering
> wise it is logically below the overlay dentry lock.

Actually, I think ovl_inode->lock is at the same layer as inode->i_rwsem.
VFS first takes i_rwsem and then d_lock, and we probably should do the
same thing. In fact we are already taking ovl_inode->lock in
ovl_nlink_start() first and then calling ovl_set_redirect(), which takes
d_lock.

> 
> So better not add ovl_inode->lock
> Sorry for the noise.

I will not add ovl_inode->lock if I can explain everything. I have a
feeling that the redirect upgrade path will have some funny interactions
with the ovl_lookup() path. Let me write the new patches and see if
ovl_inode->lock is still required.

> 
> I hope moving set redirect before lock_rename simplified your
> patches, so this whole exercise served a point.

It definitely does. Thanks for that idea. 

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-06 17:32                           ` Vivek Goyal
@ 2018-04-06 20:10                             ` Amir Goldstein
  2018-04-09 12:18                               ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-06 20:10 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Fri, Apr 6, 2018 at 8:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
[...]
>> >>
>> >> Back to the original question: how should ovl_inode->redirect
>> >> be protected if at all it needs protection?
>> >
>> > At this point I am thinking at max we need to just take ovl_inode->lock
>> > to protect ovl_inode->redirect. It just makes it little safer and
>> > relatively easier to understand.
>> >
>>
>> I agree. I don't think ovl_inode->lock is needed, but I would
>> add a comment explaining why (VFS inode lock).
>>
>> The problem is that we cannot get rid of d_lock for path traversal
>> and I don't see how we can easily take both ovl dentry d_lock with
>> ovl_inode->lock, because the latter is a sleeping lock, but layering
>> wise it is logically below the overlay dentry lock.
>
> Actually, I think ovl_inode->lock is at same layer as inode->i_rwsem. So
> VFS first takes i_rwsem and then d_lock. And we probably should do the
> same thing. In fact we are already taking ovl_inode->lock in
> ovl_nlink_start() first and then calling ovl_set_redirect() which takes
> d_lock.
>

It's not completely ok that we do that.

If we can always abide by this locking order:
VFS overlay layer locks -> internal overlay locks -> VFS underlying fs locks

we will be safer.

So if ovl_set_redirect() can happen without taking ovl_inode->lock
and completely before underlying fs VFS locks I think that would be
better.

>>
>> So better not add ovl_inode->lock
>> Sorry for the noise.
>
> I will not add ovl_inode->lock if I can explain everything. I have a
> feeling that redirect upgrade path will have some funny interactions
> with ovl_lookup() path. Let me write the new patches and see if
> ovl_inode->lock is still required.
>

OK.
Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode
  2018-04-06 20:10                             ` Amir Goldstein
@ 2018-04-09 12:18                               ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-09 12:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Apr 06, 2018 at 11:10:05PM +0300, Amir Goldstein wrote:
> On Fri, Apr 6, 2018 at 8:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> [...]
> >> >>
> >> >> Back to the original question: how should ovl_inode->redirect
> >> >> be protected if at all it needs protection?
> >> >
> >> > At this point I am thinking at max we need to just take ovl_inode->lock
> >> > to protect ovl_inode->redirect. It just makes it little safer and
> >> > relatively easier to understand.
> >> >
> >>
> >> I agree. I don't think ovl_inode->lock is needed, but I would
> >> add a comment explaining why (VFS inode lock).
> >>
> >> The problem is that we cannot get rid of d_lock for path traversal
> >> and I don't see how we can easily take both ovl dentry d_lock with
> >> ovl_inode->lock, because the latter is a sleeping lock, but layering
> >> wise it is logically below the overlay dentry lock.
> >
> > Actually, I think ovl_inode->lock is at same layer as inode->i_rwsem. So
> > VFS first takes i_rwsem and then d_lock. And we probably should do the
> > same thing. In fact we are already taking ovl_inode->lock in
> > ovl_nlink_start() first and then calling ovl_set_redirect() which takes
> > d_lock.
> >
> 
> It's not completely ok that we do that.
> 
> If we can always abide to the rules of locking order:
> VFS overlay layer locks -> internal overlay locks -> VFS underlying fs locks

Sure, and that's what will happen when we move ovl_set_redirect() above
lock_rename().

- When ovl_rename() is called, VFS is holding the ovl layer locks.
- ovl_nlink_start() and ovl_set_redirect() will take the internal
  overlay lock (ovl_inode->lock).
- And then lock_rename() will take the VFS underlying fs locks.

So ordering of locks will be exactly as you like it.

Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-03-30  9:54   ` Amir Goldstein
@ 2018-04-10 14:00     ` Vivek Goyal
  2018-04-10 19:20       ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-10 14:00 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 12:54:14PM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far
> > this decision was easy that we had to check only for oe->numlower
> > and if it is non-zero, we knew there is copy up origin. (For non-dir
> > we installed origin dentry in lowerstack[0]).
> >
> > But we don't create ORGIN xattr for broken hardlinks (index=off). And
> > with metacopy feature it is possible that we will still install
> > lowerstack[0]. But that's lower data dentry of metacopy upper of broken
> > hardlink and not ORIGIN XATTR is not set.
> >
> > So two differentiate between two cases, do not set OVL_PATH_ORIGIN if
> > we have a broken hardlink.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/util.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 29f7336ade88..961d65bd25c9 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -117,7 +117,14 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
> >                  * Non-dir dentry can hold lower dentry of its copy up origin.
> 
> This comment needs updating with metacopy.
> 
> >                  */
> >                 if (oe->numlower) {
> > -                       type |= __OVL_PATH_ORIGIN;
> > +                       /*
> > +                        * ORIGIN is created for everyting except broken
> > +                        * hardlinks
> > +                        */
> > +                       if (!(d_inode(dentry)->i_nlink > 1 &&
> > +                           !ovl_test_flag(OVL_INDEX, d_inode(dentry))))
> > +                               type |= __OVL_PATH_ORIGIN;
> > +
> 
> I don't like relying on overlay nlink. it is not reliable.
> And I think you missed the directory case.

Directories should be fine: for directories ->i_nlink == 1, so this
condition evaluates to true.

> The information you need was available during lookup
> and we did not keep it (was lower verified by origin fh).
> Most likely what we need to do it store OVL_ORIGIN inode flag
> during lookup and then use ovl_test_flag(OVL_ORIGIN)
> in place of OVL_TYPE_ORIGIN(type).
> 
> If I am not mistaken, you could set flag OVL_ORIGIN in
> ovl_get_inode() IFF (upperdentry && bylower), but to be honest
> the rules became so complicated that I can't say for sure.
> At least I concentrated all the rules in one helper ovl_hash_bylower(),
> so I hope that helps.

OK, I can add another ovl_inode flag, say OVL_ORIGIN. But that also means
that I need to set the flag during copy up, and that brings back
ordering requirements for lockless access. For example, in
ovl_getattr(), if we replace OVL_TYPE_ORIGIN() with ovl_test_flag(OVL_ORIGIN)
without explicit barriers, what guarantees that a caller will see the
OVL_ORIGIN flag yet? It can happen that a caller sees upperdentry but not
OVL_ORIGIN.

In fact I see OVL_INDEX being used in ovl_getattr(). If a parallel copy
up took place, it is possible that ovl_getattr() saw upperdentry but not
OVL_INDEX, and then the nlink count will be off. IOW, what barrier/lock
guarantees that this does not happen?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-04-10 14:00     ` Vivek Goyal
@ 2018-04-10 19:20       ` Amir Goldstein
  2018-04-10 19:29         ` Amir Goldstein
  2018-04-10 20:51         ` Vivek Goyal
  0 siblings, 2 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-04-10 19:20 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Tue, Apr 10, 2018 at 5:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 12:54:14PM +0300, Amir Goldstein wrote:
>> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far
>> > this decision was easy that we had to check only for oe->numlower
>> > and if it is non-zero, we knew there is copy up origin. (For non-dir
>> > we installed origin dentry in lowerstack[0]).
>> >
>> > But we don't create ORGIN xattr for broken hardlinks (index=off). And
>> > with metacopy feature it is possible that we will still install
>> > lowerstack[0]. But that's lower data dentry of metacopy upper of broken
>> > hardlink and not ORIGIN XATTR is not set.
>> >
>> > So two differentiate between two cases, do not set OVL_PATH_ORIGIN if
>> > we have a broken hardlink.
>> >
>> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> > ---
>> >  fs/overlayfs/util.c | 9 ++++++++-
>> >  1 file changed, 8 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> > index 29f7336ade88..961d65bd25c9 100644
>> > --- a/fs/overlayfs/util.c
>> > +++ b/fs/overlayfs/util.c
>> > @@ -117,7 +117,14 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
>> >                  * Non-dir dentry can hold lower dentry of its copy up origin.
>>
>> This comment needs updating with metacopy.
>>
>> >                  */
>> >                 if (oe->numlower) {
>> > -                       type |= __OVL_PATH_ORIGIN;
>> > +                       /*
>> > +                        * ORIGIN is created for everyting except broken
>> > +                        * hardlinks
>> > +                        */
>> > +                       if (!(d_inode(dentry)->i_nlink > 1 &&
>> > +                           !ovl_test_flag(OVL_INDEX, d_inode(dentry))))
>> > +                               type |= __OVL_PATH_ORIGIN;
>> > +
>>
>> I don't like relying on overlay nlink. it is not reliable.
>> And I think you missed the directory case.
>
> Directory should be fine because for directories ->i_nlink == 1 and this
> condition will become true.
>
>> The information you need was available during lookup
>> and we did not keep it (was lower verified by origin fh).
>> Most likely what we need to do it store OVL_ORIGIN inode flag
>> during lookup and then use ovl_test_flag(OVL_ORIGIN)
>> in place of OVL_TYPE_ORIGIN(type).
>>
>> If I am not mistaken, you could set flag OVL_ORIGIN in
>> ovl_get_inode() IFF (upperdentry && bylower), but to be honest
>> the rules became so complicated that I can't say for sure.
>> At least I concentrated all the rules in one helper ovl_hash_bylower(),
>> so I hope that helps.
>
> Ok, I can put another ovl-inode flag say OVL_ORIGIN. But that also means
> that I need to set the flag during copy up. And that means it brings
> back ordering requirements for lockless access. For example, in
> ovl_getattr(), if we replace OVL_TYPE_ORIGIN() with ovl_test_flag(ORIGIN),
> without explicit barriers, what's the guarantee a user will see OVL_ORIGIN
> flag yet. In fact it can happen that user might see upperdentry but not
> OVL_ORIGIN.

OK, so instead set OVL_BYLOWER in ovl_get_inode() IFF (bylower).
That property doesn't change on copy up.
Then set __OVL_PATH_ORIGIN in ovl_path_type() if upper AND numlower
AND OVL_BYLOWER.

>
> In fact I see OVL_INDEX being used in ovl_getattr(). If a parallel copy
> up took place it is possible that ovl_getattr() saw upperdentry but not
> the OVL_INDEX. And then nlink count will be off. IOW, what barrier/lock
> guarantees that this does not happen.
>

Yes, and that is not the only problem I see with ovl_getattr().

stat->dev could be set to ovl_get_pseudo_dev(dentry), even
if stat->ino is not set to lowerstat.ino; that is not good.

If I am not mistaken, all the special conditions in ovl_getattr()
to use lower st_ino are already encoded in ovl_hash_bylower(),
so you may completely remove those extra tests if
OVL_TYPE_ORIGIN(type) already checks the OVL_BYLOWER
flag.

It may be hard to see my claim above is true, because, for
example, sb->s_export_op in ovl_hash_bylower() is equivalent
to ovl_verify_lower(dentry->d_sb) in ovl_getattr().

Please double check my claims, and if you agree, you can submit a
patch with Fixes: a0c5ad307ac0 ("ovl: relax same fs constraint for
constant st_ino"), which already does the right thing for metacopy.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-04-10 19:20       ` Amir Goldstein
@ 2018-04-10 19:29         ` Amir Goldstein
  2018-04-10 20:59           ` Vivek Goyal
  2018-04-10 20:51         ` Vivek Goyal
  1 sibling, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-10 19:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Tue, Apr 10, 2018 at 10:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Apr 10, 2018 at 5:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
[...]
>>
>> In fact I see OVL_INDEX being used in ovl_getattr(). If a parallel copy
>> up took place it is possible that ovl_getattr() saw upperdentry but not
>> the OVL_INDEX. And then nlink count will be off. IOW, what barrier/lock
>> guarantees that this does not happen.
>>
>
> Yes. and that is not the only problem I see with ovl_getattr().
>
> stat->dev could be set to ovl_get_pseudo_dev(dentry), even
> if stat->ino is not set to lowerstat.ino; that is not good.
>
> If I am not mistaken, all the special conditions in ovl_getattr()
> to use lower st_ino are already encoded in ovl_hash_bylower(),
> so you may completely remove those extra tests if
> OVL_TYPE_ORIGIN(type) already checks the OVL_BYLOWER
> flag.
>
> It may be hard to see my claim above is true, because, for
> example, sb->s_export_op in ovl_hash_bylower() is equivalent
> to ovl_verify_lower(dentry->d_sb) in ovl_getattr().
>
> Please double check my claims, and if you agree, you can submit a
> patch that Fixes: ("a0c5ad307ac0 ovl: relax same fs constraint for
> constant st_ino"), which already does the right thing for metacopy.
>

On second thought, it would be better to submit the simple fix patch
first, so it could be applied to stable kernels that don't have
ovl_hash_bylower() and only then submit the patch that adds
OVL_BYLOWER and removes the extra tests in ovl_getattr().
Something like this (untested) should do.
Let me know if you would like me to submit the fix patch.

Thanks,
Amir.

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3b1bd469accd..34fe86dedd3b 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -118,13 +118,10 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
                         */
                        if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) ||
                            (!ovl_verify_lower(dentry->d_sb) &&
-                            (is_dir || lowerstat.nlink == 1)))
+                            (is_dir || lowerstat.nlink == 1))) {
                                stat->ino = lowerstat.ino;
-
-                       if (samefs)
-                               WARN_ON_ONCE(stat->dev != lowerstat.dev);
-                       else
                                stat->dev = ovl_get_pseudo_dev(dentry);
+                       }
                }
                if (samefs) {
                        /*

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-04-10 19:20       ` Amir Goldstein
  2018-04-10 19:29         ` Amir Goldstein
@ 2018-04-10 20:51         ` Vivek Goyal
  2018-04-11  8:58           ` Amir Goldstein
  1 sibling, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-10 20:51 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Tue, Apr 10, 2018 at 10:20:43PM +0300, Amir Goldstein wrote:
> On Tue, Apr 10, 2018 at 5:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Mar 30, 2018 at 12:54:14PM +0300, Amir Goldstein wrote:
> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far
> >> > this decision was easy that we had to check only for oe->numlower
> >> > and if it is non-zero, we knew there is copy up origin. (For non-dir
> >> > we installed origin dentry in lowerstack[0]).
> >> >
> >> > But we don't create ORGIN xattr for broken hardlinks (index=off). And
> >> > with metacopy feature it is possible that we will still install
> >> > lowerstack[0]. But that's lower data dentry of metacopy upper of broken
> >> > hardlink and not ORIGIN XATTR is not set.
> >> >
> >> > So two differentiate between two cases, do not set OVL_PATH_ORIGIN if
> >> > we have a broken hardlink.
> >> >
> >> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >> > ---
> >> >  fs/overlayfs/util.c | 9 ++++++++-
> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> >> > index 29f7336ade88..961d65bd25c9 100644
> >> > --- a/fs/overlayfs/util.c
> >> > +++ b/fs/overlayfs/util.c
> >> > @@ -117,7 +117,14 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
> >> >                  * Non-dir dentry can hold lower dentry of its copy up origin.
> >>
> >> This comment needs updating with metacopy.
> >>
> >> >                  */
> >> >                 if (oe->numlower) {
> >> > -                       type |= __OVL_PATH_ORIGIN;
> >> > +                       /*
> >> > +                        * ORIGIN is created for everyting except broken
> >> > +                        * hardlinks
> >> > +                        */
> >> > +                       if (!(d_inode(dentry)->i_nlink > 1 &&
> >> > +                           !ovl_test_flag(OVL_INDEX, d_inode(dentry))))
> >> > +                               type |= __OVL_PATH_ORIGIN;
> >> > +
> >>
> >> I don't like relying on overlay nlink. it is not reliable.
> >> And I think you missed the directory case.
> >
> > Directory should be fine because for directories ->i_nlink == 1 and this
> > condition will become true.
> >
> >> The information you need was available during lookup
> >> and we did not keep it (was lower verified by origin fh).
> >> Most likely what we need to do it store OVL_ORIGIN inode flag
> >> during lookup and then use ovl_test_flag(OVL_ORIGIN)
> >> in place of OVL_TYPE_ORIGIN(type).
> >>
> >> If I am not mistaken, you could set flag OVL_ORIGIN in
> >> ovl_get_inode() IFF (upperdentry && bylower), but to be honest
> >> the rules became so complicated that I can't say for sure.
> >> At least I concentrated all the rules in one helper ovl_hash_bylower(),
> >> so I hope that helps.
> >
> > Ok, I can put another ovl-inode flag say OVL_ORIGIN. But that also means
> > that I need to set the flag during copy up. And that means it brings
> > back ordering requirements for lockless access. For example, in
> > ovl_getattr(), if we replace OVL_TYPE_ORIGIN() with ovl_test_flag(ORIGIN),
> > without explicit barriers, what's the guarantee a user will see OVL_ORIGIN
> > flag yet. In fact it can happen that user might see upperdentry but not
> > OVL_ORIGIN.
> 
> OK, so instead set OVL_BYLOWER in ovl_get_inode() IFF (bylower).
> that property doesn't change on copy up.
> and set __OVL_PATH_ORIGIN in ovl_path_type() if upper AND numlower
> AND OVL_BYLOWER.

Ok, so at the time of lookup, we know if hardlink will be broken or
not and that property does not change over copy up. So yes that should
solve the issue.

Having said that, OVL_BYLOWER is very broad in my opinion. Setting
OVL_ORIGIN in inode appeals more to me. This means this inode either
has copy up origin or will have copy up origin when copy up
happens. 

What do you think?

I have another proposal. All these problems seem to stem from the
fact that copy up changes some of the fields in ovl_inode and we
don't have any strict ordering for the updates.

I am wondering whether we can replace the data dependency barrier
(smp_read_barrier_depends()) with a full smp_rmb() and use that
for ovl_dentry_upper(), and do all the flag updates before
ovl_inode_update(). This should make sure that if a caller sees
upperdentry, then not only is the upper dentry stable, but any
updates to OVL_INDEX, OVL_ORIGIN and any other flags have been
done as well.

So in copy up path, we will first update OVL_INDEX, OVL_ORIGIN etc
and then call ovl_inode_update(). And on read side fetch upperdentry
first and put smp_rmb() after that. And if upperdentry is visible,
that means INDEX and ORIGIN update must be visible as well.

Copy up side
-----------
	store OVL_INDEX
	store OVL_ORIGIN
	store any other relevant flag
	smp_wmb()
	store ovl_inode->__upperdentry

Read side (ovl_getattr() and other users)
-----------------------------------------
	load ovl_inode->__upperdentry
	smp_rmb()
	load OVL_INDEX
	load OVL_ORIGIN

Now if ovl_inode->__upperdentry is set, that means not only upper dentry
is stable, it should also mean that ovl_inode->flags are stable too.
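
To make the ordering concrete, here is a minimal userspace sketch of the
same publish pattern using C11 atomics. It is only a model, not overlayfs
code: struct demo_inode, the DEMO_* bits and the demo_*() helpers are all
hypothetical names, and the release store / acquire load stand in for the
smp_wmb() / smp_rmb() pair above (they are somewhat stronger than those
barriers, which is fine for illustration).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, standing in for OVL_INDEX and OVL_ORIGIN. */
enum { DEMO_INDEX = 1u << 0, DEMO_ORIGIN = 1u << 1 };

/* Hypothetical stand-in for the two ovl_inode fields under discussion. */
struct demo_inode {
	atomic_uint flags;
	_Atomic(void *) upper;
};

void demo_init(struct demo_inode *oi)
{
	atomic_init(&oi->flags, 0);
	atomic_init(&oi->upper, NULL);
}

/* Copy up side: store all flags first, then publish the upper pointer. */
void demo_copy_up(struct demo_inode *oi, void *upper, unsigned int flags)
{
	assert(upper != NULL);
	atomic_fetch_or_explicit(&oi->flags, flags, memory_order_relaxed);
	/* Release store: the flag update above cannot be reordered after it. */
	atomic_store_explicit(&oi->upper, upper, memory_order_release);
}

/* Read side: load the upper pointer first, then the flags. */
bool demo_read(struct demo_inode *oi, void **upper, unsigned int *flags)
{
	*flags = 0;
	/* Acquire load: the flag load below cannot be reordered before it. */
	*upper = atomic_load_explicit(&oi->upper, memory_order_acquire);
	if (!*upper)
		return false;
	*flags = atomic_load_explicit(&oi->flags, memory_order_relaxed);
	return true;
}
```

With this ordering, a reader for which demo_read() returns true is
guaranteed to also observe every flag that was stored before the pointer
was published.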

If this theory of barrier operations is correct, I feel this is a better
solution. It is easy to understand at the same time it allows for
easy code extension in future.

Right now, these lockless assumptions are so difficult to understand
and so error prone, that all the new users will invariably end up
introducing races.

> 
> >
> > In fact I see OVL_INDEX being used in ovl_getattr(). If a parallel copy
> > up took place it is possible that ovl_getattr() saw upperdentry but not
> > the OVL_INDEX. And then nlink count will be off. IOW, what barrier/lock
> > guarantees that this does not happen.
> >
> 
> Yes. and that is not the only problem I see with ovl_getattr().
> 
> stat->dev could be set to ovl_get_pseudo_dev(dentry), even
> if stat->ino is not set to lowerstat.ino; that is not good.
> 
> If I am not mistaken, all the special conditions in ovl_getattr()
> to use lower st_ino are already encoded in ovl_hash_bylower(),
> so you may completely remove those extra tests if
> OVL_TYPE_ORIGIN(type) already checks the OVL_BYLOWER
> flag.
> 
> It may be hard to see my claim above is true, because, for
> example, sb->s_export_op in ovl_hash_bylower() is equivalent
> to ovl_verify_lower(dentry->d_sb) in ovl_getattr().
> 
> Please double check my claims, and if you agree, you can submit a
> patch that Fixes: ("a0c5ad307ac0 ovl: relax same fs constraint for
> constant st_ino"), which already does the right thing for metacopy.

I will need to spend some time to better understand it. I feel that
changing barrier semantics around __upperdentry will solve this
issue with existing code as well.

Vivek

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-04-10 19:29         ` Amir Goldstein
@ 2018-04-10 20:59           ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-10 20:59 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Tue, Apr 10, 2018 at 10:29:00PM +0300, Amir Goldstein wrote:
> On Tue, Apr 10, 2018 at 10:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Tue, Apr 10, 2018 at 5:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> [...]
> >>
> >> In fact I see OVL_INDEX being used in ovl_getattr(). If a parallel copy
> >> up took place it is possible that ovl_getattr() saw upperdentry but not
> >> the OVL_INDEX. And then nlink count will be off. IOW, what barrier/lock
> >> guarantees that this does not happen.
> >>
> >
> > Yes. and that is not the only problem I see with ovl_getattr().
> >
> > stat->dev could be set to ovl_get_pseudo_dev(dentry), even
> > if stat->ino is not set to lowerstat.ino; that is not good.
> >
> > If I am not mistaken, all the special conditions in ovl_getattr()
> > to use lower st_ino are already encoded in ovl_hash_bylower(),
> > so you may completely remove those extra tests if
> > OVL_TYPE_ORIGIN(type) already checks the OVL_BYLOWER
> > flag.
> >
> > It may be hard to see my claim above is true, because, for
> > example, sb->s_export_op in ovl_hash_bylower() is equivalent
> > to ovl_verify_lower(dentry->d_sb) in ovl_getattr().
> >
> > Please double check my claims, and if you agree, you can submit a
> > patch that Fixes: ("a0c5ad307ac0 ovl: relax same fs constraint for
> > constant st_ino"), which already does the right thing for metacopy.
> >
> 
> On second thought, it would be better to submit the simple fix patch
> first, so it could be applied to stable kernels that don't have
> ovl_hash_bylower() and only then submit the patch that adds
> OVL_BYLOWER and removes the extra tests in ovl_getattr().
> Something like this (untested) should do.
> Let me know if you would like me to submit the fix patch.

Frankly speaking I don't understand the fix. So I would prefer that
you submit it with a proper commit message.

Vivek

> 
> Thanks,
> Amir.
> 
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 3b1bd469accd..34fe86dedd3b 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -118,13 +118,10 @@ int ovl_getattr(const struct path *path, struct
> kstat *stat,
>                          */
>                         if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) ||
>                             (!ovl_verify_lower(dentry->d_sb) &&
> -                            (is_dir || lowerstat.nlink == 1)))
> +                            (is_dir || lowerstat.nlink == 1))) {
>                                 stat->ino = lowerstat.ino;
> -
> -                       if (samefs)
> -                               WARN_ON_ONCE(stat->dev != lowerstat.dev);
> -                       else
>                                 stat->dev = ovl_get_pseudo_dev(dentry);
> +                       }
>                 }
>                 if (samefs) {
>                         /*

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-04-10 20:51         ` Vivek Goyal
@ 2018-04-11  8:58           ` Amir Goldstein
  2018-04-11 13:31             ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-11  8:58 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Tue, Apr 10, 2018 at 11:51 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 10, 2018 at 10:20:43PM +0300, Amir Goldstein wrote:
>> On Tue, Apr 10, 2018 at 5:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Fri, Mar 30, 2018 at 12:54:14PM +0300, Amir Goldstein wrote:
>> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far
>> >> > this decision was easy that we had to check only for oe->numlower
>> >> > and if it is non-zero, we knew there is copy up origin. (For non-dir
>> >> > we installed origin dentry in lowerstack[0]).
>> >> >
>> >> > But we don't create ORIGIN xattr for broken hardlinks (index=off). And
>> >> > with metacopy feature it is possible that we will still install
>> >> > lowerstack[0]. But that's lower data dentry of metacopy upper of broken
>> >> > hardlink and ORIGIN xattr is not set.
>> >> >
>> >> > So to differentiate between the two cases, do not set OVL_PATH_ORIGIN if
>> >> > we have a broken hardlink.
>> >> >
>> >> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> >> > ---
>> >> >  fs/overlayfs/util.c | 9 ++++++++-
>> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> >> > index 29f7336ade88..961d65bd25c9 100644
>> >> > --- a/fs/overlayfs/util.c
>> >> > +++ b/fs/overlayfs/util.c
>> >> > @@ -117,7 +117,14 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
>> >> >                  * Non-dir dentry can hold lower dentry of its copy up origin.
>> >>
>> >> This comment needs updating with metacopy.
>> >>
>> >> >                  */
>> >> >                 if (oe->numlower) {
>> >> > -                       type |= __OVL_PATH_ORIGIN;
>> >> > +                       /*
>> >> > +                        * ORIGIN is created for everyting except broken
>> >> > +                        * hardlinks
>> >> > +                        */
>> >> > +                       if (!(d_inode(dentry)->i_nlink > 1 &&
>> >> > +                           !ovl_test_flag(OVL_INDEX, d_inode(dentry))))
>> >> > +                               type |= __OVL_PATH_ORIGIN;
>> >> > +
>> >>
>> >> I don't like relying on overlay nlink. it is not reliable.
>> >> And I think you missed the directory case.
>> >
>> > Directory should be fine because for directories ->i_nlink == 1 and this
>> > condition will become true.
>> >
>> >> The information you need was available during lookup
>> >> and we did not keep it (was lower verified by origin fh).
>> >> Most likely what we need to do it store OVL_ORIGIN inode flag
>> >> during lookup and then use ovl_test_flag(OVL_ORIGIN)
>> >> in place of OVL_TYPE_ORIGIN(type).
>> >>
>> >> If I am not mistaken, you could set flag OVL_ORIGIN in
>> >> ovl_get_inode() IFF (upperdentry && bylower), but to be honest
>> >> the rules became so complicated that I can't say for sure.
>> >> At least I concentrated all the rules in one helper ovl_hash_bylower(),
>> >> so I hope that helps.
>> >
>> > Ok, I can put another ovl-inode flag say OVL_ORIGIN. But that also means
>> > that I need to set the flag during copy up. And that means it brings
>> > back ordering requirements for lockless access. For example, in
>> > ovl_getattr(), if we replace OVL_TYPE_ORIGIN() with ovl_test_flag(ORIGIN),
>> > without explicit barriers, what's the guarantee a user will see OVL_ORIGIN
>> > flag yet. In fact it can happen that user might see upperdentry but not
>> > OVL_ORIGIN.
>>
>> OK, so instead set OVL_BYLOWER in ovl_get_inode() IFF (bylower).
>> that property doesn't change on copy up.
>> and set __OVL_PATH_ORIGIN in ovl_path_type() if upper AND numlower
>> AND OVL_BYLOWER.
>
> Ok, so at the time of lookup, we know if hardlink will be broken or
> not and that property does not change over copy up. So yes that should
> solve the issue.
>
> Having said that, OVL_BYLOWER is very broad in my opinion. Setting
> OVL_ORIGIN in inode appeals more to me. This means this inode either
> has copy up origin or will have copy up origin when copy up
> happens.
>
> What do you think?

I think it's better to choose a good flag name than to add complexity.
How about OVL_CONST_INO?

>
> I have another proposal. All these problems seem to stem from the
> fact that copy up changes some of the fields in ovl_inode and we
> don't have any strict ordering for the updates.
>
> I am wondering whether we can replace the data dependency barrier
> (smp_read_barrier_depends()) with a full smp_rmb() and use that
> for ovl_dentry_upper(), and do all the flag updates before
> ovl_inode_update(). This should make sure that if a caller sees
> upperdentry, then not only is the upper dentry stable, but any
> updates to OVL_INDEX, OVL_ORIGIN and any other flags have been
> done as well.
>
> So in copy up path, we will first update OVL_INDEX, OVL_ORIGIN etc
> and then call ovl_inode_update(). And on read side fetch upperdentry
> first and put smp_rmb() after that. And if upperdentry is visible,
> that means INDEX and ORIGIN update must be visible as well.
>
> Copy up side
> -----------
>         store OVL_INDEX
>         store OVL_ORIGIN
>         store any other relevant flag
>         smp_wmb()
>         store ovl_inode->__upperdentry
>
> Read side (ovl_getxattr and other users)
> -----------------------------------------
>         load ovl_inode->__upperdentry
>         smp_rmb()
>         load OVL_INDEX
>         load OVL_ORIGIN
>
> Now if ovl_inode->__upperdentry is set, that means not only upper dentry
> is stable, it should also mean that ovl_inode->flags are stable too.
>
> If this theory of barrier operations is correct, I feel this is a better
> solution. It is easy to understand at the same time it allows for
> easy code extension in future.
>
> Right now, these lockless assumptions are so difficult to understand
> and so error prone, that all the new users will invariably end up
> introducing races.
>

I see the value in your proposal, but:
1. I do not know how to measure the performance impact of this change.
2. patch 20/28 introduces another smp_wmb() in ovl_copy_up_locked()
and it is not conditional to metacopy being enabled. Is that right?
Can your proposal make that any better?

>>
>> >
>> > In fact I see OVL_INDEX being used in ovl_getattr(). If a parallel copy
>> > up took place it is possible that ovl_getattr() saw upperdentry but not
>> > the OVL_INDEX. And then nlink count will be off. IOW, what barrier/lock
>> > guarantees that this does not happen.
>> >
>>
>> Yes. and that is not the only problem I see with ovl_getattr().
>>
>> stat->dev could be set to ovl_get_pseudo_dev(dentry), even
>> if stat->ino is not set to lowerstat.ino; that is not good.
>>
>> If I am not mistaken, all the special conditions in ovl_getattr()
>> to use lower st_ino are already encoded in ovl_hash_bylower(),
>> so you may completely remove those extra tests if
>> OVL_TYPE_ORIGIN(type) already checks the OVL_BYLOWER
>> flag.
>>
>> It may be hard to see my claim above is true, because, for
>> example, sb->s_export_op in ovl_hash_bylower() is equivalent
>> to ovl_verify_lower(dentry->d_sb) in ovl_getattr().
>>
>> Please double check my claims, and if you agree, you can submit a
>> patch that Fixes: ("a0c5ad307ac0 ovl: relax same fs constraint for
>> constant st_ino"), which already does the right thing for metacopy.
>
> I will need to spend some time to better understand it. I feel that
> changing barrier semantics around __upperdentry, will solve this
> issue with existing code as well.
>

It's a simple bug, not related to barrier semantics.
I will post a fix patch.

Thanks,
Amir.

* Re: [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks
  2018-04-11  8:58           ` Amir Goldstein
@ 2018-04-11 13:31             ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-11 13:31 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Wed, Apr 11, 2018 at 11:58:54AM +0300, Amir Goldstein wrote:
> On Tue, Apr 10, 2018 at 11:51 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 10, 2018 at 10:20:43PM +0300, Amir Goldstein wrote:
> >> On Tue, Apr 10, 2018 at 5:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Fri, Mar 30, 2018 at 12:54:14PM +0300, Amir Goldstein wrote:
> >> >> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> >> > If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far
> >> >> > this decision was easy that we had to check only for oe->numlower
> >> >> > and if it is non-zero, we knew there is copy up origin. (For non-dir
> >> >> > we installed origin dentry in lowerstack[0]).
> >> >> >
> >> >> > But we don't create ORIGIN xattr for broken hardlinks (index=off). And
> >> >> > with metacopy feature it is possible that we will still install
> >> >> > lowerstack[0]. But that's lower data dentry of metacopy upper of broken
> >> >> > hardlink and ORIGIN xattr is not set.
> >> >> >
> >> >> > So to differentiate between the two cases, do not set OVL_PATH_ORIGIN if
> >> >> > we have a broken hardlink.
> >> >> >
> >> >> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >> >> > ---
> >> >> >  fs/overlayfs/util.c | 9 ++++++++-
> >> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> >> >
> >> >> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> >> >> > index 29f7336ade88..961d65bd25c9 100644
> >> >> > --- a/fs/overlayfs/util.c
> >> >> > +++ b/fs/overlayfs/util.c
> >> >> > @@ -117,7 +117,14 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry)
> >> >> >                  * Non-dir dentry can hold lower dentry of its copy up origin.
> >> >>
> >> >> This comment needs updating with metacopy.
> >> >>
> >> >> >                  */
> >> >> >                 if (oe->numlower) {
> >> >> > -                       type |= __OVL_PATH_ORIGIN;
> >> >> > +                       /*
> >> >> > +                        * ORIGIN is created for everyting except broken
> >> >> > +                        * hardlinks
> >> >> > +                        */
> >> >> > +                       if (!(d_inode(dentry)->i_nlink > 1 &&
> >> >> > +                           !ovl_test_flag(OVL_INDEX, d_inode(dentry))))
> >> >> > +                               type |= __OVL_PATH_ORIGIN;
> >> >> > +
> >> >>
> >> >> I don't like relying on overlay nlink. it is not reliable.
> >> >> And I think you missed the directory case.
> >> >
> >> > Directory should be fine because for directories ->i_nlink == 1 and this
> >> > condition will become true.
> >> >
> >> >> The information you need was available during lookup
> >> >> and we did not keep it (was lower verified by origin fh).
> >> >> Most likely what we need to do it store OVL_ORIGIN inode flag
> >> >> during lookup and then use ovl_test_flag(OVL_ORIGIN)
> >> >> in place of OVL_TYPE_ORIGIN(type).
> >> >>
> >> >> If I am not mistaken, you could set flag OVL_ORIGIN in
> >> >> ovl_get_inode() IFF (upperdentry && bylower), but to be honest
> >> >> the rules became so complicated that I can't say for sure.
> >> >> At least I concentrated all the rules in one helper ovl_hash_bylower(),
> >> >> so I hope that helps.
> >> >
> >> > Ok, I can put another ovl-inode flag say OVL_ORIGIN. But that also means
> >> > that I need to set the flag during copy up. And that means it brings
> >> > back ordering requirements for lockless access. For example, in
> >> > ovl_getattr(), if we replace OVL_TYPE_ORIGIN() with ovl_test_flag(ORIGIN),
> >> > without explicit barriers, what's the guarantee a user will see OVL_ORIGIN
> >> > flag yet. In fact it can happen that user might see upperdentry but not
> >> > OVL_ORIGIN.
> >>
> >> OK, so instead set OVL_BYLOWER in ovl_get_inode() IFF (bylower).
> >> that property doesn't change on copy up.
> >> and set __OVL_PATH_ORIGIN in ovl_path_type() if upper AND numlower
> >> AND OVL_BYLOWER.
> >
> > Ok, so at the time of lookup, we know if hardlink will be broken or
> > not and that property does not change over copy up. So yes that should
> > solve the issue.
> >
> > Having said that, OVL_BYLOWER is very broad in my opinion. Setting
> > OVL_ORIGIN in inode appeals more to me. This means this inode either
> > has copy up origin or will have copy up origin when copy up
> > happens.
> >
> > What do you think?
> 
> I think it's better to choose a good flag name than to add complexity.
> How about OVL_CONST_INO?

OVL_CONST_INO sounds better in terms of naming. So it will translate
to ORIGIN.

OVL_ORIGIN = (upperdentry && OVL_CONST_INO)
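
As a rough sketch of what I have in mind (all names here are made up for
illustration; struct demo_lookup_state, the DEMO_PATH_* bits and
demo_path_type() are not the kernel's own), the path type would be derived
from a flag that is fixed at lookup time, so copy up never has to change
it. I have kept the numlower test from your earlier formulation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type bits; the kernel's real values are not assumed here. */
enum { DEMO_PATH_UPPER = 1u << 0, DEMO_PATH_ORIGIN = 1u << 1 };

/* Hypothetical, simplified view of what lookup leaves behind. */
struct demo_lookup_state {
	bool const_ino;        /* set once at lookup, unchanged by copy up */
	const void *upper;     /* NULL until copy up has happened */
	unsigned int numlower; /* number of lower dentries installed */
};

/* ORIGIN is reported only for upper && numlower && const_ino. */
unsigned int demo_path_type(const struct demo_lookup_state *st)
{
	unsigned int type = 0;

	assert(st != NULL);
	if (st->upper) {
		type |= DEMO_PATH_UPPER;
		if (st->numlower && st->const_ino)
			type |= DEMO_PATH_ORIGIN;
	}
	return type;
}
```

A metacopy upper of a broken hardlink would have numlower != 0 but
const_ino unset, so it would get the UPPER bit without ORIGIN.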

> 
> >
> > I have another proposal. All these problems seem to stem from the
> > fact that copy up changes some of the fields in ovl_inode and we
> > don't have any strict ordering for the updates.
> >
> > I am wondering whether we can replace the data dependency barrier
> > (smp_read_barrier_depends()) with a full smp_rmb() and use that
> > for ovl_dentry_upper(), and do all the flag updates before
> > ovl_inode_update(). This should make sure that if a caller sees
> > upperdentry, then not only is the upper dentry stable, but any
> > updates to OVL_INDEX, OVL_ORIGIN and any other flags have been
> > done as well.
> >
> > So in copy up path, we will first update OVL_INDEX, OVL_ORIGIN etc
> > and then call ovl_inode_update(). And on read side fetch upperdentry
> > first and put smp_rmb() after that. And if upperdentry is visible,
> > that means INDEX and ORIGIN update must be visible as well.
> >
> > Copy up side
> > -----------
> >         store OVL_INDEX
> >         store OVL_ORIGIN
> >         store any other relevant flag
> >         smp_wmb()
> >         store ovl_inode->__upperdentry
> >
> > Read side (ovl_getxattr and other users)
> > -----------------------------------------
> >         load ovl_inode->__upperdentry
> >         smp_rmb()
> >         load OVL_INDEX
> >         load OVL_ORIGIN
> >
> > Now if ovl_inode->__upperdentry is set, that means not only upper dentry
> > is stable, it should also mean that ovl_inode->flags are stable too.
> >
> > If this theory of barrier operations is correct, I feel this is a better
> > solution. It is easy to understand at the same time it allows for
> > easy code extension in future.
> >
> > Right now, these lockless assumptions are so difficult to understand
> > and so error prone, that all the new users will invariably end up
> > introducing races.
> >
> 
> I see the value in your proposal, but:
> 1. I do not know how to measure the performance impact of this change.
> 2. patch 20/28 introduces another smp_wmb() in ovl_copy_up_locked()
> and it is not conditional to metacopy being enabled. Is that right?
> Can your proposal make that any better?

I am thinking that for now I will leave it as it is because OVL_CONST_INO
probably will solve the immediate issue. I will probably take up this
proposal in a separate patch series once we are done merging a basic
metacopy patch series.

Thanks
Vivek

> 
> >>
> >> >
> >> > In fact I see OVL_INDEX being used in ovl_getattr(). If a parallel copy
> >> > up took place it is possible that ovl_getattr() saw upperdentry but not
> >> > the OVL_INDEX. And then nlink count will be off. IOW, what barrier/lock
> >> > guarantees that this does not happen.
> >> >
> >>
> >> Yes. and that is not the only problem I see with ovl_getattr().
> >>
> >> stat->dev could be set to ovl_get_pseudo_dev(dentry), even
> >> if stat->ino is not set to lowerstat.ino; that is not good.
> >>
> >> If I am not mistaken, all the special conditions in ovl_getattr()
> >> to use lower st_ino are already encoded in ovl_hash_bylower(),
> >> so you may completely remove those extra tests if
> >> OVL_TYPE_ORIGIN(type) already checks the OVL_BYLOWER
> >> flag.
> >>
> >> It may be hard to see my claim above is true, because, for
> >> example, sb->s_export_op in ovl_hash_bylower() is equivalent
> >> to ovl_verify_lower(dentry->d_sb) in ovl_getattr().
> >>
> >> Please double check my claims, and if you agree, you can submit a
> >> patch that Fixes: ("a0c5ad307ac0 ovl: relax same fs constraint for
> >> constant st_ino"), which already does the right thing for metacopy.
> >
> > I will need to spend some time to better understand it. I feel that
> > changing barrier semantics around __upperdentry, will solve this
> > issue with existing code as well.
> >
> 
> It's a simple bug, not related to barrier semantics.
> I will post a fix patch.
> 
> Thanks,
> Amir.

* Re: [PATCH v13 09/28] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2018-03-29 19:38 ` [PATCH v13 09/28] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
@ 2018-04-11 15:10   ` Amir Goldstein
  2018-04-11 15:53     ` Vivek Goyal
  0 siblings, 1 reply; 91+ messages in thread
From: Amir Goldstein @ 2018-04-11 15:10 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Now we will have the capability to have upper inodes which might be only
> metadata copy up and data is still on lower inode. So add a new xattr
> OVL_XATTR_METACOPY to distinguish between two cases.
>
> Presence of OVL_XATTR_METACOPY reflects that file has been copied up
> metadata only and data will be copied up later from lower origin.
> So this xattr is set when a metadata copy takes place and cleared when
> data copy takes place.
>
> We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
> whether ovl inode has data or not (as opposed to metadata only copy up).
>
> If a file is copied up metadata only and later when same file is opened
> for WRITE, then data copy up takes place. We copy up data, remove METACOPY
> xattr and then set the UPPERDATA flag in ovl_inode->flags. While all
> these operations happen with oi->lock held, read side of oi->flags can be
> lockless. That is another thread on another cpu can check if UPPERDATA
> flag is set or not.
>
> So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
> another cpu sees UPPERDATA flag set, then it should be guaranteed that
> effects of data copy up and remove xattr operations are also visible.
>
> For example.
>
>         CPU1                            CPU2
> ovl_d_real()                            acquire(oi->lock)
>  ovl_open_maybe_copy_up()                ovl_copy_up_data()
>   ovl_open_need_copy_up()                vfs_removexattr()
>    ovl_already_copied_up()
>     ovl_dentry_needs_data_copy_up()      ovl_set_flag(OVL_UPPERDATA)
>      ovl_test_flag(OVL_UPPERDATA)       release(oi->lock)
>
> Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
> CPU1 perceives the effects of setting UPPERDATA flag but not the effects
> of preceding operations (e.g. upper that is not fully copied up), it will be
> a problem.
>
> Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
> and smp_rmb() on UPPERDATA flag test operation.
>
> Maybe some other lock or barrier is already covering it. But I am not sure
> what that is, or whether it is obvious enough that we will not break it in future.
>
> So hence trying to be safe here and introducing barriers explicitly for
> UPPERDATA flag/bit.
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/copy_up.c   | 56 ++++++++++++++++++++++++++++++----
>  fs/overlayfs/dir.c       |  1 +
>  fs/overlayfs/overlayfs.h | 18 +++++++++--
>  fs/overlayfs/super.c     |  1 +
>  fs/overlayfs/util.c      | 78 +++++++++++++++++++++++++++++++++++++++++++++---
>  5 files changed, 143 insertions(+), 11 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 8d9af7fdc8a4..9801ae7baa5d 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -195,6 +195,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>         return error;
>  }
>
> +static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
> +{
> +       struct iattr attr = {
> +               .ia_valid = ATTR_SIZE,
> +               .ia_size = stat->size,
> +       };
> +
> +       return notify_change(upperdentry, &attr, NULL);
> +}
> +
>  static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
>  {
>         struct iattr attr = {
> @@ -586,8 +596,18 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
>                         return err;
>         }
>
> +       if (c->metacopy) {
> +               err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
> +                                        NULL, 0, -EOPNOTSUPP);
> +               if (err)
> +                       return err;
> +       }
> +
>         inode_lock(temp->d_inode);
> -       err = ovl_set_attr(temp, &c->stat);
> +       if (c->metacopy)
> +               err = ovl_set_size(temp, &c->stat);
> +       if (!err)
> +               err = ovl_set_attr(temp, &c->stat);
>         inode_unlock(temp->d_inode);
>
>         return err;
> @@ -625,6 +645,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
>         if (err)
>                 goto out_cleanup;
>
> +       if (!c->metacopy)
> +               ovl_set_upperdata(d_inode(c->dentry));
>         inode = d_inode(c->dentry);
>         ovl_inode_update(inode, newdentry);

Following discussion on patch 20/28, I think this should be
    if (!c->metacopy)
            ovl_set_flag(OVL_UPPERDATA, inode);

without the memory barrier, because all the places that
check ovl_has_upperdata check upperdentry first, so the
smp_wmb() in ovl_inode_update() is sufficient and the extra
wmb is really only needed in ovl_copy_up_meta_inode_data().

Right?
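
To illustrate the case where I think the extra barrier is still needed
(data copy up on an inode whose upper is already visible, as in
ovl_copy_up_meta_inode_data()), here is a small userspace model with C11
atomics. The names struct demo_file, DEMO_UPPERDATA and the demo_*()
helpers are hypothetical, and release/acquire merely stand in for the
smp_wmb()/smp_rmb() pair. Here the flag itself is the publication point,
so the ordering has to be attached to the flag:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_UPPERDATA = 1u << 0 };

/* Hypothetical model of an upper file that may not have its data yet. */
struct demo_file {
	atomic_uint flags;
	int data; /* stands in for the copied-up data; plain, non-atomic */
};

void demo_file_init(struct demo_file *f)
{
	atomic_init(&f->flags, 0);
	f->data = 0;
}

/* Data copy up: write the data, then set the flag with release ordering. */
void demo_copy_up_data(struct demo_file *f, int data)
{
	assert(f != NULL);
	f->data = data;
	atomic_fetch_or_explicit(&f->flags, DEMO_UPPERDATA,
				 memory_order_release);
}

/* Flag test with acquire ordering, pairing with the release above. */
bool demo_has_upperdata(struct demo_file *f)
{
	return atomic_load_explicit(&f->flags, memory_order_acquire) &
	       DEMO_UPPERDATA;
}

/* Reads the data only once the flag says it is fully copied up. */
bool demo_read_data(struct demo_file *f, int *out)
{
	if (!demo_has_upperdata(f))
		return false;
	*out = f->data;
	return true;
}
```

In ovl_copy_up_locked() the flag is stored before upperdentry is
published, so the barrier that publishes upperdentry already covers it.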

Amir.

* Re: [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename
  2018-03-30  7:31   ` Amir Goldstein
@ 2018-04-11 15:12     ` Vivek Goyal
  2018-04-11 17:01       ` Amir Goldstein
  0 siblings, 1 reply; 91+ messages in thread
From: Vivek Goyal @ 2018-04-11 15:12 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 10:31:46AM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Set redirect on metacopy files upon rename. This will help find data dentry
> > in lower dirs.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/dir.c | 50 +++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 37 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index 3ea052b6bac7..7c0a02d9f6bd 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -968,6 +968,27 @@ static void ovl_rename_unlock_ovl_inodes(struct dentry *old, struct dentry *new,
> >                 mutex_unlock(&OVL_I(d_inode(new))->lock);
> >  }
> >
> > +static bool ovl_relative_redirect(struct dentry *dentry, bool samedir)
> > +{
> > +       if (d_is_dir(dentry))
> > +               return samedir;
> > +
> > +       /*
> > +        * For non-dir hardlinked files, we need absolute redirects
> > +        * in general as two upper hardlinks could be in different
> > +        * dirs. We could put a relative redirect now and convert
> > +        * it to absolute redirect later. But when nlink > 1 and
> > +        * indexing is on, that means relative redirect needs to be
> > +        * converted to absolute during copy up of another lower
> > +        * hardllink as well.
> > +        *
> > +        * So without optimizing too much, just check if non-dir
> > +        * has nlink > 1 or not. If yes, set absolute redirect to
> > +        * begin with.
> > +        */
> > +       return (d_inode(dentry)->i_nlink > 1 ? false : samedir);
> 
> I don't think this is wrong, but I don't like relying on the overlay inode
> nlink. I wonder if you should separate the case of indexed and
> non-indexed inode.

Hi Amir,

So if the inode is indexed, use an absolute redirect, otherwise use a
relative redirect? But how do we reliably get the information whether
the inode is indexed or not? I mean, OVL_INDEX has memory ordering
issues w.r.t. copy up, so that's untrusted as well.

How about if I check for nlink of lowerdentry (instead of ovl dentry).
Will that be fine? 
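
Roughly, I am thinking of something like the following. This is only a
simplified sketch outside the kernel: struct demo_dentry, its lower_nlink
field and demo_relative_redirect() are hypothetical stand-ins, and how the
lower nlink would actually be fetched is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the dentry being renamed. */
struct demo_dentry {
	bool is_dir;
	unsigned int lower_nlink; /* nlink of the lower inode, not the overlay one */
};

/* Returns true if a relative redirect is enough, false for absolute. */
bool demo_relative_redirect(const struct demo_dentry *d, bool samedir)
{
	assert(d != NULL);
	if (d->is_dir)
		return samedir;
	/* Lower hardlinks may be copied up into different dirs: go absolute. */
	if (d->lower_nlink > 1)
		return false;
	return samedir;
}
```

The lower nlink does not change because of copy up in the overlay, so
this test would not depend on OVL_INDEX or on the overlay inode nlink.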

Vivek


> 
> > +}
> > +
> >  static int ovl_rename(struct inode *olddir, struct dentry *old,
> >                       struct inode *newdir, struct dentry *new,
> >                       unsigned int flags)
> > @@ -1131,22 +1152,25 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
> >                 goto out_dput;
> >
> >         err = 0;
> > -       if (is_dir) {
> > -               if (ovl_type_merge_or_lower(old))
> > -                       err = ovl_set_redirect(old, samedir);
> > -               else if (!old_opaque && ovl_type_merge(new->d_parent))
> > -                       err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
> > -               if (err)
> > -                       goto out_dput;
> > -       }
> > -       if (!overwrite && new_is_dir) {
> > +       if (ovl_type_merge_or_lower(old))
> > +               err = ovl_set_redirect(old,
> > +                                      ovl_relative_redirect(old, samedir));
> > +       else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
> > +               err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
> > +
> > +       if (err)
> > +               goto out_dput;
> > +
> > +       if (!overwrite) {
> >                 if (ovl_type_merge_or_lower(new))
> > -                       err = ovl_set_redirect(new, samedir);
> > -               else if (!new_opaque && ovl_type_merge(old->d_parent))
> > +                       err = ovl_set_redirect(new, ovl_relative_redirect(new,
> > +                                              samedir));
> > +               else if (new_is_dir && !new_opaque &&
> > +                        ovl_type_merge(old->d_parent))
> >                         err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
> > -               if (err)
> > -                       goto out_dput;
> >         }
> > +       if (err)
> > +               goto out_dput;
> >
> >         err = ovl_do_rename(old_upperdir->d_inode, olddentry,
> >                             new_upperdir->d_inode, newdentry, flags);
> > --
> > 2.13.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v13 09/28] ovl: A new xattr OVL_XATTR_METACOPY for file on upper
  2018-04-11 15:10   ` Amir Goldstein
@ 2018-04-11 15:53     ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-11 15:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Wed, Apr 11, 2018 at 06:10:03PM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Now we will have the capability to have upper inodes which might be only
> > metadata copy up and data is still on lower inode. So add a new xattr
> > OVL_XATTR_METACOPY to distinguish between two cases.
> >
> > Presence of OVL_XATTR_METACOPY reflects that file has been copied up
> > metadata only and data will be copied up later from lower origin.
> > So this xattr is set when a metadata copy takes place and cleared when
> > data copy takes place.
> >
> > We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
> > whether ovl inode has data or not (as opposed to metadata only copy up).
> >
> > If a file is copied up metadata only and later when same file is opened
> > for WRITE, then data copy up takes place. We copy up data, remove METACOPY
> > xattr and then set the UPPERDATA flag in ovl_inode->flags. While all
> > these operations happen with oi->lock held, read side of oi->flags can be
> > lockless. That is another thread on another cpu can check if UPPERDATA
> > flag is set or not.
> >
> > So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
> > another cpu sees UPPERDATA flag set, then it should be guaranteed that
> > effects of data copy up and remove xattr operations are also visible.
> >
> > For example.
> >
> >         CPU1                            CPU2
> > ovl_d_real()                            acquire(oi->lock)
> >  ovl_open_maybe_copy_up()                ovl_copy_up_data()
> >   ovl_open_need_copy_up()                vfs_removexattr()
> >    ovl_already_copied_up()
> >     ovl_dentry_needs_data_copy_up()      ovl_set_flag(OVL_UPPERDATA)
> >      ovl_test_flag(OVL_UPPERDATA)       release(oi->lock)
> >
> > Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
> > CPU1 perceives the effects of setting UPPERDATA flag but not the effects
> > of preceding operations (e.g. an upper that is not fully copied up), it
> > will be a problem.
> >
> > Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
> > and smp_rmb() on UPPERDATA flag test operation.
> >
> > Maybe some other lock or barrier is already covering it. But I am not sure
> > what that is, or whether it is obvious enough that we will not break it in
> > the future.
> >
> > So hence trying to be safe here and introducing barriers explicitly for
> > UPPERDATA flag/bit.
> >
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/copy_up.c   | 56 ++++++++++++++++++++++++++++++----
> >  fs/overlayfs/dir.c       |  1 +
> >  fs/overlayfs/overlayfs.h | 18 +++++++++--
> >  fs/overlayfs/super.c     |  1 +
> >  fs/overlayfs/util.c      | 78 +++++++++++++++++++++++++++++++++++++++++++++---
> >  5 files changed, 143 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 8d9af7fdc8a4..9801ae7baa5d 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -195,6 +195,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
> >         return error;
> >  }
> >
> > +static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
> > +{
> > +       struct iattr attr = {
> > +               .ia_valid = ATTR_SIZE,
> > +               .ia_size = stat->size,
> > +       };
> > +
> > +       return notify_change(upperdentry, &attr, NULL);
> > +}
> > +
> >  static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
> >  {
> >         struct iattr attr = {
> > @@ -586,8 +596,18 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
> >                         return err;
> >         }
> >
> > +       if (c->metacopy) {
> > +               err = ovl_check_setxattr(c->dentry, temp, OVL_XATTR_METACOPY,
> > +                                        NULL, 0, -EOPNOTSUPP);
> > +               if (err)
> > +                       return err;
> > +       }
> > +
> >         inode_lock(temp->d_inode);
> > -       err = ovl_set_attr(temp, &c->stat);
> > +       if (c->metacopy)
> > +               err = ovl_set_size(temp, &c->stat);
> > +       if (!err)
> > +               err = ovl_set_attr(temp, &c->stat);
> >         inode_unlock(temp->d_inode);
> >
> >         return err;
> > @@ -625,6 +645,8 @@ static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c)
> >         if (err)
> >                 goto out_cleanup;
> >
> > +       if (!c->metacopy)
> > +               ovl_set_upperdata(d_inode(c->dentry));
> >         inode = d_inode(c->dentry);
> >         ovl_inode_update(inode, newdentry);
> 
> Following discussion on patch 20/28, I think this should be
>     if (!c->metacopy)
>             ovl_set_flag(OVL_UPPERDATA, inode);
> 
> without the memory barrier, because all the places that
> check ovl_has_upperdata check upperdentry first, so the
> smp_wmb() in ovl_inode_update() is sufficient and the extra
> wmb is really only needed in ovl_copy_up_meta_inode_data().
> 
> Right?

Maybe. I am not sure. We will need the help of a barrier expert to
confirm that we are understanding it right. :-) I did not see this
pattern directly mentioned in memory-barriers.txt though.

For now, I would like to stick to the existing implementation and
look at all barrier related optimizations in a separate, much smaller
patch series.
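
For reference, the ordering requirement under discussion can be modelled
in userspace with C11 atomics. This is only an analogy, assuming a
release store plays the role of smp_wmb() before setting the flag and an
acquire load plays the role of smp_rmb() after testing it. struct
model_inode and the model_* functions are hypothetical names, and none of
this is the overlayfs implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_inode {
	int data;              /* stands in for the copied-up file data */
	atomic_bool upperdata; /* stands in for the OVL_UPPERDATA bit */
};

/* Writer: finish the "data copy up" first, then publish the flag. */
static void *model_copy_up(void *arg)
{
	struct model_inode *mi = arg;

	mi->data = 42;
	/* Release: the store to data is visible before the flag is. */
	atomic_store_explicit(&mi->upperdata, true, memory_order_release);
	return NULL;
}

/* Lockless reader: wait for the flag, then it is safe to read data. */
int model_read_after_flag(struct model_inode *mi)
{
	/* Acquire: pairs with the release store in model_copy_up(). */
	while (!atomic_load_explicit(&mi->upperdata, memory_order_acquire))
		;
	return mi->data;
}

/* Run one writer thread against a reader on the calling thread. */
int model_run(void)
{
	struct model_inode mi = { .data = 0 };
	pthread_t writer;
	int seen;

	atomic_init(&mi.upperdata, false);
	if (pthread_create(&writer, NULL, model_copy_up, &mi) != 0)
		return -1;

	seen = model_read_after_flag(&mi);
	pthread_join(writer, NULL);
	return seen;
}
```

If the reader observes the flag, the release/acquire pairing guarantees
it also observes the data written before the flag was set. Whether an
existing barrier in ovl_inode_update() already gives the same guarantee
for every reader is exactly the open question above.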

Vivek


* Re: [PATCH v13 22/28] ovl: Set redirect on upper inode when it is linked
  2018-03-30  7:04   ` Amir Goldstein
@ 2018-04-11 15:59     ` Vivek Goyal
  0 siblings, 0 replies; 91+ messages in thread
From: Vivek Goyal @ 2018-04-11 15:59 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, Miklos Szeredi

On Fri, Mar 30, 2018 at 10:04:40AM +0300, Amir Goldstein wrote:
> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > When we create a hardlink to a metacopy upper file, first set the redirect
> > on that inode. Path based lookup will not work with the newly created link
> > and the redirect will solve that issue.
> >
> > Also use an absolute redirect, as two hardlinks could be in different
> > directories and a relative redirect will not work.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/dir.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index 7c0a02d9f6bd..ccbe061fc4ba 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -24,6 +24,8 @@ module_param_named(redirect_max, ovl_redirect_max, ushort, 0644);
> >  MODULE_PARM_DESC(ovl_redirect_max,
> >                  "Maximum length of absolute redirect xattr value");
> >
> > +static int ovl_set_redirect(struct dentry *dentry, bool samedir);
> > +
> >  int ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
> >  {
> >         int err;
> > @@ -468,6 +470,9 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
> >         const struct cred *old_cred;
> >         struct cred *override_cred;
> >         struct dentry *parent = dentry->d_parent;
> > +       struct dentry *hardlink_upper;
> > +
> > +       hardlink_upper = hardlink ? ovl_dentry_upper(hardlink) : NULL;
> >
> >         err = ovl_copy_up(parent);
> >         if (err)
> > @@ -502,12 +507,18 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
> >                 put_cred(override_creds(override_cred));
> >                 put_cred(override_cred);
> >
> > +               if (hardlink && ovl_is_metacopy_dentry(hardlink)) {
> > +                       err = ovl_set_redirect(hardlink, false);
> 
> Like with rename, redirect could be set much sooner in ovl_link()
> if all the locks were contained within ovl_set_redirect().
> I think code will be simpler overall, but can't say for sure...

For now, I am not taking any locks in ovl_set_redirect(). I think VFS
locking along with d_lock is sufficient for the current use case.

So leaving it untouched for now.

Vivek
> 
> 
> > +                       if (err)
> > +                               goto out_revert_creds;
> > +               }
> > +
> >                 if (!ovl_dentry_is_whiteout(dentry))
> >                         err = ovl_create_upper(dentry, inode, attr,
> > -                                               hardlink);
> > +                                              hardlink_upper);
> >                 else
> >                         err = ovl_create_over_whiteout(dentry, inode, attr,
> > -                                                       hardlink);
> > +                                                      hardlink_upper);
> >         }
> >  out_revert_creds:
> >         revert_creds(old_cred);
> > @@ -606,8 +617,7 @@ static int ovl_link(struct dentry *old, struct inode *newdir,
> >         inode = d_inode(old);
> >         ihold(inode);
> >
> > -       err = ovl_create_or_link(new, inode, NULL, ovl_dentry_upper(old),
> > -                                ovl_type_origin(old));
> > +       err = ovl_create_or_link(new, inode, NULL, old, ovl_type_origin(old));
> >         if (err)
> >                 iput(inode);
> >
> > --
> > 2.13.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename
  2018-04-11 15:12     ` Vivek Goyal
@ 2018-04-11 17:01       ` Amir Goldstein
  0 siblings, 0 replies; 91+ messages in thread
From: Amir Goldstein @ 2018-04-11 17:01 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: overlayfs, Miklos Szeredi

On Wed, Apr 11, 2018 at 6:12 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 10:31:46AM +0300, Amir Goldstein wrote:
>> On Thu, Mar 29, 2018 at 10:38 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > Set redirect on metacopy files upon rename. This will help find data dentry
>> > in lower dirs.
>> >
>> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> > ---
>> >  fs/overlayfs/dir.c | 50 +++++++++++++++++++++++++++++++++++++-------------
>> >  1 file changed, 37 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
>> > index 3ea052b6bac7..7c0a02d9f6bd 100644
>> > --- a/fs/overlayfs/dir.c
>> > +++ b/fs/overlayfs/dir.c
>> > @@ -968,6 +968,27 @@ static void ovl_rename_unlock_ovl_inodes(struct dentry *old, struct dentry *new,
>> >                 mutex_unlock(&OVL_I(d_inode(new))->lock);
>> >  }
>> >
>> > +static bool ovl_relative_redirect(struct dentry *dentry, bool samedir)
>> > +{
>> > +       if (d_is_dir(dentry))
>> > +               return samedir;
>> > +
>> > +       /*
>> > +        * For non-dir hardlinked files, we need absolute redirects
>> > +        * in general as two upper hardlinks could be in different
>> > +        * dirs. We could put a relative redirect now and convert
>> > +        * it to absolute redirect later. But when nlink > 1 and
>> > +        * indexing is on, that means relative redirect needs to be
>> > +        * converted to absolute during copy up of another lower
>> > +        * hardlink as well.
>> > +        *
>> > +        * So without optimizing too much, just check if non-dir
>> > +        * has nlink > 1 or not. If yes, set absolute redirect to
>> > +        * begin with.
>> > +        */
>> > +       return (d_inode(dentry)->i_nlink > 1 ? false : samedir);
>>
>> I don't think this is wrong, but I don't like relying on the overlay inode
>> nlink. I wonder if you should separate the case of indexed and
>> non-indexed inode.
>
> Hi Amir,
>
> So if the inode is indexed, use an absolute redirect, otherwise use a
> relative redirect? But how do we reliably find out whether the inode is
> indexed or not? I mean, OVL_INDEX has memory ordering issues w.r.t.
> copy up, so that's untrusted as well.
>
> How about if I check the nlink of the lower dentry (instead of the ovl
> dentry)? Will that be fine?
>

I guess that would be fine.
Maybe nicer if we set OVL_LOWER_HARDLINK in lookup.
But for now, let's just say I did not NACK your patch
and maybe there is no good reason to doubt the reliability of
overlay inode nlink, so unless Miklos feels otherwise, let's postpone this
discussion for a later time.

Cheers,
Amir.


end of thread, other threads:[~2018-04-11 17:01 UTC | newest]

Thread overview: 91+ messages
2018-03-29 19:38 [PATCH v13 00/28] overlayfs: Delayed copy up of data Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 01/28] ovl: Set OVL_INDEX flag in ovl_get_inode() Vivek Goyal
2018-03-30  4:59   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 02/28] ovl: Initialize ovl_inode->redirect " Vivek Goyal
2018-03-30  4:57   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 03/28] ovl: Rename local variable locked to new_locked Vivek Goyal
2018-03-30  4:58   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 04/28] ovl: Provide a mount option metacopy=on/off for metadata copyup Vivek Goyal
2018-03-30  4:52   ` Amir Goldstein
2018-04-02 13:56     ` Vivek Goyal
2018-04-05 20:16       ` Amir Goldstein
2018-04-06 13:51         ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 05/28] ovl: During copy up, first copy up metadata and then data Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 06/28] ovl: Move the copy up helpers to copy_up.c Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 07/28] ovl: Copy up only metadata during copy up where it makes sense Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 08/28] ovl: Add helper ovl_already_copied_up() Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 09/28] ovl: A new xattr OVL_XATTR_METACOPY for file on upper Vivek Goyal
2018-04-11 15:10   ` Amir Goldstein
2018-04-11 15:53     ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 10/28] ovl: Modify ovl_lookup() and friends to lookup metacopy dentry Vivek Goyal
2018-03-30  5:49   ` Amir Goldstein
2018-03-30  9:12     ` Amir Goldstein
2018-04-02 19:45       ` Vivek Goyal
2018-04-02 20:07         ` Amir Goldstein
2018-04-02 15:06     ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 11/28] ovl: Copy up meta inode data from lowest data inode Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 12/28] ovl: Fix ovl_getattr() to get number of blocks from lower Vivek Goyal
2018-03-30  9:24   ` Amir Goldstein
2018-04-02 20:11     ` Vivek Goyal
2018-04-02 20:27       ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 13/28] ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry Vivek Goyal
2018-03-30  6:01   ` Amir Goldstein
2018-04-02 15:08     ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 14/28] ovl: Do not expose metacopy only dentry from d_real() Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 15/28] ovl: Move some of ovl_nlink_start() functionality in ovl_nlink_prep() Vivek Goyal
2018-03-30  6:23   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 16/28] ovl: Create locked version of ovl_nlink_start() and ovl_nlink_end() Vivek Goyal
2018-03-30  6:28   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 17/28] ovl: During rename lock both source and target ovl_inode Vivek Goyal
2018-03-30  6:50   ` Amir Goldstein
2018-04-02 17:34     ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 18/28] ovl: Check redirects for metacopy files Vivek Goyal
2018-03-30 10:02   ` Amir Goldstein
2018-04-02 20:29     ` Vivek Goyal
2018-04-03  5:44       ` Amir Goldstein
2018-04-03 12:31         ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 19/28] ovl: Treat metacopy dentries as type OVL_PATH_MERGE Vivek Goyal
2018-03-30  6:52   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 20/28] ovl: Do not set dentry type ORIGIN for broken hardlinks Vivek Goyal
2018-03-30  9:54   ` Amir Goldstein
2018-04-10 14:00     ` Vivek Goyal
2018-04-10 19:20       ` Amir Goldstein
2018-04-10 19:29         ` Amir Goldstein
2018-04-10 20:59           ` Vivek Goyal
2018-04-10 20:51         ` Vivek Goyal
2018-04-11  8:58           ` Amir Goldstein
2018-04-11 13:31             ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 21/28] ovl: Set redirect on metacopy files upon rename Vivek Goyal
2018-03-30  7:31   ` Amir Goldstein
2018-04-11 15:12     ` Vivek Goyal
2018-04-11 17:01       ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 22/28] ovl: Set redirect on upper inode when it is linked Vivek Goyal
2018-03-30  7:04   ` Amir Goldstein
2018-04-11 15:59     ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 23/28] ovl: Remove redirect when data of a metacopy file is copied up Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 24/28] ovl: Do not error if REDIRECT XATTR is missing Vivek Goyal
2018-03-30  7:41   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 25/28] ovl: Use out_err insteada of out_nomem Vivek Goyal
2018-03-30  7:35   ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 26/28] ovl: Re-check redirect xattr during inode initialization Vivek Goyal
2018-03-30  8:56   ` Amir Goldstein
2018-04-02 19:35     ` Vivek Goyal
2018-04-02 20:25       ` Amir Goldstein
2018-03-29 19:38 ` [PATCH v13 27/28] ovl: Verify a data dentry has been found for metacopy inode Vivek Goyal
2018-03-30 10:53   ` Amir Goldstein
2018-04-02 12:39     ` Vivek Goyal
2018-04-04 12:29     ` Vivek Goyal
2018-04-04 12:51       ` Amir Goldstein
2018-04-04 13:21         ` Vivek Goyal
2018-04-04 15:51           ` Amir Goldstein
2018-04-05 14:37             ` Vivek Goyal
2018-04-05 18:22               ` Vivek Goyal
2018-04-05 19:58                 ` Amir Goldstein
2018-04-05 20:45                   ` Vivek Goyal
2018-04-06  9:46                     ` Amir Goldstein
2018-04-06 15:37                       ` Vivek Goyal
2018-04-06 16:21                         ` Amir Goldstein
2018-04-06 17:32                           ` Vivek Goyal
2018-04-06 20:10                             ` Amir Goldstein
2018-04-09 12:18                               ` Vivek Goyal
2018-03-29 19:38 ` [PATCH v13 28/28] ovl: Enable metadata only feature Vivek Goyal
