linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] overlayfs: C/R enhancements
@ 2020-06-04 16:11 Alexander Mikhalitsyn
  2020-06-04 16:11 ` [PATCH 1/2] overlayfs: add dynamic path resolving in mount options Alexander Mikhalitsyn
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2020-06-04 16:11 UTC (permalink / raw)
  To: miklos
  Cc: avagin, ptikhomirov, khorenko, vvs, ktkhai,
	Alexander Mikhalitsyn, linux-unionfs, linux-kernel

This patchset aimed to make C/R of overlayfs mounts with CRIU possible.
We introduce two new overlayfs module options -- dyn_path_opts and
mnt_id_path_opts. If enabled this options allows to see real *full* paths
in lowerdir, workdir, upperdir options, and also mnt_ids for corresponding
paths.

This changes should not break anything because for showing mnt_ids we simply
introduce new show-time mount options. And for paths we simply *always*
provide *full paths* instead of relative path on mountinfo.

BEFORE

overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
/overlay2/XYZ/work)
none on /sys type sysfs (rw,relatime)

AFTER

overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
/overlay2/XYZ/work,lowerdir_mnt_id=175:175,upperdir_mnt_id=175)
none on /sys type sysfs (rw,relatime)

Alexander Mikhalitsyn (2):
  overlayfs: add dynamic path resolving in mount options
  overlayfs: add mnt_id paths options

 fs/overlayfs/Kconfig     |  57 ++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |   7 +++
 fs/overlayfs/ovl_entry.h |   6 ++-
 fs/overlayfs/super.c     | 103 ++++++++++++++++++++++++---------------
 fs/overlayfs/util.c      |  42 ++++++++++++++++
 5 files changed, 174 insertions(+), 41 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] overlayfs: add dynamic path resolving in mount options
  2020-06-04 16:11 [PATCH 0/2] overlayfs: C/R enhancements Alexander Mikhalitsyn
@ 2020-06-04 16:11 ` Alexander Mikhalitsyn
  2020-06-04 16:11 ` [PATCH 2/2] overlayfs: add mnt_id paths options Alexander Mikhalitsyn
  2020-06-04 18:04 ` [PATCH 0/2] overlayfs: C/R enhancements Amir Goldstein
  2 siblings, 0 replies; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2020-06-04 16:11 UTC (permalink / raw)
  To: miklos
  Cc: avagin, ptikhomirov, khorenko, vvs, ktkhai,
	Alexander Mikhalitsyn, linux-unionfs, linux-kernel

This patch adds OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS
compile-time option, and "dyn_path_opts" runtime module option.
These options corresponds "dynamic path resolving in lowerdir,
upperdir, workdir mount options" mode. If enabled, user may see
real full paths relatively to the mount namespace in lowerdir,
upperdir, workdir options (/proc/mounts, /proc/<fd>/mountinfo).

This patch is very helpful to checkpoint/restore functionality
of overlayfs mounts. With this patch and CRIU it's real to C/R
Docker containers with overlayfs storage driver.

Note: d_path function from dcache.c is used to resolve full path
in mount namespace. This function also adds "(deleted)" suffix
if dentry was deleted. So, If one of dentries in lowerdir, upperdir,
workdir options is deleted, we will see "(deleted)" suffix in
corresponding path.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 fs/overlayfs/Kconfig     | 31 ++++++++++++++
 fs/overlayfs/overlayfs.h |  5 +++
 fs/overlayfs/ovl_entry.h |  6 ++-
 fs/overlayfs/super.c     | 88 ++++++++++++++++++++++------------------
 fs/overlayfs/util.c      | 21 ++++++++++
 5 files changed, 110 insertions(+), 41 deletions(-)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index dd188c7996b3..c24988527ef3 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -124,3 +124,34 @@ config OVERLAY_FS_METACOPY
 	  that doesn't support this feature will have unexpected results.
 
 	  If unsure, say N.
+
+config OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS
+	bool "Overlayfs: all mount paths options resolves dynamically on options show"
+	default y
+	depends on OVERLAY_FS
+	help
+	  This option helps checkpoint/restore of overlayfs mounts.
+	  If N selected, old behavior is saved. In this case lowerdir, upperdir,
+	  workdir options shows in /proc/fd/mountinfo, /proc/mounts as it given
+	  by user on mount. User may specify relative paths in these options, then
+	  we couldn't determine from options which full paths correspond these
+	  relative paths. Also, after pivot_root syscall these paths (even full)
+	  will not rebuild according to root change.
+
+	  If this config option is enabled then overlay filesystems lowerdir, upperdir,
+	  workdir options paths will dynamically recalculated as full paths in corresponding
+	  mount namespaces by default.
+
+	  It's also possible to change this behavior on overlayfs module loading or
+	  through sysfs (dyn_path_opts parameter).
+
+	  Disable this to get a backward compatible with previous kernels configuration,
+	  but in this case checkpoint/restore functionality for overlayfs mounts
+	  will not work.
+
+	  If backward compatibility is not an issue, then it is safe and
+	  recommended to say Y here.
+
+	  For more information, see Documentation/filesystems/overlayfs.txt
+
+	  If unsure, say N.
\ No newline at end of file
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index e6f3670146ed..8722ed556e11 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -302,6 +302,11 @@ char *ovl_get_redirect_xattr(struct dentry *dentry, int padding);
 ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value,
 		     size_t padding);
 
+void print_path_option(struct seq_file *m, const char *name, struct path *path);
+void print_paths_option(struct seq_file *m, const char *name,
+			struct path *paths, unsigned int num);
+			struct path *paths, unsigned int num);
+
 static inline bool ovl_is_impuredir(struct dentry *dentry)
 {
 	return ovl_check_dir_xattr(dentry, OVL_XATTR_IMPURE);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 5762d802fe01..5db2582b47bf 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -47,13 +47,15 @@ struct ovl_path {
 /* private information held for overlayfs's superblock */
 struct ovl_fs {
 	struct vfsmount *upper_mnt;
+	struct path upperpath;
 	unsigned int numlayer;
 	/* Number of unique fs among layers including upper fs */
 	unsigned int numfs;
+	struct path *lowerpaths;
 	const struct ovl_layer *layers;
 	struct ovl_sb *fs;
-	/* workbasedir is the path at workdir= mount option */
-	struct dentry *workbasedir;
+	/* workbasepath is the path at workdir= mount option */
+	struct path workbasepath;
 	/* workdir is the 'work' directory under workbasedir */
 	struct dentry *workdir;
 	/* index directory listing overlay inodes by origin file handle */
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 732ad5495c92..a449b6bb4b20 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -53,6 +53,10 @@ module_param_named(xino_auto, ovl_xino_auto_def, bool, 0644);
 MODULE_PARM_DESC(xino_auto,
 		 "Auto enable xino feature");
 
+static bool ovl_dyn_path_opts = IS_ENABLED(CONFIG_OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS);
+module_param_named(dyn_path_opts, ovl_dyn_path_opts, bool, 0644);
+MODULE_PARM_DESC(dyn_path_opts, "dyn_path_opts feature enabled");
+
 static void ovl_entry_stack_free(struct ovl_entry *oe)
 {
 	unsigned int i;
@@ -220,11 +224,17 @@ static void ovl_free_fs(struct ovl_fs *ofs)
 	dput(ofs->indexdir);
 	dput(ofs->workdir);
 	if (ofs->workdir_locked)
-		ovl_inuse_unlock(ofs->workbasedir);
-	dput(ofs->workbasedir);
+		ovl_inuse_unlock(ofs->workbasepath.dentry);
+	path_put(&ofs->workbasepath);
 	if (ofs->upperdir_locked)
 		ovl_inuse_unlock(ofs->upper_mnt->mnt_root);
 	mntput(ofs->upper_mnt);
+	path_put(&ofs->upperpath);
+	if (ofs->lowerpaths) {
+		for (i = 0; i < ofs->numlayer; i++)
+			path_put(&ofs->lowerpaths[i]);
+		kfree(ofs->lowerpaths);
+	}
 	for (i = 1; i < ofs->numlayer; i++) {
 		iput(ofs->layers[i].trap);
 		mntput(ofs->layers[i].mnt);
@@ -339,10 +349,18 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	struct super_block *sb = dentry->d_sb;
 	struct ovl_fs *ofs = sb->s_fs_info;
 
-	seq_show_option(m, "lowerdir", ofs->config.lowerdir);
-	if (ofs->config.upperdir) {
-		seq_show_option(m, "upperdir", ofs->config.upperdir);
-		seq_show_option(m, "workdir", ofs->config.workdir);
+	if (ovl_dyn_path_opts) {
+		print_paths_option(m, "lowerdir", ofs->lowerpaths, ofs->numlayer);
+		if (ofs->config.upperdir) {
+			print_path_option(m, "upperdir", &ofs->upperpath);
+			print_path_option(m, "workdir", &ofs->workbasepath);
+		}
+	} else {
+		seq_show_option(m, "lowerdir", ofs->config.lowerdir);
+		if (ofs->config.upperdir) {
+			seq_show_option(m, "upperdir", ofs->config.upperdir);
+			seq_show_option(m, "workdir", ofs->config.workdir);
+		}
 	}
 	if (ofs->config.default_permissions)
 		seq_puts(m, ",default_permissions");
@@ -610,7 +628,7 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 					 const char *name, bool persist)
 {
-	struct inode *dir =  ofs->workbasedir->d_inode;
+	struct inode *dir =  ofs->workbasepath.dentry->d_inode;
 	struct vfsmount *mnt = ofs->upper_mnt;
 	struct dentry *work;
 	int err;
@@ -621,7 +639,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 	locked = true;
 
 retry:
-	work = lookup_one_len(name, ofs->workbasedir, strlen(name));
+	work = lookup_one_len(name, ofs->workbasepath.dentry, strlen(name));
 
 	if (!IS_ERR(work)) {
 		struct iattr attr = {
@@ -1126,7 +1144,7 @@ static int ovl_check_rename_whiteout(struct dentry *workdir)
 }
 
 static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
-			    struct path *workpath)
+			    struct path *workbasepath)
 {
 	struct vfsmount *mnt = ofs->upper_mnt;
 	struct dentry *temp;
@@ -1153,7 +1171,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
 	 * workdir. This check requires successful creation of workdir in
 	 * previous step.
 	 */
-	err = ovl_check_d_type_supported(workpath);
+	err = ovl_check_d_type_supported(workbasepath);
 	if (err < 0)
 		goto out;
 
@@ -1230,25 +1248,22 @@ static int ovl_get_workdir(struct super_block *sb, struct ovl_fs *ofs,
 			   struct path *upperpath)
 {
 	int err;
-	struct path workpath = { };
 
-	err = ovl_mount_dir(ofs->config.workdir, &workpath);
+	err = ovl_mount_dir(ofs->config.workdir, &ofs->workbasepath);
 	if (err)
 		goto out;
 
 	err = -EINVAL;
-	if (upperpath->mnt != workpath.mnt) {
+	if (upperpath->mnt != ofs->workbasepath.mnt) {
 		pr_err("workdir and upperdir must reside under the same mount\n");
 		goto out;
 	}
-	if (!ovl_workdir_ok(workpath.dentry, upperpath->dentry)) {
+	if (!ovl_workdir_ok(ofs->workbasepath.dentry, upperpath->dentry)) {
 		pr_err("workdir and upperdir must be separate subtrees\n");
 		goto out;
 	}
 
-	ofs->workbasedir = dget(workpath.dentry);
-
-	if (ovl_inuse_trylock(ofs->workbasedir)) {
+	if (ovl_inuse_trylock(ofs->workbasepath.dentry)) {
 		ofs->workdir_locked = true;
 	} else {
 		err = ovl_report_in_use(ofs, "workdir");
@@ -1256,15 +1271,14 @@ static int ovl_get_workdir(struct super_block *sb, struct ovl_fs *ofs,
 			goto out;
 	}
 
-	err = ovl_setup_trap(sb, ofs->workbasedir, &ofs->workbasedir_trap,
+	err = ovl_setup_trap(sb, ofs->workbasepath.dentry, &ofs->workbasedir_trap,
 			     "workdir");
 	if (err)
 		goto out;
 
-	err = ovl_make_workdir(sb, ofs, &workpath);
+	err = ovl_make_workdir(sb, ofs, &ofs->workbasepath);
 
 out:
-	path_put(&workpath);
 
 	return err;
 }
@@ -1513,7 +1527,6 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
 {
 	int err;
 	char *lowertmp, *lower;
-	struct path *stack = NULL;
 	unsigned int stacklen, numlower = 0, i;
 	struct ovl_entry *oe;
 
@@ -1538,14 +1551,14 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
 	}
 
 	err = -ENOMEM;
-	stack = kcalloc(stacklen, sizeof(struct path), GFP_KERNEL);
-	if (!stack)
+	ofs->lowerpaths = kcalloc(stacklen, sizeof(struct path), GFP_KERNEL);
+	if (!ofs->lowerpaths)
 		goto out_err;
 
 	err = -EINVAL;
 	lower = lowertmp;
 	for (numlower = 0; numlower < stacklen; numlower++) {
-		err = ovl_lower_dir(lower, &stack[numlower], ofs,
+		err = ovl_lower_dir(lower, &ofs->lowerpaths[numlower], ofs,
 				    &sb->s_stack_depth);
 		if (err)
 			goto out_err;
@@ -1560,7 +1573,7 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
 		goto out_err;
 	}
 
-	err = ovl_get_layers(sb, ofs, stack, numlower);
+	err = ovl_get_layers(sb, ofs, ofs->lowerpaths, numlower);
 	if (err)
 		goto out_err;
 
@@ -1570,19 +1583,20 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
 		goto out_err;
 
 	for (i = 0; i < numlower; i++) {
-		oe->lowerstack[i].dentry = dget(stack[i].dentry);
+		oe->lowerstack[i].dentry = dget(ofs->lowerpaths[i].dentry);
 		oe->lowerstack[i].layer = &ofs->layers[i+1];
 	}
 
 out:
-	for (i = 0; i < numlower; i++)
-		path_put(&stack[i]);
-	kfree(stack);
 	kfree(lowertmp);
 
 	return oe;
 
 out_err:
+	for (i = 0; i < numlower; i++)
+		path_put_init(&ofs->lowerpaths[i]);
+	kfree(ofs->lowerpaths);
+	ofs->lowerpaths = NULL;
 	oe = ERR_PTR(err);
 	goto out;
 }
@@ -1642,7 +1656,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
 		 * workbasedir.  In that case, we already have their traps in
 		 * inode cache and we will catch that case on lookup.
 		 */
-		err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir");
+		err = ovl_check_layer(sb, ofs, ofs->workbasepath.dentry, "workdir");
 		if (err)
 			return err;
 	}
@@ -1667,7 +1681,7 @@ static struct dentry *ovl_get_root(struct super_block *sb,
 	unsigned long ino = d_inode(lowerpath->dentry)->i_ino;
 	int fsid = lowerpath->layer->fsid;
 	struct ovl_inode_params oip = {
-		.upperdentry = upperdentry,
+		.upperdentry = dget(upperdentry),
 		.lowerpath = lowerpath,
 	};
 
@@ -1698,7 +1712,6 @@ static struct dentry *ovl_get_root(struct super_block *sb,
 
 static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 {
-	struct path upperpath = { };
 	struct dentry *root_dentry;
 	struct ovl_entry *oe;
 	struct ovl_fs *ofs;
@@ -1752,11 +1765,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			goto out_err;
 		}
 
-		err = ovl_get_upper(sb, ofs, &upperpath);
+		err = ovl_get_upper(sb, ofs, &ofs->upperpath);
 		if (err)
 			goto out_err;
 
-		err = ovl_get_workdir(sb, ofs, &upperpath);
+		err = ovl_get_workdir(sb, ofs, &ofs->upperpath);
 		if (err)
 			goto out_err;
 
@@ -1777,7 +1790,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		sb->s_flags |= SB_RDONLY;
 
 	if (!(ovl_force_readonly(ofs)) && ofs->config.index) {
-		err = ovl_get_indexdir(sb, ofs, oe, &upperpath);
+		err = ovl_get_indexdir(sb, ofs, oe, &ofs->upperpath);
 		if (err)
 			goto out_free_oe;
 
@@ -1820,12 +1833,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_flags |= SB_POSIXACL;
 
 	err = -ENOMEM;
-	root_dentry = ovl_get_root(sb, upperpath.dentry, oe);
+	root_dentry = ovl_get_root(sb, ofs->upperpath.dentry, oe);
 	if (!root_dentry)
 		goto out_free_oe;
 
-	mntput(upperpath.mnt);
-
 	sb->s_root = root_dentry;
 
 	return 0;
@@ -1834,7 +1845,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	ovl_entry_stack_free(oe);
 	kfree(oe);
 out_err:
-	path_put(&upperpath);
 	ovl_free_fs(ofs);
 out:
 	return err;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 36b60788ee47..36bb98c14d35 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -13,6 +13,7 @@
 #include <linux/uuid.h>
 #include <linux/namei.h>
 #include <linux/ratelimit.h>
+#include <linux/seq_file.h>
 #include "overlayfs.h"
 
 int ovl_want_write(struct dentry *dentry)
@@ -920,3 +921,23 @@ char *ovl_get_redirect_xattr(struct dentry *dentry, int padding)
 	kfree(buf);
 	return ERR_PTR(res);
 }
+
+void print_path_option(struct seq_file *m, const char *name, struct path *path)
+{
+	seq_show_option(m, name, "");
+	seq_path(m, path, ", \t\n\\");
+}
+
+void print_paths_option(struct seq_file *m, const char *name,
+			struct path *paths, unsigned int num)
+{
+	int i;
+
+	seq_show_option(m, name, "");
+
+	for (i = 0; i < num; i++) {
+		if (i)
+			seq_putc(m, ':');
+		seq_path(m, &paths[i], ", \t\n\\");
+	}
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] overlayfs: add mnt_id paths options
  2020-06-04 16:11 [PATCH 0/2] overlayfs: C/R enhancements Alexander Mikhalitsyn
  2020-06-04 16:11 ` [PATCH 1/2] overlayfs: add dynamic path resolving in mount options Alexander Mikhalitsyn
@ 2020-06-04 16:11 ` Alexander Mikhalitsyn
  2020-06-04 18:04 ` [PATCH 0/2] overlayfs: C/R enhancements Amir Goldstein
  2 siblings, 0 replies; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2020-06-04 16:11 UTC (permalink / raw)
  To: miklos
  Cc: avagin, ptikhomirov, khorenko, vvs, ktkhai,
	Alexander Mikhalitsyn, linux-unionfs, linux-kernel

This patch adds config OVERLAY_FS_PATH_OPTIONS_MNT_ID
compile-time option, and "mnt_id_path_opts" runtime module option.
If enabled, user may see mnt_ids for lowerdir, upperdir paths
in mountinfo in separate lowerdir_mnt_id/upperdir_mnt_id options.

This patch is very helpful to checkpoint/restore functionality
of overlayfs mounts in case when we have overmounts on
lowerdir, workdir, upperdir paths.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 fs/overlayfs/Kconfig     | 26 ++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/super.c     | 15 +++++++++++++++
 fs/overlayfs/util.c      | 21 +++++++++++++++++++++
 4 files changed, 64 insertions(+)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index c24988527ef3..2797869c8d16 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -154,4 +154,30 @@ config OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS
 
 	  For more information, see Documentation/filesystems/overlayfs.txt
 
+	  If unsure, say N.
+
+config OVERLAY_FS_PATH_OPTIONS_MNT_ID
+	bool "Overlayfs: show mnt_id for all mount paths options"
+	default y
+	depends on OVERLAY_FS
+	help
+	  This option helps checkpoint/restore of overlayfs mounts.
+	  If N selected, old behavior is saved.
+
+	  If this config option is enabled then in overlay filesystems mount
+	  options you will be able to see additional parameters lowerdir_mnt_id/
+	  upperdir_mnt_id with corresponding mnt_ids.
+
+	  It's also possible to change this behavior on overlayfs module loading or
+	  through sysfs (mnt_id_path_opts parameter).
+
+	  Disable this to get a backward compatible with previous kernels configuration,
+	  but in this case checkpoint/restore functionality for overlayfs mounts
+	  may not fully work.
+
+	  If backward compatibility is not an issue, then it is safe and
+	  recommended to say Y here.
+
+	  For more information, see Documentation/filesystems/overlayfs.txt
+
 	  If unsure, say N.
\ No newline at end of file
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 8722ed556e11..980fe06d15b5 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -305,6 +305,8 @@ ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value,
 void print_path_option(struct seq_file *m, const char *name, struct path *path);
 void print_paths_option(struct seq_file *m, const char *name,
 			struct path *paths, unsigned int num);
+void print_mnt_id_option(struct seq_file *m, const char *name, struct path *path);
+void print_mnt_ids_option(struct seq_file *m, const char *name,
 			struct path *paths, unsigned int num);
 
 static inline bool ovl_is_impuredir(struct dentry *dentry)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a449b6bb4b20..ee2ed125341c 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -57,6 +57,10 @@ static bool ovl_dyn_path_opts = IS_ENABLED(CONFIG_OVERLAY_FS_DYNAMIC_RESOLVE_PAT
 module_param_named(dyn_path_opts, ovl_dyn_path_opts, bool, 0644);
 MODULE_PARM_DESC(dyn_path_opts, "dyn_path_opts feature enabled");
 
+static bool ovl_mnt_id_path_opts = IS_ENABLED(OVERLAY_FS_PATH_OPTIONS_MNT_ID);
+module_param_named(mnt_id_path_opts, ovl_mnt_id_path_opts, bool, 0644);
+MODULE_PARM_DESC(mnt_id_path_opts, "mnt_id_path_opts feature enabled");
+
 static void ovl_entry_stack_free(struct ovl_entry *oe)
 {
 	unsigned int i;
@@ -362,6 +366,17 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 			seq_show_option(m, "workdir", ofs->config.workdir);
 		}
 	}
+
+	if (ovl_mnt_id_path_opts) {
+		print_mnt_ids_option(m, "lowerdir_mnt_id", ofs->lowerpaths, ofs->numlayer);
+		/*
+		 * We don't need to show mnt_id for workdir because it
+		 * on the same mount as upperdir.
+		 */
+		if (ofs->config.upperdir)
+			print_mnt_id_option(m, "upperdir_mnt_id", &ofs->upperpath);
+	}
+
 	if (ofs->config.default_permissions)
 		seq_puts(m, ",default_permissions");
 	if (strcmp(ofs->config.redirect_mode, ovl_redirect_mode_def()) != 0)
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 36bb98c14d35..85106b2ed00a 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -14,6 +14,7 @@
 #include <linux/namei.h>
 #include <linux/ratelimit.h>
 #include <linux/seq_file.h>
+#include "../mount.h"
 #include "overlayfs.h"
 
 int ovl_want_write(struct dentry *dentry)
@@ -941,3 +942,23 @@ void print_paths_option(struct seq_file *m, const char *name,
 		seq_path(m, &paths[i], ", \t\n\\");
 	}
 }
+
+void print_mnt_id_option(struct seq_file *m, const char *name, struct path *path)
+{
+	seq_show_option(m, name, "");
+	seq_printf(m, "%i", real_mount(path->mnt)->mnt_id);
+}
+
+void print_mnt_ids_option(struct seq_file *m, const char *name,
+			struct path *paths, unsigned int num)
+{
+	int i;
+
+	seq_show_option(m, name, "");
+
+	for (i = 0; i < num; i++) {
+		if (i)
+			seq_putc(m, ':');
+		seq_printf(m, "%i", real_mount(paths[i].mnt)->mnt_id);
+	}
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-04 16:11 [PATCH 0/2] overlayfs: C/R enhancements Alexander Mikhalitsyn
  2020-06-04 16:11 ` [PATCH 1/2] overlayfs: add dynamic path resolving in mount options Alexander Mikhalitsyn
  2020-06-04 16:11 ` [PATCH 2/2] overlayfs: add mnt_id paths options Alexander Mikhalitsyn
@ 2020-06-04 18:04 ` Amir Goldstein
  2020-06-04 21:34   ` Alexander Mikhalitsyn
  2 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2020-06-04 18:04 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: Miklos Szeredi, Andrei Vagin, ptikhomirov, khorenko, vvs, ktkhai,
	overlayfs, linux-kernel

On Thu, Jun 4, 2020 at 7:13 PM Alexander Mikhalitsyn
<alexander.mikhalitsyn@virtuozzo.com> wrote:
>
> This patchset aimed to make C/R of overlayfs mounts with CRIU possible.
> We introduce two new overlayfs module options -- dyn_path_opts and
> mnt_id_path_opts. If enabled this options allows to see real *full* paths
> in lowerdir, workdir, upperdir options, and also mnt_ids for corresponding
> paths.
>
> This changes should not break anything because for showing mnt_ids we simply
> introduce new show-time mount options. And for paths we simply *always*
> provide *full paths* instead of relative path on mountinfo.
>
> BEFORE
>
> overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
> lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
> ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
> /overlay2/XYZ/work)
> none on /sys type sysfs (rw,relatime)
>
> AFTER
>
> overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
> lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
> ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
> /overlay2/XYZ/work,lowerdir_mnt_id=175:175,upperdir_mnt_id=175)
> none on /sys type sysfs (rw,relatime)
>

But overlayfs won't accept these "output only" options as input args,
which is a problem.

Wouldn't it be better for C/R to implement mount options
that overlayfs can parse and pass it mntid and fhandle instead
of paths?
I believe C/R is using a similar method to export inotify/fanotify
marks.

FWIW overlayfs already has utilities that encode filehandle to
text and back, see  ovl_get_index_name().

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-04 18:04 ` [PATCH 0/2] overlayfs: C/R enhancements Amir Goldstein
@ 2020-06-04 21:34   ` Alexander Mikhalitsyn
  2020-06-05  2:35     ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2020-06-04 21:34 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Andrey Vagin, Pavel Tikhomirov,
	Konstantin Khorenko, Vasiliy Averin, Kirill Tkhai, overlayfs,
	linux-kernel

Hello,

>But overlayfs won't accept these "output only" options as input args,
which is a problem.

Will it be problematic if we simply ignore "lowerdir_mnt_id" and "upperdir_mnt_id" options in ovl_parse_opt()?

>Wouldn't it be better for C/R to implement mount options
that overlayfs can parse and pass it mntid and fhandle instead
of paths?

Problem is that we need to know on C/R "dump stage" which mounts are used on lower layers and upper layer. Most likely I don't understand something but I can't catch how "mount-time" options will help us.

>I believe C/R is using a similar method to export inotify/fanotify
marks.

Yes, we are using fhandles to C/R inotify/fanotify. On "dump" we get fhandle from /proc/<pid>/fdinfo and on "restore" we open fhandle through open_by_handle_at() syscall.

Regards, Alex

________________________________________
From: Amir Goldstein <amir73il@gmail.com>
Sent: Thursday, June 4, 2020 21:04
To: Alexander Mikhalitsyn
Cc: Miklos Szeredi; Andrey Vagin; Pavel Tikhomirov; Konstantin Khorenko; Vasiliy Averin; Kirill Tkhai; overlayfs; linux-kernel
Subject: Re: [PATCH 0/2] overlayfs: C/R enhancements

On Thu, Jun 4, 2020 at 7:13 PM Alexander Mikhalitsyn
<alexander.mikhalitsyn@virtuozzo.com> wrote:
>
> This patchset aimed to make C/R of overlayfs mounts with CRIU possible.
> We introduce two new overlayfs module options -- dyn_path_opts and
> mnt_id_path_opts. If enabled this options allows to see real *full* paths
> in lowerdir, workdir, upperdir options, and also mnt_ids for corresponding
> paths.
>
> This changes should not break anything because for showing mnt_ids we simply
> introduce new show-time mount options. And for paths we simply *always*
> provide *full paths* instead of relative path on mountinfo.
>
> BEFORE
>
> overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
> lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
> ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
> /overlay2/XYZ/work)
> none on /sys type sysfs (rw,relatime)
>
> AFTER
>
> overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
> lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
> ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
> /overlay2/XYZ/work,lowerdir_mnt_id=175:175,upperdir_mnt_id=175)
> none on /sys type sysfs (rw,relatime)
>

But overlayfs won't accept these "output only" options as input args,
which is a problem.

Wouldn't it be better for C/R to implement mount options
that overlayfs can parse and pass it mntid and fhandle instead
of paths?
I believe C/R is using a similar method to export inotify/fanotify
marks.

FWIW overlayfs already has utilities that encode filehandle to
text and back, see  ovl_get_index_name().

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-04 21:34   ` Alexander Mikhalitsyn
@ 2020-06-05  2:35     ` Amir Goldstein
  2020-06-05  8:41       ` Pavel Tikhomirov
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2020-06-05  2:35 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: Miklos Szeredi, Andrey Vagin, Pavel Tikhomirov,
	Konstantin Khorenko, Vasiliy Averin, Kirill Tkhai, overlayfs,
	linux-kernel

On Fri, Jun 5, 2020 at 12:34 AM Alexander Mikhalitsyn
<alexander.mikhalitsyn@virtuozzo.com> wrote:
>
> Hello,
>
> >But overlayfs won't accept these "output only" options as input args,
> which is a problem.
>
> Will it be problematic if we simply ignore "lowerdir_mnt_id" and "upperdir_mnt_id" options in ovl_parse_opt()?
>

That would solve this small problem.

> >Wouldn't it be better for C/R to implement mount options
> that overlayfs can parse and pass it mntid and fhandle instead
> of paths?
>
> Problem is that we need to know on C/R "dump stage" which mounts are used on lower layers and upper layer. Most likely I don't understand something but I can't catch how "mount-time" options will help us.

As you already know from inotify/fanotify C/R fhandle is timeless, so
there would be no distinction between mount time and dump time.
About mnt_id, your patches will cause the original mount-time mounts to be busy.
That is a problem as well.

I think you should describe the use case is more details.
Is your goal to C/R any overlayfs mount that the process has open
files on? visible to process?
For NFS export, we use the persistent descriptor {uuid;fhandle}
(a.k.a. struct ovl_fh) to encode
an underlying layer object.

CRIU can look for an existing mount to a filesystem with uuid as restore stage
(or even mount this filesystem) and use open_by_handle_at() to open a
path to layer.
After mounting overlay, that mount to underlying fs can even be discarded.

And if this works for you, you don't have to export the layers ovl_fh in
/proc/mounts, you can export them in numerous other ways.
One way from the top of my head, getxattr on overlay root dir.
"trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
for example could work.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-05  2:35     ` Amir Goldstein
@ 2020-06-05  8:41       ` Pavel Tikhomirov
  2020-06-05 10:21         ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Tikhomirov @ 2020-06-05  8:41 UTC (permalink / raw)
  To: Amir Goldstein, Alexander Mikhalitsyn
  Cc: Miklos Szeredi, Andrey Vagin, Konstantin Khorenko,
	Vasiliy Averin, Kirill Tkhai, overlayfs, linux-kernel



On 6/5/20 5:35 AM, Amir Goldstein wrote:
> On Fri, Jun 5, 2020 at 12:34 AM Alexander Mikhalitsyn
> <alexander.mikhalitsyn@virtuozzo.com> wrote:
>>
>> Hello,
>>
>>> But overlayfs won't accept these "output only" options as input args,
>> which is a problem.
>>
>> Will it be problematic if we simply ignore "lowerdir_mnt_id" and "upperdir_mnt_id" options in ovl_parse_opt()?
>>
> 
> That would solve this small problem.

This is not a big problem actually as these options shown in mountinfo 
for overlay had been "output only" forever, please see these two 
examples below:

a) Imagine you've mounted overlay with relative paths and forgot (or you 
never known as you are another user) where your cwd was at the moment of 
mount syscall. - How would you use those options as "input" to create 
the same overlay mount somethere else (bind-mounting not involved)?

b) Imagine you've mounted overlay with absolute paths and someone (other 
user) overmounted lower (upper/workdir) paths for you, all directory 
structure would be the same on overmount but yet files are different. - 
How would you use those options from mountinfo as "input" ones?

We try to make them much closer to "input" ones.

Agreed, we should ignore *_mnt_id on mount because paths identify mounts 
at the time of mount call.

> 
>>> Wouldn't it be better for C/R to implement mount options
>> that overlayfs can parse and pass it mntid and fhandle instead
>> of paths? >>
>> Problem is that we need to know on C/R "dump stage" which mounts are used on lower layers and upper layer. Most likely I don't understand something but I can't catch how "mount-time" options will help us.
> 
> As you already know from inotify/fanotify C/R fhandle is timeless, so
> there would be no distinction between mount time and dump time.

Pair of fhandle+mnt_id looks an equivalent to path+mnt_id pair, CRIU 
will just need to open fhandle+mnt_id with open_by_handle_at and 
readlink to get path on dump and continue to use path+mnt_id as before. 
(not too common with fhandles but it's my current understanding)

But if you take a look on (a) and (b) again, the regular user does not 
see full information about overlay mount in /proc/pid/mountinfo, they 
can't just take a look on it and understand from there it comes from. 
Resolving fhandle looks like a too hard task for a user.

> About mnt_id, your patches will cause the original mount-time mounts to be busy.
> That is a problem as well.

Children mounts lock parent, open files lock parent. Another analogy is 
a loop device which locks the backing file mount (AFAICS). Anyway one 
can lazy umount, can't they? But I'm not too sure for this one, maybe 
you can share more implications of this problem?

> 
> I think you should describe the use case is more details.
> Is your goal to C/R any overlayfs mount that the process has open
> files on? visible to process
We wan't to dump a container, not a simple process, if the container 
process has access to some resource CRIU needs to restore this resource.

Imagine the process in container mounts it's own overlay inside 
container, for instance to imulate write access to readonly mount or 
just to implement some snapshots, don't know exact use case. And we want 
to checkpoint/restore this container. (Currently CRIU only supports 
overlay as external mount, e.g. for docker continers docker engine 
pre-creates overlay for us and we just bind from it - it's a different 
case.) If the in-container process creates the in-container mount we 
need to recreate it on restore so that the in-container view of the 
filesystem persists.

> For NFS export, we use the persistent descriptor {uuid;fhandle}
> (a.k.a. struct ovl_fh) to encode
> an underlying layer object.
> 
> CRIU can look for an existing mount to a filesystem with uuid as restore stage
> (or even mount this filesystem) and use open_by_handle_at() to open a
> path to layer.

On restore we can be on another physical node, so I doubt we have same 
uuid's, sorry I don't fully understand here already.

> After mounting overlay, that mount to underlying fs can even be discarded.
> 
> And if this works for you, you don't have to export the layers ovl_fh in
> /proc/mounts, you can export them in numerous other ways.
> One way from the top of my head, getxattr on overlay root dir.
> "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> for example could work.

Thanks xattr might be a good option, but still don't forget about (a) 
and (b), users like to know all information about mount from 
/proc/pid/mountinfo.

> 
> Thanks,
> Amir.
> 

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-05  8:41       ` Pavel Tikhomirov
@ 2020-06-05 10:21         ` Amir Goldstein
  2020-06-05 12:44           ` Alexander Mikhalitsyn
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2020-06-05 10:21 UTC (permalink / raw)
  To: Pavel Tikhomirov
  Cc: Alexander Mikhalitsyn, Miklos Szeredi, Andrey Vagin,
	Konstantin Khorenko, Vasiliy Averin, Kirill Tkhai, overlayfs,
	linux-kernel

On Fri, Jun 5, 2020 at 11:41 AM Pavel Tikhomirov
<ptikhomirov@virtuozzo.com> wrote:
>
>
>
> On 6/5/20 5:35 AM, Amir Goldstein wrote:
> > On Fri, Jun 5, 2020 at 12:34 AM Alexander Mikhalitsyn
> > <alexander.mikhalitsyn@virtuozzo.com> wrote:
> >>
> >> Hello,
> >>
> >>> But overlayfs won't accept these "output only" options as input args,
> >> which is a problem.
> >>
> >> Will it be problematic if we simply ignore "lowerdir_mnt_id" and "upperdir_mnt_id" options in ovl_parse_opt()?
> >>
> >
> > That would solve this small problem.
>
> This is not a big problem actually as these options shown in mountinfo
> for overlay had been "output only" forever,
> please see these two examples below:
>
> a) Imagine you've mounted overlay with relative paths and forgot (or you
> never known as you are another user) where your cwd was at the moment of
> mount syscall. - How would you use those options as "input" to create
> the same overlay mount somethere else (bind-mounting not involved)?
>
> b) Imagine you've mounted overlay with absolute paths and someone (other
> user) overmounted lower (upper/workdir) paths for you, all directory
> structure would be the same on overmount but yet files are different. -
> How would you use those options from mountinfo as "input" ones?
>
> We try to make them much closer to "input" ones.


That is not what I meant by "output only"
I meant invalid input options as in EINVAL not ENOENT

>
> Agreed, we should ignore *_mnt_id on mount because paths identify mounts
> at the time of mount call.
>
> >
> >>> Wouldn't it be better for C/R to implement mount options
> >> that overlayfs can parse and pass it mntid and fhandle instead
> >> of paths? >>
> >> Problem is that we need to know on C/R "dump stage" which mounts are used on lower layers and upper layer. Most likely I don't understand something but I can't catch how "mount-time" options will help us.
> >
> > As you already know from inotify/fanotify C/R fhandle is timeless, so
> > there would be no distinction between mount time and dump time.
>
> Pair of fhandle+mnt_id looks an equivalent to path+mnt_id pair, CRIU
> will just need to open fhandle+mnt_id with open_by_handle_at and
> readlink to get path on dump and continue to use path+mnt_id as before.
> (not too common with fhandles but it's my current understanding)
>
> But if you take a look on (a) and (b) again, the regular user does not
> see full information about overlay mount in /proc/pid/mountinfo, they
> can't just take a look on it and understand from there it comes from.
> Resolving fhandle looks like a too hard task for a user.
>

Right, and we need to provide regular user needs in parallel to
providing C/R needs. understood.

> > About mnt_id, your patches will cause the original mount-time mounts to be busy.
> > That is a problem as well.
>
> Children mounts lock parent, open files lock parent. Another analogy is
> a loop device which locks the backing file mount (AFAICS). Anyway one
> can lazy umount, can't they? But I'm not too sure for this one, maybe
> you can share more implications of this problem?
>

Overlayfs mounts are internal not children mounts in the namespace,
so no open files hold reference the mounts in the namespace (AFAICS).

This use case will break:

  mount /dev/vdf /vdf
  mkdir /vdf/{l,u,w} /tmp/m
  mount -t overlay overlay /tmp/m -o
lowerdir=/vdf/l,upperdir=/vdf/u,workdir=/vdf/w
  umount /vdf

Yes users can lazy unmount, the filesystem itself on /dev/vdf is not actually
unmounted, only the /vdf mount goes away from the namespace, but the use case
without lazy unmount will still break.

Maybe its fine since distro/admin needs to opt-in for this change of behavior.
I have to wonder though, why did you add two different config/module
options for this feature? Yes, its two different sub-functionalities, but
which real user (not imaginary one) will really turn on just half the feature?
While at it, you copy pasted the text:
          For more information, see Documentation/filesystems/overlayfs.txt
but there is no more information to be found.

> >
> > I think you should describe the use case is more details.
> > Is your goal to C/R any overlayfs mount that the process has open
> > files on? visible to process
> We wan't to dump a container, not a simple process, if the container
> process has access to some resource CRIU needs to restore this resource.
>
> Imagine the process in container mounts it's own overlay inside
> container, for instance to imulate write access to readonly mount or
> just to implement some snapshots, don't know exact use case. And we want
> to checkpoint/restore this container. (Currently CRIU only supports
> overlay as external mount, e.g. for docker continers docker engine
> pre-creates overlay for us and we just bind from it - it's a different
> case.) If the in-container process creates the in-container mount we
> need to recreate it on restore so that the in-container view of the
> filesystem persists.
>

Understood. but how do you *know* which mounts the container
created and need to be migrated?
Which loop devices the user has created?
As opposed to the ones that docker engine re-created?
It is the found from diff between mountinfo of process and host?

> > For NFS export, we use the persistent descriptor {uuid;fhandle}
> > (a.k.a. struct ovl_fh) to encode
> > an underlying layer object.
> >
> > CRIU can look for an existing mount to a filesystem with uuid as restore stage
> > (or even mount this filesystem) and use open_by_handle_at() to open a
> > path to layer.
>
> On restore we can be on another physical node, so I doubt we have same
> uuid's, sorry I don't fully understand here already.
>

I see, so what about inotify/fanotify?
fhandle and uuid can be looked up/resolved to mnt/path at "dump" time.
The difference between mnt_id/uuid is who keeps the reference on the
mount. If overlayfs provides uuid, then you rely on docker to keep the
reference on the mount and use the reference-less uuid to find the
mount that docker is holding for you.

> > After mounting overlay, that mount to underlying fs can even be discarded.
> >
> > And if this works for you, you don't have to export the layers ovl_fh in
> > /proc/mounts, you can export them in numerous other ways.
> > One way from the top of my head, getxattr on overlay root dir.
> > "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> > for example could work.
>
> Thanks xattr might be a good option, but still don't forget about (a)
> and (b), users like to know all information about mount from
> /proc/pid/mountinfo.
>

Let's stick to your use cases requirements. If you have other use cases
for this functionality lay them out explicitly.

I went to see what losetup does and I see that LOOP_SET_STATUS
ioctl stores a path string that LOOP_GET_STATUS gets back in return.
Does not seem C/R friendly either. Are you not handling loop devices?

It's strange because loop driver keeps an open backing file so
LOOP_GET_STATUS could have returned the uptodate path.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-05 10:21         ` Amir Goldstein
@ 2020-06-05 12:44           ` Alexander Mikhalitsyn
  2020-06-05 14:36             ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2020-06-05 12:44 UTC (permalink / raw)
  To: Amir Goldstein, Pavel Tikhomirov
  Cc: Miklos Szeredi, Andrey Vagin, Konstantin Khorenko,
	Vasiliy Averin, Kirill Tkhai, overlayfs, linux-kernel

Dear Amir,

We actively discussed about patches with Pavel, so with your favour I'll try to answer (partially) to some of your comments.

On Fri, 5 Jun 2020 13:21:40 +0300
Amir Goldstein <amir73il@gmail.com> wrote:

> On Fri, Jun 5, 2020 at 11:41 AM Pavel Tikhomirov
> <ptikhomirov@virtuozzo.com> wrote:
> >
> >
> >
> > On 6/5/20 5:35 AM, Amir Goldstein wrote:
> > > On Fri, Jun 5, 2020 at 12:34 AM Alexander Mikhalitsyn
> > > <alexander.mikhalitsyn@virtuozzo.com> wrote:
> > >>
> > >> Hello,
> > >>
> > >>> But overlayfs won't accept these "output only" options as input args,
> > >> which is a problem.
> > >>
> > >> Will it be problematic if we simply ignore "lowerdir_mnt_id" and "upperdir_mnt_id" options in ovl_parse_opt()?
> > >>
> > >
> > > That would solve this small problem.
> >
> > This is not a big problem actually as these options shown in mountinfo
> > for overlay had been "output only" forever,
> > please see these two examples below:
> >
> > a) Imagine you've mounted overlay with relative paths and forgot (or you
> > never known as you are another user) where your cwd was at the moment of
> > mount syscall. - How would you use those options as "input" to create
> > the same overlay mount somethere else (bind-mounting not involved)?
> >
> > b) Imagine you've mounted overlay with absolute paths and someone (other
> > user) overmounted lower (upper/workdir) paths for you, all directory
> > structure would be the same on overmount but yet files are different. -
> > How would you use those options from mountinfo as "input" ones?
> >
> > We try to make them much closer to "input" ones.
> 
> 
> That is not what I meant by "output only"
> I meant invalid input options as in EINVAL not ENOENT

Yes, it's just as example of existing inconsistency between mount-time/show-time
options. But yes, errno will be different. As it was said before we can simply
ignore new options at mount time.

> 
> >
> > Agreed, we should ignore *_mnt_id on mount because paths identify mounts
> > at the time of mount call.
> >
> > >
> > >>> Wouldn't it be better for C/R to implement mount options
> > >> that overlayfs can parse and pass it mntid and fhandle instead
> > >> of paths? >>
> > >> Problem is that we need to know on C/R "dump stage" which mounts are used on lower layers and upper layer. Most likely I don't understand something but I can't catch how "mount-time" options will help us.
> > >
> > > As you already know from inotify/fanotify C/R fhandle is timeless, so
> > > there would be no distinction between mount time and dump time.
> >
> > Pair of fhandle+mnt_id looks an equivalent to path+mnt_id pair, CRIU
> > will just need to open fhandle+mnt_id with open_by_handle_at and
> > readlink to get path on dump and continue to use path+mnt_id as before.
> > (not too common with fhandles but it's my current understanding)
> >
> > But if you take a look on (a) and (b) again, the regular user does not
> > see full information about overlay mount in /proc/pid/mountinfo, they
> > can't just take a look on it and understand from there it comes from.
> > Resolving fhandle looks like a too hard task for a user.
> >
> 
> Right, and we need to provide regular user needs in parallel to
> providing C/R needs. understood.
> 
> > > About mnt_id, your patches will cause the original mount-time mounts to be busy.
> > > That is a problem as well.
> >
> > Children mounts lock parent, open files lock parent. Another analogy is
> > a loop device which locks the backing file mount (AFAICS). Anyway one
> > can lazy umount, can't they? But I'm not too sure for this one, maybe
> > you can share more implications of this problem?
> >
> 
> Overlayfs mounts are internal not children mounts in the namespace,
> so no open files hold reference the mounts in the namespace (AFAICS).
> 
> This use case will break:
> 
>   mount /dev/vdf /vdf
>   mkdir /vdf/{l,u,w} /tmp/m
>   mount -t overlay overlay /tmp/m -o
> lowerdir=/vdf/l,upperdir=/vdf/u,workdir=/vdf/w
>   umount /vdf
> 
> Yes users can lazy unmount, the filesystem itself on /dev/vdf is not actually
> unmounted, only the /vdf mount goes away from the namespace, but the use case
> without lazy unmount will still break.
> 
> Maybe its fine since distro/admin needs to opt-in for this change of behavior.

At current moment this options only affect on show options, but of course, we
can change patch, so that if both new features are turned off, we not hold
"struct path" reference. In this case behavior will fully as it was before proposed
changes.

> I have to wonder though, why did you add two different config/module
> options for this feature? Yes, its two different sub-functionalities, but
> which real user (not imaginary one) will really turn on just half the feature?

If user know that overmounts is not used on lowerdir/workdir/upperdir paths,
then it's not required to know mnt_id. Full path seems to be sufficient in this
scenario. It's not a critical for CRIU needs to have one or two options.

> While at it, you copy pasted the text:
>           For more information, see Documentation/filesystems/overlayfs.txt
> but there is no more information to be found.

As far as I know documentation patches must be send to another mailing list.
Of course I have plan to add information to overlayfs documentation about new feature.

> 
> > >
> > > I think you should describe the use case is more details.
> > > Is your goal to C/R any overlayfs mount that the process has open
> > > files on? visible to process
> > We wan't to dump a container, not a simple process, if the container
> > process has access to some resource CRIU needs to restore this resource.
> >
> > Imagine the process in container mounts it's own overlay inside
> > container, for instance to imulate write access to readonly mount or
> > just to implement some snapshots, don't know exact use case. And we want
> > to checkpoint/restore this container. (Currently CRIU only supports
> > overlay as external mount, e.g. for docker continers docker engine
> > pre-creates overlay for us and we just bind from it - it's a different
> > case.) If the in-container process creates the in-container mount we
> > need to recreate it on restore so that the in-container view of the
> > filesystem persists.
> >
> 
> Understood. but how do you *know* which mounts the container
> created and need to be migrated?
> Which loop devices the user has created?
> As opposed to the ones that docker engine re-created?
> It is the found from diff between mountinfo of process and host?

No, CRIU simply checkpoints *all* mounts in all mount namespaces
of process inside container, but if user (the one who calls criu,
e.g. container manager) want to use some mount from external mount
namespace then user must specify this mounts explicitly on restore
and we simply make bindmount (from this external mount) in
internal mount namespace.

> 
> > > For NFS export, we use the persistent descriptor {uuid;fhandle}
> > > (a.k.a. struct ovl_fh) to encode
> > > an underlying layer object.
> > >
> > > CRIU can look for an existing mount to a filesystem with uuid as restore stage
> > > (or even mount this filesystem) and use open_by_handle_at() to open a
> > > path to layer.
> >
> > On restore we can be on another physical node, so I doubt we have same
> > uuid's, sorry I don't fully understand here already.
> >
> 
> I see, so what about inotify/fanotify?
> fhandle and uuid can be looked up/resolved to mnt/path at "dump" time.
> The difference between mnt_id/uuid is who keeps the reference on the
> mount. If overlayfs provides uuid, then you rely on docker to keep the
> reference on the mount and use the reference-less uuid to find the
> mount that docker is holding for you.

Huh, seems very interesting! Thanks! I will need to dive into fhandle's
system to fully understand how it works.

> 
> > > After mounting overlay, that mount to underlying fs can even be discarded.
> > >
> > > And if this works for you, you don't have to export the layers ovl_fh in
> > > /proc/mounts, you can export them in numerous other ways.
> > > One way from the top of my head, getxattr on overlay root dir.
> > > "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> > > for example could work.
> >
> > Thanks xattr might be a good option, but still don't forget about (a)
> > and (b), users like to know all information about mount from
> > /proc/pid/mountinfo.
> >
> 
> Let's stick to your use cases requirements. If you have other use cases
> for this functionality lay them out explicitly.

Requirements is very simple, at "dump stage" we need to save all overlayfs mount options
sufficient to fully reconstruct overlayfs mount state on "restore stage". We already
have proof of concept implementation of Docker overlayfs mounts when docker is running in
OpenVZ container. In this case we fully dump all tree of mounts and all mount namespaces.
CRIU mounts restore procedure at first reconstruct mount tree in special separate subtrees
called "yards", then when all mounts is reconstructed we do "pivot_root" syscall. And
with overlayfs it was a problem, because we mounted overlayfs with lowerdir,workdir,upperdir
paths with mount namespace "yard" path prefix, and after restore in mount options user may see
that lowerdir,workdir,upperdir paths were changed... It's a problem. Also it makes second C/R
procedure is impossible, because after first C/R lowerdir,workdir,upperdir paths is invalidated
after pivot_root.

Example for Docker (after first C/R procedure):

options lowerdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/l/4BLZ4WH6GZIVKJE5QF62QUUKVZ:/var/lib/docker/overlay2/l/7FYRGAXT35JMKTXCHDNCQO3HKT,upperdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/diff,workdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/work

/tmp/.criu.mntns.owMo9C/9-0000000000/ is a root yard for corresponding mount namespace. It's
service only temporary directory. After pivot_root this path is inaccessible and irrelevant.

To fix this problem 1st patch was prepared. To solve problem of C/Ring possible overmounts we
suggest 2nd patch with mnt_id showing.

> 
> I went to see what losetup does and I see that LOOP_SET_STATUS
> ioctl stores a path string that LOOP_GET_STATUS gets back in return.
> Does not seem C/R friendly either. Are you not handling loop devices?
> 
> It's strange because loop driver keeps an open backing file so
> LOOP_GET_STATUS could have returned the uptodate path.

At the moment we don't support loop-devices C/R. But yes, we may be will need to patch
ioctl(LOOP_GET_STATUS) for C/R needs.

> 
> Thanks,
> Amir.

-- 
Regards,
Alex.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-05 12:44           ` Alexander Mikhalitsyn
@ 2020-06-05 14:36             ` Amir Goldstein
  2020-06-05 14:56               ` Alexander Mikhalitsyn
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2020-06-05 14:36 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: Pavel Tikhomirov, Miklos Szeredi, Andrey Vagin,
	Konstantin Khorenko, Vasiliy Averin, Kirill Tkhai, overlayfs,
	linux-kernel

> > While at it, you copy pasted the text:
> >           For more information, see Documentation/filesystems/overlayfs.txt
> > but there is no more information to be found.
>
> As far as I know documentation patches must be send to another mailing list.
> Of course I have plan to add information to overlayfs documentation about new feature.
>

Please send documentation patch together with the series
to this list. its fine to wait with that until the concept is approved though.


> > > > And if this works for you, you don't have to export the layers ovl_fh in
> > > > /proc/mounts, you can export them in numerous other ways.
> > > > One way from the top of my head, getxattr on overlay root dir.
> > > > "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> > > > for example could work.
> > >
> > > Thanks xattr might be a good option, but still don't forget about (a)
> > > and (b), users like to know all information about mount from
> > > /proc/pid/mountinfo.
> > >
> >
> > Let's stick to your use cases requirements. If you have other use cases
> > for this functionality lay them out explicitly.
>
> Requirements is very simple, at "dump stage" we need to save all overlayfs mount options
> sufficient to fully reconstruct overlayfs mount state on "restore stage". We already
> have proof of concept implementation of Docker overlayfs mounts when docker is running in
> OpenVZ container. In this case we fully dump all tree of mounts and all mount namespaces.
> CRIU mounts restore procedure at first reconstruct mount tree in special separate subtrees
> called "yards", then when all mounts is reconstructed we do "pivot_root" syscall. And
> with overlayfs it was a problem, because we mounted overlayfs with lowerdir,workdir,upperdir
> paths with mount namespace "yard" path prefix, and after restore in mount options user may see
> that lowerdir,workdir,upperdir paths were changed... It's a problem. Also it makes second C/R
> procedure is impossible, because after first C/R lowerdir,workdir,upperdir paths is invalidated
> after pivot_root.
>
> Example for Docker (after first C/R procedure):
>
> options lowerdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/l/4BLZ4WH6GZIVKJE5QF62QUUKVZ:/var/lib/docker/overlay2/l/7FYRGAXT35JMKTXCHDNCQO3HKT,upperdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/diff,workdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/work
>

That reminds me.
I've read somewhere that thoses symlinks l/4BLZ4WH6GZIVKJE5QF62QUUKVZ
are meant to shorten the mount option string, because the mount
options are limited by
page size and with many lower layers limitation can reach.

That is one of the reasons that new mount API was created for (i.e. fsconfig()).
I wonder if /proc/mounts also has a similar limitation on options size.
I also wonder why docker doesn't chdir into /var/lib/docker/overlay2/
before mounting overlay and use relative paths, though that would have
been worse for CRIU.

So at least for the docker use case CRIU knows very well where the
underlying filesytem is mounted (/var/lib/docker/overlay2/ or above).
So if you got any API from overlayfs something like:
getxattr("/var/lib/docker/overlay2/XYZ/merged",
"trusted.overlay.layers.0.fh",..)
which reads the ovl_fh encoding of layer 0 (upper) rootdir, CRIU
can verify that uuid matches the filesystem mounted at /var/vol/docker/overlay2/
and then call open_by_handle_at() to open fd and resolve it to a path
under /var/vol/docker/overlay2.

I don't know if that provides what CRIU needs, but it would be no more
than a few lines of code in overlayfs:

if (i < ofs->numlayer)
    fh = ovl_encode_real_fh(ofs->layers[i].mnt->mnt_root, ...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] overlayfs: C/R enhancements
  2020-06-05 14:36             ` Amir Goldstein
@ 2020-06-05 14:56               ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2020-06-05 14:56 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Pavel Tikhomirov, Miklos Szeredi, Andrey Vagin,
	Konstantin Khorenko, Vasiliy Averin, Kirill Tkhai, overlayfs,
	linux-kernel

On Fri, 5 Jun 2020 17:36:10 +0300
Amir Goldstein <amir73il@gmail.com> wrote:

> > > While at it, you copy pasted the text:
> > >           For more information, see Documentation/filesystems/overlayfs.txt
> > > but there is no more information to be found.
> >
> > As far as I know documentation patches must be send to another mailing list.
> > Of course I have plan to add information to overlayfs documentation about new feature.
> >
> 
> Please send documentation patch together with the series
> to this list. its fine to wait with that until the concept is approved though.
> 

Yep, I will prepare patch and send.

> 
> > > > > And if this works for you, you don't have to export the layers ovl_fh in
> > > > > /proc/mounts, you can export them in numerous other ways.
> > > > > One way from the top of my head, getxattr on overlay root dir.
> > > > > "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> > > > > for example could work.
> > > >
> > > > Thanks xattr might be a good option, but still don't forget about (a)
> > > > and (b), users like to know all information about mount from
> > > > /proc/pid/mountinfo.
> > > >
> > >
> > > Let's stick to your use cases requirements. If you have other use cases
> > > for this functionality lay them out explicitly.
> >
> > Requirements is very simple, at "dump stage" we need to save all overlayfs mount options
> > sufficient to fully reconstruct overlayfs mount state on "restore stage". We already
> > have proof of concept implementation of Docker overlayfs mounts when docker is running in
> > OpenVZ container. In this case we fully dump all tree of mounts and all mount namespaces.
> > CRIU mounts restore procedure at first reconstruct mount tree in special separate subtrees
> > called "yards", then when all mounts is reconstructed we do "pivot_root" syscall. And
> > with overlayfs it was a problem, because we mounted overlayfs with lowerdir,workdir,upperdir
> > paths with mount namespace "yard" path prefix, and after restore in mount options user may see
> > that lowerdir,workdir,upperdir paths were changed... It's a problem. Also it makes second C/R
> > procedure is impossible, because after first C/R lowerdir,workdir,upperdir paths is invalidated
> > after pivot_root.
> >
> > Example for Docker (after first C/R procedure):
> >
> > options lowerdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/l/4BLZ4WH6GZIVKJE5QF62QUUKVZ:/var/lib/docker/overlay2/l/7FYRGAXT35JMKTXCHDNCQO3HKT,upperdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/diff,workdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/work
> >
> 
> That reminds me.
> I've read somewhere that thoses symlinks l/4BLZ4WH6GZIVKJE5QF62QUUKVZ
> are meant to shorten the mount option string, because the mount
> options are limited by
> page size and with many lower layers limitation can reach.
> 
> That is one of the reasons that new mount API was created for (i.e. fsconfig()).
> I wonder if /proc/mounts also has a similar limitation on options size.
> I also wonder why docker doesn't chdir into /var/lib/docker/overlay2/
> before mounting overlay and use relative paths, though that would have
> been worse for CRIU.
> 
> So at least for the docker use case CRIU knows very well where the
> underlying filesytem is mounted (/var/lib/docker/overlay2/ or above).
> So if you got any API from overlayfs something like:
> getxattr("/var/lib/docker/overlay2/XYZ/merged",
> "trusted.overlay.layers.0.fh",..)
> which reads the ovl_fh encoding of layer 0 (upper) rootdir, CRIU
> can verify that uuid matches the filesystem mounted at /var/vol/docker/overlay2/
> and then call open_by_handle_at() to open fd and resolve it to a path
> under /var/vol/docker/overlay2.
> 
> I don't know if that provides what CRIU needs, but it would be no more
> than a few lines of code in overlayfs:
> 
> if (i < ofs->numlayer)
>     fh = ovl_encode_real_fh(ofs->layers[i].mnt->mnt_root, ...
> 

I will try to experiment with that. Thank you!

> Thanks,
> Amir.

-- 
Regards,
Alex.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-06-05 14:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-04 16:11 [PATCH 0/2] overlayfs: C/R enhancements Alexander Mikhalitsyn
2020-06-04 16:11 ` [PATCH 1/2] overlayfs: add dynamic path resolving in mount options Alexander Mikhalitsyn
2020-06-04 16:11 ` [PATCH 2/2] overlayfs: add mnt_id paths options Alexander Mikhalitsyn
2020-06-04 18:04 ` [PATCH 0/2] overlayfs: C/R enhancements Amir Goldstein
2020-06-04 21:34   ` Alexander Mikhalitsyn
2020-06-05  2:35     ` Amir Goldstein
2020-06-05  8:41       ` Pavel Tikhomirov
2020-06-05 10:21         ` Amir Goldstein
2020-06-05 12:44           ` Alexander Mikhalitsyn
2020-06-05 14:36             ` Amir Goldstein
2020-06-05 14:56               ` Alexander Mikhalitsyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).