linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/3] Make overlayfs volatile mounts reusable
@ 2020-11-25 10:46 Sargun Dhillon
  2020-11-25 10:46 ` [PATCH v1 1/3] fs: Add s_instance_id field to superblock for unique identification Sargun Dhillon
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Sargun Dhillon @ 2020-11-25 10:46 UTC (permalink / raw)
  To: linux-unionfs, miklos, Alexander Viro, Amir Goldstein
  Cc: Sargun Dhillon, Giuseppe Scrivano, Vivek Goyal, Daniel J Walsh,
	linux-fsdevel, David Howells


The volatile option is great for "ephemeral" containers. Unfortunately,
it doesn't capture all uses. There are two ways to use it safely right now:

1. Throw away the entire upperdir between mounts
2. Manually syncfs between mounts

For certain use-cases like serverless, or short-lived containers, it is
advantageous to be able to stop the container (runtime) and start it up on
demand / invocation of the function. Usually, there is some bootstrap
process which involves downloading some artifacts, or putting secrets on
disk, and then upon invocation of the function, you want to (re)start the
container.

If you have to syncfs every time you do this, it can lead to excess
filesystem overhead for all of the other containers on the machine, and
stall out every container who's upperdir is on the same underlying
filesystem, unless your filesystem offers something like subvolumes,
and if sync can be restricted to a subvolume.

The kernel has information that it can use to determine whether or not this
is safe -- primarily if the underlying FS has had writeback errors or not.
Overlayfs doesn't delay writes, so the consistency of the upperdir is not
contingent on the mount of overlayfs, but rather the mount of the
underlying filesystem. It can also make sure the underlying filesystem
wasn't remounted. Although, it was suggested that we use derive this
information from the upperdir's inode[1], we can checkpoint this data on
disk in an xattr.

Specifically we checkpoint:
  * Superblock "id": This is a new concept introduced in one of the patches
    which keeps track of (re)mounts of filesystems, by having a per boot
    monotonically increasing integer identifying the superblock. This is
    safer than trying to obfuscate the pointer and putting it into an
    xattr (due to leak risk, and address reuse), and during the course
    of a boot, the u64 should not wrap.
  * Overlay "boot id": This is a new UUID that is overlayfs specific,
    as overlayfs is a module that's independent from the rest of the
    system and can be (re)loaded independently -- thus it generates
    a UUID at load time which can be used to uniquely identify it.
  * upperdir / workdir errseq: A sample of the errseq_t on the workdir /
    upperdir's superblock. Since the errseq_t is implemented as a u32
    with errno + error counter, we can safely store it in a checkpoint.

This has to be done in kernelspace because userspace does not have access
to information such as superblock (re)mounts, the writeback errseq value,
and whether the module has been (re)loaded. Although this mechanism only
determines a subset of the error scenarios, it lays out the groundwork
for adding more.

In the future, we may want to make it so overlayfs shuts down on errors,
or have some kind of other behaviour when errors are detected. If we want
a per-file (on the upperdir) piece of metadata can be added indicating
errseq_t for just that file, and we can check it on each close / open,
and then allow the user to make an intelligent decision of how they want
overlayfs to handle the error. This would allow for errors to be captured
during normal operation as opposed to just between remounts.

RFC [2]:
 * Changed datastructure names
 * No longer delete the volatile dir / dirty file
 * Moved xattr to volatile directory
 * Split errseq check

[1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhadzC3-kh-igfxv3pAmC3ocDtAQTxByu4hrn8KtZuieQ@mail.gmail.com/
[2]: https://lore.kernel.org/linux-unionfs/20201116045758.21774-1-sargun@sargun.me/T/#t

Sargun Dhillon (3):
  fs: Add s_instance_id field to superblock for unique identification
  overlay: Add the ability to remount volatile directories when safe
  overlay: Add rudimentary checking of writeback errseq on volatile
    remount

 Documentation/filesystems/overlayfs.rst |  6 +-
 fs/overlayfs/overlayfs.h                | 38 +++++++++-
 fs/overlayfs/readdir.c                  | 94 +++++++++++++++++++++----
 fs/overlayfs/super.c                    | 75 +++++++++++++++-----
 fs/overlayfs/util.c                     |  2 +
 fs/super.c                              |  3 +
 include/linux/fs.h                      |  7 ++
 7 files changed, 193 insertions(+), 32 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 1/3] fs: Add s_instance_id field to superblock for unique identification
  2020-11-25 10:46 [PATCH v1 0/3] Make overlayfs volatile mounts reusable Sargun Dhillon
@ 2020-11-25 10:46 ` Sargun Dhillon
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
  2020-11-25 10:46 ` [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount Sargun Dhillon
  2 siblings, 0 replies; 19+ messages in thread
From: Sargun Dhillon @ 2020-11-25 10:46 UTC (permalink / raw)
  To: linux-unionfs, miklos, Alexander Viro, Amir Goldstein
  Cc: Sargun Dhillon, Giuseppe Scrivano, Vivek Goyal, Daniel J Walsh,
	linux-fsdevel, David Howells

This assigns a per-boot unique number to each superblock. This allows
other components to know whether a filesystem has been remounted
since they last interacted with it.

At every boot it is reset to 0. There is no specific reason it is set to 0,
other than repeatability versus using some random starting number. Because
of this, you must store it along some other piece of data which is
initialized at boot time.

This doesn't have any of the overhead of idr, and a u64 wont wrap any time
soon. There is no forward lookup requirement, so an idr is not needed.

In the future, we may want to expose this to userspace. Userspace programs
can benefit from this if they have large chunks of dirty or mmaped memory
that they're interacting with, and they want to see if that volume has been
unmounted, and remounted. Along with this, and a mechanism to inspect the
superblock's errseq a user can determine whether they need to throw away
their cache or similar. This is another benefit in comparison to just
using a pointer to the superblock to uniquely identify it.

Although this doesn't expose an ioctl or similar yet, in the future we
could add an ioctl that allows for fetching the s_instance_id for a given
cache, and inspection of the errseq associated with that.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
---
 fs/super.c         | 3 +++
 include/linux/fs.h | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 904459b35119..e47ace7f8c3d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -42,6 +42,7 @@
 
 static int thaw_super_locked(struct super_block *sb);
 
+static u64 s_instance_id_counter;
 static LIST_HEAD(super_blocks);
 static DEFINE_SPINLOCK(sb_lock);
 
@@ -546,6 +547,7 @@ struct super_block *sget_fc(struct fs_context *fc,
 	s->s_iflags |= fc->s_iflags;
 	strlcpy(s->s_id, s->s_type->name, sizeof(s->s_id));
 	list_add_tail(&s->s_list, &super_blocks);
+	s->s_instance_id = s_instance_id_counter++;
 	hlist_add_head(&s->s_instances, &s->s_type->fs_supers);
 	spin_unlock(&sb_lock);
 	get_filesystem(s->s_type);
@@ -625,6 +627,7 @@ struct super_block *sget(struct file_system_type *type,
 	s->s_type = type;
 	strlcpy(s->s_id, type->name, sizeof(s->s_id));
 	list_add_tail(&s->s_list, &super_blocks);
+	s->s_instance_id = s_instance_id_counter++;
 	hlist_add_head(&s->s_instances, &type->fs_supers);
 	spin_unlock(&sb_lock);
 	get_filesystem(type);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a08..09bf54382a54 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1472,6 +1472,13 @@ struct super_block {
 	char			s_id[32];	/* Informational name */
 	uuid_t			s_uuid;		/* UUID */
 
+	/*
+	 * ID identifying this particular instance of the superblock. It can
+	 * be used to determine if a particular filesystem has been remounted.
+	 * It may be exposed to userspace.
+	 */
+	u64			s_instance_id;
+
 	unsigned int		s_max_links;
 	fmode_t			s_mode;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 10:46 [PATCH v1 0/3] Make overlayfs volatile mounts reusable Sargun Dhillon
  2020-11-25 10:46 ` [PATCH v1 1/3] fs: Add s_instance_id field to superblock for unique identification Sargun Dhillon
@ 2020-11-25 10:46 ` Sargun Dhillon
  2020-11-25 12:49   ` kernel test robot
                     ` (5 more replies)
  2020-11-25 10:46 ` [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount Sargun Dhillon
  2 siblings, 6 replies; 19+ messages in thread
From: Sargun Dhillon @ 2020-11-25 10:46 UTC (permalink / raw)
  To: linux-unionfs, miklos, Alexander Viro, Amir Goldstein
  Cc: Sargun Dhillon, Giuseppe Scrivano, Vivek Goyal, Daniel J Walsh,
	linux-fsdevel, David Howells

Overlayfs added the ability to setup mounts where all syncs could be
short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile").

A user might want to remount this fs, but we do not let the user because
of the "incompat" detection feature. In the case of volatile, it is safe
to do something like[1]:

$ sync -f /root/upperdir
$ rm -rf /root/workdir/incompat/volatile

There are two ways to go about this. You can call sync on the underlying
filesystem, check the error code, and delete the dirty file if everything
is clean. If you're running lots of containers on the same filesystem, or
you want to avoid all unnecessary I/O, this may be suboptimal.

Alternatively, you can blindly delete the dirty file, and "hope for the
best".

This patch introduces transparent functionality to check if it is
(relatively) safe to reuse the upperdir. It ensures that the filesystem
hasn't been remounted, the system hasn't been rebooted, nor has the
overlayfs code changed. Since the structure is explicitly not meant to be
used between different versions of the code, its stability does not matter
so much.

[1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@mail.gmail.com/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
---
 Documentation/filesystems/overlayfs.rst |  6 +-
 fs/overlayfs/overlayfs.h                | 37 ++++++++++-
 fs/overlayfs/readdir.c                  | 88 +++++++++++++++++++++----
 fs/overlayfs/super.c                    | 74 ++++++++++++++++-----
 fs/overlayfs/util.c                     |  2 +
 5 files changed, 175 insertions(+), 32 deletions(-)

diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 580ab9a0fe31..18237c79fb4e 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -581,7 +581,11 @@ checks for this directory and refuses to mount if present. This is a strong
 indicator that user should throw away upper and work directories and create
 fresh one. In very limited cases where the user knows that the system has
 not crashed and contents of upperdir are intact, The "volatile" directory
-can be removed.
+can be removed.  In some cases, the filesystem can detect if the upperdir
+can be reused safely in a subsequent volatile mount.  If the filesystem is able
+to determine this, it will not require the user to manually delete the volatile
+directory and mounting will proceed as normal.
+
 
 Testsuite
 ---------
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f8880aa2ba0e..de694ee99d7c 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -32,8 +32,13 @@ enum ovl_xattr {
 	OVL_XATTR_NLINK,
 	OVL_XATTR_UPPER,
 	OVL_XATTR_METACOPY,
+	OVL_XATTR_VOLATILE,
 };
 
+#define OVL_INCOMPATDIR_NAME "incompat"
+#define OVL_VOLATILEDIR_NAME "volatile"
+#define OVL_VOLATILE_DIRTY_NAME "dirty"
+
 enum ovl_inode_flag {
 	/* Pure upper dir that may contain non pure upper entries */
 	OVL_IMPURE,
@@ -57,6 +62,31 @@ enum {
 	OVL_XINO_ON,
 };
 
+/*
+ * This is copied into the volatile xattr, and the user does not interact with
+ * it. There is no stability requirement, as a reboot explicitly invalidates
+ * a volatile workdir. It is explicitly meant not to be a stable api.
+ *
+ * Although this structure isn't meant to be stable it is exposed to potentially
+ * unprivileged users. We don't do any kind of cryptographic operations with
+ * the structure, so it could be tampered with, or inspected. Don't put
+ * kernel memory pointers in it, or anything else that could cause problems,
+ * or information disclosure.
+ */
+struct ovl_volatile_info {
+	/*
+	 * This uniquely identifies a boot, and is reset if overlayfs itself
+	 * is reloaded. Therefore we check our current / known boot_id
+	 * against this before looking at any other fields to validate:
+	 * 1. Is this datastructure laid out in the way we expect? (Overlayfs
+	 *    module, reboot, etc...)
+	 * 2. Could something have changed (like the s_instance_id counter
+	 *    resetting)
+	 */
+	uuid_t		ovl_boot_id;	/* Must stay first member */
+	u64		s_instance_id;
+} __packed;
+
 /*
  * The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
  * where:
@@ -422,8 +452,8 @@ void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list);
 void ovl_cache_free(struct list_head *list);
 void ovl_dir_cache_free(struct inode *inode);
 int ovl_check_d_type_supported(struct path *realpath);
-int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
-			struct dentry *dentry, int level);
+int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
+			struct vfsmount *mnt, struct dentry *dentry, int level);
 int ovl_indexdir_cleanup(struct ovl_fs *ofs);
 
 /* inode.c */
@@ -520,3 +550,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 
 /* export.c */
 extern const struct export_operations ovl_export_operations;
+
+/* super.c */
+extern uuid_t ovl_boot_id;
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 01620ebae1bd..4e3e2bc3ea43 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1080,10 +1080,68 @@ int ovl_check_d_type_supported(struct path *realpath)
 
 	return rdd.d_type_supported;
 }
+static int ovl_verify_volatile_info(struct ovl_fs *ofs,
+				    struct dentry *volatiledir)
+{
+	int err;
+	struct ovl_volatile_info info;
+
+	err = ovl_do_getxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
+			      sizeof(info));
+	if (err < 0) {
+		pr_debug("Unable to read volatile xattr: %d\n", err);
+		return -EINVAL;
+	}
+
+	if (err != sizeof(info)) {
+		pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
+			 sizeof(info));
+		return -EINVAL;
+	}
+
+	if (!uuid_equal(&ovl_boot_id, &info.ovl_boot_id)) {
+		pr_debug("boot id has changed (reboot or module reloaded)\n");
+		return -EINVAL;
+	}
+
+	if (volatiledir->d_sb->s_instance_id != info.s_instance_id) {
+		pr_debug("workdir has been unmounted and remounted\n");
+		return -EINVAL;
+	}
 
-#define OVL_INCOMPATDIR_NAME "incompat"
+	return 1;
+}
 
-static int ovl_workdir_cleanup_recurse(struct path *path, int level)
+/*
+ * ovl_check_incompat checks this specific incompat entry for incompatibility.
+ * If it is found to be incompatible -EINVAL will be returned.
+ *
+ * If the directory should be preserved, then this function returns 1.
+ */
+static int ovl_check_incompat(struct ovl_fs *ofs, struct ovl_cache_entry *p,
+			      struct path *path)
+{
+	int err = -EINVAL;
+	struct dentry *d;
+
+	d = lookup_one_len(p->name, path->dentry, p->len);
+	if (IS_ERR(d))
+		return PTR_ERR(d);
+
+	if (!strcmp(p->name, OVL_VOLATILEDIR_NAME))
+		err = ovl_verify_volatile_info(ofs, d);
+
+	if (err == -EINVAL)
+		pr_err("incompat feature '%s' cannot be mounted\n", p->name);
+	else
+		pr_debug("incompat '%s' handled: %d\n", p->name, err);
+
+	dput(d);
+	return err;
+}
+
+static int ovl_workdir_cleanup_recurse(struct ovl_fs *ofs, struct path *path,
+				       int level)
 {
 	int err;
 	struct inode *dir = path->dentry->d_inode;
@@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
 			if (p->len == 2 && p->name[1] == '.')
 				continue;
 		} else if (incompat) {
-			pr_err("overlay with incompat feature '%s' cannot be mounted\n",
-				p->name);
-			err = -EINVAL;
-			break;
+			err = ovl_check_incompat(ofs, p, path);
+			if (err < 0)
+				break;
+			/* Skip cleaning this */
+			if (err == 1)
+				continue;
 		}
 		dentry = lookup_one_len(p->name, path->dentry, p->len);
 		if (IS_ERR(dentry))
 			continue;
 		if (dentry->d_inode)
-			err = ovl_workdir_cleanup(dir, path->mnt, dentry, level);
+			err = ovl_workdir_cleanup(ofs, dir, path->mnt, dentry,
+						  level);
 		dput(dentry);
 		if (err)
 			break;
@@ -1142,11 +1203,13 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
 	inode_unlock(dir);
 out:
 	ovl_cache_free(&list);
+	if (incompat && err >= 0)
+		return 1;
 	return err;
 }
 
-int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
-			 struct dentry *dentry, int level)
+int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
+			struct vfsmount *mnt, struct dentry *dentry, int level)
 {
 	int err;
 
@@ -1159,7 +1222,7 @@ int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
 		struct path path = { .mnt = mnt, .dentry = dentry };
 
 		inode_unlock(dir);
-		err = ovl_workdir_cleanup_recurse(&path, level + 1);
+		err = ovl_workdir_cleanup_recurse(ofs, &path, level + 1);
 		inode_lock_nested(dir, I_MUTEX_PARENT);
 		if (!err)
 			err = ovl_cleanup(dir, dentry);
@@ -1206,9 +1269,10 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
 		}
 		/* Cleanup leftover from index create/cleanup attempt */
 		if (index->d_name.name[0] == '#') {
-			err = ovl_workdir_cleanup(dir, path.mnt, index, 1);
-			if (err)
+			err = ovl_workdir_cleanup(ofs, dir, path.mnt, index, 1);
+			if (err < 0)
 				break;
+			err = 0;
 			goto next;
 		}
 		err = ovl_verify_index(ofs, index);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983bcfbb3..9a1b07907662 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -15,6 +15,7 @@
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
 #include <linux/exportfs.h>
+#include <linux/uuid.h>
 #include "overlayfs.h"
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
@@ -23,6 +24,7 @@ MODULE_LICENSE("GPL");
 
 
 struct ovl_dir_cache;
+uuid_t ovl_boot_id;
 
 #define OVL_MAX_STACK 500
 
@@ -722,20 +724,24 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 				goto out_unlock;
 
 			retried = true;
-			err = ovl_workdir_cleanup(dir, mnt, work, 0);
-			dput(work);
-			if (err == -EINVAL) {
-				work = ERR_PTR(err);
-				goto out_unlock;
+			err = ovl_workdir_cleanup(ofs, dir, mnt, work, 0);
+			/* check if we should reuse the workdir */
+			if (err != 1) {
+				dput(work);
+				if (err == -EINVAL) {
+					work = ERR_PTR(err);
+					goto out_unlock;
+				}
+				goto retry;
 			}
-			goto retry;
+		} else {
+			work = ovl_create_real(dir, work,
+					       OVL_CATTR(attr.ia_mode));
+			err = PTR_ERR(work);
+			if (IS_ERR(work))
+				goto out_err;
 		}
 
-		work = ovl_create_real(dir, work, OVL_CATTR(attr.ia_mode));
-		err = PTR_ERR(work);
-		if (IS_ERR(work))
-			goto out_err;
-
 		/*
 		 * Try to remove POSIX ACL xattrs from workdir.  We are good if:
 		 *
@@ -1237,26 +1243,59 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
 	return child;
 }
 
+static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
+{
+	int err;
+	struct ovl_volatile_info info = {
+		.s_instance_id = volatiledir->d_sb->s_instance_id,
+	};
+
+	uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
+	err = ovl_do_setxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
+			      sizeof(info));
+
+	if (err == -EOPNOTSUPP)
+		return 0;
+
+	return err;
+}
+
 /*
  * Creates $workdir/work/incompat/volatile/dirty file if it is not already
  * present.
  */
 static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
 {
+	int err;
 	unsigned int ctr;
-	struct dentry *d = dget(ofs->workbasedir);
+	struct dentry *volatiledir, *d = dget(ofs->workbasedir);
 	static const char *const volatile_path[] = {
-		OVL_WORKDIR_NAME, "incompat", "volatile", "dirty"
+		OVL_WORKDIR_NAME,
+		OVL_INCOMPATDIR_NAME,
+		OVL_VOLATILEDIR_NAME,
+		OVL_VOLATILE_DIRTY_NAME,
 	};
 	const char *const *name = volatile_path;
 
-	for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) {
-		d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
+	/* Stop before the dirty file is created */
+	for (ctr = 0; ctr < ARRAY_SIZE(volatile_path) - 1; ctr++, name++) {
+		d = ovl_lookup_or_create(d, *name, S_IFDIR);
 		if (IS_ERR(d))
 			return PTR_ERR(d);
 	}
-	dput(d);
-	return 0;
+	volatiledir = dget(d);
+
+	/* Create the dirty file exists before we set the xattr */
+	d = ovl_lookup_or_create(d, *name, S_IFREG);
+	if (!IS_ERR(d)) {
+		dput(d);
+		err = ovl_set_volatile_info(ofs, volatiledir);
+	} else {
+		err = PTR_ERR(d);
+	}
+
+	dput(volatiledir);
+	return err;
 }
 
 static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
@@ -2044,6 +2083,7 @@ static int __init ovl_init(void)
 {
 	int err;
 
+	uuid_gen(&ovl_boot_id);
 	ovl_inode_cachep = kmem_cache_create("ovl_inode",
 					     sizeof(struct ovl_inode), 0,
 					     (SLAB_RECLAIM_ACCOUNT|
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 23f475627d07..87c9f5a063ed 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -580,6 +580,7 @@ bool ovl_check_dir_xattr(struct super_block *sb, struct dentry *dentry,
 #define OVL_XATTR_NLINK_POSTFIX		"nlink"
 #define OVL_XATTR_UPPER_POSTFIX		"upper"
 #define OVL_XATTR_METACOPY_POSTFIX	"metacopy"
+#define OVL_XATTR_VOLATILE_POSTFIX	"volatile"
 
 #define OVL_XATTR_TAB_ENTRY(x) \
 	[x] = OVL_XATTR_PREFIX x ## _POSTFIX
@@ -592,6 +593,7 @@ const char *ovl_xattr_table[] = {
 	OVL_XATTR_TAB_ENTRY(OVL_XATTR_NLINK),
 	OVL_XATTR_TAB_ENTRY(OVL_XATTR_UPPER),
 	OVL_XATTR_TAB_ENTRY(OVL_XATTR_METACOPY),
+	OVL_XATTR_TAB_ENTRY(OVL_XATTR_VOLATILE),
 };
 
 int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount
  2020-11-25 10:46 [PATCH v1 0/3] Make overlayfs volatile mounts reusable Sargun Dhillon
  2020-11-25 10:46 ` [PATCH v1 1/3] fs: Add s_instance_id field to superblock for unique identification Sargun Dhillon
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
@ 2020-11-25 10:46 ` Sargun Dhillon
  2020-11-25 14:03   ` Amir Goldstein
  2 siblings, 1 reply; 19+ messages in thread
From: Sargun Dhillon @ 2020-11-25 10:46 UTC (permalink / raw)
  To: linux-unionfs, miklos, Alexander Viro, Amir Goldstein
  Cc: Sargun Dhillon, Giuseppe Scrivano, Vivek Goyal, Daniel J Walsh,
	linux-fsdevel, David Howells

Volatile remounts validate the following at the moment:
 * Has the module been reloaded / the system rebooted
 * Has the workdir been remounted

This adds a new check for errors detected via the superblock's
errseq_t. At mount time, the errseq_t is snapshotted to disk,
and upon remount it's re-verified. This allows for kernel-level
detection of errors without forcing userspace to perform a
sync and allows for the hidden detection of writeback errors.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
---
 fs/overlayfs/overlayfs.h | 1 +
 fs/overlayfs/readdir.c   | 6 ++++++
 fs/overlayfs/super.c     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index de694ee99d7c..e8a711953b64 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -85,6 +85,7 @@ struct ovl_volatile_info {
 	 */
 	uuid_t		ovl_boot_id;	/* Must stay first member */
 	u64		s_instance_id;
+	errseq_t	errseq;	/* Implemented as a u32 */
 } __packed;
 
 /*
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 4e3e2bc3ea43..2bb0641ecbbd 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1109,6 +1109,12 @@ static int ovl_verify_volatile_info(struct ovl_fs *ofs,
 		return -EINVAL;
 	}
 
+	err = errseq_check(&volatiledir->d_sb->s_wb_err, info.errseq);
+	if (err) {
+		pr_debug("Workdir filesystem reports errors: %d\n", err);
+		return -EINVAL;
+	}
+
 	return 1;
 }
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 9a1b07907662..49dee41ec125 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1248,6 +1248,7 @@ static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
 	int err;
 	struct ovl_volatile_info info = {
 		.s_instance_id = volatiledir->d_sb->s_instance_id,
+		.errseq = errseq_sample(&volatiledir->d_sb->s_wb_err),
 	};
 
 	uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
@ 2020-11-25 12:49   ` kernel test robot
  2020-11-25 13:23   ` kernel test robot
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-11-25 12:49 UTC (permalink / raw)
  To: Sargun Dhillon, linux-unionfs, miklos, Alexander Viro, Amir Goldstein
  Cc: kbuild-all, Sargun Dhillon, Giuseppe Scrivano, Vivek Goyal,
	Daniel J Walsh, linux-fsdevel, David Howells

[-- Attachment #1: Type: text/plain, Size: 3591 bytes --]

Hi Sargun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.10-rc5 next-20201124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sargun-Dhillon/Make-overlayfs-volatile-mounts-reusable/20201125-184754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: arc-randconfig-m031-20201125 (attached as .config)
compiler: arc-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/2fee0f742c2bb44771e51ed73ec7faf50165832a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sargun-Dhillon/Make-overlayfs-volatile-mounts-reusable/20201125-184754
        git checkout 2fee0f742c2bb44771e51ed73ec7faf50165832a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:16,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs/overlayfs/readdir.c:7:
   fs/overlayfs/readdir.c: In function 'ovl_verify_volatile_info':
>> include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'unsigned int' [-Wformat=]
       5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
         |                  ^~~~~~
   include/linux/printk.h:140:10: note: in definition of macro 'no_printk'
     140 |   printk(fmt, ##__VA_ARGS__);  \
         |          ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
      15 | #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
         |                    ^~~~~~~~
   include/linux/printk.h:430:12: note: in expansion of macro 'KERN_DEBUG'
     430 |  no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
         |            ^~~~~~~~~~
   fs/overlayfs/readdir.c:1097:3: note: in expansion of macro 'pr_debug'
    1097 |   pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
         |   ^~~~~~~~
   fs/overlayfs/readdir.c:1097:56: note: format string is defined here
    1097 |   pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
         |                                                      ~~^
         |                                                        |
         |                                                        long int
         |                                                      %d

vim +5 include/linux/kern_levels.h

314ba3520e513a7 Joe Perches 2012-07-30  4  
04d2c8c83d0e3ac Joe Perches 2012-07-30 @5  #define KERN_SOH	"\001"		/* ASCII Start Of Header */
04d2c8c83d0e3ac Joe Perches 2012-07-30  6  #define KERN_SOH_ASCII	'\001'
04d2c8c83d0e3ac Joe Perches 2012-07-30  7  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22247 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
  2020-11-25 12:49   ` kernel test robot
@ 2020-11-25 13:23   ` kernel test robot
  2020-11-25 13:29   ` kernel test robot
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-11-25 13:23 UTC (permalink / raw)
  To: Sargun Dhillon, linux-unionfs, miklos, Alexander Viro, Amir Goldstein
  Cc: kbuild-all, clang-built-linux, Sargun Dhillon, Giuseppe Scrivano,
	Vivek Goyal, Daniel J Walsh, linux-fsdevel, David Howells

[-- Attachment #1: Type: text/plain, Size: 4348 bytes --]

Hi Sargun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.10-rc5 next-20201125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sargun-Dhillon/Make-overlayfs-volatile-mounts-reusable/20201125-184754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: powerpc64-randconfig-r005-20201125 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 77e98eaee2e8d4b9b297b66fda5b1e51e2a69999)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc64 cross compiling tool for clang build
        # apt-get install binutils-powerpc64-linux-gnu
        # https://github.com/0day-ci/linux/commit/2fee0f742c2bb44771e51ed73ec7faf50165832a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sargun-Dhillon/Make-overlayfs-volatile-mounts-reusable/20201125-184754
        git checkout 2fee0f742c2bb44771e51ed73ec7faf50165832a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/overlayfs/readdir.c:1098:5: warning: format specifies type 'long' but the argument has type 'unsigned int' [-Wformat]
                            sizeof(info));
                            ^~~~~~~~~~~~
   include/linux/printk.h:424:26: note: expanded from macro 'pr_debug'
           dynamic_pr_debug(fmt, ##__VA_ARGS__)
                            ~~~    ^~~~~~~~~~~
   include/linux/dynamic_debug.h:158:22: note: expanded from macro 'dynamic_pr_debug'
                              pr_fmt(fmt), ##__VA_ARGS__)
                                     ~~~     ^~~~~~~~~~~
   include/linux/dynamic_debug.h:147:56: note: expanded from macro '_dynamic_func_call'
           __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
                                                                 ^~~~~~~~~~~
   include/linux/dynamic_debug.h:129:15: note: expanded from macro '__dynamic_func_call'
                   func(&id, ##__VA_ARGS__);               \
                               ^~~~~~~~~~~
   1 warning generated.

vim +1098 fs/overlayfs/readdir.c

  1064	
  1065	/*
  1066	 * Returns 1 if d_type is supported, 0 not supported/unknown. Negative values
  1067	 * if error is encountered.
  1068	 */
  1069	int ovl_check_d_type_supported(struct path *realpath)
  1070	{
  1071		int err;
  1072		struct ovl_readdir_data rdd = {
  1073			.ctx.actor = ovl_check_d_type,
  1074			.d_type_supported = false,
  1075		};
  1076	
  1077		err = ovl_dir_read(realpath, &rdd);
  1078		if (err)
  1079			return err;
  1080	
  1081		return rdd.d_type_supported;
  1082	}
  1083	static int ovl_verify_volatile_info(struct ovl_fs *ofs,
  1084					    struct dentry *volatiledir)
  1085	{
  1086		int err;
  1087		struct ovl_volatile_info info;
  1088	
  1089		err = ovl_do_getxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
  1090				      sizeof(info));
  1091		if (err < 0) {
  1092			pr_debug("Unable to read volatile xattr: %d\n", err);
  1093			return -EINVAL;
  1094		}
  1095	
  1096		if (err != sizeof(info)) {
  1097			pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
> 1098				 sizeof(info));
  1099			return -EINVAL;
  1100		}
  1101	
  1102		if (!uuid_equal(&ovl_boot_id, &info.ovl_boot_id)) {
  1103			pr_debug("boot id has changed (reboot or module reloaded)\n");
  1104			return -EINVAL;
  1105		}
  1106	
  1107		if (volatiledir->d_sb->s_instance_id != info.s_instance_id) {
  1108			pr_debug("workdir has been unmounted and remounted\n");
  1109			return -EINVAL;
  1110		}
  1111	
  1112		return 1;
  1113	}
  1114	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30220 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
  2020-11-25 12:49   ` kernel test robot
  2020-11-25 13:23   ` kernel test robot
@ 2020-11-25 13:29   ` kernel test robot
  2020-11-25 13:58   ` Amir Goldstein
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-11-25 13:29 UTC (permalink / raw)
  To: Sargun Dhillon, linux-unionfs, miklos, Alexander Viro, Amir Goldstein
  Cc: kbuild-all, Sargun Dhillon, Giuseppe Scrivano, Vivek Goyal,
	Daniel J Walsh, linux-fsdevel, David Howells

[-- Attachment #1: Type: text/plain, Size: 3524 bytes --]

Hi Sargun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.10-rc5 next-20201125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sargun-Dhillon/Make-overlayfs-volatile-mounts-reusable/20201125-184754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: parisc-randconfig-r001-20201125 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/2fee0f742c2bb44771e51ed73ec7faf50165832a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sargun-Dhillon/Make-overlayfs-volatile-mounts-reusable/20201125-184754
        git checkout 2fee0f742c2bb44771e51ed73ec7faf50165832a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from fs/overlayfs/readdir.c:16:
   fs/overlayfs/readdir.c: In function 'ovl_verify_volatile_info':
>> fs/overlayfs/overlayfs.h:13:21: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'unsigned int' [-Wformat=]
      13 | #define pr_fmt(fmt) "overlayfs: " fmt
         |                     ^~~~~~~~~~~~~
   include/linux/dynamic_debug.h:129:15: note: in expansion of macro 'pr_fmt'
     129 |   func(&id, ##__VA_ARGS__);  \
         |               ^~~~~~~~~~~
   include/linux/dynamic_debug.h:147:2: note: in expansion of macro '__dynamic_func_call'
     147 |  __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
         |  ^~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:157:2: note: in expansion of macro '_dynamic_func_call'
     157 |  _dynamic_func_call(fmt, __dynamic_pr_debug,  \
         |  ^~~~~~~~~~~~~~~~~~
   include/linux/printk.h:424:2: note: in expansion of macro 'dynamic_pr_debug'
     424 |  dynamic_pr_debug(fmt, ##__VA_ARGS__)
         |  ^~~~~~~~~~~~~~~~
   fs/overlayfs/readdir.c:1097:3: note: in expansion of macro 'pr_debug'
    1097 |   pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
         |   ^~~~~~~~
   fs/overlayfs/readdir.c:1097:56: note: format string is defined here
    1097 |   pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
         |                                                      ~~^
         |                                                        |
         |                                                        long int
         |                                                      %d

vim +13 fs/overlayfs/overlayfs.h

e9be9d5e76e3487 Miklos Szeredi 2014-10-24  11  
1bd0a3aea4357e1 lijiazi        2019-12-16  12  #undef pr_fmt
1bd0a3aea4357e1 lijiazi        2019-12-16 @13  #define pr_fmt(fmt) "overlayfs: " fmt
1bd0a3aea4357e1 lijiazi        2019-12-16  14  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28600 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
                     ` (2 preceding siblings ...)
  2020-11-25 13:29   ` kernel test robot
@ 2020-11-25 13:58   ` Amir Goldstein
  2020-11-25 14:43   ` Vivek Goyal
  2020-11-25 18:17   ` Vivek Goyal
  5 siblings, 0 replies; 19+ messages in thread
From: Amir Goldstein @ 2020-11-25 13:58 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: overlayfs, Miklos Szeredi, Alexander Viro, Giuseppe Scrivano,
	Vivek Goyal, Daniel J Walsh, linux-fsdevel, David Howells

On Wed, Nov 25, 2020 at 12:46 PM Sargun Dhillon <sargun@sargun.me> wrote:
>
> Overlayfs added the ability to setup mounts where all syncs could be
> short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile").
>
> A user might want to remount this fs, but we do not let the user because
> of the "incompat" detection feature. In the case of volatile, it is safe
> to do something like[1]:
>
> $ sync -f /root/upperdir
> $ rm -rf /root/workdir/incompat/volatile
>
> There are two ways to go about this. You can call sync on the underlying
> filesystem, check the error code, and delete the dirty file if everything
> is clean. If you're running lots of containers on the same filesystem, or
> you want to avoid all unnecessary I/O, this may be suboptimal.
>
> Alternatively, you can blindly delete the dirty file, and "hope for the
> best".
>
> This patch introduces transparent functionality to check if it is
> (relatively) safe to reuse the upperdir. It ensures that the filesystem
> hasn't been remounted, the system hasn't been rebooted, nor has the
> overlayfs code changed. Since the structure is explicitly not meant to be
> used between different versions of the code, its stability does not matter
> so much.
>

Looks good.
You forgot to allow reuse only as volatile...
Other than that just a few comments.

> [1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@mail.gmail.com/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c
>
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Amir Goldstein <amir73il@gmail.com>
> ---
>  Documentation/filesystems/overlayfs.rst |  6 +-
>  fs/overlayfs/overlayfs.h                | 37 ++++++++++-
>  fs/overlayfs/readdir.c                  | 88 +++++++++++++++++++++----
>  fs/overlayfs/super.c                    | 74 ++++++++++++++++-----
>  fs/overlayfs/util.c                     |  2 +
>  5 files changed, 175 insertions(+), 32 deletions(-)
>
> diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> index 580ab9a0fe31..18237c79fb4e 100644
> --- a/Documentation/filesystems/overlayfs.rst
> +++ b/Documentation/filesystems/overlayfs.rst
> @@ -581,7 +581,11 @@ checks for this directory and refuses to mount if present. This is a strong
>  indicator that user should throw away upper and work directories and create
>  fresh one. In very limited cases where the user knows that the system has
>  not crashed and contents of upperdir are intact, The "volatile" directory
> -can be removed.
> +can be removed.  In some cases, the filesystem can detect if the upperdir
> +can be reused safely in a subsequent volatile mount.  If the filesystem is able
> +to determine this, it will not require the user to manually delete the volatile
> +directory and mounting will proceed as normal.
> +
>
>  Testsuite
>  ---------
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index f8880aa2ba0e..de694ee99d7c 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -32,8 +32,13 @@ enum ovl_xattr {
>         OVL_XATTR_NLINK,
>         OVL_XATTR_UPPER,
>         OVL_XATTR_METACOPY,
> +       OVL_XATTR_VOLATILE,
>  };
>
> +#define OVL_INCOMPATDIR_NAME "incompat"
> +#define OVL_VOLATILEDIR_NAME "volatile"
> +#define OVL_VOLATILE_DIRTY_NAME "dirty"
> +
>  enum ovl_inode_flag {
>         /* Pure upper dir that may contain non pure upper entries */
>         OVL_IMPURE,
> @@ -57,6 +62,31 @@ enum {
>         OVL_XINO_ON,
>  };
>
> +/*
> + * This is copied into the volatile xattr, and the user does not interact with
> + * it. There is no stability requirement, as a reboot explicitly invalidates
> + * a volatile workdir. It is explicitly meant not to be a stable api.
> + *
> + * Although this structure isn't meant to be stable it is exposed to potentially
> + * unprivileged users. We don't do any kind of cryptographic operations with
> + * the structure, so it could be tampered with, or inspected. Don't put
> + * kernel memory pointers in it, or anything else that could cause problems,
> + * or information disclosure.
> + */
> +struct ovl_volatile_info {
> +       /*
> +        * This uniquely identifies a boot, and is reset if overlayfs itself
> +        * is reloaded. Therefore we check our current / known boot_id
> +        * against this before looking at any other fields to validate:
> +        * 1. Is this datastructure laid out in the way we expect? (Overlayfs
> +        *    module, reboot, etc...)
> +        * 2. Could something have changed (like the s_instance_id counter
> +        *    resetting)
> +        */
> +       uuid_t          ovl_boot_id;    /* Must stay first member */
> +       u64             s_instance_id;
> +} __packed;
> +
>  /*
>   * The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
>   * where:
> @@ -422,8 +452,8 @@ void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list);
>  void ovl_cache_free(struct list_head *list);
>  void ovl_dir_cache_free(struct inode *inode);
>  int ovl_check_d_type_supported(struct path *realpath);
> -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
> -                       struct dentry *dentry, int level);
> +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
> +                       struct vfsmount *mnt, struct dentry *dentry, int level);
>  int ovl_indexdir_cleanup(struct ovl_fs *ofs);
>
>  /* inode.c */
> @@ -520,3 +550,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
>
>  /* export.c */
>  extern const struct export_operations ovl_export_operations;
> +
> +/* super.c */
> +extern uuid_t ovl_boot_id;
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 01620ebae1bd..4e3e2bc3ea43 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -1080,10 +1080,68 @@ int ovl_check_d_type_supported(struct path *realpath)
>
>         return rdd.d_type_supported;
>  }
> +static int ovl_verify_volatile_info(struct ovl_fs *ofs,
> +                                   struct dentry *volatiledir)
> +{
> +       int err;
> +       struct ovl_volatile_info info;
> +
> +       err = ovl_do_getxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
> +                             sizeof(info));
> +       if (err < 0) {
> +               pr_debug("Unable to read volatile xattr: %d\n", err);
> +               return -EINVAL;
> +       }
> +
> +       if (err != sizeof(info)) {
> +               pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
> +                        sizeof(info));
> +               return -EINVAL;
> +       }
> +
> +       if (!uuid_equal(&ovl_boot_id, &info.ovl_boot_id)) {
> +               pr_debug("boot id has changed (reboot or module reloaded)\n");
> +               return -EINVAL;
> +       }
> +
> +       if (volatiledir->d_sb->s_instance_id != info.s_instance_id) {
> +               pr_debug("workdir has been unmounted and remounted\n");
> +               return -EINVAL;
> +       }
>
> -#define OVL_INCOMPATDIR_NAME "incompat"
> +       return 1;
> +}
>
> -static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> +/*
> + * ovl_check_incompat checks this specific incompat entry for incompatibility.
> + * If it is found to be incompatible -EINVAL will be returned.
> + *
> + * If the directory should be preserved, then this function returns 1.
> + */
> +static int ovl_check_incompat(struct ovl_fs *ofs, struct ovl_cache_entry *p,
> +                             struct path *path)
> +{
> +       int err = -EINVAL;
> +       struct dentry *d;
> +

if p->name is not OVL_VOLATILEDIR_NAME, there is no reason to lookup
and unless ofs->config.ovl_volatile, you can already fail the check.


> +       d = lookup_one_len(p->name, path->dentry, p->len);
> +       if (IS_ERR(d))
> +               return PTR_ERR(d);
> +
> +       if (!strcmp(p->name, OVL_VOLATILEDIR_NAME))
> +               err = ovl_verify_volatile_info(ofs, d);
> +
> +       if (err == -EINVAL)
> +               pr_err("incompat feature '%s' cannot be mounted\n", p->name);
> +       else
> +               pr_debug("incompat '%s' handled: %d\n", p->name, err);
> +
> +       dput(d);
> +       return err;
> +}
> +
> +static int ovl_workdir_cleanup_recurse(struct ovl_fs *ofs, struct path *path,
> +                                      int level)
>  {
>         int err;
>         struct inode *dir = path->dentry->d_inode;
> @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
>                         if (p->len == 2 && p->name[1] == '.')
>                                 continue;
>                 } else if (incompat) {
> -                       pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> -                               p->name);
> -                       err = -EINVAL;
> -                       break;
> +                       err = ovl_check_incompat(ofs, p, path);
> +                       if (err < 0)
> +                               break;
> +                       /* Skip cleaning this */
> +                       if (err == 1)
> +                               continue;
>                 }
>                 dentry = lookup_one_len(p->name, path->dentry, p->len);
>                 if (IS_ERR(dentry))
>                         continue;
>                 if (dentry->d_inode)
> -                       err = ovl_workdir_cleanup(dir, path->mnt, dentry, level);
> +                       err = ovl_workdir_cleanup(ofs, dir, path->mnt, dentry,
> +                                                 level);
>                 dput(dentry);
>                 if (err)
>                         break;
> @@ -1142,11 +1203,13 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
>         inode_unlock(dir);
>  out:
>         ovl_cache_free(&list);
> +       if (incompat && err >= 0)
> +               return 1;
>         return err;
>  }
>
> -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
> -                        struct dentry *dentry, int level)
> +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
> +                       struct vfsmount *mnt, struct dentry *dentry, int level)
>  {
>         int err;
>
> @@ -1159,7 +1222,7 @@ int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
>                 struct path path = { .mnt = mnt, .dentry = dentry };
>
>                 inode_unlock(dir);
> -               err = ovl_workdir_cleanup_recurse(&path, level + 1);
> +               err = ovl_workdir_cleanup_recurse(ofs, &path, level + 1);
>                 inode_lock_nested(dir, I_MUTEX_PARENT);
>                 if (!err)
>                         err = ovl_cleanup(dir, dentry);
> @@ -1206,9 +1269,10 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
>                 }
>                 /* Cleanup leftover from index create/cleanup attempt */
>                 if (index->d_name.name[0] == '#') {
> -                       err = ovl_workdir_cleanup(dir, path.mnt, index, 1);
> -                       if (err)
> +                       err = ovl_workdir_cleanup(ofs, dir, path.mnt, index, 1);
> +                       if (err < 0)
>                                 break;
> +                       err = 0;
>                         goto next;
>                 }
>                 err = ovl_verify_index(ofs, index);
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 290983bcfbb3..9a1b07907662 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -15,6 +15,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/posix_acl_xattr.h>
>  #include <linux/exportfs.h>
> +#include <linux/uuid.h>
>  #include "overlayfs.h"
>
>  MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
> @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL");
>
>
>  struct ovl_dir_cache;
> +uuid_t ovl_boot_id;
>
>  #define OVL_MAX_STACK 500
>
> @@ -722,20 +724,24 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
>                                 goto out_unlock;
>
>                         retried = true;
> -                       err = ovl_workdir_cleanup(dir, mnt, work, 0);
> -                       dput(work);
> -                       if (err == -EINVAL) {
> -                               work = ERR_PTR(err);
> -                               goto out_unlock;
> +                       err = ovl_workdir_cleanup(ofs, dir, mnt, work, 0);
> +                       /* check if we should reuse the workdir */
> +                       if (err != 1) {
> +                               dput(work);
> +                               if (err == -EINVAL) {
> +                                       work = ERR_PTR(err);
> +                                       goto out_unlock;
> +                               }
> +                               goto retry;
>                         }
> -                       goto retry;
> +               } else {
> +                       work = ovl_create_real(dir, work,
> +                                              OVL_CATTR(attr.ia_mode));
> +                       err = PTR_ERR(work);
> +                       if (IS_ERR(work))
> +                               goto out_err;
>                 }
>
> -               work = ovl_create_real(dir, work, OVL_CATTR(attr.ia_mode));
> -               err = PTR_ERR(work);
> -               if (IS_ERR(work))
> -                       goto out_err;
> -
>                 /*
>                  * Try to remove POSIX ACL xattrs from workdir.  We are good if:
>                  *
> @@ -1237,26 +1243,59 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
>         return child;
>  }
>
> +static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
> +{
> +       int err;
> +       struct ovl_volatile_info info = {
> +               .s_instance_id = volatiledir->d_sb->s_instance_id,
> +       };
> +
> +       uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
> +       err = ovl_do_setxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
> +                             sizeof(info));
> +
> +       if (err == -EOPNOTSUPP)
> +               return 0;
> +
> +       return err;
> +}
> +
>  /*
>   * Creates $workdir/work/incompat/volatile/dirty file if it is not already
>   * present.
>   */
>  static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
>  {
> +       int err;
>         unsigned int ctr;
> -       struct dentry *d = dget(ofs->workbasedir);
> +       struct dentry *volatiledir, *d = dget(ofs->workbasedir);
>         static const char *const volatile_path[] = {
> -               OVL_WORKDIR_NAME, "incompat", "volatile", "dirty"
> +               OVL_WORKDIR_NAME,
> +               OVL_INCOMPATDIR_NAME,
> +               OVL_VOLATILEDIR_NAME,
> +               OVL_VOLATILE_DIRTY_NAME,
>         };
>         const char *const *name = volatile_path;
>
> -       for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) {
> -               d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
> +       /* Stop before the dirty file is created */
> +       for (ctr = 0; ctr < ARRAY_SIZE(volatile_path) - 1; ctr++, name++) {
> +               d = ovl_lookup_or_create(d, *name, S_IFDIR);
>                 if (IS_ERR(d))
>                         return PTR_ERR(d);
>         }
> -       dput(d);
> -       return 0;
> +       volatiledir = dget(d);
> +
> +       /* Create the dirty file exists before we set the xattr */
> +       d = ovl_lookup_or_create(d, *name, S_IFREG);

There is really no point in including OVL_VOLATILE_DIRTY_NAME in the array
this is just too confusing, use the name explicitly.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount
  2020-11-25 10:46 ` [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount Sargun Dhillon
@ 2020-11-25 14:03   ` Amir Goldstein
  2020-11-25 15:36     ` Vivek Goyal
  0 siblings, 1 reply; 19+ messages in thread
From: Amir Goldstein @ 2020-11-25 14:03 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: overlayfs, Miklos Szeredi, Alexander Viro, Giuseppe Scrivano,
	Vivek Goyal, Daniel J Walsh, linux-fsdevel, David Howells

On Wed, Nov 25, 2020 at 12:46 PM Sargun Dhillon <sargun@sargun.me> wrote:
>
> Volatile remounts validate the following at the moment:
>  * Has the module been reloaded / the system rebooted
>  * Has the workdir been remounted
>
> This adds a new check for errors detected via the superblock's
> errseq_t. At mount time, the errseq_t is snapshotted to disk,
> and upon remount it's re-verified. This allows for kernel-level
> detection of errors without forcing userspace to perform a
> sync and allows for the hidden detection of writeback errors.
>

Looks fine as long as you verify that the reuse is also volatile.

Care to also add the alleged issues that Vivek pointed out with existing
volatile mount to the documentation? (unless Vivek intends to do fix those)

> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> ---
>  fs/overlayfs/overlayfs.h | 1 +
>  fs/overlayfs/readdir.c   | 6 ++++++
>  fs/overlayfs/super.c     | 1 +
>  3 files changed, 8 insertions(+)
>
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index de694ee99d7c..e8a711953b64 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -85,6 +85,7 @@ struct ovl_volatile_info {
>          */
>         uuid_t          ovl_boot_id;    /* Must stay first member */
>         u64             s_instance_id;
> +       errseq_t        errseq; /* Implemented as a u32 */
>  } __packed;
>
>  /*
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 4e3e2bc3ea43..2bb0641ecbbd 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -1109,6 +1109,12 @@ static int ovl_verify_volatile_info(struct ovl_fs *ofs,
>                 return -EINVAL;
>         }
>
> +       err = errseq_check(&volatiledir->d_sb->s_wb_err, info.errseq);
> +       if (err) {
> +               pr_debug("Workdir filesystem reports errors: %d\n", err);
> +               return -EINVAL;
> +       }
> +
>         return 1;
>  }
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 9a1b07907662..49dee41ec125 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -1248,6 +1248,7 @@ static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
>         int err;
>         struct ovl_volatile_info info = {
>                 .s_instance_id = volatiledir->d_sb->s_instance_id,
> +               .errseq = errseq_sample(&volatiledir->d_sb->s_wb_err),
>         };
>
>         uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
                     ` (3 preceding siblings ...)
  2020-11-25 13:58   ` Amir Goldstein
@ 2020-11-25 14:43   ` Vivek Goyal
  2020-11-25 15:29     ` Sargun Dhillon
  2020-11-25 18:17   ` Vivek Goyal
  5 siblings, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2020-11-25 14:43 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: linux-unionfs, miklos, Alexander Viro, Amir Goldstein,
	Giuseppe Scrivano, Daniel J Walsh, linux-fsdevel, David Howells

On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:
> Overlayfs added the ability to setup mounts where all syncs could be
> short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile").
> 
> A user might want to remount this fs, but we do not let the user because
> of the "incompat" detection feature. In the case of volatile, it is safe
> to do something like[1]:
> 
> $ sync -f /root/upperdir
> $ rm -rf /root/workdir/incompat/volatile
> 
> There are two ways to go about this. You can call sync on the underlying
> filesystem, check the error code, and delete the dirty file if everything
> is clean. If you're running lots of containers on the same filesystem, or
> you want to avoid all unnecessary I/O, this may be suboptimal.
> 
> Alternatively, you can blindly delete the dirty file, and "hope for the
> best".
> 
> This patch introduces transparent functionality to check if it is
> (relatively) safe to reuse the upperdir. It ensures that the filesystem
> hasn't been remounted,

Hi Sargun,

I am wondering why does it matter if filessytem container upper has been
remoutned? If there were no writeback errors, it should still be safe
to use that upper/? 

> the system hasn't been rebooted, nor has the
> overlayfs code changed.

Same question of overlay module being reloaded. Why reloading overlay
module is unsafe for reusing upperdir for volatile mounts.

I thought there were only two issues with reuse of upperdir.

- System should not have crashed. So boot id needs to be same.
- There are no writeback errors since last unmount. 

What am I missing.

Also, I think any error detection on remount for volatile containers
should go in only once we have capability in overlayfs to detect
writeback errors (for volatile containers) and shutdown filesystem
automatically.

Thanks
Vivek

> Since the structure is explicitly not meant to be
> used between different versions of the code, its stability does not matter
> so much.
> 
> [1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@mail.gmail.com/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c
> 
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Amir Goldstein <amir73il@gmail.com>
> ---
>  Documentation/filesystems/overlayfs.rst |  6 +-
>  fs/overlayfs/overlayfs.h                | 37 ++++++++++-
>  fs/overlayfs/readdir.c                  | 88 +++++++++++++++++++++----
>  fs/overlayfs/super.c                    | 74 ++++++++++++++++-----
>  fs/overlayfs/util.c                     |  2 +
>  5 files changed, 175 insertions(+), 32 deletions(-)
> 
> diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> index 580ab9a0fe31..18237c79fb4e 100644
> --- a/Documentation/filesystems/overlayfs.rst
> +++ b/Documentation/filesystems/overlayfs.rst
> @@ -581,7 +581,11 @@ checks for this directory and refuses to mount if present. This is a strong
>  indicator that user should throw away upper and work directories and create
>  fresh one. In very limited cases where the user knows that the system has
>  not crashed and contents of upperdir are intact, The "volatile" directory
> -can be removed.
> +can be removed.  In some cases, the filesystem can detect if the upperdir
> +can be reused safely in a subsequent volatile mount.  If the filesystem is able
> +to determine this, it will not require the user to manually delete the volatile
> +directory and mounting will proceed as normal.
> +
>  
>  Testsuite
>  ---------
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index f8880aa2ba0e..de694ee99d7c 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -32,8 +32,13 @@ enum ovl_xattr {
>  	OVL_XATTR_NLINK,
>  	OVL_XATTR_UPPER,
>  	OVL_XATTR_METACOPY,
> +	OVL_XATTR_VOLATILE,
>  };
>  
> +#define OVL_INCOMPATDIR_NAME "incompat"
> +#define OVL_VOLATILEDIR_NAME "volatile"
> +#define OVL_VOLATILE_DIRTY_NAME "dirty"
> +
>  enum ovl_inode_flag {
>  	/* Pure upper dir that may contain non pure upper entries */
>  	OVL_IMPURE,
> @@ -57,6 +62,31 @@ enum {
>  	OVL_XINO_ON,
>  };
>  
> +/*
> + * This is copied into the volatile xattr, and the user does not interact with
> + * it. There is no stability requirement, as a reboot explicitly invalidates
> + * a volatile workdir. It is explicitly meant not to be a stable api.
> + *
> + * Although this structure isn't meant to be stable it is exposed to potentially
> + * unprivileged users. We don't do any kind of cryptographic operations with
> + * the structure, so it could be tampered with, or inspected. Don't put
> + * kernel memory pointers in it, or anything else that could cause problems,
> + * or information disclosure.
> + */
> +struct ovl_volatile_info {
> +	/*
> +	 * This uniquely identifies a boot, and is reset if overlayfs itself
> +	 * is reloaded. Therefore we check our current / known boot_id
> +	 * against this before looking at any other fields to validate:
> +	 * 1. Is this datastructure laid out in the way we expect? (Overlayfs
> +	 *    module, reboot, etc...)
> +	 * 2. Could something have changed (like the s_instance_id counter
> +	 *    resetting)
> +	 */
> +	uuid_t		ovl_boot_id;	/* Must stay first member */
> +	u64		s_instance_id;
> +} __packed;
> +
>  /*
>   * The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
>   * where:
> @@ -422,8 +452,8 @@ void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list);
>  void ovl_cache_free(struct list_head *list);
>  void ovl_dir_cache_free(struct inode *inode);
>  int ovl_check_d_type_supported(struct path *realpath);
> -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
> -			struct dentry *dentry, int level);
> +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
> +			struct vfsmount *mnt, struct dentry *dentry, int level);
>  int ovl_indexdir_cleanup(struct ovl_fs *ofs);
>  
>  /* inode.c */
> @@ -520,3 +550,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
>  
>  /* export.c */
>  extern const struct export_operations ovl_export_operations;
> +
> +/* super.c */
> +extern uuid_t ovl_boot_id;
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 01620ebae1bd..4e3e2bc3ea43 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -1080,10 +1080,68 @@ int ovl_check_d_type_supported(struct path *realpath)
>  
>  	return rdd.d_type_supported;
>  }
> +static int ovl_verify_volatile_info(struct ovl_fs *ofs,
> +				    struct dentry *volatiledir)
> +{
> +	int err;
> +	struct ovl_volatile_info info;
> +
> +	err = ovl_do_getxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
> +			      sizeof(info));
> +	if (err < 0) {
> +		pr_debug("Unable to read volatile xattr: %d\n", err);
> +		return -EINVAL;
> +	}
> +
> +	if (err != sizeof(info)) {
> +		pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
> +			 sizeof(info));
> +		return -EINVAL;
> +	}
> +
> +	if (!uuid_equal(&ovl_boot_id, &info.ovl_boot_id)) {
> +		pr_debug("boot id has changed (reboot or module reloaded)\n");
> +		return -EINVAL;
> +	}
> +
> +	if (volatiledir->d_sb->s_instance_id != info.s_instance_id) {
> +		pr_debug("workdir has been unmounted and remounted\n");
> +		return -EINVAL;
> +	}
>  
> -#define OVL_INCOMPATDIR_NAME "incompat"
> +	return 1;
> +}
>  
> -static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> +/*
> + * ovl_check_incompat checks this specific incompat entry for incompatibility.
> + * If it is found to be incompatible -EINVAL will be returned.
> + *
> + * If the directory should be preserved, then this function returns 1.
> + */
> +static int ovl_check_incompat(struct ovl_fs *ofs, struct ovl_cache_entry *p,
> +			      struct path *path)
> +{
> +	int err = -EINVAL;
> +	struct dentry *d;
> +
> +	d = lookup_one_len(p->name, path->dentry, p->len);
> +	if (IS_ERR(d))
> +		return PTR_ERR(d);
> +
> +	if (!strcmp(p->name, OVL_VOLATILEDIR_NAME))
> +		err = ovl_verify_volatile_info(ofs, d);
> +
> +	if (err == -EINVAL)
> +		pr_err("incompat feature '%s' cannot be mounted\n", p->name);
> +	else
> +		pr_debug("incompat '%s' handled: %d\n", p->name, err);
> +
> +	dput(d);
> +	return err;
> +}
> +
> +static int ovl_workdir_cleanup_recurse(struct ovl_fs *ofs, struct path *path,
> +				       int level)
>  {
>  	int err;
>  	struct inode *dir = path->dentry->d_inode;
> @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
>  			if (p->len == 2 && p->name[1] == '.')
>  				continue;
>  		} else if (incompat) {
> -			pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> -				p->name);
> -			err = -EINVAL;
> -			break;
> +			err = ovl_check_incompat(ofs, p, path);
> +			if (err < 0)
> +				break;
> +			/* Skip cleaning this */
> +			if (err == 1)
> +				continue;
>  		}
>  		dentry = lookup_one_len(p->name, path->dentry, p->len);
>  		if (IS_ERR(dentry))
>  			continue;
>  		if (dentry->d_inode)
> -			err = ovl_workdir_cleanup(dir, path->mnt, dentry, level);
> +			err = ovl_workdir_cleanup(ofs, dir, path->mnt, dentry,
> +						  level);
>  		dput(dentry);
>  		if (err)
>  			break;
> @@ -1142,11 +1203,13 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
>  	inode_unlock(dir);
>  out:
>  	ovl_cache_free(&list);
> +	if (incompat && err >= 0)
> +		return 1;
>  	return err;
>  }
>  
> -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
> -			 struct dentry *dentry, int level)
> +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
> +			struct vfsmount *mnt, struct dentry *dentry, int level)
>  {
>  	int err;
>  
> @@ -1159,7 +1222,7 @@ int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
>  		struct path path = { .mnt = mnt, .dentry = dentry };
>  
>  		inode_unlock(dir);
> -		err = ovl_workdir_cleanup_recurse(&path, level + 1);
> +		err = ovl_workdir_cleanup_recurse(ofs, &path, level + 1);
>  		inode_lock_nested(dir, I_MUTEX_PARENT);
>  		if (!err)
>  			err = ovl_cleanup(dir, dentry);
> @@ -1206,9 +1269,10 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
>  		}
>  		/* Cleanup leftover from index create/cleanup attempt */
>  		if (index->d_name.name[0] == '#') {
> -			err = ovl_workdir_cleanup(dir, path.mnt, index, 1);
> -			if (err)
> +			err = ovl_workdir_cleanup(ofs, dir, path.mnt, index, 1);
> +			if (err < 0)
>  				break;
> +			err = 0;
>  			goto next;
>  		}
>  		err = ovl_verify_index(ofs, index);
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 290983bcfbb3..9a1b07907662 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -15,6 +15,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/posix_acl_xattr.h>
>  #include <linux/exportfs.h>
> +#include <linux/uuid.h>
>  #include "overlayfs.h"
>  
>  MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
> @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL");
>  
>  
>  struct ovl_dir_cache;
> +uuid_t ovl_boot_id;
>  
>  #define OVL_MAX_STACK 500
>  
> @@ -722,20 +724,24 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
>  				goto out_unlock;
>  
>  			retried = true;
> -			err = ovl_workdir_cleanup(dir, mnt, work, 0);
> -			dput(work);
> -			if (err == -EINVAL) {
> -				work = ERR_PTR(err);
> -				goto out_unlock;
> +			err = ovl_workdir_cleanup(ofs, dir, mnt, work, 0);
> +			/* check if we should reuse the workdir */
> +			if (err != 1) {
> +				dput(work);
> +				if (err == -EINVAL) {
> +					work = ERR_PTR(err);
> +					goto out_unlock;
> +				}
> +				goto retry;
>  			}
> -			goto retry;
> +		} else {
> +			work = ovl_create_real(dir, work,
> +					       OVL_CATTR(attr.ia_mode));
> +			err = PTR_ERR(work);
> +			if (IS_ERR(work))
> +				goto out_err;
>  		}
>  
> -		work = ovl_create_real(dir, work, OVL_CATTR(attr.ia_mode));
> -		err = PTR_ERR(work);
> -		if (IS_ERR(work))
> -			goto out_err;
> -
>  		/*
>  		 * Try to remove POSIX ACL xattrs from workdir.  We are good if:
>  		 *
> @@ -1237,26 +1243,59 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
>  	return child;
>  }
>  
> +static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
> +{
> +	int err;
> +	struct ovl_volatile_info info = {
> +		.s_instance_id = volatiledir->d_sb->s_instance_id,
> +	};
> +
> +	uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
> +	err = ovl_do_setxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
> +			      sizeof(info));
> +
> +	if (err == -EOPNOTSUPP)
> +		return 0;
> +
> +	return err;
> +}
> +
>  /*
>   * Creates $workdir/work/incompat/volatile/dirty file if it is not already
>   * present.
>   */
>  static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
>  {
> +	int err;
>  	unsigned int ctr;
> -	struct dentry *d = dget(ofs->workbasedir);
> +	struct dentry *volatiledir, *d = dget(ofs->workbasedir);
>  	static const char *const volatile_path[] = {
> -		OVL_WORKDIR_NAME, "incompat", "volatile", "dirty"
> +		OVL_WORKDIR_NAME,
> +		OVL_INCOMPATDIR_NAME,
> +		OVL_VOLATILEDIR_NAME,
> +		OVL_VOLATILE_DIRTY_NAME,
>  	};
>  	const char *const *name = volatile_path;
>  
> -	for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) {
> -		d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
> +	/* Stop before the dirty file is created */
> +	for (ctr = 0; ctr < ARRAY_SIZE(volatile_path) - 1; ctr++, name++) {
> +		d = ovl_lookup_or_create(d, *name, S_IFDIR);
>  		if (IS_ERR(d))
>  			return PTR_ERR(d);
>  	}
> -	dput(d);
> -	return 0;
> +	volatiledir = dget(d);
> +
> +	/* Create the dirty file exists before we set the xattr */
> +	d = ovl_lookup_or_create(d, *name, S_IFREG);
> +	if (!IS_ERR(d)) {
> +		dput(d);
> +		err = ovl_set_volatile_info(ofs, volatiledir);
> +	} else {
> +		err = PTR_ERR(d);
> +	}
> +
> +	dput(volatiledir);
> +	return err;
>  }
>  
>  static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
> @@ -2044,6 +2083,7 @@ static int __init ovl_init(void)
>  {
>  	int err;
>  
> +	uuid_gen(&ovl_boot_id);
>  	ovl_inode_cachep = kmem_cache_create("ovl_inode",
>  					     sizeof(struct ovl_inode), 0,
>  					     (SLAB_RECLAIM_ACCOUNT|
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 23f475627d07..87c9f5a063ed 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -580,6 +580,7 @@ bool ovl_check_dir_xattr(struct super_block *sb, struct dentry *dentry,
>  #define OVL_XATTR_NLINK_POSTFIX		"nlink"
>  #define OVL_XATTR_UPPER_POSTFIX		"upper"
>  #define OVL_XATTR_METACOPY_POSTFIX	"metacopy"
> +#define OVL_XATTR_VOLATILE_POSTFIX	"volatile"
>  
>  #define OVL_XATTR_TAB_ENTRY(x) \
>  	[x] = OVL_XATTR_PREFIX x ## _POSTFIX
> @@ -592,6 +593,7 @@ const char *ovl_xattr_table[] = {
>  	OVL_XATTR_TAB_ENTRY(OVL_XATTR_NLINK),
>  	OVL_XATTR_TAB_ENTRY(OVL_XATTR_UPPER),
>  	OVL_XATTR_TAB_ENTRY(OVL_XATTR_METACOPY),
> +	OVL_XATTR_TAB_ENTRY(OVL_XATTR_VOLATILE),
>  };
>  
>  int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 14:43   ` Vivek Goyal
@ 2020-11-25 15:29     ` Sargun Dhillon
  0 siblings, 0 replies; 19+ messages in thread
From: Sargun Dhillon @ 2020-11-25 15:29 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-unionfs, miklos, Alexander Viro, Amir Goldstein,
	Giuseppe Scrivano, Daniel J Walsh, linux-fsdevel, David Howells

On Wed, Nov 25, 2020 at 09:43:44AM -0500, Vivek Goyal wrote:
> On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:
> > Overlayfs added the ability to setup mounts where all syncs could be
> > short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile").
> > 
> > A user might want to remount this fs, but we do not let the user because
> > of the "incompat" detection feature. In the case of volatile, it is safe
> > to do something like[1]:
> > 
> > $ sync -f /root/upperdir
> > $ rm -rf /root/workdir/incompat/volatile
> > 
> > There are two ways to go about this. You can call sync on the underlying
> > filesystem, check the error code, and delete the dirty file if everything
> > is clean. If you're running lots of containers on the same filesystem, or
> > you want to avoid all unnecessary I/O, this may be suboptimal.
> > 
> > Alternatively, you can blindly delete the dirty file, and "hope for the
> > best".
> > 
> > This patch introduces transparent functionality to check if it is
> > (relatively) safe to reuse the upperdir. It ensures that the filesystem
> > hasn't been remounted,
> 
> Hi Sargun,
> 
> I am wondering why does it matter if filessytem container upper has been
> remoutned? If there were no writeback errors, it should still be safe
> to use that upper/? 
> 

There is no good way to detect if the previous unmount was done cleanly. Each
filesystem seems to have its own API to do this. We can't detect if the drive
was "pulled out" without warning because at remount time the writeback error
count is reset to 0.

> > the system hasn't been rebooted, nor has the
> > overlayfs code changed.
> 
> Same question of overlay module being reloaded. Why reloading overlay
> module is unsafe for reusing upperdir for volatile mounts.
> 

I do not want to make this struct stable. Although we can check if the code (of 
the overlayfs module) is the same, and the rest of the kernel is the same, but
that seems more complicated, and potentially brittle.

The other aspect of the check here is to verify if the sb instance id has been
reset. Although we could use the kernel boot ID to verify that number hasn't
been reset, this seems like it's a much more conservative check that will
ensure safety, and we can always "dial it back".

This approach seems complicated, and reading how modversions / hashes worked
seemed non-trivial.

> I thought there were only two issues with reuse of upperdir.
> 
> - System should not have crashed. So boot id needs to be same.
> - There are no writeback errors since last unmount. 
You need to check that the filesystem was not uncleanly unmounted. That
can occur without a full system crash.

> 
> What am I missing.
> 
> Also, I think any error detection on remount for volatile containers
> should go in only once we have capability in overlayfs to detect
> writeback errors (for volatile containers) and shutdown filesystem
> automatically.
> 
That specific patch is separate. It can be added later.

> Thanks
> Vivek
> 
> > Since the structure is explicitly not meant to be
> > used between different versions of the code, its stability does not matter
> > so much.
> > 
> > [1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@mail.gmail.com/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c
> > 
> > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  Documentation/filesystems/overlayfs.rst |  6 +-
> >  fs/overlayfs/overlayfs.h                | 37 ++++++++++-
> >  fs/overlayfs/readdir.c                  | 88 +++++++++++++++++++++----
> >  fs/overlayfs/super.c                    | 74 ++++++++++++++++-----
> >  fs/overlayfs/util.c                     |  2 +
> >  5 files changed, 175 insertions(+), 32 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> > index 580ab9a0fe31..18237c79fb4e 100644
> > --- a/Documentation/filesystems/overlayfs.rst
> > +++ b/Documentation/filesystems/overlayfs.rst
> > @@ -581,7 +581,11 @@ checks for this directory and refuses to mount if present. This is a strong
> >  indicator that user should throw away upper and work directories and create
> >  fresh one. In very limited cases where the user knows that the system has
> >  not crashed and contents of upperdir are intact, The "volatile" directory
> > -can be removed.
> > +can be removed.  In some cases, the filesystem can detect if the upperdir
> > +can be reused safely in a subsequent volatile mount.  If the filesystem is able
> > +to determine this, it will not require the user to manually delete the volatile
> > +directory and mounting will proceed as normal.
> > +
> >  
> >  Testsuite
> >  ---------
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index f8880aa2ba0e..de694ee99d7c 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -32,8 +32,13 @@ enum ovl_xattr {
> >  	OVL_XATTR_NLINK,
> >  	OVL_XATTR_UPPER,
> >  	OVL_XATTR_METACOPY,
> > +	OVL_XATTR_VOLATILE,
> >  };
> >  
> > +#define OVL_INCOMPATDIR_NAME "incompat"
> > +#define OVL_VOLATILEDIR_NAME "volatile"
> > +#define OVL_VOLATILE_DIRTY_NAME "dirty"
> > +
> >  enum ovl_inode_flag {
> >  	/* Pure upper dir that may contain non pure upper entries */
> >  	OVL_IMPURE,
> > @@ -57,6 +62,31 @@ enum {
> >  	OVL_XINO_ON,
> >  };
> >  
> > +/*
> > + * This is copied into the volatile xattr, and the user does not interact with
> > + * it. There is no stability requirement, as a reboot explicitly invalidates
> > + * a volatile workdir. It is explicitly meant not to be a stable api.
> > + *
> > + * Although this structure isn't meant to be stable it is exposed to potentially
> > + * unprivileged users. We don't do any kind of cryptographic operations with
> > + * the structure, so it could be tampered with, or inspected. Don't put
> > + * kernel memory pointers in it, or anything else that could cause problems,
> > + * or information disclosure.
> > + */
> > +struct ovl_volatile_info {
> > +	/*
> > +	 * This uniquely identifies a boot, and is reset if overlayfs itself
> > +	 * is reloaded. Therefore we check our current / known boot_id
> > +	 * against this before looking at any other fields to validate:
> > +	 * 1. Is this datastructure laid out in the way we expect? (Overlayfs
> > +	 *    module, reboot, etc...)
> > +	 * 2. Could something have changed (like the s_instance_id counter
> > +	 *    resetting)
> > +	 */
> > +	uuid_t		ovl_boot_id;	/* Must stay first member */
> > +	u64		s_instance_id;
> > +} __packed;
> > +
> >  /*
> >   * The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
> >   * where:
> > @@ -422,8 +452,8 @@ void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list);
> >  void ovl_cache_free(struct list_head *list);
> >  void ovl_dir_cache_free(struct inode *inode);
> >  int ovl_check_d_type_supported(struct path *realpath);
> > -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
> > -			struct dentry *dentry, int level);
> > +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
> > +			struct vfsmount *mnt, struct dentry *dentry, int level);
> >  int ovl_indexdir_cleanup(struct ovl_fs *ofs);
> >  
> >  /* inode.c */
> > @@ -520,3 +550,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
> >  
> >  /* export.c */
> >  extern const struct export_operations ovl_export_operations;
> > +
> > +/* super.c */
> > +extern uuid_t ovl_boot_id;
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 01620ebae1bd..4e3e2bc3ea43 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -1080,10 +1080,68 @@ int ovl_check_d_type_supported(struct path *realpath)
> >  
> >  	return rdd.d_type_supported;
> >  }
> > +static int ovl_verify_volatile_info(struct ovl_fs *ofs,
> > +				    struct dentry *volatiledir)
> > +{
> > +	int err;
> > +	struct ovl_volatile_info info;
> > +
> > +	err = ovl_do_getxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
> > +			      sizeof(info));
> > +	if (err < 0) {
> > +		pr_debug("Unable to read volatile xattr: %d\n", err);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (err != sizeof(info)) {
> > +		pr_debug("ovl_volatile_info is of size %d expected %ld\n", err,
> > +			 sizeof(info));
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!uuid_equal(&ovl_boot_id, &info.ovl_boot_id)) {
> > +		pr_debug("boot id has changed (reboot or module reloaded)\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (volatiledir->d_sb->s_instance_id != info.s_instance_id) {
> > +		pr_debug("workdir has been unmounted and remounted\n");
> > +		return -EINVAL;
> > +	}
> >  
> > -#define OVL_INCOMPATDIR_NAME "incompat"
> > +	return 1;
> > +}
> >  
> > -static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> > +/*
> > + * ovl_check_incompat checks this specific incompat entry for incompatibility.
> > + * If it is found to be incompatible -EINVAL will be returned.
> > + *
> > + * If the directory should be preserved, then this function returns 1.
> > + */
> > +static int ovl_check_incompat(struct ovl_fs *ofs, struct ovl_cache_entry *p,
> > +			      struct path *path)
> > +{
> > +	int err = -EINVAL;
> > +	struct dentry *d;
> > +
> > +	d = lookup_one_len(p->name, path->dentry, p->len);
> > +	if (IS_ERR(d))
> > +		return PTR_ERR(d);
> > +
> > +	if (!strcmp(p->name, OVL_VOLATILEDIR_NAME))
> > +		err = ovl_verify_volatile_info(ofs, d);
> > +
> > +	if (err == -EINVAL)
> > +		pr_err("incompat feature '%s' cannot be mounted\n", p->name);
> > +	else
> > +		pr_debug("incompat '%s' handled: %d\n", p->name, err);
> > +
> > +	dput(d);
> > +	return err;
> > +}
> > +
> > +static int ovl_workdir_cleanup_recurse(struct ovl_fs *ofs, struct path *path,
> > +				       int level)
> >  {
> >  	int err;
> >  	struct inode *dir = path->dentry->d_inode;
> > @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> >  			if (p->len == 2 && p->name[1] == '.')
> >  				continue;
> >  		} else if (incompat) {
> > -			pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > -				p->name);
> > -			err = -EINVAL;
> > -			break;
> > +			err = ovl_check_incompat(ofs, p, path);
> > +			if (err < 0)
> > +				break;
> > +			/* Skip cleaning this */
> > +			if (err == 1)
> > +				continue;
> >  		}
> >  		dentry = lookup_one_len(p->name, path->dentry, p->len);
> >  		if (IS_ERR(dentry))
> >  			continue;
> >  		if (dentry->d_inode)
> > -			err = ovl_workdir_cleanup(dir, path->mnt, dentry, level);
> > +			err = ovl_workdir_cleanup(ofs, dir, path->mnt, dentry,
> > +						  level);
> >  		dput(dentry);
> >  		if (err)
> >  			break;
> > @@ -1142,11 +1203,13 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> >  	inode_unlock(dir);
> >  out:
> >  	ovl_cache_free(&list);
> > +	if (incompat && err >= 0)
> > +		return 1;
> >  	return err;
> >  }
> >  
> > -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
> > -			 struct dentry *dentry, int level)
> > +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
> > +			struct vfsmount *mnt, struct dentry *dentry, int level)
> >  {
> >  	int err;
> >  
> > @@ -1159,7 +1222,7 @@ int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
> >  		struct path path = { .mnt = mnt, .dentry = dentry };
> >  
> >  		inode_unlock(dir);
> > -		err = ovl_workdir_cleanup_recurse(&path, level + 1);
> > +		err = ovl_workdir_cleanup_recurse(ofs, &path, level + 1);
> >  		inode_lock_nested(dir, I_MUTEX_PARENT);
> >  		if (!err)
> >  			err = ovl_cleanup(dir, dentry);
> > @@ -1206,9 +1269,10 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
> >  		}
> >  		/* Cleanup leftover from index create/cleanup attempt */
> >  		if (index->d_name.name[0] == '#') {
> > -			err = ovl_workdir_cleanup(dir, path.mnt, index, 1);
> > -			if (err)
> > +			err = ovl_workdir_cleanup(ofs, dir, path.mnt, index, 1);
> > +			if (err < 0)
> >  				break;
> > +			err = 0;
> >  			goto next;
> >  		}
> >  		err = ovl_verify_index(ofs, index);
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 290983bcfbb3..9a1b07907662 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/seq_file.h>
> >  #include <linux/posix_acl_xattr.h>
> >  #include <linux/exportfs.h>
> > +#include <linux/uuid.h>
> >  #include "overlayfs.h"
> >  
> >  MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
> > @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL");
> >  
> >  
> >  struct ovl_dir_cache;
> > +uuid_t ovl_boot_id;
> >  
> >  #define OVL_MAX_STACK 500
> >  
> > @@ -722,20 +724,24 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> >  				goto out_unlock;
> >  
> >  			retried = true;
> > -			err = ovl_workdir_cleanup(dir, mnt, work, 0);
> > -			dput(work);
> > -			if (err == -EINVAL) {
> > -				work = ERR_PTR(err);
> > -				goto out_unlock;
> > +			err = ovl_workdir_cleanup(ofs, dir, mnt, work, 0);
> > +			/* check if we should reuse the workdir */
> > +			if (err != 1) {
> > +				dput(work);
> > +				if (err == -EINVAL) {
> > +					work = ERR_PTR(err);
> > +					goto out_unlock;
> > +				}
> > +				goto retry;
> >  			}
> > -			goto retry;
> > +		} else {
> > +			work = ovl_create_real(dir, work,
> > +					       OVL_CATTR(attr.ia_mode));
> > +			err = PTR_ERR(work);
> > +			if (IS_ERR(work))
> > +				goto out_err;
> >  		}
> >  
> > -		work = ovl_create_real(dir, work, OVL_CATTR(attr.ia_mode));
> > -		err = PTR_ERR(work);
> > -		if (IS_ERR(work))
> > -			goto out_err;
> > -
> >  		/*
> >  		 * Try to remove POSIX ACL xattrs from workdir.  We are good if:
> >  		 *
> > @@ -1237,26 +1243,59 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
> >  	return child;
> >  }
> >  
> > +static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
> > +{
> > +	int err;
> > +	struct ovl_volatile_info info = {
> > +		.s_instance_id = volatiledir->d_sb->s_instance_id,
> > +	};
> > +
> > +	uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
> > +	err = ovl_do_setxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info,
> > +			      sizeof(info));
> > +
> > +	if (err == -EOPNOTSUPP)
> > +		return 0;
> > +
> > +	return err;
> > +}
> > +
> >  /*
> >   * Creates $workdir/work/incompat/volatile/dirty file if it is not already
> >   * present.
> >   */
> >  static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
> >  {
> > +	int err;
> >  	unsigned int ctr;
> > -	struct dentry *d = dget(ofs->workbasedir);
> > +	struct dentry *volatiledir, *d = dget(ofs->workbasedir);
> >  	static const char *const volatile_path[] = {
> > -		OVL_WORKDIR_NAME, "incompat", "volatile", "dirty"
> > +		OVL_WORKDIR_NAME,
> > +		OVL_INCOMPATDIR_NAME,
> > +		OVL_VOLATILEDIR_NAME,
> > +		OVL_VOLATILE_DIRTY_NAME,
> >  	};
> >  	const char *const *name = volatile_path;
> >  
> > -	for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) {
> > -		d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
> > +	/* Stop before the dirty file is created */
> > +	for (ctr = 0; ctr < ARRAY_SIZE(volatile_path) - 1; ctr++, name++) {
> > +		d = ovl_lookup_or_create(d, *name, S_IFDIR);
> >  		if (IS_ERR(d))
> >  			return PTR_ERR(d);
> >  	}
> > -	dput(d);
> > -	return 0;
> > +	volatiledir = dget(d);
> > +
> > +	/* Create the dirty file exists before we set the xattr */
> > +	d = ovl_lookup_or_create(d, *name, S_IFREG);
> > +	if (!IS_ERR(d)) {
> > +		dput(d);
> > +		err = ovl_set_volatile_info(ofs, volatiledir);
> > +	} else {
> > +		err = PTR_ERR(d);
> > +	}
> > +
> > +	dput(volatiledir);
> > +	return err;
> >  }
> >  
> >  static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
> > @@ -2044,6 +2083,7 @@ static int __init ovl_init(void)
> >  {
> >  	int err;
> >  
> > +	uuid_gen(&ovl_boot_id);
> >  	ovl_inode_cachep = kmem_cache_create("ovl_inode",
> >  					     sizeof(struct ovl_inode), 0,
> >  					     (SLAB_RECLAIM_ACCOUNT|
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 23f475627d07..87c9f5a063ed 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -580,6 +580,7 @@ bool ovl_check_dir_xattr(struct super_block *sb, struct dentry *dentry,
> >  #define OVL_XATTR_NLINK_POSTFIX		"nlink"
> >  #define OVL_XATTR_UPPER_POSTFIX		"upper"
> >  #define OVL_XATTR_METACOPY_POSTFIX	"metacopy"
> > +#define OVL_XATTR_VOLATILE_POSTFIX	"volatile"
> >  
> >  #define OVL_XATTR_TAB_ENTRY(x) \
> >  	[x] = OVL_XATTR_PREFIX x ## _POSTFIX
> > @@ -592,6 +593,7 @@ const char *ovl_xattr_table[] = {
> >  	OVL_XATTR_TAB_ENTRY(OVL_XATTR_NLINK),
> >  	OVL_XATTR_TAB_ENTRY(OVL_XATTR_UPPER),
> >  	OVL_XATTR_TAB_ENTRY(OVL_XATTR_METACOPY),
> > +	OVL_XATTR_TAB_ENTRY(OVL_XATTR_VOLATILE),
> >  };
> >  
> >  int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry,
> > -- 
> > 2.25.1
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount
  2020-11-25 14:03   ` Amir Goldstein
@ 2020-11-25 15:36     ` Vivek Goyal
  2020-11-25 15:52       ` Amir Goldstein
  0 siblings, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2020-11-25 15:36 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Sargun Dhillon, overlayfs, Miklos Szeredi, Alexander Viro,
	Giuseppe Scrivano, Daniel J Walsh, linux-fsdevel, David Howells

On Wed, Nov 25, 2020 at 04:03:06PM +0200, Amir Goldstein wrote:
> On Wed, Nov 25, 2020 at 12:46 PM Sargun Dhillon <sargun@sargun.me> wrote:
> >
> > Volatile remounts validate the following at the moment:
> >  * Has the module been reloaded / the system rebooted
> >  * Has the workdir been remounted
> >
> > This adds a new check for errors detected via the superblock's
> > errseq_t. At mount time, the errseq_t is snapshotted to disk,
> > and upon remount it's re-verified. This allows for kernel-level
> > detection of errors without forcing userspace to perform a
> > sync and allows for the hidden detection of writeback errors.
> >
> 
> Looks fine as long as you verify that the reuse is also volatile.
> 
> Care to also add the alleged issues that Vivek pointed out with existing
> volatile mount to the documentation? (unless Vivek intends to do fix those)

I thought current writeback error issue with volatile mounts needs to
be fixed with shutting down filesystem. (And mere documentation is not
enough).

Amir, are you planning to improve your ovl-shutdown patches to detect
writeback errors for volatile mounts. Or you want somebody else to
look at it.

W.r.t this patch set, I still think that first we should have patches
to shutdown filesystem on writeback errors (for volatile mount), and
then detecting writeback errors on remount makes more sense.

Vivek

> 
> > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > ---
> >  fs/overlayfs/overlayfs.h | 1 +
> >  fs/overlayfs/readdir.c   | 6 ++++++
> >  fs/overlayfs/super.c     | 1 +
> >  3 files changed, 8 insertions(+)
> >
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index de694ee99d7c..e8a711953b64 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -85,6 +85,7 @@ struct ovl_volatile_info {
> >          */
> >         uuid_t          ovl_boot_id;    /* Must stay first member */
> >         u64             s_instance_id;
> > +       errseq_t        errseq; /* Implemented as a u32 */
> >  } __packed;
> >
> >  /*
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 4e3e2bc3ea43..2bb0641ecbbd 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -1109,6 +1109,12 @@ static int ovl_verify_volatile_info(struct ovl_fs *ofs,
> >                 return -EINVAL;
> >         }
> >
> > +       err = errseq_check(&volatiledir->d_sb->s_wb_err, info.errseq);
> > +       if (err) {
> > +               pr_debug("Workdir filesystem reports errors: %d\n", err);
> > +               return -EINVAL;
> > +       }
> > +
> >         return 1;
> >  }
> >
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 9a1b07907662..49dee41ec125 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -1248,6 +1248,7 @@ static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
> >         int err;
> >         struct ovl_volatile_info info = {
> >                 .s_instance_id = volatiledir->d_sb->s_instance_id,
> > +               .errseq = errseq_sample(&volatiledir->d_sb->s_wb_err),
> >         };
> >
> >         uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
> > --
> > 2.25.1
> >
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount
  2020-11-25 15:36     ` Vivek Goyal
@ 2020-11-25 15:52       ` Amir Goldstein
  0 siblings, 0 replies; 19+ messages in thread
From: Amir Goldstein @ 2020-11-25 15:52 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Sargun Dhillon, overlayfs, Miklos Szeredi, Alexander Viro,
	Giuseppe Scrivano, Daniel J Walsh, linux-fsdevel, David Howells

On Wed, Nov 25, 2020 at 5:36 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Nov 25, 2020 at 04:03:06PM +0200, Amir Goldstein wrote:
> > On Wed, Nov 25, 2020 at 12:46 PM Sargun Dhillon <sargun@sargun.me> wrote:
> > >
> > > Volatile remounts validate the following at the moment:
> > >  * Has the module been reloaded / the system rebooted
> > >  * Has the workdir been remounted
> > >
> > > This adds a new check for errors detected via the superblock's
> > > errseq_t. At mount time, the errseq_t is snapshotted to disk,
> > > and upon remount it's re-verified. This allows for kernel-level
> > > detection of errors without forcing userspace to perform a
> > > sync and allows for the hidden detection of writeback errors.
> > >
> >
> > Looks fine as long as you verify that the reuse is also volatile.
> >
> > Care to also add the alleged issues that Vivek pointed out with existing
> > volatile mount to the documentation? (unless Vivek intends to do fix those)
>
> I thought current writeback error issue with volatile mounts needs to
> be fixed with shutting down filesystem. (And mere documentation is not
> enough).
>

Documentation is the bare minimum.
If someone implements the shutdown approach that would be best.

> Amir, are you planning to improve your ovl-shutdown patches to detect
> writeback errors for volatile mounts. Or you want somebody else to
> look at it.

I did not intend to work on this.
Whoever wants to pick this up doesn't need to actually implement the
shutdown ioctl, may implement only an "internal shutdown" on error.

>
> W.r.t this patch set, I still think that first we should have patches
> to shutdown filesystem on writeback errors (for volatile mount), and
> then detecting writeback errors on remount makes more sense.
>

I agree that would be very nice, but I can also understand the argument
that volatile mount has an issue, which does not get any better or any
worse as a result of Sargun's patches.

If anything, they improve the situation:
Currently, the user does have a way to know if any data was lost on a
volatile mount.
After a successful mount cycle, the user knows that no data was lost
during the last volatile mount period.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
                     ` (4 preceding siblings ...)
  2020-11-25 14:43   ` Vivek Goyal
@ 2020-11-25 18:17   ` Vivek Goyal
  2020-11-25 18:31     ` Sargun Dhillon
  5 siblings, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2020-11-25 18:17 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: linux-unionfs, miklos, Alexander Viro, Amir Goldstein,
	Giuseppe Scrivano, Daniel J Walsh, linux-fsdevel, David Howells

On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:

[..]
> @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
>  			if (p->len == 2 && p->name[1] == '.')
>  				continue;
>  		} else if (incompat) {
> -			pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> -				p->name);
> -			err = -EINVAL;
> -			break;
> +			err = ovl_check_incompat(ofs, p, path);
> +			if (err < 0)
> +				break;
> +			/* Skip cleaning this */
> +			if (err == 1)
> +				continue;
>  		}

Shouldn't we clean volatile/dirty on non-volatile mount. I did a 
volatile mount followed by a non-volatile remount and I still
see work/incompat/volatile/dirty and "trusted.overlay.volatile" xattr
on "volatile" dir. I would expect that this will be all cleaned up
as soon as that upper/work is used for non-volatile mount.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 18:17   ` Vivek Goyal
@ 2020-11-25 18:31     ` Sargun Dhillon
  2020-11-25 18:43       ` Vivek Goyal
  0 siblings, 1 reply; 19+ messages in thread
From: Sargun Dhillon @ 2020-11-25 18:31 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: overlayfs, Miklos Szeredi, Alexander Viro, Amir Goldstein,
	Giuseppe Scrivano, Daniel J Walsh, Linux FS-devel Mailing List,
	David Howells

On Wed, Nov 25, 2020 at 10:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:
>
> [..]
> > @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> >                       if (p->len == 2 && p->name[1] == '.')
> >                               continue;
> >               } else if (incompat) {
> > -                     pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > -                             p->name);
> > -                     err = -EINVAL;
> > -                     break;
> > +                     err = ovl_check_incompat(ofs, p, path);
> > +                     if (err < 0)
> > +                             break;
> > +                     /* Skip cleaning this */
> > +                     if (err == 1)
> > +                             continue;
> >               }
>
> Shouldn't we clean volatile/dirty on non-volatile mount. I did a
> volatile mount followed by a non-volatile remount and I still
> see work/incompat/volatile/dirty and "trusted.overlay.volatile" xattr
> on "volatile" dir. I would expect that this will be all cleaned up
> as soon as that upper/work is used for non-volatile mount.
>
>

Amir pointed out this is incorrect behaviour earlier.
You should be able to go:
non-volatile -> volatile
volatile -> volatile

But never
volatile -> non-volatile, since our mechanism is not bulletproof.

I will fix this in the next respin.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 18:31     ` Sargun Dhillon
@ 2020-11-25 18:43       ` Vivek Goyal
  2020-11-25 18:47         ` Sargun Dhillon
  0 siblings, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2020-11-25 18:43 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: overlayfs, Miklos Szeredi, Alexander Viro, Amir Goldstein,
	Giuseppe Scrivano, Daniel J Walsh, Linux FS-devel Mailing List,
	David Howells

On Wed, Nov 25, 2020 at 10:31:36AM -0800, Sargun Dhillon wrote:
> On Wed, Nov 25, 2020 at 10:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:
> >
> > [..]
> > > @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> > >                       if (p->len == 2 && p->name[1] == '.')
> > >                               continue;
> > >               } else if (incompat) {
> > > -                     pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > > -                             p->name);
> > > -                     err = -EINVAL;
> > > -                     break;
> > > +                     err = ovl_check_incompat(ofs, p, path);
> > > +                     if (err < 0)
> > > +                             break;
> > > +                     /* Skip cleaning this */
> > > +                     if (err == 1)
> > > +                             continue;
> > >               }
> >
> > Shouldn't we clean volatile/dirty on non-volatile mount. I did a
> > volatile mount followed by a non-volatile remount and I still
> > see work/incompat/volatile/dirty and "trusted.overlay.volatile" xattr
> > on "volatile" dir. I would expect that this will be all cleaned up
> > as soon as that upper/work is used for non-volatile mount.
> >
> >
> 
> Amir pointed out this is incorrect behaviour earlier.
> You should be able to go:
> non-volatile -> volatile
> volatile -> volatile
> 
> But never
> volatile -> non-volatile, since our mechanism is not bulletproof.

Ok, so one needs to manually remove volatile/dirty to be able to
go from volatile to non-volatile.

I am wondering what does this change mean in terms of user visible
behavior. So far, if somebody tried a remount of volatile overlay, it 
will fail. After this change, it will most likely succeed. I am
hoping nobody relies on remount failure of volatile mount and
complain that user visible behavior changed after kernel upgrade.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 18:43       ` Vivek Goyal
@ 2020-11-25 18:47         ` Sargun Dhillon
  2020-11-25 18:52           ` Vivek Goyal
  0 siblings, 1 reply; 19+ messages in thread
From: Sargun Dhillon @ 2020-11-25 18:47 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: overlayfs, Miklos Szeredi, Alexander Viro, Amir Goldstein,
	Giuseppe Scrivano, Daniel J Walsh, Linux FS-devel Mailing List,
	David Howells

On Wed, Nov 25, 2020 at 10:43 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Nov 25, 2020 at 10:31:36AM -0800, Sargun Dhillon wrote:
> > On Wed, Nov 25, 2020 at 10:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:
> > >
> > > [..]
> > > > @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> > > >                       if (p->len == 2 && p->name[1] == '.')
> > > >                               continue;
> > > >               } else if (incompat) {
> > > > -                     pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > > > -                             p->name);
> > > > -                     err = -EINVAL;
> > > > -                     break;
> > > > +                     err = ovl_check_incompat(ofs, p, path);
> > > > +                     if (err < 0)
> > > > +                             break;
> > > > +                     /* Skip cleaning this */
> > > > +                     if (err == 1)
> > > > +                             continue;
> > > >               }
> > >
> > > Shouldn't we clean volatile/dirty on non-volatile mount. I did a
> > > volatile mount followed by a non-volatile remount and I still
> > > see work/incompat/volatile/dirty and "trusted.overlay.volatile" xattr
> > > on "volatile" dir. I would expect that this will be all cleaned up
> > > as soon as that upper/work is used for non-volatile mount.
> > >
> > >
> >
> > Amir pointed out this is incorrect behaviour earlier.
> > You should be able to go:
> > non-volatile -> volatile
> > volatile -> volatile
> >
> > But never
> > volatile -> non-volatile, since our mechanism is not bulletproof.
>
> Ok, so one needs to manually remove volatile/dirty to be able to
> go from volatile to non-volatile.
>
> I am wondering what does this change mean in terms of user visible
> behavior. So far, if somebody tried a remount of volatile overlay, it
> will fail. After this change, it will most likely succeed. I am
> hoping nobody relies on remount failure of volatile mount and
> complain that user visible behavior changed after kernel upgrade.
>
> Thanks
> Vivek
>
If I respin this shortly, can we get it in rc6, or do we want to wait
until 5.11?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 18:47         ` Sargun Dhillon
@ 2020-11-25 18:52           ` Vivek Goyal
  2020-11-25 19:37             ` Amir Goldstein
  0 siblings, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2020-11-25 18:52 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: overlayfs, Miklos Szeredi, Alexander Viro, Amir Goldstein,
	Giuseppe Scrivano, Daniel J Walsh, Linux FS-devel Mailing List,
	David Howells

On Wed, Nov 25, 2020 at 10:47:38AM -0800, Sargun Dhillon wrote:
> On Wed, Nov 25, 2020 at 10:43 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Nov 25, 2020 at 10:31:36AM -0800, Sargun Dhillon wrote:
> > > On Wed, Nov 25, 2020 at 10:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:
> > > >
> > > > [..]
> > > > > @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> > > > >                       if (p->len == 2 && p->name[1] == '.')
> > > > >                               continue;
> > > > >               } else if (incompat) {
> > > > > -                     pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > > > > -                             p->name);
> > > > > -                     err = -EINVAL;
> > > > > -                     break;
> > > > > +                     err = ovl_check_incompat(ofs, p, path);
> > > > > +                     if (err < 0)
> > > > > +                             break;
> > > > > +                     /* Skip cleaning this */
> > > > > +                     if (err == 1)
> > > > > +                             continue;
> > > > >               }
> > > >
> > > > Shouldn't we clean volatile/dirty on non-volatile mount. I did a
> > > > volatile mount followed by a non-volatile remount and I still
> > > > see work/incompat/volatile/dirty and "trusted.overlay.volatile" xattr
> > > > on "volatile" dir. I would expect that this will be all cleaned up
> > > > as soon as that upper/work is used for non-volatile mount.
> > > >
> > > >
> > >
> > > Amir pointed out this is incorrect behaviour earlier.
> > > You should be able to go:
> > > non-volatile -> volatile
> > > volatile -> volatile
> > >
> > > But never
> > > volatile -> non-volatile, since our mechanism is not bulletproof.
> >
> > Ok, so one needs to manually remove volatile/dirty to be able to
> > go from volatile to non-volatile.
> >
> > I am wondering what does this change mean in terms of user visible
> > behavior. So far, if somebody tried a remount of volatile overlay, it
> > will fail. After this change, it will most likely succeed. I am
> > hoping nobody relies on remount failure of volatile mount and
> > complain that user visible behavior changed after kernel upgrade.
> >
> > Thanks
> > Vivek
> >
> If I respin this shortly, can we get it in rc6, or do we want to wait
> until 5.11?

I think that trying to squeeze it in this late in cycle is probably
not a good idea. If above is a valid concern, then this feature probably
needs to be an opt-in.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe
  2020-11-25 18:52           ` Vivek Goyal
@ 2020-11-25 19:37             ` Amir Goldstein
  0 siblings, 0 replies; 19+ messages in thread
From: Amir Goldstein @ 2020-11-25 19:37 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Sargun Dhillon, overlayfs, Miklos Szeredi, Alexander Viro,
	Giuseppe Scrivano, Daniel J Walsh, Linux FS-devel Mailing List,
	David Howells

On Wed, Nov 25, 2020 at 8:52 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Nov 25, 2020 at 10:47:38AM -0800, Sargun Dhillon wrote:
> > On Wed, Nov 25, 2020 at 10:43 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Wed, Nov 25, 2020 at 10:31:36AM -0800, Sargun Dhillon wrote:
> > > > On Wed, Nov 25, 2020 at 10:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > On Wed, Nov 25, 2020 at 02:46:20AM -0800, Sargun Dhillon wrote:
> > > > >
> > > > > [..]
> > > > > > @@ -1125,16 +1183,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> > > > > >                       if (p->len == 2 && p->name[1] == '.')
> > > > > >                               continue;
> > > > > >               } else if (incompat) {
> > > > > > -                     pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > > > > > -                             p->name);
> > > > > > -                     err = -EINVAL;
> > > > > > -                     break;
> > > > > > +                     err = ovl_check_incompat(ofs, p, path);
> > > > > > +                     if (err < 0)
> > > > > > +                             break;
> > > > > > +                     /* Skip cleaning this */
> > > > > > +                     if (err == 1)
> > > > > > +                             continue;
> > > > > >               }
> > > > >
> > > > > Shouldn't we clean volatile/dirty on non-volatile mount. I did a
> > > > > volatile mount followed by a non-volatile remount and I still
> > > > > see work/incompat/volatile/dirty and "trusted.overlay.volatile" xattr
> > > > > on "volatile" dir. I would expect that this will be all cleaned up
> > > > > as soon as that upper/work is used for non-volatile mount.
> > > > >
> > > > >
> > > >
> > > > Amir pointed out this is incorrect behaviour earlier.
> > > > You should be able to go:
> > > > non-volatile -> volatile
> > > > volatile -> volatile
> > > >
> > > > But never
> > > > volatile -> non-volatile, since our mechanism is not bulletproof.
> > >
> > > Ok, so one needs to manually remove volatile/dirty to be able to
> > > go from volatile to non-volatile.
> > >
> > > I am wondering what does this change mean in terms of user visible
> > > behavior. So far, if somebody tried a remount of volatile overlay, it
> > > will fail. After this change, it will most likely succeed. I am
> > > hoping nobody relies on remount failure of volatile mount and
> > > complain that user visible behavior changed after kernel upgrade.
> > >
> > > Thanks
> > > Vivek
> > >
> > If I respin this shortly, can we get it in rc6, or do we want to wait
> > until 5.11?
>
> I think that trying to squeeze it in this late in cycle is probably
> not a good idea. If above is a valid concern, then this feature probably
> needs to be an opt-in.
>

Doesn't sound like a concern to me.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-11-25 19:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-25 10:46 [PATCH v1 0/3] Make overlayfs volatile mounts reusable Sargun Dhillon
2020-11-25 10:46 ` [PATCH v1 1/3] fs: Add s_instance_id field to superblock for unique identification Sargun Dhillon
2020-11-25 10:46 ` [PATCH v1 2/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
2020-11-25 12:49   ` kernel test robot
2020-11-25 13:23   ` kernel test robot
2020-11-25 13:29   ` kernel test robot
2020-11-25 13:58   ` Amir Goldstein
2020-11-25 14:43   ` Vivek Goyal
2020-11-25 15:29     ` Sargun Dhillon
2020-11-25 18:17   ` Vivek Goyal
2020-11-25 18:31     ` Sargun Dhillon
2020-11-25 18:43       ` Vivek Goyal
2020-11-25 18:47         ` Sargun Dhillon
2020-11-25 18:52           ` Vivek Goyal
2020-11-25 19:37             ` Amir Goldstein
2020-11-25 10:46 ` [PATCH v1 3/3] overlay: Add rudimentary checking of writeback errseq on volatile remount Sargun Dhillon
2020-11-25 14:03   ` Amir Goldstein
2020-11-25 15:36     ` Vivek Goyal
2020-11-25 15:52       ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).