From: Sargun Dhillon <sargun@sargun.me>
To: Amir Goldstein <amir73il@gmail.com>
Cc: overlayfs <linux-unionfs@vger.kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Giuseppe Scrivano <gscrivan@redhat.com>,
Vivek Goyal <vgoyal@redhat.com>,
Daniel J Walsh <dwalsh@redhat.com>,
David Howells <dhowells@redhat.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH 3/3] overlay: Add the ability to remount volatile directories when safe
Date: Mon, 16 Nov 2020 10:30:14 +0000 [thread overview]
Message-ID: <20201116103013.GA13259@ircssh-2.c.rugged-nimbus-611.internal> (raw)
In-Reply-To: <CAOQ4uxh9oa5TWNY4byNyeBGUe7wyON2NJ2_rj5GiYD_0wBOXGA@mail.gmail.com>
On Mon, Nov 16, 2020 at 11:31:20AM +0200, Amir Goldstein wrote:
> On Mon, Nov 16, 2020 at 6:58 AM Sargun Dhillon <sargun@sargun.me> wrote:
> >
> > Overlayfs added the ability to setup mounts where all syncs could be
> > short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile").
> >
> > A user might want to remount this fs, but we do not let the user because
> > of the "incompat" detection feature. In the case of volatile, it is safe
> > to do something like[1]:
> >
> > $ sync -f /root/upperdir
> > $ rm -rf /root/workdir/incompat/volatile
> >
> > There are two ways to go about this. You can call sync on the underlying
> > filesystem, check the error code, and delete the dirty file if everything
> > is clean. If you're running lots of containers on the same filesystem, or
> > you want to avoid all unnecessary I/O, this may be suboptimal.
> >
> > Alternatively, you can blindly delete the dirty file, and "hope for the
> > best".
> >
> > This patch introduces transparent functionality to check if it is
> > (relatively) safe to reuse the upperdir. It ensures that the filesystem
> > hasn't been remounted, the system hasn't been rebooted, nor has the
> > overlayfs code changed. It also checks the errseq on the superblock
> > indicating if there have been any writeback errors since the previous
> > mount. Currently, this information is not directly exposed to userspace, so
> > the user cannot make decisions based on this.
>
> This is the main reason IMO that this patch is needed, but it's buried inside
> this paragraph. It wasn't obvious to me at first why userspace solution
> was not possible. Maybe try to give it more focus.
>
>
> > Instead we checkpoint
> > this information to disk, and upon remount we see if any of it has
> > changed. Since the structure is explicitly not meant to be used
> > between different versions of the code, its stability does not
> > matter so much.
> >
> > [1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@mail.gmail.com/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c
> >
> > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > ---
> > Documentation/filesystems/overlayfs.rst | 5 +-
> > fs/overlayfs/overlayfs.h | 34 ++++++++++
> > fs/overlayfs/readdir.c | 86 +++++++++++++++++++++++--
> > fs/overlayfs/super.c | 22 ++++++-
> > 4 files changed, 139 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> > index 580ab9a0fe31..fa3faeeab727 100644
> > --- a/Documentation/filesystems/overlayfs.rst
> > +++ b/Documentation/filesystems/overlayfs.rst
> > @@ -581,7 +581,10 @@ checks for this directory and refuses to mount if present. This is a strong
> > indicator that user should throw away upper and work directories and create
> > fresh one. In very limited cases where the user knows that the system has
> > not crashed and contents of upperdir are intact, The "volatile" directory
> > -can be removed.
> > +can be removed. In certain cases it the filesystem can detect that the
>
> typo: it the filesystem
>
> > +upperdir can be reused safely, and it will not require the user to
> > +manually delete the volatile directory.
> > +
> >
> > Testsuite
> > ---------
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 9eb911f243e1..980d2c930f7a 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -30,6 +30,11 @@ enum ovl_path_type {
> > #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
> > #define OVL_XATTR_UPPER OVL_XATTR_PREFIX "upper"
> > #define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
> > +#define OVL_XATTR_VOLATILE OVL_XATTR_PREFIX "volatile"
> > +
> > +#define OVL_INCOMPATDIR_NAME "incompat"
> > +#define OVL_VOLATILEDIR_NAME "volatile"
> > +#define OVL_VOLATILE_DIRTY_NAME "dirty"
> >
> > enum ovl_inode_flag {
> > /* Pure upper dir that may contain non pure upper entries */
> > @@ -54,6 +59,32 @@ enum {
> > OVL_XINO_ON,
> > };
> >
> > +/*
> > + * This is copied into the volatile xattr, and the user does not interact with
> > + * it. There is no stability requirement, as a reboot explicitly invalidates
> > + * a volatile workdir. It is explicitly meant not to be a stable api.
> > + *
> > + * Although this structure isn't meant to be stable it is exposed to potentially
> > + * unprivileged users. We don't do any kind of cryptographic operations with
> > + * the structure, so it could be tampered with, or inspected. Don't put
> > + * kernel memory pointers in it, or anything else that could cause problems,
> > + * or information disclosure.
> > + */
> > +struct overlay_volatile_info {
>
> ovl_volatile_info please
>
> > + /*
> > + * This uniquely identifies a boot, and is reset if overlayfs itself
> > + * is reloaded. Therefore we check our current / known boot_id
> > + * against this before looking at any other fields to validate:
> > + * 1. Is this datastructure laid out in the way we expect? (Overlayfs
> > + * module, reboot, etc...)
> > + * 2. Could something have changed (like the s_instance_id counter
> > + * resetting)
> > + */
> > + uuid_t overlay_boot_id;
>
> ovl_boot_id
>
> > + u64 s_instance_id;
> > + errseq_t errseq; /* Just a u32 */
> > +} __packed;
> > +
> > /*
> > * The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
> > * where:
> > @@ -501,3 +532,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
> >
> > /* export.c */
> > extern const struct export_operations ovl_export_operations;
> > +
> > +/* super.c */
> > +extern uuid_t overlay_boot_id;
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index f8cc15533afa..ee0d2b88a19c 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -1054,7 +1054,84 @@ int ovl_check_d_type_supported(struct path *realpath)
> > return rdd.d_type_supported;
> > }
> >
> > -#define OVL_INCOMPATDIR_NAME "incompat"
> > +static int ovl_check_incompat_volatile(struct ovl_cache_entry *p,
> > + struct path *path)
> > +{
> > + int err, ret = -EINVAL;
> > + struct overlay_volatile_info info;
> > + struct dentry *d_volatile, *d_dirty;
> > +
> > + d_volatile = lookup_one_len(p->name, path->dentry, p->len);
> > + if (IS_ERR(d_volatile))
> > + return PTR_ERR(d_volatile);
> > +
> > + inode_lock_nested(d_volatile->d_inode, I_MUTEX_PARENT);
>
> You can't do this. I_MUTEX_PARENT level is already taken on parent
> and you also don't need to perform lookup in this helper. I will explain below.
>
> > + d_dirty = lookup_one_len(OVL_VOLATILE_DIRTY_NAME, d_volatile,
> > + strlen(OVL_VOLATILE_DIRTY_NAME));
> > + if (IS_ERR(d_dirty)) {
> > + err = PTR_ERR(d_dirty);
> > + if (err != -ENOENT)
> > + ret = err;
> > + goto out_putvolatile;
> > + }
> > +
> > + if (!d_dirty->d_inode)
> > + goto out_putdirty;
> > +
> > + inode_lock_nested(d_dirty->d_inode, I_MUTEX_XATTR);
>
> What's this lock for?
>
I need to take a lock on this inode to prevent modifications to it, right, or is
getting the xattr safe?
> > + err = ovl_do_getxattr(d_dirty, OVL_XATTR_VOLATILE, &info, sizeof(info));
> > + inode_unlock(d_dirty->d_inode);
> > + if (err != sizeof(info))
> > + goto out_putdirty;
> > +
> > + if (!uuid_equal(&overlay_boot_id, &info.overlay_boot_id)) {
> > + pr_debug("boot id has changed (reboot or module reloaded)\n");
> > + goto out_putdirty;
> > + }
> > +
> > + if (d_dirty->d_inode->i_sb->s_instance_id != info.s_instance_id) {
> > + pr_debug("workdir has been unmounted and remounted\n");
> > + goto out_putdirty;
> > + }
> > +
> > + err = errseq_check(&d_dirty->d_inode->i_sb->s_wb_err, info.errseq);
> > + if (err) {
> > + pr_debug("workdir dir has experienced errors: %d\n", err);
> > + goto out_putdirty;
> > + }
>
> Please put all the above including getxattr in helper ovl_verify_volatile_info()
>
Is it okay if the helper stays in super.c?
> > +
> > + /* Dirty file is okay, delete it. */
> > + ret = ovl_do_unlink(d_volatile->d_inode, d_dirty);
>
> That's a problem. By doing this, you have now approved a regular overlay
> re-mount, but you need only approve a volatile overlay re-mount.
> Need to pass ofs to ovl_workdir_cleanup{,_recurse}.
>
I can add a check to make sure this behaviour is only allowed on remounts back
into volatile. There's technically a race condition here, where if there
is an error between this check, and the mounting being finished, the FS
could be dirty, but that already exists with the impl today.
> > +
> > +out_putdirty:
> > + dput(d_dirty);
> > +out_putvolatile:
> > + inode_unlock(d_volatile->d_inode);
> > + dput(d_volatile);
> > + return ret;
> > +}
> > +
> > +/*
> > + * check_incompat checks this specific incompat entry for incompatibility.
> > + * If it is found to be incompatible -EINVAL will be returned.
> > + *
> > + * Any other -errno indicates an unknown error, and filesystem mounting
> > + * should be aborted.
> > + */
> > +static int ovl_check_incompat(struct ovl_cache_entry *p, struct path *path)
> > +{
> > + int err = -EINVAL;
> > +
> > + if (!strcmp(p->name, OVL_VOLATILEDIR_NAME))
> > + err = ovl_check_incompat_volatile(p, path);
> > +
> > + if (err == -EINVAL)
> > + pr_err("incompat feature '%s' cannot be mounted\n", p->name);
> > + else
> > + pr_debug("incompat '%s' handled: %d\n", p->name, err);
> > +
> > + return err;
> > +}
> >
> > static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> > {
> > @@ -1098,10 +1175,9 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> > if (p->len == 2 && p->name[1] == '.')
> > continue;
> > } else if (incompat) {
> > - pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > - p->name);
> > - err = -EINVAL;
> > - break;
> > + err = ovl_check_incompat(p, path);
> > + if (err)
> > + break;
>
> The call to ovl_check_incompat here is too soon and it makes
> you need to lookup both the volatile dir and dirty file.
> What you need to do and let cleanup recurse into the next level
> while letting it know that we are now traversing the "incompat"
> subtree.
>
Maybe a dumb question but why is it incompat/volatile/dirty, rather than just
incompat/volatile, where volatile is a file? Are there any caveats with putting
the xattr on the directory, or alternatively are there any reasons not to make
the structure incompat/volatile/dirty?
> You can see a more generic implementation I once made here:
> https://github.com/amir73il/linux/blob/ovl-features/fs/overlayfs/readdir.c#L1051
> but it should be simpler with just incompat/volatile.
> Perhaps something like this:
>
> dentry = lookup_one_len(p->name, path->dentry, p->len);
> if (IS_ERR(dentry))
> continue;
> - if (dentry->d_inode)
> + if (dentry->d_inode && d_is_dir(dentry) && incompat)
> + err = ovl_incompatdir_cleanup(dir, path->mnt, dentry);
> + else if (dentry->d_inode)
> err = ovl_workdir_cleanup(dir, path->mnt,
> dentry, level);
> dput(dentry);
>
> Then inside ovl_incompatdir_cleanup() you can call ovl_check_incompat()
> with dentry argument.
>
> Now you have a few options. A simple option would be to put the volatile
> xattr on the volatile dir instead of on the dirty file.
> If you do that, you can call ovl_verify_volatile_info() on the volatile dentry
> without any lookups (only on a volatile re-mount) and if the volatile dir is
> approved for reuse, you don't even need to remove the dirty file, because
> it's just going to be re-created anyway.
>
>
> > }
> > dentry = lookup_one_len(p->name, path->dentry, p->len);
> > if (IS_ERR(dentry))
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 2ee0ba16cc7b..94980898009f 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -15,6 +15,7 @@
> > #include <linux/seq_file.h>
> > #include <linux/posix_acl_xattr.h>
> > #include <linux/exportfs.h>
> > +#include <linux/uuid.h>
> > #include "overlayfs.h"
> >
> > MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
> > @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL");
> >
> >
> > struct ovl_dir_cache;
> > +uuid_t overlay_boot_id;
>
> ovl_boot_id please.
>
> >
> > #define OVL_MAX_STACK 500
> >
> > @@ -1246,20 +1248,35 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
> > */
> > static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
> > {
> > + int err;
> > unsigned int ctr;
> > struct dentry *d = dget(ofs->workbasedir);
> > static const char *const volatile_path[] = {
> > - OVL_WORKDIR_NAME, "incompat", "volatile", "dirty"
> > + OVL_WORKDIR_NAME,
> > + OVL_INCOMPATDIR_NAME,
> > + OVL_VOLATILEDIR_NAME,
> > + OVL_VOLATILE_DIRTY_NAME,
> > };
> > const char *const *name = volatile_path;
> > + struct overlay_volatile_info info = {};
> >
> > for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) {
> > d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
> > if (IS_ERR(d))
> > return PTR_ERR(d);
> > }
> > +
> > + uuid_copy(&info.overlay_boot_id, &overlay_boot_id);
> > + info.s_instance_id = d->d_inode->i_sb->s_instance_id;
> > + info.errseq = errseq_sample(&d->d_inode->i_sb->s_wb_err);
> > +
> > +
> > + err = ovl_do_setxattr(d, OVL_XATTR_VOLATILE, &info, sizeof(info), 0);
> > + if (err == -EOPNOTSUPP)
> > + err = 0;
> > +
>
> Please put all the above in helper ovl_set_volatile_info()
>
> Thanks,
> Amir.
Thank you for the fast review.
next prev parent reply other threads:[~2020-11-16 11:46 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-16 4:57 [RFC PATCH 0/3] Make overlayfs volatile mounts reusable Sargun Dhillon
2020-11-16 4:57 ` [RFC PATCH 1/3] fs: Add s_instance_id field to superblock for unique identification Sargun Dhillon
2020-11-16 5:07 ` Sargun Dhillon
2020-11-16 4:57 ` [RFC PATCH 2/3] overlay: Add ovl_do_getxattr helper Sargun Dhillon
2020-11-16 11:00 ` Amir Goldstein
2020-11-16 4:57 ` [RFC PATCH 3/3] overlay: Add the ability to remount volatile directories when safe Sargun Dhillon
2020-11-16 9:31 ` Amir Goldstein
2020-11-16 10:30 ` Sargun Dhillon [this message]
2020-11-16 11:17 ` Amir Goldstein
2020-11-16 12:52 ` Amir Goldstein
2020-11-16 14:42 ` Vivek Goyal
2020-11-16 14:45 ` Vivek Goyal
2020-11-16 15:20 ` Amir Goldstein
2020-11-16 16:36 ` Vivek Goyal
2020-11-16 18:25 ` Sargun Dhillon
2020-11-16 19:27 ` Vivek Goyal
2020-11-16 20:18 ` Amir Goldstein
2020-11-16 21:09 ` Vivek Goyal
2020-11-17 5:33 ` Amir Goldstein
2020-11-17 14:48 ` Vivek Goyal
2020-11-17 15:24 ` Amir Goldstein
2020-11-17 15:40 ` Vivek Goyal
2020-11-17 16:46 ` Vivek Goyal
2020-11-17 18:03 ` Amir Goldstein
2020-11-17 18:29 ` Vivek Goyal
2020-11-18 7:24 ` Amir Goldstein
2020-11-18 8:27 ` Sargun Dhillon
2020-11-18 10:46 ` Amir Goldstein
2020-11-18 14:55 ` Vivek Goyal
2020-11-16 21:26 ` Vivek Goyal
2020-11-16 22:14 ` Sargun Dhillon
2020-11-17 5:41 ` Amir Goldstein
2020-11-17 17:05 ` Vivek Goyal
2020-11-16 17:38 ` Sargun Dhillon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201116103013.GA13259@ircssh-2.c.rugged-nimbus-611.internal \
--to=sargun@sargun.me \
--cc=amir73il@gmail.com \
--cc=dhowells@redhat.com \
--cc=dwalsh@redhat.com \
--cc=gscrivan@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=vgoyal@redhat.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).