From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63883C6379D for ; Mon, 16 Nov 2020 04:58:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F2643222B9 for ; Mon, 16 Nov 2020 04:58:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=sargun.me header.i=@sargun.me header.b="kGyCSbvj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726315AbgKPE6L (ORCPT ); Sun, 15 Nov 2020 23:58:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41088 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726509AbgKPE6L (ORCPT ); Sun, 15 Nov 2020 23:58:11 -0500 Received: from mail-pf1-x441.google.com (mail-pf1-x441.google.com [IPv6:2607:f8b0:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7802EC0613D1 for ; Sun, 15 Nov 2020 20:58:11 -0800 (PST) Received: by mail-pf1-x441.google.com with SMTP id c20so12999073pfr.8 for ; Sun, 15 Nov 2020 20:58:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=rFPavjyYFF24wOBlhzJm+F5LYxdtAeaLoYvsfLmUX2E=; b=kGyCSbvjdRcRBXVx7A/oypGiAoqaYMsXgwPJ6opYca1ccw84t4uM0Hp3Q8jpd0cL0O Ybs1dTiy2re0CRV2uPWnxHlMiCoDr2Cnfox2AMnxum682h/WYRm9gHURC0ZvNumL/MoR qw4jsqcBLk3F9np4Q1s+szWqRJzGBf8b2nDWA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=rFPavjyYFF24wOBlhzJm+F5LYxdtAeaLoYvsfLmUX2E=; b=RoSH5L6AkjlFOQpi3/IOOhf66sbbEeu6AXj0O9chxojI8cfmTUStNTlVX9MPIRLERp sgyF4yHnQTWdKEfsEPRMGiOGmttMMTeY6pxE6OLcA2HrRVaY71ZaPOCrQGV03r4xe6th XCxfLpK0ll2u92S9M8+JKLA/pfJIYgxguG0yfFuG/IRlP1YdDvzOKzqd7e3pjJNg+nTP 5ISYOzdMB4A6p6T39AAGyfIWlxAMkzTFWQm5H2Uu3+888OaEq77q49d1Ilh2m4sTvYSh PMiWhm178MfbgLf9JAbZ8OHuZ09iusjuoUPWJvyAiTPT9lIftS77pY4MqQneupnNclIP qDyw== X-Gm-Message-State: AOAM533r1U+TVjuVpGA6ic3JCmAf1IMu9dHCD8lfu2VUBFfi5YENaH5I 3/Mo3XKC8izMqb0MHrwEmkkBPSc5mE8cc1mv X-Google-Smtp-Source: ABdhPJyibKLjrcNc5l6uvKIbKvrPv0ZbGCJbMH3ZYguHzHZXQUK+vJrU8jY8xNUi88bNKzrEU4TKmA== X-Received: by 2002:a17:90a:4d47:: with SMTP id l7mr13992249pjh.121.1605502690504; Sun, 15 Nov 2020 20:58:10 -0800 (PST) Received: from ubuntu.netflix.com (203.20.25.136.in-addr.arpa. [136.25.20.203]) by smtp.gmail.com with ESMTPSA id v23sm16465284pjh.46.2020.11.15.20.58.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 15 Nov 2020 20:58:09 -0800 (PST) From: Sargun Dhillon To: linux-unionfs@vger.kernel.org, miklos@szeredi.hu, Alexander Viro Cc: Sargun Dhillon , Giuseppe Scrivano , Vivek Goyal , Daniel J Walsh , David Howells , linux-fsdevel@vger.kernel.org, Amir Goldstein Subject: [RFC PATCH 3/3] overlay: Add the ability to remount volatile directories when safe Date: Sun, 15 Nov 2020 20:57:58 -0800 Message-Id: <20201116045758.21774-4-sargun@sargun.me> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201116045758.21774-1-sargun@sargun.me> References: <20201116045758.21774-1-sargun@sargun.me> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-unionfs@vger.kernel.org Overlayfs added the ability to setup mounts where all syncs could be short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile"). A user might want to remount this fs, but we do not let the user because of the "incompat" detection feature. In the case of volatile, it is safe to do something like[1]: $ sync -f /root/upperdir $ rm -rf /root/workdir/incompat/volatile There are two ways to go about this. You can call sync on the underlying filesystem, check the error code, and delete the dirty file if everything is clean. If you're running lots of containers on the same filesystem, or you want to avoid all unnecessary I/O, this may be suboptimal. Alternatively, you can blindly delete the dirty file, and "hope for the best". This patch introduces transparent functionality to check if it is (relatively) safe to reuse the upperdir. It ensures that the filesystem hasn't been remounted, the system hasn't been rebooted, nor has the overlayfs code changed. It also checks the errseq on the superblock indicating if there have been any writeback errors since the previous mount. Currently, this information is not directly exposed to userspace, so the user cannot make decisions based on this. Instead we checkpoint this information to disk, and upon remount we see if any of it has changed. Since the structure is explicitly not meant to be used between different versions of the code, its stability does not matter so much. [1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@mail.gmail.com/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c Signed-off-by: Sargun Dhillon Cc: linux-fsdevel@vger.kernel.org Cc: linux-unionfs@vger.kernel.org Cc: Miklos Szeredi Cc: Amir Goldstein --- Documentation/filesystems/overlayfs.rst | 5 +- fs/overlayfs/overlayfs.h | 34 ++++++++++ fs/overlayfs/readdir.c | 86 +++++++++++++++++++++++-- fs/overlayfs/super.c | 22 ++++++- 4 files changed, 139 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index 580ab9a0fe31..fa3faeeab727 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -581,7 +581,10 @@ checks for this directory and refuses to mount if present. This is a strong indicator that user should throw away upper and work directories and create fresh one. In very limited cases where the user knows that the system has not crashed and contents of upperdir are intact, The "volatile" directory -can be removed. +can be removed. In certain cases it the filesystem can detect that the +upperdir can be reused safely, and it will not require the user to +manually delete the volatile directory. + Testsuite --------- diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 9eb911f243e1..980d2c930f7a 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -30,6 +30,11 @@ enum ovl_path_type { #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink" #define OVL_XATTR_UPPER OVL_XATTR_PREFIX "upper" #define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy" +#define OVL_XATTR_VOLATILE OVL_XATTR_PREFIX "volatile" + +#define OVL_INCOMPATDIR_NAME "incompat" +#define OVL_VOLATILEDIR_NAME "volatile" +#define OVL_VOLATILE_DIRTY_NAME "dirty" enum ovl_inode_flag { /* Pure upper dir that may contain non pure upper entries */ @@ -54,6 +59,32 @@ enum { OVL_XINO_ON, }; +/* + * This is copied into the volatile xattr, and the user does not interact with + * it. There is no stability requirement, as a reboot explicitly invalidates + * a volatile workdir. It is explicitly meant not to be a stable api. + * + * Although this structure isn't meant to be stable it is exposed to potentially + * unprivileged users. We don't do any kind of cryptographic operations with + * the structure, so it could be tampered with, or inspected. Don't put + * kernel memory pointers in it, or anything else that could cause problems, + * or information disclosure. + */ +struct overlay_volatile_info { + /* + * This uniquely identifies a boot, and is reset if overlayfs itself + * is reloaded. Therefore we check our current / known boot_id + * against this before looking at any other fields to validate: + * 1. Is this datastructure laid out in the way we expect? (Overlayfs + * module, reboot, etc...) + * 2. Could something have changed (like the s_instance_id counter + * resetting) + */ + uuid_t overlay_boot_id; + u64 s_instance_id; + errseq_t errseq; /* Just a u32 */ +} __packed; + /* * The tuple (fh,uuid) is a universal unique identifier for a copy up origin, * where: @@ -501,3 +532,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower, /* export.c */ extern const struct export_operations ovl_export_operations; + +/* super.c */ +extern uuid_t overlay_boot_id; diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index f8cc15533afa..ee0d2b88a19c 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -1054,7 +1054,84 @@ int ovl_check_d_type_supported(struct path *realpath) return rdd.d_type_supported; } -#define OVL_INCOMPATDIR_NAME "incompat" +static int ovl_check_incompat_volatile(struct ovl_cache_entry *p, + struct path *path) +{ + int err, ret = -EINVAL; + struct overlay_volatile_info info; + struct dentry *d_volatile, *d_dirty; + + d_volatile = lookup_one_len(p->name, path->dentry, p->len); + if (IS_ERR(d_volatile)) + return PTR_ERR(d_volatile); + + inode_lock_nested(d_volatile->d_inode, I_MUTEX_PARENT); + d_dirty = lookup_one_len(OVL_VOLATILE_DIRTY_NAME, d_volatile, + strlen(OVL_VOLATILE_DIRTY_NAME)); + if (IS_ERR(d_dirty)) { + err = PTR_ERR(d_dirty); + if (err != -ENOENT) + ret = err; + goto out_putvolatile; + } + + if (!d_dirty->d_inode) + goto out_putdirty; + + inode_lock_nested(d_dirty->d_inode, I_MUTEX_XATTR); + err = ovl_do_getxattr(d_dirty, OVL_XATTR_VOLATILE, &info, sizeof(info)); + inode_unlock(d_dirty->d_inode); + if (err != sizeof(info)) + goto out_putdirty; + + if (!uuid_equal(&overlay_boot_id, &info.overlay_boot_id)) { + pr_debug("boot id has changed (reboot or module reloaded)\n"); + goto out_putdirty; + } + + if (d_dirty->d_inode->i_sb->s_instance_id != info.s_instance_id) { + pr_debug("workdir has been unmounted and remounted\n"); + goto out_putdirty; + } + + err = errseq_check(&d_dirty->d_inode->i_sb->s_wb_err, info.errseq); + if (err) { + pr_debug("workdir dir has experienced errors: %d\n", err); + goto out_putdirty; + } + + /* Dirty file is okay, delete it. */ + ret = ovl_do_unlink(d_volatile->d_inode, d_dirty); + +out_putdirty: + dput(d_dirty); +out_putvolatile: + inode_unlock(d_volatile->d_inode); + dput(d_volatile); + return ret; +} + +/* + * check_incompat checks this specific incompat entry for incompatibility. + * If it is found to be incompatible -EINVAL will be returned. + * + * Any other -errno indicates an unknown error, and filesystem mounting + * should be aborted. + */ +static int ovl_check_incompat(struct ovl_cache_entry *p, struct path *path) +{ + int err = -EINVAL; + + if (!strcmp(p->name, OVL_VOLATILEDIR_NAME)) + err = ovl_check_incompat_volatile(p, path); + + if (err == -EINVAL) + pr_err("incompat feature '%s' cannot be mounted\n", p->name); + else + pr_debug("incompat '%s' handled: %d\n", p->name, err); + + return err; +} static int ovl_workdir_cleanup_recurse(struct path *path, int level) { @@ -1098,10 +1175,9 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level) if (p->len == 2 && p->name[1] == '.') continue; } else if (incompat) { - pr_err("overlay with incompat feature '%s' cannot be mounted\n", - p->name); - err = -EINVAL; - break; + err = ovl_check_incompat(p, path); + if (err) + break; } dentry = lookup_one_len(p->name, path->dentry, p->len); if (IS_ERR(dentry)) diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 2ee0ba16cc7b..94980898009f 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "overlayfs.h" MODULE_AUTHOR("Miklos Szeredi "); @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL"); struct ovl_dir_cache; +uuid_t overlay_boot_id; #define OVL_MAX_STACK 500 @@ -1246,20 +1248,35 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent, */ static int ovl_create_volatile_dirty(struct ovl_fs *ofs) { + int err; unsigned int ctr; struct dentry *d = dget(ofs->workbasedir); static const char *const volatile_path[] = { - OVL_WORKDIR_NAME, "incompat", "volatile", "dirty" + OVL_WORKDIR_NAME, + OVL_INCOMPATDIR_NAME, + OVL_VOLATILEDIR_NAME, + OVL_VOLATILE_DIRTY_NAME, }; const char *const *name = volatile_path; + struct overlay_volatile_info info = {}; for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) { d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG); if (IS_ERR(d)) return PTR_ERR(d); } + + uuid_copy(&info.overlay_boot_id, &overlay_boot_id); + info.s_instance_id = d->d_inode->i_sb->s_instance_id; + info.errseq = errseq_sample(&d->d_inode->i_sb->s_wb_err); + + + err = ovl_do_setxattr(d, OVL_XATTR_VOLATILE, &info, sizeof(info), 0); + if (err == -EOPNOTSUPP) + err = 0; + dput(d); - return 0; + return err; } static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs, @@ -2045,6 +2062,7 @@ static int __init ovl_init(void) { int err; + uuid_gen(&overlay_boot_id); ovl_inode_cachep = kmem_cache_create("ovl_inode", sizeof(struct ovl_inode), 0, (SLAB_RECLAIM_ACCOUNT| -- 2.25.1