From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2612C433E0 for ; Tue, 21 Jul 2020 13:16:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 937D520717 for ; Tue, 21 Jul 2020 13:16:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="key not found in DNS" (0-bit key) header.d=szeredi.hu header.i=@szeredi.hu header.b="bFyaybPl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726919AbgGUNQJ (ORCPT ); Tue, 21 Jul 2020 09:16:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42938 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726769AbgGUNQI (ORCPT ); Tue, 21 Jul 2020 09:16:08 -0400 Received: from mail-ej1-x642.google.com (mail-ej1-x642.google.com [IPv6:2a00:1450:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 56BB0C061794 for ; Tue, 21 Jul 2020 06:16:08 -0700 (PDT) Received: by mail-ej1-x642.google.com with SMTP id rk21so21621404ejb.2 for ; Tue, 21 Jul 2020 06:16:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Dn2TsBkW6f5+a7a25abjEcG5m7fkrlRN2Nj2Y1lEVuY=; b=bFyaybPl8jURkfMi5lZRvwEB4kKTxtxc6ZfBl1Kai4KaJ9wUC+N9OeIMmTshi2DdJV gwDjvvXto/LfhACvbOjWcrAmYNL/4rtnpEG5kUwQF2YTnUp769ZgkxU7Aqt1NxU1gVlt Cez3qSKZbwrcwBVzHwOMxTVKHB2E+axLBLJVM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Dn2TsBkW6f5+a7a25abjEcG5m7fkrlRN2Nj2Y1lEVuY=; b=TuyEdTe8hVzPuVgNOJCGjgRW2+Tj4z3zikNrQMInnYLSM+JtZNc4TWREU7ngIEj5Fp ZjtDrcMW/OQgJkPU4rs1GNPb7dzR5nMvMZNj/9R7RjiAT5uJFsyxeMnUSWRPjmKf5A6Y /fBkWLVl1F6MuNPwpE8LuUQRpiNvIvwkXKKNQrWerMLhCZKu6Q40t/MfUDc3C1jkYue5 dh6/RzXPTzq7EFzeTMMqrHTHidc2mmI79RuixQgXHPg6NBcIpoNCq7LTccIh4Azz9TCu ujVL2bUxahXmfXgRVfwTYkWO8tR2TnHbhHLBGHNA1AQq6Wa+5boA2D2f+NndVni0+5al OB0g== X-Gm-Message-State: AOAM530u9xItMu7xRtq9MB4WpqqefqRb1iowD1fgYhh/29id/aj0HnbE sw1pSKmIGF43RKMKvV8Q/jjjUizz2lCXIy+WFwizeg== X-Google-Smtp-Source: ABdhPJwhrieuuUKXVF6WVrp6Ktb9dJG8uOYrV7LqKpCpgLJNrB0/D8roi8YNR+F+gEGdE8p3I3/zVsOAMrwSi4inxa8= X-Received: by 2002:a17:906:1c05:: with SMTP id k5mr24643455ejg.320.1595337366780; Tue, 21 Jul 2020 06:16:06 -0700 (PDT) MIME-Version: 1.0 References: <20200706161227.GB3107@redhat.com> <20200720161618.GD502563@redhat.com> In-Reply-To: <20200720161618.GD502563@redhat.com> From: Miklos Szeredi Date: Tue, 21 Jul 2020 15:15:55 +0200 Message-ID: Subject: Re: [PATCH v4] overlayfs: Provide mount options sync=off/fs to skip sync To: Vivek Goyal Cc: Amir Goldstein , overlayfs , Giuseppe Scrivano , Daniel J Walsh , Steven Whitehouse , pmatilai@redhat.com, sandeen@redhat.com Content-Type: text/plain; charset="UTF-8" Sender: linux-unionfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-unionfs@vger.kernel.org On Mon, Jul 20, 2020 at 6:16 PM Vivek Goyal wrote: > For building images containers folks need to sync upper layer. Their > current plan is to use "syncfs upper/" because it is same as if overlay > was mounted with sync=fs. But this syncs whole upper filesystem and > not just upper of a particular overlayfs instance > > So idea was to provide sync=fs from the beginning and ask container > folks to use this. So that in future if we can optimize sync=fs to > sync selctive inodes, then container runtime will automatically > benefit from it without any changes. It also reduces the chances > of error on container runtime which fail to sync upper. Hence idea > of sync=fs sounded appleaing to me. Not sure I understand the reason for sync=fs? Should it rather be sync=shutdown? > > Havid said that, I am open to dropping sync=fs for now, if you don't > see the value at this point of time. At this point it doesn't add any usefulness, so let's just drop it. > > > > Naming: I'm not at all convinced by any name having "sync" in it. I > > think "sync=no" is about the implementation, not the functionality, > > and so it's confusing. The functionality is better described by > > "volatile" or "temporary". But I can live with sync=... if voted > > down. > > I am fine with the name "volatile/temporary" for sync=off. How about needing "volatile" for all kinds of modes that reduce the normal durability/integrity guarantees. Then additional "sync=foobar" option to control the details? Thanks, Miklos > > Amir, WDYT? > > Vivek > > > > > > > > > > > > > > > > > Giuseppe is also looking to cut down on number of iops done on the > > > disk. He is complaining that often in cloud their VMs are throttled > > > if they cross the limit. This option can help them where they reduce > > > number of iops (by cutting down on frequent sync and writebacks). > > > > > > Changes from v3: > > > - Used only enums and dropped bit flags (Amir Goldstein) > > > - Dropped error when conflicting sync options provided. (Amir Goldstein) > > > > > > Changes from v2: > > > - Added helper functions (Amir Goldstein) > > > - Used enums to keep sync state (Amir Goldstein) > > > > > > Signed-off-by: Giuseppe Scrivano > > > Signed-off-by: Miklos Szeredi > > > Signed-off-by: Vivek Goyal > > > --- > > > Documentation/filesystems/overlayfs.rst | 16 +++++++++++ > > > fs/overlayfs/copy_up.c | 12 ++++++--- > > > fs/overlayfs/file.c | 10 ++++++- > > > fs/overlayfs/ovl_entry.h | 17 ++++++++++++ > > > fs/overlayfs/readdir.c | 3 +++ > > > fs/overlayfs/super.c | 35 ++++++++++++++++++++++--- > > > 6 files changed, 85 insertions(+), 8 deletions(-) > > > > > > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst > > > index 660dbaf0b9b8..4e55ac4433ec 100644 > > > --- a/Documentation/filesystems/overlayfs.rst > > > +++ b/Documentation/filesystems/overlayfs.rst > > > @@ -563,6 +563,22 @@ This verification may cause significant overhead in some cases. > > > Note: the mount options index=off,nfs_export=on are conflicting and will > > > result in an error. > > > > > > +Disable sync > > > +------------ > > > +By default, overlay skips sync on files residing on a lower layer. It > > > +is possible to skip sync operations for files on the upper layer as well > > > +with the "sync=off" and "sync=fs" mount option. > > > + > > > +"sync=off" option disables all forms of sync from overlay, including the > > > +one done at umount/remount. If system crashes or shuts down, user > > > +should throw away upper directory and start fresh. > > > + > > > +"sync=fs" option disables all forms of sync except full filesystem > > > +sync which is done at syncfs/remount/mount time. This is useful for > > > +use cases like container image build which want upper to persist > > > +only if operation has finished. If system crashes before image > > > +layer formation is complete, tools should discard upper and start > > > +fresh. > > > > > > Testsuite > > > --------- > > > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c > > > index 79dd052c7dbf..3a5ae9c2f86e 100644 > > > --- a/fs/overlayfs/copy_up.c > > > +++ b/fs/overlayfs/copy_up.c > > > @@ -128,7 +128,8 @@ int ovl_copy_xattr(struct dentry *old, struct dentry *new) > > > return error; > > > } > > > > > > -static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len) > > > +static int ovl_copy_up_data(struct ovl_fs *ofs, struct path *old, > > > + struct path *new, loff_t len) > > > { > > > struct file *old_file; > > > struct file *new_file; > > > @@ -218,7 +219,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len) > > > len -= bytes; > > > } > > > out: > > > - if (!error) > > > + if (!error && ovl_should_fsync(ofs)) > > > error = vfs_fsync(new_file, 0); > > > fput(new_file); > > > out_fput: > > > @@ -484,6 +485,7 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c) > > > > > > static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp) > > > { > > > + struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb); > > > int err; > > > > > > /* > > > @@ -499,7 +501,8 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp) > > > upperpath.dentry = temp; > > > > > > ovl_path_lowerdata(c->dentry, &datapath); > > > - err = ovl_copy_up_data(&datapath, &upperpath, c->stat.size); > > > + err = ovl_copy_up_data(ofs, &datapath, &upperpath, > > > + c->stat.size); > > > if (err) > > > return err; > > > } > > > @@ -784,6 +787,7 @@ static bool ovl_need_meta_copy_up(struct dentry *dentry, umode_t mode, > > > /* Copy up data of an inode which was copied up metadata only in the past. */ > > > static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c) > > > { > > > + struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb); > > > struct path upperpath, datapath; > > > int err; > > > char *capability = NULL; > > > @@ -804,7 +808,7 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c) > > > goto out; > > > } > > > > > > - err = ovl_copy_up_data(&datapath, &upperpath, c->stat.size); > > > + err = ovl_copy_up_data(ofs, &datapath, &upperpath, c->stat.size); > > > if (err) > > > goto out_free; > > > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > > index 01820e654a21..c92af3856dbf 100644 > > > --- a/fs/overlayfs/file.c > > > +++ b/fs/overlayfs/file.c > > > @@ -329,6 +329,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) > > > struct fd real; > > > const struct cred *old_cred; > > > ssize_t ret; > > > + int ifl = iocb->ki_flags; > > > > > > if (!iov_iter_count(iter)) > > > return 0; > > > @@ -344,11 +345,14 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) > > > if (ret) > > > goto out_unlock; > > > > > > + if (!ovl_should_fsync(OVL_FS(inode->i_sb))) > > > + ifl &= ~(IOCB_DSYNC | IOCB_SYNC); > > > + > > > old_cred = ovl_override_creds(file_inode(file)->i_sb); > > > if (is_sync_kiocb(iocb)) { > > > file_start_write(real.file); > > > ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, > > > - ovl_iocb_to_rwf(iocb->ki_flags)); > > > + ovl_iocb_to_rwf(ifl)); > > > file_end_write(real.file); > > > /* Update size */ > > > ovl_copyattr(ovl_inode_real(inode), inode); > > > @@ -368,6 +372,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) > > > real.flags = 0; > > > aio_req->orig_iocb = iocb; > > > kiocb_clone(&aio_req->iocb, iocb, real.file); > > > + aio_req->iocb.ki_flags = ifl; > > > aio_req->iocb.ki_complete = ovl_aio_rw_complete; > > > ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); > > > if (ret != -EIOCBQUEUED) > > > @@ -431,6 +436,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync) > > > const struct cred *old_cred; > > > int ret; > > > > > > + if (!ovl_should_fsync(OVL_FS(file_inode(file)->i_sb))) > > > + return 0; > > > + > > > ret = ovl_real_fdget_meta(file, &real, !datasync); > > > if (ret) > > > return ret; > > > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h > > > index b429c80879ee..e6d21eff5620 100644 > > > --- a/fs/overlayfs/ovl_entry.h > > > +++ b/fs/overlayfs/ovl_entry.h > > > @@ -5,6 +5,12 @@ > > > * Copyright (C) 2016 Red Hat, Inc. > > > */ > > > > > > +enum ovl_sync_type { > > > + OVL_SYNC_ON, > > > + OVL_SYNC_OFF, > > > + OVL_SYNC_FS, > > > +}; > > > + > > > struct ovl_config { > > > char *lowerdir; > > > char *upperdir; > > > @@ -17,6 +23,7 @@ struct ovl_config { > > > bool nfs_export; > > > int xino; > > > bool metacopy; > > > + enum ovl_sync_type sync; > > > }; > > > > > > struct ovl_sb { > > > @@ -90,6 +97,16 @@ static inline struct ovl_fs *OVL_FS(struct super_block *sb) > > > return (struct ovl_fs *)sb->s_fs_info; > > > } > > > > > > +static inline bool ovl_should_fsync(struct ovl_fs *ofs) > > > +{ > > > + return ofs->config.sync == OVL_SYNC_ON; > > > +} > > > + > > > +static inline bool ovl_should_syncfs(struct ovl_fs *ofs) > > > +{ > > > + return ofs->config.sync != OVL_SYNC_OFF; > > > +} > > > + > > > /* private information held for every overlayfs dentry */ > > > struct ovl_entry { > > > union { > > > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c > > > index 6918b98faeb6..80f772faad5c 100644 > > > --- a/fs/overlayfs/readdir.c > > > +++ b/fs/overlayfs/readdir.c > > > @@ -863,6 +863,9 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end, > > > if (!OVL_TYPE_UPPER(ovl_path_type(dentry))) > > > return 0; > > > > > > + if (!ovl_should_fsync(OVL_FS(dentry->d_sb))) > > > + return 0; > > > + > > > /* > > > * Need to check if we started out being a lower dir, but got copied up > > > */ > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > > > index 91476bc422f9..04f6108fdc69 100644 > > > --- a/fs/overlayfs/super.c > > > +++ b/fs/overlayfs/super.c > > > @@ -264,6 +264,8 @@ static int ovl_sync_fs(struct super_block *sb, int wait) > > > if (!ovl_upper_mnt(ofs)) > > > return 0; > > > > > > + if (!ovl_should_syncfs(ofs)) > > > + return 0; > > > /* > > > * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC). > > > * All the super blocks will be iterated, including upper_sb. > > > @@ -327,6 +329,12 @@ static const char * const ovl_xino_str[] = { > > > "on", > > > }; > > > > > > +static const char * const ovl_sync_str[] = { > > > + "on", > > > + "off", > > > + "fs", > > > +}; > > > + > > > static inline int ovl_xino_def(void) > > > { > > > return ovl_xino_auto_def ? OVL_XINO_AUTO : OVL_XINO_OFF; > > > @@ -362,6 +370,8 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry) > > > if (ofs->config.metacopy != ovl_metacopy_def) > > > seq_printf(m, ",metacopy=%s", > > > ofs->config.metacopy ? "on" : "off"); > > > + if (ofs->config.sync != OVL_SYNC_ON) > > > + seq_printf(m, ",sync=%s", ovl_sync_str[ofs->config.sync]); > > > return 0; > > > } > > > > > > @@ -376,9 +386,11 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data) > > > > > > if (*flags & SB_RDONLY && !sb_rdonly(sb)) { > > > upper_sb = ovl_upper_mnt(ofs)->mnt_sb; > > > - down_read(&upper_sb->s_umount); > > > - ret = sync_filesystem(upper_sb); > > > - up_read(&upper_sb->s_umount); > > > + if (ovl_should_syncfs(ofs)) { > > > + down_read(&upper_sb->s_umount); > > > + ret = sync_filesystem(upper_sb); > > > + up_read(&upper_sb->s_umount); > > > + } > > > } > > > > > > return ret; > > > @@ -411,6 +423,8 @@ enum { > > > OPT_XINO_AUTO, > > > OPT_METACOPY_ON, > > > OPT_METACOPY_OFF, > > > + OPT_SYNC_OFF, > > > + OPT_SYNC_FS, > > > OPT_ERR, > > > }; > > > > > > @@ -429,6 +443,8 @@ static const match_table_t ovl_tokens = { > > > {OPT_XINO_AUTO, "xino=auto"}, > > > {OPT_METACOPY_ON, "metacopy=on"}, > > > {OPT_METACOPY_OFF, "metacopy=off"}, > > > + {OPT_SYNC_OFF, "sync=off"}, > > > + {OPT_SYNC_FS, "sync=fs"}, > > > {OPT_ERR, NULL} > > > }; > > > > > > @@ -573,6 +589,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config) > > > metacopy_opt = true; > > > break; > > > > > > + case OPT_SYNC_OFF: > > > + config->sync = OVL_SYNC_OFF; > > > + break; > > > + > > > + case OPT_SYNC_FS: > > > + config->sync = OVL_SYNC_FS; > > > + break; > > > + > > > default: > > > pr_err("unrecognized mount option \"%s\" or missing value\n", > > > p); > > > @@ -588,6 +612,11 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config) > > > config->workdir = NULL; > > > } > > > > > > + if (!config->upperdir && config->sync) { > > > + pr_info("option sync=off/fs is meaningless in a non-upper mount, ignoring it.\n"); > > > + config->sync = 0; > > > + } > > > + > > > err = ovl_parse_redirect_mode(config, config->redirect_mode); > > > if (err) > > > return err; > > > -- > > > 2.25.4 > > > > > >