From: Sargun Dhillon <sargun@sargun.me>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Vivek Goyal <vgoyal@redhat.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
overlayfs <linux-unionfs@vger.kernel.org>,
Jeff Layton <jlayton@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>, Jan Kara <jack@suse.cz>,
NeilBrown <neilb@suse.com>, Al Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@lst.de>,
Chengguang Xu <cgxu519@mykernel.net>
Subject: Re: [PATCH 3/3] overlayfs: Report writeback errors on upper
Date: Thu, 24 Dec 2020 10:12:16 +0000 [thread overview]
Message-ID: <20201224101215.GA23046@ircssh-2.c.rugged-nimbus-611.internal> (raw)
In-Reply-To: <CAOQ4uxjAeGv8x2hBBzHz5PjSDq0Q+RN-ikgqEvAA+XE_U-U5Nw@mail.gmail.com>
On Thu, Dec 24, 2020 at 11:32:55AM +0200, Amir Goldstein wrote:
> On Wed, Dec 23, 2020 at 10:44 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Dec 23, 2020 at 08:21:41PM +0000, Sargun Dhillon wrote:
> > > On Wed, Dec 23, 2020 at 08:07:46PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Dec 23, 2020 at 07:29:41PM +0000, Sargun Dhillon wrote:
> > > > > On Wed, Dec 23, 2020 at 06:50:44PM +0000, Matthew Wilcox wrote:
> > > > > > On Wed, Dec 23, 2020 at 06:20:27PM +0000, Sargun Dhillon wrote:
> > > > > > > I fail to see why this is neccessary if you incorporate error reporting into the
> > > > > > > sync_fs callback. Why is this separate from that callback? If you pickup Jeff's
> > > > > > > patch that adds the 2nd flag to errseq for "observed", you should be able to
> > > > > > > stash the first errseq seen in the ovl_fs struct, and do the check-and-return
> > > > > > > in there instead instead of adding this new infrastructure.
> > > > > >
> > > > > > You still haven't explained why you want to add the "observed" flag.
> > > > >
> > > > >
> > > > > In the overlayfs model, many users may be using the same filesystem (super block)
> > > > > for their upperdir. Let's say you have something like this:
> > > > >
> > > > > /workdir [Mounted FS]
> > > > > /workdir/upperdir1 [overlayfs upperdir]
> > > > > /workdir/upperdir2 [overlayfs upperdir]
> > > > > /workdir/userscratchspace
> > > > >
> > > > > The user needs to be able to do something like:
> > > > > sync -f ${overlayfs1}/file
> > > > >
> > > > > which in turn will call sync on the the underlying filesystem (the one mounted
> > > > > on /workdir), and can check if the errseq has changed since the overlayfs was
> > > > > mounted, and use that to return an error to the user.
> > > >
> > > > OK, but I don't see why the current scheme doesn't work for this. If
> > > > (each instance of) overlayfs samples the errseq at mount time and then
> > > > check_and_advances it at sync time, it will see any error that has occurred
> > > > since the mount happened (and possibly also an error which occurred before
> > > > the mount happened, but hadn't been reported to anybody before).
> > > >
> > >
> > > If there is an outstanding error at mount time, and the SEEN flag is unset,
> > > subsequent errors will not increment the counter, until the user calls sync on
> > > the upperdir's filesystem. If overlayfs calls check_and_advance on the upperdir's
> > > super block at any point, it will then set the seen block, and if the user calls
> > > syncfs on the upperdir, it will not return that there is an outstanding error,
> > > since overlayfs just cleared it.
> >
> > Your concern is this case:
> >
> > fs is mounted on /workdir
> > /workdir/A is written to and then closed.
> > writeback happens and -EIO happens, but there's nobody around to care.
> > /workdir/upperdir1 becomes part of an overlayfs mount
> > overlayfs samples the error
> > a user writes to /workdir/B, another -EIO occurs, but nothing happens
> > someone calls syncfs on /workdir/upperdir/A, gets the EIO.
> > a user opens /workdir/B and calls syncfs, but sees no error
> >
> > do i have that right? or is it something else?
>
> IMO it is something else. Others may disagree.
> IMO the level of interference between users accessing overlay and users
> accessing upper fs directly is not well defined and it can stay this way.
>
> Concurrent access to /workdir/upperdir/A via overlay and underlying fs
> is explicitly warranted against in Documentation/filesystems/overlayfs.rst#
> Changes to underlying filesystems:
> "Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed. If the underlying filesystem is changed,
> the behavior of the overlay is undefined, though it will not result in
> a crash or deadlock."
>
> The question is whether syncfs(open(/workdir/B)) is considered
> "Changes to the underlying filesystems". Regardless of the answer,
> this is not an interesting case IMO.
>
> The real issue is with interference between overlays that share the
> same upper fs, because this is by far and large the common use case
> that is creating real problems for a lot of container users.
>
> Workloads running inside containers (with overlayfs storage driver)
> will never be as isolated as workloads running inside VMs, but it
> doesn't mean we cannot try to improve.
>
> In current master, syncfs() on any file by any container user will
> result in full syncfs() of the upperfs, which is very bad for container
> isolation. This has been partly fixed by Chengguang Xu [1] and I expect
> his work will be merged soon. Overlayfs still does not do the writeback
> and syncfs() in overlay still waits for all upper fs writeback to complete,
> but at least syncfs() in overlay only kicks writeback for upper fs files
> dirtied by this overlay.
>
> [1] https://lore.kernel.org/linux-unionfs/CAJfpegsbb4iTxW8ZyuRFVNc63zg7Ku7vzpSNuzHASYZH-d5wWA@mail.gmail.com/
>
> Sharing the same SEEN flag among thousands of containers is also
> far from ideal, because effectively this means that any given workload
> in any single container has very little chance of observing the SEEN flag.
>
> To this end, I do agree with Matthew that overlayfs should sample errseq
> and the best patchset to implement it so far IMO is Jeff's patchset [2].
> This patch set was written to cater only "volatile" overlayfs mount, but
> there is no reason not to use the same mechanism for regular overlay
> mount. The only difference being that "volatile" overlay only checks for
> error since mount on syncfs() (because "volatile" overlay does NOT
> syncfs upper fs) and regular overlay checks and advances the overlay's
> errseq sample on syncfs (and does syncfs upper fs).
>
> Matthew, I hope that my explanation of the use case and Jeff's answer
> is sufficient to understand why the split of the SEEN flag is needed.
>
> [2] https://lore.kernel.org/linux-unionfs/20201213132713.66864-1-jlayton@kernel.org/
>
> w.r.t Vivek's patchset (this one), I do not object to it at all, but it fixes
> a problem that Jeff's patch had already solved with an ugly hack:
>
> /* Propagate errors from upper to overlayfs */
> ret = errseq_check(&upper_sb->s_wb_err, ofs->err_mark);
> errseq_set(&sb->s_wb_err, ret);
>
> Since Jeff's patch is minimal, I think that it should be the fix applied
> first and proposed for stable (with adaptations for non-volatile overlay).
>
> I guess that Vivek's patch 1/3 from this series [3] is also needed to
> complement the work that should go to stable.
>
> Vivek, Sargun,
>
> Do you understand my proposal?
Yes. I agree that Jeff's patch should be added to stable. The fact we don't
bubble up writeback errors turns out to be a real problem that I never knew
was happening, but upon investigating, it looks like a real thing.
I think we can use Jeff's hacky approach in stable, as it's far more minimal,
and has a much lower chance of causing issues, but if we make further
improvements, we wont be able to backport them to stable as easily. I have
nothing explicitly against Vivek's approach though.
> Do you agree with it as a way forward to address the various syncfs
> issues for volatile/non-volatile that both of you were trying to address?
Yes. I think Vivek's patchset of introducing a new superblock callback is
the best approach.
>
> Sargun, I know this all discussion has forked from your volatile re-use
> patch set, but let's not confuse fsdevel forks more than we have to.
> The way forward for volatile re-use from this proposal is straight forward.
I think that Vivek's patchset of adding a new callback, plus Jeff's new flag
solves detection of errors for normal mounts, volatile mounts, and volatile
remounts.
I think I responded to Jeff's patch and it looked good, bar one of the loops.
My only suggestion is that someone add the intended behaviour here as comment
to super_ops of the new callback.
>
> Thanks,
> Amir.
next prev parent reply other threads:[~2020-12-24 10:13 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-21 19:50 [RFC PATCH 0/3][v3] vfs, overlayfs: Fix syncfs() to return correct errors Vivek Goyal
2020-12-21 19:50 ` [PATCH 1/3] vfs: Do not ignore return code from s_op->sync_fs Vivek Goyal
2020-12-22 1:23 ` NeilBrown
2020-12-22 15:17 ` Vivek Goyal
2020-12-21 19:50 ` [PATCH 2/3] vfs: Add a super block operation to check for writeback errors Vivek Goyal
2020-12-22 16:19 ` Matthew Wilcox
2020-12-22 16:25 ` Vivek Goyal
2020-12-23 12:44 ` Jeff Layton
2020-12-23 12:48 ` Jeff Layton
2021-01-04 19:41 ` Vivek Goyal
2020-12-21 19:50 ` [PATCH 3/3] overlayfs: Report writeback errors on upper Vivek Goyal
2020-12-22 16:20 ` Matthew Wilcox
2020-12-22 16:29 ` Vivek Goyal
2020-12-22 17:46 ` Matthew Wilcox
2020-12-22 17:55 ` Vivek Goyal
2020-12-23 12:53 ` Jeff Layton
2020-12-23 18:20 ` Sargun Dhillon
2020-12-23 18:50 ` Matthew Wilcox
2020-12-23 19:29 ` Sargun Dhillon
2020-12-23 20:07 ` Matthew Wilcox
2020-12-23 20:21 ` Sargun Dhillon
2020-12-23 20:44 ` Matthew Wilcox
2020-12-24 9:32 ` Amir Goldstein
2020-12-24 10:12 ` Sargun Dhillon [this message]
2020-12-24 12:13 ` Matthew Wilcox
2020-12-25 6:50 ` Amir Goldstein
2020-12-28 13:25 ` Jeff Layton
2020-12-28 15:51 ` Amir Goldstein
2021-01-04 15:51 ` Vivek Goyal
2020-12-28 15:56 ` Matthew Wilcox
2020-12-28 17:26 ` Jeff Layton
2020-12-28 19:25 ` Sargun Dhillon
2020-12-28 19:37 ` Amir Goldstein
2020-12-28 20:48 ` Matthew Wilcox
2021-01-02 13:25 ` Jeff Layton
2021-01-04 16:59 ` Vivek Goyal
2021-01-04 15:14 ` Vivek Goyal
2021-01-04 15:22 ` Amir Goldstein
2021-01-04 15:40 ` Vivek Goyal
2021-01-04 21:42 ` Amir Goldstein
2021-01-04 22:44 ` Vivek Goyal
2021-01-05 7:11 ` Amir Goldstein
2021-01-05 16:26 ` Vivek Goyal
2021-01-05 16:57 ` Amir Goldstein
2020-12-23 19:00 ` Jeff Layton
2021-01-04 20:00 ` Vivek Goyal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201224101215.GA23046@ircssh-2.c.rugged-nimbus-611.internal \
--to=sargun@sargun.me \
--cc=amir73il@gmail.com \
--cc=cgxu519@mykernel.net \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=neilb@suse.com \
--cc=vgoyal@redhat.com \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).