linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sargun Dhillon <sargun@sargun.me>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Vivek Goyal <vgoyal@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	Jeff Layton <jlayton@kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>, Jan Kara <jack@suse.cz>,
	NeilBrown <neilb@suse.com>, Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@lst.de>,
	Chengguang Xu <cgxu519@mykernel.net>
Subject: Re: [PATCH 3/3] overlayfs: Report writeback errors on upper
Date: Thu, 24 Dec 2020 10:12:16 +0000	[thread overview]
Message-ID: <20201224101215.GA23046@ircssh-2.c.rugged-nimbus-611.internal> (raw)
In-Reply-To: <CAOQ4uxjAeGv8x2hBBzHz5PjSDq0Q+RN-ikgqEvAA+XE_U-U5Nw@mail.gmail.com>

On Thu, Dec 24, 2020 at 11:32:55AM +0200, Amir Goldstein wrote:
> On Wed, Dec 23, 2020 at 10:44 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Dec 23, 2020 at 08:21:41PM +0000, Sargun Dhillon wrote:
> > > On Wed, Dec 23, 2020 at 08:07:46PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Dec 23, 2020 at 07:29:41PM +0000, Sargun Dhillon wrote:
> > > > > On Wed, Dec 23, 2020 at 06:50:44PM +0000, Matthew Wilcox wrote:
> > > > > > On Wed, Dec 23, 2020 at 06:20:27PM +0000, Sargun Dhillon wrote:
> > > > > > > I fail to see why this is neccessary if you incorporate error reporting into the
> > > > > > > sync_fs callback. Why is this separate from that callback? If you pickup Jeff's
> > > > > > > patch that adds the 2nd flag to errseq for "observed", you should be able to
> > > > > > > stash the first errseq seen in the ovl_fs struct, and do the check-and-return
> > > > > > > in there instead instead of adding this new infrastructure.
> > > > > >
> > > > > > You still haven't explained why you want to add the "observed" flag.
> > > > >
> > > > >
> > > > > In the overlayfs model, many users may be using the same filesystem (super block)
> > > > > for their upperdir. Let's say you have something like this:
> > > > >
> > > > > /workdir [Mounted FS]
> > > > > /workdir/upperdir1 [overlayfs upperdir]
> > > > > /workdir/upperdir2 [overlayfs upperdir]
> > > > > /workdir/userscratchspace
> > > > >
> > > > > The user needs to be able to do something like:
> > > > > sync -f ${overlayfs1}/file
> > > > >
> > > > > which in turn will call sync on the the underlying filesystem (the one mounted
> > > > > on /workdir), and can check if the errseq has changed since the overlayfs was
> > > > > mounted, and use that to return an error to the user.
> > > >
> > > > OK, but I don't see why the current scheme doesn't work for this.  If
> > > > (each instance of) overlayfs samples the errseq at mount time and then
> > > > check_and_advances it at sync time, it will see any error that has occurred
> > > > since the mount happened (and possibly also an error which occurred before
> > > > the mount happened, but hadn't been reported to anybody before).
> > > >
> > >
> > > If there is an outstanding error at mount time, and the SEEN flag is unset,
> > > subsequent errors will not increment the counter, until the user calls sync on
> > > the upperdir's filesystem. If overlayfs calls check_and_advance on the upperdir's
> > > super block at any point, it will then set the seen block, and if the user calls
> > > syncfs on the upperdir, it will not return that there is an outstanding error,
> > > since overlayfs just cleared it.
> >
> > Your concern is this case:
> >
> > fs is mounted on /workdir
> > /workdir/A is written to and then closed.
> > writeback happens and -EIO happens, but there's nobody around to care.
> > /workdir/upperdir1 becomes part of an overlayfs mount
> > overlayfs samples the error
> > a user writes to /workdir/B, another -EIO occurs, but nothing happens
> > someone calls syncfs on /workdir/upperdir/A, gets the EIO.
> > a user opens /workdir/B and calls syncfs, but sees no error
> >
> > do i have that right?  or is it something else?
> 
> IMO it is something else. Others may disagree.
> IMO the level of interference between users accessing overlay and users
> accessing upper fs directly is not well defined and it can stay this way.
> 
> Concurrent access to  /workdir/upperdir/A via overlay and underlying fs
> is explicitly warranted against in Documentation/filesystems/overlayfs.rst#
> Changes to underlying filesystems:
> "Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed.  If the underlying filesystem is changed,
> the behavior of the overlay is undefined, though it will not result in
> a crash or deadlock."
> 
> The question is whether syncfs(open(/workdir/B)) is considered
> "Changes to the underlying filesystems". Regardless of the answer,
> this is not an interesting case IMO.
> 
> The real issue is with interference between overlays that share the
> same upper fs, because this is by far and large the common use case
> that is creating real problems for a lot of container users.
> 
> Workloads running inside containers (with overlayfs storage driver)
> will never be as isolated as workloads running inside VMs, but it
> doesn't mean we cannot try to improve.
> 
> In current master, syncfs() on any file by any container user will
> result in full syncfs() of the upperfs, which is very bad for container
> isolation. This has been partly fixed by Chengguang Xu [1] and I expect
> his work will be merged soon. Overlayfs still does not do the writeback
> and syncfs() in overlay still waits for all upper fs writeback to complete,
> but at least syncfs() in overlay only kicks writeback for upper fs files
> dirtied by this overlay.
> 
> [1] https://lore.kernel.org/linux-unionfs/CAJfpegsbb4iTxW8ZyuRFVNc63zg7Ku7vzpSNuzHASYZH-d5wWA@mail.gmail.com/
> 
> Sharing the same SEEN flag among thousands of containers is also
> far from ideal, because effectively this means that any given workload
> in any single container has very little chance of observing the SEEN flag.
> 
> To this end, I do agree with Matthew that overlayfs should sample errseq
> and the best patchset to implement it so far IMO is Jeff's patchset [2].
> This patch set was written to cater only "volatile" overlayfs mount, but
> there is no reason not to use the same mechanism for regular overlay
> mount. The only difference being that "volatile" overlay only checks for
> error since mount on syncfs() (because "volatile" overlay does NOT
> syncfs upper fs) and regular overlay checks and advances the overlay's
> errseq sample on syncfs (and does syncfs upper fs).
> 
> Matthew, I hope that my explanation of the use case and Jeff's answer
> is sufficient to understand why the split of the SEEN flag is needed.
> 
> [2] https://lore.kernel.org/linux-unionfs/20201213132713.66864-1-jlayton@kernel.org/
> 
> w.r.t Vivek's patchset (this one), I do not object to it at all, but it fixes
> a problem that Jeff's patch had already solved with an ugly hack:
> 
>   /* Propagate errors from upper to overlayfs */
>   ret = errseq_check(&upper_sb->s_wb_err, ofs->err_mark);
>   errseq_set(&sb->s_wb_err, ret);
> 
> Since Jeff's patch is minimal, I think that it should be the fix applied
> first and proposed for stable (with adaptations for non-volatile overlay).
> 
> I guess that Vivek's patch 1/3 from this series [3] is also needed to
> complement the work that should go to stable.
> 
> Vivek, Sargun,
> 
> Do you understand my proposal?
Yes. I agree that Jeff's patch should be added to stable. The fact we don't
bubble up writeback errors turns out to be a real problem that I never knew
was happening, but upon investigating, it looks like a real thing.

I think we can use Jeff's hacky approach in stable, as it's far more minimal, 
and has a much lower chance of causing issues, but if we make further 
improvements, we wont be able to backport them to stable as easily. I have
nothing explicitly against Vivek's approach though.

> Do you agree with it as a way forward to address the various syncfs
> issues for volatile/non-volatile that both of you were trying to address?
Yes. I think Vivek's patchset of introducing a new superblock callback is
the best approach.

> 
> Sargun, I know this all discussion has forked from your volatile re-use
> patch set, but let's not confuse fsdevel forks more than we have to.
> The way forward for volatile re-use from this proposal is straight forward.

I think that Vivek's patchset of adding a new callback, plus Jeff's new flag 
solves detection of errors for normal mounts, volatile mounts, and volatile 
remounts.

I think I responded to Jeff's patch and it looked good, bar one of the loops.
My only suggestion is that someone add the intended behaviour here as comment
to super_ops of the new callback.
> 
> Thanks,
> Amir.

  reply	other threads:[~2020-12-24 10:13 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-21 19:50 [RFC PATCH 0/3][v3] vfs, overlayfs: Fix syncfs() to return correct errors Vivek Goyal
2020-12-21 19:50 ` [PATCH 1/3] vfs: Do not ignore return code from s_op->sync_fs Vivek Goyal
2020-12-22  1:23   ` NeilBrown
2020-12-22 15:17     ` Vivek Goyal
2020-12-21 19:50 ` [PATCH 2/3] vfs: Add a super block operation to check for writeback errors Vivek Goyal
2020-12-22 16:19   ` Matthew Wilcox
2020-12-22 16:25     ` Vivek Goyal
2020-12-23 12:44       ` Jeff Layton
2020-12-23 12:48   ` Jeff Layton
2021-01-04 19:41     ` Vivek Goyal
2020-12-21 19:50 ` [PATCH 3/3] overlayfs: Report writeback errors on upper Vivek Goyal
2020-12-22 16:20   ` Matthew Wilcox
2020-12-22 16:29     ` Vivek Goyal
2020-12-22 17:46       ` Matthew Wilcox
2020-12-22 17:55         ` Vivek Goyal
2020-12-23 12:53           ` Jeff Layton
2020-12-23 18:20   ` Sargun Dhillon
2020-12-23 18:50     ` Matthew Wilcox
2020-12-23 19:29       ` Sargun Dhillon
2020-12-23 20:07         ` Matthew Wilcox
2020-12-23 20:21           ` Sargun Dhillon
2020-12-23 20:44             ` Matthew Wilcox
2020-12-24  9:32               ` Amir Goldstein
2020-12-24 10:12                 ` Sargun Dhillon [this message]
2020-12-24 12:13                 ` Matthew Wilcox
2020-12-25  6:50                   ` Amir Goldstein
2020-12-28 13:25                     ` Jeff Layton
2020-12-28 15:51                       ` Amir Goldstein
2021-01-04 15:51                         ` Vivek Goyal
2020-12-28 15:56                       ` Matthew Wilcox
2020-12-28 17:26                         ` Jeff Layton
2020-12-28 19:25                           ` Sargun Dhillon
2020-12-28 19:37                           ` Amir Goldstein
2020-12-28 20:48                             ` Matthew Wilcox
2021-01-02 13:25                               ` Jeff Layton
2021-01-04 16:59                         ` Vivek Goyal
2021-01-04 15:14                 ` Vivek Goyal
2021-01-04 15:22                   ` Amir Goldstein
2021-01-04 15:40                     ` Vivek Goyal
2021-01-04 21:42                       ` Amir Goldstein
2021-01-04 22:44                         ` Vivek Goyal
2021-01-05  7:11                           ` Amir Goldstein
2021-01-05 16:26                             ` Vivek Goyal
2021-01-05 16:57                               ` Amir Goldstein
2020-12-23 19:00     ` Jeff Layton
2021-01-04 20:00     ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201224101215.GA23046@ircssh-2.c.rugged-nimbus-611.internal \
    --to=sargun@sargun.me \
    --cc=amir73il@gmail.com \
    --cc=cgxu519@mykernel.net \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=neilb@suse.com \
    --cc=vgoyal@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).