From: Amir Goldstein <firstname.lastname@example.org>
To: Matthew Wilcox <email@example.com>, Sargun Dhillon <firstname.lastname@example.org>
Cc: Vivek Goyal <email@example.com>,
Jeff Layton <firstname.lastname@example.org>,
Miklos Szeredi <email@example.com>, Jan Kara <firstname.lastname@example.org>,
NeilBrown <email@example.com>, Al Viro <firstname.lastname@example.org>,
Christoph Hellwig <email@example.com>,
Chengguang Xu <firstname.lastname@example.org>
Subject: Re: [PATCH 3/3] overlayfs: Report writeback errors on upper
Date: Fri, 25 Dec 2020 08:50:25 +0200 [thread overview]
Message-ID: <CAOQ4uxj5YS9LSPoBZ3uakb6NeBG7g-Zeu+8Vt57tizEH6xu0cw@mail.gmail.com> (raw)
On Thu, Dec 24, 2020 at 2:13 PM Matthew Wilcox <email@example.com> wrote:
> On Thu, Dec 24, 2020 at 11:32:55AM +0200, Amir Goldstein wrote:
> > In current master, syncfs() on any file by any container user will
> > result in full syncfs() of the upperfs, which is very bad for container
> > isolation. This has been partly fixed by Chengguang Xu  and I expect
> > his work will be merged soon. Overlayfs still does not do the writeback
> > and syncfs() in overlay still waits for all upper fs writeback to complete,
> > but at least syncfs() in overlay only kicks writeback for upper fs files
> > dirtied by this overlay.
> >  https://lore.kernel.org/linux-unionfs/CAJfpegsbb4iTxW8ZyuRFVNc63zg7Ku7vzpSNuzHASYZH-d5wWA@mail.gmail.com/
> > Sharing the same SEEN flag among thousands of containers is also
> > far from ideal, because effectively this means that any given workload
> > in any single container has very little chance of observing the SEEN flag.
> Perhaps you misunderstand how errseq works. If each container samples
> the errseq at startup, then they will all see any error which occurs
> during their lifespan
Meant to say "...very little chance of NOT observing the SEEN flag",
but We are not in disagreement.
My argument against sharing the SEEN flag refers to Vivek's patch of
stacked errseq_sample()/errseq_check_and_advance() which does NOT
sample errseq at overlayfs mount time. That is why my next sentence is:
"I do agree with Matthew that overlayfs should sample errseq...".
> (and possibly an error which occurred before they started up).
Right. And this is where the discussion of splitting the SEEN flag started.
Some of us want to treat overlayfs mount time as a true epoc for errseq.
The new container didn't write any files yet, so it should not care about
writeback errors from the past.
I agree that it may not be very critical, but as I wrote before, I think we
should do our best to try and isolate container workloads.
> > To this end, I do agree with Matthew that overlayfs should sample errseq
> > and the best patchset to implement it so far IMO is Jeff's patchset .
> > This patch set was written to cater only "volatile" overlayfs mount, but
> > there is no reason not to use the same mechanism for regular overlay
> > mount. The only difference being that "volatile" overlay only checks for
> > error since mount on syncfs() (because "volatile" overlay does NOT
> > syncfs upper fs) and regular overlay checks and advances the overlay's
> > errseq sample on syncfs (and does syncfs upper fs).
> > Matthew, I hope that my explanation of the use case and Jeff's answer
> > is sufficient to understand why the split of the SEEN flag is needed.
> >  https://firstname.lastname@example.org/
> No, it still feels weird and wrong.
All right. Considering your reservations, I think perhaps the split of the
SEEN flag can wait for a later time after more discussions and maybe
not as suitable for stable as we thought.
I think that for stable, it would be sufficient to adapt Surgun's original
syncfs for volatile mount patch  to cover the non-volatile case:
- errseq_sample() upper fs
- on volatile mount, errseq_check() upper fs and fail mount on un-SEEN error
- errseq_check() for volatile mount
- errseq_check_and_advance() for non-volatile mount
- errseq_set() overlay sb on upper fs error
Now errseq_set() is not only a hack around __sync_filesystem ignoring
return value of ->sync_fs(). It is really needed for per-overlay SEEN
error isolation in the non-volatile case.
Unless I am missing something, I think we do not strictly need Vivek's
1/3 patch  for stable, but not sure.
Do you agree with the above proposal?
Will you make it into a patch?
Do you agree that overlay syncfs observing writeback errors that predate
overlay mount time is an issue that can be deferred (maybe forever)?
BTW, in all the discussions we always assumed that stacked fsync() is correct
WRT errseq, but in fact, fsync() can also observe an unseen error that predates
In order to fix that, we will probably need to split the SEEN flag and some
more errseq acrobatics, but again, not sure it is worth the effort.
next prev parent reply other threads:[~2020-12-25 6:51 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-21 19:50 [RFC PATCH 0/3][v3] vfs, overlayfs: Fix syncfs() to return correct errors Vivek Goyal
2020-12-21 19:50 ` [PATCH 1/3] vfs: Do not ignore return code from s_op->sync_fs Vivek Goyal
2020-12-22 1:23 ` NeilBrown
2020-12-22 15:17 ` Vivek Goyal
2020-12-21 19:50 ` [PATCH 2/3] vfs: Add a super block operation to check for writeback errors Vivek Goyal
2020-12-22 16:19 ` Matthew Wilcox
2020-12-22 16:25 ` Vivek Goyal
2020-12-23 12:44 ` Jeff Layton
2020-12-23 12:48 ` Jeff Layton
2021-01-04 19:41 ` Vivek Goyal
2020-12-21 19:50 ` [PATCH 3/3] overlayfs: Report writeback errors on upper Vivek Goyal
2020-12-22 16:20 ` Matthew Wilcox
2020-12-22 16:29 ` Vivek Goyal
2020-12-22 17:46 ` Matthew Wilcox
2020-12-22 17:55 ` Vivek Goyal
2020-12-23 12:53 ` Jeff Layton
2020-12-23 18:20 ` Sargun Dhillon
2020-12-23 18:50 ` Matthew Wilcox
2020-12-23 19:29 ` Sargun Dhillon
2020-12-23 20:07 ` Matthew Wilcox
2020-12-23 20:21 ` Sargun Dhillon
2020-12-23 20:44 ` Matthew Wilcox
2020-12-24 9:32 ` Amir Goldstein
2020-12-24 10:12 ` Sargun Dhillon
2020-12-24 12:13 ` Matthew Wilcox
2020-12-25 6:50 ` Amir Goldstein [this message]
2020-12-28 13:25 ` Jeff Layton
2020-12-28 15:51 ` Amir Goldstein
2021-01-04 15:51 ` Vivek Goyal
2020-12-28 15:56 ` Matthew Wilcox
2020-12-28 17:26 ` Jeff Layton
2020-12-28 19:25 ` Sargun Dhillon
2020-12-28 19:37 ` Amir Goldstein
2020-12-28 20:48 ` Matthew Wilcox
2021-01-02 13:25 ` Jeff Layton
2021-01-04 16:59 ` Vivek Goyal
2021-01-04 15:14 ` Vivek Goyal
2021-01-04 15:22 ` Amir Goldstein
2021-01-04 15:40 ` Vivek Goyal
2021-01-04 21:42 ` Amir Goldstein
2021-01-04 22:44 ` Vivek Goyal
2021-01-05 7:11 ` Amir Goldstein
2021-01-05 16:26 ` Vivek Goyal
2021-01-05 16:57 ` Amir Goldstein
2020-12-23 19:00 ` Jeff Layton
2021-01-04 20:00 ` Vivek Goyal
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).