Re: [PATCH v2 0/3] Check errors on sync for volatile overlayfs mounts

From: Sargun Dhillon <sargun@sargun.me>
To: Jeff Layton <jlayton@kernel.org>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Vivek Goyal <vgoyal@redhat.com>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	Linux FS-devel Mailing List <linux-fsdevel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v2 0/3] Check errors on sync for volatile overlayfs mounts
Date: Sun, 13 Dec 2020 20:06:42 +0000
Message-ID: <20201213200642.GB8562@ircssh-2.c.rugged-nimbus-611.internal>
In-Reply-To: <7779e2ed97080009d894f3442bfad31972494542.camel@kernel.org>

On Sat, Dec 12, 2020 at 06:21:37AM -0500, Jeff Layton wrote:
> On Fri, 2020-12-11 at 15:49 -0800, Sargun Dhillon wrote:
> > The semantics of errseq and syncfs are such that it is impossible to track
> > if any errors have occurred between the time the first error occurred, and
> > the user checks for the error (calls syncfs, and subsequently
> > errseq_check_and_advance).
> > 
> > Overlayfs has a volatile feature which short-circuits syncfs. This, in turn
> > makes it so that the user can have silent data corruption and not know
> > about it. The third patch in the series introduces behaviour that makes it
> > so that we can track errors, and bubble up whether the user has put
> > themselves in a bad situation.
> > 
> > This required some gymnastics in errseq, and adding a wrapper around it
> > called "errseq_counter" (errseq + counter). The data structure uses an
> > atomic to track overflow errors. This approach, rather than moving to an
> > atomic64 / u64 is so we can avoid bloating every person that subscribes to
> > an errseq, and only add the subscriber behaviour to those who care (at the
> > expense of space).
> > 
> > The data structure is write-optimized, and rightfully so, as the users
> > of the counter feature are just overlayfs, and it's called in fsync
> > checking, which is a rather seldom operation, and not really on
> > any hotpaths.
> > 
> > [1]: https://lore.kernel.org/linux-fsdevel/20201202092720.41522-1-sargun@sargun.me/
> > 
> > Sargun Dhillon (3):
> >   errseq: Add errseq_counter to allow for all errors to be observed
> >   errseq: Add mechanism to snapshot errseq_counter and check snapshot
> >   overlay: Implement volatile-specific fsync error behaviour
> > 
> >  Documentation/filesystems/overlayfs.rst |   8 ++
> >  fs/buffer.c                             |   2 +-
> >  fs/overlayfs/file.c                     |   5 +-
> >  fs/overlayfs/overlayfs.h                |   1 +
> >  fs/overlayfs/ovl_entry.h                |   3 +
> >  fs/overlayfs/readdir.c                  |   5 +-
> >  fs/overlayfs/super.c                    |  26 +++--
> >  fs/overlayfs/util.c                     |  28 +++++
> >  fs/super.c                              |   1 +
> >  fs/sync.c                               |   3 +-
> >  include/linux/errseq.h                  |  18 ++++
> >  include/linux/fs.h                      |   6 +-
> >  include/linux/pagemap.h                 |   2 +-
> >  lib/errseq.c                            | 129 ++++++++++++++++++++----
> >  14 files changed, 202 insertions(+), 35 deletions(-)
> > 
> 
> It would help if you could more clearly lay out the semantics you're
> looking for. If I understand correctly:
> 
> You basically want to be able to sample the sb->s_wb_err of the upper
> layer at mount time and then always return an error if any new errors
> were recorded since that point.
> 
There are two things we want to achieve:

1. If an error occurs on the upperdir after mount time, we want to tell the user
   on every syncfs they attempt on the overlayfs volume that it occurred, and
   that the volume is in an inconsistent state.
2. We want to be able to checkpoint some information to disk, so that if an
   overlayfs mount is unmounted and remounted while in volatile mode, we can
   make sure no error occurred while we were away.
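
To make goal 1 concrete, here is a rough userspace sketch of the behaviour. It
is a toy model, not kernel code and not what the patches implement: the toy_*
names are made up for illustration, the SEEN flag that the real errseq_t
carries is left out, and the sequence is bumped on every error. The point is
only that the sample taken at mount time is never advanced, so every later
syncfs keeps reporting the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for errseq_t: errno in the low 12 bits, sequence above. */
typedef uint32_t toy_errseq_t;

#define TOY_ERRNO_BITS 12u
#define TOY_ERRNO_MASK ((1u << TOY_ERRNO_BITS) - 1)

/* Record a new error (negative errno) and bump the sequence. */
static void toy_set(toy_errseq_t *e, int err)
{
	uint32_t seq = (*e >> TOY_ERRNO_BITS) + 1;

	assert(err < 0 && -err <= (int)TOY_ERRNO_MASK);
	*e = (seq << TOY_ERRNO_BITS) | (uint32_t)(-err);
}

/* Return 0 if nothing changed since 'since', else the latest -errno. */
static int toy_check(toy_errseq_t cur, toy_errseq_t since)
{
	if (cur == since)
		return 0;
	return -(int)(cur & TOY_ERRNO_MASK);
}

/* Hypothetical overlay state: a view of the upper layer's error cursor
 * plus the value sampled once at mount time. */
struct toy_overlay {
	const toy_errseq_t *upper_err;
	toy_errseq_t mount_sample;
};

static void toy_mount(struct toy_overlay *o, const toy_errseq_t *upper_err)
{
	o->upper_err = upper_err;
	o->mount_sample = *upper_err;
}

/* Check against the mount-time sample without advancing it, so the
 * error is reported on every call, not just the first one. */
static int toy_syncfs(const struct toy_overlay *o)
{
	return toy_check(*o->upper_err, o->mount_sample);
}
```

In this model, goal 2 would amount to writing mount_sample somewhere persistent
and comparing against it on remount. That only works if the value cannot wrap
back to the sampled one in between, which is where the counter comes in.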

> If that's correct, then I'm not sure I get the need for all of this extra
> counter machinery. Why not just sample it at mount time without
> recording it as 0 if the seen flag isn't set? Then just do an
> errseq_check against the upper superblock (without advancing) in the
> overlayfs ->sync_fs routine and just errseq_set that error into the
> overlayfs superblock? The syncfs syscall wrapper should then always
> report the latest error.
I considered the following options:
1. Make errseq_t a u64. Downside: bloats every errseq_t to a u64 with 8-byte
   alignment.
2. Make errseq_counter_t an atomic64 / u64, giving us 52 error checking bits vs.
   just 20. Downside: we would have to do a cmpxchg64 on every error, which
   seemed like it could be costly on platforms that don't natively support it.
3. Have an overflow counter. This doesn't introduce extra CPU overhead for any
   other user of errseq, nor does it introduce much memory overhead. Downside:
   complexity.
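
To illustrate why option 3 helps, here is a rough userspace sketch. It is a toy
model under stated assumptions, not the errseq_counter from the series: the
toy_* names and the struct layout are hypothetical, the SEEN flag is ignored,
and all 20 bits above the errno are treated as a plain sequence. With only a
32-bit value, a sample can compare equal again after 2^20 errors; a separate
overflow counter makes that wrap visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_ERRNO_BITS 12u                       /* errno values up to 4095 */
#define TOY_SEQ_BITS   (32u - TOY_ERRNO_BITS)    /* 20 bits left in a u32 */
#define TOY_ERRNO_MASK ((1u << TOY_ERRNO_BITS) - 1)
#define TOY_SEQ_MASK   ((1u << TOY_SEQ_BITS) - 1)

/* Hypothetical wrapper: a 32-bit errseq-like value plus a wrap counter. */
struct toy_errseq_counter {
	uint32_t errseq;     /* errno in the low bits, sequence above */
	uint32_t overflows;  /* bumped each time the sequence wraps to 0 */
};

/* Record a new error (negative errno), counting sequence wraps. */
static void toy_counter_set(struct toy_errseq_counter *c, int err)
{
	uint32_t seq = c->errseq >> TOY_ERRNO_BITS;

	assert(err < 0 && -err <= (int)TOY_ERRNO_MASK);
	seq = (seq + 1) & TOY_SEQ_MASK;
	if (seq == 0)
		c->overflows++;
	c->errseq = (seq << TOY_ERRNO_BITS) | (uint32_t)(-err);
}

/* Combine wrap count and sequence into one widened error count. */
static uint64_t toy_counter_total(const struct toy_errseq_counter *c)
{
	return ((uint64_t)c->overflows << TOY_SEQ_BITS) |
	       (c->errseq >> TOY_ERRNO_BITS);
}
```

Only users that embed the wrapper pay for the extra field; anything that keeps
using the bare 32-bit value is unchanged, which is the trade-off described in
option 3.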

> 
> Or (even better) rework all of the sync_fs/syncfs mess to be more sane,
> so that overlayfs has more control over what errors get returned to
> userland. ISTM that the main problem you have is that the
> errseq_check_and_advance is done in the syscall wrapper, and that's
> probably not appropriate for your use-case.
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
>