Re: [PATCH v2 4/4] overlay: Add rudimentary checking of writeback errseq on volatile remount

From: Amir Goldstein <amir73il@gmail.com>
To: Vivek Goyal <vgoyal@redhat.com>, Sargun Dhillon <sargun@sargun.me>
Cc: overlayfs <linux-unionfs@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Daniel J Walsh <dwalsh@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	David Howells <dhowells@redhat.com>,
	Jeff Layton <jlayton@redhat.com>
Subject: Re: [PATCH v2 4/4] overlay: Add rudimentary checking of writeback errseq on volatile remount
Date: Sat, 5 Dec 2020 11:13:13 +0200	[thread overview]
Message-ID: <CAOQ4uxjeG4N7i95D+YFr0zo82nLOjUCdUhD8e1WABFtwtQYzrQ@mail.gmail.com> (raw)
In-Reply-To: <20201130191509.GC14328@redhat.com>

On Mon, Nov 30, 2020 at 9:15 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Fri, Nov 27, 2020 at 01:20:58AM -0800, Sargun Dhillon wrote:
> > Volatile remounts validate the following at the moment:
> >  * Has the module been reloaded / the system rebooted
> >  * Has the workdir been remounted
> >
> > This adds a new check for errors detected via the superblock's
> > errseq_t. At mount time, the errseq_t is snapshotted to disk,
> > and upon remount it's re-verified. This allows for kernel-level
> > detection of errors without forcing userspace to perform a
> > sync and allows for the hidden detection of writeback errors.
> >
> > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/overlayfs.h | 1 +
> >  fs/overlayfs/readdir.c   | 6 ++++++
> >  fs/overlayfs/super.c     | 1 +
> >  3 files changed, 8 insertions(+)
> >
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index de694ee99d7c..e8a711953b64 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -85,6 +85,7 @@ struct ovl_volatile_info {
> >        */
> >       uuid_t          ovl_boot_id;    /* Must stay first member */
> >       u64             s_instance_id;
> > +     errseq_t        errseq; /* Implemented as a u32 */
> >  } __packed;
> >
> >  /*
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 7b66fbb20261..5795b28bb4cf 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -1117,6 +1117,12 @@ static int ovl_verify_volatile_info(struct ovl_fs *ofs,
> >               return -EINVAL;
> >       }
> >
> > +     err = errseq_check(&volatiledir->d_sb->s_wb_err, info.errseq);
> > +     if (err) {
> > +             pr_debug("Workdir filesystem reports errors: %d\n", err);
> > +             return -EINVAL;
> > +     }
> > +
> >       return 1;
> >  }
> >
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index a8ee3ba4ebbd..2e473f8c75dd 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -1248,6 +1248,7 @@ static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
> >       int err;
> >       struct ovl_volatile_info info = {
> >               .s_instance_id = volatiledir->d_sb->s_instance_id,
> > +             .errseq = errseq_sample(&volatiledir->d_sb->s_wb_err),
>
> errse_sample() seems to return 0 if nobody has seen the error yet. That
> means on remount we will fail. It is a false failure from our perspective
> and we are not interested in knowing if somebody else has seen the
> failure or not.
>
> Maybe we need a flag in errseq_sample() to get us current value
> irrespective of the fact whether anybody has seen the error or not?
>
> If we end up making this change, then we probably will have to somehow
> mask ERRSEQ_SEEN bit in errseq_check() comparison. Because if we
> sampled ->s_wb_err when nobody saw it and later by the remount time
> say ERRSEQ_SEEN is set, we don't want remount to fail.
>

Hopping back to this review, looks like for volatile mount we need
something like (in this order):
1. check if re-use and get sampled errseq from volatiledir xattr
2. otherwise errseq_sample() upper_sb and store in volatiledir xattr
3. errseq_check() since stored or sampled errseq (0 for fresh mount
with unseen error)
4. fail volatile mount if errseq_check() failed
5. errseq_check() since stored errseq on fsync()/syncfs()

For fresh volatile mount, syncfs can fix the temporary mount error.
For re-used volatile mount, the mount error is permanent.

Did I miss anything?
Is the mount safe for both seen and unseen error cases? no error case?
Are we safe if a syncfs on upper_sb sneaks in between 2 and 3?

Thanks,
Amir.