Re: [PATCH] overlay: Implement volatile-specific fsync error behaviour

From: Vivek Goyal <vgoyal@redhat.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: Sargun Dhillon <sargun@sargun.me>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-unionfs@vger.kernel.org,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH] overlay: Implement volatile-specific fsync error behaviour
Date: Wed, 2 Dec 2020 13:56:01 -0500
Message-ID: <20201202185601.GF147783@redhat.com>
In-Reply-To: <59de2220a85e858a4c397969e2a0d03f1d653a6a.camel@redhat.com>

On Wed, Dec 02, 2020 at 01:22:09PM -0500, Jeff Layton wrote:
> On Wed, 2020-12-02 at 12:29 -0500, Vivek Goyal wrote:
> > On Wed, Dec 02, 2020 at 12:02:43PM -0500, Jeff Layton wrote:
> > 
> > [..]
> > > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > > > index 290983bcfbb3..82a096a05bce 100644
> > > > > --- a/fs/overlayfs/super.c
> > > > > +++ b/fs/overlayfs/super.c
> > > > > @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
> > > > > Â 	struct super_block *upper_sb;
> > > > > Â 	int ret;
> > > > > Â 
> > > > > 
> > > > > 
> > > > > 
> > > > > -	if (!ovl_upper_mnt(ofs))
> > > > > -		return 0;
> > > > > +	ret = ovl_check_sync(ofs);
> > > > > +	/*
> > > > > +	 * We have to always set the err, because the return value isn't
> > > > > +	 * checked, and instead VFS looks at the writeback errseq after
> > > > > +	 * this call.
> > > > > +	 */
> > > > > +	if (ret < 0)
> > > > > +		errseq_set(&sb->s_wb_err, ret);
> > > > 
> > > > > I was wondering why errseq_set() would result in returning an error
> > > > > every time. Then I realized that the last syncfs() call must have set
> > > > > the ERRSEQ_SEEN flag, which means errseq_set() will increment the
> > > > > counter, so this syncfs() will return an error too. Cool.
> > > > 
> > > > > +
> > > > > +	if (!ret)
> > > > > +		return ret;
> > > > > Â 
> > > > > 
> > > > > 
> > > > > 
> > > > > -	if (!ovl_should_sync(ofs))
> > > > > -		return 0;
> > > > > Â 	/*
> > > > > Â 	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> > > > > Â 	 * All the super blocks will be iterated, including upper_sb.
> > > > > @@ -1927,6 +1934,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > > > Â 	sb->s_op = &ovl_super_operations;
> > > > > Â 
> > > > > 
> > > > > 
> > > > > 
> > > > > Â 	if (ofs->config.upperdir) {
> > > > > +		struct super_block *upper_mnt_sb;
> > > > > +
> > > > > Â 		if (!ofs->config.workdir) {
> > > > > Â 			pr_err("missing 'workdir'\n");
> > > > > Â 			goto out_err;
> > > > > @@ -1943,9 +1952,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > > > Â 		if (!ofs->workdir)
> > > > > Â 			sb->s_flags |= SB_RDONLY;
> > > > > Â 
> > > > > 
> > > > > 
> > > > > 
> > > > > -		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > > > > -		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > > > > -
> > > > > +		upper_mnt_sb = ovl_upper_mnt(ofs)->mnt_sb;
> > > > > +		sb->s_stack_depth = upper_mnt_sb->s_stack_depth;
> > > > > +		sb->s_time_gran = upper_mnt_sb->s_time_gran;
> > > > > +		ofs->upper_errseq = errseq_sample(&upper_mnt_sb->s_wb_err);
> > > > 
> > > > I asked this question in last email as well. errseq_sample() will return
> > > > 0 if current error has not been seen yet. That means next time a sync
> > > > call comes for volatile mount, it will return an error. But that's
> > > > not what we want. When we mounted a volatile overlay, if there is an
> > > > existing error (seen/unseen), we don't care. We only care if there
> > > > is a new error after the volatile mount, right?
> > > > 
> > > > > I guess we will need another helper similar to errseq_sample() which
> > > > just returns existing value of errseq. And then we will have to
> > > > do something about errseq_check() to not return an error if "since"
> > > > and "eseq" differ only by "seen" bit.
> > > > 
> > > > Otherwise in current form, volatile mount will always return error
> > > > if upperdir has error and it has not been seen by anybody.
> > > > 
> > > > > How did you finally end up testing the error case? I want to simulate
> > > > > an error artificially and test it.
> > > > 
> > > 
> > > If you don't want to see errors that occurred before you did the mount,
> > > then you probably can just resurrect and rename the original version of
> > > errseq_sample. Something like this, but with a different name:
> > > 
> > > +errseq_t errseq_sample(errseq_t *eseq)
> > > +{
> > > +       errseq_t old = READ_ONCE(*eseq);
> > > +       errseq_t new = old;
> > > +
> > > +       /*
> > > +        * For the common case of no errors ever having been set, we can skip
> > > +        * marking the SEEN bit. Once an error has been set, the value will
> > > +        * never go back to zero.
> > > +        */
> > > +       if (old != 0) {
> > > +               new |= ERRSEQ_SEEN;
> > > +               if (old != new)
> > > +                       cmpxchg(eseq, old, new);
> > > +       }
> > > +       return new;
> > > +}
> > 
> > Yes, a helper like this should solve the issue at hand. We are not
> > interested in previous errors. It also sets ERRSEQ_SEEN on sample,
> > which solves the other issue: if the error gets seen after we sample,
> > we don't want errseq_check() to return an error.
> > 
> > Thinking of some possible names for new function.
> > 
> > errseq_sample_seen()
> > errseq_sample_set_seen()
> > errseq_sample_consume_unseen()
> > errseq_sample_current()
> > 
> 
> errseq_sample_consume_unseen() sounds good, though maybe it should be
> "ignore_unseen"? IDK, naming this stuff is the hardest part.
> 
> If you don't want to add a new helper, I think you'd probably also be
> able to do something like this in fill_super:
> 
>     errseq_sample()
>     errseq_check_and_advance()
> 
> 
> ...and just ignore the error returned by the check and advance. At that
> point, the cursor should be caught up and any subsequent syncfs call
> should return 0 until you record another error. It's a little less
> efficient, but only slightly so.

This seems even better.

Thinking about it a little more, I am now concerned about setting
ERRSEQ_SEEN on sample. In our case, that would mean we consumed an unseen
error but never reported it back to user space, and then somebody might
complain.

This reminds me of PostgreSQL's fsync issues, where writes were done
using one fd, another thread opened a different fd and synced through
it, and they expected any errors to be reported.

Similarly, what if an unseen error is present on the upper superblock
and we mount a volatile overlay and mark the error SEEN? If another
process then opens a file on upper and calls syncfs(), it will
complain that the existing error was not reported to it.
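
To make the concern concrete, here is a rough userspace model, not kernel
code. It assumes the bit layout used by lib/errseq.c (errno in the low 12
bits, then the SEEN bit, then a counter), and every model_* name is
hypothetical. It compares what a later opener's syncfs() would see with and
without an overlay mount that first consumed the unseen error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model only; names and layout are assumptions for illustration. */
typedef uint32_t errseq_t;

#define MAX_ERRNO      4095u
#define ERRSEQ_SEEN    (MAX_ERRNO + 1)     /* first bit above the errno field */
#define ERRSEQ_CTR_INC (ERRSEQ_SEEN << 1)  /* lowest bit of the counter */

/* Record an error; err is a negative errno such as -EIO. */
static inline void model_set(errseq_t *eseq, int err)
{
	errseq_t old = *eseq;
	errseq_t next;

	assert(err < 0 && -err <= (int)MAX_ERRNO);
	next = (old & ~(MAX_ERRNO | ERRSEQ_SEEN)) | (errseq_t)(-err);
	if (old & ERRSEQ_SEEN)		/* bump the counter only if someone looked */
		next += ERRSEQ_CTR_INC;
	*eseq = next;
}

/* Modelled on errseq_sample(): an unseen error samples as 0. */
static inline errseq_t model_sample(const errseq_t *eseq)
{
	errseq_t old = *eseq;

	return (old & ERRSEQ_SEEN) ? old : 0;
}

/* Modelled on errseq_check_and_advance(): report, mark SEEN, advance. */
static inline int model_check_and_advance(errseq_t *eseq, errseq_t *since)
{
	errseq_t old = *eseq;

	if (old == *since)
		return 0;
	*eseq = old | ERRSEQ_SEEN;
	*since = *eseq;
	return -(int)(*eseq & MAX_ERRNO);
}

/*
 * Hypothetical scenario: an unseen -EIO sits on the upper superblock.
 * Returns what a process that opens a file afterwards gets from syncfs().
 */
static inline int model_second_opener(int overlay_consumes_first)
{
	errseq_t sb_err = 0;
	errseq_t file_since;

	model_set(&sb_err, -EIO);
	if (overlay_consumes_first) {
		errseq_t ovl_since = model_sample(&sb_err);

		/* Mount-time catch-up; the returned error is dropped. */
		(void)model_check_and_advance(&sb_err, &ovl_since);
	}
	file_since = model_sample(&sb_err);			/* open()   */
	return model_check_and_advance(&sb_err, &file_since);	/* syncfs() */
}
```

In this model, model_second_opener(0) yields -EIO, while
model_second_opener(1) yields 0: once the mount has marked the error SEEN,
the later opener never hears about it.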

The overlay use case seems to be that we just want to check whether an
error has happened on the upper superblock since we sampled it, without
consuming that error. Would it make sense to introduce two helpers, one
for sampling and one for checking, which mask the SEEN bit and never
modify it? For example, see the following compile-tested-only patch.

With this, we will not touch the SEEN bit at all. And even if SEEN gets
set after we sampled, errseq_check_mask_seen() will not flag it as an
error.
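
As a sanity check of that claim, here is a small userspace model, again
not kernel code. The two *_mask_seen helpers mirror the ones in the patch
below; model_set(), model_mark_seen() and model_volatile_mount() are
hypothetical stand-ins, and the bit layout is assumed to match
lib/errseq.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model only; names and layout are assumptions for illustration. */
typedef uint32_t errseq_t;

#define MAX_ERRNO      4095u
#define ERRSEQ_SEEN    (MAX_ERRNO + 1)     /* first bit above the errno field */
#define ERRSEQ_CTR_INC (ERRSEQ_SEEN << 1)  /* lowest bit of the counter */

/* Record an error; err is a negative errno such as -EIO. */
static inline void model_set(errseq_t *eseq, int err)
{
	errseq_t old = *eseq;
	errseq_t next = (old & ~(MAX_ERRNO | ERRSEQ_SEEN)) | (errseq_t)(-err);

	if (old & ERRSEQ_SEEN)		/* bump the counter only if someone looked */
		next += ERRSEQ_CTR_INC;
	*eseq = next;
}

/* Stand-in for some other syncfs() caller consuming the current error. */
static inline void model_mark_seen(errseq_t *eseq)
{
	if (*eseq)
		*eseq |= ERRSEQ_SEEN;
}

/* Mirrors errseq_sample_mask_seen() from the patch: read only, SEEN masked. */
static inline errseq_t model_sample_mask_seen(const errseq_t *eseq)
{
	return *eseq & ~ERRSEQ_SEEN;
}

/* Mirrors errseq_check_mask_seen() from the patch. */
static inline int model_check_mask_seen(const errseq_t *eseq, errseq_t since)
{
	errseq_t cur = *eseq & ~ERRSEQ_SEEN;

	since &= ~ERRSEQ_SEEN;
	if (cur == since)
		return 0;
	return -(int)(cur & MAX_ERRNO);
}

/* Walks a hypothetical volatile mount; returns the final check result. */
static inline int model_volatile_mount(void)
{
	errseq_t sb_err = 0;
	errseq_t since;

	model_set(&sb_err, -EIO);		/* unseen error before mount */
	since = model_sample_mask_seen(&sb_err);

	/* The pre-mount error is ignored and SEEN is left alone. */
	assert(model_check_mask_seen(&sb_err, since) == 0);
	assert(!(sb_err & ERRSEQ_SEEN));

	/* Someone else sees the old error: still not a new error for us. */
	model_mark_seen(&sb_err);
	assert(model_check_mask_seen(&sb_err, since) == 0);

	/* A genuinely new error after the sample is reported. */
	model_set(&sb_err, -ENOSPC);
	return model_check_mask_seen(&sb_err, since);
}
```

Here model_volatile_mount() returns -ENOSPC. One thing the model also
suggests: if a second error with the same errno is recorded while the
first is still unseen, model_set() leaves the value unchanged, so the
masked check would not notice it.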

Thanks
Vivek

---
 lib/errseq.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Index: redhat-linux/lib/errseq.c
===================================================================

--- redhat-linux.orig/lib/errseq.c	2020-06-09 08:59:29.712836019 -0400
+++ redhat-linux/lib/errseq.c	2020-12-02 13:40:08.085775647 -0500
@@ -130,6 +130,12 @@ errseq_t errseq_sample(errseq_t *eseq)
 }
 EXPORT_SYMBOL(errseq_sample);
 
+errseq_t errseq_sample_mask_seen(errseq_t *eseq)
+{
+	return READ_ONCE(*eseq) & (~ERRSEQ_SEEN);
+}
+EXPORT_SYMBOL(errseq_sample_mask_seen);
+
 /**
  * errseq_check() - Has an error occurred since a particular sample point?
  * @eseq: Pointer to errseq_t value to be checked.
@@ -151,6 +157,17 @@ int errseq_check(errseq_t *eseq, errseq_
 }
 EXPORT_SYMBOL(errseq_check);
 
+int errseq_check_mask_seen(errseq_t *eseq, errseq_t since)
+{
+	errseq_t cur = READ_ONCE(*eseq) & (~ERRSEQ_SEEN);
+
+	since &= ~ERRSEQ_SEEN;
+	if (likely(cur == since))
+		return 0;
+	return -(cur & MAX_ERRNO);
+}
+EXPORT_SYMBOL(errseq_check_mask_seen);
+
 /**
  * errseq_check_and_advance() - Check an errseq_t and advance to current value.
  * @eseq: Pointer to value being checked and reported.