linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sargun Dhillon <sargun@sargun.me>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-unionfs@vger.kernel.org,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH] overlay: Implement volatile-specific fsync error behaviour
Date: Wed, 2 Dec 2020 18:49:37 +0000	[thread overview]
Message-ID: <20201202184936.GA17139@ircssh-2.c.rugged-nimbus-611.internal> (raw)
In-Reply-To: <20201202172906.GE147783@redhat.com>

On Wed, Dec 02, 2020 at 12:29:06PM -0500, Vivek Goyal wrote:
> On Wed, Dec 02, 2020 at 12:02:43PM -0500, Jeff Layton wrote:
> 
> [..]
> > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > > index 290983bcfbb3..82a096a05bce 100644
> > > > --- a/fs/overlayfs/super.c
> > > > +++ b/fs/overlayfs/super.c
> > > > @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
> > > >  	struct super_block *upper_sb;
> > > >  	int ret;
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -	if (!ovl_upper_mnt(ofs))
> > > > -		return 0;
> > > > +	ret = ovl_check_sync(ofs);
> > > > +	/*
> > > > +	 * We have to always set the err, because the return value isn't
> > > > +	 * checked, and instead VFS looks at the writeback errseq after
> > > > +	 * this call.
> > > > +	 */
> > > > +	if (ret < 0)
> > > > +		errseq_set(&sb->s_wb_err, ret);
> > > 
> > > I was wondering that why errseq_set() will result in returning error
> > > all the time. Then realized that last syncfs() call must have set
> > > ERRSEQ_SEEN flag and that will mean errseq_set() will increment
> > > counter and that means this syncfs() will will return error too. Cool.
> > > 
> > > > +
> > > > +	if (!ret)
> > > > +		return ret;
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -	if (!ovl_should_sync(ofs))
> > > > -		return 0;
> > > >  	/*
> > > >  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> > > >  	 * All the super blocks will be iterated, including upper_sb.
> > > > @@ -1927,6 +1934,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > >  	sb->s_op = &ovl_super_operations;
> > > >  
> > > > 
> > > > 
> > > > 
> > > >  	if (ofs->config.upperdir) {
> > > > +		struct super_block *upper_mnt_sb;
> > > > +
> > > >  		if (!ofs->config.workdir) {
> > > >  			pr_err("missing 'workdir'\n");
> > > >  			goto out_err;
> > > > @@ -1943,9 +1952,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > >  		if (!ofs->workdir)
> > > >  			sb->s_flags |= SB_RDONLY;
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > > > -		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > > > -
> > > > +		upper_mnt_sb = ovl_upper_mnt(ofs)->mnt_sb;
> > > > +		sb->s_stack_depth = upper_mnt_sb->s_stack_depth;
> > > > +		sb->s_time_gran = upper_mnt_sb->s_time_gran;
> > > > +		ofs->upper_errseq = errseq_sample(&upper_mnt_sb->s_wb_err);
> > > 
> > > I asked this question in last email as well. errseq_sample() will return
> > > 0 if current error has not been seen yet. That means next time a sync
> > > call comes for volatile mount, it will return an error. But that's
> > > not what we want. When we mounted a volatile overlay, if there is an
> > > existing error (seen/unseen), we don't care. We only care if there
> > > is a new error after the volatile mount, right?
> > > 
> > > I guess we will need another helper similar to errseq_smaple() which
> > > just returns existing value of errseq. And then we will have to
> > > do something about errseq_check() to not return an error if "since"
> > > and "eseq" differ only by "seen" bit.
> > > 
> > > Otherwise in current form, volatile mount will always return error
> > > if upperdir has error and it has not been seen by anybody.
> > > 
> > > How did you finally end up testing the error case. Want to simualate
> > > error aritificially and test it.
> > > 

I used the blockdev error injection layer. It only works with ext2, because
ext4 (and other filesystems) will error and go into readonly.

dd if=/dev/zero of=/tmp/loop bs=1M count=100
losetup /dev/loop8 /tmp/loop 
mkfs.ext2 /dev/loop8
mount -o errors=continue /dev/loop8 /mnt/loop/
mkdir -p /mnt/loop/{upperdir,workdir}
mount -t overlay -o volatile,index=off,lowerdir=/root/lowerdir,upperdir=/mnt/loop/upperdir,workdir=/mnt/loop/workdir none /mnt/foo/
echo 1 > /sys/block/loop8/make-it-fail
echo 100 > /sys/kernel/debug/fail_make_request/probability
echo 1 > /sys/kernel/debug/fail_make_request/times
dd if=/dev/zero of=/mnt/foo/zero bs=1M count=1
sync

I tried to get XFS tests working, but I was unable to get a simpler repro than 
above. This is also easy enough to do with a simple kernel module. Maybe it'd be 
neat to be able to inject in errseq increments via the fault injection API one 
day? I have no idea what the VFS's approach here is.

> > 
> > If you don't want to see errors that occurred before you did the mount,
> > then you probably can just resurrect and rename the original version of
> > errseq_sample. Something like this, but with a different name:
> > 
> > +errseq_t errseq_sample(errseq_t *eseq)
> > +{
> > +       errseq_t old = READ_ONCE(*eseq);
> > +       errseq_t new = old;
> > +
> > +       /*
> > +        * For the common case of no errors ever having been set, we can skip
> > +        * marking the SEEN bit. Once an error has been set, the value will
> > +        * never go back to zero.
> > +        */
> > +       if (old != 0) {
> > +               new |= ERRSEQ_SEEN;
> > +               if (old != new)
> > +                       cmpxchg(eseq, old, new);
> > +       }
> > +       return new;
> > +}
> 
> Yes, a helper like this should solve the issue at hand. We are not
> interested in previous errors. This also sets the ERRSEQ_SEEN on 
> sample and it will also solve the other issue when after sampling
> if error gets seen, we don't want errseq_check() to return error.
> 
> Thinking of some possible names for new function.
> 
> errseq_sample_seen()
> errseq_sample_set_seen()
> errseq_sample_consume_unseen()
> errseq_sample_current()
> 
> Thanks
> Vivek
> 

I think we can just replace the code in super.c with:
ofs->upper_errseq = READ_ONCE(&upper_mnt_sb->s_wb_err);

And then add an errseq helper which checks:
int errseq_check_ignore_seen(errseq_t *eseq, errseq_t since)
{
	errseq_t cur = READ_ONCE(*eseq);

	if ((cur == since) || (cur == since | ERRSEQ_SEEN))
		return 0;

	return -(cur & MAX_ERRNO);
}

--- 

This extra (cur == since | ERRSEQ_SEEN) ignores the situation where cur has 
"been seen". We do not want to do the cmpxchg I think because that would hide 
the situation from the user where if they do a syncfs we hide the error from
the user. 

If the since had seen already set, but cur does not have seen set, it means
we've wrapped.

  parent reply	other threads:[~2020-12-02 18:50 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-02  9:27 [PATCH] overlay: Implement volatile-specific fsync error behaviour Sargun Dhillon
2020-12-02 10:25 ` Amir Goldstein
2020-12-02 15:07 ` Vivek Goyal
2020-12-02 17:02   ` Jeff Layton
2020-12-02 17:29     ` Vivek Goyal
2020-12-02 18:22       ` Jeff Layton
2020-12-02 18:56         ` Vivek Goyal
2020-12-02 19:03           ` Sargun Dhillon
2020-12-02 19:26           ` Jeff Layton
2020-12-02 21:34             ` Vivek Goyal
2020-12-02 21:52               ` Jeff Layton
2020-12-03 10:42                 ` Sargun Dhillon
2020-12-03 12:06                   ` Jeff Layton
2020-12-03 14:27                   ` Vivek Goyal
2020-12-03 15:20                     ` Jeff Layton
2020-12-03 17:08                       ` Sargun Dhillon
2020-12-03 17:50                         ` Jeff Layton
2020-12-03 20:43                           ` Vivek Goyal
2020-12-03 21:36                             ` Jeff Layton
2020-12-03 22:24                               ` Vivek Goyal
2020-12-03 23:36                                 ` Jeff Layton
2020-12-04  6:45                                   ` Amir Goldstein
2020-12-04 15:00                                     ` Vivek Goyal
2020-12-03 20:31                       ` Vivek Goyal
2020-12-02 18:49       ` Sargun Dhillon [this message]
2020-12-02 19:10         ` Jeff Layton
2020-12-03 10:36         ` Amir Goldstein
2020-12-02 17:41   ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201202184936.GA17139@ircssh-2.c.rugged-nimbus-611.internal \
    --to=sargun@sargun.me \
    --cc=amir73il@gmail.com \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).