All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/2] xfs: zero length symlinks are not valid
Date: Tue, 19 Jun 2018 12:28:40 -0400	[thread overview]
Message-ID: <20180619162839.GC2806@bfoster> (raw)
In-Reply-To: <20180619154810.GC21698@magnolia>

On Tue, Jun 19, 2018 at 08:48:10AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 19, 2018 at 07:54:34AM -0400, Brian Foster wrote:
> > On Tue, Jun 19, 2018 at 08:42:59AM +1000, Dave Chinner wrote:
> > > On Mon, Jun 18, 2018 at 09:24:11AM -0400, Brian Foster wrote:
> > > > On Mon, Jun 18, 2018 at 03:57:10PM +1000, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > A log recovery failure has been reproduced where a symlink inode has
> > > > > a zero length in extent form. It was caused by a shutdown during a
> > > > > combined fstress+fsmark workload.
> > > > > 
> > > > > The underlying problem is the issue in xfs_inactive_symlink(): the
> > > > > inode is unlocked between the symlink inactivation/truncation and
> > > > > the inode being freed. This opens a window for the inode to be
> > > > > written to disk before it xfs_ifree() removes it from the unlinked
> > > > > list, marks it free in the inobt and zeros the mode.
> > > > > 
> > > > > For shortform inodes, the fix is simple. xfs_ifree() clears the data
> > > > > fork state, so there's no need to do it in xfs_inactive_symlink().
> > > > > This means the shortform fork verifier will not see a zero length
> > > > > data fork as it mirrors the inode size through to xfs_ifree()), and
> > > > > hence if the inode gets written back and the fork verifiers are run
> > > > > they will still see a fork that matches the on-disk inode size.
> > > > > 
> > > > > For extent form (remote) symlinks, it is a little more tricky. Here
> > > > > we explicitly set the inode size to zero, so the above race can lead
> > > > > to zero length symlinks on disk. Because the inode is unlinked at
> > > > > this point (i.e. on the unlinked list) and unreferenced, it can
> > > > > never be seen again by a user. Hence when we set the inode size to
> > > > 
> > > > Perhaps we don't care, but what about bulkstat? The inode free is
> > > > imminent, so it probably doesn't matter that much.
> > > 
> > > Bulkstat users have to deal with all sorts of TOCTOU races because
> > > we do the inode lookup after unlocking the AGI. Hence the inodes
> > > returned are not guaranteed to be allocated or even accessible,
> > > especially if they return nlink == 0...
> > > 
> > > So, yeah, I don't think it matters for bulkstat.
> > > 
> > 
> > Ack.
> > 
> > > > > @@ -523,17 +531,10 @@ xfs_inactive_symlink(
> > > > >  		return -EIO;
> > > > >  
> > > > >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > > > -
> > > > > -	/*
> > > > > -	 * Zero length symlinks _can_ exist.
> > > > > -	 */
> > > > >  	pathlen = (int)ip->i_d.di_size;
> > > > > -	if (!pathlen) {
> > > > > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > -		return 0;
> > > > > -	}
> > > > > +	ASSERT(pathlen);
> > > > 
> > > > Ok, but I still feel like I'm missing something here. The commit log
> > > > explains how symlink inode inactivation creates this transient zero-size
> > > > symlink state. That's fine, but why do we have code to check for such
> > > > symlinks _in_ the symlink activation path?
> > > 
> > > Ah, I put that in the cover letter - the VFS is supposed to ensure
> > > we don't get zero length symlinks passed to us, and the verifiers
> > > are supposed to catch it coming off disk. This is mostly just a
> > > bounds check to document that we don't expect zero length symlinks
> > > at any time, especially in the unlink path. I put the same assert
> > > check in the xfs_symlink() create path, too, to ensure we don't get
> > > passed a zero length from the VFS...
> > > 
> > 
> > Oh, I'm not referring to the ASSERT() here. That all seems fine...
> > 
> > > > Is the assumption that this is just an overzealous/unnecessary check, or
> > > > am I missing something? If the former, an extra sentence in the last
> > > > paragraph or so of the commit log to explain why we remove this would be
> > > > nice.
> > > 
> > > It's a "should never occur" situation but, well, paranoia seeing as
> > > we've got this wrong for so long...
> > > 
> > 
> > Rather, I'm asking about the pre-existing code that we remove. The hunk
> > just above from xfs_inactive_symlink():
> > 
> > -	/*
> > -	 * Zero length symlinks _can_ exist.
> > -	 */
> > -	if (!pathlen) {
> > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > -		return 0;
> > -	}
> > 
> > ... obviously suggests that there could have been a situation where we'd
> > see a zero-length symlink on entry to xfs_inactive_symlink(). This
> > patch, however, focuses on fixing the transient zero-length symlink
> > caused by xfs_inactive_symlink_rmt() (which comes after the above
> > check).
> > 
> > The latter implies we'd never enter the symlink inactivate path with a
> > zero-length symlink, since this is the path that causes it (until the
> > inode is ultimately freed). So my question essentially is do we have any
> > idea why this code exists in the first place? Is it just bogus given
> > there's no other clear vector to a zero-length symlink before the inode
> > is inactivated?
> 
> I think that vector is log recovery.  Say we have a filesystem running
> on an old kernel, and the fs goes down immediately after the transaction
> commit in xfs_inactive_symlink_rmt.  At that exact moment we have a
> zero-size extents format symlink on the unlinked list, and if we boot up
> with a new kernel and go through log recovery we'll try to free this
> inode in xlog_recover_process_one_iunlink.
> 

Ah, Ok. So log recovery will look up the inode and run it through
the inactive sequence again. We presumably remove this code because with
this patch that should no longer happen. Keeping it around doesn't help
us because presumably the verifier would fail before we get to this
point anyways. Makes sense, thanks Darrick.

> The new code that switches i_mode prevents us from dropping more of
> these onto the disk, but we still have old filesystems to deal with.
> FWIW every now and then generic/475 complains in xfs_repair -n about
> such a symlink (dfork format 2 and size 0).
> 

So it sounds like the caveat with this approach then is that we haven't
necessarily provided a recovery path for somebody stuck in this
particular failed recovery state, outside of 'xfs_repair -L'. :/ Unless
I'm misunderstanding the error, that does seem a bit unfortunate given
that otherwise recovery should complete.

Brian

> --D
> 
> > Brian
> > 
> > > > > -	if (pathlen < 0 || pathlen > XFS_SYMLINK_MAXLEN) {
> > > > > +	if (pathlen <= 0 || pathlen > XFS_SYMLINK_MAXLEN) {
> > > > >  		xfs_alert(mp, "%s: inode (0x%llx) bad symlink length (%d)",
> > > > >  			 __func__, (unsigned long long)ip->i_ino, pathlen);
> > > > >  		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > @@ -541,12 +542,12 @@ xfs_inactive_symlink(
> > > > >  		return -EFSCORRUPTED;
> > > > >  	}
> > > > >  
> > > > > +	/*
> > > > > +	 * Inline fork state gets removed by xfs_difree() so we have nothing to
> > > > > +	 * do here in that case.
> > > > > +	 */
> > > > 
> > > > It looks like that actually happens in xfs_ifree() (not xfs_difree()).
> > > 
> > > Yup, will fix.
> > > 
> > > Cheers,
> > > 
> > > Dave.
> > > -- 
> > > Dave Chinner
> > > david@fromorbit.com
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-06-19 16:28 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-18  5:57 [PATCH 0/2] xfs: symlink and inode writeback issues Dave Chinner
2018-06-18  5:57 ` [PATCH 1/2] xfs: zero length symlinks are not valid Dave Chinner
2018-06-18 13:24   ` Brian Foster
2018-06-18 22:42     ` Dave Chinner
2018-06-19 11:54       ` Brian Foster
2018-06-19 15:48         ` Darrick J. Wong
2018-06-19 16:28           ` Brian Foster [this message]
2018-06-19 23:22             ` Dave Chinner
2018-06-20 11:50               ` Brian Foster
2018-06-20 22:59                 ` Dave Chinner
2018-06-21 11:46                   ` Brian Foster
2018-06-21 22:31                     ` Dave Chinner
2018-06-21 22:53                       ` Darrick J. Wong
2018-06-22 10:44                         ` Brian Foster
2018-06-23 17:38                           ` Darrick J. Wong
2018-06-18  5:57 ` [PATCH 2/2] xfs: xfs_iflush_abort() can be called twice on cluster writeback failure Dave Chinner
2018-06-18 13:24   ` Brian Foster
2018-06-19  5:05   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180619162839.GC2806@bfoster \
    --to=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.