Re: [PATCH] xfs: call xfs_idestroy_fork() in xfs_ilock() critical section

From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Waiman Long <Waiman.Long@hp.com>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: call xfs_idestroy_fork() in xfs_ilock() critical section
Date: Fri, 24 Apr 2015 08:08:23 +1000	[thread overview]
Message-ID: <20150423220823.GJ15810@dastard> (raw)
In-Reply-To: <20150423122149.GA13131@bfoster.bfoster>

On Thu, Apr 23, 2015 at 08:21:50AM -0400, Brian Foster wrote:
> On Thu, Apr 23, 2015 at 09:17:58AM +1000, Dave Chinner wrote:
> > @@ -410,11 +418,12 @@ xfs_attr_inactive(xfs_inode_t *dp)
> >  	 */
> >  	trans = xfs_trans_alloc(mp, XFS_TRANS_ATTRINVAL);
> >  	error = xfs_trans_reserve(trans, &M_RES(mp)->tr_attrinval, 0, 0);
> > -	if (error) {
> > -		xfs_trans_cancel(trans, 0);
> > -		return error;
> > -	}
> > -	xfs_ilock(dp, XFS_ILOCK_EXCL);
> > +	if (error)
> > +		goto out_cancel;
> > +
> 
> The error path expects a locked inode, but it isn't here.

Right, xfs/181 tripped that. I've fixed it in my current version ;)

> 
> > +	lock_mode = XFS_ILOCK_EXCL;
> > +	cancel_flags = XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT;
> > +	xfs_ilock(dp, lock_mode);
> >  
> >  	/*
> >  	 * No need to make quota reservations here. We expect to release some
> > @@ -423,28 +432,36 @@ xfs_attr_inactive(xfs_inode_t *dp)
> >  	xfs_trans_ijoin(trans, dp, 0);
> >  
> >  	/*
> > -	 * Decide on what work routines to call based on the inode size.
> > +	 * It's unlikely we've raced with an attribute fork creation, but check
> > +	 * anyway just in case.
> >  	 */
> > -	if (!xfs_inode_hasattr(dp) ||
> > -	    dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > -		error = 0;
> > -		goto out;
> > +	if (!XFS_IFORK_Q(dp))
> > +		goto out_cancel;
> 
> What about attribute fork creation would cause di_forkoff == 0 if that
> wasn't the case above? Do you mean to say a potential race with
> attribute fork destruction?

atrtibute fork creation will never leave di_forkoff == 0. See
xfs_attr_shortform_bytesfit() as a guideline for the min/max fork
offset at attribute fork creation time.

The race I'm talking about is the fact we check for an attr fork,
then drop the lock, do the trans reserve and then grab the lock
again. The inode could have changed in that time, so we need to
check again. It's extremely unlikely that the inode has changed due
to the fact it is in the ->evict path and can't be referenced by the
VFS again until it's in a reclaimable state. Hence it is only
internal filesystem stuff that could modify it, which I don't think
can happen. So, leave the check, mark the race as unlikely to occur.

> > +	/* invalidate and truncate the attribute fork extents */
> > +	if (dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) {
> > +		error = xfs_attr3_root_inactive(&trans, dp);
> > +		if (error)
> > +			goto out_cancel;
> > +
> > +		error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
> > +		if (error)
> > +			goto out_cancel;
> >  	}
> > -	error = xfs_attr3_root_inactive(&trans, dp);
> > -	if (error)
> > -		goto out;
> >  
> > -	error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
> > -	if (error)
> > -		goto out;
> > +	/* Reset the attribute fork - this also destroys the in-core fork */
> > +	xfs_attr_fork_reset(dp, trans);
> >  
> >  	error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
> > -	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> > -
> > +	xfs_iunlock(dp, lock_mode);
> >  	return error;
> >  
> > -out:
> > -	xfs_trans_cancel(trans, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
> > -	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> > +out_cancel:
> > +	xfs_trans_cancel(trans, cancel_flags);
> > +out_destroy_fork:
> > +	/* kill the in-core attr fork before we drop the inode lock */
> > +	if (dp->i_afp)
> > +		xfs_idestroy_fork(dp, XFS_ATTR_FORK);
> > +	xfs_iunlock(dp, lock_mode);
> 
> I wonder if a warning or some kind of notification is appropriate here.
> If we get to this point, we're removing an inode potentially without
> having freed attr fork blocks and thus leaving them permanently
> unreferenced, yes?

We end up leaving the inode on the unlinked list because we abort
the inactivation on error. The in-core inode still gets reclaimed
properly, but it's now up to log recovery to re-run inactivation to
try to free the inode or xfs_repair to cleanit up.  Either way, it's
safe just to leave the inode where it is on the unlinked list - it's
free and not getting in the way, so IMO warnings at this point don't
serve any useful purpose...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com