All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2] xfs: fix iclog release error check race with shutdown
Date: Wed, 19 Feb 2020 08:12:32 -0500	[thread overview]
Message-ID: <20200219131232.GA24157@bfoster> (raw)
In-Reply-To: <20200218215243.GS10776@dread.disaster.area>

On Wed, Feb 19, 2020 at 08:52:43AM +1100, Dave Chinner wrote:
> On Tue, Feb 18, 2020 at 12:54:25PM -0500, Brian Foster wrote:
> > Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
> > l_icloglock held"), xlog_state_release_iclog() always performed a
> > locked check of the iclog error state before proceeding into the
> > sync state processing code. As of this commit, part of
> > xlog_state_release_iclog() was open-coded into
> > xfs_log_release_iclog() and as a result the locked error state check
> > was lost.
> > 
> > The lockless check still exists, but this doesn't account for the
> > possibility of a race with a shutdown being performed by another
> > task causing the iclog state to change while the original task waits
> > on ->l_icloglock. This has reproduced very rarely via generic/475
> > and manifests as an assert failure in __xlog_state_release_iclog()
> > due to an unexpected iclog state.
> > 
> > Restore the locked error state check in xlog_state_release_iclog()
> > to ensure that an iclog state update via shutdown doesn't race with
> > the iclog release state processing code.
> > 
> > Fixes: df732b29c807 ("xfs: call xlog_state_release_iclog with l_icloglock held")
> > Reported-by: Zorro Lang <zlang@redhat.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > v2:
> > - Include Fixes tag.
> > - Use common error path to include shutdown call.
> > v1: https://lore.kernel.org/linux-xfs/20200214181528.24046-1-bfoster@redhat.com/
> > 
> >  fs/xfs/xfs_log.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index f6006d94a581..796ff37d5bb5 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -605,18 +605,23 @@ xfs_log_release_iclog(
> >  	struct xlog		*log = mp->m_log;
> >  	bool			sync;
> >  
> > -	if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > -		xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> > -		return -EIO;
> > -	}
> 
> hmmmm.
> 
> xfs_force_shutdown() will does nothing if the iclog at the head of
> the log->iclog list is marked with XLOG_STATE_IOERROR before IO is
> submitted. In general, that is the case here as the head iclog is
> what xlog_state_get_iclog_space() returns.
> 
> i.e. XLOG_STATE_IOERROR here implies the log has already been shut
> down because the state is IOERROR rather than ACTIVE or WANT_SYNC as
> was set when the iclog was obtained from
> xlog_state_get_iclog_space().
> 

I'm not sure that assumption is always true. If the iclog is (was)
WANT_SYNC, for example, then it's already been switched out from the
head of that list. That might only happen if a checkpoint happens to
align nicely with the end of an iclog, but that's certainly possible. We
also need to consider that ->l_icloglock cycles here and thus somebody
else could switch out the head iclog..

> > +	if (iclog->ic_state == XLOG_STATE_IOERROR)
> > +		goto error;
> >  
> >  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> > +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > +			spin_unlock(&log->l_icloglock);
> > +			goto error;
> > +		}
> >  		sync = __xlog_state_release_iclog(log, iclog);
> >  		spin_unlock(&log->l_icloglock);
> >  		if (sync)
> >  			xlog_sync(log, iclog);
> >  	}
> >  	return 0;
> > +error:
> > +	xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> > +	return -EIO;
> 
> Hence I suspect that this should be ASSERT(XLOG_FORCED_SHUTDOWN(log))
> just like is in xfs_log_force_umount() when this pre-existing log
> IO error condition is tripped over...
> 

I think this change is fundamentally correct based on the simple fact
that we only ever set XLOG_STATE_IOERROR in the shutdown path. That
said, the assert would be potentially racy against the shutdown path
without any ordering guarantee that the release path might see the iclog
state update prior to the log state update and lead to a potentially
false negative assert failure. I suspect this is why the shutdown call
might have been made from here in the first place (and partly why I
didn't bother with it in the restored locked state check).

Given the context of this patch (fixing a regression) and the practical
history of this code (and the annoying process of identifying and
tracking down the source of these kind of shutdown buglets), I'm
inclined to have this patch preserve the historical and relatively
proven behavior as much as possible and consider further cleanups
separately...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


  parent reply	other threads:[~2020-02-19 13:12 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-18 17:54 [PATCH v2] xfs: fix iclog release error check race with shutdown Brian Foster
2020-02-18 21:00 ` Christoph Hellwig
2020-02-18 21:52 ` Dave Chinner
2020-02-18 22:36   ` Christoph Hellwig
2020-02-19  3:00     ` Dave Chinner
2020-02-19 18:46       ` Christoph Hellwig
2020-02-19 13:12   ` Brian Foster [this message]
2020-02-19 21:51     ` Darrick J. Wong
2020-02-20 12:41       ` Brian Foster
2020-02-20 15:43         ` Christoph Hellwig
2020-02-20 16:02           ` Brian Foster
2020-02-20  3:32     ` Dave Chinner
2020-02-19 21:21 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200219131232.GA24157@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.