All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: sandeen@sandeen.net, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/3] xfs_repair: set NEEDSREPAIR the first time we write to a filesystem
Date: Thu, 18 Feb 2021 09:07:17 -0800	[thread overview]
Message-ID: <20210218170717.GU7193@magnolia> (raw)
In-Reply-To: <20210218125927.GA685651@bfoster>

On Thu, Feb 18, 2021 at 07:59:27AM -0500, Brian Foster wrote:
> On Wed, Feb 17, 2021 at 08:45:22PM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 16, 2021 at 06:55:34AM -0500, Brian Foster wrote:
> > > On Fri, Feb 12, 2021 at 09:46:50PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Add a hook to the buffer cache so that xfs_repair can intercept the
> > > > first write to a V5 filesystem to set the NEEDSREPAIR flag.  In the
> > > > event that xfs_repair dirties the filesystem and goes down, this ensures
> > > > that the sysadmin will have to re-start repair before mounting.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  include/xfs_mount.h |    4 ++
> > > >  libxfs/rdwr.c       |    4 ++
> > > >  repair/xfs_repair.c |   95 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 103 insertions(+)
> > > > 
> > > > 
> > > ...
> > > > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > > > index 90d1a95a..12e319ae 100644
> > > > --- a/repair/xfs_repair.c
> > > > +++ b/repair/xfs_repair.c
> ...
> > > > @@ -751,6 +753,95 @@ clear_needsrepair(
> > > >  		libxfs_buf_relse(bp);
> > > >  }
> > > >  
> > > > +static void
> > > > +update_sb_crc_only(
> > > > +	struct xfs_buf		*bp)
> > > > +{
> > > > +	xfs_buf_update_cksum(bp, XFS_SB_CRC_OFF);
> > > > +}
> > > > +
> > > > +/* Forcibly write the primary superblock with the NEEDSREPAIR flag set. */
> > > > +static void
> > > > +force_needsrepair(
> > > > +	struct xfs_mount	*mp)
> > > > +{
> > > > +	struct xfs_buf_ops	fake_ops;
> > > > +	struct xfs_buf		*bp;
> > > > +	int			error;
> > > > +
> > > > +	if (!xfs_sb_version_hascrc(&mp->m_sb) ||
> > > > +	    xfs_sb_version_needsrepair(&mp->m_sb))
> > > > +		return;
> > > > +
> > > > +	bp = libxfs_getsb(mp);
> > > > +	if (!bp || bp->b_error) {
> > > > +		do_log(
> > > > +	_("couldn't get superblock to set needsrepair, err=%d\n"),
> > > > +				bp ? bp->b_error : ENOMEM);
> > > > +		return;
> > > > +	} else {
> > > > +		/*
> > > > +		 * It's possible that we need to set NEEDSREPAIR before we've
> > > > +		 * had a chance to fix the summary counters in the primary sb.
> > > > +		 * With the exception of those counters, phase 1 already
> > > > +		 * ensured that the geometry makes sense.
> > > > +		 *
> > > > +		 * Bad summary counters in the primary super can cause the
> > > > +		 * write verifier to fail, so substitute a dummy that only sets
> > > > +		 * the CRC.  In the event of a crash, NEEDSREPAIR will prevent
> > > > +		 * the kernel from mounting our potentially damaged filesystem
> > > > +		 * until repair is run again, so it's ok to bypass the usual
> > > > +		 * verification in this one case.
> > > > +		 */
> > > > +		fake_ops = xfs_sb_buf_ops; /* struct copy */
> > > > +		fake_ops.verify_write = update_sb_crc_only;
> > > > +
> > > > +		mp->m_sb.sb_features_incompat |=
> > > > +				XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> > > > +		libxfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > > > +
> > > > +		/* Force the primary super to disk immediately. */
> > > > +		bp->b_ops = &fake_ops;
> > > > +		error = -libxfs_bwrite(bp);
> > > 
> > > This looks like it calls back into the writeback hook before it's been
> > > cleared.
> > 
> > Yes, it does...
> > 
> > > I'm guessing the xfs_sb_version_needsrepair() check above cuts
> > > off any recursion issues,
> > 
> > ...but as soon as we end up in the writeback hook, the b_ops and
> > XFS_SB_DADDR checks cause the hook to return without doing anything.
> > 
> > > but it might be cleaner to clear the callback pointer first.
> > 
> > We need to block all other writers until /after/ we've written the
> > primary super, which means that we can't clear the callback until some
> > time after the bwrite.
> > 
> 
> Ok, care to document how this is intended to work with a comment?

Will do, though I'll put the comments about the hook in the actual hook
function:

	/*
	 * This write hook ignores any buffer that looks like a
	 * superblock to avoid hook recursion when setting NEEDSREPAIR.
	 * Higher level code modifying an sb must control the flag
	 * manually.
	 */

	if (bp->b_ops == &xfs_sb_buf_ops || bp->b_bn == XFS_SB_DADDR)
		return;

	pthread_mutex_lock(&wb_mutex);

	/*
	 * If someone else already dropped the hook, then needsrepair
	 * has already been set on the filesystem and we can unlock.
	 */
	if (mp->m_buf_writeback_fn != repair_capture_writeback)
		goto unlock;

	/*
	 * If we get here, the buffer being written is not a superblock,
	 * and needsrepair needs to be set.  The hook is kept in place
	 * to plug all other writes until the sb write finishes.
	 */
	force_needsrepair(mp);
	mp->m_buf_writeback_fn = NULL;

--D

> Brian
> 
> > > 
> > > > +		bp->b_ops = &xfs_sb_buf_ops;
> > > > +		if (error)
> > > > +			do_log(_("couldn't force needsrepair, err=%d\n"), error);
> > > > +	}
> > > > +	if (bp)
> > > > +		libxfs_buf_relse(bp);
> > > 
> > > Doesn't appear we can get here with bp == NULL. Otherwise the rest looks
> > > reasonable to me.
> > 
> > Yep, will fix.
> > 
> > --D
> > 
> > > Brian
> > > 
> > > > +}
> > > > +
> > > > +/* Intercept the first write to the filesystem so we can set NEEDSREPAIR. */
> > > > +static void
> > > > +repair_capture_writeback(
> > > > +	struct xfs_buf		*bp)
> > > > +{
> > > > +	struct xfs_mount	*mp = bp->b_mount;
> > > > +	static pthread_mutex_t	wb_mutex = PTHREAD_MUTEX_INITIALIZER;
> > > > +
> > > > +	/* Higher level code modifying a superblock must set NEEDSREPAIR. */
> > > > +	if (bp->b_ops == &xfs_sb_buf_ops || bp->b_bn == XFS_SB_DADDR)
> > > > +		return;
> > > > +
> > > > +	pthread_mutex_lock(&wb_mutex);
> > > > +
> > > > +	/*
> > > > +	 * If someone else already dropped the hook, then needsrepair has
> > > > +	 * already been set on the filesystem and we can unlock.
> > > > +	 */
> > > > +	if (mp->m_buf_writeback_fn != repair_capture_writeback)
> > > > +		goto unlock;
> > > > +
> > > > +	/*
> > > > +	 * If we get here, the buffer being written is not a superblock, and
> > > > +	 * needsrepair needs to be set.
> > > > +	 */
> > > > +	force_needsrepair(mp);
> > > > +	mp->m_buf_writeback_fn = NULL;
> > > > +unlock:
> > > > +	pthread_mutex_unlock(&wb_mutex);
> > > > +}
> > > > +
> > > >  int
> > > >  main(int argc, char **argv)
> > > >  {
> > > > @@ -847,6 +938,10 @@ main(int argc, char **argv)
> > > >  	if (verbose > 2)
> > > >  		mp->m_flags |= LIBXFS_MOUNT_WANT_CORRUPTED;
> > > >  
> > > > +	/* Capture the first writeback so that we can set needsrepair. */
> > > > +	if (xfs_sb_version_hascrc(&mp->m_sb))
> > > > +		mp->m_buf_writeback_fn = repair_capture_writeback;
> > > > +
> > > >  	/*
> > > >  	 * set XFS-independent status vars from the mount/sb structure
> > > >  	 */
> > > > 
> > > 
> > 
> 

  reply	other threads:[~2021-02-18 18:48 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-13  5:46 [PATCHSET 0/3] xfs_repair: set needsrepair when dirtying filesystems Darrick J. Wong
2021-02-13  5:46 ` [PATCH 1/3] xfs_repair: set NEEDSREPAIR the first time we write to a filesystem Darrick J. Wong
2021-02-16 11:55   ` Brian Foster
2021-02-18  4:45     ` Darrick J. Wong
2021-02-18 12:59       ` Brian Foster
2021-02-18 17:07         ` Darrick J. Wong [this message]
2021-02-13  5:46 ` [PATCH 2/3] libxfs: simulate system failure after a certain number of writes Darrick J. Wong
2021-02-16 11:56   ` Brian Foster
2021-02-18  4:36     ` Darrick J. Wong
2021-02-18 13:02       ` Brian Foster
2021-02-18 17:42         ` Darrick J. Wong
2021-02-13  5:47 ` [PATCH 3/3] xfs_repair: add post-phase error injection points Darrick J. Wong
2021-02-16 11:58   ` Brian Foster
2021-02-18  4:47     ` Darrick J. Wong
2021-02-18 13:02       ` Brian Foster
2021-02-18 18:01         ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210218170717.GU7193@magnolia \
    --to=djwong@kernel.org \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.