linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: Christoph Hellwig <hch@lst.de>, Brian Foster <bfoster@redhat.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 08/11] xfs_repair: allow setting the needsrepair flag
Date: Thu, 11 Feb 2021 16:17:31 -0800
Message-ID: <20210212001731.GH7193@magnolia>
In-Reply-To: <2e135dfe-9be6-b5f9-7c06-a10e6e45e3da@sandeen.net>

On Thu, Feb 11, 2021 at 05:29:05PM -0600, Eric Sandeen wrote:
> On 2/11/21 4:59 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Quietly set up the ability to tell xfs_repair to set NEEDSREPAIR at
> > program start and (presumably) clear it by the end of the run.  This
> > code isn't terribly useful to users; it's mainly here so that fstests
> > can exercise the functionality.  We don't document this flag in the
> > manual pages at all because repair clears needsrepair at exit, which
> > means the knobs only exist for fstests.
> > 
> > Note that we can't do any of these upgrades until we've at least done a
> > preliminary scan of the primary super and the log.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> 
> I'm still a little on the fence about the cmdline option for crashing
> repair at a certain point from the POV that Brian kind of pointed out
> that this doesn't exactly scale as we need more hooks.

(That's in the next patch.)

> but
> 
> ehhhh it's a test-only undocumented option and I guess we could change
> it later if desired
> 
> we do have other debug options on the commandline already as well....

I don't mind moving the debugging hooks to be seekrit environment
variables or something, but I don't think I've quite addressed some of
Brian's comments from last time:

[paste in stuff Brian said]

> But is it worth maintaining test specific debug logic in an
> application just to confirm that particular feature bit upgrades
> actually set the bit?

I argue that yes, this is important enough to burn a debugging knob.
The sequence that I think we should prevent through testing is the one
where we've set the new feature on the primary super but we haven't
finished generating whatever new metadata is needed to complete the
upgrade, the system crashes, and on remount the verifiers explode.

Chances are pretty good that we'll get an angry bug report on the
mailing list: "I upgraded my fs, the power went down, and the kernel
sprayed corruption everywhere!"  If we get a customer escalation like
this, I'd /much/ rather it be about not being able to mount right after
the reboot than a latent corruption that grows unseen until somebody's
filesystem loses data.

If a future patch to repair accidentally breaks the behavior where we
set NEEDSREPAIR at the same time as we set the new feature and flush the
super to disk, we cannot tell that there's been a regression in this
safety mechanism just by looking at the output of an otherwise
successful xfs_repair run...

> It seems sufficient to me to test that needsrepair functionality works
> as expected and that individual feature upgrade works as well.

...so in other words, we need some point to inject an error to make sure
that the upgrade interlock is correct.
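To make the argument concrete, here is a toy model of an upgrade with three on-disk steps. It is not xfs_repair code, and every name and bit value in it is hypothetical. The steps are:

1. Write the superblock with the new feature bit.
2. Write the new metadata.
3. Clear NEEDSREPAIR.

The model injects a crash after each step and checks what is left on disk. With the interlock, no crash point leaves a mountable filesystem with half-built metadata. A regressed version that never sets NEEDSREPAIR ends in exactly the same final state on a clean run, so only the injected failures can tell the two apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, NOT the real XFS_SB_FEAT_INCOMPAT_* constants. */
#define MODEL_FEAT_NEW		(1u << 0)
#define MODEL_FEAT_NEEDSREPAIR	(1u << 1)

/* What a crash leaves behind on "disk". */
struct model_disk {
	uint32_t	incompat;	/* primary super incompat bits */
	bool		metadata_done;	/* new metadata fully written */
};

/*
 * Model an upgrade that "crashes" once @steps_done steps have reached disk.
 * With @interlock, step 1 writes the feature bit and NEEDSREPAIR in the same
 * superblock write; without it (the regression), NEEDSREPAIR is never set.
 */
static struct model_disk
model_upgrade(int steps_done, bool interlock)
{
	struct model_disk d = { 0 };

	assert(steps_done >= 0);
	if (steps_done >= 1)
		d.incompat |= MODEL_FEAT_NEW |
			      (interlock ? MODEL_FEAT_NEEDSREPAIR : 0);
	if (steps_done >= 2)
		d.metadata_done = true;
	if (steps_done >= 3)
		d.incompat &= ~MODEL_FEAT_NEEDSREPAIR;
	return d;
}

/* Safe: feature off, metadata complete, or mount blocked by NEEDSREPAIR. */
static bool
model_disk_is_safe(struct model_disk d)
{
	return !(d.incompat & MODEL_FEAT_NEW) || d.metadata_done ||
	       (d.incompat & MODEL_FEAT_NEEDSREPAIR);
}

/* Inject a crash at every point and report whether all states are safe. */
static bool
model_all_crash_points_safe(bool interlock)
{
	for (int steps = 0; steps <= 3; steps++)
		if (!model_disk_is_safe(model_upgrade(steps, interlock)))
			return false;
	return true;
}
```

Here `model_all_crash_points_safe(true)` returns true and `model_all_crash_points_safe(false)` returns false. Even so, `model_upgrade(3, true)` and `model_upgrade(3, false)` produce identical disk states, which is the "otherwise successful run" case above.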

> Given the discussion on patch 7, perhaps it makes more sense to at
> least defer this sort of injection mechanism until we have a scheme
> for generic needsrepair usage worked out for xfs_repair?

I'm in the midst of prototyping what I said in the last thread --
hooking the buffer cache so that repair can catch the first time we
actually write anything to the filesystem, and using that to set
NEEDSREPAIR.  I've not run it through full fstests yet, but AFAICT I can
keep using the same tests and the same injection knobs I already wrote.
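One way to picture the buffer cache hook described above is a write-path wrapper that runs a callback exactly once, before the first buffer write goes out. This is a hypothetical sketch: `struct write_hook`, `hooked_buf_write()` and the example callbacks are illustrative names, not libxfs interfaces.

The `in_hook` guard lets the callback write the superblock through the same path without recursing. If the callback fails, the wrapper refuses the write and will run the hook again next time. That way no metadata write can reach the disk ahead of NEEDSREPAIR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback run before the first write, e.g. "set NEEDSREPAIR, flush super". */
typedef int (*first_write_fn)(void *priv);

/* Hypothetical hook state; not a libxfs structure. */
struct write_hook {
	first_write_fn	fn;
	void		*priv;
	bool		fired;		/* callback has completed successfully */
	bool		in_hook;	/* callback is running right now */
};

/*
 * Write @buf via @do_write, but run the hook first if it has not yet
 * succeeded.  Writes issued by the hook itself pass straight through.
 */
static int
hooked_buf_write(struct write_hook *hook, int (*do_write)(void *), void *buf)
{
	assert(hook != NULL && hook->fn != NULL && do_write != NULL);

	if (!hook->fired && !hook->in_hook) {
		int	error;

		hook->in_hook = true;
		error = hook->fn(hook->priv);
		hook->in_hook = false;
		if (error)
			return error;	/* refuse to write metadata unflagged */
		hook->fired = true;
	}
	return do_write(buf);
}

/* Example callbacks for illustration only: count calls, record ordering. */
static int nr_hook_calls;
static int nr_writes;
static bool hook_ran_before_first_write;

static int
example_set_needsrepair(void *priv)
{
	(void)priv;
	hook_ran_before_first_write = (nr_writes == 0);
	nr_hook_calls++;
	return 0;
}

static int
example_write(void *buf)
{
	(void)buf;
	nr_writes++;
	return 0;
}
```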

> I am wondering if there's a way to make repair fail without requiring
> additional code, but if not and we do require some sort of injection
> mode, I suspect we might end up better served by something more
> generic (i.e. capable of failures at random points) rather than
> defining a command line option specifically for a particular fstest..

Probably yes, but ... uh I don't want this to drag on into building a
generic error injection framework for userspace.
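For comparison, the environment-variable alternative mentioned earlier can stay very small and still avoid adding a command-line option. This is a minimal sketch. It assumes a hypothetical variable named `XFS_REPAIR_DEBUG_CRASH_AT` whose value names a single injection point; xfs_repair does not read any such variable.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug knob name; not a real xfs_repair variable. */
#define CRASH_KNOB_ENV	"XFS_REPAIR_DEBUG_CRASH_AT"

/* Return true if the knob names @point as the place to stop dead. */
static bool
debug_crash_requested(const char *point)
{
	const char	*want = getenv(CRASH_KNOB_ENV);

	assert(point != NULL);
	return want != NULL && strcmp(want, point) == 0;
}
```

Repair code could then call something like `if (debug_crash_requested("after_needsrepair")) exit(1);` right after the primary superblock write. The point name is also made up. Because each point is just a string, adding a hook does not grow the option table, which speaks to the scaling concern.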

I would /really/ like to get inobtcount/bigtime tests into the kernel
without a giant detour, since they have nearly zero test coverage from
the wider community.

--D

> 
> > ---
> >  repair/globals.c    |    2 ++
> >  repair/globals.h    |    2 ++
> >  repair/phase2.c     |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  repair/xfs_repair.c |    9 +++++++
> >  4 files changed, 76 insertions(+)
> > 
> > 
> > diff --git a/repair/globals.c b/repair/globals.c
> > index 110d98b6..699a96ee 100644
> > --- a/repair/globals.c
> > +++ b/repair/globals.c
> > @@ -49,6 +49,8 @@ int	rt_spec;		/* Realtime dev specified as option */
> >  int	convert_lazy_count;	/* Convert lazy-count mode on/off */
> >  int	lazy_count;		/* What to set if to if converting */
> >  
> > +bool	add_needsrepair;	/* forcibly set needsrepair while repairing */
> > +
> >  /* misc status variables */
> >  
> >  int	primary_sb_modified;
> > diff --git a/repair/globals.h b/repair/globals.h
> > index 1d397b35..043b3e8e 100644
> > --- a/repair/globals.h
> > +++ b/repair/globals.h
> > @@ -90,6 +90,8 @@ extern int	rt_spec;		/* Realtime dev specified as option */
> >  extern int	convert_lazy_count;	/* Convert lazy-count mode on/off */
> >  extern int	lazy_count;		/* What to set if to if converting */
> >  
> > +extern bool	add_needsrepair;
> > +
> >  /* misc status variables */
> >  
> >  extern int		primary_sb_modified;
> > diff --git a/repair/phase2.c b/repair/phase2.c
> > index 952ac4a5..9a8d42e1 100644
> > --- a/repair/phase2.c
> > +++ b/repair/phase2.c
> > @@ -131,6 +131,63 @@ zero_log(
> >  		libxfs_max_lsn = log->l_last_sync_lsn;
> >  }
> >  
> > +static bool
> > +set_needsrepair(
> > +	struct xfs_mount	*mp)
> > +{
> > +	if (!xfs_sb_version_hascrc(&mp->m_sb)) {
> > +		printf(
> > +	_("needsrepair flag only supported on V5 filesystems.\n"));
> > +		exit(0);
> > +	}
> > +
> > +	if (xfs_sb_version_needsrepair(&mp->m_sb)) {
> > +		printf(_("Filesystem already marked as needing repair.\n"));
> > +		exit(0);
> > +	}
> > +
> > +	printf(_("Marking filesystem in need of repair.\n"));
> > +	mp->m_sb.sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
> > +	return true;
> > +}
> > +
> > +/* Perform the user's requested upgrades on filesystem. */
> > +static void
> > +upgrade_filesystem(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_buf		*bp;
> > +	bool			dirty = false;
> > +	int			error;
> > +
> > +	if (add_needsrepair)
> > +		dirty |= set_needsrepair(mp);
> > +
> > +	if (no_modify || !dirty)
> > +		return;
> > +
> > +	bp = libxfs_getsb(mp);
> > +	if (!bp || bp->b_error) {
> > +		do_error(
> > +	_("couldn't get superblock for feature upgrade, err=%d\n"),
> > +				bp ? bp->b_error : ENOMEM);
> > +	} else {
> > +		libxfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > +
> > +		/*
> > +		 * Write the primary super to disk immediately so that
> > +		 * needsrepair will be set if repair doesn't complete.
> > +		 */
> > +		error = -libxfs_bwrite(bp);
> > +		if (error)
> > +			do_error(
> > +	_("filesystem feature upgrade failed, err=%d\n"),
> > +					error);
> > +	}
> > +	if (bp)
> > +		libxfs_buf_relse(bp);
> > +}
> > +
> >  /*
> >   * ok, at this point, the fs is mounted but the root inode may be
> >   * trashed and the ag headers haven't been checked.  So we have
> > @@ -235,4 +292,10 @@ phase2(
> >  				do_warn(_("would correct\n"));
> >  		}
> >  	}
> > +
> > +	/*
> > +	 * Upgrade the filesystem now that we've done a preliminary check of
> > +	 * the superblocks, the AGs, the log, and the metadata inodes.
> > +	 */
> > +	upgrade_filesystem(mp);
> >  }
> > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > index 90d1a95a..a613505f 100644
> > --- a/repair/xfs_repair.c
> > +++ b/repair/xfs_repair.c
> > @@ -65,11 +65,13 @@ static char *o_opts[] = {
> >   */
> >  enum c_opt_nums {
> >  	CONVERT_LAZY_COUNT = 0,
> > +	CONVERT_NEEDSREPAIR,
> >  	C_MAX_OPTS,
> >  };
> >  
> >  static char *c_opts[] = {
> >  	[CONVERT_LAZY_COUNT]	= "lazycount",
> > +	[CONVERT_NEEDSREPAIR]	= "needsrepair",
> >  	[C_MAX_OPTS]		= NULL,
> >  };
> >  
> > @@ -302,6 +304,13 @@ process_args(int argc, char **argv)
> >  					lazy_count = (int)strtol(val, NULL, 0);
> >  					convert_lazy_count = 1;
> >  					break;
> > +				case CONVERT_NEEDSREPAIR:
> > +					if (!val)
> > +						do_abort(
> > +		_("-c needsrepair requires a parameter\n"));
> > +					if (strtol(val, NULL, 0) == 1)
> > +						add_needsrepair = true;
> > +					break;
> >  				default:
> >  					unknown('c', val);
> >  					break;
> > 
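The `-c needsrepair=<n>` handling above relies on the `c_opts[]` table being passed to POSIX `getsubopt()`. Below is a standalone sketch of that parsing. It assumes `getsubopt()` drives the loop in `process_args()`. Only the table and the `CONVERT_NEEDSREPAIR` case come from the patch. `parse_c_opts()` is hypothetical, and it returns false where the patch would call `do_abort()` or `unknown()`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum c_opt_nums {
	CONVERT_LAZY_COUNT = 0,
	CONVERT_NEEDSREPAIR,
	C_MAX_OPTS,
};

static char *c_opts[] = {
	[CONVERT_LAZY_COUNT]	= "lazycount",
	[CONVERT_NEEDSREPAIR]	= "needsrepair",
	[C_MAX_OPTS]		= NULL,
};

/*
 * Parse a -c suboption string such as "needsrepair=1".  Returns false on a
 * missing value or an unknown suboption; sets *add_needsrepair only for "=1".
 */
static bool
parse_c_opts(char *arg, bool *add_needsrepair)
{
	char	*p = arg;
	char	*val;

	assert(arg != NULL && add_needsrepair != NULL);
	while (*p != '\0') {
		switch (getsubopt(&p, c_opts, &val)) {
		case CONVERT_LAZY_COUNT:
			break;			/* ignored in this sketch */
		case CONVERT_NEEDSREPAIR:
			if (!val)
				return false;	/* patch: do_abort() */
			if (strtol(val, NULL, 0) == 1)
				*add_needsrepair = true;
			break;
		default:
			return false;		/* patch: unknown('c', val) */
		}
	}
	return true;
}
```

`getsubopt()` modifies its input, so callers must pass a writable buffer such as `char arg[] = "needsrepair=1";` rather than a string literal. As in the patch, any value other than 1 (for example `needsrepair=0`) is accepted and silently leaves the flag unset.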

Thread overview: 22+ messages
2021-02-11 22:59 [PATCHSET v5 00/11] xfs: add the ability to flag a fs for repair Darrick J. Wong
2021-02-11 22:59 ` [PATCH 01/11] xfs_admin: clean up string quoting Darrick J. Wong
2021-02-11 23:08   ` Chaitanya Kulkarni
2021-02-11 22:59 ` [PATCH 02/11] xfs_admin: support filesystems with realtime devices Darrick J. Wong
2021-02-11 23:08   ` Chaitanya Kulkarni
2021-02-11 22:59 ` [PATCH 03/11] xfs_db: report the needsrepair flag in check and version commands Darrick J. Wong
2021-02-11 22:59 ` [PATCH 04/11] xfs_db: don't allow label/uuid setting if the needsrepair flag is set Darrick J. Wong
2021-02-11 22:59 ` [PATCH 05/11] xfs_repair: fix unmount error message to have a newline Darrick J. Wong
2021-02-11 23:12   ` Chaitanya Kulkarni
2021-02-11 22:59 ` [PATCH 06/11] xfs_repair: clear quota CHKD flags on the incore superblock too Darrick J. Wong
2021-02-11 22:59 ` [PATCH 07/11] xfs_repair: clear the needsrepair flag Darrick J. Wong
2021-02-11 22:59 ` [PATCH 08/11] xfs_repair: allow setting " Darrick J. Wong
2021-02-11 23:29   ` Eric Sandeen
2021-02-12  0:17     ` Darrick J. Wong [this message]
2021-02-12  0:20       ` Eric Sandeen
2021-02-12  1:26         ` Darrick J. Wong
2021-02-12  4:35       ` Darrick J. Wong
2021-02-12 13:35       ` Brian Foster
2021-02-12 18:54         ` Darrick J. Wong
2021-02-11 22:59 ` [PATCH 09/11] xfs_repair: add a testing hook for NEEDSREPAIR Darrick J. Wong
2021-02-11 22:59 ` [PATCH 10/11] xfs_admin: support adding features to V5 filesystems Darrick J. Wong
2021-02-11 23:00 ` [PATCH 11/11] man: mark all deprecated V4 format options Darrick J. Wong
