All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 01/11] xfs: skip scrub xref if corruption already noted
Date: Wed, 18 Apr 2018 09:02:41 -0700	[thread overview]
Message-ID: <20180418160241.GR24738@magnolia> (raw)
In-Reply-To: <20180418150307.GA19870@bfoster.bfoster>

On Wed, Apr 18, 2018 at 11:03:08AM -0400, Brian Foster wrote:
> On Tue, Apr 17, 2018 at 07:39:33PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Don't bother looking for cross-referencing problems if the metadata is
> > already corrupt or we've already found a cross-referencing problem.
> > Since we added a helper function for flags testing, convert existing
> > users to use it.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/alloc.c    |    2 +-
> >  fs/xfs/scrub/common.c   |    4 ++++
> >  fs/xfs/scrub/common.h   |    6 ++++++
> >  fs/xfs/scrub/ialloc.c   |    2 +-
> >  fs/xfs/scrub/refcount.c |    4 ++--
> >  fs/xfs/scrub/rmap.c     |    4 ++--
> >  fs/xfs/scrub/rtbitmap.c |    3 +++
> >  fs/xfs/scrub/scrub.c    |    3 +--
> >  8 files changed, 20 insertions(+), 8 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
> > index 517c079..f5c6993 100644
> > --- a/fs/xfs/scrub/alloc.c
> > +++ b/fs/xfs/scrub/alloc.c
> > @@ -172,7 +172,7 @@ xfs_scrub_xref_is_used_space(
> >  	bool				is_freesp;
> >  	int				error;
> >  
> > -	if (!sc->sa.bno_cur)
> > +	if (!sc->sa.bno_cur || xfs_scrub_found_corruption(sc->sm))
> >  		return;
> >  
> >  	error = xfs_alloc_has_record(sc->sa.bno_cur, agbno, len, &is_freesp);
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 8ed91d5..3f72176 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -727,6 +727,10 @@ xfs_scrub_should_check_xref(
> >  	int				*error,
> >  	struct xfs_btree_cur		**curpp)
> >  {
> > +	/* No point in xref if we already know we're corrupt. */
> > +	if (xfs_scrub_found_corruption(sc->sm))
> > +		return false;
> > +
> 
> This seems like a fairly innocuous change for not being terribly
> familiar with the scrub code, but a high level thought...
> 
> This function has 30+ callers, then we have a bunch of these
> found_corruption() checks as well, one of which is the former. The new
> callers of the found_corruption() functions look like they easily have
> tens of callers throughout the scrub xref code, which kind of makes this
> look like quite the maze of logic just to shortcut various checks.
> 
> This is not an objection to this patch, but it makes me wonder how much
> "empty" processing is involved in a situation where we assume an xref
> error is detected early on in a scrub. Secondarily, the logic makes this
> hard enough to follow that the only way to really surmise that is to
> trace through all of the scrub code. It also looks like we have some
> places where it's checked multiple times.

Yep.  Originally I designed scrub always to run all the way to the end
so that we could find and stream to userspace every corruption &
discrepancy that we found.  Then I started looking into how to actually
communicate with userspace in a sane way, found none, and decided to
migrate towards bailing out as soon as we know something's bad:

 1. Stop checking at the first outright corruption we see.
 2. Stop cross-referencing at the first xref discrepancy we see.

The goal is to revise the structure of the scrubbing code like so:

1. for each record:
   a. call function that:
        i. look for corruption errors
       ii. if corruption errors then return
      iii. for each thing we can cross-reference with:
           A. if xref errors, break out of loop
           B. confirm cross-reference
   b. if corruption errors, break out of loop

Note: the xref checks in 1.a.iii.B. should never set OFLAG_CORRUPT.

Therefore I have to outfit every xref helper and subfunction with a
check that skips the xref if there are xref errors...

> For example, xfs_scrub_inode_xref() checks the corrupt flag and then
> calls a bunch of sub-handlers that call either found_corruption() or
> the should_check_xref(). One of the latter callers is
> xfs_scrub_inode_xref_finobt(), which does a btree lookup before it
> ever gets to this call only to then bail on the corruption flag being
> set.

...but I see what you're saying and I think I just plain missed those. :(

> Granted, this would probably only execute if the flag was set after the
> higher level scrub_inode_xref() check, but isn't it kind of strange to
> bury a check that is intended to control high-level xref execution flow
> in a call that is (also) designed to check the error of some btree
> lookup that shouldn't be required at all if the xref was going to be
> skipped anyways? I'm wondering if this would be more clear with
> consistent but separate error/corruption checks at the top of every xref
> helper to dictate high-level scrub execution flow and leave
> scrub_should_check_xref() to continue to care only about local
> processing decisions specific to the current branch of the scan (i.e.,
> my btree lookup failed and so we must set the corrupt flag and not
> proceed).

Yep, exactly right.  I'll go scan the source code again.

(My filthy excuse for missing things is distraction on account of making
sure that scrub reliably triggers repair, and that repair works
correctly.)

--D

> Brian
> 
> >  	if (*error == 0)
> >  		return true;
> >  
> > diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> > index deaf604..30e9039 100644
> > --- a/fs/xfs/scrub/common.h
> > +++ b/fs/xfs/scrub/common.h
> > @@ -157,4 +157,10 @@ int xfs_scrub_setup_inode_contents(struct xfs_scrub_context *sc,
> >  				   struct xfs_inode *ip, unsigned int resblks);
> >  void xfs_scrub_buffer_recheck(struct xfs_scrub_context *sc, struct xfs_buf *bp);
> >  
> > +static inline bool xfs_scrub_found_corruption(struct xfs_scrub_metadata *sm)
> > +{
> > +	return sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> > +			       XFS_SCRUB_OFLAG_XCORRUPT);
> > +}
> > +
> >  #endif	/* __XFS_SCRUB_COMMON_H__ */
> > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > index 106ca4b..05e6191 100644
> > --- a/fs/xfs/scrub/ialloc.c
> > +++ b/fs/xfs/scrub/ialloc.c
> > @@ -496,7 +496,7 @@ xfs_scrub_xref_inode_check(
> >  	bool				has_inodes;
> >  	int				error;
> >  
> > -	if (!(*icur))
> > +	if (!(*icur) || xfs_scrub_found_corruption(sc->sm))
> >  		return;
> >  
> >  	error = xfs_ialloc_has_inodes_at_extent(*icur, agbno, len, &has_inodes);
> > diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
> > index 400f156..823bda3 100644
> > --- a/fs/xfs/scrub/refcount.c
> > +++ b/fs/xfs/scrub/refcount.c
> > @@ -460,7 +460,7 @@ xfs_scrub_xref_is_cow_staging(
> >  	int				has_refcount;
> >  	int				error;
> >  
> > -	if (!sc->sa.refc_cur)
> > +	if (!sc->sa.refc_cur || xfs_scrub_found_corruption(sc->sm))
> >  		return;
> >  
> >  	/* Find the CoW staging extent. */
> > @@ -504,7 +504,7 @@ xfs_scrub_xref_is_not_shared(
> >  	bool				shared;
> >  	int				error;
> >  
> > -	if (!sc->sa.refc_cur)
> > +	if (!sc->sa.refc_cur || xfs_scrub_found_corruption(sc->sm))
> >  		return;
> >  
> >  	error = xfs_refcount_has_record(sc->sa.refc_cur, agbno, len, &shared);
> > diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
> > index 8f2a7c3..9ca92e4 100644
> > --- a/fs/xfs/scrub/rmap.c
> > +++ b/fs/xfs/scrub/rmap.c
> > @@ -207,7 +207,7 @@ xfs_scrub_xref_check_owner(
> >  	bool				has_rmap;
> >  	int				error;
> >  
> > -	if (!sc->sa.rmap_cur)
> > +	if (!sc->sa.rmap_cur || xfs_scrub_found_corruption(sc->sm))
> >  		return;
> >  
> >  	error = xfs_rmap_record_exists(sc->sa.rmap_cur, bno, len, oinfo,
> > @@ -250,7 +250,7 @@ xfs_scrub_xref_has_no_owner(
> >  	bool				has_rmap;
> >  	int				error;
> >  
> > -	if (!sc->sa.rmap_cur)
> > +	if (!sc->sa.rmap_cur || xfs_scrub_found_corruption(sc->sm))
> >  		return;
> >  
> >  	error = xfs_rmap_has_record(sc->sa.rmap_cur, bno, len, &has_rmap);
> > diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
> > index 39c41dfe..c8672f7 100644
> > --- a/fs/xfs/scrub/rtbitmap.c
> > +++ b/fs/xfs/scrub/rtbitmap.c
> > @@ -110,6 +110,9 @@ xfs_scrub_xref_is_used_rt_space(
> >  	bool				is_free;
> >  	int				error;
> >  
> > +	if (xfs_scrub_found_corruption(sc->sm))
> > +		return;
> > +
> >  	xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
> >  	error = xfs_rtalloc_extent_is_free(sc->mp, sc->tp, fsbno, len,
> >  			&is_free);
> > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > index 26c7596..cb1e727 100644
> > --- a/fs/xfs/scrub/scrub.c
> > +++ b/fs/xfs/scrub/scrub.c
> > @@ -446,8 +446,7 @@ xfs_scrub_metadata(
> >  	} else if (error)
> >  		goto out_teardown;
> >  
> > -	if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> > -			       XFS_SCRUB_OFLAG_XCORRUPT))
> > +	if (xfs_scrub_found_corruption(sc.sm))
> >  		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
> >  
> >  out_teardown:
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-04-18 16:03 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-18  2:39 [PATCH 00/11] xfs-4.18: online scrub fixes Darrick J. Wong
2018-04-18  2:39 ` [PATCH 01/11] xfs: skip scrub xref if corruption already noted Darrick J. Wong
2018-04-18 15:03   ` Brian Foster
2018-04-18 16:02     ` Darrick J. Wong [this message]
2018-04-18  2:39 ` [PATCH 02/11] xfs: create the XFS_QMOPT_QUOTIP_LOCKED flag Darrick J. Wong
2018-04-18 15:33   ` Brian Foster
2018-04-18 16:55     ` Darrick J. Wong
2018-04-18 17:09       ` Brian Foster
2018-04-19  8:32   ` Christoph Hellwig
2018-04-21 18:42     ` Darrick J. Wong
2018-04-18  2:39 ` [PATCH 03/11] xfs: report failing address when dquot verifier fails Darrick J. Wong
2018-04-18 18:33   ` Brian Foster
2018-04-18  2:40 ` [PATCH 04/11] xfs: refactor dquot iteration Darrick J. Wong
2018-04-18 18:34   ` Brian Foster
2018-04-18 22:20     ` Darrick J. Wong
2018-04-18  2:40 ` [PATCH 05/11] xfs: avoid ilock games in the quota scrubber Darrick J. Wong
2018-04-18 18:34   ` Brian Foster
2018-04-18  2:40 ` [PATCH 06/11] xfs: quota scrub should use bmapbtd scrubber Darrick J. Wong
2018-04-18 18:34   ` Brian Foster
2018-04-18 20:00     ` Darrick J. Wong
2018-04-19 11:20       ` Brian Foster
2018-04-18  2:40 ` [PATCH 07/11] xfs: superblock scrub should use uncached buffers Darrick J. Wong
2018-04-19 12:55   ` Brian Foster
2018-04-19 17:25     ` Darrick J. Wong
2018-04-19 18:57       ` Brian Foster
2018-04-20  0:07         ` Dave Chinner
2018-04-21  0:29           ` Darrick J. Wong
2018-04-20  0:05       ` Dave Chinner
2018-04-18  2:40 ` [PATCH 08/11] xfs: clean up scrub usage of KM_NOFS Darrick J. Wong
2018-04-19 12:55   ` Brian Foster
2018-04-18  2:40 ` [PATCH 09/11] xfs: btree scrub should check minrecs Darrick J. Wong
2018-04-19 12:55   ` Brian Foster
2018-04-18  2:40 ` [PATCH 10/11] xfs: refactor scrub transaction allocation function Darrick J. Wong
2018-04-19 12:56   ` Brian Foster
2018-04-18  2:40 ` [PATCH 11/11] xfs: avoid ABBA deadlock when scrubbing parent pointers Darrick J. Wong
2018-04-19 12:56   ` Brian Foster
2018-04-19 17:33     ` Darrick J. Wong
2018-04-19 18:58       ` Brian Foster
2018-04-19 19:06         ` Darrick J. Wong
2018-04-21  0:31           ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180418160241.GR24738@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.