From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
Date: Sun, 15 Oct 2017 09:07:59 +1100
Message-ID: <20171014220759.GZ15067@dastard>
In-Reply-To: <20171014115550.GB50635@bfoster.bfoster>

On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > Hi all,
> > 
> > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > recovered buffers").  I was analyzing a scrub failure on generic/392
> > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > and signalling error.
> > 
> > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > buffer therefore gets created and read without any verifiers.
> > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > filesystem we come back and attach verifiers and all is well.  However,
> > on a v4 filesystem the function returns without doing anything, so the
> > xfs_buf just sits around in memory with no verifier.  Subsequent
> > read/log/relse patterns can write anything they want without write
> > verifiers to check that.
> > 
> > If the v4 fs didn't need log recovery, the buffers get created with
> > b_ops as you'd expect.
> > 
> > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > LSN is up to date?
> > 
> 
> Seems reasonable, but I notice that the has_crc() check around
> _validate_buf_type() comes in sometime after the original commit
> referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> errors").
> 
> IIRC, the problem there is that log recovery had traditionally always
> unconditionally replayed everything in the log over whatever resides in
> the fs. This actually meant that recovery could transiently corrupt
> buffers in certain cases if the target buffer happened to be relogged
> more than once and was already up to date, which leads to verification
> failures.

Yes, that is one of the problems - we can get writeback of partially
updated buffers mid-way through log recovery on v4 filesystems.

> This was addressed for v5 filesystems with LSN ordering rules,
> but the challenge for v4 filesystems was that there is no metadata LSN
> and thus no means to detect whether a buffer is already up to date with
> regard to a transaction in the log.

In a nutshell.

> Dave might have more historical context to confirm that...

Historically it only occurred (rarely) due to memory pressure
triggering writeback during recovery. However, when we changed to context
specific delayed write buffer lists we started doing that writeback
after every checkpoint was recovered. Hence it's now pretty trivial
to trigger verifier failures during log recovery on v4
filesystems...
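
To make the failure mode concrete, here is a minimal userspace sketch in
C. It is a toy model, not kernel code: the replay_buf and replay_item
structures, the "a <= b" verifier rule and all function names are
assumptions invented for illustration. It replays two log items over a
buffer that is already up to date on disk and runs the write verifier
after every checkpoint, with and without a metadata LSN to compare
against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical metadata buffer: two fields plus the LSN last stamped in it. */
struct replay_buf {
	int		a;
	int		b;
	unsigned long	lsn;	/* only meaningful in the v5-like case */
};

/* Hypothetical log item: only the fields the transaction dirtied are logged. */
struct replay_item {
	unsigned long	lsn;
	bool		logs_a;
	int		a;
	bool		logs_b;
	int		b;
};

/* Toy write verifier: the buffer is sane only while a <= b. */
static bool replay_verify(const struct replay_buf *bp)
{
	return bp->a <= bp->b;
}

/* Replay one item; with an LSN, skip items the buffer already contains. */
static void replay_one(struct replay_buf *bp, const struct replay_item *it,
		       bool has_lsn)
{
	if (has_lsn && bp->lsn >= it->lsn)
		return;
	if (it->logs_a)
		bp->a = it->a;
	if (it->logs_b)
		bp->b = it->b;
	if (has_lsn)
		bp->lsn = it->lsn;
}

/*
 * Replay every item, treating each one as its own checkpoint followed by
 * writeback. Returns how many times the write verifier would have failed.
 */
static int replay_all(struct replay_buf *bp, const struct replay_item *items,
		      size_t nr, bool has_lsn)
{
	int failures = 0;

	assert(bp != NULL && items != NULL);
	for (size_t i = 0; i < nr; i++) {
		replay_one(bp, &items[i], has_lsn);
		if (!replay_verify(bp))
			failures++;
	}
	return failures;
}
```

With a log of { lsn 10: a=3 } followed by { lsn 20: a=1, b=2 } and an
on-disk buffer that already holds a=1, b=2, the no-LSN replay passes
through a=3, b=2 after the first checkpoint, which the toy verifier
rejects. The LSN-aware replay skips both items and never sees the
stale state.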

> If that is
> still an open issue, a couple initial ideas come to mind:
> 
> 1.) Do something simple/crude like reclaim all buffers after log
> recovery on v4 filesystems to provide a clean slate going forward.

This might be a worthwhile thing to do, anyway. Log recovery can
lead to a lot of cached metadata that won't be referenced again
after recovery is complete. Perhaps we should just clear the
buffer cache after the first phase of recovery just before/after
we re-read the superblock and re-init the incore space accounting...

> 2.) Unconditionally attach verifiers during recovery as originally done
> and wire up something generic that short circuits verifier invocations
> on v4 filesystems when log recovery is in progress.

I'd prefer "return to clean slate" over having to handle log
recovery state specially in every verifier. It's simple, it's easy
to maintain, and it creates a barrier between metadata recovered
from the log and post-processing of intents/unlinks that ensures
we've made all the recovered changes stable on disk before we move
on...
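
As a rough illustration of the "clean slate" idea, the following
userspace C sketch models a buffer cache in which recovery instantiates
buffers with no ops attached. Everything here is hypothetical (toy_buf,
toy_cache, toy_buf_read and toy_cache_flush_and_reclaim are made-up
names, not XFS functions), and it assumes, as described earlier in the
thread, that a buffer already in cache keeps whatever ops it was created
with. Flushing and reclaiming the cache after recovery means the next
read instantiates the buffer afresh, with its verifier attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SIZE	4

/* Hypothetical stand-in for a verifier ops table. */
struct toy_buf_ops {
	const char	*name;
};

struct toy_buf {
	bool				cached;
	bool				dirty;
	const struct toy_buf_ops	*ops;	/* NULL: no verifier attached */
};

struct toy_cache {
	struct toy_buf	bufs[TOY_CACHE_SIZE];
	int		writes;		/* buffers written back so far */
};

/*
 * Look up a block, instantiating it on a cache miss. On a hit the
 * buffer keeps the ops it was created with (a modelling assumption).
 */
static struct toy_buf *
toy_buf_read(struct toy_cache *c, size_t blkno, const struct toy_buf_ops *ops)
{
	struct toy_buf *bp;

	assert(blkno < TOY_CACHE_SIZE);
	bp = &c->bufs[blkno];
	if (!bp->cached) {
		bp->cached = true;
		bp->dirty = false;
		bp->ops = ops;
	}
	return bp;
}

/*
 * Write back every dirty buffer, then drop everything from the cache.
 * Returns the number of buffers reclaimed.
 */
static int
toy_cache_flush_and_reclaim(struct toy_cache *c)
{
	int reclaimed = 0;

	for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
		struct toy_buf *bp = &c->bufs[i];

		if (!bp->cached)
			continue;
		if (bp->dirty) {
			c->writes++;
			bp->dirty = false;
		}
		bp->cached = false;
		bp->ops = NULL;
		reclaimed++;
	}
	return reclaimed;
}
```

In this model the flush doubles as the barrier: every recovered change
is counted as written before any later reader can instantiate the
buffer again.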

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com