All of lore.kernel.org
 help / color / mirror / Atom feed
* Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
@ 2017-10-13 18:49 Darrick J. Wong
  2017-10-14 11:55 ` Brian Foster
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2017-10-13 18:49 UTC (permalink / raw)
  To: xfs; +Cc: Dave Chinner

Hi all,

I have a question about 67dc288c ("xfs: ensure verifiers are attached to
recovered buffers").  I was analyzing a scrub failure on generic/392
with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
scrub part 4) being unable to find a b_ops attached to the AGF buffer
and signalling error.

The pattern I observe is that when log recovery runs on a v4 filesystem,
we call some variant of xfs_buf_read with a NULL ops parameter.  The
buffer therefore gets created and read without any verifiers.
Eventually, xlog_recover_validate_buf_type gets called, and on a v5
filesystem we come back and attach verifiers and all is well.  However,
on a v4 filesystem the function returns without doing anything, so the
xfs_buf just sits around in memory with no verifier.  Subsequent
read/log/relse patterns can write anything they want without write
verifiers to check that.

If the v4 fs didn't need log recovery, the buffers get created with
b_ops as you'd expect.

My question is, shouldn't xlog_recover_validate_buf_type unconditionally
set b_ops and save the "if (hascrc)" bits for the part that ensures the
LSN is up to date?

It seems like a bad idea to let buffers sit around with no verifier.
The original patch adding this function is d75afeb3 ("xfs: add buffer
types to directory and attribute buffers") and looks like it was
supposed to do this for any filesystem, but I wasn't around to know the
evolution of that part of xlog.

--D

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-13 18:49 Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers") Darrick J. Wong
@ 2017-10-14 11:55 ` Brian Foster
  2017-10-14 19:05   ` Darrick J. Wong
  2017-10-14 22:07   ` Dave Chinner
  0 siblings, 2 replies; 15+ messages in thread
From: Brian Foster @ 2017-10-14 11:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Dave Chinner

On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> recovered buffers").  I was analyzing a scrub failure on generic/392
> with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> scrub part 4) being unable to find a b_ops attached to the AGF buffer
> and signalling error.
> 
> The pattern I observe is that when log recovery runs on a v4 filesystem,
> we call some variant of xfs_buf_read with a NULL ops parameter.  The
> buffer therefore gets created and read without any verifiers.
> Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> filesystem we come back and attach verifiers and all is well.  However,
> on a v4 filesystem the function returns without doing anything, so the
> xfs_buf just sits around in memory with no verifier.  Subsequent
> read/log/relse patterns can write anything they want without write
> verifiers to check that.
> 
> If the v4 fs didn't need log recovery, the buffers get created with
> b_ops as you'd expect.
> 
> My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> set b_ops and save the "if (hascrc)" bits for the part that ensures the
> LSN is up to date?
> 

Seems reasonable, but I notice that the has_crc() check around
_validate_buf_type() comes in sometime after the the original commit
referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
errors").

IIRC, the problem there is that log recovery had traditionally always
unconditionally replayed everything in the log over whatever resides in
the fs. This actually meant that recovery could transiently corrupt
buffers in certain cases if the target buffer happened to be relogged
more than once and was already up to date, which leads to verification
failures. This was addressed for v5 filesystems with LSN ordering rules,
but the challenge for v4 filesystems was that there is no metadata LSN
and thus no means to detect whether a buffer is already up to date with
regard to a transaction in the log.

Dave might have more historical context to confirm that... If that is
still an open issue, a couple initial ideas come to mind:

1.) Do something simple/crude like reclaim all buffers after log
recovery on v4 filesystems to provide a clean slate going forward.

2.) Unconditionally attach verifiers during recovery as originally done
and wire up something generic that short circuits verifier invocations
on v4 filesystems when log recovery is in progress.

Brian

> It seems like a bad idea to let buffers sit around with no verifier.
> The original patch adding this function is d75afeb3 ("xfs: add buffer
> types to directory and attribute buffers") and looks like it was
> supposed to do this for any filesystem, but I wasn't around to know the
> evolution of that part of xlog.
> 
> --D
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-14 11:55 ` Brian Foster
@ 2017-10-14 19:05   ` Darrick J. Wong
  2017-10-16 10:37     ` Brian Foster
  2017-10-16 21:29     ` Dave Chinner
  2017-10-14 22:07   ` Dave Chinner
  1 sibling, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2017-10-14 19:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs, Dave Chinner

On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > Hi all,
> > 
> > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > recovered buffers").  I was analyzing a scrub failure on generic/392
> > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > and signalling error.
> > 
> > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > buffer therefore gets created and read without any verifiers.
> > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > filesystem we come back and attach verifiers and all is well.  However,
> > on a v4 filesystem the function returns without doing anything, so the
> > xfs_buf just sits around in memory with no verifier.  Subsequent
> > read/log/relse patterns can write anything they want without write
> > verifiers to check that.
> > 
> > If the v4 fs didn't need log recovery, the buffers get created with
> > b_ops as you'd expect.
> > 
> > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > LSN is up to date?
> > 
> 
> Seems reasonable, but I notice that the has_crc() check around
> _validate_buf_type() comes in sometime after the the original commit
> referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> errors").
> 
> IIRC, the problem there is that log recovery had traditionally always
> unconditionally replayed everything in the log over whatever resides in
> the fs. This actually meant that recovery could transiently corrupt
> buffers in certain cases if the target buffer happened to be relogged
> more than once and was already up to date, which leads to verification
> failures. This was addressed for v5 filesystems with LSN ordering rules,
> but the challenge for v4 filesystems was that there is no metadata LSN
> and thus no means to detect whether a buffer is already up to date with
> regard to a transaction in the log.
> 
> Dave might have more historical context to confirm that... If that is
> still an open issue, a couple initial ideas come to mind:
> 
> 1.) Do something simple/crude like reclaim all buffers after log
> recovery on v4 filesystems to provide a clean slate going forward.
> 
> 2.) Unconditionally attach verifiers during recovery as originally done
> and wire up something generic that short circuits verifier invocations
> on v4 filesystems when log recovery is in progress.

What do you think about 3) add b_ops later if they're missing?

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 2f97c12..8842a27 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -742,6 +742,15 @@ xfs_buf_read_map(
        if (bp) {
                trace_xfs_buf_read(bp, flags, _RET_IP_);
 
+               /*
+                * If this buffer is up to date and has no verifier, try
+                * to set one.  This can happen on v4 because log
+                * recovery reads in the buffers for replay but does not
+                * set b_ops.
+                */
+               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
+                       bp->b_ops = ops;
+
                if (!(bp->b_flags & XBF_DONE)) {
                        XFS_STATS_INC(target->bt_mount, xb_get_read);
                        bp->b_ops = ops;


(Sort of a hack, I think I prefer something along the lines of #2 better.)

--D

> 
> Brian
> 
> > It seems like a bad idea to let buffers sit around with no verifier.
> > The original patch adding this function is d75afeb3 ("xfs: add buffer
> > types to directory and attribute buffers") and looks like it was
> > supposed to do this for any filesystem, but I wasn't around to know the
> > evolution of that part of xlog.
> > 
> > --D
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-14 11:55 ` Brian Foster
  2017-10-14 19:05   ` Darrick J. Wong
@ 2017-10-14 22:07   ` Dave Chinner
  2017-10-16 10:38     ` Brian Foster
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2017-10-14 22:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, xfs

On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > Hi all,
> > 
> > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > recovered buffers").  I was analyzing a scrub failure on generic/392
> > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > and signalling error.
> > 
> > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > buffer therefore gets created and read without any verifiers.
> > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > filesystem we come back and attach verifiers and all is well.  However,
> > on a v4 filesystem the function returns without doing anything, so the
> > xfs_buf just sits around in memory with no verifier.  Subsequent
> > read/log/relse patterns can write anything they want without write
> > verifiers to check that.
> > 
> > If the v4 fs didn't need log recovery, the buffers get created with
> > b_ops as you'd expect.
> > 
> > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > LSN is up to date?
> > 
> 
> Seems reasonable, but I notice that the has_crc() check around
> _validate_buf_type() comes in sometime after the the original commit
> referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> errors").
> 
> IIRC, the problem there is that log recovery had traditionally always
> unconditionally replayed everything in the log over whatever resides in
> the fs. This actually meant that recovery could transiently corrupt
> buffers in certain cases if the target buffer happened to be relogged
> more than once and was already up to date, which leads to verification
> failures.

Yes, that is one of the problems - we can get writeback of partially
updated buffers mid-way through log recovery on v4 filesystems.

> This was addressed for v5 filesystems with LSN ordering rules,
> but the challenge for v4 filesystems was that there is no metadata LSN
> and thus no means to detect whether a buffer is already up to date with
> regard to a transaction in the log.

In a nutshell.

> Dave might have more historical context to confirm that...

Historically it only occurred (rarely) due to memory pressure
triggering writeback during recovery. However, when we changed to context
specific delayed write buffer lists we started doing that writeback
after every checkpoint was recovered. Hence it's now pretty trivial
to trigger verifier failures during log recovery on v4
filesystems...

> If that is
> still an open issue, a couple initial ideas come to mind:
> 
> 1.) Do something simple/crude like reclaim all buffers after log
> recovery on v4 filesystems to provide a clean slate going forward.

This might be a worthwhile thing to do, anyway. Log recovery can
lead to a lot of cached metadata that won't be referenced again
after reocvery is complete. Perhaps we should just clear the
buffer cache after the first phase of recovery just before/after
we re-read the superblock and re-init the incore space accounting...

> 2.) Unconditionally attach verifiers during recovery as originally done
> and wire up something generic that short circuits verifier invocations
> on v4 filesystems when log recovery is in progress.

I'd prefer "return to clean slate" than have to handle log
recovery state specially in every verifier. It's simple, it's easy
to maintain, and it creates a barrier between metadata recovered
from the log and post-processing of intents/unlinks that ensures
we've made all the recovered changes stable on disk before we move
on...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-14 19:05   ` Darrick J. Wong
@ 2017-10-16 10:37     ` Brian Foster
  2017-10-16 21:29     ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Brian Foster @ 2017-10-16 10:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Dave Chinner

On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > and signalling error.
> > > 
> > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > buffer therefore gets created and read without any verifiers.
> > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > filesystem we come back and attach verifiers and all is well.  However,
> > > on a v4 filesystem the function returns without doing anything, so the
> > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > read/log/relse patterns can write anything they want without write
> > > verifiers to check that.
> > > 
> > > If the v4 fs didn't need log recovery, the buffers get created with
> > > b_ops as you'd expect.
> > > 
> > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > LSN is up to date?
> > > 
> > 
> > Seems reasonable, but I notice that the has_crc() check around
> > _validate_buf_type() comes in sometime after the the original commit
> > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > errors").
> > 
> > IIRC, the problem there is that log recovery had traditionally always
> > unconditionally replayed everything in the log over whatever resides in
> > the fs. This actually meant that recovery could transiently corrupt
> > buffers in certain cases if the target buffer happened to be relogged
> > more than once and was already up to date, which leads to verification
> > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > but the challenge for v4 filesystems was that there is no metadata LSN
> > and thus no means to detect whether a buffer is already up to date with
> > regard to a transaction in the log.
> > 
> > Dave might have more historical context to confirm that... If that is
> > still an open issue, a couple initial ideas come to mind:
> > 
> > 1.) Do something simple/crude like reclaim all buffers after log
> > recovery on v4 filesystems to provide a clean slate going forward.
> > 
> > 2.) Unconditionally attach verifiers during recovery as originally done
> > and wire up something generic that short circuits verifier invocations
> > on v4 filesystems when log recovery is in progress.
> 
> What do you think about 3) add b_ops later if they're missing?
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 2f97c12..8842a27 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -742,6 +742,15 @@ xfs_buf_read_map(
>         if (bp) {
>                 trace_xfs_buf_read(bp, flags, _RET_IP_);
>  
> +               /*
> +                * If this buffer is up to date and has no verifier, try
> +                * to set one.  This can happen on v4 because log
> +                * recovery reads in the buffers for replay but does not
> +                * set b_ops.
> +                */
> +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> +                       bp->b_ops = ops;
> +
>                 if (!(bp->b_flags & XBF_DONE)) {
>                         XFS_STATS_INC(target->bt_mount, xb_get_read);
>                         bp->b_ops = ops;
> 
> 
> (Sort of a hack, I think I prefer something along the lines of #2 better.)
> 

Yeah, seems like a plausible fix from a technical perspective. The first
thing that comes to mind is the need to verify that nothing actually
looks up an in-core buffer rather than attempt to read it and could
somehow use it from there. That may not be the case or even likely going
forward, but you never know how code is going to change in the future.
The fact that it's not a relevant concern with the other approaches is
what makes me give slight preference to those, fwiw.

OTOH, a variant of this that say iterated the cached buffers and did a
post-recovery "validation" to attach any missing verifiers might be a
way to avoid that flaw. As Dave kind of alludes to in his reply, that
just might be more work/code than it's really worth.

Brian

> --D
> 
> > 
> > Brian
> > 
> > > It seems like a bad idea to let buffers sit around with no verifier.
> > > The original patch adding this function is d75afeb3 ("xfs: add buffer
> > > types to directory and attribute buffers") and looks like it was
> > > supposed to do this for any filesystem, but I wasn't around to know the
> > > evolution of that part of xlog.
> > > 
> > > --D
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-14 22:07   ` Dave Chinner
@ 2017-10-16 10:38     ` Brian Foster
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2017-10-16 10:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, xfs

On Sun, Oct 15, 2017 at 09:07:59AM +1100, Dave Chinner wrote:
> On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > and signalling error.
> > > 
> > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > buffer therefore gets created and read without any verifiers.
> > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > filesystem we come back and attach verifiers and all is well.  However,
> > > on a v4 filesystem the function returns without doing anything, so the
> > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > read/log/relse patterns can write anything they want without write
> > > verifiers to check that.
> > > 
> > > If the v4 fs didn't need log recovery, the buffers get created with
> > > b_ops as you'd expect.
> > > 
> > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > LSN is up to date?
> > > 
> > 
> > Seems reasonable, but I notice that the has_crc() check around
> > _validate_buf_type() comes in sometime after the the original commit
> > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > errors").
> > 
> > IIRC, the problem there is that log recovery had traditionally always
> > unconditionally replayed everything in the log over whatever resides in
> > the fs. This actually meant that recovery could transiently corrupt
> > buffers in certain cases if the target buffer happened to be relogged
> > more than once and was already up to date, which leads to verification
> > failures.
> 
> Yes, that is one of the problems - we can get writeback of partially
> updated buffers mid-way through log recovery on v4 filesystems.
> 
> > This was addressed for v5 filesystems with LSN ordering rules,
> > but the challenge for v4 filesystems was that there is no metadata LSN
> > and thus no means to detect whether a buffer is already up to date with
> > regard to a transaction in the log.
> 
> In a nutshell.
> 
> > Dave might have more historical context to confirm that...
> 
> Historically it only occurred (rarely) due to memory pressure
> triggering writeback during recovery. However, when we changed to context
> specific delayed write buffer lists we started doing that writeback
> after every checkpoint was recovered. Hence it's now pretty trivial
> to trigger verifier failures during log recovery on v4
> filesystems...
> 
> > If that is
> > still an open issue, a couple initial ideas come to mind:
> > 
> > 1.) Do something simple/crude like reclaim all buffers after log
> > recovery on v4 filesystems to provide a clean slate going forward.
> 
> This might be a worthwhile thing to do, anyway. Log recovery can
> lead to a lot of cached metadata that won't be referenced again
> after reocvery is complete. Perhaps we should just clear the
> buffer cache after the first phase of recovery just before/after
> we re-read the superblock and re-init the incore space accounting...
> 

Ok. In that case, then perhaps doing something like this wouldn't need
to be limited to just v4 filesystems.

> > 2.) Unconditionally attach verifiers during recovery as originally done
> > and wire up something generic that short circuits verifier invocations
> > on v4 filesystems when log recovery is in progress.
> 
> I'd prefer "return to clean slate" than have to handle log
> recovery state specially in every verifier. It's simple, it's easy
> to maintain, and it creates a barrier between metadata recovered
> from the log and post-processing of intents/unlinks that ensures
> we've made all the recovered changes stable on disk before we move
> on...
> 

Either of these seem reasonable to me so if there is additional reason
to go with #1, then that works for me.

Just note that the intent of #2 above was not to modify every verifier
to accommodate this situation.  Rather, to consider a generic change
such as not invoking the verifier under particular conditions. Modifying
the individual verifiers might be more reasonable if we had a generic
verifier abstraction as Darrick and I discussed a bit ago wrt to some
unrelated changes that I don't recall, but even then I'm not sure I
would consider it the most elegant option..

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-14 19:05   ` Darrick J. Wong
  2017-10-16 10:37     ` Brian Foster
@ 2017-10-16 21:29     ` Dave Chinner
  2017-10-16 22:18       ` Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2017-10-16 21:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, xfs

On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > and signalling error.
> > > 
> > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > buffer therefore gets created and read without any verifiers.
> > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > filesystem we come back and attach verifiers and all is well.  However,
> > > on a v4 filesystem the function returns without doing anything, so the
> > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > read/log/relse patterns can write anything they want without write
> > > verifiers to check that.
> > > 
> > > If the v4 fs didn't need log recovery, the buffers get created with
> > > b_ops as you'd expect.
> > > 
> > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > LSN is up to date?
> > > 
> > 
> > Seems reasonable, but I notice that the has_crc() check around
> > _validate_buf_type() comes in sometime after the the original commit
> > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > errors").
> > 
> > IIRC, the problem there is that log recovery had traditionally always
> > unconditionally replayed everything in the log over whatever resides in
> > the fs. This actually meant that recovery could transiently corrupt
> > buffers in certain cases if the target buffer happened to be relogged
> > more than once and was already up to date, which leads to verification
> > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > but the challenge for v4 filesystems was that there is no metadata LSN
> > and thus no means to detect whether a buffer is already up to date with
> > regard to a transaction in the log.
> > 
> > Dave might have more historical context to confirm that... If that is
> > still an open issue, a couple initial ideas come to mind:
> > 
> > 1.) Do something simple/crude like reclaim all buffers after log
> > recovery on v4 filesystems to provide a clean slate going forward.
> > 
> > 2.) Unconditionally attach verifiers during recovery as originally done
> > and wire up something generic that short circuits verifier invocations
> > on v4 filesystems when log recovery is in progress.
> 
> What do you think about 3) add b_ops later if they're missing?
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 2f97c12..8842a27 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -742,6 +742,15 @@ xfs_buf_read_map(
>         if (bp) {
>                 trace_xfs_buf_read(bp, flags, _RET_IP_);
>  
> +               /*
> +                * If this buffer is up to date and has no verifier, try
> +                * to set one.  This can happen on v4 because log
> +                * recovery reads in the buffers for replay but does not
> +                * set b_ops.
> +                */
> +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> +                       bp->b_ops = ops;

I don't really like this because it will hide bugs in the code.

It also doesn't solve the problem because a read with NULL ops will
still leave a buffer with no ops attached.

IMO, if we've read/created a buffer without ops, then it is up to
the code that created/read it to either attach the required ops
before the buffer is released or to invalidate the buffer before
anyone else can use it or write it. The buffer write code warns
about writing buffers without verfiers, but we can't warn on read
because read-with-NULL-verifier is a valid thing to do....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-16 21:29     ` Dave Chinner
@ 2017-10-16 22:18       ` Darrick J. Wong
  2017-10-17 14:53         ` Brian Foster
  2017-10-20 15:16         ` Brian Foster
  0 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2017-10-16 22:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, xfs

On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > Hi all,
> > > > 
> > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > and signalling error.
> > > > 
> > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > buffer therefore gets created and read without any verifiers.
> > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > on a v4 filesystem the function returns without doing anything, so the
> > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > read/log/relse patterns can write anything they want without write
> > > > verifiers to check that.
> > > > 
> > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > b_ops as you'd expect.
> > > > 
> > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > LSN is up to date?
> > > > 
> > > 
> > > Seems reasonable, but I notice that the has_crc() check around
> > > _validate_buf_type() comes in sometime after the the original commit
> > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > errors").
> > > 
> > > IIRC, the problem there is that log recovery had traditionally always
> > > unconditionally replayed everything in the log over whatever resides in
> > > the fs. This actually meant that recovery could transiently corrupt
> > > buffers in certain cases if the target buffer happened to be relogged
> > > more than once and was already up to date, which leads to verification
> > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > and thus no means to detect whether a buffer is already up to date with
> > > regard to a transaction in the log.
> > > 
> > > Dave might have more historical context to confirm that... If that is
> > > still an open issue, a couple initial ideas come to mind:
> > > 
> > > 1.) Do something simple/crude like reclaim all buffers after log
> > > recovery on v4 filesystems to provide a clean slate going forward.
> > > 
> > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > and wire up something generic that short circuits verifier invocations
> > > on v4 filesystems when log recovery is in progress.
> > 
> > What do you think about 3) add b_ops later if they're missing?
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 2f97c12..8842a27 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> >         if (bp) {
> >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> >  
> > +               /*
> > +                * If this buffer is up to date and has no verifier, try
> > +                * to set one.  This can happen on v4 because log
> > +                * recovery reads in the buffers for replay but does not
> > +                * set b_ops.
> > +                */
> > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > +                       bp->b_ops = ops;
> 
> I don't really like this because it will hide bugs in the code.
> 
> It also doesn't solve the problem because a read with NULL ops will
> still leave a buffer with no ops attached.
> 
> IMO, if we've read/created a buffer without ops, then it is up to
> the code that created/read it to either attach the required ops
> before the buffer is released or to invalidate the buffer before
> anyone else can use it or write it. The buffer write code warns
> about writing buffers without verfiers, but we can't warn on read
> because read-with-NULL-verifier is a valid thing to do....

Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
wants to write it up.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-16 22:18       ` Darrick J. Wong
@ 2017-10-17 14:53         ` Brian Foster
  2017-10-20 15:16         ` Brian Foster
  1 sibling, 0 replies; 15+ messages in thread
From: Brian Foster @ 2017-10-17 14:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, xfs

On Mon, Oct 16, 2017 at 03:18:18PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> > On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > > Hi all,
> > > > > 
> > > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > > and signalling error.
> > > > > 
> > > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > > buffer therefore gets created and read without any verifiers.
> > > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > > on a v4 filesystem the function returns without doing anything, so the
> > > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > > read/log/relse patterns can write anything they want without write
> > > > > verifiers to check that.
> > > > > 
> > > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > > b_ops as you'd expect.
> > > > > 
> > > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > > LSN is up to date?
> > > > > 
> > > > 
> > > > Seems reasonable, but I notice that the has_crc() check around
> > > > _validate_buf_type() comes in sometime after the the original commit
> > > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > > errors").
> > > > 
> > > > IIRC, the problem there is that log recovery had traditionally always
> > > > unconditionally replayed everything in the log over whatever resides in
> > > > the fs. This actually meant that recovery could transiently corrupt
> > > > buffers in certain cases if the target buffer happened to be relogged
> > > > more than once and was already up to date, which leads to verification
> > > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > > and thus no means to detect whether a buffer is already up to date with
> > > > regard to a transaction in the log.
> > > > 
> > > > Dave might have more historical context to confirm that... If that is
> > > > still an open issue, a couple initial ideas come to mind:
> > > > 
> > > > 1.) Do something simple/crude like reclaim all buffers after log
> > > > recovery on v4 filesystems to provide a clean slate going forward.
> > > > 
> > > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > > and wire up something generic that short circuits verifier invocations
> > > > on v4 filesystems when log recovery is in progress.
> > > 
> > > What do you think about 3) add b_ops later if they're missing?
> > > 
> > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > index 2f97c12..8842a27 100644
> > > --- a/fs/xfs/xfs_buf.c
> > > +++ b/fs/xfs/xfs_buf.c
> > > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> > >         if (bp) {
> > >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> > >  
> > > +               /*
> > > +                * If this buffer is up to date and has no verifier, try
> > > +                * to set one.  This can happen on v4 because log
> > > +                * recovery reads in the buffers for replay but does not
> > > +                * set b_ops.
> > > +                */
> > > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > > +                       bp->b_ops = ops;
> > 
> > I don't really like this because it will hide bugs in the code.
> > 
> > It also doesn't solve the problem because a read with NULL ops will
> > still leave a buffer with no ops attached.
> > 
> > IMO, if we've read/created a buffer without ops, then it is up to
> > the code that created/read it to either attach the required ops
> > before the buffer is released or to invalidate the buffer before
> > anyone else can use it or write it. The buffer write code warns
> > about writing buffers without verfiers, but we can't warn on read
> > because read-with-NULL-verifier is a valid thing to do....
> 
> Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
> wants to write it up.
> 

I'll add it to my todo list but FYI I have very limited time this week,
it's more likely I won't actually get to it until next week.

Brian

> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-16 22:18       ` Darrick J. Wong
  2017-10-17 14:53         ` Brian Foster
@ 2017-10-20 15:16         ` Brian Foster
  2017-10-20 16:44           ` Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Brian Foster @ 2017-10-20 15:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, xfs

On Mon, Oct 16, 2017 at 03:18:18PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> > On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > > Hi all,
> > > > > 
> > > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > > and signalling error.
> > > > > 
> > > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > > buffer therefore gets created and read without any verifiers.
> > > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > > on a v4 filesystem the function returns without doing anything, so the
> > > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > > read/log/relse patterns can write anything they want without write
> > > > > verifiers to check that.
> > > > > 
> > > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > > b_ops as you'd expect.
> > > > > 
> > > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > > LSN is up to date?
> > > > > 
> > > > 
> > > > Seems reasonable, but I notice that the has_crc() check around
> > > > _validate_buf_type() comes in sometime after the the original commit
> > > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > > errors").
> > > > 
> > > > IIRC, the problem there is that log recovery had traditionally always
> > > > unconditionally replayed everything in the log over whatever resides in
> > > > the fs. This actually meant that recovery could transiently corrupt
> > > > buffers in certain cases if the target buffer happened to be relogged
> > > > more than once and was already up to date, which leads to verification
> > > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > > and thus no means to detect whether a buffer is already up to date with
> > > > regard to a transaction in the log.
> > > > 
> > > > Dave might have more historical context to confirm that... If that is
> > > > still an open issue, a couple initial ideas come to mind:
> > > > 
> > > > 1.) Do something simple/crude like reclaim all buffers after log
> > > > recovery on v4 filesystems to provide a clean slate going forward.
> > > > 
> > > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > > and wire up something generic that short circuits verifier invocations
> > > > on v4 filesystems when log recovery is in progress.
> > > 
> > > What do you think about 3) add b_ops later if they're missing?
> > > 
> > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > index 2f97c12..8842a27 100644
> > > --- a/fs/xfs/xfs_buf.c
> > > +++ b/fs/xfs/xfs_buf.c
> > > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> > >         if (bp) {
> > >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> > >  
> > > +               /*
> > > +                * If this buffer is up to date and has no verifier, try
> > > +                * to set one.  This can happen on v4 because log
> > > +                * recovery reads in the buffers for replay but does not
> > > +                * set b_ops.
> > > +                */
> > > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > > +                       bp->b_ops = ops;
> > 
> > I don't really like this because it will hide bugs in the code.
> > 
> > It also doesn't solve the problem because a read with NULL ops will
> > still leave a buffer with no ops attached.
> > 
> > IMO, if we've read/created a buffer without ops, then it is up to
> > the code that created/read it to either attach the required ops
> > before the buffer is released or to invalidate the buffer before
> > anyone else can use it or write it. The buffer write code warns
> > about writing buffers without verfiers, but we can't warn on read
> > because read-with-NULL-verifier is a valid thing to do....
> 
> Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
> wants to write it up.
> 

What do you guys think about something like this? It runs
unconditionally, but this is partly as designed to provide a clean cache
on v5 filesystems as well. It should be mostly unnoticeable if log
recovery hasn't run, but we could easily add another flag to make it
conditional on log recovery if desired.

Brian

--- 8< ---

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index dc95a49..226d26b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -780,6 +780,16 @@ xfs_log_mount_finish(
 	mp->m_super->s_flags &= ~MS_ACTIVE;
 	evict_inodes(mp->m_super);
 
+	/*
+	 * Drain the buffer LRU after log recovery. This is required for
+	 * v4 filesystems to avoid leaving around buffers with NULL verifier
+	 * ops. Do it unconditionally to make sure we're always in a clean cache
+	 * state after mount.
+	 */
+	xfs_log_force(mp, XFS_LOG_SYNC);
+	xfs_ail_push_all_sync(mp->m_ail);
+	xfs_wait_buftarg(mp->m_ddev_targp);
+
 	if (readonly)
 		mp->m_flags |= XFS_MOUNT_RDONLY;
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-20 15:16         ` Brian Foster
@ 2017-10-20 16:44           ` Darrick J. Wong
  2017-10-20 16:59             ` Brian Foster
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2017-10-20 16:44 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, xfs

On Fri, Oct 20, 2017 at 11:16:10AM -0400, Brian Foster wrote:
> On Mon, Oct 16, 2017 at 03:18:18PM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> > > On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > > > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > > > and signalling error.
> > > > > > 
> > > > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > > > buffer therefore gets created and read without any verifiers.
> > > > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > > > on a v4 filesystem the function returns without doing anything, so the
> > > > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > > > read/log/relse patterns can write anything they want without write
> > > > > > verifiers to check that.
> > > > > > 
> > > > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > > > b_ops as you'd expect.
> > > > > > 
> > > > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > > > LSN is up to date?
> > > > > > 
> > > > > 
> > > > > Seems reasonable, but I notice that the has_crc() check around
> > > > > _validate_buf_type() comes in sometime after the the original commit
> > > > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > > > errors").
> > > > > 
> > > > > IIRC, the problem there is that log recovery had traditionally always
> > > > > unconditionally replayed everything in the log over whatever resides in
> > > > > the fs. This actually meant that recovery could transiently corrupt
> > > > > buffers in certain cases if the target buffer happened to be relogged
> > > > > more than once and was already up to date, which leads to verification
> > > > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > > > and thus no means to detect whether a buffer is already up to date with
> > > > > regard to a transaction in the log.
> > > > > 
> > > > > Dave might have more historical context to confirm that... If that is
> > > > > still an open issue, a couple initial ideas come to mind:
> > > > > 
> > > > > 1.) Do something simple/crude like reclaim all buffers after log
> > > > > recovery on v4 filesystems to provide a clean slate going forward.
> > > > > 
> > > > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > > > and wire up something generic that short circuits verifier invocations
> > > > > on v4 filesystems when log recovery is in progress.
> > > > 
> > > > What do you think about 3) add b_ops later if they're missing?
> > > > 
> > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > > index 2f97c12..8842a27 100644
> > > > --- a/fs/xfs/xfs_buf.c
> > > > +++ b/fs/xfs/xfs_buf.c
> > > > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> > > >         if (bp) {
> > > >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> > > >  
> > > > +               /*
> > > > +                * If this buffer is up to date and has no verifier, try
> > > > +                * to set one.  This can happen on v4 because log
> > > > +                * recovery reads in the buffers for replay but does not
> > > > +                * set b_ops.
> > > > +                */
> > > > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > > > +                       bp->b_ops = ops;
> > > 
> > > I don't really like this because it will hide bugs in the code.
> > > 
> > > It also doesn't solve the problem because a read with NULL ops will
> > > still leave a buffer with no ops attached.
> > > 
> > > IMO, if we've read/created a buffer without ops, then it is up to
> > > the code that created/read it to either attach the required ops
> > > before the buffer is released or to invalidate the buffer before
> > > anyone else can use it or write it. The buffer write code warns
> > > about writing buffers without verfiers, but we can't warn on read
> > > because read-with-NULL-verifier is a valid thing to do....
> > 
> > Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
> > wants to write it up.
> > 
> 
> What do you guys think about something like this? It runs
> unconditionally, but this is partly as designed to provide a clean cache
> on v5 filesystems as well. It should be mostly unnoticeable if log
> recovery hasn't run, but we could easily add another flag to make it
> conditional on log recovery if desired.
> 
> Brian
> 
> --- 8< ---
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index dc95a49..226d26b 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -780,6 +780,16 @@ xfs_log_mount_finish(
>  	mp->m_super->s_flags &= ~MS_ACTIVE;
>  	evict_inodes(mp->m_super);
>  
> +	/*
> +	 * Drain the buffer LRU after log recovery. This is required for
> +	 * v4 filesystems to avoid leaving around buffers with NULL verifier
> +	 * ops. Do it unconditionally to make sure we're always in a clean cache
> +	 * state after mount.
> +	 */
> +	xfs_log_force(mp, XFS_LOG_SYNC);
> +	xfs_ail_push_all_sync(mp->m_ail);

Is it necessary to force the log & push the AIL even if recovery didn't run?

> +	xfs_wait_buftarg(mp->m_ddev_targp);

Otherwise, looks ok enough I'll see if it fixes the scrub failures that
inspired my original email. :)

--D

> +
>  	if (readonly)
>  		mp->m_flags |= XFS_MOUNT_RDONLY;
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-20 16:44           ` Darrick J. Wong
@ 2017-10-20 16:59             ` Brian Foster
  2017-10-20 18:00               ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2017-10-20 16:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, xfs

On Fri, Oct 20, 2017 at 09:44:06AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 20, 2017 at 11:16:10AM -0400, Brian Foster wrote:
> > On Mon, Oct 16, 2017 at 03:18:18PM -0700, Darrick J. Wong wrote:
> > > On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> > > > On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > > > > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > > > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > > > > Hi all,
> > > > > > > 
> > > > > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > > > > and signalling error.
> > > > > > > 
> > > > > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > > > > buffer therefore gets created and read without any verifiers.
> > > > > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > > > > on a v4 filesystem the function returns without doing anything, so the
> > > > > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > > > > read/log/relse patterns can write anything they want without write
> > > > > > > verifiers to check that.
> > > > > > > 
> > > > > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > > > > b_ops as you'd expect.
> > > > > > > 
> > > > > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > > > > LSN is up to date?
> > > > > > > 
> > > > > > 
> > > > > > Seems reasonable, but I notice that the has_crc() check around
> > > > > > _validate_buf_type() comes in sometime after the the original commit
> > > > > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > > > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > > > > errors").
> > > > > > 
> > > > > > IIRC, the problem there is that log recovery had traditionally always
> > > > > > unconditionally replayed everything in the log over whatever resides in
> > > > > > the fs. This actually meant that recovery could transiently corrupt
> > > > > > buffers in certain cases if the target buffer happened to be relogged
> > > > > > more than once and was already up to date, which leads to verification
> > > > > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > > > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > > > > and thus no means to detect whether a buffer is already up to date with
> > > > > > regard to a transaction in the log.
> > > > > > 
> > > > > > Dave might have more historical context to confirm that... If that is
> > > > > > still an open issue, a couple initial ideas come to mind:
> > > > > > 
> > > > > > 1.) Do something simple/crude like reclaim all buffers after log
> > > > > > recovery on v4 filesystems to provide a clean slate going forward.
> > > > > > 
> > > > > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > > > > and wire up something generic that short circuits verifier invocations
> > > > > > on v4 filesystems when log recovery is in progress.
> > > > > 
> > > > > What do you think about 3) add b_ops later if they're missing?
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > > > index 2f97c12..8842a27 100644
> > > > > --- a/fs/xfs/xfs_buf.c
> > > > > +++ b/fs/xfs/xfs_buf.c
> > > > > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> > > > >         if (bp) {
> > > > >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> > > > >  
> > > > > +               /*
> > > > > +                * If this buffer is up to date and has no verifier, try
> > > > > +                * to set one.  This can happen on v4 because log
> > > > > +                * recovery reads in the buffers for replay but does not
> > > > > +                * set b_ops.
> > > > > +                */
> > > > > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > > > > +                       bp->b_ops = ops;
> > > > 
> > > > I don't really like this because it will hide bugs in the code.
> > > > 
> > > > It also doesn't solve the problem because a read with NULL ops will
> > > > still leave a buffer with no ops attached.
> > > > 
> > > > IMO, if we've read/created a buffer without ops, then it is up to
> > > > the code that created/read it to either attach the required ops
> > > > before the buffer is released or to invalidate the buffer before
> > > > anyone else can use it or write it. The buffer write code warns
> > > > about writing buffers without verfiers, but we can't warn on read
> > > > because read-with-NULL-verifier is a valid thing to do....
> > > 
> > > Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
> > > wants to write it up.
> > > 
> > 
> > What do you guys think about something like this? It runs
> > unconditionally, but this is partly as designed to provide a clean cache
> > on v5 filesystems as well. It should be mostly unnoticeable if log
> > recovery hasn't run, but we could easily add another flag to make it
> > conditional on log recovery if desired.
> > 
> > Brian
> > 
> > --- 8< ---
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index dc95a49..226d26b 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -780,6 +780,16 @@ xfs_log_mount_finish(
> >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> >  	evict_inodes(mp->m_super);
> >  
> > +	/*
> > +	 * Drain the buffer LRU after log recovery. This is required for
> > +	 * v4 filesystems to avoid leaving around buffers with NULL verifier
> > +	 * ops. Do it unconditionally to make sure we're always in a clean cache
> > +	 * state after mount.
> > +	 */
> > +	xfs_log_force(mp, XFS_LOG_SYNC);
> > +	xfs_ail_push_all_sync(mp->m_ail);
> 
> Is it necessary to force the log & push the AIL even if recovery didn't run?
> 

Don't think so.. As noted above, I just did this because it's basically
unnoticeable if log recovery didn't run and it avoids the need to add a
flag. I can add one if that is preferred, or thinking about it again I
suppose we could just sample for XLOG_RECOVERY_NEEDED before we call
xlog_recover_finish() (which clears it).

> > +	xfs_wait_buftarg(mp->m_ddev_targp);
> 
> Otherwise, looks ok enough I'll see if it fixes the scrub failures that
> inspired my original email. :)
> 

Thanks, let me know.

Brian

> --D
> 
> > +
> >  	if (readonly)
> >  		mp->m_flags |= XFS_MOUNT_RDONLY;
> >  
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-20 16:59             ` Brian Foster
@ 2017-10-20 18:00               ` Darrick J. Wong
  2017-10-21  6:10                 ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2017-10-20 18:00 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, xfs

On Fri, Oct 20, 2017 at 12:59:41PM -0400, Brian Foster wrote:
> On Fri, Oct 20, 2017 at 09:44:06AM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 20, 2017 at 11:16:10AM -0400, Brian Foster wrote:
> > > On Mon, Oct 16, 2017 at 03:18:18PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> > > > > On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > > > > > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > > > > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > > > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > > > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > > > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > > > > > and signalling error.
> > > > > > > > 
> > > > > > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > > > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > > > > > buffer therefore gets created and read without any verifiers.
> > > > > > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > > > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > > > > > on a v4 filesystem the function returns without doing anything, so the
> > > > > > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > > > > > read/log/relse patterns can write anything they want without write
> > > > > > > > verifiers to check that.
> > > > > > > > 
> > > > > > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > > > > > b_ops as you'd expect.
> > > > > > > > 
> > > > > > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > > > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > > > > > LSN is up to date?
> > > > > > > > 
> > > > > > > 
> > > > > > > Seems reasonable, but I notice that the has_crc() check around
> > > > > > > _validate_buf_type() comes in sometime after the the original commit
> > > > > > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > > > > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > > > > > errors").
> > > > > > > 
> > > > > > > IIRC, the problem there is that log recovery had traditionally always
> > > > > > > unconditionally replayed everything in the log over whatever resides in
> > > > > > > the fs. This actually meant that recovery could transiently corrupt
> > > > > > > buffers in certain cases if the target buffer happened to be relogged
> > > > > > > more than once and was already up to date, which leads to verification
> > > > > > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > > > > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > > > > > and thus no means to detect whether a buffer is already up to date with
> > > > > > > regard to a transaction in the log.
> > > > > > > 
> > > > > > > Dave might have more historical context to confirm that... If that is
> > > > > > > still an open issue, a couple initial ideas come to mind:
> > > > > > > 
> > > > > > > 1.) Do something simple/crude like reclaim all buffers after log
> > > > > > > recovery on v4 filesystems to provide a clean slate going forward.
> > > > > > > 
> > > > > > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > > > > > and wire up something generic that short circuits verifier invocations
> > > > > > > on v4 filesystems when log recovery is in progress.
> > > > > > 
> > > > > > What do you think about 3) add b_ops later if they're missing?
> > > > > > 
> > > > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > > > > index 2f97c12..8842a27 100644
> > > > > > --- a/fs/xfs/xfs_buf.c
> > > > > > +++ b/fs/xfs/xfs_buf.c
> > > > > > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> > > > > >         if (bp) {
> > > > > >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> > > > > >  
> > > > > > +               /*
> > > > > > +                * If this buffer is up to date and has no verifier, try
> > > > > > +                * to set one.  This can happen on v4 because log
> > > > > > +                * recovery reads in the buffers for replay but does not
> > > > > > +                * set b_ops.
> > > > > > +                */
> > > > > > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > > > > > +                       bp->b_ops = ops;
> > > > > 
> > > > > I don't really like this because it will hide bugs in the code.
> > > > > 
> > > > > It also doesn't solve the problem because a read with NULL ops will
> > > > > still leave a buffer with no ops attached.
> > > > > 
> > > > > IMO, if we've read/created a buffer without ops, then it is up to
> > > > > the code that created/read it to either attach the required ops
> > > > > before the buffer is released or to invalidate the buffer before
> > > > > anyone else can use it or write it. The buffer write code warns
> > > > > about writing buffers without verfiers, but we can't warn on read
> > > > > because read-with-NULL-verifier is a valid thing to do....
> > > > 
> > > > Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
> > > > wants to write it up.
> > > > 
> > > 
> > > What do you guys think about something like this? It runs
> > > unconditionally, but this is partly as designed to provide a clean cache
> > > on v5 filesystems as well. It should be mostly unnoticeable if log
> > > recovery hasn't run, but we could easily add another flag to make it
> > > conditional on log recovery if desired.
> > > 
> > > Brian
> > > 
> > > --- 8< ---
> > > 
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index dc95a49..226d26b 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -780,6 +780,16 @@ xfs_log_mount_finish(
> > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > >  	evict_inodes(mp->m_super);
> > >  
> > > +	/*
> > > +	 * Drain the buffer LRU after log recovery. This is required for
> > > +	 * v4 filesystems to avoid leaving around buffers with NULL verifier
> > > +	 * ops. Do it unconditionally to make sure we're always in a clean cache
> > > +	 * state after mount.
> > > +	 */
> > > +	xfs_log_force(mp, XFS_LOG_SYNC);
> > > +	xfs_ail_push_all_sync(mp->m_ail);
> > 
> > Is it necessary to force the log & push the AIL even if recovery didn't run?
> > 
> 
> Don't think so.. As noted above, I just did this because it's basically
> unnoticeable if log recovery didn't run and it avoids the need to add a
> flag. I can add one if that is preferred, or thinking about it again I
> suppose we could just sample for XLOG_RECOVERY_NEEDED before we call
> xlog_recover_finish() (which clears it).
> 
> > > +	xfs_wait_buftarg(mp->m_ddev_targp);
> > 
> > Otherwise, looks ok enough I'll see if it fixes the scrub failures that
> > inspired my original email. :)
> > 
> 
> Thanks, let me know.

This seems to fix the scrub problems running generic/392 xfs/121 xfs/181
on a v4 filesystem.

(Note that you have to have built the djwong-devel branches of the
kernel and xfsprogs in order for xfstests to trip over the verifier-less
buffers.  The particular patch is "xfs: scrub in-core metadata".)

--D

> 
> Brian
> 
> > --D
> > 
> > > +
> > >  	if (readonly)
> > >  		mp->m_flags |= XFS_MOUNT_RDONLY;
> > >  
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-20 18:00               ` Darrick J. Wong
@ 2017-10-21  6:10                 ` Darrick J. Wong
  2017-10-23 13:08                   ` Brian Foster
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2017-10-21  6:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, xfs

On Fri, Oct 20, 2017 at 11:00:40AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 20, 2017 at 12:59:41PM -0400, Brian Foster wrote:
> > On Fri, Oct 20, 2017 at 09:44:06AM -0700, Darrick J. Wong wrote:
> > > On Fri, Oct 20, 2017 at 11:16:10AM -0400, Brian Foster wrote:
> > > > On Mon, Oct 16, 2017 at 03:18:18PM -0700, Darrick J. Wong wrote:
> > > > > On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> > > > > > On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > > > > > > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > > > > > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > > > > > > Hi all,
> > > > > > > > > 
> > > > > > > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > > > > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > > > > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > > > > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > > > > > > and signalling error.
> > > > > > > > > 
> > > > > > > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > > > > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > > > > > > buffer therefore gets created and read without any verifiers.
> > > > > > > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > > > > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > > > > > > on a v4 filesystem the function returns without doing anything, so the
> > > > > > > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > > > > > > read/log/relse patterns can write anything they want without write
> > > > > > > > > verifiers to check that.
> > > > > > > > > 
> > > > > > > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > > > > > > b_ops as you'd expect.
> > > > > > > > > 
> > > > > > > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > > > > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > > > > > > LSN is up to date?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Seems reasonable, but I notice that the has_crc() check around
> > > > > > > > _validate_buf_type() comes in sometime after the the original commit
> > > > > > > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > > > > > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > > > > > > errors").
> > > > > > > > 
> > > > > > > > IIRC, the problem there is that log recovery had traditionally always
> > > > > > > > unconditionally replayed everything in the log over whatever resides in
> > > > > > > > the fs. This actually meant that recovery could transiently corrupt
> > > > > > > > buffers in certain cases if the target buffer happened to be relogged
> > > > > > > > more than once and was already up to date, which leads to verification
> > > > > > > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > > > > > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > > > > > > and thus no means to detect whether a buffer is already up to date with
> > > > > > > > regard to a transaction in the log.
> > > > > > > > 
> > > > > > > > Dave might have more historical context to confirm that... If that is
> > > > > > > > still an open issue, a couple initial ideas come to mind:
> > > > > > > > 
> > > > > > > > 1.) Do something simple/crude like reclaim all buffers after log
> > > > > > > > recovery on v4 filesystems to provide a clean slate going forward.
> > > > > > > > 
> > > > > > > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > > > > > > and wire up something generic that short circuits verifier invocations
> > > > > > > > on v4 filesystems when log recovery is in progress.
> > > > > > > 
> > > > > > > What do you think about 3) add b_ops later if they're missing?
> > > > > > > 
> > > > > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > > > > > index 2f97c12..8842a27 100644
> > > > > > > --- a/fs/xfs/xfs_buf.c
> > > > > > > +++ b/fs/xfs/xfs_buf.c
> > > > > > > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> > > > > > >         if (bp) {
> > > > > > >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> > > > > > >  
> > > > > > > +               /*
> > > > > > > +                * If this buffer is up to date and has no verifier, try
> > > > > > > +                * to set one.  This can happen on v4 because log
> > > > > > > +                * recovery reads in the buffers for replay but does not
> > > > > > > +                * set b_ops.
> > > > > > > +                */
> > > > > > > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > > > > > > +                       bp->b_ops = ops;
> > > > > > 
> > > > > > I don't really like this because it will hide bugs in the code.
> > > > > > 
> > > > > > It also doesn't solve the problem because a read with NULL ops will
> > > > > > still leave a buffer with no ops attached.
> > > > > > 
> > > > > > IMO, if we've read/created a buffer without ops, then it is up to
> > > > > > the code that created/read it to either attach the required ops
> > > > > > before the buffer is released or to invalidate the buffer before
> > > > > > anyone else can use it or write it. The buffer write code warns
> > > > > > about writing buffers without verfiers, but we can't warn on read
> > > > > > because read-with-NULL-verifier is a valid thing to do....
> > > > > 
> > > > > Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
> > > > > wants to write it up.
> > > > > 
> > > > 
> > > > What do you guys think about something like this? It runs
> > > > unconditionally, but this is partly as designed to provide a clean cache
> > > > on v5 filesystems as well. It should be mostly unnoticeable if log
> > > > recovery hasn't run, but we could easily add another flag to make it
> > > > conditional on log recovery if desired.
> > > > 
> > > > Brian
> > > > 
> > > > --- 8< ---
> > > > 
> > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > > index dc95a49..226d26b 100644
> > > > --- a/fs/xfs/xfs_log.c
> > > > +++ b/fs/xfs/xfs_log.c
> > > > @@ -780,6 +780,16 @@ xfs_log_mount_finish(
> > > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > > >  	evict_inodes(mp->m_super);
> > > >  
> > > > +	/*
> > > > +	 * Drain the buffer LRU after log recovery. This is required for
> > > > +	 * v4 filesystems to avoid leaving around buffers with NULL verifier
> > > > +	 * ops. Do it unconditionally to make sure we're always in a clean cache
> > > > +	 * state after mount.
> > > > +	 */
> > > > +	xfs_log_force(mp, XFS_LOG_SYNC);
> > > > +	xfs_ail_push_all_sync(mp->m_ail);
> > > 
> > > Is it necessary to force the log & push the AIL even if recovery didn't run?
> > > 
> > 
> > Don't think so.. As noted above, I just did this because it's basically
> > unnoticeable if log recovery didn't run and it avoids the need to add a
> > flag. I can add one if that is preferred, or thinking about it again I
> > suppose we could just sample for XLOG_RECOVERY_NEEDED before we call
> > xlog_recover_finish() (which clears it).
> > 
> > > > +	xfs_wait_buftarg(mp->m_ddev_targp);
> > > 
> > > Otherwise, looks ok enough I'll see if it fixes the scrub failures that
> > > inspired my original email. :)
> > > 
> > 
> > Thanks, let me know.
> 
> This seems to fix the scrub problems running generic/392 xfs/121 xfs/181
> on a v4 filesystem.
> 
> (Note that you have to have built the djwong-devel branches of the
> kernel and xfsprogs in order for xfstests to trip over the verifier-less
> buffers.  The particular patch is "xfs: scrub in-core metadata".)

Follow-up question: if xlog_recover_finish returns an error, there's no
point in pushing the log, right?  I think we're basically toast if that
happens.

--D

> 
> --D
> 
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > +
> > > >  	if (readonly)
> > > >  		mp->m_flags |= XFS_MOUNT_RDONLY;
> > > >  
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers")
  2017-10-21  6:10                 ` Darrick J. Wong
@ 2017-10-23 13:08                   ` Brian Foster
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2017-10-23 13:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, xfs

On Fri, Oct 20, 2017 at 11:10:13PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 20, 2017 at 11:00:40AM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 20, 2017 at 12:59:41PM -0400, Brian Foster wrote:
> > > On Fri, Oct 20, 2017 at 09:44:06AM -0700, Darrick J. Wong wrote:
> > > > On Fri, Oct 20, 2017 at 11:16:10AM -0400, Brian Foster wrote:
> > > > > On Mon, Oct 16, 2017 at 03:18:18PM -0700, Darrick J. Wong wrote:
> > > > > > On Tue, Oct 17, 2017 at 08:29:33AM +1100, Dave Chinner wrote:
> > > > > > > On Sat, Oct 14, 2017 at 12:05:30PM -0700, Darrick J. Wong wrote:
> > > > > > > > On Sat, Oct 14, 2017 at 07:55:51AM -0400, Brian Foster wrote:
> > > > > > > > > On Fri, Oct 13, 2017 at 11:49:16AM -0700, Darrick J. Wong wrote:
> > > > > > > > > > Hi all,
> > > > > > > > > > 
> > > > > > > > > > I have a question about 67dc288c ("xfs: ensure verifiers are attached to
> > > > > > > > > > recovered buffers").  I was analyzing a scrub failure on generic/392
> > > > > > > > > > with a v4 filesystem which stems from xfs_scrub_buffer_recheck (it's in
> > > > > > > > > > scrub part 4) being unable to find a b_ops attached to the AGF buffer
> > > > > > > > > > and signalling error.
> > > > > > > > > > 
> > > > > > > > > > The pattern I observe is that when log recovery runs on a v4 filesystem,
> > > > > > > > > > we call some variant of xfs_buf_read with a NULL ops parameter.  The
> > > > > > > > > > buffer therefore gets created and read without any verifiers.
> > > > > > > > > > Eventually, xlog_recover_validate_buf_type gets called, and on a v5
> > > > > > > > > > filesystem we come back and attach verifiers and all is well.  However,
> > > > > > > > > > on a v4 filesystem the function returns without doing anything, so the
> > > > > > > > > > xfs_buf just sits around in memory with no verifier.  Subsequent
> > > > > > > > > > read/log/relse patterns can write anything they want without write
> > > > > > > > > > verifiers to check that.
> > > > > > > > > > 
> > > > > > > > > > If the v4 fs didn't need log recovery, the buffers get created with
> > > > > > > > > > b_ops as you'd expect.
> > > > > > > > > > 
> > > > > > > > > > My question is, shouldn't xlog_recover_validate_buf_type unconditionally
> > > > > > > > > > set b_ops and save the "if (hascrc)" bits for the part that ensures the
> > > > > > > > > > LSN is up to date?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Seems reasonable, but I notice that the has_crc() check around
> > > > > > > > > _validate_buf_type() comes in sometime after the the original commit
> > > > > > > > > referenced below (d75afeb3) and commit 67dc288c. It appears to be due to
> > > > > > > > > commit 9222a9cf86 ("xfs: don't shutdown log recovery on validation
> > > > > > > > > errors").
> > > > > > > > > 
> > > > > > > > > IIRC, the problem there is that log recovery had traditionally always
> > > > > > > > > unconditionally replayed everything in the log over whatever resides in
> > > > > > > > > the fs. This actually meant that recovery could transiently corrupt
> > > > > > > > > buffers in certain cases if the target buffer happened to be relogged
> > > > > > > > > more than once and was already up to date, which leads to verification
> > > > > > > > > failures. This was addressed for v5 filesystems with LSN ordering rules,
> > > > > > > > > but the challenge for v4 filesystems was that there is no metadata LSN
> > > > > > > > > and thus no means to detect whether a buffer is already up to date with
> > > > > > > > > regard to a transaction in the log.
> > > > > > > > > 
> > > > > > > > > Dave might have more historical context to confirm that... If that is
> > > > > > > > > still an open issue, a couple initial ideas come to mind:
> > > > > > > > > 
> > > > > > > > > 1.) Do something simple/crude like reclaim all buffers after log
> > > > > > > > > recovery on v4 filesystems to provide a clean slate going forward.
> > > > > > > > > 
> > > > > > > > > 2.) Unconditionally attach verifiers during recovery as originally done
> > > > > > > > > and wire up something generic that short circuits verifier invocations
> > > > > > > > > on v4 filesystems when log recovery is in progress.
> > > > > > > > 
> > > > > > > > What do you think about 3) add b_ops later if they're missing?
> > > > > > > > 
> > > > > > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > > > > > > index 2f97c12..8842a27 100644
> > > > > > > > --- a/fs/xfs/xfs_buf.c
> > > > > > > > +++ b/fs/xfs/xfs_buf.c
> > > > > > > > @@ -742,6 +742,15 @@ xfs_buf_read_map(
> > > > > > > >         if (bp) {
> > > > > > > >                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> > > > > > > >  
> > > > > > > > +               /*
> > > > > > > > +                * If this buffer is up to date and has no verifier, try
> > > > > > > > +                * to set one.  This can happen on v4 because log
> > > > > > > > +                * recovery reads in the buffers for replay but does not
> > > > > > > > +                * set b_ops.
> > > > > > > > +                */
> > > > > > > > +               if ((bp->b_flags & XBF_DONE) && !bp->b_ops)
> > > > > > > > +                       bp->b_ops = ops;
> > > > > > > 
> > > > > > > I don't really like this because it will hide bugs in the code.
> > > > > > > 
> > > > > > > It also doesn't solve the problem because a read with NULL ops will
> > > > > > > still leave a buffer with no ops attached.
> > > > > > > 
> > > > > > > IMO, if we've read/created a buffer without ops, then it is up to
> > > > > > > the code that created/read it to either attach the required ops
> > > > > > > before the buffer is released or to invalidate the buffer before
> > > > > > > anyone else can use it or write it. The buffer write code warns
> > > > > > > about writing buffers without verfiers, but we can't warn on read
> > > > > > > because read-with-NULL-verifier is a valid thing to do....
> > > > > > 
> > > > > > Fair 'nuff.  FWIW I'm ok with approach #1 if anyone enthusiastically
> > > > > > wants to write it up.
> > > > > > 
> > > > > 
> > > > > What do you guys think about something like this? It runs
> > > > > unconditionally, but this is partly as designed to provide a clean cache
> > > > > on v5 filesystems as well. It should be mostly unnoticeable if log
> > > > > recovery hasn't run, but we could easily add another flag to make it
> > > > > conditional on log recovery if desired.
> > > > > 
> > > > > Brian
> > > > > 
> > > > > --- 8< ---
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > > > index dc95a49..226d26b 100644
> > > > > --- a/fs/xfs/xfs_log.c
> > > > > +++ b/fs/xfs/xfs_log.c
> > > > > @@ -780,6 +780,16 @@ xfs_log_mount_finish(
> > > > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > > > >  	evict_inodes(mp->m_super);
> > > > >  
> > > > > +	/*
> > > > > +	 * Drain the buffer LRU after log recovery. This is required for
> > > > > +	 * v4 filesystems to avoid leaving around buffers with NULL verifier
> > > > > +	 * ops. Do it unconditionally to make sure we're always in a clean cache
> > > > > +	 * state after mount.
> > > > > +	 */
> > > > > +	xfs_log_force(mp, XFS_LOG_SYNC);
> > > > > +	xfs_ail_push_all_sync(mp->m_ail);
> > > > 
> > > > Is it necessary to force the log & push the AIL even if recovery didn't run?
> > > > 
> > > 
> > > Don't think so.. As noted above, I just did this because it's basically
> > > unnoticeable if log recovery didn't run and it avoids the need to add a
> > > flag. I can add one if that is preferred, or thinking about it again I
> > > suppose we could just sample for XLOG_RECOVERY_NEEDED before we call
> > > xlog_recover_finish() (which clears it).
> > > 
> > > > > +	xfs_wait_buftarg(mp->m_ddev_targp);
> > > > 
> > > > Otherwise, looks ok enough I'll see if it fixes the scrub failures that
> > > > inspired my original email. :)
> > > > 
> > > 
> > > Thanks, let me know.
> > 
> > This seems to fix the scrub problems running generic/392 xfs/121 xfs/181
> > on a v4 filesystem.
> > 
> > (Note that you have to have built the djwong-devel branches of the
> > kernel and xfsprogs in order for xfstests to trip over the verifier-less
> > buffers.  The particular patch is "xfs: scrub in-core metadata".)
> 
> Follow-up question: if xlog_recover_finish returns an error, there's no
> point in pushing the log, right?  I think we're basically toast if that
> happens.
> 

I think we want to push the log/ail in any case that we call
xfs_wait_buftarg() after starting recovery to make sure that all dirty
buffers can be reclaimed immediately. We could skip the whole thing from
here in the error case since the mount is going to fail, but I don't
think it really matters. We'll end up in xfs_log_quiesce() as part of
the mount unwind anyways, which basically ends up doing the same thing.

Brian

> --D
> 
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > > +
> > > > >  	if (readonly)
> > > > >  		mp->m_flags |= XFS_MOUNT_RDONLY;
> > > > >  
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-10-23 13:08 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-13 18:49 Question about 67dc288c ("xfs: ensure verifiers are attached to recovered buffers") Darrick J. Wong
2017-10-14 11:55 ` Brian Foster
2017-10-14 19:05   ` Darrick J. Wong
2017-10-16 10:37     ` Brian Foster
2017-10-16 21:29     ` Dave Chinner
2017-10-16 22:18       ` Darrick J. Wong
2017-10-17 14:53         ` Brian Foster
2017-10-20 15:16         ` Brian Foster
2017-10-20 16:44           ` Darrick J. Wong
2017-10-20 16:59             ` Brian Foster
2017-10-20 18:00               ` Darrick J. Wong
2017-10-21  6:10                 ` Darrick J. Wong
2017-10-23 13:08                   ` Brian Foster
2017-10-14 22:07   ` Dave Chinner
2017-10-16 10:38     ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.