All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: detect agfl count corruption and reset agfl
@ 2018-03-14 17:17 Brian Foster
  2018-03-14 18:12 ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2018-03-14 17:17 UTC (permalink / raw)
  To: linux-xfs

The struct xfs_agfl v5 header was originally introduced with
unexpected padding that caused the AGFL to operate with one less
slot than intended. The header has since been packed, but the fix
left an incompatibility for users who upgrade from an old kernel
with the unpacked header to a newer kernel with the packed header
while the AGFL happens to wrap around the end. The newer kernel
recognizes one extra slot at the physical end of the AGFL that the
previous kernel did not. The new kernel will eventually attempt to
allocate a block from that slot, which contains invalid data, and
cause a crash.

This condition can be detected by comparing the active range of the
AGFL to the count. While this detects a padding mismatch, it can
also trigger false positives for unrelated flcount corruption. Since
we cannot distinguish a size mismatch due to padding from unrelated
corruption, we can't trust the AGFL enough to simply repopulate the
empty slot.

Instead, avoid unnecessarily complex detection logic and and use a
solution that can handle any form of flcount corruption that slips
through read verifiers: distrust the entire AGFL and reset it to an
empty state. Any valid blocks within the AGFL are intentionally
leaked. This requires xfs_repair to rectify (which was already
necessary based on the state the AGFL was found in). The reset
mitigates the side effect of the padding mismatch problem from a
filesystem crash to a free space accounting inconsistency. The
generic approach also means that this patch can be safely backported
to kernels with or without a packed struct xfs_agfl.

Check the AGF for an invalid freelist count on initial read from
disk. If detected, set a flag on the xfs_perag to indicate that a
reset is required before the AGFL can be used. In the first
transaction that attempts to use a flagged AGFL, reset it to empty,
warn the user about the inconsistency and allow the freelist fixup
code to repopulate the AGFL with new blocks. The xfs_perag flag is
cleared to eliminate the need for repeated checks on each block
allocation operation.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Hi all,

Here's an actual patch for the AGFL reset approach. This survives a
normal xfstests run. I also ran some quick xfstests runs with hacks in
place to unconditionally trigger an agfl reset on first read. This
caused a ton of test failures due to the resulting accounting
inconsistency on the scratch device, but otherwise it survived from a
torture test perspective and didn't introduce any unrelated regressions
that I could see.

Thoughts, reviews, flames appreciated.

Brian

v1:
- Use a simplified AGFL reset mechanism.
- Trigger on AGFL fixup rather than get/put.
- Various refactors, cleanups and comments.
rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2

 fs/xfs/libxfs/xfs_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h        |  1 +
 fs/xfs/xfs_trace.h        |  9 ++++-
 3 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3db90b707fb2..7dfec92f28dc 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2063,6 +2063,95 @@ xfs_alloc_space_available(
 }
 
 /*
+ * Check the agfl fields of the agf for inconsistency or corruption. The purpose
+ * is to detect an agfl header padding mismatch between current and early v5
+ * kernels. This problem manifests as a 1-slot size difference between the
+ * on-disk flcount and the active [first, last] range of a wrapped agfl. This
+ * may also catch variants of agfl count corruption unrelated to padding. Either
+ * way, we'll reset the agfl and warn the user.
+ *
+ * Return true if a reset is required before the agfl can be used, false
+ * otherwise.
+ */
+static bool
+xfs_agfl_need_reset(
+	struct xfs_mount	*mp,
+	struct xfs_agf		*agf)
+{
+	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
+	uint32_t		l = be32_to_cpu(agf->agf_fllast);
+	uint32_t		c = be32_to_cpu(agf->agf_flcount);
+	int			agfl_size = xfs_agfl_size(mp);
+	int			active;
+
+	/* no agfl header on v4 supers */
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return false;
+
+	/*
+	 * The agf read verifier catches severe corruption of these fields.
+	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
+	 * the verifier allows it.
+	 */
+	if (f >= agfl_size || l >= agfl_size)
+		return true;
+	if (c > agfl_size)
+		return true;
+
+	/*
+	 * Check consistency between the on-disk count and the active range. An
+	 * agfl padding mismatch manifests as an inconsistent flcount.
+	 */
+	if (c && l >= f)
+		active = l - f + 1;
+	else if (c)
+		active = agfl_size - f + l + 1;
+	else
+		active = 0;
+	if (active != c)
+		return true;
+
+	return false;
+}
+
+/*
+ * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
+ * agfl content cannot be trusted. Warn the user that a repair is required to
+ * recover leaked blocks.
+ *
+ * The purpose of this mechanism is to handle filesystems affected by the agfl
+ * header padding mismatch problem. A reset keeps the filesystem online with a
+ * relatively minor free space accounting inconsistency rather than suffer the
+ * inevitable crash from use of an invalid agfl block.
+ */
+static void
+xfs_agfl_reset(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+
+	ASSERT(pag->pagf_agflreset);
+	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
+
+	xfs_warn(mp,
+	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
+	       "Please unmount and run xfs_repair.",
+	         pag->pag_agno, pag->pagf_flcount);
+
+	agf->agf_flfirst = 0;
+	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
+	agf->agf_flcount = 0;
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
+				    XFS_AGF_FLCOUNT);
+
+	pag->pagf_flcount = 0;
+	pag->pagf_agflreset = false;
+}
+
+/*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
  */
@@ -2123,6 +2212,10 @@ xfs_alloc_fix_freelist(
 		}
 	}
 
+	/* reset a padding mismatched agfl before final free space check */
+	if (pag->pagf_agflreset)
+		xfs_agfl_reset(tp, agbp, pag);
+
 	/* If there isn't enough total space or single-extent, reject it. */
 	need = xfs_alloc_min_freelist(mp, pag);
 	if (!xfs_alloc_space_available(args, need, flags))
@@ -2279,6 +2372,7 @@ xfs_alloc_get_freelist(
 		agf->agf_flfirst = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, -1);
 	xfs_trans_agflist_delta(tp, -1);
 	pag->pagf_flcount--;
@@ -2390,6 +2484,7 @@ xfs_alloc_put_freelist(
 		agf->agf_fllast = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, 1);
 	xfs_trans_agflist_delta(tp, 1);
 	pag->pagf_flcount++;
@@ -2597,6 +2692,7 @@ xfs_alloc_read_agf(
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
 		pag->pagf_init = 1;
+		pag->pagf_agflreset = xfs_agfl_need_reset(mp, agf);
 	}
 #ifdef DEBUG
 	else if (!XFS_FORCED_SHUTDOWN(mp)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 1808f56decaa..10b90bbc5162 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -353,6 +353,7 @@ typedef struct xfs_perag {
 	char		pagi_inodeok;	/* The agi is ok for inodes */
 	uint8_t		pagf_levels[XFS_BTNUM_AGF];
 					/* # of levels in bno & cnt btree */
+	bool		pagf_agflreset; /* agfl requires reset before use */
 	uint32_t	pagf_flcount;	/* count of blocks in freelist */
 	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
 	xfs_extlen_t	pagf_longest;	/* longest free space */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 945de08af7ba..a982c0b623d0 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
 		  __entry->tlen)
 );
 
-TRACE_EVENT(xfs_agf,
+DECLARE_EVENT_CLASS(xfs_agf_class,
 	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
 		 unsigned long caller_ip),
 	TP_ARGS(mp, agf, flags, caller_ip),
@@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
 		  __entry->longest,
 		  (void *)__entry->caller_ip)
 );
+#define DEFINE_AGF_EVENT(name) \
+DEFINE_EVENT(xfs_agf_class, name, \
+	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, agf, flags, caller_ip))
+DEFINE_AGF_EVENT(xfs_agf);
+DEFINE_AGF_EVENT(xfs_agfl_reset);
 
 TRACE_EVENT(xfs_free_extent,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-14 17:17 [PATCH] xfs: detect agfl count corruption and reset agfl Brian Foster
@ 2018-03-14 18:12 ` Darrick J. Wong
  2018-03-14 20:42   ` Dave Chiluk
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2018-03-14 18:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
> The struct xfs_agfl v5 header was originally introduced with
> unexpected padding that caused the AGFL to operate with one less
> slot than intended. The header has since been packed, but the fix
> left an incompatibility for users who upgrade from an old kernel
> with the unpacked header to a newer kernel with the packed header
> while the AGFL happens to wrap around the end. The newer kernel
> recognizes one extra slot at the physical end of the AGFL that the
> previous kernel did not. The new kernel will eventually attempt to
> allocate a block from that slot, which contains invalid data, and
> cause a crash.
> 
> This condition can be detected by comparing the active range of the
> AGFL to the count. While this detects a padding mismatch, it can
> also trigger false positives for unrelated flcount corruption. Since
> we cannot distinguish a size mismatch due to padding from unrelated
> corruption, we can't trust the AGFL enough to simply repopulate the
> empty slot.
> 
> Instead, avoid unnecessarily complex detection logic and and use a
> solution that can handle any form of flcount corruption that slips
> through read verifiers: distrust the entire AGFL and reset it to an
> empty state. Any valid blocks within the AGFL are intentionally
> leaked. This requires xfs_repair to rectify (which was already
> necessary based on the state the AGFL was found in). The reset
> mitigates the side effect of the padding mismatch problem from a
> filesystem crash to a free space accounting inconsistency. The
> generic approach also means that this patch can be safely backported
> to kernels with or without a packed struct xfs_agfl.
> 
> Check the AGF for an invalid freelist count on initial read from
> disk. If detected, set a flag on the xfs_perag to indicate that a
> reset is required before the AGFL can be used. In the first
> transaction that attempts to use a flagged AGFL, reset it to empty,
> warn the user about the inconsistency and allow the freelist fixup
> code to repopulate the AGFL with new blocks. The xfs_perag flag is
> cleared to eliminate the need for repeated checks on each block
> allocation operation.
> 
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Hi all,
> 
> Here's an actual patch for the AGFL reset approach. This survives a
> normal xfstests run. I also ran some quick xfstests runs with hacks in
> place to unconditionally trigger an agfl reset on first read. This
> caused a ton of test failures due to the resulting accounting
> inconsistency on the scratch device, but otherwise it survived from a
> torture test perspective and didn't introduce any unrelated regressions
> that I could see.
> 
> Thoughts, reviews, flames appreciated.
> 
> Brian
> 
> v1:
> - Use a simplified AGFL reset mechanism.
> - Trigger on AGFL fixup rather than get/put.
> - Various refactors, cleanups and comments.
> rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> 
>  fs/xfs/libxfs/xfs_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_mount.h        |  1 +
>  fs/xfs/xfs_trace.h        |  9 ++++-
>  3 files changed, 105 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 3db90b707fb2..7dfec92f28dc 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2063,6 +2063,95 @@ xfs_alloc_space_available(
>  }
>  
>  /*
> + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> + * is to detect an agfl header padding mismatch between current and early v5
> + * kernels. This problem manifests as a 1-slot size difference between the
> + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> + * may also catch variants of agfl count corruption unrelated to padding. Either
> + * way, we'll reset the agfl and warn the user.
> + *
> + * Return true if a reset is required before the agfl can be used, false
> + * otherwise.
> + */
> +static bool
> +xfs_agfl_need_reset(
> +	struct xfs_mount	*mp,
> +	struct xfs_agf		*agf)
> +{
> +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> +	int			agfl_size = xfs_agfl_size(mp);
> +	int			active;
> +
> +	/* no agfl header on v4 supers */
> +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> +		return false;
> +
> +	/*
> +	 * The agf read verifier catches severe corruption of these fields.
> +	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
> +	 * the verifier allows it.
> +	 */
> +	if (f >= agfl_size || l >= agfl_size)
> +		return true;
> +	if (c > agfl_size)
> +		return true;
> +
> +	/*
> +	 * Check consistency between the on-disk count and the active range. An
> +	 * agfl padding mismatch manifests as an inconsistent flcount.
> +	 */
> +	if (c && l >= f)
> +		active = l - f + 1;
> +	else if (c)
> +		active = agfl_size - f + l + 1;
> +	else
> +		active = 0;
> +	if (active != c)
> +		return true;
> +
> +	return false;

This could become "return active != c;"

But otherwise looks ok enough to try it out,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +}
> +
> +/*
> + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
> + * agfl content cannot be trusted. Warn the user that a repair is required to
> + * recover leaked blocks.
> + *
> + * The purpose of this mechanism is to handle filesystems affected by the agfl
> + * header padding mismatch problem. A reset keeps the filesystem online with a
> + * relatively minor free space accounting inconsistency rather than suffer the
> + * inevitable crash from use of an invalid agfl block.
> + */
> +static void
> +xfs_agfl_reset(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> +
> +	ASSERT(pag->pagf_agflreset);
> +	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
> +
> +	xfs_warn(mp,
> +	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
> +	       "Please unmount and run xfs_repair.",
> +	         pag->pag_agno, pag->pagf_flcount);
> +
> +	agf->agf_flfirst = 0;
> +	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
> +	agf->agf_flcount = 0;
> +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> +				    XFS_AGF_FLCOUNT);
> +
> +	pag->pagf_flcount = 0;
> +	pag->pagf_agflreset = false;
> +}
> +
> +/*
>   * Decide whether to use this allocation group for this allocation.
>   * If so, fix up the btree freelist's size.
>   */
> @@ -2123,6 +2212,10 @@ xfs_alloc_fix_freelist(
>  		}
>  	}
>  
> +	/* reset a padding mismatched agfl before final free space check */
> +	if (pag->pagf_agflreset)
> +		xfs_agfl_reset(tp, agbp, pag);
> +
>  	/* If there isn't enough total space or single-extent, reject it. */
>  	need = xfs_alloc_min_freelist(mp, pag);
>  	if (!xfs_alloc_space_available(args, need, flags))
> @@ -2279,6 +2372,7 @@ xfs_alloc_get_freelist(
>  		agf->agf_flfirst = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	ASSERT(!pag->pagf_agflreset);
>  	be32_add_cpu(&agf->agf_flcount, -1);
>  	xfs_trans_agflist_delta(tp, -1);
>  	pag->pagf_flcount--;
> @@ -2390,6 +2484,7 @@ xfs_alloc_put_freelist(
>  		agf->agf_fllast = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	ASSERT(!pag->pagf_agflreset);
>  	be32_add_cpu(&agf->agf_flcount, 1);
>  	xfs_trans_agflist_delta(tp, 1);
>  	pag->pagf_flcount++;
> @@ -2597,6 +2692,7 @@ xfs_alloc_read_agf(
>  		pag->pagb_count = 0;
>  		pag->pagb_tree = RB_ROOT;
>  		pag->pagf_init = 1;
> +		pag->pagf_agflreset = xfs_agfl_need_reset(mp, agf);
>  	}
>  #ifdef DEBUG
>  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 1808f56decaa..10b90bbc5162 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -353,6 +353,7 @@ typedef struct xfs_perag {
>  	char		pagi_inodeok;	/* The agi is ok for inodes */
>  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
>  					/* # of levels in bno & cnt btree */
> +	bool		pagf_agflreset; /* agfl requires reset before use */
>  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
>  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
>  	xfs_extlen_t	pagf_longest;	/* longest free space */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 945de08af7ba..a982c0b623d0 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
>  		  __entry->tlen)
>  );
>  
> -TRACE_EVENT(xfs_agf,
> +DECLARE_EVENT_CLASS(xfs_agf_class,
>  	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
>  		 unsigned long caller_ip),
>  	TP_ARGS(mp, agf, flags, caller_ip),
> @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
>  		  __entry->longest,
>  		  (void *)__entry->caller_ip)
>  );
> +#define DEFINE_AGF_EVENT(name) \
> +DEFINE_EVENT(xfs_agf_class, name, \
> +	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
> +		 unsigned long caller_ip), \
> +	TP_ARGS(mp, agf, flags, caller_ip))
> +DEFINE_AGF_EVENT(xfs_agf);
> +DEFINE_AGF_EVENT(xfs_agfl_reset);
>  
>  TRACE_EVENT(xfs_free_extent,
>  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-14 18:12 ` Darrick J. Wong
@ 2018-03-14 20:42   ` Dave Chiluk
  2018-03-15 10:38     ` Brian Foster
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chiluk @ 2018-03-14 20:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
>> The struct xfs_agfl v5 header was originally introduced with
>> unexpected padding that caused the AGFL to operate with one less
>> slot than intended. The header has since been packed, but the fix
>> left an incompatibility for users who upgrade from an old kernel
>> with the unpacked header to a newer kernel with the packed header
>> while the AGFL happens to wrap around the end. The newer kernel
>> recognizes one extra slot at the physical end of the AGFL that the
>> previous kernel did not. The new kernel will eventually attempt to
>> allocate a block from that slot, which contains invalid data, and
>> cause a crash.
>>
>> This condition can be detected by comparing the active range of the
>> AGFL to the count. While this detects a padding mismatch, it can
>> also trigger false positives for unrelated flcount corruption. Since
>> we cannot distinguish a size mismatch due to padding from unrelated
>> corruption, we can't trust the AGFL enough to simply repopulate the
>> empty slot.
>>
>> Instead, avoid unnecessarily complex detection logic and and use a
>> solution that can handle any form of flcount corruption that slips
>> through read verifiers: distrust the entire AGFL and reset it to an
>> empty state. Any valid blocks within the AGFL are intentionally
>> leaked. This requires xfs_repair to rectify (which was already
>> necessary based on the state the AGFL was found in). The reset
>> mitigates the side effect of the padding mismatch problem from a
>> filesystem crash to a free space accounting inconsistency. The
>> generic approach also means that this patch can be safely backported
>> to kernels with or without a packed struct xfs_agfl.
>>
>> Check the AGF for an invalid freelist count on initial read from
>> disk. If detected, set a flag on the xfs_perag to indicate that a
>> reset is required before the AGFL can be used. In the first
>> transaction that attempts to use a flagged AGFL, reset it to empty,
>> warn the user about the inconsistency and allow the freelist fixup
>> code to repopulate the AGFL with new blocks. The xfs_perag flag is
>> cleared to eliminate the need for repeated checks on each block
>> allocation operation.
>>
>> Suggested-by: Dave Chinner <david@fromorbit.com>
>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>> ---
>>
>> Hi all,
>>
>> Here's an actual patch for the AGFL reset approach. This survives a
>> normal xfstests run. I also ran some quick xfstests runs with hacks in
>> place to unconditionally trigger an agfl reset on first read. This
>> caused a ton of test failures due to the resulting accounting
>> inconsistency on the scratch device, but otherwise it survived from a
>> torture test perspective and didn't introduce any unrelated regressions
>> that I could see.
>>
>> Thoughts, reviews, flames appreciated.
>>
>> Brian
>>
>> v1:
>> - Use a simplified AGFL reset mechanism.
>> - Trigger on AGFL fixup rather than get/put.
>> - Various refactors, cleanups and comments.
>> rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
>>
>>  fs/xfs/libxfs/xfs_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/xfs/xfs_mount.h        |  1 +
>>  fs/xfs/xfs_trace.h        |  9 ++++-
>>  3 files changed, 105 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
>> index 3db90b707fb2..7dfec92f28dc 100644
>> --- a/fs/xfs/libxfs/xfs_alloc.c
>> +++ b/fs/xfs/libxfs/xfs_alloc.c
>> @@ -2063,6 +2063,95 @@ xfs_alloc_space_available(
>>  }
>>
>>  /*
>> + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
>> + * is to detect an agfl header padding mismatch between current and early v5
>> + * kernels. This problem manifests as a 1-slot size difference between the
>> + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
>> + * may also catch variants of agfl count corruption unrelated to padding. Either
>> + * way, we'll reset the agfl and warn the user.
>> + *
>> + * Return true if a reset is required before the agfl can be used, false
>> + * otherwise.
>> + */
>> +static bool
>> +xfs_agfl_need_reset(

Nitpick: Rename xfs_agfl_need_reset to xfs_agfl_needs_reset for better
English and flow.

>> +     struct xfs_mount        *mp,
>> +     struct xfs_agf          *agf)
>> +{
>> +     uint32_t                f = be32_to_cpu(agf->agf_flfirst);
>> +     uint32_t                l = be32_to_cpu(agf->agf_fllast);
>> +     uint32_t                c = be32_to_cpu(agf->agf_flcount);
>> +     int                     agfl_size = xfs_agfl_size(mp);
>> +     int                     active;
>> +
>> +     /* no agfl header on v4 supers */
>> +     if (!xfs_sb_version_hascrc(&mp->m_sb))
>> +             return false;
>> +
>> +     /*
>> +      * The agf read verifier catches severe corruption of these fields.
>> +      * Repeat some sanity checks to cover a packed -> unpacked mismatch if
>> +      * the verifier allows it.
>> +      */
>> +     if (f >= agfl_size || l >= agfl_size)
>> +             return true;
>> +     if (c > agfl_size)
>> +             return true;
>> +
>> +     /*
>> +      * Check consistency between the on-disk count and the active range. An
>> +      * agfl padding mismatch manifests as an inconsistent flcount.
>> +      */
>> +     if (c && l >= f)
>> +             active = l - f + 1;
>> +     else if (c)
>> +             active = agfl_size - f + l + 1;
>> +     else
>> +             active = 0;
>> +     if (active != c)
>> +             return true;
>> +
>> +     return false;
>
> This could become "return active != c;"
>
> But otherwise looks ok enough to try it out,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>
> --D
>
>> +}
>> +
>> +/*
>> + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
>> + * agfl content cannot be trusted. Warn the user that a repair is required to
>> + * recover leaked blocks.
>> + *
>> + * The purpose of this mechanism is to handle filesystems affected by the agfl
>> + * header padding mismatch problem. A reset keeps the filesystem online with a
>> + * relatively minor free space accounting inconsistency rather than suffer the
>> + * inevitable crash from use of an invalid agfl block.
>> + */
>> +static void
>> +xfs_agfl_reset(
>> +     struct xfs_trans        *tp,
>> +     struct xfs_buf          *agbp,
>> +     struct xfs_perag        *pag)
>> +{
>> +     struct xfs_mount        *mp = tp->t_mountp;
>> +     struct xfs_agf          *agf = XFS_BUF_TO_AGF(agbp);
>> +
>> +     ASSERT(pag->pagf_agflreset);
>> +     trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
>> +
>> +     xfs_warn(mp,
>> +            "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
>> +            "Please unmount and run xfs_repair.",
>> +              pag->pag_agno, pag->pagf_flcount);
>> +
>> +     agf->agf_flfirst = 0;
>> +     agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
>> +     agf->agf_flcount = 0;
>> +     xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
>> +                                 XFS_AGF_FLCOUNT);
>> +
>> +     pag->pagf_flcount = 0;
>> +     pag->pagf_agflreset = false;
>> +}
>> +
>> +/*
>>   * Decide whether to use this allocation group for this allocation.
>>   * If so, fix up the btree freelist's size.
>>   */
>> @@ -2123,6 +2212,10 @@ xfs_alloc_fix_freelist(
>>               }
>>       }
>>
>> +     /* reset a padding mismatched agfl before final free space check */
>> +     if (pag->pagf_agflreset)
>> +             xfs_agfl_reset(tp, agbp, pag);
>> +
>>       /* If there isn't enough total space or single-extent, reject it. */
>>       need = xfs_alloc_min_freelist(mp, pag);
>>       if (!xfs_alloc_space_available(args, need, flags))
>> @@ -2279,6 +2372,7 @@ xfs_alloc_get_freelist(
>>               agf->agf_flfirst = 0;
>>
>>       pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
>> +     ASSERT(!pag->pagf_agflreset);
>>       be32_add_cpu(&agf->agf_flcount, -1);
>>       xfs_trans_agflist_delta(tp, -1);
>>       pag->pagf_flcount--;
>> @@ -2390,6 +2484,7 @@ xfs_alloc_put_freelist(
>>               agf->agf_fllast = 0;
>>
>>       pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
>> +     ASSERT(!pag->pagf_agflreset);
>>       be32_add_cpu(&agf->agf_flcount, 1);
>>       xfs_trans_agflist_delta(tp, 1);
>>       pag->pagf_flcount++;
>> @@ -2597,6 +2692,7 @@ xfs_alloc_read_agf(
>>               pag->pagb_count = 0;
>>               pag->pagb_tree = RB_ROOT;
>>               pag->pagf_init = 1;
>> +             pag->pagf_agflreset = xfs_agfl_need_reset(mp, agf);
>>       }
>>  #ifdef DEBUG
>>       else if (!XFS_FORCED_SHUTDOWN(mp)) {
>> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
>> index 1808f56decaa..10b90bbc5162 100644
>> --- a/fs/xfs/xfs_mount.h
>> +++ b/fs/xfs/xfs_mount.h
>> @@ -353,6 +353,7 @@ typedef struct xfs_perag {
>>       char            pagi_inodeok;   /* The agi is ok for inodes */
>>       uint8_t         pagf_levels[XFS_BTNUM_AGF];
>>                                       /* # of levels in bno & cnt btree */
>> +     bool            pagf_agflreset; /* agfl requires reset before use */
>>       uint32_t        pagf_flcount;   /* count of blocks in freelist */
>>       xfs_extlen_t    pagf_freeblks;  /* total free blocks */
>>       xfs_extlen_t    pagf_longest;   /* longest free space */
>> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
>> index 945de08af7ba..a982c0b623d0 100644
>> --- a/fs/xfs/xfs_trace.h
>> +++ b/fs/xfs/xfs_trace.h
>> @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
>>                 __entry->tlen)
>>  );
>>
>> -TRACE_EVENT(xfs_agf,
>> +DECLARE_EVENT_CLASS(xfs_agf_class,
>>       TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
>>                unsigned long caller_ip),
>>       TP_ARGS(mp, agf, flags, caller_ip),
>> @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
>>                 __entry->longest,
>>                 (void *)__entry->caller_ip)
>>  );
>> +#define DEFINE_AGF_EVENT(name) \
>> +DEFINE_EVENT(xfs_agf_class, name, \
>> +     TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
>> +              unsigned long caller_ip), \
>> +     TP_ARGS(mp, agf, flags, caller_ip))
>> +DEFINE_AGF_EVENT(xfs_agf);
>> +DEFINE_AGF_EVENT(xfs_agfl_reset);
>>
>>  TRACE_EVENT(xfs_free_extent,
>>       TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
>> --
>> 2.13.6
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>

I'm also assuming this will get submitted back to the linux-stable
trees as the agfl packing change is already causing issues in the
stable trees.  If you do not intend to push it into the linux-stable
trees let me know and I'll take care of at least the major ones.

Thanks,
Dave

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-14 20:42   ` Dave Chiluk
@ 2018-03-15 10:38     ` Brian Foster
  2018-03-15 15:46       ` Darrick J. Wong
  2018-03-15 22:26       ` Dave Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Brian Foster @ 2018-03-15 10:38 UTC (permalink / raw)
  To: Dave Chiluk; +Cc: Darrick J. Wong, linux-xfs

On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote:
> On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
...
> 
> Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> 
> I'm also assuming this will get submitted back to the linux-stable
> trees as the agfl packing change is already causing issues in the
> stable trees.  If you do not intend to push it into the linux-stable
> trees let me know and I'll take care of at least the major ones.
> 

Yeah, I can cc stable in the next post along with the other minor fixes.
My question is how far back should this fix go? Was the plan to only go
back to v4.5 because that is where the packing fix first went in? Or
should this go back further because it looks like the packing fix was
backported to v3.10:

$ git show 96f859d52bcb1
commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Mon Jan 4 16:13:21 2016 +1100

    libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
    
    ...
    
    cc: <stable@vger.kernel.org> # 3.10 - 4.4
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Brian

> Thanks,
> Dave
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-15 10:38     ` Brian Foster
@ 2018-03-15 15:46       ` Darrick J. Wong
  2018-03-15 16:27         ` Dave Chiluk
  2018-03-15 22:26       ` Dave Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2018-03-15 15:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chiluk, linux-xfs

On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote:
> On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote:
> > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
> > <darrick.wong@oracle.com> wrote:
> > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
> ...
> > 
> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> > 
> > I'm also assuming this will get submitted back to the linux-stable
> > trees as the agfl packing change is already causing issues in the
> > stable trees.  If you do not intend to push it into the linux-stable
> > trees let me know and I'll take care of at least the major ones.
> > 
> 
> Yeah, I can cc stable in the next post along with the other minor fixes.
> My question is how far back should this fix go? Was the plan to only go
> back to v4.5 because that is where the packing fix first went in? Or
> should this go back further because it looks like the packing fix was
> backported to v3.10:
> 
> $ git show 96f859d52bcb1
> commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30
> Author: Darrick J. Wong <darrick.wong@oracle.com>
> Date:   Mon Jan 4 16:13:21 2016 +1100
> 
>     libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
>     
>     ...
>     
>     cc: <stable@vger.kernel.org> # 3.10 - 4.4
>     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>     Reviewed-by: Dave Chinner <dchinner@redhat.com>
>     Signed-off-by: Dave Chinner <david@fromorbit.com>

Hmmm, I'm assuming that you'd want 3.10 at least for RHEL, but I'll let
you all figure that one out.

As far as the upstream kernels, 4.14.27, 4.9.87, 4.4.121, and 4.1.50
have that packing patch so I guess they'll all need some version of this.

--D

> 
> Brian
> 
> > Thanks,
> > Dave
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-15 15:46       ` Darrick J. Wong
@ 2018-03-15 16:27         ` Dave Chiluk
  2018-03-15 22:48           ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chiluk @ 2018-03-15 16:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On Thu, Mar 15, 2018 at 10:46 AM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote:
>> On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote:
>> > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
>> > <darrick.wong@oracle.com> wrote:
>> > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
>> ...
>> >
>> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
>> >
>> > I'm also assuming this will get submitted back to the linux-stable
>> > trees as the agfl packing change is already causing issues in the
>> > stable trees.  If you do not intend to push it into the linux-stable
>> > trees let me know and I'll take care of at least the major ones.
>> >
>>
>> Yeah, I can cc stable in the next post along with the other minor fixes.
>> My question is how far back should this fix go? Was the plan to only go
>> back to v4.5 because that is where the packing fix first went in? Or
>> should this go back further because it looks like the packing fix was
>> backported to v3.10:
>>
>> $ git show 96f859d52bcb1
>> commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30
>> Author: Darrick J. Wong <darrick.wong@oracle.com>
>> Date:   Mon Jan 4 16:13:21 2016 +1100
>>
>>     libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
>>
>>     ...
>>
>>     cc: <stable@vger.kernel.org> # 3.10 - 4.4
>>     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>>     Reviewed-by: Dave Chinner <dchinner@redhat.com>
>>     Signed-off-by: Dave Chinner <david@fromorbit.com>
>
> Hmmm, I'm assuming that you'd want 3.10 at least for RHEL, but I'll let
> you all figure that one out.
>
> As far as the upstream kernels, 4.14.27, 4.9.87, 4.4.121, and 4.1.50
> have that packing patch so I guess they'll all need some version of this.
>
> --D
>
>>
>> Brian
>>
>> > Thanks,
>> > Dave
>> > --

RHEL is actually fine for now, since they explicitly remove the
packing patch in their kernel, and xfsprogs.  Once you submit the
patches to linux-stable the ubuntu-kernel team monitors and includes
patches for the releases that they are stable maintainers of *(they
are downstream for 4.4 of gregkh, but currently maintain a 3.13, 4.13,
and 4.15 tree).

Also please add a Fixes line to your commit so it's obvious what patch
it helps remediate.  Fixes is actually not a great word here, but that
looks to be what the submitting-patches.txt doc calls for.

Fixes: 96f859d52bcb libxfs: pack the agfl header structure so
XFS_AGFL_SIZE is correct

This way stable maintainers understand that the fix resolves an issue
that was introduced by that patch, and can apply/not apply
appropriately.  Although in all honesty the patch really applies to
all stable kernels regardless of if the packing patch has been applied
or not.

Dave.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-15 10:38     ` Brian Foster
  2018-03-15 15:46       ` Darrick J. Wong
@ 2018-03-15 22:26       ` Dave Chinner
  2018-03-16 11:59         ` Brian Foster
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2018-03-15 22:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chiluk, Darrick J. Wong, linux-xfs

On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote:
> On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote:
> > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
> > <darrick.wong@oracle.com> wrote:
> > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
> ...
> > 
> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> > 
> > I'm also assuming this will get submitted back to the linux-stable
> > trees as the agfl packing change is already causing issues in the
> > stable trees.  If you do not intend to push it into the linux-stable
> > trees let me know and I'll take care of at least the major ones.
> > 
> 
> Yeah, I can cc stable in the next post along with the other minor fixes.
> My question is how far back should this fix go? Was the plan to only go
> back to v4.5 because that is where the packing fix first went in? Or
> should this go back further because it looks like the packing fix was
> backported to v3.10:
> 
> $ git show 96f859d52bcb1
> commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30
> Author: Darrick J. Wong <darrick.wong@oracle.com>
> Date:   Mon Jan 4 16:13:21 2016 +1100
> 
>     libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
>     
>     ...
>     
>     cc: <stable@vger.kernel.org> # 3.10 - 4.4
>     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>     Reviewed-by: Dave Chinner <dchinner@redhat.com>
>     Signed-off-by: Dave Chinner <david@fromorbit.com>

3.10 was when the problem was first introduced. I have no idea
whether it got backported that far but the stable kernel
maintainers, so you'll have to manually audit all current long-term
stable kernels to determine what kernels need backports.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-15 16:27         ` Dave Chiluk
@ 2018-03-15 22:48           ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2018-03-15 22:48 UTC (permalink / raw)
  To: Dave Chiluk; +Cc: Darrick J. Wong, Brian Foster, linux-xfs

On Thu, Mar 15, 2018 at 11:27:02AM -0500, Dave Chiluk wrote:
> On Thu, Mar 15, 2018 at 10:46 AM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote:
> >> On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote:
> >> > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
> >> > <darrick.wong@oracle.com> wrote:
> >> > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
> >> ...
> >> >
> >> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> >> >
> >> > I'm also assuming this will get submitted back to the linux-stable
> >> > trees as the agfl packing change is already causing issues in the
> >> > stable trees.  If you do not intend to push it into the linux-stable
> >> > trees let me know and I'll take care of at least the major ones.
> >> >
> >>
> >> Yeah, I can cc stable in the next post along with the other minor fixes.
> >> My question is how far back should this fix go? Was the plan to only go
> >> back to v4.5 because that is where the packing fix first went in? Or
> >> should this go back further because it looks like the packing fix was
> >> backported to v3.10:
> >>
> >> $ git show 96f859d52bcb1
> >> commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30
> >> Author: Darrick J. Wong <darrick.wong@oracle.com>
> >> Date:   Mon Jan 4 16:13:21 2016 +1100
> >>
> >>     libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
> >>
> >>     ...
> >>
> >>     cc: <stable@vger.kernel.org> # 3.10 - 4.4
> >>     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >>     Reviewed-by: Dave Chinner <dchinner@redhat.com>
> >>     Signed-off-by: Dave Chinner <david@fromorbit.com>
> >
> > Hmmm, I'm assuming that you'd want 3.10 at least for RHEL, but I'll let
> > you all figure that one out.
> >
> > As far as the upstream kernels, 4.14.27, 4.9.87, 4.4.121, and 4.1.50
> > have that packing patch so I guess they'll all need some version of this.
> >
> > --D
> >
> >>
> >> Brian
> >>
> >> > Thanks,
> >> > Dave
> >> > --
> 
> RHEL is actually fine for now, since they explicitly remove the
> packing patch in their kernel, and xfsprogs.  Once you submit the
> patches to linux-stable the ubuntu-kernel team monitors and includes
> patches for the releases that they are stable maintainers of *(they
> are downstream for 4.4 of gregkh, but currently maintain a 3.13, 4.13,
> and 4.15 tree).
> 
> Also please add a Fixes line to your commit so it's obvious what patch
> it helps remediate.  Fixes is actually not a great word here, but that
> looks to be what the submitting-patches.txt doc calls for.
> 
> Fixes: 96f859d52bcb libxfs: pack the agfl header structure so
> XFS_AGFL_SIZE is correct

No, please don't dumb down a complex issue to a simple, naive
metadata tag like this. Explain the issue fully in the commit
message, mentioning/referencing commits when appropriate.

As I've mentioned in another thread recently about backports - it
you are relying on "fixes" tags to determine what needs backporting,
your backporting process is fundamentally broken.  I don't care what
the kernel documentatin says - it frequently does not apply because
it's written by someone else for their own reasons and requirements
that aren't relevant to us. They are guidelines, not rules, for that
reason.

> This way stable maintainers understand that the fix resolves an issue
> that was introduced by that patch, and can apply/not apply
> appropriately.

I simply don't trust the stable process to get complex XFS backports
right and correctly tested. e.g. We've had this problem before with
things like error numbers changing sign @ 3.16 - patches from >3.16
were getting backported with negative errnos to kernels <3.16, and
they were breaking because errors were not being correctly detected. 

Because nobody in the stable process was regression testing
filesystem backports other than booting kernels, it wasn't until
users installed and started reporting stable kernel regressions to
us that we were able to identify the bugs and the process issues
that caused them.

Put simply: the stable kernel maintainers are not filesystem experts
and they don't run filesystem regression tests to determine that the
fixes don't have any unexpected side effects. What that means is
that stable kernel backports need to be done under the eye of an XFS
developer who then follows up by reviewing the backports once merged
and running regression tests agtainst the resulting kernel as we
cannot rely on the stable process to do this.  It's a serious amount
of work for something as critical as fixing an on-disk format
problem, and we simply can't trust anyone else to do the job
properly.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-03-15 22:26       ` Dave Chinner
@ 2018-03-16 11:59         ` Brian Foster
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2018-03-16 11:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Dave Chiluk, Darrick J. Wong, linux-xfs

On Fri, Mar 16, 2018 at 09:26:05AM +1100, Dave Chinner wrote:
> On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote:
> > On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote:
> > > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
> > > <darrick.wong@oracle.com> wrote:
> > > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
> > ...
> > > 
> > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> > > 
> > > I'm also assuming this will get submitted back to the linux-stable
> > > trees as the agfl packing change is already causing issues in the
> > > stable trees.  If you do not intend to push it into the linux-stable
> > > trees let me know and I'll take care of at least the major ones.
> > > 
> > 
> > Yeah, I can cc stable in the next post along with the other minor fixes.
> > My question is how far back should this fix go? Was the plan to only go
> > back to v4.5 because that is where the packing fix first went in? Or
> > should this go back further because it looks like the packing fix was
> > backported to v3.10:
> > 
> > $ git show 96f859d52bcb1
> > commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30
> > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > Date:   Mon Jan 4 16:13:21 2016 +1100
> > 
> >     libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
> >     
> >     ...
> >     
> >     cc: <stable@vger.kernel.org> # 3.10 - 4.4
> >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >     Reviewed-by: Dave Chinner <dchinner@redhat.com>
> >     Signed-off-by: Dave Chinner <david@fromorbit.com>
> 
> 3.10 was when the problem was first introduced. I have no idea
> whether it got backported that far but the stable kernel
> maintainers, so you'll have to manually audit all current long-term
> stable kernels to determine what kernels need backports.
> 

The earliest I saw it backported was to 3.16, which looks like Luis had
perhaps done that one. Otherwise, the kernels Darrick pointed out seem
to have the fix.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] xfs: detect agfl count corruption and reset agfl
  2018-05-31 23:41 [PATCH 0/1] Fix for xfs agfl wrap crash for stable kernels Dave Chiluk
@ 2018-05-31 23:41 ` Dave Chiluk
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chiluk @ 2018-05-31 23:41 UTC (permalink / raw)
  To: stable

From: Brian Foster <bfoster@redhat.com>

commit a27ba2607e60312554cbcd43fc660b2c7f29dc9c upstream

The struct xfs_agfl v5 header was originally introduced with
unexpected padding that caused the AGFL to operate with one less
slot than intended. The header has since been packed, but the fix
left an incompatibility for users who upgrade from an old kernel
with the unpacked header to a newer kernel with the packed header
while the AGFL happens to wrap around the end. The newer kernel
recognizes one extra slot at the physical end of the AGFL that the
previous kernel did not. The new kernel will eventually attempt to
allocate a block from that slot, which contains invalid data, and
cause a crash.

This condition can be detected by comparing the active range of the
AGFL to the count. While this detects a padding mismatch, it can
also trigger false positives for unrelated flcount corruption. Since
we cannot distinguish a size mismatch due to padding from unrelated
corruption, we can't trust the AGFL enough to simply repopulate the
empty slot.

Instead, avoid unnecessarily complex detection logic and and use a
solution that can handle any form of flcount corruption that slips
through read verifiers: distrust the entire AGFL and reset it to an
empty state. Any valid blocks within the AGFL are intentionally
leaked. This requires xfs_repair to rectify (which was already
necessary based on the state the AGFL was found in). The reset
mitigates the side effect of the padding mismatch problem from a
filesystem crash to a free space accounting inconsistency. The
generic approach also means that this patch can be safely backported
to kernels with or without a packed struct xfs_agfl.

Check the AGF for an invalid freelist count on initial read from
disk. If detected, set a flag on the xfs_perag to indicate that a
reset is required before the AGFL can be used. In the first
transaction that attempts to use a flagged AGFL, reset it to empty,
warn the user about the inconsistency and allow the freelist fixup
code to repopulate the AGFL with new blocks. The xfs_perag flag is
cleared to eliminate the need for repeated checks on each block
allocation operation.

This allows kernels that include the packing fix commit 96f859d52bcb
("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
to handle older unpacked AGFL formats without a filesystem crash.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

(backported from commit a27ba2607e60312554cbcd43fc660b2c7f29dc9c)
Signed-off-by: Dave Chiluk <chiluk+linuxxfs@indeed.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h        |  1 +
 fs/xfs/xfs_trace.h        |  9 +++-
 3 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index e1e7fe3b5424..b663b756f552 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1923,6 +1923,93 @@ xfs_alloc_space_available(
 	return true;
 }
 
+/*
+ * Check the agfl fields of the agf for inconsistency or corruption. The purpose
+ * is to detect an agfl header padding mismatch between current and early v5
+ * kernels. This problem manifests as a 1-slot size difference between the
+ * on-disk flcount and the active [first, last] range of a wrapped agfl. This
+ * may also catch variants of agfl count corruption unrelated to padding. Either
+ * way, we'll reset the agfl and warn the user.
+ *
+ * Return true if a reset is required before the agfl can be used, false
+ * otherwise.
+ */
+static bool
+xfs_agfl_needs_reset(
+	struct xfs_mount	*mp,
+	struct xfs_agf		*agf)
+{
+	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
+	uint32_t		l = be32_to_cpu(agf->agf_fllast);
+	uint32_t		c = be32_to_cpu(agf->agf_flcount);
+	int			agfl_size = XFS_AGFL_SIZE(mp);
+	int			active;
+
+	/* no agfl header on v4 supers */
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return false;
+
+	/*
+	 * The agf read verifier catches severe corruption of these fields.
+	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
+	 * the verifier allows it.
+	 */
+	if (f >= agfl_size || l >= agfl_size)
+		return true;
+	if (c > agfl_size)
+		return true;
+
+	/*
+	 * Check consistency between the on-disk count and the active range. An
+	 * agfl padding mismatch manifests as an inconsistent flcount.
+	 */
+	if (c && l >= f)
+		active = l - f + 1;
+	else if (c)
+		active = agfl_size - f + l + 1;
+	else
+		active = 0;
+
+	return active != c;
+}
+
+/*
+ * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
+ * agfl content cannot be trusted. Warn the user that a repair is required to
+ * recover leaked blocks.
+ *
+ * The purpose of this mechanism is to handle filesystems affected by the agfl
+ * header padding mismatch problem. A reset keeps the filesystem online with a
+ * relatively minor free space accounting inconsistency rather than suffer the
+ * inevitable crash from use of an invalid agfl block.
+ */
+static void
+xfs_agfl_reset(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+
+	ASSERT(pag->pagf_agflreset);
+	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
+
+	xfs_warn(mp,
+	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
+	       "Please unmount and run xfs_repair.",
+	         pag->pag_agno, pag->pagf_flcount);
+
+	agf->agf_flfirst = 0;
+	agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
+	agf->agf_flcount = 0;
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
+				    XFS_AGF_FLCOUNT);
+
+	pag->pagf_flcount = 0;
+	pag->pagf_agflreset = false;
+}
+
 /*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
@@ -1983,6 +2070,10 @@ xfs_alloc_fix_freelist(
 		}
 	}
 
+	/* reset a padding mismatched agfl before final free space check */
+	if (pag->pagf_agflreset)
+		xfs_agfl_reset(tp, agbp, pag);
+
 	/* If there isn't enough total space or single-extent, reject it. */
 	need = xfs_alloc_min_freelist(mp, pag);
 	if (!xfs_alloc_space_available(args, need, flags))
@@ -2121,6 +2212,7 @@ xfs_alloc_get_freelist(
 		agf->agf_flfirst = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, -1);
 	xfs_trans_agflist_delta(tp, -1);
 	pag->pagf_flcount--;
@@ -2226,6 +2318,7 @@ xfs_alloc_put_freelist(
 		agf->agf_fllast = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, 1);
 	xfs_trans_agflist_delta(tp, 1);
 	pag->pagf_flcount++;
@@ -2417,6 +2510,7 @@ xfs_alloc_read_agf(
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
 		pag->pagf_init = 1;
+		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
 	}
 #ifdef DEBUG
 	else if (!XFS_FORCED_SHUTDOWN(mp)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b57098481c10..ae3e52749f20 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -278,6 +278,7 @@ typedef struct xfs_perag {
 	char		pagi_inodeok;	/* The agi is ok for inodes */
 	__uint8_t	pagf_levels[XFS_BTNUM_AGF];
 					/* # of levels in bno & cnt btree */
+	bool		pagf_agflreset; /* agfl requires reset before use */
 	__uint32_t	pagf_flcount;	/* count of blocks in freelist */
 	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
 	xfs_extlen_t	pagf_longest;	/* longest free space */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 877079eb0f8f..cc6fa64821d2 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1485,7 +1485,7 @@ TRACE_EVENT(xfs_trans_commit_lsn,
 		  __entry->lsn)
 );
 
-TRACE_EVENT(xfs_agf,
+DECLARE_EVENT_CLASS(xfs_agf_class,
 	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
 		 unsigned long caller_ip),
 	TP_ARGS(mp, agf, flags, caller_ip),
@@ -1541,6 +1541,13 @@ TRACE_EVENT(xfs_agf,
 		  __entry->longest,
 		  (void *)__entry->caller_ip)
 );
+#define DEFINE_AGF_EVENT(name) \
+DEFINE_EVENT(xfs_agf_class, name, \
+	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, agf, flags, caller_ip))
+DEFINE_AGF_EVENT(xfs_agf);
+DEFINE_AGF_EVENT(xfs_agfl_reset);
 
 TRACE_EVENT(xfs_free_extent,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-05-31 23:44 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-14 17:17 [PATCH] xfs: detect agfl count corruption and reset agfl Brian Foster
2018-03-14 18:12 ` Darrick J. Wong
2018-03-14 20:42   ` Dave Chiluk
2018-03-15 10:38     ` Brian Foster
2018-03-15 15:46       ` Darrick J. Wong
2018-03-15 16:27         ` Dave Chiluk
2018-03-15 22:48           ` Dave Chinner
2018-03-15 22:26       ` Dave Chinner
2018-03-16 11:59         ` Brian Foster
2018-05-31 23:41 [PATCH 0/1] Fix for xfs agfl wrap crash for stable kernels Dave Chiluk
2018-05-31 23:41 ` [PATCH] xfs: detect agfl count corruption and reset agfl Dave Chiluk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.