linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Brian Foster <bfoster@redhat.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Dave Chiluk <chiluk+linuxxfs@indeed.com>
Subject: [PATCH 4.4 05/37] xfs: detect agfl count corruption and reset agfl
Date: Tue,  5 Jun 2018 19:01:10 +0200	[thread overview]
Message-ID: <20180605170109.209417858@linuxfoundation.org> (raw)
In-Reply-To: <20180605170108.884872354@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit a27ba2607e60312554cbcd43fc660b2c7f29dc9c upstream.

The struct xfs_agfl v5 header was originally introduced with
unexpected padding that caused the AGFL to operate with one less
slot than intended. The header has since been packed, but the fix
left an incompatibility for users who upgrade from an old kernel
with the unpacked header to a newer kernel with the packed header
while the AGFL happens to wrap around the end. The newer kernel
recognizes one extra slot at the physical end of the AGFL that the
previous kernel did not. The new kernel will eventually attempt to
allocate a block from that slot, which contains invalid data, and
cause a crash.

This condition can be detected by comparing the active range of the
AGFL to the count. While this detects a padding mismatch, it can
also trigger false positives for unrelated flcount corruption. Since
we cannot distinguish a size mismatch due to padding from unrelated
corruption, we can't trust the AGFL enough to simply repopulate the
empty slot.

Instead, avoid unnecessarily complex detection logic and and use a
solution that can handle any form of flcount corruption that slips
through read verifiers: distrust the entire AGFL and reset it to an
empty state. Any valid blocks within the AGFL are intentionally
leaked. This requires xfs_repair to rectify (which was already
necessary based on the state the AGFL was found in). The reset
mitigates the side effect of the padding mismatch problem from a
filesystem crash to a free space accounting inconsistency. The
generic approach also means that this patch can be safely backported
to kernels with or without a packed struct xfs_agfl.

Check the AGF for an invalid freelist count on initial read from
disk. If detected, set a flag on the xfs_perag to indicate that a
reset is required before the AGFL can be used. In the first
transaction that attempts to use a flagged AGFL, reset it to empty,
warn the user about the inconsistency and allow the freelist fixup
code to repopulate the AGFL with new blocks. The xfs_perag flag is
cleared to eliminate the need for repeated checks on each block
allocation operation.

This allows kernels that include the packing fix commit 96f859d52bcb
("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
to handle older unpacked AGFL formats without a filesystem crash.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chiluk <chiluk+linuxxfs@indeed.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_alloc.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h        |    1 
 fs/xfs/xfs_trace.h        |    9 +++-
 3 files changed, 103 insertions(+), 1 deletion(-)

--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1924,6 +1924,93 @@ xfs_alloc_space_available(
 }
 
 /*
+ * Check the agfl fields of the agf for inconsistency or corruption. The purpose
+ * is to detect an agfl header padding mismatch between current and early v5
+ * kernels. This problem manifests as a 1-slot size difference between the
+ * on-disk flcount and the active [first, last] range of a wrapped agfl. This
+ * may also catch variants of agfl count corruption unrelated to padding. Either
+ * way, we'll reset the agfl and warn the user.
+ *
+ * Return true if a reset is required before the agfl can be used, false
+ * otherwise.
+ */
+static bool
+xfs_agfl_needs_reset(
+	struct xfs_mount	*mp,
+	struct xfs_agf		*agf)
+{
+	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
+	uint32_t		l = be32_to_cpu(agf->agf_fllast);
+	uint32_t		c = be32_to_cpu(agf->agf_flcount);
+	int			agfl_size = XFS_AGFL_SIZE(mp);
+	int			active;
+
+	/* no agfl header on v4 supers */
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return false;
+
+	/*
+	 * The agf read verifier catches severe corruption of these fields.
+	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
+	 * the verifier allows it.
+	 */
+	if (f >= agfl_size || l >= agfl_size)
+		return true;
+	if (c > agfl_size)
+		return true;
+
+	/*
+	 * Check consistency between the on-disk count and the active range. An
+	 * agfl padding mismatch manifests as an inconsistent flcount.
+	 */
+	if (c && l >= f)
+		active = l - f + 1;
+	else if (c)
+		active = agfl_size - f + l + 1;
+	else
+		active = 0;
+
+	return active != c;
+}
+
+/*
+ * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
+ * agfl content cannot be trusted. Warn the user that a repair is required to
+ * recover leaked blocks.
+ *
+ * The purpose of this mechanism is to handle filesystems affected by the agfl
+ * header padding mismatch problem. A reset keeps the filesystem online with a
+ * relatively minor free space accounting inconsistency rather than suffer the
+ * inevitable crash from use of an invalid agfl block.
+ */
+static void
+xfs_agfl_reset(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+
+	ASSERT(pag->pagf_agflreset);
+	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
+
+	xfs_warn(mp,
+	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
+	       "Please unmount and run xfs_repair.",
+	         pag->pag_agno, pag->pagf_flcount);
+
+	agf->agf_flfirst = 0;
+	agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
+	agf->agf_flcount = 0;
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
+				    XFS_AGF_FLCOUNT);
+
+	pag->pagf_flcount = 0;
+	pag->pagf_agflreset = false;
+}
+
+/*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
  */
@@ -1983,6 +2070,10 @@ xfs_alloc_fix_freelist(
 		}
 	}
 
+	/* reset a padding mismatched agfl before final free space check */
+	if (pag->pagf_agflreset)
+		xfs_agfl_reset(tp, agbp, pag);
+
 	/* If there isn't enough total space or single-extent, reject it. */
 	need = xfs_alloc_min_freelist(mp, pag);
 	if (!xfs_alloc_space_available(args, need, flags))
@@ -2121,6 +2212,7 @@ xfs_alloc_get_freelist(
 		agf->agf_flfirst = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, -1);
 	xfs_trans_agflist_delta(tp, -1);
 	pag->pagf_flcount--;
@@ -2226,6 +2318,7 @@ xfs_alloc_put_freelist(
 		agf->agf_fllast = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, 1);
 	xfs_trans_agflist_delta(tp, 1);
 	pag->pagf_flcount++;
@@ -2417,6 +2510,7 @@ xfs_alloc_read_agf(
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
 		pag->pagf_init = 1;
+		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
 	}
 #ifdef DEBUG
 	else if (!XFS_FORCED_SHUTDOWN(mp)) {
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -278,6 +278,7 @@ typedef struct xfs_perag {
 	char		pagi_inodeok;	/* The agi is ok for inodes */
 	__uint8_t	pagf_levels[XFS_BTNUM_AGF];
 					/* # of levels in bno & cnt btree */
+	bool		pagf_agflreset; /* agfl requires reset before use */
 	__uint32_t	pagf_flcount;	/* count of blocks in freelist */
 	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
 	xfs_extlen_t	pagf_longest;	/* longest free space */
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1485,7 +1485,7 @@ TRACE_EVENT(xfs_trans_commit_lsn,
 		  __entry->lsn)
 );
 
-TRACE_EVENT(xfs_agf,
+DECLARE_EVENT_CLASS(xfs_agf_class,
 	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
 		 unsigned long caller_ip),
 	TP_ARGS(mp, agf, flags, caller_ip),
@@ -1541,6 +1541,13 @@ TRACE_EVENT(xfs_agf,
 		  __entry->longest,
 		  (void *)__entry->caller_ip)
 );
+#define DEFINE_AGF_EVENT(name) \
+DEFINE_EVENT(xfs_agf_class, name, \
+	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, agf, flags, caller_ip))
+DEFINE_AGF_EVENT(xfs_agf);
+DEFINE_AGF_EVENT(xfs_agfl_reset);
 
 TRACE_EVENT(xfs_free_extent,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,

  parent reply	other threads:[~2018-06-05 17:04 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-05 17:01 [PATCH 4.4 00/37] 4.4.136-stable review Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 01/37] arm64: lse: Add early clobbers to some input/output asm operands Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 02/37] powerpc/64s: Clear PCR on boot Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 03/37] USB: serial: cp210x: use tcflag_t to fix incompatible pointer type Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 04/37] sh: New gcc support Greg Kroah-Hartman
2018-06-05 17:01 ` Greg Kroah-Hartman [this message]
2018-06-05 17:01 ` [PATCH 4.4 06/37] Revert "ima: limit file hash setting by user to fix and log modes" Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 07/37] Input: elan_i2c_smbus - fix corrupted stack Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 08/37] tracing: Fix crash when freeing instances with event triggers Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 09/37] selinux: KASAN: slab-out-of-bounds in xattr_getsecurity Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 10/37] cfg80211: further limit wiphy names to 64 bytes Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 11/37] rtlwifi: rtl8192cu: Remove variable self-assignment in rf.c Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 12/37] ASoC: Intel: sst: remove redundant variable dma_dev_name Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 13/37] irda: fix overly long udelay() Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 14/37] tcp: avoid integer overflows in tcp_rcv_space_adjust() Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 15/37] i2c: rcar: make sure clocks are on when doing clock calculation Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 16/37] i2c: rcar: rework hw init Greg Kroah-Hartman
2018-06-18 18:46   ` Ben Hutchings
2018-06-25 10:05     ` Fabrizio Castro
2018-06-05 17:01 ` [PATCH 4.4 17/37] i2c: rcar: remove unused IOERROR state Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 18/37] i2c: rcar: remove spinlock Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 19/37] i2c: rcar: refactor setup of a msg Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 20/37] i2c: rcar: init new messages in irq Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 21/37] i2c: rcar: dont issue stop when HW does it automatically Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 22/37] i2c: rcar: check master irqs before slave irqs Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 23/37] i2c: rcar: revoke START request early Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 24/37] dmaengine: usb-dmac: fix endless loop in usb_dmac_chan_terminate_all() Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 25/37] iio:kfifo_buf: check for uint overflow Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 26/37] MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 27/37] MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 28/37] scsi: scsi_transport_srp: Fix shost to rport translation Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 29/37] stm class: Use vmalloc for the master map Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 30/37] hwtracing: stm: fix build error on some arches Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 32/37] Kbuild: change CC_OPTIMIZE_FOR_SIZE definition Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 33/37] fix io_destroy()/aio_complete() race Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 34/37] mm: fix the NULL mapping case in __isolate_lru_page() Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 35/37] sparc64: Add __multi3 for gcc 7.x and later Greg Kroah-Hartman
2018-06-05 17:01 ` [PATCH 4.4 36/37] sparc64: Dont clibber fixed registers in __multi4 Greg Kroah-Hartman
2018-06-05 21:59 ` [PATCH 4.4 00/37] 4.4.136-stable review Shuah Khan
2018-06-06  0:30 ` Nathan Chancellor
2018-06-06  8:19   ` Greg Kroah-Hartman
2018-06-06 11:18 ` Naresh Kamboju
2018-06-06 12:15   ` Greg Kroah-Hartman
2018-06-06 13:28 ` Guenter Roeck
2018-06-06 13:31   ` Greg Kroah-Hartman
2018-06-06 15:01     ` Guenter Roeck
2018-06-06 15:35       ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180605170109.209417858@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bfoster@redhat.com \
    --cc=chiluk+linuxxfs@indeed.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).