All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] xfs: log recovery torn write detection
@ 2015-11-12 15:37 Brian Foster
  2015-11-12 15:37 ` [PATCH v2 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

Hi all,

Here's v2 of log recovery torn write detection. This fixes a couple bugs
as noted in the replies to the v1 series and replaces the error
injection mechanism with an alternate version that actually writes an
invalid log record CRC to disk. An xfstests test based on this mechanism
is forthcoming.

This survives xfstests and runs through a significant number of error
injection iterations without any related problems. I have seen one or
two assert failures and whatnot that do not appear related to this
series and so far are not repeatable. Thoughts, reviews, flames
appreciated.

Brian

v2:
- Fixed bug in first bad record detection to return correct block when
  log wraps.
- Update torn record handling to truncate back to tail.
- Fixed up warning messages.
- Replaced v1 error injection mechanism with log write time CRC error
  injection.
v1: http://oss.sgi.com/pipermail/xfs/2015-November/044841.html
- Added bug fix for mkfs log record header inconsistency.
- Refactored log recovery code to support a CRC-check-only recovery
  pass.
- CRC verify the last 8 records behind the head to account for
  concurrent log writes.
- Verify the tail of the log as well when the head is torn.
- Added (rfc) crc error injection patch for testing purposes.
rfc: http://oss.sgi.com/pipermail/xfs/2015-July/042415.html

Brian Foster (8):
  xfs: detect and handle invalid iclog size set by mkfs
  xfs: refactor log record unpack and data processing
  xfs: refactor and open code log record crc check
  xfs: return start block of first bad log record during recovery
  xfs: support a crc verification only log record pass
  xfs: refactor log record start detection into a new helper
  xfs: detect and trim torn writes during log recovery
  xfs: debug mode log record crc error injection

 fs/xfs/libxfs/xfs_log_recover.h |   1 +
 fs/xfs/xfs_log.c                |  45 +++-
 fs/xfs/xfs_log_priv.h           |   3 +
 fs/xfs/xfs_log_recover.c        | 569 ++++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_sysfs.c              |  36 +++
 5 files changed, 541 insertions(+), 113 deletions(-)

-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/8] xfs: detect and handle invalid iclog size set by mkfs
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  2015-11-12 15:37 ` [PATCH v2 2/8] xfs: refactor log record unpack and data processing Brian Foster
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

XFS log records have separate fields for the record size and the iclog
size used to write the record. mkfs.xfs zeroes the log and writes an
unmount record to generate a clean log for the subsequent mount. The
userspace record logging code has a bug where the iclog size (h_size)
field of the log record is hardcoded to 32k, even if a log stripe unit
is specified. The log record length is correctly extended to the stripe
unit. Since the kernel log recovery code uses the h_size field to
determine the log buffer size, this means that the kernel can attempt to
read/process records larger than the buffer size and overrun the buffer.

This has historically not been a problem because the kernel doesn't
actually run through log recovery in the clean unmount case. Instead,
the kernel detects that a single unmount record exists between the head
and tail and pushes the tail forward such that the log is viewed as
clean (head == tail). Once CRC verification is enabled, however, all
records at the head of the log are verified for CRC errors and thus we
are susceptible to overrun problems if the iclog field is not correct.

While the core problem must be fixed in userspace, this is historical
behavior that must be detected in the kernel to avoid severe side
effects such as memory corruption and crashes. Update the log buffer
size calculation code to detect this condition, warn the user and resize
the log buffer based on the log stripe unit. Return a corruption error
in cases where this does not look like a clean filesystem (i.e., the log
record header indicates more than one operation).

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c5ecaac..4f880d6 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4245,7 +4245,7 @@ xlog_do_recovery_pass(
 	xfs_daddr_t		blk_no;
 	char			*offset;
 	xfs_buf_t		*hbp, *dbp;
-	int			error = 0, h_size;
+	int			error = 0, h_size, h_len;
 	int			bblks, split_bblks;
 	int			hblks, split_hblks, wrapped_hblks;
 	struct hlist_head	rhash[XLOG_RHASH_SIZE];
@@ -4274,7 +4274,31 @@ xlog_do_recovery_pass(
 		error = xlog_valid_rec_header(log, rhead, tail_blk);
 		if (error)
 			goto bread_err1;
+
+		/*
+		 * xfsprogs has a bug where record length is based on lsunit but
+		 * h_size (iclog size) is hardcoded to 32k. Now that we
+		 * unconditionally CRC verify the unmount record, this means the
+		 * log buffer can be too small for the record and cause an
+		 * overrun.
+		 *
+		 * Detect this condition here. Use lsunit for the buffer size as
+		 * long as this looks like the mkfs case. Otherwise, return an
+		 * error to avoid a buffer overrun.
+		 */
 		h_size = be32_to_cpu(rhead->h_size);
+		h_len = be32_to_cpu(rhead->h_len);
+		if (h_len > h_size) {
+			if (h_len <= log->l_mp->m_logbsize &&
+			    be32_to_cpu(rhead->h_num_logops) == 1) {
+				xfs_warn(log->l_mp,
+		"invalid iclog size (%d bytes), using lsunit (%d bytes)",
+					 h_size, log->l_mp->m_logbsize);
+				h_size = log->l_mp->m_logbsize;
+			} else
+				return -EFSCORRUPTED;
+		}
+
 		if ((be32_to_cpu(rhead->h_version) & XLOG_VERSION_2) &&
 		    (h_size > XLOG_HEADER_CYCLE_SIZE)) {
 			hblks = h_size / XLOG_HEADER_CYCLE_SIZE;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/8] xfs: refactor log record unpack and data processing
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
  2015-11-12 15:37 ` [PATCH v2 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  2015-11-12 15:37 ` [PATCH v2 3/8] xfs: refactor and open code log record crc check Brian Foster
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

xlog_do_recovery_pass() duplicates a couple function calls related to
processing log records because the function must handle wrapping around
the end of the log if the head is behind the tail. This is implemented
as separate loops. CRC verification pass support will modify how records
are processed in both of these loops.

Rather than continue to duplicate code, factor the calls that process a
log record into a new helper and call that helper from both loops. This
patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 4f880d6..236ebaf 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4190,6 +4190,26 @@ xlog_unpack_data(
 	return 0;
 }
 
+/*
+ * Unpack and process a log record.
+ */
+STATIC int
+xlog_recover_process(
+	struct xlog		*log,
+	struct hlist_head	rhash[],
+	struct xlog_rec_header	*rhead,
+	char			*dp,
+	int			pass)
+{
+	int			error;
+
+	error = xlog_unpack_data(rhead, dp, log);
+	if (error)
+		return error;
+
+	return xlog_recover_process_data(log, rhash, rhead, dp, pass);
+}
+
 STATIC int
 xlog_valid_rec_header(
 	struct xlog		*log,
@@ -4432,12 +4452,8 @@ xlog_do_recovery_pass(
 					goto bread_err2;
 			}
 
-			error = xlog_unpack_data(rhead, offset, log);
-			if (error)
-				goto bread_err2;
-
-			error = xlog_recover_process_data(log, rhash,
-							rhead, offset, pass);
+			error = xlog_recover_process(log, rhash, rhead, offset,
+						     pass);
 			if (error)
 				goto bread_err2;
 			blk_no += bblks;
@@ -4465,12 +4481,7 @@ xlog_do_recovery_pass(
 		if (error)
 			goto bread_err2;
 
-		error = xlog_unpack_data(rhead, offset, log);
-		if (error)
-			goto bread_err2;
-
-		error = xlog_recover_process_data(log, rhash,
-						rhead, offset, pass);
+		error = xlog_recover_process(log, rhash, rhead, offset, pass);
 		if (error)
 			goto bread_err2;
 		blk_no += bblks + hblks;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/8] xfs: refactor and open code log record crc check
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
  2015-11-12 15:37 ` [PATCH v2 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
  2015-11-12 15:37 ` [PATCH v2 2/8] xfs: refactor log record unpack and data processing Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  2015-11-12 15:37 ` [PATCH v2 4/8] xfs: return start block of first bad log record during recovery Brian Foster
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

Log record CRC verification currently occurs during active log recovery,
immediately before a log record is unpacked. Therefore, the CRC
calculation code is buried within the data unpack function. CRC
verification pass support only needs to go so far as check the CRC, but
this is not easily allowed as the code is currently organized.

Since we now have a new log record processing helper, pull the record
CRC verification code out from the unpack helper and open-code it at the
top of the new process helper. This facilitates the ability to modify
how records are processed based on the type of the current pass. This
patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 72 +++++++++++++++++-------------------------------
 1 file changed, 26 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 236ebaf..9ec4bbd 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4118,46 +4118,6 @@ xlog_recover_process_iunlinks(
 	mp->m_dmevmask = mp_dmevmask;
 }
 
-/*
- * Upack the log buffer data and crc check it. If the check fails, issue a
- * warning if and only if the CRC in the header is non-zero. This makes the
- * check an advisory warning, and the zero CRC check will prevent failure
- * warnings from being emitted when upgrading the kernel from one that does not
- * add CRCs by default.
- *
- * When filesystems are CRC enabled, this CRC mismatch becomes a fatal log
- * corruption failure
- */
-STATIC int
-xlog_unpack_data_crc(
-	struct xlog_rec_header	*rhead,
-	char			*dp,
-	struct xlog		*log)
-{
-	__le32			crc;
-
-	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
-	if (crc != rhead->h_crc) {
-		if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) {
-			xfs_alert(log->l_mp,
-		"log record CRC mismatch: found 0x%x, expected 0x%x.",
-					le32_to_cpu(rhead->h_crc),
-					le32_to_cpu(crc));
-			xfs_hex_dump(dp, 32);
-		}
-
-		/*
-		 * If we've detected a log record corruption, then we can't
-		 * recover past this point. Abort recovery if we are enforcing
-		 * CRC protection by punting an error back up the stack.
-		 */
-		if (xfs_sb_version_hascrc(&log->l_mp->m_sb))
-			return -EFSCORRUPTED;
-	}
-
-	return 0;
-}
-
 STATIC int
 xlog_unpack_data(
 	struct xlog_rec_header	*rhead,
@@ -4165,11 +4125,6 @@ xlog_unpack_data(
 	struct xlog		*log)
 {
 	int			i, j, k;
-	int			error;
-
-	error = xlog_unpack_data_crc(rhead, dp, log);
-	if (error)
-		return error;
 
 	for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)) &&
 		  i < (XLOG_HEADER_CYCLE_SIZE / BBSIZE); i++) {
@@ -4191,7 +4146,7 @@ xlog_unpack_data(
 }
 
 /*
- * Unpack and process a log record.
+ * CRC check, unpack and process a log record.
  */
 STATIC int
 xlog_recover_process(
@@ -4202,6 +4157,31 @@ xlog_recover_process(
 	int			pass)
 {
 	int			error;
+	__le32			crc;
+
+	/*
+	 * Check the CRC and issue a warning if and only if the CRC in the
+	 * header is non-zero. This is an advisory warning and the zero CRC
+	 * check prevents warnings from being emitted when upgrading the kernel
+	 * from one that does not add CRCs by default.
+	 */
+	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
+	if (crc != le32_to_cpu(rhead->h_crc)) {
+		if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) {
+			xfs_alert(log->l_mp,
+		"log record CRC mismatch: found 0x%x, expected 0x%x.",
+					le32_to_cpu(rhead->h_crc),
+					le32_to_cpu(crc));
+			xfs_hex_dump(dp, 32);
+		}
+
+		/*
+		 * If the filesystem is CRC enabled, this mismatch becomes a
+		 * fatal log corruption failure.
+		 */
+		if (xfs_sb_version_hascrc(&log->l_mp->m_sb))
+			return -EFSCORRUPTED;
+	}
 
 	error = xlog_unpack_data(rhead, dp, log);
 	if (error)
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 4/8] xfs: return start block of first bad log record during recovery
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (2 preceding siblings ...)
  2015-11-12 15:37 ` [PATCH v2 3/8] xfs: refactor and open code log record crc check Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  2015-11-12 15:37 ` [PATCH v2 5/8] xfs: support a crc verification only log record pass Brian Foster
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

Each log recovery pass walks from the tail block to the head block and
processes records appropriately based on the associated log pass type.
There are various failure conditions that can occur through this
sequence, such as I/O errors, CRC errors, etc. Log torn write detection
will perform CRC verification near the head of the log to detect torn
writes and trim torn records from the log appropriately.

As it is, xlog_do_recovery_pass() only returns an error code in the
event of CRC failure, which isn't enough information to trim the head of
the log. Update xlog_do_recovery_pass() to optionally return the start
block of the associated record when an error occurs. This patch contains
no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9ec4bbd..e0318e8 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4239,10 +4239,12 @@ xlog_do_recovery_pass(
 	struct xlog		*log,
 	xfs_daddr_t		head_blk,
 	xfs_daddr_t		tail_blk,
-	int			pass)
+	int			pass,
+	xfs_daddr_t		*first_bad)	/* out: first bad log rec */
 {
 	xlog_rec_header_t	*rhead;
 	xfs_daddr_t		blk_no;
+	xfs_daddr_t		rhead_blk;
 	char			*offset;
 	xfs_buf_t		*hbp, *dbp;
 	int			error = 0, h_size, h_len;
@@ -4251,6 +4253,7 @@ xlog_do_recovery_pass(
 	struct hlist_head	rhash[XLOG_RHASH_SIZE];
 
 	ASSERT(head_blk != tail_blk);
+	rhead_blk = 0;
 
 	/*
 	 * Read the header of the tail block and get the iclog buffer size from
@@ -4325,7 +4328,7 @@ xlog_do_recovery_pass(
 	}
 
 	memset(rhash, 0, sizeof(rhash));
-	blk_no = tail_blk;
+	blk_no = rhead_blk = tail_blk;
 	if (tail_blk > head_blk) {
 		/*
 		 * Perform recovery around the end of the physical log.
@@ -4436,11 +4439,14 @@ xlog_do_recovery_pass(
 						     pass);
 			if (error)
 				goto bread_err2;
+
 			blk_no += bblks;
+			rhead_blk = blk_no;
 		}
 
 		ASSERT(blk_no >= log->l_logBBsize);
 		blk_no -= log->l_logBBsize;
+		rhead_blk = blk_no;
 	}
 
 	/* read first part of physical log */
@@ -4464,13 +4470,19 @@ xlog_do_recovery_pass(
 		error = xlog_recover_process(log, rhash, rhead, offset, pass);
 		if (error)
 			goto bread_err2;
+
 		blk_no += bblks + hblks;
+		rhead_blk = blk_no;
 	}
 
  bread_err2:
 	xlog_put_bp(dbp);
  bread_err1:
 	xlog_put_bp(hbp);
+
+	if (error && first_bad)
+		*first_bad = rhead_blk;
+
 	return error;
 }
 
@@ -4508,7 +4520,7 @@ xlog_do_log_recovery(
 		INIT_LIST_HEAD(&log->l_buf_cancel_table[i]);
 
 	error = xlog_do_recovery_pass(log, head_blk, tail_blk,
-				      XLOG_RECOVER_PASS1);
+				      XLOG_RECOVER_PASS1, NULL);
 	if (error != 0) {
 		kmem_free(log->l_buf_cancel_table);
 		log->l_buf_cancel_table = NULL;
@@ -4519,7 +4531,7 @@ xlog_do_log_recovery(
 	 * When it is complete free the table of buf cancel items.
 	 */
 	error = xlog_do_recovery_pass(log, head_blk, tail_blk,
-				      XLOG_RECOVER_PASS2);
+				      XLOG_RECOVER_PASS2, NULL);
 #ifdef DEBUG
 	if (!error) {
 		int	i;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 5/8] xfs: support a crc verification only log record pass
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (3 preceding siblings ...)
  2015-11-12 15:37 ` [PATCH v2 4/8] xfs: return start block of first bad log record during recovery Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  2015-11-12 15:37 ` [PATCH v2 6/8] xfs: refactor log record start detection into a new helper Brian Foster
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

Log recovery torn write detection uses CRC verification over a range of
the active log to identify torn writes. Since the generic log recovery
pass code implements a superset of the functionality required for CRC
verification, it can be easily modified to support a CRC verification
only pass.

Create a new CRC pass type and update the log record processing helper
to skip everything beyond CRC verification when in this mode. This pass
will be invoked in subsequent patches to implement torn write detection.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |  1 +
 fs/xfs/xfs_log_recover.c        | 24 +++++++++++++++++++-----
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 1c55ccb..8e385f9 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -60,6 +60,7 @@ typedef struct xlog_recover {
  */
 #define	XLOG_BC_TABLE_SIZE	64
 
+#define	XLOG_RECOVER_CRCPASS	0
 #define	XLOG_RECOVER_PASS1	1
 #define	XLOG_RECOVER_PASS2	2
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e0318e8..1be2590 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4159,13 +4159,27 @@ xlog_recover_process(
 	int			error;
 	__le32			crc;
 
+	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
+
 	/*
-	 * Check the CRC and issue a warning if and only if the CRC in the
-	 * header is non-zero. This is an advisory warning and the zero CRC
-	 * check prevents warnings from being emitted when upgrading the kernel
-	 * from one that does not add CRCs by default.
+	 * Nothing else to do if this is a CRC verification pass. Just return
+	 * if this a record with a non-zero crc. Unfortunately, mkfs always
+	 * sets h_crc to 0 so we must consider this valid even on v5 supers.
+	 * Otherwise, return EFSBADCRC on failure so the callers up the stack
+	 * know precisely what failed.
+	 */
+	if (pass == XLOG_RECOVER_CRCPASS) {
+		if (rhead->h_crc && crc != le32_to_cpu(rhead->h_crc))
+			return -EFSBADCRC;
+		return 0;
+	}
+
+	/*
+	 * We're in the normal recovery path. Issue a warning if and only if the
+	 * CRC in the header is non-zero. This is an advisory warning and the
+	 * zero CRC check prevents warnings from being emitted when upgrading
+	 * the kernel from one that does not add CRCs by default.
 	 */
-	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
 	if (crc != le32_to_cpu(rhead->h_crc)) {
 		if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) {
 			xfs_alert(log->l_mp,
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 6/8] xfs: refactor log record start detection into a new helper
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (4 preceding siblings ...)
  2015-11-12 15:37 ` [PATCH v2 5/8] xfs: support a crc verification only log record pass Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  2015-11-12 15:37 ` [PATCH v2 7/8] xfs: detect and trim torn writes during log recovery Brian Foster
  2015-11-12 15:37 ` [PATCH v2 8/8] xfs: debug mode log record crc error injection Brian Foster
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

As part of the head/tail discovery process, log recovery locates the
head block and then reverse seeks to find the start of the last active
record in the log. This is non-trivial as the record itself could have
wrapped around the end of the physical log. Log recovery torn write
detection potentially needs to walk further behind the last record in
the log, as multiple log I/Os can be in-flight at one time during a
crash event.

Therefore, refactor the reverse log record header search mechanism into
a new helper that supports the ability to seek past an arbitrary number
of log records (or until the tail is hit). Update the head/tail search
mechanism to call the new helper, but otherwise there is no change in
log recovery behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 118 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 83 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1be2590..423c36d 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -868,6 +868,79 @@ validate_head:
 }
 
 /*
+ * Seek backwards in the log for log record headers.
+ *
+ * Given a starting log block, walk backwards until we find the provided number
+ * of records or hit the provided tail block. The return value is the number of
+ * records encountered or a negative error code. The log block and buffer
+ * pointer of the last record seen are returned in rblk and rhead respectively.
+ */
+STATIC int
+xlog_rseek_logrec_hdr(
+	struct xlog		*log,
+	xfs_daddr_t		head_blk,
+	xfs_daddr_t		tail_blk,
+	int			count,
+	struct xfs_buf		*bp,
+	xfs_daddr_t		*rblk,
+	struct xlog_rec_header	**rhead,
+	bool			*wrapped)
+{
+	int			i;
+	int			error;
+	int			found = 0;
+	char			*offset = NULL;
+	xfs_daddr_t		end_blk;
+
+	*wrapped = false;
+
+	/*
+	 * Walk backwards from the head block until we hit the tail or the first
+	 * block in the log.
+	 */
+	end_blk = head_blk > tail_blk ? tail_blk : 0;
+	for (i = (int) head_blk - 1; i >= end_blk; i--) {
+		error = xlog_bread(log, i, 1, bp, &offset);
+		if (error)
+			goto out_error;
+
+		if (*(__be32 *) offset == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+			*rblk = i;
+			*rhead = (struct xlog_rec_header *) offset;
+			if (++found == count)
+				break;
+		}
+	}
+
+	/*
+	 * If we haven't hit the tail block or the log record header count,
+	 * start looking again from the end of the physical log. Note that
+	 * callers can pass head == tail if the tail is not yet known.
+	 */
+	if (tail_blk >= head_blk && found != count) {
+		for (i = log->l_logBBsize - 1; i >= (int) tail_blk; i--) {
+			error = xlog_bread(log, i, 1, bp, &offset);
+			if (error)
+				goto out_error;
+
+			if (*(__be32 *)offset ==
+			    cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+				*wrapped = true;
+				*rblk = i;
+				*rhead = (struct xlog_rec_header *) offset;
+				if (++found == count)
+					break;
+			}
+		}
+	}
+
+	return found;
+
+out_error:
+	return error;
+}
+
+/*
  * Find the sync block number or the tail of the log.
  *
  * This will be the block number of the last record to have its
@@ -898,8 +971,7 @@ xlog_find_tail(
 	xfs_daddr_t		after_umount_blk;
 	xfs_lsn_t		tail_lsn;
 	int			hblks;
-
-	found = 0;
+	bool			wrapped = false;
 
 	/*
 	 * Find previous log record
@@ -923,37 +995,16 @@ xlog_find_tail(
 	}
 
 	/*
-	 * Search backwards looking for log record header block
+	 * Search backwards through the log looking for the log record header
+	 * block. This wraps all the way back around to the head so something is
+	 * seriously wrong if we can't find it.
 	 */
 	ASSERT(*head_blk < INT_MAX);
-	for (i = (int)(*head_blk) - 1; i >= 0; i--) {
-		error = xlog_bread(log, i, 1, bp, &offset);
-		if (error)
-			goto done;
-
-		if (*(__be32 *)offset == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
-			found = 1;
-			break;
-		}
-	}
-	/*
-	 * If we haven't found the log record header block, start looking
-	 * again from the end of the physical log.  XXXmiken: There should be
-	 * a check here to make sure we didn't search more than N blocks in
-	 * the previous code.
-	 */
-	if (!found) {
-		for (i = log->l_logBBsize - 1; i >= (int)(*head_blk); i--) {
-			error = xlog_bread(log, i, 1, bp, &offset);
-			if (error)
-				goto done;
-
-			if (*(__be32 *)offset ==
-			    cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
-				found = 2;
-				break;
-			}
-		}
+	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, &i,
+				      &rhead, &wrapped);
+	if (found < 0) {
+		error = found;
+		goto done;
 	}
 	if (!found) {
 		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
@@ -961,9 +1012,6 @@ xlog_find_tail(
 		ASSERT(0);
 		return -EIO;
 	}
-
-	/* find blk_no of tail of log */
-	rhead = (xlog_rec_header_t *)offset;
 	*tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn));
 
 	/*
@@ -979,7 +1027,7 @@ xlog_find_tail(
 	log->l_prev_block = i;
 	log->l_curr_block = (int)*head_blk;
 	log->l_curr_cycle = be32_to_cpu(rhead->h_cycle);
-	if (found == 2)
+	if (wrapped)
 		log->l_curr_cycle++;
 	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
 	atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn));
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 7/8] xfs: detect and trim torn writes during log recovery
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (5 preceding siblings ...)
  2015-11-12 15:37 ` [PATCH v2 6/8] xfs: refactor log record start detection into a new helper Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  2015-11-12 15:37 ` [PATCH v2 8/8] xfs: debug mode log record crc error injection Brian Foster
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

Certain types of storage, such as persistent memory, do not provide
sector atomicity for writes. This means that if a crash occurs while XFS
is writing log records, only part of those records might make it to the
storage. This is problematic because log recovery uses the cycle value
packed at the top of each log block to locate the head/tail of the log.
This can lead to CRC verification failures during log recovery and an
unmountable fs for a filesystem that is otherwise consistent.

Update log recovery to incorporate log record CRC verification as part
of the head/tail discovery process. Once the head is located via the
traditional algorithm, run a CRC-only pass over the records up to the
head of the log. If CRC verification fails, assume that the records are
torn as a matter of policy and trim the head block back to the start of
the first bad record.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 306 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 287 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 423c36d..5bf8076 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -61,6 +61,9 @@ xlog_recover_check_summary(
 #else
 #define	xlog_recover_check_summary(log)
 #endif
+STATIC int
+xlog_do_recovery_pass(
+        struct xlog *, xfs_daddr_t, xfs_daddr_t, int, xfs_daddr_t *);
 
 /*
  * This structure is used during recovery to record the buf log items which
@@ -941,6 +944,278 @@ out_error:
 }
 
 /*
+ * Seek forward in the log for log record headers.
+ *
+ * Given head and tail blocks, walk forward from the tail block until we find
+ * the provided number of records or hit the head block. The return value is the
+ * number of records encountered or a negative error code. The log block and
+ * buffer pointer of the last record seen are returned in rblk and rhead
+ * respectively.
+ */
+STATIC int
+xlog_seek_logrec_hdr(
+	struct xlog		*log,
+	xfs_daddr_t		head_blk,
+	xfs_daddr_t		tail_blk,
+	int			count,
+	struct xfs_buf		*bp,
+	xfs_daddr_t		*rblk,
+	struct xlog_rec_header	**rhead,
+	bool			*wrapped)
+{
+	int			i;
+	int			error;
+	int			found = 0;
+	char			*offset = NULL;
+	xfs_daddr_t		end_blk;
+
+	*wrapped = false;
+
+	/*
+	 * Walk forward from the tail block until we hit the head or the last
+	 * block in the log.
+	 */
+	end_blk = head_blk > tail_blk ? head_blk : log->l_logBBsize - 1;
+	for (i = (int) tail_blk; i <= end_blk; i++) {
+		error = xlog_bread(log, i, 1, bp, &offset);
+		if (error)
+			goto out_error;
+
+		if (*(__be32 *) offset == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+			*rblk = i;
+			*rhead = (struct xlog_rec_header *) offset;
+			if (++found == count)
+				break;
+		}
+	}
+
+	/*
+	 * If we haven't hit the head block or the log record header count,
+	 * start looking again from the start of the physical log.
+	 */
+	if (tail_blk > head_blk && found != count) {
+		for (i = 0; i < (int) head_blk; i++) {
+			error = xlog_bread(log, i, 1, bp, &offset);
+			if (error)
+				goto out_error;
+
+			if (*(__be32 *)offset ==
+			    cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+				*wrapped = true;
+				*rblk = i;
+				*rhead = (struct xlog_rec_header *) offset;
+				if (++found == count)
+					break;
+			}
+		}
+	}
+
+	return found;
+
+out_error:
+	return error;
+}
+
+/*
+ * Check the log tail for torn writes. This is required when torn writes are
+ * detected at the head and the head had to be walked back to a previous record.
+ * The tail of the previous record must now be verified to ensure the torn
+ * writes didn't corrupt the previous tail.
+ *
+ * Return an error if CRC verification fails as recovery cannot proceed.
+ */
+STATIC int
+xlog_verify_tail(
+	struct xlog		*log,
+	xfs_daddr_t		head_blk,
+	xfs_daddr_t		tail_blk)
+{
+	struct xlog_rec_header	*thead;
+	struct xfs_buf		*bp;
+	xfs_daddr_t		first_bad;
+	int			count;
+	int			error = 0;
+	bool			wrapped;
+	xfs_daddr_t		tmp_head;
+
+	bp = xlog_get_bp(log, 1);
+	if (!bp)
+		return -ENOMEM;
+
+	/*
+	 * Seek XLOG_MAX_ICLOGS + 1 records past the current tail record to get
+	 * a temporary head block that points after the last possible
+	 * concurrently written record of the tail.
+	 */
+	count = xlog_seek_logrec_hdr(log, head_blk, tail_blk,
+				     XLOG_MAX_ICLOGS + 1, bp, &tmp_head, &thead,
+				     &wrapped);
+	if (count < 0) {
+		error = count;
+		goto out;
+	}
+
+	/*
+	 * If the call above didn't find XLOG_MAX_ICLOGS + 1 records, we ran
+	 * into the actual log head. tmp_head points to the start of the record
+	 * so update it to the actual head block.
+	 */
+	if (count < XLOG_MAX_ICLOGS + 1)
+		tmp_head = head_blk;
+
+	/*
+	 * We now have a tail and temporary head block that covers at least
+	 * XLOG_MAX_ICLOGS records from the tail. We need to verify that these
+	 * records were completely written. Run a CRC verification pass from
+	 * tail to head and return the result.
+	 */
+	error = xlog_do_recovery_pass(log, tmp_head, tail_blk,
+				      XLOG_RECOVER_CRCPASS, &first_bad);
+
+out:
+	xlog_put_bp(bp);
+	return error;
+}
+
+/*
+ * Detect and trim torn writes from the head of the log.
+ *
+ * Storage without sector atomicity guarantees can result in torn writes in the
+ * log in the event of a crash. Our only means to detect this scenario is via
+ * CRC verification. While we can't always be certain that CRC verification
+ * failure is due to a torn write vs. an unrelated corruption, we do know that
+ * only a certain number (XLOG_MAX_ICLOGS) of log records can be written out at
+ * one time. Therefore, CRC verify up to XLOG_MAX_ICLOGS records at the head of
+ * the log and treat failures in this range as torn writes as a matter of
+ * policy. In the event of CRC failure, the head is walked back to the last good
+ * record in the log and the tail is updated from that record and verified.
+ */
+STATIC int
+xlog_verify_head(
+	struct xlog		*log,
+	xfs_daddr_t		*head_blk,	/* in/out: unverified head */
+	xfs_daddr_t		*tail_blk,	/* out: tail block */
+	struct xfs_buf		*bp,
+	xfs_daddr_t		*rhead_blk,	/* start blk of last record */
+	struct xlog_rec_header	**rhead,	/* ptr to last record */
+	bool			*wrapped)	/* last rec. wraps phys. log */
+{
+	struct xlog_rec_header	*tmp_rhead;
+	struct xfs_buf		*tmp_bp;
+	xfs_daddr_t		first_bad;
+	xfs_daddr_t		tmp_rhead_blk;
+	int			found;
+	int			error;
+	bool			tmp_wrapped;
+
+	/*
+	 * Search backwards through the log looking for the log record header
+	 * block. This wraps all the way back around to the head so something is
+	 * seriously wrong if we can't find it.
+	 */
+	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, rhead_blk,
+				      rhead, wrapped);
+	if (found < 0)
+		return found;
+	if (!found) {
+		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
+		return -EIO;
+	}
+
+	*tail_blk = BLOCK_LSN(be64_to_cpu((*rhead)->h_tail_lsn));
+
+	/*
+	 * Now that we have a tail block, check the head of the log for torn
+	 * writes. Search again until we hit the tail or the maximum number of
+	 * log record I/Os that could have been in flight at one time. Use a
+	 * temporary buffer so we don't trash the rhead/bp pointer from the
+	 * call above.
+	 */
+	tmp_bp = xlog_get_bp(log, 1);
+	if (!tmp_bp)
+		return -ENOMEM;
+	error = xlog_rseek_logrec_hdr(log, *head_blk, *tail_blk,
+				      XLOG_MAX_ICLOGS, tmp_bp, &tmp_rhead_blk,
+				      &tmp_rhead, &tmp_wrapped);
+	xlog_put_bp(tmp_bp);
+	if (error < 0)
+		return error;
+
+	/*
+	 * Now run a CRC verification pass over the records starting at the
+	 * block found above to the current head. If a CRC failure occurs, the
+	 * log block of the first bad record is saved in first_bad.
+	 */
+	error = xlog_do_recovery_pass(log, *head_blk, tmp_rhead_blk,
+				      XLOG_RECOVER_CRCPASS, &first_bad);
+	if (error == -EFSBADCRC) {
+		/*
+		 * We've hit a potential torn write. Reset the error and warn
+		 * about it.
+		 */
+		error = 0;
+		xfs_warn(log->l_mp,
+"Torn write (CRC failure) detected at log block 0x%llx. Truncating head block from 0x%llx.",
+			 first_bad, *head_blk);
+
+		/*
+		 * Get the header block and buffer pointer for the last good
+		 * record before the bad record.
+		 *
+		 * Note that xlog_find_tail() clears the blocks at the new head
+		 * (i.e., the records with invalid CRC) if the cycle number
+		 * matches the the current cycle.
+		 */
+		found = xlog_rseek_logrec_hdr(log, first_bad, *tail_blk, 1, bp,
+					      rhead_blk, rhead, wrapped);
+		if (found < 0)
+			return found;
+		if (found == 0)		/* XXX: right thing to do here? */
+			return -EIO;
+
+		/*
+		 * Reset the head block to the starting block of the first bad
+		 * log record and set the tail block based on the last good
+		 * record.
+		 *
+		 * Bail out if the updated head/tail match as this indicates
+		 * possible corruption outside of the acceptable
+		 * (XLOG_MAX_ICLOGS) range. This is a job for xfs_repair...
+		 */
+		*head_blk = first_bad;
+		*tail_blk = BLOCK_LSN(be64_to_cpu((*rhead)->h_tail_lsn));
+		if (*head_blk == *tail_blk) {
+			ASSERT(0);
+			return 0;
+		}
+
+		/*
+		 * Now verify the tail based on the updated head. This is
+		 * required because the torn writes trimmed from the head could
+		 * have been written over the tail of a previous record. Return
+		 * any errors since recovery cannot proceed if the tail is
+		 * corrupt.
+		 *
+		 * XXX: This leaves a gap in truly robust protection from torn
+		 * writes in the log. If the head is behind the tail, the tail
+		 * pushes forward to create some space and then a crash occurs
+		 * causing the writes into the previous record's tail region to
+		 * tear, log recovery isn't able to recover.
+		 *
+		 * How likely is this to occur? If possible, can we do something
+		 * more intelligent here? Is it safe to push the tail forward if
+		 * we can determine that the tail is within the range of the
+		 * torn write (e.g., the kernel can only overwrite the tail if
+		 * it has actually been pushed forward)? Alternatively, could we
+		 * somehow prevent this condition at runtime?
+		 */
+		error = xlog_verify_tail(log, *head_blk, *tail_blk);
+	}
+
+	return error;
+}
+
+/*
  * Find the sync block number or the tail of the log.
  *
  * This will be the block number of the last record to have its
@@ -966,9 +1241,10 @@ xlog_find_tail(
 	xlog_op_header_t	*op_head;
 	char			*offset = NULL;
 	xfs_buf_t		*bp;
-	int			error, i, found;
+	int			error;
 	xfs_daddr_t		umount_data_blk;
 	xfs_daddr_t		after_umount_blk;
+	xfs_daddr_t		rhead_blk;
 	xfs_lsn_t		tail_lsn;
 	int			hblks;
 	bool			wrapped = false;
@@ -995,24 +1271,16 @@ xlog_find_tail(
 	}
 
 	/*
-	 * Search backwards through the log looking for the log record header
-	 * block. This wraps all the way back around to the head so something is
-	 * seriously wrong if we can't find it.
+	 * Trim the head block back to skip over torn records. We can have
+	 * multiple log I/Os in flight at any time, so we assume CRC failures
+	 * back through the previous several records are torn writes and skip
+	 * them.
 	 */
 	ASSERT(*head_blk < INT_MAX);
-	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, &i,
-				      &rhead, &wrapped);
-	if (found < 0) {
-		error = found;
+	error = xlog_verify_head(log, head_blk, tail_blk, bp, &rhead_blk,
+				 &rhead, &wrapped);
+	if (error)
 		goto done;
-	}
-	if (!found) {
-		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
-		xlog_put_bp(bp);
-		ASSERT(0);
-		return -EIO;
-	}
-	*tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn));
 
 	/*
 	 * Reset log values according to the state of the log when we
@@ -1024,7 +1292,7 @@ xlog_find_tail(
 	 * written was complete and ended exactly on the end boundary
 	 * of the physical log.
 	 */
-	log->l_prev_block = i;
+	log->l_prev_block = rhead_blk;
 	log->l_curr_block = (int)*head_blk;
 	log->l_curr_cycle = be32_to_cpu(rhead->h_cycle);
 	if (wrapped)
@@ -1062,12 +1330,12 @@ xlog_find_tail(
 	} else {
 		hblks = 1;
 	}
-	after_umount_blk = (i + hblks + (int)
+	after_umount_blk = (rhead_blk + hblks + (int)
 		BTOBB(be32_to_cpu(rhead->h_len))) % log->l_logBBsize;
 	tail_lsn = atomic64_read(&log->l_tail_lsn);
 	if (*head_blk == after_umount_blk &&
 	    be32_to_cpu(rhead->h_num_logops) == 1) {
-		umount_data_blk = (i + hblks) % log->l_logBBsize;
+		umount_data_blk = (rhead_blk + hblks) % log->l_logBBsize;
 		error = xlog_bread(log, umount_data_blk, 1, bp, &offset);
 		if (error)
 			goto done;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 8/8] xfs: debug mode log record crc error injection
  2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (6 preceding siblings ...)
  2015-11-12 15:37 ` [PATCH v2 7/8] xfs: detect and trim torn writes during log recovery Brian Foster
@ 2015-11-12 15:37 ` Brian Foster
  7 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2015-11-12 15:37 UTC (permalink / raw)
  To: xfs

XFS now uses CRC verification over a limited section of the log to
detect torn writes prior to a crash. This is difficult to test directly
due to the timing and hardware requirements to cause a short write.

Add a mechanism to inject CRC errors into log records to facilitate
testing torn write detection during log recovery. This mechanism is
dangerous and can result in filesystem corruption. Thus, it is only
available in DEBUG mode for testing/development purposes. Set a non-zero
value to the following sysfs entry to enable error injection:

	/sys/fs/xfs/<dev>/log/log_badcrc_factor

Once enabled, XFS intentionally writes an invalid CRC to a log record at
some random point in the future based on the provided frequency. The
filesystem immediately shuts down once the record has been written to
the physical log to prevent metadata writeback (e.g., AIL insertion)
once the log write completes. This helps reasonably simulate a torn
write to the log as the affected record must be safe to discard. The
next mount after the intentional shutdown requires log recovery and
should detect and recover from the torn write.

Note again that this _will_ result in data loss or worse. For testing
and development purposes only!

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log.c      | 45 ++++++++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_log_priv.h |  3 +++
 fs/xfs/xfs_sysfs.c    | 36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f52c72a..887c443 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1188,10 +1188,16 @@ xlog_iodone(xfs_buf_t *bp)
 	int			aborted = 0;
 
 	/*
-	 * Race to shutdown the filesystem if we see an error.
+	 * Race to shutdown the filesystem if we see an error or the iclog is in
+	 * IOABORT state. The IOABORT state is only set in DEBUG mode to inject
+	 * CRC errors into log recovery.
 	 */
-	if (XFS_TEST_ERROR(bp->b_error, l->l_mp,
-			XFS_ERRTAG_IODONE_IOERR, XFS_RANDOM_IODONE_IOERR)) {
+	if (XFS_TEST_ERROR(bp->b_error, l->l_mp, XFS_ERRTAG_IODONE_IOERR,
+			   XFS_RANDOM_IODONE_IOERR) ||
+	    iclog->ic_state & XLOG_STATE_IOABORT) {
+		if (iclog->ic_state & XLOG_STATE_IOABORT)
+			iclog->ic_state &= ~XLOG_STATE_IOABORT;
+
 		xfs_buf_ioerror_alert(bp, __func__);
 		xfs_buf_stale(bp);
 		xfs_force_shutdown(l->l_mp, SHUTDOWN_LOG_IO_ERROR);
@@ -1838,6 +1844,23 @@ xlog_sync(
 	/* calculcate the checksum */
 	iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header,
 					    iclog->ic_datap, size);
+#ifdef DEBUG
+	/*
+	 * Intentionally corrupt the log record CRC based on the error injection
+	 * frequency, if defined. This facilitates testing log recovery in the
+	 * event of torn writes. Hence, set the IOABORT state to abort the log
+	 * write on I/O completion and shutdown the fs. The subsequent mount
+	 * detects the bad CRC and attempts to recover.
+	 */
+	if (log->l_badcrc_factor &&
+	    (prandom_u32() % log->l_badcrc_factor == 0)) {
+		iclog->ic_header.h_crc &= 0xAAAAAAAA;
+		iclog->ic_state |= XLOG_STATE_IOABORT;
+		xfs_warn(log->l_mp,
+	"Intentionally corrupted log record at LSN 0x%llx. Shutdown imminent.",
+			 be64_to_cpu(iclog->ic_header.h_lsn));
+	}
+#endif
 
 	bp->b_io_length = BTOBB(count);
 	bp->b_fspriv = iclog;
@@ -2791,11 +2814,19 @@ xlog_state_do_callback(
 		}
 	} while (!ioerrors && loopdidcallbacks);
 
+#ifdef DEBUG
 	/*
-	 * make one last gasp attempt to see if iclogs are being left in
-	 * limbo..
+	 * Make one last gasp attempt to see if iclogs are being left in limbo.
+	 * If the above loop finds an iclog earlier than the current iclog and
+	 * in one of the syncing states, the current iclog is put into
+	 * DO_CALLBACK and the callbacks are deferred to the completion of the
+	 * earlier iclog. Walk the iclogs in order and make sure that no iclog
+	 * is in DO_CALLBACK unless an earlier iclog is in one of the syncing
+	 * states.
+	 *
+	 * Note that SYNCING|IOABORT is a valid state so we cannot just check
+	 * for ic_state == SYNCING.
 	 */
-#ifdef DEBUG
 	if (funcdidcallbacks) {
 		first_iclog = iclog = log->l_iclog;
 		do {
@@ -2810,7 +2841,7 @@ xlog_state_do_callback(
 			 * IOERROR - give up hope all ye who enter here
 			 */
 			if (iclog->ic_state == XLOG_STATE_WANT_SYNC ||
-			    iclog->ic_state == XLOG_STATE_SYNCING ||
+			    iclog->ic_state & XLOG_STATE_SYNCING ||
 			    iclog->ic_state == XLOG_STATE_DONE_SYNC ||
 			    iclog->ic_state == XLOG_STATE_IOERROR )
 				break;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 8daba74..ed88963 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -62,6 +62,7 @@ static inline uint xlog_get_client_id(__be32 i)
 #define XLOG_STATE_CALLBACK  0x0020 /* Callback functions now */
 #define XLOG_STATE_DIRTY     0x0040 /* Dirty IC log, not ready for ACTIVE status*/
 #define XLOG_STATE_IOERROR   0x0080 /* IO error happened in sync'ing log */
+#define XLOG_STATE_IOABORT   0x0100 /* force abort on I/O completion (debug) */
 #define XLOG_STATE_ALL	     0x7FFF /* All possible valid flags */
 #define XLOG_STATE_NOTUSED   0x8000 /* This IC log not being used */
 
@@ -410,6 +411,8 @@ struct xlog {
 	/* The following field are used for debugging; need to hold icloglock */
 #ifdef DEBUG
 	void			*l_iclog_bak[XLOG_MAX_ICLOGS];
+	/* log record crc error injection factor */
+	uint32_t		l_badcrc_factor;
 #endif
 
 };
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index ee70f5d..641d625 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -255,11 +255,47 @@ write_grant_head_show(
 }
 XFS_SYSFS_ATTR_RO(write_grant_head);
 
+#ifdef DEBUG
+STATIC ssize_t
+log_badcrc_factor_store(
+	struct kobject	*kobject,
+	const char	*buf,
+	size_t		count)
+{
+	struct xlog	*log = to_xlog(kobject);
+	int		ret;
+	uint32_t	val;
+
+	ret = kstrtouint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	log->l_badcrc_factor = val;
+
+	return count;
+}
+
+STATIC ssize_t
+log_badcrc_factor_show(
+	struct kobject	*kobject,
+	char		*buf)
+{
+	struct xlog	*log = to_xlog(kobject);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", log->l_badcrc_factor);
+}
+
+XFS_SYSFS_ATTR_RW(log_badcrc_factor);
+#endif	/* DEBUG */
+
 static struct attribute *xfs_log_attrs[] = {
 	ATTR_LIST(log_head_lsn),
 	ATTR_LIST(log_tail_lsn),
 	ATTR_LIST(reserve_grant_head),
 	ATTR_LIST(write_grant_head),
+#ifdef DEBUG
+	ATTR_LIST(log_badcrc_factor),
+#endif
 	NULL,
 };
 
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-11-12 15:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-12 15:37 [PATCH v2 0/8] xfs: log recovery torn write detection Brian Foster
2015-11-12 15:37 ` [PATCH v2 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
2015-11-12 15:37 ` [PATCH v2 2/8] xfs: refactor log record unpack and data processing Brian Foster
2015-11-12 15:37 ` [PATCH v2 3/8] xfs: refactor and open code log record crc check Brian Foster
2015-11-12 15:37 ` [PATCH v2 4/8] xfs: return start block of first bad log record during recovery Brian Foster
2015-11-12 15:37 ` [PATCH v2 5/8] xfs: support a crc verification only log record pass Brian Foster
2015-11-12 15:37 ` [PATCH v2 6/8] xfs: refactor log record start detection into a new helper Brian Foster
2015-11-12 15:37 ` [PATCH v2 7/8] xfs: detect and trim torn writes during log recovery Brian Foster
2015-11-12 15:37 ` [PATCH v2 8/8] xfs: debug mode log record crc error injection Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.