[PATCH 0/2] xfs: prevent transient corrupt states during log recovery

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] xfs: prevent transient corrupt states during log recovery
@ 2013-08-28 11:22 Dave Chinner
  2013-08-28 11:22 ` [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised Dave Chinner
  2013-08-28 11:22 ` [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery Dave Chinner
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2013-08-28 11:22 UTC (permalink / raw)
  To: xfs

Hi folks,

The following two patches prvent log recovery from recovering old
changes over a newer object on disk, preventing transient corrupt
states in memory causing verifier failures when recovery tries to
write the objects back to disk.

This only works for v5 filesystems as it relies on the LSN that is
contained in all v5 metadata and stamped with the current LSN when
ever the metadata is written to doisk.

The first patch fixes a problem with LSN fields being unitialised
and logged in that state, and then having recovery restore that
uninitialised state and confuse any further attempts by recovery
to determine the age of the object by LSN.

The second patch does the work of determining the age of an object
being recovered based on the magic number in the object. The reasons
for doing this are described in that patch.

By skipping recovery of objects that are newer on disk than in the
checkpoint being recovered, we ensure that log recovery never
creates a transient corrupt state during recovery in memory, and
hence we never attempt to write them to disk and so recovery won't
abort due to corruption being detected due to this issue.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised
  2013-08-28 11:22 [PATCH 0/2] xfs: prevent transient corrupt states during log recovery Dave Chinner
@ 2013-08-28 11:22 ` Dave Chinner
  2013-08-28 20:31   ` Mark Tinguely
  2013-08-28 11:22 ` [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-08-28 11:22 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When testing LSN ordering code for v5 superblocks, it was discovered
that the the LSN embedded in the generic btree blocks was
occasionally uninitialised. These values didn't get written to disk
by metadata writeback - they got written by previous transactions in
log recovery.

The issue is here that the when the block is first allocated and
initialised, the LSN field was not initialised - it gets overwritten
before IO is issued on the buffer - but the value that is logged by
transactions that modify the header before it is written to disk
(and initialised) contain garbage. Hence the first recovery of the
buffer will stamp garbage into the LSN field, and that can cause
subsequent transactions to not replay correctly.

The fix is simply to initialise the bb_lsn field to zero when we
initialise the block for the first time.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_btree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_btree.c b/fs/xfs/xfs_btree.c
index ae106f6..7a2b4da 100644
--- a/fs/xfs/xfs_btree.c
+++ b/fs/xfs/xfs_btree.c
@@ -978,6 +978,7 @@ xfs_btree_init_block_int(
 			buf->bb_u.l.bb_owner = cpu_to_be64(owner);
 			uuid_copy(&buf->bb_u.l.bb_uuid, &mp->m_sb.sb_uuid);
 			buf->bb_u.l.bb_pad = 0;
+			buf->bb_u.l.bb_lsn = 0;
 		}
 	} else {
 		/* owner is a 32 bit value on short blocks */
@@ -989,6 +990,7 @@ xfs_btree_init_block_int(
 			buf->bb_u.s.bb_blkno = cpu_to_be64(blkno);
 			buf->bb_u.s.bb_owner = cpu_to_be32(__owner);
 			uuid_copy(&buf->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid);
+			buf->bb_u.s.bb_lsn = 0;
 		}
 	}
 }
-- 
1.8.3.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery
  2013-08-28 11:22 [PATCH 0/2] xfs: prevent transient corrupt states during log recovery Dave Chinner
  2013-08-28 11:22 ` [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised Dave Chinner
@ 2013-08-28 11:22 ` Dave Chinner
  2013-08-28 20:49   ` Mark Tinguely
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-08-28 11:22 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Log recovery has some strict ordering requirements which unordered
or reordered metadata writeback can defeat. This can occur when an
item is logged in a transaction, written back to disk, and then
logged in a new transaction before the tail of the log is moved past
the original modification.

The result of this is that when we read an object off disk for
recovery purposes, the buffer that we read may not contain the
object type that recovery is expecting and hence at the end of the
checkpoint being recovered we have an invalid object in memory.

This isn't usually a problem, as recovery will then replay all the
other checkpoints and that brings the object back to a valid and
correct state, but the issue is that while the object is in the
invalid state it can be flushed to disk. This results in the object
verifier failing and triggering a corruption shutdown of log
recover. This is correct behaviour for the verifiers - the problem
is that we are not detecting that the object we've read off disk is
newer than the transaction we are replaying.

All metadata in v5 filesystems has the LSN of it's last modification
stamped in it. This enabled log recover to read that field and
determine the age of the object on disk correctly. If the LSN of the
object on disk is older than the transaction being replayed, then we
replay the modification. If the LSN of the object matches or is more
recent than the transaction's LSN, then we should avoid overwriting
the object as that is what leads to the transient corrupt state.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 169 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 156 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 769c4a1..7c0c1fd 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1959,6 +1959,104 @@ xlog_recover_do_inode_buffer(
 }
 
 /*
+ * V5 filesystems know the age of the buffer on disk being recovered. We can
+ * have newer objects on disk than we are replaying, and so for these cases we
+ * don't want to replay the current change as that will make the buffer contents
+ * temporarily invalid on disk.
+ *
+ * The magic number might not match the buffer type we are going to recover
+ * (e.g. reallocated blocks), so we ignore the xfs_buf_log_format flags.  Hence
+ * extract the LSN of the existing object in the buffer based on it's current
+ * magic number.  If we don't recognise the magic number in the buffer, then
+ * return a LSN of -1 so that the caller knows it was an unrecognised block and
+ * so can recover the buffer.
+ */
+static xfs_lsn_t
+xlog_recover_get_buf_lsn(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp)
+{
+	__uint32_t		magic32;
+	__uint16_t		magic16;
+	__uint16_t		magicda;
+	void			*blk = bp->b_addr;
+
+	/* v4 filesystems always recover immediately */
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		goto recover_immediately;
+
+	magic32 = be32_to_cpu(*(__be32 *)blk);
+	switch (magic32) {
+	case XFS_ABTB_CRC_MAGIC:
+	case XFS_ABTC_CRC_MAGIC:
+	case XFS_ABTB_MAGIC:
+	case XFS_ABTC_MAGIC:
+	case XFS_IBT_CRC_MAGIC:
+	case XFS_IBT_MAGIC:
+		return be64_to_cpu(
+				((struct xfs_btree_block *)blk)->bb_u.s.bb_lsn);
+	case XFS_BMAP_CRC_MAGIC:
+	case XFS_BMAP_MAGIC:
+		return be64_to_cpu(
+				((struct xfs_btree_block *)blk)->bb_u.l.bb_lsn);
+	case XFS_AGF_MAGIC:
+		return be64_to_cpu(((struct xfs_agf *)blk)->agf_lsn);
+	case XFS_AGFL_MAGIC:
+		return be64_to_cpu(((struct xfs_agfl *)blk)->agfl_lsn);
+	case XFS_AGI_MAGIC:
+		return be64_to_cpu(((struct xfs_agi *)blk)->agi_lsn);
+	case XFS_SYMLINK_MAGIC:
+		return be64_to_cpu(((struct xfs_dsymlink_hdr *)blk)->sl_lsn);
+	case XFS_DIR3_BLOCK_MAGIC:
+	case XFS_DIR3_DATA_MAGIC:
+	case XFS_DIR3_FREE_MAGIC:
+		return be64_to_cpu(((struct xfs_dir3_blk_hdr *)blk)->lsn);
+	case XFS_ATTR3_RMT_MAGIC:
+		return be64_to_cpu(((struct xfs_attr3_rmt_hdr *)blk)->rm_lsn);
+	case XFS_SB_MAGIC:
+		return be64_to_cpu(((struct xfs_sb *)blk)->sb_lsn);
+	default:
+		break;
+	}
+
+	magicda = be16_to_cpu(((struct xfs_da_blkinfo *)blk)->magic);
+	switch (magicda) {
+	case XFS_DIR3_LEAF1_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		return be64_to_cpu(((struct xfs_da3_blkinfo *)blk)->lsn);
+	default:
+		break;
+	}
+
+	/*
+	 * We do individual object checks on dquot and inode buffers as they
+	 * have their own individual LSN records. Also, we could have a stale
+	 * buffer here, so we have to at least recognise these buffer types.
+	 *
+	 * A notd complexity here is inode unlinked list processing - it logs
+	 * the inode directly in the buffer, but we don't know which inodes have
+	 * been modified, and there is no global buffer LSN. Hence we need to
+	 * recover all inode buffer types immediately. This problem will be
+	 * fixed by logical logging of the unlinked list modifications.
+	 */
+	magic16 = be16_to_cpu(*(__be16 *)blk);
+	switch (magic16) {
+	case XFS_DQUOT_MAGIC:
+	case XFS_DINODE_MAGIC:
+		goto recover_immediately;
+	default:
+		break;
+	}
+
+	/* unknown buffer contents, recover immediately */
+
+recover_immediately:
+	return (xfs_lsn_t)-1;
+
+}
+
+/*
  * Validate the recovered buffer is of the correct type and attach the
  * appropriate buffer operations to them for writeback. Magic numbers are in a
  * few places:
@@ -1967,7 +2065,7 @@ xlog_recover_do_inode_buffer(
  *	inside a struct xfs_da_blkinfo at the start of the buffer.
  */
 static void
-xlog_recovery_validate_buf_type(
+xlog_recover_validate_buf_type(
 	struct xfs_mount	*mp,
 	struct xfs_buf		*bp,
 	xfs_buf_log_format_t	*buf_f)
@@ -2246,7 +2344,7 @@ xlog_recover_do_reg_buffer(
 	 * just avoid the verification stage for non-crc filesystems
 	 */
 	if (xfs_sb_version_hascrc(&mp->m_sb))
-		xlog_recovery_validate_buf_type(mp, bp, buf_f);
+		xlog_recover_validate_buf_type(mp, bp, buf_f);
 }
 
 /*
@@ -2444,13 +2542,15 @@ STATIC int
 xlog_recover_buffer_pass2(
 	struct xlog			*log,
 	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item)
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
 {
 	xfs_buf_log_format_t	*buf_f = item->ri_buf[0].i_addr;
 	xfs_mount_t		*mp = log->l_mp;
 	xfs_buf_t		*bp;
 	int			error;
 	uint			buf_flags;
+	xfs_lsn_t		lsn;
 
 	/*
 	 * In this pass we only want to recover all the buffers which have
@@ -2475,10 +2575,17 @@ xlog_recover_buffer_pass2(
 	error = bp->b_error;
 	if (error) {
 		xfs_buf_ioerror_alert(bp, "xlog_recover_do..(read#1)");
-		xfs_buf_relse(bp);
-		return error;
+		goto out_release;
 	}
 
+	/*
+	 * recover the buffer only if we get an LSN from it and it's less than
+	 * the lsn of the transaction we are replaying.
+	 */
+	lsn = xlog_recover_get_buf_lsn(mp, bp);
+	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0)
+		goto out_release;
+
 	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
 		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
 	} else if (buf_f->blf_flags &
@@ -2488,7 +2595,7 @@ xlog_recover_buffer_pass2(
 		xlog_recover_do_reg_buffer(mp, item, bp, buf_f);
 	}
 	if (error)
-		return XFS_ERROR(error);
+		goto out_release;
 
 	/*
 	 * Perform delayed write on the buffer.  Asynchronous writes will be
@@ -2517,6 +2624,7 @@ xlog_recover_buffer_pass2(
 		xfs_buf_delwri_queue(bp, buffer_list);
 	}
 
+out_release:
 	xfs_buf_relse(bp);
 	return error;
 }
@@ -2525,7 +2633,8 @@ STATIC int
 xlog_recover_inode_pass2(
 	struct xlog			*log,
 	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item)
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
 {
 	xfs_inode_log_format_t	*in_f;
 	xfs_mount_t		*mp = log->l_mp;
@@ -2605,6 +2714,20 @@ xlog_recover_inode_pass2(
 	}
 
 	/*
+	 * If the inode has an LSN in it, recover the inode only if it's less
+	 * than the lsn of the transaction we are replaying.
+	 */
+	if (dip->di_version >= 3) {
+		xfs_lsn_t	lsn = be64_to_cpu(dip->di_lsn);
+
+		if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
+			trace_xfs_log_recover_inode_skip(log, in_f);
+			error = 0;
+			goto out_release;
+		}
+	}
+
+	/*
 	 * di_flushiter is only valid for v1/2 inodes. All changes for v3 inodes
 	 * are transactional and if ordering is necessary we can determine that
 	 * more accurately by the LSN field in the V3 inode core. Don't trust
@@ -2793,6 +2916,8 @@ write_inode_buffer:
 	ASSERT(bp->b_target->bt_mount == mp);
 	bp->b_iodone = xlog_recover_iodone;
 	xfs_buf_delwri_queue(bp, buffer_list);
+
+out_release:
 	xfs_buf_relse(bp);
 error:
 	if (need_free)
@@ -2834,7 +2959,8 @@ STATIC int
 xlog_recover_dquot_pass2(
 	struct xlog			*log,
 	struct list_head		*buffer_list,
-	struct xlog_recover_item	*item)
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			current_lsn)
 {
 	xfs_mount_t		*mp = log->l_mp;
 	xfs_buf_t		*bp;
@@ -2908,6 +3034,19 @@ xlog_recover_dquot_pass2(
 		return XFS_ERROR(EIO);
 	}
 
+	/*
+	 * If the dquot has an LSN in it, recover the dquot only if it's less
+	 * than the lsn of the transaction we are replaying.
+	 */
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		struct xfs_dqblk *dqb = (struct xfs_dqblk *)ddq;
+		xfs_lsn_t	lsn = be64_to_cpu(dqb->dd_lsn);
+
+		if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
+			goto out_release;
+		}
+	}
+
 	memcpy(ddq, recddq, item->ri_buf[1].i_len);
 	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		xfs_update_cksum((char *)ddq, sizeof(struct xfs_dqblk),
@@ -2918,9 +3057,10 @@ xlog_recover_dquot_pass2(
 	ASSERT(bp->b_target->bt_mount == mp);
 	bp->b_iodone = xlog_recover_iodone;
 	xfs_buf_delwri_queue(bp, buffer_list);
-	xfs_buf_relse(bp);
 
-	return (0);
+out_release:
+	xfs_buf_relse(bp);
+	return 0;
 }
 
 /*
@@ -3267,15 +3407,18 @@ xlog_recover_commit_pass2(
 
 	switch (ITEM_TYPE(item)) {
 	case XFS_LI_BUF:
-		return xlog_recover_buffer_pass2(log, buffer_list, item);
+		return xlog_recover_buffer_pass2(log, buffer_list, item,
+						 trans->r_lsn);
 	case XFS_LI_INODE:
-		return xlog_recover_inode_pass2(log, buffer_list, item);
+		return xlog_recover_inode_pass2(log, buffer_list, item,
+						 trans->r_lsn);
 	case XFS_LI_EFI:
 		return xlog_recover_efi_pass2(log, item, trans->r_lsn);
 	case XFS_LI_EFD:
 		return xlog_recover_efd_pass2(log, item);
 	case XFS_LI_DQUOT:
-		return xlog_recover_dquot_pass2(log, buffer_list, item);
+		return xlog_recover_dquot_pass2(log, buffer_list, item,
+						trans->r_lsn);
 	case XFS_LI_ICREATE:
 		return xlog_recover_do_icreate_pass2(log, buffer_list, item);
 	case XFS_LI_QUOTAOFF:
-- 
1.8.3.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised
  2013-08-28 11:22 ` [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised Dave Chinner
@ 2013-08-28 20:31   ` Mark Tinguely
  2013-08-28 20:57     ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2013-08-28 20:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 08/28/13 06:22, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> When testing LSN ordering code for v5 superblocks, it was discovered
> that the the LSN embedded in the generic btree blocks was
> occasionally uninitialised. These values didn't get written to disk
> by metadata writeback - they got written by previous transactions in
> log recovery.
>
> The issue is here that the when the block is first allocated and
> initialised, the LSN field was not initialised - it gets overwritten
> before IO is issued on the buffer - but the value that is logged by
> transactions that modify the header before it is written to disk
> (and initialised) contain garbage. Hence the first recovery of the
> buffer will stamp garbage into the LSN field, and that can cause
> subsequent transactions to not replay correctly.
>
> The fix is simply to initialise the bb_lsn field to zero when we
> initialise the block for the first time.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---


This is simple enough that it could have been put into the second patch.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery
  2013-08-28 11:22 ` [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery Dave Chinner
@ 2013-08-28 20:49   ` Mark Tinguely
  2013-08-28 21:02     ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2013-08-28 20:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 08/28/13 06:22, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Log recovery has some strict ordering requirements which unordered
> or reordered metadata writeback can defeat. This can occur when an
> item is logged in a transaction, written back to disk, and then
> logged in a new transaction before the tail of the log is moved past
> the original modification.
>
> The result of this is that when we read an object off disk for
> recovery purposes, the buffer that we read may not contain the
> object type that recovery is expecting and hence at the end of the
> checkpoint being recovered we have an invalid object in memory.
>
> This isn't usually a problem, as recovery will then replay all the
> other checkpoints and that brings the object back to a valid and
> correct state, but the issue is that while the object is in the
> invalid state it can be flushed to disk. This results in the object
> verifier failing and triggering a corruption shutdown of log
> recover. This is correct behaviour for the verifiers - the problem
> is that we are not detecting that the object we've read off disk is
> newer than the transaction we are replaying.
>
> All metadata in v5 filesystems has the LSN of it's last modification
> stamped in it. This enabled log recover to read that field and
> determine the age of the object on disk correctly. If the LSN of the
> object on disk is older than the transaction being replayed, then we
> replay the modification. If the LSN of the object matches or is more
> recent than the transaction's LSN, then we should avoid overwriting
> the object as that is what leads to the transient corrupt state.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---


> @@ -2488,7 +2595,7 @@ xlog_recover_buffer_pass2(
>   		xlog_recover_do_reg_buffer(mp, item, bp, buf_f);
>   	}
>   	if (error)
> -		return XFS_ERROR(error);
> +		goto out_release;
>

This adds a xfs_buf_relse() on the buffer in the error path. The 
reference was taken in this routine. The callers do not know of the 
buffer and can't release it. convinced me.


>   	/*
>   	 * Perform delayed write on the buffer.  Asynchronous writes will be
> @@ -2517,6 +2624,7 @@ xlog_recover_buffer_pass2(
>   		xfs_buf_delwri_queue(bp, buffer_list);
>   	}
>
> +out_release:
>   	xfs_buf_relse(bp);
>   	return error;

Looks good. Nice to get into Linux 3.12 and possibly back to stable.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised
  2013-08-28 20:31   ` Mark Tinguely
@ 2013-08-28 20:57     ` Dave Chinner
  2013-08-30 18:41       ` Ben Myers
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-08-28 20:57 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Wed, Aug 28, 2013 at 03:31:09PM -0500, Mark Tinguely wrote:
> On 08/28/13 06:22, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >When testing LSN ordering code for v5 superblocks, it was discovered
> >that the the LSN embedded in the generic btree blocks was
> >occasionally uninitialised. These values didn't get written to disk
> >by metadata writeback - they got written by previous transactions in
> >log recovery.
> >
> >The issue is here that the when the block is first allocated and
> >initialised, the LSN field was not initialised - it gets overwritten
> >before IO is issued on the buffer - but the value that is logged by
> >transactions that modify the header before it is written to disk
> >(and initialised) contain garbage. Hence the first recovery of the
> >buffer will stamp garbage into the LSN field, and that can cause
> >subsequent transactions to not replay correctly.
> >
> >The fix is simply to initialise the bb_lsn field to zero when we
> >initialise the block for the first time.
> >
> >Signed-off-by: Dave Chinner<dchinner@redhat.com>
> >---
> 
> 
> This is simple enough that it could have been put into the second patch.

It is simple, but it is a bug fix and not part of the feature being
added in the second patch. Best practice is to separate out bug
fixes from features as bug fixes should be standalone commits so
they can be easily backported if necessary.  That's been drilled
into kernel developers for years by people like akpm, hch and other
maintainers, so breaking up patches like this should be second
nature to all kernel developers...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery
  2013-08-28 20:49   ` Mark Tinguely
@ 2013-08-28 21:02     ` Dave Chinner
  2013-08-28 21:07       ` Mark Tinguely
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-08-28 21:02 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Wed, Aug 28, 2013 at 03:49:30PM -0500, Mark Tinguely wrote:
> On 08/28/13 06:22, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >Log recovery has some strict ordering requirements which unordered
> >or reordered metadata writeback can defeat. This can occur when an
> >item is logged in a transaction, written back to disk, and then
> >logged in a new transaction before the tail of the log is moved past
> >the original modification.
> >
> >The result of this is that when we read an object off disk for
> >recovery purposes, the buffer that we read may not contain the
> >object type that recovery is expecting and hence at the end of the
> >checkpoint being recovered we have an invalid object in memory.
> >
> >This isn't usually a problem, as recovery will then replay all the
> >other checkpoints and that brings the object back to a valid and
> >correct state, but the issue is that while the object is in the
> >invalid state it can be flushed to disk. This results in the object
> >verifier failing and triggering a corruption shutdown of log
> >recover. This is correct behaviour for the verifiers - the problem
> >is that we are not detecting that the object we've read off disk is
> >newer than the transaction we are replaying.
> >
> >All metadata in v5 filesystems has the LSN of it's last modification
> >stamped in it. This enabled log recover to read that field and
> >determine the age of the object on disk correctly. If the LSN of the
> >object on disk is older than the transaction being replayed, then we
> >replay the modification. If the LSN of the object matches or is more
> >recent than the transaction's LSN, then we should avoid overwriting
> >the object as that is what leads to the transient corrupt state.
> >
> >Signed-off-by: Dave Chinner<dchinner@redhat.com>
> >---
> 
> 
> >@@ -2488,7 +2595,7 @@ xlog_recover_buffer_pass2(
> >  		xlog_recover_do_reg_buffer(mp, item, bp, buf_f);
> >  	}
> >  	if (error)
> >-		return XFS_ERROR(error);
> >+		goto out_release;
> >
> 
> This adds a xfs_buf_relse() on the buffer in the error path. The
> reference was taken in this routine. The callers do not know of the
> buffer and can't release it. convinced me.
> 
> 
> >  	/*
> >  	 * Perform delayed write on the buffer.  Asynchronous writes will be
> >@@ -2517,6 +2624,7 @@ xlog_recover_buffer_pass2(
> >  		xfs_buf_delwri_queue(bp, buffer_list);
> >  	}
> >
> >+out_release:
> >  	xfs_buf_relse(bp);
> >  	return error;
> 
> Looks good. Nice to get into Linux 3.12 and possibly back to stable.

Why stable? It's v5 only code, and everyon knows that is still in
the experimental stage....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery
  2013-08-28 21:02     ` Dave Chinner
@ 2013-08-28 21:07       ` Mark Tinguely
  2013-08-28 21:31         ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2013-08-28 21:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 08/28/13 16:02, Dave Chinner wrote:
> On Wed, Aug 28, 2013 at 03:49:30PM -0500, Mark Tinguely wrote:
>> On 08/28/13 06:22, Dave Chinner wrote:
>>> From: Dave Chinner<dchinner@redhat.com>
>>
>> Looks good. Nice to get into Linux 3.12 and possibly back to stable.
>
> Why stable? It's v5 only code, and everyon knows that is still in
> the experimental stage....

yep.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery
  2013-08-28 21:07       ` Mark Tinguely
@ 2013-08-28 21:31         ` Dave Chinner
  2013-08-28 21:43           ` Mark Tinguely
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-08-28 21:31 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Wed, Aug 28, 2013 at 04:07:12PM -0500, Mark Tinguely wrote:
> On 08/28/13 16:02, Dave Chinner wrote:
> >On Wed, Aug 28, 2013 at 03:49:30PM -0500, Mark Tinguely wrote:
> >>On 08/28/13 06:22, Dave Chinner wrote:
> >>>From: Dave Chinner<dchinner@redhat.com>
> >>
> >>Looks good. Nice to get into Linux 3.12 and possibly back to stable.
> >
> >Why stable? It's v5 only code, and everyon knows that is still in
> >the experimental stage....
> 
> yep.

That doesn't answer my question. You had to have some reason for
suggesting a possible stable backport for this code after reviewing
it, and I'm interested to know what it was...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery
  2013-08-28 21:31         ` Dave Chinner
@ 2013-08-28 21:43           ` Mark Tinguely
  2013-08-28 22:34             ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2013-08-28 21:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 08/28/13 16:31, Dave Chinner wrote:
> On Wed, Aug 28, 2013 at 04:07:12PM -0500, Mark Tinguely wrote:
>> On 08/28/13 16:02, Dave Chinner wrote:
>>> On Wed, Aug 28, 2013 at 03:49:30PM -0500, Mark Tinguely wrote:
>>>> On 08/28/13 06:22, Dave Chinner wrote:
>>>>> From: Dave Chinner<dchinner@redhat.com>
>>>>
>>>> Looks good. Nice to get into Linux 3.12 and possibly back to stable.
>>>
>>> Why stable? It's v5 only code, and everyon knows that is still in
>>> the experimental stage....
>>
>> yep.
>
> That doesn't answer my question. You had to have some reason for
> suggesting a possible stable backport for this code after reviewing
> it, and I'm interested to know what it was...
>
> Cheers,
>
> Dave.

No thought that it would be nice to get it into Linux 3.12 and if Brian 
or anyone wants to review it, then it needs to be done soon.

my "yep" was a terse agreeing with your point.

   Yep, this problem has been around forever.
   Yep, this problem was found/confirmed by your verifier.
   Yep, this problem can only be fixed this way in superblock v5.
   Yep, I had blinders on and was not thinking this to experimental code
     so it does not matter to push it back.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery
  2013-08-28 21:43           ` Mark Tinguely
@ 2013-08-28 22:34             ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2013-08-28 22:34 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Wed, Aug 28, 2013 at 04:43:37PM -0500, Mark Tinguely wrote:
> On 08/28/13 16:31, Dave Chinner wrote:
> >On Wed, Aug 28, 2013 at 04:07:12PM -0500, Mark Tinguely wrote:
> >>On 08/28/13 16:02, Dave Chinner wrote:
> >>>On Wed, Aug 28, 2013 at 03:49:30PM -0500, Mark Tinguely wrote:
> >>>>On 08/28/13 06:22, Dave Chinner wrote:
> >>>>>From: Dave Chinner<dchinner@redhat.com>
> >>>>
> >>>>Looks good. Nice to get into Linux 3.12 and possibly back to stable.
> >>>
> >>>Why stable? It's v5 only code, and everyon knows that is still in
> >>>the experimental stage....
> >>
> >>yep.
> >
> >That doesn't answer my question. You had to have some reason for
> >suggesting a possible stable backport for this code after reviewing
> >it, and I'm interested to know what it was...
> >
> >Cheers,
> >
> >Dave.
> 
> No thought that it would be nice to get it into Linux 3.12 and if
> Brian or anyone wants to review it, then it needs to be done soon.
> 
> my "yep" was a terse agreeing with your point.
> 
>   Yep, this problem has been around forever.
>   Yep, this problem was found/confirmed by your verifier.
>   Yep, this problem can only be fixed this way in superblock v5.
>   Yep, I had blinders on and was not thinking this to experimental code
>     so it does not matter to push it back.

Ok, all good then. :)

Thanks for explaining in more detail, Mark. 

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised
  2013-08-28 20:57     ` Dave Chinner
@ 2013-08-30 18:41       ` Ben Myers
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Myers @ 2013-08-30 18:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Mark Tinguely, xfs

On Thu, Aug 29, 2013 at 06:57:53AM +1000, Dave Chinner wrote:
> On Wed, Aug 28, 2013 at 03:31:09PM -0500, Mark Tinguely wrote:
> > On 08/28/13 06:22, Dave Chinner wrote:
> > >From: Dave Chinner<dchinner@redhat.com>
> > >
> > >When testing LSN ordering code for v5 superblocks, it was discovered
> > >that the the LSN embedded in the generic btree blocks was
> > >occasionally uninitialised. These values didn't get written to disk
> > >by metadata writeback - they got written by previous transactions in
> > >log recovery.
> > >
> > >The issue is here that the when the block is first allocated and
> > >initialised, the LSN field was not initialised - it gets overwritten
> > >before IO is issued on the buffer - but the value that is logged by
> > >transactions that modify the header before it is written to disk
> > >(and initialised) contain garbage. Hence the first recovery of the
> > >buffer will stamp garbage into the LSN field, and that can cause
> > >subsequent transactions to not replay correctly.
> > >
> > >The fix is simply to initialise the bb_lsn field to zero when we
> > >initialise the block for the first time.
> > >
> > >Signed-off-by: Dave Chinner<dchinner@redhat.com>
> > >---
> > 
> > 
> > This is simple enough that it could have been put into the second patch.
> 
> It is simple, but it is a bug fix and not part of the feature being
> added in the second patch. Best practice is to separate out bug
> fixes from features as bug fixes should be standalone commits so
> they can be easily backported if necessary.  That's been drilled
> into kernel developers for years by people like akpm, hch and other
> maintainers, so breaking up patches like this should be second
> nature to all kernel developers...

Amen.

Applied these two.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-08-30 18:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-28 11:22 [PATCH 0/2] xfs: prevent transient corrupt states during log recovery Dave Chinner
2013-08-28 11:22 ` [PATCH 1/2] xfs: btree block LSN escaping to disk uninitialised Dave Chinner
2013-08-28 20:31   ` Mark Tinguely
2013-08-28 20:57     ` Dave Chinner
2013-08-30 18:41       ` Ben Myers
2013-08-28 11:22 ` [PATCH 2/2] xfs: check LSN ordering for v5 superblocks during recovery Dave Chinner
2013-08-28 20:49   ` Mark Tinguely
2013-08-28 21:02     ` Dave Chinner
2013-08-28 21:07       ` Mark Tinguely
2013-08-28 21:31         ` Dave Chinner
2013-08-28 21:43           ` Mark Tinguely
2013-08-28 22:34             ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.