All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/9] xfs: take inode version into account in XFS_LITINO
       [not found] ` <1358774760-21841-2-git-send-email-david@fromorbit.com>
@ 2013-02-14 21:25   ` Ben Myers
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Myers @ 2013-02-14 21:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 22, 2013 at 12:25:52AM +1100, Dave Chinner wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Add a version argument to XFS_LITINO so that it can return different values
> depending on the inode version.  This is required for the upcoming v3 inodes
> with a larger fixed layout dinode.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks fine.
Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/9] xfs: add support for large btree blocks
       [not found] ` <1358774760-21841-3-git-send-email-david@fromorbit.com>
@ 2013-02-15 21:20   ` Ben Myers
  2013-02-22  1:34     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Myers @ 2013-02-15 21:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 22, 2013 at 12:25:53AM +1100, Dave Chinner wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Add support for larger btree blocks that contains a CRC32C checksum,
> a filesystem uuid and block number for detecting filesystem
> consistency and out of place writes.
> 
> [dchinner@redhat.com] Also include an owner field to allow reverse
> mappings to be implemented for improved repairability and a LSN
> field to so that log recovery can easily determine the last
> modification that made it to disk for each buffer.
> 
> [dchinner@redhat.com] Add buffer log format flags to indicate the
> type of buffer to recovery so that we don't have to do blind magic
> number tests to determine what the buffer is.
> 
> [dchinner@redhat.com] Modified to fit into the verifier structure.

This patch is far too large for a good review.  It needs to be split up into
it's various ideas which you outlined in patch 0.  If you need to add dead code
in each piece and then enable it at the end, that's fine with me.

Some comments below.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_alloc_btree.c  |  105 +++++++++++++------
>  fs/xfs/xfs_alloc_btree.h  |   12 ++-
>  fs/xfs/xfs_attr_leaf.c    |    2 +-
>  fs/xfs/xfs_bmap.c         |   42 +++++---
>  fs/xfs/xfs_bmap_btree.c   |  110 +++++++++++++------
>  fs/xfs/xfs_bmap_btree.h   |   19 ++--
>  fs/xfs/xfs_btree.c        |  256 +++++++++++++++++++++++++++++++++++++--------
>  fs/xfs/xfs_btree.h        |   64 ++++++++++--
>  fs/xfs/xfs_buf_item.h     |   24 ++++-
>  fs/xfs/xfs_dinode.h       |    4 +-
>  fs/xfs/xfs_fsops.c        |   23 +++-
>  fs/xfs/xfs_ialloc_btree.c |   87 ++++++++++-----
>  fs/xfs/xfs_ialloc_btree.h |    9 +-
>  fs/xfs/xfs_inode.c        |   33 +++---
>  fs/xfs/xfs_log_recover.c  |   28 +++++
>  fs/xfs/xfs_trans.h        |    2 +
>  fs/xfs/xfs_trans_buf.c    |   29 +++--
>  17 files changed, 643 insertions(+), 206 deletions(-)
> 
> diff --git a/fs/xfs/xfs_alloc_btree.c b/fs/xfs/xfs_alloc_btree.c
> index b1ddef6..30c4c14 100644
> --- a/fs/xfs/xfs_alloc_btree.c
> +++ b/fs/xfs/xfs_alloc_btree.c
> @@ -33,6 +33,7 @@
>  #include "xfs_extent_busy.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> +#include "xfs_cksum.h"
>  
>  
>  STATIC struct xfs_btree_cur *
> @@ -272,7 +273,7 @@ xfs_allocbt_key_diff(
>  	return (__int64_t)be32_to_cpu(kp->ar_startblock) - rec->ar_startblock;
>  }
>  
> -static void
> +static bool
>  xfs_allocbt_verify(
>  	struct xfs_buf		*bp)
>  {
> @@ -280,66 +281,103 @@ xfs_allocbt_verify(
>  	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
>  	struct xfs_perag	*pag = bp->b_pag;
>  	unsigned int		level;
> -	int			sblock_ok; /* block passes checks */
>  
>  	/*
>  	 * magic number and level verification
>  	 *
> -	 * During growfs operations, we can't verify the exact level as the
> -	 * perag is not fully initialised and hence not attached to the buffer.
> -	 * In this case, check against the maximum tree depth.
> +	 * During growfs operations, we can't verify the exact level or owner as
> +	 * the perag is not fully initialised and hence not attached to the
> +	 * buffer.  In this case, check against the maximum tree depth.
> +	 *
> +	 * Similarly, during log recovery we will have a perag structure
> +	 * attached, but the agf information will not yet have been initialised
> +	 * from the on disk AGF. Again, we can only check against maximum limits
> +	 * in this case.
>  	 */
>  	level = be16_to_cpu(block->bb_level);
>  	switch (block->bb_magic) {
> +	case cpu_to_be32(XFS_ABTB_CRC_MAGIC):
> +		if (!xfs_sb_version_hascrc(&mp->m_sb))
> +			return false;
> +		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
> +			return false;
> +		if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
> +			return false;
> +		if (pag &&
> +		    be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
> +			return false;
> +		/* fall through */
>  	case cpu_to_be32(XFS_ABTB_MAGIC):
> -		if (pag)
> -			sblock_ok = level < pag->pagf_levels[XFS_BTNUM_BNOi];
> -		else
> -			sblock_ok = level < mp->m_ag_maxlevels;
> +		if (pag && pag->pagf_init) {
> +			if (level >= pag->pagf_levels[XFS_BTNUM_BNOi])
> +				return false;
> +		} else if (level >= mp->m_ag_maxlevels)
> +			return false;
>  		break;
> +	case cpu_to_be32(XFS_ABTC_CRC_MAGIC):
> +		if (!xfs_sb_version_hascrc(&mp->m_sb))
> +			return false;
> +		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
> +			return false;
> +		if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
> +			return false;
> +		if (pag &&
> +		    be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
> +			return false;
> +		/* fall through */
>  	case cpu_to_be32(XFS_ABTC_MAGIC):
> -		if (pag)
> -			sblock_ok = level < pag->pagf_levels[XFS_BTNUM_CNTi];
> -		else
> -			sblock_ok = level < mp->m_ag_maxlevels;
> +		if (pag && pag->pagf_init) {
> +			if (level >= pag->pagf_levels[XFS_BTNUM_CNTi])
> +				return false;
> +		} else if (level >= mp->m_ag_maxlevels)
> +			return false;
>  		break;
>  	default:
> -		sblock_ok = 0;
> -		break;
> +		return false;
>  	}
>  
>  	/* numrecs verification */
> -	sblock_ok = sblock_ok &&
> -		be16_to_cpu(block->bb_numrecs) <= mp->m_alloc_mxr[level != 0];
> +	if (be16_to_cpu(block->bb_numrecs) > mp->m_alloc_mxr[level != 0])
> +		return false;
>  
>  	/* sibling pointer verification */
> -	sblock_ok = sblock_ok &&
> -		(block->bb_u.s.bb_leftsib == cpu_to_be32(NULLAGBLOCK) ||
> -		 be32_to_cpu(block->bb_u.s.bb_leftsib) < mp->m_sb.sb_agblocks) &&
> -		block->bb_u.s.bb_leftsib &&
> -		(block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK) ||
> -		 be32_to_cpu(block->bb_u.s.bb_rightsib) < mp->m_sb.sb_agblocks) &&
> -		block->bb_u.s.bb_rightsib;
> -
> -	if (!sblock_ok) {
> -		trace_xfs_btree_corrupt(bp, _RET_IP_);
> -		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, block);
> -		xfs_buf_ioerror(bp, EFSCORRUPTED);
> -	}
> +	if (!block->bb_u.s.bb_leftsib ||
> +	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
> +	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
> +		return false;
> +	if (!block->bb_u.s.bb_rightsib ||
> +	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
> +	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
> +		return false;
> +
> +	return true;
>  }
>  
>  static void
>  xfs_allocbt_read_verify(
>  	struct xfs_buf	*bp)
>  {
> -	xfs_allocbt_verify(bp);
> +	if (!(xfs_btree_sblock_verify_crc(bp) &&
> +	      xfs_allocbt_verify(bp))) {
> +		trace_xfs_btree_corrupt(bp, _RET_IP_);
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> +				     bp->b_target->bt_mount, bp->b_addr);
> +		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +	}
>  }
>  
>  static void
>  xfs_allocbt_write_verify(
>  	struct xfs_buf	*bp)
>  {
> -	xfs_allocbt_verify(bp);
> +	if (!xfs_allocbt_verify(bp)) {
> +		trace_xfs_btree_corrupt(bp, _RET_IP_);
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> +				     bp->b_target->bt_mount, bp->b_addr);
> +		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +	}
> +	xfs_btree_sblock_calc_crc(bp);
> +
>  }
>  
>  const struct xfs_buf_ops xfs_allocbt_buf_ops = {
> @@ -444,6 +482,9 @@ xfs_allocbt_init_cursor(
>  	cur->bc_private.a.agbp = agbp;
>  	cur->bc_private.a.agno = agno;
>  
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
> +
>  	return cur;
>  }
>  
> diff --git a/fs/xfs/xfs_alloc_btree.h b/fs/xfs/xfs_alloc_btree.h
> index 7e89a2b..087465b 100644
> --- a/fs/xfs/xfs_alloc_btree.h
> +++ b/fs/xfs/xfs_alloc_btree.h
> @@ -31,8 +31,10 @@ struct xfs_mount;
>   * by blockcount and blockno.  All blocks look the same to make the code
>   * simpler; if we have time later, we'll make the optimizations.
>   */
> -#define	XFS_ABTB_MAGIC	0x41425442	/* 'ABTB' for bno tree */
> -#define	XFS_ABTC_MAGIC	0x41425443	/* 'ABTC' for cnt tree */
> +#define	XFS_ABTB_MAGIC		0x41425442	/* 'ABTB' for bno tree */
> +#define	XFS_ABTB_CRC_MAGIC	0x4142544a
> +#define	XFS_ABTC_MAGIC		0x41425443	/* 'ABTC' for cnt tree */
> +#define	XFS_ABTC_CRC_MAGIC	0x4142544b

/* 'ATB?' */

Add comment.

>  /*
>   * Data record/key structure
> @@ -59,10 +61,10 @@ typedef __be32 xfs_alloc_ptr_t;
>  
>  /*
>   * Btree block header size depends on a superblock flag.
> - *
> - * (not quite yet, but soon)
>   */
> -#define XFS_ALLOC_BLOCK_LEN(mp)	XFS_BTREE_SBLOCK_LEN
> +#define XFS_ALLOC_BLOCK_LEN(mp) \
> +	(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> +		XFS_BTREE_SBLOCK_CRC_LEN : XFS_BTREE_SBLOCK_LEN)

The allocbt changes seem to be a good fit for their own patch.

>  
>  /*
>   * Record, key, and pointer address macros for btree blocks.
> diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> index f96a734..aa4765f 100644
> --- a/fs/xfs/xfs_attr_leaf.c
> +++ b/fs/xfs/xfs_attr_leaf.c
> @@ -232,7 +232,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes)
>  				return 0;
>  			return dp->i_d.di_forkoff;
>  		}
> -		dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
> +		dsize = XFS_BMAP_BROOT_SPACE(mp, dp->i_df.if_broot);

Changes to XFS_BMAP_BROOT_SPACE are a good candidate for a separate patch, just
as with LITINO.

>  		break;
>  	}
>  
> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> index f338012..821f599 100644
> --- a/fs/xfs/xfs_bmap.c
> +++ b/fs/xfs/xfs_bmap.c
> @@ -3053,6 +3053,7 @@ xfs_bmap_extents_to_btree(
>  	xfs_extnum_t		nextents;	/* number of file extents */
>  	xfs_bmbt_ptr_t		*pp;		/* root block address pointer */
>  
> +	mp = ip->i_mount;
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
>  	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
>  
> @@ -3066,16 +3067,18 @@ xfs_bmap_extents_to_btree(
>  	 * Fill in the root.
>  	 */
>  	block = ifp->if_broot;
> -	block->bb_magic = cpu_to_be32(XFS_BMAP_MAGIC);
> -	block->bb_level = cpu_to_be16(1);
> -	block->bb_numrecs = cpu_to_be16(1);
> -	block->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
> -	block->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		xfs_btree_init_block_int(mp, block, XFS_BUF_DADDR_NULL,
> +				 XFS_BMAP_CRC_MAGIC, 1, 1, ip->i_ino,
> +				 XFS_BTREE_LONG_PTRS | XFS_BTREE_CRC_BLOCKS);
> +	else
> +		xfs_btree_init_block_int(mp, block, XFS_BUF_DADDR_NULL,
> +				 XFS_BMAP_MAGIC, 1, 1, ip->i_ino,
> +				 XFS_BTREE_LONG_PTRS);
>  
>  	/*
>  	 * Need a cursor.  Can't allocate until bb_level is filled in.
>  	 */
> -	mp = ip->i_mount;
>  	cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
>  	cur->bc_private.b.firstblock = *firstblock;
>  	cur->bc_private.b.flist = flist;
> @@ -3124,10 +3127,15 @@ xfs_bmap_extents_to_btree(
>  	 */
>  	abp->b_ops = &xfs_bmbt_buf_ops;
>  	ablock = XFS_BUF_TO_BLOCK(abp);
> -	ablock->bb_magic = cpu_to_be32(XFS_BMAP_MAGIC);
> -	ablock->bb_level = 0;
> -	ablock->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
> -	ablock->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		xfs_btree_init_block_int(mp, ablock, abp->b_bn,
> +				XFS_BMAP_CRC_MAGIC, 0, 0, ip->i_ino,
> +				XFS_BTREE_LONG_PTRS | XFS_BTREE_CRC_BLOCKS);
> +	else
> +		xfs_btree_init_block_int(mp, ablock, abp->b_bn,
> +				XFS_BMAP_MAGIC, 0, 0, ip->i_ino,
> +				XFS_BTREE_LONG_PTRS);
> +
>  	arp = XFS_BMBT_REC_ADDR(mp, ablock, 1);
>  	nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>  	for (cnt = i = 0; i < nextents; i++) {
> @@ -3155,8 +3163,8 @@ xfs_bmap_extents_to_btree(
>  	 * Do all this logging at the end so that
>  	 * the root is at the right level.
>  	 */
> -	xfs_btree_log_block(cur, abp, XFS_BB_ALL_BITS);
>  	xfs_btree_log_recs(cur, abp, 1, be16_to_cpu(ablock->bb_numrecs));
> +	xfs_btree_log_block(cur, abp, XFS_BB_ALL_BITS);

Huh.  Why was that necessary?

>  	ASSERT(*curp == NULL);
>  	*curp = cur;
>  	*logflagsp = XFS_ILOG_CORE | xfs_ilog_fbroot(whichfork);
> @@ -3268,8 +3276,12 @@ xfs_bmap_local_to_extents(
>  		*firstblock = args.fsbno;
>  		bp = xfs_btree_get_bufl(args.mp, tp, args.fsbno, 0);
>  		bp->b_ops = &xfs_bmbt_buf_ops;
> +
>  		memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);

Conflicts now due to 1e82379b0

> +
> +		xfs_trans_buf_set_type(tp, bp, XFS_BLF_BTREE_BUF);

The xfs_trans_buf_set_type changes are a good candidate for a separate patch.

>  		xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
> +
>  		xfs_bmap_forkoff_reset(args.mp, ip, whichfork);
>  		xfs_idata_realloc(ip, -ifp->if_bytes, whichfork);
>  		xfs_iext_add(ifp, 0, 1);
> @@ -4023,11 +4035,15 @@ xfs_bmap_sanity_check(
>  {
>  	struct xfs_btree_block  *block = XFS_BUF_TO_BLOCK(bp);
>  
> -	if (block->bb_magic != cpu_to_be32(XFS_BMAP_MAGIC) ||
> -	    be16_to_cpu(block->bb_level) != level ||
> +	if (block->bb_magic != cpu_to_be32(XFS_BMAP_CRC_MAGIC) &&
> +	    block->bb_magic != cpu_to_be32(XFS_BMAP_MAGIC))
> +		return 0;
> +
> +	if (be16_to_cpu(block->bb_level) != level ||
>  	    be16_to_cpu(block->bb_numrecs) == 0 ||
>  	    be16_to_cpu(block->bb_numrecs) > mp->m_bmap_dmxr[level != 0])
>  		return 0;
> +
>  	return 1;
>  }
>  
> diff --git a/fs/xfs/xfs_bmap_btree.c b/fs/xfs/xfs_bmap_btree.c
> index 061b45c..3a86c3f 100644
> --- a/fs/xfs/xfs_bmap_btree.c
> +++ b/fs/xfs/xfs_bmap_btree.c
> @@ -37,6 +37,7 @@
>  #include "xfs_error.h"
>  #include "xfs_quota.h"
>  #include "xfs_trace.h"
> +#include "xfs_cksum.h"
>  
>  /*
>   * Determine the extent state.
> @@ -59,24 +60,31 @@ xfs_extent_state(
>   */
>  void
>  xfs_bmdr_to_bmbt(
> -	struct xfs_mount	*mp,
> +	struct xfs_inode	*ip,
>  	xfs_bmdr_block_t	*dblock,
>  	int			dblocklen,
>  	struct xfs_btree_block	*rblock,
>  	int			rblocklen)
>  {
> +	struct xfs_mount	*mp = ip->i_mount;
>  	int			dmxr;
>  	xfs_bmbt_key_t		*fkp;
>  	__be64			*fpp;
>  	xfs_bmbt_key_t		*tkp;
>  	__be64			*tpp;
>  
> -	rblock->bb_magic = cpu_to_be32(XFS_BMAP_MAGIC);
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		xfs_btree_init_block_int(mp, rblock, XFS_BUF_DADDR_NULL,
> +				 XFS_BMAP_CRC_MAGIC, 0, 0, ip->i_ino,
> +				 XFS_BTREE_LONG_PTRS | XFS_BTREE_CRC_BLOCKS);
> +	else
> +		xfs_btree_init_block_int(mp, rblock, XFS_BUF_DADDR_NULL,
> +				 XFS_BMAP_MAGIC, 0, 0, ip->i_ino,
> +				 XFS_BTREE_LONG_PTRS);
> +
>  	rblock->bb_level = dblock->bb_level;
>  	ASSERT(be16_to_cpu(rblock->bb_level) > 0);
>  	rblock->bb_numrecs = dblock->bb_numrecs;
> -	rblock->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
> -	rblock->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
>  	dmxr = xfs_bmdr_maxrecs(mp, dblocklen, 0);
>  	fkp = XFS_BMDR_KEY_ADDR(dblock, 1);
>  	tkp = XFS_BMBT_KEY_ADDR(mp, rblock, 1);
> @@ -424,7 +432,13 @@ xfs_bmbt_to_bmdr(
>  	xfs_bmbt_key_t		*tkp;
>  	__be64			*tpp;
>  
> -	ASSERT(rblock->bb_magic == cpu_to_be32(XFS_BMAP_MAGIC));
> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		ASSERT(rblock->bb_magic == cpu_to_be32(XFS_BMAP_CRC_MAGIC));
> +		ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_uuid));
> +		ASSERT(rblock->bb_u.l.bb_blkno ==
> +		       cpu_to_be64(XFS_BUF_DADDR_NULL));
> +	} else
> +		ASSERT(rblock->bb_magic == cpu_to_be32(XFS_BMAP_MAGIC));
>  	ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLDFSBNO));
>  	ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLDFSBNO));
>  	ASSERT(rblock->bb_level != 0);
> @@ -708,59 +722,89 @@ xfs_bmbt_key_diff(
>  				      cur->bc_rec.b.br_startoff;
>  }
>  
> -static void
> +static int
>  xfs_bmbt_verify(
>  	struct xfs_buf		*bp)
>  {
>  	struct xfs_mount	*mp = bp->b_target->bt_mount;
>  	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
>  	unsigned int		level;
> -	int			lblock_ok; /* block passes checks */
>  
> -	/* magic number and level verification.
> +	switch (block->bb_magic) {
> +	case cpu_to_be32(XFS_BMAP_CRC_MAGIC):
> +		if (!xfs_sb_version_hascrc(&mp->m_sb))
> +			return false;
> +		if (!uuid_equal(&block->bb_u.l.bb_uuid, &mp->m_sb.sb_uuid))
> +			return false;
> +		if (be64_to_cpu(block->bb_u.l.bb_blkno) != bp->b_bn)
> +			return false;
> +		/*
> +		 * XXX: need a better way of verifying the owner here. Right now
> +		 * just make sure there has been one set.
> +		 */
> +		if (be64_to_cpu(block->bb_u.l.bb_owner) == 0)
> +			return false;
> +		/* fall through */
> +	case cpu_to_be32(XFS_BMAP_MAGIC):
> +		break;
> +	default:
> +		return false;
> +	}
> +
> +	/*
> +	 * numrecs and level verification.
>  	 *
> -	 * We don't know waht fork we belong to, so just verify that the level
> +	 * We don't know what fork we belong to, so just verify that the level
>  	 * is less than the maximum of the two. Later checks will be more
>  	 * precise.
>  	 */
>  	level = be16_to_cpu(block->bb_level);
> -	lblock_ok = block->bb_magic == cpu_to_be32(XFS_BMAP_MAGIC) &&
> -		    level < max(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]);
> -
> -	/* numrecs verification */
> -	lblock_ok = lblock_ok &&
> -		be16_to_cpu(block->bb_numrecs) <= mp->m_bmap_dmxr[level != 0];
> +	if (level > max(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]))
> +		return false;
> +	if (be16_to_cpu(block->bb_numrecs) > mp->m_bmap_dmxr[level != 0])
> +		return false;
>  
>  	/* sibling pointer verification */
> -	lblock_ok = lblock_ok &&
> -		block->bb_u.l.bb_leftsib &&
> -		(block->bb_u.l.bb_leftsib == cpu_to_be64(NULLDFSBNO) ||
> -		 XFS_FSB_SANITY_CHECK(mp,
> -			be64_to_cpu(block->bb_u.l.bb_leftsib))) &&
> -		block->bb_u.l.bb_rightsib &&
> -		(block->bb_u.l.bb_rightsib == cpu_to_be64(NULLDFSBNO) ||
> -		 XFS_FSB_SANITY_CHECK(mp,
> -			be64_to_cpu(block->bb_u.l.bb_rightsib)));
> -
> -	if (!lblock_ok) {
> -		trace_xfs_btree_corrupt(bp, _RET_IP_);
> -		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, block);
> -		xfs_buf_ioerror(bp, EFSCORRUPTED);
> -	}
> +	if (!block->bb_u.l.bb_leftsib ||
> +	    (block->bb_u.l.bb_leftsib != cpu_to_be64(NULLDFSBNO) &&
> +	     !XFS_FSB_SANITY_CHECK(mp, be64_to_cpu(block->bb_u.l.bb_leftsib))))
> +		return false;
> +	if (!block->bb_u.l.bb_rightsib ||
> +	    (block->bb_u.l.bb_rightsib != cpu_to_be64(NULLDFSBNO) &&
> +	     !XFS_FSB_SANITY_CHECK(mp, be64_to_cpu(block->bb_u.l.bb_rightsib))))
> +		return false;
> +
> +	return true;
> +
>  }
>  
>  static void
>  xfs_bmbt_read_verify(
>  	struct xfs_buf	*bp)
>  {
> -	xfs_bmbt_verify(bp);
> +	if (!(xfs_btree_lblock_verify_crc(bp) &&
> +	      xfs_bmbt_verify(bp))) {
> +		trace_xfs_btree_corrupt(bp, _RET_IP_);
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> +				     bp->b_target->bt_mount, bp->b_addr);
> +		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +	}
> +
>  }
>  
>  static void
>  xfs_bmbt_write_verify(
>  	struct xfs_buf	*bp)
>  {
> -	xfs_bmbt_verify(bp);
> +	if (!xfs_bmbt_verify(bp)) {
> +		xfs_warn(bp->b_target->bt_mount, "bmbt daddr 0x%llx failed", bp->b_bn);
> +		trace_xfs_btree_corrupt(bp, _RET_IP_);
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> +				     bp->b_target->bt_mount, bp->b_addr);
> +		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +		return;
> +	}
> +	xfs_btree_lblock_calc_crc(bp);
>  }
>  
>  const struct xfs_buf_ops xfs_bmbt_buf_ops = {
> @@ -838,6 +882,8 @@ xfs_bmbt_init_cursor(
>  
>  	cur->bc_ops = &xfs_bmbt_ops;
>  	cur->bc_flags = XFS_BTREE_LONG_PTRS | XFS_BTREE_ROOT_IN_INODE;
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
>  
>  	cur->bc_private.b.forksize = XFS_IFORK_SIZE(ip, whichfork);
>  	cur->bc_private.b.ip = ip;
> diff --git a/fs/xfs/xfs_bmap_btree.h b/fs/xfs/xfs_bmap_btree.h
> index 88469ca..1b2f3e7 100644
> --- a/fs/xfs/xfs_bmap_btree.h
> +++ b/fs/xfs/xfs_bmap_btree.h
> @@ -18,7 +18,8 @@
>  #ifndef __XFS_BMAP_BTREE_H__
>  #define __XFS_BMAP_BTREE_H__
>  
> -#define XFS_BMAP_MAGIC	0x424d4150	/* 'BMAP' */
> +#define XFS_BMAP_MAGIC		0x424d4150	/* 'BMAP' */
> +#define XFS_BMAP_CRC_MAGIC	0x424d4158

BMAY

Add a comment.

>  struct xfs_btree_cur;
>  struct xfs_btree_block;
> @@ -136,10 +137,10 @@ typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
>  
>  /*
>   * Btree block header size depends on a superblock flag.
> - *
> - * (not quite yet, but soon)
>   */
> -#define XFS_BMBT_BLOCK_LEN(mp)	XFS_BTREE_LBLOCK_LEN
> +#define XFS_BMBT_BLOCK_LEN(mp) \
> +	(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> +		XFS_BTREE_LBLOCK_CRC_LEN : XFS_BTREE_LBLOCK_LEN)
>  
>  #define XFS_BMBT_REC_ADDR(mp, block, index) \
>  	((xfs_bmbt_rec_t *) \
> @@ -186,12 +187,12 @@ typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
>  #define XFS_BMAP_BROOT_PTR_ADDR(mp, bb, i, sz) \
>  	XFS_BMBT_PTR_ADDR(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0))
>  
> -#define XFS_BMAP_BROOT_SPACE_CALC(nrecs) \
> -	(int)(XFS_BTREE_LBLOCK_LEN + \
> +#define XFS_BMAP_BROOT_SPACE_CALC(mp, nrecs) \
> +	(int)(XFS_BMBT_BLOCK_LEN(mp) + \
>  	       ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))
>  
> -#define XFS_BMAP_BROOT_SPACE(bb) \
> -	(XFS_BMAP_BROOT_SPACE_CALC(be16_to_cpu((bb)->bb_numrecs)))
> +#define XFS_BMAP_BROOT_SPACE(mp, bb) \
> +	(XFS_BMAP_BROOT_SPACE_CALC(mp, be16_to_cpu((bb)->bb_numrecs)))
>  #define XFS_BMDR_SPACE_CALC(nrecs) \
>  	(int)(sizeof(xfs_bmdr_block_t) + \
>  	       ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))

Are these large broots going to force attributes out of line?

> @@ -204,7 +205,7 @@ typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
>  /*
>   * Prototypes for xfs_bmap.c to call.
>   */
> -extern void xfs_bmdr_to_bmbt(struct xfs_mount *, xfs_bmdr_block_t *, int,
> +extern void xfs_bmdr_to_bmbt(struct xfs_inode *, xfs_bmdr_block_t *, int,
>  			struct xfs_btree_block *, int);
>  extern void xfs_bmbt_get_all(xfs_bmbt_rec_host_t *r, xfs_bmbt_irec_t *s);
>  extern xfs_filblks_t xfs_bmbt_get_blockcount(xfs_bmbt_rec_host_t *r);
> diff --git a/fs/xfs/xfs_btree.c b/fs/xfs/xfs_btree.c
> index db01040..9e8fcad 100644
> --- a/fs/xfs/xfs_btree.c
> +++ b/fs/xfs/xfs_btree.c
> @@ -30,9 +30,11 @@
>  #include "xfs_dinode.h"
>  #include "xfs_inode.h"
>  #include "xfs_inode_item.h"
> +#include "xfs_buf_item.h"
>  #include "xfs_btree.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> +#include "xfs_cksum.h"
>  
>  /*
>   * Cursor allocation zone.
> @@ -42,9 +44,13 @@ kmem_zone_t	*xfs_btree_cur_zone;
>  /*
>   * Btree magic numbers.
>   */
> -const __uint32_t xfs_magics[XFS_BTNUM_MAX] = {
> -	XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC
> +static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
> +	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC },
> +	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC,
> +	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC }
>  };
> +#define xfs_btree_magic(cur) \
> +	xfs_magics[!!((cur)->bc_flags & XFS_BTREE_CRC_BLOCKS)][cur->bc_btnum]
>  
>  
>  STATIC int				/* error (0 or EFSCORRUPTED) */
> @@ -54,30 +60,38 @@ xfs_btree_check_lblock(
>  	int			level,	/* level of the btree block */
>  	struct xfs_buf		*bp)	/* buffer for block, if any */
>  {
> -	int			lblock_ok; /* block passes checks */
> +	int			lblock_ok = 1; /* block passes checks */
>  	struct xfs_mount	*mp;	/* file system mount point */
>  
>  	mp = cur->bc_mp;
> -	lblock_ok =
> -		be32_to_cpu(block->bb_magic) == xfs_magics[cur->bc_btnum] &&
> +
> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		lblock_ok = lblock_ok &&
> +			uuid_equal(&block->bb_u.l.bb_uuid, &mp->m_sb.sb_uuid) &&
> +			block->bb_u.l.bb_blkno == cpu_to_be64(
> +				bp ? bp->b_bn : XFS_BUF_DADDR_NULL);
> +	}
> +
> +	lblock_ok = lblock_ok &&
> +		be32_to_cpu(block->bb_magic) == xfs_btree_magic(cur) &&
>  		be16_to_cpu(block->bb_level) == level &&
>  		be16_to_cpu(block->bb_numrecs) <=
>  			cur->bc_ops->get_maxrecs(cur, level) &&
>  		block->bb_u.l.bb_leftsib &&
>  		(block->bb_u.l.bb_leftsib == cpu_to_be64(NULLDFSBNO) ||
>  		 XFS_FSB_SANITY_CHECK(mp,
> -		 	be64_to_cpu(block->bb_u.l.bb_leftsib))) &&
> +			be64_to_cpu(block->bb_u.l.bb_leftsib))) &&
>  		block->bb_u.l.bb_rightsib &&
>  		(block->bb_u.l.bb_rightsib == cpu_to_be64(NULLDFSBNO) ||
>  		 XFS_FSB_SANITY_CHECK(mp,
> -		 	be64_to_cpu(block->bb_u.l.bb_rightsib)));
> +			be64_to_cpu(block->bb_u.l.bb_rightsib)));
> +
>  	if (unlikely(XFS_TEST_ERROR(!lblock_ok, mp,
>  			XFS_ERRTAG_BTREE_CHECK_LBLOCK,
>  			XFS_RANDOM_BTREE_CHECK_LBLOCK))) {
>  		if (bp)
>  			trace_xfs_btree_corrupt(bp, _RET_IP_);
> -		XFS_ERROR_REPORT("xfs_btree_check_lblock", XFS_ERRLEVEL_LOW,
> -				 mp);
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>  		return XFS_ERROR(EFSCORRUPTED);
>  	}
>  	return 0;
> @@ -90,16 +104,26 @@ xfs_btree_check_sblock(
>  	int			level,	/* level of the btree block */
>  	struct xfs_buf		*bp)	/* buffer containing block */
>  {
> +	struct xfs_mount	*mp;	/* file system mount point */
>  	struct xfs_buf		*agbp;	/* buffer for ag. freespace struct */
>  	struct xfs_agf		*agf;	/* ag. freespace structure */
>  	xfs_agblock_t		agflen;	/* native ag. freespace length */
> -	int			sblock_ok; /* block passes checks */
> +	int			sblock_ok = 1; /* block passes checks */
>  
> +	mp = cur->bc_mp;
>  	agbp = cur->bc_private.a.agbp;
>  	agf = XFS_BUF_TO_AGF(agbp);
>  	agflen = be32_to_cpu(agf->agf_length);
> -	sblock_ok =
> -		be32_to_cpu(block->bb_magic) == xfs_magics[cur->bc_btnum] &&
> +
> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		sblock_ok = sblock_ok &&
> +			uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid) &&
> +			block->bb_u.s.bb_blkno == cpu_to_be64(
> +				bp ? bp->b_bn : XFS_BUF_DADDR_NULL);
> +	}
> +
> +	sblock_ok = sblock_ok &&
> +		be32_to_cpu(block->bb_magic) == xfs_btree_magic(cur) &&
>  		be16_to_cpu(block->bb_level) == level &&
>  		be16_to_cpu(block->bb_numrecs) <=
>  			cur->bc_ops->get_maxrecs(cur, level) &&
> @@ -109,13 +133,13 @@ xfs_btree_check_sblock(
>  		(block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK) ||
>  		 be32_to_cpu(block->bb_u.s.bb_rightsib) < agflen) &&
>  		block->bb_u.s.bb_rightsib;
> -	if (unlikely(XFS_TEST_ERROR(!sblock_ok, cur->bc_mp,
> +
> +	if (unlikely(XFS_TEST_ERROR(!sblock_ok, mp,
>  			XFS_ERRTAG_BTREE_CHECK_SBLOCK,
>  			XFS_RANDOM_BTREE_CHECK_SBLOCK))) {
>  		if (bp)
>  			trace_xfs_btree_corrupt(bp, _RET_IP_);
> -		XFS_CORRUPTION_ERROR("xfs_btree_check_sblock",
> -			XFS_ERRLEVEL_LOW, cur->bc_mp, block);
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>  		return XFS_ERROR(EFSCORRUPTED);
>  	}
>  	return 0;
> @@ -194,6 +218,72 @@ xfs_btree_check_ptr(
>  #endif
>  
>  /*
> + * Calculate CRC on the whole btree block and stuff it into the
> + * long-form btree header.
> + *
> + * Prior to calculting the CRC, pull the LSN out of the buffer log item and put
	       calculating

> + * it into the buffer so recovery knows what the last modifcation was that made
> + * it to disk.
> + */
> +void
> +xfs_btree_lblock_calc_crc(
> +	struct xfs_buf		*bp)
> +{
> +	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
> +	struct xfs_buf_log_item	*bip = bp->b_fspriv;
> +
> +	if (!xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
> +		return;
> +	if (bip)
> +		block->bb_u.l.bb_lsn = cpu_to_be64(bip->bli_item.li_lsn);
> +	xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
> +			 XFS_BTREE_LBLOCK_CRC_OFF);
> +}
> +
> +bool
> +xfs_btree_lblock_verify_crc(
> +	struct xfs_buf		*bp)
> +{
> +	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
> +		return xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
> +					XFS_BTREE_LBLOCK_CRC_OFF);
> +	return true;
> +}
> +
> +/*
> + * Calculate CRC on the whole btree block and stuff it into the
> + * short-form btree header.
> + *
> + * Prior to calculting the CRC, pull the LSN out of the buffer log item and put
> + * it into the buffer so recovery knows what the last modifcation was that made
> + * it to disk.
> + */
> +void
> +xfs_btree_sblock_calc_crc(
> +	struct xfs_buf		*bp)
> +{
> +	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
> +	struct xfs_buf_log_item	*bip = bp->b_fspriv;
> +
> +	if (!xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
> +		return;
> +	if (bip)
> +		block->bb_u.s.bb_lsn = cpu_to_be64(bip->bli_item.li_lsn);
> +	xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
> +			 XFS_BTREE_SBLOCK_CRC_OFF);
> +}
> +
> +bool
> +xfs_btree_sblock_verify_crc(
> +	struct xfs_buf		*bp)
> +{
> +	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
> +		return xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
> +					XFS_BTREE_SBLOCK_CRC_OFF);
> +	return true;
> +}
> +
> +/*
>   * Delete the btree cursor.
>   */
>  void
> @@ -277,10 +367,8 @@ xfs_btree_dup_cursor(
>  				*ncur = NULL;
>  				return error;
>  			}
> -			new->bc_bufs[i] = bp;
> -			ASSERT(!xfs_buf_geterror(bp));
> -		} else
> -			new->bc_bufs[i] = NULL;
> +		}
> +		new->bc_bufs[i] = bp;

Why remove the assert?

>  	}
>  	*ncur = new;
>  	return 0;
> @@ -321,9 +409,14 @@ xfs_btree_dup_cursor(
>   */
>  static inline size_t xfs_btree_block_len(struct xfs_btree_cur *cur)
>  {
> -	return (cur->bc_flags & XFS_BTREE_LONG_PTRS) ?
> -		XFS_BTREE_LBLOCK_LEN :
> -		XFS_BTREE_SBLOCK_LEN;
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +		if (cur->bc_flags & XFS_BTREE_CRC_BLOCKS)
> +			return XFS_BTREE_LBLOCK_CRC_LEN;
> +		return XFS_BTREE_LBLOCK_LEN;
> +	}
> +	if (cur->bc_flags & XFS_BTREE_CRC_BLOCKS)
> +		return XFS_BTREE_SBLOCK_CRC_LEN;
> +	return XFS_BTREE_SBLOCK_LEN;
>  }
>  
>  /*
> @@ -863,43 +956,85 @@ xfs_btree_set_sibling(
>  }
>  
>  void
> +xfs_btree_init_block_int(
> +	struct xfs_mount	*mp,
> +	struct xfs_btree_block	*buf,
> +	xfs_daddr_t		blkno,
> +	__u32			magic,
> +	__u16			level,
> +	__u16			numrecs,
> +	__u64			owner,
> +	unsigned int		flags)
> +{
> +	buf->bb_magic = cpu_to_be32(magic);
> +	buf->bb_level = cpu_to_be16(level);
> +	buf->bb_numrecs = cpu_to_be16(numrecs);
> +
> +	if (flags & XFS_BTREE_LONG_PTRS) {
> +		buf->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
> +		buf->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
> +		if (flags & XFS_BTREE_CRC_BLOCKS) {
> +			buf->bb_u.l.bb_blkno = cpu_to_be64(blkno);
> +			buf->bb_u.l.bb_owner = cpu_to_be64(owner);
> +			uuid_copy(&buf->bb_u.l.bb_uuid, &mp->m_sb.sb_uuid);
> +			buf->bb_u.l.bb_pad = 0;
> +		}
> +	} else {
> +		/* owner is a 32 bit value on short blocks */
> +		__u32 __owner = (__u32)owner;
> +
> +		buf->bb_u.s.bb_leftsib = cpu_to_be32(NULLAGBLOCK);
> +		buf->bb_u.s.bb_rightsib = cpu_to_be32(NULLAGBLOCK);
> +		if (flags & XFS_BTREE_CRC_BLOCKS) {
> +			buf->bb_u.s.bb_blkno = cpu_to_be64(blkno);
> +			buf->bb_u.s.bb_owner = cpu_to_be32(__owner);
> +			uuid_copy(&buf->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid);
> +		}
> +	}
> +}
> +

The xfs_btree_init_block_int might be good in it's own patch.

> +void
>  xfs_btree_init_block(
>  	struct xfs_mount *mp,
>  	struct xfs_buf	*bp,
>  	__u32		magic,
>  	__u16		level,
>  	__u16		numrecs,
> +	__u64		owner,
>  	unsigned int	flags)
>  {
> -	struct xfs_btree_block	*new = XFS_BUF_TO_BLOCK(bp);
> -
> -	new->bb_magic = cpu_to_be32(magic);
> -	new->bb_level = cpu_to_be16(level);
> -	new->bb_numrecs = cpu_to_be16(numrecs);
> -
> -	if (flags & XFS_BTREE_LONG_PTRS) {
> -		new->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
> -		new->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
> -	} else {
> -		new->bb_u.s.bb_leftsib = cpu_to_be32(NULLAGBLOCK);
> -		new->bb_u.s.bb_rightsib = cpu_to_be32(NULLAGBLOCK);
> -	}
> +	xfs_btree_init_block_int(mp, XFS_BUF_TO_BLOCK(bp), bp->b_bn,
> +				 magic, level, numrecs, owner, flags);
>  }
>  
>  STATIC void
>  xfs_btree_init_block_cur(
>  	struct xfs_btree_cur	*cur,
> +	struct xfs_buf		*bp,
>  	int			level,
> -	int			numrecs,
> -	struct xfs_buf		*bp)
> +	int			numrecs)

Why rearrange the order of the args here?

>  {
> -	xfs_btree_init_block(cur->bc_mp, bp, xfs_magics[cur->bc_btnum],
> -			       level, numrecs, cur->bc_flags);
> +	__u64 owner;
> +
> +	/*
> +	 * we can pull the owner from the cursor right now as the different
> +	 * owners align directly with the pointer size of the btree. This may
> +	 * change in future, but is safe for current users of the generic btree
> +	 * code.
> +	 */
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> +		owner = cur->bc_private.b.ip->i_ino;
> +	else
> +		owner = cur->bc_private.a.agno;

This is something I can look into a bit later... but I'm a little flummoxed by
this one.  Apparently only inodes use long pointers?

> +
> +	xfs_btree_init_block_int(cur->bc_mp, XFS_BUF_TO_BLOCK(bp), bp->b_bn,
> +				 xfs_btree_magic(cur), level, numrecs,
> +				 owner, cur->bc_flags);
>  }
>  
>  /*
>   * Return true if ptr is the last record in the btree and
> - * we need to track updateѕ to this record.  The decision
> + * we need to track updates to this record.  The decision

I can't find a change in that line.  Time for new glasses?

>   * will be further refined in the update_lastrec method.
>   */
>  STATIC int
> @@ -1147,6 +1282,7 @@ xfs_btree_log_keys(
>  	XFS_BTREE_TRACE_ARGBII(cur, bp, first, last);
>  
>  	if (bp) {
> +		xfs_trans_buf_set_type(cur->bc_tp, bp, XFS_BLF_BTREE_BUF);
>  		xfs_trans_log_buf(cur->bc_tp, bp,
>  				  xfs_btree_key_offset(cur, first),
>  				  xfs_btree_key_offset(cur, last + 1) - 1);
> @@ -1171,6 +1307,7 @@ xfs_btree_log_recs(
>  	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
>  	XFS_BTREE_TRACE_ARGBII(cur, bp, first, last);
>  
> +	xfs_trans_buf_set_type(cur->bc_tp, bp, XFS_BLF_BTREE_BUF);
>  	xfs_trans_log_buf(cur->bc_tp, bp,
>  			  xfs_btree_rec_offset(cur, first),
>  			  xfs_btree_rec_offset(cur, last + 1) - 1);
> @@ -1195,6 +1332,7 @@ xfs_btree_log_ptrs(
>  		struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
>  		int			level = xfs_btree_get_level(block);
>  
> +		xfs_trans_buf_set_type(cur->bc_tp, bp, XFS_BLF_BTREE_BUF);
>  		xfs_trans_log_buf(cur->bc_tp, bp,
>  				xfs_btree_ptr_offset(cur, first, level),
>  				xfs_btree_ptr_offset(cur, last + 1, level) - 1);
> @@ -1223,7 +1361,12 @@ xfs_btree_log_block(
>  		offsetof(struct xfs_btree_block, bb_numrecs),
>  		offsetof(struct xfs_btree_block, bb_u.s.bb_leftsib),
>  		offsetof(struct xfs_btree_block, bb_u.s.bb_rightsib),
> -		XFS_BTREE_SBLOCK_LEN
> +		offsetof(struct xfs_btree_block, bb_u.s.bb_blkno),
> +		offsetof(struct xfs_btree_block, bb_u.s.bb_lsn),
> +		offsetof(struct xfs_btree_block, bb_u.s.bb_uuid),
> +		offsetof(struct xfs_btree_block, bb_u.s.bb_owner),
> +		offsetof(struct xfs_btree_block, bb_u.s.bb_crc),
> +		XFS_BTREE_SBLOCK_CRC_LEN
>  	};
>  	static const short	loffsets[] = {	/* table of offsets (long) */
>  		offsetof(struct xfs_btree_block, bb_magic),
> @@ -1231,17 +1374,40 @@ xfs_btree_log_block(
>  		offsetof(struct xfs_btree_block, bb_numrecs),
>  		offsetof(struct xfs_btree_block, bb_u.l.bb_leftsib),
>  		offsetof(struct xfs_btree_block, bb_u.l.bb_rightsib),
> -		XFS_BTREE_LBLOCK_LEN
> +		offsetof(struct xfs_btree_block, bb_u.l.bb_blkno),
> +		offsetof(struct xfs_btree_block, bb_u.l.bb_lsn),
> +		offsetof(struct xfs_btree_block, bb_u.l.bb_uuid),
> +		offsetof(struct xfs_btree_block, bb_u.l.bb_owner),
> +		offsetof(struct xfs_btree_block, bb_u.l.bb_crc),
> +		offsetof(struct xfs_btree_block, bb_u.l.bb_pad),
> +		XFS_BTREE_LBLOCK_CRC_LEN
>  	};
>  
>  	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
>  	XFS_BTREE_TRACE_ARGBI(cur, bp, fields);
>  
>  	if (bp) {
> +		int nbits;
> +
> +		if (cur->bc_flags & XFS_BTREE_CRC_BLOCKS) {
> +			/*
> +			 * We don't log the CRC when updating a btree
> +			 * block but instead recreate it during log
> +			 * recovery.  As the log buffers have checksums
> +			 * of their this is safe and avoids logging a crc
				   own

> +			 * update in a lot of places.
> +			 */
> +			if (fields == XFS_BB_ALL_BITS)
> +				fields = XFS_BB_ALL_BITS_CRC;
> +			nbits = XFS_BB_NUM_BITS_CRC;
> +		} else {
> +			nbits = XFS_BB_NUM_BITS;
> +		}
>  		xfs_btree_offsets(fields,
>  				  (cur->bc_flags & XFS_BTREE_LONG_PTRS) ?
>  					loffsets : soffsets,
> -				  XFS_BB_NUM_BITS, &first, &last);
> +				  nbits, &first, &last);
> +		xfs_trans_buf_set_type(cur->bc_tp, bp, XFS_BLF_BTREE_BUF);
>  		xfs_trans_log_buf(cur->bc_tp, bp, first, last);
>  	} else {
>  		xfs_trans_log_inode(cur->bc_tp, cur->bc_private.b.ip,
> @@ -2204,7 +2370,7 @@ xfs_btree_split(
>  		goto error0;
>  
>  	/* Fill in the btree header for the new right block. */
> -	xfs_btree_init_block_cur(cur, xfs_btree_get_level(left), 0, rbp);
> +	xfs_btree_init_block_cur(cur, rbp, xfs_btree_get_level(left), 0);
>  
>  	/*
>  	 * Split the entries between the old and the new block evenly.
> @@ -2513,7 +2679,7 @@ xfs_btree_new_root(
>  		nptr = 2;
>  	}
>  	/* Fill in the new block's btree header and log it. */
> -	xfs_btree_init_block_cur(cur, cur->bc_nlevels, 2, nbp);
> +	xfs_btree_init_block_cur(cur, nbp, cur->bc_nlevels, 2);
>  	xfs_btree_log_block(cur, nbp, XFS_BB_ALL_BITS);
>  	ASSERT(!xfs_btree_ptr_is_null(cur, &lptr) &&
>  			!xfs_btree_ptr_is_null(cur, &rptr));
> diff --git a/fs/xfs/xfs_btree.h b/fs/xfs/xfs_btree.h
> index f932897..6e6c915 100644
> --- a/fs/xfs/xfs_btree.h
> +++ b/fs/xfs/xfs_btree.h
> @@ -42,11 +42,15 @@ extern kmem_zone_t	*xfs_btree_cur_zone;
>   * Generic btree header.
>   *
>   * This is a combination of the actual format used on disk for short and long
> - * format btrees.  The first three fields are shared by both format, but
> - * the pointers are different and should be used with care.
> + * format btrees.  The first three fields are shared by both format, but the
> + * pointers are different and should be used with care.
>   *
> - * To get the size of the actual short or long form headers please use
> - * the size macros below.  Never use sizeof(xfs_btree_block).
> + * To get the size of the actual short or long form headers please use the size
> + * macros below.  Never use sizeof(xfs_btree_block).
> + *
> + * The blkno, crc, lsn, owner and uuid fields are only available in filesystems
> + * with the crc feature bit, and all accesses to them must be conditional on
> + * that flag.
>   */
>  struct xfs_btree_block {
>  	__be32		bb_magic;	/* magic number for block type */
> @@ -56,10 +60,23 @@ struct xfs_btree_block {
>  		struct {
>  			__be32		bb_leftsib;
>  			__be32		bb_rightsib;
> +
> +			__be64		bb_blkno;
> +			__be64		bb_lsn;
> +			uuid_t		bb_uuid;
> +			__be32		bb_owner;
> +			__le32		bb_crc;
>  		} s;			/* short form pointers */
>  		struct	{
>  			__be64		bb_leftsib;
>  			__be64		bb_rightsib;
> +
> +			__be64		bb_blkno;
> +			__be64		bb_lsn;
> +			uuid_t		bb_uuid;
> +			__be64		bb_owner;
> +			__le32		bb_crc;
> +			__be32		bb_pad; /* padding for alignment */
>  		} l;			/* long form pointers */
>  	} bb_u;				/* rest */
>  };

struct xfs_btree_block {
        __be32          bb_magic;       /* magic number for block type */
        __be16          bb_level;       /* 0 is a leaf */
        __be16          bb_numrecs;     /* current # of data records */
        union {
                struct {
                        __be32          bb_leftsib;
                        __be32          bb_rightsib;
		} s;			/* short form pointers */
                struct {
                        __be32          bb_leftsib;
                        __be32          bb_rightsib;
                        __be64          bb_blkno;
                        __be64          bb_lsn;
                        uuid_t          bb_uuid;
                        __be32          bb_owner;
                        __le32          bb_crc;
                } s_crc;		/* short form pointers with crcs */
                struct  {
                        __be64          bb_leftsib;
                        __be64          bb_rightsib;
		} l;			/* long form pointers */
                struct  {
                        __be64          bb_leftsib;
                        __be64          bb_rightsib;
                        __be64          bb_blkno;
                        __be64          bb_lsn;
                        uuid_t          bb_uuid;
                        __be64          bb_owner;
                        __le32          bb_crc;
                        __be32          bb_pad; /* padding for alignment */
                } l_crc;		/* long form pointers with crcs */
        } bb_u;                         /* rest */
};

> @@ -67,6 +84,16 @@ struct xfs_btree_block {
>  #define XFS_BTREE_SBLOCK_LEN	16	/* size of a short form block */
>  #define XFS_BTREE_LBLOCK_LEN	24	/* size of a long form block */
>  
> +/* sizes of CRC enabled btree blocks */
> +#define XFS_BTREE_SBLOCK_CRC_LEN	(XFS_BTREE_SBLOCK_LEN + 40)
> +#define XFS_BTREE_LBLOCK_CRC_LEN	(XFS_BTREE_LBLOCK_LEN + 48)
> +
> +
> +#define XFS_BTREE_SBLOCK_CRC_OFF \
> +	offsetof(struct xfs_btree_block, bb_u.s.bb_crc)
> +#define XFS_BTREE_LBLOCK_CRC_OFF \
> +	offsetof(struct xfs_btree_block, bb_u.l.bb_crc)
> +
>  
>  /*
>   * Generic key, ptr and record wrapper structures.
> @@ -101,13 +128,11 @@ union xfs_btree_rec {
>  #define	XFS_BB_NUMRECS		0x04
>  #define	XFS_BB_LEFTSIB		0x08
>  #define	XFS_BB_RIGHTSIB		0x10
> +#define	XFS_BB_BLKNO		0x20
>  #define	XFS_BB_NUM_BITS		5
>  #define	XFS_BB_ALL_BITS		((1 << XFS_BB_NUM_BITS) - 1)

Did XFS_BB_BLKNO sneak in from a subsequent patch?

> -
> -/*
> - * Magic numbers for btree blocks.
> - */
> -extern const __uint32_t	xfs_magics[];
> +#define	XFS_BB_NUM_BITS_CRC	8
> +#define	XFS_BB_ALL_BITS_CRC	((1 << XFS_BB_NUM_BITS_CRC) - 1)

I'm a little confused here.  We were using 5 bits, you added BB_BLKNO for 6,
and now you're setting NUM_BITS_CRC to 8?

Probably more readable with the (1<<0) idiom, as below.
 
>  /*
>   * Generic stats interface
> @@ -256,6 +281,7 @@ typedef struct xfs_btree_cur
>  #define XFS_BTREE_LONG_PTRS		(1<<0)	/* pointers are 64bits long */
>  #define XFS_BTREE_ROOT_IN_INODE		(1<<1)	/* root may be variable size */
>  #define XFS_BTREE_LASTREC_UPDATE	(1<<2)	/* track last rec externally */
> +#define XFS_BTREE_CRC_BLOCKS		(1<<3)	/* uses extended btree blocks */
>  
>  
>  #define	XFS_BTREE_NOERROR	0
> @@ -393,8 +419,20 @@ xfs_btree_init_block(
>  	__u32		magic,
>  	__u16		level,
>  	__u16		numrecs,
> +	__u64		owner,
>  	unsigned int	flags);
>  
> +void
> +xfs_btree_init_block_int(
> +	struct xfs_mount	*mp,
> +	struct xfs_btree_block	*buf,
> +	xfs_daddr_t		blkno,
> +	__u32			magic,
> +	__u16			level,
> +	__u16			numrecs,
> +	__u64			owner,
> +	unsigned int		flags);
> +
>  /*
>   * Common btree core entry points.
>   */
> @@ -408,6 +446,14 @@ int xfs_btree_delete(struct xfs_btree_cur *, int *);
>  int xfs_btree_get_rec(struct xfs_btree_cur *, union xfs_btree_rec **, int *);
>  
>  /*
> + * btree block CRC helpers
> + */
> +void xfs_btree_lblock_calc_crc(struct xfs_buf *);
> +bool xfs_btree_lblock_verify_crc(struct xfs_buf *);
> +void xfs_btree_sblock_calc_crc(struct xfs_buf *);
> +bool xfs_btree_sblock_verify_crc(struct xfs_buf *);
> +
> +/*
>   * Internal btree helpers also used by xfs_bmap.c.
>   */
>  void xfs_btree_log_block(struct xfs_btree_cur *, struct xfs_buf *, int);
> diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> index ee36c88..101ef83 100644
> --- a/fs/xfs/xfs_buf_item.h
> +++ b/fs/xfs/xfs_buf_item.h
> @@ -24,19 +24,33 @@ extern kmem_zone_t	*xfs_buf_item_zone;
>   * This flag indicates that the buffer contains on disk inodes
>   * and requires special recovery handling.
>   */
> -#define	XFS_BLF_INODE_BUF	0x1
> +#define	XFS_BLF_INODE_BUF	(1<<0)
>  /*
>   * This flag indicates that the buffer should not be replayed
>   * during recovery because its blocks are being freed.
>   */
> -#define	XFS_BLF_CANCEL		0x2
> +#define	XFS_BLF_CANCEL		(1<<1)
> +
>  /*
>   * This flag indicates that the buffer contains on disk
>   * user or group dquots and may require special recovery handling.
>   */
> -#define	XFS_BLF_UDQUOT_BUF	0x4
> -#define XFS_BLF_PDQUOT_BUF	0x8
> -#define	XFS_BLF_GDQUOT_BUF	0x10
> +#define	XFS_BLF_UDQUOT_BUF	(1<<2)
> +#define XFS_BLF_PDQUOT_BUF	(1<<3)
> +#define	XFS_BLF_GDQUOT_BUF	(1<<4)
> +
> +/*
> + * all buffers now need flags to tell recovery where the magic number
> + * is so that it can verify and calculate the CRCs on the buffer correctly
> + * once the changes have been replayed into the buffer.
> + */
> +#define XFS_BLF_BTREE_BUF	(1<<5)
> +
> +#define XFS_BLF_TYPE_MASK	\
> +		(XFS_BLF_UDQUOT_BUF | \
> +		 XFS_BLF_PDQUOT_BUF | \
> +		 XFS_BLF_GDQUOT_BUF | \
> +		 XFS_BLF_BTREE_BUF)
>  
>  #define	XFS_BLF_CHUNK		128
>  #define	XFS_BLF_SHIFT		7
> diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> index 88a3368..6b5bd17 100644
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -107,8 +107,8 @@ typedef enum xfs_dinode_fmt {
>  #define XFS_LITINO(mp, version) \
>  	((int)(((mp)->m_sb.sb_inodesize) - sizeof(struct xfs_dinode)))
>  
> -#define	XFS_BROOT_SIZE_ADJ	\
> -	(XFS_BTREE_LBLOCK_LEN - sizeof(xfs_bmdr_block_t))
> +#define XFS_BROOT_SIZE_ADJ(ip) \
> +	(XFS_BMBT_BLOCK_LEN((ip)->i_mount) - sizeof(xfs_bmdr_block_t))

A good candidate for a separate patch.

>  
>  /*
>   * Inode data & attribute fork sizes, per inode.
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 94eaeed..50c43ec 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -316,7 +316,13 @@ xfs_growfs_data_private(
>  			goto error0;
>  		}
>  
> -		xfs_btree_init_block(mp, bp, XFS_ABTB_MAGIC, 0, 1, 0);
> +		if (xfs_sb_version_hascrc(&mp->m_sb))
> +			xfs_btree_init_block(mp, bp, XFS_ABTB_CRC_MAGIC, 0, 1,
> +						agno, XFS_BTREE_CRC_BLOCKS);
> +		else
> +			xfs_btree_init_block(mp, bp, XFS_ABTB_MAGIC, 0, 1,
> +						agno, 0);
> +
>  		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
>  		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
>  		arec->ar_blockcount = cpu_to_be32(
> @@ -339,7 +345,13 @@ xfs_growfs_data_private(
>  			goto error0;
>  		}
>  
> -		xfs_btree_init_block(mp, bp, XFS_ABTC_MAGIC, 0, 1, 0);
> +		if (xfs_sb_version_hascrc(&mp->m_sb))
> +			xfs_btree_init_block(mp, bp, XFS_ABTC_CRC_MAGIC, 0, 1,
> +						agno, XFS_BTREE_CRC_BLOCKS);
> +		else
> +			xfs_btree_init_block(mp, bp, XFS_ABTC_MAGIC, 0, 1,
> +						agno, 0);
> +
>  		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
>  		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
>  		arec->ar_blockcount = cpu_to_be32(
> @@ -363,7 +375,12 @@ xfs_growfs_data_private(
>  			goto error0;
>  		}
>  
> -		xfs_btree_init_block(mp, bp, XFS_IBT_MAGIC, 0, 0, 0);
> +		if (xfs_sb_version_hascrc(&mp->m_sb))
> +			xfs_btree_init_block(mp, bp, XFS_IBT_CRC_MAGIC, 0, 0,
> +						agno, XFS_BTREE_CRC_BLOCKS);
> +		else
> +			xfs_btree_init_block(mp, bp, XFS_IBT_MAGIC, 0, 0,
> +						agno, 0);
>  
>  		error = xfs_bwrite(bp);
>  		xfs_buf_relse(bp);
> diff --git a/fs/xfs/xfs_ialloc_btree.c b/fs/xfs/xfs_ialloc_btree.c
> index bec344b..c82ac88 100644
> --- a/fs/xfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/xfs_ialloc_btree.c
> @@ -34,6 +34,7 @@
>  #include "xfs_alloc.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> +#include "xfs_cksum.h"
>  
>  
>  STATIC int
> @@ -182,52 +183,88 @@ xfs_inobt_key_diff(
>  			  cur->bc_rec.i.ir_startino;
>  }
>  
> -void
> +static int
>  xfs_inobt_verify(
>  	struct xfs_buf		*bp)
>  {
>  	struct xfs_mount	*mp = bp->b_target->bt_mount;
>  	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
> +	struct xfs_perag	*pag = bp->b_pag;
>  	unsigned int		level;
> -	int			sblock_ok; /* block passes checks */
>  
> -	/* magic number and level verification */
> -	level = be16_to_cpu(block->bb_level);
> -	sblock_ok = block->bb_magic == cpu_to_be32(XFS_IBT_MAGIC) &&
> -		    level < mp->m_in_maxlevels;
> +	/*
> +	 * During growfs operations, we can't verify the exact owner as the
> +	 * perag is not fully initialised and hence not attached to the buffer.
> +	 *
> +	 * Similarly, during log recovery we will have a perag structure
> +	 * attached, but the agi information will not yet have been initialised
> +	 * from the on disk AGI. We don't currently use any of this information,
> +	 * but beware of the landmine (i.e. need to check pag->pagi_init) if we
> +	 * ever do.
> +	 */
> +	switch (block->bb_magic) {
> +	case cpu_to_be32(XFS_IBT_CRC_MAGIC):
> +		if (!xfs_sb_version_hascrc(&mp->m_sb))
> +			return false;
> +		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
> +			return false;
> +		if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
> +			return false;
> +		if (pag &&
> +		    be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
> +			return false;
> +		/* fall through */
> +	case cpu_to_be32(XFS_IBT_MAGIC):
> +		break;
> +	default:
> +		return 0;
> +	}
>  
> -	/* numrecs verification */
> -	sblock_ok = sblock_ok &&
> -		be16_to_cpu(block->bb_numrecs) <= mp->m_inobt_mxr[level != 0];
> +	/* numrecs and level verification */
> +	level = be16_to_cpu(block->bb_level);
> +	if (level >= mp->m_in_maxlevels)
> +		return false;
> +	if (be16_to_cpu(block->bb_numrecs) > mp->m_inobt_mxr[level != 0])
> +		return false;
>  
>  	/* sibling pointer verification */
> -	sblock_ok = sblock_ok &&
> -		(block->bb_u.s.bb_leftsib == cpu_to_be32(NULLAGBLOCK) ||
> -		 be32_to_cpu(block->bb_u.s.bb_leftsib) < mp->m_sb.sb_agblocks) &&
> -		block->bb_u.s.bb_leftsib &&
> -		(block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK) ||
> -		 be32_to_cpu(block->bb_u.s.bb_rightsib) < mp->m_sb.sb_agblocks) &&
> -		block->bb_u.s.bb_rightsib;
> -
> -	if (!sblock_ok) {
> -		trace_xfs_btree_corrupt(bp, _RET_IP_);
> -		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, block);
> -		xfs_buf_ioerror(bp, EFSCORRUPTED);
> -	}
> +	if (!block->bb_u.s.bb_leftsib ||
> +	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
> +	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
> +		return false;
> +	if (!block->bb_u.s.bb_rightsib ||
> +	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
> +	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
> +		return false;
> +
> +	return true;
>  }
>  
>  static void
>  xfs_inobt_read_verify(
>  	struct xfs_buf	*bp)
>  {
> -	xfs_inobt_verify(bp);
> +	if (!(xfs_btree_sblock_verify_crc(bp) &&
> +	      xfs_inobt_verify(bp))) {
> +		trace_xfs_btree_corrupt(bp, _RET_IP_);
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> +				     bp->b_target->bt_mount, bp->b_addr);
> +		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +	}
>  }
>  
>  static void
>  xfs_inobt_write_verify(
>  	struct xfs_buf	*bp)
>  {
> -	xfs_inobt_verify(bp);
> +	if (!xfs_inobt_verify(bp)) {
> +		trace_xfs_btree_corrupt(bp, _RET_IP_);
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> +				     bp->b_target->bt_mount, bp->b_addr);
> +		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +	}
> +	xfs_btree_sblock_calc_crc(bp);
> +
>  }
>  
>  const struct xfs_buf_ops xfs_inobt_buf_ops = {
> @@ -301,6 +338,8 @@ xfs_inobt_init_cursor(
>  	cur->bc_blocklog = mp->m_sb.sb_blocklog;
>  
>  	cur->bc_ops = &xfs_inobt_ops;
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
>  
>  	cur->bc_private.a.agbp = agbp;
>  	cur->bc_private.a.agno = agno;
> diff --git a/fs/xfs/xfs_ialloc_btree.h b/fs/xfs/xfs_ialloc_btree.h
> index 25c0239..78dfd1e 100644
> --- a/fs/xfs/xfs_ialloc_btree.h
> +++ b/fs/xfs/xfs_ialloc_btree.h
> @@ -29,7 +29,8 @@ struct xfs_mount;
>  /*
>   * There is a btree for the inode map per allocation group.
>   */
> -#define	XFS_IBT_MAGIC	0x49414254	/* 'IABT' */
> +#define	XFS_IBT_MAGIC		0x49414254	/* 'IABT' */
> +#define	XFS_IBT_CRC_MAGIC	0x4941425c

/* 'IAB?' */

>  
>  typedef	__uint64_t	xfs_inofree_t;
>  #define	XFS_INODES_PER_CHUNK		(NBBY * sizeof(xfs_inofree_t))
> @@ -76,10 +77,10 @@ typedef __be32 xfs_inobt_ptr_t;
>  
>  /*
>   * Btree block header size depends on a superblock flag.
> - *
> - * (not quite yet, but soon)
>   */
> -#define XFS_INOBT_BLOCK_LEN(mp)	XFS_BTREE_SBLOCK_LEN
> +#define XFS_INOBT_BLOCK_LEN(mp) \
> +	(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> +		XFS_BTREE_SBLOCK_CRC_LEN : XFS_BTREE_SBLOCK_LEN)
>  
>  /*
>   * Record, key, and pointer address macros for btree blocks.
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 4f20165..202ce37 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -786,6 +786,7 @@ xfs_iformat_btree(
>  	xfs_dinode_t		*dip,
>  	int			whichfork)
>  {
> +	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_bmdr_block_t	*dfp;
>  	xfs_ifork_t		*ifp;
>  	/* REFERENCED */
> @@ -794,7 +795,7 @@ xfs_iformat_btree(
>  
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
>  	dfp = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork);
> -	size = XFS_BMAP_BROOT_SPACE(dfp);
> +	size = XFS_BMAP_BROOT_SPACE(mp, dfp);
>  	nrecs = be16_to_cpu(dfp->bb_numrecs);
>  
>  	/*
> @@ -805,14 +806,14 @@ xfs_iformat_btree(
>  	 * blocks.
>  	 */
>  	if (unlikely(XFS_IFORK_NEXTENTS(ip, whichfork) <=
> -			XFS_IFORK_MAXEXT(ip, whichfork) ||
> +					XFS_IFORK_MAXEXT(ip, whichfork) ||
>  		     XFS_BMDR_SPACE_CALC(nrecs) >
> -			XFS_DFORK_SIZE(dip, ip->i_mount, whichfork) ||
> +					XFS_DFORK_SIZE(dip, mp, whichfork) ||
>  		     XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
> -		xfs_warn(ip->i_mount, "corrupt inode %Lu (btree).",
> -			(unsigned long long) ip->i_ino);
> +		xfs_warn(mp, "corrupt inode %Lu (btree).",
> +					(unsigned long long) ip->i_ino);
>  		XFS_CORRUPTION_ERROR("xfs_iformat_btree", XFS_ERRLEVEL_LOW,
> -				 ip->i_mount, dip);
> +					 mp, dip);
>  		return XFS_ERROR(EFSCORRUPTED);
>  	}
>  
> @@ -823,8 +824,7 @@ xfs_iformat_btree(
>  	 * Copy and convert from the on-disk structure
>  	 * to the in-memory structure.
>  	 */
> -	xfs_bmdr_to_bmbt(ip->i_mount, dfp,
> -			 XFS_DFORK_SIZE(dip, ip->i_mount, whichfork),
> +	xfs_bmdr_to_bmbt(ip, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, whichfork),
>  			 ifp->if_broot, size);

changing xfs_bmdr_to_bmbt to take an inode pointer could be it's own patch.

>  	ifp->if_flags &= ~XFS_IFEXTENTS;
>  	ifp->if_flags |= XFS_IFBROOT;
> @@ -2037,7 +2037,7 @@ xfs_iroot_realloc(
>  		 * allocate it now and get out.
>  		 */
>  		if (ifp->if_broot_bytes == 0) {
> -			new_size = (size_t)XFS_BMAP_BROOT_SPACE_CALC(rec_diff);
> +			new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
>  			ifp->if_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
>  			ifp->if_broot_bytes = (int)new_size;
>  			return;
> @@ -2051,9 +2051,9 @@ xfs_iroot_realloc(
>  		 */
>  		cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
>  		new_max = cur_max + rec_diff;
> -		new_size = (size_t)XFS_BMAP_BROOT_SPACE_CALC(new_max);
> +		new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
>  		ifp->if_broot = kmem_realloc(ifp->if_broot, new_size,
> -				(size_t)XFS_BMAP_BROOT_SPACE_CALC(cur_max), /* old size */
> +				XFS_BMAP_BROOT_SPACE_CALC(mp, cur_max),
>  				KM_SLEEP | KM_NOFS);
>  		op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
>  						     ifp->if_broot_bytes);
> @@ -2061,7 +2061,7 @@ xfs_iroot_realloc(
>  						     (int)new_size);
>  		ifp->if_broot_bytes = (int)new_size;
>  		ASSERT(ifp->if_broot_bytes <=
> -			XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ);
> +			XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
>  		memmove(np, op, cur_max * (uint)sizeof(xfs_dfsbno_t));
>  		return;
>  	}
> @@ -2076,7 +2076,7 @@ xfs_iroot_realloc(
>  	new_max = cur_max + rec_diff;
>  	ASSERT(new_max >= 0);
>  	if (new_max > 0)
> -		new_size = (size_t)XFS_BMAP_BROOT_SPACE_CALC(new_max);
> +		new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
>  	else
>  		new_size = 0;
>  	if (new_size > 0) {
> @@ -2084,7 +2084,8 @@ xfs_iroot_realloc(
>  		/*
>  		 * First copy over the btree block header.
>  		 */
> -		memcpy(new_broot, ifp->if_broot, XFS_BTREE_LBLOCK_LEN);
> +		memcpy(new_broot, ifp->if_broot,
> +			XFS_BMBT_BLOCK_LEN(ip->i_mount));
>  	} else {
>  		new_broot = NULL;
>  		ifp->if_flags &= ~XFS_IFBROOT;
> @@ -2114,7 +2115,7 @@ xfs_iroot_realloc(
>  	ifp->if_broot = new_broot;
>  	ifp->if_broot_bytes = (int)new_size;
>  	ASSERT(ifp->if_broot_bytes <=
> -		XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ);
> +		XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
>  	return;
>  }
>  
> @@ -2427,7 +2428,7 @@ xfs_iflush_fork(
>  			ASSERT(ifp->if_broot != NULL);
>  			ASSERT(ifp->if_broot_bytes <=
>  			       (XFS_IFORK_SIZE(ip, whichfork) +
> -				XFS_BROOT_SIZE_ADJ));
> +				XFS_BROOT_SIZE_ADJ(ip)));
>  			xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes,
>  				(xfs_bmdr_block_t *)cp,
>  				XFS_DFORK_SIZE(dip, mp, whichfork));
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 96fcbb8..c57a987 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -29,6 +29,7 @@
>  #include "xfs_bmap_btree.h"
>  #include "xfs_alloc_btree.h"
>  #include "xfs_ialloc_btree.h"
> +#include "xfs_btree.h"
>  #include "xfs_dinode.h"
>  #include "xfs_inode.h"
>  #include "xfs_inode_item.h"
> @@ -1929,6 +1930,33 @@ xlog_recover_do_reg_buffer(
>  
>  	/* Shouldn't be any more regions */
>  	ASSERT(i == item->ri_total);
> +
> +	switch (buf_f->blf_flags & XFS_BLF_TYPE_MASK) {
> +	case XFS_BLF_BTREE_BUF:
> +		switch (be32_to_cpu(*(__be32 *)bp->b_addr)) {
> +		case XFS_ABTB_CRC_MAGIC:
> +		case XFS_ABTC_CRC_MAGIC:
> +		case XFS_ABTB_MAGIC:
> +		case XFS_ABTC_MAGIC:
> +			bp->b_ops = &xfs_allocbt_buf_ops;
> +			break;
> +		case XFS_IBT_CRC_MAGIC:
> +		case XFS_IBT_MAGIC:
> +			bp->b_ops = &xfs_inobt_buf_ops;
> +			break;
> +		case XFS_BMAP_CRC_MAGIC:
> +		case XFS_BMAP_MAGIC:
> +			bp->b_ops = &xfs_bmbt_buf_ops;
> +			break;
> +		default:
> +			xfs_warn(mp, "Bad btree block magic!");
> +			ASSERT(0);
> +			break;
> +		}
> +		break;
> +	default:
> +		break;
> +	}
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index c6c0601..932de22 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -503,6 +503,8 @@ void		xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
>  void		xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
>  void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
>  void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
> +void		xfs_trans_buf_set_type(struct xfs_trans *, struct xfs_buf *,
> +				       uint);
>  void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
>  void		xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
>  void		xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint);
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 3edf5db..f950edd 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -659,6 +659,7 @@ xfs_trans_binval(
>  		ASSERT(XFS_BUF_ISSTALE(bp));
>  		ASSERT(!(bip->bli_flags & (XFS_BLI_LOGGED | XFS_BLI_DIRTY)));
>  		ASSERT(!(bip->__bli_format.blf_flags & XFS_BLF_INODE_BUF));
> +		ASSERT(!(bip->__bli_format.blf_flags & XFS_BLF_TYPE_MASK));
>  		ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
>  		ASSERT(bip->bli_item.li_desc->lid_flags & XFS_LID_DIRTY);
>  		ASSERT(tp->t_flags & XFS_TRANS_DIRTY);
> @@ -671,6 +672,7 @@ xfs_trans_binval(
>  	bip->bli_flags &= ~(XFS_BLI_INODE_BUF | XFS_BLI_LOGGED | XFS_BLI_DIRTY);
>  	bip->__bli_format.blf_flags &= ~XFS_BLF_INODE_BUF;
>  	bip->__bli_format.blf_flags |= XFS_BLF_CANCEL;
> +	bip->__bli_format.blf_flags &= ~XFS_BLF_TYPE_MASK;
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		memset(bip->bli_formats[i].blf_data_map, 0,
>  		       (bip->bli_formats[i].blf_map_size * sizeof(uint)));
> @@ -751,6 +753,26 @@ xfs_trans_inode_alloc_buf(
>  	bip->bli_flags |= XFS_BLI_INODE_ALLOC_BUF;
>  }
>  
> +/*
> + * Set the type of the buffer for log recovery so that it can correctly identify
> + * and hence attach the correct buffer ops to the buffer after replay.
> + */
> +void
> +xfs_trans_buf_set_type(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*bp,
> +	uint			type)
> +{
> +	struct xfs_buf_log_item	*bip = bp->b_fspriv;
> +
> +	ASSERT(bp->b_transp == tp);
> +	ASSERT(bip != NULL);
> +	ASSERT(atomic_read(&bip->bli_refcount) > 0);
> +	ASSERT((type & XFS_BLF_TYPE_MASK) != 0);
> +
> +	bip->__bli_format.blf_flags &= ~XFS_BLF_TYPE_MASK;
> +	bip->__bli_format.blf_flags |= type;
> +}
>  
>  /*
>   * Similar to xfs_trans_inode_buf(), this marks the buffer as a cluster of
> @@ -769,14 +791,9 @@ xfs_trans_dquot_buf(
>  	xfs_buf_t	*bp,
>  	uint		type)
>  {
> -	xfs_buf_log_item_t	*bip = bp->b_fspriv;
> -
> -	ASSERT(bp->b_transp == tp);
> -	ASSERT(bip != NULL);
>  	ASSERT(type == XFS_BLF_UDQUOT_BUF ||
>  	       type == XFS_BLF_PDQUOT_BUF ||
>  	       type == XFS_BLF_GDQUOT_BUF);
> -	ASSERT(atomic_read(&bip->bli_refcount) > 0);
>  
> -	bip->__bli_format.blf_flags |= type;
> +	xfs_trans_buf_set_type(tp, bp, type);
>  }
> -- 
> 1.7.10
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/9] xfs: add CRC checks to the AGF
       [not found] ` <1358774760-21841-4-git-send-email-david@fromorbit.com>
@ 2013-02-21 22:53   ` Ben Myers
  2013-02-22  1:41     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Myers @ 2013-02-21 22:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave,

Just some minor nits.

On Tue, Jan 22, 2013 at 12:25:54AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> THe AGF already has some self identifying fields (e.g. the sequence
  The

> number) so we only need to add the uuid to it to identify the
> filesystem it belongs to. The location is fixed based on the
> sequence number, so there's no need to add a block number, either.
> 
> Hence the only additional fields are the CRC and LSN fields. These
> are unlogged, so place some space between the end of the logged
> fields and them so that future expansion of the AGF for legged
							  logged

> fields can be placed adjacent to the existing logged fields and
> hence not complicate the field-derived range based logging we
> currently have.
> 
> Based originally on a patch from myself, modified further by
> Christoph Hellwig and then modified again to fit into the
> verifier structure with additional fields by myself. The multiple
> signed-off-by tags indicate the age and history of this patch.
> 
> Signed-off-by: Dave Chinner <dgc@sgi.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_ag.h          |   21 +++++++++++-
>  fs/xfs/xfs_alloc.c       |   80 ++++++++++++++++++++++++++++++++--------------
>  fs/xfs/xfs_buf_item.h    |    4 ++-
>  fs/xfs/xfs_fsops.c       |    3 ++
>  fs/xfs/xfs_log_recover.c |    7 ++++
>  5 files changed, 89 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
> index f2aeedb..b382703 100644
> --- a/fs/xfs/xfs_ag.h
> +++ b/fs/xfs/xfs_ag.h
> @@ -63,12 +63,29 @@ typedef struct xfs_agf {
>  	__be32		agf_spare0;	/* spare field */
>  	__be32		agf_levels[XFS_BTNUM_AGF];	/* btree levels */
>  	__be32		agf_spare1;	/* spare field */
> +
>  	__be32		agf_flfirst;	/* first freelist block's index */
>  	__be32		agf_fllast;	/* last freelist block's index */
>  	__be32		agf_flcount;	/* count of blocks in freelist */
>  	__be32		agf_freeblks;	/* total free blocks */
> +
>  	__be32		agf_longest;	/* longest free space */
>  	__be32		agf_btreeblks;	/* # of blocks held in AGF btrees */
> +	uuid_t		agf_uuid;	/* uuid of filesystem */
> +
> +	/*
> +	 * reserve some contiguous space for future logged fields before we add
> +	 * the unlogged fields. This makes the range logging via flags and
> +	 * structure offsets much simpler.
> +	 */
> +	__be64		agf_spare64[16];
> +
> +	/* unlogged fields, written during buffer writeback. */
> +	__be64		agf_lsn;	/* last write sequence */
> +	__be32		agf_crc;	/* crc of agf sector */
> +	__be32		agf_spare2;
> +
> +	/* structure must be padded to 64 bit alignment */
>  } xfs_agf_t;
>  
>  #define	XFS_AGF_MAGICNUM	0x00000001
> @@ -83,6 +100,7 @@ typedef struct xfs_agf {
>  #define	XFS_AGF_FREEBLKS	0x00000200
>  #define	XFS_AGF_LONGEST		0x00000400
>  #define	XFS_AGF_BTREEBLKS	0x00000800
> +#define	XFS_AGF_UUID		0x00001000
>  #define	XFS_AGF_NUM_BITS	12
					13 ?
You added a 13th bit.

How do you envision having the uuid in these structures will be helpful?

Regards,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/9] xfs: add support for large btree blocks
  2013-02-15 21:20   ` [PATCH 2/9] xfs: add support for large btree blocks Ben Myers
@ 2013-02-22  1:34     ` Dave Chinner
  2013-02-23  2:27       ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2013-02-22  1:34 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, Feb 15, 2013 at 03:20:15PM -0600, Ben Myers wrote:
> On Tue, Jan 22, 2013 at 12:25:53AM +1100, Dave Chinner wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > 
> > Add support for larger btree blocks that contains a CRC32C checksum,
> > a filesystem uuid and block number for detecting filesystem
> > consistency and out of place writes.
> > 
> > [dchinner@redhat.com] Also include an owner field to allow reverse
> > mappings to be implemented for improved repairability and a LSN
> > field to so that log recovery can easily determine the last
> > modification that made it to disk for each buffer.
> > 
> > [dchinner@redhat.com] Add buffer log format flags to indicate the
> > type of buffer to recovery so that we don't have to do blind magic
> > number tests to determine what the buffer is.
> > 
> > [dchinner@redhat.com] Modified to fit into the verifier structure.
> 
> This patch is far too large for a good review.  It needs to be split up into
> it's various ideas which you outlined in patch 0.  If you need to add dead code
> in each piece and then enable it at the end, that's fine with me.

You want it broken up into 7 or 8 separate patches - what does it
gain us? It'll take a week for me to do the patch monkeying and to
retest and validate the resulting stack (i.e. it is bisectable, each
patch passes xfstests, etc), and in the end the code will be
identical.

As I've said before, there's a point where the tradeoff for
splitting patches up doesn't make sense. Asking a developer to do
days of work to end up with exactly the same code to save the
reviewer an hour or two is *not* a good tradeoff. Especially for the
first patch of a much larger 15-20 patch series which contains
several larger and more complex patches....

> Some comments below.
....
> > @@ -59,10 +61,10 @@ typedef __be32 xfs_alloc_ptr_t;
> >  
> >  /*
> >   * Btree block header size depends on a superblock flag.
> > - *
> > - * (not quite yet, but soon)
> >   */
> > -#define XFS_ALLOC_BLOCK_LEN(mp)	XFS_BTREE_SBLOCK_LEN
> > +#define XFS_ALLOC_BLOCK_LEN(mp) \
> > +	(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> > +		XFS_BTREE_SBLOCK_CRC_LEN : XFS_BTREE_SBLOCK_LEN)
> 
> The allocbt changes seem to be a good fit for their own patch.

It's used in exactly 4 places and three of them are in the next 10
lines. That's trivial to validate as correct without needing them in
a separate patch.

> >  
> >  /*
> >   * Record, key, and pointer address macros for btree blocks.
> > diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> > index f96a734..aa4765f 100644
> > --- a/fs/xfs/xfs_attr_leaf.c
> > +++ b/fs/xfs/xfs_attr_leaf.c
> > @@ -232,7 +232,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes)
> >  				return 0;
> >  			return dp->i_d.di_forkoff;
> >  		}
> > -		dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
> > +		dsize = XFS_BMAP_BROOT_SPACE(mp, dp->i_df.if_broot);
> 
> Changes to XFS_BMAP_BROOT_SPACE are a good candidate for a separate patch, just
> as with LITINO.

Used in only 2 places. Trivial to validate.

> > @@ -3155,8 +3163,8 @@ xfs_bmap_extents_to_btree(
> >  	 * Do all this logging at the end so that
> >  	 * the root is at the right level.
> >  	 */
> > -	xfs_btree_log_block(cur, abp, XFS_BB_ALL_BITS);
> >  	xfs_btree_log_recs(cur, abp, 1, be16_to_cpu(ablock->bb_numrecs));
> > +	xfs_btree_log_block(cur, abp, XFS_BB_ALL_BITS);
> 
> Huh.  Why was that necessary?

It isn't.

> > @@ -3268,8 +3276,12 @@ xfs_bmap_local_to_extents(
> >  		*firstblock = args.fsbno;
> >  		bp = xfs_btree_get_bufl(args.mp, tp, args.fsbno, 0);
> >  		bp->b_ops = &xfs_bmbt_buf_ops;
> > +
> >  		memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);
> 
> Conflicts now due to 1e82379b0

I know. It's fixed in my local copy....

> 
> > +
> > +		xfs_trans_buf_set_type(tp, bp, XFS_BLF_BTREE_BUF);
> 
> The xfs_trans_buf_set_type changes are a good candidate for a separate patch.

I could out the xfs_trans_buf_set_type() calls and infrstructure,
but that's only 20-30 lines of code. The rest of the code that
actually uses it (e.g. the log recovery code, the calls to set the
type) goes along with the CRC modifications, so the majority of the
change still has to sit in the current patch. Double handling the
code by adding some in one patch and the rest in another makes the
patches a pain to modify and test, and when it gives the same end
result.....

> >  /*
> >   * Btree block header size depends on a superblock flag.
> > - *
> > - * (not quite yet, but soon)
> >   */
> > -#define XFS_BMBT_BLOCK_LEN(mp)	XFS_BTREE_LBLOCK_LEN
> > +#define XFS_BMBT_BLOCK_LEN(mp) \
> > +	(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> > +		XFS_BTREE_LBLOCK_CRC_LEN : XFS_BTREE_LBLOCK_LEN)
> >  
> >  #define XFS_BMBT_REC_ADDR(mp, block, index) \
> >  	((xfs_bmbt_rec_t *) \
> > @@ -186,12 +187,12 @@ typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
> >  #define XFS_BMAP_BROOT_PTR_ADDR(mp, bb, i, sz) \
> >  	XFS_BMBT_PTR_ADDR(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0))
> >  
> > -#define XFS_BMAP_BROOT_SPACE_CALC(nrecs) \
> > -	(int)(XFS_BTREE_LBLOCK_LEN + \
> > +#define XFS_BMAP_BROOT_SPACE_CALC(mp, nrecs) \
> > +	(int)(XFS_BMBT_BLOCK_LEN(mp) + \
> >  	       ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))
> >  
> > -#define XFS_BMAP_BROOT_SPACE(bb) \
> > -	(XFS_BMAP_BROOT_SPACE_CALC(be16_to_cpu((bb)->bb_numrecs)))
> > +#define XFS_BMAP_BROOT_SPACE(mp, bb) \
> > +	(XFS_BMAP_BROOT_SPACE_CALC(mp, be16_to_cpu((bb)->bb_numrecs)))
> >  #define XFS_BMDR_SPACE_CALC(nrecs) \
> >  	(int)(sizeof(xfs_bmdr_block_t) + \
> >  	       ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))
> 
> Are these large broots going to force attributes out of line?

No. The BMAP_BROOT is kept in memory, the BMDR is what is kept in
the inode on disk. What is kept in the inode is unchanged. See
xfs_iformat_btree()/xfs_bmdr_to_bmbt() and
xfs_iflush_fork()//xfs_bmbt_to_bmdr().

> > @@ -277,10 +367,8 @@ xfs_btree_dup_cursor(
> >  				*ncur = NULL;
> >  				return error;
> >  			}
> > -			new->bc_bufs[i] = bp;
> > -			ASSERT(!xfs_buf_geterror(bp));
> > -		} else
> > -			new->bc_bufs[i] = NULL;
> > +		}
> > +		new->bc_bufs[i] = bp;
> 
> Why remove the assert?

Because if there is a bp->b_error is set in xfs_trans_read_buf(),
then the error is returned and no buffer is returned. IOWs, if a
buffer is returned, it is guaranteed that bp->b_error is 0.  We
don't assert this on every buffer returned by xfs_trans_read_buf()
and so this is simply making the code consistent with all the other
code that reads buffers.

> >  void
> > +xfs_btree_init_block_int(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_btree_block	*buf,
> > +	xfs_daddr_t		blkno,
> > +	__u32			magic,
> > +	__u16			level,
> > +	__u16			numrecs,
> > +	__u64			owner,
> > +	unsigned int		flags)
> > +{
> > +	buf->bb_magic = cpu_to_be32(magic);
> > +	buf->bb_level = cpu_to_be16(level);
> > +	buf->bb_numrecs = cpu_to_be16(numrecs);
> > +
> > +	if (flags & XFS_BTREE_LONG_PTRS) {
> > +		buf->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
> > +		buf->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
> > +		if (flags & XFS_BTREE_CRC_BLOCKS) {
> > +			buf->bb_u.l.bb_blkno = cpu_to_be64(blkno);
> > +			buf->bb_u.l.bb_owner = cpu_to_be64(owner);
> > +			uuid_copy(&buf->bb_u.l.bb_uuid, &mp->m_sb.sb_uuid);
> > +			buf->bb_u.l.bb_pad = 0;
> > +		}
> > +	} else {
> > +		/* owner is a 32 bit value on short blocks */
> > +		__u32 __owner = (__u32)owner;
> > +
> > +		buf->bb_u.s.bb_leftsib = cpu_to_be32(NULLAGBLOCK);
> > +		buf->bb_u.s.bb_rightsib = cpu_to_be32(NULLAGBLOCK);
> > +		if (flags & XFS_BTREE_CRC_BLOCKS) {
> > +			buf->bb_u.s.bb_blkno = cpu_to_be64(blkno);
> > +			buf->bb_u.s.bb_owner = cpu_to_be32(__owner);
> > +			uuid_copy(&buf->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid);
> > +		}
> > +	}
> > +}
> > +
> 
> The xfs_btree_init_block_int might be good in it's own patch.

All that means is I have to split these hunks into one patch that
factors it, and another patch that then adds the new functionality.
More handling of the same code means more opportunity for mistakes
and bugs.

> > +void
> >  xfs_btree_init_block(
> >  	struct xfs_mount *mp,
> >  	struct xfs_buf	*bp,
> >  	__u32		magic,
> >  	__u16		level,
> >  	__u16		numrecs,
> > +	__u64		owner,
> >  	unsigned int	flags)
> >  {
> > -	struct xfs_btree_block	*new = XFS_BUF_TO_BLOCK(bp);
> > -
> > -	new->bb_magic = cpu_to_be32(magic);
> > -	new->bb_level = cpu_to_be16(level);
> > -	new->bb_numrecs = cpu_to_be16(numrecs);
> > -
> > -	if (flags & XFS_BTREE_LONG_PTRS) {
> > -		new->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
> > -		new->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
> > -	} else {
> > -		new->bb_u.s.bb_leftsib = cpu_to_be32(NULLAGBLOCK);
> > -		new->bb_u.s.bb_rightsib = cpu_to_be32(NULLAGBLOCK);
> > -	}
> > +	xfs_btree_init_block_int(mp, XFS_BUF_TO_BLOCK(bp), bp->b_bn,
> > +				 magic, level, numrecs, owner, flags);
> >  }
> >  
> >  STATIC void
> >  xfs_btree_init_block_cur(
> >  	struct xfs_btree_cur	*cur,
> > +	struct xfs_buf		*bp,
> >  	int			level,
> > -	int			numrecs,
> > -	struct xfs_buf		*bp)
> > +	int			numrecs)
> 
> Why rearrange the order of the args here?

1. so I got compile errors when looking for code that might need
special CRC support.

2. so it is consistent with all the other functions that pass (cur,
bp, ....) in that order....

> 
> >  {
> > -	xfs_btree_init_block(cur->bc_mp, bp, xfs_magics[cur->bc_btnum],
> > -			       level, numrecs, cur->bc_flags);
> > +	__u64 owner;
> > +
> > +	/*
> > +	 * we can pull the owner from the cursor right now as the different
> > +	 * owners align directly with the pointer size of the btree. This may
> > +	 * change in future, but is safe for current users of the generic btree
> > +	 * code.
> > +	 */
> > +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> > +		owner = cur->bc_private.b.ip->i_ino;
> > +	else
> > +		owner = cur->bc_private.a.agno;
> 
> This is something I can look into a bit later... but I'm a little flummoxed by
> this one.  Apparently only inodes use long pointers?

The BMBT is what uses them, and by definition they are rooted in an
inode. Therefore, anything using a long pointer btree has an inode
as it's owner.

> 
> > +
> > +	xfs_btree_init_block_int(cur->bc_mp, XFS_BUF_TO_BLOCK(bp), bp->b_bn,
> > +				 xfs_btree_magic(cur), level, numrecs,
> > +				 owner, cur->bc_flags);
> >  }
> >  
> >  /*
> >   * Return true if ptr is the last record in the btree and
> > - * we need to track updateѕ to this record.  The decision
> > + * we need to track updates to this record.  The decision
> 
> I can't find a change in that line.  Time for new glasses?

It's got a multi-byte UTF-8 character in it. Check the s on the end
of "updates".

> > @@ -1223,7 +1361,12 @@ xfs_btree_log_block(
> >  		offsetof(struct xfs_btree_block, bb_numrecs),
> >  		offsetof(struct xfs_btree_block, bb_u.s.bb_leftsib),
> >  		offsetof(struct xfs_btree_block, bb_u.s.bb_rightsib),
> > -		XFS_BTREE_SBLOCK_LEN
> > +		offsetof(struct xfs_btree_block, bb_u.s.bb_blkno),
> > +		offsetof(struct xfs_btree_block, bb_u.s.bb_lsn),
> > +		offsetof(struct xfs_btree_block, bb_u.s.bb_uuid),
> > +		offsetof(struct xfs_btree_block, bb_u.s.bb_owner),
> > +		offsetof(struct xfs_btree_block, bb_u.s.bb_crc),
> > +		XFS_BTREE_SBLOCK_CRC_LEN
> >  	};
> >  	static const short	loffsets[] = {	/* table of offsets (long) */
> >  		offsetof(struct xfs_btree_block, bb_magic),
> > @@ -1231,17 +1374,40 @@ xfs_btree_log_block(
> >  		offsetof(struct xfs_btree_block, bb_numrecs),
> >  		offsetof(struct xfs_btree_block, bb_u.l.bb_leftsib),
> >  		offsetof(struct xfs_btree_block, bb_u.l.bb_rightsib),
> > -		XFS_BTREE_LBLOCK_LEN
> > +		offsetof(struct xfs_btree_block, bb_u.l.bb_blkno),
> > +		offsetof(struct xfs_btree_block, bb_u.l.bb_lsn),
> > +		offsetof(struct xfs_btree_block, bb_u.l.bb_uuid),
> > +		offsetof(struct xfs_btree_block, bb_u.l.bb_owner),
> > +		offsetof(struct xfs_btree_block, bb_u.l.bb_crc),
> > +		offsetof(struct xfs_btree_block, bb_u.l.bb_pad),
> > +		XFS_BTREE_LBLOCK_CRC_LEN
> >  	};
> >  
> >  	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
> >  	XFS_BTREE_TRACE_ARGBI(cur, bp, fields);
> >  
> >  	if (bp) {
> > +		int nbits;
> > +
> > +		if (cur->bc_flags & XFS_BTREE_CRC_BLOCKS) {
> > +			/*
> > +			 * We don't log the CRC when updating a btree
> > +			 * block but instead recreate it during log
> > +			 * recovery.  As the log buffers have checksums
> > +			 * of their this is safe and avoids logging a crc
> 				   own
> 
> > +			 * update in a lot of places.
> > +			 */
> > +			if (fields == XFS_BB_ALL_BITS)
> > +				fields = XFS_BB_ALL_BITS_CRC;
> > +			nbits = XFS_BB_NUM_BITS_CRC;
> > +		} else {
> > +			nbits = XFS_BB_NUM_BITS;
> > +		}
> >  		xfs_btree_offsets(fields,
> >  				  (cur->bc_flags & XFS_BTREE_LONG_PTRS) ?
> >  					loffsets : soffsets,
> > -				  XFS_BB_NUM_BITS, &first, &last);
> > +				  nbits, &first, &last);
> > +		xfs_trans_buf_set_type(cur->bc_tp, bp, XFS_BLF_BTREE_BUF);
> >  		xfs_trans_log_buf(cur->bc_tp, bp, first, last);
> >  	} else {
> >  		xfs_trans_log_inode(cur->bc_tp, cur->bc_private.b.ip,

I'll answer your question about the values of XFS_BB_NUM_BITS and
XFS_BB_NUM_BITS_CRC while we are looking at the code that uses it.
The value is the index into the soffset/loffsets tables to determine
what range of fields in ibtree block headers are to logged.



> > @@ -56,10 +60,23 @@ struct xfs_btree_block {
> >  		struct {
> >  			__be32		bb_leftsib;
> >  			__be32		bb_rightsib;
> > +
> > +			__be64		bb_blkno;
> > +			__be64		bb_lsn;
> > +			uuid_t		bb_uuid;
> > +			__be32		bb_owner;
> > +			__le32		bb_crc;
> >  		} s;			/* short form pointers */
> >  		struct	{
> >  			__be64		bb_leftsib;
> >  			__be64		bb_rightsib;
> > +
> > +			__be64		bb_blkno;
> > +			__be64		bb_lsn;
> > +			uuid_t		bb_uuid;
> > +			__be64		bb_owner;
> > +			__le32		bb_crc;
> > +			__be32		bb_pad; /* padding for alignment */
> >  		} l;			/* long form pointers */
> >  	} bb_u;				/* rest */
> >  };
> 
> struct xfs_btree_block {
>         __be32          bb_magic;       /* magic number for block type */
>         __be16          bb_level;       /* 0 is a leaf */
>         __be16          bb_numrecs;     /* current # of data records */
>         union {
>                 struct {
>                         __be32          bb_leftsib;
>                         __be32          bb_rightsib;
> 		} s;			/* short form pointers */
>                 struct {
>                         __be32          bb_leftsib;
>                         __be32          bb_rightsib;
>                         __be64          bb_blkno;
>                         __be64          bb_lsn;
>                         uuid_t          bb_uuid;
>                         __be32          bb_owner;
>                         __le32          bb_crc;
>                 } s_crc;		/* short form pointers with crcs */
>                 struct  {
>                         __be64          bb_leftsib;
>                         __be64          bb_rightsib;
> 		} l;			/* long form pointers */
>                 struct  {
>                         __be64          bb_leftsib;
>                         __be64          bb_rightsib;
>                         __be64          bb_blkno;
>                         __be64          bb_lsn;
>                         uuid_t          bb_uuid;
>                         __be64          bb_owner;
>                         __le32          bb_crc;
>                         __be32          bb_pad; /* padding for alignment */
>                 } l_crc;		/* long form pointers with crcs */
>         } bb_u;                         /* rest */
> };

Right now the shared btree code only needs to know about 2 different
types (short and long), and the differences between crc/non crc
btree blocks is completely irrelevant to most of the code and hidden
deep in the btree block IO/initialisation layers.

Defining the structures as you've suggested duplicates fields between
structures - that adds maintenance burden (i.e. have to keep stuff in
sync, and leave coments/build bombs to ensure they stay that way).
It also means means that we'll have 4 types of structures that will be
used interchangably (i.e. bb_u.l in some places and bb_u.l_crc in
others). That is, IMO, going to be far more confusing than just
having a single type of structure for each of the long and short
forms, and more likely to lead to bugs in the future.

As it is, we already have to use magic number/version checks to
determine what fields are valid, so having different structure types
doesn't actually add any code documentation value as we will
only be accessing the extended structures inside blocks of code
where we know that we are supposed to access them...

> 
> > @@ -67,6 +84,16 @@ struct xfs_btree_block {
> >  #define XFS_BTREE_SBLOCK_LEN	16	/* size of a short form block */
> >  #define XFS_BTREE_LBLOCK_LEN	24	/* size of a long form block */
> >  
> > +/* sizes of CRC enabled btree blocks */
> > +#define XFS_BTREE_SBLOCK_CRC_LEN	(XFS_BTREE_SBLOCK_LEN + 40)
> > +#define XFS_BTREE_LBLOCK_CRC_LEN	(XFS_BTREE_LBLOCK_LEN + 48)
> > +
> > +
> > +#define XFS_BTREE_SBLOCK_CRC_OFF \
> > +	offsetof(struct xfs_btree_block, bb_u.s.bb_crc)
> > +#define XFS_BTREE_LBLOCK_CRC_OFF \
> > +	offsetof(struct xfs_btree_block, bb_u.l.bb_crc)
> > +
> >  
> >  /*
> >   * Generic key, ptr and record wrapper structures.
> > @@ -101,13 +128,11 @@ union xfs_btree_rec {
> >  #define	XFS_BB_NUMRECS		0x04
> >  #define	XFS_BB_LEFTSIB		0x08
> >  #define	XFS_BB_RIGHTSIB		0x10
> > +#define	XFS_BB_BLKNO		0x20
> >  #define	XFS_BB_NUM_BITS		5
> >  #define	XFS_BB_ALL_BITS		((1 << XFS_BB_NUM_BITS) - 1)
> 
> Did XFS_BB_BLKNO sneak in from a subsequent patch?

No, there is the possibility we only want to log a change to the
block number in the extended header, and so we need to add that as a
loggable index.

> 
> > -
> > -/*
> > - * Magic numbers for btree blocks.
> > - */
> > -extern const __uint32_t	xfs_magics[];
> > +#define	XFS_BB_NUM_BITS_CRC	8
> > +#define	XFS_BB_ALL_BITS_CRC	((1 << XFS_BB_NUM_BITS_CRC) - 1)
> 
> I'm a little confused here.  We were using 5 bits, you added BB_BLKNO for 6,
> and now you're setting NUM_BITS_CRC to 8?

See the soffsets/loffsets table comment above.

> Probably more readable with the (1<<0) idiom, as below.

Just using what ia already there.

> > diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> > index 88a3368..6b5bd17 100644
> > --- a/fs/xfs/xfs_dinode.h
> > +++ b/fs/xfs/xfs_dinode.h
> > @@ -107,8 +107,8 @@ typedef enum xfs_dinode_fmt {
> >  #define XFS_LITINO(mp, version) \
> >  	((int)(((mp)->m_sb.sb_inodesize) - sizeof(struct xfs_dinode)))
> >  
> > -#define	XFS_BROOT_SIZE_ADJ	\
> > -	(XFS_BTREE_LBLOCK_LEN - sizeof(xfs_bmdr_block_t))
> > +#define XFS_BROOT_SIZE_ADJ(ip) \
> > +	(XFS_BMBT_BLOCK_LEN((ip)->i_mount) - sizeof(xfs_bmdr_block_t))
> 
> A good candidate for a separate patch.

Again, only used in 3 places so shoul dbe easy to verify even in the
large patch...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/9] xfs: add CRC checks to the AGF
  2013-02-21 22:53   ` [PATCH 3/9] xfs: add CRC checks to the AGF Ben Myers
@ 2013-02-22  1:41     ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2013-02-22  1:41 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Thu, Feb 21, 2013 at 04:53:05PM -0600, Ben Myers wrote:
> > @@ -83,6 +100,7 @@ typedef struct xfs_agf {
> >  #define	XFS_AGF_FREEBLKS	0x00000200
> >  #define	XFS_AGF_LONGEST		0x00000400
> >  #define	XFS_AGF_BTREEBLKS	0x00000800
> > +#define	XFS_AGF_UUID		0x00001000
> >  #define	XFS_AGF_NUM_BITS	12
> 					13 ?
> You added a 13th bit.

Yup, good catch.

> How do you envision having the uuid in these structures will be helpful?

Same as for all the rest of the metadata. It identifies the
filesystem it belongs to so we can detect and trace misplaced writes
instantly, rather than, say, having to spend months of debugging
with FC analysers to prove the storage array firmware has a bug in
it and that's what is causing the filesystem corruption.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9] xfs: metadata CRCs, kernel, first batch
       [not found] <1358774760-21841-1-git-send-email-david@fromorbit.com>
                   ` (2 preceding siblings ...)
       [not found] ` <1358774760-21841-4-git-send-email-david@fromorbit.com>
@ 2013-02-22 15:19 ` Ben Myers
  2013-02-22 23:12   ` Dave Chinner
       [not found] ` <1358774760-21841-5-git-send-email-david@fromorbit.com>
       [not found] ` <1358774760-21841-6-git-send-email-david@fromorbit.com>
  5 siblings, 1 reply; 19+ messages in thread
From: Ben Myers @ 2013-02-22 15:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave,

Dunno if you've seen this one:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[601393.932814] IP: [<ffffffffa06e5dcc>] xfs_dquot_buf_verify+0x1c/0xb0 [xfs]
[601393.932814] PGD 1f226d067 PUD 1ec0ae067 PMD 0
[601393.932814] Oops: 0000 [#1] SMP
[601393.932814] Modules linked in: xfs(O) autofs4 binfmt_misc ipmi_devintf ipmi_si ipmi_msghandler nfsv3 nfs_acl iptable_filter ip_tables x_tables nfs fscache dns_resolver lockd sunrpc af_packet
 rdma_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core fuse ext2 loop dm_mod i5k_amb e1000e i5000_edac edac_core mptfc ib_mthca
 ioatdma tpm_tis scsi_transport_fc tpm ib_mad ib_core scsi_tgt tpm_bios shpchp pci_hotplug i2c_i801 iTCO_wdt iTCO_vendor_support lpc_ich ehci_pci dca button serio_raw sg pcspkr i2c_core mfd_core
 microcode container uhci_hcd ehci_hcd sd_mod crc_t10dif usbcore usb_common ext3 jbd mbcache piix edd exportfs crc32c libcrc32c fan thermal processor thermal_sys hwmon ide_generic ide_core sata_
nv megaraid_sas mptsas mptscsih mptbase scsi_transport_sas ata_piix ahci libahci libata scsi_mod [last unloaded: xfs]
[601393.932814] CPU 2
[601393.932814] Pid: 1693, comm: mount Tainted: G           O 3.8.0-rc1-0.7-default+ #1 SGI.COM AltixXE310/X7DGT-INF
[601393.932814] RIP: 0010:[<ffffffffa06e5dcc>]  [<ffffffffa06e5dcc>] xfs_dquot_buf_verify+0x1c/0xb0 [xfs]
[601393.932814] RSP: 0018:ffff8801eb2d3848  EFLAGS: 00010286
[601393.932814] RAX: 0000000000000000 RBX: ffff8801d7e54b00 RCX: ffff8801d7e54b78
[601393.932814] RDX: 0000000000100022 RSI: ffff8801d7e54b00 RDI: ffff8802245a5800
[601393.932814] RBP: ffff8801eb2d3868 R08: ffff880222a20800 R09: 0000000108f4fdca
[601393.932814] R10: 0000000108f4fdca R11: 0000000108f4fdca R12: ffff8802245a5800
[601393.932814] R13: ffff8802245a5800 R14: ffff880180c4c000 R15: ffff8801eb2d3a58
[601393.932814] FS:  00007fe59eccc7e0(0000) GS:ffff88022fd00000(0000) knlGS:0000000000000000
[601393.932814] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[601393.932814] CR2: 00000000000000d0 CR3: 00000001d6fa6000 CR4: 00000000000007e0
[601393.932814] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[601393.932814] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[601393.932814] Process mount (pid: 1693, threadinfo ffff8801eb2d2000, task ffff8801ec3862c0)
[601393.932814] Stack:
[601393.932814]  ffff8801d7e54b00 ffff8802245a5800 ffff8801eb2d39e0 ffff8801eb2d3a08
[601393.932814]  ffff8801eb2d3898 ffffffffa06e7b89 ffff8801eb2d3898 ffffffff811db523
[601393.932814]  0000000000000001 ffff8801eb2d3998 ffff8801eb2d3948 ffffffffa0676058
[601393.932814] Call Trace:
[601393.932814]  [<ffffffffa06e7b89>] xfs_dquot_buf_write_verify+0x29/0x90 [xfs]
[601393.932814]  [<ffffffff811db523>] ? blk_finish_plug+0x13/0x50
[601393.932814]  [<ffffffffa0676058>] _xfs_buf_ioapply+0x68/0x360 [xfs]
[601393.932814]  [<ffffffff810696b0>] ? try_to_wake_up+0x2b0/0x2b0
[601393.932814]  [<ffffffffa0677b23>] xfs_buf_iorequest+0x53/0x70 [xfs]
[601393.932814]  [<ffffffffa0677e25>] xfs_bdstrat_cb+0x35/0x60 [xfs]
[601393.932814]  [<ffffffffa0677f98>] __xfs_buf_delwri_submit+0x148/0x1c0 [xfs]
[601393.932814]  [<ffffffffa067803b>] xfs_buf_delwri_submit+0x2b/0x90 [xfs]
[601393.932814]  [<ffffffffa06d1de1>] xlog_recover_commit_trans+0x111/0x120 [xfs]
[601393.932814]  [<ffffffffa06d1ff1>] xlog_recover_process_data+0x201/0x2b0 [xfs]
[601393.932814]  [<ffffffffa06d257c>] xlog_do_recovery_pass+0x4dc/0x760 [xfs]
[601393.932814]  [<ffffffff8103e3b5>] ? console_unlock+0x265/0x3a0
[601393.932814]  [<ffffffffa06d289c>] xlog_do_log_recovery+0x9c/0x110 [xfs]
[601393.932814]  [<ffffffffa06d2933>] xlog_do_recover+0x23/0x1e0 [xfs]
[601393.932814]  [<ffffffffa06d3660>] xlog_recover+0x70/0x90 [xfs]
[601393.932814]  [<ffffffffa06dd6a4>] xfs_log_mount+0x174/0x1a0 [xfs]
[601393.932814]  [<ffffffffa06d68cb>] xfs_mountfs+0x36b/0x6d0 [xfs]
[601393.932814]  [<ffffffffa068983e>] xfs_fs_fill_super+0x28e/0x310 [xfs]
[601393.932814]  [<ffffffff8111f6f9>] mount_bdev+0x179/0x200
[601393.932814]  [<ffffffffa06895b0>] ? xfs_parseargs+0xbd0/0xbd0 [xfs]
[601393.932814]  [<ffffffffa0687aa0>] xfs_fs_mount+0x10/0x20 [xfs]
[601393.932814]  [<ffffffff8111ec78>] mount_fs+0x48/0x190
[601393.932814]  [<ffffffff8113acc4>] vfs_kern_mount+0x74/0x100
[601393.932814]  [<ffffffff8113c0ef>] do_mount+0x3af/0xa70
[601393.932814]  [<ffffffff81139dbe>] ? copy_mount_options+0x11e/0x190
[601393.932814]  [<ffffffff8113c864>] sys_mount+0xb4/0xe0
[601393.932814]  [<ffffffff813ee769>] system_call_fastpath+0x16/0x1b
[601393.932814] Code: 01 00 00 e8 37 f5 cf e0 e9 69 ff ff ff 66 90 55 48 89 e5 41 56 41 55 49 89 fd 41 54 53 48 8b 87 90 01 00 00 4c 8b b6 98 00 00 00 <8b> 90 d0 00 00 00 85 d2 74 62 45 31 e4 31 db 31 c9 eb 34 90 41
[601393.932814] RIP  [<ffffffffa06e5dcc>] xfs_dquot_buf_verify+0x1c/0xb0 [xfs]
[601393.932814]  RSP <ffff8801eb2d3848>
[601393.932814] CR2: 00000000000000d0

Crashed during test 087

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9] xfs: metadata CRCs, kernel, first batch
  2013-02-22 15:19 ` [PATCH 0/9] xfs: metadata CRCs, kernel, first batch Ben Myers
@ 2013-02-22 23:12   ` Dave Chinner
  2013-02-22 23:50     ` Ben Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2013-02-22 23:12 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, Feb 22, 2013 at 09:19:31AM -0600, Ben Myers wrote:
> Dave,
> 
> Dunno if you've seen this one:
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
> [601393.932814] IP: [<ffffffffa06e5dcc>] xfs_dquot_buf_verify+0x1c/0xb0 [xfs]
.....
> [601393.932814] Call Trace:
> [601393.932814]  [<ffffffffa06e7b89>] xfs_dquot_buf_write_verify+0x29/0x90 [xfs]
> [601393.932814]  [<ffffffff811db523>] ? blk_finish_plug+0x13/0x50
> [601393.932814]  [<ffffffffa0676058>] _xfs_buf_ioapply+0x68/0x360 [xfs]
> [601393.932814]  [<ffffffff810696b0>] ? try_to_wake_up+0x2b0/0x2b0
> [601393.932814]  [<ffffffffa0677b23>] xfs_buf_iorequest+0x53/0x70 [xfs]
> [601393.932814]  [<ffffffffa0677e25>] xfs_bdstrat_cb+0x35/0x60 [xfs]
> [601393.932814]  [<ffffffffa0677f98>] __xfs_buf_delwri_submit+0x148/0x1c0 [xfs]
> [601393.932814]  [<ffffffffa067803b>] xfs_buf_delwri_submit+0x2b/0x90 [xfs]
> [601393.932814]  [<ffffffffa06d1de1>] xlog_recover_commit_trans+0x111/0x120 [xfs]
> [601393.932814]  [<ffffffffa06d1ff1>] xlog_recover_process_data+0x201/0x2b0 [xfs]
> [601393.932814]  [<ffffffffa06d257c>] xlog_do_recovery_pass+0x4dc/0x760 [xfs]
> [601393.932814]  [<ffffffff8103e3b5>] ? console_unlock+0x265/0x3a0
> [601393.932814]  [<ffffffffa06d289c>] xlog_do_log_recovery+0x9c/0x110 [xfs]
> [601393.932814]  [<ffffffffa06d2933>] xlog_do_recover+0x23/0x1e0 [xfs]

This is mentioned in the patch zero description:

"This series makes is through to 001-092 in xfstests - there is a
problem in the dquot verifier that causes log recovery of dquot
buffers to follow a NULL pointer."

Basically, mp->m_quotainfo is not initialised until after log
recovery occurs, so this has to be detected in the verify/crc
routines otherwise it goes splat like above. My current patch series
has this fixed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9] xfs: metadata CRCs, kernel, first batch
  2013-02-22 23:12   ` Dave Chinner
@ 2013-02-22 23:50     ` Ben Myers
  2013-02-23  2:38       ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Myers @ 2013-02-22 23:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Sat, Feb 23, 2013 at 10:12:17AM +1100, Dave Chinner wrote:
> On Fri, Feb 22, 2013 at 09:19:31AM -0600, Ben Myers wrote:
> > Dave,
> > 
> > Dunno if you've seen this one:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
> > [601393.932814] IP: [<ffffffffa06e5dcc>] xfs_dquot_buf_verify+0x1c/0xb0 [xfs]
> .....
> > [601393.932814] Call Trace:
> > [601393.932814]  [<ffffffffa06e7b89>] xfs_dquot_buf_write_verify+0x29/0x90 [xfs]
> > [601393.932814]  [<ffffffff811db523>] ? blk_finish_plug+0x13/0x50
> > [601393.932814]  [<ffffffffa0676058>] _xfs_buf_ioapply+0x68/0x360 [xfs]
> > [601393.932814]  [<ffffffff810696b0>] ? try_to_wake_up+0x2b0/0x2b0
> > [601393.932814]  [<ffffffffa0677b23>] xfs_buf_iorequest+0x53/0x70 [xfs]
> > [601393.932814]  [<ffffffffa0677e25>] xfs_bdstrat_cb+0x35/0x60 [xfs]
> > [601393.932814]  [<ffffffffa0677f98>] __xfs_buf_delwri_submit+0x148/0x1c0 [xfs]
> > [601393.932814]  [<ffffffffa067803b>] xfs_buf_delwri_submit+0x2b/0x90 [xfs]
> > [601393.932814]  [<ffffffffa06d1de1>] xlog_recover_commit_trans+0x111/0x120 [xfs]
> > [601393.932814]  [<ffffffffa06d1ff1>] xlog_recover_process_data+0x201/0x2b0 [xfs]
> > [601393.932814]  [<ffffffffa06d257c>] xlog_do_recovery_pass+0x4dc/0x760 [xfs]
> > [601393.932814]  [<ffffffff8103e3b5>] ? console_unlock+0x265/0x3a0
> > [601393.932814]  [<ffffffffa06d289c>] xlog_do_log_recovery+0x9c/0x110 [xfs]
> > [601393.932814]  [<ffffffffa06d2933>] xlog_do_recover+0x23/0x1e0 [xfs]
> 
> This is mentioned in the patch zero description:
> 
> "This series makes is through to 001-092 in xfstests - there is a
> problem in the dquot verifier that causes log recovery of dquot
> buffers to follow a NULL pointer."
>
> Basically, mp->m_quotainfo is not initialised until after log
> recovery occurs, so this has to be detected in the verify/crc
> routines otherwise it goes splat like above. My current patch series
> has this fixed.

Cool.  Sorry for the extra noise.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/9] xfs: add support for large btree blocks
  2013-02-22  1:34     ` Dave Chinner
@ 2013-02-23  2:27       ` Dave Chinner
  2013-03-04 16:31         ` Ben Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2013-02-23  2:27 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, Feb 22, 2013 at 12:34:56PM +1100, Dave Chinner wrote:
> On Fri, Feb 15, 2013 at 03:20:15PM -0600, Ben Myers wrote:
> > On Tue, Jan 22, 2013 at 12:25:53AM +1100, Dave Chinner wrote:
> > > From: Christoph Hellwig <hch@lst.de>
> > > 
> > > Add support for larger btree blocks that contains a CRC32C checksum,
> > > a filesystem uuid and block number for detecting filesystem
> > > consistency and out of place writes.
> > > 
> > > [dchinner@redhat.com] Also include an owner field to allow reverse
> > > mappings to be implemented for improved repairability and a LSN
> > > field to so that log recovery can easily determine the last
> > > modification that made it to disk for each buffer.
> > > 
> > > [dchinner@redhat.com] Add buffer log format flags to indicate the
> > > type of buffer to recovery so that we don't have to do blind magic
> > > number tests to determine what the buffer is.
> > > 
> > > [dchinner@redhat.com] Modified to fit into the verifier structure.
> > 
> > This patch is far too large for a good review.  It needs to be split up into
> > it's various ideas which you outlined in patch 0.  If you need to add dead code
> > in each piece and then enable it at the end, that's fine with me.
> 
> You want it broken up into 7 or 8 separate patches - what does it
> gain us? It'll take a week for me to do the patch monkeying and to
> retest and validate the resulting stack (i.e. it is bisectable, each
> patch passes xfstests, etc), and in the end the code will be
> identical.
> 
> As I've said before, there's a point where the tradeoff for
> splitting patches up doesn't make sense. Asking a developer to do
> days of work to end up with exactly the same code to save the
> reviewer an hour or two is *not* a good tradeoff. Especially for the
> first patch of a much larger 15-20 patch series which contains
> several larger and more complex patches....

Ben, reading this back it comes across as unnecessarily negative and
narky. I was just about at the end of the attribute leaf changes and
it's been a mind-numbing slog making mechanical changes to lots of
code.

This is after having to do the same slog through all the directory
code. The block, leaf, node, freespace and da_btree code all had to
get the same treatment, and it's worn me down.  My wrists are
starting to hurt from the last three months of slogging through this
stuff (roughly +20,000/-10,000 lines modified according to a quick
diffstat) and that doesn't make me a happy camper.

So I'm not meaning to be narky or nasty - I just want to get this
stuff done ASAP before it burns me out. Hence the prospect of having
to go back and redo significant chunks to split out trivial pieces
of code (i.e. a mind-numbing slog making mechanical changes) hit a
bit of a raw nerve.

All I'm asking is that you take into account the extra load that the
rework you ask me to do adds and whether it is absolutely necessary
to be able to review the code. The last thing I want is be burnt out
by this stuff....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9] xfs: metadata CRCs, kernel, first batch
  2013-02-22 23:50     ` Ben Myers
@ 2013-02-23  2:38       ` Dave Chinner
  2013-03-04 16:33         ` Ben Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2013-02-23  2:38 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, Feb 22, 2013 at 05:50:29PM -0600, Ben Myers wrote:
> On Sat, Feb 23, 2013 at 10:12:17AM +1100, Dave Chinner wrote:
> > On Fri, Feb 22, 2013 at 09:19:31AM -0600, Ben Myers wrote:
> > This is mentioned in the patch zero description:
> > 
> > "This series makes is through to 001-092 in xfstests - there is a
> > problem in the dquot verifier that causes log recovery of dquot
> > buffers to follow a NULL pointer."
> >
> > Basically, mp->m_quotainfo is not initialised until after log
> > recovery occurs, so this has to be detected in the verify/crc
> > routines otherwise it goes splat like above. My current patch series
> > has this fixed.
> 
> Cool.  Sorry for the extra noise.

No, that's fine. It tells me that you're actaully looking at the
code and seeing what it does ;)

I'm close to having a new version of the kernel patchset out. I've
just got to finish debugging the attribute changes I've made and
move the superblock support patch to the end of the series and I'll
post it out.

All I'm aiming for with the next version of the patch set is that
existing filesystems (i.e. no CRCs) are regression free. I've
actaully done very little CRC enabled testing while doing all the
directory and attribute code changes, mainly because I can't test
them properly until the userspace support is there. However, the
patch set up to the final patch (i.e. everything but the attribute
changes) seems to work just fine with the existing toolchain and
xfstests.

Put simply, my focus for testing the patch set is currently "no
regressions for existing users", not "CRCs work perfectly".

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/9] xfs: add CRC checks to the AGFL
       [not found] ` <1358774760-21841-5-git-send-email-david@fromorbit.com>
@ 2013-02-27 22:37   ` Ben Myers
  2013-02-27 23:20     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Myers @ 2013-02-27 22:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Tue, Jan 22, 2013 at 12:25:55AM +1100, Dave Chinner wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Add CRC checks, location information and a magic number to the AGFL.
> Previously the AGFL was just a block containing nothing but the
> free block pointers.  The new AGFL has a real header with the usual
> boilerplate instead, so that we can verify it's not corrupted and
> written into the right place.
> 
> [dchinner@redhat.com] Added LSN field, reworked significantly to fit
> into new verifier structure and growfs structure, enabled full
> verifier functionality now there is a header to verify and we can
> guarantee an initialised AGFL.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I have a couple comments below.

-Ben

> ---
>  fs/xfs/xfs_ag.h          |   25 +++++++++-
>  fs/xfs/xfs_alloc.c       |  119 ++++++++++++++++++++++++++++++----------------
>  fs/xfs/xfs_buf_item.h    |    4 +-
>  fs/xfs/xfs_fsops.c       |    5 ++
>  fs/xfs/xfs_log_recover.c |    7 +++
>  5 files changed, 116 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
> index b382703..813caea 100644
> --- a/fs/xfs/xfs_ag.h
> +++ b/fs/xfs/xfs_ag.h
> @@ -30,6 +30,7 @@ struct xfs_trans;
>  
>  #define	XFS_AGF_MAGIC	0x58414746	/* 'XAGF' */
>  #define	XFS_AGI_MAGIC	0x58414749	/* 'XAGI' */
> +#define	XFS_AGFL_MAGIC	0x5841464c	/* 'XAFL' */
>  #define	XFS_AGF_VERSION	1
>  #define	XFS_AGI_VERSION	1
>  
> @@ -190,11 +191,31 @@ extern const struct xfs_buf_ops xfs_agi_buf_ops;
>   */
>  #define XFS_AGFL_DADDR(mp)	((xfs_daddr_t)(3 << (mp)->m_sectbb_log))
>  #define	XFS_AGFL_BLOCK(mp)	XFS_HDR_BLOCK(mp, XFS_AGFL_DADDR(mp))
> -#define XFS_AGFL_SIZE(mp)	((mp)->m_sb.sb_sectsize / sizeof(xfs_agblock_t))
>  #define	XFS_BUF_TO_AGFL(bp)	((xfs_agfl_t *)((bp)->b_addr))
>  
> +#define XFS_BUF_TO_AGFL_BNO(mp, bp) \
> +	(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> +		&(XFS_BUF_TO_AGFL(bp)->agfl_bno[0]) : \
> +		(__be32 *)(bp)->b_addr)
> +
> +/*
> + * Size of the AGFL.  For CRC-enabled filesystes we steal a couple of
> + * slots in the beginning of the block for a proper header with the
> + * location information and CRC.
> + */
> +#define XFS_AGFL_SIZE(mp) \
> +	(((mp)->m_sb.sb_sectsize - \
> +	 (xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> +		sizeof(struct xfs_agfl) : 0)) / \
> +	  sizeof(xfs_agblock_t))
> +
>  typedef struct xfs_agfl {
> -	__be32		agfl_bno[1];	/* actually XFS_AGFL_SIZE(mp) */
> +	__be32		agfl_magicnum;
> +	__be32		agfl_seqno;
> +	uuid_t		agfl_uuid;
> +	__be64		agfl_lsn;
> +	__be32		agfl_crc;
> +	__be32		agfl_bno[];	/* actually XFS_AGFL_SIZE(mp) */
>  } xfs_agfl_t;
>  
>  /*
> diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
> index 1d2e9c3..53e699e 100644
> --- a/fs/xfs/xfs_alloc.c
> +++ b/fs/xfs/xfs_alloc.c
> @@ -432,53 +432,84 @@ xfs_alloc_fixup_trees(
>  	return 0;
>  }
>  
> -static void
> +static bool
>  xfs_agfl_verify(
>  	struct xfs_buf	*bp)
>  {
> -#ifdef WHEN_CRCS_COME_ALONG
> -	/*
> -	 * we cannot actually do any verification of the AGFL because mkfs does
> -	 * not initialise the AGFL to zero or NULL. Hence the only valid part of
> -	 * the AGFL is what the AGF says is active. We can't get to the AGF, so
> -	 * we can't verify just those entries are valid.
> -	 *
> -	 * This problem goes away when the CRC format change comes along as that
> -	 * requires the AGFL to be initialised by mkfs. At that point, we can
> -	 * verify the blocks in the agfl -active or not- lie within the bounds
> -	 * of the AG. Until then, just leave this check ifdef'd out.
> -	 */
>  	struct xfs_mount *mp = bp->b_target->bt_mount;
>  	struct xfs_agfl	*agfl = XFS_BUF_TO_AGFL(bp);
> -	int		agfl_ok = 1;
> -
>  	int		i;
>  
> +	if (!uuid_equal(&agfl->agfl_uuid, &mp->m_sb.sb_uuid))
> +		return false;
> +	if (be32_to_cpu(agfl->agfl_magicnum) != XFS_AGFL_MAGIC)
> +		return false;
> +	/*
> +	 * during growfs operations, the perag is not fully initialised,
> +	 * so we can't use it for any useful checking. growfs ensures we can't
> +	 * use it by using uncached buffers that don't have the perag attached
> +	 * so we can detect and avoid this problem.
> +	 */
> +	if (bp->b_pag && be32_to_cpu(agfl->agfl_seqno) != bp->b_pag->pag_agno)
> +		return false;
> +
>  	for (i = 0; i < XFS_AGFL_SIZE(mp); i++) {
> -		if (be32_to_cpu(agfl->agfl_bno[i]) == NULLAGBLOCK ||
> +		if (be32_to_cpu(agfl->agfl_bno[i]) != NULLAGBLOCK &&
>  		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
						   <

Any non NULLAGBLOCK should be less than m_sb.sb_agblocks, correct?

> -			agfl_ok = 0;
> +			return false;
>  	}
> +	return true;
> +}
> +
> +static void
> +xfs_agfl_read_verify(
> +	struct xfs_buf	*bp)
> +{
> +	struct xfs_mount *mp = bp->b_target->bt_mount;
> +	int		agfl_ok = 1;
> +
> +	/*
> +	 * There is no verification of non-crc AGFLs because mkfs does not
> +	 * initialise the AGFL to zero or NULL. Hence the only valid part of the
> +	 * AGFL is what the AGF says is active. We can't get to the AGF, so we
> +	 * can't verify just those entries are valid.
> +	 */
> +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> +		return;
> +
> +	agfl_ok = xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
> +				   offsetof(struct xfs_agfl, agfl_crc));
> +
> +	agfl_ok = agfl_ok && xfs_agfl_verify(bp);
>  
>  	if (!agfl_ok) {
> -		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, agfl);
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
>  		xfs_buf_ioerror(bp, EFSCORRUPTED);
>  	}
> -#endif
>  }
>  
>  static void
>  xfs_agfl_write_verify(
>  	struct xfs_buf	*bp)
>  {
> -	xfs_agfl_verify(bp);
> -}
> +	struct xfs_mount *mp = bp->b_target->bt_mount;
> +	struct xfs_buf_log_item	*bip = bp->b_fspriv;
>  
> -static void
> -xfs_agfl_read_verify(
> -	struct xfs_buf	*bp)
> -{
> -	xfs_agfl_verify(bp);
> +	/* no verification of non-crc AGFLs */
> +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> +		return;
> +
> +	if (!xfs_agfl_verify(bp)) {
> +		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
> +		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +		return;
> +	}
> +
> +	if (bip)
> +		XFS_BUF_TO_AGFL(bp)->agfl_lsn = cpu_to_be64(bip->bli_item.li_lsn);
> +
> +	xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
> +			 offsetof(struct xfs_agfl, agfl_crc));
>  }
>  
>  const struct xfs_buf_ops xfs_agfl_buf_ops = {
> @@ -1984,18 +2015,18 @@ xfs_alloc_get_freelist(
>  	int		btreeblk) /* destination is a AGF btree */
>  {
>  	xfs_agf_t	*agf;	/* a.g. freespace structure */
> -	xfs_agfl_t	*agfl;	/* a.g. freelist structure */
>  	xfs_buf_t	*agflbp;/* buffer for a.g. freelist structure */
>  	xfs_agblock_t	bno;	/* block number returned */
> +	__be32		*agfl_bno;
>  	int		error;
>  	int		logflags;
> -	xfs_mount_t	*mp;	/* mount structure */
> +	xfs_mount_t	*mp = tp->t_mountp;
>  	xfs_perag_t	*pag;	/* per allocation group data */
>  
> -	agf = XFS_BUF_TO_AGF(agbp);
>  	/*
>  	 * Freelist is empty, give up.
>  	 */
> +	agf = XFS_BUF_TO_AGF(agbp);
>  	if (!agf->agf_flcount) {
>  		*bnop = NULLAGBLOCK;
>  		return 0;
> @@ -2003,15 +2034,17 @@ xfs_alloc_get_freelist(
>  	/*
>  	 * Read the array of free blocks.
>  	 */
> -	mp = tp->t_mountp;
> -	if ((error = xfs_alloc_read_agfl(mp, tp,
> -			be32_to_cpu(agf->agf_seqno), &agflbp)))
> +	error = xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno),
> +				    &agflbp);
> +	if (error)
>  		return error;
> -	agfl = XFS_BUF_TO_AGFL(agflbp);
> +
> +
>  	/*
>  	 * Get the block number and update the data structures.
>  	 */
> -	bno = be32_to_cpu(agfl->agfl_bno[be32_to_cpu(agf->agf_flfirst)]);
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +	bno = be32_to_cpu(agfl_bno[be32_to_cpu(agf->agf_flfirst)]);
>  	be32_add_cpu(&agf->agf_flfirst, 1);
>  	xfs_trans_brelse(tp, agflbp);
>  	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
> @@ -2104,12 +2137,13 @@ xfs_alloc_put_freelist(
>  	int			btreeblk) /* block came from a AGF btree */
>  {
>  	xfs_agf_t		*agf;	/* a.g. freespace structure */
> -	xfs_agfl_t		*agfl;	/* a.g. free block array */
>  	__be32			*blockp;/* pointer to array entry */
>  	int			error;
>  	int			logflags;
>  	xfs_mount_t		*mp;	/* mount structure */
>  	xfs_perag_t		*pag;	/* per allocation group data */
> +	__be32			*agfl_bno;
> +	int			startoff;
>  
>  	agf = XFS_BUF_TO_AGF(agbp);
>  	mp = tp->t_mountp;
> @@ -2117,7 +2151,6 @@ xfs_alloc_put_freelist(
>  	if (!agflbp && (error = xfs_alloc_read_agfl(mp, tp,
>  			be32_to_cpu(agf->agf_seqno), &agflbp)))
>  		return error;
> -	agfl = XFS_BUF_TO_AGFL(agflbp);
>  	be32_add_cpu(&agf->agf_fllast, 1);
>  	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
>  		agf->agf_fllast = 0;
> @@ -2138,13 +2171,17 @@ xfs_alloc_put_freelist(
>  	xfs_alloc_log_agf(tp, agbp, logflags);
>  
>  	ASSERT(be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp));
> -	blockp = &agfl->agfl_bno[be32_to_cpu(agf->agf_fllast)];
> +
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +	blockp = &agfl_bno[be32_to_cpu(agf->agf_fllast)];
>  	*blockp = cpu_to_be32(bno);
> +	startoff = (char *)blockp - (char *)agflbp->b_addr;
> +
>  	xfs_alloc_log_agf(tp, agbp, logflags);
> -	xfs_trans_log_buf(tp, agflbp,
> -		(int)((xfs_caddr_t)blockp - (xfs_caddr_t)agfl),
> -		(int)((xfs_caddr_t)blockp - (xfs_caddr_t)agfl +
> -			sizeof(xfs_agblock_t) - 1));
> +
> +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLF_AGFL_BUF);
> +	xfs_trans_log_buf(tp, agflbp, startoff,
> +			  startoff + sizeof(xfs_agblock_t) - 1);
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> index 76bd7a1..067d5f0 100644
> --- a/fs/xfs/xfs_buf_item.h
> +++ b/fs/xfs/xfs_buf_item.h
> @@ -46,13 +46,15 @@ extern kmem_zone_t	*xfs_buf_item_zone;
>   */
>  #define XFS_BLF_BTREE_BUF	(1<<5)
>  #define XFS_BLF_AGF_BUF		(1<<6)
> +#define XFS_BLF_AGFL_BUF	(1<<7)
>  
>  #define XFS_BLF_TYPE_MASK	\
>  		(XFS_BLF_UDQUOT_BUF | \
>  		 XFS_BLF_PDQUOT_BUF | \
>  		 XFS_BLF_GDQUOT_BUF | \
>  		 XFS_BLF_BTREE_BUF | \
> -		 XFS_BLF_AGF_BUF)
> +		 XFS_BLF_AGF_BUF | \
> +		 XFS_BLF_AGFL_BUF)
>  
>  #define	XFS_BLF_CHUNK		128
>  #define	XFS_BLF_SHIFT		7
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 10ee0b8..1edfdf0 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -268,6 +268,11 @@ xfs_growfs_data_private(
>  		}
>  
>  		agfl = XFS_BUF_TO_AGFL(bp);
> +		if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +			agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
> +			agfl->agfl_seqno = cpu_to_be32(agno);
> +			uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_uuid);
> +		}
>  		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
>  			agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
>  
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 65c35d5..81d3cc5a 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1961,6 +1961,13 @@ xlog_recover_do_reg_buffer(
>  		}
>  		bp->b_ops = &xfs_agf_buf_ops;
>  		break;
> +	case XFS_BLF_AGFL_BUF:
> +		if (*(__be32 *)bp->b_addr != cpu_to_be32(XFS_AGFL_MAGIC)) {
> +			xfs_warn(mp, "Bad AGFL block magic!");
> +			ASSERT(0);
> +		}
> +		bp->b_ops = &xfs_agfl_buf_ops;
> +		break;

Your changes for v2 in this section look good.

>  	default:
>  		break;
>  	}
> -- 
> 1.7.10
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/9] xfs: add CRC checks to the AGFL
  2013-02-27 22:37   ` [PATCH 4/9] xfs: add CRC checks to the AGFL Ben Myers
@ 2013-02-27 23:20     ` Dave Chinner
  2013-02-27 23:31       ` Dave Chinner
  2013-02-27 23:32       ` Ben Myers
  0 siblings, 2 replies; 19+ messages in thread
From: Dave Chinner @ 2013-02-27 23:20 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Wed, Feb 27, 2013 at 04:37:50PM -0600, Ben Myers wrote:
> Hi Dave,
> 
> On Tue, Jan 22, 2013 at 12:25:55AM +1100, Dave Chinner wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > 
> > Add CRC checks, location information and a magic number to the AGFL.
> > Previously the AGFL was just a block containing nothing but the
> > free block pointers.  The new AGFL has a real header with the usual
> > boilerplate instead, so that we can verify it's not corrupted and
> > written into the right place.
> > 
> > [dchinner@redhat.com] Added LSN field, reworked significantly to fit
> > into new verifier structure and growfs structure, enabled full
> > verifier functionality now there is a header to verify and we can
> > guarantee an initialised AGFL.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> I have a couple comments below.
.....
> >  	for (i = 0; i < XFS_AGFL_SIZE(mp); i++) {
> > -		if (be32_to_cpu(agfl->agfl_bno[i]) == NULLAGBLOCK ||
> > +		if (be32_to_cpu(agfl->agfl_bno[i]) != NULLAGBLOCK &&
> >  		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
> 						   <

No, we are checking for the agbno being out of range here, not in
range.

The previous code (which was ifdef'd out) reflected the fact that
NULLAGBLOCK could not appear in a AGFL (initialised to zero, not
NULLAGBLOCK), For CRC enabled filesystems - where this check is run,
we guarantee that unused entries are initialised to NULLAGBLOCK by
mkfs and growfs, and this change reflects that.

> Any non NULLAGBLOCK should be less than m_sb.sb_agblocks, correct?

xfs_agblock_t is an unsigned value, therefore it has a value of
0xffffffff. be32-to_cpu() also returns an unsigned value.
So, no, is it never less than mp->m_sb.sb_agblocks.

But we don't want to rely on an implicit comparison against
mp->m_sb.sb_agblocks to detect this, and hence we *always* check
explicitly for it being a NULLAGBLOCK.

> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 65c35d5..81d3cc5a 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -1961,6 +1961,13 @@ xlog_recover_do_reg_buffer(
> >  		}
> >  		bp->b_ops = &xfs_agf_buf_ops;
> >  		break;
> > +	case XFS_BLF_AGFL_BUF:
> > +		if (*(__be32 *)bp->b_addr != cpu_to_be32(XFS_AGFL_MAGIC)) {
> > +			xfs_warn(mp, "Bad AGFL block magic!");
> > +			ASSERT(0);
> > +		}
> > +		bp->b_ops = &xfs_agfl_buf_ops;
> > +		break;
> 
> Your changes for v2 in this section look good.

Actually, the above hunk is broken. The magic number should only be
checked for CRC enabled filesystems. My current code has this check,
though I thought I fixed that long before I reposted this series...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/9] xfs: add CRC checks to the AGFL
  2013-02-27 23:20     ` Dave Chinner
@ 2013-02-27 23:31       ` Dave Chinner
  2013-02-27 23:35         ` Ben Myers
  2013-02-27 23:32       ` Ben Myers
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2013-02-27 23:31 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Thu, Feb 28, 2013 at 10:20:45AM +1100, Dave Chinner wrote:
> On Wed, Feb 27, 2013 at 04:37:50PM -0600, Ben Myers wrote:
> > Hi Dave,
....
> > > +	case XFS_BLF_AGFL_BUF:
> > > +		if (*(__be32 *)bp->b_addr != cpu_to_be32(XFS_AGFL_MAGIC)) {
> > > +			xfs_warn(mp, "Bad AGFL block magic!");
> > > +			ASSERT(0);
> > > +		}
> > > +		bp->b_ops = &xfs_agfl_buf_ops;
> > > +		break;
> > 
> > Your changes for v2 in this section look good.
> 
> Actually, the above hunk is broken. The magic number should only be
> checked for CRC enabled filesystems. My current code has this check,
> though I thought I fixed that long before I reposted this series...

I just realised you're commenting on the original version of the
patch series, no the new version I posted a couple of days ago. It
is fixed in that version of the patch, so I'm not going totally
crazy (yet!)....

Can you switch over to the newer version of the patch set?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/9] xfs: add CRC checks to the AGFL
  2013-02-27 23:20     ` Dave Chinner
  2013-02-27 23:31       ` Dave Chinner
@ 2013-02-27 23:32       ` Ben Myers
  1 sibling, 0 replies; 19+ messages in thread
From: Ben Myers @ 2013-02-27 23:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Feb 28, 2013 at 10:20:45AM +1100, Dave Chinner wrote:
> On Wed, Feb 27, 2013 at 04:37:50PM -0600, Ben Myers wrote:
> > Hi Dave,
> > 
> > On Tue, Jan 22, 2013 at 12:25:55AM +1100, Dave Chinner wrote:
> > > From: Christoph Hellwig <hch@lst.de>
> > > 
> > > Add CRC checks, location information and a magic number to the AGFL.
> > > Previously the AGFL was just a block containing nothing but the
> > > free block pointers.  The new AGFL has a real header with the usual
> > > boilerplate instead, so that we can verify it's not corrupted and
> > > written into the right place.
> > > 
> > > [dchinner@redhat.com] Added LSN field, reworked significantly to fit
> > > into new verifier structure and growfs structure, enabled full
> > > verifier functionality now there is a header to verify and we can
> > > guarantee an initialised AGFL.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > 
> > I have a couple comments below.
> .....
> > >  	for (i = 0; i < XFS_AGFL_SIZE(mp); i++) {
> > > -		if (be32_to_cpu(agfl->agfl_bno[i]) == NULLAGBLOCK ||
> > > +		if (be32_to_cpu(agfl->agfl_bno[i]) != NULLAGBLOCK &&
> > >  		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
> > 						   <
> 
> No, we are checking for the agbno being out of range here, not in
> range.

Ah.  Good.

> The previous code (which was ifdef'd out) reflected the fact that
> NULLAGBLOCK could not appear in a AGFL (initialised to zero, not
> NULLAGBLOCK), For CRC enabled filesystems - where this check is run,
> we guarantee that unused entries are initialised to NULLAGBLOCK by
> mkfs and growfs, and this change reflects that.
> 
> > Any non NULLAGBLOCK should be less than m_sb.sb_agblocks, correct?
> 
> xfs_agblock_t is an unsigned value, therefore it has a value of
> 0xffffffff. be32-to_cpu() also returns an unsigned value.
> So, no, is it never less than mp->m_sb.sb_agblocks.
> 
> But we don't want to rely on an implicit comparison against
> mp->m_sb.sb_agblocks to detect this, and hence we *always* check
> explicitly for it being a NULLAGBLOCK.
> 
> > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > index 65c35d5..81d3cc5a 100644
> > > --- a/fs/xfs/xfs_log_recover.c
> > > +++ b/fs/xfs/xfs_log_recover.c
> > > @@ -1961,6 +1961,13 @@ xlog_recover_do_reg_buffer(
> > >  		}
> > >  		bp->b_ops = &xfs_agf_buf_ops;
> > >  		break;
> > > +	case XFS_BLF_AGFL_BUF:
> > > +		if (*(__be32 *)bp->b_addr != cpu_to_be32(XFS_AGFL_MAGIC)) {
> > > +			xfs_warn(mp, "Bad AGFL block magic!");
> > > +			ASSERT(0);
> > > +		}
> > > +		bp->b_ops = &xfs_agfl_buf_ops;
> > > +		break;
> > 
> > Your changes for v2 in this section look good.
> 
> Actually, the above hunk is broken. The magic number should only be
> checked for CRC enabled filesystems. My current code has this check,
> though I thought I fixed that long before I reposted this series...

I was pointing out that the changes included in v2 of this patch in this
section of code look good.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/9] xfs: add CRC checks to the AGFL
  2013-02-27 23:31       ` Dave Chinner
@ 2013-02-27 23:35         ` Ben Myers
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Myers @ 2013-02-27 23:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave,

On Thu, Feb 28, 2013 at 10:31:55AM +1100, Dave Chinner wrote:
> On Thu, Feb 28, 2013 at 10:20:45AM +1100, Dave Chinner wrote:
> > On Wed, Feb 27, 2013 at 04:37:50PM -0600, Ben Myers wrote:
> > > Hi Dave,
> ....
> > > > +	case XFS_BLF_AGFL_BUF:
> > > > +		if (*(__be32 *)bp->b_addr != cpu_to_be32(XFS_AGFL_MAGIC)) {
> > > > +			xfs_warn(mp, "Bad AGFL block magic!");
> > > > +			ASSERT(0);
> > > > +		}
> > > > +		bp->b_ops = &xfs_agfl_buf_ops;
> > > > +		break;
> > > 
> > > Your changes for v2 in this section look good.
> > 
> > Actually, the above hunk is broken. The magic number should only be
> > checked for CRC enabled filesystems. My current code has this check,
> > though I thought I fixed that long before I reposted this series...
> 
> I just realised you're commenting on the original version of the
> patch series, no the new version I posted a couple of days ago. It
> is fixed in that version of the patch, so I'm not going totally
> crazy (yet!)....

Yep.  The new rev looks good.

> Can you switch over to the newer version of the patch set?

Will do.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/9] xfs: add support for large btree blocks
  2013-02-23  2:27       ` Dave Chinner
@ 2013-03-04 16:31         ` Ben Myers
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Myers @ 2013-03-04 16:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Sat, Feb 23, 2013 at 01:27:22PM +1100, Dave Chinner wrote:
> On Fri, Feb 22, 2013 at 12:34:56PM +1100, Dave Chinner wrote:
> > On Fri, Feb 15, 2013 at 03:20:15PM -0600, Ben Myers wrote:
> > > On Tue, Jan 22, 2013 at 12:25:53AM +1100, Dave Chinner wrote:
> > > > From: Christoph Hellwig <hch@lst.de>
> > > > 
> > > > Add support for larger btree blocks that contains a CRC32C checksum,
> > > > a filesystem uuid and block number for detecting filesystem
> > > > consistency and out of place writes.
> > > > 
> > > > [dchinner@redhat.com] Also include an owner field to allow reverse
> > > > mappings to be implemented for improved repairability and a LSN
> > > > field to so that log recovery can easily determine the last
> > > > modification that made it to disk for each buffer.
> > > > 
> > > > [dchinner@redhat.com] Add buffer log format flags to indicate the
> > > > type of buffer to recovery so that we don't have to do blind magic
> > > > number tests to determine what the buffer is.
> > > > 
> > > > [dchinner@redhat.com] Modified to fit into the verifier structure.
> > > 
> > > This patch is far too large for a good review.  It needs to be split up into
> > > it's various ideas which you outlined in patch 0.  If you need to add dead code
> > > in each piece and then enable it at the end, that's fine with me.
> > 
> > You want it broken up into 7 or 8 separate patches - what does it
> > gain us? It'll take a week for me to do the patch monkeying and to
> > retest and validate the resulting stack (i.e. it is bisectable, each
> > patch passes xfstests, etc), and in the end the code will be
> > identical.
> > 
> > As I've said before, there's a point where the tradeoff for
> > splitting patches up doesn't make sense. Asking a developer to do
> > days of work to end up with exactly the same code to save the
> > reviewer an hour or two is *not* a good tradeoff. Especially for the
> > first patch of a much larger 15-20 patch series which contains
> > several larger and more complex patches....
> 
> Ben, reading this back it comes across as unnecessarily negative and
> narky. I was just about at the end of the attribute leaf changes and
> it's been a mind-numbing slog making mechanical changes to lots of
> code.
> 
> This is after having to do the same slog through all the directory
> code. The block, leaf, node, freespace and da_btree code all had to
> get the same treatment, and it's worn me down.  My wrists are
> starting to hurt from the last three months of slogging through this
> stuff (roughly +20,000/-10,000 lines modified according to a quick
> diffstat) and that doesn't make me a happy camper.
> 
> So I'm not meaning to be narky or nasty - I just want to get this
> stuff done ASAP before it burns me out. Hence the prospect of having
> to go back and redo significant chunks to split out trivial pieces
> of code (i.e. a mind-numbing slog making mechanical changes) hit a
> bit of a raw nerve.
> 
> All I'm asking is that you take into account the extra load that the
> rework you ask me to do adds and whether it is absolutely necessary
> to be able to review the code. The last thing I want is be burnt out
> by this stuff....

Once I get back to it I'll consider whether to pull it apart myself.  I had a
hard time keeping track of everything that was going on in that patch.  And it
seems like whenever that happens I miss something important in my review.
Working through the rest of the series may also help to get more comfortable
with this patch.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9] xfs: metadata CRCs, kernel, first batch
  2013-02-23  2:38       ` Dave Chinner
@ 2013-03-04 16:33         ` Ben Myers
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Myers @ 2013-03-04 16:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Sat, Feb 23, 2013 at 01:38:18PM +1100, Dave Chinner wrote:
> On Fri, Feb 22, 2013 at 05:50:29PM -0600, Ben Myers wrote:
> > On Sat, Feb 23, 2013 at 10:12:17AM +1100, Dave Chinner wrote:
> > > On Fri, Feb 22, 2013 at 09:19:31AM -0600, Ben Myers wrote:
> > > This is mentioned in the patch zero description:
> > > 
> > > "This series makes is through to 001-092 in xfstests - there is a
> > > problem in the dquot verifier that causes log recovery of dquot
> > > buffers to follow a NULL pointer."
> > >
> > > Basically, mp->m_quotainfo is not initialised until after log
> > > recovery occurs, so this has to be detected in the verify/crc
> > > routines otherwise it goes splat like above. My current patch series
> > > has this fixed.
> > 
> > Cool.  Sorry for the extra noise.
> 
> No, that's fine. It tells me that you're actaully looking at the
> code and seeing what it does ;)
> 
> I'm close to having a new version of the kernel patchset out. I've
> just got to finish debugging the attribute changes I've made and
> move the superblock support patch to the end of the series and I'll
> post it out.
> 
> All I'm aiming for with the next version of the patch set is that
> existing filesystems (i.e. no CRCs) are regression free. I've
> actaully done very little CRC enabled testing while doing all the
> directory and attribute code changes, mainly because I can't test
> them properly until the userspace support is there. However, the
> patch set up to the final patch (i.e. everything but the attribute
> changes) seems to work just fine with the existing toolchain and
> xfstests.
> 
> Put simply, my focus for testing the patch set is currently "no
> regressions for existing users", not "CRCs work perfectly".

Sounds good.  I'll keep that in mind.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/9] xfs: add CRC checks to the AGI
       [not found] ` <1358774760-21841-6-git-send-email-david@fromorbit.com>
@ 2013-03-04 17:40   ` Ben Myers
  2013-03-04 17:41     ` Ben Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Myers @ 2013-03-04 17:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave,

On Tue, Jan 22, 2013 at 12:25:56AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dgc@sgi.com>
> 
> Same set of changes made to the AGF need to be made to the AGI.
> This patch has a similar history to the AGF, hence a similar
> sign-off chain.
> 
> Signed-off-by: Dave Chinner <dgc@sgi.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Dave Chinner <dgc@redhat.com>

This one looks fine to me.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/9] xfs: add CRC checks to the AGI
  2013-03-04 17:40   ` [PATCH 5/9] xfs: add CRC checks to the AGI Ben Myers
@ 2013-03-04 17:41     ` Ben Myers
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Myers @ 2013-03-04 17:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Mar 04, 2013 at 11:40:12AM -0600, Ben Myers wrote:
> On Tue, Jan 22, 2013 at 12:25:56AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dgc@sgi.com>
> > 
> > Same set of changes made to the AGF need to be made to the AGI.
> > This patch has a similar history to the AGF, hence a similar
> > sign-off chain.
> > 
> > Signed-off-by: Dave Chinner <dgc@sgi.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Dave Chinner <dgc@redhat.com>
> 
> This one looks fine to me.

Meh.  I meant to reply to the v2 posting.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-03-04 17:41 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1358774760-21841-1-git-send-email-david@fromorbit.com>
     [not found] ` <1358774760-21841-2-git-send-email-david@fromorbit.com>
2013-02-14 21:25   ` [PATCH 1/9] xfs: take inode version into account in XFS_LITINO Ben Myers
     [not found] ` <1358774760-21841-3-git-send-email-david@fromorbit.com>
2013-02-15 21:20   ` [PATCH 2/9] xfs: add support for large btree blocks Ben Myers
2013-02-22  1:34     ` Dave Chinner
2013-02-23  2:27       ` Dave Chinner
2013-03-04 16:31         ` Ben Myers
     [not found] ` <1358774760-21841-4-git-send-email-david@fromorbit.com>
2013-02-21 22:53   ` [PATCH 3/9] xfs: add CRC checks to the AGF Ben Myers
2013-02-22  1:41     ` Dave Chinner
2013-02-22 15:19 ` [PATCH 0/9] xfs: metadata CRCs, kernel, first batch Ben Myers
2013-02-22 23:12   ` Dave Chinner
2013-02-22 23:50     ` Ben Myers
2013-02-23  2:38       ` Dave Chinner
2013-03-04 16:33         ` Ben Myers
     [not found] ` <1358774760-21841-5-git-send-email-david@fromorbit.com>
2013-02-27 22:37   ` [PATCH 4/9] xfs: add CRC checks to the AGFL Ben Myers
2013-02-27 23:20     ` Dave Chinner
2013-02-27 23:31       ` Dave Chinner
2013-02-27 23:35         ` Ben Myers
2013-02-27 23:32       ` Ben Myers
     [not found] ` <1358774760-21841-6-git-send-email-david@fromorbit.com>
2013-03-04 17:40   ` [PATCH 5/9] xfs: add CRC checks to the AGI Ben Myers
2013-03-04 17:41     ` Ben Myers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.