* [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db +
@ 2013-11-13  6:40 Dave Chinner
  2013-11-13  6:40 ` [PATCH 01/36] xfsprogs: fix automatic dependency generation Dave Chinner
                   ` (36 more replies)
  0 siblings, 37 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

Hi folks,

This is the latest version of the xfs_db write support patch set.

Changes since V4:

- Added reviewed-by tags to all the reviewed patches
- Fixed the wrong subject line for patch 28
- fixed libxfs root inode handling (patch 15)
- folded db buffer unwinding on exit patch into the original IO
  rewrite patch (patch 20)

There are still a couple of unreviewed patches in the list - 20, 24,
31 and 37, so those are the blockers at this point. Patches 1-16
should be fine to commit as presented - they should be the same as
the last posting except for the addition of the Reviewed-by tags.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 01/36] xfsprogs: fix automatic dependency generation
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 02/36] xfs: fix some minor sparse warnings Dave Chinner
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Adding or removing a header file does not result in dependency
regeneration like it should. make clean will rebuild the
dependencies, but a normal make won't. Fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/buildrules | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/buildrules b/include/buildrules
index 49cb2a4..edb1beb 100644
--- a/include/buildrules
+++ b/include/buildrules
@@ -79,18 +79,30 @@ endif # _BUILDRULES_INCLUDED_
 $(_FORCE):
 
 # dependency build is automatic, relies on gcc -MM to generate.
+#
+# This is a bit messy. It regenerates the dependencies on each build so
+# that we catch files being added and removed. There are other ways of doing
+# this (e.g. per-file dependency files) but that requires more in-depth changes
+# to the build system. Compile time is not an issue for us, so the
+# rebuild on every make invocation isn't a problem we need to care about. Just
+# do it silently so it doesn't make the build unnecessarily noisy.
+
 .PHONY : depend ltdepend install-qa
 
 MAKEDEP := $(MAKEDEPEND) $(CFLAGS)
 
-ltdepend: .ltdep
+ltdepend: rmltdep .ltdep
+
+rmltdep:
+	@rm -f .ltdep
 
 .ltdep: $(CFILES) $(HFILES)
-	@echo "    [LTDEP]"
 	$(Q)$(MAKEDEP) $(CFILES) | $(SED) -e 's,^\([^:]*\)\.o,\1.lo,' > .ltdep
 
-depend: .dep
+depend: rmdep .dep
+
+rmdep:
+	@rm -f .dep
 
 .dep: $(CFILES) $(HFILES)
-	@echo "    [DEP]"
 	$(Q)$(MAKEDEP) $(CFILES) > .dep
-- 
1.8.4.rc3


* [PATCH 02/36] xfs: fix some minor sparse warnings
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
  2013-11-13  6:40 ` [PATCH 01/36] xfsprogs: fix automatic dependency generation Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 03/36] xfs: create a shared header file for format-related information Dave Chinner
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

A couple of simple locking annotations and 0 vs NULL warnings.
Nothing that changes any code behaviour, just removes build noise.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 2d480cc..7336abf 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -4415,7 +4415,7 @@ xfs_bmapi_write(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp;
-	struct xfs_bmalloca	bma = { 0 };	/* args for xfs_bmap_alloc */
+	struct xfs_bmalloca	bma = { NULL };	/* args for xfs_bmap_alloc */
 	xfs_fileoff_t		end;		/* end of mapped file region */
 	int			eof;		/* after the end of extents */
 	int			error;		/* error return */
-- 
1.8.4.rc3
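
The hunk above swaps { 0 } for { NULL }, which suggests that the first
member of struct xfs_bmalloca is a pointer: sparse complains when a plain
integer 0 is used as a null pointer constant. A minimal sketch of why the
change is behaviour-neutral, using a hypothetical struct demo_args (not the
real xfs_bmalloca layout): both initialisers leave every member zeroed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a struct whose first member is a pointer. */
struct demo_args {
	int		*firstblock;	/* pointer member comes first */
	unsigned long	offset;
	int		flags;
};

/* Return 1 if every member of *a is zero or NULL, 0 otherwise. */
int demo_args_is_zeroed(const struct demo_args *a)
{
	return a->firstblock == NULL && a->offset == 0 && a->flags == 0;
}

/* Initialise with { NULL }: members not listed are zero-initialised. */
int demo_init_with_null(void)
{
	struct demo_args a = { NULL };

	return demo_args_is_zeroed(&a);
}

/* Initialise with { 0 }: same object state, but sparse warns about it. */
int demo_init_with_zero(void)
{
	struct demo_args a = { 0 };

	return demo_args_is_zeroed(&a);
}
```

Both helpers return 1, so the choice of initialiser only affects what the
static checker reports, not the generated object.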


* [PATCH 03/36] xfs: create a shared header file for format-related information
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
  2013-11-13  6:40 ` [PATCH 01/36] xfsprogs: fix automatic dependency generation Dave Chinner
  2013-11-13  6:40 ` [PATCH 02/36] xfs: fix some minor sparse warnings Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 04/36] xfs: split dquot buffer operations out Dave Chinner
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

All of the buffer operations structures need to be exported for
xfs_db, so move them all to a common location rather than spreading
them all over the place. They verify the on-disk format, so
xfs_format.h might seem like a good place for them, but they are not
part of the on-disk format themselves.

Hence we need to create a new header file in which to centralise
these related definitions. Start by moving the buffer operations
structures, and then also move all the other definitions that have
crept into xfs_log_format.h and xfs_format.h as there was no other
shared header file to put them in.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/Makefile           |   1 +
 include/libxfs.h           |   1 +
 include/xfs_ag.h           |   4 -
 include/xfs_alloc.h        |   3 -
 include/xfs_alloc_btree.h  |   2 -
 include/xfs_attr_leaf.h    |   2 -
 include/xfs_bmap_btree.h   |   2 -
 include/xfs_da_btree.h     |   2 -
 include/xfs_format.h       |  10 --
 include/xfs_ialloc.h       |   2 -
 include/xfs_ialloc_btree.h |   2 -
 include/xfs_inode_buf.h    |   2 -
 include/xfs_log_format.h   | 177 --------------------------------
 include/xfs_sb.h           |   3 -
 include/xfs_shared.h       | 244 +++++++++++++++++++++++++++++++++++++++++++++
 15 files changed, 246 insertions(+), 211 deletions(-)
 create mode 100644 include/xfs_shared.h

diff --git a/include/Makefile b/include/Makefile
index dc6a8bb..6682b9d 100644
--- a/include/Makefile
+++ b/include/Makefile
@@ -40,6 +40,7 @@ QAHFILES = libxfs.h libxlog.h \
 	xfs_metadump.h \
 	xfs_quota_defs.h \
 	xfs_sb.h \
+	xfs_shared.h \
 	xfs_trace.h \
 	xfs_trans_resv.h \
 	xfs_trans_space.h
diff --git a/include/libxfs.h b/include/libxfs.h
index b837072..835ba37 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -37,6 +37,7 @@
 #include <xfs/xfs_fs.h>
 #include <xfs/xfs_arch.h>
 
+#include <xfs/xfs_shared.h>
 #include <xfs/xfs_format.h>
 #include <xfs/xfs_log_format.h>
 #include <xfs/xfs_quota_defs.h>
diff --git a/include/xfs_ag.h b/include/xfs_ag.h
index 1cb740a..3fc1098 100644
--- a/include/xfs_ag.h
+++ b/include/xfs_ag.h
@@ -128,8 +128,6 @@ typedef struct xfs_agf {
 extern int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
 			xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
 
-extern const struct xfs_buf_ops xfs_agf_buf_ops;
-
 /*
  * Size of the unlinked inode hash table in the agi.
  */
@@ -191,8 +189,6 @@ typedef struct xfs_agi {
 extern int xfs_read_agi(struct xfs_mount *mp, struct xfs_trans *tp,
 				xfs_agnumber_t agno, struct xfs_buf **bpp);
 
-extern const struct xfs_buf_ops xfs_agi_buf_ops;
-
 /*
  * The third a.g. block contains the a.g. freelist, an array
  * of block pointers to blocks owned by the allocation btree code.
diff --git a/include/xfs_alloc.h b/include/xfs_alloc.h
index 99d0a61..feacb06 100644
--- a/include/xfs_alloc.h
+++ b/include/xfs_alloc.h
@@ -231,7 +231,4 @@ xfs_alloc_get_rec(
 	xfs_extlen_t		*len,	/* output: length of extent */
 	int			*stat);	/* output: success/failure */
 
-extern const struct xfs_buf_ops xfs_agf_buf_ops;
-extern const struct xfs_buf_ops xfs_agfl_buf_ops;
-
 #endif	/* __XFS_ALLOC_H__ */
diff --git a/include/xfs_alloc_btree.h b/include/xfs_alloc_btree.h
index e3a3f74..72676c3 100644
--- a/include/xfs_alloc_btree.h
+++ b/include/xfs_alloc_btree.h
@@ -95,6 +95,4 @@ extern struct xfs_btree_cur *xfs_allocbt_init_cursor(struct xfs_mount *,
 		xfs_agnumber_t, xfs_btnum_t);
 extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int);
 
-extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
-
 #endif	/* __XFS_ALLOC_BTREE_H__ */
diff --git a/include/xfs_attr_leaf.h b/include/xfs_attr_leaf.h
index d9b148f..3ec5ec0 100644
--- a/include/xfs_attr_leaf.h
+++ b/include/xfs_attr_leaf.h
@@ -106,6 +106,4 @@ void	xfs_attr3_leaf_hdr_from_disk(struct xfs_attr3_icleaf_hdr *to,
 void	xfs_attr3_leaf_hdr_to_disk(struct xfs_attr_leafblock *to,
 				   struct xfs_attr3_icleaf_hdr *from);
 
-extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
-
 #endif	/* __XFS_ATTR_LEAF_H__ */
diff --git a/include/xfs_bmap_btree.h b/include/xfs_bmap_btree.h
index 1b726d6..e307978 100644
--- a/include/xfs_bmap_btree.h
+++ b/include/xfs_bmap_btree.h
@@ -239,6 +239,4 @@ extern int xfs_bmbt_maxrecs(struct xfs_mount *, int blocklen, int leaf);
 extern struct xfs_btree_cur *xfs_bmbt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_inode *, int);
 
-extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
-
 #endif	/* __XFS_BMAP_BTREE_H__ */
diff --git a/include/xfs_da_btree.h b/include/xfs_da_btree.h
index 9323b0e..e492dca 100644
--- a/include/xfs_da_btree.h
+++ b/include/xfs_da_btree.h
@@ -169,8 +169,6 @@ int	xfs_da3_node_read(struct xfs_trans *tp, struct xfs_inode *dp,
 			 xfs_dablk_t bno, xfs_daddr_t mappedbno,
 			 struct xfs_buf **bpp, int which_fork);
 
-extern const struct xfs_buf_ops xfs_da3_node_buf_ops;
-
 /*
  * Utility routines.
  */
diff --git a/include/xfs_format.h b/include/xfs_format.h
index 35c08ff..a790428 100644
--- a/include/xfs_format.h
+++ b/include/xfs_format.h
@@ -156,14 +156,4 @@ struct xfs_dsymlink_hdr {
 	((bufsize) - (xfs_sb_version_hascrc(&(mp)->m_sb) ? \
 			sizeof(struct xfs_dsymlink_hdr) : 0))
 
-int xfs_symlink_blocks(struct xfs_mount *mp, int pathlen);
-int xfs_symlink_hdr_set(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
-			uint32_t size, struct xfs_buf *bp);
-bool xfs_symlink_hdr_ok(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
-			uint32_t size, struct xfs_buf *bp);
-void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
-				 struct xfs_inode *ip, struct xfs_ifork *ifp);
-
-extern const struct xfs_buf_ops xfs_symlink_buf_ops;
-
 #endif /* __XFS_FORMAT_H__ */
diff --git a/include/xfs_ialloc.h b/include/xfs_ialloc.h
index 68c0732..1557798 100644
--- a/include/xfs_ialloc.h
+++ b/include/xfs_ialloc.h
@@ -158,6 +158,4 @@ int xfs_ialloc_inode_init(struct xfs_mount *mp, struct xfs_trans *tp,
 			  xfs_agnumber_t agno, xfs_agblock_t agbno,
 			  xfs_agblock_t length, unsigned int gen);
 
-extern const struct xfs_buf_ops xfs_agi_buf_ops;
-
 #endif	/* __XFS_IALLOC_H__ */
diff --git a/include/xfs_ialloc_btree.h b/include/xfs_ialloc_btree.h
index 3ac36b7..cfbfe46 100644
--- a/include/xfs_ialloc_btree.h
+++ b/include/xfs_ialloc_btree.h
@@ -110,6 +110,4 @@ extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_buf *, xfs_agnumber_t);
 extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
 
-extern const struct xfs_buf_ops xfs_inobt_buf_ops;
-
 #endif	/* __XFS_IALLOC_BTREE_H__ */
diff --git a/include/xfs_inode_buf.h b/include/xfs_inode_buf.h
index aae9fc4..e8fd3bd 100644
--- a/include/xfs_inode_buf.h
+++ b/include/xfs_inode_buf.h
@@ -47,6 +47,4 @@ void		xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
 #define	xfs_inobp_check(mp, bp)
 #endif /* DEBUG */
 
-extern const struct xfs_buf_ops xfs_inode_buf_ops;
-
 #endif	/* __XFS_INODE_BUF_H__ */
diff --git a/include/xfs_log_format.h b/include/xfs_log_format.h
index 31e3a06..aeaa715 100644
--- a/include/xfs_log_format.h
+++ b/include/xfs_log_format.h
@@ -234,178 +234,6 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_ICREATE,	"XFS_LI_ICREATE" }
 
 /*
- * Transaction types.  Used to distinguish types of buffers.
- */
-#define XFS_TRANS_SETATTR_NOT_SIZE	1
-#define XFS_TRANS_SETATTR_SIZE		2
-#define XFS_TRANS_INACTIVE		3
-#define XFS_TRANS_CREATE		4
-#define XFS_TRANS_CREATE_TRUNC		5
-#define XFS_TRANS_TRUNCATE_FILE		6
-#define XFS_TRANS_REMOVE		7
-#define XFS_TRANS_LINK			8
-#define XFS_TRANS_RENAME		9
-#define XFS_TRANS_MKDIR			10
-#define XFS_TRANS_RMDIR			11
-#define XFS_TRANS_SYMLINK		12
-#define XFS_TRANS_SET_DMATTRS		13
-#define XFS_TRANS_GROWFS		14
-#define XFS_TRANS_STRAT_WRITE		15
-#define XFS_TRANS_DIOSTRAT		16
-/* 17 was XFS_TRANS_WRITE_SYNC */
-#define	XFS_TRANS_WRITEID		18
-#define	XFS_TRANS_ADDAFORK		19
-#define	XFS_TRANS_ATTRINVAL		20
-#define	XFS_TRANS_ATRUNCATE		21
-#define	XFS_TRANS_ATTR_SET		22
-#define	XFS_TRANS_ATTR_RM		23
-#define	XFS_TRANS_ATTR_FLAG		24
-#define	XFS_TRANS_CLEAR_AGI_BUCKET	25
-#define XFS_TRANS_QM_SBCHANGE		26
-/*
- * Dummy entries since we use the transaction type to index into the
- * trans_type[] in xlog_recover_print_trans_head()
- */
-#define XFS_TRANS_DUMMY1		27
-#define XFS_TRANS_DUMMY2		28
-#define XFS_TRANS_QM_QUOTAOFF		29
-#define XFS_TRANS_QM_DQALLOC		30
-#define XFS_TRANS_QM_SETQLIM		31
-#define XFS_TRANS_QM_DQCLUSTER		32
-#define XFS_TRANS_QM_QINOCREATE		33
-#define XFS_TRANS_QM_QUOTAOFF_END	34
-#define XFS_TRANS_SB_UNIT		35
-#define XFS_TRANS_FSYNC_TS		36
-#define	XFS_TRANS_GROWFSRT_ALLOC	37
-#define	XFS_TRANS_GROWFSRT_ZERO		38
-#define	XFS_TRANS_GROWFSRT_FREE		39
-#define	XFS_TRANS_SWAPEXT		40
-#define	XFS_TRANS_SB_COUNT		41
-#define	XFS_TRANS_CHECKPOINT		42
-#define	XFS_TRANS_ICREATE		43
-#define	XFS_TRANS_TYPE_MAX		43
-/* new transaction types need to be reflected in xfs_logprint(8) */
-
-#define XFS_TRANS_TYPES \
-	{ XFS_TRANS_SETATTR_NOT_SIZE,	"SETATTR_NOT_SIZE" }, \
-	{ XFS_TRANS_SETATTR_SIZE,	"SETATTR_SIZE" }, \
-	{ XFS_TRANS_INACTIVE,		"INACTIVE" }, \
-	{ XFS_TRANS_CREATE,		"CREATE" }, \
-	{ XFS_TRANS_CREATE_TRUNC,	"CREATE_TRUNC" }, \
-	{ XFS_TRANS_TRUNCATE_FILE,	"TRUNCATE_FILE" }, \
-	{ XFS_TRANS_REMOVE,		"REMOVE" }, \
-	{ XFS_TRANS_LINK,		"LINK" }, \
-	{ XFS_TRANS_RENAME,		"RENAME" }, \
-	{ XFS_TRANS_MKDIR,		"MKDIR" }, \
-	{ XFS_TRANS_RMDIR,		"RMDIR" }, \
-	{ XFS_TRANS_SYMLINK,		"SYMLINK" }, \
-	{ XFS_TRANS_SET_DMATTRS,	"SET_DMATTRS" }, \
-	{ XFS_TRANS_GROWFS,		"GROWFS" }, \
-	{ XFS_TRANS_STRAT_WRITE,	"STRAT_WRITE" }, \
-	{ XFS_TRANS_DIOSTRAT,		"DIOSTRAT" }, \
-	{ XFS_TRANS_WRITEID,		"WRITEID" }, \
-	{ XFS_TRANS_ADDAFORK,		"ADDAFORK" }, \
-	{ XFS_TRANS_ATTRINVAL,		"ATTRINVAL" }, \
-	{ XFS_TRANS_ATRUNCATE,		"ATRUNCATE" }, \
-	{ XFS_TRANS_ATTR_SET,		"ATTR_SET" }, \
-	{ XFS_TRANS_ATTR_RM,		"ATTR_RM" }, \
-	{ XFS_TRANS_ATTR_FLAG,		"ATTR_FLAG" }, \
-	{ XFS_TRANS_CLEAR_AGI_BUCKET,	"CLEAR_AGI_BUCKET" }, \
-	{ XFS_TRANS_QM_SBCHANGE,	"QM_SBCHANGE" }, \
-	{ XFS_TRANS_QM_QUOTAOFF,	"QM_QUOTAOFF" }, \
-	{ XFS_TRANS_QM_DQALLOC,		"QM_DQALLOC" }, \
-	{ XFS_TRANS_QM_SETQLIM,		"QM_SETQLIM" }, \
-	{ XFS_TRANS_QM_DQCLUSTER,	"QM_DQCLUSTER" }, \
-	{ XFS_TRANS_QM_QINOCREATE,	"QM_QINOCREATE" }, \
-	{ XFS_TRANS_QM_QUOTAOFF_END,	"QM_QOFF_END" }, \
-	{ XFS_TRANS_SB_UNIT,		"SB_UNIT" }, \
-	{ XFS_TRANS_FSYNC_TS,		"FSYNC_TS" }, \
-	{ XFS_TRANS_GROWFSRT_ALLOC,	"GROWFSRT_ALLOC" }, \
-	{ XFS_TRANS_GROWFSRT_ZERO,	"GROWFSRT_ZERO" }, \
-	{ XFS_TRANS_GROWFSRT_FREE,	"GROWFSRT_FREE" }, \
-	{ XFS_TRANS_SWAPEXT,		"SWAPEXT" }, \
-	{ XFS_TRANS_SB_COUNT,		"SB_COUNT" }, \
-	{ XFS_TRANS_CHECKPOINT,		"CHECKPOINT" }, \
-	{ XFS_TRANS_DUMMY1,		"DUMMY1" }, \
-	{ XFS_TRANS_DUMMY2,		"DUMMY2" }, \
-	{ XLOG_UNMOUNT_REC_TYPE,	"UNMOUNT" }
-
-/*
- * This structure is used to track log items associated with
- * a transaction.  It points to the log item and keeps some
- * flags to track the state of the log item.  It also tracks
- * the amount of space needed to log the item it describes
- * once we get to commit processing (see xfs_trans_commit()).
- */
-struct xfs_log_item_desc {
-	struct xfs_log_item	*lid_item;
-	struct list_head	lid_trans;
-	unsigned char		lid_flags;
-};
-
-#define XFS_LID_DIRTY		0x1
-
-/*
- * Values for t_flags.
- */
-#define	XFS_TRANS_DIRTY		0x01	/* something needs to be logged */
-#define	XFS_TRANS_SB_DIRTY	0x02	/* superblock is modified */
-#define	XFS_TRANS_PERM_LOG_RES	0x04	/* xact took a permanent log res */
-#define	XFS_TRANS_SYNC		0x08	/* make commit synchronous */
-#define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
-#define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
-#define XFS_TRANS_FREEZE_PROT	0x40	/* Transaction has elevated writer
-					   count in superblock */
-
-/*
- * Values for call flags parameter.
- */
-#define	XFS_TRANS_RELEASE_LOG_RES	0x4
-#define	XFS_TRANS_ABORT			0x8
-
-/*
- * Field values for xfs_trans_mod_sb.
- */
-#define	XFS_TRANS_SB_ICOUNT		0x00000001
-#define	XFS_TRANS_SB_IFREE		0x00000002
-#define	XFS_TRANS_SB_FDBLOCKS		0x00000004
-#define	XFS_TRANS_SB_RES_FDBLOCKS	0x00000008
-#define	XFS_TRANS_SB_FREXTENTS		0x00000010
-#define	XFS_TRANS_SB_RES_FREXTENTS	0x00000020
-#define	XFS_TRANS_SB_DBLOCKS		0x00000040
-#define	XFS_TRANS_SB_AGCOUNT		0x00000080
-#define	XFS_TRANS_SB_IMAXPCT		0x00000100
-#define	XFS_TRANS_SB_REXTSIZE		0x00000200
-#define	XFS_TRANS_SB_RBMBLOCKS		0x00000400
-#define	XFS_TRANS_SB_RBLOCKS		0x00000800
-#define	XFS_TRANS_SB_REXTENTS		0x00001000
-#define	XFS_TRANS_SB_REXTSLOG		0x00002000
-
-/*
- * Here we centralize the specification of XFS meta-data buffer
- * reference count values.  This determine how hard the buffer
- * cache tries to hold onto the buffer.
- */
-#define	XFS_AGF_REF		4
-#define	XFS_AGI_REF		4
-#define	XFS_AGFL_REF		3
-#define	XFS_INO_BTREE_REF	3
-#define	XFS_ALLOC_BTREE_REF	2
-#define	XFS_BMAP_BTREE_REF	2
-#define	XFS_DIR_BTREE_REF	2
-#define	XFS_INO_REF		2
-#define	XFS_ATTR_BTREE_REF	1
-#define	XFS_DQUOT_REF		1
-
-/*
- * Flags for xfs_trans_ichgtime().
- */
-#define	XFS_ICHGTIME_MOD	0x1	/* data fork modification timestamp */
-#define	XFS_ICHGTIME_CHG	0x2	/* inode field change timestamp */
-#define	XFS_ICHGTIME_CREATE	0x4	/* inode create timestamp */
-
-
-/*
  * Inode Log Item Format definitions.
  *
  * This is the structure used to lay out an inode log item in the
@@ -793,7 +621,6 @@ typedef struct xfs_qoff_logformat {
 	char			qf_pad[12];	/* padding for future */
 } xfs_qoff_logformat_t;
 
-
 /*
  * Disk quotas status in m_qflags, and also sb_qflags. 16 bits.
  */
@@ -845,8 +672,4 @@ struct xfs_icreate_log {
 	__be32		icl_gen;	/* inode generation number to use */
 };
 
-int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
-int	xfs_log_calc_minimum_size(struct xfs_mount *);
-
-
 #endif /* __XFS_LOG_FORMAT_H__ */
diff --git a/include/xfs_sb.h b/include/xfs_sb.h
index 6835b44..35061d4 100644
--- a/include/xfs_sb.h
+++ b/include/xfs_sb.h
@@ -699,7 +699,4 @@ extern void	xfs_sb_from_disk(struct xfs_sb *, struct xfs_dsb *);
 extern void	xfs_sb_to_disk(struct xfs_dsb *, struct xfs_sb *, __int64_t);
 extern void	xfs_sb_quota_from_disk(struct xfs_sb *sbp);
 
-extern const struct xfs_buf_ops xfs_sb_buf_ops;
-extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
-
 #endif	/* __XFS_SB_H__ */
diff --git a/include/xfs_shared.h b/include/xfs_shared.h
new file mode 100644
index 0000000..63c94b1
--- /dev/null
+++ b/include/xfs_shared.h
@@ -0,0 +1,244 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef __XFS_SHARED_H__
+#define __XFS_SHARED_H__
+
+/*
+ * Definitions shared between kernel and userspace that don't fit into any other
+ * header file that is shared with userspace.
+ */
+struct xfs_ifork;
+struct xfs_buf;
+struct xfs_buf_ops;
+struct xfs_mount;
+struct xfs_trans;
+struct xfs_inode;
+
+/*
+ * Buffer verifier operations are widely used, including userspace tools
+ */
+extern const struct xfs_buf_ops xfs_agf_buf_ops;
+extern const struct xfs_buf_ops xfs_agi_buf_ops;
+extern const struct xfs_buf_ops xfs_agf_buf_ops;
+extern const struct xfs_buf_ops xfs_agfl_buf_ops;
+extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
+extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
+extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
+extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
+extern const struct xfs_buf_ops xfs_da3_node_buf_ops;
+extern const struct xfs_buf_ops xfs_dquot_buf_ops;
+extern const struct xfs_buf_ops xfs_symlink_buf_ops;
+extern const struct xfs_buf_ops xfs_agi_buf_ops;
+extern const struct xfs_buf_ops xfs_inobt_buf_ops;
+extern const struct xfs_buf_ops xfs_inode_buf_ops;
+extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
+extern const struct xfs_buf_ops xfs_dquot_buf_ops;
+extern const struct xfs_buf_ops xfs_sb_buf_ops;
+extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
+extern const struct xfs_buf_ops xfs_symlink_buf_ops;
+
+/*
+ * Transaction types.  Used to distinguish types of buffers. These never reach
+ * the log.
+ */
+#define XFS_TRANS_SETATTR_NOT_SIZE	1
+#define XFS_TRANS_SETATTR_SIZE		2
+#define XFS_TRANS_INACTIVE		3
+#define XFS_TRANS_CREATE		4
+#define XFS_TRANS_CREATE_TRUNC		5
+#define XFS_TRANS_TRUNCATE_FILE		6
+#define XFS_TRANS_REMOVE		7
+#define XFS_TRANS_LINK			8
+#define XFS_TRANS_RENAME		9
+#define XFS_TRANS_MKDIR			10
+#define XFS_TRANS_RMDIR			11
+#define XFS_TRANS_SYMLINK		12
+#define XFS_TRANS_SET_DMATTRS		13
+#define XFS_TRANS_GROWFS		14
+#define XFS_TRANS_STRAT_WRITE		15
+#define XFS_TRANS_DIOSTRAT		16
+/* 17 was XFS_TRANS_WRITE_SYNC */
+#define	XFS_TRANS_WRITEID		18
+#define	XFS_TRANS_ADDAFORK		19
+#define	XFS_TRANS_ATTRINVAL		20
+#define	XFS_TRANS_ATRUNCATE		21
+#define	XFS_TRANS_ATTR_SET		22
+#define	XFS_TRANS_ATTR_RM		23
+#define	XFS_TRANS_ATTR_FLAG		24
+#define	XFS_TRANS_CLEAR_AGI_BUCKET	25
+#define XFS_TRANS_QM_SBCHANGE		26
+/*
+ * Dummy entries since we use the transaction type to index into the
+ * trans_type[] in xlog_recover_print_trans_head()
+ */
+#define XFS_TRANS_DUMMY1		27
+#define XFS_TRANS_DUMMY2		28
+#define XFS_TRANS_QM_QUOTAOFF		29
+#define XFS_TRANS_QM_DQALLOC		30
+#define XFS_TRANS_QM_SETQLIM		31
+#define XFS_TRANS_QM_DQCLUSTER		32
+#define XFS_TRANS_QM_QINOCREATE		33
+#define XFS_TRANS_QM_QUOTAOFF_END	34
+#define XFS_TRANS_SB_UNIT		35
+#define XFS_TRANS_FSYNC_TS		36
+#define	XFS_TRANS_GROWFSRT_ALLOC	37
+#define	XFS_TRANS_GROWFSRT_ZERO		38
+#define	XFS_TRANS_GROWFSRT_FREE		39
+#define	XFS_TRANS_SWAPEXT		40
+#define	XFS_TRANS_SB_COUNT		41
+#define	XFS_TRANS_CHECKPOINT		42
+#define	XFS_TRANS_ICREATE		43
+#define	XFS_TRANS_TYPE_MAX		43
+/* new transaction types need to be reflected in xfs_logprint(8) */
+
+#define XFS_TRANS_TYPES \
+	{ XFS_TRANS_SETATTR_NOT_SIZE,	"SETATTR_NOT_SIZE" }, \
+	{ XFS_TRANS_SETATTR_SIZE,	"SETATTR_SIZE" }, \
+	{ XFS_TRANS_INACTIVE,		"INACTIVE" }, \
+	{ XFS_TRANS_CREATE,		"CREATE" }, \
+	{ XFS_TRANS_CREATE_TRUNC,	"CREATE_TRUNC" }, \
+	{ XFS_TRANS_TRUNCATE_FILE,	"TRUNCATE_FILE" }, \
+	{ XFS_TRANS_REMOVE,		"REMOVE" }, \
+	{ XFS_TRANS_LINK,		"LINK" }, \
+	{ XFS_TRANS_RENAME,		"RENAME" }, \
+	{ XFS_TRANS_MKDIR,		"MKDIR" }, \
+	{ XFS_TRANS_RMDIR,		"RMDIR" }, \
+	{ XFS_TRANS_SYMLINK,		"SYMLINK" }, \
+	{ XFS_TRANS_SET_DMATTRS,	"SET_DMATTRS" }, \
+	{ XFS_TRANS_GROWFS,		"GROWFS" }, \
+	{ XFS_TRANS_STRAT_WRITE,	"STRAT_WRITE" }, \
+	{ XFS_TRANS_DIOSTRAT,		"DIOSTRAT" }, \
+	{ XFS_TRANS_WRITEID,		"WRITEID" }, \
+	{ XFS_TRANS_ADDAFORK,		"ADDAFORK" }, \
+	{ XFS_TRANS_ATTRINVAL,		"ATTRINVAL" }, \
+	{ XFS_TRANS_ATRUNCATE,		"ATRUNCATE" }, \
+	{ XFS_TRANS_ATTR_SET,		"ATTR_SET" }, \
+	{ XFS_TRANS_ATTR_RM,		"ATTR_RM" }, \
+	{ XFS_TRANS_ATTR_FLAG,		"ATTR_FLAG" }, \
+	{ XFS_TRANS_CLEAR_AGI_BUCKET,	"CLEAR_AGI_BUCKET" }, \
+	{ XFS_TRANS_QM_SBCHANGE,	"QM_SBCHANGE" }, \
+	{ XFS_TRANS_QM_QUOTAOFF,	"QM_QUOTAOFF" }, \
+	{ XFS_TRANS_QM_DQALLOC,		"QM_DQALLOC" }, \
+	{ XFS_TRANS_QM_SETQLIM,		"QM_SETQLIM" }, \
+	{ XFS_TRANS_QM_DQCLUSTER,	"QM_DQCLUSTER" }, \
+	{ XFS_TRANS_QM_QINOCREATE,	"QM_QINOCREATE" }, \
+	{ XFS_TRANS_QM_QUOTAOFF_END,	"QM_QOFF_END" }, \
+	{ XFS_TRANS_SB_UNIT,		"SB_UNIT" }, \
+	{ XFS_TRANS_FSYNC_TS,		"FSYNC_TS" }, \
+	{ XFS_TRANS_GROWFSRT_ALLOC,	"GROWFSRT_ALLOC" }, \
+	{ XFS_TRANS_GROWFSRT_ZERO,	"GROWFSRT_ZERO" }, \
+	{ XFS_TRANS_GROWFSRT_FREE,	"GROWFSRT_FREE" }, \
+	{ XFS_TRANS_SWAPEXT,		"SWAPEXT" }, \
+	{ XFS_TRANS_SB_COUNT,		"SB_COUNT" }, \
+	{ XFS_TRANS_CHECKPOINT,		"CHECKPOINT" }, \
+	{ XFS_TRANS_DUMMY1,		"DUMMY1" }, \
+	{ XFS_TRANS_DUMMY2,		"DUMMY2" }, \
+	{ XLOG_UNMOUNT_REC_TYPE,	"UNMOUNT" }
+
+/*
+ * This structure is used to track log items associated with
+ * a transaction.  It points to the log item and keeps some
+ * flags to track the state of the log item.  It also tracks
+ * the amount of space needed to log the item it describes
+ * once we get to commit processing (see xfs_trans_commit()).
+ */
+struct xfs_log_item_desc {
+	struct xfs_log_item	*lid_item;
+	struct list_head	lid_trans;
+	unsigned char		lid_flags;
+};
+
+#define XFS_LID_DIRTY		0x1
+
+/* log size calculation functions */
+int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
+int	xfs_log_calc_minimum_size(struct xfs_mount *);
+
+
+/*
+ * Values for t_flags.
+ */
+#define	XFS_TRANS_DIRTY		0x01	/* something needs to be logged */
+#define	XFS_TRANS_SB_DIRTY	0x02	/* superblock is modified */
+#define	XFS_TRANS_PERM_LOG_RES	0x04	/* xact took a permanent log res */
+#define	XFS_TRANS_SYNC		0x08	/* make commit synchronous */
+#define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
+#define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
+#define XFS_TRANS_FREEZE_PROT	0x40	/* Transaction has elevated writer
+					   count in superblock */
+/*
+ * Values for call flags parameter.
+ */
+#define	XFS_TRANS_RELEASE_LOG_RES	0x4
+#define	XFS_TRANS_ABORT			0x8
+
+/*
+ * Field values for xfs_trans_mod_sb.
+ */
+#define	XFS_TRANS_SB_ICOUNT		0x00000001
+#define	XFS_TRANS_SB_IFREE		0x00000002
+#define	XFS_TRANS_SB_FDBLOCKS		0x00000004
+#define	XFS_TRANS_SB_RES_FDBLOCKS	0x00000008
+#define	XFS_TRANS_SB_FREXTENTS		0x00000010
+#define	XFS_TRANS_SB_RES_FREXTENTS	0x00000020
+#define	XFS_TRANS_SB_DBLOCKS		0x00000040
+#define	XFS_TRANS_SB_AGCOUNT		0x00000080
+#define	XFS_TRANS_SB_IMAXPCT		0x00000100
+#define	XFS_TRANS_SB_REXTSIZE		0x00000200
+#define	XFS_TRANS_SB_RBMBLOCKS		0x00000400
+#define	XFS_TRANS_SB_RBLOCKS		0x00000800
+#define	XFS_TRANS_SB_REXTENTS		0x00001000
+#define	XFS_TRANS_SB_REXTSLOG		0x00002000
+
+/*
+ * Here we centralize the specification of XFS meta-data buffer reference count
+ * values.  This determine how hard the buffer cache tries to hold onto the
+ * buffer.
+ */
+#define	XFS_AGF_REF		4
+#define	XFS_AGI_REF		4
+#define	XFS_AGFL_REF		3
+#define	XFS_INO_BTREE_REF	3
+#define	XFS_ALLOC_BTREE_REF	2
+#define	XFS_BMAP_BTREE_REF	2
+#define	XFS_DIR_BTREE_REF	2
+#define	XFS_INO_REF		2
+#define	XFS_ATTR_BTREE_REF	1
+#define	XFS_DQUOT_REF		1
+
+/*
+ * Flags for xfs_trans_ichgtime().
+ */
+#define	XFS_ICHGTIME_MOD	0x1	/* data fork modification timestamp */
+#define	XFS_ICHGTIME_CHG	0x2	/* inode field change timestamp */
+#define	XFS_ICHGTIME_CREATE	0x4	/* inode create timestamp */
+
+
+/*
+ * Symlink decoding/encoding functions
+ */
+int xfs_symlink_blocks(struct xfs_mount *mp, int pathlen);
+int xfs_symlink_hdr_set(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
+			uint32_t size, struct xfs_buf *bp);
+bool xfs_symlink_hdr_ok(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
+			uint32_t size, struct xfs_buf *bp);
+void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
+				 struct xfs_inode *ip, struct xfs_ifork *ifp);
+
+#endif /* __XFS_SHARED_H__ */
-- 
1.8.4.rc3
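
XFS_TRANS_TYPES, moved into xfs_shared.h above, is a comma-separated list
of { value, "name" } pairs, so it can be dropped straight into an array
initialiser. A minimal sketch of one way such a table could be consumed,
assuming a hypothetical lookup helper; the DEMO_ names, struct
demo_type_name and demo_trans_name() are illustrative only and are not part
of the patch. The three values are copied from the list above.

```c
#include <stddef.h>

/* Three values copied from the patch; the DEMO_ prefix is hypothetical. */
#define DEMO_TRANS_CREATE	4
#define DEMO_TRANS_REMOVE	7
#define DEMO_TRANS_LINK		8

/* Same shape as XFS_TRANS_TYPES: comma-separated { value, "name" } pairs. */
#define DEMO_TRANS_TYPES \
	{ DEMO_TRANS_CREATE,	"CREATE" }, \
	{ DEMO_TRANS_REMOVE,	"REMOVE" }, \
	{ DEMO_TRANS_LINK,	"LINK" }

struct demo_type_name {
	int		value;
	const char	*name;
};

/* The macro expands directly into the array initialiser. */
static const struct demo_type_name demo_trans_names[] = { DEMO_TRANS_TYPES };

/* Linear search of the table; returns NULL for an unknown type. */
const char *demo_trans_name(int type)
{
	size_t i;
	size_t n = sizeof(demo_trans_names) / sizeof(demo_trans_names[0]);

	for (i = 0; i < n; i++) {
		if (demo_trans_names[i].value == type)
			return demo_trans_names[i].name;
	}
	return NULL;
}
```

Searching by value rather than indexing by it means gaps in the numbering
(17 is unused in the list above) need no placeholder entries in this sketch.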


* [PATCH 04/36] xfs: split dquot buffer operations out
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (2 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 03/36] xfs: create a shared header file for format-related information Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 05/36] xfs: decouple inode and bmap btree header files Dave Chinner
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Parts of userspace want to be able to read and modify dquot buffers
(e.g. xfs_db), so we need to split out the reading and writing of
these buffers so it is easy to share code with libxfs in userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h       |   9 ++
 libxfs/Makefile        |   1 +
 libxfs/xfs_dquot_buf.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 283 insertions(+)
 create mode 100644 libxfs/xfs_dquot_buf.c

diff --git a/include/libxfs.h b/include/libxfs.h
index 835ba37..f10ab59 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -216,6 +216,15 @@ typedef struct xfs_mount {
 	xfs_dablk_t		m_dirdatablk;	/* blockno of dir data v2 */
 	xfs_dablk_t		m_dirleafblk;	/* blockno of dir non-data v2 */
 	xfs_dablk_t		m_dirfreeblk;	/* blockno of dirfreeindex v2 */
+
+	/*
+	 * anonymous struct to allow xfs_dquot_buf.c to compile.
+	 * Pointer is always null in userspace, so code does not use it at all
+	 */
+	struct {
+		int	qi_dqperchunk;
+	}			*m_quotainfo;
+
 } xfs_mount_t;
 
 /*
diff --git a/libxfs/Makefile b/libxfs/Makefile
index f0cbae3..4522218 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -29,6 +29,7 @@ CFILES = cache.c \
 	xfs_dir2_leaf.c \
 	xfs_dir2_node.c \
 	xfs_dir2_sf.c \
+	xfs_dquot_buf.c \
 	xfs_ialloc.c \
 	xfs_inode_buf.c \
 	xfs_inode_fork.c \
diff --git a/libxfs/xfs_dquot_buf.c b/libxfs/xfs_dquot_buf.c
new file mode 100644
index 0000000..620d9d3
--- /dev/null
+++ b/libxfs/xfs_dquot_buf.c
@@ -0,0 +1,273 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+
+int
+xfs_calc_dquots_per_chunk(
+	struct xfs_mount	*mp,
+	unsigned int		nbblks)	/* basic block units */
+{
+	unsigned int	ndquots;
+
+	ASSERT(nbblks > 0);
+	ndquots = BBTOB(nbblks);
+	do_div(ndquots, sizeof(xfs_dqblk_t));
+
+	return ndquots;
+}
+
+/*
+ * Do some primitive error checking on ondisk dquot data structures.
+ */
+int
+xfs_dqcheck(
+	struct xfs_mount *mp,
+	xfs_disk_dquot_t *ddq,
+	xfs_dqid_t	 id,
+	uint		 type,	  /* used only when IO_dorepair is true */
+	uint		 flags,
+	char		 *str)
+{
+	xfs_dqblk_t	 *d = (xfs_dqblk_t *)ddq;
+	int		errs = 0;
+
+	/*
+	 * We can encounter an uninitialized dquot buffer for 2 reasons:
+	 * 1. If we crash while deleting the quotainode(s), and those blks got
+	 *    used for user data. This is because we take the path of regular
+	 *    file deletion; however, the size field of quotainodes is never
+	 *    updated, so all the tricks that we play in itruncate_finish
+	 *    don't quite matter.
+	 *
+	 * 2. We don't play the quota buffers when there's a quotaoff logitem.
+	 *    But the allocation will be replayed so we'll end up with an
+	 *    uninitialized quota block.
+	 *
+	 * This is all fine; things are still consistent, and we haven't lost
+	 * any quota information. Just don't complain about bad dquot blks.
+	 */
+	if (ddq->d_magic != cpu_to_be16(XFS_DQUOT_MAGIC)) {
+		if (flags & XFS_QMOPT_DOWARN)
+			xfs_alert(mp,
+			"%s : XFS dquot ID 0x%x, magic 0x%x != 0x%x",
+			str, id, be16_to_cpu(ddq->d_magic), XFS_DQUOT_MAGIC);
+		errs++;
+	}
+	if (ddq->d_version != XFS_DQUOT_VERSION) {
+		if (flags & XFS_QMOPT_DOWARN)
+			xfs_alert(mp,
+			"%s : XFS dquot ID 0x%x, version 0x%x != 0x%x",
+			str, id, ddq->d_version, XFS_DQUOT_VERSION);
+		errs++;
+	}
+
+	if (ddq->d_flags != XFS_DQ_USER &&
+	    ddq->d_flags != XFS_DQ_PROJ &&
+	    ddq->d_flags != XFS_DQ_GROUP) {
+		if (flags & XFS_QMOPT_DOWARN)
+			xfs_alert(mp,
+			"%s : XFS dquot ID 0x%x, unknown flags 0x%x",
+			str, id, ddq->d_flags);
+		errs++;
+	}
+
+	if (id != -1 && id != be32_to_cpu(ddq->d_id)) {
+		if (flags & XFS_QMOPT_DOWARN)
+			xfs_alert(mp,
+			"%s : ondisk-dquot 0x%p, ID mismatch: "
+			"0x%x expected, found id 0x%x",
+			str, ddq, id, be32_to_cpu(ddq->d_id));
+		errs++;
+	}
+
+	if (!errs && ddq->d_id) {
+		if (ddq->d_blk_softlimit &&
+		    be64_to_cpu(ddq->d_bcount) >
+				be64_to_cpu(ddq->d_blk_softlimit)) {
+			if (!ddq->d_btimer) {
+				if (flags & XFS_QMOPT_DOWARN)
+					xfs_alert(mp,
+			"%s : Dquot ID 0x%x (0x%p) BLK TIMER NOT STARTED",
+					str, (int)be32_to_cpu(ddq->d_id), ddq);
+				errs++;
+			}
+		}
+		if (ddq->d_ino_softlimit &&
+		    be64_to_cpu(ddq->d_icount) >
+				be64_to_cpu(ddq->d_ino_softlimit)) {
+			if (!ddq->d_itimer) {
+				if (flags & XFS_QMOPT_DOWARN)
+					xfs_alert(mp,
+			"%s : Dquot ID 0x%x (0x%p) INODE TIMER NOT STARTED",
+					str, (int)be32_to_cpu(ddq->d_id), ddq);
+				errs++;
+			}
+		}
+		if (ddq->d_rtb_softlimit &&
+		    be64_to_cpu(ddq->d_rtbcount) >
+				be64_to_cpu(ddq->d_rtb_softlimit)) {
+			if (!ddq->d_rtbtimer) {
+				if (flags & XFS_QMOPT_DOWARN)
+					xfs_alert(mp,
+			"%s : Dquot ID 0x%x (0x%p) RTBLK TIMER NOT STARTED",
+					str, (int)be32_to_cpu(ddq->d_id), ddq);
+				errs++;
+			}
+		}
+	}
+
+	if (!errs || !(flags & XFS_QMOPT_DQREPAIR))
+		return errs;
+
+	if (flags & XFS_QMOPT_DOWARN)
+		xfs_notice(mp, "Re-initializing dquot ID 0x%x", id);
+
+	/*
+	 * Typically, a repair is only requested by quotacheck.
+	 */
+	ASSERT(id != -1);
+	ASSERT(flags & XFS_QMOPT_DQREPAIR);
+	memset(d, 0, sizeof(xfs_dqblk_t));
+
+	d->dd_diskdq.d_magic = cpu_to_be16(XFS_DQUOT_MAGIC);
+	d->dd_diskdq.d_version = XFS_DQUOT_VERSION;
+	d->dd_diskdq.d_flags = type;
+	d->dd_diskdq.d_id = cpu_to_be32(id);
+
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		uuid_copy(&d->dd_uuid, &mp->m_sb.sb_uuid);
+		xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
+	}
+
+	return errs;
+}
+
+STATIC bool
+xfs_dquot_buf_verify_crc(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp)
+{
+	struct xfs_dqblk	*d = (struct xfs_dqblk *)bp->b_addr;
+	int			ndquots;
+	int			i;
+
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return true;
+
+	/*
+	 * if we are in log recovery, the quota subsystem has not been
+	 * initialised so we have no quotainfo structure. In that case, we need
+	 * to manually calculate the number of dquots in the buffer.
+	 */
+	if (mp->m_quotainfo)
+		ndquots = mp->m_quotainfo->qi_dqperchunk;
+	else
+		ndquots = xfs_calc_dquots_per_chunk(mp,
+					XFS_BB_TO_FSB(mp, bp->b_length));
+
+	for (i = 0; i < ndquots; i++, d++) {
+		if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF))
+			return false;
+		if (!uuid_equal(&d->dd_uuid, &mp->m_sb.sb_uuid))
+			return false;
+	}
+	return true;
+}
+
+STATIC bool
+xfs_dquot_buf_verify(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp)
+{
+	struct xfs_dqblk	*d = (struct xfs_dqblk *)bp->b_addr;
+	xfs_dqid_t		id = 0;
+	int			ndquots;
+	int			i;
+
+	/*
+	 * if we are in log recovery, the quota subsystem has not been
+	 * initialised so we have no quotainfo structure. In that case, we need
+	 * to manually calculate the number of dquots in the buffer.
+	 */
+	if (mp->m_quotainfo)
+		ndquots = mp->m_quotainfo->qi_dqperchunk;
+	else
+		ndquots = xfs_calc_dquots_per_chunk(mp, bp->b_length);
+
+	/*
+	 * On the first read of the buffer, verify that each dquot is valid.
+	 * We don't know what the id of the dquot is supposed to be, just that
+	 * they should be increasing monotonically within the buffer. If the
+	 * first id is corrupt, then it will fail on the second dquot in the
+	 * buffer so corruptions could point to the wrong dquot in this case.
+	 */
+	for (i = 0; i < ndquots; i++) {
+		struct xfs_disk_dquot	*ddq;
+		int			error;
+
+		ddq = &d[i].dd_diskdq;
+
+		if (i == 0)
+			id = be32_to_cpu(ddq->d_id);
+
+		error = xfs_dqcheck(mp, ddq, id + i, 0, XFS_QMOPT_DOWARN,
+				       "xfs_dquot_buf_verify");
+		if (error)
+			return false;
+	}
+	return true;
+}
+
+static void
+xfs_dquot_buf_read_verify(
+	struct xfs_buf	*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+
+	if (!xfs_dquot_buf_verify_crc(mp, bp) || !xfs_dquot_buf_verify(mp, bp)) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+		xfs_buf_ioerror(bp, EFSCORRUPTED);
+	}
+}
+
+/*
+ * we don't calculate the CRC here as that is done when the dquot is flushed to
+ * the buffer after the update is done. This ensures that the dquot in the
+ * buffer always has an up-to-date CRC value.
+ */
+void
+xfs_dquot_buf_write_verify(
+	struct xfs_buf	*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+
+	if (!xfs_dquot_buf_verify(mp, bp)) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+		xfs_buf_ioerror(bp, EFSCORRUPTED);
+		return;
+	}
+}
+
+const struct xfs_buf_ops xfs_dquot_buf_ops = {
+	.verify_read = xfs_dquot_buf_read_verify,
+	.verify_write = xfs_dquot_buf_write_verify,
+};
+
-- 
1.8.4.rc3


* [PATCH 05/36] xfs: decouple inode and bmap btree header files
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (3 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 04/36] xfs: split dquot buffer operations out Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 06/36] libxfs: unify xfs_btree.c with kernel code Dave Chinner
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in its definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

This enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.
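
One of the on-disk definitions this patch moves into xfs_format.h is
the packed BMAP btree record, whose comment describes the bit layout of
the l0/l1 words. The sketch below shows how that layout could be packed
and unpacked. It assumes the two words have already been converted to
host byte order and that bit 0 is the least significant bit. The toy_*
names are made up for this example and are not the libxfs helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STARTOFF_MASK	((UINT64_C(1) << 54) - 1)	/* l0:9-62 */
#define TOY_BLOCKCOUNT_MASK	((UINT64_C(1) << 21) - 1)	/* l1:0-20 */
#define TOY_STARTBLOCK_HI_MASK	UINT64_C(0x1ff)			/* l0:0-8 */

/* Hypothetical unpacked form of one extent record. */
struct toy_bmbt_irec {
	uint64_t	startoff;	/* 54 bits */
	uint64_t	startblock;	/* 52 bits */
	uint64_t	blockcount;	/* 21 bits */
	int		unwritten;	/* extent flag, l0:63 */
};

/* Pack the fields into two host-endian 64-bit words. */
static void
toy_bmbt_pack(const struct toy_bmbt_irec *r, uint64_t *l0, uint64_t *l1)
{
	*l0 = ((uint64_t)(r->unwritten != 0) << 63) |
	      ((r->startoff & TOY_STARTOFF_MASK) << 9) |
	      ((r->startblock >> 43) & TOY_STARTBLOCK_HI_MASK);
	/* The shift discards the top 9 startblock bits stored in l0. */
	*l1 = (r->startblock << 21) |
	      (r->blockcount & TOY_BLOCKCOUNT_MASK);
}

/* Reverse of toy_bmbt_pack(). */
static struct toy_bmbt_irec
toy_bmbt_unpack(uint64_t l0, uint64_t l1)
{
	struct toy_bmbt_irec r;

	r.unwritten = (int)(l0 >> 63);
	r.startoff = (l0 >> 9) & TOY_STARTOFF_MASK;
	r.startblock = ((l0 & TOY_STARTBLOCK_HI_MASK) << 43) | (l1 >> 21);
	r.blockcount = l1 & TOY_BLOCKCOUNT_MASK;
	return r;
}
```

On disk the words are big-endian (__be64), so real code would convert
them before applying these shifts.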

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_alloc_btree.h  |  33 ------
 include/xfs_bmap_btree.h   | 103 -----------------
 include/xfs_btree.h        |  80 -------------
 include/xfs_format.h       | 281 +++++++++++++++++++++++++++++++++++++++++++++
 include/xfs_ialloc.h       |   3 +-
 include/xfs_ialloc_btree.h |  49 --------
 include/xfs_inode_fork.h   |   1 +
 7 files changed, 284 insertions(+), 266 deletions(-)

diff --git a/include/xfs_alloc_btree.h b/include/xfs_alloc_btree.h
index 72676c3..45e189e 100644
--- a/include/xfs_alloc_btree.h
+++ b/include/xfs_alloc_btree.h
@@ -27,39 +27,6 @@ struct xfs_btree_cur;
 struct xfs_mount;
 
 /*
- * There are two on-disk btrees, one sorted by blockno and one sorted
- * by blockcount and blockno.  All blocks look the same to make the code
- * simpler; if we have time later, we'll make the optimizations.
- */
-#define	XFS_ABTB_MAGIC		0x41425442	/* 'ABTB' for bno tree */
-#define	XFS_ABTB_CRC_MAGIC	0x41423342	/* 'AB3B' */
-#define	XFS_ABTC_MAGIC		0x41425443	/* 'ABTC' for cnt tree */
-#define	XFS_ABTC_CRC_MAGIC	0x41423343	/* 'AB3C' */
-
-/*
- * Data record/key structure
- */
-typedef struct xfs_alloc_rec {
-	__be32		ar_startblock;	/* starting block number */
-	__be32		ar_blockcount;	/* count of free blocks */
-} xfs_alloc_rec_t, xfs_alloc_key_t;
-
-typedef struct xfs_alloc_rec_incore {
-	xfs_agblock_t	ar_startblock;	/* starting block number */
-	xfs_extlen_t	ar_blockcount;	/* count of free blocks */
-} xfs_alloc_rec_incore_t;
-
-/* btree pointer type */
-typedef __be32 xfs_alloc_ptr_t;
-
-/*
- * Block numbers in the AG:
- * SB is sector 0, AGF is sector 1, AGI is sector 2, AGFL is sector 3.
- */
-#define	XFS_BNO_BLOCK(mp)	((xfs_agblock_t)(XFS_AGFL_BLOCK(mp) + 1))
-#define	XFS_CNT_BLOCK(mp)	((xfs_agblock_t)(XFS_BNO_BLOCK(mp) + 1))
-
-/*
  * Btree block header size depends on a superblock flag.
  */
 #define XFS_ALLOC_BLOCK_LEN(mp) \
diff --git a/include/xfs_bmap_btree.h b/include/xfs_bmap_btree.h
index e307978..2379d33 100644
--- a/include/xfs_bmap_btree.h
+++ b/include/xfs_bmap_btree.h
@@ -18,9 +18,6 @@
 #ifndef __XFS_BMAP_BTREE_H__
 #define __XFS_BMAP_BTREE_H__
 
-#define XFS_BMAP_MAGIC		0x424d4150	/* 'BMAP' */
-#define XFS_BMAP_CRC_MAGIC	0x424d4133	/* 'BMA3' */
-
 struct xfs_btree_cur;
 struct xfs_btree_block;
 struct xfs_mount;
@@ -28,85 +25,6 @@ struct xfs_inode;
 struct xfs_trans;
 
 /*
- * Bmap root header, on-disk form only.
- */
-typedef struct xfs_bmdr_block {
-	__be16		bb_level;	/* 0 is a leaf */
-	__be16		bb_numrecs;	/* current # of data records */
-} xfs_bmdr_block_t;
-
-/*
- * Bmap btree record and extent descriptor.
- *  l0:63 is an extent flag (value 1 indicates non-normal).
- *  l0:9-62 are startoff.
- *  l0:0-8 and l1:21-63 are startblock.
- *  l1:0-20 are blockcount.
- */
-#define BMBT_EXNTFLAG_BITLEN	1
-#define BMBT_STARTOFF_BITLEN	54
-#define BMBT_STARTBLOCK_BITLEN	52
-#define BMBT_BLOCKCOUNT_BITLEN	21
-
-typedef struct xfs_bmbt_rec {
-	__be64			l0, l1;
-} xfs_bmbt_rec_t;
-
-typedef __uint64_t	xfs_bmbt_rec_base_t;	/* use this for casts */
-typedef xfs_bmbt_rec_t xfs_bmdr_rec_t;
-
-typedef struct xfs_bmbt_rec_host {
-	__uint64_t		l0, l1;
-} xfs_bmbt_rec_host_t;
-
-/*
- * Values and macros for delayed-allocation startblock fields.
- */
-#define STARTBLOCKVALBITS	17
-#define STARTBLOCKMASKBITS	(15 + XFS_BIG_BLKNOS * 20)
-#define DSTARTBLOCKMASKBITS	(15 + 20)
-#define STARTBLOCKMASK		\
-	(((((xfs_fsblock_t)1) << STARTBLOCKMASKBITS) - 1) << STARTBLOCKVALBITS)
-#define DSTARTBLOCKMASK		\
-	(((((xfs_dfsbno_t)1) << DSTARTBLOCKMASKBITS) - 1) << STARTBLOCKVALBITS)
-
-static inline int isnullstartblock(xfs_fsblock_t x)
-{
-	return ((x) & STARTBLOCKMASK) == STARTBLOCKMASK;
-}
-
-static inline int isnulldstartblock(xfs_dfsbno_t x)
-{
-	return ((x) & DSTARTBLOCKMASK) == DSTARTBLOCKMASK;
-}
-
-static inline xfs_fsblock_t nullstartblock(int k)
-{
-	ASSERT(k < (1 << STARTBLOCKVALBITS));
-	return STARTBLOCKMASK | (k);
-}
-
-static inline xfs_filblks_t startblockval(xfs_fsblock_t x)
-{
-	return (xfs_filblks_t)((x) & ~STARTBLOCKMASK);
-}
-
-/*
- * Possible extent formats.
- */
-typedef enum {
-	XFS_EXTFMT_NOSTATE = 0,
-	XFS_EXTFMT_HASSTATE
-} xfs_exntfmt_t;
-
-/*
- * Possible extent states.
- */
-typedef enum {
-	XFS_EXT_NORM, XFS_EXT_UNWRITTEN,
-	XFS_EXT_DMAPI_OFFLINE, XFS_EXT_INVALID
-} xfs_exntst_t;
-
-/*
  * Extent state and extent format macros.
  */
 #define XFS_EXTFMT_INODE(x)	\
@@ -115,27 +33,6 @@ typedef enum {
 #define ISUNWRITTEN(x)	((x)->br_state == XFS_EXT_UNWRITTEN)
 
 /*
- * Incore version of above.
- */
-typedef struct xfs_bmbt_irec
-{
-	xfs_fileoff_t	br_startoff;	/* starting file offset */
-	xfs_fsblock_t	br_startblock;	/* starting block number */
-	xfs_filblks_t	br_blockcount;	/* number of blocks */
-	xfs_exntst_t	br_state;	/* extent state */
-} xfs_bmbt_irec_t;
-
-/*
- * Key structure for non-leaf levels of the tree.
- */
-typedef struct xfs_bmbt_key {
-	__be64		br_startoff;	/* starting file offset */
-} xfs_bmbt_key_t, xfs_bmdr_key_t;
-
-/* btree pointer type */
-typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
-
-/*
  * Btree block header size depends on a superblock flag.
  */
 #define XFS_BMBT_BLOCK_LEN(mp) \
diff --git a/include/xfs_btree.h b/include/xfs_btree.h
index b55af99..227bfa5 100644
--- a/include/xfs_btree.h
+++ b/include/xfs_btree.h
@@ -39,86 +39,6 @@ extern kmem_zone_t	*xfs_btree_cur_zone;
 #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
 
 /*
- * Generic btree header.
- *
- * This is a combination of the actual format used on disk for short and long
- * format btrees.  The first three fields are shared by both format, but the
- * pointers are different and should be used with care.
- *
- * To get the size of the actual short or long form headers please use the size
- * macros below.  Never use sizeof(xfs_btree_block).
- *
- * The blkno, crc, lsn, owner and uuid fields are only available in filesystems
- * with the crc feature bit, and all accesses to them must be conditional on
- * that flag.
- */
-struct xfs_btree_block {
-	__be32		bb_magic;	/* magic number for block type */
-	__be16		bb_level;	/* 0 is a leaf */
-	__be16		bb_numrecs;	/* current # of data records */
-	union {
-		struct {
-			__be32		bb_leftsib;
-			__be32		bb_rightsib;
-
-			__be64		bb_blkno;
-			__be64		bb_lsn;
-			uuid_t		bb_uuid;
-			__be32		bb_owner;
-			__le32		bb_crc;
-		} s;			/* short form pointers */
-		struct	{
-			__be64		bb_leftsib;
-			__be64		bb_rightsib;
-
-			__be64		bb_blkno;
-			__be64		bb_lsn;
-			uuid_t		bb_uuid;
-			__be64		bb_owner;
-			__le32		bb_crc;
-			__be32		bb_pad; /* padding for alignment */
-		} l;			/* long form pointers */
-	} bb_u;				/* rest */
-};
-
-#define XFS_BTREE_SBLOCK_LEN	16	/* size of a short form block */
-#define XFS_BTREE_LBLOCK_LEN	24	/* size of a long form block */
-
-/* sizes of CRC enabled btree blocks */
-#define XFS_BTREE_SBLOCK_CRC_LEN	(XFS_BTREE_SBLOCK_LEN + 40)
-#define XFS_BTREE_LBLOCK_CRC_LEN	(XFS_BTREE_LBLOCK_LEN + 48)
-
-#define XFS_BTREE_SBLOCK_CRC_OFF \
-	offsetof(struct xfs_btree_block, bb_u.s.bb_crc)
-#define XFS_BTREE_LBLOCK_CRC_OFF \
-	offsetof(struct xfs_btree_block, bb_u.l.bb_crc)
-
-/*
- * Generic key, ptr and record wrapper structures.
- *
- * These are disk format structures, and are converted where necessary
- * by the btree specific code that needs to interpret them.
- */
-union xfs_btree_ptr {
-	__be32			s;	/* short form ptr */
-	__be64			l;	/* long form ptr */
-};
-
-union xfs_btree_key {
-	xfs_bmbt_key_t		bmbt;
-	xfs_bmdr_key_t		bmbr;	/* bmbt root block */
-	xfs_alloc_key_t		alloc;
-	xfs_inobt_key_t		inobt;
-};
-
-union xfs_btree_rec {
-	xfs_bmbt_rec_t		bmbt;
-	xfs_bmdr_rec_t		bmbr;	/* bmbt root block */
-	xfs_alloc_rec_t		alloc;
-	xfs_inobt_rec_t		inobt;
-};
-
-/*
  * For logging record fields.
  */
 #define	XFS_BB_MAGIC		0x01
diff --git a/include/xfs_format.h b/include/xfs_format.h
index a790428..997c770 100644
--- a/include/xfs_format.h
+++ b/include/xfs_format.h
@@ -156,4 +156,285 @@ struct xfs_dsymlink_hdr {
 	((bufsize) - (xfs_sb_version_hascrc(&(mp)->m_sb) ? \
 			sizeof(struct xfs_dsymlink_hdr) : 0))
 
+
+/*
+ * Allocation Btree format definitions
+ *
+ * There are two on-disk btrees, one sorted by blockno and one sorted
+ * by blockcount and blockno.  All blocks look the same to make the code
+ * simpler; if we have time later, we'll make the optimizations.
+ */
+#define	XFS_ABTB_MAGIC		0x41425442	/* 'ABTB' for bno tree */
+#define	XFS_ABTB_CRC_MAGIC	0x41423342	/* 'AB3B' */
+#define	XFS_ABTC_MAGIC		0x41425443	/* 'ABTC' for cnt tree */
+#define	XFS_ABTC_CRC_MAGIC	0x41423343	/* 'AB3C' */
+
+/*
+ * Data record/key structure
+ */
+typedef struct xfs_alloc_rec {
+	__be32		ar_startblock;	/* starting block number */
+	__be32		ar_blockcount;	/* count of free blocks */
+} xfs_alloc_rec_t, xfs_alloc_key_t;
+
+typedef struct xfs_alloc_rec_incore {
+	xfs_agblock_t	ar_startblock;	/* starting block number */
+	xfs_extlen_t	ar_blockcount;	/* count of free blocks */
+} xfs_alloc_rec_incore_t;
+
+/* btree pointer type */
+typedef __be32 xfs_alloc_ptr_t;
+
+/*
+ * Block numbers in the AG:
+ * SB is sector 0, AGF is sector 1, AGI is sector 2, AGFL is sector 3.
+ */
+#define	XFS_BNO_BLOCK(mp)	((xfs_agblock_t)(XFS_AGFL_BLOCK(mp) + 1))
+#define	XFS_CNT_BLOCK(mp)	((xfs_agblock_t)(XFS_BNO_BLOCK(mp) + 1))
+
+
+/*
+ * Inode Allocation Btree format definitions
+ *
+ * There is a btree for the inode map per allocation group.
+ */
+#define	XFS_IBT_MAGIC		0x49414254	/* 'IABT' */
+#define	XFS_IBT_CRC_MAGIC	0x49414233	/* 'IAB3' */
+
+typedef	__uint64_t	xfs_inofree_t;
+#define	XFS_INODES_PER_CHUNK		(NBBY * sizeof(xfs_inofree_t))
+#define	XFS_INODES_PER_CHUNK_LOG	(XFS_NBBYLOG + 3)
+#define	XFS_INOBT_ALL_FREE		((xfs_inofree_t)-1)
+#define	XFS_INOBT_MASK(i)		((xfs_inofree_t)1 << (i))
+
+static inline xfs_inofree_t xfs_inobt_maskn(int i, int n)
+{
+	return ((n >= XFS_INODES_PER_CHUNK ? 0 : XFS_INOBT_MASK(n)) - 1) << i;
+}
+
+/*
+ * Data record structure
+ */
+typedef struct xfs_inobt_rec {
+	__be32		ir_startino;	/* starting inode number */
+	__be32		ir_freecount;	/* count of free inodes (set bits) */
+	__be64		ir_free;	/* free inode mask */
+} xfs_inobt_rec_t;
+
+typedef struct xfs_inobt_rec_incore {
+	xfs_agino_t	ir_startino;	/* starting inode number */
+	__int32_t	ir_freecount;	/* count of free inodes (set bits) */
+	xfs_inofree_t	ir_free;	/* free inode mask */
+} xfs_inobt_rec_incore_t;
+
+
+/*
+ * Key structure
+ */
+typedef struct xfs_inobt_key {
+	__be32		ir_startino;	/* starting inode number */
+} xfs_inobt_key_t;
+
+/* btree pointer type */
+typedef __be32 xfs_inobt_ptr_t;
+
+/*
+ * block numbers in the AG.
+ */
+#define	XFS_IBT_BLOCK(mp)		((xfs_agblock_t)(XFS_CNT_BLOCK(mp) + 1))
+#define	XFS_PREALLOC_BLOCKS(mp)		((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
+
+
+
+/*
+ * BMAP Btree format definitions
+ *
+ * This includes both the root block definition that sits inside an inode fork
+ * and the record/pointer formats for the leaf/node in the blocks.
+ */
+#define XFS_BMAP_MAGIC		0x424d4150	/* 'BMAP' */
+#define XFS_BMAP_CRC_MAGIC	0x424d4133	/* 'BMA3' */
+
+/*
+ * Bmap root header, on-disk form only.
+ */
+typedef struct xfs_bmdr_block {
+	__be16		bb_level;	/* 0 is a leaf */
+	__be16		bb_numrecs;	/* current # of data records */
+} xfs_bmdr_block_t;
+
+/*
+ * Bmap btree record and extent descriptor.
+ *  l0:63 is an extent flag (value 1 indicates non-normal).
+ *  l0:9-62 are startoff.
+ *  l0:0-8 and l1:21-63 are startblock.
+ *  l1:0-20 are blockcount.
+ */
+#define BMBT_EXNTFLAG_BITLEN	1
+#define BMBT_STARTOFF_BITLEN	54
+#define BMBT_STARTBLOCK_BITLEN	52
+#define BMBT_BLOCKCOUNT_BITLEN	21
+
+typedef struct xfs_bmbt_rec {
+	__be64			l0, l1;
+} xfs_bmbt_rec_t;
+
+typedef __uint64_t	xfs_bmbt_rec_base_t;	/* use this for casts */
+typedef xfs_bmbt_rec_t xfs_bmdr_rec_t;
+
+typedef struct xfs_bmbt_rec_host {
+	__uint64_t		l0, l1;
+} xfs_bmbt_rec_host_t;
+
+/*
+ * Values and macros for delayed-allocation startblock fields.
+ */
+#define STARTBLOCKVALBITS	17
+#define STARTBLOCKMASKBITS	(15 + XFS_BIG_BLKNOS * 20)
+#define DSTARTBLOCKMASKBITS	(15 + 20)
+#define STARTBLOCKMASK		\
+	(((((xfs_fsblock_t)1) << STARTBLOCKMASKBITS) - 1) << STARTBLOCKVALBITS)
+#define DSTARTBLOCKMASK		\
+	(((((xfs_dfsbno_t)1) << DSTARTBLOCKMASKBITS) - 1) << STARTBLOCKVALBITS)
+
+static inline int isnullstartblock(xfs_fsblock_t x)
+{
+	return ((x) & STARTBLOCKMASK) == STARTBLOCKMASK;
+}
+
+static inline int isnulldstartblock(xfs_dfsbno_t x)
+{
+	return ((x) & DSTARTBLOCKMASK) == DSTARTBLOCKMASK;
+}
+
+static inline xfs_fsblock_t nullstartblock(int k)
+{
+	ASSERT(k < (1 << STARTBLOCKVALBITS));
+	return STARTBLOCKMASK | (k);
+}
+
+static inline xfs_filblks_t startblockval(xfs_fsblock_t x)
+{
+	return (xfs_filblks_t)((x) & ~STARTBLOCKMASK);
+}
+
+/*
+ * Possible extent formats.
+ */
+typedef enum {
+	XFS_EXTFMT_NOSTATE = 0,
+	XFS_EXTFMT_HASSTATE
+} xfs_exntfmt_t;
+
+/*
+ * Possible extent states.
+ */
+typedef enum {
+	XFS_EXT_NORM, XFS_EXT_UNWRITTEN,
+	XFS_EXT_DMAPI_OFFLINE, XFS_EXT_INVALID
+} xfs_exntst_t;
+
+/*
+ * Incore version of above.
+ */
+typedef struct xfs_bmbt_irec
+{
+	xfs_fileoff_t	br_startoff;	/* starting file offset */
+	xfs_fsblock_t	br_startblock;	/* starting block number */
+	xfs_filblks_t	br_blockcount;	/* number of blocks */
+	xfs_exntst_t	br_state;	/* extent state */
+} xfs_bmbt_irec_t;
+
+/*
+ * Key structure for non-leaf levels of the tree.
+ */
+typedef struct xfs_bmbt_key {
+	__be64		br_startoff;	/* starting file offset */
+} xfs_bmbt_key_t, xfs_bmdr_key_t;
+
+/* btree pointer type */
+typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
+
+
+/*
+ * Generic Btree block format definitions
+ *
+ * This is a combination of the actual format used on disk for short and long
+ * format btrees.  The first three fields are shared by both format, but the
+ * pointers are different and should be used with care.
+ *
+ * To get the size of the actual short or long form headers please use the size
+ * macros below.  Never use sizeof(xfs_btree_block).
+ *
+ * The blkno, crc, lsn, owner and uuid fields are only available in filesystems
+ * with the crc feature bit, and all accesses to them must be conditional on
+ * that flag.
+ */
+struct xfs_btree_block {
+	__be32		bb_magic;	/* magic number for block type */
+	__be16		bb_level;	/* 0 is a leaf */
+	__be16		bb_numrecs;	/* current # of data records */
+	union {
+		struct {
+			__be32		bb_leftsib;
+			__be32		bb_rightsib;
+
+			__be64		bb_blkno;
+			__be64		bb_lsn;
+			uuid_t		bb_uuid;
+			__be32		bb_owner;
+			__le32		bb_crc;
+		} s;			/* short form pointers */
+		struct	{
+			__be64		bb_leftsib;
+			__be64		bb_rightsib;
+
+			__be64		bb_blkno;
+			__be64		bb_lsn;
+			uuid_t		bb_uuid;
+			__be64		bb_owner;
+			__le32		bb_crc;
+			__be32		bb_pad; /* padding for alignment */
+		} l;			/* long form pointers */
+	} bb_u;				/* rest */
+};
+
+#define XFS_BTREE_SBLOCK_LEN	16	/* size of a short form block */
+#define XFS_BTREE_LBLOCK_LEN	24	/* size of a long form block */
+
+/* sizes of CRC enabled btree blocks */
+#define XFS_BTREE_SBLOCK_CRC_LEN	(XFS_BTREE_SBLOCK_LEN + 40)
+#define XFS_BTREE_LBLOCK_CRC_LEN	(XFS_BTREE_LBLOCK_LEN + 48)
+
+#define XFS_BTREE_SBLOCK_CRC_OFF \
+	offsetof(struct xfs_btree_block, bb_u.s.bb_crc)
+#define XFS_BTREE_LBLOCK_CRC_OFF \
+	offsetof(struct xfs_btree_block, bb_u.l.bb_crc)
+
+/*
+ * Generic key, ptr and record wrapper structures.
+ *
+ * These are disk format structures, and are converted where necessary
+ * by the btree specific code that needs to interpret them.
+ */
+union xfs_btree_ptr {
+	__be32			s;	/* short form ptr */
+	__be64			l;	/* long form ptr */
+};
+
+union xfs_btree_key {
+	xfs_bmbt_key_t		bmbt;
+	xfs_bmdr_key_t		bmbr;	/* bmbt root block */
+	xfs_alloc_key_t		alloc;
+	xfs_inobt_key_t		inobt;
+};
+
+union xfs_btree_rec {
+	xfs_bmbt_rec_t		bmbt;
+	xfs_bmdr_rec_t		bmbr;	/* bmbt root block */
+	xfs_alloc_rec_t		alloc;
+	xfs_inobt_rec_t		inobt;
+};
+
+
 #endif /* __XFS_FORMAT_H__ */
diff --git a/include/xfs_ialloc.h b/include/xfs_ialloc.h
index 1557798..a8f76a5 100644
--- a/include/xfs_ialloc.h
+++ b/include/xfs_ialloc.h
@@ -23,6 +23,7 @@ struct xfs_dinode;
 struct xfs_imap;
 struct xfs_mount;
 struct xfs_trans;
+struct xfs_btree_cur;
 
 /*
  * Allocation parameters for inode allocation.
@@ -42,7 +43,7 @@ struct xfs_trans;
 static inline struct xfs_dinode *
 xfs_make_iptr(struct xfs_mount *mp, struct xfs_buf *b, int o)
 {
-	return (xfs_dinode_t *)
+	return (struct xfs_dinode *)
 		(xfs_buf_offset(b, o << (mp)->m_sb.sb_inodelog));
 }
 
diff --git a/include/xfs_ialloc_btree.h b/include/xfs_ialloc_btree.h
index cfbfe46..f38b220 100644
--- a/include/xfs_ialloc_btree.h
+++ b/include/xfs_ialloc_btree.h
@@ -27,55 +27,6 @@ struct xfs_btree_cur;
 struct xfs_mount;
 
 /*
- * There is a btree for the inode map per allocation group.
- */
-#define	XFS_IBT_MAGIC		0x49414254	/* 'IABT' */
-#define	XFS_IBT_CRC_MAGIC	0x49414233	/* 'IAB3' */
-
-typedef	__uint64_t	xfs_inofree_t;
-#define	XFS_INODES_PER_CHUNK		(NBBY * sizeof(xfs_inofree_t))
-#define	XFS_INODES_PER_CHUNK_LOG	(XFS_NBBYLOG + 3)
-#define	XFS_INOBT_ALL_FREE		((xfs_inofree_t)-1)
-#define	XFS_INOBT_MASK(i)		((xfs_inofree_t)1 << (i))
-
-static inline xfs_inofree_t xfs_inobt_maskn(int i, int n)
-{
-	return ((n >= XFS_INODES_PER_CHUNK ? 0 : XFS_INOBT_MASK(n)) - 1) << i;
-}
-
-/*
- * Data record structure
- */
-typedef struct xfs_inobt_rec {
-	__be32		ir_startino;	/* starting inode number */
-	__be32		ir_freecount;	/* count of free inodes (set bits) */
-	__be64		ir_free;	/* free inode mask */
-} xfs_inobt_rec_t;
-
-typedef struct xfs_inobt_rec_incore {
-	xfs_agino_t	ir_startino;	/* starting inode number */
-	__int32_t	ir_freecount;	/* count of free inodes (set bits) */
-	xfs_inofree_t	ir_free;	/* free inode mask */
-} xfs_inobt_rec_incore_t;
-
-
-/*
- * Key structure
- */
-typedef struct xfs_inobt_key {
-	__be32		ir_startino;	/* starting inode number */
-} xfs_inobt_key_t;
-
-/* btree pointer type */
-typedef __be32 xfs_inobt_ptr_t;
-
-/*
- * block numbers in the AG.
- */
-#define	XFS_IBT_BLOCK(mp)		((xfs_agblock_t)(XFS_CNT_BLOCK(mp) + 1))
-#define	XFS_PREALLOC_BLOCKS(mp)		((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
-
-/*
  * Btree block header size depends on a superblock flag.
  */
 #define XFS_INOBT_BLOCK_LEN(mp) \
diff --git a/include/xfs_inode_fork.h b/include/xfs_inode_fork.h
index 28661a0..eb329a1 100644
--- a/include/xfs_inode_fork.h
+++ b/include/xfs_inode_fork.h
@@ -19,6 +19,7 @@
 #define	__XFS_INODE_FORK_H__
 
 struct xfs_inode_log_item;
+struct xfs_dinode;
 
 /*
  * The following xfs_ext_irec_t struct introduces a second (top) level
-- 
1.8.4.rc3


* [PATCH 06/36] libxfs: unify xfs_btree.c with kernel code
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (4 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 05/36] xfs: decouple inode and bmap btree header files Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 07/36] libxfs: bmap btree owner swap support Dave Chinner
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The libxfs/xfs_btree.c code lacks a small amount of btree block
readahead code that the kernel code has. Instead, it short circuits
readahead at a higher layer and doesn't include the lower layer
functions. There is no harm in calling the lower layer functions and
having them do nothing, and doing so unifies the kernel and
userspace code.
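
The idea of keeping the lower layer function and stubbing out only the
readahead primitive can be sketched as follows. This is not the macro
from the patch, which uses a GNU statement expression. The sketch uses
plain casts to void instead, and every toy_* name, the shift value and
the call counter are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t toy_daddr_t;

/*
 * Userspace stub: there is no readahead, but each argument is still
 * evaluated and discarded so the caller's variables count as used.
 */
#define toy_buf_readahead(target, daddr, len, ops) \
	((void)(target), (void)(daddr), (void)(len), (void)(ops))

/* Counts address conversions so the example has something observable. */
static int toy_daddr_conversions;

/* Hypothetical block-to-disk-address conversion (8 sectors per block). */
static toy_daddr_t
toy_fsb_to_daddr(uint64_t fsbno)
{
	toy_daddr_conversions++;
	return (toy_daddr_t)(fsbno << 3);
}

/*
 * Same shape as a kernel readahead helper: compute the disk address,
 * then hand it to the readahead primitive, which is a no-op here.
 */
void
toy_btree_reada_bufl(uint64_t fsbno, unsigned int count)
{
	toy_daddr_t	d;

	assert(fsbno != UINT64_MAX);	/* stand-in for a null block check */
	d = toy_fsb_to_daddr(fsbno);
	toy_buf_readahead(NULL, d, count, NULL);
}
```

With this arrangement the caller compiles unchanged in both
environments, and only the definition of the primitive differs.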

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs.h       |  8 +++++---
 libxfs/xfs_btree.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/libxfs/xfs.h b/libxfs/xfs.h
index 31acf1b..364fd83 100644
--- a/libxfs/xfs.h
+++ b/libxfs/xfs.h
@@ -319,10 +319,12 @@ roundup_64(__uint64_t x, __uint32_t y)
 
 #define xfs_trans_buf_copy_type(dbp, sbp)
 
-#define xfs_buf_readahead(a,b,c,ops)		((void) 0)	/* no readahead */
+/* no readahead, need to avoid set-but-unused var warnings. */
+#define xfs_buf_readahead(a,d,c,ops)		({	\
+	xfs_daddr_t __d = d;				\
+	__d = __d; /* no set-but-unused warning */	\
+})
 #define xfs_buf_readahead_map(a,b,c,ops)	((void) 0)	/* no readahead */
-#define xfs_btree_reada_bufl(m,fsb,c,ops)	((void) 0)
-#define xfs_btree_reada_bufs(m,fsb,c,x,ops)	((void) 0)
 #define xfs_buftrace(x,y)			((void) 0)	/* debug only */
 
 #define xfs_cmn_err(tag,level,mp,fmt,args...)	cmn_err(level,fmt, ## args)
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 0099926..ce149ad 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -396,7 +396,6 @@ static inline size_t xfs_btree_block_len(struct xfs_btree_cur *cur)
 			return XFS_BTREE_LBLOCK_CRC_LEN;
 		return XFS_BTREE_LBLOCK_LEN;
 	}
-
 	if (cur->bc_flags & XFS_BTREE_CRC_BLOCKS)
 		return XFS_BTREE_SBLOCK_CRC_LEN;
 	return XFS_BTREE_SBLOCK_LEN;
@@ -493,7 +492,7 @@ xfs_btree_ptr_addr(
 }
 
 /*
- * Get a the root block which is stored in the inode.
+ * Get the root block which is stored in the inode.
  *
  * For now this btree implementation assumes the btree root is always
  * stored in the if_broot field of an inode fork.
@@ -716,6 +715,46 @@ xfs_btree_read_bufl(
 	return 0;
 }
 
+/*
+ * Read-ahead the block, don't wait for it, don't return a buffer.
+ * Long-form addressing.
+ */
+/* ARGSUSED */
+void
+xfs_btree_reada_bufl(
+	struct xfs_mount	*mp,		/* file system mount point */
+	xfs_fsblock_t		fsbno,		/* file system block number */
+	xfs_extlen_t		count,		/* count of filesystem blocks */
+	const struct xfs_buf_ops *ops)
+{
+	xfs_daddr_t		d;
+
+	ASSERT(fsbno != NULLFSBLOCK);
+	d = XFS_FSB_TO_DADDR(mp, fsbno);
+	xfs_buf_readahead(mp->m_ddev_targp, d, mp->m_bsize * count, ops);
+}
+
+/*
+ * Read-ahead the block, don't wait for it, don't return a buffer.
+ * Short-form addressing.
+ */
+/* ARGSUSED */
+void
+xfs_btree_reada_bufs(
+	struct xfs_mount	*mp,		/* file system mount point */
+	xfs_agnumber_t		agno,		/* allocation group number */
+	xfs_agblock_t		agbno,		/* allocation group block number */
+	xfs_extlen_t		count,		/* count of filesystem blocks */
+	const struct xfs_buf_ops *ops)
+{
+	xfs_daddr_t		d;
+
+	ASSERT(agno != NULLAGNUMBER);
+	ASSERT(agbno != NULLAGBLOCK);
+	d = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	xfs_buf_readahead(mp->m_ddev_targp, d, mp->m_bsize * count, ops);
+}
+
 STATIC int
 xfs_btree_readahead_lblock(
 	struct xfs_btree_cur	*cur,
@@ -1339,7 +1378,7 @@ xfs_btree_log_block(
 			 * We don't log the CRC when updating a btree
 			 * block but instead recreate it during log
 			 * recovery.  As the log buffers have checksums
-			 * of their this is safe and avoids logging a crc
+			 * of their own this is safe and avoids logging a crc
 			 * update in a lot of places.
 			 */
 			if (fields == XFS_BB_ALL_BITS)
@@ -1629,7 +1668,7 @@ xfs_lookup_get_search_key(
 
 /*
  * Lookup the record.  The cursor is made to point to it, based on dir.
- * Return 0 if can't find any such record, 1 for success.
+ * stat is set to 0 if can't find any such record, 1 for success.
  */
 int					/* error */
 xfs_btree_lookup(
@@ -2701,7 +2740,6 @@ xfs_btree_make_block_unfull(
 
 		if (numrecs < cur->bc_ops->get_dmaxrecs(cur, level)) {
 			/* A root block that can be made bigger. */
-
 			xfs_iroot_realloc(ip, 1, cur->bc_private.b.whichfork);
 		} else {
 			/* A root block that needs replacing */
-- 
1.8.4.rc3

* [PATCH 07/36] libxfs: bmap btree owner swap support
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

For CRC enabled filesystems, we can't just swap inode forks from one
inode to another when defragmenting a file - the blocks in the inode
fork bmap btree contain pointers back to the owner inode. Hence if
we are to swap the inode forks we have to atomically modify every
block in the btree during the transaction.

This patch brings across the kernel code for doing the owner
swap of an entire fork - something that we are likely to end up
needing in xfs_repair when reparenting stray inodes to lost+found -
without all the associated swap extents transaction and recovery
cruft as those parts are not needed in userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
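As a reading aid for the walk described above, here is a minimal
sketch of a level-by-level owner change over sibling pointers. It is
not the libxfs code: struct toy_block and toy_change_owner() are
hypothetical names, the blocks live in memory rather than in buffers,
and there is no logging or readahead.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a btree block; NOT the on-disk XFS layout. */
struct toy_block {
	uint64_t		owner;		/* back-pointer to the owning inode */
	struct toy_block	*right;		/* right sibling on the same level */
	struct toy_block	*first_child;	/* leftmost block one level down */
};

/*
 * Visit every level from the root down, and every block on a level from
 * left to right, rewriting the owner field. Returns the number of blocks
 * touched.
 */
static int
toy_change_owner(
	struct toy_block	*root,
	uint64_t		new_owner)
{
	struct toy_block	*left = root;
	int			touched = 0;

	while (left) {
		/* remember where the next level starts before walking right */
		struct toy_block	*next_left = left->first_child;
		struct toy_block	*b;

		for (b = left; b; b = b->right) {
			b->owner = new_owner;
			touched++;
		}
		left = next_left;
	}
	return touched;
}
```

The real code has to stamp every block because each one carries its
own back-pointer; the sketch only shows the visiting order.
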
 include/xfs_bmap_btree.h |   4 ++
 include/xfs_btree.h      |  19 ++++--
 include/xfs_inode_buf.h  |  18 ++---
 include/xfs_log_format.h |   8 ++-
 libxfs/xfs_bmap_btree.c  |  44 ++++++++++++
 libxfs/xfs_btree.c       | 170 ++++++++++++++++++++++++++++++++++++++++++-----
 6 files changed, 227 insertions(+), 36 deletions(-)
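
A note on the logging flag change in include/xfs_btree.h: the new
XFS_BB_OWNER flag is bit 8, so XFS_BB_NUM_BITS_CRC has to grow from 8
to 9 for XFS_BB_ALL_BITS_CRC to cover it. The sketch below copies the
relevant defines from the patch and checks that arithmetic;
toy_mask_covers() is a hypothetical helper added only for
illustration.

```c
/* Copied from the patched include/xfs_btree.h (CRC-related subset). */
#define	XFS_BB_BLKNO		(1 << 5)
#define	XFS_BB_LSN		(1 << 6)
#define	XFS_BB_UUID		(1 << 7)
#define	XFS_BB_OWNER		(1 << 8)
#define	XFS_BB_NUM_BITS_CRC	9
#define	XFS_BB_ALL_BITS_CRC	((1 << XFS_BB_NUM_BITS_CRC) - 1)

/* Hypothetical helper: does a logging mask include the given field flag? */
static int
toy_mask_covers(
	int	mask,
	int	field)
{
	return (mask & field) == field;
}
```

With nine bits the mask is 0x1ff and includes XFS_BB_OWNER; with the
old value of eight it would be 0xff and the owner field would be left
out of an "all fields" log request.
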

diff --git a/include/xfs_bmap_btree.h b/include/xfs_bmap_btree.h
index 2379d33..6e42e1e 100644
--- a/include/xfs_bmap_btree.h
+++ b/include/xfs_bmap_btree.h
@@ -133,6 +133,10 @@ extern int xfs_bmbt_get_maxrecs(struct xfs_btree_cur *, int level);
 extern int xfs_bmdr_maxrecs(struct xfs_mount *, int blocklen, int leaf);
 extern int xfs_bmbt_maxrecs(struct xfs_mount *, int blocklen, int leaf);
 
+extern int xfs_bmbt_change_owner(struct xfs_trans *tp, struct xfs_inode *ip,
+				 int whichfork, xfs_ino_t new_owner,
+				 struct list_head *buffer_list);
+
 extern struct xfs_btree_cur *xfs_bmbt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_inode *, int);
 
diff --git a/include/xfs_btree.h b/include/xfs_btree.h
index 227bfa5..6afe0b2 100644
--- a/include/xfs_btree.h
+++ b/include/xfs_btree.h
@@ -41,15 +41,18 @@ extern kmem_zone_t	*xfs_btree_cur_zone;
 /*
  * For logging record fields.
  */
-#define	XFS_BB_MAGIC		0x01
-#define	XFS_BB_LEVEL		0x02
-#define	XFS_BB_NUMRECS		0x04
-#define	XFS_BB_LEFTSIB		0x08
-#define	XFS_BB_RIGHTSIB		0x10
-#define	XFS_BB_BLKNO		0x20
+#define	XFS_BB_MAGIC		(1 << 0)
+#define	XFS_BB_LEVEL		(1 << 1)
+#define	XFS_BB_NUMRECS		(1 << 2)
+#define	XFS_BB_LEFTSIB		(1 << 3)
+#define	XFS_BB_RIGHTSIB		(1 << 4)
+#define	XFS_BB_BLKNO		(1 << 5)
+#define	XFS_BB_LSN		(1 << 6)
+#define	XFS_BB_UUID		(1 << 7)
+#define	XFS_BB_OWNER		(1 << 8)
 #define	XFS_BB_NUM_BITS		5
 #define	XFS_BB_ALL_BITS		((1 << XFS_BB_NUM_BITS) - 1)
-#define	XFS_BB_NUM_BITS_CRC	8
+#define	XFS_BB_NUM_BITS_CRC	9
 #define	XFS_BB_ALL_BITS_CRC	((1 << XFS_BB_NUM_BITS_CRC) - 1)
 
 /*
@@ -381,6 +384,8 @@ int xfs_btree_new_iroot(struct xfs_btree_cur *, int *, int *);
 int xfs_btree_insert(struct xfs_btree_cur *, int *);
 int xfs_btree_delete(struct xfs_btree_cur *, int *);
 int xfs_btree_get_rec(struct xfs_btree_cur *, union xfs_btree_rec **, int *);
+int xfs_btree_change_owner(struct xfs_btree_cur *cur, __uint64_t new_owner,
+			   struct list_head *buffer_list);
 
 /*
  * btree block CRC helpers
diff --git a/include/xfs_inode_buf.h b/include/xfs_inode_buf.h
index e8fd3bd..9308c47 100644
--- a/include/xfs_inode_buf.h
+++ b/include/xfs_inode_buf.h
@@ -32,17 +32,17 @@ struct xfs_imap {
 	ushort		im_boffset;	/* inode offset in block in bytes */
 };
 
-int		xfs_imap_to_bp(struct xfs_mount *, struct xfs_trans *,
-			       struct xfs_imap *, struct xfs_dinode **,
-			       struct xfs_buf **, uint, uint);
-int		xfs_iread(struct xfs_mount *, struct xfs_trans *,
-			  struct xfs_inode *, uint);
-void		xfs_dinode_calc_crc(struct xfs_mount *, struct xfs_dinode *);
-void		xfs_dinode_to_disk(struct xfs_dinode *,
-				   struct xfs_icdinode *);
+int	xfs_imap_to_bp(struct xfs_mount *, struct xfs_trans *,
+		       struct xfs_imap *, struct xfs_dinode **,
+		       struct xfs_buf **, uint, uint);
+int	xfs_iread(struct xfs_mount *, struct xfs_trans *,
+		  struct xfs_inode *, uint);
+void	xfs_dinode_calc_crc(struct xfs_mount *, struct xfs_dinode *);
+void	xfs_dinode_to_disk(struct xfs_dinode *to, struct xfs_icdinode *from);
+void	xfs_dinode_from_disk(struct xfs_icdinode *to, struct xfs_dinode *from);
 
 #if defined(DEBUG)
-void		xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
+void	xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
 #else
 #define	xfs_inobp_check(mp, bp)
 #endif /* DEBUG */
diff --git a/include/xfs_log_format.h b/include/xfs_log_format.h
index aeaa715..f0969c7 100644
--- a/include/xfs_log_format.h
+++ b/include/xfs_log_format.h
@@ -302,6 +302,8 @@ typedef struct xfs_inode_log_format_64 {
 #define	XFS_ILOG_ADATA	0x040	/* log i_af.if_data */
 #define	XFS_ILOG_AEXT	0x080	/* log i_af.if_extents */
 #define	XFS_ILOG_ABROOT	0x100	/* log i_af.i_broot */
+#define XFS_ILOG_DOWNER	0x200	/* change the data fork owner on replay */
+#define XFS_ILOG_AOWNER	0x400	/* change the attr fork owner on replay */
 
 
 /*
@@ -315,7 +317,8 @@ typedef struct xfs_inode_log_format_64 {
 #define	XFS_ILOG_NONCORE	(XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
 				 XFS_ILOG_DBROOT | XFS_ILOG_DEV | \
 				 XFS_ILOG_UUID | XFS_ILOG_ADATA | \
-				 XFS_ILOG_AEXT | XFS_ILOG_ABROOT)
+				 XFS_ILOG_AEXT | XFS_ILOG_ABROOT | \
+				 XFS_ILOG_DOWNER | XFS_ILOG_AOWNER)
 
 #define	XFS_ILOG_DFORK		(XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
 				 XFS_ILOG_DBROOT)
@@ -327,7 +330,8 @@ typedef struct xfs_inode_log_format_64 {
 				 XFS_ILOG_DEXT | XFS_ILOG_DBROOT | \
 				 XFS_ILOG_DEV | XFS_ILOG_UUID | \
 				 XFS_ILOG_ADATA | XFS_ILOG_AEXT | \
-				 XFS_ILOG_ABROOT | XFS_ILOG_TIMESTAMP)
+				 XFS_ILOG_ABROOT | XFS_ILOG_TIMESTAMP | \
+				 XFS_ILOG_DOWNER | XFS_ILOG_AOWNER)
 
 static inline int xfs_ilog_fbroot(int w)
 {
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index bf214cf..2f6b48a 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -999,3 +999,47 @@ xfs_bmdr_maxrecs(
 		return blocklen / sizeof(xfs_bmdr_rec_t);
 	return blocklen / (sizeof(xfs_bmdr_key_t) + sizeof(xfs_bmdr_ptr_t));
 }
+
+/*
+ * Change the owner of a btree format fork of the inode passed in. Change it to
+ * the owner that is passed in so that we can change owners before or after
+ * we switch forks between inodes. The operation that the caller is doing will
+ * determine whether it needs to change owner before or after the switch.
+ *
+ * For demand paged transactional modification, the fork switch should be done
+ * after reading in all the blocks, modifying them and pinning them in the
+ * transaction. For modification when the buffers are already pinned in memory,
+ * the fork switch can be done before changing the owner as we won't need to
+ * validate the owner until the btree buffers are unpinned and writes can occur
+ * again.
+ *
+ * For recovery based ownership change, there is no transactional context and
+ * so a buffer list must be supplied so that we can record the buffers that we
+ * modified for the caller to issue IO on.
+ */
+int
+xfs_bmbt_change_owner(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	xfs_ino_t		new_owner,
+	struct list_head	*buffer_list)
+{
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	ASSERT(tp || buffer_list);
+	ASSERT(!(tp && buffer_list));
+	if (whichfork == XFS_DATA_FORK)
+		ASSERT(ip->i_d.di_format == XFS_DINODE_FMT_BTREE);
+	else
+		ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_BTREE);
+
+	cur = xfs_bmbt_init_cursor(ip->i_mount, tp, ip, whichfork);
+	if (!cur)
+		return ENOMEM;
+
+	error = xfs_btree_change_owner(cur, new_owner, buffer_list);
+	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	return error;
+}
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index ce149ad..2dd6fb7 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -837,6 +837,41 @@ xfs_btree_readahead(
 	return xfs_btree_readahead_sblock(cur, lr, block);
 }
 
+STATIC xfs_daddr_t
+xfs_btree_ptr_to_daddr(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		ASSERT(ptr->l != cpu_to_be64(NULLDFSBNO));
+
+		return XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
+	} else {
+		ASSERT(cur->bc_private.a.agno != NULLAGNUMBER);
+		ASSERT(ptr->s != cpu_to_be32(NULLAGBLOCK));
+
+		return XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno,
+					be32_to_cpu(ptr->s));
+	}
+}
+
+/*
+ * Readahead @count btree blocks at the given @ptr location.
+ *
+ * We don't need to care about long or short form btrees here as we have a
+ * method of converting the ptr directly to a daddr available to us.
+ */
+STATIC void
+xfs_btree_readahead_ptr(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	xfs_extlen_t		count)
+{
+	xfs_buf_readahead(cur->bc_mp->m_ddev_targp,
+			  xfs_btree_ptr_to_daddr(cur, ptr),
+			  cur->bc_mp->m_bsize * count, cur->bc_ops->buf_ops);
+}
+
 /*
  * Set the buffer for level "lev" in the cursor to bp, releasing
  * any previous buffer.
@@ -1055,24 +1090,6 @@ xfs_btree_buf_to_ptr(
 	}
 }
 
-STATIC xfs_daddr_t
-xfs_btree_ptr_to_daddr(
-	struct xfs_btree_cur	*cur,
-	union xfs_btree_ptr	*ptr)
-{
-	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
-		ASSERT(ptr->l != cpu_to_be64(NULLDFSBNO));
-
-		return XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
-	} else {
-		ASSERT(cur->bc_private.a.agno != NULLAGNUMBER);
-		ASSERT(ptr->s != cpu_to_be32(NULLAGBLOCK));
-
-		return XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno,
-					be32_to_cpu(ptr->s));
-	}
-}
-
 STATIC void
 xfs_btree_set_refs(
 	struct xfs_btree_cur	*cur,
@@ -3851,3 +3868,120 @@ xfs_btree_get_rec(
 	*stat = 1;
 	return 0;
 }
+
+/*
+ * Change the owner of a btree.
+ *
+ * The mechanism we use here is ordered buffer logging. Because we don't know
+ * how many buffers we are going to need to modify, we don't really want to
+ * have to make transaction reservations for the worst case of every buffer in a
+ * full size btree as that may be more space than we can fit in the log....
+ *
+ * We do the btree walk in the most optimal manner possible - we have sibling
+ * pointers so we can just walk all the blocks on each level from left to right
+ * in a single pass, and then move to the next level and do the same. We can
+ * also do readahead on the sibling pointers to get IO moving more quickly,
+ * though for slow disks this is unlikely to make much difference to performance
+ * as the amount of CPU work we have to do before moving to the next block is
+ * relatively small.
+ *
+ * For each btree block that we load, modify the owner appropriately, set the
+ * buffer as an ordered buffer and log it appropriately. We need to ensure that
+ * we mark the region we change dirty so that if the buffer is relogged in
+ * a subsequent transaction the changes we make here as an ordered buffer are
+ * correctly relogged in that transaction.  If we are in recovery context, then
+ * just queue the modified buffer as a delayed write buffer so the transaction
+ * recovery completion writes the changes to disk.
+ */
+static int
+xfs_btree_block_change_owner(
+	struct xfs_btree_cur	*cur,
+	int			level,
+	__uint64_t		new_owner,
+	struct list_head	*buffer_list)
+{
+	struct xfs_btree_block	*block;
+	struct xfs_buf		*bp;
+	union xfs_btree_ptr     rptr;
+
+	/* do right sibling readahead */
+	xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA);
+
+	/* modify the owner */
+	block = xfs_btree_get_block(cur, level, &bp);
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		block->bb_u.l.bb_owner = cpu_to_be64(new_owner);
+	else
+		block->bb_u.s.bb_owner = cpu_to_be32(new_owner);
+
+	/*
+	 * If the block is a root block hosted in an inode, we might not have a
+	 * buffer pointer here and we shouldn't attempt to log the change as the
+	 * information is already held in the inode and discarded when the root
+	 * block is formatted into the on-disk inode fork. We still change it,
+	 * though, so everything is consistent in memory.
+	 */
+	if (bp) {
+		if (cur->bc_tp) {
+			xfs_trans_ordered_buf(cur->bc_tp, bp);
+			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
+		} else {
+			xfs_buf_delwri_queue(bp, buffer_list);
+		}
+	} else {
+		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
+		ASSERT(level == cur->bc_nlevels - 1);
+	}
+
+	/* now read rh sibling block for next iteration */
+	xfs_btree_get_sibling(cur, block, &rptr, XFS_BB_RIGHTSIB);
+	if (xfs_btree_ptr_is_null(cur, &rptr))
+		return ENOENT;
+
+	return xfs_btree_lookup_get_block(cur, level, &rptr, &block);
+}
+
+int
+xfs_btree_change_owner(
+	struct xfs_btree_cur	*cur,
+	__uint64_t		new_owner,
+	struct list_head	*buffer_list)
+{
+	union xfs_btree_ptr     lptr;
+	int			level;
+	struct xfs_btree_block	*block = NULL;
+	int			error = 0;
+
+	cur->bc_ops->init_ptr_from_cur(cur, &lptr);
+
+	/* for each level */
+	for (level = cur->bc_nlevels - 1; level >= 0; level--) {
+		/* grab the left hand block */
+		error = xfs_btree_lookup_get_block(cur, level, &lptr, &block);
+		if (error)
+			return error;
+
+		/* readahead the left most block for the next level down */
+		if (level > 0) {
+			union xfs_btree_ptr     *ptr;
+
+			ptr = xfs_btree_ptr_addr(cur, 1, block);
+			xfs_btree_readahead_ptr(cur, ptr, 1);
+
+			/* save for the next iteration of the loop */
+			lptr = *ptr;
+		}
+
+		/* for each buffer in the level */
+		do {
+			error = xfs_btree_block_change_owner(cur, level,
+							     new_owner,
+							     buffer_list);
+		} while (!error);
+
+		if (error != ENOENT)
+			return error;
+	}
+
+	return 0;
+}
-- 
1.8.4.rc3

* [PATCH 08/36] libxfs: xfs_rtalloc.c becomes xfs_rtbitmap.c
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (6 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 07/36] libxfs: bmap btree owner swap support Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 09/36] libxfs: bring across inode buffer readahead verifier changes Dave Chinner
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

To match the split-up of the kernel xfs_rtalloc.c file, convert the
libxfs version of xfs_rtalloc.c to match the newly shared kernel
source file with all the realtime bitmap functions in it,
xfs_rtbitmap.c.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
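For readers unfamiliar with the realtime bitmap helpers being moved,
the range-modify routine in this file works in three phases: a leading
partial word, a run of whole words, and a trailing partial word. The
following is a minimal in-memory sketch of that pattern only, assuming
32-bit bitmap words; toy_modify_range() and NBWORD are hypothetical
names, and the buffer reads and transaction logging of the real code
are left out.

```c
#include <stdint.h>

#define NBWORD	32	/* bits per bitmap word, assumed for this sketch */

/* Set (val != 0) or clear (val == 0) len bits starting at bit "start". */
static void
toy_modify_range(
	uint32_t	*map,
	unsigned int	start,
	unsigned int	len,
	int		val)
{
	uint32_t	*b = &map[start / NBWORD];
	unsigned int	bit = start % NBWORD;
	unsigned int	i = 0;		/* bits handled so far */
	unsigned int	lastbit;
	uint32_t	mask;

	/* leading partial word: only when start is not word aligned */
	if (bit) {
		lastbit = bit + len < NBWORD ? bit + len : NBWORD;
		mask = (((uint32_t)1 << (lastbit - bit)) - 1) << bit;
		if (val)
			*b |= mask;
		else
			*b &= ~mask;
		i = lastbit - bit;
		b++;
	}

	/* whole words */
	while (len - i >= NBWORD) {
		*b++ = val ? UINT32_MAX : 0;
		i += NBWORD;
	}

	/* trailing partial word */
	lastbit = len - i;
	if (lastbit) {
		mask = ((uint32_t)1 << lastbit) - 1;
		if (val)
			*b |= mask;
		else
			*b &= ~mask;
	}
}
```
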
 libxfs/Makefile       |   2 +-
 libxfs/xfs_rtalloc.c  | 776 ----------------------------------------
 libxfs/xfs_rtbitmap.c | 951 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 952 insertions(+), 777 deletions(-)
 delete mode 100644 libxfs/xfs_rtalloc.c
 create mode 100644 libxfs/xfs_rtbitmap.c

diff --git a/libxfs/Makefile b/libxfs/Makefile
index 4522218..ae15a5d 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -35,7 +35,7 @@ CFILES = cache.c \
 	xfs_inode_fork.c \
 	xfs_ialloc_btree.c \
 	xfs_log_rlimit.c \
-	xfs_rtalloc.c \
+	xfs_rtbitmap.c \
 	xfs_sb.c \
 	xfs_symlink_remote.c \
 	xfs_trans_resv.c
diff --git a/libxfs/xfs_rtalloc.c b/libxfs/xfs_rtalloc.c
deleted file mode 100644
index f5a90b2..0000000
--- a/libxfs/xfs_rtalloc.c
+++ /dev/null
@@ -1,776 +0,0 @@
-/*
- * Copyright (c) 2000-2005 Silicon Graphics, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
- */
-
-#include <xfs.h>
-
-/*
- * Prototypes for internal functions.
- */
-
-
-STATIC int xfs_rtfind_back(xfs_mount_t *, xfs_trans_t *, xfs_rtblock_t,
-		xfs_rtblock_t, xfs_rtblock_t *);
-STATIC int xfs_rtfind_forw(xfs_mount_t *, xfs_trans_t *, xfs_rtblock_t,
-		xfs_rtblock_t, xfs_rtblock_t *);
-STATIC int xfs_rtmodify_range(xfs_mount_t *, xfs_trans_t *, xfs_rtblock_t,
-		xfs_extlen_t, int);
-STATIC int xfs_rtmodify_summary(xfs_mount_t *, xfs_trans_t *, int,
-		xfs_rtblock_t, int, xfs_buf_t **, xfs_fsblock_t *);
-
-/*
- * Internal functions.
- */
-
-/*
- * Get a buffer for the bitmap or summary file block specified.
- * The buffer is returned read and locked.
- */
-STATIC int				/* error */
-xfs_rtbuf_get(
-	xfs_mount_t	*mp,		/* file system mount structure */
-	xfs_trans_t	*tp,		/* transaction pointer */
-	xfs_rtblock_t	block,		/* block number in bitmap or summary */
-	int		issum,		/* is summary not bitmap */
-	xfs_buf_t	**bpp)		/* output: buffer for the block */
-{
-	xfs_buf_t	*bp;		/* block buffer, result */
-	xfs_inode_t	*ip;		/* bitmap or summary inode */
-	xfs_bmbt_irec_t	map;
-	int		nmap = 1;
-	int		error;		/* error value */
-
-	ip = issum ? mp->m_rsumip : mp->m_rbmip;
-
-	error = xfs_bmapi_read(ip, block, 1, &map, &nmap, XFS_DATA_FORK);
-	if (error)
-		return error;
-
-	ASSERT(map.br_startblock != NULLFSBLOCK);
-	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
-				   XFS_FSB_TO_DADDR(mp, map.br_startblock),
-				   mp->m_bsize, 0, &bp, NULL);
-	if (error)
-		return error;
-	ASSERT(!xfs_buf_geterror(bp));
-	*bpp = bp;
-	return 0;
-}
-
-/*
- * Searching backward from start to limit, find the first block whose
- * allocated/free state is different from start's.
- */
-STATIC int				/* error */
-xfs_rtfind_back(
-	xfs_mount_t	*mp,		/* file system mount point */
-	xfs_trans_t	*tp,		/* transaction pointer */
-	xfs_rtblock_t	start,		/* starting block to look at */
-	xfs_rtblock_t	limit,		/* last block to look at */
-	xfs_rtblock_t	*rtblock)	/* out: start block found */
-{
-	xfs_rtword_t	*b;		/* current word in buffer */
-	int		bit;		/* bit number in the word */
-	xfs_rtblock_t	block;		/* bitmap block number */
-	xfs_buf_t	*bp;		/* buf for the block */
-	xfs_rtword_t	*bufp;		/* starting word in buffer */
-	int		error;		/* error value */
-	xfs_rtblock_t	firstbit;	/* first useful bit in the word */
-	xfs_rtblock_t	i;		/* current bit number rel. to start */
-	xfs_rtblock_t	len;		/* length of inspected area */
-	xfs_rtword_t	mask;		/* mask of relevant bits for value */
-	xfs_rtword_t	want;		/* mask for "good" values */
-	xfs_rtword_t	wdiff;		/* difference from wanted value */
-	int		word;		/* word number in the buffer */
-
-	/*
-	 * Compute and read in starting bitmap block for starting block.
-	 */
-	block = XFS_BITTOBLOCK(mp, start);
-	error = xfs_rtbuf_get(mp, tp, block, 0, &bp);
-	if (error) {
-		return error;
-	}
-	bufp = bp->b_addr;
-	/*
-	 * Get the first word's index & point to it.
-	 */
-	word = XFS_BITTOWORD(mp, start);
-	b = &bufp[word];
-	bit = (int)(start & (XFS_NBWORD - 1));
-	len = start - limit + 1;
-	/*
-	 * Compute match value, based on the bit at start: if 1 (free)
-	 * then all-ones, else all-zeroes.
-	 */
-	want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0;
-	/*
-	 * If the starting position is not word-aligned, deal with the
-	 * partial word.
-	 */
-	if (bit < XFS_NBWORD - 1) {
-		/*
-		 * Calculate first (leftmost) bit number to look at,
-		 * and mask for all the relevant bits in this word.
-		 */
-		firstbit = XFS_RTMAX((xfs_srtblock_t)(bit - len + 1), 0);
-		mask = (((xfs_rtword_t)1 << (bit - firstbit + 1)) - 1) <<
-			firstbit;
-		/*
-		 * Calculate the difference between the value there
-		 * and what we're looking for.
-		 */
-		if ((wdiff = (*b ^ want) & mask)) {
-			/*
-			 * Different.  Mark where we are and return.
-			 */
-			xfs_trans_brelse(tp, bp);
-			i = bit - XFS_RTHIBIT(wdiff);
-			*rtblock = start - i + 1;
-			return 0;
-		}
-		i = bit - firstbit + 1;
-		/*
-		 * Go on to previous block if that's where the previous word is
-		 * and we need the previous word.
-		 */
-		if (--word == -1 && i < len) {
-			/*
-			 * If done with this block, get the previous one.
-			 */
-			xfs_trans_brelse(tp, bp);
-			error = xfs_rtbuf_get(mp, tp, --block, 0, &bp);
-			if (error) {
-				return error;
-			}
-			bufp = bp->b_addr;
-			word = XFS_BLOCKWMASK(mp);
-			b = &bufp[word];
-		} else {
-			/*
-			 * Go on to the previous word in the buffer.
-			 */
-			b--;
-		}
-	} else {
-		/*
-		 * Starting on a word boundary, no partial word.
-		 */
-		i = 0;
-	}
-	/*
-	 * Loop over whole words in buffers.  When we use up one buffer
-	 * we move on to the previous one.
-	 */
-	while (len - i >= XFS_NBWORD) {
-		/*
-		 * Compute difference between actual and desired value.
-		 */
-		if ((wdiff = *b ^ want)) {
-			/*
-			 * Different, mark where we are and return.
-			 */
-			xfs_trans_brelse(tp, bp);
-			i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff);
-			*rtblock = start - i + 1;
-			return 0;
-		}
-		i += XFS_NBWORD;
-		/*
-		 * Go on to previous block if that's where the previous word is
-		 * and we need the previous word.
-		 */
-		if (--word == -1 && i < len) {
-			/*
-			 * If done with this block, get the previous one.
-			 */
-			xfs_trans_brelse(tp, bp);
-			error = xfs_rtbuf_get(mp, tp, --block, 0, &bp);
-			if (error) {
-				return error;
-			}
-			bufp = bp->b_addr;
-			word = XFS_BLOCKWMASK(mp);
-			b = &bufp[word];
-		} else {
-			/*
-			 * Go on to the previous word in the buffer.
-			 */
-			b--;
-		}
-	}
-	/*
-	 * If not ending on a word boundary, deal with the last
-	 * (partial) word.
-	 */
-	if (len - i) {
-		/*
-		 * Calculate first (leftmost) bit number to look at,
-		 * and mask for all the relevant bits in this word.
-		 */
-		firstbit = XFS_NBWORD - (len - i);
-		mask = (((xfs_rtword_t)1 << (len - i)) - 1) << firstbit;
-		/*
-		 * Compute difference between actual and desired value.
-		 */
-		if ((wdiff = (*b ^ want) & mask)) {
-			/*
-			 * Different, mark where we are and return.
-			 */
-			xfs_trans_brelse(tp, bp);
-			i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff);
-			*rtblock = start - i + 1;
-			return 0;
-		} else
-			i = len;
-	}
-	/*
-	 * No match, return that we scanned the whole area.
-	 */
-	xfs_trans_brelse(tp, bp);
-	*rtblock = start - i + 1;
-	return 0;
-}
-
-/*
- * Searching forward from start to limit, find the first block whose
- * allocated/free state is different from start's.
- */
-STATIC int				/* error */
-xfs_rtfind_forw(
-	xfs_mount_t	*mp,		/* file system mount point */
-	xfs_trans_t	*tp,		/* transaction pointer */
-	xfs_rtblock_t	start,		/* starting block to look at */
-	xfs_rtblock_t	limit,		/* last block to look at */
-	xfs_rtblock_t	*rtblock)	/* out: start block found */
-{
-	xfs_rtword_t	*b;		/* current word in buffer */
-	int		bit;		/* bit number in the word */
-	xfs_rtblock_t	block;		/* bitmap block number */
-	xfs_buf_t	*bp;		/* buf for the block */
-	xfs_rtword_t	*bufp;		/* starting word in buffer */
-	int		error;		/* error value */
-	xfs_rtblock_t	i;		/* current bit number rel. to start */
-	xfs_rtblock_t	lastbit;	/* last useful bit in the word */
-	xfs_rtblock_t	len;		/* length of inspected area */
-	xfs_rtword_t	mask;		/* mask of relevant bits for value */
-	xfs_rtword_t	want;		/* mask for "good" values */
-	xfs_rtword_t	wdiff;		/* difference from wanted value */
-	int		word;		/* word number in the buffer */
-
-	/*
-	 * Compute and read in starting bitmap block for starting block.
-	 */
-	block = XFS_BITTOBLOCK(mp, start);
-	error = xfs_rtbuf_get(mp, tp, block, 0, &bp);
-	if (error) {
-		return error;
-	}
-	bufp = bp->b_addr;
-	/*
-	 * Get the first word's index & point to it.
-	 */
-	word = XFS_BITTOWORD(mp, start);
-	b = &bufp[word];
-	bit = (int)(start & (XFS_NBWORD - 1));
-	len = limit - start + 1;
-	/*
-	 * Compute match value, based on the bit at start: if 1 (free)
-	 * then all-ones, else all-zeroes.
-	 */
-	want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0;
-	/*
-	 * If the starting position is not word-aligned, deal with the
-	 * partial word.
-	 */
-	if (bit) {
-		/*
-		 * Calculate last (rightmost) bit number to look at,
-		 * and mask for all the relevant bits in this word.
-		 */
-		lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
-		mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit;
-		/*
-		 * Calculate the difference between the value there
-		 * and what we're looking for.
-		 */
-		if ((wdiff = (*b ^ want) & mask)) {
-			/*
-			 * Different.  Mark where we are and return.
-			 */
-			xfs_trans_brelse(tp, bp);
-			i = XFS_RTLOBIT(wdiff) - bit;
-			*rtblock = start + i - 1;
-			return 0;
-		}
-		i = lastbit - bit;
-		/*
-		 * Go on to next block if that's where the next word is
-		 * and we need the next word.
-		 */
-		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
-			/*
-			 * If done with this block, get the previous one.
-			 */
-			xfs_trans_brelse(tp, bp);
-			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
-			if (error) {
-				return error;
-			}
-			b = bufp = bp->b_addr;
-			word = 0;
-		} else {
-			/*
-			 * Go on to the previous word in the buffer.
-			 */
-			b++;
-		}
-	} else {
-		/*
-		 * Starting on a word boundary, no partial word.
-		 */
-		i = 0;
-	}
-	/*
-	 * Loop over whole words in buffers.  When we use up one buffer
-	 * we move on to the next one.
-	 */
-	while (len - i >= XFS_NBWORD) {
-		/*
-		 * Compute difference between actual and desired value.
-		 */
-		if ((wdiff = *b ^ want)) {
-			/*
-			 * Different, mark where we are and return.
-			 */
-			xfs_trans_brelse(tp, bp);
-			i += XFS_RTLOBIT(wdiff);
-			*rtblock = start + i - 1;
-			return 0;
-		}
-		i += XFS_NBWORD;
-		/*
-		 * Go on to next block if that's where the next word is
-		 * and we need the next word.
-		 */
-		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
-			/*
-			 * If done with this block, get the next one.
-			 */
-			xfs_trans_brelse(tp, bp);
-			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
-			if (error) {
-				return error;
-			}
-			b = bufp = bp->b_addr;
-			word = 0;
-		} else {
-			/*
-			 * Go on to the next word in the buffer.
-			 */
-			b++;
-		}
-	}
-	/*
-	 * If not ending on a word boundary, deal with the last
-	 * (partial) word.
-	 */
-	if ((lastbit = len - i)) {
-		/*
-		 * Calculate mask for all the relevant bits in this word.
-		 */
-		mask = ((xfs_rtword_t)1 << lastbit) - 1;
-		/*
-		 * Compute difference between actual and desired value.
-		 */
-		if ((wdiff = (*b ^ want) & mask)) {
-			/*
-			 * Different, mark where we are and return.
-			 */
-			xfs_trans_brelse(tp, bp);
-			i += XFS_RTLOBIT(wdiff);
-			*rtblock = start + i - 1;
-			return 0;
-		} else
-			i = len;
-	}
-	/*
-	 * No match, return that we scanned the whole area.
-	 */
-	xfs_trans_brelse(tp, bp);
-	*rtblock = start + i - 1;
-	return 0;
-}
-
-/*
- * Mark an extent specified by start and len freed.
- * Updates all the summary information as well as the bitmap.
- */
-STATIC int				/* error */
-xfs_rtfree_range(
-	xfs_mount_t	*mp,		/* file system mount point */
-	xfs_trans_t	*tp,		/* transaction pointer */
-	xfs_rtblock_t	start,		/* starting block to free */
-	xfs_extlen_t	len,		/* length to free */
-	xfs_buf_t	**rbpp,		/* in/out: summary block buffer */
-	xfs_fsblock_t	*rsb)		/* in/out: summary block number */
-{
-	xfs_rtblock_t	end;		/* end of the freed extent */
-	int		error;		/* error value */
-	xfs_rtblock_t	postblock = 0;	/* first block freed > end */
-	xfs_rtblock_t	preblock = 0;	/* first block freed < start */
-
-	end = start + len - 1;
-	/*
-	 * Modify the bitmap to mark this extent freed.
-	 */
-	error = xfs_rtmodify_range(mp, tp, start, len, 1);
-	if (error) {
-		return error;
-	}
-	/*
-	 * Assume we're freeing out of the middle of an allocated extent.
-	 * We need to find the beginning and end of the extent so we can
-	 * properly update the summary.
-	 */
-	error = xfs_rtfind_back(mp, tp, start, 0, &preblock);
-	if (error) {
-		return error;
-	}
-	/*
-	 * Find the next allocated block (end of allocated extent).
-	 */
-	error = xfs_rtfind_forw(mp, tp, end, mp->m_sb.sb_rextents - 1,
-		&postblock);
-	if (error)
-		return error;
-	/*
-	 * If there are blocks not being freed at the front of the
-	 * old extent, add summary data for them to be allocated.
-	 */
-	if (preblock < start) {
-		error = xfs_rtmodify_summary(mp, tp,
-			XFS_RTBLOCKLOG(start - preblock),
-			XFS_BITTOBLOCK(mp, preblock), -1, rbpp, rsb);
-		if (error) {
-			return error;
-		}
-	}
-	/*
-	 * If there are blocks not being freed at the end of the
-	 * old extent, add summary data for them to be allocated.
-	 */
-	if (postblock > end) {
-		error = xfs_rtmodify_summary(mp, tp,
-			XFS_RTBLOCKLOG(postblock - end),
-			XFS_BITTOBLOCK(mp, end + 1), -1, rbpp, rsb);
-		if (error) {
-			return error;
-		}
-	}
-	/*
-	 * Increment the summary information corresponding to the entire
-	 * (new) free extent.
-	 */
-	error = xfs_rtmodify_summary(mp, tp,
-		XFS_RTBLOCKLOG(postblock + 1 - preblock),
-		XFS_BITTOBLOCK(mp, preblock), 1, rbpp, rsb);
-	return error;
-}
-
-/*
- * Set the given range of bitmap bits to the given value.
- * Do whatever I/O and logging is required.
- */
-STATIC int				/* error */
-xfs_rtmodify_range(
-	xfs_mount_t	*mp,		/* file system mount point */
-	xfs_trans_t	*tp,		/* transaction pointer */
-	xfs_rtblock_t	start,		/* starting block to modify */
-	xfs_extlen_t	len,		/* length of extent to modify */
-	int		val)		/* 1 for free, 0 for allocated */
-{
-	xfs_rtword_t	*b;		/* current word in buffer */
-	int		bit;		/* bit number in the word */
-	xfs_rtblock_t	block;		/* bitmap block number */
-	xfs_buf_t	*bp;		/* buf for the block */
-	xfs_rtword_t	*bufp;		/* starting word in buffer */
-	int		error;		/* error value */
-	xfs_rtword_t	*first;		/* first used word in the buffer */
-	int		i;		/* current bit number rel. to start */
-	int		lastbit;	/* last useful bit in word */
-	xfs_rtword_t	mask;		/* mask o frelevant bits for value */
-	int		word;		/* word number in the buffer */
-
-	/*
-	 * Compute starting bitmap block number.
-	 */
-	block = XFS_BITTOBLOCK(mp, start);
-	/*
-	 * Read the bitmap block, and point to its data.
-	 */
-	error = xfs_rtbuf_get(mp, tp, block, 0, &bp);
-	if (error) {
-		return error;
-	}
-	bufp = bp->b_addr;
-	/*
-	 * Compute the starting word's address, and starting bit.
-	 */
-	word = XFS_BITTOWORD(mp, start);
-	first = b = &bufp[word];
-	bit = (int)(start & (XFS_NBWORD - 1));
-	/*
-	 * 0 (allocated) => all zeroes; 1 (free) => all ones.
-	 */
-	val = -val;
-	/*
-	 * If not starting on a word boundary, deal with the first
-	 * (partial) word.
-	 */
-	if (bit) {
-		/*
-		 * Compute first bit not changed and mask of relevant bits.
-		 */
-		lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
-		mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit;
-		/*
-		 * Set/clear the active bits.
-		 */
-		if (val)
-			*b |= mask;
-		else
-			*b &= ~mask;
-		i = lastbit - bit;
-		/*
-		 * Go on to the next block if that's where the next word is
-		 * and we need the next word.
-		 */
-		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
-			/*
-			 * Log the changed part of this block.
-			 * Get the next one.
-			 */
-			xfs_trans_log_buf(tp, bp,
-				(uint)((char *)first - (char *)bufp),
-				(uint)((char *)b - (char *)bufp));
-			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
-			if (error) {
-				return error;
-			}
-			first = b = bufp = bp->b_addr;
-			word = 0;
-		} else {
-			/*
-			 * Go on to the next word in the buffer
-			 */
-			b++;
-		}
-	} else {
-		/*
-		 * Starting on a word boundary, no partial word.
-		 */
-		i = 0;
-	}
-	/*
-	 * Loop over whole words in buffers.  When we use up one buffer
-	 * we move on to the next one.
-	 */
-	while (len - i >= XFS_NBWORD) {
-		/*
-		 * Set the word value correctly.
-		 */
-		*b = val;
-		i += XFS_NBWORD;
-		/*
-		 * Go on to the next block if that's where the next word is
-		 * and we need the next word.
-		 */
-		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
-			/*
-			 * Log the changed part of this block.
-			 * Get the next one.
-			 */
-			xfs_trans_log_buf(tp, bp,
-				(uint)((char *)first - (char *)bufp),
-				(uint)((char *)b - (char *)bufp));
-			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
-			if (error) {
-				return error;
-			}
-			first = b = bufp = bp->b_addr;
-			word = 0;
-		} else {
-			/*
-			 * Go on to the next word in the buffer
-			 */
-			b++;
-		}
-	}
-	/*
-	 * If not ending on a word boundary, deal with the last
-	 * (partial) word.
-	 */
-	if ((lastbit = len - i)) {
-		/*
-		 * Compute a mask of relevant bits.
-		 */
-		bit = 0;
-		mask = ((xfs_rtword_t)1 << lastbit) - 1;
-		/*
-		 * Set/clear the active bits.
-		 */
-		if (val)
-			*b |= mask;
-		else
-			*b &= ~mask;
-		b++;
-	}
-	/*
-	 * Log any remaining changed bytes.
-	 */
-	if (b > first)
-		xfs_trans_log_buf(tp, bp, (uint)((char *)first - (char *)bufp),
-			(uint)((char *)b - (char *)bufp - 1));
-	return 0;
-}
-
-/*
- * Read and modify the summary information for a given extent size,
- * bitmap block combination.
- * Keeps track of a current summary block, so we don't keep reading
- * it from the buffer cache.
- */
-STATIC int				/* error */
-xfs_rtmodify_summary(
-	xfs_mount_t	*mp,		/* file system mount point */
-	xfs_trans_t	*tp,		/* transaction pointer */
-	int		log,		/* log2 of extent size */
-	xfs_rtblock_t	bbno,		/* bitmap block number */
-	int		delta,		/* change to make to summary info */
-	xfs_buf_t	**rbpp,		/* in/out: summary block buffer */
-	xfs_fsblock_t	*rsb)		/* in/out: summary block number */
-{
-	xfs_buf_t	*bp;		/* buffer for the summary block */
-	int		error;		/* error value */
-	xfs_fsblock_t	sb;		/* summary fsblock */
-	int		so;		/* index into the summary file */
-	xfs_suminfo_t	*sp;		/* pointer to returned data */
-
-	/*
-	 * Compute entry number in the summary file.
-	 */
-	so = XFS_SUMOFFS(mp, log, bbno);
-	/*
-	 * Compute the block number in the summary file.
-	 */
-	sb = XFS_SUMOFFSTOBLOCK(mp, so);
-	/*
-	 * If we have an old buffer, and the block number matches, use that.
-	 */
-	if (rbpp && *rbpp && *rsb == sb)
-		bp = *rbpp;
-	/*
-	 * Otherwise we have to get the buffer.
-	 */
-	else {
-		/*
-		 * If there was an old one, get rid of it first.
-		 */
-		if (rbpp && *rbpp)
-			xfs_trans_brelse(tp, *rbpp);
-		error = xfs_rtbuf_get(mp, tp, sb, 1, &bp);
-		if (error) {
-			return error;
-		}
-		/*
-		 * Remember this buffer and block for the next call.
-		 */
-		if (rbpp) {
-			*rbpp = bp;
-			*rsb = sb;
-		}
-	}
-	/*
-	 * Point to the summary information, modify and log it.
-	 */
-	sp = XFS_SUMPTR(mp, bp, so);
-	*sp += delta;
-	xfs_trans_log_buf(tp, bp, (uint)((char *)sp - (char *)bp->b_addr),
-		(uint)((char *)sp - (char *)bp->b_addr + sizeof(*sp) - 1));
-	return 0;
-}
-
-/*
- * Free an extent in the realtime subvolume.  Length is expressed in
- * realtime extents, as is the block number.
- */
-int					/* error */
-xfs_rtfree_extent(
-	xfs_trans_t	*tp,		/* transaction pointer */
-	xfs_rtblock_t	bno,		/* starting block number to free */
-	xfs_extlen_t	len)		/* length of extent freed */
-{
-	int		error;		/* error value */
-	xfs_mount_t	*mp;		/* file system mount structure */
-	xfs_fsblock_t	sb;		/* summary file block number */
-	xfs_buf_t	*sumbp;		/* summary file block buffer */
-
-	mp = tp->t_mountp;
-
-	ASSERT(mp->m_rbmip->i_itemp != NULL);
-	ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL));
-
-#ifdef DEBUG
-	/*
-	 * Check to see that this whole range is currently allocated.
-	 */
-	{
-		int	stat;		/* result from checking range */
-
-		error = xfs_rtcheck_alloc_range(mp, tp, bno, len, &stat);
-		if (error) {
-			return error;
-		}
-		ASSERT(stat);
-	}
-#endif
-	sumbp = NULL;
-	/*
-	 * Free the range of realtime blocks.
-	 */
-	error = xfs_rtfree_range(mp, tp, bno, len, &sumbp, &sb);
-	if (error) {
-		return error;
-	}
-	/*
-	 * Mark more blocks free in the superblock.
-	 */
-	xfs_trans_mod_sb(tp, XFS_TRANS_SB_FREXTENTS, (long)len);
-	/*
-	 * If we've now freed all the blocks, reset the file sequence
-	 * number to 0.
-	 */
-	if (tp->t_frextents_delta + mp->m_sb.sb_frextents ==
-	    mp->m_sb.sb_rextents) {
-		if (!(mp->m_rbmip->i_d.di_flags & XFS_DIFLAG_NEWRTBM))
-			mp->m_rbmip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM;
-		*(__uint64_t *)&mp->m_rbmip->i_d.di_atime = 0;
-		xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE);
-	}
-	return 0;
-}
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
new file mode 100644
index 0000000..f24b9bd
--- /dev/null
+++ b/libxfs/xfs_rtbitmap.c
@@ -0,0 +1,951 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+
+/*
+ * Realtime allocator bitmap functions shared with userspace.
+ */
+
+/*
+ * Get a buffer for the bitmap or summary file block specified.
+ * The buffer is returned read and locked.
+ */
+int
+xfs_rtbuf_get(
+	xfs_mount_t	*mp,		/* file system mount structure */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	block,		/* block number in bitmap or summary */
+	int		issum,		/* is summary not bitmap */
+	xfs_buf_t	**bpp)		/* output: buffer for the block */
+{
+	xfs_buf_t	*bp;		/* block buffer, result */
+	xfs_inode_t	*ip;		/* bitmap or summary inode */
+	xfs_bmbt_irec_t	map;
+	int		nmap = 1;
+	int		error;		/* error value */
+
+	ip = issum ? mp->m_rsumip : mp->m_rbmip;
+
+	error = xfs_bmapi_read(ip, block, 1, &map, &nmap, XFS_DATA_FORK);
+	if (error)
+		return error;
+
+	ASSERT(map.br_startblock != NULLFSBLOCK);
+	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
+				   XFS_FSB_TO_DADDR(mp, map.br_startblock),
+				   mp->m_bsize, 0, &bp, NULL);
+	if (error)
+		return error;
+	ASSERT(!xfs_buf_geterror(bp));
+	*bpp = bp;
+	return 0;
+}
+
+/*
+ * Searching backward from start to limit, find the first block whose
+ * allocated/free state is different from start's.
+ */
+int
+xfs_rtfind_back(
+	xfs_mount_t	*mp,		/* file system mount point */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	start,		/* starting block to look at */
+	xfs_rtblock_t	limit,		/* last block to look at */
+	xfs_rtblock_t	*rtblock)	/* out: start block found */
+{
+	xfs_rtword_t	*b;		/* current word in buffer */
+	int		bit;		/* bit number in the word */
+	xfs_rtblock_t	block;		/* bitmap block number */
+	xfs_buf_t	*bp;		/* buf for the block */
+	xfs_rtword_t	*bufp;		/* starting word in buffer */
+	int		error;		/* error value */
+	xfs_rtblock_t	firstbit;	/* first useful bit in the word */
+	xfs_rtblock_t	i;		/* current bit number rel. to start */
+	xfs_rtblock_t	len;		/* length of inspected area */
+	xfs_rtword_t	mask;		/* mask of relevant bits for value */
+	xfs_rtword_t	want;		/* mask for "good" values */
+	xfs_rtword_t	wdiff;		/* difference from wanted value */
+	int		word;		/* word number in the buffer */
+
+	/*
+	 * Compute and read in starting bitmap block for starting block.
+	 */
+	block = XFS_BITTOBLOCK(mp, start);
+	error = xfs_rtbuf_get(mp, tp, block, 0, &bp);
+	if (error) {
+		return error;
+	}
+	bufp = bp->b_addr;
+	/*
+	 * Get the first word's index & point to it.
+	 */
+	word = XFS_BITTOWORD(mp, start);
+	b = &bufp[word];
+	bit = (int)(start & (XFS_NBWORD - 1));
+	len = start - limit + 1;
+	/*
+	 * Compute match value, based on the bit at start: if 1 (free)
+	 * then all-ones, else all-zeroes.
+	 */
+	want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0;
+	/*
+	 * If the starting position is not word-aligned, deal with the
+	 * partial word.
+	 */
+	if (bit < XFS_NBWORD - 1) {
+		/*
+		 * Calculate first (leftmost) bit number to look at,
+		 * and mask for all the relevant bits in this word.
+		 */
+		firstbit = XFS_RTMAX((xfs_srtblock_t)(bit - len + 1), 0);
+		mask = (((xfs_rtword_t)1 << (bit - firstbit + 1)) - 1) <<
+			firstbit;
+		/*
+		 * Calculate the difference between the value there
+		 * and what we're looking for.
+		 */
+		if ((wdiff = (*b ^ want) & mask)) {
+			/*
+			 * Different.  Mark where we are and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i = bit - XFS_RTHIBIT(wdiff);
+			*rtblock = start - i + 1;
+			return 0;
+		}
+		i = bit - firstbit + 1;
+		/*
+		 * Go on to previous block if that's where the previous word is
+		 * and we need the previous word.
+		 */
+		if (--word == -1 && i < len) {
+			/*
+			 * If done with this block, get the previous one.
+			 */
+			xfs_trans_brelse(tp, bp);
+			error = xfs_rtbuf_get(mp, tp, --block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			bufp = bp->b_addr;
+			word = XFS_BLOCKWMASK(mp);
+			b = &bufp[word];
+		} else {
+			/*
+			 * Go on to the previous word in the buffer.
+			 */
+			b--;
+		}
+	} else {
+		/*
+		 * Starting on a word boundary, no partial word.
+		 */
+		i = 0;
+	}
+	/*
+	 * Loop over whole words in buffers.  When we use up one buffer
+	 * we move on to the previous one.
+	 */
+	while (len - i >= XFS_NBWORD) {
+		/*
+		 * Compute difference between actual and desired value.
+		 */
+		if ((wdiff = *b ^ want)) {
+			/*
+			 * Different, mark where we are and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff);
+			*rtblock = start - i + 1;
+			return 0;
+		}
+		i += XFS_NBWORD;
+		/*
+		 * Go on to previous block if that's where the previous word is
+		 * and we need the previous word.
+		 */
+		if (--word == -1 && i < len) {
+			/*
+			 * If done with this block, get the previous one.
+			 */
+			xfs_trans_brelse(tp, bp);
+			error = xfs_rtbuf_get(mp, tp, --block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			bufp = bp->b_addr;
+			word = XFS_BLOCKWMASK(mp);
+			b = &bufp[word];
+		} else {
+			/*
+			 * Go on to the previous word in the buffer.
+			 */
+			b--;
+		}
+	}
+	/*
+	 * If not ending on a word boundary, deal with the last
+	 * (partial) word.
+	 */
+	if (len - i) {
+		/*
+		 * Calculate first (leftmost) bit number to look at,
+		 * and mask for all the relevant bits in this word.
+		 */
+		firstbit = XFS_NBWORD - (len - i);
+		mask = (((xfs_rtword_t)1 << (len - i)) - 1) << firstbit;
+		/*
+		 * Compute difference between actual and desired value.
+		 */
+		if ((wdiff = (*b ^ want) & mask)) {
+			/*
+			 * Different, mark where we are and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff);
+			*rtblock = start - i + 1;
+			return 0;
+		} else
+			i = len;
+	}
+	/*
+	 * No match, return that we scanned the whole area.
+	 */
+	xfs_trans_brelse(tp, bp);
+	*rtblock = start - i + 1;
+	return 0;
+}
+
+/*
+ * Searching forward from start to limit, find the first block whose
+ * allocated/free state is different from start's.
+ */
+int
+xfs_rtfind_forw(
+	xfs_mount_t	*mp,		/* file system mount point */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	start,		/* starting block to look at */
+	xfs_rtblock_t	limit,		/* last block to look at */
+	xfs_rtblock_t	*rtblock)	/* out: start block found */
+{
+	xfs_rtword_t	*b;		/* current word in buffer */
+	int		bit;		/* bit number in the word */
+	xfs_rtblock_t	block;		/* bitmap block number */
+	xfs_buf_t	*bp;		/* buf for the block */
+	xfs_rtword_t	*bufp;		/* starting word in buffer */
+	int		error;		/* error value */
+	xfs_rtblock_t	i;		/* current bit number rel. to start */
+	xfs_rtblock_t	lastbit;	/* last useful bit in the word */
+	xfs_rtblock_t	len;		/* length of inspected area */
+	xfs_rtword_t	mask;		/* mask of relevant bits for value */
+	xfs_rtword_t	want;		/* mask for "good" values */
+	xfs_rtword_t	wdiff;		/* difference from wanted value */
+	int		word;		/* word number in the buffer */
+
+	/*
+	 * Compute and read in starting bitmap block for starting block.
+	 */
+	block = XFS_BITTOBLOCK(mp, start);
+	error = xfs_rtbuf_get(mp, tp, block, 0, &bp);
+	if (error) {
+		return error;
+	}
+	bufp = bp->b_addr;
+	/*
+	 * Get the first word's index & point to it.
+	 */
+	word = XFS_BITTOWORD(mp, start);
+	b = &bufp[word];
+	bit = (int)(start & (XFS_NBWORD - 1));
+	len = limit - start + 1;
+	/*
+	 * Compute match value, based on the bit at start: if 1 (free)
+	 * then all-ones, else all-zeroes.
+	 */
+	want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0;
+	/*
+	 * If the starting position is not word-aligned, deal with the
+	 * partial word.
+	 */
+	if (bit) {
+		/*
+		 * Calculate last (rightmost) bit number to look at,
+		 * and mask for all the relevant bits in this word.
+		 */
+		lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
+		mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit;
+		/*
+		 * Calculate the difference between the value there
+		 * and what we're looking for.
+		 */
+		if ((wdiff = (*b ^ want) & mask)) {
+			/*
+			 * Different.  Mark where we are and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i = XFS_RTLOBIT(wdiff) - bit;
+			*rtblock = start + i - 1;
+			return 0;
+		}
+		i = lastbit - bit;
+		/*
+		 * Go on to next block if that's where the next word is
+		 * and we need the next word.
+		 */
+		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
+			/*
+			 * If done with this block, get the next one.
+			 */
+			xfs_trans_brelse(tp, bp);
+			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			b = bufp = bp->b_addr;
+			word = 0;
+		} else {
+			/*
+			 * Go on to the next word in the buffer.
+			 */
+			b++;
+		}
+	} else {
+		/*
+		 * Starting on a word boundary, no partial word.
+		 */
+		i = 0;
+	}
+	/*
+	 * Loop over whole words in buffers.  When we use up one buffer
+	 * we move on to the next one.
+	 */
+	while (len - i >= XFS_NBWORD) {
+		/*
+		 * Compute difference between actual and desired value.
+		 */
+		if ((wdiff = *b ^ want)) {
+			/*
+			 * Different, mark where we are and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i += XFS_RTLOBIT(wdiff);
+			*rtblock = start + i - 1;
+			return 0;
+		}
+		i += XFS_NBWORD;
+		/*
+		 * Go on to next block if that's where the next word is
+		 * and we need the next word.
+		 */
+		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
+			/*
+			 * If done with this block, get the next one.
+			 */
+			xfs_trans_brelse(tp, bp);
+			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			b = bufp = bp->b_addr;
+			word = 0;
+		} else {
+			/*
+			 * Go on to the next word in the buffer.
+			 */
+			b++;
+		}
+	}
+	/*
+	 * If not ending on a word boundary, deal with the last
+	 * (partial) word.
+	 */
+	if ((lastbit = len - i)) {
+		/*
+		 * Calculate mask for all the relevant bits in this word.
+		 */
+		mask = ((xfs_rtword_t)1 << lastbit) - 1;
+		/*
+		 * Compute difference between actual and desired value.
+		 */
+		if ((wdiff = (*b ^ want) & mask)) {
+			/*
+			 * Different, mark where we are and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i += XFS_RTLOBIT(wdiff);
+			*rtblock = start + i - 1;
+			return 0;
+		} else
+			i = len;
+	}
+	/*
+	 * No match, return that we scanned the whole area.
+	 */
+	xfs_trans_brelse(tp, bp);
+	*rtblock = start + i - 1;
+	return 0;
+}
+
+/*
+ * Read and modify the summary information for a given extent size,
+ * bitmap block combination.
+ * Keeps track of a current summary block, so we don't keep reading
+ * it from the buffer cache.
+ */
+int
+xfs_rtmodify_summary(
+	xfs_mount_t	*mp,		/* file system mount point */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	int		log,		/* log2 of extent size */
+	xfs_rtblock_t	bbno,		/* bitmap block number */
+	int		delta,		/* change to make to summary info */
+	xfs_buf_t	**rbpp,		/* in/out: summary block buffer */
+	xfs_fsblock_t	*rsb)		/* in/out: summary block number */
+{
+	xfs_buf_t	*bp;		/* buffer for the summary block */
+	int		error;		/* error value */
+	xfs_fsblock_t	sb;		/* summary fsblock */
+	int		so;		/* index into the summary file */
+	xfs_suminfo_t	*sp;		/* pointer to returned data */
+
+	/*
+	 * Compute entry number in the summary file.
+	 */
+	so = XFS_SUMOFFS(mp, log, bbno);
+	/*
+	 * Compute the block number in the summary file.
+	 */
+	sb = XFS_SUMOFFSTOBLOCK(mp, so);
+	/*
+	 * If we have an old buffer, and the block number matches, use that.
+	 */
+	if (rbpp && *rbpp && *rsb == sb)
+		bp = *rbpp;
+	/*
+	 * Otherwise we have to get the buffer.
+	 */
+	else {
+		/*
+		 * If there was an old one, get rid of it first.
+		 */
+		if (rbpp && *rbpp)
+			xfs_trans_brelse(tp, *rbpp);
+		error = xfs_rtbuf_get(mp, tp, sb, 1, &bp);
+		if (error) {
+			return error;
+		}
+		/*
+		 * Remember this buffer and block for the next call.
+		 */
+		if (rbpp) {
+			*rbpp = bp;
+			*rsb = sb;
+		}
+	}
+	/*
+	 * Point to the summary information, modify and log it.
+	 */
+	sp = XFS_SUMPTR(mp, bp, so);
+	*sp += delta;
+	xfs_trans_log_buf(tp, bp, (uint)((char *)sp - (char *)bp->b_addr),
+		(uint)((char *)sp - (char *)bp->b_addr + sizeof(*sp) - 1));
+	return 0;
+}
+
+/*
+ * Set the given range of bitmap bits to the given value.
+ * Do whatever I/O and logging is required.
+ */
+int
+xfs_rtmodify_range(
+	xfs_mount_t	*mp,		/* file system mount point */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	start,		/* starting block to modify */
+	xfs_extlen_t	len,		/* length of extent to modify */
+	int		val)		/* 1 for free, 0 for allocated */
+{
+	xfs_rtword_t	*b;		/* current word in buffer */
+	int		bit;		/* bit number in the word */
+	xfs_rtblock_t	block;		/* bitmap block number */
+	xfs_buf_t	*bp;		/* buf for the block */
+	xfs_rtword_t	*bufp;		/* starting word in buffer */
+	int		error;		/* error value */
+	xfs_rtword_t	*first;		/* first used word in the buffer */
+	int		i;		/* current bit number rel. to start */
+	int		lastbit;	/* last useful bit in word */
+	xfs_rtword_t	mask;		/* mask of relevant bits for value */
+	int		word;		/* word number in the buffer */
+
+	/*
+	 * Compute starting bitmap block number.
+	 */
+	block = XFS_BITTOBLOCK(mp, start);
+	/*
+	 * Read the bitmap block, and point to its data.
+	 */
+	error = xfs_rtbuf_get(mp, tp, block, 0, &bp);
+	if (error) {
+		return error;
+	}
+	bufp = bp->b_addr;
+	/*
+	 * Compute the starting word's address, and starting bit.
+	 */
+	word = XFS_BITTOWORD(mp, start);
+	first = b = &bufp[word];
+	bit = (int)(start & (XFS_NBWORD - 1));
+	/*
+	 * 0 (allocated) => all zeroes; 1 (free) => all ones.
+	 */
+	val = -val;
+	/*
+	 * If not starting on a word boundary, deal with the first
+	 * (partial) word.
+	 */
+	if (bit) {
+		/*
+		 * Compute first bit not changed and mask of relevant bits.
+		 */
+		lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
+		mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit;
+		/*
+		 * Set/clear the active bits.
+		 */
+		if (val)
+			*b |= mask;
+		else
+			*b &= ~mask;
+		i = lastbit - bit;
+		/*
+		 * Go on to the next block if that's where the next word is
+		 * and we need the next word.
+		 */
+		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
+			/*
+			 * Log the changed part of this block.
+			 * Get the next one.
+			 */
+			xfs_trans_log_buf(tp, bp,
+				(uint)((char *)first - (char *)bufp),
+				(uint)((char *)b - (char *)bufp));
+			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			first = b = bufp = bp->b_addr;
+			word = 0;
+		} else {
+			/*
+			 * Go on to the next word in the buffer
+			 */
+			b++;
+		}
+	} else {
+		/*
+		 * Starting on a word boundary, no partial word.
+		 */
+		i = 0;
+	}
+	/*
+	 * Loop over whole words in buffers.  When we use up one buffer
+	 * we move on to the next one.
+	 */
+	while (len - i >= XFS_NBWORD) {
+		/*
+		 * Set the word value correctly.
+		 */
+		*b = val;
+		i += XFS_NBWORD;
+		/*
+		 * Go on to the next block if that's where the next word is
+		 * and we need the next word.
+		 */
+		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
+			/*
+			 * Log the changed part of this block.
+			 * Get the next one.
+			 */
+			xfs_trans_log_buf(tp, bp,
+				(uint)((char *)first - (char *)bufp),
+				(uint)((char *)b - (char *)bufp));
+			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			first = b = bufp = bp->b_addr;
+			word = 0;
+		} else {
+			/*
+			 * Go on to the next word in the buffer
+			 */
+			b++;
+		}
+	}
+	/*
+	 * If not ending on a word boundary, deal with the last
+	 * (partial) word.
+	 */
+	if ((lastbit = len - i)) {
+		/*
+		 * Compute a mask of relevant bits.
+		 */
+		bit = 0;
+		mask = ((xfs_rtword_t)1 << lastbit) - 1;
+		/*
+		 * Set/clear the active bits.
+		 */
+		if (val)
+			*b |= mask;
+		else
+			*b &= ~mask;
+		b++;
+	}
+	/*
+	 * Log any remaining changed bytes.
+	 */
+	if (b > first)
+		xfs_trans_log_buf(tp, bp, (uint)((char *)first - (char *)bufp),
+			(uint)((char *)b - (char *)bufp - 1));
+	return 0;
+}
+
+/*
+ * Mark an extent specified by start and len freed.
+ * Updates all the summary information as well as the bitmap.
+ */
+int
+xfs_rtfree_range(
+	xfs_mount_t	*mp,		/* file system mount point */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	start,		/* starting block to free */
+	xfs_extlen_t	len,		/* length to free */
+	xfs_buf_t	**rbpp,		/* in/out: summary block buffer */
+	xfs_fsblock_t	*rsb)		/* in/out: summary block number */
+{
+	xfs_rtblock_t	end;		/* end of the freed extent */
+	int		error;		/* error value */
+	xfs_rtblock_t	postblock;	/* first block freed > end */
+	xfs_rtblock_t	preblock;	/* first block freed < start */
+
+	end = start + len - 1;
+	/*
+	 * Modify the bitmap to mark this extent freed.
+	 */
+	error = xfs_rtmodify_range(mp, tp, start, len, 1);
+	if (error) {
+		return error;
+	}
+	/*
+	 * Assume we're freeing out of the middle of an allocated extent.
+	 * We need to find the beginning and end of the extent so we can
+	 * properly update the summary.
+	 */
+	error = xfs_rtfind_back(mp, tp, start, 0, &preblock);
+	if (error) {
+		return error;
+	}
+	/*
+	 * Find the next allocated block (end of allocated extent).
+	 */
+	error = xfs_rtfind_forw(mp, tp, end, mp->m_sb.sb_rextents - 1,
+		&postblock);
+	if (error)
+		return error;
+	/*
+	 * If there are blocks not being freed at the front of the
+	 * old extent, add summary data for them to be allocated.
+	 */
+	if (preblock < start) {
+		error = xfs_rtmodify_summary(mp, tp,
+			XFS_RTBLOCKLOG(start - preblock),
+			XFS_BITTOBLOCK(mp, preblock), -1, rbpp, rsb);
+		if (error) {
+			return error;
+		}
+	}
+	/*
+	 * If there are blocks not being freed at the end of the
+	 * old extent, add summary data for them to be allocated.
+	 */
+	if (postblock > end) {
+		error = xfs_rtmodify_summary(mp, tp,
+			XFS_RTBLOCKLOG(postblock - end),
+			XFS_BITTOBLOCK(mp, end + 1), -1, rbpp, rsb);
+		if (error) {
+			return error;
+		}
+	}
+	/*
+	 * Increment the summary information corresponding to the entire
+	 * (new) free extent.
+	 */
+	error = xfs_rtmodify_summary(mp, tp,
+		XFS_RTBLOCKLOG(postblock + 1 - preblock),
+		XFS_BITTOBLOCK(mp, preblock), 1, rbpp, rsb);
+	return error;
+}
+
+/*
+ * Check that the given range is either all allocated (val = 0) or
+ * all free (val = 1).
+ */
+int
+xfs_rtcheck_range(
+	xfs_mount_t	*mp,		/* file system mount point */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	start,		/* starting block number of extent */
+	xfs_extlen_t	len,		/* length of extent */
+	int		val,		/* 1 for free, 0 for allocated */
+	xfs_rtblock_t	*new,		/* out: first block not matching */
+	int		*stat)		/* out: 1 for matches, 0 for not */
+{
+	xfs_rtword_t	*b;		/* current word in buffer */
+	int		bit;		/* bit number in the word */
+	xfs_rtblock_t	block;		/* bitmap block number */
+	xfs_buf_t	*bp;		/* buf for the block */
+	xfs_rtword_t	*bufp;		/* starting word in buffer */
+	int		error;		/* error value */
+	xfs_rtblock_t	i;		/* current bit number rel. to start */
+	xfs_rtblock_t	lastbit;	/* last useful bit in word */
+	xfs_rtword_t	mask;		/* mask of relevant bits for value */
+	xfs_rtword_t	wdiff;		/* difference from wanted value */
+	int		word;		/* word number in the buffer */
+
+	/*
+	 * Compute starting bitmap block number
+	 */
+	block = XFS_BITTOBLOCK(mp, start);
+	/*
+	 * Read the bitmap block.
+	 */
+	error = xfs_rtbuf_get(mp, tp, block, 0, &bp);
+	if (error) {
+		return error;
+	}
+	bufp = bp->b_addr;
+	/*
+	 * Compute the starting word's address, and starting bit.
+	 */
+	word = XFS_BITTOWORD(mp, start);
+	b = &bufp[word];
+	bit = (int)(start & (XFS_NBWORD - 1));
+	/*
+	 * 0 (allocated) => all zeroes; 1 (free) => all ones.
+	 */
+	val = -val;
+	/*
+	 * If not starting on a word boundary, deal with the first
+	 * (partial) word.
+	 */
+	if (bit) {
+		/*
+		 * Compute first bit not examined.
+		 */
+		lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
+		/*
+		 * Mask of relevant bits.
+		 */
+		mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit;
+		/*
+		 * Compute difference between actual and desired value.
+		 */
+		if ((wdiff = (*b ^ val) & mask)) {
+			/*
+			 * Different, compute first wrong bit and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i = XFS_RTLOBIT(wdiff) - bit;
+			*new = start + i;
+			*stat = 0;
+			return 0;
+		}
+		i = lastbit - bit;
+		/*
+		 * Go on to next block if that's where the next word is
+		 * and we need the next word.
+		 */
+		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
+			/*
+			 * If done with this block, get the next one.
+			 */
+			xfs_trans_brelse(tp, bp);
+			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			b = bufp = bp->b_addr;
+			word = 0;
+		} else {
+			/*
+			 * Go on to the next word in the buffer.
+			 */
+			b++;
+		}
+	} else {
+		/*
+		 * Starting on a word boundary, no partial word.
+		 */
+		i = 0;
+	}
+	/*
+	 * Loop over whole words in buffers.  When we use up one buffer
+	 * we move on to the next one.
+	 */
+	while (len - i >= XFS_NBWORD) {
+		/*
+		 * Compute difference between actual and desired value.
+		 */
+		if ((wdiff = *b ^ val)) {
+			/*
+			 * Different, compute first wrong bit and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i += XFS_RTLOBIT(wdiff);
+			*new = start + i;
+			*stat = 0;
+			return 0;
+		}
+		i += XFS_NBWORD;
+		/*
+		 * Go on to next block if that's where the next word is
+		 * and we need the next word.
+		 */
+		if (++word == XFS_BLOCKWSIZE(mp) && i < len) {
+			/*
+			 * If done with this block, get the next one.
+			 */
+			xfs_trans_brelse(tp, bp);
+			error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp);
+			if (error) {
+				return error;
+			}
+			b = bufp = bp->b_addr;
+			word = 0;
+		} else {
+			/*
+			 * Go on to the next word in the buffer.
+			 */
+			b++;
+		}
+	}
+	/*
+	 * If not ending on a word boundary, deal with the last
+	 * (partial) word.
+	 */
+	if ((lastbit = len - i)) {
+		/*
+		 * Mask of relevant bits.
+		 */
+		mask = ((xfs_rtword_t)1 << lastbit) - 1;
+		/*
+		 * Compute difference between actual and desired value.
+		 */
+		if ((wdiff = (*b ^ val) & mask)) {
+			/*
+			 * Different, compute first wrong bit and return.
+			 */
+			xfs_trans_brelse(tp, bp);
+			i += XFS_RTLOBIT(wdiff);
+			*new = start + i;
+			*stat = 0;
+			return 0;
+		} else
+			i = len;
+	}
+	/*
+	 * Successful, return.
+	 */
+	xfs_trans_brelse(tp, bp);
+	*new = start + i;
+	*stat = 1;
+	return 0;
+}
+
+#ifdef DEBUG
+/*
+ * Check that the given extent (block range) is allocated already.
+ */
+STATIC int				/* error */
+xfs_rtcheck_alloc_range(
+	xfs_mount_t	*mp,		/* file system mount point */
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	bno,		/* starting block number of extent */
+	xfs_extlen_t	len)		/* length of extent */
+{
+	xfs_rtblock_t	new;		/* dummy for xfs_rtcheck_range */
+	int		stat;
+	int		error;
+
+	error = xfs_rtcheck_range(mp, tp, bno, len, 0, &new, &stat);
+	if (error)
+		return error;
+	ASSERT(stat);
+	return 0;
+}
+#else
+#define xfs_rtcheck_alloc_range(m,t,b,l)	(0)
+#endif
+/*
+ * Free an extent in the realtime subvolume.  Length is expressed in
+ * realtime extents, as is the block number.
+ */
+int					/* error */
+xfs_rtfree_extent(
+	xfs_trans_t	*tp,		/* transaction pointer */
+	xfs_rtblock_t	bno,		/* starting block number to free */
+	xfs_extlen_t	len)		/* length of extent freed */
+{
+	int		error;		/* error value */
+	xfs_mount_t	*mp;		/* file system mount structure */
+	xfs_fsblock_t	sb;		/* summary file block number */
+	xfs_buf_t	*sumbp = NULL;	/* summary file block buffer */
+
+	mp = tp->t_mountp;
+
+	ASSERT(mp->m_rbmip->i_itemp != NULL);
+	ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL));
+
+	error = xfs_rtcheck_alloc_range(mp, tp, bno, len);
+	if (error)
+		return error;
+
+	/*
+	 * Free the range of realtime blocks.
+	 */
+	error = xfs_rtfree_range(mp, tp, bno, len, &sumbp, &sb);
+	if (error) {
+		return error;
+	}
+	/*
+	 * Mark more blocks free in the superblock.
+	 */
+	xfs_trans_mod_sb(tp, XFS_TRANS_SB_FREXTENTS, (long)len);
+	/*
+	 * If we've now freed all the blocks, reset the file sequence
+	 * number to 0.
+	 */
+	if (tp->t_frextents_delta + mp->m_sb.sb_frextents ==
+	    mp->m_sb.sb_rextents) {
+		if (!(mp->m_rbmip->i_d.di_flags & XFS_DIFLAG_NEWRTBM))
+			mp->m_rbmip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM;
+		*(__uint64_t *)&mp->m_rbmip->i_d.di_atime = 0;
+		xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE);
+	}
+	return 0;
+}
+
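Note, not part of the patch: the first-partial-word, whole-word and
last-partial-word structure that xfs_rtmodify_range() uses can be sketched
in isolation as below. This is a minimal illustration only. It assumes a
plain in-memory array instead of logged buffers, and the names NBWORD and
bitmap_modify_range are hypothetical stand-ins, not libxfs symbols.

```c
#include <assert.h>
#include <stdint.h>

#define NBWORD	32	/* bits per bitmap word; stand-in for XFS_NBWORD */

/*
 * Hypothetical helper: set (val = 1, "free") or clear (val = 0,
 * "allocated") len bits starting at bit number start. No I/O, no
 * logging and no block crossing, unlike the real function.
 */
static void
bitmap_modify_range(uint32_t *map, unsigned start, unsigned len, int val)
{
	uint32_t	*b = &map[start / NBWORD];	/* current word */
	unsigned	bit = start % NBWORD;		/* bit within word */
	unsigned	i = 0;				/* bits handled so far */
	uint32_t	mask;

	assert(val == 0 || val == 1);

	/* First (partial) word: bits bit..lastbit-1 are in range. */
	if (bit) {
		unsigned lastbit = (bit + len < NBWORD) ? bit + len : NBWORD;

		mask = ((UINT32_C(1) << (lastbit - bit)) - 1) << bit;
		if (val)
			*b |= mask;
		else
			*b &= ~mask;
		i = lastbit - bit;
		b++;
	}
	/* Whole words: overwrite with all ones or all zeroes. */
	while (len - i >= NBWORD) {
		*b++ = val ? UINT32_MAX : 0;
		i += NBWORD;
	}
	/* Last (partial) word: the low len - i bits are in range. */
	if (len - i) {
		mask = (UINT32_C(1) << (len - i)) - 1;
		if (val)
			*b |= mask;
		else
			*b &= ~mask;
	}
}
```

For example, marking bits 28..67 free (start 28, len 40, val 1) in a zeroed
three-word map leaves the words as f0000000, ffffffff and 0000000f. The
shift counts never reach the word width because the partial-word branches
only run when fewer than NBWORD bits remain in that word.
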
-- 
1.8.4.rc3


* [PATCH 09/36] libxfs: bring across inode buffer readahead verifier changes
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (7 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 08/36] libxfs: xfs_rtalloc.c becomes xfs_rtbitmap.c Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 10/36] libxfs: Minor cleanup and bug fix sync Dave Chinner
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

These changes were made for log recovery readahead in the kernel, so they
are not directly used in userspace. They are brought across simply to keep
the files in sync.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_inode_buf.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index b096f77..67d5eb4 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -46,9 +46,22 @@ xfs_inobp_check(
 }
 #endif
 
+/*
+ * If we are doing readahead on an inode buffer, we might be in log recovery
+ * reading an inode allocation buffer that hasn't yet been replayed, and hence
+ * has not had the inode cores stamped into it. Hence for readahead, the buffer
+ * may be potentially invalid.
+ *
+ * If the readahead buffer is invalid, we don't want to mark it with an error,
+ * but we do want to clear the DONE status of the buffer so that a followup read
+ * will re-read it from disk. This will ensure that we don't get an unnecessary
+ * warnings during log recovery and we don't get unnecssary panics on debug
+ * kernels.
+ */
 static void
 xfs_inode_buf_verify(
-	struct xfs_buf	*bp)
+	struct xfs_buf	*bp,
+	bool		readahead)
 {
 	struct xfs_mount *mp = bp->b_target->bt_mount;
 	int		i;
@@ -69,6 +82,11 @@ xfs_inode_buf_verify(
 		if (unlikely(XFS_TEST_ERROR(!di_ok, mp,
 						XFS_ERRTAG_ITOBP_INOTOBP,
 						XFS_RANDOM_ITOBP_INOTOBP))) {
+			if (readahead) {
+				bp->b_flags &= ~XBF_DONE;
+				return;
+			}
+
 			xfs_buf_ioerror(bp, EFSCORRUPTED);
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_HIGH,
 					     mp, dip);
@@ -87,14 +105,21 @@ static void
 xfs_inode_buf_read_verify(
 	struct xfs_buf	*bp)
 {
-	xfs_inode_buf_verify(bp);
+	xfs_inode_buf_verify(bp, false);
+}
+
+static void
+xfs_inode_buf_readahead_verify(
+	struct xfs_buf	*bp)
+{
+	xfs_inode_buf_verify(bp, true);
 }
 
 static void
 xfs_inode_buf_write_verify(
 	struct xfs_buf	*bp)
 {
-	xfs_inode_buf_verify(bp);
+	xfs_inode_buf_verify(bp, false);
 }
 
 const struct xfs_buf_ops xfs_inode_buf_ops = {
@@ -102,6 +127,12 @@ const struct xfs_buf_ops xfs_inode_buf_ops = {
 	.verify_write = xfs_inode_buf_write_verify,
 };
 
+const struct xfs_buf_ops xfs_inode_buf_ra_ops = {
+	.verify_read = xfs_inode_buf_readahead_verify,
+	.verify_write = xfs_inode_buf_write_verify,
+};
+
+
 /*
  * This routine is called to map an inode to the buffer containing the on-disk
  * version of the inode.  It returns a pointer to the buffer containing the
@@ -191,7 +222,7 @@ xfs_dinode_from_disk(
 		to->di_ino = be64_to_cpu(from->di_ino);
 		to->di_lsn = be64_to_cpu(from->di_lsn);
 		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
-		platform_uuid_copy(&to->di_uuid, &from->di_uuid);
+		uuid_copy(&to->di_uuid, &from->di_uuid);
 	}
 }
 
@@ -237,7 +268,7 @@ xfs_dinode_to_disk(
 		to->di_ino = cpu_to_be64(from->di_ino);
 		to->di_lsn = cpu_to_be64(from->di_lsn);
 		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
-		platform_uuid_copy(&to->di_uuid, &from->di_uuid);
+		uuid_copy(&to->di_uuid, &from->di_uuid);
 		to->di_flushiter = 0;
 	} else {
 		to->di_flushiter = cpu_to_be16(from->di_flushiter);
-- 
1.8.4.rc3

* [PATCH 10/36] libxfs: Minor cleanup and bug fix sync
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (8 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 09/36] libxfs: bring across inode buffer readahead verifier changes Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 11/36] xfs: remove newlines from strings passed to __xfs_printk Dave Chinner
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

These bring all the small single line comment, whitespace and minor
code differences into sync with the kernel code. Anything left at
this point is an intentional difference.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_fs.h         | 4 ++--
 include/xfs_quota_defs.h | 4 ++++
 libxfs/xfs_attr_leaf.c   | 4 +---
 libxfs/xfs_bmap.c        | 6 +++---
 libxfs/xfs_bmap_btree.c  | 2 +-
 libxfs/xfs_dir2_leaf.c   | 1 -
 libxfs/xfs_dir2_node.c   | 2 +-
 libxfs/xfs_ialloc.c      | 4 ++--
 libxfs/xfs_inode_buf.c   | 2 ++
 libxfs/xfs_inode_fork.c  | 5 +++--
 libxfs/xfs_trans_resv.c  | 2 +-
 11 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/xfs_fs.h b/include/xfs_fs.h
index c43ba98..554fd66 100644
--- a/include/xfs_fs.h
+++ b/include/xfs_fs.h
@@ -358,7 +358,7 @@ typedef struct xfs_error_injection {
  * Speculative preallocation trimming.
  */
 #define XFS_EOFBLOCKS_VERSION		1
-struct xfs_eofblocks {
+struct xfs_fs_eofblocks {
 	__u32		eof_version;
 	__u32		eof_flags;
 	uid_t		eof_uid;
@@ -516,7 +516,7 @@ typedef struct xfs_swapext
 /*	XFS_IOC_GETBIOSIZE ---- deprecated 47	   */
 #define XFS_IOC_GETBMAPX	_IOWR('X', 56, struct getbmap)
 #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
-#define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_eofblocks)
+#define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/include/xfs_quota_defs.h b/include/xfs_quota_defs.h
index e6b0d6e..b3b2b10 100644
--- a/include/xfs_quota_defs.h
+++ b/include/xfs_quota_defs.h
@@ -154,4 +154,8 @@ typedef __uint16_t	xfs_qwarncnt_t;
 		(XFS_QMOPT_UQUOTA | XFS_QMOPT_PQUOTA | XFS_QMOPT_GQUOTA)
 #define XFS_QMOPT_RESBLK_MASK	(XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_RES_RTBLKS)
 
+extern int xfs_dqcheck(struct xfs_mount *mp, xfs_disk_dquot_t *ddq,
+		       xfs_dqid_t id, uint type, uint flags, char *str);
+extern int xfs_calc_dquots_per_chunk(struct xfs_mount *mp, unsigned int nbblks);
+
 #endif	/* __XFS_QUOTA_H__ */
diff --git a/libxfs/xfs_attr_leaf.c b/libxfs/xfs_attr_leaf.c
index c09b0f3..fd52397 100644
--- a/libxfs/xfs_attr_leaf.c
+++ b/libxfs/xfs_attr_leaf.c
@@ -599,7 +599,7 @@ xfs_attr_shortform_getvalue(xfs_da_args_t *args)
 	xfs_attr_sf_entry_t *sfe;
 	int i;
 
-	ASSERT(args->dp->i_d.di_aformat == XFS_IFINLINE);
+	ASSERT(args->dp->i_afp->if_flags == XFS_IFINLINE);
 	sf = (xfs_attr_shortform_t *)args->dp->i_afp->if_u1.if_data;
 	sfe = &sf->list[0];
 	for (i = 0; i < sf->hdr.count;
@@ -909,7 +909,6 @@ out:
 	return error;
 }
 
-
 /*========================================================================
  * Routines used for growing the Btree.
  *========================================================================*/
@@ -1270,7 +1269,6 @@ xfs_attr3_leaf_compact(
 	ichdr_dst->freemap[0].size = ichdr_dst->firstused -
 						ichdr_dst->freemap[0].base;
 
-
 	/* write the header back to initialise the underlying buffer */
 	xfs_attr3_leaf_hdr_to_disk(leaf_dst, ichdr_dst);
 
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 7336abf..3e80c64 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -510,8 +510,8 @@ xfs_bmap_trace_exlist(
 
 /*
  * Validate that the bmbt_irecs being returned from bmapi are valid
- * given the callers original parameters.  Specifically check the
- * ranges of the returned irecs to ensure that they only extent beyond
+ * given the caller's original parameters.  Specifically check the
+ * ranges of the returned irecs to ensure that they only extend beyond
  * the given parameters if the XFS_BMAPI_ENTIRE flag was set.
  */
 STATIC void
@@ -1515,7 +1515,7 @@ xfs_bmap_first_unused(
 }
 
 /*
- * Returns the file-relative block number of the last block + 1 before
+ * Returns the file-relative block number of the last block - 1 before
  * last_block (input value) in the file.
  * This is not based on i_size, it is based on the extent records.
  * Returns 0 for local files, as they do not have extent records.
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index 2f6b48a..6211dc2 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -737,7 +737,7 @@ xfs_bmbt_verify(
 	 * precise.
 	 */
 	level = be16_to_cpu(block->bb_level);
-	if (level > MAX(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]))
+	if (level > max(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]))
 		return false;
 	if (be16_to_cpu(block->bb_numrecs) > mp->m_bmap_dmxr[level != 0])
 		return false;
diff --git a/libxfs/xfs_dir2_leaf.c b/libxfs/xfs_dir2_leaf.c
index c035c4d..683536e 100644
--- a/libxfs/xfs_dir2_leaf.c
+++ b/libxfs/xfs_dir2_leaf.c
@@ -1072,7 +1072,6 @@ xfs_dir3_leaf_compact_x1(
 	*highstalep = highstale;
 }
 
-
 /*
  * Log the bests entries indicated from a leaf1 block.
  */
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index 6a245e5..10d1d81 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -1796,7 +1796,7 @@ xfs_dir2_node_addname_int(
 		/*
 		 * Look at the current free entry.  Is it good enough?
 		 *
-		 * The bests initialisation should be where the buffer is read in
+		 * The bests initialisation should be where the bufer is read in
 		 * the above branch. But gcc is too stupid to realise that bests
 		 * and the freehdr are actually initialised if they are placed
 		 * there, so we have to do it here to avoid warnings. Blech.
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 4683287..afe1a82 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -484,7 +484,7 @@ xfs_ialloc_next_ag(
 
 /*
  * Select an allocation group to look for a free inode in, based on the parent
- * inode and then mode.  Return the allocation group buffer.
+ * inode and the mode.  Return the allocation group buffer.
  */
 STATIC xfs_agnumber_t
 xfs_ialloc_ag_select(
@@ -706,7 +706,7 @@ xfs_dialloc_ag(
 		error = xfs_inobt_get_rec(cur, &rec, &j);
 		if (error)
 			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+		XFS_WANT_CORRUPTED_GOTO(j == 1, error0);
 
 		if (rec.ir_freecount > 0) {
 			/*
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 67d5eb4..b796556 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -101,6 +101,7 @@ xfs_inode_buf_verify(
 	xfs_inobp_check(mp, bp);
 }
 
+
 static void
 xfs_inode_buf_read_verify(
 	struct xfs_buf	*bp)
@@ -299,6 +300,7 @@ xfs_dinode_verify(
 		return false;
 	return true;
 }
+
 void
 xfs_dinode_calc_crc(
 	struct xfs_mount	*mp,
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 1c006f9..190690c 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -138,7 +138,8 @@ xfs_iformat_fork(
 			}
 
 			di_size = be64_to_cpu(dip->di_size);
-			if (unlikely(di_size > XFS_DFORK_DSIZE(dip, ip->i_mount))) {
+			if (unlikely(di_size < 0 ||
+				     di_size > XFS_DFORK_DSIZE(dip, ip->i_mount))) {
 				xfs_warn(ip->i_mount,
 			"corrupt inode %Lu (bad size %Ld for local inode).",
 					(unsigned long long) ip->i_ino,
@@ -444,7 +445,7 @@ xfs_iread_extents(
  *
  * The caller must not request to add more records than would fit in
  * the on-disk inode root.  If the if_broot is currently NULL, then
- * if we adding records one will be allocated.  The caller must also
+ * if we are adding records, one will be allocated.  The caller must also
  * not request that the number of records go below zero, although
  * it can go to zero.
  *
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 3e14b1c..1e59fad 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -49,7 +49,7 @@ xfs_calc_buf_res(
 
 /*
  * Logging inodes is really tricksy. They are logged in memory format,
- * which means that what we write into the log doesn't directory translate into
+ * which means that what we write into the log doesn't directly translate into
  * the amount of space they use on disk.
  *
  * Case in point - btree format forks in memory format use more space than the
-- 
1.8.4.rc3

* [PATCH 11/36] xfs: remove newlines from strings passed to __xfs_printk
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (9 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 10/36] libxfs: Minor cleanup and bug fix sync Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 12/36] xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct() Dave Chinner
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

__xfs_printk adds its own "\n".  Having it in the original string
leads to unintentional blank lines from these messages.
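
A minimal sketch of the effect, assuming a logging helper that always
appends its own newline. sketch_printk() is a made-up stand-in that
formats into a caller-supplied buffer; it is not the real __xfs_printk.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the message, then append "\n".
 * Returns the resulting length, or -1 if the buffer is too small.
 */
static int
sketch_printk(char *out, size_t len, const char *fmt, ...)
{
	va_list	ap;
	int	n;

	va_start(ap, fmt);
	n = vsnprintf(out, len, fmt, ap);
	va_end(ap);

	/* Need room for the message, the newline and the terminator. */
	if (n < 0 || (size_t)n + 1 >= len)
		return -1;
	out[n] = '\n';
	out[n + 1] = '\0';
	return n + 1;
}
```

Passing "bad size %d.\n" to such a helper ends the output with two
newlines, i.e. a blank line after the message; passing "bad size %d."
gives exactly one.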

Ported from kernel commit 08e96e1a.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c         | 2 +-
 libxfs/xfs_dir2_node.c    | 2 +-
 libxfs/xfs_sb.c           | 4 ++--
 libxlog/xfs_log_recover.c | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 3e80c64..c45b91a 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -1447,7 +1447,7 @@ xfs_bmap_search_extents(
 		xfs_alert_tag(ip->i_mount, XFS_PTAG_FSBLOCK_ZERO,
 				"Access to block zero in inode %llu "
 				"start_block: %llx start_off: %llx "
-				"blkcnt: %llx extent-state: %x lastx: %x\n",
+				"blkcnt: %llx extent-state: %x lastx: %x",
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)gotp->br_startblock,
 			(unsigned long long)gotp->br_startoff,
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index 10d1d81..ced8c58 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -1083,7 +1083,7 @@ xfs_dir2_leafn_rebalance(
 		state->inleaf = 1;
 		blk2->index = 0;
 		xfs_alert(args->dp->i_mount,
-	"%s: picked the wrong leaf? reverting original leaf: blk1->index %d\n",
+	"%s: picked the wrong leaf? reverting original leaf: blk1->index %d",
 			__func__, blk1->index);
 	}
 }
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 8b90b88..11353bb 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -215,13 +215,13 @@ xfs_mount_validate_sb(
 	if (xfs_sb_version_has_pquotino(sbp)) {
 		if (sbp->sb_qflags & (XFS_OQUOTA_ENFD | XFS_OQUOTA_CHKD)) {
 			xfs_notice(mp,
-			   "Version 5 of Super block has XFS_OQUOTA bits.\n");
+			   "Version 5 of Super block has XFS_OQUOTA bits.");
 			return XFS_ERROR(EFSCORRUPTED);
 		}
 	} else if (sbp->sb_qflags & (XFS_PQUOTA_ENFD | XFS_GQUOTA_ENFD |
 				XFS_PQUOTA_CHKD | XFS_GQUOTA_CHKD)) {
 			xfs_notice(mp,
-"Superblock earlier than Version 5 has XFS_[PQ]UOTA_{ENFD|CHKD} bits.\n");
+"Superblock earlier than Version 5 has XFS_[PQ]UOTA_{ENFD|CHKD} bits.");
 			return XFS_ERROR(EFSCORRUPTED);
 	}
 
diff --git a/libxlog/xfs_log_recover.c b/libxlog/xfs_log_recover.c
index f3cda77..3f22921 100644
--- a/libxlog/xfs_log_recover.c
+++ b/libxlog/xfs_log_recover.c
@@ -1330,7 +1330,7 @@ xlog_unpack_data_crc(
 	if (crc != rhead->h_crc) {
 		if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) {
 			xfs_alert(log->l_mp,
-		"log record CRC mismatch: found 0x%x, expected 0x%x.\n",
+		"log record CRC mismatch: found 0x%x, expected 0x%x.",
 					le32_to_cpu(rhead->h_crc),
 					le32_to_cpu(crc));
 			xfs_hex_dump(dp, 32);
-- 
1.8.4.rc3

* [PATCH 12/36] xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct()
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (10 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 11/36] xfs: remove newlines from strings passed to __xfs_printk Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 13/36] xfs: fix node forward in xfs_node_toosmall Dave Chinner
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

In xfs_iext_realloc_direct(), new_size is increased by if_bytes when
the extent records were originally stored in the inline extent buffer
and we have to switch from it to a direct extent list for the newly
allocated extents. This is wrong.

This patch fixes the above problem and revises the new_size comments in
xfs_iext_realloc_direct() to make them more readable.  Also, fix the
comments while switching from the inline extent buffer to a direct
extent list to reflect this change.
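
As a rough illustration of the double counting, assuming new_size
already covers the existing records plus the ones being added (which is
how the revised parameter comment reads). The helper names and the byte
counts below are made up for the example:

```c
#include <stddef.h>

/* Smallest power of two that is >= n (n > 0). */
static size_t
sketch_roundup_pow_of_two(size_t n)
{
	size_t	p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Post-patch sizing: new_size is used as passed in. */
static size_t
sketch_direct_alloc_size(size_t new_size)
{
	return sketch_roundup_pow_of_two(new_size);
}

/* Pre-patch sizing: the inline bytes were added a second time. */
static size_t
sketch_direct_alloc_size_old(size_t new_size, size_t inline_bytes)
{
	return sketch_roundup_pow_of_two(new_size + inline_bytes);
}
```

With 32 bytes of inline records and one more 16 byte record being
added, new_size is 48: the post-patch sizing allocates 64 bytes, while
the old sizing rounds 80 up to 128.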

Ported from kernel commit 17ec81c1.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_inode_fork.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 190690c..dfa86ae 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -1330,7 +1330,7 @@ xfs_iext_remove_indirect(
 void
 xfs_iext_realloc_direct(
 	xfs_ifork_t	*ifp,		/* inode fork pointer */
-	int		new_size)	/* new size of extents */
+	int		new_size)	/* new size of extents after adding */
 {
 	int		rnew_size;	/* real new size of extents */
 
@@ -1368,13 +1368,8 @@ xfs_iext_realloc_direct(
 				rnew_size - ifp->if_real_bytes);
 		}
 	}
-	/*
-	 * Switch from the inline extent buffer to a direct
-	 * extent list. Be sure to include the inline extent
-	 * bytes in new_size.
-	 */
+	/* Switch from the inline extent buffer to a direct extent list */
 	else {
-		new_size += ifp->if_bytes;
 		if (!is_power_of_2(new_size)) {
 			rnew_size = roundup_pow_of_two(new_size);
 		}
-- 
1.8.4.rc3

* [PATCH 13/36] xfs: fix node forward in xfs_node_toosmall
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (11 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 12/36] xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct() Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 14/36] xfs: don't emit corruption noise on fs probes Dave Chinner
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When a node is considered for a merge with a sibling, it overwrites the
sibling pointers of the original incore nodehdr with the sibling's
pointers.  This leads to the loop considering the original node as a
merge candidate with itself in the second pass, and so it incorrectly
determines a merge should occur.
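
The failure mode can be reproduced with a small stand-alone model. This
is a sketch only: struct sketch_hdr, the block array and the "room"
parameter are invented for the example, and use_temp_hdr selects
between the pre-patch and post-patch behaviour.

```c
#include <stdbool.h>

struct sketch_hdr {
	int	forw;	/* block number of the next sibling, 0 if none */
	int	back;	/* block number of the previous sibling, 0 if none */
	int	count;	/* entries in use */
};

/*
 * Return the sibling block chosen as a merge candidate, or -1 if
 * neither sibling fits. "room" is how many entries a sibling may hold
 * and still fit into a merged block.
 */
static int
sketch_pick_sibling(const struct sketch_hdr *blocks, int self, int room,
		    bool use_temp_hdr)
{
	struct sketch_hdr	nodehdr = blocks[self];
	bool			forward = nodehdr.forw < nodehdr.back;
	int			i;

	for (i = 0; i < 2; forward = !forward, i++) {
		struct sketch_hdr	thdr;
		int			blkno;

		blkno = forward ? nodehdr.forw : nodehdr.back;
		if (blkno == 0)
			continue;	/* no sibling on this side */

		thdr = blocks[blkno];
		if (!use_temp_hdr)
			nodehdr = thdr;	/* pre-patch: clobbers forw/back */

		if (room - thdr.count >= 0)
			return blkno;
	}
	return -1;
}
```

With block 5 sitting between full siblings 3 and 7, the pre-patch
variant reads block 3, picks up its forw pointer (which is 5) and then
"finds" block 5 itself as a candidate on the second pass. The
post-patch variant correctly looks at block 7 and finds no candidate.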

Ported from equivalent kernel commit 997def25.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_da_btree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
index f106e06..53414f5 100644
--- a/libxfs/xfs_da_btree.c
+++ b/libxfs/xfs_da_btree.c
@@ -1201,6 +1201,7 @@ xfs_da3_node_toosmall(
 	/* start with smaller blk num */
 	forward = nodehdr.forw < nodehdr.back;
 	for (i = 0; i < 2; forward = !forward, i++) {
+		struct xfs_da3_icnode_hdr thdr;
 		if (forward)
 			blkno = nodehdr.forw;
 		else
@@ -1213,10 +1214,10 @@ xfs_da3_node_toosmall(
 			return(error);
 
 		node = bp->b_addr;
-		xfs_da3_node_hdr_from_disk(&nodehdr, node);
+		xfs_da3_node_hdr_from_disk(&thdr, node);
 		xfs_trans_brelse(state->args->trans, bp);
 
-		if (count - nodehdr.count >= 0)
+		if (count - thdr.count >= 0)
 			break;	/* fits with at least 25% to spare */
 	}
 	if (i >= 2) {
-- 
1.8.4.rc3

* [PATCH 14/36] xfs: don't emit corruption noise on fs probes
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (12 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 13/36] xfs: fix node forward in xfs_node_toosmall Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 15/36] libxfs: fix root inode handling inconsistencies Dave Chinner
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

If we get EWRONGFS due to probing of non-xfs filesystems,
there's no need to issue the scary corruption error and backtrace.
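
The resulting policy is small enough to state as a predicate. The
sketch below uses made-up error values and a hypothetical helper name;
the real codes come from the XFS headers.

```c
#include <stdbool.h>

/* Hypothetical error codes, for the sketch only. */
enum {
	SKETCH_ERR_WRONGFS = 1,
	SKETCH_ERR_CORRUPT = 2,
};

/*
 * Every failure is still recorded on the buffer, but only failures
 * other than "wrong filesystem type" get the corruption report.
 */
static bool
sketch_should_report_corruption(int error)
{
	return error != 0 && error != SKETCH_ERR_WRONGFS;
}
```

So a probe of a non-XFS device fails quietly, while a genuinely
damaged XFS superblock still produces the report.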

Ported from kernel commit 31625f28.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_sb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 11353bb..65ddc2f 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -574,8 +574,9 @@ xfs_sb_read_verify(
 
 out_error:
 	if (error) {
-		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
-				     mp, bp->b_addr);
+		if (error != EWRONGFS)
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
+					     mp, bp->b_addr);
 		xfs_buf_ioerror(bp, error);
 	}
 }
-- 
1.8.4.rc3

* [PATCH 15/36] libxfs: fix root inode handling inconsistencies
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (13 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 14/36] xfs: don't emit corruption noise on fs probes Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 16/36] libxfs: stop caching inode structures Dave Chinner
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When "mounting" a filesystem via libxfs_mount(), callers can tell
libxfs to read the root and realtime inodes into cache. However,
when unmounting the filesystem, libxfs_unmount() used to
unconditionally free root inodes if they were present.

This leads to interesting issues in mkfs, which handles creation,
reading and freeing of the root and rt inodes itself. However, it
passes in the flag to tell libxfs_mount() to read the root inodes,
and so can end up with unbalanced freeing of inodes when cleaning up
during the unmount procedure.

As it turns out, nothing ever uses mp->m_rootip and so we don't need
to read it in or free it, or even have a pointer to it in the struct
xfs_mount. Similarly, the only user of the realtime inodes is mkfs,
and it initialises them itself. Hence we can kill the m_rootip and
the realtime inode mounting code.

This leaves one user of LIBXFS_MOUNT_ROOTINOS - xfs_db - and that is
only used to initialise the in-core superblock counter values from
the ag header for xfs_check. Move this code to the xfs_db init
functions so we can get rid of the mount parameter previously used
to trigger all these behaviours (LIBXFS_MOUNT_ROOTINOS) completely.
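
One side effect worth noting when reading the diff: removing the flag
renumbers the remaining LIBXFS_MOUNT_* values, and xfs_copy and mkfs
pass a literal 1 as the flags argument. The sketch below, using made-up
OLD_/NEW_ names for the two numberings, shows what that literal would
mean before and after; presumably this is why those callers switch to
passing 0.

```c
/* Flag values before and after the renumbering; names are invented. */
#define OLD_MOUNT_ROOTINOS	0x0001
#define OLD_MOUNT_DEBUGGER	0x0002
#define NEW_MOUNT_DEBUGGER	0x0001

/* Does "flags" request debugger behaviour under the old numbering? */
static int
sketch_is_debugger_old(int flags)
{
	return (flags & OLD_MOUNT_DEBUGGER) != 0;
}

/* Does "flags" request debugger behaviour under the new numbering? */
static int
sketch_is_debugger_new(int flags)
{
	return (flags & NEW_MOUNT_DEBUGGER) != 0;
}
```

A literal 1 meant "read the root inodes" under the old numbering but
would mean "debugger" under the new one, whereas 0 requests nothing in
either.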

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 copy/xfs_copy.c  |  2 +-
 db/init.c        | 26 +++++++++++++++-------
 include/libxfs.h | 12 +++++-----
 libxfs/init.c    | 67 --------------------------------------------------------
 mkfs/proto.c     |  1 -
 mkfs/xfs_mkfs.c  |  4 ++--
 repair/phase6.c  |  2 --
 7 files changed, 26 insertions(+), 88 deletions(-)

diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
index bb37279..9986fbf 100644
--- a/copy/xfs_copy.c
+++ b/copy/xfs_copy.c
@@ -684,7 +684,7 @@ main(int argc, char **argv)
 	sb = &mbuf.m_sb;
 	libxfs_sb_from_disk(sb, XFS_BUF_TO_SBP(sbp));
 
-	mp = libxfs_mount(&mbuf, sb, xargs.ddev, xargs.logdev, xargs.rtdev, 1);
+	mp = libxfs_mount(&mbuf, sb, xargs.ddev, xargs.logdev, xargs.rtdev, 0);
 	if (mp == NULL) {
 		do_log(_("%s: %s filesystem failed to initialize\n"
 			"%s: Aborting.\n"), progname, source_name, progname);
diff --git a/db/init.c b/db/init.c
index 2932e51..d73d549 100644
--- a/db/init.c
+++ b/db/init.c
@@ -149,18 +149,28 @@ init(
 	}
 
 	mp = libxfs_mount(&xmount, sbp, x.ddev, x.logdev, x.rtdev,
-				LIBXFS_MOUNT_ROOTINOS | LIBXFS_MOUNT_DEBUGGER);
+			  LIBXFS_MOUNT_DEBUGGER);
 	if (!mp) {
-		mp = libxfs_mount(&xmount, sbp, x.ddev, x.logdev, x.rtdev,
-				LIBXFS_MOUNT_DEBUGGER);
-		if (!mp) {
-			fprintf(stderr, _("%s: device %s unusable (not an XFS "
-				"filesystem?)\n"), progname, fsdevice);
-			exit(1);
-		}
+		fprintf(stderr,
+			_("%s: device %s unusable (not an XFS filesystem?)\n"),
+			progname, fsdevice);
+		exit(1);
 	}
 	blkbb = 1 << mp->m_blkbb_log;
 
+	/*
+	 * xfs_check needs corrected incore superblock values
+	 */
+	if (sbp->sb_rootino != NULLFSINO &&
+	    xfs_sb_version_haslazysbcount(&mp->m_sb)) {
+		int error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
+		if (error) {
+			fprintf(stderr,
+	_("%s: cannot init perag data (%d). Continuing anyway.\n"),
+				progname, error);
+		}
+	}
+
 	if (xfs_sb_version_hascrc(&mp->m_sb))
 		type_set_tab_crc();
 
diff --git a/include/libxfs.h b/include/libxfs.h
index f10ab59..3df8c07 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -167,7 +167,6 @@ typedef struct xfs_mount {
 	uint			m_rsumsize;	/* size of rt summary, bytes */
 	struct xfs_inode	*m_rbmip;	/* pointer to bitmap inode */
 	struct xfs_inode	*m_rsumip;	/* pointer to summary inode */
-	struct xfs_inode	*m_rootip;	/* pointer to root directory */
 	struct xfs_buftarg	*m_ddev_targp;
 	struct xfs_buftarg	*m_logdev_targp;
 	struct xfs_buftarg	*m_rtdev_targp;
@@ -259,12 +258,11 @@ typedef struct xfs_perag {
 	int		pagb_count;	/* pagb slots in use */
 } xfs_perag_t;
 
-#define LIBXFS_MOUNT_ROOTINOS		0x0001
-#define LIBXFS_MOUNT_DEBUGGER		0x0002
-#define LIBXFS_MOUNT_32BITINODES	0x0004
-#define LIBXFS_MOUNT_32BITINOOPT	0x0008
-#define LIBXFS_MOUNT_COMPAT_ATTR	0x0010
-#define LIBXFS_MOUNT_ATTR2		0x0020
+#define LIBXFS_MOUNT_DEBUGGER		0x0001
+#define LIBXFS_MOUNT_32BITINODES	0x0002
+#define LIBXFS_MOUNT_32BITINOOPT	0x0004
+#define LIBXFS_MOUNT_COMPAT_ATTR	0x0008
+#define LIBXFS_MOUNT_ATTR2		0x0010
 
 #define LIBXFS_IHASHSIZE(sbp)		(1<<10)
 #define LIBXFS_BHASHSIZE(sbp) 		(1<<10)
diff --git a/libxfs/init.c b/libxfs/init.c
index db7eeea..33c01f5 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -410,40 +410,6 @@ manage_zones(int release)
 }
 
 /*
- * Get the bitmap and summary inodes into the mount structure
- * at mount time.
- */
-static int
-rtmount_inodes(xfs_mount_t *mp)
-{
-	int		error;
-	xfs_sb_t	*sbp;
-
-	sbp = &mp->m_sb;
-	if (sbp->sb_rbmino == NULLFSINO)
-		return 0;
-	error = libxfs_iget(mp, NULL, sbp->sb_rbmino, 0, &mp->m_rbmip, 0);
-	if (error) {
-		fprintf(stderr,
-			_("%s: cannot read realtime bitmap inode (%d)\n"),
-			progname, error);
-		return error;
-	}
-	ASSERT(mp->m_rbmip != NULL);
-	ASSERT(sbp->sb_rsumino != NULLFSINO);
-	error = libxfs_iget(mp, NULL, sbp->sb_rsumino, 0, &mp->m_rsumip, 0);
-	if (error) {
-		libxfs_iput(mp->m_rbmip, 0);
-		fprintf(stderr,
-			_("%s: cannot read realtime summary inode (%d)\n"),
-			progname, error);
-		return error;
-	}
-	ASSERT(mp->m_rsumip != NULL);
-	return 0;
-}
-
-/*
  * Initialize realtime fields in the mount structure.
  */
 static int
@@ -810,39 +776,6 @@ libxfs_mount(
 		exit(1);
 	}
 
-	/*
-	 * mkfs calls mount before the root inode is allocated.
-	 */
-	if ((flags & LIBXFS_MOUNT_ROOTINOS) && sbp->sb_rootino != NULLFSINO) {
-		error = libxfs_iget(mp, NULL, sbp->sb_rootino, 0,
-				&mp->m_rootip, 0);
-		if (error) {
-			fprintf(stderr, _("%s: cannot read root inode (%d)\n"),
-				progname, error);
-			if (!(flags & LIBXFS_MOUNT_DEBUGGER))
-				return NULL;
-		}
-		ASSERT(mp->m_rootip != NULL);
-	}
-	if ((flags & LIBXFS_MOUNT_ROOTINOS) && rtmount_inodes(mp)) {
-		if (mp->m_rootip)
-			libxfs_iput(mp->m_rootip, 0);
-		return NULL;
-	}
-
-	/*
-	 * mkfs calls mount before the AGF/AGI structures are written.
-	 */
-	if ((flags & LIBXFS_MOUNT_ROOTINOS) && sbp->sb_rootino != NULLFSINO &&
-	    xfs_sb_version_haslazysbcount(&mp->m_sb)) {
-		error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
-		if (error) {
-			fprintf(stderr, _("%s: cannot init perag data (%d)\n"),
-				progname, error);
-			return NULL;
-		}
-	}
-
 	return mp;
 }
 
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 0cdef41..4cc0df6 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -543,7 +543,6 @@ parseproto(
 			pip = ip;
 			mp->m_sb.sb_rootino = ip->i_ino;
 			libxfs_mod_sb(tp, XFS_SB_ROOTINO);
-			mp->m_rootip = ip;
 			isroot = 1;
 		} else {
 			libxfs_trans_ijoin(tp, pip, 0);
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 355708c..d37e948 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2582,6 +2582,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 	memset(XFS_BUF_PTR(buf), 0, sectorsize);
 	libxfs_sb_to_disk((void *)XFS_BUF_PTR(buf), sbp, XFS_SB_ALL_BITS);
 	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
+	libxfs_purgebuf(buf);
 
 	/*
 	 * If the data area is a file, then grow it out to its final size
@@ -2616,7 +2617,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		(xfs_extlen_t)XFS_FSB_TO_BB(mp, logblocks),
 		&sbp->sb_uuid, logversion, lsunit, XLOG_FMT);
 
-	mp = libxfs_mount(mp, sbp, xi.ddev, xi.logdev, xi.rtdev, 1);
+	mp = libxfs_mount(mp, sbp, xi.ddev, xi.logdev, xi.rtdev, 0);
 	if (mp == NULL) {
 		fprintf(stderr, _("%s: filesystem failed to initialize\n"),
 			progname);
@@ -2887,7 +2888,6 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 	/*
 	 * Allocate the root inode and anything else in the proto file.
 	 */
-	mp->m_rootip = NULL;
 	parse_proto(mp, &fsx, &protostring);
 
 	/*
diff --git a/repair/phase6.c b/repair/phase6.c
index 2a37438..5307acf 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -856,8 +856,6 @@ mk_root_dir(xfs_mount_t *mp)
 	ip->i_df.if_bytes = ip->i_df.if_real_bytes = 0;
 	ip->i_df.if_u1.if_extents = NULL;
 
-	mp->m_rootip = ip;
-
 	/*
 	 * initialize the directory
 	 */
-- 
1.8.4.rc3

* [PATCH 16/36] libxfs: stop caching inode structures
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (14 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 15/36] libxfs: fix root inode handling inconsistencies Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 17/36] db: separate out straight buffer IO from map based IO Dave Chinner
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Christoph Hellwig <hch@infradead.org>

Currently libxfs has a cache for xfs_inode structures.  Unlike in kernelspace,
where the inode cache and the associated page cache for file data are used
for all filesystem operations, the libxfs inode cache is only used in a few
places:

 - the libxfs init code reads the root and realtime inodes when called from
   xfs_db using a special flag, but these inode structures are never referenced
   again
 - mkfs uses namespace and bmap routines that take the xfs_inode structure
   to create the root and realtime inodes, as well as any additional files
   specified in the proto file
 - the xfs_db attr code uses xfs_inode-based attr routines in the attrset
   and attrget commands
 - phase6 of xfs_repair uses xfs_inode-based routines for rebuilding
   directories and moving files to the lost+found directory.
 - phase7 of xfs_repair uses struct xfs_inode to modify the nlink count
   of inodes.

So except in repair we never ever reuse a cached inode, and even in repair
the logical inode caching doesn't help:

 - in phase 6a we iterate over each inode in the incore inode tree,
   and if it's a directory check/rebuild it
 - phase6b then updates the "." and ".." entries for directories
   that need it, which means we require the backing buffers.
 - phase6c moves disconnected inodes to lost+found, which again needs
   the backing buffer to actually do anything.
 - phase7 then only touches inodes for which we need to reset i_nlink,
   which always involves reading, modifying and writing the physical
   inode.

Given these facts, stop caching the inodes to reduce memory usage,
especially in xfs_repair, where this makes a difference for filesystems
with large inode counts.  On the upper end this allows repair to complete
for filesystem / amount of memory combinations that previously wouldn't.
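
The resulting lifecycle can be sketched as follows.  This is a simplified,
self-contained illustration and not the libxfs code: demo_inode, demo_iget,
demo_iput and demo_live_inodes are hypothetical names, and plain calloc/free
stand in for the zone allocator and for reading the inode from disk.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_inode. */
struct demo_inode {
	uint64_t	ino;
};

/* Number of inode structures currently allocated (for illustration). */
static int demo_live_inodes;

/*
 * Every get allocates a fresh structure; nothing is looked up or kept in a
 * cache.  Returns 0 on success or a positive errno value on failure.
 */
static int
demo_iget(uint64_t ino, struct demo_inode **ipp)
{
	struct demo_inode	*ip;

	*ipp = NULL;
	ip = calloc(1, sizeof(*ip));
	if (!ip)
		return ENOMEM;
	ip->ino = ino;
	demo_live_inodes++;
	*ipp = ip;
	return 0;
}

/* Every put frees the structure immediately. */
static void
demo_iput(struct demo_inode *ip)
{
	demo_live_inodes--;
	free(ip);
}
```

Two gets of the same inode number return two distinct structures in this
model, so memory use is bounded by the number of inodes held at once rather
than by the number of inodes ever touched.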

With this we probably could increase the memory available to the buffer
cache in xfs_repair, but trying to do so I got a bit lost - the current
formula seems too magic to me to make any sense, and simply doubling the
buffer cache size causes us to run out of memory given that the data cached
in the buffer cache (typically lots of 8k inode buffers and few 4k other
metadata buffers) are much bigger than the inodes cached in the inode
cache.  We probably need a sizing scheme that takes the actual amount
of memory allocated to the buffer cache into account to solve this better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/libxfs.h      |  5 ---
 libxfs/init.c         |  9 ------
 libxfs/rdwr.c         | 87 ++++++++++++---------------------------------------
 man/man8/xfs_repair.8 |  6 ----
 mkfs/xfs_mkfs.c       |  1 -
 repair/xfs_repair.c   | 14 ++-------
 6 files changed, 23 insertions(+), 99 deletions(-)

diff --git a/include/libxfs.h b/include/libxfs.h
index 3df8c07..e017b32 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -264,7 +264,6 @@ typedef struct xfs_perag {
 #define LIBXFS_MOUNT_COMPAT_ATTR	0x0008
 #define LIBXFS_MOUNT_ATTR2		0x0010
 
-#define LIBXFS_IHASHSIZE(sbp)		(1<<10)
 #define LIBXFS_BHASHSIZE(sbp) 		(1<<10)
 
 extern xfs_mount_t	*libxfs_mount (xfs_mount_t *, xfs_sb_t *,
@@ -448,7 +447,6 @@ extern int	libxfs_writebuf_int(xfs_buf_t *, int);
 extern int	libxfs_readbufr(struct xfs_buftarg *, xfs_daddr_t, xfs_buf_t *, int, int);
 
 extern int libxfs_bhash_size;
-extern int libxfs_ihash_size;
 
 #define LIBXFS_BREAD	0x1
 #define LIBXFS_BWRITE	0x2
@@ -648,9 +646,6 @@ extern void	libxfs_trans_ichgtime(struct xfs_trans *,
 extern int	libxfs_iflush_int (xfs_inode_t *, xfs_buf_t *);
 
 /* Inode Cache Interfaces */
-extern struct cache	*libxfs_icache;
-extern struct cache_operations	libxfs_icache_operations;
-extern void	libxfs_icache_purge (void);
 extern int	libxfs_iget (xfs_mount_t *, xfs_trans_t *, xfs_ino_t,
 				uint, xfs_inode_t **, xfs_daddr_t);
 extern void	libxfs_iput (xfs_inode_t *, uint);
diff --git a/libxfs/init.c b/libxfs/init.c
index 33c01f5..9a3cf22 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -22,9 +22,6 @@
 
 char *progname = "libxfs";	/* default, changed by each tool */
 
-struct cache *libxfs_icache;	/* global inode cache */
-int libxfs_ihash_size;		/* #buckets in icache */
-
 struct cache *libxfs_bcache;	/* global buffer cache */
 int libxfs_bhash_size;		/* #buckets in bcache */
 
@@ -335,9 +332,6 @@ libxfs_init(libxfs_init_t *a)
 	}
 	if (needcd)
 		chdir(curdir);
-	if (!libxfs_ihash_size)
-		libxfs_ihash_size = LIBXFS_IHASHSIZE(sbp);
-	libxfs_icache = cache_init(libxfs_ihash_size, &libxfs_icache_operations);
 	if (!libxfs_bhash_size)
 		libxfs_bhash_size = LIBXFS_BHASHSIZE(sbp);
 	libxfs_bcache = cache_init(libxfs_bhash_size, &libxfs_bcache_operations);
@@ -799,7 +793,6 @@ libxfs_umount(xfs_mount_t *mp)
 	int			agno;
 
 	libxfs_rtmount_destroy(mp);
-	libxfs_icache_purge();
 	libxfs_bcache_purge();
 
 	for (agno = 0; agno < mp->m_maxagi; agno++) {
@@ -815,7 +808,6 @@ void
 libxfs_destroy(void)
 {
 	manage_zones(1);
-	cache_destroy(libxfs_icache);
 	cache_destroy(libxfs_bcache);
 }
 
@@ -831,7 +823,6 @@ libxfs_report(FILE *fp)
 	time_t t;
 	char *c;
 
-	cache_report(fp, "libxfs_icache", libxfs_icache);
 	cache_report(fp, "libxfs_bcache", libxfs_bcache);
 
 	t = time(NULL);
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 13dbd23..f507855 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -993,26 +993,12 @@ struct cache_operations libxfs_bcache_operations = {
 
 
 /*
- * Inode cache interfaces
+ * Inode cache stubs.
  */
 
 extern kmem_zone_t	*xfs_ili_zone;
 extern kmem_zone_t	*xfs_inode_zone;
 
-static unsigned int
-libxfs_ihash(cache_key_t key, unsigned int hashsize)
-{
-	return ((unsigned int)*(xfs_ino_t *)key) % hashsize;
-}
-
-static int
-libxfs_icompare(struct cache_node *node, cache_key_t key)
-{
-	xfs_inode_t	*ip = (xfs_inode_t *)node;
-
-	return (ip->i_ino == *(xfs_ino_t *)key);
-}
-
 int
 libxfs_iget(xfs_mount_t *mp, xfs_trans_t *tp, xfs_ino_t ino, uint lock_flags,
 		xfs_inode_t **ipp, xfs_daddr_t bno)
@@ -1020,34 +1006,21 @@ libxfs_iget(xfs_mount_t *mp, xfs_trans_t *tp, xfs_ino_t ino, uint lock_flags,
 	xfs_inode_t	*ip;
 	int		error = 0;
 
-	if (cache_node_get(libxfs_icache, &ino, (struct cache_node **)&ip)) {
-#ifdef INO_DEBUG
-		fprintf(stderr, "%s: allocated inode, ino=%llu(%llu), %p\n",
-			__FUNCTION__, (unsigned long long)ino, bno, ip);
-#endif
-		ip->i_ino = ino;
-		ip->i_mount = mp;
-		error = xfs_iread(mp, tp, ip, bno);
-		if (error) {
-			cache_node_purge(libxfs_icache, &ino,
-					(struct cache_node *)ip);
-			ip = NULL;
-		}
-	}
-	*ipp = ip;
-	return error;
-}
+	ip = kmem_zone_zalloc(xfs_inode_zone, 0);
+	if (!ip)
+		return ENOMEM;
 
-void
-libxfs_iput(xfs_inode_t *ip, uint lock_flags)
-{
-	cache_node_put(libxfs_icache, (struct cache_node *)ip);
-}
+	ip->i_ino = ino;
+	ip->i_mount = mp;
+	error = xfs_iread(mp, tp, ip, bno);
+	if (error) {
+		kmem_zone_free(xfs_inode_zone, ip);
+		*ipp = NULL;
+		return error;
+	}
 
-static struct cache_node *
-libxfs_ialloc(cache_key_t key)
-{
-	return kmem_zone_zalloc(xfs_inode_zone, 0);
+	*ipp = ip;
+	return 0;
 }
 
 static void
@@ -1064,32 +1037,12 @@ libxfs_idestroy(xfs_inode_t *ip)
 		libxfs_idestroy_fork(ip, XFS_ATTR_FORK);
 }
 
-static void
-libxfs_irelse(struct cache_node *node)
-{
-	xfs_inode_t	*ip = (xfs_inode_t *)node;
-
-	if (ip != NULL) {
-		if (ip->i_itemp)
-			kmem_zone_free(xfs_ili_zone, ip->i_itemp);
-		ip->i_itemp = NULL;
-		libxfs_idestroy(ip);
-		kmem_zone_free(xfs_inode_zone, ip);
-		ip = NULL;
-	}
-}
-
 void
-libxfs_icache_purge(void)
+libxfs_iput(xfs_inode_t *ip, uint lock_flags)
 {
-	cache_purge(libxfs_icache);
+	if (ip->i_itemp)
+		kmem_zone_free(xfs_ili_zone, ip->i_itemp);
+	ip->i_itemp = NULL;
+	libxfs_idestroy(ip);
+	kmem_zone_free(xfs_inode_zone, ip);
 }
-
-struct cache_operations libxfs_icache_operations = {
-	/* .hash */	libxfs_ihash,
-	/* .alloc */	libxfs_ialloc,
-	/* .flush */	NULL,
-	/* .relse */	libxfs_irelse,
-	/* .compare */	libxfs_icompare,
-	/* .bulkrelse */ NULL
-};
diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
index 96adb29..47436ec 100644
--- a/man/man8/xfs_repair.8
+++ b/man/man8/xfs_repair.8
@@ -130,12 +130,6 @@ The
 supported are:
 .RS 1.0i
 .TP
-.BI ihash= ihashsize
-overrides the default inode cache hash size. The total number of
-inode cache entries are limited to 8 times this amount. The default
-.I ihashsize
-is 1024 (for a total of 8192 entries).
-.TP
 .BI bhash= bhashsize
 overrides the default buffer cache hash size. The total number of
 buffer cache entries are limited to 8 times this amount. The default
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index d37e948..3a032c0 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2935,7 +2935,6 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 	 * Need to drop references to inodes we still hold, first.
 	 */
 	libxfs_rtmount_destroy(mp);
-	libxfs_icache_purge();
 	libxfs_bcache_purge();
 
 	/*
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 820e7a2..214b7fa 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -69,7 +69,6 @@ static char *c_opts[] = {
 };
 
 
-static int	ihash_option_used;
 static int	bhash_option_used;
 static long	max_mem_specified;	/* in megabytes */
 static int	phase2_threads = 32;
@@ -239,13 +238,13 @@ process_args(int argc, char **argv)
 					pre_65_beta = 1;
 					break;
 				case IHASH_SIZE:
-					libxfs_ihash_size = (int)strtol(val, NULL, 0);
-					ihash_option_used = 1;
+					do_warn(
+		_("-o ihash option has been removed and will be ignored\n"));
 					break;
 				case BHASH_SIZE:
 					if (max_mem_specified)
 						do_abort(
-			_("-o bhash option cannot be used with -m option\n"));
+		_("-o bhash option cannot be used with -m option\n"));
 					libxfs_bhash_size = (int)strtol(val, NULL, 0);
 					bhash_option_used = 1;
 					break;
@@ -648,9 +647,7 @@ main(int argc, char **argv)
 		unsigned long	max_mem;
 		struct rlimit	rlim;
 
-		libxfs_icache_purge();
 		libxfs_bcache_purge();
-		cache_destroy(libxfs_icache);
 		cache_destroy(libxfs_bcache);
 
 		mem_used = (mp->m_sb.sb_icount >> (10 - 2)) +
@@ -709,11 +706,6 @@ main(int argc, char **argv)
 			do_log(_("        - block cache size set to %d entries\n"),
 				libxfs_bhash_size * HASH_CACHE_RATIO);
 
-		if (!ihash_option_used)
-			libxfs_ihash_size = libxfs_bhash_size;
-
-		libxfs_icache = cache_init(libxfs_ihash_size,
-						&libxfs_icache_operations);
 		libxfs_bcache = cache_init(libxfs_bhash_size,
 						&libxfs_bcache_operations);
 	}
-- 
1.8.4.rc3

* [PATCH 17/36] db: separate out straight buffer IO from map based IO.
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (15 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 16/36] libxfs: stop caching inode structures Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 18/36] db: rewrite bbmap to use xfs_buf_map Dave Chinner
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Libxfs has two different interfaces for getting and reading buffers.
The first is a block/length interface for reading contiguous
regions, and the second is based on extent based xfs_buf_map arrays
for discontiguous regions. The xfs_db code is solely based on a
basic block array interface regardless of the type of region being
read, and so doesn't match either libxfs interface.

As a first step to converting xfs_db to the libxfs interfaces, add a
simple block/length buffer API and implement it using pread/pwrite.
Then remove the single region conditionals from the basic block array
based interfaces, and convert all the contiguous block read cases to
use the new API.

This new API is temporary - it will be replaced by the equivalent
libxfs interface calls once all the infrastructure preparation for
the changeover has been completed.
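
As an illustration of what a block/length read built on pread looks like,
here is a minimal standalone sketch.  It is not the code from this patch:
demo_read_blocks and the DEMO_* macros are hypothetical names, and the
return convention (0 on success, a positive errno on failure, -1 on a short
read) is an assumption chosen for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_BBSHIFT	9	/* 512-byte basic blocks */
#define DEMO_BBTOB(bbs)	((size_t)(bbs) << DEMO_BBSHIFT)

/*
 * Read `count` basic blocks starting at basic block `bbno` from `fd`.
 * Returns 0 on success, a positive errno value on failure, or -1 if fewer
 * bytes than requested were read.
 */
static int
demo_read_blocks(int fd, int64_t bbno, int count, void *buf)
{
	size_t	want = DEMO_BBTOB(count);
	ssize_t	got;

	got = pread(fd, buf, want, (off_t)(bbno << DEMO_BBSHIFT));
	if (got < 0)
		return errno;
	if ((size_t)got < want)
		return -1;
	return 0;
}
```

The short-read check compares the transferred size against the requested
byte count, so a caller can treat any non-zero return as a failure.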

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/init.c |   7 ++--
 db/io.c   | 125 +++++++++++++++++++++++++++++++++++++++++++++++++-------------
 db/io.h   |   5 +--
 3 files changed, 104 insertions(+), 33 deletions(-)

diff --git a/db/init.c b/db/init.c
index d73d549..489c9fb 100644
--- a/db/init.c
+++ b/db/init.c
@@ -55,7 +55,7 @@ init(
 	char		**argv)
 {
 	xfs_sb_t	*sbp;
-	void		*bufp = NULL;
+	char		bufp[BBSIZE];
 	int		c;
 
 	setlocale(LC_ALL, "");
@@ -115,15 +115,14 @@ init(
 		exit(1);
 	}
 
-	if (read_bbs(XFS_SB_DADDR, 1, &bufp, NULL)) {
+	if (read_buf(XFS_SB_DADDR, 1, bufp)) {
 		fprintf(stderr, _("%s: %s is invalid (cannot read first 512 "
 			"bytes)\n"), progname, fsdevice);
 		exit(1);
 	}
 
 	/* copy SB from buffer to in-core, converting architecture as we go */
-	libxfs_sb_from_disk(&xmount.m_sb, bufp);
-	xfree(bufp);
+	libxfs_sb_from_disk(&xmount.m_sb, (struct xfs_dsb *)bufp);
 
 	sbp = &xmount.m_sb;
 	if (sbp->sb_magicnum != XFS_SB_MAGIC) {
diff --git a/db/io.c b/db/io.c
index 39a1827..fa11646 100644
--- a/db/io.c
+++ b/db/io.c
@@ -417,8 +417,61 @@ ring_add(void)
 	}
 }
 
-
 int
+read_buf(
+	xfs_daddr_t	bbno,
+	int		count,
+	void		*bufp)
+{
+	int		err;
+
+	err = pread64(x.dfd, bufp, BBTOB(count), BBTOB(bbno));
+	if (err < 0)
+		err = errno;
+	else if (err < count)
+		err = -1;
+	return err;
+}
+
+static int
+write_buf(
+	xfs_daddr_t	bbno,
+	int		count,
+	void		*bufp)
+{
+	int		err;
+
+	err = pwrite64(x.dfd, bufp, BBTOB(count), BBTOB(bbno));
+	if (err < 0)
+		err = errno;
+	else if (err < count)
+		err = -1;
+	return err;
+}
+
+static void
+write_cur_buf(void)
+{
+	int ret;
+
+	ret = write_buf(iocur_top->bb, iocur_top->blen, iocur_top->buf);
+
+	if (ret == -1)
+		dbprintf(_("incomplete write, block: %lld\n"),
+			 (iocur_base + iocur_sp)->bb);
+	else if (ret != 0)
+		dbprintf(_("write error: %s\n"), strerror(ret));
+
+	/* re-read buffer from disk */
+	ret = read_buf(iocur_top->bb, iocur_top->blen, iocur_top->buf);
+	if (ret == -1)
+		dbprintf(_("incomplete read, block: %lld\n"),
+			 (iocur_base + iocur_sp)->bb);
+	else if (ret != 0)
+		dbprintf(_("read error: %s\n"), strerror(ret));
+}
+
+static int
 write_bbs(
 	__int64_t       bbno,
 	int             count,
@@ -430,15 +483,14 @@ write_bbs(
 	int		j;
 	int		rval = EINVAL;	/* initialize for zero `count' case */
 
-	for (j = 0; j < count; j += bbmap ? 1 : count) {
-		if (bbmap)
-			bbno = bbmap->b[j];
+	for (j = 0; j < count; j++) {
+		bbno = bbmap->b[j];
 		if (lseek64(x.dfd, bbno << BBSHIFT, SEEK_SET) < 0) {
 			rval = errno;
 			dbprintf(_("can't seek in filesystem at bb %lld\n"), bbno);
 			return rval;
 		}
-		c = BBTOB(bbmap ? 1 : count);
+		c = BBTOB(1);
 		i = (int)write(x.dfd, (char *)bufp + BBTOB(j), c);
 		if (i < 0) {
 			rval = errno;
@@ -452,7 +504,7 @@ write_bbs(
 	return rval;
 }
 
-int
+static int
 read_bbs(
 	__int64_t	bbno,
 	int		count,
@@ -473,9 +525,8 @@ read_bbs(
 		buf = xmalloc(c);
 	else
 		buf = *bufp;
-	for (j = 0; j < count; j += bbmap ? 1 : count) {
-		if (bbmap)
-			bbno = bbmap->b[j];
+	for (j = 0; j < count; j++) {
+		bbno = bbmap->b[j];
 		if (lseek64(x.dfd, bbno << BBSHIFT, SEEK_SET) < 0) {
 			rval = errno;
 			dbprintf(_("can't seek in filesystem at bb %lld\n"), bbno);
@@ -483,7 +534,7 @@ read_bbs(
 				xfree(buf);
 			buf = NULL;
 		} else {
-			c = BBTOB(bbmap ? 1 : count);
+			c = BBTOB(1);
 			i = (int)read(x.dfd, (char *)buf + BBTOB(j), c);
 			if (i < 0) {
 				rval = errno;
@@ -506,22 +557,19 @@ read_bbs(
 	return rval;
 }
 
-void
-write_cur(void)
+static void
+write_cur_bbs(void)
 {
 	int ret;
 
-	if (iocur_sp < 0) {
-		dbprintf(_("nothing to write\n"));
-		return;
-	}
 	ret = write_bbs(iocur_top->bb, iocur_top->blen, iocur_top->buf,
-		iocur_top->use_bbmap ? &iocur_top->bbmap : NULL);
+			&iocur_top->bbmap);
 	if (ret == -1)
 		dbprintf(_("incomplete write, block: %lld\n"),
 			 (iocur_base + iocur_sp)->bb);
 	else if (ret != 0)
 		dbprintf(_("write error: %s\n"), strerror(ret));
+
 	/* re-read buffer from disk */
 	ret = read_bbs(iocur_top->bb, iocur_top->blen, &iocur_top->buf,
 		iocur_top->use_bbmap ? &iocur_top->bbmap : NULL);
@@ -533,6 +581,20 @@ write_cur(void)
 }
 
 void
+write_cur(void)
+{
+	if (iocur_sp < 0) {
+		dbprintf(_("nothing to write\n"));
+		return;
+	}
+
+	if (iocur_top->use_bbmap)
+		write_cur_bbs();
+	else
+		write_cur_buf();
+}
+
+void
 set_cur(
 	const typ_t	*t,
 	__int64_t	d,
@@ -549,17 +611,32 @@ set_cur(
 		return;
 	}
 
-#ifdef DEBUG
-	if (bbmap)
-		printf(_("xfs_db got a bbmap for %lld\n"), (long long)d);
-#endif
 	ino = iocur_top->ino;
 	dirino = iocur_top->dirino;
 	mode = iocur_top->mode;
 	pop_cur();
 	push_cur();
-	if (read_bbs(d, c, &iocur_top->buf, bbmap))
-		return;
+
+	if (bbmap) {
+#ifdef DEBUG
+		printf(_("xfs_db got a bbmap for %lld\n"), (long long)d);
+#endif
+
+		if (read_bbs(d, c, &iocur_top->buf, bbmap))
+			return;
+		iocur_top->bbmap = *bbmap;
+		iocur_top->use_bbmap = 1;
+	} else {
+		if (!iocur_top->buf) {
+			iocur_top->buf = malloc(BBTOB(c));
+			if (!iocur_top->buf)
+				return;
+		}
+		if (read_buf(d, c, iocur_top->buf))
+			return;
+		iocur_top->use_bbmap = 0;
+	}
+
 	iocur_top->bb = d;
 	iocur_top->blen = c;
 	iocur_top->boff = 0;
@@ -570,8 +647,6 @@ set_cur(
 	iocur_top->ino = ino;
 	iocur_top->dirino = dirino;
 	iocur_top->mode = mode;
-	if ((iocur_top->use_bbmap = (bbmap != NULL)))
-		iocur_top->bbmap = *bbmap;
 
 	/* store location in ring */
 	if (ring_flag)
diff --git a/db/io.h b/db/io.h
index 549aad9..9ea6223 100644
--- a/db/io.h
+++ b/db/io.h
@@ -52,10 +52,7 @@ extern void	off_cur(int off, int len);
 extern void	pop_cur(void);
 extern void	print_iocur(char *tag, iocur_t *ioc);
 extern void	push_cur(void);
-extern int	read_bbs(__int64_t daddr, int count, void **bufp,
-			 bbmap_t *bbmap);
-extern int	write_bbs(__int64_t daddr, int count, void *bufp,
-			  bbmap_t *bbmap);
+extern int	read_buf(__int64_t daddr, int count, void *bufp);
 extern void     write_cur(void);
 extern void	set_cur(const struct typ *t, __int64_t d, int c, int ring_add,
 			bbmap_t *bbmap);
-- 
1.8.4.rc3

* [PATCH 18/36] db: rewrite bbmap to use xfs_buf_map
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (16 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 17/36] db: separate out straight buffer IO from map based IO Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 19/36] libxfs: refactor libxfs_buf_read_map for xfs_db Dave Chinner
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Use the libxfs struct xfs_buf_map for recording the extent layout of
discontiguous buffers and convert the read/write code to decode them
directly and use read_buf/write_buf to do the extent IO. This
brings the physical xfs_db IO code very close to the model
that libxfs uses.
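
The idea of walking an extent map instead of a per-block array can be
sketched as below.  This is an illustrative model only: demo_buf_map mirrors
the two fields of struct xfs_buf_map used in this patch, while demo_read_map
and the in-memory "device" are hypothetical and replace the real IO calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BBSIZE	512

/* Hypothetical equivalent of struct xfs_buf_map: one contiguous extent. */
struct demo_buf_map {
	int64_t	bm_bn;	/* first basic block of the extent */
	int	bm_len;	/* extent length in basic blocks */
};

/*
 * Gather the extents described by map[0..nmaps-1] from `dev` (an in-memory
 * stand-in for the device) into one contiguous buffer.  Returns the number
 * of basic blocks copied.
 */
static int
demo_read_map(const char *dev, const struct demo_buf_map *map, int nmaps,
	      char *buf)
{
	int	done = 0;	/* basic blocks copied so far */
	int	i;

	for (i = 0; i < nmaps; i++) {
		memcpy(buf + (size_t)done * DEMO_BBSIZE,
		       dev + (size_t)map[i].bm_bn * DEMO_BBSIZE,
		       (size_t)map[i].bm_len * DEMO_BBSIZE);
		done += map[i].bm_len;
	}
	return done;
}
```

The sketch keeps the map index and the running buffer offset as separate
variables, so each extent is issued as a single transfer of bm_len blocks.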

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/bmap.c | 15 ++++-----------
 db/io.c   | 58 ++++++++++++----------------------------------------------
 db/io.h   |  3 ++-
 3 files changed, 18 insertions(+), 58 deletions(-)

diff --git a/db/bmap.c b/db/bmap.c
index 0ef7a62..3951b9f 100644
--- a/db/bmap.c
+++ b/db/bmap.c
@@ -293,20 +293,13 @@ make_bbmap(
 	int		nex,
 	bmap_ext_t	*bmp)
 {
-	int		d;
-	xfs_dfsbno_t	dfsbno;
 	int		i;
-	int		j;
-	int		k;
 
-	for (i = 0, d = 0; i < nex; i++) {
-		dfsbno = bmp[i].startblock;
-		for (j = 0; j < bmp[i].blockcount; j++, dfsbno++) {
-			for (k = 0; k < blkbb; k++)
-				bbmap->b[d++] =
-					XFS_FSB_TO_DADDR(mp, dfsbno) + k;
-		}
+	for (i = 0; i < nex; i++) {
+		bbmap->b[i].bm_bn = XFS_FSB_TO_DADDR(mp, bmp[i].startblock);
+		bbmap->b[i].bm_len = XFS_FSB_TO_BB(mp, bmp[i].blockcount);
 	}
+	bbmap->nmaps = nex;
 }
 
 static xfs_fsblock_t
diff --git a/db/io.c b/db/io.c
index fa11646..01a5970 100644
--- a/db/io.c
+++ b/db/io.c
@@ -478,28 +478,16 @@ write_bbs(
 	void            *bufp,
 	bbmap_t		*bbmap)
 {
-	int		c;
-	int		i;
 	int		j;
 	int		rval = EINVAL;	/* initialize for zero `count' case */
 
-	for (j = 0; j < count; j++) {
-		bbno = bbmap->b[j];
-		if (lseek64(x.dfd, bbno << BBSHIFT, SEEK_SET) < 0) {
-			rval = errno;
-			dbprintf(_("can't seek in filesystem at bb %lld\n"), bbno);
-			return rval;
-		}
-		c = BBTOB(1);
-		i = (int)write(x.dfd, (char *)bufp + BBTOB(j), c);
-		if (i < 0) {
-			rval = errno;
-		} else if (i < c) {
-			rval = -1;
-		} else
-			rval = 0;
+	for (j = 0; j < count;) {
+		rval = write_buf(bbmap->b[j].bm_bn, bbmap->b[j].bm_len,
+			     (char *)bufp + BBTOB(j));
 		if (rval)
 			break;
+
+		j += bbmap->b[j].bm_len;
 	}
 	return rval;
 }
@@ -512,45 +500,23 @@ read_bbs(
 	bbmap_t		*bbmap)
 {
 	void		*buf;
-	int		c;
-	int		i;
 	int		j;
 	int		rval = EINVAL;
 
 	if (count <= 0)
 		count = 1;
 
-	c = BBTOB(count);
 	if (*bufp == NULL)
-		buf = xmalloc(c);
+		buf = xmalloc(BBTOB(count));
 	else
 		buf = *bufp;
-	for (j = 0; j < count; j++) {
-		bbno = bbmap->b[j];
-		if (lseek64(x.dfd, bbno << BBSHIFT, SEEK_SET) < 0) {
-			rval = errno;
-			dbprintf(_("can't seek in filesystem at bb %lld\n"), bbno);
-			if (*bufp == NULL)
-				xfree(buf);
-			buf = NULL;
-		} else {
-			c = BBTOB(1);
-			i = (int)read(x.dfd, (char *)buf + BBTOB(j), c);
-			if (i < 0) {
-				rval = errno;
-				if (*bufp == NULL)
-					xfree(buf);
-				buf = NULL;
-			} else if (i < c) {
-				rval = -1;
-				if (*bufp == NULL)
-					xfree(buf);
-				buf = NULL;
-			} else
-				rval = 0;
-		}
-		if (buf == NULL)
+	for (j = 0; j < count;) {
+		rval = read_buf(bbmap->b[j].bm_bn, bbmap->b[j].bm_len,
+			     (char *)buf + BBTOB(j));
+		if (rval)
 			break;
+
+		j += bbmap->b[j].bm_len;
 	}
 	if (*bufp == NULL)
 		*bufp = buf;
diff --git a/db/io.h b/db/io.h
index 9ea6223..c7641d5 100644
--- a/db/io.h
+++ b/db/io.h
@@ -20,7 +20,8 @@ struct typ;
 
 #define	BBMAP_SIZE		(XFS_MAX_BLOCKSIZE / BBSIZE)
 typedef struct bbmap {
-	__int64_t		b[BBMAP_SIZE];
+	int			nmaps;
+	struct xfs_buf_map	b[BBMAP_SIZE];
 } bbmap_t;
 
 typedef struct iocur {
-- 
1.8.4.rc3

* [PATCH 19/36] libxfs: refactor libxfs_buf_read_map for xfs_db
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (17 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 18/36] db: rewrite bbmap to use xfs_buf_map Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 20/36] db: rewrite IO engine to use libxfs Dave Chinner
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_db requires low level read/write buffer primitives that are the
equivalent of libxfs_readbufr/writebufr. The implementation of
libxfs_writebufr already handles discontiguous buffers, but there is
no equivalent libxfs_readbufr_map support in the code.

Refactor libxfs_readbuf_map into two parts - one that does the
buffer cache lookup, and the other that does the read IO. This
provides the implementation of libxfs_readbufr_map that is required
for xfs_db.
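
The shape of the split can be illustrated with a much simplified model.
Everything here is hypothetical (demo_cbuf, demo_readbufr, demo_readbuf,
demo_io_count) and an in-memory array stands in for the device; the point is
only that the raw read always does IO while the cached variant may skip it.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_BUFSIZE	16

/* Hypothetical, much simplified buffer: one cached block of a "device". */
struct demo_cbuf {
	int	blkno;
	bool	uptodate;	/* contents already valid, skip the read */
	char	data[DEMO_BUFSIZE];
};

static int demo_io_count;	/* raw reads issued, for illustration */

/* Low-level half: always performs the read, never consults cache state. */
static int
demo_readbufr(const char *dev, struct demo_cbuf *bp)
{
	memcpy(bp->data, dev + bp->blkno * DEMO_BUFSIZE, DEMO_BUFSIZE);
	demo_io_count++;
	return 0;
}

/* Cached half: skips the IO for valid buffers, else calls the raw read. */
static int
demo_readbuf(const char *dev, struct demo_cbuf *bp)
{
	int	error;

	if (bp->uptodate)
		return 0;
	error = demo_readbufr(dev, bp);
	if (!error)
		bp->uptodate = true;
	return error;
}
```

A caller that manages its own buffers, as xfs_db does, can call the raw half
directly and always get fresh contents from the device.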

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h |  3 +++
 libxfs/rdwr.c    | 61 +++++++++++++++++++++++++++++++++++++-------------------
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/libxfs.h b/include/libxfs.h
index e017b32..b097bd2 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -444,7 +444,10 @@ extern xfs_buf_t *libxfs_getbufr(struct xfs_buftarg *, xfs_daddr_t, int);
 extern void	libxfs_putbufr(xfs_buf_t *);
 
 extern int	libxfs_writebuf_int(xfs_buf_t *, int);
+extern int	libxfs_writebufr(struct xfs_buf *);
 extern int	libxfs_readbufr(struct xfs_buftarg *, xfs_daddr_t, xfs_buf_t *, int, int);
+extern int	libxfs_readbufr_map(struct xfs_buftarg *, struct xfs_buf *,
+				    struct xfs_buf_map *, int, int);
 
 extern int libxfs_bhash_size;
 
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index f507855..7eaea0a 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -719,30 +719,18 @@ libxfs_readbuf(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len, int flags,
 	return bp;
 }
 
-struct xfs_buf *
-libxfs_readbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps,
-		int flags, const struct xfs_buf_ops *ops)
+int
+libxfs_readbufr_map(struct xfs_buftarg *btp, struct xfs_buf *bp,
+		    struct xfs_buf_map *map, int nmaps, int flags)
 {
-	xfs_buf_t	*bp;
-	int		error = 0;
-	int		fd;
-	int		i;
-	char		*buf;
-
-	if (nmaps == 1)
-		return libxfs_readbuf(btp, map[0].bm_bn, map[0].bm_len,
-					flags, ops);
-
-	bp = libxfs_getbuf_map(btp, map, nmaps);
-	if (!bp)
-		return NULL;
+	int	fd = libxfs_device_to_fd(btp->dev);
+	int	error = 0;
+	char	*buf;
+	int	i;
 
-	bp->b_error = 0;
-	bp->b_ops = ops;
-	if ((bp->b_flags & (LIBXFS_B_UPTODATE|LIBXFS_B_DIRTY)))
-		return bp;
+	ASSERT(BBTOB(len) <= bp->b_bcount);
 
-	ASSERT(bp->b_nmaps = nmaps);
+	ASSERT(bp->b_nmaps == nmaps);
 
 	fd = libxfs_device_to_fd(btp->dev);
 	buf = bp->b_addr;
@@ -762,6 +750,37 @@ libxfs_readbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps,
 		offset += len;
 	}
 
+	if (!error)
+		bp->b_flags |= LIBXFS_B_UPTODATE;
+#ifdef IO_DEBUG
+	printf("%lx: %s: read %u bytes, error %d, blkno=0x%llx(0x%llx), %p\n",
+		pthread_self(), __FUNCTION__, , error,
+		(long long)LIBXFS_BBTOOFF64(blkno), (long long)blkno, bp);
+#endif
+	return error;
+}
+
+struct xfs_buf *
+libxfs_readbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps,
+		int flags, const struct xfs_buf_ops *ops)
+{
+	struct xfs_buf	*bp;
+	int		error = 0;
+
+	if (nmaps == 1)
+		return libxfs_readbuf(btp, map[0].bm_bn, map[0].bm_len,
+					flags, ops);
+
+	bp = libxfs_getbuf_map(btp, map, nmaps);
+	if (!bp)
+		return NULL;
+
+	bp->b_error = 0;
+	bp->b_ops = ops;
+	if ((bp->b_flags & (LIBXFS_B_UPTODATE|LIBXFS_B_DIRTY)))
+		return bp;
+
+	error = libxfs_readbufr_map(btp, bp, map, nmaps, flags);
 	if (!error) {
 		bp->b_flags |= LIBXFS_B_UPTODATE;
 		if (bp->b_ops)
-- 
1.8.4.rc3

* [PATCH 20/36] db: rewrite IO engine to use libxfs
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (18 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 19/36] libxfs: refactor libxfs_buf_read_map for xfs_db Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13 16:05   ` Christoph Hellwig
  2013-11-13  6:40 ` [PATCH 21/36] db: introduce verifier support into set_cur Dave Chinner
                   ` (16 subsequent siblings)
  36 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we have buffers and xfs_buf_maps, it is relatively easy to
convert the IO engine to use libxfs routines. This gets rid of
most of the differences between mapped and straight buffer reads,
and tracks xfs_bufs directly in the IO context that is being used.

This is not yet a perfect solution, as xfs_db does different sized
IOs for the same block range which will throw warnings like:

xfs_db> inode 64
7ffff7fde740: Badness in key lookup (length)
bp=(bno 0x40, len 8192 bytes) key=(bno 0x40, len 4096 bytes)
xfs_db>

This is when first displaying an inode in the root inode chunk.
These will need to be dealt with on a case by case basis.
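
A sketch of why such a lookup complains, assuming the cache key is the
(block number, length) pair printed in the warning above.  The names
demo_key, demo_cmp and demo_key_compare are hypothetical and this is not the
libxfs comparison routine.

```c
#include <stdint.h>

/* Possible outcomes of comparing a cached buffer against a lookup key. */
enum demo_cmp {
	DEMO_MISS,		/* different block number */
	DEMO_HIT,		/* same block number and length */
	DEMO_LEN_MISMATCH	/* same block number, different length */
};

/* Hypothetical cache key: start block and length in bytes. */
struct demo_key {
	int64_t		bno;
	unsigned int	len;
};

static enum demo_cmp
demo_key_compare(const struct demo_key *cached, const struct demo_key *key)
{
	if (cached->bno != key->bno)
		return DEMO_MISS;
	if (cached->len != key->len)
		return DEMO_LEN_MISMATCH;
	return DEMO_HIT;
}
```

With the values from the warning, a cached buffer of 8192 bytes at block
0x40 and a lookup for 4096 bytes at the same block fall into the length
mismatch case.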

Further, xfs_db can build up a large IO stack by the time it has run
to completion. If we don't unwind this IO stack before we shut down
the libxfs caches, metadump and other db programs will exit with
unreleased buffers and emit warnings like:

cache_purge: shake on cache 0x69e4f0 left 7 nodes!?

Hence we need to unwind the iostack as we shut down.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 db/init.c |  28 ++++++++--
 db/io.c   | 178 +++++++++++++++++---------------------------------------------
 db/io.h   |   4 +-
 3 files changed, 73 insertions(+), 137 deletions(-)

diff --git a/db/init.c b/db/init.c
index 489c9fb..2dc7c87 100644
--- a/db/init.c
+++ b/db/init.c
@@ -54,8 +54,8 @@ init(
 	int		argc,
 	char		**argv)
 {
-	xfs_sb_t	*sbp;
-	char		bufp[BBSIZE];
+	struct xfs_sb	*sbp;
+	struct xfs_buf	*bp;
 	int		c;
 
 	setlocale(LC_ALL, "");
@@ -115,14 +115,25 @@ init(
 		exit(1);
 	}
 
-	if (read_buf(XFS_SB_DADDR, 1, bufp)) {
+	/*
+	 * Read the superblock, but don't validate it - we are a diagnostic
+	 * tool and so need to be able to mount busted filesystems.
+	 */
+	memset(&xmount, 0, sizeof(struct xfs_mount));
+	libxfs_buftarg_init(&xmount, x.ddev, x.logdev, x.rtdev);
+	bp = libxfs_readbuf(xmount.m_ddev_targp, XFS_SB_DADDR,
+			    1 << (XFS_MAX_SECTORSIZE_LOG - BBSHIFT), 0, NULL);
+
+	if (!bp || bp->b_error) {
 		fprintf(stderr, _("%s: %s is invalid (cannot read first 512 "
 			"bytes)\n"), progname, fsdevice);
 		exit(1);
 	}
 
 	/* copy SB from buffer to in-core, converting architecture as we go */
-	libxfs_sb_from_disk(&xmount.m_sb, (struct xfs_dsb *)bufp);
+	libxfs_sb_from_disk(&xmount.m_sb, XFS_BUF_TO_SBP(bp));
+	libxfs_putbuf(bp);
+	libxfs_purgebuf(bp);
 
 	sbp = &xmount.m_sb;
 	if (sbp->sb_magicnum != XFS_SB_MAGIC) {
@@ -186,9 +197,11 @@ main(
 	int	c, i, done = 0;
 	char	*input;
 	char	**v;
+	int	start_iocur_sp;
 
 	pushfile(stdin);
 	init(argc, argv);
+	start_iocur_sp = iocur_sp;
 
 	for (i = 0; !done && i < ncmdline; i++) {
 		v = breakline(cmdline[i], &c);
@@ -211,6 +224,13 @@ main(
 	}
 
 close_devices:
+	/*
+	 * Make sure that we pop all the buffer contexts we hold so that
+	 * they are released before we purge the caches during unmount.
+	 */
+	while (iocur_sp > start_iocur_sp)
+		pop_cur();
+	libxfs_umount(mp);
 	if (x.ddev)
 		libxfs_device_close(x.ddev);
 	if (x.logdev && x.logdev != x.ddev)
diff --git a/db/io.c b/db/io.c
index 01a5970..ca89354 100644
--- a/db/io.c
+++ b/db/io.c
@@ -104,8 +104,14 @@ pop_cur(void)
 		dbprintf(_("can't pop anything from I/O stack\n"));
 		return;
 	}
-	if (iocur_top->buf)
-		xfree(iocur_top->buf);
+	if (iocur_top->bp) {
+		libxfs_putbuf(iocur_top->bp);
+		iocur_top->bp = NULL;
+	}
+	if (iocur_top->bbmap) {
+		free(iocur_top->bbmap);
+		iocur_top->bbmap = NULL;
+	}
 	if (--iocur_sp >= 0) {
 		iocur_top = iocur_base + iocur_sp;
 		cur_typ = iocur_top->typ;
@@ -147,10 +153,11 @@ print_iocur(
 	dbprintf(_("\tbuffer block %lld (fsbno %lld), %d bb%s\n"), ioc->bb,
 		(xfs_dfsbno_t)XFS_DADDR_TO_FSB(mp, ioc->bb), ioc->blen,
 		ioc->blen == 1 ? "" : "s");
-	if (ioc->use_bbmap) {
+	if (ioc->bbmap) {
 		dbprintf(_("\tblock map"));
-		for (i = 0; i < ioc->blen; i++)
-			dbprintf(" %d:%lld", i, ioc->bbmap.b[i]);
+		for (i = 0; i < ioc->bbmap->nmaps; i++)
+			dbprintf(" %lld:%d", ioc->bbmap->b[i].bm_bn,
+					     ioc->bbmap->b[i].bm_len);
 		dbprintf("\n");
 	}
 	dbprintf(_("\tinode %lld, dir inode %lld, type %s\n"), ioc->ino,
@@ -238,7 +245,7 @@ push_f(
 	else
 		set_cur(iocur_top[-1].typ, iocur_top[-1].bb,
 			iocur_top[-1].blen, DB_RING_IGN,
-			iocur_top[-1].use_bbmap ? &iocur_top[-1].bbmap : NULL);
+			iocur_top[-1].bbmap);
 
 	/* run requested command */
 	if (argc>1)
@@ -280,8 +287,7 @@ forward_f(
 		iocur_ring[ring_current].bb,
 		iocur_ring[ring_current].blen,
 		DB_RING_IGN,
-		iocur_ring[ring_current].use_bbmap ?
-			&iocur_ring[ring_current].bbmap : NULL);
+		iocur_ring[ring_current].bbmap);
 
 	return 0;
 }
@@ -321,8 +327,7 @@ back_f(
 		iocur_ring[ring_current].bb,
 		iocur_ring[ring_current].blen,
 		DB_RING_IGN,
-		iocur_ring[ring_current].use_bbmap ?
-			&iocur_ring[ring_current].bbmap : NULL);
+		iocur_ring[ring_current].bbmap);
 
 	return 0;
 }
@@ -362,7 +367,7 @@ ring_f(
 		iocur_ring[index].bb,
 		iocur_ring[index].blen,
 		DB_RING_IGN,
-		iocur_ring[index].use_bbmap ? &iocur_ring[index].bbmap : NULL);
+		iocur_ring[index].bbmap);
 
 	return 0;
 }
@@ -417,132 +422,37 @@ ring_add(void)
 	}
 }
 
-int
-read_buf(
-	xfs_daddr_t	bbno,
-	int		count,
-	void		*bufp)
-{
-	int		err;
-
-	err = pread64(x.dfd, bufp, BBTOB(count), BBTOB(bbno));
-	if (err < 0)
-		err = errno;
-	else if (err < count)
-		err = -1;
-	return err;
-}
-
-static int
-write_buf(
-	xfs_daddr_t	bbno,
-	int		count,
-	void		*bufp)
-{
-	int		err;
-
-	err = pwrite64(x.dfd, bufp, BBTOB(count), BBTOB(bbno));
-	if (err < 0)
-		err = errno;
-	else if (err < count)
-		err = -1;
-	return err;
-}
-
 static void
 write_cur_buf(void)
 {
 	int ret;
 
-	ret = write_buf(iocur_top->bb, iocur_top->blen, iocur_top->buf);
-
-	if (ret == -1)
-		dbprintf(_("incomplete write, block: %lld\n"),
-			 (iocur_base + iocur_sp)->bb);
-	else if (ret != 0)
+	ret = libxfs_writebufr(iocur_top->bp);
+	if (ret != 0)
 		dbprintf(_("write error: %s\n"), strerror(ret));
 
 	/* re-read buffer from disk */
-	ret = read_buf(iocur_top->bb, iocur_top->blen, iocur_top->buf);
-	if (ret == -1)
-		dbprintf(_("incomplete read, block: %lld\n"),
-			 (iocur_base + iocur_sp)->bb);
-	else if (ret != 0)
+	ret = libxfs_readbufr(mp->m_ddev_targp, iocur_top->bb, iocur_top->bp,
+			      iocur_top->blen, 0);
+	if (ret != 0)
 		dbprintf(_("read error: %s\n"), strerror(ret));
 }
 
-static int
-write_bbs(
-	__int64_t       bbno,
-	int             count,
-	void            *bufp,
-	bbmap_t		*bbmap)
-{
-	int		j;
-	int		rval = EINVAL;	/* initialize for zero `count' case */
-
-	for (j = 0; j < count;) {
-		rval = write_buf(bbmap->b[j].bm_bn, bbmap->b[j].bm_len,
-			     (char *)bufp + BBTOB(j));
-		if (rval)
-			break;
-
-		j += bbmap->b[j].bm_len;
-	}
-	return rval;
-}
-
-static int
-read_bbs(
-	__int64_t	bbno,
-	int		count,
-	void		**bufp,
-	bbmap_t		*bbmap)
-{
-	void		*buf;
-	int		j;
-	int		rval = EINVAL;
-
-	if (count <= 0)
-		count = 1;
-
-	if (*bufp == NULL)
-		buf = xmalloc(BBTOB(count));
-	else
-		buf = *bufp;
-	for (j = 0; j < count;) {
-		rval = read_buf(bbmap->b[j].bm_bn, bbmap->b[j].bm_len,
-			     (char *)buf + BBTOB(j));
-		if (rval)
-			break;
-
-		j += bbmap->b[j].bm_len;
-	}
-	if (*bufp == NULL)
-		*bufp = buf;
-	return rval;
-}
-
 static void
 write_cur_bbs(void)
 {
 	int ret;
 
-	ret = write_bbs(iocur_top->bb, iocur_top->blen, iocur_top->buf,
-			&iocur_top->bbmap);
-	if (ret == -1)
-		dbprintf(_("incomplete write, block: %lld\n"),
-			 (iocur_base + iocur_sp)->bb);
-	else if (ret != 0)
+	ret = libxfs_writebufr(iocur_top->bp);
+	if (ret != 0)
 		dbprintf(_("write error: %s\n"), strerror(ret));
 
+
 	/* re-read buffer from disk */
-	ret = read_bbs(iocur_top->bb, iocur_top->blen, &iocur_top->buf,
-		iocur_top->use_bbmap ? &iocur_top->bbmap : NULL);
-	if (ret == -1)
-		dbprintf(_("incomplete read, block: %lld\n"),
-			 (iocur_base + iocur_sp)->bb);
-	else if (ret != 0)
+	ret = libxfs_readbufr_map(mp->m_ddev_targp, iocur_top->bp,
+				  iocur_top->bbmap->b, iocur_top->bbmap->nmaps,
+				  0);
+	if (ret != 0)
 		dbprintf(_("read error: %s\n"), strerror(ret));
 }
 
@@ -554,7 +464,7 @@ write_cur(void)
 		return;
 	}
 
-	if (iocur_top->use_bbmap)
+	if (iocur_top->bbmap)
 		write_cur_bbs();
 	else
 		write_cur_buf();
@@ -568,6 +478,7 @@ set_cur(
 	int             ring_flag,
 	bbmap_t		*bbmap)
 {
+	struct xfs_buf	*bp;
 	xfs_ino_t	dirino;
 	xfs_ino_t	ino;
 	__uint16_t	mode;
@@ -585,23 +496,28 @@ set_cur(
 
 	if (bbmap) {
 #ifdef DEBUG
+		int i;
 		printf(_("xfs_db got a bbmap for %lld\n"), (long long)d);
+		printf(_("\tblock map"));
+		for (i = 0; i < bbmap->nmaps; i++)
+			printf(" %lld:%d", (long long)bbmap->b[i].bm_bn,
+					   bbmap->b[i].bm_len);
+		printf("\n");
 #endif
-
-		if (read_bbs(d, c, &iocur_top->buf, bbmap))
+		iocur_top->bbmap = malloc(sizeof(struct bbmap));
+		if (!iocur_top->bbmap)
 			return;
-		iocur_top->bbmap = *bbmap;
-		iocur_top->use_bbmap = 1;
+		memcpy(iocur_top->bbmap, bbmap, sizeof(struct bbmap));
+		bp = libxfs_readbuf_map(mp->m_ddev_targp, bbmap->b,
+					bbmap->nmaps, 0, NULL);
 	} else {
-		if (!iocur_top->buf) {
-			iocur_top->buf = malloc(BBTOB(c));
-			if (!iocur_top->buf)
-				return;
-		}
-		if (read_buf(d, c, iocur_top->buf))
-			return;
-		iocur_top->use_bbmap = 0;
+		bp = libxfs_readbuf(mp->m_ddev_targp, d, c, 0, NULL);
+		iocur_top->bbmap = NULL;
 	}
+	if (!bp || bp->b_error)
+		return;
+	iocur_top->buf = bp->b_addr;
+	iocur_top->bp = bp;
 
 	iocur_top->bb = d;
 	iocur_top->blen = c;
diff --git a/db/io.h b/db/io.h
index c7641d5..2c47ccc 100644
--- a/db/io.h
+++ b/db/io.h
@@ -36,8 +36,8 @@ typedef struct iocur {
 	__uint16_t		mode;	/* current inode's mode */
 	xfs_off_t		off;	/* fs offset of "data" in bytes */
 	const struct typ	*typ;	/* type of "data" */
-	int			use_bbmap; /* set if bbmap is valid */
-	bbmap_t			bbmap;	/* map daddr if fragmented */
+	bbmap_t			*bbmap;	/* map daddr if fragmented */
+	struct xfs_buf		*bp;	/* underlying buffer */
 } iocur_t;
 
 #define DB_RING_ADD 1                   /* add to ring on set_cur */
-- 
1.8.4.rc3


* [PATCH 21/36] db: introduce verifier support into set_cur
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (19 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 20/36] db: rewrite IO engine to use libxfs Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 22/36] db: indicate if the CRC on a buffer is correct or not Dave Chinner
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

To be able to use read and write verifiers, we need to pass the
verifier to the IO routines. We do this via the set_cur() function
used to trigger reading the buffer.

For most metadata types, there is only one type of verifier needed.
For these, we can simply add the verifier to the type table entry
for the given type and use that directly. This type entry is already
carried around by the IO context, so if we ever need to get it again
we have direct access to it in the context we'll be doing IO.

Only attach the verifiers to the v5 filesystem type table; there is
no need for them on v4 filesystems as we don't have to verify or
calculate CRCs for them.

There are some metadata types that have more than one buffer format,
or aren't based directly in buffers. For these, leave the type
table verifier NULL for now - these will need to be addressed
individually.
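
As a rough illustration of the table-driven approach described above, here
is a minimal standalone sketch. All demo_* names are hypothetical stand-ins
and not the xfs_db or libxfs definitions; the only thing taken from the
patch is the "t ? t->bops : NULL" lookup and the convention that a NULL
verifier means "nothing to check".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct xfs_buf_ops: one verifier callback. */
struct demo_buf_ops {
	int	(*verify_read)(const void *buf, size_t len);
};

/* Hypothetical stand-in for typ_t with the new trailing bops member. */
struct demo_typ {
	const char			*name;
	const struct demo_buf_ops	*bops;	/* NULL: no verifier */
};

/* Toy verifier: the buffer must start with the 4 bytes "XAGF". */
static int
demo_agf_verify(const void *buf, size_t len)
{
	return len >= 4 && memcmp(buf, "XAGF", 4) == 0;
}

static const struct demo_buf_ops demo_agf_ops = { demo_agf_verify };

static const struct demo_typ demo_typtab[] = {
	{ "agf",  &demo_agf_ops },
	{ "data", NULL },		/* raw data: nothing to verify */
};

/*
 * Mirrors the "t ? t->bops : NULL" lookup in set_cur(). Returns 1 if the
 * buffer passes (or there is no verifier), 0 if the verifier rejects it.
 */
static int
demo_read_verify(const struct demo_typ *t, const void *buf, size_t len)
{
	const struct demo_buf_ops *ops = t ? t->bops : NULL;

	if (!ops || !ops->verify_read)
		return 1;
	return ops->verify_read(buf, len);
}
```

The point of the sketch is only that the caller never needs to know which
verifier applies; it asks the type entry it already carries.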

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/io.c   | 13 +++++++---
 db/type.c | 88 ++++++++++++++++++++++++++++++++++-----------------------------
 db/type.h |  1 +
 3 files changed, 59 insertions(+), 43 deletions(-)

diff --git a/db/io.c b/db/io.c
index ca89354..2d1cc56 100644
--- a/db/io.c
+++ b/db/io.c
@@ -482,12 +482,14 @@ set_cur(
 	xfs_ino_t	dirino;
 	xfs_ino_t	ino;
 	__uint16_t	mode;
+	const struct xfs_buf_ops *ops = t ? t->bops : NULL;
 
 	if (iocur_sp < 0) {
 		dbprintf(_("set_cur no stack element to set\n"));
 		return;
 	}
 
+
 	ino = iocur_top->ino;
 	dirino = iocur_top->dirino;
 	mode = iocur_top->mode;
@@ -509,12 +511,17 @@ set_cur(
 			return;
 		memcpy(iocur_top->bbmap, bbmap, sizeof(struct bbmap));
 		bp = libxfs_readbuf_map(mp->m_ddev_targp, bbmap->b,
-					bbmap->nmaps, 0, NULL);
+					bbmap->nmaps, 0, ops);
 	} else {
-		bp = libxfs_readbuf(mp->m_ddev_targp, d, c, 0, NULL);
+		bp = libxfs_readbuf(mp->m_ddev_targp, d, c, 0, ops);
 		iocur_top->bbmap = NULL;
 	}
-	if (!bp || bp->b_error)
+
+	/*
+	 * Keep the buffer even if the verifier says it is corrupted.
+	 * We're a diagnostic tool, after all.
+	 */
+	if (!bp || (bp->b_error && bp->b_error != EFSCORRUPTED))
 		return;
 	iocur_top->buf = bp->b_addr;
 	iocur_top->bp = bp;
diff --git a/db/type.c b/db/type.c
index 64e2ef4..b3f3d87 100644
--- a/db/type.c
+++ b/db/type.c
@@ -50,50 +50,58 @@ static const cmdinfo_t	type_cmd =
 	  N_("set/show current data type"), NULL };
 
 static const typ_t	__typtab[] = {
-	{ TYP_AGF, "agf", handle_struct, agf_hfld },
-	{ TYP_AGFL, "agfl", handle_struct, agfl_hfld },
-	{ TYP_AGI, "agi", handle_struct, agi_hfld },
-	{ TYP_ATTR, "attr", handle_struct, attr_hfld },
-	{ TYP_BMAPBTA, "bmapbta", handle_struct, bmapbta_hfld },
-	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_hfld },
-	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_hfld },
-	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_hfld },
-	{ TYP_DATA, "data", handle_block, NULL },
-	{ TYP_DIR2, "dir2", handle_struct, dir2_hfld },
-	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld },
-	{ TYP_INOBT, "inobt", handle_struct, inobt_hfld },
-	{ TYP_INODATA, "inodata", NULL, NULL },
-	{ TYP_INODE, "inode", handle_struct, inode_hfld },
-	{ TYP_LOG, "log", NULL, NULL },
-	{ TYP_RTBITMAP, "rtbitmap", NULL, NULL },
-	{ TYP_RTSUMMARY, "rtsummary", NULL, NULL },
-	{ TYP_SB, "sb", handle_struct, sb_hfld },
-	{ TYP_SYMLINK, "symlink", handle_string, NULL },
-	{ TYP_TEXT, "text", handle_text, NULL },
+	{ TYP_AGF, "agf", handle_struct, agf_hfld, NULL },
+	{ TYP_AGFL, "agfl", handle_struct, agfl_hfld, NULL },
+	{ TYP_AGI, "agi", handle_struct, agi_hfld, NULL },
+	{ TYP_ATTR, "attr", handle_struct, attr_hfld, NULL },
+	{ TYP_BMAPBTA, "bmapbta", handle_struct, bmapbta_hfld, NULL },
+	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_hfld, NULL },
+	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_hfld, NULL },
+	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_hfld, NULL },
+	{ TYP_DATA, "data", handle_block, NULL, NULL },
+	{ TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL },
+	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld, NULL },
+	{ TYP_INOBT, "inobt", handle_struct, inobt_hfld, NULL },
+	{ TYP_INODATA, "inodata", NULL, NULL, NULL },
+	{ TYP_INODE, "inode", handle_struct, inode_hfld, NULL },
+	{ TYP_LOG, "log", NULL, NULL, NULL },
+	{ TYP_RTBITMAP, "rtbitmap", NULL, NULL, NULL },
+	{ TYP_RTSUMMARY, "rtsummary", NULL, NULL, NULL },
+	{ TYP_SB, "sb", handle_struct, sb_hfld, NULL },
+	{ TYP_SYMLINK, "symlink", handle_string, NULL, NULL },
+	{ TYP_TEXT, "text", handle_text, NULL, NULL },
 	{ TYP_NONE, NULL }
 };
 
 static const typ_t	__typtab_crc[] = {
-	{ TYP_AGF, "agf", handle_struct, agf_hfld },
-	{ TYP_AGFL, "agfl", handle_struct, agfl_crc_hfld },
-	{ TYP_AGI, "agi", handle_struct, agi_hfld },
-	{ TYP_ATTR, "attr3", handle_struct, attr3_hfld },
-	{ TYP_BMAPBTA, "bmapbta", handle_struct, bmapbta_crc_hfld },
-	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_crc_hfld },
-	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_crc_hfld },
-	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_crc_hfld },
-	{ TYP_DATA, "data", handle_block, NULL },
-	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld },
-	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld },
-	{ TYP_INOBT, "inobt", handle_struct, inobt_crc_hfld },
-	{ TYP_INODATA, "inodata", NULL, NULL },
-	{ TYP_INODE, "inode", handle_struct, inode_crc_hfld },
-	{ TYP_LOG, "log", NULL, NULL },
-	{ TYP_RTBITMAP, "rtbitmap", NULL, NULL },
-	{ TYP_RTSUMMARY, "rtsummary", NULL, NULL },
-	{ TYP_SB, "sb", handle_struct, sb_hfld },
-	{ TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld },
-	{ TYP_TEXT, "text", handle_text, NULL },
+	{ TYP_AGF, "agf", handle_struct, agf_hfld, &xfs_agf_buf_ops },
+	{ TYP_AGFL, "agfl", handle_struct, agfl_crc_hfld, &xfs_agfl_buf_ops },
+	{ TYP_AGI, "agi", handle_struct, agi_hfld, &xfs_agi_buf_ops },
+	{ TYP_ATTR, "attr3", handle_struct, attr3_hfld, NULL },
+	{ TYP_BMAPBTA, "bmapbta", handle_struct, bmapbta_crc_hfld,
+		&xfs_bmbt_buf_ops },
+	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_crc_hfld,
+		&xfs_bmbt_buf_ops },
+	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_crc_hfld,
+		&xfs_allocbt_buf_ops },
+	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_crc_hfld,
+		&xfs_allocbt_buf_ops },
+	{ TYP_DATA, "data", handle_block, NULL, NULL },
+	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld, NULL },
+	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld,
+		&xfs_dquot_buf_ops },
+	{ TYP_INOBT, "inobt", handle_struct, inobt_crc_hfld,
+		&xfs_inobt_buf_ops },
+	{ TYP_INODATA, "inodata", NULL, NULL, NULL },
+	{ TYP_INODE, "inode", handle_struct, inode_crc_hfld,
+		&xfs_inode_buf_ops },
+	{ TYP_LOG, "log", NULL, NULL, NULL },
+	{ TYP_RTBITMAP, "rtbitmap", NULL, NULL, NULL },
+	{ TYP_RTSUMMARY, "rtsummary", NULL, NULL, NULL },
+	{ TYP_SB, "sb", handle_struct, sb_hfld, &xfs_sb_buf_ops },
+	{ TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld,
+		&xfs_symlink_buf_ops },
+	{ TYP_TEXT, "text", handle_text, NULL, NULL },
 	{ TYP_NONE, NULL }
 };
 
diff --git a/db/type.h b/db/type.h
index c41aca4d..3bb26f1 100644
--- a/db/type.h
+++ b/db/type.h
@@ -42,6 +42,7 @@ typedef struct typ
 	char			*name;
 	pfunc_t			pfunc;
 	const struct field	*fields;
+	const struct xfs_buf_ops *bops;
 } typ_t;
 extern const typ_t	*typtab, *cur_typ;
 
-- 
1.8.4.rc3


* [PATCH 22/36] db: indicate if the CRC on a buffer is correct or not
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (20 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 21/36] db: introduce verifier support into set_cur Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 23/36] db: verify and calculate inode CRCs Dave Chinner
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When dumping metadata that has a CRC in it, output not only the CRC
but text to tell us whether the value is correct or not. Hence we
can see at a glance if there's something wrong or not.

Do this by peeking at the buffer attached to the current IO
context. If there was a CRC error, then it will be marked with an
EFSCORRUPTED error. Use this to determine what to output.
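
A minimal sketch of the resulting output format, assuming the "%#x (%s)"
format string from the FLDT_CRC table entry. demo_format_crc() and its
crc_ok argument are hypothetical; crc_ok stands in for the result of
iocur_crc_valid().

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Format a CRC value using the same "%#x (%s)" layout as the FLDT_CRC
 * table entry. crc_ok is a hypothetical stand-in for iocur_crc_valid().
 */
static int
demo_format_crc(char *out, size_t outlen, uint32_t crc, int crc_ok)
{
	return snprintf(out, outlen, "%#x (%s)", (unsigned int)crc,
			crc_ok ? "correct" : "bad");
}
```

With a matching CRC of 0xdeadbeef this produces "0xdeadbeef (correct)";
with a mismatch the suffix is "(bad)".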

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/agf.c     |  2 +-
 db/agfl.c    |  2 +-
 db/agi.c     |  2 +-
 db/btblock.c | 10 +++++-----
 db/dir2.c    |  4 ++--
 db/dquot.c   |  2 +-
 db/field.c   |  5 +++++
 db/field.h   |  4 ++++
 db/fprint.c  | 39 +++++++++++++++++++++++++++++++++++++++
 db/fprint.h  |  2 ++
 db/inode.c   |  2 +-
 db/io.h      |  6 ++++++
 db/sb.c      |  2 +-
 db/symlink.c |  2 +-
 14 files changed, 70 insertions(+), 14 deletions(-)

diff --git a/db/agf.c b/db/agf.c
index 389cb43..d9a07ca 100644
--- a/db/agf.c
+++ b/db/agf.c
@@ -71,7 +71,7 @@ const field_t	agf_flds[] = {
 	{ "btreeblks", FLDT_UINT32D, OI(OFF(btreeblks)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE },
 	{ "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
 	{ NULL }
 };
 
diff --git a/db/agfl.c b/db/agfl.c
index e2340e6..b29538f 100644
--- a/db/agfl.c
+++ b/db/agfl.c
@@ -58,7 +58,7 @@ const field_t	agfl_crc_flds[] = {
 	{ "seqno", FLDT_AGNUMBER, OI(OFF(seqno)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE },
 	{ "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
 	{ "bno", FLDT_AGBLOCKNZ, OI(OFF(bno)), agfl_bno_size,
 	  FLD_ARRAY|FLD_COUNT, TYP_DATA },
 	{ NULL }
diff --git a/db/agi.c b/db/agi.c
index 6b2e889..398bdbb 100644
--- a/db/agi.c
+++ b/db/agi.c
@@ -56,7 +56,7 @@ const field_t	agi_flds[] = {
 	  CI(XFS_AGI_UNLINKED_BUCKETS), FLD_ARRAY, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE },
 	{ "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
 	{ NULL }
 };
 
diff --git a/db/btblock.c b/db/btblock.c
index 34188db..1ea0cff 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -295,7 +295,7 @@ const field_t	bmapbta_crc_flds[] = {
 	{ "lsn", FLDT_UINT64X, OI(OFF(u.l.bb_lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(u.l.bb_uuid)), C1, 0, TYP_NONE },
 	{ "owner", FLDT_INO, OI(OFF(u.l.bb_owner)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(u.l.bb_crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.l.bb_crc)), C1, 0, TYP_NONE },
 	{ "recs", FLDT_BMAPBTAREC, btblock_rec_offset, btblock_rec_count,
 	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "keys", FLDT_BMAPBTAKEY, btblock_key_offset, btblock_key_count,
@@ -314,7 +314,7 @@ const field_t	bmapbtd_crc_flds[] = {
 	{ "lsn", FLDT_UINT64X, OI(OFF(u.l.bb_lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(u.l.bb_uuid)), C1, 0, TYP_NONE },
 	{ "owner", FLDT_INO, OI(OFF(u.l.bb_owner)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(u.l.bb_crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.l.bb_crc)), C1, 0, TYP_NONE },
 	{ "recs", FLDT_BMAPBTDREC, btblock_rec_offset, btblock_rec_count,
 	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "keys", FLDT_BMAPBTDKEY, btblock_key_offset, btblock_key_count,
@@ -405,7 +405,7 @@ const field_t	inobt_crc_flds[] = {
 	{ "lsn", FLDT_UINT64X, OI(OFF(u.s.bb_lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(u.s.bb_uuid)), C1, 0, TYP_NONE },
 	{ "owner", FLDT_AGNUMBER, OI(OFF(u.s.bb_owner)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
 	{ "recs", FLDT_INOBTREC, btblock_rec_offset, btblock_rec_count,
 	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "keys", FLDT_INOBTKEY, btblock_key_offset, btblock_key_count,
@@ -471,7 +471,7 @@ const field_t	bnobt_crc_flds[] = {
 	{ "lsn", FLDT_UINT64X, OI(OFF(u.s.bb_lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(u.s.bb_uuid)), C1, 0, TYP_NONE },
 	{ "owner", FLDT_AGNUMBER, OI(OFF(u.s.bb_owner)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
 	{ "recs", FLDT_BNOBTREC, btblock_rec_offset, btblock_rec_count,
 	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "keys", FLDT_BNOBTKEY, btblock_key_offset, btblock_key_count,
@@ -533,7 +533,7 @@ const field_t	cntbt_crc_flds[] = {
 	{ "lsn", FLDT_UINT64X, OI(OFF(u.s.bb_lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(u.s.bb_uuid)), C1, 0, TYP_NONE },
 	{ "owner", FLDT_AGNUMBER, OI(OFF(u.s.bb_owner)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
 	{ "recs", FLDT_CNTBTREC, btblock_rec_offset, btblock_rec_count,
 	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "keys", FLDT_CNTBTKEY, btblock_key_offset, btblock_key_count,
diff --git a/db/dir2.c b/db/dir2.c
index 8b08d48..2ec64e0 100644
--- a/db/dir2.c
+++ b/db/dir2.c
@@ -922,7 +922,7 @@ const field_t	dir3_data_union_flds[] = {
 #define	DBH3OFF(f)	bitize(offsetof(struct xfs_dir3_blk_hdr, f))
 const field_t	dir3_blkhdr_flds[] = {
 	{ "magic", FLDT_UINT32X, OI(DBH3OFF(magic)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(DBH3OFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(DBH3OFF(crc)), C1, 0, TYP_NONE },
 	{ "bno", FLDT_DFSBNO, OI(DBH3OFF(blkno)), C1, 0, TYP_BMAPBTD },
 	{ "lsn", FLDT_UINT64X, OI(DBH3OFF(lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(DBH3OFF(uuid)), C1, 0, TYP_NONE },
@@ -959,7 +959,7 @@ const field_t	dir3_free_hdr_flds[] = {
 #define	DB3OFF(f)	bitize(offsetof(struct xfs_da3_blkinfo, f))
 const field_t	da3_blkinfo_flds[] = {
 	{ "hdr", FLDT_DA_BLKINFO, OI(DB3OFF(hdr)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(DB3OFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(DB3OFF(crc)), C1, 0, TYP_NONE },
 	{ "bno", FLDT_DFSBNO, OI(DB3OFF(blkno)), C1, 0, TYP_BMAPBTD },
 	{ "lsn", FLDT_UINT64X, OI(DB3OFF(lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(DB3OFF(uuid)), C1, 0, TYP_NONE },
diff --git a/db/dquot.c b/db/dquot.c
index 6927956..2f7d463 100644
--- a/db/dquot.c
+++ b/db/dquot.c
@@ -48,7 +48,7 @@ const field_t	dqblk_flds[] = {
 	{ "diskdq", FLDT_DISK_DQUOT, OI(DDOFF(diskdq)), C1, 0, TYP_NONE },
 	{ "fill", FLDT_CHARS, OI(DDOFF(fill)), CI(DDSZC(fill)), FLD_SKIPALL,
 	  TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(DDOFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(DDOFF(crc)), C1, 0, TYP_NONE },
 	{ "lsn", FLDT_UINT64X, OI(DDOFF(lsn)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(DDOFF(uuid)), C1, 0, TYP_NONE },
 	{ NULL }
diff --git a/db/field.c b/db/field.c
index c6d7404..4582097 100644
--- a/db/field.c
+++ b/db/field.c
@@ -163,6 +163,11 @@ const ftattr_t	ftattrtab[] = {
 	  0, fa_agblock, NULL },
 	{ FLDT_CNTBTREC, "cntbtrec", fp_sarray, (char *)cntbt_rec_flds,
 	  SI(bitsz(xfs_alloc_rec_t)), 0, NULL, cntbt_rec_flds },
+
+/* CRC field */
+	{ FLDT_CRC, "crc", fp_crc, "%#x (%s)", SI(bitsz(__uint32_t)),
+	  0, NULL, NULL },
+
 	{ FLDT_DEV, "dev", fp_num, "%#x", SI(bitsz(xfs_dev_t)), 0, NULL, NULL },
 	{ FLDT_DFILOFFA, "dfiloffa", fp_num, "%llu", SI(bitsz(xfs_dfiloff_t)),
 	  0, fa_dfiloffa, NULL },
diff --git a/db/field.h b/db/field.h
index aecdf9f..6343c9a 100644
--- a/db/field.h
+++ b/db/field.h
@@ -80,6 +80,10 @@ typedef enum fldt	{
 	FLDT_CNTBTKEY,
 	FLDT_CNTBTPTR,
 	FLDT_CNTBTREC,
+
+	/* CRC field type */
+	FLDT_CRC,
+
 	FLDT_DEV,
 	FLDT_DFILOFFA,
 	FLDT_DFILOFFD,
diff --git a/db/fprint.c b/db/fprint.c
index 1d2f29c..435d984 100644
--- a/db/fprint.c
+++ b/db/fprint.c
@@ -30,6 +30,7 @@
 #include "output.h"
 #include "sig.h"
 #include "malloc.h"
+#include "io.h"
 
 int
 fp_charns(
@@ -184,3 +185,41 @@ fp_uuid(
 	}
 	return 1;
 }
+
+/*
+ * CRC is correct if the current buffer it is being pulled out
+ * of is not marked with an EFSCORRUPTED error.
+ */
+int
+fp_crc(
+	void	*obj,
+	int	bit,
+	int	count,
+	char	*fmtstr,
+	int	size,
+	int	arg,
+	int	base,
+	int	array)
+{
+	int		bitpos;
+	int		i;
+	__int64_t	val;
+	char		*ok;
+
+	ok = iocur_crc_valid() ? "correct" : "bad";
+
+	for (i = 0, bitpos = bit;
+	     i < count && !seenint();
+	     i++, bitpos += size) {
+		if (array)
+			dbprintf("%d:", i + base);
+		val = getbitval(obj, bitpos, size, BVUNSIGNED);
+		if (size > 32)
+			dbprintf(fmtstr, val, ok);
+		else
+			dbprintf(fmtstr, (__int32_t)val, ok);
+		if (i < count - 1)
+			dbprintf(" ");
+	}
+	return 1;
+}
diff --git a/db/fprint.h b/db/fprint.h
index b032dbd..6a6d77e 100644
--- a/db/fprint.h
+++ b/db/fprint.h
@@ -29,3 +29,5 @@ extern int	fp_time(void *obj, int bit, int count, char *fmtstr, int size,
 			int arg, int base, int array);
 extern int	fp_uuid(void *obj, int bit, int count, char *fmtstr, int size,
 			int arg, int base, int array);
+extern int	fp_crc(void *obj, int bit, int count, char *fmtstr, int size,
+		       int arg, int base, int array);
diff --git a/db/inode.c b/db/inode.c
index 634dc30..ec533ee 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -168,7 +168,7 @@ const field_t	inode_core_flds[] = {
 };
 
 const field_t	inode_v3_flds[] = {
-	{ "crc", FLDT_UINT32X, OI(COFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(COFF(crc)), C1, 0, TYP_NONE },
 	{ "change_count", FLDT_UINT64D, OI(COFF(changecount)), C1, 0, TYP_NONE },
 	{ "lsn", FLDT_UINT64X, OI(COFF(lsn)), C1, 0, TYP_NONE },
 	{ "flags2", FLDT_UINT64X, OI(COFF(flags2)), C1, 0, TYP_NONE },
diff --git a/db/io.h b/db/io.h
index 2c47ccc..d647284 100644
--- a/db/io.h
+++ b/db/io.h
@@ -58,3 +58,9 @@ extern void     write_cur(void);
 extern void	set_cur(const struct typ *t, __int64_t d, int c, int ring_add,
 			bbmap_t *bbmap);
 extern void     ring_add(void);
+
+static inline bool
+iocur_crc_valid()
+{
+	return (iocur_top->bp && iocur_top->bp->b_error != EFSCORRUPTED);
+}
diff --git a/db/sb.c b/db/sb.c
index 4929152..6cb665d 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -118,7 +118,7 @@ const field_t	sb_flds[] = {
 		C1, 0, TYP_NONE },
 	{ "features_log_incompat", FLDT_UINT32X, OI(OFF(features_log_incompat)),
 		C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
 	{ "pquotino", FLDT_INO, OI(OFF(pquotino)), C1, 0, TYP_INODE },
 	{ "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE },
 	{ NULL }
diff --git a/db/symlink.c b/db/symlink.c
index 9f3d0b9..a4f420f 100644
--- a/db/symlink.c
+++ b/db/symlink.c
@@ -69,7 +69,7 @@ const struct field	symlink_crc_flds[] = {
 	{ "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
 	{ "offset", FLDT_UINT32D, OI(OFF(offset)), C1, 0, TYP_NONE },
 	{ "bytes", FLDT_UINT32D, OI(OFF(bytes)), C1, 0, TYP_NONE },
-	{ "crc", FLDT_UINT32X, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE },
 	{ "owner", FLDT_INO, OI(OFF(owner)), C1, 0, TYP_NONE },
 	{ "bno", FLDT_DFSBNO, OI(OFF(blkno)), C1, 0, TYP_BMAPBTD },
-- 
1.8.4.rc3


* [PATCH 23/36] db: verify and calculate inode CRCs
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (21 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 22/36] db: indicate if the CRC on a buffer is correct or not Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 24/36] db: verify and calculate dquot CRCs Dave Chinner
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we set the current IO cursor to point at an inode, verify that
the inode CRC is intact. And prior to writing such an IO cursor,
calculate the inode CRC.
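
The verify-on-set, recalculate-on-write flow can be sketched in isolation
as below. This is an illustration only: the demo_* types and the toy
checksum are hypothetical, and the real code uses libxfs_dinode_verify()
and libxfs_dinode_calc_crc() with CRC32c.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical on-disk object: payload followed by a checksum. */
struct demo_inode {
	uint8_t		payload[12];
	uint32_t	crc;
};

/* Hypothetical cursor carrying the two flags this patch adds. */
struct demo_cursor {
	struct demo_inode	*data;
	unsigned int		ino_buf:1;	/* cursor points at an inode */
	unsigned int		ino_crc_ok:1;	/* checksum matched at load */
};

/* Toy rolling checksum; NOT the CRC32c that XFS uses. */
static uint32_t
demo_cksum(const struct demo_inode *ip)
{
	uint32_t	sum = 0;
	size_t		i;

	for (i = 0; i < sizeof(ip->payload); i++)
		sum = sum * 31 + ip->payload[i];
	return sum;
}

/* Analogue of set_cur_inode(): remember whether the checksum was intact. */
static void
demo_set_cur_inode(struct demo_cursor *cur, struct demo_inode *ip)
{
	cur->data = ip;
	cur->ino_buf = 1;
	cur->ino_crc_ok = (demo_cksum(ip) == ip->crc);
}

/* Analogue of write_cur(): recompute the checksum before "writing". */
static void
demo_write_cur(struct demo_cursor *cur)
{
	if (cur->ino_buf)
		cur->data->crc = demo_cksum(cur->data);
}
```

The flag is needed because the buffer-level verifier sees a whole inode
cluster, while the cursor points at one inode inside it.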

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/inode.c             | 2 ++
 db/io.c                | 4 ++++
 db/io.h                | 6 +++++-
 include/libxfs.h       | 4 ++++
 libxfs/xfs_inode_buf.c | 8 ++++----
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/db/inode.c b/db/inode.c
index ec533ee..4090855 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -655,6 +655,8 @@ set_cur_inode(
 		blkbb, DB_RING_IGN, NULL);
 	off_cur(offset << mp->m_sb.sb_inodelog, mp->m_sb.sb_inodesize);
 	dip = iocur_top->data;
+	iocur_top->ino_crc_ok = libxfs_dinode_verify(mp, ino, dip);
+	iocur_top->ino_buf = 1;
 	iocur_top->ino = ino;
 	iocur_top->mode = be16_to_cpu(dip->di_mode);
 	if ((iocur_top->mode & S_IFMT) == S_IFDIR)
diff --git a/db/io.c b/db/io.c
index 2d1cc56..6e3282e 100644
--- a/db/io.c
+++ b/db/io.c
@@ -464,6 +464,9 @@ write_cur(void)
 		return;
 	}
 
+	if (iocur_top->ino_buf)
+		libxfs_dinode_calc_crc(mp, iocur_top->data);
+
 	if (iocur_top->bbmap)
 		write_cur_bbs();
 	else
@@ -536,6 +539,7 @@ set_cur(
 	iocur_top->ino = ino;
 	iocur_top->dirino = dirino;
 	iocur_top->mode = mode;
+	iocur_top->ino_buf = 0;
 
 	/* store location in ring */
 	if (ring_flag)
diff --git a/db/io.h b/db/io.h
index d647284..1f8270d 100644
--- a/db/io.h
+++ b/db/io.h
@@ -38,6 +38,8 @@ typedef struct iocur {
 	const struct typ	*typ;	/* type of "data" */
 	bbmap_t			*bbmap;	/* map daddr if fragmented */
 	struct xfs_buf		*bp;	/* underlying buffer */
+	int			ino_crc_ok:1;
+	int			ino_buf:1;
 } iocur_t;
 
 #define DB_RING_ADD 1                   /* add to ring on set_cur */
@@ -62,5 +64,7 @@ extern void     ring_add(void);
 static inline bool
 iocur_crc_valid()
 {
-	return (iocur_top->bp && iocur_top->bp->b_error != EFSCORRUPTED);
+	return (iocur_top->bp &&
+		iocur_top->bp->b_error != EFSCORRUPTED &&
+		(!iocur_top->ino_buf || iocur_top->ino_crc_ok));
 }
diff --git a/include/libxfs.h b/include/libxfs.h
index b097bd2..cbb5757 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -748,6 +748,10 @@ void	xfs_dinode_from_disk(struct xfs_icdinode *,
 #define libxfs_idata_realloc		xfs_idata_realloc
 #define libxfs_idestroy_fork		xfs_idestroy_fork
 
+#define libxfs_dinode_verify		xfs_dinode_verify
+bool xfs_dinode_verify(struct xfs_mount *mp, xfs_ino_t ino,
+		       struct xfs_dinode *dip);
+
 /* xfs_sb.h */
 #define libxfs_mod_sb			xfs_mod_sb
 #define libxfs_sb_from_disk		xfs_sb_from_disk
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index b796556..728ef71 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -276,10 +276,10 @@ xfs_dinode_to_disk(
 	}
 }
 
-static bool
+bool
 xfs_dinode_verify(
 	struct xfs_mount	*mp,
-	struct xfs_inode	*ip,
+	xfs_ino_t		ino,
 	struct xfs_dinode	*dip)
 {
 	if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
@@ -294,7 +294,7 @@ xfs_dinode_verify(
 	if (!xfs_verify_cksum((char *)dip, mp->m_sb.sb_inodesize,
 			      offsetof(struct xfs_dinode, di_crc)))
 		return false;
-	if (be64_to_cpu(dip->di_ino) != ip->i_ino)
+	if (be64_to_cpu(dip->di_ino) != ino)
 		return false;
 	if (!uuid_equal(&dip->di_uuid, &mp->m_sb.sb_uuid))
 		return false;
@@ -346,7 +346,7 @@ xfs_iread(
 		return error;
 
 	/* even unallocated inodes are verified */
-	if (!xfs_dinode_verify(mp, ip, dip)) {
+	if (!xfs_dinode_verify(mp, ip->i_ino, dip)) {
 		xfs_alert(mp, "%s: validation failed for inode %lld failed",
 				__func__, ip->i_ino);
 
-- 
1.8.4.rc3


* [PATCH 24/36] db: verify and calculate dquot CRCs
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (22 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 23/36] db: verify and calculate inode CRCs Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13 16:05   ` Christoph Hellwig
  2013-11-13  6:40 ` [PATCH 25/36] db: add a special directory buffer verifier Dave Chinner
                   ` (12 subsequent siblings)
  36 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

When we set the current IO cursor to point at a dquot block, verify
that the dquot CRC is intact. And prior to writing such an IO
cursor, calculate the dquot CRC.
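
For readers unfamiliar with offset-based checksum helpers, the sketch below
shows the general shape of "update the checksum stored at a given offset
inside the buffer it covers". It is an assumption-laden illustration: the
demo_* names, the block layout and the toy checksum are all hypothetical,
and this is not the xfs_update_cksum() implementation.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block layout; the real struct xfs_dqblk differs. */
struct demo_dqblk {
	uint8_t		diskdq[16];
	uint32_t	crc;
	uint8_t		tail[8];
};

#define DEMO_CRC_OFF	offsetof(struct demo_dqblk, crc)

/*
 * Toy checksum over buf that treats the 4 bytes at cksum_off as zero,
 * so the stored checksum never feeds into itself. NOT CRC32c.
 */
static uint32_t
demo_calc_cksum(const uint8_t *buf, size_t len, size_t cksum_off)
{
	uint32_t	sum = 0;
	size_t		i;

	for (i = 0; i < len; i++) {
		uint8_t	b = buf[i];

		if (i >= cksum_off && i < cksum_off + sizeof(uint32_t))
			b = 0;
		sum = sum * 31 + b;
	}
	return sum;
}

/* Store the freshly computed checksum at cksum_off. */
static void
demo_update_cksum(void *buf, size_t len, size_t cksum_off)
{
	uint32_t	sum = demo_calc_cksum(buf, len, cksum_off);

	memcpy((uint8_t *)buf + cksum_off, &sum, sizeof(sum));
}

/* Return 1 if the stored checksum matches the buffer contents. */
static int
demo_verify_cksum(const void *buf, size_t len, size_t cksum_off)
{
	uint32_t	stored;

	memcpy(&stored, (const uint8_t *)buf + cksum_off, sizeof(stored));
	return stored == demo_calc_cksum(buf, len, cksum_off);
}
```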

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 db/io.c | 5 ++++-
 db/io.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/db/io.c b/db/io.c
index 6e3282e..123214d 100644
--- a/db/io.c
+++ b/db/io.c
@@ -466,7 +466,9 @@ write_cur(void)
 
 	if (iocur_top->ino_buf)
 		libxfs_dinode_calc_crc(mp, iocur_top->data);
-
+	if (iocur_top->dquot_buf)
+		xfs_update_cksum(iocur_top->data, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
 	if (iocur_top->bbmap)
 		write_cur_bbs();
 	else
@@ -540,6 +542,7 @@ set_cur(
 	iocur_top->dirino = dirino;
 	iocur_top->mode = mode;
 	iocur_top->ino_buf = 0;
+	iocur_top->dquot_buf = 0;
 
 	/* store location in ring */
 	if (ring_flag)
diff --git a/db/io.h b/db/io.h
index 1f8270d..4f24c83 100644
--- a/db/io.h
+++ b/db/io.h
@@ -40,6 +40,7 @@ typedef struct iocur {
 	struct xfs_buf		*bp;	/* underlying buffer */
 	int			ino_crc_ok:1;
 	int			ino_buf:1;
+	int			dquot_buf:1;
 } iocur_t;
 
 #define DB_RING_ADD 1                   /* add to ring on set_cur */
-- 
1.8.4.rc3


* [PATCH 25/36] db: add a special directory buffer verifier
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (23 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 24/36] db: verify and calculate dquot CRCs Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 26/36] db: add a special attribute " Dave Chinner
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Because we only have a single directory type that is used for all
the different buffer types, we need to provide a special verifier
for the read code. That verifier needs to know all the directory
types and, when it finds one it knows about, switch to the correct
verifier and call it.

We already do this for certain readahead cases in the directory
code, so there is precedent for this. If we don't find a magic
number we recognise, the verifier fails...
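
As a rough illustration of the dispatch described above, here is a
standalone sketch that classifies a raw buffer by magic number. The
demo_* names are hypothetical. The magic values and the offset of the
16-bit magic (after two 32-bit sibling pointers) are assumed to match
the XFS on-disk headers and should be checked against them before
being relied upon.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_dir_type {
	DEMO_DIR_UNKNOWN,
	DEMO_DIR_BLOCK,
	DEMO_DIR_DATA,
	DEMO_DIR_FREE,
	DEMO_DIR_LEAF1,
	DEMO_DIR_LEAFN,
	DEMO_DA_NODE,
};

/* Assumed magic values; illustrative only. */
#define DEMO_DIR3_BLOCK_MAGIC	0x58444233u	/* "XDB3" */
#define DEMO_DIR3_DATA_MAGIC	0x58444433u	/* "XDD3" */
#define DEMO_DIR3_FREE_MAGIC	0x58444633u	/* "XDF3" */
#define DEMO_DIR3_LEAF1_MAGIC	0x3df1u
#define DEMO_DIR3_LEAFN_MAGIC	0x3dffu
#define DEMO_DA3_NODE_MAGIC	0x3ebeu
#define DEMO_BLKINFO_MAGIC_OFF	8	/* after forw and back pointers */

static uint32_t
demo_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t
demo_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Try the 32-bit magics at offset 0 first, then the 16-bit magic in
 * the da block header. buf must hold at least 10 bytes.
 */
enum demo_dir_type
demo_dir_classify(const uint8_t *buf)
{
	switch (demo_get_be32(buf)) {
	case DEMO_DIR3_BLOCK_MAGIC:
		return DEMO_DIR_BLOCK;
	case DEMO_DIR3_DATA_MAGIC:
		return DEMO_DIR_DATA;
	case DEMO_DIR3_FREE_MAGIC:
		return DEMO_DIR_FREE;
	default:
		break;
	}

	switch (demo_get_be16(buf + DEMO_BLKINFO_MAGIC_OFF)) {
	case DEMO_DIR3_LEAF1_MAGIC:
		return DEMO_DIR_LEAF1;
	case DEMO_DIR3_LEAFN_MAGIC:
		return DEMO_DIR_LEAFN;
	case DEMO_DA3_NODE_MAGIC:
		return DEMO_DA_NODE;
	default:
		return DEMO_DIR_UNKNOWN;
	}
}
```

In the patch the equivalent of each return value is an assignment to
bp->b_ops followed by a call to the selected read verifier, and the
unknown case marks the buffer as corrupted instead.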

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/dir2.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/dir2.h |  2 ++
 db/type.c |  3 ++-
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/db/dir2.c b/db/dir2.c
index 2ec64e0..5a10955 100644
--- a/db/dir2.c
+++ b/db/dir2.c
@@ -24,6 +24,7 @@
 #include "field.h"
 #include "dir2.h"
 #include "init.h"
+#include "output.h"
 
 static int	dir2_block_hdr_count(void *obj, int startoff);
 static int	dir2_block_leaf_count(void *obj, int startoff);
@@ -975,3 +976,63 @@ const field_t	da3_node_hdr_flds[] = {
 	{ "pad", FLDT_UINT32D, OI(H3OFF(__pad32)), C1, 0, TYP_NONE },
 	{ NULL }
 };
+
+/*
+ * Special read verifier for directory buffers. detect the magic number
+ * appropriately and set the correct verifier and call it.
+ */
+static void
+xfs_dir3_db_read_verify(
+	struct xfs_buf		*bp)
+{
+	__be32			magic32;
+	__be16			magic16;
+
+	magic32 = *(__be32 *)bp->b_addr;
+	magic16 = ((struct xfs_da_blkinfo *)bp->b_addr)->magic;
+
+	switch (magic32) {
+	case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
+		bp->b_ops = &xfs_dir3_block_buf_ops;
+		goto verify;
+	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
+		bp->b_ops = &xfs_dir3_data_buf_ops;
+		goto verify;
+	case cpu_to_be32(XFS_DIR3_FREE_MAGIC):
+		bp->b_ops = &xfs_dir3_free_buf_ops;
+		goto verify;
+	default:
+		break;
+	}
+
+	switch (magic16) {
+	case cpu_to_be16(XFS_DIR3_LEAF1_MAGIC):
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		break;
+	case cpu_to_be16(XFS_DIR3_LEAFN_MAGIC):
+		bp->b_ops = &xfs_dir3_leafn_buf_ops;
+		break;
+	case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		break;
+	default:
+		dbprintf(_("Unknown directory buffer type!\n"));
+		xfs_buf_ioerror(bp, EFSCORRUPTED);
+		return;
+	}
+verify:
+	bp->b_ops->verify_read(bp);
+}
+
+static void
+xfs_dir3_db_write_verify(
+	struct xfs_buf		*bp)
+{
+	dbprintf(_("Writing unknown directory buffer type!\n"));
+	xfs_buf_ioerror(bp, EFSCORRUPTED);
+}
+
+const struct xfs_buf_ops xfs_dir3_db_buf_ops = {
+	.verify_read = xfs_dir3_db_read_verify,
+	.verify_write = xfs_dir3_db_write_verify,
+};
diff --git a/db/dir2.h b/db/dir2.h
index b3651d5..5054493 100644
--- a/db/dir2.h
+++ b/db/dir2.h
@@ -60,3 +60,5 @@ static inline xfs_dir2_inou_t *xfs_dir2_sf_inumberp(xfs_dir2_sf_entry_t *sfep)
 
 extern int	dir2_data_union_size(void *obj, int startoff, int idx);
 extern int	dir2_size(void *obj, int startoff, int idx);
+
+extern const struct xfs_buf_ops xfs_dir3_db_buf_ops;
diff --git a/db/type.c b/db/type.c
index b3f3d87..2c3431e 100644
--- a/db/type.c
+++ b/db/type.c
@@ -87,7 +87,8 @@ static const typ_t	__typtab_crc[] = {
 	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_crc_hfld,
 		&xfs_allocbt_buf_ops },
 	{ TYP_DATA, "data", handle_block, NULL, NULL },
-	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld, NULL },
+	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
+		&xfs_dir3_db_buf_ops },
 	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld,
 		&xfs_dquot_buf_ops },
 	{ TYP_INOBT, "inobt", handle_struct, inobt_crc_hfld,
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 26/36] db: add a special attribute buffer verifier
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (24 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 25/36] db: add a special directory buffer verifier Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 27/36] db: re-enable write support for v5 filesystems Dave Chinner
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Because we only have a single attribute type that is used for all
the attribute buffer types, we need to provide a special verifier
for the read code. That verifier needs to know all the attribute
types and, when it finds one it knows about, switch to the correct
verifier and call it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/attr.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 db/attr.h |  2 ++
 db/type.c |  3 ++-
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/db/attr.c b/db/attr.c
index cd95a0a..359af7b 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -25,6 +25,7 @@
 #include "attr.h"
 #include "io.h"
 #include "init.h"
+#include "output.h"
 
 static int	attr_leaf_entries_count(void *obj, int startoff);
 static int	attr_leaf_hdr_count(void *obj, int startoff);
@@ -522,3 +523,53 @@ const field_t	attr3_leaf_hdr_flds[] = {
 	{ NULL }
 };
 
+/*
+ * Special read verifier for attribute buffers. detect the magic number
+ * appropriately and set the correct verifier and call it.
+ */
+static void
+xfs_attr3_db_read_verify(
+	struct xfs_buf		*bp)
+{
+	__be32			magic32;
+	__be16			magic16;
+
+	magic32 = *(__be32 *)bp->b_addr;
+	magic16 = ((struct xfs_da_blkinfo *)bp->b_addr)->magic;
+
+	switch (magic16) {
+	case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
+		bp->b_ops = &xfs_attr3_leaf_buf_ops;
+		goto verify;
+	case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		goto verify;
+	default:
+		break;
+	}
+
+	switch (magic32) {
+	case cpu_to_be32(XFS_ATTR3_RMT_MAGIC):
+		bp->b_ops = &xfs_attr3_rmt_buf_ops;
+		break;
+	default:
+		dbprintf(_("Unknown attribute buffer type!\n"));
+		xfs_buf_ioerror(bp, EFSCORRUPTED);
+		return;
+	}
+verify:
+	bp->b_ops->verify_read(bp);
+}
+
+static void
+xfs_attr3_db_write_verify(
+	struct xfs_buf		*bp)
+{
+	dbprintf(_("Writing unknown attribute buffer type!\n"));
+	xfs_buf_ioerror(bp, EFSCORRUPTED);
+}
+
+const struct xfs_buf_ops xfs_attr3_db_buf_ops = {
+	.verify_read = xfs_attr3_db_read_verify,
+	.verify_write = xfs_attr3_db_write_verify,
+};
diff --git a/db/attr.h b/db/attr.h
index 3065372..bc3431f 100644
--- a/db/attr.h
+++ b/db/attr.h
@@ -33,3 +33,5 @@ extern const field_t	attr3_node_hdr_flds[];
 
 extern int	attr_leaf_name_size(void *obj, int startoff, int idx);
 extern int	attr_size(void *obj, int startoff, int idx);
+
+extern const struct xfs_buf_ops xfs_attr3_db_buf_ops;
diff --git a/db/type.c b/db/type.c
index 2c3431e..04d0d56 100644
--- a/db/type.c
+++ b/db/type.c
@@ -77,7 +77,8 @@ static const typ_t	__typtab_crc[] = {
 	{ TYP_AGF, "agf", handle_struct, agf_hfld, &xfs_agf_buf_ops },
 	{ TYP_AGFL, "agfl", handle_struct, agfl_crc_hfld, &xfs_agfl_buf_ops },
 	{ TYP_AGI, "agi", handle_struct, agi_hfld, &xfs_agfl_buf_ops },
-	{ TYP_ATTR, "attr3", handle_struct, attr3_hfld, NULL },
+	{ TYP_ATTR, "attr3", handle_struct, attr3_hfld,
+		&xfs_attr3_db_buf_ops },
 	{ TYP_BMAPBTA, "bmapbta", handle_struct, bmapbta_crc_hfld,
 		&xfs_bmbt_buf_ops },
 	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_crc_hfld,
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 27/36] db: re-enable write support for v5 filesystems.
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (25 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 26/36] db: add a special attribute " Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 28/36] xfs_db: use inode cluster buffers for inode IO Dave Chinner
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

As we can now verify and recalculate CRCs on IO, we can modify the
on-disk structures without corrupting the filesystem. This makes it
safe to turn write support on for v5 filesystems for the first time.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/init.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/db/init.c b/db/init.c
index 2dc7c87..25108ad 100644
--- a/db/init.c
+++ b/db/init.c
@@ -143,21 +143,6 @@ init(
 			exit(EXIT_FAILURE);
 	}
 
-	/*
-	 * Don't allow modifications to CRC enabled filesystems until we support
-	 * CRC recalculation in the IO path. Unless, of course, the user is in
-	 * the process of hitting us with a big hammer.
-	 */
-	if (XFS_SB_VERSION_NUM(sbp) >= XFS_SB_VERSION_5 &&
-	    !(x.isreadonly & LIBXFS_ISREADONLY)) {
-		fprintf(stderr, 
-	_("%s: modifications to %s are not supported in thi version.\n"
-	"Use \"-r\" to run %s in read-only mode on this filesystem .\n"),
-			progname, fsdevice, progname);
-		if (!force)
-			exit(EXIT_FAILURE);
-	}
-
 	mp = libxfs_mount(&xmount, sbp, x.ddev, x.logdev, x.rtdev,
 			  LIBXFS_MOUNT_DEBUGGER);
 	if (!mp) {
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 28/36] xfs_db: use inode cluster buffers for inode IO
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (26 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 27/36] db: re-enable write support for v5 filesystems Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 29/36] xfs_db: avoid libxfs buffer lookup warnings Dave Chinner
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we mount the filesystem inside xfs_db, libxfs is tasked with
reading some information from disk, such as root inodes. Because
libxfs does this inode reading, it uses inode cluster buffers to
read the inodes. xfs_db, OTOH, just uses FSB sized buffers to read
inodes, and hence xfs_db throws a warning when reading the root
inode block like so:

$ sudo xfs_db -c "sb 0" -c "p rootino" -c "inode 32" /dev/vda
Version 5 superblock detected. xfsprogs has EXPERIMENTAL support enabled!
Use of these features is at your own risk!
rootino = 32
7f59f20e6740: Badness in key lookup (length)
bp=(bno 0x20, len 8192 bytes) key=(bno 0x20, len 1024 bytes)
$

There is another way this can happen, and that is dumping raw data
from disk using either the "fsb NNN" or "daddr MMM" commands to dump
untyped information. This is always read in sector or filesystem
block units, and so will cause similar badness warnings.

To avoid this problem when reading inodes, teach xfs_db to read
inode clusters rather than individual filesystem blocks when asked to
read an inode.
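
The block-to-cluster mapping can be worked through in isolation. The
sketch below mirrors the arithmetic in this patch but uses hypothetical
demo_* types instead of struct xfs_mount, so the field names are
assumptions. With 1024 byte blocks, 8192 byte clusters and 512 byte
inodes (the geometry implied by the warning above), each cluster spans
8 blocks and the alignment mask is 7.

```c
#include <stdint.h>

/* Hypothetical subset of the mount geometry. */
struct demo_geom {
	uint32_t	blocklog;	/* log2 of the block size */
	uint32_t	inopblock;	/* inodes per block */
	uint32_t	cluster_size;	/* inode cluster size, bytes */
	uint32_t	inoalign_mask;	/* chunk alignment in blocks - 1 */
};

struct demo_imap {
	uint32_t	cluster_agbno;	/* first block of the buffer */
	uint32_t	nblocks;	/* buffer length in blocks */
	uint32_t	offset;		/* inode index within the buffer */
};

/*
 * Map the AG block holding an inode, and the inode's index within
 * that block, to the enclosing cluster buffer.
 */
struct demo_imap
demo_map_inode(const struct demo_geom *g, uint32_t agbno, uint32_t offset)
{
	struct demo_imap	m = { agbno, 1, offset };

	if (g->cluster_size > (1u << g->blocklog) && g->inoalign_mask) {
		uint32_t	bpc = g->cluster_size >> g->blocklog;
		uint32_t	offset_agbno = agbno & g->inoalign_mask;
		uint32_t	chunk_agbno = agbno - offset_agbno;

		/* round down to a cluster boundary within the chunk */
		m.cluster_agbno = chunk_agbno + (offset_agbno / bpc) * bpc;
		/* the inode index is now relative to the cluster start */
		m.offset += (agbno - m.cluster_agbno) * g->inopblock;
		m.nblocks = bpc;
	}
	return m;
}
```

For example, an inode at index 1 of AG block 21 in the geometry above
maps to the cluster starting at block 16, at inode index 11 of an 8
block buffer. When the cluster is no larger than a block, the mapping
is left untouched.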

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/inode.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/db/inode.c b/db/inode.c
index 4090855..24170ba 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -623,6 +623,14 @@ inode_u_symlink_count(
 		(int)be64_to_cpu(dip->di_size) : 0;
 }
 
+/*
+ * We are now using libxfs for our IO backend, so we should always try to use
+ * inode cluster buffers rather than filesystem block sized buffers for reading
+ * inodes. This means that we always use the same buffers as libxfs operations
+ * does, and that avoids buffer cache issues caused by overlapping buffers. This
+ * can be seen clearly when trying to read the root inode. Much of this logic is
+ * similar to libxfs_imap().
+ */
 void
 set_cur_inode(
 	xfs_ino_t	ino)
@@ -632,6 +640,9 @@ set_cur_inode(
 	xfs_agnumber_t	agno;
 	xfs_dinode_t	*dip;
 	int		offset;
+	int		numblks = blkbb;
+	xfs_agblock_t	cluster_agbno;
+
 
 	agno = XFS_INO_TO_AGNO(mp, ino);
 	agino = XFS_INO_TO_AGINO(mp, ino);
@@ -644,6 +655,24 @@ set_cur_inode(
 		return;
 	}
 	cur_agno = agno;
+
+	if (mp->m_inode_cluster_size > mp->m_sb.sb_blocksize &&
+	    mp->m_inoalign_mask) {
+		xfs_agblock_t	chunk_agbno;
+		xfs_agblock_t	offset_agbno;
+		int		blks_per_cluster;
+
+		blks_per_cluster = mp->m_inode_cluster_size >>
+							mp->m_sb.sb_blocklog;
+		offset_agbno = agbno & mp->m_inoalign_mask;
+		chunk_agbno = agbno - offset_agbno;
+		cluster_agbno = chunk_agbno +
+			((offset_agbno / blks_per_cluster) * blks_per_cluster);
+		offset += ((agbno - cluster_agbno) * mp->m_sb.sb_inopblock);
+		numblks = XFS_FSB_TO_BB(mp, blks_per_cluster);
+	} else
+		cluster_agbno = agbno;
+
 	/*
 	 * First set_cur to the block with the inode
 	 * then use off_cur to get the right part of the buffer.
@@ -651,8 +680,8 @@ set_cur_inode(
 	ASSERT(typtab[TYP_INODE].typnm == TYP_INODE);
 
 	/* ingore ring update here, do it explicitly below */
-	set_cur(&typtab[TYP_INODE], XFS_AGB_TO_DADDR(mp, agno, agbno),
-		blkbb, DB_RING_IGN, NULL);
+	set_cur(&typtab[TYP_INODE], XFS_AGB_TO_DADDR(mp, agno, cluster_agbno),
+		numblks, DB_RING_IGN, NULL);
 	off_cur(offset << mp->m_sb.sb_inodelog, mp->m_sb.sb_inodesize);
 	dip = iocur_top->data;
 	iocur_top->ino_crc_ok = libxfs_dinode_verify(mp, ino, dip);
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 29/36] xfs_db: avoid libxfs buffer lookup warnings
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (27 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 28/36] xfs_db: use inode cluster buffers for inode IO Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 30/36] libxfs: work around do_div() not handling 32 bit numerators Dave Chinner
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_db is unique in the way it can read the same blocks with
different lengths from disk, so we really need a way to avoid having
duplicate buffers in the cache. To handle this in a generic way,
introduce a "purge on compare failure" feature to libxfs.

What this feature does is instead of throwing a warning when a
buffer miscompare occurs (e.g. due to a length mismatch), it purges
the buffer that is in cache from the cache. We can do this safely in
the context of xfs_db because it always writes back changes made to
buffers before it releases the reference to the buffer. Hence we can
purge buffers directly from the lookup code without having to worry
about whether they are dirty or not.

Doing this purge on miscompare operation avoids the
problem that libxfs is currently warning about, and hence if the
feature flag is set then we don't need to warn about miscompares any
more. Hence the whole problem goes away entirely for xfs_db, without
affecting any of the other users of libxfs based IO.
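
To make the three-way compare result concrete, here is a much reduced
sketch that uses a flat array in place of the hash chains, with no
locking. All demo_* names are hypothetical, and the reference count
check stands in for the cn_count test in the real purge helper.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MISCOMPARE_PURGE	(1 << 0)

enum demo_cmp {
	DEMO_HIT,
	DEMO_MISS,
	DEMO_PURGE,
};

struct demo_buf {
	uint64_t	blkno;
	uint32_t	len;		/* bytes */
	int		in_use;
	int		refcount;
};

/* Same block with a different length is a purge candidate. */
enum demo_cmp
demo_compare(const struct demo_buf *bp, uint64_t blkno, uint32_t len)
{
	if (bp->blkno != blkno)
		return DEMO_MISS;
	return bp->len == len ? DEMO_HIT : DEMO_PURGE;
}

/*
 * Returns the slot index on a hit, -1 on a miss. Stale, unreferenced
 * entries are dropped along the way if the cache was set up with
 * DEMO_MISCOMPARE_PURGE; *purged reports how many were dropped.
 */
int
demo_lookup(struct demo_buf *slots, size_t n, int flags,
	    uint64_t blkno, uint32_t len, int *purged)
{
	size_t	i;

	*purged = 0;
	for (i = 0; i < n; i++) {
		if (!slots[i].in_use)
			continue;
		switch (demo_compare(&slots[i], blkno, len)) {
		case DEMO_HIT:
			return (int)i;
		case DEMO_PURGE:
			if ((flags & DEMO_MISCOMPARE_PURGE) &&
			    slots[i].refcount == 0) {
				slots[i].in_use = 0;
				(*purged)++;
			}
			break;
		case DEMO_MISS:
			break;
		}
	}
	return -1;
}
```

Without the flag, a length mismatch behaves like a plain miss and the
stale entry stays in place, which is the situation the old warning
reported.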

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/init.c           |  1 +
 include/cache.h     | 22 +++++++++++++-
 include/libxfs.h    |  2 ++
 libxfs/cache.c      | 83 +++++++++++++++++++++++++++++++++++++++--------------
 libxfs/init.c       |  3 +-
 libxfs/rdwr.c       | 28 ++++++++++--------
 repair/xfs_repair.c |  2 +-
 7 files changed, 105 insertions(+), 36 deletions(-)

diff --git a/db/init.c b/db/init.c
index 25108ad..8f86f45 100644
--- a/db/init.c
+++ b/db/init.c
@@ -109,6 +109,7 @@ init(
 	else
 		x.dname = fsdevice;
 
+	x.bcache_flags = CACHE_MISCOMPARE_PURGE;
 	if (!libxfs_init(&x)) {
 		fputs(_("\nfatal error -- couldn't initialize XFS library\n"),
 			stderr);
diff --git a/include/cache.h b/include/cache.h
index 0c0a1c5..c5757d0 100644
--- a/include/cache.h
+++ b/include/cache.h
@@ -18,6 +18,25 @@
 #ifndef __CACHE_H__
 #define __CACHE_H__
 
+/*
+ * initialisation flags
+ */
+/*
+ * xfs_db always writes changes immediately, and so we need to purge buffers
+ * when we get a buffer lookup mismatch due to reading the same block with a
+ * different buffer configuration.
+ */
+#define CACHE_MISCOMPARE_PURGE	(1 << 0)
+
+/*
+ * cache object compare return values
+ */
+enum {
+	CACHE_HIT,
+	CACHE_MISS,
+	CACHE_PURGE,
+};
+
 #define	HASH_CACHE_RATIO	8
 
 /*
@@ -82,6 +101,7 @@ struct cache_node {
 };
 
 struct cache {
+	int			c_flags;	/* behavioural flags */
 	unsigned int		c_maxcount;	/* max cache nodes */
 	unsigned int		c_count;	/* count of nodes */
 	pthread_mutex_t		c_mutex;	/* node count mutex */
@@ -99,7 +119,7 @@ struct cache {
 	unsigned int 		c_max;		/* max nodes ever used */
 };
 
-struct cache *cache_init(unsigned int, struct cache_operations *);
+struct cache *cache_init(int, unsigned int, struct cache_operations *);
 void cache_destroy(struct cache *);
 void cache_walk(struct cache *, cache_walk_t);
 void cache_purge(struct cache *);
diff --git a/include/libxfs.h b/include/libxfs.h
index cbb5757..40a950e 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -110,6 +110,8 @@ typedef struct {
 	int             dfd;            /* data subvolume file descriptor */
 	int             logfd;          /* log subvolume file descriptor */
 	int             rtfd;           /* realtime subvolume file descriptor */
+	int		icache_flags;	/* cache init flags */
+	int		bcache_flags;	/* cache init flags */
 } libxfs_init_t;
 
 #define LIBXFS_EXIT_ON_FAILURE	0x0001	/* exit the program if a call fails */
diff --git a/libxfs/cache.c b/libxfs/cache.c
index 56b24e7..84d2860 100644
--- a/libxfs/cache.c
+++ b/libxfs/cache.c
@@ -38,6 +38,7 @@ static unsigned int cache_generic_bulkrelse(struct cache *, struct list_head *);
 
 struct cache *
 cache_init(
+	int			flags,
 	unsigned int		hashsize,
 	struct cache_operations	*cache_operations)
 {
@@ -53,6 +54,7 @@ cache_init(
 		return NULL;
 	}
 
+	cache->c_flags = flags;
 	cache->c_count = 0;
 	cache->c_max = 0;
 	cache->c_hits = 0;
@@ -289,6 +291,34 @@ cache_overflowed(
 	return (cache->c_maxcount == cache->c_max);
 }
 
+
+static int
+__cache_node_purge(
+	struct cache *		cache,
+	struct cache_node *	node)
+{
+	int			count;
+	struct cache_mru *	mru;
+
+	pthread_mutex_lock(&node->cn_mutex);
+	count = node->cn_count;
+	if (count != 0) {
+		pthread_mutex_unlock(&node->cn_mutex);
+		return count;
+	}
+	mru = &cache->c_mrus[node->cn_priority];
+	pthread_mutex_lock(&mru->cm_mutex);
+	list_del_init(&node->cn_mru);
+	mru->cm_count--;
+	pthread_mutex_unlock(&mru->cm_mutex);
+
+	pthread_mutex_unlock(&node->cn_mutex);
+	pthread_mutex_destroy(&node->cn_mutex);
+	list_del_init(&node->cn_hash);
+	cache->relse(node);
+	return count;
+}
+
 /*
  * Lookup in the cache hash table.  With any luck we'll get a cache
  * hit, in which case this will all be over quickly and painlessly.
@@ -308,8 +338,10 @@ cache_node_get(
 	struct cache_mru *	mru;
 	struct list_head *	head;
 	struct list_head *	pos;
+	struct list_head *	n;
 	unsigned int		hashidx;
 	int			priority = 0;
+	int			purged = 0;
 
 	hashidx = cache->hash(key, cache->c_hashsize);
 	hash = cache->c_hash + hashidx;
@@ -317,10 +349,26 @@ cache_node_get(
 
 	for (;;) {
 		pthread_mutex_lock(&hash->ch_mutex);
-		for (pos = head->next; pos != head; pos = pos->next) {
+		for (pos = head->next, n = pos->next; pos != head;
+						pos = n, n = pos->next) {
+			int result;
+
 			node = list_entry(pos, struct cache_node, cn_hash);
-			if (!cache->compare(node, key))
-				continue;
+			result = cache->compare(node, key);
+			switch (result) {
+			case CACHE_HIT:
+				break;
+			case CACHE_PURGE:
+				if ((cache->c_flags & CACHE_MISCOMPARE_PURGE) &&
+				    !__cache_node_purge(cache, node)) {
+					purged++;
+					hash->ch_count--;
+				}
+				/* FALL THROUGH */
+			case CACHE_MISS:
+				goto next_object;
+			}
+
 			/*
 			 * node found, bump node's reference count, remove it
 			 * from its MRU list, and update stats.
@@ -347,6 +395,8 @@ cache_node_get(
 
 			*nodep = node;
 			return 0;
+next_object:
+			continue;	/* what the hell, gcc? */
 		}
 		pthread_mutex_unlock(&hash->ch_mutex);
 		/*
@@ -375,6 +425,12 @@ cache_node_get(
 	list_add(&node->cn_hash, &hash->ch_list);
 	pthread_mutex_unlock(&hash->ch_mutex);
 
+	if (purged) {
+		pthread_mutex_lock(&cache->c_mutex);
+		cache->c_count -= purged;
+		pthread_mutex_unlock(&cache->c_mutex);
+	}
+
 	*nodep = node;
 	return 1;
 }
@@ -457,7 +513,6 @@ cache_node_purge(
 	struct list_head *	pos;
 	struct list_head *	n;
 	struct cache_hash *	hash;
-	struct cache_mru *	mru;
 	int			count = -1;
 
 	hash = cache->c_hash + cache->hash(key, cache->c_hashsize);
@@ -468,23 +523,9 @@ cache_node_purge(
 		if ((struct cache_node *)pos != node)
 			continue;
 
-		pthread_mutex_lock(&node->cn_mutex);
-		count = node->cn_count;
-		if (count != 0) {
-			pthread_mutex_unlock(&node->cn_mutex);
-			break;
-		}
-		mru = &cache->c_mrus[node->cn_priority];
-		pthread_mutex_lock(&mru->cm_mutex);
-		list_del_init(&node->cn_mru);
-		mru->cm_count--;
-		pthread_mutex_unlock(&mru->cm_mutex);
-
-		pthread_mutex_unlock(&node->cn_mutex);
-		pthread_mutex_destroy(&node->cn_mutex);
-		list_del_init(&node->cn_hash);
-		hash->ch_count--;
-		cache->relse(node);
+		count = __cache_node_purge(cache, node);
+		if (!count)
+			hash->ch_count--;
 		break;
 	}
 	pthread_mutex_unlock(&hash->ch_mutex);
diff --git a/libxfs/init.c b/libxfs/init.c
index 9a3cf22..0924948 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -334,7 +334,8 @@ libxfs_init(libxfs_init_t *a)
 		chdir(curdir);
 	if (!libxfs_bhash_size)
 		libxfs_bhash_size = LIBXFS_BHASHSIZE(sbp);
-	libxfs_bcache = cache_init(libxfs_bhash_size, &libxfs_bcache_operations);
+	libxfs_bcache = cache_init(a->bcache_flags, libxfs_bhash_size,
+				   &libxfs_bcache_operations);
 	use_xfs_buf_lock = a->usebuflock;
 	manage_zones(0);
 	rval = 1;
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 7eaea0a..0aa2eba 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -323,20 +323,24 @@ libxfs_bcompare(struct cache_node *node, cache_key_t key)
 	struct xfs_buf	*bp = (struct xfs_buf *)node;
 	struct xfs_bufkey *bkey = (struct xfs_bufkey *)key;
 
-#ifdef IO_BCOMPARE_CHECK
 	if (bp->b_target->dev == bkey->buftarg->dev &&
-	    bp->b_bn == bkey->blkno &&
-	    bp->b_bcount != BBTOB(bkey->bblen))
-		fprintf(stderr, "%lx: Badness in key lookup (length)\n"
-			"bp=(bno 0x%llx, len %u bytes) key=(bno 0x%llx, len %u bytes)\n",
-			pthread_self(),
-			(unsigned long long)bp->b_bn, (int)bp->b_bcount,
-			(unsigned long long)bkey->blkno, BBTOB(bkey->bblen));
+	    bp->b_bn == bkey->blkno) {
+		if (bp->b_bcount == BBTOB(bkey->bblen))
+			return CACHE_HIT;
+#ifdef IO_BCOMPARE_CHECK
+		if (!(libxfs_bcache->c_flags & CACHE_MISCOMPARE_PURGE)) {
+			fprintf(stderr,
+	"%lx: Badness in key lookup (length)\n"
+	"bp=(bno 0x%llx, len %u bytes) key=(bno 0x%llx, len %u bytes)\n",
+				pthread_self(),
+				(unsigned long long)bp->b_bn, (int)bp->b_bcount,
+				(unsigned long long)bkey->blkno,
+				BBTOB(bkey->bblen));
+		}
 #endif
-
-	return (bp->b_target->dev == bkey->buftarg->dev &&
-		bp->b_bn == bkey->blkno &&
-		bp->b_bcount == BBTOB(bkey->bblen));
+		return CACHE_PURGE;
+	}
+	return CACHE_MISS;
 }
 
 void
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 214b7fa..77a040e 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -706,7 +706,7 @@ main(int argc, char **argv)
 			do_log(_("        - block cache size set to %d entries\n"),
 				libxfs_bhash_size * HASH_CACHE_RATIO);
 
-		libxfs_bcache = cache_init(libxfs_bhash_size,
+		libxfs_bcache = cache_init(0, libxfs_bhash_size,
 						&libxfs_bcache_operations);
 	}
 
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 30/36] libxfs: work around do_div() not handling 32 bit numerators
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (28 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 29/36] xfs_db: avoid libxfs buffer lookup warnings Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 31/36] db: enable metadump on CRC filesystems Dave Chinner
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The libxfs dquot buffer code uses do_div() with a 32 bit numerator.
This gives incorrect results as do_div() passes the numerator by
reference as a pointer to a 64 bit value. Hence it does the division
using 32 bits of garbage, which gives the wrong result.

As per Christoph's suggestion, we can kill the usage of do_div()
here completely and just do the division directly, both in userspace
and kernel space.
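
The failure mode can be modelled without invoking undefined behaviour.
In the sketch below, demo_div_via_u64() builds the 64 bit value that a
64 bit division would see if it were handed a pointer to a 32 bit
variable on a little endian machine: the low half is the real
numerator and the high half is whatever happens to sit next to it in
memory. The demo_* names are hypothetical, and the 136 byte dquot
block size is an assumption about sizeof(xfs_dqblk_t).

```c
#include <stdint.h>

#define DEMO_BBSHIFT		9
#define DEMO_BBTOB(bbs)		((bbs) << DEMO_BBSHIFT)
#define DEMO_DQBLK_SIZE		136u	/* assumed sizeof(xfs_dqblk_t) */

/* Divide a 32 bit numerator that has been misread as 64 bits wide. */
uint32_t
demo_div_via_u64(uint32_t numerator, uint32_t stray_high, uint32_t divisor)
{
	uint64_t	wide = ((uint64_t)stray_high << 32) | numerator;

	return (uint32_t)(wide / divisor);
}

/* The fix: plain integer division on the 32 bit value. */
unsigned int
demo_dquots_per_chunk(unsigned int nbblks)
{
	return DEMO_BBTOB(nbblks) / DEMO_DQBLK_SIZE;
}
```

With eight basic blocks (4096 bytes) the direct division yields 30
dquots. The widened division only agrees when the stray upper half
happens to be zero.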

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_dquot_buf.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/libxfs/xfs_dquot_buf.c b/libxfs/xfs_dquot_buf.c
index 620d9d3..6bbb0ff 100644
--- a/libxfs/xfs_dquot_buf.c
+++ b/libxfs/xfs_dquot_buf.c
@@ -23,13 +23,8 @@ xfs_calc_dquots_per_chunk(
 	struct xfs_mount	*mp,
 	unsigned int		nbblks)	/* basic block units */
 {
-	unsigned int	ndquots;
-
 	ASSERT(nbblks > 0);
-	ndquots = BBTOB(nbblks);
-	do_div(ndquots, sizeof(xfs_dqblk_t));
-
-	return ndquots;
+	return BBTOB(nbblks) / sizeof(xfs_dqblk_t);
 }
 
 /*
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 31/36] db: enable metadump on CRC filesystems
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (29 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 30/36] libxfs: work around do_div() not handling 32 bit numerators Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13 16:09   ` Christoph Hellwig
  2013-11-13  6:40 ` [PATCH 32/36] xfs: support larger inode clusters on v5 filesystems Dave Chinner
                   ` (5 subsequent siblings)
  36 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we can calculate CRCs through xfs_db, we can add support
for recalculating CRCs on obfuscated metadump images. This simply
requires us to call the write verifier manually before writing the
buffer to the metadump image.

We don't need to do anything special to mdrestore, as the metadata
blocks it reads from the image file will already have all the
correct CRCs in them. Hence it can be mostly oblivious to the fact
that the filesystem it is restoring contains CRCs.
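
As an illustration of calling a write verifier by hand before dumping
a buffer, consider the sketch below. Everything in it is hypothetical:
the demo_* types, the one byte magic, the toy additive checksum and
the error code are placeholders, not the libxfs buffer ops. The
convention assumed here is that the write helper returns 0 on success
and the verifier's error otherwise.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAGIC	0x5au	/* hypothetical one byte magic */
#define DEMO_ECORRUPT	1	/* hypothetical error code */

struct demo_buf;

struct demo_buf_ops {
	void	(*verify_write)(struct demo_buf *bp);
};

struct demo_buf {
	const struct demo_buf_ops	*ops;
	int				error;
	uint8_t				data[16]; /* data[0] is the magic */
	uint8_t				csum;	  /* toy checksum of data[] */
};

uint8_t
demo_csum(const uint8_t *p, size_t len)
{
	uint8_t	sum = 0;

	while (len--)
		sum = (uint8_t)(sum + *p++);
	return sum;
}

/* Reject obviously bad buffers, otherwise refresh the checksum. */
void
demo_verify_write(struct demo_buf *bp)
{
	if (bp->data[0] != DEMO_MAGIC) {
		bp->error = DEMO_ECORRUPT;
		return;
	}
	bp->csum = demo_csum(bp->data, sizeof(bp->data));
}

const struct demo_buf_ops demo_ops = {
	.verify_write = demo_verify_write,
};

/* Run the verifier, if any, before the buffer goes into the image. */
int
demo_write_buf(struct demo_buf *bp)
{
	if (bp->ops) {
		bp->error = 0;
		bp->ops->verify_write(bp);
		if (bp->error)
			return bp->error;
	}
	/* the buffer contents would be copied into the dump image here */
	return 0;
}
```

Because the verifier runs after any obfuscation has modified the
buffer, the checksum that lands in the image always matches the
obfuscated contents. Buffers without ops are written as they are.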

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 db/metadump.c             | 24 +++++++++++++++++++-----
 mdrestore/xfs_mdrestore.c |  3 ---
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/db/metadump.c b/db/metadump.c
index ac6a4d6..117dc42 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -172,6 +172,22 @@ write_buf(
 	__int64_t	off;
 	int		i;
 
+	/*
+	 * Run the write verifier to recalculate the buffer CRCs and check
+	 * we are writing something valid to disk
+	 */
+	if (buf->bp && buf->bp->b_ops) {
+		buf->bp->b_error = 0;
+		buf->bp->b_ops->verify_write(buf->bp);
+		if (buf->bp->b_error) {
+			fprintf(stderr,
+	_("%s: write verifer failed on bno 0x%llx/0x%x\n"),
+				__func__, (long long)buf->bp->b_bn,
+				buf->bp->b_bcount);
+			return buf->bp->b_error;
+		}
+	}
+
 	for (i = 0, off = buf->bb, data = buf->data;
 			i < buf->blen;
 			i++, off++, data += BBSIZE) {
@@ -1727,6 +1743,9 @@ copy_inode_chunk(
 
 		if (!process_inode(agno, agino + i, dip))
 			goto pop_out;
+
+		/* calculate the new CRC for the inode */
+		xfs_dinode_calc_crc(mp, dip);
 	}
 skip_processing:
 	if (!write_buf(iocur_top))
@@ -2053,11 +2072,6 @@ metadump_f(
 		return 0;
 	}
 
-	if (xfs_sb_version_hascrc(&mp->m_sb) && dont_obfuscate == 0) {
-		print_warning("Can't obfuscate CRC enabled filesystems yet.");
-		return 0;
-	}
-
 	metablock = (xfs_metablock_t *)calloc(BBSIZE + 1, BBSIZE);
 	if (metablock == NULL) {
 		print_warning("memory allocation failure");
diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c
index fe61766..e57bdb2 100644
--- a/mdrestore/xfs_mdrestore.c
+++ b/mdrestore/xfs_mdrestore.c
@@ -109,9 +109,6 @@ perform_restore(
 	if (sb.sb_magicnum != XFS_SB_MAGIC)
 		fatal("bad magic number for primary superblock\n");
 
-	if (xfs_sb_version_hascrc(&sb))
-		fatal("Can't restore CRC enabled filesystems yet.\n");
-
 	((xfs_dsb_t*)block_buffer)->sb_inprogress = 1;
 
 	if (is_target_file)  {
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 32/36] xfs: support larger inode clusters on v5 filesystems
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (30 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 31/36] db: enable metadump on CRC filesystems Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 33/36] xfsprogs: kill experimental warnings for " Dave Chinner
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

To allow the kernel to use larger inode clusters than the standard
8192 bytes, we need to set the inode alignment fields appropriately
so that the kernel is consistent in its inode to buffer mappings.
We set the alignment to allow a constant 32 inodes per cluster,
instead of a fixed 8k cluster size.
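
The alignment arithmetic is small enough to work through directly.
The sketch below restates the mkfs calculation and the repair check
with hypothetical demo_* names. The 256 byte minimum inode size is an
assumption about XFS_DINODE_MIN_SIZE; the 8192 byte base cluster size
is taken from the description above.

```c
#include <stdint.h>

#define DEMO_BIG_CLUSTER_SIZE	8192u	/* standard cluster size */
#define DEMO_DINODE_MIN_SIZE	256u	/* assumed minimum inode size */

/* Alignment, in filesystem blocks, that mkfs would record. */
uint32_t
demo_inoalignmt(uint32_t isize, uint32_t blocklog, int crcs_enabled)
{
	uint32_t	cluster_size = DEMO_BIG_CLUSTER_SIZE;

	/* scale so a cluster always holds 8192 / 256 = 32 inodes */
	if (crcs_enabled)
		cluster_size *= isize / DEMO_DINODE_MIN_SIZE;
	return cluster_size >> blocklog;
}

/*
 * Accept either the standard alignment or, on CRC enabled
 * filesystems only, the alignment scaled by inode size.
 */
int
demo_inoalign_valid(uint32_t inoalignmt, uint32_t isize,
		    uint32_t blocklog, int has_crc)
{
	uint32_t	align = DEMO_BIG_CLUSTER_SIZE >> blocklog;

	if (align == inoalignmt)
		return 1;
	if (!has_crc)
		return 0;
	align *= isize / DEMO_DINODE_MIN_SIZE;
	return align == inoalignmt;
}
```

With 4096 byte blocks and 512 byte inodes, a CRC enabled filesystem
gets a 16384 byte cluster (32 inodes) and an alignment of 4 blocks,
where the unscaled value would be 2 blocks.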

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h |  2 +-
 mkfs/xfs_mkfs.c  |  5 ++++-
 repair/sb.c      | 41 ++++++++++++++++++++++++++++-------------
 3 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/include/libxfs.h b/include/libxfs.h
index 40a950e..4bf331c 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -181,7 +181,7 @@ typedef struct xfs_mount {
 	__uint8_t		m_sectbb_log;	/* sectorlog - BBSHIFT */
 	__uint8_t		m_agno_log;	/* log #ag's */
 	__uint8_t		m_agino_log;	/* #bits for agino in inum */
-	__uint16_t		m_inode_cluster_size;/* min inode buf size */
+	uint			m_inode_cluster_size;/* min inode buf size */
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
 	uint			m_blockwmask;	/* blockwsize-1 */
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 3a032c0..d82128c 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2532,7 +2532,10 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 	} else
 		sbp->sb_logsunit = 0;
 	if (iaflag) {
-		sbp->sb_inoalignmt = XFS_INODE_BIG_CLUSTER_SIZE >> blocklog;
+		int	cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;
+		if (crcs_enabled)
+			cluster_size *= isize / XFS_DINODE_MIN_SIZE;
+		sbp->sb_inoalignmt = cluster_size >> blocklog;
 		iaflag = sbp->sb_inoalignmt != 0;
 	} else
 		sbp->sb_inoalignmt = 0;
diff --git a/repair/sb.c b/repair/sb.c
index 2e35a4c..c54d89b 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -169,17 +169,37 @@ find_secondary_sb(xfs_sb_t *rsb)
 }
 
 /*
- * calculate what inode alignment field ought to be
- * based on internal superblock info
+ * Calculate what inode alignment field ought to be
+ * based on internal superblock info and determine if it is valid.
+ *
+ * For v5 superblocks, the inode alignment will either match that of the
+ * standard XFS_INODE_BIG_CLUSTER_SIZE, or it will be scaled based on the inode
+ * size. Either value is valid in this case.
+ *
+ * Return true if the alignment is valid, false otherwise.
  */
-static int
-calc_ino_align(xfs_sb_t *sb)
+static bool
+sb_validate_ino_align(struct xfs_sb *sb)
 {
-	xfs_extlen_t align;
+	xfs_extlen_t	align;
 
+	if (!xfs_sb_version_hasalign(sb))
+		return true;
+
+	/* standard cluster size alignment is always valid */
 	align = XFS_INODE_BIG_CLUSTER_SIZE >> sb->sb_blocklog;
+	if (align == sb->sb_inoalignmt)
+		return true;
+
+	/* alignment scaled by inode size is v5 only for now */
+	if (!xfs_sb_version_hascrc(sb))
+		return false;
 
-	return(align);
+	align *= sb->sb_inodesize / XFS_DINODE_MIN_SIZE;
+	if (align == sb->sb_inoalignmt)
+		return true;
+
+	return false;
 }
 
 /*
@@ -228,7 +248,6 @@ int
 verify_sb(xfs_sb_t *sb, int is_primary_sb)
 {
 	__uint32_t	bsize;
-	xfs_extlen_t	align;
 	int		i;
 
 	/* check magic number and version number */
@@ -364,12 +383,8 @@ verify_sb(xfs_sb_t *sb, int is_primary_sb)
 	/*
 	 * verify correctness of inode alignment if it's there
 	 */
-	if (xfs_sb_version_hasalign(sb))  {
-		align = calc_ino_align(sb);
-
-		if (align != sb->sb_inoalignmt)
-			return(XR_BAD_INO_ALIGN);
-	}
+	if (!sb_validate_ino_align(sb))
+		return(XR_BAD_INO_ALIGN);
 
 	/*
 	 * verify max. % of inodes (sb_imax_pct)
-- 
1.8.4.rc3

* [PATCH 33/36] xfsprogs: kill experimental warnings for v5 filesystems
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (31 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 32/36] xfs: support larger inode clusters on v5 filesystems Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 34/36] repair: prefetching is turned off unnecessarily Dave Chinner
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

With xfsprogs now being close to feature complete on v5 filesystems,
remove the experimental warnings from the superblock verifier. This
means that we don't need to filter such warnings from the output in
xfstests and so we can see exactly what tests are failing due to
code deficiencies rather than from detecting warning noise.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_sb.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 65ddc2f..48b1a97 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -180,10 +180,6 @@ xfs_mount_validate_sb(
 	 * write validation, we don't need to check feature masks.
 	 */
 	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
-		xfs_alert(mp,
-"Version 5 superblock detected. xfsprogs has EXPERIMENTAL support enabled!\n"
-"Use of these features is at your own risk!");
-
 		if (xfs_sb_has_compat_feature(sbp,
 					XFS_SB_FEAT_COMPAT_UNKNOWN)) {
 			xfs_warn(mp,
-- 
1.8.4.rc3

* [PATCH 34/36] repair: prefetching is turned off unnecessarily
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (32 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 33/36] xfsprogs: kill experimental warnings for " Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13  6:40 ` [PATCH 35/36] repair: Increase default repair parallelism on large filesystems Dave Chinner
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we have a large filesystem, prefetching is only enabled when
there is a significant amount of RAM available - roughly 16GB RAM
for every 100TB of disk space. For large filesystems, this memory
usage calculation is mostly derived from the memory needed to track
used space rather than inodes. That is, for a 100TB filesystem with
50 million inodes, only 50M * 4 bytes or 200MB of the required
16GB of RAM is used for tracking inodes. Hence with prefetching
turned off, such a filesystem only uses 230MB of memory to run
repair to completion.

With prefetching turned on, this increases to about 900MB of RAM,
but it is still far, far less than the predicted 16GB of RAM needed
to enable prefetching. Hence we are turning off prefetching when we
really don't need to, and large filesystems are being checked
more slowly than they could be.

This patch makes prefetching always be enabled, but adds a warning for
the case where we might not have enough memory to complete
successfully. If repair then fails, the warning advises running it
again with prefetching disabled:

  Memory available for repair (12031MB) may not be sufficient.
  At least 13044MB is needed to repair this filesystem efficiently
  If repair fails due to lack of memory, please
  turn prefetching off (-P) to reduce the memory footprint.

A similar warning is issued when prefetching is already disabled. In
that case, if xfs_repair exhausts memory, more RAM/swap should be
added to the system.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/xfs_repair.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)
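
A minimal sketch of the cache sizing flow after this change, under
stated assumptions. The demo_* names are hypothetical, the ratio of 8
is an assumed stand-in for HASH_CACHE_RATIO, memory values are taken to
be in KB (the warning prints them divided by 1024 as MB), and the
do_abort() path for an explicit -m limit is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_HASH_CACHE_RATIO	8	/* assumed value */

struct demo_sizing {
	unsigned long	bhash_size;	/* resulting hash size */
	bool		warned;		/* low memory warning issued? */
};

static struct demo_sizing
demo_size_cache(unsigned long max_mem_kb, unsigned long mem_used_kb,
		unsigned long cluster_size_bytes)
{
	struct demo_sizing	s = { 0, false };

	assert(cluster_size_bytes >= 1024);

	if (max_mem_kb <= mem_used_kb) {
		/* warn, but no longer turn prefetching off */
		s.warned = true;
		max_mem_kb = mem_used_kb;
	}

	/* sizing now runs unconditionally, as in the patched code */
	max_mem_kb -= mem_used_kb;
	if (max_mem_kb >= (1UL << 30))
		max_mem_kb = 1UL << 30;
	s.bhash_size = max_mem_kb /
		(DEMO_HASH_CACHE_RATIO * (cluster_size_bytes >> 10));
	if (s.bhash_size < 512)
		s.bhash_size = 512;
	return s;
}
```

In the low memory case the leftover budget is zero, so the hash size
falls back to the 512 entry floor rather than the old fixed value of
64 that went with disabling prefetch.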

diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 77a040e..78f8363 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -674,34 +674,36 @@ main(int argc, char **argv)
 				mp->m_sb.sb_dblocks >> (10 + 1));
 
 		if (max_mem <= mem_used) {
-			/*
-			 * Turn off prefetch and minimise libxfs cache if
-			 * physical memory is deemed insufficient
-			 */
 			if (max_mem_specified) {
 				do_abort(
 	_("Required memory for repair is greater that the maximum specified\n"
 	  "with the -m option. Please increase it to at least %lu.\n"),
 					mem_used / 1024);
-			} else {
-				do_warn(
-	_("Not enough RAM available for repair to enable prefetching.\n"
-	  "This will be _slow_.\n"
-	  "You need at least %luMB RAM to run with prefetching enabled.\n"),
-					mem_used * 1280 / (1024 * 1024));
 			}
-			do_prefetch = 0;
-			libxfs_bhash_size = 64;
-		} else {
-			max_mem -= mem_used;
-			if (max_mem >= (1 << 30))
-				max_mem = 1 << 30;
-			libxfs_bhash_size = max_mem / (HASH_CACHE_RATIO *
-					(mp->m_inode_cluster_size >> 10));
-			if (libxfs_bhash_size < 512)
-				libxfs_bhash_size = 512;
+			do_warn(
+	_("Memory available for repair (%luMB) may not be sufficient.\n"
+	  "At least %luMB is needed to repair this filesystem efficiently\n"
+	  "If repair fails due to lack of memory, please\n"),
+				max_mem / 1024, mem_used / 1024);
+			if (do_prefetch)
+				do_warn(
+	_("turn prefetching off (-P) to reduce the memory footprint.\n"));
+			else
+				do_warn(
+	_("increase system RAM and/or swap space to at least %luMB.\n"),
+			mem_used * 2 / 1024);
+
+			max_mem = mem_used;
 		}
 
+		max_mem -= mem_used;
+		if (max_mem >= (1 << 30))
+			max_mem = 1 << 30;
+		libxfs_bhash_size = max_mem / (HASH_CACHE_RATIO *
+				(mp->m_inode_cluster_size >> 10));
+		if (libxfs_bhash_size < 512)
+			libxfs_bhash_size = 512;
+
 		if (verbose)
 			do_log(_("        - block cache size set to %d entries\n"),
 				libxfs_bhash_size * HASH_CACHE_RATIO);
-- 
1.8.4.rc3

* [PATCH 35/36] repair: Increase default repair parallelism on large filesystems
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (33 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 34/36] repair: prefetching is turned off unnecessarily Dave Chinner
@ 2013-11-13  6:40 ` Dave Chinner
  2013-11-13 16:10   ` Christoph Hellwig
  2013-11-13  6:41 ` [PATCH 36/36] repair: fix leaf node directory data check Dave Chinner
  2013-11-14 16:18 ` [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Rich Johnston
  36 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Large filesystems or high AG count filesystems generally have more
inherent parallelism in the backing storage. We should make use of
this by default to speed up repair times. Make xfs_repair use an
"auto-stride" configuration on filesystems with enough AGs to be
considered "multidisk" configurations.

The difference in elapsed time to repair a 100TB filesystem with
50 million inodes in it, with all metadata in flash, is:

		Time	IOPS	BW	CPU	RAM
vanilla:	2719s	 2900	 55MB/s	 25%	0.95GB
patched:	 908s	varied	varied	varied	2.33GB

With the patched xfs_repair, there were IO peaks of over 1.3GB/s during
AG scanning. Some phases now run at noticeably different speeds:
	- phase 3 ran at ~180% CPU, 18,000 IOPS and 130MB/s,
	- phase 4 ran at ~280% CPU, 12,000 IOPS and 100MB/s
	- the other phases were similar to the vanilla repair.

Memory usage is increased because of the increased buffer cache
size as a result of concurrent AG scanning using it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 repair/xfs_repair.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
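
To make the effect of the stride value concrete, here is a small sketch
of the thread count arithmetic from the hunk below. demo_thread_count()
is a hypothetical helper, and returning 0 when no stride is in effect
is a convention of this sketch only; xfs_repair itself simply skips the
threaded setup in that case.

```c
#include <assert.h>

/* Number of AG scanning threads for a given AG count and stride. */
static unsigned int
demo_thread_count(unsigned int agcount, unsigned int ag_stride,
		  int do_prefetch)
{
	assert(agcount > 0);

	/* auto-stride: only when unset, enough AGs, and prefetch is on */
	if (!ag_stride && agcount >= 16 && do_prefetch)
		ag_stride = 15;

	if (!ag_stride)
		return 0;	/* sketch convention: no striding */

	/* round up so the tail AGs get a thread of their own */
	return (agcount + ag_stride - 1) / ag_stride;
}
```

A 16 AG filesystem therefore gets 2 threads and a 32 AG filesystem
gets 3, while filesystems with fewer than 16 AGs are left alone.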

diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 78f8363..a863337 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -614,6 +614,23 @@ main(int argc, char **argv)
 	inodes_per_cluster = MAX(mp->m_sb.sb_inopblock,
 			XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog);
 
+	/*
+	 * Automatic striding for high agcount filesystems.
+	 *
+	 * More AGs indicates that the filesystem is either large or can handle
+	 * more IO parallelism. Either way, we should try to process multiple
+	 * AGs at a time in such a configuration to try to saturate the
+	 * underlying storage and speed the repair process. Only do this if
+	 * prefetching is enabled.
+	 *
+	 * Given mkfs defaults for 16AGs for "multidisk" configurations, we want
+	 * to target these for an increase in thread count. Hence a stride value
+	 * of 15 is chosen to ensure we get at least 2 AGs being scanned at once
+	 * on such filesystems.
+	 */
+	if (!ag_stride && glob_agcount >= 16 && do_prefetch)
+		ag_stride = 15;
+
 	if (ag_stride) {
 		thread_count = (glob_agcount + ag_stride - 1) / ag_stride;
 		thread_init();
-- 
1.8.4.rc3

* [PATCH 36/36] repair: fix leaf node directory data check
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (34 preceding siblings ...)
  2013-11-13  6:40 ` [PATCH 35/36] repair: Increase default repair parallelism on large filesystems Dave Chinner
@ 2013-11-13  6:41 ` Dave Chinner
  2013-11-14 16:18 ` [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Rich Johnston
  36 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13  6:41 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When walking the leaf node format blocks (LEAFN) in the hash index
of a large directory, we could trip over btree node blocks (DA_NODE)
in the address space if there are enough entries in the directory.
These cause a verifier failure, and hence the directory is
considered corrupt and is trashed and rebuilt unnecessarily. Fix this
by using the correct verifier that can handle both types of blocks
without triggering failures.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase6.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
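
The idea can be sketched as a verifier that dispatches on the block's
magic number. This is an illustration only, not the libxfs
implementation: the demo_* names are hypothetical and the two magic
values are placeholders chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder magic numbers for the two block types. */
enum demo_magic {
	DEMO_MAGIC_LEAFN = 0xd2ff,
	DEMO_MAGIC_NODE  = 0xfebe,
};

/* A leaf-only verifier rejects node blocks, which is the bug. */
static bool
demo_leafn_verify(uint16_t magic)
{
	return magic == DEMO_MAGIC_LEAFN;
}

/* A node verifier that also accepts leaf blocks by handing them to
 * the leaf verifier, so either block type can be read safely. */
static bool
demo_node_verify(uint16_t magic)
{
	switch (magic) {
	case DEMO_MAGIC_NODE:
		return true;
	case DEMO_MAGIC_LEAFN:
		return demo_leafn_verify(magic);
	default:
		return false;
	}
}

/* After a successful read, only leaf blocks are checked further;
 * node blocks are skipped based on the magic number. */
static bool
demo_should_check(uint16_t magic)
{
	assert(demo_node_verify(magic));
	return magic == DEMO_MAGIC_LEAFN;
}
```

With the leaf-only verifier a node block fails the read outright; with
the dispatching verifier it passes the read and is then skipped.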

diff --git a/repair/phase6.c b/repair/phase6.c
index 5307acf..d2d4a44 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1937,8 +1937,16 @@ longform_dir2_check_node(
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
 		if (bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
 			break;
+
+		/*
+		 * we need to use the da3 node verifier here as it handles the
+		 * fact that reading the leaf hash tree blocks can return either
+		 * leaf or node blocks and calls the correct verifier. If we get
+		 * a node block, then we'll skip it below based on a magic
+		 * number check.
+		 */
 		if (libxfs_da_read_buf(NULL, ip, da_bno, -1, &bp,
-				XFS_DATA_FORK, &xfs_dir3_leafn_buf_ops)) {
+				XFS_DATA_FORK, &xfs_da3_node_buf_ops)) {
 			do_warn(
 	_("can't read leaf block %u for directory inode %" PRIu64 "\n"),
 				da_bno, ip->i_ino);
-- 
1.8.4.rc3

* Re: [PATCH 20/36] db: rewrite IO engine to use libxfs
  2013-11-13  6:40 ` [PATCH 20/36] db: rewrite IO engine to use libxfs Dave Chinner
@ 2013-11-13 16:05   ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2013-11-13 16:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 24/36] db: verify and calculate dquot CRCs
  2013-11-13  6:40 ` [PATCH 24/36] db: verify and calculate dquot CRCs Dave Chinner
@ 2013-11-13 16:05   ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2013-11-13 16:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Nov 13, 2013 at 05:40:48PM +1100, Dave Chinner wrote:
> When we set the current IO cursor to point at a dquot block, verify
> that the dquot CRC is intact. And prior to writing such an IO
> cursor, calculate the dquot CRC.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 31/36] db: enable metadump on CRC filesystems
  2013-11-13  6:40 ` [PATCH 31/36] db: enable metadump on CRC filesystems Dave Chinner
@ 2013-11-13 16:09   ` Christoph Hellwig
  2013-11-13 21:00     ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2013-11-13 16:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Nov 13, 2013 at 05:40:55PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that we can calculate CRCs through xfs_db, we can add support
> for recalculating CRCs on obfuscated metadump images. This simply
> requires us to call the write verifier manually before writing the
> buffer to the metadump image.
> 
> We don't need to do anything special to mdrestore, as the metadata
> blocks it reads from the image file will already have all the
> correct CRCs in them. Hence it can be mostly oblivious to the fact
> that the filesystem it is restoring contains CRCs.

All the changes in here look reasonable, but don't we need a way to
recalculate dquot crcs in metadump as well?  We seem to need a special
case for them elsewhere at least.

Do we have a testcase that exercises metadump on a filesystem with
quotas enabled and checks that they still work after restore?

* Re: [PATCH 35/36] repair: Increase default repair parallelism on large filesystems
  2013-11-13  6:40 ` [PATCH 35/36] repair: Increase default repair parallelism on large filesystems Dave Chinner
@ 2013-11-13 16:10   ` Christoph Hellwig
  2013-11-13 21:01     ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2013-11-13 16:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Nov 13, 2013 at 05:40:59PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Large filesystems or high AG count filesystems generally have more
> inherent parallelism in the backing storage. We should make use of
> this by default to speed up repair times. Make xfs_repair use an
> "auto-stride" configuration on filesystems with enough AGs to be
> considered "multidisk" configurations.
> 
> The difference in elapsed time to repair a 100TB filesystem with
> 50 million inodes in it, with all metadata in flash, is:
> 
> 		Time	IOPS	BW	CPU	RAM
> vanilla:	2719s	 2900	 55MB/s	 25%	0.95GB
> patched:	 908s	varied	varied	varied	2.33GB
> 
> With the patched xfs_repair, there were IO peaks of over 1.3GB/s during
> AG scanning. Some phases now run at noticeably different speeds:
> 	- phase 3 ran at ~180% CPU, 18,000 IOPS and 130MB/s,
> 	- phase 4 ran at ~280% CPU, 12,000 IOPS and 100MB/s
> 	- the other phases were similar to the vanilla repair.
> 
> Memory usage is increased because of the increased buffer cache
> size as a result of concurrent AG scanning using it.

Looks good as long as you stick to your promise to clean up the magic
numbers later.

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 31/36] db: enable metadump on CRC filesystems
  2013-11-13 16:09   ` Christoph Hellwig
@ 2013-11-13 21:00     ` Dave Chinner
  2013-11-14 13:34       ` Christoph Hellwig
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2013-11-13 21:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Nov 13, 2013 at 08:09:37AM -0800, Christoph Hellwig wrote:
> On Wed, Nov 13, 2013 at 05:40:55PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Now that we can calculate CRCs through xfs_db, we can add support
> > for recalculating CRCs on obfuscated metadump images. This simply
> > requires us to call the write verifier manually before writing the
> > buffer to the metadump image.
> > 
> > We don't need to do anything special to mdrestore, as the metadata
> > blocks it reads from the image file will already have all the
> > correct CRCs in them. Hence it can be mostly oblivious to the fact
> > that the filesystem it is restoring contains CRCs.
> 
> All the changes in here look reasonable, but don't we need a way to
> recalculate dquot crcs in metadump as well?

The dquot buffers are read in and written unmodified. We don't want
to recalculate the CRC if we haven't modified the objects directly
(i.e. obfuscated them). Hence I think the way they are currently
handled is fine.

> We seem to need a special
> case for them elsewhere at least.

Only if we modify them, and metadump doesn't do that.

> Do we have a testcase that exercises metadump on a filesystem with
> quotas enabled and checks that they still work after restore?

No, nothing that directly verifies quota sanity after a restore.
Doing something like:

# MOUNT_OPTIONS="-o uquota" ./check xfs/253

Doesn't make the test fail, but it's not actually checking the quota
values are correct...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 35/36] repair: Increase default repair parallelism on large filesystems
  2013-11-13 16:10   ` Christoph Hellwig
@ 2013-11-13 21:01     ` Dave Chinner
  0 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2013-11-13 21:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Nov 13, 2013 at 08:10:29AM -0800, Christoph Hellwig wrote:
> On Wed, Nov 13, 2013 at 05:40:59PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Large filesystems or high AG count filesystems generally have more
> > inherent parallelism in the backing storage. We should make use of
> > this by default to speed up repair times. Make xfs_repair use an
> > "auto-stride" configuration on filesystems with enough AGs to be
> > considered "multidisk" configurations.
> > 
> > The difference in elapsed time to repair a 100TB filesystem with
> > 50 million inodes in it, with all metadata in flash, is:
> > 
> > 		Time	IOPS	BW	CPU	RAM
> > vanilla:	2719s	 2900	 55MB/s	 25%	0.95GB
> > patched:	 908s	varied	varied	varied	2.33GB
> > 
> > With the patched xfs_repair, there were IO peaks of over 1.3GB/s during
> > AG scanning. Some phases now run at noticeably different speeds:
> > 	- phase 3 ran at ~180% CPU, 18,000 IOPS and 130MB/s,
> > 	- phase 4 ran at ~280% CPU, 12,000 IOPS and 100MB/s
> > 	- the other phases were similar to the vanilla repair.
> > 
> > Memory usage is increased because of the increased buffer cache
> > size as a result of concurrent AG scanning using it.
> 
> Looks good as long as you stick to your promise to clean up the magic
> numbers later.

Already got a prototype patch for it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 31/36] db: enable metadump on CRC filesystems
  2013-11-13 21:00     ` Dave Chinner
@ 2013-11-14 13:34       ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2013-11-14 13:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db +
  2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
                   ` (35 preceding siblings ...)
  2013-11-13  6:41 ` [PATCH 36/36] repair: fix leaf node directory data check Dave Chinner
@ 2013-11-14 16:18 ` Rich Johnston
  36 siblings, 0 replies; 45+ messages in thread
From: Rich Johnston @ 2013-11-14 16:18 UTC (permalink / raw)
  To: Dave Chinner, xfs

This patchset has been committed.

Thanks
--Rich

commit 9b981421f503ba679097f8cd749af37cc42f5fd7
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:41:00 2013 +0000

     repair: fix leaf node directory data check

commit 0cce4aa198f0470817bedb3781ea5b6955e43076
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:59 2013 +0000

     repair: Increase default repair parallelism on large filesystems

commit 61510437c627b529feb95ebffddd73df5ed5b104
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:58 2013 +0000

     repair: prefetching is turned off unnecessarily

commit ba3615fc784f03d9cb25fb7cc9240ea56b4b7a4b
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:57 2013 +0000

     xfsprogs: kill experimental warnings for v5 filesystems

commit 7b5f9801f4d569ab9fdbdd1e39aa59585d296872
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:56 2013 +0000

     xfs: support larger inode clusters on v5 filesystems

commit 8ab75c4d9176d8831fd137cf0e7916032d8216da
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:55 2013 +0000

     db: enable metadump on CRC filesystems

commit 839dac7f06d54600b3092a7ad9cb903315a27f97
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:54 2013 +0000

     libxfs: work around do_div() not handling 32 bit numerators

commit ba9ecd40b3792961f12102af55c759d0432a6486
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:53 2013 +0000

     xfs_db: avoid libxfs buffer lookup warnings

commit 06d80a7c09287581002c275fd21cfecdbdefcc15
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:52 2013 +0000

     xfs_db: use inode cluster buffers for inode IO

commit d14bf4dda7f5a59ba3fbaed38cd829db5f68a105
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:51 2013 +0000

     db: re-enable write support for v5 filesystems.

commit 2847273f74b3b14ba3cb9ab876b910da12ed2dbe
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:50 2013 +0000

     db: add a special attribute buffer verifier

commit fc068a1902148eaaad7a7e5e9972155dd68a647c
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:49 2013 +0000

     db: add a special directory buffer verifier

commit 66a40d020d7abb6fe09693f4392b6af2b30aa3b3
Author: Dave Chinner <david@fromorbit.com>
Date:   Wed Nov 13 06:40:48 2013 +0000

     db: verify and calculate dquot CRCs

commit a73b88f29a82a21ef6f50298d2d14ae1d91b321d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:47 2013 +0000

     db: verify and calculate inode CRCs

commit 0522f1cc3ab1638e18a636b6a8cf6db8b1d277f6
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:46 2013 +0000

     db: indicate if the CRC on a buffer is correct or not

commit 6fea8f830a6cc04d5429de31f40d15b94d0fe8da
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:45 2013 +0000

     db: introduce verifier support into set_cur

commit 72298d16b17776f7a57a5244776591653387846b
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:44 2013 +0000

     db: rewrite IO engine to use libxfs

commit 800db1c1581d68cc3e44980b0be9c5ff7b7fd6d9
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:43 2013 +0000

     libxfs: refactor libxfs_buf_read_map for xfs_db

commit 48e32b40a611384836e593251cbe9d840db00ac9
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:42 2013 +0000

     db: rewrite bbmap to use xfs_buf_map

commit 2a8b3fdf37d30bd4e0bec834168dd4fd9d8b4f58
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:41 2013 +0000

     db: separate out straight buffer IO from map based IO.

commit 3a19fb7dce9d570e78deaf5c26c0ab8a4a5bef67
Author: Christoph Hellwig <hch@infradead.org>
Date:   Wed Nov 13 06:40:40 2013 +0000

     libxfs: stop caching inode structures

commit 9aa5711629b47642bb5b688a6a1410d223456fc8
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:39 2013 +0000

     libxfs: fix root inode handling inconsistencies

commit 6aa3d87bc45348dc0948ae0cea57bf3033d64694
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:38 2013 +0000

     xfs: don't emit corruption noise on fs probes

commit c9522f4d8790ecd61c4e74746b607787485f2027
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:37 2013 +0000

     xfs: fix node forward in xfs_node_toosmall

commit 3e23516ae60421652fd41354307a6a5181d401eb
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:36 2013 +0000

     xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct()

commit 12864fd992dd5d6bc3c089aeb6422c8d235a28f0
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:35 2013 +0000

     xfs: remove newlines from strings passed to __xfs_printk

commit e6d77a21f263ea403bef0940a524212e6fa03d04
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:34 2013 +0000

     libxfs: Minor cleanup and bug fix sync

commit f85fc6220f1c7fdb467a4d5b43e9bfbd2fb36c1d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:33 2013 +0000

     libxfs: bring across inode buffer readahead verifier changes

commit 2ceff9cee1f513ff633aa0d6997374da313c8c55
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:32 2013 +0000

     libxfs: xfs_rtalloc.c becomes xfs_rtbitmap.c

commit 9c6ebc42e2f1cf4b114c5cecbf373b8922959e69
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:31 2013 +0000

     libxfs: bmap btree owner swap support

commit 10851b189f50f357eab3ce787533e505babd00d2
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:30 2013 +0000

     libxfs: unify xfs_btree.c with kernel code

commit 34b8c759723757e5b4a4c9da6c3a790eb405000f
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:29 2013 +0000

     xfs: decouple inode and bmap btree header files

commit 32390f05a6a2c1a30b5f05141d705fb7c686079c
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:28 2013 +0000

     xfs: split dquot buffer operations out

commit 4b1bcf9627153a77acbb2b0e6f0eb3d5b5102ceb
Author: Dave Chinner <david@fromorbit.com>
Date:   Wed Nov 13 06:40:27 2013 +0000

     xfs: create a shared header file for format-related information

commit 389b3b078ccc03da48d3cc0387fb5c5508e15d0f
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:26 2013 +0000

     xfs: fix some minor sparse warnings

commit 270c31284f7d05557a31ef1304d582897bf4ffcc
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Nov 13 06:40:25 2013 +0000

     xfsprogs: fix automatic dependency generation

end of thread, other threads:[~2013-11-14 16:18 UTC | newest]

Thread overview: 45+ messages
2013-11-13  6:40 [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Dave Chinner
2013-11-13  6:40 ` [PATCH 01/36] xfsprogs: fix automatic dependency generation Dave Chinner
2013-11-13  6:40 ` [PATCH 02/36] xfs: fix some minor sparse warnings Dave Chinner
2013-11-13  6:40 ` [PATCH 03/36] xfs: create a shared header file for format-related information Dave Chinner
2013-11-13  6:40 ` [PATCH 04/36] xfs: split dquot buffer operations out Dave Chinner
2013-11-13  6:40 ` [PATCH 05/36] xfs: decouple inode and bmap btree header files Dave Chinner
2013-11-13  6:40 ` [PATCH 06/36] libxfs: unify xfs_btree.c with kernel code Dave Chinner
2013-11-13  6:40 ` [PATCH 07/36] libxfs: bmap btree owner swap support Dave Chinner
2013-11-13  6:40 ` [PATCH 08/36] libxfs: xfs_rtalloc.c becomes xfs_rtbitmap.c Dave Chinner
2013-11-13  6:40 ` [PATCH 09/36] libxfs: bring across inode buffer readahead verifier changes Dave Chinner
2013-11-13  6:40 ` [PATCH 10/36] libxfs: Minor cleanup and bug fix sync Dave Chinner
2013-11-13  6:40 ` [PATCH 11/36] xfs: remove newlines from strings passed to __xfs_printk Dave Chinner
2013-11-13  6:40 ` [PATCH 12/36] xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct() Dave Chinner
2013-11-13  6:40 ` [PATCH 13/36] xfs: fix node forward in xfs_node_toosmall Dave Chinner
2013-11-13  6:40 ` [PATCH 14/36] xfs: don't emit corruption noise on fs probes Dave Chinner
2013-11-13  6:40 ` [PATCH 15/36] libxfs: fix root inode handling inconsistencies Dave Chinner
2013-11-13  6:40 ` [PATCH 16/36] libxfs: stop caching inode structures Dave Chinner
2013-11-13  6:40 ` [PATCH 17/36] db: separate out straight buffer IO from map based IO Dave Chinner
2013-11-13  6:40 ` [PATCH 18/36] db: rewrite bbmap to use xfs_buf_map Dave Chinner
2013-11-13  6:40 ` [PATCH 19/36] libxfs: refactor libxfs_buf_read_map for xfs_db Dave Chinner
2013-11-13  6:40 ` [PATCH 20/36] db: rewrite IO engine to use libxfs Dave Chinner
2013-11-13 16:05   ` Christoph Hellwig
2013-11-13  6:40 ` [PATCH 21/36] db: introduce verifier support into set_cur Dave Chinner
2013-11-13  6:40 ` [PATCH 22/36] db: indicate if the CRC on a buffer is correct or not Dave Chinner
2013-11-13  6:40 ` [PATCH 23/36] db: verify and calculate inode CRCs Dave Chinner
2013-11-13  6:40 ` [PATCH 24/36] db: verify and calculate dquot CRCs Dave Chinner
2013-11-13 16:05   ` Christoph Hellwig
2013-11-13  6:40 ` [PATCH 25/36] db: add a special directory buffer verifier Dave Chinner
2013-11-13  6:40 ` [PATCH 26/36] db: add a special attribute " Dave Chinner
2013-11-13  6:40 ` [PATCH 27/36] db: re-enable write support for v5 filesystems Dave Chinner
2013-11-13  6:40 ` [PATCH 28/36] xfs_db: use inode cluster buffers for inode IO Dave Chinner
2013-11-13  6:40 ` [PATCH 29/36] xfs_db: avoid libxfs buffer lookup warnings Dave Chinner
2013-11-13  6:40 ` [PATCH 30/36] libxfs: work around do_div() not handling 32 bit numerators Dave Chinner
2013-11-13  6:40 ` [PATCH 31/36] db: enable metadump on CRC filesystems Dave Chinner
2013-11-13 16:09   ` Christoph Hellwig
2013-11-13 21:00     ` Dave Chinner
2013-11-14 13:34       ` Christoph Hellwig
2013-11-13  6:40 ` [PATCH 32/36] xfs: support larger inode clusters on v5 filesystems Dave Chinner
2013-11-13  6:40 ` [PATCH 33/36] xfsprogs: kill experimental warnings for " Dave Chinner
2013-11-13  6:40 ` [PATCH 34/36] repair: prefetching is turned off unnecessarily Dave Chinner
2013-11-13  6:40 ` [PATCH 35/36] repair: Increase default repair parallelism on large filesystems Dave Chinner
2013-11-13 16:10   ` Christoph Hellwig
2013-11-13 21:01     ` Dave Chinner
2013-11-13  6:41 ` [PATCH 36/36] repair: fix leaf node directory data check Dave Chinner
2013-11-14 16:18 ` [PATCH 00/36 V5] xfsprogs: CRC write support for xfs_db + Rich Johnston
