All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATH 0/9] xfs: fixes for 3.10-rc4
@ 2013-05-27  6:38 Dave Chinner
  2013-05-27  6:38 ` [PATCH 1/9] xfs: don't emit v5 superblock warnings on write Dave Chinner
                   ` (11 more replies)
  0 siblings, 12 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

Hi Ben,

This is an update for all the fixes I have for 3,10-rc4.

All the comments on the previous patches have been updated in this
series, and there are two new patches. The first patch adds support
for indicating that the filesystem has CRC enabled to userspace, and
the second fixes the missing CRC updates on the unlinked inode list
manipulations.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 1/9] xfs: don't emit v5 superblock warnings on write
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-29 16:39   ` Brian Foster
  2013-05-27  6:38 ` [PATCH 2/9] xfs: fix incorrect remote symlink block count Dave Chinner
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

We write the superblock every 30s or so which results in the
verifier being called. Right now that results in this output
every 30s:

XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
Use of these features in this kernel is at your own risk!

And spamming the logs.

We don't need to check for whether we support v5 superblocks or
whether there are feature bits we don't support set as these are
only relevant when we first mount the filesytem. i.e. on superblock
read. Hence for the write verification we can just skip all the
checks (and hence verbose output) altogether.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index f6bfbd7..e8e310c 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -314,7 +314,8 @@ STATIC int
 xfs_mount_validate_sb(
 	xfs_mount_t	*mp,
 	xfs_sb_t	*sbp,
-	bool		check_inprogress)
+	bool		check_inprogress,
+	bool		check_version)
 {
 
 	/*
@@ -337,9 +338,10 @@ xfs_mount_validate_sb(
 
 	/*
 	 * Version 5 superblock feature mask validation. Reject combinations the
-	 * kernel cannot support up front before checking anything else.
+	 * kernel cannot support up front before checking anything else. For
+	 * write validation, we don't need to check feature masks.
 	 */
-	if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
+	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
 		xfs_alert(mp,
 "Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
 "Use of these features in this kernel is at your own risk!");
@@ -675,7 +677,8 @@ xfs_sb_to_disk(
 
 static int
 xfs_sb_verify(
-	struct xfs_buf	*bp)
+	struct xfs_buf	*bp,
+	bool		check_version)
 {
 	struct xfs_mount *mp = bp->b_target->bt_mount;
 	struct xfs_sb	sb;
@@ -686,7 +689,8 @@ xfs_sb_verify(
 	 * Only check the in progress field for the primary superblock as
 	 * mkfs.xfs doesn't clear it from secondary superblocks.
 	 */
-	return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR);
+	return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR,
+				     check_version);
 }
 
 /*
@@ -719,7 +723,7 @@ xfs_sb_read_verify(
 			goto out_error;
 		}
 	}
-	error = xfs_sb_verify(bp);
+	error = xfs_sb_verify(bp, true);
 
 out_error:
 	if (error) {
@@ -758,7 +762,7 @@ xfs_sb_write_verify(
 	struct xfs_buf_log_item	*bip = bp->b_fspriv;
 	int			error;
 
-	error = xfs_sb_verify(bp);
+	error = xfs_sb_verify(bp, false);
 	if (error) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
 		xfs_buf_ioerror(bp, error);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 2/9] xfs: fix incorrect remote symlink block count
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
  2013-05-27  6:38 ` [PATCH 1/9] xfs: don't emit v5 superblock warnings on write Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-29 16:39   ` Brian Foster
  2013-05-27  6:38 ` [PATCH 3/9] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When CRCs are enabled, the number of blocks needed to hold a remote
symlink on a 1k block size filesystem may be 2 instead of 1. The
transaction reservation for the allocated bloks was not taking this
into account and only allocating one block. hence when trying to
read or invalidate such symlinks, we are mapping a hole where there
should be a block and things go bad at that point.

Fix the reservation to use the correct block count, clean up the
block count calculation similar to the remote attribute calculation,
and add a debug guard to detect when we don't write the entire
symlink to disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_symlink.c |   20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 5f234389..195a403 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -56,16 +56,9 @@ xfs_symlink_blocks(
 	struct xfs_mount *mp,
 	int		pathlen)
 {
-	int		fsblocks = 0;
-	int		len = pathlen;
+	int buflen = XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
 
-	do {
-		fsblocks++;
-		len -= XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
-	} while (len > 0);
-
-	ASSERT(fsblocks <= XFS_SYMLINK_MAPS);
-	return fsblocks;
+	return (pathlen + buflen - 1) / buflen;
 }
 
 static int
@@ -405,7 +398,7 @@ xfs_symlink(
 	if (pathlen <= XFS_LITINO(mp, dp->i_d.di_version))
 		fs_blocks = 0;
 	else
-		fs_blocks = XFS_B_TO_FSB(mp, pathlen);
+		fs_blocks = xfs_symlink_blocks(mp, pathlen);
 	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
 	error = xfs_trans_reserve(tp, resblks, XFS_SYMLINK_LOG_RES(mp), 0,
 			XFS_TRANS_PERM_LOG_RES, XFS_SYMLINK_LOG_COUNT);
@@ -512,7 +505,7 @@ xfs_symlink(
 		cur_chunk = target_path;
 		offset = 0;
 		for (n = 0; n < nmaps; n++) {
-			char *buf;
+			char	*buf;
 
 			d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
 			byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
@@ -525,9 +518,7 @@ xfs_symlink(
 			bp->b_ops = &xfs_symlink_buf_ops;
 
 			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
-			if (pathlen < byte_cnt) {
-				byte_cnt = pathlen;
-			}
+			byte_cnt = min(byte_cnt, pathlen);
 
 			buf = bp->b_addr;
 			buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset,
@@ -542,6 +533,7 @@ xfs_symlink(
 			xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) -
 							(char *)bp->b_addr);
 		}
+		ASSERT(pathlen == 0);
 	}
 
 	/*
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 3/9] xfs: increase number of ACL entries for V5 superblocks
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
  2013-05-27  6:38 ` [PATCH 1/9] xfs: don't emit v5 superblock warnings on write Dave Chinner
  2013-05-27  6:38 ` [PATCH 2/9] xfs: fix incorrect remote symlink block count Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-29 16:40   ` Brian Foster
  2013-05-27  6:38 ` [PATCH 4/9] xfs: rework dquot CRCs Dave Chinner
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format.  For version 5 superblocks, increase it to the maximum nuber
of ACLs that can fit into a single xattr.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_acl.c |   31 ++++++++++++++++++-------------
 fs/xfs/xfs_acl.h |   30 +++++++++++++++++++++++-------
 2 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 1d32f1d..b6bed9b 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -21,6 +21,8 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_inode.h"
 #include "xfs_vnodeops.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
 #include "xfs_trace.h"
 #include <linux/slab.h>
 #include <linux/xattr.h>
@@ -34,7 +36,9 @@
  */
 
 STATIC struct posix_acl *
-xfs_acl_from_disk(struct xfs_acl *aclp)
+xfs_acl_from_disk(
+	struct xfs_acl	*aclp,
+	int		max_entries)
 {
 	struct posix_acl_entry *acl_e;
 	struct posix_acl *acl;
@@ -42,7 +46,7 @@ xfs_acl_from_disk(struct xfs_acl *aclp)
 	unsigned int count, i;
 
 	count = be32_to_cpu(aclp->acl_cnt);
-	if (count > XFS_ACL_MAX_ENTRIES)
+	if (count > max_entries)
 		return ERR_PTR(-EFSCORRUPTED);
 
 	acl = posix_acl_alloc(count, GFP_KERNEL);
@@ -108,7 +112,7 @@ xfs_get_acl(struct inode *inode, int type)
 	struct xfs_inode *ip = XFS_I(inode);
 	struct posix_acl *acl;
 	struct xfs_acl *xfs_acl;
-	int len = sizeof(struct xfs_acl);
+	int len = XFS_ACL_SIZE(ip->i_mount);
 	unsigned char *ea_name;
 	int error;
 
@@ -133,8 +137,8 @@ xfs_get_acl(struct inode *inode, int type)
 	 * If we have a cached ACLs value just return it, not need to
 	 * go out to the disk.
 	 */
-
-	xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
+	len = XFS_ACL_SIZE(ip->i_mount);
+	xfs_acl = kzalloc(len, GFP_KERNEL);
 	if (!xfs_acl)
 		return ERR_PTR(-ENOMEM);
 
@@ -153,7 +157,7 @@ xfs_get_acl(struct inode *inode, int type)
 		goto out;
 	}
 
-	acl = xfs_acl_from_disk(xfs_acl);
+	acl = xfs_acl_from_disk(xfs_acl, XFS_ACL_MAX_ENTRIES(ip->i_mount));
 	if (IS_ERR(acl))
 		goto out;
 
@@ -189,16 +193,17 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
 
 	if (acl) {
 		struct xfs_acl *xfs_acl;
-		int len;
+		int len = XFS_ACL_SIZE(ip->i_mount);
 
-		xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
+		xfs_acl = kzalloc(len, GFP_KERNEL);
 		if (!xfs_acl)
 			return -ENOMEM;
 
 		xfs_acl_to_disk(xfs_acl, acl);
-		len = sizeof(struct xfs_acl) -
-			(sizeof(struct xfs_acl_entry) *
-			 (XFS_ACL_MAX_ENTRIES - acl->a_count));
+
+		/* subtract away the unused acl entries */
+		len -= sizeof(struct xfs_acl_entry) *
+			 (XFS_ACL_MAX_ENTRIES(ip->i_mount) - acl->a_count);
 
 		error = -xfs_attr_set(ip, ea_name, (unsigned char *)xfs_acl,
 				len, ATTR_ROOT);
@@ -243,7 +248,7 @@ xfs_set_mode(struct inode *inode, umode_t mode)
 static int
 xfs_acl_exists(struct inode *inode, unsigned char *name)
 {
-	int len = sizeof(struct xfs_acl);
+	int len = XFS_ACL_SIZE(XFS_M(inode->i_sb));
 
 	return (xfs_attr_get(XFS_I(inode), name, NULL, &len,
 			    ATTR_ROOT|ATTR_KERNOVAL) == 0);
@@ -379,7 +384,7 @@ xfs_xattr_acl_set(struct dentry *dentry, const char *name,
 		goto out_release;
 
 	error = -EINVAL;
-	if (acl->a_count > XFS_ACL_MAX_ENTRIES)
+	if (acl->a_count > XFS_ACL_MAX_ENTRIES(XFS_M(inode->i_sb)))
 		goto out_release;
 
 	if (type == ACL_TYPE_ACCESS) {
diff --git a/fs/xfs/xfs_acl.h b/fs/xfs/xfs_acl.h
index 39632d9..0da8725 100644
--- a/fs/xfs/xfs_acl.h
+++ b/fs/xfs/xfs_acl.h
@@ -22,19 +22,35 @@ struct inode;
 struct posix_acl;
 struct xfs_inode;
 
-#define XFS_ACL_MAX_ENTRIES 25
 #define XFS_ACL_NOT_PRESENT (-1)
 
 /* On-disk XFS access control list structure */
+struct xfs_acl_entry {
+	__be32	ae_tag;
+	__be32	ae_id;
+	__be16	ae_perm;
+	__be16	ae_pad;		/* fill the implicit hole in the structure */
+};
+
 struct xfs_acl {
-	__be32		acl_cnt;
-	struct xfs_acl_entry {
-		__be32	ae_tag;
-		__be32	ae_id;
-		__be16	ae_perm;
-	} acl_entry[XFS_ACL_MAX_ENTRIES];
+	__be32			acl_cnt;
+	struct xfs_acl_entry	acl_entry[0];
 };
 
+/*
+ * The number of ACL entries allowed is defined by the on-disk format.
+ * For v4 superblocks, that is limited to 25 entries. For v5 superblocks, it is
+ * limited only by the maximum size of the xattr that stores the information.
+ */
+#define XFS_ACL_MAX_ENTRIES(mp)	\
+	(xfs_sb_version_hascrc(&mp->m_sb) \
+	   ?  (XATTR_SIZE_MAX - sizeof(__be32)) / sizeof(struct xfs_acl_entry) \
+	   : 25)
+
+#define XFS_ACL_SIZE(mp) \
+	(sizeof(struct xfs_acl) + \
+		sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
+
 /* On-disk XFS extended attribute names */
 #define SGI_ACL_FILE		(unsigned char *)"SGI_ACL_FILE"
 #define SGI_ACL_DEFAULT		(unsigned char *)"SGI_ACL_DEFAULT"
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 4/9] xfs: rework dquot CRCs
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (2 preceding siblings ...)
  2013-05-27  6:38 ` [PATCH 3/9] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-29 18:58   ` Brian Foster
  2013-05-27  6:38 ` [PATCH 5/9] xfs: fix split buffer vector log recovery support Dave Chinner
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.

Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.

Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs arenot calculate don the way
down to disk.

Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.

So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. that
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquuot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dquot.c       |   37 ++++++++++++++++---------------------
 fs/xfs/xfs_log_recover.c |   10 ++++++++++
 fs/xfs/xfs_qm.c          |   36 ++++++++++++++++++++++++++----------
 fs/xfs/xfs_quota.h       |    2 ++
 4 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index a41f8bf..044e97a 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -249,8 +249,11 @@ xfs_qm_init_dquot_blk(
 		d->dd_diskdq.d_version = XFS_DQUOT_VERSION;
 		d->dd_diskdq.d_id = cpu_to_be32(curid);
 		d->dd_diskdq.d_flags = type;
-		if (xfs_sb_version_hascrc(&mp->m_sb))
+		if (xfs_sb_version_hascrc(&mp->m_sb)) {
 			uuid_copy(&d->dd_uuid, &mp->m_sb.sb_uuid);
+			xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
+					 XFS_DQUOT_CRC_OFF);
+		}
 	}
 
 	xfs_trans_dquot_buf(tp, bp,
@@ -286,23 +289,6 @@ xfs_dquot_set_prealloc_limits(struct xfs_dquot *dqp)
 	dqp->q_low_space[XFS_QLOWSP_5_PCNT] = space * 5;
 }
 
-STATIC void
-xfs_dquot_buf_calc_crc(
-	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
-{
-	struct xfs_dqblk	*d = (struct xfs_dqblk *)bp->b_addr;
-	int			i;
-
-	if (!xfs_sb_version_hascrc(&mp->m_sb))
-		return;
-
-	for (i = 0; i < mp->m_quotainfo->qi_dqperchunk; i++, d++) {
-		xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
-				 offsetof(struct xfs_dqblk, dd_crc));
-	}
-}
-
 STATIC bool
 xfs_dquot_buf_verify_crc(
 	struct xfs_mount	*mp,
@@ -328,12 +314,11 @@ xfs_dquot_buf_verify_crc(
 
 	for (i = 0; i < ndquots; i++, d++) {
 		if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk),
-				 offsetof(struct xfs_dqblk, dd_crc)))
+				 XFS_DQUOT_CRC_OFF))
 			return false;
 		if (!uuid_equal(&d->dd_uuid, &mp->m_sb.sb_uuid))
 			return false;
 	}
-
 	return true;
 }
 
@@ -393,6 +378,11 @@ xfs_dquot_buf_read_verify(
 	}
 }
 
+/*
+ * we don't calculate the CRC here as that is done when the dquot is flushed to
+ * the buffer after the update is done. This ensures that the dquot in the
+ * buffer always has an up-to-date CRC value.
+ */
 void
 xfs_dquot_buf_write_verify(
 	struct xfs_buf	*bp)
@@ -404,7 +394,6 @@ xfs_dquot_buf_write_verify(
 		xfs_buf_ioerror(bp, EFSCORRUPTED);
 		return;
 	}
-	xfs_dquot_buf_calc_crc(mp, bp);
 }
 
 const struct xfs_buf_ops xfs_dquot_buf_ops = {
@@ -1151,11 +1140,17 @@ xfs_qm_dqflush(
 	 * copy the lsn into the on-disk dquot now while we have the in memory
 	 * dquot here. This can't be done later in the write verifier as we
 	 * can't get access to the log item at that point in time.
+	 *
+	 * We also calculate the CRC here so that the on-disk dquot in the
+	 * buffer always has a valid CRC. This ensures there is no possibility
+	 * of a dquot without an up-to-date CRC getting to disk.
 	 */
 	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		struct xfs_dqblk *dqb = (struct xfs_dqblk *)ddqp;
 
 		dqb->dd_lsn = cpu_to_be64(dqp->q_logitem.qli_item.li_lsn);
+		xfs_update_cksum((char *)dqb, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
 	}
 
 	/*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 93f03ec..0be37d7 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2255,6 +2255,12 @@ xfs_qm_dqcheck(
 	d->dd_diskdq.d_flags = type;
 	d->dd_diskdq.d_id = cpu_to_be32(id);
 
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		uuid_copy(&d->dd_uuid, &mp->m_sb.sb_uuid);
+		xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
+	}
+
 	return errs;
 }
 
@@ -2782,6 +2788,10 @@ xlog_recover_dquot_pass2(
 	}
 
 	memcpy(ddq, recddq, item->ri_buf[1].i_len);
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		xfs_update_cksum((char *)ddq, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
+	}
 
 	ASSERT(dq_f->qlf_size == 2);
 	ASSERT(bp->b_target->bt_mount == mp);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index f41702b..d181542 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -41,6 +41,7 @@
 #include "xfs_qm.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
+#include "xfs_cksum.h"
 
 /*
  * The global quota manager. There is only one of these for the entire
@@ -839,7 +840,7 @@ xfs_qm_reset_dqcounts(
 	xfs_dqid_t	id,
 	uint		type)
 {
-	xfs_disk_dquot_t	*ddq;
+	struct xfs_dqblk	*dqb;
 	int			j;
 
 	trace_xfs_reset_dqcounts(bp, _RET_IP_);
@@ -853,8 +854,12 @@ xfs_qm_reset_dqcounts(
 	do_div(j, sizeof(xfs_dqblk_t));
 	ASSERT(mp->m_quotainfo->qi_dqperchunk == j);
 #endif
-	ddq = bp->b_addr;
+	dqb = bp->b_addr;
 	for (j = 0; j < mp->m_quotainfo->qi_dqperchunk; j++) {
+		struct xfs_disk_dquot	*ddq;
+
+		ddq =  (struct xfs_disk_dquot *)&dqb[j];
+
 		/*
 		 * Do a sanity check, and if needed, repair the dqblk. Don't
 		 * output any warnings because it's perfectly possible to
@@ -871,7 +876,8 @@ xfs_qm_reset_dqcounts(
 		ddq->d_bwarns = 0;
 		ddq->d_iwarns = 0;
 		ddq->d_rtbwarns = 0;
-		ddq = (xfs_disk_dquot_t *) ((xfs_dqblk_t *)ddq + 1);
+		xfs_update_cksum((char *)&dqb[j], sizeof(struct xfs_dqblk),
+					 XFS_DQUOT_CRC_OFF);
 	}
 }
 
@@ -907,19 +913,29 @@ xfs_qm_dqiter_bufs(
 			      XFS_FSB_TO_DADDR(mp, bno),
 			      mp->m_quotainfo->qi_dqchunklen, 0, &bp,
 			      &xfs_dquot_buf_ops);
-		if (error)
-			break;
 
 		/*
-		 * XXX(hch): need to figure out if it makes sense to validate
-		 *	     the CRC here.
+		 * CRC and validation errors will return a EFSCORRUPTED here. If
+		 * this occurs, re-read without CRC validation so that we can
+		 * repair the damage via xfs_qm_reset_dqcounts(). This process
+		 * will leave a trace in the log indicating corruption has
+		 * been detected.
 		 */
+		if (error == EFSCORRUPTED) {
+			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
+				      XFS_FSB_TO_DADDR(mp, bno),
+				      mp->m_quotainfo->qi_dqchunklen, 0, &bp,
+				      NULL);
+		}
+
+		if (error)
+			break;
+
 		xfs_qm_reset_dqcounts(mp, bp, firstid, type);
 		xfs_buf_delwri_queue(bp, buffer_list);
 		xfs_buf_relse(bp);
-		/*
-		 * goto the next block.
-		 */
+
+		/* goto the next block. */
 		bno++;
 		firstid += mp->m_quotainfo->qi_dqperchunk;
 	}
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index c61e31c..c38068f 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -87,6 +87,8 @@ typedef struct xfs_dqblk {
 	uuid_t		  dd_uuid;	/* location information */
 } xfs_dqblk_t;
 
+#define XFS_DQUOT_CRC_OFF	offsetof(struct xfs_dqblk, dd_crc)
+
 /*
  * flags for q_flags field in the dquot.
  */
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 5/9] xfs: fix split buffer vector log recovery support
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (3 preceding siblings ...)
  2013-05-27  6:38 ` [PATCH 4/9] xfs: rework dquot CRCs Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-29 19:21   ` Mark Tinguely
  2013-05-27  6:38 ` [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

A long time ago in a galaxy far away....

.. the was a commit made to fix some ilinux specific "fragmented
buffer" log recovery problem:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=b29c0bece51da72fb3ff3b61391a391ea54e1603

That problem occurred when a contiguous dirty region of a buffer was
split across across two pages of an unmapped buffer. It's been a
long time since that has been done in XFS, and the changes to log
the entire inode buffers for CRC enabled filesystems has
re-introduced that corner case.

And, of course, it turns out that the above commit didn't actually
fix anything - it just ensured that log recovery is guaranteed to
fail when this situation occurs. And now for the gory details.

xfstest xfs/085 is failing with this assert:

XFS (vdb): bad number of regions (0) in inode log format
XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 1583

Largely undocumented factoid #1: Log recovery depends on all log
buffer format items starting with this format:

struct foo_log_format {
	__uint16_t	type;
	__uint16_t	size;
	....

As recoery uses the size field and assumptions about 32 bit
alignment in decoding format items.  So don't pay much attention to
the fact log recovery thinks that it decoding an inode log format
item - it just uses them to determine what the size of the item is.

But why would it see a log format item with a zero size? Well,
luckily enough xfs_logprint uses the same code and gives the same
error, so with a bit of gdb magic, it turns out that it isn't a log
format that is being decoded. What logprint tells us is this:

Oper (130): tid: a0375e1a  len: 28  clientid: TRANS  flags: none
BUF:  #regs: 2   start blkno: 144 (0x90)  len: 16  bmap size: 2  flags: 0x4000
Oper (131): tid: a0375e1a  len: 4096  clientid: TRANS  flags: none
BUF DATA
----------------------------------------------------------------------------
Oper (132): tid: a0375e1a  len: 4096  clientid: TRANS  flags: none
xfs_logprint: unknown log operation type (4e49)
**********************************************************************
* ERROR: data block=2                                                 *
**********************************************************************

That we've got a buffer format item (oper 130) that has two regions;
the format item itself and one dirty region. The subsequent region
after the buffer format item and it's data is them what we are
tripping over, and the first bytes of it at an inode magic number.
Not a log opheader like there is supposed to be.

That means there's a problem with the buffer format item. It's dirty
data region is 4096 bytes, and it contains - you guessed it -
initialised inodes. But inode buffers are 8k, not 4k, and we log
them in their entirety. So something is wrong here. The buffer
format item contains:

(gdb) p /x *(struct xfs_buf_log_format *)in_f
$22 = {blf_type = 0x123c, blf_size = 0x2, blf_flags = 0x4000,
       blf_len = 0x10, blf_blkno = 0x90, blf_map_size = 0x2,
       blf_data_map = {0xffffffff, 0xffffffff, .... }}

Two regions, and a signle dirty contiguous region of 64 bits.  64 *
128 = 8k, so this should be followed by a single 8k region of data.
And the blf_flags tell us that the type of buffer is a
XFS_BLFT_DINO_BUF. It contains inodes. And because it doesn't have
the XFS_BLF_INODE_BUF flag set, that means it's an inode allocation
buffer. So, it should be followed by 8k of inode data.

But we know that the next region has a header of:

(gdb) p /x *ohead
$25 = {oh_tid = 0x1a5e37a0, oh_len = 0x100000, oh_clientid = 0x69,
       oh_flags = 0x0, oh_res2 = 0x0}

and so be32_to_cpu(oh_len) = 0x1000 = 4096 bytes. It's simply not
long enough to hold all the logged data. There must be another
region. There is - there's a following opheader for another 4k of
data that contains the other half of the inode cluster data - the
one we assert fail on because it's not a log format header.

So why is the second part of the data not being accounted to the
correct buffer log format structure? It took a little more work with
gdb to work out that the buffer log format structure was both
expecting it to be there but hadn't accounted for it. It was at that
point I went to the kernel code, as clearly this wasn't a bug in
xfs_logprint and the kernel was writing bad stuff to the log.

First port of call was the buffer item formatting code, and the
discontiguous memory/contiguous dirty region handling code
immediately stood out. I've wondered for a long time why the code
had this comment in it:

                        vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
                        vecp->i_len = nbits * XFS_BLF_CHUNK;
                        vecp->i_type = XLOG_REG_TYPE_BCHUNK;
/*
 * You would think we need to bump the nvecs here too, but we do not
 * this number is used by recovery, and it gets confused by the boundary
 * split here
 *                      nvecs++;
 */
                        vecp++;

And it didn't account for the extra vector pointer. The case being
handled here is that a contiguous dirty region lies across a
boundary that cannot be memcpy()d across, and so has to be split
into two separate operations for xlog_write() to perform.

What this code assumes is that what is written to the log is two
consecutive blocks of data that are accounted in the buf log format
item as the same contiguous dirty region and so will get decoded as
such by the log recovery code.

The thing is, xlog_write() knows nothing about this, and so just
does it's normal thing of adding an opheader for each vector. That
means the 8k region gets written to the log as two separate regions
of 4k each, but because nvecs has not been incremented, the buf log
format item accounts for only one of them.

Hence when we come to log recovery, we process the first 4k region
and then expect to come across a new item that starts with a log
format structure of some kind that tells us whenteh next data is
going to be. Instead, we hit raw buffer data and things go bad real
quick.

So, the commit from 2002 that commented out nvecs++ is just plain
wrong. It breaks log recovery completely, and it would seem the only
reason this hasn't been since then is that we don't log large
contigous regions of multi-page unmapped buffers very often. Never
would be a closer estimate, at least until the CRC code came along....

So, lets fix that by restoring the nvecs accounting for the extra
region when we hit this case.....

.... and there's the problemin log recovery it is apparently working
around:

XFS: Assertion failed: i == item->ri_total, file: fs/xfs/xfs_log_recover.c, line: 2135

Yup, xlog_recover_do_reg_buffer() doesn't handle contigous dirty
regions being broken up into multiple regions by the log formatting
code. That's an easy fix, though - if the number of contiguous dirty
bits exceeds the length of the region being copied out of the log,
only account for the number of dirty bits that region covers, and
then loop again and copy more from the next region. It's a 2 line
fix.

Now xfstests xfs/085 passes, we have one less piece of mystery
code, and one more important piece of knowledge about how to
structure new log format items..

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c    |    7 +------
 fs/xfs/xfs_log_recover.c |   11 +++++++++++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index cf26347..4ec4317 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -262,12 +262,7 @@ xfs_buf_item_format_segment(
 			vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
 			vecp->i_len = nbits * XFS_BLF_CHUNK;
 			vecp->i_type = XLOG_REG_TYPE_BCHUNK;
-/*
- * You would think we need to bump the nvecs here too, but we do not
- * this number is used by recovery, and it gets confused by the boundary
- * split here
- *			nvecs++;
- */
+			nvecs++;
 			vecp++;
 			first_bit = next_bit;
 			last_bit = next_bit;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 0be37d7..d6204d1 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2097,6 +2097,17 @@ xlog_recover_do_reg_buffer(
 		       ((uint)bit << XFS_BLF_SHIFT) + (nbits << XFS_BLF_SHIFT));
 
 		/*
+		 * The dirty regions logged in the buffer, even though
+		 * contiguous, may span multiple chunks. This is because the
+		 * dirty region may span a physical page boundary in a buffer
+		 * and hence be split into two separate vectors for writing into
+		 * the log. Hence we need to trim nbits back to the length of
+		 * the current region being copied out of the log.
+		 */
+		if (item->ri_buf[i].i_len < (nbits << XFS_BLF_SHIFT))
+			nbits = item->ri_buf[i].i_len >> XFS_BLF_SHIFT;
+
+		/*
 		 * Do a sanity check if this is a dquot buffer. Just checking
 		 * the first dquot in the buffer should do. XXXThis is
 		 * probably a good thing to do for other buf types also.
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (4 preceding siblings ...)
  2013-05-27  6:38 ` [PATCH 5/9] xfs: fix split buffer vector log recovery support Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-28 21:49   ` Ben Myers
  2013-05-29 21:06   ` Brian Foster
  2013-05-27  6:38 ` [PATCH 7/9] xfs: kill suid/sgid through the truncate path Dave Chinner
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Currently, swapping extents from one inode to another is a simple
act of switching data and attribute forks from one inode to another.
This, unfortunately in no longer so simple with CRC enabled
filesystems as there is owner information embedded into the BMBT
blocks that are swapped between inodes. Hence swapping the forks
between inodes results in the inodes having mapping blocks that
point to the wrong owner and hence are considered corrupt.

To fix this we need an extent tree block or record based swap
algorithm so that the BMBT block owner information can be updated
atomically in the swap transaction. This is a significant piece of
new work, so for the moment simply don't allow swap extent
operations to succeed on CRC enabled filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dfrag.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
index f852b08..c407e1c 100644
--- a/fs/xfs/xfs_dfrag.c
+++ b/fs/xfs/xfs_dfrag.c
@@ -219,6 +219,14 @@ xfs_swap_extents(
 	int		taforkblks = 0;
 	__uint64_t	tmp;
 
+	/*
+	 * We have no way of updating owner information in the BMBT blocks for
+	 * each inode on CRC enabled filesystems, so to avoid corrupting the
+	 * this metadata we simply don't allow extent swaps to occur.
+	 */
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		return XFS_ERROR(EINVAL);
+
 	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
 	if (!tempifp) {
 		error = XFS_ERROR(ENOMEM);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 7/9] xfs: kill suid/sgid through the truncate path.
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (5 preceding siblings ...)
  2013-05-27  6:38 ` [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-30 14:17   ` Brian Foster
  2013-05-27  6:38 ` [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support Dave Chinner
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

XFS has failed to kill suid/sgid bits correctly when truncating
files of non-zero size since commit c4ed4243 ("xfs: split
xfs_setattr") introduced in the 3.1 kernel. Fix it.

Fix it.

cc: stable kernel <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iops.c |   47 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d82efaa..ca9ecaa 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -455,6 +455,28 @@ xfs_vn_getattr(
 	return 0;
 }
 
+static void
+xfs_setattr_mode(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	struct iattr		*iattr)
+{
+	struct inode	*inode = VFS_I(ip);
+	umode_t		mode = iattr->ia_mode;
+
+	ASSERT(tp);
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
+	if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+		mode &= ~S_ISGID;
+
+	ip->i_d.di_mode &= S_IFMT;
+	ip->i_d.di_mode |= mode & ~S_IFMT;
+
+	inode->i_mode &= S_IFMT;
+	inode->i_mode |= mode & ~S_IFMT;
+}
+
 int
 xfs_setattr_nonsize(
 	struct xfs_inode	*ip,
@@ -606,18 +628,8 @@ xfs_setattr_nonsize(
 	/*
 	 * Change file access modes.
 	 */
-	if (mask & ATTR_MODE) {
-		umode_t mode = iattr->ia_mode;
-
-		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
-			mode &= ~S_ISGID;
-
-		ip->i_d.di_mode &= S_IFMT;
-		ip->i_d.di_mode |= mode & ~S_IFMT;
-
-		inode->i_mode &= S_IFMT;
-		inode->i_mode |= mode & ~S_IFMT;
-	}
+	if (mask & ATTR_MODE)
+		xfs_setattr_mode(tp, ip, iattr);
 
 	/*
 	 * Change file access or modified times.
@@ -714,9 +726,8 @@ xfs_setattr_size(
 		return XFS_ERROR(error);
 
 	ASSERT(S_ISREG(ip->i_d.di_mode));
-	ASSERT((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
-			ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
-			ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
+	ASSERT((mask & (ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
+			ATTR_MTIME_SET|ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
 
 	if (!(flags & XFS_ATTR_NOLOCK)) {
 		lock_flags |= XFS_IOLOCK_EXCL;
@@ -860,6 +871,12 @@ xfs_setattr_size(
 		xfs_inode_clear_eofblocks_tag(ip);
 	}
 
+	/*
+	 * Change file access modes.
+	 */
+	if (mask & ATTR_MODE)
+		xfs_setattr_mode(tp, ip, iattr);
+
 	if (mask & ATTR_CTIME) {
 		inode->i_ctime = iattr->ia_ctime;
 		ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (6 preceding siblings ...)
  2013-05-27  6:38 ` [PATCH 7/9] xfs: kill suid/sgid through the truncate path Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-29 15:10   ` Eric Sandeen
  2013-05-30 14:17   ` Brian Foster
  2013-05-27  6:38 ` [PATCH 9/9] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Currently userspace has no way of determining that a filesystem is
CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
indicate that the filesystem has v5 superblock support enabled.
This will allow xfs_info to correctly report the state of the
filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fs.h    |    1 +
 fs/xfs/xfs_fsops.c |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index 6dda3f9..d046955 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -236,6 +236,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_PROJID32	0x0800  /* 32-bit project IDs	*/
 #define XFS_FSOP_GEOM_FLAGS_DIRV2CI	0x1000	/* ASCII only CI names	*/
 #define XFS_FSOP_GEOM_FLAGS_LAZYSB	0x4000	/* lazy superblock counters */
+#define XFS_FSOP_GEOM_FLAGS_V5SB	0x8000	/* version 5 superblock */
 
 
 /*
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 87595b2..3c3644e 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -99,7 +99,9 @@ xfs_fs_geometry(
 			(xfs_sb_version_hasattr2(&mp->m_sb) ?
 				XFS_FSOP_GEOM_FLAGS_ATTR2 : 0) |
 			(xfs_sb_version_hasprojid32bit(&mp->m_sb) ?
-				XFS_FSOP_GEOM_FLAGS_PROJID32 : 0);
+				XFS_FSOP_GEOM_FLAGS_PROJID32 : 0) |
+			(xfs_sb_version_hascrc(&mp->m_sb) ?
+				XFS_FSOP_GEOM_FLAGS_V5SB : 0);
 		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
 				mp->m_sb.sb_logsectsize : BBSIZE;
 		geo->rtsectsize = mp->m_sb.sb_blocksize;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 9/9] xfs: inode unlinked list needs to recalculate the inode CRC
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (7 preceding siblings ...)
  2013-05-27  6:38 ` [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support Dave Chinner
@ 2013-05-27  6:38 ` Dave Chinner
  2013-05-28 11:51   ` Dave Chinner
  2013-05-28 20:36   ` [PATCH 9a,9b v2, replacements] xfs: unlinked list crcs Dave Chinner
  2013-05-28  8:37 ` [PATCH 10/9] xfs: fix dir3 freespace block corruption Dave Chinner
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-27  6:38 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode. This is trivial to do, and the new CRC gets logged with
the inode core so on replay of the unlinked list operations this
will be updated as expected and remain consistent.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c       |   49 ++++++++++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_log_recover.c |   22 +++++++++++++++++++++
 2 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index efbe1ac..f4daadf 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1579,6 +1579,23 @@ out_bmap_cancel:
 }
 
 /*
+ * Helper function to calculate range of the inode to log and recalculate the
+ * on-disk inode crc if necessary.
+ */
+static int
+xfs_iunlink_dinode_calc_crc(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dip)
+{
+	if (dip->di_version < 3)
+		return sizeof(xfs_agino_t);
+
+	xfs_dinode_calc_crc(mp, dip);
+	return offsetof(struct xfs_dinode, di_changecount) -
+		offsetof(struct xfs_dinode, di_next_unlinked);
+}
+
+/*
  * This is called when the inode's link count goes to 0.
  * We place the on-disk inode on a list in the AGI.  It
  * will be pulled from this list when the inode is freed.
@@ -1623,6 +1640,8 @@ xfs_iunlink(
 	ASSERT(be32_to_cpu(agi->agi_unlinked[bucket_index]) != agino);
 
 	if (agi->agi_unlinked[bucket_index] != cpu_to_be32(NULLAGINO)) {
+		int	size;
+
 		/*
 		 * There is already another inode in the bucket we need
 		 * to add ourselves to.  Add us at the front of the list.
@@ -1638,10 +1657,14 @@ xfs_iunlink(
 		dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
 		offset = ip->i_imap.im_boffset +
 			offsetof(xfs_dinode_t, di_next_unlinked);
+
+		/* need to recalc and log the inode CRC if appropriate */
+		size = xfs_iunlink_dinode_calc_crc(mp, dip);
+
 		xfs_trans_inode_buf(tp, ibp);
-		xfs_trans_log_buf(tp, ibp, offset,
-				  (offset + sizeof(xfs_agino_t) - 1));
+		xfs_trans_log_buf(tp, ibp, offset, offset + size - 1);
 		xfs_inobp_check(mp, ibp);
+
 	}
 
 	/*
@@ -1678,6 +1701,7 @@ xfs_iunlink_remove(
 	short		bucket_index;
 	int		offset, last_offset = 0;
 	int		error;
+	int		size;
 
 	mp = tp->t_mountp;
 	agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
@@ -1723,9 +1747,12 @@ xfs_iunlink_remove(
 			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			offset = ip->i_imap.im_boffset +
 				offsetof(xfs_dinode_t, di_next_unlinked);
+
+			/* need to recalc and log the inode CRC if appropriate */
+			size = xfs_iunlink_dinode_calc_crc(mp, dip);
+
 			xfs_trans_inode_buf(tp, ibp);
-			xfs_trans_log_buf(tp, ibp, offset,
-					  (offset + sizeof(xfs_agino_t) - 1));
+			xfs_trans_log_buf(tp, ibp, offset, offset + size - 1);
 			xfs_inobp_check(mp, ibp);
 		} else {
 			xfs_trans_brelse(tp, ibp);
@@ -1796,9 +1823,12 @@ xfs_iunlink_remove(
 			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			offset = ip->i_imap.im_boffset +
 				offsetof(xfs_dinode_t, di_next_unlinked);
+
+			/* need to recalc and log the inode CRC if appropriate */
+			size = xfs_iunlink_dinode_calc_crc(mp, dip);
+
 			xfs_trans_inode_buf(tp, ibp);
-			xfs_trans_log_buf(tp, ibp, offset,
-					  (offset + sizeof(xfs_agino_t) - 1));
+			xfs_trans_log_buf(tp, ibp, offset, offset + size - 1);
 			xfs_inobp_check(mp, ibp);
 		} else {
 			xfs_trans_brelse(tp, ibp);
@@ -1809,9 +1839,12 @@ xfs_iunlink_remove(
 		last_dip->di_next_unlinked = cpu_to_be32(next_agino);
 		ASSERT(next_agino != 0);
 		offset = last_offset + offsetof(xfs_dinode_t, di_next_unlinked);
+
+		/* need to recalc and log the inode CRC if appropriate */
+		size = xfs_iunlink_dinode_calc_crc(mp, dip);
+
 		xfs_trans_inode_buf(tp, last_ibp);
-		xfs_trans_log_buf(tp, last_ibp, offset,
-				  (offset + sizeof(xfs_agino_t) - 1));
+		xfs_trans_log_buf(tp, last_ibp, offset, offset + size - 1);
 		xfs_inobp_check(mp, last_ibp);
 	}
 	return 0;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index d6204d1..285ad16 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1861,6 +1861,28 @@ xlog_recover_do_inode_buffer(
 		buffer_nextp = (xfs_agino_t *)xfs_buf_offset(bp,
 					      next_unlinked_offset);
 		*buffer_nextp = *logged_nextp;
+
+		/*
+		 * Check if we have to recover the CRC, too.i The CRC is
+		 * adjacent to the unlinked field, so should always be in the
+		 * same dirty region.
+		 */
+		if (xfs_sb_version_hascrc(&mp->m_sb)) {
+			int	crc_off;
+			__le32	*buffer_crcp;
+			__le32	*logged_crcp;
+
+			crc_off = next_unlinked_offset +
+				offsetof(struct xfs_dinode, di_crc) -
+				offsetof(struct xfs_dinode, di_next_unlinked);
+
+			buffer_crcp = (__le32 *)xfs_buf_offset(bp, crc_off);
+			logged_crcp = item->ri_buf[item_index].i_addr +
+					crc_off - reg_buf_offset;
+
+			*buffer_crcp = *logged_crcp;
+		}
+
 	}
 
 	return 0;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 10/9] xfs: fix dir3 freespace block corruption
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (8 preceding siblings ...)
  2013-05-27  6:38 ` [PATCH 9/9] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
@ 2013-05-28  8:37 ` Dave Chinner
  2013-05-30 19:15   ` Ben Myers
  2013-05-28 17:56 ` [PATH 0/9] xfs: fixes for 3.10-rc4 Ben Myers
  2013-05-28 21:27 ` [PATCH 11/9] xfs: fix remote attribute invalidation for a leaf Dave Chinner
  11 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-28  8:37 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When the directory freespace index grows to a second block (2017
4k data blocks in the directory), the initialisation of the second
new block header goes wrong. The write verifier fires a corruption
error indicating that the block number in the header is zero. This
was being tripped by xfs/110.

The problem is that the initialisation of the new block is done just
fine in xfs_dir3_free_get_buf(), but the caller then users a dirv2
structure to zero on-disk header fields that xfs_dir3_free_get_buf()
has already zeroed. These lined up with the block number in the dir
v3 header format.

While looking at this, I noticed that the struct xfs_dir3_free_hdr()
had 4 bytes of padding in it that wasn't defined as padding or being
zeroed by the initialisation. Add a pad field declaration and fully
zero the on disk and in-core headers in xfs_dir3_free_get_buf() so
that this is never an issue in the future. Note that this doesn't
change the on-disk layout, just makes the 32 bits of padding in the
layout explicit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dir2_format.h |    1 +
 fs/xfs/xfs_dir2_node.c   |   13 ++++++-------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_dir2_format.h b/fs/xfs/xfs_dir2_format.h
index a3b1bd8..995f1f5 100644
--- a/fs/xfs/xfs_dir2_format.h
+++ b/fs/xfs/xfs_dir2_format.h
@@ -715,6 +715,7 @@ struct xfs_dir3_free_hdr {
 	__be32			firstdb;	/* db of first entry */
 	__be32			nvalid;		/* count of valid entries */
 	__be32			nused;		/* count of used entries */
+	__be32			pad;		/* 64 bit alignment. */
 };
 
 struct xfs_dir3_free {
diff --git a/fs/xfs/xfs_dir2_node.c b/fs/xfs/xfs_dir2_node.c
index 5246de4..2226a00 100644
--- a/fs/xfs/xfs_dir2_node.c
+++ b/fs/xfs/xfs_dir2_node.c
@@ -263,18 +263,19 @@ xfs_dir3_free_get_buf(
 	 * Initialize the new block to be empty, and remember
 	 * its first slot as our empty slot.
 	 */
-	hdr.magic = XFS_DIR2_FREE_MAGIC;
-	hdr.firstdb = 0;
-	hdr.nused = 0;
-	hdr.nvalid = 0;
+	memset(bp->b_addr, 0, sizeof(struct xfs_dir3_free_hdr));
+	memset(&hdr, 0, sizeof(hdr));
+
 	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		struct xfs_dir3_free_hdr *hdr3 = bp->b_addr;
 
 		hdr.magic = XFS_DIR3_FREE_MAGIC;
+
 		hdr3->hdr.blkno = cpu_to_be64(bp->b_bn);
 		hdr3->hdr.owner = cpu_to_be64(dp->i_ino);
 		uuid_copy(&hdr3->hdr.uuid, &mp->m_sb.sb_uuid);
-	}
+	} else
+		hdr.magic = XFS_DIR2_FREE_MAGIC;
 	xfs_dir3_free_hdr_to_disk(bp->b_addr, &hdr);
 	*bpp = bp;
 	return 0;
@@ -1921,8 +1922,6 @@ xfs_dir2_node_addname_int(
 			 */
 			freehdr.firstdb = (fbno - XFS_DIR2_FREE_FIRSTDB(mp)) *
 					xfs_dir3_free_max_bests(mp);
-			free->hdr.nvalid = 0;
-			free->hdr.nused = 0;
 		} else {
 			free = fbp->b_addr;
 			bests = xfs_dir3_free_bests_p(mp, free);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 9/9] xfs: inode unlinked list needs to recalculate the inode CRC
  2013-05-27  6:38 ` [PATCH 9/9] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
@ 2013-05-28 11:51   ` Dave Chinner
  2013-05-28 20:36   ` [PATCH 9a,9b v2, replacements] xfs: unlinked list crcs Dave Chinner
  1 sibling, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-28 11:51 UTC (permalink / raw)
  To: xfs

On Mon, May 27, 2013 at 04:38:27PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The inode unlinked list manipulations operate directly on the inode
> buffer, and so bypass the inode CRC calculation mechanisms. Hence an
> inode on the unlinked list has an invalid CRC. Fix this by
> recalculating the CRC whenever we modify an unlinked list pointer in
> an inode. This is trivial to do, and the new CRC gets logged with
> the inode core so on replay of the unlinked list operations this
> will be updated as expected and remain consistent.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I need to rework this fix.

The fact that it actually requires log recovery to follow it's
documented recovery ordering constraints has exposed a bug in those
the log recovery transaction item re-ordering code. The bug has
been there since inode chunk deletion was introduced in 2003.

It has also shown up that there is no guarantee that the on-disk
inode in the buffer is actually uptodate, so recording the CRC in
the transaction for recovery is the wrong thing to do because it may
be different to what recovery ends up with after reordering all the
modifications. So the correct thing to do is to have recovery
recalculate the on-disk inode CRC are modifying the unlinked list...

So, there's 2-3 patches that need to replace this one....

Thank xfs/121 and the icreate patch set for finding this mess.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATH 0/9] xfs: fixes for 3.10-rc4
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (9 preceding siblings ...)
  2013-05-28  8:37 ` [PATCH 10/9] xfs: fix dir3 freespace block corruption Dave Chinner
@ 2013-05-28 17:56 ` Ben Myers
  2013-05-28 23:54   ` Dave Chinner
  2013-05-28 21:27 ` [PATCH 11/9] xfs: fix remote attribute invalidation for a leaf Dave Chinner
  11 siblings, 1 reply; 54+ messages in thread
From: Ben Myers @ 2013-05-28 17:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Mon, May 27, 2013 at 04:38:18PM +1000, Dave Chinner wrote:
> This is an update for all the fixes I have for 3,10-rc4.
> 
> All the comments on the previous patches have been updated in this
> series, and there are two new patches. The first patch adds support
> for indicating that the filesystem has CRC enabled to userspace, and
> the second fixes the missing CRC updates on the unlinked inode list
> manipulations.

This seems to be getting a bit out of hand for a release candidate pull
request:

Fixes for xfs without crcs:
f648167f xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
	 xfs: fix split buffer vector log recovery support
	 xfs: kill suid/sgid through the truncate path.

Fixes for xfs with crcs:
90253cf1 xfs: remote attribute allocation may be contiguous 
913e96bc xfs: remote attribute read too short
4af3644c xfs: remote attribute tail zeroing does too much
6863ef84 xfs: correctly map remote attr buffers during removal
8517de2a xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
d4c712bc xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
ad1858d7 xfs: rework remote attr CRCs
9a5b6dee xfs: fix incorrect remote symlink block count
	 xfs: rework dquot CRCs
	 xfs: inode unlinked list needs to recalculate the inode CRC 	* replaced with "2-3 patches"
	 xfs: fix dir3 freespace block corruption			* maybe this effects filesystems w/o crcs?

Dev work for xfs with crcs:
	 xfs: increase number of ACL entries for V5 superblocks
	 xfs: don't emit v5 superblock warnings on write
	 xfs: add fsgeom flag for v5 superblock support.
	 xfs: disable swap extents ioctl on CRC enabled filesystems

We're representing 3 patches which are clearly approprate for a release
candidate, 11 fixes for CRCs + 2 more for the rework of unlinked lists patch,
and 4 patches which are development work.

We pulled in the CRC work with the understanding that it is an experimental
feature that might not be perfect, and with a focus upon preventing regression
for existing users.  This did not imply that we're going after perfection for
CRCs during the 3.10 release candidate series.

I'll be happy to review the 17 CRC related patches for inclusion in the master
branch ASAP, but I'm afraid 17 patches is a bit much to ask for a release
candidate, even if it were -rc2.  Here we are in 3.11/feature-bit territory.
I'm going to focus on the top 3 for -rc4, and it also looks like 'xfs: fix dir3
freespace block corruption' may be a good candidate.  If there are others that
can cause trouble for existing users I'll appreciate it if you'll let me know.

Thanks,
Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 9a,9b v2, replacements] xfs: unlinked list crcs
  2013-05-27  6:38 ` [PATCH 9/9] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
  2013-05-28 11:51   ` Dave Chinner
@ 2013-05-28 20:36   ` Dave Chinner
  2013-05-28 20:36     ` [PATCH 1/2] xfs: fix log recovery transaction item reordering Dave Chinner
  2013-05-28 20:36     ` [PATCH 2/2] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
  1 sibling, 2 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-28 20:36 UTC (permalink / raw)
  To: xfs; +Cc: bpm

Hi Ben,

These are the replacement patches for the unlinked inode list CRC
fixes. The first patch is the log recovery item ordering fix, which
affects all filesystems, and the second is the reworked unlinked
inode list CRC fixes.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 1/2] xfs: fix log recovery transaction item reordering
  2013-05-28 20:36   ` [PATCH 9a,9b v2, replacements] xfs: unlinked list crcs Dave Chinner
@ 2013-05-28 20:36     ` Dave Chinner
  2013-05-28 20:36     ` [PATCH 2/2] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
  1 sibling, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-28 20:36 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

There are several constraints that inode allocation and unlink
logging impose on log recovery. These all stem from the fact that
inode alloc/unlink are logged in buffers, but all other inode
changes are logged in inode items. Hence there are ordering
constraints that recovery must follow to ensure the correct result
occurs.

As it turns out, this ordering has been working mostly by chance
than good management. The existing code moves all buffers except
cancelled buffers to the head of the list, and everything else to
the tail of the list. The problem with this is that is interleaves
inode items with the buffer cancellation items, and hence whether
the inode item in an cancelled buffer gets replayed is essentially
left to chance.

Further, this ordering causes problems for log recovery when inode
CRCs are enabled. It typically replays the inode unlink buffer long before
it replays the inode core changes, and so the CRC recorded in an
unlink buffer is going to be invalid and hence any attempt to
validate the inode in the buffer is going to fail. Hence we really
need to enforce the ordering that the inode alloc/unlink code has
expected log recovery to have since inode chunk de-allocation was
introduced back in 2003...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c |   65 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index d6204d1..83088d9 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1599,10 +1599,43 @@ xlog_recover_add_to_trans(
 }
 
 /*
- * Sort the log items in the transaction. Cancelled buffers need
- * to be put first so they are processed before any items that might
- * modify the buffers. If they are cancelled, then the modifications
- * don't need to be replayed.
+ * Sort the log items in the transaction.
+ *
+ * The ordering constraints are defined by the inode allocation and unlink
+ * behaviour. The rules are:
+ *
+ *	1. Every item is only logged once in a given transaction. Hence it
+ *	   represents the last logged state of the item. Hence ordering is
+ *	   dependent on the order in which operations need to be performed so
+ *	   required initial conditions are always met.
+ *
+ *	2. Cancelled buffers are recorded in pass 1 in a separate table and
+ *	   there's nothing to replay from them so we can simply cull them
+ *	   from the transaction. However, we can't do that until after we've
+ *	   replayed all the other items because they may be dependent on the
+ *	   cancelled buffer and replaying the cancelled buffer can remove it
+ *	   form the cancelled buffer table. Hence they have tobe done last.
+ *
+ *	3. Inode allocation buffers must be replayed before inode items that
+ *	   read the buffer and replay changes into it.
+ *
+ *	4. Inode unlink buffers must be replayed after inode items are replayed.
+ *	   This ensures that inodes are completely flushed to the inode buffer
+ *	   in a "free" state before we remove the unlinked inode list pointer.
+ *
+ * Hence the ordering needs to be inode allocation buffers first, inode items
+ * second, inode unlink buffers third and cancelled buffers last.
+ *
+ * But there's a problem with that - we can't tell an inode allocation buffer
+ * apart from a regular buffer, so we can't separate them. We can, however,
+ * tell an inode unlink buffer from the others, and so we can separate them out
+ * from all the other buffers and move them to last.
+ *
+ * Hence, 4 lists, in order from head to tail:
+ * 	- buffer_list for all buffers except cancelled/inode unlink buffers
+ * 	- item_list for all non-buffer items
+ * 	- inode_buffer_list for inode unlink buffers
+ * 	- cancel_list for the cancelled buffers
  */
 STATIC int
 xlog_recover_reorder_trans(
@@ -1612,6 +1645,10 @@ xlog_recover_reorder_trans(
 {
 	xlog_recover_item_t	*item, *n;
 	LIST_HEAD(sort_list);
+	LIST_HEAD(cancel_list);
+	LIST_HEAD(buffer_list);
+	LIST_HEAD(inode_buffer_list);
+	LIST_HEAD(inode_list);
 
 	list_splice_init(&trans->r_itemq, &sort_list);
 	list_for_each_entry_safe(item, n, &sort_list, ri_list) {
@@ -1619,12 +1656,18 @@ xlog_recover_reorder_trans(
 
 		switch (ITEM_TYPE(item)) {
 		case XFS_LI_BUF:
-			if (!(buf_f->blf_flags & XFS_BLF_CANCEL)) {
+			if (buf_f->blf_flags & XFS_BLF_CANCEL) {
 				trace_xfs_log_recover_item_reorder_head(log,
 							trans, item, pass);
-				list_move(&item->ri_list, &trans->r_itemq);
+				list_move(&item->ri_list, &cancel_list);
 				break;
 			}
+			if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
+				list_move(&item->ri_list, &inode_buffer_list);
+				break;
+			}
+			list_move_tail(&item->ri_list, &buffer_list);
+			break;
 		case XFS_LI_INODE:
 		case XFS_LI_DQUOT:
 		case XFS_LI_QUOTAOFF:
@@ -1632,7 +1675,7 @@ xlog_recover_reorder_trans(
 		case XFS_LI_EFI:
 			trace_xfs_log_recover_item_reorder_tail(log,
 							trans, item, pass);
-			list_move_tail(&item->ri_list, &trans->r_itemq);
+			list_move_tail(&item->ri_list, &inode_list);
 			break;
 		default:
 			xfs_warn(log->l_mp,
@@ -1643,6 +1686,14 @@ xlog_recover_reorder_trans(
 		}
 	}
 	ASSERT(list_empty(&sort_list));
+	if (!list_empty(&buffer_list))
+		list_splice(&buffer_list, &trans->r_itemq);
+	if (!list_empty(&inode_list))
+		list_splice_tail(&inode_list, &trans->r_itemq);
+	if (!list_empty(&inode_buffer_list))
+		list_splice_tail(&inode_buffer_list, &trans->r_itemq);
+	if (!list_empty(&cancel_list))
+		list_splice_tail(&cancel_list, &trans->r_itemq);
 	return 0;
 }
 
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 2/2] xfs: inode unlinked list needs to recalculate the inode CRC
  2013-05-28 20:36   ` [PATCH 9a,9b v2, replacements] xfs: unlinked list crcs Dave Chinner
  2013-05-28 20:36     ` [PATCH 1/2] xfs: fix log recovery transaction item reordering Dave Chinner
@ 2013-05-28 20:36     ` Dave Chinner
  2013-05-30 14:17       ` Brian Foster
  1 sibling, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-28 20:36 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode, ncluding during log recovery. This is trivial to do and
results in  unlinked list operations always leaving a consistent
inode in the buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c       |   42 ++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_log_recover.c |    9 +++++++++
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index efbe1ac..2d993e7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1579,6 +1579,23 @@ out_bmap_cancel:
 }
 
 /*
+ * Helper function to calculate range of the inode to log and recalculate the
+ * on-disk inode crc if necessary.
+ */
+static int
+xfs_iunlink_dinode_calc_crc(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dip)
+{
+	if (dip->di_version < 3)
+		return sizeof(xfs_agino_t);
+
+	xfs_dinode_calc_crc(mp, dip);
+	return offsetof(struct xfs_dinode, di_changecount) -
+		offsetof(struct xfs_dinode, di_next_unlinked);
+}
+
+/*
  * This is called when the inode's link count goes to 0.
  * We place the on-disk inode on a list in the AGI.  It
  * will be pulled from this list when the inode is freed.
@@ -1638,10 +1655,15 @@ xfs_iunlink(
 		dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
 		offset = ip->i_imap.im_boffset +
 			offsetof(xfs_dinode_t, di_next_unlinked);
+
+		/* need to recalc the inode CRC if appropriate */
+		xfs_iunlink_dinode_calc_crc(mp, dip);
+
 		xfs_trans_inode_buf(tp, ibp);
 		xfs_trans_log_buf(tp, ibp, offset,
-				  (offset + sizeof(xfs_agino_t) - 1));
+				  offset + sizeof(xfs_agino_t) - 1);
 		xfs_inobp_check(mp, ibp);
+
 	}
 
 	/*
@@ -1723,9 +1745,13 @@ xfs_iunlink_remove(
 			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			offset = ip->i_imap.im_boffset +
 				offsetof(xfs_dinode_t, di_next_unlinked);
+
+			/* need to recalc the inode CRC if appropriate */
+			xfs_iunlink_dinode_calc_crc(mp, dip);
+
 			xfs_trans_inode_buf(tp, ibp);
 			xfs_trans_log_buf(tp, ibp, offset,
-					  (offset + sizeof(xfs_agino_t) - 1));
+					  offset + sizeof(xfs_agino_t) - 1);
 			xfs_inobp_check(mp, ibp);
 		} else {
 			xfs_trans_brelse(tp, ibp);
@@ -1796,9 +1822,13 @@ xfs_iunlink_remove(
 			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			offset = ip->i_imap.im_boffset +
 				offsetof(xfs_dinode_t, di_next_unlinked);
+
+			/* need to recalc the inode CRC if appropriate */
+			xfs_iunlink_dinode_calc_crc(mp, dip);
+
 			xfs_trans_inode_buf(tp, ibp);
 			xfs_trans_log_buf(tp, ibp, offset,
-					  (offset + sizeof(xfs_agino_t) - 1));
+					  offset + sizeof(xfs_agino_t) - 1);
 			xfs_inobp_check(mp, ibp);
 		} else {
 			xfs_trans_brelse(tp, ibp);
@@ -1809,9 +1839,13 @@ xfs_iunlink_remove(
 		last_dip->di_next_unlinked = cpu_to_be32(next_agino);
 		ASSERT(next_agino != 0);
 		offset = last_offset + offsetof(xfs_dinode_t, di_next_unlinked);
+
+		/* need to recalc the inode CRC if appropriate */
+		xfs_iunlink_dinode_calc_crc(mp, dip);
+
 		xfs_trans_inode_buf(tp, last_ibp);
 		xfs_trans_log_buf(tp, last_ibp, offset,
-				  (offset + sizeof(xfs_agino_t) - 1));
+				  offset + sizeof(xfs_agino_t) - 1);
 		xfs_inobp_check(mp, last_ibp);
 	}
 	return 0;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 83088d9..45a85ff 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1912,6 +1912,15 @@ xlog_recover_do_inode_buffer(
 		buffer_nextp = (xfs_agino_t *)xfs_buf_offset(bp,
 					      next_unlinked_offset);
 		*buffer_nextp = *logged_nextp;
+
+		/*
+		 * If necessary, recalculate the CRC in the on-disk inode. We
+		 * have to leave the inode in a consistent state for whoever
+		 * reads it next....
+		 */
+		xfs_dinode_calc_crc(mp, (struct xfs_dinode *)
+				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
+
 	}
 
 	return 0;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 11/9] xfs: fix remote attribute invalidation for a leaf
  2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
                   ` (10 preceding siblings ...)
  2013-05-28 17:56 ` [PATH 0/9] xfs: fixes for 3.10-rc4 Ben Myers
@ 2013-05-28 21:27 ` Dave Chinner
  11 siblings, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-28 21:27 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When invalidating an attribute leaf block block, there might be
remote attributes that it points to. With the recent rework of the
remote attribute format, we have to make sure we calculate the
length of the attribute correctly. We aren't doing that in
xfs_attr3_leaf_inactive(), so fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index d788302..31d3cd1 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -3258,7 +3258,7 @@ xfs_attr3_leaf_inactive(
 			name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
 			if (name_rmt->valueblk) {
 				lp->valueblk = be32_to_cpu(name_rmt->valueblk);
-				lp->valuelen = XFS_B_TO_FSB(dp->i_mount,
+				lp->valuelen = xfs_attr3_rmt_blocks(dp->i_mount,
 						    be32_to_cpu(name_rmt->valuelen));
 				lp++;
 			}

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems
  2013-05-27  6:38 ` [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
@ 2013-05-28 21:49   ` Ben Myers
  2013-05-30  1:07     ` Dave Chinner
  2013-05-29 21:06   ` Brian Foster
  1 sibling, 1 reply; 54+ messages in thread
From: Ben Myers @ 2013-05-28 21:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 27, 2013 at 04:38:24PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently, swapping extents from one inode to another is a simple
> act of switching data and attribute forks from one inode to another.
> This, unfortunately in no longer so simple with CRC enabled
> filesystems as there is owner information embedded into the BMBT
> blocks that are swapped between inodes. Hence swapping the forks
> between inodes results in the inodes having mapping blocks that
> point to the wrong owner and hence are considered corrupt.
> 
> To fix this we need an extent tree block or record based swap
> algorithm so that the BMBT block owner information can be updated
> atomically in the swap transaction. This is a significant piece of
> new work, so for the moment simply don't allow swap extent
> operations to succeed on CRC enabled filesystems.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I believe we do want to have functional swap extents for crc enabled
filesystems.  But this is fine as long as it is temporary.  Thanks Eric for
pointing this out on IRC.   Looks fine.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATH 0/9] xfs: fixes for 3.10-rc4
  2013-05-28 17:56 ` [PATH 0/9] xfs: fixes for 3.10-rc4 Ben Myers
@ 2013-05-28 23:54   ` Dave Chinner
  2013-05-29 19:01     ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-28 23:54 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, May 28, 2013 at 12:56:27PM -0500, Ben Myers wrote:
> Hi Dave,
> 
> On Mon, May 27, 2013 at 04:38:18PM +1000, Dave Chinner wrote:
> > This is an update for all the fixes I have for 3,10-rc4.
> > 
> > All the comments on the previous patches have been updated in this
> > series, and there are two new patches. The first patch adds support
> > for indicating that the filesystem has CRC enabled to userspace, and
> > the second fixes the missing CRC updates on the unlinked inode list
> > manipulations.
> 
> This seems to be getting a bit out of hand for a release candidate pull
> request:

You don't need to send them all at once. The ones you committed from
my initial -rc2 series could just be sent to Linus immediately, and
that halves the current outstanding queue. Then there's only 8-9
patches remaining...

> 
> Fixes for xfs without crcs:
> f648167f xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
> 	 xfs: fix split buffer vector log recovery support
> 	 xfs: kill suid/sgid through the truncate path.

And the log recovery ordering fix I just sent out.

> Fixes for xfs with crcs:
> 90253cf1 xfs: remote attribute allocation may be contiguous 
> 913e96bc xfs: remote attribute read too short
> 4af3644c xfs: remote attribute tail zeroing does too much
> 6863ef84 xfs: correctly map remote attr buffers during removal
> 8517de2a xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
> d4c712bc xfs: fully initialise temp leaf in xfs_attr3_leaf_compact

All of which fix filesystem corruption or - in the case of the
buffer cache coherency problems -  can lead to stale data exposure.

> ad1858d7 xfs: rework remote attr CRCs

Fixes cache coherency issues via a change in the on-disk format for
remote attributes. We need to get that included before a full
release is made, otherwise we need feature bits to distinguish
between the original and the new formats. 

> 9a5b6dee xfs: fix incorrect remote symlink block count
> 	 xfs: rework dquot CRCs

More filesystem corruption fixes.

> 	 xfs: inode unlinked list needs to recalculate the inode CRC 	* replaced with "2-3 patches"

Another corruption fix, now with an addition log recovery ordering
fix which prevents log recovery from potentially causing silent
filesystem corruption.

> 	 xfs: fix dir3 freespace block corruption			* maybe this effects filesystems w/o crcs?

Again, a filesystem corruption, including fixing a problem in the
on-disk format definition. While it probably doesn't affect
non-crc filesystems, it's a landmine that should be removed....

> Dev work for xfs with crcs:
> 	 xfs: increase number of ACL entries for V5 superblocks

On disk format change. We need to get that out before release,
otherwise it needs a feature bit.

> 	 xfs: don't emit v5 superblock warnings on write

Fixes a nasty log spamming problem. Certainly not development code.

> 	 xfs: add fsgeom flag for v5 superblock support.

Needed for userspace support of CRC filesystems, trivial patch. if
we expect anyone to test this kernel code, we need this patch so
userspace can correctly identify crc-enabled filesystems.

> 	 xfs: disable swap extents ioctl on CRC enabled filesystems

Filesystem corruption fix, caused simply by running xfs_fsr.

But, really, this entire series is not that much change in terms of
volume:

$ git diff --stat b38958d..bf6331a -- fs/xfs
 fs/xfs/xfs_acl.c         |   31 ++--
 fs/xfs/xfs_acl.h         |   30 +++-
 fs/xfs/xfs_attr_leaf.c   |   74 ++++++---
 fs/xfs/xfs_attr_remote.c |  408 +++++++++++++++++++++++++++++-------------------
 fs/xfs/xfs_attr_remote.h |   10 ++
 fs/xfs/xfs_buf.c         |    1 +
 fs/xfs/xfs_buf_item.c    |    7 +-
 fs/xfs/xfs_dfrag.c       |    8 +
 fs/xfs/xfs_dir2_format.h |    1 +
 fs/xfs/xfs_dir2_node.c   |   13 +-
 fs/xfs/xfs_dquot.c       |   37 ++---
 fs/xfs/xfs_fs.h          |    1 +
 fs/xfs/xfs_fsops.c       |    4 +-
 fs/xfs/xfs_inode.c       |   42 ++++-
 fs/xfs/xfs_iops.c        |   47 ++++--
 fs/xfs/xfs_log_recover.c |   95 ++++++++++-
 fs/xfs/xfs_mount.c       |   18 ++-
 fs/xfs/xfs_qm.c          |   36 +++--
 fs/xfs/xfs_qm_syscalls.c |   40 +++--
 fs/xfs/xfs_quota.h       |    2 +
 fs/xfs/xfs_symlink.c     |   20 +--
 21 files changed, 612 insertions(+), 313 deletions(-)

And a large proportion of that change is the single attr CRC rework
patch.

> We're representing 3 patches which are clearly approprate for a release
> candidate, 11 fixes for CRCs + 2 more for the rework of unlinked lists patch,
> and 4 patches which are development work.

Most file filesystem corruptions. Without those fixes, CRC enabled
filesystems simple cannot be used without all these patches being
applied.  It's not an experimental feature at this point - it's just
broken.

> We pulled in the CRC work with the understanding that it is an
> experimental feature that might not be perfect, and with a focus
> upon preventing regression for existing users.  This did not imply
> that we're going after perfection for CRCs during the 3.10 release
> candidate series.

Right, but there was also the understanding that the CRC code would
be usable by release time, and so there would be fixes committed
throughout the -rc series to fix problems with the CRC code so that
it is usable.

As it is, the result of all these patches is that xfstests is almost
entirely passing on a 4k block size filesystem. The only tests that
aren't passing are those that already fail or require userspace
support that isn't currently available (e.g. xfs-db write support,
metadump). And 1k block size filesystems mostly pass xfstests, too,
though there are random assert failures from time to time.

IOWs after this patch set is applied, the CRC functionality fits the
definition of the 'experimental' tag.  It mostly works, but still
has some lurking issues that need to be flushed out. I'm unlikely to
be posting 2-3 bug fix patches a day from this point onwards....

Indeed, the last two days I've been doing 3.11 development work
because xfstests on CRC enabled filesystems is now usable for
regression testing. i.e. the majority of the "easy-to-hit" problems
have been flushed out by this patchset.

> I'll be happy to review the 17 CRC related patches for inclusion
> in the master branch ASAP,

That's a good start, but...

> but I'm afraid 17 patches is a bit much
> to ask for a release candidate, even if it were -rc2.

... I think you're being way too conservative here.

BTRFS in january for an -rc merge:

https://lkml.org/lkml/2013/1/25/11

That was 30 commits and +300/-100 lines. There was a another pull
request 2 weeks later with another 9 commits. The same thing
happened with the RAID5/6 integration into btrfs in 3.9 - the -rc
merges following it had 13, 7 and 15 commits. And for 3.10-rc, the
first post-rc1 merge was 25 commits (+350/-250 lines).

And if you want another example, the ext4 extent status cache fixes
that went into 3.9-rc4:

https://lkml.org/lkml/2013/3/21/644

that was 21 commits and +540/-81 lines.

Yup, new filesystem code has bugs, and they get fixed in -rc
releases. What makes new XFS code so special we can't do this?

The amount of change we are talking about here is not unreasonable
even for an -rc4 release, as evidenced by other filesystems under
development. Hence I don't think "it's too late" is really a viable
reason for not fixing all these filesystem corruption issues in the
CRC code...

> Here we are in 3.11/feature-bit territory.

I'd much prefer that we don't have to add code to 3.11 to reject any
CRC-enabled filesystem without any feature bits set because we don't
support a broken remote attr format that was fixed weeks before 3.10
released but was not allowed to be fixed in 3.10. That's just crazy
from any release management perspective you care to look at it from.

Ben, if the problem is that you can't review all the fixes in a
timely manner, then we can fix that. I'm sure that Mark, Eric and
Brian can help review the code if this is the sticking point.  There
are also people like Michael who are actively testing the CRC code
and reporting bugs to me all the time, so there's demand for these
fixes to be integrated into the 3.10 tree...

Everyone knew there would be fixes for the CRC code coming along in
the -rc period - we discussed it on the weekly concall before the
the CRC code was merged into 3.10. Hence I'm a bit surprised to get
what appears to be a blanket rejection for these fixes to the CRC
code this early in the 3.10-rc cycle.

Let's not waste all the effort that we put into getting the CRC code
into 3.10 by not fixing known corruptions and on-disk format
deficiencies. Releasing CRC support in a seriously broken state that
we need to add permanent workarounds for in 3.11 is about the worst
possible outcome that could occur right now....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-27  6:38 ` [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support Dave Chinner
@ 2013-05-29 15:10   ` Eric Sandeen
  2013-05-29 21:43     ` Ben Myers
  2013-05-30  1:11     ` Dave Chinner
  2013-05-30 14:17   ` Brian Foster
  1 sibling, 2 replies; 54+ messages in thread
From: Eric Sandeen @ 2013-05-29 15:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 5/27/13 1:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently userspace has no way of determining that a filesystem is
> CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
> indicate that the filesystem has v5 superblock support enabled.
> This will allow xfs_info to correctly report the state of the
> filesystem.


Looks fine,

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Ben, having this in place for for the next point release will let
userspace work & testing proceed w/o the need for a patched
kernel... if you could consider pulling it in that'd be great.

Dave, just out of curiosity, most other features sort of match between
the "_has_*" and the flag names, is there a reason for the
crc <-> sbv5 difference?  Just semantics, but just curious.

(i.e. xfs_sb_version_hasprojid32bit checks XFS_SB_VERSION2_PROJID32BIT,
but xfs_sb_version_hascrc checks XFS_SB_VERSION_5)

Answering my own question maybe, I guess SB_VERSION_5 was conceived
with crc already in place, so there's no need for a feature flag on
top of the sb version, right...?

-Eric 

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_fs.h    |    1 +
>  fs/xfs/xfs_fsops.c |    4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
> index 6dda3f9..d046955 100644
> --- a/fs/xfs/xfs_fs.h
> +++ b/fs/xfs/xfs_fs.h
> @@ -236,6 +236,7 @@ typedef struct xfs_fsop_resblks {
>  #define XFS_FSOP_GEOM_FLAGS_PROJID32	0x0800  /* 32-bit project IDs	*/
>  #define XFS_FSOP_GEOM_FLAGS_DIRV2CI	0x1000	/* ASCII only CI names	*/
>  #define XFS_FSOP_GEOM_FLAGS_LAZYSB	0x4000	/* lazy superblock counters */
> +#define XFS_FSOP_GEOM_FLAGS_V5SB	0x8000	/* version 5 superblock */
>  
>  
>  /*
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 87595b2..3c3644e 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -99,7 +99,9 @@ xfs_fs_geometry(
>  			(xfs_sb_version_hasattr2(&mp->m_sb) ?
>  				XFS_FSOP_GEOM_FLAGS_ATTR2 : 0) |
>  			(xfs_sb_version_hasprojid32bit(&mp->m_sb) ?
> -				XFS_FSOP_GEOM_FLAGS_PROJID32 : 0);
> +				XFS_FSOP_GEOM_FLAGS_PROJID32 : 0) |
> +			(xfs_sb_version_hascrc(&mp->m_sb) ?
> +				XFS_FSOP_GEOM_FLAGS_V5SB : 0);
>  		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
>  				mp->m_sb.sb_logsectsize : BBSIZE;
>  		geo->rtsectsize = mp->m_sb.sb_blocksize;
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] xfs: don't emit v5 superblock warnings on write
  2013-05-27  6:38 ` [PATCH 1/9] xfs: don't emit v5 superblock warnings on write Dave Chinner
@ 2013-05-29 16:39   ` Brian Foster
  2013-05-30 17:49     ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-29 16:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/2013 02:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We write the superblock every 30s or so which results in the
> verifier being called. Right now that results in this output
> every 30s:
> 
> XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> Use of these features in this kernel is at your own risk!
> 
> And spamming the logs.
> 
> We don't need to check for whether we support v5 superblocks or
> whether there are feature bits we don't support set as these are
> only relevant when we first mount the filesytem. i.e. on superblock
> read. Hence for the write verification we can just skip all the
> checks (and hence verbose output) altogether.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_mount.c |   18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index f6bfbd7..e8e310c 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -314,7 +314,8 @@ STATIC int
>  xfs_mount_validate_sb(
>  	xfs_mount_t	*mp,
>  	xfs_sb_t	*sbp,
> -	bool		check_inprogress)
> +	bool		check_inprogress,
> +	bool		check_version)
>  {
>  
>  	/*
> @@ -337,9 +338,10 @@ xfs_mount_validate_sb(
>  
>  	/*
>  	 * Version 5 superblock feature mask validation. Reject combinations the
> -	 * kernel cannot support up front before checking anything else.
> +	 * kernel cannot support up front before checking anything else. For
> +	 * write validation, we don't need to check feature masks.
>  	 */
> -	if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> +	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
>  		xfs_alert(mp,
>  "Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
>  "Use of these features in this kernel is at your own risk!");
> @@ -675,7 +677,8 @@ xfs_sb_to_disk(
>  
>  static int
>  xfs_sb_verify(
> -	struct xfs_buf	*bp)
> +	struct xfs_buf	*bp,
> +	bool		check_version)
>  {
>  	struct xfs_mount *mp = bp->b_target->bt_mount;
>  	struct xfs_sb	sb;
> @@ -686,7 +689,8 @@ xfs_sb_verify(
>  	 * Only check the in progress field for the primary superblock as
>  	 * mkfs.xfs doesn't clear it from secondary superblocks.
>  	 */
> -	return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR);
> +	return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR,
> +				     check_version);
>  }
>  
>  /*
> @@ -719,7 +723,7 @@ xfs_sb_read_verify(
>  			goto out_error;
>  		}
>  	}
> -	error = xfs_sb_verify(bp);
> +	error = xfs_sb_verify(bp, true);
>  
>  out_error:
>  	if (error) {
> @@ -758,7 +762,7 @@ xfs_sb_write_verify(
>  	struct xfs_buf_log_item	*bip = bp->b_fspriv;
>  	int			error;
>  
> -	error = xfs_sb_verify(bp);
> +	error = xfs_sb_verify(bp, false);
>  	if (error) {
>  		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
>  		xfs_buf_ioerror(bp, error);
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 2/9] xfs: fix incorrect remote symlink block count
  2013-05-27  6:38 ` [PATCH 2/9] xfs: fix incorrect remote symlink block count Dave Chinner
@ 2013-05-29 16:39   ` Brian Foster
  2013-05-30  0:46     ` Dave Chinner
  2013-05-30 17:49     ` Ben Myers
  0 siblings, 2 replies; 54+ messages in thread
From: Brian Foster @ 2013-05-29 16:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/2013 02:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When CRCs are enabled, the number of blocks needed to hold a remote
> symlink on a 1k block size filesystem may be 2 instead of 1. The
> transaction reservation for the allocated bloks was not taking this
> into account and only allocating one block. hence when trying to
> read or invalidate such symlinks, we are mapping a hole where there
> should be a block and things go bad at that point.
> 
> Fix the reservation to use the correct block count, clean up the
> block count calculation similar to the remote attribute calculation,
> and add a debug guard to detect when we don't write the entire
> symlink to disk.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

Unrelated question... I noticed we update di_size in xfs_symlink(). I
presume this is safe because, even in the non-local case, we actually
log the data (the path) in the buffers as well, right?

Brian

>  fs/xfs/xfs_symlink.c |   20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index 5f234389..195a403 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -56,16 +56,9 @@ xfs_symlink_blocks(
>  	struct xfs_mount *mp,
>  	int		pathlen)
>  {
> -	int		fsblocks = 0;
> -	int		len = pathlen;
> +	int buflen = XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
>  
> -	do {
> -		fsblocks++;
> -		len -= XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
> -	} while (len > 0);
> -
> -	ASSERT(fsblocks <= XFS_SYMLINK_MAPS);
> -	return fsblocks;
> +	return (pathlen + buflen - 1) / buflen;
>  }
>  
>  static int
> @@ -405,7 +398,7 @@ xfs_symlink(
>  	if (pathlen <= XFS_LITINO(mp, dp->i_d.di_version))
>  		fs_blocks = 0;
>  	else
> -		fs_blocks = XFS_B_TO_FSB(mp, pathlen);
> +		fs_blocks = xfs_symlink_blocks(mp, pathlen);
>  	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
>  	error = xfs_trans_reserve(tp, resblks, XFS_SYMLINK_LOG_RES(mp), 0,
>  			XFS_TRANS_PERM_LOG_RES, XFS_SYMLINK_LOG_COUNT);
> @@ -512,7 +505,7 @@ xfs_symlink(
>  		cur_chunk = target_path;
>  		offset = 0;
>  		for (n = 0; n < nmaps; n++) {
> -			char *buf;
> +			char	*buf;
>  
>  			d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
>  			byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
> @@ -525,9 +518,7 @@ xfs_symlink(
>  			bp->b_ops = &xfs_symlink_buf_ops;
>  
>  			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> -			if (pathlen < byte_cnt) {
> -				byte_cnt = pathlen;
> -			}
> +			byte_cnt = min(byte_cnt, pathlen);
>  
>  			buf = bp->b_addr;
>  			buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset,
> @@ -542,6 +533,7 @@ xfs_symlink(
>  			xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) -
>  							(char *)bp->b_addr);
>  		}
> +		ASSERT(pathlen == 0);
>  	}
>  
>  	/*
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/9] xfs: increase number of ACL entries for V5 superblocks
  2013-05-27  6:38 ` [PATCH 3/9] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
@ 2013-05-29 16:40   ` Brian Foster
  0 siblings, 0 replies; 54+ messages in thread
From: Brian Foster @ 2013-05-29 16:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/2013 02:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The limit of 25 ACL entries is arbitrary, but baked into the on-disk
> format.  For version 5 superblocks, increase it to the maximum nuber
> of ACLs that can fit into a single xattr.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_acl.c |   31 ++++++++++++++++++-------------
>  fs/xfs/xfs_acl.h |   30 +++++++++++++++++++++++-------
>  2 files changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index 1d32f1d..b6bed9b 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -21,6 +21,8 @@
>  #include "xfs_bmap_btree.h"
>  #include "xfs_inode.h"
>  #include "xfs_vnodeops.h"
> +#include "xfs_sb.h"
> +#include "xfs_mount.h"
>  #include "xfs_trace.h"
>  #include <linux/slab.h>
>  #include <linux/xattr.h>
> @@ -34,7 +36,9 @@
>   */
>  
>  STATIC struct posix_acl *
> -xfs_acl_from_disk(struct xfs_acl *aclp)
> +xfs_acl_from_disk(
> +	struct xfs_acl	*aclp,
> +	int		max_entries)
>  {
>  	struct posix_acl_entry *acl_e;
>  	struct posix_acl *acl;
> @@ -42,7 +46,7 @@ xfs_acl_from_disk(struct xfs_acl *aclp)
>  	unsigned int count, i;
>  
>  	count = be32_to_cpu(aclp->acl_cnt);
> -	if (count > XFS_ACL_MAX_ENTRIES)
> +	if (count > max_entries)
>  		return ERR_PTR(-EFSCORRUPTED);
>  
>  	acl = posix_acl_alloc(count, GFP_KERNEL);
> @@ -108,7 +112,7 @@ xfs_get_acl(struct inode *inode, int type)
>  	struct xfs_inode *ip = XFS_I(inode);
>  	struct posix_acl *acl;
>  	struct xfs_acl *xfs_acl;
> -	int len = sizeof(struct xfs_acl);
> +	int len = XFS_ACL_SIZE(ip->i_mount);
>  	unsigned char *ea_name;
>  	int error;
>  
> @@ -133,8 +137,8 @@ xfs_get_acl(struct inode *inode, int type)
>  	 * If we have a cached ACLs value just return it, not need to
>  	 * go out to the disk.
>  	 */
> -
> -	xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
> +	len = XFS_ACL_SIZE(ip->i_mount);

Duplicate initialization of len. Harmless (perhaps Ben can drop it) and
the parens in XFS_ACL_MAX_ENTRIES() are fixed, so:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	xfs_acl = kzalloc(len, GFP_KERNEL);
>  	if (!xfs_acl)
>  		return ERR_PTR(-ENOMEM);
>  
> @@ -153,7 +157,7 @@ xfs_get_acl(struct inode *inode, int type)
>  		goto out;
>  	}
>  
> -	acl = xfs_acl_from_disk(xfs_acl);
> +	acl = xfs_acl_from_disk(xfs_acl, XFS_ACL_MAX_ENTRIES(ip->i_mount));
>  	if (IS_ERR(acl))
>  		goto out;
>  
> @@ -189,16 +193,17 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
>  
>  	if (acl) {
>  		struct xfs_acl *xfs_acl;
> -		int len;
> +		int len = XFS_ACL_SIZE(ip->i_mount);
>  
> -		xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
> +		xfs_acl = kzalloc(len, GFP_KERNEL);
>  		if (!xfs_acl)
>  			return -ENOMEM;
>  
>  		xfs_acl_to_disk(xfs_acl, acl);
> -		len = sizeof(struct xfs_acl) -
> -			(sizeof(struct xfs_acl_entry) *
> -			 (XFS_ACL_MAX_ENTRIES - acl->a_count));
> +
> +		/* subtract away the unused acl entries */
> +		len -= sizeof(struct xfs_acl_entry) *
> +			 (XFS_ACL_MAX_ENTRIES(ip->i_mount) - acl->a_count);
>  
>  		error = -xfs_attr_set(ip, ea_name, (unsigned char *)xfs_acl,
>  				len, ATTR_ROOT);
> @@ -243,7 +248,7 @@ xfs_set_mode(struct inode *inode, umode_t mode)
>  static int
>  xfs_acl_exists(struct inode *inode, unsigned char *name)
>  {
> -	int len = sizeof(struct xfs_acl);
> +	int len = XFS_ACL_SIZE(XFS_M(inode->i_sb));
>  
>  	return (xfs_attr_get(XFS_I(inode), name, NULL, &len,
>  			    ATTR_ROOT|ATTR_KERNOVAL) == 0);
> @@ -379,7 +384,7 @@ xfs_xattr_acl_set(struct dentry *dentry, const char *name,
>  		goto out_release;
>  
>  	error = -EINVAL;
> -	if (acl->a_count > XFS_ACL_MAX_ENTRIES)
> +	if (acl->a_count > XFS_ACL_MAX_ENTRIES(XFS_M(inode->i_sb)))
>  		goto out_release;
>  
>  	if (type == ACL_TYPE_ACCESS) {
> diff --git a/fs/xfs/xfs_acl.h b/fs/xfs/xfs_acl.h
> index 39632d9..0da8725 100644
> --- a/fs/xfs/xfs_acl.h
> +++ b/fs/xfs/xfs_acl.h
> @@ -22,19 +22,35 @@ struct inode;
>  struct posix_acl;
>  struct xfs_inode;
>  
> -#define XFS_ACL_MAX_ENTRIES 25
>  #define XFS_ACL_NOT_PRESENT (-1)
>  
>  /* On-disk XFS access control list structure */
> +struct xfs_acl_entry {
> +	__be32	ae_tag;
> +	__be32	ae_id;
> +	__be16	ae_perm;
> +	__be16	ae_pad;		/* fill the implicit hole in the structure */
> +};
> +
>  struct xfs_acl {
> -	__be32		acl_cnt;
> -	struct xfs_acl_entry {
> -		__be32	ae_tag;
> -		__be32	ae_id;
> -		__be16	ae_perm;
> -	} acl_entry[XFS_ACL_MAX_ENTRIES];
> +	__be32			acl_cnt;
> +	struct xfs_acl_entry	acl_entry[0];
>  };
>  
> +/*
> + * The number of ACL entries allowed is defined by the on-disk format.
> + * For v4 superblocks, that is limited to 25 entries. For v5 superblocks, it is
> + * limited only by the maximum size of the xattr that stores the information.
> + */
> +#define XFS_ACL_MAX_ENTRIES(mp)	\
> +	(xfs_sb_version_hascrc(&mp->m_sb) \
> +	   ?  (XATTR_SIZE_MAX - sizeof(__be32)) / sizeof(struct xfs_acl_entry) \
> +	   : 25)
> +
> +#define XFS_ACL_SIZE(mp) \
> +	(sizeof(struct xfs_acl) + \
> +		sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
> +
>  /* On-disk XFS extended attribute names */
>  #define SGI_ACL_FILE		(unsigned char *)"SGI_ACL_FILE"
>  #define SGI_ACL_DEFAULT		(unsigned char *)"SGI_ACL_DEFAULT"
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] xfs: rework dquot CRCs
  2013-05-27  6:38 ` [PATCH 4/9] xfs: rework dquot CRCs Dave Chinner
@ 2013-05-29 18:58   ` Brian Foster
  2013-05-30  1:00     ` Dave Chinner
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-29 18:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/2013 02:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Calculating dquot CRCs when the backing buffer is written back just
> doesn't work reliably. There are several places which manipulate
> dquots directly in the buffers, and they don't calculate CRCs
> appropriately, nor do they always set the buffer up to calculate
> CRCs appropriately.
> 
> Firstly, if we log a dquot buffer (e.g. during allocation) it gets
> logged without valid CRC, and so on recovery we end up with a dquot
> that is not valid.
> 
> Secondly, if we recover/repair a dquot, we don't have a verifier
> attached to the buffer and hence CRCs arenot calculate don the way
> down to disk.
> 
> Thirdly, calculating the CRC after we've changed the contents means
> that if we re-read the dquot from the buffer, we cannot verify the
> contents of the dquot are valid, as the CRC is invalid.
> 
> So, to avoid all the dquot CRC errors that are being detected by the
> read verifier, change to using the same model as for inodes. that
> is, dquot CRCs are calculated and written to the backing buffer at
> the time the dquot is flushed to the backing buffer. If we modify
> the dquuot directly in the backing buffer, calculate the CRC
> immediately after the modification is complete. Hence the dquot in
> the on-disk buffer should always have a valid CRC.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_dquot.c       |   37 ++++++++++++++++---------------------
>  fs/xfs/xfs_log_recover.c |   10 ++++++++++
>  fs/xfs/xfs_qm.c          |   36 ++++++++++++++++++++++++++----------
>  fs/xfs/xfs_quota.h       |    2 ++
>  4 files changed, 54 insertions(+), 31 deletions(-)
> 
...
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index f41702b..d181542 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -41,6 +41,7 @@
>  #include "xfs_qm.h"
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
> +#include "xfs_cksum.h"
>  
>  /*
>   * The global quota manager. There is only one of these for the entire
> @@ -839,7 +840,7 @@ xfs_qm_reset_dqcounts(
>  	xfs_dqid_t	id,
>  	uint		type)
>  {
> -	xfs_disk_dquot_t	*ddq;
> +	struct xfs_dqblk	*dqb;
>  	int			j;
>  
>  	trace_xfs_reset_dqcounts(bp, _RET_IP_);
> @@ -853,8 +854,12 @@ xfs_qm_reset_dqcounts(
>  	do_div(j, sizeof(xfs_dqblk_t));
>  	ASSERT(mp->m_quotainfo->qi_dqperchunk == j);
>  #endif
> -	ddq = bp->b_addr;
> +	dqb = bp->b_addr;
>  	for (j = 0; j < mp->m_quotainfo->qi_dqperchunk; j++) {
> +		struct xfs_disk_dquot	*ddq;
> +
> +		ddq =  (struct xfs_disk_dquot *)&dqb[j];
> +
>  		/*
>  		 * Do a sanity check, and if needed, repair the dqblk. Don't
>  		 * output any warnings because it's perfectly possible to
> @@ -871,7 +876,8 @@ xfs_qm_reset_dqcounts(
>  		ddq->d_bwarns = 0;
>  		ddq->d_iwarns = 0;
>  		ddq->d_rtbwarns = 0;
> -		ddq = (xfs_disk_dquot_t *) ((xfs_dqblk_t *)ddq + 1);
> +		xfs_update_cksum((char *)&dqb[j], sizeof(struct xfs_dqblk),
> +					 XFS_DQUOT_CRC_OFF);

Nice cleanup on the cast ugliness even without the crc change. Is there
a technical reason for the unconditional crc update here beyond that
we're doing a reset? I'm wondering if there's any value in leaving those
bits untouched for a filesystem that might have never enabled crc
(quotacheck or not).

FWIW, the rest of this patch looks sane to me (I'm less familiar with
the log recovery code, but the changes seem isolated and
straightforward) and I couldn't locate anywhere else we modify the
backing buffer for the dquot.

Brian

>  	}
>  }
>  
> @@ -907,19 +913,29 @@ xfs_qm_dqiter_bufs(
>  			      XFS_FSB_TO_DADDR(mp, bno),
>  			      mp->m_quotainfo->qi_dqchunklen, 0, &bp,
>  			      &xfs_dquot_buf_ops);
> -		if (error)
> -			break;
>  
>  		/*
> -		 * XXX(hch): need to figure out if it makes sense to validate
> -		 *	     the CRC here.
> +		 * CRC and validation errors will return a EFSCORRUPTED here. If
> +		 * this occurs, re-read without CRC validation so that we can
> +		 * repair the damage via xfs_qm_reset_dqcounts(). This process
> +		 * will leave a trace in the log indicating corruption has
> +		 * been detected.
>  		 */
> +		if (error == EFSCORRUPTED) {
> +			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
> +				      XFS_FSB_TO_DADDR(mp, bno),
> +				      mp->m_quotainfo->qi_dqchunklen, 0, &bp,
> +				      NULL);
> +		}
> +
> +		if (error)
> +			break;
> +
>  		xfs_qm_reset_dqcounts(mp, bp, firstid, type);
>  		xfs_buf_delwri_queue(bp, buffer_list);
>  		xfs_buf_relse(bp);
> -		/*
> -		 * goto the next block.
> -		 */
> +
> +		/* goto the next block. */
>  		bno++;
>  		firstid += mp->m_quotainfo->qi_dqperchunk;
>  	}
> diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
> index c61e31c..c38068f 100644
> --- a/fs/xfs/xfs_quota.h
> +++ b/fs/xfs/xfs_quota.h
> @@ -87,6 +87,8 @@ typedef struct xfs_dqblk {
>  	uuid_t		  dd_uuid;	/* location information */
>  } xfs_dqblk_t;
>  
> +#define XFS_DQUOT_CRC_OFF	offsetof(struct xfs_dqblk, dd_crc)
> +
>  /*
>   * flags for q_flags field in the dquot.
>   */
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATH 0/9] xfs: fixes for 3.10-rc4
  2013-05-28 23:54   ` Dave Chinner
@ 2013-05-29 19:01     ` Ben Myers
  2013-05-29 19:27       ` Eric Sandeen
  0 siblings, 1 reply; 54+ messages in thread
From: Ben Myers @ 2013-05-29 19:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Wed, May 29, 2013 at 09:54:24AM +1000, Dave Chinner wrote:
> On Tue, May 28, 2013 at 12:56:27PM -0500, Ben Myers wrote:
> > On Mon, May 27, 2013 at 04:38:18PM +1000, Dave Chinner wrote:
> > > This is an update for all the fixes I have for 3,10-rc4.
> > > 
> > > All the comments on the previous patches have been updated in this
> > > series, and there are two new patches. The first patch adds support
> > > for indicating that the filesystem has CRC enabled to userspace, and
> > > the second fixes the missing CRC updates on the unlinked inode list
> > > manipulations.
> > 
> > This seems to be getting a bit out of hand for a release candidate pull
> > request:
> 
> You don't need to send them all at once. The ones you committed from
> my initial -rc2 series could just be sent to Linus immediately, and
> that halves the current outstanding queue. Then there's only 8-9
> patches remaining...

I am more concerned about the content of the patches than the number of
them.

> > Fixes for xfs without crcs:
> > f648167f xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
> > 	 xfs: fix split buffer vector log recovery support
> > 	 xfs: kill suid/sgid through the truncate path.
> 
> And the log recovery ordering fix I just sent out.
> 
> > Fixes for xfs with crcs:
> > 90253cf1 xfs: remote attribute allocation may be contiguous 
> > 913e96bc xfs: remote attribute read too short
> > 4af3644c xfs: remote attribute tail zeroing does too much
> > 6863ef84 xfs: correctly map remote attr buffers during removal
> > 8517de2a xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
> > d4c712bc xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
> 
> All of which fix filesystem corruption or - in the case of the
> buffer cache coherency problems -  can lead to stale data exposure.
> 
> > ad1858d7 xfs: rework remote attr CRCs
> 
> Fixes cache coherency issues via a change in the on-disk format for
> remote attributes. We need to get that included before a full
> release is made, otherwise we need feature bits to distinguish
> between the original and the new formats. 
> 
> > 9a5b6dee xfs: fix incorrect remote symlink block count
> > 	 xfs: rework dquot CRCs
> 
> More filesystem corruption fixes.
> 
> > 	 xfs: inode unlinked list needs to recalculate the inode CRC 	* replaced with "2-3 patches"
> 
> Another corruption fix, now with an addition log recovery ordering
> fix which prevents log recovery from potentially causing silent
> filesystem corruption.
> 
> > 	 xfs: fix dir3 freespace block corruption			* maybe this effects filesystems w/o crcs?
> 
> Again, a filesystem corruption, including fixing a problem in the
> on-disk format definition. While it probably doesn't affect
> non-crc filesystems, it's a landmine that should be removed....
> 
> > Dev work for xfs with crcs:
> > 	 xfs: increase number of ACL entries for V5 superblocks
> 
> On disk format change. We need to get that out before release,
> otherwise it needs a feature bit.
> 
> > 	 xfs: don't emit v5 superblock warnings on write
> 
> Fixes a nasty log spamming problem. Certainly not development code.
> 
> > 	 xfs: add fsgeom flag for v5 superblock support.
> 
> Needed for userspace support of CRC filesystems, trivial patch. if
> we expect anyone to test this kernel code, we need this patch so
> userspace can correctly identify crc-enabled filesystems.
> 
> > 	 xfs: disable swap extents ioctl on CRC enabled filesystems
> 
> Filesystem corruption fix, caused simply by running xfs_fsr.
> 
> But, really, this entire series is not that much change in terms of
> volume:
> 
> $ git diff --stat b38958d..bf6331a -- fs/xfs
>  fs/xfs/xfs_acl.c         |   31 ++--
>  fs/xfs/xfs_acl.h         |   30 +++-
>  fs/xfs/xfs_attr_leaf.c   |   74 ++++++---
>  fs/xfs/xfs_attr_remote.c |  408 +++++++++++++++++++++++++++++-------------------
>  fs/xfs/xfs_attr_remote.h |   10 ++
>  fs/xfs/xfs_buf.c         |    1 +
>  fs/xfs/xfs_buf_item.c    |    7 +-
>  fs/xfs/xfs_dfrag.c       |    8 +
>  fs/xfs/xfs_dir2_format.h |    1 +
>  fs/xfs/xfs_dir2_node.c   |   13 +-
>  fs/xfs/xfs_dquot.c       |   37 ++---
>  fs/xfs/xfs_fs.h          |    1 +
>  fs/xfs/xfs_fsops.c       |    4 +-
>  fs/xfs/xfs_inode.c       |   42 ++++-
>  fs/xfs/xfs_iops.c        |   47 ++++--
>  fs/xfs/xfs_log_recover.c |   95 ++++++++++-
>  fs/xfs/xfs_mount.c       |   18 ++-
>  fs/xfs/xfs_qm.c          |   36 +++--
>  fs/xfs/xfs_qm_syscalls.c |   40 +++--
>  fs/xfs/xfs_quota.h       |    2 +
>  fs/xfs/xfs_symlink.c     |   20 +--
>  21 files changed, 612 insertions(+), 313 deletions(-)
> 
> And a large proportion of that change is the single attr CRC rework
> patch.
> 
> > We're representing 3 patches which are clearly approprate for a release
> > candidate, 11 fixes for CRCs + 2 more for the rework of unlinked lists patch,
> > and 4 patches which are development work.
> 
> Most file filesystem corruptions. Without those fixes, CRC enabled
> filesystems simple cannot be used without all these patches being
> applied.  It's not an experimental feature at this point - it's just
> broken.

If as an experimental feature "it's just broken" in -rc1, maybe I should
never have merged it.

> > We pulled in the CRC work with the understanding that it is an
> > experimental feature that might not be perfect, and with a focus
> > upon preventing regression for existing users.  This did not imply
> > that we're going after perfection for CRCs during the 3.10 release
> > candidate series.
> 
> Right, but there was also the understanding that the CRC code would
> be usable by release time, and so there would be fixes committed
> throughout the -rc series to fix problems with the CRC code so that
> it is usable.
>
> As it is, the result of all these patches is that xfstests is almost
> entirely passing on a 4k block size filesystem. The only tests that
> aren't passing are those that already fail or require userspace
> support that isn't currently available (e.g. xfs-db write support,
> metadump). And 1k block size filesystems mostly pass xfstests, too,
> though there are random assert failures from time to time.
> 
> IOWs after this patch set is applied, the CRC functionality fits the
> definition of the 'experimental' tag.  It mostly works, but still
> has some lurking issues that need to be flushed out. I'm unlikely to
> be posting 2-3 bug fix patches a day from this point onwards....
> 
> Indeed, the last two days I've been doing 3.11 development work
> because xfstests on CRC enabled filesystems is now usable for
> regression testing. i.e. the majority of the "easy-to-hit" problems
> have been flushed out by this patchset.
>
> > I'll be happy to review the 17 CRC related patches for inclusion
> > in the master branch ASAP,
> 
> That's a good start, but...
> 
> > but I'm afraid 17 patches is a bit much
> > to ask for a release candidate, even if it were -rc2.
> 
> ... I think you're being way too conservative here.
>
> BTRFS in january for an -rc merge:
> 
> https://lkml.org/lkml/2013/1/25/11
> 
> That was 30 commits and +300/-100 lines. There was a another pull
> request 2 weeks later with another 9 commits. The same thing
> happened with the RAID5/6 integration into btrfs in 3.9 - the -rc
> merges following it had 13, 7 and 15 commits. And for 3.10-rc, the
> first post-rc1 merge was 25 commits (+350/-250 lines).

You have announced publicly on several occasions that you feel BTRFS is
not ready for production use.  If it is true that BTRFS is not ready for
production, then the concerns for BTRFS are different than those for XFS
in this context.

> And if you want another example, the ext4 extent status cache fixes
> that went into 3.9-rc4:
> 
> https://lkml.org/lkml/2013/3/21/644
> 
> that was 21 commits and +540/-81 lines.

Was extent status cache experimental code when this was merged?

> Yup, new filesystem code has bugs, and they get fixed in -rc
> releases. What makes new XFS code so special we can't do this?
>
> The amount of change we are talking about here is not unreasonable
> even for an -rc4 release, as evidenced by other filesystems under
> development. Hence I don't think "it's too late" is really a viable
> reason for not fixing all these filesystem corruption issues in the
> CRC code...

The 6-8 weeks after -rc1 is not the time to be changing the ondisk
format or rewriting swaths of code we didn't get right the first time,
unless absolutely necessary.  Issues with an experimental feature that
are only relevant to the few developers and testers who have the
temerity to apply the userspace CRC patches do not merit the level of
risk which some of these patches represent to the much larger number of
XFS users who will not be using CRCs in 3.10.

> > Here we are in 3.11/feature-bit territory.
> 
> I'd much prefer that we don't have to add code to 3.11 to reject any
> CRC-enabled filesystem without any feature bits set because we don't
> support a broken remote attr format that was fixed weeks before 3.10
> released but was not allowed to be fixed in 3.10. That's just crazy
> from any release management perspective you care to look at it from.

So would I.
 
> Ben, if the problem is that you can't review all the fixes in a timely
> manner, then we can fix that. I'm sure that Mark, Eric and Brian can
> help review the code if this is the sticking point.

Reviews are always welcome...

> There are also
> people like Michael who are actively testing the CRC code and
> reporting bugs to me all the time, so there's demand for these fixes
> to be integrated into the 3.10 tree...

I bet Michael can grab sources from the master branch.

> Everyone knew there would be fixes for the CRC code coming along in
> the -rc period - we discussed it on the weekly concall before the the
> CRC code was merged into 3.10. Hence I'm a bit surprised to get what
> appears to be a blanket rejection for these fixes to the CRC code this
> early in the 3.10-rc cycle.

Three (or is it 4 now?) on-disk changes, one of which is completely unrelated
to CRCs, along with a bunch of the changes deep in the remote attribute, quota,
and inode unlinked list code is not worth the risk in the middle of release
candidates.  I'm ok with lower risk changes.
 
> Let's not waste all the effort that we put into getting the CRC code into
> 3.10 by not fixing known corruptions and on-disk format deficiencies. 

I don't consider it to be a waste of effort if CRCs weren't quite perfect for
3.10.  There is plenty to show for it, such as not having introduced
regressions for everyone else.  People who want to test CRCs will be coming to
the list for the userspace patches anyway.  It's easy enough to point them to
the master branch to get the kernel fixes too.

> Releasing CRC support in a seriously broken state that
> we need to add permanent workarounds for in 3.11 is about the worst
> possible outcome that could occur right now....

A worse outcome is that I pull in this code and something goes very
wrong for the thousands of users of 3.10 with existing non-crc XFS
filesystems.  A feature bit and some inconvenience for a few XFS
developers and testers is a safer choice.

Regards,
Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/9] xfs: fix split buffer vector log recovery support
  2013-05-27  6:38 ` [PATCH 5/9] xfs: fix split buffer vector log recovery support Dave Chinner
@ 2013-05-29 19:21   ` Mark Tinguely
  2013-05-30 17:49     ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Mark Tinguely @ 2013-05-29 19:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/13 01:38, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> A long time ago in a galaxy far away....
>
> .. the was a commit made to fix some ilinux specific "fragmented
> buffer" log recovery problem:
>
> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=b29c0bece51da72fb3ff3b61391a391ea54e1603
>
> That problem occurred when a contiguous dirty region of a buffer was
> split across across two pages of an unmapped buffer. It's been a
> long time since that has been done in XFS, and the changes to log
> the entire inode buffers for CRC enabled filesystems has
> re-introduced that corner case.
>
> And, of course, it turns out that the above commit didn't actually
> fix anything - it just ensured that log recovery is guaranteed to
> fail when this situation occurs. And now for the gory details.
>
...

Thanks for the great walk through of the problem.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATH 0/9] xfs: fixes for 3.10-rc4
  2013-05-29 19:01     ` Ben Myers
@ 2013-05-29 19:27       ` Eric Sandeen
  2013-05-29 19:45         ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Eric Sandeen @ 2013-05-29 19:27 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 5/29/13 2:01 PM, Ben Myers wrote:
> Hi Dave,
> 
> On Wed, May 29, 2013 at 09:54:24AM +1000, Dave Chinner wrote:

<giant snip>

>> I'd much prefer that we don't have to add code to 3.11 to reject any
>> CRC-enabled filesystem without any feature bits set because we don't
>> support a broken remote attr format that was fixed weeks before 3.10
>> released but was not allowed to be fixed in 3.10. That's just crazy
>> from any release management perspective you care to look at it from.
> 
> So would I.
>  
>> Ben, if the problem is that you can't review all the fixes in a timely
>> manner, then we can fix that. I'm sure that Mark, Eric and Brian can
>> help review the code if this is the sticking point.
> 
> Reviews are always welcome...

But it won't matter for the sake of this argument, sounds like?

<another snip>

> A worse outcome is that I pull in this code and something goes very
> wrong for the thousands of users of 3.10 with existing non-crc XFS
> filesystems.  A feature bit and some inconvenience for a few XFS
> developers and testers is a safer choice.

Your concern (rightly) seems to be stability for non-crc users, so:

I'll review these patches with a special eye towards if/how they
affect any non-crc codepaths.  If it's wholly contained in crc
code, you can merge them without fear.  Sound like a deal?

-Eric

> Regards,
> Ben
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATH 0/9] xfs: fixes for 3.10-rc4
  2013-05-29 19:27       ` Eric Sandeen
@ 2013-05-29 19:45         ` Ben Myers
  0 siblings, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-29 19:45 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Hey Eric,

On Wed, May 29, 2013 at 02:27:18PM -0500, Eric Sandeen wrote:
> On 5/29/13 2:01 PM, Ben Myers wrote:
> > Hi Dave,
> > 
> > On Wed, May 29, 2013 at 09:54:24AM +1000, Dave Chinner wrote:
> 
> <giant snip>
> 
> >> I'd much prefer that we don't have to add code to 3.11 to reject any
> >> CRC-enabled filesystem without any feature bits set because we don't
> >> support a broken remote attr format that was fixed weeks before 3.10
> >> released but was not allowed to be fixed in 3.10. That's just crazy
> >> from any release management perspective you care to look at it from.
> > 
> > So would I.
> >  
> >> Ben, if the problem is that you can't review all the fixes in a timely
> >> manner, then we can fix that. I'm sure that Mark, Eric and Brian can
> >> help review the code if this is the sticking point.
> > 
> > Reviews are always welcome...
> 
> But it won't matter for the sake of this argument, sounds like?

Reviews will certainly help...

> > A worse outcome is that I pull in this code and something goes very
> > wrong for the thousands of users of 3.10 with existing non-crc XFS
> > filesystems.  A feature bit and some inconvenience for a few XFS
> > developers and testers is a safer choice.
> 
> Your concern (rightly) seems to be stability for non-crc users, so:
> 
> I'll review these patches with a special eye towards if/how they
> affect any non-crc codepaths.  If it's wholly contained in crc
> code, you can merge them without fear.  Sound like a deal?

...but my primary concern is the content of the patches.

If we can show that a given patch is relevant, of low risk to non-crc users,
and has been adequately tested, I'm game.

Thanks,
Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems
  2013-05-27  6:38 ` [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
  2013-05-28 21:49   ` Ben Myers
@ 2013-05-29 21:06   ` Brian Foster
  2013-05-30 17:56     ` Ben Myers
  1 sibling, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-29 21:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/2013 02:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently, swapping extents from one inode to another is a simple
> act of switching data and attribute forks from one inode to another.
> This, unfortunately in no longer so simple with CRC enabled
> filesystems as there is owner information embedded into the BMBT
> blocks that are swapped between inodes. Hence swapping the forks
> between inodes results in the inodes having mapping blocks that
> point to the wrong owner and hence are considered corrupt.
> 
> To fix this we need an extent tree block or record based swap
> algorithm so that the BMBT block owner information can be updated
> atomically in the swap transaction. This is a significant piece of
> new work, so for the moment simply don't allow swap extent
> operations to succeed on CRC enabled filesystems.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Pretty straightforward...

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_dfrag.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
> index f852b08..c407e1c 100644
> --- a/fs/xfs/xfs_dfrag.c
> +++ b/fs/xfs/xfs_dfrag.c
> @@ -219,6 +219,14 @@ xfs_swap_extents(
>  	int		taforkblks = 0;
>  	__uint64_t	tmp;
>  
> +	/*
> +	 * We have no way of updating owner information in the BMBT blocks for
> +	 * each inode on CRC enabled filesystems, so to avoid corrupting the
> +	 * this metadata we simply don't allow extent swaps to occur.
> +	 */
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		return XFS_ERROR(EINVAL);
> +
>  	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
>  	if (!tempifp) {
>  		error = XFS_ERROR(ENOMEM);
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-29 15:10   ` Eric Sandeen
@ 2013-05-29 21:43     ` Ben Myers
  2013-05-29 21:47       ` Ben Myers
  2013-05-30  1:28       ` Dave Chinner
  2013-05-30  1:11     ` Dave Chinner
  1 sibling, 2 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-29 21:43 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Hi Eric,

On Wed, May 29, 2013 at 10:10:13AM -0500, Eric Sandeen wrote:
> On 5/27/13 1:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Currently userspace has no way of determining that a filesystem is
> > CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
> > indicate that the filesystem has v5 superblock support enabled.
> > This will allow xfs_info to correctly report the state of the
> > filesystem.
> 
> 
> Looks fine,
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> 
> Ben, having this in place for for the next point release will let
> userspace work & testing proceed w/o the need for a patched
> kernel... if you could consider pulling it in that'd be great.

Sounds reasonable.  I'll check it out.

> Dave, just out of curiosity, most other features sort of match between
> the "_has_*" and the flag names, is there a reason for the
> crc <-> sbv5 difference?  Just semantics, but just curious.
> 
> (i.e. xfs_sb_version_hasprojid32bit checks XFS_SB_VERSION2_PROJID32BIT,
> but xfs_sb_version_hascrc checks XFS_SB_VERSION_5)
> 
> Answering my own question maybe, I guess SB_VERSION_5 was conceived
> with crc already in place, so there's no need for a feature flag on
> top of the sb version, right...?

Seems like we're also out of space in xfs_fsop_geom.flags.  There may even be
people who prefer to use v5 super blocks without crcs turned on, so maybe
conflating the two ideas here is undesireable.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-29 21:43     ` Ben Myers
@ 2013-05-29 21:47       ` Ben Myers
  2013-05-30  1:28       ` Dave Chinner
  1 sibling, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-29 21:47 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Wed, May 29, 2013 at 04:43:01PM -0500, Ben Myers wrote:
> Hi Eric,
> 
> On Wed, May 29, 2013 at 10:10:13AM -0500, Eric Sandeen wrote:
> > On 5/27/13 1:38 AM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Currently userspace has no way of determining that a filesystem is
> > > CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
> > > indicate that the filesystem has v5 superblock support enabled.
> > > This will allow xfs_info to correctly report the state of the
> > > filesystem.
> > 
> > 
> > Looks fine,
> > 
> > Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> > 
> > Ben, having this in place for for the next point release will let
> > userspace work & testing proceed w/o the need for a patched
> > kernel... if you could consider pulling it in that'd be great.
> 
> Sounds reasonable.  I'll check it out.
> 
> > Dave, just out of curiosity, most other features sort of match between
> > the "_has_*" and the flag names, is there a reason for the
> > crc <-> sbv5 difference?  Just semantics, but just curious.
> > 
> > (i.e. xfs_sb_version_hasprojid32bit checks XFS_SB_VERSION2_PROJID32BIT,
> > but xfs_sb_version_hascrc checks XFS_SB_VERSION_5)
> > 
> > Answering my own question maybe, I guess SB_VERSION_5 was conceived
> > with crc already in place, so there's no need for a feature flag on
> > top of the sb version, right...?
> 
> Seems like we're also out of space in xfs_fsop_geom.flags.


> There may even be
> people who prefer to use v5 super blocks without crcs turned on, so maybe
> conflating the two ideas here is undesireable.

Nevermind.  That was silly.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 2/9] xfs: fix incorrect remote symlink block count
  2013-05-29 16:39   ` Brian Foster
@ 2013-05-30  0:46     ` Dave Chinner
  2013-05-30 17:49     ` Ben Myers
  1 sibling, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-30  0:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, May 29, 2013 at 12:39:33PM -0400, Brian Foster wrote:
> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When CRCs are enabled, the number of blocks needed to hold a remote
> > symlink on a 1k block size filesystem may be 2 instead of 1. The
> > transaction reservation for the allocated bloks was not taking this
> > into account and only allocating one block. hence when trying to
> > read or invalidate such symlinks, we are mapping a hole where there
> > should be a block and things go bad at that point.
> > 
> > Fix the reservation to use the correct block count, clean up the
> > block count calculation similar to the remote attribute calculation,
> > and add a debug guard to detect when we don't write the entire
> > symlink to disk.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> Unrelated question... I noticed we update di_size in xfs_symlink(). I
> presume this is safe because, even in the non-local case, we actually
> log the data (the path) in the buffers as well, right?

Yes, the remote symlink block allocation and contents is logged in
the same transaction so cwit is safe to update di_size directly and
log that, too.

Cheers,

Dave
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] xfs: rework dquot CRCs
  2013-05-29 18:58   ` Brian Foster
@ 2013-05-30  1:00     ` Dave Chinner
  2013-05-30 12:02       ` Brian Foster
  0 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-05-30  1:00 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, May 29, 2013 at 02:58:27PM -0400, Brian Foster wrote:
> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Calculating dquot CRCs when the backing buffer is written back just
> > doesn't work reliably. There are several places which manipulate
> > dquots directly in the buffers, and they don't calculate CRCs
> > appropriately, nor do they always set the buffer up to calculate
> > CRCs appropriately.
> > 
> > Firstly, if we log a dquot buffer (e.g. during allocation) it gets
> > logged without valid CRC, and so on recovery we end up with a dquot
> > that is not valid.
> > 
> > Secondly, if we recover/repair a dquot, we don't have a verifier
> > attached to the buffer and hence CRCs arenot calculate don the way
> > down to disk.
> > 
> > Thirdly, calculating the CRC after we've changed the contents means
> > that if we re-read the dquot from the buffer, we cannot verify the
> > contents of the dquot are valid, as the CRC is invalid.
> > 
> > So, to avoid all the dquot CRC errors that are being detected by the
> > read verifier, change to using the same model as for inodes. that
> > is, dquot CRCs are calculated and written to the backing buffer at
> > the time the dquot is flushed to the backing buffer. If we modify
> > the dquuot directly in the backing buffer, calculate the CRC
> > immediately after the modification is complete. Hence the dquot in
> > the on-disk buffer should always have a valid CRC.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_dquot.c       |   37 ++++++++++++++++---------------------
> >  fs/xfs/xfs_log_recover.c |   10 ++++++++++
> >  fs/xfs/xfs_qm.c          |   36 ++++++++++++++++++++++++++----------
> >  fs/xfs/xfs_quota.h       |    2 ++
> >  4 files changed, 54 insertions(+), 31 deletions(-)
> > 
> ...
> > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > index f41702b..d181542 100644
> > --- a/fs/xfs/xfs_qm.c
> > +++ b/fs/xfs/xfs_qm.c
> > @@ -41,6 +41,7 @@
> >  #include "xfs_qm.h"
> >  #include "xfs_trace.h"
> >  #include "xfs_icache.h"
> > +#include "xfs_cksum.h"
> >  
> >  /*
> >   * The global quota manager. There is only one of these for the entire
> > @@ -839,7 +840,7 @@ xfs_qm_reset_dqcounts(
> >  	xfs_dqid_t	id,
> >  	uint		type)
> >  {
> > -	xfs_disk_dquot_t	*ddq;
> > +	struct xfs_dqblk	*dqb;
> >  	int			j;
> >  
> >  	trace_xfs_reset_dqcounts(bp, _RET_IP_);
> > @@ -853,8 +854,12 @@ xfs_qm_reset_dqcounts(
> >  	do_div(j, sizeof(xfs_dqblk_t));
> >  	ASSERT(mp->m_quotainfo->qi_dqperchunk == j);
> >  #endif
> > -	ddq = bp->b_addr;
> > +	dqb = bp->b_addr;
> >  	for (j = 0; j < mp->m_quotainfo->qi_dqperchunk; j++) {
> > +		struct xfs_disk_dquot	*ddq;
> > +
> > +		ddq =  (struct xfs_disk_dquot *)&dqb[j];
> > +
> >  		/*
> >  		 * Do a sanity check, and if needed, repair the dqblk. Don't
> >  		 * output any warnings because it's perfectly possible to
> > @@ -871,7 +876,8 @@ xfs_qm_reset_dqcounts(
> >  		ddq->d_bwarns = 0;
> >  		ddq->d_iwarns = 0;
> >  		ddq->d_rtbwarns = 0;
> > -		ddq = (xfs_disk_dquot_t *) ((xfs_dqblk_t *)ddq + 1);
> > +		xfs_update_cksum((char *)&dqb[j], sizeof(struct xfs_dqblk),
> > +					 XFS_DQUOT_CRC_OFF);
> 
> Nice cleanup on the cast ugliness even without the crc change. Is there
> a technical reason for the unconditional crc update here beyond that
> we're doing a reset? I'm wondering if there's any value in leaving those
> bits untouched for a filesystem that might have never enabled crc
> (quotacheck or not).

The dquot might be zeroed and unused, but the buffer it sits in is
still allocated and valid. That means if we ever start using that
dquot again (either by quotacheck or a new uid/gid/prid), it will be
read straight out of the buffer rather than allocated, and hence the
constraint that allocated but unused dquots still need to have valid
CRCs.

FWIW, the dquot buffer read validates the CRC on all dquots in the
buffer when it comes off disk as it has no way of knowing what
dquots contain valid data or not. Same with the xfs_qm_dqcheck()
call - an unused dquot still needs to be a valid dquot to pass those
checks...

> FWIW, the rest of this patch looks sane to me (I'm less familiar with
> the log recovery code, but the changes seem isolated and
> straightforward) and I couldn't locate anywhere else we modify the
> backing buffer for the dquot.

Right, apart from dquot allocation and flushing, there aren't any.
Resetting the dquots to zero before a quota check is a special case.
Doing it via the buffer avoids needing to initialise all the dquots
in memory that *might* be tracked by the quota file. But we don't
know what quota IDs are tracked in the quota file with first mapping
the quota file and finding all the offsets that contain blocks. And,
well, once we have that mapping, why would be do N xfs_qm_dqget()
calls per buffer to get initialised, cached dquots from the buffer
when we can simply RMW the buffers we've mapped directly?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems
  2013-05-28 21:49   ` Ben Myers
@ 2013-05-30  1:07     ` Dave Chinner
  0 siblings, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-30  1:07 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, May 28, 2013 at 04:49:31PM -0500, Ben Myers wrote:
> On Mon, May 27, 2013 at 04:38:24PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Currently, swapping extents from one inode to another is a simple
> > act of switching data and attribute forks from one inode to another.
> > This, unfortunately in no longer so simple with CRC enabled
> > filesystems as there is owner information embedded into the BMBT
> > blocks that are swapped between inodes. Hence swapping the forks
> > between inodes results in the inodes having mapping blocks that
> > point to the wrong owner and hence are considered corrupt.
> > 
> > To fix this we need an extent tree block or record based swap
> > algorithm so that the BMBT block owner information can be updated
> > atomically in the swap transaction. This is a significant piece of
> > new work, so for the moment simply don't allow swap extent
> > operations to succeed on CRC enabled filesystems.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> I believe we do want to have functional swap extents for crc enabled
> filesystems.

Of course.

> But this is fine as long as it is temporary.

It is.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-29 15:10   ` Eric Sandeen
  2013-05-29 21:43     ` Ben Myers
@ 2013-05-30  1:11     ` Dave Chinner
  1 sibling, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-30  1:11 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Wed, May 29, 2013 at 10:10:13AM -0500, Eric Sandeen wrote:
> On 5/27/13 1:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Currently userspace has no way of determining that a filesystem is
> > CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
> > indicate that the filesystem has v5 superblock support enabled.
> > This will allow xfs_info to correctly report the state of the
> > filesystem.
> 
> 
> Looks fine,
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> 
> Ben, having this in place for for the next point release will let
> userspace work & testing proceed w/o the need for a patched
> kernel... if you could consider pulling it in that'd be great.
> 
> Dave, just out of curiosity, most other features sort of match between
> the "_has_*" and the flag names, is there a reason for the
> crc <-> sbv5 difference?  Just semantics, but just curious.
> 
> (i.e. xfs_sb_version_hasprojid32bit checks XFS_SB_VERSION2_PROJID32BIT,
> but xfs_sb_version_hascrc checks XFS_SB_VERSION_5)
> 
> Answering my own question maybe, I guess SB_VERSION_5 was conceived
> with crc already in place, so there's no need for a feature flag on
> top of the sb version, right...?

Exactly. New features that require feature flags will end up
following the flag/function name convention, but it's not necessary
in this case because V5 sb = CRCs enabled.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-29 21:43     ` Ben Myers
  2013-05-29 21:47       ` Ben Myers
@ 2013-05-30  1:28       ` Dave Chinner
  1 sibling, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-30  1:28 UTC (permalink / raw)
  To: Ben Myers; +Cc: Eric Sandeen, xfs

On Wed, May 29, 2013 at 04:43:01PM -0500, Ben Myers wrote:
> Hi Eric,
> 
> On Wed, May 29, 2013 at 10:10:13AM -0500, Eric Sandeen wrote:
> > On 5/27/13 1:38 AM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Currently userspace has no way of determining that a filesystem is
> > > CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
> > > indicate that the filesystem has v5 superblock support enabled.
> > > This will allow xfs_info to correctly report the state of the
> > > filesystem.
> > 
> > 
> > Looks fine,
> > 
> > Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> > 
> > Ben, having this in place for for the next point release will let
> > userspace work & testing proceed w/o the need for a patched
> > kernel... if you could consider pulling it in that'd be great.
> 
> Sounds reasonable.  I'll check it out.
> 
> > Dave, just out of curiosity, most other features sort of match between
> > the "_has_*" and the flag names, is there a reason for the
> > crc <-> sbv5 difference?  Just semantics, but just curious.
> > 
> > (i.e. xfs_sb_version_hasprojid32bit checks XFS_SB_VERSION2_PROJID32BIT,
> > but xfs_sb_version_hascrc checks XFS_SB_VERSION_5)
> > 
> > Answering my own question maybe, I guess SB_VERSION_5 was conceived
> > with crc already in place, so there's no need for a feature flag on
> > top of the sb version, right...?
> 
> Seems like we're also out of space in xfs_fsop_geom.flags. 

Nowhere near it, actually ;). flags is a __u32, this is only the
16th flag.

> There may even be
> people who prefer to use v5 super blocks without crcs turned on, so maybe
> conflating the two ideas here is undesireable.

The flag is indicating that there is a different format on disk, not
that crcs are enabled or not. Userspace needs to know about that
different format, and right now *userspace* assumes v5 superblocks
mean CRCs are enabled because that's part of the definition of the
features that a v5 superblock has.

If in future that changes (hint: it won't) then we can add a
separate flag to indicate whether CRCs are enabled or not when the
feature flag to disable them is added to the superblock.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] xfs: rework dquot CRCs
  2013-05-30  1:00     ` Dave Chinner
@ 2013-05-30 12:02       ` Brian Foster
  2013-06-03  4:12         ` Dave Chinner
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-30 12:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/29/2013 09:00 PM, Dave Chinner wrote:
> On Wed, May 29, 2013 at 02:58:27PM -0400, Brian Foster wrote:
>> On 05/27/2013 02:38 AM, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> Calculating dquot CRCs when the backing buffer is written back just
>>> doesn't work reliably. There are several places which manipulate
>>> dquots directly in the buffers, and they don't calculate CRCs
>>> appropriately, nor do they always set the buffer up to calculate
>>> CRCs appropriately.
>>>
>>> Firstly, if we log a dquot buffer (e.g. during allocation) it gets
>>> logged without valid CRC, and so on recovery we end up with a dquot
>>> that is not valid.
>>>
>>> Secondly, if we recover/repair a dquot, we don't have a verifier
>>> attached to the buffer and hence CRCs arenot calculate don the way
>>> down to disk.
>>>
>>> Thirdly, calculating the CRC after we've changed the contents means
>>> that if we re-read the dquot from the buffer, we cannot verify the
>>> contents of the dquot are valid, as the CRC is invalid.
>>>
>>> So, to avoid all the dquot CRC errors that are being detected by the
>>> read verifier, change to using the same model as for inodes. that
>>> is, dquot CRCs are calculated and written to the backing buffer at
>>> the time the dquot is flushed to the backing buffer. If we modify
>>> the dquuot directly in the backing buffer, calculate the CRC
>>> immediately after the modification is complete. Hence the dquot in
>>> the on-disk buffer should always have a valid CRC.
>>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>>> ---
>>>  fs/xfs/xfs_dquot.c       |   37 ++++++++++++++++---------------------
>>>  fs/xfs/xfs_log_recover.c |   10 ++++++++++
>>>  fs/xfs/xfs_qm.c          |   36 ++++++++++++++++++++++++++----------
>>>  fs/xfs/xfs_quota.h       |    2 ++
>>>  4 files changed, 54 insertions(+), 31 deletions(-)
>>>
>> ...
>>> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
>>> index f41702b..d181542 100644
>>> --- a/fs/xfs/xfs_qm.c
>>> +++ b/fs/xfs/xfs_qm.c
>>> @@ -41,6 +41,7 @@
>>>  #include "xfs_qm.h"
>>>  #include "xfs_trace.h"
>>>  #include "xfs_icache.h"
>>> +#include "xfs_cksum.h"
>>>  
>>>  /*
>>>   * The global quota manager. There is only one of these for the entire
>>> @@ -839,7 +840,7 @@ xfs_qm_reset_dqcounts(
>>>  	xfs_dqid_t	id,
>>>  	uint		type)
>>>  {
>>> -	xfs_disk_dquot_t	*ddq;
>>> +	struct xfs_dqblk	*dqb;
>>>  	int			j;
>>>  
>>>  	trace_xfs_reset_dqcounts(bp, _RET_IP_);
>>> @@ -853,8 +854,12 @@ xfs_qm_reset_dqcounts(
>>>  	do_div(j, sizeof(xfs_dqblk_t));
>>>  	ASSERT(mp->m_quotainfo->qi_dqperchunk == j);
>>>  #endif
>>> -	ddq = bp->b_addr;
>>> +	dqb = bp->b_addr;
>>>  	for (j = 0; j < mp->m_quotainfo->qi_dqperchunk; j++) {
>>> +		struct xfs_disk_dquot	*ddq;
>>> +
>>> +		ddq =  (struct xfs_disk_dquot *)&dqb[j];
>>> +
>>>  		/*
>>>  		 * Do a sanity check, and if needed, repair the dqblk. Don't
>>>  		 * output any warnings because it's perfectly possible to
>>> @@ -871,7 +876,8 @@ xfs_qm_reset_dqcounts(
>>>  		ddq->d_bwarns = 0;
>>>  		ddq->d_iwarns = 0;
>>>  		ddq->d_rtbwarns = 0;
>>> -		ddq = (xfs_disk_dquot_t *) ((xfs_dqblk_t *)ddq + 1);
>>> +		xfs_update_cksum((char *)&dqb[j], sizeof(struct xfs_dqblk),
>>> +					 XFS_DQUOT_CRC_OFF);
>>
>> Nice cleanup on the cast ugliness even without the crc change. Is there
>> a technical reason for the unconditional crc update here beyond that
>> we're doing a reset? I'm wondering if there's any value in leaving those
>> bits untouched for a filesystem that might have never enabled crc
>> (quotacheck or not).
> 
> The dquot might be zeroed and unused, but the buffer it sits in is
> still allocated and valid. That means if we ever start using that
> dquot again (either by quotacheck or a new uid/gid/prid), it will be
> read straight out of the buffer rather than allocated, and hence the
> constraint that allocated but unused dquots still need to have valid
> CRCs.
> 

The constraint makes sense when CRCs are enabled...

> FWIW, the dquot buffer read validates the CRC on all dquots in the
> buffer when it comes off disk as it has no way of knowing what
> dquots contain valid data or not. Same with the xfs_qm_dqcheck()
> call - an unused dquot still needs to be a valid dquot to pass those
> checks...
> 

Yeah, that part makes sense. I've followed through and grokked most of
the dquot buffer read and dquot CRC validation code, I think.

My question is more why is the code above (in xfs_qm_reset_dqcounts())
not the following?

	if (xfs_sb_version_hascrc(&mp->m_sb))
		xfs_update_cksum(...);

... because it currently looks like that if CRCs are not enabled, you're
writing what is effectively a padded area (in terms of the semantics of
the on-disk structure, not necessarily the kernel code). It was never a
valid CRC and won't be as soon as the dquot is used again, no?

>> FWIW, the rest of this patch looks sane to me (I'm less familiar with
>> the log recovery code, but the changes seem isolated and
>> straightforward) and I couldn't locate anywhere else we modify the
>> backing buffer for the dquot.
> 
> Right, apart from dquot allocation and flushing, there aren't any.
> Resetting the dquots to zero before a quota check is a special case.
> Doing it via the buffer avoids needing to initialise all the dquots
> in memory that *might* be tracked by the quota file. But we don't
> know what quota IDs are tracked in the quota file with first mapping
> the quota file and finding all the offsets that contain blocks. And,
> well, once we have that mapping, why would be do N xfs_qm_dqget()
> calls per buffer to get initialised, cached dquots from the buffer
> when we can simply RMW the buffers we've mapped directly?
> 

I walked through a bit of the quota check code and that makes sense.
Thanks for the explanation.

Brian

> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 7/9] xfs: kill suid/sgid through the truncate path.
  2013-05-27  6:38 ` [PATCH 7/9] xfs: kill suid/sgid through the truncate path Dave Chinner
@ 2013-05-30 14:17   ` Brian Foster
  2013-05-30 15:52     ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-30 14:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/2013 02:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> XFS has failed to kill suid/sgid bits correctly when truncating
> files of non-zero size since commit c4ed4243 ("xfs: split
> xfs_setattr") introduced in the 3.1 kernel. Fix it.
> 

The code makes sense and I can easily hit an assert when truncating
(extending) a suid file on a debug kernel without this patch (and I see
the suid dropped with the patch).

The changelog seems slightly misleading (implying any truncate of a zero
sized file should work, when really only 0->0 size truncates work), but
not a big deal...

Reviewed-by: Brian Foster <bfoster@redhat.com>

> Fix it.
> 
> cc: stable kernel <stable@vger.kernel.org>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_iops.c |   47 ++++++++++++++++++++++++++++++++---------------
>  1 file changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index d82efaa..ca9ecaa 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -455,6 +455,28 @@ xfs_vn_getattr(
>  	return 0;
>  }
>  
> +static void
> +xfs_setattr_mode(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	struct iattr		*iattr)
> +{
> +	struct inode	*inode = VFS_I(ip);
> +	umode_t		mode = iattr->ia_mode;
> +
> +	ASSERT(tp);
> +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> +
> +	if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> +		mode &= ~S_ISGID;
> +
> +	ip->i_d.di_mode &= S_IFMT;
> +	ip->i_d.di_mode |= mode & ~S_IFMT;
> +
> +	inode->i_mode &= S_IFMT;
> +	inode->i_mode |= mode & ~S_IFMT;
> +}
> +
>  int
>  xfs_setattr_nonsize(
>  	struct xfs_inode	*ip,
> @@ -606,18 +628,8 @@ xfs_setattr_nonsize(
>  	/*
>  	 * Change file access modes.
>  	 */
> -	if (mask & ATTR_MODE) {
> -		umode_t mode = iattr->ia_mode;
> -
> -		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> -			mode &= ~S_ISGID;
> -
> -		ip->i_d.di_mode &= S_IFMT;
> -		ip->i_d.di_mode |= mode & ~S_IFMT;
> -
> -		inode->i_mode &= S_IFMT;
> -		inode->i_mode |= mode & ~S_IFMT;
> -	}
> +	if (mask & ATTR_MODE)
> +		xfs_setattr_mode(tp, ip, iattr);
>  
>  	/*
>  	 * Change file access or modified times.
> @@ -714,9 +726,8 @@ xfs_setattr_size(
>  		return XFS_ERROR(error);
>  
>  	ASSERT(S_ISREG(ip->i_d.di_mode));
> -	ASSERT((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
> -			ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
> -			ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
> +	ASSERT((mask & (ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
> +			ATTR_MTIME_SET|ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
>  
>  	if (!(flags & XFS_ATTR_NOLOCK)) {
>  		lock_flags |= XFS_IOLOCK_EXCL;
> @@ -860,6 +871,12 @@ xfs_setattr_size(
>  		xfs_inode_clear_eofblocks_tag(ip);
>  	}
>  
> +	/*
> +	 * Change file access modes.
> +	 */
> +	if (mask & ATTR_MODE)
> +		xfs_setattr_mode(tp, ip, iattr);
> +
>  	if (mask & ATTR_CTIME) {
>  		inode->i_ctime = iattr->ia_ctime;
>  		ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-27  6:38 ` [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support Dave Chinner
  2013-05-29 15:10   ` Eric Sandeen
@ 2013-05-30 14:17   ` Brian Foster
  2013-05-30 17:57     ` Ben Myers
  1 sibling, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-30 14:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/27/2013 02:38 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently userspace has no way of determining that a filesystem is
> CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
> indicate that the filesystem has v5 superblock support enabled.
> This will allow xfs_info to correctly report the state of the
> filesystem.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_fs.h    |    1 +
>  fs/xfs/xfs_fsops.c |    4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
> index 6dda3f9..d046955 100644
> --- a/fs/xfs/xfs_fs.h
> +++ b/fs/xfs/xfs_fs.h
> @@ -236,6 +236,7 @@ typedef struct xfs_fsop_resblks {
>  #define XFS_FSOP_GEOM_FLAGS_PROJID32	0x0800  /* 32-bit project IDs	*/
>  #define XFS_FSOP_GEOM_FLAGS_DIRV2CI	0x1000	/* ASCII only CI names	*/
>  #define XFS_FSOP_GEOM_FLAGS_LAZYSB	0x4000	/* lazy superblock counters */
> +#define XFS_FSOP_GEOM_FLAGS_V5SB	0x8000	/* version 5 superblock */
>  
>  
>  /*
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 87595b2..3c3644e 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -99,7 +99,9 @@ xfs_fs_geometry(
>  			(xfs_sb_version_hasattr2(&mp->m_sb) ?
>  				XFS_FSOP_GEOM_FLAGS_ATTR2 : 0) |
>  			(xfs_sb_version_hasprojid32bit(&mp->m_sb) ?
> -				XFS_FSOP_GEOM_FLAGS_PROJID32 : 0);
> +				XFS_FSOP_GEOM_FLAGS_PROJID32 : 0) |
> +			(xfs_sb_version_hascrc(&mp->m_sb) ?
> +				XFS_FSOP_GEOM_FLAGS_V5SB : 0);
>  		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
>  				mp->m_sb.sb_logsectsize : BBSIZE;
>  		geo->rtsectsize = mp->m_sb.sb_blocksize;
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 2/2] xfs: inode unlinked list needs to recalculate the inode CRC
  2013-05-28 20:36     ` [PATCH 2/2] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
@ 2013-05-30 14:17       ` Brian Foster
  2013-05-30 20:27         ` Dave Chinner
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-30 14:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: bpm, xfs

On 05/28/2013 04:36 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The inode unlinked list manipulations operate directly on the inode
> buffer, and so bypass the inode CRC calculation mechanisms. Hence an
> inode on the unlinked list has an invalid CRC. Fix this by
> recalculating the CRC whenever we modify an unlinked list pointer in
> an inode, ncluding during log recovery. This is trivial to do and
> results in  unlinked list operations always leaving a consistent
> inode in the buffer.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode.c       |   42 ++++++++++++++++++++++++++++++++++++++----
>  fs/xfs/xfs_log_recover.c |    9 +++++++++
>  2 files changed, 47 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index efbe1ac..2d993e7 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1579,6 +1579,23 @@ out_bmap_cancel:
>  }
>  
>  /*
> + * Helper function to calculate range of the inode to log and recalculate the
> + * on-disk inode crc if necessary.
> + */
> +static int
> +xfs_iunlink_dinode_calc_crc(
> +	struct xfs_mount	*mp,
> +	struct xfs_dinode	*dip)
> +{
> +	if (dip->di_version < 3)
> +		return sizeof(xfs_agino_t);
> +
> +	xfs_dinode_calc_crc(mp, dip);
> +	return offsetof(struct xfs_dinode, di_changecount) -
> +		offsetof(struct xfs_dinode, di_next_unlinked);
> +}
> +

So we've added a new helper, the return value for which is either the
size of an inode number or an inode number + crc, depending on format. I
also notice that the return value doesn't appear to be used anywhere
this helper is called.

> +/*
>   * This is called when the inode's link count goes to 0.
>   * We place the on-disk inode on a list in the AGI.  It
>   * will be pulled from this list when the inode is freed.
> @@ -1638,10 +1655,15 @@ xfs_iunlink(
>  		dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
>  		offset = ip->i_imap.im_boffset +
>  			offsetof(xfs_dinode_t, di_next_unlinked);
> +
> +		/* need to recalc the inode CRC if appropriate */
> +		xfs_iunlink_dinode_calc_crc(mp, dip);
> +
>  		xfs_trans_inode_buf(tp, ibp);
>  		xfs_trans_log_buf(tp, ibp, offset,
> -				  (offset + sizeof(xfs_agino_t) - 1));
> +				  offset + sizeof(xfs_agino_t) - 1);
>  		xfs_inobp_check(mp, ibp);
> +
>  	}

So IIUC, offset is set to the offset of the di_next_unlinked value of
this inode in the backing buffer of the inode (which we've just updated
directly via dip).

We call the helper to update the CRC then call xfs_trans_log_buf() to
log a range of the buffer. From the original code, I surmise that we're
logging the range that represents the di_next_unlinked value and from
the addition of the helper, I surmise we intend to now include the crc
in that logged region.

But we haven't utilized the return value and I'm speculating on the
intent here. So I see that we're updating the CRC, but is it actually
logged? Perhaps I'm missing something, but if so, then why even have the
xfs_iunlink_dinode_calc_crc() helper?

/me goes back to read the original 9/9 and followup:

http://oss.sgi.com/archives/xfs/2013-05/msg00867.html

OK, so in that case, perhaps the helper is now unnecessary and we could
just call xfs_dinode_calc_crc()?

BTW, I was also going to ask here whether the fact that we update the
CRC on recovery rather than logging it exposed items in the log to risk
if they happened to become corrupted before that update occurs, but
IIUC, we're still protected in that recovery itself should validate the
existing on-disk CRC prior to the update. Correct?

Brian

>  
>  	/*
> @@ -1723,9 +1745,13 @@ xfs_iunlink_remove(
>  			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
>  			offset = ip->i_imap.im_boffset +
>  				offsetof(xfs_dinode_t, di_next_unlinked);
> +
> +			/* need to recalc the inode CRC if appropriate */
> +			xfs_iunlink_dinode_calc_crc(mp, dip);
> +
>  			xfs_trans_inode_buf(tp, ibp);
>  			xfs_trans_log_buf(tp, ibp, offset,
> -					  (offset + sizeof(xfs_agino_t) - 1));
> +					  offset + sizeof(xfs_agino_t) - 1);
>  			xfs_inobp_check(mp, ibp);
>  		} else {
>  			xfs_trans_brelse(tp, ibp);
> @@ -1796,9 +1822,13 @@ xfs_iunlink_remove(
>  			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
>  			offset = ip->i_imap.im_boffset +
>  				offsetof(xfs_dinode_t, di_next_unlinked);
> +
> +			/* need to recalc the inode CRC if appropriate */
> +			xfs_iunlink_dinode_calc_crc(mp, dip);
> +
>  			xfs_trans_inode_buf(tp, ibp);
>  			xfs_trans_log_buf(tp, ibp, offset,
> -					  (offset + sizeof(xfs_agino_t) - 1));
> +					  offset + sizeof(xfs_agino_t) - 1);
>  			xfs_inobp_check(mp, ibp);
>  		} else {
>  			xfs_trans_brelse(tp, ibp);
> @@ -1809,9 +1839,13 @@ xfs_iunlink_remove(
>  		last_dip->di_next_unlinked = cpu_to_be32(next_agino);
>  		ASSERT(next_agino != 0);
>  		offset = last_offset + offsetof(xfs_dinode_t, di_next_unlinked);
> +
> +		/* need to recalc the inode CRC if appropriate */
> +		xfs_iunlink_dinode_calc_crc(mp, dip);
> +
>  		xfs_trans_inode_buf(tp, last_ibp);
>  		xfs_trans_log_buf(tp, last_ibp, offset,
> -				  (offset + sizeof(xfs_agino_t) - 1));
> +				  offset + sizeof(xfs_agino_t) - 1);
>  		xfs_inobp_check(mp, last_ibp);
>  	}
>  	return 0;
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 83088d9..45a85ff 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1912,6 +1912,15 @@ xlog_recover_do_inode_buffer(
>  		buffer_nextp = (xfs_agino_t *)xfs_buf_offset(bp,
>  					      next_unlinked_offset);
>  		*buffer_nextp = *logged_nextp;
> +
> +		/*
> +		 * If necessary, recalculate the CRC in the on-disk inode. We
> +		 * have to leave the inode in a consistent state for whoever
> +		 * reads it next....
> +		 */
> +		xfs_dinode_calc_crc(mp, (struct xfs_dinode *)
> +				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
> +
>  	}
>  
>  	return 0;
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 7/9] xfs: kill suid/sgid through the truncate path.
  2013-05-30 14:17   ` Brian Foster
@ 2013-05-30 15:52     ` Ben Myers
  2013-05-30 16:02       ` Brian Foster
  0 siblings, 1 reply; 54+ messages in thread
From: Ben Myers @ 2013-05-30 15:52 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

Hey Brian,

On Thu, May 30, 2013 at 10:17:30AM -0400, Brian Foster wrote:
> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > XFS has failed to kill suid/sgid bits correctly when truncating
> > files of non-zero size since commit c4ed4243 ("xfs: split
> > xfs_setattr") introduced in the 3.1 kernel. Fix it.
> > 
> 
> The code makes sense and I can easily hit an assert when truncating
> (extending) a suid file on a debug kernel without this patch (and I see
> the suid dropped with the patch).

What commands did you use?  It seems like this is dealing with S_ISGID, correct?

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 7/9] xfs: kill suid/sgid through the truncate path.
  2013-05-30 15:52     ` Ben Myers
@ 2013-05-30 16:02       ` Brian Foster
  2013-05-30 17:07         ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Foster @ 2013-05-30 16:02 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 05/30/2013 11:52 AM, Ben Myers wrote:
> Hey Brian,
> 
> On Thu, May 30, 2013 at 10:17:30AM -0400, Brian Foster wrote:
>> On 05/27/2013 02:38 AM, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> XFS has failed to kill suid/sgid bits correctly when truncating
>>> files of non-zero size since commit c4ed4243 ("xfs: split
>>> xfs_setattr") introduced in the 3.1 kernel. Fix it.
>>>
>>
>> The code makes sense and I can easily hit an assert when truncating
>> (extending) a suid file on a debug kernel without this patch (and I see
>> the suid dropped with the patch).
> 
> What commands did you use?  It seems like this is dealing with S_ISGID, correct?
> 

Hi Ben,

Yeah, that confused me at first as well. I believe the vfs interprets
the ATTR_KILL_SUID/SGIT bits prior to the setattr call and wipes out the
associated mode bits if necessary.

What I did was basically create a zero sized file as root, chmod to a+s
and a+rwx and then as a regular user, truncate that file to something
larger than zero. Without the patch I hit the assert and with the patch
the assert doesn't fire and the setuid bit is dropped.

Brian

> Thanks,
> 	Ben
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 7/9] xfs: kill suid/sgid through the truncate path.
  2013-05-30 16:02       ` Brian Foster
@ 2013-05-30 17:07         ` Ben Myers
  0 siblings, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-30 17:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, May 30, 2013 at 12:02:40PM -0400, Brian Foster wrote:
> On 05/30/2013 11:52 AM, Ben Myers wrote:
> > Hey Brian,
> > 
> > On Thu, May 30, 2013 at 10:17:30AM -0400, Brian Foster wrote:
> >> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> >>> From: Dave Chinner <dchinner@redhat.com>
> >>>
> >>> XFS has failed to kill suid/sgid bits correctly when truncating
> >>> files of non-zero size since commit c4ed4243 ("xfs: split
> >>> xfs_setattr") introduced in the 3.1 kernel. Fix it.
> >>>
> >>
> >> The code makes sense and I can easily hit an assert when truncating
> >> (extending) a suid file on a debug kernel without this patch (and I see
> >> the suid dropped with the patch).
> > 
> > What commands did you use?  It seems like this is dealing with S_ISGID, correct?
> > 
> 
> Hi Ben,
> 
> Yeah, that confused me at first as well. I believe the vfs interprets
> the ATTR_KILL_SUID/SGIT bits prior to the setattr call and wipes out the
> associated mode bits if necessary.
> 
> What I did was basically create a zero sized file as root, chmod to a+s
> and a+rwx and then as a regular user, truncate that file to something
> larger than zero. Without the patch I hit the assert and with the patch
> the assert doesn't fire and the setuid bit is dropped.

Perfect, thanks.  I'm able to hit this on 3.10-rc3, but not 3.10-rc1.
Interesting.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] xfs: don't emit v5 superblock warnings on write
  2013-05-29 16:39   ` Brian Foster
@ 2013-05-30 17:49     ` Ben Myers
  2013-06-11  6:05       ` Dave Chinner
  0 siblings, 1 reply; 54+ messages in thread
From: Ben Myers @ 2013-05-30 17:49 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, May 29, 2013 at 12:39:11PM -0400, Brian Foster wrote:
> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We write the superblock every 30s or so which results in the
> > verifier being called. Right now that results in this output
> > every 30s:
> > 
> > XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> > Use of these features in this kernel is at your own risk!
> > 
> > And spamming the logs.
> > 
> > We don't need to check for whether we support v5 superblocks or
> > whether there are feature bits we don't support set as these are
> > only relevant when we first mount the filesytem. i.e. on superblock
> > read. Hence for the write verification we can just skip all the
> > checks (and hence verbose output) altogether.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Applied.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 2/9] xfs: fix incorrect remote symlink block count
  2013-05-29 16:39   ` Brian Foster
  2013-05-30  0:46     ` Dave Chinner
@ 2013-05-30 17:49     ` Ben Myers
  1 sibling, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-30 17:49 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, May 29, 2013 at 12:39:33PM -0400, Brian Foster wrote:
> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When CRCs are enabled, the number of blocks needed to hold a remote
> > symlink on a 1k block size filesystem may be 2 instead of 1. The
> > transaction reservation for the allocated bloks was not taking this
> > into account and only allocating one block. hence when trying to
> > read or invalidate such symlinks, we are mapping a hole where there
> > should be a block and things go bad at that point.
> > 
> > Fix the reservation to use the correct block count, clean up the
> > block count calculation similar to the remote attribute calculation,
> > and add a debug guard to detect when we don't write the entire
> > symlink to disk.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Applied.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/9] xfs: fix split buffer vector log recovery support
  2013-05-29 19:21   ` Mark Tinguely
@ 2013-05-30 17:49     ` Ben Myers
  0 siblings, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-30 17:49 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Wed, May 29, 2013 at 02:21:52PM -0500, Mark Tinguely wrote:
> On 05/27/13 01:38, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >A long time ago in a galaxy far away....
> >
> >.. the was a commit made to fix some ilinux specific "fragmented
> >buffer" log recovery problem:
> >
> >http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=b29c0bece51da72fb3ff3b61391a391ea54e1603
> >
> >That problem occurred when a contiguous dirty region of a buffer was
> >split across across two pages of an unmapped buffer. It's been a
> >long time since that has been done in XFS, and the changes to log
> >the entire inode buffers for CRC enabled filesystems has
> >re-introduced that corner case.
> >
> >And, of course, it turns out that the above commit didn't actually
> >fix anything - it just ensured that log recovery is guaranteed to
> >fail when this situation occurs. And now for the gory details.
> >
> ...
> 
> Thanks for the great walk through of the problem.
> 
> Reviewed-by: Mark Tinguely <tinguely@sgi.com>

Applied.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems
  2013-05-29 21:06   ` Brian Foster
@ 2013-05-30 17:56     ` Ben Myers
  0 siblings, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-30 17:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, May 29, 2013 at 05:06:08PM -0400, Brian Foster wrote:
> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Currently, swapping extents from one inode to another is a simple
> > act of switching data and attribute forks from one inode to another.
> > This, unfortunately in no longer so simple with CRC enabled
> > filesystems as there is owner information embedded into the BMBT
> > blocks that are swapped between inodes. Hence swapping the forks
> > between inodes results in the inodes having mapping blocks that
> > point to the wrong owner and hence are considered corrupt.
> > 
> > To fix this we need an extent tree block or record based swap
> > algorithm so that the BMBT block owner information can be updated
> > atomically in the swap transaction. This is a significant piece of
> > new work, so for the moment simply don't allow swap extent
> > operations to succeed on CRC enabled filesystems.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Pretty straightforward...
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Applied.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support.
  2013-05-30 14:17   ` Brian Foster
@ 2013-05-30 17:57     ` Ben Myers
  0 siblings, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-30 17:57 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, May 30, 2013 at 10:17:39AM -0400, Brian Foster wrote:
> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Currently userspace has no way of determining that a filesystem is
> > CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
> > indicate that the filesystem has v5 superblock support enabled.
> > This will allow xfs_info to correctly report the state of the
> > filesystem.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Applied.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 10/9] xfs: fix dir3 freespace block corruption
  2013-05-28  8:37 ` [PATCH 10/9] xfs: fix dir3 freespace block corruption Dave Chinner
@ 2013-05-30 19:15   ` Ben Myers
  2013-05-31 21:54     ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Ben Myers @ 2013-05-30 19:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 28, 2013 at 06:37:17PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When the directory freespace index grows to a second block (2017
> 4k data blocks in the directory), the initialisation of the second
> new block header goes wrong. The write verifier fires a corruption
> error indicating that the block number in the header is zero. This
> was being tripped by xfs/110.
> 
> The problem is that the initialisation of the new block is done just
> fine in xfs_dir3_free_get_buf(), but the caller then users a dirv2
> structure to zero on-disk header fields that xfs_dir3_free_get_buf()
> has already zeroed. These lined up with the block number in the dir
> v3 header format.
> 
> While looking at this, I noticed that the struct xfs_dir3_free_hdr()
> had 4 bytes of padding in it that wasn't defined as padding or being
> zeroed by the initialisation. Add a pad field declaration and fully
> zero the on disk and in-core headers in xfs_dir3_free_get_buf() so
> that this is never an issue in the future. Note that this doesn't
> change the on-disk layout, just makes the 32 bits of padding in the
> layout explicit.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_dir2_format.h |    1 +
>  fs/xfs/xfs_dir2_node.c   |   13 ++++++-------
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_dir2_format.h b/fs/xfs/xfs_dir2_format.h
> index a3b1bd8..995f1f5 100644
> --- a/fs/xfs/xfs_dir2_format.h
> +++ b/fs/xfs/xfs_dir2_format.h
> @@ -715,6 +715,7 @@ struct xfs_dir3_free_hdr {
>  	__be32			firstdb;	/* db of first entry */
>  	__be32			nvalid;		/* count of valid entries */
>  	__be32			nused;		/* count of used entries */
> +	__be32			pad;		/* 64 bit alignment. */

Yeah, my count also puts nused short of 64 bit alignment.  Looks ok.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 2/2] xfs: inode unlinked list needs to recalculate the inode CRC
  2013-05-30 14:17       ` Brian Foster
@ 2013-05-30 20:27         ` Dave Chinner
  0 siblings, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-05-30 20:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: bpm, xfs

On Thu, May 30, 2013 at 10:17:47AM -0400, Brian Foster wrote:
> On 05/28/2013 04:36 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The inode unlinked list manipulations operate directly on the inode
> > buffer, and so bypass the inode CRC calculation mechanisms. Hence an
> > inode on the unlinked list has an invalid CRC. Fix this by
> > recalculating the CRC whenever we modify an unlinked list pointer in
> > an inode, ncluding during log recovery. This is trivial to do and
> > results in  unlinked list operations always leaving a consistent
> > inode in the buffer.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_inode.c       |   42 ++++++++++++++++++++++++++++++++++++++----
> >  fs/xfs/xfs_log_recover.c |    9 +++++++++
> >  2 files changed, 47 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index efbe1ac..2d993e7 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -1579,6 +1579,23 @@ out_bmap_cancel:
> >  }
> >  
> >  /*
> > + * Helper function to calculate range of the inode to log and recalculate the
> > + * on-disk inode crc if necessary.
> > + */
> > +static int
> > +xfs_iunlink_dinode_calc_crc(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_dinode	*dip)
> > +{
> > +	if (dip->di_version < 3)
> > +		return sizeof(xfs_agino_t);
> > +
> > +	xfs_dinode_calc_crc(mp, dip);
> > +	return offsetof(struct xfs_dinode, di_changecount) -
> > +		offsetof(struct xfs_dinode, di_next_unlinked);
> > +}
> > +
> 
> So we've added a new helper, the return value for which is either the
> size of an inode number or an inode number + crc, depending on format. I
> also notice that the return value doesn't appear to be used anywhere
> this helper is called.

Because I forgot to remove it. ;)

> 
> > +/*
> >   * This is called when the inode's link count goes to 0.
> >   * We place the on-disk inode on a list in the AGI.  It
> >   * will be pulled from this list when the inode is freed.
> > @@ -1638,10 +1655,15 @@ xfs_iunlink(
> >  		dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
> >  		offset = ip->i_imap.im_boffset +
> >  			offsetof(xfs_dinode_t, di_next_unlinked);
> > +
> > +		/* need to recalc the inode CRC if appropriate */
> > +		xfs_iunlink_dinode_calc_crc(mp, dip);
> > +
> >  		xfs_trans_inode_buf(tp, ibp);
> >  		xfs_trans_log_buf(tp, ibp, offset,
> > -				  (offset + sizeof(xfs_agino_t) - 1));
> > +				  offset + sizeof(xfs_agino_t) - 1);
> >  		xfs_inobp_check(mp, ibp);
> > +
> >  	}
> 
> So IIUC, offset is set to the offset of the di_next_unlinked value of
> this inode in the backing buffer of the inode (which we've just updated
> directly via dip).
> 
> We call the helper to update the CRC then call xfs_trans_log_buf() to
> log a range of the buffer. From the original code, I surmise that we're
> logging the range that represents the di_next_unlinked value and from
> the addition of the helper, I surmise we intend to now include the crc
> in that logged region.
> 
> But we haven't utilized the return value and I'm speculating on the
> intent here. So I see that we're updating the CRC, but is it actually
> logged? Perhaps I'm missing something, but if so, then why even have the
> xfs_iunlink_dinode_calc_crc() helper?
> 
> /me goes back to read the original 9/9 and followup:
> 
> http://oss.sgi.com/archives/xfs/2013-05/msg00867.html
> 
> OK, so in that case, perhaps the helper is now unnecessary and we could
> just call xfs_dinode_calc_crc()?

Yup, exactly.

> BTW, I was also going to ask here whether the fact that we update the
> CRC on recovery rather than logging it exposed items in the log to risk
> if they happened to become corrupted before that update occurs, but
> IIUC, we're still protected in that recovery itself should validate the
> existing on-disk CRC prior to the update. Correct?

No, recovery doesn't check it yet. Recovery is a complex dance, so
there's further work there needed allow recovery to do CRC checking
on read (as the buffer initialisation migh be what is being
replayed).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 10/9] xfs: fix dir3 freespace block corruption
  2013-05-30 19:15   ` Ben Myers
@ 2013-05-31 21:54     ` Ben Myers
  0 siblings, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-05-31 21:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, May 30, 2013 at 02:15:53PM -0500, Ben Myers wrote:
> On Tue, May 28, 2013 at 06:37:17PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When the directory freespace index grows to a second block (2017
> > 4k data blocks in the directory), the initialisation of the second
> > new block header goes wrong. The write verifier fires a corruption
> > error indicating that the block number in the header is zero. This
> > was being tripped by xfs/110.
> > 
> > The problem is that the initialisation of the new block is done just
> > fine in xfs_dir3_free_get_buf(), but the caller then users a dirv2
> > structure to zero on-disk header fields that xfs_dir3_free_get_buf()
> > has already zeroed. These lined up with the block number in the dir
> > v3 header format.
> > 
> > While looking at this, I noticed that the struct xfs_dir3_free_hdr()
> > had 4 bytes of padding in it that wasn't defined as padding or being
> > zeroed by the initialisation. Add a pad field declaration and fully
> > zero the on disk and in-core headers in xfs_dir3_free_get_buf() so
> > that this is never an issue in the future. Note that this doesn't
> > change the on-disk layout, just makes the 32 bits of padding in the
> > layout explicit.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_dir2_format.h |    1 +
> >  fs/xfs/xfs_dir2_node.c   |   13 ++++++-------
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_dir2_format.h b/fs/xfs/xfs_dir2_format.h
> > index a3b1bd8..995f1f5 100644
> > --- a/fs/xfs/xfs_dir2_format.h
> > +++ b/fs/xfs/xfs_dir2_format.h
> > @@ -715,6 +715,7 @@ struct xfs_dir3_free_hdr {
> >  	__be32			firstdb;	/* db of first entry */
> >  	__be32			nvalid;		/* count of valid entries */
> >  	__be32			nused;		/* count of used entries */
> > +	__be32			pad;		/* 64 bit alignment. */
> 
> Yeah, my count also puts nused short of 64 bit alignment.  Looks ok.
> 
> Reviewed-by: Ben Myers <bpm@sgi.com>

Applied this.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] xfs: rework dquot CRCs
  2013-05-30 12:02       ` Brian Foster
@ 2013-06-03  4:12         ` Dave Chinner
  0 siblings, 0 replies; 54+ messages in thread
From: Dave Chinner @ 2013-06-03  4:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, May 30, 2013 at 08:02:12AM -0400, Brian Foster wrote:
> On 05/29/2013 09:00 PM, Dave Chinner wrote:
> > On Wed, May 29, 2013 at 02:58:27PM -0400, Brian Foster wrote:
> >> On 05/27/2013 02:38 AM, Dave Chinner wrote:
> >>> From: Dave Chinner <dchinner@redhat.com>
> >>>
> >>> Calculating dquot CRCs when the backing buffer is written back just
> >>> doesn't work reliably. There are several places which manipulate
> >>> dquots directly in the buffers, and they don't calculate CRCs
> >>> appropriately, nor do they always set the buffer up to calculate
> >>> CRCs appropriately.
> >>>
> >>> Firstly, if we log a dquot buffer (e.g. during allocation) it gets
> >>> logged without valid CRC, and so on recovery we end up with a dquot
> >>> that is not valid.
> >>>
> >>> Secondly, if we recover/repair a dquot, we don't have a verifier
> >>> attached to the buffer and hence CRCs arenot calculate don the way
> >>> down to disk.
> >>>
> >>> Thirdly, calculating the CRC after we've changed the contents means
> >>> that if we re-read the dquot from the buffer, we cannot verify the
> >>> contents of the dquot are valid, as the CRC is invalid.
> >>>
> >>> So, to avoid all the dquot CRC errors that are being detected by the
> >>> read verifier, change to using the same model as for inodes. that
> >>> is, dquot CRCs are calculated and written to the backing buffer at
> >>> the time the dquot is flushed to the backing buffer. If we modify
> >>> the dquuot directly in the backing buffer, calculate the CRC
> >>> immediately after the modification is complete. Hence the dquot in
> >>> the on-disk buffer should always have a valid CRC.
> >>>
> >>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
....
> >>> @@ -853,8 +854,12 @@ xfs_qm_reset_dqcounts(
> >>>  	do_div(j, sizeof(xfs_dqblk_t));
> >>>  	ASSERT(mp->m_quotainfo->qi_dqperchunk == j);
> >>>  #endif
> >>> -	ddq = bp->b_addr;
> >>> +	dqb = bp->b_addr;
> >>>  	for (j = 0; j < mp->m_quotainfo->qi_dqperchunk; j++) {
> >>> +		struct xfs_disk_dquot	*ddq;
> >>> +
> >>> +		ddq =  (struct xfs_disk_dquot *)&dqb[j];
> >>> +
> >>>  		/*
> >>>  		 * Do a sanity check, and if needed, repair the dqblk. Don't
> >>>  		 * output any warnings because it's perfectly possible to
> >>> @@ -871,7 +876,8 @@ xfs_qm_reset_dqcounts(
> >>>  		ddq->d_bwarns = 0;
> >>>  		ddq->d_iwarns = 0;
> >>>  		ddq->d_rtbwarns = 0;
> >>> -		ddq = (xfs_disk_dquot_t *) ((xfs_dqblk_t *)ddq + 1);
> >>> +		xfs_update_cksum((char *)&dqb[j], sizeof(struct xfs_dqblk),
> >>> +					 XFS_DQUOT_CRC_OFF);
> >>
> >> Nice cleanup on the cast ugliness even without the crc change. Is there
> >> a technical reason for the unconditional crc update here beyond that
> >> we're doing a reset? I'm wondering if there's any value in leaving those
> >> bits untouched for a filesystem that might have never enabled crc
> >> (quotacheck or not).
> > 
> > The dquot might be zeroed and unused, but the buffer it sits in is
> > still allocated and valid. That means if we ever start using that
> > dquot again (either by quotacheck or a new uid/gid/prid), it will be
> > read straight out of the buffer rather than allocated, and hence the
> > constraint that allocated but unused dquots still need to have valid
> > CRCs.
> > 
> 
> The constraint makes sense when CRCs are enabled...
> 
> > FWIW, the dquot buffer read validates the CRC on all dquots in the
> > buffer when it comes off disk as it has no way of knowing what
> > dquots contain valid data or not. Same with the xfs_qm_dqcheck()
> > call - an unused dquot still needs to be a valid dquot to pass those
> > checks...
> > 
> 
> Yeah, that part makes sense. I've followed through and grokked most of
> the dquot buffer read and dquot CRC validation code, I think.
> 
> My question is more why is the code above (in xfs_qm_reset_dqcounts())
> not the following?
> 
> 	if (xfs_sb_version_hascrc(&mp->m_sb))
> 		xfs_update_cksum(...);

Because I forgot as it really doesn't matter at all. It wasn't
clear to me that this is what you were asking about the first time
around....

Fixed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] xfs: don't emit v5 superblock warnings on write
  2013-05-30 17:49     ` Ben Myers
@ 2013-06-11  6:05       ` Dave Chinner
  2013-06-11 21:29         ` Ben Myers
  0 siblings, 1 reply; 54+ messages in thread
From: Dave Chinner @ 2013-06-11  6:05 UTC (permalink / raw)
  To: Ben Myers; +Cc: Brian Foster, xfs

On Thu, May 30, 2013 at 12:49:15PM -0500, Ben Myers wrote:
> On Wed, May 29, 2013 at 12:39:11PM -0400, Brian Foster wrote:
> > On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > We write the superblock every 30s or so which results in the
> > > verifier being called. Right now that results in this output
> > > every 30s:
> > > 
> > > XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> > > Use of these features in this kernel is at your own risk!
> > > 
> > > And spamming the logs.
> > > 
> > > We don't need to check for whether we support v5 superblocks or
> > > whether there are feature bits we don't support set as these are
> > > only relevant when we first mount the filesytem. i.e. on superblock
> > > read. Hence for the write verification we can just skip all the
> > > checks (and hence verbose output) altogether.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > 
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> Applied.

Ben, while I see this in the xfsdev tree, I don't see it in Linus'
tree. Did it get missed on the recent merges? If so, can you queue
it up for 3.10-rc6?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] xfs: don't emit v5 superblock warnings on write
  2013-06-11  6:05       ` Dave Chinner
@ 2013-06-11 21:29         ` Ben Myers
  0 siblings, 0 replies; 54+ messages in thread
From: Ben Myers @ 2013-06-11 21:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, xfs

Hi Dave,

On Tue, Jun 11, 2013 at 04:05:46PM +1000, Dave Chinner wrote:
> On Thu, May 30, 2013 at 12:49:15PM -0500, Ben Myers wrote:
> > On Wed, May 29, 2013 at 12:39:11PM -0400, Brian Foster wrote:
> > > On 05/27/2013 02:38 AM, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > We write the superblock every 30s or so which results in the
> > > > verifier being called. Right now that results in this output
> > > > every 30s:
> > > > 
> > > > XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> > > > Use of these features in this kernel is at your own risk!
> > > > 
> > > > And spamming the logs.
> > > > 
> > > > We don't need to check for whether we support v5 superblocks or
> > > > whether there are feature bits we don't support set as these are
> > > > only relevant when we first mount the filesytem. i.e. on superblock
> > > > read. Hence for the write verification we can just skip all the
> > > > checks (and hence verbose output) altogether.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > 
> > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > Applied.
> 
> Ben, while I see this in the xfsdev tree, I don't see it in Linus'
> tree. Did it get missed on the recent merges? If so, can you queue
> it up for 3.10-rc6?

Argh.  I missed it in the shuffle.  I'll queue it up for 3.10-rc6.
Apologies, and a nice catch.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2013-06-11 21:29 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-27  6:38 [PATH 0/9] xfs: fixes for 3.10-rc4 Dave Chinner
2013-05-27  6:38 ` [PATCH 1/9] xfs: don't emit v5 superblock warnings on write Dave Chinner
2013-05-29 16:39   ` Brian Foster
2013-05-30 17:49     ` Ben Myers
2013-06-11  6:05       ` Dave Chinner
2013-06-11 21:29         ` Ben Myers
2013-05-27  6:38 ` [PATCH 2/9] xfs: fix incorrect remote symlink block count Dave Chinner
2013-05-29 16:39   ` Brian Foster
2013-05-30  0:46     ` Dave Chinner
2013-05-30 17:49     ` Ben Myers
2013-05-27  6:38 ` [PATCH 3/9] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
2013-05-29 16:40   ` Brian Foster
2013-05-27  6:38 ` [PATCH 4/9] xfs: rework dquot CRCs Dave Chinner
2013-05-29 18:58   ` Brian Foster
2013-05-30  1:00     ` Dave Chinner
2013-05-30 12:02       ` Brian Foster
2013-06-03  4:12         ` Dave Chinner
2013-05-27  6:38 ` [PATCH 5/9] xfs: fix split buffer vector log recovery support Dave Chinner
2013-05-29 19:21   ` Mark Tinguely
2013-05-30 17:49     ` Ben Myers
2013-05-27  6:38 ` [PATCH 6/9] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
2013-05-28 21:49   ` Ben Myers
2013-05-30  1:07     ` Dave Chinner
2013-05-29 21:06   ` Brian Foster
2013-05-30 17:56     ` Ben Myers
2013-05-27  6:38 ` [PATCH 7/9] xfs: kill suid/sgid through the truncate path Dave Chinner
2013-05-30 14:17   ` Brian Foster
2013-05-30 15:52     ` Ben Myers
2013-05-30 16:02       ` Brian Foster
2013-05-30 17:07         ` Ben Myers
2013-05-27  6:38 ` [PATCH 8/9] xfs: add fsgeom flag for v5 superblock support Dave Chinner
2013-05-29 15:10   ` Eric Sandeen
2013-05-29 21:43     ` Ben Myers
2013-05-29 21:47       ` Ben Myers
2013-05-30  1:28       ` Dave Chinner
2013-05-30  1:11     ` Dave Chinner
2013-05-30 14:17   ` Brian Foster
2013-05-30 17:57     ` Ben Myers
2013-05-27  6:38 ` [PATCH 9/9] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
2013-05-28 11:51   ` Dave Chinner
2013-05-28 20:36   ` [PATCH 9a,9b v2, replacements] xfs: unlinked list crcs Dave Chinner
2013-05-28 20:36     ` [PATCH 1/2] xfs: fix log recovery transaction item reordering Dave Chinner
2013-05-28 20:36     ` [PATCH 2/2] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
2013-05-30 14:17       ` Brian Foster
2013-05-30 20:27         ` Dave Chinner
2013-05-28  8:37 ` [PATCH 10/9] xfs: fix dir3 freespace block corruption Dave Chinner
2013-05-30 19:15   ` Ben Myers
2013-05-31 21:54     ` Ben Myers
2013-05-28 17:56 ` [PATH 0/9] xfs: fixes for 3.10-rc4 Ben Myers
2013-05-28 23:54   ` Dave Chinner
2013-05-29 19:01     ` Ben Myers
2013-05-29 19:27       ` Eric Sandeen
2013-05-29 19:45         ` Ben Myers
2013-05-28 21:27 ` [PATCH 11/9] xfs: fix remote attribute invalidation for a leaf Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.