All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] xfs: outstanding fixes for 3.10
@ 2013-06-05  2:09 Dave Chinner
  2013-06-05  2:09 ` [PATCH 1/4] xfs: fix log recovery transaction item reordering Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Dave Chinner @ 2013-06-05  2:09 UTC (permalink / raw)
  To: xfs

Hi Ben,

This is a repost of all the outstanding patches I have for 3.10
rebased on top of a current xfsdev tree.  All of the review comments
have been addressed, so they shoul dbe just about good to go.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] xfs: fix log recovery transaction item reordering
  2013-06-05  2:09 [PATCH 0/4] xfs: outstanding fixes for 3.10 Dave Chinner
@ 2013-06-05  2:09 ` Dave Chinner
  2013-06-05 13:13   ` Mark Tinguely
  2013-06-05  2:09 ` [PATCH 2/4] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-06-05  2:09 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

There are several constraints that inode allocation and unlink
logging impose on log recovery. These all stem from the fact that
inode alloc/unlink are logged in buffers, but all other inode
changes are logged in inode items. Hence there are ordering
constraints that recovery must follow to ensure the correct result
occurs.

As it turns out, this ordering has been working mostly by chance
than good management. The existing code moves all buffers except
cancelled buffers to the head of the list, and everything else to
the tail of the list. The problem with this is that is interleaves
inode items with the buffer cancellation items, and hence whether
the inode item in an cancelled buffer gets replayed is essentially
left to chance.

Further, this ordering causes problems for log recovery when inode
CRCs are enabled. It typically replays the inode unlink buffer long before
it replays the inode core changes, and so the CRC recorded in an
unlink buffer is going to be invalid and hence any attempt to
validate the inode in the buffer is going to fail. Hence we really
need to enforce the ordering that the inode alloc/unlink code has
expected log recovery to have since inode chunk de-allocation was
introduced back in 2003...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c |   65 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index d6204d1..83088d9 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1599,10 +1599,43 @@ xlog_recover_add_to_trans(
 }
 
 /*
- * Sort the log items in the transaction. Cancelled buffers need
- * to be put first so they are processed before any items that might
- * modify the buffers. If they are cancelled, then the modifications
- * don't need to be replayed.
+ * Sort the log items in the transaction.
+ *
+ * The ordering constraints are defined by the inode allocation and unlink
+ * behaviour. The rules are:
+ *
+ *	1. Every item is only logged once in a given transaction. Hence it
+ *	   represents the last logged state of the item. Hence ordering is
+ *	   dependent on the order in which operations need to be performed so
+ *	   required initial conditions are always met.
+ *
+ *	2. Cancelled buffers are recorded in pass 1 in a separate table and
+ *	   there's nothing to replay from them so we can simply cull them
+ *	   from the transaction. However, we can't do that until after we've
+ *	   replayed all the other items because they may be dependent on the
+ *	   cancelled buffer and replaying the cancelled buffer can remove it
+ *	   form the cancelled buffer table. Hence they have tobe done last.
+ *
+ *	3. Inode allocation buffers must be replayed before inode items that
+ *	   read the buffer and replay changes into it.
+ *
+ *	4. Inode unlink buffers must be replayed after inode items are replayed.
+ *	   This ensures that inodes are completely flushed to the inode buffer
+ *	   in a "free" state before we remove the unlinked inode list pointer.
+ *
+ * Hence the ordering needs to be inode allocation buffers first, inode items
+ * second, inode unlink buffers third and cancelled buffers last.
+ *
+ * But there's a problem with that - we can't tell an inode allocation buffer
+ * apart from a regular buffer, so we can't separate them. We can, however,
+ * tell an inode unlink buffer from the others, and so we can separate them out
+ * from all the other buffers and move them to last.
+ *
+ * Hence, 4 lists, in order from head to tail:
+ * 	- buffer_list for all buffers except cancelled/inode unlink buffers
+ * 	- item_list for all non-buffer items
+ * 	- inode_buffer_list for inode unlink buffers
+ * 	- cancel_list for the cancelled buffers
  */
 STATIC int
 xlog_recover_reorder_trans(
@@ -1612,6 +1645,10 @@ xlog_recover_reorder_trans(
 {
 	xlog_recover_item_t	*item, *n;
 	LIST_HEAD(sort_list);
+	LIST_HEAD(cancel_list);
+	LIST_HEAD(buffer_list);
+	LIST_HEAD(inode_buffer_list);
+	LIST_HEAD(inode_list);
 
 	list_splice_init(&trans->r_itemq, &sort_list);
 	list_for_each_entry_safe(item, n, &sort_list, ri_list) {
@@ -1619,12 +1656,18 @@ xlog_recover_reorder_trans(
 
 		switch (ITEM_TYPE(item)) {
 		case XFS_LI_BUF:
-			if (!(buf_f->blf_flags & XFS_BLF_CANCEL)) {
+			if (buf_f->blf_flags & XFS_BLF_CANCEL) {
 				trace_xfs_log_recover_item_reorder_head(log,
 							trans, item, pass);
-				list_move(&item->ri_list, &trans->r_itemq);
+				list_move(&item->ri_list, &cancel_list);
 				break;
 			}
+			if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
+				list_move(&item->ri_list, &inode_buffer_list);
+				break;
+			}
+			list_move_tail(&item->ri_list, &buffer_list);
+			break;
 		case XFS_LI_INODE:
 		case XFS_LI_DQUOT:
 		case XFS_LI_QUOTAOFF:
@@ -1632,7 +1675,7 @@ xlog_recover_reorder_trans(
 		case XFS_LI_EFI:
 			trace_xfs_log_recover_item_reorder_tail(log,
 							trans, item, pass);
-			list_move_tail(&item->ri_list, &trans->r_itemq);
+			list_move_tail(&item->ri_list, &inode_list);
 			break;
 		default:
 			xfs_warn(log->l_mp,
@@ -1643,6 +1686,14 @@ xlog_recover_reorder_trans(
 		}
 	}
 	ASSERT(list_empty(&sort_list));
+	if (!list_empty(&buffer_list))
+		list_splice(&buffer_list, &trans->r_itemq);
+	if (!list_empty(&inode_list))
+		list_splice_tail(&inode_list, &trans->r_itemq);
+	if (!list_empty(&inode_buffer_list))
+		list_splice_tail(&inode_buffer_list, &trans->r_itemq);
+	if (!list_empty(&cancel_list))
+		list_splice_tail(&cancel_list, &trans->r_itemq);
 	return 0;
 }
 
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] xfs: inode unlinked list needs to recalculate the inode CRC
  2013-06-05  2:09 [PATCH 0/4] xfs: outstanding fixes for 3.10 Dave Chinner
  2013-06-05  2:09 ` [PATCH 1/4] xfs: fix log recovery transaction item reordering Dave Chinner
@ 2013-06-05  2:09 ` Dave Chinner
  2013-06-05 14:43   ` Mark Tinguely
  2013-06-05  2:09 ` [PATCH 3/4] xfs: disable noattr2/attr2 mount options for CRC enabled filesystems Dave Chinner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-06-05  2:09 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode, ncluding during log recovery. This is trivial to do and
results in  unlinked list operations always leaving a consistent
inode in the buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c       |   16 ++++++++++++++++
 fs/xfs/xfs_log_recover.c |    9 +++++++++
 2 files changed, 25 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index efbe1ac..7f7be5f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1638,6 +1638,10 @@ xfs_iunlink(
 		dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
 		offset = ip->i_imap.im_boffset +
 			offsetof(xfs_dinode_t, di_next_unlinked);
+
+		/* need to recalc the inode CRC if appropriate */
+		xfs_dinode_calc_crc(mp, dip);
+
 		xfs_trans_inode_buf(tp, ibp);
 		xfs_trans_log_buf(tp, ibp, offset,
 				  (offset + sizeof(xfs_agino_t) - 1));
@@ -1723,6 +1727,10 @@ xfs_iunlink_remove(
 			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			offset = ip->i_imap.im_boffset +
 				offsetof(xfs_dinode_t, di_next_unlinked);
+
+			/* need to recalc the inode CRC if appropriate */
+			xfs_dinode_calc_crc(mp, dip);
+
 			xfs_trans_inode_buf(tp, ibp);
 			xfs_trans_log_buf(tp, ibp, offset,
 					  (offset + sizeof(xfs_agino_t) - 1));
@@ -1796,6 +1804,10 @@ xfs_iunlink_remove(
 			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			offset = ip->i_imap.im_boffset +
 				offsetof(xfs_dinode_t, di_next_unlinked);
+
+			/* need to recalc the inode CRC if appropriate */
+			xfs_dinode_calc_crc(mp, dip);
+
 			xfs_trans_inode_buf(tp, ibp);
 			xfs_trans_log_buf(tp, ibp, offset,
 					  (offset + sizeof(xfs_agino_t) - 1));
@@ -1809,6 +1821,10 @@ xfs_iunlink_remove(
 		last_dip->di_next_unlinked = cpu_to_be32(next_agino);
 		ASSERT(next_agino != 0);
 		offset = last_offset + offsetof(xfs_dinode_t, di_next_unlinked);
+
+		/* need to recalc the inode CRC if appropriate */
+		xfs_dinode_calc_crc(mp, last_dip);
+
 		xfs_trans_inode_buf(tp, last_ibp);
 		xfs_trans_log_buf(tp, last_ibp, offset,
 				  (offset + sizeof(xfs_agino_t) - 1));
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 83088d9..45a85ff 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1912,6 +1912,15 @@ xlog_recover_do_inode_buffer(
 		buffer_nextp = (xfs_agino_t *)xfs_buf_offset(bp,
 					      next_unlinked_offset);
 		*buffer_nextp = *logged_nextp;
+
+		/*
+		 * If necessary, recalculate the CRC in the on-disk inode. We
+		 * have to leave the inode in a consistent state for whoever
+		 * reads it next....
+		 */
+		xfs_dinode_calc_crc(mp, (struct xfs_dinode *)
+				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
+
 	}
 
 	return 0;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
  2013-06-05  2:09 [PATCH 0/4] xfs: outstanding fixes for 3.10 Dave Chinner
  2013-06-05  2:09 ` [PATCH 1/4] xfs: fix log recovery transaction item reordering Dave Chinner
  2013-06-05  2:09 ` [PATCH 2/4] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
@ 2013-06-05  2:09 ` Dave Chinner
  2013-06-05  2:09 ` [PATCH 4/4] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
  2013-06-05 16:44 ` [PATCH 0/4] xfs: outstanding fixes for 3.10 Ben Myers
  4 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2013-06-05  2:09 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 Documentation/filesystems/xfs.txt |    3 +++
 fs/xfs/xfs_super.c                |   11 +++++++++++
 2 files changed, 14 insertions(+)

diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 3e4b3dd..83577f0 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -33,6 +33,9 @@ When mounting an XFS filesystem, the following options are accepted.
 	removing extended attributes) the on-disk superblock feature
 	bit field will be updated to reflect this format being in use.
 
+	CRC enabled filesystems always use the attr2 format, and so
+	will reject the noattr2 mount option if it is set.
+
   barrier
 	Enables the use of block layer write barriers for writes into
 	the journal and unwritten extent conversion.  This allows for
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ea341ce..3033ba5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1373,6 +1373,17 @@ xfs_finish_flags(
 	}
 
 	/*
+	 * V5 filesystems always use attr2 format for attributes.
+	 */
+	if (xfs_sb_version_hascrc(&mp->m_sb) &&
+	    (mp->m_flags & XFS_MOUNT_NOATTR2)) {
+		xfs_warn(mp,
+"Cannot mount a V5 filesystem as %s. %s is always enabled for V5 filesystems.",
+			MNTOPT_NOATTR2, MNTOPT_ATTR2);
+		return XFS_ERROR(EINVAL);
+	}
+
+	/*
 	 * mkfs'ed attr2 will turn on attr2 mount unless explicitly
 	 * told by noattr2 to turn it off
 	 */
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] xfs: increase number of ACL entries for V5 superblocks
  2013-06-05  2:09 [PATCH 0/4] xfs: outstanding fixes for 3.10 Dave Chinner
                   ` (2 preceding siblings ...)
  2013-06-05  2:09 ` [PATCH 3/4] xfs: disable noattr2/attr2 mount options for CRC enabled filesystems Dave Chinner
@ 2013-06-05  2:09 ` Dave Chinner
  2013-06-05 13:26   ` Mark Tinguely
  2013-06-05 16:44 ` [PATCH 0/4] xfs: outstanding fixes for 3.10 Ben Myers
  4 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-06-05  2:09 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format.  For version 5 superblocks, increase it to the maximum nuber
of ACLs that can fit into a single xattr.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_acl.c |   31 ++++++++++++++++++-------------
 fs/xfs/xfs_acl.h |   31 ++++++++++++++++++++++++-------
 2 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 1d32f1d..306d883 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -21,6 +21,8 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_inode.h"
 #include "xfs_vnodeops.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
 #include "xfs_trace.h"
 #include <linux/slab.h>
 #include <linux/xattr.h>
@@ -34,7 +36,9 @@
  */
 
 STATIC struct posix_acl *
-xfs_acl_from_disk(struct xfs_acl *aclp)
+xfs_acl_from_disk(
+	struct xfs_acl	*aclp,
+	int		max_entries)
 {
 	struct posix_acl_entry *acl_e;
 	struct posix_acl *acl;
@@ -42,7 +46,7 @@ xfs_acl_from_disk(struct xfs_acl *aclp)
 	unsigned int count, i;
 
 	count = be32_to_cpu(aclp->acl_cnt);
-	if (count > XFS_ACL_MAX_ENTRIES)
+	if (count > max_entries)
 		return ERR_PTR(-EFSCORRUPTED);
 
 	acl = posix_acl_alloc(count, GFP_KERNEL);
@@ -108,9 +112,9 @@ xfs_get_acl(struct inode *inode, int type)
 	struct xfs_inode *ip = XFS_I(inode);
 	struct posix_acl *acl;
 	struct xfs_acl *xfs_acl;
-	int len = sizeof(struct xfs_acl);
 	unsigned char *ea_name;
 	int error;
+	int len;
 
 	acl = get_cached_acl(inode, type);
 	if (acl != ACL_NOT_CACHED)
@@ -133,8 +137,8 @@ xfs_get_acl(struct inode *inode, int type)
 	 * If we have a cached ACLs value just return it, not need to
 	 * go out to the disk.
 	 */
-
-	xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
+	len = XFS_ACL_MAX_SIZE(ip->i_mount);
+	xfs_acl = kzalloc(len, GFP_KERNEL);
 	if (!xfs_acl)
 		return ERR_PTR(-ENOMEM);
 
@@ -153,7 +157,7 @@ xfs_get_acl(struct inode *inode, int type)
 		goto out;
 	}
 
-	acl = xfs_acl_from_disk(xfs_acl);
+	acl = xfs_acl_from_disk(xfs_acl, XFS_ACL_MAX_ENTRIES(ip->i_mount));
 	if (IS_ERR(acl))
 		goto out;
 
@@ -189,16 +193,17 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
 
 	if (acl) {
 		struct xfs_acl *xfs_acl;
-		int len;
+		int len = XFS_ACL_MAX_SIZE(ip->i_mount);
 
-		xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
+		xfs_acl = kzalloc(len, GFP_KERNEL);
 		if (!xfs_acl)
 			return -ENOMEM;
 
 		xfs_acl_to_disk(xfs_acl, acl);
-		len = sizeof(struct xfs_acl) -
-			(sizeof(struct xfs_acl_entry) *
-			 (XFS_ACL_MAX_ENTRIES - acl->a_count));
+
+		/* subtract away the unused acl entries */
+		len -= sizeof(struct xfs_acl_entry) *
+			 (XFS_ACL_MAX_ENTRIES(ip->i_mount) - acl->a_count);
 
 		error = -xfs_attr_set(ip, ea_name, (unsigned char *)xfs_acl,
 				len, ATTR_ROOT);
@@ -243,7 +248,7 @@ xfs_set_mode(struct inode *inode, umode_t mode)
 static int
 xfs_acl_exists(struct inode *inode, unsigned char *name)
 {
-	int len = sizeof(struct xfs_acl);
+	int len = XFS_ACL_MAX_SIZE(XFS_M(inode->i_sb));
 
 	return (xfs_attr_get(XFS_I(inode), name, NULL, &len,
 			    ATTR_ROOT|ATTR_KERNOVAL) == 0);
@@ -379,7 +384,7 @@ xfs_xattr_acl_set(struct dentry *dentry, const char *name,
 		goto out_release;
 
 	error = -EINVAL;
-	if (acl->a_count > XFS_ACL_MAX_ENTRIES)
+	if (acl->a_count > XFS_ACL_MAX_ENTRIES(XFS_M(inode->i_sb)))
 		goto out_release;
 
 	if (type == ACL_TYPE_ACCESS) {
diff --git a/fs/xfs/xfs_acl.h b/fs/xfs/xfs_acl.h
index 39632d9..4016a56 100644
--- a/fs/xfs/xfs_acl.h
+++ b/fs/xfs/xfs_acl.h
@@ -22,19 +22,36 @@ struct inode;
 struct posix_acl;
 struct xfs_inode;
 
-#define XFS_ACL_MAX_ENTRIES 25
 #define XFS_ACL_NOT_PRESENT (-1)
 
 /* On-disk XFS access control list structure */
+struct xfs_acl_entry {
+	__be32	ae_tag;
+	__be32	ae_id;
+	__be16	ae_perm;
+	__be16	ae_pad;		/* fill the implicit hole in the structure */
+};
+
 struct xfs_acl {
-	__be32		acl_cnt;
-	struct xfs_acl_entry {
-		__be32	ae_tag;
-		__be32	ae_id;
-		__be16	ae_perm;
-	} acl_entry[XFS_ACL_MAX_ENTRIES];
+	__be32			acl_cnt;
+	struct xfs_acl_entry	acl_entry[0];
 };
 
+/*
+ * The number of ACL entries allowed is defined by the on-disk format.
+ * For v4 superblocks, that is limited to 25 entries. For v5 superblocks, it is
+ * limited only by the maximum size of the xattr that stores the information.
+ */
+#define XFS_ACL_MAX_ENTRIES(mp)	\
+	(xfs_sb_version_hascrc(&mp->m_sb) \
+		?  (XATTR_SIZE_MAX - sizeof(struct xfs_acl)) / \
+						sizeof(struct xfs_acl_entry) \
+		: 25)
+
+#define XFS_ACL_MAX_SIZE(mp) \
+	(sizeof(struct xfs_acl) + \
+		sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
+
 /* On-disk XFS extended attribute names */
 #define SGI_ACL_FILE		(unsigned char *)"SGI_ACL_FILE"
 #define SGI_ACL_DEFAULT		(unsigned char *)"SGI_ACL_DEFAULT"
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] xfs: fix log recovery transaction item reordering
  2013-06-05  2:09 ` [PATCH 1/4] xfs: fix log recovery transaction item reordering Dave Chinner
@ 2013-06-05 13:13   ` Mark Tinguely
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-06-05 13:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 06/04/13 21:09, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> There are several constraints that inode allocation and unlink
> logging impose on log recovery. These all stem from the fact that
> inode alloc/unlink are logged in buffers, but all other inode
> changes are logged in inode items. Hence there are ordering
> constraints that recovery must follow to ensure the correct result
> occurs.
>
> As it turns out, this ordering has been working mostly by chance
> than good management. The existing code moves all buffers except
> cancelled buffers to the head of the list, and everything else to
> the tail of the list. The problem with this is that is interleaves
> inode items with the buffer cancellation items, and hence whether
> the inode item in an cancelled buffer gets replayed is essentially
> left to chance.
>
> Further, this ordering causes problems for log recovery when inode
> CRCs are enabled. It typically replays the inode unlink buffer long before
> it replays the inode core changes, and so the CRC recorded in an
> unlink buffer is going to be invalid and hence any attempt to
> validate the inode in the buffer is going to fail. Hence we really
> need to enforce the ordering that the inode alloc/unlink code has
> expected log recovery to have since inode chunk de-allocation was
> introduced back in 2003...
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] xfs: increase number of ACL entries for V5 superblocks
  2013-06-05  2:09 ` [PATCH 4/4] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
@ 2013-06-05 13:26   ` Mark Tinguely
  2013-06-05 13:38     ` Mark Tinguely
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Tinguely @ 2013-06-05 13:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 06/04/13 21:09, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> The limit of 25 ACL entries is arbitrary, but baked into the on-disk
> format.  For version 5 superblocks, increase it to the maximum nuber
> of ACLs that can fit into a single xattr.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> Reviewed-by: Brian Foster<bfoster@redhat.com>
> ---

>   /* On-disk XFS access control list structure */
> +struct xfs_acl_entry {
> +	__be32	ae_tag;
> +	__be32	ae_id;
> +	__be16	ae_perm;
> +	__be16	ae_pad;		/* fill the implicit hole in the structure */
> +};
> +
>   struct xfs_acl {
> -	__be32		acl_cnt;
> -	struct xfs_acl_entry {
> -		__be32	ae_tag;
> -		__be32	ae_id;
> -		__be16	ae_perm;
> -	} acl_entry[XFS_ACL_MAX_ENTRIES];
> +	__be32			acl_cnt;
> +	struct xfs_acl_entry	acl_entry[0];
>   };
>
> +/*
> + * The number of ACL entries allowed is defined by the on-disk format.
> + * For v4 superblocks, that is limited to 25 entries. For v5 superblocks, it is
> + * limited only by the maximum size of the xattr that stores the information.
> + */
> +#define XFS_ACL_MAX_ENTRIES(mp)	\
> +	(xfs_sb_version_hascrc(&mp->m_sb) \
> +		?  (XATTR_SIZE_MAX - sizeof(struct xfs_acl)) / \
> +						sizeof(struct xfs_acl_entry) \
> +		: 25)
> +
> +#define XFS_ACL_MAX_SIZE(mp) \
> +	(sizeof(struct xfs_acl) + \
> +		sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
> +
>   /* On-disk XFS extended attribute names */
>   #define SGI_ACL_FILE		(unsigned char *)"SGI_ACL_FILE"
>   #define SGI_ACL_DEFAULT		(unsigned char *)"SGI_ACL_DEFAULT"

I thought you would leave the XFS_ACL_MAX_ENTRIES(mp) as:


#define XFS_ACL_MAX_ENTRIES(mp)	\
	(xfs_sb_version_hascrc(&mp->m_sb) \
	   ?  (XATTR_SIZE_MAX - sizeof(__be32)) / sizeof(struct xfs_acl_entry) \
	   : 25)

and change the XFS_ACL_SIZE to:

#define XFS_ACL_SIZE(mp) \
	((offsetof(struct xfs_acl_entry, acl_entry) + \
		sizeof(struct xfs_acl_entry) *

since acl_entry[] is a place holder for the entry array.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] xfs: increase number of ACL entries for V5 superblocks
  2013-06-05 13:26   ` Mark Tinguely
@ 2013-06-05 13:38     ` Mark Tinguely
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-06-05 13:38 UTC (permalink / raw)
  To: xfs

On 06/05/13 08:26, Mark Tinguely wrote:
 > On 06/04/13 21:09, Dave Chinner wrote:
 >> From: Dave Chinner<dchinner@redhat.com>
 >>
 >> The limit of 25 ACL entries is arbitrary, but baked into the on-disk
 >> format. For version 5 superblocks, increase it to the maximum nuber
 >> of ACLs that can fit into a single xattr.
 >>
 >> Signed-off-by: Dave Chinner<dchinner@redhat.com>
 >> Reviewed-by: Brian Foster<bfoster@redhat.com>
 >> ---
 >
 >> /* On-disk XFS access control list structure */
 >> +struct xfs_acl_entry {
 >> + __be32 ae_tag;
 >> + __be32 ae_id;
 >> + __be16 ae_perm;
 >> + __be16 ae_pad; /* fill the implicit hole in the structure */
 >> +};
 >> +
 >> struct xfs_acl {
 >> - __be32 acl_cnt;
 >> - struct xfs_acl_entry {
 >> - __be32 ae_tag;
 >> - __be32 ae_id;
 >> - __be16 ae_perm;
 >> - } acl_entry[XFS_ACL_MAX_ENTRIES];
 >> + __be32 acl_cnt;
 >> + struct xfs_acl_entry acl_entry[0];
 >> };
 >>
 >> +/*
 >> + * The number of ACL entries allowed is defined by the on-disk format.
 >> + * For v4 superblocks, that is limited to 25 entries. For v5
 >> superblocks, it is
 >> + * limited only by the maximum size of the xattr that stores the
 >> information.
 >> + */
 >> +#define XFS_ACL_MAX_ENTRIES(mp) \
 >> + (xfs_sb_version_hascrc(&mp->m_sb) \
 >> + ? (XATTR_SIZE_MAX - sizeof(struct xfs_acl)) / \
 >> + sizeof(struct xfs_acl_entry) \
 >> + : 25)
 >> +
 >> +#define XFS_ACL_MAX_SIZE(mp) \
 >> + (sizeof(struct xfs_acl) + \
 >> + sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
 >> +
 >> /* On-disk XFS extended attribute names */
 >> #define SGI_ACL_FILE (unsigned char *)"SGI_ACL_FILE"
 >> #define SGI_ACL_DEFAULT (unsigned char *)"SGI_ACL_DEFAULT"
 >
 > I thought you would leave the XFS_ACL_MAX_ENTRIES(mp) as:
 >

...

never mind, now I get it "struct xfs_acl_entry  acl_entry[0]" reserves
no space.

Reviewed-by: Mark Tinguely <tinuguely@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] xfs: inode unlinked list needs to recalculate the inode CRC
  2013-06-05  2:09 ` [PATCH 2/4] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
@ 2013-06-05 14:43   ` Mark Tinguely
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-06-05 14:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 06/04/13 21:09, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> The inode unlinked list manipulations operate directly on the inode
> buffer, and so bypass the inode CRC calculation mechanisms. Hence an
> inode on the unlinked list has an invalid CRC. Fix this by
> recalculating the CRC whenever we modify an unlinked list pointer in
> an inode, ncluding during log recovery. This is trivial to do and
> results in  unlinked list operations always leaving a consistent
> inode in the buffer.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] xfs: outstanding fixes for 3.10
  2013-06-05  2:09 [PATCH 0/4] xfs: outstanding fixes for 3.10 Dave Chinner
                   ` (3 preceding siblings ...)
  2013-06-05  2:09 ` [PATCH 4/4] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
@ 2013-06-05 16:44 ` Ben Myers
  4 siblings, 0 replies; 10+ messages in thread
From: Ben Myers @ 2013-06-05 16:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 05, 2013 at 12:09:06PM +1000, Dave Chinner wrote:
> This is a repost of all the outstanding patches I have for 3.10
> rebased on top of a current xfsdev tree.  All of the review comments
> have been addressed, so they shoul dbe just about good to go.

Applied.  Will send a pull request tomorrow.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-06-05 16:44 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-05  2:09 [PATCH 0/4] xfs: outstanding fixes for 3.10 Dave Chinner
2013-06-05  2:09 ` [PATCH 1/4] xfs: fix log recovery transaction item reordering Dave Chinner
2013-06-05 13:13   ` Mark Tinguely
2013-06-05  2:09 ` [PATCH 2/4] xfs: inode unlinked list needs to recalculate the inode CRC Dave Chinner
2013-06-05 14:43   ` Mark Tinguely
2013-06-05  2:09 ` [PATCH 3/4] xfs: disable noattr2/attr2 mount options for CRC enabled filesystems Dave Chinner
2013-06-05  2:09 ` [PATCH 4/4] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
2013-06-05 13:26   ` Mark Tinguely
2013-06-05 13:38     ` Mark Tinguely
2013-06-05 16:44 ` [PATCH 0/4] xfs: outstanding fixes for 3.10 Ben Myers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.