All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] xfs: fixes for 3.10-rc3
@ 2013-05-21  8:01 Dave Chinner
  2013-05-21  8:02 ` [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
                   ` (13 more replies)
  0 siblings, 14 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:01 UTC (permalink / raw)
  To: xfs; +Cc: bpm

Hi Ben,

This is my current kernel bug fix patch series. I've updated it
against a current xfsdev tree, and contains all the fixes mentioned
in the "fixes for 3.10-rc2 (updated)" thread. The first 7 patches
are patches from that series. The last 4 are new patches.

The first new patch stops CRC enabled filesystems from spamming the
log. It currently emits an "Experimental" warning ever time the
superblock is written, which is typically every 30s.

The second path ("rework remote attr CRCs") is the changes I
mentioned in the "fixes for 3.10-rc2 (updated)" thread. The code is
far more robust as a result of these changes, and I think we really
need to change the format as done in this patch. Once we have
decided on the way forward, I'll port this to userspace.

The third patch fixes a remote symlink problem - I didn't hit this
until I'd redone the remote attr CRCs and the 1k block size
filesystem testing made it passed the attribute tests it was failing
on.

Finally, the last patch is another on-disk format change - one that
removes the 25 entry limit on ACLs. It doesn't invalidate anything
that is already on disk, just allows ACLs on v5 superblock
filesystems to store more than 25 ACLs in an xattr. In fact, it
allows (65536 - 4) / 12 = 5461 entries to be stored in a single
ACL, so I don't see anyone running out on v5 superblocks....

Thoughts, comments?

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-21 18:35   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 02/11] xfs: remote attribute allocation may be contiguous Dave Chinner
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

Lockdep reports:

=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #3 Not tainted
---------------------------------------------
setquota/28368 is trying to acquire lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

but task is already holding lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
allocated.

xfs_qm_scall_setqlim() is starting a transaction and then not
passing it into xfs_qm_dqet() and so it starts it's own transaction
when allocating the dquot.  Splat!

Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
inside the setqlim transaction. This requires getting the dquot
first (and allocating it if necessary) then dropping and relocking
the dquot before joining it to the setqlim transaction.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_qm_syscalls.c |   40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index c41190c..6cdf6ff 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
 	if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
 		return 0;
 
-	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
-	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
-				  0, 0, XFS_DEFAULT_LOG_COUNT);
-	if (error) {
-		xfs_trans_cancel(tp, 0);
-		return (error);
-	}
-
 	/*
 	 * We don't want to race with a quotaoff so take the quotaoff lock.
-	 * (We don't hold an inode lock, so there's nothing else to stop
-	 * a quotaoff from happening). (XXXThis doesn't currently happen
-	 * because we take the vfslock before calling xfs_qm_sysent).
+	 * We don't hold an inode lock, so there's nothing else to stop
+	 * a quotaoff from happening.
 	 */
 	mutex_lock(&q->qi_quotaofflock);
 
 	/*
-	 * Get the dquot (locked), and join it to the transaction.
-	 * Allocate the dquot if this doesn't exist.
+	 * Get the dquot (locked) before we start, as we need to do a
+	 * transaction to allocate it if it doesn't exist. Once we have the
+	 * dquot, unlock it so we can start the next transaction safely. We hold
+	 * a reference to the dquot, so it's safe to do this unlock/lock without
+	 * it being reclaimed in the mean time.
 	 */
-	if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
-		xfs_trans_cancel(tp, XFS_TRANS_ABORT);
+	error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
+	if (error) {
 		ASSERT(error != ENOENT);
 		goto out_unlock;
 	}
+	xfs_dqunlock(dqp);
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
+	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
+				  0, 0, XFS_DEFAULT_LOG_COUNT);
+	if (error) {
+		xfs_trans_cancel(tp, 0);
+		goto out_rele;
+	}
+
+	xfs_dqlock(dqp);
 	xfs_trans_dqjoin(tp, dqp);
 	ddq = &dqp->q_core;
 
@@ -621,9 +626,10 @@ xfs_qm_scall_setqlim(
 	xfs_trans_log_dquot(tp, dqp);
 
 	error = xfs_trans_commit(tp, 0);
-	xfs_qm_dqrele(dqp);
 
- out_unlock:
+out_rele:
+	xfs_qm_dqrele(dqp);
+out_unlock:
 	mutex_unlock(&q->qi_quotaofflock);
 	return error;
 }
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 02/11] xfs: remote attribute allocation may be contiguous
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
  2013-05-21  8:02 ` [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-21  8:02 ` [PATCH 03/11] xfs: remote attribute read too short Dave Chinner
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

When CRCs are enabled, there may be multiple allocations made if the
headers cause a length overflow. This, however, does not mean that
the number of headers required increases, as the second and
subsequent extents may be contiguous with the previous extent. Hence
when we map the extents to write the attribute data, we may end up
with less extents than allocations made. Hence the assertion that we
consume th enumber of headers we calculated in the allocation loop
is incorrect and needs to be removed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index dee8446..aad95b0 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -359,6 +359,11 @@ xfs_attr_rmtval_set(
 		 * into requiring more blocks. e.g. for 512 byte blocks, we'll
 		 * spill for another block every 9 headers we require in this
 		 * loop.
+		 *
+		 * Note that this can result in contiguous allocation of blocks,
+		 * so we don't use all the space we allocate for headers as we
+		 * have one less header for each contiguous allocation that
+		 * occurs in the map/write loop below.
 		 */
 		if (crcs && blkcnt == 0) {
 			int total_len;
@@ -439,7 +444,6 @@ xfs_attr_rmtval_set(
 		lblkno += map.br_blockcount;
 	}
 	ASSERT(valuelen == 0);
-	ASSERT(hdrcnt == 0);
 	return 0;
 }
 
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 03/11] xfs: remote attribute read too short
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
  2013-05-21  8:02 ` [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
  2013-05-21  8:02 ` [PATCH 02/11] xfs: remote attribute allocation may be contiguous Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-21 20:59   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 04/11] xfs: remote attribute tail zeroing does too much Dave Chinner
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

Reading a maximally size remote attribute fails when CRCs are
enabled with this verification error:

XFS (vdb): remote attribute header does not match required off/len/owner)

There are two reasons for this, the first being that the
length of the buffer being read is determined from the
args->rmtblkcnt which doesn't take into account CRC headers. Hence
the mapped length ends up being too short and so we need to
calculate it directly from the value length.

The second is that the byte count of valid data within a buffer is
capped by the length of the data and so doesn't take into account
that the buffer might be longer due to headers. Hence we need to
calculate the data space in the buffer first before calculating the
actual byte count of data.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index aad95b0..bcdc07c 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -52,9 +52,11 @@ xfs_attr3_rmt_blocks(
 	struct xfs_mount *mp,
 	int		attrlen)
 {
-	int		buflen = XFS_ATTR3_RMT_BUF_SPACE(mp,
-							 mp->m_sb.sb_blocksize);
-	return (attrlen + buflen - 1) / buflen;
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+		return (attrlen + buflen - 1) / buflen;
+	}
+	return XFS_B_TO_FSB(mp, attrlen);
 }
 
 static bool
@@ -206,8 +208,9 @@ xfs_attr_rmtval_get(
 
 	while (valuelen > 0) {
 		nmap = ATTR_RMTVALUE_MAPSIZE;
+		blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
 		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
-				       args->rmtblkcnt, map, &nmap,
+				       blkcnt, map, &nmap,
 				       XFS_BMAPI_ATTRFORK);
 		if (error)
 			return error;
@@ -227,8 +230,8 @@ xfs_attr_rmtval_get(
 			if (error)
 				return error;
 
-			byte_cnt = min_t(int, valuelen, BBTOB(bp->b_length));
-			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, byte_cnt);
+			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
+			byte_cnt = min_t(int, valuelen, byte_cnt);
 
 			src = bp->b_addr;
 			if (xfs_sb_version_hascrc(&mp->m_sb)) {
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 04/11] xfs: remote attribute tail zeroing does too much
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (2 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 03/11] xfs: remote attribute read too short Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-21 22:31   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 05/11] xfs: correctly map remote attr buffers during removal Dave Chinner
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

When an attribute data does not fill then entire remote block, we
zero the remaining part of the buffer. This, however, needs to take
into account that the buffer has a header, and so the offset where
zeroing starts and the length of zeroing need to take this into
account. Otherwise we end up with zeros over the end of the
attribute value when CRCs are enabled.

While there, make sure we only ask to map an extent that covers the
remaining range of the attribute, rather than asking every time for
the full length of remote data. If the remote attribute blocks are
contiguous with other parts of the attribute tree, it will map those
blocks as well and we can potentially zero them incorrectly. We can
also get buffer size mistmatches when trying to read or remove the
remote attribute, and this can lead to not finding the correct
buffer when looking it up in cache.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |   37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index bcdc07c..e207bf0 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -296,10 +296,7 @@ xfs_attr_rmtval_set(
 	 * and we may not need that many, so we have to handle this when
 	 * allocating the blocks below. 
 	 */
-	if (!crcs)
-		blkcnt = XFS_B_TO_FSB(mp, args->valuelen);
-	else
-		blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
+	blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
 
 	error = xfs_bmap_first_unused(args->trans, args->dp, blkcnt, &lfileoff,
 						   XFS_ATTR_FORK);
@@ -394,8 +391,11 @@ xfs_attr_rmtval_set(
 	 */
 	lblkno = args->rmtblkno;
 	valuelen = args->valuelen;
+	blkcnt = args->rmtblkcnt;
 	while (valuelen > 0) {
 		int	byte_cnt;
+		int	hdr_size;
+		int	dblkcnt;
 		char	*buf;
 
 		/*
@@ -404,7 +404,7 @@ xfs_attr_rmtval_set(
 		xfs_bmap_init(args->flist, args->firstblock);
 		nmap = 1;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)lblkno,
-				       args->rmtblkcnt, &map, &nmap,
+				       blkcnt, &map, &nmap,
 				       XFS_BMAPI_ATTRFORK);
 		if (error)
 			return(error);
@@ -413,26 +413,25 @@ xfs_attr_rmtval_set(
 		       (map.br_startblock != HOLESTARTBLOCK));
 
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
-		blkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
+		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
-		bp = xfs_buf_get(mp->m_ddev_targp, dblkno, blkcnt, 0);
+		bp = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, 0);
 		if (!bp)
 			return ENOMEM;
 		bp->b_ops = &xfs_attr3_rmt_buf_ops;
-
-		byte_cnt = BBTOB(bp->b_length);
-		byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, byte_cnt);
-		if (valuelen < byte_cnt)
-			byte_cnt = valuelen;
-
 		buf = bp->b_addr;
-		buf += xfs_attr3_rmt_hdr_set(mp, dp->i_ino, offset,
+
+		byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
+		byte_cnt = min_t(int, valuelen, byte_cnt);
+		hdr_size = xfs_attr3_rmt_hdr_set(mp, dp->i_ino, offset,
 					     byte_cnt, bp);
-		memcpy(buf, src, byte_cnt);
+		ASSERT(hdr_size + byte_cnt <= BBTOB(bp->b_length));
 
-		if (byte_cnt < BBTOB(bp->b_length))
-			xfs_buf_zero(bp, byte_cnt,
-				     BBTOB(bp->b_length) - byte_cnt);
+		memcpy(buf + hdr_size, src, byte_cnt);
+
+		if (byte_cnt + hdr_size < BBTOB(bp->b_length))
+			xfs_buf_zero(bp, byte_cnt + hdr_size,
+				     BBTOB(bp->b_length) - byte_cnt - hdr_size);
 
 		error = xfs_bwrite(bp);	/* GROT: NOTE: synchronous write */
 		xfs_buf_relse(bp);
@@ -442,9 +441,9 @@ xfs_attr_rmtval_set(
 		src += byte_cnt;
 		valuelen -= byte_cnt;
 		offset += byte_cnt;
-		hdrcnt--;
 
 		lblkno += map.br_blockcount;
+		blkcnt -= map.br_blockcount;
 	}
 	ASSERT(valuelen == 0);
 	return 0;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 05/11] xfs: correctly map remote attr buffers during removal
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (3 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 04/11] xfs: remote attribute tail zeroing does too much Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-22 17:01   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

If we don't map the buffers correctly (same as for get/set
operations) then the incore buffer lookup will fail. If a block
number matches but a length is wrong, then debug kernels will ASSERT
fail in _xfs_buf_find() due to the length mismatch. Ensure that we
map the buffers correctly by basing the length of the buffer on the
attribute data length rather than the remote block count.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |   27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index e207bf0..d8bcb2d 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -468,19 +468,25 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
 	mp = args->dp->i_mount;
 
 	/*
-	 * Roll through the "value", invalidating the attribute value's
-	 * blocks.
+	 * Roll through the "value", invalidating the attribute value's blocks.
+	 * Note that args->rmtblkcnt is the minimum number of data blocks we'll
+	 * see for a CRC enabled remote attribute. Each extent will have a
+	 * header, and so we may have more blocks than we realise here.  If we
+	 * fail to map the blocks correctly, we'll have problems with the buffer
+	 * lookups.
 	 */
 	lblkno = args->rmtblkno;
-	valuelen = args->rmtblkcnt;
+	valuelen = args->valuelen;
+	blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
 	while (valuelen > 0) {
+		int dblkcnt;
+
 		/*
 		 * Try to remember where we decided to put the value.
 		 */
 		nmap = 1;
 		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
-				       args->rmtblkcnt, &map, &nmap,
-				       XFS_BMAPI_ATTRFORK);
+				       blkcnt, &map, &nmap, XFS_BMAPI_ATTRFORK);
 		if (error)
 			return(error);
 		ASSERT(nmap == 1);
@@ -488,28 +494,31 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
 		       (map.br_startblock != HOLESTARTBLOCK));
 
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
-		blkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
+		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
 		/*
 		 * If the "remote" value is in the cache, remove it.
 		 */
-		bp = xfs_incore(mp->m_ddev_targp, dblkno, blkcnt, XBF_TRYLOCK);
+		bp = xfs_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
 		if (bp) {
 			xfs_buf_stale(bp);
 			xfs_buf_relse(bp);
 			bp = NULL;
 		}
 
-		valuelen -= map.br_blockcount;
+		valuelen -= XFS_ATTR3_RMT_BUF_SPACE(mp,
+					XFS_FSB_TO_B(mp, map.br_blockcount));
 
 		lblkno += map.br_blockcount;
+		blkcnt -= map.br_blockcount;
+		blkcnt = max(blkcnt, xfs_attr3_rmt_blocks(mp, valuelen));
 	}
 
 	/*
 	 * Keep de-allocating extents until the remote-value region is gone.
 	 */
+	blkcnt = lblkno - args->rmtblkno;
 	lblkno = args->rmtblkno;
-	blkcnt = args->rmtblkcnt;
 	done = 0;
 	while (!done) {
 		xfs_bmap_init(args->flist, args->firstblock);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (4 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 05/11] xfs: correctly map remote attr buffers during removal Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-22 20:50   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
the entries in two leaves when the destination leaf requires
compaction. The temporary buffer ends up being copied back over the
original destination buffer, so the header in the temporary buffer
needs to contain all the information that is in the destination
buffer.

To make sure the temporary buffer is fully initialised, once we've
set up the temporary incore header appropriately, write is back to
the temporary buffer before starting to move entries around.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 0bce1b3..79ece72 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -2181,14 +2181,24 @@ xfs_attr3_leaf_unbalance(
 		struct xfs_attr_leafblock *tmp_leaf;
 		struct xfs_attr3_icleaf_hdr tmphdr;
 
-		tmp_leaf = kmem_alloc(state->blocksize, KM_SLEEP);
-		memset(tmp_leaf, 0, state->blocksize);
-		memset(&tmphdr, 0, sizeof(tmphdr));
+		tmp_leaf = kmem_zalloc(state->blocksize, KM_SLEEP);
+
+		/*
+		 * Copy the header into the temp leaf so that all the stuff
+		 * not in the incore header is present and gets copied back in
+		 * once we've moved all the entries.
+		 */
+		memcpy(tmp_leaf, save_leaf, xfs_attr3_leaf_hdr_size(save_leaf));
 
+		memset(&tmphdr, 0, sizeof(tmphdr));
 		tmphdr.magic = savehdr.magic;
 		tmphdr.forw = savehdr.forw;
 		tmphdr.back = savehdr.back;
 		tmphdr.firstused = state->blocksize;
+
+		/* write the header to the temp buffer to initialise it */
+		xfs_attr3_leaf_hdr_to_disk(tmp_leaf, &tmphdr);
+
 		if (xfs_attr3_leaf_order(save_blk->bp, &savehdr,
 					 drop_blk->bp, &drophdr)) {
 			xfs_attr3_leaf_moveents(drop_leaf, &drophdr, 0,
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (5 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-22 21:59   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 08/11] xfs: don't emit v5 superblock warnings on write Dave Chinner
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
the entries in a leaf. It copies the the original buffer into the
temporary buffer, then zeros the original buffer completely. It then
copies the entries back into the original buffer.  However, the
original buffer has not been correctly initialised, and so the
movement of the entries goes horribly wrong.

Make sure the zeroed destination buffer is fully initialised, and
once we've set up the destination incore header appropriately, write
is back to the buffer before starting to move entries around.

While debugging this, the _d/_s prefixes weren't sufficient to
remind me what buffer was what, so rename then all _src/_dst.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |   42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 79ece72..5b03d15 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -1445,11 +1445,12 @@ xfs_attr3_leaf_add_work(
 STATIC void
 xfs_attr3_leaf_compact(
 	struct xfs_da_args	*args,
-	struct xfs_attr3_icleaf_hdr *ichdr_d,
+	struct xfs_attr3_icleaf_hdr *ichdr_dst,
 	struct xfs_buf		*bp)
 {
-	xfs_attr_leafblock_t	*leaf_s, *leaf_d;
-	struct xfs_attr3_icleaf_hdr ichdr_s;
+	struct xfs_attr_leafblock *leaf_src;
+	struct xfs_attr_leafblock *leaf_dst;
+	struct xfs_attr3_icleaf_hdr ichdr_src;
 	struct xfs_trans	*trans = args->trans;
 	struct xfs_mount	*mp = trans->t_mountp;
 	char			*tmpbuffer;
@@ -1457,29 +1458,38 @@ xfs_attr3_leaf_compact(
 	trace_xfs_attr_leaf_compact(args);
 
 	tmpbuffer = kmem_alloc(XFS_LBSIZE(mp), KM_SLEEP);
-	ASSERT(tmpbuffer != NULL);
 	memcpy(tmpbuffer, bp->b_addr, XFS_LBSIZE(mp));
 	memset(bp->b_addr, 0, XFS_LBSIZE(mp));
+	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
+	leaf_dst = bp->b_addr;
 
 	/*
-	 * Copy basic information
+	 * Copy the on-disk header back into the destination buffer to ensure
+	 * all the information in the header that is not part of the incore
+	 * header structure is preserved.
 	 */
-	leaf_s = (xfs_attr_leafblock_t *)tmpbuffer;
-	leaf_d = bp->b_addr;
-	ichdr_s = *ichdr_d;	/* struct copy */
-	ichdr_d->firstused = XFS_LBSIZE(mp);
-	ichdr_d->usedbytes = 0;
-	ichdr_d->count = 0;
-	ichdr_d->holes = 0;
-	ichdr_d->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_s);
-	ichdr_d->freemap[0].size = ichdr_d->firstused - ichdr_d->freemap[0].base;
+	memcpy(bp->b_addr, tmpbuffer, xfs_attr3_leaf_hdr_size(leaf_src));
+
+	/* Initialise the incore headers */
+	ichdr_src = *ichdr_dst;	/* struct copy */
+	ichdr_dst->firstused = XFS_LBSIZE(mp);
+	ichdr_dst->usedbytes = 0;
+	ichdr_dst->count = 0;
+	ichdr_dst->holes = 0;
+	ichdr_dst->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_src);
+	ichdr_dst->freemap[0].size = ichdr_dst->firstused -
+						ichdr_dst->freemap[0].base;
+
+
+	/* write the header back to initialise the underlying buffer */
+	xfs_attr3_leaf_hdr_to_disk(leaf_dst, ichdr_dst);
 
 	/*
 	 * Copy all entry's in the same (sorted) order,
 	 * but allocate name/value pairs packed and in sequence.
 	 */
-	xfs_attr3_leaf_moveents(leaf_s, &ichdr_s, 0, leaf_d, ichdr_d, 0,
-				ichdr_s.count, mp);
+	xfs_attr3_leaf_moveents(leaf_src, &ichdr_src, 0, leaf_dst, ichdr_dst, 0,
+				ichdr_src.count, mp);
 	/*
 	 * this logs the entire buffer, but the caller must write the header
 	 * back to the buffer when it is finished modifying it.
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 08/11] xfs: don't emit v5 superblock warnings on write
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (6 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-22 22:26   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 09/11] xfs: rework remote attr CRCs Dave Chinner
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

We write the superblock every 30s or so which results in the
verifier being called. Right now that results in this output
every 30s:

XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
Use of these features in this kernel is at your own risk!

And spamming the logs. Stop this output from occurring on superblock
writes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index f6bfbd7..e8e310c 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -314,7 +314,8 @@ STATIC int
 xfs_mount_validate_sb(
 	xfs_mount_t	*mp,
 	xfs_sb_t	*sbp,
-	bool		check_inprogress)
+	bool		check_inprogress,
+	bool		check_version)
 {
 
 	/*
@@ -337,9 +338,10 @@ xfs_mount_validate_sb(
 
 	/*
 	 * Version 5 superblock feature mask validation. Reject combinations the
-	 * kernel cannot support up front before checking anything else.
+	 * kernel cannot support up front before checking anything else. For
+	 * write validation, we don't need to check feature masks.
 	 */
-	if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
+	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
 		xfs_alert(mp,
 "Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
 "Use of these features in this kernel is at your own risk!");
@@ -675,7 +677,8 @@ xfs_sb_to_disk(
 
 static int
 xfs_sb_verify(
-	struct xfs_buf	*bp)
+	struct xfs_buf	*bp,
+	bool		check_version)
 {
 	struct xfs_mount *mp = bp->b_target->bt_mount;
 	struct xfs_sb	sb;
@@ -686,7 +689,8 @@ xfs_sb_verify(
 	 * Only check the in progress field for the primary superblock as
 	 * mkfs.xfs doesn't clear it from secondary superblocks.
 	 */
-	return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR);
+	return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR,
+				     check_version);
 }
 
 /*
@@ -719,7 +723,7 @@ xfs_sb_read_verify(
 			goto out_error;
 		}
 	}
-	error = xfs_sb_verify(bp);
+	error = xfs_sb_verify(bp, true);
 
 out_error:
 	if (error) {
@@ -758,7 +762,7 @@ xfs_sb_write_verify(
 	struct xfs_buf_log_item	*bip = bp->b_fspriv;
 	int			error;
 
-	error = xfs_sb_verify(bp);
+	error = xfs_sb_verify(bp, false);
 	if (error) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
 		xfs_buf_ioerror(bp, error);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 09/11] xfs: rework remote attr CRCs
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (7 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 08/11] xfs: don't emit v5 superblock warnings on write Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-23 21:54   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 10/11] xfs: fix incorrect remote symlink block count Dave Chinner
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

Note: this changes the on-disk remote attribute format. I assert
that this is OK to do as CRCs are marked experimental and the first
kernel it is included in has not yet reached release yet. Further,
the userspace utilities are still evolving and so anyone using this
stuff right now is a developer or tester using volatile filesystems
for testing this feature. Hence changing the format right now to
save longer term pain is the right thing to do.

The fundamental change is to move from a header per extent in the
attribute to a header per filesytem block in the attribute. This
means there are more header blocks and the parsing of the attribute
data is slightly more complex, but it has the advantage that we
always know the size of the attribute on disk based on the length of
the data it contains.

This is where the header-per-extent method has problems. We don't
know the size of the attribute on disk without first knowing how
many extents are used to hold it. And we can't tell from a
mapping lookup, either, because remote attributes can be allocated
contiguously with other attribute blocks and so there is no obvious
way of determining the actual size of the atribute on disk short of
walking and mapping buffers.

The problem with this approach is that if we map a buffer
incorrectly (e.g. we make the last buffer for the attribute data too
long), we then get buffer cache lookup failure when we map it
correctly. i.e. we get a size mismatch on lookup. This is not
necessarily fatal, but it's a cache coherency problem that can lead
to returning the wrong data to userspace or writing the wrong data
to disk. And debug kernels will assert fail if this occurs.

I found lots of niggly little problems trying to fix this issue on a
4k block size filesystem, finally getting it to pass with lots of
fixes. The thing is, 1024 byte filesystems still failed, and it was
getting really complex handling all the corner cases that were
showing up. And there were clearly more that I hadn't found yet.

It is complex, fragile code, and if we don't fix it now, it will be
complex, fragile code forever more.

Hence the simple fix is to add a header to each filesystem block.
This gives us the same relationship between the attribute data
length and the number of blocks on disk as we have without CRCs -
it's a linear mapping and doesn't require us to guess anything. It
is simple to implement, too - the remote block count calculated at
lookup time can be used by the remote attribute set/get/remove code
without modification for both CRC and non-CRC filesystems. The world
becomes sane again.

Because the copy-in and copy-out now need to iterate over each
filesystem block, I moved them into helper functions so we separate
the block mapping and buffer manupulations from the attribute data
and CRC header manipulations. The code becomes much clearer as a
result, and it is a lot easier to understand and debug. It also
appears to be much more robust - once it worked on 4k block size
filesystems, it has worked without failure on 1k block size
filesystems, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c   |   13 +-
 fs/xfs/xfs_attr_remote.c |  381 +++++++++++++++++++++++++++-------------------
 fs/xfs/xfs_attr_remote.h |   10 ++
 fs/xfs/xfs_buf.c         |    1 +
 4 files changed, 247 insertions(+), 158 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 5b03d15..d788302 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -1412,7 +1412,7 @@ xfs_attr3_leaf_add_work(
 		name_rmt->valuelen = 0;
 		name_rmt->valueblk = 0;
 		args->rmtblkno = 1;
-		args->rmtblkcnt = XFS_B_TO_FSB(mp, args->valuelen);
+		args->rmtblkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
 	}
 	xfs_trans_log_buf(args->trans, bp,
 	     XFS_DA_LOGRANGE(leaf, xfs_attr3_leaf_name(leaf, args->index),
@@ -2354,8 +2354,9 @@ xfs_attr3_leaf_lookup_int(
 			args->index = probe;
 			args->valuelen = be32_to_cpu(name_rmt->valuelen);
 			args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
-			args->rmtblkcnt = XFS_B_TO_FSB(args->dp->i_mount,
-						       args->valuelen);
+			args->rmtblkcnt = xfs_attr3_rmt_blocks(
+							args->dp->i_mount,
+							args->valuelen);
 			return XFS_ERROR(EEXIST);
 		}
 	}
@@ -2406,7 +2407,8 @@ xfs_attr3_leaf_getvalue(
 		ASSERT(memcmp(args->name, name_rmt->name, args->namelen) == 0);
 		valuelen = be32_to_cpu(name_rmt->valuelen);
 		args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
-		args->rmtblkcnt = XFS_B_TO_FSB(args->dp->i_mount, valuelen);
+		args->rmtblkcnt = xfs_attr3_rmt_blocks(args->dp->i_mount,
+						       valuelen);
 		if (args->flags & ATTR_KERNOVAL) {
 			args->valuelen = valuelen;
 			return 0;
@@ -2732,7 +2734,8 @@ xfs_attr3_leaf_list_int(
 				args.valuelen = valuelen;
 				args.value = kmem_alloc(valuelen, KM_SLEEP | KM_NOFS);
 				args.rmtblkno = be32_to_cpu(name_rmt->valueblk);
-				args.rmtblkcnt = XFS_B_TO_FSB(args.dp->i_mount, valuelen);
+				args.rmtblkcnt = xfs_attr3_rmt_blocks(
+							args.dp->i_mount, valuelen);
 				retval = xfs_attr_rmtval_get(&args);
 				if (retval)
 					return retval;
diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index d8bcb2d..ef6b0c1 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -47,7 +47,7 @@
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
  */
-static int
+int
 xfs_attr3_rmt_blocks(
 	struct xfs_mount *mp,
 	int		attrlen)
@@ -59,12 +59,43 @@ xfs_attr3_rmt_blocks(
 	return XFS_B_TO_FSB(mp, attrlen);
 }
 
+/*
+ * Checking of the remote attribute header is split into two parts. The verifier
+ * does CRC, location and bounds checking, the unpacking function checks the
+ * attribute parameters and owner.
+ */
+static bool
+xfs_attr3_rmt_hdr_ok(
+	struct xfs_mount	*mp,
+	void			*ptr,
+	xfs_ino_t		ino,
+	uint32_t		offset,
+	uint32_t		size,
+	xfs_daddr_t		bno)
+{
+	struct xfs_attr3_rmt_hdr *rmt = ptr;
+
+	if (bno != be64_to_cpu(rmt->rm_blkno))
+		return false;
+	if (offset != be32_to_cpu(rmt->rm_offset))
+		return false;
+	if (size != be32_to_cpu(rmt->rm_bytes))
+		return false;
+	if (ino != be64_to_cpu(rmt->rm_owner))
+		return false;
+
+	/* ok */
+	return true;
+}
+
 static bool
 xfs_attr3_rmt_verify(
-	struct xfs_buf		*bp)
+	struct xfs_mount	*mp,
+	void			*ptr,
+	int			fsbsize,
+	xfs_daddr_t		bno)
 {
-	struct xfs_mount	*mp = bp->b_target->bt_mount;
-	struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
+	struct xfs_attr3_rmt_hdr *rmt = ptr;
 
 	if (!xfs_sb_version_hascrc(&mp->m_sb))
 		return false;
@@ -72,7 +103,9 @@ xfs_attr3_rmt_verify(
 		return false;
 	if (!uuid_equal(&rmt->rm_uuid, &mp->m_sb.sb_uuid))
 		return false;
-	if (bp->b_bn != be64_to_cpu(rmt->rm_blkno))
+	if (be64_to_cpu(rmt->rm_blkno) != bno)
+		return false;
+	if (be32_to_cpu(rmt->rm_bytes) > fsbsize - sizeof(*rmt))
 		return false;
 	if (be32_to_cpu(rmt->rm_offset) +
 				be32_to_cpu(rmt->rm_bytes) >= XATTR_SIZE_MAX)
@@ -88,17 +121,40 @@ xfs_attr3_rmt_read_verify(
 	struct xfs_buf	*bp)
 {
 	struct xfs_mount *mp = bp->b_target->bt_mount;
+	char		*ptr;
+	int		len;
+	bool		corrupt = false;
+	xfs_daddr_t	bno;
 
 	/* no verification of non-crc buffers */
 	if (!xfs_sb_version_hascrc(&mp->m_sb))
 		return;
 
-	if (!xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
-			      XFS_ATTR3_RMT_CRC_OFF) ||
-	    !xfs_attr3_rmt_verify(bp)) {
+	ptr = bp->b_addr;
+	bno = bp->b_bn;
+	len = BBTOB(bp->b_length);
+	ASSERT(len >= XFS_LBSIZE(mp));
+
+	while (len > 0) {
+		if (!xfs_verify_cksum(ptr, XFS_LBSIZE(mp),
+				      XFS_ATTR3_RMT_CRC_OFF)) {
+			corrupt = true;
+			break;
+		}
+		if (!xfs_attr3_rmt_verify(mp, ptr, XFS_LBSIZE(mp), bno)) {
+			corrupt = true;
+			break;
+		}
+		len -= XFS_LBSIZE(mp);
+		ptr += XFS_LBSIZE(mp);
+		bno += mp->m_bsize;
+	}
+
+	if (corrupt) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
 		xfs_buf_ioerror(bp, EFSCORRUPTED);
-	}
+	} else
+		ASSERT(len == 0);
 }
 
 static void
@@ -107,23 +163,39 @@ xfs_attr3_rmt_write_verify(
 {
 	struct xfs_mount *mp = bp->b_target->bt_mount;
 	struct xfs_buf_log_item	*bip = bp->b_fspriv;
+	char		*ptr;
+	int		len;
+	xfs_daddr_t	bno;
 
 	/* no verification of non-crc buffers */
 	if (!xfs_sb_version_hascrc(&mp->m_sb))
 		return;
 
-	if (!xfs_attr3_rmt_verify(bp)) {
-		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
-		xfs_buf_ioerror(bp, EFSCORRUPTED);
-		return;
-	}
+	ptr = bp->b_addr;
+	bno = bp->b_bn;
+	len = BBTOB(bp->b_length);
+	ASSERT(len >= XFS_LBSIZE(mp));
+
+	while (len > 0) {
+		if (!xfs_attr3_rmt_verify(mp, ptr, XFS_LBSIZE(mp), bno)) {
+			XFS_CORRUPTION_ERROR(__func__,
+					    XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+			xfs_buf_ioerror(bp, EFSCORRUPTED);
+			return;
+		}
+		if (bip) {
+			struct xfs_attr3_rmt_hdr *rmt;
 
-	if (bip) {
-		struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
-		rmt->rm_lsn = cpu_to_be64(bip->bli_item.li_lsn);
+			rmt = (struct xfs_attr3_rmt_hdr *)ptr;
+			rmt->rm_lsn = cpu_to_be64(bip->bli_item.li_lsn);
+		}
+		xfs_update_cksum(ptr, XFS_LBSIZE(mp), XFS_ATTR3_RMT_CRC_OFF);
+
+		len -= XFS_LBSIZE(mp);
+		ptr += XFS_LBSIZE(mp);
+		bno += mp->m_bsize;
 	}
-	xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
-			 XFS_ATTR3_RMT_CRC_OFF);
+	ASSERT(len == 0);
 }
 
 const struct xfs_buf_ops xfs_attr3_rmt_buf_ops = {
@@ -131,15 +203,16 @@ const struct xfs_buf_ops xfs_attr3_rmt_buf_ops = {
 	.verify_write = xfs_attr3_rmt_write_verify,
 };
 
-static int
+STATIC int
 xfs_attr3_rmt_hdr_set(
 	struct xfs_mount	*mp,
+	void			*ptr,
 	xfs_ino_t		ino,
 	uint32_t		offset,
 	uint32_t		size,
-	struct xfs_buf		*bp)
+	xfs_daddr_t		bno)
 {
-	struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
+	struct xfs_attr3_rmt_hdr *rmt = ptr;
 
 	if (!xfs_sb_version_hascrc(&mp->m_sb))
 		return 0;
@@ -149,36 +222,107 @@ xfs_attr3_rmt_hdr_set(
 	rmt->rm_bytes = cpu_to_be32(size);
 	uuid_copy(&rmt->rm_uuid, &mp->m_sb.sb_uuid);
 	rmt->rm_owner = cpu_to_be64(ino);
-	rmt->rm_blkno = cpu_to_be64(bp->b_bn);
-	bp->b_ops = &xfs_attr3_rmt_buf_ops;
+	rmt->rm_blkno = cpu_to_be64(bno);
 
 	return sizeof(struct xfs_attr3_rmt_hdr);
 }
 
 /*
- * Checking of the remote attribute header is split into two parts. the verifier
- * does CRC, location and bounds checking, the unpacking function checks the
- * attribute parameters and owner.
+ * Helper functions to copy attribute data in and out of the one disk extents
  */
-static bool
-xfs_attr3_rmt_hdr_ok(
-	struct xfs_mount	*mp,
-	xfs_ino_t		ino,
-	uint32_t		offset,
-	uint32_t		size,
-	struct xfs_buf		*bp)
+STATIC int
+xfs_attr_rmtval_copyout(
+	struct xfs_mount *mp,
+	struct xfs_buf	*bp,
+	xfs_ino_t	ino,
+	int		*offset,
+	int		*valuelen,
+	char		**dst)
 {
-	struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
+	char		*src = bp->b_addr;
+	xfs_daddr_t	bno = bp->b_bn;
+	int		len = BBTOB(bp->b_length);
 
-	if (offset != be32_to_cpu(rmt->rm_offset))
-		return false;
-	if (size != be32_to_cpu(rmt->rm_bytes))
-		return false;
-	if (ino != be64_to_cpu(rmt->rm_owner))
-		return false;
+	ASSERT(len >= XFS_LBSIZE(mp));
 
-	/* ok */
-	return true;
+	while (len > 0 && *valuelen > 0) {
+		int hdr_size = 0;
+		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, XFS_LBSIZE(mp));
+
+		byte_cnt = min_t(int, *valuelen, byte_cnt);
+
+		if (xfs_sb_version_hascrc(&mp->m_sb)) {
+			if (!xfs_attr3_rmt_hdr_ok(mp, src, ino, *offset,
+						  byte_cnt, bno)) {
+				xfs_alert(mp,
+"remote attribute header mismatch bno/off/len/owner (0x%llx/0x%x/Ox%x/0x%llx)",
+					bno, *offset, byte_cnt, ino);
+				return EFSCORRUPTED;
+			}
+			hdr_size = sizeof(struct xfs_attr3_rmt_hdr);
+		}
+
+		memcpy(*dst, src + hdr_size, byte_cnt);
+
+		/* roll buffer forwards */
+		len -= XFS_LBSIZE(mp);
+		src += XFS_LBSIZE(mp);
+		bno += mp->m_bsize;
+
+		/* roll attribute data forwards */
+		*valuelen -= byte_cnt;
+		*dst += byte_cnt;
+		*offset += byte_cnt;
+	}
+	return 0;
+}
+
+STATIC void
+xfs_attr_rmtval_copyin(
+	struct xfs_mount *mp,
+	struct xfs_buf	*bp,
+	xfs_ino_t	ino,
+	int		*offset,
+	int		*valuelen,
+	char		**src)
+{
+	char		*dst = bp->b_addr;
+	xfs_daddr_t	bno = bp->b_bn;
+	int		len = BBTOB(bp->b_length);
+
+	ASSERT(len >= XFS_LBSIZE(mp));
+
+	while (len > 0 && *valuelen > 0) {
+		int hdr_size;
+		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, XFS_LBSIZE(mp));
+
+		byte_cnt = min(*valuelen, byte_cnt);
+		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
+						 byte_cnt, bno);
+
+		memcpy(dst + hdr_size, *src, byte_cnt);
+
+		/*
+		 * If this is the last block, zero the remainder of it.
+		 * Check that we are actually the last block, too.
+		 */
+		if (byte_cnt + hdr_size < XFS_LBSIZE(mp)) {
+			ASSERT(*valuelen - byte_cnt == 0);
+			ASSERT(len == XFS_LBSIZE(mp));
+			memset(dst + hdr_size + byte_cnt, 0,
+					XFS_LBSIZE(mp) - hdr_size - byte_cnt);
+		}
+
+		/* roll buffer forwards */
+		len -= XFS_LBSIZE(mp);
+		dst += XFS_LBSIZE(mp);
+		bno += mp->m_bsize;
+
+		/* roll attribute data forwards */
+		*valuelen -= byte_cnt;
+		*src += byte_cnt;
+		*offset += byte_cnt;
+	}
 }
 
 /*
@@ -192,13 +336,12 @@ xfs_attr_rmtval_get(
 	struct xfs_bmbt_irec	map[ATTR_RMTVALUE_MAPSIZE];
 	struct xfs_mount	*mp = args->dp->i_mount;
 	struct xfs_buf		*bp;
-	xfs_daddr_t		dblkno;
 	xfs_dablk_t		lblkno = args->rmtblkno;
-	void			*dst = args->value;
+	char			*dst = args->value;
 	int			valuelen = args->valuelen;
 	int			nmap;
 	int			error;
-	int			blkcnt;
+	int			blkcnt = args->rmtblkcnt;
 	int			i;
 	int			offset = 0;
 
@@ -208,7 +351,6 @@ xfs_attr_rmtval_get(
 
 	while (valuelen > 0) {
 		nmap = ATTR_RMTVALUE_MAPSIZE;
-		blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
 		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
 				       blkcnt, map, &nmap,
 				       XFS_BMAPI_ATTRFORK);
@@ -217,45 +359,29 @@ xfs_attr_rmtval_get(
 		ASSERT(nmap >= 1);
 
 		for (i = 0; (i < nmap) && (valuelen > 0); i++) {
-			int	byte_cnt;
-			char	*src;
+			xfs_daddr_t	dblkno;
+			int		dblkcnt;
 
 			ASSERT((map[i].br_startblock != DELAYSTARTBLOCK) &&
 			       (map[i].br_startblock != HOLESTARTBLOCK));
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
-			blkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
+			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
 			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
-						   dblkno, blkcnt, 0, &bp,
+						   dblkno, dblkcnt, 0, &bp,
 						   &xfs_attr3_rmt_buf_ops);
 			if (error)
 				return error;
 
-			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
-			byte_cnt = min_t(int, valuelen, byte_cnt);
-
-			src = bp->b_addr;
-			if (xfs_sb_version_hascrc(&mp->m_sb)) {
-				if (!xfs_attr3_rmt_hdr_ok(mp, args->dp->i_ino,
-							offset, byte_cnt, bp)) {
-					xfs_alert(mp,
-"remote attribute header does not match required off/len/owner (0x%x/Ox%x,0x%llx)",
-						offset, byte_cnt, args->dp->i_ino);
-					xfs_buf_relse(bp);
-					return EFSCORRUPTED;
-
-				}
-
-				src += sizeof(struct xfs_attr3_rmt_hdr);
-			}
-
-			memcpy(dst, src, byte_cnt);
+			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
+							&offset, &valuelen,
+							&dst);
 			xfs_buf_relse(bp);
+			if (error)
+				return error;
 
-			offset += byte_cnt;
-			dst += byte_cnt;
-			valuelen -= byte_cnt;
-
+			/* roll attribute extent map forwards */
 			lblkno += map[i].br_blockcount;
+			blkcnt -= map[i].br_blockcount;
 		}
 	}
 	ASSERT(valuelen == 0);
@@ -273,17 +399,13 @@ xfs_attr_rmtval_set(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_bmbt_irec	map;
-	struct xfs_buf		*bp;
-	xfs_daddr_t		dblkno;
 	xfs_dablk_t		lblkno;
 	xfs_fileoff_t		lfileoff = 0;
-	void			*src = args->value;
+	char			*src = args->value;
 	int			blkcnt;
 	int			valuelen;
 	int			nmap;
 	int			error;
-	int			hdrcnt = 0;
-	bool			crcs = xfs_sb_version_hascrc(&mp->m_sb);
 	int			offset = 0;
 
 	trace_xfs_attr_rmtval_set(args);
@@ -292,21 +414,14 @@ xfs_attr_rmtval_set(
 	 * Find a "hole" in the attribute address space large enough for
 	 * us to drop the new attribute's value into. Because CRC enable
 	 * attributes have headers, we can't just do a straight byte to FSB
-	 * conversion. We calculate the worst case block count in this case
-	 * and we may not need that many, so we have to handle this when
-	 * allocating the blocks below. 
+	 * conversion and have to take the header space into account.
 	 */
 	blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
-
 	error = xfs_bmap_first_unused(args->trans, args->dp, blkcnt, &lfileoff,
 						   XFS_ATTR_FORK);
 	if (error)
 		return error;
 
-	/* Start with the attribute data. We'll allocate the rest afterwards. */
-	if (crcs)
-		blkcnt = XFS_B_TO_FSB(mp, args->valuelen);
-
 	args->rmtblkno = lblkno = (xfs_dablk_t)lfileoff;
 	args->rmtblkcnt = blkcnt;
 
@@ -349,31 +464,6 @@ xfs_attr_rmtval_set(
 		       (map.br_startblock != HOLESTARTBLOCK));
 		lblkno += map.br_blockcount;
 		blkcnt -= map.br_blockcount;
-		hdrcnt++;
-
-		/*
-		 * If we have enough blocks for the attribute data, calculate
-		 * how many extra blocks we need for headers. We might run
-		 * through this multiple times in the case that the additional
-		 * headers in the blocks needed for the data fragments spills
-		 * into requiring more blocks. e.g. for 512 byte blocks, we'll
-		 * spill for another block every 9 headers we require in this
-		 * loop.
-		 *
-		 * Note that this can result in contiguous allocation of blocks,
-		 * so we don't use all the space we allocate for headers as we
-		 * have one less header for each contiguous allocation that
-		 * occurs in the map/write loop below.
-		 */
-		if (crcs && blkcnt == 0) {
-			int total_len;
-
-			total_len = args->valuelen +
-				    hdrcnt * sizeof(struct xfs_attr3_rmt_hdr);
-			blkcnt = XFS_B_TO_FSB(mp, total_len);
-			blkcnt -= args->rmtblkcnt;
-			args->rmtblkcnt += blkcnt;
-		}
 
 		/*
 		 * Start the next trans in the chain.
@@ -390,17 +480,15 @@ xfs_attr_rmtval_set(
 	 * the INCOMPLETE flag.
 	 */
 	lblkno = args->rmtblkno;
-	valuelen = args->valuelen;
 	blkcnt = args->rmtblkcnt;
+	valuelen = args->valuelen;
 	while (valuelen > 0) {
-		int	byte_cnt;
-		int	hdr_size;
-		int	dblkcnt;
-		char	*buf;
+		struct xfs_buf	*bp;
+		xfs_daddr_t	dblkno;
+		int		dblkcnt;
+
+		ASSERT(blkcnt > 0);
 
-		/*
-		 * Try to remember where we decided to put the value.
-		 */
 		xfs_bmap_init(args->flist, args->firstblock);
 		nmap = 1;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)lblkno,
@@ -419,29 +507,17 @@ xfs_attr_rmtval_set(
 		if (!bp)
 			return ENOMEM;
 		bp->b_ops = &xfs_attr3_rmt_buf_ops;
-		buf = bp->b_addr;
-
-		byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
-		byte_cnt = min_t(int, valuelen, byte_cnt);
-		hdr_size = xfs_attr3_rmt_hdr_set(mp, dp->i_ino, offset,
-					     byte_cnt, bp);
-		ASSERT(hdr_size + byte_cnt <= BBTOB(bp->b_length));
 
-		memcpy(buf + hdr_size, src, byte_cnt);
-
-		if (byte_cnt + hdr_size < BBTOB(bp->b_length))
-			xfs_buf_zero(bp, byte_cnt + hdr_size,
-				     BBTOB(bp->b_length) - byte_cnt - hdr_size);
+		xfs_attr_rmtval_copyin(mp, bp, args->dp->i_ino, &offset,
+				       &valuelen, &src);
 
 		error = xfs_bwrite(bp);	/* GROT: NOTE: synchronous write */
 		xfs_buf_relse(bp);
 		if (error)
 			return error;
 
-		src += byte_cnt;
-		valuelen -= byte_cnt;
-		offset += byte_cnt;
 
+		/* roll attribute extent map forwards */
 		lblkno += map.br_blockcount;
 		blkcnt -= map.br_blockcount;
 	}
@@ -454,19 +530,17 @@ xfs_attr_rmtval_set(
  * out-of-line buffer that it is stored on.
  */
 int
-xfs_attr_rmtval_remove(xfs_da_args_t *args)
+xfs_attr_rmtval_remove(
+	struct xfs_da_args	*args)
 {
-	xfs_mount_t *mp;
-	xfs_bmbt_irec_t map;
-	xfs_buf_t *bp;
-	xfs_daddr_t dblkno;
-	xfs_dablk_t lblkno;
-	int valuelen, blkcnt, nmap, error, done, committed;
+	struct xfs_mount	*mp = args->dp->i_mount;
+	xfs_dablk_t		lblkno;
+	int			blkcnt;
+	int			error;
+	int			done;
 
 	trace_xfs_attr_rmtval_remove(args);
 
-	mp = args->dp->i_mount;
-
 	/*
 	 * Roll through the "value", invalidating the attribute value's blocks.
 	 * Note that args->rmtblkcnt is the minimum number of data blocks we'll
@@ -476,10 +550,13 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
 	 * lookups.
 	 */
 	lblkno = args->rmtblkno;
-	valuelen = args->valuelen;
-	blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
-	while (valuelen > 0) {
-		int dblkcnt;
+	blkcnt = args->rmtblkcnt;
+	while (blkcnt > 0) {
+		struct xfs_bmbt_irec	map;
+		struct xfs_buf		*bp;
+		xfs_daddr_t		dblkno;
+		int			dblkcnt;
+		int			nmap;
 
 		/*
 		 * Try to remember where we decided to put the value.
@@ -506,21 +583,19 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
 			bp = NULL;
 		}
 
-		valuelen -= XFS_ATTR3_RMT_BUF_SPACE(mp,
-					XFS_FSB_TO_B(mp, map.br_blockcount));
-
 		lblkno += map.br_blockcount;
 		blkcnt -= map.br_blockcount;
-		blkcnt = max(blkcnt, xfs_attr3_rmt_blocks(mp, valuelen));
 	}
 
 	/*
 	 * Keep de-allocating extents until the remote-value region is gone.
 	 */
-	blkcnt = lblkno - args->rmtblkno;
 	lblkno = args->rmtblkno;
+	blkcnt = args->rmtblkcnt;
 	done = 0;
 	while (!done) {
+		int committed;
+
 		xfs_bmap_init(args->flist, args->firstblock);
 		error = xfs_bunmapi(args->trans, args->dp, lblkno, blkcnt,
 				    XFS_BMAPI_ATTRFORK | XFS_BMAPI_METADATA,
diff --git a/fs/xfs/xfs_attr_remote.h b/fs/xfs/xfs_attr_remote.h
index c7cca60..92a8fd7 100644
--- a/fs/xfs/xfs_attr_remote.h
+++ b/fs/xfs/xfs_attr_remote.h
@@ -20,6 +20,14 @@
 
 #define XFS_ATTR3_RMT_MAGIC	0x5841524d	/* XARM */
 
+/*
+ * There is one of these headers per filesystem block in a remote attribute.
+ * This is done to ensure there is a 1:1 mapping between the attribute value
+ * length and the number of blocks needed to store the attribute. This makes the
+ * verification of a buffer a little more complex, but greatly simplifies the
+ * allocation, reading and writing of these attributes as we don't have to guess
+ * the number of blocks needed to store the attribute data.
+ */
 struct xfs_attr3_rmt_hdr {
 	__be32	rm_magic;
 	__be32	rm_offset;
@@ -39,6 +47,8 @@ struct xfs_attr3_rmt_hdr {
 
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
 
+int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
+
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_set(struct xfs_da_args *args);
 int xfs_attr_rmtval_remove(struct xfs_da_args *args);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 0d25542..1b2472a 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -513,6 +513,7 @@ _xfs_buf_find(
 		xfs_alert(btp->bt_mount,
 			  "%s: Block out of range: block 0x%llx, EOFS 0x%llx ",
 			  __func__, blkno, eofs);
+		WARN_ON(1);
 		return NULL;
 	}
 
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 10/11] xfs: fix incorrect remote symlink block count
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (8 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 09/11] xfs: rework remote attr CRCs Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-24 20:36   ` Ben Myers
  2013-05-21  8:02 ` [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

When CRCs are enabled, the number of blocks needed to hold a remote
symlink on a 1k block size filesystem may be 2 instead of 1. The
transaction reservation for the allocated bloks was not taking this
into account and only allocating one block. hence when trying to
read or invalidate such symlinks, we are mapping a hole where there
should be a block and things go bad at that point.

Fix the reservation to use the correct block count, clean up the
block count calculation similar to the remote attribute calculation,
and add a debug guard to detect when we don't write the entire
symlink to disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_symlink.c |   20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 5f234389..195a403 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -56,16 +56,9 @@ xfs_symlink_blocks(
 	struct xfs_mount *mp,
 	int		pathlen)
 {
-	int		fsblocks = 0;
-	int		len = pathlen;
+	int buflen = XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
 
-	do {
-		fsblocks++;
-		len -= XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
-	} while (len > 0);
-
-	ASSERT(fsblocks <= XFS_SYMLINK_MAPS);
-	return fsblocks;
+	return (pathlen + buflen - 1) / buflen;
 }
 
 static int
@@ -405,7 +398,7 @@ xfs_symlink(
 	if (pathlen <= XFS_LITINO(mp, dp->i_d.di_version))
 		fs_blocks = 0;
 	else
-		fs_blocks = XFS_B_TO_FSB(mp, pathlen);
+		fs_blocks = xfs_symlink_blocks(mp, pathlen);
 	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
 	error = xfs_trans_reserve(tp, resblks, XFS_SYMLINK_LOG_RES(mp), 0,
 			XFS_TRANS_PERM_LOG_RES, XFS_SYMLINK_LOG_COUNT);
@@ -512,7 +505,7 @@ xfs_symlink(
 		cur_chunk = target_path;
 		offset = 0;
 		for (n = 0; n < nmaps; n++) {
-			char *buf;
+			char	*buf;
 
 			d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
 			byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
@@ -525,9 +518,7 @@ xfs_symlink(
 			bp->b_ops = &xfs_symlink_buf_ops;
 
 			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
-			if (pathlen < byte_cnt) {
-				byte_cnt = pathlen;
-			}
+			byte_cnt = min(byte_cnt, pathlen);
 
 			buf = bp->b_addr;
 			buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset,
@@ -542,6 +533,7 @@ xfs_symlink(
 			xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) -
 							(char *)bp->b_addr);
 		}
+		ASSERT(pathlen == 0);
 	}
 
 	/*
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (9 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 10/11] xfs: fix incorrect remote symlink block count Dave Chinner
@ 2013-05-21  8:02 ` Dave Chinner
  2013-05-21 14:00   ` Brian Foster
  2013-05-21 16:26 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21  8:02 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format.  For version 5 superblocks, increase it to the maximum nuber
of ACLs that can fit into a single xattr.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_acl.c |   31 ++++++++++++++++++-------------
 fs/xfs/xfs_acl.h |   30 +++++++++++++++++++++++-------
 2 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 1d32f1d..b6bed9b 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -21,6 +21,8 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_inode.h"
 #include "xfs_vnodeops.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
 #include "xfs_trace.h"
 #include <linux/slab.h>
 #include <linux/xattr.h>
@@ -34,7 +36,9 @@
  */
 
 STATIC struct posix_acl *
-xfs_acl_from_disk(struct xfs_acl *aclp)
+xfs_acl_from_disk(
+	struct xfs_acl	*aclp,
+	int		max_entries)
 {
 	struct posix_acl_entry *acl_e;
 	struct posix_acl *acl;
@@ -42,7 +46,7 @@ xfs_acl_from_disk(struct xfs_acl *aclp)
 	unsigned int count, i;
 
 	count = be32_to_cpu(aclp->acl_cnt);
-	if (count > XFS_ACL_MAX_ENTRIES)
+	if (count > max_entries)
 		return ERR_PTR(-EFSCORRUPTED);
 
 	acl = posix_acl_alloc(count, GFP_KERNEL);
@@ -108,7 +112,7 @@ xfs_get_acl(struct inode *inode, int type)
 	struct xfs_inode *ip = XFS_I(inode);
 	struct posix_acl *acl;
 	struct xfs_acl *xfs_acl;
-	int len = sizeof(struct xfs_acl);
+	int len = XFS_ACL_SIZE(ip->i_mount);
 	unsigned char *ea_name;
 	int error;
 
@@ -133,8 +137,8 @@ xfs_get_acl(struct inode *inode, int type)
 	 * If we have a cached ACLs value just return it, not need to
 	 * go out to the disk.
 	 */
-
-	xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
+	len = XFS_ACL_SIZE(ip->i_mount);
+	xfs_acl = kzalloc(len, GFP_KERNEL);
 	if (!xfs_acl)
 		return ERR_PTR(-ENOMEM);
 
@@ -153,7 +157,7 @@ xfs_get_acl(struct inode *inode, int type)
 		goto out;
 	}
 
-	acl = xfs_acl_from_disk(xfs_acl);
+	acl = xfs_acl_from_disk(xfs_acl, XFS_ACL_MAX_ENTRIES(ip->i_mount));
 	if (IS_ERR(acl))
 		goto out;
 
@@ -189,16 +193,17 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
 
 	if (acl) {
 		struct xfs_acl *xfs_acl;
-		int len;
+		int len = XFS_ACL_SIZE(ip->i_mount);
 
-		xfs_acl = kzalloc(sizeof(struct xfs_acl), GFP_KERNEL);
+		xfs_acl = kzalloc(len, GFP_KERNEL);
 		if (!xfs_acl)
 			return -ENOMEM;
 
 		xfs_acl_to_disk(xfs_acl, acl);
-		len = sizeof(struct xfs_acl) -
-			(sizeof(struct xfs_acl_entry) *
-			 (XFS_ACL_MAX_ENTRIES - acl->a_count));
+
+		/* subtract away the unused acl entries */
+		len -= sizeof(struct xfs_acl_entry) *
+			 (XFS_ACL_MAX_ENTRIES(ip->i_mount) - acl->a_count);
 
 		error = -xfs_attr_set(ip, ea_name, (unsigned char *)xfs_acl,
 				len, ATTR_ROOT);
@@ -243,7 +248,7 @@ xfs_set_mode(struct inode *inode, umode_t mode)
 static int
 xfs_acl_exists(struct inode *inode, unsigned char *name)
 {
-	int len = sizeof(struct xfs_acl);
+	int len = XFS_ACL_SIZE(XFS_M(inode->i_sb));
 
 	return (xfs_attr_get(XFS_I(inode), name, NULL, &len,
 			    ATTR_ROOT|ATTR_KERNOVAL) == 0);
@@ -379,7 +384,7 @@ xfs_xattr_acl_set(struct dentry *dentry, const char *name,
 		goto out_release;
 
 	error = -EINVAL;
-	if (acl->a_count > XFS_ACL_MAX_ENTRIES)
+	if (acl->a_count > XFS_ACL_MAX_ENTRIES(XFS_M(inode->i_sb)))
 		goto out_release;
 
 	if (type == ACL_TYPE_ACCESS) {
diff --git a/fs/xfs/xfs_acl.h b/fs/xfs/xfs_acl.h
index 39632d9..5310a52 100644
--- a/fs/xfs/xfs_acl.h
+++ b/fs/xfs/xfs_acl.h
@@ -22,19 +22,35 @@ struct inode;
 struct posix_acl;
 struct xfs_inode;
 
-#define XFS_ACL_MAX_ENTRIES 25
 #define XFS_ACL_NOT_PRESENT (-1)
 
 /* On-disk XFS access control list structure */
+struct xfs_acl_entry {
+	__be32	ae_tag;
+	__be32	ae_id;
+	__be16	ae_perm;
+	__be16	ae_pad;		/* fill the implicit hole in the structure */
+};
+
 struct xfs_acl {
-	__be32		acl_cnt;
-	struct xfs_acl_entry {
-		__be32	ae_tag;
-		__be32	ae_id;
-		__be16	ae_perm;
-	} acl_entry[XFS_ACL_MAX_ENTRIES];
+	__be32			acl_cnt;
+	struct xfs_acl_entry	acl_entry[0];
 };
 
+/*
+ * The number of ACL entries allowed is defined by the on-disk format.
+ * For v4 superblocks, that is limited to 25 entries. For v5 superblocks, it is
+ * limited only by the maximum size of the xattr that stores the information.
+ */
+#define XFS_ACL_MAX_ENTRIES(mp)	\
+	(xfs_sb_version_hascrc(&mp->m_sb) \
+	   ?  (XATTR_SIZE_MAX - sizeof(__be32) / sizeof(struct xfs_acl_entry)) \
+	   : 25)
+
+#define XFS_ACL_SIZE(mp) \
+	(sizeof(struct xfs_acl) + \
+		sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
+
 /* On-disk XFS extended attribute names */
 #define SGI_ACL_FILE		(unsigned char *)"SGI_ACL_FILE"
 #define SGI_ACL_DEFAULT		(unsigned char *)"SGI_ACL_DEFAULT"
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks
  2013-05-21  8:02 ` [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
@ 2013-05-21 14:00   ` Brian Foster
  2013-05-21 20:27     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Brian Foster @ 2013-05-21 14:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: bpm, xfs

On 05/21/2013 04:02 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The limit of 25 ACL entries is arbitrary, but baked into the on-disk
> format.  For version 5 superblocks, increase it to the maximum nuber
> of ACLs that can fit into a single xattr.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_acl.c |   31 ++++++++++++++++++-------------
>  fs/xfs/xfs_acl.h |   30 +++++++++++++++++++++++-------
>  2 files changed, 41 insertions(+), 20 deletions(-)
> 
...
> diff --git a/fs/xfs/xfs_acl.h b/fs/xfs/xfs_acl.h
> index 39632d9..5310a52 100644
> --- a/fs/xfs/xfs_acl.h
> +++ b/fs/xfs/xfs_acl.h
> @@ -22,19 +22,35 @@ struct inode;
>  struct posix_acl;
>  struct xfs_inode;
>  
> -#define XFS_ACL_MAX_ENTRIES 25
>  #define XFS_ACL_NOT_PRESENT (-1)
>  
>  /* On-disk XFS access control list structure */
> +struct xfs_acl_entry {
> +	__be32	ae_tag;
> +	__be32	ae_id;
> +	__be16	ae_perm;
> +	__be16	ae_pad;		/* fill the implicit hole in the structure */
> +};
> +
>  struct xfs_acl {
> -	__be32		acl_cnt;
> -	struct xfs_acl_entry {
> -		__be32	ae_tag;
> -		__be32	ae_id;
> -		__be16	ae_perm;
> -	} acl_entry[XFS_ACL_MAX_ENTRIES];
> +	__be32			acl_cnt;
> +	struct xfs_acl_entry	acl_entry[0];
>  };
>  
> +/*
> + * The number of ACL entries allowed is defined by the on-disk format.
> + * For v4 superblocks, that is limited to 25 entries. For v5 superblocks, it is
> + * limited only by the maximum size of the xattr that stores the information.
> + */
> +#define XFS_ACL_MAX_ENTRIES(mp)	\
> +	(xfs_sb_version_hascrc(&mp->m_sb) \
> +	   ?  (XATTR_SIZE_MAX - sizeof(__be32) / sizeof(struct xfs_acl_entry)) \
					
It looks like this should be:

	(XATTR_SIZE_MAX - sizeof(__be32)) / sizeof(struct xfs_acl_entry)

(i.e., misplaced parens)

Brian

> +	   : 25)
> +
> +#define XFS_ACL_SIZE(mp) \
> +	(sizeof(struct xfs_acl) + \
> +		sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
> +
>  /* On-disk XFS extended attribute names */
>  #define SGI_ACL_FILE		(unsigned char *)"SGI_ACL_FILE"
>  #define SGI_ACL_DEFAULT		(unsigned char *)"SGI_ACL_DEFAULT"
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/11] xfs: fixes for 3.10-rc3
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (10 preceding siblings ...)
  2013-05-21  8:02 ` [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
@ 2013-05-21 16:26 ` Ben Myers
  2013-05-21 20:24   ` Dave Chinner
  2013-05-23 12:30 ` [PATCH 0/2] xfs: more " Dave Chinner
  2013-05-24 18:29 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
  13 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-21 16:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Tue, May 21, 2013 at 06:01:59PM +1000, Dave Chinner wrote:
> This is my current kernel bug fix patch series. I've updated it
> against a current xfsdev tree, and contains all the fixes mentioned
> in the "fixes for 3.10-rc2 (updated)" thread. The first 7 patches
> are patches from that series. The last 4 are new patches.
>
> The first new patch stops CRC enabled filesystems from spamming the
> log. It currently emits an "Experimental" warning ever time the
> superblock is written, which is typically every 30s.
>
> The second path ("rework remote attr CRCs") is the changes I
> mentioned in the "fixes for 3.10-rc2 (updated)" thread. The code is
> far more robust as a result of these changes, and I think we really
> need to change the format as done in this patch. Once we have
> decided on the way forward, I'll port this to userspace.
> 
> The third patch fixes a remote symlink problem - I didn't hit this
> until I'd redone the remote attr CRCs and the 1k block size
> filesystem testing made it passed the attribute tests it was failing
> on.
> 
> Finally, the last patch is another on-disk format change - one that
> removes the 25 entry limit on ACLs. It doesn't invalidate anything
> that is already on disk, just allows ACLs on v5 superblock
> filesystems to store more than 25 ACLs in an xattr. In fact, it
> allows (65536 - 4) / 12 = 5461 entries to be stored in a single
> ACL, so I don't see anyone running out on v5 superblocks....
> 
> Thoughts, comments?

I'll look into these but I am concerned that we're starting to get into 3.11
territory.  We'll see how the reviews go.  FWIW I have no objection to changing
the on-disk format at any time until the 'experimental' tag is removed.

Regards,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
  2013-05-21  8:02 ` [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
@ 2013-05-21 18:35   ` Ben Myers
  0 siblings, 0 replies; 49+ messages in thread
From: Ben Myers @ 2013-05-21 18:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:00PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Lockdep reports:
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.9.0+ #3 Not tainted
> ---------------------------------------------
> setquota/28368 is trying to acquire lock:
>  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> 
> but task is already holding lock:
>  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> 
> from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
> allocated.
> 
> xfs_qm_scall_setqlim() is starting a transaction and then not
> passing it into xfs_qm_dqet() and so it starts it's own transaction
> when allocating the dquot.  Splat!
> 
> Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
> inside the setqlim transaction. This requires getting the dquot
> first (and allocating it if necessary) then dropping and relocking
> the dquot before joining it to the setqlim transaction.
> 
> Reported-by: Michael L. Semon <mlsemon35@gmail.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/11] xfs: fixes for 3.10-rc3
  2013-05-21 16:26 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
@ 2013-05-21 20:24   ` Dave Chinner
  2013-05-21 20:52     ` Ben Myers
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-21 20:24 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, May 21, 2013 at 11:26:47AM -0500, Ben Myers wrote:
> Hi Dave,
> 
> On Tue, May 21, 2013 at 06:01:59PM +1000, Dave Chinner wrote:
> > This is my current kernel bug fix patch series. I've updated it
> > against a current xfsdev tree, and contains all the fixes mentioned
> > in the "fixes for 3.10-rc2 (updated)" thread. The first 7 patches
> > are patches from that series. The last 4 are new patches.
> >
> > The first new patch stops CRC enabled filesystems from spamming the
> > log. It currently emits an "Experimental" warning ever time the
> > superblock is written, which is typically every 30s.
> >
> > The second path ("rework remote attr CRCs") is the changes I
> > mentioned in the "fixes for 3.10-rc2 (updated)" thread. The code is
> > far more robust as a result of these changes, and I think we really
> > need to change the format as done in this patch. Once we have
> > decided on the way forward, I'll port this to userspace.
> > 
> > The third patch fixes a remote symlink problem - I didn't hit this
> > until I'd redone the remote attr CRCs and the 1k block size
> > filesystem testing made it passed the attribute tests it was failing
> > on.
> > 
> > Finally, the last patch is another on-disk format change - one that
> > removes the 25 entry limit on ACLs. It doesn't invalidate anything
> > that is already on disk, just allows ACLs on v5 superblock
> > filesystems to store more than 25 ACLs in an xattr. In fact, it
> > allows (65536 - 4) / 12 = 5461 entries to be stored in a single
> > ACL, so I don't see anyone running out on v5 superblocks....
> > 
> > Thoughts, comments?
> 
> I'll look into these but I am concerned that we're starting to get into 3.11
> territory.

The moment we release the first kernel with the format in it, we
need to use feature bits for on-disk format changes, experimental
tag or not. Hence IMO this needs to be fixed before an initial
release.

It's not a huge change from a code perspective, and it's a lot more
reliable in my testing....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks
  2013-05-21 14:00   ` Brian Foster
@ 2013-05-21 20:27     ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-21 20:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: bpm, xfs

On Tue, May 21, 2013 at 10:00:09AM -0400, Brian Foster wrote:
> On 05/21/2013 04:02 AM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The limit of 25 ACL entries is arbitrary, but baked into the on-disk
> > format.  For version 5 superblocks, increase it to the maximum nuber
> > of ACLs that can fit into a single xattr.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_acl.c |   31 ++++++++++++++++++-------------
> >  fs/xfs/xfs_acl.h |   30 +++++++++++++++++++++++-------
> >  2 files changed, 41 insertions(+), 20 deletions(-)
> > 
> ...
> > diff --git a/fs/xfs/xfs_acl.h b/fs/xfs/xfs_acl.h
> > index 39632d9..5310a52 100644
> > --- a/fs/xfs/xfs_acl.h
> > +++ b/fs/xfs/xfs_acl.h
> > @@ -22,19 +22,35 @@ struct inode;
> >  struct posix_acl;
> >  struct xfs_inode;
> >  
> > -#define XFS_ACL_MAX_ENTRIES 25
> >  #define XFS_ACL_NOT_PRESENT (-1)
> >  
> >  /* On-disk XFS access control list structure */
> > +struct xfs_acl_entry {
> > +	__be32	ae_tag;
> > +	__be32	ae_id;
> > +	__be16	ae_perm;
> > +	__be16	ae_pad;		/* fill the implicit hole in the structure */
> > +};
> > +
> >  struct xfs_acl {
> > -	__be32		acl_cnt;
> > -	struct xfs_acl_entry {
> > -		__be32	ae_tag;
> > -		__be32	ae_id;
> > -		__be16	ae_perm;
> > -	} acl_entry[XFS_ACL_MAX_ENTRIES];
> > +	__be32			acl_cnt;
> > +	struct xfs_acl_entry	acl_entry[0];
> >  };
> >  
> > +/*
> > + * The number of ACL entries allowed is defined by the on-disk format.
> > + * For v4 superblocks, that is limited to 25 entries. For v5 superblocks, it is
> > + * limited only by the maximum size of the xattr that stores the information.
> > + */
> > +#define XFS_ACL_MAX_ENTRIES(mp)	\
> > +	(xfs_sb_version_hascrc(&mp->m_sb) \
> > +	   ?  (XATTR_SIZE_MAX - sizeof(__be32) / sizeof(struct xfs_acl_entry)) \
> 					
> It looks like this should be:
> 
> 	(XATTR_SIZE_MAX - sizeof(__be32)) / sizeof(struct xfs_acl_entry)
> 
> (i.e., misplaced parens)

Yup, good catch.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/11] xfs: fixes for 3.10-rc3
  2013-05-21 20:24   ` Dave Chinner
@ 2013-05-21 20:52     ` Ben Myers
  2013-05-21 21:27       ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-21 20:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hey,

On Wed, May 22, 2013 at 06:24:45AM +1000, Dave Chinner wrote:
> On Tue, May 21, 2013 at 11:26:47AM -0500, Ben Myers wrote:
> > Hi Dave,
> > 
> > On Tue, May 21, 2013 at 06:01:59PM +1000, Dave Chinner wrote:
> > > This is my current kernel bug fix patch series. I've updated it
> > > against a current xfsdev tree, and contains all the fixes mentioned
> > > in the "fixes for 3.10-rc2 (updated)" thread. The first 7 patches
> > > are patches from that series. The last 4 are new patches.
> > >
> > > The first new patch stops CRC enabled filesystems from spamming the
> > > log. It currently emits an "Experimental" warning ever time the
> > > superblock is written, which is typically every 30s.
> > >
> > > The second path ("rework remote attr CRCs") is the changes I
> > > mentioned in the "fixes for 3.10-rc2 (updated)" thread. The code is
> > > far more robust as a result of these changes, and I think we really
> > > need to change the format as done in this patch. Once we have
> > > decided on the way forward, I'll port this to userspace.
> > > 
> > > The third patch fixes a remote symlink problem - I didn't hit this
> > > until I'd redone the remote attr CRCs and the 1k block size
> > > filesystem testing made it passed the attribute tests it was failing
> > > on.
> > > 
> > > Finally, the last patch is another on-disk format change - one that
> > > removes the 25 entry limit on ACLs. It doesn't invalidate anything
> > > that is already on disk, just allows ACLs on v5 superblock
> > > filesystems to store more than 25 ACLs in an xattr. In fact, it
> > > allows (65536 - 4) / 12 = 5461 entries to be stored in a single
> > > ACL, so I don't see anyone running out on v5 superblocks....
> > > 
> > > Thoughts, comments?
> > 
> > I'll look into these but I am concerned that we're starting to get into 3.11
> > territory.
> 
> The moment we release the first kernel with the format in it, we
> need to use feature bits for on-disk format changes, experimental
> tag or not. Hence IMO this needs to be fixed before an initial
> release.

There's plenty of bits to go around.  ;)

> It's not a huge change from a code perspective, and it's a lot more
> reliable in my testing....

I'm just a bit leery of a redesign at this late juncture.  We always seem to
get into trouble at the last minute...  Another option is to revert just the
xattr crc support and slap it back in once it has stablized after the release,
but we need to flip a bit for that too...  Anyway, I need to look your proposed
code to see what's what.

Seems like xattrs might be one area where we could use some work in xfstests?

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/11] xfs: remote attribute read too short
  2013-05-21  8:02 ` [PATCH 03/11] xfs: remote attribute read too short Dave Chinner
@ 2013-05-21 20:59   ` Ben Myers
  2013-05-21 22:53     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-21 20:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:02PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Reading a maximally size remote attribute fails when CRCs are
> enabled with this verification error:
> 
> XFS (vdb): remote attribute header does not match required off/len/owner)
> 
> There are two reasons for this, the first being that the
> length of the buffer being read is determined from the
> args->rmtblkcnt which doesn't take into account CRC headers. Hence
> the mapped length ends up being too short and so we need to
> calculate it directly from the value length.
> 
> The second is that the byte count of valid data within a buffer is
> capped by the length of the data and so doesn't take into account
> that the buffer might be longer due to headers. Hence we need to
> calculate the data space in the buffer first before calculating the
> actual byte count of data.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_attr_remote.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> index aad95b0..bcdc07c 100644
> --- a/fs/xfs/xfs_attr_remote.c
> +++ b/fs/xfs/xfs_attr_remote.c
> @@ -52,9 +52,11 @@ xfs_attr3_rmt_blocks(
>  	struct xfs_mount *mp,
>  	int		attrlen)
>  {
> -	int		buflen = XFS_ATTR3_RMT_BUF_SPACE(mp,
> -							 mp->m_sb.sb_blocksize);
> -	return (attrlen + buflen - 1) / buflen;
> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
> +		return (attrlen + buflen - 1) / buflen;
> +	}
> +	return XFS_B_TO_FSB(mp, attrlen);

xfs_attr_rmtval_set, xfs_attr3_rmt_blocks, and XFS_ATTR3_RMT_BUF_SPACE all have
checks for crcs on the superblock.  It's like I'm seeing stars.

>  }
>  
>  static bool
> @@ -206,8 +208,9 @@ xfs_attr_rmtval_get(
>  
>  	while (valuelen > 0) {
>  		nmap = ATTR_RMTVALUE_MAPSIZE;
> +		blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
>  		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
> -				       args->rmtblkcnt, map, &nmap,
> +				       blkcnt, map, &nmap,

Isn't blkcnt wrong on the 2nd or greater iteration of this loop?  Looks like an old bug.

>  				       XFS_BMAPI_ATTRFORK);
>  		if (error)
>  			return error;
> @@ -227,8 +230,8 @@ xfs_attr_rmtval_get(
>  			if (error)
>  				return error;
>  
> -			byte_cnt = min_t(int, valuelen, BBTOB(bp->b_length));
> -			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, byte_cnt);
> +			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
> +			byte_cnt = min_t(int, valuelen, byte_cnt);

And now byte_cnt reflects the number of bytes in the mapping, less the size of
a header..

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/11] xfs: fixes for 3.10-rc3
  2013-05-21 20:52     ` Ben Myers
@ 2013-05-21 21:27       ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-21 21:27 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, May 21, 2013 at 03:52:49PM -0500, Ben Myers wrote:
> Hey,
> 
> On Wed, May 22, 2013 at 06:24:45AM +1000, Dave Chinner wrote:
> > On Tue, May 21, 2013 at 11:26:47AM -0500, Ben Myers wrote:
> > > Hi Dave,
> > > 
> > > On Tue, May 21, 2013 at 06:01:59PM +1000, Dave Chinner wrote:
> > > > This is my current kernel bug fix patch series. I've updated it
> > > > against a current xfsdev tree, and contains all the fixes mentioned
> > > > in the "fixes for 3.10-rc2 (updated)" thread. The first 7 patches
> > > > are patches from that series. The last 4 are new patches.
> > > >
> > > > The first new patch stops CRC enabled filesystems from spamming the
> > > > log. It currently emits an "Experimental" warning ever time the
> > > > superblock is written, which is typically every 30s.
> > > >
> > > > The second path ("rework remote attr CRCs") is the changes I
> > > > mentioned in the "fixes for 3.10-rc2 (updated)" thread. The code is
> > > > far more robust as a result of these changes, and I think we really
> > > > need to change the format as done in this patch. Once we have
> > > > decided on the way forward, I'll port this to userspace.
> > > > 
> > > > The third patch fixes a remote symlink problem - I didn't hit this
> > > > until I'd redone the remote attr CRCs and the 1k block size
> > > > filesystem testing made it passed the attribute tests it was failing
> > > > on.
> > > > 
> > > > Finally, the last patch is another on-disk format change - one that
> > > > removes the 25 entry limit on ACLs. It doesn't invalidate anything
> > > > that is already on disk, just allows ACLs on v5 superblock
> > > > filesystems to store more than 25 ACLs in an xattr. In fact, it
> > > > allows (65536 - 4) / 12 = 5461 entries to be stored in a single
> > > > ACL, so I don't see anyone running out on v5 superblocks....
> > > > 
> > > > Thoughts, comments?
> > > 
> > > I'll look into these but I am concerned that we're starting to get into 3.11
> > > territory.
> > 
> > The moment we release the first kernel with the format in it, we
> > need to use feature bits for on-disk format changes, experimental
> > tag or not. Hence IMO this needs to be fixed before an initial
> > release.
> 
> There's plenty of bits to go around.  ;)

Sure, but it's unnecessary complexity.

> > It's not a huge change from a code perspective, and it's a lot more
> > reliable in my testing....
> 
> I'm just a bit leery of a redesign at this late juncture.  We always seem to
> get into trouble at the last minute...  Another option is to revert just the
> xattr crc support and slap it back in once it has stablized after the release,
> but we need to flip a bit for that too...  Anyway, I need to look your proposed
> code to see what's what.
> 
> Seems like xattrs might be one area where we could use some work in xfstests?

Not really - it's getting the xfstests xattr tests to pass that have
lead to this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/11] xfs: remote attribute tail zeroing does too much
  2013-05-21  8:02 ` [PATCH 04/11] xfs: remote attribute tail zeroing does too much Dave Chinner
@ 2013-05-21 22:31   ` Ben Myers
  2013-05-21 22:55     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-21 22:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:03PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When an attribute data does not fill then entire remote block, we
> zero the remaining part of the buffer. This, however, needs to take
> into account that the buffer has a header, and so the offset where
> zeroing starts and the length of zeroing need to take this into
> account. Otherwise we end up with zeros over the end of the
> attribute value when CRCs are enabled.
> 
> While there, make sure we only ask to map an extent that covers the
> remaining range of the attribute, rather than asking every time for
> the full length of remote data. If the remote attribute blocks are
> contiguous with other parts of the attribute tree, it will map those
> blocks as well and we can potentially zero them incorrectly. We can
> also get buffer size mistmatches when trying to read or remove the
> remote attribute, and this can lead to not finding the correct
> buffer when looking it up in cache.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

You fixed the bug I mentioned in the last review.  Beaten.

This looks ok.  From the looks of things we might consider cleanup of who is
setting rmtblkcnt in the future.  It is getting to be confusing.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/11] xfs: remote attribute read too short
  2013-05-21 20:59   ` Ben Myers
@ 2013-05-21 22:53     ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-21 22:53 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, May 21, 2013 at 03:59:24PM -0500, Ben Myers wrote:
> On Tue, May 21, 2013 at 06:02:02PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Reading a maximally size remote attribute fails when CRCs are
> > enabled with this verification error:
> > 
> > XFS (vdb): remote attribute header does not match required off/len/owner)
> > 
> > There are two reasons for this, the first being that the
> > length of the buffer being read is determined from the
> > args->rmtblkcnt which doesn't take into account CRC headers. Hence
> > the mapped length ends up being too short and so we need to
> > calculate it directly from the value length.
> > 
> > The second is that the byte count of valid data within a buffer is
> > capped by the length of the data and so doesn't take into account
> > that the buffer might be longer due to headers. Hence we need to
> > calculate the data space in the buffer first before calculating the
> > actual byte count of data.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_attr_remote.c |   15 +++++++++------
> >  1 file changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> > index aad95b0..bcdc07c 100644
> > --- a/fs/xfs/xfs_attr_remote.c
> > +++ b/fs/xfs/xfs_attr_remote.c
> > @@ -52,9 +52,11 @@ xfs_attr3_rmt_blocks(
> >  	struct xfs_mount *mp,
> >  	int		attrlen)
> >  {
> > -	int		buflen = XFS_ATTR3_RMT_BUF_SPACE(mp,
> > -							 mp->m_sb.sb_blocksize);
> > -	return (attrlen + buflen - 1) / buflen;
> > +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> > +		int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
> > +		return (attrlen + buflen - 1) / buflen;
> > +	}
> > +	return XFS_B_TO_FSB(mp, attrlen);
> 
> xfs_attr_rmtval_set, xfs_attr3_rmt_blocks, and XFS_ATTR3_RMT_BUF_SPACE all have
> checks for crcs on the superblock.  It's like I'm seeing stars.

Sure - I'm not concerned about how efficient the code is at this
point. I just want it to work correctly. If that means we check for
CRC support more frequently than we need to, then that's no big deal
right now.

> 
> >  }
> >  
> >  static bool
> > @@ -206,8 +208,9 @@ xfs_attr_rmtval_get(
> >  
> >  	while (valuelen > 0) {
> >  		nmap = ATTR_RMTVALUE_MAPSIZE;
> > +		blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
> >  		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
> > -				       args->rmtblkcnt, map, &nmap,
> > +				       blkcnt, map, &nmap,
> 
> Isn't blkcnt wrong on the 2nd or greater iteration of this loop?  Looks like an old bug.

Yes, args->rmtblkcnt is wrong on the 2nd+ loop iteration. And yes,
it's an old, old bug.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/11] xfs: remote attribute tail zeroing does too much
  2013-05-21 22:31   ` Ben Myers
@ 2013-05-21 22:55     ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-21 22:55 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, May 21, 2013 at 05:31:09PM -0500, Ben Myers wrote:
> On Tue, May 21, 2013 at 06:02:03PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When an attribute data does not fill then entire remote block, we
> > zero the remaining part of the buffer. This, however, needs to take
> > into account that the buffer has a header, and so the offset where
> > zeroing starts and the length of zeroing need to take this into
> > account. Otherwise we end up with zeros over the end of the
> > attribute value when CRCs are enabled.
> > 
> > While there, make sure we only ask to map an extent that covers the
> > remaining range of the attribute, rather than asking every time for
> > the full length of remote data. If the remote attribute blocks are
> > contiguous with other parts of the attribute tree, it will map those
> > blocks as well and we can potentially zero them incorrectly. We can
> > also get buffer size mistmatches when trying to read or remove the
> > remote attribute, and this can lead to not finding the correct
> > buffer when looking it up in cache.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> You fixed the bug I mentioned in the last review.  Beaten.
> 
> This looks ok.  From the looks of things we might consider cleanup of who is
> setting rmtblkcnt in the future.  It is getting to be confusing.

Yes, it is. This exact problem is one of the things my rework of the
attr CRC code later in the series fixes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/11] xfs: correctly map remote attr buffers during removal
  2013-05-21  8:02 ` [PATCH 05/11] xfs: correctly map remote attr buffers during removal Dave Chinner
@ 2013-05-22 17:01   ` Ben Myers
  0 siblings, 0 replies; 49+ messages in thread
From: Ben Myers @ 2013-05-22 17:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:04PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If we don't map the buffers correctly (same as for get/set
> operations) then the incore buffer lookup will fail. If a block
> number matches but a length is wrong, then debug kernels will ASSERT
> fail in _xfs_buf_find() due to the length mismatch. Ensure that we
> map the buffers correctly by basing the length of the buffer on the
> attribute data length rather than the remote block count.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Yeah.  Looks fine.
Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
>  fs/xfs/xfs_attr_remote.c |   27 ++++++++++++++++++---------
>  1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> index e207bf0..d8bcb2d 100644
> --- a/fs/xfs/xfs_attr_remote.c
> +++ b/fs/xfs/xfs_attr_remote.c
> @@ -468,19 +468,25 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
>  	mp = args->dp->i_mount;
>  
>  	/*
> -	 * Roll through the "value", invalidating the attribute value's
> -	 * blocks.
> +	 * Roll through the "value", invalidating the attribute value's blocks.
> +	 * Note that args->rmtblkcnt is the minimum number of data blocks we'll
> +	 * see for a CRC enabled remote attribute. Each extent will have a
> +	 * header, and so we may have more blocks than we realise here.  If we
> +	 * fail to map the blocks correctly, we'll have problems with the buffer
> +	 * lookups.
>  	 */
>  	lblkno = args->rmtblkno;
> -	valuelen = args->rmtblkcnt;
> +	valuelen = args->valuelen;
> +	blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
>  	while (valuelen > 0) {
> +		int dblkcnt;
> +
>  		/*
>  		 * Try to remember where we decided to put the value.
>  		 */
>  		nmap = 1;
>  		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
> -				       args->rmtblkcnt, &map, &nmap,
> -				       XFS_BMAPI_ATTRFORK);
> +				       blkcnt, &map, &nmap, XFS_BMAPI_ATTRFORK);
>  		if (error)
>  			return(error);
>  		ASSERT(nmap == 1);
> @@ -488,28 +494,31 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
>  		       (map.br_startblock != HOLESTARTBLOCK));
>  
>  		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
> -		blkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
> +		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
>  
>  		/*
>  		 * If the "remote" value is in the cache, remove it.
>  		 */
> -		bp = xfs_incore(mp->m_ddev_targp, dblkno, blkcnt, XBF_TRYLOCK);
> +		bp = xfs_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
>  		if (bp) {
>  			xfs_buf_stale(bp);
>  			xfs_buf_relse(bp);
>  			bp = NULL;
>  		}
>  
> -		valuelen -= map.br_blockcount;
> +		valuelen -= XFS_ATTR3_RMT_BUF_SPACE(mp,
> +					XFS_FSB_TO_B(mp, map.br_blockcount));
>  
>  		lblkno += map.br_blockcount;
> +		blkcnt -= map.br_blockcount;
> +		blkcnt = max(blkcnt, xfs_attr3_rmt_blocks(mp, valuelen));

max for fragmentation... can be removed once its header per block, i think.



>  	}
>  
>  	/*
>  	 * Keep de-allocating extents until the remote-value region is gone.
>  	 */
> +	blkcnt = lblkno - args->rmtblkno;
>  	lblkno = args->rmtblkno;
> -	blkcnt = args->rmtblkcnt;
>  	done = 0;
>  	while (!done) {
>  		xfs_bmap_init(args->flist, args->firstblock);
> -- 
> 1.7.10.4
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
  2013-05-21  8:02 ` [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
@ 2013-05-22 20:50   ` Ben Myers
  2013-05-22 23:54     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-22 20:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:05PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
> the entries in two leaves when the destination leaf requires
> compaction. The temporary buffer ends up being copied back over the
> original destination buffer, so the header in the temporary buffer
> needs to contain all the information that is in the destination
> buffer.
> 
> To make sure the temporary buffer is fully initialised, once we've
> set up the temporary incore header appropriately, write is back to
> the temporary buffer before starting to move entries around.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> ---
>  fs/xfs/xfs_attr_leaf.c |   16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> index 0bce1b3..79ece72 100644
> --- a/fs/xfs/xfs_attr_leaf.c
> +++ b/fs/xfs/xfs_attr_leaf.c
> @@ -2181,14 +2181,24 @@ xfs_attr3_leaf_unbalance(
>  		struct xfs_attr_leafblock *tmp_leaf;
>  		struct xfs_attr3_icleaf_hdr tmphdr;
>  
> -		tmp_leaf = kmem_alloc(state->blocksize, KM_SLEEP);
> -		memset(tmp_leaf, 0, state->blocksize);
> -		memset(&tmphdr, 0, sizeof(tmphdr));
> +		tmp_leaf = kmem_zalloc(state->blocksize, KM_SLEEP);
> +
> +		/*
> +		 * Copy the header into the temp leaf so that all the stuff
> +		 * not in the incore header is present and gets copied back in
> +		 * once we've moved all the entries.
> +		 */
> +		memcpy(tmp_leaf, save_leaf, xfs_attr3_leaf_hdr_size(save_leaf));
>  
> +		memset(&tmphdr, 0, sizeof(tmphdr));
>  		tmphdr.magic = savehdr.magic;
>  		tmphdr.forw = savehdr.forw;
>  		tmphdr.back = savehdr.back;
>  		tmphdr.firstused = state->blocksize;
> +
> +		/* write the header to the temp buffer to initialise it */
> +		xfs_attr3_leaf_hdr_to_disk(tmp_leaf, &tmphdr);
> +

Hrm.  It seemed like the memset 0 for the incore temp header should have been
enough for the purposes of moveents.  Why did you have to add this additional
hdr_to_disk?

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
  2013-05-21  8:02 ` [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
@ 2013-05-22 21:59   ` Ben Myers
  2013-05-22 23:58     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-22 21:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:06PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
> the entries in a leaf. It copies the the original buffer into the
> temporary buffer, then zeros the original buffer completely. It then
> copies the entries back into the original buffer.  However, the
> original buffer has not been correctly initialised, and so the
> movement of the entries goes horribly wrong.
> 
> Make sure the zeroed destination buffer is fully initialised, and
> once we've set up the destination incore header appropriately, write
> is back to the buffer before starting to move entries around.
  it

> 
> While debugging this, the _d/_s prefixes weren't sufficient to
> remind me what buffer was what, so rename then all _src/_dst.

Yeah, it helps.

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_attr_leaf.c |   42 ++++++++++++++++++++++++++----------------
>  1 file changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> index 79ece72..5b03d15 100644
> --- a/fs/xfs/xfs_attr_leaf.c
> +++ b/fs/xfs/xfs_attr_leaf.c
> @@ -1445,11 +1445,12 @@ xfs_attr3_leaf_add_work(
>  STATIC void
>  xfs_attr3_leaf_compact(
>  	struct xfs_da_args	*args,
> -	struct xfs_attr3_icleaf_hdr *ichdr_d,
> +	struct xfs_attr3_icleaf_hdr *ichdr_dst,
>  	struct xfs_buf		*bp)
>  {
> -	xfs_attr_leafblock_t	*leaf_s, *leaf_d;
> -	struct xfs_attr3_icleaf_hdr ichdr_s;
> +	struct xfs_attr_leafblock *leaf_src;
> +	struct xfs_attr_leafblock *leaf_dst;
> +	struct xfs_attr3_icleaf_hdr ichdr_src;
>  	struct xfs_trans	*trans = args->trans;
>  	struct xfs_mount	*mp = trans->t_mountp;
>  	char			*tmpbuffer;
> @@ -1457,29 +1458,38 @@ xfs_attr3_leaf_compact(
>  	trace_xfs_attr_leaf_compact(args);
>  
>  	tmpbuffer = kmem_alloc(XFS_LBSIZE(mp), KM_SLEEP);
> -	ASSERT(tmpbuffer != NULL);
>  	memcpy(tmpbuffer, bp->b_addr, XFS_LBSIZE(mp));
>  	memset(bp->b_addr, 0, XFS_LBSIZE(mp));
> +	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
> +	leaf_dst = bp->b_addr;
>  
>  	/*
> -	 * Copy basic information
> +	 * Copy the on-disk header back into the destination buffer to ensure
> +	 * all the information in the header that is not part of the incore
> +	 * header structure is preserved.
>  	 */
> -	leaf_s = (xfs_attr_leafblock_t *)tmpbuffer;
> -	leaf_d = bp->b_addr;
> -	ichdr_s = *ichdr_d;	/* struct copy */
> -	ichdr_d->firstused = XFS_LBSIZE(mp);
> -	ichdr_d->usedbytes = 0;
> -	ichdr_d->count = 0;
> -	ichdr_d->holes = 0;
> -	ichdr_d->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_s);
> -	ichdr_d->freemap[0].size = ichdr_d->firstused - ichdr_d->freemap[0].base;
> +	memcpy(bp->b_addr, tmpbuffer, xfs_attr3_leaf_hdr_size(leaf_src));
> +
> +	/* Initialise the incore headers */
> +	ichdr_src = *ichdr_dst;	/* struct copy */
> +	ichdr_dst->firstused = XFS_LBSIZE(mp);
> +	ichdr_dst->usedbytes = 0;
> +	ichdr_dst->count = 0;
> +	ichdr_dst->holes = 0;
> +	ichdr_dst->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_src);
> +	ichdr_dst->freemap[0].size = ichdr_dst->firstused -
> +						ichdr_dst->freemap[0].base;
> +
> +
> +	/* write the header back to initialise the underlying buffer */
> +	xfs_attr3_leaf_hdr_to_disk(leaf_dst, ichdr_dst);

I can see that the memcpy is necessary but again I'm having a hard time
	understanding what field was messed up for moveents.

The patch looks fine in general...  I must be blind.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/11] xfs: don't emit v5 superblock warnings on write
  2013-05-21  8:02 ` [PATCH 08/11] xfs: don't emit v5 superblock warnings on write Dave Chinner
@ 2013-05-22 22:26   ` Ben Myers
  2013-05-23  0:03     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-22 22:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:07PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We write the superblock every 30s or so which results in the
> verifier being called. Right now that results in this output
> every 30s:
> 
> XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> Use of these features in this kernel is at your own risk!
> 
> And spamming the logs. Stop this output from occurring on superblock
> writes.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_mount.c |   18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index f6bfbd7..e8e310c 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -314,7 +314,8 @@ STATIC int
>  xfs_mount_validate_sb(
>  	xfs_mount_t	*mp,
>  	xfs_sb_t	*sbp,
> -	bool		check_inprogress)
> +	bool		check_inprogress,
> +	bool		check_version)
>  {
>  
>  	/*
> @@ -337,9 +338,10 @@ xfs_mount_validate_sb(
>  
>  	/*
>  	 * Version 5 superblock feature mask validation. Reject combinations the
> -	 * kernel cannot support up front before checking anything else.
> +	 * kernel cannot support up front before checking anything else. For
> +	 * write validation, we don't need to check feature masks.
>  	 */
> -	if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> +	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {

if (!quiet_version) {
>  		xfs_alert(mp,
>  "Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
>  "Use of these features in this kernel is at your own risk!");

}

Since the stated goal of the patch is to be quieter and not to disable useful
tests in the verifier, I suggest you disable the print rather than disable the
test.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
  2013-05-22 20:50   ` Ben Myers
@ 2013-05-22 23:54     ` Dave Chinner
  2013-05-23 22:51       ` Ben Myers
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-22 23:54 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Wed, May 22, 2013 at 03:50:59PM -0500, Ben Myers wrote:
> On Tue, May 21, 2013 at 06:02:05PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
> > the entries in two leaves when the destination leaf requires
> > compaction. The temporary buffer ends up being copied back over the
> > original destination buffer, so the header in the temporary buffer
> > needs to contain all the information that is in the destination
> > buffer.
> > 
> > To make sure the temporary buffer is fully initialised, once we've
> > set up the temporary incore header appropriately, write is back to
> > the temporary buffer before starting to move entries around.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >
> > ---
> >  fs/xfs/xfs_attr_leaf.c |   16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> > index 0bce1b3..79ece72 100644
> > --- a/fs/xfs/xfs_attr_leaf.c
> > +++ b/fs/xfs/xfs_attr_leaf.c
> > @@ -2181,14 +2181,24 @@ xfs_attr3_leaf_unbalance(
> >  		struct xfs_attr_leafblock *tmp_leaf;
> >  		struct xfs_attr3_icleaf_hdr tmphdr;
> >  
> > -		tmp_leaf = kmem_alloc(state->blocksize, KM_SLEEP);
> > -		memset(tmp_leaf, 0, state->blocksize);
> > -		memset(&tmphdr, 0, sizeof(tmphdr));
> > +		tmp_leaf = kmem_zalloc(state->blocksize, KM_SLEEP);
> > +
> > +		/*
> > +		 * Copy the header into the temp leaf so that all the stuff
> > +		 * not in the incore header is present and gets copied back in
> > +		 * once we've moved all the entries.
> > +		 */
> > +		memcpy(tmp_leaf, save_leaf, xfs_attr3_leaf_hdr_size(save_leaf));
> >  
> > +		memset(&tmphdr, 0, sizeof(tmphdr));
> >  		tmphdr.magic = savehdr.magic;
> >  		tmphdr.forw = savehdr.forw;
> >  		tmphdr.back = savehdr.back;
> >  		tmphdr.firstused = state->blocksize;
> > +
> > +		/* write the header to the temp buffer to initialise it */
> > +		xfs_attr3_leaf_hdr_to_disk(tmp_leaf, &tmphdr);
> > +
> 
> Hrm.  It seemed like the memset 0 for the incore temp header should have been
> enough for the purposes of moveents.  Why did you have to add this additional
> hdr_to_disk?

Because we've modified the tmphdr so it is different to what is in
the tmp_leaf, and I'm being paranoid and not making any assumptions
that a callee doesn't reread the tmphdr from the tmp_leaf.

i.e. I'm initialising both the on-disk and the incore headers to
be identical before we do anything else. Once bitten, twice shy.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
  2013-05-22 21:59   ` Ben Myers
@ 2013-05-22 23:58     ` Dave Chinner
  2013-05-23 22:51       ` Ben Myers
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-22 23:58 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Wed, May 22, 2013 at 04:59:34PM -0500, Ben Myers wrote:
> On Tue, May 21, 2013 at 06:02:06PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
> > the entries in a leaf. It copies the the original buffer into the
> > temporary buffer, then zeros the original buffer completely. It then
> > copies the entries back into the original buffer.  However, the
> > original buffer has not been correctly initialised, and so the
> > movement of the entries goes horribly wrong.
> > 
> > Make sure the zeroed destination buffer is fully initialised, and
> > once we've set up the destination incore header appropriately, write
> > is back to the buffer before starting to move entries around.
>   it
> 
> > 
> > While debugging this, the _d/_s prefixes weren't sufficient to
> > remind me what buffer was what, so rename then all _src/_dst.
> 
> Yeah, it helps.
> 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_attr_leaf.c |   42 ++++++++++++++++++++++++++----------------
> >  1 file changed, 26 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> > index 79ece72..5b03d15 100644
> > --- a/fs/xfs/xfs_attr_leaf.c
> > +++ b/fs/xfs/xfs_attr_leaf.c
> > @@ -1445,11 +1445,12 @@ xfs_attr3_leaf_add_work(
> >  STATIC void
> >  xfs_attr3_leaf_compact(
> >  	struct xfs_da_args	*args,
> > -	struct xfs_attr3_icleaf_hdr *ichdr_d,
> > +	struct xfs_attr3_icleaf_hdr *ichdr_dst,
> >  	struct xfs_buf		*bp)
> >  {
> > -	xfs_attr_leafblock_t	*leaf_s, *leaf_d;
> > -	struct xfs_attr3_icleaf_hdr ichdr_s;
> > +	struct xfs_attr_leafblock *leaf_src;
> > +	struct xfs_attr_leafblock *leaf_dst;
> > +	struct xfs_attr3_icleaf_hdr ichdr_src;
> >  	struct xfs_trans	*trans = args->trans;
> >  	struct xfs_mount	*mp = trans->t_mountp;
> >  	char			*tmpbuffer;
> > @@ -1457,29 +1458,38 @@ xfs_attr3_leaf_compact(
> >  	trace_xfs_attr_leaf_compact(args);
> >  
> >  	tmpbuffer = kmem_alloc(XFS_LBSIZE(mp), KM_SLEEP);
> > -	ASSERT(tmpbuffer != NULL);
> >  	memcpy(tmpbuffer, bp->b_addr, XFS_LBSIZE(mp));
> >  	memset(bp->b_addr, 0, XFS_LBSIZE(mp));
> > +	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
> > +	leaf_dst = bp->b_addr;
> >  
> >  	/*
> > -	 * Copy basic information
> > +	 * Copy the on-disk header back into the destination buffer to ensure
> > +	 * all the information in the header that is not part of the incore
> > +	 * header structure is preserved.
> >  	 */
> > -	leaf_s = (xfs_attr_leafblock_t *)tmpbuffer;
> > -	leaf_d = bp->b_addr;
> > -	ichdr_s = *ichdr_d;	/* struct copy */
> > -	ichdr_d->firstused = XFS_LBSIZE(mp);
> > -	ichdr_d->usedbytes = 0;
> > -	ichdr_d->count = 0;
> > -	ichdr_d->holes = 0;
> > -	ichdr_d->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_s);
> > -	ichdr_d->freemap[0].size = ichdr_d->firstused - ichdr_d->freemap[0].base;
> > +	memcpy(bp->b_addr, tmpbuffer, xfs_attr3_leaf_hdr_size(leaf_src));
> > +
> > +	/* Initialise the incore headers */
> > +	ichdr_src = *ichdr_dst;	/* struct copy */
> > +	ichdr_dst->firstused = XFS_LBSIZE(mp);
> > +	ichdr_dst->usedbytes = 0;
> > +	ichdr_dst->count = 0;
> > +	ichdr_dst->holes = 0;
> > +	ichdr_dst->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_src);
> > +	ichdr_dst->freemap[0].size = ichdr_dst->firstused -
> > +						ichdr_dst->freemap[0].base;
> > +
> > +
> > +	/* write the header back to initialise the underlying buffer */
> > +	xfs_attr3_leaf_hdr_to_disk(leaf_dst, ichdr_dst);
> 
> I can see that the memcpy is necessary but again I'm having a hard time
> 	understanding what field was messed up for moveents.

moveents wasn't messed up - it was all the non-incore headers that
were invalid (owner, uuid, blkno, etc) due to the zeroing of the tmp
buffer and then only writing the incore header back to it after
moveents. That's what the addition memcpy of the header fixes, and
then the xfs_attr3_leaf_hdr_to_disk() call updates all the fields
modified by initialisation so that before we start the in-core and
"on-disk" structures are identical.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/11] xfs: don't emit v5 superblock warnings on write
  2013-05-22 22:26   ` Ben Myers
@ 2013-05-23  0:03     ` Dave Chinner
  2013-05-23 15:23       ` Ben Myers
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-23  0:03 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Wed, May 22, 2013 at 05:26:08PM -0500, Ben Myers wrote:
> On Tue, May 21, 2013 at 06:02:07PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We write the superblock every 30s or so which results in the
> > verifier being called. Right now that results in this output
> > every 30s:
> > 
> > XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> > Use of these features in this kernel is at your own risk!
> > 
> > And spamming the logs. Stop this output from occurring on superblock
> > writes.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_mount.c |   18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index f6bfbd7..e8e310c 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -314,7 +314,8 @@ STATIC int
> >  xfs_mount_validate_sb(
> >  	xfs_mount_t	*mp,
> >  	xfs_sb_t	*sbp,
> > -	bool		check_inprogress)
> > +	bool		check_inprogress,
> > +	bool		check_version)
> >  {
> >  
> >  	/*
> > @@ -337,9 +338,10 @@ xfs_mount_validate_sb(
> >  
> >  	/*
> >  	 * Version 5 superblock feature mask validation. Reject combinations the
> > -	 * kernel cannot support up front before checking anything else.
> > +	 * kernel cannot support up front before checking anything else. For
> > +	 * write validation, we don't need to check feature masks.
> >  	 */
> > -	if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> > +	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> 
> if (!quiet_version) {
> >  		xfs_alert(mp,
> >  "Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
> >  "Use of these features in this kernel is at your own risk!");
> 
> }
> 
> Since the stated goal of the patch is to be quieter and not to disable useful
> tests in the verifier, I suggest you disable the print rather than disable the
> test.

Checking the feature fields for whether the kernel supports the
features in them on write is not useful in any way. That's why the
variable is named "check_version" because it skips the v5
version field checking. This is stuff that is used by the mount
path (i.e. superblock read path), not the writeback path.

I'll update the commit message to reflect this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 0/2] xfs: more fixes for 3.10-rc3
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (11 preceding siblings ...)
  2013-05-21 16:26 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
@ 2013-05-23 12:30 ` Dave Chinner
  2013-05-23 12:30   ` [PATCH 1/2] xfs: rework dquot CRCs Dave Chinner
                     ` (2 more replies)
  2013-05-24 18:29 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
  13 siblings, 3 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-23 12:30 UTC (permalink / raw)
  To: xfs; +Cc: bpm

Hi Ben,

A couple more fixes for 3.10-rc3 for you to look at. The first fixes
random dquot CRC corruptions I have been seeing, and the second
fixes a log recovery failure that xfs/085 reproduces 100% of the
time with CRC enabled filesystems - it's a bug in the buffer
formatting code that's been there since 2002...

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/2] xfs: rework dquot CRCs
  2013-05-23 12:30 ` [PATCH 0/2] xfs: more " Dave Chinner
@ 2013-05-23 12:30   ` Dave Chinner
  2013-05-23 12:30   ` [PATCH 2/2] xfs: fix split buffer vector log recovery support Dave Chinner
  2013-05-24  8:58   ` [patch 0/2] xfs: yet more fixes for 3.10-rc3 Dave Chinner
  2 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-23 12:30 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.

Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.

Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs arenot calculate don the way
down to disk.

Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.

So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. that
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquuot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dquot.c       |   37 ++++++++++++++++---------------------
 fs/xfs/xfs_log_recover.c |   10 ++++++++++
 fs/xfs/xfs_qm.c          |   36 ++++++++++++++++++++++++++----------
 fs/xfs/xfs_quota.h       |    2 ++
 4 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index a41f8bf..044e97a 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -249,8 +249,11 @@ xfs_qm_init_dquot_blk(
 		d->dd_diskdq.d_version = XFS_DQUOT_VERSION;
 		d->dd_diskdq.d_id = cpu_to_be32(curid);
 		d->dd_diskdq.d_flags = type;
-		if (xfs_sb_version_hascrc(&mp->m_sb))
+		if (xfs_sb_version_hascrc(&mp->m_sb)) {
 			uuid_copy(&d->dd_uuid, &mp->m_sb.sb_uuid);
+			xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
+					 XFS_DQUOT_CRC_OFF);
+		}
 	}
 
 	xfs_trans_dquot_buf(tp, bp,
@@ -286,23 +289,6 @@ xfs_dquot_set_prealloc_limits(struct xfs_dquot *dqp)
 	dqp->q_low_space[XFS_QLOWSP_5_PCNT] = space * 5;
 }
 
-STATIC void
-xfs_dquot_buf_calc_crc(
-	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
-{
-	struct xfs_dqblk	*d = (struct xfs_dqblk *)bp->b_addr;
-	int			i;
-
-	if (!xfs_sb_version_hascrc(&mp->m_sb))
-		return;
-
-	for (i = 0; i < mp->m_quotainfo->qi_dqperchunk; i++, d++) {
-		xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
-				 offsetof(struct xfs_dqblk, dd_crc));
-	}
-}
-
 STATIC bool
 xfs_dquot_buf_verify_crc(
 	struct xfs_mount	*mp,
@@ -328,12 +314,11 @@ xfs_dquot_buf_verify_crc(
 
 	for (i = 0; i < ndquots; i++, d++) {
 		if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk),
-				 offsetof(struct xfs_dqblk, dd_crc)))
+				 XFS_DQUOT_CRC_OFF))
 			return false;
 		if (!uuid_equal(&d->dd_uuid, &mp->m_sb.sb_uuid))
 			return false;
 	}
-
 	return true;
 }
 
@@ -393,6 +378,11 @@ xfs_dquot_buf_read_verify(
 	}
 }
 
+/*
+ * we don't calculate the CRC here as that is done when the dquot is flushed to
+ * the buffer after the update is done. This ensures that the dquot in the
+ * buffer always has an up-to-date CRC value.
+ */
 void
 xfs_dquot_buf_write_verify(
 	struct xfs_buf	*bp)
@@ -404,7 +394,6 @@ xfs_dquot_buf_write_verify(
 		xfs_buf_ioerror(bp, EFSCORRUPTED);
 		return;
 	}
-	xfs_dquot_buf_calc_crc(mp, bp);
 }
 
 const struct xfs_buf_ops xfs_dquot_buf_ops = {
@@ -1151,11 +1140,17 @@ xfs_qm_dqflush(
 	 * copy the lsn into the on-disk dquot now while we have the in memory
 	 * dquot here. This can't be done later in the write verifier as we
 	 * can't get access to the log item at that point in time.
+	 *
+	 * We also calculate the CRC here so that the on-disk dquot in the
+	 * buffer always has a valid CRC. This ensures there is no possibility
+	 * of a dquot without an up-to-date CRC getting to disk.
 	 */
 	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		struct xfs_dqblk *dqb = (struct xfs_dqblk *)ddqp;
 
 		dqb->dd_lsn = cpu_to_be64(dqp->q_logitem.qli_item.li_lsn);
+		xfs_update_cksum((char *)dqb, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
 	}
 
 	/*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 93f03ec..0be37d7 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2255,6 +2255,12 @@ xfs_qm_dqcheck(
 	d->dd_diskdq.d_flags = type;
 	d->dd_diskdq.d_id = cpu_to_be32(id);
 
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		uuid_copy(&d->dd_uuid, &mp->m_sb.sb_uuid);
+		xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
+	}
+
 	return errs;
 }
 
@@ -2782,6 +2788,10 @@ xlog_recover_dquot_pass2(
 	}
 
 	memcpy(ddq, recddq, item->ri_buf[1].i_len);
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		xfs_update_cksum((char *)ddq, sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
+	}
 
 	ASSERT(dq_f->qlf_size == 2);
 	ASSERT(bp->b_target->bt_mount == mp);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index f41702b..d181542 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -41,6 +41,7 @@
 #include "xfs_qm.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
+#include "xfs_cksum.h"
 
 /*
  * The global quota manager. There is only one of these for the entire
@@ -839,7 +840,7 @@ xfs_qm_reset_dqcounts(
 	xfs_dqid_t	id,
 	uint		type)
 {
-	xfs_disk_dquot_t	*ddq;
+	struct xfs_dqblk	*dqb;
 	int			j;
 
 	trace_xfs_reset_dqcounts(bp, _RET_IP_);
@@ -853,8 +854,12 @@ xfs_qm_reset_dqcounts(
 	do_div(j, sizeof(xfs_dqblk_t));
 	ASSERT(mp->m_quotainfo->qi_dqperchunk == j);
 #endif
-	ddq = bp->b_addr;
+	dqb = bp->b_addr;
 	for (j = 0; j < mp->m_quotainfo->qi_dqperchunk; j++) {
+		struct xfs_disk_dquot	*ddq;
+
+		ddq =  (struct xfs_disk_dquot *)&dqb[j];
+
 		/*
 		 * Do a sanity check, and if needed, repair the dqblk. Don't
 		 * output any warnings because it's perfectly possible to
@@ -871,7 +876,8 @@ xfs_qm_reset_dqcounts(
 		ddq->d_bwarns = 0;
 		ddq->d_iwarns = 0;
 		ddq->d_rtbwarns = 0;
-		ddq = (xfs_disk_dquot_t *) ((xfs_dqblk_t *)ddq + 1);
+		xfs_update_cksum((char *)&dqb[j], sizeof(struct xfs_dqblk),
+					 XFS_DQUOT_CRC_OFF);
 	}
 }
 
@@ -907,19 +913,29 @@ xfs_qm_dqiter_bufs(
 			      XFS_FSB_TO_DADDR(mp, bno),
 			      mp->m_quotainfo->qi_dqchunklen, 0, &bp,
 			      &xfs_dquot_buf_ops);
-		if (error)
-			break;
 
 		/*
-		 * XXX(hch): need to figure out if it makes sense to validate
-		 *	     the CRC here.
+		 * CRC and validation errors will return a EFSCORRUPTED here. If
+		 * this occurs, re-read without CRC validation so that we can
+		 * repair the damage via xfs_qm_reset_dqcounts(). This process
+		 * will leave a trace in the log indicating corruption has
+		 * been detected.
 		 */
+		if (error == EFSCORRUPTED) {
+			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
+				      XFS_FSB_TO_DADDR(mp, bno),
+				      mp->m_quotainfo->qi_dqchunklen, 0, &bp,
+				      NULL);
+		}
+
+		if (error)
+			break;
+
 		xfs_qm_reset_dqcounts(mp, bp, firstid, type);
 		xfs_buf_delwri_queue(bp, buffer_list);
 		xfs_buf_relse(bp);
-		/*
-		 * goto the next block.
-		 */
+
+		/* goto the next block. */
 		bno++;
 		firstid += mp->m_quotainfo->qi_dqperchunk;
 	}
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index c61e31c..c38068f 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -87,6 +87,8 @@ typedef struct xfs_dqblk {
 	uuid_t		  dd_uuid;	/* location information */
 } xfs_dqblk_t;
 
+#define XFS_DQUOT_CRC_OFF	offsetof(struct xfs_dqblk, dd_crc)
+
 /*
  * flags for q_flags field in the dquot.
  */
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 2/2] xfs: fix split buffer vector log recovery support
  2013-05-23 12:30 ` [PATCH 0/2] xfs: more " Dave Chinner
  2013-05-23 12:30   ` [PATCH 1/2] xfs: rework dquot CRCs Dave Chinner
@ 2013-05-23 12:30   ` Dave Chinner
  2013-05-24  8:58   ` [patch 0/2] xfs: yet more fixes for 3.10-rc3 Dave Chinner
  2 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-23 12:30 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

A long time ago in a galaxy far away....

.. the was a commit made to fix some ilinux specific "fragmented
buffer" log recovery problem:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=b29c0bece51da72fb3ff3b61391a391ea54e1603

That problem occurred when a contiguous dirty region of a buffer was
split across across two pages of an unmapped buffer. It's been a
long time since that has been done in XFS, and the changes to log
the entire inode buffers for CRC enabled filesystems has
re-introduced that corner case.

And, of course, it turns out that the above commit didn't actually
fix anything - it just ensured that log recovery is guaranteed to
fail when this situation occurs. And now for the gory details.

xfstest xfs/085 is failing with this assert:

XFS (vdb): bad number of regions (0) in inode log format
XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 1583

Largely undocumented factoid #1: Log recovery depends on all log
buffer format items starting with this format:

struct foo_log_format {
	__uint16_t	type;
	__uint16_t	size;
	....

As recoery uses the size field and assumptions about 32 bit
alignment in decoding format items.  So don't pay much attention to
the fact log recovery thinks that it decoding an inode log format
item - it just uses them to determine what the size of the item is.

But why would it see a log format item with a zero size? Well,
luckily enough xfs_logprint uses the same code and gives the same
error, so with a bit of gdb magic, it turns out that it isn't a log
format that is being decoded. What logprint tells us is this:

Oper (130): tid: a0375e1a  len: 28  clientid: TRANS  flags: none
BUF:  #regs: 2   start blkno: 144 (0x90)  len: 16  bmap size: 2  flags: 0x4000
Oper (131): tid: a0375e1a  len: 4096  clientid: TRANS  flags: none
BUF DATA
----------------------------------------------------------------------------
Oper (132): tid: a0375e1a  len: 4096  clientid: TRANS  flags: none
xfs_logprint: unknown log operation type (4e49)
**********************************************************************
* ERROR: data block=2                                                 *
**********************************************************************

That we've got a buffer format item (oper 130) that has two regions;
the format item itself and one dirty region. The subsequent region
after the buffer format item and it's data is them what we are
tripping over, and the first bytes of it at an inode magic number.
Not a log opheader like there is supposed to be.

That means there's a problem with the buffer format item. It's dirty
data region is 4096 bytes, and it contains - you guessed it -
initialised inodes. But inode buffers are 8k, not 4k, and we log
them in their entirety. So something is wrong here. The buffer
format item contains:

(gdb) p /x *(struct xfs_buf_log_format *)in_f
$22 = {blf_type = 0x123c, blf_size = 0x2, blf_flags = 0x4000,
       blf_len = 0x10, blf_blkno = 0x90, blf_map_size = 0x2,
       blf_data_map = {0xffffffff, 0xffffffff, .... }}

Two regions, and a signle dirty contiguous region of 64 bits.  64 *
128 = 8k, so this should be followed by a single 8k region of data.
And the blf_flags tell us that the type of buffer is a
XFS_BLFT_DINO_BUF. It contains inodes. And because it doesn't have
the XFS_BLF_INODE_BUF flag set, that means it's an inode allocation
buffer. So, it should be followed by 8k of inode data.

But we know that the next region has a header of:

(gdb) p /x *ohead
$25 = {oh_tid = 0x1a5e37a0, oh_len = 0x100000, oh_clientid = 0x69,
       oh_flags = 0x0, oh_res2 = 0x0}

and so be32_to_cpu(oh_len) = 0x1000 = 4096 bytes. It's simply not
long enough to hold all the logged data. There must be another
region. There is - there's a following opheader for another 4k of
data that contains the other half of the inode cluster data - the
one we assert fail on because it's not a log format header.

So why is the second part of the data not being accounted to the
correct buffer log format structure? It took a little more work with
gdb to work out that the buffer log format structure was both
expecting it to be there but hadn't accounted for it. It was at that
point I went to the kernel code, as clearly this wasn't a bug in
xfs_logprint and the kernel was writing bad stuff to the log.

First port of call was the buffer item formatting code, and the
discontiguous memory/contiguous dirty region handling code
immediately stood out. I've wondered for a long time why the code
had this comment in it:

                        vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
                        vecp->i_len = nbits * XFS_BLF_CHUNK;
                        vecp->i_type = XLOG_REG_TYPE_BCHUNK;
/*
 * You would think we need to bump the nvecs here too, but we do not
 * this number is used by recovery, and it gets confused by the boundary
 * split here
 *                      nvecs++;
 */
                        vecp++;

And it didn't account for the extra vector pointer. The case being
handled here is that a contiguous dirty region lies across a
boundary that cannot be memcpy()d across, and so has to be split
into two separate operations for xlog_write() to perform.

What this code assumes is that what is written to the log is two
consecutive blocks of data that are accounted in the buf log format
item as the same contiguous dirty region and so will get decoded as
such by the log recovery code.

The thing is, xlog_write() knows nothing about this, and so just
does it's normal thing of adding an opheader for each vector. That
means the 8k region gets written to the log as two separate regions
of 4k each, but because nvecs has not been incremented, the buf log
format item accounts for only one of them.

Hence when we come to log recovery, we process the first 4k region
and then expect to come across a new item that starts with a log
format structure of some kind that tells us whenteh next data is
going to be. Instead, we hit raw buffer data and things go bad real
quick.

So, the commit from 2002 that commented out nvecs++ is just plain
wrong. It breaks log recovery completely, and it would seem the only
reason this hasn't been since then is that we don't log large
contigous regions of multi-page unmapped buffers very often. Never
would be a closer estimate, at least until the CRC code came along....

So, lets fix that by restoring the nvecs accounting for the extra
region when we hit this case.....

.... and there's the problemin log recovery it is apparently working
around:

XFS: Assertion failed: i == item->ri_total, file: fs/xfs/xfs_log_recover.c, line: 2135

Yup, xlog_recover_do_reg_buffer() doesn't handle contigous dirty
regions being broken up into multiple regions by the log formatting
code. That's an easy fix, though - if the number of contiguous dirty
bits exceeds the length of the region being copied out of the log,
only account for the number of dirty bits that region covers, and
then loop again and copy more from the next region. It's a 2 line
fix.

Now xfstests xfs/085 passes, we have one less piece of mystery
code, and one more important piece of knowledge about how to
structure new log format items..

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c    |    7 +------
 fs/xfs/xfs_log_recover.c |   11 +++++++++++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index cf26347..4ec4317 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -262,12 +262,7 @@ xfs_buf_item_format_segment(
 			vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
 			vecp->i_len = nbits * XFS_BLF_CHUNK;
 			vecp->i_type = XLOG_REG_TYPE_BCHUNK;
-/*
- * You would think we need to bump the nvecs here too, but we do not
- * this number is used by recovery, and it gets confused by the boundary
- * split here
- *			nvecs++;
- */
+			nvecs++;
 			vecp++;
 			first_bit = next_bit;
 			last_bit = next_bit;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 0be37d7..d6204d1 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2097,6 +2097,17 @@ xlog_recover_do_reg_buffer(
 		       ((uint)bit << XFS_BLF_SHIFT) + (nbits << XFS_BLF_SHIFT));
 
 		/*
+		 * The dirty regions logged in the buffer, even though
+		 * contiguous, may span multiple chunks. This is because the
+		 * dirty region may span a physical page boundary in a buffer
+		 * and hence be split into two separate vectors for writing into
+		 * the log. Hence we need to trim nbits back to the length of
+		 * the current region being copied out of the log.
+		 */
+		if (item->ri_buf[i].i_len < (nbits << XFS_BLF_SHIFT))
+			nbits = item->ri_buf[i].i_len >> XFS_BLF_SHIFT;
+
+		/*
 		 * Do a sanity check if this is a dquot buffer. Just checking
 		 * the first dquot in the buffer should do. XXXThis is
 		 * probably a good thing to do for other buf types also.
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/11] xfs: don't emit v5 superblock warnings on write
  2013-05-23  0:03     ` Dave Chinner
@ 2013-05-23 15:23       ` Ben Myers
  2013-05-23 23:13         ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-23 15:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, May 23, 2013 at 10:03:27AM +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 05:26:08PM -0500, Ben Myers wrote:
> > On Tue, May 21, 2013 at 06:02:07PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > We write the superblock every 30s or so which results in the
> > > verifier being called. Right now that results in this output
> > > every 30s:
> > > 
> > > XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> > > Use of these features in this kernel is at your own risk!
> > > 
> > > And spamming the logs. Stop this output from occurring on superblock
> > > writes.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_mount.c |   18 +++++++++++-------
> > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index f6bfbd7..e8e310c 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
> > > @@ -314,7 +314,8 @@ STATIC int
> > >  xfs_mount_validate_sb(
> > >  	xfs_mount_t	*mp,
> > >  	xfs_sb_t	*sbp,
> > > -	bool		check_inprogress)
> > > +	bool		check_inprogress,
> > > +	bool		check_version)
> > >  {
> > >  
> > >  	/*
> > > @@ -337,9 +338,10 @@ xfs_mount_validate_sb(
> > >  
> > >  	/*
> > >  	 * Version 5 superblock feature mask validation. Reject combinations the
> > > -	 * kernel cannot support up front before checking anything else.
> > > +	 * kernel cannot support up front before checking anything else. For
> > > +	 * write validation, we don't need to check feature masks.
> > >  	 */
> > > -	if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> > > +	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> > 
> > if (!quiet_version) {
> > >  		xfs_alert(mp,
> > >  "Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
> > >  "Use of these features in this kernel is at your own risk!");
> > 
> > }
> > 
> > Since the stated goal of the patch is to be quieter and not to disable useful
> > tests in the verifier, I suggest you disable the print rather than disable the
> > test.
> 
> Checking the feature fields for whether the kernel supports the
> features in them on write is not useful in any way.

Could it not detect corruption of the feature flags before they're written out?
My impression was that this was among the design goals of the verifiers.

> That's why the
> variable is named "check_version" because it skips the v5
> version field checking. This is stuff that is used by the mount
> path (i.e. superblock read path), not the writeback path.

You were saying above that it's the write path that is spamming the logs.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/11] xfs: rework remote attr CRCs
  2013-05-21  8:02 ` [PATCH 09/11] xfs: rework remote attr CRCs Dave Chinner
@ 2013-05-23 21:54   ` Ben Myers
  2013-05-23 23:35     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-23 21:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Tue, May 21, 2013 at 06:02:08PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Note: this changes the on-disk remote attribute format. I assert
> that this is OK to do as CRCs are marked experimental and the first
> kernel it is included in has not yet reached release yet. Further,
> the userspace utilities are still evolving and so anyone using this
> stuff right now is a developer or tester using volatile filesystems
> for testing this feature. Hence changing the format right now to
> save longer term pain is the right thing to do.

I have no objection to changing the on-disk format for an experimental feature
even after it has released.  People know what they're signing up for when
they're turning on experimental features.  Or they soon will.  We should
probably be nice and use feature bits though.  I'm thinking about the unlinked
list modifications...

> The fundamental change is to move from a header per extent in the
> attribute to a header per filesytem block in the attribute. This
> means there are more header blocks and the parsing of the attribute
> data is slightly more complex, but it has the advantage that we
> always know the size of the attribute on disk based on the length of
> the data it contains.
> 
> This is where the header-per-extent method has problems. We don't
> know the size of the attribute on disk without first knowing how
> many extents are used to hold it. And we can't tell from a
> mapping lookup, either, because remote attributes can be allocated
> contiguously with other attribute blocks and so there is no obvious
> way of determining the actual size of the atribute on disk short of
> walking and mapping buffers.

You can't tell exactly how many extents will be required to store the attribute
until after they've been allocated.  At least... not until you're down to
trying to allocate the last block.

> The problem with this approach is that if we map a buffer
> incorrectly (e.g. we make the last buffer for the attribute data too
> long), we then get buffer cache lookup failure when we map it
> correctly. i.e. we get a size mismatch on lookup.

Ah.  The resulting block count of a fragmented remote attribute is not
reflected in 'valuelen' in the remote attribute entry, so the number of headers
to account for is unknown.

typedef struct xfs_attr_leaf_name_remote {
	__be32  valueblk;               /* block number of value bytes */
	__be32  valuelen;               /* number of bytes in value */
	__u8    namelen;                /* length of name bytes */
	__u8    name[1];                /* name bytes */
 } xfs_attr_leaf_name_remote_t;

Other options:

valuelen could be changed to a block count.  Now that you have offset and
payload length in the header you can get valuelen from there on the last
extent.

valuelen could be split in two, half is the block count and the other half is
the payload length in the last extent.

> This is not
> necessarily fatal, but it's a cache coherency problem that can lead
> to returning the wrong data to userspace or writing the wrong data
> to disk. And debug kernels will assert fail if this occurs.

What data was being exposed?  The data after the attribute?  I thought testing
for valuelen would take care of that.

> I found lots of niggly little problems trying to fix this issue on a
> 4k block size filesystem, finally getting it to pass with lots of
> fixes. The thing is, 1024 byte filesystems still failed, and it was
> getting really complex handling all the corner cases that were
> showing up. And there were clearly more that I hadn't found yet.
> 
> It is complex, fragile code, and if we don't fix it now, it will be
> complex, fragile code forever more.

LOL.  Forever more!  ;)

> Hence the simple fix is to add a header to each filesystem block.
> This gives us the same relationship between the attribute data
> length and the number of blocks on disk as we have without CRCs -
> it's a linear mapping and doesn't require us to guess anything. It
> is simple to implement, too - the remote block count calculated at
> lookup time can be used by the remote attribute set/get/remove code
> without modification for both CRC and non-CRC filesystems. The world
> becomes sane again.
> 
> Because the copy-in and copy-out now need to iterate over each
> filesystem block, I moved them into helper functions so we separate
> the block mapping and buffer manupulations from the attribute data
> and CRC header manipulations. The code becomes much clearer as a
> result, and it is a lot easier to understand and debug. It also
> appears to be much more robust - once it worked on 4k block size
> filesystems, it has worked without failure on 1k block size
> filesystems, too.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I agree this code is much clearer with this method than it used to be.

I have two comments below.  Otherwise consider it

Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
>  fs/xfs/xfs_attr_leaf.c   |   13 +-
>  fs/xfs/xfs_attr_remote.c |  381 +++++++++++++++++++++++++++-------------------
>  fs/xfs/xfs_attr_remote.h |   10 ++
>  fs/xfs/xfs_buf.c         |    1 +
>  4 files changed, 247 insertions(+), 158 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> index 5b03d15..d788302 100644
> --- a/fs/xfs/xfs_attr_leaf.c
> +++ b/fs/xfs/xfs_attr_leaf.c
> @@ -1412,7 +1412,7 @@ xfs_attr3_leaf_add_work(
>  		name_rmt->valuelen = 0;
>  		name_rmt->valueblk = 0;
>  		args->rmtblkno = 1;
> -		args->rmtblkcnt = XFS_B_TO_FSB(mp, args->valuelen);
> +		args->rmtblkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
>  	}
>  	xfs_trans_log_buf(args->trans, bp,
>  	     XFS_DA_LOGRANGE(leaf, xfs_attr3_leaf_name(leaf, args->index),
> @@ -2354,8 +2354,9 @@ xfs_attr3_leaf_lookup_int(
>  			args->index = probe;
>  			args->valuelen = be32_to_cpu(name_rmt->valuelen);
>  			args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
> -			args->rmtblkcnt = XFS_B_TO_FSB(args->dp->i_mount,
> -						       args->valuelen);
> +			args->rmtblkcnt = xfs_attr3_rmt_blocks(
> +							args->dp->i_mount,
> +							args->valuelen);
>  			return XFS_ERROR(EEXIST);
>  		}
>  	}
> @@ -2406,7 +2407,8 @@ xfs_attr3_leaf_getvalue(
>  		ASSERT(memcmp(args->name, name_rmt->name, args->namelen) == 0);
>  		valuelen = be32_to_cpu(name_rmt->valuelen);
>  		args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
> -		args->rmtblkcnt = XFS_B_TO_FSB(args->dp->i_mount, valuelen);
> +		args->rmtblkcnt = xfs_attr3_rmt_blocks(args->dp->i_mount,
> +						       valuelen);
>  		if (args->flags & ATTR_KERNOVAL) {
>  			args->valuelen = valuelen;
>  			return 0;
> @@ -2732,7 +2734,8 @@ xfs_attr3_leaf_list_int(
>  				args.valuelen = valuelen;
>  				args.value = kmem_alloc(valuelen, KM_SLEEP | KM_NOFS);
>  				args.rmtblkno = be32_to_cpu(name_rmt->valueblk);
> -				args.rmtblkcnt = XFS_B_TO_FSB(args.dp->i_mount, valuelen);
> +				args.rmtblkcnt = xfs_attr3_rmt_blocks(
> +							args.dp->i_mount, valuelen);
>  				retval = xfs_attr_rmtval_get(&args);
>  				if (retval)
>  					return retval;
> diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> index d8bcb2d..ef6b0c1 100644
> --- a/fs/xfs/xfs_attr_remote.c
> +++ b/fs/xfs/xfs_attr_remote.c
> @@ -47,7 +47,7 @@
>   * Each contiguous block has a header, so it is not just a simple attribute
>   * length to FSB conversion.
>   */
> -static int
> +int
>  xfs_attr3_rmt_blocks(
>  	struct xfs_mount *mp,
>  	int		attrlen)
> @@ -59,12 +59,43 @@ xfs_attr3_rmt_blocks(
>  	return XFS_B_TO_FSB(mp, attrlen);
>  }
>  
> +/*
> + * Checking of the remote attribute header is split into two parts. The verifier
> + * does CRC, location and bounds checking, the unpacking function checks the
> + * attribute parameters and owner.
> + */
> +static bool
> +xfs_attr3_rmt_hdr_ok(
> +	struct xfs_mount	*mp,
> +	void			*ptr,
> +	xfs_ino_t		ino,
> +	uint32_t		offset,
> +	uint32_t		size,
> +	xfs_daddr_t		bno)
> +{
> +	struct xfs_attr3_rmt_hdr *rmt = ptr;
> +
> +	if (bno != be64_to_cpu(rmt->rm_blkno))
> +		return false;
> +	if (offset != be32_to_cpu(rmt->rm_offset))
> +		return false;
> +	if (size != be32_to_cpu(rmt->rm_bytes))
> +		return false;
> +	if (ino != be64_to_cpu(rmt->rm_owner))
> +		return false;
> +
> +	/* ok */
> +	return true;
> +}
> +
>  static bool
>  xfs_attr3_rmt_verify(
> -	struct xfs_buf		*bp)
> +	struct xfs_mount	*mp,
> +	void			*ptr,
> +	int			fsbsize,
> +	xfs_daddr_t		bno)
>  {
> -	struct xfs_mount	*mp = bp->b_target->bt_mount;
> -	struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
> +	struct xfs_attr3_rmt_hdr *rmt = ptr;
>  
>  	if (!xfs_sb_version_hascrc(&mp->m_sb))
>  		return false;
> @@ -72,7 +103,9 @@ xfs_attr3_rmt_verify(
>  		return false;
>  	if (!uuid_equal(&rmt->rm_uuid, &mp->m_sb.sb_uuid))
>  		return false;
> -	if (bp->b_bn != be64_to_cpu(rmt->rm_blkno))
> +	if (be64_to_cpu(rmt->rm_blkno) != bno)
> +		return false;
> +	if (be32_to_cpu(rmt->rm_bytes) > fsbsize - sizeof(*rmt))
>  		return false;
>  	if (be32_to_cpu(rmt->rm_offset) +
>  				be32_to_cpu(rmt->rm_bytes) >= XATTR_SIZE_MAX)
> @@ -88,17 +121,40 @@ xfs_attr3_rmt_read_verify(
>  	struct xfs_buf	*bp)
>  {
>  	struct xfs_mount *mp = bp->b_target->bt_mount;
> +	char		*ptr;
> +	int		len;
> +	bool		corrupt = false;
> +	xfs_daddr_t	bno;
>  
>  	/* no verification of non-crc buffers */
>  	if (!xfs_sb_version_hascrc(&mp->m_sb))
>  		return;
>  
> -	if (!xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
> -			      XFS_ATTR3_RMT_CRC_OFF) ||
> -	    !xfs_attr3_rmt_verify(bp)) {
> +	ptr = bp->b_addr;
> +	bno = bp->b_bn;
> +	len = BBTOB(bp->b_length);
> +	ASSERT(len >= XFS_LBSIZE(mp));
> +
> +	while (len > 0) {
> +		if (!xfs_verify_cksum(ptr, XFS_LBSIZE(mp),
> +				      XFS_ATTR3_RMT_CRC_OFF)) {
> +			corrupt = true;
> +			break;
> +		}
> +		if (!xfs_attr3_rmt_verify(mp, ptr, XFS_LBSIZE(mp), bno)) {
> +			corrupt = true;
> +			break;
> +		}
> +		len -= XFS_LBSIZE(mp);
> +		ptr += XFS_LBSIZE(mp);
> +		bno += mp->m_bsize;
> +	}
> +
> +	if (corrupt) {
>  		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
>  		xfs_buf_ioerror(bp, EFSCORRUPTED);
> -	}
> +	} else
> +		ASSERT(len == 0);
>  }
>  
>  static void
> @@ -107,23 +163,39 @@ xfs_attr3_rmt_write_verify(
>  {
>  	struct xfs_mount *mp = bp->b_target->bt_mount;
>  	struct xfs_buf_log_item	*bip = bp->b_fspriv;
> +	char		*ptr;
> +	int		len;
> +	xfs_daddr_t	bno;
>  
>  	/* no verification of non-crc buffers */
>  	if (!xfs_sb_version_hascrc(&mp->m_sb))
>  		return;
>  
> -	if (!xfs_attr3_rmt_verify(bp)) {
> -		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
> -		xfs_buf_ioerror(bp, EFSCORRUPTED);
> -		return;
> -	}
> +	ptr = bp->b_addr;
> +	bno = bp->b_bn;
> +	len = BBTOB(bp->b_length);
> +	ASSERT(len >= XFS_LBSIZE(mp));
> +
> +	while (len > 0) {
> +		if (!xfs_attr3_rmt_verify(mp, ptr, XFS_LBSIZE(mp), bno)) {
> +			XFS_CORRUPTION_ERROR(__func__,
> +					    XFS_ERRLEVEL_LOW, mp, bp->b_addr);
> +			xfs_buf_ioerror(bp, EFSCORRUPTED);
> +			return;
> +		}
> +		if (bip) {
> +			struct xfs_attr3_rmt_hdr *rmt;
>  
> -	if (bip) {
> -		struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
> -		rmt->rm_lsn = cpu_to_be64(bip->bli_item.li_lsn);
> +			rmt = (struct xfs_attr3_rmt_hdr *)ptr;
> +			rmt->rm_lsn = cpu_to_be64(bip->bli_item.li_lsn);
> +		}
> +		xfs_update_cksum(ptr, XFS_LBSIZE(mp), XFS_ATTR3_RMT_CRC_OFF);
> +
> +		len -= XFS_LBSIZE(mp);
> +		ptr += XFS_LBSIZE(mp);
> +		bno += mp->m_bsize;
>  	}
> -	xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
> -			 XFS_ATTR3_RMT_CRC_OFF);
> +	ASSERT(len == 0);
>  }
>  
>  const struct xfs_buf_ops xfs_attr3_rmt_buf_ops = {
> @@ -131,15 +203,16 @@ const struct xfs_buf_ops xfs_attr3_rmt_buf_ops = {
>  	.verify_write = xfs_attr3_rmt_write_verify,
>  };
>  
> -static int
> +STATIC int
>  xfs_attr3_rmt_hdr_set(
>  	struct xfs_mount	*mp,
> +	void			*ptr,
>  	xfs_ino_t		ino,
>  	uint32_t		offset,
>  	uint32_t		size,
> -	struct xfs_buf		*bp)
> +	xfs_daddr_t		bno)
>  {
> -	struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
> +	struct xfs_attr3_rmt_hdr *rmt = ptr;
>  
>  	if (!xfs_sb_version_hascrc(&mp->m_sb))
>  		return 0;
> @@ -149,36 +222,107 @@ xfs_attr3_rmt_hdr_set(
>  	rmt->rm_bytes = cpu_to_be32(size);
>  	uuid_copy(&rmt->rm_uuid, &mp->m_sb.sb_uuid);
>  	rmt->rm_owner = cpu_to_be64(ino);
> -	rmt->rm_blkno = cpu_to_be64(bp->b_bn);
> -	bp->b_ops = &xfs_attr3_rmt_buf_ops;
> +	rmt->rm_blkno = cpu_to_be64(bno);
>  
>  	return sizeof(struct xfs_attr3_rmt_hdr);
>  }
>  
>  /*
> - * Checking of the remote attribute header is split into two parts. the verifier
> - * does CRC, location and bounds checking, the unpacking function checks the
> - * attribute parameters and owner.
> + * Helper functions to copy attribute data in and out of the one disk extents
>   */
> -static bool
> -xfs_attr3_rmt_hdr_ok(
> -	struct xfs_mount	*mp,
> -	xfs_ino_t		ino,
> -	uint32_t		offset,
> -	uint32_t		size,
> -	struct xfs_buf		*bp)
> +STATIC int
> +xfs_attr_rmtval_copyout(
> +	struct xfs_mount *mp,
> +	struct xfs_buf	*bp,
> +	xfs_ino_t	ino,
> +	int		*offset,
> +	int		*valuelen,
> +	char		**dst)
>  {
> -	struct xfs_attr3_rmt_hdr *rmt = bp->b_addr;
> +	char		*src = bp->b_addr;
> +	xfs_daddr_t	bno = bp->b_bn;
> +	int		len = BBTOB(bp->b_length);
>  
> -	if (offset != be32_to_cpu(rmt->rm_offset))
> -		return false;
> -	if (size != be32_to_cpu(rmt->rm_bytes))
> -		return false;
> -	if (ino != be64_to_cpu(rmt->rm_owner))
> -		return false;
> +	ASSERT(len >= XFS_LBSIZE(mp));
>  
> -	/* ok */
> -	return true;
> +	while (len > 0 && *valuelen > 0) {
> +		int hdr_size = 0;
> +		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, XFS_LBSIZE(mp));
> +
> +		byte_cnt = min_t(int, *valuelen, byte_cnt);
> +
> +		if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +			if (!xfs_attr3_rmt_hdr_ok(mp, src, ino, *offset,
> +						  byte_cnt, bno)) {
> +				xfs_alert(mp,
> +"remote attribute header mismatch bno/off/len/owner (0x%llx/0x%x/Ox%x/0x%llx)",
> +					bno, *offset, byte_cnt, ino);
> +				return EFSCORRUPTED;
> +			}
> +			hdr_size = sizeof(struct xfs_attr3_rmt_hdr);
> +		}
> +
> +		memcpy(*dst, src + hdr_size, byte_cnt);
> +
> +		/* roll buffer forwards */
> +		len -= XFS_LBSIZE(mp);
> +		src += XFS_LBSIZE(mp);
> +		bno += mp->m_bsize;
> +
> +		/* roll attribute data forwards */
> +		*valuelen -= byte_cnt;
> +		*dst += byte_cnt;
> +		*offset += byte_cnt;
> +	}
> +	return 0;
> +}
> +
> +STATIC void
> +xfs_attr_rmtval_copyin(
> +	struct xfs_mount *mp,
> +	struct xfs_buf	*bp,
> +	xfs_ino_t	ino,
> +	int		*offset,
> +	int		*valuelen,
> +	char		**src)
> +{
> +	char		*dst = bp->b_addr;
> +	xfs_daddr_t	bno = bp->b_bn;
> +	int		len = BBTOB(bp->b_length);
> +
> +	ASSERT(len >= XFS_LBSIZE(mp));
> +
> +	while (len > 0 && *valuelen > 0) {
> +		int hdr_size;
> +		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, XFS_LBSIZE(mp));
> +
> +		byte_cnt = min(*valuelen, byte_cnt);
			   ^^^

You used 'min_t(int,...)' above and min() here.  Any reason?

> +		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
> +						 byte_cnt, bno);
> +
> +		memcpy(dst + hdr_size, *src, byte_cnt);
> +
> +		/*
> +		 * If this is the last block, zero the remainder of it.
> +		 * Check that we are actually the last block, too.
> +		 */
> +		if (byte_cnt + hdr_size < XFS_LBSIZE(mp)) {
> +			ASSERT(*valuelen - byte_cnt == 0);
> +			ASSERT(len == XFS_LBSIZE(mp));
> +			memset(dst + hdr_size + byte_cnt, 0,
> +					XFS_LBSIZE(mp) - hdr_size - byte_cnt);
> +		}
> +
> +		/* roll buffer forwards */
> +		len -= XFS_LBSIZE(mp);
> +		dst += XFS_LBSIZE(mp);
> +		bno += mp->m_bsize;
> +
> +		/* roll attribute data forwards */
> +		*valuelen -= byte_cnt;
> +		*src += byte_cnt;
> +		*offset += byte_cnt;
> +	}
>  }
>  
>  /*
> @@ -192,13 +336,12 @@ xfs_attr_rmtval_get(
>  	struct xfs_bmbt_irec	map[ATTR_RMTVALUE_MAPSIZE];
>  	struct xfs_mount	*mp = args->dp->i_mount;
>  	struct xfs_buf		*bp;
> -	xfs_daddr_t		dblkno;
>  	xfs_dablk_t		lblkno = args->rmtblkno;
> -	void			*dst = args->value;
> +	char			*dst = args->value;
>  	int			valuelen = args->valuelen;
>  	int			nmap;
>  	int			error;
> -	int			blkcnt;
> +	int			blkcnt = args->rmtblkcnt;
>  	int			i;
>  	int			offset = 0;
>  
> @@ -208,7 +351,6 @@ xfs_attr_rmtval_get(
>  
>  	while (valuelen > 0) {
>  		nmap = ATTR_RMTVALUE_MAPSIZE;
> -		blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
>  		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
>  				       blkcnt, map, &nmap,
>  				       XFS_BMAPI_ATTRFORK);
> @@ -217,45 +359,29 @@ xfs_attr_rmtval_get(
>  		ASSERT(nmap >= 1);
>  
>  		for (i = 0; (i < nmap) && (valuelen > 0); i++) {
> -			int	byte_cnt;
> -			char	*src;
> +			xfs_daddr_t	dblkno;
> +			int		dblkcnt;
>  
>  			ASSERT((map[i].br_startblock != DELAYSTARTBLOCK) &&
>  			       (map[i].br_startblock != HOLESTARTBLOCK));
>  			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
> -			blkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
> +			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
>  			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
> -						   dblkno, blkcnt, 0, &bp,
> +						   dblkno, dblkcnt, 0, &bp,
>  						   &xfs_attr3_rmt_buf_ops);
>  			if (error)
>  				return error;
>  
> -			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
> -			byte_cnt = min_t(int, valuelen, byte_cnt);
> -
> -			src = bp->b_addr;
> -			if (xfs_sb_version_hascrc(&mp->m_sb)) {
> -				if (!xfs_attr3_rmt_hdr_ok(mp, args->dp->i_ino,
> -							offset, byte_cnt, bp)) {
> -					xfs_alert(mp,
> -"remote attribute header does not match required off/len/owner (0x%x/Ox%x,0x%llx)",
> -						offset, byte_cnt, args->dp->i_ino);
> -					xfs_buf_relse(bp);
> -					return EFSCORRUPTED;
> -
> -				}
> -
> -				src += sizeof(struct xfs_attr3_rmt_hdr);
> -			}
> -
> -			memcpy(dst, src, byte_cnt);
> +			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
> +							&offset, &valuelen,
> +							&dst);
>  			xfs_buf_relse(bp);
> +			if (error)
> +				return error;
>  
> -			offset += byte_cnt;
> -			dst += byte_cnt;
> -			valuelen -= byte_cnt;
> -
> +			/* roll attribute extent map forwards */
>  			lblkno += map[i].br_blockcount;
> +			blkcnt -= map[i].br_blockcount;
>  		}
>  	}
>  	ASSERT(valuelen == 0);
> @@ -273,17 +399,13 @@ xfs_attr_rmtval_set(
>  	struct xfs_inode	*dp = args->dp;
>  	struct xfs_mount	*mp = dp->i_mount;
>  	struct xfs_bmbt_irec	map;
> -	struct xfs_buf		*bp;
> -	xfs_daddr_t		dblkno;
>  	xfs_dablk_t		lblkno;
>  	xfs_fileoff_t		lfileoff = 0;
> -	void			*src = args->value;
> +	char			*src = args->value;
>  	int			blkcnt;
>  	int			valuelen;
>  	int			nmap;
>  	int			error;
> -	int			hdrcnt = 0;
> -	bool			crcs = xfs_sb_version_hascrc(&mp->m_sb);
>  	int			offset = 0;
>  
>  	trace_xfs_attr_rmtval_set(args);
> @@ -292,21 +414,14 @@ xfs_attr_rmtval_set(
>  	 * Find a "hole" in the attribute address space large enough for
>  	 * us to drop the new attribute's value into. Because CRC enable
>  	 * attributes have headers, we can't just do a straight byte to FSB
> -	 * conversion. We calculate the worst case block count in this case
> -	 * and we may not need that many, so we have to handle this when
> -	 * allocating the blocks below. 
> +	 * conversion and have to take the header space into account.
>  	 */
>  	blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
> -
>  	error = xfs_bmap_first_unused(args->trans, args->dp, blkcnt, &lfileoff,
>  						   XFS_ATTR_FORK);
>  	if (error)
>  		return error;
>  
> -	/* Start with the attribute data. We'll allocate the rest afterwards. */
> -	if (crcs)
> -		blkcnt = XFS_B_TO_FSB(mp, args->valuelen);
> -
>  	args->rmtblkno = lblkno = (xfs_dablk_t)lfileoff;
>  	args->rmtblkcnt = blkcnt;
>  
> @@ -349,31 +464,6 @@ xfs_attr_rmtval_set(
>  		       (map.br_startblock != HOLESTARTBLOCK));
>  		lblkno += map.br_blockcount;
>  		blkcnt -= map.br_blockcount;
> -		hdrcnt++;
> -
> -		/*
> -		 * If we have enough blocks for the attribute data, calculate
> -		 * how many extra blocks we need for headers. We might run
> -		 * through this multiple times in the case that the additional
> -		 * headers in the blocks needed for the data fragments spills
> -		 * into requiring more blocks. e.g. for 512 byte blocks, we'll
> -		 * spill for another block every 9 headers we require in this
> -		 * loop.
> -		 *
> -		 * Note that this can result in contiguous allocation of blocks,
> -		 * so we don't use all the space we allocate for headers as we
> -		 * have one less header for each contiguous allocation that
> -		 * occurs in the map/write loop below.
> -		 */
> -		if (crcs && blkcnt == 0) {
> -			int total_len;
> -
> -			total_len = args->valuelen +
> -				    hdrcnt * sizeof(struct xfs_attr3_rmt_hdr);
> -			blkcnt = XFS_B_TO_FSB(mp, total_len);
> -			blkcnt -= args->rmtblkcnt;
> -			args->rmtblkcnt += blkcnt;
> -		}
>  
>  		/*
>  		 * Start the next trans in the chain.
> @@ -390,17 +480,15 @@ xfs_attr_rmtval_set(
>  	 * the INCOMPLETE flag.
>  	 */
>  	lblkno = args->rmtblkno;
> -	valuelen = args->valuelen;
>  	blkcnt = args->rmtblkcnt;
> +	valuelen = args->valuelen;
>  	while (valuelen > 0) {
> -		int	byte_cnt;
> -		int	hdr_size;
> -		int	dblkcnt;
> -		char	*buf;
> +		struct xfs_buf	*bp;
> +		xfs_daddr_t	dblkno;
> +		int		dblkcnt;
> +
> +		ASSERT(blkcnt > 0);
>  
> -		/*
> -		 * Try to remember where we decided to put the value.
> -		 */
>  		xfs_bmap_init(args->flist, args->firstblock);
>  		nmap = 1;
>  		error = xfs_bmapi_read(dp, (xfs_fileoff_t)lblkno,
> @@ -419,29 +507,17 @@ xfs_attr_rmtval_set(
>  		if (!bp)
>  			return ENOMEM;
>  		bp->b_ops = &xfs_attr3_rmt_buf_ops;
> -		buf = bp->b_addr;
> -
> -		byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
> -		byte_cnt = min_t(int, valuelen, byte_cnt);
> -		hdr_size = xfs_attr3_rmt_hdr_set(mp, dp->i_ino, offset,
> -					     byte_cnt, bp);
> -		ASSERT(hdr_size + byte_cnt <= BBTOB(bp->b_length));
>  
> -		memcpy(buf + hdr_size, src, byte_cnt);
> -
> -		if (byte_cnt + hdr_size < BBTOB(bp->b_length))
> -			xfs_buf_zero(bp, byte_cnt + hdr_size,
> -				     BBTOB(bp->b_length) - byte_cnt - hdr_size);
> +		xfs_attr_rmtval_copyin(mp, bp, args->dp->i_ino, &offset,
> +				       &valuelen, &src);
>  
>  		error = xfs_bwrite(bp);	/* GROT: NOTE: synchronous write */
>  		xfs_buf_relse(bp);
>  		if (error)
>  			return error;
>  
> -		src += byte_cnt;
> -		valuelen -= byte_cnt;
> -		offset += byte_cnt;
>  
> +		/* roll attribute extent map forwards */
>  		lblkno += map.br_blockcount;
>  		blkcnt -= map.br_blockcount;
>  	}
> @@ -454,19 +530,17 @@ xfs_attr_rmtval_set(
>   * out-of-line buffer that it is stored on.
>   */
>  int
> -xfs_attr_rmtval_remove(xfs_da_args_t *args)
> +xfs_attr_rmtval_remove(
> +	struct xfs_da_args	*args)
>  {
> -	xfs_mount_t *mp;
> -	xfs_bmbt_irec_t map;
> -	xfs_buf_t *bp;
> -	xfs_daddr_t dblkno;
> -	xfs_dablk_t lblkno;
> -	int valuelen, blkcnt, nmap, error, done, committed;
> +	struct xfs_mount	*mp = args->dp->i_mount;
> +	xfs_dablk_t		lblkno;
> +	int			blkcnt;
> +	int			error;
> +	int			done;
>  
>  	trace_xfs_attr_rmtval_remove(args);
>  
> -	mp = args->dp->i_mount;
> -
>  	/*
>  	 * Roll through the "value", invalidating the attribute value's blocks.
>  	 * Note that args->rmtblkcnt is the minimum number of data blocks we'll

I believe this comment is now out of date and needs to be updated.

> @@ -476,10 +550,13 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
>  	 * lookups.
>  	 */
>  	lblkno = args->rmtblkno;
> -	valuelen = args->valuelen;
> -	blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
> -	while (valuelen > 0) {
> -		int dblkcnt;
> +	blkcnt = args->rmtblkcnt;
> +	while (blkcnt > 0) {
> +		struct xfs_bmbt_irec	map;
> +		struct xfs_buf		*bp;
> +		xfs_daddr_t		dblkno;
> +		int			dblkcnt;
> +		int			nmap;
>  
>  		/*
>  		 * Try to remember where we decided to put the value.
> @@ -506,21 +583,19 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
>  			bp = NULL;
>  		}
>  
> -		valuelen -= XFS_ATTR3_RMT_BUF_SPACE(mp,
> -					XFS_FSB_TO_B(mp, map.br_blockcount));
> -
>  		lblkno += map.br_blockcount;
>  		blkcnt -= map.br_blockcount;
> -		blkcnt = max(blkcnt, xfs_attr3_rmt_blocks(mp, valuelen));
>  	}
>  
>  	/*
>  	 * Keep de-allocating extents until the remote-value region is gone.
>  	 */
> -	blkcnt = lblkno - args->rmtblkno;
>  	lblkno = args->rmtblkno;
> +	blkcnt = args->rmtblkcnt;
>  	done = 0;
>  	while (!done) {
> +		int committed;
> +
>  		xfs_bmap_init(args->flist, args->firstblock);
>  		error = xfs_bunmapi(args->trans, args->dp, lblkno, blkcnt,
>  				    XFS_BMAPI_ATTRFORK | XFS_BMAPI_METADATA,
> diff --git a/fs/xfs/xfs_attr_remote.h b/fs/xfs/xfs_attr_remote.h
> index c7cca60..92a8fd7 100644
> --- a/fs/xfs/xfs_attr_remote.h
> +++ b/fs/xfs/xfs_attr_remote.h
> @@ -20,6 +20,14 @@
>  
>  #define XFS_ATTR3_RMT_MAGIC	0x5841524d	/* XARM */
>  
> +/*
> + * There is one of these headers per filesystem block in a remote attribute.
> + * This is done to ensure there is a 1:1 mapping between the attribute value
> + * length and the number of blocks needed to store the attribute. This makes the
> + * verification of a buffer a little more complex, but greatly simplifies the
> + * allocation, reading and writing of these attributes as we don't have to guess
> + * the number of blocks needed to store the attribute data.
> + */
>  struct xfs_attr3_rmt_hdr {
>  	__be32	rm_magic;
>  	__be32	rm_offset;
> @@ -39,6 +47,8 @@ struct xfs_attr3_rmt_hdr {
>  
>  extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
>  
> +int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
> +
>  int xfs_attr_rmtval_get(struct xfs_da_args *args);
>  int xfs_attr_rmtval_set(struct xfs_da_args *args);
>  int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 0d25542..1b2472a 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -513,6 +513,7 @@ _xfs_buf_find(
>  		xfs_alert(btp->bt_mount,
>  			  "%s: Block out of range: block 0x%llx, EOFS 0x%llx ",
>  			  __func__, blkno, eofs);
> +		WARN_ON(1);
>  		return NULL;
>  	}
>  
> -- 
> 1.7.10.4
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
  2013-05-22 23:54     ` Dave Chinner
@ 2013-05-23 22:51       ` Ben Myers
  0 siblings, 0 replies; 49+ messages in thread
From: Ben Myers @ 2013-05-23 22:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, May 23, 2013 at 09:54:50AM +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 03:50:59PM -0500, Ben Myers wrote:
> > On Tue, May 21, 2013 at 06:02:05PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
> > > the entries in two leaves when the destination leaf requires
> > > compaction. The temporary buffer ends up being copied back over the
> > > original destination buffer, so the header in the temporary buffer
> > > needs to contain all the information that is in the destination
> > > buffer.
> > > 
> > > To make sure the temporary buffer is fully initialised, once we've
> > > set up the temporary incore header appropriately, write is back to
> > > the temporary buffer before starting to move entries around.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > >
> > > ---
> > >  fs/xfs/xfs_attr_leaf.c |   16 +++++++++++++---
> > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> > > index 0bce1b3..79ece72 100644
> > > --- a/fs/xfs/xfs_attr_leaf.c
> > > +++ b/fs/xfs/xfs_attr_leaf.c
> > > @@ -2181,14 +2181,24 @@ xfs_attr3_leaf_unbalance(
> > >  		struct xfs_attr_leafblock *tmp_leaf;
> > >  		struct xfs_attr3_icleaf_hdr tmphdr;
> > >  
> > > -		tmp_leaf = kmem_alloc(state->blocksize, KM_SLEEP);
> > > -		memset(tmp_leaf, 0, state->blocksize);
> > > -		memset(&tmphdr, 0, sizeof(tmphdr));
> > > +		tmp_leaf = kmem_zalloc(state->blocksize, KM_SLEEP);
> > > +
> > > +		/*
> > > +		 * Copy the header into the temp leaf so that all the stuff
> > > +		 * not in the incore header is present and gets copied back in
> > > +		 * once we've moved all the entries.
> > > +		 */
> > > +		memcpy(tmp_leaf, save_leaf, xfs_attr3_leaf_hdr_size(save_leaf));
> > >  
> > > +		memset(&tmphdr, 0, sizeof(tmphdr));
> > >  		tmphdr.magic = savehdr.magic;
> > >  		tmphdr.forw = savehdr.forw;
> > >  		tmphdr.back = savehdr.back;
> > >  		tmphdr.firstused = state->blocksize;
> > > +
> > > +		/* write the header to the temp buffer to initialise it */
> > > +		xfs_attr3_leaf_hdr_to_disk(tmp_leaf, &tmphdr);
> > > +
> > 
> > Hrm.  It seemed like the memset 0 for the incore temp header should have been
> > enough for the purposes of moveents.  Why did you have to add this additional
> > hdr_to_disk?
> 
> Because we've modified the tmphdr so it is different to what is in
> the tmp_leaf, and I'm being paranoid and not making any assumptions
> that a callee doesn't reread the tmphdr from the tmp_leaf.
> 
> i.e. I'm initialising both the on-disk and the incore headers to
> be identical before we do anything else. Once bitten, twice shy.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
  2013-05-22 23:58     ` Dave Chinner
@ 2013-05-23 22:51       ` Ben Myers
  0 siblings, 0 replies; 49+ messages in thread
From: Ben Myers @ 2013-05-23 22:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, May 23, 2013 at 09:58:20AM +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 04:59:34PM -0500, Ben Myers wrote:
> > On Tue, May 21, 2013 at 06:02:06PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
> > > the entries in a leaf. It copies the the original buffer into the
> > > temporary buffer, then zeros the original buffer completely. It then
> > > copies the entries back into the original buffer.  However, the
> > > original buffer has not been correctly initialised, and so the
> > > movement of the entries goes horribly wrong.
> > > 
> > > Make sure the zeroed destination buffer is fully initialised, and
> > > once we've set up the destination incore header appropriately, write
> > > is back to the buffer before starting to move entries around.
> >   it
> > 
> > > 
> > > While debugging this, the _d/_s prefixes weren't sufficient to
> > > remind me what buffer was what, so rename then all _src/_dst.
> > 
> > Yeah, it helps.
> > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_attr_leaf.c |   42 ++++++++++++++++++++++++++----------------
> > >  1 file changed, 26 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
> > > index 79ece72..5b03d15 100644
> > > --- a/fs/xfs/xfs_attr_leaf.c
> > > +++ b/fs/xfs/xfs_attr_leaf.c
> > > @@ -1445,11 +1445,12 @@ xfs_attr3_leaf_add_work(
> > >  STATIC void
> > >  xfs_attr3_leaf_compact(
> > >  	struct xfs_da_args	*args,
> > > -	struct xfs_attr3_icleaf_hdr *ichdr_d,
> > > +	struct xfs_attr3_icleaf_hdr *ichdr_dst,
> > >  	struct xfs_buf		*bp)
> > >  {
> > > -	xfs_attr_leafblock_t	*leaf_s, *leaf_d;
> > > -	struct xfs_attr3_icleaf_hdr ichdr_s;
> > > +	struct xfs_attr_leafblock *leaf_src;
> > > +	struct xfs_attr_leafblock *leaf_dst;
> > > +	struct xfs_attr3_icleaf_hdr ichdr_src;
> > >  	struct xfs_trans	*trans = args->trans;
> > >  	struct xfs_mount	*mp = trans->t_mountp;
> > >  	char			*tmpbuffer;
> > > @@ -1457,29 +1458,38 @@ xfs_attr3_leaf_compact(
> > >  	trace_xfs_attr_leaf_compact(args);
> > >  
> > >  	tmpbuffer = kmem_alloc(XFS_LBSIZE(mp), KM_SLEEP);
> > > -	ASSERT(tmpbuffer != NULL);
> > >  	memcpy(tmpbuffer, bp->b_addr, XFS_LBSIZE(mp));
> > >  	memset(bp->b_addr, 0, XFS_LBSIZE(mp));
> > > +	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
> > > +	leaf_dst = bp->b_addr;
> > >  
> > >  	/*
> > > -	 * Copy basic information
> > > +	 * Copy the on-disk header back into the destination buffer to ensure
> > > +	 * all the information in the header that is not part of the incore
> > > +	 * header structure is preserved.
> > >  	 */
> > > -	leaf_s = (xfs_attr_leafblock_t *)tmpbuffer;
> > > -	leaf_d = bp->b_addr;
> > > -	ichdr_s = *ichdr_d;	/* struct copy */
> > > -	ichdr_d->firstused = XFS_LBSIZE(mp);
> > > -	ichdr_d->usedbytes = 0;
> > > -	ichdr_d->count = 0;
> > > -	ichdr_d->holes = 0;
> > > -	ichdr_d->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_s);
> > > -	ichdr_d->freemap[0].size = ichdr_d->firstused - ichdr_d->freemap[0].base;
> > > +	memcpy(bp->b_addr, tmpbuffer, xfs_attr3_leaf_hdr_size(leaf_src));
> > > +
> > > +	/* Initialise the incore headers */
> > > +	ichdr_src = *ichdr_dst;	/* struct copy */
> > > +	ichdr_dst->firstused = XFS_LBSIZE(mp);
> > > +	ichdr_dst->usedbytes = 0;
> > > +	ichdr_dst->count = 0;
> > > +	ichdr_dst->holes = 0;
> > > +	ichdr_dst->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_src);
> > > +	ichdr_dst->freemap[0].size = ichdr_dst->firstused -
> > > +						ichdr_dst->freemap[0].base;
> > > +
> > > +
> > > +	/* write the header back to initialise the underlying buffer */
> > > +	xfs_attr3_leaf_hdr_to_disk(leaf_dst, ichdr_dst);
> > 
> > I can see that the memcpy is necessary but again I'm having a hard time
> > 	understanding what field was messed up for moveents.
> 
> moveents wasn't messed up - it was all the non-incore headers that
> were invalid (owner, uuid, blkno, etc) due to the zeroing of the tmp
> buffer and then only writing the incore header back to it after
> moveents. That's what the addition memcpy of the header fixes, and
> then the xfs_attr3_leaf_hdr_to_disk() call updates all the fields
> modified by initialisation so that before we start the in-core and
> "on-disk" structures are identical.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/11] xfs: don't emit v5 superblock warnings on write
  2013-05-23 15:23       ` Ben Myers
@ 2013-05-23 23:13         ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-23 23:13 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Thu, May 23, 2013 at 10:23:38AM -0500, Ben Myers wrote:
> On Thu, May 23, 2013 at 10:03:27AM +1000, Dave Chinner wrote:
> > On Wed, May 22, 2013 at 05:26:08PM -0500, Ben Myers wrote:
> > > On Tue, May 21, 2013 at 06:02:07PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > We write the superblock every 30s or so which results in the
> > > > verifier being called. Right now that results in this output
> > > > every 30s:
> > > > 
> > > > XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
> > > > Use of these features in this kernel is at your own risk!
> > > > 
> > > > And spamming the logs. Stop this output from occurring on superblock
> > > > writes.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_mount.c |   18 +++++++++++-------
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > > index f6bfbd7..e8e310c 100644
> > > > --- a/fs/xfs/xfs_mount.c
> > > > +++ b/fs/xfs/xfs_mount.c
> > > > @@ -314,7 +314,8 @@ STATIC int
> > > >  xfs_mount_validate_sb(
> > > >  	xfs_mount_t	*mp,
> > > >  	xfs_sb_t	*sbp,
> > > > -	bool		check_inprogress)
> > > > +	bool		check_inprogress,
> > > > +	bool		check_version)
> > > >  {
> > > >  
> > > >  	/*
> > > > @@ -337,9 +338,10 @@ xfs_mount_validate_sb(
> > > >  
> > > >  	/*
> > > >  	 * Version 5 superblock feature mask validation. Reject combinations the
> > > > -	 * kernel cannot support up front before checking anything else.
> > > > +	 * kernel cannot support up front before checking anything else. For
> > > > +	 * write validation, we don't need to check feature masks.
> > > >  	 */
> > > > -	if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> > > > +	if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
> > > 
> > > if (!quiet_version) {
> > > >  		xfs_alert(mp,
> > > >  "Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
> > > >  "Use of these features in this kernel is at your own risk!");
> > > 
> > > }
> > > 
> > > Since the stated goal of the patch is to be quieter and not to disable useful
> > > tests in the verifier, I suggest you disable the print rather than disable the
> > > test.
> > 
> > Checking the feature fields for whether the kernel supports the
> > features in them on write is not useful in any way.
> 
> Could it not detect corruption of the feature flags before they're written out?
> My impression was that this was among the design goals of the verifiers.

It might, but we don't verify any of the existing feature flags,
either. For that matter, how do you verify them when the in-core
superblock can be modified asynchronously to the on-disk version in
the buffer without locking the incore superblock?

And if we do lock the incore superblock, how do we deal with the
lock inversion that this causes?

> > That's why the
> > variable is named "check_version" because it skips the v5
> > version field checking. This is stuff that is used by the mount
> > path (i.e. superblock read path), not the writeback path.
> 
> You were saying above that it's the write path that is spamming the logs.

The function is used by both the read and the write verifiers, but
the feature mask checks are only relevant to the read path which
occurs during mount. Doing the checks on write is what is spamming
the logs.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/11] xfs: rework remote attr CRCs
  2013-05-23 21:54   ` Ben Myers
@ 2013-05-23 23:35     ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-23 23:35 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Thu, May 23, 2013 at 04:54:05PM -0500, Ben Myers wrote:
> Hi Dave,
> 
> On Tue, May 21, 2013 at 06:02:08PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Note: this changes the on-disk remote attribute format. I assert
> > that this is OK to do as CRCs are marked experimental and the first
> > kernel it is included in has not yet reached release yet. Further,
> > the userspace utilities are still evolving and so anyone using this
> > stuff right now is a developer or tester using volatile filesystems
> > for testing this feature. Hence changing the format right now to
> > save longer term pain is the right thing to do.
> 
> I have no objection to changing the on-disk format for an experimental feature
> even after it has released.  People know what they're signing up for when
> they're turning on experimental features.  Or they soon will.  We should
> probably be nice and use feature bits though.  I'm thinking about the unlinked
> list modifications...
> 
> > The fundamental change is to move from a header per extent in the
> > attribute to a header per filesytem block in the attribute. This
> > means there are more header blocks and the parsing of the attribute
> > data is slightly more complex, but it has the advantage that we
> > always know the size of the attribute on disk based on the length of
> > the data it contains.
> > 
> > This is where the header-per-extent method has problems. We don't
> > know the size of the attribute on disk without first knowing how
> > many extents are used to hold it. And we can't tell from a
> > mapping lookup, either, because remote attributes can be allocated
> > contiguously with other attribute blocks and so there is no obvious
> > way of determining the actual size of the atribute on disk short of
> > walking and mapping buffers.
> 
> You can't tell exactly how many extents will be required to store the attribute
> until after they've been allocated.  At least... not until you're down to
> trying to allocate the last block.
> 
> > The problem with this approach is that if we map a buffer
> > incorrectly (e.g. we make the last buffer for the attribute data too
> > long), we then get buffer cache lookup failure when we map it
> > correctly. i.e. we get a size mismatch on lookup.
> 
> Ah.  The resulting block count of a fragmented remote attribute is not
> reflected in 'valuelen' in the remote attribute entry, so the number of headers
> to account for is unknown.
> 
> typedef struct xfs_attr_leaf_name_remote {
> 	__be32  valueblk;               /* block number of value bytes */
> 	__be32  valuelen;               /* number of bytes in value */
> 	__u8    namelen;                /* length of name bytes */
> 	__u8    name[1];                /* name bytes */
>  } xfs_attr_leaf_name_remote_t;
> 
> Other options:
> 
> valuelen could be changed to a block count.  Now that you have offset and
> payload length in the header you can get valuelen from there on the last
> extent.

Which is an on-disk format change.

If valuelen is stored a block count, how do you know how long the
attribute data is in bytes? You don't - it's not stored on disk
anymore in the the attribute entry. How do you return the length
of the attribute to userspace on a lookup without now having to read
the data blocks directly?

> valuelen could be split in two, half is the block count and the other half is
> the payload length in the last extent.

Also an on disk format change. And it doesn't solve the problem, as
you still can't work out the attribute data length because you don't
know how many extents (and therefore headers) make up that block
count.

And both of these mean that all the remote attr code needs to handle
the remote attribute data length information differently. i.e.
Handle one type of indexing for existing filesystems and a different
type of indexing for v5 superblocks. 

It just adds more complexity to what is already complex and fragile.

> > This is not
> > necessarily fatal, but it's a cache coherency problem that can lead
> > to returning the wrong data to userspace or writing the wrong data
> > to disk. And debug kernels will assert fail if this occurs.
> 
> What data was being exposed?  The data after the attribute?  I thought testing
> for valuelen would take care of that.

Stale data. e.g. we do an attribute write and get a buffer
length of 16 blocks at bno X. we then do an attribute read, and get
a length of 8 blocks at bno X because we've failed to calculate the
length correctly.

A debug kernel will assert fail in _xfs_buf_find here because of the
length mismatch. A production kernel may fail to find a match, and
if it does fail, then we'll read from bno X to fill the buffer. So
what is returned is not attribute data but whatever stale data is on
disk at bno X....

> > +	ASSERT(len >= XFS_LBSIZE(mp));
> > +
> > +	while (len > 0 && *valuelen > 0) {
> > +		int hdr_size;
> > +		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, XFS_LBSIZE(mp));
> > +
> > +		byte_cnt = min(*valuelen, byte_cnt);
> 			   ^^^
> 
> You used 'min_t(int,...)' above and min() here.  Any reason?

Both are of the same type, so min is appropriate. I'll fix it up.

> > -	xfs_mount_t *mp;
> > -	xfs_bmbt_irec_t map;
> > -	xfs_buf_t *bp;
> > -	xfs_daddr_t dblkno;
> > -	xfs_dablk_t lblkno;
> > -	int valuelen, blkcnt, nmap, error, done, committed;
> > +	struct xfs_mount	*mp = args->dp->i_mount;
> > +	xfs_dablk_t		lblkno;
> > +	int			blkcnt;
> > +	int			error;
> > +	int			done;
> >  
> >  	trace_xfs_attr_rmtval_remove(args);
> >  
> > -	mp = args->dp->i_mount;
> > -
> >  	/*
> >  	 * Roll through the "value", invalidating the attribute value's blocks.
> >  	 * Note that args->rmtblkcnt is the minimum number of data blocks we'll
> 
> I believe this comment is now out of date and needs to be updated.

Yup, good catch. Will fix.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [patch 0/2] xfs: yet more fixes for 3.10-rc3
  2013-05-23 12:30 ` [PATCH 0/2] xfs: more " Dave Chinner
  2013-05-23 12:30   ` [PATCH 1/2] xfs: rework dquot CRCs Dave Chinner
  2013-05-23 12:30   ` [PATCH 2/2] xfs: fix split buffer vector log recovery support Dave Chinner
@ 2013-05-24  8:58   ` Dave Chinner
  2013-05-24  8:58     ` [PATCH 1/2] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
  2013-05-24  8:58     ` [PATCH 2/2] xfs: kill suid/sgid through the truncate path Dave Chinner
  2 siblings, 2 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-24  8:58 UTC (permalink / raw)
  To: xfs; +Cc: bpm

The first disabled swap extents for CRC enabled filesystems as it
"corrupts" owner information in the BMBT blocks. The second fixes
the suid/sgid truncate issue that Dave Jones reported. This one is
definitely a stable kernel candidate.

-dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/2] xfs: disable swap extents ioctl on CRC enabled filesystems
  2013-05-24  8:58   ` [patch 0/2] xfs: yet more fixes for 3.10-rc3 Dave Chinner
@ 2013-05-24  8:58     ` Dave Chinner
  2013-05-24  8:58     ` [PATCH 2/2] xfs: kill suid/sgid through the truncate path Dave Chinner
  1 sibling, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2013-05-24  8:58 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

Currently, swapping extents from one inode to another is a simple
act of switching data and attribute forks from one inode to another.
This, unfortunately in no longer so simple with CRC enabled
filesystems as there is owner information embedded into the BMBT
blocks that are swapped between inodes. Hence swapping the forks
between inodes results in the inodes having mapping blocks that
point to the wrong owner and hence are considered corrupt.

To fix this we need an extent tree block or record based swap
algorithm so that the BMBT block owner information can be updated
atomically in the swap transaction. This is a significant piece of
new work, so for the moment simply don't allow swap extent
operations to succeed on CRC enabled filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dfrag.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
index f852b08..c407e1c 100644
--- a/fs/xfs/xfs_dfrag.c
+++ b/fs/xfs/xfs_dfrag.c
@@ -219,6 +219,14 @@ xfs_swap_extents(
 	int		taforkblks = 0;
 	__uint64_t	tmp;
 
+	/*
+	 * We have no way of updating owner information in the BMBT blocks for
+	 * each inode on CRC enabled filesystems, so to avoid corrupting the
+	 * this metadata we simply don't allow extent swaps to occur.
+	 */
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		return XFS_ERROR(EINVAL);
+
 	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
 	if (!tempifp) {
 		error = XFS_ERROR(ENOMEM);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 2/2] xfs: kill suid/sgid through the truncate path.
  2013-05-24  8:58   ` [patch 0/2] xfs: yet more fixes for 3.10-rc3 Dave Chinner
  2013-05-24  8:58     ` [PATCH 1/2] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
@ 2013-05-24  8:58     ` Dave Chinner
  2013-05-24 10:02       ` Christoph Hellwig
  1 sibling, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-24  8:58 UTC (permalink / raw)
  To: xfs; +Cc: bpm

From: Dave Chinner <dchinner@redhat.com>

XFS has failed to kill suid/sgid bits correctly when truncating
files of non-zero size since commit c4ed4243 ("xfs: split
xfs_setattr") introduced in the 3.1 kernel. Fix it.

Fix it.

cc: stable kernel <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iops.c |   43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d82efaa..c255382 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -455,6 +455,24 @@ xfs_vn_getattr(
 	return 0;
 }
 
+static void
+xfs_setattr_mode(
+	struct inode	*inode,
+	struct iattr	*iattr)
+{
+	struct xfs_inode *ip = XFS_I(inode);
+	umode_t		mode = iattr->ia_mode;
+
+	if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+		mode &= ~S_ISGID;
+
+	ip->i_d.di_mode &= S_IFMT;
+	ip->i_d.di_mode |= mode & ~S_IFMT;
+
+	inode->i_mode &= S_IFMT;
+	inode->i_mode |= mode & ~S_IFMT;
+}
+
 int
 xfs_setattr_nonsize(
 	struct xfs_inode	*ip,
@@ -606,18 +624,8 @@ xfs_setattr_nonsize(
 	/*
 	 * Change file access modes.
 	 */
-	if (mask & ATTR_MODE) {
-		umode_t mode = iattr->ia_mode;
-
-		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
-			mode &= ~S_ISGID;
-
-		ip->i_d.di_mode &= S_IFMT;
-		ip->i_d.di_mode |= mode & ~S_IFMT;
-
-		inode->i_mode &= S_IFMT;
-		inode->i_mode |= mode & ~S_IFMT;
-	}
+	if (mask & ATTR_MODE)
+		xfs_setattr_mode(inode, iattr);
 
 	/*
 	 * Change file access or modified times.
@@ -714,9 +722,8 @@ xfs_setattr_size(
 		return XFS_ERROR(error);
 
 	ASSERT(S_ISREG(ip->i_d.di_mode));
-	ASSERT((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
-			ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
-			ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
+	ASSERT((mask & (ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
+			ATTR_MTIME_SET|ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
 
 	if (!(flags & XFS_ATTR_NOLOCK)) {
 		lock_flags |= XFS_IOLOCK_EXCL;
@@ -860,6 +867,12 @@ xfs_setattr_size(
 		xfs_inode_clear_eofblocks_tag(ip);
 	}
 
+	/*
+	 * Change file access modes.
+	 */
+	if (mask & ATTR_MODE)
+		xfs_setattr_mode(inode, iattr);
+
 	if (mask & ATTR_CTIME) {
 		inode->i_ctime = iattr->ia_ctime;
 		ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/2] xfs: kill suid/sgid through the truncate path.
  2013-05-24  8:58     ` [PATCH 2/2] xfs: kill suid/sgid through the truncate path Dave Chinner
@ 2013-05-24 10:02       ` Christoph Hellwig
  2013-05-24 10:07         ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2013-05-24 10:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: bpm, xfs

On Fri, May 24, 2013 at 06:58:08PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> XFS has failed to kill suid/sgid bits correctly when truncating
> files of non-zero size since commit c4ed4243 ("xfs: split
> xfs_setattr") introduced in the 3.1 kernel. Fix it.

This should get a testcase in xfstests.

> +xfs_setattr_mode(
> +	struct inode	*inode,
> +	struct iattr	*iattr)
> +{
> +	struct xfs_inode *ip = XFS_I(inode);
> +	umode_t		mode = iattr->ia_mode;
> +
> +	if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> +		mode &= ~S_ISGID;
> +
> +	ip->i_d.di_mode &= S_IFMT;
> +	ip->i_d.di_mode |= mode & ~S_IFMT;
> +
> +	inode->i_mode &= S_IFMT;
> +	inode->i_mode |= mode & ~S_IFMT;

This function should have assers that the xfs_inode is locked
exclusively and joined to a transaction.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/2] xfs: kill suid/sgid through the truncate path.
  2013-05-24 10:02       ` Christoph Hellwig
@ 2013-05-24 10:07         ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2013-05-24 10:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: bpm, xfs

On Fri, May 24, 2013 at 06:02:46AM -0400, Christoph Hellwig wrote:
> On Fri, May 24, 2013 at 06:58:08PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > XFS has failed to kill suid/sgid bits correctly when truncating
> > files of non-zero size since commit c4ed4243 ("xfs: split
> > xfs_setattr") introduced in the 3.1 kernel. Fix it.
> 
> This should get a testcase in xfstests.

Ok, saw the testcase in the other thread.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/11] xfs: fixes for 3.10-rc3
  2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
                   ` (12 preceding siblings ...)
  2013-05-23 12:30 ` [PATCH 0/2] xfs: more " Dave Chinner
@ 2013-05-24 18:29 ` Ben Myers
  13 siblings, 0 replies; 49+ messages in thread
From: Ben Myers @ 2013-05-24 18:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:01:59PM +1000, Dave Chinner wrote:
> This is my current kernel bug fix patch series. I've updated it
> against a current xfsdev tree, and contains all the fixes mentioned
> in the "fixes for 3.10-rc2 (updated)" thread. The first 7 patches
> are patches from that series. The last 4 are new patches.
> 
> The first new patch stops CRC enabled filesystems from spamming the
> log. It currently emits an "Experimental" warning ever time the
> superblock is written, which is typically every 30s.
> 
> The second path ("rework remote attr CRCs") is the changes I
> mentioned in the "fixes for 3.10-rc2 (updated)" thread. The code is
> far more robust as a result of these changes, and I think we really
> need to change the format as done in this patch. Once we have
> decided on the way forward, I'll port this to userspace.
> 
> The third patch fixes a remote symlink problem - I didn't hit this
> until I'd redone the remote attr CRCs and the 1k block size
> filesystem testing made it passed the attribute tests it was failing
> on.
> 
> Finally, the last patch is another on-disk format change - one that
> removes the 25 entry limit on ACLs. It doesn't invalidate anything
> that is already on disk, just allows ACLs on v5 superblock
> filesystems to store more than 25 ACLs in an xattr. In fact, it
> allows (65536 - 4) / 12 = 5461 entries to be stored in a single
> ACL, so I don't see anyone running out on v5 superblocks....

Pulled in patches 3-7, and 9.  Still reviewing the others.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/11] xfs: fix incorrect remote symlink block count
  2013-05-21  8:02 ` [PATCH 10/11] xfs: fix incorrect remote symlink block count Dave Chinner
@ 2013-05-24 20:36   ` Ben Myers
  2013-05-24 20:39     ` Ben Myers
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-24 20:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 21, 2013 at 06:02:09PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When CRCs are enabled, the number of blocks needed to hold a remote
> symlink on a 1k block size filesystem may be 2 instead of 1. The
> transaction reservation for the allocated bloks was not taking this
					    blocks
> into account and only allocating one block. hence when trying to
					      H

fixed.

> read or invalidate such symlinks, we are mapping a hole where there
> should be a block and things go bad at that point.
> 
> Fix the reservation to use the correct block count, clean up the
> block count calculation similar to the remote attribute calculation,
> and add a debug guard to detect when we don't write the entire
> symlink to disk.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_symlink.c |   20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index 5f234389..195a403 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -56,16 +56,9 @@ xfs_symlink_blocks(
>  	struct xfs_mount *mp,
>  	int		pathlen)
>  {
> -	int		fsblocks = 0;
> -	int		len = pathlen;
> +	int buflen = XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
>  
> -	do {
> -		fsblocks++;
> -		len -= XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
> -	} while (len > 0);
> -
> -	ASSERT(fsblocks <= XFS_SYMLINK_MAPS);
> -	return fsblocks;
> +	return (pathlen + buflen - 1) / buflen;

Nice to get rid of that loop.

>  }
>  
>  static int
> @@ -405,7 +398,7 @@ xfs_symlink(
>  	if (pathlen <= XFS_LITINO(mp, dp->i_d.di_version))
>  		fs_blocks = 0;
>  	else
> -		fs_blocks = XFS_B_TO_FSB(mp, pathlen);
> +		fs_blocks = xfs_symlink_blocks(mp, pathlen);
>  	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
>  	error = xfs_trans_reserve(tp, resblks, XFS_SYMLINK_LOG_RES(mp), 0,
>  			XFS_TRANS_PERM_LOG_RES, XFS_SYMLINK_LOG_COUNT);
> @@ -512,7 +505,7 @@ xfs_symlink(
>  		cur_chunk = target_path;
>  		offset = 0;
>  		for (n = 0; n < nmaps; n++) {
> -			char *buf;
> +			char	*buf;
>  
>  			d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
>  			byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
> @@ -525,9 +518,7 @@ xfs_symlink(
>  			bp->b_ops = &xfs_symlink_buf_ops;
>  
>  			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> -			if (pathlen < byte_cnt) {
> -				byte_cnt = pathlen;
> -			}
> +			byte_cnt = min(byte_cnt, pathlen);

The min is necessary due since we can have up to three extents in here,
according to the comment above the define for XFS_SYMLINK_MAPS.  So byte_cnt
can be less than pathlen.

Looks good.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/11] xfs: fix incorrect remote symlink block count
  2013-05-24 20:36   ` Ben Myers
@ 2013-05-24 20:39     ` Ben Myers
  2013-05-24 23:41       ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Ben Myers @ 2013-05-24 20:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, May 24, 2013 at 03:36:33PM -0500, Ben Myers wrote:
> On Tue, May 21, 2013 at 06:02:09PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When CRCs are enabled, the number of blocks needed to hold a remote
> > symlink on a 1k block size filesystem may be 2 instead of 1. The
> > transaction reservation for the allocated bloks was not taking this
> 					    blocks
> > into account and only allocating one block. hence when trying to
> 					      H
> 
> fixed.
> 
> > read or invalidate such symlinks, we are mapping a hole where there
> > should be a block and things go bad at that point.
> > 
> > Fix the reservation to use the correct block count, clean up the
> > block count calculation similar to the remote attribute calculation,
> > and add a debug guard to detect when we don't write the entire
> > symlink to disk.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_symlink.c |   20 ++++++--------------
> >  1 file changed, 6 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> > index 5f234389..195a403 100644
> > --- a/fs/xfs/xfs_symlink.c
> > +++ b/fs/xfs/xfs_symlink.c
> > @@ -56,16 +56,9 @@ xfs_symlink_blocks(
> >  	struct xfs_mount *mp,
> >  	int		pathlen)
> >  {
> > -	int		fsblocks = 0;
> > -	int		len = pathlen;
> > +	int buflen = XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
> >  
> > -	do {
> > -		fsblocks++;
> > -		len -= XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
> > -	} while (len > 0);
> > -
> > -	ASSERT(fsblocks <= XFS_SYMLINK_MAPS);
> > -	return fsblocks;
> > +	return (pathlen + buflen - 1) / buflen;
> 
> Nice to get rid of that loop.
> 
> >  }
> >  
> >  static int
> > @@ -405,7 +398,7 @@ xfs_symlink(
> >  	if (pathlen <= XFS_LITINO(mp, dp->i_d.di_version))
> >  		fs_blocks = 0;
> >  	else
> > -		fs_blocks = XFS_B_TO_FSB(mp, pathlen);
> > +		fs_blocks = xfs_symlink_blocks(mp, pathlen);
> >  	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
> >  	error = xfs_trans_reserve(tp, resblks, XFS_SYMLINK_LOG_RES(mp), 0,
> >  			XFS_TRANS_PERM_LOG_RES, XFS_SYMLINK_LOG_COUNT);
> > @@ -512,7 +505,7 @@ xfs_symlink(
> >  		cur_chunk = target_path;
> >  		offset = 0;
> >  		for (n = 0; n < nmaps; n++) {
> > -			char *buf;
> > +			char	*buf;
> >  
> >  			d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
> >  			byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
> > @@ -525,9 +518,7 @@ xfs_symlink(
> >  			bp->b_ops = &xfs_symlink_buf_ops;
> >  
> >  			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> > -			if (pathlen < byte_cnt) {
> > -				byte_cnt = pathlen;
> > -			}
> > +			byte_cnt = min(byte_cnt, pathlen);
> 
> The min is necessary due since we can have up to three extents in here,
> according to the comment above the define for XFS_SYMLINK_MAPS.  So byte_cnt
> can be less than pathlen.

D'oh.  I wanted to point out that it looks like this issue with 'min' here
should be causing problems with symlinks on non-crc enabled filesystems.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/11] xfs: fix incorrect remote symlink block count
  2013-05-24 20:39     ` Ben Myers
@ 2013-05-24 23:41       ` Dave Chinner
  2013-05-25 15:16         ` Ben Myers
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2013-05-24 23:41 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, May 24, 2013 at 03:39:10PM -0500, Ben Myers wrote:
> On Fri, May 24, 2013 at 03:36:33PM -0500, Ben Myers wrote:
> > On Tue, May 21, 2013 at 06:02:09PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > When CRCs are enabled, the number of blocks needed to hold a remote
> > > symlink on a 1k block size filesystem may be 2 instead of 1. The
> > > transaction reservation for the allocated bloks was not taking this
> > 					    blocks
> > > into account and only allocating one block. hence when trying to
> > 					      H
> > 
> > fixed.
> > 
> > > read or invalidate such symlinks, we are mapping a hole where there
> > > should be a block and things go bad at that point.
> > > 
> > > Fix the reservation to use the correct block count, clean up the
> > > block count calculation similar to the remote attribute calculation,
> > > and add a debug guard to detect when we don't write the entire
> > > symlink to disk.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
...

> > > @@ -525,9 +518,7 @@ xfs_symlink(
> > >  			bp->b_ops = &xfs_symlink_buf_ops;
> > >  
> > >  			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> > > -			if (pathlen < byte_cnt) {
> > > -				byte_cnt = pathlen;
> > > -			}
> > > +			byte_cnt = min(byte_cnt, pathlen);
> > 
> > The min is necessary due since we can have up to three extents in here,
> > according to the comment above the define for XFS_SYMLINK_MAPS.  So byte_cnt
> > can be less than pathlen.
> 
> D'oh.  I wanted to point out that it looks like this issue with 'min' here
> should be causing problems with symlinks on non-crc enabled filesystems.

What issue might that be? The result is identical in either case, I
just converted this to min to be more concise and consistent with
the same code loops in the remote attr copyin/copyout.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/11] xfs: fix incorrect remote symlink block count
  2013-05-24 23:41       ` Dave Chinner
@ 2013-05-25 15:16         ` Ben Myers
  0 siblings, 0 replies; 49+ messages in thread
From: Ben Myers @ 2013-05-25 15:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Sat, May 25, 2013 at 09:41:44AM +1000, Dave Chinner wrote:
> On Fri, May 24, 2013 at 03:39:10PM -0500, Ben Myers wrote:
> > On Fri, May 24, 2013 at 03:36:33PM -0500, Ben Myers wrote:
> > > On Tue, May 21, 2013 at 06:02:09PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > When CRCs are enabled, the number of blocks needed to hold a remote
> > > > symlink on a 1k block size filesystem may be 2 instead of 1. The
> > > > transaction reservation for the allocated bloks was not taking this
> > > 					    blocks
> > > > into account and only allocating one block. hence when trying to
> > > 					      H
> > > 
> > > fixed.
> > > 
> > > > read or invalidate such symlinks, we are mapping a hole where there
> > > > should be a block and things go bad at that point.
> > > > 
> > > > Fix the reservation to use the correct block count, clean up the
> > > > block count calculation similar to the remote attribute calculation,
> > > > and add a debug guard to detect when we don't write the entire
> > > > symlink to disk.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ...
> 
> > > > @@ -525,9 +518,7 @@ xfs_symlink(
> > > >  			bp->b_ops = &xfs_symlink_buf_ops;
> > > >  
> > > >  			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> > > > -			if (pathlen < byte_cnt) {
> > > > -				byte_cnt = pathlen;
> > > > -			}
> > > > +			byte_cnt = min(byte_cnt, pathlen);
> > > 
> > > The min is necessary due since we can have up to three extents in here,
> > > according to the comment above the define for XFS_SYMLINK_MAPS.  So byte_cnt
> > > can be less than pathlen.
> > 
> > D'oh.  I wanted to point out that it looks like this issue with 'min' here
> > should be causing problems with symlinks on non-crc enabled filesystems.
> 
> What issue might that be? The result is identical in either case, I
> just converted this to min to be more concise and consistent with
> the same code loops in the remote attr copyin/copyout.

My impression was that if you have byte_cnt < pathlen we'll copy past the end
of the extent.  Looks like I was wrong.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2013-05-25 15:16 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-21  8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
2013-05-21  8:02 ` [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
2013-05-21 18:35   ` Ben Myers
2013-05-21  8:02 ` [PATCH 02/11] xfs: remote attribute allocation may be contiguous Dave Chinner
2013-05-21  8:02 ` [PATCH 03/11] xfs: remote attribute read too short Dave Chinner
2013-05-21 20:59   ` Ben Myers
2013-05-21 22:53     ` Dave Chinner
2013-05-21  8:02 ` [PATCH 04/11] xfs: remote attribute tail zeroing does too much Dave Chinner
2013-05-21 22:31   ` Ben Myers
2013-05-21 22:55     ` Dave Chinner
2013-05-21  8:02 ` [PATCH 05/11] xfs: correctly map remote attr buffers during removal Dave Chinner
2013-05-22 17:01   ` Ben Myers
2013-05-21  8:02 ` [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
2013-05-22 20:50   ` Ben Myers
2013-05-22 23:54     ` Dave Chinner
2013-05-23 22:51       ` Ben Myers
2013-05-21  8:02 ` [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
2013-05-22 21:59   ` Ben Myers
2013-05-22 23:58     ` Dave Chinner
2013-05-23 22:51       ` Ben Myers
2013-05-21  8:02 ` [PATCH 08/11] xfs: don't emit v5 superblock warnings on write Dave Chinner
2013-05-22 22:26   ` Ben Myers
2013-05-23  0:03     ` Dave Chinner
2013-05-23 15:23       ` Ben Myers
2013-05-23 23:13         ` Dave Chinner
2013-05-21  8:02 ` [PATCH 09/11] xfs: rework remote attr CRCs Dave Chinner
2013-05-23 21:54   ` Ben Myers
2013-05-23 23:35     ` Dave Chinner
2013-05-21  8:02 ` [PATCH 10/11] xfs: fix incorrect remote symlink block count Dave Chinner
2013-05-24 20:36   ` Ben Myers
2013-05-24 20:39     ` Ben Myers
2013-05-24 23:41       ` Dave Chinner
2013-05-25 15:16         ` Ben Myers
2013-05-21  8:02 ` [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
2013-05-21 14:00   ` Brian Foster
2013-05-21 20:27     ` Dave Chinner
2013-05-21 16:26 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
2013-05-21 20:24   ` Dave Chinner
2013-05-21 20:52     ` Ben Myers
2013-05-21 21:27       ` Dave Chinner
2013-05-23 12:30 ` [PATCH 0/2] xfs: more " Dave Chinner
2013-05-23 12:30   ` [PATCH 1/2] xfs: rework dquot CRCs Dave Chinner
2013-05-23 12:30   ` [PATCH 2/2] xfs: fix split buffer vector log recovery support Dave Chinner
2013-05-24  8:58   ` [patch 0/2] xfs: yet more fixes for 3.10-rc3 Dave Chinner
2013-05-24  8:58     ` [PATCH 1/2] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
2013-05-24  8:58     ` [PATCH 2/2] xfs: kill suid/sgid through the truncate path Dave Chinner
2013-05-24 10:02       ` Christoph Hellwig
2013-05-24 10:07         ` Christoph Hellwig
2013-05-24 18:29 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.