All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] xfs: fixes for 3.10-rc2 (update)
@ 2013-05-19 23:51 Dave Chinner
  2013-05-19 23:51 ` [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes Dave Chinner
                   ` (14 more replies)
  0 siblings, 15 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

Hi Ben,

THis is my current patch queue for 3.10-rc2. It contains the same first 6
patches as posted here:

http://oss.sgi.com/pipermail/xfs/2013-May/026366.html

This update adds another 8 patches that fix attribute bugs on CRC
enabled filesystems. These patches make the attr group in xfstests
pass rather than assert fail or oops.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 18:02   ` Brian Foster
  2013-05-19 23:51 ` [PATCH 02/14] xfs: fix rounding in xfs_free_file_space Dave Chinner
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

FSX on 512 byte block size filesystems has been failing for some
time with corrupted data. The fault dates back to the change in
the writeback data integrity algorithm that uses a mark-and-sweep
approach to avoid data writeback livelocks.

Unfortunately, a side effect of this mark-and-sweep approach is that
each page will only be written once for a data integrity sync, and
there is a condition in writeback in XFS where a page may require
two writeback attempts to be fully written. As a result of the high
level change, we now only get a partial page writeback during the
integrity sync because the first pass through writeback clears the
mark left on the page index to tell writeback that the page needs
writeback....

The cause is writing a partial page in the clustering code. This can
happen when a mapping boundary falls in the middle of a page - we
end up writing back the first part of the page that the mapping
covers, but then never revisit the page to have the remainder mapped
and written.

The fix is simple - if the mapping boundary falls inside a page,
then simple abort clustering without touching the page. This means
that the next ->writepage entry that write_cache_pages() will make
is the page we aborted on, and xfs_vm_writepage() will map all
sections of the page correctly. This behaviour is also optimal for
non-data integrity writes, as it results in contiguous sequential
writeback of the file rather than missing small holes and having to
write them a "random" writes in a future pass.

With this fix, all the fsx tests in xfstests now pass on a 512 byte
block size filesystem on a 4k page machine.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2b2691b..f04eceb 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -725,6 +725,25 @@ xfs_convert_page(
 			(xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
 			i_size_read(inode));
 
+	/*
+	 * If the current map does not span the entire page we are about to try
+	 * to write, then give up. The only way we can write a page that spans
+	 * multiple mappings in a single writeback iteration is via the
+	 * xfs_vm_writepage() function. Data integrity writeback requires the
+	 * entire page to be written in a single attempt, otherwise the part of
+	 * the page we don't write here doesn't get written as part of the data
+	 * integrity sync.
+	 *
+	 * For normal writeback, we also don't attempt to write partial pages
+	 * here as it simply means that write_cache_pages() will see it under
+	 * writeback and ignore the page until some pointin the future, at which
+	 * time this will be the only page inteh file that needs writeback.
+	 * Hence for more optimal IO patterns, we should always avoid partial
+	 * page writeback due to multiple mappings on a page here.
+	 */
+	if (!xfs_imap_valid(inode, imap, end_offset))
+		goto fail_unlock_page;
+
 	len = 1 << inode->i_blkbits;
 	p_offset = min_t(unsigned long, end_offset & (PAGE_CACHE_SIZE - 1),
 					PAGE_CACHE_SIZE);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 02/14] xfs: fix rounding in xfs_free_file_space
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
  2013-05-19 23:51 ` [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 18:03   ` Brian Foster
  2013-05-19 23:51 ` [PATCH 03/14] xfs: Don't reference the EFI after it is freed Dave Chinner
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The offset passed into xfs_free_file_space() needs to be rounded
down to a certain size, but the rounding mask is built by a 32 bit
variable. Hence the mask will always mask off the upper 32 bits of
the offset and lead to incorrect writeback and invalidation ranges.

This is not actually exposed as a bug because we writeback and
invalidate from the rounded offset to the end of the file, and hence
the offset we are actually punching a hole out of will always be
covered by the code. This needs fixing, however, if we ever want to
use exact ranges for writeback/invalidation here...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_vnodeops.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 1501f4f..0176bb2 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -1453,7 +1453,7 @@ xfs_free_file_space(
 	xfs_mount_t		*mp;
 	int			nimap;
 	uint			resblks;
-	uint			rounding;
+	xfs_off_t		rounding;
 	int			rt;
 	xfs_fileoff_t		startoffset_fsb;
 	xfs_trans_t		*tp;
@@ -1482,7 +1482,7 @@ xfs_free_file_space(
 		inode_dio_wait(VFS_I(ip));
 	}
 
-	rounding = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
+	rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
 	ioffset = offset & ~(rounding - 1);
 	error = -filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
 					      ioffset, -1);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 03/14] xfs: Don't reference the EFI after it is freed
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
  2013-05-19 23:51 ` [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes Dave Chinner
  2013-05-19 23:51 ` [PATCH 02/14] xfs: fix rounding in xfs_free_file_space Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 18:03   ` Brian Foster
  2013-05-19 23:51 ` [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Checking the EFI for whether it is being released from recovery
after we've already released the known active reference is a mistake
worthy of a brown paper bag. Fix the (now) obvious use after free
that it can cause.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_extfree_item.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index c0f3750..452920a 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -305,11 +305,12 @@ xfs_efi_release(xfs_efi_log_item_t	*efip,
 {
 	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
 	if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
-		__xfs_efi_release(efip);
-
 		/* recovery needs us to drop the EFI reference, too */
 		if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
 			__xfs_efi_release(efip);
+
+		__xfs_efi_release(efip);
+		/* efip may now have been freed, do not reference it again. */
 	}
 }
 
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (2 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 03/14] xfs: Don't reference the EFI after it is freed Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 18:03   ` Brian Foster
  2013-05-21  0:36   ` [PATCH 04/14 V2] " Dave Chinner
  2013-05-19 23:51 ` [PATCH 05/14] xfs: fix missing KM_NOFS tags to keep lockdep happy Dave Chinner
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Lockdep reports:

=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #3 Not tainted
---------------------------------------------
setquota/28368 is trying to acquire lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

but task is already holding lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
allocated.

xfs_qm_scall_setqlim() is starting a transaction and then not
passing it into xfs_qm_dqet() and so it starts it's own transaction
when allocating the dquot.  Splat!

Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
inside the setqlim transaction. This requires getting the dquot
first (and allocating it if necessary) then dropping and relocking
the dquot before joining it to the setqlim transaction.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_qm_syscalls.c |   35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index c41190c..dfa5c05 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
 	if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
 		return 0;
 
-	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
-	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
-				  0, 0, XFS_DEFAULT_LOG_COUNT);
-	if (error) {
-		xfs_trans_cancel(tp, 0);
-		return (error);
-	}
-
 	/*
 	 * We don't want to race with a quotaoff so take the quotaoff lock.
-	 * (We don't hold an inode lock, so there's nothing else to stop
-	 * a quotaoff from happening). (XXXThis doesn't currently happen
-	 * because we take the vfslock before calling xfs_qm_sysent).
+	 * We don't hold an inode lock, so there's nothing else to stop
+	 * a quotaoff from happening.
 	 */
 	mutex_lock(&q->qi_quotaofflock);
 
 	/*
-	 * Get the dquot (locked), and join it to the transaction.
-	 * Allocate the dquot if this doesn't exist.
+	 * Get the dquot (locked) before we start, as we need to do a
+	 * transaction to allocate it if it doesn't exist. Once we have the
+	 * dquot, unlock it so we can start the next transaction safely. We hold
+	 * a reference to the dquot, so it's safe to do this unlock/lock without
+	 * it being reclaimed in the mean time.
 	 */
-	if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
-		xfs_trans_cancel(tp, XFS_TRANS_ABORT);
+	error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
+	if (error) {
 		ASSERT(error != ENOENT);
 		goto out_unlock;
 	}
+	xfs_dqunlock(dqp);
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
+	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
+				  0, 0, XFS_DEFAULT_LOG_COUNT);
+	if (error) {
+		xfs_trans_cancel(tp, 0);
+		goto out_unlock;
+	}
+
+	xfs_dqlock(dqp);
 	xfs_trans_dqjoin(tp, dqp);
 	ddq = &dqp->q_core;
 
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 05/14] xfs: fix missing KM_NOFS tags to keep lockdep happy
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (3 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 21:16   ` Ben Myers
  2013-05-19 23:51 ` [PATCH 06/14] xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC Dave Chinner
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

There are several places where we use KM_SLEEP allocation contexts
and use the fact that there are called from transaction context to
add KM_NOFS where appropriate. Unfortunately, there are several
places where the code makes this assumption but can be called from
outside transaction context but with filesystem locks held. These
places need explicit KM_NOFS annotations to avoid lockdep
complaining about reclaim contexts.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c       |    2 +-
 fs/xfs/xfs_da_btree.c  |    6 ++++--
 fs/xfs/xfs_dir2_leaf.c |    2 +-
 fs/xfs/xfs_log_cil.c   |    2 +-
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 82b70bd..0d25542 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1649,7 +1649,7 @@ xfs_alloc_buftarg(
 {
 	xfs_buftarg_t		*btp;
 
-	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP);
+	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP | KM_NOFS);
 
 	btp->bt_mount = mp;
 	btp->bt_dev =  bdev->bd_dev;
diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c
index 9b26a99..41ea7e1 100644
--- a/fs/xfs/xfs_da_btree.c
+++ b/fs/xfs/xfs_da_btree.c
@@ -2464,7 +2464,8 @@ xfs_buf_map_from_irec(
 	ASSERT(nirecs >= 1);
 
 	if (nirecs > 1) {
-		map = kmem_zalloc(nirecs * sizeof(struct xfs_buf_map), KM_SLEEP);
+		map = kmem_zalloc(nirecs * sizeof(struct xfs_buf_map),
+				  KM_SLEEP | KM_NOFS);
 		if (!map)
 			return ENOMEM;
 		*mapp = map;
@@ -2520,7 +2521,8 @@ xfs_dabuf_map(
 		 * Optimize the one-block case.
 		 */
 		if (nfsb != 1)
-			irecs = kmem_zalloc(sizeof(irec) * nfsb, KM_SLEEP);
+			irecs = kmem_zalloc(sizeof(irec) * nfsb,
+					    KM_SLEEP | KM_NOFS);
 
 		nirecs = nfsb;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)bno, nfsb, irecs,
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index 721ba2f..da71a18 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1336,7 +1336,7 @@ xfs_dir2_leaf_getdents(
 				     mp->m_sb.sb_blocksize);
 	map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
 				(length * sizeof(struct xfs_bmbt_irec)),
-			       KM_SLEEP);
+			       KM_SLEEP | KM_NOFS);
 	map_info->map_size = length;
 
 	/*
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index e3d0b85..d0833b5 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -139,7 +139,7 @@ xlog_cil_prepare_log_vecs(
 
 		new_lv = kmem_zalloc(sizeof(*new_lv) +
 				niovecs * sizeof(struct xfs_log_iovec),
-				KM_SLEEP);
+				KM_SLEEP|KM_NOFS);
 
 		/* The allocated iovec region lies beyond the log vector. */
 		new_lv->lv_iovecp = (struct xfs_log_iovec *)&new_lv[1];
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 06/14] xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (4 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 05/14] xfs: fix missing KM_NOFS tags to keep lockdep happy Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 21:32   ` Ben Myers
  2013-05-19 23:51 ` [PATCH 07/14] xfs: xfs_attr_shortform_allfit() does not handle attr3 format Dave Chinner
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_da_btree.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c
index 41ea7e1..0b8b2a1 100644
--- a/fs/xfs/xfs_da_btree.c
+++ b/fs/xfs/xfs_da_btree.c
@@ -270,6 +270,7 @@ xfs_da3_node_read_verify(
 				break;
 			return;
 		case XFS_ATTR_LEAF_MAGIC:
+		case XFS_ATTR3_LEAF_MAGIC:
 			bp->b_ops = &xfs_attr3_leaf_buf_ops;
 			bp->b_ops->verify_read(bp);
 			return;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 07/14] xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (5 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 06/14] xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 21:52   ` Ben Myers
  2013-05-19 23:51 ` [PATCH 08/14] xfs: remote attribute allocation may be contiguous Dave Chinner
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfstests generic/117 fails with:

XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)

indicating a function that does not handle the attr3 format
correctly. Fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |   24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 08d5457..8eeb88f 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -931,20 +931,22 @@ xfs_attr_shortform_list(xfs_attr_list_context_t *context)
  */
 int
 xfs_attr_shortform_allfit(
-	struct xfs_buf	*bp,
-	struct xfs_inode *dp)
+	struct xfs_buf		*bp,
+	struct xfs_inode	*dp)
 {
-	xfs_attr_leafblock_t *leaf;
-	xfs_attr_leaf_entry_t *entry;
+	struct xfs_attr_leafblock *leaf;
+	struct xfs_attr_leaf_entry *entry;
 	xfs_attr_leaf_name_local_t *name_loc;
-	int bytes, i;
+	struct xfs_attr3_icleaf_hdr leafhdr;
+	int			bytes;
+	int			i;
 
 	leaf = bp->b_addr;
-	ASSERT(leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
+	xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
+	entry = xfs_attr3_leaf_entryp(leaf);
 
-	entry = &leaf->entries[0];
 	bytes = sizeof(struct xfs_attr_sf_hdr);
-	for (i = 0; i < be16_to_cpu(leaf->hdr.count); entry++, i++) {
+	for (i = 0; i < leafhdr.count; entry++, i++) {
 		if (entry->flags & XFS_ATTR_INCOMPLETE)
 			continue;		/* don't copy partial entries */
 		if (!(entry->flags & XFS_ATTR_LOCAL))
@@ -954,15 +956,15 @@ xfs_attr_shortform_allfit(
 			return(0);
 		if (be16_to_cpu(name_loc->valuelen) >= XFS_ATTR_SF_ENTSIZE_MAX)
 			return(0);
-		bytes += sizeof(struct xfs_attr_sf_entry)-1
+		bytes += sizeof(struct xfs_attr_sf_entry) - 1
 				+ name_loc->namelen
 				+ be16_to_cpu(name_loc->valuelen);
 	}
 	if ((dp->i_mount->m_flags & XFS_MOUNT_ATTR2) &&
 	    (dp->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
 	    (bytes == sizeof(struct xfs_attr_sf_hdr)))
-		return(-1);
-	return(xfs_attr_shortform_bytesfit(dp, bytes));
+		return -1;
+	return xfs_attr_shortform_bytesfit(dp, bytes);
 }
 
 /*
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 08/14] xfs: remote attribute allocation may be contiguous
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (6 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 07/14] xfs: xfs_attr_shortform_allfit() does not handle attr3 format Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 19:03   ` Brian Foster
  2013-05-19 23:51 ` [PATCH 09/14] xfs: remote attribute lookups require the value length Dave Chinner
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When CRCs are enabled, there may be multiple allocations made if the
headers cause a length overflow. This, however, does not mean that
the number of headers required increases, as the second and
subsequent extents may be contiguous with the previous extent. Hence
when we map the extents to write the attribute data, we may end up
with less extents than allocations made. Hence the assertion that we
consume th enumber of headers we calculated in the allocation loop
is incorrect and needs to be removed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index dee8446..aad95b0 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -359,6 +359,11 @@ xfs_attr_rmtval_set(
 		 * into requiring more blocks. e.g. for 512 byte blocks, we'll
 		 * spill for another block every 9 headers we require in this
 		 * loop.
+		 *
+		 * Note that this can result in contiguous allocation of blocks,
+		 * so we don't use all the space we allocate for headers as we
+		 * have one less header for each contiguous allocation that
+		 * occurs in the map/write loop below.
 		 */
 		if (crcs && blkcnt == 0) {
 			int total_len;
@@ -439,7 +444,6 @@ xfs_attr_rmtval_set(
 		lblkno += map.br_blockcount;
 	}
 	ASSERT(valuelen == 0);
-	ASSERT(hdrcnt == 0);
 	return 0;
 }
 
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 09/14] xfs: remote attribute lookups require the value length
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (7 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 08/14] xfs: remote attribute allocation may be contiguous Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 22:15   ` Ben Myers
  2013-05-19 23:51 ` [PATCH 10/14] xfs: remote attribute read too short Dave Chinner
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC enable filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 8eeb88f..0bce1b3 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -2332,9 +2332,10 @@ xfs_attr3_leaf_lookup_int(
 			if (!xfs_attr_namesp_match(args->flags, entry->flags))
 				continue;
 			args->index = probe;
+			args->valuelen = be32_to_cpu(name_rmt->valuelen);
 			args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
 			args->rmtblkcnt = XFS_B_TO_FSB(args->dp->i_mount,
-						   be32_to_cpu(name_rmt->valuelen));
+						       args->valuelen);
 			return XFS_ERROR(EEXIST);
 		}
 	}
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 10/14] xfs: remote attribute read too short
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (8 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 09/14] xfs: remote attribute lookups require the value length Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 23:00   ` Ben Myers
  2013-05-19 23:51 ` [PATCH 11/14] xfs: remote attribute tail zeroing does too much Dave Chinner
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Reading a maximally size remote attribute fails when CRCs are
enabled with this verification error:

XFS (vdb): remote attribute header does not match required off/len/owner)

There are two reasons for this, the first being that the
length of the buffer being read is determined from the
args->rmtblkcnt which doesn't take into account CRC headers. Hence
the mapped length ends up being too short and so we need to
calculate it directly from the value length.

The second is that the byte count of valid data within a buffer is
capped by the length of the data and so doesn't take into account
that the buffer might be longer due to headers. Hence we need to
calculate the data space in the buffer first before calculating the
actual byte count of data.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index aad95b0..bcdc07c 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -52,9 +52,11 @@ xfs_attr3_rmt_blocks(
 	struct xfs_mount *mp,
 	int		attrlen)
 {
-	int		buflen = XFS_ATTR3_RMT_BUF_SPACE(mp,
-							 mp->m_sb.sb_blocksize);
-	return (attrlen + buflen - 1) / buflen;
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+		return (attrlen + buflen - 1) / buflen;
+	}
+	return XFS_B_TO_FSB(mp, attrlen);
 }
 
 static bool
@@ -206,8 +208,9 @@ xfs_attr_rmtval_get(
 
 	while (valuelen > 0) {
 		nmap = ATTR_RMTVALUE_MAPSIZE;
+		blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
 		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
-				       args->rmtblkcnt, map, &nmap,
+				       blkcnt, map, &nmap,
 				       XFS_BMAPI_ATTRFORK);
 		if (error)
 			return error;
@@ -227,8 +230,8 @@ xfs_attr_rmtval_get(
 			if (error)
 				return error;
 
-			byte_cnt = min_t(int, valuelen, BBTOB(bp->b_length));
-			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, byte_cnt);
+			byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
+			byte_cnt = min_t(int, valuelen, byte_cnt);
 
 			src = bp->b_addr;
 			if (xfs_sb_version_hascrc(&mp->m_sb)) {
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/14] xfs: remote attribute tail zeroing does too much
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (9 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 10/14] xfs: remote attribute read too short Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 23:01   ` Ben Myers
  2013-05-19 23:51 ` [PATCH 12/14] xfs: correctly map remote attr buffers during removal Dave Chinner
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When an attribute data does not fill then entire remote block, we
zero the remaining part of the buffer. This, however, needs to take
into account that the buffer has a header, and so the offset where
zeroing starts and the length of zeroing need to take this into
account. Otherwise we end up with zeros over the end of the
attribute value when CRCs are enabled.

While there, make sure we only ask to map an extent that covers the
remaining range of the attribute, rather than asking every time for
the full length of remote data. If the remote attribute blocks are
contiguous with other parts of the attribute tree, it will map those
blocks as well and we can potentially zero them incorrectly. We can
also get buffer size mistmatches when trying to read or remove the
remote attribute, and this can lead to not finding the correct
buffer when looking it up in cache.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |   37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index bcdc07c..e207bf0 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -296,10 +296,7 @@ xfs_attr_rmtval_set(
 	 * and we may not need that many, so we have to handle this when
 	 * allocating the blocks below. 
 	 */
-	if (!crcs)
-		blkcnt = XFS_B_TO_FSB(mp, args->valuelen);
-	else
-		blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
+	blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
 
 	error = xfs_bmap_first_unused(args->trans, args->dp, blkcnt, &lfileoff,
 						   XFS_ATTR_FORK);
@@ -394,8 +391,11 @@ xfs_attr_rmtval_set(
 	 */
 	lblkno = args->rmtblkno;
 	valuelen = args->valuelen;
+	blkcnt = args->rmtblkcnt;
 	while (valuelen > 0) {
 		int	byte_cnt;
+		int	hdr_size;
+		int	dblkcnt;
 		char	*buf;
 
 		/*
@@ -404,7 +404,7 @@ xfs_attr_rmtval_set(
 		xfs_bmap_init(args->flist, args->firstblock);
 		nmap = 1;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)lblkno,
-				       args->rmtblkcnt, &map, &nmap,
+				       blkcnt, &map, &nmap,
 				       XFS_BMAPI_ATTRFORK);
 		if (error)
 			return(error);
@@ -413,26 +413,25 @@ xfs_attr_rmtval_set(
 		       (map.br_startblock != HOLESTARTBLOCK));
 
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
-		blkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
+		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
-		bp = xfs_buf_get(mp->m_ddev_targp, dblkno, blkcnt, 0);
+		bp = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, 0);
 		if (!bp)
 			return ENOMEM;
 		bp->b_ops = &xfs_attr3_rmt_buf_ops;
-
-		byte_cnt = BBTOB(bp->b_length);
-		byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, byte_cnt);
-		if (valuelen < byte_cnt)
-			byte_cnt = valuelen;
-
 		buf = bp->b_addr;
-		buf += xfs_attr3_rmt_hdr_set(mp, dp->i_ino, offset,
+
+		byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, BBTOB(bp->b_length));
+		byte_cnt = min_t(int, valuelen, byte_cnt);
+		hdr_size = xfs_attr3_rmt_hdr_set(mp, dp->i_ino, offset,
 					     byte_cnt, bp);
-		memcpy(buf, src, byte_cnt);
+		ASSERT(hdr_size + byte_cnt <= BBTOB(bp->b_length));
 
-		if (byte_cnt < BBTOB(bp->b_length))
-			xfs_buf_zero(bp, byte_cnt,
-				     BBTOB(bp->b_length) - byte_cnt);
+		memcpy(buf + hdr_size, src, byte_cnt);
+
+		if (byte_cnt + hdr_size < BBTOB(bp->b_length))
+			xfs_buf_zero(bp, byte_cnt + hdr_size,
+				     BBTOB(bp->b_length) - byte_cnt - hdr_size);
 
 		error = xfs_bwrite(bp);	/* GROT: NOTE: synchronous write */
 		xfs_buf_relse(bp);
@@ -442,9 +441,9 @@ xfs_attr_rmtval_set(
 		src += byte_cnt;
 		valuelen -= byte_cnt;
 		offset += byte_cnt;
-		hdrcnt--;
 
 		lblkno += map.br_blockcount;
+		blkcnt -= map.br_blockcount;
 	}
 	ASSERT(valuelen == 0);
 	return 0;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 12/14] xfs: correctly map remote attr buffers during removal
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (10 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 11/14] xfs: remote attribute tail zeroing does too much Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-19 23:51 ` [PATCH 13/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

If we don't map the buffers correctly (same as for get/set
operations) then the incore buffer lookup will fail. If a block
number matches but a length is wrong, then debug kernels will ASSERT
fail in _xfs_buf_find() due to the length mismatch. Ensure that we
map the buffers correctly by basing the length of the buffer on the
attribute data length rather than the remote block count.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_remote.c |   27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index e207bf0..d8bcb2d 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -468,19 +468,25 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
 	mp = args->dp->i_mount;
 
 	/*
-	 * Roll through the "value", invalidating the attribute value's
-	 * blocks.
+	 * Roll through the "value", invalidating the attribute value's blocks.
+	 * Note that args->rmtblkcnt is the minimum number of data blocks we'll
+	 * see for a CRC enabled remote attribute. Each extent will have a
+	 * header, and so we may have more blocks than we realise here.  If we
+	 * fail to map the blocks correctly, we'll have problems with the buffer
+	 * lookups.
 	 */
 	lblkno = args->rmtblkno;
-	valuelen = args->rmtblkcnt;
+	valuelen = args->valuelen;
+	blkcnt = xfs_attr3_rmt_blocks(mp, valuelen);
 	while (valuelen > 0) {
+		int dblkcnt;
+
 		/*
 		 * Try to remember where we decided to put the value.
 		 */
 		nmap = 1;
 		error = xfs_bmapi_read(args->dp, (xfs_fileoff_t)lblkno,
-				       args->rmtblkcnt, &map, &nmap,
-				       XFS_BMAPI_ATTRFORK);
+				       blkcnt, &map, &nmap, XFS_BMAPI_ATTRFORK);
 		if (error)
 			return(error);
 		ASSERT(nmap == 1);
@@ -488,28 +494,31 @@ xfs_attr_rmtval_remove(xfs_da_args_t *args)
 		       (map.br_startblock != HOLESTARTBLOCK));
 
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
-		blkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
+		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
 		/*
 		 * If the "remote" value is in the cache, remove it.
 		 */
-		bp = xfs_incore(mp->m_ddev_targp, dblkno, blkcnt, XBF_TRYLOCK);
+		bp = xfs_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
 		if (bp) {
 			xfs_buf_stale(bp);
 			xfs_buf_relse(bp);
 			bp = NULL;
 		}
 
-		valuelen -= map.br_blockcount;
+		valuelen -= XFS_ATTR3_RMT_BUF_SPACE(mp,
+					XFS_FSB_TO_B(mp, map.br_blockcount));
 
 		lblkno += map.br_blockcount;
+		blkcnt -= map.br_blockcount;
+		blkcnt = max(blkcnt, xfs_attr3_rmt_blocks(mp, valuelen));
 	}
 
 	/*
 	 * Keep de-allocating extents until the remote-value region is gone.
 	 */
+	blkcnt = lblkno - args->rmtblkno;
 	lblkno = args->rmtblkno;
-	blkcnt = args->rmtblkcnt;
 	done = 0;
 	while (!done) {
 		xfs_bmap_init(args->flist, args->firstblock);
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 13/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (11 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 12/14] xfs: correctly map remote attr buffers during removal Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-19 23:51 ` [PATCH 14/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
  2013-05-20 19:37 ` [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Ben Myers
  14 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
the entries in two leaves when the destination leaf requires
compaction. The temporary buffer ends up being copied back over the
original destination buffer, so the header in the temporary buffer
needs to contain all the information that is in the destination
buffer.

To make sure the temporary buffer is fully initialised, once we've
set up the temporary incore header appropriately, write is back to
the temporary buffer before starting to move entries around.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 0bce1b3..79ece72 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -2181,14 +2181,24 @@ xfs_attr3_leaf_unbalance(
 		struct xfs_attr_leafblock *tmp_leaf;
 		struct xfs_attr3_icleaf_hdr tmphdr;
 
-		tmp_leaf = kmem_alloc(state->blocksize, KM_SLEEP);
-		memset(tmp_leaf, 0, state->blocksize);
-		memset(&tmphdr, 0, sizeof(tmphdr));
+		tmp_leaf = kmem_zalloc(state->blocksize, KM_SLEEP);
+
+		/*
+		 * Copy the header into the temp leaf so that all the stuff
+		 * not in the incore header is present and gets copied back in
+		 * once we've moved all the entries.
+		 */
+		memcpy(tmp_leaf, save_leaf, xfs_attr3_leaf_hdr_size(save_leaf));
 
+		memset(&tmphdr, 0, sizeof(tmphdr));
 		tmphdr.magic = savehdr.magic;
 		tmphdr.forw = savehdr.forw;
 		tmphdr.back = savehdr.back;
 		tmphdr.firstused = state->blocksize;
+
+		/* write the header to the temp buffer to initialise it */
+		xfs_attr3_leaf_hdr_to_disk(tmp_leaf, &tmphdr);
+
 		if (xfs_attr3_leaf_order(save_blk->bp, &savehdr,
 					 drop_blk->bp, &drophdr)) {
 			xfs_attr3_leaf_moveents(drop_leaf, &drophdr, 0,
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 14/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (12 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 13/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
@ 2013-05-19 23:51 ` Dave Chinner
  2013-05-20 19:37 ` [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Ben Myers
  14 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-19 23:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
the entries in a leaf. It copies the the original buffer into the
temporary buffer, then zeros the original buffer completely. It then
copies the entries back into the original buffer.  However, the
original buffer has not been correctly initialised, and so the
movement of the entries goes horribly wrong.

Make sure the zeroed destination buffer is fully initialised, and
once we've set up the destination incore header appropriately, write
is back to the buffer before starting to move entries around.

While debugging this, the _d/_s prefixes weren't sufficient to
remind me what buffer was what, so rename then all _src/_dst.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |   42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 79ece72..5b03d15 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -1445,11 +1445,12 @@ xfs_attr3_leaf_add_work(
 STATIC void
 xfs_attr3_leaf_compact(
 	struct xfs_da_args	*args,
-	struct xfs_attr3_icleaf_hdr *ichdr_d,
+	struct xfs_attr3_icleaf_hdr *ichdr_dst,
 	struct xfs_buf		*bp)
 {
-	xfs_attr_leafblock_t	*leaf_s, *leaf_d;
-	struct xfs_attr3_icleaf_hdr ichdr_s;
+	struct xfs_attr_leafblock *leaf_src;
+	struct xfs_attr_leafblock *leaf_dst;
+	struct xfs_attr3_icleaf_hdr ichdr_src;
 	struct xfs_trans	*trans = args->trans;
 	struct xfs_mount	*mp = trans->t_mountp;
 	char			*tmpbuffer;
@@ -1457,29 +1458,38 @@ xfs_attr3_leaf_compact(
 	trace_xfs_attr_leaf_compact(args);
 
 	tmpbuffer = kmem_alloc(XFS_LBSIZE(mp), KM_SLEEP);
-	ASSERT(tmpbuffer != NULL);
 	memcpy(tmpbuffer, bp->b_addr, XFS_LBSIZE(mp));
 	memset(bp->b_addr, 0, XFS_LBSIZE(mp));
+	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
+	leaf_dst = bp->b_addr;
 
 	/*
-	 * Copy basic information
+	 * Copy the on-disk header back into the destination buffer to ensure
+	 * all the information in the header that is not part of the incore
+	 * header structure is preserved.
 	 */
-	leaf_s = (xfs_attr_leafblock_t *)tmpbuffer;
-	leaf_d = bp->b_addr;
-	ichdr_s = *ichdr_d;	/* struct copy */
-	ichdr_d->firstused = XFS_LBSIZE(mp);
-	ichdr_d->usedbytes = 0;
-	ichdr_d->count = 0;
-	ichdr_d->holes = 0;
-	ichdr_d->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_s);
-	ichdr_d->freemap[0].size = ichdr_d->firstused - ichdr_d->freemap[0].base;
+	memcpy(bp->b_addr, tmpbuffer, xfs_attr3_leaf_hdr_size(leaf_src));
+
+	/* Initialise the incore headers */
+	ichdr_src = *ichdr_dst;	/* struct copy */
+	ichdr_dst->firstused = XFS_LBSIZE(mp);
+	ichdr_dst->usedbytes = 0;
+	ichdr_dst->count = 0;
+	ichdr_dst->holes = 0;
+	ichdr_dst->freemap[0].base = xfs_attr3_leaf_hdr_size(leaf_src);
+	ichdr_dst->freemap[0].size = ichdr_dst->firstused -
+						ichdr_dst->freemap[0].base;
+
+
+	/* write the header back to initialise the underlying buffer */
+	xfs_attr3_leaf_hdr_to_disk(leaf_dst, ichdr_dst);
 
 	/*
 	 * Copy all entry's in the same (sorted) order,
 	 * but allocate name/value pairs packed and in sequence.
 	 */
-	xfs_attr3_leaf_moveents(leaf_s, &ichdr_s, 0, leaf_d, ichdr_d, 0,
-				ichdr_s.count, mp);
+	xfs_attr3_leaf_moveents(leaf_src, &ichdr_src, 0, leaf_dst, ichdr_dst, 0,
+				ichdr_src.count, mp);
 	/*
 	 * this logs the entire buffer, but the caller must write the header
 	 * back to the buffer when it is finished modifying it.
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes
  2013-05-19 23:51 ` [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes Dave Chinner
@ 2013-05-20 18:02   ` Brian Foster
  2013-05-20 19:18     ` Ben Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Brian Foster @ 2013-05-20 18:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/19/2013 07:51 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> FSX on 512 byte block size filesystems has been failing for some
> time with corrupted data. The fault dates back to the change in
> the writeback data integrity algorithm that uses a mark-and-sweep
> approach to avoid data writeback livelocks.
> 
> Unfortunately, a side effect of this mark-and-sweep approach is that
> each page will only be written once for a data integrity sync, and
> there is a condition in writeback in XFS where a page may require
> two writeback attempts to be fully written. As a result of the high
> level change, we now only get a partial page writeback during the
> integrity sync because the first pass through writeback clears the
> mark left on the page index to tell writeback that the page needs
> writeback....
> 
> The cause is writing a partial page in the clustering code. This can
> happen when a mapping boundary falls in the middle of a page - we
> end up writing back the first part of the page that the mapping
> covers, but then never revisit the page to have the remainder mapped
> and written.
> 
> The fix is simple - if the mapping boundary falls inside a page,
> then simple abort clustering without touching the page. This means
> that the next ->writepage entry that write_cache_pages() will make
> is the page we aborted on, and xfs_vm_writepage() will map all
> sections of the page correctly. This behaviour is also optimal for
> non-data integrity writes, as it results in contiguous sequential
> writeback of the file rather than missing small holes and having to
> write them a "random" writes in a future pass.
> 
> With this fix, all the fsx tests in xfstests now pass on a 512 byte
> block size filesystem on a 4k page machine.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks good to me.

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c |   19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 2b2691b..f04eceb 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -725,6 +725,25 @@ xfs_convert_page(
>  			(xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
>  			i_size_read(inode));
>  
> +	/*
> +	 * If the current map does not span the entire page we are about to try
> +	 * to write, then give up. The only way we can write a page that spans
> +	 * multiple mappings in a single writeback iteration is via the
> +	 * xfs_vm_writepage() function. Data integrity writeback requires the
> +	 * entire page to be written in a single attempt, otherwise the part of
> +	 * the page we don't write here doesn't get written as part of the data
> +	 * integrity sync.
> +	 *
> +	 * For normal writeback, we also don't attempt to write partial pages
> +	 * here as it simply means that write_cache_pages() will see it under
> +	 * writeback and ignore the page until some pointin the future, at which
> +	 * time this will be the only page inteh file that needs writeback.
> +	 * Hence for more optimal IO patterns, we should always avoid partial
> +	 * page writeback due to multiple mappings on a page here.
> +	 */
> +	if (!xfs_imap_valid(inode, imap, end_offset))
> +		goto fail_unlock_page;
> +
>  	len = 1 << inode->i_blkbits;
>  	p_offset = min_t(unsigned long, end_offset & (PAGE_CACHE_SIZE - 1),
>  					PAGE_CACHE_SIZE);
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/14] xfs: fix rounding in xfs_free_file_space
  2013-05-19 23:51 ` [PATCH 02/14] xfs: fix rounding in xfs_free_file_space Dave Chinner
@ 2013-05-20 18:03   ` Brian Foster
  0 siblings, 0 replies; 34+ messages in thread
From: Brian Foster @ 2013-05-20 18:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/19/2013 07:51 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The offset passed into xfs_free_file_space() needs to be rounded
> down to a certain size, but the rounding mask is built by a 32 bit
> variable. Hence the mask will always mask off the upper 32 bits of
> the offset and lead to incorrect writeback and invalidation ranges.
> 
> This is not actually exposed as a bug because we writeback and
> invalidate from the rounded offset to the end of the file, and hence
> the offset we are actually punching a hole out of will always be
> covered by the code. This needs fixing, however, if we ever want to
> use exact ranges for writeback/invalidation here...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks good to me.

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_vnodeops.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
> index 1501f4f..0176bb2 100644
> --- a/fs/xfs/xfs_vnodeops.c
> +++ b/fs/xfs/xfs_vnodeops.c
> @@ -1453,7 +1453,7 @@ xfs_free_file_space(
>  	xfs_mount_t		*mp;
>  	int			nimap;
>  	uint			resblks;
> -	uint			rounding;
> +	xfs_off_t		rounding;
>  	int			rt;
>  	xfs_fileoff_t		startoffset_fsb;
>  	xfs_trans_t		*tp;
> @@ -1482,7 +1482,7 @@ xfs_free_file_space(
>  		inode_dio_wait(VFS_I(ip));
>  	}
>  
> -	rounding = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
> +	rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
>  	ioffset = offset & ~(rounding - 1);
>  	error = -filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
>  					      ioffset, -1);
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/14] xfs: Don't reference the EFI after it is freed
  2013-05-19 23:51 ` [PATCH 03/14] xfs: Don't reference the EFI after it is freed Dave Chinner
@ 2013-05-20 18:03   ` Brian Foster
  0 siblings, 0 replies; 34+ messages in thread
From: Brian Foster @ 2013-05-20 18:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/19/2013 07:51 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Checking the EFI for whether it is being released from recovery
> after we've already released the known active reference is a mistake
> worthy of a brown paper bag. Fix the (now) obvious use after free
> that it can cause.
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks good to me.

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_extfree_item.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index c0f3750..452920a 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -305,11 +305,12 @@ xfs_efi_release(xfs_efi_log_item_t	*efip,
>  {
>  	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
>  	if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
> -		__xfs_efi_release(efip);
> -
>  		/* recovery needs us to drop the EFI reference, too */
>  		if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
>  			__xfs_efi_release(efip);
> +
> +		__xfs_efi_release(efip);
> +		/* efip may now have been freed, do not reference it again. */
>  	}
>  }
>  
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
  2013-05-19 23:51 ` [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
@ 2013-05-20 18:03   ` Brian Foster
  2013-05-21  0:06     ` Dave Chinner
  2013-05-21  0:36   ` [PATCH 04/14 V2] " Dave Chinner
  1 sibling, 1 reply; 34+ messages in thread
From: Brian Foster @ 2013-05-20 18:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/19/2013 07:51 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Lockdep reports:
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.9.0+ #3 Not tainted
> ---------------------------------------------
> setquota/28368 is trying to acquire lock:
>  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> 
> but task is already holding lock:
>  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> 
> from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
> allocated.
> 
> xfs_qm_scall_setqlim() is starting a transaction and then not
> passing it into xfs_qm_dqet() and so it starts it's own transaction
> when allocating the dquot.  Splat!
> 
> Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
> inside the setqlim transaction. This requires getting the dquot
> first (and allocating it if necessary) then dropping and relocking
> the dquot before joining it to the setqlim transaction.
> 
> Reported-by: Michael L. Semon <mlsemon35@gmail.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_qm_syscalls.c |   35 ++++++++++++++++++++---------------
>  1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> index c41190c..dfa5c05 100644
> --- a/fs/xfs/xfs_qm_syscalls.c
> +++ b/fs/xfs/xfs_qm_syscalls.c
> @@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
>  	if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
>  		return 0;
>  
> -	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
> -	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
> -				  0, 0, XFS_DEFAULT_LOG_COUNT);
> -	if (error) {
> -		xfs_trans_cancel(tp, 0);
> -		return (error);
> -	}
> -
>  	/*
>  	 * We don't want to race with a quotaoff so take the quotaoff lock.
> -	 * (We don't hold an inode lock, so there's nothing else to stop
> -	 * a quotaoff from happening). (XXXThis doesn't currently happen
> -	 * because we take the vfslock before calling xfs_qm_sysent).
> +	 * We don't hold an inode lock, so there's nothing else to stop
> +	 * a quotaoff from happening.
>  	 */
>  	mutex_lock(&q->qi_quotaofflock);
>  
>  	/*
> -	 * Get the dquot (locked), and join it to the transaction.
> -	 * Allocate the dquot if this doesn't exist.
> +	 * Get the dquot (locked) before we start, as we need to do a
> +	 * transaction to allocate it if it doesn't exist. Once we have the
> +	 * dquot, unlock it so we can start the next transaction safely. We hold
> +	 * a reference to the dquot, so it's safe to do this unlock/lock without
> +	 * it being reclaimed in the mean time.
>  	 */
> -	if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
> -		xfs_trans_cancel(tp, XFS_TRANS_ABORT);
> +	error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
> +	if (error) {
>  		ASSERT(error != ENOENT);
>  		goto out_unlock;
>  	}
> +	xfs_dqunlock(dqp);
> +
> +	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
> +	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
> +				  0, 0, XFS_DEFAULT_LOG_COUNT);
> +	if (error) {
> +		xfs_trans_cancel(tp, 0);
> +		goto out_unlock;

By shuffling the transaction allocation and xfs_qm_dqget() around, it
looks like you now have an error path with a referenced dquot. Perhaps
here we should jump to a new label that includes the xfs_qm_dqrele() at
the end of the function?

Brian

> +	}
> +
> +	xfs_dqlock(dqp);
>  	xfs_trans_dqjoin(tp, dqp);
>  	ddq = &dqp->q_core;
>  
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 08/14] xfs: remote attribute allocation may be contiguous
  2013-05-19 23:51 ` [PATCH 08/14] xfs: remote attribute allocation may be contiguous Dave Chinner
@ 2013-05-20 19:03   ` Brian Foster
  2013-05-20 22:04     ` Ben Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Brian Foster @ 2013-05-20 19:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/19/2013 07:51 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When CRCs are enabled, there may be multiple allocations made if the
> headers cause a length overflow. This, however, does not mean that
> the number of headers required increases, as the second and
> subsequent extents may be contiguous with the previous extent. Hence
> when we map the extents to write the attribute data, we may end up
> with less extents than allocations made. Hence the assertion that we
> consume th enumber of headers we calculated in the allocation loop
> is incorrect and needs to be removed.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_attr_remote.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> index dee8446..aad95b0 100644
> --- a/fs/xfs/xfs_attr_remote.c
> +++ b/fs/xfs/xfs_attr_remote.c
> @@ -359,6 +359,11 @@ xfs_attr_rmtval_set(
>  		 * into requiring more blocks. e.g. for 512 byte blocks, we'll
>  		 * spill for another block every 9 headers we require in this
>  		 * loop.
> +		 *
> +		 * Note that this can result in contiguous allocation of blocks,
> +		 * so we don't use all the space we allocate for headers as we
> +		 * have one less header for each contiguous allocation that
> +		 * occurs in the map/write loop below.
>  		 */
>  		if (crcs && blkcnt == 0) {
>  			int total_len;
> @@ -439,7 +444,6 @@ xfs_attr_rmtval_set(
>  		lblkno += map.br_blockcount;
>  	}
>  	ASSERT(valuelen == 0);
> -	ASSERT(hdrcnt == 0);

I can't say I grok the context enough atm to send a Reviewed-by, but if
we're removing this, why not just remove the hdrcnt-- a few lines up as
well? It doesn't appear to be used after the first loop at this point.

Brian

>  	return 0;
>  }
>  
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes
  2013-05-20 18:02   ` Brian Foster
@ 2013-05-20 19:18     ` Ben Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Ben Myers @ 2013-05-20 19:18 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, May 20, 2013 at 02:02:59PM -0400, Brian Foster wrote:
> On 05/19/2013 07:51 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > FSX on 512 byte block size filesystems has been failing for some
> > time with corrupted data. The fault dates back to the change in
> > the writeback data integrity algorithm that uses a mark-and-sweep
> > approach to avoid data writeback livelocks.
> > 
> > Unfortunately, a side effect of this mark-and-sweep approach is that
> > each page will only be written once for a data integrity sync, and
> > there is a condition in writeback in XFS where a page may require
> > two writeback attempts to be fully written. As a result of the high
> > level change, we now only get a partial page writeback during the
> > integrity sync because the first pass through writeback clears the
> > mark left on the page index to tell writeback that the page needs
> > writeback....
> > 
> > The cause is writing a partial page in the clustering code. This can
> > happen when a mapping boundary falls in the middle of a page - we
> > end up writing back the first part of the page that the mapping
> > covers, but then never revisit the page to have the remainder mapped
> > and written.
> > 
> > The fix is simple - if the mapping boundary falls inside a page,
> > then simple abort clustering without touching the page. This means
> > that the next ->writepage entry that write_cache_pages() will make
> > is the page we aborted on, and xfs_vm_writepage() will map all
> > sections of the page correctly. This behaviour is also optimal for
> > non-data integrity writes, as it results in contiguous sequential
> > writeback of the file rather than missing small holes and having to
> > write them a "random" writes in a future pass.
> > 
> > With this fix, all the fsx tests in xfstests now pass on a 512 byte
> > block size filesystem on a 4k page machine.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Looks good to me.
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  fs/xfs/xfs_aops.c |   19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 2b2691b..f04eceb 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -725,6 +725,25 @@ xfs_convert_page(
> >  			(xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
> >  			i_size_read(inode));
> >  
> > +	/*
> > +	 * If the current map does not span the entire page we are about to try
> > +	 * to write, then give up. The only way we can write a page that spans
> > +	 * multiple mappings in a single writeback iteration is via the
> > +	 * xfs_vm_writepage() function. Data integrity writeback requires the
> > +	 * entire page to be written in a single attempt, otherwise the part of
> > +	 * the page we don't write here doesn't get written as part of the data
> > +	 * integrity sync.
> > +	 *
> > +	 * For normal writeback, we also don't attempt to write partial pages
> > +	 * here as it simply means that write_cache_pages() will see it under
> > +	 * writeback and ignore the page until some pointin the future, at which
> > +	 * time this will be the only page inteh file that needs writeback.
> > +	 * Hence for more optimal IO patterns, we should always avoid partial
> > +	 * page writeback due to multiple mappings on a page here.
> > +	 */

Applying this with a couple of spelling fixes in this comment.

Thanks for the reviews Brian.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/14] xfs: fixes for 3.10-rc2 (update)
  2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
                   ` (13 preceding siblings ...)
  2013-05-19 23:51 ` [PATCH 14/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
@ 2013-05-20 19:37 ` Ben Myers
  14 siblings, 0 replies; 34+ messages in thread
From: Ben Myers @ 2013-05-20 19:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 20, 2013 at 09:51:07AM +1000, Dave Chinner wrote:
> THis is my current patch queue for 3.10-rc2. It contains the same first 6
> patches as posted here:
> 
> http://oss.sgi.com/pipermail/xfs/2013-May/026366.html
> 
> This update adds another 8 patches that fix attribute bugs on CRC
> enabled filesystems. These patches make the attr group in xfstests
> pass rather than assert fail or oops.

Applied 1-3.  Will work on reviews for the others.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/14] xfs: fix missing KM_NOFS tags to keep lockdep happy
  2013-05-19 23:51 ` [PATCH 05/14] xfs: fix missing KM_NOFS tags to keep lockdep happy Dave Chinner
@ 2013-05-20 21:16   ` Ben Myers
  2013-05-21  0:08     ` Dave Chinner
  0 siblings, 1 reply; 34+ messages in thread
From: Ben Myers @ 2013-05-20 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 20, 2013 at 09:51:12AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There are several places where we use KM_SLEEP allocation contexts
> and use the fact that there are called from transaction context to
			they

(fixed)

> add KM_NOFS where appropriate.

I think you're referring to the usage of PF_FSTRANS and us clearing __GFP_FS in
kmem_flags_convert?

> Unfortunately, there are several
> places where the code makes this assumption but can be called from
> outside transaction context but with filesystem locks held. These
> places need explicit KM_NOFS annotations to avoid lockdep
> complaining about reclaim contexts.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good.  In each case you added KM_NOFS where there was no transaction and
locks would have been held.  Applied.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/14] xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
  2013-05-19 23:51 ` [PATCH 06/14] xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC Dave Chinner
@ 2013-05-20 21:32   ` Ben Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Ben Myers @ 2013-05-20 21:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 20, 2013 at 09:51:13AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good.  Applied.

Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
>  fs/xfs/xfs_da_btree.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c
> index 41ea7e1..0b8b2a1 100644
> --- a/fs/xfs/xfs_da_btree.c
> +++ b/fs/xfs/xfs_da_btree.c
> @@ -270,6 +270,7 @@ xfs_da3_node_read_verify(
>  				break;
>  			return;
>  		case XFS_ATTR_LEAF_MAGIC:
> +		case XFS_ATTR3_LEAF_MAGIC:
>  			bp->b_ops = &xfs_attr3_leaf_buf_ops;
>  			bp->b_ops->verify_read(bp);
>  			return;
> -- 
> 1.7.10.4
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 07/14] xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
  2013-05-19 23:51 ` [PATCH 07/14] xfs: xfs_attr_shortform_allfit() does not handle attr3 format Dave Chinner
@ 2013-05-20 21:52   ` Ben Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Ben Myers @ 2013-05-20 21:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 20, 2013 at 09:51:14AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfstests generic/117 fails with:
> 
> XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)
> 
> indicating a function that does not handle the attr3 format
> correctly. Fix it.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good.  Applied.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 08/14] xfs: remote attribute allocation may be contiguous
  2013-05-20 19:03   ` Brian Foster
@ 2013-05-20 22:04     ` Ben Myers
  2013-05-21  0:25       ` Dave Chinner
  0 siblings, 1 reply; 34+ messages in thread
From: Ben Myers @ 2013-05-20 22:04 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, May 20, 2013 at 03:03:49PM -0400, Brian Foster wrote:
> On 05/19/2013 07:51 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When CRCs are enabled, there may be multiple allocations made if the
> > headers cause a length overflow. This, however, does not mean that
> > the number of headers required increases, as the second and
> > subsequent extents may be contiguous with the previous extent. Hence
> > when we map the extents to write the attribute data, we may end up
> > with less extents than allocations made. Hence the assertion that we
> > consume th enumber of headers we calculated in the allocation loop
	    the 

> > is incorrect and needs to be removed.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_attr_remote.c |    6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> > index dee8446..aad95b0 100644
> > --- a/fs/xfs/xfs_attr_remote.c
> > +++ b/fs/xfs/xfs_attr_remote.c
> > @@ -359,6 +359,11 @@ xfs_attr_rmtval_set(
> >  		 * into requiring more blocks. e.g. for 512 byte blocks, we'll
> >  		 * spill for another block every 9 headers we require in this
> >  		 * loop.
> > +		 *
> > +		 * Note that this can result in contiguous allocation of blocks,
> > +		 * so we don't use all the space we allocate for headers as we
> > +		 * have one less header for each contiguous allocation that
> > +		 * occurs in the map/write loop below.
> >  		 */
> >  		if (crcs && blkcnt == 0) {
> >  			int total_len;
> > @@ -439,7 +444,6 @@ xfs_attr_rmtval_set(
> >  		lblkno += map.br_blockcount;
> >  	}
> >  	ASSERT(valuelen == 0);
> > -	ASSERT(hdrcnt == 0);
> 
> I can't say I grok the context enough atm to send a Reviewed-by, but if
> we're removing this, why not just remove the hdrcnt-- a few lines up as
> well? It doesn't appear to be used after the first loop at this point.

Could also keep a separate variable for each loop and compare them here.  The
count for the 2nd loop should always be smaller than for the first... The
assert was worth adding so maybe it is worth fixing too?  Either way is fine
with me.

Reviewed-by: Ben Myers <bpm@sgi.com>

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/14] xfs: remote attribute lookups require the value length
  2013-05-19 23:51 ` [PATCH 09/14] xfs: remote attribute lookups require the value length Dave Chinner
@ 2013-05-20 22:15   ` Ben Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Ben Myers @ 2013-05-20 22:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 20, 2013 at 09:51:16AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When reading a remote attribute, to correctly calculate the length
> of the data buffer for CRC enable filesystems, we need to know the
> length of the attribute data. We get this information when we look
> up the attribute, but we don't store it in the args structure along
> with the other remote attr information we get from the lookup. Add
> this information to the args structure so we can use it
> appropriately.

Looks good.  Applied.
Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/14] xfs: remote attribute read too short
  2013-05-19 23:51 ` [PATCH 10/14] xfs: remote attribute read too short Dave Chinner
@ 2013-05-20 23:00   ` Ben Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Ben Myers @ 2013-05-20 23:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 20, 2013 at 09:51:17AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Reading a maximally size remote attribute fails when CRCs are
> enabled with this verification error:
> 
> XFS (vdb): remote attribute header does not match required off/len/owner)
> 
> There are two reasons for this, the first being that the
> length of the buffer being read is determined from the
> args->rmtblkcnt which doesn't take into account CRC headers. Hence
> the mapped length ends up being too short and so we need to
> calculate it directly from the value length.
> 
> The second is that the byte count of valid data within a buffer is
> capped by the length of the data and so doesn't take into account
> that the buffer might be longer due to headers. Hence we need to
> calculate the data space in the buffer first before calculating the
> actual byte count of data.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_attr_remote.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> index aad95b0..bcdc07c 100644
> --- a/fs/xfs/xfs_attr_remote.c
> +++ b/fs/xfs/xfs_attr_remote.c
> @@ -52,9 +52,11 @@ xfs_attr3_rmt_blocks(
>  	struct xfs_mount *mp,
>  	int		attrlen)
>  {
> -	int		buflen = XFS_ATTR3_RMT_BUF_SPACE(mp,
> -							 mp->m_sb.sb_blocksize);
> -	return (attrlen + buflen - 1) / buflen;
> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
> +		return (attrlen + buflen - 1) / buflen;
> +	}
> +	return XFS_B_TO_FSB(mp, attrlen);
>  }

xfs_attr_rmtval_set, previously the only caller of xfs_attr3_rmt_blocks already
checks for crcs.  Can one of the two conditionals be dispensed with?


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/14] xfs: remote attribute tail zeroing does too much
  2013-05-19 23:51 ` [PATCH 11/14] xfs: remote attribute tail zeroing does too much Dave Chinner
@ 2013-05-20 23:01   ` Ben Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Ben Myers @ 2013-05-20 23:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 20, 2013 at 09:51:18AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When an attribute data does not fill then entire remote block, we
> zero the remaining part of the buffer. This, however, needs to take
> into account that the buffer has a header, and so the offset where
> zeroing starts and the length of zeroing need to take this into
> account. Otherwise we end up with zeros over the end of the
> attribute value when CRCs are enabled.
> 
> While there, make sure we only ask to map an extent that covers the
> remaining range of the attribute, rather than asking every time for
> the full length of remote data. If the remote attribute blocks are
> contiguous with other parts of the attribute tree, it will map those
> blocks as well and we can potentially zero them incorrectly. We can
> also get buffer size mistmatches when trying to read or remove the
> remote attribute, and this can lead to not finding the correct
> buffer when looking it up in cache.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_attr_remote.c |   37 ++++++++++++++++++-------------------
>  1 file changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> index bcdc07c..e207bf0 100644
> --- a/fs/xfs/xfs_attr_remote.c
> +++ b/fs/xfs/xfs_attr_remote.c
> @@ -296,10 +296,7 @@ xfs_attr_rmtval_set(
>  	 * and we may not need that many, so we have to handle this when
>  	 * allocating the blocks below. 
>  	 */
> -	if (!crcs)
> -		blkcnt = XFS_B_TO_FSB(mp, args->valuelen);
> -	else
> -		blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
> +	blkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);

Oh.  There you have it.  Nevermind.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
  2013-05-20 18:03   ` Brian Foster
@ 2013-05-21  0:06     ` Dave Chinner
  0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-21  0:06 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, May 20, 2013 at 02:03:09PM -0400, Brian Foster wrote:
> On 05/19/2013 07:51 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Lockdep reports:
> > 
> > =============================================
> > [ INFO: possible recursive locking detected ]
> > 3.9.0+ #3 Not tainted
> > ---------------------------------------------
> > setquota/28368 is trying to acquire lock:
> >  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> > 
> > but task is already holding lock:
> >  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> > 
> > from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
> > allocated.
> > 
> > xfs_qm_scall_setqlim() is starting a transaction and then not
> > passing it into xfs_qm_dqet() and so it starts it's own transaction
> > when allocating the dquot.  Splat!
> > 
> > Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
> > inside the setqlim transaction. This requires getting the dquot
> > first (and allocating it if necessary) then dropping and relocking
> > the dquot before joining it to the setqlim transaction.
> > 
> > Reported-by: Michael L. Semon <mlsemon35@gmail.com>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_qm_syscalls.c |   35 ++++++++++++++++++++---------------
> >  1 file changed, 20 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > index c41190c..dfa5c05 100644
> > --- a/fs/xfs/xfs_qm_syscalls.c
> > +++ b/fs/xfs/xfs_qm_syscalls.c
> > @@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
> >  	if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
> >  		return 0;
> >  
> > -	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
> > -	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
> > -				  0, 0, XFS_DEFAULT_LOG_COUNT);
> > -	if (error) {
> > -		xfs_trans_cancel(tp, 0);
> > -		return (error);
> > -	}
> > -
> >  	/*
> >  	 * We don't want to race with a quotaoff so take the quotaoff lock.
> > -	 * (We don't hold an inode lock, so there's nothing else to stop
> > -	 * a quotaoff from happening). (XXXThis doesn't currently happen
> > -	 * because we take the vfslock before calling xfs_qm_sysent).
> > +	 * We don't hold an inode lock, so there's nothing else to stop
> > +	 * a quotaoff from happening.
> >  	 */
> >  	mutex_lock(&q->qi_quotaofflock);
> >  
> >  	/*
> > -	 * Get the dquot (locked), and join it to the transaction.
> > -	 * Allocate the dquot if this doesn't exist.
> > +	 * Get the dquot (locked) before we start, as we need to do a
> > +	 * transaction to allocate it if it doesn't exist. Once we have the
> > +	 * dquot, unlock it so we can start the next transaction safely. We hold
> > +	 * a reference to the dquot, so it's safe to do this unlock/lock without
> > +	 * it being reclaimed in the mean time.
> >  	 */
> > -	if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
> > -		xfs_trans_cancel(tp, XFS_TRANS_ABORT);
> > +	error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
> > +	if (error) {
> >  		ASSERT(error != ENOENT);
> >  		goto out_unlock;
> >  	}
> > +	xfs_dqunlock(dqp);
> > +
> > +	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
> > +	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
> > +				  0, 0, XFS_DEFAULT_LOG_COUNT);
> > +	if (error) {
> > +		xfs_trans_cancel(tp, 0);
> > +		goto out_unlock;
> 
> By shuffling the transaction allocation and xfs_qm_dqget() around, it
> looks like you now have an error path with a referenced dquot. Perhaps
> here we should jump to a new label that includes the xfs_qm_dqrele() at
> the end of the function?

Right, you are. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/14] xfs: fix missing KM_NOFS tags to keep lockdep happy
  2013-05-20 21:16   ` Ben Myers
@ 2013-05-21  0:08     ` Dave Chinner
  0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-21  0:08 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Mon, May 20, 2013 at 04:16:07PM -0500, Ben Myers wrote:
> On Mon, May 20, 2013 at 09:51:12AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > There are several places where we use KM_SLEEP allocation contexts
> > and use the fact that there are called from transaction context to
> 			they
> 
> (fixed)
> 
> > add KM_NOFS where appropriate.
> 
> I think you're referring to the usage of PF_FSTRANS and us clearing __GFP_FS in
> kmem_flags_convert?

Yes.

> > Unfortunately, there are several
> > places where the code makes this assumption but can be called from
> > outside transaction context but with filesystem locks held. These
> > places need explicit KM_NOFS annotations to avoid lockdep
> > complaining about reclaim contexts.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Looks good.  In each case you added KM_NOFS where there was no transaction and
> locks would have been held.  Applied.
> 
> Reviewed-by: Ben Myers <bpm@sgi.com>

Thanks.

-Dave
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 08/14] xfs: remote attribute allocation may be contiguous
  2013-05-20 22:04     ` Ben Myers
@ 2013-05-21  0:25       ` Dave Chinner
  0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2013-05-21  0:25 UTC (permalink / raw)
  To: Ben Myers; +Cc: Brian Foster, xfs

On Mon, May 20, 2013 at 05:04:18PM -0500, Ben Myers wrote:
> On Mon, May 20, 2013 at 03:03:49PM -0400, Brian Foster wrote:
> > On 05/19/2013 07:51 PM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > When CRCs are enabled, there may be multiple allocations made if the
> > > headers cause a length overflow. This, however, does not mean that
> > > the number of headers required increases, as the second and
> > > subsequent extents may be contiguous with the previous extent. Hence
> > > when we map the extents to write the attribute data, we may end up
> > > with less extents than allocations made. Hence the assertion that we
> > > consume th enumber of headers we calculated in the allocation loop
> 	    the 
> 
> > > is incorrect and needs to be removed.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_attr_remote.c |    6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
> > > index dee8446..aad95b0 100644
> > > --- a/fs/xfs/xfs_attr_remote.c
> > > +++ b/fs/xfs/xfs_attr_remote.c
> > > @@ -359,6 +359,11 @@ xfs_attr_rmtval_set(
> > >  		 * into requiring more blocks. e.g. for 512 byte blocks, we'll
> > >  		 * spill for another block every 9 headers we require in this
> > >  		 * loop.
> > > +		 *
> > > +		 * Note that this can result in contiguous allocation of blocks,
> > > +		 * so we don't use all the space we allocate for headers as we
> > > +		 * have one less header for each contiguous allocation that
> > > +		 * occurs in the map/write loop below.
> > >  		 */
> > >  		if (crcs && blkcnt == 0) {
> > >  			int total_len;
> > > @@ -439,7 +444,6 @@ xfs_attr_rmtval_set(
> > >  		lblkno += map.br_blockcount;
> > >  	}
> > >  	ASSERT(valuelen == 0);
> > > -	ASSERT(hdrcnt == 0);
> > 
> > I can't say I grok the context enough atm to send a Reviewed-by, but if
> > we're removing this, why not just remove the hdrcnt-- a few lines up as
> > well? It doesn't appear to be used after the first loop at this point.

Yup, I noticed that later on....

> Could also keep a separate variable for each loop and compare them here.  The
> count for the 2nd loop should always be smaller than for the first... The
> assert was worth adding so maybe it is worth fixing too?  Either way is fine
> with me.

Well, I'm not convinced the code is correct yet, anyway. My
overnight runs on a 1k block size filesystem (can't use 512 byte
block size with 512 byte inodes) tripped over the same assert that I
was seeing with 4k block size filesystems and these patches fix.

By "not convinced", I mean that I think I've made a mistake in just
putting a header at the start of each extent. Call it preamture
optimisation, if you want, as it seems like a good idea at the time.

The main issue is that we don't know ahead of time how big an
attribute is going to be because it will have a variable number of
headers, and hence we don't know where the attribute data actually
ends from a mapping lookup as there can be trailing contiguous
blocks that hold other attribute data.

IOWs, I've found that this code is incredibly fragile, difficult to
verify and hard to debug. And I'm pretty certain the mapping of
trailing blocks is almost impossible to solve sanely.

What I'd like to do right now is change this format to
have a header per filesystem block. The reason for doing this is
that all the unknowns are removed - we have a direct relationship
between the attribute data length and the number of blocks needed to
store the data (it is always what xfs_attr3_rmt_blocks() returns)
and that greatly simplifies all this code.

It gets rid of the multiple allocation loop, the complexity of
contiguous allocation from multiple loop instances, etc. It does
make the parsing/verification of the attribute data a little more
complex, but that is a constant rather than the mess I've created
right now...

The only issue is that it is a change of the on-disk format. I'm
happy just to go change it as this is marked experimental, it hasn't
reached an initial public release yet and the only people using it
are developers and testers using volatile data...

I'm going to do this today, based on top of this series - this series
allows much more testing to be done, so it should be committed
anyway. Yeah, it's a pain, but this code is going to cause us long
term problems if we don't fix these problems now...

Thoughts?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 04/14 V2] xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
  2013-05-19 23:51 ` [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
  2013-05-20 18:03   ` Brian Foster
@ 2013-05-21  0:36   ` Dave Chinner
  2013-05-21 10:51     ` Brian Foster
  1 sibling, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2013-05-21  0:36 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Lockdep reports:

=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #3 Not tainted
---------------------------------------------
setquota/28368 is trying to acquire lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

but task is already holding lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
allocated.

xfs_qm_scall_setqlim() is starting a transaction and then not
passing it into xfs_qm_dqet() and so it starts it's own transaction
when allocating the dquot.  Splat!

Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
inside the setqlim transaction. This requires getting the dquot
first (and allocating it if necessary) then dropping and relocking
the dquot before joining it to the setqlim transaction.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
V2: fix error handling path that failed to release the dquot.

 fs/xfs/xfs_qm_syscalls.c |   40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index c41190c..6cdf6ff 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
 	if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
 		return 0;
 
-	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
-	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
-				  0, 0, XFS_DEFAULT_LOG_COUNT);
-	if (error) {
-		xfs_trans_cancel(tp, 0);
-		return (error);
-	}
-
 	/*
 	 * We don't want to race with a quotaoff so take the quotaoff lock.
-	 * (We don't hold an inode lock, so there's nothing else to stop
-	 * a quotaoff from happening). (XXXThis doesn't currently happen
-	 * because we take the vfslock before calling xfs_qm_sysent).
+	 * We don't hold an inode lock, so there's nothing else to stop
+	 * a quotaoff from happening.
 	 */
 	mutex_lock(&q->qi_quotaofflock);
 
 	/*
-	 * Get the dquot (locked), and join it to the transaction.
-	 * Allocate the dquot if this doesn't exist.
+	 * Get the dquot (locked) before we start, as we need to do a
+	 * transaction to allocate it if it doesn't exist. Once we have the
+	 * dquot, unlock it so we can start the next transaction safely. We hold
+	 * a reference to the dquot, so it's safe to do this unlock/lock without
+	 * it being reclaimed in the mean time.
 	 */
-	if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
-		xfs_trans_cancel(tp, XFS_TRANS_ABORT);
+	error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
+	if (error) {
 		ASSERT(error != ENOENT);
 		goto out_unlock;
 	}
+	xfs_dqunlock(dqp);
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
+	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
+				  0, 0, XFS_DEFAULT_LOG_COUNT);
+	if (error) {
+		xfs_trans_cancel(tp, 0);
+		goto out_rele;
+	}
+
+	xfs_dqlock(dqp);
 	xfs_trans_dqjoin(tp, dqp);
 	ddq = &dqp->q_core;
 
@@ -621,9 +626,10 @@ xfs_qm_scall_setqlim(
 	xfs_trans_log_dquot(tp, dqp);
 
 	error = xfs_trans_commit(tp, 0);
-	xfs_qm_dqrele(dqp);
 
- out_unlock:
+out_rele:
+	xfs_qm_dqrele(dqp);
+out_unlock:
 	mutex_unlock(&q->qi_quotaofflock);
 	return error;
 }

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/14 V2] xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
  2013-05-21  0:36   ` [PATCH 04/14 V2] " Dave Chinner
@ 2013-05-21 10:51     ` Brian Foster
  0 siblings, 0 replies; 34+ messages in thread
From: Brian Foster @ 2013-05-21 10:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 05/20/2013 08:36 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Lockdep reports:
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.9.0+ #3 Not tainted
> ---------------------------------------------
> setquota/28368 is trying to acquire lock:
>  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> 
> but task is already holding lock:
>  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
> 
> from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
> allocated.
> 
> xfs_qm_scall_setqlim() is starting a transaction and then not
> passing it into xfs_qm_dqet() and so it starts it's own transaction
> when allocating the dquot.  Splat!
> 
> Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
> inside the setqlim transaction. This requires getting the dquot
> first (and allocating it if necessary) then dropping and relocking
> the dquot before joining it to the setqlim transaction.
> 
> Reported-by: Michael L. Semon <mlsemon35@gmail.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> V2: fix error handling path that failed to release the dquot.
> 

Looks good.

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_qm_syscalls.c |   40 +++++++++++++++++++++++-----------------
>  1 file changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> index c41190c..6cdf6ff 100644
> --- a/fs/xfs/xfs_qm_syscalls.c
> +++ b/fs/xfs/xfs_qm_syscalls.c
> @@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
>  	if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
>  		return 0;
>  
> -	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
> -	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
> -				  0, 0, XFS_DEFAULT_LOG_COUNT);
> -	if (error) {
> -		xfs_trans_cancel(tp, 0);
> -		return (error);
> -	}
> -
>  	/*
>  	 * We don't want to race with a quotaoff so take the quotaoff lock.
> -	 * (We don't hold an inode lock, so there's nothing else to stop
> -	 * a quotaoff from happening). (XXXThis doesn't currently happen
> -	 * because we take the vfslock before calling xfs_qm_sysent).
> +	 * We don't hold an inode lock, so there's nothing else to stop
> +	 * a quotaoff from happening.
>  	 */
>  	mutex_lock(&q->qi_quotaofflock);
>  
>  	/*
> -	 * Get the dquot (locked), and join it to the transaction.
> -	 * Allocate the dquot if this doesn't exist.
> +	 * Get the dquot (locked) before we start, as we need to do a
> +	 * transaction to allocate it if it doesn't exist. Once we have the
> +	 * dquot, unlock it so we can start the next transaction safely. We hold
> +	 * a reference to the dquot, so it's safe to do this unlock/lock without
> +	 * it being reclaimed in the mean time.
>  	 */
> -	if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
> -		xfs_trans_cancel(tp, XFS_TRANS_ABORT);
> +	error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
> +	if (error) {
>  		ASSERT(error != ENOENT);
>  		goto out_unlock;
>  	}
> +	xfs_dqunlock(dqp);
> +
> +	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
> +	error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
> +				  0, 0, XFS_DEFAULT_LOG_COUNT);
> +	if (error) {
> +		xfs_trans_cancel(tp, 0);
> +		goto out_rele;
> +	}
> +
> +	xfs_dqlock(dqp);
>  	xfs_trans_dqjoin(tp, dqp);
>  	ddq = &dqp->q_core;
>  
> @@ -621,9 +626,10 @@ xfs_qm_scall_setqlim(
>  	xfs_trans_log_dquot(tp, dqp);
>  
>  	error = xfs_trans_commit(tp, 0);
> -	xfs_qm_dqrele(dqp);
>  
> - out_unlock:
> +out_rele:
> +	xfs_qm_dqrele(dqp);
> +out_unlock:
>  	mutex_unlock(&q->qi_quotaofflock);
>  	return error;
>  }
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-05-21 10:54 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-19 23:51 [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Dave Chinner
2013-05-19 23:51 ` [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes Dave Chinner
2013-05-20 18:02   ` Brian Foster
2013-05-20 19:18     ` Ben Myers
2013-05-19 23:51 ` [PATCH 02/14] xfs: fix rounding in xfs_free_file_space Dave Chinner
2013-05-20 18:03   ` Brian Foster
2013-05-19 23:51 ` [PATCH 03/14] xfs: Don't reference the EFI after it is freed Dave Chinner
2013-05-20 18:03   ` Brian Foster
2013-05-19 23:51 ` [PATCH 04/14] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
2013-05-20 18:03   ` Brian Foster
2013-05-21  0:06     ` Dave Chinner
2013-05-21  0:36   ` [PATCH 04/14 V2] " Dave Chinner
2013-05-21 10:51     ` Brian Foster
2013-05-19 23:51 ` [PATCH 05/14] xfs: fix missing KM_NOFS tags to keep lockdep happy Dave Chinner
2013-05-20 21:16   ` Ben Myers
2013-05-21  0:08     ` Dave Chinner
2013-05-19 23:51 ` [PATCH 06/14] xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC Dave Chinner
2013-05-20 21:32   ` Ben Myers
2013-05-19 23:51 ` [PATCH 07/14] xfs: xfs_attr_shortform_allfit() does not handle attr3 format Dave Chinner
2013-05-20 21:52   ` Ben Myers
2013-05-19 23:51 ` [PATCH 08/14] xfs: remote attribute allocation may be contiguous Dave Chinner
2013-05-20 19:03   ` Brian Foster
2013-05-20 22:04     ` Ben Myers
2013-05-21  0:25       ` Dave Chinner
2013-05-19 23:51 ` [PATCH 09/14] xfs: remote attribute lookups require the value length Dave Chinner
2013-05-20 22:15   ` Ben Myers
2013-05-19 23:51 ` [PATCH 10/14] xfs: remote attribute read too short Dave Chinner
2013-05-20 23:00   ` Ben Myers
2013-05-19 23:51 ` [PATCH 11/14] xfs: remote attribute tail zeroing does too much Dave Chinner
2013-05-20 23:01   ` Ben Myers
2013-05-19 23:51 ` [PATCH 12/14] xfs: correctly map remote attr buffers during removal Dave Chinner
2013-05-19 23:51 ` [PATCH 13/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
2013-05-19 23:51 ` [PATCH 14/14] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
2013-05-20 19:37 ` [PATCH 00/14] xfs: fixes for 3.10-rc2 (update) Ben Myers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.