All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3
@ 2011-01-27  3:53 Dave Chinner
  2011-01-27  3:53 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

Fixed the various issues found during review and added all the
reviewed-by tags to the fully reviewed commits.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/8] xfs: fix log ticket leak on forced shutdown.
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-27  3:53 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The kmemleak detector shows this after test 139:

unreferenced object 0xffff880079b88bb0 (size 264):
  comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff  ........H{......
  backtrace:
    [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
    [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
    [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
    [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
    [<ffffffff8148f394>] xlog_ticket_alloc+0x34/0x170
    [<ffffffff81494444>] xlog_cil_push+0xa4/0x3f0
    [<ffffffff81494eca>] xlog_cil_force_lsn+0x15a/0x160
    [<ffffffff814933a5>] _xfs_log_force_lsn+0x75/0x2d0
    [<ffffffff814a264d>] _xfs_trans_commit+0x2bd/0x2f0
    [<ffffffff8148bfdd>] xfs_iomap_write_allocate+0x1ad/0x350
    [<ffffffff814ac17f>] xfs_map_blocks+0x21f/0x370
    [<ffffffff814ad1b7>] xfs_vm_writepage+0x1c7/0x550
    [<ffffffff8112200a>] __writepage+0x1a/0x50
    [<ffffffff81122df2>] write_cache_pages+0x1c2/0x4c0
    [<ffffffff81123117>] generic_writepages+0x27/0x30
    [<ffffffff814aba5d>] xfs_vm_writepages+0x5d/0x80

By inspection, the leak occurs when xlog_write() returns and error
and we jump to the abort path without dropping the reference on the
active ticket.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_log_cil.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 9dc8125..c7eac5a 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -543,7 +543,7 @@ xlog_cil_push(
 
 	error = xlog_write(log, &lvhdr, tic, &ctx->start_lsn, NULL, 0);
 	if (error)
-		goto out_abort;
+		goto out_abort_free_ticket;
 
 	/*
 	 * now that we've written the checkpoint into the log, strictly
@@ -569,8 +569,9 @@ restart:
 	}
 	spin_unlock(&cil->xc_cil_lock);
 
+	/* xfs_log_done always frees the ticket on error. */
 	commit_lsn = xfs_log_done(log->l_mp, tic, &commit_iclog, 0);
-	if (error || commit_lsn == -1)
+	if (commit_lsn == -1)
 		goto out_abort;
 
 	/* attach all the transactions w/ busy extents to iclog */
@@ -600,6 +601,8 @@ out_free_ticket:
 	kmem_free(new_ctx);
 	return 0;
 
+out_abort_free_ticket:
+	xfs_log_ticket_put(tic);
 out_abort:
 	xlog_cil_committed(ctx, XFS_LI_ABORTED);
 	return XFS_ERROR(EIO);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
  2011-01-27  3:53 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-28 14:54   ` Alex Elder
  2011-01-27  3:53 ` [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly Dave Chinner
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

After test 139, kmemleak shows:

unreferenced object 0xffff880078b405d8 (size 400):
  comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
  hex dump (first 32 bytes):
    60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
    [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
    [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
    [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
    [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
    [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
    [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
    [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
    [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
    [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
    [<ffffffff8117dc00>] notify_change+0x170/0x2e0
    [<ffffffff81163bf6>] do_truncate+0x66/0xa0
    [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
    [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

The cause of the leak is that the "remove" parameter of IOP_UNPIN()
is never set when a CIL push is aborted. This means that the EFI
item is never freed if it was in the push being cancelled. The
problem is specific to delayed logging, but has uncovered a couple
of problems with the handling of IOP_UNPIN(remove).

Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
in the CIL commit failure path or the iclog write failure path
because for delayed loging we have no transaction context. Hence we
must only call xfs_trans_del_item() if the log item being unpinned
has an active log item descriptor.

Secondly, xfs_trans_uncommit() does not handle log item descriptor
freeing during the traversal of log items on a transaction. It can
reference a freed log item descriptor when unpinning an EFI item.
Hence it needs to use a safe list traversal method to allow items to
be removed from the transaction during IOP_UNPIN().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c     |   12 +++++++-----
 fs/xfs/xfs_extfree_item.c |    3 ++-
 fs/xfs/xfs_trans.c        |   36 +++++++++++++++++++++++++++++-------
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 98c6f73..6f8c21c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -427,13 +427,15 @@ xfs_buf_item_unpin(
 
 		if (remove) {
 			/*
-			 * We have to remove the log item from the transaction
-			 * as we are about to release our reference to the
-			 * buffer.  If we don't, the unlock that occurs later
-			 * in xfs_trans_uncommit() will ry to reference the
+			 * If we are in a transaction context, we have to
+			 * remove the log item from the transaction as we are
+			 * about to release our reference to the buffer.  If we
+			 * don't, the unlock that occurs later in
+			 * xfs_trans_uncommit() will try to reference the
 			 * buffer which we no longer have a hold on.
 			 */
-			xfs_trans_del_item(lip);
+			if (lip->li_desc)
+				xfs_trans_del_item(lip);
 
 			/*
 			 * Since the transaction no longer refers to the buffer,
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 75f2ef6..d22e626 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -138,7 +138,8 @@ xfs_efi_item_unpin(
 
 	if (remove) {
 		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
-		xfs_trans_del_item(lip);
+		if (lip->li_desc)
+			xfs_trans_del_item(lip);
 		xfs_efi_item_free(efip);
 		return;
 	}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 33dbc4e..29f5e54 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1446,6 +1446,14 @@ xfs_log_item_batch_insert(
  * Bulk operation version of xfs_trans_committed that takes a log vector of
  * items to insert into the AIL. This uses bulk AIL insertion techniques to
  * minimise lock traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through IOP_COMMITED and IOP_UNLOCK, which
+ * means that checkpoint commit abort handling is treated exactly the same
+ * as an iclog write error even though we haven't started any IO yet. Hence in
+ * this case all we need to do is IOP_COMMITTED processing, followed by an
+ * IOP_UNPIN(aborted) call.
  */
 void
 xfs_trans_committed_bulk(
@@ -1472,6 +1480,16 @@ xfs_trans_committed_bulk(
 		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
 			continue;
 
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(XFS_FORCED_SHUTDOWN(ailp->xa_mount));
+			IOP_UNPIN(lip, 1);
+			continue;
+		}
+
 		if (item_lsn != commit_lsn) {
 
 			/*
@@ -1503,20 +1521,24 @@ xfs_trans_committed_bulk(
 }
 
 /*
- * Called from the trans_commit code when we notice that
- * the filesystem is in the middle of a forced shutdown.
+ * Called from the trans_commit code when we notice that the filesystem is in
+ * the middle of a forced shutdown.
+ *
+ * When we are called here, we have already pinned all the items in the
+ * transaction. However, neither IOP_COMMITTING or IOP_UNLOCK has been called
+ * so we can simply walk the items in the transaction, unpin them with an abort
+ * flag and then free the items. Note that unpinning the items can result in
+ * them being freed immediately, so we need to use a safe list traversal method
+ * here.
  */
 STATIC void
 xfs_trans_uncommit(
 	struct xfs_trans	*tp,
 	uint			flags)
 {
-	struct xfs_log_item_desc *lidp;
+	struct xfs_log_item_desc *lidp, *n;
 
-	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
-		/*
-		 * Unpin all but those that aren't dirty.
-		 */
+	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
 		if (lidp->lid_flags & XFS_LID_DIRTY)
 			IOP_UNPIN(lidp->lid_item, 1);
 	}
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
  2011-01-27  3:53 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
  2011-01-27  3:53 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-27  3:53 ` [PATCH 4/8] xfs: limit extent length for allocation to AG size Dave Chinner
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

rounddown_power_of_2() returns an undefined result when passed a
value of zero. The specualtive delayed allocation code is doing this
when the inode is zero length. Hence occasionally the preallocation
is much, much larger than is necessary (e.g. 8GB for a 270 _byte_
file). Ensure we don't even pass a zero value to this function so
the result of preallocation is always the desired size.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_iomap.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 55582bd..8a0f044 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -337,7 +337,12 @@ xfs_iomap_prealloc_size(
 		int shift = 0;
 		int64_t freesp;
 
-		alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size);
+		/*
+		 * rounddown_pow_of_two() returns an undefined result
+		 * if we pass in alloc_blocks = 0. Hence the "+ 1" to
+		 * ensure we always pass in a non-zero value.
+		 */
+		alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size) + 1;
 		alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
 					rounddown_pow_of_two(alloc_blocks));
 
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/8] xfs: limit extent length for allocation to AG size
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
                   ` (2 preceding siblings ...)
  2011-01-27  3:53 ` [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-27  3:53 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Delayed allocation extents can be larger than AGs, so when trying to
convert a large range we may scan every AG inside
xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than
an AG. We should stop when we find the first AG with a maximum
possible allocation size. This causes excessive CPU usage when there
are lots of AGs.

The same problem occurs when doing preallocation of a range larger
than an AG.

Fix the problem by limiting real allocation lengths to the maximum
that an AG can support. This means if we have empty AGs, we'll stop
the search at the first of them. If there are no empty AGs, we'll
still scan them all, but that is a different problem....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_alloc.h |   16 ++++++++++++++++
 fs/xfs/xfs_bmap.c  |   18 ++++++++++--------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_alloc.h b/fs/xfs/xfs_alloc.h
index 0ab56b3..d0b3bc7 100644
--- a/fs/xfs/xfs_alloc.h
+++ b/fs/xfs/xfs_alloc.h
@@ -75,6 +75,22 @@ typedef unsigned int xfs_alloctype_t;
 #define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
 
 /*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size the AG. However, we cannot use all the
+ * blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ *	- the AG superblock, AGF, AGI and AGFL
+ *	- the AGF (bno and cnt) and AGI btree root blocks
+ *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
+	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
+
+
+/*
  * Argument structure for xfs_alloc routines.
  * This is turned into a structure to avoid having 20 arguments passed
  * down several levels of the stack.
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 4111cd3..f3a3768 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -2430,7 +2430,7 @@ xfs_bmap_btalloc_nullfb(
 		startag = ag = 0;
 
 	pag = xfs_perag_get(mp, ag);
-	while (*blen < ap->alen) {
+	while (*blen < args->maxlen) {
 		if (!pag->pagf_init) {
 			error = xfs_alloc_pagf_init(mp, args->tp, ag,
 						    XFS_ALLOC_FLAG_TRYLOCK);
@@ -2452,7 +2452,7 @@ xfs_bmap_btalloc_nullfb(
 			notinit = 1;
 
 		if (xfs_inode_is_filestream(ap->ip)) {
-			if (*blen >= ap->alen)
+			if (*blen >= args->maxlen)
 				break;
 
 			if (ap->userdata) {
@@ -2498,14 +2498,14 @@ xfs_bmap_btalloc_nullfb(
 	 * If the best seen length is less than the request
 	 * length, use the best as the minimum.
 	 */
-	else if (*blen < ap->alen)
+	else if (*blen < args->maxlen)
 		args->minlen = *blen;
 	/*
-	 * Otherwise we've seen an extent as big as alen,
+	 * Otherwise we've seen an extent as big as maxlen,
 	 * use that as the minimum.
 	 */
 	else
-		args->minlen = ap->alen;
+		args->minlen = args->maxlen;
 
 	/*
 	 * set the failure fallback case to look in the selected
@@ -2573,7 +2573,9 @@ xfs_bmap_btalloc(
 	args.tp = ap->tp;
 	args.mp = mp;
 	args.fsbno = ap->rval;
-	args.maxlen = MIN(ap->alen, mp->m_sb.sb_agblocks);
+
+	/* Trim the allocation back to the maximum an AG can fit. */
+	args.maxlen = MIN(ap->alen, XFS_ALLOC_AG_MAX_USABLE(mp));
 	args.firstblock = ap->firstblock;
 	blen = 0;
 	if (nullfb) {
@@ -2621,7 +2623,7 @@ xfs_bmap_btalloc(
 			/*
 			 * Adjust for alignment
 			 */
-			if (blen > args.alignment && blen <= ap->alen)
+			if (blen > args.alignment && blen <= args.maxlen)
 				args.minlen = blen - args.alignment;
 			args.minalignslop = 0;
 		} else {
@@ -2640,7 +2642,7 @@ xfs_bmap_btalloc(
 			 * of minlen+alignment+slop doesn't go up
 			 * between the calls.
 			 */
-			if (blen > mp->m_dalign && blen <= ap->alen)
+			if (blen > mp->m_dalign && blen <= args.maxlen)
 				nextminlen = blen - mp->m_dalign;
 			else
 				nextminlen = args.minlen;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
                   ` (3 preceding siblings ...)
  2011-01-27  3:53 ` [PATCH 4/8] xfs: limit extent length for allocation to AG size Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-27  3:53 ` [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN Dave Chinner
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When doing delayed allocation, if the allocation size is for a
maximally sized extent, extent size alignment can push it over this
limit. This results in an assert failure in xfs_bmbt_set_allf() as
the extent length is too large to find in the extent record.

Fix this by ensuring that we allow for space that extent size
alignment requires (up to 2 * (extsize -1) blocks as we have to
handle both head and tail alignment) when limiting the maximum size
of the extent.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_bmap.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index f3a3768..3e9c278 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -4487,6 +4487,16 @@ xfs_bmapi(
 				/* Figure out the extent size, adjust alen */
 				extsz = xfs_get_extsz_hint(ip);
 				if (extsz) {
+					/*
+					 * make sure we don't exceed a single
+					 * extent length when we align the
+					 * extent by reducing length we are
+					 * going to allocate by the maximum
+					 * amount extent size aligment may
+					 * require.
+					 */
+					alen = XFS_FILBLKS_MIN(len,
+						   MAXEXTLEN - (2 * extsz - 1));
 					error = xfs_bmap_extsize_align(mp,
 							&got, &prev, extsz,
 							rt, eof,
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
                   ` (4 preceding siblings ...)
  2011-01-27  3:53 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-27  3:53 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The extent size hint can be set to larger than an AG. This means
that the alignment process can push the range to be allocated
outside the bounds of the AG, resulting in assert failures or
corrupted bmbt records. Similarly, if the extsize is larger than the
maximum extent size supported, the alignment process will produce
extents that are too large to fit into the bmbt records, resulting
in a different type of assert/corruption failure.

Fix this by limiting extsize at the time іt is set firstly to be
less than MAXEXTLEN, then to be a maximum of half the size of the
AGs in the filesystem for non-realtime inodes. Realtime inodes do
not allocate out of AGs, so don't have to be restricted by the size
of AGs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/linux-2.6/xfs_ioctl.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_ioctl.c b/fs/xfs/linux-2.6/xfs_ioctl.c
index b06ede1..f5e2a19 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl.c
@@ -985,10 +985,22 @@ xfs_ioctl_setattr(
 
 		/*
 		 * Extent size must be a multiple of the appropriate block
-		 * size, if set at all.
+		 * size, if set at all. It must also be smaller than the
+		 * maximum extent size supported by the filesystem.
+		 *
+		 * Also, for non-realtime files, limit the extent size hint to
+		 * half the size of the AGs in the filesystem so alignment
+		 * doesn't result in extents larger than an AG.
 		 */
 		if (fa->fsx_extsize != 0) {
-			xfs_extlen_t	size;
+			xfs_extlen_t    size;
+			xfs_fsblock_t   extsize_fsb;
+
+			extsize_fsb = XFS_B_TO_FSB(mp, fa->fsx_extsize);
+			if (extsize_fsb > MAXEXTLEN) {
+				code = XFS_ERROR(EINVAL);
+				goto error_return;
+			}
 
 			if (XFS_IS_REALTIME_INODE(ip) ||
 			    ((mask & FSX_XFLAGS) &&
@@ -997,6 +1009,10 @@ xfs_ioctl_setattr(
 				       mp->m_sb.sb_blocklog;
 			} else {
 				size = mp->m_sb.sb_blocksize;
+				if (extsize_fsb > mp->m_sb.sb_agblocks / 2) {
+					code = XFS_ERROR(EINVAL);
+					goto error_return;
+				}
 			}
 
 			if (fa->fsx_extsize % size) {
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/8] xfs: handle CIl transaction commit failures correctly
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
                   ` (5 preceding siblings ...)
  2011-01-27  3:53 ` [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-27  3:53 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
  2011-01-27 15:31 ` [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Alex Elder
  8 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Failure to commit a transaction into the CIL is not handled
correctly. This currently can only happen when racing with a
shutdown and requires an explicit shutdown check, so it rare and can
be avoided. Remove the shutdown check and make the CIL commit a void
function to indicate it will always succeed, thereby removing the
incorrectly handled failure case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_log.h     |    2 +-
 fs/xfs/xfs_log_cil.c |    8 +-------
 fs/xfs/xfs_trans.c   |    5 +----
 3 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 916eb7d..3bd3291 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -191,7 +191,7 @@ void	  xfs_log_ticket_put(struct xlog_ticket *ticket);
 
 xlog_tid_t xfs_log_get_trans_ident(struct xfs_trans *tp);
 
-int	xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
+void	xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
 				struct xfs_log_vec *log_vector,
 				xfs_lsn_t *commit_lsn, int flags);
 bool	xfs_log_item_in_current_chkpt(struct xfs_log_item *lip);
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index c7eac5a..9ca59be 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -625,7 +625,7 @@ out_abort:
  * background commit, returns without it held once background commits are
  * allowed again.
  */
-int
+void
 xfs_log_commit_cil(
 	struct xfs_mount	*mp,
 	struct xfs_trans	*tp,
@@ -640,11 +640,6 @@ xfs_log_commit_cil(
 	if (flags & XFS_TRANS_RELEASE_LOG_RES)
 		log_flags = XFS_LOG_REL_PERM_RESERV;
 
-	if (XLOG_FORCED_SHUTDOWN(log)) {
-		xlog_cil_free_logvec(log_vector);
-		return XFS_ERROR(EIO);
-	}
-
 	/*
 	 * do all the hard work of formatting items (including memory
 	 * allocation) outside the CIL context lock. This prevents stalling CIL
@@ -704,7 +699,6 @@ xfs_log_commit_cil(
 	 */
 	if (push)
 		xlog_cil_push(log, 0);
-	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 29f5e54..7692279 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1755,7 +1755,6 @@ xfs_trans_commit_cil(
 	int			flags)
 {
 	struct xfs_log_vec	*log_vector;
-	int			error;
 
 	/*
 	 * Get each log item to allocate a vector structure for
@@ -1766,9 +1765,7 @@ xfs_trans_commit_cil(
 	if (!log_vector)
 		return ENOMEM;
 
-	error = xfs_log_commit_cil(mp, tp, log_vector, commit_lsn, flags);
-	if (error)
-		return error;
+	xfs_log_commit_cil(mp, tp, log_vector, commit_lsn, flags);
 
 	current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
 	xfs_trans_free(tp);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 8/8] xfs: fix dquot shaker deadlock
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
                   ` (6 preceding siblings ...)
  2011-01-27  3:53 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-27 15:29   ` Alex Elder
  2011-01-27 15:31 ` [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Alex Elder
  8 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
to unlock the dquot freelist when the number of loop restarts is
exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
reclaim. Remove the bogus loop exit check that causes the problem.

Reported-by: Malcolm Scott <lkml@malc.org.uk>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/quota/xfs_qm.c |   46 +++++++++++++++++++++-------------------------
 1 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/quota/xfs_qm.c b/fs/xfs/quota/xfs_qm.c
index f8e854b..206a281 100644
--- a/fs/xfs/quota/xfs_qm.c
+++ b/fs/xfs/quota/xfs_qm.c
@@ -1863,12 +1863,14 @@ xfs_qm_dqreclaim_one(void)
 	xfs_dquot_t	*dqpout;
 	xfs_dquot_t	*dqp;
 	int		restarts;
+	int		startagain;
 
 	restarts = 0;
 	dqpout = NULL;
 
 	/* lockorder: hashchainlock, freelistlock, mplistlock, dqlock, dqflock */
-startagain:
+again:
+	startagain = 0;
 	mutex_lock(&xfs_Gqm->qm_dqfrlist_lock);
 
 	list_for_each_entry(dqp, &xfs_Gqm->qm_dqfrlist, q_freelist) {
@@ -1885,13 +1887,10 @@ startagain:
 			ASSERT(! (dqp->dq_flags & XFS_DQ_INACTIVE));
 
 			trace_xfs_dqreclaim_want(dqp);
-
-			xfs_dqunlock(dqp);
-			mutex_unlock(&xfs_Gqm->qm_dqfrlist_lock);
-			if (++restarts >= XFS_QM_RECLAIM_MAX_RESTARTS)
-				return NULL;
 			XQM_STATS_INC(xqmstats.xs_qm_dqwants);
-			goto startagain;
+			restarts++;
+			startagain = 1;
+			goto dqunlock;
 		}
 
 		/*
@@ -1906,23 +1905,20 @@ startagain:
 			ASSERT(list_empty(&dqp->q_mplist));
 			list_del_init(&dqp->q_freelist);
 			xfs_Gqm->qm_dqfrlist_cnt--;
-			xfs_dqunlock(dqp);
 			dqpout = dqp;
 			XQM_STATS_INC(xqmstats.xs_qm_dqinact_reclaims);
-			break;
+			goto dqunlock;
 		}
 
 		ASSERT(dqp->q_hash);
 		ASSERT(!list_empty(&dqp->q_mplist));
 
 		/*
-		 * Try to grab the flush lock. If this dquot is in the process of
-		 * getting flushed to disk, we don't want to reclaim it.
+		 * Try to grab the flush lock. If this dquot is in the process
+		 * of getting flushed to disk, we don't want to reclaim it.
 		 */
-		if (!xfs_dqflock_nowait(dqp)) {
-			xfs_dqunlock(dqp);
-			continue;
-		}
+		if (!xfs_dqflock_nowait(dqp))
+			goto dqunlock;
 
 		/*
 		 * We have the flush lock so we know that this is not in the
@@ -1944,8 +1940,7 @@ startagain:
 				xfs_fs_cmn_err(CE_WARN, mp,
 			"xfs_qm_dqreclaim: dquot %p flush failed", dqp);
 			}
-			xfs_dqunlock(dqp); /* dqflush unlocks dqflock */
-			continue;
+			goto dqunlock;
 		}
 
 		/*
@@ -1967,13 +1962,8 @@ startagain:
 		 */
 		if (!mutex_trylock(&mp->m_quotainfo->qi_dqlist_lock)) {
 			restarts++;
-			mutex_unlock(&dqp->q_hash->qh_lock);
-			xfs_dqfunlock(dqp);
-			xfs_dqunlock(dqp);
-			mutex_unlock(&xfs_Gqm->qm_dqfrlist_lock);
-			if (restarts++ >= XFS_QM_RECLAIM_MAX_RESTARTS)
-				return NULL;
-			goto startagain;
+			startagain = 1;
+			goto qhunlock;
 		}
 
 		ASSERT(dqp->q_nrefs == 0);
@@ -1986,14 +1976,20 @@ startagain:
 		xfs_Gqm->qm_dqfrlist_cnt--;
 		dqpout = dqp;
 		mutex_unlock(&mp->m_quotainfo->qi_dqlist_lock);
+qhunlock:
 		mutex_unlock(&dqp->q_hash->qh_lock);
 dqfunlock:
 		xfs_dqfunlock(dqp);
+dqunlock:
 		xfs_dqunlock(dqp);
 		if (dqpout)
 			break;
 		if (restarts >= XFS_QM_RECLAIM_MAX_RESTARTS)
-			return NULL;
+			break;
+		if (startagain) {
+			mutex_unlock(&xfs_Gqm->qm_dqfrlist_lock);
+			goto again;
+		}
 	}
 	mutex_unlock(&xfs_Gqm->qm_dqfrlist_lock);
 	return dqpout;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 8/8] xfs: fix dquot shaker deadlock
  2011-01-27  3:53 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
@ 2011-01-27 15:29   ` Alex Elder
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Elder @ 2011-01-27 15:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-01-27 at 14:53 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
> to unlock the dquot freelist when the number of loop restarts is
> exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
> reclaim. Remove the bogus loop exit check that causes the problem.
> 
> Reported-by: Malcolm Scott <lkml@malc.org.uk>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Still not a pretty loop (and I have no suggestions
to improve it).  But it looks correct.

Please add a mention in the description that the
double-increment issue in the qi_dqlist_lock case
has been fixed.

Reviewed-by: Alex Elder <aelder@sgi.com>



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
                   ` (7 preceding siblings ...)
  2011-01-27  3:53 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
@ 2011-01-27 15:31 ` Alex Elder
  2011-01-28  0:42   ` Dave Chinner
  8 siblings, 1 reply; 17+ messages in thread
From: Alex Elder @ 2011-01-27 15:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-01-27 at 14:53 +1100, Dave Chinner wrote:
> Fixed the various issues found during review and added all the
> reviewed-by tags to the fully reviewed commits.

I'll be testing with this today.  Barring any unforseen
problems I'm just waiting for a pull request from you
(or the go-ahead to just use the patches as posted).

I plan to push the result to oss.sgi.com tonight so it
can be used in -next before sending it to Linus.

					-Alex

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3
  2011-01-27 15:31 ` [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Alex Elder
@ 2011-01-28  0:42   ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-28  0:42 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Thu, Jan 27, 2011 at 09:31:05AM -0600, Alex Elder wrote:
> On Thu, 2011-01-27 at 14:53 +1100, Dave Chinner wrote:
> > Fixed the various issues found during review and added all the
> > reviewed-by tags to the fully reviewed commits.
> 
> I'll be testing with this today.  Barring any unforseen
> problems I'm just waiting for a pull request from you
> (or the go-ahead to just use the patches as posted).

Ok, still need a review on patch 2/8 (efi item leak) before I'll
send a pull request...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-27  3:53 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
@ 2011-01-28 14:54   ` Alex Elder
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Elder @ 2011-01-28 14:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-01-27 at 14:53 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> After test 139, kmemleak shows:
> 
> unreferenced object 0xffff880078b405d8 (size 400):
>   comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
>   hex dump (first 32 bytes):
>     60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
>     [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
>     [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
>     [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
>     [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
>     [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
>     [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
>     [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
>     [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
>     [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
>     [<ffffffff8117dc00>] notify_change+0x170/0x2e0
>     [<ffffffff81163bf6>] do_truncate+0x66/0xa0
>     [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
>     [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
>     [<ffffffffffffffff>] 0xffffffffffffffff
> 
> The cause of the leak is that the "remove" parameter of IOP_UNPIN()
> is never set when a CIL push is aborted. This means that the EFI
> item is never freed if it was in the push being cancelled. The
> problem is specific to delayed logging, but has uncovered a couple
> of problems with the handling of IOP_UNPIN(remove).
> 
> Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
> in the CIL commit failure path or the iclog write failure path
> because for delayed loging we have no transaction context. Hence we
> must only call xfs_trans_del_item() if the log item being unpinned
> has an active log item descriptor.
> 
> Secondly, xfs_trans_uncommit() does not handle log item descriptor
> freeing during the traversal of log items on a transaction. It can
> reference a freed log item descriptor when unpinning an EFI item.
> Hence it needs to use a safe list traversal method to allow items to
> be removed from the transaction during IOP_UNPIN().
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

OK, this looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25 23:53   ` Christoph Hellwig
@ 2011-01-27  0:35     ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-27  0:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Jan 25, 2011 at 06:53:54PM -0500, Christoph Hellwig wrote:
> On Tue, Jan 25, 2011 at 07:50:38PM +1100, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> > index 75f2ef6..effbb41 100644
> > --- a/fs/xfs/xfs_extfree_item.c
> > +++ b/fs/xfs/xfs_extfree_item.c
> > @@ -138,7 +138,6 @@ xfs_efi_item_unpin(
> >  
> >  	if (remove) {
> >  		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
> > -		xfs_trans_del_item(lip);
> >  		xfs_efi_item_free(efip);
> >  		return;
> >  	}
> 
> >  STATIC void
> >  xfs_trans_uncommit(
> >  	struct xfs_trans	*tp,
> >  	uint			flags)
> >  {
> > -	struct xfs_log_item_desc *lidp;
> > +	struct xfs_log_item_desc *lidp, *n;
> >  
> > -	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
> > +	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
> >  		/*
> >  		 * Unpin all but those that aren't dirty.
> >  		 */
> > -		if (lidp->lid_flags & XFS_LID_DIRTY)
> > +		if (lidp->lid_flags & XFS_LID_DIRTY) {
> > +			xfs_trans_del_item(lidp->lid_item);
> >  			IOP_UNPIN(lidp->lid_item, 1);
> > +		}
> 
> This part of the patch isn't explained in the changelog, and I suspect
> it should be a separate patch from the addition of the IOP_UNPIN with
> remove call to the CIL error path.

Doing xfs_trans_del_item() inside IOP_UNPIN() during CIl commit
faliure processing causes a panic - there is not descriptior or
trransaction attached to the item at that point. Hence it is simply
not safe to call IOP_UNPIN(aborted) from outside a transaction
context if we allow xfs_trans_del_item() to be run inside the
IOP_UNPIN() call.

Further, if we do xfs_trans_del_item() inside the IOP_UNPIN(aborted)
path, we corrupt the list walk in xfs_trans_uncommit() case. Hence
it needs to be an list_for_each_entry_safe() walk, and must do
explicit removal of the item from the transaction because the item
can be freed in IOP_UNPIN(aborted).

> Moving the xfs_trans_del_item from the IOP_UNPIN implementation into
> the caller seems sane to me, but:
> 
>  - why is the call to xfs_trans_del_item left in xfs_buf_item_unpin

Because I didn't notice it. It never triggers in the
xfs_trans_uncommit() case as there are two references on the buf
item - one for the pin, and one for the lock. We call IOP_UNPIN()
first, so it never goes down the (freed && stale) path in this case.

As to why it hasn't triggered from a iclog write completion or CIL
checkpoint failure after the IOP_UNLOCK() has been run and dropped
that reference - just luck I guess....

>  - why did xfs_buf_item_unpin get away only calling it for the stale
>    case, and the other log item implementations completely without
>    it

The stale buffer case is special - it has a different lifecycle to
all other buffer items (and all other log item types, for that
matter). Basically a stale buffer never enters the AIL on
transaction commit, so on the last reference going away it needs to
be freed rather than continuing to live while being tracked in the
AIL.

> I suspect the answer lies in xfs_trans_free_items opencoding the
> call to xfs_trans_del_item, thus avoiding any leak of log item
> descriptors or log items on the transaction list.

Actually, I don't think it does handle this case - we've already
removed all the dirty items from the transaction, so it appears to
me like they just get leaked.

> By adding the
> call of xfs_trans_del_item to xfs_trans_uncommit we thus skip
> the calls to IOP_UNLOCK for dirty log items, which is a large
> change in behaviour for the existing log items that didn't have
> the xfs_trans_del_item calls, and at least for the dquot and
> inode items seems incorrect.

You are right, it does break xfs_trans_free_items() and the way it
calls IOP_UNLOCK(). I'm surprised that it didn't cause any noticable
leaks of locked buffers....

----

Effectively, we've got a situation where we need to allow
xfs_trans_del_item() to be called in IOP_UNPIN(aborted) iff there is
a log item descriptor attached to the log item. That requires
xfs_trans_uncommit() to do a safe list traversal and both the efi
and buf items to check for a valid lidp before calling
xfs_trans_del_item(). I'll change the fix and commment it better in
the code and changelog.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item leak on forced shutdown Dave Chinner
  2011-01-25 23:53   ` Christoph Hellwig
@ 2011-01-26 21:22   ` Alex Elder
  1 sibling, 0 replies; 17+ messages in thread
From: Alex Elder @ 2011-01-26 21:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> After test 139, kmemleak shows:
> 
> unreferenced object 0xffff880078b405d8 (size 400):
>   comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
>   hex dump (first 32 bytes):
. . .

> 
> The cause of the leak is that the "remove" parameter of IOP_UNPIN()
> is never set when a CIL push is aborted. This means that the EFI
> item is never freed if it was in the push being cancelled. The
> problem is specific to delayed logging.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Like Christoph, I was a bit confused about xfs_buf_item_unpin()
retaining the call to xfs_trans_del_item().  They all ought to
be done consistently.  Hopefully you can straighten that out.

Two other minor comments:  I'd prefer that an explicit "1" be
passed to IOP_UNPIN() rather than the variable "aborted" (which is
known to have non-zero value) in xfs_trans_committed_bulk().  And
there are some long-lined comments right near what you're changing
that you could touch up while you're at it.

> ---
>  fs/xfs/xfs_extfree_item.c |    1 -
>  fs/xfs/xfs_trans.c        |   35 ++++++++++++++++++++++++++++++-----
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
. . .
> @@ -1472,6 +1480,16 @@ xfs_trans_committed_bulk(
>  		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
>  			continue;
>  
> +		/*
> +		 * if we are aborting the operation, no point in inserting the
> +		 * object into the AIL as we are in a shutdown situation.
> +		 */
> +		if (aborted) {
> +			ASSERT(XFS_FORCED_SHUTDOWN(ailp->xa_mount));
> +			IOP_UNPIN(lip, aborted);
                            Here           ^^^

> +			continue;
> +		}
> +
>  		if (item_lsn != commit_lsn) {
>  
>  			/*


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item leak on forced shutdown Dave Chinner
@ 2011-01-25 23:53   ` Christoph Hellwig
  2011-01-27  0:35     ` Dave Chinner
  2011-01-26 21:22   ` Alex Elder
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2011-01-25 23:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 25, 2011 at 07:50:38PM +1100, Dave Chinner wrote:
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 75f2ef6..effbb41 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -138,7 +138,6 @@ xfs_efi_item_unpin(
>  
>  	if (remove) {
>  		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
> -		xfs_trans_del_item(lip);
>  		xfs_efi_item_free(efip);
>  		return;
>  	}

>  STATIC void
>  xfs_trans_uncommit(
>  	struct xfs_trans	*tp,
>  	uint			flags)
>  {
> -	struct xfs_log_item_desc *lidp;
> +	struct xfs_log_item_desc *lidp, *n;
>  
> -	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
> +	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
>  		/*
>  		 * Unpin all but those that aren't dirty.
>  		 */
> -		if (lidp->lid_flags & XFS_LID_DIRTY)
> +		if (lidp->lid_flags & XFS_LID_DIRTY) {
> +			xfs_trans_del_item(lidp->lid_item);
>  			IOP_UNPIN(lidp->lid_item, 1);
> +		}

This part of the patch isn't explained in the changelog, and I suspect
it should be a separate patch from the addition of the IOP_UNPIN with
remove call to the CIL error path.

Moving the xfs_trans_del_item from the IOP_UNPIN implementation into
the caller seems sane to me, but:

 - why is the call to xfs_trans_del_item left in xfs_buf_item_unpin
 - why did xfs_buf_item_unpin get away only calling it for the stale
   case, and the other log item implementations completely without
   it

I suspect the answer lies in xfs_trans_free_items opencoding the
call to xfs_trans_del_item, thus avoiding any leak of log item
descriptors or log items on the transaction list.  By adding the
call of xfs_trans_del_item to xfs_trans_uncommit we thus skip
the calls to IOP_UNLOCK for dirty log items, which is a large
change in behaviour for the existing log items that didn't have
the xfs_trans_del_item calls, and at least for the dquot and
inode items seems incorrect.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-25 23:53   ` Christoph Hellwig
  2011-01-26 21:22   ` Alex Elder
  0 siblings, 2 replies; 17+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

After test 139, kmemleak shows:

unreferenced object 0xffff880078b405d8 (size 400):
  comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
  hex dump (first 32 bytes):
    60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
    [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
    [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
    [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
    [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
    [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
    [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
    [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
    [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
    [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
    [<ffffffff8117dc00>] notify_change+0x170/0x2e0
    [<ffffffff81163bf6>] do_truncate+0x66/0xa0
    [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
    [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

The cause of the leak is that the "remove" parameter of IOP_UNPIN()
is never set when a CIL push is aborted. This means that the EFI
item is never freed if it was in the push being cancelled. The
problem is specific to delayed logging.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_extfree_item.c |    1 -
 fs/xfs/xfs_trans.c        |   35 ++++++++++++++++++++++++++++++-----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 75f2ef6..effbb41 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -138,7 +138,6 @@ xfs_efi_item_unpin(
 
 	if (remove) {
 		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
-		xfs_trans_del_item(lip);
 		xfs_efi_item_free(efip);
 		return;
 	}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 33dbc4e..27ecb74 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1446,6 +1446,14 @@ xfs_log_item_batch_insert(
  * Bulk operation version of xfs_trans_committed that takes a log vector of
  * items to insert into the AIL. This uses bulk AIL insertion techniques to
  * minimise lock traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through IOP_COMMITED and IOP_UNLOCK, which
+ * means that checkpoint commit abort handling is treated exactly the same
+ * as an iclog write error even though we haven't started any IO yet. Hence in
+ * this case all we need to do is IOP_COMMITTED processing, followed by an
+ * IOP_UNPIN(aborted) call.
  */
 void
 xfs_trans_committed_bulk(
@@ -1472,6 +1480,16 @@ xfs_trans_committed_bulk(
 		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
 			continue;
 
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(XFS_FORCED_SHUTDOWN(ailp->xa_mount));
+			IOP_UNPIN(lip, aborted);
+			continue;
+		}
+
 		if (item_lsn != commit_lsn) {
 
 			/*
@@ -1503,22 +1521,29 @@ xfs_trans_committed_bulk(
 }
 
 /*
- * Called from the trans_commit code when we notice that
- * the filesystem is in the middle of a forced shutdown.
+ * Called from the trans_commit code when we notice that the filesystem is in
+ * the middle of a forced shutdown.
+ *
+ * When we are called here, we have already pinned all the items in the
+ * transaction. However, neither IOP_COMMITTING or IOP_UNLOCK has been called
+ * so we can simply walk the items in the transaction, unlink them from the
+ * transaction, unpin them with an abort flag and then free the items.
  */
 STATIC void
 xfs_trans_uncommit(
 	struct xfs_trans	*tp,
 	uint			flags)
 {
-	struct xfs_log_item_desc *lidp;
+	struct xfs_log_item_desc *lidp, *n;
 
-	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
+	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
 		/*
 		 * Unpin all but those that aren't dirty.
 		 */
-		if (lidp->lid_flags & XFS_LID_DIRTY)
+		if (lidp->lid_flags & XFS_LID_DIRTY) {
+			xfs_trans_del_item(lidp->lid_item);
 			IOP_UNPIN(lidp->lid_item, 1);
+		}
 	}
 
 	xfs_trans_unreserve_and_mod_sb(tp);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-01-28 14:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
2011-01-27  3:53 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
2011-01-27  3:53 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
2011-01-28 14:54   ` Alex Elder
2011-01-27  3:53 ` [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly Dave Chinner
2011-01-27  3:53 ` [PATCH 4/8] xfs: limit extent length for allocation to AG size Dave Chinner
2011-01-27  3:53 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
2011-01-27  3:53 ` [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN Dave Chinner
2011-01-27  3:53 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
2011-01-27  3:53 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
2011-01-27 15:29   ` Alex Elder
2011-01-27 15:31 ` [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Alex Elder
2011-01-28  0:42   ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item leak on forced shutdown Dave Chinner
2011-01-25 23:53   ` Christoph Hellwig
2011-01-27  0:35     ` Dave Chinner
2011-01-26 21:22   ` Alex Elder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.