All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2
@ 2011-01-25  8:50 Dave Chinner
  2011-01-25  8:50 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

These are all the pending 2.6.38-rc candidate fixes I have in my queue.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/8] xfs: fix log ticket leak on forced shutdown.
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-26 21:22   ` Alex Elder
  2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The kmemleak detector shows this after test 139:

unreferenced object 0xffff880079b88bb0 (size 264):
  comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff  ........H{......
  backtrace:
    [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
    [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
    [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
    [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
    [<ffffffff8148f394>] xlog_ticket_alloc+0x34/0x170
    [<ffffffff81494444>] xlog_cil_push+0xa4/0x3f0
    [<ffffffff81494eca>] xlog_cil_force_lsn+0x15a/0x160
    [<ffffffff814933a5>] _xfs_log_force_lsn+0x75/0x2d0
    [<ffffffff814a264d>] _xfs_trans_commit+0x2bd/0x2f0
    [<ffffffff8148bfdd>] xfs_iomap_write_allocate+0x1ad/0x350
    [<ffffffff814ac17f>] xfs_map_blocks+0x21f/0x370
    [<ffffffff814ad1b7>] xfs_vm_writepage+0x1c7/0x550
    [<ffffffff8112200a>] __writepage+0x1a/0x50
    [<ffffffff81122df2>] write_cache_pages+0x1c2/0x4c0
    [<ffffffff81123117>] generic_writepages+0x27/0x30
    [<ffffffff814aba5d>] xfs_vm_writepages+0x5d/0x80

By inspection, the leak occurs when xlog_write() returns and error
and we jump to the abort path without dropping the reference on the
active ticket.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log_cil.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 9dc8125..c7eac5a 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -543,7 +543,7 @@ xlog_cil_push(
 
 	error = xlog_write(log, &lvhdr, tic, &ctx->start_lsn, NULL, 0);
 	if (error)
-		goto out_abort;
+		goto out_abort_free_ticket;
 
 	/*
 	 * now that we've written the checkpoint into the log, strictly
@@ -569,8 +569,9 @@ restart:
 	}
 	spin_unlock(&cil->xc_cil_lock);
 
+	/* xfs_log_done always frees the ticket on error. */
 	commit_lsn = xfs_log_done(log->l_mp, tic, &commit_iclog, 0);
-	if (error || commit_lsn == -1)
+	if (commit_lsn == -1)
 		goto out_abort;
 
 	/* attach all the transactions w/ busy extents to iclog */
@@ -600,6 +601,8 @@ out_free_ticket:
 	kmem_free(new_ctx);
 	return 0;
 
+out_abort_free_ticket:
+	xfs_log_ticket_put(tic);
 out_abort:
 	xlog_cil_committed(ctx, XFS_LI_ABORTED);
 	return XFS_ERROR(EIO);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
  2011-01-25  8:50 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-25 23:53   ` Christoph Hellwig
  2011-01-26 21:22   ` Alex Elder
  2011-01-25  8:50 ` [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly Dave Chinner
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

After test 139, kmemleak shows:

unreferenced object 0xffff880078b405d8 (size 400):
  comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
  hex dump (first 32 bytes):
    60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
    [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
    [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
    [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
    [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
    [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
    [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
    [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
    [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
    [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
    [<ffffffff8117dc00>] notify_change+0x170/0x2e0
    [<ffffffff81163bf6>] do_truncate+0x66/0xa0
    [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
    [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

The cause of the leak is that the "remove" parameter of IOP_UNPIN()
is never set when a CIL push is aborted. This means that the EFI
item is never freed if it was in the push being cancelled. The
problem is specific to delayed logging.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_extfree_item.c |    1 -
 fs/xfs/xfs_trans.c        |   35 ++++++++++++++++++++++++++++++-----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 75f2ef6..effbb41 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -138,7 +138,6 @@ xfs_efi_item_unpin(
 
 	if (remove) {
 		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
-		xfs_trans_del_item(lip);
 		xfs_efi_item_free(efip);
 		return;
 	}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 33dbc4e..27ecb74 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1446,6 +1446,14 @@ xfs_log_item_batch_insert(
  * Bulk operation version of xfs_trans_committed that takes a log vector of
  * items to insert into the AIL. This uses bulk AIL insertion techniques to
  * minimise lock traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through IOP_COMMITED and IOP_UNLOCK, which
+ * means that checkpoint commit abort handling is treated exactly the same
+ * as an iclog write error even though we haven't started any IO yet. Hence in
+ * this case all we need to do is IOP_COMMITTED processing, followed by an
+ * IOP_UNPIN(aborted) call.
  */
 void
 xfs_trans_committed_bulk(
@@ -1472,6 +1480,16 @@ xfs_trans_committed_bulk(
 		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
 			continue;
 
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(XFS_FORCED_SHUTDOWN(ailp->xa_mount));
+			IOP_UNPIN(lip, aborted);
+			continue;
+		}
+
 		if (item_lsn != commit_lsn) {
 
 			/*
@@ -1503,22 +1521,29 @@ xfs_trans_committed_bulk(
 }
 
 /*
- * Called from the trans_commit code when we notice that
- * the filesystem is in the middle of a forced shutdown.
+ * Called from the trans_commit code when we notice that the filesystem is in
+ * the middle of a forced shutdown.
+ *
+ * When we are called here, we have already pinned all the items in the
+ * transaction. However, neither IOP_COMMITTING or IOP_UNLOCK has been called
+ * so we can simply walk the items in the transaction, unlink them from the
+ * transaction, unpin them with an abort flag and then free the items.
  */
 STATIC void
 xfs_trans_uncommit(
 	struct xfs_trans	*tp,
 	uint			flags)
 {
-	struct xfs_log_item_desc *lidp;
+	struct xfs_log_item_desc *lidp, *n;
 
-	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
+	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
 		/*
 		 * Unpin all but those that aren't dirty.
 		 */
-		if (lidp->lid_flags & XFS_LID_DIRTY)
+		if (lidp->lid_flags & XFS_LID_DIRTY) {
+			xfs_trans_del_item(lidp->lid_item);
 			IOP_UNPIN(lidp->lid_item, 1);
+		}
 	}
 
 	xfs_trans_unreserve_and_mod_sb(tp);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
  2011-01-25  8:50 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
  2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-26 21:22   ` Alex Elder
  2011-01-25  8:50 ` [PATCH 4/8] xfs: limit extent length for allocation to AG size Dave Chinner
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

rounddown_power_of_2() returns an undefined result when passed a
value of zero. The specualtive delayed allocation code is doing this
when the inode is zero length. Hence occasionally the preallocation
is much, much larger than is necessary (e.g. 8GB for a 270 _byte_
file). Ensure we don't even pass a zero value to this function so
the result of preallocation is always the desired size.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 55582bd..8a0f044 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -337,7 +337,12 @@ xfs_iomap_prealloc_size(
 		int shift = 0;
 		int64_t freesp;
 
-		alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size);
+		/*
+		 * rounddown_pow_of_two() returns an undefined result
+		 * if we pass in alloc_blocks = 0. Hence the "+ 1" to
+		 * ensure we always pass in a non-zero value.
+		 */
+		alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size) + 1;
 		alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
 					rounddown_pow_of_two(alloc_blocks));
 
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/8] xfs: limit extent length for allocation to AG size
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
                   ` (2 preceding siblings ...)
  2011-01-25  8:50 ` [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-26 21:22   ` Alex Elder
  2011-01-25  8:50 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Delayed allocation extents can be larger than AGs, so when trying to
convert a large range we may scan every AG inside
xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than
an AG. We should stop when we find the first AG with a maximum
possible allocation size. This causes excessive CPU usage when there
are lots of AGs.

The same problem occurs when doing preallocation of a range larger
than an AG.

Fix the problem by limiting real allocation lengths to the maximum
that an AG can support. This means if we have empty AGs, we'll stop
the search at the first of them. If there are no empty AGs, we'll
still scan them all, but that is a different problem....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_alloc.h |   16 ++++++++++++++++
 fs/xfs/xfs_bmap.c  |   16 +++++++++-------
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_alloc.h b/fs/xfs/xfs_alloc.h
index 0ab56b3..6ad45b9 100644
--- a/fs/xfs/xfs_alloc.h
+++ b/fs/xfs/xfs_alloc.h
@@ -75,6 +75,22 @@ typedef unsigned int xfs_alloctype_t;
 #define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
 
 /*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size the AG. However, we cannot use all the
+ * blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ *	- the AG superblock, AGF, AGI and AGFL
+ *	- the AGF (bno and cnt) and AGI btree root blocks
+ *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
+	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) + 7)
+
+
+/*
  * Argument structure for xfs_alloc routines.
  * This is turned into a structure to avoid having 20 arguments passed
  * down several levels of the stack.
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 4111cd3..74861c6 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -2430,7 +2430,7 @@ xfs_bmap_btalloc_nullfb(
 		startag = ag = 0;
 
 	pag = xfs_perag_get(mp, ag);
-	while (*blen < ap->alen) {
+	while (*blen < args->maxlen) {
 		if (!pag->pagf_init) {
 			error = xfs_alloc_pagf_init(mp, args->tp, ag,
 						    XFS_ALLOC_FLAG_TRYLOCK);
@@ -2452,7 +2452,7 @@ xfs_bmap_btalloc_nullfb(
 			notinit = 1;
 
 		if (xfs_inode_is_filestream(ap->ip)) {
-			if (*blen >= ap->alen)
+			if (*blen >= args->maxlen)
 				break;
 
 			if (ap->userdata) {
@@ -2498,14 +2498,14 @@ xfs_bmap_btalloc_nullfb(
 	 * If the best seen length is less than the request
 	 * length, use the best as the minimum.
 	 */
-	else if (*blen < ap->alen)
+	else if (*blen < args->maxlen)
 		args->minlen = *blen;
 	/*
 	 * Otherwise we've seen an extent as big as alen,
 	 * use that as the minimum.
 	 */
 	else
-		args->minlen = ap->alen;
+		args->minlen = args->maxlen;
 
 	/*
 	 * set the failure fallback case to look in the selected
@@ -2573,7 +2573,9 @@ xfs_bmap_btalloc(
 	args.tp = ap->tp;
 	args.mp = mp;
 	args.fsbno = ap->rval;
-	args.maxlen = MIN(ap->alen, mp->m_sb.sb_agblocks);
+
+	/* Trim the allocation back to the maximum an AG can fit. */
+	args.maxlen = MIN(ap->alen, XFS_ALLOC_AG_MAX_USABLE(mp));
 	args.firstblock = ap->firstblock;
 	blen = 0;
 	if (nullfb) {
@@ -2621,7 +2623,7 @@ xfs_bmap_btalloc(
 			/*
 			 * Adjust for alignment
 			 */
-			if (blen > args.alignment && blen <= ap->alen)
+			if (blen > args.alignment && blen <= args.maxlen)
 				args.minlen = blen - args.alignment;
 			args.minalignslop = 0;
 		} else {
@@ -2640,7 +2642,7 @@ xfs_bmap_btalloc(
 			 * of minlen+alignment+slop doesn't go up
 			 * between the calls.
 			 */
-			if (blen > mp->m_dalign && blen <= ap->alen)
+			if (blen > mp->m_dalign && blen <= args.maxlen)
 				nextminlen = blen - mp->m_dalign;
 			else
 				nextminlen = args.minlen;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
                   ` (3 preceding siblings ...)
  2011-01-25  8:50 ` [PATCH 4/8] xfs: limit extent length for allocation to AG size Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-25  9:49   ` Christoph Hellwig
  2011-01-26 21:22   ` Alex Elder
  2011-01-25  8:50 ` [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN Dave Chinner
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When doing delayed allocation, if the allocation size is for a
maximally sized extent, extent size alignment can push it over this
limit. This results in an assert failure in xfs_bmbt_set_allf() as
the extent length is too large to find in the extent record.

Fix this by ensuring that we allow for space that extent size
alignment requires (up to 2 * (extsize -1) blocks as we have to
handle both head and tail alignment) when limiting the maximum size
of the extent.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 74861c6..3e5a91a 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -4487,6 +4487,16 @@ xfs_bmapi(
 				/* Figure out the extent size, adjust alen */
 				extsz = xfs_get_extsz_hint(ip);
 				if (extsz) {
+					/*
+					 * make sure we don't exceed a single
+					 * extent length when we align the
+					 * extent by reducing length we are
+					 * going to allocate by the maximum
+					 * amount extent size aligment may
+					 * require.
+					 */
+					alen = XFS_FILBLKS_MIN( len,
+						   MAXEXTLEN - (2 * extsz - 1));
 					error = xfs_bmap_extsize_align(mp,
 							&got, &prev, extsz,
 							rt, eof,
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
                   ` (4 preceding siblings ...)
  2011-01-25  8:50 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-26 21:23   ` Alex Elder
  2011-01-25  8:50 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The extent size hint can be set to larger than an AG. This means
that the alignment process can push the range to be allocated
outside the bounds of the AG, resulting in assert failures or
corrupted bmbt records. Similarly, if the extsize is larger than the
maximum extent size supported, the alignment process will produce
extents that are too large to fit into the bmbt records, resulting
in a different type of assert/corruption failure.

Fix this by limiting extsize at the time іt is set firstly to be
less than MAXEXTLEN, then to be a maximum of half the size of the
AGs in the filesystem for non-realtime inodes. Realtime inodes do
not allocate out of AGs, so don't have to be restricted by the size
of AGs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/linux-2.6/xfs_ioctl.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_ioctl.c b/fs/xfs/linux-2.6/xfs_ioctl.c
index b06ede1..f5e2a19 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl.c
@@ -985,10 +985,22 @@ xfs_ioctl_setattr(
 
 		/*
 		 * Extent size must be a multiple of the appropriate block
-		 * size, if set at all.
+		 * size, if set at all. It must also be smaller than the
+		 * maximum extent size supported by the filesystem.
+		 *
+		 * Also, for non-realtime files, limit the extent size hint to
+		 * half the size of the AGs in the filesystem so alignment
+		 * doesn't result in extents larger than an AG.
 		 */
 		if (fa->fsx_extsize != 0) {
-			xfs_extlen_t	size;
+			xfs_extlen_t    size;
+			xfs_fsblock_t   extsize_fsb;
+
+			extsize_fsb = XFS_B_TO_FSB(mp, fa->fsx_extsize);
+			if (extsize_fsb > MAXEXTLEN) {
+				code = XFS_ERROR(EINVAL);
+				goto error_return;
+			}
 
 			if (XFS_IS_REALTIME_INODE(ip) ||
 			    ((mask & FSX_XFLAGS) &&
@@ -997,6 +1009,10 @@ xfs_ioctl_setattr(
 				       mp->m_sb.sb_blocklog;
 			} else {
 				size = mp->m_sb.sb_blocksize;
+				if (extsize_fsb > mp->m_sb.sb_agblocks / 2) {
+					code = XFS_ERROR(EINVAL);
+					goto error_return;
+				}
 			}
 
 			if (fa->fsx_extsize % size) {
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/8] xfs: handle CIl transaction commit failures correctly
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
                   ` (5 preceding siblings ...)
  2011-01-25  8:50 ` [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-25  9:53   ` Christoph Hellwig
  2011-01-26 21:23   ` Alex Elder
  2011-01-25  8:50 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Failure to commit a transaction into the CIL is not handled
correctly. This currently can only happen when racing with a
shutdown and requires an explicit shutdown check, so it rare and can
be avoided. Remove the shutdown check and make the CIL commit a void
function to indicate it will always succeed, thereby removing the
incorrectly handled failure case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.h     |    2 +-
 fs/xfs/xfs_log_cil.c |    7 +------
 fs/xfs/xfs_trans.c   |    4 +---
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 916eb7d..3bd3291 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -191,7 +191,7 @@ void	  xfs_log_ticket_put(struct xlog_ticket *ticket);
 
 xlog_tid_t xfs_log_get_trans_ident(struct xfs_trans *tp);
 
-int	xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
+void	xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
 				struct xfs_log_vec *log_vector,
 				xfs_lsn_t *commit_lsn, int flags);
 bool	xfs_log_item_in_current_chkpt(struct xfs_log_item *lip);
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index c7eac5a..4f9ba86 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -625,7 +625,7 @@ out_abort:
  * background commit, returns without it held once background commits are
  * allowed again.
  */
-int
+void
 xfs_log_commit_cil(
 	struct xfs_mount	*mp,
 	struct xfs_trans	*tp,
@@ -640,11 +640,6 @@ xfs_log_commit_cil(
 	if (flags & XFS_TRANS_RELEASE_LOG_RES)
 		log_flags = XFS_LOG_REL_PERM_RESERV;
 
-	if (XLOG_FORCED_SHUTDOWN(log)) {
-		xlog_cil_free_logvec(log_vector);
-		return XFS_ERROR(EIO);
-	}
-
 	/*
 	 * do all the hard work of formatting items (including memory
 	 * allocation) outside the CIL context lock. This prevents stalling CIL
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 27ecb74..f1ce10f 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1769,9 +1769,7 @@ xfs_trans_commit_cil(
 	if (!log_vector)
 		return ENOMEM;
 
-	error = xfs_log_commit_cil(mp, tp, log_vector, commit_lsn, flags);
-	if (error)
-		return error;
+	xfs_log_commit_cil(mp, tp, log_vector, commit_lsn, flags);
 
 	current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
 	xfs_trans_free(tp);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 8/8] xfs: fix dquot shaker deadlock
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
                   ` (6 preceding siblings ...)
  2011-01-25  8:50 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
@ 2011-01-25  8:50 ` Dave Chinner
  2011-01-25  9:52   ` Christoph Hellwig
  2011-01-26 21:23   ` Alex Elder
  2011-01-25  9:20 ` [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Christoph Hellwig
  2011-01-26 21:23 ` Alex Elder
  9 siblings, 2 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-25  8:50 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
to unlock the dquot freelist when the number of loop restarts is
exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
reclaim. Remove the bogus loop exit check that causes the problem.

Reported-by: Malcolm Scott <lkml@malc.org.uk>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/quota/xfs_qm.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/quota/xfs_qm.c b/fs/xfs/quota/xfs_qm.c
index f8e854b..9431c56 100644
--- a/fs/xfs/quota/xfs_qm.c
+++ b/fs/xfs/quota/xfs_qm.c
@@ -1992,8 +1992,6 @@ dqfunlock:
 		xfs_dqunlock(dqp);
 		if (dqpout)
 			break;
-		if (restarts >= XFS_QM_RECLAIM_MAX_RESTARTS)
-			return NULL;
 	}
 	mutex_unlock(&xfs_Gqm->qm_dqfrlist_lock);
 	return dqpout;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
                   ` (7 preceding siblings ...)
  2011-01-25  8:50 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
@ 2011-01-25  9:20 ` Christoph Hellwig
  2011-01-26 21:23 ` Alex Elder
  9 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-01-25  9:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 25, 2011 at 07:50:36PM +1100, Dave Chinner wrote:
> These are all the pending 2.6.38-rc candidate fixes I have in my queue.

I'd limit the .38 series to actual regression fixes, which means
skipping patches 1, 2 and 7.  OTOH we really need to get the roundown
fix into Linus' tree ASAP as quite a few people hit it.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size
  2011-01-25  8:50 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
@ 2011-01-25  9:49   ` Christoph Hellwig
  2011-01-26 21:22   ` Alex Elder
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-01-25  9:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,


Reviewed-by: Christoph Hellwig <hch@lst.de>

> +					alen = XFS_FILBLKS_MIN( len,

just a little whitespace damage here.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] xfs: fix dquot shaker deadlock
  2011-01-25  8:50 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
@ 2011-01-25  9:52   ` Christoph Hellwig
  2011-01-27  1:54     ` Dave Chinner
  2011-01-26 21:23   ` Alex Elder
  1 sibling, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-01-25  9:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 25, 2011 at 07:50:44PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
> to unlock the dquot freelist when the number of loop restarts is
> exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
> reclaim. Remove the bogus loop exit check that causes the problem.

The fix looks correct, but it's a bit inconsequential about when
to adhere the retry limit and when not.  Shouldn't we just turn the
exit condition into:

	if (dqout || restarts >= XFS_QM_RECLAIM_MAX_RESTARTS)
		break;

also the failure to acquite qi_dqlist_lock increments the restart
count twice, which was also added in the same commit.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/8] xfs: handle CIl transaction commit failures correctly
  2011-01-25  8:50 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
@ 2011-01-25  9:53   ` Christoph Hellwig
  2011-01-26 21:23   ` Alex Elder
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-01-25  9:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 25, 2011 at 07:50:43PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Failure to commit a transaction into the CIL is not handled
> correctly. This currently can only happen when racing with a
> shutdown and requires an explicit shutdown check, so it rare and can
> be avoided. Remove the shutdown check and make the CIL commit a void
> function to indicate it will always succeed, thereby removing the
> incorrectly handled failure case.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
@ 2011-01-25 23:53   ` Christoph Hellwig
  2011-01-27  0:35     ` Dave Chinner
  2011-01-26 21:22   ` Alex Elder
  1 sibling, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-01-25 23:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 25, 2011 at 07:50:38PM +1100, Dave Chinner wrote:
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 75f2ef6..effbb41 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -138,7 +138,6 @@ xfs_efi_item_unpin(
>  
>  	if (remove) {
>  		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
> -		xfs_trans_del_item(lip);
>  		xfs_efi_item_free(efip);
>  		return;
>  	}

>  STATIC void
>  xfs_trans_uncommit(
>  	struct xfs_trans	*tp,
>  	uint			flags)
>  {
> -	struct xfs_log_item_desc *lidp;
> +	struct xfs_log_item_desc *lidp, *n;
>  
> -	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
> +	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
>  		/*
>  		 * Unpin all but those that aren't dirty.
>  		 */
> -		if (lidp->lid_flags & XFS_LID_DIRTY)
> +		if (lidp->lid_flags & XFS_LID_DIRTY) {
> +			xfs_trans_del_item(lidp->lid_item);
>  			IOP_UNPIN(lidp->lid_item, 1);
> +		}

This part of the patch isn't explained in the changelog, and I suspect
it should be a separate patch from the addition of the IOP_UNPIN with
remove call to the CIL error path.

Moving the xfs_trans_del_item from the IOP_UNPIN implementation into
the caller seems sane to me, but:

 - why is the call to xfs_trans_del_item left in xfs_buf_item_unpin
 - why did xfs_buf_item_unpin get away only calling it for the stale
   case, and the other log item implementations completely without
   it

I suspect the answer lies in xfs_trans_free_items opencoding the
call to xfs_trans_del_item, thus avoiding any leak of log item
descriptors or log items on the transaction list.  By adding the
call of xfs_trans_del_item to xfs_trans_uncommit we thus skip
the calls to IOP_UNLOCK for dirty log items, which is a large
change in behaviour for the existing log items that didn't have
the xfs_trans_del_item calls, and at least for the dquot and
inode items seems incorrect.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] xfs: fix log ticket leak on forced shutdown.
  2011-01-25  8:50 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
@ 2011-01-26 21:22   ` Alex Elder
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The kmemleak detector shows this after test 139:
> 
> unreferenced object 0xffff880079b88bb0 (size 264):

. . .

>     [<ffffffff81123117>] generic_writepages+0x27/0x30
>     [<ffffffff814aba5d>] xfs_vm_writepages+0x5d/0x80
> 
> By inspection, the leak occurs when xlog_write() returns and error
> and we jump to the abort path without dropping the reference on the
> active ticket.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Alex Elder <aelder@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
  2011-01-25 23:53   ` Christoph Hellwig
@ 2011-01-26 21:22   ` Alex Elder
  1 sibling, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> After test 139, kmemleak shows:
> 
> unreferenced object 0xffff880078b405d8 (size 400):
>   comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
>   hex dump (first 32 bytes):
. . .

> 
> The cause of the leak is that the "remove" parameter of IOP_UNPIN()
> is never set when a CIL push is aborted. This means that the EFI
> item is never freed if it was in the push being cancelled. The
> problem is specific to delayed logging.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Like Christoph, I was a bit confused about xfs_buf_item_unpin()
retaining the call to xfs_trans_del_item().  They all ought to
be done consistently.  Hopefully you can straighten that out.

Two other minor comments:  I'd prefer that an explicit "1" be
passed to IOP_UNPIN() rather than the variable "aborted" (which is
known to have non-zero value) in xfs_trans_committed_bulk().  And
there are some long-lined comments right near what you're changing
that you could touch up while you're at it.

> ---
>  fs/xfs/xfs_extfree_item.c |    1 -
>  fs/xfs/xfs_trans.c        |   35 ++++++++++++++++++++++++++++++-----
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
. . .
> @@ -1472,6 +1480,16 @@ xfs_trans_committed_bulk(
>  		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
>  			continue;
>  
> +		/*
> +		 * if we are aborting the operation, no point in inserting the
> +		 * object into the AIL as we are in a shutdown situation.
> +		 */
> +		if (aborted) {
> +			ASSERT(XFS_FORCED_SHUTDOWN(ailp->xa_mount));
> +			IOP_UNPIN(lip, aborted);
                            Here           ^^^

> +			continue;
> +		}
> +
>  		if (item_lsn != commit_lsn) {
>  
>  			/*


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly
  2011-01-25  8:50 ` [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly Dave Chinner
@ 2011-01-26 21:22   ` Alex Elder
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> rounddown_power_of_2() returns an undefined result when passed a
> value of zero. The specualtive delayed allocation code is doing this
> when the inode is zero length. Hence occasionally the preallocation
> is much, much larger than is necessary (e.g. 8GB for a 270 _byte_
> file). Ensure we don't even pass a zero value to this function so
> the result of preallocation is always the desired size.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/8] xfs: limit extent length for allocation to AG size
  2011-01-25  8:50 ` [PATCH 4/8] xfs: limit extent length for allocation to AG size Dave Chinner
@ 2011-01-26 21:22   ` Alex Elder
  2011-01-27  0:38     ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Delayed allocation extents can be larger than AGs, so when trying to
> convert a large range we may scan every AG inside
> xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than
> an AG. We should stop when we find the first AG with a maximum
> possible allocation size. This causes excessive CPU usage when there
> are lots of AGs.
> 
> The same problem occurs when doing preallocation of a range larger
> than an AG.
> 
> Fix the problem by limiting real allocation lengths to the maximum
> that an AG can support. This means if we have empty AGs, we'll stop
> the search at the first of them. If there are no empty AGs, we'll
> still scan them all, but that is a different problem....

Maybe I'm wrong but I think you need to change a "+"
to a "-" (shown below).

And I have a few really minor suggestions:
- You should update a comment (which I point
  out below) to match your change.
- Maybe make use of a local variable, at least
  in xfs_bmap_btalloc_nullfb(), such as:
    xfs_extlen_t requested = args->maxlen;

Otherwise it looks good to me.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_alloc.h |   16 ++++++++++++++++
>  fs/xfs/xfs_bmap.c  |   16 +++++++++-------
>  2 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_alloc.h b/fs/xfs/xfs_alloc.h
> index 0ab56b3..6ad45b9 100644
> --- a/fs/xfs/xfs_alloc.h
> +++ b/fs/xfs/xfs_alloc.h
> @@ -75,6 +75,22 @@ typedef unsigned int xfs_alloctype_t;
>  #define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
>  
>  /*
> + * When deciding how much space to allocate out of an AG, we limit the
> + * allocation maximum size to the size the AG. However, we cannot use all the
> + * blocks in the AG - some are permanently used by metadata. These
> + * blocks are generally:
> + *	- the AG superblock, AGF, AGI and AGFL
> + *	- the AGF (bno and cnt) and AGI btree root blocks
> + *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
> + *
> + * The AG headers are sector sized, so the amount of space they take up is
> + * dependent on filesystem geometry. The others are all single blocks.
> + */
> +#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
> +	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) + 7)

Is this right?  Shouldn't the 7 be subtracted (or combined using
parentheses with the 4 FS sectors)?

> +
> +
> +/*
>   * Argument structure for xfs_alloc routines.
>   * This is turned into a structure to avoid having 20 arguments passed
>   * down several levels of the stack.
> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> index 4111cd3..74861c6 100644
> --- a/fs/xfs/xfs_bmap.c
> +++ b/fs/xfs/xfs_bmap.c
. . .
> @@ -2498,14 +2498,14 @@ xfs_bmap_btalloc_nullfb(
>  	 * If the best seen length is less than the request
>  	 * length, use the best as the minimum.
>  	 */
> -	else if (*blen < ap->alen)
> +	else if (*blen < args->maxlen)
>  		args->minlen = *blen;
>  	/*
>  	 * Otherwise we've seen an extent as big as alen,

Ought to adjust this comment to better reflect your updated
code ("alen" doesn't really fit any more).

>  	 * use that as the minimum.
>  	 */
>  	else
> -		args->minlen = ap->alen;
> +		args->minlen = args->maxlen;
>  
>  	/*
>  	 * set the failure fallback case to look in the selected


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size
  2011-01-25  8:50 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
  2011-01-25  9:49   ` Christoph Hellwig
@ 2011-01-26 21:22   ` Alex Elder
  2011-01-27  0:50     ` Dave Chinner
  1 sibling, 1 reply; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When doing delayed allocation, if the allocation size is for a
> maximally sized extent, extent size alignment can push it over this
> limit. This results in an assert failure in xfs_bmbt_set_allf() as
> the extent length is too large to find in the extent record.
> 
> Fix this by ensuring that we allow for space that extent size
> alignment requires (up to 2 * (extsize -1) blocks as we have to
> handle both head and tail alignment) when limiting the maximum size
> of the extent.

I think this is OK, however...

It seems to me that the XFS_FILBLKS_MIN() call you're making is
sort of magical because it pre-supposes exactly what the following
xfs_bmap_extsize_align() actually does.  And because of that, I
would rather see that logic built into xfs_bmap_extsize_align()
itself.  I haven't looked at it closely, but I presume the other
two spots that call xfs_bmap_extsize_align() would be subject to
the same MAXEXTLEN limit.

OK with me if you disagree though.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_bmap.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN
  2011-01-25  8:50 ` [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN Dave Chinner
@ 2011-01-26 21:23   ` Alex Elder
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The extent size hint can be set to larger than an AG. This means
> that the alignment process can push the range to be allocated
> outside the bounds of the AG, resulting in assert failures or
> corrupted bmbt records. Similarly, if the extsize is larger than the
> maximum extent size supported, the alignment process will produce
> extents that are too large to fit into the bmbt records, resulting
> in a different type of assert/corruption failure.
> 
> Fix this by limiting extsize at the time іt is set firstly to be
> less than MAXEXTLEN, then to be a maximum of half the size of the
> AGs in the filesystem for non-realtime inodes. Realtime inodes do
> not allocate out of AGs, so don't have to be restricted by the size
> of AGs.

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/8] xfs: handle CIl transaction commit failures correctly
  2011-01-25  8:50 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
  2011-01-25  9:53   ` Christoph Hellwig
@ 2011-01-26 21:23   ` Alex Elder
  1 sibling, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Failure to commit a transaction into the CIL is not handled
> correctly. This currently can only happen when racing with a
> shutdown and requires an explicit shutdown check, so it rare and can
> be avoided. Remove the shutdown check and make the CIL commit a void
> function to indicate it will always succeed, thereby removing the
> incorrectly handled failure case.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] xfs: fix dquot shaker deadlock
  2011-01-25  8:50 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
  2011-01-25  9:52   ` Christoph Hellwig
@ 2011-01-26 21:23   ` Alex Elder
  1 sibling, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
> to unlock the dquot freelist when the number of loop restarts is
> exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
> reclaim. Remove the bogus loop exit check that causes the problem.
> 
> Reported-by: Malcolm Scott <lkml@malc.org.uk>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/quota/xfs_qm.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)

I see what Christoph means about the double-increment of
restart in the case where qi_dqlist_lock is not acquired.
That ought to be fixed.  I also agree with his suggested
loop termination condition change.

Also, restarts is pre-incremented in the XFS_DQ_WANT case
at the top, but post-incremented elsewhere for some reason.
I think they all ought to be the same.

					-Alex


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2
  2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
                   ` (8 preceding siblings ...)
  2011-01-25  9:20 ` [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Christoph Hellwig
@ 2011-01-26 21:23 ` Alex Elder
  9 siblings, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-26 21:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> These are all the pending 2.6.38-rc candidate fixes I have in my queue.

After looking this latest series over I think these
are ready to go:

[1/8] xfs: fix log ticket leak on forced shutdown
[3/8] xfs: speculative delayed allocation uses rounddown_power_of_2
badly
[6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN
[7/8] xfs: handle CIl transaction commit failures correctly

The next two are marked reviewed by Christoph, but I have
made a few minor suggestions for you to consider.  One is
ready to go, the other may be but you need to verify it.
Changes--if any--would be minor, to make them ready:

[4/8] xfs: limit extent length for allocation to AG size 
[5/8] xfs: prevent extsize alignment from exceeding maximum extent size 

These last two appear to need a small bit of work.
[2/8] xfs: fix efi item leak on forced shutdown
[8/8] xfs: fix dquot shaker deadlock

I don't expect it will take much time to fix these, but
if you want I could take the ones that are ready right
away, without waiting for the rest to get completed.

Let me know how you want me to handle it.  Unless I
hear otherwise I'm awaiting pull requests from you
on reviewed patches.

					-Alex

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-25 23:53   ` Christoph Hellwig
@ 2011-01-27  0:35     ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-27  0:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Jan 25, 2011 at 06:53:54PM -0500, Christoph Hellwig wrote:
> On Tue, Jan 25, 2011 at 07:50:38PM +1100, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> > index 75f2ef6..effbb41 100644
> > --- a/fs/xfs/xfs_extfree_item.c
> > +++ b/fs/xfs/xfs_extfree_item.c
> > @@ -138,7 +138,6 @@ xfs_efi_item_unpin(
> >  
> >  	if (remove) {
> >  		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
> > -		xfs_trans_del_item(lip);
> >  		xfs_efi_item_free(efip);
> >  		return;
> >  	}
> 
> >  STATIC void
> >  xfs_trans_uncommit(
> >  	struct xfs_trans	*tp,
> >  	uint			flags)
> >  {
> > -	struct xfs_log_item_desc *lidp;
> > +	struct xfs_log_item_desc *lidp, *n;
> >  
> > -	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
> > +	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
> >  		/*
> >  		 * Unpin all but those that aren't dirty.
> >  		 */
> > -		if (lidp->lid_flags & XFS_LID_DIRTY)
> > +		if (lidp->lid_flags & XFS_LID_DIRTY) {
> > +			xfs_trans_del_item(lidp->lid_item);
> >  			IOP_UNPIN(lidp->lid_item, 1);
> > +		}
> 
> This part of the patch isn't explained in the changelog, and I suspect
> it should be a separate patch from the addition of the IOP_UNPIN with
> remove call to the CIL error path.

Doing xfs_trans_del_item() inside IOP_UNPIN() during CIl commit
faliure processing causes a panic - there is not descriptior or
trransaction attached to the item at that point. Hence it is simply
not safe to call IOP_UNPIN(aborted) from outside a transaction
context if we allow xfs_trans_del_item() to be run inside the
IOP_UNPIN() call.

Further, if we do xfs_trans_del_item() inside the IOP_UNPIN(aborted)
path, we corrupt the list walk in xfs_trans_uncommit() case. Hence
it needs to be an list_for_each_entry_safe() walk, and must do
explicit removal of the item from the transaction because the item
can be freed in IOP_UNPIN(aborted).

> Moving the xfs_trans_del_item from the IOP_UNPIN implementation into
> the caller seems sane to me, but:
> 
>  - why is the call to xfs_trans_del_item left in xfs_buf_item_unpin

Because I didn't notice it. It never triggers in the
xfs_trans_uncommit() case as there are two references on the buf
item - one for the pin, and one for the lock. We call IOP_UNPIN()
first, so it never goes down the (freed && stale) path in this case.

As to why it hasn't triggered from a iclog write completion or CIL
checkpoint failure after the IOP_UNLOCK() has been run and dropped
that reference - just luck I guess....

>  - why did xfs_buf_item_unpin get away only calling it for the stale
>    case, and the other log item implementations completely without
>    it

The stale buffer case is special - it has a different lifecycle to
all other buffer items (and all other log item types, for that
matter). Basically a stale buffer never enters the AIL on
transaction commit, so on the last reference going away it needs to
be freed rather than continuing to live while being tracked in the
AIL.

> I suspect the answer lies in xfs_trans_free_items opencoding the
> call to xfs_trans_del_item, thus avoiding any leak of log item
> descriptors or log items on the transaction list.

Actually, I don't think it does handle this case - we've already
removed all the dirty items from the transaction, so it appears to
me like they just get leaked.

> By adding the
> call of xfs_trans_del_item to xfs_trans_uncommit we thus skip
> the calls to IOP_UNLOCK for dirty log items, which is a large
> change in behaviour for the existing log items that didn't have
> the xfs_trans_del_item calls, and at least for the dquot and
> inode items seems incorrect.

You are right, it does break xfs_trans_free_items() and the way it
calls IOP_UNLOCK(). I'm surprised that it didn't cause any noticable
leaks of locked buffers....

----

Effectively, we've got a situation where we need to allow
xfs_trans_del_item() to be called in IOP_UNPIN(aborted) iff there is
a log item descriptor attached to the log item. That requires
xfs_trans_uncommit() to do a safe list traversal and both the efi
and buf items to check for a valid lidp before calling
xfs_trans_del_item(). I'll change the fix and commment it better in
the code and changelog.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/8] xfs: limit extent length for allocation to AG size
  2011-01-26 21:22   ` Alex Elder
@ 2011-01-27  0:38     ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-27  0:38 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Wed, Jan 26, 2011 at 03:22:48PM -0600, Alex Elder wrote:
> On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Delayed allocation extents can be larger than AGs, so when trying to
> > convert a large range we may scan every AG inside
> > xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than
> > an AG. We should stop when we find the first AG with a maximum
> > possible allocation size. This causes excessive CPU usage when there
> > are lots of AGs.
> > 
> > The same problem occurs when doing preallocation of a range larger
> > than an AG.
> > 
> > Fix the problem by limiting real allocation lengths to the maximum
> > that an AG can support. This means if we have empty AGs, we'll stop
> > the search at the first of them. If there are no empty AGs, we'll
> > still scan them all, but that is a different problem....
> 
> Maybe I'm wrong but I think you need to change a "+"
> to a "-" (shown below).

Good catch. That's what I get for cleaning up code ;)

> 
> And I have a few really minor suggestions:
> - You should update a comment (which I point
>   out below) to match your change.

Will fix.

> - Maybe make use of a local variable, at least
>   in xfs_bmap_btalloc_nullfb(), such as:
>     xfs_extlen_t requested = args->maxlen;

All the allocation code is coded that way to avoid putting local
variables on the stack. The allocation-in-writeback path is the
critical stack usage path in XFS, so I'd prefer to keep using
args->maxlen here....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size
  2011-01-26 21:22   ` Alex Elder
@ 2011-01-27  0:50     ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-27  0:50 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Wed, Jan 26, 2011 at 03:22:55PM -0600, Alex Elder wrote:
> On Tue, 2011-01-25 at 19:50 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When doing delayed allocation, if the allocation size is for a
> > maximally sized extent, extent size alignment can push it over this
> > limit. This results in an assert failure in xfs_bmbt_set_allf() as
> > the extent length is too large to find in the extent record.
> > 
> > Fix this by ensuring that we allow for space that extent size
> > alignment requires (up to 2 * (extsize -1) blocks as we have to
> > handle both head and tail alignment) when limiting the maximum size
> > of the extent.
> 
> I think this is OK, however...
> 
> It seems to me that the XFS_FILBLKS_MIN() call you're making is
> sort of magical because it pre-supposes exactly what the following
> xfs_bmap_extsize_align() actually does.

Sure, but that is not a big deal because we know exactly
what xfs_bmap_extsize_align() does and we control it directly.
I plan to completely remove the extsize alignment from the delalloc
path anyway, so this is really just a fix for the given bug, not a
long term solution.

> And because of that, I
> would rather see that logic built into xfs_bmap_extsize_align()
> itself.

Possibly....
> I haven't looked at it closely, but I presume the other
> two spots that call xfs_bmap_extsize_align() would be subject to
> the same MAXEXTLEN limit.

.... but it doesn't matter for those callers as they already limit
the length of the allocation after alignment to something valid.

The problem is that the delayed allocation path has none of the same
limits on the extent length that other callers have, so the fix is
only needed for the delayed allocation path.

> OK with me if you disagree though.

Yes, I disagree, but it's still a very good question. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] xfs: fix dquot shaker deadlock
  2011-01-25  9:52   ` Christoph Hellwig
@ 2011-01-27  1:54     ` Dave Chinner
  2011-01-27  2:24       ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2011-01-27  1:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Jan 25, 2011 at 04:52:42AM -0500, Christoph Hellwig wrote:
> On Tue, Jan 25, 2011 at 07:50:44PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
> > to unlock the dquot freelist when the number of loop restarts is
> > exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
> > reclaim. Remove the bogus loop exit check that causes the problem.
> 
> The fix looks correct, but it's a bit inconsequential about when
> to adhere the retry limit and when not.  Shouldn't we just turn the
> exit condition into:
> 
> 	if (dqout || restarts >= XFS_QM_RECLAIM_MAX_RESTARTS)
> 		break;

I'm guessing that you are suggesting changing the code increments
and checks restarts to:

	restarts++;
	goto startagain;

otherwise this doesn't make sense as restarts will never go above
XFS_QM_RECLAIM_MAX_RESTARTS at this point in the loop.

Even so, I don't think this is a good idea, because it means we have
to get to the end of the loop before we check restarts for loop
termination. Given that the restarts is counting the number of times
we fail to get locks, encounter dquots that are being recycled, etc,
this would render then restart counter ineffective as a method of
determining whether we are making progress or not. Hence I think
that simply removing the check that I did is the correct fix.

> also the failure to acquite qi_dqlist_lock increments the restart
> count twice, which was also added in the same commit.

I'll fix that, though.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] xfs: fix dquot shaker deadlock
  2011-01-27  1:54     ` Dave Chinner
@ 2011-01-27  2:24       ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2011-01-27  2:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Jan 27, 2011 at 12:54:20PM +1100, Dave Chinner wrote:
> On Tue, Jan 25, 2011 at 04:52:42AM -0500, Christoph Hellwig wrote:
> > On Tue, Jan 25, 2011 at 07:50:44PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
> > > to unlock the dquot freelist when the number of loop restarts is
> > > exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
> > > reclaim. Remove the bogus loop exit check that causes the problem.
> > 
> > The fix looks correct, but it's a bit inconsequential about when
> > to adhere the retry limit and when not.  Shouldn't we just turn the
> > exit condition into:
> > 
> > 	if (dqout || restarts >= XFS_QM_RECLAIM_MAX_RESTARTS)
> > 		break;
> 
> I'm guessing that you are suggesting changing the code increments
> and checks restarts to:
> 
> 	restarts++;
> 	goto startagain;
> 
> otherwise this doesn't make sense as restarts will never go above
> XFS_QM_RECLAIM_MAX_RESTARTS at this point in the loop.

Argh, there's one loop case where it does increment restarts without
jumping to startagain. I'm going to change this loop to stack the
loop exit cases rather than do them all individually and making it
difficult to work out what is going on.

So each case will increment restarts itself as necessary, set a
"startagain" flag if a loop restart is required, and set dqpout if a
successful reclaim occurred. That way each case can then jump to the
correctly stacked unlock point and all the loop exit logic is in one
place.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-27  3:53 ` [PATCH 2/8] xfs: fix efi item leak on forced shutdown Dave Chinner
@ 2011-01-28 14:54   ` Alex Elder
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Elder @ 2011-01-28 14:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-01-27 at 14:53 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> After test 139, kmemleak shows:
> 
> unreferenced object 0xffff880078b405d8 (size 400):
>   comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
>   hex dump (first 32 bytes):
>     60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
>     [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
>     [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
>     [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
>     [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
>     [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
>     [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
>     [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
>     [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
>     [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
>     [<ffffffff8117dc00>] notify_change+0x170/0x2e0
>     [<ffffffff81163bf6>] do_truncate+0x66/0xa0
>     [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
>     [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
>     [<ffffffffffffffff>] 0xffffffffffffffff
> 
> The cause of the leak is that the "remove" parameter of IOP_UNPIN()
> is never set when a CIL push is aborted. This means that the EFI
> item is never freed if it was in the push being cancelled. The
> problem is specific to delayed logging, but has uncovered a couple
> of problems with the handling of IOP_UNPIN(remove).
> 
> Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
> in the CIL commit failure path or the iclog write failure path
> because for delayed loging we have no transaction context. Hence we
> must only call xfs_trans_del_item() if the log item being unpinned
> has an active log item descriptor.
> 
> Secondly, xfs_trans_uncommit() does not handle log item descriptor
> freeing during the traversal of log items on a transaction. It can
> reference a freed log item descriptor when unpinning an EFI item.
> Hence it needs to use a safe list traversal method to allow items to
> be removed from the transaction during IOP_UNPIN().
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

OK, this looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/8] xfs: fix efi item leak on forced shutdown
  2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
@ 2011-01-27  3:53 ` Dave Chinner
  2011-01-28 14:54   ` Alex Elder
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2011-01-27  3:53 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

After test 139, kmemleak shows:

unreferenced object 0xffff880078b405d8 (size 400):
  comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
  hex dump (first 32 bytes):
    60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
    [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
    [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
    [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
    [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
    [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
    [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
    [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
    [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
    [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
    [<ffffffff8117dc00>] notify_change+0x170/0x2e0
    [<ffffffff81163bf6>] do_truncate+0x66/0xa0
    [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
    [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

The cause of the leak is that the "remove" parameter of IOP_UNPIN()
is never set when a CIL push is aborted. This means that the EFI
item is never freed if it was in the push being cancelled. The
problem is specific to delayed logging, but has uncovered a couple
of problems with the handling of IOP_UNPIN(remove).

Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
in the CIL commit failure path or the iclog write failure path
because for delayed loging we have no transaction context. Hence we
must only call xfs_trans_del_item() if the log item being unpinned
has an active log item descriptor.

Secondly, xfs_trans_uncommit() does not handle log item descriptor
freeing during the traversal of log items on a transaction. It can
reference a freed log item descriptor when unpinning an EFI item.
Hence it needs to use a safe list traversal method to allow items to
be removed from the transaction during IOP_UNPIN().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c     |   12 +++++++-----
 fs/xfs/xfs_extfree_item.c |    3 ++-
 fs/xfs/xfs_trans.c        |   36 +++++++++++++++++++++++++++++-------
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 98c6f73..6f8c21c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -427,13 +427,15 @@ xfs_buf_item_unpin(
 
 		if (remove) {
 			/*
-			 * We have to remove the log item from the transaction
-			 * as we are about to release our reference to the
-			 * buffer.  If we don't, the unlock that occurs later
-			 * in xfs_trans_uncommit() will ry to reference the
+			 * If we are in a transaction context, we have to
+			 * remove the log item from the transaction as we are
+			 * about to release our reference to the buffer.  If we
+			 * don't, the unlock that occurs later in
+			 * xfs_trans_uncommit() will try to reference the
 			 * buffer which we no longer have a hold on.
 			 */
-			xfs_trans_del_item(lip);
+			if (lip->li_desc)
+				xfs_trans_del_item(lip);
 
 			/*
 			 * Since the transaction no longer refers to the buffer,
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 75f2ef6..d22e626 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -138,7 +138,8 @@ xfs_efi_item_unpin(
 
 	if (remove) {
 		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
-		xfs_trans_del_item(lip);
+		if (lip->li_desc)
+			xfs_trans_del_item(lip);
 		xfs_efi_item_free(efip);
 		return;
 	}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 33dbc4e..29f5e54 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1446,6 +1446,14 @@ xfs_log_item_batch_insert(
  * Bulk operation version of xfs_trans_committed that takes a log vector of
  * items to insert into the AIL. This uses bulk AIL insertion techniques to
  * minimise lock traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through IOP_COMMITED and IOP_UNLOCK, which
+ * means that checkpoint commit abort handling is treated exactly the same
+ * as an iclog write error even though we haven't started any IO yet. Hence in
+ * this case all we need to do is IOP_COMMITTED processing, followed by an
+ * IOP_UNPIN(aborted) call.
  */
 void
 xfs_trans_committed_bulk(
@@ -1472,6 +1480,16 @@ xfs_trans_committed_bulk(
 		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
 			continue;
 
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(XFS_FORCED_SHUTDOWN(ailp->xa_mount));
+			IOP_UNPIN(lip, 1);
+			continue;
+		}
+
 		if (item_lsn != commit_lsn) {
 
 			/*
@@ -1503,20 +1521,24 @@ xfs_trans_committed_bulk(
 }
 
 /*
- * Called from the trans_commit code when we notice that
- * the filesystem is in the middle of a forced shutdown.
+ * Called from the trans_commit code when we notice that the filesystem is in
+ * the middle of a forced shutdown.
+ *
+ * When we are called here, we have already pinned all the items in the
+ * transaction. However, neither IOP_COMMITTING or IOP_UNLOCK has been called
+ * so we can simply walk the items in the transaction, unpin them with an abort
+ * flag and then free the items. Note that unpinning the items can result in
+ * them being freed immediately, so we need to use a safe list traversal method
+ * here.
  */
 STATIC void
 xfs_trans_uncommit(
 	struct xfs_trans	*tp,
 	uint			flags)
 {
-	struct xfs_log_item_desc *lidp;
+	struct xfs_log_item_desc *lidp, *n;
 
-	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
-		/*
-		 * Unpin all but those that aren't dirty.
-		 */
+	list_for_each_entry_safe(lidp, n, &tp->t_items, lid_trans) {
 		if (lidp->lid_flags & XFS_LID_DIRTY)
 			IOP_UNPIN(lidp->lid_item, 1);
 	}
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-01-28 14:51 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-25  8:50 [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Dave Chinner
2011-01-25  8:50 ` [PATCH 1/8] xfs: fix log ticket leak on forced shutdown Dave Chinner
2011-01-26 21:22   ` Alex Elder
2011-01-25  8:50 ` [PATCH 2/8] xfs: fix efi item " Dave Chinner
2011-01-25 23:53   ` Christoph Hellwig
2011-01-27  0:35     ` Dave Chinner
2011-01-26 21:22   ` Alex Elder
2011-01-25  8:50 ` [PATCH 3/8] xfs: speculative delayed allocation uses rounddown_power_of_2 badly Dave Chinner
2011-01-26 21:22   ` Alex Elder
2011-01-25  8:50 ` [PATCH 4/8] xfs: limit extent length for allocation to AG size Dave Chinner
2011-01-26 21:22   ` Alex Elder
2011-01-27  0:38     ` Dave Chinner
2011-01-25  8:50 ` [PATCH 5/8] xfs: prevent extsize alignment from exceeding maximum extent size Dave Chinner
2011-01-25  9:49   ` Christoph Hellwig
2011-01-26 21:22   ` Alex Elder
2011-01-27  0:50     ` Dave Chinner
2011-01-25  8:50 ` [PATCH 6/8] xfs: limit extsize to size of AGs and/or MAXEXTLEN Dave Chinner
2011-01-26 21:23   ` Alex Elder
2011-01-25  8:50 ` [PATCH 7/8] xfs: handle CIl transaction commit failures correctly Dave Chinner
2011-01-25  9:53   ` Christoph Hellwig
2011-01-26 21:23   ` Alex Elder
2011-01-25  8:50 ` [PATCH 8/8] xfs: fix dquot shaker deadlock Dave Chinner
2011-01-25  9:52   ` Christoph Hellwig
2011-01-27  1:54     ` Dave Chinner
2011-01-27  2:24       ` Dave Chinner
2011-01-26 21:23   ` Alex Elder
2011-01-25  9:20 ` [PATCH 0/8] xfs: 2.6.38-rc candidate fixes V2 Christoph Hellwig
2011-01-26 21:23 ` Alex Elder
2011-01-27  3:53 [PATCH 0/8] xfs: candidate 2.6.38-rc fixes V3 Dave Chinner
2011-01-27  3:53 ` [PATCH 2/8] xfs: fix efi item leak on forced shutdown Dave Chinner
2011-01-28 14:54   ` Alex Elder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.