All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: check for stale inode before acquiring iflock on push
@ 2012-06-11 14:39 Brian Foster
  2012-06-11 18:39 ` Mark Tinguely
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Brian Foster @ 2012-06-11 14:39 UTC (permalink / raw)
  To: xfs

An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.

Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

This patch resolves the stall I was reproducing with the 273 loop test.
I repeated the test pretty much throughout the weekend. I still hit one
hung task timeout message, but the test proceeded through it.

Dave, I know you mentioned you were sending a similar patch. Either you
didn't get to it or I missed it, but here's what I've been testing....

Brian

 fs/xfs/xfs_inode_item.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 0309c979..e6ba0ec 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -505,6 +505,14 @@ xfs_inode_item_push(
 	}
 
 	/*
+	 * Stale inode items should force out the iclog.
+	 */
+	if (ip->i_flags & XFS_ISTALE) {
+		rval = XFS_ITEM_PINNED;
+		goto out_unlock;
+	}
+
+	/*
 	 * Someone else is already flushing the inode.  Nothing we can do
 	 * here but wait for the flush to finish and remove the item from
 	 * the AIL.
@@ -514,15 +522,6 @@ xfs_inode_item_push(
 		goto out_unlock;
 	}
 
-	/*
-	 * Stale inode items should force out the iclog.
-	 */
-	if (ip->i_flags & XFS_ISTALE) {
-		xfs_ifunlock(ip);
-		xfs_iunlock(ip, XFS_ILOCK_SHARED);
-		return XFS_ITEM_PINNED;
-	}
-
 	ASSERT(iip->ili_fields != 0 || XFS_FORCED_SHUTDOWN(ip->i_mount));
 	ASSERT(iip->ili_logged == 0 || XFS_FORCED_SHUTDOWN(ip->i_mount));
 
-- 
1.7.7.6

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: check for stale inode before acquiring iflock on push
  2012-06-11 14:39 [PATCH] xfs: check for stale inode before acquiring iflock on push Brian Foster
@ 2012-06-11 18:39 ` Mark Tinguely
  2012-06-12  0:18   ` Dave Chinner
  2012-06-12  0:12 ` Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Mark Tinguely @ 2012-06-11 18:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 06/11/12 09:39, Brian Foster wrote:
> An inode in the AIL can be flush locked and marked stale if
> a cluster free transaction occurs at the right time. The
> inode item is then marked as flushing, which causes xfsaild
> to spin and leaves the filesystem stalled. This is
> reproduced by running xfstests 273 in a loop for an
> extended period of time.
>
> Check for stale inodes before the flush lock. This marks
> the inode as pinned, leads to a log flush and allows the
> filesystem to proceed.
>
> Signed-off-by: Brian Foster<bfoster@redhat.com>
> ---
>
> This patch resolves the stall I was reproducing with the 273 loop test.
> I repeated the test pretty much throughout the weekend. I still hit one
> hung task timeout message, but the test proceeded through it.
>
> Dave, I know you mentioned you were sending a similar patch. Either you
> didn't get to it or I missed it, but here's what I've been testing....
>
> Brian
>


Still hangs right away on Linux 3.5rc1 using a very small log and the 
perl test program.

I will investigate more.

Darn, the printk routines in Linux 3.5 added a "struct log" and crash is 
finding that definition.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: check for stale inode before acquiring iflock on push
  2012-06-11 14:39 [PATCH] xfs: check for stale inode before acquiring iflock on push Brian Foster
  2012-06-11 18:39 ` Mark Tinguely
@ 2012-06-12  0:12 ` Dave Chinner
  2012-06-15 12:40 ` Christoph Hellwig
  2012-06-18 14:29 ` Mark Tinguely
  3 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2012-06-12  0:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, Jun 11, 2012 at 10:39:43AM -0400, Brian Foster wrote:
> An inode in the AIL can be flush locked and marked stale if
> a cluster free transaction occurs at the right time. The
> inode item is then marked as flushing, which causes xfsaild
> to spin and leaves the filesystem stalled. This is
> reproduced by running xfstests 273 in a loop for an
> extended period of time.
> 
> Check for stale inodes before the flush lock. This marks
> the inode as pinned, leads to a log flush and allows the
> filesystem to proceed.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> This patch resolves the stall I was reproducing with the 273 loop test.
> I repeated the test pretty much throughout the weekend. I still hit one
> hung task timeout message, but the test proceeded through it.
> 
> Dave, I know you mentioned you were sending a similar patch. Either you
> didn't get to it or I missed it, but here's what I've been testing....

Just didn't get to sending it - it was a holiday yesterday. The
patch is identical, though the commit message is a bit different:

    xfs: inode staleness more important than flushing for AIL

    When we have a dirty stale inode, it must be attached to the
    underlying stale buffer and that means it is flush locked. This
    means that the AIL pushing will only ever see flushing inodes, and
    that means if enough of them are built up at the start of the AIL we
    will never trigger a log force from the AIL to get them moving.

    Hence consider the stale state of inodes more important to report
    than whether they are flushing so as to trigger log forces from the
    AIL more readily in this situation.

Otherwise, consider it:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: check for stale inode before acquiring iflock on push
  2012-06-11 18:39 ` Mark Tinguely
@ 2012-06-12  0:18   ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2012-06-12  0:18 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Brian Foster, xfs

On Mon, Jun 11, 2012 at 01:39:28PM -0500, Mark Tinguely wrote:
> On 06/11/12 09:39, Brian Foster wrote:
> >An inode in the AIL can be flush locked and marked stale if
> >a cluster free transaction occurs at the right time. The
> >inode item is then marked as flushing, which causes xfsaild
> >to spin and leaves the filesystem stalled. This is
> >reproduced by running xfstests 273 in a loop for an
> >extended period of time.
> >
> >Check for stale inodes before the flush lock. This marks
> >the inode as pinned, leads to a log flush and allows the
> >filesystem to proceed.
> >
> >Signed-off-by: Brian Foster<bfoster@redhat.com>
> >---
> >
> >This patch resolves the stall I was reproducing with the 273 loop test.
> >I repeated the test pretty much throughout the weekend. I still hit one
> >hung task timeout message, but the test proceeded through it.
> >
> >Dave, I know you mentioned you were sending a similar patch. Either you
> >didn't get to it or I missed it, but here's what I've been testing....
> >
> >Brian
> >
> 
> 
> Still hangs right away on Linux 3.5rc1 using a very small log and
> the perl test program.
> 
> I will investigate more.
> 
> Darn, the printk routines in Linux 3.5 added a "struct log" and
> crash is finding that definition.

Luckily that's just an internal structure in printk.c. However, I've
been waiting for such a namespace collision to happen for a long
time so perhaps you could write a patch that renames the XFS one to
"struct xlog" to match all the other XFS log codei to avoid
potential problems in the future. Oh, and while you are there, kill
the xlog_t typedef....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: check for stale inode before acquiring iflock on push
  2012-06-11 14:39 [PATCH] xfs: check for stale inode before acquiring iflock on push Brian Foster
  2012-06-11 18:39 ` Mark Tinguely
  2012-06-12  0:12 ` Dave Chinner
@ 2012-06-15 12:40 ` Christoph Hellwig
  2012-06-18 14:29 ` Mark Tinguely
  3 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2012-06-15 12:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: check for stale inode before acquiring iflock on push
  2012-06-11 14:39 [PATCH] xfs: check for stale inode before acquiring iflock on push Brian Foster
                   ` (2 preceding siblings ...)
  2012-06-15 12:40 ` Christoph Hellwig
@ 2012-06-18 14:29 ` Mark Tinguely
  3 siblings, 0 replies; 6+ messages in thread
From: Mark Tinguely @ 2012-06-18 14:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 06/11/12 09:39, Brian Foster wrote:
> An inode in the AIL can be flush locked and marked stale if
> a cluster free transaction occurs at the right time. The
> inode item is then marked as flushing, which causes xfsaild
> to spin and leaves the filesystem stalled. This is
> reproduced by running xfstests 273 in a loop for an
> extended period of time.
>
> Check for stale inodes before the flush lock. This marks
> the inode as pinned, leads to a log flush and allows the
> filesystem to proceed.
>
> Signed-off-by: Brian Foster<bfoster@redhat.com>
> ---
>
> This patch resolves the stall I was reproducing with the 273 loop test.
> I repeated the test pretty much throughout the weekend. I still hit one
> hung task timeout message, but the test proceeded through it.
>
> Dave, I know you mentioned you were sending a similar patch. Either you
> didn't get to it or I missed it, but here's what I've been testing....
>
> Brian
>
>   fs/xfs/xfs_inode_item.c |   17 ++++++++---------
>   1 files changed, 8 insertions(+), 9 deletions(-)
>

Your patch looks good. Considered it also:

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

As far as problem 2. Small logs can still get stuck. I have seen
hang B in 576K-1MB sized logs, I have not larger logs hard enough
to see where the problem will start to occur.

A) Test like the problem replicator
       (http://oss.sgi.com/bugzilla/show_bug.cgi?id=922)
    will quickly hang because the test can get into a situation where
    there are no new transactions to push the AIL and the sync
    worker's allocation of the dummy transaction is hung behind these
    requests.

    This is mostly a limit of the test. Any filesystem activity that is
    not related to the test will restart the filesystem. My earlier
    posted patch that restarts the AIL cleaning when there are still
    waiters after the last tail move will help this situation.

B) Keep pushing the log long enough and there will be a hard hang with
    an empty AIL
  1) Large request for log. (typically about 340K)
   a) some free space but not enough for first request
  2) AIL is empty
  3) Remaining space (not quite half the log) is in CIL
   a) current CTX is too small for passive CIL push (just less than log
      size / 8)
   b) remaining space is in the current CTX ticket and a previously
      pushed CTX sequence.
     i) No leaks.
     ii) I have not worked out why the previously pushed CTX has not
        been written out. The iclog looks unlocked and ACTIVE.
  4) The sync worker is blocked on the dummy transaction allocation and
     cannot not push the CIL again.

Sorry this machine will not kdump.

--Mark Tinguely

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-06-18 14:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-11 14:39 [PATCH] xfs: check for stale inode before acquiring iflock on push Brian Foster
2012-06-11 18:39 ` Mark Tinguely
2012-06-12  0:18   ` Dave Chinner
2012-06-12  0:12 ` Dave Chinner
2012-06-15 12:40 ` Christoph Hellwig
2012-06-18 14:29 ` Mark Tinguely

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.