From: Markus Trippelsdorf <markus@trippelsdorf.de>
To: "Bruno Prémont" <bonbons@linux-vserver.org>
Cc: Dave Chinner <david@fromorbit.com>,
	xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
	Christoph Hellwig <hch@infradead.org>,
	Alex Elder <aelder@sgi.com>, Dave Chinner <dchinner@redhat.com>,
	linux-kernel@vger.kernel.org,
	James Bottomley <James.Bottomley@HansenPartnership.com>
Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
Date: Mon, 2 May 2011 08:15:28 +0200	[thread overview]
Message-ID: <20110502061528.GA22538@x4.trippels.de> (raw)
In-Reply-To: <20110430161810.6ccd2c99@neptune.home>

On 2011.04.30 at 16:18 +0200, Bruno Prémont wrote:
> On Fri, 29 April 2011 Bruno Prémont wrote:
> > On Fri, 29 April 2011 Markus Trippelsdorf wrote:
> > > On 2011.04.29 at 11:19 +1000, Dave Chinner wrote:
> > > > OK, so the common element here appears to be root filesystems
> > > > with small log sizes, which means they are tail-pushing the whole
> > > > time metadata operations are in progress. It definitely seems like
> > > > a race in the AIL workqueue trigger mechanism. I'll see if I can
> > > > reproduce this and cook up a patch to fix it.
> > > 
> > > Hmm, I'm wondering whether this issue is somehow related to the hrtimer
> > > bug that Thomas Gleixner fixed yesterday:
> > > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4
> > > http://thread.gmane.org/gmane.linux.kernel.mm/61909/
> > > 
> > > It also looks similar to the issue that James Bottomley reported
> > > earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/ 
> > 
> > I'm going to see; I've applied Thomas' fix on the box seeing the XFS freeze
> > (without other changes to the kernel). I'm going to run that kernel over the
> > weekend and beyond, if it survives, to see what happens.
> 
> It happened again (after a few hours of uptime), so it is definitely not
> caused by the hrtimer bug that Thomas Gleixner fixed.
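
(As an aside: anyone wanting to check whether their root filesystem matches
the "small log" pattern Dave describes can run xfs_info against the mount
point and look at the log line of its output. It looks something like this,
with illustrative values:

  # xfs_info /
  ...
  log      =internal log           bsize=4096   blocks=2560, version=2

A log of 2560 blocks at 4096 bytes each is 10 MiB, which is at the small
end.)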

I've enabled lock debugging, and this is what happened after a few hours of
uptime (I can't tell whether this is a false positive):

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.39-rc5-00130-g3fd9952 #10
-------------------------------------------------------
kio_file/7364 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#5/2){+.+...}, at: [<ffffffff81151e4e>] generic_file_splice_write+0xce/0x180

but task is already holding lock:
 (xfs_iolock_active){++++++}, at: [<ffffffff811d7d05>] xfs_ilock+0x125/0x1f0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (xfs_iolock_active){++++++}:
       [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
       [<ffffffff8109d9bf>] down_write_nested+0x2f/0x60
       [<ffffffff811d7d05>] xfs_ilock+0x125/0x1f0
       [<ffffffff812061d6>] xfs_file_buffered_aio_write+0x66/0x290
       [<ffffffff81206561>] xfs_file_aio_write+0x161/0x300
       [<ffffffff811261f2>] do_sync_write+0xd2/0x110
       [<ffffffff811262df>] vfs_write+0xaf/0x160
       [<ffffffff8112646a>] sys_write+0x4a/0x90
       [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b

-> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}:
       [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
       [<ffffffff81549981>] mutex_lock_nested+0x51/0x370
       [<ffffffff81135afd>] vfs_rename+0xed/0x420
       [<ffffffff81136037>] sys_renameat+0x207/0x230
       [<ffffffff8113607b>] sys_rename+0x1b/0x20
       [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b

-> #0 (&sb->s_type->i_mutex_key#5/2){+.+...}:
       [<ffffffff810ac21f>] __lock_acquire+0x169f/0x1b90
       [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
       [<ffffffff81549981>] mutex_lock_nested+0x51/0x370
       [<ffffffff81151e4e>] generic_file_splice_write+0xce/0x180
       [<ffffffff81205334>] xfs_file_splice_write+0xf4/0x250
       [<ffffffff8115024e>] do_splice_from+0x7e/0xb0
       [<ffffffff811502a0>] direct_splice_actor+0x20/0x30
       [<ffffffff81151fde>] splice_direct_to_actor+0xbe/0x1c0
       [<ffffffff81152158>] do_splice_direct+0x78/0x90
       [<ffffffff81125f72>] do_sendfile+0x182/0x1d0
       [<ffffffff81126f1a>] sys_sendfile64+0x5a/0xb0
       [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by kio_file/7364:
 #0:  (xfs_iolock_active){++++++}, at: [<ffffffff811d7d05>] xfs_ilock+0x125/0x1f0

stack backtrace:
Pid: 7364, comm: kio_file Not tainted 2.6.39-rc5-00130-g3fd9952 #10
Call Trace:
 [<ffffffff81543f4c>] print_circular_bug+0xb8/0xc7
 [<ffffffff810ac21f>] __lock_acquire+0x169f/0x1b90
 [<ffffffff8115153d>] ? __generic_file_splice_read+0x1cd/0x5c0
 [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
 [<ffffffff81151e4e>] ? generic_file_splice_write+0xce/0x180
 [<ffffffff8148de90>] ? sock_def_write_space+0x140/0x140
 [<ffffffff81549981>] mutex_lock_nested+0x51/0x370
 [<ffffffff81151e4e>] ? generic_file_splice_write+0xce/0x180
 [<ffffffff81151e4e>] generic_file_splice_write+0xce/0x180
 [<ffffffff81205334>] xfs_file_splice_write+0xf4/0x250
 [<ffffffff8120510f>] ? xfs_file_splice_read+0xef/0x220
 [<ffffffff8115024e>] do_splice_from+0x7e/0xb0
 [<ffffffff811502a0>] direct_splice_actor+0x20/0x30
 [<ffffffff81151fde>] splice_direct_to_actor+0xbe/0x1c0
 [<ffffffff81150280>] ? do_splice_from+0xb0/0xb0
 [<ffffffff81152158>] do_splice_direct+0x78/0x90
 [<ffffffff81125f72>] do_sendfile+0x182/0x1d0
 [<ffffffff81126f1a>] sys_sendfile64+0x5a/0xb0
 [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b
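
For what it's worth, the report above boils down to the classic AB-BA
ordering inversion: the recorded #1/#2 chain establishes i_mutex ->
xfs_iolock (a rename followed by a buffered write), while the splice path
here holds the iolock and then asks for i_mutex inside
generic_file_splice_write. Below is a minimal userspace sketch of the same
pattern; the mutexes are only stand-ins for i_mutex and the XFS iolock,
not kernel code:

#include <pthread.h>
#include <stdio.h>

/* Stand-ins for i_mutex and the XFS iolock (illustrative only). */
static pthread_mutex_t a_mutex = PTHREAD_MUTEX_INITIALIZER; /* "i_mutex" */
static pthread_mutex_t b_mutex = PTHREAD_MUTEX_INITIALIZER; /* "iolock"  */

/* Order seen in the #1/#2 chain: i_mutex, then the iolock. */
static void *write_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&a_mutex);
	pthread_mutex_lock(&b_mutex);	/* blocks if splice_path holds it */
	/* ... buffered write ... */
	pthread_mutex_unlock(&b_mutex);
	pthread_mutex_unlock(&a_mutex);
	return NULL;
}

/* Order seen in the #0 chain: iolock, then i_mutex, i.e. reversed. */
static void *splice_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&b_mutex);
	pthread_mutex_lock(&a_mutex);	/* blocks if write_path holds it */
	/* ... generic_file_splice_write ... */
	pthread_mutex_unlock(&a_mutex);
	pthread_mutex_unlock(&b_mutex);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, write_path, NULL);
	pthread_create(&t2, NULL, splice_path, NULL);
	pthread_join(t1, NULL);	/* may never return: AB-BA deadlock */
	pthread_join(t2, NULL);
	puts("no deadlock on this run");
	return 0;
}

Run enough iterations and the two threads eventually park on each other's
lock; lockdep's value is that it flags the inconsistent ordering the first
time it sees it, before the deadlock ever fires.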


-- 
Markus
