* Rambling noise #1: generic/230 can trigger kernel debug lock detector
@ 2013-05-09 2:24 Michael L. Semon
2013-05-09 3:16 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Michael L. Semon @ 2013-05-09 2:24 UTC (permalink / raw)
To: xfs
Hi! I'm trying to come up with a series of ramblings that may or may
not be useful in a mailing-list context, with the idea that one bug
report might be good, the next might be me thinking aloud with data in
hand because I know something's wrong but can't put my finger on it. An
ex-girlfriend saw the movie "Rain Man" years ago pointed to the screen
and said, "Do you see that guy? That's you!" If only I could be so
smart...or act as well as Dustin Hoffman. The noisy thinking is there,
just not the brilliant insights...
This report is to pass on a kernel lock detector message that might be
reproducible under a certain family of tests. generic/230 may not be at
fault, it's just where the detector went off.
It seems like in the few times the detector has gone off lately, it does
so at the same instant as I'm doing some very boring operation on a
different partition at the same time, such as reloading a file in vi, or
piping something to less to read it. Some folks have been working on
tty stuff lately for the 3.8 kernels at least--making great improvements
overall--but there seems to be no tty hints in this message.
The kernel, AFAIK, to be a git Linux with v3.9.0 + this weekend's
xfs-oss checked out, with the following patches applied:
[PATCH v2] xfs: fix assertion failure in xfs_vm_write_failed()
[PATCH] xfs: fix s_max_bytes to MAX_LFS_FILESIZE if needed
[PATCH] xfs: don't return 0 if generic_segment_checks() find nothing
[PATCH 1/2] xfs: fix sub-page blocksize data integrity writes
[PATCH 2/2] xfs: fix rounding in xfs_free_file_space
[PATCH v3 1/2] xfs: Remove XFS_MOUNT_RETERR
[PATCH v3 2/2] xfs: Don't keep silent if sunit/swidth can not be changed
via mount
There shouldn't be a need to apply these patches right away. I'm just
providing context.
Computer is a Pentium 733 with memory lowered to 160 MB for low-memory
testing. It uses the standard VGA console, which can contribute to such
issues but not as much as using a DRM framebuffer console.
Thanks!
Michael
[Earlier tests are shown only to provide sequence.]
FSTYP -- xfs (debug)
PLATFORM -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch
xfs/168 [not run] Assuming DMAPI modules are not loaded
generic/053 10s
xfs/043 [not run] No dump tape specified
generic/099 [not run] not suitable for this OS: Linux
xfs/170 47s
xfs/116 3s
generic/020 29s
xfs/175 [not run] Assuming DMAPI modules are not loaded
xfs/066 8s
xfs/037 [not run] No dump tape specified
xfs/292 - output mismatch (see /var/lib/xfstests/results/xfs/292.out.bad)
--- tests/xfs/292.out 2013-05-08 12:40:14.635752692 -0400
+++ /var/lib/xfstests/results/xfs/292.out.bad 2013-05-08
16:35:33.894218930 -0400
@@ -1,5 +1,5 @@
QA output created by 292
mkfs.xfs without geometry
-meta-data=FILENAME isize=256 agcount=4, agsize=16777216 blks
+meta-data=FILENAME isize=256 agcount=4, agsize=16777216 blks
mkfs.xfs with cmdline geometry
-meta-data=FILENAME isize=256 agcount=16, agsize=4194304 blks
+meta-data=FILENAME isize=256 agcount=16, agsize=4194304 blks
...
(Run 'diff -u tests/xfs/292.out
/var/lib/xfstests/results/xfs/292.out.bad' to see the entire diff)
xfs/086 195s
xfs/293 16s
generic/308 2s
xfs/095 [not run] not suitable for this OS: Linux
xfs/096 28s
xfs/022 [not run] No dump tape specified
generic/260 [not run] FITRIM not supported on /dev/sdb6
generic/247 101s
generic/235 - output mismatch (see
/var/lib/xfstests/results/generic/235.out.bad)
--- tests/generic/235.out 2013-05-08 12:39:55.017626952 -0400
+++ /var/lib/xfstests/results/generic/235.out.bad 2013-05-08
16:42:10.527639188 -0400
@@ -15,7 +15,7 @@
fsgqa -- 0 0 0 1 0 0
-touch: cannot touch `SCRATCH_MNT/failed': Read-only file system
+touch: cannot touch 'SCRATCH_MNT/failed': Read-only file system
*** Report for user quotas on device SCRATCH_DEV
Block grace time: 7days; Inode grace time: 7days
...
(Run 'diff -u tests/generic/235.out
/var/lib/xfstests/results/generic/235.out.bad' to see the entire diff)
xfs/072 7s
xfs/180 441s
xfs/283 25s
xfs/048 1s
generic/076 8s
generic/236 3s
generic/230
=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #3 Not tainted
---------------------------------------------
setquota/28368 is trying to acquire lock:
(sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
but task is already holding lock:
(sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(sb_internal);
lock(sb_internal);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by setquota/28368:
#0: (&type->s_umount_key#20){++++.+}, at: [<c10c660a>]
get_super+0x7a/0xc0
#1: (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
#2: (&qinf->qi_quotaofflock){+.+...}, at: [<c11fa44a>]
xfs_qm_scall_setqlim+0x9a/0x690
stack backtrace:
CPU: 0 PID: 28368 Comm: setquota Not tainted 3.9.0+ #3
Hardware name: Dell Computer Corporation L733r
/CA810E , BIOS A14 09/05/2001
c6456ca0 c6456ca0 c8f83cc8 c13fe5bd c8f83d40 c1060ee0 c14d241d c6456ad4
00006ed0 000003eb c196a618 c6456cf0 00000004 00000000 0001f60c c177c801
c19b033d 00000000 f089e33c 00000000 c6456930 4596f1d4 000003eb 00000000
Call Trace:
[<c13fe5bd>] dump_stack+0x16/0x18
[<c1060ee0>] __lock_acquire+0x17b0/0x17f0
[<c105dfae>] ? trace_hardirqs_off_caller+0x1e/0xc0
[<c104f795>] ? sched_clock_cpu+0xa5/0x100
[<c1061580>] lock_acquire+0x80/0x100
[<c11e8846>] ? xfs_trans_alloc+0x26/0x50
[<c10c737d>] __sb_start_write+0xad/0x1b0
[<c11e8846>] ? xfs_trans_alloc+0x26/0x50
[<c11e8846>] ? xfs_trans_alloc+0x26/0x50
[<c105df8b>] ? trace_hardirqs_on+0xb/0x10
[<c11e8846>] xfs_trans_alloc+0x26/0x50
[<c11f75ad>] xfs_qm_dqread+0xcd/0x360
[<c11f7b82>] xfs_qm_dqget+0x342/0x520
[<c11fa469>] xfs_qm_scall_setqlim+0xb9/0x690
[<c10b45ea>] ? might_fault+0x4a/0xa0
[<c10b4634>] ? might_fault+0x94/0xa0
[<c11ff8b4>] xfs_fs_set_dqblk+0x54/0xa0
[<c110fbf6>] quota_setxquota+0x76/0xc0
[<c1110233>] SyS_quotactl+0x513/0x5a0
[<c10c8834>] ? SyS_stat64+0x34/0x40
[<c1403df2>] ? sysenter_exit+0xf/0x1d
[<c105deb4>] ? trace_hardirqs_on_caller+0xf4/0x1c0
[<c1403dbf>] sysenter_do_call+0x12/0x36
XFS (sdb6): Mounting Filesystem
XFS (sdb6): Ending clean mount
XFS (sdb6): Mounting Filesystem
XFS (sdb6): Ending clean mount
XFS (sdb6): Quotacheck needed: Please wait.
XFS (sdb6): Quotacheck: Done.
- output mismatch (see /var/lib/xfstests/results/generic/230.out.bad)
--- tests/generic/230.out 2013-05-08 12:39:54.827612822 -0400
+++ /var/lib/xfstests/results/generic/230.out.bad 2013-05-08
16:51:08.063301955 -0400
@@ -12,9 +12,9 @@
pwrite64: Disk quota exceeded
Touch 3+4
Touch 5+6
-touch: cannot touch `SCRATCH_MNT/file6': Disk quota exceeded
+touch: cannot touch 'SCRATCH_MNT/file6': Disk quota exceeded
Touch 5
-touch: cannot touch `SCRATCH_MNT/file5': Disk quota exceeded
...
(Run 'diff -u tests/generic/230.out
/var/lib/xfstests/results/generic/230.out.bad' to see the entire diff)
XFS (sdb5): Mounting Filesystem
XFS (sdb5): Ending clean mount
xfs/155
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-09 2:24 Rambling noise #1: generic/230 can trigger kernel debug lock detector Michael L. Semon
@ 2013-05-09 3:16 ` Dave Chinner
2013-05-09 7:20 ` Dave Chinner
2013-05-09 15:57 ` Mark Tinguely
0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2013-05-09 3:16 UTC (permalink / raw)
To: Michael L. Semon; +Cc: xfs
On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
> Hi! I'm trying to come up with a series of ramblings that may or
> may not be useful in a mailing-list context, with the idea that one
> bug report might be good, the next might be me thinking aloud with
> data in hand because I know something's wrong but can't put my
> finger on it. An ex-girlfriend saw the movie "Rain Man" years ago
> pointed to the screen and said, "Do you see that guy? That's you!"
> If only I could be so smart...or act as well as Dustin Hoffman. The
> noisy thinking is there, just not the brilliant insights...
>
> This report is to pass on a kernel lock detector message that might
> be reproducible under a certain family of tests. generic/230 may
> not be at fault, it's just where the detector went off.
No, there's definitely a bug there. Thanks for the report, Michael.
Try the patch below.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
From: Dave Chinner <dchinner@redhat.com>
Lockdep reports:
=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #3 Not tainted
---------------------------------------------
setquota/28368 is trying to acquire lock:
(sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
but task is already holding lock:
(sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
allocated.
xfs_qm_scall_setqlim() is starting a transaction and then not
passing it into xfs_qm_dqet() and so it starts it's own transaction
when allocating the dquot. Splat!
Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
inside the setqlim transaction. This requires getting the dquot
first (and allocating it if necessary) then dropping and relocking
the dquot before joining it to the setqlim transaction.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_qm_syscalls.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index c41190c..7e67a3a 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
return 0;
- tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
- error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
- 0, 0, XFS_DEFAULT_LOG_COUNT);
- if (error) {
- xfs_trans_cancel(tp, 0);
- return (error);
- }
-
/*
* We don't want to race with a quotaoff so take the quotaoff lock.
- * (We don't hold an inode lock, so there's nothing else to stop
- * a quotaoff from happening). (XXXThis doesn't currently happen
- * because we take the vfslock before calling xfs_qm_sysent).
+ * We don't hold an inode lock, so there's nothing else to stop
+ * a quotaoff from happening.
*/
mutex_lock(&q->qi_quotaofflock);
/*
- * Get the dquot (locked), and join it to the transaction.
- * Allocate the dquot if this doesn't exist.
+ * Get the dquot (locked) before we start, as we need to do a
+ * transaction to allocate it if it doesn't exist. Once we have the
+ * dquot, unlock it so we can start the next transaction safely. We hold
+ * a reference to the dquot, so it's safe to do this unlock/lock without
+ * it being reclaimed in the mean time.
*/
- if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
- xfs_trans_cancel(tp, XFS_TRANS_ABORT);
+ error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
+ if (error) {
ASSERT(error != ENOENT);
goto out_unlock;
}
+ xfs_dqunlock(dqp);
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
+ error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
+ 0, 0, XFS_DEFAULT_LOG_COUNT);
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ return (error);
+ }
+
+ xfs_dqlock(dqp);
xfs_trans_dqjoin(tp, dqp);
ddq = &dqp->q_core;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-09 3:16 ` Dave Chinner
@ 2013-05-09 7:20 ` Dave Chinner
2013-05-10 2:00 ` Michael L. Semon
2013-05-09 15:57 ` Mark Tinguely
1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-05-09 7:20 UTC (permalink / raw)
To: Michael L. Semon; +Cc: xfs
On Thu, May 09, 2013 at 01:16:46PM +1000, Dave Chinner wrote:
> On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
> > Hi! I'm trying to come up with a series of ramblings that may or
> > may not be useful in a mailing-list context, with the idea that one
> > bug report might be good, the next might be me thinking aloud with
> > data in hand because I know something's wrong but can't put my
> > finger on it. An ex-girlfriend saw the movie "Rain Man" years ago
> > pointed to the screen and said, "Do you see that guy? That's you!"
> > If only I could be so smart...or act as well as Dustin Hoffman. The
> > noisy thinking is there, just not the brilliant insights...
> >
> > This report is to pass on a kernel lock detector message that might
> > be reproducible under a certain family of tests. generic/230 may
> > not be at fault, it's just where the detector went off.
>
> No, there's definitely a bug there. Thanks for the report, Michael.
> Try the patch below.
Actaully, there's a bug in the error handling in that version - it
fails to unlock the quotaoff lock properly on failure. The version
below fixes that problem.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
From: Dave Chinner <dchinner@redhat.com>
Lockdep reports:
=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #3 Not tainted
---------------------------------------------
setquota/28368 is trying to acquire lock:
(sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
but task is already holding lock:
(sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
allocated.
xfs_qm_scall_setqlim() is starting a transaction and then not
passing it into xfs_qm_dqet() and so it starts it's own transaction
when allocating the dquot. Splat!
Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
inside the setqlim transaction. This requires getting the dquot
first (and allocating it if necessary) then dropping and relocking
the dquot before joining it to the setqlim transaction.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_qm_syscalls.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index c41190c..dfa5c05 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -489,31 +489,36 @@ xfs_qm_scall_setqlim(
if ((newlim->d_fieldmask & XFS_DQ_MASK) == 0)
return 0;
- tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
- error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
- 0, 0, XFS_DEFAULT_LOG_COUNT);
- if (error) {
- xfs_trans_cancel(tp, 0);
- return (error);
- }
-
/*
* We don't want to race with a quotaoff so take the quotaoff lock.
- * (We don't hold an inode lock, so there's nothing else to stop
- * a quotaoff from happening). (XXXThis doesn't currently happen
- * because we take the vfslock before calling xfs_qm_sysent).
+ * We don't hold an inode lock, so there's nothing else to stop
+ * a quotaoff from happening.
*/
mutex_lock(&q->qi_quotaofflock);
/*
- * Get the dquot (locked), and join it to the transaction.
- * Allocate the dquot if this doesn't exist.
+ * Get the dquot (locked) before we start, as we need to do a
+ * transaction to allocate it if it doesn't exist. Once we have the
+ * dquot, unlock it so we can start the next transaction safely. We hold
+ * a reference to the dquot, so it's safe to do this unlock/lock without
+ * it being reclaimed in the mean time.
*/
- if ((error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp))) {
- xfs_trans_cancel(tp, XFS_TRANS_ABORT);
+ error = xfs_qm_dqget(mp, NULL, id, type, XFS_QMOPT_DQALLOC, &dqp);
+ if (error) {
ASSERT(error != ENOENT);
goto out_unlock;
}
+ xfs_dqunlock(dqp);
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
+ error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
+ 0, 0, XFS_DEFAULT_LOG_COUNT);
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ goto out_unlock;
+ }
+
+ xfs_dqlock(dqp);
xfs_trans_dqjoin(tp, dqp);
ddq = &dqp->q_core;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-09 3:16 ` Dave Chinner
2013-05-09 7:20 ` Dave Chinner
@ 2013-05-09 15:57 ` Mark Tinguely
1 sibling, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-05-09 15:57 UTC (permalink / raw)
To: Dave Chinner; +Cc: Michael L. Semon, xfs
On 05/08/13 22:16, Dave Chinner wrote:
> xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
>
> From: Dave Chinner<dchinner@redhat.com>
>
> Lockdep reports:
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.9.0+ #3 Not tainted
> ---------------------------------------------
> setquota/28368 is trying to acquire lock:
> (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
>
> but task is already holding lock:
> (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
>
> from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be
> allocated.
>
> xfs_qm_scall_setqlim() is starting a transaction and then not
> passing it into xfs_qm_dqet() and so it starts it's own transaction
> when allocating the dquot. Splat!
>
> Fix this by not allocating the dquot in xfs_qm_scall_setqlim()
> inside the setqlim transaction. This requires getting the dquot
> first (and allocating it if necessary) then dropping and relocking
> the dquot before joining it to the setqlim transaction.
>
> Reported-by: Michael L. Semon<mlsemon35@gmail.com>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---
> fs/xfs/xfs_qm_syscalls.c | 35 ++++++++++++++++++++---------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
Looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-09 7:20 ` Dave Chinner
@ 2013-05-10 2:00 ` Michael L. Semon
2013-05-10 2:19 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Michael L. Semon @ 2013-05-10 2:00 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 05/09/2013 03:20 AM, Dave Chinner wrote:
> On Thu, May 09, 2013 at 01:16:46PM +1000, Dave Chinner wrote:
>> On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
>>> Hi! I'm trying to come up with a series of ramblings that may or
>>> may not be useful in a mailing-list context, with the idea that one
>>> bug report might be good, the next might be me thinking aloud with
>>> data in hand because I know something's wrong but can't put my
>>> finger on it. An ex-girlfriend saw the movie "Rain Man" years ago
>>> pointed to the screen and said, "Do you see that guy? That's you!"
>>> If only I could be so smart...or act as well as Dustin Hoffman. The
>>> noisy thinking is there, just not the brilliant insights...
>>>
>>> This report is to pass on a kernel lock detector message that might
>>> be reproducible under a certain family of tests. generic/230 may
>>> not be at fault, it's just where the detector went off.
>>
>> No, there's definitely a bug there. Thanks for the report, Michael.
>> Try the patch below.
>
> Actaully, there's a bug in the error handling in that version - it
> fails to unlock the quotaoff lock properly on failure. The version
> below fixes that problem.
>
> Cheers,
>
> Dave.
OK, I'll try this version as well. The first version seemed to work
just fine.
The rest of the letter has been shortened from a more detailed review to
a casual one with a scripted run of xfstests at the end. If nothing
else, it shows a) a dir3 function in a stack trace on a non-CRC
filesystem, b) loggen might need to be installed in the `make install`
version of xfstests, c) your previous-run-times feature has been
restored, and d) a pwrite64 "file too large" may or may not be there
because J. Liu's bitness patches are doing their jobs.
Good luck!
Michael
Original review (reduced):
As far as logical testing goes, your fix works well. Without your
patch, I got the original lockdep message, and the situation was
replicated by issuing a `du -sx /` while xfstests xfs/054 was running.
With your patch, all seems well for things that mention quotas.
The lock detector went off once more, but that was after attempting to
merge a 2013-05-08 kernel git with xfs-oss/master of the same vintage,
then apply my normal patches and your original quota lockdep fix.
Hardware change: Because `git commit` caused XFS to emit the message
"XFS: possible memory allocation deadlock in xfs_buf_allocate_memory
(mode:0x250)" a few times, RAM was increased to 256 MB. The merge was
reworked by restoring the git from a backup tarball and doing the
merge-and-patch session again.
Script started on Thu May 9 16:00:28 2013
cu -t -l /dev/ttyu0 -s 115200
Connected
# Note: lm256 is a local group
root@oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP -- xfs (debug)
PLATFORM -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch
xfs/003 1s ... 2s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 8s ... 7s
xfs/012 16s ... 13s
xfs/016 76s ... 62s
xfs/017 [failed, exit status 1] - output mismatch (see
/var/lib/xfstests/results/xfs/017.out.bad)
--- tests/xfs/017.out 2013-05-09 05:24:30.996641679 -0400
+++ /var/lib/xfstests/results/xfs/017.out.bad 2013-05-09
16:03:25.185045262 -0400
@@ -3,8 +3,6 @@
*** test
*** test 0
*** test 1
- *** test 2
- *** test 3
- *** test 4
-*** done
...
(Run 'diff -u tests/xfs/017.out
/var/lib/xfstests/results/xfs/017.out.bad' to see the entire diff)
xfs/019 8s
xfs/021 8s ... 7s
xfs/022 [not run] No dump tape specified
xfs/023 [not run] No dump tape specified
xfs/024 [not run] No dump tape specified
xfs/025 [not run] No dump tape specified
xfs/026 22s
xfs/027 22s ... 22s
xfs/028 31s
xfs/029 11s
xfs/030 65s
xfs/031 102s ... 99s
xfs/033 49s
xfs/034 9s ... 9s
xfs/035 [not run] No dump tape specified
xfs/036 [not run] No dump tape specified
xfs/037 [not run] No dump tape specified
xfs/038 [not run] No dump tape specified
xfs/039 [not run] No dump tape specified
xfs/040 [not run] Can't run srcdiff without KWORKAREA set
xfs/041 121s
xfs/042 163s
xfs/043 [not run] No dump tape specified
xfs/044 [failed, exit status 1] - output mismatch (see
/var/lib/xfstests/results/xfs/044.out.bad)
--- tests/xfs/044.out 2013-05-09 05:24:32.493415709 -0400
+++ /var/lib/xfstests/results/xfs/044.out.bad 2013-05-09
16:16:33.963173264 -0400
@@ -16,51 +16,18 @@
*** mount (expect success)
*** umount
*** zero log
-*** loggen
- *** zero block (1BB) x 100
+./tests/xfs/044: line 138: /var/lib/xfstests/src/loggen: No such
file or directory
*** mount (expect success)
...
(Run 'diff -u tests/xfs/044.out
/var/lib/xfstests/results/xfs/044.out.bad' to see the entire diff)
xfs/045 11s
xfs/046 20s ... 19s
xfs/047 31s
xfs/048 2s ... 1s
xfs/049 [not run] External device testing in progress, skipped this test
xfs/050 49s
xfs/052 7s ... 8s
xfs/054 31s ... 28s
xfs/055 [not run] No dump tape specified
xfs/056 19s
xfs/057 [not run] Place holder for IRIX test 057
xfs/058 [not run] Place holder for IRIX test 058
xfs/059 32s ... 31s
xfs/060 31s ... 30s
xfs/061 20s
xfs/063 21s ... 21s
xfs/064 77s ... 75s
xfs/065 53s ... 51s
xfs/066 8s
xfs/067 12s
xfs/071 - output mismatch (see /var/lib/xfstests/results/xfs/071.out.bad)
--- tests/xfs/071.out 2013-05-09 05:24:33.373478835 -0400
+++ /var/lib/xfstests/results/xfs/071.out.bad 2013-05-09
16:25:07.474096862 -0400
@@ -40,14 +40,14 @@
Writing 512 bytes, offset is +0 (direct=false)
-wrote 512/512 bytes at offset <OFFSET>
+pwrite64: File too large
Reading 512 bytes (direct=false)
-read 512/512 bytes at offset <OFFSET>
...
(Run 'diff -u tests/xfs/071.out
/var/lib/xfstests/results/xfs/071.out.bad' to see the entire diff)
xfs/072 8s ... 7s
xfs/073 [not run] Cannot xfs_copy with external devices
xfs/078 ^CRan: xfs/003 xfs/004 xfs/008 xfs/009 xfs/012 xfs/016 xfs/017
xfs/019 xfs/021 xfs/026 xfs/027 xfs/028 xfs/029 xfs/030 xfs/031 xfs/033
xfs/034 xfs/041 xfs/042 xfs/044 xfs/045 xfs/046 xfs/047 xfs/048 xfs/050
xfs/052 xfs/054 xfs/056 xfs/059 xfs/060 xfs/061 xfs/063 xfs/064 xfs/065
xfs/066 xfs/067 xfs/071 xfs/072
Not run: xfs/022 xfs/023 xfs/024 xfs/025 xfs/035 xfs/036 xfs/037 xfs/038
xfs/039 xfs/040 xfs/043 xfs/049 xfs/055 xfs/057 xfs/058 xfs/073
Failures: xfs/017 xfs/044 xfs/071
Failed 3 of 38 tests
root@oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP -- xfs (debug)
PLATFORM -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch
xfs/003 2s ... 1s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 7s ... 8s
xfs/012 13s ...[ 1851.323902]
[ 1851.325479] =================================
[ 1851.326551] [ INFO: inconsistent lock state ]
[ 1851.326551] 3.9.0+ #1 Not tainted
[ 1851.326551] ---------------------------------
[ 1851.326551] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
[ 1851.326551] kswapd0/18 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1851.326551] (&(&ip->i_lock)->mr_lock){++++-+}, at: [<c11dcabf>]
xfs_ilock+0x10f/0x190
[ 1851.326551] {RECLAIM_FS-ON-R} state was registered at:
[ 1851.326551] [<c105e10a>] mark_held_locks+0x8a/0xf0
[ 1851.326551] [<c105e69c>] lockdep_trace_alloc+0x5c/0xa0
[ 1851.326551] [<c109c52c>] __alloc_pages_nodemask+0x7c/0x670
[ 1851.326551] [<c10bfd8e>] new_slab+0x6e/0x2a0
[ 1851.326551] [<c14083a9>] __slab_alloc.isra.59.constprop.67+0x1d3/0x40a
[ 1851.326551] [<c10c12cd>] __kmalloc+0x10d/0x180
[ 1851.326551] [<c1199b56>] kmem_alloc+0x56/0xd0
[ 1851.326551] [<c1199be1>] kmem_zalloc+0x11/0xd0
[ 1851.326551] [<c11c666e>] xfs_dabuf_map.isra.2.constprop.5+0x22e/0x520
[ 1851.326551] [<c11c7064>] xfs_da_read_buf+0x54/0x280
[ 1851.326551] [<c11cf2e3>] xfs_dir3_leaf_read.constprop.6+0x43/0x90
[ 1851.326551] [<c11d1d75>] xfs_dir2_leaf_lookup_int+0x45/0x350
[ 1851.326551] [<c11d20b0>] xfs_dir2_leaf_lookup+0x30/0x150
[ 1851.326551] [<c11ca75f>] xfs_dir_lookup+0x13f/0x160
[ 1851.326551] [<c1196d20>] xfs_lookup+0x50/0x120
[ 1851.326551] [<c118fcc8>] xfs_vn_lookup+0x48/0x80
[ 1851.326551] [<c10cc92e>] lookup_real+0x1e/0x50
[ 1851.326551] [<c10d00fa>] do_last.isra.46+0x74a/0xb90
[ 1851.326551] [<c10d05db>] path_openat.isra.47+0x9b/0x3e0
[ 1851.326551] [<c10d0951>] do_filp_open+0x31/0x80
[ 1851.326551] [<c10c9665>] open_exec+0x45/0x110
[ 1851.326551] [<c10ca96a>] do_execve_common+0x1da/0x550
[ 1851.326551] [<c10caddf>] sys_execve+0x2f/0x50
[ 1851.326551] [<c140e1ff>] sysenter_do_call+0x12/0x36
[ 1851.326551] irq event stamp: 789831
[ 1851.326551] hardirqs last enabled at (789831): [<c109e4fa>]
clear_page_dirty_for_io+0xaa/0xe0
[ 1851.326551] hardirqs last disabled at (789830): [<c109e4d4>]
clear_page_dirty_for_io+0x84/0xe0
[ 1851.326551] softirqs last enabled at (789776): [<c102a90e>]
__do_softirq+0x12e/0x1f0
[ 1851.326551] softirqs last disabled at (789769): [<c102ab1d>]
irq_exit+0x6d/0x80
[ 1851.326551]
[ 1851.326551] other info that might help us debug this:
[ 1851.326551] Possible unsafe locking scenario:
[ 1851.326551]
[ 1851.326551] CPU0
[ 1851.326551] ----
[ 1851.326551] lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551] <Interrupt>
[ 1851.326551] lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551]
[ 1851.326551] *** DEADLOCK ***
[ 1851.326551]
[ 1851.326551] 1 lock held by kswapd0/18:
[ 1851.326551] #0: (sb_internal){.+.+.?}, at: [<c11f1aa6>]
xfs_trans_alloc+0x26/0x50
[ 1851.326551]
[ 1851.326551] stack backtrace:
[ 1851.326551] Pid: 18, comm: kswapd0 Not tainted 3.9.0+ #1
[ 1851.326551] Call Trace:
[ 1851.326551] [<c1407048>] print_usage_bug+0x1dc/0x1e6
[ 1851.326551] [<c105d5d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0
[ 1851.326551] [<c105e05b>] mark_lock+0x26b/0x290
[ 1851.326551] [<c105ff3f>] __lock_acquire+0x64f/0x1690
[ 1851.326551] [<c1199e66>] ? kmem_zone_alloc+0x56/0xd0
[ 1851.326551] [<c1190321>] ? xfs_vn_mknod+0x151/0x170
[ 1851.326551] [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551] [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551] [<c1061580>] lock_acquire+0x80/0x100
[ 1851.326551] [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551] [<c1047184>] down_write_nested+0x54/0xa0
[ 1851.326551] [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551] [<c11dcabf>] xfs_ilock+0x10f/0x190
[ 1851.326551] [<c118f212>] xfs_iomap_write_allocate+0x232/0x400
[ 1851.326551] [<c105cce8>] ? lock_release_holdtime.part.27+0x88/0xe0
[ 1851.326551] [<c11dcc4b>] ? xfs_ilock_nowait+0xbb/0x1e0
[ 1851.326551] [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551] [<c11dce1b>] ? xfs_iunlock+0xab/0x170
[ 1851.326551] [<c117c948>] xfs_map_blocks+0x388/0x420
[ 1851.326551] [<c117da45>] xfs_vm_writepage+0x355/0x6e0
[ 1851.326551] [<c1050655>] ? sched_clock_cpu+0xa5/0x100
[ 1851.326551] [<c105e30e>] ? trace_hardirqs_off_caller+0x1e/0xa0
[ 1851.326551] [<c10a4567>] shrink_page_list+0x4c7/0x680
[ 1851.326551] [<c10a4ba1>] shrink_inactive_list+0x181/0x2d0
[ 1851.326551] [<c10a4e6a>] shrink_lruvec+0x17a/0x3b0
[ 1851.326551] [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551] [<c10a3901>] ? shrink_slab+0x1e1/0x2f0
[ 1851.326551] [<c10a5a2e>] kswapd+0x30e/0x680
[ 1851.326551] [<c10a5720>] ? try_to_free_pages+0x680/0x680
[ 1851.326551] [<c10424d5>] kthread+0xa5/0xb0
[ 1851.326551] [<c105cd1c>] ? lock_release_holdtime.part.27+0xbc/0xe0
[ 1851.326551] [<c140e177>] ret_from_kernel_thread+0x1b/0x28
[ 1851.326551] [<c1042430>] ? __kthread_parkme+0x70/0x70
14s
[ 1859.123845] XFS (sdb5): Mounting Filesystem
[ 1859.439419] XFS (sdb5): Ending clean mount
xfs/016 62s ...[ 1862.733835] XFS (sdb6): Mounting Filesystem
[ 1863.049345] XFS (sdb6): Ending clean mount
[ 1866.823293] XFS (sdb6): Mounting Filesystem
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-10 2:00 ` Michael L. Semon
@ 2013-05-10 2:19 ` Dave Chinner
2013-05-10 19:07 ` Michael L. Semon
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-05-10 2:19 UTC (permalink / raw)
To: Michael L. Semon; +Cc: xfs
On Thu, May 09, 2013 at 10:00:10PM -0400, Michael L. Semon wrote:
> On 05/09/2013 03:20 AM, Dave Chinner wrote:
> >On Thu, May 09, 2013 at 01:16:46PM +1000, Dave Chinner wrote:
> >>On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
> >>>Hi! I'm trying to come up with a series of ramblings that may or
> >>>may not be useful in a mailing-list context, with the idea that one
> >>>bug report might be good, the next might be me thinking aloud with
> >>>data in hand because I know something's wrong but can't put my
> >>>finger on it. An ex-girlfriend saw the movie "Rain Man" years ago
> >>>pointed to the screen and said, "Do you see that guy? That's you!"
> >>>If only I could be so smart...or act as well as Dustin Hoffman. The
> >>>noisy thinking is there, just not the brilliant insights...
> >>>
> >>>This report is to pass on a kernel lock detector message that might
> >>>be reproducible under a certain family of tests. generic/230 may
> >>>not be at fault, it's just where the detector went off.
> >>
> >>No, there's definitely a bug there. Thanks for the report, Michael.
> >>Try the patch below.
> >
> >Actaully, there's a bug in the error handling in that version - it
> >fails to unlock the quotaoff lock properly on failure. The version
> >below fixes that problem.
> >
> >Cheers,
> >
> >Dave.
>
> OK, I'll try this version as well. The first version seemed to work
> just fine.
It should, the bug was in an error handling path you are unlikely to
hit.
> xfs/012 13s ...[ 1851.323902]
> [ 1851.325479] =================================
> [ 1851.326551] [ INFO: inconsistent lock state ]
> [ 1851.326551] 3.9.0+ #1 Not tainted
> [ 1851.326551] ---------------------------------
> [ 1851.326551] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
> [ 1851.326551] kswapd0/18 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 1851.326551] (&(&ip->i_lock)->mr_lock){++++-+}, at: [<c11dcabf>]
> xfs_ilock+0x10f/0x190
> [ 1851.326551] {RECLAIM_FS-ON-R} state was registered at:
> [ 1851.326551] [<c105e10a>] mark_held_locks+0x8a/0xf0
> [ 1851.326551] [<c105e69c>] lockdep_trace_alloc+0x5c/0xa0
> [ 1851.326551] [<c109c52c>] __alloc_pages_nodemask+0x7c/0x670
> [ 1851.326551] [<c10bfd8e>] new_slab+0x6e/0x2a0
> [ 1851.326551] [<c14083a9>] __slab_alloc.isra.59.constprop.67+0x1d3/0x40a
> [ 1851.326551] [<c10c12cd>] __kmalloc+0x10d/0x180
> [ 1851.326551] [<c1199b56>] kmem_alloc+0x56/0xd0
> [ 1851.326551] [<c1199be1>] kmem_zalloc+0x11/0xd0
> [ 1851.326551] [<c11c666e>] xfs_dabuf_map.isra.2.constprop.5+0x22e/0x520
Yup, needs a KM_NOFS allocation there because we come through
here outside a transaction and so it doesn't get KM_NOFS implicitly
in this case. There's been a couple of these reported in the past
week or two - I need to do an audit and sweep them all up....
Technically, though, this can't cause a deadlock on the inode we
hold a lock on here because it's a directory inode, not a regular
file and so it will never be seen in the reclaim data writeback path
nor on the inode LRU when the shrinker runs. So most likely it is a
false positive...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-10 2:19 ` Dave Chinner
@ 2013-05-10 19:07 ` Michael L. Semon
2013-05-11 1:17 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Michael L. Semon @ 2013-05-10 19:07 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, May 9, 2013 at 10:19 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, May 09, 2013 at 10:00:10PM -0400, Michael L. Semon wrote:
>> On 05/09/2013 03:20 AM, Dave Chinner wrote:
>> >On Thu, May 09, 2013 at 01:16:46PM +1000, Dave Chinner wrote:
>> >>On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
>> >>No, there's definitely a bug there. Thanks for the report, Michael.
>> >>Try the patch below.
>> >
>> >Actaully, there's a bug in the error handling in that version - it
>> >fails to unlock the quotaoff lock properly on failure. The version
>> >below fixes that problem.
>> >
>> >Cheers,
>> >
>> >Dave.
>>
>> OK, I'll try this version as well. The first version seemed to work
>> just fine.
>
> It should, the bug was in an error handling path you are unlikely to
> hit.
OK, this version looks good, too, maybe better. The only lockdep that
I'm hitting consistently so far is caused by generic/249--a circular
dependency--but that's probably a separate issue. The trace is on my
USB key, but the PC for this E-mail is Windows XP and can't read F2FS.
Sorry about that.
>> xfs/012 13s ...[ 1851.323902]
>> [ 1851.325479] =================================
>> [ 1851.326551] [ INFO: inconsistent lock state ]
>> [ 1851.326551] 3.9.0+ #1 Not tainted
>> [ 1851.326551] ---------------------------------
>> [ 1851.326551] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
>> [ 1851.326551] kswapd0/18 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [ 1851.326551] (&(&ip->i_lock)->mr_lock){++++-+}, at: [<c11dcabf>]
>> xfs_ilock+0x10f/0x190
>> [ 1851.326551] {RECLAIM_FS-ON-R} state was registered at:
>> [ 1851.326551] [<c105e10a>] mark_held_locks+0x8a/0xf0
>> [ 1851.326551] [<c105e69c>] lockdep_trace_alloc+0x5c/0xa0
>> [ 1851.326551] [<c109c52c>] __alloc_pages_nodemask+0x7c/0x670
>> [ 1851.326551] [<c10bfd8e>] new_slab+0x6e/0x2a0
>> [ 1851.326551] [<c14083a9>] __slab_alloc.isra.59.constprop.67+0x1d3/0x40a
>> [ 1851.326551] [<c10c12cd>] __kmalloc+0x10d/0x180
>> [ 1851.326551] [<c1199b56>] kmem_alloc+0x56/0xd0
>> [ 1851.326551] [<c1199be1>] kmem_zalloc+0x11/0xd0
>> [ 1851.326551] [<c11c666e>] xfs_dabuf_map.isra.2.constprop.5+0x22e/0x520
>
> Yup, needs a KM_NOFS allocation there because we come through
> here outside a transaction and so it doesn't get KM_NOFS implicitly
> in this case. There's been a couple of these reported in the past
> week or two - I need to do an audit and sweep them all up....
>
> Technically, though, this can't cause a deadlock on the inode we
> hold a lock on here because it's a directory inode, not a regular
> file and so it will never be seen in the reclaim data writeback path
> nor on the inode LRU when the shrinker runs. So most likely it is a
> false positive...
Thanks for looking at it. There are going to be plenty of false
positives out there. Is there a pecking order of what works best? As
in...
* IRQ (IRQs-off?) checking: worth reporting...?
* sleep inside atomic sections: fascinating, but almost anything can trigger it
* multiple-CPU deadlock detection: can only speculate on a uniprocessor system
* circular dependency checking: YMMV
* reclaim-fs checking: which I knew how much developers need to
conform to reclaim-fs, or what it is
Your list will probably look totally different and have extra items,
and I'll be happy if it completely contradicts my list.
Anyway, have a good weekend!
Michael
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-10 19:07 ` Michael L. Semon
@ 2013-05-11 1:17 ` Dave Chinner
2013-05-11 4:48 ` Michael L. Semon
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-05-11 1:17 UTC (permalink / raw)
To: Michael L. Semon; +Cc: xfs
On Fri, May 10, 2013 at 03:07:19PM -0400, Michael L. Semon wrote:
> On Thu, May 9, 2013 at 10:19 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, May 09, 2013 at 10:00:10PM -0400, Michael L. Semon wrote:
> >> xfs/012 13s ...[ 1851.323902]
> >> [ 1851.325479] =================================
> >> [ 1851.326551] [ INFO: inconsistent lock state ]
> >> [ 1851.326551] 3.9.0+ #1 Not tainted
> >> [ 1851.326551] ---------------------------------
> >> [ 1851.326551] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
> >> [ 1851.326551] kswapd0/18 [HC0[0]:SC0[0]:HE1:SE1] takes:
> >> [ 1851.326551] (&(&ip->i_lock)->mr_lock){++++-+}, at: [<c11dcabf>]
> >> xfs_ilock+0x10f/0x190
> >> [ 1851.326551] {RECLAIM_FS-ON-R} state was registered at:
> >> [ 1851.326551] [<c105e10a>] mark_held_locks+0x8a/0xf0
> >> [ 1851.326551] [<c105e69c>] lockdep_trace_alloc+0x5c/0xa0
> >> [ 1851.326551] [<c109c52c>] __alloc_pages_nodemask+0x7c/0x670
> >> [ 1851.326551] [<c10bfd8e>] new_slab+0x6e/0x2a0
> >> [ 1851.326551] [<c14083a9>] __slab_alloc.isra.59.constprop.67+0x1d3/0x40a
> >> [ 1851.326551] [<c10c12cd>] __kmalloc+0x10d/0x180
> >> [ 1851.326551] [<c1199b56>] kmem_alloc+0x56/0xd0
> >> [ 1851.326551] [<c1199be1>] kmem_zalloc+0x11/0xd0
> >> [ 1851.326551] [<c11c666e>] xfs_dabuf_map.isra.2.constprop.5+0x22e/0x520
> >
> > Yup, needs a KM_NOFS allocation there because we come through
> > here outside a transaction and so it doesn't get KM_NOFS implicitly
> > in this case. There's been a couple of these reported in the past
> > week or two - I need to do an audit and sweep them all up....
> >
> > Technically, though, this can't cause a deadlock on the inode we
> > hold a lock on here because it's a directory inode, not a regular
> > file and so it will never be seen in the reclaim data writeback path
> > nor on the inode LRU when the shrinker runs. So most likely it is a
> > false positive...
>
> Thanks for looking at it. There are going to be plenty of false
> positives out there. Is there a pecking order of what works best? As
> in...
>
> * IRQ (IRQs-off?) checking: worth reporting...?
> * sleep inside atomic sections: fascinating, but almost anything can trigger it
> * multiple-CPU deadlock detection: can only speculate on a uniprocessor system
> * circular dependency checking: YMMV
> * reclaim-fs checking: which I knew how much developers need to
> conform to reclaim-fs, or what it is
If there's XFS in the trace, then just post them. We try to fix
false positives (as well as real bugs) so lockdep reporting gets more
accurate and less noisy over time.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-11 1:17 ` Dave Chinner
@ 2013-05-11 4:48 ` Michael L. Semon
2013-05-13 0:45 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Michael L. Semon @ 2013-05-11 4:48 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 05/10/2013 09:17 PM, Dave Chinner wrote:
> On Fri, May 10, 2013 at 03:07:19PM -0400, Michael L. Semon wrote:
>> On Thu, May 9, 2013 at 10:19 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> On Thu, May 09, 2013 at 10:00:10PM -0400, Michael L. Semon wrote:
>> Thanks for looking at it. There are going to be plenty of false
>> positives out there. Is there a pecking order of what works best? As
>> in...
>>
>> * IRQ (IRQs-off?) checking: worth reporting...?
>> * sleep inside atomic sections: fascinating, but almost anything can trigger it
>> * multiple-CPU deadlock detection: can only speculate on a uniprocessor system
>> * circular dependency checking: YMMV
>> * reclaim-fs checking: which I knew how much developers need to
>> conform to reclaim-fs, or what it is
>
> If there's XFS in the trace, then just post them. We try to fix
> false positives (as well as real bugs) so lockdep reporting gets more
> accurate and less noisy over time.
>
> Cheers,
>
> Dave.
>
Feel free to ignore and flame them as well. I'm going to make another
attempt to triage my eldest Pentium 4, and there's a high chance that
you'll have to reply, "Despite the xfs_* functions, that looks like a
DRM issue. Go bug those guys."
Thanks!
Michael
During generic/249 (lucky, first test out)...
======================================================
[ INFO: possible circular locking dependency detected ]
3.9.0+ #2 Not tainted
-------------------------------------------------------
xfs_io/1181 is trying to acquire lock:
(sb_writers#3){.+.+.+}, at: [<c10f01be>]
generic_file_splice_write+0x7e/0x1b0
but task is already holding lock:
(&(&ip->i_iolock)->mr_lock){++++++}, at: [<c11dca9a>] xfs_ilock+0xea/0x190
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&(&ip->i_iolock)->mr_lock){++++++}:
[<c1061580>] lock_acquire+0x80/0x100
[<c1047184>] down_write_nested+0x54/0xa0
[<c11dca9a>] xfs_ilock+0xea/0x190
[<c1190d2c>] xfs_setattr_size+0x30c/0x4a0
[<c1190eec>] xfs_vn_setattr+0x2c/0x30
[<c10dd40c>] notify_change+0x13c/0x360
[<c10c233a>] do_truncate+0x5a/0xa0
[<c10cfcce>] do_last.isra.46+0x31e/0xb90
[<c10d05db>] path_openat.isra.47+0x9b/0x3e0
[<c10d0951>] do_filp_open+0x31/0x80
[<c10c35f1>] do_sys_open+0xf1/0x1c0
[<c10c36e8>] sys_open+0x28/0x30
[<c140e1df>] sysenter_do_call+0x12/0x36
-> #1 (&sb->s_type->i_mutex_key#6){+.+.+.}:
[<c1061580>] lock_acquire+0x80/0x100
[<c140a1d4>] mutex_lock_nested+0x64/0x2b0
[<c10c2330>] do_truncate+0x50/0xa0
[<c10cfcce>] do_last.isra.46+0x31e/0xb90
[<c10d05db>] path_openat.isra.47+0x9b/0x3e0
[<c10d0951>] do_filp_open+0x31/0x80
[<c10c35f1>] do_sys_open+0xf1/0x1c0
[<c10c36e8>] sys_open+0x28/0x30
[<c140e1df>] sysenter_do_call+0x12/0x36
-> #0 (sb_writers#3){.+.+.+}:
[<c1060d55>] __lock_acquire+0x1465/0x1690
[<c1061580>] lock_acquire+0x80/0x100
[<c10c75ad>] __sb_start_write+0xad/0x1b0
[<c10f01be>] generic_file_splice_write+0x7e/0x1b0
[<c1184813>] xfs_file_splice_write+0x83/0x120
[<c10ee8c5>] do_splice_from+0x65/0x90
[<c10ee91b>] direct_splice_actor+0x2b/0x40
[<c10f03d9>] splice_direct_to_actor+0xb9/0x1e0
[<c10f0562>] do_splice_direct+0x62/0x80
[<c10c5166>] do_sendfile+0x1b6/0x2d0
[<c10c538e>] sys_sendfile64+0x4e/0xb0
[<c140e1df>] sysenter_do_call+0x12/0x36
other info that might help us debug this:
Chain exists of:
sb_writers#3 --> &sb->s_type->i_mutex_key#6 --> &(&ip->i_iolock)->mr_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&ip->i_iolock)->mr_lock);
lock(&sb->s_type->i_mutex_key#6);
lock(&(&ip->i_iolock)->mr_lock);
lock(sb_writers#3);
*** DEADLOCK ***
1 lock held by xfs_io/1181:
#0: (&(&ip->i_iolock)->mr_lock){++++++}, at: [<c11dca9a>]
xfs_ilock+0xea/0x190
stack backtrace:
Pid: 1181, comm: xfs_io Not tainted 3.9.0+ #2
Call Trace:
[<c1406cc7>] print_circular_bug+0x1b8/0x1c2
[<c1060d55>] __lock_acquire+0x1465/0x1690
[<c105e4bb>] ? trace_hardirqs_off+0xb/0x10
[<c1061580>] lock_acquire+0x80/0x100
[<c10f01be>] ? generic_file_splice_write+0x7e/0x1b0
[<c10c75ad>] __sb_start_write+0xad/0x1b0
[<c10f01be>] ? generic_file_splice_write+0x7e/0x1b0
[<c10f01be>] ? generic_file_splice_write+0x7e/0x1b0
[<c10f01be>] generic_file_splice_write+0x7e/0x1b0
[<c11dca9a>] ? xfs_ilock+0xea/0x190
[<c1184813>] xfs_file_splice_write+0x83/0x120
[<c1184790>] ? xfs_file_fsync+0x210/0x210
[<c10ee8c5>] do_splice_from+0x65/0x90
[<c10ee91b>] direct_splice_actor+0x2b/0x40
[<c10f03d9>] splice_direct_to_actor+0xb9/0x1e0
[<c10ee8f0>] ? do_splice_from+0x90/0x90
[<c10f0562>] do_splice_direct+0x62/0x80
[<c10c5166>] do_sendfile+0x1b6/0x2d0
[<c10b45b4>] ? might_fault+0x94/0xa0
[<c10c538e>] sys_sendfile64+0x4e/0xb0
[<c140e1df>] sysenter_do_call+0x12/0x36
XFS (sdb5): Mounting Filesystem
XFS (sdb5): Ending clean mount
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
2013-05-11 4:48 ` Michael L. Semon
@ 2013-05-13 0:45 ` Dave Chinner
0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2013-05-13 0:45 UTC (permalink / raw)
To: Michael L. Semon; +Cc: xfs
On Sat, May 11, 2013 at 12:48:49AM -0400, Michael L. Semon wrote:
> On 05/10/2013 09:17 PM, Dave Chinner wrote:
> >On Fri, May 10, 2013 at 03:07:19PM -0400, Michael L. Semon wrote:
> >>On Thu, May 9, 2013 at 10:19 PM, Dave Chinner <david@fromorbit.com> wrote:
> >>>On Thu, May 09, 2013 at 10:00:10PM -0400, Michael L. Semon wrote:
>
> >>Thanks for looking at it. There are going to be plenty of false
> >>positives out there. Is there a pecking order of what works best? As
> >>in...
> >>
> >>* IRQ (IRQs-off?) checking: worth reporting...?
> >>* sleep inside atomic sections: fascinating, but almost anything can trigger it
> >>* multiple-CPU deadlock detection: can only speculate on a uniprocessor system
> >>* circular dependency checking: YMMV
> >>* reclaim-fs checking: which I knew how much developers need to
> >>conform to reclaim-fs, or what it is
> >
> >If there's XFS in the trace, then just post them. We try to fix
> >false positives (as well as real bugs) so lockdep reporting gets more
> >accurate and less noisy over time.
> >
> >Cheers,
> >
> >Dave.
> >
>
> Feel free to ignore and flame them as well. I'm going to make
> another attempt to triage my eldest Pentium 4, and there's a high
> chance that you'll have to reply, "Despite the xfs_* functions, that
> looks like a DRM issue. Go bug those guys."
>
> Thanks!
>
> Michael
>
> During generic/249 (lucky, first test out)...
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.9.0+ #2 Not tainted
> -------------------------------------------------------
> xfs_io/1181 is trying to acquire lock:
> (sb_writers#3){.+.+.+}, at: [<c10f01be>]
> generic_file_splice_write+0x7e/0x1b0
>
> but task is already holding lock:
> (&(&ip->i_iolock)->mr_lock){++++++}, at: [<c11dca9a>] xfs_ilock+0xea/0x190
Known issue. Needs VFS level changes to be fixed. I've posted
patches several times to linux-fsdevel over the past 2 years to fix
it, but I'm still waiting for them to be picked up....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-05-13 0:46 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-09 2:24 Rambling noise #1: generic/230 can trigger kernel debug lock detector Michael L. Semon
2013-05-09 3:16 ` Dave Chinner
2013-05-09 7:20 ` Dave Chinner
2013-05-10 2:00 ` Michael L. Semon
2013-05-10 2:19 ` Dave Chinner
2013-05-10 19:07 ` Michael L. Semon
2013-05-11 1:17 ` Dave Chinner
2013-05-11 4:48 ` Michael L. Semon
2013-05-13 0:45 ` Dave Chinner
2013-05-09 15:57 ` Mark Tinguely
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.