* Re: 2.4.24 SMP lockups
@ 2004-01-10 19:58 Marcelo Tosatti
  2004-01-11  9:01 ` Simon Kirby
From: Marcelo Tosatti @ 2004-01-10 19:58 UTC
  To: linux-kernel


---------- Forwarded message ----------
Date: Sat, 10 Jan 2004 17:32:55 -0200 (BRST)
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Simon Kirby <sim@netnation.com>, Andrew Morton <akpm@osdl.org>
Subject: Re: 2.4.24 SMP lockups



On Fri, 9 Jan 2004, Simon Kirby wrote:

> 'lo all,

Hi Simon,

> We've had about 6 cases of this now, across 4 separate boxes.  Since
> upgrading to 2.4.24, our SMP web server boxes (both Intel and AMD
> hardware) are randomly blowing up.  This may have happened on 2.4.23 as
> well, but they weren't really running long enough to tell.  2.4.22 was
> fine.  GCC 3.3.3.
>
> These boxes are all dual CPU, and the failure case shows up suddenly with
> no warning.  Sysreq-P works, but only reports from one CPU no matter how
> many times I try.  In normal operation, every machine distributes all
> IRQs across both CPUs, and Sysreq-P reports from both CPUs.
>
> Mapping the EIP reported by Sysreq-P to symbols shows that the responding
> CPU is spinning on a spinlock (so far I have seen .text.lock.fcntl,
> .text.lock.sched, .text.lock.locks, and .text.lock.inode), which I assume
> is being held by the other (dead) CPU.
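Mapping an EIP to a symbol, as Simon describes and as ksymoops does later in this thread, amounts to finding the last System.map entry at or below the address; ksymoops prints the result as name+offset/size in hex. The C sketch below illustrates that lookup under stated assumptions: the table is sorted by address, the size is taken as the distance to the next entry, and the three entries are reconstructed from the process_timeout and schedule_timeout offsets in the decoded trace further down, not copied from a real System.map. `resolve` and `example_map` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* One System.map-style entry: start address and symbol name. */
struct sym {
    unsigned long addr;
    const char *name;
};

/* Entries reconstructed from the decoded trace in this thread; the last
 * one only marks where schedule_timeout ends (the next symbol's name is
 * not known from the log). */
static const struct sym example_map[] = {
    { 0xc0115f50UL, "process_timeout" },
    { 0xc0115fb0UL, "schedule_timeout" },
    { 0xc011604cUL, "(end)" },
};

/* Format addr as "name+off/size" using a table sorted by address.
 * Returns 0 on success, or -1 if addr lies before the first symbol or
 * at/after the final bound entry. */
static int resolve(const struct sym *tab, size_t n, unsigned long addr,
                   char *out, size_t outlen)
{
    size_t lo = 0, hi = n;

    if (n < 2 || addr < tab[0].addr || addr >= tab[n - 1].addr)
        return -1;

    /* Binary search for the last entry whose address is <= addr. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }

    /* Offset from the symbol start, and size as the gap to the next entry. */
    snprintf(out, outlen, "%s+%lx/%lx", tab[lo].name,
             addr - tab[lo].addr, tab[lo + 1].addr - tab[lo].addr);
    return 0;
}
```

A caller would pass `sizeof example_map / sizeof example_map[0]` as `n`. An address such as 00000286, which lies below every entry, returns -1, which corresponds to the "Before first symbol" lines in the ksymoops output.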

This sounds like a deadlock. I wonder why the NMI watchdog is not
triggering.

> Even on boxes with nmi_watchdog=1, nothing is reported from the NMI
> watchdog.
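Marcelo's question depends on how the NMI watchdog decides that a CPU is stuck. The C sketch below is a simplified, hypothetical model loosely following the 2.4 i386 approach: on every watchdog NMI it checks whether that CPU's timer interrupt count has advanced, and flags a lockup after roughly five seconds without progress. A CPU spinning on a lock with interrupts disabled stops taking timer interrupts but still receives NMIs, so it should be caught; if the NMI itself is never delivered (as Simon later found on one box), the check never runs and nothing is reported. The names `wd_state`, `wd_tick`, `NMI_HZ`, `LOCKUP_TICKS` and `NR_CPUS_MODEL`, and the one-NMI-per-second rate, are assumptions of this sketch, not kernel identifiers.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 2              /* dual-CPU boxes, as in this report */
#define NMI_HZ 1                     /* assumed: one watchdog NMI per second */
#define LOCKUP_TICKS (5 * NMI_HZ)    /* ~5 seconds without a timer interrupt */

/* Per-CPU state kept by this model watchdog. */
struct wd_state {
    unsigned long last_timer_irqs[NR_CPUS_MODEL];
    unsigned int alert[NR_CPUS_MODEL];
};

/* Called once per NMI on `cpu` with that CPU's current count of timer
 * interrupts. Returns true exactly once when the count has not moved
 * for LOCKUP_TICKS consecutive NMIs. */
static bool wd_tick(struct wd_state *s, int cpu, unsigned long timer_irqs)
{
    if (timer_irqs == s->last_timer_irqs[cpu]) {
        /* No timer interrupt since the last NMI: interrupts are probably
         * disabled, e.g. while spinning on a lock taken with IRQs off. */
        if (++s->alert[cpu] == LOCKUP_TICKS)
            return true;
    } else {
        /* Progress was made: remember the new count and reset. */
        s->last_timer_irqs[cpu] = timer_irqs;
        s->alert[cpu] = 0;
    }
    return false;
}
```

Because the counter is compared with `==`, this model reports a stuck CPU once rather than on every later NMI.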

Can you share all available SysRq-P output for the locked CPU? SysRq-T
output too, if possible.

Can you please describe the hardware in more detail? Is there any
hardware common to these boxes?



* Re: 2.4.24 SMP lockups
  2004-01-10 19:58 2.4.24 SMP lockups Marcelo Tosatti
@ 2004-01-11  9:01 ` Simon Kirby
  2004-01-14 16:23   ` Marcelo Tosatti
From: Simon Kirby @ 2004-01-11  9:01 UTC
  To: Marcelo Tosatti; +Cc: linux-kernel

On Sat, Jan 10, 2004 at 05:58:18PM -0200, Marcelo Tosatti wrote:

> This sounds like a deadlock. I wonder why the NMI watchdog is not
> triggering.

It appears the box I was expecting it to work on has issues with the NMI
working properly, so that may explain why nothing was showing up.  I'll
try on others.

> Can you share all available SysRq-P output for the locked CPU? SysRq-T
> output too, if possible.

Will do, in the next few days.

> Can you please describe the hardware in more detail? Is there any
> hardware common to these boxes?

The CPUs, motherboards, SCSI, Ethernet, etc., are all different... They
are all SMP, and are fairly busy web servers.

Simon-


* Re: 2.4.24 SMP lockups
  2004-01-11  9:01 ` Simon Kirby
@ 2004-01-14 16:23   ` Marcelo Tosatti
  2004-01-15 14:35     ` Thomas Zehetbauer
From: Marcelo Tosatti @ 2004-01-14 16:23 UTC
  To: Simon Kirby; +Cc: linux-kernel, Arkadiusz Miskiewicz



On Sun, 11 Jan 2004, Simon Kirby wrote:

> On Sat, Jan 10, 2004 at 05:58:18PM -0200, Marcelo Tosatti wrote:
>
> > This sounds like a deadlock. I wonder why the NMI watchdog is not
> > triggering.
>
> It appears the box I was expecting it to work on has issues with the NMI
> working properly, so that may explain why nothing was showing up.  I'll
> try on others.
>
> > Can you share all available SysRq-P output for the locked CPU? SysRq-T
> > output too, if possible.
>
> Will do, in the next few days.
>
> > Can you please describe the hardware in more detail? Is there any
> > hardware common to these boxes?
>
> The CPUs, motherboards, SCSI, Ethernet, etc., are all different... They
> are all SMP, and are fairly busy web servers.

I'm stress testing 2.4.24 on an 8-way SMP server from OSDL with an
AIC7xxx controller, using apachebench.

Let's see how it goes. I'll probably receive an SMP box with AIC7xxx this
week; as soon as it arrives I'll run the tests there too.


* Re: 2.4.24 SMP lockups
  2004-01-14 16:23   ` Marcelo Tosatti
@ 2004-01-15 14:35     ` Thomas Zehetbauer
From: Thomas Zehetbauer @ 2004-01-15 14:35 UTC
  To: linux-kernel


Marcelo,

as I have posted before, I also have an SMP lockup problem with
2.6.0/2.6.1 and the de4x5 driver that still keeps me from using any
2.6 kernel. I have already filed a bug report for this, including the
NMI watchdog oops message:
http://bugme.osdl.org/show_bug.cgi?id=1855

Tom

-- 
  T h o m a s   Z e h e t b a u e r   ( TZ251 )
  PGP encrypted mail preferred - KeyID 96FFCB89
       mail pgp-key-request@hostmaster.org

UNIX is user-friendly ... it's just selective about who its friends are



* Re: 2.4.24 SMP lockups
  2004-01-14 17:56     ` Marcelo Tosatti
@ 2004-01-16  2:34       ` Philippe Troin
From: Philippe Troin @ 2004-01-16  2:34 UTC
  To: Marcelo Tosatti; +Cc: linux-kernel, David Woodhouse

Marcelo Tosatti <marcelo.tosatti@cyclades.com> writes:

> On Wed, 14 Jan 2004, Simon Kirby wrote:
> 
> > On Sat, Jan 10, 2004 at 05:32:55PM -0200, Marcelo Tosatti wrote:
> >
> > > This sounds like a deadlock. I wonder why the NMI watchdog is not
> > > triggering.
> >
> > Well, with the NMI watchdog working (nmi_watchdog=2), we just had another
> > occurrence.  This time, I had the serial console ready. :)
> >
> > I'm guessing this is the same as the previous cases; however, this time
> > sysrq-P was able to print information from both CPUs.  I assume the NMI
> > watchdog unlocked interrupts from what would have been the stuck CPU?

I also experienced a deadlock with 2.4.24: the machine was still
responsive (TCP connections being established, etc), but all fs
accesses were locking up. The machine was broadcasting a load of 50+
via rwhod. I was able to capture the result of SysRq-T and pass it
through ksymoops.

According to my (deficient?) reading of the various backtraces, this is
a different problem than the one reported in this thread. It looks
like a jbd/ext3 issue.
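Phil's reading can be checked mechanically by grouping the decoded traces by a shared function. In the raw SysRq-T list below, several tasks in state D (kupdated, syslogd, ypserv, amd, icmplog and one nfsd) have c016c6dc, decoded as start_this_handle+d0/170, in their traces, while one kjournald sits in journal_commit_transaction. The C sketch below is a hypothetical helper, not an existing tool: it assumes the "Proc;" / "Trace;" layout that ksymoops prints here and lists every process whose decoded trace mentions a given symbol. It only groups tasks; it does not identify which task is holding up the journal.

```c
#include <stdio.h>
#include <string.h>

/* Scan ksymoops-decoded SysRq-T output from `in` and write to `out` the
 * name of every process whose decoded call trace contains `sym`.
 * Returns the number of matching processes. */
static int procs_waiting_in(FILE *in, const char *sym, FILE *out)
{
    char line[512], proc[64] = "", pattern[128];
    int matches = 0, counted = 0;

    /* Match "<sym+" so that "sleep_on" does not also match
     * "interruptible_sleep_on". */
    snprintf(pattern, sizeof pattern, "<%s+", sym);

    while (fgets(line, sizeof line, in)) {
        if (strncmp(line, "Proc;", 5) == 0) {
            /* New process block: remember its name, reset the flag. */
            if (sscanf(line + 5, "%63s", proc) != 1)
                proc[0] = '\0';
            counted = 0;
        } else if (strncmp(line, "Trace;", 6) == 0 && !counted &&
                   strstr(line, pattern) != NULL) {
            /* Count each process at most once. */
            fprintf(out, "%s\n", proc);
            counted = 1;
            matches++;
        }
    }
    return matches;
}
```

With `sym` set to `"start_this_handle"`, this would list the tasks whose decoded traces pass through that function; the same call with `"journal_commit_transaction"` picks out the committing kjournald.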

Any takers?

Phil.

ksymoops 2.4.5 on i686 2.4.24.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.24/ (default)
     -m /boot/System.map-2.4.24 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

init          S C12D5F2C  4588     1      0 32513               (NOTLB)
Using defaults from ksymoops -t elf32-i386 -a i386
Call Trace:    [<c011602a>] [<c0115f50>] [<c014943e>] [<c01497ca>] [<c0107353>]
keventd       S C13A8664  5256     2      1             3       (L-TLB)
Call Trace:    [<c0126045>] [<c0105a64>]
ksoftirqd_CPU S C13A6000  4948     3      1             4     2 (L-TLB)
Call Trace:    [<c011e04b>] [<c0105a64>]
ksoftirqd_CPU S C13A4000  4944     4      1             5     3 (L-TLB)
Call Trace:    [<c011e04b>] [<c0105a64>]
kswapd        S C1394000  4388     5      1             6     4 (L-TLB)
Call Trace:    [<c0132686>] [<c0105a64>]
bdflush       S 00000286  6312     6      1             7     5 (L-TLB)
Call Trace:    [<c011697b>] [<c013de57>] [<c0105a64>]
kupdated      D 00000286  4268     7      1             8     6 (L-TLB)
Call Trace:    [<c0116a7b>] [<c016c6dc>] [<c016c85d>] [<c0165a68>] [<c0165984>]
  [<c012a2ba>] [<c014e765>] [<c013dc57>] [<c013dfa5>] [<c0107306>] [<c013de5c>]
  [<c0105a64>]
ahc_dv_0      S C139EE0C  6124     8      1             9     7 (L-TLB)
Call Trace:    [<c01060b5>] [<c010619f>] [<c01d6fea>] [<c0105a64>]
scsi_eh_0     S C1323FDC  6192     9      1            10     8 (L-TLB)
Call Trace:    [<c01060b5>] [<c010619f>] [<c01cc200>] [<c0105a64>]
kjournald     S 00000286  4288    10      1           130     9 (L-TLB)
Call Trace:    [<c010bbfd>] [<c011697b>] [<c01715b9>] [<c0171440>] [<c0105a64>]
kjournald     D 00000286  4168   130      1           131    10 (L-TLB)
Call Trace:    [<c0116a7b>] [<c016e9f9>] [<c0116532>] [<c0171596>] [<c0171440>]
  [<c0105a64>]
kjournald     S 00000286  4152   131      1           132   130 (L-TLB)
Call Trace:    [<c011697b>] [<c01715b9>] [<c0171440>] [<c0105a64>]
kjournald     S 00000286  4328   132      1           133   131 (L-TLB)
Call Trace:    [<c011697b>] [<c01715b9>] [<c0171440>] [<c0105a64>]
kjournald     S 00000286  4192   133      1           134   132 (L-TLB)
Call Trace:    [<c011697b>] [<c01715b9>] [<c0171440>] [<c0105a64>]
kjournald     S 00000286  4168   134      1           384   133 (L-TLB)
Call Trace:    [<c011697b>] [<c01715b9>] [<c0171440>] [<c0105a64>]
devfsd        S CF244000  4468   384      1           389   134 (NOTLB)
Call Trace:    [<c0176446>] [<c01398ab>] [<c0107353>]
syslogd       D 00000286  4248   389      1           393   384 (NOTLB)
Call Trace:    [<c0116a7b>] [<c016c6dc>] [<c016c85d>] [<c0167688>] [<c014e2da>]
  [<c012cff3>] [<c012d5c4>] [<c0162e77>] [<c0139bd6>] [<c0162e54>] [<c0139d29>]
  [<c0107353>]
watchdog      S CF1C9F8C  4844   393      1   886     397   389 (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0121e3a>] [<c0107353>]
klogd         S 7FFFFFFF  4908   397      1           404   393 (NOTLB)
Call Trace:    [<c0115fc7>] [<c01f6677>] [<c0240fc2>] [<c02419e7>] [<c0116532>]
  [<c01f3ff5>] [<c01f421e>] [<c01399b6>] [<c0107353>]
named         S 00000000  5744   404      1   406     408   397 (NOTLB)
Call Trace:    [<c01f4f12>] [<c011cd25>] [<c0107353>]
named         S CF129FB0     0   406    404   420               (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
portmap       S 7FFFFFFF     0   408      1           415   404 (NOTLB)
Call Trace:    [<c0115fc7>] [<c0149a1a>] [<c0149a4f>] [<c0149c5d>] [<c0107353>]
ypserv        D 00000286  4332   415      1 32543     418   408 (NOTLB)
Call Trace:    [<c0116a7b>] [<c016c6dc>] [<c016c85d>] [<c0167688>] [<c014e2da>]
  [<c014fb6b>] [<c012b202>] [<c012b5bb>] [<c012b49c>] [<c0139508>] [<c01398ab>]
  [<c0107353>]
rpc.yppasswdd S 7FFFFFFF  4692   418      1           426   415 (NOTLB)
Call Trace:    [<c0115fc7>] [<c0149a1a>] [<c0149a4f>] [<c0149c5d>] [<c0139369>]
  [<c0107353>]
named         S CEB59F28  4948   420    406   424               (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0149a4f>] [<c0149c5d>] [<c010b2a8>]
  [<c0107353>]
named         S 7FFFFFFF     0   421    420           422       (NOTLB)
Call Trace:    [<c0115fc7>] [<c01f6677>] [<c0240fc2>] [<c02419e7>] [<c01f3ff5>]
  [<c01f4ed5>] [<c011522a>] [<c01150ac>] [<c01f4f12>] [<c01f56b1>] [<c0107353>]
named         S CEB57FB0  4208   422    420           423   421 (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
named         S CEB51F8C  2404   423    420           424   422 (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0121e3a>] [<c0107353>]
named         S 7FFFFFFF  4492   424    420                 423 (NOTLB)
Call Trace:    [<c0216162>] [<c0115fc7>] [<c014943e>] [<c01497ca>] [<c0107353>]
rpc.ypxfrd    S 7FFFFFFF  5316   426      1           433   418 (NOTLB)
Call Trace:    [<c0216162>] [<c0115fc7>] [<c014943e>] [<c01497ca>] [<c0107353>]
ypbind        S 7FFFFFFF     4   433      1   434     446   426 (NOTLB)
Call Trace:    [<c0115fc7>] [<c0149a1a>] [<c0149a4f>] [<c0149c5d>] [<c0139369>]
  [<c0107353>]
ypbind        S CEA29F28  4836   434    433   437               (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0149a4f>] [<c0149c5d>] [<c0107353>]
ypbind        S CEAC3FB0     0   435    434           437       (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
ypbind        S CEA6DF8C     0   437    434                 435 (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0121e3a>] [<c0107353>]
amd           D 00000286  4180   446      1 32546     451   433 (NOTLB)
Call Trace:    [<c0116a7b>] [<c016c6dc>] [<c016c85d>] [<c0167688>] [<c014e2da>]
  [<c014fb6b>] [<c012b202>] [<c012b5bb>] [<c012b49c>] [<c0139bd6>] [<c012b528>]
  [<c0139cd5>] [<c0107353>]
rpciod        S 00000001  5200   449      1           457   451 (L-TLB)
Call Trace:    [<d089c371>] [<d08a5f2c>] [<d08a5f2c>] [<d08a5f24>] [<d08a5f24>]
  [<c0105a64>] [<d08a5f2c>]
lockd         S 7FFFFFFF     4   451      1           449   446 (L-TLB)
Call Trace:    [<c0115fc7>] [<d089fc7e>] [<d08aac44>] [<c0105a64>]
rpc.rquotad   S 7FFFFFFF  2444   457      1           470   449 (NOTLB)
Call Trace:    [<c0115fc7>] [<c0149a1a>] [<c0149a4f>] [<c0149c5d>] [<c01f577c>]
  [<c0107353>]
rpc.bootparam S 7FFFFFFF     0   470      1           478   457 (NOTLB)
Call Trace:    [<c0115fc7>] [<c0149a1a>] [<c0149a4f>] [<c0149c5d>] [<c0139369>]
  [<c0107353>]
conserver     S 7FFFFFFF  2404   478      1   479     482   470 (NOTLB)
Call Trace:    [<c0216162>] [<c0115fc7>] [<c014943e>] [<c01497ca>] [<c0107353>]
conserver     S 7FFFFFFF  3832   479    478                     (NOTLB)
Call Trace:    [<c0115fc7>] [<c014943e>] [<c01497ca>] [<c0107353>]
dhcpd-2.2.x   S 7FFFFFFF  3528   482      1           492   478 (NOTLB)
Call Trace:    [<c0115fc7>] [<c01f6677>] [<c0240fc2>] [<c02419e7>] [<c01f3ff5>]
  [<c01f4ed5>] [<c01f73bc>] [<c01f90ad>] [<d0805575>] [<c01f4051>] [<c0121657>]
  [<c0121872>] [<c01f4f12>] [<c01f56b1>] [<c0107353>]
inetd         S 7FFFFFFF     0   492      1   887    1068   482 (NOTLB)
Call Trace:    [<c0216162>] [<c0115fc7>] [<c014943e>] [<c01497ca>] [<c0107353>]
icmplog       D 00000286     0  1068      1          1070   492 (NOTLB)
Call Trace:    [<c0116a7b>] [<c016c6dc>] [<c016c85d>] [<c0167688>] [<c014e2da>]
  [<c014fb6b>] [<c012b202>] [<c012b5bb>] [<c012b49c>] [<c0139bd6>] [<c012b528>]
  [<c0139cd5>] [<c0107353>]
tcplog        S 7FFFFFFF   892  1070      1          1083  1068 (NOTLB)
Call Trace:    [<c0115fc7>] [<c01f6677>] [<c0240fc2>] [<c02419e7>] [<c01f3ff5>]
  [<c01f4ed5>] [<c0128725>] [<c01298dd>] [<c012919d>] [<c01291af>] [<c01f4f12>]
  [<c01f56b1>] [<c0107353>]
lpd           S 7FFFFFFF    12  1083      1          1098  1070 (NOTLB)
Call Trace:    [<c0115fc7>] [<c014943e>] [<c01497ca>] [<c0107353>]
safe_mysqld   S 00000000     0  1098      1  1133    1168  1083 (NOTLB)
Call Trace:    [<c011cd25>] [<c0107353>]
mysqld        S 7FFFFFFF  1376  1133   1098  1139               (NOTLB)
Call Trace:    [<c0115fc7>] [<c014943e>] [<c01497ca>] [<c0107353>]
mysqld        S CA69FF28  4892  1139   1133  1148               (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0149a4f>] [<c0149c5d>] [<c0107353>]
mysqld        S CA69BFB0  6124  1140   1139          1141       (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
mysqld        S CA699FB0  2404  1141   1139          1142  1140 (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
mysqld        S CA695FB0  5596  1142   1139          1143  1141 (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
mysqld        S CA691FB0  6336  1143   1139          1144  1142 (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
mysqld        S C9F3BFB0  6180  1144   1139          1145  1143 (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
mysqld        S C9E8FF2C  2404  1145   1139          1146  1144 (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c014943e>] [<c01497ca>] [<c0107353>]
mysqld        S C9EEFFB0  5844  1146   1139          1147  1145 (NOTLB)
Call Trace:    [<c011db2d>] [<c0106484>] [<c0107353>]
mysqld        S C9F13FB0  5460  1147   1139          1148  1146 (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
mysqld        S C9D1DFB0  6324  1148   1139                1147 (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
nfsd          D CFA2D6DC   112  1157      1          1171  1158 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<d08d8620>] [<c0105a64>]
nfsd          D CFA2D6DC  2520  1158      1          1157  1159 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     0  1159      1          1158  1160 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     8  1160      1          1159  1161 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     0  1161      1          1160  1162 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     0  1162      1          1161  1163 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     0  1163      1          1162  1164 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     0  1164      1          1163  1165 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     0  1165      1          1164  1166 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D CFA2D6DC     0  1166      1          1165  1167 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<c0105a64>]
nfsd          D 00000286     0  1167      1          1166  1168 (L-TLB)
Call Trace:    [<c0116a7b>] [<c016c6dc>] [<c016c85d>] [<c0167688>] [<c014e2da>]
  [<d08cab90>] [<c014fb6b>] [<c0162e00>] [<d082c63a>] [<c01488b4>] [<d08cab90>]
  [<d08cac85>] [<d08cab90>] [<d08cb0a8>] [<c013b7d8>] [<c013b7ff>] [<c013ba34>]
  [<c0166a94>] [<c0166b52>] [<c0166d9e>] [<c014f6f1>] [<c014f866>] [<d08cadbb>]
  [<d08cae3d>] [<d08cb33d>] [<d08cb37a>] [<d08cb807>] [<d08d1df9>] [<d08d8ca4>]
  [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>] [<d08c9390>]
  [<c0105a64>]
nfsd          D CFA2D6DC  2400  1168      1          1167  1098 (L-TLB)
Call Trace:    [<c0105fe5>] [<c0106194>] [<d08cc15d>] [<d08cb807>] [<d08d1df9>]
  [<d08d8ca4>] [<d08c95b3>] [<d08d8ca4>] [<d089df27>] [<d08d8638>] [<d08d8658>]
  [<d08c9390>] [<d08d8620>] [<c0105a64>]
rpc.mountd    S 7FFFFFFF  4644  1171      1          1175  1157 (NOTLB)
Call Trace:    [<c0115fc7>] [<c01f6677>] [<c0240fc2>] [<c02419e7>] [<c01f3ff5>]
  [<c01f4ed5>] [<c0133368>] [<c0133796>] [<c0126ab0>] [<c012707b>] [<c01290b2>]
  [<c01f4f12>] [<c01f56b1>] [<c0107353>]
omniNames     S C9FB3F8C  1024  1175      1  1185    1179  1171 (NOTLB)
Call Trace:    [<c011dec0>] [<c011602a>] [<c0115f50>] [<c0121e3a>] [<c0107353>]
powstatd      S CA0C5F8C  4472  1179      1          1182  1175 (NOTLB)
Call Trace:    [<c01f7551>] [<c011602a>] [<c0115f50>] [<c0121e3a>] [<c0107353>]
rarpd         S 7FFFFFFF  5812  1182      1          1189  1179 (NOTLB)
Call Trace:    [<c0115fc7>] [<c0149a1a>] [<c0149a4f>] [<c0149c5d>] [<c0107353>]
omniNames     S C9C47F28  4792  1185   1175  1188               (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0149a4f>] [<c0149c5d>] [<c0107353>]
omniNames     S C9C43FB0  6124  1186   1185          1187       (NOTLB)
Call Trace:    [<c0106484>] [<c0107353>]
omniNames     S C9C3FF8C  5004  1187   1185          1188  1186 (NOTLB)
Call Trace:    [<c011602a>] [<c0115f50>] [<c0121e3a>] [<c0107353>]
Warning (Oops_read): Code line not seen, dumping what data is available

Proc;  init

>>EIP; c12d5f2c <_end+f83500/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  keventd

>>EIP; c13a8664 <_end+1055c38/104b15d4>   <=====

Trace; c0126045 <context_thread+115/1d0>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  ksoftirqd_CPU

>>EIP; c13a6000 <_end+10535d4/104b15d4>   <=====

Trace; c011e04b <ksoftirqd+93/c8>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  ksoftirqd_CPU

>>EIP; c13a4000 <_end+10515d4/104b15d4>   <=====

Trace; c011e04b <ksoftirqd+93/c8>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kswapd

>>EIP; c1394000 <_end+10415d4/104b15d4>   <=====

Trace; c0132686 <kswapd+82/b4>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  bdflush

>>EIP; 00000286 Before first symbol   <=====

Trace; c011697b <interruptible_sleep_on+4b/7c>
Trace; c013de57 <bdflush+c7/cc>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kupdated

>>EIP; 00000286 Before first symbol   <=====

Trace; c0116a7b <sleep_on+4b/7c>
Trace; c016c6dc <start_this_handle+d0/170>
Trace; c016c85d <journal_start+95/c4>
Trace; c0165a68 <ext3_writepage+e4/2f4>
Trace; c0165984 <ext3_writepage+0/2f4>
Trace; c012a2ba <filemap_fdatasync+6a/b8>
Trace; c014e765 <sync_unlocked_inodes+ad/1bc>
Trace; c013dc57 <sync_old_buffers+2b/a4>
Trace; c013dfa5 <kupdate+149/180>
Trace; c0107306 <ret_from_fork+6/20>
Trace; c013de5c <kupdate+0/180>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  ahc_dv_0

>>EIP; c139ee0c <_end+104c3e0/104b15d4>   <=====

Trace; c01060b5 <__down_interruptible+6d/f4>
Trace; c010619f <__down_failed_interruptible+7/c>
Trace; c01d6fea <.text.lock.aic7xxx_osm+125/2ab>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  scsi_eh_0

>>EIP; c1323fdc <_end+fd15b0/104b15d4>   <=====

Trace; c01060b5 <__down_interruptible+6d/f4>
Trace; c010619f <__down_failed_interruptible+7/c>
Trace; c01cc200 <.text.lock.scsi_error+e5/f5>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kjournald

>>EIP; 00000286 Before first symbol   <=====

Trace; c010bbfd <call_apic_timer_interrupt+5/10>
Trace; c011697b <interruptible_sleep_on+4b/7c>
Trace; c01715b9 <kjournald+169/21c>
Trace; c0171440 <commit_timeout+0/c>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kjournald

>>EIP; 00000286 Before first symbol   <=====

Trace; c0116a7b <sleep_on+4b/7c>
Trace; c016e9f9 <journal_commit_transaction+165/fcc>
Trace; c0116532 <schedule+45a/520>
Trace; c0171596 <kjournald+146/21c>
Trace; c0171440 <commit_timeout+0/c>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kjournald

>>EIP; 00000286 Before first symbol   <=====

Trace; c011697b <interruptible_sleep_on+4b/7c>
Trace; c01715b9 <kjournald+169/21c>
Trace; c0171440 <commit_timeout+0/c>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kjournald

>>EIP; 00000286 Before first symbol   <=====

Trace; c011697b <interruptible_sleep_on+4b/7c>
Trace; c01715b9 <kjournald+169/21c>
Trace; c0171440 <commit_timeout+0/c>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kjournald

>>EIP; 00000286 Before first symbol   <=====

Trace; c011697b <interruptible_sleep_on+4b/7c>
Trace; c01715b9 <kjournald+169/21c>
Trace; c0171440 <commit_timeout+0/c>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  kjournald

>>EIP; 00000286 Before first symbol   <=====

Trace; c011697b <interruptible_sleep_on+4b/7c>
Trace; c01715b9 <kjournald+169/21c>
Trace; c0171440 <commit_timeout+0/c>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  devfsd

>>EIP; cf244000 <_end+eef15d4/104b15d4>   <=====

Trace; c0176446 <devfsd_read+10a/3f8>
Trace; c01398ab <sys_read+8f/104>
Trace; c0107353 <system_call+33/38>
Proc;  syslogd

>>EIP; 00000286 Before first symbol   <=====

Trace; c0116a7b <sleep_on+4b/7c>
Trace; c016c6dc <start_this_handle+d0/170>
Trace; c016c85d <journal_start+95/c4>
Trace; c0167688 <ext3_dirty_inode+74/10c>
Trace; c014e2da <__mark_inode_dirty+32/a8>
Trace; c012cff3 <do_generic_file_write+d3/3c8>
Trace; c012d5c4 <generic_file_write+10c/12c>
Trace; c0162e77 <ext3_file_write+23/bc>
Trace; c0139bd6 <do_readv_writev+1aa/268>
Trace; c0162e54 <ext3_file_write+0/bc>
Trace; c0139d29 <sys_writev+41/54>
Trace; c0107353 <system_call+33/38>
Proc;  watchdog

>>EIP; cf1c9f8c <_end+ee77560/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0121e3a <sys_nanosleep+102/178>
Trace; c0107353 <system_call+33/38>
Proc;  klogd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c01f6677 <sock_alloc_send_pskb+73/1d0>
Trace; c0240fc2 <unix_wait_for_peer+a6/cc>
Trace; c02419e7 <unix_dgram_sendmsg+327/3f8>
Trace; c0116532 <schedule+45a/520>
Trace; c01f3ff5 <sock_sendmsg+69/88>
Trace; c01f421e <sock_write+b2/bc>
Trace; c01399b6 <sys_write+96/10c>
Trace; c0107353 <system_call+33/38>
Proc;  named

>>EIP; 00000000 Before first symbol

Trace; c01f4f12 <sys_send+1e/24>
Trace; c011cd25 <sys_wait4+395/3cc>
Trace; c0107353 <system_call+33/38>
Proc;  named

>>EIP; cf129fb0 <_end+edd7584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  portmap

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c0149a1a <do_poll+86/dc>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0107353 <system_call+33/38>
Proc;  ypserv

>>EIP; 00000286 Before first symbol   <=====

Trace; c0116a7b <sleep_on+4b/7c>
Trace; c016c6dc <start_this_handle+d0/170>
Trace; c016c85d <journal_start+95/c4>
Trace; c0167688 <ext3_dirty_inode+74/10c>
Trace; c014e2da <__mark_inode_dirty+32/a8>
Trace; c014fb6b <update_atime+4b/50>
Trace; c012b202 <do_generic_file_read+486/494>
Trace; c012b5bb <generic_file_read+93/194>
Trace; c012b49c <file_read_actor+0/8c>
Trace; c0139508 <generic_file_llseek+0/94>
Trace; c01398ab <sys_read+8f/104>
Trace; c0107353 <system_call+33/38>
Proc;  rpc.yppasswdd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c0149a1a <do_poll+86/dc>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0139369 <filp_close+9d/a8>
Trace; c0107353 <system_call+33/38>
Proc;  named

>>EIP; ceb59f28 <_end+e8074fc/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c010b2a8 <call_do_IRQ+5/d>
Trace; c0107353 <system_call+33/38>
Proc;  named

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c01f6677 <sock_alloc_send_pskb+73/1d0>
Trace; c0240fc2 <unix_wait_for_peer+a6/cc>
Trace; c02419e7 <unix_dgram_sendmsg+327/3f8>
Trace; c01f3ff5 <sock_sendmsg+69/88>
Trace; c01f4ed5 <sys_sendto+d9/f8>
Trace; c011522a <do_page_fault+17e/49a>
Trace; c01150ac <do_page_fault+0/49a>
Trace; c01f4f12 <sys_send+1e/24>
Trace; c01f56b1 <sys_socketcall+119/200>
Trace; c0107353 <system_call+33/38>
Proc;  named

>>EIP; ceb57fb0 <_end+e805584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  named

>>EIP; ceb51f8c <_end+e7ff560/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0121e3a <sys_nanosleep+102/178>
Trace; c0107353 <system_call+33/38>
Proc;  named

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0216162 <tcp_poll+2e/158>
Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  rpc.ypxfrd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0216162 <tcp_poll+2e/158>
Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  ypbind

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c0149a1a <do_poll+86/dc>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0139369 <filp_close+9d/a8>
Trace; c0107353 <system_call+33/38>
Proc;  ypbind

>>EIP; cea29f28 <_end+e6d74fc/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0107353 <system_call+33/38>
Proc;  ypbind

>>EIP; ceac3fb0 <_end+e771584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  ypbind

>>EIP; cea6df8c <_end+e71b560/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0121e3a <sys_nanosleep+102/178>
Trace; c0107353 <system_call+33/38>
Proc;  amd

>>EIP; 00000286 Before first symbol   <=====

Trace; c0116a7b <sleep_on+4b/7c>
Trace; c016c6dc <start_this_handle+d0/170>
Trace; c016c85d <journal_start+95/c4>
Trace; c0167688 <ext3_dirty_inode+74/10c>
Trace; c014e2da <__mark_inode_dirty+32/a8>
Trace; c014fb6b <update_atime+4b/50>
Trace; c012b202 <do_generic_file_read+486/494>
Trace; c012b5bb <generic_file_read+93/194>
Trace; c012b49c <file_read_actor+0/8c>
Trace; c0139bd6 <do_readv_writev+1aa/268>
Trace; c012b528 <generic_file_read+0/194>
Trace; c0139cd5 <sys_readv+41/54>
Trace; c0107353 <system_call+33/38>
Proc;  rpciod

>>EIP; 00000001 Before first symbol   <=====

Trace; d089c371 <[sunrpc]rpciod+175/234>
Trace; d08a5f2c <[sunrpc]rpciod_killer+0/c>
Trace; d08a5f2c <[sunrpc]rpciod_killer+0/c>
Trace; d08a5f24 <[sunrpc]rpciod_idle+4/c>
Trace; d08a5f24 <[sunrpc]rpciod_idle+4/c>
Trace; c0105a64 <arch_kernel_thread+28/38>
Trace; d08a5f2c <[sunrpc]rpciod_killer+0/c>
Proc;  lockd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; d089fc7e <[sunrpc]svc_recv+25a/4c4>
Trace; d08aac44 <[lockd]lockd+160/2b4>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  rpc.rquotad

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c0149a1a <do_poll+86/dc>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c01f577c <sys_socketcall+1e4/200>
Trace; c0107353 <system_call+33/38>
Proc;  rpc.bootparam

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c0149a1a <do_poll+86/dc>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0139369 <filp_close+9d/a8>
Trace; c0107353 <system_call+33/38>
Proc;  conserver

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0216162 <tcp_poll+2e/158>
Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  conserver

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  dhcpd-2.2.x

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c01f6677 <sock_alloc_send_pskb+73/1d0>
Trace; c0240fc2 <unix_wait_for_peer+a6/cc>
Trace; c02419e7 <unix_dgram_sendmsg+327/3f8>
Trace; c01f3ff5 <sock_sendmsg+69/88>
Trace; c01f4ed5 <sys_sendto+d9/f8>
Trace; c01f73bc <kfree_skbmem+c/68>
Trace; c01f90ad <skb_free_datagram+1d/24>
Trace; d0805575 <[af_packet]packet_recvmsg+11d/12c>
Trace; c01f4051 <sock_recvmsg+3d/bc>
Trace; c0121657 <update_wall_time+b/34>
Trace; c0121872 <timer_bh+36/3d4>
Trace; c01f4f12 <sys_send+1e/24>
Trace; c01f56b1 <sys_socketcall+119/200>
Trace; c0107353 <system_call+33/38>
Proc;  inetd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0216162 <tcp_poll+2e/158>
Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  icmplog

>>EIP; 00000286 Before first symbol   <=====

Trace; c0116a7b <sleep_on+4b/7c>
Trace; c016c6dc <start_this_handle+d0/170>
Trace; c016c85d <journal_start+95/c4>
Trace; c0167688 <ext3_dirty_inode+74/10c>
Trace; c014e2da <__mark_inode_dirty+32/a8>
Trace; c014fb6b <update_atime+4b/50>
Trace; c012b202 <do_generic_file_read+486/494>
Trace; c012b5bb <generic_file_read+93/194>
Trace; c012b49c <file_read_actor+0/8c>
Trace; c0139bd6 <do_readv_writev+1aa/268>
Trace; c012b528 <generic_file_read+0/194>
Trace; c0139cd5 <sys_readv+41/54>
Trace; c0107353 <system_call+33/38>
Proc;  tcplog

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c01f6677 <sock_alloc_send_pskb+73/1d0>
Trace; c0240fc2 <unix_wait_for_peer+a6/cc>
Trace; c02419e7 <unix_dgram_sendmsg+327/3f8>
Trace; c01f3ff5 <sock_sendmsg+69/88>
Trace; c01f4ed5 <sys_sendto+d9/f8>
Trace; c0128725 <__vma_link+61/b0>
Trace; c01298dd <__insert_vm_struct+55/64>
Trace; c012919d <unmap_fixup+14d/16c>
Trace; c01291af <unmap_fixup+15f/16c>
Trace; c01f4f12 <sys_send+1e/24>
Trace; c01f56b1 <sys_socketcall+119/200>
Trace; c0107353 <system_call+33/38>
Proc;  lpd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  safe_mysqld

>>EIP; 00000000 Before first symbol

Trace; c011cd25 <sys_wait4+395/3cc>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; ca69ff28 <_end+a34d4fc/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; ca69bfb0 <_end+a349584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; ca699fb0 <_end+a347584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; ca695fb0 <_end+a343584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; ca691fb0 <_end+a33f584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; c9f3bfb0 <_end+9be9584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; c9e8ff2c <_end+9b3d500/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c014943e <do_select+1ca/204>
Trace; c01497ca <sys_select+32a/46c>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; c9eeffb0 <_end+9b9d584/104b15d4>   <=====

Trace; c011db2d <do_softirq+7d/dc>
Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; c9f13fb0 <_end+9bc1584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  mysqld

>>EIP; c9d1dfb0 <_end+99cb584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; d08d8620 <[nfsd]nfsd_list+0/0>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; 00000286 Before first symbol   <=====

Trace; c0116a7b <sleep_on+4b/7c>
Trace; c016c6dc <start_this_handle+d0/170>
Trace; c016c85d <journal_start+95/c4>
Trace; c0167688 <ext3_dirty_inode+74/10c>
Trace; c014e2da <__mark_inode_dirty+32/a8>
Trace; d08cab90 <[nfsd]filldir_one+0/4c>
Trace; c014fb6b <update_atime+4b/50>
Trace; c0162e00 <ext3_readdir+380/390>
Trace; d082c63a <[ipchains]ip_fw_check+3ca/4b4>
Trace; c01488b4 <vfs_readdir+94/e0>
Trace; d08cab90 <[nfsd]filldir_one+0/4c>
Trace; d08cac85 <[nfsd]nfsd_get_name+a9/ec>
Trace; d08cab90 <[nfsd]filldir_one+0/4c>
Trace; d08cb0a8 <[nfsd]splice+24/170>
Trace; c013b7d8 <getblk+1c/4c>
Trace; c013b7ff <getblk+43/4c>
Trace; c013ba34 <bread+18/70>
Trace; c0166a94 <ext3_get_inode_loc+118/174>
Trace; c0166b52 <ext3_read_inode+16/278>
Trace; c0166d9e <ext3_read_inode+262/278>
Trace; c014f6f1 <iget4+4d/f0>
Trace; c014f866 <iput+4e/2c8>
Trace; d08cadbb <[nfsd]nfsd_iget+f3/10c>
Trace; d08cae3d <[nfsd]nfsd_get_dentry+69/78>
Trace; d08cb33d <[nfsd]find_fh_dentry+149/354>
Trace; d08cb37a <[nfsd]find_fh_dentry+186/354>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  nfsd

>>EIP; cfa2d6dc <_end+f6dacb0/104b15d4>   <=====

Trace; c0105fe5 <__down+6d/d0>
Trace; c0106194 <__down_failed+8/c>
Trace; d08cc15d <[nfsd].text.lock.nfsfh+8d/f0>
Trace; d08cb807 <[nfsd]fh_verify+2bf/474>
Trace; d08d1df9 <[nfsd]nfsd3_proc_getattr+95/a0>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d08c95b3 <[nfsd]nfsd_dispatch+d3/19a>
Trace; d08d8ca4 <[nfsd]nfsd_procedures3+24/320>
Trace; d089df27 <[sunrpc]svc_process+28f/4f0>
Trace; d08d8638 <[nfsd]nfsd_version3+0/10>
Trace; d08d8658 <[nfsd]nfsd_program+0/28>
Trace; d08c9390 <[nfsd]nfsd+204/354>
Trace; d08d8620 <[nfsd]nfsd_list+0/0>
Trace; c0105a64 <arch_kernel_thread+28/38>
Proc;  rpc.mountd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c01f6677 <sock_alloc_send_pskb+73/1d0>
Trace; c0240fc2 <unix_wait_for_peer+a6/cc>
Trace; c02419e7 <unix_dgram_sendmsg+327/3f8>
Trace; c01f3ff5 <sock_sendmsg+69/88>
Trace; c01f4ed5 <sys_sendto+d9/f8>
Trace; c0133368 <__free_pages+1c/20>
Trace; c0133796 <free_page_and_swap_cache+32/34>
Trace; c0126ab0 <__free_pte+40/48>
Trace; c012707b <zap_page_range+30b/374>
Trace; c01290b2 <unmap_fixup+62/16c>
Trace; c01f4f12 <sys_send+1e/24>
Trace; c01f56b1 <sys_socketcall+119/200>
Trace; c0107353 <system_call+33/38>
Proc;  omniNames

>>EIP; c9fb3f8c <_end+9c61560/104b15d4>   <=====

Trace; c011dec0 <bh_action+4c/8c>
Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0121e3a <sys_nanosleep+102/178>
Trace; c0107353 <system_call+33/38>
Proc;  powstatd

>>EIP; ca0c5f8c <_end+9d73560/104b15d4>   <=====

Trace; c01f7551 <__kfree_skb+139/140>
Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0121e3a <sys_nanosleep+102/178>
Trace; c0107353 <system_call+33/38>
Proc;  rarpd

>>EIP; 7fffffff Before first symbol   <=====

Trace; c0115fc7 <schedule_timeout+17/9c>
Trace; c0149a1a <do_poll+86/dc>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0107353 <system_call+33/38>
Proc;  omniNames

>>EIP; c9c47f28 <_end+98f54fc/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0149a4f <do_poll+bb/dc>
Trace; c0149c5d <sys_poll+1ed/2f4>
Trace; c0107353 <system_call+33/38>
Proc;  omniNames

>>EIP; c9c43fb0 <_end+98f1584/104b15d4>   <=====

Trace; c0106484 <sys_rt_sigsuspend+fc/118>
Trace; c0107353 <system_call+33/38>
Proc;  omniNames

>>EIP; c9c3ff8c <_end+98ed560/104b15d4>   <=====

Trace; c011602a <schedule_timeout+7a/9c>
Trace; c0115f50 <process_timeout+0/60>
Trace; c0121e3a <sys_nanosleep+102/178>
Trace; c0107353 <system_call+33/38>


2 warnings issued.  Results may not be reliable.


* Re: 2.4.24 SMP lockups
  2004-01-14 18:28     ` David Woodhouse
@ 2004-01-14 21:01       ` David Woodhouse
  0 siblings, 0 replies; 19+ messages in thread
From: David Woodhouse @ 2004-01-14 21:01 UTC
  To: Simon Kirby; +Cc: Marcelo Tosatti, Andrew Morton, linux-kernel, viro, davej

On Wed, 2004-01-14 at 18:28 +0000, David Woodhouse wrote:
> I _think_ it's true that the _only_ way we can get woken from
> __wait_on_freeing_inode() is the inode has actually been destroyed, in
> which case it's fine just to _not_ remove ourselves from the (defunct)
> wait queue, and to return. But I need to stare hard at it some more,
> have another cup of tea, and ask Al :)

It does look like it should be OK. As far as I can tell, the only place
that looks like it could wake us without actually destroying the inode
is __sync_one(), and I really can't see how we'd get there with an
I_FREEING inode. I'd be inclined to stick a BUG() in for testing
purposes, to make sure the assumption is true.

I note that in prune_icache() in the CONFIG_HIGHMEM case, we're actually
dropping I_LOCK on an inode without waking its wait queue. I suspect
that's wrong and wants fixing too...
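The rule at stake can be shown with a minimal userspace sketch: whoever clears a "locked" flag must also wake anything sleeping on it. Otherwise those sleepers never get to re-check the flag. The sketch assumes POSIX threads, with a per-object mutex and condition variable standing in for inode_lock and inode->i_wait. TOY_I_LOCK and the toy_* names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_I_LOCK 0x1u /* hypothetical stand-in for I_LOCK */

struct toy_inode {
    unsigned state;
    pthread_mutex_t lock; /* stands in for inode_lock */
    pthread_cond_t wait;  /* stands in for inode->i_wait */
};

static void toy_inode_init(struct toy_inode *ino)
{
    ino->state = 0;
    pthread_mutex_init(&ino->lock, NULL);
    pthread_cond_init(&ino->wait, NULL);
}

/* Sleep until TOY_I_LOCK is clear, then take it. */
static void toy_lock_inode(struct toy_inode *ino)
{
    pthread_mutex_lock(&ino->lock);
    while (ino->state & TOY_I_LOCK)
        pthread_cond_wait(&ino->wait, &ino->lock);
    ino->state |= TOY_I_LOCK;
    pthread_mutex_unlock(&ino->lock);
}

/* Clear TOY_I_LOCK *and* wake the sleepers; leaving out the broadcast
 * is the shape of the prune_icache() issue noted in this mail. */
static void toy_unlock_inode(struct toy_inode *ino)
{
    pthread_mutex_lock(&ino->lock);
    assert(ino->state & TOY_I_LOCK); /* clearing a flag we don't hold is a bug */
    ino->state &= ~TOY_I_LOCK;
    pthread_cond_broadcast(&ino->wait);
    pthread_mutex_unlock(&ino->lock);
}

/* Roughly like __wait_on_inode(): sleep while the flag is set,
 * re-checking it after every wake-up. */
static void toy_wait_on_inode(struct toy_inode *ino)
{
    pthread_mutex_lock(&ino->lock);
    while (ino->state & TOY_I_LOCK)
        pthread_cond_wait(&ino->wait, &ino->lock);
    pthread_mutex_unlock(&ino->lock);
}

/* Thread entry point that just waits for the flag to clear. */
static void *toy_waiter(void *arg)
{
    toy_wait_on_inode(arg);
    return NULL;
}
```

If you remove the pthread_cond_broadcast() call from toy_unlock_inode(), you reproduce the shape of the prune_icache() concern. A waiter that went to sleep while the flag was set can then sleep indefinitely, even though the flag is now clear.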

(untested)
===== fs/inode.c 1.47 vs edited =====
--- 1.47/fs/inode.c	Thu Jan  8 12:23:51 2004
+++ edited/fs/inode.c	Wed Jan 14 20:51:18 2004
@@ -250,9 +250,10 @@
  * ->read_inode, and we want to be sure that evidence of the deletion is found
  * by ->read_inode.
  *
- * This call might return early if an inode which shares the waitq is woken up.
- * This is most easily handled by the caller which will loop around again
- * looking for the inode.
+ * Unlike the 2.6 version, this call cannot return early, since inodes
+ * do not share wait queues. Therefore, we don't call remove_wait_queue(); it
+ * would be dangerous to do so since the inode may have already been freed, 
+ * and it's unnecessary, since the inode is definitely going to get freed.
  *
  * This is called with inode_lock held.
  */
@@ -264,7 +265,7 @@
         set_current_state(TASK_UNINTERRUPTIBLE);
         spin_unlock(&inode_lock);
         schedule();
-        remove_wait_queue(&inode->i_wait, &wait);
+
         spin_lock(&inode_lock);
 }
 
@@ -325,7 +326,7 @@
 	list_del(&inode->i_list);
 	list_add(&inode->i_list, &inode->i_sb->s_locked_inodes);
 
-	if (inode->i_state & I_LOCK)
+	if (inode->i_state & (I_LOCK|I_FREEING))
 		BUG();
 
 	/* Set I_LOCK, reset I_DIRTY */
@@ -344,8 +345,7 @@
 
 	spin_lock(&inode_lock);
 	inode->i_state &= ~I_LOCK;
-	if (!(inode->i_state & I_FREEING))
-		__refile_inode(inode);
+	__refile_inode(inode);
 	wake_up(&inode->i_wait);
 }
 
@@ -884,6 +884,7 @@
 		/* Release the inode again. */
 		spin_lock(&inode_lock);
 		inode->i_state &= ~I_LOCK;
+		wake_up(&inode->i_wait);
 	}
 	spin_unlock(&inode_lock);
 #endif /* CONFIG_HIGHMEM */





-- 
dwmw2



* Re: 2.4.24 SMP lockups
  2004-01-14 17:07   ` Simon Kirby
  2004-01-14 17:56     ` Marcelo Tosatti
@ 2004-01-14 18:28     ` David Woodhouse
  2004-01-14 21:01       ` David Woodhouse
  1 sibling, 1 reply; 19+ messages in thread
From: David Woodhouse @ 2004-01-14 18:28 UTC
  To: Simon Kirby; +Cc: Marcelo Tosatti, Andrew Morton, linux-kernel, viro

On Wed, 2004-01-14 at 09:07 -0800, Simon Kirby wrote:
> I also have an entire sysrq-T, but it is for over 500 processes, so I
> posted the entire serial capture log as well, as a few other things
> here:
> 
> 	http://blue.netnation.com/sim/ref/2.4.24_stuck_cpu/

Perfect report; thanks.

It deadlocked in attempting to get a spinlock, in remove_wait_queue().

(Look at the address it wanted to jump to when it got the lock, from 
0xc011c7cf to 0xc011c7cf+0xffffe996 == 0xc011b165).

This is most probably because the remove_wait_queue() in
__wait_on_freeing_inode() is removing us from a waitqueue in an inode
which has already been freed. The memory which used to hold a spinlock
has been reused, and it now looks locked, so we wait. For ever.
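Here is a toy C11 model of how freed-and-reused memory can "look locked". It follows the 2.4 i386 spinlock, whose out-of-line contention loop (cmpb $0x0 / repz nop / jle, then a jmp back to retry) shows up in the decoded Code: bytes of Simon's report. The lock byte is 1 when free, and any value <= 0 counts as held. This is a sketch using <stdatomic.h>, not the kernel's inline assembly. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the 2.4 i386 spinlock byte: 1 = free, <= 0 = held. */
typedef struct {
    _Atomic signed char slock;
} toy_spinlock_t;

static void toy_spin_lock_init(toy_spinlock_t *l)
{
    atomic_init(&l->slock, 1);
}

static bool toy_spin_is_locked(toy_spinlock_t *l)
{
    return atomic_load(&l->slock) <= 0;
}

static void toy_cpu_relax(void)
{
    /* The kernel executes "rep; nop" (PAUSE) here; a no-op suffices for the model. */
}

static void toy_spin_lock(toy_spinlock_t *l)
{
    for (;;) {
        /* Fast path ("lock; decb"): a positive old value (normally 1) means we own it. */
        if (atomic_fetch_sub(&l->slock, 1) > 0)
            return;
        /* Out-of-line loop: spin until the byte looks free, then retry the fast path.
         * Nothing here checks who owns the lock, so a stray byte <= 0 spins forever. */
        while (toy_spin_is_locked(l))
            toy_cpu_relax();
    }
}

/* Modeled on 2.4's spin_trylock(): swap in 0, succeed only if the old value was positive. */
static bool toy_spin_trylock(toy_spinlock_t *l)
{
    return atomic_exchange(&l->slock, 0) > 0;
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    assert(toy_spin_is_locked(l)); /* unlocking a free lock is a bug */
    atomic_store(&l->slock, 1);
}
```

The spin loop only looks at the byte. A toy_spin_lock() on a lock whose byte has been overwritten with, say, 0x86 (or any other value <= 0) would therefore never return, which is the hang described above.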

This differs from the working 2.6 version, where the waitqueue is in a
hash table and doesn't go away.
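The following is a rough userspace sketch of the idea of keeping wait queues outside the object. The queues live in a small static table indexed by a hash of the object's address, so they outlive any object that maps to them. The table size, the multiplicative hash constant and the toy_* names are assumptions chosen for illustration. pthread mutex/condvar pairs stand in for wait_queue_head_t.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WAIT_TABLE_BITS 3
#define TOY_WAIT_TABLE_SIZE (1u << TOY_WAIT_TABLE_BITS)

struct toy_waitq {
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

/* Static storage: these queues are never freed, unlike a queue embedded in an inode. */
static struct toy_waitq toy_wait_table[TOY_WAIT_TABLE_SIZE];

static void toy_wait_table_init(void)
{
    for (unsigned i = 0; i < TOY_WAIT_TABLE_SIZE; i++) {
        pthread_mutex_init(&toy_wait_table[i].lock, NULL);
        pthread_cond_init(&toy_wait_table[i].cond, NULL);
    }
}

/* Multiplicative (Fibonacci) hash of an address, keeping the top `bits` bits. */
static unsigned toy_hash_ptr(const void *p, unsigned bits)
{
    assert(bits > 0 && bits < 64);
    uint64_t v = (uint64_t)(uintptr_t)p * UINT64_C(0x9E3779B97F4A7C15);
    return (unsigned)(v >> (64 - bits));
}

/* Map any object to its shared, long-lived wait queue. */
static struct toy_waitq *toy_waitq_for(const void *obj)
{
    return &toy_wait_table[toy_hash_ptr(obj, TOY_WAIT_TABLE_BITS)];
}
```

Unrelated objects can hash to the same slot, so a sleeper may be woken by someone else's wake-up. This is why the 2.6-derived comment in the patch says the call "might return early" and leaves it to the caller to loop and look again.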

I _think_ it's true that the _only_ way we can get woken from
__wait_on_freeing_inode() is the inode has actually been destroyed, in
which case it's fine just to _not_ remove ourselves from the (defunct)
wait queue, and to return. But I need to stare hard at it some more,
have another cup of tea, and ask Al :)

If I'm right in the above, then this should work....

===== fs/inode.c 1.47 vs edited =====
*** /tmp/inode.c-1.47-18008	Thu Jan  8 12:23:51 2004
--- fs/inode.c	Wed Jan 14 18:25:33 2004
*************** static void __wait_on_freeing_inode(stru
*** 264,270 ****
--- 264,274 ----
          set_current_state(TASK_UNINTERRUPTIBLE);
          spin_unlock(&inode_lock);
          schedule();
+ /* Inode is dead or dying. The wait queue is obsolete and we don't need to
+    remove ourselves from it. More to the point we _mustn't_ remove ourselves
+    since it may already have been freed
          remove_wait_queue(&inode->i_wait, &wait);
+  */
          spin_lock(&inode_lock);
  }
  


-- 
dwmw2



* Re: 2.4.24 SMP lockups
  2004-01-14 17:07   ` Simon Kirby
@ 2004-01-14 17:56     ` Marcelo Tosatti
  2004-01-16  2:34       ` Philippe Troin
  2004-01-14 18:28     ` David Woodhouse
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2004-01-14 17:56 UTC
  To: Simon Kirby; +Cc: Marcelo Tosatti, Andrew Morton, linux-kernel, David Woodhouse

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2145 bytes --]



On Wed, 14 Jan 2004, Simon Kirby wrote:

> On Sat, Jan 10, 2004 at 05:32:55PM -0200, Marcelo Tosatti wrote:
>
> > This sounds like a deadlock. I wonder why the NMI watchdog is not
> > triggering.
>
> Well, with the NMI watchdog working (nmi_watchdog=2), we just had another
> occurrence.  This time, I had the serial console ready. :)
>
> I'm guessing this is the same as the previous cases; however, this time
> sysrq-P was able to print information from both CPUs.  I assume the NMI
> watchdog unlocked interrupts from what would have been the stuck CPU?
>
> NMI Watchdog detected LOCKUP on CPU0, eip c011c7cb, registers:
> CPU:    0
> EIP:    0010:[<c011c7cb>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00000086
> eax: ddadf5d0   ebx: d8a2e000   ecx: 00000000   edx: d8a2fe50
> esi: d8a2fe50   edi: 00000286   ebp: 00020690   esp: d8a2fe30
> ds: 0018   es: 0018   ss: 0018
> Process php4 (pid: 19197, stackpage=d8a2f000)
> Stack: d8a2e000 d8a2fe50 ddadf5d0 c015a8e4 00000000 d8a2e000 00000000 00000000
>        00000000 d8a2e000 ddadf5d4 ddadf5d4 ddadf520 ddadf520 c1ce4178 c015b40b
>        ddadf520 0000c82f 00000018 0000ffff c1ce4178 00020690 f7b73c00 c015b881
> Call Trace:    [<c015a8e4>] [<c015b40b>] [<c015b881>] [<c0176e68>] [<c014e792>]
>   [<c014ec7c>] [<c014f259>] [<c014f81e>] [<c01418ce>] [<c0141cf3>] [<c010926f>]
> Code: f3 90 7e f9 e9 8d e9 ff ff 80 3d c0 a3 31 c0 00 f3 90 7e f5
>
> >>EIP; c011c7ca <.text.lock.fork+1a/120>   <=====
> Trace; c015a8e4 <__wait_on_freeing_inode+74/a0>
> Trace; c015b40a <find_inode+6a/80>
> Trace; c015b880 <iget4+60/110>
> Trace; c0176e68 <ext3_lookup+78/a0>
> Trace; c014e792 <real_lookup+f2/140>
> Trace; c014ec7c <link_path_walk+31c/6f0>
> Trace; c014f258 <path_lookup+38/40>
> Trace; c014f81e <open_namei+6e/690>
> Trace; c01418ce <filp_open+3e/70>
> Trace; c0141cf2 <sys_open+52/c0>
> Trace; c010926e <system_call+32/38>

Thanks so much for this Simon.

I'm still not sure why it is deadlocking. David Woodhouse and I are
taking a closer look.

Anyway, please revert the attached patch and retry. It removes the
"__wait_on_freeing_inode" logic.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 5911 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1136.66.2 -> 1.1136.67.1
#	          fs/inode.c	1.41    -> 1.42   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/11/15	cattelan@lips.thebarn.com	1.1199
# Merge lips.thebarn.com:/export/hose/bkroot/linux-2.4
# into lips.thebarn.com:/export/hose/bkroot/linux-2.4+justXFS
# --------------------------------------------
# 03/11/16	pmeda@akamai.com	1.1136.66.3
# [netdrvr tulip] fix hashed setup frame code
# 
# It is using local variable `i' in both the inner and outer loop.
# 
# Need to bring the for loop outside the loop.  Otherwise we need to reset the
# setup_frame to tp->setup_frame after every loop.  You do not need to set the
# setup_frm for every mc address, we can set once after the complete has_table
# is ready.
# --------------------------------------------
# 03/11/17	livio@ime.usp.br	1.1136.67.1
# [PATCH] Backport inode_hash race fix
# 
#   Hello,
# 
#   After  trying to  "get around"  the  inode_hash races  when removing  and
# iget()ing the  same inode, my code  got really ugly,  and I got fed  up. So
# yesterday I got Neil's 2.5 patch and backported it to 2.4.22-rc2.
# 
#   The patch  is very similar to  Neil's, except for one  (very important to
# me)   case.  Neil's   patch  only   covered  the   removal  of   inodes  in
# generic_delete_inode() (which in  2.4 is the case where  i_nlink is zero in
# iput()).  But, as I described in a previous post:
# http://marc.theaimsgroup.com/?l=linux-fsdevel&m=105547595519745&w=2
# 
#   , I frequently get busted in prune_icache(). In Neil's patch prune_icache
# is not  covered. In my  opinion, this case  (in prune_icache()), has  to be
# fixed in 2.6  too. Depending on your  comments, I may make a  patch for 2.6
# later.
# 
#   Please  comment if  you can,  so that  I may  send this  to  Marcelo when
# 2.4.23-pre opens (which should be soon, I think).
# 
#   best regards,
# 
# --
#   Livio B. Soares
# --------------------------------------------
#
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c	Wed Jan 14 15:52:23 2004
+++ b/fs/inode.c	Wed Jan 14 15:52:23 2004
@@ -206,7 +206,8 @@
 	if ((inode->i_state & flags) != flags) {
 		inode->i_state |= flags;
 		/* Only add valid (ie hashed) inodes to the dirty list */
-		if (!(inode->i_state & I_LOCK) && !list_empty(&inode->i_hash)) {
+		if (!(inode->i_state & (I_LOCK|I_FREEING|I_CLEAR)) &&
+		    !list_empty(&inode->i_hash)) {
 			list_del(&inode->i_list);
 			list_add(&inode->i_list, &sb->s_dirty);
 		}
@@ -235,6 +236,30 @@
 		__wait_on_inode(inode);
 }
 
+/*
+ * If we try to find an inode in the inode hash while it is being deleted, we
+ * have to wait until the filesystem completes its deletion before reporting
+ * that it isn't found.  This is because iget will immediately call
+ * ->read_inode, and we want to be sure that evidence of the deletion is found
+ * by ->read_inode.
+ *
+ * This call might return early if an inode which shares the waitq is woken up.
+ * This is most easily handled by the caller which will loop around again
+ * looking for the inode.
+ *
+ * This is called with inode_lock held.
+ */
+static void __wait_on_freeing_inode(struct inode *inode)
+{
+        DECLARE_WAITQUEUE(wait, current);
+
+        add_wait_queue(&inode->i_wait, &wait);
+        set_current_state(TASK_UNINTERRUPTIBLE);
+        spin_unlock(&inode_lock);
+        schedule();
+        remove_wait_queue(&inode->i_wait, &wait);
+        spin_lock(&inode_lock);
+}
 
 static inline void write_inode(struct inode *inode, int sync)
 {
@@ -596,6 +621,11 @@
 		if (inode->i_data.nrpages)
 			truncate_inode_pages(&inode->i_data, 0);
 		clear_inode(inode);
+		spin_lock(&inode_lock);
+		list_del(&inode->i_hash);
+		INIT_LIST_HEAD(&inode->i_hash);
+		spin_unlock(&inode_lock);
+		wake_up(&inode->i_wait);
 		destroy_inode(inode);
 		nr_disposed++;
 	}
@@ -707,6 +737,14 @@
  *
  * We don't expect to have to call this very often.
  *
+ * We leave the inode in the inode hash table until *after* 
+ * the filesystem's ->delete_inode (in dispose_list) completes.
+ * This ensures that an iget (such as nfsd might instigate) will 
+ * always find up-to-date information either in the hash or on disk.
+ *
+ * I_FREEING is set so that no-one will take a new reference
+ * to the inode while it is being deleted.
+ *
  * N.B. The spinlock is released during the call to
  *      dispose_list.
  */
@@ -739,8 +777,6 @@
 		if (atomic_read(&inode->i_count))
 			continue;
 		list_del(tmp);
-		list_del(&inode->i_hash);
-		INIT_LIST_HEAD(&inode->i_hash);
 		list_add(tmp, freeable);
 		inode->i_state |= I_FREEING;
 		count++;
@@ -793,6 +829,7 @@
 	struct list_head *tmp;
 	struct inode * inode;
 
+repeat:
 	tmp = head;
 	for (;;) {
 		tmp = tmp->next;
@@ -806,6 +843,10 @@
 			continue;
 		if (find_actor && !find_actor(inode, ino, opaque))
 			continue;
+		if (inode->i_state & (I_FREEING|I_CLEAR)) {
+			__wait_on_freeing_inode(inode);
+			goto repeat;
+		}
 		break;
 	}
 	return inode;
@@ -1076,8 +1117,6 @@
 			return;
 
 		if (!inode->i_nlink) {
-			list_del(&inode->i_hash);
-			INIT_LIST_HEAD(&inode->i_hash);
 			list_del(&inode->i_list);
 			INIT_LIST_HEAD(&inode->i_list);
 			inode->i_state|=I_FREEING;
@@ -1095,6 +1134,11 @@
 				delete(inode);
 			} else
 				clear_inode(inode);
+			spin_lock(&inode_lock);
+			list_del(&inode->i_hash);
+			INIT_LIST_HEAD(&inode->i_hash);
+			spin_unlock(&inode_lock);
+			wake_up(&inode->i_wait);
 			if (inode->i_state != I_CLEAR)
 				BUG();
 		} else {

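The find_inode() change in the patch above follows a wait-and-rescan pattern. If the matching entry is being freed, the lookup sleeps until the deleter unhashes it and wakes waiters. It then restarts the search instead of touching the old entry again. Below is a minimal pthread sketch of that pattern. It assumes a single global mutex (for inode_lock) and one never-freed condition variable, rather than the per-inode i_wait the patch uses, which the analysis in this thread suspects is unsafe. Every toy_* name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical hash-chain entry; `freeing` stands in for I_FREEING|I_CLEAR. */
struct toy_entry {
    unsigned long ino;
    int freeing;
    struct toy_entry *next;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER; /* like inode_lock */
static pthread_cond_t table_wait = PTHREAD_COND_INITIALIZER;   /* shared, never freed */
static struct toy_entry *chain;

/* Called with table_lock held, like find_inode(). */
static struct toy_entry *toy_find(unsigned long ino)
{
repeat:
    for (struct toy_entry *e = chain; e; e = e->next) {
        if (e->ino != ino)
            continue;
        if (e->freeing) {
            /* pthread_cond_wait() drops table_lock while asleep. After waking,
             * `e` may be gone, so rescan from the head instead of using it. */
            pthread_cond_wait(&table_wait, &table_lock);
            goto repeat;
        }
        return e;
    }
    return NULL;
}

/* Unhash the entry only once deletion is finished, then wake waiters,
 * mirroring what the patch does in dispose_list() and iput(). */
static void toy_finish_delete(struct toy_entry *victim)
{
    pthread_mutex_lock(&table_lock);
    assert(victim->freeing);
    for (struct toy_entry **pp = &chain; *pp; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            break;
        }
    }
    pthread_cond_broadcast(&table_wait);
    pthread_mutex_unlock(&table_lock);
}

struct toy_lookup {
    unsigned long ino;
    struct toy_entry *result;
};

/* Thread entry point: one locked lookup, result stored in *arg. */
static void *toy_lookup_thread(void *arg)
{
    struct toy_lookup *l = arg;
    pthread_mutex_lock(&table_lock);
    l->result = toy_find(l->ino);
    pthread_mutex_unlock(&table_lock);
    return NULL;
}
```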

* Re: 2.4.24 SMP lockups
       [not found] ` <Pine.LNX.4.58L.0401101719400.1310@logos.cnet>
  2004-01-10 22:40   ` Andrew Morton
@ 2004-01-14 17:07   ` Simon Kirby
  2004-01-14 17:56     ` Marcelo Tosatti
  2004-01-14 18:28     ` David Woodhouse
  1 sibling, 2 replies; 19+ messages in thread
From: Simon Kirby @ 2004-01-14 17:07 UTC
  To: Marcelo Tosatti, Andrew Morton; +Cc: linux-kernel

On Sat, Jan 10, 2004 at 05:32:55PM -0200, Marcelo Tosatti wrote:

> This sounds like a deadlock. I wonder why the NMI watchdog is not
> triggering.

Well, with the NMI watchdog working (nmi_watchdog=2), we just had another
occurrence.  This time, I had the serial console ready. :)

I'm guessing this is the same as the previous cases; however, this time
sysrq-P was able to print information from both CPUs.  I assume the NMI
watchdog unlocked interrupts from what would have been the stuck CPU?

NMI Watchdog detected LOCKUP on CPU0, eip c011c7cb, registers:
CPU:    0
EIP:    0010:[<c011c7cb>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000086
eax: ddadf5d0   ebx: d8a2e000   ecx: 00000000   edx: d8a2fe50
esi: d8a2fe50   edi: 00000286   ebp: 00020690   esp: d8a2fe30
ds: 0018   es: 0018   ss: 0018
Process php4 (pid: 19197, stackpage=d8a2f000)
Stack: d8a2e000 d8a2fe50 ddadf5d0 c015a8e4 00000000 d8a2e000 00000000 00000000 
       00000000 d8a2e000 ddadf5d4 ddadf5d4 ddadf520 ddadf520 c1ce4178 c015b40b 
       ddadf520 0000c82f 00000018 0000ffff c1ce4178 00020690 f7b73c00 c015b881 
Call Trace:    [<c015a8e4>] [<c015b40b>] [<c015b881>] [<c0176e68>] [<c014e792>]
  [<c014ec7c>] [<c014f259>] [<c014f81e>] [<c01418ce>] [<c0141cf3>] [<c010926f>]
Code: f3 90 7e f9 e9 8d e9 ff ff 80 3d c0 a3 31 c0 00 f3 90 7e f5 

>>EIP; c011c7ca <.text.lock.fork+1a/120>   <=====
Trace; c015a8e4 <__wait_on_freeing_inode+74/a0>
Trace; c015b40a <find_inode+6a/80>
Trace; c015b880 <iget4+60/110>
Trace; c0176e68 <ext3_lookup+78/a0>
Trace; c014e792 <real_lookup+f2/140>
Trace; c014ec7c <link_path_walk+31c/6f0>
Trace; c014f258 <path_lookup+38/40>
Trace; c014f81e <open_namei+6e/690>
Trace; c01418ce <filp_open+3e/70>
Trace; c0141cf2 <sys_open+52/c0>
Trace; c010926e <system_call+32/38>
Code;  c011c7ca <.text.lock.fork+1a/120>
00000000 <_EIP>:
Code;  c011c7ca <.text.lock.fork+1a/120>   <=====
   0:   f3 90                     repz nop    <=====
Code;  c011c7cc <.text.lock.fork+1c/120>
   2:   7e f9                     jle    fffffffd <_EIP+0xfffffffd>
Code;  c011c7ce <.text.lock.fork+1e/120>
   4:   e9 8d e9 ff ff            jmp    ffffe996 <_EIP+0xffffe996>
Code;  c011c7d2 <.text.lock.fork+22/120>
   9:   80 3d c0 a3 31 c0 00      cmpb   $0x0,0xc031a3c0
Code;  c011c7da <.text.lock.fork+2a/120>
  10:   f3 90                     repz nop 
Code;  c011c7dc <.text.lock.fork+2c/120>
  12:   7e f5                     jle    9 <_EIP+0x9>

console shuts up ... 
 <6>SysRq : Show Regs
SysRq : Show State
SysRq : Changing Loglevel
Loglevel set to 1
SysRq : Show Regs
SysRq : Changing Loglevel
Loglevel set to 0
SysRq : Show Regs
SysRq : Changing Loglevel
Loglevel set to 9
SysRq : Emergency Sync
Syncing device 08:01 ... OK
Syncing device 08:05 ... OK
Syncing device 08:06 ... OK
Syncing device 08:07 ... OK
Done.
SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0010:[<c0106f8c>] CPU: 1 EFLAGS: 00000246    Not tainted
EAX: 00000000 EBX: c0106f60 ECX: 00000000 EDX: c1c14000
ESI: c1c14000 EDI: c1c14000 EBP: ffffe000 DS: 0018 ES: 0018
CR0: 8005003b CR2: 409cd000 CR3: 36c30000 CR4: 000006d0
Call Trace:    [<c0107022>] [<c011d3e1>] [<c011d65f>]

>>EIP; c0106f8c <default_idle+2c/50>   <=====
Trace; c0107022 <cpu_idle+52/70>
Trace; c011d3e0 <call_console_drivers+60/120>
Trace; c011d65e <printk+14e/180>

SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0010:[<c0106f8c>] CPU: 0 EFLAGS: 00000246    Not tainted
EAX: 00000000 EBX: c0106f60 ECX: 00000000 EDX: c0334000
ESI: c0334000 EDI: c0334000 EBP: ffffe000 DS: 0018 ES: 0018
CR0: 8005003b CR2: 40809000 CR3: 36473000 CR4: 000006d0
Call Trace:    [<c0107022>] [<c0105000>]

>>EIP; c0106f8c <default_idle+2c/50>   <=====
Trace; c0107022 <cpu_idle+52/70>
Trace; c0105000 <_stext+0/0>

Hmm... It appears both CPUs are idling after the NMI, so maybe something
was just holding the fork lock for too long.  I'll post this anyway,
though, in case I'm missing something.

I also have a complete sysrq-T dump, but it covers over 500 processes,
so I have posted the entire serial capture log, along with a few other
things, here:

	http://blue.netnation.com/sim/ref/2.4.24_stuck_cpu/

Additional information available upon request.

Simon-

* Re: 2.4.24 SMP lockups
  2004-01-12 12:18       ` Marcelo Tosatti
@ 2004-01-12 12:43         ` Thomas Zehetbauer
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Zehetbauer @ 2004-01-12 12:43 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 302 bytes --]

Is there any solution yet to get kernel oops reports / sysrq output when
running X?

Tom

-- 
  T h o m a s   Z e h e t b a u e r   ( TZ251 )
  PGP encrypted mail preferred - KeyID 96FFCB89
       mail pgp-key-request@hostmaster.org

If there is a god, you are an authorized representative.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 481 bytes --]

* Re: 2.4.24 SMP lockups
  2004-01-11  4:12     ` Rik van Riel
  2004-01-11 13:16       ` Marcelo Tosatti
@ 2004-01-12 12:18       ` Marcelo Tosatti
  2004-01-12 12:43         ` Thomas Zehetbauer
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2004-01-12 12:18 UTC
  To: Rik van Riel
  Cc: Andrew Morton, Marcelo Tosatti, sim, linux-kernel, Arkadiusz Miskiewicz


On Sat, 10 Jan 2004, Rik van Riel wrote:

> On Sat, 10 Jan 2004, Andrew Morton wrote:
>
> > We don't have an each-CPU backtrace facility - it could be handy.
> > There's one in the low-latency patch for some reason.
>
> There's one in the RHEL3 tree, too.
>
> Marcelo, do you want me to rediff it and send it to you ?

Yep, that will be helpful.

Arkadiusz, sysrq-{t,q} from your case too would be nice to have.


* Re: 2.4.24 SMP lockups
  2004-01-11  4:12     ` Rik van Riel
@ 2004-01-11 13:16       ` Marcelo Tosatti
  2004-01-12 12:18       ` Marcelo Tosatti
  1 sibling, 0 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2004-01-11 13:16 UTC
  To: Rik van Riel; +Cc: Andrew Morton, Marcelo Tosatti, sim, linux-kernel



On Sat, 10 Jan 2004, Rik van Riel wrote:

> On Sat, 10 Jan 2004, Andrew Morton wrote:
>
> > We don't have an each-CPU backtrace facility - it could be handy.
> > There's one in the low-latency patch for some reason.
>
> There's one in the RHEL3 tree, too.
>
> Marcelo, do you want me to rediff it and send it to you ?

Yes please.

* Re: 2.4.24 SMP lockups
  2004-01-11  8:55     ` Simon Kirby
@ 2004-01-11  9:30       ` Willy Tarreau
  0 siblings, 0 replies; 19+ messages in thread
From: Willy Tarreau @ 2004-01-11  9:30 UTC
  To: Simon Kirby; +Cc: Andrew Morton, Marcelo Tosatti, linux-kernel

On Sun, Jan 11, 2004 at 12:55:06AM -0800, Simon Kirby wrote:
> On Sat, Jan 10, 2004 at 02:40:49PM -0800, Andrew Morton wrote:
> 
> > Presumably it's spinning on the lock with interrupts enabled.  Make sure
> > the `NMI' counters in /proc/interrupts are incrementing for all CPUs.
> 
> Actually, on one of the boxes it doesn't seem to be working at all:
> 
> activating NMI Watchdog ... done.
> testing NMI watchdog ... CPU#0: NMI appears to be stuck!  

Could you try "nmi_watchdog=2"? It is the only setting that works on
my ASUS A7M266-D (MPX + dual XP 1800+).

Willy


* Re: 2.4.24 SMP lockups
  2004-01-10 22:40   ` Andrew Morton
  2004-01-11  4:12     ` Rik van Riel
@ 2004-01-11  8:55     ` Simon Kirby
  2004-01-11  9:30       ` Willy Tarreau
  1 sibling, 1 reply; 19+ messages in thread
From: Simon Kirby @ 2004-01-11  8:55 UTC
  To: Andrew Morton; +Cc: Marcelo Tosatti, linux-kernel

On Sat, Jan 10, 2004 at 02:40:49PM -0800, Andrew Morton wrote:

> Presumably it's spinning on the lock with interrupts enabled.  Make sure
> the `NMI' counters in /proc/interrupts are incrementing for all CPUs.

Actually, on one of the boxes it doesn't seem to be working at all:

activating NMI Watchdog ... done.
testing NMI watchdog ... CPU#0: NMI appears to be stuck!  

This is on a Tyan Dual AMD MPX board with two MP 2000+ CPUs. 
/proc/interrupts shows:

           CPU0       CPU1       
  0:    4897433    4904751    IO-APIC-edge  timer
  1:          1          1    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
 16:     699524     700761   IO-APIC-level  dpti0
 19:   12480119   12480207   IO-APIC-level  eth0
NMI:          0          0 
LOC:    9801455    9801319 
ERR:          0
MIS:         13
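
Andrew's check can be scripted by taking two /proc/interrupts snapshots a few seconds apart and comparing the `NMI:` row. The sketch below assumes the 2.4 layout shown above: a `CPUn` header row, then one row per interrupt source. The function names and the `min_delta` threshold are illustrative assumptions and are not taken from the kernel.

```python
def nmi_counts(proc_interrupts: str) -> list[int]:
    """Return the per-CPU counts from the 'NMI:' row of a /proc/interrupts dump."""
    lines = proc_interrupts.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "NMI:":
            return [int(f) for f in fields[1:1 + ncpus]]
    raise ValueError("no NMI: row in /proc/interrupts text")


def stalled_cpus(before: str, after: str, min_delta: int = 1) -> list[int]:
    """CPU indices whose NMI count rose by less than min_delta between dumps."""
    old, new = nmi_counts(before), nmi_counts(after)
    return [cpu for cpu, (a, b) in enumerate(zip(old, new)) if b - a < min_delta]
```

On a live box, read /proc/interrupts, sleep a few seconds, read it again and pass both strings. Given the snapshot above twice, stalled_cpus() reports both CPUs, which is consistent with the boot-time "NMI appears to be stuck!" message on this board.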

I'll try reenabling it on the other (Intel) boxes where I think it
actually does work, and see if anything results.

> sysrq-T would be best.

I'll do the serial console dance next time and get some sysrq-T output.

Simon-

* Re: 2.4.24 SMP lockups
  2004-01-10 22:40   ` Andrew Morton
@ 2004-01-11  4:12     ` Rik van Riel
  2004-01-11 13:16       ` Marcelo Tosatti
  2004-01-12 12:18       ` Marcelo Tosatti
  2004-01-11  8:55     ` Simon Kirby
  1 sibling, 2 replies; 19+ messages in thread
From: Rik van Riel @ 2004-01-11  4:12 UTC
  To: Andrew Morton; +Cc: Marcelo Tosatti, sim, linux-kernel

On Sat, 10 Jan 2004, Andrew Morton wrote:

> We don't have an each-CPU backtrace facility - it could be handy.  
> There's one in the low-latency patch for some reason.

There's one in the RHEL3 tree, too.

Marcelo, do you want me to rediff it and send it to you ?

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: 2.4.24 SMP lockups
       [not found] ` <Pine.LNX.4.58L.0401101719400.1310@logos.cnet>
@ 2004-01-10 22:40   ` Andrew Morton
  2004-01-11  4:12     ` Rik van Riel
  2004-01-11  8:55     ` Simon Kirby
  2004-01-14 17:07   ` Simon Kirby
  1 sibling, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2004-01-10 22:40 UTC
  To: Marcelo Tosatti; +Cc: sim, linux-kernel

Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>
> 
> 
> On Fri, 9 Jan 2004, Simon Kirby wrote:
> 
> > 'lo all,
> 
> Hi Simon,
> 
> > We've had about 6 cases of this now, across 4 separate boxes.  Since
> > upgrading to 2.4.24, our SMP web server boxes (both Intel and AMD
> > hardware) are randomly blowing up.  This may have happened on 2.4.23 as
> > well, but they weren't really running long enough to tell.  2.4.22 was
> > fine.  GCC 3.3.3.
> >
> > These boxes are all dual CPU, and the failure case shows up suddenly with
> > no warning.  SysRq-P works, but only reports from one CPU no matter how
> > many times I try.  In normal operation, every machine distributes all
> > IRQs across both CPUs, and SysRq-P reports from both CPUs.
> >
> > Mapping the EIP reported by SysRq-P to symbols shows that the responding
> > CPU is spinning on a spinlock (so far I have seen .text.lock.fcntl,
> > .text.lock.sched, .text.lock.locks, and .text.lock.inode), which I assume
> > is being held by the other (dead) CPU.
> 
> This sounds like a deadlock. I wonder why the NMI watchdog is not
> triggering.

Presumably it's spinning on the lock with interrupts enabled.  Make sure
the `NMI' counters in /proc/interrupts are incrementing for all CPUs.


> > Even on boxes with nmi_watchdog=1, nothing is reported from the NMI
> > watchdog.
> 
> Can you share all available SysRQ-P output for the locked CPU ? SysRQ-T if
> possible, too.

sysrq-T would be best.

We don't have an each-CPU backtrace facility - it could be handy.  There's
one in the low-latency patch for some reason.



* Re: 2.4.24 SMP lockups
  2004-01-09 21:04 Simon Kirby
  2004-01-09 22:20 ` Arkadiusz Miskiewicz
@ 2004-01-10 15:51 ` Thomas Zehetbauer
       [not found] ` <Pine.LNX.4.58L.0401101719400.1310@logos.cnet>
  2 siblings, 0 replies; 19+ messages in thread
From: Thomas Zehetbauer @ 2004-01-10 15:51 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

I have also been experiencing strange lockups with 2.4.23 while playing
audio. Hardware is a dual Celeron and a Creative Labs SB Live. As I was
working in X when this happened, I did not even try SysRq. A strange side
effect: audio playback entered a seemingly infinite loop when the lockup
happened.

Tom

-- 
  T h o m a s   Z e h e t b a u e r   ( TZ251 )
  PGP encrypted mail preferred - KeyID 96FFCB89
       mail pgp-key-request@hostmaster.org

Chemists don't die, they just stop to react.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 481 bytes --]

* Re: 2.4.24 SMP lockups
  2004-01-09 21:04 Simon Kirby
@ 2004-01-09 22:20 ` Arkadiusz Miskiewicz
  2004-01-10 15:51 ` Thomas Zehetbauer
       [not found] ` <Pine.LNX.4.58L.0401101719400.1310@logos.cnet>
  2 siblings, 0 replies; 19+ messages in thread
From: Arkadiusz Miskiewicz @ 2004-01-09 22:20 UTC
  To: linux-kernel

On Friday 09 of January 2004 22:04, Simon Kirby wrote:
> 'lo all,
>
> We've had about 6 cases of this now, across 4 separate boxes. 
I had several such cases with 2.4.23 on two separate boxes. The first is a
dual PIII 1GHz Intel SRMK2 platform
(http://www.intel.com/support/motherboards/server/srmk2/) with 1.5GB RAM,
reiserfs as the filesystem, SCSI disks and an Adaptec AIC-7899P U160/m
controller.

The second was a UP PIII 500MHz machine on an Intel BX mainboard with 256MB
RAM, ext3 as the filesystem, and software RAID 5 on SCSI disks using aic7xxx
(Adaptec AIC-7892B U160/m).

The first one was locking up a few times per day (fairly heavy load), the
second one maybe once every day or two (lower load).

Both machines work _fine_ with the 2.4.21 kernel (one was also running
2.4.22 for some time and no problems occurred).

The kernels on both machines were exactly the same (just copied) but used
different modules. The kernel was compiled with gcc 2.95.4 (3+some parts
from the 2.95 branch of gcc CVS).

> These boxes are all dual CPU, and the failure case shows up suddenly with
> no warning.  SysRq-P works, but only reports from one CPU no matter how
> many times I try.  In normal operation, every machine distributes all
> IRQs across both CPUs, and SysRq-P reports from both CPUs.
Similar here, but sometimes even SysRq wasn't working (on the second machine).

> Even on boxes with nmi_watchdog=1, nothing is reported from the NMI
> watchdog.
Exactly the same here.
append=" console=tty0 console=ttyS0,9600n81 panic=60 nmi_watchdog=1"

I thought this might be due to a problem in the aic7xxx driver and updated
it on one machine to the latest available version (the one in the kernel is
very old), but that didn't help.

> Simon-

-- 
Arkadiusz Miśkiewicz    CS at FoE, Wroclaw University of Technology
arekm.pld-linux.org AM2-6BONE, 1024/3DB19BBD, arekm(at)ircnet, PLD/Linux

* 2.4.24 SMP lockups
@ 2004-01-09 21:04 Simon Kirby
  2004-01-09 22:20 ` Arkadiusz Miskiewicz
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Simon Kirby @ 2004-01-09 21:04 UTC
  To: linux-kernel

'lo all,

We've had about 6 cases of this now, across 4 separate boxes.  Since
upgrading to 2.4.24, our SMP web server boxes (both Intel and AMD
hardware) are randomly blowing up.  This may have happened on 2.4.23 as
well, but they weren't really running long enough to tell.  2.4.22 was
fine.  GCC 3.3.3.

These boxes are all dual CPU, and the failure case shows up suddenly with
no warning.  SysRq-P works, but only reports from one CPU no matter how
many times I try.  In normal operation, every machine distributes all
IRQs across both CPUs, and SysRq-P reports from both CPUs.

Mapping the EIP reported by SysRq-P to symbols shows that the responding
CPU is spinning on a spinlock (so far I have seen .text.lock.fcntl,
.text.lock.sched, .text.lock.locks, and .text.lock.inode), which I assume
is being held by the other (dead) CPU.
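
Mapping an EIP to `symbol+offset/size`, as ksymoops does, amounts to two steps. First, find the last symbol at or below the address in a sorted symbol table such as System.map. Second, measure the distance from that symbol to the next one. A minimal Python sketch follows, assuming the table is already parsed into sorted `(address, name)` pairs. `resolve()` is a hypothetical helper and is not part of ksymoops.

```python
import bisect


def resolve(addr: int, symbols: list[tuple[int, str]]) -> str:
    """Format addr as ksymoops-style 'name+offset/size' (hex, no 0x prefix).

    symbols must be sorted by address. A symbol's size is taken as the gap
    to the next symbol, so an address in the last symbol cannot be sized.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0 or i + 1 >= len(symbols):
        raise ValueError(f"{addr:#x} is outside the sized symbol range")
    start, name = symbols[i]
    size = symbols[i + 1][0] - start
    return f"{name}+{addr - start:x}/{size:x}"
```

As an example, use default_idle at 0xc0106f60 and cpu_idle at 0xc0106fd0, plus two placeholder symbols at 0xc0106fb0 and 0xc0107040. The placeholder names are hypothetical; their addresses are inferred from the sizes in the decoded SysRq output elsewhere in this thread. With that table, resolve(0xc0106f8c, ...) gives default_idle+2c/50, which matches the decoded output.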

Even on boxes with nmi_watchdog=1, nothing is reported from the NMI
watchdog.

Simon-

end of thread, other threads:[~2004-01-16  2:35 UTC | newest]

Thread overview: 19+ messages
2004-01-10 19:58 2.4.24 SMP lockups Marcelo Tosatti
2004-01-11  9:01 ` Simon Kirby
2004-01-14 16:23   ` Marcelo Tosatti
2004-01-15 14:35     ` Thomas Zehetbauer
  -- strict thread matches above, loose matches on Subject: below --
2004-01-09 21:04 Simon Kirby
2004-01-09 22:20 ` Arkadiusz Miskiewicz
2004-01-10 15:51 ` Thomas Zehetbauer
     [not found] ` <Pine.LNX.4.58L.0401101719400.1310@logos.cnet>
2004-01-10 22:40   ` Andrew Morton
2004-01-11  4:12     ` Rik van Riel
2004-01-11 13:16       ` Marcelo Tosatti
2004-01-12 12:18       ` Marcelo Tosatti
2004-01-12 12:43         ` Thomas Zehetbauer
2004-01-11  8:55     ` Simon Kirby
2004-01-11  9:30       ` Willy Tarreau
2004-01-14 17:07   ` Simon Kirby
2004-01-14 17:56     ` Marcelo Tosatti
2004-01-16  2:34       ` Philippe Troin
2004-01-14 18:28     ` David Woodhouse
2004-01-14 21:01       ` David Woodhouse
