linux-kernel.vger.kernel.org archive mirror
* 2.4.21-jam1, aic7xxx-6.2.36: solid hangs, crashes on boot
@ 2003-07-29  7:39 Ville Herva
  2003-07-30  7:13 ` 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs) Ville Herva
  0 siblings, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-07-29  7:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: gibbs, Jani Forssell

After about a year of stable operation, a server began acting up. First it
began hanging up solid during the nightly oracle backup (that had run
successfully for a year), then I got some aic7xxx-related crashes on boot.

Initially, the box ran a 2.4.20pre7 kernel with aic7xxx version 6.2.8. When
the hangs started happening, I upgraded to 2.4.21-jam1 (basically 2.4.21
vanilla + -aa patch + some minor stuff) that includes aic7xxx version 6.2.36.
It did not help.

I enabled kmsgdump and nmi watchdog, but when the box hangs, it hangs solid:
no ctrl-alt-del, no caps lock led, no alt-sysrq-b, no kmsgdump, nmi watchdog
doesn't trigger. Only the cursor on the console blinks, but no messages from
the kernel appear. (Apart from "spurious 8259A interrupt: IRQ7." that
always happens sometime after boot on this box, but way before the hang.)

After upgrading to 2.4.21-jam1, the box usually boots up fine, but twice
I've gotten the crash below. 

After a reset, the box boots up fine (fsck goes through, no corruption), but
it seems a little more prone to lock up soon after a hang-and-reset. I made it
hang three times in a row by just compiling a kernel, but then on the fourth
boot, no amount of IO abuse got it on its knees -- until after a week it hung
during the nightly backup.

Any ideas on how to debug this kind of hang? Does it sound kernel/driver
or hw related? Are the two crashes related to the hang? Is the hang related
to aic7xxx?

One more detail: with the 2.4.20pre7/aic7xxx-6.2.8 kernel, I got "Panic:
 HOST_MSG_LOOP with invalid SCB 0" crashes every now and then. Justin Gibbs
said: "it looks like memory mapped I/O simply does not work reliably on this
board", and recommended forcing programmed I/O (by undefining MMAPIO from
aic7xxx_osm_pci.c). That seemed to cure the problems -- until now. 

linux-2.2.18pre18 + aic7xxx-5.1.31 was rock stable on this box.


-- v --

v@iki.fi


Hardware:
---------------------------------------------------------------------------
Intel 815EEA2LU (i815 Chipset)
Celeron 1.3GHz (Tualatin)
Adaptec AHA-2940 / AIC-7871
  - Disk (rootfs) SEAGATE  Model: ST19171W Rev: 0024
  - Tape Drive    HP       Model: C1537A Rev: L708
30GB IDE disk (scratch)
---------------------------------------------------------------------------



Dump of crash on boot:
---------------------------------------------------------------------------
<4> 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>Pending list: 
<4>  4 SCB_CONTROL[0x74] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  0 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>Kernel Free SCB list: 8 7 6 9 1 2 5 
<4>DevQ(0:0:0): 0 waiting
<4>DevQ(0:2:0): 0 waiting
<4>
<4><<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
<4>scsi0:0:0:0: Device is active, asserting ATN
<4>Recovery code sleeping
<4>Recovery code awake
<4>Timer Expired
<4>aic7xxx_abort returns 0x2003
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x28 0x0 0x0 0xbe 0x12 0x96 0x0 0x0 0x18 0x0
<4>scsi0:0:0:0: Command not found
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x0 0x0 0x0 0x0 0x0 0x0
<4>scsi0: At time of recovery, card was not paused
<4>>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
<4>scsi0: Dumping Card State in Command phase, at SEQADDR 0x36
<4>Card was paused
<4>ACCUM = 0x80, SINDEX = 0xac, DINDEX = 0xc0, ARG_2 = 0x0
<4>HCNT = 0x0 SCBPTR = 0x0
<4>SCSISIGI[0x96] ERROR[0x0] SCSIBUSL[0x80] LASTPHASE[0x80] 
<4>SCSISEQ[0x12] SBLKCTL[0x2] SCSIRATE[0x0] SEQCTL[0x10] 
<4>SEQ_FLAGS[0x0] SSTAT0[0x5] SSTAT1[0x3] SSTAT2[0x0] 
<4>SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac] SXFRCTL0[0x88] 
<4>DFCNTRL[0x6] DFSTATUS[0x48] 
<4>STACK: 0x0 0x166 0x196 0x35
<4>SCB count = 10
<4>Kernel NEXTQSCB = 8
<4>Card NEXTQSCB = 4
<4>QINFIFO entries: 4 3 
<4>Waiting Queue entries: 
<4>Disconnected Queue entries: 1:0 
<4>QOUTFIFO entries: 
<4>Sequencer Free SCB List: 3 2 6 4 7 5 8 9 10 11 12 13 14 15 
<4>Sequencer SCB Info: 
<4>  0 SCB_CONTROL[0x0] SCB_SCSIID[0x0] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  1 SCB_CONTROL[0x64] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  4 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  5 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  7 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>Pending list: 
<4>  3 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  4 SCB_CONTROL[0x74] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  0 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>Kernel Free SCB list: 7 6 9 1 2 5 
<4>DevQ(0:0:0): 0 waiting
<4>DevQ(0:2:0): 0 waiting
<4>
<4><<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
<4>scsi0:0:0:0: Cmd aborted from QINFIFO
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x2a 0x0 0x0 0x0 0x0 0x3f 0x0 0x0 0x10 0x0
<4>scsi0:0:0:0: Command not found
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x0 0x0 0x0 0x0 0x0 0x0
<4>scsi0: At time of recovery, card was not paused
<4>>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
<4>scsi0: Dumping Card State in Command phase, at SEQADDR 0x1a3
<4>Card was paused
<4>ACCUM = 0x80, SINDEX = 0xa2, DINDEX = 0xc0, ARG_2 = 0x0
<4>HCNT = 0x0 SCBPTR = 0x0
<4>SCSISIGI[0x96] ERROR[0x0] SCSIBUSL[0x0] LASTPHASE[0x80] 
<4>SCSISEQ[0x12] SBLKCTL[0x2] SCSIRATE[0x0] SEQCTL[0x10] 
<4>SEQ_FLAGS[0x0] SSTAT0[0x5] SSTAT1[0x3] SSTAT2[0x0] 
<4>SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac] SXFRCTL0[0x80] 
<4>DFCNTRL[0x34] DFSTATUS[0x68] 
<4>STACK: 0xaf 0x0 0x166 0x196
<4>SCB count = 10
<4>Kernel NEXTQSCB = 3
<4>Card NEXTQSCB = 8
<4>QINFIFO entries: 8 4 
<4>Waiting Queue entries: 
<4>Disconnected Queue entries: 1:0 
<4>QOUTFIFO entries: 
<4>Sequencer Free SCB List: 3 2 6 4 7 5 8 9 10 11 12 13 14 15 
<4>Sequencer SCB Info: 
<4>  0 SCB_CONTROL[0x0] SCB_SCSIID[0x0] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  1 SCB_CONTROL[0x64] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  4 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  5 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  7 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>Pending list: 
<4>  4 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  8 SCB_CONTROL[0x74] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  0 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>Kernel Free SCB list: 7 6 9 1 2 5 
<4>DevQ(0:0:0): 0 waiting
<4>DevQ(0:2:0): 0 waiting
<4>
<4><<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
<4>scsi0:0:0:0: Cmd aborted from QINFIFO
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x2a 0x0 0x0 0x0 0x4 0x2f 0x0 0x0 0x10 0x0
<4>scsi0:0:0:0: Command not found
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x0 0x0 0x0 0x0 0x0 0x0
<4>scsi0: At time of recovery, card was not paused
<4>>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
<4>scsi0: Dumping Card State in Command phase, at SEQADDR 0x1a4
<4>Card was paused
<4>ACCUM = 0x80, SINDEX = 0xa9, DINDEX = 0xc0, ARG_2 = 0x0
<4>HCNT = 0x0 SCBPTR = 0x0
<4>SCSISIGI[0x96] ERROR[0x0] SCSIBUSL[0x0] LASTPHASE[0x80] 
<4>SCSISEQ[0x12] SBLKCTL[0x2] SCSIRATE[0x0] SEQCTL[0x10] 
<4>SEQ_FLAGS[0x0] SSTAT0[0x5] SSTAT1[0x3] SSTAT2[0x0] 
<4>SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac] SXFRCTL0[0x80] 
<4>DFCNTRL[0x34] DFSTATUS[0x48] 
<4>STACK: 0xb0 0x0 0x166 0x196
<4>SCB count = 10
<4>Kernel NEXTQSCB = 4
<4>Card NEXTQSCB = 3
<4>QINFIFO entries: 3 8 
<4>Waiting Queue entries: 
<4>Disconnected Queue entries: 1:0 
<4>QOUTFIFO entries: 
<4>Sequencer Free SCB List: 3 2 6 4 7 5 8 9 10 11 12 13 14 15 
<4>Sequencer SCB Info: 
<4>  0 SCB_CONTROL[0x0] SCB_SCSIID[0x0] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  1 SCB_CONTROL[0x64] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  4 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  5 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  7 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>Pending list: 
<4>  8 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  3 SCB_CONTROL[0x74] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  0 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>Kernel Free SCB list: 7 6 9 1 2 5 
<4>DevQ(0:0:0): 0 waiting
<4>DevQ(0:2:0): 0 waiting
<4>
<4><<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
<4>scsi0:0:0:0: Cmd aborted from QINFIFO
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x2a 0x0 0x0 0x4 0x0 0x5f 0x0 0x0 0x10 0x0
<4>scsi0:0:0:0: Command not found
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x0 0x0 0x0 0x0 0x0 0x0
<4>scsi0: At time of recovery, card was not paused
<4>>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
<4>scsi0: Dumping Card State in Command phase, at SEQADDR 0x16f
<4>Card was paused
<4>ACCUM = 0x80, SINDEX = 0xac, DINDEX = 0xc0, ARG_2 = 0x0
<4>HCNT = 0x0 SCBPTR = 0x0
<4>SCSISIGI[0x96] ERROR[0x0] SCSIBUSL[0x80] LASTPHASE[0x80] 
<4>SCSISEQ[0x12] SBLKCTL[0x2] SCSIRATE[0x0] SEQCTL[0x10] 
<4>SEQ_FLAGS[0x0] SSTAT0[0x5] SSTAT1[0x3] SSTAT2[0x0] 
<4>SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac] SXFRCTL0[0x88] 
<4>DFCNTRL[0x6] DFSTATUS[0x48] 
<4>STACK: 0x35 0x0 0x166 0x196
<4>SCB count = 10
<4>Kernel NEXTQSCB = 8
<4>Card NEXTQSCB = 4
<4>QINFIFO entries: 4 3 
<4>Waiting Queue entries: 
<4>Disconnected Queue entries: 1:0 
<4>QOUTFIFO entries: 
<4>Sequencer Free SCB List: 3 2 6 4 7 5 8 9 10 11 12 13 14 15 
<4>Sequencer SCB Info: 
<4>  0 SCB_CONTROL[0x0] SCB_SCSIID[0x0] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  1 SCB_CONTROL[0x64] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  4 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  5 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  7 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>Pending list: 
<4>  3 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  4 SCB_CONTROL[0x74] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  0 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>Kernel Free SCB list: 7 6 9 1 2 5 
<4>DevQ(0:0:0): 0 waiting
<4>DevQ(0:2:0): 0 waiting
<4>
<4><<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
<4>scsi0:0:0:0: Cmd aborted from QINFIFO
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x2a 0x0 0x0 0x4 0x3 0xcf 0x0 0x0 0x8 0x0
<4>scsi0:0:0:0: Command not found
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue an ABORT message
<4>CDB: 0x0 0x0 0x0 0x0 0x0 0x0
<4>scsi0: At time of recovery, card was not paused
<4>>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
<4>scsi0: Dumping Card State in Command phase, at SEQADDR 0xa5
<4>Card was paused
<4>ACCUM = 0x80, SINDEX = 0xac, DINDEX = 0xc0, ARG_2 = 0x0
<4>HCNT = 0x0 SCBPTR = 0x0
<4>SCSISIGI[0x96] ERROR[0x0] SCSIBUSL[0x80] LASTPHASE[0x80] 
<4>SCSISEQ[0x12] SBLKCTL[0x2] SCSIRATE[0x0] SEQCTL[0x10] 
<4>SEQ_FLAGS[0x0] SSTAT0[0x5] SSTAT1[0x3] SSTAT2[0x0] 
<4>SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac] SXFRCTL0[0x88] 
<4>DFCNTRL[0x6] DFSTATUS[0x48] 
<4>STACK: 0x0 0x166 0x196 0x35
<4>SCB count = 10
<4>Kernel NEXTQSCB = 3
<4>Card NEXTQSCB = 8
<4>QINFIFO entries: 8 4 
<4>Waiting Queue entries: 
<4>Disconnected Queue entries: 1:0 
<4>QOUTFIFO entries: 
<4>Sequencer Free SCB List: 3 2 6 4 7 5 8 9 10 11 12 13 14 15 
<4>Sequencer SCB Info: 
<4>  0 SCB_CONTROL[0x0] SCB_SCSIID[0x0] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  1 SCB_CONTROL[0x64] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  4 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  5 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  7 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>Pending list: 
<4>  4 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  8 SCB_CONTROL[0x74] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>  0 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>Kernel Free SCB list: 7 6 9 1 2 5 
<4>DevQ(0:0:0): 0 waiting
<4>DevQ(0:2:0): 0 waiting
<4>
<4><<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
<4>scsi0:0:0:0: Cmd aborted from QINFIFO
<4>aic7xxx_abort returns 0x2002
<4>scsi0:0:0:0: Attempting to queue a TARGET RESET message
<4>CDB: 0x28 0x0 0x0 0xbe 0x12 0x5e 0x0 0x0 0x20 0x0
<4>aic7xxx_dev_reset returns 0x2003
<4>Recovery SCB completes
<4>Recovery SCB completes
<4>scsi0:A:0:0: ahc_intr - referenced scb not valid during seqint 0x71 scb(0)
<4>>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
<4>scsi0: Dumping Card State in Message-in phase, at SEQADDR 0x1bc
<4>Card was paused
<4>ACCUM = 0xc0, SINDEX = 0x71, DINDEX = 0x8c, ARG_2 = 0x0
<4>HCNT = 0x0 SCBPTR = 0x0
<4>SCSISIGI[0xe6] ERROR[0x0] SCSIBUSL[0x0] LASTPHASE[0xe0] 
<4>SCSISEQ[0x12] SBLKCTL[0x2] SCSIRATE[0x42] SEQCTL[0x10] 
<4>SEQ_FLAGS[0x40] SSTAT0[0x2] SSTAT1[0x3] SSTAT2[0x10] 
<4>SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac] SXFRCTL0[0x88] 
<4>DFCNTRL[0x0] DFSTATUS[0x29] 
<4>STACK: 0xff 0x0 0x166 0x18e
<4>SCB count = 10
<4>Kernel NEXTQSCB = 0
<4>Card NEXTQSCB = 193
<4>QINFIFO entries: 
<4>Waiting Queue entries: 
<4>Disconnected Queue entries: 
<4>QOUTFIFO entries: 
<4>Sequencer Free SCB List: 1 3 2 6 4 7 5 8 9 10 11 12 13 14 15 
<4>Sequencer SCB Info: 
<4>  0 SCB_CONTROL[0x80] SCB_SCSIID[0x0] SCB_LUN[0x0] SCB_TAG[0x0] 
<4>  1 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  4 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  5 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  7 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 
<4>  8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4> 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
<4>Pending list: 
<4>  8 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 
<4>Kernel Free SCB list: 3 4 7 6 9 1 2 5 
<4>DevQ(0:0:0): 0 waiting
<4>DevQ(0:2:0): 0 waiting
<4>
<4><<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
<0>Kernel panic: for safety
<0>In interrupt handler - not syncing
<4> <0>Dumping messages in 0 seconds : last chance for Alt-SysRq...
---------------------------------------------------------------------------
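
The "CDB:" lines in the dump above show which commands the driver was trying
to abort. The following is a minimal sketch, not part of the original report,
that decodes them using the standard SCSI 6- and 10-byte command layouts. The
function name decode_cdb and the small opcode table are illustrative
assumptions; only the three opcodes that appear in this log are listed.

```python
# Sketch: decode "CDB: ..." lines from the aic7xxx abort messages.
# Assumes the standard 10-byte layout for READ(10)/WRITE(10):
# byte 0 opcode, bytes 2-5 LBA (big-endian), bytes 7-8 transfer length.

OPCODES = {
    0x00: "TEST UNIT READY",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
}


def decode_cdb(line):
    """Parse one 'CDB: 0x28 0x0 ...' log line into a dict."""
    text = line.split("CDB:", 1)[1]
    cdb = [int(tok, 16) for tok in text.split()]
    info = {"opcode": cdb[0], "name": OPCODES.get(cdb[0], "unknown")}
    if len(cdb) == 10 and cdb[0] in (0x28, 0x2A):
        info["lba"] = int.from_bytes(bytes(cdb[2:6]), "big")
        info["blocks"] = int.from_bytes(bytes(cdb[7:9]), "big")
    return info
```

For example, the first line "CDB: 0x28 0x0 0x0 0xbe 0x12 0x96 0x0 0x0 0x18 0x0"
decodes as a READ(10) of 24 blocks starting at LBA 12456598, and the all-zero
6-byte CDBs are TEST UNIT READY commands.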




/proc/scsi/aic7xxx>cat 0
---------------------------------------------------------------------------
Adaptec AIC7xxx driver version: 6.2.36
Adaptec 2940 SCSI adapter
aic7870: Wide Channel A, SCSI Id=7, 16/253 SCBs
Allocated SCBs: 10, SG List Length: 102

Serial EEPROM:
0x0238 0x0218 0x0238 0x0238 0x0238 0x0238 0x0238 0x0238 
0x0238 0x0238 0x0238 0x0238 0x0238 0x0238 0x0238 0x0238 
0x0096 0x005c 0x2807 0xff10 0xffff 0xffff 0xffff 0xffff 
0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0x00ff 0x4c5e 

Target 0 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
        Goal: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
        Curr: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
        Channel A Target 0 Lun 0 Settings
                Commands Queued 14441
                Commands Active 0
                Command Openings 8
                Max Tagged Openings 8
                Device Queue Frozen Count 0
Target 1 Negotiation Settings
        User: 10.000MB/s transfers (10.000MHz, offset 127)
Target 2 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
        Goal: 10.000MB/s transfers (10.000MHz, offset 15)
        Curr: 10.000MB/s transfers (10.000MHz, offset 15)
        Channel A Target 2 Lun 0 Settings
                Commands Queued 1
                Commands Active 0
                Command Openings 1
                Max Tagged Openings 0
                Device Queue Frozen Count 0
Target 3 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 4 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 5 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 6 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 7 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 8 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 9 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 10 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 11 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 12 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 13 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 14 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
Target 15 Negotiation Settings
        User: 20.000MB/s transfers (10.000MHz, offset 127, 16bit)
---------------------------------------------------------------------------
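
As a sanity check on the negotiation settings above, the reported MB/s figure
should equal the synchronous clock rate times the bus width in bytes (a line
with no width is assumed here to mean a narrow, 8-bit transfer). This is a
rough sketch under that assumption; the regular expression and the function
name check_rate are hypothetical and only match the line format shown in this
output.

```python
import re

# Matches e.g. "Curr: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)".
RATE_RE = re.compile(
    r"(?P<mbs>[\d.]+)MB/s transfers "
    r"\((?P<mhz>[\d.]+)MHz, offset (?P<offset>\d+)(?:, (?P<bits>\d+)bit)?\)"
)


def check_rate(line):
    """Compare the reported MB/s with clock rate * bus width in bytes."""
    m = RATE_RE.search(line)
    if m is None:
        return None
    width_bits = int(m.group("bits") or 8)  # assumption: no width means 8-bit
    expected = float(m.group("mhz")) * (width_bits // 8)
    return {
        "reported": float(m.group("mbs")),
        "expected": expected,
        "offset": int(m.group("offset")),
        "width_bits": width_bits,
    }
```

Applied to the "Curr:" lines, target 0 (the disk) works out to 10 MHz x 2 bytes
= 20 MB/s and target 2 (the tape drive) to 10 MHz x 1 byte = 10 MB/s, matching
what the driver reports.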


* 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs)
  2003-07-29  7:39 2.4.21-jam1, aic7xxx-6.2.36: solid hangs, crashes on boot Ville Herva
@ 2003-07-30  7:13 ` Ville Herva
  2003-07-30 14:50   ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-07-30  7:13 UTC (permalink / raw)
  To: linux-kernel, gibbs

On Tue, Jul 29, 2003 at 10:39:48AM +0300, you [Ville Herva] wrote:
> After about a year of stable operation, a server began acting up. First it
> began hanging up solid during the nightly oracle backup (that had run
> successfully for a year), then I got some aic7xxx-related crashes on boot.
> 
> Initially, the box ran a 2.4.20pre7 kernel with aic7xxx version 6.2.8. When
> the hangs started happening, I upgraded to 2.4.21-jam1 (basically 2.4.21
> vanilla + -aa patch + some minor stuff) that includes aic7xxx version 6.2.36.
> It did not help.
> 
> I enabled kmsgdump and nmi watchdog, but when the box hangs, it hangs solid:
> no ctrl-alt-del, no caps lock led, no alt-sysrq-b, no kmsgdump, nmi watchdog
> doesn't trigger. Only the cursor on the console blinks, but no messages from
> the kernel appear. (Apart from "spurious 8259A interrupt: IRQ7." that
> always happens sometime after boot on this box, but way before the hang.)

Herbert Pötzl indicated that he'd had similar lockups with fairly similar hw
up until 2.4.22pre6. He suggested I should try 2.4.22pre8.

2.4.22pre8 locked up the same way in about 10 hours.

> Any ideas on how to debug this kind of hang?

The question still stands; how do I debug this? 

> Does it sound kernel/driver or hw related? Are the two crashes related to
> the hang? Is the hang related to aic7xxx?

Any ideas?



-- v --

v@iki.fi


* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs)
  2003-07-30  7:13 ` 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs) Ville Herva
@ 2003-07-30 14:50   ` Marcelo Tosatti
  2003-07-30 18:10     ` Ville Herva
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2003-07-30 14:50 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel, gibbs



On Wed, 30 Jul 2003, Ville Herva wrote:

> On Tue, Jul 29, 2003 at 10:39:48AM +0300, you [Ville Herva] wrote:
> > After about a year of stable operation, a server began acting up. First it
> > began hanging up solid during the nightly oracle backup (that had run
> > successfully for a year), then I got some aic7xxx-related crashes on boot.
> >
> > Initially, the box ran a 2.4.20pre7 kernel with aic7xxx version 6.2.8. When
> > the hangs started happening, I upgraded to 2.4.21-jam1 (basically 2.4.21
> > vanilla + -aa patch + some minor stuff) that includes aic7xxx version 6.2.36.
> > It did not help.
> >
> > I enabled kmsgdump and nmi watchdog, but when the box hangs, it hangs solid:
> > no ctrl-alt-del, no caps lock led, no alt-sysrq-b, no kmsgdump, nmi watchdog
> > doesn't trigger. Only the cursor on the console blinks, but no messages from
> > the kernel appear. (Apart from "spurious 8259A interrupt: IRQ7." that
> > always happens sometime after boot on this box, but way before the hang.)
>
> Herbert Pötzl indicated that he'd had similar lockups with fairly similar hw
> up until 2.4.22pre6. He suggested I should try 2.4.22pre8.
>
> 2.4.22pre8 locked up the same way in about 10 hours.
>
> > Any ideas on how to debug this kind of hang?
>
> The question still stands; how do I debug this?
>
> > Does it sound kernel/driver or hw related? Are the two crashes related to
> > the hang? Is the hang related to aic7xxx?
>
> Any ideas?

Ville,

Mind trying 2.4.22-pre8 without MMAPIO defined in the SCSI driver?

Justin, is this problem known to other boards or.. ?


* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs)
  2003-07-30 14:50   ` Marcelo Tosatti
@ 2003-07-30 18:10     ` Ville Herva
  2003-08-08 12:55       ` Ville Herva
  2003-08-27  6:43       ` 2.4.22pre8 hangs too (Re: 2.4.21-jam1 " Ville Herva
  0 siblings, 2 replies; 22+ messages in thread
From: Ville Herva @ 2003-07-30 18:10 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, gibbs

On Wed, Jul 30, 2003 at 11:50:50AM -0300, you [Marcelo Tosatti] wrote:
> 
> > Any ideas?
> 
> Ville,
> 
> Mind trying 2.4.22-pre8 without MMAPIO defined in the SCSI driver?

2.4.20pre7 (aic7xxx 6.2.8) that I initially saw the lockups with was
compiled with MMAPIO undefined. 2.4.21-jam1 (aic7xxx 6.2.36) and 2.4.22pre8
(aic7xxx 6.2.36) had it defined (the default). All three locked up
the same way. Hence, I think it's unlikely MMAPIO is the culprit.

However, I just realized that all of those kernels were compiled with a fairly
dubious gcc, version 2.96-85. I just compiled an otherwise identically
configured 2.4.21-jam1 with gcc-3.2.1-2. It'll take some time to tell
whether this cures it. This is my main suspect now.
 
> Justin, is this problem known to other boards or.. ?

The lockups may be completely unrelated to aic7xxx and the crashes on boot
that I posted kernel logs of. I don't know.


-- v --

v@iki.fi


* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs)
  2003-07-30 18:10     ` Ville Herva
@ 2003-08-08 12:55       ` Ville Herva
  2003-08-09 20:19         ` Adrian Bunk
  2003-08-27  6:43       ` 2.4.22pre8 hangs too (Re: 2.4.21-jam1 " Ville Herva
  1 sibling, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-08-08 12:55 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel, gibbs, alan

On Wed, Jul 30, 2003 at 09:10:03PM +0300, you [Ville Herva] wrote:
> 
> However, I just realized that all of those kernels were compiled with a fairly
> dubious gcc, version 2.96-85. I just compiled an otherwise identically
> configured 2.4.21-jam1 with gcc-3.2.1-2. It'll take some time to tell
> whether this cures it. This is my main suspect now.

Ok, the kernel compiled with gcc version 3.2.1 20021207 (Red Hat Linux 8.0
3.2.1-2) has now been up for more than a week. It seems stable, but I'm not
sure yet.

Which brings me to the question: which gcc version is considered most stable
for compiling 2.4.x these days?

README says:
"Make sure you have gcc 2.95.3 available.  gcc 2.91.66 (egcs-1.1.2) may
also work but is not as safe, and *gcc 2.7.2.3 is no longer supported*"

And Documentation/Changes says:

"You may use gcc 3.0.x instead if you wish, although it may cause problems.
Later versions of gcc have not received much testing for Linux kernel
compilation, and there are almost certainly bugs (mainly, but not
exclusively, in the kernel) that will need to be fixed in order to use these
compilers."

and

"The Red Hat gcc 2.96 compiler subtree can also be used to build this tree.
You should ensure you use gcc-2.96-74 or later. gcc-2.96-54 will not build
the kernel correctly."

This seems to suggest 2.96-85 would be more stable than gcc-3.2.1-2. Is this
the case?
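
When comparing builds like this, it helps to confirm which compiler actually
produced the running kernel. The kernel banner in /proc/version embeds the
"gcc version ..." string quoted above. The following is a minimal sketch,
assuming that banner format; the function name gcc_version and the sample
banner (including its user@host part and build date) are made up for
illustration.

```python
import re

# Matches the dotted version number after "gcc version" in a kernel banner.
GCC_RE = re.compile(r"gcc version (\d+(?:\.\d+)+)")

# Illustrative banner in the /proc/version style; only the gcc string is
# taken from this thread, the rest is a placeholder.
SAMPLE_BANNER = (
    "Linux version 2.4.21-jam1 (user@host) "
    "(gcc version 3.2.1 20021207 (Red Hat Linux 8.0 3.2.1-2)) "
    "#1 Wed Jul 30 12:00:00 EEST 2003"
)


def gcc_version(banner):
    """Return the compiler version in a banner as a tuple of ints, or None."""
    m = GCC_RE.search(banner)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))
```

On a live system the banner can be read from /proc/version and passed to
gcc_version(); the result can then be logged next to each kernel under test so
that a hang is attributed to the right kernel/compiler pair.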
  
> > Justin, is this problem known to other boards or.. ?
> 
> The lockups may be completely unrelated to aic7xxx and the crashes on boot
> that I posted kernel logs of. I don't know.

I guess I'll poll Justin when/if the aic7xxx crashes reappear. The hang was
probably not related to aic7xxx. Sorry for the false accusation.



-- v --

v@iki.fi


* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs)
  2003-08-08 12:55       ` Ville Herva
@ 2003-08-09 20:19         ` Adrian Bunk
  2003-08-09 22:15           ` Ville Herva
  0 siblings, 1 reply; 22+ messages in thread
From: Adrian Bunk @ 2003-08-09 20:19 UTC (permalink / raw)
  To: Ville Herva, Marcelo Tosatti, linux-kernel, gibbs, alan

On Fri, Aug 08, 2003 at 03:55:02PM +0300, Ville Herva wrote:
>...
> Ok, the kernel compiled with gcc version 3.2.1 20021207 (Red Hat Linux 8.0
> 3.2.1-2) has now been up for more than a week. It seems stable, but I'm not
> sure yet.
> 
> Which brings me to the question: which gcc version is considered most stable
> for compiling 2.4.x these days?
>...
> This seems to suggest 2.96-85 would be more stable than gcc-3.2.1-2. Is this
> the case?
>...

2.95.3 and the (unofficial) 2.96 are the best compilers for 2.4.

In most cases 3.2.1 will give you a working kernel, but if you need
maximum stability, don't use gcc 3.x for compiling kernel 2.4.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs)
  2003-08-09 20:19         ` Adrian Bunk
@ 2003-08-09 22:15           ` Ville Herva
  0 siblings, 0 replies; 22+ messages in thread
From: Ville Herva @ 2003-08-09 22:15 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Marcelo Tosatti, linux-kernel, gibbs, alan

On Sat, Aug 09, 2003 at 10:19:51PM +0200, you [Adrian Bunk] wrote:
> On Fri, Aug 08, 2003 at 03:55:02PM +0300, Ville Herva wrote:
> > 
> > Which brings me to the question: which gcc version is considered most stable
> > for compiling 2.4.x these days?
> >...
> > This seems to suggest 2.96-85 would be more stable than gcc-3.2.1-2. Is this
> > the case?
> >...
> 
> 2.95.3 and the (unofficial) 2.96 are the best compilers for 2.4.
> 
> In most cases 3.2.1 will give you a working kernel, but if you need
> maximum stability, don't use gcc 3.x for compiling kernel 2.4.

I'm surely aiming for stability, yeah ;).

2.96-85 produces a kernel that hangs (though it's not proven it's gcc's
fault) -- the one compiled with gcc-3.2.1-2 hasn't hung yet. I guess I
should at least use the latest errata version if I go with 2.96...


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-07-30 18:10     ` Ville Herva
  2003-08-08 12:55       ` Ville Herva
@ 2003-08-27  6:43       ` Ville Herva
  2003-08-27  7:03         ` Stephan von Krawczynski
  1 sibling, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-08-27  6:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: tejun, skraw

On Wed, Jul 30, 2003 at 09:10:03PM +0300, you [Ville Herva] wrote:
> 
> However, I just realized that all of those kernels were compiled with fairly
> dubious gcc, version 2.96-85. I just compiled otherwise identically
> configured 2.4.21-jam1 with gcc-3.2.1-2. It'll take some time to tell
> whether this cures it. This is my main suspect now.

I celebrated too early.

The kernel compiled with gcc 3.2.1 20021207 (Red Hat Linux 8.0 3.2.1-2) hung
too, it just happened to take a little longer.

Short summary:

  - The hangs are solid: 
    - nothing in the log, nothing on the screen
    - no ctrl-alt-del, numlock
    - no sysrq-s, sysrq-u, sysrq-b 
    - nmi watchdog doesn't trigger
  - The hangs mostly happen when the nightly oracle backup dump is in
    progress
    - the oracle database is on an ide disk, oracle app and the dump
      destination are on an scsi disk (Adaptec 2940, SEAGATE ST19171W)
  - HW: Intel 815EEA2LU mobo, i815, Celeron Tualatin 1.3GHz. Adaptec 2940,
    9GB Seagate, HP C1537A tapedrive (not used), IBM-DTLA-305030 ide disk.
  - The aic7xxx driver has been acting up in the past: crashes on boot and
    sometimes at runtime too. I don't know if this is at all related to the
    lock ups.
  - Kernels tried: 2.4.22-pre8/gcc-2.96-85, 2.4.21-jam1/gcc-2.96-85,
    2.4.21-jam1/gcc-3.2.1-2, 2.4.20pre7 -- all hang.

Perhaps this is related to the "Race condition in 2.4 tasklet handling
(cli() broken?)" problem TeJun Huh and Stephan von Krawczynski have been
discussing?

Any ideas?


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27  6:43       ` 2.4.22pre8 hangs too (Re: 2.4.21-jam1 " Ville Herva
@ 2003-08-27  7:03         ` Stephan von Krawczynski
  2003-08-27  7:12           ` Ville Herva
  0 siblings, 1 reply; 22+ messages in thread
From: Stephan von Krawczynski @ 2003-08-27  7:03 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel, tejun

On Wed, 27 Aug 2003 09:43:02 +0300
Ville Herva <vherva@niksula.hut.fi> wrote:

> On Wed, Jul 30, 2003 at 09:10:03PM +0300, you [Ville Herva] wrote:
> [...]
>   - HW: Intel 815EEA2LU mobo, i815, Celeron Tualatin 1.3GHz. Adaptec 2940,
>     9GB Seagate, HP C1537A tapedrive (not used), IBM-DTLA-305030 ide disk.
>   - The aic7xxx driver has been acting up in the past: crashes on boot and
>     sometimes at runtime too. I don't know if this is at all related to the
>     lock ups.
>   - Kernels tried: 2.4.22-pre8/gcc-2.96-85, 2.4.21-jam1/gcc-2.96-85,
>     2.4.21-jam1/gcc-3.2.1-2, 2.4.20pre7 -- all hang.
> 
> Perhaps this is related to the "Race condition in 2.4 tasklet handling
> (cli() broken?)" problem TeJun Huh and Stephan von Krawczynski have been
> discussing?

This is no SMP box, is it? If it is not SMP, it is probably unrelated.

Regards,
Stephan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27  7:03         ` Stephan von Krawczynski
@ 2003-08-27  7:12           ` Ville Herva
  2003-08-27  7:21             ` Stephan von Krawczynski
  0 siblings, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-08-27  7:12 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel, tejun

On Wed, Aug 27, 2003 at 09:03:51AM +0200, you [Stephan von Krawczynski] wrote:
> > On Wed, Jul 30, 2003 at 09:10:03PM +0300, you [Ville Herva] wrote:
> > [...]
> >   - HW: Intel 815EEA2LU mobo, i815, Celeron Tualatin 1.3GHz. Adaptec 2940,
> >     9GB Seagate, HP C1537A tapedrive (not used), IBM-DTLA-305030 ide disk.
> >   - The aic7xxx driver has been acting up in the past: crashes on boot and
> >     sometimes at runtime too. I don't know if this is at all related to the
> >     lock ups.
> >   - Kernels tried: 2.4.22-pre8/gcc-2.96-85, 2.4.21-jam1/gcc-2.96-85,
> >     2.4.21-jam1/gcc-3.2.1-2, 2.4.20pre7 -- all hang.

Forgot to mention: all fs's are ext2.
 
> > Perhaps this is related to the "Race condition in 2.4 tasklet handling
> > (cli() broken?)" problem TeJun Huh and Stephan von Krawczynski have been
> > discussing?
> 
> This is no SMP box, is it? If it is not SMP, it is probably unrelated.

Yes, no SMP.


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27  7:12           ` Ville Herva
@ 2003-08-27  7:21             ` Stephan von Krawczynski
  2003-08-27  7:37               ` Ville Herva
  0 siblings, 1 reply; 22+ messages in thread
From: Stephan von Krawczynski @ 2003-08-27  7:21 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel, tejun

On Wed, 27 Aug 2003 10:12:59 +0300
Ville Herva <vherva@niksula.hut.fi> wrote:

> > > Perhaps this is related to the "Race condition in 2.4 tasklet handling
> > > (cli() broken?)" problem TeJun Huh and Stephan von Krawczynski have been
> > > discussing?
> > 
> > This is no SMP box, is it? If it is not SMP, it is probably unrelated.
> 
> Yes, no SMP.

Sorry, then you have to look for another explanation. 
Did you already try to exchange everything but the harddisks ?

Regards,
Stephan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27  7:21             ` Stephan von Krawczynski
@ 2003-08-27  7:37               ` Ville Herva
  2003-08-27  9:30                 ` Stephan von Krawczynski
  2003-08-28  1:13                 ` TeJun Huh
  0 siblings, 2 replies; 22+ messages in thread
From: Ville Herva @ 2003-08-27  7:37 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel, tejun

On Wed, Aug 27, 2003 at 09:21:39AM +0200, you [Stephan von Krawczynski] wrote:
> 
> Sorry, then you have to look for another explanation. 

Yep, but I don't have any reasonable suspects.

> Did you already try to exchange everything but the harddisks ?

No. Do you suspect faulty hardware?

Apart from perhaps Adaptec 2940 (Adaptecs always give me trouble), I
believe the hw is pretty solid. It had no problems with 2.2 kernels.  Based
on my experience, the i815 chipset is not that shaky (unlike the Via dung),
and I would expect the Intel motherboard to be on the better side as well.

I can't completely rule faulty hw out, though.

Exchanging hw will be quite difficult, as the hangs take as much as three
weeks to trigger (sometimes they happen within a day after reboot), the box
is a production server, and I don't have much spare hardware atm.

What I had hoped for is to be able to get some information on where it hangs.
But sysrq and nmi watchdog don't cut it...


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27  7:37               ` Ville Herva
@ 2003-08-27  9:30                 ` Stephan von Krawczynski
  2003-08-27 10:13                   ` Ville Herva
  2003-08-28  1:13                 ` TeJun Huh
  1 sibling, 1 reply; 22+ messages in thread
From: Stephan von Krawczynski @ 2003-08-27  9:30 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel, tejun

On Wed, 27 Aug 2003 10:37:58 +0300
Ville Herva <vherva@niksula.hut.fi> wrote:

> > Did you already try to exchange everything but the harddisks ?
> 
> No. Do you suspect faulty hardware?
> [...]
> What I had hoped for is to be able to get some information on where it hangs.
> But sysrq and nmi watchdog don't cut it...

Hm, did you try a serial console? On my side this was a big step forward.
If you experience complete hangs it may be something around hanging interrupts.
Did you play with apic/acpi etc. to try different interrupt handling? What does
your /proc/interrupts look like compared between 2.2 and 2.4 ?

Regards,
Stephan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27  9:30                 ` Stephan von Krawczynski
@ 2003-08-27 10:13                   ` Ville Herva
  2003-08-27 10:56                     ` Stephan von Krawczynski
  0 siblings, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-08-27 10:13 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel, tejun

On Wed, Aug 27, 2003 at 11:30:27AM +0200, you [Stephan von Krawczynski] wrote:
> 
> Hm, did you try a serial console? On my side this was a big step forward.

Do you mean in your case nothing shown on monitor (I've disabled monitor
blanking, so that is not it), sysrq key didn't work, nmi watchdog didn't
trigger but you were still able to get output from serial console? An oops?

Or, did you use kdb/kgdb in addition to serial console?

> If you experience complete hangs it may be something around hanging
> interrupts.

Probably, yes.

> Did you play with apic/acpi etc. to try different interrupt handling? 

ACPI has never been enabled. I enabled local APIC when I enabled nmi
watchdog, so I've tried it on and off.

> What does your /proc/interrupts look like compared between 2.2 and 2.4 ?

I don't have 2.2 output at hand, but the 2.4.21-jam1 output doesn't seem too
suspicious:

cat /proc/interrupts 
           CPU0       
  0:    1675428          XT-PIC  timer
  1:          3          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  4:      19625          XT-PIC  serial
  9:      25447          XT-PIC  aic7xxx
 11:      25203          XT-PIC  eth0
 12:          0          XT-PIC  PS/2 Mouse
 14:     178082          XT-PIC  ide0
NMI:      16763 
LOC:    1675326 
ERR:          0
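
For comparing output like the above between kernels (2.2 versus 2.4, say),
the per-IRQ counts can be pulled into a table. What follows is only a minimal
sketch in Python: the helper name parse_interrupts and the SAMPLE string are
hypothetical, and it assumes a single CPU column as in the output above.

```python
def parse_interrupts(text):
    """Parse single-CPU /proc/interrupts text into {label: (count, device)}."""
    counts = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips the "CPU0" header and blank lines
        label, rest = line.split(":", 1)
        fields = rest.split()
        if not fields or not fields[0].isdigit():
            continue
        count = int(fields[0])
        # fields[1] is the controller type (e.g. XT-PIC); the rest is the device.
        device = " ".join(fields[2:])
        counts[label.strip()] = (count, device)
    return counts


# Sample copied from the output above (assumption: one CPU column).
SAMPLE = """\
           CPU0
  0:    1675428          XT-PIC  timer
  1:          3          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  4:      19625          XT-PIC  serial
  9:      25447          XT-PIC  aic7xxx
 11:      25203          XT-PIC  eth0
 12:          0          XT-PIC  PS/2 Mouse
 14:     178082          XT-PIC  ide0
NMI:      16763
LOC:    1675326
ERR:          0
"""

if __name__ == "__main__":
    for label, (count, device) in parse_interrupts(SAMPLE).items():
        print(f"{label:>4} {count:>10} {device}")
```

Running the same function on output saved from two kernels and comparing the
resulting dictionaries would show whether a device moved to another IRQ line.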



-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27 10:13                   ` Ville Herva
@ 2003-08-27 10:56                     ` Stephan von Krawczynski
  2003-08-27 11:04                       ` Ville Herva
  0 siblings, 1 reply; 22+ messages in thread
From: Stephan von Krawczynski @ 2003-08-27 10:56 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel, tejun

On Wed, 27 Aug 2003 13:13:13 +0300
Ville Herva <vherva@niksula.hut.fi> wrote:

> On Wed, Aug 27, 2003 at 11:30:27AM +0200, you [Stephan von Krawczynski]
> wrote:
> > 
> > Hm, did you try a serial console? On my side this was a big step forward.
> 
> Do you mean in your case nothing shown on monitor (I've disabled monitor
> blanking, so that is not it), sysrq key didn't work, nmi watchdog didn't
> trigger but you were still able to get output from serial console? An oops?

I often have X setups, so console output gets _somewhere_ in the background.

> Or, did you use kdb/kgdb in addition to serial console?

No.

> > What does your /proc/interrupts look like compared between 2.2 and 2.4 ?
> 
> I don't have 2.2 output at hand, but the 2.4.21-jam1 output doesn't seem too
> suspicious:

You're right, it looks pretty clean and simple. Possibly the only thing I would
try is moving aic away from int 9 to int 10 or so. Int 9 sometimes interferes
with VGA int routing on broken boxes. But that is unlikely (though simple to
test).

Regards,
Stephan


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27 10:56                     ` Stephan von Krawczynski
@ 2003-08-27 11:04                       ` Ville Herva
  2003-08-27 11:30                         ` Stephan von Krawczynski
  0 siblings, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-08-27 11:04 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel, tejun

On Wed, Aug 27, 2003 at 12:56:33PM +0200, you [Stephan von Krawczynski] wrote:
> 
> > Or, did you use kdb/kgdb in addition to serial console?
> 
> No.
 
Ok. 

I might give a debugger a shot anyway when I find the time.

> You're right, it looks pretty clean and simple. Possibly the only thing I would
> try is moving aic away from int 9 to int 10 or so. Int 9 sometimes interferes
> with VGA int routing on broken boxes. But that is unlikely (though simple to
> test).

I don't think vga interferes with anything: I never run X on the box, and
even the text console remains quiescent as nothing is logged.

Better test would perhaps be to get rid of Adaptec 2940 altogether and move
the rootfs on an ide disk. But that's not exactly convenient either...


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27 11:04                       ` Ville Herva
@ 2003-08-27 11:30                         ` Stephan von Krawczynski
  2003-08-28  9:26                           ` Ingo Oeser
  0 siblings, 1 reply; 22+ messages in thread
From: Stephan von Krawczynski @ 2003-08-27 11:30 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel, tejun

On Wed, 27 Aug 2003 14:04:17 +0300
Ville Herva <vherva@niksula.hut.fi> wrote:

> > You're right, it looks pretty clean and simple. Possibly the only thing I
> > would try is moving aic away from int 9 to int 10 or so. Int 9 sometimes
> > interferes with VGA int routing on broken boxes. But that is unlikely
> > (though simple to test).
> 
> I don't think vga interferes with anything: I never run X on the box, and
> even the text console remains quiescent as nothing is logged.

The thing I ran into once was not really an intensive use of VGA and its ints
but rather some weird glitches in the boards' int logic that sometimes drove
the software drivers crazy (was network back then).

Regards,
Stephan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27  7:37               ` Ville Herva
  2003-08-27  9:30                 ` Stephan von Krawczynski
@ 2003-08-28  1:13                 ` TeJun Huh
  2003-08-28  5:40                   ` Ville Herva
  1 sibling, 1 reply; 22+ messages in thread
From: TeJun Huh @ 2003-08-28  1:13 UTC (permalink / raw)
  To: Ville Herva, Stephan von Krawczynski, linux-kernel

On Wed, Aug 27, 2003 at 10:37:58AM +0300, Ville Herva wrote:
> On Wed, Aug 27, 2003 at 09:21:39AM +0200, you [Stephan von Krawczynski] wrote:
> > 
> > Sorry, then you have to look for another explanation. 
> 
> Yep, but I don't have any reasonable suspects.
> 
> > Did you already try to exchange everything but the harddisks ?
> 
> No. Do you suspect faulty hardware?
> 
> Apart from perhaps Adaptec 2940 (Adaptecs always give me trouble), I
> believe the hw is pretty solid. It had no problems with 2.2 kernels.  Based
> on my experience, the i815 chipset is not that shaky (unlike the Via dung),
> and I would expect the Intel motherboard to be on the better side as well.
> 
> I can't completely rule faulty hw out, though.
> 
> Exchanging hw will be quite difficult, as the hangs take as much as three
> weeks to trigger (sometimes they happen within a day after reboot), the box
> is a production server, and I don't have much spare hardware atm.
> 
> What I had hoped for is to be able to get some information on where it hangs.
> But sysrq and nmi watchdog don't cut it...
> 

 Hello Ville.  Hello Stephan. :-)

 Your problem sounds very similar to the problem we were suffering from.
The problem was a spinlock deadlock inside drivers/char/random.c, which
is used by tcp to generate random initial sequence numbers.  The bug
fix was checked into the 2.4 tree on 28th July, after the release of pre8
on 14th July.

ChangeSet@1.1019.1.7, 2003-07-24 14:21:29-03:00, marcelo@freak.distro.conectiva
    Changed EXTRAVERSION to -pre8
  TAG: v2.4.22-pre8

ChangeSet@1.1019.3.10, 2003-07-28 17:25:49-07:00, olof@austin.ibm.com
  [RANDOM]: Fix SMP deadlock in __check_and_rekey().

 This problem can happen on a UP machine if the kernel is compiled with
CONFIG_SMP.  Because the offending routine is called only every five
minutes and it should receive a SYN packet while it's connecting, it
occurs rarely, but it happens when it happens.
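
 The difference between an SMP and a UP build can be illustrated with a toy
model. This is a sketch in Python, not kernel code and not the actual
random.c logic: the names SketchSpinLock and nested_acquire_succeeds are
hypothetical, and the scenario (one execution path trying to take a
non-recursive lock it already holds) is only assumed here as the general
shape of such a self-deadlock. It relies on the fact that in a 2.4 kernel
built without CONFIG_SMP a plain spin_lock() performs no locking operation.

```python
import threading


class SketchSpinLock:
    """Toy model of a 2.4-style spinlock (hypothetical, for illustration)."""

    def __init__(self, config_smp):
        self.config_smp = config_smp
        self._lock = threading.Lock()  # non-recursive, like a spinlock

    def try_lock(self):
        if not self.config_smp:
            return True  # UP build: the lock operation compiles away
        return self._lock.acquire(blocking=False)

    def unlock(self):
        if self.config_smp:
            self._lock.release()


def nested_acquire_succeeds(config_smp):
    """Return True if a path already holding the lock can take it again."""
    lock = SketchSpinLock(config_smp)
    outer_ok = lock.try_lock()   # outer path takes the lock
    inner_ok = lock.try_lock()   # same CPU tries to take it a second time
    if inner_ok:
        lock.unlock()
    if outer_ok:
        lock.unlock()
    return inner_ok


if __name__ == "__main__":
    print("CONFIG_SMP build:", nested_acquire_succeeds(True))
    print("UP build:        ", nested_acquire_succeeds(False))
```

 With config_smp=True the second attempt fails, which for a real spinlock
means spinning forever; with config_smp=False it trivially succeeds.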

 Please try 2.4.22.

P.S. This bug is a real headache.  We had many servers deployed and
they all randomly locked up about every two or four weeks.  I believe
people should be warned about this one.

-- 
tejun


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-28  1:13                 ` TeJun Huh
@ 2003-08-28  5:40                   ` Ville Herva
  2003-08-28  5:57                     ` Tejun Huh
  0 siblings, 1 reply; 22+ messages in thread
From: Ville Herva @ 2003-08-28  5:40 UTC (permalink / raw)
  To: TeJun Huh; +Cc: Stephan von Krawczynski, linux-kernel

On Thu, Aug 28, 2003 at 10:13:41AM +0900, you [TeJun Huh] wrote:
> 
>  Your problem sounds very similar to the problem we were suffering.
> The problem was a spinlock deadlock inside drivers/char/random.c which
> is used by tcp to generate random initial sequence number.  The bug
> fix was checked into 2.4 tree on 28th July after the release of pre8
> at 14th July.

Uhh, I tried 2.4.22pre8 a while ago (I think it was Herbert Pötzl's
suggestion), and it locked up too. Shame that the fix didn't make it in
it...

I'll give .22-final a spin.
 
> This problem can happen on UP machine if the kernel is compiled with
> CONFIG_SMP.

This is a UP box and the kernel is _not_ compiled with CONFIG_SMP.

> Because the offending routine is called only every five
> minutes and it should receive a SYN packet while it's connecting, it
> occurs rarely, but it happens when it happens.

In my case, the lock up seems clearly related to disk io: it usually happens
during the nightly oracle backup dump, and at some point it kept happening
while compiling kernel. (It's random, I can no longer reproduce it by just
compiling a kernel.)

Do you still think it could be the same one?

>  Please try 2.4.22.
> 
> P.S. This bug is a real headache.  We had many servers deployed and
> they all randomly locked up about every two or four weeks.  I believe
> people should be warned about this one.

What's really strange is that the box kept running with 2.4.20pre7 for
almost a year without problems (with the same oracle dump job in nightly
cron), and then suddenly began acting up on the first day of my summer
vacation...


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-28  5:40                   ` Ville Herva
@ 2003-08-28  5:57                     ` Tejun Huh
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Huh @ 2003-08-28  5:57 UTC (permalink / raw)
  To: Ville Herva, TeJun Huh, Stephan von Krawczynski, linux-kernel

On Thu, Aug 28, 2003 at 08:40:00AM +0300, Ville Herva wrote:
> On Thu, Aug 28, 2003 at 10:13:41AM +0900, you [TeJun Huh] wrote:
> > 
> >  Your problem sounds very similar to the problem we were suffering.
> > The problem was a spinlock deadlock inside drivers/char/random.c which
> > is used by tcp to generate random initial sequence number.  The bug
> > fix was checked into 2.4 tree on 28th July after the release of pre8
> > at 14th July.
> 
> Uhh, I tried 2.4.22pre8 a while ago (I think it was Herbert Pötzl's
> suggestion), and it locked up too. Shame that the fix didn't make it in
> it...
> 
> I'll give .22-final a spin.
>  
> > This problem can happen on UP machine if the kernel is compiled with
> > CONFIG_SMP.
> 
> This is a UP box and the kernel is _not_ compiled with CONFIG_SMP.

 Then, it should be a different problem.  That deadlock wouldn't occur
with a UP kernel.

> > Because the offending routine is called only every five
> > minutes and it should receive a SYN packet while it's connecting, it
> > occurs rarely, but it happens when it happens.
> 
> In my case, the lock up seems clearly related to disk io: it usually happens
> during the nightly oracle backup dump, and at some point it kept happening
> while compiling kernel. (It's random, I can no longer reproduce it by just
> compiling a kernel.)
> 
> Do you still think it could be the same one?

 No, I don't think so anymore.  I think trying kdb/kgdb would be
better.

> >  Please try 2.4.22.
> > 
> > P.S. This bug is a real headache.  We had many servers deployed and
> > they all randomly locked up about every two or four weeks.  I believe
> > people should be warned about this one.
> 
> What's really strange is that the box kept running with 2.4.20pre7 for
> almost a year without problems (with the same oracle dump job in nightly
> cron), and then suddenly began acting up on the first day of my summer
> vacation...

 Good luck. :-)

-- 
tejun


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-27 11:30                         ` Stephan von Krawczynski
@ 2003-08-28  9:26                           ` Ingo Oeser
  2003-08-28 19:09                             ` Ville Herva
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Oeser @ 2003-08-28  9:26 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Ville Herva, linux-kernel, tejun

On Wed, Aug 27, 2003 at 01:30:55PM +0200, Stephan von Krawczynski wrote:
> On Wed, 27 Aug 2003 14:04:17 +0300
> Ville Herva <vherva@niksula.hut.fi> wrote:
> > I don't think vga interferes with anything: I never run X on the box, and
> > even the text console remains quiescent as nothing is logged.
> 
> The thing I ran into once was not really an intensive use of VGA and its ints
> but rather some weird glitches in the boards' int logic that sometimes drove
> the software drivers crazy (was network back then).

I have seen this too, with some DSP board.

But heavy (disk) IO and mysterious crashes sound like power problems,
don't they?

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)
  2003-08-28  9:26                           ` Ingo Oeser
@ 2003-08-28 19:09                             ` Ville Herva
  0 siblings, 0 replies; 22+ messages in thread
From: Ville Herva @ 2003-08-28 19:09 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-kernel

On Thu, Aug 28, 2003 at 11:26:30AM +0200, you [Ingo Oeser] wrote:
> 
> But heavy (disk) IO and mysterious crashes sound like power problems,
> don't they?

Hmm. It doesn't crash, it locks up solid. (Well the aic7xxx driver sometimes
crashes (spits a huge log of errors, rather), but I'm still not sure if
that's related.)

The box only has two disks, 1.3GHz Celeron (~30W), and other lighter power
consumers. Not exactly a power hungry config. I'm not sure about the power
supply - I think it's a 250W one - I'll have to check.

According to sensors, the voltages do not fluctuate much. Also, the
temperatures are moderate (34.0°C system, 41.0°C CPU).

Power problems are surely possible, but don't exactly sound like a promising
lead to me.


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2003-08-28 19:09 UTC | newest]

Thread overview: 22+ messages
2003-07-29  7:39 2.4.21-jam1, aic7xxx-6.2.36: solid hangs, crashes on boot Ville Herva
2003-07-30  7:13 ` 2.4.22pre8 hangs too (Re: 2.4.21-jam1, aic7xxx-6.2.36: solid hangs) Ville Herva
2003-07-30 14:50   ` Marcelo Tosatti
2003-07-30 18:10     ` Ville Herva
2003-08-08 12:55       ` Ville Herva
2003-08-09 20:19         ` Adrian Bunk
2003-08-09 22:15           ` Ville Herva
2003-08-27  6:43       ` 2.4.22pre8 hangs too (Re: 2.4.21-jam1 " Ville Herva
2003-08-27  7:03         ` Stephan von Krawczynski
2003-08-27  7:12           ` Ville Herva
2003-08-27  7:21             ` Stephan von Krawczynski
2003-08-27  7:37               ` Ville Herva
2003-08-27  9:30                 ` Stephan von Krawczynski
2003-08-27 10:13                   ` Ville Herva
2003-08-27 10:56                     ` Stephan von Krawczynski
2003-08-27 11:04                       ` Ville Herva
2003-08-27 11:30                         ` Stephan von Krawczynski
2003-08-28  9:26                           ` Ingo Oeser
2003-08-28 19:09                             ` Ville Herva
2003-08-28  1:13                 ` TeJun Huh
2003-08-28  5:40                   ` Ville Herva
2003-08-28  5:57                     ` Tejun Huh
