linux-kernel.vger.kernel.org archive mirror
* Locked up 2.4.10-pre11 on Tyan 815t motherboard.
@ 2001-09-19  6:51 Ben Greear
  2001-09-19  7:10 ` Ben Greear
       [not found] ` <3BA8CCF1.CA2933B3@zip.com.au>
  0 siblings, 2 replies; 14+ messages in thread
From: Ben Greear @ 2001-09-19  6:51 UTC (permalink / raw)
  To: linux-kernel


I was running a network stress test (sending lots of 64-byte packets)
on the DLINK 4-port NIC and two EEPRO-100 NICs.  This ran for 5 minutes,
and all was good (about 12Mbps of 64-byte packets).

Then, I re-started the test with 128 byte packets.  As soon as traffic
started, the whole machine locked up.  Couldn't even ping it from another
machine.  I had to hold down the power switch for about 5 seconds before
it reset (i.e. it wasn't even listening to the power-down request).

I'm using the eepro100 driver that is included in the kernel, btw.  This
used to have a lockup problem, but I thought it was fixed...

I'm going to see if I can reproduce this.  If so, can someone suggest
a way to get more/better debugging information?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19  6:51 Locked up 2.4.10-pre11 on Tyan 815t motherboard Ben Greear
@ 2001-09-19  7:10 ` Ben Greear
       [not found] ` <3BA8CCF1.CA2933B3@zip.com.au>
  1 sibling, 0 replies; 14+ messages in thread
From: Ben Greear @ 2001-09-19  7:10 UTC (permalink / raw)
  To: linux-kernel

I can reproduce this with the e100 driver as well
(using 64-byte packets, too).  I'll try an older kernel
tomorrow to see if that fixes anything...

Ben Greear wrote:
> 
> I was running a network stress test (sending lots of 64-byte packets)
> on the DLINK 4-port NIC and two EEPRO-100 NICs.  This ran for 5 minutes,
> and all was good (about 12Mbps of 64-byte packets).
> 
> Then, I re-started the test with 128 byte packets.  As soon as traffic
> started, the whole machine locked up.  Couldn't even ping it from another
> machine.  I had to hold down the power-switch for about 5 seconds before
> it reset (ie it wasn't even listening to the power-down??)
> 
> I'm using the eepro100 driver that is included in the kernel, btw.  This
> used to have a lockup problem, but I thought it was fixed...
> 
> I'm going to see if I can re-produce this.  If so, can someone suggest
> a way to get more/better debugging information??
> 
> Thanks,
> Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
       [not found]     ` <3BA8D619.9A607219@zip.com.au>
@ 2001-09-19 18:09       ` Ben Greear
  2001-09-24 15:14         ` bill davidsen
  2001-09-19 18:38       ` Bruce Harada
  1 sibling, 1 reply; 14+ messages in thread
From: Ben Greear @ 2001-09-19 18:09 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel

Andrew Morton wrote:

> nmi_watchdog will force an oops if the machine locks up
> with interrupts disabled (as I suspect mine did).  But
> it requires an SMP kernel or IO-APIC-on-UP.

I just built a 2.4.8 kernel with the APIC enabled.  It locked
hard and printed no OOPS.  I had set the boot cmd line as:
nmi_watchdog=1
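
One quick sanity check is to confirm that the option actually reached the
running kernel. A minimal Python sketch, assuming the boot command line is
readable from /proc/cmdline; the helper name nmi_watchdog_setting is made up
for illustration and only parses the string, it does not prove the watchdog
is ticking:

```python
from typing import Optional


def nmi_watchdog_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last nmi_watchdog= option, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        # Only accept tokens of the form nmi_watchdog=<value>.
        if key == "nmi_watchdog" and sep:
            value = val
    return value


if __name__ == "__main__":
    try:
        # Assumption: the running kernel exposes its command line here.
        with open("/proc/cmdline") as f:
            print("nmi_watchdog =", nmi_watchdog_setting(f.read()))
    except OSError as exc:
        print("could not read /proc/cmdline:", exc)
```

If this prints None, the option never made it onto the command line (for
example, because the boot loader entry was not updated).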

> 
> > Could bad hardware do this..and if so, any idea of what to look for??
> >
> 
> Yes, bad hardware could do it.  It'll either be a lockup
> with interrupts disabled or a double/triple fault.

When I opened the machine the first time (before I powered it up),
I noticed that the CPU fan's wires were tangled in the fan such that
it couldn't move..  I fixed that, but it could have been run before
I received the machine...  Could that cause this problem you think??

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
       [not found]     ` <3BA8D619.9A607219@zip.com.au>
  2001-09-19 18:09       ` Ben Greear
@ 2001-09-19 18:38       ` Bruce Harada
  2001-09-19 19:15         ` Alan Cox
  2001-09-19 19:46         ` Ben Greear
  1 sibling, 2 replies; 14+ messages in thread
From: Bruce Harada @ 2001-09-19 18:38 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-kernel

On Wed, 19 Sep 2001 11:09:29 -0700
Ben Greear <greearb@candelatech.com> wrote:
>
> When I opened the machine the first time (before I powered it up),
> I noticed that the CPU fan's wires were tangled in the fan such that
> it couldn't move..  I fixed that, but it could have been run before
> I received the machine...  Could that cause this problem you think??

Doubtful. Since it's an 815, I presume you're running a PIII (correct me if
I'm wrong) - newish PIIIs have reasonable overheating cutout features, and
if overheating had damaged the CPU, I'd be very surprised if it worked at
all, rather than just locking up on certain sizes of network packets.



* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19 18:38       ` Bruce Harada
@ 2001-09-19 19:15         ` Alan Cox
  2001-09-19 19:53           ` Ben Greear
  2001-09-20 19:28           ` Locked up 2.4.10-pre11 on Tyan 815t motherboard Martin Josefsson
  2001-09-19 19:46         ` Ben Greear
  1 sibling, 2 replies; 14+ messages in thread
From: Alan Cox @ 2001-09-19 19:15 UTC (permalink / raw)
  To: Bruce Harada; +Cc: Ben Greear, linux-kernel

> Doubtful. Since it's an 815, I presume you're running a PIII (correct me if
> I'm wrong) - newish PIIIs have reasonable overheating cutout features, and
> if overheating had damaged the CPU, I'd be very surprised if it worked at
> all, rather than just locking up on certain sizes of network packets.

The 815 chipsets have known (and documented) problems with out of spec
memory signals. Board vendors are supposed to have used workarounds but I
have so far sent back 2 out of the 3 A/Open i815 boards with problems where
they locked up occasionally under high load (in any OS) and also failed
memtest86 (with known good tested ram) when placed in an electrically noisy
environment.

I've seen lockups on high network load as part of that - but not packet size
dependent ones.

Alan


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19 18:38       ` Bruce Harada
  2001-09-19 19:15         ` Alan Cox
@ 2001-09-19 19:46         ` Ben Greear
  1 sibling, 0 replies; 14+ messages in thread
From: Ben Greear @ 2001-09-19 19:46 UTC (permalink / raw)
  To: Bruce Harada; +Cc: linux-kernel

Bruce Harada wrote:
> 
> On Wed, 19 Sep 2001 11:09:29 -0700
> Ben Greear <greearb@candelatech.com> wrote:
> >
> > When I opened the machine the first time (before I powered it up),
> > I noticed that the CPU fan's wires were tangled in the fan such that
> > it couldn't move..  I fixed that, but it could have been run before
> > I received the machine...  Could that cause this problem you think??
> 
> Doubtful. Since it's an 815, I presume you're running a PIII (correct me if
> I'm wrong) - newish PIIIs have reasonable overheating cutout features, and
> if overheating had damaged the CPU, I'd be very surprised if it worked at
> all, rather than just locking up on certain sizes of network packets.

Yes, it's a PIII 1 GHz.  I'm not sure how important the packet sizes
are:  I can lock it with 64 byte and 128 byte packets, with the commonality
being that the CPU is maxed out and there are a massive number of little
packets all over the place.

Also, the traffic I'm running is raw packets sent straight to the
driver, not IP....

I took out the 4-port Tulip NIC and I haven't locked it up yet, though
I'm getting really sorry performance out of the sketchy machine for
some reason or another.  Still, the problem definitely seems related
to the DLINK 4-port NIC at this point...

Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19 19:15         ` Alan Cox
@ 2001-09-19 19:53           ` Ben Greear
  2001-09-19 20:03             ` Jeff Garzik
  2001-09-19 20:05             ` Alan Cox
  2001-09-20 19:28           ` Locked up 2.4.10-pre11 on Tyan 815t motherboard Martin Josefsson
  1 sibling, 2 replies; 14+ messages in thread
From: Ben Greear @ 2001-09-19 19:53 UTC (permalink / raw)
  To: Alan Cox; +Cc: Bruce Harada, linux-kernel

Alan Cox wrote:
> 
> > Doubtful. Since it's an 815, I presume you're running a PIII (correct me if
> > I'm wrong) - newish PIIIs have reasonable overheating cutout features, and
> > if overheating had damaged the CPU, I'd be very surprised if it worked at
> > all, rather than just locking up on certain sizes of network packets.
> 
> The 815 chipsets have known (and documented) problems with out of spec
> memory signals. Board vendors are supposed to have used workarounds but I
> have so far sent back 2 out of the 3 A/Open i815 boards with problems where
> they locked up occasionally under high load (in any OS) and also failed
> memtest86 (with known good tested ram) when placed in an electrically noisy
> environment.
> 
> I've seen lockups on high network load as part of that - but not packet size
> dependent ones.

Damn..someone has to make good stable motherboards...anyone got any
suggestions for one that will fit into a 1U server, with built-in
Video and preferably a NIC?  I had ok luck with an Intel board based
on the 815 chipset, so long as I used the e100 driver...maybe I'll
have to go back to it...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19 19:53           ` Ben Greear
@ 2001-09-19 20:03             ` Jeff Garzik
  2001-09-19 20:05             ` Alan Cox
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff Garzik @ 2001-09-19 20:03 UTC (permalink / raw)
  To: Ben Greear; +Cc: Alan Cox, Bruce Harada, linux-kernel

On Wed, 19 Sep 2001, Ben Greear wrote:
> Damn..someone has to make good stable motherboards...anyone got any
> suggestions for one that will fit into a 1U server, with built-in
> Video and preferably a NIC?  I had ok luck with an Intel board based
> on the 815 chipset, so long as I used the e100 driver...maybe I'll
> have to go back to it...

SiS makes integrated boards, with integrated video and sis900 NIC.



* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19 19:53           ` Ben Greear
  2001-09-19 20:03             ` Jeff Garzik
@ 2001-09-19 20:05             ` Alan Cox
  2001-09-19 20:40               ` Locked up 2.4.10-pre11 on Tyan 815t motherboard. [EEPRO-100 bugs] Ben Greear
  1 sibling, 1 reply; 14+ messages in thread
From: Alan Cox @ 2001-09-19 20:05 UTC (permalink / raw)
  To: Ben Greear; +Cc: Alan Cox, Bruce Harada, linux-kernel

> Damn..someone has to make good stable motherboards...anyone got any
> suggestions for one that will fit into a 1U server, with built-in
> Video and preferably a NIC?  I had ok luck with an Intel board based
> on the 815 chipset, so long as I used the e100 driver...maybe I'll
> have to go back to it...

The 815 + e100 thing is _not_ a hardware issue. It's some subtle
driver-related things, and of course Intel are keen to push e100 rather
than help people fix the kernel driver, so there's not much help.

I've fixed some of the problems in recent -ac (the power management timeout),
which is now in Linus' tree. Arjan van de Ven also fixed some other bits.

I'd be interested to know if that helps (to keep the test simple and single
variable you can use the -ac eepro100.c file in Linus 2.4.9)

Alan


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard. [EEPRO-100 bugs]
  2001-09-19 20:05             ` Alan Cox
@ 2001-09-19 20:40               ` Ben Greear
  2001-09-19 21:57                 ` Alan Cox
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Greear @ 2001-09-19 20:40 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> 
> > Damn..someone has to make good stable motherboards...anyone got any
> > suggestions for one that will fit into a 1U server, with built-in
> > Video and preferably a NIC?  I had ok luck with an Intel board based
> > on the 815 chipset, so long as I used the e100 driver...maybe I'll
> > have to go back to it...
> 
> The 815 + e100 thing is _not_ a hardware issue. It's some subtle
> driver-related things, and of course Intel are keen to push e100 rather
> than help people fix the kernel driver, so there's not much help.
> 

If they'd only implement the mii-diags I would just use theirs...

> I've fixed some of the problems in recent -ac (the power management timeout),
> which is now in Linus' tree. Arjan van de Ven also fixed some other bits.

Do you think you've fixed this?  If so, I'll give your version a try...


I get all kinds of crap on the pre11 build with the eepro100, but only
on one of my 4 Intel NICs.  The driver messages call it an 82557, but the
chip says 82559.  I'm not sure if that is significant or not...


/var/log/messages snippet:

Sep 19 13:27:42 lanf1 last message repeated 24 times
Sep 19 13:27:46 lanf1 kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 19 13:27:46 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 194636/194664 command 000c0000.
Sep 19 13:27:51 lanf1 kernel: eepro100: wait_for_cmd_done timeout!
Sep 19 13:27:51 lanf1 last message repeated 24 times
Sep 19 13:27:54 lanf1 kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 19 13:27:54 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 210712/210740 command 200c0000.
Sep 19 13:28:11 lanf1 kernel: eepro100: wait_for_cmd_done timeout!
Sep 19 13:28:11 lanf1 last message repeated 24 times
Sep 19 13:28:14 lanf1 kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 19 13:28:14 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 251298/251326 command 000c0000.
Sep 19 13:28:16 lanf1 kernel: eepro100: wait_for_cmd_done timeout!
Sep 19 13:28:16 lanf1 last message repeated 24 times
Sep 19 13:28:20 lanf1 kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 19 13:28:20 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 260668/260696 command 000c0000.
Sep 19 13:28:23 lanf1 kernel: eepro100: wait_for_cmd_done timeout!
Sep 19 13:28:23 lanf1 last message repeated 24 times
Sep 19 13:28:26 lanf1 sshd(pam_unix)[1977]: session opened for user root by (uid=0)
Sep 19 13:28:26 lanf1 kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 19 13:28:26 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 273076/273104 command 000c0000.
Sep 19 13:28:54 lanf1 kernel: eepro100: wait_for_cmd_done timeout!
Sep 19 13:28:54 lanf1 last message repeated 24 times
Sep 19 13:28:58 lanf1 kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 19 13:28:58 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 334769/334797 command 000c0000.
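
The pair of numbers after "at" grows between timeouts, but the distance
between them does not. A minimal Python sketch for pulling those fields out
of a syslog excerpt follows; the regular expression and the function name
tx_timeouts are illustrative assumptions, and the sample lines are copied
from the snippet above. Reading the two numbers as the driver's Tx ring
counters is likewise an assumption, not something the log states.

```python
import re

# Matches the eepro100 "Transmit timed out" line format seen in the log.
PATTERN = re.compile(
    r"(?P<dev>\w+): Transmit timed out: status (?P<status>[0-9a-f]+)\s+"
    r"(?P<cmd_reg>[0-9a-f]+) at (?P<first>\d+)/(?P<second>\d+) "
    r"command (?P<command>[0-9a-f]+)\."
)

SAMPLE = """\
Sep 19 13:27:46 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 194636/194664 command 000c0000.
Sep 19 13:27:54 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 210712/210740 command 200c0000.
Sep 19 13:28:14 lanf1 kernel: eth1: Transmit timed out: status 0090  0c80 at 251298/251326 command 000c0000.
"""


def tx_timeouts(text):
    """Return (device, first, second, gap) for each timeout line in text."""
    events = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            first, second = int(m["first"]), int(m["second"])
            events.append((m["dev"], first, second, second - first))
    return events


for dev, first, second, gap in tx_timeouts(SAMPLE):
    print(dev, first, second, "gap", gap)
```

On these samples the gap is 28 every time, which would be consistent with a
transmit queue that sits at the same depth while nothing completes.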



dmesg shows this:
.....
eepro100: wait_for_cmd_done timeout!
eepro100: wait_for_cmd_done timeout!
VFS: Disk change detected on device ide1(22,0)
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Transmit timed out: status 0090  0c80 at 401389/401417 command 000c0000.
eth1: Tx ring dump,  Tx queue 401417 / 401389:
eth1:     0 200c0000.
eth1:     1 000c0000.
eth1:     2 000c0000.
eth1:     3 000c0000.
eth1:     4 000c0000.
eth1:     5 000c0000.
eth1:     6 000c0000.
eth1:     7 000c0000.
eth1:     8 600c0000.
eth1:   = 9 000ca000.
eth1:    10 000ca000.
eth1:    11 000ca000.
eth1:    12 000ca000.
eth1:  * 13 000c0000.
eth1:    14 000c0000.
eth1:    15 000c0000.
eth1:    16 200c0000.
eth1:    17 000c0000.
eth1:    18 000c0000.
eth1:    19 000c0000.
eth1:    20 000c0000.
eth1:    21 000c0000.
eth1:    22 000c0000.
eth1:    23 000c0000.
eth1:    24 200c0000.
eth1:    25 000c0000.
eth1:    26 000c0000.
eth1:    27 000c0000.
eth1:    28 000c0000.
eth1:    29 000c0000.
eth1:    30 000c0000.
eth1:    31 000c0000.
eth1: Printing Rx ring (next to receive into 43202, dirty index 43202).
eth1:     0 00000001.
eth1: l   1 c0000001.
eth1:  *= 2 00000001.
eth1:     3 00000001.
eth1:     4 00000001.
eth1:     5 00000001.
eth1:     6 00000001.
eth1:     7 00000001.
eth1:     8 00000001.
eth1:     9 00000001.
eth1:    10 00000001.
eth1:    11 00000001.
eth1:    12 00000001.
eth1:    13 00000001.
eth1:    14 00000001.
eth1:    15 00000001.
eth1:    16 00000001.
eth1:    17 00000001.
eth1:    18 00000001.
eth1:    19 00000001.
eth1:    20 00000001.
eth1:    21 00000001.
eth1:    22 00000001.
eth1:    23 00000001.
eth1:    24 00000001.
eth1:    25 00000001.
eth1:    26 00000001.
eth1:    27 00000001.
eth1:    28 00000001.
eth1:    29 00000001.
eth1:    30 00000001.
eth1:    31 00000001.
VFS: Disk change detected on device ide1(22,0)
VFS: Disk change detected on device ide1(22,0)
.....
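
The two counters in each ring header can be mapped onto the 32 slots listed
in the dump. A small Python sketch, assuming the counters are free-running
and wrap onto the ring by modulo; the ring size of 32 is inferred from the
slots 0..31 shown above, and ring_slot is a made-up helper:

```python
RING_SIZE = 32  # assumption: inferred from the dump listing slots 0..31


def ring_slot(counter, ring_size=RING_SIZE):
    """Map a free-running ring counter onto a slot index."""
    return counter % ring_size


# Counters copied from the "Tx queue" and "Printing Rx ring" header lines.
tx_a, tx_b = 401417, 401389
rx_next, rx_dirty = 43202, 43202

print("Tx slots:", ring_slot(tx_a), ring_slot(tx_b), "distance:", tx_a - tx_b)
print("Rx slots:", ring_slot(rx_next), ring_slot(rx_dirty))
```

The results line up with the '=' and '*' markers in the dump (Tx slots 9 and
13, Rx slot 2), which suggests the markers flag those two positions. The Tx
distance of 28 also matches the gap seen in every timeout message.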


Here's the boot messages:

Sep 19 13:20:49 lanf1 kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Sep 19 13:20:49 lanf1 kernel: eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
Sep 19 13:20:49 lanf1 kernel: PCI: Found IRQ 10 for device 01:0b.0
Sep 19 13:20:49 lanf1 kernel: eth0: Intel Corporation 82557 [Ethernet Pro 100] (#3), 00:E0:81:03:B9:77, IRQ 10.
Sep 19 13:20:49 lanf1 kernel:   Board assembly 567812-052, Physical connectors present: RJ45
Sep 19 13:20:49 lanf1 kernel:   Primary interface chip i82555 PHY #1.
Sep 19 13:20:49 lanf1 kernel:   General self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Serial sub-system self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Internal registers self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   ROM checksum self-test: passed (0x04f4518b).
Sep 19 13:20:49 lanf1 kernel: PCI: Found IRQ 4 for device 01:09.0
Sep 19 13:20:49 lanf1 kernel: eth1: Intel Corporation 82557 [Ethernet Pro 100] (#2), 00:90:27:65:39:18, IRQ 4.
Sep 19 13:20:49 lanf1 kernel:   Board assembly 721383-006, Physical connectors present: RJ45
Sep 19 13:20:49 lanf1 kernel:   Primary interface chip i82555 PHY #1.
Sep 19 13:20:49 lanf1 kernel:   General self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Serial sub-system self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Internal registers self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   ROM checksum self-test: passed (0x04f4518b).
Sep 19 13:20:49 lanf1 kernel: PCI: Found IRQ 5 for device 01:04.0
Sep 19 13:20:49 lanf1 kernel: PCI: Sharing IRQ 5 with 00:1f.3
Sep 19 13:20:49 lanf1 kernel: eth2: Intel Corporation 82557 [Ethernet Pro 100], 00:90:27:35:45:AB, IRQ 5.
Sep 19 13:20:49 lanf1 kernel:   Receiver lock-up bug exists -- enabling work-around.
Sep 19 13:20:49 lanf1 kernel:   Board assembly 689661-004, Physical connectors present: RJ45
Sep 19 13:20:49 lanf1 kernel:   Primary interface chip i82555 PHY #1.
Sep 19 13:20:49 lanf1 kernel:   General self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Serial sub-system self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Internal registers self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   ROM checksum self-test: passed (0x24c9f043).
Sep 19 13:20:49 lanf1 kernel:   Receiver lock-up workaround activated.
Sep 19 13:20:49 lanf1 kernel: PCI: Found IRQ 11 for device 01:08.0
Sep 19 13:20:49 lanf1 kernel: eth3: Invalid EEPROM checksum 0xff00, check settings before activating this device!
Sep 19 13:20:49 lanf1 kernel: eth3: OEM i82557/i82558 10/100 Ethernet, FF:FF:FF:FF:FF:FF, IRQ 11.
Sep 19 13:20:49 lanf1 kernel:   Board assembly ffffff-255, Physical connectors present: RJ45 BNC AUI MII
Sep 19 13:20:49 lanf1 kernel:   Primary interface chip unknown-15 PHY #31.
Sep 19 13:20:49 lanf1 kernel:     Secondary interface chip i82555.
Sep 19 13:20:49 lanf1 kernel:   General self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Serial sub-system self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   Internal registers self-test: passed.
Sep 19 13:20:49 lanf1 kernel:   ROM checksum self-test: passed (0x04f4518b).

> 
> I'd be interested to know if that helps (to keep the test simple and single
> variable you can use the -ac eepro100.c file in Linus 2.4.9)
> 
> Alan

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard. [EEPRO-100 bugs]
  2001-09-19 20:40               ` Locked up 2.4.10-pre11 on Tyan 815t motherboard. [EEPRO-100 bugs] Ben Greear
@ 2001-09-19 21:57                 ` Alan Cox
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Cox @ 2001-09-19 21:57 UTC (permalink / raw)
  To: Ben Greear; +Cc: Alan Cox, linux-kernel

> > I've fixed some of the problems in recent -ac (the power management timeout),
> > which is now in Linus' tree. Arjan van de Ven also fixed some other bits.
> 
> Do you think you've fixed this?  If so, I'll give your version a try...

It's fixed on my 810 and 815 boards. Whether it's fixed on others I don't know.


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19 19:15         ` Alan Cox
  2001-09-19 19:53           ` Ben Greear
@ 2001-09-20 19:28           ` Martin Josefsson
  1 sibling, 0 replies; 14+ messages in thread
From: Martin Josefsson @ 2001-09-20 19:28 UTC (permalink / raw)
  To: Alan Cox; +Cc: Bruce Harada, Ben Greear, linux-kernel

On Wed, 19 Sep 2001, Alan Cox wrote:

> > Doubtful. Since it's an 815, I presume you're running a PIII (correct me if
> > I'm wrong) - newish PIIIs have reasonable overheating cutout features, and
> > if overheating had damaged the CPU, I'd be very surprised if it worked at
> > all, rather than just locking up on certain sizes of network packets.
> 
> The 815 chipsets have known (and documented) problems with out of spec
> memory signals. Board vendors are supposed to have used workarounds but I
> have so far sent back 2 out of the 3 A/Open i815 boards with problems where
> they locked up occasionally under high load (in any OS) and also failed
> memtest86 (with known good tested ram) when placed in an electrically noisy
> environment.
> 
> I've seen lockups on high network load as part of that - but not packet size
> dependent ones.

Hi everybody.

I have problems very similar to Ben's. I have 4 machines with Asus P3C-D
motherboards (the one with Rambus), which use the i820 chipset.
I have severe stability problems here, in both SMP and UP.

I also use a D-Link DFE-570TX card in all of these machines and they act
as routers.

SMP with IO-APIC is worst, then comes SMP without IO-APIC, and best is UP
without IO-APIC (I haven't tried UP with IO-APIC).

I can usually make SMP with IO-APIC hang within 30-45 minutes by letting
it route some quite heavy traffic. But sometimes it locks up after a few
hours and sometimes (rare) a few days.

SMP without IO-APIC usually lasts a little bit longer but not that much.

UP can last for over a week but sometimes it only holds up for a few
hours.

I've tried a _lot_ of kernels, ranging from 2.3.99 ones to 2.4.8-ac12.

2.4.8-ac12 seemed to be quite stable on this router, as it survived my
tests for >30 h. But it died within 10 minutes after I put it into
production :( So now it runs UP, and it dies again after a few minutes
(watchdog cards are a wonderful thing).

It's totally unresponsive when it dies. No numlock or anything, and
enabling nmi_watchdog makes no difference; it doesn't print anything.
The machine is unresponsive in every way.

Do any of you have ideas about what I might try?

/Martin



* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-19 18:09       ` Ben Greear
@ 2001-09-24 15:14         ` bill davidsen
  2001-09-24 18:18           ` Ben Greear
  0 siblings, 1 reply; 14+ messages in thread
From: bill davidsen @ 2001-09-24 15:14 UTC (permalink / raw)
  To: linux-kernel

In article <3BA8DF59.B9F536B4@candelatech.com> greearb@candelatech.com wrote:
| Andrew Morton wrote:
| 
| > nmi_watchdog will force an oops if the machine locks up
| > with interrupts disabled (as I suspect mine did).  But
| > it requires an SMP kernel or IO-APIC-on-UP.
| 
| I just built a 2.4.8 kernel with the APIC enabled.  It locked
| hard and printed no OOPS.  I had set the boot cmd line as:
| nmi_watchdog=1

This only works if you have a lock with no response. If keyboard input
is echoed or ping still works, that's not the tight lockup. May I
suggest the software watchdog feature as an alternative. It's much
better at finding cases where the system is only braindead rather than
locked up.

-- 
bill davidsen <davidsen@tmr.com>
 "If I were a diplomat, in the best case I'd go hungry.  In the worst
  case, people would die."
		-- Robert Lipe


* Re: Locked up 2.4.10-pre11 on Tyan 815t motherboard.
  2001-09-24 15:14         ` bill davidsen
@ 2001-09-24 18:18           ` Ben Greear
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Greear @ 2001-09-24 18:18 UTC (permalink / raw)
  To: bill davidsen; +Cc: linux-kernel

bill davidsen wrote:
> 
> In article <3BA8DF59.B9F536B4@candelatech.com> greearb@candelatech.com wrote:
> | Andrew Morton wrote:
> |
> | > nmi_watchdog will force an oops if the machine locks up
> | > with interrupts disabled (as I suspect mine did).  But
> | > it requires an SMP kernel or IO-APIC-on-UP.
> |
> | I just built a 2.4.8 kernel with the APIC enabled.  It locked
> | hard and printed no OOPS.  I had set the boot cmd line as:
> | nmi_watchdog=1
> 
> This only works if you have a lock with no response. If keyboard input
> is echoed or ping still works, that's not the tight lockup. May I
> suggest the software watchdog feature as an alternative. It's much
> better at finding cases where the system is only braindead rather than
> locked up.

The keyboard did not work and ping did not work...

Alt-SysRq did not work either.

Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear



Thread overview: 14+ messages
2001-09-19  6:51 Locked up 2.4.10-pre11 on Tyan 815t motherboard Ben Greear
2001-09-19  7:10 ` Ben Greear
     [not found] ` <3BA8CCF1.CA2933B3@zip.com.au>
     [not found]   ` <3BA8D351.F57BE70D@candelatech.com>
     [not found]     ` <3BA8D619.9A607219@zip.com.au>
2001-09-19 18:09       ` Ben Greear
2001-09-24 15:14         ` bill davidsen
2001-09-24 18:18           ` Ben Greear
2001-09-19 18:38       ` Bruce Harada
2001-09-19 19:15         ` Alan Cox
2001-09-19 19:53           ` Ben Greear
2001-09-19 20:03             ` Jeff Garzik
2001-09-19 20:05             ` Alan Cox
2001-09-19 20:40               ` Locked up 2.4.10-pre11 on Tyan 815t motherboard. [EEPRO-100 bugs] Ben Greear
2001-09-19 21:57                 ` Alan Cox
2001-09-20 19:28           ` Locked up 2.4.10-pre11 on Tyan 815t motherboard Martin Josefsson
2001-09-19 19:46         ` Ben Greear
