* repeatable SMP lockups - kernel 2.4.9
@ 2001-09-14 12:30 Matthias Haase
2001-09-14 14:54 ` Martin Josefsson
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 12:30 UTC (permalink / raw)
To: linux-kernel
Our new SMP file and print server always locks up hard when a higher load
hits the NIC. It is truly stable without networking (X11, DRI
1. First, I changed the NIC from a 3Com (vortex driver) to a no-name card
based on the Realtek RTL-8139 (rev 10). The lockup occurs somewhat later,
but it is repeatable if I copy a large file over the LAN or export an X11
environment to another box.
2. Changing the kernel to 2.2.19 gives the same result.
Donald Becker wrote that he thinks this could be a bug in the interrupt
handling of the 2.4.9 kernel, not inside the (his) driver itself.
The boot on the mainboard (Asus CUV266-D, 2x PIII 1 GHz, 512 MB DDR-RAM)
is always o.k. with APIC, except for the 'unexpected IO-APIC, please mail'
warning.
The lockup also occurs with 'noapic' on boot.
As a third step I can try another, 'SMP-cleaner' (I think) NIC, a D-Link
DFE-500 TX based on a DEC chip, using the tulip driver.
Nothing is written about this in /var/log/messages. The box is SCSI only,
with an Adaptec 29160N.
/proc/interrupts:
           CPU0       CPU1
  0:     273705     282423    IO-APIC-edge  timer
  1:       4891       5117    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 10:       8578       8328   IO-APIC-level  aic7xxx
 11:     962066     961390   IO-APIC-level  mga@PCI:1:0:0, es1371
 12:     109685     111089    IO-APIC-edge  PS/2 Mouse
 15:       2273       2295   IO-APIC-level  eth0
NMI:          0          0
LOC:     556044     556060
ERR:          0
MIS:          0
Looks clean :-(
Are there any patches, hints or recommendations known about this?
__
Best regards from Germany
Matthias
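A listing like the /proc/interrupts output in the message above can also be checked mechanically rather than by eye. The following is only a minimal sketch, assuming the 2.4-era column layout shown there (one count per CPU, then the controller type, then a comma-separated device list). The function names `parse_interrupts` and `shared_irqs` are made up for this illustration, and the sample text is copied from the message. It reports per-line counts and which IRQ lines are shared by more than one device; it does not claim that sharing is the cause of the lockups.

```python
# Sample copied from the message above (2.4-era /proc/interrupts layout).
SAMPLE = """\
           CPU0       CPU1
  0:     273705     282423    IO-APIC-edge  timer
  1:       4891       5117    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 10:       8578       8328   IO-APIC-level  aic7xxx
 11:     962066     961390   IO-APIC-level  mga@PCI:1:0:0, es1371
 12:     109685     111089    IO-APIC-edge  PS/2 Mouse
 15:       2273       2295   IO-APIC-level  eth0
NMI:          0          0
LOC:     556044     556060
ERR:          0
MIS:          0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: {counts, controller, devices}}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: one column name per CPU
    table = {}
    for line in lines[1:]:
        # Split at the first colon only; device names may contain colons.
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        tail = fields[len(counts):]
        controller = tail[0] if tail else ""
        devices = [d.strip() for d in " ".join(tail[1:]).split(",") if d.strip()]
        table[label.strip()] = {
            "counts": counts,
            "controller": controller,
            "devices": devices,
        }
    return table


def shared_irqs(table):
    """Return {label: devices} for lines that list more than one device."""
    return {k: v["devices"] for k, v in table.items() if len(v["devices"]) > 1}


if __name__ == "__main__":
    parsed = parse_interrupts(SAMPLE)
    for label, row in parsed.items():
        print(label, sum(row["counts"]), row["controller"], row["devices"])
    print("shared:", shared_irqs(parsed))
```

On the sample, the only shared line is IRQ 11 (the mga graphics driver and es1371), while eth0 and aic7xxx each have a line of their own.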
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
@ 2001-09-14 14:54 ` Martin Josefsson
2001-09-14 16:23 ` Matthias Haase
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Martin Josefsson @ 2001-09-14 14:54 UTC (permalink / raw)
To: Matthias Haase; +Cc: linux-kernel
On Fri, 14 Sep 2001, Matthias Haase wrote:
> Our new SMP file and print server always locks up hard when a higher load
> hits the NIC. It is truly stable without networking (X11, DRI
I have similar problems with 4 routers here; they get quite a high
network load sometimes... not really good.
> 1. First, I changed the NIC from a 3Com (vortex driver) to a no-name card
> based on the Realtek RTL-8139 (rev 10). The lockup occurs somewhat later,
> but it is repeatable if I copy a large file over the LAN or export an X11
> environment to another box.
I used to be able to get the routers to hang in under 30 minutes, but with
2.4.8-ac12 one of them survived my testing for over 36 hours.
But when I put it into production, thinking that it was more stable than
the other kernels, it hung after 5-10 minutes of operation.
> 2. Changing the kernel to 2.2.19 gives the same result.
Haven't tried any 2.2 kernels here because I want iptables.
> Donald Becker wrote that he thinks this could be a bug in the interrupt
> handling of the 2.4.9 kernel, not inside the (his) driver itself.
>
> The boot on the mainboard (Asus CUV266-D, 2x PIII 1 GHz, 512 MB DDR-RAM)
> is always o.k. with APIC, except for the 'unexpected IO-APIC, please mail'
> warning.
> The lockup also occurs with 'noapic' on boot.
Our routers consist of Asus P3C-D (i820 chipset), 2x PIII 800 MHz, 256 MB
RIMM. As a lot of people know, the i820 chipset is very unstable _if_ you
have SDRAM, but not with RIMM, which it was built for.
Running with 'noapic' still freezes, but I don't think it occurs as
frequently as when running with IOAPIC.
> As a third step I can try another, 'SMP-cleaner' (I think) NIC, a D-Link
> DFE-500 TX based on a DEC chip, using the tulip driver.
I'm using a D-Link DFE-570TX, which is a quad tulip (DECchip 21143 rev 65).
I've been using both the stock driver in the kernels and an optimized one;
I get a lockup with both.
> Nothing is written about this in /var/log/messages. The box is SCSI only,
Just a hard lockup, it doesn't say anything at all, just a freeze;
the keyboard doesn't work (not even numlock).
I also have an Adaptec 29160 card in our routers for logging to a
SCSI disk. Now that I think of it, the one I thought was stable didn't
have a SCSI disk in it, and then I moved the flash disk to the other router
that was in production and that died (but the logging isn't running).
> /proc/interrupts:
>
>            CPU0       CPU1
>   0:     273705     282423    IO-APIC-edge  timer
>   1:       4891       5117    IO-APIC-edge  keyboard
>   2:          0          0          XT-PIC  cascade
>   8:          0          1    IO-APIC-edge  rtc
>  10:       8578       8328   IO-APIC-level  aic7xxx
>  11:     962066     961390   IO-APIC-level  mga@PCI:1:0:0, es1371
>  12:     109685     111089    IO-APIC-edge  PS/2 Mouse
>  15:       2273       2295   IO-APIC-level  eth0
> NMI:          0          0
> LOC:     556044     556060
> ERR:          0
> MIS:          0
>
>
> Looks clean :-(
Looks as clean as on my routers, and then suddenly a freeze comes along and
ruins my day (I have watchdog cards, but it still ruins my day knowing that
the router froze).
> Are there any patches, hints or recommendations known about this?
I haven't found anything about this at all :(
I have two of these routers right here next to my desk and I'm going to do
some heavy testing on them; one of them is the one I thought was stable
and the other one is virtually untested. I'm going to try with and without
SCSI cards and compare the BIOS settings on them. (But with my luck I'm
probably going to manage to make the "maybe stable" router freeze too.)
/Martin
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
2001-09-14 14:54 ` Martin Josefsson
@ 2001-09-14 16:23 ` Matthias Haase
2001-09-14 16:26 ` Martin Josefsson
2001-09-15 7:20 ` Matthias Haase
2001-09-14 18:37 ` Andrew Morton
2001-09-15 8:32 ` Matthias Haase
3 siblings, 2 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 16:23 UTC (permalink / raw)
To: Martin Josefsson; +Cc: linux-kernel
Hi, Martin,
I hope this doesn't sound too stupid:
As a hardware test I ran quake3d_demo with DRI enabled.
For this, I compiled the older DRM code into the 2.4.9 kernel, so I
could use the installed XFree86 4.0.3 instead of the required 4.1:
No error, no lockup, even though this game produced a heavy load on RAM
and hard disks.
No lockup either with the light traffic on the NIC, for instance with the
ADSL connection (max. 90 kb/s) to our router.
But, as I said, repeatable lockups with somewhat higher network traffic
inside the LAN.
regards
Matthias
--
Gruesse
Matthias Haase | Telefon +49-(0)3733-23713
Markt 2 | Telefax +49-(0)3733-22660
|
D-09456 Annaberg-Buchholz | http://www.bennewitz.com
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 16:23 ` Matthias Haase
@ 2001-09-14 16:26 ` Martin Josefsson
2001-09-15 7:20 ` Matthias Haase
1 sibling, 0 replies; 8+ messages in thread
From: Martin Josefsson @ 2001-09-14 16:26 UTC (permalink / raw)
To: Matthias Haase; +Cc: linux-kernel
On Fri, 14 Sep 2001, Matthias Haase wrote:
> Hi, Martin,
>
>
> I hope this doesn't sound too stupid:
>
> As a hardware test I ran quake3d_demo with DRI enabled.
> For this, I compiled the older DRM code into the 2.4.9 kernel, so I
> could use the installed XFree86 4.0.3 instead of the required 4.1:
>
> No error, no lockup, even though this game produced a heavy load on RAM
> and hard disks.
> No lockup either with the light traffic on the NIC, for instance with the
> ADSL connection (max. 90 kb/s) to our router.
> But, as I said, repeatable lockups with somewhat higher network traffic
> inside the LAN.
I don't think it sounds that stupid.. but if it had hung you wouldn't have
known if it was the possible interrupt handling bug or some other bug in
DRI/DRM :)
I'm going to start my tests here soon.
/Martin
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
2001-09-14 14:54 ` Martin Josefsson
2001-09-14 16:23 ` Matthias Haase
@ 2001-09-14 18:37 ` Andrew Morton
2001-09-15 8:32 ` Matthias Haase
3 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2001-09-14 18:37 UTC (permalink / raw)
To: Matthias Haase; +Cc: linux-kernel
Matthias Haase wrote:
>
> Our new SMP file and print server always locks up hard when a higher load
> hits the NIC. It is truly stable without networking (X11, DRI
>
Have you tried enabling the NMI watchdog? Boot with the
nmi_watchdog=1
LILO option.
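
When no watchdog message appears after a hang, it can help to first confirm that the watchdog is armed at all. The following is only a sketch of that check. It assumes that `nmi_watchdog=...` shows up in /proc/cmdline when the option was passed by the boot loader, and that the `NMI:` line in /proc/interrupts keeps counting upwards while the watchdog is active. The helper names `nmi_counts`, `watchdog_requested` and `watchdog_ticking` are invented for this illustration.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "NMI":
            # Ignore any trailing description text after the counters.
            return [int(f) for f in rest.split() if f.isdigit()]
    return []


def watchdog_requested(cmdline_text):
    """True if the kernel command line asks for a non-zero nmi_watchdog mode."""
    return any(
        tok.startswith("nmi_watchdog=") and tok != "nmi_watchdog=0"
        for tok in cmdline_text.split()
    )


def watchdog_ticking(before_text, after_text):
    """True if any CPU's NMI counter grew between two snapshots."""
    before = nmi_counts(before_text)
    after = nmi_counts(after_text)
    if not after or len(before) != len(after):
        return False
    return any(b < a for b, a in zip(before, after))
```

In use, one would read /proc/cmdline once and take two snapshots of /proc/interrupts a few seconds apart. If the option is present but the NMI counters stay at 0, the watchdog is probably not running on that machine, and silence after a lockup says little.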
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 16:23 ` Matthias Haase
2001-09-14 16:26 ` Martin Josefsson
@ 2001-09-15 7:20 ` Matthias Haase
1 sibling, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-15 7:20 UTC (permalink / raw)
To: Martin Josefsson; +Cc: linux-kernel
On Fri, 14 Sep 2001 18:26:04 +0200 (CEST)
Martin Josefsson <gandalf@wlug.westbo.se> wrote:
> I don't think it sounds that stupid.. but if it had hung you wouldn't
> have
> known if it was the possible interrupt handling bug or some other bug in
> DRI/DRM :)
Yes, but I am now (relatively) sure that the RAM timing (it's DDR-RAM at
266 MHz) and the CPU clock are right.
I found last night that the box also locks up if I use the scanner and
scan a large file.
For scanning, I use a second, additional SCSI controller (Dawicontrol,
based on AMD 53c974 [PCscsi]). The preview scan is o.k., but the scan
itself stops (and locks up the machine hard, of course) once 4-5 MB have
been transferred.
Sounds like an interrupt handling error?
> I'm going to start my tests here soon.
>
> /Martin
Please let me know about your results.
regards
Matthias
--
Gruesse
Matthias Haase | Telefon +49-(0)3733-23713
Markt 2 | Telefax +49-(0)3733-22660
|
D-09456 Annaberg-Buchholz | http://www.bennewitz.com
* repeatable SMP lockups - kernel 2.4.9
2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
` (2 preceding siblings ...)
2001-09-14 18:37 ` Andrew Morton
@ 2001-09-15 8:32 ` Matthias Haase
3 siblings, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-15 8:32 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
Hi, Andrew...
> Have you tried enabling the NMI watchdog? Boot with the
>
> nmi_watchdog=1
>
> LILO option.
I tried this today, but no debugging messages are printed.
The cursor blinks, and when the hang occurs, the blinking freezes.
I have nmi_watchdog=1 set in lilo.conf + # /etc/lilo -v -v
No watchdog or software watchdog is compiled into the kernel, but I think
this isn't related to the nmi_watchdog?
Thanks for your help.
regards
Matthias
--
Gruesse
Matthias Haase | Telefon +49-(0)3733-23713
Markt 2 | Telefax +49-(0)3733-22660
|
D-09456 Annaberg-Buchholz | http://www.bennewitz.com
* Re: repeatable SMP lockups - kernel 2.4.9
[not found] <OF21F37EC6.10570427-ON88256AC7.0052A32C@boulder.ibm.com>
@ 2001-09-14 15:46 ` Matthias Haase
0 siblings, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 15:46 UTC (permalink / raw)
To: James Washer; +Cc: linux-kernel
Hi, Jim...
> have you enabled Magic SysRq, and attempted to get a register dump
> (Alt-SysRq-p)..
Alt-SysRq-* doesn't work at that point. I couldn't do a sync, remount
read-only, or reboot, or get a dump with 'p'.
regards
Matthias
--
Gruesse
Matthias Haase | Telefon +49-(0)3733-23713
Markt 2 | Telefax +49-(0)3733-22660
|
D-09456 Annaberg-Buchholz | http://www.bennewitz.com