linux-kernel.vger.kernel.org archive mirror
* repeatable SMP lockups - kernel 2.4.9
@ 2001-09-14 12:30 Matthias Haase
  2001-09-14 14:54 ` Martin Josefsson
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 12:30 UTC (permalink / raw)
  To: linux-kernel

Our new SMP file and print server always locks up hard when the NIC comes
under higher load. It is completely stable without networking (X11, DRI).

1. First, I changed the NIC from a 3Com (vortex driver) to a no-name card
based on the Realtek RTL-8139 (rev 10). The lockup now occurs somewhat
later, but it is still reproducible when I copy a large file over the LAN
or export an X11 environment to another box.
2. Changing the kernel to 2.2.19 gives the same result.

Donald Becker wrote that he thinks this could be a bug in the interrupt
handling of the 2.4.9 kernel, not in the (his) driver itself.

The mainboard (Asus CUV266-D, 2x PIII 1 GHz, 512 MB DDR RAM) always boots
fine with APIC, except for the 'unexpected IO-APIC, please mail' warning.
The lockup also occurs when booting with 'noapic'.

As a third step I can try another, 'SMP-cleaner' (I think) NIC: a D-Link
DFE-500 TX, based on a DEC chip and using the tulip driver.

Nothing about this is written to /var/log/messages. The box is SCSI only,
with an Adaptec 29160N.

/proc/interrupts:

           CPU0       CPU1       
  0:     273705     282423    IO-APIC-edge  timer
  1:       4891       5117    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 10:       8578       8328   IO-APIC-level  aic7xxx
 11:     962066     961390   IO-APIC-level  mga@PCI:1:0:0, es1371
 12:     109685     111089    IO-APIC-edge  PS/2 Mouse
 15:       2273       2295   IO-APIC-level  eth0
NMI:          0          0 
LOC:     556044     556060 
ERR:          0
MIS:          0


Looks clean :-(
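
For anyone who wants to compare such snapshots over time, here is a minimal
sketch that parses /proc/interrupts text into per-CPU counts and lists the
lines shared by more than one device. The function names (parse_interrupts,
shared_lines, cpu0_share) are made up for illustration, and the sample data is
simply the table above; a shared line is not by itself evidence of a bug.

```python
# Sketch only: helper names are hypothetical, sample data is from this mail.
SAMPLE = """\
           CPU0       CPU1
  0:     273705     282423    IO-APIC-edge  timer
  1:       4891       5117    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 10:       8578       8328   IO-APIC-level  aic7xxx
 11:     962066     961390   IO-APIC-level  mga@PCI:1:0:0, es1371
 12:     109685     111089    IO-APIC-edge  PS/2 Mouse
 15:       2273       2295   IO-APIC-level  eth0
NMI:          0          0
LOC:     556044     556060
ERR:          0
MIS:          0
"""


def parse_interrupts(text):
    """Return (cpu_names, {label: (per_cpu_counts, description)})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()
    rows = {}
    for ln in lines[1:]:
        # Split at the first colon only; descriptions may contain colons.
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        rows[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return cpus, rows


def shared_lines(rows):
    """Labels of interrupt lines whose description names several devices."""
    return [label for label, (_, desc) in rows.items() if "," in desc]


def cpu0_share(counts):
    """Fraction of a line's interrupts handled by the first CPU, or None."""
    total = sum(counts)
    return counts[0] / total if total else None


if __name__ == "__main__":
    cpus, rows = parse_interrupts(SAMPLE)
    print("shared lines:", shared_lines(rows))
    for label, (counts, desc) in rows.items():
        print(label, counts, cpu0_share(counts), desc)
```

To use it on a live box, read the text from /proc/interrupts instead of SAMPLE.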

Are there any patches, hints or recommendations known about this?


__ 
Best regards from Germany

Matthias






* Re: repeatable SMP lockups - kernel 2.4.9
  2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
@ 2001-09-14 14:54 ` Martin Josefsson
  2001-09-14 16:23 ` Matthias Haase
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Martin Josefsson @ 2001-09-14 14:54 UTC (permalink / raw)
  To: Matthias Haase; +Cc: linux-kernel

On Fri, 14 Sep 2001, Matthias Haase wrote:

> Our new SMP file and print server always locks up hard when the NIC comes
> under higher load. It is completely stable without networking (X11, DRI).

I have similar problems with 4 routers here; they sometimes get quite a
high network load... not really good.

> 1. First, I changed the NIC from a 3Com (vortex driver) to a no-name card
> based on the Realtek RTL-8139 (rev 10). The lockup now occurs somewhat
> later, but it is still reproducible when I copy a large file over the LAN
> or export an X11 environment to another box.

I used to be able to get the routers to hang in under 30 minutes, but with
2.4.8-ac12 one of them survived my testing for over 36 hours.

But when I put it into production, thinking that it was more stable than
the other kernels, it hung after 5-10 minutes of operation.

> 2. Changing the kernel to 2.2.19 gives the same result.

I haven't tried any 2.2 kernels here because I want iptables.

> Donald Becker wrote that he thinks this could be a bug in the interrupt
> handling of the 2.4.9 kernel, not in the (his) driver itself.
> 
> The mainboard (Asus CUV266-D, 2x PIII 1 GHz, 512 MB DDR RAM) always boots
> fine with APIC, except for the 'unexpected IO-APIC, please mail' warning.
> The lockup also occurs when booting with 'noapic'.

Our routers consist of an Asus P3C-D (i820 chipset), 2x PIII 800 MHz and
256 MB RIMM. As a lot of people know, the i820 chipset is very unstable
_if_ you have SDRAM, but not with the RIMM it was built for.

Running with 'noapic' still freezes, but I don't think it occurs as
frequently as when running with the IO-APIC.

> As a third step I can try another, 'SMP-cleaner' (I think) NIC: a D-Link
> DFE-500 TX, based on a DEC chip and using the tulip driver.

I'm using a D-Link DFE-570TX, which is a quad tulip (DECchip 21143 rev 65).
I've been using both the stock driver in the kernels and an optimized one;
I get a lockup with both.

> Nothing about this is written to /var/log/messages. The box is SCSI only,

Just a hard lockup, it doesn't say anything at all, just a freeze,
keyboard doesn't work (not even numlock).

I also have an Adaptec 29160 card in our routers for logging to a
SCSI disk. Now that I think of it, the one I thought was stable didn't
have a SCSI disk in it, and then I moved the flash disk to the other router
that was in production, and that died (but the logging isn't running).

> /proc/interrupts:
> 
>            CPU0       CPU1       
>   0:     273705     282423    IO-APIC-edge  timer
>   1:       4891       5117    IO-APIC-edge  keyboard
>   2:          0          0          XT-PIC  cascade
>   8:          0          1    IO-APIC-edge  rtc
>  10:       8578       8328   IO-APIC-level  aic7xxx
>  11:     962066     961390   IO-APIC-level  mga@PCI:1:0:0, es1371
>  12:     109685     111089    IO-APIC-edge  PS/2 Mouse
>  15:       2273       2295   IO-APIC-level  eth0
> NMI:          0          0 
> LOC:     556044     556060 
> ERR:          0
> MIS:          0
> 
> 
> Looks clean :-(

Looks as clean as in my routers and then suddenly a freeze comes along and
ruins my day (I have watchdogcards but it still ruins my day knowing that
the router froze)

> Are there any patches, hints or recommendations known about this?

I haven't found anything about this at all :(

I have two of these routers right here next to my desk and I'm going to do
some heavy testing on them. One of them is the one I thought was stable
and the other one is virtually untested. I'm going to try with and without
SCSI cards and compare the BIOS settings on them. (But with my luck I'm
probably going to manage to make the "maybe stable" router freeze too.)

/Martin



* Re: repeatable SMP lockups - kernel 2.4.9
  2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
  2001-09-14 14:54 ` Martin Josefsson
@ 2001-09-14 16:23 ` Matthias Haase
  2001-09-14 16:26   ` Martin Josefsson
  2001-09-15  7:20   ` Matthias Haase
  2001-09-14 18:37 ` Andrew Morton
  2001-09-15  8:32 ` Matthias Haase
  3 siblings, 2 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 16:23 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: linux-kernel

Hi, Martin,


I hope this doesn't sound too stupid:

As a hardware test I ran quake3d_demo with DRI enabled.
For this, I compiled the older DRM code into the 2.4.9 kernel, so that I
could use the installed XFree86 4.0.3 instead of the required 4.1:

No error, no lockup, even though the game put a heavy load on RAM and
hard disks.
There is also no lockup with light traffic on the NIC, for instance over
the ADSL connection (max. 90 kb/s) to our router.
But, as I said, the lockups are reproducible with somewhat higher network
traffic inside the LAN.


regards

                          Matthias

-- 
Gruesse


Matthias Haase            | Telefon +49-(0)3733-23713
Markt 2                   | Telefax +49-(0)3733-22660
                          |
D-09456 Annaberg-Buchholz | http://www.bennewitz.com



* Re: repeatable SMP lockups - kernel 2.4.9
  2001-09-14 16:23 ` Matthias Haase
@ 2001-09-14 16:26   ` Martin Josefsson
  2001-09-15  7:20   ` Matthias Haase
  1 sibling, 0 replies; 8+ messages in thread
From: Martin Josefsson @ 2001-09-14 16:26 UTC (permalink / raw)
  To: Matthias Haase; +Cc: linux-kernel

On Fri, 14 Sep 2001, Matthias Haase wrote:

> Hi, Martin,
> 
> 
> I hope this doesn't sound too stupid:
> 
> As a hardware test I ran quake3d_demo with DRI enabled.
> For this, I compiled the older DRM code into the 2.4.9 kernel, so that I
> could use the installed XFree86 4.0.3 instead of the required 4.1:
>
> No error, no lockup, even though the game put a heavy load on RAM and
> hard disks.
> There is also no lockup with light traffic on the NIC, for instance over
> the ADSL connection (max. 90 kb/s) to our router.
> But, as I said, the lockups are reproducible with somewhat higher network
> traffic inside the LAN.

I don't think it sounds that stupid... but if it had hung you wouldn't have
known whether it was the possible interrupt handling bug or some other bug
in DRI/DRM :)

I'm going to start my tests here soon.

/Martin



* Re: repeatable SMP lockups - kernel 2.4.9
  2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
  2001-09-14 14:54 ` Martin Josefsson
  2001-09-14 16:23 ` Matthias Haase
@ 2001-09-14 18:37 ` Andrew Morton
  2001-09-15  8:32 ` Matthias Haase
  3 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2001-09-14 18:37 UTC (permalink / raw)
  To: Matthias Haase; +Cc: linux-kernel

Matthias Haase wrote:
> 
> Our new SMP file and print server always locks up hard when the NIC comes
> under higher load. It is completely stable without networking (X11, DRI).
> 

Have you tried enabling the NMI watchdog?  Boot with the

	nmi_watchdog=1

LILO option.
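
A quick way to see whether the watchdog is actually armed after booting with
that option is to watch the NMI line in /proc/interrupts: if the counts stay
at zero, no watchdog NMIs are being delivered. The sketch below assumes that
behaviour and compares two snapshots; the helper names (nmi_counts,
watchdog_ticking, requested_on_cmdline) are invented for this example and
are not part of any kernel tool.

```python
# Sketch only: helper names are hypothetical.
import time


def nmi_counts(interrupts_text):
    """Per-CPU NMI counts from /proc/interrupts text, or [] if no NMI line."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "NMI":
            return [int(f) for f in rest.split() if f.isdigit()]
    return []


def watchdog_ticking(before, after):
    """True if every CPU's NMI count grew between the two snapshots."""
    old, new = nmi_counts(before), nmi_counts(after)
    return bool(old) and len(old) == len(new) and all(
        n > o for o, n in zip(old, new))


def requested_on_cmdline(cmdline):
    """True if the boot command line carries a non-zero nmi_watchdog= option."""
    return any(tok.startswith("nmi_watchdog=") and tok != "nmi_watchdog=0"
               for tok in cmdline.split())


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("requested:", requested_on_cmdline(f.read()))
    with open("/proc/interrupts") as f:
        first = f.read()
    time.sleep(10)  # arbitrary interval, long enough for counts to move
    with open("/proc/interrupts") as f:
        second = f.read()
    print("ticking:", watchdog_ticking(first, second))
```

If the option is on the command line but the counts do not move, the
watchdog is probably not running in that mode on this hardware.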


* Re: repeatable SMP lockups - kernel 2.4.9
  2001-09-14 16:23 ` Matthias Haase
  2001-09-14 16:26   ` Martin Josefsson
@ 2001-09-15  7:20   ` Matthias Haase
  1 sibling, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-15  7:20 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: linux-kernel

On Fri, 14 Sep 2001 18:26:04 +0200 (CEST)
Martin Josefsson <gandalf@wlug.westbo.se> wrote:

> I don't think it sounds that stupid... but if it had hung you wouldn't
> have known whether it was the possible interrupt handling bug or some
> other bug in DRI/DRM :)
Yes, but I am now (relatively) sure that the RAM timing (it's DDR RAM at
266 MHz) and the CPU clock are right.

Last night I found that the box also locks up when I use the scanner to
scan a large file.
For scanning, I use a second, additional SCSI controller (Dawicontrol,
based on the AMD 53c974 [PCscsi]). The preview scan is OK, but the scan
itself stops (and of course locks the machine up hard) once 4-5 MB have
been transferred.

Sounds like an interrupt handling error?

> I'm going to start my tests here soon.
> 
> /Martin

Please let me know about your results.

regards

                          Matthias

-- 
Gruesse


Matthias Haase            | Telefon +49-(0)3733-23713
Markt 2                   | Telefax +49-(0)3733-22660
                          |
D-09456 Annaberg-Buchholz | http://www.bennewitz.com



* repeatable SMP lockups - kernel 2.4.9
  2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
                   ` (2 preceding siblings ...)
  2001-09-14 18:37 ` Andrew Morton
@ 2001-09-15  8:32 ` Matthias Haase
  3 siblings, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-15  8:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

Hi, Andrew...

> Have you tried enabling the NMI watchdog?  Boot with the
> 
> 	nmi_watchdog=1
> 
> LILO option.

I tried this today, but no debugging messages are printed.
The cursor blinks, and when the hang occurs, the blinking freezes.
I set nmi_watchdog=1 in lilo.conf and ran  # /etc/lilo -v -v
No watchdog or software watchdog is compiled into the kernel, but I think
this isn't related to the nmi_watchdog?

Thanks for your help.

regards

                           Matthias

-- 
Gruesse


Matthias Haase            | Telefon +49-(0)3733-23713
Markt 2                   | Telefax +49-(0)3733-22660
                          |
D-09456 Annaberg-Buchholz | http://www.bennewitz.com


* Re: repeatable SMP lockups - kernel 2.4.9
       [not found] <OF21F37EC6.10570427-ON88256AC7.0052A32C@boulder.ibm.com>
@ 2001-09-14 15:46 ` Matthias Haase
  0 siblings, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 15:46 UTC (permalink / raw)
  To: James Washer; +Cc: linux-kernel

Hi, Jim...

> Have you enabled Magic SysRq and attempted to get a register dump
> (Alt-SysRq-p)?

Alt-Sysrq-* doesn't work at this time. Couldn't do a
sync/mount/read-only/boot or get a dump with 'p'.

regards

                          Matthias

-- 
Gruesse


Matthias Haase            | Telefon +49-(0)3733-23713
Markt 2                   | Telefax +49-(0)3733-22660
                          |
D-09456 Annaberg-Buchholz | http://www.bennewitz.com

