linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Ferquently system lockups under load
@ 2003-01-23 13:30 Daniel Khan
  2003-01-23 22:24 ` Chris Wright
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel Khan @ 2003-01-23 13:30 UTC (permalink / raw)
  To: linux-kernel

Hello List,

this posting is the last try to get a solution for a problem I have since
summer last year.

Last summer we bought 2 servers for a cluster and run RedHat 7.2 on it.
We had frequently lockups (I will describe later) and decided to exchange
the hole
hardware. On the 2 new servers we installed RedHat 7.3 and the same problems
occured.
We always had the latest RedHat SMP kernels _now_ we are on
2.4.18-19.7.xsmp.

The lockup:
Mostly under higher load (cronjobs) the problem happens
If the problem occurs the remote shell is dropped. Also the local shell
doesn't accept any
commands.
All ports remain open.
All interfaces remain up.
A telnet to ports like pop3 result in a "connected" message but nothing
more.
Only courier IMAP gives a "Hello" message but this is all.
So there are no services reachable anymore.
The only solution is a hard reset.
There are no logfile entries that indicate any problem.
Seems as if syslog also crashes before.

The system:
glibc-2.2.5-42
heartbeat-0.4.9f-1
RedHat 7.3 2.4.18-19.7.xsmp
pam_ldap
nss_ldap-198-3
openldap-2.0.25-1

all 7 partitions ext3 with default params in an IDE hardware raid 5:
/dev/hda2              26G  5.4G   19G  22% /
/dev/hda1              45M   42M  2.1M  96% /boot
/dev/hda3              65G   12G   50G  19% /home
none                  756M     0  755M   0% /dev/shm
/dev/hda9             243M  4.8M  225M   3% /tmp
/dev/hda5              24G  159M   22G   1% /var/lib/mysql
/dev/hda6              20G  3.1G   16G  17% /var/log/httpd
/dev/hda7              13G   33M   12G   1% /var/spool/mail

The hardware:
IDE Raid controller: ACCUSYS ACS7610 B0W121 (on the first servers it was a
3ware)
2x 1.2 GHZ CPU's
1Gb Ram

What I found out:
As long as I don't do LDAP lookups in postfix the system seems to be stable

Other things:
I don't use nscd.
I have updated all packages (At least RedHat network tells me that
everything's fine.)

Why I think it's a kernel issue:
Well it may also be a problem in nss_ldap but I think that everything goes
back to the kernel
if the system get's locked down totally.
I am now thinking that it may be a problem in ext3 in conjunction with
nss_ldap.
But - as you allready found out I'm no kernel hacker at all. I am simply
trying to debug
and rule out since month's and now I'm really stuck.

I would be glad if somebody could help me on this problem or point me into
the right direction
cause the only thing I can search for is "RedHat crash","RedHat
freeze","RedHat ext3 crash"
as I simply don't know why and how this happens...

Would you please CC the message to me as I am not subscribed yet and please
don't shoot at me...

Thanks in advance - _AND_ thanks for your work on the linux kernel.

--
Daniel Khan


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Ferquently system lockups under load
  2003-01-23 13:30 Ferquently system lockups under load Daniel Khan
@ 2003-01-23 22:24 ` Chris Wright
  0 siblings, 0 replies; 2+ messages in thread
From: Chris Wright @ 2003-01-23 22:24 UTC (permalink / raw)
  To: Daniel Khan; +Cc: linux-kernel, linux-ha-dev

* Daniel Khan (dk@webcluster.at) wrote:
> 
> The system:
> glibc-2.2.5-42
> heartbeat-0.4.9f-1
> RedHat 7.3 2.4.18-19.7.xsmp

There is a lockup issue with heartbeat 0.4.9f and current RH kernels.
It is typically exeperienced with the start up of apphbd, and appears to
be with realtime scheduled tasks.  Lars has posted a list of debugging
info he'd like to see on the linux-ha-dev list.  Check the recent archives
<http://lists.community.tummy.com/mailman/listinfo/linux-ha-dev>.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2003-01-23 22:15 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-23 13:30 Ferquently system lockups under load Daniel Khan
2003-01-23 22:24 ` Chris Wright

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).