linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: 2.4.13 Mem Related Hangs
@ 2001-11-05 17:14 Jim Eshleman
  2001-11-13 15:54 ` Jim Eshleman
  0 siblings, 1 reply; 3+ messages in thread
From: Jim Eshleman @ 2001-11-05 17:14 UTC (permalink / raw)
  To: Jason Allen; +Cc: linux-kernel, linux-xfs

 > We have a 8 CPU/8GB Dell 8450 running 2.4.13 (NFS and XFS patches)
 > which hangs regularly.
 >
 > I'd say that the problem is memory related. What seems to occur is
 > that mem cache grows until physical mem is exhausted at which time
 > the system hangs.

   FWIW me too, on an 8-way 8.5GB (64GB HIGHMEM enabled) IBM Netfinity 
x370 (8500R) which functions as a production mail server.  I currently 
run 2.4.9 with XFS and it stays up for about a week under heavy load. 
2.4.13 lasted about 4 hours under light load until all memory was 
consumed by cache then it became unresponsive.

   2.4.13 on a 2-way 1GB (64GB HIGHMEM enabled) Netfinity x350 test box 
with the same kernel config and XFS works fine even under stress, so 
perhaps our problem is similar to the discussion on l-k "Google's mm 
problems"...

   Anything I can do to help please ask.

Jim


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 2.4.13 Mem Related Hangs
  2001-11-05 17:14 2.4.13 Mem Related Hangs Jim Eshleman
@ 2001-11-13 15:54 ` Jim Eshleman
  0 siblings, 0 replies; 3+ messages in thread
From: Jim Eshleman @ 2001-11-13 15:54 UTC (permalink / raw)
  To: Jim Eshleman; +Cc: Jason Allen, linux-kernel, linux-xfs

>   FWIW me too, on an 8-way 8.5GB (64GB HIGHMEM enabled) IBM Netfinity 
> x370 (8500R) which functions as a production mail server.  I currently 
> run 2.4.9 with XFS and it stays up for about a week under heavy load. 
> 2.4.13 lasted about 4 hours under light load until all memory was 
> consumed by cache then it became unresponsive.
> 
>   2.4.13 on a 2-way 1GB (64GB HIGHMEM enabled) Netfinity x350 test box 
> with the same kernel config and XFS works fine even under stress, so 
> perhaps our problem is similar to the discussion on l-k "Google's mm 
> problems"...


   Update: I'm unable to make 2.4.14 fail on the test box (running 
Cerberus, bonnie++ against two XFS volumes, and LTP simultaneously) but 
it melts-down just as 2.4.13 does on the big production box.  A short 
time after all memory is eaten by file cache, and under light load, the 
machine becomes unresponsive.  It took about five minutes to login at 
the console.  No error messages on the console or in syslog.  Here's 
some info, it's obvious in the vmstat output where the melt-down occurs:

   kernel config: http://www.lehigh.edu/~jce0/2.4.14-config
   bootup messages: http://www.lehigh.edu/~jce0/2.4.14-messages
   vmstat 60 output: http://www.lehigh.edu/~jce0/2.4.14-vmstat
   ver_linux output: http://www.lehigh.edu/~jce0/ver_linux.out

   This is linus 2.4.14 patched with linux-2.4.14-xfs-2001-11-06.patch 
and LVM 0.9.1_beta6, compiled with egcs-2.91.66.  It's a RH 7.1 system.

   I know Andrea and Marcelo? were testing and fixing some HIGHMEM 
things.  Were there any patches and did they make it into the Linus tree?

   Any assistance greatly appreciated.

Jim



^ permalink raw reply	[flat|nested] 3+ messages in thread

* 2.4.13 Mem Related Hangs
@ 2001-11-05 15:29 Jason Allen
  0 siblings, 0 replies; 3+ messages in thread
From: Jason Allen @ 2001-11-05 15:29 UTC (permalink / raw)
  To: linux-kernel

We have a 8 CPU/8GB Dell 8450 running 2.4.13 (NFS and XFS patches) which
hangs regularly. 

I'd say that the problem is memory related. What seems to occur is that
mem cache grows until physical mem is exhausted at which time the system
hangs.

This is the top screen that printed out seconds before the machine
became unresponsive.  I was typing in another window and it stopped
responding but this screen had just updated.


  4:13pm  up 2 days,  5:55,  5 users,  load average: 22.02, 19.27, 19.90
247 processes: 228 sleeping, 19 running, 0 zombie, 0 stopped
CPU0 states: 20.53% user, 79.8% system,  0.0% nice,  0.1% idle
CPU1 states:  5.46% user, 94.16% system,  0.0% nice,  0.1% idle
CPU2 states:  7.37% user, 71.45% system,  0.2% nice, 20.42% idle
CPU3 states:  4.40% user, 95.23% system,  0.0% nice,  0.0% idle
CPU4 states:  8.45% user, 91.18% system,  0.0% nice,  0.0% idle
CPU5 states:  5.38% user, 89.24% system,  0.1% nice,  4.63% idle
CPU6 states:  5.12% user, 87.18% system,  0.0% nice,  7.33% idle
CPU7 states:  6.55% user, 88.41% system,  0.1% nice,  4.29% idle
Mem:  8246096K av, 8233888K used,   12208K free,       0K shrd,       0K
buff
Swap: 8371784K av,       0K used, 8371784K free                 7441040K
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 7516 d0relmgr  14   0 43296  42M  2580 R    58.9  0.5   2:34 kfront
 8626 d0relmgr  13   0 29276  28M  2180 R    15.4  0.3   0:48 kfront
 8893 d0relmgr  10   0 25732  25M  2180 R    15.7  0.3   0:36 kfront
 9241 d0relmgr  10   0 20824  20M  2180 R    15.6  0.2   0:13 kfront
 9112 d0relmgr  11   0 20180  19M  2180 R    15.1  0.2   0:17 kfront
 9230 d0relmgr  10   0 19596  19M  2180 R    16.0  0.2   0:14 kfront
 9448 d0relmgr  12   0 16456  16M  2180 R    17.0  0.1   0:13 kfront
 9487 d0relmgr  15   0 14292  13M  2180 D    14.9  0.1   0:13 kfront
 9800 d0relmgr  15   0  8744 8744  2160 R    10.7  0.1   0:06 cc1
 3408 d0relmgr   9   0  4888 4888   580 S     0.0  0.0   0:03 gmake
27680 d0relmgr   8   0  4380 4380   572 S     0.0  0.0   0:03 gmake
30697 d0relmgr   9   0  4348 4348   568 S     0.0  0.0   0:00 gmake
31964 d0relmgr   9   0  4348 4348   568 S     0.0  0.0   0:00 gmake
30499 d0relmgr   9   0  4344 4344   568 S     0.0  0.0   0:00 gmake
 3609 d0relmgr   9   0  3840 3840   580 S     0.0  0.0   0:02 gmake
 2987 d0relmgr   9   0  3600 3600   600 S     0.0  0.0   0:02 gmake
  911 xfs        9   0  3360 3360   916 S     0.0  0.0   0:00 xfs
 9014 d0relmgr   9   0  3360 3356  1772 S     0.0  0.0   0:24 xterm
 9757 d0relmgr  13   0  2532 2532  1776 R    17.7  0.0   0:11
kfront-thin
 9016 d0relmgr   8   0  2416 2416   984 S     0.0  0.0   0:06 tcsh
22852 d0relmgr   8   0  2052 2052  1132 S     0.0  0.0   0:13 tcsh
 9102 boyd       9   0  2000 2000  1128 S     0.1  0.0   0:02 tcsh
  572 root       9   0  1988 1988  1728 S     0.0  0.0  12:47 ntpd
  850 root       9   0  1968 1968  1424 S     0.0  0.0   0:22 sendmail
 8220 boyd       9   0  1924 1924  1064 S     0.0  0.0   0:30 tcsh
 8885 d0relmgr   9   0  1808 1808   600 S     0.0  0.0   0:01 gmake
 7598 d0relmgr   9   0  1744 1744   600 S     0.0  0.0   0:00 gmake

This is from a few seconds before:

d0lomite 4:13pm ~ 5 > cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  8444002304 8418140160 25862144        0        0 7619543040
Swap: 8572706816        0 8572706816
MemTotal:      8246096 kB
MemFree:         25256 kB
MemShared:           0 kB
Buffers:             0 kB
Cached:        7440960 kB
SwapCached:          0 kB
Active:        5300072 kB
Inactive:      2140888 kB
HighTotal:     7471104 kB
HighFree:         2044 kB
LowTotal:       774992 kB
LowFree:         23212 kB
SwapTotal:     8371784 kB
SwapFree:      8371784 kB


Any assistance or words of wisdom would be appreciated.

Jason Allen






^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2001-11-13 15:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-05 17:14 2.4.13 Mem Related Hangs Jim Eshleman
2001-11-13 15:54 ` Jim Eshleman
  -- strict thread matches above, loose matches on Subject: below --
2001-11-05 15:29 Jason Allen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).