linux-kernel.vger.kernel.org archive mirror
* What version of the kernel fixes these VM issues?
@ 2001-08-24  8:47 Anwar P
  2001-08-24 13:18 ` Daniel Phillips
  0 siblings, 1 reply; 16+ messages in thread
From: Anwar P @ 2001-08-24  8:47 UTC (permalink / raw)
  To: linux-kernel

Hi -
 
We have a big system (8 processors, 8GB RAM) running Oracle and an ETL tool.  Oracle is up and running all the time, and the ETL tool runs once a day.  But every time the ETL tool runs (along with Oracle), the system seems to run out of memory and the server slows to a crawl, often with keyboard response at 10 to 15 minute intervals.  We are currently using the 2.4.3-6 kernel that comes with Red Hat 7.1.

We know that Oracle consumes no more than 2GB of memory at peak usage, and the ETL tool itself consumes less than 1GB.  But the ETL tool does process a whole bunch of text files (about 6GB in total), and it runs for about 2 hours.  What happens is that while they are both running, the filesystem cache grows progressively, and some time later the system begins swapping.  We do have 16GB (2x RAM) of swap.  Once it starts to swap, the server responds to keystrokes/commands erratically and appears dead for tens of minutes.  We know that together our applications do not need more than 4GB of RAM on this 8GB box, so it is the VM that is causing this unnecessary swapping by trying to use too much memory for filesystem cache.

So the first question is: is there any way I can limit the amount of memory used for FS cache?

And the next one is: are there any (later) versions of the kernel that are more sane about the maximum amount of memory they use for FS cache?  It is strange that the FS caching does not take into account the amount of physical RAM on the box.  What is the point of FS caching when the end result is thrashing and the machine becomes unusable?
 
Anwar.



* Re: What version of the kernel fixes these VM issues?
  2001-08-24  8:47 What version of the kernel fixes these VM issues? Anwar P
@ 2001-08-24 13:18 ` Daniel Phillips
  2001-08-24 18:14   ` Nicolas Pitre
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Phillips @ 2001-08-24 13:18 UTC (permalink / raw)
  To: Anwar P, linux-kernel

On August 24, 2001 10:47 am, Anwar P wrote:
> We have a big system (8 processors, 8GB RAM) running Oracle and an ETL
> tool.  Oracle is up and running all the time, and the ETL tool runs
> once a day.  But every time the ETL tool runs (along with Oracle), the
> system seems to run out of memory and the server slows to a crawl, often
> with keyboard response at 10 to 15 minute intervals.  We are currently
> using the 2.4.3-6 kernel that comes with Red Hat 7.1.
>
> We know that Oracle consumes no more than 2GB of memory at peak usage, and
> the ETL tool itself consumes less than 1GB.  But the ETL tool does process
> a whole bunch of text files (about 6GB in total), and it runs for
> about 2 hours.  What happens is that while they are both running, the
> filesystem cache grows progressively, and some time later the system
> begins swapping.  We do have 16GB (2x RAM) of swap.  Once it starts to
> swap, the server responds to keystrokes/commands erratically and appears
> dead for tens of minutes.  We know that together our applications do not
> need more than 4GB of RAM on this 8GB box, so it is the VM that is causing
> this unnecessary swapping by trying to use too much memory for filesystem
> cache.
  
There is no way your system should be going into swap under these conditions 
- it's a bug.  We have probably fixed this already.
  
> So the first question is: is there any way I can limit the amount of memory
> used for FS cache?

Um, no, sorry.  This should not be necessary.

> And the next one is: are there any (later) versions of the kernel that are
> more sane about the maximum amount of memory they use for FS cache?

Please try 2.4.9 and 2.4.8-ac10.  If the system slows down, look in your logs 
and see if there are any "allocation failed" messages.  Use top or do watch 
cat /proc/meminfo to be sure your system isn't going into swap, and please 
let us know what happens.
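
A minimal sketch of the two checks suggested above, assuming a /proc/meminfo with "Name: value kB" lines that include SwapTotal and SwapFree. The function names, the sample meminfo text and the sample log lines are hypothetical, made up for illustration; they are not output from the system being discussed.

```python
def parse_meminfo(text):
    """Return {field: kB} for lines shaped like 'Name:   12345 kB'.

    Lines in any other shape (for example summary rows in bytes) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap in use is total swap minus free swap."""
    return fields["SwapTotal"] - fields["SwapFree"]


def allocation_failures(log_lines):
    """Return the log lines that mention a failed allocation."""
    return [line for line in log_lines if "allocation failed" in line]


# Hypothetical sample data, only to exercise the functions.
SAMPLE_MEMINFO = """\
MemTotal:      8257536 kB
MemFree:         20480 kB
Cached:        6291456 kB
SwapTotal:    16777216 kB
SwapFree:     15728640 kB
"""

SAMPLE_LOG = [
    "kernel: eth0: link up",
    "kernel: __alloc_pages: 0-order allocation failed.",
]

info = parse_meminfo(SAMPLE_MEMINFO)
print("swap in use (kB):", swap_used_kb(info))
print("failed allocations logged:", len(allocation_failures(SAMPLE_LOG)))
```

On a live system the same functions could be fed the contents of /proc/meminfo and the kernel log; a swap-in-use figure that keeps growing while the cache is large is the symptom described in the report.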

--
Daniel


* Re: What version of the kernel fixes these VM issues?
  2001-08-24 13:18 ` Daniel Phillips
@ 2001-08-24 18:14   ` Nicolas Pitre
  2001-08-24 18:25     ` Nicolas Pitre
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Nicolas Pitre @ 2001-08-24 18:14 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Anwar P, linux-kernel



On Fri, 24 Aug 2001, Daniel Phillips wrote:

> On August 24, 2001 10:47 am, Anwar P wrote:
> > We have a big system (8 processors, 8GB RAM) running Oracle and an ETL
> > tool.  Oracle is up and running all the time, and the ETL tool runs
> > once a day.  But every time the ETL tool runs (along with Oracle), the
> > system seems to run out of memory and the server slows to a crawl, often
> > with keyboard response at 10 to 15 minute intervals.  We are currently
> > using the 2.4.3-6 kernel that comes with Red Hat 7.1.
> >
> > We know that Oracle consumes no more than 2GB of memory at peak usage, and
> > the ETL tool itself consumes less than 1GB.  But the ETL tool does process
> > a whole bunch of text files (about 6GB in total), and it runs for
> > about 2 hours.  What happens is that while they are both running, the
> > filesystem cache grows progressively, and some time later the system
> > begins swapping.  We do have 16GB (2x RAM) of swap.  Once it starts to
> > swap, the server responds to keystrokes/commands erratically and appears
> > dead for tens of minutes.  We know that together our applications do not
> > need more than 4GB of RAM on this 8GB box, so it is the VM that is causing
> > this unnecessary swapping by trying to use too much memory for filesystem
> > cache.
>
> There is no way your system should be going into swap under these conditions
> - it's a bug.  We have probably fixed this already.
>
> Please try 2.4.9 and 2.4.8-ac10.  If the system slows down, look in your logs
> and see if there are any "allocation failed" messages.  Use top or do watch
> cat /proc/meminfo to be sure your system isn't going into swap, and please
> let us know what happens.

I have a totally different setup but I can reproduce the same behavior on
the system I have here:

ARM board with 32 MB RAM, no flash, NFS root.
The kernel is based on 2.4.8-ac9 plus some small VM fixes from -ac10.

My test consists of compiling gcc 3.0 while some MP3s are continuously playing
in the background.  The gcc build goes pretty far along until both the mp3
player and the gcc build completely jam.  Oh maybe not completely as I get
about 100ms of audio playing every 10 secs.  bash starts echoing what I type
one char per approx 5 sec.  The only thing that still works fine is the
magic sysrq that clearly shows that the CPU is spinning in the VM code. NFS
traffic is also going on full bandwidth but no progress ever happens in user
space.

My console is on a serial port so if someone can send me a patch with
whatever printks to trace what's happening in real time I'll be glad to
provide the trace file.  Reaching the jam state takes about 5 minutes so
it's not hard to reproduce.


Nicolas



* Re: What version of the kernel fixes these VM issues?
  2001-08-24 18:14   ` Nicolas Pitre
@ 2001-08-24 18:25     ` Nicolas Pitre
  2001-08-24 18:48       ` Mark Frazer
  2001-08-24 19:53     ` Daniel Phillips
  2001-08-24 19:56     ` Daniel Phillips
  2 siblings, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2001-08-24 18:25 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Anwar P, linux-kernel



On Fri, 24 Aug 2001, Nicolas Pitre wrote:

> I have a totally different setup but I can reproduce the same behavior on
> the system I have here:
>
> ARM board with 32 MB RAM, no flash, NFS root.

Sorry I meant no swap.


Nicolas



* Re: What version of the kernel fixes these VM issues?
  2001-08-24 18:25     ` Nicolas Pitre
@ 2001-08-24 18:48       ` Mark Frazer
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Frazer @ 2001-08-24 18:48 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Daniel Phillips, Anwar P, linux-kernel

Hi Nicolas.  You should run vmstat and watch the paging activity.
We've seen our ARM boards get driven to a standstill by paging when
they run out of RAM.  We have no swap either.  To make things worse,
our flash loads are compressed, so we burn all our CPU in decompression
when paging back in.  If you have no swap, the only thing that can be
booted out are executable pages.

Kernel folk:  do /proc/sys/vm/{pagecache,buffermem} still do anything?
Could Nicolas still limit the pagecache using these?

cheers
-mark

Nicolas Pitre <nico@cam.org> [01/08/24 14:38]:
> 
> 
> On Fri, 24 Aug 2001, Nicolas Pitre wrote:
> 
> > I have a totally different setup but I can reproduce the same behavior on
> > the system I have here:
> >
> > ARM board with 32 MB RAM, no flash, NFS root.
> 
> Sorry I meant no swap.
> 
> 
> Nicolas
> 


* Re: What version of the kernel fixes these VM issues?
  2001-08-24 18:14   ` Nicolas Pitre
  2001-08-24 18:25     ` Nicolas Pitre
@ 2001-08-24 19:53     ` Daniel Phillips
  2001-08-24 19:56     ` Daniel Phillips
  2 siblings, 0 replies; 16+ messages in thread
From: Daniel Phillips @ 2001-08-24 19:53 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Anwar P, linux-kernel

On August 24, 2001 08:14 pm, Nicolas Pitre wrote:
> On Fri, 24 Aug 2001, Daniel Phillips wrote:

> > Please try 2.4.9 and 2.4.8-ac10.  If the system slows down, look in your logs
> > and see if there are any "allocation failed" messages.  Use top or do watch
> > cat /proc/meminfo to be sure your system isn't going into swap, and please
> > let us know what happens.
> 
> I have a totally different setup but I can reproduce the same behavior on
> the system I have here:
> 
> ARM board with 32 MB RAM, no flash, NFS root.
> The kernel is based on 2.4.8-ac9 plus some small VM fixes from -ac10.

What happens with 2.4.9?

> My test consists of compiling gcc 3.0 while some MP3s are continuously playing
> in the background.  The gcc build goes pretty far along until both the mp3
> player and the gcc build completely jam.  Oh maybe not completely as I get
> about 100ms of audio playing every 10 secs.  bash starts echoing what I type
> one char per approx 5 sec.  The only thing that still works fine is the
> magic sysrq that clearly shows that the CPU is spinning in the VM code. NFS
> traffic is also going on full bandwidth but no progress ever happens in user
> space.
> 
> My console is on a serial port so if someone can send me a patch with
> whatever printks to trace what's happening in real time I'll be glad to
> provide the trace file.  Reaching the jam state takes about 5 minutes so
> it's not hard to reproduce.

Try:

	watch cat /proc/meminfo

--
Daniel


* Re: What version of the kernel fixes these VM issues?
  2001-08-24 18:14   ` Nicolas Pitre
  2001-08-24 18:25     ` Nicolas Pitre
  2001-08-24 19:53     ` Daniel Phillips
@ 2001-08-24 19:56     ` Daniel Phillips
  2001-08-24 20:12       ` Nicolas Pitre
  2 siblings, 1 reply; 16+ messages in thread
From: Daniel Phillips @ 2001-08-24 19:56 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Anwar P, linux-kernel

On August 24, 2001 08:14 pm, Nicolas Pitre wrote:
> I have a totally different setup but I can reproduce the same behavior on
> the system I have here:
> 
> ARM board with 32 MB RAM, no flash, NFS root.
> The kernel is based on 2.4.8-ac9 plus some small VM fixes from -ac10.
> 
> My test consists of compiling gcc 3.0 while some MP3s are continuously playing
> in the background.  The gcc build goes pretty far along until both the mp3
> player and the gcc build completely jam.

Which sound system, and which sound card driver?

> Oh maybe not completely as I get
> about 100ms of audio playing every 10 secs.  bash starts echoing what I type
> one char per approx 5 sec.  The only thing that still works fine is the
> magic sysrq that clearly shows that the CPU is spinning in the VM code. NFS
> traffic is also going on full bandwidth but no progress ever happens in user
> space.

--
Daniel


* Re: What version of the kernel fixes these VM issues?
  2001-08-24 19:56     ` Daniel Phillips
@ 2001-08-24 20:12       ` Nicolas Pitre
  2001-08-24 22:35         ` Daniel Phillips
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2001-08-24 20:12 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Anwar P, linux-kernel



On Fri, 24 Aug 2001, Daniel Phillips wrote:

> On August 24, 2001 08:14 pm, Nicolas Pitre wrote:
> > I have a totally different setup but I can reproduce the same behavior on
> > the system I have here:
> >
> > ARM board with 32 MB RAM, no flash, NFS root.
> > The kernel is based on 2.4.8-ac9 plus some small VM fixes from -ac10.
> >
> > My test consists of compiling gcc 3.0 while some MP3s are continuously playing
> > in the background.  The gcc build goes pretty far along until both the mp3
> > player and the gcc build completely jam.
>
> Which sound system, and which sound card driver?

The driver is for the UDA1341 on a SA1110 chip written by myself.  It is
fully OSS compliant, no ALSA.


Nicolas



* Re: What version of the kernel fixes these VM issues?
  2001-08-24 20:12       ` Nicolas Pitre
@ 2001-08-24 22:35         ` Daniel Phillips
  2001-08-25  3:00           ` Nicolas Pitre
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Phillips @ 2001-08-24 22:35 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Anwar P, linux-kernel

On August 24, 2001 10:12 pm, Nicolas Pitre wrote:
> On Fri, 24 Aug 2001, Daniel Phillips wrote:
> 
> > On August 24, 2001 08:14 pm, Nicolas Pitre wrote:
> > > I have a totally different setup but I can reproduce the same behavior on
> > > the system I have here:
> > >
> > > ARM board with 32 MB RAM, no flash, NFS root.
> > > The kernel is based on 2.4.8-ac9 plus some small VM fixes from -ac10.
> > >
> > > My test consists of compiling gcc 3.0 while some MP3s are continuously playing
> > > in the background.  The gcc build goes pretty far along until both the mp3
> > > player and the gcc build completely jam.
> >
> > Which sound system, and which sound card driver?
> 
> The driver is for the UDA1341 on a SA1110 chip written by myself.  It is
> fully OSS compliant, no ALSA.

Your system should be able to handle that easily.  Do you have some meminfo
output to look at?  What about 2.4.9?

--
Daniel



* Re: What version of the kernel fixes these VM issues?
  2001-08-24 22:35         ` Daniel Phillips
@ 2001-08-25  3:00           ` Nicolas Pitre
  2001-08-25  4:31             ` Steve Kieu
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Nicolas Pitre @ 2001-08-25  3:00 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: lkml



On Sat, 25 Aug 2001, Daniel Phillips wrote:

> On August 24, 2001 10:12 pm, Nicolas Pitre wrote:
> > On Fri, 24 Aug 2001, Daniel Phillips wrote:
> >
> > > On August 24, 2001 08:14 pm, Nicolas Pitre wrote:
> > > > I have a totally different setup but I can reproduce the same behavior on
> > > > the system I have here:
> > > >
> > > > ARM board with 32 MB RAM, no cache, NFS root.
> > > > The kernel is based on 2.4.8-ac9 plus some small VM fixes from -ac10.
> > > >
> > > > My test consists of compiling gcc 3.0 while some MP3s are continuously playing
> > > > in the background.  The gcc build goes pretty far along until both the mp3
> > > > player and the gcc build completely jam.
> > >
> > > Which sound system, and which sound card driver?
> >
> > The driver is for the UDA1341 on a SA1110 chip written by myself.  It is
> > fully OSS compliant, no ALSA.
>
> Your system should be able to handle that easily.  Do you have some meminfo
> output to look at?  What about 2.4.9?

OK now I have some comparative data.  Two kernels:

1) based on 2.4.8-ac9 + Riel's VM fixes from -ac10
2) based on 2.4.9 with no extra VM patches


The test:
=========

This board has 32MB RAM, no swap.  Root fs is NFS.  On the serial console I
start a command line mp3 player.  In a first telnet session I start a build
of gcc 3.0 (./configure; make).  In a second telnet session I start 'top'.
Music plays while gcc builds and I can see the CPU usage within top.
Pretty real scenario, real life situation, expected system load, no trick.


2.4.8-ac9 plus -ac10 VM fixes:
==============================

Everything looks fine for a while.  But all of a sudden: no more music, the
gcc build stalls, system looks dead.  In my first description of the problem
quoted above I said that the audio was spitting glitches of 100 ms every 10
secs or so.  This time there is nothing.  The only difference is the
addition of 'top' in the process list.  Even the telnet sessions aren't
echoing keystrokes anymore.  Only the serial console echoes characters so
kernel BH's are still running.  Sysrq on the serial console works
fortunately.  kswapd is looping like crazy:

kswapd        R C0023CC0     0     4      0             5     3 (L-TLB)

At first NFS traffic occupied the full network bandwidth to page stuff in.
But after a while the following was printed to the console:

nfs: task 41867 can't get a request slot
nfs: task 41868 can't get a request slot
nfs: task 41869 can't get a request slot

and then no more NFS packet on the network.  It is possible to ping the box
which means that the net BH still works.

A couple sysrq-P at random intervals shows the CPU looping in the following
functions:

PC value	System.map
--------	----------
c0040d84	zone_inactive_plenty
c0041024	try_to_swap_out
c00216e0	cpu_sa1100_cache_clean_invalidate_range
c00216d0	cpu_sa1100_cache_clean_invalidate_range
c0041304	swap_out_mm
c0041168	swap_out_pmd
c0044324	__get_swap_page
c0040d60	zone_inactive_plenty
c0041128	swap_out_pmd
c0040fec	try_to_swap_out

Sysrq-M gives:

SysRq : Show Memory
Mem-info:
Free pages:        1012kB (     0kB HighMem)
( Active: 2007, inactive_dirty: 0, inactive_clean: 0, free: 253 (255 510 765) )
3*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB = 1012kB)
= 0kB)
= 0kB)
Swap cache: add 0, delete 0, find 0/0
Free swap:            0kB
8192 pages of RAM
393 free pages
636 reserved pages
33 pages shared
0 pages swap cached
6 page tables cached
Buffer memory:     8000kB
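
The figures in this dump can be cross-checked with a little arithmetic. The sketch below is a minimal illustration: the helper name is hypothetical, and the 4kB page size is inferred from "8192 pages of RAM" on a 32MB board rather than stated anywhere in the output.

```python
import re

PAGE_KB = 4  # inferred: 32MB of RAM reported as 8192 pages


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of a SysRq-M free-list line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


# Free-list line copied from the 2.4.8-ac9 dump above.
FREE_LIST = ("3*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB "
             "1*512kB 0*1024kB 0*2048kB = 1012kB)")

print("free list total (kB):", buddy_total_kb(FREE_LIST))
print("free: 253 pages (kB):", 253 * PAGE_KB)
print("RAM: 8192 pages (MB):", 8192 * PAGE_KB // 1024)
print("Active: 2007 pages (kB):", 2007 * PAGE_KB)
```

The free-list terms add up to the "Free pages" total, and the page counts match the 32MB of RAM on the board, so the dump is internally consistent: roughly 1MB is free and nothing sits on either inactive list.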


2.4.9 kernel:
=============

Unlike the first (quoted) run, this kernel completely stalled when the jam
conditions were reached just like the run described above.  I mean here
there is no audio stuttering at all, no echo from telnet sessions, nothing
in user space gets to run anymore.  kswapd is well locked in the R state:

kswapd        R C0022CA0     0     4      0             5     3 (L-TLB)

Kernel interrupts and BHs still work i.e. I can ping the box, the serial
console still echoes characters (kernel termios), and sysrq works but that's
all.  What's also interesting here is the fact that there is absolutely no
NFS traffic going on.

A couple sysrq-P at random intervals shows the CPU looping in the following
functions:

PC value	System.map
--------	----------
c003f6e0	zone_inactive_plenty
c003fa58	swap_out_pmd
c0060324	prune_icache
c003f6fc	zone_inactive_plenty
c003faac	swap_out_pmd
c00209a0	cpu_sa1100_set_pte
c003fc80	swap_out_mm
c0040c10	refill_inactive_scan
c00206c0	cpu_sa1100_cache_clean_invalidate_range
c003fa90	swap_out_pmd
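
A quick tally of these samples shows where the CPU was caught most often. This is only a sketch: the helper name is hypothetical, and ten samples are far too few for anything beyond a rough impression.

```python
from collections import Counter

# PC samples copied from the 2.4.9 sysrq-P table above.
SAMPLES_249 = """\
c003f6e0 zone_inactive_plenty
c003fa58 swap_out_pmd
c0060324 prune_icache
c003f6fc zone_inactive_plenty
c003faac swap_out_pmd
c00209a0 cpu_sa1100_set_pte
c003fc80 swap_out_mm
c0040c10 refill_inactive_scan
c00206c0 cpu_sa1100_cache_clean_invalidate_range
c003fa90 swap_out_pmd
"""


def tally(samples):
    """Count how many samples landed in each function."""
    return Counter(line.split()[1] for line in samples.strip().splitlines())


counts = tally(SAMPLES_249)
for name, hits in counts.most_common():
    print(f"{hits:2d}  {name}")
```

Grouping the samples by function makes it easier to compare the two kernels: in both tables most hits fall in the swap_out_* and zone_inactive_plenty paths.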

Sysrq-M gives:

SysRq: Show Memory
Mem-info:
Free pages:        1008kB (     0kB HighMem)
( Active: 2546, inactive_dirty: 63, inactive_clean: 0, free: 252 (255 510 765) )
0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB = 1008kB)
= 0kB)
= 0kB)
Swap cache: add 0, delete 0, find 0/0
Free swap:            0kB
8192 pages of RAM
392 free pages
602 reserved pages
576 pages shared
0 pages swap cached
8 page tables cached
Buffer memory:     8000kB


Notes:
======

- Both kernels go flat after approx 14 min of mp3 playback.
- With enough retries (i.e. reboots) the gcc build succeeds, so there is no
  lack of resources in theory.
- Sending a signal to the mp3 player (^C on the serial console) doesn't
  change anything.
- The only way out is the reset button.
- Both kernels (with or without -ac) deadlock.
- The same behavior occurs with 2.4.8-ac4.


Hope this helps somehow.  If I can provide anything else please ask.


Nicolas



* Re: What version of the kernel fixes these VM issues?
  2001-08-25  3:00           ` Nicolas Pitre
@ 2001-08-25  4:31             ` Steve Kieu
  2001-08-25  8:09             ` Russell King
  2001-08-25 16:38             ` Daniel Phillips
  2 siblings, 0 replies; 16+ messages in thread
From: Steve Kieu @ 2001-08-25  4:31 UTC (permalink / raw)
  To: kernel

--- Nicolas Pitre <nico@cam.org> wrote:
>
> On Sat, 25 Aug 2001, Daniel Phillips wrote:
>
> > On August 24, 2001 10:12 pm, Nicolas Pitre wrote:
> > > On Fri, 24 Aug 2001, Daniel Phillips wrote:
> > >
> > > > On August 24, 2001 08:14 pm, Nicolas Pitre wrote:
> > > > > I have a totally different setup but I can reproduce the same behavior on
> > > > > the system I have here:
> > > > >
> > > > > ARM board with 32 MB RAM, no cache, NFS root.
> > > > > The kernel is based on 2.4.8-ac9 plus some small VM fixes from -ac10.
> > > > >
> > > > > My test consists of compiling gcc 3.0 while some MP3s are continuously playing
> > > > > in the background.  The gcc build goes pretty far along until both the mp3
> > > > > player and the gcc build completely jam.
> > > >
> > > > Which sound system, and which sound card driver?
> > >
> > > The driver is for the UDA1341 on a SA1110 chip written by myself.  It is
> > > fully OSS compliant, no ALSA.
> >
> > Your system should be able to handle that easily.  Do you have some meminfo
> > output to look at?  What about 2.4.9?
>
> OK now I have some comparative data.  Two kernels:
>
> 1) based on 2.4.8-ac9 + Riel's VM fixes from -ac10
> 2) based on 2.4.9 with no extra VM patches
>
>
> The test:
> =========
>
> This board has 32MB RAM, no swap.  Root fs is NFS.  On the serial console I
> start a command line mp3 player.  In a first telnet session I start a build
> of gcc 3.0 (./configure; make).  In a second telnet session I start 'top'.
> Music plays while gcc builds and I can see the CPU usage within top.
> Pretty real scenario, real life situation, expected system load, no trick.

Strangely enough, I cannot reproduce this on my computer in nearly the
same situation as you describe.

Apart from the NFS root file system, I started several rxvt windows:
playing music with mpg123, compiling the kernel, browsing the net with
mozilla, and running top in one rxvt.  Kernel 2.4.9 is fine; 2.4.6 is
fine too.

My computer is a 686 with 128MB RAM and only 72MB swap.

Oops, the 2.4.9 kernel has one small VM fix patch applied (just one line).

=====
S.KIEU



* Re: What version of the kernel fixes these VM issues?
  2001-08-25  3:00           ` Nicolas Pitre
  2001-08-25  4:31             ` Steve Kieu
@ 2001-08-25  8:09             ` Russell King
  2001-08-25 16:38               ` Daniel Phillips
  2001-08-25 16:38             ` Daniel Phillips
  2 siblings, 1 reply; 16+ messages in thread
From: Russell King @ 2001-08-25  8:09 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Daniel Phillips, lkml

On Fri, Aug 24, 2001 at 11:00:05PM -0400, Nicolas Pitre wrote:
> 6 page tables cached

Although this won't help the basic problem, there could be as much as 100K
cached in those page tables.  I wonder if we could hook the pgt cache into
the VM cache shrinking, so we can free most, if not all of this cache
(rather than it being in the idle loop only).

I'll look into it, produce a patch, but I'm not a VM hacker.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: What version of the kernel fixes these VM issues?
  2001-08-25  3:00           ` Nicolas Pitre
  2001-08-25  4:31             ` Steve Kieu
  2001-08-25  8:09             ` Russell King
@ 2001-08-25 16:38             ` Daniel Phillips
  2001-08-25 18:29               ` Nicolas Pitre
  2 siblings, 1 reply; 16+ messages in thread
From: Daniel Phillips @ 2001-08-25 16:38 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: lkml

On August 25, 2001 05:00 am, Nicolas Pitre wrote:
> This board has 32MB RAM, no swap.  Root fs is NFS.  On the serial console I
> start a command line mp3 player.  In a first telnet session I start a build
> of gcc 3.0 (./configure; make).  In a second telnet session I start 'top'.
> Music plays while gcc builds and I can see the CPU usage within top.
> Pretty real scenario, real life situation, expected system load, no trick.

You're streaming the mp3 over nfs, right?  From your setup I'd guess there's 
no local hard disk.

> nfs: task 41867 can't get a request slot
> nfs: task 41868 can't get a request slot
> nfs: task 41869 can't get a request slot

Uhuh.  Would you please look in your logs for "allocation failed" messages?
(Side note: reading the nfs code now... to whose eyes are names like tk_rqstp 
beautiful?)

> ( Active: 2007, inactive_dirty: 0, inactive_clean: 0, free: 253 (255 510 765) )

Whoops, nothing inactive but kswapd going full blast.  We're getting warmer.

> 2.4.9 kernel:
> =============
> 
> Unlike the first (quoted) run, this kernel completely stalled when the jam
> conditions were reached just like the run described above.  I mean here
> there is no audio stuttering at all, no echo from telnet sessions, nothing
> in user space gets to run anymore.

Yes, a slight difference, however they are both wedged in the same way, from 
your task samples:

> PC value	System.map
> --------	----------
> c003f6e0	zone_inactive_plenty
> c003fa58	swap_out_pmd
> c0060324	prune_icache
> c003f6fc	zone_inactive_plenty
> c003faac	swap_out_pmd
> c00209a0	cpu_sa1100_set_pte
> c003fc80	swap_out_mm
> c0040c10	refill_inactive_scan
> c00206c0	cpu_sa1100_cache_clean_invalidate_range
> c003fa90	swap_out_pmd

> Kernel interrupts and BHs still work i.e. I can ping the box, the serial
> console still echoes characters (kernel termios), and sysrq works but that's
> all.  What's also interesting here is the fact that there is absolutely no
> NFS traffic going on.

That's understandable.  Everything that needs to allocate memory is wedged.

> - The same behavior occurs with 2.4.8-ac4.

How far back do you have to go before you get one that works?  I seem to 
recall the inactive_plenty changes came in at 2.4.8-pre1.  Could you try it 
with 2.4.7, please.

--
Daniel


* Re: What version of the kernel fixes these VM issues?
  2001-08-25  8:09             ` Russell King
@ 2001-08-25 16:38               ` Daniel Phillips
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Phillips @ 2001-08-25 16:38 UTC
  To: Russell King, Nicolas Pitre; +Cc: lkml

On August 25, 2001 10:09 am, Russell King wrote:
> On Fri, Aug 24, 2001 at 11:00:05PM -0400, Nicolas Pitre wrote:
> > 6 page tables cached
> 
> Although this won't help the basic problem, there could be as much as 100K
> cached in those page tables.  I wonder if we could hook the pgt cache into
> the VM cache shrinking, so we can free most, if not all of this cache
> (rather than it being in the idle loop only).
> 
> I'll look into it, produce a patch, but I'm not a VM hacker.

You know what a pte is so you're a VM hacker ;-)

--
Daniel


* Re: What version of the kernel fixes these VM issues?
  2001-08-25 16:38             ` Daniel Phillips
@ 2001-08-25 18:29               ` Nicolas Pitre
  2001-08-25 18:51                 ` Russell King
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2001-08-25 18:29 UTC
  To: Daniel Phillips; +Cc: lkml



On Sat, 25 Aug 2001, Daniel Phillips wrote:

> On August 25, 2001 05:00 am, Nicolas Pitre wrote:
> > This board has 32MB RAM, no swap.  Root fs is NFS.  On the serial console I
> > start a command line mp3 player.  In a first telnet session I start a build
> > of gcc 3.0 (./configure; make).  In a second telnet session I start 'top'.
> > Music plays while gcc builds and I can see the CPU usage within top.
> > Pretty real scenario, real life situation, expected system load, no trick.
>
> You're streaming the mp3 over nfs, right?  From your setup I'd guess there's
> no local hard disk.

No there isn't.

> > - The same behavior occurs with 2.4.8-ac4.
>
> How far back do you have to go before you get one that works?  I seem to
> recall the inactive_plenty changes came in at 2.4.8-pre1.  Could you try it
> with 2.4.7, please.

2.4.7 does the same, spinning in kswapd while everything else is stalled,
except for kernel BHs.


SysRq: Show Memory
Mem-info:
Free pages:        1016kB (     0kB HighMem)
( Active: 2554, inactive_dirty: 0, inactive_clean: 0, free: 254 (255 510 765) )
4*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB = 1016kB)
= 0kB)
= 0kB)
Swap cache: add 0, delete 0, find 0/0
Free swap:            0kB
8192 pages of RAM
395 free pages
626 reserved pages
581 pages shared
0 pages swap cached
3 page tables cached
Buffer memory:     8000kB


So 2.4.7 was screwed the same way too.
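
As a cross-check of the SysRq output above (a sketch, not kernel code): each
"N*SkB" term counts free buddy blocks of one size, so summing them should
reproduce the reported total.  The array and function names below are made up
for illustration, and the 4kB page size is read off the first term of the
output.

```c
/* Free-block counts per order, copied from the SysRq "Show Memory" line:
 * 4*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB
 */
static const unsigned int free_blocks[] = { 4, 1, 0, 1, 1, 1, 1, 1, 0, 0 };

#define PAGE_KB 4u	/* assumed page size in kB, taken from the 4kB term */

/* Sum counts[order] * (page size << order) over all orders, in kB. */
static unsigned int free_kb(const unsigned int *counts, unsigned int orders)
{
	unsigned int order, total = 0;

	for (order = 0; order < orders; order++)
		total += counts[order] * (PAGE_KB << order);
	return total;
}
```

Called as free_kb(free_blocks, 10), this gives 1016kB, i.e. 254 pages of 4kB,
which agrees with the "free: 254" figure on the Active/inactive line.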


Nicolas



* Re: What version of the kernel fixes these VM issues?
  2001-08-25 18:29               ` Nicolas Pitre
@ 2001-08-25 18:51                 ` Russell King
  0 siblings, 0 replies; 16+ messages in thread
From: Russell King @ 2001-08-25 18:51 UTC
  To: Nicolas Pitre; +Cc: Daniel Phillips, lkml

Ok, I'm saying this for the record, Nico already knows the following ;)

On Sat, Aug 25, 2001 at 02:29:00PM -0400, Nicolas Pitre wrote:
> SysRq: Show Memory
> Mem-info:
> Free pages:        1016kB (     0kB HighMem)
> ( Active: 2554, inactive_dirty: 0, inactive_clean: 0, free: 254 (255 510 765) )
> 4*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB = 1016kB)
> = 0kB)
> = 0kB)
> Swap cache: add 0, delete 0, find 0/0
> Free swap:            0kB
> 8192 pages of RAM
> 395 free pages
> 626 reserved pages
> 581 pages shared
> 0 pages swap cached
> 3 page tables cached
> Buffer memory:     8000kB

The above buffer memory usage is caused by an 8MB ramdisk.  Unfortunately,
by default the buffermem and the page cache together have to drop to
8192 * (2 + 2) / 100 = 327 pages or fewer before the OOM handler triggers.

With an 8MB ramdisk (== 2000 pages) obviously we'll never reach that, so
we might as well not have the oom handler in this case... or...
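
For reference, the threshold arithmetic can be sketched in user space as
below.  This is an illustration only: the function names are hypothetical, and
the two "2" values are assumed to be page_cache.min_percent and
buffer_mem.min_percent, as the surrounding code in mm/oom_kill.c suggests.
The first helper mirrors the order of operations in the quoted kernel lines
(divide by 100 first), the second evaluates the expression as written in the
mail.

```c
/* Limit as the kernel lines in the patch context compute it:
 *	limit = (page_cache.min_percent + buffer_mem.min_percent);
 *	limit *= num_physpages / 100;
 * The integer division happens before the multiplication.
 */
static long oom_limit_as_coded(long num_physpages, long pc_min, long buf_min)
{
	long limit = pc_min + buf_min;

	limit *= num_physpages / 100;
	return limit;
}

/* Limit evaluated left to right as in the mail: pages * percent / 100. */
static long oom_limit_as_written(long num_physpages, long pc_min, long buf_min)
{
	return num_physpages * (pc_min + buf_min) / 100;
}
```

With 8192 pages and 2 + 2 percent, the first form yields 324 and the second
327.  Either way the limit is far below the roughly 2000 pages pinned by the
ramdisk, so the cache check alone can never report OOM on this board.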

With the attached patch, we take the number of ramdisk pages into account
when checking for OOM.  (side note: we don't seem to account for ramdisk
pages, therefore I have to count the individual pages on the active list).
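
The counting described here is a plain walk over a circular doubly linked
list.  A minimal user-space sketch of that pattern follows; struct fake_page,
its is_ramdisk flag and count_flagged() are stand-ins invented for this
example, and the list helpers are simplified local versions rather than the
kernel's own.  No locking is shown, whereas the real function in the patch
takes pagemap_lru_lock around the walk.

```c
#include <stddef.h>

/* Minimal circular list, modelled loosely on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct page: only what the walk needs. */
struct fake_page {
	struct list_head lru;
	int is_ramdisk;
};

/* Recover the containing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Walk from head->next until we are back at head, counting flagged pages. */
static long count_flagged(struct list_head *head)
{
	struct list_head *pos;
	long nr = 0;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct fake_page *page = list_entry(pos, struct fake_page, lru);

		if (page->is_ramdisk)
			nr++;
	}
	return nr;
}
```

The cost is linear in the length of the active list, which is presumably why
the patch only does the count when the cheaper checks already point at OOM.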

The patch below factors out the fixed ramdisk allocation, and allows the
OOM killer to be functional on machines with ramdisks.  We only count the
number of ramdisk pages when we're getting close to the limit (i.e. the
freepages check indicates OOM and there's no swap).
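
The decision order after the patch can be modelled in user space roughly as
follows.  This is a sketch under stated assumptions, not the kernel function:
struct vm_snapshot and out_of_memory_sketch() are hypothetical names, the
counters are passed in rather than read from kernel globals, and the
subtract_ramdisk switch exists only to compare behaviour with and without the
new subtraction.

```c
/* Hypothetical snapshot of the counters the OOM check looks at (in pages). */
struct vm_snapshot {
	long free_pages;
	long inactive_clean_pages;
	long freepages_low;
	long swap_pages;
	long page_cache_pages;
	long buffermem_pages;
	long swap_cache_pages;
	long ramdisk_pages;
	long num_physpages;
	long min_percent_sum;	/* page cache min% + buffermem min% */
};

/* Returns 1 if the snapshot looks out of memory, 0 otherwise. */
static int out_of_memory_sketch(const struct vm_snapshot *s, int subtract_ramdisk)
{
	long cache_mem, limit;

	/* Enough free and easily freeable pages?  Not OOM. */
	if (s->free_pages + s->inactive_clean_pages > s->freepages_low)
		return 0;

	/* Swap left?  Not OOM.  Checked early so the (expensive)
	 * ramdisk count below is only needed when there is no swap. */
	if (s->swap_pages > 0)
		return 0;

	cache_mem = s->page_cache_pages + s->buffermem_pages
		    - s->swap_cache_pages;
	if (subtract_ramdisk)
		cache_mem -= s->ramdisk_pages;

	limit = s->min_percent_sum * (s->num_physpages / 100);
	if (cache_mem > limit)
		return 0;

	return 1;
}
```

Feeding in numbers close to Nico's dump (254 free, freepages.low assumed to be
510, no swap, 2000 buffermem pages all belonging to the ramdisk, and a made-up
page cache of 100 pages) gives "not OOM" without the subtraction and "OOM"
with it.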

As an added bonus, this patch also dumps out the number of Page Cache
pages, buffermem pages, and ramdisk pages on sysrq-m.

Note that this doesn't solve Nico's original problem.

diff -x .* -x *.[oas] -urN ref/mm/oom_kill.c linux/mm/oom_kill.c
--- ref/mm/oom_kill.c	Tue Aug 14 21:39:03 2001
+++ linux/mm/oom_kill.c	Sat Aug 25 18:01:41 2001
@@ -193,6 +193,8 @@
 	return;
 }
 
+extern long count_ramdisk_pages(void);
+
 /**
  * out_of_memory - is the system out of memory?
  *
@@ -210,6 +212,10 @@
 	if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
 		return 0;
 
+	/* Enough swap space left?  Not OOM. */
+	if (nr_swap_pages > 0)
+		return 0;
+
 	/*
 	 * If the buffer and page cache (excluding swap cache) are over
 	 * their (/proc tunable) minimum, we're still not OOM.  We test
@@ -219,14 +225,11 @@
 	cache_mem = atomic_read(&page_cache_size);
 	cache_mem += atomic_read(&buffermem_pages);
 	cache_mem -= swapper_space.nrpages;
+	cache_mem -= count_ramdisk_pages();
 	limit = (page_cache.min_percent + buffer_mem.min_percent);
 	limit *= num_physpages / 100;
 
 	if (cache_mem > limit)
-		return 0;
-
-	/* Enough swap space left?  Not OOM. */
-	if (nr_swap_pages > 0)
 		return 0;
 
 	/* Else... */
diff -x .* -x *.[oas] -urN ref/mm/page_alloc.c linux/mm/page_alloc.c
--- ref/mm/page_alloc.c	Tue Aug 21 22:30:51 2001
+++ linux/mm/page_alloc.c	Sat Aug 25 18:01:12 2001
@@ -690,6 +690,8 @@
      return (sum > 0 ? sum : 0);
 }
 
+extern long count_ramdisk_pages(void);
+
 /*
  * Show free area list (used inside shift_scroll-lock stuff)
  * We also calculate the percentage fragmentation. We do this by counting the
@@ -743,6 +745,10 @@
 #ifdef SWAP_CACHE_INFO
 	show_swap_cache_info();
 #endif	
+
+	printk("Page cache size: %d\n", atomic_read(&page_cache_size));
+	printk("Buffer mem: %d\n", atomic_read(&buffermem_pages));
+	printk("Ramdisk pages: %ld\n", count_ramdisk_pages());
 }
 
 void show_free_areas(void)
diff -x .* -x *.[oas] -urN ref/mm/vmscan.c linux/mm/vmscan.c
--- ref/mm/vmscan.c	Thu Aug 23 20:07:43 2001
+++ linux/mm/vmscan.c	Sat Aug 25 19:11:46 2001
@@ -816,6 +816,24 @@
 	return nr_deactivated;
 }
 
+long count_ramdisk_pages(void)
+{
+	struct list_head *page_lru;
+	struct page *page;
+	long nr_ramdisk = 0;
+
+	spin_lock(&pagemap_lru_lock);
+	for (page_lru = active_list.next; page_lru != &active_list;
+	     page_lru = page_lru->next) {
+		page = list_entry(page_lru, struct page, lru);
+		if (page_ramdisk(page))
+			nr_ramdisk ++;
+	}
+	spin_unlock(&pagemap_lru_lock);
+
+	return nr_ramdisk;
+}
+
 /*
  * Check if there are zones with a severe shortage of free pages,
  * or if all zones have a minor shortage.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



end of thread, last message 2001-08-25 18:51 UTC

Thread overview: 16+ messages
2001-08-24  8:47 What version of the kernel fixes these VM issues? Anwar P
2001-08-24 13:18 ` Daniel Phillips
2001-08-24 18:14   ` Nicolas Pitre
2001-08-24 18:25     ` Nicolas Pitre
2001-08-24 18:48       ` Mark Frazer
2001-08-24 19:53     ` Daniel Phillips
2001-08-24 19:56     ` Daniel Phillips
2001-08-24 20:12       ` Nicolas Pitre
2001-08-24 22:35         ` Daniel Phillips
2001-08-25  3:00           ` Nicolas Pitre
2001-08-25  4:31             ` Steve Kieu
2001-08-25  8:09             ` Russell King
2001-08-25 16:38               ` Daniel Phillips
2001-08-25 16:38             ` Daniel Phillips
2001-08-25 18:29               ` Nicolas Pitre
2001-08-25 18:51                 ` Russell King
