linux-kernel.vger.kernel.org archive mirror
* cache limit
@ 2003-08-19  4:39 Anthony R.
  2003-08-19  4:57 ` Nuno Silva
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Anthony R. @ 2003-08-19  4:39 UTC (permalink / raw)
  To: linux-kernel

Hi,

I would like to tune my kernel not to use as much memory for cache
as it currently does. I have 2GB RAM, but when I am running one program
that accesses a lot of files on my disk (like rsync), that program uses
most of the cache, and other programs wind up swapping out. I'd prefer to
have just rsync run slower because less of its data is cached, rather
than have all my other programs run more slowly. rsync is not allocating
memory, but the kernel is caching its data at the expense of other
programs.

With 2GB on a system, I should never page out, but I consistently do, and I
need to tune the kernel to avoid that. Cache usage is around 1.4 GB!
I never had this problem with earlier kernels. I've read a lot of comments
where so-called experts poo-poo this problem, but it is real and
repeatable, and I am ready to take matters into my own hands to fix it.
I am told the cache is replaced when another program needs more memory,
so it shouldn't swap, but that is not the behaviour I am seeing.

Can anyone help point me in the right direction?
Do any kernel developers care about this?

My kernel is stock 2.4.21; I run Redhat 9 on a 3GHz P4. I'd give you
motherboard info, but I've seen this behaviour on other motherboards as
well.

Thank you very much for your help.

-- tony
"Surrender to the Void." 
-- John Lennon



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  4:39 cache limit Anthony R.
@ 2003-08-19  4:57 ` Nuno Silva
  2003-08-19  5:33 ` Denis Vlasenko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: Nuno Silva @ 2003-08-19  4:57 UTC (permalink / raw)
  To: Anthony R.; +Cc: linux-kernel

Hello!

Anthony R. wrote:

[..snip..]

> With 2GB on a system, I should never page out, but I consistently do and I

One very easy solution is to do:
# swapoff -a

FWIW, I'd like an option to limit the cache size to a maximum amount... 
Say: echo 500000 > /proc/sys/vm/max_disk_cache

But, AFAIK, that's not going to happen.

Regards,
Nuno Silva



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  4:39 cache limit Anthony R.
  2003-08-19  4:57 ` Nuno Silva
@ 2003-08-19  5:33 ` Denis Vlasenko
  2003-08-19  6:20   ` Andrew Morton
  2003-08-19 14:28   ` Anthony R.
  2003-08-19  5:42 ` Nick Piggin
  2003-08-21  0:49 ` Takao Indoh
  3 siblings, 2 replies; 37+ messages in thread
From: Denis Vlasenko @ 2003-08-19  5:33 UTC (permalink / raw)
  To: Anthony R., linux-kernel

On 19 August 2003 07:39, Anthony R. wrote:
> I would like to tune my kernel not to use as much memory for cache
> as it currently does. I have 2GB RAM, but when I am running one program
> that accesses a lot of files on my disk (like rsync), that program uses
> most of the cache, and other programs wind up swapping out. I'd prefer to
> have just rsync run slower because less of its data is cached, rather
> than have
> all my other programs run more slowly. rsync is not allocating memory,
> but the kernel is caching it at the expense of other programs.

There was a discussion (and patches) in the middle of the 2.5 series
about an O_STREAMING open flag, which means "do not aggressively cache
this file". Targeted at MP3/video playing, copying large files and such.

I don't know whether it actually was merged. If it was,
your program can use it.
 
> With 2GB on a system, I should never page out, but I consistently do and I
> need to tune the kernel to avoid that. Cache usage is around 1.4 GB!

So why did you configure your system to have huge swap?
That's a rather contradictory setup ;)

> I never had this problem with earlier kernels. I've read a lot of comments
> where so-called experts poo-poo this problem, but it is real and
> repeatable and I am
> ready to take matters into my own hands to fix it. I am told the cache
> is replaced when
> another program needs more memory, so it shouldn't swap, but that is not
> the
> behaviour I am seeing.
> 
> Can anyone help point me in the right direction?

I'd say stop allocating insane amounts of swap.
Frankly, with 2G you may run without swap at all.
--
vda

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  4:39 cache limit Anthony R.
  2003-08-19  4:57 ` Nuno Silva
  2003-08-19  5:33 ` Denis Vlasenko
@ 2003-08-19  5:42 ` Nick Piggin
  2003-08-21  0:49 ` Takao Indoh
  3 siblings, 0 replies; 37+ messages in thread
From: Nick Piggin @ 2003-08-19  5:42 UTC (permalink / raw)
  To: Anthony R.; +Cc: linux-kernel

Anthony R. wrote:

>Hi,
>  
>
>
>I would like to tune my kernel not to use as much memory for cache
>as it currently does. I have 2GB RAM, but when I am running one program
>that accesses a lot of files on my disk (like rsync), that program uses
>most of the cache, and other programs wind up swapping out. I'd prefer to
>have just rsync run slower because less of its data is cached, rather
>than have
>all my other programs run more slowly. rsync is not allocating memory,
>but the kernel is caching it at the expense of other programs.
>
>With 2GB on a system, I should never page out, but I consistently do and I
>need to tune the kernel to avoid that. Cache usage is around 1.4 GB!
>I never had this problem with earlier kernels. I've read a lot of comments
>where so-called experts poo-poo this problem, but it is real and
>repeatable and I am
>ready to take matters into my own hands to fix it. I am told the cache
>is replaced when
>another program needs more memory, so it shouldn't swap, but that is not
>the
>behaviour I am seeing.
>
>Can anyone help point me in the right direction?
>Do any kernel developers care about this?
>
>My kernel is stock 2.4.21, I run Redhat 9 on a 3GHz P4. I'd give you MB
>info but I've seen
>this behaviour on other motherboards as well.
>
>Thank you very much for your help.
>
>-- tony
>"Surrender to the Void." 
>-- John Lennon
>
>

Hi Anthony,
If you're up for a bit of work, give the "aa" series kernels a try, also
see how 2.6-test goes and be sure to report any problems you encounter.

The VM in stock 2.4 is slow to pick up improvements because it is a
stable series. The problems definitely won't get poo-pooed here. Be sure
to include a good description of your workload, and probably a log of
vmstat 1 to start with.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  5:33 ` Denis Vlasenko
@ 2003-08-19  6:20   ` Andrew Morton
  2003-08-19  9:05     ` J.A. Magallon
  2003-08-19 13:32     ` Erik Andersen
  2003-08-19 14:28   ` Anthony R.
  1 sibling, 2 replies; 37+ messages in thread
From: Andrew Morton @ 2003-08-19  6:20 UTC (permalink / raw)
  To: vda; +Cc: russo.lutions, linux-kernel

Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:
>
> There was a discussion (and patches) in the middle of 2.5 series
>  about O_STREAMING open flag which mean "do not aggressively cache
>  this file". Targeted at MP3/video playing, copying large files and such.
> 
>  I don't know whether it actually was merged. If it was,
>  your program can use it.

It was not.  Instead we have fadvise.  So it would be appropriate to change
applications such as rsync to optionally run

	posix_fadvise(fd, 0, -1, POSIX_FADV_DONTNEED)

against file descriptors just before closing them, so all the pagecache
gets thrown away.  (Well, most of the pagecache - dirty pages won't get
dropped - the app must fsync the files by hand first if it wants this)

This would be a useful addition to rsync and such applications - it is
stronger and more specific and safer than banging on the VM for a special
case.

But if you want to bang on the VM for a special case, run 2.6 and set
/proc/sys/vm/swappiness to zero during the rsync run.
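
As an illustration of the fadvise approach described above, a minimal
sketch (not from the thread): write, fsync so dirty pages can actually be
dropped, then POSIX_FADV_DONTNEED just before close. A length of 0 means
"to end of file" per POSIX, which matches the intent of the -1 above.

```c
#define _XOPEN_SOURCE 600   /* for posix_fadvise() */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a file, then drop its pagecache just before closing,
 * as suggested for rsync-like tools.  Returns 0 on success. */
int write_and_drop(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	fsync(fd);	/* dirty pages would not be dropped otherwise */
	/* length 0 == "to end of file" */
	int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return err ? -1 : 0;
}
```

Note that posix_fadvise() returns the error number directly rather than
setting errno, hence the direct check on its return value.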


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  6:20   ` Andrew Morton
@ 2003-08-19  9:05     ` J.A. Magallon
  2003-08-19  9:16       ` Andrew Morton
  2003-08-19 13:32     ` Erik Andersen
  1 sibling, 1 reply; 37+ messages in thread
From: J.A. Magallon @ 2003-08-19  9:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: vda, russo.lutions, linux-kernel


On 08.19, Andrew Morton wrote:
> Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:
> >
> > There was a discussion (and patches) in the middle of 2.5 series
> >  about O_STREAMING open flag which mean "do not aggressively cache
> >  this file". Targeted at MP3/video playing, copying large files and such.
> > 
> >  I don't know whether it actually was merged. If it was,
> >  your program can use it.
> 
> It was not.  Instead we have fadvise.  So it would be appropriate to change

Does this work in 2.4?
If not, is there any patch flying around?
It would be interesting to have this functionality in 2.4 as well, so
people can start modifying and testing things like DVD readers, rsync,
updatedb, grep and so on...

I have tested O_STREAMING in 2.4 and it is fine...

TIA

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-rc2-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-1mdk))

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  9:05     ` J.A. Magallon
@ 2003-08-19  9:16       ` Andrew Morton
  2003-08-19  9:28         ` J.A. Magallon
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2003-08-19  9:16 UTC (permalink / raw)
  To: J.A. Magallon; +Cc: vda, russo.lutions, linux-kernel

"J.A. Magallon" <jamagallon@able.es> wrote:
>
>  > It was not.  Instead we have fadvise.  So it would be appropriate to change
> 
>  Does this work in 2.4 ?
>  If not, any patch flying around ?

No.  It would be fairly messy to implement in 2.4 because 2.4 does not have
the per-inode radix trees for pagecache.  The implementation would need to
walk every page attached to the inode just to shoot down a single page. 
And all of it underneath the global pagecache lock.

But it is certainly possible.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  9:16       ` Andrew Morton
@ 2003-08-19  9:28         ` J.A. Magallon
  2003-08-19  9:43           ` Andrew Morton
  0 siblings, 1 reply; 37+ messages in thread
From: J.A. Magallon @ 2003-08-19  9:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: J.A. Magallon, vda, russo.lutions, linux-kernel


On 08.19, Andrew Morton wrote:
> "J.A. Magallon" <jamagallon@able.es> wrote:
> >
> >  > It was not.  Instead we have fadvise.  So it would be appropriate to change
> > 
> >  Does this work in 2.4 ?
> >  If not, any patch flying around ?
> 
> No.  It would be fairly messy to implement in 2.4 because 2.4 does not have
> the per-inode radix trees for pagecache.  The implementation would need to
> walk every page attached to the inode just to shoot down a single page. 
> And all of it underneath the global pagecache lock.
> 
> But it is certainly possible.
> 

So could O_STREAMING be included in 2.4, and let people do things like

#if 2.4
	fcntl(...O_STREAMING...)
#else
	posix_fadvise()
#endif

Or, if fadvise just fails with an error code in 2.4:
	if (fadvise() < 0)
		fcntl(O_STREAMING)

Or even:
	fadvise()
	fcntl(O_STREAMING)
and let whichever succeeds...

Or is it too dirty?
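
The runtime-fallback idea could be sketched like this. posix_fadvise()
is the real glibc/2.6 interface; O_STREAMING was a patch-only flag whose
value varied between patches, so it is compile-guarded here, and the
F_SETFL fallback is purely hypothetical:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>

/* Try the 2.6-style hint first; if the kernel lacks fadvise support,
 * fall back to the O_STREAMING patch flag where the headers define it.
 * Returns 0 if either hint took effect. */
static int hint_streaming(int fd)
{
	if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0)
		return 0;
#ifdef O_STREAMING	/* only defined by the -aa/2.5 patches */
	if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_STREAMING) != -1)
		return 0;
#endif
	return -1;
}
```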

TIA

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-rc2-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-1mdk))

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  9:28         ` J.A. Magallon
@ 2003-08-19  9:43           ` Andrew Morton
  0 siblings, 0 replies; 37+ messages in thread
From: Andrew Morton @ 2003-08-19  9:43 UTC (permalink / raw)
  To: J.A. Magallon; +Cc: jamagallon, vda, russo.lutions, linux-kernel

"J.A. Magallon" <jamagallon@able.es> wrote:
>
> 
> So could O_STREAMING be included in 2.4, and let people do things like

Sounds fairly ugh, actually.  It might be better to just implement
fadvise().

O_STREAMING is really designed for large streaming writes; the current
implementation only performs invalidation after each megabyte of I/O, so it
would fail to do anything at all in the lots-of-medium-size-files case
such as rsync.

Or use 2.6.  It will take a while for the feature to usefully propagate into
applications anyway...


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  6:20   ` Andrew Morton
  2003-08-19  9:05     ` J.A. Magallon
@ 2003-08-19 13:32     ` Erik Andersen
  2003-08-19 20:56       ` Andrew Morton
  1 sibling, 1 reply; 37+ messages in thread
From: Erik Andersen @ 2003-08-19 13:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: vda, russo.lutions, linux-kernel

On Mon Aug 18, 2003 at 11:20:24PM -0700, Andrew Morton wrote:
> Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:
> >
> > There was a discussion (and patches) in the middle of 2.5 series
> >  about O_STREAMING open flag which mean "do not aggressively cache
> >  this file". Targeted at MP3/video playing, copying large files and such.
> > 
> >  I don't know whether it actually was merged. If it was,
> >  your program can use it.
> 
> It was not.  Instead we have fadvise.  So it would be appropriate to change
> applications such as rsync to optionally run
> 
> 	posix_fadvise(fd, 0, -1, POSIX_FADV_DONTNEED)
> 
> against file descriptors just before closing them, so all the pagecache
> gets thrown away.  (Well, most of the pagecache - dirty pages won't get
> dropped - the app must fsync the files by hand first if it wants this)

This is not supported in 2.4.x though, right?

What if I don't want to fill up the pagecache with garbage in the
first place?  When closing a file descriptor, it is already too
late -- the one-time-only giant pile of data has already caused
the kernel to wastefully flush useful things out of cache...

 -Erik

--
Erik B. Andersen             http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  5:33 ` Denis Vlasenko
  2003-08-19  6:20   ` Andrew Morton
@ 2003-08-19 14:28   ` Anthony R.
  2003-08-19 18:26     ` Mike Fedyk
  1 sibling, 1 reply; 37+ messages in thread
From: Anthony R. @ 2003-08-19 14:28 UTC (permalink / raw)
  To: linux-kernel


>>another program needs more memory, so it shouldn't swap, but that is not
>>the
>>behaviour I am seeing.
>>
>>Can anyone help point me in the right direction?
>>    
>>
>
>I'd say stop allocating insane amounts of swap.
>Frankly, with 2G you may run without swap at all.
>  
>
I'm not sure how you knew I had 2GB of swap. ;)
I just always thought it was a good idea to have some just in case.
I did not know having swap would actually, in some cases, degrade
performance.

Are you saying that, if I turn off swap, the amount of cache used will
be the same, but that when other programs need more memory, the kernel
will take it from cache? If so, I will try, since that would be
an ideal solution.

And while O_STREAMING sounds good, I'm not really up for rewriting
all the rsync-like apps. I want my OS to deal with it.

Thanks.

-- tony
"Surrender to the Void." -- John Lennon




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19 14:28   ` Anthony R.
@ 2003-08-19 18:26     ` Mike Fedyk
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Fedyk @ 2003-08-19 18:26 UTC (permalink / raw)
  To: Anthony R.; +Cc: linux-kernel

On Tue, Aug 19, 2003 at 10:28:58AM -0400, Anthony R. wrote:
> 
> >>another program needs more memory, so it shouldn't swap, but that is not
> >>the
> >>behaviour I am seeing.
> >>
> >>Can anyone help point me in the right direction?
> >>    
> >>
> >
> >I'd say stop allocating insane amounts of swap.
> >Frankly, with 2G you may run without swap at all.
> >  
> >
> I'm not sure how you knew I had 2GB of swap. ;)
> I just always thought it was a good idea to have some just in case.
> I did not know having swap would actually, in some cases, degrade
> performance.
> 
> Are you saying that, if I turn off swap, the amount of cache used will
> be the same, but that when other programs need more memory, the kernel
> will take it from cache? If so, I will try, since that would be
> an ideal solution.

And the -aa and rmap kernels do that with swap on too.

If you test them and they don't for your workload, please get back to the
list and let us know.

It is well known that the stock 2.4 VM is WAY behind -aa and rmap in terms
of responsiveness and making the right choices.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19 13:32     ` Erik Andersen
@ 2003-08-19 20:56       ` Andrew Morton
  0 siblings, 0 replies; 37+ messages in thread
From: Andrew Morton @ 2003-08-19 20:56 UTC (permalink / raw)
  To: andersen; +Cc: vda, russo.lutions, linux-kernel

Erik Andersen <andersen@codepoet.org> wrote:
>
> On Mon Aug 18, 2003 at 11:20:24PM -0700, Andrew Morton wrote:
> > Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:
> > >
> > > There was a discussion (and patches) in the middle of 2.5 series
> > >  about O_STREAMING open flag which mean "do not aggressively cache
> > >  this file". Targeted at MP3/video playing, copying large files and such.
> > > 
> > >  I don't know whether it actually was merged. If it was,
> > >  your program can use it.
> > 
> > It was not.  Instead we have fadvise.  So it would be appropriate to change
> > applications such as rsync to optionally run
> > 
> > 	posix_fadvise(fd, 0, -1, POSIX_FADV_DONTNEED)
> > 
> > against file descriptors just before closing them, so all the pagecache
> > gets thrown away.  (Well, most of the pagecache - dirty pages won't get
> > dropped - the app must fsync the files by hand first if it wants this)
> 
> This is not supported in 2.4.x though, right?

No, it is not.

> What if I don't want to fill up the pagecache with garbage in the
> first place?

Call fadvise(POSIX_FADV_DONTNEED) more frequently or use O_DIRECT.
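
A sketch of the "more frequently" variant: a copy loop that drops the
destination's already-written pagecache every few megabytes, so the
window of cached one-shot data stays small. The 8 MB window and 64 KB
buffer are arbitrary illustrative choices, not values from the thread:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

#define DROP_WINDOW (8L << 20)	/* drop cache every 8 MB written */

/* Copy src_fd to dst_fd, periodically telling the kernel that the
 * bytes already written will not be needed again.  Returns bytes
 * copied, or -1 on error. */
long copy_dropping_cache(int src_fd, int dst_fd)
{
	char buf[64 * 1024];
	long total = 0, last_drop = 0;
	ssize_t n;

	while ((n = read(src_fd, buf, sizeof buf)) > 0) {
		if (write(dst_fd, buf, n) != n)
			return -1;
		total += n;
		if (total - last_drop >= DROP_WINDOW) {
			fdatasync(dst_fd);	/* dirty pages can't be dropped */
			posix_fadvise(dst_fd, last_drop, total - last_drop,
				      POSIX_FADV_DONTNEED);
			last_drop = total;
		}
	}
	return n < 0 ? -1 : total;
}
```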


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-19  4:39 cache limit Anthony R.
                   ` (2 preceding siblings ...)
  2003-08-19  5:42 ` Nick Piggin
@ 2003-08-21  0:49 ` Takao Indoh
  2003-08-21 23:47   ` Mike Fedyk
  3 siblings, 1 reply; 37+ messages in thread
From: Takao Indoh @ 2003-08-21  0:49 UTC (permalink / raw)
  To: linux-kernel

Hi.

On Tue, 19 Aug 2003 00:39:49 -0400, "Anthony R." wrote:

>I would like to tune my kernel not to use as much memory for cache
>as it currently does. I have 2GB RAM, but when I am running one program
>that accesses a lot of files on my disk (like rsync), that program uses
>most of the cache, and other programs wind up swapping out. I'd prefer to
>have just rsync run slower because less of its data is cached, rather
>than have
>all my other programs run more slowly. rsync is not allocating memory,
>but the kernel is caching it at the expense of other programs.
>
>With 2GB on a system, I should never page out, but I consistently do and I
>need to tune the kernel to avoid that. Cache usage is around 1.4 GB!
>I never had this problem with earlier kernels. I've read a lot of comments
>where so-called experts poo-poo this problem, but it is real and
>repeatable and I am
>ready to take matters into my own hands to fix it. I am told the cache
>is replaced when
>another program needs more memory, so it shouldn't swap, but that is not
>the
>behaviour I am seeing.
>
>Can anyone help point me in the right direction?
>Do any kernel developers care about this?

I also have the same problem with the pagecache. The pagecache grows
until most of memory is used, and it hurts the performance of other
programs. It is a serious problem in OLTP systems. When the transaction
rate increases, transaction processing stalls, because most of memory is
used as pagecache and it takes a long time to reclaim memory from the
pagecache. That stalls the whole system. To avoid it, a way to limit the
pagecache is necessary.

Actually, in a system I built (RedHat AdvancedServer 2.1, kernel
2.4.9-based), this problem occurred because of the pagecache. The
system's maximum response time had to be less than 4 seconds, but owing
to the pagecache, response times became uneven and the maximum rose to
10 seconds. This trouble was solved by controlling the pagecache
using /proc/sys/vm/pagecache.

I made a patch that adds a new parameter, /proc/sys/vm/pgcache-max. It
controls the maximum number of pages used as pagecache.
The attached file is a mere test patch, so it may contain bugs or ugly
code. Please let me know if you have advice, comments, a better
implementation, and so on.

Thanks.
--------------------------------------------------
Takao Indoh
 E-Mail : indou.takao@soft.fujitsu.com



diff -Nur linux-2.5.64/include/linux/gfp.h linux-2.5.64-new/include/linux/gfp.h
--- linux-2.5.64/include/linux/gfp.h	Wed Mar  5 12:29:03 2003
+++ linux-2.5.64-new/include/linux/gfp.h	Tue Apr  8 11:12:33 2003
@@ -18,6 +18,7 @@
 #define __GFP_FS	0x80	/* Can call down to low-level FS? */
 #define __GFP_COLD	0x100	/* Cache-cold page required */
 #define __GFP_NOWARN	0x200	/* Suppress page allocation failure warning */
+#define __GFP_PGCACHE   0x400   /* Page-cache required */
 
 #define GFP_ATOMIC	(__GFP_HIGH)
 #define GFP_NOIO	(__GFP_WAIT)
diff -Nur linux-2.5.64/include/linux/mm.h linux-2.5.64-new/include/linux/mm.h
--- linux-2.5.64/include/linux/mm.h	Wed Mar  5 12:28:56 2003
+++ linux-2.5.64-new/include/linux/mm.h	Tue Apr  8 11:12:33 2003
@@ -22,6 +22,7 @@
 extern unsigned long num_physpages;
 extern void * high_memory;
 extern int page_cluster;
+extern unsigned long max_pgcache;
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
diff -Nur linux-2.5.64/include/linux/page-flags.h linux-2.5.64-new/include/linux/page-flags.h
--- linux-2.5.64/include/linux/page-flags.h	Wed Mar  5 12:29:31 2003
+++ linux-2.5.64-new/include/linux/page-flags.h	Tue Apr  8 11:12:33 2003
@@ -74,6 +74,7 @@
 #define PG_mappedtodisk		17	/* Has blocks allocated on-disk */
 #define PG_reclaim		18	/* To be reclaimed asap */
 #define PG_compound		19	/* Part of a compound page */
+#define PG_pgcache		20	/* Page is used as pagecache */
 
 /*
  * Global page accounting.  One instance per CPU.  Only unsigned longs are
@@ -255,6 +256,10 @@
 #define PageCompound(page)	test_bit(PG_compound, &(page)->flags)
 #define SetPageCompound(page)	set_bit(PG_compound, &(page)->flags)
 #define ClearPageCompound(page)	clear_bit(PG_compound, &(page)->flags)
+
+#define PagePgcache(page)      test_bit(PG_pgcache, &(page)->flags)
+#define SetPagePgcache(page)   set_bit(PG_pgcache, &(page)->flags)
+#define ClearPagePgcache(page) clear_bit(PG_pgcache, &(page)->flags)
 
 /*
  * The PageSwapCache predicate doesn't use a PG_flag at this time,
diff -Nur linux-2.5.64/include/linux/pagemap.h linux-2.5.64-new/include/linux/pagemap.h
--- linux-2.5.64/include/linux/pagemap.h	Wed Mar  5 12:28:53 2003
+++ linux-2.5.64-new/include/linux/pagemap.h	Tue Apr  8 11:12:33 2003
@@ -29,12 +29,12 @@
 
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
-	return alloc_pages(x->gfp_mask, 0);
+	return alloc_pages(x->gfp_mask|__GFP_PGCACHE, 0);
 }
 
 static inline struct page *page_cache_alloc_cold(struct address_space *x)
 {
-	return alloc_pages(x->gfp_mask|__GFP_COLD, 0);
+	return alloc_pages(x->gfp_mask|__GFP_COLD|__GFP_PGCACHE, 0);
 }
 
 typedef int filler_t(void *, struct page *);
@@ -80,6 +80,7 @@
 	list_add(&page->list, &mapping->clean_pages);
 	page->mapping = mapping;
 	page->index = index;
+	SetPagePgcache(page);
 
 	mapping->nrpages++;
 	inc_page_state(nr_pagecache);
diff -Nur linux-2.5.64/include/linux/sysctl.h linux-2.5.64-new/include/linux/sysctl.h
--- linux-2.5.64/include/linux/sysctl.h	Wed Mar  5 12:29:21 2003
+++ linux-2.5.64-new/include/linux/sysctl.h	Tue Apr  8 11:12:33 2003
@@ -155,6 +155,7 @@
 	VM_HUGETLB_PAGES=18,	/* int: Number of available Huge Pages */
 	VM_SWAPPINESS=19,	/* Tendency to steal mapped memory */
 	VM_LOWER_ZONE_PROTECTION=20,/* Amount of protection of lower zones */
+	VM_MAXPGCACHE=21,/* maximum number of page used as pagecache */
 };
 
 
diff -Nur linux-2.5.64/kernel/sysctl.c linux-2.5.64-new/kernel/sysctl.c
--- linux-2.5.64/kernel/sysctl.c	Wed Mar  5 12:28:58 2003
+++ linux-2.5.64-new/kernel/sysctl.c	Tue Apr  8 11:12:33 2003
@@ -319,6 +319,8 @@
 	 &sysctl_lower_zone_protection, sizeof(sysctl_lower_zone_protection),
 	 0644, NULL, &proc_dointvec_minmax, &sysctl_intvec, NULL, &zero,
 	 NULL, },
+	{VM_MAXPGCACHE, "pgcache-max", &max_pgcache, sizeof(unsigned long),
+	 0644, NULL,&proc_dointvec_minmax, &sysctl_intvec, NULL,&zero,NULL},
 	{0}
 };
 
diff -Nur linux-2.5.64/mm/filemap.c linux-2.5.64-new/mm/filemap.c
--- linux-2.5.64/mm/filemap.c	Wed Mar  5 12:29:15 2003
+++ linux-2.5.64-new/mm/filemap.c	Tue Apr  8 11:12:33 2003
@@ -86,6 +86,7 @@
 	radix_tree_delete(&mapping->page_tree, page->index);
 	list_del(&page->list);
 	page->mapping = NULL;
+	ClearPagePgcache(page);
 
 	mapping->nrpages--;
 	dec_page_state(nr_pagecache);
@@ -437,7 +438,7 @@
 	page = find_lock_page(mapping, index);
 	if (!page) {
 		if (!cached_page) {
-			cached_page = alloc_page(gfp_mask);
+			cached_page = alloc_page(gfp_mask|__GFP_PGCACHE);
 			if (!cached_page)
 				return NULL;
 		}
@@ -507,7 +508,7 @@
 		return NULL;
 	}
 	gfp_mask = mapping->gfp_mask & ~__GFP_FS;
-	page = alloc_pages(gfp_mask, 0);
+	page = alloc_pages(gfp_mask|__GFP_PGCACHE, 0);
 	if (page && add_to_page_cache_lru(page, mapping, index, gfp_mask)) {
 		page_cache_release(page);
 		page = NULL;
diff -Nur linux-2.5.64/mm/page_alloc.c linux-2.5.64-new/mm/page_alloc.c
--- linux-2.5.64/mm/page_alloc.c	Wed Mar  5 12:28:58 2003
+++ linux-2.5.64-new/mm/page_alloc.c	Tue Apr  8 17:21:03 2003
@@ -39,6 +39,7 @@
 int nr_swap_pages;
 int numnodes = 1;
 int sysctl_lower_zone_protection = 0;
+unsigned long max_pgcache = ULONG_MAX;
 
 /*
  * Used by page_zone() to look up the address of the struct zone whose
@@ -52,6 +53,9 @@
 static int zone_balance_min[MAX_NR_ZONES] __initdata = { 20 , 20, 20, };
 static int zone_balance_max[MAX_NR_ZONES] __initdata = { 255 , 255, 255, };
 
+extern int shrink_pgcache(struct zonelist *zonelist, unsigned int gfp_mask,
+	unsigned int max_nrpage, struct page_state *ps);
+
 /*
  * Temporary debugging check for pages not lying within a given zone.
  */
@@ -548,6 +552,19 @@
 	classzone = zones[0]; 
 	if (classzone == NULL)    /* no zones in the zonelist */
 		return NULL;
+
+	if (gfp_mask & __GFP_PGCACHE) {
+		struct page_state ps;
+		int nr_page;
+
+		min = 1UL << order;
+		get_page_state(&ps);
+		if (ps.nr_pagecache + min >= max_pgcache) {
+			/* try to shrink pagecache */
+			nr_page = ps.nr_pagecache + min - max_pgcache;
+			shrink_pgcache(zonelist, gfp_mask, nr_page, &ps);
+		}
+	}
 
 	/* Go through the zonelist once, looking for a zone with enough free */
 	min = 1UL << order;
diff -Nur linux-2.5.64/mm/swap_state.c linux-2.5.64-new/mm/swap_state.c
--- linux-2.5.64/mm/swap_state.c	Wed Mar  5 12:29:17 2003
+++ linux-2.5.64-new/mm/swap_state.c	Tue Apr  8 11:12:33 2003
@@ -360,7 +360,7 @@
 		 * Get a new page to read into from swap.
 		 */
 		if (!new_page) {
-			new_page = alloc_page(GFP_HIGHUSER);
+			new_page = alloc_page(GFP_HIGHUSER|__GFP_PGCACHE);
 			if (!new_page)
 				break;		/* Out of memory */
 		}
diff -Nur linux-2.5.64/mm/vmscan.c linux-2.5.64-new/mm/vmscan.c
--- linux-2.5.64/mm/vmscan.c	Wed Mar  5 12:28:59 2003
+++ linux-2.5.64-new/mm/vmscan.c	Tue Apr  8 11:12:33 2003
@@ -493,6 +493,13 @@
 				list_add(&page->lru, &zone->inactive_list);
 				continue;
 			}
+			if (gfp_mask & __GFP_PGCACHE) {
+				if (!PagePgcache(page)) {
+					SetPageLRU(page);
+					list_add(&page->lru, &zone->inactive_list);
+					continue;
+				}
+			}
 			list_add(&page->lru, &page_list);
 			page_cache_get(page);
 			nr_taken++;
@@ -737,6 +744,40 @@
 	}
 	return shrink_cache(nr_pages, zone, gfp_mask,
 				max_scan, nr_mapped);
+}
+
+/*
+ * Try to reclaim `nr_pages' from pagecache of this zone.
+ * Returns the number of reclaimed pages.
+ */
+int shrink_pgcache(struct zonelist *zonelist, unsigned int gfp_mask,
+	unsigned int nr_pages, struct page_state *ps)
+{
+	struct zone **zones;
+	struct zone *first_classzone;
+	struct zone *zone;
+	unsigned int ret = 0, reclaim;
+	unsigned long rest_nr_page;
+	int dummy, i;
+
+	zones = zonelist->zones;
+	for (i = 0; zones[i] != NULL; i++) {
+		zone = zones[i];
+		first_classzone = zone->zone_pgdat->node_zones;
+		for (; zone >= first_classzone; zone--) {
+			if (zone->all_unreclaimable) /* all pages pinned */
+				continue;
+
+			rest_nr_page = nr_pages - ret;
+			reclaim = max(((zone->nr_inactive)>>2)+1, rest_nr_page);
+			ret += shrink_zone(zone, zone->nr_inactive,
+					   gfp_mask|__GFP_PGCACHE,
+					   reclaim, &dummy, ps, DEF_PRIORITY);
+			if (ret >= nr_pages)
+				return ret;
+		}
+	}
+	return ret;
 }
 
 /*

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-21  0:49 ` Takao Indoh
@ 2003-08-21 23:47   ` Mike Fedyk
  2003-08-25  2:45     ` Takao Indoh
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Fedyk @ 2003-08-21 23:47 UTC (permalink / raw)
  To: Takao Indoh; +Cc: linux-kernel

On Thu, Aug 21, 2003 at 09:49:45AM +0900, Takao Indoh wrote:
> Actually, in the system I constructed(RedHat AdvancedServer2.1, kernel
> 2.4.9based), the problem occurred due to pagecache. The system's maximum
> response time had to be less than 4 seconds, but owing to the pagecache,
> response time get uneven, and maximum time became 10 seconds.

Please try the 2.4.18-based RedHat kernel, or the 2.4-aa kernel.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-21 23:47   ` Mike Fedyk
@ 2003-08-25  2:45     ` Takao Indoh
  2003-08-25  4:11       ` William Lee Irwin III
  0 siblings, 1 reply; 37+ messages in thread
From: Takao Indoh @ 2003-08-25  2:45 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

On Thu, 21 Aug 2003 16:47:09 -0700, Mike Fedyk wrote:

>On Thu, Aug 21, 2003 at 09:49:45AM +0900, Takao Indoh wrote:
>> Actually, in the system I constructed(RedHat AdvancedServer2.1, kernel
>> 2.4.9based), the problem occurred due to pagecache. The system's maximum
>> response time had to be less than 4 seconds, but owing to the pagecache,
>> response time get uneven, and maximum time became 10 seconds.
>
>Please try the 2.4.18 based redhat kernel, or the 2.4-aa kernel.

I need a tuning parameter which can control the pagecache,
like the /proc/sys/vm/pagecache that RedHat Linux has.
The latest 2.4 or 2.5 standard kernel does not have such a parameter.
Does the 2.4.18 kernel or the 2.4-aa kernel have an alternative method?

--------------------------------------------------
Takao Indoh
 E-Mail : indou.takao@soft.fujitsu.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-25  2:45     ` Takao Indoh
@ 2003-08-25  4:11       ` William Lee Irwin III
  2003-08-25 22:58         ` Mike Fedyk
  0 siblings, 1 reply; 37+ messages in thread
From: William Lee Irwin III @ 2003-08-25  4:11 UTC (permalink / raw)
  To: Takao Indoh; +Cc: Mike Fedyk, linux-kernel

On Thu, Aug 21, 2003 at 09:49:45AM +0900, Takao Indoh wrote:
>>> Actually, in the system I constructed (RedHat AdvancedServer 2.1, kernel
>>> 2.4.9 based), the problem occurred due to the pagecache. The system's maximum
>>> response time had to be less than 4 seconds, but owing to the pagecache,
>>> response times became uneven, and the maximum reached 10 seconds.

On Thu, 21 Aug 2003 16:47:09 -0700, Mike Fedyk wrote:
>> Please try the 2.4.18 based redhat kernel, or the 2.4-aa kernel.

On Mon, Aug 25, 2003 at 11:45:58AM +0900, Takao Indoh wrote:
> I need a tuning parameter which can control the pagecache,
> like the /proc/sys/vm/pagecache that RedHat Linux has.
> The latest 2.4 and 2.5 standard kernels do not have such a parameter.
> Do the 2.4.18 or 2.4-aa kernels have an alternative method?

This is moderately misguided; essentially the only way userspace can
utilize RAM at all is via the pagecache. It's not useful to limit this;
you probably need inode-highmem or some such nonsense.


-- wli

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-25  4:11       ` William Lee Irwin III
@ 2003-08-25 22:58         ` Mike Fedyk
  2003-08-26  9:46           ` William Lee Irwin III
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Fedyk @ 2003-08-25 22:58 UTC (permalink / raw)
  To: Takao Indoh, linux-kernel

On Sun, Aug 24, 2003 at 09:11:17PM -0700, William Lee Irwin III wrote:
> On Thu, Aug 21, 2003 at 09:49:45AM +0900, Takao Indoh wrote:
> >>> Actually, in the system I constructed (RedHat AdvancedServer 2.1, kernel
> >>> 2.4.9 based), the problem occurred due to the pagecache. The system's maximum
> >>> response time had to be less than 4 seconds, but owing to the pagecache,
> >>> response times became uneven, and the maximum reached 10 seconds.
> 
> On Thu, 21 Aug 2003 16:47:09 -0700, Mike Fedyk wrote:
> >> Please try the 2.4.18 based redhat kernel, or the 2.4-aa kernel.
> 
> On Mon, Aug 25, 2003 at 11:45:58AM +0900, Takao Indoh wrote:
> > I need a tuning parameter which can control the pagecache,
> > like the /proc/sys/vm/pagecache that RedHat Linux has.
> > The latest 2.4 and 2.5 standard kernels do not have such a parameter.
> > Do the 2.4.18 or 2.4-aa kernels have an alternative method?
> 

Takao,

I doubt that there will be that option in the 2.4 stable series.  I think
you are trying to fix the problem without understanding the entire picture.
If there is too much pagecache, then the kernel developers need to know
about your workload so that they can fix it.  But you have to try -aa first
to see if it's already fixed.

> This is moderately misguided; essentially the only way userspace can
> utilize RAM at all is via the pagecache. It's not useful to limit this;
> you probably need inode-highmem or some such nonsense.

Exactly.  Every program you have opened, and all of its libraries will show
up as pagecache memory also, so seeing a large pagecache in and of itself
may not be a problem.

Let's get past the tuning parameter you want in /proc, and tell us more
about what you are doing that is causing this problem to be shown.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-25 22:58         ` Mike Fedyk
@ 2003-08-26  9:46           ` William Lee Irwin III
  2003-08-27  9:36             ` Takao Indoh
  0 siblings, 1 reply; 37+ messages in thread
From: William Lee Irwin III @ 2003-08-26  9:46 UTC (permalink / raw)
  To: Takao Indoh, linux-kernel

On Sun, Aug 24, 2003 at 09:11:17PM -0700, William Lee Irwin III wrote:
>> This is moderately misguided; essentially the only way userspace can
>> utilize RAM at all is via the pagecache. It's not useful to limit this;
>> you probably need inode-highmem or some such nonsense.

On Mon, Aug 25, 2003 at 03:58:47PM -0700, Mike Fedyk wrote:
> Exactly.  Every program you have opened, and all of its libraries will show
> up as pagecache memory also, so seeing a large pagecache in and of itself
> may not be a problem.
> Let's get past the tuning parameter you want in /proc, and tell us more
> about what you are doing that is causing this problem to be shown.

One thing I thought of after the post was whether they actually had in
mind tunable hard limits on _unmapped_ pagecache, which is, in fact,
useful. OTOH that's largely speculation and we really need them to
articulate the true nature of their problem.


-- wli

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-26  9:46           ` William Lee Irwin III
@ 2003-08-27  9:36             ` Takao Indoh
  2003-08-27  9:45               ` William Lee Irwin III
  0 siblings, 1 reply; 37+ messages in thread
From: Takao Indoh @ 2003-08-27  9:36 UTC (permalink / raw)
  To: William Lee Irwin III, Mike Fedyk; +Cc: linux-kernel

Thanks for advice.

On Mon, 25 Aug 2003 15:58:47 -0700, Mike Fedyk wrote:

>I doubt that there will be that option in the 2.4 stable series.  I think
>you are trying to fix the problem without understanding the entire picture.
>If there is too much pagecache, then the kernel developers need to know
>about your workload so that they can fix it.  But you have to try -aa first
>to see if it's already fixed.
>
>> This is moderately misguided; essentially the only way userspace can
>> utilize RAM at all is via the pagecache. It's not useful to limit this;
>> you probably need inode-highmem or some such nonsense.
>
>Exactly.  Every program you have opened, and all of its libraries will show
>up as pagecache memory also, so seeing a large pagecache in and of itself
>may not be a problem.
>
>Let's get past the tuning parameter you want in /proc, and tell us more
>about what you are doing that is causing this problem to be shown.

This problem happened a few months ago and the detailed data no longer
remains. Therefore it is difficult to know the essential cause of
this problem, but my guess is that pagecache used as I/O cache grew
gradually while the system ran, and finally it oppressed memory.

Besides this problem, I think there are many cases where growth of the
pagecache causes trouble. For example, a DBMS.
A DBMS caches its DB indexes in its own process space.
This index cache conflicts with the pagecache used by other applications,
and the index cache may be paged out. That causes uneven DBMS response times.
In this case, limiting the pagecache is effective.


On Tue, 26 Aug 2003 02:46:34 -0700, William Lee Irwin III wrote:

>One thing I thought of after the post was whether they actually had in
>mind tunable hard limits on _unmapped_ pagecache, which is, in fact,
>useful. OTOH that's largely speculation and we really need them to
>articulate the true nature of their problem.

I also think that would be effective. Empirically, in the cases where the
pagecache causes a memory shortage, most of the pagecache is unmapped pages.
Of course, the real problem may not be the pagecache, as you or Mike said.

--------------------------------------------------
Takao Indoh
 E-Mail : indou.takao@soft.fujitsu.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-27  9:36             ` Takao Indoh
@ 2003-08-27  9:45               ` William Lee Irwin III
  2003-08-27 11:14                 ` Takao Indoh
  2003-08-27 16:01                 ` Joseph Malicki
  0 siblings, 2 replies; 37+ messages in thread
From: William Lee Irwin III @ 2003-08-27  9:45 UTC (permalink / raw)
  To: Takao Indoh; +Cc: Mike Fedyk, linux-kernel

On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> This problem happened a few months ago and the detailed data no longer
> remains. Therefore it is difficult to know the essential cause of
> this problem, but my guess is that pagecache used as I/O cache grew
> gradually while the system ran, and finally it oppressed memory.

But this doesn't make any sense; the only memory you could "oppress"
is pagecache.


On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> Besides this problem, I think there are many cases where growth of the
> pagecache causes trouble. For example, a DBMS.
> A DBMS caches its DB indexes in its own process space.
> This index cache conflicts with the pagecache used by other applications,
> and the index cache may be paged out. That causes uneven DBMS response times.
> In this case, limiting the pagecache is effective.

Why is it effective? You're describing pagecache vs. pagecache
competition and the DBMS outcompeting the cooperating applications for
memory to the detriment of the workload; this is a very different
scenario from what "limiting pagecache" sounds like.

How do you know it would be effective? Have you written a patch to
limit it in some way and tried running it?


On Tue, 26 Aug 2003 02:46:34 -0700, William Lee Irwin III wrote:
>> One thing I thought of after the post was whether they actually had in
>> mind tunable hard limits on _unmapped_ pagecache, which is, in fact,
>> useful. OTOH that's largely speculation and we really need them to
>> articulate the true nature of their problem.

On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> I also think that would be effective. Empirically, in the cases where the
> pagecache causes a memory shortage, most of the pagecache is unmapped pages.
> Of course, the real problem may not be the pagecache, as you or Mike said.

How do you know most of it is unmapped?

At any rate, the above assigns a meaningful definition to the words you
used; it does not necessarily have anything to do with the issue you're
trying to describe. If you could start from the very basics, reproduce
the problem, instrument the workload with top(1) and vmstat(1), and find
some way to describe how the performance is inadequate (e.g. performance
metrics for your running DBMS/whatever in MB/s or transactions/s etc.),
it would be much more helpful than proposing a solution up front.
Without any evidence, we can't know it is a solution at all, or that
it's the right solution.


-- wli

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-27  9:45               ` William Lee Irwin III
@ 2003-08-27 11:14                 ` Takao Indoh
  2003-08-27 11:36                   ` William Lee Irwin III
  2003-08-27 16:01                 ` Joseph Malicki
  1 sibling, 1 reply; 37+ messages in thread
From: Takao Indoh @ 2003-08-27 11:14 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Mike Fedyk, linux-kernel

On Wed, 27 Aug 2003 02:45:12 -0700, William Lee Irwin III wrote:

>On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
>> Besides this problem, I think there are many cases where growth of the
>> pagecache causes trouble. For example, a DBMS.
>> A DBMS caches its DB indexes in its own process space.
>> This index cache conflicts with the pagecache used by other applications,
>> and the index cache may be paged out. That causes uneven DBMS response times.
>> In this case, limiting the pagecache is effective.
>
>Why is it effective? You're describing pagecache vs. pagecache
>competition and the DBMS outcompeting the cooperating applications for
>memory to the detriment of the workload; this is a very different
>scenario from what "limiting pagecache" sounds like.
>
>How do you know it would be effective? Have you written a patch to
>limit it in some way and tried running it?

It's just my guess. You mean that the "index cache" is in the pagecache?
The "index cache" is allocated in user space by malloc,
so I think it is not in the pagecache.


>On Tue, 26 Aug 2003 02:46:34 -0700, William Lee Irwin III wrote:
>>> One thing I thought of after the post was whether they actually had in
>>> mind tunable hard limits on _unmapped_ pagecache, which is, in fact,
>>> useful. OTOH that's largely speculation and we really need them to
>>> articulate the true nature of their problem.
>
>On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
>> I also think that would be effective. Empirically, in the cases where the
>> pagecache causes a memory shortage, most of the pagecache is unmapped pages.
>> Of course, the real problem may not be the pagecache, as you or Mike said.
>
>How do you know most of it is unmapped?

I checked /proc/meminfo.
For example, this is my /proc/meminfo (kernel 2.5.73):

MemTotal:       902728 kB
MemFree:         53096 kB
Buffers:         18520 kB
Cached:         732360 kB
SwapCached:          0 kB
Active:         623068 kB
Inactive:       179552 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       902728 kB
LowFree:         53096 kB
SwapTotal:      506036 kB
SwapFree:       506036 kB
Dirty:           33204 kB
Writeback:           0 kB
Mapped:          73360 kB
Slab:            32468 kB
Committed_AS:   167396 kB
PageTables:        988 kB
VmallocTotal:   122808 kB
VmallocUsed:     20432 kB
VmallocChunk:   102376 kB

According to this information, I thought that
the total pagecache was 732360 kB and the total mapped pages were 73360 kB,
so almost all of the pagecache was not mapped...
Am I misreading meminfo?

--------------------------------------------------
Takao Indoh
 E-Mail : indou.takao@soft.fujitsu.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-27 11:14                 ` Takao Indoh
@ 2003-08-27 11:36                   ` William Lee Irwin III
  2003-09-02 10:52                     ` Takao Indoh
  0 siblings, 1 reply; 37+ messages in thread
From: William Lee Irwin III @ 2003-08-27 11:36 UTC (permalink / raw)
  To: Takao Indoh; +Cc: Mike Fedyk, linux-kernel

On Wed, 27 Aug 2003 02:45:12 -0700, William Lee Irwin III wrote:
>> How do you know it would be effective? Have you written a patch to
>> limit it in some way and tried running it?

On Wed, Aug 27, 2003 at 08:14:12PM +0900, Takao Indoh wrote:
> It's just my guess. You mean that the "index cache" is in the pagecache?
> The "index cache" is allocated in user space by malloc,
> so I think it is not in the pagecache.

That will be in the pagecache.


On Wed, 27 Aug 2003 02:45:12 -0700, William Lee Irwin III wrote:
>> How do you know most of it is unmapped?

On Wed, Aug 27, 2003 at 08:14:12PM +0900, Takao Indoh wrote:
> I checked /proc/meminfo.
> For example, this is my /proc/meminfo (kernel 2.5.73):
[...]
> Buffers:         18520 kB
> Cached:         732360 kB
> SwapCached:          0 kB
> Active:         623068 kB
> Inactive:       179552 kB
[...]
> Dirty:           33204 kB
> Writeback:           0 kB
> Mapped:          73360 kB
> Slab:            32468 kB
> Committed_AS:   167396 kB
[...]
> According to this information, I thought that
> the total pagecache was 732360 kB and the total mapped pages were 73360 kB,
> so almost all of the pagecache was not mapped...
> Am I misreading meminfo?

No. Most of your pagecache is unmapped pagecache. This would correspond
to memory that caches files which are not being mmapped by any process.
This could result from either the page replacement policy favoring
filesystem cache too heavily or from lots of io causing the filesystem
cache to be too bloated and so defeating the swapper's heuristics (you
can do this by generating large amounts of read() traffic).

Limiting unmapped pagecache would resolve your issue. Whether it's the
right thing to do is still open to question without some knowledge of
application behavior (for instance, teaching userspace to do fadvise()
may be right thing to do as opposed to the /proc/ tunable).

Can you gather traces of system calls being made by the applications?


-- wli

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-27  9:45               ` William Lee Irwin III
  2003-08-27 11:14                 ` Takao Indoh
@ 2003-08-27 16:01                 ` Joseph Malicki
  1 sibling, 0 replies; 37+ messages in thread
From: Joseph Malicki @ 2003-08-27 16:01 UTC (permalink / raw)
  To: William Lee Irwin III, Takao Indoh; +Cc: Mike Fedyk, linux-kernel

I've had experience with unneeded *mapped* pages, which would ideally be
flushed, oppressing needed mapped and unmapped pages.
Test case: grep --mmap SOME_STRING_I_WONT_FIND some_multi-GB-file

Sure, it's bad programming etc, but in that case, once those pages are
mapped, they can't be forcibly unmapped even though
in a utopian VM they would be discarded as unneeded.

This could very well be the problem?

-joe

----- Original Message ----- 
From: "William Lee Irwin III" <wli@holomorphy.com>
To: "Takao Indoh" <indou.takao@soft.fujitsu.com>
Cc: "Mike Fedyk" <mfedyk@matchmail.com>; <linux-kernel@vger.kernel.org>
Sent: Wednesday, August 27, 2003 5:45 AM
Subject: Re: cache limit


> On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> > This problem happened a few months ago and the detailed data no longer
> > remains. Therefore it is difficult to know the essential cause of
> > this problem, but my guess is that pagecache used as I/O cache grew
> > gradually while the system ran, and finally it oppressed memory.
>
> But this doesn't make any sense; the only memory you could "oppress"
> is pagecache.
>
>
> On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> > Besides this problem, I think there are many cases where growth of the
> > pagecache causes trouble. For example, a DBMS.
> > A DBMS caches its DB indexes in its own process space.
> > This index cache conflicts with the pagecache used by other applications,
> > and the index cache may be paged out. That causes uneven DBMS response times.
> > In this case, limiting the pagecache is effective.
>
> Why is it effective? You're describing pagecache vs. pagecache
> competition and the DBMS outcompeting the cooperating applications for
> memory to the detriment of the workload; this is a very different
> scenario from what "limiting pagecache" sounds like.
>
> How do you know it would be effective? Have you written a patch to
> limit it in some way and tried running it?
>
>
> On Tue, 26 Aug 2003 02:46:34 -0700, William Lee Irwin III wrote:
> >> One thing I thought of after the post was whether they actually had in
> >> mind tunable hard limits on _unmapped_ pagecache, which is, in fact,
> >> useful. OTOH that's largely speculation and we really need them to
> >> articulate the true nature of their problem.
>
> On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> > > I also think that would be effective. Empirically, in the cases where the
> > > pagecache causes a memory shortage, most of the pagecache is unmapped pages.
> > > Of course, the real problem may not be the pagecache, as you or Mike said.
>
> How do you know most of it is unmapped?
>
> At any rate, the above assigns a meaningful definition to the words you
> used; it does not necessarily have anything to do with the issue you're
> trying to describe. If you could start from the very basics, reproduce
> the problem, instrument the workload with top(1) and vmstat(1), and find
> some way to describe how the performance is inadequate (e.g. performance
> metrics for your running DBMS/whatever in MB/s or transactions/s etc.),
> it would be much more helpful than proposing a solution up front.
> Without any evidence, we can't know it is a solution at all, or that
> it's the right solution.
>
>
> -- wli
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-08-27 11:36                   ` William Lee Irwin III
@ 2003-09-02 10:52                     ` Takao Indoh
  2003-09-02 11:30                       ` William Lee Irwin III
  2003-09-02 17:21                       ` Mike Fedyk
  0 siblings, 2 replies; 37+ messages in thread
From: Takao Indoh @ 2003-09-02 10:52 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Mike Fedyk, linux-kernel

On Wed, 27 Aug 2003 04:36:46 -0700, William Lee Irwin III wrote:

>On Wed, 27 Aug 2003 02:45:12 -0700, William Lee Irwin III wrote:
>>> How do you know most of it is unmapped?
>
>On Wed, Aug 27, 2003 at 08:14:12PM +0900, Takao Indoh wrote:
>> I checked /proc/meminfo.
>> For example, this is my /proc/meminfo(kernel 2.5.73)
>[...]
>> Buffers:         18520 kB
>> Cached:         732360 kB
>> SwapCached:          0 kB
>> Active:         623068 kB
>> Inactive:       179552 kB
>[...]
>> Dirty:           33204 kB
>> Writeback:           0 kB
>> Mapped:          73360 kB
>> Slab:            32468 kB
>> Committed_AS:   167396 kB
>[...]
>> According to this information, I thought that
>> the total pagecache was 732360 kB and the total mapped pages were 73360 kB,
>> so almost all of the pagecache was not mapped...
>> Am I misreading meminfo?
>
>No. Most of your pagecache is unmapped pagecache. This would correspond
>to memory that caches files which are not being mmapped by any process.
>This could result from either the page replacement policy favoring
>filesystem cache too heavily or from lots of io causing the filesystem
>cache to be too bloated and so defeating the swapper's heuristics (you
>can do this by generating large amounts of read() traffic).
>
>Limiting unmapped pagecache would resolve your issue. Whether it's the
>right thing to do is still open to question without some knowledge of
>application behavior (for instance, teaching userspace to do fadvise()
>may be right thing to do as opposed to the /proc/ tunable).
>
>Can you gather traces of system calls being made by the applications?
>
>
>-- wli

This is the output of strace -cf.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.91   57.459696          65    885531           read
 20.27   24.309112          33    727702           write
 17.01   20.405231        1058     19292           vfork
 13.41   16.087846          11   1524468           lseek
  0.32    0.379605          10     38586           close
  0.31    0.368272          19     19290           wait4
  0.27    0.326425          17     19292           pipe
  0.19    0.227052          12     19296           old_mmap
  0.16    0.192420          10     19291           munmap
  0.13    0.158041           8     19302           fstat64
  0.01    0.009983        4992         2           fsync
  0.00    0.001233           6       202           brk
  0.00    0.001029         515         2           unlink
  0.00    0.000173          25         7         1 open
  0.00    0.000128          64         2           chmod
  0.00    0.000092           7        13         6 stat64
  0.00    0.000019          19         1           getcwd
  0.00    0.000016           5         3           access
  0.00    0.000015           4         4           shmat
  0.00    0.000012           3         4           rt_sigaction
  0.00    0.000007           7         1           mprotect
  0.00    0.000002           2         1           getpid
------ ----------- ----------- --------- --------- ----------------
100.00  119.926409               3292292         7 total

According to this information, heavy I/O increases the pagecache and causes
a memory shortage.

fadvise may be effective, but fadvise always releases the cache
even if there is enough free memory, and so may degrade performance.
With a /proc tunable,
the pagecache would not be released until system memory runs low.


On Thu, 28 Aug 2003 01:02:45 +0900, YoshiyaETO wrote:

>> On Wed, 27 Aug 2003 02:45:12 -0700, William Lee Irwin III wrote:
>> >> How do you know it would be effective? Have you written a patch to
>> >> limit it in some way and tried running it?
>>
>> On Wed, Aug 27, 2003 at 08:14:12PM +0900, Takao Indoh wrote:
>> > It's just my guess. You mean that the "index cache" is in the pagecache?
>> > The "index cache" is allocated in user space by malloc,
>> > so I think it is not in the pagecache.
>>
>> That will be in the pagecache.
>
>    No. A DBMS usually uses DIRECTIO, which bypasses the pagecache.
>So, "index caches" in the DBMS user space will not be in the pagecache.

If so, limiting the pagecache seems like it would be effective for a DBMS.

--------------------------------------------------
Takao Indoh
 E-Mail : indou.takao@soft.fujitsu.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-09-02 10:52                     ` Takao Indoh
@ 2003-09-02 11:30                       ` William Lee Irwin III
  2003-09-02 17:21                       ` Mike Fedyk
  1 sibling, 0 replies; 37+ messages in thread
From: William Lee Irwin III @ 2003-09-02 11:30 UTC (permalink / raw)
  To: Takao Indoh; +Cc: Mike Fedyk, linux-kernel

On Tue, Sep 02, 2003 at 07:52:51PM +0900, Takao Indoh wrote:
> According to this information, heavy I/O increases the pagecache and causes
> a memory shortage.
> fadvise may be effective, but fadvise always releases the cache
> even if there is enough free memory, and so may degrade performance.
> With a /proc tunable,
> the pagecache would not be released until system memory runs low.
[...]
> If so, limiting pagecache seems to be effective for DBMS.

There are reasons why databases use raw io and direct io; this is one
of them. I'd say the kernel shouldn't try to engage in such tunable
shenanigans.


-- wli

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
  2003-09-02 10:52                     ` Takao Indoh
  2003-09-02 11:30                       ` William Lee Irwin III
@ 2003-09-02 17:21                       ` Mike Fedyk
  1 sibling, 0 replies; 37+ messages in thread
From: Mike Fedyk @ 2003-09-02 17:21 UTC (permalink / raw)
  To: Takao Indoh; +Cc: William Lee Irwin III, linux-kernel

On Tue, Sep 02, 2003 at 07:52:51PM +0900, Takao Indoh wrote:
> >> According to this information, I thought that
> >> the total pagecache was 732360 kB and the total mapped pages were 73360 kB,
> >> so almost all of the pagecache was not mapped...
> >> Am I misreading meminfo?

Can you try your workload again with:

echo 0 > /proc/sys/vm/swappiness

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
@ 2003-08-27 16:03 Joseph Malicki
  0 siblings, 0 replies; 37+ messages in thread
From: Joseph Malicki @ 2003-08-27 16:03 UTC (permalink / raw)
  To: William Lee Irwin III, Takao Indoh; +Cc: Mike Fedyk, linux-kernel

I was premature about the test case, but still, a process (or several) that
mmaps several GBs of files and doesn't unmap what it doesn't need has caused
issues in the past.

-joe

----- Original Message ----- 
From: "Joseph Malicki" <jmalicki@starbak.net>
To: "William Lee Irwin III" <wli@holomorphy.com>; "Takao Indoh"
<indou.takao@soft.fujitsu.com>
Cc: "Mike Fedyk" <mfedyk@matchmail.com>; <linux-kernel@vger.kernel.org>
Sent: Wednesday, August 27, 2003 12:01 PM
Subject: Re: cache limit


> I've had experience with unneeded *mapped* pages, which would ideally be
> flushed, oppressing needed mapped and unmapped pages.
> Test case: grep --mmap SOME_STRING_I_WONT_FIND some_multi-GB-file
>
> Sure, it's bad programming etc, but in that case, once those pages are
> mapped, they can't be forcibly unmapped even though
> in a utopian VM they would be discarded as unneeded.
>
> This could very well be the problem?
>
> -joe
>
> ----- Original Message ----- 
> From: "William Lee Irwin III" <wli@holomorphy.com>
> To: "Takao Indoh" <indou.takao@soft.fujitsu.com>
> Cc: "Mike Fedyk" <mfedyk@matchmail.com>; <linux-kernel@vger.kernel.org>
> Sent: Wednesday, August 27, 2003 5:45 AM
> Subject: Re: cache limit
>
>
> > On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> > > This problem happened a few months ago and the detailed data no longer
> > > remains. Therefore it is difficult to know the essential cause of
> > > this problem, but my guess is that pagecache used as I/O cache grew
> > > gradually while the system ran, and finally it oppressed memory.
> >
> > But this doesn't make any sense; the only memory you could "oppress"
> > is pagecache.
> >
> >
> > On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> > > Besides this problem, I think there are many cases where growth of the
> > > pagecache causes trouble. For example, a DBMS.
> > > A DBMS caches its DB indexes in its own process space.
> > > This index cache conflicts with the pagecache used by other applications,
> > > and the index cache may be paged out. That causes uneven DBMS response times.
> > > In this case, limiting the pagecache is effective.
> >
> > Why is it effective? You're describing pagecache vs. pagecache
> > competition and the DBMS outcompeting the cooperating applications for
> > memory to the detriment of the workload; this is a very different
> > scenario from what "limiting pagecache" sounds like.
> >
> > How do you know it would be effective? Have you written a patch to
> > limit it in some way and tried running it?
> >
> >
> > On Tue, 26 Aug 2003 02:46:34 -0700, William Lee Irwin III wrote:
> > >> One thing I thought of after the post was whether they actually had in
> > >> mind tunable hard limits on _unmapped_ pagecache, which is, in fact,
> > >> useful. OTOH that's largely speculation and we really need them to
> > >> articulate the true nature of their problem.
> >
> > On Wed, Aug 27, 2003 at 06:36:10PM +0900, Takao Indoh wrote:
> > > I also think that would be effective. Empirically, in the cases where the
> > > pagecache causes a memory shortage, most of the pagecache is unmapped pages.
> > > Of course, the real problem may not be the pagecache, as you or Mike said.
> >
> > How do you know most of it is unmapped?
> >
> > At any rate, the above assigns a meaningful definition to the words you
> > used; it does not necessarily have anything to do with the issue you're
> > trying to describe. If you could start from the very basics, reproduce
> > the problem, instrument the workload with top(1) and vmstat(1), and find
> > some way to describe how the performance is inadequate (e.g. performance
> > metrics for your running DBMS/whatever in MB/s or transactions/s etc.),
> > it would be much more helpful than proposing a solution up front.
> > Without any evidence, we can't know it is a solution at all, or that
> > it's the right solution.
> >
> >
> > -- wli
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: cache limit
       [not found] <000801c36cb1$454d4950$1001a8c0@etofmv650>
@ 2003-08-27 16:02 ` YoshiyaETO
  0 siblings, 0 replies; 37+ messages in thread
From: YoshiyaETO @ 2003-08-27 16:02 UTC (permalink / raw)
  To: linux-kernel

> On Wed, 27 Aug 2003 02:45:12 -0700, William Lee Irwin III wrote:
> >> How do you know it would be effective? Have you written a patch to
> >> limit it in some way and tried running it?
>
> On Wed, Aug 27, 2003 at 08:14:12PM +0900, Takao Indoh wrote:
> > It's just my guess. You mean that the "index cache" is in the pagecache?
> > The "index cache" is allocated in user space by malloc,
> > so I think it is not in the pagecache.
>
> That will be in the pagecache.

    No. A DBMS usually uses direct I/O (e.g. O_DIRECT), which bypasses the
pagecache. So the "index caches" in the DBMS's user space will not be in
the pagecache.



* Re: cache limit
  2003-08-27 10:21               ` Ihar 'Philips' Filipau
@ 2003-08-27 11:07                 ` Nick Piggin
  0 siblings, 0 replies; 37+ messages in thread
From: Nick Piggin @ 2003-08-27 11:07 UTC (permalink / raw)
  To: Ihar 'Philips' Filipau; +Cc: Mike Fedyk, Linux Kernel Mailing List



Ihar 'Philips' Filipau wrote:

> Mike Fedyk wrote:
>
>>
>> That was because they wanted the non-streaming files to be left in 
>> the cache.
>>
>>>  I will try to produce some benchmarks tomorrow with different 
>>> 'mem=%dMB'. I'm afraid they will confirm that it makes a difference.
>>>  But in advance: maintenance of page tables for 1GB and for 128MB of 
>>> RAM is going to make a difference.
>>
>>
>> I'm sorry to say, but you *will* get lower performance if you lower 
>> the mem=
>> value below your working set.  This will also lower the total amount of
>> memory available for your applications, and force your apps to swap and
>> to balance cache and app memory.
>>
>> That's not what you are looking to benchmark.
>>
>
>   Okay. I'm completely puzzled.
>   I will quote here only one test - and I really do not understand this 
> stuff.
>
>   Three boots with the same parameters and only mem=nMB, n = 
> {512,256,128} (I have 512MB RAM)
>
>   hdparm tests:
> [root@hera ifilipau]# hdparm -t /dev/hda
> /dev/hda:
>  Timing buffered disk reads:  64 MB in  1.56 seconds = 41.03 MB/sec
> [root@hera ifilipau]# hdparm -T /dev/hda
> /dev/hda:
>  Timing buffer-cache reads:   128 MB in  0.44 seconds =290.91 MB/sec
> [root@hera ifilipau]#
>
>   Before tests I was doing 'swapoff -a; sync'
>   RedHat's 2.4.20-20.9 kernel.
>
>   What has really puzzled me.
>   Operation: "cat *.bz2 >big_file", where *.bz2 is just two bzipped 
> kernels. Total size: 29MB+32MB (2.4.22 + 2.6.0-test1)
>
>   To be absolutely fair in this unfair benchmark I have run the test only 
> once. Times in seconds as shown by bash's time.
>
>            cat      sync
>   512MB:  1.565    0.007
>   256MB:  1.649    0.008
>   128MB:  2.184    0.007
>
>   Kill me - shoot me, but how can this be?
>   Resulting file fits RAM.
>   Not hard to guess that source files, which no one cares about 
> already - are still hanging in the RAM...
>
>   That's not right: as long as resulting file fits memory - and it 
> fits memory in all (512MB, 256MB, 128MB) cases - this operation should 
> take the _same_ time. (Actually before 128MB test, vmstat was saying 
> that I have +70MB of free non-touched memory)
>
>   So the summary is quite simple: the kernel loses a *terrible* amount 
> of time reordering read()s against write()s. Way _too_ _much_ time. 


The kernel spends _very_ little time in the disk elevator actually. The
2.4 elevator can send very suboptimal orderings of requests to the disk
when reads and writes are going to the disk at the same time. That might
be happening here. The VM might also be doing more work if you have other
things in RAM as well, although it's unlikely to cause such a big
difference.




* Re: cache limit
  2003-08-26 19:23             ` Mike Fedyk
@ 2003-08-27 10:21               ` Ihar 'Philips' Filipau
  2003-08-27 11:07                 ` Nick Piggin
  0 siblings, 1 reply; 37+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-08-27 10:21 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Linux Kernel Mailing List

Mike Fedyk wrote:
> 
> That was because they wanted the non-streaming files to be left in the cache.
> 
>>  I will try to produce some benchmarks tomorrow with different 
>>'mem=%dMB'. I'm afraid they will confirm that it makes a difference.
>>  But in advance: maintenance of page tables for 1GB and for 128MB of 
>>RAM is going to make a difference.
> 
> I'm sorry to say, but you *will* get lower performance if you lower the mem=
> value below your working set.  This will also lower the total amount of
> memory available for your applications, and force your apps to swap and
> to balance cache and app memory.
> 
> That's not what you are looking to benchmark.
> 

   Okay. I'm completely puzzled.
   I will quote here only one test - and I really do not understand this 
stuff.

   Three boots with the same parameters and only mem=nMB, n = 
{512,256,128} (I have 512MB RAM)

   hdparm tests:
[root@hera ifilipau]# hdparm -t /dev/hda
/dev/hda:
  Timing buffered disk reads:  64 MB in  1.56 seconds = 41.03 MB/sec
[root@hera ifilipau]# hdparm -T /dev/hda
/dev/hda:
  Timing buffer-cache reads:   128 MB in  0.44 seconds =290.91 MB/sec
[root@hera ifilipau]#

   Before tests I was doing 'swapoff -a; sync'
   RedHat's 2.4.20-20.9 kernel.

   What has really puzzled me.
   Operation: "cat *.bz2 >big_file", where *.bz2 is just two bzipped 
kernels. Total size: 29MB+32MB (2.4.22 + 2.6.0-test1)

   To be absolutely fair in this unfair benchmark I have run the test only 
once. Times in seconds as shown by bash's time.

            cat      sync
   512MB:  1.565    0.007
   256MB:  1.649    0.008
   128MB:  2.184    0.007

   Kill me - shoot me, but how can this be?
   Resulting file fits RAM.
   Not hard to guess that source files, which no one cares about already 
- are still hanging in the RAM...

   That's not right: as long as resulting file fits memory - and it fits 
memory in all (512MB, 256MB, 128MB) cases - this operation should take 
the _same_ time. (Actually before 128MB test, vmstat was saying that I 
have +70MB of free non-touched memory)

   So the summary is quite simple: the kernel loses a *terrible* amount 
of time reordering read()s against write()s. Way _too_ _much_ time.

   I will try to download RedHat's AS kernel and play with page-cache.
   After all: if RH has included that feature in their kernels - that 
means it really makes sense ;-)))

-- 
Ihar 'Philips' Filipau  / with best regards from Saarbruecken.
   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
   * Please avoid sending me Word/PowerPoint/Excel attachments.
   * See http://www.fsf.org/philosophy/no-word-attachments.html
   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
    There should be some SCO's source code in Linux -
       my servers sometimes are crashing.      -- People



* Re: cache limit
  2003-08-26 19:08           ` Ihar 'Philips' Filipau
@ 2003-08-26 19:23             ` Mike Fedyk
  2003-08-27 10:21               ` Ihar 'Philips' Filipau
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Fedyk @ 2003-08-26 19:23 UTC (permalink / raw)
  To: Ihar 'Philips' Filipau; +Cc: Linux Kernel Mailing List

On Tue, Aug 26, 2003 at 09:08:51PM +0200, Ihar 'Philips' Filipau wrote:
> Mike Fedyk wrote:
> >Ok, let's benchmark it.
> >
> >Yes, I can see the logic in your argument, but at this point, numbers are
> >needed to see if or how much of a win this might be.
> 
>   [ I believe you can see the thread about the O_STREAMING patch. 
> Not-caching was giving a 10%-15% performance boost for gcc on kernel 
> compiles. Isn't that overhead? ]
> 

That was because they wanted the non-streaming files to be left in the cache.

>   I will try to produce some benchmarks tomorrow with different 
> 'mem=%dMB'. I'm afraid they will confirm that it makes a difference.
>   But in advance: maintenance of page tables for 1GB and for 128MB of 
> RAM is going to make a difference.

I'm sorry to say, but you *will* get lower performance if you lower the mem=
value below your working set.  This will also lower the total amount of
memory available for your applications, and force your apps to swap and
to balance cache and app memory.

That's not what you are looking to benchmark.


* Re: cache limit
       [not found]         ` <oQh2.4bQ.13@gated-at.bofh.it>
@ 2003-08-26 19:08           ` Ihar 'Philips' Filipau
  2003-08-26 19:23             ` Mike Fedyk
  0 siblings, 1 reply; 37+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-08-26 19:08 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Linux Kernel Mailing List

Mike Fedyk wrote:
> On Tue, Aug 26, 2003 at 12:15:46PM +0200, Ihar 'Philips' Filipau wrote:
>>  If I have 1GB of memory and my applications use only 16MB - it 
>>doesn't mean I want to fill 1GB-16MB with garbage like a file my mommy 
>>viewed two weeks ago.
>>
>>  That's it: the OS should scale to *application* *needs*.
>>
>>  Can you compare in your mind the overhead of managing 1GB of cache 
>>with managing e.g. 16MB of cache?
>>
> 
> Ok, let's benchmark it.
> 
> Yes, I can see the logic in your argument, but at this point, numbers are
> needed to see if or how much of a win this might be.

   [ I believe you can see the thread about the O_STREAMING patch. 
Not-caching was giving a 10%-15% performance boost for gcc on kernel 
compiles. Isn't that overhead? ]

   I will try to produce some benchmarks tomorrow with different 
'mem=%dMB'. I'm afraid they will confirm that it makes a difference.
   But in advance: maintenance of page tables for 1GB and for 128MB of 
RAM is going to make a difference.



* Re: cache limit
  2003-08-26 10:15       ` Ihar 'Philips' Filipau
@ 2003-08-26 17:46         ` Mike Fedyk
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Fedyk @ 2003-08-26 17:46 UTC (permalink / raw)
  To: Ihar 'Philips' Filipau; +Cc: Linux Kernel Mailing List, Takao Indoh

On Tue, Aug 26, 2003 at 12:15:46PM +0200, Ihar 'Philips' Filipau wrote:
>   If I have 1GB of memory and my applications use only 16MB - it 
> doesn't mean I want to fill 1GB-16MB with garbage like a file my mommy 
> viewed two weeks ago.
> 
>   That's it: the OS should scale to *application* *needs*.
> 
>   Can you compare in your mind the overhead of managing 1GB of cache 
> with managing e.g. 16MB of cache?
> 

Ok, let's benchmark it.

Yes, I can see the logic in your argument, but at this point, numbers are
needed to see if or how much of a win this might be.


* Re: cache limit
       [not found]     ` <oyDw.5FP.33@gated-at.bofh.it>
@ 2003-08-26 10:15       ` Ihar 'Philips' Filipau
  2003-08-26 17:46         ` Mike Fedyk
  0 siblings, 1 reply; 37+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-08-26 10:15 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Linux Kernel Mailing List, Takao Indoh

Mike Fedyk wrote:
>>On Mon, Aug 25, 2003 at 11:45:58AM +0900, Takao Indoh wrote:
>>
>>>I need a tuning parameter which can control pagecache
>>>like /proc/sys/vm/pagecache, which RedHat Linux has.
>>>The latest 2.4 or 2.5 standard kernel does not have such a parameter.
>>>2.4.18 kernel or 2.4-aa kernel has a alternative method?
> 
> I doubt that there will be that option in the 2.4 stable series.  I think
> you are trying to fix the problem without understanding the entire picture.
> If there is too much pagechache, then the kernel developers need to know
> about your workload so that they can fix it.  But you have to try -aa first
> to see if it's already fixed.
> 

   Let me give my point of view.

   Linux tries to scale up to the limits of the given hardware.

   That is _*horribly*_ wrong.

   If I have 1GB of memory and my applications use only 16MB - it 
doesn't mean I want to fill 1GB-16MB with garbage like a file my mommy 
viewed two weeks ago.

   That's it: the OS should scale to *application* *needs*.

   Can you compare in your mind the overhead of managing 1GB of cache 
with managing e.g. 16MB of cache?

   So IMHO the problem is: needless OS overhead.

   It is possible to minimize overhead in several ways:
   1) Optimize algorithms and data structures.
   2) Minimize the amount of resources.
   3) As a compromise of 1&2 - teach the OS not to use unneeded resources 
until they are really needed, and free them afterwards.

   1) is already done, and 3) is awful heuristics which will never work 
reliably.
   And Takao's patch was trying to approach the problem from point 2).
   So, to me, it is justified.

   Comments are welcome.



* Re: cache limit
  2003-08-21  9:52   ` Ihar 'Philips' Filipau
@ 2003-08-25  7:17     ` Takao Indoh
  0 siblings, 0 replies; 37+ messages in thread
From: Takao Indoh @ 2003-08-25  7:17 UTC (permalink / raw)
  To: Ihar 'Philips' Filipau; +Cc: Linux Kernel Mailing List

Thank you for your interest in my patch.

On Thu, 21 Aug 2003 11:52:52 +0200, Ihar 'Philips' Filipau wrote:

>Takao Indoh wrote:
>> 
>> I made a patch to add a new parameter, /proc/sys/vm/pgcache-max. It
>> controls the maximum number of pages used as pagecache.
>> The attached file is just a test patch, so it may contain bugs or ugly
>> code. Please let me know if you have any advice, comments, a better
>> implementation, and so on.
>> 
>
>   Do you have something like this for 2.4 kernels?

No, I have only a patch for the 2.5 kernel.
But RedHat AdvancedServer 2.1 (a 2.4.9-based kernel) has a similar
parameter (/proc/sys/vm/pagecache). If you can get the source, please
check it.

>
>   [ I expected to find that by default Linux stops polluting memory 
>with cache when there are no more free pages. But as I see, your patch is 
>hacking something somewhere in the middle... But I'm not a specialist in 
>VM... Gone reading sources. ]
>
>   Thanks for the patch.

I'm not a specialist in VM either, so that patch may have many bugs.
What the patch does is very simple:
1) Add a PG_pgcache flag to each page used as pagecache.
2) Watch the total amount of pagecache.
3) If the amount of pagecache exceeds the maximum,
   try to remove only pages which have the PG_pgcache flag.

Thanks.
--------------------------------------------------
Takao Indoh
 E-Mail : indou.takao@soft.fujitsu.com


* Re: cache limit
       [not found] ` <mLY8.dO.5@gated-at.bofh.it>
@ 2003-08-21  9:52   ` Ihar 'Philips' Filipau
  2003-08-25  7:17     ` Takao Indoh
  0 siblings, 1 reply; 37+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-08-21  9:52 UTC (permalink / raw)
  To: Takao Indoh; +Cc: Linux Kernel Mailing List

Takao Indoh wrote:
> 
> I made a patch to add a new parameter, /proc/sys/vm/pgcache-max. It
> controls the maximum number of pages used as pagecache.
> The attached file is just a test patch, so it may contain bugs or ugly
> code. Please let me know if you have any advice, comments, a better
> implementation, and so on.
> 

   Do you have something like this for 2.4 kernels?

   [ I expected to find that by default Linux stops polluting memory 
with cache when there are no more free pages. But as I see, your patch is 
hacking something somewhere in the middle... But I'm not a specialist in 
VM... Gone reading sources. ]

   Thanks for the patch.



end of thread, other threads:[~2003-09-02 17:52 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-19  4:39 cache limit Anthony R.
2003-08-19  4:57 ` Nuno Silva
2003-08-19  5:33 ` Denis Vlasenko
2003-08-19  6:20   ` Andrew Morton
2003-08-19  9:05     ` J.A. Magallon
2003-08-19  9:16       ` Andrew Morton
2003-08-19  9:28         ` J.A. Magallon
2003-08-19  9:43           ` Andrew Morton
2003-08-19 13:32     ` Erik Andersen
2003-08-19 20:56       ` Andrew Morton
2003-08-19 14:28   ` Anthony R.
2003-08-19 18:26     ` Mike Fedyk
2003-08-19  5:42 ` Nick Piggin
2003-08-21  0:49 ` Takao Indoh
2003-08-21 23:47   ` Mike Fedyk
2003-08-25  2:45     ` Takao Indoh
2003-08-25  4:11       ` William Lee Irwin III
2003-08-25 22:58         ` Mike Fedyk
2003-08-26  9:46           ` William Lee Irwin III
2003-08-27  9:36             ` Takao Indoh
2003-08-27  9:45               ` William Lee Irwin III
2003-08-27 11:14                 ` Takao Indoh
2003-08-27 11:36                   ` William Lee Irwin III
2003-09-02 10:52                     ` Takao Indoh
2003-09-02 11:30                       ` William Lee Irwin III
2003-09-02 17:21                       ` Mike Fedyk
2003-08-27 16:01                 ` Joseph Malicki
     [not found] <m6Bv.3ys.1@gated-at.bofh.it>
     [not found] ` <mLY8.dO.5@gated-at.bofh.it>
2003-08-21  9:52   ` Ihar 'Philips' Filipau
2003-08-25  7:17     ` Takao Indoh
     [not found] <n7lV.2HA.19@gated-at.bofh.it>
     [not found] ` <ofAJ.4dx.9@gated-at.bofh.it>
     [not found]   ` <ogZM.5KJ.1@gated-at.bofh.it>
     [not found]     ` <oyDw.5FP.33@gated-at.bofh.it>
2003-08-26 10:15       ` Ihar 'Philips' Filipau
2003-08-26 17:46         ` Mike Fedyk
     [not found] <oJ5P.699.21@gated-at.bofh.it>
     [not found] ` <oJ5P.699.23@gated-at.bofh.it>
     [not found]   ` <oJ5P.699.25@gated-at.bofh.it>
     [not found]     ` <oJ5P.699.27@gated-at.bofh.it>
     [not found]       ` <oJ5P.699.19@gated-at.bofh.it>
     [not found]         ` <oQh2.4bQ.13@gated-at.bofh.it>
2003-08-26 19:08           ` Ihar 'Philips' Filipau
2003-08-26 19:23             ` Mike Fedyk
2003-08-27 10:21               ` Ihar 'Philips' Filipau
2003-08-27 11:07                 ` Nick Piggin
     [not found] <000801c36cb1$454d4950$1001a8c0@etofmv650>
2003-08-27 16:02 ` YoshiyaETO
2003-08-27 16:03 Joseph Malicki
