linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: VM Requirement Document - v0.0
       [not found]   ` <fa.e66agbv.hn0u1v@ifi.uio.no>
@ 2001-07-05  1:49     ` Dan Maas
  2001-07-05 13:02       ` Daniel Phillips
  2001-07-05 14:00       ` Xavier Bestel
       [not found]     ` <002501c104f4/mnt/sendme701a8c0@morph>
  1 sibling, 2 replies; 62+ messages in thread
From: Dan Maas @ 2001-07-05  1:49 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

> Getting the user's "interactive" programs loaded back
> in afterwards is a separate, much more difficult problem
> IMHO, but no doubt still has a reasonable solution.

Possibly stupid suggestion... Maybe the interactive/GUI programs should wake
up once in a while and touch a couple of their pages? Go too far with this
and you'll just get in the way of performance, but I don't think it would
hurt to have processes waking up every couple of minutes and touching glibc,
libqt, libgtk, etc so they stay hot in memory... A very slow incremental
"caress" of the address space could eliminate the
"I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out"
problem.

Regards,
Dan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05  1:49     ` Dan Maas
@ 2001-07-05 13:02       ` Daniel Phillips
  2001-07-05 14:00       ` Xavier Bestel
  1 sibling, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-05 13:02 UTC (permalink / raw)
  To: Dan Maas; +Cc: linux-kernel, Tom spaziani, Marcelo Tosatti, Rik van Riel

On Thursday 05 July 2001 03:49, you wrote:
> > Getting the user's "interactive" programs loaded back
> > in afterwards is a separate, much more difficult problem
> > IMHO, but no doubt still has a reasonable solution.
>
> Possibly stupid suggestion... Maybe the interactive/GUI programs should
> wake up once in a while and touch a couple of their pages? Go too far with
> this and you'll just get in the way of performance, but I don't think it
> would hurt to have processes waking up every couple of minutes and touching
> glibc, libqt, libgtk, etc so they stay hot in memory... A very slow
> incremental "caress" of the address space could eliminate the
> "I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out"
> problem.

Personally, I'm in idea collection mode for that one.  First things first, 
from my point of view, our basic replacement policy seems to be broken.  The 
algorithms seem to be burning too much cpu and not doing enough useful work.  
Worse, they seem to have a nasty tendency to livelock themselves, i.e., get 
into situations where the mm is doing little other than scanning and 
transfering pages from list to list.  IMHO, if these things were fixed much 
of the 'interactive problem' would go away because reloading the working set 
for the mouse, for example, would just take a few milliseconds.  If not then 
we should take a good hard look at why the desktops have such poor working 
set granularity.

Furthermore, approaches that rely on applications touching what they believe 
to be their own working sets aren't going to work very well if the mm 
incorrectly processes the page reference information, or incorectly balances 
it against other things that might be going on, so lets be sure the basics 
are working properly.  Marcello has the right idea with his attention to 
better memory management statistical monitoring.  How nice it would be if he 
got together with the guy working on the tracing module...

That said, yes, it's good to think about hinting ideas, and maybe bless the 
idea of applications 'touching themselves' (yes, the allusion was 
intentional).

Here's an idea I just came up with while I was composing this... along the 
lines of using unused bandwidth for something that at least has a chance of 
being useful.  Suppose we come to the end of a period of activity, the 
general 'temperature' starts to drop and disks fall idle.  At this point we 
could consult a history of which currently running processes have been 
historically active and grow their working sets by reading in from disk.  
Otherwise, the memory and the disk bandwidth is just wasted, right?  This we 
can do inside the kernel and not require coders to mess up their apps with 
hints.  Of course, they should still take the time to reengineer them to 
reduce the cache footprint.

/me decides to stop spouting and write some code

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05  1:49     ` Dan Maas
  2001-07-05 13:02       ` Daniel Phillips
@ 2001-07-05 14:00       ` Xavier Bestel
  2001-07-05 14:51         ` Daniel Phillips
                           ` (2 more replies)
  1 sibling, 3 replies; 62+ messages in thread
From: Xavier Bestel @ 2001-07-05 14:00 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Dan Maas, linux-kernel, Tom spaziani, Marcelo Tosatti, Rik van Riel

On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> Here's an idea I just came up with while I was composing this... along the 
> lines of using unused bandwidth for something that at least has a chance of 
> being useful.  Suppose we come to the end of a period of activity, the 
> general 'temperature' starts to drop and disks fall idle.  At this point we 
> could consult a history of which currently running processes have been 
> historically active and grow their working sets by reading in from disk.  
> Otherwise, the memory and the disk bandwidth is just wasted, right?  This we 
> can do inside the kernel and not require coders to mess up their apps with 
> hints.  Of course, they should still take the time to reengineer them to 
> reduce the cache footprint.

Well, on a laptop memory and disk bandwith are rarely wasted - they cost
battery life.

Xav


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05 14:00       ` Xavier Bestel
@ 2001-07-05 14:51         ` Daniel Phillips
  2001-07-05 15:00         ` Xavier Bestel
  2001-07-05 15:12         ` Alan Shutko
  2 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-05 14:51 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Dan Maas, linux-kernel, Tom spaziani, Marcelo Tosatti, Rik van Riel

On Thursday 05 July 2001 16:00, Xavier Bestel wrote:
> On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> > Here's an idea I just came up with while I was composing this... along
> > the lines of using unused bandwidth for something that at least has a
> > chance of being useful.  Suppose we come to the end of a period of
> > activity, the general 'temperature' starts to drop and disks fall idle. 
> > At this point we could consult a history of which currently running
> > processes have been historically active and grow their working sets by
> > reading in from disk. Otherwise, the memory and the disk bandwidth is
> > just wasted, right?  This we can do inside the kernel and not require
> > coders to mess up their apps with hints.  Of course, they should still
> > take the time to reengineer them to reduce the cache footprint.
>
> Well, on a laptop memory and disk bandwith are rarely wasted - they cost
> battery life.

Then turn the feature off.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05 14:00       ` Xavier Bestel
  2001-07-05 14:51         ` Daniel Phillips
@ 2001-07-05 15:00         ` Xavier Bestel
  2001-07-05 15:12           ` Daniel Phillips
  2001-07-05 15:12         ` Alan Shutko
  2 siblings, 1 reply; 62+ messages in thread
From: Xavier Bestel @ 2001-07-05 15:00 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Dan Maas, linux-kernel, Tom spaziani, Marcelo Tosatti, Rik van Riel

On 05 Jul 2001 17:04:00 +0200, Daniel Phillips wrote:
> > Well, on a laptop memory and disk bandwith are rarely wasted - they cost
> > battery life.
> 
> Let me comment on this again, having spent a couple of minutes more 
> thinking about it.  Would you be happy paying 1% of your battery life to get 
> 80% less sluggish response after a memory pig exits?

Told like this, of course I agree !

> Also, notice that the scenario we were originally discussing, the off-hours 
> updatedb, doesn't normally happen on laptops because they tend to be 
> suspended at that time.

Suspended != halted. The updatedb stuff starts over when I bring it back
to life (RH6.2, dunno for other distribs)

Xav


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05 14:00       ` Xavier Bestel
@ 2001-07-05 15:04 Daniel Phillips
       [not found] ` <fa.jprli0v.qlofoc@ifi.uio.no>
  2001-07-06 19:09 ` Rik van Riel
  2 siblings, 2 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-05 15:04 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Dan Maas, linux-kernel, Tom spaziani, Marcelo Tosatti, Rik van Riel

On Thursday 05 July 2001 16:00, Xavier Bestel wrote:
> On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> > Here's an idea I just came up with while I was composing this... along
> > the lines of using unused bandwidth for something that at least has a
> > chance of being useful.  Suppose we come to the end of a period of
> > activity, the general 'temperature' starts to drop and disks fall idle. 
> > At this point we could consult a history of which currently running
> > processes have been historically active and grow their working sets by
> > reading in from disk. Otherwise, the memory and the disk bandwidth is
> > just wasted, right?  This we can do inside the kernel and not require
> > coders to mess up their apps with hints.  Of course, they should still
> > take the time to reengineer them to reduce the cache footprint.
>
> Well, on a laptop memory and disk bandwith are rarely wasted - they cost
> battery life.

Let me comment on this again, having spent a couple of minutes more 
thinking about it.  Would you be happy paying 1% of your battery life to get 
80% less sluggish response after a memory pig exits?

Also, notice that the scenario we were originally discussing, the off-hours 
updatedb, doesn't normally happen on laptops because they tend to be 
suspended at that time.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05 15:00         ` Xavier Bestel
@ 2001-07-05 15:12           ` Daniel Phillips
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-05 15:12 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Dan Maas, linux-kernel, Tom spaziani, Marcelo Tosatti, Rik van Riel

On Thursday 05 July 2001 17:00, Xavier Bestel wrote:
> On 05 Jul 2001 17:04:00 +0200, Daniel Phillips wrote:
> > Also, notice that the scenario we were originally discussing, the
> > off-hours updatedb, doesn't normally happen on laptops because they tend
> > to be suspended at that time.
>
> Suspended != halted. The updatedb stuff starts over when I bring it back
> to life (RH6.2, dunno for other distribs)

Yes, but then it's normally overlapped with other work you are doing, like 
trying to read your mail.  Different problem, one we also perform poorly at 
but for different reasons.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05 14:00       ` Xavier Bestel
  2001-07-05 14:51         ` Daniel Phillips
  2001-07-05 15:00         ` Xavier Bestel
@ 2001-07-05 15:12         ` Alan Shutko
  2 siblings, 0 replies; 62+ messages in thread
From: Alan Shutko @ 2001-07-05 15:12 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Xavier Bestel, Dan Maas, linux-kernel, Tom spaziani,
	Marcelo Tosatti, Rik van Riel

Daniel Phillips <phillips@bonn-fries.net> writes:

> Also, notice that the scenario we were originally discussing, the off-hours 
> updatedb, doesn't normally happen on laptops because they tend to be 
> suspended at that time.

No, even worse, it happens when you open the laptop for the first time
in the morning, thanks to anacron.

-- 
Alan Shutko <ats@acm.org> - In a variety of flavors!
For children with short attention spans: boomerangs that don't come back.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-05 15:04 VM Requirement Document - v0.0 Daniel Phillips
       [not found] ` <fa.jprli0v.qlofoc@ifi.uio.no>
@ 2001-07-06 19:09 ` Rik van Riel
  2001-07-06 21:57   ` Daniel Phillips
  1 sibling, 1 reply; 62+ messages in thread
From: Rik van Riel @ 2001-07-06 19:09 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Xavier Bestel, Dan Maas, linux-kernel, Tom spaziani, Marcelo Tosatti

On Thu, 5 Jul 2001, Daniel Phillips wrote:

> Let me comment on this again, having spent a couple of minutes
> more thinking about it.  Would you be happy paying 1% of your
> battery life to get 80% less sluggish response after a memory
> pig exits?

Just to pull a few random numbers out of my ass too,
how about 50% of battery life for the same optimistic
80% less sluggishness ?

How about if it were only 30% of battery life?

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-06 19:09 ` Rik van Riel
@ 2001-07-06 21:57   ` Daniel Phillips
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-06 21:57 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Xavier Bestel, Dan Maas, linux-kernel, Tom spaziani, Marcelo Tosatti

On Friday 06 July 2001 21:09, Rik van Riel wrote:
> On Thu, 5 Jul 2001, Daniel Phillips wrote:
> > Let me comment on this again, having spent a couple of minutes
> > more thinking about it.  Would you be happy paying 1% of your
> > battery life to get 80% less sluggish response after a memory
> > pig exits?
>
> Just to pull a few random numbers out of my ass too,
> how about 50% of battery life for the same optimistic
> 80% less sluggishness ?
>
> How about if it were only 30% of battery life?

It's not as random as that.  The idea being considered was: suppose a 
program starts up, goes through a period of intense, cache-sucking 
activity, then exits.  Could we reload the applications it just 
displaced so that the disk activity to reload them doesn't have to take 
place the first time the user touches the keyboard/mouse.  Sure, we 
obviously can, with how much complexity is another question entirely ;-)

So probably, we could eliminate more than 80% of the latency we now see 
in such a situation, I was being conservative.

Now what's the cost in battery life?  Suppose it's a 128 meg machine, 
1/3 filled with program text and data.  Hopefully, the working sets 
that were evicted are largely coherent so we'll read it back in at a 
rate not too badly degraded from the drive's transfer rate, say 5 
MB/sec.  This gives about three seconds of intense reading to restore 
something resembling the previous working set, then the disk can spin 
down and perhaps the machine will suspend itself.  So the question is, 
how much longer did the machine have to run to do this?  Well, on my 
machine updatedb takes 5-10 minutes, so the 3 seconds of activity 
tacked onto the end of the episode amounts to less than 1%, and this is 
where the 1% figure came from.

I'm not saying this would be an easy hack, just that it's possible and 
the numbers work.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
       [not found]     ` <002501c104f4/mnt/sendme701a8c0@morph>
@ 2001-07-09 12:17       ` Pavel Machek
  2001-07-12 23:46         ` Daniel Phillips
  0 siblings, 1 reply; 62+ messages in thread
From: Pavel Machek @ 2001-07-09 12:17 UTC (permalink / raw)
  To: Dan Maas; +Cc: Daniel Phillips, linux-kernel

Hi!

> > Getting the user's "interactive" programs loaded back
> > in afterwards is a separate, much more difficult problem
> > IMHO, but no doubt still has a reasonable solution.
> 
> Possibly stupid suggestion... Maybe the interactive/GUI programs should wake
> up once in a while and touch a couple of their pages? Go too far with this
> and you'll just get in the way of performance, but I don't think it would
> hurt to have processes waking up every couple of minutes and touching glibc,
> libqt, libgtk, etc so they stay hot in memory... A very slow incremental
> "caress" of the address space could eliminate the
> "I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out"
> problem.

Ugh... Ouch.... Ugly, indeed.

What you might want to do is 

while true; do 
cat /usr/lib/libc* > /dev/null; sleep 1m
cat /usr/lib/qt* > /dev/null; sleep 1m
...
done

running on your system...

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-09 12:17       ` Pavel Machek
@ 2001-07-12 23:46         ` Daniel Phillips
  2001-07-13 21:07           ` Pavel Machek
  0 siblings, 1 reply; 62+ messages in thread
From: Daniel Phillips @ 2001-07-12 23:46 UTC (permalink / raw)
  To: Pavel Machek, Dan Maas; +Cc: linux-kernel

On Monday 09 July 2001 14:17, Pavel Machek wrote:
> > Possibly stupid suggestion... Maybe the interactive/GUI programs
> > should wake up once in a while and touch a couple of their pages?
> > Go too far with this and you'll just get in the way of performance,
> > but I don't think it would hurt to have processes waking up every
> > couple of minutes and touching glibc, libqt, libgtk, etc so they
> > stay hot in memory... A very slow incremental "caress" of the
> > address space could eliminate the
> > "I-just-logged-in-this-morning-and-dammit-everything-has-been-paged
> >-out" problem.
>
> Ugh... Ouch.... Ugly, indeed.
>
> What you might want to do is
>
> while true; do
> cat /usr/lib/libc* > /dev/null; sleep 1m
> cat /usr/lib/qt* > /dev/null; sleep 1m
> ...
> done
>
> running on your system...

90%+ of what you touch that way is likely to be outside your working 
set, and only the libraries get pre-loaded, not the application code or 
data.  An approach where the application 'touches itself' has more 
chance of producing a genuine improvement in response, but is that 
really what we want application programmers spending their time 
writing?  Not to mention the extra code bloat and maintainance overhead.

Maybe there are a some applications out there - perhaps a database that 
for some reason needs to minimize its latency - where the effort is 
worth it, but they're few and far between.  IMHO, only a generic 
facility in the operating system is going to result in anything that's 
worth the effort to implement it.

What would be needed is some kind of memory of swapped out process 
pages so that after one application terminates some pages of another, 
possibly idle process could be read back in.  Naturally, this would 
only be done if the system resources were otherwise unused.  This 
optimization in the same category as readahead - it serves to reduce 
latency - but it provides a benefit only in one specific circumstance.  
On the other hand, the place where it does improve things is highly 
visible, so I don't know, it might be worth trying some experiments 
here.  Not now though, a mature, well-understood vm system is a 
prerequisite.

Well, I just thought of one relatively simple thing that could be done 
in the desktop - redraw the screen after a big app exits and the system 
is otherwise idle.  That at least would page some bitmaps back in and 
touch some drawing methods.  The responsibility for detecting the 
relevant condition would lie with the OS and there would be some 
as-yet-undefined mechanism for notifying the desktop.

This is firmly in the flight-of-fancy category.  What would be real and 
worth doing right now is for some application developers to profile 
their wonderful creations and find out why they're touching so darn 
much memory.  Who hasn't seen their system go into a frenzy as the 
result of bringing up a simple configuration dialog in KDE?  Or 
right-clicking one of the window buttons in Gnome?  It's uncalled for, 
a little effort on that front would make the restart latency problem 
mostly go away.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-12 23:46         ` Daniel Phillips
@ 2001-07-13 21:07           ` Pavel Machek
  0 siblings, 0 replies; 62+ messages in thread
From: Pavel Machek @ 2001-07-13 21:07 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Dan Maas, linux-kernel

Hi!

> > > Possibly stupid suggestion... Maybe the interactive/GUI programs
> > > should wake up once in a while and touch a couple of their pages?
> > > Go too far with this and you'll just get in the way of performance,
> > > but I don't think it would hurt to have processes waking up every
> > > couple of minutes and touching glibc, libqt, libgtk, etc so they
> > > stay hot in memory... A very slow incremental "caress" of the
> > > address space could eliminate the
> > > "I-just-logged-in-this-morning-and-dammit-everything-has-been-paged
> > >-out" problem.
> >
> > Ugh... Ouch.... Ugly, indeed.
> >
> > What you might want to do is
> >
> > while true; do
> > cat /usr/lib/libc* > /dev/null; sleep 1m
> > cat /usr/lib/qt* > /dev/null; sleep 1m
> > ...
> > done
> >
> > running on your system...
> 
> 90%+ of what you touch that way is likely to be outside your working 
> set, and only the libraries get pre-loaded, not the application code or 
> data.  An approach where the application 'touches itself' has more 
> chance of producing a genuine improvement in response, but is that 
> really what we want application programmers spending their time 
> writing?  Not to mention the extra code bloat and maintainance
> overhead.

Application touching itself would be *evil*. 

You might extend my approach if something like

if ps | grep gimp; then cat /usr/bin/gimp > /dev/null; fi

or something like that. It is definitely less evil than gimp doing it
itself.

> Maybe there are a some applications out there - perhaps a database that 
> for some reason needs to minimize its latency - where the effort is 
> worth it, but they're few and far between.  IMHO, only a generic 

User programs should *never ever* do unneeded work. Touching itself is
unneeded and evil for memory managment.

> This is firmly in the flight-of-fancy category.  What would be real and 
> worth doing right now is for some application developers to profile 
> their wonderful creations and find out why they're touching so darn 
> much memory.  Who hasn't seen their system go into a frenzy as the 
> result of bringing up a simple configuration dialog in KDE?  Or 
> right-clicking one of the window buttons in Gnome?  It's uncalled for, 
> a little effort on that front would make the restart latency problem 
> mostly go away.

Agreed.
								Pavel

-- 
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 19:58 Jason McMullan
                   ` (3 preceding siblings ...)
  2001-06-30 15:37 ` Pavel Machek
@ 2001-07-10 10:34 ` David Woodhouse
  4 siblings, 0 replies; 62+ messages in thread
From: David Woodhouse @ 2001-07-10 10:34 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jason McMullan, linux-kernel


pavel@suse.cz said:
> 		RAM:	4-64Mb	 (reads: immediate, writes: immediate)

> MB not Mb. 4Mb = 0.5MB.

...  = 0.48 MiB = 3.8 Mib

http://physics.nist.gov/cuu/Units/binary.html

--
dwmw2



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
@ 2001-07-05 15:09 mike_phillips
  0 siblings, 0 replies; 62+ messages in thread
From: mike_phillips @ 2001-07-05 15:09 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Tom spaziani, Dan Maas, linux-kernel, Marcelo Tosatti,
	Daniel Phillips, Rik van Riel

> Well, on a laptop memory and disk bandwith are rarely wasted - they cost
> battery life.

I've been playing around with different scenarios to see the differences 
in performance. A good way to trigger the cache problem is to untar a 
couple of kernel source trees or other large amounts of files, until free 
memory is down to less than 2mb. Then try to fire up a few apps that need 
some memory. The hard drive thrashes around as the VM tries to free up 
enough space, often using swap instead of flushing out the cache. 

These source trees can then be deleted which frees up the memory the cache 
was using and performance returns to where it should be. 

However, if I just fire up enough apps to use up all the memory and then 
go into swap, response is still acceptable. If the app requires loading 
from swap there is just a short lag while the VM does its thing and then 
life is good. 

I don't expect to be able to run more apps than I have memory for without 
a performance hit, but I do expect to be able to run with over 128MB of 
"real" free memory and not suffer from performance degradation (which 
doesn't happen at present)

Mike


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
@ 2001-07-04 16:08 mike_phillips
  0 siblings, 0 replies; 62+ messages in thread
From: mike_phillips @ 2001-07-04 16:08 UTC (permalink / raw)
  To: Marco Colombo; +Cc: linux-kernel, Daniel Phillips, Rik van Riel

> Remember that the first message was about a laptop. At 4:00AM there's
> no activity but the updatedb one (and the other cron jobs). Simply,
> there's no 'accessed-often' data.  Moreover, I'd bet that 90% of the
> metadata touched by updatedb won't be accessed at all in the future.
> Laptop users don't do find /usr/share/terminfo/ so often.

Maybe, but I would think that most laptops get switched off at night. Then 
when turned on again in the morning, anacron realizes it missed the 
nightly cron jobs and then runs everything. 

This really does make an incredible difference to the system. If I remove 
the updatedb job from cron.daily, the machine won't touch swap all day and 
runs like charm. (That's with vmware, mozilla, openoffice, all 
applications that like big chunks of memory)

Mike




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-04  9:41           ` Marco Colombo
@ 2001-07-04 15:03             ` Daniel Phillips
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-04 15:03 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Rik van Riel, mike_phillips, linux-kernel

On Wednesday 04 July 2001 11:41, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > > when background aging, maybe it's not enough to keep processes like
> > > updatedb from causing interactive pages to be evicted.
> > > That's why I said we should have another way to detect that kind of
> > > activity... well, the application could just let us know (no need to
> > > embed an autotuning-genetic-page-replacement-optimizer into the
> > > kernel). We should just drop all FS metadata accessed by updatedb,
> > > since we know that's one-shot only, without raising pressure at all.
> >
> > Note that some of updatedb's metadata pages are of the accessed-often
> > kind, e.g., directory blocks and inodes.  A blanket low priority on all
> > the pages updatedb touches just won't do.
>
> Remember that the first message was about a laptop. At 4:00AM there's
> no activity but the updatedb one (and the other cron jobs). Simply,
> there's no 'accessed-often' data.  Moreover, I'd bet that 90% of the
> metadata touched by updatedb won't be accessed at all in the future.
> Laptop users don't do find /usr/share/terminfo/ so often.

The problem is when you have a directory block, say, that has to stay around 
quite a few seconds before dropping into disuse.  You sure don't want that 
block treated as 'accessed-once'.

The goal here is to get through the updatedb as quickly as possible.  Getting 
the user's "interactive" programs loaded back in afterwards is a separate, 
much more difficult problem IMHO, but no doubt still has a reasonable 
solution.  I'm not that worried about it, my feeling is: if we fix up the MM 
so it doesn't bog down with a lot of pages in cache and, in addition, do 
better readahead, interactive performance will be just fine.

> > > Just like
> > > (not that I'm proposing it) putting those "one-shot" pages directly on
> > > the inactive-clean list instead of the active list. How an application
> > > could declare such a behaviour is an open question, of course. Maybe
> > > it's even possible to detect it. And BTW that's really fine tuning.
> > > Evicting an 8 hours old page may be a mistake sometime, but it's never
> > > a *big* mistake.
> >
> > IMHO, updatedb *should* evict all the "interactive" pages that aren't
> > actually doing anything[1].  That way it should run faster, provided of
> > course its accessed-once pages are properly given low priority.
>
> So in the morning you find your Gnome session completely on swap,
> and at the same time a lot of free mem.
>
> > I see three page priority levels:
> >
> >   0 - accessed-never/aged to zero
> >   1 - accessed-once/just loaded
> >   2 - accessed-often
> >
> > with these transitions:
> >
> >   0 -> 1, if a page is accessed
> >   1 -> 2, if a page is accessed a second time
> >   1 -> 0, if a page gets old
> >   2 -> 0, if a page gets old
> >
> > The 0 and 1 level pages are on a fifo queue, the 2 level pages are
> > scanned clock-wise, relying on the age computation[2].  Eviction
> > candidates are taken from the cold end of the 0 level list, unless it is
> > empty, in which case they are taken from the 1 level list. In
> > desperation, eviction candidates are taken from the 2 level list, i.e.,
> > random eviction policy, as opposed to what we do now which is to initiate
> > an emergency scan of the active list for new inactive candidates - rather
> > like calling a quick board meeting when the building is on fire.
>
> Well, it's just aging faster when it's needed. Random evicting is not
> good.

It's better than getting bogged down in scanning latency just at the point 
you should be starting new writeouts.  Obviously, it's a tradeoff.

> List 2 is ordered by age, and there're always better candidates
> at the end of the list than at the front. The higher the pressure,
> the shorter is the time a page has to rest idle to get at the end of the
> list. But the list *is* ordered.

No, list 2 is randomly ordered.  Pages move from the initial trial list to 
the active list with 0 temperature, and drop in just behind the one-hand scan 
pointer (which we actually implement as the head of the list).  After that 
they get "aged" up or down as we do now.  (New improved terminology: heated 
or cooled according to the referenced bit.)

> > Note that the above is only a very slight departure from the current
> > design. And by the way, this is just brainstorming, it hasn't reached the
> > 'proposal' stage yet.
> >
> > [1] It would be nice to have a mechanism whereby the evicted
> > 'interactive' pages are automatically reloaded when updatedb has finished
> > its work.  This is a case of scavenging unused disk bandwidth for
> > something useful, i.e., improving the interactive experience.
>
> updatedb doesn't really need all the memory it takes. All it needs is
> a small buffer to sequentially scan all the disk. So we should just
> drop all the pages it references, since we already know they won't be
> referenced again by noone else.

I hope it's clear how the method I'm describing does that.

> > [2] I much prefer the hot/cold terminology over old/young.  The latter
> > gets confusing because a 'high' age is 'young'.  I'd rather think of a
> > high value as being 'hot'.
>
> True. s/page->age/page->temp/g B-)

Yep.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-04  8:32         ` Marco Colombo
@ 2001-07-04 14:44           ` Daniel Phillips
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-04 14:44 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Rik van Riel, mike_phillips, linux-kernel

On Wednesday 04 July 2001 10:32, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > > I'm not sure that, in general, recent pages with only one access are
> > > > still better eviction candidates compared to 8 hours old pages. Here
> > > > we need either another way to detect one-shot activity (like the one
> > > > performed by updatedb),
> > >
> > > Fully agreed, but there is one problem with this idea.
> > > Suppose you have a maximum of 20% of your RAM for these
> > > "one-shot" things, now how are you going to be able to
> > > page in an application with a working set of, say, 25%
> > > the size of RAM ?
> >
> > Easy.  What's the definition of working set?  Those pages that are
> > frequently referenced.  So as the application starts up some of its pages
> > will get promoted from used-once to used-often.  (On the other hand, the
> > target behavior here conflicts with the goal of grouping together several
> > temporally-related accesses to the same page together as one access, so
> > there's a subtle distinction to be made here, see below.)
>
> [...]
>
> In Rik example, the ws is larger than available memory. Part of it
> (the "hottest" one) will get double-accesses, but other pages will keep
> condending the few available (physical) pages with no chance of being
> accessed twice.  But see my previous posting...

But that's exactly what we want.  Note that the idea of reserving a fixed 
amount of memory for "one-shot" pages wasn't mine.  I see no reason to set a 
limit.  There's only one critereon: does a page get referenced between the 
time it's created and when its probation period expires?

Once a page makes it into the active (level 2) set it's on an equal footing 
with lots of others and it's up to our intrepid one-hand clock to warm it up 
or cool it down as appropriate.  On the other hand, if the page gets sent to 
death row it still has a few chances to prove its worth before being cleaned 
up and sent to the aba^H^H^H^H^H^H^H^H reclaimed.  (Apologies for the 
multiplying metaphors ;-)

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-03 15:04         ` Daniel Phillips
  2001-07-03 18:24           ` Daniel Phillips
  2001-07-04  8:12           ` Ari Heitner
@ 2001-07-04  9:41           ` Marco Colombo
  2001-07-04 15:03             ` Daniel Phillips
  2 siblings, 1 reply; 62+ messages in thread
From: Marco Colombo @ 2001-07-04  9:41 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Rik van Riel, mike_phillips, linux-kernel

On Tue, 3 Jul 2001, Daniel Phillips wrote:

> On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > when background aging, maybe it's not enough to keep processes like
> > updatedb from causing interactive pages to be evicted.
> > That's why I said we should have another way to detect that kind of
> > activity... well, the application could just let us know (no need to
> > embed an autotuning-genetic-page-replacement-optimizer into the kernel).
> > We should just drop all FS metadata accessed by updatedb, since we
> > know that's one-shot only, without raising pressure at all.
>
> Note that some of updatedb's metadata pages are of the accessed-often kind,
> e.g., directory blocks and inodes.  A blanket low priority on all the pages
> updatedb touches just won't do.

Remember that the first message was about a laptop. At 4:00AM there's
no activity but the updatedb one (and the other cron jobs). Simply,
there's no 'accessed-often' data.  Moreover, I'd bet that 90% of the
metadata touched by updatedb won't be accessed at all in the future.
Laptop users don't do find /usr/share/terminfo/ so often.

> > Just like
> > (not that I'm proposing it) putting those "one-shot" pages directly on
> > the inactive-clean list instead of the active list. How an application
> > could declare such a behaviour is an open question, of course. Maybe it's
> > even possible to detect it. And BTW that's really fine tuning.
> > Evicting an 8 hours old page may be a mistake sometime, but it's never
> > a *big* mistake.
>
> IMHO, updatedb *should* evict all the "interactive" pages that aren't
> actually doing anything[1].  That way it should run faster, provided of
> course its accessed-once pages are properly given low priority.

So in the morning you find your Gnome session completely on swap,
and at the same time a lot of free mem.

> I see three page priority levels:
>
>   0 - accessed-never/aged to zero
>   1 - accessed-once/just loaded
>   2 - accessed-often
>
> with these transitions:
>
>   0 -> 1, if a page is accessed
>   1 -> 2, if a page is accessed a second time
>   1 -> 0, if a page gets old
>   2 -> 0, if a page gets old
>
> The 0 and 1 level pages are on a fifo queue, the 2 level pages are scanned
> clock-wise, relying on the age computation[2].  Eviction candidates are taken
> from the cold end of the 0 level list, unless it is empty, in which case they
> are taken from the 1 level list. In desperation, eviction candidates are
> taken from the 2 level list, i.e., random eviction policy, as opposed to what
> we do now which is to initiate an emergency scan of the active list for new
> inactive candidates - rather like calling a quick board meeting when the
> building is on fire.

Well, it's just aging faster when it's needed. Random evicting is not
good. List 2 is ordered by age, and there're always better candidates
at the end of the list than at the front. The higher the pressure,
the shorter is the time a page has to rest idle to get at the end of the
list. But the list *is* ordered.

> Note that the above is only a very slight departure from the current design.
> And by the way, this is just brainstorming, it hasn't reached the 'proposal'
> stage yet.
>
> [1] It would be nice to have a mechanism whereby the evicted 'interactive'
> pages are automatically reloaded when updatedb has finished its work.  This
> is a case of scavenging unused disk bandwidth for something useful, i.e.,
> improving the interactive experience.

updatedb doesn't really need all the memory it takes. All it needs is
a small buffer to sequentially scan all the disk. So we should just
drop all the pages it references, since we already know they won't be
referenced again by noone else.

> [2] I much prefer the hot/cold terminology over old/young.  The latter gets
> confusing because a 'high' age is 'young'.  I'd rather think of a high value
> as being 'hot'.

True. s/page->age/page->temp/g B-)

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-03 18:29       ` Daniel Phillips
@ 2001-07-04  8:32         ` Marco Colombo
  2001-07-04 14:44           ` Daniel Phillips
  0 siblings, 1 reply; 62+ messages in thread
From: Marco Colombo @ 2001-07-04  8:32 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Rik van Riel, mike_phillips, linux-kernel

On Tue, 3 Jul 2001, Daniel Phillips wrote:

> On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > I'm not sure that, in general, recent pages with only one access are
> > > still better eviction candidates compared to 8 hours old pages. Here
> > > we need either another way to detect one-shot activity (like the one
> > > performed by updatedb),
> >
> > Fully agreed, but there is one problem with this idea.
> > Suppose you have a maximum of 20% of your RAM for these
> > "one-shot" things, now how are you going to be able to
> > page in an application with a working set of, say, 25%
> > the size of RAM ?
>
> Easy.  What's the definition of working set?  Those pages that are frequently
> referenced.  So as the application starts up some of its pages will get
> promoted from used-once to used-often.  (On the other hand, the target
> behavior here conflicts with the goal of grouping together several
> temporally-related accesses to the same page together as one access, so
> there's a subtle distinction to be made here, see below.)
[...]

In Rik example, the ws is larger than available memory. Part of it
(the "hottest" one) will get double-accesses, but other pages will keep
condending the few available (physical) pages with no chance of being
accessed twice.  But see my previous posting...

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-03 15:04         ` Daniel Phillips
  2001-07-03 18:24           ` Daniel Phillips
@ 2001-07-04  8:12           ` Ari Heitner
  2001-07-04  9:41           ` Marco Colombo
  2 siblings, 0 replies; 62+ messages in thread
From: Ari Heitner @ 2001-07-04  8:12 UTC (permalink / raw)
  To: linux-kernel


On Tue, 3 Jul 2001, Daniel Phillips wrote:

> And by the way, this is just brainstorming, it hasn't reached the 'proposal' 
> stage yet.

So while we're here, an idea someone proposed in #debian while discussion this
thread (michal@203.94.140.52, you know who you are): QoS for application paging
on desktops. Basically you designate to the kernel which applications you want
to give priviledges, and it avoids swapping them out, even if they've been idle
for a long time. You designate your desktop apps, and then when updatedb comes
along they don't get kicked (but something more intensive like a kernel compile
would claim the pages). Maybe it would be as simple as a category of apps whose
pages won't get kicked before a singly-touched page (like and updatedb or
streaming media run).

For the record, I'm impressed with the new VM design, and I think its unbiased
behaviour (once the bugs are ironed out) will be exactly what I'm looking for
in life (traditional Unix "the fair way") :) Currently using a 4-way RS/6000
running AIX 4.2 which has been up for a long time and is running a lot of
programs (even though the active set is quite reasonable), and decides to swap
at evil times :)

Looking forward to the tweaks/settings options that will appear on this VM over
the next little while...



Cheers,

Ari Heitner




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-02 18:42     ` Rik van Riel
  2001-07-03 10:33       ` Marco Colombo
@ 2001-07-03 18:29       ` Daniel Phillips
  2001-07-04  8:32         ` Marco Colombo
  1 sibling, 1 reply; 62+ messages in thread
From: Daniel Phillips @ 2001-07-03 18:29 UTC (permalink / raw)
  To: Rik van Riel, Marco Colombo; +Cc: mike_phillips, linux-kernel

On Monday 02 July 2001 20:42, Rik van Riel wrote:
> On Thu, 28 Jun 2001, Marco Colombo wrote:
> > I'm not sure that, in general, recent pages with only one access are
> > still better eviction candidates compared to 8 hours old pages. Here
> > we need either another way to detect one-shot activity (like the one
> > performed by updatedb),
>
> Fully agreed, but there is one problem with this idea.
> Suppose you have a maximum of 20% of your RAM for these
> "one-shot" things, now how are you going to be able to
> page in an application with a working set of, say, 25%
> the size of RAM ?

Easy.  What's the definition of working set?  Those pages that are frequently 
referenced.  So as the application starts up some of its pages will get 
promoted from used-once to used-often.  (On the other hand, the target 
behavior here conflicts with the goal of grouping together several 
temporally-related accesses to the same page together as one access, so 
there's a subtle distinction to be made here, see below.)

The point here is that there are such things as run-once program pages, just 
as there are use-once file pages.  Both should get low priority and be 
evicted early, regardless of the fact they were just loaded.

> If you don't have any special measures, the pages from
> this "new" application will always be treated as one-shot
> pages and the process will never be able to be cached in
> memory completely...

The self-balancing way of doing this is to promote pages from the old end of 
the used-once list to the used-often (active) list at a rate corresponding to 
the fault-in rate so we get more aggressive promotion of referenced-often 
pages during program loading, and conversely, aggressive demotion of 
referenced-once pages.

--
Daniel

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-03 15:04         ` Daniel Phillips
@ 2001-07-03 18:24           ` Daniel Phillips
  2001-07-04  8:12           ` Ari Heitner
  2001-07-04  9:41           ` Marco Colombo
  2 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-03 18:24 UTC (permalink / raw)
  To: Marco Colombo, Rik van Riel; +Cc: mike_phillips, linux-kernel

An ammendment to my previous post...

> I see three page priority levels:
>
>   0 - accessed-never/aged to zero
>   1 - accessed-once/just loaded
>   2 - accessed-often
>
> with these transitions:
>
>   0 -> 1, if a page is accessed
>   1 -> 2, if a page is accessed a second time
>   1 -> 0, if a page gets old
>   2 -> 0, if a page gets old

Better:

   1 -> 0, if a page was not referenced before arriving at the old end
   1 -> 2, if it was

Meaning that multiple accesses to pages on the level 1 list are treated as a 
single access.  In addition, this reflects what we can do practically with 
the hardware referenced bit.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-03 10:33       ` Marco Colombo
@ 2001-07-03 15:04         ` Daniel Phillips
  2001-07-03 18:24           ` Daniel Phillips
                             ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-07-03 15:04 UTC (permalink / raw)
  To: Marco Colombo, Rik van Riel; +Cc: mike_phillips, linux-kernel

On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> when background aging, maybe it's not enough to keep processes like
> updatedb from causing interactive pages to be evicted.
> That's why I said we should have another way to detect that kind of
> activity... well, the application could just let us know (no need to
> embed an autotuning-genetic-page-replacement-optimizer into the kernel).
> We should just drop all FS metadata accessed by updatedb, since we
> know that's one-shot only, without raising pressure at all.

Note that some of updatedb's metadata pages are of the accessed-often kind, 
e.g., directory blocks and inodes.  A blanket low priority on all the pages 
updatedb touches just won't do.

> Just like
> (not that I'm proposing it) putting those "one-shot" pages directly on
> the inactive-clean list instead of the active list. How an application
> could declare such a behaviour is an open question, of course. Maybe it's
> even possible to detect it. And BTW that's really fine tuning.
> Evicting an 8 hours old page may be a mistake sometime, but it's never
> a *big* mistake.

IMHO, updatedb *should* evict all the "interactive" pages that aren't 
actually doing anything[1].  That way it should run faster, provided of 
course its accessed-once pages are properly given low priority.

I see three page priority levels:

  0 - accessed-never/aged to zero
  1 - accessed-once/just loaded
  2 - accessed-often

with these transitions:

  0 -> 1, if a page is accessed
  1 -> 2, if a page is accessed a second time
  1 -> 0, if a page gets old
  2 -> 0, if a page gets old

The 0 and 1 level pages are on a fifo queue, the 2 level pages are scanned 
clock-wise, relying on the age computation[2].  Eviction candidates are taken 
from the cold end of the 0 level list, unless it is empty, in which case they 
are taken from the 1 level list. In desperation, eviction candidates are 
taken from the 2 level list, i.e., random eviction policy, as opposed to what 
we do now which is to initiate an emergency scan of the active list for new 
inactive candidates - rather like calling a quick board meeting when the 
building is on fire.

Note that the above is only a very slight departure from the current design.  
And by the way, this is just brainstorming, it hasn't reached the 'proposal' 
stage yet.

[1] It would be nice to have a mechanism whereby the evicted 'interactive' 
pages are automatically reloaded when updatedb has finished its work.  This 
is a case of scavenging unused disk bandwidth for something useful, i.e., 
improving the interactive experience.

[2] I much prefer the hot/cold terminology over old/young.  The latter gets 
confusing because a 'high' age is 'young'.  I'd rather think of a high value 
as being 'hot'.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-07-02 18:42     ` Rik van Riel
@ 2001-07-03 10:33       ` Marco Colombo
  2001-07-03 15:04         ` Daniel Phillips
  2001-07-03 18:29       ` Daniel Phillips
  1 sibling, 1 reply; 62+ messages in thread
From: Marco Colombo @ 2001-07-03 10:33 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Daniel Phillips, mike_phillips, linux-kernel

On Mon, 2 Jul 2001, Rik van Riel wrote:

> On Thu, 28 Jun 2001, Marco Colombo wrote:
>
> > I'm not sure that, in general, recent pages with only one access are
> > still better eviction candidates compared to 8 hours old pages. Here
> > we need either another way to detect one-shot activity (like the one
> > performed by updatedb),
>
> Fully agreed, but there is one problem with this idea.
> Suppose you have a maximum of 20% of your RAM for these
> "one-shot" things, now how are you going to be able to
> page in an application with a working set of, say, 25%
> the size of RAM ?
>
> If you don't have any special measures, the pages from
> this "new" application will always be treated as one-shot
> pages and the process will never be able to be cached in
> memory completely...

I see your point. Running Gnome on a 64MB box means you have most
of the pages that are "warm" (using my definition), so there's little
room for "cold" (new) pages, and maybe they don't get a chance of
being accessed a second time before they are evicted, which leads to
thrashing if you're trying to start something really big (well, I guess
the access pattern within a typical ws is not uniformly distributed, so
some pages will get accessed twice, but I see the problem).

I'll try and make my point a bit clearer.
I was referring to background aging only. When aging
is caused by pressure, you don't make any difference between pages.
I don't know how the idea to give high values for page->age on the second
access instead of the first is going to be implemented, but I'm assuming
that new pages are going to be placed on the active list with a low age
value (PAGE_AGE_START_FIRST ?), maybe even 0 (well, I'm not a guru of
course). I'm just saying that, to avoid Mike's "problem" (which BTW
I don't believe is a big one, really), we could stop background aging
on interactive pages (short form for "pages that belong to the ws of an
interactive process") at a certain miminum age, say
PAGE_AGE_BG_INTERACTIVE_MINIMUM, with PAGE_AGE_BG_INTERACTIVE_MINIMUM
 > PAGE_AGE_START_FIRST). Weighting the difference between the two
ages, you can give long-standing interactive pages some advantage vs
new pages. But they will be aged below PAGE_AGE_START_FIRST and eventually
moved to the inactive list. After all, they *are* good candidates.
Does this make some sense?

Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
when background aging, maybe it's not enough to keep processes like
updatedb from causing interactive pages to be evicted.
That's why I said we should have another way to detect that kind of
activity... well, the application could just let us know (no need to
embed an autotuning-genetic-page-replacement-optimizer into the kernel).
We should just drop all FS metadata accessed by updatedb, since we
know that's one-shot only, without raising pressure at all. Just like
(not that I'm proposing it) putting those "one-shot" pages directly on
the inactive-clean list instead of the active list. How an application
could declare such a behaviour is an open question, of course. Maybe it's
even possible to detect it. And BTW that's really fine tuning.
Evicting an 8 hours old page may be a mistake sometime, but it's never
a *big* mistake.

>
> Rik
> --
> Virtual memory is like a game you can't win;
> However, without VM there's truly nothing to lose...
>
> http://www.surriel.com/		http://distro.conectiva.com/
>
> Send all your spam to aardvark@nl.linux.org (spam digging piggy)

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 18:01   ` Marco Colombo
@ 2001-07-02 18:42     ` Rik van Riel
  2001-07-03 10:33       ` Marco Colombo
  2001-07-03 18:29       ` Daniel Phillips
  0 siblings, 2 replies; 62+ messages in thread
From: Rik van Riel @ 2001-07-02 18:42 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Daniel Phillips, mike_phillips, linux-kernel

On Thu, 28 Jun 2001, Marco Colombo wrote:

> I'm not sure that, in general, recent pages with only one access are
> still better eviction candidates compared to 8 hours old pages. Here
> we need either another way to detect one-shot activity (like the one
> performed by updatedb),

Fully agreed, but there is one problem with this idea.
Suppose you have a maximum of 20% of your RAM for these
"one-shot" things, now how are you going to be able to
page in an application with a working set of, say, 25%
the size of RAM ?

If you don't have any special measures, the pages from
this "new" application will always be treated as one-shot
pages and the process will never be able to be cached in
memory completely...

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 19:58 Jason McMullan
                   ` (2 preceding siblings ...)
  2001-06-28 22:47 ` John Fremlin
@ 2001-06-30 15:37 ` Pavel Machek
  2001-07-10 10:34 ` David Woodhouse
  4 siblings, 0 replies; 62+ messages in thread
From: Pavel Machek @ 2001-06-30 15:37 UTC (permalink / raw)
  To: Jason McMullan; +Cc: linux-kernel

Hi!

> 	Here's my first pass at a VM requirements document,
> for the embedded, desktop, and server cases. At the end is 
> a summary of general rules that should take care of all of 
> these cases.
> 
> Bandwidth Descriptions:
> 
> 	immediate: RAM, on-chip cache, etc. 
> 	fast:	   Flash reads, ROMs, etc.

Flash reads aresometimes pretty slow. (Flash over IDE over PCMCIA...2MB/sec 
bandwidth. Slower than most harddrives.

> 	medium:    Hard drives, CD-ROMs, 100Mb ethernet, etc.

CDroms are way slower than harddrives (mostly to seek times).

> 	slow:	   Flash writes, floppy disks,  CD-WR burners
> 	packeted:  Reads/write should be in as large a packet as possible
> 
> Embedded Case
> -------------
> 
> 	Overview
> 	--------
> 	  In the embedded case, the primary VM motiviation is to
> 	use as _little_ caching of the filesystem for reads as
> 	possible because (a) reads are very fast and (b) we don't
> 	have any swap. However, we want to cache _writes_ as hard
> 	as possible, because Flash is slow, and prone to wear.
> 	  
> 	Machine Description
> 	------------------
> 		RAM:	4-64Mb	 (reads: immediate, writes: immediate)

MB not Mb. 4Mb = 0.5MB.

> 		Flash:	4-128Mb  (reads: fast, writes: slow, packeted)
> 		CDROM:	640-800Mb (reads: medium)
> 		Swap:	0Mb
> 
> 	Motiviations
> 	------------
> 		* Don't write to the (slow,packeted) devices until
> 		  you need to free up memory for processes.
> 		* Never cache reads from immediate/fast devices.

Flash connected over PCMCIA over IDE is *very* slow. You must cache it.

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 19:58 Jason McMullan
  2001-06-26 21:21 ` Rik van Riel
  2001-06-26 21:33 ` John Stoffel
@ 2001-06-28 22:47 ` John Fremlin
  2001-06-30 15:37 ` Pavel Machek
  2001-07-10 10:34 ` David Woodhouse
  4 siblings, 0 replies; 62+ messages in thread
From: John Fremlin @ 2001-06-28 22:47 UTC (permalink / raw)
  To: Jason McMullan; +Cc: linux-kernel, linux-mm


[...]

> 	immediate: RAM, on-chip cache, etc. 
> 	fast:	   Flash reads, ROMs, etc.
> 	medium:    Hard drives, CD-ROMs, 100Mb ethernet, etc.
> 	slow:	   Flash writes, floppy disks,  CD-WR burners
> 	packeted:  Reads/write should be in as large a packet as possible
> 
> Embedded Case

[...]

> Desktop Case

I'm not sure there's any point in separating the cases like this.  The
complex part of the VM is the caching part => to be a good cache you
must take into account the speed of accesses to the cached medium,
including warm up times for sleepy drives etc.

It would be really cool if the VM could do that, so e.g. in the ideal
world you could connect up a slow harddrive and have its contents
cached as swap on your fast harddrive(!) (not a new idea btw and
already implemented elsewhere). I.e. from the point of view of the VM a
computer is just a group of data storage units and it's allowed to use
up certain parts of each one to do stuff

[...]

-- 

	http://ape.n3.net

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 14:39 ` Daniel Phillips
@ 2001-06-28 18:01   ` Marco Colombo
  2001-07-02 18:42     ` Rik van Riel
  0 siblings, 1 reply; 62+ messages in thread
From: Marco Colombo @ 2001-06-28 18:01 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: mike_phillips, linux-kernel

On Thu, 28 Jun 2001, Daniel Phillips wrote:

> On Thursday 28 June 2001 14:20, mike_phillips@urscorp.com wrote:
> > > If individual pages could be classified as code (text segments),
> > > data, file cache, and so on, I would specify costs to the paging
> > > of such pages in or out.  This way I can make the system perfer
> > > to drop a file cache page that has not been accessed for five
> > > minutes, over a program text page that has not been acccessed
> > > for one hour (or much more).
> >
> > This would be extremely useful. My laptop has 256mb of ram, but every day
> > it runs the updatedb for locate. This fills the memory with the file
> > cache. Interactivity is then terrible, and swap is unnecessarily used. On
> > the laptop all this hard drive thrashing is bad news for battery life
> > (plus the fact that laptop hard drives are not the fastest around). I
> > purposely do not run more applications than can comfortably fit in the
> > 256mb of memory.
> >
> > If fact, to get interactivity back, I've got a small 10 liner that mallocs
> > memory to *force* stuff into swap purely so I can have a large block of
> > memory back for interactivity.
> >
> > Something simple that did "you haven't used this file for 30mins, flush it
> > out of the cache would be sufficient"
>
> Updatedb fills memory full of clean file pages so there's nothing to flush.
> Did you mean "evict"?

Well, I believe all inodes get dirtied for access time update, unless the
FS is mounted no_atime. And it does write its database file...

> Roughly speaking we treat clean pages as "instantly relaimable".  Eviction
> and reclaiming are done in the same step (look at reclaim_page).  The key to
> efficient mm is nothing more or less than choosing the best victim for
> reclaiming and we aren't doing a spectacularly good job of that right now.
>
> There is a simple change in strategy that will fix up the updatedb case quite
> nicely, it goes something like this: a single access to a page (e.g., reading
> it) isn't enough to bring it to the front of the LRU queue, but accessing it
> twice or more is.  This is being looked at.

You mean that pages that belong to interactive applications (working sets)
won't be evicted to make room for the cache? And that pages just filled
with data read by updatedb will be chosen instead (a kind of drop-behind)?

There's nothing really wrong in the kernel "swapping out" interactive
applications at 4 a.m., their pages have the property of both not being
accessed recently and (the kernel doesn't know, of course) not going to
be useful in the near future (say for another 4 hours). In the end they
*are* good canditates for eviction.

> Note that we don't actually use a LRU queue, we use a more efficient
> approximation called aging, so the above is not a recipe for implementation.

I'm not sure that, in general, recent pages with only one access are
still better eviction candidates compared to 8 hours old pages. Here we
need either another way to detect one-shot activity (like the one
performed by updatedb), or to keep pages that belong to the working set
of interactive processes somewhat "warm", and never let them age too much.
A page with one one (read) access can be "cold". A page with more than one
access becomes "hot". Aging moves page towards the "cold" state, and of
course "cold" pages are the best candidates for eviction. Pages belonging
to interactive processes are never moved from the "warm" state into
the "cold" state by the background aging. Maybe this can be implemented
just leaving such pages on the active list, and deactivating them
only on pressure. Or not leaving their age reach 0. (Well, i'm not really
into current VM implementation. I guess that those single access pages
will be placed on the end of the active list with age 0, or something
like that).

If I understand the current VM code, after 8 hours of idle time, all
pages of interactive applications will be on the inactive(_clean?) list,
ready for eviction. Even if you place new pages (the updatedb activity)
at the *end* of the active list, (instead of the front), it won't be
enough to prevent application pages from being evicted. It won't solve
Mike's problem, that is.

>
> --
> Daniel
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 15:21 ` Jonathan Morton
@ 2001-06-28 16:02   ` Daniel Phillips
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-06-28 16:02 UTC (permalink / raw)
  To: Jonathan Morton, mike_phillips, linux-kernel

On Thursday 28 June 2001 17:21, Jonathan Morton wrote:
> >There is a simple change in strategy that will fix up the updatedb case
> > quite nicely, it goes something like this: a single access to a page
> > (e.g., reading it) isn't enough to bring it to the front of the LRU
> > queue, but accessing it twice or more is.  This is being looked at.
>
> Say, when a page is created due to a page fault, page->age is set to
> zero instead of whatever it is now.

This isn't quite enough.  We do want to be able to assign a ranking to 
members of the accessed-once set, and we do want to distinguish between newly 
created pages and pages that have aged all the way to zero.

> Then, on the first access, it is
> incremented to one.  All accesses where page->age was previously zero
> cause it to be incremented to one, and subsequent accesses where
> page->age is non-zero cause a doubling rather than an increment.
> This gives a nice heavy priority boost to frequently-accessed pages...

While on that topic, could somebody please explain to me why exponential 
aging is better than linear aging by a suitably chosen increment?  It's clear 
what's wrong with it: after 32 hits you lose all further information.  I 
suspect there are more problems with it than that.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 12:20 mike_phillips
  2001-06-28 12:30 ` Alan Cox
  2001-06-28 14:39 ` Daniel Phillips
@ 2001-06-28 15:21 ` Jonathan Morton
  2001-06-28 16:02   ` Daniel Phillips
  2 siblings, 1 reply; 62+ messages in thread
From: Jonathan Morton @ 2001-06-28 15:21 UTC (permalink / raw)
  To: Daniel Phillips, mike_phillips, linux-kernel

>There is a simple change in strategy that will fix up the updatedb case quite
>nicely, it goes something like this: a single access to a page (e.g., reading
>it) isn't enough to bring it to the front of the LRU queue, but accessing it
>twice or more is.  This is being looked at.

Say, when a page is created due to a page fault, page->age is set to 
zero instead of whatever it is now.  Then, on the first access, it is 
incremented to one.  All accesses where page->age was previously zero 
cause it to be incremented to one, and subsequent accesses where 
page->age is non-zero cause a doubling rather than an increment. 
This gives a nice heavy priority boost to frequently-accessed pages...

>Note that we don't actually use a LRU queue, we use a more efficient
>approximation called aging, so the above is not a recipe for implementation.

Maybe it is, but in a slightly lateral manner as above.

-- 
--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
website:  http://www.chromatix.uklinux.net/vnc/
geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$
           V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
tagline:  The key to knowledge is not to rely on people to teach you it.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 13:37     ` Alan Cox
  2001-06-28 14:04       ` Tobias Ringstrom
@ 2001-06-28 14:52       ` Daniel Phillips
  1 sibling, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-06-28 14:52 UTC (permalink / raw)
  To: Alan Cox, Tobias Ringstrom; +Cc: Alan Cox, mike_phillips, linux-kernel

On Thursday 28 June 2001 15:37, Alan Cox wrote:
> > The problem with updatedb is that it pushes all applications to the swap,
> > and when you get back in the morning, everything has to be paged back
> > from swap just because the (stupid) OS is prepared for yet another
> > updatedb run.
>
> Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to
> page cache balancing is a bit suspect IMHO.

For Ext2, most or all of that metadata will be moved into the page cache 
early in 2.5, and other filesystem will likely follow that lead.  That's not 
to say the buffer/page cache balancing shouldn't get attention, just that 
this particular problem will die by itself.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 12:20 mike_phillips
  2001-06-28 12:30 ` Alan Cox
@ 2001-06-28 14:39 ` Daniel Phillips
  2001-06-28 18:01   ` Marco Colombo
  2001-06-28 15:21 ` Jonathan Morton
  2 siblings, 1 reply; 62+ messages in thread
From: Daniel Phillips @ 2001-06-28 14:39 UTC (permalink / raw)
  To: mike_phillips, linux-kernel

On Thursday 28 June 2001 14:20, mike_phillips@urscorp.com wrote:
> > If individual pages could be classified as code (text segments),
> > data, file cache, and so on, I would specify costs to the paging
> > of such pages in or out.  This way I can make the system perfer
> > to drop a file cache page that has not been accessed for five
> > minutes, over a program text page that has not been acccessed
> > for one hour (or much more).
>
> This would be extremely useful. My laptop has 256mb of ram, but every day
> it runs the updatedb for locate. This fills the memory with the file
> cache. Interactivity is then terrible, and swap is unnecessarily used. On
> the laptop all this hard drive thrashing is bad news for battery life
> (plus the fact that laptop hard drives are not the fastest around). I
> purposely do not run more applications than can comfortably fit in the
> 256mb of memory.
>
> If fact, to get interactivity back, I've got a small 10 liner that mallocs
> memory to *force* stuff into swap purely so I can have a large block of
> memory back for interactivity.
>
> Something simple that did "you haven't used this file for 30mins, flush it
> out of the cache would be sufficient"

Updatedb fills memory full of clean file pages so there's nothing to flush.  
Did you mean "evict"?

Roughly speaking we treat clean pages as "instantly relaimable".  Eviction 
and reclaiming are done in the same step (look at reclaim_page).  The key to 
efficient mm is nothing more or less than choosing the best victim for 
reclaiming and we aren't doing a spectacularly good job of that right now.

There is a simple change in strategy that will fix up the updatedb case quite 
nicely, it goes something like this: a single access to a page (e.g., reading 
it) isn't enough to bring it to the front of the LRU queue, but accessing it 
twice or more is.  This is being looked at.

Note that we don't actually use a LRU queue, we use a more efficient 
approximation called aging, so the above is not a recipe for implementation.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 14:04       ` Tobias Ringstrom
@ 2001-06-28 14:14         ` Alan Cox
  0 siblings, 0 replies; 62+ messages in thread
From: Alan Cox @ 2001-06-28 14:14 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Alan Cox, mike_phillips, linux-kernel

> > Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to
> > page cache balancing is a bit suspect IMHO.
> 
> In 2.4.6-pre, the buffer cache is no longer used for metata, right?

For ext2 directory blocks the page cache is now used

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 13:37     ` Alan Cox
@ 2001-06-28 14:04       ` Tobias Ringstrom
  2001-06-28 14:14         ` Alan Cox
  2001-06-28 14:52       ` Daniel Phillips
  1 sibling, 1 reply; 62+ messages in thread
From: Tobias Ringstrom @ 2001-06-28 14:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: mike_phillips, linux-kernel

On Thu, 28 Jun 2001, Alan Cox wrote:

> > > That isnt really down to labelling pages, what you are talking qbout is what
> > > you get for free when page aging works right (eg 2.0.39) but don't get in
> > > 2.2 - and don't yet (although its coming) quite get right in 2.4.6pre.
> >
> > Correct, but all pages are not equal.
>
> That is the whole point of page aging done right. The use of a page dictates
> how it is aged before being discarded. So pages referenced once are aged
> rapidly, but once they get touched a couple of times then you know they arent
> streaming I/O. There are other related techniques like punishing pages that
> are touched when streaming I/O is done to pages further down the same file -
> FreeBSD does this one for example

Are you saying that classification of pages will not be useful?

Only looking at the page access patterns can certainly reveal a lot, but
tuning how to punish different pages is useful.

> > The problem with updatedb is that it pushes all applications to the swap,
> > and when you get back in the morning, everything has to be paged back from
> > swap just because the (stupid) OS is prepared for yet another updatedb
> > run.
>
> Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to
> page cache balancing is a bit suspect IMHO.

In 2.4.6-pre, the buffer cache is no longer used for metata, right?

/Tobias


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 13:33   ` Tobias Ringstrom
@ 2001-06-28 13:37     ` Alan Cox
  2001-06-28 14:04       ` Tobias Ringstrom
  2001-06-28 14:52       ` Daniel Phillips
  0 siblings, 2 replies; 62+ messages in thread
From: Alan Cox @ 2001-06-28 13:37 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Alan Cox, mike_phillips, linux-kernel

> > That isnt really down to labelling pages, what you are talking qbout is what
> > you get for free when page aging works right (eg 2.0.39) but don't get in
> > 2.2 - and don't yet (although its coming) quite get right in 2.4.6pre.
> 
> Correct, but all pages are not equal.

That is the whole point of page aging done right. The use of a page dictates
how it is aged before being discarded. So pages referenced once are aged
rapidly, but once they get touched a couple of times then you know they arent
streaming I/O. There are other related techniques like punishing pages that
are touched when streaming I/O is done to pages further down the same file -
FreeBSD does this one for example

> The problem with updatedb is that it pushes all applications to the swap,
> and when you get back in the morning, everything has to be paged back from
> swap just because the (stupid) OS is prepared for yet another updatedb
> run.

Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to
page cache balancing is a bit suspect IMHO.

Alan


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 12:30 ` Alan Cox
@ 2001-06-28 13:33   ` Tobias Ringstrom
  2001-06-28 13:37     ` Alan Cox
  0 siblings, 1 reply; 62+ messages in thread
From: Tobias Ringstrom @ 2001-06-28 13:33 UTC (permalink / raw)
  To: Alan Cox; +Cc: mike_phillips, linux-kernel

On Thu, 28 Jun 2001, Alan Cox wrote:

> > This would be extremely useful. My laptop has 256mb of ram, but every day
> > it runs the updatedb for locate. This fills the memory with the file
> > cache. Interactivity is then terrible, and swap is unnecessarily used. On
> > the laptop all this hard drive thrashing is bad news for battery life
>
> That isnt really down to labelling pages, what you are talking qbout is what
> you get for free when page aging works right (eg 2.0.39) but don't get in
> 2.2 - and don't yet (although its coming) quite get right in 2.4.6pre.

Correct, but all pages are not equal.

The problem with updatedb is that it pushes all applications to the swap,
and when you get back in the morning, everything has to be paged back from
swap just because the (stupid) OS is prepared for yet another updatedb
run.

Other bad activities include copying lots of files, tar/untar:ing and CD
writing.  They all cause unwanted paging, at least for the desktop user.

/Tobias


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 22:21     ` Stefan Hoffmeister
  2001-06-26 22:48       ` Jeffrey W. Baker
@ 2001-06-28 13:07       ` John Fremlin
  1 sibling, 0 replies; 62+ messages in thread
From: John Fremlin @ 2001-06-28 13:07 UTC (permalink / raw)
  To: Stefan Hoffmeister; +Cc: linux-kernel

	Stefan Hoffmeister <lkml.2001@econos.de> writes:

[...]

> Windows NT/2000 has flags that can be for each CreateFile operation
> ("open" in Unix terms), for instance
> 
>   FILE_ATTRIBUTE_TEMPORARY
> 
>   FILE_FLAG_WRITE_THROUGH
>   FILE_FLAG_NO_BUFFERING
>   FILE_FLAG_RANDOM_ACCESS
>   FILE_FLAG_SEQUENTIAL_SCAN
> 
> If Linux does not have mechanism that would allow the signalling of
> specific use case, it might be helpful to implement such a hinting
> system?

madvise(2) does it on mappings IIRC

-- 
Seeking summer job at last minute - see http://ape.n3.net/cv.html

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 12:31     ` Xavier Bestel
@ 2001-06-28 13:05       ` Tobias Ringstrom
  0 siblings, 0 replies; 62+ messages in thread
From: Tobias Ringstrom @ 2001-06-28 13:05 UTC (permalink / raw)
  To: Xavier Bestel; +Cc: Helge Hafting, Martin Knoblauch, linux-kernel

On 28 Jun 2001, Xavier Bestel wrote:

> On 28 Jun 2001 14:02:09 +0200, Tobias Ringstrom wrote:
>
> > This would be very useful, I think.  Would it be very hard to classify
> > pages like this (text/data/cache/...)?
>
> How would you classify a page of perl code ?

I do know how the Perl interpreter works, but I think it byte-compiles the
code and puts it in the data segment, which also would have a high paging
cost.

The perl source code would be paged in/out before running binaries such as
shells and the window system, but the same thing would happen to binaries
with short life-span, I suppose.  Perhaps cached executables and cached
data files can be classified differently as well.

What I meant to ask with the question above was if it would be hard to
implement the classification in the kernel.

/Tobias


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 12:02   ` Tobias Ringstrom
@ 2001-06-28 12:31     ` Xavier Bestel
  2001-06-28 13:05       ` Tobias Ringstrom
  0 siblings, 1 reply; 62+ messages in thread
From: Xavier Bestel @ 2001-06-28 12:31 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Helge Hafting, Martin Knoblauch, linux-kernel

On 28 Jun 2001 14:02:09 +0200, Tobias Ringstrom wrote:

> This would be very useful, I think.  Would it be very hard to classify
> pages like this (text/data/cache/...)?

How would you classify a page of perl code ?

Xav


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 12:20 mike_phillips
@ 2001-06-28 12:30 ` Alan Cox
  2001-06-28 13:33   ` Tobias Ringstrom
  2001-06-28 14:39 ` Daniel Phillips
  2001-06-28 15:21 ` Jonathan Morton
  2 siblings, 1 reply; 62+ messages in thread
From: Alan Cox @ 2001-06-28 12:30 UTC (permalink / raw)
  To: mike_phillips; +Cc: linux-kernel

> This would be extremely useful. My laptop has 256mb of ram, but every day 
> it runs the updatedb for locate. This fills the memory with the file 
> cache. Interactivity is then terrible, and swap is unnecessarily used. On 
> the laptop all this hard drive thrashing is bad news for battery life 

That isnt really down to labelling pages, what you are talking qbout is what
you get for free when page aging works right (eg 2.0.39) but don't get in
2.2 - and don't yet (although its coming) quite get right in 2.4.6pre.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
@ 2001-06-28 12:20 mike_phillips
  2001-06-28 12:30 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: mike_phillips @ 2001-06-28 12:20 UTC (permalink / raw)
  To: linux-kernel

> If individual pages could be classified as code (text segments), 
> data, file cache, and so on, I would specify costs to the paging 
> of such pages in or out.  This way I can make the system perfer 
> to drop a file cache page that has not been accessed for five 
> minutes, over a program text page that has not been acccessed 
> for one hour (or much more).

This would be extremely useful. My laptop has 256mb of ram, but every day 
it runs the updatedb for locate. This fills the memory with the file 
cache. Interactivity is then terrible, and swap is unnecessarily used. On 
the laptop all this hard drive thrashing is bad news for battery life 
(plus the fact that laptop hard drives are not the fastest around). I 
purposely do not run more applications than can comfortably fit in the 
256mb of memory.

If fact, to get interactivity back, I've got a small 10 liner that mallocs 
memory to *force* stuff into swap purely so I can have a large block of 
memory back for interactivity.

Something simple that did "you haven't used this file for 30mins, flush it 
out of the cache would be sufficient"

Mike

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 11:27 ` Helge Hafting
  2001-06-28 11:54   ` Martin Knoblauch
@ 2001-06-28 12:02   ` Tobias Ringstrom
  2001-06-28 12:31     ` Xavier Bestel
  1 sibling, 1 reply; 62+ messages in thread
From: Tobias Ringstrom @ 2001-06-28 12:02 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Martin Knoblauch, linux-kernel

On Thu, 28 Jun 2001, Helge Hafting wrote:
> Preventing swap-trashing at all cost doesn't help if the
> machine loose to io-trashing instead.  Performance will be
> just as much down, although perhaps more satisfying because
> people aren't that surprised if explicit file operations
> take a long time.  They hate it when moving the mouse
> or something cause a disk access even if their
> apps runs faster. :-(

Exactly.  I still want the ability to tune the system according to my
taste.  I've been thinking about this for some time, and I've specifically
tried to come up with nice tunables, completely ignoring if it is possible
now or not.

If individual pages could be classified as code (text segments), data,
file cache, and so on, I would specify costs to the paging of such pages
in or out.  This way I can make the system perfer to drop a file cache
page that has not been accessed for five minutes, over a program text page
that has not been acccessed for one hour (or much more).

This would be very useful, I think.  Would it be very hard to classify
pages like this (text/data/cache/...)?

Any reason why this is a bad idea?

/Tobias




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-28 11:27 ` Helge Hafting
@ 2001-06-28 11:54   ` Martin Knoblauch
  2001-06-28 12:02   ` Tobias Ringstrom
  1 sibling, 0 replies; 62+ messages in thread
From: Martin Knoblauch @ 2001-06-28 11:54 UTC (permalink / raw)
  To: Helge Hafting; +Cc: linux-kernel

Helge Hafting wrote:
> 
> Martin Knoblauch wrote:
> 
> >
> >  maybe more specific: If the hit-rate is low and the cache is already
> > 70+% of the systems memory, the chances maybe slim that more cache is
> > going to improve the hit-rate.
> >
> Oh, but this is posible.  You can get into situations where
> the (file cache) working set needs 80% or so of memory
> to get a near-perfect hitrate, and where
> using 70% of memory will trash madly due to the file access

 thats why I said "maybe" :-) Sure, another 5% of cache may improve
things, but they also may kill the interactive performance. Thats why
there should be probably more than one VM strategy to accomodate Servers
and Workstations/Lpatops.

> pattern.  And this won't be a problem either, if
> the working set of "other" (non-file)
> stuff is below 20% of memory.  The total size of
> non-file stuff may be above 20% though, so something goes
> into swap.
> 

 And that is the problem. To much seems to go into swap. At least for
interactive work. Unfortunatelly, with 128MB of memory I cannot entirely
turn of swap. I will see how things are going once I have 256 or 512 MB
(hopefully soon :-)

> I definitely want the machine to work under such circumstances,
> so an arbitrary limit of 70% won't work.
>

 Do not take the 70% as an arbitrary limit. I never said that. The 70%
is just my situation. The problems may arise at 60% cache or at 97.38%
cache.
 
> Preventing swap-trashing at all cost doesn't help if the

 Never said at all cost.

> machine loose to io-trashing instead.  Performance will be
> just as much down, although perhaps more satisfying because
> people aren't that surprised if explicit file operations
> take a long time.  They hate it when moving the mouse
> or something cause a disk access even if their
> apps runs faster. :-(
> 

 Absolutely true. And if the main purpose of the machine is interactive
work (we do want to be Linux a success on the desktop, don't we?), it
should not be hampered by by an IO improvement that may be only of
secondary importance to the user (that the final "customer" for all the
work that is done to the kernel :-). On big servers a litle paging now
and then may be absolutely OK, as long as the IO is going strong.

 I am observing the the discussions of VM behaviour in 2.4.x for some
time. They are mostly very entertaining and revealing. But they also
show that one solution does not seem to benefit all possible scenarios.
Therfore either more than one VM strategy is necessary, or better means
of tuning the cache behaviour, or both. Definitely better ways of
measuring the VM efficiency seem to be needed.

 While implementing VM strategies is probably out of question for a lot
of the people that complain, I hope that at least my complaints are kind
of useful.

Martin
-- 
------------------------------------------------------------------
Martin Knoblauch         |    email:  Martin.Knoblauch@TeraPort.de
TeraPort GmbH            |    Phone:  +49-89-510857-309
C+ITS                    |    Fax:    +49-89-510857-111
http://www.teraport.de   |    Mobile: +49-170-4904759

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-27  8:53 Martin Knoblauch
  2001-06-27 18:13 ` Rik van Riel
@ 2001-06-28 11:27 ` Helge Hafting
  2001-06-28 11:54   ` Martin Knoblauch
  2001-06-28 12:02   ` Tobias Ringstrom
  1 sibling, 2 replies; 62+ messages in thread
From: Helge Hafting @ 2001-06-28 11:27 UTC (permalink / raw)
  To: Martin Knoblauch; +Cc: linux-kernel

Martin Knoblauch wrote:

> 
>  maybe more specific: If the hit-rate is low and the cache is already
> 70+% of the systems memory, the chances maybe slim that more cache is
> going to improve the hit-rate.
> 
Oh, but this is posible.  You can get into situations where
the (file cache) working set needs 80% or so of memory
to get a near-perfect hitrate, and where
using 70% of memory will trash madly due to the file access
pattern.  And this won't be a problem either, if
the working set of "other" (non-file) 
stuff is below 20% of memory.  The total size of
non-file stuff may be above 20% though, so something goes
into swap.

I definitely want the machine to work under such circumstances,
so an arbitrary limit of 70% won't work.

Preventing swap-trashing at all cost doesn't help if the
machine loose to io-trashing instead.  Performance will be
just as much down, although perhaps more satisfying because
people aren't that surprised if explicit file operations
take a long time.  They hate it when moving the mouse
or something cause a disk access even if their
apps runs faster. :-(

Helge Hafting

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-27 18:13 ` Rik van Riel
@ 2001-06-28  6:59   ` Martin Knoblauch
  0 siblings, 0 replies; 62+ messages in thread
From: Martin Knoblauch @ 2001-06-28  6:59 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

Rik van Riel wrote:
> 
> On Wed, 27 Jun 2001, Martin Knoblauch wrote:
> 
> >  I do not care much whether the cache is using 99% of the systems memory
> > or 50%. As long as there is free memory, using it for cache is great. I
> > care a lot if the cache takes down interactivity, because it pushes out
> > processes that it thinks idle, but that I need in 5 seconds. The caches
> > pressure against processes
> 
> Too bad that processes are in general cached INSIDE the cache.
> 
> You'll have to write a new balancing story now ;)
> 

 maybe that is part of "the answer" :-)

Martin
-- 
------------------------------------------------------------------
Martin Knoblauch         |    email:  Martin.Knoblauch@TeraPort.de
TeraPort GmbH            |    Phone:  +49-89-510857-309
C+ITS                    |    Fax:    +49-89-510857-111
http://www.teraport.de   |    Mobile: +49-170-4904759

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-27  8:53 Martin Knoblauch
@ 2001-06-27 18:13 ` Rik van Riel
  2001-06-28  6:59   ` Martin Knoblauch
  2001-06-28 11:27 ` Helge Hafting
  1 sibling, 1 reply; 62+ messages in thread
From: Rik van Riel @ 2001-06-27 18:13 UTC (permalink / raw)
  To: Martin Knoblauch; +Cc: linux-kernel

On Wed, 27 Jun 2001, Martin Knoblauch wrote:

>  I do not care much whether the cache is using 99% of the systems memory
> or 50%. As long as there is free memory, using it for cache is great. I
> care a lot if the cache takes down interactivity, because it pushes out
> processes that it thinks idle, but that I need in 5 seconds. The caches
> pressure against processes

Too bad that processes are in general cached INSIDE the cache.

You'll have to write a new balancing story now ;)

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 21:33 ` John Stoffel
  2001-06-26 21:42   ` Rik van Riel
  2001-06-27  3:55   ` Daniel Phillips
@ 2001-06-27 14:09   ` Pozsar Balazs
  2 siblings, 0 replies; 62+ messages in thread
From: Pozsar Balazs @ 2001-06-27 14:09 UTC (permalink / raw)
  To: John Stoffel; +Cc: Rik van Riel, Jason McMullan, linux-kernel


> Rik> ... but I fail to see this one. If we get a low cache hit rate,
> Rik> couldn't that just mean we allocated too little memory for the
> Rik> cache ?
> Or that we're doing big sequential reads of file(s) which are larger
> than memory, in which case expanding the cache size buys us nothing,
> and can actually hurt us alot.

I've got an idea about how to handle this situation generally (without
sending 'tips' to kernel via madvice() or anything similar).

Instead of sorting chached pages (i mean blocks of files) by last touch
time, and dropping the oldest page(s) if we're sort on memory, i would
propose this nicer algorithm: (i this is relevant only to the read cache)

Suppose that f1,f2,...fN files cached, their sizes are s1,s2,...sN and
that they were last touched t1,t2,...tN seconds ago. (t1<t2<...<tN)
Now we shouldn't automatically choose pages of fN to drop, instead a
probability (chance) could be assigned to each file, for example:
 fI*sI*tI/SUM where I is one of 1,2,...,N, and SUM is the SUM of fI*sI*tI.

With this, mostly newer files would stay in cache, but older files would
still have a chance.
This could also be tuned, for example to take into account 't' more, the
 fI*sI*tI*tI could be  used... and so on, we have infinite possibilities.


have a nice day,
Balazs Pozsar.

ps: If 'my' idea is the which is already used in the kernel, then tell me
:) and give me some points were to read more before telling stupid things.

> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.
>
> This is why the discussion on the other cache scanning algorithm
> (2Q+?) was so interesting, since it looked to handle both the LRU
> vs. FIFO tradeoffs very nicely.

-- 



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 21:42   ` Rik van Riel
  2001-06-26 22:21     ` Stefan Hoffmeister
@ 2001-06-27 13:36     ` Marco Colombo
  1 sibling, 0 replies; 62+ messages in thread
From: Marco Colombo @ 2001-06-27 13:36 UTC (permalink / raw)
  To: Rik van Riel; +Cc: John Stoffel, Jason McMullan, linux-kernel

On Tue, 26 Jun 2001, Rik van Riel wrote:

> On Tue, 26 Jun 2001, John Stoffel wrote:
>
> > >> * If we're getting low cache hit rates, don't flush
> > >> processes to swap.
> > >> * If we're getting good cache hit rates, flush old, idle
> > >> processes to swap.
> >
> > Rik> ... but I fail to see this one. If we get a low cache hit rate,
> > Rik> couldn't that just mean we allocated too little memory for the
> > Rik> cache ?
> >
> > Or that we're doing big sequential reads of file(s) which are
> > larger than memory, in which case expanding the cache size buys
> > us nothing, and can actually hurt us alot.
>
> That's a big "OR".  I think we should have an algorithm to
> see which of these two is the case, otherwise we're just
> making the wrong decision half of the time.
>
> Also, in many systems we'll be doing IO on _multiple_ files
> at the same time, so I guess this will have to be a file-by-file
> decision.

Of course, you can always think of a "bad" behaviour. That should
really be a page-by-page decision. An application may have both data and
meta-data on the same file. You want to keep the metadata on core
(think of access by an index, it's much better if all the index is there,
even some unused parts) *and* cache commonly used data (that's just
a cache of hot objects, normal replacement algoriths may be used) *and*
drop-behind data on sequential scans...  trying to understand what
an application is doing, in order to foresee what it will be doing,
it's bad attitude. Let's give an application writer a way to code it
sanely (setting per-file VM attributes is fine).  If an application
is not friendly (gives no hints on its VM behaviour) just punish it.
I mean, when tuning the VM behaviour, system health and friendly
applications performance are the goals - do whatever necessary to preserve
them, even kill the offender and rm its executable if someone it's
running it again (*grin*) B-).

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
       [not found] ` <fa.jpsks3v.1o2gag4@ifi.uio.no>
  2001-06-27  0:43   ` Dan Maas
@ 2001-06-27 10:50   ` Xavier Bestel
  1 sibling, 0 replies; 62+ messages in thread
From: Xavier Bestel @ 2001-06-27 10:50 UTC (permalink / raw)
  To: Dan Maas; +Cc: Stefan Hoffmeister, linux-kernel

On 26 Jun 2001 20:43:33 -0400, Dan Maas wrote:
> > Windows NT/2000 has flags that can be for each CreateFile operation
> > ("open" in Unix terms), for instance
> >
> >   FILE_ATTRIBUTE_TEMPORARY
> >   FILE_FLAG_WRITE_THROUGH
> >   FILE_FLAG_NO_BUFFERING
> >   FILE_FLAG_RANDOM_ACCESS
> >   FILE_FLAG_SEQUENTIAL_SCAN
> >
> 

We do (nearly) already have O_DIRECT which won't touch cache (alas I
don't think I will read-ahead more)

Xav


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
@ 2001-06-27  8:53 Martin Knoblauch
  2001-06-27 18:13 ` Rik van Riel
  2001-06-28 11:27 ` Helge Hafting
  0 siblings, 2 replies; 62+ messages in thread
From: Martin Knoblauch @ 2001-06-27  8:53 UTC (permalink / raw)
  To: linux-kernel

>> * If we're getting low cache hit rates, don't flush 
>> processes to swap. 
>> * If we're getting good cache hit rates, flush old, idle 
>> processes to swap. 

Rik> ... but I fail to see this one. If we get a low cache hit rate, 
Rik> couldn't that just mean we allocated too little memory for the 
Rik> cache ? 

 maybe more specific: If the hit-rate is low and the cache is already
70+% of the systems memory, the chances maybe slim that more cache is
going to improve the hit-rate. 

 I do not care much whether the cache is using 99% of the systems memory
or 50%. As long as there is free memory, using it for cache is great. I
care a lot if the cache takes down interactivity, because it pushes out
processes that it thinks idle, but that I need in 5 seconds. The caches
pressure against processes should decrease with the (relative) size of
the cache. Especially in low hit-rate situations.

 OT: I asked the question before somewhere else. Are there interfaces to
the VM that expose the various cache sizes and, more important,
hit-rates to userland? I would love to see (or maybe help writing in my
free time) a tool to just visualize/analyze the efficiency of the VM
system.

Martin
-- 
------------------------------------------------------------------
Martin Knoblauch         |    email:  Martin.Knoblauch@TeraPort.de
TeraPort GmbH            |    Phone:  +49-89-510857-309
C+ITS                    |    Fax:    +49-89-510857-111
http://www.teraport.de   |    Mobile: +49-170-4904759

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 21:33 ` John Stoffel
  2001-06-26 21:42   ` Rik van Riel
@ 2001-06-27  3:55   ` Daniel Phillips
  2001-06-27 14:09   ` Pozsar Balazs
  2 siblings, 0 replies; 62+ messages in thread
From: Daniel Phillips @ 2001-06-27  3:55 UTC (permalink / raw)
  To: John Stoffel, Rik van Riel; +Cc: Jason McMullan, linux-kernel

> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.

That depends very much on what you're using the system for.  Suppose you're 
running a trivial database application on a gigantic disk array - the name of 
the game is to cache as much metadata as possible, and that goes directly to 
the bottom line as performance.  Might as well use 90%+ of your memory for 
that.

The conclusion to draw here is, the balance between file cache and process 
memory should be able to slide all the way from one extreme to the other.  
It's not a requirement that that be fully automatic but it's highly 
desireable.

--
Daniel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-27  0:43   ` Dan Maas
@ 2001-06-27  0:45     ` Mike Castle
  0 siblings, 0 replies; 62+ messages in thread
From: Mike Castle @ 2001-06-27  0:45 UTC (permalink / raw)
  To: linux-kernel

On Tue, Jun 26, 2001 at 08:43:33PM -0400, Dan Maas wrote:
> (hrm, maybe I could hack up my own manual read-ahead/drop-behind with mmap()
> and memory locking...)

Just to argue portability for a moment (portability on the expected
results, that is, vs APIs).

Would this technique work across a variety of OSes?

Would the recent caching difficulties of the 2.4.* series have handled such
a technique in a reasonable fashion?

mrc
-- 
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
       [not found] ` <fa.jpsks3v.1o2gag4@ifi.uio.no>
@ 2001-06-27  0:43   ` Dan Maas
  2001-06-27  0:45     ` Mike Castle
  2001-06-27 10:50   ` Xavier Bestel
  1 sibling, 1 reply; 62+ messages in thread
From: Dan Maas @ 2001-06-27  0:43 UTC (permalink / raw)
  To: Stefan Hoffmeister; +Cc: linux-kernel

> Windows NT/2000 has flags that can be for each CreateFile operation
> ("open" in Unix terms), for instance
>
>   FILE_ATTRIBUTE_TEMPORARY
>   FILE_FLAG_WRITE_THROUGH
>   FILE_FLAG_NO_BUFFERING
>   FILE_FLAG_RANDOM_ACCESS
>   FILE_FLAG_SEQUENTIAL_SCAN
>

There is a BSD-originated convention for this - madvise().

If you look in the Linux VM code there is a bit of explicit code for
different madvise access patterns, but I'm not sure if it's 100% supported.

Drop-behind would be really, really nice to have for my multimedia
applications. I routinely deal with very large video files (several times
larger than my RAM). When I sequentially read though such files a bit at a
time, I do NOT want the old pages sitting there in RAM while all of my other
running programs are rudely paged out...

(hrm, maybe I could hack up my own manual read-ahead/drop-behind with mmap()
and memory locking...)

Regards,
Dan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 22:48       ` Jeffrey W. Baker
@ 2001-06-27  0:18         ` Mike Castle
  0 siblings, 0 replies; 62+ messages in thread
From: Mike Castle @ 2001-06-27  0:18 UTC (permalink / raw)
  To: linux-kernel

On Tue, Jun 26, 2001 at 03:48:09PM -0700, Jeffrey W. Baker wrote:
> These flags would be really handy.  We already have the raw device for
> sequential reading of e.g. CDROM and DVD devices.

Not going to help 99% of the applications out there.

mrc
-- 
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 22:21     ` Stefan Hoffmeister
@ 2001-06-26 22:48       ` Jeffrey W. Baker
  2001-06-27  0:18         ` Mike Castle
  2001-06-28 13:07       ` John Fremlin
  1 sibling, 1 reply; 62+ messages in thread
From: Jeffrey W. Baker @ 2001-06-26 22:48 UTC (permalink / raw)
  To: Stefan Hoffmeister
  Cc: Rik van Riel, John Stoffel, Jason McMullan, linux-kernel



On Wed, 27 Jun 2001, Stefan Hoffmeister wrote:

> : On Tue, 26 Jun 2001 18:42:56 -0300 (BRST), Rik van Riel wrote:
>
> >On Tue, 26 Jun 2001, John Stoffel wrote:
> >
> >> Or that we're doing big sequential reads of file(s) which are
> >> larger than memory, in which case expanding the cache size buys
> >> us nothing, and can actually hurt us alot.
> >
> >That's a big "OR".  I think we should have an algorithm to
> >see which of these two is the case, otherwise we're just
> >making the wrong decision half of the time.
>
> Windows NT/2000 has flags that can be for each CreateFile operation
> ("open" in Unix terms), for instance
>
>   FILE_ATTRIBUTE_TEMPORARY
>
>   FILE_FLAG_WRITE_THROUGH
>   FILE_FLAG_NO_BUFFERING
>   FILE_FLAG_RANDOM_ACCESS
>   FILE_FLAG_SEQUENTIAL_SCAN
>
> If Linux does not have mechanism that would allow the signalling of
> specific use case, it might be helpful to implement such a hinting system?

These flags would be really handy.  We already have the raw device for
sequential reading of e.g. CDROM and DVD devices.

-jwb


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 21:42   ` Rik van Riel
@ 2001-06-26 22:21     ` Stefan Hoffmeister
  2001-06-26 22:48       ` Jeffrey W. Baker
  2001-06-28 13:07       ` John Fremlin
  2001-06-27 13:36     ` Marco Colombo
  1 sibling, 2 replies; 62+ messages in thread
From: Stefan Hoffmeister @ 2001-06-26 22:21 UTC (permalink / raw)
  To: Rik van Riel; +Cc: John Stoffel, Jason McMullan, linux-kernel

: On Tue, 26 Jun 2001 18:42:56 -0300 (BRST), Rik van Riel wrote:

>On Tue, 26 Jun 2001, John Stoffel wrote:
>
>> Or that we're doing big sequential reads of file(s) which are
>> larger than memory, in which case expanding the cache size buys
>> us nothing, and can actually hurt us alot.
>
>That's a big "OR".  I think we should have an algorithm to
>see which of these two is the case, otherwise we're just
>making the wrong decision half of the time.

Windows NT/2000 has flags that can be for each CreateFile operation
("open" in Unix terms), for instance

  FILE_ATTRIBUTE_TEMPORARY

  FILE_FLAG_WRITE_THROUGH
  FILE_FLAG_NO_BUFFERING
  FILE_FLAG_RANDOM_ACCESS
  FILE_FLAG_SEQUENTIAL_SCAN

If Linux does not have mechanism that would allow the signalling of
specific use case, it might be helpful to implement such a hinting system?

Disclaimer: I am clueless about what the kernel provides at this time.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 21:33 ` John Stoffel
@ 2001-06-26 21:42   ` Rik van Riel
  2001-06-26 22:21     ` Stefan Hoffmeister
  2001-06-27 13:36     ` Marco Colombo
  2001-06-27  3:55   ` Daniel Phillips
  2001-06-27 14:09   ` Pozsar Balazs
  2 siblings, 2 replies; 62+ messages in thread
From: Rik van Riel @ 2001-06-26 21:42 UTC (permalink / raw)
  To: John Stoffel; +Cc: Jason McMullan, linux-kernel

On Tue, 26 Jun 2001, John Stoffel wrote:

> >> * If we're getting low cache hit rates, don't flush
> >> processes to swap.
> >> * If we're getting good cache hit rates, flush old, idle
> >> processes to swap.
>
> Rik> ... but I fail to see this one. If we get a low cache hit rate,
> Rik> couldn't that just mean we allocated too little memory for the
> Rik> cache ?
>
> Or that we're doing big sequential reads of file(s) which are
> larger than memory, in which case expanding the cache size buys
> us nothing, and can actually hurt us alot.

That's a big "OR".  I think we should have an algorithm to
see which of these two is the case, otherwise we're just
making the wrong decision half of the time.

Also, in many systems we'll be doing IO on _multiple_ files
at the same time, so I guess this will have to be a file-by-file
decision.

> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.

Remember that disk cache includes stuff like mmap()ed
executables and swap-backed user memory. Do you really
want to limit those too ?


regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 19:58 Jason McMullan
  2001-06-26 21:21 ` Rik van Riel
@ 2001-06-26 21:33 ` John Stoffel
  2001-06-26 21:42   ` Rik van Riel
                     ` (2 more replies)
  2001-06-28 22:47 ` John Fremlin
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 62+ messages in thread
From: John Stoffel @ 2001-06-26 21:33 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Jason McMullan, linux-kernel


>> * If we're getting low cache hit rates, don't flush
>> processes to swap.
>> * If we're getting good cache hit rates, flush old, idle
>> processes to swap.

Rik> ... but I fail to see this one. If we get a low cache hit rate,
Rik> couldn't that just mean we allocated too little memory for the
Rik> cache ?

Or that we're doing big sequential reads of file(s) which are larger
than memory, in which case expanding the cache size buys us nothing,
and can actually hurt us alot.  

I personally don't feel that the cache should be allowed to grow over
50% of the system's memory at all, we've got so much in the cache at
that point, that we're probably not hitting it all that much.

This is why the discussion on the other cache scanning algorithm
(2Q+?) was so interesting, since it looked to handle both the LRU
vs. FIFO tradeoffs very nicely.  

Rik> I am very much interested in continuing this discussion...

Me too, even if I can just contribute comments and not much code.  

John
   John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
	 stoffel@lucent.com - http://www.lucent.com - 978-952-7548

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 21:21 ` Rik van Riel
@ 2001-06-26 21:29   ` Jason McMullan
  0 siblings, 0 replies; 62+ messages in thread
From: Jason McMullan @ 2001-06-26 21:29 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Jason McMullan, linux-kernel

On Tue, Jun 26, 2001 at 06:21:21PM -0300, Rik van Riel wrote:
> > 	* If we're getting low cache hit rates, don't flush
> > 	  processes to swap.
> > 	* If we're getting good cache hit rates, flush old, idle
> > 	  processes to swap.
> 
> ... but I fail to see this one. If we get a low cache hit
> rate, couldn't that just mean we allocated too little memory
> for the cache ?

	Hmmm. I didn't take that into consideration. But at the
same time, shouldn't a VM be able to determine that its cache
strategy is causing _more_ (absolute) misses by increasing it 
cache size? The percentage of misses may go down, but total 
device I/O may stay the same.

	So let's see... I'll rephrase that 'Motiviation' as:

	* Minimize the total medium/slow I/Os that occur over a 
	  sliding window of time. 

	Is that a more general case?
 
> Also, how would we translate all these requirements into
> VM strategies ?

	First, I would like to translate them into measurements.
Once we know how to measure these criteria, its possible to
formalize the feedback mechanism/accounting that a VM should
be aware of.

	In the end, I would like a VM to have some idea of
how well its performing, and be able to attempt various
well-known strategies based upon its own performance.

-- 
Jason McMullan, Senior Linux Consultant
Linuxcare, Inc. 412.432.6457 tel, 412.656.3519 cell
jmcmullan@linuxcare.com, http://www.linuxcare.com/
Linuxcare. Putting open source to work.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: VM Requirement Document - v0.0
  2001-06-26 19:58 Jason McMullan
@ 2001-06-26 21:21 ` Rik van Riel
  2001-06-26 21:29   ` Jason McMullan
  2001-06-26 21:33 ` John Stoffel
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 62+ messages in thread
From: Rik van Riel @ 2001-06-26 21:21 UTC (permalink / raw)
  To: Jason McMullan; +Cc: linux-kernel

On Tue, 26 Jun 2001, Jason McMullan wrote:

> 	If we take all the motivations from the above, and list
> them, we get:
>
> 	* Don't write to the (slow,packeted) devices until
> 	  you need to free up memory for processes.
> 	* Never cache reads from immediate/fast devices.
> 	* Keep packetized devices as continuously-idle as possible.
> 	  Small chunks of idleness don't count. You want to have
> 	  maximal stetches of idleness for the device.
> 	* Keep running processes as fully in memory as possible.

I agree with your modification, and with the obvious 4
points above ...

> 	* If we're getting low cache hit rates, don't flush
> 	  processes to swap.
> 	* If we're getting good cache hit rates, flush old, idle
> 	  processes to swap.

... but I fail to see this one. If we get a low cache hit
rate, couldn't that just mean we allocated too little memory
for the cache ?

I am very much interested in continuing this discussion...

Also, how would we translate all these requirements into
VM strategies ?

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 62+ messages in thread

* VM Requirement Document - v0.0
@ 2001-06-26 19:58 Jason McMullan
  2001-06-26 21:21 ` Rik van Riel
                   ` (4 more replies)
  0 siblings, 5 replies; 62+ messages in thread
From: Jason McMullan @ 2001-06-26 19:58 UTC (permalink / raw)
  To: linux-kernel


	Here's my first pass at a VM requirements document,
for the embedded, desktop, and server cases. At the end is 
a summary of general rules that should take care of all of 
these cases.

Bandwidth Descriptions:

	immediate: RAM, on-chip cache, etc. 
	fast:	   Flash reads, ROMs, etc.
	medium:    Hard drives, CD-ROMs, 100Mb ethernet, etc.
	slow:	   Flash writes, floppy disks,  CD-WR burners
	packeted:  Reads/write should be in as large a packet as possible

Embedded Case
-------------

	Overview
	--------
	  In the embedded case, the primary VM motiviation is to
	use as _little_ caching of the filesystem for reads as
	possible because (a) reads are very fast and (b) we don't
	have any swap. However, we want to cache _writes_ as hard
	as possible, because Flash is slow, and prone to wear.
	  
	Machine Description
	------------------
		RAM:	4-64Mb	 (reads: immediate, writes: immediate)
		Flash:	4-128Mb  (reads: fast, writes: slow, packeted)
		CDROM:	640-800Mb (reads: medium)
		Swap:	0Mb

	Motiviations
	------------
		* Don't write to the (slow,packeted) devices until
		  you need to free up memory for processes.
		* Never cache reads from immediate/fast devices.

Desktop Case
------------

	Overview
	--------
	  On the desktop, interactivity is king. We don't want to eat
	lots of I/O bandwidth paging in and out, however we also want
	to cache as much of the FS as possible, to speed compiles and
	multiple operations over the same sets of files. 
	
	  Balancing this is the notion of 'cache-hit-rates'. If our 
	access patterns aren't hitting cache, but disk instead, don't 
	swap out processes, just shrink the cache. Contrawise, if we
	have good cache hit rates, swap out the idle tasks.

	Machine Description
	-------------------
		RAM:	32Mb-1Gb  (reads: immediate, writes: immediate)
		HD:	1Gb-100Gb (reads: medium, writes: medium)
		CDROM:	640-800Mb (reads: medium)
		DVD:	1Gb-8Gb   (reads: medium)
		Swap:	RAM size  (HD speeds)

	Motivations
	-----------
		* If we're getting low cache hit rates, don't flush 
		  processes to swap.
		* If we're getting good cache hit rates, flush old, idle
		  processes to swap.

Laptop Case
-----------

	Overview
	--------
	  Same as a desktop, except now you must treat the HDs as
	packetized devices for power-saving.

	Machine Description
	-------------------
		RAM:	32Mb-1Gb  (reads: immediate, writes: immediate)
		HD:	1Gb-100Gb (reads: medium,packeted, writes: medium,packeted)
		CDROM:	640-800Mb (reads: medium)
		DVD:	1Gb-8Gb   (reads: medium)
		Swap:	RAM size  (HD speeds)

	Motivations
	-----------
		* Keep packetized devices as continuously-idle as possible.
		  Small chunks of idleness don't count. You want to have
		  maximal stetches of idleness for the device.

Server Case
-----------

	Overview
	--------
	  Same as a desktop, except that interactivity be damned. You
	want processes to _rarely_ have to wait for swap-ins, and 
	you want as much read-ahead as possible. Idle tasks are pressed
	firmly into cache to make room for running processes.

	Machine Description
	-------------------
		RAM:	512Mb-64Gb (reads: immediate, writes: immediate)
		HD:	10Gb-4Tb   (reads: medium, writes: medium)
		Swap:	2*RAM size  (HD speeds)

	Motivations
	-----------
		* Keep running processes as fully in memory as possible.

----------------------------- SUMMARY ----------------------------------

	If we take all the motivations from the above, and list them,
we get:

	* Don't write to the (slow,packeted) devices until
	  you need to free up memory for processes.
	* Never cache reads from immediate/fast devices.
	* If we're getting low cache hit rates, don't flush 
	  processes to swap.
	* If we're getting good cache hit rates, flush old, idle
	  processes to swap.
	* Keep packetized devices as continuously-idle as possible.
	  Small chunks of idleness don't count. You want to have
	  maximal stetches of idleness for the device.
	* Keep running processes as fully in memory as possible.


	Oddly enough, they don't seem to conflict. I'll continue to
work on these motivations, and try to determine testable methods
of measuring the success of a VM versus these criteria.

	Comments welcome.

-- 
Jason McMullan, Senior Linux Consultant
Linuxcare, Inc. 412.432.6457 tel, 412.656.3519 cell
jmcmullan@linuxcare.com, http://www.linuxcare.com/
Linuxcare. Putting open source to work.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2001-07-13 21:08 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-05 15:04 VM Requirement Document - v0.0 Daniel Phillips
     [not found] ` <fa.jprli0v.qlofoc@ifi.uio.no>
     [not found]   ` <fa.e66agbv.hn0u1v@ifi.uio.no>
2001-07-05  1:49     ` Dan Maas
2001-07-05 13:02       ` Daniel Phillips
2001-07-05 14:00       ` Xavier Bestel
2001-07-05 14:51         ` Daniel Phillips
2001-07-05 15:00         ` Xavier Bestel
2001-07-05 15:12           ` Daniel Phillips
2001-07-05 15:12         ` Alan Shutko
     [not found]     ` <002501c104f4/mnt/sendme701a8c0@morph>
2001-07-09 12:17       ` Pavel Machek
2001-07-12 23:46         ` Daniel Phillips
2001-07-13 21:07           ` Pavel Machek
2001-07-06 19:09 ` Rik van Riel
2001-07-06 21:57   ` Daniel Phillips
  -- strict thread matches above, loose matches on Subject: below --
2001-07-05 15:09 mike_phillips
2001-07-04 16:08 mike_phillips
2001-06-28 12:20 mike_phillips
2001-06-28 12:30 ` Alan Cox
2001-06-28 13:33   ` Tobias Ringstrom
2001-06-28 13:37     ` Alan Cox
2001-06-28 14:04       ` Tobias Ringstrom
2001-06-28 14:14         ` Alan Cox
2001-06-28 14:52       ` Daniel Phillips
2001-06-28 14:39 ` Daniel Phillips
2001-06-28 18:01   ` Marco Colombo
2001-07-02 18:42     ` Rik van Riel
2001-07-03 10:33       ` Marco Colombo
2001-07-03 15:04         ` Daniel Phillips
2001-07-03 18:24           ` Daniel Phillips
2001-07-04  8:12           ` Ari Heitner
2001-07-04  9:41           ` Marco Colombo
2001-07-04 15:03             ` Daniel Phillips
2001-07-03 18:29       ` Daniel Phillips
2001-07-04  8:32         ` Marco Colombo
2001-07-04 14:44           ` Daniel Phillips
2001-06-28 15:21 ` Jonathan Morton
2001-06-28 16:02   ` Daniel Phillips
     [not found] <fa.oqkojpv.3hosb7@ifi.uio.no>
     [not found] ` <fa.jpsks3v.1o2gag4@ifi.uio.no>
2001-06-27  0:43   ` Dan Maas
2001-06-27  0:45     ` Mike Castle
2001-06-27 10:50   ` Xavier Bestel
2001-06-27  8:53 Martin Knoblauch
2001-06-27 18:13 ` Rik van Riel
2001-06-28  6:59   ` Martin Knoblauch
2001-06-28 11:27 ` Helge Hafting
2001-06-28 11:54   ` Martin Knoblauch
2001-06-28 12:02   ` Tobias Ringstrom
2001-06-28 12:31     ` Xavier Bestel
2001-06-28 13:05       ` Tobias Ringstrom
2001-06-26 19:58 Jason McMullan
2001-06-26 21:21 ` Rik van Riel
2001-06-26 21:29   ` Jason McMullan
2001-06-26 21:33 ` John Stoffel
2001-06-26 21:42   ` Rik van Riel
2001-06-26 22:21     ` Stefan Hoffmeister
2001-06-26 22:48       ` Jeffrey W. Baker
2001-06-27  0:18         ` Mike Castle
2001-06-28 13:07       ` John Fremlin
2001-06-27 13:36     ` Marco Colombo
2001-06-27  3:55   ` Daniel Phillips
2001-06-27 14:09   ` Pozsar Balazs
2001-06-28 22:47 ` John Fremlin
2001-06-30 15:37 ` Pavel Machek
2001-07-10 10:34 ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).