linux-kernel.vger.kernel.org archive mirror
* [PATCH] rmap 14
@ 2002-08-16  2:07 Rik van Riel
  2002-08-16  2:21 ` Bill Huey
  0 siblings, 1 reply; 9+ messages in thread
From: Rik van Riel @ 2002-08-16  2:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

This is a fairly minimal change for rmap14 since I've been
working on 2.5 most of the time. The experimental code in
this version is a hopefully smarter page_launder() that
shouldn't do much more IO than needed and hopefully gets
rid of the stalls that people have seen during heavy swap
activity.  Please test this version. ;)


The first release of the 14th version of the reverse
mapping based VM is now available.
This is an attempt at making a more robust and flexible VM
subsystem, while cleaning up a lot of code at the same time.
The patch is available from:

           http://surriel.com/patches/2.4/2.4.19-rmap14
and        http://surriel.com/patches/2.4/incr/rmap13c-rmap14
and        http://linuxvm.bkbits.net/


My big TODO items for a next release are:
  - O(1) page launder - currently functional but slow, needs to be tuned
  - pte-highmem

rmap 14:
  - get rid of stalls during swapping, hopefully          (me)
  - low latency zap_page_range                            (Robert Love)
rmap 13c:
  - add wmb() to wakeup_memwaiters                        (Arjan van de Ven)
  - remap_pmd_range now calls pte_alloc with full address (Paul Mackerras)
  - #ifdef out pte_chain_lock/unlock on UP machines       (Andrew Morton)
  - un-BUG() truncate_complete_page, the race is expected (Andrew Morton, me)
  - remove NUMA changes from rmap13a                      (Christoph Hellwig)
rmap 13b:
  - prevent PF_MEMALLOC recursion for higher order allocs (Arjan van de Ven, me)
  - fix small SMP race, PG_lru                            (Hugh Dickins)
rmap 13a:
  - NUMA changes for page_address                         (Samuel Ortiz)
  - replace vm.freepages with simpler kswapd_minfree      (Christoph Hellwig)
rmap 13:
  - rename touch_page to mark_page_accessed and uninline  (Christoph Hellwig)
  - NUMA bugfix for __alloc_pages                         (William Irwin)
  - kill __find_page                                      (Christoph Hellwig)
  - make pte_chain_freelist per zone                      (William Irwin)
  - protect pte_chains by per-page lock bit               (William Irwin)
  - minor code cleanups                                   (me)
rmap 12i:
  - slab cleanup                                          (Christoph Hellwig)
  - remove references to compiler.h from mm/*             (me)
  - move rmap to marcelo's bk tree                        (me)
  - minor cleanups                                        (me)
rmap 12h:
  - hopefully fix OOM detection algorithm                 (me)
  - drop pte quicklist in anticipation of pte-highmem     (me)
  - replace andrea's highmem emulation by ingo's one      (me)
  - improve rss limit checking                            (Nick Piggin)
rmap 12g:
  - port to armv architecture                             (David Woodhouse)
  - NUMA fix to zone_table initialisation                 (Samuel Ortiz)
  - remove init_page_count                                (David Miller)
rmap 12f:
  - for_each_pgdat macro                                  (William Lee Irwin)
  - put back EXPORT(__find_get_page) for modular rd       (me)
  - make bdflush and kswapd actually start queued disk IO (me)
rmap 12e:
  - RSS limit fix, the limit can be 0 for some reason     (me)
  - clean up for_each_zone define to not need pgdata_t    (William Lee Irwin)
  - fix i810_dma bug introduced with page->wait removal   (William Lee Irwin)
rmap 12d:
  - fix compiler warning in rmap.c                        (Roger Larsson)
  - read latency improvement   (read-latency2)            (Andrew Morton)
rmap 12c:
  - fix small balancing bug in page_launder_zone          (Nick Piggin)
  - wakeup_kswapd / wakeup_memwaiters code fix            (Arjan van de Ven)
  - improve RSS limit enforcement                         (me)
rmap 12b:
  - highmem emulation (for debugging purposes)            (Andrea Arcangeli)
  - ulimit RSS enforcement when memory gets tight         (me)
  - sparc64 page->virtual quickfix                        (Greg Procunier)
rmap 12a:
  - fix the compile warning in buffer.c                   (me)
  - fix divide-by-zero on highmem initialisation  DOH!    (me)
  - remove the pgd quicklist (suspicious ...)             (DaveM, me)
rmap 12:
  - keep some extra free memory on large machines         (Arjan van de Ven, me)
  - higher-order allocation bugfix                        (Adrian Drzewiecki)
  - nr_free_buffer_pages() returns inactive + free mem    (me)
  - pages from unused objects directly to inactive_clean  (me)
  - use fast pte quicklists on non-pae machines           (Andrea Arcangeli)
  - remove sleep_on from wakeup_kswapd                    (Arjan van de Ven)
  - page waitqueue cleanup                                (Christoph Hellwig)
rmap 11c:
  - oom_kill race locking fix                             (Andres Salomon)
  - elevator improvement                                  (Andrew Morton)
  - dirty buffer writeout speedup (hopefully ;))          (me)
  - small documentation updates                           (me)
  - page_launder() never does synchronous IO, kswapd
    and the processes calling it sleep on higher level    (me)
  - deadlock fix in touch_page()                          (me)
rmap 11b:
  - added low latency reschedule points in vmscan.c       (me)
  - make i810_dma.c include mm_inline.h too               (William Lee Irwin)
  - wake up kswapd sleeper tasks on OOM kill so the
    killed task can continue on its way out               (me)
  - tune page allocation sleep point a little             (me)
rmap 11a:
  - don't let refill_inactive() progress count for OOM    (me)
  - after an OOM kill, wait 5 seconds for the next kill   (me)
  - agpgart_be fix for hashed waitqueues                  (William Lee Irwin)
rmap 11:
  - fix stupid logic inversion bug in wakeup_kswapd()     (Andrew Morton)
  - fix it again in the morning                           (me)
  - add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
    seems PPC calls pte_alloc() before mem_map[] init     (me)
  - disable the debugging code in rmap.c ... the code
    is working and people are running benchmarks          (me)
  - let the slab cache shrink functions return a value
    to help prevent early OOM killing                     (Ed Tomlinson)
  - also, don't call the OOM code if we have enough
    free pages                                            (me)
  - move the call to lru_cache_del into __free_pages_ok   (Ben LaHaise)
  - replace the per-page waitqueue with a hashed
    waitqueue, reduces size of struct page from 64
    bytes to 52 bytes (48 bytes on non-highmem machines)  (William Lee Irwin)
rmap 10:
  - fix the livelock for real (yeah right), turned out
    to be a stupid bug in page_launder_zone()             (me)
  - to make sure the VM subsystem doesn't monopolise
    the CPU, let kswapd and some apps sleep a bit under
    heavy stress situations                               (me)
  - let __GFP_HIGH allocations dig a little bit deeper
    into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
  - improve comments all over the place                   (Michael Cohen)
  - don't panic if page_remove_rmap() cannot find the
    rmap in question, it's possible that the memory was
    PG_reserved and belonging to a driver, but the driver
    exited and cleared the PG_reserved bit                (me)
  - fix the VM livelock by replacing > by >= in a few
    critical places in the pageout code                   (me)
  - treat the reclaiming of an inactive_clean page like
    allocating a new page, calling try_to_free_pages()
    and/or fixup_freespace() if required                  (me)
  - when low on memory, don't make things worse by
    doing swapin_readahead                                (me)
rmap 8:
  - add ANY_ZONE to the balancing functions to improve
    kswapd's balancing a bit                              (me)
  - regularize some of the maximum loop bounds in
    vmscan.c for cosmetic purposes                        (William Lee Irwin)
  - move page_address() to architecture-independent
    code, now the removal of page->virtual is portable    (William Lee Irwin)
  - speed up free_area_init_core() by doing a single
    pass over the pages and not using atomic ops          (William Lee Irwin)
  - documented the buddy allocator in page_alloc.c        (William Lee Irwin)
rmap 7:
  - clean up and document vmscan.c                        (me)
  - reduce size of page struct, part one                  (William Lee Irwin)
  - add rmap.h for other archs (untested, not for ARM)    (me)
rmap 6:
  - make the active and inactive_dirty list per zone,
    this is finally possible because we can free pages
    based on their physical address                       (William Lee Irwin)
  - cleaned up William's code a bit                       (me)
  - turn some defines into inlines and move those to
    mm_inline.h (the includes are a mess ...)             (me)
  - improve the VM balancing a bit                        (me)
  - add back inactive_target to /proc/meminfo             (me)
rmap 5:
  - fixed recursive buglet, introduced by directly
    editing the patch for making rmap 4 ;)))              (me)
rmap 4:
  - look at the referenced bits in page tables            (me)
rmap 3:
  - forgot one FASTCALL definition                        (me)
rmap 2:
  - teach try_to_unmap_one() about mremap()               (me)
  - don't assign swap space to pages with buffers         (me)
  - make the rmap.c functions FASTCALL / inline           (me)
rmap 1:
  - fix the swap leak in rmap 0                           (Dave McCracken)
rmap 0:
  - port of reverse mapping VM to 2.4.16                  (me)

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/



* Re: [PATCH] rmap 14
  2002-08-16  2:07 [PATCH] rmap 14 Rik van Riel
@ 2002-08-16  2:21 ` Bill Huey
  2002-08-16 21:02   ` Mel
  0 siblings, 1 reply; 9+ messages in thread
From: Bill Huey @ 2002-08-16  2:21 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, linux-mm, Bill Huey (Hui)

On Thu, Aug 15, 2002 at 11:07:49PM -0300, Rik van Riel wrote:
> This is a fairly minimal change for rmap14 since I've been
> working on 2.5 most of the time. The experimental code in
> this version is a hopefully smarter page_launder() that
> shouldn't do much more IO than needed and hopefully gets
> rid of the stalls that people have seen during heavy swap
> activity.  Please test this version. ;)
> 
> The first release of the 14th version of the reverse
> mapping based VM is now available.
> This is an attempt at making a more robust and flexible VM
> subsystem, while cleaning up a lot of code at the same time.
> The patch is available from:

Hey,

Again, the combination of a kind of felt increase in intelligence in
swap decisions and increase in interactivity made my machine feel
substantially smoother, but it needs to be backed up by other people's
experiences with it.

I wish there was a test for this kind of thing.

bill



* Re: [PATCH] rmap 14
  2002-08-16  2:21 ` Bill Huey
@ 2002-08-16 21:02   ` Mel
  2002-08-16 21:29     ` Scott Kaplan
  0 siblings, 1 reply; 9+ messages in thread
From: Mel @ 2002-08-16 21:02 UTC (permalink / raw)
  To: Bill Huey; +Cc: Rik van Riel, linux-kernel, linux-mm

On Thu, 15 Aug 2002, Bill Huey wrote:

> Again, the combination of a kind of felt increase in intelligence in
> swap decisions and increase in interactivity made my machine feel
> substantially smoother, but it needs to be backed up by other people's
> experiences with it.
>
> I wish there was a test for this kind of thing.
>

Blatant plug but it's what I'm working on with VM Regress. At the version
I'm working on, I've started the first benchmark but I've only been
working on it a day so it's a bit to go yet. As a start, we'll be able to
benchmark page access times and swap decisions. These are three live tests
run against 2.4.20pre2 but the suite is known to compile with the latest
2.5 kernel and with 2.4.19-rmap14.

http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/start/mapanon.html
http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/updatedb/mapanon.html
http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/withx/mapanon.html

start    was run at system startup
updatedb is output after updatedb was running 2 minutes
withx    is run with the system running X, konqueror and a few eterms

The information is still a bit sparse but still, things can be told. The
test works by using three kernel modules

mapanon - Will mmap, read/write, close mmaped regions for a caller
pagemap - Print out pages swapped/present in all VMAs
zone    - Print out all zone information

A perl script uses these from userspace to benchmark how quickly data can
be referenced for a given reference pattern. The benchmark isn't finished
yet so all that is done is a linear read through memory once, hence all
page references are 1.

The reports have three sections. The first is details of the test. The
second is a graph showing how long it took to read/write a page in
milliseconds. Note that they are fixed at a min access time of 350 because
of module overhead and test overhead. This could be "removed" easily
enough to give a more realistic view of real page access. The third is a
graph showing page reference counts in green and pages present in red.
The reference line is flat because it's one scan through the region.

start shows that page access times were pretty much constant, which is not
surprising.

updatedb was fine until near the end. At that stage, buffers could not be
freed out that were filled by updatedb and it was having to look hard. So
you can see, times are quick for ages and then suddenly rise to an average
access time of about 3000 milliseconds with one access at 630482
milliseconds!!

withx shows spikey access times for pages which is consistent with large
apps starting up in the background

Now... where this is going. I plan to write a module that will generate
page references to a given pattern. Possible pattern references are

o Linear
o Pure random
o Random with gaussian distribution
o Smooth so the references look like a curve
o Trace data taken from a "real" application or database
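
The three synthetic patterns in the list above could be generated along these lines. This is only a minimal sketch in Python, not code from VM Regress; the function names, the `spread` parameter and the choice to centre the gaussian on the middle of the region are all assumptions made for illustration.

```python
import math
import random

def linear_refs(num_pages, count):
    """Walk the region page by page, wrapping around at the end."""
    return [i % num_pages for i in range(count)]

def uniform_refs(num_pages, count, rng):
    """Every page is equally likely on every reference ("pure random")."""
    return [rng.randrange(num_pages) for _ in range(count)]

def gaussian_refs(num_pages, count, rng, spread=0.1):
    """References cluster around the middle of the region.

    `spread` is the standard deviation as a fraction of the region size
    (an illustrative knob, not something named in the thread). Samples
    that fall outside the region are discarded and redrawn.
    """
    centre = num_pages / 2
    sigma = num_pages * spread
    refs = []
    while len(refs) < count:
        page = math.floor(rng.gauss(centre, sigma))
        if 0 <= page < num_pages:
            refs.append(page)
    return refs

rng = random.Random(1)  # fixed seed so a run can be repeated exactly
print(linear_refs(4, 6))
print(uniform_refs(1024, 5, rng))
print(gaussian_refs(1024, 5, rng))
```

Seeding the generator is what makes a pattern like this reproducible between kernels, since every run replays the same sequence of page numbers.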

The real application could be cool if I could acquire the data and would
produce "real" info. If we could simulate a database accessing for
instance, it could be shown exactly how the VM performed. But as it is,
some benchmarks can be easily adjusted to use VM Regress. If they just
read /proc/vmregress/pagemap, they will know what pages are present in
memory and what got swapped. A perl library VMR::Pagemap is provided to
decode the information.

Once the test can be run on the stock kernel, it can be run against rmap,
aa, 2.5.x or whatever other sorts of kernels are out there so empirical data
can be produced. The first major benchmark that will be produced will be
something like Rik's webserver benchmark.

Instead of the module memory mapping a region, it will memory map a set of
web pages and images. A set of bots will then act like users browsing. The
bot will say how long it took to retrieve pages with a best, slowest and
average access time. The kernel modules will dump out what pages were
present, kernel statistics and so on.
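
The per-bot report described above (best, slowest and average access time) amounts to a small summary over the recorded timings. A minimal sketch, assuming each bot keeps its retrieval times in a plain list of microsecond values; the function name and the sample numbers are made up for illustration:

```python
from statistics import mean

def summarise(times_us):
    """Reduce a list of retrieval times to best, slowest and average."""
    return {
        "best": min(times_us),
        "slowest": max(times_us),
        "average": mean(times_us),
    }

# Made-up sample timings in microseconds, purely for illustration.
print(summarise([350.0, 400.0, 3000.0]))
```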

The release date for the next version with this benchmark is next week at
some stage. I'm not working on this over the weekend so I estimate I'll
have this bench ready by Wednesday and I'll give rmap a run with it to
make sure it works correctly. Either way, "your wish" is on the way.

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel



* Re: [PATCH] rmap 14
  2002-08-16 21:02   ` Mel
@ 2002-08-16 21:29     ` Scott Kaplan
  2002-08-16 23:02       ` Mel
  2002-08-19 18:04       ` Daniel Phillips
  0 siblings, 2 replies; 9+ messages in thread
From: Scott Kaplan @ 2002-08-16 21:29 UTC (permalink / raw)
  To: Mel; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mel,

I appreciate your efforts; the goal is a good one, but I'm concerned about 
some parts of the direction you seem to be taking.

On Friday, August 16, 2002, at 05:02 PM, Mel wrote:

> start    was run at system startup
> updatedb is output after updatedb was running 2 minutes
> withx    is run with the system running X, konqueror and a few eterms

I will acknowledge that you're at the beginning of a long process, and 
that you have much more that you plan to add, but I feel the need to point 
out that this is a *very* small test suite.  ``start'' is more of a 
curiosity than an interesting data point.  The other two are not 
unreasonable starting points.

> updatedb was fine until near the end. At that stage, buffers could not be
> freed out that were filled by updatedb and it was having to look hard. So
> you can see, times are quick for ages and then suddenly rise to an average
> access time of about 3000 milliseconds with one access at 630482
> milliseconds!!

You may want to check your code for sanity:  There are only 1,000 
milliseconds in a second, and I'm skeptical that there was a 630 second 
(that is, 10+ minute) reference.  Were there, perhaps, microseconds?  
There are 1,000,000 of those in a second, so 630,482 would still be half a 
second, which should be enough time for dozens of page faults (approach 
100 of them), so I'm wondering what could possibly cause this measurement.
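
The unit arithmetic in this paragraph can be checked in a few lines. The 8 ms cost per disk-backed page fault used at the end is an assumed round figure for illustration; it is not stated anywhere in the thread.

```python
MS_PER_SECOND = 1_000
US_PER_SECOND = 1_000_000

reading = 630_482  # the outlier from the updatedb report

# If the unit really were milliseconds:
as_ms_seconds = reading / MS_PER_SECOND      # about 630 seconds
as_ms_minutes = as_ms_seconds / 60           # a little over 10 minutes

# If the unit were microseconds instead:
as_us_seconds = reading / US_PER_SECOND      # about 0.63 seconds

# Assumption: roughly 8 ms per disk-backed page fault.
ASSUMED_FAULT_MS = 8
faults_in_window = int(as_us_seconds * MS_PER_SECOND / ASSUMED_FAULT_MS)

print(f"as milliseconds: {as_ms_seconds:.1f} s ({as_ms_minutes:.1f} min)")
print(f"as microseconds: {as_us_seconds:.3f} s")
print(f"faults that fit at {ASSUMED_FAULT_MS} ms each: {faults_in_window}")
```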

Or...was this process descheduled, and what you measured is the interval 
between when this process last ran and when the scheduler put it on the 
CPU again?

> withx shows spikey access times for pages which is consistent with large
> apps starting up in the background

It is?  Why?  Which is the ``large app'' here?  What does it mean to start 
up in the background, and why would that make the page access times 
inconsistent?

> Now... where this is going. I plan to write a module that will generate
> page references to a given pattern. Possible pattern references are
>
> o Linear
> o Pure random
> o Random with gaussian distribution
> o Smooth so the references look like a curve
> o Trace data taken from a "real" application or database

Noooooooooo!

I can't think of a reason to test the VM under any one of the first three 
distributions.  I've never, *ever* seen or heard of a linear or gaussian 
distribution of page references.  As for uniform random (which is what I 
assume you mean by ``pure random''), that's not worth testing.  If a 
workload presents a pure random reference pattern, any on-line policy is 
screwed.  No process can do this on a data set that doesn't fit in memory,
and if it does, there's no hope.

The fourth suggestion -- some negative exponential distribution -- is the 
kind of thing about which this group had a long discussion just a few 
weeks ago.  It's a mostly-bad idea.  In short:  If you have a negative 
exponential curve for your distribution, the best on-line policy is LRU, 
and nothing else will improve on it -- it's been proven in the literature 
on the LRUSM model of program behavior long ago.  It's when reference 
behavior *deviates* from that smooth curve that a policy may perform 
better than simple LRU.  Moreover, real workloads differ from that smooth 
curve over time, particularly during phase changes, which is where the 
*real* test of a VM policy occurs.  There's been plenty of work done on 
mathematical models for program behavior.  None of it has been sufficient 
for qualitative (that is, rank ordering) or quantitative evaluation of 
memory management policies.  Those models that come close are complex to 
work with, requiring the setting of a large number of parameters.

The last suggestion -- real trace data -- is the best one.  I do wonder 
why you put ``real'' in quotes.  I also wouldn't want trace data taken 
from *one* application or database.  You need a whole suite to represent 
the kinds of reference behavior that a VM system will need to manage.

Again, I recognize that this is a work in progress.  I'd be happy to see 
it yield worthwhile results.  If you use oversimplified models, it won't.  
The results will not be reliable for evaluating performance or making 
comparisons of VM systems.

Scott
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (Darwin)
Comment: For info see http://www.gnupg.org

iD8DBQE9XW6+8eFdWQtoOmgRAqjiAJ0ZlrQGOg3MFzXYyi+SdvKIa/bvOgCeOWak
7put0ihQbEY0wNXD+objEos=
=4IQt
-----END PGP SIGNATURE-----



* Re: [PATCH] rmap 14
  2002-08-16 21:29     ` Scott Kaplan
@ 2002-08-16 23:02       ` Mel
  2002-08-19 19:50         ` Daniel Phillips
  2002-08-19 18:04       ` Daniel Phillips
  1 sibling, 1 reply; 9+ messages in thread
From: Mel @ 2002-08-16 23:02 UTC (permalink / raw)
  To: Scott Kaplan; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm


On Fri, 16 Aug 2002, Scott Kaplan wrote:

> > start    was run at system startup
> > updatedb is output after updatedb was running 2 minutes
> > withx    is run with the system running X, konqueror and a few eterms
>
> I will acknowledge that you're at the beginning of a long process, and
> that you have much more that you plan to add, but I feel the need to point
> out that this is a *very* small test suite.

It will take a *long* time to develop the full test suite to cover
faulting, page alloc, slab, vmscan, buffer caches etc. I have two
choices, I can develop the entire thing and have one large release or I
can release early with updates so people can keep an eye out and make sure
they get what they want as well as what I'm looking for. I choose the
latter. This test is what I can produce *now* but this benchmark isn't
even finished. I put up the three tests as an example of a beginning to
try and show that a test suite is on the way that isn't a simple shell
script.

> You may want to check your code for sanity:  There are only 1,000
> milliseconds in a second, and I'm skeptical that there was a 630 second
> (that is, 10+ minute) reference.  Were there, perhaps, microseconds?

Nuts, yeah, microseconds. Milliseconds is a typo. Userland counts in
microseconds; kernel code counts in jiffies. The code is sane, my emails
are not.

> There are 1,000,000 of those in a second, so 630,482 would still be half a
> second, which should be enough time for dozens of page faults (approach
> 100 of them), so I'm wondering what could possibly cause this measurement.
>

I'm not sure. I've noticed the odd twitch of long access time on rare
occasions but I'm not sure what causes them yet. I'm not sure if they are real
or confined to my code. All the time measurement stuff is in VMR::Time so at
least the timing code is confined for anyone who wants to verify.

> Or...was this process descheduled, and what you measured is the interval
> between when this process last ran and when the scheduler put it on the
> CPU again?
>

The measure is the time when the script asked the module to read a page.
The page is read by echoing to a mapanon_read proc entry. It's looking
like it takes about 350 microseconds to enter the module and perform the
read. I don't call schedule although it is possible I get scheduled. The only
way to be sure would be to collect all timing information within the module
which is perfectly possible. The only trouble is that if the module collects,
only one test instance can run at a time.
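
To make the measurement issue concrete, here is a rough userspace analogue in Python. It is not the VM Regress module and does not use the mapanon_read proc entry; it simply times the first write to each page of an anonymous mapping. Like the script-side timing described above, each figure includes whatever else happens between the two clock reads, including time spent descheduled.

```python
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE

def time_page_touches(num_pages):
    """Return the time in microseconds taken to write to each page once.

    The first write to a page of a fresh anonymous mapping normally
    faults it in, so these are first-touch times seen from userspace.
    """
    region = mmap.mmap(-1, num_pages * PAGE_SIZE)  # anonymous mapping
    timings_us = []
    try:
        for page in range(num_pages):
            start = time.perf_counter_ns()
            region[page * PAGE_SIZE] = 1           # touch one byte in the page
            elapsed_ns = time.perf_counter_ns() - start
            timings_us.append(elapsed_ns / 1000.0)
    finally:
        region.close()
    return timings_us

timings = time_page_touches(256)
print(f"min {min(timings):.1f} us, max {max(timings):.1f} us")
```

Because nothing here is shared between callers, several copies can run at once, which is the property that timing inside a single module would give up.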

The way it is at the moment, I could run 100 instances of this test at the
same time and see how they interacted. The module is (or should be) SMP safe
and these tests were run on a duel processor. I'm waiting for a quad xeon
xseries to arrive so I can start running tests there.

> > withx shows spikey access times for pages which is consistent with large
> > apps starting up in the background
>
> It is?  Why?  Which is the ``large app'' here?  What does it mean to start
> up in the background, and why would that make the page access times
> inconsistent?
>

I didn't think about this but I suspected that what would happen is that the
apps and the test would compete for memory at the same time. Both would swap
out pages so there would be periods of quick accesses with a block of long
delays as more was swapped. I didn't think this fully through yet and this is
30 seconds of reasoning so don't shoot me if I'm wrong.

At the time the test was started, 4 instances of konqueror were starting to
run and it hogs physical pages quite a lot so it stands to reason it would
collide with the test. It's not a large app as such, but my machine isn't
exactly a powerhouse either.

> Noooooooooo!
>
> I can't think of a reason to test the VM under any one of the first three
> distributions.  I've never, *ever* seen or heard of a linear or gaussian
> distribution of page references.

I'm familiar with this problem and believe it or not, I've read a few papers on
the subject. The reason why I would write it is that it will help determine if
the page replacement algorithm is able to detect the working set or not. If
I refer to pages with a smooth distribution on an area about the size of
physical memory, the pages not being referenced should be swapped out.

It is more a test than a benchmark but it is somewhere where rmap should
shine. If I map memory the same size as physical memory, 2.4.20pre2 will
swap out the whole process because it can't reverse lookup pages. I want to
see whether rmap will selectively swap the correct pages. The timing isn't
important because for the length of the test, a FIFO or random selection isn't
going to be noticeable. We need to see what the present pages were.
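
Checking "what the present pages were" against what was referenced comes down to a set comparison. A minimal sketch, assuming the referenced and present page numbers have already been collected into plain Python collections; the function name, the result keys and the sample data are hypothetical and not part of VM Regress:

```python
def working_set_report(referenced, present):
    """Compare the pages a test referenced with the pages still in memory."""
    referenced, present = set(referenced), set(present)
    return {
        # referenced pages that stayed in memory (what we hope for)
        "kept": len(referenced & present),
        # referenced pages that were swapped out anyway
        "wrongly_evicted": len(referenced - present),
        # unreferenced pages still occupying memory
        "stale_resident": len(present - referenced),
    }

# Made-up example: pages 0-3 were referenced, pages 0, 1, 2 and 7 are present.
print(working_set_report(range(0, 4), [0, 1, 2, 7]))
```

A kernel that detects the working set well should drive the last two counts towards zero as the test runs.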

The second real reason to have this is that it is very easy to work out
in advance how the VM should perform for a given simple pattern. The test
should back up what the developer has in their head. It's much easier to
work initially with regular data than true trace information.

Lastly, this isn't justification for bad reference data but even producing
data with a known pattern is more reproducible than running kernel
compiles, big dd's, large mmaps etc. and timing the results.

> <Other page reference behaviour>

I see your point and mostly I agree. The problem is that generating the
correct type of data is difficult and a full project in itself, but generating
exact test data is not my immediate concern. The script is going to be
receiving its reference data from a VMR::Reference perl module which is
responsible for generating page references. If someone feels that a better
reference pattern should be used, they can add it to the module and re-run
the tests. Either that or they can describe how to generate it (or cite a
paper) to me and I'll investigate it if I have the time.

> The last suggestion -- real trace data -- is the best one.  I do wonder
> why you put ``real'' in quotes.

Because all programs are real, even VM Regress but testing trace data from it
would be pretty useless. When I said "real", I meant real as in applications
like compilers, database servers, web browsers etc.

> I also wouldn't want trace data taken
> from *one* application or database.  You need a whole suite to represent
> the kinds of reference behavior that a VM system will need to manage.
>

Trace data would be great but I haven't been thinking about it long and
haven't come up with a reliable way of generating it yet. Given a bit of
thought, a patch to the kernel could be developed that would allow processes
to be attached to and read page faults but that is only part of the picture.
Trapping calls to mark_page_accessed might help but I need to think more and
generating real trace data is a priority for a much later release. I'm
still working on the framework here.

> Again, I recognize that this is a work in progress.  I'd be happy to see
> it yield worthwhile results.  If you use oversimplified models, it won't.
> The results will not be reliable for evaluating performance or making
> comparisons of VM systems.
>

Things have to start with simplified models because they can be easily
understood at a glance. I think it's a bit unreasonable to expect a
full-featured suite at first release. As I said I have been working on this
particular benchmark 1 day, *1* day and the suite has only about 8 or 10
days of development time in total. I like to think I'm not a bad
programmer but I'm not God :-)

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel



* Re: [PATCH] rmap 14
  2002-08-16 21:29     ` Scott Kaplan
  2002-08-16 23:02       ` Mel
@ 2002-08-19 18:04       ` Daniel Phillips
  1 sibling, 0 replies; 9+ messages in thread
From: Daniel Phillips @ 2002-08-19 18:04 UTC (permalink / raw)
  To: Scott Kaplan, Mel; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm

On Friday 16 August 2002 23:29, Scott Kaplan wrote:
> > Now... where this is going. I plan to write a module that will generate
> > page references to a given pattern. Possible pattern references are
> >
> > o Linear
> > o Pure random
> > o Random with gaussian distribution
> > o Smooth so the references look like a curve
> > o Trace data taken from a "real" application or database
> 
> Noooooooooo!
> 
> I can't think of a reason to test the VM under any one of the first three
> distributions.  I've never, *ever* seen or heard of a linear or gaussian
> distribution of page references.  As for uniform random (which is what I
> assume you mean by ``pure random''), that's not worth testing.  If a
> workload presents a pure random reference pattern, any on-line policy is
> screwed.  No process can do this on a data set that doesn't fit in memory,
> and if it does, there's no hope.

I disagree that the linear (which I assume means walk linearly through 
process memory) and random patterns aren't worth testing.  The former should 
produce very understandable behaviour and that's always a good thing.  It's 
an idiot check.  Specifically, with the algorithms we're using, we expect the 
first-touched pages to be chosen for eviction.  It's worth verifying that 
this works as expected.
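
The expectation for the linear case is easy to state as a toy model. The sketch below uses a strict LRU, which is only a stand-in for the kernel's actual replacement code, and records the order in which pages are evicted during a single linear walk:

```python
from collections import OrderedDict

def lru_evictions(refs, frames):
    """Replay page references through a strict LRU and list the victims."""
    resident = OrderedDict()   # oldest entry first, newest last
    evicted = []
    for page in refs:
        if page in resident:
            resident.move_to_end(page)          # now the most recently used
            continue
        if len(resident) >= frames:
            victim, _ = resident.popitem(last=False)  # least recently used
            evicted.append(victim)
        resident[page] = True
    return evicted

# One linear pass over 8 pages with room for only 5 of them.
print(lru_evictions(range(8), frames=5))  # [0, 1, 2]
```

The first-touched pages are the ones pushed out, which is the result a real run of the linear pattern would be compared against.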

Random gives us a nice baseline against which to evaluate our performance on 
more typical, localized loads.  That is, we need to know we're doing better 
than random, and it's very nice to know by how much.

The gaussian distribution is also interesting because it gives a simplistic 
notion of virtual address locality.  We are supposed to be able to predict 
likelihood of future uses based on historical access patterns, the question 
is: do we?  Comparing the random distribution to gaussian, we ought to see 
somewhat fewer evictions on the gaussian distribution.  (I'll bet right now 
that we completely fail that test, because we just do not examine the 
referenced bits frequently enough to recover any signal from the noise.)
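
The comparison proposed here can be tried against an idealised policy first. The sketch below counts evictions for a uniform and a gaussian reference stream under a strict LRU; that policy, the region and memory sizes, and the gaussian spread are all assumptions chosen for illustration, so the numbers say nothing about any real kernel.

```python
import math
import random
from collections import OrderedDict

def count_evictions(refs, frames):
    """Replay refs through a strict LRU of `frames` pages; count evictions."""
    resident = OrderedDict()
    evictions = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
            continue
        if len(resident) >= frames:
            resident.popitem(last=False)      # drop the least recently used
            evictions += 1
        resident[page] = True
    return evictions

def uniform_refs(num_pages, count, rng):
    return [rng.randrange(num_pages) for _ in range(count)]

def gaussian_refs(num_pages, count, rng, spread=0.1):
    """Cluster references around the middle page; redraw out-of-range samples."""
    refs = []
    while len(refs) < count:
        page = math.floor(rng.gauss(num_pages / 2, num_pages * spread))
        if 0 <= page < num_pages:
            refs.append(page)
    return refs

rng = random.Random(2002)                     # fixed seed for repeatability
NUM_PAGES, FRAMES, COUNT = 1024, 256, 50_000  # illustrative sizes only

uniform_evictions = count_evictions(uniform_refs(NUM_PAGES, COUNT, rng), FRAMES)
gaussian_evictions = count_evictions(gaussian_refs(NUM_PAGES, COUNT, rng), FRAMES)
print("uniform: ", uniform_evictions)
print("gaussian:", gaussian_evictions)
```

With these sizes the gaussian stream should evict noticeably less often than the uniform one under ideal LRU; how close a real VM gets to that gap is the open question raised above.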

I'll leave the more complex patterns to you and Mel, but these simple 
patterns are particularly interesting to me.  Not as a target for 
optimization, but more to verify that basic mechanisms are working as 
expected.

-- 
Daniel


* Re: [PATCH] rmap 14
  2002-08-16 23:02       ` Mel
@ 2002-08-19 19:50         ` Daniel Phillips
  2002-08-19 21:19           ` Mel
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Phillips @ 2002-08-19 19:50 UTC (permalink / raw)
  To: Mel, Scott Kaplan; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm

On Saturday 17 August 2002 01:02, Mel wrote:
> On Fri, 16 Aug 2002, Scott Kaplan wrote:
> The measure is the time when the script asked the module to read a page.
> The page is read by echoing to a mapanon_read proc entry. It's looking
> like it takes about 350 microseconds to enter the module and perform the
> read. I don't call schedule although it is possible I get scheduled. The only
> way to be sure would be to collect all timing information within the module
> which is perfectly possible. The only trouble is that if the module collects,
> only one test instance can run at a time.

It sounds like you want to try the linux trace toolkit:

   http://www.opersys.com/LTT/

-- 
Daniel


* Re: [PATCH] rmap 14
  2002-08-19 19:50         ` Daniel Phillips
@ 2002-08-19 21:19           ` Mel
  2002-08-19 21:38             ` Daniel Phillips
  0 siblings, 1 reply; 9+ messages in thread
From: Mel @ 2002-08-19 21:19 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, linux-mm

On Mon, 19 Aug 2002, Daniel Phillips wrote:

> It sounds like you want to try the linux trace toolkit:
>
>    http://www.opersys.com/LTT/
>

I have been looking in its direction a couple of times. I suspect I'll
eventually end up using it to answer some questions but I'm trying to
get as far as possible without using large kernel patches. At the moment
the extent of the patches involves exporting symbols to modules.

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel



* Re: [PATCH] rmap 14
  2002-08-19 21:19           ` Mel
@ 2002-08-19 21:38             ` Daniel Phillips
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Phillips @ 2002-08-19 21:38 UTC (permalink / raw)
  To: Mel; +Cc: linux-kernel, linux-mm

On Monday 19 August 2002 23:19, Mel wrote:
> On Mon, 19 Aug 2002, Daniel Phillips wrote:
> 
> > It sounds like you want to try the linux trace toolkit:
> >
> >    http://www.opersys.com/LTT/
> >
> 
> I have been looking in its direction a couple of times. I suspect I'll
> eventually end up using it to answer some questions

That's exactly what I meant - when you uncover something interesting with
your test tool, you investigate it further with LTT.

> but I'm trying to
> get as far as possible without using large kernel patches. At the moment
> the extent of the patches involves exporting symbols to modules

I think you've chosen roughly the right level to approach this.

-- 
Daniel

