* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
  2001-09-02 17:10 Rik`s ac12-pmap2 vs ac12-vanilla perfcomp Samium Gromoff
@ 2001-09-02 13:11 ` Rik van Riel
  2001-09-02 17:04 ` Daniel Phillips
  1 sibling, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2001-09-02 13:11 UTC (permalink / raw)
  To: Samium Gromoff; +Cc: linux-kernel

On Sun, 2 Sep 2001, Samium Gromoff wrote:

>    No flames please - I know these were low VM loads, I did this just
> to find out how big the test rmap maintenance overhead is. It shows
> that even under low VM load there is a clear win in using rmap, and
> the win increases with the VM load.

Interesting, I'm just at a proof-of-concept implementation right
now, which is not yet stable or ready.  ;)

I guess page replacement _is_ important...

cheers,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
  2001-09-02 17:10 Rik`s ac12-pmap2 vs ac12-vanilla perfcomp Samium Gromoff
  2001-09-02 13:11 ` Rik van Riel
@ 2001-09-02 17:04 ` Daniel Phillips
  2001-09-02 21:46   ` Samium Gromoff
  2001-09-03  9:58   ` Rik van Riel
  1 sibling, 2 replies; 8+ messages in thread
From: Daniel Phillips @ 2001-09-02 17:04 UTC (permalink / raw)
  To: Samium Gromoff, linux-kernel; +Cc: riel

On September 2, 2001 07:10 pm, Samium Gromoff wrote:
>    No flames please - I know these were low VM loads, I did this just to find
> out how big the test rmap maintenance overhead is. It shows that even under low
> VM load there is a clear win in using rmap, and the win increases with the VM load.
>    Algo:
>       - booted ac12
>       - performed test A 7 times, then test B 7 times.
>       - booted ac12-pmap2
>       - performed test A 7 times, then test B 7 times.
> 
>       * each test was done 7 times, with the lowest and highest thrown away.
>       * due to the large amount of data and the streaming access pattern,
>  caching was unable to affect the results, so it was ignored.
> 
> test A:
> "time find / -xdev" + (standard junk eating ~6% cpu (top procinfo))
>   descr: extremely low VM load, mostly IO dependent
> 
> test B:
> "time find / -xdev | grep --regexp="\/" | xargs echo" + background mpg123 +
>  + standard junk described above
>   descr: low, although higher than A, vm load (nearly absolutely no swap)

It's nice to see these tests aren't running any slower (and not crashing!)
but as I understand it, reverse mapping is a win only for shared mmaps and
swapping, which you aren't doing.  So it's not clear what effect is being
measured.

One thing that goes away with rmaps is the need to scan process page tables.
It's possible that this takes enough load off L1 cache to produce the effects
you showed, though it would be surprising.
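
To make that concrete, here is a rough userspace sketch of what a reverse map
buys you (toy structures and invented names, not Rik's actual code): each
physical page carries a chain of pointers to the ptes that map it, so visiting
every mapping of a page is a short list walk instead of a scan over every
process's page tables, paid for by keeping the chain up to date on every
mapping change.

/* Toy illustration, compiles as plain C; nothing here is kernel code. */
#include <stdio.h>
#include <stdlib.h>

struct toy_pte { unsigned long val; };      /* stand-in for a hardware pte */

struct pte_chain {                          /* one entry per mapping of the page */
    struct toy_pte   *ptep;
    struct pte_chain *next;
};

struct toy_page {                           /* stand-in for struct page */
    struct pte_chain *rmap;                 /* head of the reverse-map chain */
};

/* The maintenance cost: every time a pte is set up, the chain grows by one. */
static void toy_page_add_rmap(struct toy_page *page, struct toy_pte *ptep)
{
    struct pte_chain *pc = malloc(sizeof(*pc));

    pc->ptep = ptep;
    pc->next = page->rmap;
    page->rmap = pc;
}

/* The win: visiting every mapping of one page is a walk over its own chain,
 * with no scan over the page tables of unrelated processes. */
static void toy_for_each_mapping(struct toy_page *page)
{
    struct pte_chain *pc;

    for (pc = page->rmap; pc; pc = pc->next)
        printf("pte at %p maps this page (val=%lx)\n",
               (void *)pc->ptep, pc->ptep->val);
}

int main(void)
{
    struct toy_page page = { NULL };
    struct toy_pte pte_a = { 0x1000 }, pte_b = { 0x1000 };  /* two sharers */

    toy_page_add_rmap(&page, &pte_a);
    toy_page_add_rmap(&page, &pte_b);
    toy_for_each_mapping(&page);
    return 0;
}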

(Note that I'm in the "reverse maps are good" camp, and I think Rik's design 
is fundamentally correct.)

--
Daniel


* Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
@ 2001-09-02 17:10 Samium Gromoff
  2001-09-02 13:11 ` Rik van Riel
  2001-09-02 17:04 ` Daniel Phillips
  0 siblings, 2 replies; 8+ messages in thread
From: Samium Gromoff @ 2001-09-02 17:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: riel

         Hello there, I just came up with some benchmarks:
   Done on a P166 with 24M of RAM.

   No flames please - I know these were low VM loads, I did this just to find out
how big the test rmap maintenance overhead is. It shows that even under low VM
load there is a clear win in using rmap, and the win increases with the VM load.
   Algo:
      - booted ac12
      - performed test A 7 times, then test B 7 times.
      - booted ac12-pmap2
      - performed test A 7 times, then test B 7 times.

      * each test was done 7 times, with the lowest and highest thrown away.
      * due to the large amount of data and the streaming access pattern,
 caching was unable to affect the results, so it was ignored.

test A:
"time find / -xdev" + (standard junk eating ~6% cpu (top procinfo))
  descr: extremely low VM load, mostly IO dependent

test B:
"time find / -xdev | grep --regexp="\/" | xargs echo" + background mpg123 +
 + standard junk described above
  descr: low, although higher than A, vm load (nearly absolutely no swap)

Results:
        ac12            ac12-pmap2
===[ test A: find / -xdev
real    1m4.221s        0m56.916s
real    1m3.042s        0m57.275s
real    1m3.613s        0m57.606s
real    1m3.442s        0m57.166s
real    1m3.447s        0m56.895s
======================================
avg     63.553 sec      57.171 sec	11% win

sys     0m36.750s       0m31.980s
sys     0m37.190s       0m31.870s
sys     0m36.720s       0m32.300s
sys     0m36.650s       0m32.400s
sys     0m37.350s       0m32.270s
======================================
avg     36.93 sec       32.16 sec	13% win



===[ test B: find / -xdev | grep --regexp="\/" | xargs echo (with mpg123 in the
 background eating a 4M+ buffer and 15-20% CPU)
real    0m38.720s       0m31.018s
real    0m38.061s       0m30.318s
real    0m38.075s       0m31.980s
real    0m37.626s       0m31.149s
real    0m38.431s       0m30.820s
======================================
avg     38.182 sec      31.057 sec	19% win

sys     0m16.090s       0m13.910s
sys     0m15.820s       0m13.610s
sys     0m15.750s       0m13.710s
sys     0m15.700s       0m13.780s
sys     0m15.750s       0m13.790s
======================================
avg     15.82 sec       13.76 sec	14% win



cheers,
 Sam


* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
  2001-09-02 21:46   ` Samium Gromoff
@ 2001-09-02 17:51     ` Daniel Phillips
  2001-09-02 22:18       ` Samium Gromoff
  2001-09-03 15:22       ` Marcelo Tosatti
  0 siblings, 2 replies; 8+ messages in thread
From: Daniel Phillips @ 2001-09-02 17:51 UTC (permalink / raw)
  To: Samium Gromoff; +Cc: linux-kernel

On September 2, 2001 11:46 pm, Samium Gromoff wrote:
> Daniel Phillips wrote:
> > > One thing that goes away with rmaps is the need to scan process page tables.
> > It's possible that this takes enough load off L1 cache to produce the effects
>
>     That is my feeling too.
>     Actually there was a fear that the overhead of reverse map maintenance
>  would outweigh the gain on low loads, but in my case this isn't an issue.

Rik's patch can be optimized a lot by using a direct pointer to the pte in the
nonshared case, and perhaps a null rmap pointer in the kernel-only case (e.g.,
page cache).  If the non-optimized version is already performing better than the
traditional approach it's a very good sign.  This needs careful confirmation.
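
Something along these lines, as a userspace toy with invented names (not a
patch against Rik's code): the page keeps a direct pointer while it has a
single mapping and only switches to a chain once a second sharer shows up,
while a kernel-only page (say, page cache with no user mappings) keeps the
pointer NULL and pays nothing.

#include <stdio.h>
#include <stdlib.h>

struct toy_pte   { unsigned long val; };
struct pte_chain { struct toy_pte *ptep; struct pte_chain *next; };

struct toy_page {
    int mapcount;                     /* how many ptes map this page      */
    union {
        struct toy_pte   *direct;     /* mapcount == 1: the one pte       */
        struct pte_chain *chain;      /* mapcount >= 2: list of all ptes  */
    } rmap;                           /* mapcount == 0: kernel-only page  */
};

static void toy_add_rmap(struct toy_page *page, struct toy_pte *ptep)
{
    if (page->mapcount == 0) {
        page->rmap.direct = ptep;             /* fast path: no allocation */
    } else {
        struct pte_chain *pc;

        if (page->mapcount == 1) {            /* convert direct -> chain  */
            pc = malloc(sizeof(*pc));
            pc->ptep = page->rmap.direct;
            pc->next = NULL;
            page->rmap.chain = pc;
        }
        pc = malloc(sizeof(*pc));             /* add the new mapping      */
        pc->ptep = ptep;
        pc->next = page->rmap.chain;
        page->rmap.chain = pc;
    }
    page->mapcount++;
}

int main(void)
{
    struct toy_page page = { 0, { NULL } };
    struct toy_pte a = { 1 }, b = { 2 };

    toy_add_rmap(&page, &a);      /* non-shared: just a pointer, no chain */
    printf("one mapping: direct pte %p\n", (void *)page.rmap.direct);

    toy_add_rmap(&page, &b);      /* second sharer: chain gets allocated  */
    printf("two mappings: chain head pte %p\n", (void *)page.rmap.chain->ptep);
    return 0;
}

The effect is that the common non-shared case costs one pointer assignment and
no allocation; only genuinely shared pages pay for chain maintenance.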

Measurements where you force your system into continuous swapping would be very
interesting.

--
Daniel


* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
  2001-09-02 17:04 ` Daniel Phillips
@ 2001-09-02 21:46   ` Samium Gromoff
  2001-09-02 17:51     ` Daniel Phillips
  2001-09-03  9:58   ` Rik van Riel
  1 sibling, 1 reply; 8+ messages in thread
From: Samium Gromoff @ 2001-09-02 21:46 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

  Daniel Phillips wrote:
> > One thing that goes away with rmaps is the need to scan process page tables.
> It's possible that this takes enough load off L1 cache to produce the effects
    That is my feeling too.
    Actually there was a fear that the overhead of reverse map maintenance
 would outweigh the gain on low loads, but in my case this isn't an issue.
> you showed, though it would be surprising.
> 
> (Note that I'm in the "reverse maps are good" camp, and I think Rik's design 
> is fundamentally correct.)
    me too...
> 
> --
> Daniel
> 



* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
  2001-09-02 17:51     ` Daniel Phillips
@ 2001-09-02 22:18       ` Samium Gromoff
  2001-09-03 15:22       ` Marcelo Tosatti
  1 sibling, 0 replies; 8+ messages in thread
From: Samium Gromoff @ 2001-09-02 22:18 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

  Daniel Phillips wrote:
> Measurements where you force your system into continuous swapping would be very
> interesting.
      unfortunately under heavy load the kernel starts to give out a lot of errors:
Sep  2 11:46:38 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep  2 11:47:01 vegae last message repeated 1023 times
Sep  2 11:58:10 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep  2 11:58:45 vegae last message repeated 603 times
Sep  2 12:00:00 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep  2 12:01:01 vegae last message repeated 2478 times
Sep  2 12:01:13 vegae last message repeated 389 times
Sep  2 12:01:13 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep  2 12:01:22 vegae last message repeated 399 times
Sep  2 12:01:23 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep  2 12:01:54 vegae last message repeated 959 times

page_remove_all_pmaps: SWAP_ERROR
try_to_swap_out: page not in a VMA?!
page_remove_all_pmaps: SWAP_ERROR
try_to_swap_out: page not in a VMA?!
page_remove_all_pmaps: SWAP_ERROR

     I already reported this to Rik.
> --
> Daniel
> 



* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
  2001-09-02 17:04 ` Daniel Phillips
  2001-09-02 21:46   ` Samium Gromoff
@ 2001-09-03  9:58   ` Rik van Riel
  1 sibling, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2001-09-03  9:58 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Samium Gromoff, linux-kernel

On Sun, 2 Sep 2001, Daniel Phillips wrote:

> It's nice to see these tests aren't running any slower (and not
> crashing!) but as I understand it, reverse mapping is a win only for
> shared mmaps and swapping, which you aren't doing.  So it's not clear
> what effect is being measured.

For one, with this thing we can actually "see" the referenced
bits in the page tables from refill_inactive(), so page aging
could potentially be better.

Samium's numbers, showing how much better, were a tad surprising
to me too, though ;)
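
To illustrate the referenced-bits point, a toy userspace sketch (made-up names,
not the code in the pmap patch): with a reverse map the aging code can ask one
physical page whether anybody touched it, by looking only at the ptes that
actually map it.

#include <stdio.h>

#define TOY_PTE_REFERENCED 0x1            /* pretend "accessed" bit */

struct toy_pte   { unsigned long flags; };
struct pte_chain { struct toy_pte *ptep; struct pte_chain *next; };
struct toy_page  { struct pte_chain *rmap; int age; };

/* Test and clear the referenced bit of every pte mapping the page, and
 * return how many mappings touched it since the last pass. */
static int toy_page_referenced(struct toy_page *page)
{
    struct pte_chain *pc;
    int referenced = 0;

    for (pc = page->rmap; pc; pc = pc->next) {
        if (pc->ptep->flags & TOY_PTE_REFERENCED) {
            pc->ptep->flags &= ~TOY_PTE_REFERENCED;
            referenced++;
        }
    }
    return referenced;
}

/* Crude aging in the spirit of refill_inactive(): recently used pages gain
 * age, untouched ones decay towards being eviction candidates. */
static void toy_age_page(struct toy_page *page)
{
    if (toy_page_referenced(page))
        page->age += 3;
    else if (page->age > 0)
        page->age--;
}

int main(void)
{
    struct toy_pte pte = { TOY_PTE_REFERENCED };
    struct pte_chain pc = { &pte, NULL };
    struct toy_page page = { &pc, 0 };

    toy_age_page(&page);          /* referenced once: age goes up    */
    toy_age_page(&page);          /* not touched again: age decays   */
    printf("page age after two passes: %d\n", page.age);
    return 0;
}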

> One thing that goes away with rmaps is the need to scan process page
> tables. It's possible that this takes enough load off L1 cache to
> produce the effects you showed, though it would be surprising.

CPU overhead seems to be a bit lower in the tests I ran; that
"a bit" should mostly become significant in the case of many
shared pages.

The real savings should be better pageout selection and lots of
flexibility to do interesting things, though.

> (Note that I'm in the "reverse maps are good" camp, and I think Rik's
> design is fundamentally correct.)

Btw, I just released a new version of the patch, which:
- moves page_remove_pmap() one line down in mremap(), so it
  works now ;)
- starts making the reverse mapping patch SMP safe, removing
  try_to_swap_out() and replacing it with allocate_swap_space()
  and try_to_unmap()  (a rough sketch of that split follows below)
- moves architecture-specific magic to <asm/pmap.h> so it's now
  easy to port to other architectures  (yes, this stuff is all
  documented)
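
The idea of that split, in toy form (userspace sketch, invented names and
return codes, not the patch itself): swap space is reserved for the page
first, and a separate unmap pass then walks the page's reverse map and points
every pte at the swap entry, instead of one function doing both jobs while
scanning page tables.

#include <stdio.h>

enum { TOY_SWAP_SUCCESS, TOY_SWAP_FAIL };

struct toy_pte   { unsigned long val; };
struct pte_chain { struct toy_pte *ptep; struct pte_chain *next; };
struct toy_page  { struct pte_chain *rmap; unsigned long swap_entry; };

/* Step 1: reserve a swap slot for the page (pretend swap map). */
static int toy_allocate_swap_space(struct toy_page *page)
{
    static unsigned long next_slot = 1;

    page->swap_entry = next_slot++;
    return TOY_SWAP_SUCCESS;
}

/* Step 2: walk the reverse map and replace every pte with the swap entry. */
static int toy_try_to_unmap(struct toy_page *page)
{
    struct pte_chain *pc;

    if (!page->rmap)
        return TOY_SWAP_FAIL;                 /* page not mapped anywhere */
    for (pc = page->rmap; pc; pc = pc->next)
        pc->ptep->val = page->swap_entry;     /* pte now refers to swap   */
    return TOY_SWAP_SUCCESS;
}

int main(void)
{
    struct toy_pte pte = { 0xabc };
    struct pte_chain pc = { &pte, NULL };
    struct toy_page page = { &pc, 0 };

    if (toy_allocate_swap_space(&page) == TOY_SWAP_SUCCESS &&
        toy_try_to_unmap(&page) == TOY_SWAP_SUCCESS)
        printf("page unmapped, pte now holds swap entry %lx\n", pte.val);
    return 0;
}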

	http://www.surriel.com/patches/2.4/2.4.8-ac12-pmap3

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
  2001-09-02 17:51     ` Daniel Phillips
  2001-09-02 22:18       ` Samium Gromoff
@ 2001-09-03 15:22       ` Marcelo Tosatti
  1 sibling, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2001-09-03 15:22 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Samium Gromoff, linux-kernel



On Sun, 2 Sep 2001, Daniel Phillips wrote:

> On September 2, 2001 11:46 pm, Samium Gromoff wrote:
> > Daniel Phillips wrote:
> > > > One thing that goes away with rmaps is the need to scan process page tables.
> > > It's possible that this takes enough load off L1 cache to produce the effects
> >
> >     That is my feeling too.
> >     Actually there was a fear that the overhead of reverse map maintenance
> >  would outweigh the gain on low loads, but in my case this isn't an issue.
> 
> Rik's patch can be optimized a lot by using a direct pointer to the pte in the
> nonshared case, and perhaps a null rmap pointer in the kernel-only case (e.g.,
> page cache).  If the non-optimized version is already performing better than the
> traditional approach it's a very good sign.  This needs careful confirmation.
> 
> Measurements where you force your system into continuous swapping would be very
> interesting.

Indeed.

Samium, I would appreciate it if you could run heavy anonymous memory tests
with Rik's code (e.g. programs from the memtest suite, make -jALOT, etc).


