* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
2001-09-02 17:10 Rik`s ac12-pmap2 vs ac12-vanilla perfcomp Samium Gromoff
@ 2001-09-02 13:11 ` Rik van Riel
2001-09-02 17:04 ` Daniel Phillips
1 sibling, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2001-09-02 13:11 UTC (permalink / raw)
To: Samium Gromoff; +Cc: linux-kernel
On Sun, 2 Sep 2001, Samium Gromoff wrote:
> No flames please - I know these were low VM loads; I did this just
> to find out how big the rmap maintenance overhead is. It shows us that
> even on low VM load there is a huge win in using rmap. And the win
> increases with the VM load.
Interesting, I'm just at a proof-of-concept implementation right
now, which is not yet stable or ready. ;)
I guess page replacement _is_ important...
cheers,
Rik
--
IA64: a worthy successor to i860.
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to aardvark@nl.linux.org (spam digging piggy)
* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
2001-09-02 17:10 Rik`s ac12-pmap2 vs ac12-vanilla perfcomp Samium Gromoff
2001-09-02 13:11 ` Rik van Riel
@ 2001-09-02 17:04 ` Daniel Phillips
2001-09-02 21:46 ` Samium Gromoff
2001-09-03 9:58 ` Rik van Riel
1 sibling, 2 replies; 8+ messages in thread
From: Daniel Phillips @ 2001-09-02 17:04 UTC (permalink / raw)
To: Samium Gromoff, linux-kernel; +Cc: riel
On September 2, 2001 07:10 pm, Samium Gromoff wrote:
> No flames please - I know these were low VM loads; I did this just to find out
> how big the rmap maintenance overhead is. It shows us that even on low VM
> load there is a huge win in using rmap. And the win increases with the VM load.
> Algo:
> - booted ac12
> - performed test A 7 times, then test B 7 times.
> - booted ac12-pmap2
> - performed test A 7 times, then test B 7 times.
>
> * each test was done 7 times, with the lowest and highest thrown away.
> * because of the large data volume and the streaming access pattern, caching
> was unable to affect the results, so it was ignored.
>
> test A:
> "time find / -xdev" + (standard junk eating ~6% cpu (top procinfo))
> descr: extremely low VM load, mostly IO dependent
>
> test B:
> "time find / -xdev | grep --regexp="\/" | xargs echo" + background mpg123 +
> + standard junk described above
> descr: low VM load, although higher than A (almost no swap)
It's nice to see these tests aren't running any slower (and not crashing!)
but as I understand it, reverse mapping is a win only for shared mmaps and
swapping, which you aren't doing. So it's not clear what effect is being
measured.
One thing that goes away with rmaps is the need to scan process page tables.
It's possible that this takes enough load off L1 cache to produce the effects
you showed, though it would be surprising.
(Note that I'm in the "reverse maps are good" camp, and I think Rik's design
is fundamentally correct.)
--
Daniel
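Daniel's remark about scanning process page tables can be illustrated with a toy model. This is only a sketch under stated assumptions: the process names, page numbers and both helper functions are hypothetical, and nothing here reflects the data structures in Rik's patch. It shows why finding every mapper of a physical page needs a walk over all page tables without a reverse map, and a single lookup with one.

```python
from collections import defaultdict

# Toy model: each process has a page table mapping virtual page -> physical frame.
# All names and numbers are made up for illustration.
page_tables = {
    "proc_a": {0: 7, 1: 3, 2: 9},
    "proc_b": {0: 3, 5: 4},
    "proc_c": {2: 8},
}


def find_mappers_by_scan(frame):
    """Without reverse maps: walk every page table to find who maps `frame`."""
    hits = []
    for proc, table in page_tables.items():
        for vpage, mapped in table.items():
            if mapped == frame:
                hits.append((proc, vpage))
    return hits


def build_reverse_map(tables):
    """With reverse maps: maintain frame -> [(process, virtual page)]."""
    rmap = defaultdict(list)
    for proc, table in tables.items():
        for vpage, frame in table.items():
            rmap[frame].append((proc, vpage))
    return rmap


rmap = build_reverse_map(page_tables)
print(find_mappers_by_scan(3))  # cost grows with the total number of ptes
print(rmap[3])                  # cost grows only with the mappers of frame 3
```

Both calls report the same mappers; the difference is how much memory has to be touched to get the answer, which is the cache-load effect speculated about above.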
* Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
@ 2001-09-02 17:10 Samium Gromoff
2001-09-02 13:11 ` Rik van Riel
2001-09-02 17:04 ` Daniel Phillips
0 siblings, 2 replies; 8+ messages in thread
From: Samium Gromoff @ 2001-09-02 17:10 UTC (permalink / raw)
To: linux-kernel; +Cc: riel
Hello there, I just came up with some benchmarks, done on a p166-24M machine.
No flames please - I know these were low VM loads; I did this just to find out
how big the rmap maintenance overhead is. It shows us that even on low VM
load there is a huge win in using rmap. And the win increases with the VM load.
Algo:
- booted ac12
- performed test A 7 times, then test B 7 times.
- booted ac12-pmap2
- performed test A 7 times, then test B 7 times.
* each test was done 7 times, with the lowest and highest results thrown away.
* because of the large data volume and the streaming access pattern, caching
was unable to affect the results, so it was ignored.
test A:
"time find / -xdev" + (standard junk eating ~6% cpu (top procinfo))
descr: extremely low VM load, mostly IO dependent
test B:
"time find / -xdev | grep --regexp="\/" | xargs echo" + background mpg123 +
+ standard junk described above
descr: low VM load, although higher than A (almost no swap)
Results:

              ac12          ac12-pmap2

===[ test A: find / -xdev
real          1m4.221s      0m56.916s
real          1m3.042s      0m57.275s
real          1m3.613s      0m57.606s
real          1m3.442s      0m57.166s
real          1m3.447s      0m56.895s
======================================
avg           63.553 sec    57.171 sec    11% win

sys           0m36.750s     0m31.980s
sys           0m37.190s     0m31.870s
sys           0m36.720s     0m32.300s
sys           0m36.650s     0m32.400s
sys           0m37.350s     0m32.270s
======================================
avg           36.93 sec     32.16 sec     13% win

===[ test B: find / -xdev | grep --regexp="\/" | xargs echo
     (with mpg123 in background eating 4M+ buf + 15-20% CPU)
real          0m38.720s     0m31.018s
real          0m38.061s     0m30.318s
real          0m38.075s     0m31.980s
real          0m37.626s     0m31.149s
real          0m38.431s     0m30.820s
======================================
avg           38.182 sec    31.057 sec    19% win

sys           0m16.090s     0m13.910s
sys           0m15.820s     0m13.610s
sys           0m15.750s     0m13.710s
sys           0m15.700s     0m13.780s
sys           0m15.750s     0m13.790s
======================================
avg           15.82 sec     13.76 sec     14% win

cheers,
Sam
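For readers who want to recheck the averages, the following is a minimal sketch that recomputes the test A "real" figures from the five retained runs listed above. The helper names `parse_time` and `reduction_percent` are hypothetical, and defining the "win" as the relative reduction in time is an assumption; the post does not state which definition was used.

```python
import re
from statistics import mean


def parse_time(text):
    """Convert a `time`-style value such as '1m4.221s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def reduction_percent(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline` (assumed metric)."""
    return (baseline - candidate) / baseline * 100.0


# The five retained "real" runs of test A, copied from the results above.
AC12_REAL_A = ["1m4.221s", "1m3.042s", "1m3.613s", "1m3.442s", "1m3.447s"]
PMAP2_REAL_A = ["0m56.916s", "0m57.275s", "0m57.606s", "0m57.166s", "0m56.895s"]

ac12_avg = mean(parse_time(t) for t in AC12_REAL_A)
pmap2_avg = mean(parse_time(t) for t in PMAP2_REAL_A)

print(f"ac12 avg:       {ac12_avg:.3f} s")
print(f"ac12-pmap2 avg: {pmap2_avg:.3f} s")
print(f"reduction:      {reduction_percent(ac12_avg, pmap2_avg):.1f} %")
```

The recomputed averages agree with the posted ones. Under the reduction definition the test A "real" win comes out at roughly 10%, slightly below the posted 11%, which is closer to the ratio of the two averages; the other rows can be checked the same way by swapping in their values.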
* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
2001-09-02 21:46 ` Samium Gromoff
@ 2001-09-02 17:51 ` Daniel Phillips
2001-09-02 22:18 ` Samium Gromoff
2001-09-03 15:22 ` Marcelo Tosatti
0 siblings, 2 replies; 8+ messages in thread
From: Daniel Phillips @ 2001-09-02 17:51 UTC (permalink / raw)
To: Samium Gromoff; +Cc: linux-kernel
On September 2, 2001 11:46 pm, Samium Gromoff wrote:
> Daniel Phillips wrote:
> > One thing that goes away with rmaps is the need to scan process page tables.
> > It's possible that this takes enough load off L1 cache to produce the effects
>
> I feel like that.
> actually there was a fear that the overhead of reverse map maintenance
> would outweigh the gain on low loads, but in my case this isn't an issue.
Rik's patch can be optimized a lot by using a direct pointer to the pte in the
nonshared case, and perhaps a null rmap pointer in the kernel-only case (e.g.,
page cache). If the non-optimized version is already performing better than the
traditional approach it's a very good sign. This needs careful confirmation.
Measurements where you force your system into continuous swapping would be very
interesting.
--
Daniel
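The optimization Daniel describes (a direct pointer in the nonshared case, a null rmap pointer when there are no user mappings) can be sketched as a field that takes one of three shapes. This is a toy illustration only: the `Page` class, its `rmap` field and the tuple standing in for a pte are hypothetical and are not taken from Rik's patch.

```python
class Page:
    """Toy physical page whose `rmap` field has three possible shapes."""

    def __init__(self):
        # None  : no user mappings (the "null rmap pointer" case)
        # tuple : exactly one mapping, stored directly (the nonshared case)
        # list  : several mappings (the shared case)
        self.rmap = None

    def add_mapping(self, pte):
        if self.rmap is None:
            self.rmap = pte                   # first mapper: direct reference
        elif isinstance(self.rmap, list):
            self.rmap.append(pte)             # already shared: extend the list
        else:
            self.rmap = [self.rmap, pte]      # second mapper: promote to a list

    def mappings(self):
        """Return all mappings as a list, whatever the internal shape."""
        if self.rmap is None:
            return []
        if isinstance(self.rmap, list):
            return list(self.rmap)
        return [self.rmap]


# A pte is represented here as a (process, virtual page) tuple.
page_cache_page = Page()                      # kernel-only: never mapped
private_page = Page()
private_page.add_mapping(("proc_a", 1))
shared_page = Page()
shared_page.add_mapping(("proc_a", 2))
shared_page.add_mapping(("proc_b", 0))

print(page_cache_page.mappings(), private_page.mappings(), shared_page.mappings())
```

The point of the design is that the common cases (unmapped and singly mapped pages) pay for no list at all; only shared pages carry the extra structure.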
* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
2001-09-02 17:04 ` Daniel Phillips
@ 2001-09-02 21:46 ` Samium Gromoff
2001-09-02 17:51 ` Daniel Phillips
2001-09-03 9:58 ` Rik van Riel
1 sibling, 1 reply; 8+ messages in thread
From: Samium Gromoff @ 2001-09-02 21:46 UTC (permalink / raw)
To: Daniel Phillips; +Cc: linux-kernel
Daniel Phillips wrote:
> One thing that goes away with rmaps is the need to scan process page tables.
> It's possible that this takes enough load off L1 cache to produce the effects
I feel like that.
Actually, there was a fear that the overhead of reverse map maintenance
would outweigh the gain on low loads, but in my case this isn't an issue.
> you showed, though it would be surprising.
>
> (Note that I'm in the "reverse maps are good" camp, and I think Rik's design
> is fundamentally correct.)
me too...
>
> --
> Daniel
>
* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
2001-09-02 17:51 ` Daniel Phillips
@ 2001-09-02 22:18 ` Samium Gromoff
2001-09-03 15:22 ` Marcelo Tosatti
1 sibling, 0 replies; 8+ messages in thread
From: Samium Gromoff @ 2001-09-02 22:18 UTC (permalink / raw)
To: Daniel Phillips; +Cc: linux-kernel
Daniel Phillips wrote:
> Measurements where you force your system into continuous swapping would be very
> interesting.
Unfortunately, under heavy load the kernel starts to emit a lot of errors:
Sep 2 11:46:38 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep 2 11:47:01 vegae last message repeated 1023 times
Sep 2 11:58:10 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep 2 11:58:45 vegae last message repeated 603 times
Sep 2 12:00:00 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep 2 12:01:01 vegae last message repeated 2478 times
Sep 2 12:01:13 vegae last message repeated 389 times
Sep 2 12:01:13 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep 2 12:01:22 vegae last message repeated 399 times
Sep 2 12:01:23 vegae kernel: VM: __lru_cache_del, found unknown page ?!
Sep 2 12:01:54 vegae last message repeated 959 times
page_remove_all_pmaps: SWAP_ERROR
try_to_swap_out: page not in a VMA?!
page_remove_all_pmaps: SWAP_ERROR
try_to_swap_out: page not in a VMA?!
page_remove_all_pmaps: SWAP_ERROR
I already reported this to Rik.
> --
> Daniel
>
* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
2001-09-02 17:04 ` Daniel Phillips
2001-09-02 21:46 ` Samium Gromoff
@ 2001-09-03 9:58 ` Rik van Riel
1 sibling, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2001-09-03 9:58 UTC (permalink / raw)
To: Daniel Phillips; +Cc: Samium Gromoff, linux-kernel
On Sun, 2 Sep 2001, Daniel Phillips wrote:
> It's nice to see these tests aren't running any slower (and not
> crashing!) but as I understand it, reverse mapping is a win only for
> shared mmaps and swapping, which you aren't doing. So it's not clear
> what effect is being measured.
For one, with this thing we can actually "see" the referenced
bits in the page tables from refill_inactive(), so page aging
could potentially be better.
Samium's numbers, showing how much better, were a tad surprising
to me too, though ;)
> One thing that goes away with rmaps is the need to scan process page
> tables. It's possible that this takes enough load off L1 cache to
> produce the effects you showed, though it would be surprising.
CPU overhead seems to be a bit lower in the tests I ran, where
"a bit" should mostly be significant in the case of many shared
pages.
The real savings should be better pageout selection and lots of
flexibility to do interesting things, though.
> (Note that I'm in the "reverse maps are good" camp, and I think Rik's
> design is fundamentally correct.)
Btw, I just released a new version of the patch, which:
- moves page_remove_pmap() one line down in mremap(), so it
works now ;)
- starts making the reverse mapping patch SMP safe, removing
try_to_swap_out() and replacing it by allocate_swap_space()
and try_to_unmap()
- moves architecture-specific magic to <asm/pmap.h> so it's now
easy to port to other architectures (yes, this stuff is all
documented)
http://www.surriel.com/patches/2.4/2.4.8-ac12-pmap3
regards,
Rik
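Rik's point about being able to "see" the referenced bits can be sketched as follows. This is a toy model under stated assumptions: the `Pte` and `Page` classes, the helper names and the two aging constants are all hypothetical and are not taken from the patch or from refill_inactive() itself. It shows only the idea that, with a reverse map, a page's age can be updated from the referenced bits of every pte that maps it.

```python
from dataclasses import dataclass, field

AGE_ADVANCE = 3   # illustrative constants, not taken from the patch
AGE_MAX = 64


@dataclass
class Pte:
    referenced: bool = False


@dataclass
class Page:
    age: int = 0
    ptes: list = field(default_factory=list)  # reverse map: ptes mapping this page


def harvest_referenced(page):
    """Return True if any pte mapping the page was referenced; clear the bits."""
    seen = False
    for pte in page.ptes:
        if pte.referenced:
            seen = True
            pte.referenced = False
    return seen


def age_page(page):
    """Age a page up if it was referenced since the last pass, else decay it."""
    if harvest_referenced(page):
        page.age = min(page.age + AGE_ADVANCE, AGE_MAX)
    else:
        page.age //= 2
    return page.age


hot = Page(age=4, ptes=[Pte(referenced=True), Pte()])
cold = Page(age=4, ptes=[Pte()])
print(age_page(hot), age_page(cold))
```

Without the reverse map, the aging code would have no direct way to reach the ptes of a given page, so the referenced bits would only be observed during a separate page table scan.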
--
IA64: a worthy successor to i860.
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to aardvark@nl.linux.org (spam digging piggy)
* Re: Rik`s ac12-pmap2 vs ac12-vanilla perfcomp
2001-09-02 17:51 ` Daniel Phillips
2001-09-02 22:18 ` Samium Gromoff
@ 2001-09-03 15:22 ` Marcelo Tosatti
1 sibling, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2001-09-03 15:22 UTC (permalink / raw)
To: Daniel Phillips; +Cc: Samium Gromoff, linux-kernel
On Sun, 2 Sep 2001, Daniel Phillips wrote:
> On September 2, 2001 11:46 pm, Samium Gromoff wrote:
> > Daniel Phillips wrote:
> > > One thing that goes away with rmaps is the need to scan process page tables.
> > > It's possible that this takes enough load off L1 cache to produce the effects
> >
> > I feel like that.
> > actually there was a fear that the overhead of reverse map maintenance
> > would outweigh the gain on low loads, but in my case this isn't an issue.
>
> Rik's patch can be optimized a lot by using a direct pointer to the pte in the
> nonshared case, and perhaps a null rmap pointer in the kernel-only case (e.g.,
> page cache). If the non-optimized version is already performing better than the
> traditional approach it's a very good sign. This needs careful confirmation.
>
> Measurements where you force your system into continuous swapping would be very
> interesting.
Indeed.
Samium, I would appreciate it if you could run heavy anon mem tests with
Rik's code (e.g. programs from the memtest suite, make -jALOT, etc.).