linux-kernel.vger.kernel.org archive mirror
* LMbench2.0 results
@ 2002-09-22 12:42 Paolo Ciarrocchi
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Ciarrocchi @ 2002-09-22 12:42 UTC (permalink / raw)
  To: linux-kernel

Hi all,
here are the results of LMbench 2.0.
HW is a laptop, a PIII@800 with 256 MiB of RAM.

cd results && make summary percent 2>/dev/null | more
make[1]: Entering directory `/usr/src/LMbench/results'

                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------


Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
                                                    
--------- ------------- ----------------------- ----
frodo      Linux 2.4.18       i686-pc-linux-gnu  797
frodo      Linux 2.4.19       i686-pc-linux-gnu  797
frodo      Linux 2.5.33       i686-pc-linux-gnu  797
frodo      Linux 2.5.34       i686-pc-linux-gnu  797
frodo      Linux 2.5.36       i686-pc-linux-gnu  797
frodo      Linux 2.5.37       i686-pc-linux-gnu  797
frodo      Linux 2.5.38       i686-pc-linux-gnu  797

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
frodo      Linux 2.4.18  797 0.40 0.57 3.18 3.96  19.4 1.00 3.18 114. 1260 13.K
frodo      Linux 2.4.19  797 0.40 0.56 3.16 4.01  16.9 1.00 3.19 112. 1136 12.K
frodo      Linux 2.5.33  797 0.40 0.62 3.70 4.74  21.8 1.02 3.31 187. 1507 13.K
frodo      Linux 2.5.34  797 0.40 0.61 3.65 4.65  15.6 1.05 3.34 184. 1505 13.K
frodo      Linux 2.5.36  797 0.38 0.57 3.44 4.30       1.02 3.29 154. 1444 13.K
frodo      Linux 2.5.37  797 0.38 0.59 3.60 4.54       1.03 3.37 164. 1460 5404
frodo      Linux 2.5.38  797 0.38 0.59 3.61 4.45       1.03 3.37 161. 1497 14.K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
frodo      Linux 2.4.18 0.870 4.4000   14.0 5.7500  327.4    57.8   323.2
frodo      Linux 2.4.19 0.680 4.3000   13.9 5.7300  309.9    48.2   309.8
frodo      Linux 2.5.33 1.670 5.3300   15.1 9.6100  313.7    40.1   313.4
frodo      Linux 2.5.34 1.540 5.2200   90.7 9.1800  312.1    39.2   311.9
frodo      Linux 2.5.36 1.070 4.2000   13.9 6.8600  312.2    44.1   312.2
frodo      Linux 2.5.37 1.540 5.0000   68.7 8.1400  313.4    56.5   312.9
frodo      Linux 2.5.38 1.040 5.0300   14.8 7.6100  313.6    65.8   313.3

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
frodo      Linux 2.4.18 0.870 4.631 8.64  14.9  35.5  23.4  49.3 79.4
frodo      Linux 2.4.19 0.680 4.612 7.64  15.3  35.6  20.9  47.1 76.2
frodo      Linux 2.5.33 1.670 7.697 9.21  15.9  38.8  22.8  53.8 86.7
frodo      Linux 2.5.34 1.540 7.344 8.78  16.8  37.9  25.1  51.9 85.9
frodo      Linux 2.5.36 1.070 4.488 8.05  16.3  35.5  23.9  50.1 300K
frodo      Linux 2.5.37 1.540 6.173 9.04                             
frodo      Linux 2.5.38 1.040 7.406 8.76  16.8  37.5  24.9  51.8 87.3

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page	
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
frodo      Linux 2.4.18   68.9   16.0  186.6   31.5    426.0 0.794 2.00000
frodo      Linux 2.4.19   69.3   15.6  190.6   30.7    414.0 0.792 2.00000
frodo      Linux 2.5.33   78.2   19.8  208.6   38.2    768.0 0.816 3.00000
frodo      Linux 2.5.34   77.4   18.7  206.8   38.1    768.0 0.845 3.00000
frodo      Linux 2.5.36   75.7   17.1  203.8   35.8    736.0 0.821 3.00000
frodo      Linux 2.5.37   76.9   17.9  205.8   37.9    780.0 0.825 3.00000
frodo      Linux 2.5.38   77.2   19.0  205.8   38.5    786.0 0.827 3.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
frodo      Linux 2.4.18 806. 295. 132.  181.7  203.8   69.0   98.1 196. 184.2
frodo      Linux 2.4.19 765. 690. 249.  185.5  203.8  101.5  101.4 203. 190.2
frodo      Linux 2.5.33 535. 645. 44.8  185.6  202.5  100.5  100.4 202. 189.7
frodo      Linux 2.5.34 465. 649. 44.8  185.5  202.5  100.5  100.4 202. 189.4
frodo      Linux 2.5.36 759. 458. 51.4  184.0  202.6  100.5  100.4 202. 189.8
frodo      Linux 2.5.37 589. 676.       184.8  202.3  100.5  100.4 202. 191.4
frodo      Linux 2.5.38 728. 653. 48.0  184.3  202.4  100.5  100.5 202. 192.1

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------  ---- ----- ------    --------    -------
frodo      Linux 2.4.18   797 3.766 8.7970  158.9
frodo      Linux 2.4.19   797 3.767 8.7880  158.9
frodo      Linux 2.5.33   797 3.797 8.8720  160.1
frodo      Linux 2.5.34   797 3.806 8.8770  160.2
frodo      Linux 2.5.36   797 3.798 8.8730  160.1
frodo      Linux 2.5.37   797 3.799   45.0  160.2
frodo      Linux 2.5.38   797 3.795 8.8740  160.2
make[1]: Leaving directory `/usr/src/LMbench/results'

-- 
Get your free email from www.linuxmail.org 


Powered by Outblaze

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: LMbench2.0 results
  2002-09-14 18:26 Paolo Ciarrocchi
@ 2002-09-15 18:08 ` Pavel Machek
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2002-09-15 18:08 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: pavel, linux-kernel

Hi!

> [...]
> > I hope power management is completely disabled this time.
> > 									Pavel
> Yes.
> Pavel, is there a way to disable apm at boot time with a lilo
> parameter?

apm=off
					Pavel

-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.
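For reference, Pavel's `apm=off` answer goes on the kernel command line; with LILO that means an `append` line. A minimal, purely illustrative /etc/lilo.conf fragment (the image path and label are examples, not from the thread):

```
image=/boot/vmlinuz
    label=linux
    read-only
    append="apm=off"
```

After editing, run `lilo` to rewrite the boot map; the next boot then starts the kernel with APM disabled.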


* Re: LMbench2.0 results
@ 2002-09-14 18:26 Paolo Ciarrocchi
  2002-09-15 18:08 ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Paolo Ciarrocchi @ 2002-09-14 18:26 UTC (permalink / raw)
  To: pavel; +Cc: linux-kernel

From: Pavel Machek <pavel@ucw.cz>
[...]
> I hope power management is completely disabled this time.
> 									Pavel
Yes.
Pavel, is there a way to disable apm at boot time with a lilo parameter?

             Paolo


* Re: LMbench2.0 results
  2002-09-07 18:04 Paolo Ciarrocchi
@ 2002-09-13 22:49 ` Pavel Machek
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2002-09-13 22:49 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: jmorris, linux-kernel

Hi!

> > > Let me know if you need further information (.config, info about my
> > > hardware) or if you want I run other tests.
> > 
> > Would you be able to run the tests for 2.5.31?  I'm looking into a
> > slowdown in 2.5.32/33 which may be related.  Some hardware info might be
> > useful too.
> I don't have 2.5.31, and right now I only have a slow
> internet connection... I'll try to download it on Monday.
> 
> The hw is a Laptop, a standard HP Omnibook 6000, 256 MiB of RAM, PIII@800.
> Do you need more information?

I hope power management is completely disabled this time.
									Pavel
-- 
Worst form of spam? Adding advertisement signatures a la sourceforge.net.
What goes next? Inserting advertisements *into* email?


* Re: LMbench2.0 results
  2002-09-07 21:44     ` Alan Cox
@ 2002-09-13 22:46       ` Pavel Machek
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2002-09-13 22:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: Rik van Riel, Jeff Garzik, Paolo Ciarrocchi, linux-kernel

Hi!

> > > > Comments?
> > >
> > > Yeah:  "ouch" because I don't see a single category that's faster.
> > 
> > HZ went to 1000, which should help multimedia latencies a lot.
> 
> It shouldn't materially damage performance unless we have other things
> extremely wrong. Its easy enough to verify by putting HZ back to 100 and
> rebenching 

1000 times per second: enter the timer interrupt, acknowledge it, exit
the interrupt. A few I/O accesses, a few TLB entries kicked out, some L1
cache consumed?

Is 10 usec per timer interrupt reasonable on a modern system? That's 10
msec per second spent in the timer with HZ=1000, i.e. 1% overall. So it
seems to me it is possible for HZ=1000 to have a performance impact...
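Pavel's arithmetic can be sanity-checked with a trivial helper. This is purely illustrative; the 10 usec per-tick cost is his assumption, not a measurement:

```c
#include <assert.h>
#include <math.h>

/* Fraction of CPU time spent in the timer interrupt, as a percentage,
 * given HZ ticks per second and a per-tick cost in microseconds. */
static double timer_overhead_percent(int hz, double usec_per_tick)
{
    /* hz ticks/sec * usec/tick = usec of timer work per second;
     * divide by the 1e6 usec in a second and scale to percent. */
    return (double)hz * usec_per_tick / 1e6 * 100.0;
}
```

With HZ=1000 and 10 usec per tick this gives 1.0%, matching the estimate above; at the old HZ=100 the same per-tick cost is only 0.1%.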

								Pavel

-- 
Worst form of spam? Adding advertisement signatures a la sourceforge.net.
What goes next? Inserting advertisements *into* email?


* Re: LMbench2.0 results
  2002-09-07 14:33 ` James Morris
@ 2002-09-09 22:22   ` Cliff White
  0 siblings, 0 replies; 42+ messages in thread
From: Cliff White @ 2002-09-09 22:22 UTC (permalink / raw)
  To: James Morris; +Cc: Paolo Ciarrocchi, linux-kernel, cliffw

> On Sat, 7 Sep 2002, Paolo Ciarrocchi wrote:
> 
> > Let me know if you need further information (.config, info about my
> > hardware) or if you want I run other tests.
> 
> Would you be able to run the tests for 2.5.31?  I'm looking into a
> slowdown in 2.5.32/33 which may be related.  Some hardware info might be
> useful too.
> 
> 
Certainly, we have those in the STP database, and here's a quick summary
(of course, you can search these yourself). The full reports also include
the hardware summary; see the web links at the end. The full reports have
each test run 5x.


Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
stp1-000.  Linux 2.5.33 1000 0.33 0.49 2.84 3.52       0.79 2.62 168. 1279 4475
stp1-002.  Linux 2.5.32 1000 0.32 0.47 2.94 4.41  15.7 0.80 2.63 202. 1292 4603
stp1-003.  Linux 2.5.31 1000 0.32 0.46 2.85 6.92  14.4 0.80 2.60 856. 2596 8122

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
stp1-000.  Linux 2.5.33 1.530 4.1100   12.2 6.4700  136.4    32.7   136.2
stp1-002.  Linux 2.5.32 1.590 4.2200   12.4 5.4000  139.1    26.6   136.7
stp1-003.  Linux 2.5.31 1.830   46.4  142.6   47.5  141.7    47.6   141.2

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
stp1-000.  Linux 2.5.33 1.530 5.320 10.7  13.8  30.5  19.3  42.1 65.3
stp1-002.  Linux 2.5.32 1.570 5.456 11.3  14.2  31.3  21.1  42.6 67.4
stp1-003.  Linux 2.5.31 1.810 7.377 14.9  50.5 173.7 117.1 263.8 414.

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page
                        Create Delete Create Delete  Latency Fault   Fault
--------- ------------- ------ ------ ------ ------  ------- -----   -----
stp1-000.  Linux 2.5.33   32.9 5.4600  117.0   13.3   1261.0 0.575 3.00000
stp1-002.  Linux 2.5.32   34.0 5.9460  118.6   14.0   1265.0 0.619 3.00000
stp1-003.  Linux 2.5.31   72.5   15.3  225.5   38.2   2062.0 0.657 4.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
stp1-000.  Linux 2.5.33 699. 855. 68.0  407.9  460.1  168.5  157.8 460. 233.8
stp1-002.  Linux 2.5.32 690. 297. 93.7  397.5  459.2  162.1  150.0 458. 233.1
stp1-003.  Linux 2.5.31 145. 74.8 58.5  118.6  456.9  169.7  156.8 456. 269.0

Full list:
2.5.33 http://khack.osdl.org/stp/4925 -1cpu
	http://khack.osdl.org/stp/4915 -1cpu
	http://khack.osdl.org/stp/4932 -2cpu
	http://khack.osdl.org/stp/4926 -2cpu
2.5.32 
	http://khack.osdl.org/stp/4758  -2cpu
	http://khack.osdl.org/stp/4752 -2cpu
	http://khack.osdl.org/stp/4751 -1cpu
	http://khack.osdl.org/stp/4741 -1cpu
2.5.31 	
	http://khack.osdl.org/stp/4302 -1cpu
	http://khack.osdl.org/stp/4312 -1cpu
	http://khack.osdl.org/stp/4313 -2cpu
	http://khack.osdl.org/stp/4319 -2cpu	
cliffw
OSDL	

> - James
> -- 
> James Morris
> <jmorris@intercode.com.au>
> 
> 

* Re: LMbench2.0 results
  2002-09-09 21:44               ` Andrew Morton
@ 2002-09-09 22:09                 ` Alan Cox
  0 siblings, 0 replies; 42+ messages in thread
From: Alan Cox @ 2002-09-09 22:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin J. Bligh, William Lee Irwin III, Paolo Ciarrocchi, linux-kernel

On Mon, 2002-09-09 at 22:44, Andrew Morton wrote:
> Does "heuristic" overcommit handling need so much accuracy?
> Perhaps we can push some of the cost over into mode 2 somehow.

It's only needed in mode 2, but it's also only computed for modes 2 and 3
in 2.4 8)



* Re: LMbench2.0 results
  2002-09-09 21:13             ` Alan Cox
@ 2002-09-09 21:44               ` Andrew Morton
  2002-09-09 22:09                 ` Alan Cox
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2002-09-09 21:44 UTC (permalink / raw)
  To: Alan Cox
  Cc: Martin J. Bligh, William Lee Irwin III, Paolo Ciarrocchi, linux-kernel

Alan Cox wrote:
> 
> On Sun, 2002-09-08 at 19:40, Andrew Morton wrote:
> > We need to find some way of making vm_enough_memory not call get_page_state
> > so often.  One way of doing that might be to make get_page_state dump
> > its latest result into a global copy, and make vm_enough_memory()
> > only get_page_state once per N invocations.  A speed/accuracy tradeoff there.
> 
> Unless the error always falls on the same side, the accuracy tradeoff is
> fatal to the entire scheme of things. Sorting out the use of
> get_page_state is worth doing if that is the bottleneck, and
> snapshotting such that we only look at it if we might be close to the
> limit would work, but we'd need to know when the limit had shifted too
> much.

It could be that the cost is only present on the IBM whackomatics,
so they can twiddle the /proc setting and we can all be happy.
Certainly I did not see any problems on the quad.

Does "heuristic" overcommit handling need so much accuracy?
Perhaps we can push some of the cost over into mode 2 somehow.

Or we could turn it the other way up and, in __add_to_page_cache(),
do:

	if (overcommit_mode == anal)
		atomic_inc(&nr_pagecache_pages);
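The two lines above are tongue-in-cheek pseudocode. The underlying idea — only maintain an exact page-cache counter when the strict overcommit mode actually needs it — might be sketched as follows; all names here are illustrative stand-ins, not the real kernel API:

```c
#include <assert.h>

enum oc_mode { OC_GUESS, OC_ALWAYS, OC_STRICT };  /* overcommit modes 0/1/2 */

static enum oc_mode overcommit_mode = OC_GUESS;
static long nr_pagecache_pages;   /* would be an atomic_t in the kernel */

/* Called on every page-cache insertion: the counter is kept up to
 * date only when the strict mode that needs exact figures is on,
 * so the common modes pay nothing for it. */
static void account_page_cache_add(void)
{
    if (overcommit_mode == OC_STRICT)
        nr_pagecache_pages++;     /* would be atomic_inc() */
}
```

The cost of the increment is thereby shifted onto the one mode whose users asked for exact accounting.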


* Re: LMbench2.0 results
  2002-09-08 18:40           ` Andrew Morton
  2002-09-08 20:48             ` Hugh Dickins
@ 2002-09-09 21:13             ` Alan Cox
  2002-09-09 21:44               ` Andrew Morton
  1 sibling, 1 reply; 42+ messages in thread
From: Alan Cox @ 2002-09-09 21:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin J. Bligh, William Lee Irwin III, Paolo Ciarrocchi, linux-kernel

On Sun, 2002-09-08 at 19:40, Andrew Morton wrote:
> We need to find some way of making vm_enough_memory not call get_page_state
> so often.  One way of doing that might be to make get_page_state dump
> its latest result into a global copy, and make vm_enough_memory()
> only get_page_state once per N invocations.  A speed/accuracy tradeoff there.

Unless the error always falls on the same side, the accuracy tradeoff is
fatal to the entire scheme of things. Sorting out the use of
get_page_state is worth doing if that is the bottleneck, and
snapshotting such that we only look at it if we might be close to the
limit would work, but we'd need to know when the limit had shifted too
much.
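One way to reconcile the per-N-invocations caching with the near-the-limit caveat is to refresh the snapshot both periodically and whenever the cached figure is close to the limit. A purely illustrative sketch — the names are made up, and `expensive_count_free_pages()` stands in for the real get_page_state() walk:

```c
#include <assert.h>

#define REFRESH_INTERVAL 64   /* at most this many calls between recounts */

static long expensive_calls;                 /* instrumentation for the sketch */
static long fake_free_pages = 100000;        /* pretend machine state */

static long expensive_count_free_pages(void) /* stands in for get_page_state() */
{
    expensive_calls++;
    return fake_free_pages;
}

static long cached_free_pages = -1;
static int calls_since_refresh;

/* Return a possibly-stale free-page count: exact when we might be
 * near `limit`, otherwise refreshed only every REFRESH_INTERVAL calls. */
static long approx_free_pages(long limit)
{
    if (cached_free_pages < 0 ||                 /* never counted yet */
        ++calls_since_refresh >= REFRESH_INTERVAL ||
        cached_free_pages < 2 * limit) {         /* close to the limit: recount */
        cached_free_pages = expensive_count_free_pages();
        calls_since_refresh = 0;
    }
    return cached_free_pages;
}
```

As noted elsewhere in the thread, free-page counts can change suddenly and hugely, so the staleness window would really need to be bounded in pages as well as calls; the sketch ignores that.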



* Re: LMbench2.0 results
  2002-09-09 16:55           ` Daniel Phillips
  2002-09-09 17:24             ` Martin J. Bligh
@ 2002-09-09 21:11             ` Alan Cox
  1 sibling, 0 replies; 42+ messages in thread
From: Alan Cox @ 2002-09-09 21:11 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Martin J. Bligh, Rik van Riel, Andrew Morton, Paolo Ciarrocchi,
	linux-kernel

On Mon, 2002-09-09 at 17:55, Daniel Phillips wrote:
> You need to look at it from the other direction: how do the needs of a
> uniprocessor Clawhammer box differ from a Linksys adsl router?

I've advocated several times having a single config option for "fine
tuning" that sane people say "N" to and which if set lets you force
small hash sizes, disable block layer support and kill various other
'always needed' PC crap. Tell me - on a 4Mb embedded 386 running your
toaster, do you really care if the TCP hash lookup is a little slower
than perfect scaling, and do you need a 64 Kbyte mount hash?



* Re: LMbench2.0 results
  2002-09-09 16:55           ` Daniel Phillips
@ 2002-09-09 17:24             ` Martin J. Bligh
  2002-09-09 21:11             ` Alan Cox
  1 sibling, 0 replies; 42+ messages in thread
From: Martin J. Bligh @ 2002-09-09 17:24 UTC (permalink / raw)
  To: Daniel Phillips, Rik van Riel
  Cc: Andrew Morton, Paolo Ciarrocchi, linux-kernel

>> > An idea that's looking more and more attractive as time goes by is to
>> > have a global config option that specifies that we want to choose the
>> > simple way of doing things wherever possible, over the enterprise way.
>> > We want this especially for embedded.  On low end processors, it's even
>> > possible that the small way will be faster in some cases than the
>> > enterprise way, due to cache effects.
>> 
>> Can't we just use the existing config options instead? CONFIG_SMP is
>> a good start ;-) How many embedded systems with SMP do you have?
> 
> You need to look at it from the other direction: how do the needs of a
> uniprocessor Clawhammer box differ from a Linksys adsl router?

I wouldn't call uniprocessor Clawhammer the "enterprise way" type
machine. But other than that, I see your point. You're in a far 
better position to answer your own question than I am, so I'll leave 
that as rhetorical ;-)

M.



* Re: LMbench2.0 results
  2002-09-09 16:26         ` Martin J. Bligh
@ 2002-09-09 16:55           ` Daniel Phillips
  2002-09-09 17:24             ` Martin J. Bligh
  2002-09-09 21:11             ` Alan Cox
  0 siblings, 2 replies; 42+ messages in thread
From: Daniel Phillips @ 2002-09-09 16:55 UTC (permalink / raw)
  To: Martin J. Bligh, Rik van Riel
  Cc: Andrew Morton, Paolo Ciarrocchi, linux-kernel

On Monday 09 September 2002 18:26, Martin J. Bligh wrote:
> > An idea that's looking more and more attractive as time goes by is to
> > have a global config option that specifies that we want to choose the
> > simple way of doing things wherever possible, over the enterprise way.
> > We want this especially for embedded.  On low end processors, it's even
> > possible that the small way will be faster in some cases than the
> > enterprise way, due to cache effects.
> 
> Can't we just use the existing config options instead? CONFIG_SMP is
> a good start ;-) How many embedded systems with SMP do you have?

You need to look at it from the other direction: how do the needs of a
uniprocessor Clawhammer box differ from a Linksys adsl router?

-- 
Daniel


* Re: LMbench2.0 results
  2002-09-09 16:16       ` Daniel Phillips
  2002-09-09 16:26         ` Martin J. Bligh
@ 2002-09-09 16:52         ` Andrew Morton
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2002-09-09 16:52 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Rik van Riel, Paolo Ciarrocchi, linux-kernel

Daniel Phillips wrote:
> 
> On Monday 09 September 2002 15:37, Rik van Riel wrote:
> > On Sun, 8 Sep 2002, Daniel Phillips wrote:
> >
> > > I suspect the overall performance loss on the laptop has more to do with
> > > several months of focussing exclusively on the needs of 4-way and higher
> > > smp machines.
> >
> > Probably true, we're pulling off an indecent number of tricks
> > for 4-way and 8-way SMP performance. This overhead shouldn't
> > be too bad on UP and 2-way machines, but might easily be a
> > percent or so.
> 
> Though to be fair, it's smart to concentrate on the high end with a
> view to achieving world domination sooner.  And it's a stretch to call
> the low end performance 'slow'.

It's on the larger machines where 2.4 has problems.  Fixing them up
makes the kernel broader, more general purpose.  We're seeing 50-100%
gains in some areas there.  Giving away a few percent on smaller machines
at this stage is OK.  But yup, we need to go and get that back later.

> An idea that's looking more and more attractive as time goes by is to
> have a global config option that specifies that we want to choose the
> simple way of doing things wherever possible, over the enterprise way.

Prefer not to.  We've been able to cover all bases moderately well
thus far without adding a big boolean switch.

> We want this especially for embedded.  On low end processors, it's even
> possible that the small way will be faster in some cases than the
> enterprise way, due to cache effects.

The main thing we can do for smaller systems is to not allocate as much
memory at boot time.  Some more careful scaling is needed there.  I'll
generate a list soon.


* Re: LMbench2.0 results
  2002-09-09 16:16       ` Daniel Phillips
@ 2002-09-09 16:26         ` Martin J. Bligh
  2002-09-09 16:55           ` Daniel Phillips
  2002-09-09 16:52         ` Andrew Morton
  1 sibling, 1 reply; 42+ messages in thread
From: Martin J. Bligh @ 2002-09-09 16:26 UTC (permalink / raw)
  To: Daniel Phillips, Rik van Riel
  Cc: Andrew Morton, Paolo Ciarrocchi, linux-kernel

>> Probably true, we're pulling off an indecent number of tricks
>> for 4-way and 8-way SMP performance. This overhead shouldn't
>> be too bad on UP and 2-way machines, but might easily be a
>> percent or so.
> 
> Though to be fair, it's smart to concentrate on the high end with a
> view to achieving world domination sooner.  And it's a stretch to call
> the low end performance 'slow'.

I don't think there's that much overhead, it's just not where people
have been focusing tuning efforts recently. If you run the numbers,
and point out specific problems, I'm sure people will fix them ;-)
In other words, I don't think the recent focus has caused a problem
for low end machines, it just hasn't really looked at solving one.

> An idea that's looking more and more attractive as time goes by is to
> have a global config option that specifies that we want to choose the
> simple way of doing things wherever possible, over the enterprise way.
> We want this especially for embedded.  On low end processors, it's even
> possible that the small way will be faster in some cases than the
> enterprise way, due to cache effects.

Can't we just use the existing config options instead? CONFIG_SMP is
a good start ;-) How many embedded systems with SMP do you have?

M.



* Re: LMbench2.0 results
  2002-09-09 13:37     ` Rik van Riel
@ 2002-09-09 16:16       ` Daniel Phillips
  2002-09-09 16:26         ` Martin J. Bligh
  2002-09-09 16:52         ` Andrew Morton
  0 siblings, 2 replies; 42+ messages in thread
From: Daniel Phillips @ 2002-09-09 16:16 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, Paolo Ciarrocchi, linux-kernel

On Monday 09 September 2002 15:37, Rik van Riel wrote:
> On Sun, 8 Sep 2002, Daniel Phillips wrote:
> 
> > I suspect the overall performance loss on the laptop has more to do with
> > several months of focussing exclusively on the needs of 4-way and higher
> > smp machines.
> 
> Probably true, we're pulling off an indecent number of tricks
> for 4-way and 8-way SMP performance. This overhead shouldn't
> be too bad on UP and 2-way machines, but might easily be a
> percent or so.

Though to be fair, it's smart to concentrate on the high end with a
view to achieving world domination sooner.  And it's a stretch to call
the low end performance 'slow'.

An idea that's looking more and more attractive as time goes by is to
have a global config option that specifies that we want to choose the
simple way of doing things wherever possible, over the enterprise way.
We want this especially for embedded.  On low end processors, it's even
possible that the small way will be faster in some cases than the
enterprise way, due to cache effects.

-- 
Daniel


* Re: LMbench2.0 results
  2002-09-08 20:02   ` Daniel Phillips
@ 2002-09-09 13:37     ` Rik van Riel
  2002-09-09 16:16       ` Daniel Phillips
  0 siblings, 1 reply; 42+ messages in thread
From: Rik van Riel @ 2002-09-09 13:37 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Andrew Morton, Paolo Ciarrocchi, linux-kernel

On Sun, 8 Sep 2002, Daniel Phillips wrote:

> I suspect the overall performance loss on the laptop has more to do with
> several months of focussing exclusively on the needs of 4-way and higher
> smp machines.

Probably true, we're pulling off an indecent number of tricks
for 4-way and 8-way SMP performance. This overhead shouldn't
be too bad on UP and 2-way machines, but might easily be a
percent or so.

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org



* Re: LMbench2.0 results
  2002-09-08 20:48             ` Hugh Dickins
@ 2002-09-08 21:51               ` Andrew Morton
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2002-09-08 21:51 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Martin J. Bligh, William Lee Irwin III,
	Paolo Ciarrocchi, linux-kernel

Hugh Dickins wrote:
> 
> On Sun, 8 Sep 2002, Andrew Morton wrote:
> >
> > We need to find some way of making vm_enough_memory not call get_page_state
> > so often.  One way of doing that might be to make get_page_state dump
> > its latest result into a global copy, and make vm_enough_memory()
> > only get_page_state once per N invocations.  A speed/accuracy tradeoff there.
> 
> Accuracy is not very important in that sysctl_overcommit_memory 0 case
> e.g. the swapper_space.nr_pages addition was brought in at a time when
> it was very necessary, but usually overestimates now (or last time I
> thought about it).  The main thing to look out for is running the same
> memory grabber twice in quick succession: not nice if it succeeds the
> first time, but not the second, just because of some transient effect
> that its old pages are temporarily uncounted.
> 

That's right - there can be sudden and huge changes in pages used/free.

So any rate limiting tweak in there would have to be in terms of
number-of-pages rather than number-of-seconds.


* Re: LMbench2.0 results
  2002-09-08 18:40           ` Andrew Morton
@ 2002-09-08 20:48             ` Hugh Dickins
  2002-09-08 21:51               ` Andrew Morton
  2002-09-09 21:13             ` Alan Cox
  1 sibling, 1 reply; 42+ messages in thread
From: Hugh Dickins @ 2002-09-08 20:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, Martin J. Bligh, William Lee Irwin III,
	Paolo Ciarrocchi, linux-kernel

On Sun, 8 Sep 2002, Andrew Morton wrote:
> 
> We need to find some way of making vm_enough_memory not call get_page_state
> so often.  One way of doing that might be to make get_page_state dump
> its latest result into a global copy, and make vm_enough_memory()
> only get_page_state once per N invocations.  A speed/accuracy tradeoff there.

Accuracy is not very important in that sysctl_overcommit_memory 0 case
e.g. the swapper_space.nr_pages addition was brought in at a time when
it was very necessary, but usually overestimates now (or last time I
thought about it).  The main thing to look out for is running the same
memory grabber twice in quick succession: not nice if it succeeds the
first time, but not the second, just because of some transient effect
that its old pages are temporarily uncounted.

Hugh



* Re: LMbench2.0 results
  2002-09-07 16:20 ` Andrew Morton
  2002-09-07 20:03   ` William Lee Irwin III
@ 2002-09-08 20:02   ` Daniel Phillips
  2002-09-09 13:37     ` Rik van Riel
  1 sibling, 1 reply; 42+ messages in thread
From: Daniel Phillips @ 2002-09-08 20:02 UTC (permalink / raw)
  To: Andrew Morton, Paolo Ciarrocchi; +Cc: linux-kernel

On Saturday 07 September 2002 18:20, Andrew Morton wrote:
> Paolo Ciarrocchi wrote:
> > 
> > Hi all,
> > I've just ran lmbench2.0 on my laptop.
> > Here are the results (again, 2.5.33 seems to be "slow"; I don't know why...)
> > 
> 
> The fork/exec/mmap slowdown is the rmap overhead.  I have some stuff
> which partially improves it.

It only seems like a big deal if you get out your microscope and focus on
the fork times.  On the other hand, look at the sh times: the rmap setup
time gets lost in the noise.  The latter looks more like reality to me.

I suspect the overall performance loss on the laptop has more to do with
several months of focussing exclusively on the needs of 4-way and higher
smp machines.

-- 
Daniel


* Re: LMbench2.0 results
  2002-09-08 17:07         ` Alan Cox
  2002-09-08 18:11           ` Martin J. Bligh
@ 2002-09-08 18:40           ` Andrew Morton
  2002-09-08 20:48             ` Hugh Dickins
  2002-09-09 21:13             ` Alan Cox
  1 sibling, 2 replies; 42+ messages in thread
From: Andrew Morton @ 2002-09-08 18:40 UTC (permalink / raw)
  To: Alan Cox
  Cc: Martin J. Bligh, William Lee Irwin III, Paolo Ciarrocchi, linux-kernel

Alan Cox wrote:
> 
> On Sun, 2002-09-08 at 00:44, Martin J. Bligh wrote:
> > >> Perhaps testing with overcommit on would be useful.
> > >
> > > Well yes - the new overcommit code was a significant hit on the 16ways
> > > was it not?  You have some numbers on that?
> >
> > About 20% hit on system time for kernel compiles.
> 
> That surprises me a lot. On a 2-way and a 4-way the 2.4 memory overcommit
> check code didn't show up. That may be down to the 2-way being on a CPU
> that has no measurable cost for locked operations and the 4-way being an
> ancient PPro a friend has.
> 
> If it is the memory overcommit handling then there are plenty of ways to
> deal with it efficiently in the non-preempt case at least. I had
> wondered originally about booking chunks of pages off per CPU (take the
> remaining overcommit, divide by four, and only when a CPU finds its
> private block is empty take a lock and redistribute the remaining
> allocation). Since boxes almost never get that close to overcommit
> kicking in, it should mean we almost never touch a locked count.

Martin had this profile for a kernel build on 2.5.31-mm1:



c01299d0 6761     1.28814     vm_enough_memory
c0114584 8085     1.5404      load_balance
c01334c0 8292     1.57984     __free_pages_ok
c011193c 11559    2.20228     smp_apic_timer_interrupt
c0113040 12075    2.3006      do_page_fault
c012bf08 12075    2.3006      find_get_page
c0114954 12912    2.46007     scheduler_tick
c012c430 13199    2.51475     file_read_actor
c01727e8 20440    3.89434     __generic_copy_from_user
c0133fb8 25792    4.91403     nr_free_pages
c01337c0 27318    5.20478     rmqueue
c0129588 36955    7.04087     handle_mm_fault
c013a65c 38391    7.31447     page_remove_rmap
c0134094 43755    8.33645     get_page_state
c0105300 57699    10.9931     default_idle
c0128e64 58735    11.1905     do_anonymous_page

We can make nr_free_pages go away by adding global free page 
accounting to struct page_states.  So we're accounting it in
two places, but it'll be simple.

The global page accounting is very much optimised for the fast path at
the expense of get_page_state().  (And that kernel didn't have the
rmap speedups).

We need to find some way of making vm_enough_memory not call get_page_state
so often.  One way of doing that might be to make get_page_state dump
its latest result into a global copy, and have vm_enough_memory()
call get_page_state only once per N invocations.  A speed/accuracy tradeoff there.


* Re: LMbench2.0 results
  2002-09-08 17:07         ` Alan Cox
@ 2002-09-08 18:11           ` Martin J. Bligh
  2002-09-08 18:40           ` Andrew Morton
  1 sibling, 0 replies; 42+ messages in thread
From: Martin J. Bligh @ 2002-09-08 18:11 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, William Lee Irwin III, Paolo Ciarrocchi, linux-kernel

>> >> Perhaps testing with overcommit on would be useful.
>> > 
>> > Well yes - the new overcommit code was a significant hit on the 16ways
>> > was it not?  You have some numbers on that?
>> 
>> About 20% hit on system time for kernel compiles.
> 
> That surprises me a lot. On a 2-way and a 4-way the 2.4 memory overcommit
> check code didn't show up. That may be down to the 2-way being on a CPU
> that has no measurable cost for locked operations and the 4-way being an
> ancient PPro a friend has.

Remember this is a NUMA machine - gathering global information
is extremely expensive. On an SMP system, I wouldn't expect it
to show up so much, though it still doesn't seem terribly efficient. 
The code admits it's broken anyway, for the overcommit = 2 case
(which was NOT what I was running - the 20% is for 1). Below is a 
simple patch that I've never got around to testing, but that I think 
will improve that case (not that I'm that interested in setting
overcommit to 2 ;-)).
 
> If it is the memory overcommit handling then there are plenty of ways to
> deal with it efficiently in the non-preempt case at least. I had
> wondered originally about booking chunks of pages off per CPU (take the
> remaining overcommit, divide by four, and only when a CPU finds its
> private block is empty take a lock and redistribute the remaining
> allocation). Since boxes almost never get that close to overcommit
> kicking in, it should mean we almost never touch a locked count.

Can you use per-zone stats rather than global ones? That tends to
fix things pretty efficiently on these types of machines - per-zone 
LRUs made a huge impact.

Here's a little patch (untested!). I'll go look at the other case
and see if there's something easy to do, but I think it needs some
significant rework to do anything.

--- virgin-2.5.30.full/mm/mmap.c	Thu Aug  1 14:16:05 2002
+++ linux-2.5.30-vm_enough_memory/mm/mmap.c	Wed Aug  7 13:26:46 2002
@@ -74,7 +74,6 @@
 int vm_enough_memory(long pages)
 {
 	unsigned long free, allowed;
-	struct sysinfo i;
 
 	atomic_add(pages, &vm_committed_space);
 
@@ -115,12 +114,7 @@
 		return 0;
 	}
 
-	/*
-	 * FIXME: need to add arch hooks to get the bits we need
-	 * without this higher overhead crap
-	 */
-	si_meminfo(&i);
-	allowed = i.totalram * sysctl_overcommit_ratio / 100;
+	allowed = totalram_pages * sysctl_overcommit_ratio / 100;
 	allowed += total_swap_pages;
 
 	if (atomic_read(&vm_committed_space) < allowed)




* Re: LMbench2.0 results
  2002-09-07 23:44       ` Martin J. Bligh
@ 2002-09-08 17:07         ` Alan Cox
  2002-09-08 18:11           ` Martin J. Bligh
  2002-09-08 18:40           ` Andrew Morton
  0 siblings, 2 replies; 42+ messages in thread
From: Alan Cox @ 2002-09-08 17:07 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, William Lee Irwin III, Paolo Ciarrocchi, linux-kernel

On Sun, 2002-09-08 at 00:44, Martin J. Bligh wrote:
> >> Perhaps testing with overcommit on would be useful.
> > 
> > Well yes - the new overcommit code was a significant hit on the 16ways
> > was it not?  You have some numbers on that?
> 
> About 20% hit on system time for kernel compiles.

That surprises me a lot. On a 2-way and a 4-way the 2.4 memory overcommit
check code didn't show up. That may be down to the 2-way being on a CPU
that has no measurable cost for locked operations and the 4-way being an
ancient PPro a friend has.

If it is the memory overcommit handling then there are plenty of ways to
deal with it efficiently in the non-preempt case at least. I had
wondered originally about booking chunks of pages off per CPU (take the
remaining overcommit, divide by four, and only when a CPU finds its
private block is empty take a lock and redistribute the remaining
allocation). Since boxes almost never get that close to overcommit
kicking in, it should mean we almost never touch a locked count.



* Re: LMbench2.0 results
  2002-09-08  8:25           ` David S. Miller
@ 2002-09-08  9:12             ` William Lee Irwin III
  0 siblings, 0 replies; 42+ messages in thread
From: William Lee Irwin III @ 2002-09-08  9:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, ciarrocchi, linux-kernel

From: William Lee Irwin III <wli@holomorphy.com>
Date: Sun, 8 Sep 2002 01:28:21 -0700
>    But if this were truly the issue, the allocation and deallocation
>    overhead for pagetables should show up as additional pressure
>    against zone->lock.

On Sun, Sep 08, 2002 at 01:25:26AM -0700, David S. Miller wrote:
> The big gain is not only that allocation/free is cheap, also
> page table entries tend to hit in cpu cache for even freshly
> allocated page tables.
> I think that is the bit that would show up in the mmap lmbench
> test.

I'd have to double-check to see how parallelized lat_mmap is. My
machines are considerably more sensitive to locking uglies than cache
warmth. (They're taking my machines out, not just slowing them down.)
Cache warmth goodies are certainly nice optimizations, though.


Cheers,
Bill


* Re: LMbench2.0 results
  2002-09-08  7:37       ` David S. Miller
@ 2002-09-08  8:28         ` William Lee Irwin III
  2002-09-08  8:25           ` David S. Miller
  0 siblings, 1 reply; 42+ messages in thread
From: William Lee Irwin III @ 2002-09-08  8:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, ciarrocchi, linux-kernel

From: Andrew Morton <akpm@digeo.com>
>    So it's a bit of rmap in there.  I'd have to compare with a 2.4
>    profile and fiddle a few kernel parameters.  But I'm not sure
>    that munmap of extremely sparsely populated pagetables is very
>    interesting?

On Sun, Sep 08, 2002 at 12:37:00AM -0700, David S. Miller wrote:
> Another issue is that x86 doesn't use a pagetable cache.  I think it
> got killed from x86 when the pagetables in highmem went in.
> This is all from memory.

They seemed to have some other issues related to extreme memory
pressure (routine for me). But if this were truly the issue, the
allocation and deallocation overhead for pagetables should show up as
additional pressure against zone->lock. I can't tell at the moment
because zone->lock is hammered quite hard to begin with and no one's
gone out and done a pagetable caching patch for the stuff since. It
should be simple to chain with links in struct page instead of links
embedded in the pagetables & smp_call_function() to reclaim. But this
raises questions of generality.

Cheers,
Bill


* Re: LMbench2.0 results
  2002-09-08  8:28         ` William Lee Irwin III
@ 2002-09-08  8:25           ` David S. Miller
  2002-09-08  9:12             ` William Lee Irwin III
  0 siblings, 1 reply; 42+ messages in thread
From: David S. Miller @ 2002-09-08  8:25 UTC (permalink / raw)
  To: wli; +Cc: akpm, ciarrocchi, linux-kernel

   From: William Lee Irwin III <wli@holomorphy.com>
   Date: Sun, 8 Sep 2002 01:28:21 -0700
   
   But if this were truly the issue, the allocation and deallocation
   overhead for pagetables should show up as additional pressure
   against zone->lock.

The big gain is not only that allocation/free is cheap, also
page table entries tend to hit in cpu cache for even freshly
allocated page tables.

I think that is the bit that would show up in the mmap lmbench
test.



* Re: LMbench2.0 results
  2002-09-07 18:09 Paolo Ciarrocchi
@ 2002-09-08  7:51 ` Andrew Morton
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2002-09-08  7:51 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: linux-kernel

Paolo Ciarrocchi wrote:
> 
> ...
> File & VM system latencies in microseconds - smaller is better
> --------------------------------------------------------------
> Host                 OS   0K File      10K File      Mmap    Prot    Page
>                         Create Delete Create Delete  Latency Fault   Fault
> --------- ------------- ------ ------ ------ ------  ------- -----   -----
> frodo      Linux 2.4.18   68.9   16.0  185.8   31.6    425.0 0.789 2.00000
> frodo      Linux 2.4.19   68.9   14.9  186.5   29.8    416.0 0.798 2.00000
> frodo      Linux 2.5.33   77.8   19.1  211.6   38.3    774.0 0.832 3.00000
> frodo     Linux 2.5.33x   77.2   18.8  206.7   37.0    769.0 0.823 3.00000
> 

The create/delete performance is filesystem-specific.

profiling lat_fs on ext3:

c0170b70 236      0.372293    ext3_get_inode_loc      
c014354c 278      0.438548    __find_get_block        
c017cbf0 284      0.448013    journal_cancel_revoke   
c017ee24 291      0.459056    journal_add_journal_head 
c0171030 307      0.484296    ext3_do_update_inode    
c017856c 353      0.556861    journal_get_write_access 
c0178088 487      0.768248    do_get_write_access     
c0114744 530      0.836081    smp_apic_timer_interrupt 
c0178a84 559      0.881829    journal_dirty_metadata  
c0130644 832      1.31249     generic_file_write_nolock 
c0172654 2903     4.57951     ext3_add_entry          
c016ca10 3636     5.73583     ext3_check_dir_entry    
c0107048 47078    74.2661     poll_idle               

ext3_check_dir_entry is just sanity checking.  hmm.

on ext2:

c017f3ec 138      0.239971    ext2_free_blocks        
c012f560 147      0.255621    unlock_page             
c017f954 148      0.25736     ext2_new_block          
c017f2f0 154      0.267793    ext2_get_group_desc     
c0181958 162      0.281705    ext2_new_inode          
c014354c 182      0.316483    __find_get_block        
c0154f64 184      0.319961    __d_lookup              
c0109bc0 232      0.403429    apic_timer_interrupt    
c0143cc4 455      0.791208    __block_prepare_write   
c0114744 459      0.798164    smp_apic_timer_interrupt 
c0130644 1634     2.84139     generic_file_write_nolock 
c0180c64 6084     10.5796     ext2_add_link           
c0107048 42472    73.8554     poll_idle               

This is mostly in ext2_match() - comparing strings while
searching the directory.  memcmp().

ext3 with hashed index directories:

c01803dc 292      0.495251    journal_unlock_journal_head 
c0170b70 313      0.530868    ext3_get_inode_loc      
c01801a4 412      0.698779    journal_add_journal_head 
c014354c 455      0.77171     __find_get_block        
c0171030 489      0.829376    ext3_do_update_inode    
c017df70 515      0.873474    journal_cancel_revoke   
c01798ec 555      0.941316    journal_get_write_access 
c0173208 568      0.963365    ext3_add_entry          
c0179408 804      1.36364     do_get_write_access     
c0179e04 838      1.4213      journal_dirty_metadata  
c0130644 1127     1.91147     generic_file_write_nolock 
c0107048 44117    74.8253     poll_idle               

And yet the test (which tries to run for a fixed walltime)
seems to do the same amount of work.  No idea what's up
with that.

Lessons: use an indexed-directory filesystem, and consistency
checking costs.


* Re: LMbench2.0 results
  2002-09-07 20:03   ` William Lee Irwin III
  2002-09-07 23:12     ` Andrew Morton
@ 2002-09-08  7:51     ` Andrew Morton
  2002-09-08  7:37       ` David S. Miller
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2002-09-08  7:51 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Paolo Ciarrocchi, linux-kernel

William Lee Irwin III wrote:
> 
> Paolo Ciarrocchi wrote:
> >> Hi all,
>> I've just run lmbench2.0 on my laptop.
>> Here are the results (again, 2.5.33 seems to be "slow", I don't know why...)
> 
> On Sat, Sep 07, 2002 at 09:20:56AM -0700, Andrew Morton wrote:
> > The fork/exec/mmap slowdown is the rmap overhead.  I have some stuff
> > which partially improves it.
> 
> Hmm, where does it enter the mmap() path? PTE instantiation is only done
> for the VM_LOCKED case IIRC. Otherwise it should be invisible.
> 

lat_mmap seems to do an mmap(), fault in ten pages and then
a munmap().  Most of the CPU cost is in cache misses against
the pagetables in munmap().

c012d54c 153      0.569493    do_mmap_pgoff           
c012db5c 158      0.588104    find_vma                
c01301ec 172      0.640214    filemap_nopage          
c0134e84 172      0.640214    release_pages           
c0114744 184      0.684881    smp_apic_timer_interrupt 
c012ce3c 248      0.9231      handle_mm_fault         
c012f738 282      1.04965     find_get_page           
c013e2b0 356      1.32509     __set_page_dirty_buffers 
c0116294 377      1.40326     do_page_fault           
c013e72c 383      1.42559     page_add_rmap           
c013e8bc 398      1.48143     page_remove_rmap        
c012cb10 425      1.58193     do_no_page              
c0109d70 629      2.34125     page_fault              
c012b2f4 1036     3.85618     zap_pte_range           
c0107048 20205    75.2066     poll_idle               

(Multiply everything by four - it's a quad)

Instruction-level profile for -mm5:

c012b2f4 1036     3.85618     0        0           zap_pte_range           /usr/src/25/mm/memory.c:325 
 c012b2f5 2        0.19305     0        0           /usr/src/25/mm/memory.c:325 
 c012b2fd 1        0.0965251   0        0           /usr/src/25/mm/memory.c:325 
 c012b300 2        0.19305     0        0           /usr/src/25/mm/memory.c:325 
 c012b306 1        0.0965251   0        0           /usr/src/25/mm/memory.c:329 
 c012b309 1        0.0965251   0        0           /usr/src/25/mm/memory.c:329 
 c012b30f 1        0.0965251   0        0           /usr/src/25/mm/memory.c:331 
 c012b319 1        0.0965251   0        0           /usr/src/25/mm/memory.c:331 
 c012b340 1        0.0965251   0        0           /usr/src/25/mm/memory.c:336 
 c012b348 1        0.0965251   0        0           /usr/src/25/include/asm/highmem.h:80 
 c012b350 1        0.0965251   0        0           /usr/src/25/include/asm/thread_info.h:75 
 c012b35a 2        0.19305     0        0           /usr/src/25/include/asm/highmem.h:85 
 c012b365 2        0.19305     0        0           /usr/src/25/include/asm/highmem.h:86 
 c012b3c3 2        0.19305     0        0           /usr/src/25/mm/memory.c:337 
 c012b3d6 1        0.0965251   0        0           /usr/src/25/mm/memory.c:338 
 c012b3e9 3        0.289575    0        0           /usr/src/25/mm/memory.c:341 
 c012b3f5 106      10.2317     0        0           /usr/src/25/mm/memory.c:342 
 c012b3f8 2        0.19305     0        0           /usr/src/25/mm/memory.c:342 
 c012b3fa 26       2.50965     0        0           /usr/src/25/mm/memory.c:343 
 c012b3fc 124      11.9691     0        0           /usr/src/25/mm/memory.c:343 
 c012b405 13       1.25483     0        0           /usr/src/25/mm/memory.c:345 
 c012b40b 1        0.0965251   0        0           /usr/src/25/mm/memory.c:346 
 c012b410 2        0.19305     0        0           /usr/src/25/mm/memory.c:348 
 c012b412 1        0.0965251   0        0           /usr/src/25/mm/memory.c:348 
 c012b414 62       5.98456     0        0           /usr/src/25/mm/memory.c:349 
 c012b41b 1        0.0965251   0        0           /usr/src/25/mm/memory.c:350 
 c012b421 21       2.02703     0        0           /usr/src/25/mm/memory.c:350 
 c012b427 2        0.19305     0        0           /usr/src/25/mm/memory.c:351 
 c012b432 2        0.19305     0        0           /usr/src/25/include/asm/bitops.h:244 
 c012b434 10       0.965251    0        0           /usr/src/25/mm/memory.c:352 
 c012b437 1        0.0965251   0        0           /usr/src/25/mm/memory.c:352 
 c012b43d 5        0.482625    0        0           /usr/src/25/mm/memory.c:353 
 c012b446 7        0.675676    0        0           /usr/src/25/include/linux/mm.h:389 
 c012b44b 1        0.0965251   0        0           /usr/src/25/include/linux/mm.h:392 
 c012b44e 1        0.0965251   0        0           /usr/src/25/include/linux/mm.h:392 
 c012b451 7        0.675676    0        0           /usr/src/25/include/linux/mm.h:393 
 c012b453 2        0.19305     0        0           /usr/src/25/include/linux/mm.h:393 
 c012b461 6        0.579151    0        0           /usr/src/25/include/linux/mm.h:396 
 c012b466 8        0.772201    0        0           /usr/src/25/include/linux/mm.h:396 
 c012b46f 6        0.579151    0        0           /usr/src/25/mm/memory.c:356 
 c012b476 15       1.44788     0        0           /usr/src/25/include/asm-generic/tlb.h:105 
 c012b481 3        0.289575    0        0           /usr/src/25/include/asm-generic/tlb.h:106 
 c012b490 5        0.482625    0        0           /usr/src/25/include/asm-generic/tlb.h:110 
 c012b493 7        0.675676    0        0           /usr/src/25/include/asm-generic/tlb.h:110 
 c012b49a 1        0.0965251   0        0           /usr/src/25/include/asm-generic/tlb.h:110 
 c012b49d 3        0.289575    0        0           /usr/src/25/include/asm-generic/tlb.h:110 
 c012b4a0 1        0.0965251   0        0           /usr/src/25/include/asm-generic/tlb.h:110 
 c012b4a3 8        0.772201    0        0           /usr/src/25/include/asm-generic/tlb.h:111 
 c012b4aa 13       1.25483     0        0           /usr/src/25/include/asm-generic/tlb.h:111 
 c012b500 128      12.3552     0        0           /usr/src/25/mm/memory.c:341 
 c012b504 108      10.4247     0        0           /usr/src/25/mm/memory.c:341 
 c012b50b 111      10.7143     0        0           /usr/src/25/mm/memory.c:341 
 c012b50e 99       9.55598     0        0           /usr/src/25/mm/memory.c:341 
 c012b511 86       8.30116     0        0           /usr/src/25/mm/memory.c:341 
 c012b51c 4        0.3861      0        0           /usr/src/25/include/asm/thread_info.h:75 
 c012b521 3        0.289575    0        0           /usr/src/25/mm/memory.c:366 
 c012b525 1        0.0965251   0        0           /usr/src/25/mm/memory.c:366 
 c012b526 1        0.0965251   0        0           /usr/src/25/mm/memory.c:366 

So it's a bit of rmap in there.  I'd have to compare with a 2.4
profile and fiddle a few kernel parameters.  But I'm not sure
that munmap of extremely sparsely populated pagetables is very
interesting?


* Re: LMbench2.0 results
  2002-09-08  7:51     ` Andrew Morton
@ 2002-09-08  7:37       ` David S. Miller
  2002-09-08  8:28         ` William Lee Irwin III
  0 siblings, 1 reply; 42+ messages in thread
From: David S. Miller @ 2002-09-08  7:37 UTC (permalink / raw)
  To: akpm; +Cc: wli, ciarrocchi, linux-kernel

   From: Andrew Morton <akpm@digeo.com>
   Date: Sun, 08 Sep 2002 00:51:19 -0700
   
   So it's a bit of rmap in there.  I'd have to compare with a 2.4
   profile and fiddle a few kernel parameters.  But I'm not sure
   that munmap of extremely sparsely populated pagetables is very
   interesting?

Another issue is that x86 doesn't use a pagetable cache.  I think it
got killed from x86 when the pagetables in highmem went in.

This is all from memory.


* Re: LMbench2.0 results
  2002-09-07 23:12     ` Andrew Morton
  2002-09-07 23:01       ` William Lee Irwin III
@ 2002-09-07 23:44       ` Martin J. Bligh
  2002-09-08 17:07         ` Alan Cox
  1 sibling, 1 reply; 42+ messages in thread
From: Martin J. Bligh @ 2002-09-07 23:44 UTC (permalink / raw)
  To: Andrew Morton, William Lee Irwin III; +Cc: Paolo Ciarrocchi, linux-kernel

>> Perhaps testing with overcommit on would be useful.
> 
> Well yes - the new overcommit code was a significant hit on the 16ways
> was it not?  You have some numbers on that?

About 20% hit on system time for kernel compiles.

M.



* Re: LMbench2.0 results
  2002-09-07 20:03   ` William Lee Irwin III
@ 2002-09-07 23:12     ` Andrew Morton
  2002-09-07 23:01       ` William Lee Irwin III
  2002-09-07 23:44       ` Martin J. Bligh
  2002-09-08  7:51     ` Andrew Morton
  1 sibling, 2 replies; 42+ messages in thread
From: Andrew Morton @ 2002-09-07 23:12 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Paolo Ciarrocchi, linux-kernel

William Lee Irwin III wrote:
> 
> Paolo Ciarrocchi wrote:
> >> Hi all,
>> I've just run lmbench2.0 on my laptop.
>> Here are the results (again, 2.5.33 seems to be "slow", I don't know why...)
> 
> On Sat, Sep 07, 2002 at 09:20:56AM -0700, Andrew Morton wrote:
> > The fork/exec/mmap slowdown is the rmap overhead.  I have some stuff
> > which partially improves it.
> 
> Hmm, where does it enter the mmap() path? PTE instantiation is only done
> for the VM_LOCKED case IIRC. Otherwise it should be invisible.

Oh, is that just the mmap() call itself?

> Perhaps testing with overcommit on would be useful.

Well yes - the new overcommit code was a significant hit on the 16ways
was it not?  You have some numbers on that?


* Re: LMbench2.0 results
  2002-09-07 23:12     ` Andrew Morton
@ 2002-09-07 23:01       ` William Lee Irwin III
  2002-09-07 23:44       ` Martin J. Bligh
  1 sibling, 0 replies; 42+ messages in thread
From: William Lee Irwin III @ 2002-09-07 23:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Paolo Ciarrocchi, linux-kernel

On Sat, Sep 07, 2002 at 09:20:56AM -0700, Andrew Morton wrote:
>>> The fork/exec/mmap slowdown is the rmap overhead.  I have some stuff
>>> which partially improves it.

William Lee Irwin III wrote:
>> Hmm, where does it enter the mmap() path? PTE instantiation is only done
>> for the VM_LOCKED case IIRC. Otherwise it should be invisible.

On Sat, Sep 07, 2002 at 04:12:49PM -0700, Andrew Morton wrote:
> Oh, is that just the mmap() call itself?

I'm not actually sure what lmbench is doing.


William Lee Irwin III wrote:
>> Perhaps testing with overcommit on would be useful.

On Sat, Sep 07, 2002 at 04:12:49PM -0700, Andrew Morton wrote:
> Well yes - the new overcommit code was a significant hit on the 16ways
> was it not?  You have some numbers on that?

I don't remember the before/after numbers, but I can collect some.


Cheers,
Bill


* Re: LMbench2.0 results
  2002-09-07 18:53   ` Rik van Riel
@ 2002-09-07 21:44     ` Alan Cox
  2002-09-13 22:46       ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Alan Cox @ 2002-09-07 21:44 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Jeff Garzik, Paolo Ciarrocchi, linux-kernel

On Sat, 2002-09-07 at 19:53, Rik van Riel wrote:
> On Sat, 7 Sep 2002, Jeff Garzik wrote:
> > Paolo Ciarrocchi wrote:
> > > Comments?
> >
> > Yeah:  "ouch" because I don't see a single category that's faster.
> 
> HZ went to 1000, which should help multimedia latencies a lot.

It shouldn't materially damage performance unless we have other things
extremely wrong. It's easy enough to verify by putting HZ back to 100 and
rebenching.




* Re: LMbench2.0 results
  2002-09-07 16:20 ` Andrew Morton
@ 2002-09-07 20:03   ` William Lee Irwin III
  2002-09-07 23:12     ` Andrew Morton
  2002-09-08  7:51     ` Andrew Morton
  2002-09-08 20:02   ` Daniel Phillips
  1 sibling, 2 replies; 42+ messages in thread
From: William Lee Irwin III @ 2002-09-07 20:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Paolo Ciarrocchi, linux-kernel

Paolo Ciarrocchi wrote:
>> Hi all,
>> I've just run lmbench2.0 on my laptop.
>> Here are the results (again, 2.5.33 seems to be "slow", I don't know why...)

On Sat, Sep 07, 2002 at 09:20:56AM -0700, Andrew Morton wrote:
> The fork/exec/mmap slowdown is the rmap overhead.  I have some stuff
> which partially improves it.

Hmm, where does it enter the mmap() path? PTE instantiation is only done
for the VM_LOCKED case IIRC. Otherwise it should be invisible.

Perhaps testing with overcommit on would be useful.


Cheers,
Bill


* Re: LMbench2.0 results
  2002-09-07 12:27 ` Jeff Garzik
@ 2002-09-07 18:53   ` Rik van Riel
  2002-09-07 21:44     ` Alan Cox
  0 siblings, 1 reply; 42+ messages in thread
From: Rik van Riel @ 2002-09-07 18:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Paolo Ciarrocchi, linux-kernel

On Sat, 7 Sep 2002, Jeff Garzik wrote:
> Paolo Ciarrocchi wrote:
> > Comments?
>
> Yeah:  "ouch" because I don't see a single category that's faster.

HZ went to 1000, which should help multimedia latencies a lot.

> Oh well, it still needs to be tuned....

For throughput or for latency? ;)

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org



* Re: LMbench2.0 results
@ 2002-09-07 18:09 Paolo Ciarrocchi
  2002-09-08  7:51 ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: Paolo Ciarrocchi @ 2002-09-07 18:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

From: Andrew Morton <akpm@digeo.com>

> Paolo Ciarrocchi wrote:
> > 
> > Hi all,
> > I've just run lmbench2.0 on my laptop.
> > Here are the results (again, 2.5.33 seems to be "slow", I don't know why...)
> > 
> 
> The fork/exec/mmap slowdown is the rmap overhead.  I have some stuff
> which partially improves it.
> 
> The many-small-file-create slowdown is known but its cause is not.
> I need to get oprofile onto it.
Let me know if I can do something useful for you.
I've now compiled 2.5.33 with _NO_ preemption (the x-tagged kernel).
Performance is better, but it's still "slow".

cd results && make summary percent 2>/dev/null | more
make[1]: Entering directory `/usr/src/LMbench/results'

                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------


Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
                                                    
--------- ------------- ----------------------- ----
frodo      Linux 2.4.18       i686-pc-linux-gnu  797
frodo      Linux 2.4.19       i686-pc-linux-gnu  797
frodo      Linux 2.5.33       i686-pc-linux-gnu  797
frodo     Linux 2.5.33x       i686-pc-linux-gnu  797

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
frodo      Linux 2.4.18  797 0.40 0.56 3.18 3.97       1.00 3.18 115. 1231 13.K
frodo      Linux 2.4.19  797 0.40 0.56 3.07 3.88       1.00 3.19 129. 1113 13.K
frodo      Linux 2.5.33  797 0.40 0.61 3.78 4.76       1.02 3.37 201. 1458 13.K
frodo     Linux 2.5.33x  797 0.40 0.60 3.51 4.38       1.02 3.27 159. 1430 13.K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
frodo      Linux 2.4.18 0.990 4.4200   13.8 6.2700  309.8    58.6   310.5
frodo      Linux 2.4.19 0.900 4.2900   15.3 5.9100  309.6    57.7   309.9
frodo      Linux 2.5.33 1.620 5.2800   15.3 9.3500  312.7    54.9   312.7
frodo     Linux 2.5.33x 1.040 4.3200   17.8 7.6200  312.5    49.9   312.5

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
frodo      Linux 2.4.18 0.990 4.437 8.66                             
frodo      Linux 2.4.19 0.900 4.561 7.76                             
frodo      Linux 2.5.33 1.620 6.497 9.11                             
frodo     Linux 2.5.33x 1.040 4.888 8.70                             

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page	
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
frodo      Linux 2.4.18   68.9   16.0  185.8   31.6    425.0 0.789 2.00000
frodo      Linux 2.4.19   68.9   14.9  186.5   29.8    416.0 0.798 2.00000
frodo      Linux 2.5.33   77.8   19.1  211.6   38.3    774.0 0.832 3.00000
frodo     Linux 2.5.33x   77.2   18.8  206.7   37.0    769.0 0.823 3.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
frodo      Linux 2.4.18 810. 650.       181.7  203.7  101.5  101.4 203. 195.3
frodo      Linux 2.4.19 808. 680.       187.2  203.8  101.5  101.4 203. 190.1
frodo      Linux 2.5.33 571. 636.       185.6  202.5  100.5  100.4 202. 190.3
frodo     Linux 2.5.33x 768. 710.       185.4  202.5  100.5  100.4 202. 189.5

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------  ---- ----- ------    --------    -------
frodo      Linux 2.4.18   797 3.767 8.7890  158.9
frodo      Linux 2.4.19   797 3.767 8.7980  158.9
frodo      Linux 2.5.33   797 3.798 8.8660  160.1
frodo     Linux 2.5.33x   797 3.796   45.5  160.2
make[1]: Leaving directory `/usr/src/LMbench/results'

Hope it helps.

Ciao,
        Paolo


-- 
Get your free email from www.linuxmail.org 


Powered by Outblaze

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: LMbench2.0 results
@ 2002-09-07 18:04 Paolo Ciarrocchi
  2002-09-13 22:49 ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Paolo Ciarrocchi @ 2002-09-07 18:04 UTC (permalink / raw)
  To: jmorris; +Cc: linux-kernel

From: James Morris <jmorris@intercode.com.au>

> On Sat, 7 Sep 2002, Paolo Ciarrocchi wrote:
> 
> > Let me know if you need further information (.config, info about my
> > hardware) or if you want me to run other tests.
> 
> Would you be able to run the tests for 2.5.31?  I'm looking into a
> slowdown in 2.5.32/33 which may be related.  Some hardware info might be
> useful too.
I don't have 2.5.31, and right now I have only a slow
internet connection... I'll try to download it on Monday.

The hardware is a laptop, a standard HP Omnibook 6000 with 256 MiB of RAM and a PIII@800.
Do you need more information?

Ciao,
           Paolo


* Re: LMbench2.0 results
  2002-09-07 12:18 Paolo Ciarrocchi
  2002-09-07 12:27 ` Jeff Garzik
  2002-09-07 14:33 ` James Morris
@ 2002-09-07 16:20 ` Andrew Morton
  2002-09-07 20:03   ` William Lee Irwin III
  2002-09-08 20:02   ` Daniel Phillips
  2 siblings, 2 replies; 42+ messages in thread
From: Andrew Morton @ 2002-09-07 16:20 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: linux-kernel

Paolo Ciarrocchi wrote:
> 
> Hi all,
> I've just run lmbench2.0 on my laptop.
> Here are the results (again, 2.5.33 seems to be "slow"; I don't know why...)
> 

The fork/exec/mmap slowdown is the rmap overhead.  I have some stuff
which partially improves it.

The many-small-file-create slowdown is known but its cause is not.
I need to get oprofile onto it.


* Re: LMbench2.0 results
  2002-09-07 12:18 Paolo Ciarrocchi
  2002-09-07 12:27 ` Jeff Garzik
@ 2002-09-07 14:33 ` James Morris
  2002-09-09 22:22   ` Cliff White
  2002-09-07 16:20 ` Andrew Morton
  2 siblings, 1 reply; 42+ messages in thread
From: James Morris @ 2002-09-07 14:33 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: linux-kernel

On Sat, 7 Sep 2002, Paolo Ciarrocchi wrote:

> Let me know if you need further information (.config, info about my
> hardware) or if you want me to run other tests.

Would you be able to run the tests for 2.5.31?  I'm looking into a
slowdown in 2.5.32/33 which may be related.  Some hardware info might be
useful too.


- James
-- 
James Morris
<jmorris@intercode.com.au>




* Re: LMbench2.0 results
@ 2002-09-07 14:09 Shane Shrybman
  0 siblings, 0 replies; 42+ messages in thread
From: Shane Shrybman @ 2002-09-07 14:09 UTC (permalink / raw)
  To: linux-kernel

Hi,

Is it possible that there is still some debugging stuff turned on in
2.5.33?

Shane






* Re: LMbench2.0 results
@ 2002-09-07 12:40 Paolo Ciarrocchi
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Ciarrocchi @ 2002-09-07 12:40 UTC (permalink / raw)
  To: jgarzik; +Cc: linux-kernel

From: Jeff Garzik <jgarzik@mandrakesoft.com>

> Paolo Ciarrocchi wrote:
> > Comments?
> 
> Yeah:  "ouch" because I don't see a single category that's faster.
Indeed!!
 
> Oh well, it still needs to be tuned....
Yes, but it seems really strange to me...

Ciao,
         Paolo


* Re: LMbench2.0 results
  2002-09-07 12:18 Paolo Ciarrocchi
@ 2002-09-07 12:27 ` Jeff Garzik
  2002-09-07 18:53   ` Rik van Riel
  2002-09-07 14:33 ` James Morris
  2002-09-07 16:20 ` Andrew Morton
  2 siblings, 1 reply; 42+ messages in thread
From: Jeff Garzik @ 2002-09-07 12:27 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: linux-kernel

Paolo Ciarrocchi wrote:
> Comments?

Yeah:  "ouch" because I don't see a single category that's faster.

Oh well, it still needs to be tuned....




* LMbench2.0 results
@ 2002-09-07 12:18 Paolo Ciarrocchi
  2002-09-07 12:27 ` Jeff Garzik
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Paolo Ciarrocchi @ 2002-09-07 12:18 UTC (permalink / raw)
  To: linux-kernel

Hi all,
I've just run lmbench2.0 on my laptop.
Here are the results (again, 2.5.33 seems to be "slow"; I don't know why...)

cd results && make summary percent 2>/dev/null | more
make[1]: Entering directory `/usr/src/LMbench/results'

                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------


Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
                                                    
--------- ------------- ----------------------- ----
frodo      Linux 2.4.18       i686-pc-linux-gnu  797
frodo      Linux 2.4.19       i686-pc-linux-gnu  797
frodo      Linux 2.5.33       i686-pc-linux-gnu  797

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
frodo      Linux 2.4.18  797 0.40 0.56 3.18 3.97       1.00 3.18 115. 1231 13.K
frodo      Linux 2.4.19  797 0.40 0.56 3.07 3.88       1.00 3.19 129. 1113 13.K
frodo      Linux 2.5.33  797 0.40 0.61 3.78 4.76       1.02 3.37 201. 1458 13.K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
frodo      Linux 2.4.18 0.990 4.4200   13.8 6.2700  309.8    58.6   310.5
frodo      Linux 2.4.19 0.900 4.2900   15.3 5.9100  309.6    57.7   309.9
frodo      Linux 2.5.33 1.620 5.2800   15.3 9.3500  312.7    54.9   312.7

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
frodo      Linux 2.4.18 0.990 4.437 8.66                             
frodo      Linux 2.4.19 0.900 4.561 7.76                             
frodo      Linux 2.5.33 1.620 6.497 9.11                             

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page	
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
frodo      Linux 2.4.18   68.9   16.0  185.8   31.6    425.0 0.789 2.00000
frodo      Linux 2.4.19   68.9   14.9  186.5   29.8    416.0 0.798 2.00000
frodo      Linux 2.5.33   77.8   19.1  211.6   38.3    774.0 0.832 3.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
frodo      Linux 2.4.18 810. 650.       181.7  203.7  101.5  101.4 203. 195.3
frodo      Linux 2.4.19 808. 680.       187.2  203.8  101.5  101.4 203. 190.1
frodo      Linux 2.5.33 571. 636.       185.6  202.5  100.5  100.4 202. 190.3

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------  ---- ----- ------    --------    -------
frodo      Linux 2.4.18   797 3.767 8.7890  158.9
frodo      Linux 2.4.19   797 3.767 8.7980  158.9
frodo      Linux 2.5.33   797 3.798 8.8660  160.1
make[1]: Leaving directory `/usr/src/LMbench/results'

Comments?

Let me know if you need further information (.config, info about my hardware) or if you want me to run other tests.

Ciao,
           Paolo


end of thread, other threads:[~2002-09-22 12:37 UTC | newest]

Thread overview: 42+ messages
2002-09-22 12:42 LMbench2.0 results Paolo Ciarrocchi
  -- strict thread matches above, loose matches on Subject: below --
2002-09-14 18:26 Paolo Ciarrocchi
2002-09-15 18:08 ` Pavel Machek
2002-09-07 18:09 Paolo Ciarrocchi
2002-09-08  7:51 ` Andrew Morton
2002-09-07 18:04 Paolo Ciarrocchi
2002-09-13 22:49 ` Pavel Machek
2002-09-07 14:09 Shane Shrybman
2002-09-07 12:40 Paolo Ciarrocchi
2002-09-07 12:18 Paolo Ciarrocchi
2002-09-07 12:27 ` Jeff Garzik
2002-09-07 18:53   ` Rik van Riel
2002-09-07 21:44     ` Alan Cox
2002-09-13 22:46       ` Pavel Machek
2002-09-07 14:33 ` James Morris
2002-09-09 22:22   ` Cliff White
2002-09-07 16:20 ` Andrew Morton
2002-09-07 20:03   ` William Lee Irwin III
2002-09-07 23:12     ` Andrew Morton
2002-09-07 23:01       ` William Lee Irwin III
2002-09-07 23:44       ` Martin J. Bligh
2002-09-08 17:07         ` Alan Cox
2002-09-08 18:11           ` Martin J. Bligh
2002-09-08 18:40           ` Andrew Morton
2002-09-08 20:48             ` Hugh Dickins
2002-09-08 21:51               ` Andrew Morton
2002-09-09 21:13             ` Alan Cox
2002-09-09 21:44               ` Andrew Morton
2002-09-09 22:09                 ` Alan Cox
2002-09-08  7:51     ` Andrew Morton
2002-09-08  7:37       ` David S. Miller
2002-09-08  8:28         ` William Lee Irwin III
2002-09-08  8:25           ` David S. Miller
2002-09-08  9:12             ` William Lee Irwin III
2002-09-08 20:02   ` Daniel Phillips
2002-09-09 13:37     ` Rik van Riel
2002-09-09 16:16       ` Daniel Phillips
2002-09-09 16:26         ` Martin J. Bligh
2002-09-09 16:55           ` Daniel Phillips
2002-09-09 17:24             ` Martin J. Bligh
2002-09-09 21:11             ` Alan Cox
2002-09-09 16:52         ` Andrew Morton
