linux-kernel.vger.kernel.org archive mirror
* LMbench as gcc performance regression test?
@ 2003-08-31  7:21 Dan Kegel
  2003-08-31 14:00 ` Larry McVoy
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Dan Kegel @ 2003-08-31  7:21 UTC
  To: GCC Mailing List, linux-kernel

http://cs.nmu.edu/~benchmark/ has an interesting little graph
of LMBench results vs. Linux kernel version, all done with the
same compiler.

Has anyone seen a similar graph showing LMBench results vs. gcc version,
all done with the same Linux kernel?
And does everyone agree that's a meaningful way to compare the
performance of code generated by different compilers?

I happen to have a number of versions of gcc handy, and was
considering making such a graph, but was hoping somebody
else had already done it.

(There seem to be large variations in successive runs of LMBench
when I try it, so it may take me a bit of work to get repeatable
results.)

Thanks,
Dan

-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045



* Re: LMbench as gcc performance regression test?
  2003-08-31  7:21 LMbench as gcc performance regression test? Dan Kegel
@ 2003-08-31 14:00 ` Larry McVoy
  2003-08-31 14:28   ` Dan Kegel
       [not found]   ` <3F520773.1070907@kegel.com>
  2003-08-31 15:24 ` Daniel Jacobowitz
  2003-08-31 17:03 ` Martin J. Bligh
  2 siblings, 2 replies; 9+ messages in thread
From: Larry McVoy @ 2003-08-31 14:00 UTC
  To: Dan Kegel; +Cc: GCC Mailing List, linux-kernel

On Sun, Aug 31, 2003 at 12:21:37AM -0700, Dan Kegel wrote:
> (There seem to be large variations in successive runs of LMBench
> when I try it, so it may take me a bit of work to get repeatable
> results.)

Other than the context switch part or anything based on it, that shouldn't
be true, it should be very stable.

I'm pretty convinced that the variations are due to different pages being
allocated, and the resulting cache contention makes things bounce.
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: LMbench as gcc performance regression test?
  2003-08-31 14:00 ` Larry McVoy
@ 2003-08-31 14:28   ` Dan Kegel
       [not found]   ` <3F520773.1070907@kegel.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Dan Kegel @ 2003-08-31 14:28 UTC
  To: Larry McVoy; +Cc: GCC Mailing List, linux-kernel

Larry McVoy wrote:
> On Sun, Aug 31, 2003 at 12:21:37AM -0700, Dan Kegel wrote:
> 
>>(There seem to be large variations in successive runs of LMBench
>>when I try it, so it may take me a bit of work to get repeatable
>>results.)
> 
> 
> Other than the context switch part or anything based on it, that shouldn't
> be true, it should be very stable.
> 
> I'm pretty convinced that the variations are due to different pages being
> allocated, and the resulting cache contention makes things bounce.

Or an idiot running the benchmark.  We really do have to rule that out first.
- Dan

-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045



* Re: LMbench as gcc performance regression test?
  2003-08-31  7:21 LMbench as gcc performance regression test? Dan Kegel
  2003-08-31 14:00 ` Larry McVoy
@ 2003-08-31 15:24 ` Daniel Jacobowitz
  2003-08-31 15:59   ` Dan Kegel
  2003-08-31 17:03 ` Martin J. Bligh
  2 siblings, 1 reply; 9+ messages in thread
From: Daniel Jacobowitz @ 2003-08-31 15:24 UTC
  To: Dan Kegel; +Cc: GCC Mailing List, linux-kernel

On Sun, Aug 31, 2003 at 12:21:37AM -0700, Dan Kegel wrote:
> http://cs.nmu.edu/~benchmark/ has an interesting little graph
> of LMBench results vs. Linux kernel version, all done with the
> same compiler.
> 
> Has anyone seen a similar graph showing LMBench results vs. gcc version,
> all done with the same Linux kernel?
> And does everyone agree that's a meaningful way to compare the
> performance of code generated by different compilers?
> 
> I happen to have a number of versions of gcc handy, and was
> considering making such a graph, but was hoping somebody
> else had already done it.

It's been a while since I looked at lmbench but: why do you think this
would be useful?  It's a system and kernel benchmark; I doubt
optimization makes much difference at all.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


* Re: LMbench as gcc performance regression test?
  2003-08-31 15:24 ` Daniel Jacobowitz
@ 2003-08-31 15:59   ` Dan Kegel
  2003-08-31 16:18     ` Larry McVoy
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Kegel @ 2003-08-31 15:59 UTC
  To: Daniel Jacobowitz; +Cc: GCC Mailing List, linux-kernel

Daniel Jacobowitz wrote:
> On Sun, Aug 31, 2003 at 12:21:37AM -0700, Dan Kegel wrote:
> 
>>http://cs.nmu.edu/~benchmark/ has an interesting little graph
>>of LMBench results vs. Linux kernel version, all done with the
>>same compiler.
>>
>>Has anyone seen a similar graph showing LMBench results vs. gcc version,
>>all done with the same Linux kernel?
>>And does everyone agree that's a meaningful way to compare the
>>performance of code generated by different compilers?
> 
> It's been a while since I looked at lmbench but: why do you think this
> would be useful?  It's a system and kernel benchmark; I doubt
> optimization makes much difference at all.

I need to make sure that moving to a newer compiler for our kernel
will cause no performance regressions.  Before bothering to bring up a
real-world networking application and measuring its performance
under the new compiler, it seems sensible to use a couple microbenchmarks
to verify that identifiable parts of the system have
not degraded in performance.

I myself am quite convinced I need to move to a newer compiler,
since I keep running into problems building various things with
old compilers, but my users are very conservative and skeptical;
I have to build a solid case for updating.  Hence the insane amount
of time I spent figuring out and documenting how to build and test
the various versions of gcc and glibc (http://kegel.com/crosstool),
and then understanding the regression test failures.
- Dan

-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045



* Re: LMbench as gcc performance regression test?
  2003-08-31 15:59   ` Dan Kegel
@ 2003-08-31 16:18     ` Larry McVoy
  0 siblings, 0 replies; 9+ messages in thread
From: Larry McVoy @ 2003-08-31 16:18 UTC
  To: Dan Kegel; +Cc: Daniel Jacobowitz, GCC Mailing List, linux-kernel

On Sun, Aug 31, 2003 at 08:59:10AM -0700, Dan Kegel wrote:
> I need to make sure that moving to a newer compiler for our kernel
> will cause no performance regressions.  

Perhaps people think you mean compiling LMbench with different GCCs, when
what you really mean is compiling the same kernel with different GCCs
and measuring that.
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: LMbench as gcc performance regression test?
  2003-08-31  7:21 LMbench as gcc performance regression test? Dan Kegel
  2003-08-31 14:00 ` Larry McVoy
  2003-08-31 15:24 ` Daniel Jacobowitz
@ 2003-08-31 17:03 ` Martin J. Bligh
  2 siblings, 0 replies; 9+ messages in thread
From: Martin J. Bligh @ 2003-08-31 17:03 UTC
  To: Dan Kegel, GCC Mailing List, linux-kernel

> http://cs.nmu.edu/~benchmark/ has an interesting little graph
> of LMBench results vs. Linux kernel version, all done with the
> same compiler.
> 
> Has anyone seen a similar graph showing LMBench results vs. gcc version,
> all done with the same Linux kernel?
> And does everyone agree that's a meaningful way to compare the
> performance of code generated by different compilers?

I've done similar things with kernbench before (always using 2.95 to
run the test, but comparing kernels compiled with gcc 2.95 vs 3.2 vs 3.3,
and -Os vs -O2, etc). Summary was that 3.x takes *much* longer to compile
the kernel, and produces worse code (though 3.3 is almost back up to
the performance of 2.95, and is better than 3.2). -O2 is better than -Os, 
at least on a machine with 2MB L2 cache. Search the archives for results 
I posted if you want, but I never bothered graphing them.
 
> I happen to have a number of versions of gcc handy, and was
> considering making such a graph, but was hoping somebody
> else had already done it.
> 
> (There seem to be large variations in successive runs of LMBench
> when I try it, so it may take me a bit of work to get repeatable
> results.)

I'd just throw away any of the subtests that give you > 1% variations,
deriving anything meaningful from crap data is hard (and dubious). I have
similar problems with some of the results in lmbench - Larry suggested
setting "ENOUGH=" or something, which helped a few tests, but most of
them still aren't stable enough to be useful to me.
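
One way to apply the "throw away anything with > 1% variation" rule is
sketched below in C. It is only an illustration: the function names are
hypothetical, and "variation" is assumed here to mean (max - min) / mean
over repeated runs of one subtest, which is just one possible reading.

```c
#include <assert.h>
#include <stddef.h>

/* Relative spread of n results for one subtest: (max - min) / mean.
 * Assumes all results are positive timings, so the mean is non-zero.
 * This definition of "variation" is an assumption made for illustration. */
double relative_spread(const double *runs, size_t n)
{
    assert(n > 0);
    double lo = runs[0], hi = runs[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (runs[i] < lo) lo = runs[i];
        if (runs[i] > hi) hi = runs[i];
        sum += runs[i];
    }
    return (hi - lo) / (sum / (double)n);
}

/* Returns 1 if the subtest's spread is within the threshold (0.01 = 1%),
 * 0 if the subtest should be discarded. */
int is_stable(const double *runs, size_t n, double threshold)
{
    return relative_spread(runs, n) <= threshold;
}
```

Calling is_stable(runs, n, 0.01) on each subtest's repeated results and
keeping only those that return 1 gives the filtered set.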

I'd also use something "bigger" than just a microbenchmark - you need to
exercise a realistic set of the kernel functions in order to see space 
vs time tradeoffs, etc. If you want to see whether it's faster for you, 
you need a benchmark that simulates roughly what you do with the machine 
(ie a system-level benchmark).

M.



* Re: LMbench as gcc performance regression test?
       [not found]     ` <20030831145956.GE23783@work.bitmover.com>
@ 2003-08-31 22:53       ` Dan Kegel
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Kegel @ 2003-08-31 22:53 UTC
  To: Larry McVoy; +Cc: staelin, linux-kernel

Larry McVoy wrote:
> Here is some background, pick a benchmark and play with it and see if
> you can convince yourself of anything.  The basic idea is to run the
> benchmark TRIES times for $ENOUGH milliseconds.  TRIES is set to an odd
> number in bench.h because we sort the results and take the midpoint and
> print that as the result. 
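
The sort-and-take-the-midpoint step described above can be sketched in C
roughly as follows. This is a minimal illustration, not LMbench's actual
code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparator for doubles (hypothetical helper). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort n timing samples in place and return the middle one.
 * With an odd n (as TRIES is said to be) the midpoint is a real sample,
 * so no averaging of two neighbours is needed. */
double median_of_tries(double *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_double);
    return samples[n / 2];
}
```

A single outlier among the samples moves to one end of the sorted array
and does not affect the reported value, which is the point of using the
median rather than the mean.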

It seems lat_pipe never does any median smoothing; it always sets TRIES to 1.
However, at least on the fairly quiet embedded system I'm testing on,
smoothing samples taken within a single run wouldn't make
a huge difference.  Any smoothing you get with that would be swamped by
the fact that lat_pipe's result has a bimodal distribution only one of whose
peaks shows up in any one run.
This sure sounds like the kind of thing page coloring is
supposed to solve; has anyone observed page coloring improving
the repeatability of the lat_pipe benchmark?

(There's no median smoothing in lat_pipe.c, I think, because it passes
a value >= 1000000 as the 2nd arg of BENCH:
                 BENCH(doit(p2[0], p1[1]), SHORT);
BENCH computes the number of samples to take the median of as
         __N = (get_enough(1000000) <= 100000) ? TRIES : 1;
get_enough() will always return at least what it is passed,
thus __N will always be 1.  It sure was whenever I printed it out, too.
This seems to be the case for the following tests:
bw_pipe  bw_tcp  bw_unix
lat_fcntl lat_fifo lat_pipe lat_rpc lat_tcp lat_udp lat_unix)
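
The reasoning above can be checked with a small stand-alone C sketch.
Everything here is an assumption made for illustration: get_enough_sketch()
only models the "returns at least what it is passed" behaviour described
above, the TRIES value is a placeholder for whatever odd number bench.h
sets, and none of this is LMbench's real code.

```c
#include <assert.h>

/* Placeholder value; the thread only says TRIES is an odd number. */
#define TRIES 11

/* Stand-in for get_enough(): returns the larger of the requested
 * interval and a calibrated minimum, so never less than the request. */
static int get_enough_sketch(int requested, int calibrated)
{
    return requested > calibrated ? requested : calibrated;
}

/* Mirrors the quoted expression that picks how many samples feed the
 * median: TRIES for short timing intervals, otherwise a single sample. */
int samples_per_result(int requested, int calibrated)
{
    assert(requested >= 0 && calibrated >= 0);
    return (get_enough_sketch(requested, calibrated) <= 100000) ? TRIES : 1;
}
```

With a request of 1000000, samples_per_result() returns 1 whatever the
calibrated value is, which matches the __N values printed during testing.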
- Dan

-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045



* Re: LMbench as gcc performance regression test?
@ 2003-08-31 16:57 rwhron
  0 siblings, 0 replies; 9+ messages in thread
From: rwhron @ 2003-08-31 16:57 UTC
  To: dank; +Cc: linux-kernel

> For starters, just compiling the kernel with different compilers;
> I'll keep running LMBench compiled with the old compiler.

Below are the lmbench results I got.  One set is from a K6/2, and
one is from a quad P3 Xeon.  The kernels labeled -falign=2 were built with
-falign-functions=2 -falign-jumps=2 -falign-labels=2 -falign-loops=2.

First is the K6/2.  Second bunch of results is quad P3.
The numbers are the average of 25 runs.  These do not have high/low
results removed from the averages.
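
For reference, the difference between a plain average (what the tables
use) and an average with the high and low results removed can be sketched
in C as below. The function names are hypothetical and this is not the
script that produced the tables.

```c
#include <assert.h>
#include <stddef.h>

/* Plain arithmetic mean of n results. */
double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Mean after dropping one lowest and one highest result.
 * Needs at least three results so that something is left to average. */
double mean_without_high_low(const double *v, size_t n)
{
    assert(n >= 3);
    double lo = v[0], hi = v[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
        sum += v[i];
    }
    return (sum - lo - hi) / (double)(n - 2);
}
```

With 25 runs, one badly disturbed run shifts the plain mean by 1/25 of
its excess, while the second form ignores it entirely.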

Kernel versions without a suffix were built with the default kernel compile options.

I also ran a bunch of other benchmarks on these kernels.
It seems to me the application/target platform may be the
final arbiter of "what is best".  I was hoping for a very 
general result like gcc-3.3.1 -Os -falign=2 is always clearly better,
but I don't see that.  (That generalization is partly true on
K6/2, but not on P3 Xeon - very different cache characteristics for
those chips).

BTW, gcc-3.3.1 -Os saved about 800k of RAM on a 1GB Athlon compared to gcc-2.95.3 -O2
for a 2.6 kernel.  Most of the savings were in nop instructions.  There was also
a different mix of push/pop instructions: -Os created more push/pops on the Athlon.

One thing is clear: gcc-3.3.1 takes longer to compile (roughly 2x).

Note: LMbench wasn't recompiled.  The compiler options only changed on the kernel.

                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
                                     null     null                       open    signal   signal    fork    execve  /bin/sh
kernel                               call      I/O     stat    fstat    close   install   handle  process  process  process
-----------------------------      -------  -------  -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3                           0.73  1.20351     5.89     1.60     7.07     1.99     5.04     1426     3272    13618
2.6.0-test3-gcc-2.96-Os               0.73  1.25547     4.17     1.71     7.22     1.99     4.43     1456     3416    13945
2.6.0-test3-gcc-3.3.1                 0.74  1.08099     4.15     1.56     6.73     2.02     4.28     1397     3146    13238
2.6.0-test3-gcc-3.3.1-Os              0.73  1.33862     4.64     1.92     7.11     2.00     4.88     1398     3453    13874
2.6.0-test3-gcc-3.3.1-Os-falign=2     0.71  1.24968     3.99     1.77     6.50     2.02     4.00     1413     3441    13814

File select - times in microseconds - smaller is better
-------------------------------------------------------
                                    select   select   select   select   select   select   select   select
kernel                               10 fd   100 fd   250 fd   500 fd   10 tcp  100 tcp  250 tcp  500 tcp
-----------------------------      -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3                           4.11    15.51    35.15    66.14     5.98  33.6829  83.6085  156.960
2.6.0-test3-gcc-2.96-Os               4.04    15.71    35.02    67.80     6.03  33.5566  79.2899  160.586
2.6.0-test3-gcc-3.3.1                 4.62    16.14    36.56    74.28     6.38  33.7929  79.8531  156.387
2.6.0-test3-gcc-3.3.1-Os              4.29    21.57    42.62    81.01     6.71  46.7026  98.9269  193.885
2.6.0-test3-gcc-3.3.1-Os-falign=2     3.64    15.20    33.43    65.21     6.23  39.4099  92.0308  178.482

Context switching with 0K - times in microseconds - smaller is better
---------------------------------------------------------------------
                                    2proc/0k   4proc/0k   8proc/0k  16proc/0k  32proc/0k  64proc/0k  96proc/0k
kernel                             ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------      ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3                             1.87       7.71       8.97      10.71      12.91      14.45      15.24
2.6.0-test3-gcc-2.96-Os                 2.27       6.07       7.39       8.91      10.88      12.25      12.73
2.6.0-test3-gcc-3.3.1                   2.34       5.31       7.10       8.76      10.77      12.18      12.99
2.6.0-test3-gcc-3.3.1-Os                1.60       5.00       6.57       8.45      10.51      11.67      12.18
2.6.0-test3-gcc-3.3.1-Os-falign=2       1.25       3.60       5.45       7.15       9.29      10.65      11.19

Context switching with 4K - times in microseconds - smaller is better
---------------------------------------------------------------------
                                    2proc/4k   4proc/4k   8proc/4k  16proc/4k  32proc/4k  64proc/4k  96proc/4k
kernel                             ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------      ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3                             5.72      23.35      24.98      26.33      28.22      29.56      29.97
2.6.0-test3-gcc-2.96-Os                 7.00      21.94      22.98      24.53      26.12      26.94      27.41
2.6.0-test3-gcc-3.3.1                   6.24      21.41      22.92      24.12      25.83      27.27      27.70
2.6.0-test3-gcc-3.3.1-Os                8.21      22.88      23.45      24.57      26.01      26.95      27.54
2.6.0-test3-gcc-3.3.1-Os-falign=2       8.41      24.30      25.56      26.98      28.93      29.94      30.51

Context switching with 8K - times in microseconds - smaller is better
---------------------------------------------------------------------
                                    2proc/8k   4proc/8k   8proc/8k  16proc/8k  32proc/8k  64proc/8k  96proc/8k
kernel                             ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------      ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3                            18.85      40.92      41.65      42.04      44.08      45.42      46.26
2.6.0-test3-gcc-2.96-Os                18.32      39.79      39.45      40.34      41.50      42.87      43.61
2.6.0-test3-gcc-3.3.1                  18.15      38.66      39.15      39.70      41.11      42.61      43.37
2.6.0-test3-gcc-3.3.1-Os               18.38      38.33      39.38      39.67      41.27      42.48      43.40
2.6.0-test3-gcc-3.3.1-Os-falign=2      18.67      41.22      41.94      42.71      44.51      45.99      46.75

Context switching with 16K - times in microseconds - smaller is better
----------------------------------------------------------------------
                                   2proc/16k  4proc/16k  8proc/16k  16prc/16k  32prc/16k  64prc/16k  96prc/16k
kernel                             ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------      ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3                            32.70      74.53      74.04      73.43      74.59      76.24      77.31
2.6.0-test3-gcc-2.96-Os                31.40      72.91      71.58      71.54      72.60      74.36      75.62
2.6.0-test3-gcc-3.3.1                  31.45      71.75      70.75      71.21      72.53      74.22      75.36
2.6.0-test3-gcc-3.3.1-Os               29.74      70.87      70.07      69.92      71.42      73.41      74.78
2.6.0-test3-gcc-3.3.1-Os-falign=2      32.24      75.09      74.88      74.67      75.23      76.83      78.08

Context switching with 32K - times in microseconds - smaller is better
----------------------------------------------------------------------
                                   2proc/32k  4proc/32k  8proc/32k  16prc/32k  32prc/32k  64prc/32k  96prc/32k
kernel                             ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------      ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3                           129.83     134.99     136.17     135.81     138.72     142.94     146.38
2.6.0-test3-gcc-2.96-Os               128.66     132.55     132.72     134.04     136.61     140.45     143.50
2.6.0-test3-gcc-3.3.1                 129.67     133.26     133.57     134.00     135.80     139.78     143.06
2.6.0-test3-gcc-3.3.1-Os              130.01     133.43     132.44     133.15     135.29     139.57     143.19
2.6.0-test3-gcc-3.3.1-Os-falign=2     129.63     135.77     136.05     135.82     138.48     142.42     145.80

Context switching with 64K - times in microseconds - smaller is better
----------------------------------------------------------------------
                                   2proc/64k  4proc/64k  8proc/64k  16prc/64k  32prc/64k  64prc/64k  96prc/64k
kernel                             ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------      ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3                           234.17     241.21     244.17     247.96     255.85     265.69     270.87
2.6.0-test3-gcc-2.96-Os               239.48     242.65     242.96     246.98     254.17     263.57     268.69
2.6.0-test3-gcc-3.3.1                 237.30     241.26     243.74     245.77     253.20     262.63     268.03
2.6.0-test3-gcc-3.3.1-Os              236.21     240.27     241.66     246.98     253.88     264.21     269.66
2.6.0-test3-gcc-3.3.1-Os-falign=2     241.20     244.36     246.88     250.26     257.55     267.07     272.12

File create/delete and VM system latencies in microseconds - smaller is better
----------------------------------------------------------------------------
                                     0K       0K       1K       1K       4K       4K      10K      10K     Mmap     Prot    Page
kernel                             Create   Delete   Create   Delete   Create   Delete   Create   Delete   Latency  Fault   Fault
------------------------------     -------  -------  -------  -------  -------  -------  -------  -------  -------  ------  ------
2.6.0-test3                          157.4     44.3    282.2     77.9    293.5     78.0    480.7     98.8     4281    0.11     8.7
2.6.0-test3-gcc-2.96-Os              160.1     39.1    283.4     74.0    291.8     74.0    474.3     92.0     3973    0.84     8.4
2.6.0-test3-gcc-3.3.1                152.9     35.2    263.4     64.9    271.0     65.3    450.8     84.1     3884    0.90     9.0
2.6.0-test3-gcc-3.3.1-Os             142.8     32.1    257.1     65.8    265.2     65.7    447.9     85.0     4600    0.37     9.8
2.6.0-test3-gcc-3.3.1-Os-falign=2    145.0     37.0    261.6     75.7    269.9     75.9    452.9     94.8     3981    0.67     8.9

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel                               Pipe   AF/Unix     UDP   RPC/UDP     TCP   RPC/TCP  TCPconn
-----------------------------      -------  -------  -------  -------  -------  -------  -------
2.6.0-test3                          23.91    28.95  79.4908  180.193  89.9243  213.264   379.17
2.6.0-test3-gcc-2.96-Os              16.72    24.36  61.8104  172.207  99.5137  213.399   384.39
2.6.0-test3-gcc-3.3.1                18.16    24.52  60.0927  156.112  85.7277  216.343   335.97
2.6.0-test3-gcc-3.3.1-Os             16.65    30.98  55.1646  159.562  97.5878  209.945   357.23
2.6.0-test3-gcc-3.3.1-Os-falign=2    15.69    25.77  64.9899  170.012  82.2858  210.149   344.43

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
                                                                 File     Mmap    Bcopy    Bcopy   Memory   Memory
kernel                               Pipe   AF/Unix    TCP     reread   reread   (libc)   (hand)     read    write
-----------------------------      -------  -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3                           52.7     37.0     28.9     52.2    232.6     60.0     60.0    232.4     86.0
2.6.0-test3-gcc-2.96-Os               52.9     36.0     33.1     50.7    231.7     60.0     60.0    232.2     86.1
2.6.0-test3-gcc-3.3.1                 55.8     36.8     31.9     52.9    232.7     60.1     60.1    232.5     86.2
2.6.0-test3-gcc-3.3.1-Os              56.5     37.2     25.2     51.8    232.6     60.0     60.0    232.4     86.2
2.6.0-test3-gcc-3.3.1-Os-falign=2     54.3     35.9     29.3     50.7    232.6     60.0     60.0    232.3     86.1

*Local* More Communication bandwidths in MB/s - bigger is better
----------------------------------------------------------------
                                      File     Mmap  Aligned  Partial  Partial  Partial  Partial  
OS                                    open     open    Bcopy    Bcopy     Mmap     Mmap     Mmap    Bzero
                                     close    close   (libc)   (hand)     read    write   rd/wrt     copy     HTTP
-----------------------------      -------  -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3                           53.0    175.8     59.5     66.2    241.9     86.1     86.1     86.0     2.99
2.6.0-test3-gcc-2.96-Os               51.6    176.8     59.5     66.2    241.9     86.1     86.1     86.0     2.91
2.6.0-test3-gcc-3.3.1                 52.3    178.0     59.5     66.2    242.1     86.2     86.2     86.2     3.21
2.6.0-test3-gcc-3.3.1-Os              52.8    165.6     59.5     66.2    242.1     86.3     86.3     86.2     3.08
2.6.0-test3-gcc-3.3.1-Os-falign=2     50.9    168.2     59.5     66.2    242.1     86.1     86.1     86.1     3.14

Memory latencies in nanoseconds - smaller is better
---------------------------------------------------
kernel                              Mhz     L1 $     L2 $    Main mem
-----------------------------      -----  -------  -------  ---------
2.6.0-test3                          476     4.25   232.06      268.3
2.6.0-test3-gcc-2.96-Os              476     4.26   231.12      268.7
2.6.0-test3-gcc-3.3.1                476     4.25   231.13      268.2
2.6.0-test3-gcc-3.3.1-Os             476     4.25   227.97      267.7
2.6.0-test3-gcc-3.3.1-Os-falign=2    476     4.25   231.75      267.7


Quad Xeon

                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
                                         null     null                       open    signal   signal    fork    execve  /bin/sh
kernel                                   call      I/O     stat    fstat    close   install   handle  process  process  process
-----------------------------          -------  -------  -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3-mm2                           0.50  0.75332     4.35     1.36     5.94     1.56     5.13      257     1007     4506
2.6.0-test3-mm2-gcc-3.3.1                 0.51  0.76516     4.39     1.37     5.96     1.54     5.11      254      996     4444
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2     0.52  0.75901     4.28     1.37     5.83     1.56     5.44      264     1019     4492

File select - times in microseconds - smaller is better
-------------------------------------------------------
                                        select   select   select   select   select   select   select   select
kernel                                   10 fd   100 fd   250 fd   500 fd   10 tcp  100 tcp  250 tcp  500 tcp
-----------------------------          -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3-mm2                           3.98    22.31    53.07   106.21     5.02  33.4355  79.1707  159.488
2.6.0-test3-mm2-gcc-3.3.1                 3.94    23.22    54.85   108.65     4.94  32.8961  79.7636  159.520
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2     3.90    22.66    53.64   105.36     5.49  38.1446  91.8798  181.395

Context switching with 0K - times in microseconds - smaller is better
---------------------------------------------------------------------
                                        2proc/0k   4proc/0k   8proc/0k  16proc/0k  32proc/0k  64proc/0k  96proc/0k
kernel                                 ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------          ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3-mm2                             1.65       2.17       2.60       2.75       2.90       3.53       4.76
2.6.0-test3-mm2-gcc-3.3.1                   1.61       2.12       2.48       2.66       2.79       3.33       4.52
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2       2.00       2.62       3.39       3.44       3.53       3.92       5.00

Context switching with 4K - times in microseconds - smaller is better
---------------------------------------------------------------------
                                        2proc/4k   4proc/4k   8proc/4k  16proc/4k  32proc/4k  64proc/4k  96proc/4k
kernel                                 ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------          ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3-mm2                             2.66       3.76       4.31       4.39       4.58       6.78      10.27
2.6.0-test3-mm2-gcc-3.3.1                   2.25       4.14       4.47       4.47       4.56       6.54       9.67
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2       2.54       4.18       4.95       4.96       5.22       6.95      10.17

Context switching with 8K - times in microseconds - smaller is better
---------------------------------------------------------------------
                                        2proc/8k   4proc/8k   8proc/8k  16proc/8k  32proc/8k  64proc/8k  96proc/8k
kernel                                 ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------          ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3-mm2                             4.65       5.74       5.76       5.83       6.51      12.94      21.74
2.6.0-test3-mm2-gcc-3.3.1                   4.53       6.03       5.96       5.90       6.36      12.49      21.53
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2       4.50       5.95       6.16       6.27       6.86      12.92      21.89

Context switching with 16K - times in microseconds - smaller is better
----------------------------------------------------------------------
                                       2proc/16k  4proc/16k  8proc/16k  16prc/16k  32prc/16k  64prc/16k  96prc/16k
kernel                                 ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------          ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3-mm2                             9.14       9.07       8.90       8.99      14.06      39.74      49.45
2.6.0-test3-mm2-gcc-3.3.1                   8.21       8.33       8.40       8.96      14.51      39.50      50.52
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2       8.76       9.07       9.09       9.68      15.57      40.04      50.86

Context switching with 32K - times in microseconds - smaller is better
----------------------------------------------------------------------
                                       2proc/32k  4proc/32k  8proc/32k  16prc/32k  32prc/32k  64prc/32k  96prc/32k
kernel                                 ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------          ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3-mm2                           15.740     15.347     14.779     19.616     57.711     87.463     88.771
2.6.0-test3-mm2-gcc-3.3.1                 14.584     14.636     14.619     20.376     57.226     87.183     88.533
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2     14.961     14.820     14.847     20.192     58.146     87.172     88.723

Context switching with 64K - times in microseconds - smaller is better
----------------------------------------------------------------------
                                       2proc/64k  4proc/64k  8proc/64k  16prc/64k  32prc/64k  64prc/64k  96prc/64k
kernel                                 ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------------          ---------  ---------  ---------  ---------  ---------  ---------  ---------
2.6.0-test3-mm2                            26.08      25.19      32.19     105.04     158.28     163.83     162.59
2.6.0-test3-mm2-gcc-3.3.1                  24.83      24.42      34.92     106.19     160.16     166.21     164.72
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2      26.00      25.57      32.14     102.00     157.90     162.73     163.06

File create/delete and VM system latencies in microseconds - smaller is better
----------------------------------------------------------------------------
                                         0K       0K       1K       1K       4K       4K      10K      10K     Mmap     Prot    Page
kernel                                 Create   Delete   Create   Delete   Create   Delete   Create   Delete   Latency  Fault   Fault
------------------------------         -------  -------  -------  -------  -------  -------  -------  -------  -------  ------  ------
2.6.0-test3-mm2                           56.2     10.5     89.1     22.0     89.0     22.0    130.8     30.5     4603    0.65     4.0
2.6.0-test3-mm2-gcc-3.3.1                 53.1      9.9     85.9     21.0     85.9     21.0    127.8     29.7     4603    0.70     4.0
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2     57.7     10.3     90.4     21.3     90.4     21.3    132.3     30.1     4725    0.61     4.1

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel                                   Pipe   AF/Unix     UDP   RPC/UDP     TCP   RPC/TCP  TCPconn
-----------------------------          -------  -------  -------  -------  -------  -------  -------
2.6.0-test3-mm2                          10.31    15.11  30.8128  65.4053  37.5467  76.3995    97.65
2.6.0-test3-mm2-gcc-3.3.1                10.23    15.34  29.5681  62.6713  35.9673  75.1441    97.27
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2    11.77    17.47  33.1430  67.9045  38.9957  78.8412    97.49

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
                                                                     File     Mmap    Bcopy    Bcopy   Memory   Memory
kernel                                   Pipe   AF/Unix    TCP     reread   reread   (libc)   (hand)     read    write
-----------------------------          -------  -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3-mm2                          482.1    532.7    189.2    298.3    367.5    170.7    173.6    366.9    213.5
2.6.0-test3-mm2-gcc-3.3.1                480.6    546.2    167.3    296.8    362.3    170.5    173.1    362.6    212.4
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2    472.9    538.1    163.9    297.5    365.2    170.9    173.6    364.9    213.0

*Local* More Communication bandwidths in MB/s - bigger is better
----------------------------------------------------------------
                                          File     Mmap  Aligned  Partial  Partial  Partial  Partial  
OS                                        open     open    Bcopy    Bcopy     Mmap     Mmap     Mmap    Bzero
                                         close    close   (libc)   (hand)     read    write   rd/wrt     copy     HTTP
-----------------------------          -------  -------  -------  -------  -------  -------  -------  -------  -------
2.6.0-test3-mm2                          298.3    283.4    168.9    183.9    781.2    212.6    213.7    350.2    10.24
2.6.0-test3-mm2-gcc-3.3.1                296.5    280.1    168.6    182.9    782.7    211.9    212.2    350.6    10.24
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2    297.1    280.0    169.2    183.7    783.4    212.3    212.6    350.4    10.04

Memory latencies in nanoseconds - smaller is better
---------------------------------------------------
kernel                                  Mhz     L1 $     L2 $    Main mem
-----------------------------          -----  -------  -------  ---------
2.6.0-test3-mm2                          698     4.33    12.98      164.4
2.6.0-test3-mm2-gcc-3.3.1                698     4.33    13.00      166.4
2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2    698     4.31    12.93      164.8
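
The tables above are easier to compare as relative differences against the first row. The following is a minimal sketch, not part of the original LMbench output, that does this for the local communication latency table. The figures are copied from that table; the names `percent_change` and `report` are hypothetical helpers introduced only for illustration. These are single runs, so small differences may well be within run-to-run noise.

```python
# Local communication latencies in microseconds, copied from the table above.
COLUMNS = ["Pipe", "AF/Unix", "UDP", "RPC/UDP", "TCP", "RPC/TCP", "TCPconn"]

LATENCY_US = {
    "2.6.0-test3-mm2":
        [10.31, 15.11, 30.8128, 65.4053, 37.5467, 76.3995, 97.65],
    "2.6.0-test3-mm2-gcc-3.3.1":
        [10.23, 15.34, 29.5681, 62.6713, 35.9673, 75.1441, 97.27],
    "2.6.0-test3-mm2-gcc-3.3.1-Os-falign=2":
        [11.77, 17.47, 33.1430, 67.9045, 38.9957, 78.8412, 97.49],
}

BASELINE = "2.6.0-test3-mm2"


def percent_change(baseline, other):
    """Return the per-column change of `other` relative to `baseline`, in percent.

    For latencies, a positive value means `other` is slower than the baseline.
    """
    return [100.0 * (o - b) / b for b, o in zip(baseline, other)]


def report(table, baseline_name, columns):
    """Print one line of percent changes for every non-baseline row."""
    base = table[baseline_name]
    print("%-40s" % "kernel" + "".join("%9s" % c for c in columns))
    for name, values in table.items():
        if name == baseline_name:
            continue
        deltas = percent_change(base, values)
        print("%-40s" % name + "".join("%+8.1f%%" % d for d in deltas))


if __name__ == "__main__":
    report(LATENCY_US, BASELINE, COLUMNS)
```

The same two functions can be pointed at any of the other tables by swapping in that table's column names and rows. For the bandwidth tables the sign reads the other way: a positive change is an improvement.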




-- 
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html


