* [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: rwhron @ 2003-04-25 23:09 UTC
To: linux-kernel
There are a few benchmarks that have changed dramatically
between 2.5.68 and 2.5.68-mm2.
Machine is a quad P3 700 MHz Xeon with 1 MB cache.
3.75 GB RAM.
RAID0 LUN
QLogic 2200 Fibre Channel
Some config differences: 2.5.68 has the standard QLogic driver;
2.5.68-mm2 has the new QLogic driver and the 2/2 GB memory split.
Only in 2.5.68
CONFIG_SCSI_QLOGIC_FC=y
CONFIG_SCSI_QLOGIC_FC_FIRMWARE=y
CONFIG_SCSI_QLOGIC_ISP=y
Only in 2.5.68-mm2
CONFIG_2GB=y
CONFIG_DEBUG_INFO=y
CONFIG_NR_SIBLINGS_0=y
CONFIG_SCSI_QLOGIC_ISP_NEW=y
CONFIG_SPINLINE=y
One recent change is that -mm2 is 17-19% faster at tbench.
The logfiles don't indicate any errors. Wonder what helped?
tbench 192 processes Average High Low
2.5.68-mm2 139.44 142.14 136.77 MB/sec
2.5.68 118.78 132.41 111.45
tbench 64 processes Average High Low
2.5.68-mm2 136.34 143.66 124.13 MB/sec
2.5.68 114.30 116.88 111.33
The autoconf-2.53 make/make check is a fork test. 2.5.68
is about 13% faster here.
kernel average min_time max_time
2.5.68 732.8 729 738 seconds
2.5.68-mm2 833.3 824 841
On the AIM7 database test, -mm2 was about 18% faster and
used about 15% more CPU time. (Real and CPU are in seconds.)
The new QLogic driver helps AIM7.
AIM7 dbase workload
kernel Tasks Jobs/Min Real CPU
2.5.68-mm2 32 559.8 339.6 164.0
2.5.68 32 477.1 398.4 150.9
2.5.68-mm2 64 714.1 532.4 312.3
2.5.68 64 608.3 625.0 272.4
2.5.68-mm2 96 785.6 725.9 458.8
2.5.68 96 664.7 857.8 393.9
2.5.68-mm2 128 832.1 913.8 640.0
2.5.68 128 702.3 1082.5 515.5
2.5.68-mm2 160 858.5 1107.0 712.2
2.5.68 160 726.7 1307.8 624.2
2.5.68-mm2 192 880.4 1295.4 871.1
2.5.68 192 745.7 1529.5 763.0
2.5.68-mm2 224 895.1 1486.5 1005.1
2.5.68 224 758.0 1755.3 868.4
2.5.68-mm2 256 907.8 1675.1 1144.5
2.5.68 256 767.5 1981.3 987.2
On the AIM7 shared test, -mm2 is 15-19% faster and
uses about 5% more CPU time.
AIM7 shared workload
kernel Tasks Jobs/Min Real CPU
2.5.68-mm2 64 2447.0 152.2 180.8
2.5.68 64 2110.4 176.5 170.0
2.5.68-mm2 128 2705.0 275.4 357.6
2.5.68 128 2276.9 327.2 337.2
2.5.68-mm2 192 2708.3 412.6 537.5
2.5.68 192 2265.4 493.3 506.8
2.5.68-mm2 256 2746.1 542.5 716.3
2.5.68 256 2304.7 646.5 677.5
2.5.68-mm2 320 2732.9 681.5 900.0
2.5.68 320 2296.3 811.0 849.4
L M B E N C H 2 . 0 S U M M A R Y
------------------------------------
The lmbench process latency results go along with the autoconf
build results.
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
fork execve /bin/sh
kernel process process process
------------- ------- ------- -------
2.5.68 243 979 4401
2.5.68-mm2 502 1715 5200
The lmbench context switch tests have an interesting pattern.
With few processes and small sizes, 2.5.68 has lower latency.
2.5.68-mm2 turns the tables in the high-process, large-size tests.
Context switching with 0K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/0k 4proc/0k 8proc/0k 16proc/0k 32proc/0k 64proc/0k 96proc/0k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 1.32 2.63 2.38 2.41 2.42 2.87 3.79
2.5.68-mm2 6.80 6.97 6.74 6.59 6.43 5.94 6.17
Context switching with 4K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/4k 4proc/4k 8proc/4k 16proc/4k 32proc/4k 64proc/4k 96proc/4k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 1.81 3.53 3.79 4.26 4.62 6.06 8.30
2.5.68-mm2 6.91 7.13 7.29 7.57 7.72 7.38 7.91
Context switching with 8K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/8k 4proc/8k 8proc/8k 16proc/8k 32proc/8k 64proc/8k 96proc/8k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 3.31 5.35 5.16 5.29 6.07 12.05 19.60
2.5.68-mm2 7.20 8.42 8.86 8.87 9.12 9.13 10.51
Context switching with 16K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/16k 4proc/16k 8proc/16k 16prc/16k 32prc/16k 64prc/16k 96prc/16k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 7.46 8.19 8.04 8.49 13.66 37.52 46.99
2.5.68-mm2 10.50 11.46 11.78 11.61 11.89 15.26 24.91
Context switching with 32K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/32k 4proc/32k 8proc/32k 16prc/32k 32prc/32k 64prc/32k 96prc/32k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 12.690 13.520 13.856 19.877 52.473 81.259 83.397
2.5.68-mm2 17.419 17.285 17.212 17.358 20.044 46.069 75.088
Context switching with 64K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/64k 4proc/64k 8proc/64k 16prc/64k 32prc/64k 64prc/64k 96prc/64k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 23.03 24.71 34.03 105.06 155.47 156.37 156.29
2.5.68-mm2 27.81 27.97 28.03 33.67 79.36 154.14 172.09
2.5.68 has lower latency in the local communication tests.
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel Pipe AF/Unix UDP RPC/UDP TCP RPC/TCP
2.5.68 9.44 14.25 32.0856 60.1722 39.8264 73.7042
2.5.68-mm2 32.71 48.45 45.4747 65.2766 56.7022 79.7929
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
File Mmap Bcopy Bcopy Memory Memory
kernel Pipe AF/Unix TCP reread reread (libc) (hand) read write
2.5.68 511.3 546.9 174.0 296.5 363.9 170.3 172.0 364.9 211.9
2.5.68-mm2 493.2 278.0 167.2 289.2 347.8 160.9 163.1 348.1 199.3
*Local* More Communication bandwidths in MB/s - bigger is better
----------------------------------------------------------------
File Mmap Aligned Partial Partial Partial
OS open open Bcopy Bcopy Mmap Mmap
close close (libc) (hand) write rd/wrt HTTP
2.5.68 299.0 286.0 167.8 182.5 212.2 212.7 10.10
2.5.68-mm2 291.9 277.5 159.7 172.4 201.2 200.5 9.82
Memory latencies in nanoseconds - smaller is better
---------------------------------------------------
kernel Mhz L1 $ L2 $ Main mem
2.5.68 698 4.35 13.06 165.3
2.5.68-mm2 698 4.33 13.00 173.1
tiobench-0.3.3
Unit information
================
File size = 8192 megabytes
Blk Size = 4096 bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
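As a worked example (taking the first sequential read row below): 28.77 MB/sec
at 13.23% CPU gives 28.77 / 0.1323 ~= 217, which matches the Eff column.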
One notable difference between -mm2 and 2.5.68 is the CPU% as
thread count goes up. -mm2 uses less CPU as thread count rises,
and 2.5.68 uses more. 2.5.68 keeps sequential read throughput
high as threads increase.
Sequential Reads ext2
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 28.77 13.23% 0.405 592.14 0.00000 0.00000 217
2.5.68-mm2 1 28.77 13.80% 0.404 659.18 0.00000 0.00000 208
2.5.68 8 36.65 18.04% 2.542 945.37 0.00000 0.00000 203
2.5.68-mm2 8 23.96 11.15% 3.810 1219.85 0.00000 0.00000 215
2.5.68 16 30.56 14.94% 6.080 1224.19 0.00000 0.00000 204
2.5.68-mm2 16 20.19 9.39% 8.953 2456.76 0.00000 0.00000 215
2.5.68 32 27.74 13.84% 13.376 1498.48 0.00000 0.00000 200
2.5.68-mm2 32 20.15 9.50% 16.728 4424.53 0.00000 0.00000 212
2.5.68 64 28.47 14.54% 25.294 6204.46 0.00005 0.00000 196
2.5.68-mm2 64 19.54 9.40% 32.600 12986.20 0.04410 0.00000 208
2.5.68 128 29.87 14.99% 41.715 17752.22 0.10242 0.00000 199
2.5.68-mm2 128 19.28 9.21% 63.638 57459.95 1.27239 0.01006 209
2.5.68 256 34.10 16.88% 64.697 51122.80 1.16358 0.01163 202
2.5.68-mm2 256 18.84 8.96% 125.350 164470.88 1.43795 0.14148 210
Random Reads throughput on ext2 is a lot higher on 2.5.68. -mm2 has a bump in
latency as thread count gets very high.
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 0.84 0.75% 14.003 120.98 0.00000 0.00000 111
2.5.68-mm2 1 0.95 0.88% 12.383 121.84 0.00000 0.00000 108
2.5.68 8 4.56 4.29% 19.193 122.64 0.00000 0.00000 106
2.5.68-mm2 8 0.96 0.85% 95.108 715.00 0.00000 0.00000 113
2.5.68 16 4.34 3.95% 40.724 212.21 0.00000 0.00000 110
2.5.68-mm2 16 0.99 0.80% 178.652 1203.69 0.00000 0.00000 123
2.5.68 32 3.28 3.40% 98.453 335.85 0.00000 0.00000 96
2.5.68-mm2 32 0.94 0.76% 357.853 2151.68 0.00000 0.00000 124
2.5.68 64 4.20 3.87% 137.963 647.04 0.00000 0.00000 108
2.5.68-mm2 64 0.91 0.79% 677.313 3973.72 0.00000 0.00000 115
2.5.68 128 4.18 4.03% 245.390 1693.66 0.00000 0.00000 104
2.5.68-mm2 128 0.90 0.76% 1275.112 7329.02 11.84476 0.00000 119
2.5.68 256 4.96 4.47% 285.231 6121.11 0.78125 0.00000 111
2.5.68-mm2 256 0.86 0.86% 2160.203 40955.72 32.13542 3.67187 99
For Sequential Writes on ext2, -mm2 has higher throughput and lower latency.
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 55.43 41.59% 0.173 3228.31 0.00000 0.00000 133
2.5.68-mm2 1 57.78 43.13% 0.164 3055.50 0.00000 0.00000 134
2.5.68 8 30.83 30.28% 2.473 21372.39 0.05684 0.00000 102
2.5.68-mm2 8 32.13 33.00% 2.281 20425.81 0.05011 0.00000 97
2.5.68 16 29.02 30.14% 4.886 36841.82 0.08054 0.00024 96
2.5.68-mm2 16 30.26 32.67% 4.616 33532.37 0.07949 0.00020 93
2.5.68 32 26.93 32.35% 9.834 76337.91 0.10024 0.03682 83
2.5.68-mm2 32 28.08 33.27% 9.433 75278.98 0.09423 0.01369 84
2.5.68 64 25.72 33.33% 19.158 134891.94 0.14043 0.07386 77
2.5.68-mm2 64 28.50 36.25% 18.455 133508.81 0.11492 0.06619 79
2.5.68 128 25.85 34.97% 35.961 266123.63 0.22740 0.09542 74
2.5.68-mm2 128 28.69 37.41% 33.453 217356.72 0.21301 0.08387 77
2.5.68 256 29.80 43.31% 60.387 463540.28 0.43515 0.12388 69
2.5.68-mm2 256 29.84 43.63% 60.796 404468.07 0.54049 0.11292 68
-mm2 does better with random writes.
Random Writes ext2
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 2.86 2.73% 1.059 60.94 0.00000 0.00000 105
2.5.68-mm2 1 4.48 3.94% 0.077 22.02 0.00000 0.00000 114
2.5.68 8 3.73 4.39% 1.176 81.25 0.00000 0.00000 85
2.5.68-mm2 8 4.09 3.91% 1.984 488.24 0.00000 0.00000 104
2.5.68 16 3.69 4.21% 1.872 189.26 0.00000 0.00000 88
2.5.68-mm2 16 4.00 4.45% 3.510 969.07 0.00000 0.00000 90
2.5.68 32 3.71 4.89% 2.102 352.52 0.00000 0.00000 76
2.5.68-mm2 32 4.03 5.62% 4.660 1455.09 0.00000 0.00000 72
2.5.68 64 3.71 5.68% 2.266 701.86 0.00000 0.00000 65
2.5.68-mm2 64 4.26 7.39% 2.334 1483.77 0.00000 0.00000 58
2.5.68 128 3.79 6.87% 1.343 1042.66 0.00000 0.00000 55
2.5.68-mm2 128 4.35 8.14% 0.853 275.49 0.00000 0.00000 53
2.5.68 256 3.79 6.70% 0.304 79.07 0.00000 0.00000 57
2.5.68-mm2 256 4.36 8.87% 2.487 3519.76 0.00000 0.00000 49
The bonnie++-1.02c random seek test on ext2 shows the same split as the
tiobench random read result.
Sequential Output ------------------ ----- Random -----
------ Block ----- ---- Rewrite ---- ----- Seeks -----
Kernel Size MB/sec %CPU Eff MB/sec %CPU Eff /sec %CPU Eff
2.5.68 8192 68.62 53.3 129 15.92 17.0 94 502.5 3.00 16750
2.5.68-mm2 8192 71.61 57.0 126 17.52 19.0 92 203.9 1.00 20393
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
* Re: [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: Andrew Morton @ 2003-04-25 23:25 UTC
To: rwhron; +Cc: linux-kernel
rwhron@earthlink.net wrote:
>
> There are a few benchmarks that have changed dramatically
> between 2.5.68 and 2.5.68-mm2.
>
> Machine is a quad P3 700 MHz Xeon with 1 MB cache.
> 3.75 GB RAM.
> RAID0 LUN
> QLogic 2200 Fibre Channel
SMP + SCSI + ext[23] + tiobench -> sad.
> One recent change is that -mm2 is 17-19% faster at tbench.
> The logfiles don't indicate any errors. Wonder what helped?
CPU scheduler changes perhaps. There are some in there, but they are
small.
> The autoconf-2.53 make/make check is a fork test. 2.5.68
> is about 13% faster here.
I wonder why. Which fs was it?
> On the AIM7 database test, -mm2 was about 18% faster and
> used about 15% more CPU time. (Real and CPU are in seconds.)
> The new QLogic driver helps AIM7.
iirc, AIM7 is dominated by lots of O_SYNC writes. I'd have expected the
anticipatory scheduler to do worse. Odd. Which fs was it?
> On the AIM7 shared test, -mm2 is 15-19% faster and
> uses about 5% more CPU time.
Might be the ext3 BKL removal?
>
> Processor, Processes - times in microseconds - smaller is better
> ----------------------------------------------------------------
> fork execve /bin/sh
> kernel process process process
> ------------- ------- ------- -------
> 2.5.68 243 979 4401
> 2.5.68-mm2 502 1715 5200
How odd. I'll check that.
> tiobench-0.3.3
tiobench will create a bunch of processes, each growing a large file, all
in the same directory. Doing this on SMP (especially with SCSI) just goes
straight for the jugular of the ext3 and ext2 block allocators. This is
why you saw such crap numbers with the fs comparison.
The tiobench datafiles end up being astonishingly fragmented.
You won't see the problem on uniprocessor, because each task can allocate
contiguous runs of blocks in a timeslice.
It's better (but still crap) on ext2 because ext2 manages to allocate
fragments of 8 blocks, not 1 block. If you bump
EXT2_DEFAULT_PREALLOC_BLOCKS from 8 to 32 you'll get better ext2 numbers.
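A minimal sketch of that tweak, assuming the constant still lives in
include/linux/ext2_fs.h in this tree (worth checking before patching):

	/* include/linux/ext2_fs.h: widen ext2's preallocation window */
	#define EXT2_DEFAULT_PREALLOC_BLOCKS	32	/* stock value is 8 */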
The benchmark is hitting a pathological case. Yeah, it's a problem, but
it's not as bad as tiobench indicates.
The place where it is more likely to hurt is when an application is slowly
growing more than one file. Something like:
	int i;

	for (i = 0; i < lots; i++) {
		write(fd1, buf1, 4096);	/* append a block to file 1 */
		write(fd2, buf2, 4096);	/* then to file 2; the two files' blocks interleave */
	}
the ext[23] block allocators need significant redesign to fix this up for
real.
* Re: [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: Nick Piggin @ 2003-05-01 18:10 UTC
To: rwhron; +Cc: linux-kernel
rwhron@earthlink.net wrote:
>>A run with deadline on mm would be nice to see.
>>
>
>Summary:
>Most benchmarks don't show much difference between 2.5.68-mm2 using
>anticipatory vs deadline scheduler. AIM7 had almost no difference.
>Tiobench has the most difference.
>
OK, thanks for that. I'd say it could be a TCQ or possibly
RAID or driver problem. I don't have anything in mind to
fix it yet but it has to be addressed at some point.
* Re: [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: rwhron @ 2003-04-30 0:59 UTC
To: piggin; +Cc: linux-kernel
> A run with deadline on mm would be nice to see.
Summary:
Most benchmarks don't show much difference between 2.5.68-mm2 using
anticipatory vs deadline scheduler. AIM7 had almost no difference.
Tiobench has the most difference.
Quad P3 Xeon
RAID 0 LUN
All of these runs are on the ext2 filesystem.
3.75 GB RAM.
tiobench-0.3.3
Unit information
================
File size = 8192 megabytes
Blk Size = 4096 bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
2.5.68 is included because both I/O schedulers in -mm2 did better at
random writes on ext2.
2.5.68-mm2-dl = deadline elevator.
For tiobench sequential reads on ext2, 2.5.68-mm2 with deadline has higher
throughput and lower latency. Note that the kernels that use the most CPU
time have the best performance.
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
--------------- --- ------------------------------------------------------------
2.5.68-mm2-dl 1 28.85 13.76% 0.403 635.44 0.00000 0.00000 210
2.5.68 1 28.77 13.23% 0.405 592.14 0.00000 0.00000 217
2.5.68-mm2 1 28.77 13.80% 0.404 659.18 0.00000 0.00000 208
2.5.68-mm2-dl 8 42.36 22.27% 2.147 1249.47 0.00000 0.00000 190
2.5.68 8 36.65 18.04% 2.542 945.37 0.00000 0.00000 203
2.5.68-mm2 8 23.96 11.15% 3.810 1219.85 0.00000 0.00000 215
2.5.68-mm2-dl 16 39.36 20.66% 4.534 1105.91 0.00000 0.00000 190
2.5.68 16 30.56 14.94% 6.080 1224.19 0.00000 0.00000 204
2.5.68-mm2 16 20.19 9.39% 8.953 2456.76 0.00000 0.00000 215
2.5.68-mm2-dl 32 36.47 18.93% 9.741 1412.80 0.00000 0.00000 193
2.5.68 32 27.74 13.84% 13.376 1498.48 0.00000 0.00000 200
2.5.68-mm2 32 20.15 9.50% 16.728 4424.53 0.00000 0.00000 212
2.5.68-mm2-dl 64 38.40 20.35% 17.873 4122.37 0.00000 0.00000 189
2.5.68 64 28.47 14.54% 25.294 6204.46 0.00005 0.00000 196
2.5.68-mm2 64 19.54 9.40% 32.600 12986.20 0.04410 0.00000 208
2.5.68-mm2-dl 128 39.11 20.61% 32.887 11232.22 0.02675 0.00000 190
2.5.68 128 29.87 14.99% 41.715 17752.22 0.10242 0.00000 199
2.5.68-mm2 128 19.28 9.21% 63.638 57459.95 1.27239 0.01006 209
2.5.68-mm2-dl 256 40.42 21.18% 58.473 36299.72 1.36814 0.00029 191
2.5.68 256 34.10 16.88% 64.697 51122.80 1.16358 0.01163 202
2.5.68-mm2 256 18.84 8.96% 125.350 164470.88 1.43795 0.14148 210
tiobench random reads on ext2 also show higher throughput and lower latency with
deadline. Higher CPU utilization again appears in the better-performing kernels.
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
--------------- --- ------------------------------------------------------------
2.5.68-mm2-dl 1 1.00 0.94% 11.687 108.48 0.00000 0.00000 107
2.5.68-mm2 1 0.95 0.88% 12.383 121.84 0.00000 0.00000 108
2.5.68 1 0.84 0.75% 14.003 120.98 0.00000 0.00000 111
2.5.68-mm2-dl 8 5.55 4.97% 16.053 122.95 0.00000 0.00000 112
2.5.68 8 4.56 4.29% 19.193 122.64 0.00000 0.00000 106
2.5.68-mm2 8 0.96 0.85% 95.108 715.00 0.00000 0.00000 113
2.5.68-mm2-dl 16 6.56 6.94% 25.925 228.94 0.00000 0.00000 95
2.5.68 16 4.34 3.95% 40.724 212.21 0.00000 0.00000 110
2.5.68-mm2 16 0.99 0.80% 178.652 1203.69 0.00000 0.00000 123
2.5.68-mm2-dl 32 5.65 5.78% 60.735 355.96 0.00000 0.00000 98
2.5.68 32 3.28 3.40% 98.453 335.85 0.00000 0.00000 96
2.5.68-mm2 32 0.94 0.76% 357.853 2151.68 0.00000 0.00000 124
2.5.68-mm2-dl 64 5.94 5.69% 108.393 553.00 0.00000 0.00000 104
2.5.68 64 4.20 3.87% 137.963 647.04 0.00000 0.00000 108
2.5.68-mm2 64 0.91 0.79% 677.313 3973.72 0.00000 0.00000 115
2.5.68-mm2-dl 128 7.13 7.08% 163.155 1793.65 0.00000 0.00000 101
2.5.68 128 4.18 4.03% 245.390 1693.66 0.00000 0.00000 104
2.5.68-mm2 128 0.90 0.76% 1275.112 7329.02 11.84476 0.00000 119
2.5.68-mm2-dl 256 7.33 7.18% 249.025 4519.38 0.00000 0.00000 102
2.5.68 256 4.96 4.47% 285.231 6121.11 0.78125 0.00000 111
2.5.68-mm2 256 0.86 0.86% 2160.203 40955.72 32.13542 3.67187 99
For tiobench random writes on ext2, something in -mm2 gives both elevators an edge
over 2.5.68. Possibly from CONFIG_SCSI_QLOGIC_ISP_NEW=y.
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
--------------- --- ------------------------------------------------------------
2.5.68-mm2-dl 1 4.54 3.73% 0.076 20.17 0.00000 0.00000 122
2.5.68-mm2 1 4.48 3.94% 0.077 22.02 0.00000 0.00000 114
2.5.68 1 2.86 2.73% 1.059 60.94 0.00000 0.00000 105
2.5.68-mm2-dl 8 4.55 6.24% 0.800 186.48 0.00000 0.00000 73
2.5.68-mm2 8 4.09 3.91% 1.984 488.24 0.00000 0.00000 104
2.5.68 8 3.73 4.39% 1.176 81.25 0.00000 0.00000 85
2.5.68-mm2-dl 16 4.49 6.12% 2.378 241.77 0.00000 0.00000 73
2.5.68-mm2 16 4.00 4.45% 3.510 969.07 0.00000 0.00000 90
2.5.68 16 3.69 4.21% 1.872 189.26 0.00000 0.00000 88
2.5.68-mm2-dl 32 4.41 6.50% 1.871 324.33 0.00000 0.00000 68
2.5.68-mm2 32 4.03 5.62% 4.660 1455.09 0.00000 0.00000 72
2.5.68 32 3.71 4.89% 2.102 352.52 0.00000 0.00000 76
2.5.68-mm2-dl 64 4.44 7.49% 1.768 235.32 0.00000 0.00000 59
2.5.68-mm2 64 4.26 7.39% 2.334 1483.77 0.00000 0.00000 58
2.5.68 64 3.71 5.68% 2.266 701.86 0.00000 0.00000 65
2.5.68-mm2-dl 128 4.42 8.27% 1.640 1477.07 0.00000 0.00000 53
2.5.68-mm2 128 4.35 8.14% 0.853 275.49 0.00000 0.00000 53
2.5.68 128 3.79 6.87% 1.343 1042.66 0.00000 0.00000 55
2.5.68-mm2-dl 256 4.37 8.75% 0.689 1039.17 0.00000 0.00000 50
2.5.68-mm2 256 4.36 8.87% 2.487 3519.76 0.00000 0.00000 49
2.5.68 256 3.79 6.70% 0.304 79.07 0.00000 0.00000 57
The anticipatory scheduler had more throughput on dbench 192 on ext2.
dbench 192 processes Average High Low
2.5.68-mm2 203.79 215.85 192.65 MB/sec
2.5.68-mm2-dl 198.72 212.53 182.85 MB/sec
bonnie++-1.02a on ext2, eight 1024 MB files. The new QLogic driver may be the
difference between -mm and 2.5.68.
--------------------- Sequential Output ------------------
---- Per Char ----- ------ Block ----- ---- Rewrite ----
Kernel MB/sec %CPU Eff MB/sec %CPU Eff MB/sec %CPU Eff
2.5.68-mm2 9.42 99.0 9.51 71.61 57.0 126 17.52 19.0 92
2.5.68-mm2-dl 9.37 99.0 9.47 71.79 57.0 126 17.00 18.0 94
2.5.68 9.50 99.0 9.60 68.62 53.3 129 15.92 17.0 94
The deadline elevator does much better on the bonnie++ random seek test.
-------- Sequential Input ---------- ----- Random -----
---- Per Char --- ----- Block ----- ----- Seeks -----
Kernel MB/sec %CPU Eff MB/sec %CPU Eff /sec %CPU Eff
2.5.68 9.38 99.0 9.5 26.98 19.3 140 502.5 3.00 16750
2.5.68-mm2 9.28 98.0 9.5 27.29 20.0 136 203.9 1.00 20393
2.5.68-mm2-dl 9.34 98.0 9.5 27.62 20.0 138 553.1 3.67 15084
Not much difference on the bonnie++ small files tests.
--------- Sequential -----------------
----- Create ----- ---- Delete ----
files /sec %CPU Eff /sec %CPU Eff
2.5.68 65536 151 99.0 153 61933 99.3 6234
2.5.68-mm2 65536 155 99.0 156 60020 99.0 6062
2.5.68-mm2-dl 65536 147 99.0 148 61431 99.7 6163
------------------- Random ----------
----- Create ---- ---- Delete ----
/sec %CPU Eff /sec %CPU Eff
2.5.68 155 100.0 155 536 100.0 536
2.5.68-mm2 152 99.0 153 525 99.0 530
2.5.68-mm2-dl 151 99.0 153 518 99.0 523
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
* Re: [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: rwhron @ 2003-04-28 21:58 UTC
To: piggin; +Cc: linux-kernel
> A run with deadline on mm would be nice to see.
I've just kicked off elevator=deadline on 2.5.68-mm2.
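(That's via the standard elevator= boot parameter; assuming lilo on this box,
something like

	append="elevator=deadline"

in lilo.conf, or the same string on the kernel command line.)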
> Are you using TCQ by any chance? If so, what do the results look
> like with TCQ off?
Any idea how to turn off TCQ with the qlogicisp driver?
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
* Re: [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: Nick Piggin @ 2003-04-26 3:11 UTC
To: Nick Piggin; +Cc: rwhron, akpm, linux-kernel
Nick Piggin wrote:
>
>
> rwhron@earthlink.net wrote:
>
>>> The benchmark is hitting a pathological case. Yeah, it's a
>>> problem, but it's not as bad as tiobench indicates.
>>>
> It's interesting that deadline does so much better for this case
> though. I wonder if any other differences in mm could cause it?
> A run with deadline on mm would be nice to see. IIRC my tests
> showed AS doing as well or better than deadline in SMP tiobench.
> The bad random read performance is a problem too, because the
> fragmentation should only add to the randomness.
> I'll have to investigate this further.
>
Are you using TCQ by any chance? If so, what do the results look
like with TCQ off?
* Re: [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: Nick Piggin @ 2003-04-26 2:20 UTC
To: rwhron; +Cc: akpm, linux-kernel
rwhron@earthlink.net wrote:
>>>On the AIM7 database test, -mm2 was about 18% faster and
>>>
>
>>iirc, AIM7 is dominated by lots of O_SYNC writes. I'd have expected the
>>anticipatory scheduler to do worse. Odd. Which fs was it?
>>
>
>That was ext2 too.
>
Well that's nice for a change. The set of workloads which do worse
on AS are obviously more specific than sync writes. I think it's
multiple threads reading and writing to a similar area of the
disk. Not sure though.
>
>
>>tiobench will create a bunch of processes, each growing a large file, all
>>in the same directory.
>>
>
>>The benchmark is hitting a pathologoical case. Yeah, it's a problem, but
>>it's not as bad as tiobench indicates.
>>
It's interesting that deadline does so much better for this case
though. I wonder if any other differences in mm could cause it?
A run with deadline on mm would be nice to see. IIRC my tests
showed AS doing as well or better than deadline in SMP tiobench.
The bad random read performance is a problem too, because the
fragmentation should only add to the randomness.
I'll have to investigate this further.
>
>Oracle doing reads/writes to preallocated, contiguous files is more
>important than tiobench. Oracle datafiles are typically created
>sequentially, which wouldn't exercise the pathology.
>
>I pay more attention to the OSDL-DBT-3 and "Winmark I" numbers than
>the i/o stuff I run. (I look at my numbers more, but care about
>theirs more).
>
>What about the behavior where CPU utilization goes down as thread
>count goes up? Is she just i/o bound?
>
>Sequential Reads ext2
> Num
>Kernel Thr Rate (CPU%)
>---------- --- ----- ------
>2.5.68 8 36.65 18.04%
>2.5.68-mm2 8 23.96 11.15%
>
>2.5.68 256 34.10 16.88%
>2.5.68-mm2 256 18.84 8.96%
>
Well it's doing less IO. Look at CPU efficiency, which appears to
be rate / CPU%. It is steady: a given amount of IO costs a given
amount of CPU.
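For example, from the 256-thread rows quoted: 34.10 / 0.1688 ~= 202 and
18.84 / 0.0896 ~= 210, so the efficiency barely moves even though the rate
nearly halves.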
* Re: [BENCHMARK] 2.5.68 and 2.5.68-mm2
From: rwhron @ 2003-04-26 1:58 UTC
To: akpm; +Cc: linux-kernel
>> The autoconf-2.53 make/make check is a fork test. 2.5.68
>> is about 13% faster here.
> I wonder why. Which fs was it?
That was on ext2. There isn't much I/O in autoconf "make check".
It's a lot of small Perl scripts, m4 and gcc on tiny files.
>> On the AIM7 database test, -mm2 was about 18% faster and
> iirc, AIM7 is dominated by lots of O_SYNC writes. I'd have expected the
> anticipatory scheduler to do worse. Odd. Which fs was it?
That was ext2 too.
> tiobench will create a bunch of processes, each growing a large file, all
> in the same directory.
> The benchmark is hitting a pathologoical case. Yeah, it's a problem, but
> it's not as bad as tiobench indicates.
Oracle doing reads/writes to preallocated, contiguous files is more
important than tiobench. Oracle datafiles are typically created
sequentially, which wouldn't exercise the pathology.
I pay more attention to the OSDL-DBT-3 and "Winmark I" numbers than
the i/o stuff I run. (I look at my numbers more, but care about
theirs more).
What about the behavior where CPU utilization goes down as thread
count goes up? Is she just i/o bound?
Sequential Reads ext2
Num
Kernel Thr Rate (CPU%)
---------- --- ----- ------
2.5.68 8 36.65 18.04%
2.5.68-mm2 8 23.96 11.15%
2.5.68 256 34.10 16.88%
2.5.68-mm2 256 18.84 8.96%
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html