linux-kernel.vger.kernel.org archive mirror
* [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
@ 2002-11-09  2:00 Con Kolivas
  2002-11-09  2:36 ` Andrew Morton
  2002-11-10  2:44 ` Andrea Arcangeli
  0 siblings, 2 replies; 47+ messages in thread
From: Con Kolivas @ 2002-11-09  2:00 UTC (permalink / raw)
  To: linux kernel mailing list; +Cc: marcelo, Andrea Arcangeli

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here are some contest benchmarks of recent 2.4 kernels (this is mainly to test 
2.4.20-rc1/aa1):

noload:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [5]              71.7    93      0       0       1.00
2.4.19 [5]              69.0    97      0       0       0.97
2.4.19-ck9 [2]          68.8    97      0       0       0.96
2.4.20-rc1 [3]          72.2    93      0       0       1.01
2.4.20-rc1aa1 [1]       71.9    94      0       0       1.01

cacherun:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [2]              66.6    99      0       0       0.93
2.4.19 [2]              68.0    99      0       0       0.95
2.4.19-ck9 [2]          66.1    99      0       0       0.93
2.4.20-rc1 [3]          67.2    99      0       0       0.94
2.4.20-rc1aa1 [1]       67.4    99      0       0       0.94

process_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              109.5   57      119     44      1.53
2.4.19 [3]              106.5   59      112     43      1.49
2.4.19-ck9 [2]          94.3    70      83      32      1.32
2.4.20-rc1 [3]          110.7   58      119     43      1.55
2.4.20-rc1aa1 [3]       110.5   58      117     43      1.55

ctar_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              117.4   63      1       7       1.64
2.4.19 [2]              106.5   70      1       8       1.49
2.4.19-ck9 [2]          110.5   71      1       9       1.55
2.4.20-rc1 [3]          102.1   72      1       7       1.43
2.4.20-rc1aa1 [3]       107.1   69      1       7       1.50

xtar_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              150.8   49      2       8       2.11
2.4.19 [1]              132.4   55      2       9       1.85
2.4.19-ck9 [2]          138.6   58      2       11      1.94
2.4.20-rc1 [3]          180.7   40      3       8       2.53
2.4.20-rc1aa1 [3]       166.6   44      2       7       2.33

First noticeable difference. With repeated extraction of tars while compiling 
kernels, 2.4.20-rc1 seems to be slower, and aa1 curbs it just a little.

io_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              474.1   15      36      10      6.64
2.4.19 [3]              492.6   14      38      10      6.90
2.4.19-ck9 [2]          140.6   49      5       5       1.97
2.4.20-rc1 [2]          1142.2  6       90      10      16.00
2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86

Well, this is interesting. 2.4.20-rc1 seems to have improved its ability to do 
IO work. Unfortunately it is now busy starving the scheduler in the meantime, 
much like the 2.5 kernels did before the deadline scheduler was put in.

read_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              102.3   70      6       3       1.43
2.4.19 [2]              134.1   54      14      5       1.88
2.4.19-ck9 [2]          77.4    85      11      9       1.08
2.4.20-rc1 [3]          173.2   43      20      5       2.43
2.4.20-rc1aa1 [3]       150.6   51      16      5       2.11

Also a noticeable difference: repeatedly reading a large file while trying to 
compile a kernel has slowed down in 2.4.20-rc1, and aa1 blunts this effect 
somewhat.

list_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              90.2    76      1       17      1.26
2.4.19 [1]              89.8    77      1       20      1.26
2.4.19-ck9 [2]          85.2    79      1       22      1.19
2.4.20-rc1 [3]          88.8    77      0       12      1.24
2.4.20-rc1aa1 [1]       88.1    78      1       16      1.23

mem_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              103.3   70      32      3       1.45
2.4.19 [3]              100.0   72      33      3       1.40
2.4.19-ck9 [2]          78.3    88      31      8       1.10
2.4.20-rc1 [3]          105.9   69      32      2       1.48
2.4.20-rc1aa1 [1]       106.3   69      33      3       1.49

It would seem most of the changes from 2.4.19 to 2.4.20-rc1 are consistent 
with increased IO throughput, but this comes at the expense of doing other 
tasks. The -aa add-ons help with this, but surprisingly not with mem_load.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zGw5F6dfvkL3i1gRAsN8AKCMg2QvnGMhdMlGRdT7sR01ui6gogCbBrxy
imqAHOMc9ZXwAjoohbd9av4=
=Plvk
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  2:00 [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest Con Kolivas
@ 2002-11-09  2:36 ` Andrew Morton
  2002-11-09  3:26   ` Con Kolivas
  2002-11-10  2:44 ` Andrea Arcangeli
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2002-11-09  2:36 UTC (permalink / raw)
  To: Con Kolivas, Jens Axboe
  Cc: linux kernel mailing list, marcelo, Andrea Arcangeli

Con Kolivas wrote:
> 
> io_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.18 [3]              474.1   15      36      10      6.64
> 2.4.19 [3]              492.6   14      38      10      6.90
> 2.4.19-ck9 [2]          140.6   49      5       5       1.97
> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
> 

2.4.20-pre3 included some elevator changes.  I assume they are the
cause of this.  Those changes have propagated into Alan's and Andrea's
kernels.   Hence they have significantly impacted the responsiveness
of all mainstream 2.4 kernels under heavy writes.

(The -ck patch includes rmap14b which includes the read-latency2 thing)


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  2:36 ` Andrew Morton
@ 2002-11-09  3:26   ` Con Kolivas
  2002-11-09  4:15     ` Andrew Morton
  0 siblings, 1 reply; 47+ messages in thread
From: Con Kolivas @ 2002-11-09  3:26 UTC (permalink / raw)
  To: Andrew Morton, Jens Axboe
  Cc: linux kernel mailing list, marcelo, Andrea Arcangeli

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>Con Kolivas wrote:
>> io_load:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2.4.18 [3]              474.1   15      36      10      6.64
>> 2.4.19 [3]              492.6   14      38      10      6.90
>> 2.4.19-ck9 [2]          140.6   49      5       5       1.97
>> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
>> 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
>
>2.4.20-pre3 included some elevator changes.  I assume they are the
>cause of this.  Those changes have propagated into Alan's and Andrea's
>kernels.   Hence they have significantly impacted the responsiveness
>of all mainstream 2.4 kernels under heavy writes.
>
>(The -ck patch includes rmap14b which includes the read-latency2 thing)

Thanks for the explanation. I should have said this was ck with compressed 
caching, not rmap.

Con.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zIB8F6dfvkL3i1gRAs6lAJ0f7E9HTlNl5cOaDnmSfw9gi0QLQgCfV3jh
kaG/a1TzlUviOGz5Ci895uA=
=TyH7
-----END PGP SIGNATURE-----



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  3:26   ` Con Kolivas
@ 2002-11-09  4:15     ` Andrew Morton
  2002-11-09  5:12       ` Con Kolivas
  2002-11-09 11:20       ` Jens Axboe
  0 siblings, 2 replies; 47+ messages in thread
From: Andrew Morton @ 2002-11-09  4:15 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Jens Axboe, linux kernel mailing list, marcelo, Andrea Arcangeli

Con Kolivas wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> >Con Kolivas wrote:
> >> io_load:
> >> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> >> 2.4.18 [3]              474.1   15      36      10      6.64
> >> 2.4.19 [3]              492.6   14      38      10      6.90
> >> 2.4.19-ck9 [2]          140.6   49      5       5       1.97
> >> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> >> 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
> >
> >2.4.20-pre3 included some elevator changes.  I assume they are the
> >cause of this.  Those changes have propagated into Alan's and Andrea's
> >kernels.   Hence they have significantly impacted the responsiveness
> >of all mainstream 2.4 kernels under heavy writes.
> >
> >(The -ck patch includes rmap14b which includes the read-latency2 thing)
> 
> Thanks for the explanation. I should have said this was ck with compressed
> caching; not rmap.
> 

hrm.  In that case I'll shut up with the speculating.

You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
Maybe it doesn't translate to worsened interactivity.  Needs more
testing and analysis.


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  4:15     ` Andrew Morton
@ 2002-11-09  5:12       ` Con Kolivas
  2002-11-09 11:21         ` Jens Axboe
  2002-11-09 11:20       ` Jens Axboe
  1 sibling, 1 reply; 47+ messages in thread
From: Con Kolivas @ 2002-11-09  5:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jens Axboe, linux kernel mailing list, marcelo, Andrea Arcangeli

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>hrm.  In that case I'll shut up with the speculating.

Please don't stop speculating. I and many others rely on someone like 
yourself, who is more likely to understand what is going on, to comment. I 
can't expect you to know exactly what goes into every patchset out there. 
Your input has been invaluable and is most of the drive behind my 
benchmarking.

>You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
>Maybe it doesn't translate to worsened interactivity.  Needs more
>testing and anaysis.

Sounds fair enough. My resources are exhausted though. Someone else have any 
thoughts?

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zJknF6dfvkL3i1gRAnHLAKCRuTqBfxqX582puVwQ/hBb0T0R1QCePyws
0N9uKoKVY/M22gses+MkEnE=
=UvJP
-----END PGP SIGNATURE-----



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  4:15     ` Andrew Morton
  2002-11-09  5:12       ` Con Kolivas
@ 2002-11-09 11:20       ` Jens Axboe
  1 sibling, 0 replies; 47+ messages in thread
From: Jens Axboe @ 2002-11-09 11:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Con Kolivas, linux kernel mailing list, marcelo, Andrea Arcangeli

On Fri, Nov 08 2002, Andrew Morton wrote:
> Con Kolivas wrote:
> > 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> > 
> > >Con Kolivas wrote:
> > >> io_load:
> > >> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> > >> 2.4.18 [3]              474.1   15      36      10      6.64
> > >> 2.4.19 [3]              492.6   14      38      10      6.90
> > >> 2.4.19-ck9 [2]          140.6   49      5       5       1.97
> > >> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> > >> 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
> > >
> > >2.4.20-pre3 included some elevator changes.  I assume they are the
> > >cause of this.  Those changes have propagated into Alan's and Andrea's
> > >kernels.   Hence they have significantly impacted the responsiveness
> > >of all mainstream 2.4 kernels under heavy writes.
> > >
> > >(The -ck patch includes rmap14b which includes the read-latency2 thing)
> > 
> > Thanks for the explanation. I should have said this was ck with compressed
> > caching; not rmap.
> > 
> 
> hrm.  In that case I'll shut up with the speculating.
> 
> You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
> Maybe it doesn't translate to worsened interactivity.  Needs more
> testing and anaysis.

The merging and seek accounting in 2.4.19 is completely off; it doesn't
make any sense. 2.4.20-rc1 should be sanely tweakable.

-- 
Jens Axboe



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  5:12       ` Con Kolivas
@ 2002-11-09 11:21         ` Jens Axboe
  2002-11-09 13:09           ` Con Kolivas
  0 siblings, 1 reply; 47+ messages in thread
From: Jens Axboe @ 2002-11-09 11:21 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Andrew Morton, linux kernel mailing list, marcelo, Andrea Arcangeli

On Sat, Nov 09 2002, Con Kolivas wrote:
> >You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
> >Maybe it doesn't translate to worsened interactivity.  Needs more
> >testing and anaysis.
> 
> Sounds fair enough. My resources are exhausted though. Someone else have any 
> thoughts?

Try setting lower elevator passover values. Something à la

# elvtune -r 64 /dev/hda

(or whatever your drive is)

-- 
Jens Axboe



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 11:21         ` Jens Axboe
@ 2002-11-09 13:09           ` Con Kolivas
  2002-11-09 13:35             ` Stephen Lord
  2002-11-09 13:54             ` Jens Axboe
  0 siblings, 2 replies; 47+ messages in thread
From: Con Kolivas @ 2002-11-09 13:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, linux kernel mailing list, marcelo, Andrea Arcangeli

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>On Sat, Nov 09 2002, Con Kolivas wrote:
>> >You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
>> >Maybe it doesn't translate to worsened interactivity.  Needs more
>> >testing and anaysis.
>>
>> Sounds fair enough. My resources are exhausted though. Someone else have
>> any thoughts?
>
>Try setting lower elevator passover values. Something ala
>
># elvtune -r 64 /dev/hda
>
>(or whatever your drive is)

Here's some more data:

io_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.20-rc1 [2]          1142.2  6       90      10      16.00
2420rc1r64 [3]          575.0   12      43      10      8.05

That's it then. Should I run a family of different values and, if so, over 
what range?

Cheers,
Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zQkXF6dfvkL3i1gRAggJAKCOAWzrTxFlnPbOftzMAXPnvI7KVQCfWqUC
iDVmD1UcPDNPWCfQmlBF9yk=
=Q299
-----END PGP SIGNATURE-----



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 13:09           ` Con Kolivas
@ 2002-11-09 13:35             ` Stephen Lord
  2002-11-09 13:54             ` Jens Axboe
  1 sibling, 0 replies; 47+ messages in thread
From: Stephen Lord @ 2002-11-09 13:35 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Jens Axboe, Andrew Morton, Linux Kernel Mailing List,
	Marcelo Tosatti, Andrea Arcangeli

On Sat, 2002-11-09 at 07:09, Con Kolivas wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> >On Sat, Nov 09 2002, Con Kolivas wrote:
> >> >You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
> >> >Maybe it doesn't translate to worsened interactivity.  Needs more
> >> >testing and anaysis.
> >>
> >> Sounds fair enough. My resources are exhausted though. Someone else have
> >> any thoughts?
> >
> >Try setting lower elevator passover values. Something ala
> >
> ># elvtune -r 64 /dev/hda
> >
> >(or whatever your drive is)
> 
> Heres some more data:
> 
> io_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> 2420rc1r64 [3]          575.0   12      43      10      8.05
> 
> That's it then. Should I run a family of different values and if so over what 
> range?
> 


There is more going on than this: XFS suffered a major slowdown in some
metadata write-only benchmarks, namely the file create/delete phase of
bonnie++. That's a single app doing only writes, with a slowdown on the
order of 500% to 600%. Since we did not follow the pre kernels into
2.4.20 we do not really know when it was introduced, and there is
a possibility XFS itself has not followed some API change.

Steve





* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 13:09           ` Con Kolivas
  2002-11-09 13:35             ` Stephen Lord
@ 2002-11-09 13:54             ` Jens Axboe
  2002-11-09 21:12               ` Arador
                                 ` (2 more replies)
  1 sibling, 3 replies; 47+ messages in thread
From: Jens Axboe @ 2002-11-09 13:54 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Andrew Morton, linux kernel mailing list, marcelo, Andrea Arcangeli

On Sun, Nov 10 2002, Con Kolivas wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> >On Sat, Nov 09 2002, Con Kolivas wrote:
> >> >You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
> >> >Maybe it doesn't translate to worsened interactivity.  Needs more
> >> >testing and anaysis.
> >>
> >> Sounds fair enough. My resources are exhausted though. Someone else have
> >> any thoughts?
> >
> >Try setting lower elevator passover values. Something ala
> >
> ># elvtune -r 64 /dev/hda
> >
> >(or whatever your drive is)
> 
> Heres some more data:
> 
> io_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> 2420rc1r64 [3]          575.0   12      43      10      8.05
> 
> That's it then. Should I run a family of different values and if so
> over what range?

The default is 2048. How long does the io_load test take, or rather how
many tests are appropriate to do? To get a good picture of how it looks
you should probably try: 0, 8, 16, 64, 128, 512. Once you get some of
these results, it will be easier to determine which area(s) would be
most interesting to further explore.

There's also the write passover; I don't think it will have much impact
on this test, though.
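The sweep suggested above can be scripted. A minimal sketch, assuming /dev/hda 
is the test drive and using a hypothetical `contest io_load` stand-in for 
however one triggers a single io_load run (contest's real invocation isn't 
shown in this thread):

```shell
#!/bin/sh
# Sweep the read passover downward from the 2048 default and run the
# io_load test at each setting. `contest io_load` is a placeholder.
for r in 0 8 16 64 128 512; do
    elvtune -r "$r" /dev/hda    # set the read passover for this pass
    contest io_load             # hypothetical single-load invocation
done
```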

-- 
Jens Axboe



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 13:54             ` Jens Axboe
@ 2002-11-09 21:12               ` Arador
  2002-11-10  2:26                 ` Andrea Arcangeli
  2002-11-09 21:53               ` Con Kolivas
  2002-11-10 10:12               ` Kjartan Maraas
  2 siblings, 1 reply; 47+ messages in thread
From: Arador @ 2002-11-09 21:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: conman, akpm, linux-kernel, marcelo, andrea

On Sat, 9 Nov 2002 14:54:46 +0100
Jens Axboe <axboe@suse.de> wrote:

> The default is 2048. How long does the io_load test take, or rather how

Then shouldn't the default be changed? There's a big performance drop (a
factor of two, in that case of course).


Diego Calleja


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 13:54             ` Jens Axboe
  2002-11-09 21:12               ` Arador
@ 2002-11-09 21:53               ` Con Kolivas
  2002-11-10 10:09                 ` Jens Axboe
  2002-11-10 10:12               ` Kjartan Maraas
  2 siblings, 1 reply; 47+ messages in thread
From: Con Kolivas @ 2002-11-09 21:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, linux kernel mailing list, marcelo, Andrea Arcangeli

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>On Sun, Nov 10 2002, Con Kolivas wrote:
>> >On Sat, Nov 09 2002, Con Kolivas wrote:
>> >> >You're showing a big shift in behaviour between 2.4.19 and 2.4.20-rc1.
>> >> >Maybe it doesn't translate to worsened interactivity.  Needs more
>> >> >testing and anaysis.
>> >>
>> >> Sounds fair enough. My resources are exhausted though. Someone else
>> >> have any thoughts?
>> >
>> >Try setting lower elevator passover values. Something ala
>> >
>> ># elvtune -r 64 /dev/hda
>> >
>> >(or whatever your drive is)
>>

>> That's it then. Should I run a family of different values and if so
>> over what range?
>
>The default is 2048. How long does the io_load test take, or rather how
>many tests are appropriate to do? To get a good picture of how it looks
>you should probably try: 0, 8, 16, 64, 128, 512. Once you get some of
>these results, it will be easier to determine which area(s) would be
>most interesting to further explore.

The io_load test takes as long as the time in seconds shown in the table. At 
least 3 runs are appropriate to get a reasonable average (the number of runs 
is shown in square brackets). Therefore it takes about half an hour per run. 
Luckily I had the benefit of a night to set up a whole lot of runs:

io_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2420rc1r0 [3]           489.3   15      36      10      6.85
2420rc1r8 [3]           485.5   15      35      10      6.80
2420rc1r16 [3]          570.4   12      43      10      7.99
2420rc1r32 [3]          570.1   12      42      10      7.98
2420rc1r64 [3]          575.0   12      43      10      8.05
2420rc1r128 [3]         611.4   11      46      10      8.56
2420rc1r256 [3]         646.2   11      49      10      9.05
2420rc1r512 [3]         603.7   12      45      10      8.46
2420rc1r1024 [3]        693.9   10      53      10      9.72
2.4.20-rc1 [2]          1142.2  6       90      10      16.00

Test hardware is a 1133MHz P3 laptop with a 5400rpm ATA100 drive. I don't 
doubt the response curve would be different on other hardware.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zYPmF6dfvkL3i1gRAlgQAJ9wbCJUc6OesGsuR+S2YHi2+zzRuACePEPJ
MIVeNptM2zdnvEFPZXCWMO8=
=7M4k
-----END PGP SIGNATURE-----



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 21:12               ` Arador
@ 2002-11-10  2:26                 ` Andrea Arcangeli
  0 siblings, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-10  2:26 UTC (permalink / raw)
  To: Arador; +Cc: Jens Axboe, conman, akpm, linux-kernel, marcelo

On Sat, Nov 09, 2002 at 10:12:06PM +0100, Arador wrote:
> On Sat, 9 Nov 2002 14:54:46 +0100
> Jens Axboe <axboe@suse.de> wrote:
> 
> > The default is 2048. How long does the io_load test take, or rather how
> 
> then, shouldn't the default be changed?. There's a big performance drop (/2)
> (in that case of course)

It depends which side you are benchmarking; more throughput doesn't always
mean less interactivity, but at some point (when the extra throughput
can't pay off for the reordering anymore) it does.

You should definitely benchmark 2.4.19-ck9 and 2.4.20rc1aa2 with dbench
too. Those numbers as they are don't show the whole picture.

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  2:00 [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest Con Kolivas
  2002-11-09  2:36 ` Andrew Morton
@ 2002-11-10  2:44 ` Andrea Arcangeli
  2002-11-10  3:56   ` Matt Reppert
                     ` (2 more replies)
  1 sibling, 3 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-10  2:44 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, marcelo

On Sat, Nov 09, 2002 at 01:00:19PM +1100, Con Kolivas wrote:
> xtar_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.18 [3]              150.8   49      2       8       2.11
> 2.4.19 [1]              132.4   55      2       9       1.85
> 2.4.19-ck9 [2]          138.6   58      2       11      1.94
> 2.4.20-rc1 [3]          180.7   40      3       8       2.53
> 2.4.20-rc1aa1 [3]       166.6   44      2       7       2.33

These numbers don't make sense. Can you describe what xtar_load is
doing?

> First noticeable difference. With repeated extracting of tars while compiling 
> kernels 2.4.20-rc1 seems to be slower and aa1 curbs it just a little.
> 
> io_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.18 [3]              474.1   15      36      10      6.64
> 2.4.19 [3]              492.6   14      38      10      6.90
> 2.4.19-ck9 [2]          140.6   49      5       5       1.97
> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86

What are you benchmarking, tar or the kernel compile? I think the
latter. That's the elevator and the size of the I/O queue here, nothing
else. Hacks like read-latency aren't very nice, in particular with
async-io aware apps. If this improvement in ck9 was achieved by decreasing
the queue size, it'll be interesting to see how much sequential I/O
is slowed down; it's very possible we have queues that are too big for
some devices.

> Well this is interesting. 2.4.20-rc1 seems to have improved it's ability to do 
> IO work. Unfortunately it is now busy starving the scheduler in the mean 
> time, much like the 2.5 kernels did before the deadline scheduler was put in.
> 
> read_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.18 [3]              102.3   70      6       3       1.43
> 2.4.19 [2]              134.1   54      14      5       1.88
> 2.4.19-ck9 [2]          77.4    85      11      9       1.08
> 2.4.20-rc1 [3]          173.2   43      20      5       2.43
> 2.4.20-rc1aa1 [3]       150.6   51      16      5       2.11

What is busy starving the scheduler? This sounds like it's again just an
elevator benchmark. I don't buy your scheduler claims; give more
explanations or I'll take it as vapourware wording. I very much doubt
you can find any single problem in the scheduler of rc1aa2, or that the
scheduler in rc1aa1 has a chance to run slower than the one in 2.4.19 in
an I/O benchmark. OK, it still misses the NUMA algorithm, but that's not a
bug, just a missing feature; it'll soon be fixed too, and it doesn't
matter for normal SMP non-NUMA machines out there.

> Also a noticeable difference, repeatedly reading a large file while trying to 
> compile a kernel has slowed down in 2.4.20-rc1 and aa1 blunts this effect 
> somewhat.
> 
> list_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.18 [3]              90.2    76      1       17      1.26
> 2.4.19 [1]              89.8    77      1       20      1.26
> 2.4.19-ck9 [2]          85.2    79      1       22      1.19
> 2.4.20-rc1 [3]          88.8    77      0       12      1.24
> 2.4.20-rc1aa1 [1]       88.1    78      1       16      1.23
> 
> mem_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.18 [3]              103.3   70      32      3       1.45
> 2.4.19 [3]              100.0   72      33      3       1.40
> 2.4.19-ck9 [2]          78.3    88      31      8       1.10
> 2.4.20-rc1 [3]          105.9   69      32      2       1.48
> 2.4.20-rc1aa1 [1]       106.3   69      33      3       1.49

Again, ck9 is faster because of elevator hacks à la read-latency.

In short, your whole benchmark seems to be all about interactivity of
reads during a write flood. That's the read-latency thing, or whatever
else you could do to ll_rw_block.c.

In short if somebody runs fast in something like this:

	cp /dev/zero . & time cp bigfile /dev/null

he will win your whole contest too.

please show the diff between
2.4.19-ck9/drivers/block/{ll_rw_blk,elevator}.c and
2.4.19/drivers/block/...

All the difference is there, and it will hurt you badly if you do
async-io benchmarks, and possibly dbench too. So you should always
accompany your benchmark with simultaneous async-io read/write bandwidth
and dbench, or I could always win your contest by shipping a very bad
kernel. Either that, or change the name of your project; if somebody wins
this contest it probably means a bad I/O scheduler in many other aspects,
which is one of the reasons I didn't merge read-latency from Andrew.

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10  2:44 ` Andrea Arcangeli
@ 2002-11-10  3:56   ` Matt Reppert
  2002-11-10  9:58   ` Con Kolivas
  2002-11-10 19:32   ` Rik van Riel
  2 siblings, 0 replies; 47+ messages in thread
From: Matt Reppert @ 2002-11-10  3:56 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: conman, linux-kernel, marcelo

Purely for information's sake ...

On Sun, 10 Nov 2002 03:44:51 +0100
Andrea Arcangeli <andrea@suse.de> wrote:

> On Sat, Nov 09, 2002 at 01:00:19PM +1100, Con Kolivas wrote:
> > xtar_load:
> > Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> > 2.4.18 [3]              150.8   49      2       8       2.11
> > 2.4.19 [1]              132.4   55      2       9       1.85
> > 2.4.19-ck9 [2]          138.6   58      2       11      1.94
> > 2.4.20-rc1 [3]          180.7   40      3       8       2.53
> > 2.4.20-rc1aa1 [3]       166.6   44      2       7       2.33
> 
> these numbers doesn't make sense. Can you describe what xtar_load is
> doing?

Repeatedly extracting tars while compiling kernels.

Andrea, I think you mixed up which descriptions go with which tables. They
come *under* the numbers, not above, commenting only on the test directly
above them. (E.g. "First noticeable difference" is about xtar_load.)

Yes, these are kind of meaningless without descriptions. You can find those
at the webpage, http://contest.kolivas.net/ ... They will make more sense
with that; of course, how meaningful it is is always up for debate :)

All of these benchmark the kernel compile while doing something else in the
background.

> In short if somebody runs fast in something like this:
> 
> 	cp /dev/zero . & time cp bigfile /dev/null
> 
> he will win your whole contest too.

That's practically one of the loads, actually.

"IO Load - copies /dev/zero continually to a file the size of
	the physical memory."

Which dds blocks the size of MemTotal in /proc/meminfo to a file
in /tmp in a shell script as long as the kernel compile is running.

> please show the difff between
> 2.4.19-ck9/drivers/block/{ll_rw_blk,elevator}.c and
> 2.4.19/drivers/block/...

elevator.c is untouched, ll_rw_blk.c follows. The full patch is here:
http://members.optusnet.com.au/con.man/ck9_2.4.19.patch.bz2

diff -bBdaurN linux-2.4.19/drivers/block/ll_rw_blk.c linux-2.4.19-ck9/drivers/block/ll_rw_blk.c
--- linux-2.4.19/drivers/block/ll_rw_blk.c      2002-08-03 13:14:45.000000000 +1000
+++ linux-2.4.19-ck9/drivers/block/ll_rw_blk.c  2002-10-14 17:21:18.000000000 +1000
@@ -1112,6 +1112,9 @@
        if (!test_bit(BH_Lock, &bh->b_state))
                BUG();
 
+       if (buffer_delay(bh) || !buffer_mapped(bh))
+               BUG();
+
        set_bit(BH_Req, &bh->b_state);
        set_bit(BH_Launder, &bh->b_state);
 
@@ -1132,6 +1135,7 @@
                        kstat.pgpgin += count;
                        break;
        }
+       conditional_schedule();
 }
 
 /**
@@ -1270,7 +1274,8 @@
 
        req->errors = 0;
        if (!uptodate)
-               printk("end_request: I/O error, dev %s (%s), sector %lu\n",
+               printk(KERN_INFO "end_request: I/O error, dev %s (%s)," 
+                      " sector %lu\n",
                        kdevname(req->rq_dev), name, req->sector);
 
        if ((bh = req->bh) != NULL) {

> Either that or change the name of your project,

It's called "contest" because it's a reasonably arbitrary test of what
the kernel does under some circumstances that's put out by Con Kolivas.
Con's test. Contest. It's not supposed to actually mean anything.

Matt


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10  2:44 ` Andrea Arcangeli
  2002-11-10  3:56   ` Matt Reppert
@ 2002-11-10  9:58   ` Con Kolivas
  2002-11-10 10:06     ` Jens Axboe
  2002-11-10 16:20     ` Andrea Arcangeli
  2002-11-10 19:32   ` Rik van Riel
  2 siblings, 2 replies; 47+ messages in thread
From: Con Kolivas @ 2002-11-10  9:58 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux kernel mailing list, marcelo

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

First some explanation.

Contest (http://contest.kolivas.net) is obviously not a throughput-style 
benchmark. The benchmark simply uses userland loads known to slow down the 
machine (like writing large files) and sees how much longer kernel 
compilation takes (make -j4 bzImage on a uniprocessor). Thus it never claims to 
be any sort of comprehensive system benchmark; it only serves to give an idea 
of the system's ability to respond in the presence of different loads, in 
terms end users can understand.

>On Sat, Nov 09, 2002 at 01:00:19PM +1100, Con Kolivas wrote:
>> xtar_load:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2.4.18 [3]              150.8   49      2       8       2.11
>> 2.4.19 [1]              132.4   55      2       9       1.85
>> 2.4.19-ck9 [2]          138.6   58      2       11      1.94
>> 2.4.20-rc1 [3]          180.7   40      3       8       2.53
>> 2.4.20-rc1aa1 [3]       166.6   44      2       7       2.33
>
>these numbers don't make sense. Can you describe what xtar_load is
>doing?

OK: xtar_load starts extracting a large tar (a kernel tree) in the background, 
then tries to compile a kernel. Time is how long the kernel compilation takes, 
and CPU% is how much CPU make -j4 bzImage uses. Loads is how many times it 
successfully extracts the tar, and LCPU% is the CPU% returned by the "tar x 
linux.tar" command. Ratio is the ratio of this kernel compilation time to the 
reference (2.4.18 with no load).
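For concreteness, the Ratio column is just the loaded compile time divided by
the unloaded 2.4.18 reference time (71.7 s in the noload table above), e.g.
for xtar_load on 2.4.20-rc1:

```shell
# Ratio = loaded compile time / unloaded 2.4.18 reference time.
# contest keeps its own stored reference, so the last digit of the
# published ratios can differ slightly from this back-of-envelope value.
awk -v t=180.7 -v ref=71.7 'BEGIN { printf "%.2f\n", t / ref }'
```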

>> First noticeable difference. With repeated extracting of tars while
>> compiling kernels, 2.4.20-rc1 seems to be slower, and aa1 curbs it just a
>> little.

This explanation said simply that kernel compilation with the same 
tar-extracting load takes longer on 2.4.20-rc1 than on 2.4.19, but that the 
aa addons speed it up a bit.

>> io_load:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2.4.18 [3]              474.1   15      36      10      6.64
>> 2.4.19 [3]              492.6   14      38      10      6.90
>> 2.4.19-ck9 [2]          140.6   49      5       5       1.97
>> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
>> 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
>
>What are you benchmarking, tar or the kernel compile? I think the
>latter. That's the elevator and the size of the I/O queue here. Nothing
>else. Hacks like read-latency aren't very nice, in particular with
>async-io aware apps. If this improvement in ck9 was achieved by decreasing
>the queue size, it'll be interesting to see how much sequential I/O
>is slowed down; it's very possible we have too-big queues for some devices.
>
>> Well this is interesting. 2.4.20-rc1 seems to have improved its ability
>> to do IO work. Unfortunately it is now busy starving the scheduler in the
>> mean time, much like the 2.5 kernels did before the deadline scheduler was
>> put in.
>>
>> read_load:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2.4.18 [3]              102.3   70      6       3       1.43
>> 2.4.19 [2]              134.1   54      14      5       1.88
>> 2.4.19-ck9 [2]          77.4    85      11      9       1.08
>> 2.4.20-rc1 [3]          173.2   43      20      5       2.43
>> 2.4.20-rc1aa1 [3]       150.6   51      16      5       2.11
>
>What is busy starving the scheduler? This sounds like it's again just an
>elevator benchmark. I don't buy your scheduler claims; give more
>explanations or I'll take it as vapourware wording. I very much doubt
>you can find any single problem in the scheduler in rc1aa2, or that the
>scheduler in rc1aa1 has a chance to run slower than the one in 2.4.19 in
>an I/O benchmark. OK, it still misses the numa algorithm, but that's not a
>bug, just a missing feature, and it'll soon be fixed too, and it doesn't
>matter for normal smp non-numa machines out there.

OK, I fully retract the statement. I should not pass judgement on what part of 
the kernel has changed the benchmark results; I'll just describe what the 
results say. Note however that this comment was centred on the results of 
io_load above. Put simply: if I am writing a large file and then try to 
compile the kernel (make -j4 bzImage), it is 16 times slower.

>> mem_load:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2.4.18 [3]              103.3   70      32      3       1.45
>> 2.4.19 [3]              100.0   72      33      3       1.40
>> 2.4.19-ck9 [2]          78.3    88      31      8       1.10
>> 2.4.20-rc1 [3]          105.9   69      32      2       1.48
>> 2.4.20-rc1aa1 [1]       106.3   69      33      3       1.49
>
>again ck9 is faster because of elevator hacks ala read-latency.
>
>in short your whole benchmark seems all about interactivity of reads
>during write flood. That's the read-latency thing or whatever else you
>could do to ll_rw_block.c.
>
>In short if somebody runs fast in something like this:
>
>	cp /dev/zero . & time cp bigfile /dev/null
>
>he will win your whole contest too.
>
>please show the diff between
>2.4.19-ck9/drivers/block/{ll_rw_blk,elevator}.c and
>2.4.19/drivers/block/...

I think Matt addressed this issue. 

>All the difference is there and it will hurt you badly if you do
>async-io benchmarks, and possibly dbench too. So you should always
>accompany your benchmark with async-io simultaneous read/write bandwidth
>and dbench, or I could always win your contest by shipping a very bad
>kernel. Either that or change the name of your project; if somebody wins
>this contest that's probably a bad I/O scheduler in many other aspects,
>part of the reason I didn't merge read-latency from Andrew.

The name is meaningless and based on my name. Had my name been John, it would 
be johntest.

I regret ever including the -ck (http://kernel.kolivas.net) results. The 
purpose of publishing these results was to compare 2.4.20-rc1/aa1 with 
previous kernels; as some people are interested in the results of the ck 
patchset, I threw them in as well. -ck is a patchset with desktop users in 
mind and is simply a merged patch of O(1), preempt, low latency and compressed 
caching. If it sacrifices throughput in certain areas to maintain system 
responsiveness then so be it. I'll look into adding other loads to contest as 
you suggested, but I'm not going to add basic throughput benchmarks; there 
are plenty of tools for that already.

I've done some ordinary dbench-quick benchmarks of ck9 and 2.4.20-rc1aa1 at 
the OSDL: http://www.osdl.org/stp

ck10_cc is the sum of the patches that make up ck9, so it is the same thing.

ck10_cc: http://khack.osdl.org/stp/7005/
2.4.20-rc1-aa1: http://khack.osdl.org/stp/7006/

Summary (dbench clients / throughput):
2420rc1aa1:
1 117.5
4 114.002
7 114.643
10 114.818
13 109.478
16 109.817
19 103.692
22 103.678
25 105.478
28 93.1296
31 87.0544
34 84.2668
37 81.0731
40 75.4605
43 77.2198
46 69.0448
49 66.7997
52 61.5987
55 60.2009
58 60.1531
61 58.3121
64 55.7127
67 56.2714
70 53.6214
73 52.2704
76 52.3631
79 49.7146
82 48.2406
85 48.1078
88 42.8405
91 42.4929
94 42.3958
97 43.5729
100 45.8318

ck10_cc:
1 116.239
4 115.075
7 114.414
10 114.166
13 109.129
16 109.403
19 106.601
22 97.7714
25 93.7279
28 95.0076
31 92.5594
34 88.5938
37 89.7026
40 86.9904
43 85.1783
46 82.7975
49 79.7348
52 80.2497
55 79.2346
58 76.6632
61 75.9002
64 75.8677
67 75.7318
70 73.2223
73 73.7652
76 72.9277
79 72.5244
82 71.6753
85 71.3161
88 70.9735
91 69.5539
94 69.602
97 67.2016
100 67.158

Regards,
Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zi3UF6dfvkL3i1gRAkWmAJ4zX7gyUjzKH7eCNneyNRWLPGtCeACff9A7
Bn8LHqZw46CrGauuWTldDnQ=
=0WMB
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10  9:58   ` Con Kolivas
@ 2002-11-10 10:06     ` Jens Axboe
  2002-11-10 16:21       ` Andrea Arcangeli
  2002-11-10 16:20     ` Andrea Arcangeli
  1 sibling, 1 reply; 47+ messages in thread
From: Jens Axboe @ 2002-11-10 10:06 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrea Arcangeli, linux kernel mailing list, marcelo

On Sun, Nov 10 2002, Con Kolivas wrote:
> >> Well this is interesting. 2.4.20-rc1 seems to have improved its ability
> >> to do IO work. Unfortunately it is now busy starving the scheduler in the
> >> mean time, much like the 2.5 kernels did before the deadline scheduler was
> >> put in.
> >>
> >> read_load:
> >> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> >> 2.4.18 [3]              102.3   70      6       3       1.43
> >> 2.4.19 [2]              134.1   54      14      5       1.88
> >> 2.4.19-ck9 [2]          77.4    85      11      9       1.08
> >> 2.4.20-rc1 [3]          173.2   43      20      5       2.43
> >> 2.4.20-rc1aa1 [3]       150.6   51      16      5       2.11
> >
> >What is busy starving the scheduler? This sounds like it's again just an
> >elevator benchmark. I don't buy your scheduler claims; give more
> >explanations or I'll take it as vapourware wording. I very much doubt
> >you can find any single problem in the scheduler in rc1aa2, or that the
> >scheduler in rc1aa1 has a chance to run slower than the one in 2.4.19 in
> >an I/O benchmark. OK, it still misses the numa algorithm, but that's not a
> >bug, just a missing feature, and it'll soon be fixed too, and it doesn't
> >matter for normal smp non-numa machines out there.
> 
> OK, I fully retract the statement. I should not pass judgement on what part of 
> the kernel has changed the benchmark results; I'll just describe what the 
> results say. Note however that this comment was centred on the results of 
> io_load above. Put simply: if I am writing a large file and then try to 
> compile the kernel (make -j4 bzImage), it is 16 times slower.

In Con's defence, I think he meant io scheduler starvation and not
process scheduler starvation. Otherwise the following wouldn't make a
lot of sense:

"Unfortunately it is now busy starving the scheduler in the mean time,
much like the 2.5 kernels did before the deadline scheduler was put in."

And indeed, 2.5 kernels had the exact same io scheduler algorithm as
2.4.20-rc has, so this makes perfect sense from the io scheduler
starvation POV.

There are inherent problems in the 2.4 io scheduler for these types of
workloads; the ugly and nausea-inducing read-latency hack that akpm did
attempts to work around that.

Andrea is obviously talking about the process scheduler; note the numa
reference among other things.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 21:53               ` Con Kolivas
@ 2002-11-10 10:09                 ` Jens Axboe
  2002-11-10 16:23                   ` Andrea Arcangeli
  2002-11-11  4:26                   ` Con Kolivas
  0 siblings, 2 replies; 47+ messages in thread
From: Jens Axboe @ 2002-11-10 10:09 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Andrew Morton, linux kernel mailing list, marcelo,
	Andrea Arcangeli, Marcelo

On Sun, Nov 10 2002, Con Kolivas wrote:
> >The default is 2048. How long does the io_load test take, or rather how
> >many tests are appropriate to do? To get a good picture of how it looks
> >you should probably try: 0, 8, 16, 64, 128, 512. Once you get some of
> >these results, it will be easier to determine which area(s) would be
> >most interesting to further explore.
> 
> The io_load test takes as long as the time in seconds shown on the table. At 
> least 3 tests are appropriate to get a reasonable average [runs is in square 
> parentheses]. Therefore it takes about half an hour per run. Luckily I had 
> the benefit of a night to set up a whole lot of runs:
> 
> io_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2420rc1r0 [3]           489.3   15      36      10      6.85
> 2420rc1r8 [3]           485.5   15      35      10      6.80
> 2420rc1r16 [3]          570.4   12      43      10      7.99
> 2420rc1r32 [3]          570.1   12      42      10      7.98
> 2420rc1r64 [3]          575.0   12      43      10      8.05
> 2420rc1r128 [3]         611.4   11      46      10      8.56
> 2420rc1r256 [3]         646.2   11      49      10      9.05
> 2420rc1r512 [3]         603.7   12      45      10      8.46
> 2420rc1r1024 [3]        693.9   10      53      10      9.72
> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> 
> Test hardware is a 1133MHz P3 laptop with a 5400rpm ATA100 drive. I don't doubt 
> the response curve would be different for other hardware.

That looks pretty good; the behaviour in 2.4.20-rc1 is now sanely tunable,
unlike before. Could you retest the whole contest suite with 512 as the
default value? It looks like a good default for 2.4.20.

Marcelo, we probably need to make a few tweaks here to get the read
passover value right. The algorithmic changes in 2.4.20-pre made it
impossible to guess a good default value, as we invalidated the previous
tests. Right now we are using 2048, which is a number I basically pulled
out of my ass; it looks as if it might be a bit high. So I'll be sending
you a one-liner correction once a decent default value is found.
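For reference, the read passover can be changed at runtime rather than by
recompiling; a sketch using util-linux's elvtune, where the device name is
an assumption for the test machine:

```shell
# Set the elevator read latency (read passover) to 512 on /dev/hda,
# then print the current settings to verify. Needs root and the
# elvtune utility from util-linux; /dev/hda is illustrative.
elvtune -r 512 /dev/hda
elvtune /dev/hda
```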

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09 13:54             ` Jens Axboe
  2002-11-09 21:12               ` Arador
  2002-11-09 21:53               ` Con Kolivas
@ 2002-11-10 10:12               ` Kjartan Maraas
  2002-11-10 10:17                 ` Jens Axboe
  2002-11-10 16:27                 ` Andrea Arcangeli
  2 siblings, 2 replies; 47+ messages in thread
From: Kjartan Maraas @ 2002-11-10 10:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Con Kolivas, Andrew Morton, linux kernel mailing list, marcelo,
	Andrea Arcangeli

lør, 2002-11-09 kl. 14:54 skrev Jens Axboe:

[SNIP]

> The default is 2048. How long does the io_load test take, or rather how

The default on my RH system with the latest errata kernel is as follows:

[root@sevilla kmaraas]# elvtune /dev/hda

/dev/hda elevator ID		0
	read_latency:		8192
	write_latency:		16384
	max_bomb_segments:	6

[root@sevilla kmaraas]# uname -a
Linux sevilla.gnome.no 2.4.18-17.7.x #1 Tue Oct 8 13:33:14 EDT 2002 i686
unknown
[root@sevilla kmaraas]# 

Is this worth changing to lower values then? They seem to be an awful
lot higher than the values mentioned below here.

> many tests are appropriate to do? To get a good picture of how it looks
> you should probably try: 0, 8, 16, 64, 128, 512. Once you get some of
> these results, it will be easier to determine which area(s) would be
> most interesting to further explore.
> 
> There's also the write passover, I don't think it will have much impact
> on this test though.

Cheers
Kjartan


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 10:12               ` Kjartan Maraas
@ 2002-11-10 10:17                 ` Jens Axboe
  2002-11-10 16:27                 ` Andrea Arcangeli
  1 sibling, 0 replies; 47+ messages in thread
From: Jens Axboe @ 2002-11-10 10:17 UTC (permalink / raw)
  To: Kjartan Maraas
  Cc: Con Kolivas, Andrew Morton, linux kernel mailing list, marcelo,
	Andrea Arcangeli

On Sun, Nov 10 2002, Kjartan Maraas wrote:
> lør, 2002-11-09 kl. 14:54 skrev Jens Axboe:
> 
> [SNIP]
> 
> > The default is 2048. How long does the io_load test take, or rather how
> 
> The default on my RH system with the latest errata kernel is as follows:
> 
> [root@sevilla kmaraas]# elvtune /dev/hda
> 
> /dev/hda elevator ID		0
> 	read_latency:		8192
> 	write_latency:		16384
> 	max_bomb_segments:	6
> 
> [root@sevilla kmaraas]# uname -a
> Linux sevilla.gnome.no 2.4.18-17.7.x #1 Tue Oct 8 13:33:14 EDT 2002 i686
> unknown
> [root@sevilla kmaraas]# 
> 
> Is this worth changing to lower values then? They seem to be an awful
> lot higher than the values mentioned below here.

As I mentioned in the email sent out a few minutes ago, you cannot
compare the values from 2.4.19 and earlier to 2.4.20-pre/rc at all. The
algorithm for determining when a request is starved has been changed to
be more correct, and that has invalidated these values.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10  9:58   ` Con Kolivas
  2002-11-10 10:06     ` Jens Axboe
@ 2002-11-10 16:20     ` Andrea Arcangeli
  1 sibling, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-10 16:20 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 08:58:43PM +1100, Con Kolivas wrote:
> >> to do IO work. Unfortunately it is now busy starving the scheduler in the
> >> mean time, much like the 2.5 kernels did before the deadline scheduler was
> >> put in.
> OK, I fully retract the statement. I should not pass judgement on what part of 
> the kernel has changed the benchmark results; I'll just describe what the 

actually Wil pointed out to me privately that you meant the I/O scheduler;
you just never mentioned the name "I/O", so I mistook it for the process
scheduler, sorry (I should have understood from the deadline adjective).
What you said makes sense once parsed as the I/O scheduler, of course.

Next week I will check the changes in your tree and I'll try to
reproduce the dbench numbers on my 4-way with very high I/O and disk
bandwidth, and I'll let you know the numbers I get here. It may simply be
the different elevator default values and fixes in 2.4.20rc, but I
recall that you still win compared to -r0 somewhere (according to your
numbers). It's pointless on my part to discuss this further now until
I have the whole picture of the changes you did, the whole picture of the
contest source code, and until I can reproduce every single result you
posted here. I hope to be able to comment further ASAP.

Andrea

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 10:06     ` Jens Axboe
@ 2002-11-10 16:21       ` Andrea Arcangeli
  0 siblings, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-10 16:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Con Kolivas, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 11:06:56AM +0100, Jens Axboe wrote:
> Andrea is obviously talking about process scheduler, note the numa

exactly, sorry.

Andrea

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 10:09                 ` Jens Axboe
@ 2002-11-10 16:23                   ` Andrea Arcangeli
  2002-11-11  4:26                   ` Con Kolivas
  1 sibling, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-10 16:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Con Kolivas, Andrew Morton, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 11:09:42AM +0100, Jens Axboe wrote:
> On Sun, Nov 10 2002, Con Kolivas wrote:
> > >The default is 2048. How long does the io_load test take, or rather how
> > >many tests are appropriate to do? To get a good picture of how it looks
> > >you should probably try: 0, 8, 16, 64, 128, 512. Once you get some of
> > >these results, it will be easier to determine which area(s) would be
> > >most interesting to further explore.
> > 
> > The io_load test takes as long as the time in seconds shown on the table. At 
> > least 3 tests are appropriate to get a reasonable average [runs is in square 
> > parentheses]. Therefore it takes about half an hour per run. Luckily I had 
> > the benefit of a night to set up a whole lot of runs:
> > 
> > io_load:
> > Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> > 2420rc1r0 [3]           489.3   15      36      10      6.85
> > 2420rc1r8 [3]           485.5   15      35      10      6.80
> > 2420rc1r16 [3]          570.4   12      43      10      7.99
> > 2420rc1r32 [3]          570.1   12      42      10      7.98
> > 2420rc1r64 [3]          575.0   12      43      10      8.05
> > 2420rc1r128 [3]         611.4   11      46      10      8.56
> > 2420rc1r256 [3]         646.2   11      49      10      9.05
> > 2420rc1r512 [3]         603.7   12      45      10      8.46
> > 2420rc1r1024 [3]        693.9   10      53      10      9.72
> > 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> > 
> > Test hardware is a 1133MHz P3 laptop with a 5400rpm ATA100 drive. I don't doubt 
> > the response curve would be different for other hardware.
> 
> That looks pretty good; the behaviour in 2.4.20-rc1 is now sanely tunable,
> unlike before. Could you retest the whole contest suite with 512 as the
> default value? It looks like a good default for 2.4.20.

agreed, btw, a 2048 before the fixes would mean much less than now.

Andrea

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 10:12               ` Kjartan Maraas
  2002-11-10 10:17                 ` Jens Axboe
@ 2002-11-10 16:27                 ` Andrea Arcangeli
  1 sibling, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-10 16:27 UTC (permalink / raw)
  To: Kjartan Maraas
  Cc: Jens Axboe, Con Kolivas, Andrew Morton,
	linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 11:12:47AM +0100, Kjartan Maraas wrote:
> lør, 2002-11-09 kl. 14:54 skrev Jens Axboe:
> 
> [SNIP]
> 
> > The default is 2048. How long does the io_load test take, or rather how
> 
> The default on my RH system with the latest errata kernel is as follows:
> 
> [root@sevilla kmaraas]# elvtune /dev/hda
> 
> /dev/hda elevator ID		0
> 	read_latency:		8192
> 	write_latency:		16384
> 	max_bomb_segments:	6

that still has the bugs of 2.4.19 and all previous 2.4 kernels that I found
and fixed, first with a limited, incomplete patch; then Jens fixed it
completely after I showed him the bugs while explaining why I did the first
limited patch (Jens's patch then went into 2.4.20pre).

so an 8192 there isn't comparable to an 8192 in 2.4.20rc. Jens was of
course aware and just lowered it to 2048, but that is probably still more
than an 8192 in previous 2.4. It would be possible to do the math to
calculate the exact value in some common case, but I guess we want a sane
default, not necessarily the exact same behaviour, so I guess
benchmarking is more useful than doing the math to calculate the exact
new value that gets the exact same behaviour.

Andrea

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10  2:44 ` Andrea Arcangeli
  2002-11-10  3:56   ` Matt Reppert
  2002-11-10  9:58   ` Con Kolivas
@ 2002-11-10 19:32   ` Rik van Riel
  2002-11-10 20:10     ` Andrea Arcangeli
  2 siblings, 1 reply; 47+ messages in thread
From: Rik van Riel @ 2002-11-10 19:32 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Con Kolivas, linux kernel mailing list, marcelo

On Sun, 10 Nov 2002, Andrea Arcangeli wrote:
> On Sat, Nov 09, 2002 at 01:00:19PM +1100, Con Kolivas wrote:

> > 2.4.19-ck9 [2]          78.3    88      31      8       1.10
> > 2.4.20-rc1 [3]          105.9   69      32      2       1.48
> > 2.4.20-rc1aa1 [1]       106.3   69      33      3       1.49
>
> again ck9 is faster because of elevator hacks ala read-latency.
>
> in short your whole benchmark seems all about interactivity of reads
> during write flood.

Which is a very important thing.  You have to keep in mind that
reads and writes are fundamentally different operations since
the majority of the writes happen asynchronously while the program
continues running, while the majority of reads are synchronous and
your program will block while the read is going on.

Because of this it is also much easier to do writes in large chunks
than it is to do reads in large chunks, because with writes you
know exactly what data you're going to write while you can't know
which data you'll need to read next.

> All the difference is there and it will hurt you badly if you do
> async-io benchmarks,

Why would read-latency hurt the async-io benchmark?

Whether the IO is synchronous or asynchronous shouldn't matter much,
if you do a read you still need to wait for the data to be read in
before you can process it while the data you write is still in memory
and can be used over and over again.

What is the big difference with asynchronous IO that removes the big
asymmetry between reads and writes?

> kernel. Either that or change the name of your project; if somebody wins
> this contest that's probably a bad I/O scheduler in many other aspects,
> part of the reason I didn't merge read-latency from Andrew.

Any reasons in particular or just a gut feeling?

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  <a href=mailto:"october@surriel.com">october@surriel.com</a>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 19:32   ` Rik van Riel
@ 2002-11-10 20:10     ` Andrea Arcangeli
  2002-11-10 20:52       ` Andrew Morton
  2002-11-10 20:56       ` Andrew Morton
  0 siblings, 2 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-10 20:10 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Con Kolivas, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 05:32:44PM -0200, Rik van Riel wrote:
> On Sun, 10 Nov 2002, Andrea Arcangeli wrote:
> > On Sat, Nov 09, 2002 at 01:00:19PM +1100, Con Kolivas wrote:
> 
> > > 2.4.19-ck9 [2]          78.3    88      31      8       1.10
> > > 2.4.20-rc1 [3]          105.9   69      32      2       1.48
> > > 2.4.20-rc1aa1 [1]       106.3   69      33      3       1.49
> >
> > again ck9 is faster because of elevator hacks ala read-latency.
> >
> > in short your whole benchmark seems all about interactivity of reads
> > during write flood.
> 
> Which is a very important thing.  You have to keep in mind that

sure, this is why I fixed the potential ~infinite starvation in the 2.3
elevator.

> reads and writes are fundamentally different operations since
> the majority of the writes happen asynchronously while the program
> continues running, while the majority of reads are synchronous and
> your program will block while the read is going on.
> 
> Because of this it is also much easier to do writes in large chunks
> than it is to do reads in large chunks, because with writes you
> know exactly what data you're going to write while you can't know
> which data you'll need to read next.
> 
> > All the difference is there and it will hurt you badly if you do
> > async-io benchmarks,
> 
> Why would read-latency hurt the async-io benchmark?

because only with async-io is it possible to keep the I/O pipeline
filled by reads. readahead only allows read-I/O to be done in large chunks;
it has no way to fill the pipeline.

In fact the size of the request queue is the fundamental factor that
controls read latency during heavy writes without special heuristics ala
read-latency.

In short, without async-io there is no way at all that a reading application
can read at a decent speed during a write flood, unless you have special
hacks in the elevator ala read-latency that allow reads to enter at the
front of the queue, which reduces the chance to reorder reads and
potentially decreases performance in an async-io benchmark even in
the presence of seeks.

> Whether the IO is synchronous or asynchronous shouldn't matter much,

the fact the I/O is sync or async makes the whole difference. With sync
reads, the read column of the vmstat line will always be very small
compared to the write column under a write flood. This can be fixed either:

1) with hacks in the elevator ala read-latency that are not generic and
   could decrease performance of other workloads
2) by reducing the size of the I/O queue, which may decrease performance
   also with seeks since it decreases the probability of reordering in
   the elevator
3) by having the app use async-io for reads, allowing it to keep the
   I/O pipeline full of reads

readahead, at least in its current form, only makes sure that a 512k
command will be submitted instead of a 4k command; that's not remotely
comparable to writeback, which floods the I/O queue constantly with
several dozen or hundred megabytes of data. Increasing readahead is also
risky; 512k is obviously safe in all circumstances since it's a
single dma command anyway (and 128k for ide).

I'm starting to benchmark 2.4.20rc1aa against 2.4.19-ck9 under dbench
right now (then I'll run the contest). I can't imagine how it can be
that much faster under dbench; -aa is almost as fast as 2.5 in dbench
and much faster than 2.4 mainline, so if 19-ck9 is really that much
faster than -aa then it is likely much faster than 2.5 too. I definitely
need to examine in full detail what's going on with 2.4.19-ck9. Once I
understand it I will let you know. For instance, I know Randy's
numbers are fully reliable and I trust them:

	http://home.earthlink.net/~rwhron/kernel/bigbox.html

I find Randy's numbers extremely useful. Of course it's great to see
the responsiveness side of a kernel too, but dbench isn't normally a
benchmark that needs responsiveness, quite the opposite: the more unfair
the behaviour of the vm and elevator, the faster dbench usually runs,
because with unfairness dbench tends to run roughly single threaded, which
maximizes the writeback effect etc... So if 2.4.19-ck9 is so
much faster under dbench and so much more responsive with contest,
which seems to benchmark basically only read latency under a writeback
flushing flood, then it is definitely worthwhile to produce a patch
against mainline that generates this boost. If it has the preemption
patch, that could hardly explain it either; the improvement from 45 MB/sec
to 65 MB/sec is quite a huge difference, and we have all the
schedule points in submit_bh too, so it's quite unlikely that
preempt could explain that difference; it might against mainline, but
not against my tree.

Anyway this is all guessing; once I check the code after I've
reproduced the numbers things should be much clearer.

Andrea

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 20:10     ` Andrea Arcangeli
@ 2002-11-10 20:52       ` Andrew Morton
  2002-11-10 21:05         ` Rik van Riel
  2002-11-10 20:56       ` Andrew Morton
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2002-11-10 20:52 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

Andrea Arcangeli wrote:
> 
> > Whether the IO is synchronous or asynchronous shouldn't matter much,
> 
> the fact the I/O is sync or async makes the whole difference. with sync
> reads the vmstat line in the read column will be always very small
> compared to the write column under a write flood. This can be fixed either:
> 
> 1) with hacks in the elevator ala read-latency that are not generic and
>    could decrease performance of other workloads

read-latency will only do the front-insertion if it was unable to find a
merge or insert on the tail-to-head search.

And the problem it desperately addresses is severe.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 20:10     ` Andrea Arcangeli
  2002-11-10 20:52       ` Andrew Morton
@ 2002-11-10 20:56       ` Andrew Morton
  2002-11-11  1:08         ` Andrea Arcangeli
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2002-11-10 20:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

Andrea Arcangeli wrote:
> 
> So if 2.4.19-ck9 is so much faster under dbench and so much more
> responsive with contest (which seems to benchmark basically only the
> read latency under a flood of writeback flushing), then it is
> definitely worthwhile to produce a patch against mainline that
> generates this boost. Even if ck9 includes the preemption patch, that
> could hardly explain it: the improvement from 45 MB/sec to 65 MB/sec
> is quite a huge difference, and we have all the schedule points in
> submit_bh too, so it's quite unlikely that preempt could explain that
> difference. It might against mainline, but not against my tree.
> 
> Anyway, this is all guessing; once I check the code after reproducing
> the numbers, things should be much clearer.

Well if I understand it correctly, compressed caching, umm, compresses
the cache ;)

And dbench writes 01 01 01 01 01 everywhere.  Enormously compressible.

So it's basically fitting vastly more pagecache into the machine.

That would be my guessing, anyway.  Changing dbench to write random
stuff might change the picture.


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 20:52       ` Andrew Morton
@ 2002-11-10 21:05         ` Rik van Riel
  2002-11-11  1:54           ` Andrea Arcangeli
  0 siblings, 1 reply; 47+ messages in thread
From: Rik van Riel @ 2002-11-10 21:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Con Kolivas, linux kernel mailing list, marcelo

On Sun, 10 Nov 2002, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> >
> > > Whether the IO is synchronous or asynchronous shouldn't matter much,
> >
> > the fact the I/O is sync or async makes the whole difference. with sync
> > reads the vmstat line in the read column will be always very small
> > compared to the write column under a write flood. This can be fixed either:
> >
> > 1) with hacks in the elevator ala read-latency that are not generic and
> >    could decrease performance of other workloads

It'd be nice if you specified which kind of workloads. Generic
handwaving is easy, but if you think about this problem a bit
more you'll see that most workloads which look like they might
suffer at first view should be just fine in reality...

> read-latency will only do the front-insertion if it was unable to find a
> merge or insert on the tail-to-head search.
>
> And the problem it desperately addresses is severe.

Note that async-IO shouldn't make a big difference here, except
maybe in synthetic benchmarks.

This is because the stream of data in a server will be approximately
the same regardless of whether the application is coded to use async
IO, threads or processes and because clients still need to wait for
the data on read while most writes are asynchronous.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  october@surriel.com



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 20:56       ` Andrew Morton
@ 2002-11-11  1:08         ` Andrea Arcangeli
  0 siblings, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-11  1:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 12:56:33PM -0800, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > So if 2.4.19-ck9 is so much faster under dbench and so much more
> > responsive with contest (which seems to benchmark basically only the
> > read latency under a flood of writeback flushing), then it is
> > definitely worthwhile to produce a patch against mainline that
> > generates this boost. Even if ck9 includes the preemption patch, that
> > could hardly explain it: the improvement from 45 MB/sec to 65 MB/sec
> > is quite a huge difference, and we have all the schedule points in
> > submit_bh too, so it's quite unlikely that preempt could explain that
> > difference. It might against mainline, but not against my tree.
> > 
> > Anyway, this is all guessing; once I check the code after reproducing
> > the numbers, things should be much clearer.
> 
> Well if I understand it correctly, compressed caching, umm, compresses
> the cache ;)
> 
> And dbench writes 01 01 01 01 01 everywhere.  Enormously compressible.
> 
> So it's basically fitting vastly more pagecache into the machine.
> 
> That would be my guessing, anyway.  Changing dbench to write random
> stuff might change the picture.

yes, it may be the pagecache compression that makes the difference here.
My hardware has lots of disk and ram bandwidth, so it should benefit
less from compression. The results on my tree are finished; I'm starting
a new run on ck10.

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 21:05         ` Rik van Riel
@ 2002-11-11  1:54           ` Andrea Arcangeli
  2002-11-11  4:03             ` Andrew Morton
  2002-11-11 13:45             ` Rik van Riel
  0 siblings, 2 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-11  1:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Con Kolivas, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 07:05:01PM -0200, Rik van Riel wrote:
> On Sun, 10 Nov 2002, Andrew Morton wrote:
> > Andrea Arcangeli wrote:
> > >
> > > > Whether the IO is synchronous or asynchronous shouldn't matter much,
> > >
> > > the fact the I/O is sync or async makes the whole difference. with sync
> > > reads the vmstat line in the read column will be always very small
> > > compared to the write column under a write flood. This can be fixed either:
> > >
> > > 1) with hacks in the elevator ala read-latency that are not generic and
> > >    could decrease performance of other workloads
> 
> It'd be nice if you specified which kind of workloads. Generic

the slowdown happens in this case:

	queue 5 6 7 8 9

insert read 3

	queue 3 5 6 7 8 9

request 3 is handled by the device

	queue 5 6 7 8 9

insert read 1

	queue 1 5 6 7 8 9

request 1 is handled by the device

	queue 5 6 7 8 9

insert read 2

	queue 2 5 6 7 8 9

request 2 is handled by the device

so what happened is:

	read 3
	read 1
	read 2

while without read-latency what would most probably have happened is the
below, because reads 5 6 7 8 9 would give the other reads time to be
inserted, reordered and in turn optimized:

	read 1
	read 2
	read 3

let's ignore async-io to keep it simple; there is definitely the
possibility of slowing down with lots of tasks reading at the same time
even with only sync reads (that could be lots of major faults happening
at the same time during swapping under some write load, or whatever
else generates lots of tasks reading at nearly the same time during
some background writing).

Anybody claiming there isn't the potential of a global I/O throughput
slowdown would be clueless.

I know that in some cases the additional seeking may allow the cpu to do
more work, and that may actually increase throughput, but this isn't
always the case; it can definitely slow things down.

all you can argue is that the decrease in latency for lots of common
interactive workloads could be worth the potential global throughput
slowdown. On that I may agree. I wasn't very excited about merging it
because I was scared of slowdowns in workloads with async-io and lots
of tasks reading small things at the same time during writes, which as
I demonstrated above can definitely happen in practice and is
realistic. I run a number of workloads like that myself. The current
algorithm is optimal for throughput.

However I think even read-latency is more a workaround for a problem in
the I/O queue dimensions. I think the I/O queue should be dynamically
limited in the amount of data queued (in bytes, not in number of
requests).

We need plenty of requests only because each request may hold only 4k
when no merging can happen, and in that case we definitely need the
elevator to do a huge amount of work to be efficient; seeking heavily on
4k requests (or smaller) hurts a lot, while seeking on 512k requests is
much less severe.

But when each request is a large 512k one, it is pointless to allow the
same number of requests that we allow when the requests are 4k. I think
starting with such a simple fix would provide a similar benefit to
read-latency and no corner cases at all. So I would much prefer to start
with a fix like that, accounting for the request capacity available to
drivers in bytes of data in the queue instead of in number of requests
in the queue. read-latency mostly works around the way too huge I/O
queue we get when each request is 512k in size. And it works around it
only for reads; O_SYNC/-osync would still get stuck big time against
writeback load from other files, just like reads do now. The fix I
propose is generic; basically it has no downside, and it is more
dynamic, so I prefer it even if it may not be as direct and forceful as
read-latency, but that is in fact what makes it better and potentially
faster in throughput than read-latency.

Going one step further, we could limit the amount of bytes that each
single task can submit, so for example kupdate/bdflush couldn't fill the
queue completely anymore, and still the elevator could do a huge amount
of work when thousands of different tasks are submitting at the same
time, which is the interesting case for the elevator; or the amount of
data each task may submit to the queue could depend on the number of
tasks actively doing I/O in the last few seconds.

These are the fixes (I consider the limiting of bytes in the I/O queue a
fix) that I would prefer.

In fact today I think the max_bomb_segment I researched some years back
was so beneficial in terms of read latency just because it effectively
reduced the maximum amount of pending "writeback" bytes in the queue,
not really because it split the request into multiple dma transfers
(which in turn decreased performance a lot, because the dma chunks were
way too small to have any hope of reaching the peak performance of the
hardware, and the fact performance was hurt so badly rightly forced us
to back it out completely).  So I'm optimistic that reducing the size
of the queue and making it tunable from elvtune would be the first
thing to do, rather than playing with the read-latency hack that just
works around the way too huge queue size when merging is at its maximum
and that can hurt performance in some cases.

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  1:54           ` Andrea Arcangeli
@ 2002-11-11  4:03             ` Andrew Morton
  2002-11-11  4:06               ` Andrea Arcangeli
  2002-11-11 13:45             ` Rik van Riel
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2002-11-11  4:03 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

Andrea Arcangeli wrote:
> 
> the slowdown happens in this case:
> 
>         queue 5 6 7 8 9
> 
> insert read 3
> 
>         queue 3 5 6 7 8 9

read-latency will not do that.
 
> However I think even read-latency is more a workaround for a problem in
> the I/O queue dimensions.

The problem is the 2.4 algorithm.  If a read is not mergeable or
insertable it is placed at the tail of the queue.  Which is the
worst possible place it can be put because applications wait on
reads, not on writes.


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  4:03             ` Andrew Morton
@ 2002-11-11  4:06               ` Andrea Arcangeli
  2002-11-11  4:22                 ` Andrew Morton
  0 siblings, 1 reply; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-11  4:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 08:03:01PM -0800, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > the slowdown happens in this case:
> > 
> >         queue 5 6 7 8 9
> > 
> > insert read 3
> > 
> >         queue 3 5 6 7 8 9
> 
> read-latency will not do that.

So what will it do? It must do something very much like what I
described, or it is a noop, period. Please elaborate.

>  
> > However I think even read-latency is more a workaround for a problem in
> > the I/O queue dimensions.
> 
> The problem is the 2.4 algorithm.  If a read is not mergeable or
> insertable it is placed at the tail of the queue.  Which is the
> worst possible place it can be put because applications wait on
> reads, not on writes.

O_SYNC/-osync waits on writes too, so are you saying writes must go to
the head because of that? Reads should not be too bad at the end
either, if only the queue weren't so oversized when merging is at its
maximum. Fix the oversizing of the queue, and read-latency will matter
much less.

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  4:06               ` Andrea Arcangeli
@ 2002-11-11  4:22                 ` Andrew Morton
  2002-11-11  4:39                   ` Andrea Arcangeli
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2002-11-11  4:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

Andrea Arcangeli wrote:
> 
> On Sun, Nov 10, 2002 at 08:03:01PM -0800, Andrew Morton wrote:
> > Andrea Arcangeli wrote:
> > >
> > > the slowdown happens in this case:
> > >
> > >         queue 5 6 7 8 9
> > >
> > > insert read 3
> > >
> > >         queue 3 5 6 7 8 9
> >
> > read-latency will not do that.
> 
> So what will it do? It must do something very much like what I
> described, or it is a noop, period. Please elaborate.

If a read was not merged with another read on the tail->head walk
the read will be inserted near the head.  The head->tail walk bypasses
all reads, six (default) writes and then inserts the new read.

It has the shortcoming that earlier reads may be walked past in the
tail->head phase.  It's a three-liner to prevent that but I was never
able to demonstrate any difference.

> >
> > > However I think even read-latency is more a workaround for a
> > > the I/O queue dimensions.
> >
> > The problem is the 2.4 algorithm.  If a read is not mergeable or
> > insertable it is placed at the tail of the queue.  Which is the
> > worst possible place it can be put because applications wait on
> > reads, not on writes.
> 
> O_SYNC/-osync waits on writes too, so are you saying writes must go to
> the head because of that?

It has been discussed: boost a request to head-of-queue when a thread
starts to wait on a buffer/page which is inside that request.

But we don't care about synchronous writes.  As long as we don't
starve them out completely, optimise the (vastly more) common case.

> Reads should not be too bad at the end either, if only the queue
> weren't so oversized when merging is at its maximum. Fix the
> oversizing of the queue, and read-latency will matter much less.

Think about two threads.  One is generating a stream of writes and
the other is trying to read a file.  The reader needs to read the 
directory, the inode, the first data blocks, the first indirect and
then some more data blocks.  That's at least three synchronous reads.
Even if those reads are placed just three requests from head-of-queue,
the reader will make one tenth of the progress of the writer.

And the current code places those reads 64 requests from head-of-queue.

When the various things which were congesting write queueing were fixed
in the 2.5 VM a streaming write was slowing such read operations down by
a factor of 4000.


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-10 10:09                 ` Jens Axboe
  2002-11-10 16:23                   ` Andrea Arcangeli
@ 2002-11-11  4:26                   ` Con Kolivas
  1 sibling, 0 replies; 47+ messages in thread
From: Con Kolivas @ 2002-11-11  4:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, linux kernel mailing list, marcelo,
	Andrea Arcangeli, Marcelo

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>On Sun, Nov 10 2002, Con Kolivas wrote:
>> io_load:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2420rc1r0 [3]           489.3   15      36      10      6.85
>> 2420rc1r8 [3]           485.5   15      35      10      6.80
>> 2420rc1r16 [3]          570.4   12      43      10      7.99
>> 2420rc1r32 [3]          570.1   12      42      10      7.98
>> 2420rc1r64 [3]          575.0   12      43      10      8.05
>> 2420rc1r128 [3]         611.4   11      46      10      8.56
>> 2420rc1r256 [3]         646.2   11      49      10      9.05
>> 2420rc1r512 [3]         603.7   12      45      10      8.46
>> 2420rc1r1024 [3]        693.9   10      53      10      9.72
>> 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
>>
>> Test hardware is 1133Mhz P3 laptop with 5400rpm ATA100 drive. I don't
>> doubt the response curve would be different for other hardware.
>
>That looks pretty good, the behaviour in 2.4.20-rc1 is now sanely tunable
>unlike before. Could you retest the whole contest suite with 512 as the
>default value? It looks like a good default for 2.4.20.

Ok here they are:

noload:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [5]              71.7    93      0       0       1.00
2.4.19 [5]              69.0    97      0       0       0.97
2.4.20-rc1 [3]          72.2    93      0       0       1.01
2420rc1r512 [3]         71.6    93      0       0       1.00

cacherun:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [2]              66.6    99      0       0       0.93
2.4.19 [2]              68.0    99      0       0       0.95
2.4.20-rc1 [3]          67.2    99      0       0       0.94
2420rc1r512 [3]         67.1    99      0       0       0.94

process_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              109.5   57      119     44      1.53
2.4.19 [3]              106.5   59      112     43      1.49
2.4.20-rc1 [3]          110.7   58      119     43      1.55
2420rc1r512 [3]         112.1   57      122     43      1.57

ctar_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              117.4   63      1       7       1.64
2.4.19 [2]              106.5   70      1       8       1.49
2.4.20-rc1 [3]          102.1   72      1       7       1.43
2420rc1r512 [3]         101.7   73      1       8       1.42

xtar_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              150.8   49      2       8       2.11
2.4.19 [1]              132.4   55      2       9       1.85
2.4.20-rc1 [3]          180.7   40      3       8       2.53
2420rc1r512 [3]         170.0   44      3       7       2.38

io_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              474.1   15      36      10      6.64
2.4.19 [3]              492.6   14      38      10      6.90
2.4.20-rc1 [2]          1142.2  6       90      10      16.00
2420rc1r512 [6]         602.7   12      45      10      8.44

read_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              102.3   70      6       3       1.43
2.4.19 [2]              134.1   54      14      5       1.88
2.4.20-rc1 [3]          173.2   43      20      5       2.43
2420rc1r512 [3]         112.5   67      11      5       1.58

list_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              90.2    76      1       17      1.26
2.4.19 [1]              89.8    77      1       20      1.26
2.4.20-rc1 [3]          88.8    77      0       12      1.24
2420rc1r512 [3]         88.0    78      0       12      1.23

mem_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.18 [3]              103.3   70      32      3       1.45
2.4.19 [3]              100.0   72      33      3       1.40
2.4.20-rc1 [3]          105.9   69      32      2       1.48
2420rc1r512 [3]         105.0   70      33      3       1.47

Looks good. Note that read_load is a lot "better" too.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zzFtF6dfvkL3i1gRAvQ/AJ0UK7za0Uvy6SnyPxFoYEjcX2iGDACcCWfx
WRq8eTboTj6bRCzERw/gMfo=
=kSMm
-----END PGP SIGNATURE-----



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  4:22                 ` Andrew Morton
@ 2002-11-11  4:39                   ` Andrea Arcangeli
  2002-11-11  5:10                     ` Andrew Morton
  0 siblings, 1 reply; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-11  4:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 08:22:38PM -0800, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > On Sun, Nov 10, 2002 at 08:03:01PM -0800, Andrew Morton wrote:
> > > Andrea Arcangeli wrote:
> > > >
> > > > the slowdown happens in this case:
> > > >
> > > >         queue 5 6 7 8 9
> > > >
> > > > insert read 3
> > > >
> > > >         queue 3 5 6 7 8 9
> > >
> > > read-latency will not do that.
> > 
> > So what will it do? It must do something very much like what I
> > described, or it is a noop, period. Please elaborate.
> 
> If a read was not merged with another read on the tail->head walk
> the read will be inserted near the head.  The head->tail walk bypasses
> all reads, six (default) writes and then inserts the new read.
> 
> It has the shortcoming that earlier reads may be walked past in the
> tail->head phase.  It's a three-liner to prevent that but I was never
> able to demonstrate any difference.

from your description it seems what will happen is:

	queue 3 5 6 7 8 9

I don't see why you say it won't do that. The whole point of the patch
is to put reads at or near the head, and you say 3 won't be put at the
head if only 5 writes are pending. Or maybe your "bypasses 6 writes"
means the other way around: that you put the read as the seventh entry
in the queue if there are 6 writes pending. Is that the case?

> > > > However I think even read-latency is more a workaround for a
> > > > problem in
> > > > the I/O queue dimensions.
> > >
> > > The problem is the 2.4 algorithm.  If a read is not mergeable or
> > > insertable it is placed at the tail of the queue.  Which is the
> > > worst possible place it can be put because applications wait on
> > > reads, not on writes.
> > 
> > O_SYNC/-osync waits on writes too, so are you saying writes must go to
> > the head because of that?
> 
> It has been discussed: boost a request to head-of-queue when a thread
> starts to wait on a buffer/page which is inside that request.
> 
> But we don't care about synchronous writes.  As long as we don't
> starve them out completely, optimise the (vastly more) common case.

yes, it could be worthwhile to potentially decrease the global
throughput a little to improve read latency significantly; I'm not
against that, but before I care about that I would prefer to get a
limit on the size of the queue in bytes, not in requests. That is a
generic issue for writes and read-async-io too; it's a task-against-task
fairness/latency matter, not specific to reads, but it should help read
latency visibly too. In any case the two things are orthogonal: if the
queue is smaller, read-latency will do even better.

> > Reads should not be too bad at the end either, if only the queue
> > weren't so oversized when merging is at its maximum. Fix the
> > oversizing of the queue, and read-latency will matter much less.
> 
> Think about two threads.  One is generating a stream of writes and
> the other is trying to read a file.  The reader needs to read the 
> directory, the inode, the first data blocks, the first indirect and
> then some more data blocks.  That's at least three synchronous reads.

sure I know the problem with sync reads.

> Even if those reads are placed just three requests from head-of-queue,
> the reader will make one tenth of the progress of the writer.

actually it's probably much worse than a 10-times ratio, since the
writer is going to use big requests while the reader is probably
seeking with <=4k requests.

> And the current code places those reads 64 requests from head-of-queue.
> 
> When the various things which were congesting write queueing were fixed
> in the 2.5 VM a streaming write was slowing such read operations down by
> a factor of 4000.


Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  4:39                   ` Andrea Arcangeli
@ 2002-11-11  5:10                     ` Andrew Morton
  2002-11-11  5:23                       ` Andrea Arcangeli
                                         ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Andrew Morton @ 2002-11-11  5:10 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

Andrea Arcangeli wrote:
> 
> from your description it seems what will happen is:
> 
>         queue 3 5 6 7 8 9
> 
> I don't see why you say it won't do that. The whole point of the patch
> is to put reads at or near the head, and you say 3 won't be put at the
> head if only 5 writes are pending. Or maybe your "bypasses 6 writes"
> means the other way around: that you put the read as the seventh entry
> in the queue if there are 6 writes pending. Is that the case?

Actually I thought your "queue" was "head of queue" and that 5,6,7,8 and 9
were reads....

If the queue contains, say:

(head)	R1 R2 R3 W1 W2 W3 W4 W5 W6 W7

Then a new R4 will be inserted between W6 and W7.  So if R5 is mergeable
with R4 there is still plenty of time for that.


> > > > > However I think even read-latency is more a workarond to a
> > > > > problem in
> > > > > the I/O queue dimensions.
> > > >
> > > > The problem is the 2.4 algorithm.  If a read is not mergeable or
> > > > insertable it is placed at the tail of the queue.  Which is the
> > > > worst possible place it can be put because applications wait on
> > > > reads, not on writes.
> > >
> > > O_SYNC/-osync waits on writes too, so are you saying writes must go to
> > > the head because of that?
> >
> > It has been discussed: boost a request to head-of-queue when a thread
> > starts to wait on a buffer/page which is inside that request.
> >
> > But we don't care about synchronous writes.  As long as we don't
> > starve them out completely, optimise the (vastly more) common case.
> 
> yes, it should be worthwhile to potentially decrease a little the global
> throughput to increase significantly the read latency, I'm not against
> that, but before I would care about that I prefer to get a limit on the
> size of the queue in bytes, not in requests,

Really, it should be in terms of "time".  If you assume a 6 msec seek
and 30 mbyte/sec bandwidth, the crossover is a 180 kbyte I/O (6 ms of
seek equals 6 ms of transfer).  Not that I'm sure this means anything
interesting ;)  But the lesson is that the size of a request isn't
very important.

> actually it's probably much worse than a 10-times ratio, since the
> writer is going to use big requests while the reader is probably
> seeking with <=4k requests.
> 

Yup.  This is one case where improving latency improves throughput,
if there's computational work to be done.

2.5 (and read-latency) sort-of solve these problems by creating a
massive seekstorm when there are competing reads and writes.  It's
a pretty sad solution really.

Better would be to perform those reads and writes in nice big batches.
That's easy for the writes, but for reads we need to wait for the
application to submit another one.  That means actually deliberately
leaving the disk head idle for a few milliseconds in the anticipation
that the application will submit another nearby read.  This is called
"anticipatory scheduling" and has been shown to provide 20%-70%
performance boost in web serving workloads.   It just makes heaps of
sense to me and I'd love to see it in Linux...

See http://www.cs.ucsd.edu/sosp01/papers/iyer.pdf


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  5:10                     ` Andrew Morton
@ 2002-11-11  5:23                       ` Andrea Arcangeli
  2002-11-11  7:58                       ` William Lee Irwin III
  2002-11-11 13:56                       ` Rik van Riel
  2 siblings, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-11  5:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Con Kolivas, linux kernel mailing list, marcelo

On Sun, Nov 10, 2002 at 09:10:41PM -0800, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > from your description it seems what will happen is:
> > 
> >         queue 3 5 6 7 8 9
> > 
> > I don't see why you say it won't do that. The whole point of the patch
> > is to put reads at or near the head, and you say 3 won't be put at the
> > head if only 5 writes are pending. Or maybe your "bypasses 6 writes"
> > means the other way around: that you put the read as the seventh entry
> > in the queue if there are 6 writes pending. Is that the case?
> 
> Actually I thought your "queue" was "head of queue" and that 5,6,7,8 and 9
> were reads....
> 
> If the queue contains, say:
> 
> (head)	R1 R2 R3 W1 W2 W3 W4 W5 W6 W7
> 
> Then a new R4 will be inserted between W6 and W7.  So if R5 is mergeable
> with R4 there is still plenty of time for that.

yes, the fact that it's "near" and not exactly at the head as I
originally thought makes it less likely that it slows things down; even
if it theoretically still could for some workload, overall it seems a
worthwhile heuristic.

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  5:10                     ` Andrew Morton
  2002-11-11  5:23                       ` Andrea Arcangeli
@ 2002-11-11  7:58                       ` William Lee Irwin III
  2002-11-11 13:56                       ` Rik van Riel
  2 siblings, 0 replies; 47+ messages in thread
From: William Lee Irwin III @ 2002-11-11  7:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Rik van Riel, Con Kolivas,
	linux kernel mailing list, marcelo

Andrew Morton wrote:
> 2.5 (and read-latency) sort-of solve these problems by creating a
> massive seekstorm when there are competing reads and writes.  It's
> a pretty sad solution really.

On Sun, Nov 10, 2002 at 09:10:41PM -0800, Andrew Morton wrote:
> Better would be to perform those reads and writes in nice big batches.
> That's easy for the writes, but for reads we need to wait for the
> application to submit another one.  That means actually deliberately
> leaving the disk head idle for a few milliseconds in the anticipation
> that the application will submit another nearby read.  This is called
> "anticipatory scheduling" and has been shown to provide 20%-70%
> performance boost in web serving workloads.   It just makes heaps of
> sense to me and I'd love to see it in Linux...
> See http://www.cs.ucsd.edu/sosp01/papers/iyer.pdf

This smacks of "deceptive idleness". OTOH I prefer to keep out of those
issues and focus on pure fault handling, TLB, and space consumption
issues. I/O scheduling is far afield for me, and I prefer to keep it so.


Bill


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  1:54           ` Andrea Arcangeli
  2002-11-11  4:03             ` Andrew Morton
@ 2002-11-11 13:45             ` Rik van Riel
  2002-11-11 14:09               ` Jens Axboe
  2002-11-11 15:43               ` Andrea Arcangeli
  1 sibling, 2 replies; 47+ messages in thread
From: Rik van Riel @ 2002-11-11 13:45 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Con Kolivas, linux kernel mailing list, marcelo

On Mon, 11 Nov 2002, Andrea Arcangeli wrote:

> [snip bad example by somebody who hasn't read Andrew's patch]

> Anybody claiming there isn't the potential of a global I/O throughput
> slowdown would be clueless.

IO throughput isn't the point.  Due to the fundamental asymmetry
between reads and writes, IO throughput does NOT correspond to
program throughput under many kinds of IO patterns.

Sure, the best IO throughput is good for writeout, but it'll slow
down any program doing reads, including async IO programs because
those too need to get their data before they can process it.

> all you can argue is that the decrease of latency for lots of common
> interactive workloads could be worth the potential global throughput
> slowdown. On that I may agree.

On the contrary, the decrease of latency will probably bring a
global throughput increase.  Just program throughput, not raw
IO throughput.

> However I think even read-latency is more a workaround for a problem in
> the I/O queue dimensions. I think the I/O queue should be dynamically
> limited by the amount of data queued (in bytes, not in number of requests).

The number of bytes makes surprisingly little sense when you take
into account that one seek on a modern disk costs as much time as
it takes to read about half a megabyte worth of data.

> But when each request is as large as 512k it is pointless to allow the
> same number of requests that we allow when the requests are 4k.

A request of 512 kB will take about twice the time to service as a 4 kB
request would take, assuming the disk does around 50 MB/s throughput.
If you take one of those really modern disks Andre Hedrick has in his
lab the difference gets even smaller.
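
Rik's ratio checks out on a back-of-envelope basis. The 50 MB/s comes from
his message; the 8 ms positioning overhead below is an assumed figure:

```python
seek_ms = 8.0              # assumed seek + rotational overhead per request
bandwidth = 50e6           # bytes/sec, Rik's figure

def service_ms(req_bytes):
    # One positioning operation plus the sequential transfer time.
    return seek_ms + req_bytes / bandwidth * 1000

ratio = service_ms(512 * 1024) / service_ms(4 * 1024)
print(round(ratio, 1))     # ~2.3: a 512 kB request costs about twice a 4 kB one
```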

> In fact I now think the max_bomb_segment I researched some years back
> was so beneficial in terms of read-latency just because it effectively

That must be why it was backed out ;)

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  october@surriel.com



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11  5:10                     ` Andrew Morton
  2002-11-11  5:23                       ` Andrea Arcangeli
  2002-11-11  7:58                       ` William Lee Irwin III
@ 2002-11-11 13:56                       ` Rik van Riel
  2 siblings, 0 replies; 47+ messages in thread
From: Rik van Riel @ 2002-11-11 13:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Con Kolivas, linux kernel mailing list, marcelo

On Sun, 10 Nov 2002, Andrew Morton wrote:

> Really, it should be in terms of "time".  If you assume 6 msec seek and
> 30 mbyte/sec bandwidth, the crossover is a 120 kbyte I/O.

Now figure in the rotational latency and the crossover point has
moved to 200 kB. ;)
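
The crossover being discussed is just positioning time times bandwidth.
Plugging in the figures from the two messages shows where the estimate comes
from; the back-calculated positioning time is my own guess, not from the
thread:

```python
def crossover_bytes(bandwidth_Bps, positioning_s):
    """Request size whose transfer time equals the positioning time
    (seek plus rotational latency); above it, transfer dominates."""
    return bandwidth_Bps * positioning_s

# Rik's ~200 kB figure corresponds to roughly 6.7 ms of combined seek and
# rotational latency at Andrew's 30 MB/s (my back-calculation).
print(round(crossover_bytes(30e6, 6.7e-3) / 1000))  # ~201 kB
```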

> Not that I'm sure this means anything interesting ;)  But the lesson is
> that the size of a request isn't very important.

Besides, larger requests are much more efficient so penalising
those is the very last thing we want to do.

> Better would be to perform those reads and writes in nice big batches.
> That's easy for the writes, but for reads we need to wait for the
> application to submit another one.  That means actually deliberately
> leaving the disk head idle for a few milliseconds in the anticipation
> that the application will submit another nearby read.  This is called
> "anticipatory scheduling" and has been shown to provide 20%-70%
> performance boost in web serving workloads.   It just makes heaps of
> sense to me and I'd love to see it in Linux...

It only makes sense under heavy multiprocessing workloads where
we have multiple processes submitting IO, but if it's just one
process all this deliberate delay will achieve is a slowdown of
the process.

> See http://www.cs.ucsd.edu/sosp01/papers/iyer.pdf

Looking at it now.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  october@surriel.com



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11 13:45             ` Rik van Riel
@ 2002-11-11 14:09               ` Jens Axboe
  2002-11-11 15:48                 ` Andrea Arcangeli
  2002-11-11 15:43               ` Andrea Arcangeli
  1 sibling, 1 reply; 47+ messages in thread
From: Jens Axboe @ 2002-11-11 14:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrea Arcangeli, Andrew Morton, Con Kolivas,
	linux kernel mailing list, marcelo

On Mon, Nov 11 2002, Rik van Riel wrote:
> > In fact I now think the max_bomb_segment I researched some years back
> > was so beneficial in terms of read-latency just because it effectively
> 
> That must be why it was backed out ;)

Warning, incredibly bad quote snip above.

Rik, you basically deleted the interesting part there. The
max_bomb_segment logic was pretty uninteresting if you looked at it
from the POV that says we must limit the size of a request to
prevent starvation. That is what the name implies, and it is flawed.
However, Andrea goes on to say that it sort-of worked anyway, just
not for the reason he originally thought it would. It worked because it
limited the total size of pending writes in the queue. And this is
indeed the key factor in read latency in the 2.4 elevator, because reads
tend to get pushed to the back all the time: the queue looks like

R1-W1-W2-W3-....W127

service R1, queue is now

W1-W2-W3....-W127

the application got R1 serviced and issues a new read. Queue is now:

W1-W2-W3....-W127-R2

So even with a read passover value of 0, an application typically has to
wait for the total sum of writes in the queue. And this is what causes the
starvation. max_bomb_segments wasn't too good anyway, because in order
to get good latency you have to limit the sum of W1-W127 far too much,
and then it starts to hurt write throughput really badly.
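
Jens's walkthrough is easy to simulate. The sketch below models a strictly
FIFO queue with a synchronous reader that issues its next read only after
the previous one completes; the unit service cost per request is an
arbitrary simplification:

```python
from collections import deque

def fifo_read_waits(n_writes=127, n_reads=2):
    """Each queue entry is (name, time_issued); service cost is 1 tick."""
    queue = deque([("R1", 0)] + [(f"W{i}", 0) for i in range(1, n_writes + 1)])
    clock, waits, issued = 0, [], 1
    while queue:
        name, t_issue = queue.popleft()
        clock += 1
        if name.startswith("R"):
            waits.append(clock - t_issue)
            if issued < n_reads:
                issued += 1
                # The dependent read is issued now and lands at the tail,
                # behind every write already in the queue.
                queue.append((f"R{issued}", clock))
    return waits

print(fifo_read_waits())  # [1, 128]: R2 waited behind all 127 writes
```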

This is why the 2.4 io scheduler is fundamentally flawed from the read
latency view point. This is also why the 2.5 deadline io scheduler is
far superior in this area.

>> But when each request is as large as 512k it is pointless to allow the
>> same number of requests that we allow when the requests are 4k.

> A request of 512 kB will take about twice the time to service as a 4 kB
> request would take, assuming the disk does around 50 MB/s throughput.
> If you take one of those really modern disks Andre Hedrick has in his
> lab the difference gets even smaller.

I'll mention that for 2.5 the number of bytes that equals a full seek in
service time is called a stream_unit and is tweakable. Typically you are
looking at a plain 40MiB/s and 8ms seek, so ~256-300KiB is more in the
normal range than 512KiB.
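
Multiplying out Jens's own numbers gives the stream_unit size he is
describing; this is just the arithmetic, and the actual kernel default may
differ:

```python
bandwidth = 40 * 1024 * 1024   # 40 MiB/s, Jens's figure
seek_s = 8e-3                  # 8 ms seek, Jens's figure

stream_unit_kib = bandwidth * seek_s / 1024
# ~328 KiB, slightly above his quoted 256-300 KiB range, which presumably
# assumes a somewhat slower disk or shorter seek.
print(round(stream_unit_kib))
```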

-- 
Jens Axboe



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11 13:45             ` Rik van Riel
  2002-11-11 14:09               ` Jens Axboe
@ 2002-11-11 15:43               ` Andrea Arcangeli
  1 sibling, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-11 15:43 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Con Kolivas, linux kernel mailing list, marcelo

On Mon, Nov 11, 2002 at 11:45:06AM -0200, Rik van Riel wrote:
> On Mon, 11 Nov 2002, Andrea Arcangeli wrote:
> 
> > [snip bad example by somebody who hasn't read Andrew's patch]
> 
> > Anybody claiming there isn't the potential of a global I/O throughput
> > slowdown would be clueless.
> 
> IO throughput isn't the point.  Due to the fundamental asymmetry

IO throughput is the whole point of the elevator, and if you change it
that way you can decrease it. Even if you insert the read at the seventh
request instead of the first, you're making the assumption that the reads
cannot keep the I/O pipeline full; this is a realistic assumption for some
workloads, but not all. My example still very much applies, just not at
the head but at the seventh request. I definitely know the design idea
behind read-latency, unlike what you think; I just didn't remember the
low-level implementation details, which are not important in terms of a
potential slowdown in theoretical terms.

> On the contrary, the decrease of latency will probably bring a
> global throughput increase.  Just program throughput, not raw

I know this perfectly well, but you're making assumptions about certain
workloads. I can agree they are realistic workloads on a desktop
machine, but not all workloads are like that.

> That must be why it was backed out ;)

it was backed out because the request size must be big and it couldn't
be big with such a ""feature"" enabled, as I just said in my previous
email. I just gave you the reason it was backed out; not sure what you
are wondering about.

The fact is that read-latency is a hack for getting a special case
faster, and in theory it definitely can hurt some workload; there is a
reason read-latency isn't the default. read-latency definitely *can*
increase the seeks; not admitting this and claiming it can only improve
performance is clueless on your part. The implementation detail that it
adds the read as the seventh request instead of as the first request
decreases the probability of a slowdown, but it still has the potential
of slowing something down; this is all about math local to the elevator.

And IMHO read-latency kind of hides the real problem, which is that we
should limit the queue in bytes, or we could delay after I/O completion
as mentioned by Andrew, since certain workloads will still be very much
slower than writes even with read-latency. I'll fix the real problem in
my tree soon; I just need to run a number of benchmarks on SCSI and IDE
to measure a good size in bytes for peak contiguous I/O performance
before I can implement that.
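
A byte-based queue limit of the kind Andrea proposes could look like this
sketch. The budget value is a placeholder, since he explicitly says the
right number has to be benchmarked per device class:

```python
class ByteLimitedQueue:
    """Admit requests while queued payload stays under a byte budget,
    instead of counting requests. Illustrative sketch only."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.queued_bytes = 0
        self.requests = []

    def try_queue(self, req_bytes):
        # Always admit into an empty queue so one huge request can't stall.
        if self.requests and self.queued_bytes + req_bytes > self.max_bytes:
            return False          # caller must wait for completions
        self.requests.append(req_bytes)
        self.queued_bytes += req_bytes
        return True

    def complete_one(self):
        self.queued_bytes -= self.requests.pop(0)

q = ByteLimitedQueue(max_bytes=2 * 1024 * 1024)  # placeholder budget
fits_512k = sum(q.try_queue(512 * 1024) for _ in range(8))
print(fits_512k)  # 4: only four 512k requests fit under the budget
```

With the same budget, 512 requests of 4k would be admitted, versus only four
of 512k, which is exactly the asymmetry a request-count limit misses.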

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-11 14:09               ` Jens Axboe
@ 2002-11-11 15:48                 ` Andrea Arcangeli
  0 siblings, 0 replies; 47+ messages in thread
From: Andrea Arcangeli @ 2002-11-11 15:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Rik van Riel, Andrew Morton, Con Kolivas,
	linux kernel mailing list, marcelo

On Mon, Nov 11, 2002 at 03:09:20PM +0100, Jens Axboe wrote:
> latency view point. This is also why the 2.5 deadline io scheduler is
> far superior in this area.

going as a function of time is even better of course, but just assuming
bytes to be a linear function of time would be a good start; it depends
on whether you want to backport the deadline I/O scheduler to 2.4 or not.
I think going in terms of bytes would be simpler for 2.4. We're going to
use 2.4 for at least one more year in some production environments, so I
think it makes sense to address this, at least as a function of bytes if
not of time.

Andrea


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  3:54 ` Con Kolivas
@ 2002-11-09  4:02   ` Dieter Nützel
  0 siblings, 0 replies; 47+ messages in thread
From: Dieter Nützel @ 2002-11-09  4:02 UTC (permalink / raw)
  To: Con Kolivas, Andrew Morton; +Cc: Linux Kernel List

On Saturday, 9 November 2002 at 04:54, Con Kolivas wrote:
> >Andrew Morton wrote:
> >> Con Kolivas wrote:
> >> > io_load:
> >> > Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> >> > 2.4.18 [3]              474.1   15      36      10      6.64
> >> > 2.4.19 [3]              492.6   14      38      10      6.90
> >> > 2.4.19-ck9 [2]          140.6   49      5       5       1.97
> >> > 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> >> > 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
> >>
> >> 2.4.20-pre3 included some elevator changes.  I assume they are the
> >> cause of this.  Those changes have propagated into Alan's and Andrea's
> >> kernels.   Hence they have significantly impacted the responsiveness
> >> of all mainstream 2.4 kernels under heavy writes.
> >>
> >> (The -ck patch includes rmap14b which includes the read-latency2 thing)
> >
>No, the 2.4.19-ck9 that I have (the default?) includes -AA and preemption
> > (!!!)
>
> Err I made the ck patchset so I think I should know. ck9 came only as one
> patch which included O(1), Low Latency, Preempt, Compressed Caching,
> Supermount, ALSA and XFS. CK10-13 on the other hand had optional Compressed
> Caching OR AA OR Rmap. By default, since they are 2.4 kernels, they all
> include the vanilla aa vm, but the ck trunk with AA has the extra AA vm
> addons only available in the -AA kernel set. If you disabled compressed
> caching in ck9 you got only the vanilla 2.4.19 vm.

Then I mixed it up with 2.4.19-llck5 -AA.
Too many versions... Sorry!

-Dieter


* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
  2002-11-09  3:44 Dieter Nützel
@ 2002-11-09  3:54 ` Con Kolivas
  2002-11-09  4:02   ` Dieter Nützel
  0 siblings, 1 reply; 47+ messages in thread
From: Con Kolivas @ 2002-11-09  3:54 UTC (permalink / raw)
  To: Dieter Nützel, Andrew Morton; +Cc: Linux Kernel List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>Andrew Morton wrote:
>> Con Kolivas wrote:
>> > io_load:
>> > Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> > 2.4.18 [3]              474.1   15      36      10      6.64
>> > 2.4.19 [3]              492.6   14      38      10      6.90
>> > 2.4.19-ck9 [2]          140.6   49      5       5       1.97
>> > 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
>> > 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
>>
>> 2.4.20-pre3 included some elevator changes.  I assume they are the
>> cause of this.  Those changes have propagated into Alan's and Andrea's
>> kernels.   Hence they have significantly impacted the responsiveness
>> of all mainstream 2.4 kernels under heavy writes.
>>
>> (The -ck patch includes rmap14b which includes the read-latency2 thing)
>
>No, the 2.4.19-ck9 that I have (the default?) includes -AA and preemption
> (!!!)

Err I made the ck patchset so I think I should know. ck9 came only as one
patch which included O(1), Low Latency, Preempt, Compressed Caching,
Supermount, ALSA and XFS. CK10-13 on the other hand had optional Compressed
Caching OR AA OR Rmap. By default, since they are 2.4 kernels, they all
include the vanilla aa vm, but the ck trunk with AA has the extra AA vm
addons only available in the -AA kernel set. If you disabled compressed
caching in ck9 you got only the vanilla 2.4.19 vm.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE9zIcRF6dfvkL3i1gRAoEmAJ9DxKp9y+Jx11G+k+rcaMYKrVsM5gCgn5NH
nMwKh/nfafNt5kMvLpm+Bsg=
=YwE8
-----END PGP SIGNATURE-----



* Re: [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest
@ 2002-11-09  3:44 Dieter Nützel
  2002-11-09  3:54 ` Con Kolivas
  0 siblings, 1 reply; 47+ messages in thread
From: Dieter Nützel @ 2002-11-09  3:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, Linux Kernel List

Andrew Morton wrote:
> Con Kolivas wrote:
> >
> > io_load:
> > Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> > 2.4.18 [3]              474.1   15      36      10      6.64
> > 2.4.19 [3]              492.6   14      38      10      6.90
> > 2.4.19-ck9 [2]          140.6   49      5       5       1.97
> > 2.4.20-rc1 [2]          1142.2  6       90      10      16.00
> > 2.4.20-rc1aa1 [1]       1132.5  6       90      10      15.86
> >
>
> 2.4.20-pre3 included some elevator changes.  I assume they are the
> cause of this.  Those changes have propagated into Alan's and Andrea's
> kernels.   Hence they have significantly impacted the responsiveness
> of all mainstream 2.4 kernels under heavy writes.
>
> (The -ck patch includes rmap14b which includes the read-latency2 thing)

No, the 2.4.19-ck9 that I have (the default?) includes -AA and preemption (!!!)

Preemption has been the clear throughput winner for me for several months now.
Latest with 2.4.19-ck9 and now 2.5.46-mm1.

I know you all "hate" dbench but 2.5.45/2.5.46-mm1 halved (!!!) my "dbench 32"
times. Deadline IO is so GREAT.

2.4.19-ck5:	~55-60 seconds
2.5.46-mm1:	~31-45 seconds (even under VM pressure)

             total       used       free     shared    buffers     cached
Mem:       1034988     864172     170816          0     231840     345120
-/+ buffers/cache:     287212     747776
Swap:      1028120       8452    1019668
Total:     2063108     872624    1190484

Throughput 110.61 MB/sec (NB=138.263 MB/sec  1106.1 MBit/sec)
7.941u 38.251s 0:39.20 117.8%   0+0k 0+0io 841pf+0w

Sorry, I forgot the "free -t" output for this run.

Throughput 114.462 MB/sec (NB=143.077 MB/sec  1144.62 MBit/sec)
7.986u 35.900s 0:37.90 115.7%   0+0k 0+0io 841pf+0w

             total       used       free     shared    buffers     cached
Mem:       1034988     481812     553176          0     178788      54048
-/+ buffers/cache:     248976     786012
Swap:      1028120       9836    1018284
Total:     2063108     491648    1571460

Throughput 112.283 MB/sec (NB=140.354 MB/sec  1122.83 MBit/sec)
7.728u 37.358s 0:38.62 116.7%   0+0k 0+0io 841pf+0w


             total       used       free     shared    buffers     cached
Mem:       1034988     461736     573252          0     163260      51488
-/+ buffers/cache:     246988     788000
Swap:      1028120       9976    1018144
Total:     2063108     471712    1591396

Only one MP3 playback hiccup during "dbench 32" and nearly no slowdown of 
dbench.

2.5.45+ needs some more memory during my normal workload and does a little
more swap than 2.4.19+AA.

MemTotal:      1034988 kB
MemFree:        559784 kB
MemShared:           0 kB
Buffers:        164260 kB
Cached:          63308 kB
SwapCached:       2884 kB
Active:         399388 kB
Inactive:        10096 kB
HighTotal:      131008 kB
HighFree:        46508 kB
LowTotal:       903980 kB
LowFree:        513276 kB
SwapTotal:     1028120 kB
SwapFree:      1018156 kB
Dirty:              44 kB
Writeback:           0 kB
Mapped:         220700 kB
Slab:            36904 kB
Committed_AS:   530908 kB
PageTables:       3436 kB
ReverseMaps:    125959
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB

slabinfo - version: 1.2
fib6_nodes             7    112     32    1    1    1 :  248  124
ip6_dst_cache          9     20    192    1    1    1 :  248  124
ndisc_cache            1     30    128    1    1    1 :  248  124
raw6_sock              0      0    576    0    0    1 :  120   60
udp6_sock              1      7    576    1    1    1 :  120   60
tcp6_sock              5      8   1024    2    2    1 :  120   60
ip_conntrack           8     60    320    5    5    1 :  120   60
unix_sock            192    261    448   29   29    1 :  120   60
tcp_tw_bucket          0      0    128    0    0    1 :  248  124
tcp_bind_bucket       13    112     32    1    1    1 :  248  124
tcp_open_request       0      0    128    0    0    1 :  248  124
inet_peer_cache        2     59     64    1    1    1 :  248  124
secpath_cache          0      0     32    0    0    1 :  248  124
flow_cache             0      0     64    0    0    1 :  248  124
xfrm4_dst_cache        0      0    192    0    0    1 :  248  124
ip_fib_hash           15    112     32    1    1    1 :  248  124
ip_dst_cache          25    100    192    5    5    1 :  248  124
arp_cache              3     60    128    2    2    1 :  248  124
raw4_sock              0      0    448    0    0    1 :  120   60
udp_sock               7     18    448    2    2    1 :  120   60
tcp_sock              24     40    896   10   10    1 :  120   60
sgpool-MAX_PHYS_SEGMENTS     32     33   2560   11   11    2 :   54   27
sgpool-64             32     33   1280   11   11    1 :   54   27
sgpool-32             32     36    640    6    6    1 :  120   60
sgpool-16             32     36    320    3    3    1 :  120   60
sgpool-8              36     40    192    2    2    1 :  248  124
reiser_inode_cache   3900  19320    384 1932 1932    1 :  120   60
eventpoll              0      0     96    0    0    1 :  248  124
kioctx                 0      0    192    0    0    1 :  248  124
kiocb                  0      0    192    0    0    1 :  248  124
dnotify_cache          0      0     20    0    0    1 :  248  124
file_lock_cache      104    160     96    4    4    1 :  248  124
fasync_cache           2    202     16    1    1    1 :  248  124
shmem_inode_cache     12     27    448    3    3    1 :  120   60
uid_cache              5    112     32    1    1    1 :  248  124
deadline_drq        1792   1792     32   16   16    1 :  248  124
blkdev_requests     1280   1320    192   66   66    1 :  248  124
biovec-BIO_MAX_PAGES    256    260   3072   52   52    4 :   54   27
biovec-128           256    260   1536   52   52    2 :   54   27
biovec-64            256    260    768   52   52    1 :  120   60
biovec-16            256    260    192   13   13    1 :  248  124
biovec-4             256    295     64    5    5    1 :  248  124
biovec-1             325    404     16    2    2    1 :  248  124
bio                  272    295     64    5    5    1 :  248  124
sock_inode_cache     237    330    384   33   33    1 :  120   60
skbuff_head_cache    897    980    192   49   49    1 :  248  124
sock                   7     10    384    1    1    1 :  120   60
proc_inode_cache     117    696    320   58   58    1 :  120   60
sigqueue              87     87    132    3    3    1 :  248  124
radix_tree_node     4560  11340    320  945  945    1 :  120   60
cdev_cache            24    177     64    3    3    1 :  248  124
bdev_cache            15     30    128    1    1    1 :  248  124
mnt_cache             24     59     64    1    1    1 :  248  124
inode_cache          548    588    320   49   49    1 :  120   60
dentry_cache        7302  36560    192 1828 1828    1 :  248  124
filp                2512   2550    128   85   85    1 :  248  124
names_cache            6      6   4096    6    6    1 :   54   27
buffer_head        56609 158616     52 2203 2203    1 :  248  124
mm_struct             90    110    384   11   11    1 :  120   60
vm_area_struct      5357   6300    128  210  210    1 :  248  124
fs_cache              90    295     64    5    5    1 :  248  124
files_cache           90     99    448   11   11    1 :  120   60
signal_act            99     99   1344   33   33    1 :   54   27
task_struct          133    145   1600   29   29    2 :   54   27
pte_chain          19930  28851     64  489  489    1 :  248  124
mm_chain               0      0      8    0    0    1 :  248  124
size-131072(DMA)       0      0 131072    0    0   32 :    8    4
size-131072            0      0 131072    0    0   32 :    8    4
size-65536(DMA)        0      0  65536    0    0   16 :    8    4
size-65536             0      0  65536    0    0   16 :    8    4
size-32768(DMA)        0      0  32768    0    0    8 :    8    4
size-32768             1      1  32768    1    1    8 :    8    4
size-16384(DMA)        0      0  16384    0    0    4 :    8    4
size-16384            11     15  16384   11   15    4 :    8    4
size-8192(DMA)         0      0   8192    0    0    2 :    8    4
size-8192              5      9   8192    5    9    2 :    8    4
size-4096(DMA)         0      0   4096    0    0    1 :   54   27
size-4096            198    212   4096  198  212    1 :   54   27
size-2048(DMA)         0      0   2048    0    0    1 :   54   27
size-2048            190    206   2048   99  103    1 :   54   27
size-1024(DMA)         0      0   1024    0    0    1 :  120   60
size-1024            268    268   1024   67   67    1 :  120   60
size-512(DMA)          0      0    512    0    0    1 :  120   60
size-512             512    512    512   64   64    1 :  120   60
size-256(DMA)          0      0    256    0    0    1 :  248  124
size-256             360    360    256   24   24    1 :  248  124
size-192(DMA)          0      0    192    0    0    1 :  248  124
size-192              54     60    192    3    3    1 :  248  124
size-128(DMA)          0      0    128    0    0    1 :  248  124
size-128             923   1050    128   35   35    1 :  248  124
size-64(DMA)           0      0     64    0    0    1 :  248  124
size-64             1851   2124     64   36   36    1 :  248  124
size-32(DMA)           0      0     64    0    0    1 :  248  124
size-32             1891   2065     64   35   35    1 :  248  124
kmem_cache           112    128    120    4    4    1 :  248  124

GREAT work!

Regards,
	Dieter
-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel at hamburg.de (replace at with @)


end of thread, other threads:[~2002-11-11 15:41 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-11-09  2:00 [BENCHMARK] 2.4.{18,19{-ck9},20rc1{-aa1}} with contest Con Kolivas
2002-11-09  2:36 ` Andrew Morton
2002-11-09  3:26   ` Con Kolivas
2002-11-09  4:15     ` Andrew Morton
2002-11-09  5:12       ` Con Kolivas
2002-11-09 11:21         ` Jens Axboe
2002-11-09 13:09           ` Con Kolivas
2002-11-09 13:35             ` Stephen Lord
2002-11-09 13:54             ` Jens Axboe
2002-11-09 21:12               ` Arador
2002-11-10  2:26                 ` Andrea Arcangeli
2002-11-09 21:53               ` Con Kolivas
2002-11-10 10:09                 ` Jens Axboe
2002-11-10 16:23                   ` Andrea Arcangeli
2002-11-11  4:26                   ` Con Kolivas
2002-11-10 10:12               ` Kjartan Maraas
2002-11-10 10:17                 ` Jens Axboe
2002-11-10 16:27                 ` Andrea Arcangeli
2002-11-09 11:20       ` Jens Axboe
2002-11-10  2:44 ` Andrea Arcangeli
2002-11-10  3:56   ` Matt Reppert
2002-11-10  9:58   ` Con Kolivas
2002-11-10 10:06     ` Jens Axboe
2002-11-10 16:21       ` Andrea Arcangeli
2002-11-10 16:20     ` Andrea Arcangeli
2002-11-10 19:32   ` Rik van Riel
2002-11-10 20:10     ` Andrea Arcangeli
2002-11-10 20:52       ` Andrew Morton
2002-11-10 21:05         ` Rik van Riel
2002-11-11  1:54           ` Andrea Arcangeli
2002-11-11  4:03             ` Andrew Morton
2002-11-11  4:06               ` Andrea Arcangeli
2002-11-11  4:22                 ` Andrew Morton
2002-11-11  4:39                   ` Andrea Arcangeli
2002-11-11  5:10                     ` Andrew Morton
2002-11-11  5:23                       ` Andrea Arcangeli
2002-11-11  7:58                       ` William Lee Irwin III
2002-11-11 13:56                       ` Rik van Riel
2002-11-11 13:45             ` Rik van Riel
2002-11-11 14:09               ` Jens Axboe
2002-11-11 15:48                 ` Andrea Arcangeli
2002-11-11 15:43               ` Andrea Arcangeli
2002-11-10 20:56       ` Andrew Morton
2002-11-11  1:08         ` Andrea Arcangeli
2002-11-09  3:44 Dieter Nützel
2002-11-09  3:54 ` Con Kolivas
2002-11-09  4:02   ` Dieter Nützel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).