linux-kernel.vger.kernel.org archive mirror
* 2.4.8-pre1 and dbench -20% throughput
@ 2001-07-27 21:08 Roger Larsson
  2001-07-27 21:42 ` Rik van Riel
  2001-07-27 22:34 ` Daniel Phillips
  0 siblings, 2 replies; 8+ messages in thread
From: Roger Larsson @ 2001-07-27 21:08 UTC (permalink / raw)
  To: linux-kernel

Hi all,

I have done some throughput testing again.
Streaming write, copy, read, diff are almost identical to earlier 2.4 kernels.
(Note: 2.4.0 was clearly better when reading from two files - i.e. diff - 
15.4 MB/s v. around 11 MB/s with later kernels - can be a result of disk 
layout too...)

But "dbench 32" (on my 256 MB box) results are the most interesting:

2.4.0 gave 33 MB/s
2.4.8-pre1 gives 26.1 MB/s (-21%)

Do we now throw away pages that would be reused?

[I have also verified that mmap002 still works as expected]

/RogerL

* Re: 2.4.8-pre1 and dbench -20% throughput
  2001-07-27 21:08 2.4.8-pre1 and dbench -20% throughput Roger Larsson
@ 2001-07-27 21:42 ` Rik van Riel
  2001-07-27 22:34 ` Daniel Phillips
  1 sibling, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2001-07-27 21:42 UTC (permalink / raw)
  To: Roger Larsson; +Cc: linux-kernel, Linus Torvalds, Daniel Phillips

On Fri, 27 Jul 2001, Roger Larsson wrote:

> But "dbench 32" (on my 256 MB box) results are the most interesting:
>
> 2.4.0 gave 33 MB/s
> 2.4.8-pre1 gives 26.1 MB/s (-21%)
>
> Do we now throw away pages that would be reused?

Yes. This is pretty much expected behaviour with the use-once
patch, both as it is currently implemented and as it works
in principle.

This is because the use-once strategy protects the working
set from streaming IO in a better way than before. One of the
consequences of this is that streaming IO pages get less of a
chance to be reused before they're evicted.

Database systems usually keep a history of recently evicted
pages so they can promote these quickly evicted pages to the
list of more frequently used pages when they are faulted in again.
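
To make the idea concrete, here is a toy userspace sketch of that
refault-history trick (not the kernel's or any database's actual code;
every name and structure below is invented for illustration):

/*
 * Toy model of the use-once idea described above.  NOT kernel code.
 *
 * New pages go to an inactive (use-once) queue and may be evicted
 * before they are reused.  A short history of recently evicted page
 * ids lets a page that comes back quickly be promoted straight to
 * the active (frequently used) list on its next fault.
 */
#include <stdio.h>

#define HISTORY_SIZE 8

static unsigned long evict_history[HISTORY_SIZE];
static unsigned int history_head;

/* remember that this page was just evicted */
static void remember_eviction(unsigned long page_id)
{
        evict_history[history_head++ % HISTORY_SIZE] = page_id;
}

/* was this page evicted a short while ago? */
static int recently_evicted(unsigned long page_id)
{
        unsigned int i;

        for (i = 0; i < HISTORY_SIZE; i++)
                if (evict_history[i] == page_id)
                        return 1;
        return 0;
}

/* on fault: promote quick refaults, treat everything else as use-once */
static void fault_in(unsigned long page_id)
{
        if (recently_evicted(page_id))
                printf("page %lu: refault soon after eviction -> active list\n",
                       page_id);
        else
                printf("page %lu: first touch -> inactive (use-once) list\n",
                       page_id);
}

int main(void)
{
        fault_in(1);            /* streaming page, seen once           */
        remember_eviction(1);   /* reclaimed before it could be reused */
        fault_in(1);            /* reused shortly after eviction       */
        fault_in(2);            /* another streaming page              */
        return 0;
}

The real VM tracks far more state than this, of course, but the
promote-on-refault step is the part that would help the pages dbench
reuses shortly after they get reclaimed.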

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


* Re: 2.4.8-pre1 and dbench -20% throughput
  2001-07-27 21:08 2.4.8-pre1 and dbench -20% throughput Roger Larsson
  2001-07-27 21:42 ` Rik van Riel
@ 2001-07-27 22:34 ` Daniel Phillips
  2001-07-27 23:43   ` Roger Larsson
  2001-07-28  0:35   ` Steven Cole
  1 sibling, 2 replies; 8+ messages in thread
From: Daniel Phillips @ 2001-07-27 22:34 UTC (permalink / raw)
  To: Roger Larsson, linux-kernel

On Friday 27 July 2001 23:08, Roger Larsson wrote:
> Hi all,
>
> I have done some throughput testing again.
> Streaming write, copy, read, diff are almost identical to earlier 2.4
> kernels. (Note: 2.4.0 was clearly better when reading from two files
> - i.e. diff - 15.4 MB/s v. around 11 MB/s with later kernels - can be
> a result of disk layout too...)
>
> But "dbench 32" (on my 256 MB box) results are the most
> interesting:
>
> 2.4.0 gave 33 MB/s
> 2.4.8-pre1 gives 26.1 MB/s (-21%)
>
> Do we now throw away pages that would be reused?
>
> [I have also verified that mmap002 still works as expected]

Could you run that test again with /usr/bin/time (the GNU time 
function) so we can see what kind of swapping it's doing?

The use-once approach depends on having a fairly stable inactive_dirty 
+ inactive_clean queue size, to give use-often pages a fair chance to 
be rescued.  To see how the sizes of the queues are changing, use 
Shift-ScrollLock on your text console.
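
Roughly, the rescue pass I mean works like the following simplified
userspace model (invented structures, not the 2.4 mm/vmscan.c code); a
stable queue size is what gives a use-often page time to be referenced
again before it reaches the tail:

/*
 * Simplified model of the rescue pass: pages on the inactive queue that
 * were referenced again before reaching the tail are pulled back onto
 * the active list; the rest are reclaimed.  Illustration only.
 */
#include <stdio.h>

struct page {
        unsigned long id;
        int referenced;         /* touched again while on the inactive queue? */
        struct page *next;
};

static struct page *active_list;
static struct page *inactive_list;

static void push(struct page **list, struct page *p)
{
        p->next = *list;
        *list = p;
}

/* scan the inactive queue: rescue referenced pages, reclaim the rest */
static void scan_inactive(void)
{
        struct page *p = inactive_list;

        inactive_list = NULL;
        while (p) {
                struct page *next = p->next;

                if (p->referenced) {
                        p->referenced = 0;
                        push(&active_list, p);
                        printf("page %lu rescued to the active list\n", p->id);
                } else {
                        printf("page %lu reclaimed\n", p->id);
                }
                p = next;
        }
}

int main(void)
{
        static struct page pages[4] = {
                { 1, 0 }, { 2, 1 }, { 3, 0 }, { 4, 0 },  /* only page 2 reused */
        };
        int i;

        for (i = 0; i < 4; i++)
                push(&inactive_list, &pages[i]);
        scan_inactive();
        return 0;
}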

To tell the truth, I don't have a deep understanding of how dbench 
works.  I should read the code now and see if I can learn more about it 
:-/  I have noticed that it tends to be highly variable in performance, 
sometimes showing variation of a few tens of percent from run to run.  
This variation seems to depend a lot on scheduling.  Do you see "*"'s 
evenly spaced throughout the tracing output, or do you see most of them 
bunched up near the end?

--
Daniel

* Re: 2.4.8-pre1 and dbench -20% throughput
  2001-07-27 22:34 ` Daniel Phillips
@ 2001-07-27 23:43   ` Roger Larsson
  2001-07-28  1:11     ` Daniel Phillips
  2001-07-28  0:35   ` Steven Cole
  1 sibling, 1 reply; 8+ messages in thread
From: Roger Larsson @ 2001-07-27 23:43 UTC (permalink / raw)
  To: Daniel Phillips, linux-kernel; +Cc: linux-mm

Hi again,

It might be variations in dbench - but I am not sure since I run
the same script each time.

(When I made a test run in a terminal window - with X running, but not doing 
anything actively, I got
[some '.' deleted] 
.............++++++++++++++++++++++++++++++++********************************
Throughput 15.8859 MB/sec (NB=19.8573 MB/sec  158.859 MBit/sec)
14.74user 22.92system 4:26.91elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (912major+1430minor)pagefaults 0swaps

I have never seen anything like this - all '+' together! 

I logged off and tried again - got a more normal value, 32 MB/s,
and the '+'s were spread out.

More testing needed...

/RogerL

On Saturday 28 July 2001 00:34, Daniel Phillips wrote:
> On Friday 27 July 2001 23:08, Roger Larsson wrote:
> > Hi all,
> >
> > I have done some throughput testing again.
> > Streaming write, copy, read, diff are almost identical to earlier 2.4
> > kernels. (Note: 2.4.0 was clearly better when reading from two files
> > - i.e. diff - 15.4 MB/s v. around 11 MB/s with later kernels - can be
> > a result of disk layout too...)
> >
> > But "dbench 32" (on my 256 MB box) results are the most
> > interesting:
> >
> > 2.4.0 gave 33 MB/s
> > 2.4.8-pre1 gives 26.1 MB/s (-21%)
> >
> > Do we now throw away pages that would be reused?
> >
> > [I have also verified that mmap002 still works as expected]
>
> Could you run that test again with /usr/bin/time (the GNU time
> function) so we can see what kind of swapping it's doing?
>
> The use-once approach depends on having a fairly stable inactive_dirty
> + inactive_clean queue size, to give use-often pages a fair chance to
> be rescued.  To see how the sizes of the queues are changing, use
> Shift-ScrollLock on your text console.
>
> To tell the truth, I don't have a deep understanding of how dbench
> works.  I should read the code now and see if I can learn more about it
>
> :-/  I have noticed that it tends to be highly variable in performance,
>
> sometimes showing variation of a few tens of percent from run to run.
> This variation seems to depend a lot on scheduling.  Do you see "*"'s
> evenly spaced throughout the tracing output, or do you see most of them
> bunched up near the end?
>
> --
> Daniel

-- 
Roger Larsson
Skellefteå
Sweden

* Re: 2.4.8-pre1 and dbench -20% throughput
  2001-07-27 22:34 ` Daniel Phillips
  2001-07-27 23:43   ` Roger Larsson
@ 2001-07-28  0:35   ` Steven Cole
  2001-07-28  2:04     ` Daniel Phillips
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Cole @ 2001-07-28  0:35 UTC (permalink / raw)
  To: Daniel Phillips, Roger Larsson, linux-kernel

On Friday 27 July 2001 16:34, Daniel Phillips wrote:
> On Friday 27 July 2001 23:08, Roger Larsson wrote:
> > Hi all,
> >
> > I have done some throughput testing again.
> > Streaming write, copy, read, diff are almost identical to earlier 2.4
> > kernels. (Note: 2.4.0 was clearly better when reading from two files
> > - i.e. diff - 15.4 MB/s v. around 11 MB/s with later kernels - can be
> > a result of disk layout too...)
> >
> > But "dbench 32" (on my 256 MB box) results are the most
> > interesting:
> >
> > 2.4.0 gave 33 MB/s
> > 2.4.8-pre1 gives 26.1 MB/s (-21%)
> >
> > Do we now throw away pages that would be reused?
> >
> > [I have also verified that mmap002 still works as expected]
>
> Could you run that test again with /usr/bin/time (the GNU time
> function) so we can see what kind of swapping it's doing?
>

I also saw a significant drop in dbench 32 results.
Here are a few more data points, this time comparing 2.4.8-pre1 with 2.4.7.

2.4.7   9.3422 MB/sec vs 2.4.8-pre1   6.88884 MB/sec average of 3 runs

The system under test has 384 MB of memory, and did not go
into swap during the test.  I performed a set of three runs immediately after
a boot, with no pauses between individual runs.  I used time ./dbench 32
and captured the output to a file using script `uname -r`.  The tests were done
with X and KDE running, but no other activity.

Here are the results of the six runs:

Steven
-----------------------------------------------------------------------------
2.4.7       average 9.3422 MB/sec

Throughput 9.2929 MB/sec (NB=11.6161 MB/sec  92.929 MBit/sec)
34.11user 238.89system 7:34.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1008major+1402minor)pagefaults 0swaps

Throughput 9.56338 MB/sec (NB=11.9542 MB/sec  95.6338 MBit/sec)
34.07user 262.44system 7:22.72elapsed 66%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1008major+1402minor)pagefaults 0swaps

Throughput 9.17032 MB/sec (NB=11.4629 MB/sec  91.7032 MBit/sec)
33.79user 248.46system 7:41.62elapsed 61%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1008major+1402minor)pagefaults 0swaps

-----------------------------------------------------------------------------
2.4.8-pre1  average 6.88884 MB/sec

Throughput 6.8078 MB/sec (NB=8.50975 MB/sec  68.078 MBit/sec)
34.30user 358.35system 10:21.57elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1008major+1402minor)pagefaults 0swaps

Throughput 6.91993 MB/sec (NB=8.64992 MB/sec  69.1993 MBit/sec)
33.62user 369.55system 10:11.43elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1008major+1402minor)pagefaults 0swaps

Throughput 6.93879 MB/sec (NB=8.67349 MB/sec  69.3879 MBit/sec)
33.33user 341.58system 10:09.77elapsed 61%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1008major+1402minor)pagefaults 0swaps



* Re: 2.4.8-pre1 and dbench -20% throughput
  2001-07-27 23:43   ` Roger Larsson
@ 2001-07-28  1:11     ` Daniel Phillips
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Phillips @ 2001-07-28  1:11 UTC (permalink / raw)
  To: Roger Larsson, linux-kernel; +Cc: linux-mm

On Saturday 28 July 2001 01:43, Roger Larsson wrote:
> Hi again,
>
> It might be variations in dbench - but I am not sure since I run
> the same script each time.
>
> (When I made a test run in a terminal window - with X running, but not
> doing anything actively, I got
> [some '.' deleted]
> .............++++++++++++++++++++++++++++++++************************
>******** Throughput 15.8859 MB/sec (NB=19.8573 MB/sec  158.859
> MBit/sec) 14.74user 22.92system 4:26.91elapsed 14%CPU
> (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs
> (912major+1430minor)pagefaults 0swaps
>
> I have never seen anything like this - all '+' together!
>
> I logged off and tried again - got a more normal value, 32 MB/s,
> and the '+'s were spread out.
>
> More testing needed...

Truly wild, truly crazy.  OK, this is getting interesting.  I'll go 
read the dbench source now; I really want to understand how the IO and 
thread scheduling are interrelated.  I'm not even going to try to 
advance a theory just yet ;-)

I'd mentioned that dbench seems to run fastest when threads run and 
complete all at different times instead of all together.  It's easy to 
see why this might be so: if the sum of all working sets is bigger than 
memory then the system will thrash and do its work much more slowly.  
If the threads *can* all run independently (which I think is true of 
dbench because it simulates SMB accesses from a number of unrelated 
sources) then the optimal strategy is to suspend enough processes so 
that all the working sets do fit in memory.  Linux has no mechanism for 
detecting or responding to such situations (whereas FreeBSD - our 
arch-rival in the mm sweepstakes - does) so we sometimes see what are 
essentially random variations in scheduling causing very measurable 
differences in throughput.  (The "butterfly effect" where the beating 
wings of a butterfly in Alberta set in motion a chain of events that 
culminates with a hurricane in Florida.)

I am not saying this is the effect we're seeing here (the working set 
effect, not the butterfly:-) but it is something to keep in mind when 
investigating this.  There is such a thing as being too fair, and maybe 
that's what we're running into here.
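
For what it's worth, here is a crude userspace sketch of the kind of load
control I mean (purely hypothetical: the names, the numbers and the
"suspend the last tasks first" policy are all invented, and 2.4 has no
such mechanism).  When the summed working sets exceed memory, stop
scheduling enough tasks that the remainder fit:

/*
 * Toy load-control heuristic: if the sum of the working sets exceeds
 * available memory, suspend tasks until the rest fit, instead of
 * letting everyone thrash.  Purely illustrative.
 */
#include <stdio.h>

struct task {
        const char *name;
        unsigned long working_set_kb;
        int suspended;
};

static void load_control(struct task *tasks, int ntasks, unsigned long mem_kb)
{
        unsigned long total = 0;
        int i;

        for (i = 0; i < ntasks; i++)
                total += tasks[i].working_set_kb;

        /* suspend tasks from the end until what remains fits in memory */
        for (i = ntasks - 1; i > 0 && total > mem_kb; i--) {
                tasks[i].suspended = 1;
                total -= tasks[i].working_set_kb;
                printf("suspending %s (%lu kB), demand now %lu kB\n",
                       tasks[i].name, tasks[i].working_set_kb, total);
        }
}

int main(void)
{
        struct task tasks[] = {
                { "client1", 96 * 1024, 0 },
                { "client2", 96 * 1024, 0 },
                { "client3", 96 * 1024, 0 },
                { "client4", 96 * 1024, 0 },
        };

        load_control(tasks, 4, 256 * 1024);  /* 256 MB box, 4 x 96 MB clients */
        return 0;
}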

--
Daniel

* Re: 2.4.8-pre1 and dbench -20% throughput
  2001-07-28  0:35   ` Steven Cole
@ 2001-07-28  2:04     ` Daniel Phillips
  2001-07-30 13:15       ` Marcelo Tosatti
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Phillips @ 2001-07-28  2:04 UTC (permalink / raw)
  To: Steven Cole, Roger Larsson, linux-kernel

On Saturday 28 July 2001 02:35, Steven Cole wrote:
> I also saw a significant drop in dbench 32 results.
> Here are a few more data points, this time comparing 2.4.8-pre1 with
> 2.4.7.
>
> 2.4.7   9.3422 MB/sec vs 2.4.8-pre1   6.88884 MB/sec average of 3
> runs
>
> The system under test has 384 MB of memory, and did not go
> into swap during the test.  I performed a set of three runs
> immediately after a boot, and with no pauses in between individual
> runs.  I used time ./dbench 32 and caputured the output in a file
> using script `uname -r`.  The tests were done with X and KDE running,
> but no other activity.

The variation is accounted for almost entirely by the change in system 
time.  Does this mean more IO's or more scanning?  I don't know, more 
research needed.

We need Marcelo's vm statistics patch; I wonder what the status of that 
is.

Thanks for the nice clear results, I'll try it here now. ;-)

> Here are the results of the six runs:
>
> Steven
> ---------------------------------------------------------------------
>-------- 2.4.7       average 9.3422 MB/sec
>
> Throughput 9.2929 MB/sec (NB=11.6161 MB/sec  92.929 MBit/sec)
> 34.11user 238.89system 7:34.59elapsed 60%CPU (0avgtext+0avgdata
> 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults
> 0swaps
>
> Throughput 9.56338 MB/sec (NB=11.9542 MB/sec  95.6338 MBit/sec)
> 34.07user 262.44system 7:22.72elapsed 66%CPU (0avgtext+0avgdata
> 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults
> 0swaps
>
> Throughput 9.17032 MB/sec (NB=11.4629 MB/sec  91.7032 MBit/sec)
> 33.79user 248.46system 7:41.62elapsed 61%CPU (0avgtext+0avgdata
> 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults
> 0swaps
>
> ---------------------------------------------------------------------
>-------- 2.4.8-pre1  average 6.88884 MB/sec
>
> Throughput 6.8078 MB/sec (NB=8.50975 MB/sec  68.078 MBit/sec)
> 34.30user 358.35system 10:21.57elapsed 63%CPU (0avgtext+0avgdata
> 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults
> 0swaps
>
> Throughput 6.91993 MB/sec (NB=8.64992 MB/sec  69.1993 MBit/sec)
> 33.62user 369.55system 10:11.43elapsed 65%CPU (0avgtext+0avgdata
> 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults
> 0swaps
>
> Throughput 6.93879 MB/sec (NB=8.67349 MB/sec  69.3879 MBit/sec)
> 33.33user 341.58system 10:09.77elapsed 61%CPU (0avgtext+0avgdata
> 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults
> 0swaps

* Re: 2.4.8-pre1 and dbench -20% throughput
  2001-07-28  2:04     ` Daniel Phillips
@ 2001-07-30 13:15       ` Marcelo Tosatti
  0 siblings, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2001-07-30 13:15 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Steven Cole, Roger Larsson, lkml



On Sat, 28 Jul 2001, Daniel Phillips wrote:

> On Saturday 28 July 2001 02:35, Steven Cole wrote:
> > I also saw a significant drop in dbench 32 results.
> > Here are a few more data points, this time comparing 2.4.8-pre1 with
> > 2.4.7.
> >
> > 2.4.7   9.3422 MB/sec vs 2.4.8-pre1   6.88884 MB/sec average of 3
> > runs
> >
> > The system under test has 384 MB of memory, and did not go
> > into swap during the test.  I performed a set of three runs
> > immediately after a boot, and with no pauses in between individual
> > runs.  I used time ./dbench 32 and caputured the output in a file
> > using script `uname -r`.  The tests were done with X and KDE running,
> > but no other activity.
> 
> The variation is accounted for almost entirely by the change in system 
> time.  Does this mean more IO's or more scanning?  I don't know, more 
> research needed.
> 
> We need Marcelo's vm statistics patch, I wonder what the status of that 
> is.

Well, I've switched to Andrew Morton's generic stats scheme. 

I've also started writing a new userlevel tool (based on cpustat.c from
Zach Brown) to "replace" the old vmstat.c.

Right now I'm busy fixing client problems and kernel RPM bugs, but I hope
to have the new vm stats patch using Andrew's scheme, plus the userlevel
tool, ready by the end of the week.
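
In the meantime, a few lines of C are enough to poll the existing
counters once a second (a minimal sketch, not the actual tool; the
Active/Inact_dirty/Inact_clean field names are those of recent 2.4
/proc/meminfo and may differ between kernel versions):

/*
 * Minimal sketch of a userlevel polling tool: read /proc/meminfo once a
 * second and print the page-queue counters.  Field names vary between
 * kernel versions; adjust the prefixes as needed.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        char line[256];

        for (;;) {
                FILE *f = fopen("/proc/meminfo", "r");

                if (!f) {
                        perror("/proc/meminfo");
                        return 1;
                }
                while (fgets(line, sizeof(line), f)) {
                        if (!strncmp(line, "Active:", 7) ||
                            !strncmp(line, "Inact_dirty:", 12) ||
                            !strncmp(line, "Inact_clean:", 12))
                                fputs(line, stdout);
                }
                fclose(f);
                putchar('\n');
                sleep(1);
        }
}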


Thread overview: 8+ messages
2001-07-27 21:08 2.4.8-pre1 and dbench -20% throughput Roger Larsson
2001-07-27 21:42 ` Rik van Riel
2001-07-27 22:34 ` Daniel Phillips
2001-07-27 23:43   ` Roger Larsson
2001-07-28  1:11     ` Daniel Phillips
2001-07-28  0:35   ` Steven Cole
2001-07-28  2:04     ` Daniel Phillips
2001-07-30 13:15       ` Marcelo Tosatti
