All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Con Kolivas <kernel@kolivas.org>, linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Mike Galbraith <efault@gmx.de>
Subject: BFS vs. mainline scheduler benchmarks and measurements
Date: Sun, 6 Sep 2009 22:59:52 +0200	[thread overview]
Message-ID: <20090906205952.GA6516@elte.hu> (raw)

hi Con,

I've read your BFS announcement/FAQ with great interest:

    http://ck.kolivas.org/patches/bfs/bfs-faq.txt

First and foremost, let me say that i'm happy that you are hacking 
the Linux scheduler again. It's perhaps proof that hacking the 
scheduler is one of the most addictive things on the planet ;-)

I understand that BFS is still early code and that you are not 
targeting BFS for mainline inclusion - but BFS is an interesting 
and bold new approach, cutting a _lot_ of code out of 
kernel/sched*.c, so it raised my curiosity and interest :-)

In the announcement and on your webpage you have compared BFS to 
the mainline scheduler in various workloads - showing various 
improvements over it. I have tried and tested BFS and ran a set of 
benchmarks - this mail contains the results and my (quick) 
findings.

So ... to get to the numbers - i've tested both BFS and the tip of 
the latest upstream scheduler tree on a testbox of mine. I 
intentionally didnt test BFS on any really large box - because you 
described its upper limit like this in the announcement:

-----------------------
|
| How scalable is it?
|
| I don't own the sort of hardware that is likely to suffer from 
| using it, so I can't find the upper limit. Based on first 
| principles about the overhead of locking, and the way lookups 
| occur, I'd guess that a machine with more than 16 CPUS would 
| start to have less performance. BIG NUMA machines will probably 
| suck a lot with this because it pays no deference to locality of 
| the NUMA nodes when deciding what cpu to use. It just keeps them 
| all busy. The so-called "light NUMA" that constitutes commodity 
| hardware these days seems to really like BFS.
|
-----------------------

I generally agree with you that "light NUMA" is what a Linux 
scheduler needs to concentrate on (at most) in terms of 
scalability. Big NUMA, 4096 CPUs is not very common and we tune the 
Linux scheduler for desktop and small-server workloads mostly.

So the testbox i picked fits into the upper portion of what i 
consider a sane range of systems to tune for - and should still fit 
into BFS's design bracket as well according to your description: 
it's a dual quad core system with hyperthreading. It has twice as 
many cores as the quad you tested on but it's not excessive and 
certainly does not have 4096 CPUs ;-)

Here are the benchmark results:

  kernel build performance:
     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg     

  pipe performance:
     http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

  messaging performance (hackbench):
     http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg  

  OLTP performance (postgresql + sysbench)
     http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

Alas, as it can be seen in the graphs, i can not see any BFS 
performance improvements, on this box.

Here's a more detailed description of the results:

| Kernel build performance
---------------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg     

In the kbuild test BFS is showing significant weaknesses up to 16 
CPUs. On 8 CPUs utilized (half load) it's 27.6% slower. All results 
(-j1, -j2... -j15 are slower. The peak at 100% utilization at -j16 
is slightly stronger under BFS, by 1.5%. The 'absolute best' result 
is sched-devel at -j64 with 46.65 seconds - the best BFS result is 
47.38 seconds (at -j64) - 1.5% better.

| Pipe performance
-------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

Pipe performance is a very simple test, two tasks message to each 
other via pipes. I measured 1 million such messages:

   http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test-1m.c

The pipe test ran a number of them in parallel:

   for ((i=0;i<$NR;i++)); do ~/sched-tests/pipe-test-1m & done; wait

and measured elapsed time. This tests two things: basic scheduler 
performance and also scheduler fairness. (if one of these parallel 
jobs is delayed unfairly then the test will finish later.)

[ see further below for a simpler pipe latency benchmark as well. ]

As can be seen in the graph BFS performed very poorly in this test: 
at 8 pairs of tasks it had a runtime of 45.42 seconds - while 
sched-devel finished them in 3.8 seconds.

I saw really bad interactivity in the BFS test here - the system 
was starved for as long as the test ran. I stopped the tests at 8 
loops - the system was unusable and i was getting IO timeouts due 
to the scheduling lag:

 sd 0:0:0:0: [sda] Unhandled error code
 sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
 end_request: I/O error, dev sda, sector 81949243
 Aborting journal on device sda2.
 ext3_abort called.
 EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
 Remounting filesystem read-only

I measured interactivity during this test:

   $ time ssh aldebaran /bin/true
   real  2m17.968s
   user  0m0.009s
   sys   0m0.003s

A single command took more than 2 minutes.

| Messaging performance
------------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg  

Hackbench ran better - but mainline sched-devel is significantly 
faster for smaller and larger loads as well. With 20 groups 
mainline ran 61.5% faster.

| OLTP performance
--------------------

http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

As can be seen in the graph for sysbench OLTP performance 
sched-devel outperforms BFS on each of the main stages:

   single client load   (   1 client  -   6.3% faster )
   half load            (   8 clients -  57.6% faster )
   peak performance     (  16 clients - 117.6% faster )
   overload             ( 512 clients - 288.3% faster )

| Other tests
--------------

I also tested a couple of other things, such as lat_tcp:

  BFS:          TCP latency using localhost: 16.5608 microseconds
  sched-devel:  TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe:

  BFS:          Pipe latency: 4.9703 microseconds
  sched-devel:  Pipe latency: 2.6137 microseconds [90.1% faster]

General interactivity of BFS seemed good to me - except for the 
pipe test when there was significant lag over a minute. I think 
it's some starvation bug, not an inherent design property of BFS, 
so i'm looking forward to re-test it with the fix.

Test environment: i used latest BFS (205 and then i re-ran under 
208 and the numbers are all from 208), and the latest mainline 
scheduler development tree from:

  http://people.redhat.com/mingo/tip.git/README

Commit 840a065 in particular. It's on a .31-rc8 base while BFS is 
on a .30 base - will be able to test BFS on a .31 base as well once 
you release it. (but it doesnt matter much to the results - there 
werent any heavy core kernel changes impacting these workloads.)

The system had enough RAM to have the workloads cached, and i 
repeated all tests to make sure it's all representative. 
Nevertheless i'd like to encourage others to repeat these (or 
other) tests - the more testing the better.

I also tried to configure the kernel in a BFS friendly way, i used 
HZ=1000 as recommended, turned off all debug options, etc. The 
kernel config i used can be found here:

  http://redhat.com/~mingo/misc/config

( Let me know if you need any more info about any of the tests i
  conducted. )

Also, i'd like to outline that i agree with the general goals 
described by you in the BFS announcement - small desktop systems 
matter more than large systems. We find it critically important 
that the mainline Linux scheduler performs well on those systems 
too - and if you (or anyone else) can reproduce suboptimal behavior 
please let the scheduler folks know so that we can fix/improve it.

I hope to be able to work with you on this, please dont hesitate 
sending patches if you wish - and we'll also be following BFS for 
good ideas and code to adopt to mainline.

Thanks,

	Ingo

             reply	other threads:[~2009-09-06 21:00 UTC|newest]

Thread overview: 224+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-06 20:59 Ingo Molnar [this message]
2009-09-07  2:05 ` BFS vs. mainline scheduler benchmarks and measurements Frans Pop
2009-09-07 12:16   ` [quad core results] " Ingo Molnar
2009-09-07 12:36     ` Stefan Richter
2009-09-07 13:41     ` Markus Tornqvist
2009-09-07 13:59       ` Ingo Molnar
2009-09-09  5:54         ` Markus Tornqvist
2009-09-07 14:45       ` Arjan van de Ven
2009-09-07 15:20         ` Frans Pop
2009-09-07 15:36           ` Arjan van de Ven
2009-09-07 15:47             ` Frans Pop
2009-09-07 15:24         ` Xavier Bestel
2009-09-07 15:37           ` Arjan van de Ven
2009-09-07 16:00           ` Diego Calleja
2009-09-07 15:34     ` Nikos Chantziaras
2009-09-07  3:38 ` Nikos Chantziaras
2009-09-07 11:01   ` Frederic Weisbecker
2009-09-08 18:15     ` Nikos Chantziaras
2009-09-10 20:25       ` Frederic Weisbecker
2009-09-07 14:40   ` Arjan van de Ven
2009-09-08  7:19     ` Nikos Chantziaras
2009-09-08  8:31       ` Arjan van de Ven
2009-09-08 20:22         ` Frans Pop
2009-09-08 21:10           ` Michal Schmidt
2009-09-08 21:11           ` Frans Pop
2009-09-08 21:40             ` GeunSik Lim
2009-09-08 22:36               ` Frans Pop
2009-09-09  9:53           ` Benjamin Herrenschmidt
2009-09-09 11:14             ` David Newall
2009-09-09 11:32               ` Benjamin Herrenschmidt
2009-09-09 11:55             ` Frans Pop
2009-09-11  1:36               ` Benjamin Herrenschmidt
2009-09-16 18:27                 ` Frans Pop
2009-09-17  1:29                   ` Benjamin Herrenschmidt
2009-10-01  9:36                     ` Frans Pop
2009-09-08  8:38       ` Arjan van de Ven
2009-09-08 10:13         ` Nikos Chantziaras
2009-09-08 11:32           ` Juergen Beisert
2009-09-08 22:00             ` Nikos Chantziaras
2009-09-08 23:20               ` Jiri Kosina
2009-09-08 23:38                 ` Nikos Chantziaras
2009-09-08 12:03           ` Theodore Tso
2009-09-08 21:28             ` Nikos Chantziaras
2009-09-08 14:20           ` Arjan van de Ven
2009-09-08 22:53             ` Nikos Chantziaras
2009-09-07 23:54   ` Thomas Fjellstrom
2009-09-08 11:30     ` Nikos Chantziaras
2009-09-07  3:50 ` Con Kolivas
2009-09-07 18:20   ` Jerome Glisse
2009-09-07  9:49 ` Jens Axboe
2009-09-07 10:12   ` Nikos Chantziaras
2009-09-07 10:41     ` Jens Axboe
2009-09-07 11:57   ` Jens Axboe
2009-09-07 14:14     ` Ingo Molnar
2009-09-07 17:38       ` Jens Axboe
2009-09-07 20:44         ` Jens Axboe
2009-09-08  9:13           ` Jens Axboe
2009-09-08 15:23             ` Peter Zijlstra
2009-09-08 20:34               ` Jens Axboe
2009-09-09  6:13                 ` Ingo Molnar
2009-09-09  8:34                   ` Nikos Chantziaras
2009-09-09  8:52                   ` Mike Galbraith
2009-09-09  9:02                     ` Peter Zijlstra
2009-09-09  9:18                       ` Mike Galbraith
2009-09-09  9:05                     ` Nikos Chantziaras
2009-09-09  9:17                       ` Peter Zijlstra
2009-09-09  9:40                         ` Nikos Chantziaras
2009-09-09 10:17                           ` Nikos Chantziaras
2009-09-10 19:45                         ` Martin Steigerwald
2009-09-10 20:06                           ` Ingo Molnar
2009-09-10 20:39                             ` Martin Steigerwald
2009-09-10 20:42                               ` Ingo Molnar
2009-09-10 21:19                                 ` Martin Steigerwald
2009-09-11  9:26                                   ` Mat
2009-09-12 11:26                                     ` Martin Steigerwald
2009-09-09  9:10                     ` Jens Axboe
2009-09-09 11:54                       ` Jens Axboe
2009-09-09 12:20                         ` Jens Axboe
2009-09-09 18:04                           ` Ingo Molnar
2009-09-09 20:12                             ` Nikos Chantziaras
2009-09-09 20:50                               ` Jens Axboe
2009-09-10  1:02                                 ` Con Kolivas
2009-09-10 11:03                                   ` Jens Axboe
2009-09-10  3:15                               ` Mike Galbraith
2009-09-10  6:08                               ` Ingo Molnar
2009-09-10  6:40                                 ` Ingo Molnar
2009-09-10  9:54                                   ` Jens Axboe
2009-09-10 10:03                                     ` Ingo Molnar
2009-09-10 10:11                                       ` Jens Axboe
2009-09-10 10:28                                         ` Jens Axboe
2009-09-10 10:57                                           ` Mike Galbraith
2009-09-10 11:09                                             ` Jens Axboe
2009-09-10 11:21                                               ` Mike Galbraith
2009-09-10 11:24                                                 ` Jens Axboe
2009-09-10 11:28                                                   ` Mike Galbraith
2009-09-10 11:35                                                     ` Jens Axboe
2009-09-10 11:42                                                       ` Mike Galbraith
2009-09-10 16:02                                 ` Bret Towe
2009-09-10 16:05                                   ` Peter Zijlstra
2009-09-10 16:12                                     ` Bret Towe
2009-09-10 16:26                                       ` Ingo Molnar
2009-09-10 16:33                                         ` Bret Towe
2009-09-10 17:03                                           ` Ingo Molnar
2009-09-10 17:53                                 ` Nikos Chantziaras
2009-09-10 18:46                                   ` Ingo Molnar
2009-09-10 18:51                                   ` [tip:sched/core] sched: Disable NEW_FAIR_SLEEPERS for now tip-bot for Ingo Molnar
2009-09-10 18:57                                   ` [tip:sched/core] sched: Fix sched::sched_stat_wait tracepoint field tip-bot for Ingo Molnar
2009-09-10  9:48                             ` BFS vs. mainline scheduler benchmarks and measurements Jens Axboe
2009-09-10  9:59                               ` Ingo Molnar
2009-09-10 10:01                                 ` Jens Axboe
2009-09-10  6:55                           ` Peter Zijlstra
2009-09-10  6:58                             ` Jens Axboe
2009-09-10  7:04                               ` Ingo Molnar
2009-09-10  9:44                                 ` Jens Axboe
2009-09-10  9:45                                   ` Jens Axboe
2009-09-10 13:53                                   ` Steven Rostedt
2009-09-10  7:33                               ` Jens Axboe
2009-09-10  7:49                                 ` Ingo Molnar
2009-09-10  7:53                                   ` Jens Axboe
2009-09-10 10:02                                     ` Ingo Molnar
2009-09-10 10:09                                       ` Jens Axboe
2009-09-10 18:00                                       ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
2009-09-11  7:37                                         ` Ingo Molnar
2009-09-11  7:48                                           ` Martin Schwidefsky
2009-09-11 13:33                                           ` Martin Schwidefsky
2009-09-11 18:22                                             ` [tip:timers/core] clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash tip-bot for Martin Schwidefsky
2009-09-14 15:19                                             ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
2009-09-14 15:37                                               ` Martin Schwidefsky
2009-09-14 17:59                                               ` Martin Schwidefsky
2009-09-10  6:59                             ` BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
2009-09-09 12:48                         ` Mike Galbraith
2009-09-09 15:37                     ` [tip:sched/core] sched: Turn off child_runs_first tip-bot for Mike Galbraith
2009-09-09 17:57                       ` Theodore Tso
2009-09-09 18:08                         ` Ingo Molnar
2009-09-09 18:59                           ` Chris Friesen
2009-09-09 19:48                           ` Pavel Machek
2009-09-09 15:37                     ` [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies tip-bot for Mike Galbraith
2009-09-12 11:45                       ` Martin Steigerwald
2009-09-09 15:37                     ` [tip:sched/core] sched: Keep kthreads at default priority tip-bot for Mike Galbraith
2009-09-09 16:55                       ` Dmitry Torokhov
2009-09-09 17:06                         ` Peter Zijlstra
2009-09-09 17:34                           ` Mike Galbraith
2009-09-12 11:48                             ` Martin Steigerwald
2009-09-12 12:19                               ` Mike Galbraith
2009-09-09 11:52               ` BFS vs. mainline scheduler benchmarks and measurements Nikos Chantziaras
2009-09-07 18:02   ` Avi Kivity
2009-09-07 18:46     ` Jens Axboe
2009-09-07 20:36       ` Ingo Molnar
2009-09-07 20:46         ` Jens Axboe
2009-09-07 21:03           ` Peter Zijlstra
2009-09-07 21:05             ` Jens Axboe
2009-09-07 22:18               ` Ingo Molnar
2009-09-09  7:38   ` Pavel Machek
2009-09-10 12:19     ` latt location (Was Re: BFS vs. mainline scheduler benchmarks and measurements) Jens Axboe
2009-09-07 15:16 ` BFS vs. mainline scheduler benchmarks and measurements Michael Buesch
2009-09-07 18:26   ` Ingo Molnar
2009-09-07 18:47     ` Daniel Walker
2009-09-07 18:51     ` Michael Buesch
2009-09-07 20:57       ` Ingo Molnar
2009-09-07 23:24         ` Pekka Pietikainen
2009-09-08  8:04           ` Ingo Molnar
2009-09-08  8:13             ` Nikos Chantziaras
2009-09-08 10:12               ` Ingo Molnar
2009-09-08 10:40                 ` Nikos Chantziaras
2009-09-08 11:35                   ` Ingo Molnar
2009-09-08 19:06                     ` Nikos Chantziaras
2009-09-08 12:00                 ` el_es
2009-09-08 15:45         ` Michael Buesch
2009-09-08  7:48     ` Ingo Molnar
2009-09-08  9:50       ` Benjamin Herrenschmidt
2009-09-08 13:09         ` Ralf Baechle
2009-09-09  1:36           ` Felix Fietkau
2009-09-08 13:09         ` Felix Fietkau
2009-09-09  0:28           ` Benjamin Herrenschmidt
2009-09-09  0:37             ` David Miller
2009-09-08 14:45       ` Michael Buesch
2009-09-18 11:24         ` Ingo Molnar
2009-09-18 14:46           ` Felix Fietkau
2009-09-19 18:01             ` Ingo Molnar
2009-09-19 18:43               ` Felix Fietkau
2009-09-19 19:39                 ` Ingo Molnar
2009-09-19 20:15                   ` Felix Fietkau
2009-09-19 20:22                     ` Ingo Molnar
2009-09-19 20:33                       ` Felix Fietkau
2009-09-20 18:10                         ` Ingo Molnar
2009-09-08 12:57 ` Epic regression in throughput since v2.6.23 Serge Belyshev
2009-09-08 17:47   ` Jesse Brandeburg
2009-09-08 18:20     ` Nikos Chantziaras
2009-09-08 19:00     ` Jeff Garzik
2009-09-08 19:20       ` Serge Belyshev
2009-09-08 19:26         ` Jeff Garzik
2009-09-08 18:37   ` Nikos Chantziaras
2009-09-08 22:15   ` Serge Belyshev
2009-09-09 15:52     ` Ingo Molnar
2009-09-09 20:49       ` Serge Belyshev
2009-09-09 21:23         ` Cory Fields
2009-09-10  6:53         ` Ingo Molnar
2009-09-10 23:23           ` Serge Belyshev
2009-09-11  6:10             ` Ingo Molnar
2009-09-11  8:55               ` Serge Belyshev
2009-09-13 15:27               ` Serge Belyshev
2009-09-13 15:47                 ` Ingo Molnar
2009-09-13 19:17                   ` Mike Galbraith
2009-09-14  6:15                     ` Mike Galbraith
2009-09-16 19:45                 ` Ingo Molnar
2009-09-16 23:18                   ` Serge Belyshev
2009-09-17  4:55                     ` [patchlet] " Mike Galbraith
2009-09-17  5:06                       ` Mike Galbraith
2009-09-17  7:21                         ` Ingo Molnar
2009-09-10  7:43 ` [updated] BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
2009-09-14  9:46 ` Phoronix CFS vs BFS bencharks Nikos Chantziaras
2009-09-14 11:35   ` Mike Galbraith
     [not found]     ` <f42384a10909140727k463ff460q3859892dcb79bcc5@mail.gmail.com>
2009-09-14 15:32       ` Mike Galbraith
2009-09-14 19:14         ` Marcin Letyns
2009-09-14 20:49           ` Willy Tarreau
2009-09-15  8:37             ` Mike Galbraith
2009-09-10 21:17 BFS vs. mainline scheduler benchmarks and measurements Martin Steigerwald
2009-09-11 10:10 Mat
2009-09-11 18:33 Volker Armin Hemmann
2009-09-12  7:37 ` Nikos Chantziaras
2009-09-12  7:51   ` Arjan van de Ven
2009-09-12  8:27   ` Volker Armin Hemmann
2009-09-12  9:03     ` Nikos Chantziaras
2009-09-12  9:34       ` Volker Armin Hemmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090906205952.GA6516@elte.hu \
    --to=mingo@elte.hu \
    --cc=a.p.zijlstra@chello.nl \
    --cc=efault@gmx.de \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.